* [PATCH 0/8] ARM: mvebu: Add support for RAID6 PQ offloading
@ 2015-05-12 15:37 ` Maxime Ripard
  0 siblings, 0 replies; 79+ messages in thread
From: Maxime Ripard @ 2015-05-12 15:37 UTC (permalink / raw)
  To: Vinod Koul, Dan Williams, Gregory Clement, Jason Cooper,
	Andrew Lunn, Sebastian Hesselbarth
  Cc: dmaengine, linux-arm-kernel, linux-kernel, linux-crypto,
	Lior Amsalem, Thomas Petazzoni, Herbert Xu, David S. Miller,
	Maxime Ripard

Hi,

This series refactors the mv_xor driver to support the latest Armada
38x features, including PQ support to offload the RAID6 PQ
operations.

Not all of the PQ operations are supported by the XOR engine, so we
had to introduce new async_tx flags in the process to identify the
unsupported operations.
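
As an illustration, a driver advertising PQ can reject the
sub-operations its engine cannot accelerate, letting async_tx fall
back to the synchronous path. A minimal sketch (the flag names are
the ones added in patch 5/8; the foo_* identifiers are hypothetical):

static struct dma_async_tx_descriptor *
foo_prep_dma_pq(struct dma_chan *chan, dma_addr_t *dst, dma_addr_t *src,
                unsigned int src_cnt, const unsigned char *scf,
                size_t len, unsigned long flags)
{
        /* mult and sum_product are the sub-operations this engine lacks */
        if (flags & (DMA_PREP_PQ_MULT | DMA_PREP_PQ_SUM_PRODUCT))
                return NULL;    /* async_tx falls back to software */

        /* foo_build_pq_desc() is a placeholder for the real setup */
        return foo_build_pq_desc(chan, dst, src, src_cnt, scf, len, flags);
}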

Please note that this is currently not usable because of a possible
regression in the RAID stack in 4.1 that is being discussed at the
moment here: https://lkml.org/lkml/2015/5/7/527

Let me know what you think,
Maxime

Lior Amsalem (7):
  dmaengine: mv_xor: add support for a38x command in descriptor mode
  dmaengine: mv_xor: Enlarge descriptor pool size
  dmaengine: mv_xor: improve descriptors list handling and reduce
    locking
  dmaengine: mv_xor: bug fix for racing condition in descriptors cleanup
  async_tx: adding mult and sum_product flags
  dmaengine: mv_xor: add support for a38x RAID6 support
  ARM: mvebu: a38x: Enable A38x XOR engine features

Maxime Ripard (1):
  dmaengine: mv_xor: Rename function for consistent naming

 Documentation/devicetree/bindings/dma/mv-xor.txt |   2 +-
 arch/arm/boot/dts/armada-38x.dtsi                |  20 +-
 crypto/async_tx/async_raid6_recov.c              |   4 +-
 drivers/dma/mv_xor.c                             | 459 +++++++++++++++--------
 drivers/dma/mv_xor.h                             |  32 +-
 include/linux/dmaengine.h                        |   4 +
 6 files changed, 326 insertions(+), 195 deletions(-)

-- 
2.4.0

* [PATCH 1/8] dmaengine: mv_xor: Rename function for consistent naming
  2015-05-12 15:37 ` Maxime Ripard
@ 2015-05-12 15:37   ` Maxime Ripard
  -1 siblings, 0 replies; 79+ messages in thread
From: Maxime Ripard @ 2015-05-12 15:37 UTC (permalink / raw)
  To: Vinod Koul, Dan Williams, Gregory Clement, Jason Cooper,
	Andrew Lunn, Sebastian Hesselbarth
  Cc: dmaengine, linux-arm-kernel, linux-kernel, linux-crypto,
	Lior Amsalem, Thomas Petazzoni, Herbert Xu, David S. Miller,
	Maxime Ripard

The current function names aren't very consistent: functions with the
same prefix might operate on either a channel or a descriptor, which
is confusing.

Rename these functions to have a consistent and clearer naming scheme.
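
In short, the scheme after this patch, as can be read from the diff
below:

  mv_xor_*()   dmaengine-facing entry points (mv_xor_tx_submit, ...)
  mv_chan_*()  helpers operating on a struct mv_xor_chan
  mv_desc_*()  helpers operating on a single struct mv_xor_desc_slot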

Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
---
 drivers/dma/mv_xor.c | 83 ++++++++++++++++++++++++++--------------------------
 1 file changed, 42 insertions(+), 41 deletions(-)

diff --git a/drivers/dma/mv_xor.c b/drivers/dma/mv_xor.c
index 1c56001df676..1affafa888c5 100644
--- a/drivers/dma/mv_xor.c
+++ b/drivers/dma/mv_xor.c
@@ -104,7 +104,7 @@ static u32 mv_chan_get_intr_cause(struct mv_xor_chan *chan)
 	return intr_cause;
 }
 
-static void mv_xor_device_clear_eoc_cause(struct mv_xor_chan *chan)
+static void mv_chan_clear_eoc_cause(struct mv_xor_chan *chan)
 {
 	u32 val;
 
@@ -114,14 +114,14 @@ static void mv_xor_device_clear_eoc_cause(struct mv_xor_chan *chan)
 	writel_relaxed(val, XOR_INTR_CAUSE(chan));
 }
 
-static void mv_xor_device_clear_err_status(struct mv_xor_chan *chan)
+static void mv_chan_clear_err_status(struct mv_xor_chan *chan)
 {
 	u32 val = 0xFFFF0000 >> (chan->idx * 16);
 	writel_relaxed(val, XOR_INTR_CAUSE(chan));
 }
 
-static void mv_set_mode(struct mv_xor_chan *chan,
-			       enum dma_transaction_type type)
+static void mv_chan_set_mode(struct mv_xor_chan *chan,
+			     enum dma_transaction_type type)
 {
 	u32 op_mode;
 	u32 config = readl_relaxed(XOR_CONFIG(chan));
@@ -172,12 +172,12 @@ static char mv_chan_is_busy(struct mv_xor_chan *chan)
 }
 
 /**
- * mv_xor_free_slots - flags descriptor slots for reuse
+ * mv_chan_free_slots - flags descriptor slots for reuse
  * @slot: Slot to free
  * Caller must hold &mv_chan->lock while calling this function
  */
-static void mv_xor_free_slots(struct mv_xor_chan *mv_chan,
-			      struct mv_xor_desc_slot *slot)
+static void mv_chan_free_slots(struct mv_xor_chan *mv_chan,
+			       struct mv_xor_desc_slot *slot)
 {
 	dev_dbg(mv_chan_to_devp(mv_chan), "%s %d slot %p\n",
 		__func__, __LINE__, slot);
@@ -187,12 +187,12 @@ static void mv_xor_free_slots(struct mv_xor_chan *mv_chan,
 }
 
 /*
- * mv_xor_start_new_chain - program the engine to operate on new chain headed by
- * sw_desc
+ * mv_chan_start_new_chain - program the engine to operate on new
+ * chain headed by sw_desc
  * Caller must hold &mv_chan->lock while calling this function
  */
-static void mv_xor_start_new_chain(struct mv_xor_chan *mv_chan,
-				   struct mv_xor_desc_slot *sw_desc)
+static void mv_chan_start_new_chain(struct mv_xor_chan *mv_chan,
+				    struct mv_xor_desc_slot *sw_desc)
 {
 	dev_dbg(mv_chan_to_devp(mv_chan), "%s %d: sw_desc %p\n",
 		__func__, __LINE__, sw_desc);
@@ -205,8 +205,9 @@ static void mv_xor_start_new_chain(struct mv_xor_chan *mv_chan,
 }
 
 static dma_cookie_t
-mv_xor_run_tx_complete_actions(struct mv_xor_desc_slot *desc,
-	struct mv_xor_chan *mv_chan, dma_cookie_t cookie)
+mv_desc_run_tx_complete_actions(struct mv_xor_desc_slot *desc,
+				struct mv_xor_chan *mv_chan,
+				dma_cookie_t cookie)
 {
 	BUG_ON(desc->async_tx.cookie < 0);
 
@@ -230,7 +231,7 @@ mv_xor_run_tx_complete_actions(struct mv_xor_desc_slot *desc,
 }
 
 static int
-mv_xor_clean_completed_slots(struct mv_xor_chan *mv_chan)
+mv_chan_clean_completed_slots(struct mv_xor_chan *mv_chan)
 {
 	struct mv_xor_desc_slot *iter, *_iter;
 
@@ -240,15 +241,15 @@ mv_xor_clean_completed_slots(struct mv_xor_chan *mv_chan)
 
 		if (async_tx_test_ack(&iter->async_tx)) {
 			list_del(&iter->completed_node);
-			mv_xor_free_slots(mv_chan, iter);
+			mv_chan_free_slots(mv_chan, iter);
 		}
 	}
 	return 0;
 }
 
 static int
-mv_xor_clean_slot(struct mv_xor_desc_slot *desc,
-	struct mv_xor_chan *mv_chan)
+mv_desc_clean_slot(struct mv_xor_desc_slot *desc,
+		   struct mv_xor_chan *mv_chan)
 {
 	dev_dbg(mv_chan_to_devp(mv_chan), "%s %d: desc %p flags %d\n",
 		__func__, __LINE__, desc, desc->async_tx.flags);
@@ -262,12 +263,12 @@ mv_xor_clean_slot(struct mv_xor_desc_slot *desc,
 		return 0;
 	}
 
-	mv_xor_free_slots(mv_chan, desc);
+	mv_chan_free_slots(mv_chan, desc);
 	return 0;
 }
 
 /* This function must be called with the mv_xor_chan spinlock held */
-static void mv_xor_slot_cleanup(struct mv_xor_chan *mv_chan)
+static void mv_chan_slot_cleanup(struct mv_xor_chan *mv_chan)
 {
 	struct mv_xor_desc_slot *iter, *_iter;
 	dma_cookie_t cookie = 0;
@@ -277,7 +278,7 @@ static void mv_xor_slot_cleanup(struct mv_xor_chan *mv_chan)
 
 	dev_dbg(mv_chan_to_devp(mv_chan), "%s %d\n", __func__, __LINE__);
 	dev_dbg(mv_chan_to_devp(mv_chan), "current_desc %x\n", current_desc);
-	mv_xor_clean_completed_slots(mv_chan);
+	mv_chan_clean_completed_slots(mv_chan);
 
 	/* free completed slots from the chain starting with
 	 * the oldest descriptor
@@ -304,9 +305,9 @@ static void mv_xor_slot_cleanup(struct mv_xor_chan *mv_chan)
 				break;
 		}
 
-		cookie = mv_xor_run_tx_complete_actions(iter, mv_chan, cookie);
+		cookie = mv_desc_run_tx_complete_actions(iter, mv_chan, cookie);
 
-		if (mv_xor_clean_slot(iter, mv_chan))
+		if (mv_desc_clean_slot(iter, mv_chan))
 			break;
 	}
 
@@ -316,7 +317,7 @@ static void mv_xor_slot_cleanup(struct mv_xor_chan *mv_chan)
 					struct mv_xor_desc_slot,
 					chain_node);
 
-		mv_xor_start_new_chain(mv_chan, chain_head);
+		mv_chan_start_new_chain(mv_chan, chain_head);
 	}
 
 	if (cookie > 0)
@@ -328,12 +329,12 @@ static void mv_xor_tasklet(unsigned long data)
 	struct mv_xor_chan *chan = (struct mv_xor_chan *) data;
 
 	spin_lock_bh(&chan->lock);
-	mv_xor_slot_cleanup(chan);
+	mv_chan_slot_cleanup(chan);
 	spin_unlock_bh(&chan->lock);
 }
 
 static struct mv_xor_desc_slot *
-mv_xor_alloc_slot(struct mv_xor_chan *mv_chan)
+mv_chan_alloc_slot(struct mv_xor_chan *mv_chan)
 {
 	struct mv_xor_desc_slot *iter, *_iter;
 	int retry = 0;
@@ -431,7 +432,7 @@ mv_xor_tx_submit(struct dma_async_tx_descriptor *tx)
 	}
 
 	if (new_hw_chain)
-		mv_xor_start_new_chain(mv_chan, sw_desc);
+		mv_chan_start_new_chain(mv_chan, sw_desc);
 
 	spin_unlock_bh(&mv_chan->lock);
 
@@ -504,7 +505,7 @@ mv_xor_prep_dma_xor(struct dma_chan *chan, dma_addr_t dest, dma_addr_t *src,
 		__func__, src_cnt, len, &dest, flags);
 
 	spin_lock_bh(&mv_chan->lock);
-	sw_desc = mv_xor_alloc_slot(mv_chan);
+	sw_desc = mv_chan_alloc_slot(mv_chan);
 	if (sw_desc) {
 		sw_desc->type = DMA_XOR;
 		sw_desc->async_tx.flags = flags;
@@ -556,7 +557,7 @@ static void mv_xor_free_chan_resources(struct dma_chan *chan)
 
 	spin_lock_bh(&mv_chan->lock);
 
-	mv_xor_slot_cleanup(mv_chan);
+	mv_chan_slot_cleanup(mv_chan);
 
 	list_for_each_entry_safe(iter, _iter, &mv_chan->chain,
 					chain_node) {
@@ -603,13 +604,13 @@ static enum dma_status mv_xor_status(struct dma_chan *chan,
 		return ret;
 
 	spin_lock_bh(&mv_chan->lock);
-	mv_xor_slot_cleanup(mv_chan);
+	mv_chan_slot_cleanup(mv_chan);
 	spin_unlock_bh(&mv_chan->lock);
 
 	return dma_cookie_status(chan, cookie, txstate);
 }
 
-static void mv_dump_xor_regs(struct mv_xor_chan *chan)
+static void mv_chan_dump_regs(struct mv_xor_chan *chan)
 {
 	u32 val;
 
@@ -632,8 +633,8 @@ static void mv_dump_xor_regs(struct mv_xor_chan *chan)
 	dev_err(mv_chan_to_devp(chan), "error addr   0x%08x\n", val);
 }
 
-static void mv_xor_err_interrupt_handler(struct mv_xor_chan *chan,
-					 u32 intr_cause)
+static void mv_chan_err_interrupt_handler(struct mv_xor_chan *chan,
+					  u32 intr_cause)
 {
 	if (intr_cause & XOR_INT_ERR_DECODE) {
 		dev_dbg(mv_chan_to_devp(chan), "ignoring address decode error\n");
@@ -643,7 +644,7 @@ static void mv_xor_err_interrupt_handler(struct mv_xor_chan *chan,
 	dev_err(mv_chan_to_devp(chan), "error on chan %d. intr cause 0x%08x\n",
 		chan->idx, intr_cause);
 
-	mv_dump_xor_regs(chan);
+	mv_chan_dump_regs(chan);
 	WARN_ON(1);
 }
 
@@ -655,11 +656,11 @@ static irqreturn_t mv_xor_interrupt_handler(int irq, void *data)
 	dev_dbg(mv_chan_to_devp(chan), "intr cause %x\n", intr_cause);
 
 	if (intr_cause & XOR_INTR_ERRORS)
-		mv_xor_err_interrupt_handler(chan, intr_cause);
+		mv_chan_err_interrupt_handler(chan, intr_cause);
 
 	tasklet_schedule(&chan->irq_tasklet);
 
-	mv_xor_device_clear_eoc_cause(chan);
+	mv_chan_clear_eoc_cause(chan);
 
 	return IRQ_HANDLED;
 }
@@ -678,7 +679,7 @@ static void mv_xor_issue_pending(struct dma_chan *chan)
  * Perform a transaction to verify the HW works.
  */
 
-static int mv_xor_memcpy_self_test(struct mv_xor_chan *mv_chan)
+static int mv_chan_memcpy_self_test(struct mv_xor_chan *mv_chan)
 {
 	int i, ret;
 	void *src, *dest;
@@ -787,7 +788,7 @@ out:
 
 #define MV_XOR_NUM_SRC_TEST 4 /* must be <= 15 */
 static int
-mv_xor_xor_self_test(struct mv_xor_chan *mv_chan)
+mv_chan_xor_self_test(struct mv_xor_chan *mv_chan)
 {
 	int i, src_idx, ret;
 	struct page *dest;
@@ -1014,7 +1015,7 @@ mv_xor_channel_add(struct mv_xor_device *xordev,
 		     mv_chan);
 
 	/* clear errors before enabling interrupts */
-	mv_xor_device_clear_err_status(mv_chan);
+	mv_chan_clear_err_status(mv_chan);
 
 	ret = request_irq(mv_chan->irq, mv_xor_interrupt_handler,
 			  0, dev_name(&pdev->dev), mv_chan);
@@ -1023,7 +1024,7 @@ mv_xor_channel_add(struct mv_xor_device *xordev,
 
 	mv_chan_unmask_interrupts(mv_chan);
 
-	mv_set_mode(mv_chan, DMA_XOR);
+	mv_chan_set_mode(mv_chan, DMA_XOR);
 
 	spin_lock_init(&mv_chan->lock);
 	INIT_LIST_HEAD(&mv_chan->chain);
@@ -1035,14 +1036,14 @@ mv_xor_channel_add(struct mv_xor_device *xordev,
 	list_add_tail(&mv_chan->dmachan.device_node, &dma_dev->channels);
 
 	if (dma_has_cap(DMA_MEMCPY, dma_dev->cap_mask)) {
-		ret = mv_xor_memcpy_self_test(mv_chan);
+		ret = mv_chan_memcpy_self_test(mv_chan);
 		dev_dbg(&pdev->dev, "memcpy self test returned %d\n", ret);
 		if (ret)
 			goto err_free_irq;
 	}
 
 	if (dma_has_cap(DMA_XOR, dma_dev->cap_mask)) {
-		ret = mv_xor_xor_self_test(mv_chan);
+		ret = mv_chan_xor_self_test(mv_chan);
 		dev_dbg(&pdev->dev, "xor self test returned %d\n", ret);
 		if (ret)
 			goto err_free_irq;
-- 
2.4.0

* [PATCH 2/8] dmaengine: mv_xor: add support for a38x command in descriptor mode
  2015-05-12 15:37 ` Maxime Ripard
@ 2015-05-12 15:37   ` Maxime Ripard
  -1 siblings, 0 replies; 79+ messages in thread
From: Maxime Ripard @ 2015-05-12 15:37 UTC (permalink / raw)
  To: Vinod Koul, Dan Williams, Gregory Clement, Jason Cooper,
	Andrew Lunn, Sebastian Hesselbarth
  Cc: dmaengine, linux-arm-kernel, linux-kernel, linux-crypto,
	Lior Amsalem, Thomas Petazzoni, Herbert Xu, David S. Miller,
	Maxime Ripard

From: Lior Amsalem <alior@marvell.com>

The Marvell Armada 38x SoC introduces new features to the XOR engine,
most notably the ability to set the engine mode (MEMCPY/XOR/PQ/etc.)
in each descriptor instead of through the controller registers.

This new feature allows mixing different commands (even PQ) on the
same channel/chain without stopping the engine to reconfigure its
mode.

Refactor the driver to use this new feature on the Armada 38x, while
keeping the old behaviour on older SoCs.
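
For reference, a minimal device tree node using the new compatible
string could look like the sketch below; the unit address, register
ranges and interrupt specifiers are placeholders, not values taken
from a real board file:

xor@60900 {
        compatible = "marvell,a38x-xor";
        reg = <0x60900 0x100>,
              <0x60b00 0x100>;

        xor00 {
                interrupts = <0 22 4>;
                dmacap,memcpy;
                dmacap,xor;
        };
};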

Signed-off-by: Lior Amsalem <alior@marvell.com>
Reviewed-by: Ofer Heifetz <oferh@marvell.com>
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
---
 Documentation/devicetree/bindings/dma/mv-xor.txt |  2 +-
 drivers/dma/mv_xor.c                             | 83 ++++++++++++++++++++----
 drivers/dma/mv_xor.h                             |  7 ++
 3 files changed, 78 insertions(+), 14 deletions(-)

diff --git a/Documentation/devicetree/bindings/dma/mv-xor.txt b/Documentation/devicetree/bindings/dma/mv-xor.txt
index 7c6cb7fcecd2..5b10f9ee0937 100644
--- a/Documentation/devicetree/bindings/dma/mv-xor.txt
+++ b/Documentation/devicetree/bindings/dma/mv-xor.txt
@@ -1,7 +1,7 @@
 * Marvell XOR engines
 
 Required properties:
-- compatible: Should be "marvell,orion-xor"
+- compatible: Should be "marvell,orion-xor" or "marvell,a38x-xor"
 - reg: Should contain registers location and length (two sets)
     the first set is the low registers, the second set the high
     registers for the XOR engine.
diff --git a/drivers/dma/mv_xor.c b/drivers/dma/mv_xor.c
index 1affafa888c5..bcc9aec0ce02 100644
--- a/drivers/dma/mv_xor.c
+++ b/drivers/dma/mv_xor.c
@@ -19,6 +19,7 @@
 #include <linux/dma-mapping.h>
 #include <linux/spinlock.h>
 #include <linux/interrupt.h>
+#include <linux/of_device.h>
 #include <linux/platform_device.h>
 #include <linux/memory.h>
 #include <linux/clk.h>
@@ -30,6 +31,11 @@
 #include "dmaengine.h"
 #include "mv_xor.h"
 
+enum mv_xor_mode {
+	XOR_MODE_IN_REG,
+	XOR_MODE_IN_DESC,
+};
+
 static void mv_xor_issue_pending(struct dma_chan *chan);
 
 #define to_mv_xor_chan(chan)		\
@@ -56,6 +62,24 @@ static void mv_desc_init(struct mv_xor_desc_slot *desc,
 	hw_desc->byte_count = byte_count;
 }
 
+static void mv_desc_set_mode(struct mv_xor_desc_slot *desc)
+{
+	struct mv_xor_desc *hw_desc = desc->hw_desc;
+
+	switch (desc->type) {
+	case DMA_XOR:
+	case DMA_INTERRUPT:
+		hw_desc->desc_command |= XOR_DESC_OPERATION_XOR;
+		break;
+	case DMA_MEMCPY:
+		hw_desc->desc_command |= XOR_DESC_OPERATION_MEMCPY;
+		break;
+	default:
+		BUG();
+		return;
+	}
+}
+
 static void mv_desc_set_next_desc(struct mv_xor_desc_slot *desc,
 				  u32 next_desc_addr)
 {
@@ -154,6 +178,25 @@ static void mv_chan_set_mode(struct mv_xor_chan *chan,
 	chan->current_type = type;
 }
 
+static void mv_chan_set_mode_to_desc(struct mv_xor_chan *chan)
+{
+	u32 op_mode;
+	u32 config = readl_relaxed(XOR_CONFIG(chan));
+
+	op_mode = XOR_OPERATION_MODE_IN_DESC;
+
+	config &= ~0x7;
+	config |= op_mode;
+
+#if defined(__BIG_ENDIAN)
+	config |= XOR_DESCRIPTOR_SWAP;
+#else
+	config &= ~XOR_DESCRIPTOR_SWAP;
+#endif
+
+	writel_relaxed(config, XOR_CONFIG(chan));
+}
+
 static void mv_chan_activate(struct mv_xor_chan *chan)
 {
 	dev_dbg(mv_chan_to_devp(chan), " activate chan.\n");
@@ -510,6 +553,8 @@ mv_xor_prep_dma_xor(struct dma_chan *chan, dma_addr_t dest, dma_addr_t *src,
 		sw_desc->type = DMA_XOR;
 		sw_desc->async_tx.flags = flags;
 		mv_desc_init(sw_desc, dest, len, flags);
+		if (mv_chan->op_in_desc == XOR_MODE_IN_DESC)
+			mv_desc_set_mode(sw_desc);
 		while (src_cnt--)
 			mv_desc_set_src_addr(sw_desc, src_cnt, src[src_cnt]);
 	}
@@ -952,7 +997,7 @@ static int mv_xor_channel_remove(struct mv_xor_chan *mv_chan)
 static struct mv_xor_chan *
 mv_xor_channel_add(struct mv_xor_device *xordev,
 		   struct platform_device *pdev,
-		   int idx, dma_cap_mask_t cap_mask, int irq)
+		   int idx, dma_cap_mask_t cap_mask, int irq, int op_in_desc)
 {
 	int ret = 0;
 	struct mv_xor_chan *mv_chan;
@@ -964,6 +1009,7 @@ mv_xor_channel_add(struct mv_xor_device *xordev,
 
 	mv_chan->idx = idx;
 	mv_chan->irq = irq;
+	mv_chan->op_in_desc = op_in_desc;
 
 	dma_dev = &mv_chan->dmadev;
 
@@ -1024,7 +1070,10 @@ mv_xor_channel_add(struct mv_xor_device *xordev,
 
 	mv_chan_unmask_interrupts(mv_chan);
 
-	mv_chan_set_mode(mv_chan, DMA_XOR);
+	if (mv_chan->op_in_desc == XOR_MODE_IN_DESC)
+		mv_chan_set_mode_to_desc(mv_chan);
+	else
+		mv_chan_set_mode(mv_chan, DMA_XOR);
 
 	spin_lock_init(&mv_chan->lock);
 	INIT_LIST_HEAD(&mv_chan->chain);
@@ -1049,7 +1098,8 @@ mv_xor_channel_add(struct mv_xor_device *xordev,
 			goto err_free_irq;
 	}
 
-	dev_info(&pdev->dev, "Marvell XOR: ( %s%s%s)\n",
+	dev_info(&pdev->dev, "Marvell XOR (%s): ( %s%s%s)\n",
+		 mv_chan->op_in_desc ? "Descriptor Mode" : "Registers Mode",
 		 dma_has_cap(DMA_XOR, dma_dev->cap_mask) ? "xor " : "",
 		 dma_has_cap(DMA_MEMCPY, dma_dev->cap_mask) ? "cpy " : "",
 		 dma_has_cap(DMA_INTERRUPT, dma_dev->cap_mask) ? "intr " : "");
@@ -1098,6 +1148,15 @@ mv_xor_conf_mbus_windows(struct mv_xor_device *xordev,
 	writel(0, base + WINDOW_OVERRIDE_CTRL(1));
 }
 
+#ifdef CONFIG_OF
+static const struct of_device_id mv_xor_dt_ids[] = {
+	{ .compatible = "marvell,orion-xor", .data = (void *)XOR_MODE_IN_REG },
+	{ .compatible = "marvell,a38x-xor", .data = (void *)XOR_MODE_IN_DESC },
+	{},
+};
+MODULE_DEVICE_TABLE(of, mv_xor_dt_ids);
+#endif
+
 static int mv_xor_probe(struct platform_device *pdev)
 {
 	const struct mbus_dram_target_info *dram;
@@ -1105,6 +1164,7 @@ static int mv_xor_probe(struct platform_device *pdev)
 	struct mv_xor_platform_data *pdata = dev_get_platdata(&pdev->dev);
 	struct resource *res;
 	int i, ret;
+	int op_in_desc;
 
 	dev_notice(&pdev->dev, "Marvell shared XOR driver\n");
 
@@ -1149,11 +1209,15 @@ static int mv_xor_probe(struct platform_device *pdev)
 	if (pdev->dev.of_node) {
 		struct device_node *np;
 		int i = 0;
+		const struct of_device_id *of_id =
+			of_match_device(of_match_ptr(mv_xor_dt_ids),
+					&pdev->dev);
 
 		for_each_child_of_node(pdev->dev.of_node, np) {
 			struct mv_xor_chan *chan;
 			dma_cap_mask_t cap_mask;
 			int irq;
+			op_in_desc = (int)of_id->data;
 
 			dma_cap_zero(cap_mask);
 			if (of_property_read_bool(np, "dmacap,memcpy"))
@@ -1170,7 +1234,7 @@ static int mv_xor_probe(struct platform_device *pdev)
 			}
 
 			chan = mv_xor_channel_add(xordev, pdev, i,
-						  cap_mask, irq);
+						  cap_mask, irq, op_in_desc);
 			if (IS_ERR(chan)) {
 				ret = PTR_ERR(chan);
 				irq_dispose_mapping(irq);
@@ -1199,7 +1263,8 @@ static int mv_xor_probe(struct platform_device *pdev)
 			}
 
 			chan = mv_xor_channel_add(xordev, pdev, i,
-						  cd->cap_mask, irq);
+						  cd->cap_mask, irq,
+						  XOR_MODE_IN_REG);
 			if (IS_ERR(chan)) {
 				ret = PTR_ERR(chan);
 				goto err_channel_add;
@@ -1245,14 +1310,6 @@ static int mv_xor_remove(struct platform_device *pdev)
 	return 0;
 }
 
-#ifdef CONFIG_OF
-static const struct of_device_id mv_xor_dt_ids[] = {
-       { .compatible = "marvell,orion-xor", },
-       {},
-};
-MODULE_DEVICE_TABLE(of, mv_xor_dt_ids);
-#endif
-
 static struct platform_driver mv_xor_driver = {
 	.probe		= mv_xor_probe,
 	.remove		= mv_xor_remove,
diff --git a/drivers/dma/mv_xor.h b/drivers/dma/mv_xor.h
index 91958dba39a2..89a8c85539a1 100644
--- a/drivers/dma/mv_xor.h
+++ b/drivers/dma/mv_xor.h
@@ -30,8 +30,13 @@
 /* Values for the XOR_CONFIG register */
 #define XOR_OPERATION_MODE_XOR		0
 #define XOR_OPERATION_MODE_MEMCPY	2
+#define XOR_OPERATION_MODE_IN_DESC      7
 #define XOR_DESCRIPTOR_SWAP		BIT(14)
 
+#define XOR_DESC_OPERATION_XOR          (0 << 24)
+#define XOR_DESC_OPERATION_CRC32C       (1 << 24)
+#define XOR_DESC_OPERATION_MEMCPY       (2 << 24)
+
 #define XOR_DESC_DMA_OWNED		BIT(31)
 #define XOR_DESC_EOD_INT_EN		BIT(31)
 
@@ -95,6 +100,7 @@ struct mv_xor_device {
  * @all_slots: complete domain of slots usable by the channel
  * @slots_allocated: records the actual size of the descriptor slot pool
  * @irq_tasklet: bottom half where mv_xor_slot_cleanup runs
+ * @op_in_desc: new mode of driver, each op is written to the descriptor.
  */
 struct mv_xor_chan {
 	int			pending;
@@ -115,6 +121,7 @@ struct mv_xor_chan {
 	struct list_head	all_slots;
 	int			slots_allocated;
 	struct tasklet_struct	irq_tasklet;
+	int                     op_in_desc;
 	char			dummy_src[MV_XOR_MIN_BYTE_COUNT];
 	char			dummy_dst[MV_XOR_MIN_BYTE_COUNT];
 	dma_addr_t		dummy_src_addr, dummy_dst_addr;
-- 
2.4.0

* [PATCH 3/8] dmaengine: mv_xor: Enlarge descriptor pool size
  2015-05-12 15:37 ` Maxime Ripard
@ 2015-05-12 15:37   ` Maxime Ripard
  -1 siblings, 0 replies; 79+ messages in thread
From: Maxime Ripard @ 2015-05-12 15:37 UTC (permalink / raw)
  To: Vinod Koul, Dan Williams, Gregory Clement, Jason Cooper,
	Andrew Lunn, Sebastian Hesselbarth
  Cc: Lior Amsalem, Herbert Xu, Thomas Petazzoni, linux-kernel,
	linux-crypto, dmaengine, Maxime Ripard, David S. Miller,
	linux-arm-kernel

From: Lior Amsalem <alior@marvell.com>

Now that we have 2 channels assigned to 2 CPUs and all requests are
chained on the same channels, we need many more descriptors available
to satisfy the async_tx workload.

3072 descriptors was found in our lab to be the number of descriptors
that allows the async_tx stack to work without waiting for free
descriptors when submitting new requests.
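
With MV_XOR_SLOT_SIZE at 64 bytes, this grows the per-channel pool
from PAGE_SIZE (with the usual 4 KiB pages, only 64 slots) to:

  64 bytes/slot * 3072 slots = 196608 bytes (192 KiB) per channel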

Signed-off-by: Lior Amsalem <alior@marvell.com>
Reviewed-by: Nadav Haklai <nadavh@marvell.com>
Tested-by: Nadav Haklai <nadavh@marvell.com>
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
---
 drivers/dma/mv_xor.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/dma/mv_xor.h b/drivers/dma/mv_xor.h
index 89a8c85539a1..145ba2f895f7 100644
--- a/drivers/dma/mv_xor.h
+++ b/drivers/dma/mv_xor.h
@@ -19,7 +19,7 @@
 #include <linux/dmaengine.h>
 #include <linux/interrupt.h>
 
-#define MV_XOR_POOL_SIZE		PAGE_SIZE
+#define MV_XOR_POOL_SIZE		(MV_XOR_SLOT_SIZE * 3072)
 #define MV_XOR_SLOT_SIZE		64
 #define MV_XOR_THRESHOLD		1
 #define MV_XOR_MAX_CHANNELS             2
-- 
2.4.0

* [PATCH 4/8] dmaengine: mv_xor: improve descriptors list handling and reduce locking
  2015-05-12 15:37 ` Maxime Ripard
@ 2015-05-12 15:37   ` Maxime Ripard
  -1 siblings, 0 replies; 79+ messages in thread
From: Maxime Ripard @ 2015-05-12 15:37 UTC (permalink / raw)
  To: Vinod Koul, Dan Williams, Gregory Clement, Jason Cooper,
	Andrew Lunn, Sebastian Hesselbarth
  Cc: dmaengine, linux-arm-kernel, linux-kernel, linux-crypto,
	Lior Amsalem, Thomas Petazzoni, Herbert Xu, David S. Miller,
	Maxime Ripard

From: Lior Amsalem <alior@marvell.com>

This patch changes the way free descriptors are marked.

Instead of having a field marking each descriptor as in use, the
descriptors now move between dedicated lists, and every descriptor on
the free list is available for use.

This simplifies the allocation method and reduces the locking needed.
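
The resulting descriptor lifecycle is a set of list_move_tail() calls
between four lists, as can be traced in the diff below:

  free_slots      --(mv_chan_alloc_slot)------> allocated_slots
  allocated_slots --(mv_xor_tx_submit)--------> chain
  chain           --(completed, not acked)----> completed_slots
  chain/completed --(cleanup once acked)------> free_slots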

Signed-off-by: Lior Amsalem <alior@marvell.com>
Reviewed-by: Ofer Heifetz <oferh@marvell.com>
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
---
 drivers/dma/mv_xor.c | 131 +++++++++++++++++----------------------------------
 drivers/dma/mv_xor.h |  17 +++----
 2 files changed, 48 insertions(+), 100 deletions(-)

diff --git a/drivers/dma/mv_xor.c b/drivers/dma/mv_xor.c
index bcc9aec0ce02..d1110442f0d2 100644
--- a/drivers/dma/mv_xor.c
+++ b/drivers/dma/mv_xor.c
@@ -88,12 +88,6 @@ static void mv_desc_set_next_desc(struct mv_xor_desc_slot *desc,
 	hw_desc->phy_next_desc = next_desc_addr;
 }
 
-static void mv_desc_clear_next_desc(struct mv_xor_desc_slot *desc)
-{
-	struct mv_xor_desc *hw_desc = desc->hw_desc;
-	hw_desc->phy_next_desc = 0;
-}
-
 static void mv_desc_set_src_addr(struct mv_xor_desc_slot *desc,
 				 int index, dma_addr_t addr)
 {
@@ -214,21 +208,6 @@ static char mv_chan_is_busy(struct mv_xor_chan *chan)
 	return (state == 1) ? 1 : 0;
 }
 
-/**
- * mv_chan_free_slots - flags descriptor slots for reuse
- * @slot: Slot to free
- * Caller must hold &mv_chan->lock while calling this function
- */
-static void mv_chan_free_slots(struct mv_xor_chan *mv_chan,
-			       struct mv_xor_desc_slot *slot)
-{
-	dev_dbg(mv_chan_to_devp(mv_chan), "%s %d slot %p\n",
-		__func__, __LINE__, slot);
-
-	slot->slot_used = 0;
-
-}
-
 /*
  * mv_chan_start_new_chain - program the engine to operate on new
  * chain headed by sw_desc
@@ -280,12 +259,10 @@ mv_chan_clean_completed_slots(struct mv_xor_chan *mv_chan)
 
 	dev_dbg(mv_chan_to_devp(mv_chan), "%s %d\n", __func__, __LINE__);
 	list_for_each_entry_safe(iter, _iter, &mv_chan->completed_slots,
-				 completed_node) {
+				 node) {
 
-		if (async_tx_test_ack(&iter->async_tx)) {
-			list_del(&iter->completed_node);
-			mv_chan_free_slots(mv_chan, iter);
-		}
+		if (async_tx_test_ack(&iter->async_tx))
+			list_move_tail(&iter->node, &mv_chan->free_slots);
 	}
 	return 0;
 }
@@ -296,17 +273,16 @@ mv_desc_clean_slot(struct mv_xor_desc_slot *desc,
 {
 	dev_dbg(mv_chan_to_devp(mv_chan), "%s %d: desc %p flags %d\n",
 		__func__, __LINE__, desc, desc->async_tx.flags);
-	list_del(&desc->chain_node);
+
 	/* the client is allowed to attach dependent operations
 	 * until 'ack' is set
 	 */
-	if (!async_tx_test_ack(&desc->async_tx)) {
+	if (!async_tx_test_ack(&desc->async_tx))
 		/* move this slot to the completed_slots */
-		list_add_tail(&desc->completed_node, &mv_chan->completed_slots);
-		return 0;
-	}
+		list_move_tail(&desc->node, &mv_chan->completed_slots);
+	else
+		list_move_tail(&desc->node, &mv_chan->free_slots);
 
-	mv_chan_free_slots(mv_chan, desc);
 	return 0;
 }
 
@@ -328,7 +304,7 @@ static void mv_chan_slot_cleanup(struct mv_xor_chan *mv_chan)
 	 */
 
 	list_for_each_entry_safe(iter, _iter, &mv_chan->chain,
-					chain_node) {
+					node) {
 		prefetch(_iter);
 		prefetch(&_iter->async_tx);
 
@@ -358,7 +334,7 @@ static void mv_chan_slot_cleanup(struct mv_xor_chan *mv_chan)
 		struct mv_xor_desc_slot *chain_head;
 		chain_head = list_entry(mv_chan->chain.next,
 					struct mv_xor_desc_slot,
-					chain_node);
+					node);
 
 		mv_chan_start_new_chain(mv_chan, chain_head);
 	}
@@ -379,49 +355,28 @@ static void mv_xor_tasklet(unsigned long data)
 static struct mv_xor_desc_slot *
 mv_chan_alloc_slot(struct mv_xor_chan *mv_chan)
 {
-	struct mv_xor_desc_slot *iter, *_iter;
-	int retry = 0;
+	struct mv_xor_desc_slot *iter;
 
-	/* start search from the last allocated descrtiptor
-	 * if a contiguous allocation can not be found start searching
-	 * from the beginning of the list
-	 */
-retry:
-	if (retry == 0)
-		iter = mv_chan->last_used;
-	else
-		iter = list_entry(&mv_chan->all_slots,
-			struct mv_xor_desc_slot,
-			slot_node);
+	spin_lock_bh(&mv_chan->lock);
 
-	list_for_each_entry_safe_continue(
-		iter, _iter, &mv_chan->all_slots, slot_node) {
+	if (!list_empty(&mv_chan->free_slots)) {
+		iter = list_first_entry(&mv_chan->free_slots,
+					struct mv_xor_desc_slot,
+					node);
 
-		prefetch(_iter);
-		prefetch(&_iter->async_tx);
-		if (iter->slot_used) {
-			/* give up after finding the first busy slot
-			 * on the second pass through the list
-			 */
-			if (retry)
-				break;
-			continue;
-		}
+		list_move_tail(&iter->node, &mv_chan->allocated_slots);
+
+		spin_unlock_bh(&mv_chan->lock);
 
 		/* pre-ack descriptor */
 		async_tx_ack(&iter->async_tx);
-
-		iter->slot_used = 1;
-		INIT_LIST_HEAD(&iter->chain_node);
 		iter->async_tx.cookie = -EBUSY;
-		mv_chan->last_used = iter;
-		mv_desc_clear_next_desc(iter);
 
 		return iter;
 
 	}
-	if (!retry++)
-		goto retry;
+
+	spin_unlock_bh(&mv_chan->lock);
 
 	/* try to free some slots if the allocation fails */
 	tasklet_schedule(&mv_chan->irq_tasklet);
@@ -447,14 +402,14 @@ mv_xor_tx_submit(struct dma_async_tx_descriptor *tx)
 	cookie = dma_cookie_assign(tx);
 
 	if (list_empty(&mv_chan->chain))
-		list_add_tail(&sw_desc->chain_node, &mv_chan->chain);
+		list_move_tail(&sw_desc->node, &mv_chan->chain);
 	else {
 		new_hw_chain = 0;
 
 		old_chain_tail = list_entry(mv_chan->chain.prev,
 					    struct mv_xor_desc_slot,
-					    chain_node);
-		list_add_tail(&sw_desc->chain_node, &mv_chan->chain);
+					    node);
+		list_move_tail(&sw_desc->node, &mv_chan->chain);
 
 		dev_dbg(mv_chan_to_devp(mv_chan), "Append to last desc %pa\n",
 			&old_chain_tail->async_tx.phys);
@@ -507,26 +462,20 @@ static int mv_xor_alloc_chan_resources(struct dma_chan *chan)
 
 		dma_async_tx_descriptor_init(&slot->async_tx, chan);
 		slot->async_tx.tx_submit = mv_xor_tx_submit;
-		INIT_LIST_HEAD(&slot->chain_node);
-		INIT_LIST_HEAD(&slot->slot_node);
+		INIT_LIST_HEAD(&slot->node);
 		dma_desc = mv_chan->dma_desc_pool;
 		slot->async_tx.phys = dma_desc + idx * MV_XOR_SLOT_SIZE;
 		slot->idx = idx++;
 
 		spin_lock_bh(&mv_chan->lock);
 		mv_chan->slots_allocated = idx;
-		list_add_tail(&slot->slot_node, &mv_chan->all_slots);
+		list_add_tail(&slot->node, &mv_chan->free_slots);
 		spin_unlock_bh(&mv_chan->lock);
 	}
 
-	if (mv_chan->slots_allocated && !mv_chan->last_used)
-		mv_chan->last_used = list_entry(mv_chan->all_slots.next,
-					struct mv_xor_desc_slot,
-					slot_node);
-
 	dev_dbg(mv_chan_to_devp(mv_chan),
-		"allocated %d descriptor slots last_used: %p\n",
-		mv_chan->slots_allocated, mv_chan->last_used);
+		"allocated %d descriptor slots\n",
+		mv_chan->slots_allocated);
 
 	return mv_chan->slots_allocated ? : -ENOMEM;
 }
@@ -547,7 +496,6 @@ mv_xor_prep_dma_xor(struct dma_chan *chan, dma_addr_t dest, dma_addr_t *src,
 		"%s src_cnt: %d len: %u dest %pad flags: %ld\n",
 		__func__, src_cnt, len, &dest, flags);
 
-	spin_lock_bh(&mv_chan->lock);
 	sw_desc = mv_chan_alloc_slot(mv_chan);
 	if (sw_desc) {
 		sw_desc->type = DMA_XOR;
@@ -558,7 +506,7 @@ mv_xor_prep_dma_xor(struct dma_chan *chan, dma_addr_t dest, dma_addr_t *src,
 		while (src_cnt--)
 			mv_desc_set_src_addr(sw_desc, src_cnt, src[src_cnt]);
 	}
-	spin_unlock_bh(&mv_chan->lock);
+
 	dev_dbg(mv_chan_to_devp(mv_chan),
 		"%s sw_desc %p async_tx %p \n",
 		__func__, sw_desc, &sw_desc->async_tx);
@@ -605,22 +553,26 @@ static void mv_xor_free_chan_resources(struct dma_chan *chan)
 	mv_chan_slot_cleanup(mv_chan);
 
 	list_for_each_entry_safe(iter, _iter, &mv_chan->chain,
-					chain_node) {
+					node) {
 		in_use_descs++;
-		list_del(&iter->chain_node);
+		list_move_tail(&iter->node, &mv_chan->free_slots);
 	}
 	list_for_each_entry_safe(iter, _iter, &mv_chan->completed_slots,
-				 completed_node) {
+				 node) {
+		in_use_descs++;
+		list_move_tail(&iter->node, &mv_chan->free_slots);
+	}
+	list_for_each_entry_safe(iter, _iter, &mv_chan->allocated_slots,
+				 node) {
 		in_use_descs++;
-		list_del(&iter->completed_node);
+		list_move_tail(&iter->node, &mv_chan->free_slots);
 	}
 	list_for_each_entry_safe_reverse(
-		iter, _iter, &mv_chan->all_slots, slot_node) {
-		list_del(&iter->slot_node);
+		iter, _iter, &mv_chan->free_slots, node) {
+		list_del(&iter->node);
 		kfree(iter);
 		mv_chan->slots_allocated--;
 	}
-	mv_chan->last_used = NULL;
 
 	dev_dbg(mv_chan_to_devp(mv_chan), "%s slots_allocated %d\n",
 		__func__, mv_chan->slots_allocated);
@@ -1078,7 +1030,8 @@ mv_xor_channel_add(struct mv_xor_device *xordev,
 	spin_lock_init(&mv_chan->lock);
 	INIT_LIST_HEAD(&mv_chan->chain);
 	INIT_LIST_HEAD(&mv_chan->completed_slots);
-	INIT_LIST_HEAD(&mv_chan->all_slots);
+	INIT_LIST_HEAD(&mv_chan->free_slots);
+	INIT_LIST_HEAD(&mv_chan->allocated_slots);
 	mv_chan->dmachan.device = dma_dev;
 	dma_cookie_init(&mv_chan->dmachan);
 
diff --git a/drivers/dma/mv_xor.h b/drivers/dma/mv_xor.h
index 145ba2f895f7..71684de37ddb 100644
--- a/drivers/dma/mv_xor.h
+++ b/drivers/dma/mv_xor.h
@@ -93,11 +93,11 @@ struct mv_xor_device {
  * @mmr_base: memory mapped register base
  * @idx: the index of the xor channel
  * @chain: device chain view of the descriptors
+ * @free_slots: free slots usable by the channel
+ * @allocated_slots: slots allocated by the driver
  * @completed_slots: slots completed by HW but still need to be acked
  * @device: parent device
  * @common: common dmaengine channel object members
- * @last_used: place holder for allocation to continue from where it left off
- * @all_slots: complete domain of slots usable by the channel
  * @slots_allocated: records the actual size of the descriptor slot pool
  * @irq_tasklet: bottom half where mv_xor_slot_cleanup runs
  * @op_in_desc: new mode of driver, each op is written to the descriptor.
@@ -111,14 +111,14 @@ struct mv_xor_chan {
 	int                     irq;
 	enum dma_transaction_type	current_type;
 	struct list_head	chain;
+	struct list_head	free_slots;
+	struct list_head	allocated_slots;
 	struct list_head	completed_slots;
 	dma_addr_t		dma_desc_pool;
 	void			*dma_desc_pool_virt;
 	size_t                  pool_size;
 	struct dma_device	dmadev;
 	struct dma_chan		dmachan;
-	struct mv_xor_desc_slot	*last_used;
-	struct list_head	all_slots;
 	int			slots_allocated;
 	struct tasklet_struct	irq_tasklet;
 	int                     op_in_desc;
@@ -129,9 +129,7 @@ struct mv_xor_chan {
 
 /**
  * struct mv_xor_desc_slot - software descriptor
- * @slot_node: node on the mv_xor_chan.all_slots list
- * @chain_node: node on the mv_xor_chan.chain list
- * @completed_node: node on the mv_xor_chan.completed_slots list
+ * @node: node on the mv_xor_chan lists
  * @hw_desc: virtual address of the hardware descriptor chain
  * @phys: hardware address of the hardware descriptor chain
  * @slot_used: slot in use or not
@@ -140,12 +138,9 @@ struct mv_xor_chan {
  * @async_tx: support for the async_tx api
  */
 struct mv_xor_desc_slot {
-	struct list_head	slot_node;
-	struct list_head	chain_node;
-	struct list_head	completed_node;
+	struct list_head	node;
 	enum dma_transaction_type	type;
 	void			*hw_desc;
-	u16			slot_used;
 	u16			idx;
 	struct dma_async_tx_descriptor	async_tx;
 };
-- 
2.4.0
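
In short, after this patch a descriptor slot cycles through the channel
lists instead of carrying a slot_used flag; a sketch of the lifecycle (list
and function names taken from the driver):

/*
 * free_slots ------- mv_chan_alloc_slot() ------> allocated_slots
 * allocated_slots --- mv_xor_tx_submit() -------> chain
 * chain ------- cleaned but not yet acked ------> completed_slots
 * chain / completed_slots --- acked & cleaned --> free_slots
 */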

^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH 5/8] dmaengine: mv_xor: bug fix for racing condition in descriptors cleanup
  2015-05-12 15:37 ` Maxime Ripard
@ 2015-05-12 15:37   ` Maxime Ripard
  -1 siblings, 0 replies; 79+ messages in thread
From: Maxime Ripard @ 2015-05-12 15:37 UTC (permalink / raw)
  To: Vinod Koul, Dan Williams, Gregory Clement, Jason Cooper,
	Andrew Lunn, Sebastian Hesselbarth
  Cc: Lior Amsalem, Herbert Xu, Thomas Petazzoni, linux-kernel,
	linux-crypto, dmaengine, Maxime Ripard, David S. Miller,
	linux-arm-kernel

From: Lior Amsalem <alior@marvell.com>

This patch fixes a bug in the XOR driver where the cleanup function can be
called and free descriptors that have never been processed by the engine
(which results in data errors).

The cleanup function will free descriptors based on the ownership bit in
the descriptors.
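
For reference, a condensed sketch of the new rule with simplified types
(XOR_DESC_SUCCESS is the status bit added below; the struct and list names
here are illustrative only, with <linux/list.h> and <linux/types.h> assumed):

struct hw_desc { u32 status; };
struct sw_desc { struct list_head node; struct hw_desc *hw; };

/* sketch: free a descriptor only once the engine has marked it done */
static void cleanup_chain(struct list_head *chain, struct list_head *free)
{
	struct sw_desc *d, *tmp;

	list_for_each_entry_safe(d, tmp, chain, node) {
		if (!(d->hw->status & XOR_DESC_SUCCESS))
			break;	/* not completed yet, stop here */
		list_move_tail(&d->node, free);
	}
}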

Signed-off-by: Lior Amsalem <alior@marvell.com>
Reviewed-by: Ofer Heifetz <oferh@marvell.com>
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
---
 drivers/dma/mv_xor.c | 72 +++++++++++++++++++++++++++++++++-------------------
 drivers/dma/mv_xor.h |  1 +
 2 files changed, 47 insertions(+), 26 deletions(-)

diff --git a/drivers/dma/mv_xor.c b/drivers/dma/mv_xor.c
index d1110442f0d2..28980483eafb 100644
--- a/drivers/dma/mv_xor.c
+++ b/drivers/dma/mv_xor.c
@@ -293,7 +293,8 @@ static void mv_chan_slot_cleanup(struct mv_xor_chan *mv_chan)
 	dma_cookie_t cookie = 0;
 	int busy = mv_chan_is_busy(mv_chan);
 	u32 current_desc = mv_chan_get_current_desc(mv_chan);
-	int seen_current = 0;
+	int current_cleaned = 0;
+	struct mv_xor_desc *hw_desc;
 
 	dev_dbg(mv_chan_to_devp(mv_chan), "%s %d\n", __func__, __LINE__);
 	dev_dbg(mv_chan_to_devp(mv_chan), "current_desc %x\n", current_desc);
@@ -305,38 +306,57 @@ static void mv_chan_slot_cleanup(struct mv_xor_chan *mv_chan)
 
 	list_for_each_entry_safe(iter, _iter, &mv_chan->chain,
 					node) {
-		prefetch(_iter);
-		prefetch(&_iter->async_tx);
 
-		/* do not advance past the current descriptor loaded into the
-		 * hardware channel, subsequent descriptors are either in
-		 * process or have not been submitted
-		 */
-		if (seen_current)
-			break;
+		/* clean finished descriptors */
+		hw_desc = iter->hw_desc;
+		if (hw_desc->status & XOR_DESC_SUCCESS) {
+			cookie = mv_desc_run_tx_complete_actions(iter, mv_chan,
+								 cookie);
 
-		/* stop the search if we reach the current descriptor and the
-		 * channel is busy
-		 */
-		if (iter->async_tx.phys == current_desc) {
-			seen_current = 1;
-			if (busy)
+			/* done processing desc, clean slot */
+			mv_desc_clean_slot(iter, mv_chan);
+
+			/* break if we cleaned the current descriptor */
+			if (iter->async_tx.phys == current_desc) {
+				current_cleaned = 1;
 				break;
+			}
+		} else {
+			if (iter->async_tx.phys == current_desc) {
+				current_cleaned = 0;
+				break;
+			}
 		}
-
-		cookie = mv_desc_run_tx_complete_actions(iter, mv_chan, cookie);
-
-		if (mv_desc_clean_slot(iter, mv_chan))
-			break;
 	}
 
 	if ((busy == 0) && !list_empty(&mv_chan->chain)) {
-		struct mv_xor_desc_slot *chain_head;
-		chain_head = list_entry(mv_chan->chain.next,
-					struct mv_xor_desc_slot,
-					node);
-
-		mv_chan_start_new_chain(mv_chan, chain_head);
+		if (current_cleaned) {
+			/*
+			 * current descriptor cleaned and removed, run
+			 * from list head
+			 */
+			iter = list_entry(mv_chan->chain.next,
+					  struct mv_xor_desc_slot,
+					  node);
+			mv_chan_start_new_chain(mv_chan, iter);
+		} else {
+			if (!list_is_last(&iter->node, &mv_chan->chain)) {
+				/*
+				 * descriptors are still waiting after
+				 * current, trigger them
+				 */
+				iter = list_entry(iter->node.next,
+						  struct mv_xor_desc_slot,
+						  node);
+				mv_chan_start_new_chain(mv_chan, iter);
+			} else {
+				/*
+				 * some descriptors are still waiting
+				 * to be cleaned
+				 */
+				tasklet_schedule(&mv_chan->irq_tasklet);
+			}
+		}
 	}
 
 	if (cookie > 0)
diff --git a/drivers/dma/mv_xor.h b/drivers/dma/mv_xor.h
index 71684de37ddb..b7455b42137b 100644
--- a/drivers/dma/mv_xor.h
+++ b/drivers/dma/mv_xor.h
@@ -32,6 +32,7 @@
 #define XOR_OPERATION_MODE_MEMCPY	2
 #define XOR_OPERATION_MODE_IN_DESC      7
 #define XOR_DESCRIPTOR_SWAP		BIT(14)
+#define XOR_DESC_SUCCESS		0x40000000
 
 #define XOR_DESC_OPERATION_XOR          (0 << 24)
 #define XOR_DESC_OPERATION_CRC32C       (1 << 24)
-- 
2.4.0

^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH 6/8] async_tx: adding mult and sum_product flags
  2015-05-12 15:37 ` Maxime Ripard
@ 2015-05-12 15:37   ` Maxime Ripard
  -1 siblings, 0 replies; 79+ messages in thread
From: Maxime Ripard @ 2015-05-12 15:37 UTC (permalink / raw)
  To: Vinod Koul, Dan Williams, Gregory Clement, Jason Cooper,
	Andrew Lunn, Sebastian Hesselbarth
  Cc: dmaengine, linux-arm-kernel, linux-kernel, linux-crypto,
	Lior Amsalem, Thomas Petazzoni, Herbert Xu, David S. Miller,
	Maxime Ripard

From: Lior Amsalem <alior@marvell.com>

Some engines (like the Marvell mv_xor) do not support the mult and
sum_product operations that are part of the PQ support.

This patch adds two new flags, DMA_PREP_PQ_MULT and DMA_PREP_PQ_SUM_PRODUCT,
which help the driver identify such operations.
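
As an illustration (hypothetical driver snippet, not part of this patch), a
device_prep_dma_pq callback can then refuse the operations its hardware
cannot do and return NULL, which makes async_tx fall back to the synchronous
path:

	/* sketch: fixed-coefficient engine, so punt mult/sum_product */
	if (flags & (DMA_PREP_PQ_MULT | DMA_PREP_PQ_SUM_PRODUCT))
		return NULL;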

Signed-off-by: Lior Amsalem <alior@marvell.com>
Reviewed-by: Ofer Heifetz <oferh@marvell.com>
Reviewed-by: Nadav Haklai <nadavh@marvell.com>
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
---
 crypto/async_tx/async_raid6_recov.c | 4 ++--
 include/linux/dmaengine.h           | 4 ++++
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/crypto/async_tx/async_raid6_recov.c b/crypto/async_tx/async_raid6_recov.c
index 934a84981495..2db5486fd873 100644
--- a/crypto/async_tx/async_raid6_recov.c
+++ b/crypto/async_tx/async_raid6_recov.c
@@ -47,7 +47,7 @@ async_sum_product(struct page *dest, struct page **srcs, unsigned char *coef,
 		struct device *dev = dma->dev;
 		dma_addr_t pq[2];
 		struct dma_async_tx_descriptor *tx;
-		enum dma_ctrl_flags dma_flags = DMA_PREP_PQ_DISABLE_P;
+		enum dma_ctrl_flags dma_flags = DMA_PREP_PQ_DISABLE_P | DMA_PREP_PQ_SUM_PRODUCT;
 
 		if (submit->flags & ASYNC_TX_FENCE)
 			dma_flags |= DMA_PREP_FENCE;
@@ -111,7 +111,7 @@ async_mult(struct page *dest, struct page *src, u8 coef, size_t len,
 		dma_addr_t dma_dest[2];
 		struct device *dev = dma->dev;
 		struct dma_async_tx_descriptor *tx;
-		enum dma_ctrl_flags dma_flags = DMA_PREP_PQ_DISABLE_P;
+		enum dma_ctrl_flags dma_flags = DMA_PREP_PQ_DISABLE_P | DMA_PREP_PQ_MULT;
 
 		if (submit->flags & ASYNC_TX_FENCE)
 			dma_flags |= DMA_PREP_FENCE;
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index ad419757241f..f19ecebb4d3f 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -174,6 +174,8 @@ struct dma_interleaved_template {
  *  operation it continues the calculation with new sources
  * @DMA_PREP_FENCE - tell the driver that subsequent operations depend
  *  on the result of this operation
+ * @DMA_PREP_PQ_MULT - tell the driver that this is a mult request
+ * @DMA_PREP_PQ_SUM_PRODUCT - tell the driver that this is a sum product request
  */
 enum dma_ctrl_flags {
 	DMA_PREP_INTERRUPT = (1 << 0),
@@ -182,6 +184,8 @@ enum dma_ctrl_flags {
 	DMA_PREP_PQ_DISABLE_Q = (1 << 3),
 	DMA_PREP_CONTINUE = (1 << 4),
 	DMA_PREP_FENCE = (1 << 5),
+	DMA_PREP_PQ_MULT = (1 << 10),
+	DMA_PREP_PQ_SUM_PRODUCT = (1 << 11),
 };
 
 /**
-- 
2.4.0

^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH 7/8] dmaengine: mv_xor: add support for a38x RAID6 support
  2015-05-12 15:37 ` Maxime Ripard
@ 2015-05-12 15:37   ` Maxime Ripard
  -1 siblings, 0 replies; 79+ messages in thread
From: Maxime Ripard @ 2015-05-12 15:37 UTC (permalink / raw)
  To: Vinod Koul, Dan Williams, Gregory Clement, Jason Cooper,
	Andrew Lunn, Sebastian Hesselbarth
  Cc: dmaengine, linux-arm-kernel, linux-kernel, linux-crypto,
	Lior Amsalem, Thomas Petazzoni, Herbert Xu, David S. Miller,
	Maxime Ripard

From: Lior Amsalem <alior@marvell.com>

The offload engine in the A38x introduces a RAID6 capability; this patch
adds RAID6 offload support to the mv_xor driver.
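
The engine multiplies source slot i by the fixed RAID6 coefficient 2^i
(0x01 ... 0x80), so a request is only mappable when every scf[] entry equals
one of those values. A condensed sketch of the matching rule (the coefs
array is from the patch; the helper itself is illustrative only):

static const u8 mv_xor_raid6_coefs[8] = {
	0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80
};

/* sketch: engine slot whose hardwired coefficient equals c, or -1 */
static int mv_xor_slot_for_coef(u8 c)
{
	int i;

	for (i = 0; i < 8; i++)
		if (mv_xor_raid6_coefs[i] == c)
			return i;
	return -1;
}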

Signed-off-by: Lior Amsalem <alior@marvell.com>
Reviewed-by: Ofer Heifetz <oferh@marvell.com>
Reviewed-by: Nadav Haklai <nadavh@marvell.com>
Tested-by: Nadav Haklai <nadavh@marvell.com>
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
---
 drivers/dma/mv_xor.c | 116 +++++++++++++++++++++++++++++++++++++++++++++++----
 drivers/dma/mv_xor.h |   5 ++-
 2 files changed, 111 insertions(+), 10 deletions(-)

diff --git a/drivers/dma/mv_xor.c b/drivers/dma/mv_xor.c
index 28980483eafb..9b5753dcbf76 100644
--- a/drivers/dma/mv_xor.c
+++ b/drivers/dma/mv_xor.c
@@ -36,6 +36,11 @@ enum mv_xor_mode {
 	XOR_MODE_IN_DESC,
 };
 
+/* engine coefficients  */
+static u8 mv_xor_raid6_coefs[8] = {
+	0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80
+};
+
 static void mv_xor_issue_pending(struct dma_chan *chan);
 
 #define to_mv_xor_chan(chan)		\
@@ -48,8 +53,7 @@ static void mv_xor_issue_pending(struct dma_chan *chan);
 	((chan)->dmadev.dev)
 
 static void mv_desc_init(struct mv_xor_desc_slot *desc,
-			 dma_addr_t addr, u32 byte_count,
-			 enum dma_ctrl_flags flags)
+			  u32 byte_count, enum dma_ctrl_flags flags)
 {
 	struct mv_xor_desc *hw_desc = desc->hw_desc;
 
@@ -58,7 +62,6 @@ static void mv_desc_init(struct mv_xor_desc_slot *desc,
 	/* Enable end-of-descriptor interrupts only for DMA_PREP_INTERRUPT */
 	hw_desc->desc_command = (flags & DMA_PREP_INTERRUPT) ?
 				XOR_DESC_EOD_INT_EN : 0;
-	hw_desc->phy_dest_addr = addr;
 	hw_desc->byte_count = byte_count;
 }
 
@@ -74,6 +77,9 @@ static void mv_desc_set_mode(struct mv_xor_desc_slot *desc)
 	case DMA_MEMCPY:
 		hw_desc->desc_command |= XOR_DESC_OPERATION_MEMCPY;
 		break;
+	case DMA_PQ:
+		hw_desc->desc_command |= XOR_DESC_OPERATION_PQ;
+		break;
 	default:
 		BUG();
 		return;
@@ -84,16 +90,38 @@ static void mv_desc_set_next_desc(struct mv_xor_desc_slot *desc,
 				  u32 next_desc_addr)
 {
 	struct mv_xor_desc *hw_desc = desc->hw_desc;
+
 	BUG_ON(hw_desc->phy_next_desc);
 	hw_desc->phy_next_desc = next_desc_addr;
 }
 
+static void mv_desc_set_dest_addr(struct mv_xor_desc_slot *desc,
+				  dma_addr_t addr)
+{
+	struct mv_xor_desc *hw_desc = desc->hw_desc;
+
+	hw_desc->phy_dest_addr = addr;
+	if (desc->type == DMA_PQ)
+		hw_desc->desc_command |= (1 << 8);
+}
+
+static void mv_desc_set_q_dest_addr(struct mv_xor_desc_slot *desc,
+				    dma_addr_t addr)
+{
+	struct mv_xor_desc *hw_desc = desc->hw_desc;
+
+	hw_desc->phy_q_dest_addr = addr;
+	if (desc->type == DMA_PQ)
+		hw_desc->desc_command |= (1 << 9);
+}
+
 static void mv_desc_set_src_addr(struct mv_xor_desc_slot *desc,
 				 int index, dma_addr_t addr)
 {
 	struct mv_xor_desc *hw_desc = desc->hw_desc;
+
 	hw_desc->phy_src_addr[mv_phy_src_idx(index)] = addr;
-	if (desc->type == DMA_XOR)
+	if ((desc->type == DMA_XOR) || (desc->type == DMA_PQ))
 		hw_desc->desc_command |= (1 << index);
 }
 
@@ -520,7 +548,8 @@ mv_xor_prep_dma_xor(struct dma_chan *chan, dma_addr_t dest, dma_addr_t *src,
 	if (sw_desc) {
 		sw_desc->type = DMA_XOR;
 		sw_desc->async_tx.flags = flags;
-		mv_desc_init(sw_desc, dest, len, flags);
+		mv_desc_init(sw_desc, len, flags);
+		mv_desc_set_dest_addr(sw_desc, dest);
 		if (mv_chan->op_in_desc == XOR_MODE_IN_DESC)
 			mv_desc_set_mode(sw_desc);
 		while (src_cnt--)
@@ -562,6 +591,69 @@ mv_xor_prep_dma_interrupt(struct dma_chan *chan, unsigned long flags)
 	return mv_xor_prep_dma_xor(chan, dest, &src, 1, len, flags);
 }
 
+static struct dma_async_tx_descriptor *
+mv_xor_prep_dma_pq(struct dma_chan *chan, dma_addr_t *dst, dma_addr_t *src,
+		   unsigned int src_cnt, const unsigned char *scf,
+		   size_t len, unsigned long flags)
+{
+	struct mv_xor_chan *mv_chan = to_mv_xor_chan(chan);
+	struct mv_xor_desc_slot *sw_desc;
+	int src_i = 0;
+	int i = 0;
+
+	if (unlikely(len < MV_XOR_MIN_BYTE_COUNT))
+		return NULL;
+
+	BUG_ON(len > MV_XOR_MAX_BYTE_COUNT);
+
+	dev_dbg(mv_chan_to_devp(mv_chan),
+		"%s src_cnt: %d len: %u flags: %ld\n",
+		__func__, src_cnt, len, flags);
+
+	/*
+	 * since the coefs on Marvell engine are hardcoded, do not
+	 * support mult and sum product requests
+	 */
+	if ((flags & DMA_PREP_PQ_MULT) || (flags & DMA_PREP_PQ_SUM_PRODUCT))
+		return NULL;
+
+	sw_desc = mv_chan_alloc_slot(mv_chan);
+	if (sw_desc) {
+		sw_desc->type = DMA_PQ;
+		sw_desc->async_tx.flags = flags;
+		mv_desc_init(sw_desc, len, flags);
+		if (mv_chan->op_in_desc == XOR_MODE_IN_DESC)
+			mv_desc_set_mode(sw_desc);
+		if (!(flags & DMA_PREP_PQ_DISABLE_P))
+			mv_desc_set_dest_addr(sw_desc, dst[0]);
+		if (!(flags & DMA_PREP_PQ_DISABLE_Q))
+			mv_desc_set_q_dest_addr(sw_desc, dst[1]);
+		while (src_cnt) {
+			/*
+			 * probably we can do better coding below,
+			 * hash table maybe?
+			 */
+			if (scf[src_i] == mv_xor_raid6_coefs[i]) {
+				/*
+				 * coefs are hardcoded, assign the src
+				 * to the right place
+				 */
+				mv_desc_set_src_addr(sw_desc, i, src[src_i]);
+				src_i++;
+				i++;
+				src_cnt--;
+			} else {
+				i++;
+			}
+		}
+	}
+
+	dev_dbg(mv_chan_to_devp(mv_chan),
+		"%s sw_desc %p async_tx %p\n",
+		__func__, sw_desc, &sw_desc->async_tx);
+	return sw_desc ? &sw_desc->async_tx : NULL;
+}
+
 static void mv_xor_free_chan_resources(struct dma_chan *chan)
 {
 	struct mv_xor_chan *mv_chan = to_mv_xor_chan(chan);
@@ -1026,6 +1118,10 @@ mv_xor_channel_add(struct mv_xor_device *xordev,
 		dma_dev->max_xor = 8;
 		dma_dev->device_prep_dma_xor = mv_xor_prep_dma_xor;
 	}
+	if (dma_has_cap(DMA_PQ, dma_dev->cap_mask)) {
+		dma_set_maxpq(dma_dev, 8, 0);
+		dma_dev->device_prep_dma_pq = mv_xor_prep_dma_pq;
+	}
 
 	mv_chan->mmr_base = xordev->xor_base;
 	mv_chan->mmr_high_base = xordev->xor_high_base;
@@ -1071,11 +1167,13 @@ mv_xor_channel_add(struct mv_xor_device *xordev,
 			goto err_free_irq;
 	}
 
-	dev_info(&pdev->dev, "Marvell XOR (%s): ( %s%s%s)\n",
+	dev_info(&pdev->dev, "Marvell XOR (%s): ( %s%s%s%s)\n",
 		 mv_chan->op_in_desc ? "Descriptor Mode" : "Registers Mode",
 		 dma_has_cap(DMA_XOR, dma_dev->cap_mask) ? "xor " : "",
 		 dma_has_cap(DMA_MEMCPY, dma_dev->cap_mask) ? "cpy " : "",
-		 dma_has_cap(DMA_INTERRUPT, dma_dev->cap_mask) ? "intr " : "");
+		 dma_has_cap(DMA_INTERRUPT, dma_dev->cap_mask) ? "intr " : "",
+		 dma_has_cap(DMA_PQ, dma_dev->cap_mask) ? "pq " : "");
+
 
 	dma_async_device_register(dma_dev);
 	return mv_chan;
@@ -1199,6 +1297,8 @@ static int mv_xor_probe(struct platform_device *pdev)
 				dma_cap_set(DMA_XOR, cap_mask);
 			if (of_property_read_bool(np, "dmacap,interrupt"))
 				dma_cap_set(DMA_INTERRUPT, cap_mask);
+			if (of_property_read_bool(np, "dmacap,pq"))
+				dma_cap_set(DMA_PQ, cap_mask);
 
 			irq = irq_of_parse_and_map(np, 0);
 			if (!irq) {
@@ -1310,6 +1410,6 @@ static void __exit mv_xor_exit(void)
 module_exit(mv_xor_exit);
 #endif
 
-MODULE_AUTHOR("Saeed Bishara <saeed@marvell.com>");
+MODULE_AUTHOR("Lior Amsalem <alior@marvell.com>, Saeed Bishara <saeed@marvell.com>");
 MODULE_DESCRIPTION("DMA engine driver for Marvell's XOR engine");
 MODULE_LICENSE("GPL");
diff --git a/drivers/dma/mv_xor.h b/drivers/dma/mv_xor.h
index b7455b42137b..b72e7357c5c8 100644
--- a/drivers/dma/mv_xor.h
+++ b/drivers/dma/mv_xor.h
@@ -37,6 +37,7 @@
 #define XOR_DESC_OPERATION_XOR          (0 << 24)
 #define XOR_DESC_OPERATION_CRC32C       (1 << 24)
 #define XOR_DESC_OPERATION_MEMCPY       (2 << 24)
+#define XOR_DESC_OPERATION_PQ           (5 << 24)
 
 #define XOR_DESC_DMA_OWNED		BIT(31)
 #define XOR_DESC_EOD_INT_EN		BIT(31)
@@ -164,7 +165,7 @@ struct mv_xor_desc {
 	u32 byte_count;		/* size of src/dst blocks in bytes */
 	u32 phy_dest_addr;	/* destination block address */
 	u32 phy_src_addr[8];	/* source block addresses */
-	u32 reserved0;
+	u32 phy_q_dest_addr;
 	u32 reserved1;
 };
 #define mv_phy_src_idx(src_idx) (src_idx)
@@ -178,7 +179,7 @@ struct mv_xor_desc {
 	u32 byte_count;		/* size of src/dst blocks in bytes */
 	u32 phy_src_addr[8];	/* source block addresses */
 	u32 reserved1;
-	u32 reserved0;
+	u32 phy_q_dest_addr;
 };
 #define mv_phy_src_idx(src_idx) (src_idx ^ 1)
 #endif
-- 
2.4.0

^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH 8/8] ARM: mvebu: a38x: Enable A38x XOR engine features
  2015-05-12 15:37 ` Maxime Ripard
@ 2015-05-12 15:37   ` Maxime Ripard
  -1 siblings, 0 replies; 79+ messages in thread
From: Maxime Ripard @ 2015-05-12 15:37 UTC (permalink / raw)
  To: Vinod Koul, Dan Williams, Gregory Clement, Jason Cooper,
	Andrew Lunn, Sebastian Hesselbarth
  Cc: dmaengine, linux-arm-kernel, linux-kernel, linux-crypto,
	Lior Amsalem, Thomas Petazzoni, Herbert Xu, David S. Miller

From: Lior Amsalem <alior@marvell.com>

The new XOR engine has a new compatible string of its own, together with new
channel capabilities.

Use that new compatible string now that we have a driver that can handle it.
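
On the driver side, each dmacap,* property is turned into a dmaengine
capability in mv_xor_probe() before the channel is registered; a short
sketch of that mapping (the dmacap,pq case is the one added by patch 7, the
others already exist in the driver):

	if (of_property_read_bool(np, "dmacap,memcpy"))
		dma_cap_set(DMA_MEMCPY, cap_mask);
	if (of_property_read_bool(np, "dmacap,xor"))
		dma_cap_set(DMA_XOR, cap_mask);
	if (of_property_read_bool(np, "dmacap,interrupt"))
		dma_cap_set(DMA_INTERRUPT, cap_mask);
	if (of_property_read_bool(np, "dmacap,pq"))
		dma_cap_set(DMA_PQ, cap_mask);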

Signed-off-by: Lior Amsalem <alior@marvell.com>
Reviewed-by: Ofer Heifetz <oferh@marvell.com>
Reviewed-by: Nadav Haklai <nadavh@marvell.com>
Tested-by: Nadav Haklai <nadavh@marvell.com>
---
 arch/arm/boot/dts/armada-38x.dtsi | 20 ++++++--------------
 1 file changed, 6 insertions(+), 14 deletions(-)

diff --git a/arch/arm/boot/dts/armada-38x.dtsi b/arch/arm/boot/dts/armada-38x.dtsi
index ed2dd8ba4080..6d07b7389415 100644
--- a/arch/arm/boot/dts/armada-38x.dtsi
+++ b/arch/arm/boot/dts/armada-38x.dtsi
@@ -448,7 +448,7 @@
 			};
 
 			xor@60800 {
-				compatible = "marvell,orion-xor";
+				compatible = "marvell,a38x-xor";
 				reg = <0x60800 0x100
 				       0x60a00 0x100>;
 				clocks = <&gateclk 22>;
@@ -458,17 +458,13 @@
 					interrupts = <GIC_SPI 22 IRQ_TYPE_LEVEL_HIGH>;
 					dmacap,memcpy;
 					dmacap,xor;
-				};
-				xor01 {
-					interrupts = <GIC_SPI 23 IRQ_TYPE_LEVEL_HIGH>;
-					dmacap,memcpy;
-					dmacap,xor;
-					dmacap,memset;
+					dmacap,pq;
+					dmacap,interrupt;
 				};
 			};
 
 			xor@60900 {
-				compatible = "marvell,orion-xor";
+				compatible = "marvell,a38x-xor";
 				reg = <0x60900 0x100
 				       0x60b00 0x100>;
 				clocks = <&gateclk 28>;
@@ -478,12 +474,8 @@
 					interrupts = <GIC_SPI 65 IRQ_TYPE_LEVEL_HIGH>;
 					dmacap,memcpy;
 					dmacap,xor;
-				};
-				xor11 {
-					interrupts = <GIC_SPI 66 IRQ_TYPE_LEVEL_HIGH>;
-					dmacap,memcpy;
-					dmacap,xor;
-					dmacap,memset;
+					dmacap,pq;
+					dmacap,interrupt;
 				};
 			};
 
-- 
2.4.0

^ permalink raw reply related	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/8] dmaengine: mv_xor: add support for a38x command in descriptor mode
  2015-05-12 15:37   ` Maxime Ripard
@ 2015-05-12 15:49     ` Arnd Bergmann
  -1 siblings, 0 replies; 79+ messages in thread
From: Arnd Bergmann @ 2015-05-12 15:49 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Maxime Ripard, Vinod Koul, Dan Williams, Gregory Clement,
	Jason Cooper, Andrew Lunn, Sebastian Hesselbarth, Lior Amsalem,
	Herbert Xu, Thomas Petazzoni, linux-kernel, linux-crypto,
	dmaengine, David S. Miller

On Tuesday 12 May 2015 17:37:37 Maxime Ripard wrote:
> From: Lior Amsalem <alior@marvell.com>
> 
> The Marvell Armada 38x SoC introduce new features to the XOR engine,
> especially the fact that the engine mode (MEMCPY/XOR/PQ/etc) can be part of
> the descriptor and not set through the controller registers.
> 
> This new feature allows mixing of different commands (even PQ) on the same
> channel/chain without the need to stop the engine to reconfigure the engine
> mode.
> 
> Refactor the driver to be able to use that new feature on the Armada 38x,
> while keeping the old behaviour on the older SoCs.
> 
> Signed-off-by: Lior Amsalem <alior@marvell.com>
> Reviewed-by: Ofer Heifetz <oferh@marvell.com>
> Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>

Two minor style comments:

> +static void mv_chan_set_mode_to_desc(struct mv_xor_chan *chan)
> +{
> +	u32 op_mode;
> +	u32 config = readl_relaxed(XOR_CONFIG(chan));
> +
> +	op_mode = XOR_OPERATION_MODE_IN_DESC;
> +
> +	config &= ~0x7;
> +	config |= op_mode;
> +
> +#if defined(__BIG_ENDIAN)
> +	config |= XOR_DESCRIPTOR_SWAP;
> +#else
> +	config &= ~XOR_DESCRIPTOR_SWAP;
> +#endif
> +
> +	writel_relaxed(config, XOR_CONFIG(chan));
> +}

Using 

	if (IS_ENABLED(__BIG_ENDIAN))

here would make it more readable by avoiding the #if. Alternatively,
you could leave the XOR_DESCRIPTOR_SWAP flag disabled and just swap
the descriptors manually like a lot of other drivers do. You have
to swap the mmio accesses anyway.
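
(A sketch of that shape; since IS_ENABLED() tests Kconfig symbols, the
CONFIG_CPU_BIG_ENDIAN symbol is assumed here rather than the bare
__BIG_ENDIAN macro:)

	config &= ~XOR_DESCRIPTOR_SWAP;
	if (IS_ENABLED(CONFIG_CPU_BIG_ENDIAN))
		config |= XOR_DESCRIPTOR_SWAP;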

>  }
>  
> +#ifdef CONFIG_OF
> +static const struct of_device_id mv_xor_dt_ids[] = {
> +	{ .compatible = "marvell,orion-xor", .data = (void *)XOR_MODE_IN_REG },
> +	{ .compatible = "marvell,a38x-xor", .data = (void *)XOR_MODE_IN_DESC },
> +	{},
> +};
> +MODULE_DEVICE_TABLE(of, mv_xor_dt_ids);
> +#endif
> +

Just leave out the #ifdef here. Almost all the mvebu machines use DT now,
so it's not worth the size benefit of leaving it out on the few machines
that don't.

You'll have to remove the of_match_ptr() invocation as well if you do that,
to avoid a warning about an unused symbol.

	Arnd

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/8] dmaengine: mv_xor: add support for a38x command in descriptor mode
  2015-05-12 15:37   ` Maxime Ripard
@ 2015-05-12 15:49     ` Thomas Petazzoni
  -1 siblings, 0 replies; 79+ messages in thread
From: Thomas Petazzoni @ 2015-05-12 15:49 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: Vinod Koul, Dan Williams, Gregory Clement, Jason Cooper,
	Andrew Lunn, Sebastian Hesselbarth, dmaengine, linux-arm-kernel,
	linux-kernel, linux-crypto, Lior Amsalem, Thomas Petazzoni,
	Herbert Xu, David S. Miller

Maxime,

On Tue, 12 May 2015 17:37:37 +0200, Maxime Ripard wrote:

>  Required properties:
> -- compatible: Should be "marvell,orion-xor"
> +- compatible: Should be "marvell,orion-xor" or "marvell,a38x-xor"

I believe the new compatible string should be armada-380-xor or
armada380-xor. Wildcards in compatible strings are generally not
recommended, and we don't use the a38x- prefix anywhere in the tree, as
far as I can see:

drivers/ata/ahci_mvebu.c:	{ .compatible = "marvell,armada-380-ahci", },
drivers/bus/mvebu-mbus.c:	{ .compatible = "marvell,armada380-mbus",
drivers/mmc/host/sdhci-pxav3.c:		.compatible = "marvell,armada-380-sdhci",
drivers/rtc/rtc-armada38x.c:	{ .compatible = "marvell,armada-380-rtc", },
drivers/thermal/armada_thermal.c:		.compatible = "marvell,armada380-thermal",
drivers/usb/host/xhci-plat.c:	{ .compatible = "marvell,armada-380-xhci"},
drivers/watchdog/orion_wdt.c:		.compatible = "marvell,armada-380-wdt",

Yes, we're not very consistent between armada380 and armada-380, but
we're not using a38x anywhere.

Best regards,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 5/8] dmaengine: mv_xor: bug fix for racing condition in descriptors cleanup
  2015-05-12 15:37   ` Maxime Ripard
@ 2015-05-12 15:51     ` Thomas Petazzoni
  -1 siblings, 0 replies; 79+ messages in thread
From: Thomas Petazzoni @ 2015-05-12 15:51 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: Vinod Koul, Dan Williams, Gregory Clement, Jason Cooper,
	Andrew Lunn, Sebastian Hesselbarth, dmaengine, linux-arm-kernel,
	linux-kernel, linux-crypto, Lior Amsalem, Thomas Petazzoni,
	Herbert Xu, David S. Miller

Maxime,

On Tue, 12 May 2015 17:37:40 +0200, Maxime Ripard wrote:
> From: Lior Amsalem <alior@marvell.com>
> 
> This patch fixes a bug in the XOR driver where the cleanup function can be
> called and free descriptors that have never been processed by the engine
> (which results in data errors).
> 
> The cleanup function will free descriptors based on the ownership bit in
> the descriptors.
> 
> Signed-off-by: Lior Amsalem <alior@marvell.com>
> Reviewed-by: Ofer Heifetz <oferh@marvell.com>
> Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>

If this is fixing a bug, shouldn't it be Cc'ed to stable@ ? But then it
needs to be the first patch in the series.

From what I remember, this series both fixes some issues *and* adds
RAID6 support. We need to separate out the two things so that the fixes
can be propagated to stable.
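
For reference, a fix headed for stable would typically carry tags along
these lines (the commit hash and subject below are placeholders):

	Fixes: 123456789abc ("dmaengine: mv_xor: <subject of the broken commit>")
	Cc: <stable@vger.kernel.org>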

Thanks,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/8] dmaengine: mv_xor: add support for a38x command in descriptor mode
  2015-05-12 15:49     ` Arnd Bergmann
@ 2015-05-12 15:54       ` Thomas Petazzoni
  -1 siblings, 0 replies; 79+ messages in thread
From: Thomas Petazzoni @ 2015-05-12 15:54 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arm-kernel, Maxime Ripard, Vinod Koul, Dan Williams,
	Gregory Clement, Jason Cooper, Andrew Lunn,
	Sebastian Hesselbarth, Lior Amsalem, Herbert Xu,
	Thomas Petazzoni, linux-kernel, linux-crypto, dmaengine,
	David S. Miller

Dear Arnd Bergmann,

On Tue, 12 May 2015 17:49:08 +0200, Arnd Bergmann wrote:

> Using 
> 
> 	if (IS_ENABLED(__BIG_ENDIAN))
> 
> here would make it more readable by avoiding the #if. Alternatively,
> you could leave the XOR_DESCRIPTOR_SWAP flag disabled and just swap
> the descriptors manually like a lot of other drivers do. You have
> to swap the mmio accesses anyway.

Agreed on IS_ENABLED(). However, I don't understand the comment about
the need to swap mmio accesses? We're using readl_relaxed() /
writel_relaxed(), and they do the swapping already. Am I missing
something?

Best regards,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/8] dmaengine: mv_xor: add support for a38x command in descriptor mode
  2015-05-12 15:49     ` Thomas Petazzoni
@ 2015-05-12 15:58       ` Andrew Lunn
  -1 siblings, 0 replies; 79+ messages in thread
From: Andrew Lunn @ 2015-05-12 15:58 UTC (permalink / raw)
  To: Thomas Petazzoni
  Cc: Maxime Ripard, Vinod Koul, Dan Williams, Gregory Clement,
	Jason Cooper, Sebastian Hesselbarth, dmaengine, linux-arm-kernel,
	linux-kernel, linux-crypto, Lior Amsalem, Thomas Petazzoni,
	Herbert Xu, David S. Miller

On Tue, May 12, 2015 at 05:49:36PM +0200, Thomas Petazzoni wrote:
> Maxime,
> 
> On Tue, 12 May 2015 17:37:37 +0200, Maxime Ripard wrote:
> 
> >  Required properties:
> > -- compatible: Should be "marvell,orion-xor"
> > +- compatible: Should be "marvell,orion-xor" or "marvell,a38x-xor"
> 
> I believe the new compatible string should be armada-380-xor or
> armada380-xor. Wildcards in compatible strings are generally not
> recommended, and we don't use the a38x- prefix anywhere in the tree, as
> far as I can see:
> 
> drivers/ata/ahci_mvebu.c:	{ .compatible = "marvell,armada-380-ahci", },
> drivers/bus/mvebu-mbus.c:	{ .compatible = "marvell,armada380-mbus",
> drivers/mmc/host/sdhci-pxav3.c:		.compatible = "marvell,armada-380-sdhci",
> drivers/rtc/rtc-armada38x.c:	{ .compatible = "marvell,armada-380-rtc", },
> drivers/thermal/armada_thermal.c:		.compatible = "marvell,armada380-thermal",
> drivers/usb/host/xhci-plat.c:	{ .compatible = "marvell,armada-380-xhci"},
> drivers/watchdog/orion_wdt.c:		.compatible = "marvell,armada-380-wdt",
> 
> Yes, we're not very consistent between armada380 and armada-380, but
> we're not using a38x anywhere.

A quick grep for marvell,orion, marvell,kirkwood and marvell,dove
suggests it should have the "-" if we want to be consistent with older
Marvell devices. As you said, starting with marvell,armada it gets
pretty inconsistent :-(

	Andrew

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/8] dmaengine: mv_xor: add support for a38x command in descriptor mode
  2015-05-12 15:54       ` Thomas Petazzoni
@ 2015-05-12 16:03         ` Arnd Bergmann
  -1 siblings, 0 replies; 79+ messages in thread
From: Arnd Bergmann @ 2015-05-12 16:03 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Thomas Petazzoni, Lior Amsalem, Andrew Lunn, Jason Cooper,
	Vinod Koul, dmaengine, linux-kernel, David S. Miller,
	Thomas Petazzoni, linux-crypto, Gregory Clement, Herbert Xu,
	Maxime Ripard, Dan Williams, Sebastian Hesselbarth

On Tuesday 12 May 2015 17:54:31 Thomas Petazzoni wrote:
> Dear Arnd Bergmann,
> 
> On Tue, 12 May 2015 17:49:08 +0200, Arnd Bergmann wrote:
> 
> > Using 
> > 
> >       if (IS_ENABLED(__BIG_ENDIAN))
> > 
> > here would make it more readable by avoiding the #if. Alternatively,
> > you could leave the XOR_DESCRIPTOR_SWAP flag disabled and just swap
> > the descriptors manually like a lot of other drivers do. You have
> > to swap the mmio accesses anyway.
> 
> Agreed on IS_ENABLED(). However, I don't understand the comment about
> the need to swap mmio accesses? We're using readl_relaxed() /
> writel_relaxed(), and they do the swapping already. Am I missing
> something?
> 

Sorry for being unclear. This was exactly my point: readl_relaxed()
has to do swaps internally, and you can also do swaps when accessing
the descriptor in memory.

	Arnd

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 6/8] async_tx: adding mult and sum_product flags
  2015-05-12 15:37   ` Maxime Ripard
@ 2015-05-12 16:05     ` Andrew Lunn
  -1 siblings, 0 replies; 79+ messages in thread
From: Andrew Lunn @ 2015-05-12 16:05 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: Vinod Koul, Dan Williams, Gregory Clement, Jason Cooper,
	Sebastian Hesselbarth, dmaengine, linux-arm-kernel, linux-kernel,
	linux-crypto, Lior Amsalem, Thomas Petazzoni, Herbert Xu,
	David S. Miller

On Tue, May 12, 2015 at 05:37:41PM +0200, Maxime Ripard wrote:
> From: Lior Amsalem <alior@marvell.com>
> 
> Some engines (like Marvell mv_xor) do not support mult and sum_product
> operations as part of the pq support.
> 
> This patch adds new flags, DMA_PREP_PQ_MULT & DMA_PREP_PQ_SUM_PRODUCT; these
> flags help the driver identify such operations.
> 
> Signed-off-by: Lior Amsalem <alior@marvell.com>
> Reviewed-by: Ofer Heifetz <oferh@marvell.com>
> Reviewed-by: Nadav Haklai <nadavh@marvell.com>
> Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
> ---
>  crypto/async_tx/async_raid6_recov.c | 4 ++--
>  include/linux/dmaengine.h           | 4 ++++
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/crypto/async_tx/async_raid6_recov.c b/crypto/async_tx/async_raid6_recov.c
> index 934a84981495..2db5486fd873 100644
> --- a/crypto/async_tx/async_raid6_recov.c
> +++ b/crypto/async_tx/async_raid6_recov.c
> @@ -47,7 +47,7 @@ async_sum_product(struct page *dest, struct page **srcs, unsigned char *coef,
>  		struct device *dev = dma->dev;
>  		dma_addr_t pq[2];
>  		struct dma_async_tx_descriptor *tx;
> -		enum dma_ctrl_flags dma_flags = DMA_PREP_PQ_DISABLE_P;
> +		enum dma_ctrl_flags dma_flags = DMA_PREP_PQ_DISABLE_P | DMA_PREP_PQ_SUM_PRODUCT;
>  
>  		if (submit->flags & ASYNC_TX_FENCE)
>  			dma_flags |= DMA_PREP_FENCE;
> @@ -111,7 +111,7 @@ async_mult(struct page *dest, struct page *src, u8 coef, size_t len,
>  		dma_addr_t dma_dest[2];
>  		struct device *dev = dma->dev;
>  		struct dma_async_tx_descriptor *tx;
> -		enum dma_ctrl_flags dma_flags = DMA_PREP_PQ_DISABLE_P;
> +		enum dma_ctrl_flags dma_flags = DMA_PREP_PQ_DISABLE_P | DMA_PREP_PQ_MULT;
>  
>  		if (submit->flags & ASYNC_TX_FENCE)
>  			dma_flags |= DMA_PREP_FENCE;
> diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
> index ad419757241f..f19ecebb4d3f 100644
> --- a/include/linux/dmaengine.h
> +++ b/include/linux/dmaengine.h
> @@ -174,6 +174,8 @@ struct dma_interleaved_template {
>   *  operation it continues the calculation with new sources
>   * @DMA_PREP_FENCE - tell the driver that subsequent operations depend
>   *  on the result of this operation
> + * @DMA_PREP_PQ_MULT - tell the driver that this is a mult request
> + * @DMA_PREP_PQ_SUM_PRODUCT - tell the driver that this is a sum product request
>   */
>  enum dma_ctrl_flags {
>  	DMA_PREP_INTERRUPT = (1 << 0),
> @@ -182,6 +184,8 @@ enum dma_ctrl_flags {
>  	DMA_PREP_PQ_DISABLE_Q = (1 << 3),
>  	DMA_PREP_CONTINUE = (1 << 4),
>  	DMA_PREP_FENCE = (1 << 5),
> +	DMA_PREP_PQ_MULT = (1 << 10),
> +	DMA_PREP_PQ_SUM_PRODUCT = (1 << 11),

Any reason for skipping 6 to 9?

    Andrew

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/8] dmaengine: mv_xor: add support for a38x command in descriptor mode
  2015-05-12 15:58       ` Andrew Lunn
@ 2015-05-12 16:05         ` Thomas Petazzoni
  -1 siblings, 0 replies; 79+ messages in thread
From: Thomas Petazzoni @ 2015-05-12 16:05 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Maxime Ripard, Vinod Koul, Dan Williams, Gregory Clement,
	Jason Cooper, Sebastian Hesselbarth, dmaengine, linux-arm-kernel,
	linux-kernel, linux-crypto, Lior Amsalem, Thomas Petazzoni,
	Herbert Xu, David S. Miller

Andrew,

On Tue, 12 May 2015 17:58:24 +0200, Andrew Lunn wrote:

> A quick grep for marvell,orion, marvell,kirkwood and marvell,dove
> suggests it should have the "-" if we want to be consistent with older
> Marvell devices. As you said, starting with marvell,armada it gets
> pretty inconsistent :-(

Yes, with the "-" would be better IMO.

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 0/8] ARM: mvebu: Add support for RAID6 PQ offloading
  2015-05-12 15:37 ` Maxime Ripard
@ 2015-05-12 16:05   ` Dan Williams
  -1 siblings, 0 replies; 79+ messages in thread
From: Dan Williams @ 2015-05-12 16:05 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: Vinod Koul, Gregory Clement, Jason Cooper, Andrew Lunn,
	Sebastian Hesselbarth, dmaengine, linux-arm-kernel, linux-kernel,
	linux-crypto, Lior Amsalem, Thomas Petazzoni, Herbert Xu,
	David S. Miller

On Tue, May 12, 2015 at 8:37 AM, Maxime Ripard
<maxime.ripard@free-electrons.com> wrote:
> Hi,
>
> This serie refactors the mv_xor in order to support the latest Armada
> 38x features, including the PQ support in order to offload the RAID6
> PQ operations.
>
> Not all the PQ operations are supported by the XOR engine, so we had
> to introduce new async_tx flags in the process to identify
> un-supported operations.
>
> Please note that this is currently not usable because of a possible
> regression in the RAID stack in 4.1 that is being discussed at the
> moment here: https://lkml.org/lkml/2015/5/7/527

This is problematic, as async_tx is a wart on the dmaengine subsystem
and needs to be deprecated; I just have yet to find the time to do
that work.  It turns out it was a mistake to hide the device details
from md: it should be explicitly managing the dma channels, not
relying on an abstraction api.  The async_tx usage of the
dma-mapping api is broken in that it relies on overlapping mappings of
the same address.  This happens to work on x86, but on arm it needs
explicit non-overlapping mappings.  I started the work to reference
count dma-mappings in 3.13, and we need to teach md to use
dmaengine_unmap_data explicitly.  Yielding dma channel management to
md also results in a more efficient implementation, as we can dma_map()
the stripe cache once rather than per-io.  The "async_tx_ack()"
disaster can also go away once md explicitly handles channel
switching.
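
For context, the explicit dmaengine_unmap_data pattern mentioned above
looks roughly like this for a single memcpy (a sketch, not md code; dev,
chan, src_page, dst_page and len are assumed, error handling omitted):

	struct dmaengine_unmap_data *unmap;
	struct dma_async_tx_descriptor *tx;

	unmap = dmaengine_get_unmap_data(dev, 2, GFP_NOWAIT);
	unmap->addr[0] = dma_map_page(dev, src_page, 0, len, DMA_TO_DEVICE);
	unmap->addr[1] = dma_map_page(dev, dst_page, 0, len, DMA_FROM_DEVICE);
	unmap->to_cnt = 1;
	unmap->from_cnt = 1;
	unmap->len = len;

	tx = chan->device->device_prep_dma_memcpy(chan, unmap->addr[1],
						  unmap->addr[0], len, 0);
	dma_set_unmap(tx, unmap);	/* descriptor takes a reference */
	dmaengine_unmap_put(unmap);	/* drop ours; unmap runs at completion */
	dmaengine_submit(tx);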

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 8/8] ARM: mvebu: a38x: Enable A38x XOR engine features
  2015-05-12 15:37   ` Maxime Ripard
@ 2015-05-12 16:13     ` Andrew Lunn
  -1 siblings, 0 replies; 79+ messages in thread
From: Andrew Lunn @ 2015-05-12 16:13 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: Vinod Koul, Dan Williams, Gregory Clement, Jason Cooper,
	Sebastian Hesselbarth, dmaengine, linux-arm-kernel, linux-kernel,
	linux-crypto, Lior Amsalem, Thomas Petazzoni, Herbert Xu,
	David S. Miller

On Tue, May 12, 2015 at 05:37:43PM +0200, Maxime Ripard wrote:
> From: Lior Amsalem <alior@marvell.com>
> 
> The new XOR engine has a new compatible of its own, together with new
> channel capabilities.
> 
> Use that new compatible now that we have a driver that can handle it.
> 
> Signed-off-by: Lior Amsalem <alior@marvell.com>
> Reviewed-by: Ofer Heifetz <oferh@marvell.com>
> Reviewed-by: Nadav Haklai <nadavh@marvell.com>
> Tested-by: Nadav Haklai <nadavh@marvell.com>
> ---
>  arch/arm/boot/dts/armada-38x.dtsi | 20 ++++++--------------
>  1 file changed, 6 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/arm/boot/dts/armada-38x.dtsi b/arch/arm/boot/dts/armada-38x.dtsi
> index ed2dd8ba4080..6d07b7389415 100644
> --- a/arch/arm/boot/dts/armada-38x.dtsi
> +++ b/arch/arm/boot/dts/armada-38x.dtsi
> @@ -448,7 +448,7 @@
>  			};
>  
>  			xor@60800 {
> -				compatible = "marvell,orion-xor";
> +				compatible = "marvell,a38x-xor";
>  				reg = <0x60800 0x100
>  				       0x60a00 0x100>;
>  				clocks = <&gateclk 22>;
> @@ -458,17 +458,13 @@
>  					interrupts = <GIC_SPI 22 IRQ_TYPE_LEVEL_HIGH>;
>  					dmacap,memcpy;
>  					dmacap,xor;
> -				};
> -				xor01 {
> -					interrupts = <GIC_SPI 23 IRQ_TYPE_LEVEL_HIGH>;
> -					dmacap,memcpy;
> -					dmacap,xor;
> -					dmacap,memset;
> +					dmacap,pq;
> +					dmacap,interrupt;

Does this mean the hardware only has one channel?
And memset is no longer supported?

    Andrew


>  				};
>  			};
>  
>  			xor@60900 {
> -				compatible = "marvell,orion-xor";
> +				compatible = "marvell,a38x-xor";
>  				reg = <0x60900 0x100
>  				       0x60b00 0x100>;
>  				clocks = <&gateclk 28>;
> @@ -478,12 +474,8 @@
>  					interrupts = <GIC_SPI 65 IRQ_TYPE_LEVEL_HIGH>;
>  					dmacap,memcpy;
>  					dmacap,xor;
> -				};
> -				xor11 {
> -					interrupts = <GIC_SPI 66 IRQ_TYPE_LEVEL_HIGH>;
> -					dmacap,memcpy;
> -					dmacap,xor;
> -					dmacap,memset;
> +					dmacap,pq;
> +					dmacap,interrupt;
>  				};
>  			};
>  
> -- 
> 2.4.0
> 

^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [PATCH 8/8] ARM: mvebu: a38x: Enable A38x XOR engine features
  2015-05-12 16:13     ` Andrew Lunn
@ 2015-05-13  7:16       ` Lior Amsalem
  -1 siblings, 0 replies; 79+ messages in thread
From: Lior Amsalem @ 2015-05-13  7:16 UTC (permalink / raw)
  To: Andrew Lunn, Maxime Ripard
  Cc: Vinod Koul, Dan Williams, Gregory Clement, Jason Cooper,
	Sebastian Hesselbarth, dmaengine, linux-arm-kernel, linux-kernel,
	linux-crypto, Thomas Petazzoni, Herbert Xu, David S. Miller

> From: Andrew Lunn [mailto:andrew@lunn.ch]
> Sent: Tuesday, May 12, 2015 7:13 PM
> 
> On Tue, May 12, 2015 at 05:37:43PM +0200, Maxime Ripard wrote:
> > From: Lior Amsalem <alior@marvell.com>
> >
> > The new XOR engine has a new compatible of its own, together with new
> > channel capabilities.
> >
> > Use that new compatible now that we have a driver that can handle it.
> >
> > Signed-off-by: Lior Amsalem <alior@marvell.com>
> > Reviewed-by: Ofer Heifetz <oferh@marvell.com>
> > Reviewed-by: Nadav Haklai <nadavh@marvell.com>
> > Tested-by: Nadav Haklai <nadavh@marvell.com>
> > ---
> >  arch/arm/boot/dts/armada-38x.dtsi | 20 ++++++--------------
> >  1 file changed, 6 insertions(+), 14 deletions(-)
> >
> > diff --git a/arch/arm/boot/dts/armada-38x.dtsi
> > b/arch/arm/boot/dts/armada-38x.dtsi
> > index ed2dd8ba4080..6d07b7389415 100644
> > --- a/arch/arm/boot/dts/armada-38x.dtsi
> > +++ b/arch/arm/boot/dts/armada-38x.dtsi
> > @@ -448,7 +448,7 @@
> >  			};
> >
> >  			xor@60800 {
> > -				compatible = "marvell,orion-xor";
> > +				compatible = "marvell,a38x-xor";
> >  				reg = <0x60800 0x100
> >  				       0x60a00 0x100>;
> >  				clocks = <&gateclk 22>;
> > @@ -458,17 +458,13 @@
> >  					interrupts = <GIC_SPI 22
> IRQ_TYPE_LEVEL_HIGH>;
> >  					dmacap,memcpy;
> >  					dmacap,xor;
> > -				};
> > -				xor01 {
> > -					interrupts = <GIC_SPI 23
> IRQ_TYPE_LEVEL_HIGH>;
> > -					dmacap,memcpy;
> > -					dmacap,xor;
> > -					dmacap,memset;
> > +					dmacap,pq;
> > +					dmacap,interrupt;
> 
> Does this mean the hardware only has one channel?
> And memset is no longer supported?
> 

The hardware has two channels per engine and two engines.
However, both on the HW side (both channels sit on the same "bus port")
and on the SW side (the dma subsystem will assign one channel per CPU),
we found it's better (performance wise) to use only one channel on each
engine and let the framework assign one per CPU. This way, descriptor
chaining was better (because of the dependent-descriptors problem) and
the overall interrupt count was reduced.

Yes: memset is a problematic one. It can only be done via registers
(not at the descriptor level), plus no one really needs it...

>     Andrew
> 
> 
> >  				};
> >  			};
> >
> >  			xor@60900 {
> > -				compatible = "marvell,orion-xor";
> > +				compatible = "marvell,a38x-xor";
> >  				reg = <0x60900 0x100
> >  				       0x60b00 0x100>;
> >  				clocks = <&gateclk 28>;
> > @@ -478,12 +474,8 @@
> >  					interrupts = <GIC_SPI 65
> IRQ_TYPE_LEVEL_HIGH>;
> >  					dmacap,memcpy;
> >  					dmacap,xor;
> > -				};
> > -				xor11 {
> > -					interrupts = <GIC_SPI 66
> IRQ_TYPE_LEVEL_HIGH>;
> > -					dmacap,memcpy;
> > -					dmacap,xor;
> > -					dmacap,memset;
> > +					dmacap,pq;
> > +					dmacap,interrupt;
> >  				};
> >  			};
> >
> > --
> > 2.4.0
> >


Regards,
Lior Amsalem

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/8] dmaengine: mv_xor: add support for a38x command in descriptor mode
  2015-05-12 15:49     ` Arnd Bergmann
@ 2015-05-13  8:15       ` Maxime Ripard
  -1 siblings, 0 replies; 79+ messages in thread
From: Maxime Ripard @ 2015-05-13  8:15 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arm-kernel, Vinod Koul, Dan Williams, Gregory Clement,
	Jason Cooper, Andrew Lunn, Sebastian Hesselbarth, Lior Amsalem,
	Herbert Xu, Thomas Petazzoni, linux-kernel, linux-crypto,
	dmaengine, David S. Miller

[-- Attachment #1: Type: text/plain, Size: 2894 bytes --]

Hi Arnd,

On Tue, May 12, 2015 at 05:49:08PM +0200, Arnd Bergmann wrote:
> On Tuesday 12 May 2015 17:37:37 Maxime Ripard wrote:
> > From: Lior Amsalem <alior@marvell.com>
> > 
> > The Marvell Armada 38x SoC introduces new features to the XOR engine,
> > especially the fact that the engine mode (MEMCPY/XOR/PQ/etc) can be part of
> > the descriptor and not set through the controller registers.
> > 
> > This new feature allows mixing of different commands (even PQ) on the same
> > channel/chain without the need to stop the engine to reconfigure the engine
> > mode.
> > 
> > Refactor the driver to be able to use that new feature on the Armada 38x,
> > while keeping the old behaviour on the older SoCs.
> > 
> > Signed-off-by: Lior Amsalem <alior@marvell.com>
> > Reviewed-by: Ofer Heifetz <oferh@marvell.com>
> > Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
> 
> Two minimal style comments:
> 
> > +static void mv_chan_set_mode_to_desc(struct mv_xor_chan *chan)
> > +{
> > +	u32 op_mode;
> > +	u32 config = readl_relaxed(XOR_CONFIG(chan));
> > +
> > +	op_mode = XOR_OPERATION_MODE_IN_DESC;
> > +
> > +	config &= ~0x7;
> > +	config |= op_mode;
> > +
> > +#if defined(__BIG_ENDIAN)
> > +	config |= XOR_DESCRIPTOR_SWAP;
> > +#else
> > +	config &= ~XOR_DESCRIPTOR_SWAP;
> > +#endif
> > +
> > +	writel_relaxed(config, XOR_CONFIG(chan));
> > +}
> 
> Using 
> 
> 	if (IS_ENABLED(__BIG_ENDIAN))
> 
> here would make it more readable by avoiding the #if.

Indeed. I'll change that.

> Alternatively, you could leave the XOR_DESCRIPTOR_SWAP flag disabled
> and just swap the descriptors manually like a lot of other drivers
> do. You have to swap the mmio accesses anyway.

That won't be easily doable, however.

Not only does the endianness of the individual fields in the descriptor
change, but changing the endianness also swaps the fields themselves
in pairs of u32.

You can see that here:
http://lxr.free-electrons.com/source/drivers/dma/mv_xor.h#L159

So I'm guessing that leaving it as it is remains the more readable
solution.
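
Abridged, the layout behind that link looks roughly like the following;
on big-endian the fields swap within each u32 pair, which is what the
XOR_DESCRIPTOR_SWAP hardware mode compensates for:

	#if defined(__LITTLE_ENDIAN)
	struct mv_xor_desc {
		u32 status;		/* descriptor execution status */
		u32 crc32_result;	/* result of CRC-32 calculation */
		u32 desc_command;	/* type of operation to be carried out */
		u32 phy_next_desc;	/* next descriptor address pointer */
		u32 byte_count;		/* size of src/dst blocks in bytes */
		u32 phy_dest_addr;	/* destination block address */
		u32 phy_src_addr[8];	/* source block addresses */
		u32 reserved0;
		u32 reserved1;
	};
	#else
	struct mv_xor_desc {
		u32 crc32_result;	/* each u32 pair above is swapped */
		u32 status;
		u32 phy_next_desc;
		u32 desc_command;
		u32 phy_dest_addr;
		u32 byte_count;
		u32 phy_src_addr[8];
		u32 reserved1;
		u32 reserved0;
	};
	#endif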

> >  }
> >  
> > +#ifdef CONFIG_OF
> > +static const struct of_device_id mv_xor_dt_ids[] = {
> > +	{ .compatible = "marvell,orion-xor", .data = (void *)XOR_MODE_IN_REG },
> > +	{ .compatible = "marvell,a38x-xor", .data = (void *)XOR_MODE_IN_DESC },
> > +	{},
> > +};
> > +MODULE_DEVICE_TABLE(of, mv_xor_dt_ids);
> > +#endif
> > +
> 
> Just leave out the #ifdef here. Almost all the mvebu machines use DT now,
> so it's not worth the size benefit of leaving it out on the few machines
> that don't.
> 
> You'll have to remove the of_match_ptr() invocation as well if you do that,
> to avoid a warning about an unused symbol.

Will do.

Thanks!
Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/8] dmaengine: mv_xor: add support for a38x command in descriptor mode
  2015-05-12 16:05         ` Thomas Petazzoni
@ 2015-05-13  8:23           ` Maxime Ripard
  -1 siblings, 0 replies; 79+ messages in thread
From: Maxime Ripard @ 2015-05-13  8:23 UTC (permalink / raw)
  To: Thomas Petazzoni
  Cc: Andrew Lunn, Vinod Koul, Dan Williams, Gregory Clement,
	Jason Cooper, Sebastian Hesselbarth, dmaengine, linux-arm-kernel,
	linux-kernel, linux-crypto, Lior Amsalem, Thomas Petazzoni,
	Herbert Xu, David S. Miller

[-- Attachment #1: Type: text/plain, Size: 598 bytes --]

On Tue, May 12, 2015 at 06:05:39PM +0200, Thomas Petazzoni wrote:
> Andrew,
> 
> On Tue, 12 May 2015 17:58:24 +0200, Andrew Lunn wrote:
> 
> > A quick grep for marvell,orion, marvell,kirkwood and marvell,dove
> > suggests it should have the "-" if we want to be consistent with older
> > Marvell devices. As you said, starting with marvell,armada it gets
> > pretty inconsistent :-(
> 
> Yes, with the "-" would be better IMO.

changed to armada-380-xor, thanks!
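
(i.e. the match table entry would presumably become

	{ .compatible = "marvell,armada-380-xor", .data = (void *)XOR_MODE_IN_DESC },

with the binding documentation updated to match.)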

Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 8/8] ARM: mvebu: a38x: Enable A38x XOR engine features
  2015-05-13  7:16       ` Lior Amsalem
@ 2015-05-13  8:33         ` Maxime Ripard
  -1 siblings, 0 replies; 79+ messages in thread
From: Maxime Ripard @ 2015-05-13  8:33 UTC (permalink / raw)
  To: Lior Amsalem
  Cc: Andrew Lunn, Vinod Koul, Dan Williams, Gregory Clement,
	Jason Cooper, Sebastian Hesselbarth, dmaengine, linux-arm-kernel,
	linux-kernel, linux-crypto, Thomas Petazzoni, Herbert Xu,
	David S. Miller

[-- Attachment #1: Type: text/plain, Size: 2798 bytes --]

On Wed, May 13, 2015 at 07:16:34AM +0000, Lior Amsalem wrote:
> > From: Andrew Lunn [mailto:andrew@lunn.ch]
> > Sent: Tuesday, May 12, 2015 7:13 PM
> > 
> > On Tue, May 12, 2015 at 05:37:43PM +0200, Maxime Ripard wrote:
> > > From: Lior Amsalem <alior@marvell.com>
> > >
> > > The new XOR engine has a new compatible of its own, together with new
> > > channel capabilities.
> > >
> > > Use that new compatible now that we have a driver that can handle it.
> > >
> > > Signed-off-by: Lior Amsalem <alior@marvell.com>
> > > Reviewed-by: Ofer Heifetz <oferh@marvell.com>
> > > Reviewed-by: Nadav Haklai <nadavh@marvell.com>
> > > Tested-by: Nadav Haklai <nadavh@marvell.com>
> > > ---
> > >  arch/arm/boot/dts/armada-38x.dtsi | 20 ++++++--------------
> > >  1 file changed, 6 insertions(+), 14 deletions(-)
> > >
> > > diff --git a/arch/arm/boot/dts/armada-38x.dtsi
> > > b/arch/arm/boot/dts/armada-38x.dtsi
> > > index ed2dd8ba4080..6d07b7389415 100644
> > > --- a/arch/arm/boot/dts/armada-38x.dtsi
> > > +++ b/arch/arm/boot/dts/armada-38x.dtsi
> > > @@ -448,7 +448,7 @@
> > >  			};
> > >
> > >  			xor@60800 {
> > > -				compatible = "marvell,orion-xor";
> > > +				compatible = "marvell,a38x-xor";
> > >  				reg = <0x60800 0x100
> > >  				       0x60a00 0x100>;
> > >  				clocks = <&gateclk 22>;
> > > @@ -458,17 +458,13 @@
> > >  					interrupts = <GIC_SPI 22
> > IRQ_TYPE_LEVEL_HIGH>;
> > >  					dmacap,memcpy;
> > >  					dmacap,xor;
> > > -				};
> > > -				xor01 {
> > > -					interrupts = <GIC_SPI 23
> > IRQ_TYPE_LEVEL_HIGH>;
> > > -					dmacap,memcpy;
> > > -					dmacap,xor;
> > > -					dmacap,memset;
> > > +					dmacap,pq;
> > > +					dmacap,interrupt;
> > 
> > Does this mean the hardware only has one channel?
> > And memset is no longer supported?
> > 
> 
> The hardware has two channels per engine and two engines.
> However, both on the HW side (both channels sit on the same "bus port")
> and on the SW side (the dma subsystem will assign one channel per CPU),
> we found it's better (performance wise) to use only one channel on each
> engine and let the framework assign one per CPU. This way, descriptor
> chaining was better (because of the dependent-descriptors problem) and
> the overall interrupt count was reduced.
> 
> Yes: memset is a problematic one. It can only be done via registers
> (not at the descriptor level), plus no one really needs it...

And memset support was removed from dmaengine in 3.11, so it
doesn't look like anyone really needs it :)

We're talking about reintroducing it for some platforms that actually
need it, but it wasn't really used on Marvell hardware anyway...
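
For context, each dmacap,* boolean in the remaining child node maps
onto a dmaengine capability bit at probe time, along these lines (an
illustrative sketch, helper name made up; the real parsing is in
mv_xor_probe()):

	static void xor_channel_caps(struct device_node *np, dma_cap_mask_t *mask)
	{
		if (of_property_read_bool(np, "dmacap,memcpy"))
			dma_cap_set(DMA_MEMCPY, *mask);
		if (of_property_read_bool(np, "dmacap,xor"))
			dma_cap_set(DMA_XOR, *mask);
		if (of_property_read_bool(np, "dmacap,pq"))
			dma_cap_set(DMA_PQ, *mask);
		if (of_property_read_bool(np, "dmacap,interrupt"))
			dma_cap_set(DMA_INTERRUPT, *mask);
	}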

Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 6/8] async_tx: adding mult and sum_product flags
  2015-05-12 16:05     ` Andrew Lunn
@ 2015-05-13  8:45       ` Maxime Ripard
  -1 siblings, 0 replies; 79+ messages in thread
From: Maxime Ripard @ 2015-05-13  8:45 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Vinod Koul, Dan Williams, Gregory Clement, Jason Cooper,
	Sebastian Hesselbarth, dmaengine, linux-arm-kernel, linux-kernel,
	linux-crypto, Lior Amsalem, Thomas Petazzoni, Herbert Xu,
	David S. Miller

On Tue, May 12, 2015 at 06:05:35PM +0200, Andrew Lunn wrote:
> On Tue, May 12, 2015 at 05:37:41PM +0200, Maxime Ripard wrote:
> > From: Lior Amsalem <alior@marvell.com>
> > 
> > Some engines (like Marvell mv_xor) do not support mult and sum_product
> > operations as part of the pq support.
> > 
> > This patch adds two new flags, DMA_PREP_PQ_MULT and DMA_PREP_PQ_SUM_PRODUCT;
> > these flags help the driver identify such operations.
> > 
> > Signed-off-by: Lior Amsalem <alior@marvell.com>
> > Reviewed-by: Ofer Heifetz <oferh@marvell.com>
> > Reviewed-by: Nadav Haklai <nadavh@marvell.com>
> > Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
> > ---
> >  crypto/async_tx/async_raid6_recov.c | 4 ++--
> >  include/linux/dmaengine.h           | 4 ++++
> >  2 files changed, 6 insertions(+), 2 deletions(-)
> > 
> > diff --git a/crypto/async_tx/async_raid6_recov.c b/crypto/async_tx/async_raid6_recov.c
> > index 934a84981495..2db5486fd873 100644
> > --- a/crypto/async_tx/async_raid6_recov.c
> > +++ b/crypto/async_tx/async_raid6_recov.c
> > @@ -47,7 +47,7 @@ async_sum_product(struct page *dest, struct page **srcs, unsigned char *coef,
> >  		struct device *dev = dma->dev;
> >  		dma_addr_t pq[2];
> >  		struct dma_async_tx_descriptor *tx;
> > -		enum dma_ctrl_flags dma_flags = DMA_PREP_PQ_DISABLE_P;
> > +		enum dma_ctrl_flags dma_flags = DMA_PREP_PQ_DISABLE_P | DMA_PREP_PQ_SUM_PRODUCT;
> >  
> >  		if (submit->flags & ASYNC_TX_FENCE)
> >  			dma_flags |= DMA_PREP_FENCE;
> > @@ -111,7 +111,7 @@ async_mult(struct page *dest, struct page *src, u8 coef, size_t len,
> >  		dma_addr_t dma_dest[2];
> >  		struct device *dev = dma->dev;
> >  		struct dma_async_tx_descriptor *tx;
> > -		enum dma_ctrl_flags dma_flags = DMA_PREP_PQ_DISABLE_P;
> > +		enum dma_ctrl_flags dma_flags = DMA_PREP_PQ_DISABLE_P | DMA_PREP_PQ_MULT;
> >  
> >  		if (submit->flags & ASYNC_TX_FENCE)
> >  			dma_flags |= DMA_PREP_FENCE;
> > diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
> > index ad419757241f..f19ecebb4d3f 100644
> > --- a/include/linux/dmaengine.h
> > +++ b/include/linux/dmaengine.h
> > @@ -174,6 +174,8 @@ struct dma_interleaved_template {
> >   *  operation it continues the calculation with new sources
> >   * @DMA_PREP_FENCE - tell the driver that subsequent operations depend
> >   *  on the result of this operation
> > + * @DMA_PREP_PQ_MULT - tell the driver that this is a mult request
> > + * @DMA_PREP_PQ_SUM_PRODUCT - tell the driver that this is a sum product request
> >   */
> >  enum dma_ctrl_flags {
> >  	DMA_PREP_INTERRUPT = (1 << 0),
> > @@ -182,6 +184,8 @@ enum dma_ctrl_flags {
> >  	DMA_PREP_PQ_DISABLE_Q = (1 << 3),
> >  	DMA_PREP_CONTINUE = (1 << 4),
> >  	DMA_PREP_FENCE = (1 << 5),
> > +	DMA_PREP_PQ_MULT = (1 << 10),
> > +	DMA_PREP_PQ_SUM_PRODUCT = (1 << 11),

None. I just forgot to change this. Thanks!
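
For reference: as I read async_raid6_recov.c, async_mult() computes
dest = src * coef over GF(2^8), and async_sum_product() computes
dest = (src0 * coef0) ^ (src1 * coef1); both are used by the RAID6
two-data-disk recovery path. The intent of the new flags is that a
driver whose engine cannot do these forms rejects the request in its
prep callback, so async_tx falls back to the synchronous software
path. A sketch (the function name is illustrative):

	static struct dma_async_tx_descriptor *
	xor_prep_dma_pq(struct dma_chan *chan, dma_addr_t *dst, dma_addr_t *src,
			unsigned int src_cnt, const unsigned char *scf,
			size_t len, unsigned long flags)
	{
		/*
		 * mult/sum_product are not supported by this engine;
		 * returning NULL makes async_tx use the sync fallback.
		 */
		if (flags & (DMA_PREP_PQ_MULT | DMA_PREP_PQ_SUM_PRODUCT))
			return NULL;

		/* ... otherwise build the regular P/Q descriptor chain ... */
		return NULL;
	}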

Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/8] dmaengine: mv_xor: add support for a38x command in descriptor mode
  2015-05-13  8:15       ` Maxime Ripard
@ 2015-05-13  8:46         ` Arnd Bergmann
  -1 siblings, 0 replies; 79+ messages in thread
From: Arnd Bergmann @ 2015-05-13  8:46 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: linux-arm-kernel, Vinod Koul, Dan Williams, Gregory Clement,
	Jason Cooper, Andrew Lunn, Sebastian Hesselbarth, Lior Amsalem,
	Herbert Xu, Thomas Petazzoni, linux-kernel, linux-crypto,
	dmaengine, David S. Miller

On Wednesday 13 May 2015 10:15:35 Maxime Ripard wrote:
> > Alternatively, you could leave the XOR_DESCRIPTOR_SWAP flag disabled
> > and just swap the descriptors manually like a lot of other drivers
> do. You have to swap the mmio accesses anyway.
> 
> That won't be easily doable however.
> 
> Not only the endianness of the individual fields in the descriptor
> changes, but changing the endianness also swaps the fields themselves
> by pair of u32.
> 
> You can see that here:
> http://lxr.free-electrons.com/source/drivers/dma/mv_xor.h#L159
> 
> So I'm guessing that leaving it as it is was the more readable
> solution.

I see, yes that would get a bit ugly.
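
To illustrate the pairing effect (a simplified sketch of the first
four words only; the real layout is the mv_xor_desc struct linked
above):

	struct xor_hw_desc {
	#if defined(__LITTLE_ENDIAN)
		u32 status;		/* descriptor execution status */
		u32 crc32_result;	/* CRC-32 result */
		u32 desc_command;	/* operation type */
		u32 phy_next_desc;	/* next descriptor address */
	#else
		/* the 64-bit descriptor swap pairs up adjacent u32 fields */
		u32 crc32_result;
		u32 status;
		u32 phy_next_desc;
		u32 desc_command;
	#endif
	};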

	Arnd

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 0/8] ARM: mvebu: Add support for RAID6 PQ offloading
  2015-05-12 16:05   ` Dan Williams
  (?)
@ 2015-05-13  9:17     ` Maxime Ripard
  -1 siblings, 0 replies; 79+ messages in thread
From: Maxime Ripard @ 2015-05-13  9:17 UTC (permalink / raw)
  To: Dan Williams
  Cc: Vinod Koul, Gregory Clement, Jason Cooper, Andrew Lunn,
	Sebastian Hesselbarth, dmaengine, linux-arm-kernel, linux-kernel,
	linux-crypto, Lior Amsalem, Thomas Petazzoni, Herbert Xu,
	David S. Miller

Hi Dan,

On Tue, May 12, 2015 at 09:05:41AM -0700, Dan Williams wrote:
> On Tue, May 12, 2015 at 8:37 AM, Maxime Ripard
> <maxime.ripard@free-electrons.com> wrote:
> > Hi,
> >
> > This serie refactors the mv_xor in order to support the latest Armada
> > 38x features, including the PQ support in order to offload the RAID6
> > PQ operations.
> >
> > Not all the PQ operations are supported by the XOR engine, so we had
> > to introduce new async_tx flags in the process to identify
> > un-supported operations.
> >
> > Please note that this is currently not usable because of a possible
> > regression in the RAID stack in 4.1 that is being discussed at the
> > moment here: https://lkml.org/lkml/2015/5/7/527
> 
> This is problematic as async_tx is a wart on the dmaengine subsystem
> and needs to be deprecated, I just have yet to find the time to do
> that work.  It turns out it was a mistake to hide the device details
> from md, it should be explicitly managing the dma channels, not
> relying on an abstraction api.  The async_tx api usage of the
> dma-mapping api is broken in that it relies on overlapping mappings of
> the same address.  This happens to work on x86, but on arm it needs
> explicit non-overlapping mappings.  I started the work to reference
> count dma-mappings in 3.13, and we need to teach md to use
> dmaengine_unmap_data explicitly.  Yielding dma channel management to
> md also results in a more efficient implementation as we can dma_map()
> the stripe cache once rather than per-io.  The  "async_tx_ack()"
> disaster can also go away when md is explicitly handling channel
> switching.

Even though I'd be very much in favor of deprecating / removing
async_tx, is it something likely to happen soon?

I remember discussing this with Vinod at Plumbers back in October, but
haven't seen anything since then.

If not, I think that we shouldn't really hold back patches to
async_tx, even though we know that a year from now, it's going to
be gone.
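
For what it's worth, the explicit reference-counted mapping you
describe looks roughly like this from a client's perspective (a sketch
modelled on what the async_tx core itself does since 3.13; channel,
pages and offsets are assumed from context, error handling elided):

	struct dma_device *dma = chan->device;
	struct dmaengine_unmap_data *unmap;
	struct dma_async_tx_descriptor *tx;

	unmap = dmaengine_get_unmap_data(dma->dev, 2, GFP_NOIO);
	unmap->addr[0] = dma_map_page(dma->dev, src_page, src_off,
				      len, DMA_TO_DEVICE);
	unmap->to_cnt = 1;
	unmap->addr[1] = dma_map_page(dma->dev, dst_page, dst_off,
				      len, DMA_FROM_DEVICE);
	unmap->from_cnt = 1;
	unmap->len = len;

	tx = dma->device_prep_dma_memcpy(chan, unmap->addr[1], unmap->addr[0],
					 len, DMA_PREP_INTERRUPT);
	if (tx) {
		dma_set_unmap(tx, unmap);	/* descriptor takes a reference */
		dmaengine_submit(tx);
	}
	dmaengine_unmap_put(unmap);		/* drop the caller's reference */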

Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 0/8] ARM: mvebu: Add support for RAID6 PQ offloading
  2015-05-13  9:17     ` Maxime Ripard
  (?)
@ 2015-05-13 16:00       ` Dan Williams
  -1 siblings, 0 replies; 79+ messages in thread
From: Dan Williams @ 2015-05-13 16:00 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: Vinod Koul, Gregory Clement, Jason Cooper, Andrew Lunn,
	Sebastian Hesselbarth, dmaengine, linux-arm-kernel, linux-kernel,
	linux-crypto, Lior Amsalem, Thomas Petazzoni, Herbert Xu,
	David S. Miller

On Wed, May 13, 2015 at 2:17 AM, Maxime Ripard
<maxime.ripard@free-electrons.com> wrote:
> Hi Dan,
>
> On Tue, May 12, 2015 at 09:05:41AM -0700, Dan Williams wrote:
>> On Tue, May 12, 2015 at 8:37 AM, Maxime Ripard
>> <maxime.ripard@free-electrons.com> wrote:
>> > Hi,
>> >
>> > This serie refactors the mv_xor in order to support the latest Armada
>> > 38x features, including the PQ support in order to offload the RAID6
>> > PQ operations.
>> >
>> > Not all the PQ operations are supported by the XOR engine, so we had
>> > to introduce new async_tx flags in the process to identify
>> > un-supported operations.
>> >
>> > Please note that this is currently not usable because of a possible
>> > regression in the RAID stack in 4.1 that is being discussed at the
>> > moment here: https://lkml.org/lkml/2015/5/7/527
>>
>> This is problematic as async_tx is a wart on the dmaengine subsystem
>> and needs to be deprecated, I just have yet to find the time to do
>> that work.  It turns out it was a mistake to hide the device details
>> from md, it should be explicitly managing the dma channels, not
>> relying on an abstraction api.  The async_tx api usage of the
>> dma-mapping api is broken in that it relies on overlapping mappings of
>> the same address.  This happens to work on x86, but on arm it needs
>> explicit non-overlapping mappings.  I started the work to reference
>> count dma-mappings in 3.13, and we need to teach md to use
>> dmaengine_unmap_data explicitly.  Yielding dma channel management to
>> md also results in a more efficient implementation as we can dma_map()
>> the stripe cache once rather than per-io.  The  "async_tx_ack()"
>> disaster can also go away when md is explicitly handling channel
>> switching.
>
> Even though I'd be very much in favor of deprecating / removing
> async_tx, is it something likely to happen soon?

Not unless someone else takes it on, I'm actively asking for help.

> I remember discussing this with Vinod at Plumbers back in October, but
> haven't seen anything since then.

Right, "help!" :)

> If not, I think that we shouldn't really hold back patches to
> async_tx, even though we know that a year from now, it's going to
> be gone.

We definitely should block new usages, because they make a bad
situation worse.  Russell already warned that the dma_mapping api
abuse could lead to data corruption on ARM (speculative pre-fetching).
We need to mark ASYNC_TX_DMA as "depends on !ARM" or even "depends on
BROKEN" until we can get this resolved.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 0/8] ARM: mvebu: Add support for RAID6 PQ offloading
  2015-05-13 16:00       ` Dan Williams
  (?)
@ 2015-05-18  9:14         ` Maxime Ripard
  -1 siblings, 0 replies; 79+ messages in thread
From: Maxime Ripard @ 2015-05-18  9:14 UTC (permalink / raw)
  To: Dan Williams
  Cc: Vinod Koul, Gregory Clement, Jason Cooper, Andrew Lunn,
	Sebastian Hesselbarth, dmaengine, linux-arm-kernel, linux-kernel,
	linux-crypto, Lior Amsalem, Thomas Petazzoni, Herbert Xu,
	David S. Miller

Hi Dan,

On Wed, May 13, 2015 at 09:00:46AM -0700, Dan Williams wrote:
> On Wed, May 13, 2015 at 2:17 AM, Maxime Ripard
> <maxime.ripard@free-electrons.com> wrote:
> > Hi Dan,
> >
> > On Tue, May 12, 2015 at 09:05:41AM -0700, Dan Williams wrote:
> >> On Tue, May 12, 2015 at 8:37 AM, Maxime Ripard
> >> <maxime.ripard@free-electrons.com> wrote:
> >> > Hi,
> >> >
> >> > This serie refactors the mv_xor in order to support the latest Armada
> >> > 38x features, including the PQ support in order to offload the RAID6
> >> > PQ operations.
> >> >
> >> > Not all the PQ operations are supported by the XOR engine, so we had
> >> > to introduce new async_tx flags in the process to identify
> >> > un-supported operations.
> >> >
> >> > Please note that this is currently not usable because of a possible
> >> > regression in the RAID stack in 4.1 that is being discussed at the
> >> > moment here: https://lkml.org/lkml/2015/5/7/527
> >>
> >> This is problematic as async_tx is a wart on the dmaengine subsystem
> >> and needs to be deprecated, I just have yet to find the time to do
> >> that work.  It turns out it was a mistake to hide the device details
> >> from md, it should be explicitly managing the dma channels, not
> >> relying on an abstraction api.  The async_tx api usage of the
> >> dma-mapping api is broken in that it relies on overlapping mappings of
> >> the same address.  This happens to work on x86, but on arm it needs
> >> explicit non-overlapping mappings.  I started the work to reference
> >> count dma-mappings in 3.13, and we need to teach md to use
> >> dmaengine_unmap_data explicitly.  Yielding dma channel management to
> >> md also results in a more efficient implementation as we can dma_map()
> >> the stripe cache once rather than per-io.  The  "async_tx_ack()"
> >> disaster can also go away when md is explicitly handling channel
> >> switching.
> >
> > Even though I'd be very much in favor of deprecating / removing
> > async_tx, is it something likely to happen soon?
> 
> Not unless someone else takes it on, I'm actively asking for help.
> 
> > I remember discussing this with Vinod at Plumbers back in October, but
> > haven't seen anything since then.
> 
> Right, "help!" :)
> 
> > If not, I think that we shouldn't really hold back patches to
> > async_tx, even though we know that a year from now, it's going to
> > be gone.
> 
> We definitely should block new usages, because they make a bad
> situation worse.  Russell already warned that the dma_mapping api
> abuse could lead to data corruption on ARM (speculative pre-fetching).
> We need to mark ASYNC_TX_DMA as "depends on !ARM" or even "depends on
> BROKEN" until we can get this resolved.

I'm not sure what exactly the issues are with async_tx and ARM, but
these patches have been tested on ARM and are working quite well.

What I'm doing here is merely using the existing API: I'm not making
anything worse, just using the API that numerous drivers already
use. So I'm not sure it is really reasonable to ask for such a huge
rework (with a huge potential for regressions) before merging my
patches.

Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 0/8] ARM: mvebu: Add support for RAID6 PQ offloading
  2015-05-18  9:14         ` Maxime Ripard
  (?)
@ 2015-05-18 17:06           ` Dan Williams
  -1 siblings, 0 replies; 79+ messages in thread
From: Dan Williams @ 2015-05-18 17:06 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: Vinod Koul, Gregory Clement, Jason Cooper, Andrew Lunn,
	Sebastian Hesselbarth, dmaengine, linux-arm-kernel, linux-kernel,
	linux-crypto, Lior Amsalem, Thomas Petazzoni, Herbert Xu,
	David S. Miller

On Mon, May 18, 2015 at 2:14 AM, Maxime Ripard
<maxime.ripard@free-electrons.com> wrote:
> Hi Dan,
>
> On Wed, May 13, 2015 at 09:00:46AM -0700, Dan Williams wrote:
>> On Wed, May 13, 2015 at 2:17 AM, Maxime Ripard
>> <maxime.ripard@free-electrons.com> wrote:
>> > Hi Dan,
>> >
>> > On Tue, May 12, 2015 at 09:05:41AM -0700, Dan Williams wrote:
>> >> On Tue, May 12, 2015 at 8:37 AM, Maxime Ripard
>> >> <maxime.ripard@free-electrons.com> wrote:
>> >> > Hi,
>> >> >
>> >> > This serie refactors the mv_xor in order to support the latest Armada
>> >> > 38x features, including the PQ support in order to offload the RAID6
>> >> > PQ operations.
>> >> >
>> >> > Not all the PQ operations are supported by the XOR engine, so we had
>> >> > to introduce new async_tx flags in the process to identify
>> >> > un-supported operations.
>> >> >
>> >> > Please note that this is currently not usable because of a possible
>> >> > regression in the RAID stack in 4.1 that is being discussed at the
>> >> > moment here: https://lkml.org/lkml/2015/5/7/527
>> >>
>> >> This is problematic as async_tx is a wart on the dmaengine subsystem
>> >> and needs to be deprecated, I just have yet to find the time to do
>> >> that work.  It turns out it was a mistake to hide the device details
>> >> from md, it should be explicitly managing the dma channels, not
>> >> relying on an abstraction api.  The async_tx api usage of the
>> >> dma-mapping api is broken in that it relies on overlapping mappings of
>> >> the same address.  This happens to work on x86, but on arm it needs
>> >> explicit non-overlapping mappings.  I started the work to reference
>> >> count dma-mappings in 3.13, and we need to teach md to use
>> >> dmaengine_unmap_data explicitly.  Yielding dma channel management to
>> >> md also results in a more efficient implementation as we can dma_map()
>> >> the stripe cache once rather than per-io.  The  "async_tx_ack()"
>> >> disaster can also go away when md is explicitly handling channel
>> >> switching.
>> >
>> > Even though I'd be very much in favor of deprecating / removing
>> > async_tx, is it something likely to happen soon?
>>
>> Not unless someone else takes it on, I'm actively asking for help.
>>
>> > I remember discussing this with Vinod at Plumbers back in October, but
>> > haven't seen anything since then.
>>
>> Right, "help!" :)
>>
>> > If not, I think that we shouldn't really hold back patches to
>> > async_tx, even though we know that a year from now, it's going to
>> > be gone.
>>
>> We definitely should block new usages, because they make a bad
>> situation worse.  Russell already warned that the dma_mapping api
>> abuse could lead to data corruption on ARM (speculative pre-fetching).
>> We need to mark ASYNC_TX_DMA as "depends on !ARM" or even "depends on
>> BROKEN" until we can get this resolved.
>
> I'm not sure what exactly the issues are with async_tx and ARM, but
> these patches have been tested on ARM and are working quite well.

https://lkml.org/lkml/2011/7/8/363

> What I'm doing here is merely using the existing API: I'm not making
> anything worse, just using the API that numerous drivers already
> use. So I'm not sure it is really reasonable to ask for such a huge
> rework (with a huge potential for regressions) before merging my
> patches.

It happens.

https://lwn.net/Articles/641443/

I'm not happy about not having had the time to do this rework myself.
Linux is better off with this api deprecated.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 0/8] ARM: mvebu: Add support for RAID6 PQ offloading
  2015-05-18 17:06           ` Dan Williams
  (?)
@ 2015-05-26  9:45             ` Maxime Ripard
  -1 siblings, 0 replies; 79+ messages in thread
From: Maxime Ripard @ 2015-05-26  9:45 UTC (permalink / raw)
  To: Dan Williams
  Cc: Lior Amsalem, Andrew Lunn, Jason Cooper, Vinod Koul,
	linux-kernel, dmaengine, linux-crypto, Gregory Clement,
	Herbert Xu, Thomas Petazzoni, David S. Miller, linux-arm-kernel,
	Sebastian Hesselbarth


[-- Attachment #1.1: Type: text/plain, Size: 4315 bytes --]

On Mon, May 18, 2015 at 10:06:55AM -0700, Dan Williams wrote:
> On Mon, May 18, 2015 at 2:14 AM, Maxime Ripard
> <maxime.ripard@free-electrons.com> wrote:
> > Hi Dan,
> >
> > On Wed, May 13, 2015 at 09:00:46AM -0700, Dan Williams wrote:
> >> On Wed, May 13, 2015 at 2:17 AM, Maxime Ripard
> >> <maxime.ripard@free-electrons.com> wrote:
> >> > Hi Dan,
> >> >
> >> > On Tue, May 12, 2015 at 09:05:41AM -0700, Dan Williams wrote:
> >> >> On Tue, May 12, 2015 at 8:37 AM, Maxime Ripard
> >> >> <maxime.ripard@free-electrons.com> wrote:
> >> >> > Hi,
> >> >> >
> >> >> > This serie refactors the mv_xor in order to support the latest Armada
> >> >> > 38x features, including the PQ support in order to offload the RAID6
> >> >> > PQ operations.
> >> >> >
> >> >> > Not all the PQ operations are supported by the XOR engine, so we had
> >> >> > to introduce new async_tx flags in the process to identify
> >> >> > un-supported operations.
> >> >> >
> >> >> > Please note that this is currently not usable because of a possible
> >> >> > regression in the RAID stack in 4.1 that is being discussed at the
> >> >> > moment here: https://lkml.org/lkml/2015/5/7/527
> >> >>
> >> >> This is problematic as async_tx is a wart on the dmaengine subsystem
> >> >> and needs to be deprecated, I just have yet to find the time to do
> >> >> that work.  It turns out it was a mistake to hide the device details
> >> >> from md, it should be explicitly managing the dma channels, not
> >> >> relying on a abstraction api.  The async_tx api usage of the
> >> >> dma-mapping api is broken in that it relies on overlapping mappings of
> >> >> the same address.  This happens to work on x86, but on arm it needs
> >> >> explicit non-overlapping mappings.  I started the work to reference
> >> >> count dma-mappings in 3.13, and we need to teach md to use
> >> >> dmaengine_unmap_data explicitly.  Yielding dma channel management to
> >> >> md also results in a more efficient implementation as we can dma_map()
> >> >> the stripe cache once rather than per-io.  The  "async_tx_ack()"
> >> >> disaster can also go away when md is explicitly handling channel
> >> >> switching.
> >> >
> >> > Even though I'd be very much in favor of deprecating / removing
> >> > async_tx, is it something likely to happen soon?
> >>
> >> Not unless someone else takes it on, I'm actively asking for help.
> >>
> >> > I remember discussing this with Vinod at Plumbers back in October, but
> >> > haven't seen anything since then.
> >>
> >> Right, "help!" :)
> >>
> >> > If not, I think that we shouldn't really hold back patches to
> >> > async_tx, even though we know than in a year from now, it's going to
> >> > be gone.
> >>
> >> We definitely should block new usages, because they make a bad
> >> situation worse.  Russell already warned that the dma_mapping api
> >> abuse could lead to data corruption on ARM (speculative pre-fetching).
> >> We need to mark ASYNC_TX_DMA as "depends on !ARM" or even "depends on
> >> BROKEN" until we can get this resolved.
> >
> > I'm not sure what the issues exactly are with async_tx and ARM, but
> > these patches have been tested on ARM and are working quite well.
> 
> https://lkml.org/lkml/2011/7/8/363
> 
> > What I'm doing here is merely using the existing API, I'm not making
> > it worse, just using the API that is used by numerous drivers
> > already. So I'm not sure this is really reasonable to ask for such a
> > huge rework (with a huge potential of regressions) before merging my
> > patches.
> 
> It happens.
> 
> https://lwn.net/Articles/641443/

It really depends on what you mean by "help". If you mean "undertake
all by yourself the removal of async tx", then no, sorry, I won't,
especially when you ask to do that for a patch that just enables a
feature of an API already used on that platform.

If you mean, "give me a hand, you can start there", then yeah, I can
do that.

> I'm not happy about not having had the time to do this rework myself.
> Linux is better off with this api deprecated.

You're not talking about deprecating it, you're talking about removing
it entirely.

Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

[-- Attachment #1.2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 0/8] ARM: mvebu: Add support for RAID6 PQ offloading
  2015-05-26  9:45             ` Maxime Ripard
  (?)
@ 2015-05-26 16:31               ` Dan Williams
  -1 siblings, 0 replies; 79+ messages in thread
From: Dan Williams @ 2015-05-26 16:31 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: Vinod Koul, Gregory Clement, Jason Cooper, Andrew Lunn,
	Sebastian Hesselbarth, dmaengine, linux-arm-kernel, linux-kernel,
	linux-crypto, Lior Amsalem, Thomas Petazzoni, Herbert Xu,
	David S. Miller, Dave Jiang, Boaz Harrosh

[ adding Boaz as this discussion has implications for ore_raid ]

On Tue, May 26, 2015 at 2:45 AM, Maxime Ripard
<maxime.ripard@free-electrons.com> wrote:
> On Mon, May 18, 2015 at 10:06:55AM -0700, Dan Williams wrote:
>> On Mon, May 18, 2015 at 2:14 AM, Maxime Ripard
>> <maxime.ripard@free-electrons.com> wrote:
>> > Hi Dan,
>> >
>> > On Wed, May 13, 2015 at 09:00:46AM -0700, Dan Williams wrote:
>> >> On Wed, May 13, 2015 at 2:17 AM, Maxime Ripard
>> >> <maxime.ripard@free-electrons.com> wrote:
>> >> > Hi Dan,
>> >> >
>> >> > On Tue, May 12, 2015 at 09:05:41AM -0700, Dan Williams wrote:
>> >> >> On Tue, May 12, 2015 at 8:37 AM, Maxime Ripard
>> >> >> <maxime.ripard@free-electrons.com> wrote:
>> >> >> > Hi,
>> >> >> >
>> >> >> > This serie refactors the mv_xor in order to support the latest Armada
>> >> >> > 38x features, including the PQ support in order to offload the RAID6
>> >> >> > PQ operations.
>> >> >> >
>> >> >> > Not all the PQ operations are supported by the XOR engine, so we had
>> >> >> > to introduce new async_tx flags in the process to identify
>> >> >> > un-supported operations.
>> >> >> >
>> >> >> > Please note that this is currently not usable because of a possible
>> >> >> > regression in the RAID stack in 4.1 that is being discussed at the
>> >> >> > moment here: https://lkml.org/lkml/2015/5/7/527
>> >> >>
>> >> >> This is problematic as async_tx is a wart on the dmaengine subsystem
>> >> >> and needs to be deprecated, I just have yet to find the time to do
>> >> >> that work.  It turns out it was a mistake to hide the device details
>> >> >> from md, it should be explicitly managing the dma channels, not
>> >> >> relying on a abstraction api.  The async_tx api usage of the
>> >> >> dma-mapping api is broken in that it relies on overlapping mappings of
>> >> >> the same address.  This happens to work on x86, but on arm it needs
>> >> >> explicit non-overlapping mappings.  I started the work to reference
>> >> >> count dma-mappings in 3.13, and we need to teach md to use
>> >> >> dmaengine_unmap_data explicitly.  Yielding dma channel management to
>> >> >> md also results in a more efficient implementation as we can dma_map()
>> >> >> the stripe cache once rather than per-io.  The  "async_tx_ack()"
>> >> >> disaster can also go away when md is explicitly handling channel
>> >> >> switching.
>> >> >
>> >> > Even though I'd be very much in favor of deprecating / removing
>> >> > async_tx, is it something likely to happen soon?
>> >>
>> >> Not unless someone else takes it on, I'm actively asking for help.
>> >>
>> >> > I remember discussing this with Vinod at Plumbers back in October, but
>> >> > haven't seen anything since then.
>> >>
>> >> Right, "help!" :)
>> >>
>> >> > If not, I think that we shouldn't really hold back patches to
>> >> > async_tx, even though we know than in a year from now, it's going to
>> >> > be gone.
>> >>
>> >> We definitely should block new usages, because they make a bad
>> >> situation worse.  Russell already warned that the dma_mapping api
>> >> abuse could lead to data corruption on ARM (speculative pre-fetching).
>> >> We need to mark ASYNC_TX_DMA as "depends on !ARM" or even "depends on
>> >> BROKEN" until we can get this resolved.
>> >
>> > I'm not sure what the issues exactly are with async_tx and ARM, but
>> > these patches have been tested on ARM and are working quite well.
>>
>> https://lkml.org/lkml/2011/7/8/363
>>
>> > What I'm doing here is merely using the existing API, I'm not making
>> > it worse, just using the API that is used by numerous drivers
>> > already. So I'm not sure this is really reasonable to ask for such a
>> > huge rework (with a huge potential of regressions) before merging my
>> > patches.
>>
>> It happens.
>>
>> https://lwn.net/Articles/641443/
>
> It really depends on what you mean by "help". If you mean "undertake
> all by yourself the removal of async tx", then no, sorry, I won't,
> especially when you ask to do that for a patch that just enables a
> feature of an API already used on that platform.

...a potentially broken feature.  Are you sure that this speculative
prefetch problem does not affect your implementation?

> If you mean, "give me a hand, you can start there", then yeah, I can
> do that.
>
>> I'm not happy about not having had the time to do this rework myself.
>> Linux is better off with this api deprecated.
>
> You're not talking about deprecating it, you're talking about removing
> it entirely.

True, and adding more users makes that removal more difficult.  I'm
willing to help out on the design and review for this work, I just
can't commit to doing the implementation and testing.

I think it looks something like this:

At init time the raid456 driver probes for offload resources.  It can
discover several scenarios:

1/ "the ioatdma case": raid channels that have all the necessary
operations (copy, xor/pq, xor/pq check).  In this case we'll never
need to perform a channel switch.  Potentially the cpu never touches
the stripe cache in this case and we can maintain a static dma mapping
for the entire lifespan of a struct stripe_head.

2/ "the channel switch case":  All the necessary offload resources are
available but span multiple devices.  In this case we need to wait for
channel1 to complete an operation before channel2 can start.  This
case is complicated by the fact that different channels may need their
own dma mappings.  In the simplest case channels can share the same
mapping and raid456 needs to wait for channel completions.  I think we
can do a better job than the async_tx api here as raid456 should
probably poll for completions after each stripe processing batch.
Taking an interrupt per channel-switch event seems like excessive
overhead.

3/ "the co-op case": We have a xor/pq offload resource, but copy and
check operations require the cpu to touch the stripe cache.  In this
case we need to use the dma_sync_*_for_cpu()/dma_sync_*_for_device()
to pass buffers back and forth between device and cpu ownership.  This
shares some of the complexity of waiting for completions with scenario
2.
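
To make scenario 3 concrete, here is a minimal sketch of the
buffer-ownership hand-off it implies, assuming one long-lived mapping
per stripe buffer; coop_check() and cpu_check_stripe() are
illustrative names, not existing raid456 code:

#include <linux/dma-mapping.h>

/* hypothetical cpu-side check step that touches the stripe cache */
static void cpu_check_stripe(void *buf, size_t len);

static void coop_check(struct device *dev, void *stripe_buf,
		       dma_addr_t dma, size_t len)
{
	/* the xor/pq engine is done: hand the buffer to the cpu */
	dma_sync_single_for_cpu(dev, dma, len, DMA_BIDIRECTIONAL);

	cpu_check_stripe(stripe_buf, len);

	/* return ownership to the device for the next offloaded op */
	dma_sync_single_for_device(dev, dma, len, DMA_BIDIRECTIONAL);
}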

Which scenario does your implementation fall into?  Maybe we can focus
on that one and leave the other scenarios for other dmaengine
maintainers to jump in and implement?

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 0/8] ARM: mvebu: Add support for RAID6 PQ offloading
  2015-05-26 16:31               ` Dan Williams
  (?)
@ 2015-05-27 11:52                 ` Boaz Harrosh
  -1 siblings, 0 replies; 79+ messages in thread
From: Boaz Harrosh @ 2015-05-27 11:52 UTC (permalink / raw)
  To: Dan Williams, Maxime Ripard
  Cc: Vinod Koul, Gregory Clement, Jason Cooper, Andrew Lunn,
	Sebastian Hesselbarth, dmaengine, linux-arm-kernel, linux-kernel,
	linux-crypto, Lior Amsalem, Thomas Petazzoni, Herbert Xu,
	David S. Miller, Dave Jiang, Boaz Harrosh

On 05/26/2015 07:31 PM, Dan Williams wrote:
> [ adding Boaz as this discussion has implications for ore_raid ]
> 
<>
>> You're not talking about deprecating it, you're talking about removing
>> it entirely.
> 
> True, and adding more users makes that removal more difficult.  I'm
> willing to help out on the design and review for this work, I just
> can't commit to doing the implementation and testing.
> 
<>

Hi

So for ore_raid, yes, it uses both the xor and pq functions, and I
expect that to keep working after the API changes.

That said, I never really cared for the HW offload engines behind
these APIs. Actually, I never met any. On a modern machine the
DCE/MMX implementation, or one of the other CPU variants, always
kicked in. In preliminary testing of XOR I got almost memory speed
(read n pages + write one), so with multi-core CPUs I fail to see how
HW could do better, memory caching and all. The PQ path was not far
behind.

All I need is an abstract API that gives me the best implementation
on any ARCH / configuration. Actually, the async_tx API is a pain,
and a synchronous API would make things simpler. I do not use the
concurrent submit-now, wait-later model at all: I submit, then wait.
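
For reference, a minimal sketch of that submit-then-wait pattern
against the current async_tx API (xor_one_block() is an illustrative
name; flags and error handling trimmed):

#include <linux/async_tx.h>

static void xor_one_block(struct page *dest, struct page **srcs,
			  int src_cnt, size_t len)
{
	struct async_submit_ctl submit;
	struct dma_async_tx_descriptor *tx;

	init_async_submit(&submit, ASYNC_TX_XOR_ZERO_DST,
			  NULL, NULL, NULL, NULL);
	tx = async_xor(dest, srcs, 0, src_cnt, len, &submit);

	/* no overlapped submissions: block until the xor completes */
	async_tx_quiesce(&tx);
}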

So whatever you change this to is good with me, as long as you keep
the wonderful dce implementation and the code keeps running after the
new API lands.

(The common structures between XOR and PQ were also nice, but I can
use a union instead; it's always either/or in ore_raid.)

Once you make the API changes and modify the code, CC me and I'll
run tests.

good luck
Boaz

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 0/8] ARM: mvebu: Add support for RAID6 PQ offloading
  2015-05-26 16:31               ` Dan Williams
  (?)
@ 2015-06-02 14:41                 ` Maxime Ripard
  -1 siblings, 0 replies; 79+ messages in thread
From: Maxime Ripard @ 2015-06-02 14:41 UTC (permalink / raw)
  To: Dan Williams
  Cc: Vinod Koul, Gregory Clement, Jason Cooper, Andrew Lunn,
	Sebastian Hesselbarth, dmaengine, linux-arm-kernel, linux-kernel,
	linux-crypto, Lior Amsalem, Thomas Petazzoni, Herbert Xu,
	David S. Miller, Dave Jiang, Boaz Harrosh

[-- Attachment #1: Type: text/plain, Size: 2671 bytes --]

On Tue, May 26, 2015 at 09:31:03AM -0700, Dan Williams wrote:
> > If you mean, "give me a hand, you can start there", then yeah, I can
> > do that.
> >
> >> I'm not happy about not having had the time to do this rework myself.
> >> Linux is better off with this api deprecated.
> >
> > You're not talking about deprecating it, you're talking about removing
> > it entirely.
> 
> True, and adding more users makes that removal more difficult.  I'm
> willing to help out on the design and review for this work, I just
> can't commit to doing the implementation and testing.
> 
> I think it looks something like this:
> 
> At init time the raid456 driver probes for offload resources.  It can
> discover several scenarios:
> 
> 1/ "the ioatdma case": raid channels that have all the necessary
> operations (copy, xor/pq, xor/pq check).  In this case we'll never
> need to perform a channel switch.  Potentially the cpu never touches
> the stripe cache in this case and we can maintain a static dma mapping
> for the entire lifespan of a struct stripe_head.
> 
> 2/ "the channel switch case":  All the necessary offload resources are
> available but span multiple devices.  In this case we need to wait for
> channel1 to complete an operation before channel2 can start.  This
> case is complicated by the fact that different channels may need their
> own dma mappings.  In the simplest case channels can share the same
> mapping and raid456 needs to wait for channel completions.  I think we
> can do a better job than the async_tx api here as raid456 should
> probably poll for completions after each stripe processing batch.
> Taking an interrupt per channel-switch event seems like excessive
> overhead.
> 
> 3/ "the co-op case": We have a xor/pq offload resource, but copy and
> check operations require the cpu to touch the stripe cache.  In this
> case we need to use the dma_sync_*_for_cpu()/dma_sync_*_for_device()
> to pass buffers back and forth between device and cpu ownership.  This
> shares some of the complexity of waiting for completions with scenario
> 2.
> 
> Which scenario does your implementation fall into?  Maybe we can focus
> on that one and leave the other scenarios for other dmaengine
> maintainers to jump in and implement?

From my limited understanding of RAID and PQ computations, it would be
3 with a twist.

Our hardware controller supports xor and PQ, but the checks and data
recovery are not supported (we're not able to offload async_mult and
async_sum_product).
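
For context, async_mult() and async_sum_product() are the GF(256)
multiply primitives the RAID6 recovery paths are built on: recovering
two data blocks computes Dx = a*Pxy ^ b*Qxy and then Dy = Pxy ^ Dx. A
rough CPU-side sketch of the per-byte math, purely illustrative,
reusing the kernel's raid6_gfmul tables:

#include <linux/raid/pq.h>

/* dp/dq hold P^Pxy and Q^Qxy on entry; a and b are table-derived */
static void recover_2data(u8 *dp, u8 *dq, u8 a, u8 b, size_t len)
{
	while (len--) {
		u8 px = *dp;
		u8 qx = *dq;
		u8 dx = raid6_gfmul[a][px] ^ raid6_gfmul[b][qx];

		*dq++ = dx;		/* recovered Dx */
		*dp++ = px ^ dx;	/* recovered Dy */
	}
}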

Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 79+ messages in thread

end of thread, other threads:[~2015-06-02 14:45 UTC | newest]

Thread overview: 79+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-12 15:37 [PATCH 0/8] ARM: mvebu: Add support for RAID6 PQ offloading Maxime Ripard
2015-05-12 15:37 ` Maxime Ripard
2015-05-12 15:37 ` [PATCH 1/8] dmaengine: mv_xor: Rename function for consistent naming Maxime Ripard
2015-05-12 15:37   ` Maxime Ripard
2015-05-12 15:37 ` [PATCH 2/8] dmaengine: mv_xor: add support for a38x command in descriptor mode Maxime Ripard
2015-05-12 15:37   ` Maxime Ripard
2015-05-12 15:49   ` Arnd Bergmann
2015-05-12 15:49     ` Arnd Bergmann
2015-05-12 15:54     ` Thomas Petazzoni
2015-05-12 15:54       ` Thomas Petazzoni
2015-05-12 16:03       ` Arnd Bergmann
2015-05-12 16:03         ` Arnd Bergmann
2015-05-13  8:15     ` Maxime Ripard
2015-05-13  8:15       ` Maxime Ripard
2015-05-13  8:46       ` Arnd Bergmann
2015-05-13  8:46         ` Arnd Bergmann
2015-05-12 15:49   ` Thomas Petazzoni
2015-05-12 15:49     ` Thomas Petazzoni
2015-05-12 15:58     ` Andrew Lunn
2015-05-12 15:58       ` Andrew Lunn
2015-05-12 16:05       ` Thomas Petazzoni
2015-05-12 16:05         ` Thomas Petazzoni
2015-05-13  8:23         ` Maxime Ripard
2015-05-13  8:23           ` Maxime Ripard
2015-05-12 15:37 ` [PATCH 3/8] dmaengine: mv_xor: Enlarge descriptor pool size Maxime Ripard
2015-05-12 15:37   ` Maxime Ripard
2015-05-12 15:37   ` Maxime Ripard
2015-05-12 15:37 ` [PATCH 4/8] dmaengine: mv_xor: improve descriptors list handling and reduce locking Maxime Ripard
2015-05-12 15:37   ` Maxime Ripard
2015-05-12 15:37 ` [PATCH 5/8] dmaengine: mv_xor: bug fix for racing condition in descriptors cleanup Maxime Ripard
2015-05-12 15:37   ` Maxime Ripard
2015-05-12 15:37   ` Maxime Ripard
2015-05-12 15:51   ` Thomas Petazzoni
2015-05-12 15:51     ` Thomas Petazzoni
2015-05-12 15:37 ` [PATCH 6/8] async_tx: adding mult and sum_product flags Maxime Ripard
2015-05-12 15:37   ` Maxime Ripard
2015-05-12 16:05   ` Andrew Lunn
2015-05-12 16:05     ` Andrew Lunn
2015-05-13  8:45     ` Maxime Ripard
2015-05-13  8:45       ` Maxime Ripard
2015-05-12 15:37 ` [PATCH 7/8] dmaengine: mv_xor: add support for a38x RAID6 support Maxime Ripard
2015-05-12 15:37   ` Maxime Ripard
2015-05-12 15:37 ` [PATCH 8/8] ARM: mvebu: a38x: Enable A38x XOR engine features Maxime Ripard
2015-05-12 15:37   ` Maxime Ripard
2015-05-12 16:13   ` Andrew Lunn
2015-05-12 16:13     ` Andrew Lunn
2015-05-13  7:16     ` Lior Amsalem
2015-05-13  7:16       ` Lior Amsalem
2015-05-13  7:16       ` Lior Amsalem
2015-05-13  8:33       ` Maxime Ripard
2015-05-13  8:33         ` Maxime Ripard
2015-05-13  8:33         ` Maxime Ripard
2015-05-12 16:05 ` [PATCH 0/8] ARM: mvebu: Add support for RAID6 PQ offloading Dan Williams
2015-05-12 16:05   ` Dan Williams
2015-05-12 16:05   ` Dan Williams
2015-05-13  9:17   ` Maxime Ripard
2015-05-13  9:17     ` Maxime Ripard
2015-05-13  9:17     ` Maxime Ripard
2015-05-13 16:00     ` Dan Williams
2015-05-13 16:00       ` Dan Williams
2015-05-13 16:00       ` Dan Williams
2015-05-18  9:14       ` Maxime Ripard
2015-05-18  9:14         ` Maxime Ripard
2015-05-18  9:14         ` Maxime Ripard
2015-05-18 17:06         ` Dan Williams
2015-05-18 17:06           ` Dan Williams
2015-05-18 17:06           ` Dan Williams
2015-05-26  9:45           ` Maxime Ripard
2015-05-26  9:45             ` Maxime Ripard
2015-05-26  9:45             ` Maxime Ripard
2015-05-26 16:31             ` Dan Williams
2015-05-26 16:31               ` Dan Williams
2015-05-26 16:31               ` Dan Williams
2015-05-27 11:52               ` Boaz Harrosh
2015-05-27 11:52                 ` Boaz Harrosh
2015-05-27 11:52                 ` Boaz Harrosh
2015-06-02 14:41               ` Maxime Ripard
2015-06-02 14:41                 ` Maxime Ripard
2015-06-02 14:41                 ` Maxime Ripard
