linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v9] crypto: Add Allwinner Security System crypto accelerator
@ 2015-05-14 12:58 LABBE Corentin
  2015-05-14 12:58 ` [PATCH v9 1/4] ARM: sun7i: dt: Add Security System to A20 SoC DTS LABBE Corentin
                   ` (4 more replies)
  0 siblings, 5 replies; 26+ messages in thread
From: LABBE Corentin @ 2015-05-14 12:58 UTC (permalink / raw)
  To: robh+dt, pawel.moll, mark.rutland, ijc+devicetree, galak,
	maxime.ripard, linux, herbert, davem, akpm, gregkh, mchehab, joe,
	tj, arnd, boris.brezillon
  Cc: devicetree, linux-doc, linux-arm-kernel, linux-kernel,
	linux-crypto, linux-sunxi


Hello

This is the driver for the Security System included in Allwinner SoC A20.
The Security System (SS for short) is a hardware cryptographic accelerator that
support AES/MD5/SHA1/DES/3DES/PRNG algorithms.
It could be found on others Allwinner SoC: 
- A10, A10s, A13, A31 and A33 manual give the same datasheet for SS than A20
- A23 speak about a security system but without precisions
- A80 and A83T datasheet speak about a security system with more functions
  (SHA224/SHA256/RSA/CRC), they will be supported in a separate driver
But I do not have access on any of those hardware, tests are welcome.

This driver currently supports:
- MD5 and SHA1 hash algorithms
- AES block cipher in CBC/ECB mode with 128/196/256bits keys.
- DES and 3DES block cipher in CBC/ECB mode
The driver exposes all those algorithms through the kernel cryptographic API.

The driver support only CPU driven (aka poll mode) transfer mode,
since the DMA engine of the A20 does not have a mainline driver yet.

Changes since v8
- use sg_miter in cipher functions, this clean all kmap stuff (thanks to bbrezillon)
- use GIC in dt interupts
- rename all references (files/dtcompatible) from sunxi to sun4i in prevision of futur A80/A83 driver

Changes since v7
- Add ECB block mode
- split the sunxi_ss_cipher_init in two
- remove sunxi_ss_cipher_encrypt/sunxi_ss_cipher_decrypt
- use now sunxi_ss_block_method_encrypt and sunxi_ss_block_method_decrypt
- Add weak DES key test needed by ECB tests

Changes since v6:
- Use alphabetic ordering for new sections in MAINTAINERS
- Clean remaining PRNG driver header

Changes since v5:
- Hash functions now keep partial hash states in sunxi_ss structures
- Use of spinlock instead of mutex
- Remove the static sunxi_ss structures by using container of
- Add export/import functions
- replace lots of writel by writesl
- replace lots of readl by readsl

Changes since v4:
- Rework all mutex path
- Use ahash_request_ctx() in hash functions
- Major rework of hash functions for solving mutex problems
- Split sunxi_req_ctx in two since ciphers now use struct sunxi_tfm_ctx
- Hash functions now test FIFO space register

Changes since v3:
- Remove all algorithms options from Kconfig, so now only one module is used
- Add the sunxi_ss_cipher function to unify mode calculation
- Remove the sunxi_cipher_exit empty function
- Add some missing mutex_unlock()
- Drop PRNG support, I wait for more comment on its results before re-enabling it.

Changes since v2:
- Fix Makefile and Kconfig for static kernel.

Changes since v1:
- annotate ss->base as __iomem
- regroup all mutex in the ss_ctx structure
- splited driver in 7 modules (core md5 sha1 aes des 3des prng) in sunxi-ss directory
- use dev_exit_p() for .remove
- added missing CRYPTO_BLKCIPHER dep in Kconfig
- use ahash instead of shash
- use ablkcipher instead of blkcipher
- use crypto_rng_ctx instead of crypto_tfm_ctx
- set seed as an u32
- drop useless comment decoration
- drop useless debug
- ss_ctx is now a static pointer and whole structure being allocated
- fix the platform_get_resource/devm_ioremap_resource pattern
- invert getting die id and configuring clock
- set clock value as a const unsigned long
- add MODULE_ALIAS
- use define names more consistency (SS_xxx)
- fix PRNG errors
- respell SS to Security System in DT documentation


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v9 1/4] ARM: sun7i: dt: Add Security System to A20 SoC DTS
  2015-05-14 12:58 [PATCH v9] crypto: Add Allwinner Security System crypto accelerator LABBE Corentin
@ 2015-05-14 12:58 ` LABBE Corentin
  2015-05-15  7:31   ` Maxime Ripard
  2015-05-14 12:58 ` [PATCH v9 2/4] ARM: sun4i: dt: Add DT bindings documentation for SUN4I Security System LABBE Corentin
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 26+ messages in thread
From: LABBE Corentin @ 2015-05-14 12:58 UTC (permalink / raw)
  To: robh+dt, pawel.moll, mark.rutland, ijc+devicetree, galak,
	maxime.ripard, linux, herbert, davem, akpm, gregkh, mchehab, joe,
	tj, arnd, boris.brezillon
  Cc: devicetree, linux-doc, linux-arm-kernel, linux-kernel,
	linux-crypto, linux-sunxi, LABBE Corentin

The Security System is a hardware cryptographic accelerator that support
AES/MD5/SHA1/DES/3DES/PRNG algorithms.
It could be found on many Allwinner SoC.

This patch enable the Security System on the Allwinner A20 SoC Device-tree.

Signed-off-by: LABBE Corentin <clabbe.montjoie@gmail.com>
---
 arch/arm/boot/dts/sun7i-a20.dtsi | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/arm/boot/dts/sun7i-a20.dtsi b/arch/arm/boot/dts/sun7i-a20.dtsi
index fdd1817..9a823ce 100644
--- a/arch/arm/boot/dts/sun7i-a20.dtsi
+++ b/arch/arm/boot/dts/sun7i-a20.dtsi
@@ -679,6 +679,14 @@
 			status = "disabled";
 		};
 
+		crypto: crypto-engine@01c15000 {
+			compatible = "allwinner,sun4i-a20-crypto";
+			reg = <0x01c15000 0x1000>;
+			interrupts = <GIC_SPI 86 IRQ_TYPE_LEVEL_HIGH>;
+			clocks = <&ahb_gates 5>, <&ss_clk>;
+			clock-names = "ahb", "mod";
+		};
+
 		spi2: spi@01c17000 {
 			compatible = "allwinner,sun4i-a10-spi";
 			reg = <0x01c17000 0x1000>;
-- 
2.3.6


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v9 2/4] ARM: sun4i: dt: Add DT bindings documentation for SUN4I Security System
  2015-05-14 12:58 [PATCH v9] crypto: Add Allwinner Security System crypto accelerator LABBE Corentin
  2015-05-14 12:58 ` [PATCH v9 1/4] ARM: sun7i: dt: Add Security System to A20 SoC DTS LABBE Corentin
@ 2015-05-14 12:58 ` LABBE Corentin
  2015-05-14 12:59 ` [PATCH v9 3/4] MAINTAINERS: Add myself as maintainer of Allwinner " LABBE Corentin
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 26+ messages in thread
From: LABBE Corentin @ 2015-05-14 12:58 UTC (permalink / raw)
  To: robh+dt, pawel.moll, mark.rutland, ijc+devicetree, galak,
	maxime.ripard, linux, herbert, davem, akpm, gregkh, mchehab, joe,
	tj, arnd, boris.brezillon
  Cc: devicetree, linux-doc, linux-arm-kernel, linux-kernel,
	linux-crypto, linux-sunxi, LABBE Corentin

This patch adds documentation for Device-Tree bindings for the Security System
cryptographic accelerator driver.

Signed-off-by: LABBE Corentin <clabbe.montjoie@gmail.com>
---
 Documentation/devicetree/bindings/crypto/sun4i-ss.txt | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/crypto/sun4i-ss.txt

diff --git a/Documentation/devicetree/bindings/crypto/sun4i-ss.txt b/Documentation/devicetree/bindings/crypto/sun4i-ss.txt
new file mode 100644
index 0000000..f2192d4
--- /dev/null
+++ b/Documentation/devicetree/bindings/crypto/sun4i-ss.txt
@@ -0,0 +1,19 @@
+* Allwinner Security System found on A20 SoC
+
+Required properties:
+- compatible : Should be "allwinner,sun4i-a20-crypto".
+- reg: Should contain the Security System register location and length.
+- interrupts: Should contain the IRQ line for the Security System.
+- clocks : List of clock specifiers, corresponding to ahb and ss.
+- clock-names : Name of the functional clock, should be
+	* "ahb" : AHB gating clock
+	* "mod" : SS controller clock
+
+Example:
+	crypto: crypto-engine@01c15000 {
+		compatible = "allwinner,sun4i-a20-crypto";
+		reg = <0x01c15000 0x1000>;
+		interrupts = <GIC_SPI 86 IRQ_TYPE_LEVEL_HIGH>;
+		clocks = <&ahb_gates 5>, <&ss_clk>;
+		clock-names = "ahb", "mod";
+	};
-- 
2.3.6


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v9 3/4] MAINTAINERS: Add myself as maintainer of Allwinner Security System
  2015-05-14 12:58 [PATCH v9] crypto: Add Allwinner Security System crypto accelerator LABBE Corentin
  2015-05-14 12:58 ` [PATCH v9 1/4] ARM: sun7i: dt: Add Security System to A20 SoC DTS LABBE Corentin
  2015-05-14 12:58 ` [PATCH v9 2/4] ARM: sun4i: dt: Add DT bindings documentation for SUN4I Security System LABBE Corentin
@ 2015-05-14 12:59 ` LABBE Corentin
  2015-05-14 12:59 ` [PATCH v9 4/4] crypto: Add Allwinner Security System crypto accelerator LABBE Corentin
  2015-05-17  7:45 ` [PATCH v9] " Boris Brezillon
  4 siblings, 0 replies; 26+ messages in thread
From: LABBE Corentin @ 2015-05-14 12:59 UTC (permalink / raw)
  To: robh+dt, pawel.moll, mark.rutland, ijc+devicetree, galak,
	maxime.ripard, linux, herbert, davem, akpm, gregkh, mchehab, joe,
	tj, arnd, boris.brezillon
  Cc: devicetree, linux-doc, linux-arm-kernel, linux-kernel,
	linux-crypto, linux-sunxi, LABBE Corentin

Signed-off-by: LABBE Corentin <clabbe.montjoie@gmail.com>
---
 MAINTAINERS | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 5d87ccb..f2518d8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -555,6 +555,12 @@ S:	Maintained
 F:	Documentation/i2c/busses/i2c-ali1563
 F:	drivers/i2c/busses/i2c-ali1563.c
 
+ALLWINNER SECURITY SYSTEM
+M:	Corentin Labbe <clabbe.montjoie@gmail.com>
+L:	linux-crypto@vger.kernel.org
+S:	Maintained
+F:	drivers/crypto/sunxi-ss/
+
 ALPHA PORT
 M:	Richard Henderson <rth@twiddle.net>
 M:	Ivan Kokshaysky <ink@jurassic.park.msu.ru>
-- 
2.3.6


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v9 4/4] crypto: Add Allwinner Security System crypto accelerator
  2015-05-14 12:58 [PATCH v9] crypto: Add Allwinner Security System crypto accelerator LABBE Corentin
                   ` (2 preceding siblings ...)
  2015-05-14 12:59 ` [PATCH v9 3/4] MAINTAINERS: Add myself as maintainer of Allwinner " LABBE Corentin
@ 2015-05-14 12:59 ` LABBE Corentin
  2015-05-15  6:49   ` Herbert Xu
                     ` (2 more replies)
  2015-05-17  7:45 ` [PATCH v9] " Boris Brezillon
  4 siblings, 3 replies; 26+ messages in thread
From: LABBE Corentin @ 2015-05-14 12:59 UTC (permalink / raw)
  To: robh+dt, pawel.moll, mark.rutland, ijc+devicetree, galak,
	maxime.ripard, linux, herbert, davem, akpm, gregkh, mchehab, joe,
	tj, arnd, boris.brezillon
  Cc: devicetree, linux-doc, linux-arm-kernel, linux-kernel,
	linux-crypto, linux-sunxi, LABBE Corentin

Add support for the Security System included in Allwinner SoC A20.
The Security System is a hardware cryptographic accelerator that support:
- MD5 and SHA1 hash algorithms
- AES block cipher in CBC/ECB mode with 128/196/256bits keys.
- DES and 3DES block cipher in CBC/ECB mode

Signed-off-by: LABBE Corentin <clabbe.montjoie@gmail.com>
---
 drivers/crypto/Kconfig                    |  17 ++
 drivers/crypto/Makefile                   |   1 +
 drivers/crypto/sunxi-ss/Makefile          |   2 +
 drivers/crypto/sunxi-ss/sun4i-ss-cipher.c | 432 +++++++++++++++++++++++++++
 drivers/crypto/sunxi-ss/sun4i-ss-core.c   | 411 +++++++++++++++++++++++++
 drivers/crypto/sunxi-ss/sun4i-ss-hash.c   | 481 ++++++++++++++++++++++++++++++
 drivers/crypto/sunxi-ss/sun4i-ss.h        | 198 ++++++++++++
 7 files changed, 1542 insertions(+)
 create mode 100644 drivers/crypto/sunxi-ss/Makefile
 create mode 100644 drivers/crypto/sunxi-ss/sun4i-ss-cipher.c
 create mode 100644 drivers/crypto/sunxi-ss/sun4i-ss-core.c
 create mode 100644 drivers/crypto/sunxi-ss/sun4i-ss-hash.c
 create mode 100644 drivers/crypto/sunxi-ss/sun4i-ss.h

diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
index 033c0c8..2a5598e 100644
--- a/drivers/crypto/Kconfig
+++ b/drivers/crypto/Kconfig
@@ -459,4 +459,21 @@ config CRYPTO_DEV_IMGTEC_HASH
 	  hardware hash accelerator. Supporting MD5/SHA1/SHA224/SHA256
 	  hashing algorithms.
 
+config CRYPTO_DEV_SUN4I_SS
+	tristate "Support for Allwinner Security System cryptographic accelerator"
+	depends on ARCH_SUNXI
+	select CRYPTO_MD5
+	select CRYPTO_SHA1
+	select CRYPTO_AES
+	select CRYPTO_DES
+	select CRYPTO_BLKCIPHER
+	help
+	  Some Allwinner SoC have a crypto accelerator named
+	  Security System. Select this if you want to use it.
+	  The Security System handle AES/DES/3DES ciphers in CBC mode
+	  and SHA1 and MD5 hash algorithms.
+
+	  To compile this driver as a module, choose M here: the module
+	  will be called sun4i-ss.
+
 endif # CRYPTO_HW
diff --git a/drivers/crypto/Makefile b/drivers/crypto/Makefile
index fb84be7..e843292 100644
--- a/drivers/crypto/Makefile
+++ b/drivers/crypto/Makefile
@@ -27,3 +27,4 @@ obj-$(CONFIG_CRYPTO_DEV_UX500) += ux500/
 obj-$(CONFIG_CRYPTO_DEV_QAT) += qat/
 obj-$(CONFIG_CRYPTO_DEV_QCE) += qce/
 obj-$(CONFIG_CRYPTO_DEV_VMX) += vmx/
+obj-$(CONFIG_CRYPTO_DEV_SUN4I_SS) += sunxi-ss/
diff --git a/drivers/crypto/sunxi-ss/Makefile b/drivers/crypto/sunxi-ss/Makefile
new file mode 100644
index 0000000..8f4c7a2
--- /dev/null
+++ b/drivers/crypto/sunxi-ss/Makefile
@@ -0,0 +1,2 @@
+obj-$(CONFIG_CRYPTO_DEV_SUN4I_SS) += sun4i-ss.o
+sun4i-ss-y += sun4i-ss-core.o sun4i-ss-hash.o sun4i-ss-cipher.o
diff --git a/drivers/crypto/sunxi-ss/sun4i-ss-cipher.c b/drivers/crypto/sunxi-ss/sun4i-ss-cipher.c
new file mode 100644
index 0000000..e331884
--- /dev/null
+++ b/drivers/crypto/sunxi-ss/sun4i-ss-cipher.c
@@ -0,0 +1,432 @@
+/*
+ * sun4i-ss-cipher.c - hardware cryptographic accelerator for Allwinner A20 SoC
+ *
+ * Copyright (C) 2013-2015 Corentin LABBE <clabbe.montjoie@gmail.com>
+ *
+ * This file add support for AES cipher with 128,192,256 bits
+ * keysize in CBC mode.
+ * Add support also for DES and 3DES in CBC mode.
+ *
+ * You could find the datasheet in Documentation/arm/sunxi/README
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+#include "sun4i-ss.h"
+
+/* CBC AES */
+int sun4i_ss_cbc_aes_encrypt(struct ablkcipher_request *areq)
+{
+	struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(areq);
+	struct sun4i_tfm_ctx *op = crypto_ablkcipher_ctx(tfm);
+	u32 mode;
+
+	mode = SS_OP_AES | SS_CBC | SS_ENABLED | SS_ENCRYPTION | op->keymode;
+	return sun4i_ss_aes_poll(areq, mode);
+}
+
+int sun4i_ss_cbc_aes_decrypt(struct ablkcipher_request *areq)
+{
+	struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(areq);
+	struct sun4i_tfm_ctx *op = crypto_ablkcipher_ctx(tfm);
+	u32 mode;
+
+	mode = SS_OP_AES | SS_CBC | SS_ENABLED | SS_DECRYPTION | op->keymode;
+	return sun4i_ss_aes_poll(areq, mode);
+}
+
+/* ECB AES */
+int sun4i_ss_ecb_aes_encrypt(struct ablkcipher_request *areq)
+{
+	struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(areq);
+	struct sun4i_tfm_ctx *op = crypto_ablkcipher_ctx(tfm);
+	u32 mode;
+
+	mode = SS_OP_AES | SS_ECB | SS_ENABLED | SS_ENCRYPTION | op->keymode;
+	return sun4i_ss_aes_poll(areq, mode);
+}
+
+int sun4i_ss_ecb_aes_decrypt(struct ablkcipher_request *areq)
+{
+	struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(areq);
+	struct sun4i_tfm_ctx *op = crypto_ablkcipher_ctx(tfm);
+	u32 mode;
+
+	mode = SS_OP_AES | SS_ECB | SS_ENABLED | SS_DECRYPTION | op->keymode;
+	return sun4i_ss_aes_poll(areq, mode);
+}
+
+/* CBC DES */
+int sun4i_ss_cbc_des_encrypt(struct ablkcipher_request *areq)
+{
+	struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(areq);
+	struct sun4i_tfm_ctx *op = crypto_ablkcipher_ctx(tfm);
+	u32 mode;
+
+	mode = SS_OP_DES | SS_CBC | SS_ENABLED | SS_ENCRYPTION | op->keymode;
+	return sun4i_ss_des_poll(areq, mode);
+}
+
+int sun4i_ss_cbc_des_decrypt(struct ablkcipher_request *areq)
+{
+	struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(areq);
+	struct sun4i_tfm_ctx *op = crypto_ablkcipher_ctx(tfm);
+	u32 mode;
+
+	mode = SS_OP_DES | SS_CBC | SS_ENABLED | SS_DECRYPTION | op->keymode;
+	return sun4i_ss_des_poll(areq, mode);
+}
+
+/* ECB DES */
+int sun4i_ss_ecb_des_encrypt(struct ablkcipher_request *areq)
+{
+	struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(areq);
+	struct sun4i_tfm_ctx *op = crypto_ablkcipher_ctx(tfm);
+	u32 mode;
+
+	mode = SS_OP_DES | SS_ECB | SS_ENABLED | SS_ENCRYPTION | op->keymode;
+	return sun4i_ss_des_poll(areq, mode);
+}
+
+int sun4i_ss_ecb_des_decrypt(struct ablkcipher_request *areq)
+{
+	struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(areq);
+	struct sun4i_tfm_ctx *op = crypto_ablkcipher_ctx(tfm);
+	u32 mode;
+
+	mode = SS_OP_DES | SS_ECB | SS_ENABLED | SS_DECRYPTION | op->keymode;
+	return sun4i_ss_des_poll(areq, mode);
+}
+
+/* CBC 3DES */
+int sun4i_ss_cbc_des3_encrypt(struct ablkcipher_request *areq)
+{
+	struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(areq);
+	struct sun4i_tfm_ctx *op = crypto_ablkcipher_ctx(tfm);
+	u32 mode;
+
+	mode = SS_OP_3DES | SS_CBC | SS_ENABLED | SS_ENCRYPTION | op->keymode;
+	return sun4i_ss_des_poll(areq, mode);
+}
+
+int sun4i_ss_cbc_des3_decrypt(struct ablkcipher_request *areq)
+{
+	struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(areq);
+	struct sun4i_tfm_ctx *op = crypto_ablkcipher_ctx(tfm);
+	u32 mode;
+
+	mode = SS_OP_3DES | SS_CBC | SS_ENABLED | SS_DECRYPTION | op->keymode;
+	return sun4i_ss_des_poll(areq, mode);
+}
+
+/* ECB 3DES */
+int sun4i_ss_ecb_des3_encrypt(struct ablkcipher_request *areq)
+{
+	struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(areq);
+	struct sun4i_tfm_ctx *op = crypto_ablkcipher_ctx(tfm);
+	u32 mode;
+
+	mode = SS_OP_3DES | SS_ECB | SS_ENABLED | SS_ENCRYPTION | op->keymode;
+	return sun4i_ss_des_poll(areq, mode);
+}
+
+int sun4i_ss_ecb_des3_decrypt(struct ablkcipher_request *areq)
+{
+	struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(areq);
+	struct sun4i_tfm_ctx *op = crypto_ablkcipher_ctx(tfm);
+	u32 mode;
+
+	mode = SS_OP_3DES | SS_ECB | SS_ENABLED | SS_DECRYPTION | op->keymode;
+	return sun4i_ss_des_poll(areq, mode);
+}
+
+int sun4i_ss_cipher_init(struct crypto_tfm *tfm)
+{
+	struct sun4i_tfm_ctx *op = crypto_tfm_ctx(tfm);
+	struct crypto_alg *alg = tfm->__crt_alg;
+	struct sun4i_ss_alg_template *algt;
+
+	memset(op, 0, sizeof(struct sun4i_tfm_ctx));
+
+	algt = container_of(alg, struct sun4i_ss_alg_template, alg.crypto);
+	op->ss = algt->ss;
+
+	return 0;
+}
+
+int sun4i_ss_cipher_des_init(struct crypto_tfm *tfm)
+{
+	const char *name = crypto_tfm_alg_name(tfm);
+	struct sun4i_tfm_ctx *op = crypto_tfm_ctx(tfm);
+	struct crypto_alg *alg = tfm->__crt_alg;
+	struct sun4i_ss_alg_template *algt;
+	struct sun4i_ss_ctx *ss;
+
+	memset(op, 0, sizeof(struct sun4i_tfm_ctx));
+
+	algt = container_of(alg, struct sun4i_ss_alg_template, alg.crypto);
+	ss = algt->ss;
+	op->ss = algt->ss;
+
+	op->fallback = crypto_alloc_ablkcipher(name, 0,
+			CRYPTO_ALG_ASYNC | CRYPTO_ALG_NEED_FALLBACK);
+	if (IS_ERR(op->fallback)) {
+		dev_err(ss->dev, "ERROR: allocating fallback algo %s\n", name);
+		return PTR_ERR(op->fallback);
+	}
+	return 0;
+}
+
+void sun4i_ss_cipher_exit(struct crypto_tfm *tfm)
+{
+	struct sun4i_tfm_ctx *ctx = crypto_tfm_ctx(tfm);
+
+	if (ctx->fallback)
+		crypto_free_ablkcipher(ctx->fallback);
+	ctx->fallback = NULL;
+}
+
+int sun4i_ss_aes_poll(struct ablkcipher_request *areq, u32 mode)
+{
+	u32 spaces;
+	struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(areq);
+	struct sun4i_tfm_ctx *op = crypto_ablkcipher_ctx(tfm);
+	struct sun4i_ss_ctx *ss = op->ss;
+	unsigned int ivsize = crypto_ablkcipher_ivsize(tfm);
+	/* when activating SS, the default FIFO space is 32 */
+	u32 rx_cnt = 32;
+	u32 tx_cnt = 0;
+	u32 v;
+	int i, sgnum, err = 0;
+	unsigned int ileft = areq->nbytes;
+	unsigned int oleft = areq->nbytes;
+	unsigned int todo, miter_flag;
+	unsigned long flags;
+	struct sg_mapping_iter mi, mo;
+	unsigned int oi, oo; /* offset for in and out */
+
+	if (areq->nbytes == 0)
+		return 0;
+
+	if (areq->info == NULL) {
+		dev_err(ss->dev, "ERROR: Empty IV\n");
+		return -EINVAL;
+	}
+
+	if (areq->src == NULL || areq->dst == NULL) {
+		dev_err(ss->dev, "ERROR: Some SGs are NULL\n");
+		return -EINVAL;
+	}
+
+	spin_lock_irqsave(&ss->slock, flags);
+
+	for (i = 0; i < op->keylen; i += 4)
+		writel(*(op->key + i / 4), ss->base + SS_KEY0 + i);
+
+	if (areq->info != NULL) {
+		for (i = 0; i < 4 && i < ivsize / 4; i++) {
+			v = *(u32 *)(areq->info + i * 4);
+			writel(v, ss->base + SS_IV0 + i * 4);
+		}
+	}
+	writel(mode, ss->base + SS_CTL);
+
+	sgnum = sg_nents(areq->src);
+	if (sgnum == 1)
+		miter_flag = SG_MITER_FROM_SG | SG_MITER_ATOMIC;
+	else
+		miter_flag = SG_MITER_FROM_SG;
+	sg_miter_start(&mi, areq->src, sgnum, miter_flag);
+	sgnum = sg_nents(areq->dst);
+	if (sgnum == 1)
+		miter_flag = SG_MITER_TO_SG | SG_MITER_ATOMIC;
+	else
+		miter_flag = SG_MITER_TO_SG;
+	sg_miter_start(&mo, areq->dst, sgnum, miter_flag);
+	sg_miter_next(&mi);
+	sg_miter_next(&mo);
+	if (mi.addr == NULL || mo.addr == NULL) {
+		err = -EINVAL;
+		goto release_ss;
+	}
+
+	ileft = areq->nbytes / 4;
+	oleft = areq->nbytes / 4;
+	oi = 0;
+	oo = 0;
+	do {
+		todo = min3(rx_cnt, ileft, (mi.length - oi) / 4);
+		if (todo > 0) {
+			ileft -= todo;
+			writesl(ss->base + SS_RXFIFO, mi.addr + oi, todo);
+			oi += todo * 4;
+		}
+		if (oi == mi.length) {
+			sg_miter_next(&mi);
+			oi = 0;
+		}
+
+		spaces = readl_relaxed(ss->base + SS_FCSR);
+		rx_cnt = SS_RXFIFO_SPACES(spaces);
+		tx_cnt = SS_TXFIFO_SPACES(spaces);
+
+		todo = min3(tx_cnt, oleft, (mo.length - oo) / 4);
+		if (todo > 0) {
+			oleft -= todo;
+			readsl(ss->base + SS_TXFIFO, mo.addr + oo, todo);
+			oo += todo * 4;
+		}
+		if (oo == mo.length) {
+			sg_miter_next(&mo);
+			oo = 0;
+		}
+	} while (mo.length > 0);
+
+release_ss:
+	sg_miter_stop(&mi);
+	sg_miter_stop(&mo);
+	writel_relaxed(0, ss->base + SS_CTL);
+	spin_unlock_irqrestore(&ss->slock, flags);
+	return err;
+}
+
+/* Pure CPU driven way of doing DES/3DES with SS */
+int sun4i_ss_des_poll(struct ablkcipher_request *areq, u32 mode)
+{
+	struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(areq);
+	struct sun4i_tfm_ctx *op = crypto_ablkcipher_ctx(tfm);
+	struct sun4i_ss_ctx *ss = op->ss;
+	int i, err = 0;
+	int no_chunk = 1;
+	struct scatterlist *in_sg = areq->src;
+	struct scatterlist *out_sg = areq->dst;
+	u8 kkey[256 / 8];
+
+	if (areq->nbytes == 0)
+		return 0;
+
+	if (areq->info == NULL) {
+		dev_err(ss->dev, "ERROR: Empty IV\n");
+		return -EINVAL;
+	}
+
+	if (areq->src == NULL || areq->dst == NULL) {
+		dev_err(ss->dev, "ERROR: Some SGs are NULL\n");
+		return -EINVAL;
+	}
+
+	/*
+	 * if we have only SGs with size multiple of 4,
+	 * we can use the SS AES function
+	 */
+	while (in_sg != NULL && no_chunk == 1) {
+		if ((in_sg->length % 4) != 0)
+			no_chunk = 0;
+		in_sg = sg_next(in_sg);
+	}
+	while (out_sg != NULL && no_chunk == 1) {
+		if ((out_sg->length % 4) != 0)
+			no_chunk = 0;
+		out_sg = sg_next(out_sg);
+	}
+
+	if (no_chunk == 1)
+		return sun4i_ss_aes_poll(areq, mode);
+
+	/*
+	 * if some SG are not multiple of 4bytes use a fallback
+	 * it is much easy and clean
+	 */
+	ablkcipher_request_set_tfm(areq, op->fallback);
+	for (i = 0; i < op->keylen; i++)
+		*(u32 *)(kkey + i * 4) = op->key[i];
+
+	err = crypto_ablkcipher_setkey(op->fallback, kkey, op->keylen);
+	if (err != 0) {
+		dev_err(ss->dev, "Cannot set key on fallback\n");
+		return -EINVAL;
+	}
+
+	if ((mode & SS_DECRYPTION) == SS_DECRYPTION)
+		err = crypto_ablkcipher_decrypt(areq);
+	else
+		err = crypto_ablkcipher_encrypt(areq);
+	ablkcipher_request_set_tfm(areq, tfm);
+	return err;
+}
+
+/* check and set the AES key, prepare the mode to be used */
+int sun4i_ss_aes_setkey(struct crypto_ablkcipher *tfm, const u8 *key,
+		unsigned int keylen)
+{
+	struct sun4i_tfm_ctx *op = crypto_ablkcipher_ctx(tfm);
+	struct sun4i_ss_ctx *ss = op->ss;
+
+	switch (keylen) {
+	case 128 / 8:
+		op->keymode = SS_AES_128BITS;
+		break;
+	case 192 / 8:
+		op->keymode = SS_AES_192BITS;
+		break;
+	case 256 / 8:
+		op->keymode = SS_AES_256BITS;
+		break;
+	default:
+		dev_err(ss->dev, "ERROR: Invalid keylen %u\n", keylen);
+		crypto_ablkcipher_set_flags(tfm, CRYPTO_TFM_RES_BAD_KEY_LEN);
+		return -EINVAL;
+	}
+	op->keylen = keylen;
+	memcpy(op->key, key, keylen);
+	return 0;
+}
+
+/* check and set the DES key, prepare the mode to be used */
+int sun4i_ss_des_setkey(struct crypto_ablkcipher *tfm, const u8 *key,
+		unsigned int keylen)
+{
+	struct sun4i_tfm_ctx *op = crypto_ablkcipher_ctx(tfm);
+	struct sun4i_ss_ctx *ss = op->ss;
+	u32 flags;
+	u32 tmp[DES_EXPKEY_WORDS];
+	int ret;
+
+	if (unlikely(keylen != DES_KEY_SIZE)) {
+		dev_err(ss->dev, "Invalid keylen %u\n", keylen);
+		crypto_ablkcipher_set_flags(tfm, CRYPTO_TFM_RES_BAD_KEY_LEN);
+		return -EINVAL;
+	}
+
+	flags = crypto_ablkcipher_get_flags(tfm);
+
+	ret = des_ekey(tmp, key);
+	if (unlikely(ret == 0) && (flags & CRYPTO_TFM_REQ_WEAK_KEY)) {
+		crypto_ablkcipher_set_flags(tfm, CRYPTO_TFM_RES_WEAK_KEY);
+		dev_dbg(ss->dev, "Weak key %u\n", keylen);
+		return -EINVAL;
+	}
+
+	op->keylen = keylen;
+	memcpy(op->key, key, keylen);
+	return 0;
+}
+
+/* check and set the 3DES key, prepare the mode to be used */
+int sun4i_ss_des3_setkey(struct crypto_ablkcipher *tfm, const u8 *key,
+		unsigned int keylen)
+{
+	struct sun4i_tfm_ctx *op = crypto_ablkcipher_ctx(tfm);
+	struct sun4i_ss_ctx *ss = op->ss;
+
+	if (unlikely(keylen != 3 * DES_KEY_SIZE)) {
+		dev_err(ss->dev, "Invalid keylen %u\n", keylen);
+		crypto_ablkcipher_set_flags(tfm, CRYPTO_TFM_RES_BAD_KEY_LEN);
+		return -EINVAL;
+	}
+	op->keylen = keylen;
+	memcpy(op->key, key, keylen);
+	return 0;
+}
diff --git a/drivers/crypto/sunxi-ss/sun4i-ss-core.c b/drivers/crypto/sunxi-ss/sun4i-ss-core.c
new file mode 100644
index 0000000..f9a2e5c
--- /dev/null
+++ b/drivers/crypto/sunxi-ss/sun4i-ss-core.c
@@ -0,0 +1,411 @@
+/*
+ * sun4i-ss-core.c - hardware cryptographic accelerator for Allwinner A20 SoC
+ *
+ * Copyright (C) 2013-2015 Corentin LABBE <clabbe.montjoie@gmail.com>
+ *
+ * Core file which registers crypto algorithms supported by the SS.
+ *
+ * You could find a link for the datasheet in Documentation/arm/sunxi/README
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+#include <linux/clk.h>
+#include <linux/crypto.h>
+#include <linux/io.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/platform_device.h>
+#include <crypto/scatterwalk.h>
+#include <linux/scatterlist.h>
+#include <linux/interrupt.h>
+#include <linux/delay.h>
+
+#include "sun4i-ss.h"
+
+static struct sun4i_ss_alg_template driver_algs[] = {
+{       .type = CRYPTO_ALG_TYPE_AHASH,
+	.alg.hash = {
+		.init = sun4i_hash_init,
+		.update = sun4i_hash_update,
+		.final = sun4i_hash_final,
+		.finup = sun4i_hash_finup,
+		.digest = sun4i_hash_digest,
+		.export = sun4i_hash_export,
+		.import = sun4i_hash_import,
+		.halg = {
+			.digestsize = MD5_DIGEST_SIZE,
+			.base = {
+				.cra_name = "md5",
+				.cra_driver_name = "md5-sun4i-ss",
+				.cra_priority = 300,
+				.cra_alignmask = 3,
+				.cra_flags = CRYPTO_ALG_TYPE_AHASH |
+					CRYPTO_ALG_ASYNC,
+				.cra_blocksize = MD5_HMAC_BLOCK_SIZE,
+				.cra_ctxsize = sizeof(struct sun4i_req_ctx),
+				.cra_module = THIS_MODULE,
+				.cra_type = &crypto_ahash_type,
+				.cra_init = sun4i_hash_crainit
+			}
+		}
+	}
+},
+{       .type = CRYPTO_ALG_TYPE_AHASH,
+	.alg.hash = {
+		.init = sun4i_hash_init,
+		.update = sun4i_hash_update,
+		.final = sun4i_hash_final,
+		.finup = sun4i_hash_finup,
+		.digest = sun4i_hash_digest,
+		.export = sun4i_hash_export,
+		.import = sun4i_hash_import,
+		.halg = {
+			.digestsize = SHA1_DIGEST_SIZE,
+			.base = {
+				.cra_name = "sha1",
+				.cra_driver_name = "sha1-sun4i-ss",
+				.cra_priority = 300,
+				.cra_alignmask = 3,
+				.cra_flags = CRYPTO_ALG_TYPE_AHASH |
+					CRYPTO_ALG_ASYNC,
+				.cra_blocksize = SHA1_BLOCK_SIZE,
+				.cra_ctxsize = sizeof(struct sun4i_req_ctx),
+				.cra_module = THIS_MODULE,
+				.cra_type = &crypto_ahash_type,
+				.cra_init = sun4i_hash_crainit
+			}
+		}
+	}
+},
+{       .type = CRYPTO_ALG_TYPE_ABLKCIPHER,
+	.alg.crypto = {
+		.cra_name = "cbc(aes)",
+		.cra_driver_name = "cbc-aes-sun4i-ss",
+		.cra_priority = 300,
+		.cra_blocksize = AES_BLOCK_SIZE,
+		.cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER |
+			CRYPTO_ALG_ASYNC,
+		.cra_ctxsize = sizeof(struct sun4i_tfm_ctx),
+		.cra_module = THIS_MODULE,
+		.cra_alignmask = 3,
+		.cra_type = &crypto_ablkcipher_type,
+		.cra_init = sun4i_ss_cipher_init,
+		.cra_exit = sun4i_ss_cipher_exit,
+		.cra_ablkcipher = {
+			.min_keysize	= AES_MIN_KEY_SIZE,
+			.max_keysize	= AES_MAX_KEY_SIZE,
+			.ivsize		= AES_BLOCK_SIZE,
+			.setkey         = sun4i_ss_aes_setkey,
+			.encrypt        = sun4i_ss_cbc_aes_encrypt,
+			.decrypt        = sun4i_ss_cbc_aes_decrypt,
+		}
+	}
+},
+{       .type = CRYPTO_ALG_TYPE_ABLKCIPHER,
+	.alg.crypto = {
+		.cra_name = "ecb(aes)",
+		.cra_driver_name = "ecb-aes-sun4i-ss",
+		.cra_priority = 300,
+		.cra_blocksize = AES_BLOCK_SIZE,
+		.cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER |
+			CRYPTO_ALG_ASYNC,
+		.cra_ctxsize = sizeof(struct sun4i_tfm_ctx),
+		.cra_module = THIS_MODULE,
+		.cra_alignmask = 3,
+		.cra_type = &crypto_ablkcipher_type,
+		.cra_init = sun4i_ss_cipher_init,
+		.cra_exit = sun4i_ss_cipher_exit,
+		.cra_ablkcipher = {
+			.min_keysize	= AES_MIN_KEY_SIZE,
+			.max_keysize	= AES_MAX_KEY_SIZE,
+			.ivsize		= AES_BLOCK_SIZE,
+			.setkey         = sun4i_ss_aes_setkey,
+			.encrypt        = sun4i_ss_ecb_aes_encrypt,
+			.decrypt        = sun4i_ss_ecb_aes_decrypt,
+		}
+	}
+},
+{       .type = CRYPTO_ALG_TYPE_ABLKCIPHER,
+	.alg.crypto = {
+		.cra_name = "cbc(des)",
+		.cra_driver_name = "cbc-des-sun4i-ss",
+		.cra_priority = 300,
+		.cra_blocksize = DES_BLOCK_SIZE,
+		.cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER | CRYPTO_ALG_ASYNC |
+			CRYPTO_ALG_NEED_FALLBACK,
+		.cra_ctxsize = sizeof(struct sun4i_req_ctx),
+		.cra_module = THIS_MODULE,
+		.cra_alignmask = 3,
+		.cra_type = &crypto_ablkcipher_type,
+		.cra_init = sun4i_ss_cipher_des_init,
+		.cra_exit = sun4i_ss_cipher_exit,
+		.cra_u.ablkcipher = {
+			.min_keysize    = DES_KEY_SIZE,
+			.max_keysize    = DES_KEY_SIZE,
+			.ivsize         = DES_BLOCK_SIZE,
+			.setkey         = sun4i_ss_des_setkey,
+			.encrypt        = sun4i_ss_cbc_des_encrypt,
+			.decrypt        = sun4i_ss_cbc_des_decrypt,
+		}
+	}
+},
+{       .type = CRYPTO_ALG_TYPE_ABLKCIPHER,
+	.alg.crypto = {
+		.cra_name = "ecb(des)",
+		.cra_driver_name = "ecb-des-sun4i-ss",
+		.cra_priority = 300,
+		.cra_blocksize = DES_BLOCK_SIZE,
+		.cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER | CRYPTO_ALG_ASYNC |
+			CRYPTO_ALG_NEED_FALLBACK,
+		.cra_ctxsize = sizeof(struct sun4i_req_ctx),
+		.cra_module = THIS_MODULE,
+		.cra_alignmask = 3,
+		.cra_type = &crypto_ablkcipher_type,
+		.cra_init = sun4i_ss_cipher_des_init,
+		.cra_exit = sun4i_ss_cipher_exit,
+		.cra_u.ablkcipher = {
+			.min_keysize    = DES_KEY_SIZE,
+			.max_keysize    = DES_KEY_SIZE,
+			.setkey         = sun4i_ss_des_setkey,
+			.encrypt        = sun4i_ss_ecb_des_encrypt,
+			.decrypt        = sun4i_ss_ecb_des_decrypt,
+		}
+	}
+},
+{       .type = CRYPTO_ALG_TYPE_ABLKCIPHER,
+	.alg.crypto = {
+			.cra_name = "cbc(des3_ede)",
+			.cra_driver_name = "cbc-des3-sun4i-ss",
+			.cra_priority = 300,
+			.cra_blocksize = DES3_EDE_BLOCK_SIZE,
+			.cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER |
+				CRYPTO_ALG_ASYNC |
+				CRYPTO_ALG_NEED_FALLBACK,
+			.cra_ctxsize = sizeof(struct sun4i_req_ctx),
+			.cra_module = THIS_MODULE,
+			.cra_alignmask = 3,
+			.cra_type = &crypto_ablkcipher_type,
+			.cra_init = sun4i_ss_cipher_des_init,
+			.cra_exit = sun4i_ss_cipher_exit,
+			.cra_u.ablkcipher = {
+				.min_keysize    = DES3_EDE_KEY_SIZE,
+				.max_keysize    = DES3_EDE_KEY_SIZE,
+				.ivsize         = DES3_EDE_BLOCK_SIZE,
+				.setkey         = sun4i_ss_des3_setkey,
+				.encrypt        = sun4i_ss_cbc_des3_encrypt,
+				.decrypt        = sun4i_ss_cbc_des3_decrypt,
+		}
+	}
+},
+{       .type = CRYPTO_ALG_TYPE_ABLKCIPHER,
+	.alg.crypto = {
+			.cra_name = "ecb(des3_ede)",
+			.cra_driver_name = "ecb-des3-sun4i-ss",
+			.cra_priority = 300,
+			.cra_blocksize = DES3_EDE_BLOCK_SIZE,
+			.cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER |
+				CRYPTO_ALG_ASYNC |
+				CRYPTO_ALG_NEED_FALLBACK,
+			.cra_ctxsize = sizeof(struct sun4i_req_ctx),
+			.cra_module = THIS_MODULE,
+			.cra_alignmask = 3,
+			.cra_type = &crypto_ablkcipher_type,
+			.cra_init = sun4i_ss_cipher_des_init,
+			.cra_exit = sun4i_ss_cipher_exit,
+			.cra_u.ablkcipher = {
+				.min_keysize    = DES3_EDE_KEY_SIZE,
+				.max_keysize    = DES3_EDE_KEY_SIZE,
+				.ivsize         = DES3_EDE_BLOCK_SIZE,
+				.setkey         = sun4i_ss_des3_setkey,
+				.encrypt        = sun4i_ss_ecb_des3_encrypt,
+				.decrypt        = sun4i_ss_ecb_des3_decrypt,
+		}
+	}
+},
+};
+
+static int sun4i_ss_probe(struct platform_device *pdev)
+{
+	struct resource *res;
+	u32 v;
+	int err, i;
+	unsigned long cr;
+	const unsigned long cr_ahb = 24 * 1000 * 1000;
+	const unsigned long cr_mod = 150 * 1000 * 1000;
+	struct sun4i_ss_ctx *ss;
+
+	if (!pdev->dev.of_node)
+		return -ENODEV;
+
+	ss = devm_kzalloc(&pdev->dev, sizeof(*ss), GFP_KERNEL);
+	if (ss == NULL)
+		return -ENOMEM;
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	ss->base = devm_ioremap_resource(&pdev->dev, res);
+	if (IS_ERR(ss->base)) {
+		dev_err(&pdev->dev, "Cannot request MMIO\n");
+		return PTR_ERR(ss->base);
+	}
+
+	ss->ssclk = devm_clk_get(&pdev->dev, "mod");
+	if (IS_ERR(ss->ssclk)) {
+		err = PTR_ERR(ss->ssclk);
+		dev_err(&pdev->dev, "Cannot get SS clock err=%d\n", err);
+		return err;
+	}
+	dev_dbg(&pdev->dev, "clock ss acquired\n");
+
+	ss->busclk = devm_clk_get(&pdev->dev, "ahb");
+	if (IS_ERR(ss->busclk)) {
+		err = PTR_ERR(ss->busclk);
+		dev_err(&pdev->dev, "Cannot get AHB SS clock err=%d\n", err);
+		return err;
+	}
+	dev_dbg(&pdev->dev, "clock ahb_ss acquired\n");
+
+	/* Enable both clocks */
+	err = clk_prepare_enable(ss->busclk);
+	if (err != 0) {
+		dev_err(&pdev->dev, "Cannot prepare_enable busclk\n");
+		return err;
+	}
+	err = clk_prepare_enable(ss->ssclk);
+	if (err != 0) {
+		dev_err(&pdev->dev, "Cannot prepare_enable ssclk\n");
+		goto error_ssclk;
+	}
+
+	/*
+	 * Check that clock have the correct rates gived in the datasheet
+	 * Try to set the clock to the maximum allowed
+	 */
+	err = clk_set_rate(ss->ssclk, cr_mod);
+	if (err != 0) {
+		dev_err(&pdev->dev, "Cannot set clock rate to ssclk\n");
+		goto error_clk;
+	}
+
+	/*
+	 * The only impact on clocks below requirement are bad performance,
+	 * so do not print "errors"
+	 * warn on Overclocked clocks
+	 */
+	cr = clk_get_rate(ss->busclk);
+	if (cr >= cr_ahb)
+		dev_dbg(&pdev->dev, "Clock bus %lu (%lu MHz) (must be >= %lu)\n",
+				cr, cr / 1000000, cr_ahb);
+	else
+		dev_warn(&pdev->dev, "Clock bus %lu (%lu MHz) (must be >= %lu)\n",
+				cr, cr / 1000000, cr_ahb);
+
+	cr = clk_get_rate(ss->ssclk);
+	if (cr <= cr_mod)
+		if (cr < cr_mod)
+			dev_warn(&pdev->dev, "Clock ss %lu (%lu MHz) (must be <= %lu)\n",
+					cr, cr / 1000000, cr_mod);
+		else
+			dev_dbg(&pdev->dev, "Clock ss %lu (%lu MHz) (must be <= %lu)\n",
+					cr, cr / 1000000, cr_mod);
+	else
+		dev_warn(&pdev->dev, "Clock ss is at %lu (%lu MHz) (must be <= %lu)\n",
+				cr, cr / 1000000, cr_mod);
+
+	/*
+	 * Datasheet named it "Die Bonding ID"
+	 * I expect to be a sort of Security System Revision number.
+	 * Since the A80 seems to have an other version of SS
+	 * this info could be useful
+	 */
+	writel(SS_ENABLED, ss->base + SS_CTL);
+	v = readl(ss->base + SS_CTL);
+	v >>= 16;
+	v &= 0x07;
+	dev_info(&pdev->dev, "Die ID %d\n", v);
+	writel(0, ss->base + SS_CTL);
+
+	ss->dev = &pdev->dev;
+
+	spin_lock_init(&ss->slock);
+
+	for (i = 0; i < ARRAY_SIZE(driver_algs); i++) {
+		driver_algs[i].ss = ss;
+		switch (driver_algs[i].type) {
+		case CRYPTO_ALG_TYPE_ABLKCIPHER:
+			err = crypto_register_alg(&driver_algs[i].alg.crypto);
+			if (err != 0)
+				goto error_alg;
+			break;
+		case CRYPTO_ALG_TYPE_AHASH:
+			err = crypto_register_ahash(&driver_algs[i].alg.hash);
+			if (err != 0)
+				goto error_alg;
+			break;
+		}
+	}
+	platform_set_drvdata(pdev, ss);
+	return 0;
+error_alg:
+	i--;
+	for (; i >= 0; i--) {
+		switch (driver_algs[i].type) {
+		case CRYPTO_ALG_TYPE_ABLKCIPHER:
+			crypto_unregister_alg(&driver_algs[i].alg.crypto);
+			break;
+		case CRYPTO_ALG_TYPE_AHASH:
+			crypto_unregister_ahash(&driver_algs[i].alg.hash);
+			break;
+		}
+	}
+error_clk:
+	clk_disable_unprepare(ss->ssclk);
+error_ssclk:
+	clk_disable_unprepare(ss->busclk);
+	return err;
+}
+
+static int sun4i_ss_remove(struct platform_device *pdev)
+{
+	int i;
+	struct sun4i_ss_ctx *ss = platform_get_drvdata(pdev);
+
+	for (i = 0; i < ARRAY_SIZE(driver_algs); i++) {
+		switch (driver_algs[i].type) {
+		case CRYPTO_ALG_TYPE_ABLKCIPHER:
+			crypto_unregister_alg(&driver_algs[i].alg.crypto);
+			break;
+		case CRYPTO_ALG_TYPE_AHASH:
+			crypto_unregister_ahash(&driver_algs[i].alg.hash);
+			break;
+		}
+	}
+
+	writel(0, ss->base + SS_CTL);
+	clk_disable_unprepare(ss->busclk);
+	clk_disable_unprepare(ss->ssclk);
+	return 0;
+}
+
+static const struct of_device_id a20ss_crypto_of_match_table[] = {
+	{ .compatible = "allwinner,sun4i-a20-crypto" },
+	{}
+};
+MODULE_DEVICE_TABLE(of, a20ss_crypto_of_match_table);
+
+static struct platform_driver sun4i_ss_driver = {
+	.probe          = sun4i_ss_probe,
+	.remove         = sun4i_ss_remove,
+	.driver         = {
+		.name           = "sun4i-ss",
+		.of_match_table	= a20ss_crypto_of_match_table,
+	},
+};
+
+module_platform_driver(sun4i_ss_driver);
+
+MODULE_DESCRIPTION("Allwinner Security System cryptographic accelerator");
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Corentin LABBE <clabbe.montjoie@gmail.com>");
diff --git a/drivers/crypto/sunxi-ss/sun4i-ss-hash.c b/drivers/crypto/sunxi-ss/sun4i-ss-hash.c
new file mode 100644
index 0000000..44cc2ce
--- /dev/null
+++ b/drivers/crypto/sunxi-ss/sun4i-ss-hash.c
@@ -0,0 +1,481 @@
+/*
+ * sun4i-ss-hash.c - hardware cryptographic accelerator for Allwinner A20 SoC
+ *
+ * Copyright (C) 2013-2015 Corentin LABBE <clabbe.montjoie@gmail.com>
+ *
+ * This file add support for MD5 and SHA1.
+ *
+ * You could find the datasheet in Documentation/arm/sunxi/README
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+#include "sun4i-ss.h"
+
+/* This is a totaly arbitrary value */
+#define SS_TIMEOUT 100
+
+int sun4i_hash_crainit(struct crypto_tfm *tfm)
+{
+	crypto_ahash_set_reqsize(__crypto_ahash_cast(tfm),
+			sizeof(struct sun4i_req_ctx));
+	return 0;
+}
+
+/* sun4i_hash_init: initialize request context */
+int sun4i_hash_init(struct ahash_request *areq)
+{
+	const char *hash_type;
+	struct sun4i_req_ctx *op = ahash_request_ctx(areq);
+	struct crypto_ahash *tfm = crypto_ahash_reqtfm(areq);
+	struct ahash_alg *alg = __crypto_ahash_alg(tfm->base.__crt_alg);
+	struct sun4i_ss_alg_template *algt;
+	struct sun4i_ss_ctx *ss;
+
+	memset(op, 0, sizeof(struct sun4i_req_ctx));
+
+	algt = container_of(alg, struct sun4i_ss_alg_template, alg.hash);
+	ss = algt->ss;
+	op->ss = algt->ss;
+
+	hash_type = crypto_tfm_alg_name(areq->base.tfm);
+
+	if (strcmp(hash_type, "sha1") == 0)
+		op->mode = SS_OP_SHA1;
+	else if (strcmp(hash_type, "md5") == 0)
+		op->mode = SS_OP_MD5;
+	else
+		return -EINVAL;
+
+	return 0;
+}
+
+int sun4i_hash_export(struct ahash_request *areq, void *out)
+{
+	struct sun4i_req_ctx *op = ahash_request_ctx(areq);
+
+	memcpy(out, op, sizeof(struct sun4i_req_ctx));
+	return 0;
+}
+
+int sun4i_hash_import(struct ahash_request *areq, const void *in)
+{
+	struct sun4i_req_ctx *op = ahash_request_ctx(areq);
+
+	memcpy(op, in, sizeof(struct sun4i_req_ctx));
+	return 0;
+}
+
+/*
+ * sun4i_hash_update: update hash engine
+ *
+ * Could be used for both SHA1 and MD5
+ * Write data by step of 32bits and put then in the SS.
+ *
+ * Since we cannot leave partial data and hash state in the engine,
+ * we need to get the hash state at the end of this function.
+ * After some work, I have found that we can get the hash state every 64 bytes
+ *
+ * So the first work is to get the number of bytes to write to SS modulo 64
+ * The extra bytes will go to two different destination:
+ * op->wait for full 32bits word
+ * op->wb (waiting bytes) for partial 32 bits word
+ * So we can have up to (64/4)-1 op->wait words and 0/1/2/3 bytes in wb
+ *
+ * So at the begin of update()
+ * if op->nwait * 4 + areq->nbytes < 64
+ * => all data will be writen to wait buffers and end=0
+ * if not, write all nwait to the device and position end to complete to 64bytes
+ *
+ * example 1:
+ * update1 60o => nwait=15
+ * update2 60o => need one more word to have 64 bytes
+ * end=4
+ * so write all data in op->wait and one word of SGs
+ * write remaining data in op->wait
+ * final state op->nwait=14
+ */
+int sun4i_hash_update(struct ahash_request *areq)
+{
+	u32 v, ivmode = 0;
+	unsigned int i = 0;
+	/*
+	 * i is the total bytes read from SGs, to be compared to areq->nbytes
+	 * i is important because we cannot rely on SG length since the sum of
+	 * SG->length could be greater than areq->nbytes
+	 */
+
+	struct sun4i_req_ctx *op = ahash_request_ctx(areq);
+	struct sun4i_ss_ctx *ss = op->ss;
+	struct scatterlist *in_sg;
+	unsigned int in_i = 0; /* advancement in the current SG */
+	unsigned int end;
+	/*
+	 * end is the position when we need to stop writing to the device,
+	 * to be compared to i
+	 * So end is always a multiple of 64
+	 * if end = 0 all data must be kept for later use and no write
+	 * on the device is done.
+	 */
+	int in_r, err = 0;
+	void *src_addr;
+	unsigned int todo;
+	u32 spaces, rx_cnt;
+	unsigned long flags = 0;
+
+	dev_dbg(ss->dev, "%s %s bc=%llu len=%u mode=%x bw=%u ww=%u h0=%0x",
+			__func__, crypto_tfm_alg_name(areq->base.tfm),
+			op->byte_count, areq->nbytes, op->mode,
+			op->nbw, op->nwait, op->hash[0]);
+
+	if (areq->nbytes == 0)
+		return 0;
+
+	/* protect againt overflow */
+	if (areq->nbytes > UINT_MAX - op->nwait * 4 - op->nbw) {
+		dev_err(ss->dev, "Cannot process too large request\n");
+		return -EINVAL;
+	}
+
+	if (areq->nbytes + op->nwait * 4 + op->nbw < 64)
+		end = 0;
+	else
+		end = ((areq->nbytes + op->nwait * 4 + op->nbw) / 64) * 64
+			- op->nbw - op->nwait * 4;
+
+	if (end > areq->nbytes || areq->nbytes - end > 63) {
+		dev_err(ss->dev, "ERROR: Bound error %u %u\n",
+				end, areq->nbytes);
+		return -EINVAL;
+	}
+
+	if (end > 0) {
+		spin_lock_irqsave(&ss->slock, flags);
+
+		/*
+		 * if some data have been processed before,
+		 * we need to restore the partial hash state
+		 */
+		if (op->byte_count > 0) {
+			ivmode = SS_IV_ARBITRARY;
+			for (i = 0; i < 5; i++)
+				writel(op->hash[i], ss->base + SS_IV0 + i * 4);
+		}
+		/* Enable the device */
+		writel(op->mode | SS_ENABLED | ivmode, ss->base + SS_CTL);
+	}
+
+	rx_cnt = 32;
+	i = 0;
+
+	if (op->nwait > 0 && end > 0) {
+		/*
+		 * a precedent update was done
+		 * No test versus rx_cnt since op->nwait cannot be more than 15
+		 */
+		writesl(ss->base + SS_RXFIFO, op->wait, op->nwait);
+		op->byte_count += 4 * op->nwait;
+		op->nwait = 0;
+	}
+
+	in_sg = areq->src;
+	src_addr = kmap(sg_page(in_sg)) + in_sg->offset;
+	if (src_addr == NULL) {
+		dev_err(ss->dev, "ERROR: Cannot kmap source buffer\n");
+		err = -EFAULT;
+		goto hash_update_release_ss;
+	}
+	do {
+		/*
+		 * step 1, if some bytes remains from last SG,
+		 * try to complete them to 4 and send that word
+		 */
+		if (op->nbw > 0) {
+			while (op->nbw < 4 && i < areq->nbytes &&
+					in_i < in_sg->length) {
+				op->wb |= (*(u8 *)(src_addr + in_i))
+					<< (8 * op->nbw);
+				dev_dbg(ss->dev, "%s: Complete w=%d wb=%x\n",
+						__func__, op->nbw, op->wb);
+				i++;
+				in_i++;
+				op->nbw++;
+			}
+			if (op->nbw == 4) {
+				if (i <= end) {
+					writel(op->wb, ss->base + SS_RXFIFO);
+					rx_cnt--;
+					if (rx_cnt > 0) {
+						spaces = readl_relaxed(ss->base + SS_FCSR);
+						rx_cnt = SS_RXFIFO_SPACES(spaces);
+					}
+					op->byte_count += 4;
+				} else {
+					op->wait[op->nwait] = op->wb;
+					op->nwait++;
+					dev_dbg(ss->dev, "%s: Keep %u bytes after %u\n",
+						__func__, op->nwait, end);
+				}
+				op->nbw = 0;
+				op->wb = 0;
+			}
+		}
+		/* step 2, main loop, read data 4bytes at a time */
+		while (i < areq->nbytes && in_i < in_sg->length) {
+			/* how many bytes we can read from current SG */
+			in_r = min(in_sg->length - in_i, areq->nbytes - i);
+			if (in_r < 4) {
+				/* Not enough data to write to the device */
+				op->wb = 0;
+				while (in_r > 0) {
+					op->wb |= (*(u8 *)(src_addr + in_i))
+						<< (8 * op->nbw);
+					dev_dbg(ss->dev, "%s: ending bw=%d wb=%x\n",
+						__func__, op->nbw, op->wb);
+					in_r--;
+					i++;
+					in_i++;
+					op->nbw++;
+				}
+				goto nextsg;
+			}
+			v = *(u32 *)(src_addr + in_i);
+			if (i < end) {
+				todo = min3((u32)(end - i) / 4, rx_cnt, (u32)in_r / 4);
+				writesl(ss->base + SS_RXFIFO, src_addr + in_i, todo);
+				i += 4 * todo;
+				in_i += 4 * todo;
+				op->byte_count += 4 * todo;
+				rx_cnt -= todo;
+				if (rx_cnt == 0) {
+					spaces = readl_relaxed(ss->base + SS_FCSR);
+					rx_cnt = SS_RXFIFO_SPACES(spaces);
+				}
+			} else {
+				op->wait[op->nwait] = v;
+				i += 4;
+				in_i += 4;
+				op->nwait++;
+				dev_dbg(ss->dev, "%s: Keep word ww=%u after %u\n",
+						__func__, op->nwait, end);
+				if (op->nwait > 15) {
+					dev_err(ss->dev, "FATAL: Cannot enqueue more, bug?\n");
+					err = -EIO;
+					goto hash_update_release_ss;
+				}
+			}
+		}
+nextsg:
+		/* Nothing more to read in this SG */
+		if (in_i == in_sg->length) {
+			kunmap(sg_page(in_sg));
+			do {
+				in_sg = sg_next(in_sg);
+			} while (in_sg != NULL && in_sg->length == 0);
+			in_i = 0;
+			if (in_sg != NULL) {
+				src_addr = kmap(sg_page(in_sg)) + in_sg->offset;
+				if (src_addr == NULL) {
+					dev_err(ss->dev, "ERROR: Cannot kmap source buffer\n");
+					err = -EFAULT;
+					goto hash_update_release_ss;
+				}
+			}
+		}
+	} while (in_sg != NULL && i < areq->nbytes);
+
+hash_update_release_ss:
+	/* the device was not used, so nothing to release */
+	if (end == 0)
+		return err;
+
+	if (err == 0) {
+		/* ask the device to finish the hashing */
+		writel(op->mode | SS_ENABLED | SS_DATA_END, ss->base + SS_CTL);
+		i = 0;
+		do {
+			v = readl(ss->base + SS_CTL);
+			i++;
+		} while (i < SS_TIMEOUT && (v & SS_DATA_END) > 0);
+		if (i >= SS_TIMEOUT) {
+			dev_err(ss->dev, "ERROR: %s: hash end timeout after %d loop, CTL=%x\n",
+					__func__, i, v);
+			err = -EIO;
+			goto hash_update_release_ss;
+			/*
+			 * this seems strange (to go backward)
+			 * but since err is set, it works
+			 * */
+		}
+
+		/* get the partial hash only if something was written */
+		if (op->mode == SS_OP_SHA1) {
+			for (i = 0; i < 5; i++)
+				op->hash[i] = readl(ss->base + SS_MD0 + i * 4);
+		} else {
+			for (i = 0; i < 4; i++)
+				op->hash[i] = readl(ss->base + SS_MD0 + i * 4);
+		}
+	}
+	writel(0, ss->base + SS_CTL);
+	spin_unlock_irqrestore(&ss->slock, flags);
+	return err;
+}
+
+/*
+ * sun4i_hash_final: finalize hashing operation
+ *
+ * If we have some remaining bytes, we write them.
+ * Then ask the SS for finalizing the hashing operation
+ *
+ * I do not check RX FIFO size in this function since the size is 32
+ * after each enabling and this function neither write more than 32 words.
+ */
+int sun4i_hash_final(struct ahash_request *areq)
+{
+	u32 v, ivmode = 0;
+	unsigned int i;
+	unsigned int j = 0;
+	int zeros;
+	unsigned int index, padlen;
+	__be64 bits;
+	struct sun4i_req_ctx *op = ahash_request_ctx(areq);
+	struct sun4i_ss_ctx *ss = op->ss;
+	u32 bf[32];
+	unsigned long flags;
+
+	dev_dbg(ss->dev, "%s: byte=%llu len=%u mode=%x bw=%u %x h=%x ww=%u",
+			__func__, op->byte_count, areq->nbytes, op->mode,
+			op->nbw, op->wb, op->hash[0], op->nwait);
+
+	spin_lock_irqsave(&ss->slock, flags);
+
+	/*
+	 * if we have already writed something,
+	 * restore the partial hash state
+	 */
+	if (op->byte_count > 0) {
+		ivmode = SS_IV_ARBITRARY;
+		for (i = 0; i < 5; i++)
+			writel(op->hash[i], ss->base + SS_IV0 + i * 4);
+	}
+	writel(op->mode | SS_ENABLED | ivmode, ss->base + SS_CTL);
+
+	/* write the remaining words of the wait buffer */
+	if (op->nwait > 0) {
+		writesl(ss->base + SS_RXFIFO, op->wait, op->nwait);
+		op->byte_count += 4 * op->nwait;
+		op->nwait = 0;
+	}
+
+	/* write the remaining bytes of the nbw buffer */
+	if (op->nbw > 0) {
+		op->wb |= ((1 << 7) << (op->nbw * 8));
+		bf[j++] = op->wb;
+	} else {
+		bf[j++] = 1 << 7;
+	}
+
+	/*
+	 * number of space to pad to obtain 64o minus 8(size) minus 4 (final 1)
+	 * I take the operations from other md5/sha1 implementations
+	 */
+
+	/* we have already send 4 more byte of which nbw data */
+	if (op->mode == SS_OP_MD5) {
+		index = (op->byte_count + 4) & 0x3f;
+		op->byte_count += op->nbw;
+		if (index > 56)
+			zeros = (120 - index) / 4;
+		else
+			zeros = (56 - index) / 4;
+	} else {
+		op->byte_count += op->nbw;
+		index = op->byte_count & 0x3f;
+		padlen = (index < 56) ? (56 - index) : ((64 + 56) - index);
+		zeros = (padlen - 1) / 4;
+	}
+
+	/*for (i = 0; i < zeros; i++)
+		bf[j++] = 0;*/
+	memset(bf + j, 0, 4 * zeros);
+	j += zeros;
+
+	/* write the length of data */
+	if (op->mode == SS_OP_SHA1) {
+		bits = cpu_to_be64(op->byte_count << 3);
+		bf[j++] = bits & 0xffffffff;
+		bf[j++] = (bits >> 32) & 0xffffffff;
+	} else {
+		bf[j++] = (op->byte_count << 3) & 0xffffffff;
+		bf[j++] = (op->byte_count >> 29) & 0xffffffff;
+	}
+	writesl(ss->base + SS_RXFIFO, bf, j);
+
+	/* Tell the SS to stop the hashing */
+	writel(op->mode | SS_ENABLED | SS_DATA_END, ss->base + SS_CTL);
+
+	/*
+	 * Wait for SS to finish the hash.
+	 * The timeout could happend only in case of bad overcloking
+	 * or driver bug.
+	 */
+	i = 0;
+	do {
+		v = readl(ss->base + SS_CTL);
+		i++;
+	} while (i < SS_TIMEOUT && (v & SS_DATA_END) > 0);
+	if (i >= SS_TIMEOUT) {
+		dev_err(ss->dev, "ERROR: hash end timeout %d>%d ctl=%x len=%u\n",
+				i, SS_TIMEOUT, v, areq->nbytes);
+		writel(0, ss->base + SS_CTL);
+		spin_unlock_irqrestore(&ss->slock, flags);
+		return -EIO;
+	}
+
+	/* Get the hash from the device */
+	if (op->mode == SS_OP_SHA1) {
+		for (i = 0; i < 5; i++) {
+			v = cpu_to_be32(readl(ss->base + SS_MD0 + i * 4));
+			memcpy(areq->result + i * 4, &v, 4);
+		}
+	} else {
+		for (i = 0; i < 4; i++) {
+			v = readl(ss->base + SS_MD0 + i * 4);
+			memcpy(areq->result + i * 4, &v, 4);
+		}
+	}
+	writel(0, ss->base + SS_CTL);
+	spin_unlock_irqrestore(&ss->slock, flags);
+	return 0;
+}
+
+/* sun4i_hash_finup: finalize hashing operation after an update */
+int sun4i_hash_finup(struct ahash_request *areq)
+{
+	int err;
+
+	err = sun4i_hash_update(areq);
+	if (err != 0)
+		return err;
+
+	return sun4i_hash_final(areq);
+}
+
+/* combo of init/update/final functions */
+int sun4i_hash_digest(struct ahash_request *areq)
+{
+	int err;
+
+	err = sun4i_hash_init(areq);
+	if (err != 0)
+		return err;
+
+	err = sun4i_hash_update(areq);
+	if (err != 0)
+		return err;
+
+	return sun4i_hash_final(areq);
+}
diff --git a/drivers/crypto/sunxi-ss/sun4i-ss.h b/drivers/crypto/sunxi-ss/sun4i-ss.h
new file mode 100644
index 0000000..7e442be
--- /dev/null
+++ b/drivers/crypto/sunxi-ss/sun4i-ss.h
@@ -0,0 +1,198 @@
+/*
+ * sun4i-ss.h - hardware cryptographic accelerator for Allwinner A20 SoC
+ *
+ * Copyright (C) 2013-2015 Corentin LABBE <clabbe.montjoie@gmail.com>
+ *
+ * Support AES cipher with 128,192,256 bits keysize.
+ * Support MD5 and SHA1 hash algorithms.
+ * Support DES and 3DES
+ *
+ * You could find the datasheet in Documentation/arm/sunxi/README
+ *
+ * Licensed under the GPL-2.
+ */
+
+#include <linux/clk.h>
+#include <linux/crypto.h>
+#include <linux/io.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/platform_device.h>
+#include <crypto/scatterwalk.h>
+#include <linux/scatterlist.h>
+#include <linux/interrupt.h>
+#include <linux/delay.h>
+#include <crypto/md5.h>
+#include <crypto/sha.h>
+#include <crypto/hash.h>
+#include <crypto/internal/hash.h>
+#include <crypto/aes.h>
+#include <crypto/des.h>
+#include <crypto/internal/rng.h>
+
+#define SS_CTL            0x00
+#define SS_KEY0           0x04
+#define SS_KEY1           0x08
+#define SS_KEY2           0x0C
+#define SS_KEY3           0x10
+#define SS_KEY4           0x14
+#define SS_KEY5           0x18
+#define SS_KEY6           0x1C
+#define SS_KEY7           0x20
+
+#define SS_IV0            0x24
+#define SS_IV1            0x28
+#define SS_IV2            0x2C
+#define SS_IV3            0x30
+
+#define SS_FCSR           0x44
+
+#define SS_MD0            0x4C
+#define SS_MD1            0x50
+#define SS_MD2            0x54
+#define SS_MD3            0x58
+#define SS_MD4            0x5C
+
+#define SS_RXFIFO         0x200
+#define SS_TXFIFO         0x204
+
+/* SS_CTL configuration values */
+
+/* PRNG generator mode - bit 15 */
+#define SS_PRNG_ONESHOT		(0 << 15)
+#define SS_PRNG_CONTINUE	(1 << 15)
+
+/* IV mode for hash */
+#define SS_IV_ARBITRARY		(1 << 14)
+
+/* SS operation mode - bits 12-13 */
+#define SS_ECB			(0 << 12)
+#define SS_CBC			(1 << 12)
+#define SS_CTS			(3 << 12)
+
+/* Counter width for CNT mode - bits 10-11 */
+#define SS_CNT_16BITS		(0 << 10)
+#define SS_CNT_32BITS		(1 << 10)
+#define SS_CNT_64BITS		(2 << 10)
+
+/* Key size for AES - bits 8-9 */
+#define SS_AES_128BITS		(0 << 8)
+#define SS_AES_192BITS		(1 << 8)
+#define SS_AES_256BITS		(2 << 8)
+
+/* Operation direction - bit 7 */
+#define SS_ENCRYPTION		(0 << 7)
+#define SS_DECRYPTION		(1 << 7)
+
+/* SS Method - bits 4-6 */
+#define SS_OP_AES		(0 << 4)
+#define SS_OP_DES		(1 << 4)
+#define SS_OP_3DES		(2 << 4)
+#define SS_OP_SHA1		(3 << 4)
+#define SS_OP_MD5		(4 << 4)
+#define SS_OP_PRNG		(5 << 4)
+
+/* Data end bit - bit 2 */
+#define SS_DATA_END		(1 << 2)
+
+/* PRNG start bit - bit 1 */
+#define SS_PRNG_START		(1 << 1)
+
+/* SS Enable bit - bit 0 */
+#define SS_DISABLED		(0 << 0)
+#define SS_ENABLED		(1 << 0)
+
+/* SS_FCSR configuration values */
+/* RX FIFO status - bit 30 */
+#define SS_RXFIFO_FREE		(1 << 30)
+
+/* RX FIFO empty spaces - bits 24-29 */
+#define SS_RXFIFO_SPACES(val)	(((val) >> 24) & 0x3f)
+
+/* TX FIFO status - bit 22 */
+#define SS_TXFIFO_AVAILABLE	(1 << 22)
+
+/* TX FIFO available spaces - bits 16-21 */
+#define SS_TXFIFO_SPACES(val)	(((val) >> 16) & 0x3f)
+
+#define SS_RXFIFO_EMP_INT_PENDING	(1 << 10)
+#define SS_TXFIFO_AVA_INT_PENDING	(1 << 8)
+#define SS_RXFIFO_EMP_INT_ENABLE	(1 << 2)
+#define SS_TXFIFO_AVA_INT_ENABLE	(1 << 0)
+
+
+struct sun4i_ss_ctx {
+	void __iomem *base;
+	int irq;
+	struct clk *busclk;
+	struct clk *ssclk;
+	struct device *dev;
+	struct resource *res;
+	spinlock_t slock; /* control the use of the device */
+};
+
+struct sun4i_ss_alg_template {
+	u32 type;
+	union {
+		struct crypto_alg crypto;
+		struct ahash_alg hash;
+	} alg;
+	struct sun4i_ss_ctx *ss;
+};
+
+struct sun4i_tfm_ctx {
+	u32 key[AES_MAX_KEY_SIZE / 4];/* divided by sizeof(u32) */
+	u32 keylen;
+	u32 keymode;
+	struct crypto_ablkcipher *fallback;
+	struct sun4i_ss_ctx *ss;
+};
+
+struct sun4i_req_ctx {
+	u32 mode;
+	u64 byte_count; /* number of bytes "uploaded" to the device */
+	/* wb: partial word waiting to be completed and written to the device */
+	u32 wb;
+	/* number of bytes to be uploaded in the wb word */
+	unsigned int nbw;
+	u32 hash[5]; /* for storing SS_IVx register */
+	u32 wait[64];
+	unsigned int nwait;
+	struct sun4i_ss_ctx *ss;
+};
+
+int sun4i_hash_crainit(struct crypto_tfm *tfm);
+int sun4i_hash_init(struct ahash_request *areq);
+int sun4i_hash_update(struct ahash_request *areq);
+int sun4i_hash_final(struct ahash_request *areq);
+int sun4i_hash_finup(struct ahash_request *areq);
+int sun4i_hash_digest(struct ahash_request *areq);
+int sun4i_hash_export(struct ahash_request *areq, void *out);
+int sun4i_hash_import(struct ahash_request *areq, const void *in);
+
+int sun4i_ss_cbc_aes_encrypt(struct ablkcipher_request *areq);
+int sun4i_ss_cbc_aes_decrypt(struct ablkcipher_request *areq);
+int sun4i_ss_ecb_aes_encrypt(struct ablkcipher_request *areq);
+int sun4i_ss_ecb_aes_decrypt(struct ablkcipher_request *areq);
+
+int sun4i_ss_cbc_des_encrypt(struct ablkcipher_request *areq);
+int sun4i_ss_cbc_des_decrypt(struct ablkcipher_request *areq);
+int sun4i_ss_ecb_des_encrypt(struct ablkcipher_request *areq);
+int sun4i_ss_ecb_des_decrypt(struct ablkcipher_request *areq);
+
+int sun4i_ss_cbc_des3_encrypt(struct ablkcipher_request *areq);
+int sun4i_ss_cbc_des3_decrypt(struct ablkcipher_request *areq);
+int sun4i_ss_ecb_des3_encrypt(struct ablkcipher_request *areq);
+int sun4i_ss_ecb_des3_decrypt(struct ablkcipher_request *areq);
+
+int sun4i_ss_aes_poll(struct ablkcipher_request *areq, u32 mode);
+int sun4i_ss_des_poll(struct ablkcipher_request *areq, u32 mode);
+int sun4i_ss_cipher_init(struct crypto_tfm *tfm);
+int sun4i_ss_cipher_des_init(struct crypto_tfm *tfm);
+void sun4i_ss_cipher_exit(struct crypto_tfm *tfm);
+int sun4i_ss_aes_setkey(struct crypto_ablkcipher *tfm, const u8 *key,
+		unsigned int keylen);
+int sun4i_ss_des_setkey(struct crypto_ablkcipher *tfm, const u8 *key,
+		unsigned int keylen);
+int sun4i_ss_des3_setkey(struct crypto_ablkcipher *tfm, const u8 *key,
+		unsigned int keylen);
-- 
2.3.6


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v9 4/4] crypto: Add Allwinner Security System crypto accelerator
  2015-05-14 12:59 ` [PATCH v9 4/4] crypto: Add Allwinner Security System crypto accelerator LABBE Corentin
@ 2015-05-15  6:49   ` Herbert Xu
  2015-05-23 12:18     ` [linux-sunxi] " Corentin LABBE
  2015-05-15  6:52   ` Herbert Xu
  2015-05-17  8:45   ` Boris Brezillon
  2 siblings, 1 reply; 26+ messages in thread
From: Herbert Xu @ 2015-05-15  6:49 UTC (permalink / raw)
  To: LABBE Corentin
  Cc: robh+dt, pawel.moll, mark.rutland, ijc+devicetree, galak,
	maxime.ripard, linux, davem, akpm, gregkh, mchehab, joe, tj,
	arnd, boris.brezillon, devicetree, linux-doc, linux-arm-kernel,
	linux-kernel, linux-crypto, linux-sunxi

On Thu, May 14, 2015 at 02:59:01PM +0200, LABBE Corentin wrote:
>
> +	err = crypto_ablkcipher_setkey(op->fallback, kkey, op->keylen);
> +	if (err != 0) {
> +		dev_err(ss->dev, "Cannot set key on fallback\n");
> +		return -EINVAL;
> +	}

You cannot call setkey from an atomic context.  The easiest solution
is to call setkey in your setkey function instead.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v9 4/4] crypto: Add Allwinner Security System crypto accelerator
  2015-05-14 12:59 ` [PATCH v9 4/4] crypto: Add Allwinner Security System crypto accelerator LABBE Corentin
  2015-05-15  6:49   ` Herbert Xu
@ 2015-05-15  6:52   ` Herbert Xu
  2015-05-16 18:09     ` [linux-sunxi] " Corentin LABBE
  2015-05-17  8:45   ` Boris Brezillon
  2 siblings, 1 reply; 26+ messages in thread
From: Herbert Xu @ 2015-05-15  6:52 UTC (permalink / raw)
  To: LABBE Corentin
  Cc: robh+dt, pawel.moll, mark.rutland, ijc+devicetree, galak,
	maxime.ripard, linux, davem, akpm, gregkh, mchehab, joe, tj,
	arnd, boris.brezillon, devicetree, linux-doc, linux-arm-kernel,
	linux-kernel, linux-crypto, linux-sunxi

On Thu, May 14, 2015 at 02:59:01PM +0200, LABBE Corentin wrote:
>
> +int sun4i_hash_export(struct ahash_request *areq, void *out)
> +{
> +	struct sun4i_req_ctx *op = ahash_request_ctx(areq);
> +
> +	memcpy(out, op, sizeof(struct sun4i_req_ctx));
> +	return 0;
> +}
> +
> +int sun4i_hash_import(struct ahash_request *areq, const void *in)
> +{
> +	struct sun4i_req_ctx *op = ahash_request_ctx(areq);
> +
> +	memcpy(op, in, sizeof(struct sun4i_req_ctx));

This is very wrong.  You're importing an arbitrary ss pointer.  The
whole point of having an import function instead of just a simple
memcpy is to deal with such problems.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v9 1/4] ARM: sun7i: dt: Add Security System to A20 SoC DTS
  2015-05-14 12:58 ` [PATCH v9 1/4] ARM: sun7i: dt: Add Security System to A20 SoC DTS LABBE Corentin
@ 2015-05-15  7:31   ` Maxime Ripard
  2015-05-23 12:08     ` Corentin LABBE
  0 siblings, 1 reply; 26+ messages in thread
From: Maxime Ripard @ 2015-05-15  7:31 UTC (permalink / raw)
  To: LABBE Corentin
  Cc: robh+dt, pawel.moll, mark.rutland, ijc+devicetree, galak, linux,
	herbert, davem, akpm, gregkh, mchehab, joe, tj, arnd,
	boris.brezillon, devicetree, linux-doc, linux-arm-kernel,
	linux-kernel, linux-crypto, linux-sunxi

[-- Attachment #1: Type: text/plain, Size: 1045 bytes --]

On Thu, May 14, 2015 at 02:58:58PM +0200, LABBE Corentin wrote:
> The Security System is a hardware cryptographic accelerator that support
> AES/MD5/SHA1/DES/3DES/PRNG algorithms.
> It could be found on many Allwinner SoC.
> 
> This patch enable the Security System on the Allwinner A20 SoC Device-tree.
> 
> Signed-off-by: LABBE Corentin <clabbe.montjoie@gmail.com>
> ---
>  arch/arm/boot/dts/sun7i-a20.dtsi | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/arch/arm/boot/dts/sun7i-a20.dtsi b/arch/arm/boot/dts/sun7i-a20.dtsi
> index fdd1817..9a823ce 100644
> --- a/arch/arm/boot/dts/sun7i-a20.dtsi
> +++ b/arch/arm/boot/dts/sun7i-a20.dtsi
> @@ -679,6 +679,14 @@
>  			status = "disabled";
>  		};
>  
> +		crypto: crypto-engine@01c15000 {
> +			compatible = "allwinner,sun4i-a20-crypto";

This looks wrong. It's either sun4i-a10 or sun7i-a20.

It looks good otherwise.

Thanks!
Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [linux-sunxi] Re: [PATCH v9 4/4] crypto: Add Allwinner Security System crypto accelerator
  2015-05-15  6:52   ` Herbert Xu
@ 2015-05-16 18:09     ` Corentin LABBE
  2015-05-17  1:32       ` Herbert Xu
  0 siblings, 1 reply; 26+ messages in thread
From: Corentin LABBE @ 2015-05-16 18:09 UTC (permalink / raw)
  To: herbert
  Cc: robh+dt, pawel.moll, mark.rutland, ijc+devicetree, galak,
	maxime.ripard, linux, davem, akpm, gregkh, mchehab, joe, tj,
	arnd, boris.brezillon, devicetree, linux-doc, linux-arm-kernel,
	linux-kernel, linux-crypto, linux-sunxi

Le 15/05/2015 08:52, Herbert Xu a écrit :
> On Thu, May 14, 2015 at 02:59:01PM +0200, LABBE Corentin wrote:
>>
>> +int sun4i_hash_export(struct ahash_request *areq, void *out)
>> +{
>> +	struct sun4i_req_ctx *op = ahash_request_ctx(areq);
>> +
>> +	memcpy(out, op, sizeof(struct sun4i_req_ctx));
>> +	return 0;
>> +}
>> +
>> +int sun4i_hash_import(struct ahash_request *areq, const void *in)
>> +{
>> +	struct sun4i_req_ctx *op = ahash_request_ctx(areq);
>> +
>> +	memcpy(op, in, sizeof(struct sun4i_req_ctx));
> 
> This is very wrong.  You're importing an arbitrary ss pointer.  The
> whole point of having an import function instead of just a simple
> memcpy is to deal with such problems.
> 
> Cheers,
> 

Hello

You are right, I have totally mis-understood the export/import functions by just reading them from crypto/md5.c.
Reading them from drivers/crypto/* is better.

Incidental question, I need to use the MD5 IV for export_md5 function, but they are not defined anywhere (unlike SHAxx ones), does this is voluntary or do you will accept a patch for adding them.

Regards


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [linux-sunxi] Re: [PATCH v9 4/4] crypto: Add Allwinner Security System crypto accelerator
  2015-05-16 18:09     ` [linux-sunxi] " Corentin LABBE
@ 2015-05-17  1:32       ` Herbert Xu
  0 siblings, 0 replies; 26+ messages in thread
From: Herbert Xu @ 2015-05-17  1:32 UTC (permalink / raw)
  To: Corentin LABBE
  Cc: robh+dt, pawel.moll, mark.rutland, ijc+devicetree, galak,
	maxime.ripard, linux, davem, akpm, gregkh, mchehab, joe, tj,
	arnd, boris.brezillon, devicetree, linux-doc, linux-arm-kernel,
	linux-kernel, linux-crypto, linux-sunxi

On Sat, May 16, 2015 at 08:09:27PM +0200, Corentin LABBE wrote:
>
> Incidental question, I need to use the MD5 IV for export_md5 function, but they are not defined anywhere (unlike SHAxx ones), does this is voluntary or do you will accept a patch for adding them.

Please send a patch to add them.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v9] crypto: Add Allwinner Security System crypto accelerator
  2015-05-14 12:58 [PATCH v9] crypto: Add Allwinner Security System crypto accelerator LABBE Corentin
                   ` (3 preceding siblings ...)
  2015-05-14 12:59 ` [PATCH v9 4/4] crypto: Add Allwinner Security System crypto accelerator LABBE Corentin
@ 2015-05-17  7:45 ` Boris Brezillon
  4 siblings, 0 replies; 26+ messages in thread
From: Boris Brezillon @ 2015-05-17  7:45 UTC (permalink / raw)
  To: LABBE Corentin
  Cc: robh+dt, pawel.moll, mark.rutland, ijc+devicetree, galak,
	maxime.ripard, linux, herbert, davem, akpm, gregkh, mchehab, joe,
	tj, arnd, devicetree, linux-doc, linux-arm-kernel, linux-kernel,
	linux-crypto, linux-sunxi

Hello Corentin,

On Thu, 14 May 2015 14:58:57 +0200
LABBE Corentin <clabbe.montjoie@gmail.com> wrote:

> 
> Hello
> 
> This is the driver for the Security System included in Allwinner SoC A20.
> The Security System (SS for short) is a hardware cryptographic accelerator that
> support AES/MD5/SHA1/DES/3DES/PRNG algorithms.
> It could be found on others Allwinner SoC: 
> - A10, A10s, A13, A31 and A33 manual give the same datasheet for SS than A20
> - A23 speak about a security system but without precisions
> - A80 and A83T datasheet speak about a security system with more functions
>   (SHA224/SHA256/RSA/CRC), they will be supported in a separate driver
> But I do not have access on any of those hardware, tests are welcome.
> 

Maybe this is just a nitpick, but I would reverse the ordering of
patches in this series. The biggest inconsistency is the fact that
you're adding yourself as a maintainer of something that does not even
exist.

Best Regards,

Boris

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v9 4/4] crypto: Add Allwinner Security System crypto accelerator
  2015-05-14 12:59 ` [PATCH v9 4/4] crypto: Add Allwinner Security System crypto accelerator LABBE Corentin
  2015-05-15  6:49   ` Herbert Xu
  2015-05-15  6:52   ` Herbert Xu
@ 2015-05-17  8:45   ` Boris Brezillon
  2015-05-17 10:34     ` Herbert Xu
  2015-05-23 13:12     ` Corentin LABBE
  2 siblings, 2 replies; 26+ messages in thread
From: Boris Brezillon @ 2015-05-17  8:45 UTC (permalink / raw)
  To: LABBE Corentin
  Cc: robh+dt, pawel.moll, mark.rutland, ijc+devicetree, galak,
	maxime.ripard, linux, herbert, davem, akpm, gregkh, mchehab, joe,
	tj, arnd, devicetree, linux-doc, linux-arm-kernel, linux-kernel,
	linux-crypto, linux-sunxi

Hi Corentin,

I started to review this new version, and I still think there's
something wrong with the way your processing crypto requests.
>From my POV this is not asynchronous at all (see my comments inline),
but maybe Herbert can confirm that.
I haven't reviewed the hash part yet, but I guess it has the same
problem.

Best Regards,

Boris

On Thu, 14 May 2015 14:59:01 +0200
LABBE Corentin <clabbe.montjoie@gmail.com> wrote:

> Add support for the Security System included in Allwinner SoC A20.
> The Security System is a hardware cryptographic accelerator that support:
> - MD5 and SHA1 hash algorithms
> - AES block cipher in CBC/ECB mode with 128/196/256bits keys.
> - DES and 3DES block cipher in CBC/ECB mode
> 
> Signed-off-by: LABBE Corentin <clabbe.montjoie@gmail.com>
> ---

[...]

> +
> +int sun4i_ss_aes_poll(struct ablkcipher_request *areq, u32 mode)
> +{
> +	u32 spaces;
> +	struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(areq);
> +	struct sun4i_tfm_ctx *op = crypto_ablkcipher_ctx(tfm);
> +	struct sun4i_ss_ctx *ss = op->ss;
> +	unsigned int ivsize = crypto_ablkcipher_ivsize(tfm);
> +	/* when activating SS, the default FIFO space is 32 */
> +	u32 rx_cnt = 32;
> +	u32 tx_cnt = 0;
> +	u32 v;
> +	int i, sgnum, err = 0;
> +	unsigned int ileft = areq->nbytes;
> +	unsigned int oleft = areq->nbytes;
> +	unsigned int todo, miter_flag;
> +	unsigned long flags;
> +	struct sg_mapping_iter mi, mo;
> +	unsigned int oi, oo; /* offset for in and out */
> +
> +	if (areq->nbytes == 0)
> +		return 0;
> +
> +	if (areq->info == NULL) {
> +		dev_err(ss->dev, "ERROR: Empty IV\n");
> +		return -EINVAL;
> +	}
> +
> +	if (areq->src == NULL || areq->dst == NULL) {
> +		dev_err(ss->dev, "ERROR: Some SGs are NULL\n");
> +		return -EINVAL;
> +	}
> +
> +	spin_lock_irqsave(&ss->slock, flags);

Do you really need to take this lock so early ?
BTW, what is this lock protecting ?

IMHO, taking a spinlock and disabling irqs for the whole time you're
executing a crypto request is not a good idea (it will prevent all
other irqs from running, and potentially introduce latencies in other
parts of the kernel).

What you can do though is declare the following fields in your crypto
engine struct (sun4i_ss_ctx):
- a crypto request queue (struct crypto_queue [1])
- a crypto_async_request variable storing the request being processed
- a lock protecting the queue and the current request variable

This way you'll only have to take the lock when queuing or dequeuing a
request.

Another comment, your implementation does not seem to be asynchronous at
all: you're blocking the caller until its crypto request is complete.


> +
> +	for (i = 0; i < op->keylen; i += 4)
> +		writel(*(op->key + i / 4), ss->base + SS_KEY0 + i);
> +
> +	if (areq->info != NULL) {
> +		for (i = 0; i < 4 && i < ivsize / 4; i++) {
> +			v = *(u32 *)(areq->info + i * 4);
> +			writel(v, ss->base + SS_IV0 + i * 4);
> +		}
> +	}
> +	writel(mode, ss->base + SS_CTL);
> +
> +	sgnum = sg_nents(areq->src);
> +	if (sgnum == 1)
> +		miter_flag = SG_MITER_FROM_SG | SG_MITER_ATOMIC;
> +	else
> +		miter_flag = SG_MITER_FROM_SG;


Why is the ATOMIC flag depending on the number of sg elements.
IMO it should only depends on the context you're currently in, and in
your specific case, you're always in atomic context since you've taken
a spinlock (and disabled irqs) a few lines above.

Note that with the approach I previously proposed, you can even get rid
of this ATMIC flag (or always set it depending on the context you're in
when dequeuing the crypto requests).


> +	sg_miter_start(&mi, areq->src, sgnum, miter_flag);
> +	sgnum = sg_nents(areq->dst);
> +	if (sgnum == 1)
> +		miter_flag = SG_MITER_TO_SG | SG_MITER_ATOMIC;
> +	else
> +		miter_flag = SG_MITER_TO_SG;

Ditto

> +	sg_miter_start(&mo, areq->dst, sgnum, miter_flag);
> +	sg_miter_next(&mi);
> +	sg_miter_next(&mo);
> +	if (mi.addr == NULL || mo.addr == NULL) {
> +		err = -EINVAL;
> +		goto release_ss;
> +	}
> +
> +	ileft = areq->nbytes / 4;
> +	oleft = areq->nbytes / 4;
> +	oi = 0;
> +	oo = 0;
> +	do {
> +		todo = min3(rx_cnt, ileft, (mi.length - oi) / 4);
> +		if (todo > 0) {
> +			ileft -= todo;
> +			writesl(ss->base + SS_RXFIFO, mi.addr + oi, todo);

Is there anything guaranteeing that your pointer is aligned on a 4 byte
boundary ? If that's not the case, I guess you should copy it in a
temporary variable before using writesl.

> +			oi += todo * 4;
> +		}
> +		if (oi == mi.length) {
> +			sg_miter_next(&mi);
> +			oi = 0;
> +		}
> +
> +		ispaces = readl_relaxed(ss->base + SS_FCSR);

Is there a good reason for using the _relaxed version of readl/writel
(the same comment applies a few lines below) ?

> +		rx_cnt = SS_RXFIFO_SPACES(spaces);
> +		tx_cnt = SS_TXFIFO_SPACES(spaces);
> +
> +		todo = min3(tx_cnt, oleft, (mo.length - oo) / 4);
> +		if (todo > 0) {
> +			oleft -= todo;
> +			readsl(ss->base + SS_TXFIFO, mo.addr + oo, todo);
> +			oo += todo * 4;
> +		}
> +		if (oo == mo.length) {
> +			sg_miter_next(&mo);
> +			oo = 0;
> +		}
> +	} while (mo.length > 0);
> +
> +release_ss:
> +	sg_miter_stop(&mi);
> +	sg_miter_stop(&mo);
> +	writel_relaxed(0, ss->base + SS_CTL);

Ditto.

> +	spin_unlock_irqrestore(&ss->slock, flags);
> +	return err;
> +}
> +
> +/* Pure CPU driven way of doing DES/3DES with SS */
> +int sun4i_ss_des_poll(struct ablkcipher_request *areq, u32 mode)
> +{
> +	struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(areq);
> +	struct sun4i_tfm_ctx *op = crypto_ablkcipher_ctx(tfm);
> +	struct sun4i_ss_ctx *ss = op->ss;
> +	int i, err = 0;
> +	int no_chunk = 1;
> +	struct scatterlist *in_sg = areq->src;
> +	struct scatterlist *out_sg = areq->dst;
> +	u8 kkey[256 / 8];
> +
> +	if (areq->nbytes == 0)
> +		return 0;
> +
> +	if (areq->info == NULL) {
> +		dev_err(ss->dev, "ERROR: Empty IV\n");
> +		return -EINVAL;
> +	}
> +
> +	if (areq->src == NULL || areq->dst == NULL) {
> +		dev_err(ss->dev, "ERROR: Some SGs are NULL\n");
> +		return -EINVAL;
> +	}
> +
> +	/*
> +	 * if we have only SGs with size multiple of 4,
> +	 * we can use the SS AES function
> +	 */
> +	while (in_sg != NULL && no_chunk == 1) {
> +		if ((in_sg->length % 4) != 0)
> +			no_chunk = 0;
> +		in_sg = sg_next(in_sg);
> +	}
> +	while (out_sg != NULL && no_chunk == 1) {
> +		if ((out_sg->length % 4) != 0)
> +			no_chunk = 0;
> +		out_sg = sg_next(out_sg);
> +	}
> +
> +	if (no_chunk == 1)
> +		return sun4i_ss_aes_poll(areq, mode);
> +
> +	/*
> +	 * if some SG are not multiple of 4bytes use a fallback
> +	 * it is much easy and clean
> +	 */

Hm, is this really the best solution. I mean, you can easily pack
values from non aligned sg buffers so that you have only a 4 byte
aligned buffers.
Moreover, by doing this you'll end up with a single
sun4i_ss_ablkcipher_poll function.

BTW, I wonder if there's anything preventing an AES crypto request to be
forged the same way DES/3DES requests are (sg entries not aligned on 4
bytes boundary).

> +	ablkcipher_request_set_tfm(areq, op->fallback);
> +	for (i = 0; i < op->keylen; i++)
> +		*(u32 *)(kkey + i * 4) = op->key[i];
> +
> +	err = crypto_ablkcipher_setkey(op->fallback, kkey, op->keylen);
> +	if (err != 0) {
> +		dev_err(ss->dev, "Cannot set key on fallback\n");
> +		return -EINVAL;
> +	}
> +
> +	if ((mode & SS_DECRYPTION) == SS_DECRYPTION)
> +		err = crypto_ablkcipher_decrypt(areq);
> +	else
> +		err = crypto_ablkcipher_encrypt(areq);
> +	ablkcipher_request_set_tfm(areq, tfm);
> +	return err;
> +}

[1]http://lxr.free-electrons.com/source/include/crypto/algapi.h#L68

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v9 4/4] crypto: Add Allwinner Security System crypto accelerator
  2015-05-17  8:45   ` Boris Brezillon
@ 2015-05-17 10:34     ` Herbert Xu
  2015-05-17 10:48       ` Boris Brezillon
  2015-05-23 13:12     ` Corentin LABBE
  1 sibling, 1 reply; 26+ messages in thread
From: Herbert Xu @ 2015-05-17 10:34 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: LABBE Corentin, robh+dt, pawel.moll, mark.rutland,
	ijc+devicetree, galak, maxime.ripard, linux, davem, akpm, gregkh,
	mchehab, joe, tj, arnd, devicetree, linux-doc, linux-arm-kernel,
	linux-kernel, linux-crypto, linux-sunxi

On Sun, May 17, 2015 at 10:45:08AM +0200, Boris Brezillon wrote:
>
> From my POV this is not asynchronous at all (see my comments inline),
> but maybe Herbert can confirm that.

Well we don't actually require drivers to be asynchronous.  It is
obviously the preferred method but some hardware may not be capable
of that.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v9 4/4] crypto: Add Allwinner Security System crypto accelerator
  2015-05-17 10:34     ` Herbert Xu
@ 2015-05-17 10:48       ` Boris Brezillon
  2015-05-18  0:41         ` Herbert Xu
  0 siblings, 1 reply; 26+ messages in thread
From: Boris Brezillon @ 2015-05-17 10:48 UTC (permalink / raw)
  To: Herbert Xu
  Cc: LABBE Corentin, robh+dt, pawel.moll, mark.rutland,
	ijc+devicetree, galak, maxime.ripard, linux, davem, akpm, gregkh,
	mchehab, joe, tj, arnd, devicetree, linux-doc, linux-arm-kernel,
	linux-kernel, linux-crypto, linux-sunxi

Hi Herbert,

On Sun, 17 May 2015 18:34:40 +0800
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> On Sun, May 17, 2015 at 10:45:08AM +0200, Boris Brezillon wrote:
> >
> > From my POV this is not asynchronous at all (see my comments inline),
> > but maybe Herbert can confirm that.
> 
> Well we don't actually require drivers to be asynchronous.  It is
> obviously the preferred method but some hardware may not be capable
> of that.

Yep, but then they shouldn't be declared with CRYPTO_ALG_ASYNC and as an
ablkcipher algorithm (*Asynchronous* Block Cipher), right ?

Regarding the capabilities of this crypto engine, from what I've seen,
it is definitely capable of doing asynchronous requests ;-).

Best Regards,

Boris

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v9 4/4] crypto: Add Allwinner Security System crypto accelerator
  2015-05-17 10:48       ` Boris Brezillon
@ 2015-05-18  0:41         ` Herbert Xu
  2015-05-18  7:24           ` Boris Brezillon
  0 siblings, 1 reply; 26+ messages in thread
From: Herbert Xu @ 2015-05-18  0:41 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: LABBE Corentin, robh+dt, pawel.moll, mark.rutland,
	ijc+devicetree, galak, maxime.ripard, linux, davem, akpm, gregkh,
	mchehab, joe, tj, arnd, devicetree, linux-doc, linux-arm-kernel,
	linux-kernel, linux-crypto, linux-sunxi

On Sun, May 17, 2015 at 12:48:11PM +0200, Boris Brezillon wrote:
> 
> Yep, but then they shouldn't be declared with CRYPTO_ALG_ASYNC and as an
> ablkcipher algorithm (*Asynchronous* Block Cipher), right ?

Right.  They can still use ablkcipher but should clear the ASYNC
bit.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v9 4/4] crypto: Add Allwinner Security System crypto accelerator
  2015-05-18  0:41         ` Herbert Xu
@ 2015-05-18  7:24           ` Boris Brezillon
  2015-05-18  8:33             ` Herbert Xu
  0 siblings, 1 reply; 26+ messages in thread
From: Boris Brezillon @ 2015-05-18  7:24 UTC (permalink / raw)
  To: Herbert Xu
  Cc: LABBE Corentin, robh+dt, pawel.moll, mark.rutland,
	ijc+devicetree, galak, maxime.ripard, linux, davem, akpm, gregkh,
	mchehab, joe, tj, arnd, devicetree, linux-doc, linux-arm-kernel,
	linux-kernel, linux-crypto, linux-sunxi

Hi Herbert,

On Mon, 18 May 2015 08:41:21 +0800
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> On Sun, May 17, 2015 at 12:48:11PM +0200, Boris Brezillon wrote:
> > 
> > Yep, but then they shouldn't be declared with CRYPTO_ALG_ASYNC and as an

                                                                  not ^

> > ablkcipher algorithm (*Asynchronous* Block Cipher), right ?
> 
> Right.  They can still use ablkcipher but should clear the ASYNC
> bit.

Okay, just to be sure, what does "Asynchronous" mean in ablkcipher or
ahash ?
Is it related to the fact that crypto operations can be done in
multiple steps (e.g.: one set_key + several encrypt chunk operations),
or is this something else ?

Best Regards,

Boris

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v9 4/4] crypto: Add Allwinner Security System crypto accelerator
  2015-05-18  7:24           ` Boris Brezillon
@ 2015-05-18  8:33             ` Herbert Xu
  0 siblings, 0 replies; 26+ messages in thread
From: Herbert Xu @ 2015-05-18  8:33 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: LABBE Corentin, robh+dt, pawel.moll, mark.rutland,
	ijc+devicetree, galak, maxime.ripard, linux, davem, akpm, gregkh,
	mchehab, joe, tj, arnd, devicetree, linux-doc, linux-arm-kernel,
	linux-kernel, linux-crypto, linux-sunxi

On Mon, May 18, 2015 at 09:24:49AM +0200, Boris Brezillon wrote:
> 
> Okay, just to be sure, what does "Asynchronous" mean in ablkcipher or
> ahash ?
> Is it related to the fact that crypto operations can be done in
> multiple steps (e.g.: one set_key + several encrypt chunk operations),
> or is this something else ?

It gives you the ability to postpone the processing.  However
it doesn't mean that you have to postpone the processing.  For
instance, all of our sync software algorithms are also available
through the async interface for obvious reasons.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v9 1/4] ARM: sun7i: dt: Add Security System to A20 SoC DTS
  2015-05-15  7:31   ` Maxime Ripard
@ 2015-05-23 12:08     ` Corentin LABBE
  0 siblings, 0 replies; 26+ messages in thread
From: Corentin LABBE @ 2015-05-23 12:08 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: robh+dt, pawel.moll, mark.rutland, ijc+devicetree, galak, linux,
	herbert, davem, akpm, gregkh, mchehab, joe, tj, arnd,
	boris.brezillon, devicetree, linux-doc, linux-arm-kernel,
	linux-kernel, linux-crypto, linux-sunxi

Le 15/05/2015 09:31, Maxime Ripard a écrit :
> On Thu, May 14, 2015 at 02:58:58PM +0200, LABBE Corentin wrote:
>> The Security System is a hardware cryptographic accelerator that support
>> AES/MD5/SHA1/DES/3DES/PRNG algorithms.
>> It could be found on many Allwinner SoC.
>>
>> This patch enable the Security System on the Allwinner A20 SoC Device-tree.
>>
>> Signed-off-by: LABBE Corentin <clabbe.montjoie@gmail.com>
>> ---
>>  arch/arm/boot/dts/sun7i-a20.dtsi | 8 ++++++++
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/arch/arm/boot/dts/sun7i-a20.dtsi b/arch/arm/boot/dts/sun7i-a20.dtsi
>> index fdd1817..9a823ce 100644
>> --- a/arch/arm/boot/dts/sun7i-a20.dtsi
>> +++ b/arch/arm/boot/dts/sun7i-a20.dtsi
>> @@ -679,6 +679,14 @@
>>  			status = "disabled";
>>  		};
>>  
>> +		crypto: crypto-engine@01c15000 {
>> +			compatible = "allwinner,sun4i-a20-crypto";
> 
> This looks wrong. It's either sun4i-a10 or sun7i-a20.
> 
> It looks good otherwise.
> 
> Thanks!
> Maxime
> 
Fixed with sun4i-a10

Thanks


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [linux-sunxi] Re: [PATCH v9 4/4] crypto: Add Allwinner Security System crypto accelerator
  2015-05-15  6:49   ` Herbert Xu
@ 2015-05-23 12:18     ` Corentin LABBE
  2015-05-24  3:30       ` Herbert Xu
  0 siblings, 1 reply; 26+ messages in thread
From: Corentin LABBE @ 2015-05-23 12:18 UTC (permalink / raw)
  To: herbert
  Cc: robh+dt, pawel.moll, mark.rutland, ijc+devicetree, galak,
	maxime.ripard, linux, davem, akpm, gregkh, mchehab, joe, tj,
	arnd, boris.brezillon, devicetree, linux-doc, linux-arm-kernel,
	linux-kernel, linux-crypto, linux-sunxi

Le 15/05/2015 08:49, Herbert Xu a écrit :
> On Thu, May 14, 2015 at 02:59:01PM +0200, LABBE Corentin wrote:
>>
>> +	err = crypto_ablkcipher_setkey(op->fallback, kkey, op->keylen);
>> +	if (err != 0) {
>> +		dev_err(ss->dev, "Cannot set key on fallback\n");
>> +		return -EINVAL;
>> +	}
> 
> You cannot call setkey from an atomic context.  The easiest solution
> is to call setkey in your setkey function instead.
> 
> Cheers,
> 
Fixed

What do you think about adding a BUG_ON(in_atomic()) in crypto_ablkcipher_setkey() ?

Thanks


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v9 4/4] crypto: Add Allwinner Security System crypto accelerator
  2015-05-17  8:45   ` Boris Brezillon
  2015-05-17 10:34     ` Herbert Xu
@ 2015-05-23 13:12     ` Corentin LABBE
  2015-05-23 14:35       ` Boris Brezillon
  1 sibling, 1 reply; 26+ messages in thread
From: Corentin LABBE @ 2015-05-23 13:12 UTC (permalink / raw)
  To: Boris Brezillon, herbert
  Cc: robh+dt, pawel.moll, mark.rutland, ijc+devicetree, galak,
	maxime.ripard, linux, davem, akpm, gregkh, mchehab, joe, tj,
	arnd, devicetree, linux-doc, linux-arm-kernel, linux-kernel,
	linux-crypto, linux-sunxi

Le 17/05/2015 10:45, Boris Brezillon a écrit :
> Hi Corentin,
> 
> I started to review this new version, and I still think there's
> something wrong with the way your processing crypto requests.
> From my POV this is not asynchronous at all (see my comments inline),
> but maybe Herbert can confirm that.
> I haven't reviewed the hash part yet, but I guess it has the same
> problem.

For resuming your conversation with Herbert, I have removed all CRYPTO_ALG_ASYNC flags.

> 
> Best Regards,
> 
> Boris
> 
> On Thu, 14 May 2015 14:59:01 +0200
> LABBE Corentin <clabbe.montjoie@gmail.com> wrote:
> 
>> Add support for the Security System included in Allwinner SoC A20.
>> The Security System is a hardware cryptographic accelerator that support:
>> - MD5 and SHA1 hash algorithms
>> - AES block cipher in CBC/ECB mode with 128/196/256bits keys.
>> - DES and 3DES block cipher in CBC/ECB mode
>>
>> Signed-off-by: LABBE Corentin <clabbe.montjoie@gmail.com>
>> ---
> 
> [...]
> 
>> +
>> +int sun4i_ss_aes_poll(struct ablkcipher_request *areq, u32 mode)
>> +{
>> +	u32 spaces;
>> +	struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(areq);
>> +	struct sun4i_tfm_ctx *op = crypto_ablkcipher_ctx(tfm);
>> +	struct sun4i_ss_ctx *ss = op->ss;
>> +	unsigned int ivsize = crypto_ablkcipher_ivsize(tfm);
>> +	/* when activating SS, the default FIFO space is 32 */
>> +	u32 rx_cnt = 32;
>> +	u32 tx_cnt = 0;
>> +	u32 v;
>> +	int i, sgnum, err = 0;
>> +	unsigned int ileft = areq->nbytes;
>> +	unsigned int oleft = areq->nbytes;
>> +	unsigned int todo, miter_flag;
>> +	unsigned long flags;
>> +	struct sg_mapping_iter mi, mo;
>> +	unsigned int oi, oo; /* offset for in and out */
>> +
>> +	if (areq->nbytes == 0)
>> +		return 0;
>> +
>> +	if (areq->info == NULL) {
>> +		dev_err(ss->dev, "ERROR: Empty IV\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (areq->src == NULL || areq->dst == NULL) {
>> +		dev_err(ss->dev, "ERROR: Some SGs are NULL\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	spin_lock_irqsave(&ss->slock, flags);
> 
> Do you really need to take this lock so early ?
> BTW, what is this lock protecting ?

As I have written in the header, the spinlock protect the usage of the device.
In this case, I need to lock just before writing the key in the device.

> 
> IMHO, taking a spinlock and disabling irqs for the whole time you're
> executing a crypto request is not a good idea (it will prevent all
> other irqs from running, and potentially introduce latencies in other
> parts of the kernel).

Since crypto operation could be called by software interrupt, I need to disable them.
(Confirmed by http://www.makelinux.net/ldd3/chp-5-sect-5 5.5.3)

> 
> What you can do though is declare the following fields in your crypto
> engine struct (sun4i_ss_ctx):
> - a crypto request queue (struct crypto_queue [1])
> - a crypto_async_request variable storing the request being processed
> - a lock protecting the queue and the current request variable
> 
> This way you'll only have to take the lock when queuing or dequeuing a
> request.
> 
> Another comment, your implementation does not seem to be asynchronous at
> all: you're blocking the caller until its crypto request is complete.
> 
> 
>> +
>> +	for (i = 0; i < op->keylen; i += 4)
>> +		writel(*(op->key + i / 4), ss->base + SS_KEY0 + i);
>> +
>> +	if (areq->info != NULL) {
>> +		for (i = 0; i < 4 && i < ivsize / 4; i++) {
>> +			v = *(u32 *)(areq->info + i * 4);
>> +			writel(v, ss->base + SS_IV0 + i * 4);
>> +		}
>> +	}
>> +	writel(mode, ss->base + SS_CTL);
>> +
>> +	sgnum = sg_nents(areq->src);
>> +	if (sgnum == 1)
>> +		miter_flag = SG_MITER_FROM_SG | SG_MITER_ATOMIC;
>> +	else
>> +		miter_flag = SG_MITER_FROM_SG;
> 
> 
> Why is the ATOMIC flag depending on the number of sg elements.
> IMO it should only depends on the context you're currently in, and in
> your specific case, you're always in atomic context since you've taken
> a spinlock (and disabled irqs) a few lines above.
> 
> Note that with the approach I previously proposed, you can even get rid
> of this ATMIC flag (or always set it depending on the context you're in
> when dequeuing the crypto requests).
> 

For sg_miter, the ATOMIC flag control the usage of kmap vs kmap_atomic.
Since kmap_atomic must not be held too long, I use them only for short crypto operation.
But I need to rebench for be sure that using kmap_atomic give faster result.
If I keep that, I will add a comment explaining that.

> 
>> +	sg_miter_start(&mi, areq->src, sgnum, miter_flag);
>> +	sgnum = sg_nents(areq->dst);
>> +	if (sgnum == 1)
>> +		miter_flag = SG_MITER_TO_SG | SG_MITER_ATOMIC;
>> +	else
>> +		miter_flag = SG_MITER_TO_SG;
> 
> Ditto
> 
>> +	sg_miter_start(&mo, areq->dst, sgnum, miter_flag);
>> +	sg_miter_next(&mi);
>> +	sg_miter_next(&mo);
>> +	if (mi.addr == NULL || mo.addr == NULL) {
>> +		err = -EINVAL;
>> +		goto release_ss;
>> +	}
>> +
>> +	ileft = areq->nbytes / 4;
>> +	oleft = areq->nbytes / 4;
>> +	oi = 0;
>> +	oo = 0;
>> +	do {
>> +		todo = min3(rx_cnt, ileft, (mi.length - oi) / 4);
>> +		if (todo > 0) {
>> +			ileft -= todo;
>> +			writesl(ss->base + SS_RXFIFO, mi.addr + oi, todo);
> 
> Is there anything guaranteeing that your pointer is aligned on a 4 byte
> boundary ? If that's not the case, I guess you should copy it in a
> temporary variable before using writesl.

The cra_alignmask is my guarantee.

> 
>> +			oi += todo * 4;
>> +		}
>> +		if (oi == mi.length) {
>> +			sg_miter_next(&mi);
>> +			oi = 0;
>> +		}
>> +
>> +		ispaces = readl_relaxed(ss->base + SS_FCSR);
> 
> Is there a good reason for using the _relaxed version of readl/writel
> (the same comment applies a few lines below) ?

No, it is clearly a remaining of the times where all my read/write was with _relaxed.

> 
>> +		rx_cnt = SS_RXFIFO_SPACES(spaces);
>> +		tx_cnt = SS_TXFIFO_SPACES(spaces);
>> +
>> +		todo = min3(tx_cnt, oleft, (mo.length - oo) / 4);
>> +		if (todo > 0) {
>> +			oleft -= todo;
>> +			readsl(ss->base + SS_TXFIFO, mo.addr + oo, todo);
>> +			oo += todo * 4;
>> +		}
>> +		if (oo == mo.length) {
>> +			sg_miter_next(&mo);
>> +			oo = 0;
>> +		}
>> +	} while (mo.length > 0);
>> +
>> +release_ss:
>> +	sg_miter_stop(&mi);
>> +	sg_miter_stop(&mo);
>> +	writel_relaxed(0, ss->base + SS_CTL);
> 
> Ditto.
> 
>> +	spin_unlock_irqrestore(&ss->slock, flags);
>> +	return err;
>> +}
>> +
>> +/* Pure CPU driven way of doing DES/3DES with SS */
>> +int sun4i_ss_des_poll(struct ablkcipher_request *areq, u32 mode)
>> +{
>> +	struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(areq);
>> +	struct sun4i_tfm_ctx *op = crypto_ablkcipher_ctx(tfm);
>> +	struct sun4i_ss_ctx *ss = op->ss;
>> +	int i, err = 0;
>> +	int no_chunk = 1;
>> +	struct scatterlist *in_sg = areq->src;
>> +	struct scatterlist *out_sg = areq->dst;
>> +	u8 kkey[256 / 8];
>> +
>> +	if (areq->nbytes == 0)
>> +		return 0;
>> +
>> +	if (areq->info == NULL) {
>> +		dev_err(ss->dev, "ERROR: Empty IV\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (areq->src == NULL || areq->dst == NULL) {
>> +		dev_err(ss->dev, "ERROR: Some SGs are NULL\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	/*
>> +	 * if we have only SGs with size multiple of 4,
>> +	 * we can use the SS AES function
>> +	 */
>> +	while (in_sg != NULL && no_chunk == 1) {
>> +		if ((in_sg->length % 4) != 0)
>> +			no_chunk = 0;
>> +		in_sg = sg_next(in_sg);
>> +	}
>> +	while (out_sg != NULL && no_chunk == 1) {
>> +		if ((out_sg->length % 4) != 0)
>> +			no_chunk = 0;
>> +		out_sg = sg_next(out_sg);
>> +	}
>> +
>> +	if (no_chunk == 1)
>> +		return sun4i_ss_aes_poll(areq, mode);
>> +
>> +	/*
>> +	 * if some SG are not multiple of 4bytes use a fallback
>> +	 * it is much easy and clean
>> +	 */
> 
> Hm, is this really the best solution. I mean, you can easily pack
> values from non aligned sg buffers so that you have only a 4 byte
> aligned buffers.
> Moreover, by doing this you'll end up with a single
> sun4i_ss_ablkcipher_poll function.
> 
> BTW, I wonder if there's anything preventing an AES crypto request to be
> forged the same way DES/3DES requests are (sg entries not aligned on 4
> bytes boundary).

There are two different problem: chunking and alignment.
The correct alignment is handled by the crypto API with the alignmask, so the driver do not need to care about it.
The chunking is the fact that I can have a SG with a size that is non multiple of 4.
For DES/3DES I handle this problem by using a fallback since DES/3DES was not my priority. (but yes I will handle it in the future)
For AES I have assumed that it cannot happen since no test in tcrypt check for it.
Since all SG I get was always a multiple of 16 (AES BLOCK SIZE) it was a sort of confirmation.

Herbert ? does am I right or a chunking test is missing for cbc(aes) in testmgr.h

> 
>> +	ablkcipher_request_set_tfm(areq, op->fallback);
>> +	for (i = 0; i < op->keylen; i++)
>> +		*(u32 *)(kkey + i * 4) = op->key[i];
>> +
>> +	err = crypto_ablkcipher_setkey(op->fallback, kkey, op->keylen);
>> +	if (err != 0) {
>> +		dev_err(ss->dev, "Cannot set key on fallback\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	if ((mode & SS_DECRYPTION) == SS_DECRYPTION)
>> +		err = crypto_ablkcipher_decrypt(areq);
>> +	else
>> +		err = crypto_ablkcipher_encrypt(areq);
>> +	ablkcipher_request_set_tfm(areq, tfm);
>> +	return err;
>> +}
> 
> [1]http://lxr.free-electrons.com/source/include/crypto/algapi.h#L68
> 

Thanks for your review.

LABBE Corentin


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v9 4/4] crypto: Add Allwinner Security System crypto accelerator
  2015-05-23 13:12     ` Corentin LABBE
@ 2015-05-23 14:35       ` Boris Brezillon
  2015-05-24  3:32         ` Herbert Xu
  2015-06-03 19:06         ` Corentin LABBE
  0 siblings, 2 replies; 26+ messages in thread
From: Boris Brezillon @ 2015-05-23 14:35 UTC (permalink / raw)
  To: Corentin LABBE
  Cc: herbert, robh+dt, pawel.moll, mark.rutland, ijc+devicetree,
	galak, maxime.ripard, linux, davem, akpm, gregkh, mchehab, joe,
	tj, arnd, devicetree, linux-doc, linux-arm-kernel, linux-kernel,
	linux-crypto, linux-sunxi

Hi Corentin,

On Sat, 23 May 2015 15:12:23 +0200
Corentin LABBE <clabbe.montjoie@gmail.com> wrote:

> Le 17/05/2015 10:45, Boris Brezillon a écrit :
> > Hi Corentin,
> > 
> > I started to review this new version, and I still think there's
> > something wrong with the way your processing crypto requests.
> > From my POV this is not asynchronous at all (see my comments inline),
> > but maybe Herbert can confirm that.
> > I haven't reviewed the hash part yet, but I guess it has the same
> > problem.
> 
> For resuming your conversation with Herbert, I have removed all CRYPTO_ALG_ASYNC flags.

Okay. I really think you can easily deal with asynchronous request (I
had a look at the datasheet), but I'll let maintainers decide whether
this is important or not.


> > 
> >> +
> >> +int sun4i_ss_aes_poll(struct ablkcipher_request *areq, u32 mode)
> >> +{
> >> +	u32 spaces;
> >> +	struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(areq);
> >> +	struct sun4i_tfm_ctx *op = crypto_ablkcipher_ctx(tfm);
> >> +	struct sun4i_ss_ctx *ss = op->ss;
> >> +	unsigned int ivsize = crypto_ablkcipher_ivsize(tfm);
> >> +	/* when activating SS, the default FIFO space is 32 */
> >> +	u32 rx_cnt = 32;
> >> +	u32 tx_cnt = 0;
> >> +	u32 v;
> >> +	int i, sgnum, err = 0;
> >> +	unsigned int ileft = areq->nbytes;
> >> +	unsigned int oleft = areq->nbytes;
> >> +	unsigned int todo, miter_flag;
> >> +	unsigned long flags;
> >> +	struct sg_mapping_iter mi, mo;
> >> +	unsigned int oi, oo; /* offset for in and out */
> >> +
> >> +	if (areq->nbytes == 0)
> >> +		return 0;
> >> +
> >> +	if (areq->info == NULL) {
> >> +		dev_err(ss->dev, "ERROR: Empty IV\n");
> >> +		return -EINVAL;
> >> +	}
> >> +
> >> +	if (areq->src == NULL || areq->dst == NULL) {
> >> +		dev_err(ss->dev, "ERROR: Some SGs are NULL\n");
> >> +		return -EINVAL;
> >> +	}
> >> +
> >> +	spin_lock_irqsave(&ss->slock, flags);
> > 
> > Do you really need to take this lock so early ?
> > BTW, what is this lock protecting ?
> 
> As I have written in the header, the spinlock protect the usage of the device.
> In this case, I need to lock just before writing the key in the device.

I'm not denying the fact that you need some locking, just how this
locking is done: you're preventing the all system from receiving
interrupts for the all time your doing your crypto request.

Here is a suggestion (if you still want to keep the synchronous model,
which IMHO is not a good idea, but hey, that's not my call to make).

1/ wait for the device to be ready (using a waitqueue)
2/ take the lock
3/ check if the engine is busy (already handling another crypto
   request).
4.1/ If the engine is not busy, mark the engine as busy, release the
     lock and proceed with the crytpo request handling.
4.2/ If the engine is busy, release the lock and go back to 1/ 


> 
> > 
> > IMHO, taking a spinlock and disabling irqs for the whole time you're
> > executing a crypto request is not a good idea (it will prevent all
> > other irqs from running, and potentially introduce latencies in other
> > parts of the kernel).
> 
> Since crypto operation could be called by software interrupt, I need to disable them.
> (Confirmed by http://www.makelinux.net/ldd3/chp-5-sect-5 5.5.3)

Hm, you're not even using the interrupts provided by the IP to detect
when the engine is ready to accept new data chunks (which is another
aspect that should be addressed IMO), so I don't get why you need to
disable HW interrupts.
If you just want to disable SW interrupts, you can use spin_lock_bh()
instead of spin_lock_irqsave().

> 
> > 
> > What you can do though is declare the following fields in your crypto
> > engine struct (sun4i_ss_ctx):
> > - a crypto request queue (struct crypto_queue [1])
> > - a crypto_async_request variable storing the request being processed
> > - a lock protecting the queue and the current request variable
> > 
> > This way you'll only have to take the lock when queuing or dequeuing a
> > request.
> > 
> > Another comment, your implementation does not seem to be asynchronous at
> > all: you're blocking the caller until its crypto request is complete.
> > 
> > 
> >> +
> >> +	for (i = 0; i < op->keylen; i += 4)
> >> +		writel(*(op->key + i / 4), ss->base + SS_KEY0 + i);
> >> +
> >> +	if (areq->info != NULL) {
> >> +		for (i = 0; i < 4 && i < ivsize / 4; i++) {
> >> +			v = *(u32 *)(areq->info + i * 4);
> >> +			writel(v, ss->base + SS_IV0 + i * 4);
> >> +		}
> >> +	}
> >> +	writel(mode, ss->base + SS_CTL);
> >> +
> >> +	sgnum = sg_nents(areq->src);
> >> +	if (sgnum == 1)
> >> +		miter_flag = SG_MITER_FROM_SG | SG_MITER_ATOMIC;
> >> +	else
> >> +		miter_flag = SG_MITER_FROM_SG;
> > 
> > 
> > Why is the ATOMIC flag depending on the number of sg elements.
> > IMO it should only depends on the context you're currently in, and in
> > your specific case, you're always in atomic context since you've taken
> > a spinlock (and disabled irqs) a few lines above.
> > 
> > Note that with the approach I previously proposed, you can even get rid
> > of this ATMIC flag (or always set it depending on the context you're in
> > when dequeuing the crypto requests).
> > 
> 
> For sg_miter, the ATOMIC flag control the usage of kmap vs kmap_atomic.
> Since kmap_atomic must not be held too long, I use them only for short crypto operation.
> But I need to rebench for be sure that using kmap_atomic give faster result.
> If I keep that, I will add a comment explaining that.

Maybe you can call kmap_atomic when you're in standard context (even if
I'm not sure this is relevant), but you definitely can't call kmap when
you're in atomic context (see the might_sleep() line here [1]). And
since you've taken a spinlock (and disabled all the interrupts) a few
lines above, you're in atomic context here.

> > 
> >> +	sg_miter_start(&mo, areq->dst, sgnum, miter_flag);
> >> +	sg_miter_next(&mi);
> >> +	sg_miter_next(&mo);
> >> +	if (mi.addr == NULL || mo.addr == NULL) {
> >> +		err = -EINVAL;
> >> +		goto release_ss;
> >> +	}
> >> +
> >> +	ileft = areq->nbytes / 4;
> >> +	oleft = areq->nbytes / 4;
> >> +	oi = 0;
> >> +	oo = 0;
> >> +	do {
> >> +		todo = min3(rx_cnt, ileft, (mi.length - oi) / 4);
> >> +		if (todo > 0) {
> >> +			ileft -= todo;
> >> +			writesl(ss->base + SS_RXFIFO, mi.addr + oi, todo);
> > 
> > Is there anything guaranteeing that your pointer is aligned on a 4 byte
> > boundary ? If that's not the case, I guess you should copy it in a
> > temporary variable before using writesl.
> 
> The cra_alignmask is my guarantee.

Right.

> 
> > 
> >> +			oi += todo * 4;
> >> +		}
> >> +		if (oi == mi.length) {
> >> +			sg_miter_next(&mi);
> >> +			oi = 0;
> >> +		}
> >> +
> >> +		ispaces = readl_relaxed(ss->base + SS_FCSR);
> > 
> > Is there a good reason for using the _relaxed version of readl/writel
> > (the same comment applies a few lines below) ?
> 
> No, it is clearly a remaining of the times where all my read/write was with _relaxed.


Okay, then maybe you should reconsider using readl/writel, unless you
really know why you're using relaxed versions.

> > 
> >> +	spin_unlock_irqrestore(&ss->slock, flags);
> >> +	return err;
> >> +}
> >> +
> >> +/* Pure CPU driven way of doing DES/3DES with SS */
> >> +int sun4i_ss_des_poll(struct ablkcipher_request *areq, u32 mode)
> >> +{
> >> +	struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(areq);
> >> +	struct sun4i_tfm_ctx *op = crypto_ablkcipher_ctx(tfm);
> >> +	struct sun4i_ss_ctx *ss = op->ss;
> >> +	int i, err = 0;
> >> +	int no_chunk = 1;
> >> +	struct scatterlist *in_sg = areq->src;
> >> +	struct scatterlist *out_sg = areq->dst;
> >> +	u8 kkey[256 / 8];
> >> +
> >> +	if (areq->nbytes == 0)
> >> +		return 0;
> >> +
> >> +	if (areq->info == NULL) {
> >> +		dev_err(ss->dev, "ERROR: Empty IV\n");
> >> +		return -EINVAL;
> >> +	}
> >> +
> >> +	if (areq->src == NULL || areq->dst == NULL) {
> >> +		dev_err(ss->dev, "ERROR: Some SGs are NULL\n");
> >> +		return -EINVAL;
> >> +	}
> >> +
> >> +	/*
> >> +	 * if we have only SGs with size multiple of 4,
> >> +	 * we can use the SS AES function
> >> +	 */
> >> +	while (in_sg != NULL && no_chunk == 1) {
> >> +		if ((in_sg->length % 4) != 0)
> >> +			no_chunk = 0;
> >> +		in_sg = sg_next(in_sg);
> >> +	}
> >> +	while (out_sg != NULL && no_chunk == 1) {
> >> +		if ((out_sg->length % 4) != 0)
> >> +			no_chunk = 0;
> >> +		out_sg = sg_next(out_sg);
> >> +	}
> >> +
> >> +	if (no_chunk == 1)
> >> +		return sun4i_ss_aes_poll(areq, mode);
> >> +
> >> +	/*
> >> +	 * if some SG are not multiple of 4bytes use a fallback
> >> +	 * it is much easy and clean
> >> +	 */
> > 
> > Hm, is this really the best solution. I mean, you can easily pack
> > values from non aligned sg buffers so that you have only a 4 byte
> > aligned buffers.
> > Moreover, by doing this you'll end up with a single
> > sun4i_ss_ablkcipher_poll function.
> > 
> > BTW, I wonder if there's anything preventing an AES crypto request to be
> > forged the same way DES/3DES requests are (sg entries not aligned on 4
> > bytes boundary).
> 
> There are two different problem: chunking and alignment.
> The correct alignment is handled by the crypto API with the alignmask, so the driver do not need to care about it.

Yes...

> The chunking is the fact that I can have a SG with a size that is non multiple of 4.

and I was takling about this specific aspect here.

> For DES/3DES I handle this problem by using a fallback since DES/3DES was not my priority. (but yes I will handle it in the future)
> For AES I have assumed that it cannot happen since no test in tcrypt check for it.

If that's something you're willing to address in a future version, then
I'm fine with that.

> Since all SG I get was always a multiple of 16 (AES BLOCK SIZE) it was a sort of confirmation.
> 
> Herbert ? does am I right or a chunking test is missing for cbc(aes) in testmgr.h

Okay, just sharing my vision of this thing (I'll let Herbert comment on
this aspect): I'd say that theoretically nothing prevents one from
splitting its sg list in chunks smaller than the block size, so I'd
say you should use the same trick for AES. 



[1]http://lxr.free-electrons.com/source/arch/arm/mm/highmem.c#L37

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [linux-sunxi] Re: [PATCH v9 4/4] crypto: Add Allwinner Security System crypto accelerator
  2015-05-23 12:18     ` [linux-sunxi] " Corentin LABBE
@ 2015-05-24  3:30       ` Herbert Xu
  0 siblings, 0 replies; 26+ messages in thread
From: Herbert Xu @ 2015-05-24  3:30 UTC (permalink / raw)
  To: Corentin LABBE
  Cc: robh+dt, pawel.moll, mark.rutland, ijc+devicetree, galak,
	maxime.ripard, linux, davem, akpm, gregkh, mchehab, joe, tj,
	arnd, boris.brezillon, devicetree, linux-doc, linux-arm-kernel,
	linux-kernel, linux-crypto, linux-sunxi

On Sat, May 23, 2015 at 02:18:06PM +0200, Corentin LABBE wrote:
>
> What do you think about adding a BUG_ON(in_atomic()) in crypto_ablkcipher_setkey() ?

Just add a might_sleep() to it.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v9 4/4] crypto: Add Allwinner Security System crypto accelerator
  2015-05-23 14:35       ` Boris Brezillon
@ 2015-05-24  3:32         ` Herbert Xu
  2015-05-24  8:04           ` [linux-sunxi] " Corentin LABBE
  2015-06-03 19:06         ` Corentin LABBE
  1 sibling, 1 reply; 26+ messages in thread
From: Herbert Xu @ 2015-05-24  3:32 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Corentin LABBE, robh+dt, pawel.moll, mark.rutland,
	ijc+devicetree, galak, maxime.ripard, linux, davem, akpm, gregkh,
	mchehab, joe, tj, arnd, devicetree, linux-doc, linux-arm-kernel,
	linux-kernel, linux-crypto, linux-sunxi

On Sat, May 23, 2015 at 04:35:36PM +0200, Boris Brezillon wrote:
>
> > Since all SG I get was always a multiple of 16 (AES BLOCK SIZE) it was a sort of confirmation.
> > 
> > Herbert ? does am I right or a chunking test is missing for cbc(aes) in testmgr.h
> 
> Okay, just sharing my vision of this thing (I'll let Herbert comment on
> this aspect): I'd say that theoretically nothing prevents one from
> splitting its sg list in chunks smaller than the block size, so I'd
> say you should use the same trick for AES. 

Indeed, you must be able to handle a block that straddles two SG
entries.  If the hardware is not capable then you have to linearise
it by copying or use a fallback.

I can't believe we don't have a generic test for this in testmgr.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [linux-sunxi] Re: [PATCH v9 4/4] crypto: Add Allwinner Security System crypto accelerator
  2015-05-24  3:32         ` Herbert Xu
@ 2015-05-24  8:04           ` Corentin LABBE
  2015-05-24  8:12             ` Herbert Xu
  0 siblings, 1 reply; 26+ messages in thread
From: Corentin LABBE @ 2015-05-24  8:04 UTC (permalink / raw)
  To: herbert, Boris Brezillon
  Cc: robh+dt, pawel.moll, mark.rutland, ijc+devicetree, galak,
	maxime.ripard, linux, davem, akpm, gregkh, mchehab, joe, tj,
	arnd, devicetree, linux-doc, linux-arm-kernel, linux-kernel,
	linux-crypto, linux-sunxi

Le 24/05/2015 05:32, Herbert Xu a écrit :
> On Sat, May 23, 2015 at 04:35:36PM +0200, Boris Brezillon wrote:
>>
>>> Since all SG I get was always a multiple of 16 (AES BLOCK SIZE) it was a sort of confirmation.
>>>
>>> Herbert ? does am I right or a chunking test is missing for cbc(aes) in testmgr.h
>>
>> Okay, just sharing my vision of this thing (I'll let Herbert comment on
>> this aspect): I'd say that theoretically nothing prevents one from
>> splitting its sg list in chunks smaller than the block size, so I'd
>> say you should use the same trick for AES. 
> 
> Indeed, you must be able to handle a block that straddles two SG
> entries.  If the hardware is not capable then you have to linearise
> it by copying or use a fallback.
> 
> I can't believe we don't have a generic test for this in testmgr.
> 

For aes_cbc it exists a test with 3 SG with .tap = { 496 - 20, 4, 16 }
But my driver handle that. (multiple of 4)

What do you think about adding a test with 16 SG of 1 byte ? (or 3 + 2 + 3 + 8 * 1)

Furthermore, I need to re-read testmgr.h but it seems that no aes test exists with odd SG size.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [linux-sunxi] Re: [PATCH v9 4/4] crypto: Add Allwinner Security System crypto accelerator
  2015-05-24  8:04           ` [linux-sunxi] " Corentin LABBE
@ 2015-05-24  8:12             ` Herbert Xu
  0 siblings, 0 replies; 26+ messages in thread
From: Herbert Xu @ 2015-05-24  8:12 UTC (permalink / raw)
  To: Corentin LABBE
  Cc: Boris Brezillon, robh+dt, pawel.moll, mark.rutland,
	ijc+devicetree, galak, maxime.ripard, linux, davem, akpm, gregkh,
	mchehab, joe, tj, arnd, devicetree, linux-doc, linux-arm-kernel,
	linux-kernel, linux-crypto, linux-sunxi

On Sun, May 24, 2015 at 10:04:16AM +0200, Corentin LABBE wrote:
> 
> For aes_cbc it exists a test with 3 SG with .tap = { 496 - 20, 4, 16 }
> But my driver handle that. (multiple of 4)
> 
> What do you think about adding a test with 16 SG of 1 byte ? (or 3 + 2 + 3 + 8 * 1)

Sure please send a patch.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v9 4/4] crypto: Add Allwinner Security System crypto accelerator
  2015-05-23 14:35       ` Boris Brezillon
  2015-05-24  3:32         ` Herbert Xu
@ 2015-06-03 19:06         ` Corentin LABBE
  1 sibling, 0 replies; 26+ messages in thread
From: Corentin LABBE @ 2015-06-03 19:06 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: herbert, robh+dt, pawel.moll, mark.rutland, ijc+devicetree,
	galak, maxime.ripard, linux, davem, akpm, gregkh, mchehab, joe,
	tj, arnd, devicetree, linux-doc, linux-arm-kernel, linux-kernel,
	linux-crypto, linux-sunxi

Le 23/05/2015 16:35, Boris Brezillon a écrit :
> Hi Corentin,
> 
> On Sat, 23 May 2015 15:12:23 +0200
> Corentin LABBE <clabbe.montjoie@gmail.com> wrote:
> 
>> Le 17/05/2015 10:45, Boris Brezillon a écrit :
>>> Hi Corentin,
>>>
>>> I started to review this new version, and I still think there's
>>> something wrong with the way your processing crypto requests.
>>> From my POV this is not asynchronous at all (see my comments inline),
>>> but maybe Herbert can confirm that.
>>> I haven't reviewed the hash part yet, but I guess it has the same
>>> problem.
>>
>> For resuming your conversation with Herbert, I have removed all CRYPTO_ALG_ASYNC flags.
> 
> Okay. I really think you can easily deal with asynchronous request (I
> had a look at the datasheet), but I'll let maintainers decide whether
> this is important or not.
> 
> 
>>>> +
>>>> +	if (areq->src == NULL || areq->dst == NULL) {
>>>> +		dev_err(ss->dev, "ERROR: Some SGs are NULL\n");
>>>> +		return -EINVAL;
>>>> +	}
>>>> +
>>>> +	spin_lock_irqsave(&ss->slock, flags);
>>>
>>> Do you really need to take this lock so early ?
>>> BTW, what is this lock protecting ?
>>
>> As I have written in the header, the spinlock protect the usage of the device.
>> In this case, I need to lock just before writing the key in the device.
> 
> I'm not denying the fact that you need some locking, just how this
> locking is done: you're preventing the all system from receiving
> interrupts for the all time your doing your crypto request.
> 
> Here is a suggestion (if you still want to keep the synchronous model,
> which IMHO is not a good idea, but hey, that's not my call to make).
> 
> 1/ wait for the device to be ready (using a waitqueue)
> 2/ take the lock
> 3/ check if the engine is busy (already handling another crypto
>    request).
> 4.1/ If the engine is not busy, mark the engine as busy, release the
>      lock and proceed with the crytpo request handling.
> 4.2/ If the engine is busy, release the lock and go back to 1/ 
> 
> 

I have done a version with a crypto_queue and a kthread.
This works perfectly but.. the performance are really really bad.
Never more than 20k request/s when both generic and synchronous SS can go beyond 80k.
With this asynchronous driver, the Security System become useful only with request larger than 2048 bytes
I have patched my driver to create stats on request size, this show that real usage is less than that. (512bytes for cryptoluks for example)

Furthermore, my current patch for using the PRNG cannot be used with the kthread since it use the hwrng interface.
But perhaps by using also a crypto_rng alg it could be done.

So I think that if I want to make the SS driver useful, I cannot set it asynchronous.
Perhaps when the DMA engine will be available, this will change.

I have attached the patch that make my driver asynchronous for comments on possible improvement.

>>
>>>
>>> IMHO, taking a spinlock and disabling irqs for the whole time you're
>>> executing a crypto request is not a good idea (it will prevent all
>>> other irqs from running, and potentially introduce latencies in other
>>> parts of the kernel).
>>
>> Since crypto operation could be called by software interrupt, I need to disable them.
>> (Confirmed by http://www.makelinux.net/ldd3/chp-5-sect-5 5.5.3)
> 
> Hm, you're not even using the interrupts provided by the IP to detect
> when the engine is ready to accept new data chunks (which is another
> aspect that should be addressed IMO), so I don't get why you need to
> disable HW interrupts.
> If you just want to disable SW interrupts, you can use spin_lock_bh()
> instead of spin_lock_irqsave().
> 

Thanks I use spin_lock_bh now.

>>
>>>
>>> What you can do though is declare the following fields in your crypto
>>> engine struct (sun4i_ss_ctx):
>>> - a crypto request queue (struct crypto_queue [1])
>>> - a crypto_async_request variable storing the request being processed
>>> - a lock protecting the queue and the current request variable
>>>
>>> This way you'll only have to take the lock when queuing or dequeuing a
>>> request.
>>>
>>> Another comment, your implementation does not seem to be asynchronous at
>>> all: you're blocking the caller until its crypto request is complete.
>>>
>>>
>>>> +
>>>> +	for (i = 0; i < op->keylen; i += 4)
>>>> +		writel(*(op->key + i / 4), ss->base + SS_KEY0 + i);
>>>> +
>>>> +	if (areq->info != NULL) {
>>>> +		for (i = 0; i < 4 && i < ivsize / 4; i++) {
>>>> +			v = *(u32 *)(areq->info + i * 4);
>>>> +			writel(v, ss->base + SS_IV0 + i * 4);
>>>> +		}
>>>> +	}
>>>> +	writel(mode, ss->base + SS_CTL);
>>>> +
>>>> +	sgnum = sg_nents(areq->src);
>>>> +	if (sgnum == 1)
>>>> +		miter_flag = SG_MITER_FROM_SG | SG_MITER_ATOMIC;
>>>> +	else
>>>> +		miter_flag = SG_MITER_FROM_SG;
>>>
>>>
>>> Why is the ATOMIC flag depending on the number of sg elements.
>>> IMO it should only depends on the context you're currently in, and in
>>> your specific case, you're always in atomic context since you've taken
>>> a spinlock (and disabled irqs) a few lines above.
>>>
>>> Note that with the approach I previously proposed, you can even get rid
>>> of this ATMIC flag (or always set it depending on the context you're in
>>> when dequeuing the crypto requests).
>>>
>>
>> For sg_miter, the ATOMIC flag control the usage of kmap vs kmap_atomic.
>> Since kmap_atomic must not be held too long, I use them only for short crypto operation.
>> But I need to rebench for be sure that using kmap_atomic give faster result.
>> If I keep that, I will add a comment explaining that.
> 
> Maybe you can call kmap_atomic when you're in standard context (even if
> I'm not sure this is relevant), but you definitely can't call kmap when
> you're in atomic context (see the might_sleep() line here [1]). And
> since you've taken a spinlock (and disabled all the interrupts) a few
> lines above, you're in atomic context here.
> 

Thanks, now I use atomic function everywhere it need to.

>>>
>>>> +	sg_miter_start(&mo, areq->dst, sgnum, miter_flag);
>>>> +	sg_miter_next(&mi);
>>>> +	sg_miter_next(&mo);
>>>> +	if (mi.addr == NULL || mo.addr == NULL) {
>>>> +		err = -EINVAL;
>>>> +		goto release_ss;
>>>> +	}
>>>> +
>>>> +	ileft = areq->nbytes / 4;
>>>> +	oleft = areq->nbytes / 4;
>>>> +	oi = 0;
>>>> +	oo = 0;
>>>> +	do {
>>>> +		todo = min3(rx_cnt, ileft, (mi.length - oi) / 4);
>>>> +		if (todo > 0) {
>>>> +			ileft -= todo;
>>>> +			writesl(ss->base + SS_RXFIFO, mi.addr + oi, todo);
>>>
>>> Is there anything guaranteeing that your pointer is aligned on a 4 byte
>>> boundary ? If that's not the case, I guess you should copy it in a
>>> temporary variable before using writesl.
>>
>> The cra_alignmask is my guarantee.
> 
> Right.
> 
>>
>>>
>>>> +			oi += todo * 4;
>>>> +		}
>>>> +		if (oi == mi.length) {
>>>> +			sg_miter_next(&mi);
>>>> +			oi = 0;
>>>> +		}
>>>> +
>>>> +		ispaces = readl_relaxed(ss->base + SS_FCSR);
>>>
>>> Is there a good reason for using the _relaxed version of readl/writel
>>> (the same comment applies a few lines below) ?
>>
>> No, it is clearly a remaining of the times where all my read/write was with _relaxed.
> 
> 
> Okay, then maybe you should reconsider using readl/writel, unless you
> really know why you're using relaxed versions.
> 

Yes I fixed it by not using _relaxed.

>>>
>>>> +	spin_unlock_irqrestore(&ss->slock, flags);
>>>> +	return err;
>>>> +}
>>>> +
>>>> +/* Pure CPU driven way of doing DES/3DES with SS */
>>>> +int sun4i_ss_des_poll(struct ablkcipher_request *areq, u32 mode)
>>>> +{
>>>> +	struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(areq);
>>>> +	struct sun4i_tfm_ctx *op = crypto_ablkcipher_ctx(tfm);
>>>> +	struct sun4i_ss_ctx *ss = op->ss;
>>>> +	int i, err = 0;
>>>> +	int no_chunk = 1;
>>>> +	struct scatterlist *in_sg = areq->src;
>>>> +	struct scatterlist *out_sg = areq->dst;
>>>> +	u8 kkey[256 / 8];
>>>> +
>>>> +	if (areq->nbytes == 0)
>>>> +		return 0;
>>>> +
>>>> +	if (areq->info == NULL) {
>>>> +		dev_err(ss->dev, "ERROR: Empty IV\n");
>>>> +		return -EINVAL;
>>>> +	}
>>>> +
>>>> +	if (areq->src == NULL || areq->dst == NULL) {
>>>> +		dev_err(ss->dev, "ERROR: Some SGs are NULL\n");
>>>> +		return -EINVAL;
>>>> +	}
>>>> +
>>>> +	/*
>>>> +	 * if we have only SGs with size multiple of 4,
>>>> +	 * we can use the SS AES function
>>>> +	 */
>>>> +	while (in_sg != NULL && no_chunk == 1) {
>>>> +		if ((in_sg->length % 4) != 0)
>>>> +			no_chunk = 0;
>>>> +		in_sg = sg_next(in_sg);
>>>> +	}
>>>> +	while (out_sg != NULL && no_chunk == 1) {
>>>> +		if ((out_sg->length % 4) != 0)
>>>> +			no_chunk = 0;
>>>> +		out_sg = sg_next(out_sg);
>>>> +	}
>>>> +
>>>> +	if (no_chunk == 1)
>>>> +		return sun4i_ss_aes_poll(areq, mode);
>>>> +
>>>> +	/*
>>>> +	 * if some SG are not multiple of 4bytes use a fallback
>>>> +	 * it is much easy and clean
>>>> +	 */
>>>
>>> Hm, is this really the best solution. I mean, you can easily pack
>>> values from non aligned sg buffers so that you have only a 4 byte
>>> aligned buffers.
>>> Moreover, by doing this you'll end up with a single
>>> sun4i_ss_ablkcipher_poll function.
>>>
>>> BTW, I wonder if there's anything preventing an AES crypto request to be
>>> forged the same way DES/3DES requests are (sg entries not aligned on 4
>>> bytes boundary).
>>
>> There are two different problem: chunking and alignment.
>> The correct alignment is handled by the crypto API with the alignmask, so the driver do not need to care about it.
> 
> Yes...
> 
>> The chunking is the fact that I can have a SG with a size that is non multiple of 4.
> 
> and I was takling about this specific aspect here.
> 
>> For DES/3DES I handle this problem by using a fallback since DES/3DES was not my priority. (but yes I will handle it in the future)
>> For AES I have assumed that it cannot happen since no test in tcrypt check for it.
> 
> If that's something you're willing to address in a future version, then
> I'm fine with that.
> 
>> Since all SG I get was always a multiple of 16 (AES BLOCK SIZE) it was a sort of confirmation.
>>
>> Herbert ? does am I right or a chunking test is missing for cbc(aes) in testmgr.h
> 
> Okay, just sharing my vision of this thing (I'll let Herbert comment on
> this aspect): I'd say that theoretically nothing prevents one from
> splitting its sg list in chunks smaller than the block size, so I'd
> say you should use the same trick for AES. 
> 

I have rework the cipher functions and now they handle all possibility.
With that DES/3DES is now fully supported without fallback.

Thanks for your time and review

LABBE Corentin

--

diff --git a/drivers/crypto/sunxi-ss/sun4i-ss-cipher.c b/drivers/crypto/sunxi-ss/sun4i-ss-cipher.c
index 01634aa..8f59458 100644
--- a/drivers/crypto/sunxi-ss/sun4i-ss-cipher.c
+++ b/drivers/crypto/sunxi-ss/sun4i-ss-cipher.c
@@ -49,7 +49,7 @@ static int sun4i_ss_opti_poll(struct ablkcipher_request *areq)
 		return -EINVAL;
 	}

-	spin_lock_bh(&ss->slock);
+	/*spin_lock_bh(&ss->slock);*/

 	for (i = 0; i < op->keylen; i += 4)
 		writel(*(op->key + i / 4), ss->base + SS_KEY0 + i);
@@ -110,12 +110,12 @@ release_ss:
 	sg_miter_stop(&mi);
 	sg_miter_stop(&mo);
 	writel(0, ss->base + SS_CTL);
-	spin_unlock_bh(&ss->slock);
+	/*spin_unlock_bh(&ss->slock);*/
 	return err;
 }

 /* Generic function that support SG with size not multiple of 4 */
-static int sun4i_ss_cipher_poll(struct ablkcipher_request *areq)
+int sun4i_ss_cipher_poll(struct ablkcipher_request *areq)
 {
 	struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(areq);
 	struct sun4i_tfm_ctx *op = crypto_ablkcipher_ctx(tfm);
@@ -174,7 +174,7 @@ static int sun4i_ss_cipher_poll(struct ablkcipher_request *areq)
 	if (no_chunk == 1)
 		return sun4i_ss_opti_poll(areq);

-	spin_lock_bh(&ss->slock);
+	/*spin_lock_bh(&ss->slock);*/

 	for (i = 0; i < op->keylen; i += 4)
 		writel(*(op->key + i / 4), ss->base + SS_KEY0 + i);
@@ -295,7 +295,7 @@ release_ss:
 	sg_miter_stop(&mi);
 	sg_miter_stop(&mo);
 	writel(0, ss->base + SS_CTL);
-	spin_unlock_bh(&ss->slock);
+	/*spin_unlock_bh(&ss->slock);*/

 	return err;
 }
@@ -309,6 +309,7 @@ int sun4i_ss_cbc_aes_encrypt(struct ablkcipher_request *areq)

 	rctx->mode = SS_OP_AES | SS_CBC | SS_ENABLED | SS_ENCRYPTION |
 		op->keymode;
+	return sun4i_ss_enqueue(&areq->base);
 	return sun4i_ss_cipher_poll(areq);
 }

@@ -320,6 +321,7 @@ int sun4i_ss_cbc_aes_decrypt(struct ablkcipher_request *areq)

 	rctx->mode = SS_OP_AES | SS_CBC | SS_ENABLED | SS_DECRYPTION |
 		op->keymode;
+	return sun4i_ss_enqueue(&areq->base);
 	return sun4i_ss_cipher_poll(areq);
 }

diff --git a/drivers/crypto/sunxi-ss/sun4i-ss-core.c b/drivers/crypto/sunxi-ss/sun4i-ss-core.c
index f3a410a..4b8dcef 100644
--- a/drivers/crypto/sunxi-ss/sun4i-ss-core.c
+++ b/drivers/crypto/sunxi-ss/sun4i-ss-core.c
@@ -22,10 +22,79 @@
 #include <linux/scatterlist.h>
 #include <linux/interrupt.h>
 #include <linux/delay.h>
+#include <linux/kthread.h>

 #include "sun4i-ss.h"

-static struct sun4i_ss_alg_template driver_algs[] = {
+int sun4i_ss_enqueue(struct crypto_async_request *areq)
+{
+	struct ablkcipher_request *abreq = ablkcipher_request_cast(areq);
+	struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(abreq);
+	struct sun4i_tfm_ctx *op = crypto_ablkcipher_ctx(tfm);
+	int ret;
+
+	spin_lock_bh(&op->ss->slock);
+	ret = crypto_enqueue_request(&op->ss->queue, areq);
+	spin_unlock_bh(&op->ss->slock);
+	if (ret != -EINPROGRESS)
+		return ret;
+
+	wake_up_process(op->ss->thread);
+
+	return -EINPROGRESS;
+}
+
+static int sun4i_ss_thread(void *data) {
+	struct crypto_async_request *backlog;
+        struct crypto_async_request *arq;
+	struct sun4i_ss_ctx *ss = data;
+	u32 rtype;
+	struct ablkcipher_request *areq;
+
+	int ret;
+
+	do {
+                __set_current_state(TASK_INTERRUPTIBLE);
+		spin_lock_bh(&ss->slock);
+		backlog = crypto_get_backlog(&ss->queue);
+		arq = crypto_dequeue_request(&ss->queue);
+		spin_unlock_bh(&ss->slock);
+
+		if (backlog)
+			backlog->complete(backlog, -EINPROGRESS);
+
+		if (arq) {
+			rtype = crypto_tfm_alg_type(arq->tfm);
+			switch (rtype) {
+			/*
+			case CRYPTO_ALG_TYPE_AHASH:
+				struct ahash_request *areq = ahash_request_cast(arq);
+				ret = -1;
+				arq->complete(arq, ret);
+				break;
+			*/
+			case CRYPTO_ALG_TYPE_ABLKCIPHER:
+				areq = ablkcipher_request_cast(arq);
+				ret = sun4i_ss_cipher_poll(areq);
+				/*pr_info("task cipher %d %d %d %u\n", ret,
+						sg_nents(areq->src), sg_nents(areq->dst), areq->nbytes);*/
+				/* we are in a thread and complete must be called with softirq off */
+				local_bh_disable();
+				arq->complete(arq, ret);
+				local_bh_enable();
+				break;
+			default:
+				dev_err(ss->dev, "ERROR: invalid request\n");
+				arq->complete(arq, -EINVAL);
+			}
+		} else {
+			schedule();
+		}
+	} while (!kthread_should_stop());
+	return 0;
+}
+
+static struct sun4i_ss_alg_template driver_algs[] = {/*
 {       .type = CRYPTO_ALG_TYPE_AHASH,
 	.alg.hash = {
 		.init = sun4i_hash_init,
@@ -77,14 +146,14 @@ static struct sun4i_ss_alg_template driver_algs[] = {
 			}
 		}
 	}
-},
+},*/
 {       .type = CRYPTO_ALG_TYPE_ABLKCIPHER,
 	.alg.crypto = {
 		.cra_name = "cbc(aes)",
 		.cra_driver_name = "cbc-aes-sun4i-ss",
 		.cra_priority = 300,
 		.cra_blocksize = AES_BLOCK_SIZE,
-		.cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER,
+		.cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER | CRYPTO_ALG_ASYNC,
 		.cra_ctxsize = sizeof(struct sun4i_tfm_ctx),
 		.cra_module = THIS_MODULE,
 		.cra_alignmask = 3,
@@ -99,7 +168,7 @@ static struct sun4i_ss_alg_template driver_algs[] = {
 			.decrypt        = sun4i_ss_cbc_aes_decrypt,
 		}
 	}
-},
+},/*
 {       .type = CRYPTO_ALG_TYPE_ABLKCIPHER,
 	.alg.crypto = {
 		.cra_name = "ecb(aes)",
@@ -208,7 +277,7 @@ static struct sun4i_ss_alg_template driver_algs[] = {
 				.decrypt        = sun4i_ss_ecb_des3_decrypt,
 		}
 	}
-},
+},*/
 };

 static int sun4i_ss_probe(struct platform_device *pdev)
@@ -313,8 +382,16 @@ static int sun4i_ss_probe(struct platform_device *pdev)

 	ss->dev = &pdev->dev;

+	crypto_init_queue(&ss->queue, 50);
+
 	spin_lock_init(&ss->slock);

+	ss->thread = kthread_run(sun4i_ss_thread, ss, "sun4i_sskd");
+	if (IS_ERR(ss->thread)) {
+		err = PTR_ERR(ss->thread);
+		goto error_thread;
+	}
+
 	for (i = 0; i < ARRAY_SIZE(driver_algs); i++) {
 		driver_algs[i].ss = ss;
 		switch (driver_algs[i].type) {
@@ -347,6 +424,8 @@ error_alg:
 			break;
 		}
 	}
+error_thread:
+	kthread_stop(ss->thread);
 error_clk:
 	clk_disable_unprepare(ss->ssclk);
 error_ssclk:
@@ -359,6 +438,8 @@ static int sun4i_ss_remove(struct platform_device *pdev)
 	int i;
 	struct sun4i_ss_ctx *ss = platform_get_drvdata(pdev);

+	kthread_stop(ss->thread);
+
 	sun4i_ss_hwrng_remove(&ss->hwrng);

 	for (i = 0; i < ARRAY_SIZE(driver_algs); i++) {
diff --git a/drivers/crypto/sunxi-ss/sun4i-ss.h b/drivers/crypto/sunxi-ss/sun4i-ss.h
index 2185a05..8cdf00a 100644
--- a/drivers/crypto/sunxi-ss/sun4i-ss.h
+++ b/drivers/crypto/sunxi-ss/sun4i-ss.h
@@ -132,6 +132,8 @@ struct sun4i_ss_ctx {
 	struct device *dev;
 	struct resource *res;
 	spinlock_t slock; /* control the use of the device */
+	struct crypto_queue queue;
+	struct task_struct *thread;
 	struct hwrng hwrng;
 	u32 seed[SS_SEED_LEN / 4];
 };
@@ -165,6 +167,9 @@ struct sun4i_req_ctx {
 	struct sun4i_ss_ctx *ss;
 };

+int sun4i_ss_enqueue(struct crypto_async_request *areq);
+int sun4i_ss_cipher_poll(struct ablkcipher_request *areq);
+
 int sun4i_hash_crainit(struct crypto_tfm *tfm);
 int sun4i_hash_init(struct ahash_request *areq);
 int sun4i_hash_update(struct ahash_request *areq);

^ permalink raw reply related	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2015-06-03 19:06 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-14 12:58 [PATCH v9] crypto: Add Allwinner Security System crypto accelerator LABBE Corentin
2015-05-14 12:58 ` [PATCH v9 1/4] ARM: sun7i: dt: Add Security System to A20 SoC DTS LABBE Corentin
2015-05-15  7:31   ` Maxime Ripard
2015-05-23 12:08     ` Corentin LABBE
2015-05-14 12:58 ` [PATCH v9 2/4] ARM: sun4i: dt: Add DT bindings documentation for SUN4I Security System LABBE Corentin
2015-05-14 12:59 ` [PATCH v9 3/4] MAINTAINERS: Add myself as maintainer of Allwinner " LABBE Corentin
2015-05-14 12:59 ` [PATCH v9 4/4] crypto: Add Allwinner Security System crypto accelerator LABBE Corentin
2015-05-15  6:49   ` Herbert Xu
2015-05-23 12:18     ` [linux-sunxi] " Corentin LABBE
2015-05-24  3:30       ` Herbert Xu
2015-05-15  6:52   ` Herbert Xu
2015-05-16 18:09     ` [linux-sunxi] " Corentin LABBE
2015-05-17  1:32       ` Herbert Xu
2015-05-17  8:45   ` Boris Brezillon
2015-05-17 10:34     ` Herbert Xu
2015-05-17 10:48       ` Boris Brezillon
2015-05-18  0:41         ` Herbert Xu
2015-05-18  7:24           ` Boris Brezillon
2015-05-18  8:33             ` Herbert Xu
2015-05-23 13:12     ` Corentin LABBE
2015-05-23 14:35       ` Boris Brezillon
2015-05-24  3:32         ` Herbert Xu
2015-05-24  8:04           ` [linux-sunxi] " Corentin LABBE
2015-05-24  8:12             ` Herbert Xu
2015-06-03 19:06         ` Corentin LABBE
2015-05-17  7:45 ` [PATCH v9] " Boris Brezillon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).