linux-crypto.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/11] add 842 hw compression for PowerNV platform
@ 2015-04-07 17:34 Dan Streetman
  2015-04-07 17:34 ` [PATCH 01/11] powerpc: export of_get_ibm_chip_id function Dan Streetman
                   ` (11 more replies)
  0 siblings, 12 replies; 46+ messages in thread
From: Dan Streetman @ 2015-04-07 17:34 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras
  Cc: Seth Jennings, Robert Jennings, linux-crypto, linuxppc-dev,
	linux-kernel, Dan Streetman

IBM PowerPC processors starting at version P7+ contain a NX coprocessor that
provides various hw-accelerated functions, one of which is memory compression
to the IBM "842" compression format.  This NX-842 coprocessor is already
supported on the pSeries platform, by the nx-842.c driver and the crypto
compression interface at crypto/842.c.  This patch set adds support for NX-842
on the PowerNV (Non-Virtualized) platform.

The existing pSeries platform NX-842 driver could not be re-used for the
PowerNV platform driver, as there are fundamentally different interfaces;
on pSeries the system hypervisor (pHyp) provides the interface and manages
communication with the coprocessor, while on PowerNV the kernel talks directly
to the coprocessor using the ICSWX instruction.  The data structures used to
describe each compression or decompression request to the coprocessor are
also different between pHyp's interface and direct communication with ICSWX.
So, different drivers for pSeries and PowerNV are required.  Adding the new
PowerNV driver but keeping the interface to the drivers the same required
adding a new common frontend interface, to which only one of the platform
drivers will connect (based on what platform the kernel is currently running
on), and moving some functionality out of the existing pSeries driver into a
more common location.  Also, the crypto/842.c interface to the NX-842 hw
driver is modified to be able to handle any alignment or length input or
output buffer; currently with the pSeries driver only page-size and
page-aligned (uncompressed) buffers are possible.

The result is a crypto 842 interface that allows using any input and output
buffers (i.e. any alignment and length) to communicate with the NX-842
hardware on either the pSeries or PowerNV platforms, as well as a generic
842 software decompressor that the crypto 842 interface falls back to if the
NX-842 hardware fails and/or returns error during decompression.

Finally, this also adds a generic crypto compression selftest module, that
can verify correct compression/decompression cycles using variable alignment
and length buffers, multiple threads, and can calculate the throughput.

Dan Streetman (11):
  powerpc: export of_get_ibm_chip_id function
  powerpc: Add ICSWX instruction
  crypto: add software 842 decompression
  drivers/crypto/nx: move nx-842.c to nx-842-pseries.c
  drivers/crypto/nx: add NX-842 platform frontend driver
  drivers/crypto/nx: add nx842 constraints
  drivers/crypto/nx: add PowerNV platform NX-842 driver
  drivers/crypto/nx: simplify pSeries nx842 driver
  crypto: remove LZO fallback from crypto 842
  crypto: rewrite crypto 842 to use nx842 constraints
  crypto: add crypto compression sefltest

 MAINTAINERS                           |    5 +-
 arch/powerpc/include/asm/icswx.h      |  184 ++++
 arch/powerpc/include/asm/ppc-opcode.h |   13 +
 arch/powerpc/kernel/prom.c            |    1 +
 crypto/842.c                          |  495 ++++++++--
 crypto/Kconfig                        |   13 +-
 crypto/Makefile                       |    1 +
 crypto/comp_selftest.c                |  928 +++++++++++++++++++
 drivers/crypto/Kconfig                |    6 +-
 drivers/crypto/nx/Kconfig             |   43 +-
 drivers/crypto/nx/Makefile            |    4 +
 drivers/crypto/nx/nx-842-powernv.c    |  623 +++++++++++++
 drivers/crypto/nx/nx-842-pseries.c    | 1126 +++++++++++++++++++++++
 drivers/crypto/nx/nx-842.c            | 1623 +++------------------------------
 drivers/crypto/nx/nx-842.h            |  131 +++
 include/linux/nx842.h                 |   17 +-
 include/linux/sw842.h                 |    7 +
 lib/842/842_decompress.c              |  413 +++++++++
 lib/842/Makefile                      |    1 +
 lib/Kconfig                           |    3 +
 lib/Makefile                          |    1 +
 21 files changed, 4001 insertions(+), 1637 deletions(-)
 create mode 100644 arch/powerpc/include/asm/icswx.h
 create mode 100644 crypto/comp_selftest.c
 create mode 100644 drivers/crypto/nx/nx-842-powernv.c
 create mode 100644 drivers/crypto/nx/nx-842-pseries.c
 create mode 100644 drivers/crypto/nx/nx-842.h
 create mode 100644 include/linux/sw842.h
 create mode 100644 lib/842/842_decompress.c
 create mode 100644 lib/842/Makefile

-- 
2.1.0

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 01/11] powerpc: export of_get_ibm_chip_id function
  2015-04-07 17:34 [PATCH 00/11] add 842 hw compression for PowerNV platform Dan Streetman
@ 2015-04-07 17:34 ` Dan Streetman
  2015-04-07 17:34 ` [PATCH 02/11] powerpc: Add ICSWX instruction Dan Streetman
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 46+ messages in thread
From: Dan Streetman @ 2015-04-07 17:34 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras
  Cc: Seth Jennings, Robert Jennings, Rob Herring, Anton Blanchard,
	linux-crypto, linuxppc-dev, linux-kernel, Dan Streetman

Export the of_get_ibm_chip_id() function.  This will be used by the
PowerNV NX-842 driver.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
---
 arch/powerpc/kernel/prom.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index b8e15c6..f9fb9a2 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -800,6 +800,7 @@ int of_get_ibm_chip_id(struct device_node *np)
 	}
 	return -1;
 }
+EXPORT_SYMBOL(of_get_ibm_chip_id);
 
 /**
  * cpu_to_chip_id - Return the cpus chip-id
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 02/11] powerpc: Add ICSWX instruction
  2015-04-07 17:34 [PATCH 00/11] add 842 hw compression for PowerNV platform Dan Streetman
  2015-04-07 17:34 ` [PATCH 01/11] powerpc: export of_get_ibm_chip_id function Dan Streetman
@ 2015-04-07 17:34 ` Dan Streetman
  2015-04-07 17:34 ` [PATCH 03/11] crypto: add software 842 decompression Dan Streetman
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 46+ messages in thread
From: Dan Streetman @ 2015-04-07 17:34 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras
  Cc: Seth Jennings, Robert Jennings, linux-crypto, linuxppc-dev,
	linux-kernel, Dan Streetman

Add the asm ICSWX and ICSWEPX opcodes.  Add definitions for the
Coprocessor Request structures needed to use the icswx calls to
coprocessors.  Add icswx() function to perform the ICSWX asm
using the provided Coprocessor Command Word value and
Coprocessor Request Block structure.

This is required for communication with the NX-842 coprocessor on
a PowerNV system.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
---
 arch/powerpc/include/asm/icswx.h      | 184 ++++++++++++++++++++++++++++++++++
 arch/powerpc/include/asm/ppc-opcode.h |  13 +++
 2 files changed, 197 insertions(+)
 create mode 100644 arch/powerpc/include/asm/icswx.h

diff --git a/arch/powerpc/include/asm/icswx.h b/arch/powerpc/include/asm/icswx.h
new file mode 100644
index 0000000..a70ae93
--- /dev/null
+++ b/arch/powerpc/include/asm/icswx.h
@@ -0,0 +1,184 @@
+/*
+ * ICSWX api
+ *
+ * Copyright (C) 2015 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * This provides the Initiate Coprocessor Store Word Indexed (ICSWX)
+ * instruction.  This instruction is used to communicate with PowerPC
+ * coprocessors.  This also provides definitions of the structures used
+ * to communicate with the coprocessor.
+ *
+ * The RFC02130: Coprocessor Architecture document is the reference for
+ * everything in this file unless otherwise noted.
+ */
+#ifndef _ARCH_POWERPC_INCLUDE_ASM_ICSWX_H_
+#define _ARCH_POWERPC_INCLUDE_ASM_ICSWX_H_
+
+#include <asm/ppc-opcode.h> /* for PPC_ICSWX */
+
+/* Chapter 6.5.8 Coprocessor-Completion Block (CCB) */
+
+#define CCB_VALUE		(0x3fffffffffffffff)
+#define CCB_ADDRESS		(0xfffffffffffffff8)
+#define CCB_CM			(0x0000000000000007)
+#define CCB_CM0			(0x0000000000000004)
+#define CCB_CM12		(0x0000000000000003)
+
+#define CCB_CM0_ALL_COMPLETIONS	(0x0)
+#define CCB_CM0_LAST_IN_CHAIN	(0x4)
+#define CCB_CM12_STORE		(0x0)
+#define CCB_CM12_INTERRUPT	(0x1)
+
+#define CCB_SIZE		(0x10)
+#define CCB_ALIGN		CCB_SIZE
+
+struct coprocessor_completion_block {
+	__be64 value;
+	__be64 address;
+} __packed __aligned(CCB_ALIGN);
+
+
+/* Chapter 6.5.7 Coprocessor-Status Block (CSB) */
+
+#define CSB_V			(0x80)
+#define CSB_F			(0x04)
+#define CSB_CH			(0x03)
+#define CSB_CE_INCOMPLETE	(0x80)
+#define CSB_CE_TERMINATION	(0x40)
+#define CSB_CE_TPBC		(0x20)
+
+#define CSB_CC_SUCCESS		(0)
+#define CSB_CC_INVALID_ALIGN	(1)
+#define CSB_CC_OPERAND_OVERLAP	(2)
+#define CSB_CC_DATA_LENGTH	(3)
+#define CSB_CC_TRANSLATION	(5)
+#define CSB_CC_PROTECTION	(6)
+#define CSB_CC_RD_EXTERNAL	(7)
+#define CSB_CC_INVALID_OPERAND	(8)
+#define CSB_CC_PRIVILEGE	(9)
+#define CSB_CC_INTERNAL		(10)
+#define CSB_CC_WR_EXTERNAL	(12)
+#define CSB_CC_NOSPC		(13)
+#define CSB_CC_EXCESSIVE_DDE	(14)
+#define CSB_CC_WR_TRANSLATION	(15)
+#define CSB_CC_WR_PROTECTION	(16)
+#define CSB_CC_UNKNOWN_CODE	(17)
+#define CSB_CC_ABORT		(18)
+#define CSB_CC_TRANSPORT	(20)
+#define CSB_CC_SEGMENTED_DDL	(31)
+#define CSB_CC_PROGRESS_POINT	(32)
+#define CSB_CC_DDE_OVERFLOW	(33)
+#define CSB_CC_SESSION		(34)
+#define CSB_CC_PROVISION	(36)
+#define CSB_CC_CHAIN		(37)
+#define CSB_CC_SEQUENCE		(38)
+#define CSB_CC_HW		(39)
+
+#define CSB_SIZE		(0x10)
+#define CSB_ALIGN		CSB_SIZE
+
+struct coprocessor_status_block {
+	u8 flags;
+	u8 cs;
+	u8 cc;
+	u8 ce;
+	__be32 count;
+	__be64 address;
+} __packed __aligned(CSB_ALIGN);
+
+
+/* Chapter 6.5.10 Data-Descriptor List (DDL)
+ * each list contains one or more Data-Descriptor Entries (DDE)
+ */
+
+#define DDE_P			(0x8000)
+
+#define DDE_SIZE		(16)
+#define DDE_ALIGN		DDE_SIZE
+
+struct data_descriptor_entry {
+	__be16 flags;
+	u8 count;
+	u8 index;
+	__be32 length;
+	__be64 address;
+} __packed __aligned(DDE_ALIGN);
+
+
+/* Chapter 6.5.2 Coprocessor-Request Block (CRB) */
+
+#define CRB_SIZE		(128)
+#define CRB_ALIGN		(256) /* Errata: requires 256 alignment */
+
+/* Coprocessor Status Block field
+ *   ADDRESS	address of CSB
+ *   C		CCB is valid
+ *   AT		0 = addrs are virtual, 1 = addrs are phys
+ *   M		enable perf monitor
+ */
+#define CRB_CSB_ADDRESS		(0xfffffffffffffff0)
+#define CRB_CSB_C		(0x0000000000000008)
+#define CRB_CSB_AT		(0x0000000000000002)
+#define CRB_CSB_M		(0x0000000000000001)
+
+struct coprocessor_request_block {
+	__be32 ccw;
+	__be32 flags;
+	__be64 csb_addr;
+
+	struct data_descriptor_entry source;
+	struct data_descriptor_entry target;
+
+	struct coprocessor_completion_block ccb;
+
+	u8 reserved[48];
+
+	struct coprocessor_status_block csb;
+} __packed __aligned(CRB_ALIGN);
+
+
+/* RFC02167 Initiate Coprocessor Instructions document
+ * Chapter 8.2.1.1.1 RS
+ * Chapter 8.2.3 Coprocessor Directive
+ * Chapter 8.2.4 Execution
+ *
+ * The CCW must be converted to BE before passing to icswx()
+ */
+
+#define CCW_PS			(0xff000000)
+#define CCW_CT			(0x00ff0000)
+#define CCW_CD			(0x0000ffff)
+#define CCW_CL			(0x0000c000)
+
+
+/* RFC02167 Initiate Coprocessor Instructions document
+ * Chapter 8.2.1 Initiate Coprocessor Store Word Indexed (ICSWX)
+ * Chapter 8.2.4.1 Condition Register 0
+ */
+
+#define ICSWX_INITIATED		(0x8)
+#define ICSWX_BUSY		(0x4)
+#define ICSWX_REJECTED		(0x2)
+
+static inline int icswx(__be32 ccw, struct coprocessor_request_block *crb)
+{
+	__be64 ccw_reg = ccw;
+	u32 cr;
+
+	__asm__ __volatile__(
+	PPC_ICSWX(%1,0,%2) "\n"
+	"mfcr %0\n"
+	: "=r" (cr)
+	: "r" (ccw_reg), "r" (crb)
+	: "cr0", "memory");
+
+	return (int)((cr >> 28) & 0xf);
+}
+
+
+#endif /* _ARCH_POWERPC_INCLUDE_ASM_ICSWX_H_ */
diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index 4cbe23a..e025d8e 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -136,6 +136,8 @@
 #define PPC_INST_DCBAL			0x7c2005ec
 #define PPC_INST_DCBZL			0x7c2007ec
 #define PPC_INST_ICBT			0x7c00002c
+#define PPC_INST_ICSWX			0x7c00032d
+#define PPC_INST_ICSWEPX		0x7c00076d
 #define PPC_INST_ISEL			0x7c00001e
 #define PPC_INST_ISEL_MASK		0xfc00003e
 #define PPC_INST_LDARX			0x7c0000a8
@@ -401,4 +403,15 @@
 #define MFTMR(tmr, r)		stringify_in_c(.long PPC_INST_MFTMR | \
 					       TMRN(tmr) | ___PPC_RT(r))
 
+/* Coprocessor instructions */
+#define PPC_ICSWX(s, a, b)	stringify_in_c(.long PPC_INST_ICSWX |	\
+					       ___PPC_RS(s) |		\
+					       ___PPC_RA(a) |		\
+					       ___PPC_RB(b))
+#define PPC_ICSWEPX(s, a, b)	stringify_in_c(.long PPC_INST_ICSWEPX | \
+					       ___PPC_RS(s) |		\
+					       ___PPC_RA(a) |		\
+					       ___PPC_RB(b))
+
+
 #endif /* _ASM_POWERPC_PPC_OPCODE_H */
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 03/11] crypto: add software 842 decompression
  2015-04-07 17:34 [PATCH 00/11] add 842 hw compression for PowerNV platform Dan Streetman
  2015-04-07 17:34 ` [PATCH 01/11] powerpc: export of_get_ibm_chip_id function Dan Streetman
  2015-04-07 17:34 ` [PATCH 02/11] powerpc: Add ICSWX instruction Dan Streetman
@ 2015-04-07 17:34 ` Dan Streetman
  2015-04-07 17:34 ` [PATCH 04/11] drivers/crypto/nx: move nx-842.c to nx-842-pseries.c Dan Streetman
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 46+ messages in thread
From: Dan Streetman @ 2015-04-07 17:34 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras
  Cc: Seth Jennings, Robert Jennings, Andrew Morton, Greg KH,
	linux-crypto, linuxppc-dev, linux-kernel, Dan Streetman

Add an 842-format software decompression function.  Update the MAINTAINERS
842 section to include the new files.

This decompression function can decompress any standard-format 842
compressed data.  The 842 compressed format is explained in the header
comments.  This general-use decompression function is required by later
patches that update the crypto 842 driver to fall back to software 842
decompression if the NX-842 hardware fails and/or returns an error.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
---
 MAINTAINERS              |   2 +
 include/linux/sw842.h    |   7 +
 lib/842/842_decompress.c | 413 +++++++++++++++++++++++++++++++++++++++++++++++
 lib/842/Makefile         |   1 +
 lib/Kconfig              |   3 +
 lib/Makefile             |   1 +
 6 files changed, 427 insertions(+)
 create mode 100644 include/linux/sw842.h
 create mode 100644 lib/842/842_decompress.c
 create mode 100644 lib/842/Makefile

diff --git a/MAINTAINERS b/MAINTAINERS
index efbcb50..3dc973a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4836,6 +4836,8 @@ M:	Dan Streetman <ddstreet@us.ibm.com>
 S:	Supported
 F:	drivers/crypto/nx/nx-842.c
 F:	include/linux/nx842.h
+F:	include/linux/sw842.h
+F:	lib/842/
 
 IBM Power Linux RAID adapter
 M:	Brian King <brking@us.ibm.com>
diff --git a/include/linux/sw842.h b/include/linux/sw842.h
new file mode 100644
index 0000000..aa8d86e
--- /dev/null
+++ b/include/linux/sw842.h
@@ -0,0 +1,7 @@
+#ifndef __SW842_H__
+#define __SW842_H__
+
+int sw842_decompress(const unsigned char *src, int srclen,
+			unsigned char *dst, int *destlen);
+
+#endif
diff --git a/lib/842/842_decompress.c b/lib/842/842_decompress.c
new file mode 100644
index 0000000..9fc0ffc
--- /dev/null
+++ b/lib/842/842_decompress.c
@@ -0,0 +1,413 @@
+/*
+ * 842 Decompressor
+ *
+ * Copyright (C) 2015 Dan Streetman, IBM Corp
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * The 842 compressed format is made up of multiple blocks, each of
+ * which have the format:
+ *
+ * <template>[arg1][arg2][arg3][arg4]
+ *
+ * where there are between 0 and 4 template args, depending on the specific
+ * template operation.  For normal operations, each arg is either a specific
+ * number of data bytes to add to the output stream, or an index pointing
+ * to a previously-written number of data bytes to copy to the output stream.
+ *
+ * The template code is a 5-bit value.  This code indicates what to
+ * do with the following data.  Template codes from 0 to 0x19 should
+ * use the template table, the static "ops" table in the code below.
+ * For each template (table row), there are between 1 and 4 actions;
+ * each action corresponds to an arg following the template code
+ * bits.  Each action is either a "data" type action, or a "index"
+ * type action, and each action results in 2, 4, or 8 bytes being
+ * written to the output stream.  Each template (i.e. all actions in
+ * the table row) will add up to 8 bytes being written to the output
+ * stream.  Any row with less than 4 actions is padded with noop
+ * actions, indicated by N0 (for which there is no corresponding arg
+ * in the compressed data stream).
+ *
+ * "Data" actions, indicated in the table by D2, D4, and D8, mean that
+ * the corresponding arg is 2, 4, or 8 bytes, respectively, in the
+ * compressed data stream should be copied directly to the output stream.
+ *
+ * "Index" actions, indicated in the table by I2, I4, and I8, mean
+ * the corresponding arg is an index parameter that points to,
+ * respectively, a 2, 4, or 8 byte value already in the output
+ * stream, that should be copied to the end of the output stream.
+ * Essentially, the index points to a position in a ring buffer that
+ * contains the last N bytes of output stream data.  The number of bits
+ * for each index's arg are: 8 bits for I2, 9 bits for I4, and 8 bits for
+ * I8.  Since each index points to a 2, 4, or 8 byte section, this means
+ * that I2 can reference 512 bytes ((2^8 bits = 256) * 2 bytes), I4 can
+ * reference 2048 bytes ((2^9 = 512) * 4 bytes), and I8 can reference
+ * 2048 bytes ((2^8 = 256) * 8 bytes).  Think of it as a dedicated ring
+ * buffer for each of I2, I4, and I8 that are updated for each byte
+ * written to the output stream.  In this implementation, the output stream
+ * is directly used for each index; there is no additional memory required.
+ * Note that the index is into a ring buffer, not a sliding window;
+ * for example, if there have been 260 bytes written to the output stream,
+ * an I2 index of 0 would index to byte 256 in the output stream, while
+ * an I2 index of 16 would index to byte 16 in the output stream.
+ *
+ * There are also 3 special template codes; 0x1b for "repeat", 0x1c for
+ * "zeros", and 0x1e for "end".  The "repeat" operation is followed by
+ * a 6 bit arg N indicating how many times to repeat.  The last 8
+ * bytes written to the output stream are written again to the output
+ * stream, N + 1 times.  The "zeros" operation, which has no arg bits,
+ * writes 8 zeros to the output stream.  The "end" operation, which also
+ * has no arg bits, signals the end of the compressed data.  There may
+ * be some number of padding (don't care, but usually 0) bits after
+ * the "end" operation bits, to fill the stream length to a specific
+ * byte multiple (usually a multiple of 8, 16, or 32 bytes).
+ *
+ * After all actions for each operation code are processed, another
+ * template code is in the next 5 bits.  The decompression ends
+ * once the "end" template code is detected.
+ */
+
+#ifndef STATIC
+#include <linux/module.h>
+#include <linux/kernel.h>
+#endif
+
+#include <linux/sw842.h>
+
+/* special templates */
+#define OP_REPEAT	(0x1B)
+#define OP_ZEROS	(0x1C)
+#define OP_END		(0x1E)
+
+/* additional bits of each op param */
+#define OP_BITS		(5)
+#define REPEAT_BITS	(6)
+#define I2_BITS		(8)
+#define I4_BITS		(9)
+#define I8_BITS		(8)
+
+/* rolling fifo sizes */
+#define I2_FIFO_SIZE	(512)
+#define I4_FIFO_SIZE	(2048)
+#define I8_FIFO_SIZE	(2048)
+
+/* Arbitrary values used to indicate action */
+#define OP_ACTION	(0x30)
+#define OP_ACTION_NOOP	(0x00)
+#define OP_ACTION_DATA	(0x10)
+#define OP_ACTION_INDEX	(0x20)
+#define OP_AMOUNT	(0x0f)
+#define OP_AMOUNT_0	(0x00)
+#define OP_AMOUNT_2	(0x02)
+#define OP_AMOUNT_4	(0x04)
+#define OP_AMOUNT_8	(0x08)
+
+#define D2		(OP_ACTION_DATA  | OP_AMOUNT_2)
+#define D4		(OP_ACTION_DATA  | OP_AMOUNT_4)
+#define D8		(OP_ACTION_DATA  | OP_AMOUNT_8)
+#define I2		(OP_ACTION_INDEX | OP_AMOUNT_2)
+#define I4		(OP_ACTION_INDEX | OP_AMOUNT_4)
+#define I8		(OP_ACTION_INDEX | OP_AMOUNT_8)
+#define N0		(OP_ACTION_NOOP  | OP_AMOUNT_0)
+
+#define OPS_MAX		(0x19)
+
+static u8 ops[OPS_MAX + 1][4] = {
+	{ D8, N0, N0, N0 },
+	{ D4, D2, I2, N0 },
+	{ D4, I2, D2, N0 },
+	{ D4, I2, I2, N0 },
+	{ D4, I4, N0, N0 },
+	{ D2, I2, D4, N0 },
+	{ D2, I2, D2, I2 },
+	{ D2, I2, I2, D2 },
+	{ D2, I2, I2, I2 },
+	{ D2, I2, I4, N0 },
+	{ I2, D2, D4, N0 },
+	{ I2, D4, I2, N0 },
+	{ I2, D2, I2, D2 },
+	{ I2, D2, I2, I2 },
+	{ I2, D2, I4, N0 },
+	{ I2, I2, D4, N0 },
+	{ I2, I2, D2, I2 },
+	{ I2, I2, I2, D2 },
+	{ I2, I2, I2, I2 },
+	{ I2, I2, I4, N0 },
+	{ I4, D4, N0, N0 },
+	{ I4, D2, I2, N0 },
+	{ I4, I2, D2, N0 },
+	{ I4, I2, I2, N0 },
+	{ I4, I4, N0, N0 },
+	{ I8, N0, N0, N0 }
+};
+
+struct sw842_param {
+	u8 *in;
+	int bit;
+	int ilen;
+	u8 *out;
+	u8 *ostart;
+	int olen;
+};
+
+/**
+ * Get the next specified bits, up to 57 bits
+ *
+ * This also increments the byte and bit positions, and remaining
+ * length.  This can return no more than 57 bits, because in the
+ * worst case the starting bit is bit 7, which would place the end
+ * of the following 57 bits at the end of an 8 byte span, which
+ * is the max that this function's type casting approach can
+ * handle.
+ *
+ * Returns: the value of the requested bits, or -1 on failure
+ */
+static s64 next_bits(struct sw842_param *p, int n)
+{
+	u64 v;
+	u8 *in = p->in;
+	int b = p->bit, bits = b + n;
+
+	if (b > 7 || n > 57) {
+		WARN(1, "b %d n %d\n", b, n);
+		return -EINVAL;
+	}
+
+	if (DIV_ROUND_UP(bits, 8) > p->ilen)
+		return -EOVERFLOW;
+
+	if (bits <= 8)
+		v = *in >> (8 - bits);
+	else if (bits <= 16)
+		v = be16_to_cpu(*(__be16 *)in) >> (16 - bits);
+	else if (bits <= 32)
+		v = be32_to_cpu(*(__be32 *)in) >> (32 - bits);
+	else
+		v = be64_to_cpu(*(__be64 *)in) >> (64 - bits);
+
+	p->bit += n;
+
+	p->in += p->bit / 8;
+	p->ilen -= p->bit / 8;
+	p->bit %= 8;
+
+	return (s64)(v & ((1 << n) - 1));
+}
+
+static int __do_data(struct sw842_param *p, int n)
+{
+	s64 v = next_bits(p, n * 8);
+
+	if (v < 0 || n > p->olen)
+		return -EINVAL;
+
+	switch (n) {
+	case 2:
+		*(__be16 *)p->out = cpu_to_be16((u16)v);
+		break;
+	case 4:
+		*(__be32 *)p->out = cpu_to_be32((u32)v);
+		break;
+	default:
+		return -EINVAL;
+	}
+	p->out += n;
+	p->olen -= n;
+
+	return 0;
+}
+
+static int do_data(struct sw842_param *p, int n)
+{
+	switch (n) {
+	case 2:
+		if (__do_data(p, 2))
+			return -EINVAL;
+		break;
+	case 8:
+		/* we copy two 4-byte chunks here because
+		 * next_bits() can't do a full 64 bits
+		 */
+		if (__do_data(p, 4))
+			return -EINVAL;
+		/* fallthrough */
+	case 4:
+		if (__do_data(p, 4))
+			return -EINVAL;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int __do_index(struct sw842_param *p, int size, int bits, int fsize)
+{
+	s64 index = next_bits(p, bits);
+	u64 offset;
+	int total = (int)(p->out - p->ostart);
+
+	if (index < 0)
+		return -EINVAL;
+
+	offset = index * size;
+
+	/* a ring buffer of fsize is used; correct the offset */
+	if (total > fsize) {
+		/* this is where the current fifo is */
+		int sec = (total / fsize) * fsize;
+		/* the current pos in the fifo */
+		int pos = total % fsize;
+
+		/* if the offset is past/at the pos, we need to
+		 * go back to the last fifo section
+		 */
+		if (offset >= pos)
+			sec -= fsize;
+
+		offset += sec;
+	}
+
+	if (offset + size > total)
+		return -EINVAL;
+
+	memcpy(p->out, &p->ostart[offset], size);
+	p->out += size;
+	p->olen -= size;
+
+	return 0;
+}
+
+int do_index(struct sw842_param *p, int n)
+{
+	switch (n) {
+	case 2:
+		return __do_index(p, 2, I2_BITS, I2_FIFO_SIZE);
+	case 4:
+		return __do_index(p, 4, I4_BITS, I4_FIFO_SIZE);
+	case 8:
+		return __do_index(p, 8, I8_BITS, I8_FIFO_SIZE);
+	default:
+		return -EINVAL;
+	}
+}
+
+int do_op(struct sw842_param *p, int o)
+{
+	int i;
+	u8 op, n;
+
+	if (o > OPS_MAX)
+		return -EINVAL;
+
+	for (i = 0; i < 4; i++) {
+		op = ops[o][i];
+		n = op & OP_AMOUNT;
+
+		switch (op & OP_ACTION) {
+		case OP_ACTION_DATA:
+			if (do_data(p, n))
+				return -EINVAL;
+			break;
+		case OP_ACTION_INDEX:
+			if (do_index(p, n))
+				return -EINVAL;
+			break;
+		case OP_ACTION_NOOP:
+			break;
+		default:
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
+/**
+ * sw842_decompress
+ *
+ * Decompress the 842-compressed buffer of length @len at @in
+ * to the output buffer @out.
+ *
+ * The compressed buffer must be only a single 842-compressed buffer,
+ * with the standard format described in the comments at the top of
+ * this file.  Processing will stop when the 842 "END" template is
+ * detected, not the end of the buffer.
+ *
+ * Returns: 0 on success, error on failure.  The @olen parameter
+ * will contain the number of output bytes written on success, or
+ * 0 on error.
+ */
+int sw842_decompress(const unsigned char *in, int len,
+		     unsigned char *out, int *olen)
+{
+	struct sw842_param p;
+	int op, total = *olen;
+
+	p.in = (unsigned char *)in;
+	p.bit = 0;
+	p.ilen = len;
+	p.out = out;
+	p.ostart = out;
+	p.olen = *olen;
+
+	*olen = 0;
+
+	while ((op = (int)next_bits(&p, OP_BITS)) != OP_END) {
+		if (op < 0)
+			return op;
+
+		if (op == OP_REPEAT) {
+			int rep = (int)next_bits(&p, REPEAT_BITS);
+
+			if (rep < 0)
+				return rep;
+
+			if (p.out == out) /* no previous bytes */
+				return -EINVAL;
+
+			/* copy rep + 1 */
+			rep++;
+
+			if (rep * 8 > p.olen)
+				return -ENOSPC;
+
+			while (rep-- > 0) {
+				memcpy(p.out, p.out - 8, 8);
+				p.out += 8;
+				p.olen -= 8;
+			}
+		} else if (op == OP_ZEROS) {
+			if (8 > p.olen)
+				return -ENOSPC;
+
+			memset(p.out, 0, 8);
+			p.out += 8;
+			p.olen -= 8;
+		} else { /* use template */
+			if (do_op(&p, op))
+				return -EINVAL;
+		}
+	}
+
+	*olen = total - p.olen;
+
+	return 0;
+}
+#ifndef STATIC
+EXPORT_SYMBOL_GPL(sw842_decompress);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Software 842 Decompressor");
+MODULE_AUTHOR("Dan Streetman <ddstreet@ieee.org>");
+
+#endif
diff --git a/lib/842/Makefile b/lib/842/Makefile
new file mode 100644
index 0000000..8071c3f
--- /dev/null
+++ b/lib/842/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_842_DECOMPRESS) += 842_decompress.o
diff --git a/lib/Kconfig b/lib/Kconfig
index 87da53b..78aee1f 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -213,6 +213,9 @@ config RANDOM32_SELFTEST
 #
 # compression support is select'ed if needed
 #
+config 842_DECOMPRESS
+	tristate
+
 config ZLIB_INFLATE
 	tristate
 
diff --git a/lib/Makefile b/lib/Makefile
index 58f74d2..88b144e 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -78,6 +78,7 @@ obj-$(CONFIG_LIBCRC32C)	+= libcrc32c.o
 obj-$(CONFIG_CRC8)	+= crc8.o
 obj-$(CONFIG_GENERIC_ALLOCATOR) += genalloc.o
 
+obj-$(CONFIG_842_DECOMPRESS) += 842/
 obj-$(CONFIG_ZLIB_INFLATE) += zlib_inflate/
 obj-$(CONFIG_ZLIB_DEFLATE) += zlib_deflate/
 obj-$(CONFIG_REED_SOLOMON) += reed_solomon/
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 04/11] drivers/crypto/nx: move nx-842.c to nx-842-pseries.c
  2015-04-07 17:34 [PATCH 00/11] add 842 hw compression for PowerNV platform Dan Streetman
                   ` (2 preceding siblings ...)
  2015-04-07 17:34 ` [PATCH 03/11] crypto: add software 842 decompression Dan Streetman
@ 2015-04-07 17:34 ` Dan Streetman
  2015-04-07 17:34 ` [PATCH 05/11] drivers/crypto/nx: add NX-842 platform frontend driver Dan Streetman
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 46+ messages in thread
From: Dan Streetman @ 2015-04-07 17:34 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras
  Cc: Seth Jennings, Robert Jennings, Marcelo Henrique Cerri,
	Fionnuala Gunter, linux-crypto, linuxppc-dev, linux-kernel,
	Dan Streetman

Move the entire NX-842 driver for the pSeries platform from the file
nx-842.c to nx-842-pseries.c.  This is required by later patches that
add NX-842 support for the PowerNV platform.

This patch does not alter the content of the pSeries NX-842 driver at
all, it only changes the filename.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
---
 drivers/crypto/nx/Makefile         |    2 +-
 drivers/crypto/nx/nx-842-pseries.c | 1603 ++++++++++++++++++++++++++++++++++++
 drivers/crypto/nx/nx-842.c         | 1603 ------------------------------------
 3 files changed, 1604 insertions(+), 1604 deletions(-)
 create mode 100644 drivers/crypto/nx/nx-842-pseries.c
 delete mode 100644 drivers/crypto/nx/nx-842.c

diff --git a/drivers/crypto/nx/Makefile b/drivers/crypto/nx/Makefile
index bb770ea..8669ffa 100644
--- a/drivers/crypto/nx/Makefile
+++ b/drivers/crypto/nx/Makefile
@@ -11,4 +11,4 @@ nx-crypto-objs := nx.o \
 		  nx-sha512.o
 
 obj-$(CONFIG_CRYPTO_DEV_NX_COMPRESS) += nx-compress.o
-nx-compress-objs := nx-842.o
+nx-compress-objs := nx-842-pseries.o
diff --git a/drivers/crypto/nx/nx-842-pseries.c b/drivers/crypto/nx/nx-842-pseries.c
new file mode 100644
index 0000000..887196e
--- /dev/null
+++ b/drivers/crypto/nx/nx-842-pseries.c
@@ -0,0 +1,1603 @@
+/*
+ * Driver for IBM Power 842 compression accelerator
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright (C) IBM Corporation, 2012
+ *
+ * Authors: Robert Jennings <rcj@linux.vnet.ibm.com>
+ *          Seth Jennings <sjenning@linux.vnet.ibm.com>
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/nx842.h>
+#include <linux/of.h>
+#include <linux/slab.h>
+
+#include <asm/page.h>
+#include <asm/vio.h>
+
+#include "nx_csbcpb.h" /* struct nx_csbcpb */
+
+#define MODULE_NAME "nx-compress"
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Robert Jennings <rcj@linux.vnet.ibm.com>");
+MODULE_DESCRIPTION("842 H/W Compression driver for IBM Power processors");
+
+#define SHIFT_4K 12
+#define SHIFT_64K 16
+#define SIZE_4K (1UL << SHIFT_4K)
+#define SIZE_64K (1UL << SHIFT_64K)
+
+/* IO buffer must be 128 byte aligned */
+#define IO_BUFFER_ALIGN 128
+
+struct nx842_header {
+	int blocks_nr; /* number of compressed blocks */
+	int offset; /* offset of the first block (from beginning of header) */
+	int sizes[0]; /* size of compressed blocks */
+};
+
+static inline int nx842_header_size(const struct nx842_header *hdr)
+{
+	return sizeof(struct nx842_header) +
+			hdr->blocks_nr * sizeof(hdr->sizes[0]);
+}
+
+/* Macros for fields within nx_csbcpb */
+/* Check the valid bit within the csbcpb valid field */
+#define NX842_CSBCBP_VALID_CHK(x) (x & BIT_MASK(7))
+
+/* CE macros operate on the completion_extension field bits in the csbcpb.
+ * CE0 0=full completion, 1=partial completion
+ * CE1 0=CE0 indicates completion, 1=termination (output may be modified)
+ * CE2 0=processed_bytes is source bytes, 1=processed_bytes is target bytes */
+#define NX842_CSBCPB_CE0(x)	(x & BIT_MASK(7))
+#define NX842_CSBCPB_CE1(x)	(x & BIT_MASK(6))
+#define NX842_CSBCPB_CE2(x)	(x & BIT_MASK(5))
+
+/* The NX unit accepts data only on 4K page boundaries */
+#define NX842_HW_PAGE_SHIFT	SHIFT_4K
+#define NX842_HW_PAGE_SIZE	(ASM_CONST(1) << NX842_HW_PAGE_SHIFT)
+#define NX842_HW_PAGE_MASK	(~(NX842_HW_PAGE_SIZE-1))
+
+enum nx842_status {
+	UNAVAILABLE,
+	AVAILABLE
+};
+
+struct ibm_nx842_counters {
+	atomic64_t comp_complete;
+	atomic64_t comp_failed;
+	atomic64_t decomp_complete;
+	atomic64_t decomp_failed;
+	atomic64_t swdecomp;
+	atomic64_t comp_times[32];
+	atomic64_t decomp_times[32];
+};
+
+static struct nx842_devdata {
+	struct vio_dev *vdev;
+	struct device *dev;
+	struct ibm_nx842_counters *counters;
+	unsigned int max_sg_len;
+	unsigned int max_sync_size;
+	unsigned int max_sync_sg;
+	enum nx842_status status;
+} __rcu *devdata;
+static DEFINE_SPINLOCK(devdata_mutex);
+
+#define NX842_COUNTER_INC(_x) \
+static inline void nx842_inc_##_x( \
+	const struct nx842_devdata *dev) { \
+	if (dev) \
+		atomic64_inc(&dev->counters->_x); \
+}
+NX842_COUNTER_INC(comp_complete);
+NX842_COUNTER_INC(comp_failed);
+NX842_COUNTER_INC(decomp_complete);
+NX842_COUNTER_INC(decomp_failed);
+NX842_COUNTER_INC(swdecomp);
+
+#define NX842_HIST_SLOTS 16
+
+static void ibm_nx842_incr_hist(atomic64_t *times, unsigned int time)
+{
+	int bucket = fls(time);
+
+	if (bucket)
+		bucket = min((NX842_HIST_SLOTS - 1), bucket - 1);
+
+	atomic64_inc(&times[bucket]);
+}
+
+/* NX unit operation flags */
+#define NX842_OP_COMPRESS	0x0
+#define NX842_OP_CRC		0x1
+#define NX842_OP_DECOMPRESS	0x2
+#define NX842_OP_COMPRESS_CRC   (NX842_OP_COMPRESS | NX842_OP_CRC)
+#define NX842_OP_DECOMPRESS_CRC (NX842_OP_DECOMPRESS | NX842_OP_CRC)
+#define NX842_OP_ASYNC		(1<<23)
+#define NX842_OP_NOTIFY		(1<<22)
+#define NX842_OP_NOTIFY_INT(x)	((x & 0xff)<<8)
+
+static unsigned long nx842_get_desired_dma(struct vio_dev *viodev)
+{
+	/* No use of DMA mappings within the driver. */
+	return 0;
+}
+
+struct nx842_slentry {
+	unsigned long ptr; /* Real address (use __pa()) */
+	unsigned long len;
+};
+
+/* pHyp scatterlist entry */
+struct nx842_scatterlist {
+	int entry_nr; /* number of slentries */
+	struct nx842_slentry *entries; /* ptr to array of slentries */
+};
+
+/* Does not include sizeof(entry_nr) in the size */
+static inline unsigned long nx842_get_scatterlist_size(
+				struct nx842_scatterlist *sl)
+{
+	return sl->entry_nr * sizeof(struct nx842_slentry);
+}
+
+static inline unsigned long nx842_get_pa(void *addr)
+{
+	if (is_vmalloc_addr(addr))
+		return page_to_phys(vmalloc_to_page(addr))
+		       + offset_in_page(addr);
+	else
+		return __pa(addr);
+}
+
+static int nx842_build_scatterlist(unsigned long buf, int len,
+			struct nx842_scatterlist *sl)
+{
+	unsigned long nextpage;
+	struct nx842_slentry *entry;
+
+	sl->entry_nr = 0;
+
+	entry = sl->entries;
+	while (len) {
+		entry->ptr = nx842_get_pa((void *)buf);
+		nextpage = ALIGN(buf + 1, NX842_HW_PAGE_SIZE);
+		if (nextpage < buf + len) {
+			/* we aren't at the end yet */
+			if (IS_ALIGNED(buf, NX842_HW_PAGE_SIZE))
+				/* we are in the middle (or beginning) */
+				entry->len = NX842_HW_PAGE_SIZE;
+			else
+				/* we are at the beginning */
+				entry->len = nextpage - buf;
+		} else {
+			/* at the end */
+			entry->len = len;
+		}
+
+		len -= entry->len;
+		buf += entry->len;
+		sl->entry_nr++;
+		entry++;
+	}
+
+	return 0;
+}
+
+/*
+ * Working memory for software decompression
+ */
+struct sw842_fifo {
+	union {
+		char f8[256][8];
+		char f4[512][4];
+	};
+	char f2[256][2];
+	unsigned char f84_full;
+	unsigned char f2_full;
+	unsigned char f8_count;
+	unsigned char f2_count;
+	unsigned int f4_count;
+};
+
+/*
+ * Working memory for crypto API
+ */
+struct nx842_workmem {
+	char bounce[PAGE_SIZE]; /* bounce buffer for decompression input */
+	union {
+		/* hardware working memory */
+		struct {
+			/* scatterlist */
+			char slin[SIZE_4K];
+			char slout[SIZE_4K];
+			/* coprocessor status/parameter block */
+			struct nx_csbcpb csbcpb;
+		};
+		/* software working memory */
+		struct sw842_fifo swfifo; /* software decompression fifo */
+	};
+};
+
+int nx842_get_workmem_size(void)
+{
+	return sizeof(struct nx842_workmem) + NX842_HW_PAGE_SIZE;
+}
+EXPORT_SYMBOL_GPL(nx842_get_workmem_size);
+
+int nx842_get_workmem_size_aligned(void)
+{
+	return sizeof(struct nx842_workmem);
+}
+EXPORT_SYMBOL_GPL(nx842_get_workmem_size_aligned);
+
+static int nx842_validate_result(struct device *dev,
+	struct cop_status_block *csb)
+{
+	/* The csb must be valid after returning from vio_h_cop_sync */
+	if (!NX842_CSBCBP_VALID_CHK(csb->valid)) {
+		dev_err(dev, "%s: cspcbp not valid upon completion.\n",
+				__func__);
+		dev_dbg(dev, "valid:0x%02x cs:0x%02x cc:0x%02x ce:0x%02x\n",
+				csb->valid,
+				csb->crb_seq_number,
+				csb->completion_code,
+				csb->completion_extension);
+		dev_dbg(dev, "processed_bytes:%d address:0x%016lx\n",
+				csb->processed_byte_count,
+				(unsigned long)csb->address);
+		return -EIO;
+	}
+
+	/* Check return values from the hardware in the CSB */
+	switch (csb->completion_code) {
+	case 0:	/* Completed without error */
+		break;
+	case 64: /* Target bytes > Source bytes during compression */
+	case 13: /* Output buffer too small */
+		dev_dbg(dev, "%s: Compression output larger than input\n",
+					__func__);
+		return -ENOSPC;
+	case 66: /* Input data contains an illegal template field */
+	case 67: /* Template indicates data past the end of the input stream */
+		dev_dbg(dev, "%s: Bad data for decompression (code:%d)\n",
+					__func__, csb->completion_code);
+		return -EINVAL;
+	default:
+		dev_dbg(dev, "%s: Unspecified error (code:%d)\n",
+					__func__, csb->completion_code);
+		return -EIO;
+	}
+
+	/* Hardware sanity check */
+	if (!NX842_CSBCPB_CE2(csb->completion_extension)) {
+		dev_err(dev, "%s: No error returned by hardware, but "
+				"data returned is unusable, contact support.\n"
+				"(Additional info: csbcbp->processed bytes "
+				"does not specify processed bytes for the "
+				"target buffer.)\n", __func__);
+		return -EIO;
+	}
+
+	return 0;
+}
+
+/**
+ * nx842_compress - Compress data using the 842 algorithm
+ *
+ * Compression provide by the NX842 coprocessor on IBM Power systems.
+ * The input buffer is compressed and the result is stored in the
+ * provided output buffer.
+ *
+ * Upon return from this function @outlen contains the length of the
+ * compressed data.  If there is an error then @outlen will be 0 and an
+ * error will be specified by the return code from this function.
+ *
+ * @in: Pointer to input buffer, must be page aligned
+ * @inlen: Length of input buffer, must be PAGE_SIZE
+ * @out: Pointer to output buffer
+ * @outlen: Length of output buffer
+ * @wrkmem: ptr to buffer for working memory, size determined by
+ *          nx842_get_workmem_size()
+ *
+ * Returns:
+ *   0		Success, output of length @outlen stored in the buffer at @out
+ *   -ENOMEM	Unable to allocate internal buffers
+ *   -ENOSPC	Output buffer is to small
+ *   -EMSGSIZE	XXX Difficult to describe this limitation
+ *   -EIO	Internal error
+ *   -ENODEV	Hardware unavailable
+ */
+int nx842_compress(const unsigned char *in, unsigned int inlen,
+		       unsigned char *out, unsigned int *outlen, void *wmem)
+{
+	struct nx842_header *hdr;
+	struct nx842_devdata *local_devdata;
+	struct device *dev = NULL;
+	struct nx842_workmem *workmem;
+	struct nx842_scatterlist slin, slout;
+	struct nx_csbcpb *csbcpb;
+	int ret = 0, max_sync_size, i, bytesleft, size, hdrsize;
+	unsigned long inbuf, outbuf, padding;
+	struct vio_pfo_op op = {
+		.done = NULL,
+		.handle = 0,
+		.timeout = 0,
+	};
+	unsigned long start_time = get_tb();
+
+	/*
+	 * Make sure input buffer is 64k page aligned.  This is assumed since
+	 * this driver is designed for page compression only (for now).  This
+	 * is very nice since we can now use direct DDE(s) for the input and
+	 * the alignment is guaranteed.
+	*/
+	inbuf = (unsigned long)in;
+	if (!IS_ALIGNED(inbuf, PAGE_SIZE) || inlen != PAGE_SIZE)
+		return -EINVAL;
+
+	rcu_read_lock();
+	local_devdata = rcu_dereference(devdata);
+	if (!local_devdata || !local_devdata->dev) {
+		rcu_read_unlock();
+		return -ENODEV;
+	}
+	max_sync_size = local_devdata->max_sync_size;
+	dev = local_devdata->dev;
+
+	/* Create the header */
+	hdr = (struct nx842_header *)out;
+	hdr->blocks_nr = PAGE_SIZE / max_sync_size;
+	hdrsize = nx842_header_size(hdr);
+	outbuf = (unsigned long)out + hdrsize;
+	bytesleft = *outlen - hdrsize;
+
+	/* Init scatterlist */
+	workmem = (struct nx842_workmem *)ALIGN((unsigned long)wmem,
+		NX842_HW_PAGE_SIZE);
+	slin.entries = (struct nx842_slentry *)workmem->slin;
+	slout.entries = (struct nx842_slentry *)workmem->slout;
+
+	/* Init operation */
+	op.flags = NX842_OP_COMPRESS;
+	csbcpb = &workmem->csbcpb;
+	memset(csbcpb, 0, sizeof(*csbcpb));
+	op.csbcpb = nx842_get_pa(csbcpb);
+	op.out = nx842_get_pa(slout.entries);
+
+	for (i = 0; i < hdr->blocks_nr; i++) {
+		/*
+		 * Aligning the output blocks to 128 bytes does waste space,
+		 * but it prevents the need for bounce buffers and memory
+		 * copies.  It also simplifies the code a lot.  In the worst
+		 * case (64k page, 4k max_sync_size), you lose up to
+		 * (128*16)/64k = ~3% the compression factor. For 64k
+		 * max_sync_size, the loss would be at most 128/64k = ~0.2%.
+		 */
+		padding = ALIGN(outbuf, IO_BUFFER_ALIGN) - outbuf;
+		outbuf += padding;
+		bytesleft -= padding;
+		if (i == 0)
+			/* save offset into first block in header */
+			hdr->offset = padding + hdrsize;
+
+		if (bytesleft <= 0) {
+			ret = -ENOSPC;
+			goto unlock;
+		}
+
+		/*
+		 * NOTE: If the default max_sync_size is changed from 4k
+		 * to 64k, remove the "likely" case below, since a
+		 * scatterlist will always be needed.
+		 */
+		if (likely(max_sync_size == NX842_HW_PAGE_SIZE)) {
+			/* Create direct DDE */
+			op.in = nx842_get_pa((void *)inbuf);
+			op.inlen = max_sync_size;
+
+		} else {
+			/* Create indirect DDE (scatterlist) */
+			nx842_build_scatterlist(inbuf, max_sync_size, &slin);
+			op.in = nx842_get_pa(slin.entries);
+			op.inlen = -nx842_get_scatterlist_size(&slin);
+		}
+
+		/*
+		 * If max_sync_size != NX842_HW_PAGE_SIZE, an indirect
+		 * DDE is required for the outbuf.
+		 * If max_sync_size == NX842_HW_PAGE_SIZE, outbuf must
+		 * also be page aligned (1 in 128/4k=32 chance) in order
+		 * to use a direct DDE.
+		 * This is unlikely, just use an indirect DDE always.
+		 */
+		nx842_build_scatterlist(outbuf,
+			min(bytesleft, max_sync_size), &slout);
+		/* op.out set before loop */
+		op.outlen = -nx842_get_scatterlist_size(&slout);
+
+		/* Send request to pHyp */
+		ret = vio_h_cop_sync(local_devdata->vdev, &op);
+
+		/* Check for pHyp error */
+		if (ret) {
+			dev_dbg(dev, "%s: vio_h_cop_sync error (ret=%d, hret=%ld)\n",
+				__func__, ret, op.hcall_err);
+			ret = -EIO;
+			goto unlock;
+		}
+
+		/* Check for hardware error */
+		ret = nx842_validate_result(dev, &csbcpb->csb);
+		if (ret && ret != -ENOSPC)
+			goto unlock;
+
+		/* Handle incompressible data */
+		if (unlikely(ret == -ENOSPC)) {
+			if (bytesleft < max_sync_size) {
+				/*
+				 * Not enough space left in the output buffer
+				 * to store uncompressed block
+				 */
+				goto unlock;
+			} else {
+				/* Store incompressible block */
+				memcpy((void *)outbuf, (void *)inbuf,
+					max_sync_size);
+				hdr->sizes[i] = -max_sync_size;
+				outbuf += max_sync_size;
+				bytesleft -= max_sync_size;
+				/* Reset ret, incompressible data handled */
+				ret = 0;
+			}
+		} else {
+			/* Normal case, compression was successful */
+			size = csbcpb->csb.processed_byte_count;
+			dev_dbg(dev, "%s: processed_bytes=%d\n",
+				__func__, size);
+			hdr->sizes[i] = size;
+			outbuf += size;
+			bytesleft -= size;
+		}
+
+		inbuf += max_sync_size;
+	}
+
+	*outlen = (unsigned int)(outbuf - (unsigned long)out);
+
+unlock:
+	if (ret)
+		nx842_inc_comp_failed(local_devdata);
+	else {
+		nx842_inc_comp_complete(local_devdata);
+		ibm_nx842_incr_hist(local_devdata->counters->comp_times,
+			(get_tb() - start_time) / tb_ticks_per_usec);
+	}
+	rcu_read_unlock();
+	return ret;
+}
+EXPORT_SYMBOL_GPL(nx842_compress);
+
+static int sw842_decompress(const unsigned char *, int, unsigned char *, int *,
+			const void *);
+
+/**
+ * nx842_decompress - Decompress data using the 842 algorithm
+ *
+ * Decompression provide by the NX842 coprocessor on IBM Power systems.
+ * The input buffer is decompressed and the result is stored in the
+ * provided output buffer.  The size allocated to the output buffer is
+ * provided by the caller of this function in @outlen.  Upon return from
+ * this function @outlen contains the length of the decompressed data.
+ * If there is an error then @outlen will be 0 and an error will be
+ * specified by the return code from this function.
+ *
+ * @in: Pointer to input buffer, will use bounce buffer if not 128 byte
+ *      aligned
+ * @inlen: Length of input buffer
+ * @out: Pointer to output buffer, must be page aligned
+ * @outlen: Length of output buffer, must be PAGE_SIZE
+ * @wrkmem: ptr to buffer for working memory, size determined by
+ *          nx842_get_workmem_size()
+ *
+ * Returns:
+ *   0		Success, output of length @outlen stored in the buffer at @out
+ *   -ENODEV	Hardware decompression device is unavailable
+ *   -ENOMEM	Unable to allocate internal buffers
+ *   -ENOSPC	Output buffer is to small
+ *   -EINVAL	Bad input data encountered when attempting decompress
+ *   -EIO	Internal error
+ */
+int nx842_decompress(const unsigned char *in, unsigned int inlen,
+			 unsigned char *out, unsigned int *outlen, void *wmem)
+{
+	struct nx842_header *hdr;
+	struct nx842_devdata *local_devdata;
+	struct device *dev = NULL;
+	struct nx842_workmem *workmem;
+	struct nx842_scatterlist slin, slout;
+	struct nx_csbcpb *csbcpb;
+	int ret = 0, i, size, max_sync_size;
+	unsigned long inbuf, outbuf;
+	struct vio_pfo_op op = {
+		.done = NULL,
+		.handle = 0,
+		.timeout = 0,
+	};
+	unsigned long start_time = get_tb();
+
+	/* Ensure page alignment and size */
+	outbuf = (unsigned long)out;
+	if (!IS_ALIGNED(outbuf, PAGE_SIZE) || *outlen != PAGE_SIZE)
+		return -EINVAL;
+
+	rcu_read_lock();
+	local_devdata = rcu_dereference(devdata);
+	if (local_devdata)
+		dev = local_devdata->dev;
+
+	/* Get header */
+	hdr = (struct nx842_header *)in;
+
+	workmem = (struct nx842_workmem *)ALIGN((unsigned long)wmem,
+		NX842_HW_PAGE_SIZE);
+
+	inbuf = (unsigned long)in + hdr->offset;
+	if (likely(!IS_ALIGNED(inbuf, IO_BUFFER_ALIGN))) {
+		/* Copy block(s) into bounce buffer for alignment */
+		memcpy(workmem->bounce, in + hdr->offset, inlen - hdr->offset);
+		inbuf = (unsigned long)workmem->bounce;
+	}
+
+	/* Init scatterlist */
+	slin.entries = (struct nx842_slentry *)workmem->slin;
+	slout.entries = (struct nx842_slentry *)workmem->slout;
+
+	/* Init operation */
+	op.flags = NX842_OP_DECOMPRESS;
+	csbcpb = &workmem->csbcpb;
+	memset(csbcpb, 0, sizeof(*csbcpb));
+	op.csbcpb = nx842_get_pa(csbcpb);
+
+	/*
+	 * max_sync_size may have changed since compression,
+	 * so we can't read it from the device info. We need
+	 * to derive it from hdr->blocks_nr.
+	 */
+	max_sync_size = PAGE_SIZE / hdr->blocks_nr;
+
+	for (i = 0; i < hdr->blocks_nr; i++) {
+		/* Skip padding */
+		inbuf = ALIGN(inbuf, IO_BUFFER_ALIGN);
+
+		if (hdr->sizes[i] < 0) {
+			/* Negative sizes indicate uncompressed data blocks */
+			size = abs(hdr->sizes[i]);
+			memcpy((void *)outbuf, (void *)inbuf, size);
+			outbuf += size;
+			inbuf += size;
+			continue;
+		}
+
+		if (!dev)
+			goto sw;
+
+		/*
+		 * The better the compression, the more likely the "likely"
+		 * case becomes.
+		 */
+		if (likely((inbuf & NX842_HW_PAGE_MASK) ==
+			((inbuf + hdr->sizes[i] - 1) & NX842_HW_PAGE_MASK))) {
+			/* Create direct DDE */
+			op.in = nx842_get_pa((void *)inbuf);
+			op.inlen = hdr->sizes[i];
+		} else {
+			/* Create indirect DDE (scatterlist) */
+			nx842_build_scatterlist(inbuf, hdr->sizes[i] , &slin);
+			op.in = nx842_get_pa(slin.entries);
+			op.inlen = -nx842_get_scatterlist_size(&slin);
+		}
+
+		/*
+		 * NOTE: If the default max_sync_size is changed from 4k
+		 * to 64k, remove the "likely" case below, since a
+		 * scatterlist will always be needed.
+		 */
+		if (likely(max_sync_size == NX842_HW_PAGE_SIZE)) {
+			/* Create direct DDE */
+			op.out = nx842_get_pa((void *)outbuf);
+			op.outlen = max_sync_size;
+		} else {
+			/* Create indirect DDE (scatterlist) */
+			nx842_build_scatterlist(outbuf, max_sync_size, &slout);
+			op.out = nx842_get_pa(slout.entries);
+			op.outlen = -nx842_get_scatterlist_size(&slout);
+		}
+
+		/* Send request to pHyp */
+		ret = vio_h_cop_sync(local_devdata->vdev, &op);
+
+		/* Check for pHyp error */
+		if (ret) {
+			dev_dbg(dev, "%s: vio_h_cop_sync error (ret=%d, hret=%ld)\n",
+				__func__, ret, op.hcall_err);
+			dev = NULL;
+			goto sw;
+		}
+
+		/* Check for hardware error */
+		ret = nx842_validate_result(dev, &csbcpb->csb);
+		if (ret) {
+			dev = NULL;
+			goto sw;
+		}
+
+		/* HW decompression success */
+		inbuf += hdr->sizes[i];
+		outbuf += csbcpb->csb.processed_byte_count;
+		continue;
+
+sw:
+		/* software decompression */
+		size = max_sync_size;
+		ret = sw842_decompress(
+			(unsigned char *)inbuf, hdr->sizes[i],
+			(unsigned char *)outbuf, &size, wmem);
+		if (ret)
+			pr_debug("%s: sw842_decompress failed with %d\n",
+				__func__, ret);
+
+		if (ret) {
+			if (ret != -ENOSPC && ret != -EINVAL &&
+					ret != -EMSGSIZE)
+				ret = -EIO;
+			goto unlock;
+		}
+
+		/* SW decompression success */
+		inbuf += hdr->sizes[i];
+		outbuf += size;
+	}
+
+	*outlen = (unsigned int)(outbuf - (unsigned long)out);
+
+unlock:
+	if (ret)
+		/* decompress fail */
+		nx842_inc_decomp_failed(local_devdata);
+	else {
+		if (!dev)
+			/* software decompress */
+			nx842_inc_swdecomp(local_devdata);
+		nx842_inc_decomp_complete(local_devdata);
+		ibm_nx842_incr_hist(local_devdata->counters->decomp_times,
+			(get_tb() - start_time) / tb_ticks_per_usec);
+	}
+
+	rcu_read_unlock();
+	return ret;
+}
+EXPORT_SYMBOL_GPL(nx842_decompress);
+
+/**
+ * nx842_OF_set_defaults -- Set default (disabled) values for devdata
+ *
+ * @devdata - struct nx842_devdata to update
+ *
+ * Returns:
+ *  0 on success
+ *  -ENOENT if @devdata ptr is NULL
+ */
+static int nx842_OF_set_defaults(struct nx842_devdata *devdata)
+{
+	if (devdata) {
+		devdata->max_sync_size = 0;
+		devdata->max_sync_sg = 0;
+		devdata->max_sg_len = 0;
+		devdata->status = UNAVAILABLE;
+		return 0;
+	} else
+		return -ENOENT;
+}
+
+/**
+ * nx842_OF_upd_status -- Update the device info from OF status prop
+ *
+ * The status property indicates if the accelerator is enabled.  If the
+ * device is in the OF tree it indicates that the hardware is present.
+ * The status field indicates if the device is enabled when the status
+ * is 'okay'.  Otherwise the device driver will be disabled.
+ *
+ * @devdata - struct nx842_devdata to update
+ * @prop - struct property point containing the maxsyncop for the update
+ *
+ * Returns:
+ *  0 - Device is available
+ *  -EINVAL - Device is not available
+ */
+static int nx842_OF_upd_status(struct nx842_devdata *devdata,
+					struct property *prop) {
+	int ret = 0;
+	const char *status = (const char *)prop->value;
+
+	if (!strncmp(status, "okay", (size_t)prop->length)) {
+		devdata->status = AVAILABLE;
+	} else {
+		dev_info(devdata->dev, "%s: status '%s' is not 'okay'\n",
+				__func__, status);
+		devdata->status = UNAVAILABLE;
+	}
+
+	return ret;
+}
+
+/**
+ * nx842_OF_upd_maxsglen -- Update the device info from OF maxsglen prop
+ *
+ * Definition of the 'ibm,max-sg-len' OF property:
+ *  This field indicates the maximum byte length of a scatter list
+ *  for the platform facility. It is a single cell encoded as with encode-int.
+ *
+ * Example:
+ *  # od -x ibm,max-sg-len
+ *  0000000 0000 0ff0
+ *
+ *  In this example, the maximum byte length of a scatter list is
+ *  0x0ff0 (4,080).
+ *
+ * @devdata - struct nx842_devdata to update
+ * @prop - struct property point containing the maxsyncop for the update
+ *
+ * Returns:
+ *  0 on success
+ *  -EINVAL on failure
+ */
+static int nx842_OF_upd_maxsglen(struct nx842_devdata *devdata,
+					struct property *prop) {
+	int ret = 0;
+	const int *maxsglen = prop->value;
+
+	if (prop->length != sizeof(*maxsglen)) {
+		dev_err(devdata->dev, "%s: unexpected format for ibm,max-sg-len property\n", __func__);
+		dev_dbg(devdata->dev, "%s: ibm,max-sg-len is %d bytes long, expected %lu bytes\n", __func__,
+				prop->length, sizeof(*maxsglen));
+		ret = -EINVAL;
+	} else {
+		devdata->max_sg_len = (unsigned int)min(*maxsglen,
+				(int)NX842_HW_PAGE_SIZE);
+	}
+
+	return ret;
+}
+
+/**
+ * nx842_OF_upd_maxsyncop -- Update the device info from OF maxsyncop prop
+ *
+ * Definition of the 'ibm,max-sync-cop' OF property:
+ *  Two series of cells.  The first series of cells represents the maximums
+ *  that can be synchronously compressed. The second series of cells
+ *  represents the maximums that can be synchronously decompressed.
+ *  1. The first cell in each series contains the count of the number of
+ *     data length, scatter list elements pairs that follow – each being
+ *     of the form
+ *    a. One cell data byte length
+ *    b. One cell total number of scatter list elements
+ *
+ * Example:
+ *  # od -x ibm,max-sync-cop
+ *  0000000 0000 0001 0000 1000 0000 01fe 0000 0001
+ *  0000020 0000 1000 0000 01fe
+ *
+ *  In this example, compression supports 0x1000 (4,096) data byte length
+ *  and 0x1fe (510) total scatter list elements.  Decompression supports
+ *  0x1000 (4,096) data byte length and 0x1f3 (510) total scatter list
+ *  elements.
+ *
+ * @devdata - struct nx842_devdata to update
+ * @prop - struct property point containing the maxsyncop for the update
+ *
+ * Returns:
+ *  0 on success
+ *  -EINVAL on failure
+ */
+static int nx842_OF_upd_maxsyncop(struct nx842_devdata *devdata,
+					struct property *prop) {
+	int ret = 0;
+	const struct maxsynccop_t {
+		int comp_elements;
+		int comp_data_limit;
+		int comp_sg_limit;
+		int decomp_elements;
+		int decomp_data_limit;
+		int decomp_sg_limit;
+	} *maxsynccop;
+
+	if (prop->length != sizeof(*maxsynccop)) {
+		dev_err(devdata->dev, "%s: unexpected format for ibm,max-sync-cop property\n", __func__);
+		dev_dbg(devdata->dev, "%s: ibm,max-sync-cop is %d bytes long, expected %lu bytes\n", __func__, prop->length,
+				sizeof(*maxsynccop));
+		ret = -EINVAL;
+		goto out;
+	}
+
+	maxsynccop = (const struct maxsynccop_t *)prop->value;
+
+	/* Use one limit rather than separate limits for compression and
+	 * decompression. Set a maximum for this so as not to exceed the
+	 * size that the header can support and round the value down to
+	 * the hardware page size (4K) */
+	devdata->max_sync_size =
+			(unsigned int)min(maxsynccop->comp_data_limit,
+					maxsynccop->decomp_data_limit);
+
+	devdata->max_sync_size = min_t(unsigned int, devdata->max_sync_size,
+					SIZE_64K);
+
+	if (devdata->max_sync_size < SIZE_4K) {
+		dev_err(devdata->dev, "%s: hardware max data size (%u) is "
+				"less than the driver minimum, unable to use "
+				"the hardware device\n",
+				__func__, devdata->max_sync_size);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	devdata->max_sync_sg = (unsigned int)min(maxsynccop->comp_sg_limit,
+						maxsynccop->decomp_sg_limit);
+	if (devdata->max_sync_sg < 1) {
+		dev_err(devdata->dev, "%s: hardware max sg size (%u) is "
+				"less than the driver minimum, unable to use "
+				"the hardware device\n",
+				__func__, devdata->max_sync_sg);
+		ret = -EINVAL;
+		goto out;
+	}
+
+out:
+	return ret;
+}
+
+/**
+ *
+ * nx842_OF_upd -- Handle OF properties updates for the device.
+ *
+ * Set all properties from the OF tree.  Optionally, a new property
+ * can be provided by the @new_prop pointer to overwrite an existing value.
+ * The device will remain disabled until all values are valid, this function
+ * will return an error for updates unless all values are valid.
+ *
+ * @new_prop: If not NULL, this property is being updated.  If NULL, update
+ *  all properties from the current values in the OF tree.
+ *
+ * Returns:
+ *  0 - Success
+ *  -ENOMEM - Could not allocate memory for new devdata structure
+ *  -EINVAL - property value not found, new_prop is not a recognized
+ *	property for the device or property value is not valid.
+ *  -ENODEV - Device is not available
+ */
+static int nx842_OF_upd(struct property *new_prop)
+{
+	struct nx842_devdata *old_devdata = NULL;
+	struct nx842_devdata *new_devdata = NULL;
+	struct device_node *of_node = NULL;
+	struct property *status = NULL;
+	struct property *maxsglen = NULL;
+	struct property *maxsyncop = NULL;
+	int ret = 0;
+	unsigned long flags;
+
+	spin_lock_irqsave(&devdata_mutex, flags);
+	old_devdata = rcu_dereference_check(devdata,
+			lockdep_is_held(&devdata_mutex));
+	if (old_devdata)
+		of_node = old_devdata->dev->of_node;
+
+	if (!old_devdata || !of_node) {
+		pr_err("%s: device is not available\n", __func__);
+		spin_unlock_irqrestore(&devdata_mutex, flags);
+		return -ENODEV;
+	}
+
+	new_devdata = kzalloc(sizeof(*new_devdata), GFP_NOFS);
+	if (!new_devdata) {
+		dev_err(old_devdata->dev, "%s: Could not allocate memory for device data\n", __func__);
+		ret = -ENOMEM;
+		goto error_out;
+	}
+
+	memcpy(new_devdata, old_devdata, sizeof(*old_devdata));
+	new_devdata->counters = old_devdata->counters;
+
+	/* Set ptrs for existing properties */
+	status = of_find_property(of_node, "status", NULL);
+	maxsglen = of_find_property(of_node, "ibm,max-sg-len", NULL);
+	maxsyncop = of_find_property(of_node, "ibm,max-sync-cop", NULL);
+	if (!status || !maxsglen || !maxsyncop) {
+		dev_err(old_devdata->dev, "%s: Could not locate device properties\n", __func__);
+		ret = -EINVAL;
+		goto error_out;
+	}
+
+	/*
+	 * If this is a property update, there are only certain properties that
+	 * we care about. Bail if it isn't in the below list
+	 */
+	if (new_prop && (strncmp(new_prop->name, "status", new_prop->length) ||
+		         strncmp(new_prop->name, "ibm,max-sg-len", new_prop->length) ||
+		         strncmp(new_prop->name, "ibm,max-sync-cop", new_prop->length)))
+		goto out;
+
+	/* Perform property updates */
+	ret = nx842_OF_upd_status(new_devdata, status);
+	if (ret)
+		goto error_out;
+
+	ret = nx842_OF_upd_maxsglen(new_devdata, maxsglen);
+	if (ret)
+		goto error_out;
+
+	ret = nx842_OF_upd_maxsyncop(new_devdata, maxsyncop);
+	if (ret)
+		goto error_out;
+
+out:
+	dev_info(old_devdata->dev, "%s: max_sync_size new:%u old:%u\n",
+			__func__, new_devdata->max_sync_size,
+			old_devdata->max_sync_size);
+	dev_info(old_devdata->dev, "%s: max_sync_sg new:%u old:%u\n",
+			__func__, new_devdata->max_sync_sg,
+			old_devdata->max_sync_sg);
+	dev_info(old_devdata->dev, "%s: max_sg_len new:%u old:%u\n",
+			__func__, new_devdata->max_sg_len,
+			old_devdata->max_sg_len);
+
+	rcu_assign_pointer(devdata, new_devdata);
+	spin_unlock_irqrestore(&devdata_mutex, flags);
+	synchronize_rcu();
+	dev_set_drvdata(new_devdata->dev, new_devdata);
+	kfree(old_devdata);
+	return 0;
+
+error_out:
+	if (new_devdata) {
+		dev_info(old_devdata->dev, "%s: device disabled\n", __func__);
+		nx842_OF_set_defaults(new_devdata);
+		rcu_assign_pointer(devdata, new_devdata);
+		spin_unlock_irqrestore(&devdata_mutex, flags);
+		synchronize_rcu();
+		dev_set_drvdata(new_devdata->dev, new_devdata);
+		kfree(old_devdata);
+	} else {
+		dev_err(old_devdata->dev, "%s: could not update driver from hardware\n", __func__);
+		spin_unlock_irqrestore(&devdata_mutex, flags);
+	}
+
+	if (!ret)
+		ret = -EINVAL;
+	return ret;
+}
+
+/**
+ * nx842_OF_notifier - Process updates to OF properties for the device
+ *
+ * @np: notifier block
+ * @action: notifier action
+ * @update: struct pSeries_reconfig_prop_update pointer if action is
+ *	PSERIES_UPDATE_PROPERTY
+ *
+ * Returns:
+ *	NOTIFY_OK on success
+ *	NOTIFY_BAD encoded with error number on failure, use
+ *		notifier_to_errno() to decode this value
+ */
+static int nx842_OF_notifier(struct notifier_block *np, unsigned long action,
+			     void *data)
+{
+	struct of_reconfig_data *upd = data;
+	struct nx842_devdata *local_devdata;
+	struct device_node *node = NULL;
+
+	rcu_read_lock();
+	local_devdata = rcu_dereference(devdata);
+	if (local_devdata)
+		node = local_devdata->dev->of_node;
+
+	if (local_devdata &&
+			action == OF_RECONFIG_UPDATE_PROPERTY &&
+			!strcmp(upd->dn->name, node->name)) {
+		rcu_read_unlock();
+		nx842_OF_upd(upd->prop);
+	} else
+		rcu_read_unlock();
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block nx842_of_nb = {
+	.notifier_call = nx842_OF_notifier,
+};
+
+#define nx842_counter_read(_name)					\
+static ssize_t nx842_##_name##_show(struct device *dev,		\
+		struct device_attribute *attr,				\
+		char *buf) {						\
+	struct nx842_devdata *local_devdata;			\
+	int p = 0;							\
+	rcu_read_lock();						\
+	local_devdata = rcu_dereference(devdata);			\
+	if (local_devdata)						\
+		p = snprintf(buf, PAGE_SIZE, "%ld\n",			\
+		       atomic64_read(&local_devdata->counters->_name));	\
+	rcu_read_unlock();						\
+	return p;							\
+}
+
+#define NX842DEV_COUNTER_ATTR_RO(_name)					\
+	nx842_counter_read(_name);					\
+	static struct device_attribute dev_attr_##_name = __ATTR(_name,	\
+						0444,			\
+						nx842_##_name##_show,\
+						NULL);
+
+NX842DEV_COUNTER_ATTR_RO(comp_complete);
+NX842DEV_COUNTER_ATTR_RO(comp_failed);
+NX842DEV_COUNTER_ATTR_RO(decomp_complete);
+NX842DEV_COUNTER_ATTR_RO(decomp_failed);
+NX842DEV_COUNTER_ATTR_RO(swdecomp);
+
+static ssize_t nx842_timehist_show(struct device *,
+		struct device_attribute *, char *);
+
+static struct device_attribute dev_attr_comp_times = __ATTR(comp_times, 0444,
+		nx842_timehist_show, NULL);
+static struct device_attribute dev_attr_decomp_times = __ATTR(decomp_times,
+		0444, nx842_timehist_show, NULL);
+
+static ssize_t nx842_timehist_show(struct device *dev,
+		struct device_attribute *attr, char *buf) {
+	char *p = buf;
+	struct nx842_devdata *local_devdata;
+	atomic64_t *times;
+	int bytes_remain = PAGE_SIZE;
+	int bytes;
+	int i;
+
+	rcu_read_lock();
+	local_devdata = rcu_dereference(devdata);
+	if (!local_devdata) {
+		rcu_read_unlock();
+		return 0;
+	}
+
+	if (attr == &dev_attr_comp_times)
+		times = local_devdata->counters->comp_times;
+	else if (attr == &dev_attr_decomp_times)
+		times = local_devdata->counters->decomp_times;
+	else {
+		rcu_read_unlock();
+		return 0;
+	}
+
+	for (i = 0; i < (NX842_HIST_SLOTS - 2); i++) {
+		bytes = snprintf(p, bytes_remain, "%u-%uus:\t%ld\n",
+			       i ? (2<<(i-1)) : 0, (2<<i)-1,
+			       atomic64_read(&times[i]));
+		bytes_remain -= bytes;
+		p += bytes;
+	}
+	/* The last bucket holds everything over
+	 * 2<<(NX842_HIST_SLOTS - 2) us */
+	bytes = snprintf(p, bytes_remain, "%uus - :\t%ld\n",
+			2<<(NX842_HIST_SLOTS - 2),
+			atomic64_read(&times[(NX842_HIST_SLOTS - 1)]));
+	p += bytes;
+
+	rcu_read_unlock();
+	return p - buf;
+}
+
+static struct attribute *nx842_sysfs_entries[] = {
+	&dev_attr_comp_complete.attr,
+	&dev_attr_comp_failed.attr,
+	&dev_attr_decomp_complete.attr,
+	&dev_attr_decomp_failed.attr,
+	&dev_attr_swdecomp.attr,
+	&dev_attr_comp_times.attr,
+	&dev_attr_decomp_times.attr,
+	NULL,
+};
+
+static struct attribute_group nx842_attribute_group = {
+	.name = NULL,		/* put in device directory */
+	.attrs = nx842_sysfs_entries,
+};
+
+static int __init nx842_probe(struct vio_dev *viodev,
+				  const struct vio_device_id *id)
+{
+	struct nx842_devdata *old_devdata, *new_devdata = NULL;
+	unsigned long flags;
+	int ret = 0;
+
+	spin_lock_irqsave(&devdata_mutex, flags);
+	old_devdata = rcu_dereference_check(devdata,
+			lockdep_is_held(&devdata_mutex));
+
+	if (old_devdata && old_devdata->vdev != NULL) {
+		dev_err(&viodev->dev, "%s: Attempt to register more than one instance of the hardware\n", __func__);
+		ret = -1;
+		goto error_unlock;
+	}
+
+	dev_set_drvdata(&viodev->dev, NULL);
+
+	new_devdata = kzalloc(sizeof(*new_devdata), GFP_NOFS);
+	if (!new_devdata) {
+		dev_err(&viodev->dev, "%s: Could not allocate memory for device data\n", __func__);
+		ret = -ENOMEM;
+		goto error_unlock;
+	}
+
+	new_devdata->counters = kzalloc(sizeof(*new_devdata->counters),
+			GFP_NOFS);
+	if (!new_devdata->counters) {
+		dev_err(&viodev->dev, "%s: Could not allocate memory for performance counters\n", __func__);
+		ret = -ENOMEM;
+		goto error_unlock;
+	}
+
+	new_devdata->vdev = viodev;
+	new_devdata->dev = &viodev->dev;
+	nx842_OF_set_defaults(new_devdata);
+
+	rcu_assign_pointer(devdata, new_devdata);
+	spin_unlock_irqrestore(&devdata_mutex, flags);
+	synchronize_rcu();
+	kfree(old_devdata);
+
+	of_reconfig_notifier_register(&nx842_of_nb);
+
+	ret = nx842_OF_upd(NULL);
+	if (ret && ret != -ENODEV) {
+		dev_err(&viodev->dev, "could not parse device tree. %d\n", ret);
+		ret = -1;
+		goto error;
+	}
+
+	rcu_read_lock();
+	dev_set_drvdata(&viodev->dev, rcu_dereference(devdata));
+	rcu_read_unlock();
+
+	if (sysfs_create_group(&viodev->dev.kobj, &nx842_attribute_group)) {
+		dev_err(&viodev->dev, "could not create sysfs device attributes\n");
+		ret = -1;
+		goto error;
+	}
+
+	return 0;
+
+error_unlock:
+	spin_unlock_irqrestore(&devdata_mutex, flags);
+	if (new_devdata)
+		kfree(new_devdata->counters);
+	kfree(new_devdata);
+error:
+	return ret;
+}
+
+static int __exit nx842_remove(struct vio_dev *viodev)
+{
+	struct nx842_devdata *old_devdata;
+	unsigned long flags;
+
+	pr_info("Removing IBM Power 842 compression device\n");
+	sysfs_remove_group(&viodev->dev.kobj, &nx842_attribute_group);
+
+	spin_lock_irqsave(&devdata_mutex, flags);
+	old_devdata = rcu_dereference_check(devdata,
+			lockdep_is_held(&devdata_mutex));
+	of_reconfig_notifier_unregister(&nx842_of_nb);
+	RCU_INIT_POINTER(devdata, NULL);
+	spin_unlock_irqrestore(&devdata_mutex, flags);
+	synchronize_rcu();
+	dev_set_drvdata(&viodev->dev, NULL);
+	if (old_devdata)
+		kfree(old_devdata->counters);
+	kfree(old_devdata);
+	return 0;
+}
+
+static struct vio_device_id nx842_driver_ids[] = {
+	{"ibm,compression-v1", "ibm,compression"},
+	{"", ""},
+};
+
+static struct vio_driver nx842_driver = {
+	.name = MODULE_NAME,
+	.probe = nx842_probe,
+	.remove = __exit_p(nx842_remove),
+	.get_desired_dma = nx842_get_desired_dma,
+	.id_table = nx842_driver_ids,
+};
+
+static int __init nx842_init(void)
+{
+	struct nx842_devdata *new_devdata;
+	pr_info("Registering IBM Power 842 compression driver\n");
+
+	RCU_INIT_POINTER(devdata, NULL);
+	new_devdata = kzalloc(sizeof(*new_devdata), GFP_KERNEL);
+	if (!new_devdata) {
+		pr_err("Could not allocate memory for device data\n");
+		return -ENOMEM;
+	}
+	new_devdata->status = UNAVAILABLE;
+	RCU_INIT_POINTER(devdata, new_devdata);
+
+	return vio_register_driver(&nx842_driver);
+}
+
+module_init(nx842_init);
+
+static void __exit nx842_exit(void)
+{
+	struct nx842_devdata *old_devdata;
+	unsigned long flags;
+
+	pr_info("Exiting IBM Power 842 compression driver\n");
+	spin_lock_irqsave(&devdata_mutex, flags);
+	old_devdata = rcu_dereference_check(devdata,
+			lockdep_is_held(&devdata_mutex));
+	RCU_INIT_POINTER(devdata, NULL);
+	spin_unlock_irqrestore(&devdata_mutex, flags);
+	synchronize_rcu();
+	if (old_devdata)
+		dev_set_drvdata(old_devdata->dev, NULL);
+	kfree(old_devdata);
+	vio_unregister_driver(&nx842_driver);
+}
+
+module_exit(nx842_exit);
+
+/*********************************
+ * 842 software decompressor
+*********************************/
+typedef int (*sw842_template_op)(const char **, int *, unsigned char **,
+						struct sw842_fifo *);
+
+static int sw842_data8(const char **, int *, unsigned char **,
+						struct sw842_fifo *);
+static int sw842_data4(const char **, int *, unsigned char **,
+						struct sw842_fifo *);
+static int sw842_data2(const char **, int *, unsigned char **,
+						struct sw842_fifo *);
+static int sw842_ptr8(const char **, int *, unsigned char **,
+						struct sw842_fifo *);
+static int sw842_ptr4(const char **, int *, unsigned char **,
+						struct sw842_fifo *);
+static int sw842_ptr2(const char **, int *, unsigned char **,
+						struct sw842_fifo *);
+
+/* special templates */
+#define SW842_TMPL_REPEAT 0x1B
+#define SW842_TMPL_ZEROS 0x1C
+#define SW842_TMPL_EOF 0x1E
+
+static sw842_template_op sw842_tmpl_ops[26][4] = {
+	{ sw842_data8, NULL}, /* 0 (00000) */
+	{ sw842_data4, sw842_data2, sw842_ptr2,  NULL},
+	{ sw842_data4, sw842_ptr2,  sw842_data2, NULL},
+	{ sw842_data4, sw842_ptr2,  sw842_ptr2,  NULL},
+	{ sw842_data4, sw842_ptr4,  NULL},
+	{ sw842_data2, sw842_ptr2,  sw842_data4, NULL},
+	{ sw842_data2, sw842_ptr2,  sw842_data2, sw842_ptr2},
+	{ sw842_data2, sw842_ptr2,  sw842_ptr2,  sw842_data2},
+	{ sw842_data2, sw842_ptr2,  sw842_ptr2,  sw842_ptr2,},
+	{ sw842_data2, sw842_ptr2,  sw842_ptr4,  NULL},
+	{ sw842_ptr2,  sw842_data2, sw842_data4, NULL}, /* 10 (01010) */
+	{ sw842_ptr2,  sw842_data4, sw842_ptr2,  NULL},
+	{ sw842_ptr2,  sw842_data2, sw842_ptr2,  sw842_data2},
+	{ sw842_ptr2,  sw842_data2, sw842_ptr2,  sw842_ptr2},
+	{ sw842_ptr2,  sw842_data2, sw842_ptr4,  NULL},
+	{ sw842_ptr2,  sw842_ptr2,  sw842_data4, NULL},
+	{ sw842_ptr2,  sw842_ptr2,  sw842_data2, sw842_ptr2},
+	{ sw842_ptr2,  sw842_ptr2,  sw842_ptr2,  sw842_data2},
+	{ sw842_ptr2,  sw842_ptr2,  sw842_ptr2,  sw842_ptr2},
+	{ sw842_ptr2,  sw842_ptr2,  sw842_ptr4,  NULL},
+	{ sw842_ptr4,  sw842_data4, NULL}, /* 20 (10100) */
+	{ sw842_ptr4,  sw842_data2, sw842_ptr2,  NULL},
+	{ sw842_ptr4,  sw842_ptr2,  sw842_data2, NULL},
+	{ sw842_ptr4,  sw842_ptr2,  sw842_ptr2,  NULL},
+	{ sw842_ptr4,  sw842_ptr4,  NULL},
+	{ sw842_ptr8,  NULL}
+};
+
+/* Software decompress helpers */
+
+static uint8_t sw842_get_byte(const char *buf, int bit)
+{
+	uint8_t tmpl;
+	uint16_t tmp;
+	tmp = htons(*(uint16_t *)(buf));
+	tmp = (uint16_t)(tmp << bit);
+	tmp = ntohs(tmp);
+	memcpy(&tmpl, &tmp, 1);
+	return tmpl;
+}
+
+static uint8_t sw842_get_template(const char **buf, int *bit)
+{
+	uint8_t byte;
+	byte = sw842_get_byte(*buf, *bit);
+	byte = byte >> 3;
+	byte &= 0x1F;
+	*buf += (*bit + 5) / 8;
+	*bit = (*bit + 5) % 8;
+	return byte;
+}
+
+/* repeat_count happens to be 5-bit too (like the template) */
+static uint8_t sw842_get_repeat_count(const char **buf, int *bit)
+{
+	uint8_t byte;
+	byte = sw842_get_byte(*buf, *bit);
+	byte = byte >> 2;
+	byte &= 0x3F;
+	*buf += (*bit + 6) / 8;
+	*bit = (*bit + 6) % 8;
+	return byte;
+}
+
+static uint8_t sw842_get_ptr2(const char **buf, int *bit)
+{
+	uint8_t ptr;
+	ptr = sw842_get_byte(*buf, *bit);
+	(*buf)++;
+	return ptr;
+}
+
+static uint16_t sw842_get_ptr4(const char **buf, int *bit,
+		struct sw842_fifo *fifo)
+{
+	uint16_t ptr;
+	ptr = htons(*(uint16_t *)(*buf));
+	ptr = (uint16_t)(ptr << *bit);
+	ptr = ptr >> 7;
+	ptr &= 0x01FF;
+	*buf += (*bit + 9) / 8;
+	*bit = (*bit + 9) % 8;
+	return ptr;
+}
+
+static uint8_t sw842_get_ptr8(const char **buf, int *bit,
+		struct sw842_fifo *fifo)
+{
+	return sw842_get_ptr2(buf, bit);
+}
+
+/* Software decompress template ops */
+
+static int sw842_data8(const char **inbuf, int *inbit,
+		unsigned char **outbuf, struct sw842_fifo *fifo)
+{
+	int ret;
+
+	ret = sw842_data4(inbuf, inbit, outbuf, fifo);
+	if (ret)
+		return ret;
+	ret = sw842_data4(inbuf, inbit, outbuf, fifo);
+	return ret;
+}
+
+static int sw842_data4(const char **inbuf, int *inbit,
+		unsigned char **outbuf, struct sw842_fifo *fifo)
+{
+	int ret;
+
+	ret = sw842_data2(inbuf, inbit, outbuf, fifo);
+	if (ret)
+		return ret;
+	ret = sw842_data2(inbuf, inbit, outbuf, fifo);
+	return ret;
+}
+
+static int sw842_data2(const char **inbuf, int *inbit,
+		unsigned char **outbuf, struct sw842_fifo *fifo)
+{
+	**outbuf = sw842_get_byte(*inbuf, *inbit);
+	(*inbuf)++;
+	(*outbuf)++;
+	**outbuf = sw842_get_byte(*inbuf, *inbit);
+	(*inbuf)++;
+	(*outbuf)++;
+	return 0;
+}
+
+static int sw842_ptr8(const char **inbuf, int *inbit,
+		unsigned char **outbuf, struct sw842_fifo *fifo)
+{
+	uint8_t ptr;
+	ptr = sw842_get_ptr8(inbuf, inbit, fifo);
+	if (!fifo->f84_full && (ptr >= fifo->f8_count))
+		return 1;
+	memcpy(*outbuf, fifo->f8[ptr], 8);
+	*outbuf += 8;
+	return 0;
+}
+
+static int sw842_ptr4(const char **inbuf, int *inbit,
+		unsigned char **outbuf, struct sw842_fifo *fifo)
+{
+	uint16_t ptr;
+	ptr = sw842_get_ptr4(inbuf, inbit, fifo);
+	if (!fifo->f84_full && (ptr >= fifo->f4_count))
+		return 1;
+	memcpy(*outbuf, fifo->f4[ptr], 4);
+	*outbuf += 4;
+	return 0;
+}
+
+static int sw842_ptr2(const char **inbuf, int *inbit,
+		unsigned char **outbuf, struct sw842_fifo *fifo)
+{
+	uint8_t ptr;
+	ptr = sw842_get_ptr2(inbuf, inbit);
+	if (!fifo->f2_full && (ptr >= fifo->f2_count))
+		return 1;
+	memcpy(*outbuf, fifo->f2[ptr], 2);
+	*outbuf += 2;
+	return 0;
+}
+
+static void sw842_copy_to_fifo(const char *buf, struct sw842_fifo *fifo)
+{
+	unsigned char initial_f2count = fifo->f2_count;
+
+	memcpy(fifo->f8[fifo->f8_count], buf, 8);
+	fifo->f4_count += 2;
+	fifo->f8_count += 1;
+
+	if (!fifo->f84_full && fifo->f4_count >= 512) {
+		fifo->f84_full = 1;
+		fifo->f4_count /= 512;
+	}
+
+	memcpy(fifo->f2[fifo->f2_count++], buf, 2);
+	memcpy(fifo->f2[fifo->f2_count++], buf + 2, 2);
+	memcpy(fifo->f2[fifo->f2_count++], buf + 4, 2);
+	memcpy(fifo->f2[fifo->f2_count++], buf + 6, 2);
+	if (fifo->f2_count < initial_f2count)
+		fifo->f2_full = 1;
+}
+
+static int sw842_decompress(const unsigned char *src, int srclen,
+			unsigned char *dst, int *destlen,
+			const void *wrkmem)
+{
+	uint8_t tmpl;
+	const char *inbuf;
+	int inbit = 0;
+	unsigned char *outbuf, *outbuf_end, *origbuf, *prevbuf;
+	const char *inbuf_end;
+	sw842_template_op op;
+	int opindex;
+	int i, repeat_count;
+	struct sw842_fifo *fifo;
+	int ret = 0;
+
+	fifo = &((struct nx842_workmem *)(wrkmem))->swfifo;
+	memset(fifo, 0, sizeof(*fifo));
+
+	origbuf = NULL;
+	inbuf = src;
+	inbuf_end = src + srclen;
+	outbuf = dst;
+	outbuf_end = dst + *destlen;
+
+	while ((tmpl = sw842_get_template(&inbuf, &inbit)) != SW842_TMPL_EOF) {
+		if (inbuf >= inbuf_end) {
+			ret = -EINVAL;
+			goto out;
+		}
+
+		opindex = 0;
+		prevbuf = origbuf;
+		origbuf = outbuf;
+		switch (tmpl) {
+		case SW842_TMPL_REPEAT:
+			if (prevbuf == NULL) {
+				ret = -EINVAL;
+				goto out;
+			}
+
+			repeat_count = sw842_get_repeat_count(&inbuf,
+								&inbit) + 1;
+
+			/* Did the repeat count advance past the end of input */
+			if (inbuf > inbuf_end) {
+				ret = -EINVAL;
+				goto out;
+			}
+
+			for (i = 0; i < repeat_count; i++) {
+				/* Would this overflow the output buffer */
+				if ((outbuf + 8) > outbuf_end) {
+					ret = -ENOSPC;
+					goto out;
+				}
+
+				memcpy(outbuf, prevbuf, 8);
+				sw842_copy_to_fifo(outbuf, fifo);
+				outbuf += 8;
+			}
+			break;
+
+		case SW842_TMPL_ZEROS:
+			/* Would this overflow the output buffer */
+			if ((outbuf + 8) > outbuf_end) {
+				ret = -ENOSPC;
+				goto out;
+			}
+
+			memset(outbuf, 0, 8);
+			sw842_copy_to_fifo(outbuf, fifo);
+			outbuf += 8;
+			break;
+
+		default:
+			if (tmpl > 25) {
+				ret = -EINVAL;
+				goto out;
+			}
+
+			/* Does this go past the end of the input buffer */
+			if ((inbuf + 2) > inbuf_end) {
+				ret = -EINVAL;
+				goto out;
+			}
+
+			/* Would this overflow the output buffer */
+			if ((outbuf + 8) > outbuf_end) {
+				ret = -ENOSPC;
+				goto out;
+			}
+
+			while (opindex < 4 &&
+				(op = sw842_tmpl_ops[tmpl][opindex++])
+					!= NULL) {
+				ret = (*op)(&inbuf, &inbit, &outbuf, fifo);
+				if (ret) {
+					ret = -EINVAL;
+					goto out;
+				}
+				sw842_copy_to_fifo(origbuf, fifo);
+			}
+		}
+	}
+
+out:
+	if (!ret)
+		*destlen = (unsigned int)(outbuf - dst);
+	else
+		*destlen = 0;
+
+	return ret;
+}
diff --git a/drivers/crypto/nx/nx-842.c b/drivers/crypto/nx/nx-842.c
deleted file mode 100644
index 887196e..0000000
--- a/drivers/crypto/nx/nx-842.c
+++ /dev/null
@@ -1,1603 +0,0 @@
-/*
- * Driver for IBM Power 842 compression accelerator
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
- *
- * Copyright (C) IBM Corporation, 2012
- *
- * Authors: Robert Jennings <rcj@linux.vnet.ibm.com>
- *          Seth Jennings <sjenning@linux.vnet.ibm.com>
- */
-
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/nx842.h>
-#include <linux/of.h>
-#include <linux/slab.h>
-
-#include <asm/page.h>
-#include <asm/vio.h>
-
-#include "nx_csbcpb.h" /* struct nx_csbcpb */
-
-#define MODULE_NAME "nx-compress"
-MODULE_LICENSE("GPL");
-MODULE_AUTHOR("Robert Jennings <rcj@linux.vnet.ibm.com>");
-MODULE_DESCRIPTION("842 H/W Compression driver for IBM Power processors");
-
-#define SHIFT_4K 12
-#define SHIFT_64K 16
-#define SIZE_4K (1UL << SHIFT_4K)
-#define SIZE_64K (1UL << SHIFT_64K)
-
-/* IO buffer must be 128 byte aligned */
-#define IO_BUFFER_ALIGN 128
-
-struct nx842_header {
-	int blocks_nr; /* number of compressed blocks */
-	int offset; /* offset of the first block (from beginning of header) */
-	int sizes[0]; /* size of compressed blocks */
-};
-
-static inline int nx842_header_size(const struct nx842_header *hdr)
-{
-	return sizeof(struct nx842_header) +
-			hdr->blocks_nr * sizeof(hdr->sizes[0]);
-}
-
-/* Macros for fields within nx_csbcpb */
-/* Check the valid bit within the csbcpb valid field */
-#define NX842_CSBCBP_VALID_CHK(x) (x & BIT_MASK(7))
-
-/* CE macros operate on the completion_extension field bits in the csbcpb.
- * CE0 0=full completion, 1=partial completion
- * CE1 0=CE0 indicates completion, 1=termination (output may be modified)
- * CE2 0=processed_bytes is source bytes, 1=processed_bytes is target bytes */
-#define NX842_CSBCPB_CE0(x)	(x & BIT_MASK(7))
-#define NX842_CSBCPB_CE1(x)	(x & BIT_MASK(6))
-#define NX842_CSBCPB_CE2(x)	(x & BIT_MASK(5))
-
-/* The NX unit accepts data only on 4K page boundaries */
-#define NX842_HW_PAGE_SHIFT	SHIFT_4K
-#define NX842_HW_PAGE_SIZE	(ASM_CONST(1) << NX842_HW_PAGE_SHIFT)
-#define NX842_HW_PAGE_MASK	(~(NX842_HW_PAGE_SIZE-1))
-
-enum nx842_status {
-	UNAVAILABLE,
-	AVAILABLE
-};
-
-struct ibm_nx842_counters {
-	atomic64_t comp_complete;
-	atomic64_t comp_failed;
-	atomic64_t decomp_complete;
-	atomic64_t decomp_failed;
-	atomic64_t swdecomp;
-	atomic64_t comp_times[32];
-	atomic64_t decomp_times[32];
-};
-
-static struct nx842_devdata {
-	struct vio_dev *vdev;
-	struct device *dev;
-	struct ibm_nx842_counters *counters;
-	unsigned int max_sg_len;
-	unsigned int max_sync_size;
-	unsigned int max_sync_sg;
-	enum nx842_status status;
-} __rcu *devdata;
-static DEFINE_SPINLOCK(devdata_mutex);
-
-#define NX842_COUNTER_INC(_x) \
-static inline void nx842_inc_##_x( \
-	const struct nx842_devdata *dev) { \
-	if (dev) \
-		atomic64_inc(&dev->counters->_x); \
-}
-NX842_COUNTER_INC(comp_complete);
-NX842_COUNTER_INC(comp_failed);
-NX842_COUNTER_INC(decomp_complete);
-NX842_COUNTER_INC(decomp_failed);
-NX842_COUNTER_INC(swdecomp);
-
-#define NX842_HIST_SLOTS 16
-
-static void ibm_nx842_incr_hist(atomic64_t *times, unsigned int time)
-{
-	int bucket = fls(time);
-
-	if (bucket)
-		bucket = min((NX842_HIST_SLOTS - 1), bucket - 1);
-
-	atomic64_inc(&times[bucket]);
-}
-
-/* NX unit operation flags */
-#define NX842_OP_COMPRESS	0x0
-#define NX842_OP_CRC		0x1
-#define NX842_OP_DECOMPRESS	0x2
-#define NX842_OP_COMPRESS_CRC   (NX842_OP_COMPRESS | NX842_OP_CRC)
-#define NX842_OP_DECOMPRESS_CRC (NX842_OP_DECOMPRESS | NX842_OP_CRC)
-#define NX842_OP_ASYNC		(1<<23)
-#define NX842_OP_NOTIFY		(1<<22)
-#define NX842_OP_NOTIFY_INT(x)	((x & 0xff)<<8)
-
-static unsigned long nx842_get_desired_dma(struct vio_dev *viodev)
-{
-	/* No use of DMA mappings within the driver. */
-	return 0;
-}
-
-struct nx842_slentry {
-	unsigned long ptr; /* Real address (use __pa()) */
-	unsigned long len;
-};
-
-/* pHyp scatterlist entry */
-struct nx842_scatterlist {
-	int entry_nr; /* number of slentries */
-	struct nx842_slentry *entries; /* ptr to array of slentries */
-};
-
-/* Does not include sizeof(entry_nr) in the size */
-static inline unsigned long nx842_get_scatterlist_size(
-				struct nx842_scatterlist *sl)
-{
-	return sl->entry_nr * sizeof(struct nx842_slentry);
-}
-
-static inline unsigned long nx842_get_pa(void *addr)
-{
-	if (is_vmalloc_addr(addr))
-		return page_to_phys(vmalloc_to_page(addr))
-		       + offset_in_page(addr);
-	else
-		return __pa(addr);
-}
-
-static int nx842_build_scatterlist(unsigned long buf, int len,
-			struct nx842_scatterlist *sl)
-{
-	unsigned long nextpage;
-	struct nx842_slentry *entry;
-
-	sl->entry_nr = 0;
-
-	entry = sl->entries;
-	while (len) {
-		entry->ptr = nx842_get_pa((void *)buf);
-		nextpage = ALIGN(buf + 1, NX842_HW_PAGE_SIZE);
-		if (nextpage < buf + len) {
-			/* we aren't at the end yet */
-			if (IS_ALIGNED(buf, NX842_HW_PAGE_SIZE))
-				/* we are in the middle (or beginning) */
-				entry->len = NX842_HW_PAGE_SIZE;
-			else
-				/* we are at the beginning */
-				entry->len = nextpage - buf;
-		} else {
-			/* at the end */
-			entry->len = len;
-		}
-
-		len -= entry->len;
-		buf += entry->len;
-		sl->entry_nr++;
-		entry++;
-	}
-
-	return 0;
-}
-
-/*
- * Working memory for software decompression
- */
-struct sw842_fifo {
-	union {
-		char f8[256][8];
-		char f4[512][4];
-	};
-	char f2[256][2];
-	unsigned char f84_full;
-	unsigned char f2_full;
-	unsigned char f8_count;
-	unsigned char f2_count;
-	unsigned int f4_count;
-};
-
-/*
- * Working memory for crypto API
- */
-struct nx842_workmem {
-	char bounce[PAGE_SIZE]; /* bounce buffer for decompression input */
-	union {
-		/* hardware working memory */
-		struct {
-			/* scatterlist */
-			char slin[SIZE_4K];
-			char slout[SIZE_4K];
-			/* coprocessor status/parameter block */
-			struct nx_csbcpb csbcpb;
-		};
-		/* software working memory */
-		struct sw842_fifo swfifo; /* software decompression fifo */
-	};
-};
-
-int nx842_get_workmem_size(void)
-{
-	return sizeof(struct nx842_workmem) + NX842_HW_PAGE_SIZE;
-}
-EXPORT_SYMBOL_GPL(nx842_get_workmem_size);
-
-int nx842_get_workmem_size_aligned(void)
-{
-	return sizeof(struct nx842_workmem);
-}
-EXPORT_SYMBOL_GPL(nx842_get_workmem_size_aligned);
-
-static int nx842_validate_result(struct device *dev,
-	struct cop_status_block *csb)
-{
-	/* The csb must be valid after returning from vio_h_cop_sync */
-	if (!NX842_CSBCBP_VALID_CHK(csb->valid)) {
-		dev_err(dev, "%s: cspcbp not valid upon completion.\n",
-				__func__);
-		dev_dbg(dev, "valid:0x%02x cs:0x%02x cc:0x%02x ce:0x%02x\n",
-				csb->valid,
-				csb->crb_seq_number,
-				csb->completion_code,
-				csb->completion_extension);
-		dev_dbg(dev, "processed_bytes:%d address:0x%016lx\n",
-				csb->processed_byte_count,
-				(unsigned long)csb->address);
-		return -EIO;
-	}
-
-	/* Check return values from the hardware in the CSB */
-	switch (csb->completion_code) {
-	case 0:	/* Completed without error */
-		break;
-	case 64: /* Target bytes > Source bytes during compression */
-	case 13: /* Output buffer too small */
-		dev_dbg(dev, "%s: Compression output larger than input\n",
-					__func__);
-		return -ENOSPC;
-	case 66: /* Input data contains an illegal template field */
-	case 67: /* Template indicates data past the end of the input stream */
-		dev_dbg(dev, "%s: Bad data for decompression (code:%d)\n",
-					__func__, csb->completion_code);
-		return -EINVAL;
-	default:
-		dev_dbg(dev, "%s: Unspecified error (code:%d)\n",
-					__func__, csb->completion_code);
-		return -EIO;
-	}
-
-	/* Hardware sanity check */
-	if (!NX842_CSBCPB_CE2(csb->completion_extension)) {
-		dev_err(dev, "%s: No error returned by hardware, but "
-				"data returned is unusable, contact support.\n"
-				"(Additional info: csbcbp->processed bytes "
-				"does not specify processed bytes for the "
-				"target buffer.)\n", __func__);
-		return -EIO;
-	}
-
-	return 0;
-}
-
-/**
- * nx842_compress - Compress data using the 842 algorithm
- *
- * Compression provide by the NX842 coprocessor on IBM Power systems.
- * The input buffer is compressed and the result is stored in the
- * provided output buffer.
- *
- * Upon return from this function @outlen contains the length of the
- * compressed data.  If there is an error then @outlen will be 0 and an
- * error will be specified by the return code from this function.
- *
- * @in: Pointer to input buffer, must be page aligned
- * @inlen: Length of input buffer, must be PAGE_SIZE
- * @out: Pointer to output buffer
- * @outlen: Length of output buffer
- * @wrkmem: ptr to buffer for working memory, size determined by
- *          nx842_get_workmem_size()
- *
- * Returns:
- *   0		Success, output of length @outlen stored in the buffer at @out
- *   -ENOMEM	Unable to allocate internal buffers
- *   -ENOSPC	Output buffer is to small
- *   -EMSGSIZE	XXX Difficult to describe this limitation
- *   -EIO	Internal error
- *   -ENODEV	Hardware unavailable
- */
-int nx842_compress(const unsigned char *in, unsigned int inlen,
-		       unsigned char *out, unsigned int *outlen, void *wmem)
-{
-	struct nx842_header *hdr;
-	struct nx842_devdata *local_devdata;
-	struct device *dev = NULL;
-	struct nx842_workmem *workmem;
-	struct nx842_scatterlist slin, slout;
-	struct nx_csbcpb *csbcpb;
-	int ret = 0, max_sync_size, i, bytesleft, size, hdrsize;
-	unsigned long inbuf, outbuf, padding;
-	struct vio_pfo_op op = {
-		.done = NULL,
-		.handle = 0,
-		.timeout = 0,
-	};
-	unsigned long start_time = get_tb();
-
-	/*
-	 * Make sure input buffer is 64k page aligned.  This is assumed since
-	 * this driver is designed for page compression only (for now).  This
-	 * is very nice since we can now use direct DDE(s) for the input and
-	 * the alignment is guaranteed.
-	*/
-	inbuf = (unsigned long)in;
-	if (!IS_ALIGNED(inbuf, PAGE_SIZE) || inlen != PAGE_SIZE)
-		return -EINVAL;
-
-	rcu_read_lock();
-	local_devdata = rcu_dereference(devdata);
-	if (!local_devdata || !local_devdata->dev) {
-		rcu_read_unlock();
-		return -ENODEV;
-	}
-	max_sync_size = local_devdata->max_sync_size;
-	dev = local_devdata->dev;
-
-	/* Create the header */
-	hdr = (struct nx842_header *)out;
-	hdr->blocks_nr = PAGE_SIZE / max_sync_size;
-	hdrsize = nx842_header_size(hdr);
-	outbuf = (unsigned long)out + hdrsize;
-	bytesleft = *outlen - hdrsize;
-
-	/* Init scatterlist */
-	workmem = (struct nx842_workmem *)ALIGN((unsigned long)wmem,
-		NX842_HW_PAGE_SIZE);
-	slin.entries = (struct nx842_slentry *)workmem->slin;
-	slout.entries = (struct nx842_slentry *)workmem->slout;
-
-	/* Init operation */
-	op.flags = NX842_OP_COMPRESS;
-	csbcpb = &workmem->csbcpb;
-	memset(csbcpb, 0, sizeof(*csbcpb));
-	op.csbcpb = nx842_get_pa(csbcpb);
-	op.out = nx842_get_pa(slout.entries);
-
-	for (i = 0; i < hdr->blocks_nr; i++) {
-		/*
-		 * Aligning the output blocks to 128 bytes does waste space,
-		 * but it prevents the need for bounce buffers and memory
-		 * copies.  It also simplifies the code a lot.  In the worst
-		 * case (64k page, 4k max_sync_size), you lose up to
-		 * (128*16)/64k = ~3% the compression factor. For 64k
-		 * max_sync_size, the loss would be at most 128/64k = ~0.2%.
-		 */
-		padding = ALIGN(outbuf, IO_BUFFER_ALIGN) - outbuf;
-		outbuf += padding;
-		bytesleft -= padding;
-		if (i == 0)
-			/* save offset into first block in header */
-			hdr->offset = padding + hdrsize;
-
-		if (bytesleft <= 0) {
-			ret = -ENOSPC;
-			goto unlock;
-		}
-
-		/*
-		 * NOTE: If the default max_sync_size is changed from 4k
-		 * to 64k, remove the "likely" case below, since a
-		 * scatterlist will always be needed.
-		 */
-		if (likely(max_sync_size == NX842_HW_PAGE_SIZE)) {
-			/* Create direct DDE */
-			op.in = nx842_get_pa((void *)inbuf);
-			op.inlen = max_sync_size;
-
-		} else {
-			/* Create indirect DDE (scatterlist) */
-			nx842_build_scatterlist(inbuf, max_sync_size, &slin);
-			op.in = nx842_get_pa(slin.entries);
-			op.inlen = -nx842_get_scatterlist_size(&slin);
-		}
-
-		/*
-		 * If max_sync_size != NX842_HW_PAGE_SIZE, an indirect
-		 * DDE is required for the outbuf.
-		 * If max_sync_size == NX842_HW_PAGE_SIZE, outbuf must
-		 * also be page aligned (1 in 128/4k=32 chance) in order
-		 * to use a direct DDE.
-		 * This is unlikely, just use an indirect DDE always.
-		 */
-		nx842_build_scatterlist(outbuf,
-			min(bytesleft, max_sync_size), &slout);
-		/* op.out set before loop */
-		op.outlen = -nx842_get_scatterlist_size(&slout);
-
-		/* Send request to pHyp */
-		ret = vio_h_cop_sync(local_devdata->vdev, &op);
-
-		/* Check for pHyp error */
-		if (ret) {
-			dev_dbg(dev, "%s: vio_h_cop_sync error (ret=%d, hret=%ld)\n",
-				__func__, ret, op.hcall_err);
-			ret = -EIO;
-			goto unlock;
-		}
-
-		/* Check for hardware error */
-		ret = nx842_validate_result(dev, &csbcpb->csb);
-		if (ret && ret != -ENOSPC)
-			goto unlock;
-
-		/* Handle incompressible data */
-		if (unlikely(ret == -ENOSPC)) {
-			if (bytesleft < max_sync_size) {
-				/*
-				 * Not enough space left in the output buffer
-				 * to store uncompressed block
-				 */
-				goto unlock;
-			} else {
-				/* Store incompressible block */
-				memcpy((void *)outbuf, (void *)inbuf,
-					max_sync_size);
-				hdr->sizes[i] = -max_sync_size;
-				outbuf += max_sync_size;
-				bytesleft -= max_sync_size;
-				/* Reset ret, incompressible data handled */
-				ret = 0;
-			}
-		} else {
-			/* Normal case, compression was successful */
-			size = csbcpb->csb.processed_byte_count;
-			dev_dbg(dev, "%s: processed_bytes=%d\n",
-				__func__, size);
-			hdr->sizes[i] = size;
-			outbuf += size;
-			bytesleft -= size;
-		}
-
-		inbuf += max_sync_size;
-	}
-
-	*outlen = (unsigned int)(outbuf - (unsigned long)out);
-
-unlock:
-	if (ret)
-		nx842_inc_comp_failed(local_devdata);
-	else {
-		nx842_inc_comp_complete(local_devdata);
-		ibm_nx842_incr_hist(local_devdata->counters->comp_times,
-			(get_tb() - start_time) / tb_ticks_per_usec);
-	}
-	rcu_read_unlock();
-	return ret;
-}
-EXPORT_SYMBOL_GPL(nx842_compress);
-
-static int sw842_decompress(const unsigned char *, int, unsigned char *, int *,
-			const void *);
-
-/**
- * nx842_decompress - Decompress data using the 842 algorithm
- *
- * Decompression provide by the NX842 coprocessor on IBM Power systems.
- * The input buffer is decompressed and the result is stored in the
- * provided output buffer.  The size allocated to the output buffer is
- * provided by the caller of this function in @outlen.  Upon return from
- * this function @outlen contains the length of the decompressed data.
- * If there is an error then @outlen will be 0 and an error will be
- * specified by the return code from this function.
- *
- * @in: Pointer to input buffer, will use bounce buffer if not 128 byte
- *      aligned
- * @inlen: Length of input buffer
- * @out: Pointer to output buffer, must be page aligned
- * @outlen: Length of output buffer, must be PAGE_SIZE
- * @wrkmem: ptr to buffer for working memory, size determined by
- *          nx842_get_workmem_size()
- *
- * Returns:
- *   0		Success, output of length @outlen stored in the buffer at @out
- *   -ENODEV	Hardware decompression device is unavailable
- *   -ENOMEM	Unable to allocate internal buffers
- *   -ENOSPC	Output buffer is to small
- *   -EINVAL	Bad input data encountered when attempting decompress
- *   -EIO	Internal error
- */
-int nx842_decompress(const unsigned char *in, unsigned int inlen,
-			 unsigned char *out, unsigned int *outlen, void *wmem)
-{
-	struct nx842_header *hdr;
-	struct nx842_devdata *local_devdata;
-	struct device *dev = NULL;
-	struct nx842_workmem *workmem;
-	struct nx842_scatterlist slin, slout;
-	struct nx_csbcpb *csbcpb;
-	int ret = 0, i, size, max_sync_size;
-	unsigned long inbuf, outbuf;
-	struct vio_pfo_op op = {
-		.done = NULL,
-		.handle = 0,
-		.timeout = 0,
-	};
-	unsigned long start_time = get_tb();
-
-	/* Ensure page alignment and size */
-	outbuf = (unsigned long)out;
-	if (!IS_ALIGNED(outbuf, PAGE_SIZE) || *outlen != PAGE_SIZE)
-		return -EINVAL;
-
-	rcu_read_lock();
-	local_devdata = rcu_dereference(devdata);
-	if (local_devdata)
-		dev = local_devdata->dev;
-
-	/* Get header */
-	hdr = (struct nx842_header *)in;
-
-	workmem = (struct nx842_workmem *)ALIGN((unsigned long)wmem,
-		NX842_HW_PAGE_SIZE);
-
-	inbuf = (unsigned long)in + hdr->offset;
-	if (likely(!IS_ALIGNED(inbuf, IO_BUFFER_ALIGN))) {
-		/* Copy block(s) into bounce buffer for alignment */
-		memcpy(workmem->bounce, in + hdr->offset, inlen - hdr->offset);
-		inbuf = (unsigned long)workmem->bounce;
-	}
-
-	/* Init scatterlist */
-	slin.entries = (struct nx842_slentry *)workmem->slin;
-	slout.entries = (struct nx842_slentry *)workmem->slout;
-
-	/* Init operation */
-	op.flags = NX842_OP_DECOMPRESS;
-	csbcpb = &workmem->csbcpb;
-	memset(csbcpb, 0, sizeof(*csbcpb));
-	op.csbcpb = nx842_get_pa(csbcpb);
-
-	/*
-	 * max_sync_size may have changed since compression,
-	 * so we can't read it from the device info. We need
-	 * to derive it from hdr->blocks_nr.
-	 */
-	max_sync_size = PAGE_SIZE / hdr->blocks_nr;
-
-	for (i = 0; i < hdr->blocks_nr; i++) {
-		/* Skip padding */
-		inbuf = ALIGN(inbuf, IO_BUFFER_ALIGN);
-
-		if (hdr->sizes[i] < 0) {
-			/* Negative sizes indicate uncompressed data blocks */
-			size = abs(hdr->sizes[i]);
-			memcpy((void *)outbuf, (void *)inbuf, size);
-			outbuf += size;
-			inbuf += size;
-			continue;
-		}
-
-		if (!dev)
-			goto sw;
-
-		/*
-		 * The better the compression, the more likely the "likely"
-		 * case becomes.
-		 */
-		if (likely((inbuf & NX842_HW_PAGE_MASK) ==
-			((inbuf + hdr->sizes[i] - 1) & NX842_HW_PAGE_MASK))) {
-			/* Create direct DDE */
-			op.in = nx842_get_pa((void *)inbuf);
-			op.inlen = hdr->sizes[i];
-		} else {
-			/* Create indirect DDE (scatterlist) */
-			nx842_build_scatterlist(inbuf, hdr->sizes[i] , &slin);
-			op.in = nx842_get_pa(slin.entries);
-			op.inlen = -nx842_get_scatterlist_size(&slin);
-		}
-
-		/*
-		 * NOTE: If the default max_sync_size is changed from 4k
-		 * to 64k, remove the "likely" case below, since a
-		 * scatterlist will always be needed.
-		 */
-		if (likely(max_sync_size == NX842_HW_PAGE_SIZE)) {
-			/* Create direct DDE */
-			op.out = nx842_get_pa((void *)outbuf);
-			op.outlen = max_sync_size;
-		} else {
-			/* Create indirect DDE (scatterlist) */
-			nx842_build_scatterlist(outbuf, max_sync_size, &slout);
-			op.out = nx842_get_pa(slout.entries);
-			op.outlen = -nx842_get_scatterlist_size(&slout);
-		}
-
-		/* Send request to pHyp */
-		ret = vio_h_cop_sync(local_devdata->vdev, &op);
-
-		/* Check for pHyp error */
-		if (ret) {
-			dev_dbg(dev, "%s: vio_h_cop_sync error (ret=%d, hret=%ld)\n",
-				__func__, ret, op.hcall_err);
-			dev = NULL;
-			goto sw;
-		}
-
-		/* Check for hardware error */
-		ret = nx842_validate_result(dev, &csbcpb->csb);
-		if (ret) {
-			dev = NULL;
-			goto sw;
-		}
-
-		/* HW decompression success */
-		inbuf += hdr->sizes[i];
-		outbuf += csbcpb->csb.processed_byte_count;
-		continue;
-
-sw:
-		/* software decompression */
-		size = max_sync_size;
-		ret = sw842_decompress(
-			(unsigned char *)inbuf, hdr->sizes[i],
-			(unsigned char *)outbuf, &size, wmem);
-		if (ret)
-			pr_debug("%s: sw842_decompress failed with %d\n",
-				__func__, ret);
-
-		if (ret) {
-			if (ret != -ENOSPC && ret != -EINVAL &&
-					ret != -EMSGSIZE)
-				ret = -EIO;
-			goto unlock;
-		}
-
-		/* SW decompression success */
-		inbuf += hdr->sizes[i];
-		outbuf += size;
-	}
-
-	*outlen = (unsigned int)(outbuf - (unsigned long)out);
-
-unlock:
-	if (ret)
-		/* decompress fail */
-		nx842_inc_decomp_failed(local_devdata);
-	else {
-		if (!dev)
-			/* software decompress */
-			nx842_inc_swdecomp(local_devdata);
-		nx842_inc_decomp_complete(local_devdata);
-		ibm_nx842_incr_hist(local_devdata->counters->decomp_times,
-			(get_tb() - start_time) / tb_ticks_per_usec);
-	}
-
-	rcu_read_unlock();
-	return ret;
-}
-EXPORT_SYMBOL_GPL(nx842_decompress);
-
-/**
- * nx842_OF_set_defaults -- Set default (disabled) values for devdata
- *
- * @devdata - struct nx842_devdata to update
- *
- * Returns:
- *  0 on success
- *  -ENOENT if @devdata ptr is NULL
- */
-static int nx842_OF_set_defaults(struct nx842_devdata *devdata)
-{
-	if (devdata) {
-		devdata->max_sync_size = 0;
-		devdata->max_sync_sg = 0;
-		devdata->max_sg_len = 0;
-		devdata->status = UNAVAILABLE;
-		return 0;
-	} else
-		return -ENOENT;
-}
-
-/**
- * nx842_OF_upd_status -- Update the device info from OF status prop
- *
- * The status property indicates if the accelerator is enabled.  If the
- * device is in the OF tree it indicates that the hardware is present.
- * The status field indicates if the device is enabled when the status
- * is 'okay'.  Otherwise the device driver will be disabled.
- *
- * @devdata - struct nx842_devdata to update
- * @prop - struct property point containing the maxsyncop for the update
- *
- * Returns:
- *  0 - Device is available
- *  -EINVAL - Device is not available
- */
-static int nx842_OF_upd_status(struct nx842_devdata *devdata,
-					struct property *prop) {
-	int ret = 0;
-	const char *status = (const char *)prop->value;
-
-	if (!strncmp(status, "okay", (size_t)prop->length)) {
-		devdata->status = AVAILABLE;
-	} else {
-		dev_info(devdata->dev, "%s: status '%s' is not 'okay'\n",
-				__func__, status);
-		devdata->status = UNAVAILABLE;
-	}
-
-	return ret;
-}
-
-/**
- * nx842_OF_upd_maxsglen -- Update the device info from OF maxsglen prop
- *
- * Definition of the 'ibm,max-sg-len' OF property:
- *  This field indicates the maximum byte length of a scatter list
- *  for the platform facility. It is a single cell encoded as with encode-int.
- *
- * Example:
- *  # od -x ibm,max-sg-len
- *  0000000 0000 0ff0
- *
- *  In this example, the maximum byte length of a scatter list is
- *  0x0ff0 (4,080).
- *
- * @devdata - struct nx842_devdata to update
- * @prop - struct property point containing the maxsyncop for the update
- *
- * Returns:
- *  0 on success
- *  -EINVAL on failure
- */
-static int nx842_OF_upd_maxsglen(struct nx842_devdata *devdata,
-					struct property *prop) {
-	int ret = 0;
-	const int *maxsglen = prop->value;
-
-	if (prop->length != sizeof(*maxsglen)) {
-		dev_err(devdata->dev, "%s: unexpected format for ibm,max-sg-len property\n", __func__);
-		dev_dbg(devdata->dev, "%s: ibm,max-sg-len is %d bytes long, expected %lu bytes\n", __func__,
-				prop->length, sizeof(*maxsglen));
-		ret = -EINVAL;
-	} else {
-		devdata->max_sg_len = (unsigned int)min(*maxsglen,
-				(int)NX842_HW_PAGE_SIZE);
-	}
-
-	return ret;
-}
-
-/**
- * nx842_OF_upd_maxsyncop -- Update the device info from OF maxsyncop prop
- *
- * Definition of the 'ibm,max-sync-cop' OF property:
- *  Two series of cells.  The first series of cells represents the maximums
- *  that can be synchronously compressed. The second series of cells
- *  represents the maximums that can be synchronously decompressed.
- *  1. The first cell in each series contains the count of the number of
- *     data length, scatter list elements pairs that follow – each being
- *     of the form
- *    a. One cell data byte length
- *    b. One cell total number of scatter list elements
- *
- * Example:
- *  # od -x ibm,max-sync-cop
- *  0000000 0000 0001 0000 1000 0000 01fe 0000 0001
- *  0000020 0000 1000 0000 01fe
- *
- *  In this example, compression supports 0x1000 (4,096) data byte length
- *  and 0x1fe (510) total scatter list elements.  Decompression supports
- *  0x1000 (4,096) data byte length and 0x1f3 (510) total scatter list
- *  elements.
- *
- * @devdata - struct nx842_devdata to update
- * @prop - struct property point containing the maxsyncop for the update
- *
- * Returns:
- *  0 on success
- *  -EINVAL on failure
- */
-static int nx842_OF_upd_maxsyncop(struct nx842_devdata *devdata,
-					struct property *prop) {
-	int ret = 0;
-	const struct maxsynccop_t {
-		int comp_elements;
-		int comp_data_limit;
-		int comp_sg_limit;
-		int decomp_elements;
-		int decomp_data_limit;
-		int decomp_sg_limit;
-	} *maxsynccop;
-
-	if (prop->length != sizeof(*maxsynccop)) {
-		dev_err(devdata->dev, "%s: unexpected format for ibm,max-sync-cop property\n", __func__);
-		dev_dbg(devdata->dev, "%s: ibm,max-sync-cop is %d bytes long, expected %lu bytes\n", __func__, prop->length,
-				sizeof(*maxsynccop));
-		ret = -EINVAL;
-		goto out;
-	}
-
-	maxsynccop = (const struct maxsynccop_t *)prop->value;
-
-	/* Use one limit rather than separate limits for compression and
-	 * decompression. Set a maximum for this so as not to exceed the
-	 * size that the header can support and round the value down to
-	 * the hardware page size (4K) */
-	devdata->max_sync_size =
-			(unsigned int)min(maxsynccop->comp_data_limit,
-					maxsynccop->decomp_data_limit);
-
-	devdata->max_sync_size = min_t(unsigned int, devdata->max_sync_size,
-					SIZE_64K);
-
-	if (devdata->max_sync_size < SIZE_4K) {
-		dev_err(devdata->dev, "%s: hardware max data size (%u) is "
-				"less than the driver minimum, unable to use "
-				"the hardware device\n",
-				__func__, devdata->max_sync_size);
-		ret = -EINVAL;
-		goto out;
-	}
-
-	devdata->max_sync_sg = (unsigned int)min(maxsynccop->comp_sg_limit,
-						maxsynccop->decomp_sg_limit);
-	if (devdata->max_sync_sg < 1) {
-		dev_err(devdata->dev, "%s: hardware max sg size (%u) is "
-				"less than the driver minimum, unable to use "
-				"the hardware device\n",
-				__func__, devdata->max_sync_sg);
-		ret = -EINVAL;
-		goto out;
-	}
-
-out:
-	return ret;
-}
-
-/**
- *
- * nx842_OF_upd -- Handle OF properties updates for the device.
- *
- * Set all properties from the OF tree.  Optionally, a new property
- * can be provided by the @new_prop pointer to overwrite an existing value.
- * The device will remain disabled until all values are valid, this function
- * will return an error for updates unless all values are valid.
- *
- * @new_prop: If not NULL, this property is being updated.  If NULL, update
- *  all properties from the current values in the OF tree.
- *
- * Returns:
- *  0 - Success
- *  -ENOMEM - Could not allocate memory for new devdata structure
- *  -EINVAL - property value not found, new_prop is not a recognized
- *	property for the device or property value is not valid.
- *  -ENODEV - Device is not available
- */
-static int nx842_OF_upd(struct property *new_prop)
-{
-	struct nx842_devdata *old_devdata = NULL;
-	struct nx842_devdata *new_devdata = NULL;
-	struct device_node *of_node = NULL;
-	struct property *status = NULL;
-	struct property *maxsglen = NULL;
-	struct property *maxsyncop = NULL;
-	int ret = 0;
-	unsigned long flags;
-
-	spin_lock_irqsave(&devdata_mutex, flags);
-	old_devdata = rcu_dereference_check(devdata,
-			lockdep_is_held(&devdata_mutex));
-	if (old_devdata)
-		of_node = old_devdata->dev->of_node;
-
-	if (!old_devdata || !of_node) {
-		pr_err("%s: device is not available\n", __func__);
-		spin_unlock_irqrestore(&devdata_mutex, flags);
-		return -ENODEV;
-	}
-
-	new_devdata = kzalloc(sizeof(*new_devdata), GFP_NOFS);
-	if (!new_devdata) {
-		dev_err(old_devdata->dev, "%s: Could not allocate memory for device data\n", __func__);
-		ret = -ENOMEM;
-		goto error_out;
-	}
-
-	memcpy(new_devdata, old_devdata, sizeof(*old_devdata));
-	new_devdata->counters = old_devdata->counters;
-
-	/* Set ptrs for existing properties */
-	status = of_find_property(of_node, "status", NULL);
-	maxsglen = of_find_property(of_node, "ibm,max-sg-len", NULL);
-	maxsyncop = of_find_property(of_node, "ibm,max-sync-cop", NULL);
-	if (!status || !maxsglen || !maxsyncop) {
-		dev_err(old_devdata->dev, "%s: Could not locate device properties\n", __func__);
-		ret = -EINVAL;
-		goto error_out;
-	}
-
-	/*
-	 * If this is a property update, there are only certain properties that
-	 * we care about. Bail if it isn't in the below list
-	 */
-	if (new_prop && (strncmp(new_prop->name, "status", new_prop->length) ||
-		         strncmp(new_prop->name, "ibm,max-sg-len", new_prop->length) ||
-		         strncmp(new_prop->name, "ibm,max-sync-cop", new_prop->length)))
-		goto out;
-
-	/* Perform property updates */
-	ret = nx842_OF_upd_status(new_devdata, status);
-	if (ret)
-		goto error_out;
-
-	ret = nx842_OF_upd_maxsglen(new_devdata, maxsglen);
-	if (ret)
-		goto error_out;
-
-	ret = nx842_OF_upd_maxsyncop(new_devdata, maxsyncop);
-	if (ret)
-		goto error_out;
-
-out:
-	dev_info(old_devdata->dev, "%s: max_sync_size new:%u old:%u\n",
-			__func__, new_devdata->max_sync_size,
-			old_devdata->max_sync_size);
-	dev_info(old_devdata->dev, "%s: max_sync_sg new:%u old:%u\n",
-			__func__, new_devdata->max_sync_sg,
-			old_devdata->max_sync_sg);
-	dev_info(old_devdata->dev, "%s: max_sg_len new:%u old:%u\n",
-			__func__, new_devdata->max_sg_len,
-			old_devdata->max_sg_len);
-
-	rcu_assign_pointer(devdata, new_devdata);
-	spin_unlock_irqrestore(&devdata_mutex, flags);
-	synchronize_rcu();
-	dev_set_drvdata(new_devdata->dev, new_devdata);
-	kfree(old_devdata);
-	return 0;
-
-error_out:
-	if (new_devdata) {
-		dev_info(old_devdata->dev, "%s: device disabled\n", __func__);
-		nx842_OF_set_defaults(new_devdata);
-		rcu_assign_pointer(devdata, new_devdata);
-		spin_unlock_irqrestore(&devdata_mutex, flags);
-		synchronize_rcu();
-		dev_set_drvdata(new_devdata->dev, new_devdata);
-		kfree(old_devdata);
-	} else {
-		dev_err(old_devdata->dev, "%s: could not update driver from hardware\n", __func__);
-		spin_unlock_irqrestore(&devdata_mutex, flags);
-	}
-
-	if (!ret)
-		ret = -EINVAL;
-	return ret;
-}
-
-/**
- * nx842_OF_notifier - Process updates to OF properties for the device
- *
- * @np: notifier block
- * @action: notifier action
- * @update: struct pSeries_reconfig_prop_update pointer if action is
- *	PSERIES_UPDATE_PROPERTY
- *
- * Returns:
- *	NOTIFY_OK on success
- *	NOTIFY_BAD encoded with error number on failure, use
- *		notifier_to_errno() to decode this value
- */
-static int nx842_OF_notifier(struct notifier_block *np, unsigned long action,
-			     void *data)
-{
-	struct of_reconfig_data *upd = data;
-	struct nx842_devdata *local_devdata;
-	struct device_node *node = NULL;
-
-	rcu_read_lock();
-	local_devdata = rcu_dereference(devdata);
-	if (local_devdata)
-		node = local_devdata->dev->of_node;
-
-	if (local_devdata &&
-			action == OF_RECONFIG_UPDATE_PROPERTY &&
-			!strcmp(upd->dn->name, node->name)) {
-		rcu_read_unlock();
-		nx842_OF_upd(upd->prop);
-	} else
-		rcu_read_unlock();
-
-	return NOTIFY_OK;
-}
-
-static struct notifier_block nx842_of_nb = {
-	.notifier_call = nx842_OF_notifier,
-};
-
-#define nx842_counter_read(_name)					\
-static ssize_t nx842_##_name##_show(struct device *dev,		\
-		struct device_attribute *attr,				\
-		char *buf) {						\
-	struct nx842_devdata *local_devdata;			\
-	int p = 0;							\
-	rcu_read_lock();						\
-	local_devdata = rcu_dereference(devdata);			\
-	if (local_devdata)						\
-		p = snprintf(buf, PAGE_SIZE, "%ld\n",			\
-		       atomic64_read(&local_devdata->counters->_name));	\
-	rcu_read_unlock();						\
-	return p;							\
-}
-
-#define NX842DEV_COUNTER_ATTR_RO(_name)					\
-	nx842_counter_read(_name);					\
-	static struct device_attribute dev_attr_##_name = __ATTR(_name,	\
-						0444,			\
-						nx842_##_name##_show,\
-						NULL);
-
-NX842DEV_COUNTER_ATTR_RO(comp_complete);
-NX842DEV_COUNTER_ATTR_RO(comp_failed);
-NX842DEV_COUNTER_ATTR_RO(decomp_complete);
-NX842DEV_COUNTER_ATTR_RO(decomp_failed);
-NX842DEV_COUNTER_ATTR_RO(swdecomp);
-
-static ssize_t nx842_timehist_show(struct device *,
-		struct device_attribute *, char *);
-
-static struct device_attribute dev_attr_comp_times = __ATTR(comp_times, 0444,
-		nx842_timehist_show, NULL);
-static struct device_attribute dev_attr_decomp_times = __ATTR(decomp_times,
-		0444, nx842_timehist_show, NULL);
-
-static ssize_t nx842_timehist_show(struct device *dev,
-		struct device_attribute *attr, char *buf) {
-	char *p = buf;
-	struct nx842_devdata *local_devdata;
-	atomic64_t *times;
-	int bytes_remain = PAGE_SIZE;
-	int bytes;
-	int i;
-
-	rcu_read_lock();
-	local_devdata = rcu_dereference(devdata);
-	if (!local_devdata) {
-		rcu_read_unlock();
-		return 0;
-	}
-
-	if (attr == &dev_attr_comp_times)
-		times = local_devdata->counters->comp_times;
-	else if (attr == &dev_attr_decomp_times)
-		times = local_devdata->counters->decomp_times;
-	else {
-		rcu_read_unlock();
-		return 0;
-	}
-
-	for (i = 0; i < (NX842_HIST_SLOTS - 2); i++) {
-		bytes = snprintf(p, bytes_remain, "%u-%uus:\t%ld\n",
-			       i ? (2<<(i-1)) : 0, (2<<i)-1,
-			       atomic64_read(&times[i]));
-		bytes_remain -= bytes;
-		p += bytes;
-	}
-	/* The last bucket holds everything over
-	 * 2<<(NX842_HIST_SLOTS - 2) us */
-	bytes = snprintf(p, bytes_remain, "%uus - :\t%ld\n",
-			2<<(NX842_HIST_SLOTS - 2),
-			atomic64_read(&times[(NX842_HIST_SLOTS - 1)]));
-	p += bytes;
-
-	rcu_read_unlock();
-	return p - buf;
-}
-
-static struct attribute *nx842_sysfs_entries[] = {
-	&dev_attr_comp_complete.attr,
-	&dev_attr_comp_failed.attr,
-	&dev_attr_decomp_complete.attr,
-	&dev_attr_decomp_failed.attr,
-	&dev_attr_swdecomp.attr,
-	&dev_attr_comp_times.attr,
-	&dev_attr_decomp_times.attr,
-	NULL,
-};
-
-static struct attribute_group nx842_attribute_group = {
-	.name = NULL,		/* put in device directory */
-	.attrs = nx842_sysfs_entries,
-};
-
-static int __init nx842_probe(struct vio_dev *viodev,
-				  const struct vio_device_id *id)
-{
-	struct nx842_devdata *old_devdata, *new_devdata = NULL;
-	unsigned long flags;
-	int ret = 0;
-
-	spin_lock_irqsave(&devdata_mutex, flags);
-	old_devdata = rcu_dereference_check(devdata,
-			lockdep_is_held(&devdata_mutex));
-
-	if (old_devdata && old_devdata->vdev != NULL) {
-		dev_err(&viodev->dev, "%s: Attempt to register more than one instance of the hardware\n", __func__);
-		ret = -1;
-		goto error_unlock;
-	}
-
-	dev_set_drvdata(&viodev->dev, NULL);
-
-	new_devdata = kzalloc(sizeof(*new_devdata), GFP_NOFS);
-	if (!new_devdata) {
-		dev_err(&viodev->dev, "%s: Could not allocate memory for device data\n", __func__);
-		ret = -ENOMEM;
-		goto error_unlock;
-	}
-
-	new_devdata->counters = kzalloc(sizeof(*new_devdata->counters),
-			GFP_NOFS);
-	if (!new_devdata->counters) {
-		dev_err(&viodev->dev, "%s: Could not allocate memory for performance counters\n", __func__);
-		ret = -ENOMEM;
-		goto error_unlock;
-	}
-
-	new_devdata->vdev = viodev;
-	new_devdata->dev = &viodev->dev;
-	nx842_OF_set_defaults(new_devdata);
-
-	rcu_assign_pointer(devdata, new_devdata);
-	spin_unlock_irqrestore(&devdata_mutex, flags);
-	synchronize_rcu();
-	kfree(old_devdata);
-
-	of_reconfig_notifier_register(&nx842_of_nb);
-
-	ret = nx842_OF_upd(NULL);
-	if (ret && ret != -ENODEV) {
-		dev_err(&viodev->dev, "could not parse device tree. %d\n", ret);
-		ret = -1;
-		goto error;
-	}
-
-	rcu_read_lock();
-	dev_set_drvdata(&viodev->dev, rcu_dereference(devdata));
-	rcu_read_unlock();
-
-	if (sysfs_create_group(&viodev->dev.kobj, &nx842_attribute_group)) {
-		dev_err(&viodev->dev, "could not create sysfs device attributes\n");
-		ret = -1;
-		goto error;
-	}
-
-	return 0;
-
-error_unlock:
-	spin_unlock_irqrestore(&devdata_mutex, flags);
-	if (new_devdata)
-		kfree(new_devdata->counters);
-	kfree(new_devdata);
-error:
-	return ret;
-}
-
-static int __exit nx842_remove(struct vio_dev *viodev)
-{
-	struct nx842_devdata *old_devdata;
-	unsigned long flags;
-
-	pr_info("Removing IBM Power 842 compression device\n");
-	sysfs_remove_group(&viodev->dev.kobj, &nx842_attribute_group);
-
-	spin_lock_irqsave(&devdata_mutex, flags);
-	old_devdata = rcu_dereference_check(devdata,
-			lockdep_is_held(&devdata_mutex));
-	of_reconfig_notifier_unregister(&nx842_of_nb);
-	RCU_INIT_POINTER(devdata, NULL);
-	spin_unlock_irqrestore(&devdata_mutex, flags);
-	synchronize_rcu();
-	dev_set_drvdata(&viodev->dev, NULL);
-	if (old_devdata)
-		kfree(old_devdata->counters);
-	kfree(old_devdata);
-	return 0;
-}
-
-static struct vio_device_id nx842_driver_ids[] = {
-	{"ibm,compression-v1", "ibm,compression"},
-	{"", ""},
-};
-
-static struct vio_driver nx842_driver = {
-	.name = MODULE_NAME,
-	.probe = nx842_probe,
-	.remove = __exit_p(nx842_remove),
-	.get_desired_dma = nx842_get_desired_dma,
-	.id_table = nx842_driver_ids,
-};
-
-static int __init nx842_init(void)
-{
-	struct nx842_devdata *new_devdata;
-	pr_info("Registering IBM Power 842 compression driver\n");
-
-	RCU_INIT_POINTER(devdata, NULL);
-	new_devdata = kzalloc(sizeof(*new_devdata), GFP_KERNEL);
-	if (!new_devdata) {
-		pr_err("Could not allocate memory for device data\n");
-		return -ENOMEM;
-	}
-	new_devdata->status = UNAVAILABLE;
-	RCU_INIT_POINTER(devdata, new_devdata);
-
-	return vio_register_driver(&nx842_driver);
-}
-
-module_init(nx842_init);
-
-static void __exit nx842_exit(void)
-{
-	struct nx842_devdata *old_devdata;
-	unsigned long flags;
-
-	pr_info("Exiting IBM Power 842 compression driver\n");
-	spin_lock_irqsave(&devdata_mutex, flags);
-	old_devdata = rcu_dereference_check(devdata,
-			lockdep_is_held(&devdata_mutex));
-	RCU_INIT_POINTER(devdata, NULL);
-	spin_unlock_irqrestore(&devdata_mutex, flags);
-	synchronize_rcu();
-	if (old_devdata)
-		dev_set_drvdata(old_devdata->dev, NULL);
-	kfree(old_devdata);
-	vio_unregister_driver(&nx842_driver);
-}
-
-module_exit(nx842_exit);
-
-/*********************************
- * 842 software decompressor
-*********************************/
-typedef int (*sw842_template_op)(const char **, int *, unsigned char **,
-						struct sw842_fifo *);
-
-static int sw842_data8(const char **, int *, unsigned char **,
-						struct sw842_fifo *);
-static int sw842_data4(const char **, int *, unsigned char **,
-						struct sw842_fifo *);
-static int sw842_data2(const char **, int *, unsigned char **,
-						struct sw842_fifo *);
-static int sw842_ptr8(const char **, int *, unsigned char **,
-						struct sw842_fifo *);
-static int sw842_ptr4(const char **, int *, unsigned char **,
-						struct sw842_fifo *);
-static int sw842_ptr2(const char **, int *, unsigned char **,
-						struct sw842_fifo *);
-
-/* special templates */
-#define SW842_TMPL_REPEAT 0x1B
-#define SW842_TMPL_ZEROS 0x1C
-#define SW842_TMPL_EOF 0x1E
-
-static sw842_template_op sw842_tmpl_ops[26][4] = {
-	{ sw842_data8, NULL}, /* 0 (00000) */
-	{ sw842_data4, sw842_data2, sw842_ptr2,  NULL},
-	{ sw842_data4, sw842_ptr2,  sw842_data2, NULL},
-	{ sw842_data4, sw842_ptr2,  sw842_ptr2,  NULL},
-	{ sw842_data4, sw842_ptr4,  NULL},
-	{ sw842_data2, sw842_ptr2,  sw842_data4, NULL},
-	{ sw842_data2, sw842_ptr2,  sw842_data2, sw842_ptr2},
-	{ sw842_data2, sw842_ptr2,  sw842_ptr2,  sw842_data2},
-	{ sw842_data2, sw842_ptr2,  sw842_ptr2,  sw842_ptr2,},
-	{ sw842_data2, sw842_ptr2,  sw842_ptr4,  NULL},
-	{ sw842_ptr2,  sw842_data2, sw842_data4, NULL}, /* 10 (01010) */
-	{ sw842_ptr2,  sw842_data4, sw842_ptr2,  NULL},
-	{ sw842_ptr2,  sw842_data2, sw842_ptr2,  sw842_data2},
-	{ sw842_ptr2,  sw842_data2, sw842_ptr2,  sw842_ptr2},
-	{ sw842_ptr2,  sw842_data2, sw842_ptr4,  NULL},
-	{ sw842_ptr2,  sw842_ptr2,  sw842_data4, NULL},
-	{ sw842_ptr2,  sw842_ptr2,  sw842_data2, sw842_ptr2},
-	{ sw842_ptr2,  sw842_ptr2,  sw842_ptr2,  sw842_data2},
-	{ sw842_ptr2,  sw842_ptr2,  sw842_ptr2,  sw842_ptr2},
-	{ sw842_ptr2,  sw842_ptr2,  sw842_ptr4,  NULL},
-	{ sw842_ptr4,  sw842_data4, NULL}, /* 20 (10100) */
-	{ sw842_ptr4,  sw842_data2, sw842_ptr2,  NULL},
-	{ sw842_ptr4,  sw842_ptr2,  sw842_data2, NULL},
-	{ sw842_ptr4,  sw842_ptr2,  sw842_ptr2,  NULL},
-	{ sw842_ptr4,  sw842_ptr4,  NULL},
-	{ sw842_ptr8,  NULL}
-};
-
-/* Software decompress helpers */
-
-static uint8_t sw842_get_byte(const char *buf, int bit)
-{
-	uint8_t tmpl;
-	uint16_t tmp;
-	tmp = htons(*(uint16_t *)(buf));
-	tmp = (uint16_t)(tmp << bit);
-	tmp = ntohs(tmp);
-	memcpy(&tmpl, &tmp, 1);
-	return tmpl;
-}
-
-static uint8_t sw842_get_template(const char **buf, int *bit)
-{
-	uint8_t byte;
-	byte = sw842_get_byte(*buf, *bit);
-	byte = byte >> 3;
-	byte &= 0x1F;
-	*buf += (*bit + 5) / 8;
-	*bit = (*bit + 5) % 8;
-	return byte;
-}
-
-/* repeat_count happens to be 5-bit too (like the template) */
-static uint8_t sw842_get_repeat_count(const char **buf, int *bit)
-{
-	uint8_t byte;
-	byte = sw842_get_byte(*buf, *bit);
-	byte = byte >> 2;
-	byte &= 0x3F;
-	*buf += (*bit + 6) / 8;
-	*bit = (*bit + 6) % 8;
-	return byte;
-}
-
-static uint8_t sw842_get_ptr2(const char **buf, int *bit)
-{
-	uint8_t ptr;
-	ptr = sw842_get_byte(*buf, *bit);
-	(*buf)++;
-	return ptr;
-}
-
-static uint16_t sw842_get_ptr4(const char **buf, int *bit,
-		struct sw842_fifo *fifo)
-{
-	uint16_t ptr;
-	ptr = htons(*(uint16_t *)(*buf));
-	ptr = (uint16_t)(ptr << *bit);
-	ptr = ptr >> 7;
-	ptr &= 0x01FF;
-	*buf += (*bit + 9) / 8;
-	*bit = (*bit + 9) % 8;
-	return ptr;
-}
-
-static uint8_t sw842_get_ptr8(const char **buf, int *bit,
-		struct sw842_fifo *fifo)
-{
-	return sw842_get_ptr2(buf, bit);
-}
-
-/* Software decompress template ops */
-
-static int sw842_data8(const char **inbuf, int *inbit,
-		unsigned char **outbuf, struct sw842_fifo *fifo)
-{
-	int ret;
-
-	ret = sw842_data4(inbuf, inbit, outbuf, fifo);
-	if (ret)
-		return ret;
-	ret = sw842_data4(inbuf, inbit, outbuf, fifo);
-	return ret;
-}
-
-static int sw842_data4(const char **inbuf, int *inbit,
-		unsigned char **outbuf, struct sw842_fifo *fifo)
-{
-	int ret;
-
-	ret = sw842_data2(inbuf, inbit, outbuf, fifo);
-	if (ret)
-		return ret;
-	ret = sw842_data2(inbuf, inbit, outbuf, fifo);
-	return ret;
-}
-
-static int sw842_data2(const char **inbuf, int *inbit,
-		unsigned char **outbuf, struct sw842_fifo *fifo)
-{
-	**outbuf = sw842_get_byte(*inbuf, *inbit);
-	(*inbuf)++;
-	(*outbuf)++;
-	**outbuf = sw842_get_byte(*inbuf, *inbit);
-	(*inbuf)++;
-	(*outbuf)++;
-	return 0;
-}
-
-static int sw842_ptr8(const char **inbuf, int *inbit,
-		unsigned char **outbuf, struct sw842_fifo *fifo)
-{
-	uint8_t ptr;
-	ptr = sw842_get_ptr8(inbuf, inbit, fifo);
-	if (!fifo->f84_full && (ptr >= fifo->f8_count))
-		return 1;
-	memcpy(*outbuf, fifo->f8[ptr], 8);
-	*outbuf += 8;
-	return 0;
-}
-
-static int sw842_ptr4(const char **inbuf, int *inbit,
-		unsigned char **outbuf, struct sw842_fifo *fifo)
-{
-	uint16_t ptr;
-	ptr = sw842_get_ptr4(inbuf, inbit, fifo);
-	if (!fifo->f84_full && (ptr >= fifo->f4_count))
-		return 1;
-	memcpy(*outbuf, fifo->f4[ptr], 4);
-	*outbuf += 4;
-	return 0;
-}
-
-static int sw842_ptr2(const char **inbuf, int *inbit,
-		unsigned char **outbuf, struct sw842_fifo *fifo)
-{
-	uint8_t ptr;
-	ptr = sw842_get_ptr2(inbuf, inbit);
-	if (!fifo->f2_full && (ptr >= fifo->f2_count))
-		return 1;
-	memcpy(*outbuf, fifo->f2[ptr], 2);
-	*outbuf += 2;
-	return 0;
-}
-
-static void sw842_copy_to_fifo(const char *buf, struct sw842_fifo *fifo)
-{
-	unsigned char initial_f2count = fifo->f2_count;
-
-	memcpy(fifo->f8[fifo->f8_count], buf, 8);
-	fifo->f4_count += 2;
-	fifo->f8_count += 1;
-
-	if (!fifo->f84_full && fifo->f4_count >= 512) {
-		fifo->f84_full = 1;
-		fifo->f4_count /= 512;
-	}
-
-	memcpy(fifo->f2[fifo->f2_count++], buf, 2);
-	memcpy(fifo->f2[fifo->f2_count++], buf + 2, 2);
-	memcpy(fifo->f2[fifo->f2_count++], buf + 4, 2);
-	memcpy(fifo->f2[fifo->f2_count++], buf + 6, 2);
-	if (fifo->f2_count < initial_f2count)
-		fifo->f2_full = 1;
-}
-
-static int sw842_decompress(const unsigned char *src, int srclen,
-			unsigned char *dst, int *destlen,
-			const void *wrkmem)
-{
-	uint8_t tmpl;
-	const char *inbuf;
-	int inbit = 0;
-	unsigned char *outbuf, *outbuf_end, *origbuf, *prevbuf;
-	const char *inbuf_end;
-	sw842_template_op op;
-	int opindex;
-	int i, repeat_count;
-	struct sw842_fifo *fifo;
-	int ret = 0;
-
-	fifo = &((struct nx842_workmem *)(wrkmem))->swfifo;
-	memset(fifo, 0, sizeof(*fifo));
-
-	origbuf = NULL;
-	inbuf = src;
-	inbuf_end = src + srclen;
-	outbuf = dst;
-	outbuf_end = dst + *destlen;
-
-	while ((tmpl = sw842_get_template(&inbuf, &inbit)) != SW842_TMPL_EOF) {
-		if (inbuf >= inbuf_end) {
-			ret = -EINVAL;
-			goto out;
-		}
-
-		opindex = 0;
-		prevbuf = origbuf;
-		origbuf = outbuf;
-		switch (tmpl) {
-		case SW842_TMPL_REPEAT:
-			if (prevbuf == NULL) {
-				ret = -EINVAL;
-				goto out;
-			}
-
-			repeat_count = sw842_get_repeat_count(&inbuf,
-								&inbit) + 1;
-
-			/* Did the repeat count advance past the end of input */
-			if (inbuf > inbuf_end) {
-				ret = -EINVAL;
-				goto out;
-			}
-
-			for (i = 0; i < repeat_count; i++) {
-				/* Would this overflow the output buffer */
-				if ((outbuf + 8) > outbuf_end) {
-					ret = -ENOSPC;
-					goto out;
-				}
-
-				memcpy(outbuf, prevbuf, 8);
-				sw842_copy_to_fifo(outbuf, fifo);
-				outbuf += 8;
-			}
-			break;
-
-		case SW842_TMPL_ZEROS:
-			/* Would this overflow the output buffer */
-			if ((outbuf + 8) > outbuf_end) {
-				ret = -ENOSPC;
-				goto out;
-			}
-
-			memset(outbuf, 0, 8);
-			sw842_copy_to_fifo(outbuf, fifo);
-			outbuf += 8;
-			break;
-
-		default:
-			if (tmpl > 25) {
-				ret = -EINVAL;
-				goto out;
-			}
-
-			/* Does this go past the end of the input buffer */
-			if ((inbuf + 2) > inbuf_end) {
-				ret = -EINVAL;
-				goto out;
-			}
-
-			/* Would this overflow the output buffer */
-			if ((outbuf + 8) > outbuf_end) {
-				ret = -ENOSPC;
-				goto out;
-			}
-
-			while (opindex < 4 &&
-				(op = sw842_tmpl_ops[tmpl][opindex++])
-					!= NULL) {
-				ret = (*op)(&inbuf, &inbit, &outbuf, fifo);
-				if (ret) {
-					ret = -EINVAL;
-					goto out;
-				}
-				sw842_copy_to_fifo(origbuf, fifo);
-			}
-		}
-	}
-
-out:
-	if (!ret)
-		*destlen = (unsigned int)(outbuf - dst);
-	else
-		*destlen = 0;
-
-	return ret;
-}
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 05/11] drivers/crypto/nx: add NX-842 platform frontend driver
  2015-04-07 17:34 [PATCH 00/11] add 842 hw compression for PowerNV platform Dan Streetman
                   ` (3 preceding siblings ...)
  2015-04-07 17:34 ` [PATCH 04/11] drivers/crypto/nx: move nx-842.c to nx-842-pseries.c Dan Streetman
@ 2015-04-07 17:34 ` Dan Streetman
  2015-04-07 17:34 ` [PATCH 06/11] drivers/crypto/nx: add nx842 constraints Dan Streetman
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 46+ messages in thread
From: Dan Streetman @ 2015-04-07 17:34 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras
  Cc: Seth Jennings, Robert Jennings, Marcelo Henrique Cerri,
	Fionnuala Gunter, Andrew Morton, Greg KH, Mauro Carvalho Chehab,
	Arnd Bergmann, Joe Perches, linux-crypto, linuxppc-dev,
	linux-kernel, Dan Streetman

Add NX-842 frontend that allows using either the pSeries platform
or PowerNV platform driver for the NX-842 hardware.  Update the
MAINTAINERS file to include the new filenames.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
---
 MAINTAINERS                        |   2 +-
 crypto/842.c                       |   2 +-
 drivers/crypto/Kconfig             |   6 +-
 drivers/crypto/nx/Kconfig          |  33 ++++++---
 drivers/crypto/nx/Makefile         |   4 +-
 drivers/crypto/nx/nx-842-pseries.c |  51 ++++++-------
 drivers/crypto/nx/nx-842.c         | 144 +++++++++++++++++++++++++++++++++++++
 drivers/crypto/nx/nx-842.h         |  32 +++++++++
 include/linux/nx842.h              |   6 +-
 9 files changed, 235 insertions(+), 45 deletions(-)
 create mode 100644 drivers/crypto/nx/nx-842.c
 create mode 100644 drivers/crypto/nx/nx-842.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 3dc973a..5a8d46d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4834,7 +4834,7 @@ F:	drivers/crypto/nx/
 IBM Power 842 compression accelerator
 M:	Dan Streetman <ddstreet@us.ibm.com>
 S:	Supported
-F:	drivers/crypto/nx/nx-842.c
+F:	drivers/crypto/nx/nx-842*
 F:	include/linux/nx842.h
 F:	include/linux/sw842.h
 F:	lib/842/
diff --git a/crypto/842.c b/crypto/842.c
index b48f4f1..d21cedb 100644
--- a/crypto/842.c
+++ b/crypto/842.c
@@ -52,7 +52,7 @@ static int nx842_init(struct crypto_tfm *tfm)
 	struct nx842_ctx *ctx = crypto_tfm_ctx(tfm);
 	int wmemsize;
 
-	wmemsize = max_t(int, nx842_get_workmem_size(), LZO1X_MEM_COMPRESS);
+	wmemsize = max_t(int, NX842_MEM_COMPRESS, LZO1X_MEM_COMPRESS);
 	ctx->nx842_wmem = kmalloc(wmemsize, GFP_NOFS);
 	if (!ctx->nx842_wmem)
 		return -ENOMEM;
diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
index 2fb0fdf..6d8b11f 100644
--- a/drivers/crypto/Kconfig
+++ b/drivers/crypto/Kconfig
@@ -312,11 +312,11 @@ config CRYPTO_DEV_S5P
 	  algorithms execution.
 
 config CRYPTO_DEV_NX
-	bool "Support for IBM Power7+ in-Nest cryptographic acceleration"
-	depends on PPC64 && IBMVIO && !CPU_LITTLE_ENDIAN
+	bool "Support for IBM Power in-Nest cryptographic acceleration"
+	depends on PPC64
 	default n
 	help
-	  Support for Power7+ in-Nest cryptographic acceleration.
+	  Support for Power in-Nest cryptographic acceleration.
 
 if CRYPTO_DEV_NX
 	source "drivers/crypto/nx/Kconfig"
diff --git a/drivers/crypto/nx/Kconfig b/drivers/crypto/nx/Kconfig
index f826166..e4396fc 100644
--- a/drivers/crypto/nx/Kconfig
+++ b/drivers/crypto/nx/Kconfig
@@ -1,6 +1,6 @@
 config CRYPTO_DEV_NX_ENCRYPT
-	tristate "Encryption acceleration support"
-	depends on PPC64 && IBMVIO
+	tristate "Encryption acceleration support on pSeries platform"
+	depends on PPC_PSERIES && IBMVIO && !CPU_LITTLE_ENDIAN
 	default y
 	select CRYPTO_AES
 	select CRYPTO_CBC
@@ -12,15 +12,30 @@ config CRYPTO_DEV_NX_ENCRYPT
 	select CRYPTO_SHA256
 	select CRYPTO_SHA512
 	help
-	  Support for Power7+ in-Nest encryption acceleration. This
-	  module supports acceleration for AES and SHA2 algorithms. If you
-	  choose 'M' here, this module will be called nx_crypto.
+	  Support for Power in-Nest encryption acceleration. This
+	  module supports acceleration for AES and SHA2 algorithms on
+	  the pSeries platform.  If you choose 'M' here, this module
+	  will be called nx_crypto.
 
 config CRYPTO_DEV_NX_COMPRESS
 	tristate "Compression acceleration support"
-	depends on PPC64 && IBMVIO
 	default y
 	help
-	  Support for Power7+ in-Nest compression acceleration. This
-	  module supports acceleration for AES and SHA2 algorithms. If you
-	  choose 'M' here, this module will be called nx_compress.
+	  Support for Power in-Nest compression acceleration. This
+	  module supports acceleration for compressing memory with the 842
+	  algorithm.  One of the platform drivers must be selected also.
+	  If you choose 'M' here, this module will be called nx_compress.
+
+if CRYPTO_DEV_NX_COMPRESS
+
+config CRYPTO_DEV_NX_PSERIES_COMPRESS
+	tristate "Compression acceleration support on pSeries platform"
+	depends on PPC_PSERIES && IBMVIO && !CPU_LITTLE_ENDIAN
+	default y
+	help
+	  Support for Power in-Nest compression acceleration. This
+	  module supports acceleration for compressing memory with the 842
+	  algorithm.  This supports NX hardware on the pSeries platform.
+	  If you choose 'M' here, this module will be called nx_compress_pseries.
+
+endif
diff --git a/drivers/crypto/nx/Makefile b/drivers/crypto/nx/Makefile
index 8669ffa..bc7b7ea 100644
--- a/drivers/crypto/nx/Makefile
+++ b/drivers/crypto/nx/Makefile
@@ -11,4 +11,6 @@ nx-crypto-objs := nx.o \
 		  nx-sha512.o
 
 obj-$(CONFIG_CRYPTO_DEV_NX_COMPRESS) += nx-compress.o
-nx-compress-objs := nx-842-pseries.o
+obj-$(CONFIG_CRYPTO_DEV_NX_PSERIES_COMPRESS) += nx-compress-pseries.o
+nx-compress-objs := nx-842.o
+nx-compress-pseries-objs := nx-842-pseries.o
diff --git a/drivers/crypto/nx/nx-842-pseries.c b/drivers/crypto/nx/nx-842-pseries.c
index 887196e..728a148 100644
--- a/drivers/crypto/nx/nx-842-pseries.c
+++ b/drivers/crypto/nx/nx-842-pseries.c
@@ -21,18 +21,13 @@
  *          Seth Jennings <sjenning@linux.vnet.ibm.com>
  */
 
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/nx842.h>
-#include <linux/of.h>
-#include <linux/slab.h>
-
 #include <asm/page.h>
 #include <asm/vio.h>
 
+#include "nx-842.h"
 #include "nx_csbcpb.h" /* struct nx_csbcpb */
 
-#define MODULE_NAME "nx-compress"
+#define MODULE_NAME NX842_PSERIES_MODULE_NAME
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Robert Jennings <rcj@linux.vnet.ibm.com>");
 MODULE_DESCRIPTION("842 H/W Compression driver for IBM Power processors");
@@ -236,18 +231,6 @@ struct nx842_workmem {
 	};
 };
 
-int nx842_get_workmem_size(void)
-{
-	return sizeof(struct nx842_workmem) + NX842_HW_PAGE_SIZE;
-}
-EXPORT_SYMBOL_GPL(nx842_get_workmem_size);
-
-int nx842_get_workmem_size_aligned(void)
-{
-	return sizeof(struct nx842_workmem);
-}
-EXPORT_SYMBOL_GPL(nx842_get_workmem_size_aligned);
-
 static int nx842_validate_result(struct device *dev,
 	struct cop_status_block *csb)
 {
@@ -300,7 +283,7 @@ static int nx842_validate_result(struct device *dev,
 }
 
 /**
- * nx842_compress - Compress data using the 842 algorithm
+ * nx842_pseries_compress - Compress data using the 842 algorithm
  *
  * Compression provide by the NX842 coprocessor on IBM Power systems.
  * The input buffer is compressed and the result is stored in the
@@ -315,7 +298,7 @@ static int nx842_validate_result(struct device *dev,
  * @out: Pointer to output buffer
  * @outlen: Length of output buffer
  * @wrkmem: ptr to buffer for working memory, size determined by
- *          nx842_get_workmem_size()
+ *          NX842_MEM_COMPRESS
  *
  * Returns:
  *   0		Success, output of length @outlen stored in the buffer at @out
@@ -325,7 +308,7 @@ static int nx842_validate_result(struct device *dev,
  *   -EIO	Internal error
  *   -ENODEV	Hardware unavailable
  */
-int nx842_compress(const unsigned char *in, unsigned int inlen,
+static int nx842_pseries_compress(const unsigned char *in, unsigned int inlen,
 		       unsigned char *out, unsigned int *outlen, void *wmem)
 {
 	struct nx842_header *hdr;
@@ -493,13 +476,12 @@ unlock:
 	rcu_read_unlock();
 	return ret;
 }
-EXPORT_SYMBOL_GPL(nx842_compress);
 
 static int sw842_decompress(const unsigned char *, int, unsigned char *, int *,
 			const void *);
 
 /**
- * nx842_decompress - Decompress data using the 842 algorithm
+ * nx842_pseries_decompress - Decompress data using the 842 algorithm
  *
  * Decompression provide by the NX842 coprocessor on IBM Power systems.
  * The input buffer is decompressed and the result is stored in the
@@ -515,7 +497,7 @@ static int sw842_decompress(const unsigned char *, int, unsigned char *, int *,
  * @out: Pointer to output buffer, must be page aligned
  * @outlen: Length of output buffer, must be PAGE_SIZE
  * @wrkmem: ptr to buffer for working memory, size determined by
- *          nx842_get_workmem_size()
+ *          NX842_MEM_COMPRESS
  *
  * Returns:
  *   0		Success, output of length @outlen stored in the buffer at @out
@@ -525,7 +507,7 @@ static int sw842_decompress(const unsigned char *, int, unsigned char *, int *,
  *   -EINVAL	Bad input data encountered when attempting decompress
  *   -EIO	Internal error
  */
-int nx842_decompress(const unsigned char *in, unsigned int inlen,
+static int nx842_pseries_decompress(const unsigned char *in, unsigned int inlen,
 			 unsigned char *out, unsigned int *outlen, void *wmem)
 {
 	struct nx842_header *hdr;
@@ -694,7 +676,6 @@ unlock:
 	rcu_read_unlock();
 	return ret;
 }
-EXPORT_SYMBOL_GPL(nx842_decompress);
 
 /**
  * nx842_OF_set_defaults -- Set default (disabled) values for devdata
@@ -1130,6 +1111,12 @@ static struct attribute_group nx842_attribute_group = {
 	.attrs = nx842_sysfs_entries,
 };
 
+static struct nx842_driver nx842_pseries_driver = {
+	.owner =	THIS_MODULE,
+	.compress =	nx842_pseries_compress,
+	.decompress =	nx842_pseries_decompress,
+};
+
 static int __init nx842_probe(struct vio_dev *viodev,
 				  const struct vio_device_id *id)
 {
@@ -1192,6 +1179,8 @@ static int __init nx842_probe(struct vio_dev *viodev,
 		goto error;
 	}
 
+	nx842_register_driver(&nx842_pseries_driver);
+
 	return 0;
 
 error_unlock:
@@ -1222,11 +1211,14 @@ static int __exit nx842_remove(struct vio_dev *viodev)
 	if (old_devdata)
 		kfree(old_devdata->counters);
 	kfree(old_devdata);
+
+	nx842_unregister_driver(&nx842_pseries_driver);
+
 	return 0;
 }
 
 static struct vio_device_id nx842_driver_ids[] = {
-	{"ibm,compression-v1", "ibm,compression"},
+	{NX842_PSERIES_COMPAT_NAME "-v1", NX842_PSERIES_COMPAT_NAME},
 	{"", ""},
 };
 
@@ -1243,6 +1235,8 @@ static int __init nx842_init(void)
 	struct nx842_devdata *new_devdata;
 	pr_info("Registering IBM Power 842 compression driver\n");
 
+	BUILD_BUG_ON(sizeof(struct nx842_workmem) > NX842_MEM_COMPRESS);
+
 	RCU_INIT_POINTER(devdata, NULL);
 	new_devdata = kzalloc(sizeof(*new_devdata), GFP_KERNEL);
 	if (!new_devdata) {
@@ -1272,6 +1266,7 @@ static void __exit nx842_exit(void)
 	if (old_devdata)
 		dev_set_drvdata(old_devdata->dev, NULL);
 	kfree(old_devdata);
+	nx842_unregister_driver(&nx842_pseries_driver);
 	vio_unregister_driver(&nx842_driver);
 }
 
diff --git a/drivers/crypto/nx/nx-842.c b/drivers/crypto/nx/nx-842.c
new file mode 100644
index 0000000..815d277
--- /dev/null
+++ b/drivers/crypto/nx/nx-842.c
@@ -0,0 +1,144 @@
+/*
+ * Driver frontend for IBM Power 842 compression accelerator
+ *
+ * Copyright (C) 2015 Dan Streetman, IBM Corp
+ *
+ * Designer of the Power data compression engine:
+ *   Bulent Abali <abali@us.ibm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include "nx-842.h"
+
+#define MODULE_NAME "nx-compress"
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Dan Streetman <ddstreet@ieee.org>");
+MODULE_DESCRIPTION("842 H/W Compression driver for IBM Power processors");
+
+/* Only one driver is expected, based on the HW platform */
+static struct nx842_driver *nx842_driver;
+static DEFINE_SPINLOCK(nx842_driver_lock); /* protects driver pointers */
+
+void nx842_register_driver(struct nx842_driver *driver)
+{
+	spin_lock(&nx842_driver_lock);
+
+	if (nx842_driver) {
+		pr_err("can't register driver %s, already using driver %s\n",
+		       driver->owner->name, nx842_driver->owner->name);
+	} else {
+		pr_info("registering driver %s\n", driver->owner->name);
+		nx842_driver = driver;
+	}
+
+	spin_unlock(&nx842_driver_lock);
+}
+EXPORT_SYMBOL_GPL(nx842_register_driver);
+
+void nx842_unregister_driver(struct nx842_driver *driver)
+{
+	spin_lock(&nx842_driver_lock);
+
+	if (nx842_driver == driver) {
+		pr_info("unregistering driver %s\n", driver->owner->name);
+		nx842_driver = NULL;
+	} else if (nx842_driver) {
+		pr_err("can't unregister driver %s, using driver %s\n",
+		       driver->owner->name, nx842_driver->owner->name);
+	} else {
+		pr_err("can't unregister driver %s, no driver in use\n",
+		       driver->owner->name);
+	}
+
+	spin_unlock(&nx842_driver_lock);
+}
+EXPORT_SYMBOL_GPL(nx842_unregister_driver);
+
+static struct nx842_driver *get_driver(void)
+{
+	struct nx842_driver *driver = NULL;
+
+	spin_lock(&nx842_driver_lock);
+
+	driver = nx842_driver;
+
+	if (driver && !try_module_get(driver->owner))
+		driver = NULL;
+
+	spin_unlock(&nx842_driver_lock);
+
+	return driver;
+}
+
+static void put_driver(struct nx842_driver *driver)
+{
+	module_put(driver->owner);
+}
+
+int nx842_compress(const unsigned char *in, unsigned int in_len,
+			unsigned char *out, unsigned int *out_len,
+			void *wrkmem)
+{
+	struct nx842_driver *driver = get_driver();
+	int ret;
+
+	if (!driver)
+		return -ENODEV;
+
+	ret = driver->compress(in, in_len, out, out_len, wrkmem);
+
+	put_driver(driver);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(nx842_compress);
+
+int nx842_decompress(const unsigned char *in, unsigned int in_len,
+			unsigned char *out, unsigned int *out_len,
+			void *wrkmem)
+{
+	struct nx842_driver *driver = get_driver();
+	int ret;
+
+	if (!driver)
+		return -ENODEV;
+
+	ret = driver->decompress(in, in_len, out, out_len, wrkmem);
+
+	put_driver(driver);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(nx842_decompress);
+
+static __init int nx842_init(void)
+{
+	pr_info("loading\n");
+
+	if (of_find_compatible_node(NULL, NULL, NX842_PSERIES_COMPAT_NAME))
+		request_module_nowait(NX842_PSERIES_MODULE_NAME);
+	else
+		pr_err("no nx842 driver found.\n");
+
+	pr_info("loaded\n");
+
+	return 0;
+}
+module_init(nx842_init);
+
+static void __exit nx842_exit(void)
+{
+	pr_info("NX842 unloaded\n");
+}
+module_exit(nx842_exit);
diff --git a/drivers/crypto/nx/nx-842.h b/drivers/crypto/nx/nx-842.h
new file mode 100644
index 0000000..2a5d4e1
--- /dev/null
+++ b/drivers/crypto/nx/nx-842.h
@@ -0,0 +1,32 @@
+
+#ifndef __NX_842_H__
+#define __NX_842_H__
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/nx842.h>
+#include <linux/of.h>
+#include <linux/slab.h>
+#include <linux/io.h>
+
+struct nx842_driver {
+	struct module *owner;
+
+	int (*compress)(const unsigned char *in, unsigned int in_len,
+			unsigned char *out, unsigned int *out_len,
+			void *wrkmem);
+	int (*decompress)(const unsigned char *in, unsigned int in_len,
+			  unsigned char *out, unsigned int *out_len,
+			  void *wrkmem);
+};
+
+void nx842_register_driver(struct nx842_driver *driver);
+void nx842_unregister_driver(struct nx842_driver *driver);
+
+
+/* To allow the main nx-compress module to load platform module */
+#define NX842_PSERIES_MODULE_NAME	"nx-compress-pseries"
+#define NX842_PSERIES_COMPAT_NAME	"ibm,compression"
+
+
+#endif /* __NX_842_H__ */
diff --git a/include/linux/nx842.h b/include/linux/nx842.h
index a4d324c..778e3ab 100644
--- a/include/linux/nx842.h
+++ b/include/linux/nx842.h
@@ -1,8 +1,10 @@
 #ifndef __NX842_H__
 #define __NX842_H__
 
-int nx842_get_workmem_size(void);
-int nx842_get_workmem_size_aligned(void);
+#define __NX842_PSERIES_MEM_COMPRESS	(PAGE_SIZE * 2 + 10240)
+
+#define NX842_MEM_COMPRESS	__NX842_PSERIES_MEM_COMPRESS
+
 int nx842_compress(const unsigned char *in, unsigned int in_len,
 		unsigned char *out, unsigned int *out_len, void *wrkmem);
 int nx842_decompress(const unsigned char *in, unsigned int in_len,
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 06/11] drivers/crypto/nx: add nx842 constraints
  2015-04-07 17:34 [PATCH 00/11] add 842 hw compression for PowerNV platform Dan Streetman
                   ` (4 preceding siblings ...)
  2015-04-07 17:34 ` [PATCH 05/11] drivers/crypto/nx: add NX-842 platform frontend driver Dan Streetman
@ 2015-04-07 17:34 ` Dan Streetman
  2015-04-07 17:34 ` [PATCH 07/11] drivers/crypto/nx: add PowerNV platform NX-842 driver Dan Streetman
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 46+ messages in thread
From: Dan Streetman @ 2015-04-07 17:34 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras
  Cc: Seth Jennings, Robert Jennings, Marcelo Henrique Cerri,
	Fionnuala Gunter, linux-crypto, linuxppc-dev, linux-kernel,
	Dan Streetman

Add "constraints" for the NX-842 driver.  The constraints are used to
indicate what the current NX-842 platform driver is capable of.  The
constraints tell the NX-842 user what alignment, min and max length, and
length multiple each provided buffers should conform to.  These are
required because the 842 hardware requires buffers to meet specific
constraints that vary based on platform - for example, the pSeries
max length is much lower than the PowerNV max length.

These constraints are used by a later patch that improves the crypto 842
driver.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
---
 drivers/crypto/nx/nx-842-pseries.c | 10 ++++++++++
 drivers/crypto/nx/nx-842.c         | 38 ++++++++++++++++++++++++++++++++++++++
 drivers/crypto/nx/nx-842.h         |  2 ++
 include/linux/nx842.h              |  9 +++++++++
 4 files changed, 59 insertions(+)

diff --git a/drivers/crypto/nx/nx-842-pseries.c b/drivers/crypto/nx/nx-842-pseries.c
index 728a148..baa2e52 100644
--- a/drivers/crypto/nx/nx-842-pseries.c
+++ b/drivers/crypto/nx/nx-842-pseries.c
@@ -40,6 +40,13 @@ MODULE_DESCRIPTION("842 H/W Compression driver for IBM Power processors");
 /* IO buffer must be 128 byte aligned */
 #define IO_BUFFER_ALIGN 128
 
+static struct nx842_constraints nx842_pseries_constraints = {
+	.alignment =	IO_BUFFER_ALIGN,
+	.multiple =	DDE_BUFFER_LAST_MULT,
+	.minimum =	IO_BUFFER_ALIGN,
+	.maximum =	PAGE_SIZE, /* dynamic, max_sync_size */
+};
+
 struct nx842_header {
 	int blocks_nr; /* number of compressed blocks */
 	int offset; /* offset of the first block (from beginning of header) */
@@ -840,6 +847,8 @@ static int nx842_OF_upd_maxsyncop(struct nx842_devdata *devdata,
 		goto out;
 	}
 
+	nx842_pseries_constraints.maximum = devdata->max_sync_size;
+
 	devdata->max_sync_sg = (unsigned int)min(maxsynccop->comp_sg_limit,
 						maxsynccop->decomp_sg_limit);
 	if (devdata->max_sync_sg < 1) {
@@ -1113,6 +1122,7 @@ static struct attribute_group nx842_attribute_group = {
 
 static struct nx842_driver nx842_pseries_driver = {
 	.owner =	THIS_MODULE,
+	.constraints =	&nx842_pseries_constraints,
 	.compress =	nx842_pseries_compress,
 	.decompress =	nx842_pseries_decompress,
 };
diff --git a/drivers/crypto/nx/nx-842.c b/drivers/crypto/nx/nx-842.c
index 815d277..279a38e 100644
--- a/drivers/crypto/nx/nx-842.c
+++ b/drivers/crypto/nx/nx-842.c
@@ -86,6 +86,44 @@ static void put_driver(struct nx842_driver *driver)
 	module_put(driver->owner);
 }
 
+/**
+ * nx842_constraints
+ *
+ * This provides the driver's constraints.  Different nx842 implementations
+ * may have varying requirements.  The constraints are:
+ *   @alignment:	All buffers should be aligned to this
+ *   @multiple:		All buffer lengths should be a multiple of this
+ *   @minimum:		Buffer lengths must not be less than this amount
+ *   @maximum:		Buffer lengths must not be more than this amount
+ *
+ * The constraints apply to all buffers and lengths, both input and output,
+ * for both compression and decompression, except for the minimum which
+ * only applies to compression input and decompression output; the
+ * compressed data can be less than the minimum constraint.  It can be
+ * assumed that compressed data will always adhere to the multiple
+ * constraint.
+ *
+ * The driver may succeed even if these constraints are violated;
+ * however the driver can return failure or suffer reduced performance
+ * if any constraint is not met.
+ */
+int nx842_constraints(struct nx842_constraints *c)
+{
+	struct nx842_driver *driver = get_driver();
+	int ret = 0;
+
+	if (!driver)
+		return -ENODEV;
+
+	BUG_ON(!c);
+	memcpy(c, driver->constraints, sizeof(*c));
+
+	put_driver(driver);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(nx842_constraints);
+
 int nx842_compress(const unsigned char *in, unsigned int in_len,
 			unsigned char *out, unsigned int *out_len,
 			void *wrkmem)
diff --git a/drivers/crypto/nx/nx-842.h b/drivers/crypto/nx/nx-842.h
index 2a5d4e1..c6ceb0f 100644
--- a/drivers/crypto/nx/nx-842.h
+++ b/drivers/crypto/nx/nx-842.h
@@ -12,6 +12,8 @@
 struct nx842_driver {
 	struct module *owner;
 
+	struct nx842_constraints *constraints;
+
 	int (*compress)(const unsigned char *in, unsigned int in_len,
 			unsigned char *out, unsigned int *out_len,
 			void *wrkmem);
diff --git a/include/linux/nx842.h b/include/linux/nx842.h
index 778e3ab..883b474 100644
--- a/include/linux/nx842.h
+++ b/include/linux/nx842.h
@@ -5,6 +5,15 @@
 
 #define NX842_MEM_COMPRESS	__NX842_PSERIES_MEM_COMPRESS
 
+struct nx842_constraints {
+	int alignment;
+	int multiple;
+	int minimum;
+	int maximum;
+};
+
+int nx842_constraints(struct nx842_constraints *constraints);
+
 int nx842_compress(const unsigned char *in, unsigned int in_len,
 		unsigned char *out, unsigned int *out_len, void *wrkmem);
 int nx842_decompress(const unsigned char *in, unsigned int in_len,
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 07/11] drivers/crypto/nx: add PowerNV platform NX-842 driver
  2015-04-07 17:34 [PATCH 00/11] add 842 hw compression for PowerNV platform Dan Streetman
                   ` (5 preceding siblings ...)
  2015-04-07 17:34 ` [PATCH 06/11] drivers/crypto/nx: add nx842 constraints Dan Streetman
@ 2015-04-07 17:34 ` Dan Streetman
  2015-04-07 17:34 ` [PATCH 08/11] drivers/crypto/nx: simplify pSeries nx842 driver Dan Streetman
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 46+ messages in thread
From: Dan Streetman @ 2015-04-07 17:34 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras
  Cc: Seth Jennings, Robert Jennings, Marcelo Henrique Cerri,
	Fionnuala Gunter, linux-crypto, linuxppc-dev, linux-kernel,
	Dan Streetman

Add driver for NX-842 hardware on the PowerNV platform.

This allows the use of the 842 compression hardware coprocessor on
the PowerNV platform.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
---
 drivers/crypto/nx/Kconfig          |  10 +
 drivers/crypto/nx/Makefile         |   2 +
 drivers/crypto/nx/nx-842-powernv.c | 623 +++++++++++++++++++++++++++++++++++++
 drivers/crypto/nx/nx-842-pseries.c |   9 -
 drivers/crypto/nx/nx-842.c         |   4 +-
 drivers/crypto/nx/nx-842.h         |  97 ++++++
 include/linux/nx842.h              |   6 +-
 7 files changed, 739 insertions(+), 12 deletions(-)
 create mode 100644 drivers/crypto/nx/nx-842-powernv.c

diff --git a/drivers/crypto/nx/Kconfig b/drivers/crypto/nx/Kconfig
index e4396fc..4bf400a 100644
--- a/drivers/crypto/nx/Kconfig
+++ b/drivers/crypto/nx/Kconfig
@@ -38,4 +38,14 @@ config CRYPTO_DEV_NX_PSERIES_COMPRESS
 	  algorithm.  This supports NX hardware on the pSeries platform.
 	  If you choose 'M' here, this module will be called nx_compress_pseries.
 
+config CRYPTO_DEV_NX_POWERNV_COMPRESS
+	tristate "Compression acceleration support on PowerNV platform"
+	depends on PPC_POWERNV
+	default y
+	help
+	  Support for Power in-Nest compression acceleration. This
+	  module supports acceleration for compressing memory with the 842
+	  algorithm.  This supports NX hardware on the PowerNV platform.
+	  If you choose 'M' here, this module will be called nx_compress_powernv.
+
 endif
diff --git a/drivers/crypto/nx/Makefile b/drivers/crypto/nx/Makefile
index bc7b7ea..82221f2 100644
--- a/drivers/crypto/nx/Makefile
+++ b/drivers/crypto/nx/Makefile
@@ -12,5 +12,7 @@ nx-crypto-objs := nx.o \
 
 obj-$(CONFIG_CRYPTO_DEV_NX_COMPRESS) += nx-compress.o
 obj-$(CONFIG_CRYPTO_DEV_NX_PSERIES_COMPRESS) += nx-compress-pseries.o
+obj-$(CONFIG_CRYPTO_DEV_NX_POWERNV_COMPRESS) += nx-compress-powernv.o
 nx-compress-objs := nx-842.o
 nx-compress-pseries-objs := nx-842-pseries.o
+nx-compress-powernv-objs := nx-842-powernv.o
diff --git a/drivers/crypto/nx/nx-842-powernv.c b/drivers/crypto/nx/nx-842-powernv.c
new file mode 100644
index 0000000..f1624a8
--- /dev/null
+++ b/drivers/crypto/nx/nx-842-powernv.c
@@ -0,0 +1,623 @@
+/*
+ * Driver for IBM PowerNV 842 compression accelerator
+ *
+ * Copyright (C) 2015 Dan Streetman, IBM Corp
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include "nx-842.h"
+
+#include <linux/timer.h>
+
+#include <asm/prom.h>
+#include <asm/icswx.h>
+
+#define MODULE_NAME NX842_POWERNV_MODULE_NAME
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Dan Streetman <ddstreet@ieee.org>");
+MODULE_DESCRIPTION("842 H/W Compression driver for IBM PowerNV processors");
+
+#define WORKMEM_ALIGN	(CRB_ALIGN)
+#define CSB_WAIT_MAX	(5000) /* ms */
+
+struct nx842_workmem {
+	/* Below fields must be properly aligned */
+	struct coprocessor_request_block crb; /* CRB_ALIGN align */
+	struct data_descriptor_entry ddl_in[DDL_LEN_MAX]; /* DDE_ALIGN align */
+	struct data_descriptor_entry ddl_out[DDL_LEN_MAX]; /* DDE_ALIGN align */
+	/* Above fields must be properly aligned */
+
+	ktime_t start;
+
+	char padding[WORKMEM_ALIGN]; /* unused, to allow alignment */
+} __packed __aligned(WORKMEM_ALIGN);
+
+struct nx842_coproc {
+	unsigned int chip_id;
+	unsigned int ct;
+	unsigned int ci;
+	struct list_head list;
+};
+
+/* no cpu hotplug on powernv, so this list never changes after init */
+static LIST_HEAD(nx842_coprocs);
+static unsigned int nx842_ct;
+
+/**
+ * setup_indirect_dde - Setup an indirect DDE
+ *
+ * The DDE is setup with the the DDE count, byte count, and address of
+ * first direct DDE in the list.
+ */
+static void setup_indirect_dde(struct data_descriptor_entry *dde,
+			struct data_descriptor_entry *ddl,
+			unsigned int dde_count, unsigned int byte_count)
+{
+	dde->flags = 0;
+	dde->count = dde_count;
+	dde->index = 0;
+	dde->length = cpu_to_be32(byte_count);
+	dde->address = cpu_to_be64(nx842_get_pa(ddl));
+}
+
+/**
+ * setup_direct_dde - Setup single DDE from buffer
+ *
+ * The DDE is setup with the buffer and length.  The buffer must be properly
+ * aligned.  The used length is returned.
+ * Returns:
+ *   N    Successfully set up DDE with N bytes
+ */
+static unsigned int setup_direct_dde(struct data_descriptor_entry *dde,
+			unsigned long pa, unsigned int len)
+{
+	unsigned int l = min_t(unsigned int, len, LEN_ON_PAGE(pa));
+
+	dde->flags = 0;
+	dde->count = 0;
+	dde->index = 0;
+	dde->length = cpu_to_be32(l);
+	dde->address = cpu_to_be64(pa);
+
+	return l;
+}
+
+/**
+ * setup_ddl - Setup DDL from buffer
+ *
+ * Returns:
+ *   0		Successfully set up DDL
+ */
+static int setup_ddl(struct data_descriptor_entry *dde,
+			struct data_descriptor_entry *ddl,
+			unsigned char *buf, unsigned int len,
+			bool in)
+{
+	unsigned long pa = nx842_get_pa(buf);
+	int i, ret, total_len = len;
+
+	if (!IS_ALIGNED(pa, DDE_BUFFER_ALIGN)) {
+		pr_debug("%s buffer pa 0x%lx not 0x%x-byte aligned\n",
+			 in ? "input" : "output", pa, DDE_BUFFER_ALIGN);
+		return -EINVAL;
+	}
+
+	/* only need to check last mult; since buffer must be
+	 * DDE_BUFFER_ALIGN aligned, and that is a multiple of
+	 * DDE_BUFFER_SIZE_MULT, and pre-last page DDE buffers
+	 * are guaranteed a multiple of DDE_BUFFER_SIZE_MULT.
+	 */
+	if (len % DDE_BUFFER_LAST_MULT) {
+		pr_debug("%s buffer len 0x%x not a multiple of 0x%x\n",
+			 in ? "input" : "output", len, DDE_BUFFER_LAST_MULT);
+		if (in)
+			return -EINVAL;
+		len = round_down(len, DDE_BUFFER_LAST_MULT);
+	}
+
+	/* use a single direct DDE */
+	if (len <= LEN_ON_PAGE(pa)) {
+		ret = setup_direct_dde(dde, pa, len);
+		WARN_ON(ret < len);
+		return 0;
+	}
+
+	/* use the DDL */
+	for (i = 0; i < DDL_LEN_MAX && len > 0; i++) {
+		ret = setup_direct_dde(&ddl[i], pa, len);
+		buf += ret;
+		len -= ret;
+		pa = nx842_get_pa(buf);
+	}
+
+	if (len > 0) {
+		pr_debug("0x%x total %s bytes 0x%x too many for DDL.\n",
+			 total_len, in ? "input" : "output", len);
+		if (in)
+			return -EMSGSIZE;
+		total_len -= len;
+	}
+	setup_indirect_dde(dde, ddl, i, total_len);
+
+	return 0;
+}
+
+#define CSB_ERR(csb, msg, ...)					\
+	pr_err("ERROR: " msg " : %02x %02x %02x %02x %08x\n",	\
+	       ##__VA_ARGS__, (csb)->flags,			\
+	       (csb)->cs, (csb)->cc, (csb)->ce,			\
+	       be32_to_cpu((csb)->count))
+
+#define CSB_ERR_ADDR(csb, msg, ...)				\
+	CSB_ERR(csb, msg " at %lx", ##__VA_ARGS__,		\
+		(unsigned long)be64_to_cpu((csb)->address))
+
+/**
+ * wait_for_csb
+ */
+static int wait_for_csb(struct nx842_workmem *wmem,
+			struct coprocessor_status_block *csb)
+{
+	ktime_t start = wmem->start, now = ktime_get();
+	ktime_t timeout = ktime_add_ms(start, CSB_WAIT_MAX);
+
+	while (!(ACCESS_ONCE(csb->flags) & CSB_V)) {
+		cpu_relax();
+		now = ktime_get();
+		if (ktime_after(now, timeout))
+			break;
+	}
+
+	/* hw has updated csb and output buffer */
+	barrier();
+
+	/* check CSB flags */
+	if (!(csb->flags & CSB_V)) {
+		CSB_ERR(csb, "CSB still not valid after %ld us, giving up",
+			(long)ktime_us_delta(now, start));
+		return -ETIMEDOUT;
+	}
+	if (csb->flags & CSB_F) {
+		CSB_ERR(csb, "Invalid CSB format");
+		return -EPROTO;
+	}
+	if (csb->flags & CSB_CH) {
+		CSB_ERR(csb, "Invalid CSB chaining state");
+		return -EPROTO;
+	}
+
+	/* verify CSB completion sequence is 0 */
+	if (csb->cs) {
+		CSB_ERR(csb, "Invalid CSB completion sequence");
+		return -EPROTO;
+	}
+
+	/* check CSB Completion Code */
+	switch (csb->cc) {
+	/* no error */
+	case CSB_CC_SUCCESS:
+		break;
+	case CSB_CC_TPBC_GT_SPBC:
+		/* not an error, but the compressed data is
+		 * larger than the uncompressed data :(
+		 */
+		break;
+
+	/* input data errors */
+	case CSB_CC_OPERAND_OVERLAP:
+		/* input and output buffers overlap */
+		CSB_ERR(csb, "Operand Overlap error");
+		return -EINVAL;
+	case CSB_CC_INVALID_OPERAND:
+		CSB_ERR(csb, "Invalid operand");
+		return -EINVAL;
+	case CSB_CC_NOSPC:
+		/* output buffer too small */
+		return -ENOSPC;
+	case CSB_CC_ABORT:
+		CSB_ERR(csb, "Function aborted");
+		return -EINTR;
+	case CSB_CC_CRC_MISMATCH:
+		CSB_ERR(csb, "CRC mismatch");
+		return -EINVAL;
+	case CSB_CC_TEMPL_INVALID:
+		CSB_ERR(csb, "Compressed data template invalid");
+		return -EINVAL;
+	case CSB_CC_TEMPL_OVERFLOW:
+		CSB_ERR(csb, "Compressed data template shows data past end");
+		return -EINVAL;
+
+	/* these should not happen */
+	case CSB_CC_INVALID_ALIGN:
+		/* setup_ddl should have detected this */
+		CSB_ERR_ADDR(csb, "Invalid alignment");
+		return -EINVAL;
+	case CSB_CC_DATA_LENGTH:
+		/* setup_ddl should have detected this */
+		CSB_ERR(csb, "Invalid data length");
+		return -EINVAL;
+	case CSB_CC_WR_TRANSLATION:
+	case CSB_CC_TRANSLATION:
+	case CSB_CC_TRANSLATION_DUP1:
+	case CSB_CC_TRANSLATION_DUP2:
+	case CSB_CC_TRANSLATION_DUP3:
+	case CSB_CC_TRANSLATION_DUP4:
+	case CSB_CC_TRANSLATION_DUP5:
+	case CSB_CC_TRANSLATION_DUP6:
+		/* should not happen, we use physical addrs */
+		CSB_ERR_ADDR(csb, "Translation error");
+		return -EPROTO;
+	case CSB_CC_WR_PROTECTION:
+	case CSB_CC_PROTECTION:
+	case CSB_CC_PROTECTION_DUP1:
+	case CSB_CC_PROTECTION_DUP2:
+	case CSB_CC_PROTECTION_DUP3:
+	case CSB_CC_PROTECTION_DUP4:
+	case CSB_CC_PROTECTION_DUP5:
+	case CSB_CC_PROTECTION_DUP6:
+		/* should not happen, we use physical addrs */
+		CSB_ERR_ADDR(csb, "Protection error");
+		return -EPROTO;
+	case CSB_CC_PRIVILEGE:
+		/* shouldn't happen, we're in HYP mode */
+		CSB_ERR(csb, "Insufficient Privilege error");
+		return -EPROTO;
+	case CSB_CC_EXCESSIVE_DDE:
+		/* shouldn't happen, setup_ddl doesn't use many dde's */
+		CSB_ERR(csb, "Too many DDEs in DDL");
+		return -EINVAL;
+	case CSB_CC_TRANSPORT:
+		/* shouldn't happen, we setup CRB correctly */
+		CSB_ERR(csb, "Invalid CRB");
+		return -EINVAL;
+	case CSB_CC_SEGMENTED_DDL:
+		/* shouldn't happen, setup_ddl creates DDL right */
+		CSB_ERR(csb, "Segmented DDL error");
+		return -EINVAL;
+	case CSB_CC_DDE_OVERFLOW:
+		/* shouldn't happen, setup_ddl creates DDL right */
+		CSB_ERR(csb, "DDE overflow error");
+		return -EINVAL;
+	case CSB_CC_SESSION:
+		/* should not happen with ICSWX */
+		CSB_ERR(csb, "Session violation error");
+		return -EPROTO;
+	case CSB_CC_CHAIN:
+		/* should not happen, we don't use chained CRBs */
+		CSB_ERR(csb, "Chained CRB error");
+		return -EPROTO;
+	case CSB_CC_SEQUENCE:
+		/* should not happen, we don't use chained CRBs */
+		CSB_ERR(csb, "CRB seqeunce number error");
+		return -EPROTO;
+	case CSB_CC_UNKNOWN_CODE:
+		CSB_ERR(csb, "Unknown subfunction code");
+		return -EPROTO;
+
+	/* hardware errors */
+	case CSB_CC_RD_EXTERNAL:
+	case CSB_CC_RD_EXTERNAL_DUP1:
+	case CSB_CC_RD_EXTERNAL_DUP2:
+	case CSB_CC_RD_EXTERNAL_DUP3:
+		CSB_ERR_ADDR(csb, "Read error outside coprocessor");
+		return -EPROTO;
+	case CSB_CC_WR_EXTERNAL:
+		CSB_ERR_ADDR(csb, "Write error outside coprocessor");
+		return -EPROTO;
+	case CSB_CC_INTERNAL:
+		CSB_ERR(csb, "Internal error in coprocessor");
+		return -EPROTO;
+	case CSB_CC_PROVISION:
+		CSB_ERR(csb, "Storage provision error");
+		return -EPROTO;
+	case CSB_CC_HW:
+		CSB_ERR(csb, "Correctable hardware error");
+		return -EPROTO;
+
+	default:
+		CSB_ERR(csb, "Invalid CC %d", csb->cc);
+		return -EPROTO;
+	}
+
+	/* check Completion Extension state */
+	if (csb->ce & CSB_CE_TERMINATION) {
+		CSB_ERR(csb, "CSB request was terminated");
+		return -EPROTO;
+	}
+	if (csb->ce & CSB_CE_INCOMPLETE) {
+		CSB_ERR(csb, "CSB request not complete");
+		return -EPROTO;
+	}
+	if (!(csb->ce & CSB_CE_TPBC)) {
+		CSB_ERR(csb, "TPBC not provided, unknown target length");
+		return -EPROTO;
+	}
+
+	/* successful completion */
+	pr_debug_ratelimited("Processed %u bytes in %lu us\n", csb->count,
+			     (unsigned long)ktime_us_delta(now, start));
+
+	return 0;
+}
+
+/**
+ * nx842_powernv_function - compress/decompress data using the 842 algorithm
+ *
+ * (De)compression provided by the NX842 coprocessor on IBM PowerNV systems.
+ * This compresses or decompresses the provided input buffer into the provided
+ * output buffer.
+ *
+ * Upon return from this function @outlen contains the length of the
+ * output data.  If there is an error then @outlen will be 0 and an
+ * error will be specified by the return code from this function.
+ *
+ * The @workmem buffer should only be used by one function call at a time.
+ *
+ * @in: input buffer pointer
+ * @inlen: input buffer size
+ * @out: output buffer pointer
+ * @outlenp: output buffer size pointer
+ * @workmem: working memory buffer pointer, must be at least NX842_MEM_COMPRESS
+ * @fc: function code, see CCW Function Codes in nx-842.h
+ *
+ * Returns:
+ *   0		Success, output of length @outlenp stored in the buffer at @out
+ *   -ENODEV	Hardware unavailable
+ *   -ENOSPC	Output buffer is to small
+ *   -EMSGSIZE	Input buffer too large
+ *   -EINVAL	buffer constraints do not fix nx842_constraints
+ *   -EPROTO	hardware error during operation
+ *   -ETIMEDOUT	hardware did not complete operation in reasonable time
+ *   -EINTR	operation was aborted
+ */
+static int nx842_powernv_function(const unsigned char *in, unsigned int inlen,
+			unsigned char *out, unsigned int *outlenp,
+			void *workmem, int fc)
+{
+	struct coprocessor_request_block *crb;
+	struct coprocessor_status_block *csb;
+	struct nx842_workmem *wmem;
+	int ret;
+	u64 csb_addr;
+	u32 ccw;
+	unsigned int outlen = *outlenp;
+
+	wmem = PTR_ALIGN(workmem, WORKMEM_ALIGN);
+
+	*outlenp = 0;
+
+	/* shoudn't happen, we don't load without a coproc */
+	if (!nx842_ct) {
+		pr_err_ratelimited("coprocessor CT is 0");
+		return -ENODEV;
+	}
+
+	crb = &wmem->crb;
+	csb = &crb->csb;
+
+	/* Clear any previous values */
+	memset(crb, 0, sizeof(*crb));
+
+	/* set up DDLs */
+	ret = setup_ddl(&crb->source, wmem->ddl_in,
+			(unsigned char *)in, inlen, true);
+	if (ret)
+		return ret;
+	ret = setup_ddl(&crb->target, wmem->ddl_out,
+			out, outlen, false);
+	if (ret)
+		return ret;
+
+	/* set up CCW */
+	ccw = 0;
+	ccw = SET_FIELD(ccw, CCW_CT, nx842_ct);
+	ccw = SET_FIELD(ccw, CCW_CI_842, 0); /* use 0 for hw auto-selection */
+	ccw = SET_FIELD(ccw, CCW_FC_842, fc);
+
+	/* set up CRB's CSB addr */
+	csb_addr = nx842_get_pa(csb) & CRB_CSB_ADDRESS;
+	csb_addr |= CRB_CSB_AT; /* Addrs are phys */
+	crb->csb_addr = cpu_to_be64(csb_addr);
+
+	wmem->start = ktime_get();
+
+	/* do ICSWX */
+	ret = icswx(cpu_to_be32(ccw), crb);
+
+	pr_debug_ratelimited("icswx CR %x ccw %x crb->ccw %x\n", ret,
+			     (unsigned int)ccw,
+			     (unsigned int)be32_to_cpu(crb->ccw));
+
+	switch (ret) {
+	case ICSWX_INITIATED:
+		ret = wait_for_csb(wmem, csb);
+		break;
+	case ICSWX_BUSY:
+		pr_debug_ratelimited("842 Coprocessor busy\n");
+		ret = -EBUSY;
+		break;
+	case ICSWX_REJECTED:
+		pr_err_ratelimited("ICSWX rejected\n");
+		ret = -EPROTO;
+		break;
+	default:
+		pr_err_ratelimited("Invalid ICSWX return code %x\n", ret);
+		ret = -EPROTO;
+		break;
+	}
+
+	if (!ret)
+		*outlenp = be32_to_cpu(csb->count);
+
+	return ret;
+}
+
+/**
+ * nx842_powernv_compress - Compress data using the 842 algorithm
+ *
+ * Compression provided by the NX842 coprocessor on IBM PowerNV systems.
+ * The input buffer is compressed and the result is stored in the
+ * provided output buffer.
+ *
+ * Upon return from this function @outlen contains the length of the
+ * compressed data.  If there is an error then @outlen will be 0 and an
+ * error will be specified by the return code from this function.
+ *
+ * @in: input buffer pointer
+ * @inlen: input buffer size
+ * @out: output buffer pointer
+ * @outlenp: output buffer size pointer
+ * @workmem: working memory buffer pointer, must be at least NX842_MEM_COMPRESS
+ *
+ * Returns: see @nx842_powernv_function()
+ */
+static int nx842_powernv_compress(const unsigned char *in, unsigned int inlen,
+		       unsigned char *out, unsigned int *outlenp, void *wmem)
+{
+	return nx842_powernv_function(in, inlen, out, outlenp,
+				      wmem, CCW_FC_842_COMP_NOCRC);
+}
+
+/**
+ * nx842_powernv_decompress - Decompress data using the 842 algorithm
+ *
+ * Decompression provided by the NX842 coprocessor on IBM PowerNV systems.
+ * The input buffer is decompressed and the result is stored in the
+ * provided output buffer.
+ *
+ * Upon return from this function @outlen contains the length of the
+ * decompressed data.  If there is an error then @outlen will be 0 and an
+ * error will be specified by the return code from this function.
+ *
+ * @in: input buffer pointer
+ * @inlen: input buffer size
+ * @out: output buffer pointer
+ * @outlenp: output buffer size pointer
+ * @workmem: working memory buffer pointer, must be at least NX842_MEM_COMPRESS
+ *
+ * Returns: see @nx842_powernv_function()
+ */
+static int nx842_powernv_decompress(const unsigned char *in, unsigned int inlen,
+			 unsigned char *out, unsigned int *outlenp, void *wmem)
+{
+	return nx842_powernv_function(in, inlen, out, outlenp,
+				      wmem, CCW_FC_842_DECOMP_NOCRC);
+}
+
+static int __init nx842_powernv_probe(struct device_node *dn)
+{
+	struct nx842_coproc *coproc;
+	struct property *ct_prop, *ci_prop;
+	unsigned int ct, ci;
+	int chip_id;
+
+	chip_id = of_get_ibm_chip_id(dn);
+	if (chip_id < 0) {
+		pr_err("ibm,chip-id missing\n");
+		return -EINVAL;
+	}
+	ct_prop = of_find_property(dn, "ibm,842-coprocessor-type", NULL);
+	if (!ct_prop) {
+		pr_err("ibm,842-coprocessor-type missing\n");
+		return -EINVAL;
+	}
+	ct = be32_to_cpu(*(unsigned int *)ct_prop->value);
+	ci_prop = of_find_property(dn, "ibm,842-coprocessor-instance", NULL);
+	if (!ci_prop) {
+		pr_err("ibm,842-coprocessor-instance missing\n");
+		return -EINVAL;
+	}
+	ci = be32_to_cpu(*(unsigned int *)ci_prop->value);
+
+	coproc = kmalloc(sizeof(*coproc), GFP_KERNEL);
+	if (!coproc)
+		return -ENOMEM;
+
+	coproc->chip_id = chip_id;
+	coproc->ct = ct;
+	coproc->ci = ci;
+	INIT_LIST_HEAD(&coproc->list);
+	list_add(&coproc->list, &nx842_coprocs);
+
+	pr_info("coprocessor found on chip %d, CT %d CI %d\n", chip_id, ct, ci);
+
+	if (!nx842_ct)
+		nx842_ct = ct;
+	else if (nx842_ct != ct)
+		pr_err("NX842 chip %d, CT %d != first found CT %d\n",
+		       chip_id, ct, nx842_ct);
+
+	return 0;
+}
+
+static struct nx842_constraints nx842_powernv_constraints = {
+	.alignment =	DDE_BUFFER_ALIGN,
+	.multiple =	DDE_BUFFER_LAST_MULT,
+	.minimum =	DDE_BUFFER_LAST_MULT,
+	.maximum =	(DDL_LEN_MAX - 1) * PAGE_SIZE,
+};
+
+static struct nx842_driver nx842_powernv_driver = {
+	.owner =	THIS_MODULE,
+	.constraints =	&nx842_powernv_constraints,
+	.compress =	nx842_powernv_compress,
+	.decompress =	nx842_powernv_decompress,
+};
+
+static __init int nx842_powernv_init(void)
+{
+	struct device_node *dn;
+
+	/* verify workmem size/align restrictions */
+	BUILD_BUG_ON(sizeof(struct nx842_workmem) > NX842_MEM_COMPRESS);
+	BUILD_BUG_ON(WORKMEM_ALIGN % CRB_ALIGN);
+	BUILD_BUG_ON(CRB_ALIGN % DDE_ALIGN);
+	BUILD_BUG_ON(CRB_SIZE % DDE_ALIGN);
+	/* verify buffer size/align restrictions */
+	BUILD_BUG_ON(PAGE_SIZE % DDE_BUFFER_ALIGN);
+	BUILD_BUG_ON(DDE_BUFFER_ALIGN % DDE_BUFFER_SIZE_MULT);
+	BUILD_BUG_ON(DDE_BUFFER_SIZE_MULT % DDE_BUFFER_LAST_MULT);
+
+	pr_info("loading\n");
+
+	for_each_compatible_node(dn, NULL, NX842_POWERNV_COMPAT_NAME)
+		nx842_powernv_probe(dn);
+
+	if (!nx842_ct) {
+		pr_err("no coprocessors found\n");
+		return -ENODEV;
+	}
+
+	nx842_register_driver(&nx842_powernv_driver);
+
+	pr_info("loaded\n");
+
+	return 0;
+}
+module_init(nx842_powernv_init);
+
+static void __exit nx842_powernv_exit(void)
+{
+	struct nx842_coproc *coproc, *n;
+
+	nx842_unregister_driver(&nx842_powernv_driver);
+
+	list_for_each_entry_safe(coproc, n, &nx842_coprocs, list) {
+		list_del(&coproc->list);
+		kfree(coproc);
+	}
+
+	pr_info("unloaded\n");
+}
+module_exit(nx842_powernv_exit);
diff --git a/drivers/crypto/nx/nx-842-pseries.c b/drivers/crypto/nx/nx-842-pseries.c
index baa2e52..3773e36 100644
--- a/drivers/crypto/nx/nx-842-pseries.c
+++ b/drivers/crypto/nx/nx-842-pseries.c
@@ -160,15 +160,6 @@ static inline unsigned long nx842_get_scatterlist_size(
 	return sl->entry_nr * sizeof(struct nx842_slentry);
 }
 
-static inline unsigned long nx842_get_pa(void *addr)
-{
-	if (is_vmalloc_addr(addr))
-		return page_to_phys(vmalloc_to_page(addr))
-		       + offset_in_page(addr);
-	else
-		return __pa(addr);
-}
-
 static int nx842_build_scatterlist(unsigned long buf, int len,
 			struct nx842_scatterlist *sl)
 {
diff --git a/drivers/crypto/nx/nx-842.c b/drivers/crypto/nx/nx-842.c
index 279a38e..6187bcc 100644
--- a/drivers/crypto/nx/nx-842.c
+++ b/drivers/crypto/nx/nx-842.c
@@ -164,7 +164,9 @@ static __init int nx842_init(void)
 {
 	pr_info("loading\n");
 
-	if (of_find_compatible_node(NULL, NULL, NX842_PSERIES_COMPAT_NAME))
+	if (of_find_compatible_node(NULL, NULL, NX842_POWERNV_COMPAT_NAME))
+		request_module_nowait(NX842_POWERNV_MODULE_NAME);
+	else if (of_find_compatible_node(NULL, NULL, NX842_PSERIES_COMPAT_NAME))
 		request_module_nowait(NX842_PSERIES_MODULE_NAME);
 	else
 		pr_err("no nx842 driver found.\n");
diff --git a/drivers/crypto/nx/nx-842.h b/drivers/crypto/nx/nx-842.h
index c6ceb0f..84b15b7 100644
--- a/drivers/crypto/nx/nx-842.h
+++ b/drivers/crypto/nx/nx-842.h
@@ -5,9 +5,104 @@
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/nx842.h>
+#include <linux/sw842.h>
 #include <linux/of.h>
 #include <linux/slab.h>
 #include <linux/io.h>
+#include <linux/mm.h>
+#include <linux/ratelimit.h>
+
+/* Restrictions on Data Descriptor List (DDL) and Entry (DDE) buffers
+ *
+ * From NX P8 workbook, sec 4.9.1 "842 details"
+ *   Each DDE buffer is 128 byte aligned
+ *   Each DDE buffer size is a multiple of 32 bytes (except the last)
+ *   The last DDE buffer size is a multiple of 8 bytes
+ */
+#define DDE_BUFFER_ALIGN	(128)
+#define DDE_BUFFER_SIZE_MULT	(32)
+#define DDE_BUFFER_LAST_MULT	(8)
+
+/* Arbitrary DDL length limit
+ * Allows max buffer size of MAX-1 to MAX pages
+ * (depending on alignment)
+ */
+#define DDL_LEN_MAX		(17)
+
+/* CCW 842 CI/FC masks
+ * NX P8 workbook, section 4.3.1, figure 4-6
+ * "CI/FC Boundary by NX CT type"
+ */
+#define CCW_CI_842		(0x00003ff8)
+#define CCW_FC_842		(0x00000007)
+
+/* CCW Function Codes (FC) for 842
+ * NX P8 workbook, section 4.9, table 4-28
+ * "Function Code Definitions for 842 Memory Compression"
+ */
+#define CCW_FC_842_COMP_NOCRC	(0)
+#define CCW_FC_842_COMP_CRC	(1)
+#define CCW_FC_842_DECOMP_NOCRC	(2)
+#define CCW_FC_842_DECOMP_CRC	(3)
+#define CCW_FC_842_MOVE		(4)
+
+/* CSB CC Error Types for 842
+ * NX P8 workbook, section 4.10.3, table 4-30
+ * "Reported Error Types Summary Table"
+ */
+/* These are all duplicates of existing codes defined in icswx.h. */
+#define CSB_CC_TRANSLATION_DUP1	(80)
+#define CSB_CC_TRANSLATION_DUP2	(82)
+#define CSB_CC_TRANSLATION_DUP3	(84)
+#define CSB_CC_TRANSLATION_DUP4	(86)
+#define CSB_CC_TRANSLATION_DUP5	(92)
+#define CSB_CC_TRANSLATION_DUP6	(94)
+#define CSB_CC_PROTECTION_DUP1	(81)
+#define CSB_CC_PROTECTION_DUP2	(83)
+#define CSB_CC_PROTECTION_DUP3	(85)
+#define CSB_CC_PROTECTION_DUP4	(87)
+#define CSB_CC_PROTECTION_DUP5	(93)
+#define CSB_CC_PROTECTION_DUP6	(95)
+#define CSB_CC_RD_EXTERNAL_DUP1	(89)
+#define CSB_CC_RD_EXTERNAL_DUP2	(90)
+#define CSB_CC_RD_EXTERNAL_DUP3	(91)
+/* These are specific to NX */
+/* 842 codes */
+#define CSB_CC_TPBC_GT_SPBC	(64) /* no error, but >1 comp ratio */
+#define CSB_CC_CRC_MISMATCH	(65) /* decomp crc mismatch */
+#define CSB_CC_TEMPL_INVALID	(66) /* decomp invalid template value */
+#define CSB_CC_TEMPL_OVERFLOW	(67) /* decomp template shows data after end */
+/* sym crypt codes */
+#define CSB_CC_DECRYPT_OVERFLOW	(64)
+/* asym crypt codes */
+#define CSB_CC_MINV_OVERFLOW	(128)
+/* These are reserved for hypervisor use */
+#define CSB_CC_HYP_RESERVE_START	(240)
+#define CSB_CC_HYP_RESERVE_END		(253)
+#define CSB_CC_HYP_NO_HW		(254)
+#define CSB_CC_HYP_HANG_ABORTED		(255)
+
+/* CCB Completion Modes (CM) for 842
+ * NX P8 workbook, section 4.3, figure 4-5
+ * "CRB Details - Normal Cop_Req (CL=00, C=1)"
+ */
+#define CCB_CM_EXTRA_WRITE	(CCB_CM0_ALL_COMPLETIONS & CCB_CM12_STORE)
+#define CCB_CM_INTERRUPT	(CCB_CM0_ALL_COMPLETIONS & CCB_CM12_INTERRUPT)
+
+#define LEN_ON_PAGE(pa)		(PAGE_SIZE - ((pa) & ~PAGE_MASK))
+
+static inline unsigned long nx842_get_pa(void *addr)
+{
+	if (!is_vmalloc_addr(addr))
+		return __pa(addr);
+
+	return page_to_phys(vmalloc_to_page(addr)) + offset_in_page(addr);
+}
+
+/* Get/Set bit fields */
+#define MASK_LSH(m)		(__builtin_ffsl(m) - 1)
+#define GET_FIELD(v, m)		(((v) & (m)) >> MASK_LSH(m))
+#define SET_FIELD(v, m, val)	(((v) & ~(m)) | (((val) << MASK_LSH(m)) & (m)))
 
 struct nx842_driver {
 	struct module *owner;
@@ -27,6 +122,8 @@ void nx842_unregister_driver(struct nx842_driver *driver);
 
 
 /* To allow the main nx-compress module to load platform module */
+#define NX842_POWERNV_MODULE_NAME	"nx-compress-powernv"
+#define NX842_POWERNV_COMPAT_NAME	"ibm,power-nx"
 #define NX842_PSERIES_MODULE_NAME	"nx-compress-pseries"
 #define NX842_PSERIES_COMPAT_NAME	"ibm,compression"
 
diff --git a/include/linux/nx842.h b/include/linux/nx842.h
index 883b474..966d4d5 100644
--- a/include/linux/nx842.h
+++ b/include/linux/nx842.h
@@ -1,9 +1,11 @@
 #ifndef __NX842_H__
 #define __NX842_H__
 
-#define __NX842_PSERIES_MEM_COMPRESS	(PAGE_SIZE * 2 + 10240)
+#define __NX842_PSERIES_MEM_COMPRESS	(10240)
+#define __NX842_POWERNV_MEM_COMPRESS	(1024)
 
-#define NX842_MEM_COMPRESS	__NX842_PSERIES_MEM_COMPRESS
+#define NX842_MEM_COMPRESS	(max_t(unsigned int,			\
+	__NX842_PSERIES_MEM_COMPRESS, __NX842_POWERNV_MEM_COMPRESS))
 
 struct nx842_constraints {
 	int alignment;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 08/11] drivers/crypto/nx: simplify pSeries nx842 driver
  2015-04-07 17:34 [PATCH 00/11] add 842 hw compression for PowerNV platform Dan Streetman
                   ` (6 preceding siblings ...)
  2015-04-07 17:34 ` [PATCH 07/11] drivers/crypto/nx: add PowerNV platform NX-842 driver Dan Streetman
@ 2015-04-07 17:34 ` Dan Streetman
  2015-04-07 17:34 ` [PATCH 09/11] crypto: remove LZO fallback from crypto 842 Dan Streetman
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 46+ messages in thread
From: Dan Streetman @ 2015-04-07 17:34 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras
  Cc: Seth Jennings, Robert Jennings, Marcelo Henrique Cerri,
	Fionnuala Gunter, linux-crypto, linuxppc-dev, linux-kernel,
	Dan Streetman

Simplify the pSeries NX-842 driver: do not expect incoming buffers to be
exactly page-sized; do not break up input buffers to compress smaller blocks;
do not use any internal headers in the compressed data blocks; remove the
software decompression implementation.

This changes the pSeries NX-842 driver to perform constraints-based compression
so that it only needs to compress one entire input block at a time.  This
removes the need for it to split input data blocks into multiple compressed
data sections in the output buffer, and removes the need for any extra header
info in the compressed data; all that is moved (in a later patch) into the
main crypto 842 driver.  Additionally, the 842 software decompression
implementation is no longer needed here, and the crypto 842 driver will use
the generic software 842 decompression function as a fallback if any hardware
842 driver fails.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
---
 drivers/crypto/nx/nx-842-pseries.c | 779 ++++++++-----------------------------
 1 file changed, 153 insertions(+), 626 deletions(-)

diff --git a/drivers/crypto/nx/nx-842-pseries.c b/drivers/crypto/nx/nx-842-pseries.c
index 3773e36..0b7bad3 100644
--- a/drivers/crypto/nx/nx-842-pseries.c
+++ b/drivers/crypto/nx/nx-842-pseries.c
@@ -21,7 +21,6 @@
  *          Seth Jennings <sjenning@linux.vnet.ibm.com>
  */
 
-#include <asm/page.h>
 #include <asm/vio.h>
 
 #include "nx-842.h"
@@ -32,11 +31,6 @@ MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Robert Jennings <rcj@linux.vnet.ibm.com>");
 MODULE_DESCRIPTION("842 H/W Compression driver for IBM Power processors");
 
-#define SHIFT_4K 12
-#define SHIFT_64K 16
-#define SIZE_4K (1UL << SHIFT_4K)
-#define SIZE_64K (1UL << SHIFT_64K)
-
 /* IO buffer must be 128 byte aligned */
 #define IO_BUFFER_ALIGN 128
 
@@ -47,18 +41,52 @@ static struct nx842_constraints nx842_pseries_constraints = {
 	.maximum =	PAGE_SIZE, /* dynamic, max_sync_size */
 };
 
-struct nx842_header {
-	int blocks_nr; /* number of compressed blocks */
-	int offset; /* offset of the first block (from beginning of header) */
-	int sizes[0]; /* size of compressed blocks */
-};
-
-static inline int nx842_header_size(const struct nx842_header *hdr)
+static int check_constraints(unsigned long buf, unsigned int *len, bool in)
 {
-	return sizeof(struct nx842_header) +
-			hdr->blocks_nr * sizeof(hdr->sizes[0]);
+	if (!IS_ALIGNED(buf, nx842_pseries_constraints.alignment)) {
+		pr_debug("%s buffer 0x%lx not aligned to 0x%x\n",
+			 in ? "input" : "output", buf,
+			 nx842_pseries_constraints.alignment);
+		return -EINVAL;
+	}
+	if (*len % nx842_pseries_constraints.multiple) {
+		pr_debug("%s buffer len 0x%x not multiple of 0x%x\n",
+			 in ? "input" : "output", *len,
+			 nx842_pseries_constraints.multiple);
+		if (in)
+			return -EINVAL;
+		*len = round_down(*len, nx842_pseries_constraints.multiple);
+	}
+	if (*len < nx842_pseries_constraints.minimum) {
+		pr_debug("%s buffer len 0x%x under minimum 0x%x\n",
+			 in ? "input" : "output", *len,
+			 nx842_pseries_constraints.minimum);
+		return -EINVAL;
+	}
+	if (*len > nx842_pseries_constraints.maximum) {
+		pr_debug("%s buffer len 0x%x over maximum 0x%x\n",
+			 in ? "input" : "output", *len,
+			 nx842_pseries_constraints.maximum);
+		if (in)
+			return -EINVAL;
+		*len = nx842_pseries_constraints.maximum;
+	}
+	return 0;
 }
 
+/* I assume we need to align the CSB? */
+#define WORKMEM_ALIGN	(256)
+
+struct nx842_workmem {
+	/* scatterlist */
+	char slin[4096];
+	char slout[4096];
+	/* coprocessor status/parameter block */
+	struct nx_csbcpb csbcpb;
+
+	char padding[WORKMEM_ALIGN];
+} __packed __aligned(WORKMEM_ALIGN);
+
 /* Macros for fields within nx_csbcpb */
 /* Check the valid bit within the csbcpb valid field */
 #define NX842_CSBCBP_VALID_CHK(x) (x & BIT_MASK(7))
@@ -72,8 +100,7 @@ static inline int nx842_header_size(const struct nx842_header *hdr)
 #define NX842_CSBCPB_CE2(x)	(x & BIT_MASK(5))
 
 /* The NX unit accepts data only on 4K page boundaries */
-#define NX842_HW_PAGE_SHIFT	SHIFT_4K
-#define NX842_HW_PAGE_SIZE	(ASM_CONST(1) << NX842_HW_PAGE_SHIFT)
+#define NX842_HW_PAGE_SIZE	(4096)
 #define NX842_HW_PAGE_MASK	(~(NX842_HW_PAGE_SIZE-1))
 
 enum nx842_status {
@@ -194,41 +221,6 @@ static int nx842_build_scatterlist(unsigned long buf, int len,
 	return 0;
 }
 
-/*
- * Working memory for software decompression
- */
-struct sw842_fifo {
-	union {
-		char f8[256][8];
-		char f4[512][4];
-	};
-	char f2[256][2];
-	unsigned char f84_full;
-	unsigned char f2_full;
-	unsigned char f8_count;
-	unsigned char f2_count;
-	unsigned int f4_count;
-};
-
-/*
- * Working memory for crypto API
- */
-struct nx842_workmem {
-	char bounce[PAGE_SIZE]; /* bounce buffer for decompression input */
-	union {
-		/* hardware working memory */
-		struct {
-			/* scatterlist */
-			char slin[SIZE_4K];
-			char slout[SIZE_4K];
-			/* coprocessor status/parameter block */
-			struct nx_csbcpb csbcpb;
-		};
-		/* software working memory */
-		struct sw842_fifo swfifo; /* software decompression fifo */
-	};
-};
-
 static int nx842_validate_result(struct device *dev,
 	struct cop_status_block *csb)
 {
@@ -291,8 +283,8 @@ static int nx842_validate_result(struct device *dev,
  * compressed data.  If there is an error then @outlen will be 0 and an
  * error will be specified by the return code from this function.
  *
- * @in: Pointer to input buffer, must be page aligned
- * @inlen: Length of input buffer, must be PAGE_SIZE
+ * @in: Pointer to input buffer
+ * @inlen: Length of input buffer
  * @out: Pointer to output buffer
  * @outlen: Length of output buffer
  * @wrkmem: ptr to buffer for working memory, size determined by
@@ -302,36 +294,32 @@ static int nx842_validate_result(struct device *dev,
  *   0		Success, output of length @outlen stored in the buffer at @out
  *   -ENOMEM	Unable to allocate internal buffers
  *   -ENOSPC	Output buffer is to small
- *   -EMSGSIZE	XXX Difficult to describe this limitation
  *   -EIO	Internal error
  *   -ENODEV	Hardware unavailable
  */
 static int nx842_pseries_compress(const unsigned char *in, unsigned int inlen,
 		       unsigned char *out, unsigned int *outlen, void *wmem)
 {
-	struct nx842_header *hdr;
 	struct nx842_devdata *local_devdata;
 	struct device *dev = NULL;
 	struct nx842_workmem *workmem;
 	struct nx842_scatterlist slin, slout;
 	struct nx_csbcpb *csbcpb;
-	int ret = 0, max_sync_size, i, bytesleft, size, hdrsize;
-	unsigned long inbuf, outbuf, padding;
+	int ret = 0, max_sync_size;
+	unsigned long inbuf, outbuf;
 	struct vio_pfo_op op = {
 		.done = NULL,
 		.handle = 0,
 		.timeout = 0,
 	};
-	unsigned long start_time = get_tb();
+	unsigned long start = get_tb();
 
-	/*
-	 * Make sure input buffer is 64k page aligned.  This is assumed since
-	 * this driver is designed for page compression only (for now).  This
-	 * is very nice since we can now use direct DDE(s) for the input and
-	 * the alignment is guaranteed.
-	*/
 	inbuf = (unsigned long)in;
-	if (!IS_ALIGNED(inbuf, PAGE_SIZE) || inlen != PAGE_SIZE)
+	if (check_constraints(inbuf, &inlen, true))
+		return -EINVAL;
+
+	outbuf = (unsigned long)out;
+	if (check_constraints(outbuf, outlen, false))
 		return -EINVAL;
 
 	rcu_read_lock();
@@ -343,16 +331,8 @@ static int nx842_pseries_compress(const unsigned char *in, unsigned int inlen,
 	max_sync_size = local_devdata->max_sync_size;
 	dev = local_devdata->dev;
 
-	/* Create the header */
-	hdr = (struct nx842_header *)out;
-	hdr->blocks_nr = PAGE_SIZE / max_sync_size;
-	hdrsize = nx842_header_size(hdr);
-	outbuf = (unsigned long)out + hdrsize;
-	bytesleft = *outlen - hdrsize;
-
 	/* Init scatterlist */
-	workmem = (struct nx842_workmem *)ALIGN((unsigned long)wmem,
-		NX842_HW_PAGE_SIZE);
+	workmem = PTR_ALIGN(wmem, WORKMEM_ALIGN);
 	slin.entries = (struct nx842_slentry *)workmem->slin;
 	slout.entries = (struct nx842_slentry *)workmem->slout;
 
@@ -363,105 +343,48 @@ static int nx842_pseries_compress(const unsigned char *in, unsigned int inlen,
 	op.csbcpb = nx842_get_pa(csbcpb);
 	op.out = nx842_get_pa(slout.entries);
 
-	for (i = 0; i < hdr->blocks_nr; i++) {
-		/*
-		 * Aligning the output blocks to 128 bytes does waste space,
-		 * but it prevents the need for bounce buffers and memory
-		 * copies.  It also simplifies the code a lot.  In the worst
-		 * case (64k page, 4k max_sync_size), you lose up to
-		 * (128*16)/64k = ~3% the compression factor. For 64k
-		 * max_sync_size, the loss would be at most 128/64k = ~0.2%.
-		 */
-		padding = ALIGN(outbuf, IO_BUFFER_ALIGN) - outbuf;
-		outbuf += padding;
-		bytesleft -= padding;
-		if (i == 0)
-			/* save offset into first block in header */
-			hdr->offset = padding + hdrsize;
-
-		if (bytesleft <= 0) {
-			ret = -ENOSPC;
-			goto unlock;
-		}
-
-		/*
-		 * NOTE: If the default max_sync_size is changed from 4k
-		 * to 64k, remove the "likely" case below, since a
-		 * scatterlist will always be needed.
-		 */
-		if (likely(max_sync_size == NX842_HW_PAGE_SIZE)) {
-			/* Create direct DDE */
-			op.in = nx842_get_pa((void *)inbuf);
-			op.inlen = max_sync_size;
-
-		} else {
-			/* Create indirect DDE (scatterlist) */
-			nx842_build_scatterlist(inbuf, max_sync_size, &slin);
-			op.in = nx842_get_pa(slin.entries);
-			op.inlen = -nx842_get_scatterlist_size(&slin);
-		}
+	if ((inbuf & NX842_HW_PAGE_MASK) ==
+	    ((inbuf + inlen - 1) & NX842_HW_PAGE_MASK)) {
+		/* Create direct DDE */
+		op.in = nx842_get_pa((void *)inbuf);
+		op.inlen = inlen;
+	} else {
+		/* Create indirect DDE (scatterlist) */
+		nx842_build_scatterlist(inbuf, inlen, &slin);
+		op.in = nx842_get_pa(slin.entries);
+		op.inlen = -nx842_get_scatterlist_size(&slin);
+	}
 
-		/*
-		 * If max_sync_size != NX842_HW_PAGE_SIZE, an indirect
-		 * DDE is required for the outbuf.
-		 * If max_sync_size == NX842_HW_PAGE_SIZE, outbuf must
-		 * also be page aligned (1 in 128/4k=32 chance) in order
-		 * to use a direct DDE.
-		 * This is unlikely, just use an indirect DDE always.
-		 */
-		nx842_build_scatterlist(outbuf,
-			min(bytesleft, max_sync_size), &slout);
-		/* op.out set before loop */
+	if ((outbuf & NX842_HW_PAGE_MASK) ==
+	    ((outbuf + *outlen - 1) & NX842_HW_PAGE_MASK)) {
+		/* Create direct DDE */
+		op.out = nx842_get_pa((void *)outbuf);
+		op.outlen = *outlen;
+	} else {
+		/* Create indirect DDE (scatterlist) */
+		nx842_build_scatterlist(outbuf, *outlen, &slout);
+		op.out = nx842_get_pa(slout.entries);
 		op.outlen = -nx842_get_scatterlist_size(&slout);
+	}
 
-		/* Send request to pHyp */
-		ret = vio_h_cop_sync(local_devdata->vdev, &op);
-
-		/* Check for pHyp error */
-		if (ret) {
-			dev_dbg(dev, "%s: vio_h_cop_sync error (ret=%d, hret=%ld)\n",
-				__func__, ret, op.hcall_err);
-			ret = -EIO;
-			goto unlock;
-		}
-
-		/* Check for hardware error */
-		ret = nx842_validate_result(dev, &csbcpb->csb);
-		if (ret && ret != -ENOSPC)
-			goto unlock;
-
-		/* Handle incompressible data */
-		if (unlikely(ret == -ENOSPC)) {
-			if (bytesleft < max_sync_size) {
-				/*
-				 * Not enough space left in the output buffer
-				 * to store uncompressed block
-				 */
-				goto unlock;
-			} else {
-				/* Store incompressible block */
-				memcpy((void *)outbuf, (void *)inbuf,
-					max_sync_size);
-				hdr->sizes[i] = -max_sync_size;
-				outbuf += max_sync_size;
-				bytesleft -= max_sync_size;
-				/* Reset ret, incompressible data handled */
-				ret = 0;
-			}
-		} else {
-			/* Normal case, compression was successful */
-			size = csbcpb->csb.processed_byte_count;
-			dev_dbg(dev, "%s: processed_bytes=%d\n",
-				__func__, size);
-			hdr->sizes[i] = size;
-			outbuf += size;
-			bytesleft -= size;
-		}
+	/* Send request to pHyp */
+	ret = vio_h_cop_sync(local_devdata->vdev, &op);
 
-		inbuf += max_sync_size;
+	/* Check for pHyp error */
+	if (ret) {
+		dev_dbg(dev, "%s: vio_h_cop_sync error (ret=%d, hret=%ld)\n",
+			__func__, ret, op.hcall_err);
+		ret = -EIO;
+		goto unlock;
 	}
 
-	*outlen = (unsigned int)(outbuf - (unsigned long)out);
+	/* Check for hardware error */
+	ret = nx842_validate_result(dev, &csbcpb->csb);
+	if (ret)
+		goto unlock;
+
+	*outlen = csbcpb->csb.processed_byte_count;
+	dev_dbg(dev, "%s: processed_bytes=%d\n", __func__, *outlen);
 
 unlock:
 	if (ret)
@@ -469,15 +392,12 @@ unlock:
 	else {
 		nx842_inc_comp_complete(local_devdata);
 		ibm_nx842_incr_hist(local_devdata->counters->comp_times,
-			(get_tb() - start_time) / tb_ticks_per_usec);
+			(get_tb() - start) / tb_ticks_per_usec);
 	}
 	rcu_read_unlock();
 	return ret;
 }
 
-static int sw842_decompress(const unsigned char *, int, unsigned char *, int *,
-			const void *);
-
 /**
  * nx842_pseries_decompress - Decompress data using the 842 algorithm
  *
@@ -489,11 +409,10 @@ static int sw842_decompress(const unsigned char *, int, unsigned char *, int *,
  * If there is an error then @outlen will be 0 and an error will be
  * specified by the return code from this function.
  *
- * @in: Pointer to input buffer, will use bounce buffer if not 128 byte
- *      aligned
+ * @in: Pointer to input buffer
  * @inlen: Length of input buffer
- * @out: Pointer to output buffer, must be page aligned
- * @outlen: Length of output buffer, must be PAGE_SIZE
+ * @out: Pointer to output buffer
+ * @outlen: Length of output buffer
  * @wrkmem: ptr to buffer for working memory, size determined by
  *          NX842_MEM_COMPRESS
  *
@@ -508,43 +427,39 @@ static int sw842_decompress(const unsigned char *, int, unsigned char *, int *,
 static int nx842_pseries_decompress(const unsigned char *in, unsigned int inlen,
 			 unsigned char *out, unsigned int *outlen, void *wmem)
 {
-	struct nx842_header *hdr;
 	struct nx842_devdata *local_devdata;
 	struct device *dev = NULL;
 	struct nx842_workmem *workmem;
 	struct nx842_scatterlist slin, slout;
 	struct nx_csbcpb *csbcpb;
-	int ret = 0, i, size, max_sync_size;
+	int ret = 0, max_sync_size;
 	unsigned long inbuf, outbuf;
 	struct vio_pfo_op op = {
 		.done = NULL,
 		.handle = 0,
 		.timeout = 0,
 	};
-	unsigned long start_time = get_tb();
+	unsigned long start = get_tb();
 
 	/* Ensure page alignment and size */
+	inbuf = (unsigned long)in;
+	if (check_constraints(inbuf, &inlen, true))
+		return -EINVAL;
+
 	outbuf = (unsigned long)out;
-	if (!IS_ALIGNED(outbuf, PAGE_SIZE) || *outlen != PAGE_SIZE)
+	if (check_constraints(outbuf, outlen, false))
 		return -EINVAL;
 
 	rcu_read_lock();
 	local_devdata = rcu_dereference(devdata);
-	if (local_devdata)
-		dev = local_devdata->dev;
-
-	/* Get header */
-	hdr = (struct nx842_header *)in;
-
-	workmem = (struct nx842_workmem *)ALIGN((unsigned long)wmem,
-		NX842_HW_PAGE_SIZE);
-
-	inbuf = (unsigned long)in + hdr->offset;
-	if (likely(!IS_ALIGNED(inbuf, IO_BUFFER_ALIGN))) {
-		/* Copy block(s) into bounce buffer for alignment */
-		memcpy(workmem->bounce, in + hdr->offset, inlen - hdr->offset);
-		inbuf = (unsigned long)workmem->bounce;
+	if (!local_devdata || !local_devdata->dev) {
+		rcu_read_unlock();
+		return -ENODEV;
 	}
+	max_sync_size = local_devdata->max_sync_size;
+	dev = local_devdata->dev;
+
+	workmem = PTR_ALIGN(wmem, WORKMEM_ALIGN);
 
 	/* Init scatterlist */
 	slin.entries = (struct nx842_slentry *)workmem->slin;
@@ -556,119 +471,55 @@ static int nx842_pseries_decompress(const unsigned char *in, unsigned int inlen,
 	memset(csbcpb, 0, sizeof(*csbcpb));
 	op.csbcpb = nx842_get_pa(csbcpb);
 
-	/*
-	 * max_sync_size may have changed since compression,
-	 * so we can't read it from the device info. We need
-	 * to derive it from hdr->blocks_nr.
-	 */
-	max_sync_size = PAGE_SIZE / hdr->blocks_nr;
-
-	for (i = 0; i < hdr->blocks_nr; i++) {
-		/* Skip padding */
-		inbuf = ALIGN(inbuf, IO_BUFFER_ALIGN);
-
-		if (hdr->sizes[i] < 0) {
-			/* Negative sizes indicate uncompressed data blocks */
-			size = abs(hdr->sizes[i]);
-			memcpy((void *)outbuf, (void *)inbuf, size);
-			outbuf += size;
-			inbuf += size;
-			continue;
-		}
-
-		if (!dev)
-			goto sw;
-
-		/*
-		 * The better the compression, the more likely the "likely"
-		 * case becomes.
-		 */
-		if (likely((inbuf & NX842_HW_PAGE_MASK) ==
-			((inbuf + hdr->sizes[i] - 1) & NX842_HW_PAGE_MASK))) {
-			/* Create direct DDE */
-			op.in = nx842_get_pa((void *)inbuf);
-			op.inlen = hdr->sizes[i];
-		} else {
-			/* Create indirect DDE (scatterlist) */
-			nx842_build_scatterlist(inbuf, hdr->sizes[i] , &slin);
-			op.in = nx842_get_pa(slin.entries);
-			op.inlen = -nx842_get_scatterlist_size(&slin);
-		}
-
-		/*
-		 * NOTE: If the default max_sync_size is changed from 4k
-		 * to 64k, remove the "likely" case below, since a
-		 * scatterlist will always be needed.
-		 */
-		if (likely(max_sync_size == NX842_HW_PAGE_SIZE)) {
-			/* Create direct DDE */
-			op.out = nx842_get_pa((void *)outbuf);
-			op.outlen = max_sync_size;
-		} else {
-			/* Create indirect DDE (scatterlist) */
-			nx842_build_scatterlist(outbuf, max_sync_size, &slout);
-			op.out = nx842_get_pa(slout.entries);
-			op.outlen = -nx842_get_scatterlist_size(&slout);
-		}
-
-		/* Send request to pHyp */
-		ret = vio_h_cop_sync(local_devdata->vdev, &op);
-
-		/* Check for pHyp error */
-		if (ret) {
-			dev_dbg(dev, "%s: vio_h_cop_sync error (ret=%d, hret=%ld)\n",
-				__func__, ret, op.hcall_err);
-			dev = NULL;
-			goto sw;
-		}
+	if ((inbuf & NX842_HW_PAGE_MASK) ==
+	    ((inbuf + inlen - 1) & NX842_HW_PAGE_MASK)) {
+		/* Create direct DDE */
+		op.in = nx842_get_pa((void *)inbuf);
+		op.inlen = inlen;
+	} else {
+		/* Create indirect DDE (scatterlist) */
+		nx842_build_scatterlist(inbuf, inlen, &slin);
+		op.in = nx842_get_pa(slin.entries);
+		op.inlen = -nx842_get_scatterlist_size(&slin);
+	}
 
-		/* Check for hardware error */
-		ret = nx842_validate_result(dev, &csbcpb->csb);
-		if (ret) {
-			dev = NULL;
-			goto sw;
-		}
+	if ((outbuf & NX842_HW_PAGE_MASK) ==
+	    ((outbuf + *outlen - 1) & NX842_HW_PAGE_MASK)) {
+		/* Create direct DDE */
+		op.out = nx842_get_pa((void *)outbuf);
+		op.outlen = *outlen;
+	} else {
+		/* Create indirect DDE (scatterlist) */
+		nx842_build_scatterlist(outbuf, *outlen, &slout);
+		op.out = nx842_get_pa(slout.entries);
+		op.outlen = -nx842_get_scatterlist_size(&slout);
+	}
 
-		/* HW decompression success */
-		inbuf += hdr->sizes[i];
-		outbuf += csbcpb->csb.processed_byte_count;
-		continue;
-
-sw:
-		/* software decompression */
-		size = max_sync_size;
-		ret = sw842_decompress(
-			(unsigned char *)inbuf, hdr->sizes[i],
-			(unsigned char *)outbuf, &size, wmem);
-		if (ret)
-			pr_debug("%s: sw842_decompress failed with %d\n",
-				__func__, ret);
-
-		if (ret) {
-			if (ret != -ENOSPC && ret != -EINVAL &&
-					ret != -EMSGSIZE)
-				ret = -EIO;
-			goto unlock;
-		}
+	/* Send request to pHyp */
+	ret = vio_h_cop_sync(local_devdata->vdev, &op);
 
-		/* SW decompression success */
-		inbuf += hdr->sizes[i];
-		outbuf += size;
+	/* Check for pHyp error */
+	if (ret) {
+		dev_dbg(dev, "%s: vio_h_cop_sync error (ret=%d, hret=%ld)\n",
+			__func__, ret, op.hcall_err);
+		goto unlock;
 	}
 
-	*outlen = (unsigned int)(outbuf - (unsigned long)out);
+	/* Check for hardware error */
+	ret = nx842_validate_result(dev, &csbcpb->csb);
+	if (ret)
+		goto unlock;
+
+	*outlen = csbcpb->csb.processed_byte_count;
 
 unlock:
 	if (ret)
 		/* decompress fail */
 		nx842_inc_decomp_failed(local_devdata);
 	else {
-		if (!dev)
-			/* software decompress */
-			nx842_inc_swdecomp(local_devdata);
 		nx842_inc_decomp_complete(local_devdata);
 		ibm_nx842_incr_hist(local_devdata->counters->decomp_times,
-			(get_tb() - start_time) / tb_ticks_per_usec);
+			(get_tb() - start) / tb_ticks_per_usec);
 	}
 
 	rcu_read_unlock();
@@ -827,9 +678,9 @@ static int nx842_OF_upd_maxsyncop(struct nx842_devdata *devdata,
 					maxsynccop->decomp_data_limit);
 
 	devdata->max_sync_size = min_t(unsigned int, devdata->max_sync_size,
-					SIZE_64K);
+					65536);
 
-	if (devdata->max_sync_size < SIZE_4K) {
+	if (devdata->max_sync_size < 4096) {
 		dev_err(devdata->dev, "%s: hardware max data size (%u) is "
 				"less than the driver minimum, unable to use "
 				"the hardware device\n",
@@ -1218,17 +1069,17 @@ static int __exit nx842_remove(struct vio_dev *viodev)
 	return 0;
 }
 
-static struct vio_device_id nx842_driver_ids[] = {
+static struct vio_device_id nx842_vio_driver_ids[] = {
 	{NX842_PSERIES_COMPAT_NAME "-v1", NX842_PSERIES_COMPAT_NAME},
 	{"", ""},
 };
 
-static struct vio_driver nx842_driver = {
+static struct vio_driver nx842_vio_driver = {
 	.name = MODULE_NAME,
 	.probe = nx842_probe,
 	.remove = __exit_p(nx842_remove),
 	.get_desired_dma = nx842_get_desired_dma,
-	.id_table = nx842_driver_ids,
+	.id_table = nx842_vio_driver_ids,
 };
 
 static int __init nx842_init(void)
@@ -1247,7 +1098,7 @@ static int __init nx842_init(void)
 	new_devdata->status = UNAVAILABLE;
 	RCU_INIT_POINTER(devdata, new_devdata);
 
-	return vio_register_driver(&nx842_driver);
+	return vio_register_driver(&nx842_vio_driver);
 }
 
 module_init(nx842_init);
@@ -1264,336 +1115,12 @@ static void __exit nx842_exit(void)
 	RCU_INIT_POINTER(devdata, NULL);
 	spin_unlock_irqrestore(&devdata_mutex, flags);
 	synchronize_rcu();
-	if (old_devdata)
+	if (old_devdata && old_devdata->dev)
 		dev_set_drvdata(old_devdata->dev, NULL);
 	kfree(old_devdata);
 	nx842_unregister_driver(&nx842_pseries_driver);
-	vio_unregister_driver(&nx842_driver);
+	vio_unregister_driver(&nx842_vio_driver);
 }
 
 module_exit(nx842_exit);
 
-/*********************************
- * 842 software decompressor
-*********************************/
-typedef int (*sw842_template_op)(const char **, int *, unsigned char **,
-						struct sw842_fifo *);
-
-static int sw842_data8(const char **, int *, unsigned char **,
-						struct sw842_fifo *);
-static int sw842_data4(const char **, int *, unsigned char **,
-						struct sw842_fifo *);
-static int sw842_data2(const char **, int *, unsigned char **,
-						struct sw842_fifo *);
-static int sw842_ptr8(const char **, int *, unsigned char **,
-						struct sw842_fifo *);
-static int sw842_ptr4(const char **, int *, unsigned char **,
-						struct sw842_fifo *);
-static int sw842_ptr2(const char **, int *, unsigned char **,
-						struct sw842_fifo *);
-
-/* special templates */
-#define SW842_TMPL_REPEAT 0x1B
-#define SW842_TMPL_ZEROS 0x1C
-#define SW842_TMPL_EOF 0x1E
-
-static sw842_template_op sw842_tmpl_ops[26][4] = {
-	{ sw842_data8, NULL}, /* 0 (00000) */
-	{ sw842_data4, sw842_data2, sw842_ptr2,  NULL},
-	{ sw842_data4, sw842_ptr2,  sw842_data2, NULL},
-	{ sw842_data4, sw842_ptr2,  sw842_ptr2,  NULL},
-	{ sw842_data4, sw842_ptr4,  NULL},
-	{ sw842_data2, sw842_ptr2,  sw842_data4, NULL},
-	{ sw842_data2, sw842_ptr2,  sw842_data2, sw842_ptr2},
-	{ sw842_data2, sw842_ptr2,  sw842_ptr2,  sw842_data2},
-	{ sw842_data2, sw842_ptr2,  sw842_ptr2,  sw842_ptr2,},
-	{ sw842_data2, sw842_ptr2,  sw842_ptr4,  NULL},
-	{ sw842_ptr2,  sw842_data2, sw842_data4, NULL}, /* 10 (01010) */
-	{ sw842_ptr2,  sw842_data4, sw842_ptr2,  NULL},
-	{ sw842_ptr2,  sw842_data2, sw842_ptr2,  sw842_data2},
-	{ sw842_ptr2,  sw842_data2, sw842_ptr2,  sw842_ptr2},
-	{ sw842_ptr2,  sw842_data2, sw842_ptr4,  NULL},
-	{ sw842_ptr2,  sw842_ptr2,  sw842_data4, NULL},
-	{ sw842_ptr2,  sw842_ptr2,  sw842_data2, sw842_ptr2},
-	{ sw842_ptr2,  sw842_ptr2,  sw842_ptr2,  sw842_data2},
-	{ sw842_ptr2,  sw842_ptr2,  sw842_ptr2,  sw842_ptr2},
-	{ sw842_ptr2,  sw842_ptr2,  sw842_ptr4,  NULL},
-	{ sw842_ptr4,  sw842_data4, NULL}, /* 20 (10100) */
-	{ sw842_ptr4,  sw842_data2, sw842_ptr2,  NULL},
-	{ sw842_ptr4,  sw842_ptr2,  sw842_data2, NULL},
-	{ sw842_ptr4,  sw842_ptr2,  sw842_ptr2,  NULL},
-	{ sw842_ptr4,  sw842_ptr4,  NULL},
-	{ sw842_ptr8,  NULL}
-};
-
-/* Software decompress helpers */
-
-static uint8_t sw842_get_byte(const char *buf, int bit)
-{
-	uint8_t tmpl;
-	uint16_t tmp;
-	tmp = htons(*(uint16_t *)(buf));
-	tmp = (uint16_t)(tmp << bit);
-	tmp = ntohs(tmp);
-	memcpy(&tmpl, &tmp, 1);
-	return tmpl;
-}
-
-static uint8_t sw842_get_template(const char **buf, int *bit)
-{
-	uint8_t byte;
-	byte = sw842_get_byte(*buf, *bit);
-	byte = byte >> 3;
-	byte &= 0x1F;
-	*buf += (*bit + 5) / 8;
-	*bit = (*bit + 5) % 8;
-	return byte;
-}
-
-/* repeat_count happens to be 5-bit too (like the template) */
-static uint8_t sw842_get_repeat_count(const char **buf, int *bit)
-{
-	uint8_t byte;
-	byte = sw842_get_byte(*buf, *bit);
-	byte = byte >> 2;
-	byte &= 0x3F;
-	*buf += (*bit + 6) / 8;
-	*bit = (*bit + 6) % 8;
-	return byte;
-}
-
-static uint8_t sw842_get_ptr2(const char **buf, int *bit)
-{
-	uint8_t ptr;
-	ptr = sw842_get_byte(*buf, *bit);
-	(*buf)++;
-	return ptr;
-}
-
-static uint16_t sw842_get_ptr4(const char **buf, int *bit,
-		struct sw842_fifo *fifo)
-{
-	uint16_t ptr;
-	ptr = htons(*(uint16_t *)(*buf));
-	ptr = (uint16_t)(ptr << *bit);
-	ptr = ptr >> 7;
-	ptr &= 0x01FF;
-	*buf += (*bit + 9) / 8;
-	*bit = (*bit + 9) % 8;
-	return ptr;
-}
-
-static uint8_t sw842_get_ptr8(const char **buf, int *bit,
-		struct sw842_fifo *fifo)
-{
-	return sw842_get_ptr2(buf, bit);
-}
-
-/* Software decompress template ops */
-
-static int sw842_data8(const char **inbuf, int *inbit,
-		unsigned char **outbuf, struct sw842_fifo *fifo)
-{
-	int ret;
-
-	ret = sw842_data4(inbuf, inbit, outbuf, fifo);
-	if (ret)
-		return ret;
-	ret = sw842_data4(inbuf, inbit, outbuf, fifo);
-	return ret;
-}
-
-static int sw842_data4(const char **inbuf, int *inbit,
-		unsigned char **outbuf, struct sw842_fifo *fifo)
-{
-	int ret;
-
-	ret = sw842_data2(inbuf, inbit, outbuf, fifo);
-	if (ret)
-		return ret;
-	ret = sw842_data2(inbuf, inbit, outbuf, fifo);
-	return ret;
-}
-
-static int sw842_data2(const char **inbuf, int *inbit,
-		unsigned char **outbuf, struct sw842_fifo *fifo)
-{
-	**outbuf = sw842_get_byte(*inbuf, *inbit);
-	(*inbuf)++;
-	(*outbuf)++;
-	**outbuf = sw842_get_byte(*inbuf, *inbit);
-	(*inbuf)++;
-	(*outbuf)++;
-	return 0;
-}
-
-static int sw842_ptr8(const char **inbuf, int *inbit,
-		unsigned char **outbuf, struct sw842_fifo *fifo)
-{
-	uint8_t ptr;
-	ptr = sw842_get_ptr8(inbuf, inbit, fifo);
-	if (!fifo->f84_full && (ptr >= fifo->f8_count))
-		return 1;
-	memcpy(*outbuf, fifo->f8[ptr], 8);
-	*outbuf += 8;
-	return 0;
-}
-
-static int sw842_ptr4(const char **inbuf, int *inbit,
-		unsigned char **outbuf, struct sw842_fifo *fifo)
-{
-	uint16_t ptr;
-	ptr = sw842_get_ptr4(inbuf, inbit, fifo);
-	if (!fifo->f84_full && (ptr >= fifo->f4_count))
-		return 1;
-	memcpy(*outbuf, fifo->f4[ptr], 4);
-	*outbuf += 4;
-	return 0;
-}
-
-static int sw842_ptr2(const char **inbuf, int *inbit,
-		unsigned char **outbuf, struct sw842_fifo *fifo)
-{
-	uint8_t ptr;
-	ptr = sw842_get_ptr2(inbuf, inbit);
-	if (!fifo->f2_full && (ptr >= fifo->f2_count))
-		return 1;
-	memcpy(*outbuf, fifo->f2[ptr], 2);
-	*outbuf += 2;
-	return 0;
-}
-
-static void sw842_copy_to_fifo(const char *buf, struct sw842_fifo *fifo)
-{
-	unsigned char initial_f2count = fifo->f2_count;
-
-	memcpy(fifo->f8[fifo->f8_count], buf, 8);
-	fifo->f4_count += 2;
-	fifo->f8_count += 1;
-
-	if (!fifo->f84_full && fifo->f4_count >= 512) {
-		fifo->f84_full = 1;
-		fifo->f4_count /= 512;
-	}
-
-	memcpy(fifo->f2[fifo->f2_count++], buf, 2);
-	memcpy(fifo->f2[fifo->f2_count++], buf + 2, 2);
-	memcpy(fifo->f2[fifo->f2_count++], buf + 4, 2);
-	memcpy(fifo->f2[fifo->f2_count++], buf + 6, 2);
-	if (fifo->f2_count < initial_f2count)
-		fifo->f2_full = 1;
-}
-
-static int sw842_decompress(const unsigned char *src, int srclen,
-			unsigned char *dst, int *destlen,
-			const void *wrkmem)
-{
-	uint8_t tmpl;
-	const char *inbuf;
-	int inbit = 0;
-	unsigned char *outbuf, *outbuf_end, *origbuf, *prevbuf;
-	const char *inbuf_end;
-	sw842_template_op op;
-	int opindex;
-	int i, repeat_count;
-	struct sw842_fifo *fifo;
-	int ret = 0;
-
-	fifo = &((struct nx842_workmem *)(wrkmem))->swfifo;
-	memset(fifo, 0, sizeof(*fifo));
-
-	origbuf = NULL;
-	inbuf = src;
-	inbuf_end = src + srclen;
-	outbuf = dst;
-	outbuf_end = dst + *destlen;
-
-	while ((tmpl = sw842_get_template(&inbuf, &inbit)) != SW842_TMPL_EOF) {
-		if (inbuf >= inbuf_end) {
-			ret = -EINVAL;
-			goto out;
-		}
-
-		opindex = 0;
-		prevbuf = origbuf;
-		origbuf = outbuf;
-		switch (tmpl) {
-		case SW842_TMPL_REPEAT:
-			if (prevbuf == NULL) {
-				ret = -EINVAL;
-				goto out;
-			}
-
-			repeat_count = sw842_get_repeat_count(&inbuf,
-								&inbit) + 1;
-
-			/* Did the repeat count advance past the end of input */
-			if (inbuf > inbuf_end) {
-				ret = -EINVAL;
-				goto out;
-			}
-
-			for (i = 0; i < repeat_count; i++) {
-				/* Would this overflow the output buffer */
-				if ((outbuf + 8) > outbuf_end) {
-					ret = -ENOSPC;
-					goto out;
-				}
-
-				memcpy(outbuf, prevbuf, 8);
-				sw842_copy_to_fifo(outbuf, fifo);
-				outbuf += 8;
-			}
-			break;
-
-		case SW842_TMPL_ZEROS:
-			/* Would this overflow the output buffer */
-			if ((outbuf + 8) > outbuf_end) {
-				ret = -ENOSPC;
-				goto out;
-			}
-
-			memset(outbuf, 0, 8);
-			sw842_copy_to_fifo(outbuf, fifo);
-			outbuf += 8;
-			break;
-
-		default:
-			if (tmpl > 25) {
-				ret = -EINVAL;
-				goto out;
-			}
-
-			/* Does this go past the end of the input buffer */
-			if ((inbuf + 2) > inbuf_end) {
-				ret = -EINVAL;
-				goto out;
-			}
-
-			/* Would this overflow the output buffer */
-			if ((outbuf + 8) > outbuf_end) {
-				ret = -ENOSPC;
-				goto out;
-			}
-
-			while (opindex < 4 &&
-				(op = sw842_tmpl_ops[tmpl][opindex++])
-					!= NULL) {
-				ret = (*op)(&inbuf, &inbit, &outbuf, fifo);
-				if (ret) {
-					ret = -EINVAL;
-					goto out;
-				}
-				sw842_copy_to_fifo(origbuf, fifo);
-			}
-		}
-	}
-
-out:
-	if (!ret)
-		*destlen = (unsigned int)(outbuf - dst);
-	else
-		*destlen = 0;
-
-	return ret;
-}
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 09/11] crypto: remove LZO fallback from crypto 842
  2015-04-07 17:34 [PATCH 00/11] add 842 hw compression for PowerNV platform Dan Streetman
                   ` (7 preceding siblings ...)
  2015-04-07 17:34 ` [PATCH 08/11] drivers/crypto/nx: simplify pSeries nx842 driver Dan Streetman
@ 2015-04-07 17:34 ` Dan Streetman
  2015-04-08 14:16   ` Herbert Xu
  2015-04-07 17:34 ` [PATCH 10/11] crypto: rewrite crypto 842 to use nx842 constraints Dan Streetman
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 46+ messages in thread
From: Dan Streetman @ 2015-04-07 17:34 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras
  Cc: Seth Jennings, Robert Jennings, linux-crypto, linuxppc-dev,
	linux-kernel, Dan Streetman

Update the crypto 842 driver to no longer fallback to LZO if the 842
hardware is unavailable.  Simplify the crpypto 842 driver to remove all
headers indicating 842/lzo.

The crypto 842 driver should do 842-format compression and decompression
only.  It should not fallback to LZO compression/decompression.  The
user of the crypto 842 driver can fallback to another format if desired.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
---
 crypto/842.c   | 139 ++++++++++++---------------------------------------------
 crypto/Kconfig |   4 +-
 2 files changed, 29 insertions(+), 114 deletions(-)

diff --git a/crypto/842.c b/crypto/842.c
index d21cedb..d81c6c7 100644
--- a/crypto/842.c
+++ b/crypto/842.c
@@ -26,128 +26,46 @@
 #include <linux/crypto.h>
 #include <linux/vmalloc.h>
 #include <linux/nx842.h>
-#include <linux/lzo.h>
+#include <linux/sw842.h>
 #include <linux/timer.h>
 
-static int nx842_uselzo;
-
-struct nx842_ctx {
-	void *nx842_wmem; /* working memory for 842/lzo */
-};
-
-enum nx842_crypto_type {
-	NX842_CRYPTO_TYPE_842,
-	NX842_CRYPTO_TYPE_LZO
+struct crypto842_ctx {
+	void *wmem; /* working memory for 842 */
 };
 
-#define NX842_SENTINEL 0xdeadbeef
-
-struct nx842_crypto_header {
-	unsigned int sentinel; /* debug */
-	enum nx842_crypto_type type;
-};
-
-static int nx842_init(struct crypto_tfm *tfm)
+static int crypto842_init(struct crypto_tfm *tfm)
 {
-	struct nx842_ctx *ctx = crypto_tfm_ctx(tfm);
-	int wmemsize;
+	struct crypto842_ctx *ctx = crypto_tfm_ctx(tfm);
 
-	wmemsize = max_t(int, NX842_MEM_COMPRESS, LZO1X_MEM_COMPRESS);
-	ctx->nx842_wmem = kmalloc(wmemsize, GFP_NOFS);
-	if (!ctx->nx842_wmem)
+	ctx->wmem = kmalloc(NX842_MEM_COMPRESS, GFP_NOFS);
+	if (!ctx->wmem)
 		return -ENOMEM;
 
 	return 0;
 }
 
-static void nx842_exit(struct crypto_tfm *tfm)
+static void crypto842_exit(struct crypto_tfm *tfm)
 {
-	struct nx842_ctx *ctx = crypto_tfm_ctx(tfm);
+	struct crypto842_ctx *ctx = crypto_tfm_ctx(tfm);
 
-	kfree(ctx->nx842_wmem);
+	kfree(ctx->wmem);
 }
 
-static void nx842_reset_uselzo(unsigned long data)
-{
-	nx842_uselzo = 0;
-}
-
-static DEFINE_TIMER(failover_timer, nx842_reset_uselzo, 0, 0);
-
-static int nx842_crypto_compress(struct crypto_tfm *tfm, const u8 *src,
+static int crypto842_compress(struct crypto_tfm *tfm, const u8 *src,
 			    unsigned int slen, u8 *dst, unsigned int *dlen)
 {
-	struct nx842_ctx *ctx = crypto_tfm_ctx(tfm);
-	struct nx842_crypto_header *hdr;
-	unsigned int tmp_len = *dlen;
-	size_t lzodlen; /* needed for lzo */
-	int err;
-
-	*dlen = 0;
-	hdr = (struct nx842_crypto_header *)dst;
-	hdr->sentinel = NX842_SENTINEL; /* debug */
-	dst += sizeof(struct nx842_crypto_header);
-	tmp_len -= sizeof(struct nx842_crypto_header);
-	lzodlen = tmp_len;
-
-	if (likely(!nx842_uselzo)) {
-		err = nx842_compress(src, slen, dst, &tmp_len, ctx->nx842_wmem);
-
-		if (likely(!err)) {
-			hdr->type = NX842_CRYPTO_TYPE_842;
-			*dlen = tmp_len + sizeof(struct nx842_crypto_header);
-			return 0;
-		}
-
-		/* hardware failed */
-		nx842_uselzo = 1;
-
-		/* set timer to check for hardware again in 1 second */
-		mod_timer(&failover_timer, jiffies + msecs_to_jiffies(1000));
-	}
-
-	/* no hardware, use lzo */
-	err = lzo1x_1_compress(src, slen, dst, &lzodlen, ctx->nx842_wmem);
-	if (err != LZO_E_OK)
-		return -EINVAL;
-
-	hdr->type = NX842_CRYPTO_TYPE_LZO;
-	*dlen = lzodlen + sizeof(struct nx842_crypto_header);
-	return 0;
+	struct crypto842_ctx *ctx = crypto_tfm_ctx(tfm);
+
+	return nx842_compress(src, slen, dst, dlen, ctx->wmem);
 }
 
-static int nx842_crypto_decompress(struct crypto_tfm *tfm, const u8 *src,
+static int crypto842_decompress(struct crypto_tfm *tfm, const u8 *src,
 			      unsigned int slen, u8 *dst, unsigned int *dlen)
 {
-	struct nx842_ctx *ctx = crypto_tfm_ctx(tfm);
-	struct nx842_crypto_header *hdr;
-	unsigned int tmp_len = *dlen;
-	size_t lzodlen; /* needed for lzo */
-	int err;
-
-	*dlen = 0;
-	hdr = (struct nx842_crypto_header *)src;
-
-	if (unlikely(hdr->sentinel != NX842_SENTINEL))
-		return -EINVAL;
-
-	src += sizeof(struct nx842_crypto_header);
-	slen -= sizeof(struct nx842_crypto_header);
-
-	if (likely(hdr->type == NX842_CRYPTO_TYPE_842)) {
-		err = nx842_decompress(src, slen, dst, &tmp_len,
-			ctx->nx842_wmem);
-		if (err)
-			return -EINVAL;
-		*dlen = tmp_len;
-	} else if (hdr->type == NX842_CRYPTO_TYPE_LZO) {
-		lzodlen = tmp_len;
-		err = lzo1x_decompress_safe(src, slen, dst, &lzodlen);
-		if (err != LZO_E_OK)
-			return -EINVAL;
-		*dlen = lzodlen;
-	} else
-		return -EINVAL;
+	struct crypto842_ctx *ctx = crypto_tfm_ctx(tfm);
+
+	if (nx842_decompress(src, slen, dst, dlen, ctx->wmem))
+		return sw842_decompress(src, slen, dst, dlen);
 
 	return 0;
 }
@@ -155,28 +73,27 @@ static int nx842_crypto_decompress(struct crypto_tfm *tfm, const u8 *src,
 static struct crypto_alg alg = {
 	.cra_name		= "842",
 	.cra_flags		= CRYPTO_ALG_TYPE_COMPRESS,
-	.cra_ctxsize		= sizeof(struct nx842_ctx),
+	.cra_ctxsize		= sizeof(struct crypto842_ctx),
 	.cra_module		= THIS_MODULE,
-	.cra_init		= nx842_init,
-	.cra_exit		= nx842_exit,
+	.cra_init		= crypto842_init,
+	.cra_exit		= crypto842_exit,
 	.cra_u			= { .compress = {
-	.coa_compress		= nx842_crypto_compress,
-	.coa_decompress		= nx842_crypto_decompress } }
+	.coa_compress		= crypto842_compress,
+	.coa_decompress		= crypto842_decompress } }
 };
 
-static int __init nx842_mod_init(void)
+static int __init crypto842_mod_init(void)
 {
-	del_timer(&failover_timer);
 	return crypto_register_alg(&alg);
 }
 
-static void __exit nx842_mod_exit(void)
+static void __exit crypto842_mod_exit(void)
 {
 	crypto_unregister_alg(&alg);
 }
 
-module_init(nx842_mod_init);
-module_exit(nx842_mod_exit);
+module_init(crypto842_mod_init);
+module_exit(crypto842_mod_exit);
 
 MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION("842 Compression Algorithm");
diff --git a/crypto/Kconfig b/crypto/Kconfig
index 50f4da4..a7148ff 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -1424,9 +1424,7 @@ config CRYPTO_LZO
 config CRYPTO_842
 	tristate "842 compression algorithm"
 	depends on CRYPTO_DEV_NX_COMPRESS
-	# 842 uses lzo if the hardware becomes unavailable
-	select LZO_COMPRESS
-	select LZO_DECOMPRESS
+	select 842_DECOMPRESS
 	help
 	  This is the 842 algorithm.
 
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 10/11] crypto: rewrite crypto 842 to use nx842 constraints
  2015-04-07 17:34 [PATCH 00/11] add 842 hw compression for PowerNV platform Dan Streetman
                   ` (8 preceding siblings ...)
  2015-04-07 17:34 ` [PATCH 09/11] crypto: remove LZO fallback from crypto 842 Dan Streetman
@ 2015-04-07 17:34 ` Dan Streetman
  2015-04-07 17:34 ` [PATCH 11/11] crypto: add crypto compression sefltest Dan Streetman
  2015-05-06 16:50 ` [PATCHv2 00/10] add 842 hw compression for PowerNV platform Dan Streetman
  11 siblings, 0 replies; 46+ messages in thread
From: Dan Streetman @ 2015-04-07 17:34 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras
  Cc: Seth Jennings, Robert Jennings, linux-crypto, linuxppc-dev,
	linux-kernel, Dan Streetman

Major rewrite of the crypto 842 driver to use the "constraints" from the
NX-842 hardware driver, and split and/or shift input or output buffers to
fit the required alignment/length constraints.  Add a header to the compressed
buffers.  Update the MAINTAINERS 842 section with the crypto 842 filename.
Fallback to software 842 decompression if the NX-842 hardware fails.

Now that the NX-842 hardware driver provides information about its constraints,
this updates the main crypto 842 driver to adjust each incoming buffer to those
hw constraints.  This allows using the 842 compression hardware with any
alignment or length buffers; previously with only the pSeries NX-842 driver
all (uncompressed) buffers needed to be page-sized and page-aligned.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
---
 MAINTAINERS  |   1 +
 crypto/842.c | 414 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 399 insertions(+), 16 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 5a8d46d..850540d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4837,6 +4837,7 @@ S:	Supported
 F:	drivers/crypto/nx/nx-842*
 F:	include/linux/nx842.h
 F:	include/linux/sw842.h
+F:	crypto/842.c
 F:	lib/842/
 
 IBM Power Linux RAID adapter
diff --git a/crypto/842.c b/crypto/842.c
index d81c6c7..7f35c49 100644
--- a/crypto/842.c
+++ b/crypto/842.c
@@ -11,14 +11,22 @@
  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  * GNU General Public License for more details.
  *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ * Copyright (C) IBM Corporation, 2011-2015
  *
- * Copyright (C) IBM Corporation, 2011
+ * Original Authors: Robert Jennings <rcj@linux.vnet.ibm.com>
+ *                   Seth Jennings <sjenning@linux.vnet.ibm.com>
  *
- * Authors: Robert Jennings <rcj@linux.vnet.ibm.com>
- *          Seth Jennings <sjenning@linux.vnet.ibm.com>
+ * Rewrite: Dan Streetman <ddstreet@ieee.org>
+ *
+ * This is an interface to the NX-842 compression hardware in PowerPC
+ * processors (see drivers/crypto/nx/nx-842.c for details).  Most (all?)
+ * of the complexity of this drvier is due to the fact that the NX-842
+ * compression hardware requires the input and output data buffers to be
+ * specifically aligned, to be a specific multiple in length, and within
+ * specific minimum and maximum lengths.  Those restrictions, provided by
+ * the nx-842 driver via is nx842_constraints, mean this driver must use
+ * bounce buffers and headers to correct misaligned in or out buffers,
+ * and to split input buffers that are too large.
  */
 
 #include <linux/init.h>
@@ -27,10 +35,59 @@
 #include <linux/vmalloc.h>
 #include <linux/nx842.h>
 #include <linux/sw842.h>
-#include <linux/timer.h>
+#include <linux/ratelimit.h>
+
+/* The first 5 bits of this magic are 0x1f, which is an invalid 842 5-bit
+ * template (see lib/842/842_decompress.c), so this magic number
+ * will never appear at the start of a raw 842 compressed buffer.
+ * That can be useful in the future, if buffer alignment and length is
+ * correct, to not require the use of any header, which will save some
+ * space in the resulting compressed buffer; then in decompress, if the
+ * input buffer does not contain this header magic, it's assumed to be
+ * a raw compressed buffer and should be passed directly to the NX-842
+ * hardware driver.
+ */
+#define CRYPTO_842_MAGIC	(0xf842)
+#define CRYPTO_842_GROUP_MAX	(0x19)	/* max 0-based index, real max is +1 */
+#define CRYPTO_842_HEADER_SIZE(h)				\
+	(sizeof(*(h)) +	sizeof((h)->group) * (h)->groups)
+#define CRYPTO_842_HEADER_MAX_SIZE					\
+	(sizeof(struct crypto842_header) +				\
+	 sizeof(struct crypto842_header_group) * CRYPTO_842_GROUP_MAX)
+
+/* try longer on comp because we can fallback to sw decomp if hw is busy */
+#define COMP_BUSY_TIMEOUT	(250) /* ms */
+#define DECOMP_BUSY_TIMEOUT	(50) /* ms */
+
+struct crypto842_header_group {
+	u16 padding;	/* unused bytes at start of group */
+	u32 length;	/* length of group, not including padding */
+} __packed;
+
+struct crypto842_header {
+	u16 magic;		/* CRYPTO_842_MAGIC */
+	u16 ignore;		/* decompressed end bytes to ignore */
+	u8 groups;		/* 0-based; add 1 for total */
+	struct crypto842_header_group group[1];
+} __packed;
+
+struct crypto842_param {
+	u8 *in;
+	long iremain;
+	u8 *out;
+	long oremain;
+	long ototal;
+};
 
 struct crypto842_ctx {
-	void *wmem; /* working memory for 842 */
+	void *wmem;	/* working memory for 842 */
+	void *bounce;	/* bounce buffer to correct alignment */
+
+	/* header includes 1 group, so the total usable groups are
+	 * max + 1; meaning max is the highest valid 0-based index.
+	 */
+	struct crypto842_header header;
+	struct crypto842_header_group group[CRYPTO_842_GROUP_MAX];
 };
 
 static int crypto842_init(struct crypto_tfm *tfm)
@@ -38,8 +95,12 @@ static int crypto842_init(struct crypto_tfm *tfm)
 	struct crypto842_ctx *ctx = crypto_tfm_ctx(tfm);
 
 	ctx->wmem = kmalloc(NX842_MEM_COMPRESS, GFP_NOFS);
-	if (!ctx->wmem)
+	ctx->bounce = (void *)__get_free_page(GFP_NOFS);
+	if (!ctx->wmem || !ctx->bounce) {
+		kfree(ctx->wmem);
+		free_page((unsigned long)ctx->bounce);
 		return -ENOMEM;
+	}
 
 	return 0;
 }
@@ -49,23 +110,343 @@ static void crypto842_exit(struct crypto_tfm *tfm)
 	struct crypto842_ctx *ctx = crypto_tfm_ctx(tfm);
 
 	kfree(ctx->wmem);
+	free_page((unsigned long)ctx->bounce);
+}
+
+static inline bool check_len(char *type, long len)
+{
+	bool valid = len > 0;
+
+	if (!valid)
+		pr_err("invalid %s length 0x%lx\n", type, len);
+
+	return valid;
 }
 
-static int crypto842_compress(struct crypto_tfm *tfm, const u8 *src,
-			    unsigned int slen, u8 *dst, unsigned int *dlen)
+static int read_constraints(struct nx842_constraints *c)
+{
+	int ret;
+
+	ret = nx842_constraints(c);
+	if (ret) {
+		pr_err_ratelimited("could not get nx842 constraints : %d\n",
+				   ret);
+		return ret;
+	}
+
+	if (c->alignment > PAGE_SIZE) {
+		WARN_ONCE(1, "NX842 alignment is invalid, ignoring : 0x%x\n",
+			  c->alignment);
+		c->alignment = 1;
+	}
+
+	if (c->minimum > PAGE_SIZE) {
+		WARN_ONCE(1, "NX842 minimum is invalid, ignoring : 0x%x\n",
+			  c->minimum);
+		c->minimum = 1;
+	}
+
+	return 0;
+}
+
+static int crypto842_add_header(struct crypto842_header *hdr, u8 *buf)
+{
+	int s = CRYPTO_842_HEADER_SIZE(hdr);
+
+	/* error - compress should have added space for header */
+	if (s > hdr->group[0].padding) {
+		WARN_ONCE(1, "Internal driver error: no space for header\n");
+		return -EINVAL;
+	}
+
+	memcpy(buf, hdr, s);
+
+	return 0;
+}
+
+static int __crypto842_compress(struct crypto842_ctx *ctx,
+				struct crypto842_param *p,
+				struct crypto842_header_group *g,
+				struct nx842_constraints *c,
+				bool first)
+{
+	unsigned int slen = p->iremain, dlen = p->oremain, tmplen;
+	unsigned int adj_slen = slen;
+	u8 *src = p->in, *dst = p->out;
+	int ret, dskip;
+	ktime_t timeout;
+
+	if (p->iremain <= 0 || p->oremain <= 0)
+		return -EINVAL;
+
+	if (slen % c->multiple)
+		adj_slen = round_up(slen, c->multiple);
+	if (slen < c->minimum)
+		adj_slen = c->minimum;
+	if (slen > c->maximum)
+		adj_slen = slen = c->maximum;
+	if (adj_slen > slen || (unsigned long)src % c->alignment) {
+		adj_slen = min_t(unsigned int, adj_slen, PAGE_SIZE);
+		slen = min_t(unsigned int, slen, PAGE_SIZE);
+		if (adj_slen > slen)
+			memset(ctx->bounce + slen, 0, adj_slen - slen);
+		memcpy(ctx->bounce, src, slen);
+		src = ctx->bounce;
+		slen = adj_slen;
+	}
+
+	/* skip space for the main header */
+	dskip = first ? CRYPTO_842_HEADER_MAX_SIZE : 0;
+	if ((unsigned long)(dst + dskip) % c->alignment)
+		dskip = (int)(PTR_ALIGN(dst + dskip, c->alignment) - dst);
+	if (dskip >= dlen)
+		return -ENOSPC;
+	dst += dskip;
+	dlen -= dskip;
+	if (dlen % c->multiple)
+		dlen = round_down(dlen, c->multiple);
+	if (dlen < c->minimum)
+		return -ENOSPC;
+	if (dlen > c->maximum)
+		dlen = c->maximum;
+
+	tmplen = dlen;
+	timeout = ktime_add_ms(ktime_get(), COMP_BUSY_TIMEOUT);
+	do {
+		dlen = tmplen; /* reset dlen, if we're retrying */
+		ret = nx842_compress(src, slen, dst, &dlen, ctx->wmem);
+	} while (ret == -EBUSY && ktime_before(ktime_get(), timeout));
+	if (ret)
+		return ret;
+
+	g->padding = dskip;
+	g->length = dlen;
+
+	p->in += slen;
+	p->iremain -= slen;
+	p->out += dskip + dlen;
+	p->oremain -= dskip + dlen;
+	p->ototal += dskip + dlen;
+
+	return 0;
+}
+
+static int crypto842_compress(struct crypto_tfm *tfm,
+			      const u8 *src, unsigned int slen,
+			      u8 *dst, unsigned int *dlen)
 {
 	struct crypto842_ctx *ctx = crypto_tfm_ctx(tfm);
+	struct crypto842_header *hdr = &ctx->header;
+	struct crypto842_param p;
+	struct nx842_constraints c;
+	int ret;
+
+	p.in = (u8 *)src;
+	p.iremain = slen;
+	p.out = dst;
+	p.oremain = *dlen;
+	p.ototal = 0;
+
+	*dlen = 0;
+
+	if (!check_len("src", p.iremain) || !check_len("dest", p.oremain))
+		return -EINVAL;
+
+	ret = read_constraints(&c);
+	if (ret)
+		return ret;
+
+	hdr->magic = CRYPTO_842_MAGIC;
+	hdr->groups = 0;
+
+	while (p.iremain > 0) {
+		int n = hdr->groups++;
+
+		if (n > CRYPTO_842_GROUP_MAX)
+			return -ENOSPC;
+
+		ret = __crypto842_compress(ctx, &p, &hdr->group[n], &c, !n);
+		if (ret)
+			return ret;
+	}
+
+	/* count is zero-based, so 1 less than actual count */
+	hdr->groups--;
+
+	/* ignore indicates the input stream needed to be padded */
+	hdr->ignore = abs(p.iremain);
+	if (hdr->ignore)
+		pr_debug("marked %d bytes as ignore\n", hdr->ignore);
+
+	ret = crypto842_add_header(hdr, dst);
+	if (ret)
+		return ret;
+
+	*dlen = p.ototal;
+
+	return 0;
+}
+
+static int __crypto842_decompress(struct crypto842_ctx *ctx,
+				  struct crypto842_param *p,
+				  struct crypto842_header_group *g,
+				  struct nx842_constraints *c,
+				  bool usehw)
+{
+	unsigned int slen = g->length, dlen = p->oremain, tmplen;
+	u8 *src = p->in, *dst = p->out;
+	int ret, doffset = 0;
+	ktime_t timeout;
+
+	if (p->iremain <= 0 || p->oremain <= 0 || !slen)
+		return -EINVAL;
 
-	return nx842_compress(src, slen, dst, dlen, ctx->wmem);
+	if (g->padding + slen > p->iremain)
+		return -EINVAL;
+
+	src += g->padding;
+
+	if (usehw) {
+		/* skip length-based constraints for compressed buffer;
+		 * the hardware created it, so it should accept any
+		 * buffer length it created.  Defer to the hw driver
+		 * to handle or reject.
+		 *
+		 * however we do need to correct alignment, since the buffer
+		 * may have been moved in memory since it was created.
+		 */
+		if ((unsigned long)src % c->alignment) {
+			/* We only have 1 page of bounce buffer; if the
+			 * compressed data is larger and misaligned, we'll
+			 * have to use sw to decompress.  If this is a
+			 * common problem, someone should increase the
+			 * bounce buffer size (or make sure you keep large
+			 * compressed buffers aligned).
+			 */
+			if (slen > PAGE_SIZE) {
+				pr_warn("Compressed buffer misaligned\n");
+				usehw = false;
+			} else {
+				memcpy(ctx->bounce, src, slen);
+				src = ctx->bounce;
+			}
+		}
+
+		/* if the dest buffer isn't aligned, we have no choice but to
+		 * use an aligned buffer and copy the results into place.
+		 * So we just align the start position, reduce the length, and
+		 * after decompressing move the data back to the actual buffer
+		 * start position.
+		 */
+		if ((unsigned long)dst % c->alignment) {
+			doffset = (int)(PTR_ALIGN(dst, c->alignment) - dst);
+			dst += doffset;
+			dlen -= doffset;
+		}
+		if (dlen % c->multiple)
+			dlen = round_down(dlen, c->multiple);
+		if (dlen < c->minimum)
+			return -ENOSPC;
+		if (dlen > c->maximum)
+			dlen = c->maximum;
+	}
+
+	tmplen = dlen;
+	timeout = ktime_add_ms(ktime_get(), DECOMP_BUSY_TIMEOUT);
+	do {
+		if (!usehw)
+			break;
+
+		dlen = tmplen; /* reset dlen, if we're retrying */
+		ret = nx842_decompress(src, slen, dst, &dlen, ctx->wmem);
+	} while (ret == -EBUSY && ktime_before(ktime_get(), timeout));
+	if (!usehw || ret) {
+		dlen = tmplen; /* reset dlen, if hw failed */
+		ret = sw842_decompress(src, slen, dst, &dlen);
+	}
+	if (ret)
+		return ret;
+
+	if (doffset) {
+		dst -= doffset;
+		memmove(dst, dst + doffset, dlen);
+	}
+
+	p->in += g->padding + slen;
+	p->iremain -= g->padding + slen;
+	p->out += doffset + dlen;
+	p->oremain -= doffset + dlen;
+	p->ototal += doffset + dlen;
+
+	return 0;
 }
 
-static int crypto842_decompress(struct crypto_tfm *tfm, const u8 *src,
-			      unsigned int slen, u8 *dst, unsigned int *dlen)
+static int crypto842_decompress(struct crypto_tfm *tfm,
+				const u8 *src, unsigned int slen,
+				u8 *dst, unsigned int *dlen)
 {
 	struct crypto842_ctx *ctx = crypto_tfm_ctx(tfm);
+	struct crypto842_header *hdr;
+	struct crypto842_param p;
+	struct nx842_constraints c;
+	int n, ret, hdr_len;
+	bool usehw = true;
+
+	p.in = (u8 *)src;
+	p.iremain = slen;
+	p.out = dst;
+	p.oremain = *dlen;
+	p.ototal = 0;
+
+	*dlen = 0;
+
+	if (!check_len("src", p.iremain) || !check_len("dest", p.oremain))
+		return -EINVAL;
+
+	ret = read_constraints(&c);
+	if (ret) {
+		pr_err("could not get nx842 constraints : %d\n", ret);
+		usehw = false;
+		ret = 0;
+	}
+
+	hdr = (struct crypto842_header *)src;
+
+	/* If it doesn't start with our header magic number,
+	 * we didn't create it and therefore can't decompress it.
+	 */
+	if (hdr->magic != CRYPTO_842_MAGIC) {
+		pr_err("header magic 0x%04x is not 0x%04x\n",
+			 hdr->magic, CRYPTO_842_MAGIC);
+		return -EINVAL;
+	}
+	if (hdr->groups > CRYPTO_842_GROUP_MAX) {
+		pr_err("header has too many groups 0x%x, max 0x%x\n",
+			 hdr->groups, CRYPTO_842_GROUP_MAX);
+		return -EINVAL;
+	}
+
+	hdr_len = CRYPTO_842_HEADER_SIZE(hdr);
+	memcpy(&ctx->header, src, hdr_len);
+	hdr = &ctx->header;
+
+	/* groups is zero-based */
+	for (n = 0; n <= hdr->groups; n++) {
+		if (n > CRYPTO_842_GROUP_MAX)
+			return -EINVAL;
+
+		ret = __crypto842_decompress(ctx, &p, &hdr->group[n], &c,
+					     usehw);
+		if (ret)
+			return ret;
+	}
+
+	/* ignore the last N bytes, which were padding */
+	p.ototal -= hdr->ignore;
+	if (hdr->ignore)
+		pr_debug("ignoring last 0x%x bytes:\n", hdr->ignore);
 
-	if (nx842_decompress(src, slen, dst, dlen, ctx->wmem))
-		return sw842_decompress(src, slen, dst, dlen);
+	*dlen = p.ototal;
 
 	return 0;
 }
@@ -98,3 +479,4 @@ module_exit(crypto842_mod_exit);
 MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION("842 Compression Algorithm");
 MODULE_ALIAS_CRYPTO("842");
+MODULE_AUTHOR("Dan Streetman <ddstreet@ieee.org>");
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 11/11] crypto: add crypto compression sefltest
  2015-04-07 17:34 [PATCH 00/11] add 842 hw compression for PowerNV platform Dan Streetman
                   ` (9 preceding siblings ...)
  2015-04-07 17:34 ` [PATCH 10/11] crypto: rewrite crypto 842 to use nx842 constraints Dan Streetman
@ 2015-04-07 17:34 ` Dan Streetman
  2015-04-08 14:16   ` Herbert Xu
  2015-05-06 16:50 ` [PATCHv2 00/10] add 842 hw compression for PowerNV platform Dan Streetman
  11 siblings, 1 reply; 46+ messages in thread
From: Dan Streetman @ 2015-04-07 17:34 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras
  Cc: Seth Jennings, Robert Jennings, linux-crypto, linuxppc-dev,
	linux-kernel, Dan Streetman

Add configurable module to perform self-tests on any crypto compression
driver.

This allows testing any crypto compression driver with any input buffer,
at varying alignments and lengths.  It calculates the average bytes per
second compression and decompression rates.  Any errors reported by the
compressor during compression or decompression will end the test and
be logged.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
---
 crypto/Kconfig         |   9 +
 crypto/Makefile        |   1 +
 crypto/comp_selftest.c | 928 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 938 insertions(+)
 create mode 100644 crypto/comp_selftest.c

diff --git a/crypto/Kconfig b/crypto/Kconfig
index a7148ff..e56ecf2 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -189,6 +189,15 @@ config CRYPTO_TEST
 	help
 	  Quick & dirty crypto test module.
 
+config CRYPTO_COMP_SELFTEST
+	tristate "Compression Self-Testing module"
+	help
+	  Configurable Compression Self-Testing using debugfs interface.
+	  This allows you to compress and decompress buffers of variable
+	  offsets, lengths, and data, using different compressors.  Also
+	  the average bytes per second rate for compression/decompression
+	  can be calculated.
+
 config CRYPTO_ABLK_HELPER
 	tristate
 	select CRYPTO_CRYPTD
diff --git a/crypto/Makefile b/crypto/Makefile
index ba19465..0bb1ac2 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -95,6 +95,7 @@ obj-$(CONFIG_CRYPTO_RNG2) += krng.o
 obj-$(CONFIG_CRYPTO_ANSI_CPRNG) += ansi_cprng.o
 obj-$(CONFIG_CRYPTO_DRBG) += drbg.o
 obj-$(CONFIG_CRYPTO_TEST) += tcrypt.o
+obj-$(CONFIG_CRYPTO_COMP_SELFTEST) += comp_selftest.o
 obj-$(CONFIG_CRYPTO_GHASH) += ghash-generic.o
 obj-$(CONFIG_CRYPTO_USER_API) += af_alg.o
 obj-$(CONFIG_CRYPTO_USER_API_HASH) += algif_hash.o
diff --git a/crypto/comp_selftest.c b/crypto/comp_selftest.c
new file mode 100644
index 0000000..691a8ea
--- /dev/null
+++ b/crypto/comp_selftest.c
@@ -0,0 +1,928 @@
+/*
+ * Self-test for compression
+ *
+ * Copyright (C) 2015 Dan Streetman, IBM Corp
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/module.h>
+#include <linux/kthread.h>
+#include <linux/debugfs.h>
+#include <linux/uaccess.h>
+#include <linux/crypto.h>
+#include <linux/rwsem.h>
+#include <linux/ratelimit.h>
+
+#define MODULE_NAME "comp_selftest"
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Dan Streetman <ddstreet@ieee.org>");
+MODULE_DESCRIPTION("Crypto Compression Self-Test");
+
+static unsigned int test_kthreads_max = 64;
+module_param_named(threads_max, test_kthreads_max, uint, 0444);
+
+static unsigned int test_buffer_order = 2;
+module_param_named(buffer_order, test_buffer_order, uint, 0444);
+
+#define TEST_KTHREADS_DEFAULT	(4)
+
+#define TEST_REPEAT_DEFAULT	(1)
+
+#define TEST_BPS_WINDOW_DEFAULT	(1)
+
+#define TEST_BUFFER_SIZE	(PAGE_SIZE << test_buffer_order)
+
+#define TEST_CHECK_INTERVAL	(msecs_to_jiffies(500))
+
+#define OFFSET_START_DEFAULT	(0)
+#define OFFSET_END_DEFAULT	OFFSET_START_DEFAULT
+#define OFFSET_INTERVAL_DEFAULT	(1)
+#define LENGTH_START_DEFAULT	(PAGE_SIZE)
+#define LENGTH_END_DEFAULT	LENGTH_START_DEFAULT
+#define LENGTH_INTERVAL_DEFAULT	(1)
+
+struct test_range {
+	u32 start, interval, end;
+};
+
+struct test_param {
+	u32 running;
+	u32 repeat;
+	u32 kthreads;
+	u32 bps_window; /* in seconds */
+	struct test_range offset[3];
+	struct test_range length[3];
+};
+
+struct test_kthread_param {
+	bool running;
+	struct task_struct *kthread;
+	struct crypto_comp *tfm;
+	u8 *buffer[3];
+	u32 offset[3];
+	u32 length[3];
+	atomic64_t bps[2];
+};
+
+static struct test_kthread_param *test_kthread_params;
+
+static struct task_struct *test_kthread;
+static int test_return;
+static u8 *test_buffer;
+
+static atomic64_t test_max_bps[2];
+
+#define TEST_TFM_NAME_MAX	(32)
+static char test_tfm[TEST_TFM_NAME_MAX];
+
+static struct test_param test_params, test_new_params;
+
+static DECLARE_RWSEM(test_lock);
+
+
+static unsigned long total_bps(int i)
+{
+	unsigned long total = 0;
+	int j;
+
+	for (j = 0; j < test_kthreads_max; j++)
+		total += atomic64_read(&test_kthread_params[j].bps[i]);
+
+	return total;
+}
+
+static void update_max_bps(int i)
+{
+	uint64_t prev, t;
+
+	t = total_bps(i);
+	prev = atomic64_read(&test_max_bps[i]);
+	while (t > prev) {
+		uint64_t a = atomic64_cmpxchg(&test_max_bps[i], prev, t);
+
+		if (prev == a)
+			break;
+		prev = a;
+	}
+}
+
+#define NS_PER_S	(1000000000)
+
+static void update_bps(struct test_kthread_param *p,
+		       int i, u64 bytes, ktime_t start)
+{
+	u64 ns = ktime_to_ns(ktime_sub(ktime_get(), start));
+	u64 bps = atomic64_read(&p->bps[i]);
+	u64 window_ns = NS_PER_S * test_params.bps_window;
+	u64 window_bytes = bps * test_params.bps_window;
+	s64 a, b, delta;
+
+	a = window_ns * bytes;
+	b = ns * window_bytes;
+
+	delta = (s64)(a - b) / (s64)(window_ns + ns);
+
+	window_bytes += delta;
+
+	atomic64_set(&p->bps[i], window_bytes / test_params.bps_window);
+
+	update_max_bps(i);
+}
+
+static void reset_bps(void)
+{
+	int i;
+
+	for (i = 0; i < test_kthreads_max; i++) {
+		atomic64_set(&test_kthread_params[i].bps[0], 0);
+		atomic64_set(&test_kthread_params[i].bps[1], 0);
+	}
+
+	atomic64_set(&test_max_bps[0], 0);
+	atomic64_set(&test_max_bps[1], 0);
+}
+
+static int test_compare(struct test_kthread_param *p, unsigned int clen)
+{
+	u8 *a = &p->buffer[0][p->offset[0]];
+	u8 *c = &p->buffer[2][p->offset[2]];
+	unsigned int alen = p->length[0];
+
+	if (alen != clen) {
+		pr_err("buffer length mismatch, alen 0x%x clen 0x%x\n",
+		      alen, clen);
+		return -EINVAL;
+	}
+
+	if (memcmp(a, c, alen)) {
+		pr_err("buffer data mismatch\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int __test_decompress(struct test_kthread_param *p, unsigned int blen)
+{
+	u8 *b = &p->buffer[1][p->offset[1]];
+	u8 *c = &p->buffer[2][p->offset[2]];
+	unsigned int clen = p->length[2];
+	ktime_t start = ktime_get();
+	int ret;
+
+	ret = crypto_comp_decompress(p->tfm, b, blen, c, &clen);
+	if (ret)
+		return ret;
+
+	update_bps(p, 1, clen, start);
+
+	return test_compare(p, clen);
+}
+
+static int test_decompress(struct test_kthread_param *p, unsigned int blen)
+{
+	struct test_range *off, *len;
+	u32 o, l;
+	int ret;
+
+	off = &test_params.offset[2];
+	len = &test_params.length[2];
+
+	for (l = len->start; l <= len->end; l += len->interval) {
+		p->length[2] = l;
+
+		for (o = off->start; o <= off->end; o += off->interval) {
+			p->offset[2] = o;
+
+			ret = __test_decompress(p, blen);
+			if (ret)
+				return ret;
+
+			if (kthread_should_stop())
+				return 0;
+
+			/* so we don't appear hung */
+			schedule();
+		}
+	}
+
+	return 0;
+}
+
+static int __test_compress(struct test_kthread_param *p)
+{
+	u8 *a = &p->buffer[0][p->offset[0]];
+	u8 *b = &p->buffer[1][p->offset[1]];
+	unsigned int alen = p->length[0], blen = p->length[1];
+	ktime_t start = ktime_get();
+	int ret;
+
+	ret = crypto_comp_compress(p->tfm, a, alen, b, &blen);
+	if (ret)
+		return ret;
+
+	update_bps(p, 0, alen, start);
+
+	return test_decompress(p, blen);
+}
+
+static int test_compress(struct test_kthread_param *p)
+{
+	struct test_range *off, *len;
+	u32 o, l;
+	int ret;
+
+	off = &test_params.offset[1];
+	len = &test_params.length[1];
+
+	for (l = len->start; l <= len->end; l += len->interval) {
+		p->length[1] = l;
+
+		for (o = off->start; o <= off->end; o += off->interval) {
+			p->offset[1] = o;
+
+			ret = __test_compress(p);
+			if (ret)
+				return ret;
+
+			if (kthread_should_stop())
+				return 0;
+
+			/* so we don't appear hung */
+			schedule();
+		}
+	}
+
+	return 0;
+}
+
+static int test_kthread_func(void *arg)
+{
+	struct test_kthread_param *p = arg;
+	struct test_range *off, *len;
+	u32 o, l;
+	int ret;
+
+	off = &test_params.offset[0];
+	len = &test_params.length[0];
+
+repeat:
+	for (l = len->start; l <= len->end; l += len->interval) {
+		p->length[0] = l;
+
+		for (o = off->start; o <= off->end; o += off->interval) {
+			p->offset[0] = o;
+
+			ret = test_compress(p);
+			if (ret)
+				goto end;
+
+			if (kthread_should_stop())
+				goto end;
+
+			/* so we don't appear hung */
+			schedule();
+		}
+	}
+
+	if (test_params.repeat)
+		goto repeat;
+
+end:
+	if (ret)
+		test_return = ret;
+
+	p->running = false;
+
+	return 0;
+}
+
+static void test_free_kthread(int i)
+{
+	struct test_kthread_param *p;
+
+	if (i > test_kthreads_max)
+		return;
+
+	p = &test_kthread_params[i];
+
+	if (p->tfm && !IS_ERR(p->tfm))
+		crypto_free_comp(p->tfm);
+	p->tfm = NULL;
+
+	free_pages((unsigned long)p->buffer[0], test_buffer_order);
+	p->buffer[0] = NULL;
+	free_pages((unsigned long)p->buffer[1], test_buffer_order);
+	p->buffer[1] = NULL;
+	free_pages((unsigned long)p->buffer[2], test_buffer_order);
+	p->buffer[2] = NULL;
+}
+
+static int test_alloc_kthread(int i)
+{
+	struct test_kthread_param *p;
+	int ret = 0;
+
+	if (i > test_kthreads_max)
+		return -EINVAL;
+
+	p = &test_kthread_params[i];
+
+	p->buffer[0] = (u8 *)__get_free_pages(GFP_KERNEL, test_buffer_order);
+	p->buffer[1] = (u8 *)__get_free_pages(GFP_KERNEL, test_buffer_order);
+	p->buffer[2] = (u8 *)__get_free_pages(GFP_KERNEL, test_buffer_order);
+	p->tfm = crypto_alloc_comp(test_tfm, 0, 0);
+
+	if (IS_ERR(p->tfm)) {
+		pr_err("could not create compressor %s : %ld\n",
+			 test_tfm, PTR_ERR(p->tfm));
+		ret = PTR_ERR(p->tfm);
+	}
+
+	if (!p->buffer[0] || !p->buffer[1] || !p->buffer[2]) {
+		pr_err("could not create buffer\n");
+		ret = -ENOMEM;
+	}
+
+	if (ret) {
+		test_free_kthread(i);
+	} else {
+		memcpy(p->buffer[0], test_buffer, TEST_BUFFER_SIZE);
+		memcpy(p->buffer[1], test_buffer, TEST_BUFFER_SIZE);
+		memcpy(p->buffer[2], test_buffer, TEST_BUFFER_SIZE);
+	}
+
+	return ret;
+}
+
+static int test_kthread_running(void)
+{
+	int i;
+
+	for (i = 0; i < test_kthreads_max; i++)
+		if (test_kthread_params[i].running)
+			return 1;
+
+	return 0;
+}
+
+static int test_run(void *arg)
+{
+	int i;
+
+	test_return = 0;
+
+	if (test_tfm[0] == 0) {
+		test_return = -ENODEV;
+		pr_err("compression self test error: no compressor defined\n");
+		goto error;
+	}
+
+	reset_bps();
+
+	pr_info("compression self test starting\n");
+	pr_info("  compressor: %s\n", test_tfm);
+	pr_info("  repeat: %s\n", test_params.repeat ? "Y" : "N");
+	pr_info("  threads: %d\n", test_params.kthreads);
+	pr_info("  offsets %x-%x/%x, %x-%x/%x, %x-%x/%x\n",
+		test_params.offset[0].start, test_params.offset[0].end,
+		test_params.offset[0].interval,
+		test_params.offset[1].start, test_params.offset[1].end,
+		test_params.offset[1].interval,
+		test_params.offset[2].start, test_params.offset[2].end,
+		test_params.offset[2].interval);
+	pr_info("  lengths %x-%x/%x, %x-%x/%x, %x-%x/%x\n",
+		test_params.length[0].start, test_params.length[0].end,
+		test_params.length[0].interval,
+		test_params.length[1].start, test_params.length[1].end,
+		test_params.length[1].interval,
+		test_params.length[2].start, test_params.length[2].end,
+		test_params.length[2].interval);
+
+	for (i = 0; !test_return && i < test_params.kthreads; i++) {
+		test_return = test_alloc_kthread(i);
+		if (test_return)
+			while (--i >= 0)
+				test_free_kthread(i);
+	}
+
+	for (i = 0; !test_return && i < test_params.kthreads; i++) {
+		struct test_kthread_param *p = &test_kthread_params[i];
+
+		p->running = true;
+		p->kthread = kthread_run(test_kthread_func, p, "selftest%d", i);
+		if (IS_ERR(p->kthread)) {
+			test_return = PTR_ERR(p->kthread);
+			p->kthread = NULL;
+			p->running = false;
+		} else {
+			get_task_struct(p->kthread);
+		}
+	}
+
+	while (!kthread_should_stop() && !test_return && test_kthread_running())
+		schedule_timeout_interruptible(msecs_to_jiffies(100));
+
+	for (i = 0; i < test_params.kthreads; i++) {
+		struct test_kthread_param *p = &test_kthread_params[i];
+
+		if (p->running)
+			kthread_stop(p->kthread);
+		if (p->kthread)
+			put_task_struct(p->kthread);
+		p->kthread = NULL;
+		p->running = false;
+	}
+
+	pr_info("compression self test MBps %ld/%ld peak %ld/%ld\n",
+		(unsigned long)(total_bps(0) / 1000000),
+		(unsigned long)(total_bps(1) / 1000000),
+		(unsigned long)(atomic64_read(&test_max_bps[0]) / 1000000),
+		(unsigned long)(atomic64_read(&test_max_bps[1]) / 1000000));
+
+	if (kthread_should_stop())
+		pr_info("compression self test stopped by user\n");
+	else
+		pr_info("compression self test ended\n");
+
+error:
+	/* this causes test_stop() to get called */
+	test_new_params.running = 0;
+
+	if (test_return)
+		pr_info("compression self test failed\n");
+
+	return test_return;
+}
+
+static void test_stop(void)
+{
+	if (!test_kthread)
+		return;
+
+	if (!IS_ERR(test_kthread)) {
+		kthread_stop(test_kthread);
+		put_task_struct(test_kthread);
+	}
+
+	test_kthread = NULL;
+	test_params.running = false;
+	test_new_params.running = false;
+}
+
+static int test_start(void)
+{
+	int ret = 0;
+
+	if (test_kthread)
+		return 0;
+
+	test_kthread = kthread_run(test_run, NULL, MODULE_NAME);
+	if (IS_ERR(test_kthread)) {
+		ret = PTR_ERR(test_kthread);
+		test_stop();
+	} else
+		get_task_struct(test_kthread);
+
+	return ret;
+}
+
+/* this changes the source buffer passed in */
+static void test_buffer_fill(u8 *buf, size_t len)
+{
+	size_t i, l, pos;
+
+	/* repeat source buf across entire test buffer,
+	 * changing each byte by 1 per repeat
+	 */
+	for (pos = 0; pos < TEST_BUFFER_SIZE; pos += len) {
+		l = min_t(size_t, len, TEST_BUFFER_SIZE - pos);
+
+		memcpy(&test_buffer[pos], buf, l);
+
+		if (pos + len < TEST_BUFFER_SIZE)
+			for (i = 0; i < len; i++)
+				buf[i] += 1;
+	}
+}
+
+static ssize_t test_buffer_read(struct file *file, char __user *buf,
+				size_t len, loff_t *off)
+{
+	return simple_read_from_buffer(buf, len, off,
+				       test_buffer, TEST_BUFFER_SIZE);
+}
+
+static ssize_t test_buffer_write(struct file *file, const char __user *buf,
+				 size_t len, loff_t *off)
+{
+	size_t l = min_t(size_t, len, TEST_BUFFER_SIZE);
+	char *tmp;
+	u32 was_running;
+
+	tmp = kmalloc(l, GFP_KERNEL);
+	if (!tmp)
+		return -ENOMEM;
+
+	if (copy_from_user(tmp, buf, l)) {
+		kfree(tmp);
+		return -EFAULT;
+	}
+
+	down_write(&test_lock);
+
+	was_running = test_params.running && test_new_params.running;
+
+	/* if was_running, this clears the running flags */
+	test_stop();
+
+	/* special case if writing to start, we fill out the rest
+	 * of the buffer using the starting pattern
+	 */
+	if (*off == 0)
+		test_buffer_fill(tmp, l);
+	else
+		l = simple_write_to_buffer(test_buffer, TEST_BUFFER_SIZE,
+					   off, buf, l);
+	*off += l;
+
+	if (was_running) {
+		int ret;
+
+		test_new_params.running = was_running;
+		test_params.running = was_running;
+		ret = test_start();
+		if (ret)
+			l = ret;
+	}
+
+	up_write(&test_lock);
+
+	kfree(tmp);
+
+	return l;
+}
+
+static const struct file_operations test_buffer_ops = {
+	.owner =	THIS_MODULE,
+	.read =		test_buffer_read,
+	.write =	test_buffer_write,
+};
+
+static ssize_t test_compressor_read(struct file *file, char __user *buf,
+				    size_t len, loff_t *off)
+{
+	char name[TEST_TFM_NAME_MAX] = "none\n";
+
+	down_read(&test_lock);
+
+	if (test_tfm[0] != 0)
+		snprintf(name, TEST_TFM_NAME_MAX, "%s\n", test_tfm);
+	name[TEST_TFM_NAME_MAX-1] = 0;
+
+	up_read(&test_lock);
+
+	return simple_read_from_buffer(buf, len, off, name, TEST_TFM_NAME_MAX);
+}
+
+static ssize_t test_compressor_write(struct file *file, const char __user *buf,
+				     size_t len, loff_t *off)
+{
+	char tmp[TEST_TFM_NAME_MAX], *name;
+	size_t l = min_t(size_t, len, TEST_TFM_NAME_MAX);
+	u32 was_running;
+
+	if (l < 1)
+		return -EINVAL;
+
+	if (copy_from_user(tmp, buf, l))
+		return -EFAULT;
+	tmp[l-1] = 0;
+	name = strim(tmp);
+	l = strlen(name) + 1;
+
+	if (!crypto_has_comp(name, 0, 0))
+		return -ENODEV;
+
+	down_write(&test_lock);
+
+	was_running = test_params.running && test_new_params.running;
+
+	/* if was_running, this clears the running flags */
+	test_stop();
+
+	strncpy(test_tfm, name, l);
+	test_tfm[l-1] = 0;
+
+	if (was_running) {
+		int ret;
+
+		test_new_params.running = was_running;
+		test_params.running = was_running;
+		ret = test_start();
+		if (ret)
+			len = ret;
+	}
+
+	up_write(&test_lock);
+
+	return len;
+}
+
+static const struct file_operations test_compressor_ops = {
+	.owner =	THIS_MODULE,
+	.read =		test_compressor_read,
+	.write =	test_compressor_write,
+};
+
+static ssize_t test_status_read(struct file *file, char __user *buf,
+				size_t len, loff_t *off)
+{
+	int tmplen = 120;
+	char tmp[tmplen];
+
+	if (!test_params.running) {
+		if (*off == 0) /* haven't printed anything yet, so print msg */
+			snprintf(tmp, tmplen, "Test not running\n");
+		else if (*off == 1) /* printed status, just print \n */
+			snprintf(tmp, tmplen, "\n");
+		else /* printed msg or \n, report end */
+			return 0;
+		*off = 2;
+		goto end;
+	}
+
+	/* slow down reading */
+	schedule_timeout_interruptible(msecs_to_jiffies(250));
+
+	snprintf(tmp, tmplen, "Threads %d MBps: %lu/%lu peak %lu/%lu       \r",
+		 test_params.kthreads,
+		 (unsigned long)(total_bps(0) / 1000000),
+		 (unsigned long)(total_bps(1) / 1000000),
+		 (unsigned long)(atomic64_read(&test_max_bps[0]) / 1000000),
+		 (unsigned long)(atomic64_read(&test_max_bps[1]) / 1000000));
+	tmp[tmplen - 2] = '\r';
+	tmp[tmplen - 1] = 0;
+
+end:
+	len = min(len, strlen(tmp));
+
+	if (copy_to_user(buf, tmp, len))
+		return -EFAULT;
+
+	if (*off == 0)
+		*off = 1;
+
+	return len;
+}
+
+static const struct file_operations test_status_ops = {
+	.owner =	THIS_MODULE,
+	.read =		test_status_read,
+};
+
+static void test_param_check_valid(int i)
+{
+	struct test_range *off, *len;
+
+	off = &test_params.offset[i];
+	len = &test_params.length[i];
+
+	/* interval must be at least 1 */
+	if (off->interval < 1)
+		off->interval = OFFSET_INTERVAL_DEFAULT;
+	if (len->interval < 1)
+		len->interval = LENGTH_INTERVAL_DEFAULT;
+
+	/* end offset + length can't be more than buffer size */
+	if (off->end + len->end > TEST_BUFFER_SIZE)
+		off->end = max_t(u32, 0, TEST_BUFFER_SIZE - len->end);
+	if (off->end + len->end > TEST_BUFFER_SIZE)
+		len->end = TEST_BUFFER_SIZE;
+
+	/* end must be equal or after start */
+	if (off->end < off->start)
+		off->start = off->end;
+	if (len->end < len->start)
+		len->start = len->end;
+}
+
+static void test_param_change(void)
+{
+	bool should_run;
+	int i;
+
+	down_write(&test_lock);
+
+	should_run = test_new_params.running;
+
+	/* if already running, this clears running flags */
+	test_stop();
+
+	test_new_params.running = should_run;
+
+	memcpy(&test_params, &test_new_params, sizeof(test_params));
+
+	/* kthreads must be between 1 and max */
+	if (test_params.kthreads < 1)
+		test_params.kthreads = 1;
+	if (test_params.kthreads > test_kthreads_max)
+		test_params.kthreads = test_kthreads_max;
+
+	/* bps_window must be at least 1 */
+	if (test_params.bps_window < 1)
+		test_params.bps_window = 1;
+
+	for (i = 0; i < 3; i++)
+		test_param_check_valid(i);
+
+	/* update any corrected params */
+	memcpy(&test_new_params, &test_params, sizeof(test_params));
+
+	if (test_params.running)
+		test_start();
+
+	up_write(&test_lock);
+}
+
+static int test_check(void *ignored)
+{
+	bool changed;
+
+	while (1) {
+		set_current_state(TASK_INTERRUPTIBLE);
+		if (kthread_should_stop())
+			break;
+
+		schedule_timeout(TEST_CHECK_INTERVAL);
+
+		if (kthread_should_stop())
+			break;
+
+		down_read(&test_lock);
+		changed = !!memcmp(&test_params, &test_new_params,
+				   sizeof(test_params));
+		up_read(&test_lock);
+
+		if (changed)
+			test_param_change();
+	}
+
+	return 0;
+}
+
+static struct task_struct *test_check_kthread;
+
+static struct dentry *test_root;
+
+static void test_exit(void)
+{
+	if (test_root && !IS_ERR(test_root))
+		debugfs_remove_recursive(test_root);
+	if (test_check_kthread && !IS_ERR(test_check_kthread))
+		kthread_stop(test_check_kthread);
+	test_stop();
+	kfree(test_kthread_params);
+	free_pages((unsigned long)test_buffer, test_buffer_order);
+}
+module_exit(test_exit);
+
+static int __init test_init(void)
+{
+	struct dentry *offsets, *lengths;
+	u8 test_buffer_fill_default[8] = { 0 };
+	int ret = 0, i;
+
+	if (!debugfs_initialized())
+		return -EINVAL;
+
+	test_root = debugfs_create_dir(MODULE_NAME, NULL);
+	if (IS_ERR(test_root)) {
+		pr_err("could not create debugfs dir %s : %ld\n",
+		       MODULE_NAME, PTR_ERR(test_root));
+		return PTR_ERR(test_root);
+	}
+
+	offsets = debugfs_create_dir("offsets", test_root);
+	if (IS_ERR(offsets)) {
+		pr_err("could not create debugfs dir %s/offsets : %ld\n",
+		       MODULE_NAME, PTR_ERR(offsets));
+		ret = PTR_ERR(offsets);
+		goto end;
+	}
+	lengths = debugfs_create_dir("lengths", test_root);
+	if (IS_ERR(lengths)) {
+		pr_err("could not create debugfs dir %s/lengths : %ld\n",
+		       MODULE_NAME, PTR_ERR(lengths));
+		ret = PTR_ERR(lengths);
+		goto end;
+	}
+
+	debugfs_create_file("status", S_IRUGO, test_root, NULL,
+			    &test_status_ops);
+	debugfs_create_file("compressor", S_IRUGO | S_IWUSR, test_root, NULL,
+			    &test_compressor_ops);
+	debugfs_create_file("buffer", S_IRUSR | S_IWUSR, test_root, NULL,
+			    &test_buffer_ops);
+
+	debugfs_create_bool("running", S_IRUGO | S_IWUSR, test_root,
+			    &test_new_params.running);
+	debugfs_create_bool("repeat", S_IRUGO | S_IWUSR, test_root,
+			    &test_new_params.repeat);
+	debugfs_create_u32("threads", S_IRUGO | S_IWUSR, test_root,
+			    &test_new_params.kthreads);
+	debugfs_create_u32("bps_window", S_IRUGO | S_IWUSR, test_root,
+			    &test_new_params.bps_window);
+
+	debugfs_create_u32("start_a", S_IRUGO | S_IWUSR, offsets,
+			    &test_new_params.offset[0].start);
+	debugfs_create_u32("start_b", S_IRUGO | S_IWUSR, offsets,
+			    &test_new_params.offset[1].start);
+	debugfs_create_u32("start_c", S_IRUGO | S_IWUSR, offsets,
+			    &test_new_params.offset[2].start);
+	debugfs_create_u32("end_a", S_IRUGO | S_IWUSR, offsets,
+			    &test_new_params.offset[0].end);
+	debugfs_create_u32("end_b", S_IRUGO | S_IWUSR, offsets,
+			    &test_new_params.offset[1].end);
+	debugfs_create_u32("end_c", S_IRUGO | S_IWUSR, offsets,
+			    &test_new_params.offset[2].end);
+	debugfs_create_u32("interval_a", S_IRUGO | S_IWUSR, offsets,
+			    &test_new_params.offset[0].interval);
+	debugfs_create_u32("interval_b", S_IRUGO | S_IWUSR, offsets,
+			    &test_new_params.offset[1].interval);
+	debugfs_create_u32("interval_c", S_IRUGO | S_IWUSR, offsets,
+			    &test_new_params.offset[2].interval);
+
+	debugfs_create_u32("start_a", S_IRUGO | S_IWUSR, lengths,
+			    &test_new_params.length[0].start);
+	debugfs_create_u32("start_b", S_IRUGO | S_IWUSR, lengths,
+			    &test_new_params.length[1].start);
+	debugfs_create_u32("start_c", S_IRUGO | S_IWUSR, lengths,
+			    &test_new_params.length[2].start);
+	debugfs_create_u32("end_a", S_IRUGO | S_IWUSR, lengths,
+			    &test_new_params.length[0].end);
+	debugfs_create_u32("end_b", S_IRUGO | S_IWUSR, lengths,
+			    &test_new_params.length[1].end);
+	debugfs_create_u32("end_c", S_IRUGO | S_IWUSR, lengths,
+			    &test_new_params.length[2].end);
+	debugfs_create_u32("interval_a", S_IRUGO | S_IWUSR, lengths,
+			    &test_new_params.length[0].interval);
+	debugfs_create_u32("interval_b", S_IRUGO | S_IWUSR, lengths,
+			    &test_new_params.length[1].interval);
+	debugfs_create_u32("interval_c", S_IRUGO | S_IWUSR, lengths,
+			    &test_new_params.length[2].interval);
+
+	test_kthread_params = kcalloc(test_kthreads_max,
+				      sizeof(*test_kthread_params),
+				      GFP_KERNEL);
+
+	if (!test_kthread_params) {
+		pr_err("could kthread params\n");
+		ret = -ENOMEM;
+		goto end;
+	}
+
+	test_buffer = (u8 *)__get_free_pages(GFP_KERNEL, test_buffer_order);
+	if (!test_buffer) {
+		pr_err("could not allocate test buffer\n");
+		ret = -ENOMEM;
+		goto end;
+	}
+
+	test_buffer_fill(test_buffer_fill_default,
+			 ARRAY_SIZE(test_buffer_fill_default));
+
+	test_params.running = 0;
+	test_params.repeat = TEST_REPEAT_DEFAULT;
+	test_params.kthreads = TEST_KTHREADS_DEFAULT;
+	test_params.bps_window = TEST_BPS_WINDOW_DEFAULT;
+	for (i = 0; i < 3; i++) {
+		test_params.offset[i].start = OFFSET_START_DEFAULT;
+		test_params.offset[i].end = OFFSET_END_DEFAULT;
+		test_params.offset[i].interval = OFFSET_INTERVAL_DEFAULT;
+		test_params.length[i].start = LENGTH_START_DEFAULT;
+		test_params.length[i].end = LENGTH_END_DEFAULT;
+		test_params.length[i].interval = LENGTH_INTERVAL_DEFAULT;
+	}
+	memcpy(&test_new_params, &test_params, sizeof(test_params));
+
+	test_check_kthread = kthread_run(test_check, NULL, MODULE_NAME "_chk");
+	if (IS_ERR(test_check_kthread))
+		ret = PTR_ERR(test_check_kthread);
+
+end:
+	if (ret)
+		test_exit();
+
+	return ret;
+}
+module_init(test_init);
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH 09/11] crypto: remove LZO fallback from crypto 842
  2015-04-07 17:34 ` [PATCH 09/11] crypto: remove LZO fallback from crypto 842 Dan Streetman
@ 2015-04-08 14:16   ` Herbert Xu
  2015-04-08 14:28     ` Dan Streetman
  0 siblings, 1 reply; 46+ messages in thread
From: Herbert Xu @ 2015-04-08 14:16 UTC (permalink / raw)
  To: Dan Streetman
  Cc: David S. Miller, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Seth Jennings, Robert Jennings, linux-crypto,
	linuxppc-dev, linux-kernel

On Tue, Apr 07, 2015 at 01:34:28PM -0400, Dan Streetman wrote:
> Update the crypto 842 driver to no longer fallback to LZO if the 842
> hardware is unavailable.  Simplify the crpypto 842 driver to remove all
> headers indicating 842/lzo.
> 
> The crypto 842 driver should do 842-format compression and decompression
> only.  It should not fallback to LZO compression/decompression.  The
> user of the crypto 842 driver can fallback to another format if desired.
> 
> Signed-off-by: Dan Streetman <ddstreet@ieee.org>

Thanks for the series.  I don't know how I missed it when it was
merged originally but crypto/842.c needs to be moved over to
drivers/crypto.  Your software implementation should take its
places as the reference implementation.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 11/11] crypto: add crypto compression sefltest
  2015-04-07 17:34 ` [PATCH 11/11] crypto: add crypto compression sefltest Dan Streetman
@ 2015-04-08 14:16   ` Herbert Xu
  2015-04-08 14:48     ` Dan Streetman
  0 siblings, 1 reply; 46+ messages in thread
From: Herbert Xu @ 2015-04-08 14:16 UTC (permalink / raw)
  To: Dan Streetman
  Cc: David S. Miller, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Seth Jennings, Robert Jennings, linux-crypto,
	linuxppc-dev, linux-kernel

On Tue, Apr 07, 2015 at 01:34:30PM -0400, Dan Streetman wrote:
> Add configurable module to perform self-tests on any crypto compression
> driver.
> 
> This allows testing any crypto compression driver with any input buffer,
> at varying alignments and lengths.  It calculates the average bytes per
> second compression and decompression rates.  Any errors reported by the
> compressor during compression or decompression will end the test and
> be logged.
> 
> Signed-off-by: Dan Streetman <ddstreet@ieee.org>

Please use the existing infrastructure in testmgr/tcrypt.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 09/11] crypto: remove LZO fallback from crypto 842
  2015-04-08 14:16   ` Herbert Xu
@ 2015-04-08 14:28     ` Dan Streetman
  2015-04-08 14:38       ` Herbert Xu
  0 siblings, 1 reply; 46+ messages in thread
From: Dan Streetman @ 2015-04-08 14:28 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David S. Miller, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Seth Jennings, Robert Jennings, linux-crypto,
	linuxppc-dev, linux-kernel

On Wed, Apr 8, 2015 at 10:16 AM, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Tue, Apr 07, 2015 at 01:34:28PM -0400, Dan Streetman wrote:
>> Update the crypto 842 driver to no longer fallback to LZO if the 842
>> hardware is unavailable.  Simplify the crpypto 842 driver to remove all
>> headers indicating 842/lzo.
>>
>> The crypto 842 driver should do 842-format compression and decompression
>> only.  It should not fallback to LZO compression/decompression.  The
>> user of the crypto 842 driver can fallback to another format if desired.
>>
>> Signed-off-by: Dan Streetman <ddstreet@ieee.org>
>
> Thanks for the series.  I don't know how I missed it when it was
> merged originally but crypto/842.c needs to be moved over to
> drivers/crypto.  Your software implementation should take its
> places as the reference implementation.

So, the sw implementation is only for decompression; there's no sw
compression implementation in these patches.

The hw 842 driver is currently at drivers/crypto/nx, and the
crypto/842 driver just calls the hw driver (after correctly
aligning/sizing the provided buffers to what the hw driver expects),
and falls back to the sw decompression if the hw decompression fails
(there is no compression fallback, a failure is reported to the
caller).

Is that setup ok?  If users had to directly call the hw driver,
instead of using the generic crypto_comp interface, it would
complicate things, e.g. in zswap it only expects to call
crypto_comp_compress()/decompress(), not call the 842 hw driver
directly.


>
> Cheers,
> --
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 09/11] crypto: remove LZO fallback from crypto 842
  2015-04-08 14:28     ` Dan Streetman
@ 2015-04-08 14:38       ` Herbert Xu
  2015-04-08 14:45         ` Dan Streetman
  0 siblings, 1 reply; 46+ messages in thread
From: Herbert Xu @ 2015-04-08 14:38 UTC (permalink / raw)
  To: Dan Streetman
  Cc: David S. Miller, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Seth Jennings, Robert Jennings, linux-crypto,
	linuxppc-dev, linux-kernel

On Wed, Apr 08, 2015 at 10:28:23AM -0400, Dan Streetman wrote:
> 
> So, the sw implementation is only for decompression; there's no sw
> compression implementation in these patches.

As a general rule we don't add any hardware implementation unless
there is a software implementation.  The reason is that every new
algorithm creates an API (potentially a user-space API if the
algorithm can be exported via algif).  But sometimes things slip
through.

So I'm not going to immediately remove 842 but it would be nice
if we had a reference implementation so that if ever there were
another hardware 842 implmentation added then at least we have
something that we can judge against.

> The hw 842 driver is currently at drivers/crypto/nx, and the
> crypto/842 driver just calls the hw driver (after correctly
> aligning/sizing the provided buffers to what the hw driver expects),
> and falls back to the sw decompression if the hw decompression fails
> (there is no compression fallback, a failure is reported to the
> caller).
> 
> Is that setup ok?  If users had to directly call the hw driver,
> instead of using the generic crypto_comp interface, it would
> complicate things, e.g. in zswap it only expects to call
> crypto_comp_compress()/decompress(), not call the 842 hw driver
> directly.

I think the only thing that needs to happen for now is moving
crypto/842.c over to drivers/crypto/nx (perhaps merge it into
nx-842.c) so that it's obvious that this is not a generic
implementation.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 09/11] crypto: remove LZO fallback from crypto 842
  2015-04-08 14:38       ` Herbert Xu
@ 2015-04-08 14:45         ` Dan Streetman
  2015-04-08 14:48           ` Herbert Xu
  0 siblings, 1 reply; 46+ messages in thread
From: Dan Streetman @ 2015-04-08 14:45 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David S. Miller, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Seth Jennings, Robert Jennings, linux-crypto,
	linuxppc-dev, linux-kernel

On Wed, Apr 8, 2015 at 10:38 AM, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Wed, Apr 08, 2015 at 10:28:23AM -0400, Dan Streetman wrote:
>>
>> So, the sw implementation is only for decompression; there's no sw
>> compression implementation in these patches.
>
> As a general rule we don't add any hardware implementation unless
> there is a software implementation.  The reason is that every new
> algorithm creates an API (potentially a user-space API if the
> algorithm can be exported via algif).  But sometimes things slip
> through.
>
> So I'm not going to immediately remove 842 but it would be nice
> if we had a reference implementation so that if ever there were
> another hardware 842 implmentation added then at least we have
> something that we can judge against.

Ok I'll see if I can include a sw compression implementation.

>
>> The hw 842 driver is currently at drivers/crypto/nx, and the
>> crypto/842 driver just calls the hw driver (after correctly
>> aligning/sizing the provided buffers to what the hw driver expects),
>> and falls back to the sw decompression if the hw decompression fails
>> (there is no compression fallback, a failure is reported to the
>> caller).
>>
>> Is that setup ok?  If users had to directly call the hw driver,
>> instead of using the generic crypto_comp interface, it would
>> complicate things, e.g. in zswap it only expects to call
>> crypto_comp_compress()/decompress(), not call the 842 hw driver
>> directly.
>
> I think the only thing that needs to happen for now is moving
> crypto/842.c over to drivers/crypto/nx (perhaps merge it into
> nx-842.c) so that it's obvious that this is not a generic
> implementation.

ah ok, so you mean it can still be a crypto_comp interface, just move
its location and/or merge it into nx-842.c?

>
> Cheers,
> --
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 11/11] crypto: add crypto compression sefltest
  2015-04-08 14:16   ` Herbert Xu
@ 2015-04-08 14:48     ` Dan Streetman
  0 siblings, 0 replies; 46+ messages in thread
From: Dan Streetman @ 2015-04-08 14:48 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David S. Miller, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Seth Jennings, Robert Jennings, linux-crypto,
	linuxppc-dev, linux-kernel

On Wed, Apr 8, 2015 at 10:16 AM, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Tue, Apr 07, 2015 at 01:34:30PM -0400, Dan Streetman wrote:
>> Add configurable module to perform self-tests on any crypto compression
>> driver.
>>
>> This allows testing any crypto compression driver with any input buffer,
>> at varying alignments and lengths.  It calculates the average bytes per
>> second compression and decompression rates.  Any errors reported by the
>> compressor during compression or decompression will end the test and
>> be logged.
>>
>> Signed-off-by: Dan Streetman <ddstreet@ieee.org>
>
> Please use the existing infrastructure in testmgr/tcrypt.

I think I looked at that and it didn't seem like it would be able to
include all the user controls, but I'll look again.  In any case this
test doesn't have to be included in this 842 patch set, so I'll drop
it from the next version and re-send it by itself if/when I can update
it to use testmgr.

>
> Thanks,
> --
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 09/11] crypto: remove LZO fallback from crypto 842
  2015-04-08 14:45         ` Dan Streetman
@ 2015-04-08 14:48           ` Herbert Xu
  0 siblings, 0 replies; 46+ messages in thread
From: Herbert Xu @ 2015-04-08 14:48 UTC (permalink / raw)
  To: Dan Streetman
  Cc: David S. Miller, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Seth Jennings, Robert Jennings, linux-crypto,
	linuxppc-dev, linux-kernel

On Wed, Apr 08, 2015 at 10:45:45AM -0400, Dan Streetman wrote:
>
> Ok I'll see if I can include a sw compression implementation.

That would be great!

> ah ok, so you mean it can still be a crypto_comp interface, just move
> its location and/or merge it into nx-842.c?

Oh yes of course.  There is nothing wrong with it exporting this
via the crypto API.  It's just in an unusual spot for a driver.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCHv2 00/10] add 842 hw compression for PowerNV platform
  2015-04-07 17:34 [PATCH 00/11] add 842 hw compression for PowerNV platform Dan Streetman
                   ` (10 preceding siblings ...)
  2015-04-07 17:34 ` [PATCH 11/11] crypto: add crypto compression sefltest Dan Streetman
@ 2015-05-06 16:50 ` Dan Streetman
  2015-05-06 16:50   ` [PATCH 01/10] powerpc: export of_get_ibm_chip_id function Dan Streetman
                     ` (10 more replies)
  11 siblings, 11 replies; 46+ messages in thread
From: Dan Streetman @ 2015-05-06 16:50 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras
  Cc: Dan Streetman, Seth Jennings, Robert Jennings, linux-crypto,
	linuxppc-dev, linux-kernel

IBM PowerPC processors starting at version P7+ contain a NX coprocessor
that provides various hw-accelerated functions, one of which is memory
compression to the IBM "842" compression format.  This NX-842 coprocessor
is already supported on the pSeries platform, by the nx-842.c driver and
the crypto compression interface at crypto/842.c.  This patch set adds
support for NX-842 on the PowerNV (Non-Virtualized) platform, as well as
adding a full software 842 compression/decompression implementation.

Quick summary of changes: the current 842 crypto compression interface uses
only the 842 hardware on pSeries platforms, and can handle only page-sized
and page-aligned uncompressed buffers.  These patches add a full software
842 impementation, change the crypto/ directory 842 interface to a
software only implementation, add a 842 hardware crypto compression
interface that can handle any size and alignment buffers, add a
driver for 842 hardware on PowerNV platforms, and create a common
interface for both 842 hardware platform drivers.

The existing pSeries platform NX-842 driver could not be re-used for the
PowerNV platform driver, as there are fundamentally different interfaces;
on pSeries the system hypervisor (pHyp) provides the interface and manages
communication with the coprocessor, while on PowerNV the kernel talks directly
to the coprocessor using the ICSWX instruction.  The data structures used to
describe each compression or decompression request to the coprocessor are
also different between pHyp's interface and direct communication with ICSWX.
So, different drivers for pSeries and PowerNV are required.  Adding the new
PowerNV driver but keeping the interface to the drivers the same required
adding a new common frontend interface, to which only one of the platform
drivers will connect (based on what platform the kernel is currently running
on), and moving some functionality out of the existing pSeries driver into a
more common location.

The existing crypto/842.c interface is in the wrong place, since crypto/
should only contain software implementations; so lib/842/ is added
containing a reference (i.e. rather slow) implementation in software
of both 842 compression and 842 decompression.  The crypto/842.c interface
is changed to use only that software implementation.

The hardware 842 crypto compression interface is moved to
drivers/crypto/nx/nx-842-crypto.c.  It is also modified to be able to
handle any alignment/length input or output buffer; currently it is only
able to handle page-size and page-aligned (uncompressed) buffers, due to
restrictions in the pSeries 842 hardware driver.

Note that several of these patches have changed significantly since the
last patch series; I didn't list specific differences since there are
so many.

Dan Streetman (10):
  powerpc: export of_get_ibm_chip_id function
  powerpc: Add ICSWX instruction
  lib: add software 842 compression/decompression
  crypto: change 842 alg to use software
  drivers/crypto/nx: rename nx-842.c to nx-842-pseries.c
  drivers/crypto/nx: add NX-842 platform frontend driver
  drivers/crypto/nx: add nx842 constraints
  drivers/crypto/nx: add PowerNV platform NX-842 driver
  drivers/crypto/nx: simplify pSeries nx842 driver
  drivers/crypto/nx: add hardware 842 crypto comp alg

 MAINTAINERS                           |    5 +-
 arch/powerpc/include/asm/icswx.h      |  184 ++++
 arch/powerpc/include/asm/ppc-opcode.h |   13 +
 arch/powerpc/kernel/prom.c            |    1 +
 crypto/842.c                          |  175 +---
 crypto/Kconfig                        |    7 +-
 drivers/crypto/Kconfig                |   10 +-
 drivers/crypto/nx/Kconfig             |   55 +-
 drivers/crypto/nx/Makefile            |    6 +
 drivers/crypto/nx/nx-842-crypto.c     |  603 ++++++++++++
 drivers/crypto/nx/nx-842-powernv.c    |  625 +++++++++++++
 drivers/crypto/nx/nx-842-pseries.c    | 1128 +++++++++++++++++++++++
 drivers/crypto/nx/nx-842.c            | 1623 +++------------------------------
 drivers/crypto/nx/nx-842.h            |  131 +++
 include/linux/nx842.h                 |   21 +-
 include/linux/sw842.h                 |   12 +
 lib/842/842.h                         |  127 +++
 lib/842/842_compress.c                |  626 +++++++++++++
 lib/842/842_debugfs.h                 |   52 ++
 lib/842/842_decompress.c              |  405 ++++++++
 lib/842/Makefile                      |    2 +
 lib/Kconfig                           |    6 +
 lib/Makefile                          |    2 +
 23 files changed, 4138 insertions(+), 1681 deletions(-)
 create mode 100644 arch/powerpc/include/asm/icswx.h
 create mode 100644 drivers/crypto/nx/nx-842-crypto.c
 create mode 100644 drivers/crypto/nx/nx-842-powernv.c
 create mode 100644 drivers/crypto/nx/nx-842-pseries.c
 create mode 100644 drivers/crypto/nx/nx-842.h
 create mode 100644 include/linux/sw842.h
 create mode 100644 lib/842/842.h
 create mode 100644 lib/842/842_compress.c
 create mode 100644 lib/842/842_debugfs.h
 create mode 100644 lib/842/842_decompress.c
 create mode 100644 lib/842/Makefile

-- 
2.1.0

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 01/10] powerpc: export of_get_ibm_chip_id function
  2015-05-06 16:50 ` [PATCHv2 00/10] add 842 hw compression for PowerNV platform Dan Streetman
@ 2015-05-06 16:50   ` Dan Streetman
  2015-05-06 16:50   ` [PATCH 02/10] powerpc: Add ICSWX instruction Dan Streetman
                     ` (9 subsequent siblings)
  10 siblings, 0 replies; 46+ messages in thread
From: Dan Streetman @ 2015-05-06 16:50 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras
  Cc: Dan Streetman, Seth Jennings, Robert Jennings, Rob Herring,
	Anton Blanchard, linux-crypto, linuxppc-dev, linux-kernel

Export the of_get_ibm_chip_id() function.  This will be used by the
PowerNV NX-842 driver.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
---
 arch/powerpc/kernel/prom.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 308c5e1..ea2cea7 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -800,6 +800,7 @@ int of_get_ibm_chip_id(struct device_node *np)
 	}
 	return -1;
 }
+EXPORT_SYMBOL(of_get_ibm_chip_id);
 
 /**
  * cpu_to_chip_id - Return the cpus chip-id
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 02/10] powerpc: Add ICSWX instruction
  2015-05-06 16:50 ` [PATCHv2 00/10] add 842 hw compression for PowerNV platform Dan Streetman
  2015-05-06 16:50   ` [PATCH 01/10] powerpc: export of_get_ibm_chip_id function Dan Streetman
@ 2015-05-06 16:50   ` Dan Streetman
  2015-05-06 16:50   ` [PATCH 03/10] lib: add software 842 compression/decompression Dan Streetman
                     ` (8 subsequent siblings)
  10 siblings, 0 replies; 46+ messages in thread
From: Dan Streetman @ 2015-05-06 16:50 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras
  Cc: Dan Streetman, Seth Jennings, Robert Jennings, linux-crypto,
	linuxppc-dev, linux-kernel

Add the asm ICSWX and ICSWEPX opcodes.  Add definitions for the
Coprocessor Request structures needed to use the icswx calls to
coprocessors.  Add icswx() function to perform the ICSWX asm
using the provided Coprocessor Command Word value and
Coprocessor Request Block structure.

This is required for communication with the NX-842 coprocessor on
a PowerNV system.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
---
 arch/powerpc/include/asm/icswx.h      | 184 ++++++++++++++++++++++++++++++++++
 arch/powerpc/include/asm/ppc-opcode.h |  13 +++
 2 files changed, 197 insertions(+)
 create mode 100644 arch/powerpc/include/asm/icswx.h

diff --git a/arch/powerpc/include/asm/icswx.h b/arch/powerpc/include/asm/icswx.h
new file mode 100644
index 0000000..9f8402b
--- /dev/null
+++ b/arch/powerpc/include/asm/icswx.h
@@ -0,0 +1,184 @@
+/*
+ * ICSWX api
+ *
+ * Copyright (C) 2015 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * This provides the Initiate Coprocessor Store Word Indexed (ICSWX)
+ * instruction.  This instruction is used to communicate with PowerPC
+ * coprocessors.  This also provides definitions of the structures used
+ * to communicate with the coprocessor.
+ *
+ * The RFC02130: Coprocessor Architecture document is the reference for
+ * everything in this file unless otherwise noted.
+ */
+#ifndef _ARCH_POWERPC_INCLUDE_ASM_ICSWX_H_
+#define _ARCH_POWERPC_INCLUDE_ASM_ICSWX_H_
+
+#include <asm/ppc-opcode.h> /* for PPC_ICSWX */
+
+/* Chapter 6.5.8 Coprocessor-Completion Block (CCB) */
+
+#define CCB_VALUE		(0x3fffffffffffffff)
+#define CCB_ADDRESS		(0xfffffffffffffff8)
+#define CCB_CM			(0x0000000000000007)
+#define CCB_CM0			(0x0000000000000004)
+#define CCB_CM12		(0x0000000000000003)
+
+#define CCB_CM0_ALL_COMPLETIONS	(0x0)
+#define CCB_CM0_LAST_IN_CHAIN	(0x4)
+#define CCB_CM12_STORE		(0x0)
+#define CCB_CM12_INTERRUPT	(0x1)
+
+#define CCB_SIZE		(0x10)
+#define CCB_ALIGN		CCB_SIZE
+
+struct coprocessor_completion_block {
+	__be64 value;
+	__be64 address;
+} __packed __aligned(CCB_ALIGN);
+
+
+/* Chapter 6.5.7 Coprocessor-Status Block (CSB) */
+
+#define CSB_V			(0x80)
+#define CSB_F			(0x04)
+#define CSB_CH			(0x03)
+#define CSB_CE_INCOMPLETE	(0x80)
+#define CSB_CE_TERMINATION	(0x40)
+#define CSB_CE_TPBC		(0x20)
+
+#define CSB_CC_SUCCESS		(0)
+#define CSB_CC_INVALID_ALIGN	(1)
+#define CSB_CC_OPERAND_OVERLAP	(2)
+#define CSB_CC_DATA_LENGTH	(3)
+#define CSB_CC_TRANSLATION	(5)
+#define CSB_CC_PROTECTION	(6)
+#define CSB_CC_RD_EXTERNAL	(7)
+#define CSB_CC_INVALID_OPERAND	(8)
+#define CSB_CC_PRIVILEGE	(9)
+#define CSB_CC_INTERNAL		(10)
+#define CSB_CC_WR_EXTERNAL	(12)
+#define CSB_CC_NOSPC		(13)
+#define CSB_CC_EXCESSIVE_DDE	(14)
+#define CSB_CC_WR_TRANSLATION	(15)
+#define CSB_CC_WR_PROTECTION	(16)
+#define CSB_CC_UNKNOWN_CODE	(17)
+#define CSB_CC_ABORT		(18)
+#define CSB_CC_TRANSPORT	(20)
+#define CSB_CC_SEGMENTED_DDL	(31)
+#define CSB_CC_PROGRESS_POINT	(32)
+#define CSB_CC_DDE_OVERFLOW	(33)
+#define CSB_CC_SESSION		(34)
+#define CSB_CC_PROVISION	(36)
+#define CSB_CC_CHAIN		(37)
+#define CSB_CC_SEQUENCE		(38)
+#define CSB_CC_HW		(39)
+
+#define CSB_SIZE		(0x10)
+#define CSB_ALIGN		CSB_SIZE
+
+struct coprocessor_status_block {
+	u8 flags;
+	u8 cs;
+	u8 cc;
+	u8 ce;
+	__be32 count;
+	__be64 address;
+} __packed __aligned(CSB_ALIGN);
+
+
+/* Chapter 6.5.10 Data-Descriptor List (DDL)
+ * each list contains one or more Data-Descriptor Entries (DDE)
+ */
+
+#define DDE_P			(0x8000)
+
+#define DDE_SIZE		(0x10)
+#define DDE_ALIGN		DDE_SIZE
+
+struct data_descriptor_entry {
+	__be16 flags;
+	u8 count;
+	u8 index;
+	__be32 length;
+	__be64 address;
+} __packed __aligned(DDE_ALIGN);
+
+
+/* Chapter 6.5.2 Coprocessor-Request Block (CRB) */
+
+#define CRB_SIZE		(0x80)
+#define CRB_ALIGN		(0x100) /* Errata: requires 256 alignment */
+
+/* Coprocessor Status Block field
+ *   ADDRESS	address of CSB
+ *   C		CCB is valid
+ *   AT		0 = addrs are virtual, 1 = addrs are phys
+ *   M		enable perf monitor
+ */
+#define CRB_CSB_ADDRESS		(0xfffffffffffffff0)
+#define CRB_CSB_C		(0x0000000000000008)
+#define CRB_CSB_AT		(0x0000000000000002)
+#define CRB_CSB_M		(0x0000000000000001)
+
+struct coprocessor_request_block {
+	__be32 ccw;
+	__be32 flags;
+	__be64 csb_addr;
+
+	struct data_descriptor_entry source;
+	struct data_descriptor_entry target;
+
+	struct coprocessor_completion_block ccb;
+
+	u8 reserved[48];
+
+	struct coprocessor_status_block csb;
+} __packed __aligned(CRB_ALIGN);
+
+
+/* RFC02167 Initiate Coprocessor Instructions document
+ * Chapter 8.2.1.1.1 RS
+ * Chapter 8.2.3 Coprocessor Directive
+ * Chapter 8.2.4 Execution
+ *
+ * The CCW must be converted to BE before passing to icswx()
+ */
+
+#define CCW_PS			(0xff000000)
+#define CCW_CT			(0x00ff0000)
+#define CCW_CD			(0x0000ffff)
+#define CCW_CL			(0x0000c000)
+
+
+/* RFC02167 Initiate Coprocessor Instructions document
+ * Chapter 8.2.1 Initiate Coprocessor Store Word Indexed (ICSWX)
+ * Chapter 8.2.4.1 Condition Register 0
+ */
+
+#define ICSWX_INITIATED		(0x8)
+#define ICSWX_BUSY		(0x4)
+#define ICSWX_REJECTED		(0x2)
+
+static inline int icswx(__be32 ccw, struct coprocessor_request_block *crb)
+{
+	__be64 ccw_reg = ccw;
+	u32 cr;
+
+	__asm__ __volatile__(
+	PPC_ICSWX(%1,0,%2) "\n"
+	"mfcr %0\n"
+	: "=r" (cr)
+	: "r" (ccw_reg), "r" (crb)
+	: "cr0", "memory");
+
+	return (int)((cr >> 28) & 0xf);
+}
+
+
+#endif /* _ARCH_POWERPC_INCLUDE_ASM_ICSWX_H_ */
diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index 5c93f69..8452335 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -136,6 +136,8 @@
 #define PPC_INST_DCBAL			0x7c2005ec
 #define PPC_INST_DCBZL			0x7c2007ec
 #define PPC_INST_ICBT			0x7c00002c
+#define PPC_INST_ICSWX			0x7c00032d
+#define PPC_INST_ICSWEPX		0x7c00076d
 #define PPC_INST_ISEL			0x7c00001e
 #define PPC_INST_ISEL_MASK		0xfc00003e
 #define PPC_INST_LDARX			0x7c0000a8
@@ -403,4 +405,15 @@
 #define MFTMR(tmr, r)		stringify_in_c(.long PPC_INST_MFTMR | \
 					       TMRN(tmr) | ___PPC_RT(r))
 
+/* Coprocessor instructions */
+#define PPC_ICSWX(s, a, b)	stringify_in_c(.long PPC_INST_ICSWX |	\
+					       ___PPC_RS(s) |		\
+					       ___PPC_RA(a) |		\
+					       ___PPC_RB(b))
+#define PPC_ICSWEPX(s, a, b)	stringify_in_c(.long PPC_INST_ICSWEPX | \
+					       ___PPC_RS(s) |		\
+					       ___PPC_RA(a) |		\
+					       ___PPC_RB(b))
+
+
 #endif /* _ASM_POWERPC_PPC_OPCODE_H */
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 03/10] lib: add software 842 compression/decompression
  2015-05-06 16:50 ` [PATCHv2 00/10] add 842 hw compression for PowerNV platform Dan Streetman
  2015-05-06 16:50   ` [PATCH 01/10] powerpc: export of_get_ibm_chip_id function Dan Streetman
  2015-05-06 16:50   ` [PATCH 02/10] powerpc: Add ICSWX instruction Dan Streetman
@ 2015-05-06 16:50   ` Dan Streetman
  2015-05-06 16:51   ` [PATCH 04/10] crypto: change 842 alg to use software Dan Streetman
                     ` (7 subsequent siblings)
  10 siblings, 0 replies; 46+ messages in thread
From: Dan Streetman @ 2015-05-06 16:50 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras
  Cc: Dan Streetman, Seth Jennings, Robert Jennings, Andrew Morton,
	Greg KH, linux-crypto, linuxppc-dev, linux-kernel

Add 842-format software compression and decompression functions.
Update the MAINTAINERS 842 section to include the new files.

The 842 compression function can compress any input data into the 842
compression format.  The 842 decompression function can decompress any
standard-format 842 compressed data - specifically, either a compressed
data buffer created by the 842 software compression function, or a
compressed data buffer created by the 842 hardware compressor (located
in PowerPC coprocessors).

The 842 compressed data format is explained in the header comments.

This is used in a later patch to provide a full software 842 compression
and decompression crypto interface.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
---
 MAINTAINERS              |   2 +
 include/linux/sw842.h    |  12 +
 lib/842/842.h            | 127 ++++++++++
 lib/842/842_compress.c   | 626 +++++++++++++++++++++++++++++++++++++++++++++++
 lib/842/842_debugfs.h    |  52 ++++
 lib/842/842_decompress.c | 405 ++++++++++++++++++++++++++++++
 lib/842/Makefile         |   2 +
 lib/Kconfig              |   6 +
 lib/Makefile             |   2 +
 9 files changed, 1234 insertions(+)
 create mode 100644 include/linux/sw842.h
 create mode 100644 lib/842/842.h
 create mode 100644 lib/842/842_compress.c
 create mode 100644 lib/842/842_debugfs.h
 create mode 100644 lib/842/842_decompress.c
 create mode 100644 lib/842/Makefile

diff --git a/MAINTAINERS b/MAINTAINERS
index 781e099..116af01 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4872,6 +4872,8 @@ M:	Dan Streetman <ddstreet@us.ibm.com>
 S:	Supported
 F:	drivers/crypto/nx/nx-842.c
 F:	include/linux/nx842.h
+F:	include/linux/sw842.h
+F:	lib/842/
 
 IBM Power Linux RAID adapter
 M:	Brian King <brking@us.ibm.com>
diff --git a/include/linux/sw842.h b/include/linux/sw842.h
new file mode 100644
index 0000000..109ba04
--- /dev/null
+++ b/include/linux/sw842.h
@@ -0,0 +1,12 @@
+#ifndef __SW842_H__
+#define __SW842_H__
+
+#define SW842_MEM_COMPRESS	(0xf000)
+
+int sw842_compress(const u8 *src, unsigned int srclen,
+		   u8 *dst, unsigned int *destlen, void *wmem);
+
+int sw842_decompress(const u8 *src, unsigned int srclen,
+		     u8 *dst, unsigned int *destlen);
+
+#endif
diff --git a/lib/842/842.h b/lib/842/842.h
new file mode 100644
index 0000000..7c20003
--- /dev/null
+++ b/lib/842/842.h
@@ -0,0 +1,127 @@
+
+#ifndef __842_H__
+#define __842_H__
+
+/* The 842 compressed format is made up of multiple blocks, each of
+ * which have the format:
+ *
+ * <template>[arg1][arg2][arg3][arg4]
+ *
+ * where there are between 0 and 4 template args, depending on the specific
+ * template operation.  For normal operations, each arg is either a specific
+ * number of data bytes to add to the output buffer, or an index pointing
+ * to a previously-written number of data bytes to copy to the output buffer.
+ *
+ * The template code is a 5-bit value.  This code indicates what to do with
+ * the following data.  Template codes from 0 to 0x19 should use the template
+ * table, the static "decomp_ops" table used in decompress.  For each template
+ * (table row), there are between 1 and 4 actions; each action corresponds to
+ * an arg following the template code bits.  Each action is either a "data"
+ * type action, or a "index" type action, and each action results in 2, 4, or 8
+ * bytes being written to the output buffer.  Each template (i.e. all actions
+ * in the table row) will add up to 8 bytes being written to the output buffer.
+ * Any row with less than 4 actions is padded with noop actions, indicated by
+ * N0 (for which there is no corresponding arg in the compressed data buffer).
+ *
+ * "Data" actions, indicated in the table by D2, D4, and D8, mean that the
+ * corresponding arg is 2, 4, or 8 bytes, respectively, in the compressed data
+ * buffer should be copied directly to the output buffer.
+ *
+ * "Index" actions, indicated in the table by I2, I4, and I8, mean the
+ * corresponding arg is an index parameter that points to, respectively, a 2,
+ * 4, or 8 byte value already in the output buffer, that should be copied to
+ * the end of the output buffer.  Essentially, the index points to a position
+ * in a ring buffer that contains the last N bytes of output buffer data.
+ * The number of bits for each index's arg are: 8 bits for I2, 9 bits for I4,
+ * and 8 bits for I8.  Since each index points to a 2, 4, or 8 byte section,
+ * this means that I2 can reference 512 bytes ((2^8 bits = 256) * 2 bytes), I4
+ * can reference 2048 bytes ((2^9 = 512) * 4 bytes), and I8 can reference 2048
+ * bytes ((2^8 = 256) * 8 bytes).  Think of it as a kind-of ring buffer for
+ * each of I2, I4, and I8 that are updated for each byte written to the output
+ * buffer.  In this implementation, the output buffer is directly used for each
+ * index; there is no additional memory required.  Note that the index is into
+ * a ring buffer, not a sliding window; for example, if there have been 260
+ * bytes written to the output buffer, an I2 index of 0 would index to byte 256
+ * in the output buffer, while an I2 index of 16 would index to byte 16 in the
+ * output buffer.
+ *
+ * There are also 3 special template codes; 0x1b for "repeat", 0x1c for
+ * "zeros", and 0x1e for "end".  The "repeat" operation is followed by a 6 bit
+ * arg N indicating how many times to repeat.  The last 8 bytes written to the
+ * output buffer are written again to the output buffer, N + 1 times.  The
+ * "zeros" operation, which has no arg bits, writes 8 zeros to the output
+ * buffer.  The "end" operation, which also has no arg bits, signals the end
+ * of the compressed data.  There may be some number of padding (don't care,
+ * but usually 0) bits after the "end" operation bits, to fill the buffer
+ * length to a specific byte multiple (usually a multiple of 8, 16, or 32
+ * bytes).
+ *
+ * This software implementation also uses one of the undefined template values,
+ * 0x1d as a special "short data" template code, to represent less than 8 bytes
+ * of uncompressed data.  It is followed by a 3 bit arg N indicating how many
+ * data bytes will follow, and then N bytes of data, which should be copied to
+ * the output buffer.  This allows the software 842 compressor to accept input
+ * buffers that are not an exact multiple of 8 bytes long.  However, those
+ * compressed buffers containing this sw-only template will be rejected by
+ * the 842 hardware decompressor, and must be decompressed with this software
+ * library.  The 842 software compression module includes a parameter to
+ * disable using this sw-only "short data" template, and instead simply
+ * reject any input buffer that is not a multiple of 8 bytes long.
+ *
+ * After all actions for each operation code are processed, another template
+ * code is in the next 5 bits.  The decompression ends once the "end" template
+ * code is detected.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/bitops.h>
+#include <asm/unaligned.h>
+
+#include <linux/sw842.h>
+
+/* special templates */
+#define OP_REPEAT	(0x1B)
+#define OP_ZEROS	(0x1C)
+#define OP_END		(0x1E)
+
+/* sw only template - this is not in the hw design; it's used only by this
+ * software compressor and decompressor, to allow input buffers that aren't
+ * a multiple of 8.
+ */
+#define OP_SHORT_DATA	(0x1D)
+
+/* additional bits of each op param */
+#define OP_BITS		(5)
+#define REPEAT_BITS	(6)
+#define SHORT_DATA_BITS	(3)
+#define I2_BITS		(8)
+#define I4_BITS		(9)
+#define I8_BITS		(8)
+
+#define REPEAT_BITS_MAX		(0x3f)
+#define SHORT_DATA_BITS_MAX	(0x7)
+
+/* Arbitrary values used to indicate action */
+#define OP_ACTION	(0x70)
+#define OP_ACTION_INDEX	(0x10)
+#define OP_ACTION_DATA	(0x20)
+#define OP_ACTION_NOOP	(0x40)
+#define OP_AMOUNT	(0x0f)
+#define OP_AMOUNT_0	(0x00)
+#define OP_AMOUNT_2	(0x02)
+#define OP_AMOUNT_4	(0x04)
+#define OP_AMOUNT_8	(0x08)
+
+#define D2		(OP_ACTION_DATA  | OP_AMOUNT_2)
+#define D4		(OP_ACTION_DATA  | OP_AMOUNT_4)
+#define D8		(OP_ACTION_DATA  | OP_AMOUNT_8)
+#define I2		(OP_ACTION_INDEX | OP_AMOUNT_2)
+#define I4		(OP_ACTION_INDEX | OP_AMOUNT_4)
+#define I8		(OP_ACTION_INDEX | OP_AMOUNT_8)
+#define N0		(OP_ACTION_NOOP  | OP_AMOUNT_0)
+
+/* the max of the regular templates - not including the special templates */
+#define OPS_MAX		(0x1a)
+
+#endif
diff --git a/lib/842/842_compress.c b/lib/842/842_compress.c
new file mode 100644
index 0000000..7ce6894
--- /dev/null
+++ b/lib/842/842_compress.c
@@ -0,0 +1,626 @@
+/*
+ * 842 Software Compression
+ *
+ * Copyright (C) 2015 Dan Streetman, IBM Corp
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * See 842.h for details of the 842 compressed format.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#define MODULE_NAME "842_compress"
+
+#include <linux/hashtable.h>
+
+#include "842.h"
+#include "842_debugfs.h"
+
+#define SW842_HASHTABLE8_BITS	(10)
+#define SW842_HASHTABLE4_BITS	(11)
+#define SW842_HASHTABLE2_BITS	(10)
+
+/* By default, we allow compressing input buffers of any length, but we must
+ * use the non-standard "short data" template so the decompressor can correctly
+ * reproduce the uncompressed data buffer at the right length.  However the
+ * hardware 842 compressor will not recognize the "short data" template, and
+ * will fail to decompress any compressed buffer containing it (I have no idea
+ * why anyone would want to use software to compress and hardware to decompress
+ * but that's beside the point).  This parameter forces the compression
+ * function to simply reject any input buffer that isn't a multiple of 8 bytes
+ * long, instead of using the "short data" template, so that all compressed
+ * buffers produced by this function will be decompressable by the 842 hardware
+ * decompressor.  Unless you have a specific need for that, leave this disabled
+ * so that any length buffer can be compressed.
+ */
+static bool sw842_strict;
+module_param_named(strict, sw842_strict, bool, 0644);
+
+static u8 comp_ops[OPS_MAX][5] = { /* params size in bits */
+	{ I8, N0, N0, N0, 0x19 }, /* 8 */
+	{ I4, I4, N0, N0, 0x18 }, /* 18 */
+	{ I4, I2, I2, N0, 0x17 }, /* 25 */
+	{ I2, I2, I4, N0, 0x13 }, /* 25 */
+	{ I2, I2, I2, I2, 0x12 }, /* 32 */
+	{ I4, I2, D2, N0, 0x16 }, /* 33 */
+	{ I4, D2, I2, N0, 0x15 }, /* 33 */
+	{ I2, D2, I4, N0, 0x0e }, /* 33 */
+	{ D2, I2, I4, N0, 0x09 }, /* 33 */
+	{ I2, I2, I2, D2, 0x11 }, /* 40 */
+	{ I2, I2, D2, I2, 0x10 }, /* 40 */
+	{ I2, D2, I2, I2, 0x0d }, /* 40 */
+	{ D2, I2, I2, I2, 0x08 }, /* 40 */
+	{ I4, D4, N0, N0, 0x14 }, /* 41 */
+	{ D4, I4, N0, N0, 0x04 }, /* 41 */
+	{ I2, I2, D4, N0, 0x0f }, /* 48 */
+	{ I2, D2, I2, D2, 0x0c }, /* 48 */
+	{ I2, D4, I2, N0, 0x0b }, /* 48 */
+	{ D2, I2, I2, D2, 0x07 }, /* 48 */
+	{ D2, I2, D2, I2, 0x06 }, /* 48 */
+	{ D4, I2, I2, N0, 0x03 }, /* 48 */
+	{ I2, D2, D4, N0, 0x0a }, /* 56 */
+	{ D2, I2, D4, N0, 0x05 }, /* 56 */
+	{ D4, I2, D2, N0, 0x02 }, /* 56 */
+	{ D4, D2, I2, N0, 0x01 }, /* 56 */
+	{ D8, N0, N0, N0, 0x00 }, /* 64 */
+};
+
+struct sw842_hlist_node8 {
+	struct hlist_node node;
+	u64 data;
+	u8 index;
+};
+
+struct sw842_hlist_node4 {
+	struct hlist_node node;
+	u32 data;
+	u16 index;
+};
+
+struct sw842_hlist_node2 {
+	struct hlist_node node;
+	u16 data;
+	u8 index;
+};
+
+#define INDEX_NOT_FOUND		(-1)
+#define INDEX_NOT_CHECKED	(-2)
+
+struct sw842_param {
+	u8 *in;
+	u8 *instart;
+	u64 ilen;
+	u8 *out;
+	u64 olen;
+	u8 bit;
+	u64 data8[1];
+	u32 data4[2];
+	u16 data2[4];
+	int index8[1];
+	int index4[2];
+	int index2[4];
+	DECLARE_HASHTABLE(htable8, SW842_HASHTABLE8_BITS);
+	DECLARE_HASHTABLE(htable4, SW842_HASHTABLE4_BITS);
+	DECLARE_HASHTABLE(htable2, SW842_HASHTABLE2_BITS);
+	struct sw842_hlist_node8 node8[1 << I8_BITS];
+	struct sw842_hlist_node4 node4[1 << I4_BITS];
+	struct sw842_hlist_node2 node2[1 << I2_BITS];
+};
+
+#define get_input_data(p, o, b)						\
+	be##b##_to_cpu(get_unaligned((__be##b *)((p)->in + (o))))
+
+#define init_hashtable_nodes(p, b)	do {			\
+	int _i;							\
+	hash_init((p)->htable##b);				\
+	for (_i = 0; _i < ARRAY_SIZE((p)->node##b); _i++) {	\
+		(p)->node##b[_i].index = _i;			\
+		(p)->node##b[_i].data = 0;			\
+		INIT_HLIST_NODE(&(p)->node##b[_i].node);	\
+	}							\
+} while (0)
+
+#define find_index(p, b, n)	({					\
+	struct sw842_hlist_node##b *_n;					\
+	p->index##b[n] = INDEX_NOT_FOUND;				\
+	hash_for_each_possible(p->htable##b, _n, node, p->data##b[n]) {	\
+		if (p->data##b[n] == _n->data) {			\
+			p->index##b[n] = _n->index;			\
+			break;						\
+		}							\
+	}								\
+	p->index##b[n] >= 0;						\
+})
+
+#define check_index(p, b, n)			\
+	((p)->index##b[n] == INDEX_NOT_CHECKED	\
+	 ? find_index(p, b, n)			\
+	 : (p)->index##b[n] >= 0)
+
+#define replace_hash(p, b, i, d)	do {				\
+	struct sw842_hlist_node##b *_n = &(p)->node##b[(i)+(d)];	\
+	hash_del(&_n->node);						\
+	_n->data = (p)->data##b[d];					\
+	pr_debug("add hash index%x %x pos %x data %lx\n", b,		\
+		 (unsigned int)_n->index,				\
+		 (unsigned int)((p)->in - (p)->instart),		\
+		 (unsigned long)_n->data);				\
+	hash_add((p)->htable##b, &_n->node, _n->data);			\
+} while (0)
+
+static u8 bmask[8] = { 0x00, 0x80, 0xc0, 0xe0, 0xf0, 0xf8, 0xfc, 0xfe };
+
+static int add_bits(struct sw842_param *p, u64 d, u8 n);
+
+static int __split_add_bits(struct sw842_param *p, u64 d, u8 n, u8 s)
+{
+	int ret;
+
+	if (n <= s)
+		return -EINVAL;
+
+	ret = add_bits(p, d >> s, n - s);
+	if (ret)
+		return ret;
+	return add_bits(p, d & GENMASK_ULL(s - 1, 0), s);
+}
+
+static int add_bits(struct sw842_param *p, u64 d, u8 n)
+{
+	int b = p->bit, bits = b + n, s = round_up(bits, 8) - bits;
+	u64 o;
+	u8 *out = p->out;
+
+	pr_debug("add %u bits %lx\n", (unsigned char)n, (unsigned long)d);
+
+	if (n > 64)
+		return -EINVAL;
+
+	/* split this up if writing to > 8 bytes (i.e. n == 64 && p->bit > 0),
+	 * or if we're at the end of the output buffer and would write past end
+	 */
+	if (bits > 64)
+		return __split_add_bits(p, d, n, 32);
+	else if (p->olen < 8 && bits > 32 && bits <= 56)
+		return __split_add_bits(p, d, n, 16);
+	else if (p->olen < 4 && bits > 16 && bits <= 24)
+		return __split_add_bits(p, d, n, 8);
+
+	if (DIV_ROUND_UP(bits, 8) > p->olen)
+		return -ENOSPC;
+
+	o = *out & bmask[b];
+	d <<= s;
+
+	if (bits <= 8)
+		*out = o | d;
+	else if (bits <= 16)
+		put_unaligned(cpu_to_be16(o << 8 | d), (__be16 *)out);
+	else if (bits <= 24)
+		put_unaligned(cpu_to_be32(o << 24 | d << 8), (__be32 *)out);
+	else if (bits <= 32)
+		put_unaligned(cpu_to_be32(o << 24 | d), (__be32 *)out);
+	else if (bits <= 40)
+		put_unaligned(cpu_to_be64(o << 56 | d << 24), (__be64 *)out);
+	else if (bits <= 48)
+		put_unaligned(cpu_to_be64(o << 56 | d << 16), (__be64 *)out);
+	else if (bits <= 56)
+		put_unaligned(cpu_to_be64(o << 56 | d << 8), (__be64 *)out);
+	else
+		put_unaligned(cpu_to_be64(o << 56 | d), (__be64 *)out);
+
+	p->bit += n;
+
+	if (p->bit > 7) {
+		p->out += p->bit / 8;
+		p->olen -= p->bit / 8;
+		p->bit %= 8;
+	}
+
+	return 0;
+}
+
+static int add_template(struct sw842_param *p, u8 c)
+{
+	int ret, i, b = 0;
+	u8 *t = comp_ops[c];
+	bool inv = false;
+
+	if (c >= OPS_MAX)
+		return -EINVAL;
+
+	pr_debug("template %x\n", t[4]);
+
+	ret = add_bits(p, t[4], OP_BITS);
+	if (ret)
+		return ret;
+
+	for (i = 0; i < 4; i++) {
+		pr_debug("op %x\n", t[i]);
+
+		switch (t[i] & OP_AMOUNT) {
+		case OP_AMOUNT_8:
+			if (b)
+				inv = true;
+			else if (t[i] & OP_ACTION_INDEX)
+				ret = add_bits(p, p->index8[0], I8_BITS);
+			else if (t[i] & OP_ACTION_DATA)
+				ret = add_bits(p, p->data8[0], 64);
+			else
+				inv = true;
+			break;
+		case OP_AMOUNT_4:
+			if (b == 2 && t[i] & OP_ACTION_DATA)
+				ret = add_bits(p, get_input_data(p, 2, 32), 32);
+			else if (b != 0 && b != 4)
+				inv = true;
+			else if (t[i] & OP_ACTION_INDEX)
+				ret = add_bits(p, p->index4[b >> 2], I4_BITS);
+			else if (t[i] & OP_ACTION_DATA)
+				ret = add_bits(p, p->data4[b >> 2], 32);
+			else
+				inv = true;
+			break;
+		case OP_AMOUNT_2:
+			if (b != 0 && b != 2 && b != 4 && b != 6)
+				inv = true;
+			if (t[i] & OP_ACTION_INDEX)
+				ret = add_bits(p, p->index2[b >> 1], I2_BITS);
+			else if (t[i] & OP_ACTION_DATA)
+				ret = add_bits(p, p->data2[b >> 1], 16);
+			else
+				inv = true;
+			break;
+		case OP_AMOUNT_0:
+			inv = (b != 8) || !(t[i] & OP_ACTION_NOOP);
+			break;
+		default:
+			inv = true;
+			break;
+		}
+
+		if (ret)
+			return ret;
+
+		if (inv) {
+			pr_err("Invalid templ %x op %d : %x %x %x %x\n",
+			       c, i, t[0], t[1], t[2], t[3]);
+			return -EINVAL;
+		}
+
+		b += t[i] & OP_AMOUNT;
+	}
+
+	if (b != 8) {
+		pr_err("Invalid template %x len %x : %x %x %x %x\n",
+		       c, b, t[0], t[1], t[2], t[3]);
+		return -EINVAL;
+	}
+
+	if (sw842_template_counts)
+		atomic_inc(&template_count[t[4]]);
+
+	return 0;
+}
+
+static int add_repeat_template(struct sw842_param *p, u8 r)
+{
+	int ret;
+
+	/* repeat param is 0-based */
+	if (!r || --r > REPEAT_BITS_MAX)
+		return -EINVAL;
+
+	ret = add_bits(p, OP_REPEAT, OP_BITS);
+	if (ret)
+		return ret;
+
+	ret = add_bits(p, r, REPEAT_BITS);
+	if (ret)
+		return ret;
+
+	if (sw842_template_counts)
+		atomic_inc(&template_repeat_count);
+
+	return 0;
+}
+
+static int add_short_data_template(struct sw842_param *p, u8 b)
+{
+	int ret, i;
+
+	if (!b || b > SHORT_DATA_BITS_MAX)
+		return -EINVAL;
+
+	ret = add_bits(p, OP_SHORT_DATA, OP_BITS);
+	if (ret)
+		return ret;
+
+	ret = add_bits(p, b, SHORT_DATA_BITS);
+	if (ret)
+		return ret;
+
+	for (i = 0; i < b; i++) {
+		ret = add_bits(p, p->in[i], 8);
+		if (ret)
+			return ret;
+	}
+
+	if (sw842_template_counts)
+		atomic_inc(&template_short_data_count);
+
+	return 0;
+}
+
+static int add_zeros_template(struct sw842_param *p)
+{
+	int ret = add_bits(p, OP_ZEROS, OP_BITS);
+
+	if (ret)
+		return ret;
+
+	if (sw842_template_counts)
+		atomic_inc(&template_zeros_count);
+
+	return 0;
+}
+
+static int add_end_template(struct sw842_param *p)
+{
+	int ret = add_bits(p, OP_END, OP_BITS);
+
+	if (ret)
+		return ret;
+
+	if (sw842_template_counts)
+		atomic_inc(&template_end_count);
+
+	return 0;
+}
+
+static bool check_template(struct sw842_param *p, u8 c)
+{
+	u8 *t = comp_ops[c];
+	int i, match, b = 0;
+
+	if (c >= OPS_MAX)
+		return false;
+
+	for (i = 0; i < 4; i++) {
+		if (t[i] & OP_ACTION_INDEX) {
+			if (t[i] & OP_AMOUNT_2)
+				match = check_index(p, 2, b >> 1);
+			else if (t[i] & OP_AMOUNT_4)
+				match = check_index(p, 4, b >> 2);
+			else if (t[i] & OP_AMOUNT_8)
+				match = check_index(p, 8, 0);
+			else
+				return false;
+			if (!match)
+				return false;
+		}
+
+		b += t[i] & OP_AMOUNT;
+	}
+
+	return true;
+}
+
+static void get_next_data(struct sw842_param *p)
+{
+	p->data8[0] = get_input_data(p, 0, 64);
+	p->data4[0] = get_input_data(p, 0, 32);
+	p->data4[1] = get_input_data(p, 4, 32);
+	p->data2[0] = get_input_data(p, 0, 16);
+	p->data2[1] = get_input_data(p, 2, 16);
+	p->data2[2] = get_input_data(p, 4, 16);
+	p->data2[3] = get_input_data(p, 6, 16);
+}
+
+/* update the hashtable entries.
+ * only call this after finding/adding the current template
+ * the dataN fields for the current 8 byte block must be already updated
+ */
+static void update_hashtables(struct sw842_param *p)
+{
+	u64 pos = p->in - p->instart;
+	u64 n8 = (pos >> 3) % (1 << I8_BITS);
+	u64 n4 = (pos >> 2) % (1 << I4_BITS);
+	u64 n2 = (pos >> 1) % (1 << I2_BITS);
+
+	replace_hash(p, 8, n8, 0);
+	replace_hash(p, 4, n4, 0);
+	replace_hash(p, 4, n4, 1);
+	replace_hash(p, 2, n2, 0);
+	replace_hash(p, 2, n2, 1);
+	replace_hash(p, 2, n2, 2);
+	replace_hash(p, 2, n2, 3);
+}
+
+/* find the next template to use, and add it
+ * the p->dataN fields must already be set for the current 8 byte block
+ */
+static int process_next(struct sw842_param *p)
+{
+	int ret, i;
+
+	p->index8[0] = INDEX_NOT_CHECKED;
+	p->index4[0] = INDEX_NOT_CHECKED;
+	p->index4[1] = INDEX_NOT_CHECKED;
+	p->index2[0] = INDEX_NOT_CHECKED;
+	p->index2[1] = INDEX_NOT_CHECKED;
+	p->index2[2] = INDEX_NOT_CHECKED;
+	p->index2[3] = INDEX_NOT_CHECKED;
+
+	/* check up to OPS_MAX - 1; last op is our fallback */
+	for (i = 0; i < OPS_MAX - 1; i++) {
+		if (check_template(p, i))
+			break;
+	}
+
+	ret = add_template(p, i);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
+/**
+ * sw842_compress
+ *
+ * Compress the uncompressed buffer of length @ilen at @in to the output buffer
+ * @out, using no more than @olen bytes, using the 842 compression format.
+ *
+ * Returns: 0 on success, error on failure.  The @olen parameter
+ * will contain the number of output bytes written on success, or
+ * 0 on error.
+ */
+int sw842_compress(const u8 *in, unsigned int ilen,
+		   u8 *out, unsigned int *olen, void *wmem)
+{
+	struct sw842_param *p = (struct sw842_param *)wmem;
+	int ret;
+	u64 last, next, pad, total;
+	u8 repeat_count = 0;
+
+	BUILD_BUG_ON(sizeof(*p) > SW842_MEM_COMPRESS);
+
+	init_hashtable_nodes(p, 8);
+	init_hashtable_nodes(p, 4);
+	init_hashtable_nodes(p, 2);
+
+	p->in = (u8 *)in;
+	p->instart = p->in;
+	p->ilen = ilen;
+	p->out = out;
+	p->olen = *olen;
+	p->bit = 0;
+
+	total = p->olen;
+
+	*olen = 0;
+
+	/* if using strict mode, we can only compress a multiple of 8 */
+	if (sw842_strict && (ilen % 8)) {
+		pr_err("Using strict mode, can't compress len %d\n", ilen);
+		return -EINVAL;
+	}
+
+	/* let's compress at least 8 bytes, mkay? */
+	if (unlikely(ilen < 8))
+		goto skip_comp;
+
+	/* make initial 'last' different so we don't match the first time */
+	last = ~get_unaligned((u64 *)p->in);
+
+	while (p->ilen > 7) {
+		next = get_unaligned((u64 *)p->in);
+
+		/* must get the next data, as we need to update the hashtable
+		 * entries with the new data every time
+		 */
+		get_next_data(p);
+
+		/* we don't care about endianness in last or next;
+		 * we're just comparing 8 bytes to another 8 bytes,
+		 * they're both the same endianness
+		 */
+		if (next == last) {
+			/* repeat count bits are 0-based, so we stop at +1 */
+			if (++repeat_count <= REPEAT_BITS_MAX)
+				goto repeat;
+		}
+		if (repeat_count) {
+			ret = add_repeat_template(p, repeat_count);
+			repeat_count = 0;
+			if (next == last) /* reached max repeat bits */
+				goto repeat;
+		}
+
+		if (next == 0)
+			ret = add_zeros_template(p);
+		else
+			ret = process_next(p);
+
+		if (ret)
+			return ret;
+
+repeat:
+		last = next;
+		update_hashtables(p);
+		p->in += 8;
+		p->ilen -= 8;
+	}
+
+	if (repeat_count) {
+		ret = add_repeat_template(p, repeat_count);
+		if (ret)
+			return ret;
+	}
+
+skip_comp:
+	if (p->ilen > 0) {
+		ret = add_short_data_template(p, p->ilen);
+		if (ret)
+			return ret;
+
+		p->in += p->ilen;
+		p->ilen = 0;
+	}
+
+	ret = add_end_template(p);
+	if (ret)
+		return ret;
+
+	if (p->bit) {
+		p->out++;
+		p->olen--;
+		p->bit = 0;
+	}
+
+	/* pad compressed length to multiple of 8 */
+	pad = (8 - ((total - p->olen) % 8)) % 8;
+	if (pad) {
+		if (pad > p->olen) /* we were so close! */
+			return -ENOSPC;
+		memset(p->out, 0, pad);
+		p->out += pad;
+		p->olen -= pad;
+	}
+
+	if (unlikely((total - p->olen) > UINT_MAX))
+		return -ENOSPC;
+
+	*olen = total - p->olen;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(sw842_compress);
+
+static int __init sw842_init(void)
+{
+	if (sw842_template_counts)
+		sw842_debugfs_create();
+
+	return 0;
+}
+module_init(sw842_init);
+
+static void __exit sw842_exit(void)
+{
+	if (sw842_template_counts)
+		sw842_debugfs_remove();
+}
+module_exit(sw842_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Software 842 Compressor");
+MODULE_AUTHOR("Dan Streetman <ddstreet@ieee.org>");
diff --git a/lib/842/842_debugfs.h b/lib/842/842_debugfs.h
new file mode 100644
index 0000000..e7f3bff
--- /dev/null
+++ b/lib/842/842_debugfs.h
@@ -0,0 +1,52 @@
+
+#ifndef __842_DEBUGFS_H__
+#define __842_DEBUGFS_H__
+
+#include <linux/debugfs.h>
+
+static bool sw842_template_counts;
+module_param_named(template_counts, sw842_template_counts, bool, 0444);
+
+static atomic_t template_count[OPS_MAX], template_repeat_count,
+	template_zeros_count, template_short_data_count, template_end_count;
+
+static struct dentry *sw842_debugfs_root;
+
+static int __init sw842_debugfs_create(void)
+{
+	umode_t m = S_IRUGO | S_IWUSR;
+	int i;
+
+	if (!debugfs_initialized())
+		return -ENODEV;
+
+	sw842_debugfs_root = debugfs_create_dir(MODULE_NAME, NULL);
+	if (IS_ERR(sw842_debugfs_root))
+		return PTR_ERR(sw842_debugfs_root);
+
+	for (i = 0; i < ARRAY_SIZE(template_count); i++) {
+		char name[32];
+
+		snprintf(name, 32, "template_%02x", i);
+		debugfs_create_atomic_t(name, m, sw842_debugfs_root,
+					&template_count[i]);
+	}
+	debugfs_create_atomic_t("template_repeat", m, sw842_debugfs_root,
+				&template_repeat_count);
+	debugfs_create_atomic_t("template_zeros", m, sw842_debugfs_root,
+				&template_zeros_count);
+	debugfs_create_atomic_t("template_short_data", m, sw842_debugfs_root,
+				&template_short_data_count);
+	debugfs_create_atomic_t("template_end", m, sw842_debugfs_root,
+				&template_end_count);
+
+	return 0;
+}
+
+static void __exit sw842_debugfs_remove(void)
+{
+	if (sw842_debugfs_root && !IS_ERR(sw842_debugfs_root))
+		debugfs_remove_recursive(sw842_debugfs_root);
+}
+
+#endif
diff --git a/lib/842/842_decompress.c b/lib/842/842_decompress.c
new file mode 100644
index 0000000..6b2b45a
--- /dev/null
+++ b/lib/842/842_decompress.c
@@ -0,0 +1,405 @@
+/*
+ * 842 Software Decompression
+ *
+ * Copyright (C) 2015 Dan Streetman, IBM Corp
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * See 842.h for details of the 842 compressed format.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#define MODULE_NAME "842_decompress"
+
+#include "842.h"
+#include "842_debugfs.h"
+
+/* rolling fifo sizes */
+#define I2_FIFO_SIZE	(2 * (1 << I2_BITS))
+#define I4_FIFO_SIZE	(4 * (1 << I4_BITS))
+#define I8_FIFO_SIZE	(8 * (1 << I8_BITS))
+
+static u8 decomp_ops[OPS_MAX][4] = {
+	{ D8, N0, N0, N0 },
+	{ D4, D2, I2, N0 },
+	{ D4, I2, D2, N0 },
+	{ D4, I2, I2, N0 },
+	{ D4, I4, N0, N0 },
+	{ D2, I2, D4, N0 },
+	{ D2, I2, D2, I2 },
+	{ D2, I2, I2, D2 },
+	{ D2, I2, I2, I2 },
+	{ D2, I2, I4, N0 },
+	{ I2, D2, D4, N0 },
+	{ I2, D4, I2, N0 },
+	{ I2, D2, I2, D2 },
+	{ I2, D2, I2, I2 },
+	{ I2, D2, I4, N0 },
+	{ I2, I2, D4, N0 },
+	{ I2, I2, D2, I2 },
+	{ I2, I2, I2, D2 },
+	{ I2, I2, I2, I2 },
+	{ I2, I2, I4, N0 },
+	{ I4, D4, N0, N0 },
+	{ I4, D2, I2, N0 },
+	{ I4, I2, D2, N0 },
+	{ I4, I2, I2, N0 },
+	{ I4, I4, N0, N0 },
+	{ I8, N0, N0, N0 }
+};
+
+struct sw842_param {
+	u8 *in;
+	u8 bit;
+	u64 ilen;
+	u8 *out;
+	u8 *ostart;
+	u64 olen;
+};
+
+#define beN_to_cpu(d, s)					\
+	((s) == 2 ? be16_to_cpu(get_unaligned((__be16 *)d)) :	\
+	 (s) == 4 ? be32_to_cpu(get_unaligned((__be32 *)d)) :	\
+	 (s) == 8 ? be64_to_cpu(get_unaligned((__be64 *)d)) :	\
+	 WARN(1, "pr_debug param err invalid size %x\n", s))
+
+static int next_bits(struct sw842_param *p, u64 *d, u8 n);
+
+static int __split_next_bits(struct sw842_param *p, u64 *d, u8 n, u8 s)
+{
+	u64 tmp = 0;
+	int ret;
+
+	if (n <= s) {
+		pr_debug("split_next_bits invalid n %u s %u\n", n, s);
+		return -EINVAL;
+	}
+
+	ret = next_bits(p, &tmp, n - s);
+	if (ret)
+		return ret;
+	ret = next_bits(p, d, s);
+	if (ret)
+		return ret;
+	*d |= tmp << s;
+	return 0;
+}
+
+static int next_bits(struct sw842_param *p, u64 *d, u8 n)
+{
+	u8 *in = p->in, b = p->bit, bits = b + n;
+
+	if (n > 64) {
+		pr_debug("next_bits invalid n %u\n", n);
+		return -EINVAL;
+	}
+
+	/* split this up if reading > 8 bytes, or if we're at the end of
+	 * the input buffer and would read past the end
+	 */
+	if (bits > 64)
+		return __split_next_bits(p, d, n, 32);
+	else if (p->ilen < 8 && bits > 32 && bits <= 56)
+		return __split_next_bits(p, d, n, 16);
+	else if (p->ilen < 4 && bits > 16 && bits <= 24)
+		return __split_next_bits(p, d, n, 8);
+
+	if (DIV_ROUND_UP(bits, 8) > p->ilen)
+		return -EOVERFLOW;
+
+	if (bits <= 8)
+		*d = *in >> (8 - bits);
+	else if (bits <= 16)
+		*d = be16_to_cpu(get_unaligned((__be16 *)in)) >> (16 - bits);
+	else if (bits <= 32)
+		*d = be32_to_cpu(get_unaligned((__be32 *)in)) >> (32 - bits);
+	else
+		*d = be64_to_cpu(get_unaligned((__be64 *)in)) >> (64 - bits);
+
+	*d &= GENMASK_ULL(n - 1, 0);
+
+	p->bit += n;
+
+	if (p->bit > 7) {
+		p->in += p->bit / 8;
+		p->ilen -= p->bit / 8;
+		p->bit %= 8;
+	}
+
+	return 0;
+}
+
+static int do_data(struct sw842_param *p, u8 n)
+{
+	u64 v;
+	int ret;
+
+	if (n > p->olen)
+		return -ENOSPC;
+
+	ret = next_bits(p, &v, n * 8);
+	if (ret)
+		return ret;
+
+	switch (n) {
+	case 2:
+		put_unaligned(cpu_to_be16((u16)v), (__be16 *)p->out);
+		break;
+	case 4:
+		put_unaligned(cpu_to_be32((u32)v), (__be32 *)p->out);
+		break;
+	case 8:
+		put_unaligned(cpu_to_be64((u64)v), (__be64 *)p->out);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	p->out += n;
+	p->olen -= n;
+
+	return 0;
+}
+
+static int __do_index(struct sw842_param *p, u8 size, u8 bits, u64 fsize)
+{
+	u64 index, offset, total = round_down(p->out - p->ostart, 8);
+	int ret;
+
+	ret = next_bits(p, &index, bits);
+	if (ret)
+		return ret;
+
+	offset = index * size;
+
+	/* a ring buffer of fsize is used; correct the offset */
+	if (total > fsize) {
+		/* this is where the current fifo is */
+		u64 section = round_down(total, fsize);
+		/* the current pos in the fifo */
+		u64 pos = total % fsize;
+
+		/* if the offset is past/at the pos, we need to
+		 * go back to the last fifo section
+		 */
+		if (offset >= pos)
+			section -= fsize;
+
+		offset += section;
+	}
+
+	if (offset + size > total) {
+		pr_debug("index%x %lx points past end %lx\n", size,
+			 (unsigned long)offset, (unsigned long)total);
+		return -EINVAL;
+	}
+
+	pr_debug("index%x to %lx off %lx adjoff %lx tot %lx data %lx\n",
+		 size, (unsigned long)index, (unsigned long)(index * size),
+		 (unsigned long)offset, (unsigned long)total,
+		 (unsigned long)beN_to_cpu(&p->ostart[offset], size));
+
+	memcpy(p->out, &p->ostart[offset], size);
+	p->out += size;
+	p->olen -= size;
+
+	return 0;
+}
+
+int do_index(struct sw842_param *p, u8 n)
+{
+	switch (n) {
+	case 2:
+		return __do_index(p, 2, I2_BITS, I2_FIFO_SIZE);
+	case 4:
+		return __do_index(p, 4, I4_BITS, I4_FIFO_SIZE);
+	case 8:
+		return __do_index(p, 8, I8_BITS, I8_FIFO_SIZE);
+	default:
+		return -EINVAL;
+	}
+}
+
+int do_op(struct sw842_param *p, u8 o)
+{
+	int i, ret = 0;
+
+	if (o >= OPS_MAX)
+		return -EINVAL;
+
+	for (i = 0; i < 4; i++) {
+		u8 op = decomp_ops[o][i];
+
+		pr_debug("op is %x\n", op);
+
+		switch (op & OP_ACTION) {
+		case OP_ACTION_DATA:
+			ret = do_data(p, op & OP_AMOUNT);
+			break;
+		case OP_ACTION_INDEX:
+			ret = do_index(p, op & OP_AMOUNT);
+			break;
+		case OP_ACTION_NOOP:
+			break;
+		default:
+			pr_err("Interal error, invalid op %x\n", op);
+			return -EINVAL;
+		}
+
+		if (ret)
+			return ret;
+	}
+
+	if (sw842_template_counts)
+		atomic_inc(&template_count[o]);
+
+	return 0;
+}
+
+/**
+ * sw842_decompress
+ *
+ * Decompress the 842-compressed buffer of length @ilen at @in
+ * to the output buffer @out, using no more than @olen bytes.
+ *
+ * The compressed buffer must be only a single 842-compressed buffer,
+ * with the standard format described in the comments in 842.h
+ * Processing will stop when the 842 "END" template is detected,
+ * not the end of the buffer.
+ *
+ * Returns: 0 on success, error on failure.  The @olen parameter
+ * will contain the number of output bytes written on success, or
+ * 0 on error.
+ */
+int sw842_decompress(const u8 *in, unsigned int ilen,
+		     u8 *out, unsigned int *olen)
+{
+	struct sw842_param p;
+	int ret;
+	u64 op, rep, tmp, bytes, total;
+
+	p.in = (u8 *)in;
+	p.bit = 0;
+	p.ilen = ilen;
+	p.out = out;
+	p.ostart = out;
+	p.olen = *olen;
+
+	total = p.olen;
+
+	*olen = 0;
+
+	do {
+		ret = next_bits(&p, &op, OP_BITS);
+		if (ret)
+			return ret;
+
+		pr_debug("template is %lx\n", (unsigned long)op);
+
+		switch (op) {
+		case OP_REPEAT:
+			ret = next_bits(&p, &rep, REPEAT_BITS);
+			if (ret)
+				return ret;
+
+			if (p.out == out) /* no previous bytes */
+				return -EINVAL;
+
+			/* copy rep + 1 */
+			rep++;
+
+			if (rep * 8 > p.olen)
+				return -ENOSPC;
+
+			while (rep-- > 0) {
+				memcpy(p.out, p.out - 8, 8);
+				p.out += 8;
+				p.olen -= 8;
+			}
+
+			if (sw842_template_counts)
+				atomic_inc(&template_repeat_count);
+
+			break;
+		case OP_ZEROS:
+			if (8 > p.olen)
+				return -ENOSPC;
+
+			memset(p.out, 0, 8);
+			p.out += 8;
+			p.olen -= 8;
+
+			if (sw842_template_counts)
+				atomic_inc(&template_zeros_count);
+
+			break;
+		case OP_SHORT_DATA:
+			ret = next_bits(&p, &bytes, SHORT_DATA_BITS);
+			if (ret)
+				return ret;
+
+			if (!bytes || bytes > SHORT_DATA_BITS_MAX)
+				return -EINVAL;
+
+			while (bytes-- > 0) {
+				ret = next_bits(&p, &tmp, 8);
+				if (ret)
+					return ret;
+				*p.out = (u8)tmp;
+				p.out++;
+				p.olen--;
+			}
+
+			if (sw842_template_counts)
+				atomic_inc(&template_short_data_count);
+
+			break;
+		case OP_END:
+			if (sw842_template_counts)
+				atomic_inc(&template_end_count);
+
+			break;
+		default: /* use template */
+			ret = do_op(&p, op);
+			if (ret)
+				return ret;
+			break;
+		}
+	} while (op != OP_END);
+
+	if (unlikely((total - p.olen) > UINT_MAX))
+		return -ENOSPC;
+
+	*olen = total - p.olen;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(sw842_decompress);
+
+static int __init sw842_init(void)
+{
+	if (sw842_template_counts)
+		sw842_debugfs_create();
+
+	return 0;
+}
+module_init(sw842_init);
+
+static void __exit sw842_exit(void)
+{
+	if (sw842_template_counts)
+		sw842_debugfs_remove();
+}
+module_exit(sw842_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Software 842 Decompressor");
+MODULE_AUTHOR("Dan Streetman <ddstreet@ieee.org>");
diff --git a/lib/842/Makefile b/lib/842/Makefile
new file mode 100644
index 0000000..5d24c0b
--- /dev/null
+++ b/lib/842/Makefile
@@ -0,0 +1,2 @@
+obj-$(CONFIG_842_COMPRESS) += 842_compress.o
+obj-$(CONFIG_842_DECOMPRESS) += 842_decompress.o
diff --git a/lib/Kconfig b/lib/Kconfig
index 601965a..34e332b 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -212,6 +212,12 @@ config RANDOM32_SELFTEST
 #
 # compression support is select'ed if needed
 #
+config 842_COMPRESS
+	tristate
+
+config 842_DECOMPRESS
+	tristate
+
 config ZLIB_INFLATE
 	tristate
 
diff --git a/lib/Makefile b/lib/Makefile
index 6c37933..ff37c8c 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -78,6 +78,8 @@ obj-$(CONFIG_LIBCRC32C)	+= libcrc32c.o
 obj-$(CONFIG_CRC8)	+= crc8.o
 obj-$(CONFIG_GENERIC_ALLOCATOR) += genalloc.o
 
+obj-$(CONFIG_842_COMPRESS) += 842/
+obj-$(CONFIG_842_DECOMPRESS) += 842/
 obj-$(CONFIG_ZLIB_INFLATE) += zlib_inflate/
 obj-$(CONFIG_ZLIB_DEFLATE) += zlib_deflate/
 obj-$(CONFIG_REED_SOLOMON) += reed_solomon/
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 04/10] crypto: change 842 alg to use software
  2015-05-06 16:50 ` [PATCHv2 00/10] add 842 hw compression for PowerNV platform Dan Streetman
                     ` (2 preceding siblings ...)
  2015-05-06 16:50   ` [PATCH 03/10] lib: add software 842 compression/decompression Dan Streetman
@ 2015-05-06 16:51   ` Dan Streetman
  2015-05-07  3:07     ` Herbert Xu
  2015-05-06 16:51   ` [PATCH 05/10] drivers/crypto/nx: rename nx-842.c to nx-842-pseries.c Dan Streetman
                     ` (6 subsequent siblings)
  10 siblings, 1 reply; 46+ messages in thread
From: Dan Streetman @ 2015-05-06 16:51 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras
  Cc: Dan Streetman, Seth Jennings, Robert Jennings, linux-crypto,
	linuxppc-dev, linux-kernel

Change the crypto 842 compression alg to use the software 842 compression
and decompression library.  Change the name of this crypto alg to "sw842".
Remove the fallback to LZO compression.

Previously, this crypto compression alg attemped 842 compression using
PowerPC hardware, and fell back to LZO compression and decompression if
the 842 PowerPC hardware was unavailable or failed.  This should not
fall back to any other compression method, however; users of this crypto
compression alg can fallback if desired, and transparent fallback tricks
callers into thinking they are getting 842 compression when they actually
get LZO compression - the failure of the 842 hardware should not be
transparent to the caller.

The crypto compression alg for a hardware device also should not be located
in crypto/ so this is now a software-only implementation that uses the 842
software compression/decompression library.  Since users of the "842" alg
expected hardware compression, the name of this software-only alg is
changed to "sw842"; the new hardware 842 crypto compression alg will be
aliased to "842" in a later patch.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
---
 MAINTAINERS    |   1 +
 crypto/842.c   | 175 ++++++++++++---------------------------------------------
 crypto/Kconfig |   7 +--
 3 files changed, 41 insertions(+), 142 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 116af01..5a5c1dc 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4873,6 +4873,7 @@ S:	Supported
 F:	drivers/crypto/nx/nx-842.c
 F:	include/linux/nx842.h
 F:	include/linux/sw842.h
+F:	crypto/842.c
 F:	lib/842/
 
 IBM Power Linux RAID adapter
diff --git a/crypto/842.c b/crypto/842.c
index b48f4f1..c43b157 100644
--- a/crypto/842.c
+++ b/crypto/842.c
@@ -1,5 +1,5 @@
 /*
- * Cryptographic API for the 842 compression algorithm.
+ * Cryptographic API for the 842 software compression algorithm.
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -11,173 +11,72 @@
  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  * GNU General Public License for more details.
  *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ * Copyright (C) IBM Corporation, 2011-2015
  *
- * Copyright (C) IBM Corporation, 2011
+ * Original Authors: Robert Jennings <rcj@linux.vnet.ibm.com>
+ *                   Seth Jennings <sjenning@linux.vnet.ibm.com>
  *
- * Authors: Robert Jennings <rcj@linux.vnet.ibm.com>
- *          Seth Jennings <sjenning@linux.vnet.ibm.com>
+ * Rewrite: Dan Streetman <ddstreet@ieee.org>
+ *
+ * This is the software implementation of compression and decompression using
+ * the 842 format.  This uses the software 842 library at lib/842/ which is
+ * only a reference implementation, and is very, very slow as compared to other
+ * software compressors.  You probably do not want to use this software
+ * compression.  If you have access to the PowerPC 842 compression hardware, you
+ * want to use the 842 hardware compression interface, which is at:
+ * drivers/crypto/nx/nx-842-crypto.c
  */
 
 #include <linux/init.h>
 #include <linux/module.h>
 #include <linux/crypto.h>
-#include <linux/vmalloc.h>
-#include <linux/nx842.h>
-#include <linux/lzo.h>
-#include <linux/timer.h>
-
-static int nx842_uselzo;
-
-struct nx842_ctx {
-	void *nx842_wmem; /* working memory for 842/lzo */
-};
+#include <linux/sw842.h>
 
-enum nx842_crypto_type {
-	NX842_CRYPTO_TYPE_842,
-	NX842_CRYPTO_TYPE_LZO
-};
-
-#define NX842_SENTINEL 0xdeadbeef
+#define CRYPTO842_NAME	"sw842"
 
-struct nx842_crypto_header {
-	unsigned int sentinel; /* debug */
-	enum nx842_crypto_type type;
+struct crypto842_ctx {
+	char wmem[SW842_MEM_COMPRESS];	/* working memory for compress */
 };
 
-static int nx842_init(struct crypto_tfm *tfm)
-{
-	struct nx842_ctx *ctx = crypto_tfm_ctx(tfm);
-	int wmemsize;
-
-	wmemsize = max_t(int, nx842_get_workmem_size(), LZO1X_MEM_COMPRESS);
-	ctx->nx842_wmem = kmalloc(wmemsize, GFP_NOFS);
-	if (!ctx->nx842_wmem)
-		return -ENOMEM;
-
-	return 0;
-}
-
-static void nx842_exit(struct crypto_tfm *tfm)
+static int crypto842_compress(struct crypto_tfm *tfm,
+			      const u8 *src, unsigned int slen,
+			      u8 *dst, unsigned int *dlen)
 {
-	struct nx842_ctx *ctx = crypto_tfm_ctx(tfm);
+	struct crypto842_ctx *ctx = crypto_tfm_ctx(tfm);
 
-	kfree(ctx->nx842_wmem);
+	return sw842_compress(src, slen, dst, dlen, ctx->wmem);
 }
 
-static void nx842_reset_uselzo(unsigned long data)
+static int crypto842_decompress(struct crypto_tfm *tfm,
+				const u8 *src, unsigned int slen,
+				u8 *dst, unsigned int *dlen)
 {
-	nx842_uselzo = 0;
-}
-
-static DEFINE_TIMER(failover_timer, nx842_reset_uselzo, 0, 0);
-
-static int nx842_crypto_compress(struct crypto_tfm *tfm, const u8 *src,
-			    unsigned int slen, u8 *dst, unsigned int *dlen)
-{
-	struct nx842_ctx *ctx = crypto_tfm_ctx(tfm);
-	struct nx842_crypto_header *hdr;
-	unsigned int tmp_len = *dlen;
-	size_t lzodlen; /* needed for lzo */
-	int err;
-
-	*dlen = 0;
-	hdr = (struct nx842_crypto_header *)dst;
-	hdr->sentinel = NX842_SENTINEL; /* debug */
-	dst += sizeof(struct nx842_crypto_header);
-	tmp_len -= sizeof(struct nx842_crypto_header);
-	lzodlen = tmp_len;
-
-	if (likely(!nx842_uselzo)) {
-		err = nx842_compress(src, slen, dst, &tmp_len, ctx->nx842_wmem);
-
-		if (likely(!err)) {
-			hdr->type = NX842_CRYPTO_TYPE_842;
-			*dlen = tmp_len + sizeof(struct nx842_crypto_header);
-			return 0;
-		}
-
-		/* hardware failed */
-		nx842_uselzo = 1;
-
-		/* set timer to check for hardware again in 1 second */
-		mod_timer(&failover_timer, jiffies + msecs_to_jiffies(1000));
-	}
-
-	/* no hardware, use lzo */
-	err = lzo1x_1_compress(src, slen, dst, &lzodlen, ctx->nx842_wmem);
-	if (err != LZO_E_OK)
-		return -EINVAL;
-
-	hdr->type = NX842_CRYPTO_TYPE_LZO;
-	*dlen = lzodlen + sizeof(struct nx842_crypto_header);
-	return 0;
-}
-
-static int nx842_crypto_decompress(struct crypto_tfm *tfm, const u8 *src,
-			      unsigned int slen, u8 *dst, unsigned int *dlen)
-{
-	struct nx842_ctx *ctx = crypto_tfm_ctx(tfm);
-	struct nx842_crypto_header *hdr;
-	unsigned int tmp_len = *dlen;
-	size_t lzodlen; /* needed for lzo */
-	int err;
-
-	*dlen = 0;
-	hdr = (struct nx842_crypto_header *)src;
-
-	if (unlikely(hdr->sentinel != NX842_SENTINEL))
-		return -EINVAL;
-
-	src += sizeof(struct nx842_crypto_header);
-	slen -= sizeof(struct nx842_crypto_header);
-
-	if (likely(hdr->type == NX842_CRYPTO_TYPE_842)) {
-		err = nx842_decompress(src, slen, dst, &tmp_len,
-			ctx->nx842_wmem);
-		if (err)
-			return -EINVAL;
-		*dlen = tmp_len;
-	} else if (hdr->type == NX842_CRYPTO_TYPE_LZO) {
-		lzodlen = tmp_len;
-		err = lzo1x_decompress_safe(src, slen, dst, &lzodlen);
-		if (err != LZO_E_OK)
-			return -EINVAL;
-		*dlen = lzodlen;
-	} else
-		return -EINVAL;
-
-	return 0;
+	return sw842_decompress(src, slen, dst, dlen);
 }
 
 static struct crypto_alg alg = {
-	.cra_name		= "842",
+	.cra_name		= CRYPTO842_NAME,
 	.cra_flags		= CRYPTO_ALG_TYPE_COMPRESS,
-	.cra_ctxsize		= sizeof(struct nx842_ctx),
+	.cra_ctxsize		= sizeof(struct crypto842_ctx),
 	.cra_module		= THIS_MODULE,
-	.cra_init		= nx842_init,
-	.cra_exit		= nx842_exit,
 	.cra_u			= { .compress = {
-	.coa_compress		= nx842_crypto_compress,
-	.coa_decompress		= nx842_crypto_decompress } }
+	.coa_compress		= crypto842_compress,
+	.coa_decompress		= crypto842_decompress } }
 };
 
-static int __init nx842_mod_init(void)
+static int __init crypto842_mod_init(void)
 {
-	del_timer(&failover_timer);
 	return crypto_register_alg(&alg);
 }
+module_init(crypto842_mod_init);
 
-static void __exit nx842_mod_exit(void)
+static void __exit crypto842_mod_exit(void)
 {
 	crypto_unregister_alg(&alg);
 }
-
-module_init(nx842_mod_init);
-module_exit(nx842_mod_exit);
+module_exit(crypto842_mod_exit);
 
 MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("842 Compression Algorithm");
-MODULE_ALIAS_CRYPTO("842");
+MODULE_DESCRIPTION("842 Software Compression Algorithm");
+MODULE_ALIAS_CRYPTO(CRYPTO842_NAME);
+MODULE_AUTHOR("Dan Streetman <ddstreet@ieee.org>");
diff --git a/crypto/Kconfig b/crypto/Kconfig
index 8aaf298..eba55b4 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -1412,10 +1412,9 @@ config CRYPTO_LZO
 
 config CRYPTO_842
 	tristate "842 compression algorithm"
-	depends on CRYPTO_DEV_NX_COMPRESS
-	# 842 uses lzo if the hardware becomes unavailable
-	select LZO_COMPRESS
-	select LZO_DECOMPRESS
+	select CRYPTO_ALGAPI
+	select 842_COMPRESS
+	select 842_DECOMPRESS
 	help
 	  This is the 842 algorithm.
 
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 05/10] drivers/crypto/nx: rename nx-842.c to nx-842-pseries.c
  2015-05-06 16:50 ` [PATCHv2 00/10] add 842 hw compression for PowerNV platform Dan Streetman
                     ` (3 preceding siblings ...)
  2015-05-06 16:51   ` [PATCH 04/10] crypto: change 842 alg to use software Dan Streetman
@ 2015-05-06 16:51   ` Dan Streetman
  2015-05-06 16:51   ` [PATCH 06/10] drivers/crypto/nx: add NX-842 platform frontend driver Dan Streetman
                     ` (5 subsequent siblings)
  10 siblings, 0 replies; 46+ messages in thread
From: Dan Streetman @ 2015-05-06 16:51 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras
  Cc: Dan Streetman, Seth Jennings, Robert Jennings, linux-crypto,
	linuxppc-dev, linux-kernel

Move the entire NX-842 driver for the pSeries platform from the file
nx-842.c to nx-842-pseries.c.  This is required by later patches that
add NX-842 support for the PowerNV platform.

This patch does not alter the content of the pSeries NX-842 driver at
all, it only changes the filename.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
---
 drivers/crypto/nx/Makefile         |    2 +-
 drivers/crypto/nx/nx-842-pseries.c | 1603 ++++++++++++++++++++++++++++++++++++
 drivers/crypto/nx/nx-842.c         | 1603 ------------------------------------
 3 files changed, 1604 insertions(+), 1604 deletions(-)
 create mode 100644 drivers/crypto/nx/nx-842-pseries.c
 delete mode 100644 drivers/crypto/nx/nx-842.c

diff --git a/drivers/crypto/nx/Makefile b/drivers/crypto/nx/Makefile
index bb770ea..8669ffa 100644
--- a/drivers/crypto/nx/Makefile
+++ b/drivers/crypto/nx/Makefile
@@ -11,4 +11,4 @@ nx-crypto-objs := nx.o \
 		  nx-sha512.o
 
 obj-$(CONFIG_CRYPTO_DEV_NX_COMPRESS) += nx-compress.o
-nx-compress-objs := nx-842.o
+nx-compress-objs := nx-842-pseries.o
diff --git a/drivers/crypto/nx/nx-842-pseries.c b/drivers/crypto/nx/nx-842-pseries.c
new file mode 100644
index 0000000..887196e
--- /dev/null
+++ b/drivers/crypto/nx/nx-842-pseries.c
@@ -0,0 +1,1603 @@
+/*
+ * Driver for IBM Power 842 compression accelerator
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright (C) IBM Corporation, 2012
+ *
+ * Authors: Robert Jennings <rcj@linux.vnet.ibm.com>
+ *          Seth Jennings <sjenning@linux.vnet.ibm.com>
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/nx842.h>
+#include <linux/of.h>
+#include <linux/slab.h>
+
+#include <asm/page.h>
+#include <asm/vio.h>
+
+#include "nx_csbcpb.h" /* struct nx_csbcpb */
+
+#define MODULE_NAME "nx-compress"
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Robert Jennings <rcj@linux.vnet.ibm.com>");
+MODULE_DESCRIPTION("842 H/W Compression driver for IBM Power processors");
+
+#define SHIFT_4K 12
+#define SHIFT_64K 16
+#define SIZE_4K (1UL << SHIFT_4K)
+#define SIZE_64K (1UL << SHIFT_64K)
+
+/* IO buffer must be 128 byte aligned */
+#define IO_BUFFER_ALIGN 128
+
+struct nx842_header {
+	int blocks_nr; /* number of compressed blocks */
+	int offset; /* offset of the first block (from beginning of header) */
+	int sizes[0]; /* size of compressed blocks */
+};
+
+static inline int nx842_header_size(const struct nx842_header *hdr)
+{
+	return sizeof(struct nx842_header) +
+			hdr->blocks_nr * sizeof(hdr->sizes[0]);
+}
+
+/* Macros for fields within nx_csbcpb */
+/* Check the valid bit within the csbcpb valid field */
+#define NX842_CSBCBP_VALID_CHK(x) (x & BIT_MASK(7))
+
+/* CE macros operate on the completion_extension field bits in the csbcpb.
+ * CE0 0=full completion, 1=partial completion
+ * CE1 0=CE0 indicates completion, 1=termination (output may be modified)
+ * CE2 0=processed_bytes is source bytes, 1=processed_bytes is target bytes */
+#define NX842_CSBCPB_CE0(x)	(x & BIT_MASK(7))
+#define NX842_CSBCPB_CE1(x)	(x & BIT_MASK(6))
+#define NX842_CSBCPB_CE2(x)	(x & BIT_MASK(5))
+
+/* The NX unit accepts data only on 4K page boundaries */
+#define NX842_HW_PAGE_SHIFT	SHIFT_4K
+#define NX842_HW_PAGE_SIZE	(ASM_CONST(1) << NX842_HW_PAGE_SHIFT)
+#define NX842_HW_PAGE_MASK	(~(NX842_HW_PAGE_SIZE-1))
+
+enum nx842_status {
+	UNAVAILABLE,
+	AVAILABLE
+};
+
+struct ibm_nx842_counters {
+	atomic64_t comp_complete;
+	atomic64_t comp_failed;
+	atomic64_t decomp_complete;
+	atomic64_t decomp_failed;
+	atomic64_t swdecomp;
+	atomic64_t comp_times[32];
+	atomic64_t decomp_times[32];
+};
+
+static struct nx842_devdata {
+	struct vio_dev *vdev;
+	struct device *dev;
+	struct ibm_nx842_counters *counters;
+	unsigned int max_sg_len;
+	unsigned int max_sync_size;
+	unsigned int max_sync_sg;
+	enum nx842_status status;
+} __rcu *devdata;
+static DEFINE_SPINLOCK(devdata_mutex);
+
+#define NX842_COUNTER_INC(_x) \
+static inline void nx842_inc_##_x( \
+	const struct nx842_devdata *dev) { \
+	if (dev) \
+		atomic64_inc(&dev->counters->_x); \
+}
+NX842_COUNTER_INC(comp_complete);
+NX842_COUNTER_INC(comp_failed);
+NX842_COUNTER_INC(decomp_complete);
+NX842_COUNTER_INC(decomp_failed);
+NX842_COUNTER_INC(swdecomp);
+
+#define NX842_HIST_SLOTS 16
+
+static void ibm_nx842_incr_hist(atomic64_t *times, unsigned int time)
+{
+	int bucket = fls(time);
+
+	if (bucket)
+		bucket = min((NX842_HIST_SLOTS - 1), bucket - 1);
+
+	atomic64_inc(&times[bucket]);
+}
+
+/* NX unit operation flags */
+#define NX842_OP_COMPRESS	0x0
+#define NX842_OP_CRC		0x1
+#define NX842_OP_DECOMPRESS	0x2
+#define NX842_OP_COMPRESS_CRC   (NX842_OP_COMPRESS | NX842_OP_CRC)
+#define NX842_OP_DECOMPRESS_CRC (NX842_OP_DECOMPRESS | NX842_OP_CRC)
+#define NX842_OP_ASYNC		(1<<23)
+#define NX842_OP_NOTIFY		(1<<22)
+#define NX842_OP_NOTIFY_INT(x)	((x & 0xff)<<8)
+
+static unsigned long nx842_get_desired_dma(struct vio_dev *viodev)
+{
+	/* No use of DMA mappings within the driver. */
+	return 0;
+}
+
+struct nx842_slentry {
+	unsigned long ptr; /* Real address (use __pa()) */
+	unsigned long len;
+};
+
+/* pHyp scatterlist entry */
+struct nx842_scatterlist {
+	int entry_nr; /* number of slentries */
+	struct nx842_slentry *entries; /* ptr to array of slentries */
+};
+
+/* Does not include sizeof(entry_nr) in the size */
+static inline unsigned long nx842_get_scatterlist_size(
+				struct nx842_scatterlist *sl)
+{
+	return sl->entry_nr * sizeof(struct nx842_slentry);
+}
+
+static inline unsigned long nx842_get_pa(void *addr)
+{
+	if (is_vmalloc_addr(addr))
+		return page_to_phys(vmalloc_to_page(addr))
+		       + offset_in_page(addr);
+	else
+		return __pa(addr);
+}
+
+static int nx842_build_scatterlist(unsigned long buf, int len,
+			struct nx842_scatterlist *sl)
+{
+	unsigned long nextpage;
+	struct nx842_slentry *entry;
+
+	sl->entry_nr = 0;
+
+	entry = sl->entries;
+	while (len) {
+		entry->ptr = nx842_get_pa((void *)buf);
+		nextpage = ALIGN(buf + 1, NX842_HW_PAGE_SIZE);
+		if (nextpage < buf + len) {
+			/* we aren't at the end yet */
+			if (IS_ALIGNED(buf, NX842_HW_PAGE_SIZE))
+				/* we are in the middle (or beginning) */
+				entry->len = NX842_HW_PAGE_SIZE;
+			else
+				/* we are at the beginning */
+				entry->len = nextpage - buf;
+		} else {
+			/* at the end */
+			entry->len = len;
+		}
+
+		len -= entry->len;
+		buf += entry->len;
+		sl->entry_nr++;
+		entry++;
+	}
+
+	return 0;
+}
+
+/*
+ * Working memory for software decompression
+ */
+struct sw842_fifo {
+	union {
+		char f8[256][8];
+		char f4[512][4];
+	};
+	char f2[256][2];
+	unsigned char f84_full;
+	unsigned char f2_full;
+	unsigned char f8_count;
+	unsigned char f2_count;
+	unsigned int f4_count;
+};
+
+/*
+ * Working memory for crypto API
+ */
+struct nx842_workmem {
+	char bounce[PAGE_SIZE]; /* bounce buffer for decompression input */
+	union {
+		/* hardware working memory */
+		struct {
+			/* scatterlist */
+			char slin[SIZE_4K];
+			char slout[SIZE_4K];
+			/* coprocessor status/parameter block */
+			struct nx_csbcpb csbcpb;
+		};
+		/* software working memory */
+		struct sw842_fifo swfifo; /* software decompression fifo */
+	};
+};
+
+int nx842_get_workmem_size(void)
+{
+	return sizeof(struct nx842_workmem) + NX842_HW_PAGE_SIZE;
+}
+EXPORT_SYMBOL_GPL(nx842_get_workmem_size);
+
+int nx842_get_workmem_size_aligned(void)
+{
+	return sizeof(struct nx842_workmem);
+}
+EXPORT_SYMBOL_GPL(nx842_get_workmem_size_aligned);
+
+static int nx842_validate_result(struct device *dev,
+	struct cop_status_block *csb)
+{
+	/* The csb must be valid after returning from vio_h_cop_sync */
+	if (!NX842_CSBCBP_VALID_CHK(csb->valid)) {
+		dev_err(dev, "%s: cspcbp not valid upon completion.\n",
+				__func__);
+		dev_dbg(dev, "valid:0x%02x cs:0x%02x cc:0x%02x ce:0x%02x\n",
+				csb->valid,
+				csb->crb_seq_number,
+				csb->completion_code,
+				csb->completion_extension);
+		dev_dbg(dev, "processed_bytes:%d address:0x%016lx\n",
+				csb->processed_byte_count,
+				(unsigned long)csb->address);
+		return -EIO;
+	}
+
+	/* Check return values from the hardware in the CSB */
+	switch (csb->completion_code) {
+	case 0:	/* Completed without error */
+		break;
+	case 64: /* Target bytes > Source bytes during compression */
+	case 13: /* Output buffer too small */
+		dev_dbg(dev, "%s: Compression output larger than input\n",
+					__func__);
+		return -ENOSPC;
+	case 66: /* Input data contains an illegal template field */
+	case 67: /* Template indicates data past the end of the input stream */
+		dev_dbg(dev, "%s: Bad data for decompression (code:%d)\n",
+					__func__, csb->completion_code);
+		return -EINVAL;
+	default:
+		dev_dbg(dev, "%s: Unspecified error (code:%d)\n",
+					__func__, csb->completion_code);
+		return -EIO;
+	}
+
+	/* Hardware sanity check */
+	if (!NX842_CSBCPB_CE2(csb->completion_extension)) {
+		dev_err(dev, "%s: No error returned by hardware, but "
+				"data returned is unusable, contact support.\n"
+				"(Additional info: csbcbp->processed bytes "
+				"does not specify processed bytes for the "
+				"target buffer.)\n", __func__);
+		return -EIO;
+	}
+
+	return 0;
+}
+
+/**
+ * nx842_compress - Compress data using the 842 algorithm
+ *
+ * Compression provide by the NX842 coprocessor on IBM Power systems.
+ * The input buffer is compressed and the result is stored in the
+ * provided output buffer.
+ *
+ * Upon return from this function @outlen contains the length of the
+ * compressed data.  If there is an error then @outlen will be 0 and an
+ * error will be specified by the return code from this function.
+ *
+ * @in: Pointer to input buffer, must be page aligned
+ * @inlen: Length of input buffer, must be PAGE_SIZE
+ * @out: Pointer to output buffer
+ * @outlen: Length of output buffer
+ * @wrkmem: ptr to buffer for working memory, size determined by
+ *          nx842_get_workmem_size()
+ *
+ * Returns:
+ *   0		Success, output of length @outlen stored in the buffer at @out
+ *   -ENOMEM	Unable to allocate internal buffers
+ *   -ENOSPC	Output buffer is to small
+ *   -EMSGSIZE	XXX Difficult to describe this limitation
+ *   -EIO	Internal error
+ *   -ENODEV	Hardware unavailable
+ */
+int nx842_compress(const unsigned char *in, unsigned int inlen,
+		       unsigned char *out, unsigned int *outlen, void *wmem)
+{
+	struct nx842_header *hdr;
+	struct nx842_devdata *local_devdata;
+	struct device *dev = NULL;
+	struct nx842_workmem *workmem;
+	struct nx842_scatterlist slin, slout;
+	struct nx_csbcpb *csbcpb;
+	int ret = 0, max_sync_size, i, bytesleft, size, hdrsize;
+	unsigned long inbuf, outbuf, padding;
+	struct vio_pfo_op op = {
+		.done = NULL,
+		.handle = 0,
+		.timeout = 0,
+	};
+	unsigned long start_time = get_tb();
+
+	/*
+	 * Make sure input buffer is 64k page aligned.  This is assumed since
+	 * this driver is designed for page compression only (for now).  This
+	 * is very nice since we can now use direct DDE(s) for the input and
+	 * the alignment is guaranteed.
+	*/
+	inbuf = (unsigned long)in;
+	if (!IS_ALIGNED(inbuf, PAGE_SIZE) || inlen != PAGE_SIZE)
+		return -EINVAL;
+
+	rcu_read_lock();
+	local_devdata = rcu_dereference(devdata);
+	if (!local_devdata || !local_devdata->dev) {
+		rcu_read_unlock();
+		return -ENODEV;
+	}
+	max_sync_size = local_devdata->max_sync_size;
+	dev = local_devdata->dev;
+
+	/* Create the header */
+	hdr = (struct nx842_header *)out;
+	hdr->blocks_nr = PAGE_SIZE / max_sync_size;
+	hdrsize = nx842_header_size(hdr);
+	outbuf = (unsigned long)out + hdrsize;
+	bytesleft = *outlen - hdrsize;
+
+	/* Init scatterlist */
+	workmem = (struct nx842_workmem *)ALIGN((unsigned long)wmem,
+		NX842_HW_PAGE_SIZE);
+	slin.entries = (struct nx842_slentry *)workmem->slin;
+	slout.entries = (struct nx842_slentry *)workmem->slout;
+
+	/* Init operation */
+	op.flags = NX842_OP_COMPRESS;
+	csbcpb = &workmem->csbcpb;
+	memset(csbcpb, 0, sizeof(*csbcpb));
+	op.csbcpb = nx842_get_pa(csbcpb);
+	op.out = nx842_get_pa(slout.entries);
+
+	for (i = 0; i < hdr->blocks_nr; i++) {
+		/*
+		 * Aligning the output blocks to 128 bytes does waste space,
+		 * but it prevents the need for bounce buffers and memory
+		 * copies.  It also simplifies the code a lot.  In the worst
+		 * case (64k page, 4k max_sync_size), you lose up to
+		 * (128*16)/64k = ~3% the compression factor. For 64k
+		 * max_sync_size, the loss would be at most 128/64k = ~0.2%.
+		 */
+		padding = ALIGN(outbuf, IO_BUFFER_ALIGN) - outbuf;
+		outbuf += padding;
+		bytesleft -= padding;
+		if (i == 0)
+			/* save offset into first block in header */
+			hdr->offset = padding + hdrsize;
+
+		if (bytesleft <= 0) {
+			ret = -ENOSPC;
+			goto unlock;
+		}
+
+		/*
+		 * NOTE: If the default max_sync_size is changed from 4k
+		 * to 64k, remove the "likely" case below, since a
+		 * scatterlist will always be needed.
+		 */
+		if (likely(max_sync_size == NX842_HW_PAGE_SIZE)) {
+			/* Create direct DDE */
+			op.in = nx842_get_pa((void *)inbuf);
+			op.inlen = max_sync_size;
+
+		} else {
+			/* Create indirect DDE (scatterlist) */
+			nx842_build_scatterlist(inbuf, max_sync_size, &slin);
+			op.in = nx842_get_pa(slin.entries);
+			op.inlen = -nx842_get_scatterlist_size(&slin);
+		}
+
+		/*
+		 * If max_sync_size != NX842_HW_PAGE_SIZE, an indirect
+		 * DDE is required for the outbuf.
+		 * If max_sync_size == NX842_HW_PAGE_SIZE, outbuf must
+		 * also be page aligned (1 in 128/4k=32 chance) in order
+		 * to use a direct DDE.
+		 * This is unlikely, just use an indirect DDE always.
+		 */
+		nx842_build_scatterlist(outbuf,
+			min(bytesleft, max_sync_size), &slout);
+		/* op.out set before loop */
+		op.outlen = -nx842_get_scatterlist_size(&slout);
+
+		/* Send request to pHyp */
+		ret = vio_h_cop_sync(local_devdata->vdev, &op);
+
+		/* Check for pHyp error */
+		if (ret) {
+			dev_dbg(dev, "%s: vio_h_cop_sync error (ret=%d, hret=%ld)\n",
+				__func__, ret, op.hcall_err);
+			ret = -EIO;
+			goto unlock;
+		}
+
+		/* Check for hardware error */
+		ret = nx842_validate_result(dev, &csbcpb->csb);
+		if (ret && ret != -ENOSPC)
+			goto unlock;
+
+		/* Handle incompressible data */
+		if (unlikely(ret == -ENOSPC)) {
+			if (bytesleft < max_sync_size) {
+				/*
+				 * Not enough space left in the output buffer
+				 * to store uncompressed block
+				 */
+				goto unlock;
+			} else {
+				/* Store incompressible block */
+				memcpy((void *)outbuf, (void *)inbuf,
+					max_sync_size);
+				hdr->sizes[i] = -max_sync_size;
+				outbuf += max_sync_size;
+				bytesleft -= max_sync_size;
+				/* Reset ret, incompressible data handled */
+				ret = 0;
+			}
+		} else {
+			/* Normal case, compression was successful */
+			size = csbcpb->csb.processed_byte_count;
+			dev_dbg(dev, "%s: processed_bytes=%d\n",
+				__func__, size);
+			hdr->sizes[i] = size;
+			outbuf += size;
+			bytesleft -= size;
+		}
+
+		inbuf += max_sync_size;
+	}
+
+	*outlen = (unsigned int)(outbuf - (unsigned long)out);
+
+unlock:
+	if (ret)
+		nx842_inc_comp_failed(local_devdata);
+	else {
+		nx842_inc_comp_complete(local_devdata);
+		ibm_nx842_incr_hist(local_devdata->counters->comp_times,
+			(get_tb() - start_time) / tb_ticks_per_usec);
+	}
+	rcu_read_unlock();
+	return ret;
+}
+EXPORT_SYMBOL_GPL(nx842_compress);
+
+static int sw842_decompress(const unsigned char *, int, unsigned char *, int *,
+			const void *);
+
+/**
+ * nx842_decompress - Decompress data using the 842 algorithm
+ *
+ * Decompression provide by the NX842 coprocessor on IBM Power systems.
+ * The input buffer is decompressed and the result is stored in the
+ * provided output buffer.  The size allocated to the output buffer is
+ * provided by the caller of this function in @outlen.  Upon return from
+ * this function @outlen contains the length of the decompressed data.
+ * If there is an error then @outlen will be 0 and an error will be
+ * specified by the return code from this function.
+ *
+ * @in: Pointer to input buffer, will use bounce buffer if not 128 byte
+ *      aligned
+ * @inlen: Length of input buffer
+ * @out: Pointer to output buffer, must be page aligned
+ * @outlen: Length of output buffer, must be PAGE_SIZE
+ * @wrkmem: ptr to buffer for working memory, size determined by
+ *          nx842_get_workmem_size()
+ *
+ * Returns:
+ *   0		Success, output of length @outlen stored in the buffer at @out
+ *   -ENODEV	Hardware decompression device is unavailable
+ *   -ENOMEM	Unable to allocate internal buffers
+ *   -ENOSPC	Output buffer is to small
+ *   -EINVAL	Bad input data encountered when attempting decompress
+ *   -EIO	Internal error
+ */
+int nx842_decompress(const unsigned char *in, unsigned int inlen,
+			 unsigned char *out, unsigned int *outlen, void *wmem)
+{
+	struct nx842_header *hdr;
+	struct nx842_devdata *local_devdata;
+	struct device *dev = NULL;
+	struct nx842_workmem *workmem;
+	struct nx842_scatterlist slin, slout;
+	struct nx_csbcpb *csbcpb;
+	int ret = 0, i, size, max_sync_size;
+	unsigned long inbuf, outbuf;
+	struct vio_pfo_op op = {
+		.done = NULL,
+		.handle = 0,
+		.timeout = 0,
+	};
+	unsigned long start_time = get_tb();
+
+	/* Ensure page alignment and size */
+	outbuf = (unsigned long)out;
+	if (!IS_ALIGNED(outbuf, PAGE_SIZE) || *outlen != PAGE_SIZE)
+		return -EINVAL;
+
+	rcu_read_lock();
+	local_devdata = rcu_dereference(devdata);
+	if (local_devdata)
+		dev = local_devdata->dev;
+
+	/* Get header */
+	hdr = (struct nx842_header *)in;
+
+	workmem = (struct nx842_workmem *)ALIGN((unsigned long)wmem,
+		NX842_HW_PAGE_SIZE);
+
+	inbuf = (unsigned long)in + hdr->offset;
+	if (likely(!IS_ALIGNED(inbuf, IO_BUFFER_ALIGN))) {
+		/* Copy block(s) into bounce buffer for alignment */
+		memcpy(workmem->bounce, in + hdr->offset, inlen - hdr->offset);
+		inbuf = (unsigned long)workmem->bounce;
+	}
+
+	/* Init scatterlist */
+	slin.entries = (struct nx842_slentry *)workmem->slin;
+	slout.entries = (struct nx842_slentry *)workmem->slout;
+
+	/* Init operation */
+	op.flags = NX842_OP_DECOMPRESS;
+	csbcpb = &workmem->csbcpb;
+	memset(csbcpb, 0, sizeof(*csbcpb));
+	op.csbcpb = nx842_get_pa(csbcpb);
+
+	/*
+	 * max_sync_size may have changed since compression,
+	 * so we can't read it from the device info. We need
+	 * to derive it from hdr->blocks_nr.
+	 */
+	max_sync_size = PAGE_SIZE / hdr->blocks_nr;
+
+	for (i = 0; i < hdr->blocks_nr; i++) {
+		/* Skip padding */
+		inbuf = ALIGN(inbuf, IO_BUFFER_ALIGN);
+
+		if (hdr->sizes[i] < 0) {
+			/* Negative sizes indicate uncompressed data blocks */
+			size = abs(hdr->sizes[i]);
+			memcpy((void *)outbuf, (void *)inbuf, size);
+			outbuf += size;
+			inbuf += size;
+			continue;
+		}
+
+		if (!dev)
+			goto sw;
+
+		/*
+		 * The better the compression, the more likely the "likely"
+		 * case becomes.
+		 */
+		if (likely((inbuf & NX842_HW_PAGE_MASK) ==
+			((inbuf + hdr->sizes[i] - 1) & NX842_HW_PAGE_MASK))) {
+			/* Create direct DDE */
+			op.in = nx842_get_pa((void *)inbuf);
+			op.inlen = hdr->sizes[i];
+		} else {
+			/* Create indirect DDE (scatterlist) */
+			nx842_build_scatterlist(inbuf, hdr->sizes[i] , &slin);
+			op.in = nx842_get_pa(slin.entries);
+			op.inlen = -nx842_get_scatterlist_size(&slin);
+		}
+
+		/*
+		 * NOTE: If the default max_sync_size is changed from 4k
+		 * to 64k, remove the "likely" case below, since a
+		 * scatterlist will always be needed.
+		 */
+		if (likely(max_sync_size == NX842_HW_PAGE_SIZE)) {
+			/* Create direct DDE */
+			op.out = nx842_get_pa((void *)outbuf);
+			op.outlen = max_sync_size;
+		} else {
+			/* Create indirect DDE (scatterlist) */
+			nx842_build_scatterlist(outbuf, max_sync_size, &slout);
+			op.out = nx842_get_pa(slout.entries);
+			op.outlen = -nx842_get_scatterlist_size(&slout);
+		}
+
+		/* Send request to pHyp */
+		ret = vio_h_cop_sync(local_devdata->vdev, &op);
+
+		/* Check for pHyp error */
+		if (ret) {
+			dev_dbg(dev, "%s: vio_h_cop_sync error (ret=%d, hret=%ld)\n",
+				__func__, ret, op.hcall_err);
+			dev = NULL;
+			goto sw;
+		}
+
+		/* Check for hardware error */
+		ret = nx842_validate_result(dev, &csbcpb->csb);
+		if (ret) {
+			dev = NULL;
+			goto sw;
+		}
+
+		/* HW decompression success */
+		inbuf += hdr->sizes[i];
+		outbuf += csbcpb->csb.processed_byte_count;
+		continue;
+
+sw:
+		/* software decompression */
+		size = max_sync_size;
+		ret = sw842_decompress(
+			(unsigned char *)inbuf, hdr->sizes[i],
+			(unsigned char *)outbuf, &size, wmem);
+		if (ret)
+			pr_debug("%s: sw842_decompress failed with %d\n",
+				__func__, ret);
+
+		if (ret) {
+			if (ret != -ENOSPC && ret != -EINVAL &&
+					ret != -EMSGSIZE)
+				ret = -EIO;
+			goto unlock;
+		}
+
+		/* SW decompression success */
+		inbuf += hdr->sizes[i];
+		outbuf += size;
+	}
+
+	*outlen = (unsigned int)(outbuf - (unsigned long)out);
+
+unlock:
+	if (ret)
+		/* decompress fail */
+		nx842_inc_decomp_failed(local_devdata);
+	else {
+		if (!dev)
+			/* software decompress */
+			nx842_inc_swdecomp(local_devdata);
+		nx842_inc_decomp_complete(local_devdata);
+		ibm_nx842_incr_hist(local_devdata->counters->decomp_times,
+			(get_tb() - start_time) / tb_ticks_per_usec);
+	}
+
+	rcu_read_unlock();
+	return ret;
+}
+EXPORT_SYMBOL_GPL(nx842_decompress);
+
+/**
+ * nx842_OF_set_defaults -- Set default (disabled) values for devdata
+ *
+ * @devdata - struct nx842_devdata to update
+ *
+ * Returns:
+ *  0 on success
+ *  -ENOENT if @devdata ptr is NULL
+ */
+static int nx842_OF_set_defaults(struct nx842_devdata *devdata)
+{
+	if (devdata) {
+		devdata->max_sync_size = 0;
+		devdata->max_sync_sg = 0;
+		devdata->max_sg_len = 0;
+		devdata->status = UNAVAILABLE;
+		return 0;
+	} else
+		return -ENOENT;
+}
+
+/**
+ * nx842_OF_upd_status -- Update the device info from OF status prop
+ *
+ * The status property indicates if the accelerator is enabled.  If the
+ * device is in the OF tree it indicates that the hardware is present.
+ * The status field indicates if the device is enabled when the status
+ * is 'okay'.  Otherwise the device driver will be disabled.
+ *
+ * @devdata - struct nx842_devdata to update
+ * @prop - struct property point containing the maxsyncop for the update
+ *
+ * Returns:
+ *  0 - Device is available
+ *  -EINVAL - Device is not available
+ */
+static int nx842_OF_upd_status(struct nx842_devdata *devdata,
+					struct property *prop) {
+	int ret = 0;
+	const char *status = (const char *)prop->value;
+
+	if (!strncmp(status, "okay", (size_t)prop->length)) {
+		devdata->status = AVAILABLE;
+	} else {
+		dev_info(devdata->dev, "%s: status '%s' is not 'okay'\n",
+				__func__, status);
+		devdata->status = UNAVAILABLE;
+	}
+
+	return ret;
+}
+
+/**
+ * nx842_OF_upd_maxsglen -- Update the device info from OF maxsglen prop
+ *
+ * Definition of the 'ibm,max-sg-len' OF property:
+ *  This field indicates the maximum byte length of a scatter list
+ *  for the platform facility. It is a single cell encoded as with encode-int.
+ *
+ * Example:
+ *  # od -x ibm,max-sg-len
+ *  0000000 0000 0ff0
+ *
+ *  In this example, the maximum byte length of a scatter list is
+ *  0x0ff0 (4,080).
+ *
+ * @devdata - struct nx842_devdata to update
+ * @prop - struct property point containing the maxsyncop for the update
+ *
+ * Returns:
+ *  0 on success
+ *  -EINVAL on failure
+ */
+static int nx842_OF_upd_maxsglen(struct nx842_devdata *devdata,
+					struct property *prop) {
+	int ret = 0;
+	const int *maxsglen = prop->value;
+
+	if (prop->length != sizeof(*maxsglen)) {
+		dev_err(devdata->dev, "%s: unexpected format for ibm,max-sg-len property\n", __func__);
+		dev_dbg(devdata->dev, "%s: ibm,max-sg-len is %d bytes long, expected %lu bytes\n", __func__,
+				prop->length, sizeof(*maxsglen));
+		ret = -EINVAL;
+	} else {
+		devdata->max_sg_len = (unsigned int)min(*maxsglen,
+				(int)NX842_HW_PAGE_SIZE);
+	}
+
+	return ret;
+}
+
+/**
+ * nx842_OF_upd_maxsyncop -- Update the device info from OF maxsyncop prop
+ *
+ * Definition of the 'ibm,max-sync-cop' OF property:
+ *  Two series of cells.  The first series of cells represents the maximums
+ *  that can be synchronously compressed. The second series of cells
+ *  represents the maximums that can be synchronously decompressed.
+ *  1. The first cell in each series contains the count of the number of
+ *     data length, scatter list elements pairs that follow – each being
+ *     of the form
+ *    a. One cell data byte length
+ *    b. One cell total number of scatter list elements
+ *
+ * Example:
+ *  # od -x ibm,max-sync-cop
+ *  0000000 0000 0001 0000 1000 0000 01fe 0000 0001
+ *  0000020 0000 1000 0000 01fe
+ *
+ *  In this example, compression supports 0x1000 (4,096) data byte length
+ *  and 0x1fe (510) total scatter list elements.  Decompression supports
+ *  0x1000 (4,096) data byte length and 0x1f3 (510) total scatter list
+ *  elements.
+ *
+ * @devdata - struct nx842_devdata to update
+ * @prop - struct property point containing the maxsyncop for the update
+ *
+ * Returns:
+ *  0 on success
+ *  -EINVAL on failure
+ */
+static int nx842_OF_upd_maxsyncop(struct nx842_devdata *devdata,
+					struct property *prop) {
+	int ret = 0;
+	const struct maxsynccop_t {
+		int comp_elements;
+		int comp_data_limit;
+		int comp_sg_limit;
+		int decomp_elements;
+		int decomp_data_limit;
+		int decomp_sg_limit;
+	} *maxsynccop;
+
+	if (prop->length != sizeof(*maxsynccop)) {
+		dev_err(devdata->dev, "%s: unexpected format for ibm,max-sync-cop property\n", __func__);
+		dev_dbg(devdata->dev, "%s: ibm,max-sync-cop is %d bytes long, expected %lu bytes\n", __func__, prop->length,
+				sizeof(*maxsynccop));
+		ret = -EINVAL;
+		goto out;
+	}
+
+	maxsynccop = (const struct maxsynccop_t *)prop->value;
+
+	/* Use one limit rather than separate limits for compression and
+	 * decompression. Set a maximum for this so as not to exceed the
+	 * size that the header can support and round the value down to
+	 * the hardware page size (4K) */
+	devdata->max_sync_size =
+			(unsigned int)min(maxsynccop->comp_data_limit,
+					maxsynccop->decomp_data_limit);
+
+	devdata->max_sync_size = min_t(unsigned int, devdata->max_sync_size,
+					SIZE_64K);
+
+	if (devdata->max_sync_size < SIZE_4K) {
+		dev_err(devdata->dev, "%s: hardware max data size (%u) is "
+				"less than the driver minimum, unable to use "
+				"the hardware device\n",
+				__func__, devdata->max_sync_size);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	devdata->max_sync_sg = (unsigned int)min(maxsynccop->comp_sg_limit,
+						maxsynccop->decomp_sg_limit);
+	if (devdata->max_sync_sg < 1) {
+		dev_err(devdata->dev, "%s: hardware max sg size (%u) is "
+				"less than the driver minimum, unable to use "
+				"the hardware device\n",
+				__func__, devdata->max_sync_sg);
+		ret = -EINVAL;
+		goto out;
+	}
+
+out:
+	return ret;
+}
+
+/**
+ *
+ * nx842_OF_upd -- Handle OF properties updates for the device.
+ *
+ * Set all properties from the OF tree.  Optionally, a new property
+ * can be provided by the @new_prop pointer to overwrite an existing value.
+ * The device will remain disabled until all values are valid, this function
+ * will return an error for updates unless all values are valid.
+ *
+ * @new_prop: If not NULL, this property is being updated.  If NULL, update
+ *  all properties from the current values in the OF tree.
+ *
+ * Returns:
+ *  0 - Success
+ *  -ENOMEM - Could not allocate memory for new devdata structure
+ *  -EINVAL - property value not found, new_prop is not a recognized
+ *	property for the device or property value is not valid.
+ *  -ENODEV - Device is not available
+ */
+static int nx842_OF_upd(struct property *new_prop)
+{
+	struct nx842_devdata *old_devdata = NULL;
+	struct nx842_devdata *new_devdata = NULL;
+	struct device_node *of_node = NULL;
+	struct property *status = NULL;
+	struct property *maxsglen = NULL;
+	struct property *maxsyncop = NULL;
+	int ret = 0;
+	unsigned long flags;
+
+	spin_lock_irqsave(&devdata_mutex, flags);
+	old_devdata = rcu_dereference_check(devdata,
+			lockdep_is_held(&devdata_mutex));
+	if (old_devdata)
+		of_node = old_devdata->dev->of_node;
+
+	if (!old_devdata || !of_node) {
+		pr_err("%s: device is not available\n", __func__);
+		spin_unlock_irqrestore(&devdata_mutex, flags);
+		return -ENODEV;
+	}
+
+	new_devdata = kzalloc(sizeof(*new_devdata), GFP_NOFS);
+	if (!new_devdata) {
+		dev_err(old_devdata->dev, "%s: Could not allocate memory for device data\n", __func__);
+		ret = -ENOMEM;
+		goto error_out;
+	}
+
+	memcpy(new_devdata, old_devdata, sizeof(*old_devdata));
+	new_devdata->counters = old_devdata->counters;
+
+	/* Set ptrs for existing properties */
+	status = of_find_property(of_node, "status", NULL);
+	maxsglen = of_find_property(of_node, "ibm,max-sg-len", NULL);
+	maxsyncop = of_find_property(of_node, "ibm,max-sync-cop", NULL);
+	if (!status || !maxsglen || !maxsyncop) {
+		dev_err(old_devdata->dev, "%s: Could not locate device properties\n", __func__);
+		ret = -EINVAL;
+		goto error_out;
+	}
+
+	/*
+	 * If this is a property update, there are only certain properties that
+	 * we care about. Bail if it isn't in the below list
+	 */
+	if (new_prop && (strncmp(new_prop->name, "status", new_prop->length) ||
+		         strncmp(new_prop->name, "ibm,max-sg-len", new_prop->length) ||
+		         strncmp(new_prop->name, "ibm,max-sync-cop", new_prop->length)))
+		goto out;
+
+	/* Perform property updates */
+	ret = nx842_OF_upd_status(new_devdata, status);
+	if (ret)
+		goto error_out;
+
+	ret = nx842_OF_upd_maxsglen(new_devdata, maxsglen);
+	if (ret)
+		goto error_out;
+
+	ret = nx842_OF_upd_maxsyncop(new_devdata, maxsyncop);
+	if (ret)
+		goto error_out;
+
+out:
+	dev_info(old_devdata->dev, "%s: max_sync_size new:%u old:%u\n",
+			__func__, new_devdata->max_sync_size,
+			old_devdata->max_sync_size);
+	dev_info(old_devdata->dev, "%s: max_sync_sg new:%u old:%u\n",
+			__func__, new_devdata->max_sync_sg,
+			old_devdata->max_sync_sg);
+	dev_info(old_devdata->dev, "%s: max_sg_len new:%u old:%u\n",
+			__func__, new_devdata->max_sg_len,
+			old_devdata->max_sg_len);
+
+	rcu_assign_pointer(devdata, new_devdata);
+	spin_unlock_irqrestore(&devdata_mutex, flags);
+	synchronize_rcu();
+	dev_set_drvdata(new_devdata->dev, new_devdata);
+	kfree(old_devdata);
+	return 0;
+
+error_out:
+	if (new_devdata) {
+		dev_info(old_devdata->dev, "%s: device disabled\n", __func__);
+		nx842_OF_set_defaults(new_devdata);
+		rcu_assign_pointer(devdata, new_devdata);
+		spin_unlock_irqrestore(&devdata_mutex, flags);
+		synchronize_rcu();
+		dev_set_drvdata(new_devdata->dev, new_devdata);
+		kfree(old_devdata);
+	} else {
+		dev_err(old_devdata->dev, "%s: could not update driver from hardware\n", __func__);
+		spin_unlock_irqrestore(&devdata_mutex, flags);
+	}
+
+	if (!ret)
+		ret = -EINVAL;
+	return ret;
+}
+
+/**
+ * nx842_OF_notifier - Process updates to OF properties for the device
+ *
+ * @np: notifier block
+ * @action: notifier action
+ * @update: struct pSeries_reconfig_prop_update pointer if action is
+ *	PSERIES_UPDATE_PROPERTY
+ *
+ * Returns:
+ *	NOTIFY_OK on success
+ *	NOTIFY_BAD encoded with error number on failure, use
+ *		notifier_to_errno() to decode this value
+ */
+static int nx842_OF_notifier(struct notifier_block *np, unsigned long action,
+			     void *data)
+{
+	struct of_reconfig_data *upd = data;
+	struct nx842_devdata *local_devdata;
+	struct device_node *node = NULL;
+
+	rcu_read_lock();
+	local_devdata = rcu_dereference(devdata);
+	if (local_devdata)
+		node = local_devdata->dev->of_node;
+
+	if (local_devdata &&
+			action == OF_RECONFIG_UPDATE_PROPERTY &&
+			!strcmp(upd->dn->name, node->name)) {
+		rcu_read_unlock();
+		nx842_OF_upd(upd->prop);
+	} else
+		rcu_read_unlock();
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block nx842_of_nb = {
+	.notifier_call = nx842_OF_notifier,
+};
+
+#define nx842_counter_read(_name)					\
+static ssize_t nx842_##_name##_show(struct device *dev,		\
+		struct device_attribute *attr,				\
+		char *buf) {						\
+	struct nx842_devdata *local_devdata;			\
+	int p = 0;							\
+	rcu_read_lock();						\
+	local_devdata = rcu_dereference(devdata);			\
+	if (local_devdata)						\
+		p = snprintf(buf, PAGE_SIZE, "%ld\n",			\
+		       atomic64_read(&local_devdata->counters->_name));	\
+	rcu_read_unlock();						\
+	return p;							\
+}
+
+#define NX842DEV_COUNTER_ATTR_RO(_name)					\
+	nx842_counter_read(_name);					\
+	static struct device_attribute dev_attr_##_name = __ATTR(_name,	\
+						0444,			\
+						nx842_##_name##_show,\
+						NULL);
+
+NX842DEV_COUNTER_ATTR_RO(comp_complete);
+NX842DEV_COUNTER_ATTR_RO(comp_failed);
+NX842DEV_COUNTER_ATTR_RO(decomp_complete);
+NX842DEV_COUNTER_ATTR_RO(decomp_failed);
+NX842DEV_COUNTER_ATTR_RO(swdecomp);
+
+static ssize_t nx842_timehist_show(struct device *,
+		struct device_attribute *, char *);
+
+static struct device_attribute dev_attr_comp_times = __ATTR(comp_times, 0444,
+		nx842_timehist_show, NULL);
+static struct device_attribute dev_attr_decomp_times = __ATTR(decomp_times,
+		0444, nx842_timehist_show, NULL);
+
+static ssize_t nx842_timehist_show(struct device *dev,
+		struct device_attribute *attr, char *buf) {
+	char *p = buf;
+	struct nx842_devdata *local_devdata;
+	atomic64_t *times;
+	int bytes_remain = PAGE_SIZE;
+	int bytes;
+	int i;
+
+	rcu_read_lock();
+	local_devdata = rcu_dereference(devdata);
+	if (!local_devdata) {
+		rcu_read_unlock();
+		return 0;
+	}
+
+	if (attr == &dev_attr_comp_times)
+		times = local_devdata->counters->comp_times;
+	else if (attr == &dev_attr_decomp_times)
+		times = local_devdata->counters->decomp_times;
+	else {
+		rcu_read_unlock();
+		return 0;
+	}
+
+	for (i = 0; i < (NX842_HIST_SLOTS - 2); i++) {
+		bytes = snprintf(p, bytes_remain, "%u-%uus:\t%ld\n",
+			       i ? (2<<(i-1)) : 0, (2<<i)-1,
+			       atomic64_read(&times[i]));
+		bytes_remain -= bytes;
+		p += bytes;
+	}
+	/* The last bucket holds everything over
+	 * 2<<(NX842_HIST_SLOTS - 2) us */
+	bytes = snprintf(p, bytes_remain, "%uus - :\t%ld\n",
+			2<<(NX842_HIST_SLOTS - 2),
+			atomic64_read(&times[(NX842_HIST_SLOTS - 1)]));
+	p += bytes;
+
+	rcu_read_unlock();
+	return p - buf;
+}
+
+static struct attribute *nx842_sysfs_entries[] = {
+	&dev_attr_comp_complete.attr,
+	&dev_attr_comp_failed.attr,
+	&dev_attr_decomp_complete.attr,
+	&dev_attr_decomp_failed.attr,
+	&dev_attr_swdecomp.attr,
+	&dev_attr_comp_times.attr,
+	&dev_attr_decomp_times.attr,
+	NULL,
+};
+
+static struct attribute_group nx842_attribute_group = {
+	.name = NULL,		/* put in device directory */
+	.attrs = nx842_sysfs_entries,
+};
+
+static int __init nx842_probe(struct vio_dev *viodev,
+				  const struct vio_device_id *id)
+{
+	struct nx842_devdata *old_devdata, *new_devdata = NULL;
+	unsigned long flags;
+	int ret = 0;
+
+	spin_lock_irqsave(&devdata_mutex, flags);
+	old_devdata = rcu_dereference_check(devdata,
+			lockdep_is_held(&devdata_mutex));
+
+	if (old_devdata && old_devdata->vdev != NULL) {
+		dev_err(&viodev->dev, "%s: Attempt to register more than one instance of the hardware\n", __func__);
+		ret = -1;
+		goto error_unlock;
+	}
+
+	dev_set_drvdata(&viodev->dev, NULL);
+
+	new_devdata = kzalloc(sizeof(*new_devdata), GFP_NOFS);
+	if (!new_devdata) {
+		dev_err(&viodev->dev, "%s: Could not allocate memory for device data\n", __func__);
+		ret = -ENOMEM;
+		goto error_unlock;
+	}
+
+	new_devdata->counters = kzalloc(sizeof(*new_devdata->counters),
+			GFP_NOFS);
+	if (!new_devdata->counters) {
+		dev_err(&viodev->dev, "%s: Could not allocate memory for performance counters\n", __func__);
+		ret = -ENOMEM;
+		goto error_unlock;
+	}
+
+	new_devdata->vdev = viodev;
+	new_devdata->dev = &viodev->dev;
+	nx842_OF_set_defaults(new_devdata);
+
+	rcu_assign_pointer(devdata, new_devdata);
+	spin_unlock_irqrestore(&devdata_mutex, flags);
+	synchronize_rcu();
+	kfree(old_devdata);
+
+	of_reconfig_notifier_register(&nx842_of_nb);
+
+	ret = nx842_OF_upd(NULL);
+	if (ret && ret != -ENODEV) {
+		dev_err(&viodev->dev, "could not parse device tree. %d\n", ret);
+		ret = -1;
+		goto error;
+	}
+
+	rcu_read_lock();
+	dev_set_drvdata(&viodev->dev, rcu_dereference(devdata));
+	rcu_read_unlock();
+
+	if (sysfs_create_group(&viodev->dev.kobj, &nx842_attribute_group)) {
+		dev_err(&viodev->dev, "could not create sysfs device attributes\n");
+		ret = -1;
+		goto error;
+	}
+
+	return 0;
+
+error_unlock:
+	spin_unlock_irqrestore(&devdata_mutex, flags);
+	if (new_devdata)
+		kfree(new_devdata->counters);
+	kfree(new_devdata);
+error:
+	return ret;
+}
+
+static int __exit nx842_remove(struct vio_dev *viodev)
+{
+	struct nx842_devdata *old_devdata;
+	unsigned long flags;
+
+	pr_info("Removing IBM Power 842 compression device\n");
+	sysfs_remove_group(&viodev->dev.kobj, &nx842_attribute_group);
+
+	spin_lock_irqsave(&devdata_mutex, flags);
+	old_devdata = rcu_dereference_check(devdata,
+			lockdep_is_held(&devdata_mutex));
+	of_reconfig_notifier_unregister(&nx842_of_nb);
+	RCU_INIT_POINTER(devdata, NULL);
+	spin_unlock_irqrestore(&devdata_mutex, flags);
+	synchronize_rcu();
+	dev_set_drvdata(&viodev->dev, NULL);
+	if (old_devdata)
+		kfree(old_devdata->counters);
+	kfree(old_devdata);
+	return 0;
+}
+
+static struct vio_device_id nx842_driver_ids[] = {
+	{"ibm,compression-v1", "ibm,compression"},
+	{"", ""},
+};
+
+static struct vio_driver nx842_driver = {
+	.name = MODULE_NAME,
+	.probe = nx842_probe,
+	.remove = __exit_p(nx842_remove),
+	.get_desired_dma = nx842_get_desired_dma,
+	.id_table = nx842_driver_ids,
+};
+
+static int __init nx842_init(void)
+{
+	struct nx842_devdata *new_devdata;
+	pr_info("Registering IBM Power 842 compression driver\n");
+
+	RCU_INIT_POINTER(devdata, NULL);
+	new_devdata = kzalloc(sizeof(*new_devdata), GFP_KERNEL);
+	if (!new_devdata) {
+		pr_err("Could not allocate memory for device data\n");
+		return -ENOMEM;
+	}
+	new_devdata->status = UNAVAILABLE;
+	RCU_INIT_POINTER(devdata, new_devdata);
+
+	return vio_register_driver(&nx842_driver);
+}
+
+module_init(nx842_init);
+
+static void __exit nx842_exit(void)
+{
+	struct nx842_devdata *old_devdata;
+	unsigned long flags;
+
+	pr_info("Exiting IBM Power 842 compression driver\n");
+	spin_lock_irqsave(&devdata_mutex, flags);
+	old_devdata = rcu_dereference_check(devdata,
+			lockdep_is_held(&devdata_mutex));
+	RCU_INIT_POINTER(devdata, NULL);
+	spin_unlock_irqrestore(&devdata_mutex, flags);
+	synchronize_rcu();
+	if (old_devdata)
+		dev_set_drvdata(old_devdata->dev, NULL);
+	kfree(old_devdata);
+	vio_unregister_driver(&nx842_driver);
+}
+
+module_exit(nx842_exit);
+
+/*********************************
+ * 842 software decompressor
+*********************************/
+typedef int (*sw842_template_op)(const char **, int *, unsigned char **,
+						struct sw842_fifo *);
+
+static int sw842_data8(const char **, int *, unsigned char **,
+						struct sw842_fifo *);
+static int sw842_data4(const char **, int *, unsigned char **,
+						struct sw842_fifo *);
+static int sw842_data2(const char **, int *, unsigned char **,
+						struct sw842_fifo *);
+static int sw842_ptr8(const char **, int *, unsigned char **,
+						struct sw842_fifo *);
+static int sw842_ptr4(const char **, int *, unsigned char **,
+						struct sw842_fifo *);
+static int sw842_ptr2(const char **, int *, unsigned char **,
+						struct sw842_fifo *);
+
+/* special templates */
+#define SW842_TMPL_REPEAT 0x1B
+#define SW842_TMPL_ZEROS 0x1C
+#define SW842_TMPL_EOF 0x1E
+
+static sw842_template_op sw842_tmpl_ops[26][4] = {
+	{ sw842_data8, NULL}, /* 0 (00000) */
+	{ sw842_data4, sw842_data2, sw842_ptr2,  NULL},
+	{ sw842_data4, sw842_ptr2,  sw842_data2, NULL},
+	{ sw842_data4, sw842_ptr2,  sw842_ptr2,  NULL},
+	{ sw842_data4, sw842_ptr4,  NULL},
+	{ sw842_data2, sw842_ptr2,  sw842_data4, NULL},
+	{ sw842_data2, sw842_ptr2,  sw842_data2, sw842_ptr2},
+	{ sw842_data2, sw842_ptr2,  sw842_ptr2,  sw842_data2},
+	{ sw842_data2, sw842_ptr2,  sw842_ptr2,  sw842_ptr2,},
+	{ sw842_data2, sw842_ptr2,  sw842_ptr4,  NULL},
+	{ sw842_ptr2,  sw842_data2, sw842_data4, NULL}, /* 10 (01010) */
+	{ sw842_ptr2,  sw842_data4, sw842_ptr2,  NULL},
+	{ sw842_ptr2,  sw842_data2, sw842_ptr2,  sw842_data2},
+	{ sw842_ptr2,  sw842_data2, sw842_ptr2,  sw842_ptr2},
+	{ sw842_ptr2,  sw842_data2, sw842_ptr4,  NULL},
+	{ sw842_ptr2,  sw842_ptr2,  sw842_data4, NULL},
+	{ sw842_ptr2,  sw842_ptr2,  sw842_data2, sw842_ptr2},
+	{ sw842_ptr2,  sw842_ptr2,  sw842_ptr2,  sw842_data2},
+	{ sw842_ptr2,  sw842_ptr2,  sw842_ptr2,  sw842_ptr2},
+	{ sw842_ptr2,  sw842_ptr2,  sw842_ptr4,  NULL},
+	{ sw842_ptr4,  sw842_data4, NULL}, /* 20 (10100) */
+	{ sw842_ptr4,  sw842_data2, sw842_ptr2,  NULL},
+	{ sw842_ptr4,  sw842_ptr2,  sw842_data2, NULL},
+	{ sw842_ptr4,  sw842_ptr2,  sw842_ptr2,  NULL},
+	{ sw842_ptr4,  sw842_ptr4,  NULL},
+	{ sw842_ptr8,  NULL}
+};
+
+/* Software decompress helpers */
+
+static uint8_t sw842_get_byte(const char *buf, int bit)
+{
+	uint8_t tmpl;
+	uint16_t tmp;
+	tmp = htons(*(uint16_t *)(buf));
+	tmp = (uint16_t)(tmp << bit);
+	tmp = ntohs(tmp);
+	memcpy(&tmpl, &tmp, 1);
+	return tmpl;
+}
+
+static uint8_t sw842_get_template(const char **buf, int *bit)
+{
+	uint8_t byte;
+	byte = sw842_get_byte(*buf, *bit);
+	byte = byte >> 3;
+	byte &= 0x1F;
+	*buf += (*bit + 5) / 8;
+	*bit = (*bit + 5) % 8;
+	return byte;
+}
+
+/* repeat_count happens to be 5-bit too (like the template) */
+static uint8_t sw842_get_repeat_count(const char **buf, int *bit)
+{
+	uint8_t byte;
+	byte = sw842_get_byte(*buf, *bit);
+	byte = byte >> 2;
+	byte &= 0x3F;
+	*buf += (*bit + 6) / 8;
+	*bit = (*bit + 6) % 8;
+	return byte;
+}
+
+static uint8_t sw842_get_ptr2(const char **buf, int *bit)
+{
+	uint8_t ptr;
+	ptr = sw842_get_byte(*buf, *bit);
+	(*buf)++;
+	return ptr;
+}
+
+static uint16_t sw842_get_ptr4(const char **buf, int *bit,
+		struct sw842_fifo *fifo)
+{
+	uint16_t ptr;
+	ptr = htons(*(uint16_t *)(*buf));
+	ptr = (uint16_t)(ptr << *bit);
+	ptr = ptr >> 7;
+	ptr &= 0x01FF;
+	*buf += (*bit + 9) / 8;
+	*bit = (*bit + 9) % 8;
+	return ptr;
+}
+
+static uint8_t sw842_get_ptr8(const char **buf, int *bit,
+		struct sw842_fifo *fifo)
+{
+	return sw842_get_ptr2(buf, bit);
+}
+
+/* Software decompress template ops */
+
+static int sw842_data8(const char **inbuf, int *inbit,
+		unsigned char **outbuf, struct sw842_fifo *fifo)
+{
+	int ret;
+
+	ret = sw842_data4(inbuf, inbit, outbuf, fifo);
+	if (ret)
+		return ret;
+	ret = sw842_data4(inbuf, inbit, outbuf, fifo);
+	return ret;
+}
+
+static int sw842_data4(const char **inbuf, int *inbit,
+		unsigned char **outbuf, struct sw842_fifo *fifo)
+{
+	int ret;
+
+	ret = sw842_data2(inbuf, inbit, outbuf, fifo);
+	if (ret)
+		return ret;
+	ret = sw842_data2(inbuf, inbit, outbuf, fifo);
+	return ret;
+}
+
+static int sw842_data2(const char **inbuf, int *inbit,
+		unsigned char **outbuf, struct sw842_fifo *fifo)
+{
+	**outbuf = sw842_get_byte(*inbuf, *inbit);
+	(*inbuf)++;
+	(*outbuf)++;
+	**outbuf = sw842_get_byte(*inbuf, *inbit);
+	(*inbuf)++;
+	(*outbuf)++;
+	return 0;
+}
+
+static int sw842_ptr8(const char **inbuf, int *inbit,
+		unsigned char **outbuf, struct sw842_fifo *fifo)
+{
+	uint8_t ptr;
+	ptr = sw842_get_ptr8(inbuf, inbit, fifo);
+	if (!fifo->f84_full && (ptr >= fifo->f8_count))
+		return 1;
+	memcpy(*outbuf, fifo->f8[ptr], 8);
+	*outbuf += 8;
+	return 0;
+}
+
+static int sw842_ptr4(const char **inbuf, int *inbit,
+		unsigned char **outbuf, struct sw842_fifo *fifo)
+{
+	uint16_t ptr;
+	ptr = sw842_get_ptr4(inbuf, inbit, fifo);
+	if (!fifo->f84_full && (ptr >= fifo->f4_count))
+		return 1;
+	memcpy(*outbuf, fifo->f4[ptr], 4);
+	*outbuf += 4;
+	return 0;
+}
+
+static int sw842_ptr2(const char **inbuf, int *inbit,
+		unsigned char **outbuf, struct sw842_fifo *fifo)
+{
+	uint8_t ptr;
+	ptr = sw842_get_ptr2(inbuf, inbit);
+	if (!fifo->f2_full && (ptr >= fifo->f2_count))
+		return 1;
+	memcpy(*outbuf, fifo->f2[ptr], 2);
+	*outbuf += 2;
+	return 0;
+}
+
+static void sw842_copy_to_fifo(const char *buf, struct sw842_fifo *fifo)
+{
+	unsigned char initial_f2count = fifo->f2_count;
+
+	memcpy(fifo->f8[fifo->f8_count], buf, 8);
+	fifo->f4_count += 2;
+	fifo->f8_count += 1;
+
+	if (!fifo->f84_full && fifo->f4_count >= 512) {
+		fifo->f84_full = 1;
+		fifo->f4_count /= 512;
+	}
+
+	memcpy(fifo->f2[fifo->f2_count++], buf, 2);
+	memcpy(fifo->f2[fifo->f2_count++], buf + 2, 2);
+	memcpy(fifo->f2[fifo->f2_count++], buf + 4, 2);
+	memcpy(fifo->f2[fifo->f2_count++], buf + 6, 2);
+	if (fifo->f2_count < initial_f2count)
+		fifo->f2_full = 1;
+}
+
+static int sw842_decompress(const unsigned char *src, int srclen,
+			unsigned char *dst, int *destlen,
+			const void *wrkmem)
+{
+	uint8_t tmpl;
+	const char *inbuf;
+	int inbit = 0;
+	unsigned char *outbuf, *outbuf_end, *origbuf, *prevbuf;
+	const char *inbuf_end;
+	sw842_template_op op;
+	int opindex;
+	int i, repeat_count;
+	struct sw842_fifo *fifo;
+	int ret = 0;
+
+	fifo = &((struct nx842_workmem *)(wrkmem))->swfifo;
+	memset(fifo, 0, sizeof(*fifo));
+
+	origbuf = NULL;
+	inbuf = src;
+	inbuf_end = src + srclen;
+	outbuf = dst;
+	outbuf_end = dst + *destlen;
+
+	while ((tmpl = sw842_get_template(&inbuf, &inbit)) != SW842_TMPL_EOF) {
+		if (inbuf >= inbuf_end) {
+			ret = -EINVAL;
+			goto out;
+		}
+
+		opindex = 0;
+		prevbuf = origbuf;
+		origbuf = outbuf;
+		switch (tmpl) {
+		case SW842_TMPL_REPEAT:
+			if (prevbuf == NULL) {
+				ret = -EINVAL;
+				goto out;
+			}
+
+			repeat_count = sw842_get_repeat_count(&inbuf,
+								&inbit) + 1;
+
+			/* Did the repeat count advance past the end of input */
+			if (inbuf > inbuf_end) {
+				ret = -EINVAL;
+				goto out;
+			}
+
+			for (i = 0; i < repeat_count; i++) {
+				/* Would this overflow the output buffer */
+				if ((outbuf + 8) > outbuf_end) {
+					ret = -ENOSPC;
+					goto out;
+				}
+
+				memcpy(outbuf, prevbuf, 8);
+				sw842_copy_to_fifo(outbuf, fifo);
+				outbuf += 8;
+			}
+			break;
+
+		case SW842_TMPL_ZEROS:
+			/* Would this overflow the output buffer */
+			if ((outbuf + 8) > outbuf_end) {
+				ret = -ENOSPC;
+				goto out;
+			}
+
+			memset(outbuf, 0, 8);
+			sw842_copy_to_fifo(outbuf, fifo);
+			outbuf += 8;
+			break;
+
+		default:
+			if (tmpl > 25) {
+				ret = -EINVAL;
+				goto out;
+			}
+
+			/* Does this go past the end of the input buffer */
+			if ((inbuf + 2) > inbuf_end) {
+				ret = -EINVAL;
+				goto out;
+			}
+
+			/* Would this overflow the output buffer */
+			if ((outbuf + 8) > outbuf_end) {
+				ret = -ENOSPC;
+				goto out;
+			}
+
+			while (opindex < 4 &&
+				(op = sw842_tmpl_ops[tmpl][opindex++])
+					!= NULL) {
+				ret = (*op)(&inbuf, &inbit, &outbuf, fifo);
+				if (ret) {
+					ret = -EINVAL;
+					goto out;
+				}
+				sw842_copy_to_fifo(origbuf, fifo);
+			}
+		}
+	}
+
+out:
+	if (!ret)
+		*destlen = (unsigned int)(outbuf - dst);
+	else
+		*destlen = 0;
+
+	return ret;
+}
diff --git a/drivers/crypto/nx/nx-842.c b/drivers/crypto/nx/nx-842.c
deleted file mode 100644
index 887196e..0000000
--- a/drivers/crypto/nx/nx-842.c
+++ /dev/null
@@ -1,1603 +0,0 @@
-/*
- * Driver for IBM Power 842 compression accelerator
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
- *
- * Copyright (C) IBM Corporation, 2012
- *
- * Authors: Robert Jennings <rcj@linux.vnet.ibm.com>
- *          Seth Jennings <sjenning@linux.vnet.ibm.com>
- */
-
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/nx842.h>
-#include <linux/of.h>
-#include <linux/slab.h>
-
-#include <asm/page.h>
-#include <asm/vio.h>
-
-#include "nx_csbcpb.h" /* struct nx_csbcpb */
-
-#define MODULE_NAME "nx-compress"
-MODULE_LICENSE("GPL");
-MODULE_AUTHOR("Robert Jennings <rcj@linux.vnet.ibm.com>");
-MODULE_DESCRIPTION("842 H/W Compression driver for IBM Power processors");
-
-#define SHIFT_4K 12
-#define SHIFT_64K 16
-#define SIZE_4K (1UL << SHIFT_4K)
-#define SIZE_64K (1UL << SHIFT_64K)
-
-/* IO buffer must be 128 byte aligned */
-#define IO_BUFFER_ALIGN 128
-
-struct nx842_header {
-	int blocks_nr; /* number of compressed blocks */
-	int offset; /* offset of the first block (from beginning of header) */
-	int sizes[0]; /* size of compressed blocks */
-};
-
-static inline int nx842_header_size(const struct nx842_header *hdr)
-{
-	return sizeof(struct nx842_header) +
-			hdr->blocks_nr * sizeof(hdr->sizes[0]);
-}
-
-/* Macros for fields within nx_csbcpb */
-/* Check the valid bit within the csbcpb valid field */
-#define NX842_CSBCBP_VALID_CHK(x) (x & BIT_MASK(7))
-
-/* CE macros operate on the completion_extension field bits in the csbcpb.
- * CE0 0=full completion, 1=partial completion
- * CE1 0=CE0 indicates completion, 1=termination (output may be modified)
- * CE2 0=processed_bytes is source bytes, 1=processed_bytes is target bytes */
-#define NX842_CSBCPB_CE0(x)	(x & BIT_MASK(7))
-#define NX842_CSBCPB_CE1(x)	(x & BIT_MASK(6))
-#define NX842_CSBCPB_CE2(x)	(x & BIT_MASK(5))
-
-/* The NX unit accepts data only on 4K page boundaries */
-#define NX842_HW_PAGE_SHIFT	SHIFT_4K
-#define NX842_HW_PAGE_SIZE	(ASM_CONST(1) << NX842_HW_PAGE_SHIFT)
-#define NX842_HW_PAGE_MASK	(~(NX842_HW_PAGE_SIZE-1))
-
-enum nx842_status {
-	UNAVAILABLE,
-	AVAILABLE
-};
-
-struct ibm_nx842_counters {
-	atomic64_t comp_complete;
-	atomic64_t comp_failed;
-	atomic64_t decomp_complete;
-	atomic64_t decomp_failed;
-	atomic64_t swdecomp;
-	atomic64_t comp_times[32];
-	atomic64_t decomp_times[32];
-};
-
-static struct nx842_devdata {
-	struct vio_dev *vdev;
-	struct device *dev;
-	struct ibm_nx842_counters *counters;
-	unsigned int max_sg_len;
-	unsigned int max_sync_size;
-	unsigned int max_sync_sg;
-	enum nx842_status status;
-} __rcu *devdata;
-static DEFINE_SPINLOCK(devdata_mutex);
-
-#define NX842_COUNTER_INC(_x) \
-static inline void nx842_inc_##_x( \
-	const struct nx842_devdata *dev) { \
-	if (dev) \
-		atomic64_inc(&dev->counters->_x); \
-}
-NX842_COUNTER_INC(comp_complete);
-NX842_COUNTER_INC(comp_failed);
-NX842_COUNTER_INC(decomp_complete);
-NX842_COUNTER_INC(decomp_failed);
-NX842_COUNTER_INC(swdecomp);
-
-#define NX842_HIST_SLOTS 16
-
-static void ibm_nx842_incr_hist(atomic64_t *times, unsigned int time)
-{
-	int bucket = fls(time);
-
-	if (bucket)
-		bucket = min((NX842_HIST_SLOTS - 1), bucket - 1);
-
-	atomic64_inc(&times[bucket]);
-}
-
-/* NX unit operation flags */
-#define NX842_OP_COMPRESS	0x0
-#define NX842_OP_CRC		0x1
-#define NX842_OP_DECOMPRESS	0x2
-#define NX842_OP_COMPRESS_CRC   (NX842_OP_COMPRESS | NX842_OP_CRC)
-#define NX842_OP_DECOMPRESS_CRC (NX842_OP_DECOMPRESS | NX842_OP_CRC)
-#define NX842_OP_ASYNC		(1<<23)
-#define NX842_OP_NOTIFY		(1<<22)
-#define NX842_OP_NOTIFY_INT(x)	((x & 0xff)<<8)
-
-static unsigned long nx842_get_desired_dma(struct vio_dev *viodev)
-{
-	/* No use of DMA mappings within the driver. */
-	return 0;
-}
-
-struct nx842_slentry {
-	unsigned long ptr; /* Real address (use __pa()) */
-	unsigned long len;
-};
-
-/* pHyp scatterlist entry */
-struct nx842_scatterlist {
-	int entry_nr; /* number of slentries */
-	struct nx842_slentry *entries; /* ptr to array of slentries */
-};
-
-/* Does not include sizeof(entry_nr) in the size */
-static inline unsigned long nx842_get_scatterlist_size(
-				struct nx842_scatterlist *sl)
-{
-	return sl->entry_nr * sizeof(struct nx842_slentry);
-}
-
-static inline unsigned long nx842_get_pa(void *addr)
-{
-	if (is_vmalloc_addr(addr))
-		return page_to_phys(vmalloc_to_page(addr))
-		       + offset_in_page(addr);
-	else
-		return __pa(addr);
-}
-
-static int nx842_build_scatterlist(unsigned long buf, int len,
-			struct nx842_scatterlist *sl)
-{
-	unsigned long nextpage;
-	struct nx842_slentry *entry;
-
-	sl->entry_nr = 0;
-
-	entry = sl->entries;
-	while (len) {
-		entry->ptr = nx842_get_pa((void *)buf);
-		nextpage = ALIGN(buf + 1, NX842_HW_PAGE_SIZE);
-		if (nextpage < buf + len) {
-			/* we aren't at the end yet */
-			if (IS_ALIGNED(buf, NX842_HW_PAGE_SIZE))
-				/* we are in the middle (or beginning) */
-				entry->len = NX842_HW_PAGE_SIZE;
-			else
-				/* we are at the beginning */
-				entry->len = nextpage - buf;
-		} else {
-			/* at the end */
-			entry->len = len;
-		}
-
-		len -= entry->len;
-		buf += entry->len;
-		sl->entry_nr++;
-		entry++;
-	}
-
-	return 0;
-}
-
-/*
- * Working memory for software decompression
- */
-struct sw842_fifo {
-	union {
-		char f8[256][8];
-		char f4[512][4];
-	};
-	char f2[256][2];
-	unsigned char f84_full;
-	unsigned char f2_full;
-	unsigned char f8_count;
-	unsigned char f2_count;
-	unsigned int f4_count;
-};
-
-/*
- * Working memory for crypto API
- */
-struct nx842_workmem {
-	char bounce[PAGE_SIZE]; /* bounce buffer for decompression input */
-	union {
-		/* hardware working memory */
-		struct {
-			/* scatterlist */
-			char slin[SIZE_4K];
-			char slout[SIZE_4K];
-			/* coprocessor status/parameter block */
-			struct nx_csbcpb csbcpb;
-		};
-		/* software working memory */
-		struct sw842_fifo swfifo; /* software decompression fifo */
-	};
-};
-
-int nx842_get_workmem_size(void)
-{
-	return sizeof(struct nx842_workmem) + NX842_HW_PAGE_SIZE;
-}
-EXPORT_SYMBOL_GPL(nx842_get_workmem_size);
-
-int nx842_get_workmem_size_aligned(void)
-{
-	return sizeof(struct nx842_workmem);
-}
-EXPORT_SYMBOL_GPL(nx842_get_workmem_size_aligned);
-
-static int nx842_validate_result(struct device *dev,
-	struct cop_status_block *csb)
-{
-	/* The csb must be valid after returning from vio_h_cop_sync */
-	if (!NX842_CSBCBP_VALID_CHK(csb->valid)) {
-		dev_err(dev, "%s: cspcbp not valid upon completion.\n",
-				__func__);
-		dev_dbg(dev, "valid:0x%02x cs:0x%02x cc:0x%02x ce:0x%02x\n",
-				csb->valid,
-				csb->crb_seq_number,
-				csb->completion_code,
-				csb->completion_extension);
-		dev_dbg(dev, "processed_bytes:%d address:0x%016lx\n",
-				csb->processed_byte_count,
-				(unsigned long)csb->address);
-		return -EIO;
-	}
-
-	/* Check return values from the hardware in the CSB */
-	switch (csb->completion_code) {
-	case 0:	/* Completed without error */
-		break;
-	case 64: /* Target bytes > Source bytes during compression */
-	case 13: /* Output buffer too small */
-		dev_dbg(dev, "%s: Compression output larger than input\n",
-					__func__);
-		return -ENOSPC;
-	case 66: /* Input data contains an illegal template field */
-	case 67: /* Template indicates data past the end of the input stream */
-		dev_dbg(dev, "%s: Bad data for decompression (code:%d)\n",
-					__func__, csb->completion_code);
-		return -EINVAL;
-	default:
-		dev_dbg(dev, "%s: Unspecified error (code:%d)\n",
-					__func__, csb->completion_code);
-		return -EIO;
-	}
-
-	/* Hardware sanity check */
-	if (!NX842_CSBCPB_CE2(csb->completion_extension)) {
-		dev_err(dev, "%s: No error returned by hardware, but "
-				"data returned is unusable, contact support.\n"
-				"(Additional info: csbcbp->processed bytes "
-				"does not specify processed bytes for the "
-				"target buffer.)\n", __func__);
-		return -EIO;
-	}
-
-	return 0;
-}
-
-/**
- * nx842_compress - Compress data using the 842 algorithm
- *
- * Compression provide by the NX842 coprocessor on IBM Power systems.
- * The input buffer is compressed and the result is stored in the
- * provided output buffer.
- *
- * Upon return from this function @outlen contains the length of the
- * compressed data.  If there is an error then @outlen will be 0 and an
- * error will be specified by the return code from this function.
- *
- * @in: Pointer to input buffer, must be page aligned
- * @inlen: Length of input buffer, must be PAGE_SIZE
- * @out: Pointer to output buffer
- * @outlen: Length of output buffer
- * @wrkmem: ptr to buffer for working memory, size determined by
- *          nx842_get_workmem_size()
- *
- * Returns:
- *   0		Success, output of length @outlen stored in the buffer at @out
- *   -ENOMEM	Unable to allocate internal buffers
- *   -ENOSPC	Output buffer is to small
- *   -EMSGSIZE	XXX Difficult to describe this limitation
- *   -EIO	Internal error
- *   -ENODEV	Hardware unavailable
- */
-int nx842_compress(const unsigned char *in, unsigned int inlen,
-		       unsigned char *out, unsigned int *outlen, void *wmem)
-{
-	struct nx842_header *hdr;
-	struct nx842_devdata *local_devdata;
-	struct device *dev = NULL;
-	struct nx842_workmem *workmem;
-	struct nx842_scatterlist slin, slout;
-	struct nx_csbcpb *csbcpb;
-	int ret = 0, max_sync_size, i, bytesleft, size, hdrsize;
-	unsigned long inbuf, outbuf, padding;
-	struct vio_pfo_op op = {
-		.done = NULL,
-		.handle = 0,
-		.timeout = 0,
-	};
-	unsigned long start_time = get_tb();
-
-	/*
-	 * Make sure input buffer is 64k page aligned.  This is assumed since
-	 * this driver is designed for page compression only (for now).  This
-	 * is very nice since we can now use direct DDE(s) for the input and
-	 * the alignment is guaranteed.
-	*/
-	inbuf = (unsigned long)in;
-	if (!IS_ALIGNED(inbuf, PAGE_SIZE) || inlen != PAGE_SIZE)
-		return -EINVAL;
-
-	rcu_read_lock();
-	local_devdata = rcu_dereference(devdata);
-	if (!local_devdata || !local_devdata->dev) {
-		rcu_read_unlock();
-		return -ENODEV;
-	}
-	max_sync_size = local_devdata->max_sync_size;
-	dev = local_devdata->dev;
-
-	/* Create the header */
-	hdr = (struct nx842_header *)out;
-	hdr->blocks_nr = PAGE_SIZE / max_sync_size;
-	hdrsize = nx842_header_size(hdr);
-	outbuf = (unsigned long)out + hdrsize;
-	bytesleft = *outlen - hdrsize;
-
-	/* Init scatterlist */
-	workmem = (struct nx842_workmem *)ALIGN((unsigned long)wmem,
-		NX842_HW_PAGE_SIZE);
-	slin.entries = (struct nx842_slentry *)workmem->slin;
-	slout.entries = (struct nx842_slentry *)workmem->slout;
-
-	/* Init operation */
-	op.flags = NX842_OP_COMPRESS;
-	csbcpb = &workmem->csbcpb;
-	memset(csbcpb, 0, sizeof(*csbcpb));
-	op.csbcpb = nx842_get_pa(csbcpb);
-	op.out = nx842_get_pa(slout.entries);
-
-	for (i = 0; i < hdr->blocks_nr; i++) {
-		/*
-		 * Aligning the output blocks to 128 bytes does waste space,
-		 * but it prevents the need for bounce buffers and memory
-		 * copies.  It also simplifies the code a lot.  In the worst
-		 * case (64k page, 4k max_sync_size), you lose up to
-		 * (128*16)/64k = ~3% the compression factor. For 64k
-		 * max_sync_size, the loss would be at most 128/64k = ~0.2%.
-		 */
-		padding = ALIGN(outbuf, IO_BUFFER_ALIGN) - outbuf;
-		outbuf += padding;
-		bytesleft -= padding;
-		if (i == 0)
-			/* save offset into first block in header */
-			hdr->offset = padding + hdrsize;
-
-		if (bytesleft <= 0) {
-			ret = -ENOSPC;
-			goto unlock;
-		}
-
-		/*
-		 * NOTE: If the default max_sync_size is changed from 4k
-		 * to 64k, remove the "likely" case below, since a
-		 * scatterlist will always be needed.
-		 */
-		if (likely(max_sync_size == NX842_HW_PAGE_SIZE)) {
-			/* Create direct DDE */
-			op.in = nx842_get_pa((void *)inbuf);
-			op.inlen = max_sync_size;
-
-		} else {
-			/* Create indirect DDE (scatterlist) */
-			nx842_build_scatterlist(inbuf, max_sync_size, &slin);
-			op.in = nx842_get_pa(slin.entries);
-			op.inlen = -nx842_get_scatterlist_size(&slin);
-		}
-
-		/*
-		 * If max_sync_size != NX842_HW_PAGE_SIZE, an indirect
-		 * DDE is required for the outbuf.
-		 * If max_sync_size == NX842_HW_PAGE_SIZE, outbuf must
-		 * also be page aligned (1 in 128/4k=32 chance) in order
-		 * to use a direct DDE.
-		 * This is unlikely, just use an indirect DDE always.
-		 */
-		nx842_build_scatterlist(outbuf,
-			min(bytesleft, max_sync_size), &slout);
-		/* op.out set before loop */
-		op.outlen = -nx842_get_scatterlist_size(&slout);
-
-		/* Send request to pHyp */
-		ret = vio_h_cop_sync(local_devdata->vdev, &op);
-
-		/* Check for pHyp error */
-		if (ret) {
-			dev_dbg(dev, "%s: vio_h_cop_sync error (ret=%d, hret=%ld)\n",
-				__func__, ret, op.hcall_err);
-			ret = -EIO;
-			goto unlock;
-		}
-
-		/* Check for hardware error */
-		ret = nx842_validate_result(dev, &csbcpb->csb);
-		if (ret && ret != -ENOSPC)
-			goto unlock;
-
-		/* Handle incompressible data */
-		if (unlikely(ret == -ENOSPC)) {
-			if (bytesleft < max_sync_size) {
-				/*
-				 * Not enough space left in the output buffer
-				 * to store uncompressed block
-				 */
-				goto unlock;
-			} else {
-				/* Store incompressible block */
-				memcpy((void *)outbuf, (void *)inbuf,
-					max_sync_size);
-				hdr->sizes[i] = -max_sync_size;
-				outbuf += max_sync_size;
-				bytesleft -= max_sync_size;
-				/* Reset ret, incompressible data handled */
-				ret = 0;
-			}
-		} else {
-			/* Normal case, compression was successful */
-			size = csbcpb->csb.processed_byte_count;
-			dev_dbg(dev, "%s: processed_bytes=%d\n",
-				__func__, size);
-			hdr->sizes[i] = size;
-			outbuf += size;
-			bytesleft -= size;
-		}
-
-		inbuf += max_sync_size;
-	}
-
-	*outlen = (unsigned int)(outbuf - (unsigned long)out);
-
-unlock:
-	if (ret)
-		nx842_inc_comp_failed(local_devdata);
-	else {
-		nx842_inc_comp_complete(local_devdata);
-		ibm_nx842_incr_hist(local_devdata->counters->comp_times,
-			(get_tb() - start_time) / tb_ticks_per_usec);
-	}
-	rcu_read_unlock();
-	return ret;
-}
-EXPORT_SYMBOL_GPL(nx842_compress);
-
-static int sw842_decompress(const unsigned char *, int, unsigned char *, int *,
-			const void *);
-
-/**
- * nx842_decompress - Decompress data using the 842 algorithm
- *
- * Decompression provide by the NX842 coprocessor on IBM Power systems.
- * The input buffer is decompressed and the result is stored in the
- * provided output buffer.  The size allocated to the output buffer is
- * provided by the caller of this function in @outlen.  Upon return from
- * this function @outlen contains the length of the decompressed data.
- * If there is an error then @outlen will be 0 and an error will be
- * specified by the return code from this function.
- *
- * @in: Pointer to input buffer, will use bounce buffer if not 128 byte
- *      aligned
- * @inlen: Length of input buffer
- * @out: Pointer to output buffer, must be page aligned
- * @outlen: Length of output buffer, must be PAGE_SIZE
- * @wrkmem: ptr to buffer for working memory, size determined by
- *          nx842_get_workmem_size()
- *
- * Returns:
- *   0		Success, output of length @outlen stored in the buffer at @out
- *   -ENODEV	Hardware decompression device is unavailable
- *   -ENOMEM	Unable to allocate internal buffers
- *   -ENOSPC	Output buffer is to small
- *   -EINVAL	Bad input data encountered when attempting decompress
- *   -EIO	Internal error
- */
-int nx842_decompress(const unsigned char *in, unsigned int inlen,
-			 unsigned char *out, unsigned int *outlen, void *wmem)
-{
-	struct nx842_header *hdr;
-	struct nx842_devdata *local_devdata;
-	struct device *dev = NULL;
-	struct nx842_workmem *workmem;
-	struct nx842_scatterlist slin, slout;
-	struct nx_csbcpb *csbcpb;
-	int ret = 0, i, size, max_sync_size;
-	unsigned long inbuf, outbuf;
-	struct vio_pfo_op op = {
-		.done = NULL,
-		.handle = 0,
-		.timeout = 0,
-	};
-	unsigned long start_time = get_tb();
-
-	/* Ensure page alignment and size */
-	outbuf = (unsigned long)out;
-	if (!IS_ALIGNED(outbuf, PAGE_SIZE) || *outlen != PAGE_SIZE)
-		return -EINVAL;
-
-	rcu_read_lock();
-	local_devdata = rcu_dereference(devdata);
-	if (local_devdata)
-		dev = local_devdata->dev;
-
-	/* Get header */
-	hdr = (struct nx842_header *)in;
-
-	workmem = (struct nx842_workmem *)ALIGN((unsigned long)wmem,
-		NX842_HW_PAGE_SIZE);
-
-	inbuf = (unsigned long)in + hdr->offset;
-	if (likely(!IS_ALIGNED(inbuf, IO_BUFFER_ALIGN))) {
-		/* Copy block(s) into bounce buffer for alignment */
-		memcpy(workmem->bounce, in + hdr->offset, inlen - hdr->offset);
-		inbuf = (unsigned long)workmem->bounce;
-	}
-
-	/* Init scatterlist */
-	slin.entries = (struct nx842_slentry *)workmem->slin;
-	slout.entries = (struct nx842_slentry *)workmem->slout;
-
-	/* Init operation */
-	op.flags = NX842_OP_DECOMPRESS;
-	csbcpb = &workmem->csbcpb;
-	memset(csbcpb, 0, sizeof(*csbcpb));
-	op.csbcpb = nx842_get_pa(csbcpb);
-
-	/*
-	 * max_sync_size may have changed since compression,
-	 * so we can't read it from the device info. We need
-	 * to derive it from hdr->blocks_nr.
-	 */
-	max_sync_size = PAGE_SIZE / hdr->blocks_nr;
-
-	for (i = 0; i < hdr->blocks_nr; i++) {
-		/* Skip padding */
-		inbuf = ALIGN(inbuf, IO_BUFFER_ALIGN);
-
-		if (hdr->sizes[i] < 0) {
-			/* Negative sizes indicate uncompressed data blocks */
-			size = abs(hdr->sizes[i]);
-			memcpy((void *)outbuf, (void *)inbuf, size);
-			outbuf += size;
-			inbuf += size;
-			continue;
-		}
-
-		if (!dev)
-			goto sw;
-
-		/*
-		 * The better the compression, the more likely the "likely"
-		 * case becomes.
-		 */
-		if (likely((inbuf & NX842_HW_PAGE_MASK) ==
-			((inbuf + hdr->sizes[i] - 1) & NX842_HW_PAGE_MASK))) {
-			/* Create direct DDE */
-			op.in = nx842_get_pa((void *)inbuf);
-			op.inlen = hdr->sizes[i];
-		} else {
-			/* Create indirect DDE (scatterlist) */
-			nx842_build_scatterlist(inbuf, hdr->sizes[i] , &slin);
-			op.in = nx842_get_pa(slin.entries);
-			op.inlen = -nx842_get_scatterlist_size(&slin);
-		}
-
-		/*
-		 * NOTE: If the default max_sync_size is changed from 4k
-		 * to 64k, remove the "likely" case below, since a
-		 * scatterlist will always be needed.
-		 */
-		if (likely(max_sync_size == NX842_HW_PAGE_SIZE)) {
-			/* Create direct DDE */
-			op.out = nx842_get_pa((void *)outbuf);
-			op.outlen = max_sync_size;
-		} else {
-			/* Create indirect DDE (scatterlist) */
-			nx842_build_scatterlist(outbuf, max_sync_size, &slout);
-			op.out = nx842_get_pa(slout.entries);
-			op.outlen = -nx842_get_scatterlist_size(&slout);
-		}
-
-		/* Send request to pHyp */
-		ret = vio_h_cop_sync(local_devdata->vdev, &op);
-
-		/* Check for pHyp error */
-		if (ret) {
-			dev_dbg(dev, "%s: vio_h_cop_sync error (ret=%d, hret=%ld)\n",
-				__func__, ret, op.hcall_err);
-			dev = NULL;
-			goto sw;
-		}
-
-		/* Check for hardware error */
-		ret = nx842_validate_result(dev, &csbcpb->csb);
-		if (ret) {
-			dev = NULL;
-			goto sw;
-		}
-
-		/* HW decompression success */
-		inbuf += hdr->sizes[i];
-		outbuf += csbcpb->csb.processed_byte_count;
-		continue;
-
-sw:
-		/* software decompression */
-		size = max_sync_size;
-		ret = sw842_decompress(
-			(unsigned char *)inbuf, hdr->sizes[i],
-			(unsigned char *)outbuf, &size, wmem);
-		if (ret)
-			pr_debug("%s: sw842_decompress failed with %d\n",
-				__func__, ret);
-
-		if (ret) {
-			if (ret != -ENOSPC && ret != -EINVAL &&
-					ret != -EMSGSIZE)
-				ret = -EIO;
-			goto unlock;
-		}
-
-		/* SW decompression success */
-		inbuf += hdr->sizes[i];
-		outbuf += size;
-	}
-
-	*outlen = (unsigned int)(outbuf - (unsigned long)out);
-
-unlock:
-	if (ret)
-		/* decompress fail */
-		nx842_inc_decomp_failed(local_devdata);
-	else {
-		if (!dev)
-			/* software decompress */
-			nx842_inc_swdecomp(local_devdata);
-		nx842_inc_decomp_complete(local_devdata);
-		ibm_nx842_incr_hist(local_devdata->counters->decomp_times,
-			(get_tb() - start_time) / tb_ticks_per_usec);
-	}
-
-	rcu_read_unlock();
-	return ret;
-}
-EXPORT_SYMBOL_GPL(nx842_decompress);
-
-/**
- * nx842_OF_set_defaults -- Set default (disabled) values for devdata
- *
- * @devdata - struct nx842_devdata to update
- *
- * Returns:
- *  0 on success
- *  -ENOENT if @devdata ptr is NULL
- */
-static int nx842_OF_set_defaults(struct nx842_devdata *devdata)
-{
-	if (devdata) {
-		devdata->max_sync_size = 0;
-		devdata->max_sync_sg = 0;
-		devdata->max_sg_len = 0;
-		devdata->status = UNAVAILABLE;
-		return 0;
-	} else
-		return -ENOENT;
-}
-
-/**
- * nx842_OF_upd_status -- Update the device info from OF status prop
- *
- * The status property indicates if the accelerator is enabled.  If the
- * device is in the OF tree it indicates that the hardware is present.
- * The status field indicates if the device is enabled when the status
- * is 'okay'.  Otherwise the device driver will be disabled.
- *
- * @devdata - struct nx842_devdata to update
- * @prop - struct property point containing the maxsyncop for the update
- *
- * Returns:
- *  0 - Device is available
- *  -EINVAL - Device is not available
- */
-static int nx842_OF_upd_status(struct nx842_devdata *devdata,
-					struct property *prop) {
-	int ret = 0;
-	const char *status = (const char *)prop->value;
-
-	if (!strncmp(status, "okay", (size_t)prop->length)) {
-		devdata->status = AVAILABLE;
-	} else {
-		dev_info(devdata->dev, "%s: status '%s' is not 'okay'\n",
-				__func__, status);
-		devdata->status = UNAVAILABLE;
-	}
-
-	return ret;
-}
-
-/**
- * nx842_OF_upd_maxsglen -- Update the device info from OF maxsglen prop
- *
- * Definition of the 'ibm,max-sg-len' OF property:
- *  This field indicates the maximum byte length of a scatter list
- *  for the platform facility. It is a single cell encoded as with encode-int.
- *
- * Example:
- *  # od -x ibm,max-sg-len
- *  0000000 0000 0ff0
- *
- *  In this example, the maximum byte length of a scatter list is
- *  0x0ff0 (4,080).
- *
- * @devdata - struct nx842_devdata to update
- * @prop - struct property point containing the maxsyncop for the update
- *
- * Returns:
- *  0 on success
- *  -EINVAL on failure
- */
-static int nx842_OF_upd_maxsglen(struct nx842_devdata *devdata,
-					struct property *prop) {
-	int ret = 0;
-	const int *maxsglen = prop->value;
-
-	if (prop->length != sizeof(*maxsglen)) {
-		dev_err(devdata->dev, "%s: unexpected format for ibm,max-sg-len property\n", __func__);
-		dev_dbg(devdata->dev, "%s: ibm,max-sg-len is %d bytes long, expected %lu bytes\n", __func__,
-				prop->length, sizeof(*maxsglen));
-		ret = -EINVAL;
-	} else {
-		devdata->max_sg_len = (unsigned int)min(*maxsglen,
-				(int)NX842_HW_PAGE_SIZE);
-	}
-
-	return ret;
-}
-
-/**
- * nx842_OF_upd_maxsyncop -- Update the device info from OF maxsyncop prop
- *
- * Definition of the 'ibm,max-sync-cop' OF property:
- *  Two series of cells.  The first series of cells represents the maximums
- *  that can be synchronously compressed. The second series of cells
- *  represents the maximums that can be synchronously decompressed.
- *  1. The first cell in each series contains the count of the number of
- *     data length, scatter list elements pairs that follow – each being
- *     of the form
- *    a. One cell data byte length
- *    b. One cell total number of scatter list elements
- *
- * Example:
- *  # od -x ibm,max-sync-cop
- *  0000000 0000 0001 0000 1000 0000 01fe 0000 0001
- *  0000020 0000 1000 0000 01fe
- *
- *  In this example, compression supports 0x1000 (4,096) data byte length
- *  and 0x1fe (510) total scatter list elements.  Decompression supports
- *  0x1000 (4,096) data byte length and 0x1f3 (510) total scatter list
- *  elements.
- *
- * @devdata - struct nx842_devdata to update
- * @prop - struct property point containing the maxsyncop for the update
- *
- * Returns:
- *  0 on success
- *  -EINVAL on failure
- */
-static int nx842_OF_upd_maxsyncop(struct nx842_devdata *devdata,
-					struct property *prop) {
-	int ret = 0;
-	const struct maxsynccop_t {
-		int comp_elements;
-		int comp_data_limit;
-		int comp_sg_limit;
-		int decomp_elements;
-		int decomp_data_limit;
-		int decomp_sg_limit;
-	} *maxsynccop;
-
-	if (prop->length != sizeof(*maxsynccop)) {
-		dev_err(devdata->dev, "%s: unexpected format for ibm,max-sync-cop property\n", __func__);
-		dev_dbg(devdata->dev, "%s: ibm,max-sync-cop is %d bytes long, expected %lu bytes\n", __func__, prop->length,
-				sizeof(*maxsynccop));
-		ret = -EINVAL;
-		goto out;
-	}
-
-	maxsynccop = (const struct maxsynccop_t *)prop->value;
-
-	/* Use one limit rather than separate limits for compression and
-	 * decompression. Set a maximum for this so as not to exceed the
-	 * size that the header can support and round the value down to
-	 * the hardware page size (4K) */
-	devdata->max_sync_size =
-			(unsigned int)min(maxsynccop->comp_data_limit,
-					maxsynccop->decomp_data_limit);
-
-	devdata->max_sync_size = min_t(unsigned int, devdata->max_sync_size,
-					SIZE_64K);
-
-	if (devdata->max_sync_size < SIZE_4K) {
-		dev_err(devdata->dev, "%s: hardware max data size (%u) is "
-				"less than the driver minimum, unable to use "
-				"the hardware device\n",
-				__func__, devdata->max_sync_size);
-		ret = -EINVAL;
-		goto out;
-	}
-
-	devdata->max_sync_sg = (unsigned int)min(maxsynccop->comp_sg_limit,
-						maxsynccop->decomp_sg_limit);
-	if (devdata->max_sync_sg < 1) {
-		dev_err(devdata->dev, "%s: hardware max sg size (%u) is "
-				"less than the driver minimum, unable to use "
-				"the hardware device\n",
-				__func__, devdata->max_sync_sg);
-		ret = -EINVAL;
-		goto out;
-	}
-
-out:
-	return ret;
-}
-
-/**
- *
- * nx842_OF_upd -- Handle OF properties updates for the device.
- *
- * Set all properties from the OF tree.  Optionally, a new property
- * can be provided by the @new_prop pointer to overwrite an existing value.
- * The device will remain disabled until all values are valid, this function
- * will return an error for updates unless all values are valid.
- *
- * @new_prop: If not NULL, this property is being updated.  If NULL, update
- *  all properties from the current values in the OF tree.
- *
- * Returns:
- *  0 - Success
- *  -ENOMEM - Could not allocate memory for new devdata structure
- *  -EINVAL - property value not found, new_prop is not a recognized
- *	property for the device or property value is not valid.
- *  -ENODEV - Device is not available
- */
-static int nx842_OF_upd(struct property *new_prop)
-{
-	struct nx842_devdata *old_devdata = NULL;
-	struct nx842_devdata *new_devdata = NULL;
-	struct device_node *of_node = NULL;
-	struct property *status = NULL;
-	struct property *maxsglen = NULL;
-	struct property *maxsyncop = NULL;
-	int ret = 0;
-	unsigned long flags;
-
-	spin_lock_irqsave(&devdata_mutex, flags);
-	old_devdata = rcu_dereference_check(devdata,
-			lockdep_is_held(&devdata_mutex));
-	if (old_devdata)
-		of_node = old_devdata->dev->of_node;
-
-	if (!old_devdata || !of_node) {
-		pr_err("%s: device is not available\n", __func__);
-		spin_unlock_irqrestore(&devdata_mutex, flags);
-		return -ENODEV;
-	}
-
-	new_devdata = kzalloc(sizeof(*new_devdata), GFP_NOFS);
-	if (!new_devdata) {
-		dev_err(old_devdata->dev, "%s: Could not allocate memory for device data\n", __func__);
-		ret = -ENOMEM;
-		goto error_out;
-	}
-
-	memcpy(new_devdata, old_devdata, sizeof(*old_devdata));
-	new_devdata->counters = old_devdata->counters;
-
-	/* Set ptrs for existing properties */
-	status = of_find_property(of_node, "status", NULL);
-	maxsglen = of_find_property(of_node, "ibm,max-sg-len", NULL);
-	maxsyncop = of_find_property(of_node, "ibm,max-sync-cop", NULL);
-	if (!status || !maxsglen || !maxsyncop) {
-		dev_err(old_devdata->dev, "%s: Could not locate device properties\n", __func__);
-		ret = -EINVAL;
-		goto error_out;
-	}
-
-	/*
-	 * If this is a property update, there are only certain properties that
-	 * we care about. Bail if it isn't in the below list
-	 */
-	if (new_prop && (strncmp(new_prop->name, "status", new_prop->length) ||
-		         strncmp(new_prop->name, "ibm,max-sg-len", new_prop->length) ||
-		         strncmp(new_prop->name, "ibm,max-sync-cop", new_prop->length)))
-		goto out;
-
-	/* Perform property updates */
-	ret = nx842_OF_upd_status(new_devdata, status);
-	if (ret)
-		goto error_out;
-
-	ret = nx842_OF_upd_maxsglen(new_devdata, maxsglen);
-	if (ret)
-		goto error_out;
-
-	ret = nx842_OF_upd_maxsyncop(new_devdata, maxsyncop);
-	if (ret)
-		goto error_out;
-
-out:
-	dev_info(old_devdata->dev, "%s: max_sync_size new:%u old:%u\n",
-			__func__, new_devdata->max_sync_size,
-			old_devdata->max_sync_size);
-	dev_info(old_devdata->dev, "%s: max_sync_sg new:%u old:%u\n",
-			__func__, new_devdata->max_sync_sg,
-			old_devdata->max_sync_sg);
-	dev_info(old_devdata->dev, "%s: max_sg_len new:%u old:%u\n",
-			__func__, new_devdata->max_sg_len,
-			old_devdata->max_sg_len);
-
-	rcu_assign_pointer(devdata, new_devdata);
-	spin_unlock_irqrestore(&devdata_mutex, flags);
-	synchronize_rcu();
-	dev_set_drvdata(new_devdata->dev, new_devdata);
-	kfree(old_devdata);
-	return 0;
-
-error_out:
-	if (new_devdata) {
-		dev_info(old_devdata->dev, "%s: device disabled\n", __func__);
-		nx842_OF_set_defaults(new_devdata);
-		rcu_assign_pointer(devdata, new_devdata);
-		spin_unlock_irqrestore(&devdata_mutex, flags);
-		synchronize_rcu();
-		dev_set_drvdata(new_devdata->dev, new_devdata);
-		kfree(old_devdata);
-	} else {
-		dev_err(old_devdata->dev, "%s: could not update driver from hardware\n", __func__);
-		spin_unlock_irqrestore(&devdata_mutex, flags);
-	}
-
-	if (!ret)
-		ret = -EINVAL;
-	return ret;
-}
-
-/**
- * nx842_OF_notifier - Process updates to OF properties for the device
- *
- * @np: notifier block
- * @action: notifier action
- * @update: struct pSeries_reconfig_prop_update pointer if action is
- *	PSERIES_UPDATE_PROPERTY
- *
- * Returns:
- *	NOTIFY_OK on success
- *	NOTIFY_BAD encoded with error number on failure, use
- *		notifier_to_errno() to decode this value
- */
-static int nx842_OF_notifier(struct notifier_block *np, unsigned long action,
-			     void *data)
-{
-	struct of_reconfig_data *upd = data;
-	struct nx842_devdata *local_devdata;
-	struct device_node *node = NULL;
-
-	rcu_read_lock();
-	local_devdata = rcu_dereference(devdata);
-	if (local_devdata)
-		node = local_devdata->dev->of_node;
-
-	if (local_devdata &&
-			action == OF_RECONFIG_UPDATE_PROPERTY &&
-			!strcmp(upd->dn->name, node->name)) {
-		rcu_read_unlock();
-		nx842_OF_upd(upd->prop);
-	} else
-		rcu_read_unlock();
-
-	return NOTIFY_OK;
-}
-
-static struct notifier_block nx842_of_nb = {
-	.notifier_call = nx842_OF_notifier,
-};
-
-#define nx842_counter_read(_name)					\
-static ssize_t nx842_##_name##_show(struct device *dev,		\
-		struct device_attribute *attr,				\
-		char *buf) {						\
-	struct nx842_devdata *local_devdata;			\
-	int p = 0;							\
-	rcu_read_lock();						\
-	local_devdata = rcu_dereference(devdata);			\
-	if (local_devdata)						\
-		p = snprintf(buf, PAGE_SIZE, "%ld\n",			\
-		       atomic64_read(&local_devdata->counters->_name));	\
-	rcu_read_unlock();						\
-	return p;							\
-}
-
-#define NX842DEV_COUNTER_ATTR_RO(_name)					\
-	nx842_counter_read(_name);					\
-	static struct device_attribute dev_attr_##_name = __ATTR(_name,	\
-						0444,			\
-						nx842_##_name##_show,\
-						NULL);
-
-NX842DEV_COUNTER_ATTR_RO(comp_complete);
-NX842DEV_COUNTER_ATTR_RO(comp_failed);
-NX842DEV_COUNTER_ATTR_RO(decomp_complete);
-NX842DEV_COUNTER_ATTR_RO(decomp_failed);
-NX842DEV_COUNTER_ATTR_RO(swdecomp);
-
-static ssize_t nx842_timehist_show(struct device *,
-		struct device_attribute *, char *);
-
-static struct device_attribute dev_attr_comp_times = __ATTR(comp_times, 0444,
-		nx842_timehist_show, NULL);
-static struct device_attribute dev_attr_decomp_times = __ATTR(decomp_times,
-		0444, nx842_timehist_show, NULL);
-
-static ssize_t nx842_timehist_show(struct device *dev,
-		struct device_attribute *attr, char *buf) {
-	char *p = buf;
-	struct nx842_devdata *local_devdata;
-	atomic64_t *times;
-	int bytes_remain = PAGE_SIZE;
-	int bytes;
-	int i;
-
-	rcu_read_lock();
-	local_devdata = rcu_dereference(devdata);
-	if (!local_devdata) {
-		rcu_read_unlock();
-		return 0;
-	}
-
-	if (attr == &dev_attr_comp_times)
-		times = local_devdata->counters->comp_times;
-	else if (attr == &dev_attr_decomp_times)
-		times = local_devdata->counters->decomp_times;
-	else {
-		rcu_read_unlock();
-		return 0;
-	}
-
-	for (i = 0; i < (NX842_HIST_SLOTS - 2); i++) {
-		bytes = snprintf(p, bytes_remain, "%u-%uus:\t%ld\n",
-			       i ? (2<<(i-1)) : 0, (2<<i)-1,
-			       atomic64_read(&times[i]));
-		bytes_remain -= bytes;
-		p += bytes;
-	}
-	/* The last bucket holds everything over
-	 * 2<<(NX842_HIST_SLOTS - 2) us */
-	bytes = snprintf(p, bytes_remain, "%uus - :\t%ld\n",
-			2<<(NX842_HIST_SLOTS - 2),
-			atomic64_read(&times[(NX842_HIST_SLOTS - 1)]));
-	p += bytes;
-
-	rcu_read_unlock();
-	return p - buf;
-}
-
-static struct attribute *nx842_sysfs_entries[] = {
-	&dev_attr_comp_complete.attr,
-	&dev_attr_comp_failed.attr,
-	&dev_attr_decomp_complete.attr,
-	&dev_attr_decomp_failed.attr,
-	&dev_attr_swdecomp.attr,
-	&dev_attr_comp_times.attr,
-	&dev_attr_decomp_times.attr,
-	NULL,
-};
-
-static struct attribute_group nx842_attribute_group = {
-	.name = NULL,		/* put in device directory */
-	.attrs = nx842_sysfs_entries,
-};
-
-static int __init nx842_probe(struct vio_dev *viodev,
-				  const struct vio_device_id *id)
-{
-	struct nx842_devdata *old_devdata, *new_devdata = NULL;
-	unsigned long flags;
-	int ret = 0;
-
-	spin_lock_irqsave(&devdata_mutex, flags);
-	old_devdata = rcu_dereference_check(devdata,
-			lockdep_is_held(&devdata_mutex));
-
-	if (old_devdata && old_devdata->vdev != NULL) {
-		dev_err(&viodev->dev, "%s: Attempt to register more than one instance of the hardware\n", __func__);
-		ret = -1;
-		goto error_unlock;
-	}
-
-	dev_set_drvdata(&viodev->dev, NULL);
-
-	new_devdata = kzalloc(sizeof(*new_devdata), GFP_NOFS);
-	if (!new_devdata) {
-		dev_err(&viodev->dev, "%s: Could not allocate memory for device data\n", __func__);
-		ret = -ENOMEM;
-		goto error_unlock;
-	}
-
-	new_devdata->counters = kzalloc(sizeof(*new_devdata->counters),
-			GFP_NOFS);
-	if (!new_devdata->counters) {
-		dev_err(&viodev->dev, "%s: Could not allocate memory for performance counters\n", __func__);
-		ret = -ENOMEM;
-		goto error_unlock;
-	}
-
-	new_devdata->vdev = viodev;
-	new_devdata->dev = &viodev->dev;
-	nx842_OF_set_defaults(new_devdata);
-
-	rcu_assign_pointer(devdata, new_devdata);
-	spin_unlock_irqrestore(&devdata_mutex, flags);
-	synchronize_rcu();
-	kfree(old_devdata);
-
-	of_reconfig_notifier_register(&nx842_of_nb);
-
-	ret = nx842_OF_upd(NULL);
-	if (ret && ret != -ENODEV) {
-		dev_err(&viodev->dev, "could not parse device tree. %d\n", ret);
-		ret = -1;
-		goto error;
-	}
-
-	rcu_read_lock();
-	dev_set_drvdata(&viodev->dev, rcu_dereference(devdata));
-	rcu_read_unlock();
-
-	if (sysfs_create_group(&viodev->dev.kobj, &nx842_attribute_group)) {
-		dev_err(&viodev->dev, "could not create sysfs device attributes\n");
-		ret = -1;
-		goto error;
-	}
-
-	return 0;
-
-error_unlock:
-	spin_unlock_irqrestore(&devdata_mutex, flags);
-	if (new_devdata)
-		kfree(new_devdata->counters);
-	kfree(new_devdata);
-error:
-	return ret;
-}
-
-static int __exit nx842_remove(struct vio_dev *viodev)
-{
-	struct nx842_devdata *old_devdata;
-	unsigned long flags;
-
-	pr_info("Removing IBM Power 842 compression device\n");
-	sysfs_remove_group(&viodev->dev.kobj, &nx842_attribute_group);
-
-	spin_lock_irqsave(&devdata_mutex, flags);
-	old_devdata = rcu_dereference_check(devdata,
-			lockdep_is_held(&devdata_mutex));
-	of_reconfig_notifier_unregister(&nx842_of_nb);
-	RCU_INIT_POINTER(devdata, NULL);
-	spin_unlock_irqrestore(&devdata_mutex, flags);
-	synchronize_rcu();
-	dev_set_drvdata(&viodev->dev, NULL);
-	if (old_devdata)
-		kfree(old_devdata->counters);
-	kfree(old_devdata);
-	return 0;
-}
-
-static struct vio_device_id nx842_driver_ids[] = {
-	{"ibm,compression-v1", "ibm,compression"},
-	{"", ""},
-};
-
-static struct vio_driver nx842_driver = {
-	.name = MODULE_NAME,
-	.probe = nx842_probe,
-	.remove = __exit_p(nx842_remove),
-	.get_desired_dma = nx842_get_desired_dma,
-	.id_table = nx842_driver_ids,
-};
-
-static int __init nx842_init(void)
-{
-	struct nx842_devdata *new_devdata;
-	pr_info("Registering IBM Power 842 compression driver\n");
-
-	RCU_INIT_POINTER(devdata, NULL);
-	new_devdata = kzalloc(sizeof(*new_devdata), GFP_KERNEL);
-	if (!new_devdata) {
-		pr_err("Could not allocate memory for device data\n");
-		return -ENOMEM;
-	}
-	new_devdata->status = UNAVAILABLE;
-	RCU_INIT_POINTER(devdata, new_devdata);
-
-	return vio_register_driver(&nx842_driver);
-}
-
-module_init(nx842_init);
-
-static void __exit nx842_exit(void)
-{
-	struct nx842_devdata *old_devdata;
-	unsigned long flags;
-
-	pr_info("Exiting IBM Power 842 compression driver\n");
-	spin_lock_irqsave(&devdata_mutex, flags);
-	old_devdata = rcu_dereference_check(devdata,
-			lockdep_is_held(&devdata_mutex));
-	RCU_INIT_POINTER(devdata, NULL);
-	spin_unlock_irqrestore(&devdata_mutex, flags);
-	synchronize_rcu();
-	if (old_devdata)
-		dev_set_drvdata(old_devdata->dev, NULL);
-	kfree(old_devdata);
-	vio_unregister_driver(&nx842_driver);
-}
-
-module_exit(nx842_exit);
-
-/*********************************
- * 842 software decompressor
-*********************************/
-typedef int (*sw842_template_op)(const char **, int *, unsigned char **,
-						struct sw842_fifo *);
-
-static int sw842_data8(const char **, int *, unsigned char **,
-						struct sw842_fifo *);
-static int sw842_data4(const char **, int *, unsigned char **,
-						struct sw842_fifo *);
-static int sw842_data2(const char **, int *, unsigned char **,
-						struct sw842_fifo *);
-static int sw842_ptr8(const char **, int *, unsigned char **,
-						struct sw842_fifo *);
-static int sw842_ptr4(const char **, int *, unsigned char **,
-						struct sw842_fifo *);
-static int sw842_ptr2(const char **, int *, unsigned char **,
-						struct sw842_fifo *);
-
-/* special templates */
-#define SW842_TMPL_REPEAT 0x1B
-#define SW842_TMPL_ZEROS 0x1C
-#define SW842_TMPL_EOF 0x1E
-
-static sw842_template_op sw842_tmpl_ops[26][4] = {
-	{ sw842_data8, NULL}, /* 0 (00000) */
-	{ sw842_data4, sw842_data2, sw842_ptr2,  NULL},
-	{ sw842_data4, sw842_ptr2,  sw842_data2, NULL},
-	{ sw842_data4, sw842_ptr2,  sw842_ptr2,  NULL},
-	{ sw842_data4, sw842_ptr4,  NULL},
-	{ sw842_data2, sw842_ptr2,  sw842_data4, NULL},
-	{ sw842_data2, sw842_ptr2,  sw842_data2, sw842_ptr2},
-	{ sw842_data2, sw842_ptr2,  sw842_ptr2,  sw842_data2},
-	{ sw842_data2, sw842_ptr2,  sw842_ptr2,  sw842_ptr2,},
-	{ sw842_data2, sw842_ptr2,  sw842_ptr4,  NULL},
-	{ sw842_ptr2,  sw842_data2, sw842_data4, NULL}, /* 10 (01010) */
-	{ sw842_ptr2,  sw842_data4, sw842_ptr2,  NULL},
-	{ sw842_ptr2,  sw842_data2, sw842_ptr2,  sw842_data2},
-	{ sw842_ptr2,  sw842_data2, sw842_ptr2,  sw842_ptr2},
-	{ sw842_ptr2,  sw842_data2, sw842_ptr4,  NULL},
-	{ sw842_ptr2,  sw842_ptr2,  sw842_data4, NULL},
-	{ sw842_ptr2,  sw842_ptr2,  sw842_data2, sw842_ptr2},
-	{ sw842_ptr2,  sw842_ptr2,  sw842_ptr2,  sw842_data2},
-	{ sw842_ptr2,  sw842_ptr2,  sw842_ptr2,  sw842_ptr2},
-	{ sw842_ptr2,  sw842_ptr2,  sw842_ptr4,  NULL},
-	{ sw842_ptr4,  sw842_data4, NULL}, /* 20 (10100) */
-	{ sw842_ptr4,  sw842_data2, sw842_ptr2,  NULL},
-	{ sw842_ptr4,  sw842_ptr2,  sw842_data2, NULL},
-	{ sw842_ptr4,  sw842_ptr2,  sw842_ptr2,  NULL},
-	{ sw842_ptr4,  sw842_ptr4,  NULL},
-	{ sw842_ptr8,  NULL}
-};
-
-/* Software decompress helpers */
-
-static uint8_t sw842_get_byte(const char *buf, int bit)
-{
-	uint8_t tmpl;
-	uint16_t tmp;
-	tmp = htons(*(uint16_t *)(buf));
-	tmp = (uint16_t)(tmp << bit);
-	tmp = ntohs(tmp);
-	memcpy(&tmpl, &tmp, 1);
-	return tmpl;
-}
-
-static uint8_t sw842_get_template(const char **buf, int *bit)
-{
-	uint8_t byte;
-	byte = sw842_get_byte(*buf, *bit);
-	byte = byte >> 3;
-	byte &= 0x1F;
-	*buf += (*bit + 5) / 8;
-	*bit = (*bit + 5) % 8;
-	return byte;
-}
-
-/* repeat_count happens to be 5-bit too (like the template) */
-static uint8_t sw842_get_repeat_count(const char **buf, int *bit)
-{
-	uint8_t byte;
-	byte = sw842_get_byte(*buf, *bit);
-	byte = byte >> 2;
-	byte &= 0x3F;
-	*buf += (*bit + 6) / 8;
-	*bit = (*bit + 6) % 8;
-	return byte;
-}
-
-static uint8_t sw842_get_ptr2(const char **buf, int *bit)
-{
-	uint8_t ptr;
-	ptr = sw842_get_byte(*buf, *bit);
-	(*buf)++;
-	return ptr;
-}
-
-static uint16_t sw842_get_ptr4(const char **buf, int *bit,
-		struct sw842_fifo *fifo)
-{
-	uint16_t ptr;
-	ptr = htons(*(uint16_t *)(*buf));
-	ptr = (uint16_t)(ptr << *bit);
-	ptr = ptr >> 7;
-	ptr &= 0x01FF;
-	*buf += (*bit + 9) / 8;
-	*bit = (*bit + 9) % 8;
-	return ptr;
-}
-
-static uint8_t sw842_get_ptr8(const char **buf, int *bit,
-		struct sw842_fifo *fifo)
-{
-	return sw842_get_ptr2(buf, bit);
-}
-
-/* Software decompress template ops */
-
-static int sw842_data8(const char **inbuf, int *inbit,
-		unsigned char **outbuf, struct sw842_fifo *fifo)
-{
-	int ret;
-
-	ret = sw842_data4(inbuf, inbit, outbuf, fifo);
-	if (ret)
-		return ret;
-	ret = sw842_data4(inbuf, inbit, outbuf, fifo);
-	return ret;
-}
-
-static int sw842_data4(const char **inbuf, int *inbit,
-		unsigned char **outbuf, struct sw842_fifo *fifo)
-{
-	int ret;
-
-	ret = sw842_data2(inbuf, inbit, outbuf, fifo);
-	if (ret)
-		return ret;
-	ret = sw842_data2(inbuf, inbit, outbuf, fifo);
-	return ret;
-}
-
-static int sw842_data2(const char **inbuf, int *inbit,
-		unsigned char **outbuf, struct sw842_fifo *fifo)
-{
-	**outbuf = sw842_get_byte(*inbuf, *inbit);
-	(*inbuf)++;
-	(*outbuf)++;
-	**outbuf = sw842_get_byte(*inbuf, *inbit);
-	(*inbuf)++;
-	(*outbuf)++;
-	return 0;
-}
-
-static int sw842_ptr8(const char **inbuf, int *inbit,
-		unsigned char **outbuf, struct sw842_fifo *fifo)
-{
-	uint8_t ptr;
-	ptr = sw842_get_ptr8(inbuf, inbit, fifo);
-	if (!fifo->f84_full && (ptr >= fifo->f8_count))
-		return 1;
-	memcpy(*outbuf, fifo->f8[ptr], 8);
-	*outbuf += 8;
-	return 0;
-}
-
-static int sw842_ptr4(const char **inbuf, int *inbit,
-		unsigned char **outbuf, struct sw842_fifo *fifo)
-{
-	uint16_t ptr;
-	ptr = sw842_get_ptr4(inbuf, inbit, fifo);
-	if (!fifo->f84_full && (ptr >= fifo->f4_count))
-		return 1;
-	memcpy(*outbuf, fifo->f4[ptr], 4);
-	*outbuf += 4;
-	return 0;
-}
-
-static int sw842_ptr2(const char **inbuf, int *inbit,
-		unsigned char **outbuf, struct sw842_fifo *fifo)
-{
-	uint8_t ptr;
-	ptr = sw842_get_ptr2(inbuf, inbit);
-	if (!fifo->f2_full && (ptr >= fifo->f2_count))
-		return 1;
-	memcpy(*outbuf, fifo->f2[ptr], 2);
-	*outbuf += 2;
-	return 0;
-}
-
-static void sw842_copy_to_fifo(const char *buf, struct sw842_fifo *fifo)
-{
-	unsigned char initial_f2count = fifo->f2_count;
-
-	memcpy(fifo->f8[fifo->f8_count], buf, 8);
-	fifo->f4_count += 2;
-	fifo->f8_count += 1;
-
-	if (!fifo->f84_full && fifo->f4_count >= 512) {
-		fifo->f84_full = 1;
-		fifo->f4_count /= 512;
-	}
-
-	memcpy(fifo->f2[fifo->f2_count++], buf, 2);
-	memcpy(fifo->f2[fifo->f2_count++], buf + 2, 2);
-	memcpy(fifo->f2[fifo->f2_count++], buf + 4, 2);
-	memcpy(fifo->f2[fifo->f2_count++], buf + 6, 2);
-	if (fifo->f2_count < initial_f2count)
-		fifo->f2_full = 1;
-}
-
-static int sw842_decompress(const unsigned char *src, int srclen,
-			unsigned char *dst, int *destlen,
-			const void *wrkmem)
-{
-	uint8_t tmpl;
-	const char *inbuf;
-	int inbit = 0;
-	unsigned char *outbuf, *outbuf_end, *origbuf, *prevbuf;
-	const char *inbuf_end;
-	sw842_template_op op;
-	int opindex;
-	int i, repeat_count;
-	struct sw842_fifo *fifo;
-	int ret = 0;
-
-	fifo = &((struct nx842_workmem *)(wrkmem))->swfifo;
-	memset(fifo, 0, sizeof(*fifo));
-
-	origbuf = NULL;
-	inbuf = src;
-	inbuf_end = src + srclen;
-	outbuf = dst;
-	outbuf_end = dst + *destlen;
-
-	while ((tmpl = sw842_get_template(&inbuf, &inbit)) != SW842_TMPL_EOF) {
-		if (inbuf >= inbuf_end) {
-			ret = -EINVAL;
-			goto out;
-		}
-
-		opindex = 0;
-		prevbuf = origbuf;
-		origbuf = outbuf;
-		switch (tmpl) {
-		case SW842_TMPL_REPEAT:
-			if (prevbuf == NULL) {
-				ret = -EINVAL;
-				goto out;
-			}
-
-			repeat_count = sw842_get_repeat_count(&inbuf,
-								&inbit) + 1;
-
-			/* Did the repeat count advance past the end of input */
-			if (inbuf > inbuf_end) {
-				ret = -EINVAL;
-				goto out;
-			}
-
-			for (i = 0; i < repeat_count; i++) {
-				/* Would this overflow the output buffer */
-				if ((outbuf + 8) > outbuf_end) {
-					ret = -ENOSPC;
-					goto out;
-				}
-
-				memcpy(outbuf, prevbuf, 8);
-				sw842_copy_to_fifo(outbuf, fifo);
-				outbuf += 8;
-			}
-			break;
-
-		case SW842_TMPL_ZEROS:
-			/* Would this overflow the output buffer */
-			if ((outbuf + 8) > outbuf_end) {
-				ret = -ENOSPC;
-				goto out;
-			}
-
-			memset(outbuf, 0, 8);
-			sw842_copy_to_fifo(outbuf, fifo);
-			outbuf += 8;
-			break;
-
-		default:
-			if (tmpl > 25) {
-				ret = -EINVAL;
-				goto out;
-			}
-
-			/* Does this go past the end of the input buffer */
-			if ((inbuf + 2) > inbuf_end) {
-				ret = -EINVAL;
-				goto out;
-			}
-
-			/* Would this overflow the output buffer */
-			if ((outbuf + 8) > outbuf_end) {
-				ret = -ENOSPC;
-				goto out;
-			}
-
-			while (opindex < 4 &&
-				(op = sw842_tmpl_ops[tmpl][opindex++])
-					!= NULL) {
-				ret = (*op)(&inbuf, &inbit, &outbuf, fifo);
-				if (ret) {
-					ret = -EINVAL;
-					goto out;
-				}
-				sw842_copy_to_fifo(origbuf, fifo);
-			}
-		}
-	}
-
-out:
-	if (!ret)
-		*destlen = (unsigned int)(outbuf - dst);
-	else
-		*destlen = 0;
-
-	return ret;
-}
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 06/10] drivers/crypto/nx: add NX-842 platform frontend driver
  2015-05-06 16:50 ` [PATCHv2 00/10] add 842 hw compression for PowerNV platform Dan Streetman
                     ` (4 preceding siblings ...)
  2015-05-06 16:51   ` [PATCH 05/10] drivers/crypto/nx: rename nx-842.c to nx-842-pseries.c Dan Streetman
@ 2015-05-06 16:51   ` Dan Streetman
  2015-05-06 16:51   ` [PATCH 07/10] drivers/crypto/nx: add nx842 constraints Dan Streetman
                     ` (4 subsequent siblings)
  10 siblings, 0 replies; 46+ messages in thread
From: Dan Streetman @ 2015-05-06 16:51 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras
  Cc: Dan Streetman, Seth Jennings, Robert Jennings, Andrew Morton,
	Greg KH, Mauro Carvalho Chehab, Arnd Bergmann, Joe Perches,
	linux-crypto, linuxppc-dev, linux-kernel

Add NX-842 frontend that allows using either the pSeries platform or
PowerNV platform driver (to be added by later patch) for the NX-842
hardware.  Update the MAINTAINERS file to include the new filenames.
Update Kconfig files to clarify titles and descriptions, and correct
dependencies.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
---
 MAINTAINERS                        |   2 +-
 drivers/crypto/Kconfig             |  10 +--
 drivers/crypto/nx/Kconfig          |  35 ++++++---
 drivers/crypto/nx/Makefile         |   4 +-
 drivers/crypto/nx/nx-842-pseries.c |  57 +++++++--------
 drivers/crypto/nx/nx-842.c         | 144 +++++++++++++++++++++++++++++++++++++
 drivers/crypto/nx/nx-842.h         |  32 +++++++++
 include/linux/nx842.h              |  10 +--
 8 files changed, 245 insertions(+), 49 deletions(-)
 create mode 100644 drivers/crypto/nx/nx-842.c
 create mode 100644 drivers/crypto/nx/nx-842.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 5a5c1dc..e71855f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4870,7 +4870,7 @@ F:	drivers/crypto/nx/
 IBM Power 842 compression accelerator
 M:	Dan Streetman <ddstreet@us.ibm.com>
 S:	Supported
-F:	drivers/crypto/nx/nx-842.c
+F:	drivers/crypto/nx/nx-842*
 F:	include/linux/nx842.h
 F:	include/linux/sw842.h
 F:	crypto/842.c
diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
index 033c0c8..872de26 100644
--- a/drivers/crypto/Kconfig
+++ b/drivers/crypto/Kconfig
@@ -312,11 +312,13 @@ config CRYPTO_DEV_S5P
 	  algorithms execution.
 
 config CRYPTO_DEV_NX
-	bool "Support for IBM Power7+ in-Nest cryptographic acceleration"
-	depends on PPC64 && IBMVIO && !CPU_LITTLE_ENDIAN
-	default n
+	bool "Support for IBM PowerPC Nest (NX) cryptographic acceleration"
+	depends on PPC64
 	help
-	  Support for Power7+ in-Nest cryptographic acceleration.
+	  This enables support for the NX hardware cryptographic accelerator
+	  coprocessor that is in IBM PowerPC P7+ or later processors.  This
+	  does not actually enable any drivers, it only allows you to select
+	  which acceleration type (encryption and/or compression) to enable.
 
 if CRYPTO_DEV_NX
 	source "drivers/crypto/nx/Kconfig"
diff --git a/drivers/crypto/nx/Kconfig b/drivers/crypto/nx/Kconfig
index f826166..34013f7 100644
--- a/drivers/crypto/nx/Kconfig
+++ b/drivers/crypto/nx/Kconfig
@@ -1,7 +1,9 @@
+
 config CRYPTO_DEV_NX_ENCRYPT
-	tristate "Encryption acceleration support"
-	depends on PPC64 && IBMVIO
+	tristate "Encryption acceleration support on pSeries platform"
+	depends on PPC_PSERIES && IBMVIO && !CPU_LITTLE_ENDIAN
 	default y
+	select CRYPTO_ALGAPI
 	select CRYPTO_AES
 	select CRYPTO_CBC
 	select CRYPTO_ECB
@@ -12,15 +14,30 @@ config CRYPTO_DEV_NX_ENCRYPT
 	select CRYPTO_SHA256
 	select CRYPTO_SHA512
 	help
-	  Support for Power7+ in-Nest encryption acceleration. This
-	  module supports acceleration for AES and SHA2 algorithms. If you
-	  choose 'M' here, this module will be called nx_crypto.
+	  Support for PowerPC Nest (NX) encryption acceleration. This
+	  module supports acceleration for AES and SHA2 algorithms on
+	  the pSeries platform.  If you choose 'M' here, this module
+	  will be called nx_crypto.
 
 config CRYPTO_DEV_NX_COMPRESS
 	tristate "Compression acceleration support"
-	depends on PPC64 && IBMVIO
 	default y
 	help
-	  Support for Power7+ in-Nest compression acceleration. This
-	  module supports acceleration for AES and SHA2 algorithms. If you
-	  choose 'M' here, this module will be called nx_compress.
+	  Support for PowerPC Nest (NX) compression acceleration. This
+	  module supports acceleration for compressing memory with the 842
+	  algorithm.  One of the platform drivers must be selected also.
+	  If you choose 'M' here, this module will be called nx_compress.
+
+if CRYPTO_DEV_NX_COMPRESS
+
+config CRYPTO_DEV_NX_COMPRESS_PSERIES
+	tristate "Compression acceleration support on pSeries platform"
+	depends on PPC_PSERIES && IBMVIO && !CPU_LITTLE_ENDIAN
+	default y
+	help
+	  Support for PowerPC Nest (NX) compression acceleration. This
+	  module supports acceleration for compressing memory with the 842
+	  algorithm.  This supports NX hardware on the pSeries platform.
+	  If you choose 'M' here, this module will be called nx_compress_pseries.
+
+endif
diff --git a/drivers/crypto/nx/Makefile b/drivers/crypto/nx/Makefile
index 8669ffa..5d9f4bc 100644
--- a/drivers/crypto/nx/Makefile
+++ b/drivers/crypto/nx/Makefile
@@ -11,4 +11,6 @@ nx-crypto-objs := nx.o \
 		  nx-sha512.o
 
 obj-$(CONFIG_CRYPTO_DEV_NX_COMPRESS) += nx-compress.o
-nx-compress-objs := nx-842-pseries.o
+obj-$(CONFIG_CRYPTO_DEV_NX_COMPRESS_PSERIES) += nx-compress-pseries.o
+nx-compress-objs := nx-842.o
+nx-compress-pseries-objs := nx-842-pseries.o
diff --git a/drivers/crypto/nx/nx-842-pseries.c b/drivers/crypto/nx/nx-842-pseries.c
index 887196e..9b83c9e 100644
--- a/drivers/crypto/nx/nx-842-pseries.c
+++ b/drivers/crypto/nx/nx-842-pseries.c
@@ -21,18 +21,13 @@
  *          Seth Jennings <sjenning@linux.vnet.ibm.com>
  */
 
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/nx842.h>
-#include <linux/of.h>
-#include <linux/slab.h>
-
 #include <asm/page.h>
 #include <asm/vio.h>
 
+#include "nx-842.h"
 #include "nx_csbcpb.h" /* struct nx_csbcpb */
 
-#define MODULE_NAME "nx-compress"
+#define MODULE_NAME NX842_PSERIES_MODULE_NAME
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Robert Jennings <rcj@linux.vnet.ibm.com>");
 MODULE_DESCRIPTION("842 H/W Compression driver for IBM Power processors");
@@ -236,18 +231,6 @@ struct nx842_workmem {
 	};
 };
 
-int nx842_get_workmem_size(void)
-{
-	return sizeof(struct nx842_workmem) + NX842_HW_PAGE_SIZE;
-}
-EXPORT_SYMBOL_GPL(nx842_get_workmem_size);
-
-int nx842_get_workmem_size_aligned(void)
-{
-	return sizeof(struct nx842_workmem);
-}
-EXPORT_SYMBOL_GPL(nx842_get_workmem_size_aligned);
-
 static int nx842_validate_result(struct device *dev,
 	struct cop_status_block *csb)
 {
@@ -300,7 +283,7 @@ static int nx842_validate_result(struct device *dev,
 }
 
 /**
- * nx842_compress - Compress data using the 842 algorithm
+ * nx842_pseries_compress - Compress data using the 842 algorithm
  *
  * Compression provide by the NX842 coprocessor on IBM Power systems.
  * The input buffer is compressed and the result is stored in the
@@ -315,7 +298,7 @@ static int nx842_validate_result(struct device *dev,
  * @out: Pointer to output buffer
  * @outlen: Length of output buffer
  * @wrkmem: ptr to buffer for working memory, size determined by
- *          nx842_get_workmem_size()
+ *          NX842_MEM_COMPRESS
  *
  * Returns:
  *   0		Success, output of length @outlen stored in the buffer at @out
@@ -325,8 +308,9 @@ static int nx842_validate_result(struct device *dev,
  *   -EIO	Internal error
  *   -ENODEV	Hardware unavailable
  */
-int nx842_compress(const unsigned char *in, unsigned int inlen,
-		       unsigned char *out, unsigned int *outlen, void *wmem)
+static int nx842_pseries_compress(const unsigned char *in, unsigned int inlen,
+				  unsigned char *out, unsigned int *outlen,
+				  void *wmem)
 {
 	struct nx842_header *hdr;
 	struct nx842_devdata *local_devdata;
@@ -493,13 +477,12 @@ unlock:
 	rcu_read_unlock();
 	return ret;
 }
-EXPORT_SYMBOL_GPL(nx842_compress);
 
 static int sw842_decompress(const unsigned char *, int, unsigned char *, int *,
 			const void *);
 
 /**
- * nx842_decompress - Decompress data using the 842 algorithm
+ * nx842_pseries_decompress - Decompress data using the 842 algorithm
  *
  * Decompression provide by the NX842 coprocessor on IBM Power systems.
  * The input buffer is decompressed and the result is stored in the
@@ -515,7 +498,7 @@ static int sw842_decompress(const unsigned char *, int, unsigned char *, int *,
  * @out: Pointer to output buffer, must be page aligned
  * @outlen: Length of output buffer, must be PAGE_SIZE
  * @wrkmem: ptr to buffer for working memory, size determined by
- *          nx842_get_workmem_size()
+ *          NX842_MEM_COMPRESS
  *
  * Returns:
  *   0		Success, output of length @outlen stored in the buffer at @out
@@ -525,8 +508,9 @@ static int sw842_decompress(const unsigned char *, int, unsigned char *, int *,
  *   -EINVAL	Bad input data encountered when attempting decompress
  *   -EIO	Internal error
  */
-int nx842_decompress(const unsigned char *in, unsigned int inlen,
-			 unsigned char *out, unsigned int *outlen, void *wmem)
+static int nx842_pseries_decompress(const unsigned char *in, unsigned int inlen,
+				    unsigned char *out, unsigned int *outlen,
+				    void *wmem)
 {
 	struct nx842_header *hdr;
 	struct nx842_devdata *local_devdata;
@@ -694,7 +678,6 @@ unlock:
 	rcu_read_unlock();
 	return ret;
 }
-EXPORT_SYMBOL_GPL(nx842_decompress);
 
 /**
  * nx842_OF_set_defaults -- Set default (disabled) values for devdata
@@ -1130,6 +1113,12 @@ static struct attribute_group nx842_attribute_group = {
 	.attrs = nx842_sysfs_entries,
 };
 
+static struct nx842_driver nx842_pseries_driver = {
+	.owner =	THIS_MODULE,
+	.compress =	nx842_pseries_compress,
+	.decompress =	nx842_pseries_decompress,
+};
+
 static int __init nx842_probe(struct vio_dev *viodev,
 				  const struct vio_device_id *id)
 {
@@ -1192,6 +1181,8 @@ static int __init nx842_probe(struct vio_dev *viodev,
 		goto error;
 	}
 
+	nx842_register_driver(&nx842_pseries_driver);
+
 	return 0;
 
 error_unlock:
@@ -1222,11 +1213,14 @@ static int __exit nx842_remove(struct vio_dev *viodev)
 	if (old_devdata)
 		kfree(old_devdata->counters);
 	kfree(old_devdata);
+
+	nx842_unregister_driver(&nx842_pseries_driver);
+
 	return 0;
 }
 
 static struct vio_device_id nx842_driver_ids[] = {
-	{"ibm,compression-v1", "ibm,compression"},
+	{NX842_PSERIES_COMPAT_NAME "-v1", NX842_PSERIES_COMPAT_NAME},
 	{"", ""},
 };
 
@@ -1243,6 +1237,8 @@ static int __init nx842_init(void)
 	struct nx842_devdata *new_devdata;
 	pr_info("Registering IBM Power 842 compression driver\n");
 
+	BUILD_BUG_ON(sizeof(struct nx842_workmem) > NX842_MEM_COMPRESS);
+
 	RCU_INIT_POINTER(devdata, NULL);
 	new_devdata = kzalloc(sizeof(*new_devdata), GFP_KERNEL);
 	if (!new_devdata) {
@@ -1272,6 +1268,7 @@ static void __exit nx842_exit(void)
 	if (old_devdata)
 		dev_set_drvdata(old_devdata->dev, NULL);
 	kfree(old_devdata);
+	nx842_unregister_driver(&nx842_pseries_driver);
 	vio_unregister_driver(&nx842_driver);
 }
 
diff --git a/drivers/crypto/nx/nx-842.c b/drivers/crypto/nx/nx-842.c
new file mode 100644
index 0000000..f1f378e
--- /dev/null
+++ b/drivers/crypto/nx/nx-842.c
@@ -0,0 +1,144 @@
+/*
+ * Driver frontend for IBM Power 842 compression accelerator
+ *
+ * Copyright (C) 2015 Dan Streetman, IBM Corp
+ *
+ * Designer of the Power data compression engine:
+ *   Bulent Abali <abali@us.ibm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include "nx-842.h"
+
+#define MODULE_NAME "nx-compress"
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Dan Streetman <ddstreet@ieee.org>");
+MODULE_DESCRIPTION("842 H/W Compression driver for IBM Power processors");
+
+/* Only one driver is expected, based on the HW platform */
+static struct nx842_driver *nx842_driver;
+static DEFINE_SPINLOCK(nx842_driver_lock); /* protects driver pointers */
+
+void nx842_register_driver(struct nx842_driver *driver)
+{
+	spin_lock(&nx842_driver_lock);
+
+	if (nx842_driver) {
+		pr_err("can't register driver %s, already using driver %s\n",
+		       driver->owner->name, nx842_driver->owner->name);
+	} else {
+		pr_info("registering driver %s\n", driver->owner->name);
+		nx842_driver = driver;
+	}
+
+	spin_unlock(&nx842_driver_lock);
+}
+EXPORT_SYMBOL_GPL(nx842_register_driver);
+
+void nx842_unregister_driver(struct nx842_driver *driver)
+{
+	spin_lock(&nx842_driver_lock);
+
+	if (nx842_driver == driver) {
+		pr_info("unregistering driver %s\n", driver->owner->name);
+		nx842_driver = NULL;
+	} else if (nx842_driver) {
+		pr_err("can't unregister driver %s, using driver %s\n",
+		       driver->owner->name, nx842_driver->owner->name);
+	} else {
+		pr_err("can't unregister driver %s, no driver in use\n",
+		       driver->owner->name);
+	}
+
+	spin_unlock(&nx842_driver_lock);
+}
+EXPORT_SYMBOL_GPL(nx842_unregister_driver);
+
+static struct nx842_driver *get_driver(void)
+{
+	struct nx842_driver *driver = NULL;
+
+	spin_lock(&nx842_driver_lock);
+
+	driver = nx842_driver;
+
+	if (driver && !try_module_get(driver->owner))
+		driver = NULL;
+
+	spin_unlock(&nx842_driver_lock);
+
+	return driver;
+}
+
+static void put_driver(struct nx842_driver *driver)
+{
+	module_put(driver->owner);
+}
+
+int nx842_compress(const unsigned char *in, unsigned int in_len,
+		   unsigned char *out, unsigned int *out_len,
+		   void *wrkmem)
+{
+	struct nx842_driver *driver = get_driver();
+	int ret;
+
+	if (!driver)
+		return -ENODEV;
+
+	ret = driver->compress(in, in_len, out, out_len, wrkmem);
+
+	put_driver(driver);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(nx842_compress);
+
+int nx842_decompress(const unsigned char *in, unsigned int in_len,
+		     unsigned char *out, unsigned int *out_len,
+		     void *wrkmem)
+{
+	struct nx842_driver *driver = get_driver();
+	int ret;
+
+	if (!driver)
+		return -ENODEV;
+
+	ret = driver->decompress(in, in_len, out, out_len, wrkmem);
+
+	put_driver(driver);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(nx842_decompress);
+
+static __init int nx842_init(void)
+{
+	pr_info("loading\n");
+
+	if (of_find_compatible_node(NULL, NULL, NX842_PSERIES_COMPAT_NAME))
+		request_module_nowait(NX842_PSERIES_MODULE_NAME);
+	else
+		pr_err("no nx842 driver found.\n");
+
+	pr_info("loaded\n");
+
+	return 0;
+}
+module_init(nx842_init);
+
+static void __exit nx842_exit(void)
+{
+	pr_info("NX842 unloaded\n");
+}
+module_exit(nx842_exit);
diff --git a/drivers/crypto/nx/nx-842.h b/drivers/crypto/nx/nx-842.h
new file mode 100644
index 0000000..2a5d4e1
--- /dev/null
+++ b/drivers/crypto/nx/nx-842.h
@@ -0,0 +1,32 @@
+
+#ifndef __NX_842_H__
+#define __NX_842_H__
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/nx842.h>
+#include <linux/of.h>
+#include <linux/slab.h>
+#include <linux/io.h>
+
+struct nx842_driver {
+	struct module *owner;
+
+	int (*compress)(const unsigned char *in, unsigned int in_len,
+			unsigned char *out, unsigned int *out_len,
+			void *wrkmem);
+	int (*decompress)(const unsigned char *in, unsigned int in_len,
+			  unsigned char *out, unsigned int *out_len,
+			  void *wrkmem);
+};
+
+void nx842_register_driver(struct nx842_driver *driver);
+void nx842_unregister_driver(struct nx842_driver *driver);
+
+
+/* To allow the main nx-compress module to load platform module */
+#define NX842_PSERIES_MODULE_NAME	"nx-compress-pseries"
+#define NX842_PSERIES_COMPAT_NAME	"ibm,compression"
+
+
+#endif /* __NX_842_H__ */
diff --git a/include/linux/nx842.h b/include/linux/nx842.h
index a4d324c..d919c22 100644
--- a/include/linux/nx842.h
+++ b/include/linux/nx842.h
@@ -1,11 +1,13 @@
 #ifndef __NX842_H__
 #define __NX842_H__
 
-int nx842_get_workmem_size(void);
-int nx842_get_workmem_size_aligned(void);
+#define __NX842_PSERIES_MEM_COMPRESS	((PAGE_SIZE * 2) + 10240)
+
+#define NX842_MEM_COMPRESS	__NX842_PSERIES_MEM_COMPRESS
+
 int nx842_compress(const unsigned char *in, unsigned int in_len,
-		unsigned char *out, unsigned int *out_len, void *wrkmem);
+		   unsigned char *out, unsigned int *out_len, void *wrkmem);
 int nx842_decompress(const unsigned char *in, unsigned int in_len,
-		unsigned char *out, unsigned int *out_len, void *wrkmem);
+		     unsigned char *out, unsigned int *out_len, void *wrkmem);
 
 #endif
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 07/10] drivers/crypto/nx: add nx842 constraints
  2015-05-06 16:50 ` [PATCHv2 00/10] add 842 hw compression for PowerNV platform Dan Streetman
                     ` (5 preceding siblings ...)
  2015-05-06 16:51   ` [PATCH 06/10] drivers/crypto/nx: add NX-842 platform frontend driver Dan Streetman
@ 2015-05-06 16:51   ` Dan Streetman
  2015-05-06 16:51   ` [PATCH 08/10] drivers/crypto/nx: add PowerNV platform NX-842 driver Dan Streetman
                     ` (3 subsequent siblings)
  10 siblings, 0 replies; 46+ messages in thread
From: Dan Streetman @ 2015-05-06 16:51 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras
  Cc: Dan Streetman, Seth Jennings, Robert Jennings, linux-crypto,
	linuxppc-dev, linux-kernel

Add "constraints" for the NX-842 driver.  The constraints are used to
indicate what the current NX-842 platform driver is capable of.  The
constraints tell the NX-842 user what alignment, min and max length, and
length multiple each provided buffers should conform to.  These are
required because the 842 hardware requires buffers to meet specific
constraints that vary based on platform - for example, the pSeries
max length is much lower than the PowerNV max length.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
---
 drivers/crypto/nx/nx-842-pseries.c | 10 ++++++++++
 drivers/crypto/nx/nx-842.c         | 38 ++++++++++++++++++++++++++++++++++++++
 drivers/crypto/nx/nx-842.h         |  2 ++
 include/linux/nx842.h              |  9 +++++++++
 4 files changed, 59 insertions(+)

diff --git a/drivers/crypto/nx/nx-842-pseries.c b/drivers/crypto/nx/nx-842-pseries.c
index 9b83c9e..cb481d8 100644
--- a/drivers/crypto/nx/nx-842-pseries.c
+++ b/drivers/crypto/nx/nx-842-pseries.c
@@ -40,6 +40,13 @@ MODULE_DESCRIPTION("842 H/W Compression driver for IBM Power processors");
 /* IO buffer must be 128 byte aligned */
 #define IO_BUFFER_ALIGN 128
 
+static struct nx842_constraints nx842_pseries_constraints = {
+	.alignment =	IO_BUFFER_ALIGN,
+	.multiple =	DDE_BUFFER_LAST_MULT,
+	.minimum =	IO_BUFFER_ALIGN,
+	.maximum =	PAGE_SIZE, /* dynamic, max_sync_size */
+};
+
 struct nx842_header {
 	int blocks_nr; /* number of compressed blocks */
 	int offset; /* offset of the first block (from beginning of header) */
@@ -842,6 +849,8 @@ static int nx842_OF_upd_maxsyncop(struct nx842_devdata *devdata,
 		goto out;
 	}
 
+	nx842_pseries_constraints.maximum = devdata->max_sync_size;
+
 	devdata->max_sync_sg = (unsigned int)min(maxsynccop->comp_sg_limit,
 						maxsynccop->decomp_sg_limit);
 	if (devdata->max_sync_sg < 1) {
@@ -1115,6 +1124,7 @@ static struct attribute_group nx842_attribute_group = {
 
 static struct nx842_driver nx842_pseries_driver = {
 	.owner =	THIS_MODULE,
+	.constraints =	&nx842_pseries_constraints,
 	.compress =	nx842_pseries_compress,
 	.decompress =	nx842_pseries_decompress,
 };
diff --git a/drivers/crypto/nx/nx-842.c b/drivers/crypto/nx/nx-842.c
index f1f378e..160fe2d 100644
--- a/drivers/crypto/nx/nx-842.c
+++ b/drivers/crypto/nx/nx-842.c
@@ -86,6 +86,44 @@ static void put_driver(struct nx842_driver *driver)
 	module_put(driver->owner);
 }
 
+/**
+ * nx842_constraints
+ *
+ * This provides the driver's constraints.  Different nx842 implementations
+ * may have varying requirements.  The constraints are:
+ *   @alignment:	All buffers should be aligned to this
+ *   @multiple:		All buffer lengths should be a multiple of this
+ *   @minimum:		Buffer lengths must not be less than this amount
+ *   @maximum:		Buffer lengths must not be more than this amount
+ *
+ * The constraints apply to all buffers and lengths, both input and output,
+ * for both compression and decompression, except for the minimum which
+ * only applies to compression input and decompression output; the
+ * compressed data can be less than the minimum constraint.  It can be
+ * assumed that compressed data will always adhere to the multiple
+ * constraint.
+ *
+ * The driver may succeed even if these constraints are violated;
+ * however the driver can return failure or suffer reduced performance
+ * if any constraint is not met.
+ */
+int nx842_constraints(struct nx842_constraints *c)
+{
+	struct nx842_driver *driver = get_driver();
+	int ret = 0;
+
+	if (!driver)
+		return -ENODEV;
+
+	BUG_ON(!c);
+	memcpy(c, driver->constraints, sizeof(*c));
+
+	put_driver(driver);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(nx842_constraints);
+
 int nx842_compress(const unsigned char *in, unsigned int in_len,
 		   unsigned char *out, unsigned int *out_len,
 		   void *wrkmem)
diff --git a/drivers/crypto/nx/nx-842.h b/drivers/crypto/nx/nx-842.h
index 2a5d4e1..c6ceb0f 100644
--- a/drivers/crypto/nx/nx-842.h
+++ b/drivers/crypto/nx/nx-842.h
@@ -12,6 +12,8 @@
 struct nx842_driver {
 	struct module *owner;
 
+	struct nx842_constraints *constraints;
+
 	int (*compress)(const unsigned char *in, unsigned int in_len,
 			unsigned char *out, unsigned int *out_len,
 			void *wrkmem);
diff --git a/include/linux/nx842.h b/include/linux/nx842.h
index d919c22..aa1a97e9 100644
--- a/include/linux/nx842.h
+++ b/include/linux/nx842.h
@@ -5,6 +5,15 @@
 
 #define NX842_MEM_COMPRESS	__NX842_PSERIES_MEM_COMPRESS
 
+struct nx842_constraints {
+	int alignment;
+	int multiple;
+	int minimum;
+	int maximum;
+};
+
+int nx842_constraints(struct nx842_constraints *constraints);
+
 int nx842_compress(const unsigned char *in, unsigned int in_len,
 		   unsigned char *out, unsigned int *out_len, void *wrkmem);
 int nx842_decompress(const unsigned char *in, unsigned int in_len,
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 08/10] drivers/crypto/nx: add PowerNV platform NX-842 driver
  2015-05-06 16:50 ` [PATCHv2 00/10] add 842 hw compression for PowerNV platform Dan Streetman
                     ` (6 preceding siblings ...)
  2015-05-06 16:51   ` [PATCH 07/10] drivers/crypto/nx: add nx842 constraints Dan Streetman
@ 2015-05-06 16:51   ` Dan Streetman
  2015-05-06 16:51   ` [PATCH 09/10] drivers/crypto/nx: simplify pSeries nx842 driver Dan Streetman
                     ` (2 subsequent siblings)
  10 siblings, 0 replies; 46+ messages in thread
From: Dan Streetman @ 2015-05-06 16:51 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras
  Cc: Dan Streetman, Seth Jennings, Robert Jennings, linux-crypto,
	linuxppc-dev, linux-kernel

Add driver for NX-842 hardware on the PowerNV platform.

This allows the use of the 842 compression hardware coprocessor on
the PowerNV platform.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
---
 drivers/crypto/nx/Kconfig          |  10 +
 drivers/crypto/nx/Makefile         |   2 +
 drivers/crypto/nx/nx-842-powernv.c | 625 +++++++++++++++++++++++++++++++++++++
 drivers/crypto/nx/nx-842-pseries.c |   9 -
 drivers/crypto/nx/nx-842.c         |   4 +-
 drivers/crypto/nx/nx-842.h         |  97 ++++++
 include/linux/nx842.h              |   6 +-
 7 files changed, 741 insertions(+), 12 deletions(-)
 create mode 100644 drivers/crypto/nx/nx-842-powernv.c

diff --git a/drivers/crypto/nx/Kconfig b/drivers/crypto/nx/Kconfig
index 34013f7..ee9e259 100644
--- a/drivers/crypto/nx/Kconfig
+++ b/drivers/crypto/nx/Kconfig
@@ -40,4 +40,14 @@ config CRYPTO_DEV_NX_COMPRESS_PSERIES
 	  algorithm.  This supports NX hardware on the pSeries platform.
 	  If you choose 'M' here, this module will be called nx_compress_pseries.
 
+config CRYPTO_DEV_NX_COMPRESS_POWERNV
+	tristate "Compression acceleration support on PowerNV platform"
+	depends on PPC_POWERNV
+	default y
+	help
+	  Support for PowerPC Nest (NX) compression acceleration. This
+	  module supports acceleration for compressing memory with the 842
+	  algorithm.  This supports NX hardware on the PowerNV platform.
+	  If you choose 'M' here, this module will be called nx_compress_powernv.
+
 endif
diff --git a/drivers/crypto/nx/Makefile b/drivers/crypto/nx/Makefile
index 5d9f4bc..6619787 100644
--- a/drivers/crypto/nx/Makefile
+++ b/drivers/crypto/nx/Makefile
@@ -12,5 +12,7 @@ nx-crypto-objs := nx.o \
 
 obj-$(CONFIG_CRYPTO_DEV_NX_COMPRESS) += nx-compress.o
 obj-$(CONFIG_CRYPTO_DEV_NX_COMPRESS_PSERIES) += nx-compress-pseries.o
+obj-$(CONFIG_CRYPTO_DEV_NX_COMPRESS_POWERNV) += nx-compress-powernv.o
 nx-compress-objs := nx-842.o
 nx-compress-pseries-objs := nx-842-pseries.o
+nx-compress-powernv-objs := nx-842-powernv.o
diff --git a/drivers/crypto/nx/nx-842-powernv.c b/drivers/crypto/nx/nx-842-powernv.c
new file mode 100644
index 0000000..6a9fb8b
--- /dev/null
+++ b/drivers/crypto/nx/nx-842-powernv.c
@@ -0,0 +1,625 @@
+/*
+ * Driver for IBM PowerNV 842 compression accelerator
+ *
+ * Copyright (C) 2015 Dan Streetman, IBM Corp
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include "nx-842.h"
+
+#include <linux/timer.h>
+
+#include <asm/prom.h>
+#include <asm/icswx.h>
+
+#define MODULE_NAME NX842_POWERNV_MODULE_NAME
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Dan Streetman <ddstreet@ieee.org>");
+MODULE_DESCRIPTION("842 H/W Compression driver for IBM PowerNV processors");
+
+#define WORKMEM_ALIGN	(CRB_ALIGN)
+#define CSB_WAIT_MAX	(5000) /* ms */
+
+struct nx842_workmem {
+	/* Below fields must be properly aligned */
+	struct coprocessor_request_block crb; /* CRB_ALIGN align */
+	struct data_descriptor_entry ddl_in[DDL_LEN_MAX]; /* DDE_ALIGN align */
+	struct data_descriptor_entry ddl_out[DDL_LEN_MAX]; /* DDE_ALIGN align */
+	/* Above fields must be properly aligned */
+
+	ktime_t start;
+
+	char padding[WORKMEM_ALIGN]; /* unused, to allow alignment */
+} __packed __aligned(WORKMEM_ALIGN);
+
+struct nx842_coproc {
+	unsigned int chip_id;
+	unsigned int ct;
+	unsigned int ci;
+	struct list_head list;
+};
+
+/* no cpu hotplug on powernv, so this list never changes after init */
+static LIST_HEAD(nx842_coprocs);
+static unsigned int nx842_ct;
+
+/**
+ * setup_indirect_dde - Setup an indirect DDE
+ *
+ * The DDE is setup with the the DDE count, byte count, and address of
+ * first direct DDE in the list.
+ */
+static void setup_indirect_dde(struct data_descriptor_entry *dde,
+			       struct data_descriptor_entry *ddl,
+			       unsigned int dde_count, unsigned int byte_count)
+{
+	dde->flags = 0;
+	dde->count = dde_count;
+	dde->index = 0;
+	dde->length = cpu_to_be32(byte_count);
+	dde->address = cpu_to_be64(nx842_get_pa(ddl));
+}
+
+/**
+ * setup_direct_dde - Setup single DDE from buffer
+ *
+ * The DDE is setup with the buffer and length.  The buffer must be properly
+ * aligned.  The used length is returned.
+ * Returns:
+ *   N    Successfully set up DDE with N bytes
+ */
+static unsigned int setup_direct_dde(struct data_descriptor_entry *dde,
+				     unsigned long pa, unsigned int len)
+{
+	unsigned int l = min_t(unsigned int, len, LEN_ON_PAGE(pa));
+
+	dde->flags = 0;
+	dde->count = 0;
+	dde->index = 0;
+	dde->length = cpu_to_be32(l);
+	dde->address = cpu_to_be64(pa);
+
+	return l;
+}
+
+/**
+ * setup_ddl - Setup DDL from buffer
+ *
+ * Returns:
+ *   0		Successfully set up DDL
+ */
+static int setup_ddl(struct data_descriptor_entry *dde,
+		     struct data_descriptor_entry *ddl,
+		     unsigned char *buf, unsigned int len,
+		     bool in)
+{
+	unsigned long pa = nx842_get_pa(buf);
+	int i, ret, total_len = len;
+
+	if (!IS_ALIGNED(pa, DDE_BUFFER_ALIGN)) {
+		pr_debug("%s buffer pa 0x%lx not 0x%x-byte aligned\n",
+			 in ? "input" : "output", pa, DDE_BUFFER_ALIGN);
+		return -EINVAL;
+	}
+
+	/* only need to check last mult; since buffer must be
+	 * DDE_BUFFER_ALIGN aligned, and that is a multiple of
+	 * DDE_BUFFER_SIZE_MULT, and pre-last page DDE buffers
+	 * are guaranteed a multiple of DDE_BUFFER_SIZE_MULT.
+	 */
+	if (len % DDE_BUFFER_LAST_MULT) {
+		pr_debug("%s buffer len 0x%x not a multiple of 0x%x\n",
+			 in ? "input" : "output", len, DDE_BUFFER_LAST_MULT);
+		if (in)
+			return -EINVAL;
+		len = round_down(len, DDE_BUFFER_LAST_MULT);
+	}
+
+	/* use a single direct DDE */
+	if (len <= LEN_ON_PAGE(pa)) {
+		ret = setup_direct_dde(dde, pa, len);
+		WARN_ON(ret < len);
+		return 0;
+	}
+
+	/* use the DDL */
+	for (i = 0; i < DDL_LEN_MAX && len > 0; i++) {
+		ret = setup_direct_dde(&ddl[i], pa, len);
+		buf += ret;
+		len -= ret;
+		pa = nx842_get_pa(buf);
+	}
+
+	if (len > 0) {
+		pr_debug("0x%x total %s bytes 0x%x too many for DDL.\n",
+			 total_len, in ? "input" : "output", len);
+		if (in)
+			return -EMSGSIZE;
+		total_len -= len;
+	}
+	setup_indirect_dde(dde, ddl, i, total_len);
+
+	return 0;
+}
+
+#define CSB_ERR(csb, msg, ...)					\
+	pr_err("ERROR: " msg " : %02x %02x %02x %02x %08x\n",	\
+	       ##__VA_ARGS__, (csb)->flags,			\
+	       (csb)->cs, (csb)->cc, (csb)->ce,			\
+	       be32_to_cpu((csb)->count))
+
+#define CSB_ERR_ADDR(csb, msg, ...)				\
+	CSB_ERR(csb, msg " at %lx", ##__VA_ARGS__,		\
+		(unsigned long)be64_to_cpu((csb)->address))
+
+/**
+ * wait_for_csb
+ */
+static int wait_for_csb(struct nx842_workmem *wmem,
+			struct coprocessor_status_block *csb)
+{
+	ktime_t start = wmem->start, now = ktime_get();
+	ktime_t timeout = ktime_add_ms(start, CSB_WAIT_MAX);
+
+	while (!(ACCESS_ONCE(csb->flags) & CSB_V)) {
+		cpu_relax();
+		now = ktime_get();
+		if (ktime_after(now, timeout))
+			break;
+	}
+
+	/* hw has updated csb and output buffer */
+	barrier();
+
+	/* check CSB flags */
+	if (!(csb->flags & CSB_V)) {
+		CSB_ERR(csb, "CSB still not valid after %ld us, giving up",
+			(long)ktime_us_delta(now, start));
+		return -ETIMEDOUT;
+	}
+	if (csb->flags & CSB_F) {
+		CSB_ERR(csb, "Invalid CSB format");
+		return -EPROTO;
+	}
+	if (csb->flags & CSB_CH) {
+		CSB_ERR(csb, "Invalid CSB chaining state");
+		return -EPROTO;
+	}
+
+	/* verify CSB completion sequence is 0 */
+	if (csb->cs) {
+		CSB_ERR(csb, "Invalid CSB completion sequence");
+		return -EPROTO;
+	}
+
+	/* check CSB Completion Code */
+	switch (csb->cc) {
+	/* no error */
+	case CSB_CC_SUCCESS:
+		break;
+	case CSB_CC_TPBC_GT_SPBC:
+		/* not an error, but the compressed data is
+		 * larger than the uncompressed data :(
+		 */
+		break;
+
+	/* input data errors */
+	case CSB_CC_OPERAND_OVERLAP:
+		/* input and output buffers overlap */
+		CSB_ERR(csb, "Operand Overlap error");
+		return -EINVAL;
+	case CSB_CC_INVALID_OPERAND:
+		CSB_ERR(csb, "Invalid operand");
+		return -EINVAL;
+	case CSB_CC_NOSPC:
+		/* output buffer too small */
+		return -ENOSPC;
+	case CSB_CC_ABORT:
+		CSB_ERR(csb, "Function aborted");
+		return -EINTR;
+	case CSB_CC_CRC_MISMATCH:
+		CSB_ERR(csb, "CRC mismatch");
+		return -EINVAL;
+	case CSB_CC_TEMPL_INVALID:
+		CSB_ERR(csb, "Compressed data template invalid");
+		return -EINVAL;
+	case CSB_CC_TEMPL_OVERFLOW:
+		CSB_ERR(csb, "Compressed data template shows data past end");
+		return -EINVAL;
+
+	/* these should not happen */
+	case CSB_CC_INVALID_ALIGN:
+		/* setup_ddl should have detected this */
+		CSB_ERR_ADDR(csb, "Invalid alignment");
+		return -EINVAL;
+	case CSB_CC_DATA_LENGTH:
+		/* setup_ddl should have detected this */
+		CSB_ERR(csb, "Invalid data length");
+		return -EINVAL;
+	case CSB_CC_WR_TRANSLATION:
+	case CSB_CC_TRANSLATION:
+	case CSB_CC_TRANSLATION_DUP1:
+	case CSB_CC_TRANSLATION_DUP2:
+	case CSB_CC_TRANSLATION_DUP3:
+	case CSB_CC_TRANSLATION_DUP4:
+	case CSB_CC_TRANSLATION_DUP5:
+	case CSB_CC_TRANSLATION_DUP6:
+		/* should not happen, we use physical addrs */
+		CSB_ERR_ADDR(csb, "Translation error");
+		return -EPROTO;
+	case CSB_CC_WR_PROTECTION:
+	case CSB_CC_PROTECTION:
+	case CSB_CC_PROTECTION_DUP1:
+	case CSB_CC_PROTECTION_DUP2:
+	case CSB_CC_PROTECTION_DUP3:
+	case CSB_CC_PROTECTION_DUP4:
+	case CSB_CC_PROTECTION_DUP5:
+	case CSB_CC_PROTECTION_DUP6:
+		/* should not happen, we use physical addrs */
+		CSB_ERR_ADDR(csb, "Protection error");
+		return -EPROTO;
+	case CSB_CC_PRIVILEGE:
+		/* shouldn't happen, we're in HYP mode */
+		CSB_ERR(csb, "Insufficient Privilege error");
+		return -EPROTO;
+	case CSB_CC_EXCESSIVE_DDE:
+		/* shouldn't happen, setup_ddl doesn't use many dde's */
+		CSB_ERR(csb, "Too many DDEs in DDL");
+		return -EINVAL;
+	case CSB_CC_TRANSPORT:
+		/* shouldn't happen, we setup CRB correctly */
+		CSB_ERR(csb, "Invalid CRB");
+		return -EINVAL;
+	case CSB_CC_SEGMENTED_DDL:
+		/* shouldn't happen, setup_ddl creates DDL right */
+		CSB_ERR(csb, "Segmented DDL error");
+		return -EINVAL;
+	case CSB_CC_DDE_OVERFLOW:
+		/* shouldn't happen, setup_ddl creates DDL right */
+		CSB_ERR(csb, "DDE overflow error");
+		return -EINVAL;
+	case CSB_CC_SESSION:
+		/* should not happen with ICSWX */
+		CSB_ERR(csb, "Session violation error");
+		return -EPROTO;
+	case CSB_CC_CHAIN:
+		/* should not happen, we don't use chained CRBs */
+		CSB_ERR(csb, "Chained CRB error");
+		return -EPROTO;
+	case CSB_CC_SEQUENCE:
+		/* should not happen, we don't use chained CRBs */
+		CSB_ERR(csb, "CRB seqeunce number error");
+		return -EPROTO;
+	case CSB_CC_UNKNOWN_CODE:
+		CSB_ERR(csb, "Unknown subfunction code");
+		return -EPROTO;
+
+	/* hardware errors */
+	case CSB_CC_RD_EXTERNAL:
+	case CSB_CC_RD_EXTERNAL_DUP1:
+	case CSB_CC_RD_EXTERNAL_DUP2:
+	case CSB_CC_RD_EXTERNAL_DUP3:
+		CSB_ERR_ADDR(csb, "Read error outside coprocessor");
+		return -EPROTO;
+	case CSB_CC_WR_EXTERNAL:
+		CSB_ERR_ADDR(csb, "Write error outside coprocessor");
+		return -EPROTO;
+	case CSB_CC_INTERNAL:
+		CSB_ERR(csb, "Internal error in coprocessor");
+		return -EPROTO;
+	case CSB_CC_PROVISION:
+		CSB_ERR(csb, "Storage provision error");
+		return -EPROTO;
+	case CSB_CC_HW:
+		CSB_ERR(csb, "Correctable hardware error");
+		return -EPROTO;
+
+	default:
+		CSB_ERR(csb, "Invalid CC %d", csb->cc);
+		return -EPROTO;
+	}
+
+	/* check Completion Extension state */
+	if (csb->ce & CSB_CE_TERMINATION) {
+		CSB_ERR(csb, "CSB request was terminated");
+		return -EPROTO;
+	}
+	if (csb->ce & CSB_CE_INCOMPLETE) {
+		CSB_ERR(csb, "CSB request not complete");
+		return -EPROTO;
+	}
+	if (!(csb->ce & CSB_CE_TPBC)) {
+		CSB_ERR(csb, "TPBC not provided, unknown target length");
+		return -EPROTO;
+	}
+
+	/* successful completion */
+	pr_debug_ratelimited("Processed %u bytes in %lu us\n", csb->count,
+			     (unsigned long)ktime_us_delta(now, start));
+
+	return 0;
+}
+
+/**
+ * nx842_powernv_function - compress/decompress data using the 842 algorithm
+ *
+ * (De)compression provided by the NX842 coprocessor on IBM PowerNV systems.
+ * This compresses or decompresses the provided input buffer into the provided
+ * output buffer.
+ *
+ * Upon return from this function @outlen contains the length of the
+ * output data.  If there is an error then @outlen will be 0 and an
+ * error will be specified by the return code from this function.
+ *
+ * The @workmem buffer should only be used by one function call at a time.
+ *
+ * @in: input buffer pointer
+ * @inlen: input buffer size
+ * @out: output buffer pointer
+ * @outlenp: output buffer size pointer
+ * @workmem: working memory buffer pointer, must be at least NX842_MEM_COMPRESS
+ * @fc: function code, see CCW Function Codes in nx-842.h
+ *
+ * Returns:
+ *   0		Success, output of length @outlenp stored in the buffer at @out
+ *   -ENODEV	Hardware unavailable
+ *   -ENOSPC	Output buffer is to small
+ *   -EMSGSIZE	Input buffer too large
+ *   -EINVAL	buffer constraints do not fix nx842_constraints
+ *   -EPROTO	hardware error during operation
+ *   -ETIMEDOUT	hardware did not complete operation in reasonable time
+ *   -EINTR	operation was aborted
+ */
+static int nx842_powernv_function(const unsigned char *in, unsigned int inlen,
+				  unsigned char *out, unsigned int *outlenp,
+				  void *workmem, int fc)
+{
+	struct coprocessor_request_block *crb;
+	struct coprocessor_status_block *csb;
+	struct nx842_workmem *wmem;
+	int ret;
+	u64 csb_addr;
+	u32 ccw;
+	unsigned int outlen = *outlenp;
+
+	wmem = PTR_ALIGN(workmem, WORKMEM_ALIGN);
+
+	*outlenp = 0;
+
+	/* shoudn't happen, we don't load without a coproc */
+	if (!nx842_ct) {
+		pr_err_ratelimited("coprocessor CT is 0");
+		return -ENODEV;
+	}
+
+	crb = &wmem->crb;
+	csb = &crb->csb;
+
+	/* Clear any previous values */
+	memset(crb, 0, sizeof(*crb));
+
+	/* set up DDLs */
+	ret = setup_ddl(&crb->source, wmem->ddl_in,
+			(unsigned char *)in, inlen, true);
+	if (ret)
+		return ret;
+	ret = setup_ddl(&crb->target, wmem->ddl_out,
+			out, outlen, false);
+	if (ret)
+		return ret;
+
+	/* set up CCW */
+	ccw = 0;
+	ccw = SET_FIELD(ccw, CCW_CT, nx842_ct);
+	ccw = SET_FIELD(ccw, CCW_CI_842, 0); /* use 0 for hw auto-selection */
+	ccw = SET_FIELD(ccw, CCW_FC_842, fc);
+
+	/* set up CRB's CSB addr */
+	csb_addr = nx842_get_pa(csb) & CRB_CSB_ADDRESS;
+	csb_addr |= CRB_CSB_AT; /* Addrs are phys */
+	crb->csb_addr = cpu_to_be64(csb_addr);
+
+	wmem->start = ktime_get();
+
+	/* do ICSWX */
+	ret = icswx(cpu_to_be32(ccw), crb);
+
+	pr_debug_ratelimited("icswx CR %x ccw %x crb->ccw %x\n", ret,
+			     (unsigned int)ccw,
+			     (unsigned int)be32_to_cpu(crb->ccw));
+
+	switch (ret) {
+	case ICSWX_INITIATED:
+		ret = wait_for_csb(wmem, csb);
+		break;
+	case ICSWX_BUSY:
+		pr_debug_ratelimited("842 Coprocessor busy\n");
+		ret = -EBUSY;
+		break;
+	case ICSWX_REJECTED:
+		pr_err_ratelimited("ICSWX rejected\n");
+		ret = -EPROTO;
+		break;
+	default:
+		pr_err_ratelimited("Invalid ICSWX return code %x\n", ret);
+		ret = -EPROTO;
+		break;
+	}
+
+	if (!ret)
+		*outlenp = be32_to_cpu(csb->count);
+
+	return ret;
+}
+
+/**
+ * nx842_powernv_compress - Compress data using the 842 algorithm
+ *
+ * Compression provided by the NX842 coprocessor on IBM PowerNV systems.
+ * The input buffer is compressed and the result is stored in the
+ * provided output buffer.
+ *
+ * Upon return from this function @outlen contains the length of the
+ * compressed data.  If there is an error then @outlen will be 0 and an
+ * error will be specified by the return code from this function.
+ *
+ * @in: input buffer pointer
+ * @inlen: input buffer size
+ * @out: output buffer pointer
+ * @outlenp: output buffer size pointer
+ * @workmem: working memory buffer pointer, must be at least NX842_MEM_COMPRESS
+ *
+ * Returns: see @nx842_powernv_function()
+ */
+static int nx842_powernv_compress(const unsigned char *in, unsigned int inlen,
+				  unsigned char *out, unsigned int *outlenp,
+				  void *wmem)
+{
+	return nx842_powernv_function(in, inlen, out, outlenp,
+				      wmem, CCW_FC_842_COMP_NOCRC);
+}
+
+/**
+ * nx842_powernv_decompress - Decompress data using the 842 algorithm
+ *
+ * Decompression provided by the NX842 coprocessor on IBM PowerNV systems.
+ * The input buffer is decompressed and the result is stored in the
+ * provided output buffer.
+ *
+ * Upon return from this function @outlen contains the length of the
+ * decompressed data.  If there is an error then @outlen will be 0 and an
+ * error will be specified by the return code from this function.
+ *
+ * @in: input buffer pointer
+ * @inlen: input buffer size
+ * @out: output buffer pointer
+ * @outlenp: output buffer size pointer
+ * @workmem: working memory buffer pointer, must be at least NX842_MEM_COMPRESS
+ *
+ * Returns: see @nx842_powernv_function()
+ */
+static int nx842_powernv_decompress(const unsigned char *in, unsigned int inlen,
+				    unsigned char *out, unsigned int *outlenp,
+				    void *wmem)
+{
+	return nx842_powernv_function(in, inlen, out, outlenp,
+				      wmem, CCW_FC_842_DECOMP_NOCRC);
+}
+
+static int __init nx842_powernv_probe(struct device_node *dn)
+{
+	struct nx842_coproc *coproc;
+	struct property *ct_prop, *ci_prop;
+	unsigned int ct, ci;
+	int chip_id;
+
+	chip_id = of_get_ibm_chip_id(dn);
+	if (chip_id < 0) {
+		pr_err("ibm,chip-id missing\n");
+		return -EINVAL;
+	}
+	ct_prop = of_find_property(dn, "ibm,842-coprocessor-type", NULL);
+	if (!ct_prop) {
+		pr_err("ibm,842-coprocessor-type missing\n");
+		return -EINVAL;
+	}
+	ct = be32_to_cpu(*(unsigned int *)ct_prop->value);
+	ci_prop = of_find_property(dn, "ibm,842-coprocessor-instance", NULL);
+	if (!ci_prop) {
+		pr_err("ibm,842-coprocessor-instance missing\n");
+		return -EINVAL;
+	}
+	ci = be32_to_cpu(*(unsigned int *)ci_prop->value);
+
+	coproc = kmalloc(sizeof(*coproc), GFP_KERNEL);
+	if (!coproc)
+		return -ENOMEM;
+
+	coproc->chip_id = chip_id;
+	coproc->ct = ct;
+	coproc->ci = ci;
+	INIT_LIST_HEAD(&coproc->list);
+	list_add(&coproc->list, &nx842_coprocs);
+
+	pr_info("coprocessor found on chip %d, CT %d CI %d\n", chip_id, ct, ci);
+
+	if (!nx842_ct)
+		nx842_ct = ct;
+	else if (nx842_ct != ct)
+		pr_err("NX842 chip %d, CT %d != first found CT %d\n",
+		       chip_id, ct, nx842_ct);
+
+	return 0;
+}
+
+static struct nx842_constraints nx842_powernv_constraints = {
+	.alignment =	DDE_BUFFER_ALIGN,
+	.multiple =	DDE_BUFFER_LAST_MULT,
+	.minimum =	DDE_BUFFER_LAST_MULT,
+	.maximum =	(DDL_LEN_MAX - 1) * PAGE_SIZE,
+};
+
+static struct nx842_driver nx842_powernv_driver = {
+	.owner =	THIS_MODULE,
+	.constraints =	&nx842_powernv_constraints,
+	.compress =	nx842_powernv_compress,
+	.decompress =	nx842_powernv_decompress,
+};
+
+static __init int nx842_powernv_init(void)
+{
+	struct device_node *dn;
+
+	/* verify workmem size/align restrictions */
+	BUILD_BUG_ON(sizeof(struct nx842_workmem) > NX842_MEM_COMPRESS);
+	BUILD_BUG_ON(WORKMEM_ALIGN % CRB_ALIGN);
+	BUILD_BUG_ON(CRB_ALIGN % DDE_ALIGN);
+	BUILD_BUG_ON(CRB_SIZE % DDE_ALIGN);
+	/* verify buffer size/align restrictions */
+	BUILD_BUG_ON(PAGE_SIZE % DDE_BUFFER_ALIGN);
+	BUILD_BUG_ON(DDE_BUFFER_ALIGN % DDE_BUFFER_SIZE_MULT);
+	BUILD_BUG_ON(DDE_BUFFER_SIZE_MULT % DDE_BUFFER_LAST_MULT);
+
+	pr_info("loading\n");
+
+	for_each_compatible_node(dn, NULL, NX842_POWERNV_COMPAT_NAME)
+		nx842_powernv_probe(dn);
+
+	if (!nx842_ct) {
+		pr_err("no coprocessors found\n");
+		return -ENODEV;
+	}
+
+	nx842_register_driver(&nx842_powernv_driver);
+
+	pr_info("loaded\n");
+
+	return 0;
+}
+module_init(nx842_powernv_init);
+
+static void __exit nx842_powernv_exit(void)
+{
+	struct nx842_coproc *coproc, *n;
+
+	nx842_unregister_driver(&nx842_powernv_driver);
+
+	list_for_each_entry_safe(coproc, n, &nx842_coprocs, list) {
+		list_del(&coproc->list);
+		kfree(coproc);
+	}
+
+	pr_info("unloaded\n");
+}
+module_exit(nx842_powernv_exit);
diff --git a/drivers/crypto/nx/nx-842-pseries.c b/drivers/crypto/nx/nx-842-pseries.c
index cb481d8..6db9992 100644
--- a/drivers/crypto/nx/nx-842-pseries.c
+++ b/drivers/crypto/nx/nx-842-pseries.c
@@ -160,15 +160,6 @@ static inline unsigned long nx842_get_scatterlist_size(
 	return sl->entry_nr * sizeof(struct nx842_slentry);
 }
 
-static inline unsigned long nx842_get_pa(void *addr)
-{
-	if (is_vmalloc_addr(addr))
-		return page_to_phys(vmalloc_to_page(addr))
-		       + offset_in_page(addr);
-	else
-		return __pa(addr);
-}
-
 static int nx842_build_scatterlist(unsigned long buf, int len,
 			struct nx842_scatterlist *sl)
 {
diff --git a/drivers/crypto/nx/nx-842.c b/drivers/crypto/nx/nx-842.c
index 160fe2d..bf2823c 100644
--- a/drivers/crypto/nx/nx-842.c
+++ b/drivers/crypto/nx/nx-842.c
@@ -164,7 +164,9 @@ static __init int nx842_init(void)
 {
 	pr_info("loading\n");
 
-	if (of_find_compatible_node(NULL, NULL, NX842_PSERIES_COMPAT_NAME))
+	if (of_find_compatible_node(NULL, NULL, NX842_POWERNV_COMPAT_NAME))
+		request_module_nowait(NX842_POWERNV_MODULE_NAME);
+	else if (of_find_compatible_node(NULL, NULL, NX842_PSERIES_COMPAT_NAME))
 		request_module_nowait(NX842_PSERIES_MODULE_NAME);
 	else
 		pr_err("no nx842 driver found.\n");
diff --git a/drivers/crypto/nx/nx-842.h b/drivers/crypto/nx/nx-842.h
index c6ceb0f..84b15b7 100644
--- a/drivers/crypto/nx/nx-842.h
+++ b/drivers/crypto/nx/nx-842.h
@@ -5,9 +5,104 @@
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/nx842.h>
+#include <linux/sw842.h>
 #include <linux/of.h>
 #include <linux/slab.h>
 #include <linux/io.h>
+#include <linux/mm.h>
+#include <linux/ratelimit.h>
+
+/* Restrictions on Data Descriptor List (DDL) and Entry (DDE) buffers
+ *
+ * From NX P8 workbook, sec 4.9.1 "842 details"
+ *   Each DDE buffer is 128 byte aligned
+ *   Each DDE buffer size is a multiple of 32 bytes (except the last)
+ *   The last DDE buffer size is a multiple of 8 bytes
+ */
+#define DDE_BUFFER_ALIGN	(128)
+#define DDE_BUFFER_SIZE_MULT	(32)
+#define DDE_BUFFER_LAST_MULT	(8)
+
+/* Arbitrary DDL length limit
+ * Allows max buffer size of MAX-1 to MAX pages
+ * (depending on alignment)
+ */
+#define DDL_LEN_MAX		(17)
+
+/* CCW 842 CI/FC masks
+ * NX P8 workbook, section 4.3.1, figure 4-6
+ * "CI/FC Boundary by NX CT type"
+ */
+#define CCW_CI_842		(0x00003ff8)
+#define CCW_FC_842		(0x00000007)
+
+/* CCW Function Codes (FC) for 842
+ * NX P8 workbook, section 4.9, table 4-28
+ * "Function Code Definitions for 842 Memory Compression"
+ */
+#define CCW_FC_842_COMP_NOCRC	(0)
+#define CCW_FC_842_COMP_CRC	(1)
+#define CCW_FC_842_DECOMP_NOCRC	(2)
+#define CCW_FC_842_DECOMP_CRC	(3)
+#define CCW_FC_842_MOVE		(4)
+
+/* CSB CC Error Types for 842
+ * NX P8 workbook, section 4.10.3, table 4-30
+ * "Reported Error Types Summary Table"
+ */
+/* These are all duplicates of existing codes defined in icswx.h. */
+#define CSB_CC_TRANSLATION_DUP1	(80)
+#define CSB_CC_TRANSLATION_DUP2	(82)
+#define CSB_CC_TRANSLATION_DUP3	(84)
+#define CSB_CC_TRANSLATION_DUP4	(86)
+#define CSB_CC_TRANSLATION_DUP5	(92)
+#define CSB_CC_TRANSLATION_DUP6	(94)
+#define CSB_CC_PROTECTION_DUP1	(81)
+#define CSB_CC_PROTECTION_DUP2	(83)
+#define CSB_CC_PROTECTION_DUP3	(85)
+#define CSB_CC_PROTECTION_DUP4	(87)
+#define CSB_CC_PROTECTION_DUP5	(93)
+#define CSB_CC_PROTECTION_DUP6	(95)
+#define CSB_CC_RD_EXTERNAL_DUP1	(89)
+#define CSB_CC_RD_EXTERNAL_DUP2	(90)
+#define CSB_CC_RD_EXTERNAL_DUP3	(91)
+/* These are specific to NX */
+/* 842 codes */
+#define CSB_CC_TPBC_GT_SPBC	(64) /* no error, but >1 comp ratio */
+#define CSB_CC_CRC_MISMATCH	(65) /* decomp crc mismatch */
+#define CSB_CC_TEMPL_INVALID	(66) /* decomp invalid template value */
+#define CSB_CC_TEMPL_OVERFLOW	(67) /* decomp template shows data after end */
+/* sym crypt codes */
+#define CSB_CC_DECRYPT_OVERFLOW	(64)
+/* asym crypt codes */
+#define CSB_CC_MINV_OVERFLOW	(128)
+/* These are reserved for hypervisor use */
+#define CSB_CC_HYP_RESERVE_START	(240)
+#define CSB_CC_HYP_RESERVE_END		(253)
+#define CSB_CC_HYP_NO_HW		(254)
+#define CSB_CC_HYP_HANG_ABORTED		(255)
+
+/* CCB Completion Modes (CM) for 842
+ * NX P8 workbook, section 4.3, figure 4-5
+ * "CRB Details - Normal Cop_Req (CL=00, C=1)"
+ */
+#define CCB_CM_EXTRA_WRITE	(CCB_CM0_ALL_COMPLETIONS & CCB_CM12_STORE)
+#define CCB_CM_INTERRUPT	(CCB_CM0_ALL_COMPLETIONS & CCB_CM12_INTERRUPT)
+
+#define LEN_ON_PAGE(pa)		(PAGE_SIZE - ((pa) & ~PAGE_MASK))
+
+static inline unsigned long nx842_get_pa(void *addr)
+{
+	if (!is_vmalloc_addr(addr))
+		return __pa(addr);
+
+	return page_to_phys(vmalloc_to_page(addr)) + offset_in_page(addr);
+}
+
+/* Get/Set bit fields */
+#define MASK_LSH(m)		(__builtin_ffsl(m) - 1)
+#define GET_FIELD(v, m)		(((v) & (m)) >> MASK_LSH(m))
+#define SET_FIELD(v, m, val)	(((v) & ~(m)) | (((val) << MASK_LSH(m)) & (m)))
 
 struct nx842_driver {
 	struct module *owner;
@@ -27,6 +122,8 @@ void nx842_unregister_driver(struct nx842_driver *driver);
 
 
 /* To allow the main nx-compress module to load platform module */
+#define NX842_POWERNV_MODULE_NAME	"nx-compress-powernv"
+#define NX842_POWERNV_COMPAT_NAME	"ibm,power-nx"
 #define NX842_PSERIES_MODULE_NAME	"nx-compress-pseries"
 #define NX842_PSERIES_COMPAT_NAME	"ibm,compression"
 
diff --git a/include/linux/nx842.h b/include/linux/nx842.h
index aa1a97e9..4ddf68d 100644
--- a/include/linux/nx842.h
+++ b/include/linux/nx842.h
@@ -1,9 +1,11 @@
 #ifndef __NX842_H__
 #define __NX842_H__
 
-#define __NX842_PSERIES_MEM_COMPRESS	((PAGE_SIZE * 2) + 10240)
+#define __NX842_PSERIES_MEM_COMPRESS	(10240)
+#define __NX842_POWERNV_MEM_COMPRESS	(1024)
 
-#define NX842_MEM_COMPRESS	__NX842_PSERIES_MEM_COMPRESS
+#define NX842_MEM_COMPRESS	(max_t(unsigned int,			\
+	__NX842_PSERIES_MEM_COMPRESS, __NX842_POWERNV_MEM_COMPRESS))
 
 struct nx842_constraints {
 	int alignment;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 09/10] drivers/crypto/nx: simplify pSeries nx842 driver
  2015-05-06 16:50 ` [PATCHv2 00/10] add 842 hw compression for PowerNV platform Dan Streetman
                     ` (7 preceding siblings ...)
  2015-05-06 16:51   ` [PATCH 08/10] drivers/crypto/nx: add PowerNV platform NX-842 driver Dan Streetman
@ 2015-05-06 16:51   ` Dan Streetman
  2015-05-06 16:51   ` [PATCH 10/10] drivers/crypto/nx: add hardware 842 crypto comp alg Dan Streetman
  2015-05-07 17:49   ` [PATCHv3 00/10] add 842 hw compression for PowerNV platform Dan Streetman
  10 siblings, 0 replies; 46+ messages in thread
From: Dan Streetman @ 2015-05-06 16:51 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras
  Cc: Dan Streetman, Seth Jennings, Robert Jennings, linux-crypto,
	linuxppc-dev, linux-kernel

Simplify the pSeries NX-842 driver: do not expect incoming buffers to be
exactly page-sized; do not break up input buffers to compress smaller
blocks; do not use any internal headers in the compressed data blocks;
remove the software decompression implementation; implement the pSeries
nx842_constraints.

This changes the pSeries NX-842 driver to perform constraints-based
compression so that it only needs to compress one entire input block at a
time.  This removes the need for it to split input data blocks into
multiple compressed data sections in the output buffer, and removes the
need for any extra header info in the compressed data; all that is moved
(in a later patch) into the main crypto 842 driver.  Additionally, the
842 software decompression implementation is no longer needed here, as
the crypto 842 driver will use the generic software 842 decompression
function as a fallback if any hardware 842 driver fails.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
---
 drivers/crypto/nx/nx-842-pseries.c | 779 ++++++++-----------------------------
 1 file changed, 153 insertions(+), 626 deletions(-)

diff --git a/drivers/crypto/nx/nx-842-pseries.c b/drivers/crypto/nx/nx-842-pseries.c
index 6db9992..85837e9 100644
--- a/drivers/crypto/nx/nx-842-pseries.c
+++ b/drivers/crypto/nx/nx-842-pseries.c
@@ -21,7 +21,6 @@
  *          Seth Jennings <sjenning@linux.vnet.ibm.com>
  */
 
-#include <asm/page.h>
 #include <asm/vio.h>
 
 #include "nx-842.h"
@@ -32,11 +31,6 @@ MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Robert Jennings <rcj@linux.vnet.ibm.com>");
 MODULE_DESCRIPTION("842 H/W Compression driver for IBM Power processors");
 
-#define SHIFT_4K 12
-#define SHIFT_64K 16
-#define SIZE_4K (1UL << SHIFT_4K)
-#define SIZE_64K (1UL << SHIFT_64K)
-
 /* IO buffer must be 128 byte aligned */
 #define IO_BUFFER_ALIGN 128
 
@@ -47,18 +41,52 @@ static struct nx842_constraints nx842_pseries_constraints = {
 	.maximum =	PAGE_SIZE, /* dynamic, max_sync_size */
 };
 
-struct nx842_header {
-	int blocks_nr; /* number of compressed blocks */
-	int offset; /* offset of the first block (from beginning of header) */
-	int sizes[0]; /* size of compressed blocks */
-};
-
-static inline int nx842_header_size(const struct nx842_header *hdr)
+static int check_constraints(unsigned long buf, unsigned int *len, bool in)
 {
-	return sizeof(struct nx842_header) +
-			hdr->blocks_nr * sizeof(hdr->sizes[0]);
+	if (!IS_ALIGNED(buf, nx842_pseries_constraints.alignment)) {
+		pr_debug("%s buffer 0x%lx not aligned to 0x%x\n",
+			 in ? "input" : "output", buf,
+			 nx842_pseries_constraints.alignment);
+		return -EINVAL;
+	}
+	if (*len % nx842_pseries_constraints.multiple) {
+		pr_debug("%s buffer len 0x%x not multiple of 0x%x\n",
+			 in ? "input" : "output", *len,
+			 nx842_pseries_constraints.multiple);
+		if (in)
+			return -EINVAL;
+		*len = round_down(*len, nx842_pseries_constraints.multiple);
+	}
+	if (*len < nx842_pseries_constraints.minimum) {
+		pr_debug("%s buffer len 0x%x under minimum 0x%x\n",
+			 in ? "input" : "output", *len,
+			 nx842_pseries_constraints.minimum);
+		return -EINVAL;
+	}
+	if (*len > nx842_pseries_constraints.maximum) {
+		pr_debug("%s buffer len 0x%x over maximum 0x%x\n",
+			 in ? "input" : "output", *len,
+			 nx842_pseries_constraints.maximum);
+		if (in)
+			return -EINVAL;
+		*len = nx842_pseries_constraints.maximum;
+	}
+	return 0;
 }
 
+/* I assume we need to align the CSB? */
+#define WORKMEM_ALIGN	(256)
+
+struct nx842_workmem {
+	/* scatterlist */
+	char slin[4096];
+	char slout[4096];
+	/* coprocessor status/parameter block */
+	struct nx_csbcpb csbcpb;
+
+	char padding[WORKMEM_ALIGN];
+} __aligned(WORKMEM_ALIGN);
+
 /* Macros for fields within nx_csbcpb */
 /* Check the valid bit within the csbcpb valid field */
 #define NX842_CSBCBP_VALID_CHK(x) (x & BIT_MASK(7))
@@ -72,8 +100,7 @@ static inline int nx842_header_size(const struct nx842_header *hdr)
 #define NX842_CSBCPB_CE2(x)	(x & BIT_MASK(5))
 
 /* The NX unit accepts data only on 4K page boundaries */
-#define NX842_HW_PAGE_SHIFT	SHIFT_4K
-#define NX842_HW_PAGE_SIZE	(ASM_CONST(1) << NX842_HW_PAGE_SHIFT)
+#define NX842_HW_PAGE_SIZE	(4096)
 #define NX842_HW_PAGE_MASK	(~(NX842_HW_PAGE_SIZE-1))
 
 enum nx842_status {
@@ -194,41 +221,6 @@ static int nx842_build_scatterlist(unsigned long buf, int len,
 	return 0;
 }
 
-/*
- * Working memory for software decompression
- */
-struct sw842_fifo {
-	union {
-		char f8[256][8];
-		char f4[512][4];
-	};
-	char f2[256][2];
-	unsigned char f84_full;
-	unsigned char f2_full;
-	unsigned char f8_count;
-	unsigned char f2_count;
-	unsigned int f4_count;
-};
-
-/*
- * Working memory for crypto API
- */
-struct nx842_workmem {
-	char bounce[PAGE_SIZE]; /* bounce buffer for decompression input */
-	union {
-		/* hardware working memory */
-		struct {
-			/* scatterlist */
-			char slin[SIZE_4K];
-			char slout[SIZE_4K];
-			/* coprocessor status/parameter block */
-			struct nx_csbcpb csbcpb;
-		};
-		/* software working memory */
-		struct sw842_fifo swfifo; /* software decompression fifo */
-	};
-};
-
 static int nx842_validate_result(struct device *dev,
 	struct cop_status_block *csb)
 {
@@ -291,8 +283,8 @@ static int nx842_validate_result(struct device *dev,
  * compressed data.  If there is an error then @outlen will be 0 and an
  * error will be specified by the return code from this function.
  *
- * @in: Pointer to input buffer, must be page aligned
- * @inlen: Length of input buffer, must be PAGE_SIZE
+ * @in: Pointer to input buffer
+ * @inlen: Length of input buffer
  * @out: Pointer to output buffer
  * @outlen: Length of output buffer
  * @wrkmem: ptr to buffer for working memory, size determined by
@@ -302,7 +294,6 @@ static int nx842_validate_result(struct device *dev,
  *   0		Success, output of length @outlen stored in the buffer at @out
  *   -ENOMEM	Unable to allocate internal buffers
  *   -ENOSPC	Output buffer is to small
- *   -EMSGSIZE	XXX Difficult to describe this limitation
  *   -EIO	Internal error
  *   -ENODEV	Hardware unavailable
  */
@@ -310,29 +301,26 @@ static int nx842_pseries_compress(const unsigned char *in, unsigned int inlen,
 				  unsigned char *out, unsigned int *outlen,
 				  void *wmem)
 {
-	struct nx842_header *hdr;
 	struct nx842_devdata *local_devdata;
 	struct device *dev = NULL;
 	struct nx842_workmem *workmem;
 	struct nx842_scatterlist slin, slout;
 	struct nx_csbcpb *csbcpb;
-	int ret = 0, max_sync_size, i, bytesleft, size, hdrsize;
-	unsigned long inbuf, outbuf, padding;
+	int ret = 0, max_sync_size;
+	unsigned long inbuf, outbuf;
 	struct vio_pfo_op op = {
 		.done = NULL,
 		.handle = 0,
 		.timeout = 0,
 	};
-	unsigned long start_time = get_tb();
+	unsigned long start = get_tb();
 
-	/*
-	 * Make sure input buffer is 64k page aligned.  This is assumed since
-	 * this driver is designed for page compression only (for now).  This
-	 * is very nice since we can now use direct DDE(s) for the input and
-	 * the alignment is guaranteed.
-	*/
 	inbuf = (unsigned long)in;
-	if (!IS_ALIGNED(inbuf, PAGE_SIZE) || inlen != PAGE_SIZE)
+	if (check_constraints(inbuf, &inlen, true))
+		return -EINVAL;
+
+	outbuf = (unsigned long)out;
+	if (check_constraints(outbuf, outlen, false))
 		return -EINVAL;
 
 	rcu_read_lock();
@@ -344,16 +332,8 @@ static int nx842_pseries_compress(const unsigned char *in, unsigned int inlen,
 	max_sync_size = local_devdata->max_sync_size;
 	dev = local_devdata->dev;
 
-	/* Create the header */
-	hdr = (struct nx842_header *)out;
-	hdr->blocks_nr = PAGE_SIZE / max_sync_size;
-	hdrsize = nx842_header_size(hdr);
-	outbuf = (unsigned long)out + hdrsize;
-	bytesleft = *outlen - hdrsize;
-
 	/* Init scatterlist */
-	workmem = (struct nx842_workmem *)ALIGN((unsigned long)wmem,
-		NX842_HW_PAGE_SIZE);
+	workmem = PTR_ALIGN(wmem, WORKMEM_ALIGN);
 	slin.entries = (struct nx842_slentry *)workmem->slin;
 	slout.entries = (struct nx842_slentry *)workmem->slout;
 
@@ -364,105 +344,48 @@ static int nx842_pseries_compress(const unsigned char *in, unsigned int inlen,
 	op.csbcpb = nx842_get_pa(csbcpb);
 	op.out = nx842_get_pa(slout.entries);
 
-	for (i = 0; i < hdr->blocks_nr; i++) {
-		/*
-		 * Aligning the output blocks to 128 bytes does waste space,
-		 * but it prevents the need for bounce buffers and memory
-		 * copies.  It also simplifies the code a lot.  In the worst
-		 * case (64k page, 4k max_sync_size), you lose up to
-		 * (128*16)/64k = ~3% the compression factor. For 64k
-		 * max_sync_size, the loss would be at most 128/64k = ~0.2%.
-		 */
-		padding = ALIGN(outbuf, IO_BUFFER_ALIGN) - outbuf;
-		outbuf += padding;
-		bytesleft -= padding;
-		if (i == 0)
-			/* save offset into first block in header */
-			hdr->offset = padding + hdrsize;
-
-		if (bytesleft <= 0) {
-			ret = -ENOSPC;
-			goto unlock;
-		}
-
-		/*
-		 * NOTE: If the default max_sync_size is changed from 4k
-		 * to 64k, remove the "likely" case below, since a
-		 * scatterlist will always be needed.
-		 */
-		if (likely(max_sync_size == NX842_HW_PAGE_SIZE)) {
-			/* Create direct DDE */
-			op.in = nx842_get_pa((void *)inbuf);
-			op.inlen = max_sync_size;
-
-		} else {
-			/* Create indirect DDE (scatterlist) */
-			nx842_build_scatterlist(inbuf, max_sync_size, &slin);
-			op.in = nx842_get_pa(slin.entries);
-			op.inlen = -nx842_get_scatterlist_size(&slin);
-		}
+	if ((inbuf & NX842_HW_PAGE_MASK) ==
+	    ((inbuf + inlen - 1) & NX842_HW_PAGE_MASK)) {
+		/* Create direct DDE */
+		op.in = nx842_get_pa((void *)inbuf);
+		op.inlen = inlen;
+	} else {
+		/* Create indirect DDE (scatterlist) */
+		nx842_build_scatterlist(inbuf, inlen, &slin);
+		op.in = nx842_get_pa(slin.entries);
+		op.inlen = -nx842_get_scatterlist_size(&slin);
+	}
 
-		/*
-		 * If max_sync_size != NX842_HW_PAGE_SIZE, an indirect
-		 * DDE is required for the outbuf.
-		 * If max_sync_size == NX842_HW_PAGE_SIZE, outbuf must
-		 * also be page aligned (1 in 128/4k=32 chance) in order
-		 * to use a direct DDE.
-		 * This is unlikely, just use an indirect DDE always.
-		 */
-		nx842_build_scatterlist(outbuf,
-			min(bytesleft, max_sync_size), &slout);
-		/* op.out set before loop */
+	if ((outbuf & NX842_HW_PAGE_MASK) ==
+	    ((outbuf + *outlen - 1) & NX842_HW_PAGE_MASK)) {
+		/* Create direct DDE */
+		op.out = nx842_get_pa((void *)outbuf);
+		op.outlen = *outlen;
+	} else {
+		/* Create indirect DDE (scatterlist) */
+		nx842_build_scatterlist(outbuf, *outlen, &slout);
+		op.out = nx842_get_pa(slout.entries);
 		op.outlen = -nx842_get_scatterlist_size(&slout);
+	}
 
-		/* Send request to pHyp */
-		ret = vio_h_cop_sync(local_devdata->vdev, &op);
-
-		/* Check for pHyp error */
-		if (ret) {
-			dev_dbg(dev, "%s: vio_h_cop_sync error (ret=%d, hret=%ld)\n",
-				__func__, ret, op.hcall_err);
-			ret = -EIO;
-			goto unlock;
-		}
-
-		/* Check for hardware error */
-		ret = nx842_validate_result(dev, &csbcpb->csb);
-		if (ret && ret != -ENOSPC)
-			goto unlock;
-
-		/* Handle incompressible data */
-		if (unlikely(ret == -ENOSPC)) {
-			if (bytesleft < max_sync_size) {
-				/*
-				 * Not enough space left in the output buffer
-				 * to store uncompressed block
-				 */
-				goto unlock;
-			} else {
-				/* Store incompressible block */
-				memcpy((void *)outbuf, (void *)inbuf,
-					max_sync_size);
-				hdr->sizes[i] = -max_sync_size;
-				outbuf += max_sync_size;
-				bytesleft -= max_sync_size;
-				/* Reset ret, incompressible data handled */
-				ret = 0;
-			}
-		} else {
-			/* Normal case, compression was successful */
-			size = csbcpb->csb.processed_byte_count;
-			dev_dbg(dev, "%s: processed_bytes=%d\n",
-				__func__, size);
-			hdr->sizes[i] = size;
-			outbuf += size;
-			bytesleft -= size;
-		}
+	/* Send request to pHyp */
+	ret = vio_h_cop_sync(local_devdata->vdev, &op);
 
-		inbuf += max_sync_size;
+	/* Check for pHyp error */
+	if (ret) {
+		dev_dbg(dev, "%s: vio_h_cop_sync error (ret=%d, hret=%ld)\n",
+			__func__, ret, op.hcall_err);
+		ret = -EIO;
+		goto unlock;
 	}
 
-	*outlen = (unsigned int)(outbuf - (unsigned long)out);
+	/* Check for hardware error */
+	ret = nx842_validate_result(dev, &csbcpb->csb);
+	if (ret)
+		goto unlock;
+
+	*outlen = csbcpb->csb.processed_byte_count;
+	dev_dbg(dev, "%s: processed_bytes=%d\n", __func__, *outlen);
 
 unlock:
 	if (ret)
@@ -470,15 +393,12 @@ unlock:
 	else {
 		nx842_inc_comp_complete(local_devdata);
 		ibm_nx842_incr_hist(local_devdata->counters->comp_times,
-			(get_tb() - start_time) / tb_ticks_per_usec);
+			(get_tb() - start) / tb_ticks_per_usec);
 	}
 	rcu_read_unlock();
 	return ret;
 }
 
-static int sw842_decompress(const unsigned char *, int, unsigned char *, int *,
-			const void *);
-
 /**
  * nx842_pseries_decompress - Decompress data using the 842 algorithm
  *
@@ -490,11 +410,10 @@ static int sw842_decompress(const unsigned char *, int, unsigned char *, int *,
  * If there is an error then @outlen will be 0 and an error will be
  * specified by the return code from this function.
  *
- * @in: Pointer to input buffer, will use bounce buffer if not 128 byte
- *      aligned
+ * @in: Pointer to input buffer
  * @inlen: Length of input buffer
- * @out: Pointer to output buffer, must be page aligned
- * @outlen: Length of output buffer, must be PAGE_SIZE
+ * @out: Pointer to output buffer
+ * @outlen: Length of output buffer
  * @wrkmem: ptr to buffer for working memory, size determined by
  *          NX842_MEM_COMPRESS
  *
@@ -510,43 +429,39 @@ static int nx842_pseries_decompress(const unsigned char *in, unsigned int inlen,
 				    unsigned char *out, unsigned int *outlen,
 				    void *wmem)
 {
-	struct nx842_header *hdr;
 	struct nx842_devdata *local_devdata;
 	struct device *dev = NULL;
 	struct nx842_workmem *workmem;
 	struct nx842_scatterlist slin, slout;
 	struct nx_csbcpb *csbcpb;
-	int ret = 0, i, size, max_sync_size;
+	int ret = 0, max_sync_size;
 	unsigned long inbuf, outbuf;
 	struct vio_pfo_op op = {
 		.done = NULL,
 		.handle = 0,
 		.timeout = 0,
 	};
-	unsigned long start_time = get_tb();
+	unsigned long start = get_tb();
 
 	/* Ensure page alignment and size */
+	inbuf = (unsigned long)in;
+	if (check_constraints(inbuf, &inlen, true))
+		return -EINVAL;
+
 	outbuf = (unsigned long)out;
-	if (!IS_ALIGNED(outbuf, PAGE_SIZE) || *outlen != PAGE_SIZE)
+	if (check_constraints(outbuf, outlen, false))
 		return -EINVAL;
 
 	rcu_read_lock();
 	local_devdata = rcu_dereference(devdata);
-	if (local_devdata)
-		dev = local_devdata->dev;
-
-	/* Get header */
-	hdr = (struct nx842_header *)in;
-
-	workmem = (struct nx842_workmem *)ALIGN((unsigned long)wmem,
-		NX842_HW_PAGE_SIZE);
-
-	inbuf = (unsigned long)in + hdr->offset;
-	if (likely(!IS_ALIGNED(inbuf, IO_BUFFER_ALIGN))) {
-		/* Copy block(s) into bounce buffer for alignment */
-		memcpy(workmem->bounce, in + hdr->offset, inlen - hdr->offset);
-		inbuf = (unsigned long)workmem->bounce;
+	if (!local_devdata || !local_devdata->dev) {
+		rcu_read_unlock();
+		return -ENODEV;
 	}
+	max_sync_size = local_devdata->max_sync_size;
+	dev = local_devdata->dev;
+
+	workmem = PTR_ALIGN(wmem, WORKMEM_ALIGN);
 
 	/* Init scatterlist */
 	slin.entries = (struct nx842_slentry *)workmem->slin;
@@ -558,119 +473,55 @@ static int nx842_pseries_decompress(const unsigned char *in, unsigned int inlen,
 	memset(csbcpb, 0, sizeof(*csbcpb));
 	op.csbcpb = nx842_get_pa(csbcpb);
 
-	/*
-	 * max_sync_size may have changed since compression,
-	 * so we can't read it from the device info. We need
-	 * to derive it from hdr->blocks_nr.
-	 */
-	max_sync_size = PAGE_SIZE / hdr->blocks_nr;
-
-	for (i = 0; i < hdr->blocks_nr; i++) {
-		/* Skip padding */
-		inbuf = ALIGN(inbuf, IO_BUFFER_ALIGN);
-
-		if (hdr->sizes[i] < 0) {
-			/* Negative sizes indicate uncompressed data blocks */
-			size = abs(hdr->sizes[i]);
-			memcpy((void *)outbuf, (void *)inbuf, size);
-			outbuf += size;
-			inbuf += size;
-			continue;
-		}
-
-		if (!dev)
-			goto sw;
-
-		/*
-		 * The better the compression, the more likely the "likely"
-		 * case becomes.
-		 */
-		if (likely((inbuf & NX842_HW_PAGE_MASK) ==
-			((inbuf + hdr->sizes[i] - 1) & NX842_HW_PAGE_MASK))) {
-			/* Create direct DDE */
-			op.in = nx842_get_pa((void *)inbuf);
-			op.inlen = hdr->sizes[i];
-		} else {
-			/* Create indirect DDE (scatterlist) */
-			nx842_build_scatterlist(inbuf, hdr->sizes[i] , &slin);
-			op.in = nx842_get_pa(slin.entries);
-			op.inlen = -nx842_get_scatterlist_size(&slin);
-		}
-
-		/*
-		 * NOTE: If the default max_sync_size is changed from 4k
-		 * to 64k, remove the "likely" case below, since a
-		 * scatterlist will always be needed.
-		 */
-		if (likely(max_sync_size == NX842_HW_PAGE_SIZE)) {
-			/* Create direct DDE */
-			op.out = nx842_get_pa((void *)outbuf);
-			op.outlen = max_sync_size;
-		} else {
-			/* Create indirect DDE (scatterlist) */
-			nx842_build_scatterlist(outbuf, max_sync_size, &slout);
-			op.out = nx842_get_pa(slout.entries);
-			op.outlen = -nx842_get_scatterlist_size(&slout);
-		}
-
-		/* Send request to pHyp */
-		ret = vio_h_cop_sync(local_devdata->vdev, &op);
-
-		/* Check for pHyp error */
-		if (ret) {
-			dev_dbg(dev, "%s: vio_h_cop_sync error (ret=%d, hret=%ld)\n",
-				__func__, ret, op.hcall_err);
-			dev = NULL;
-			goto sw;
-		}
+	if ((inbuf & NX842_HW_PAGE_MASK) ==
+	    ((inbuf + inlen - 1) & NX842_HW_PAGE_MASK)) {
+		/* Create direct DDE */
+		op.in = nx842_get_pa((void *)inbuf);
+		op.inlen = inlen;
+	} else {
+		/* Create indirect DDE (scatterlist) */
+		nx842_build_scatterlist(inbuf, inlen, &slin);
+		op.in = nx842_get_pa(slin.entries);
+		op.inlen = -nx842_get_scatterlist_size(&slin);
+	}
 
-		/* Check for hardware error */
-		ret = nx842_validate_result(dev, &csbcpb->csb);
-		if (ret) {
-			dev = NULL;
-			goto sw;
-		}
+	if ((outbuf & NX842_HW_PAGE_MASK) ==
+	    ((outbuf + *outlen - 1) & NX842_HW_PAGE_MASK)) {
+		/* Create direct DDE */
+		op.out = nx842_get_pa((void *)outbuf);
+		op.outlen = *outlen;
+	} else {
+		/* Create indirect DDE (scatterlist) */
+		nx842_build_scatterlist(outbuf, *outlen, &slout);
+		op.out = nx842_get_pa(slout.entries);
+		op.outlen = -nx842_get_scatterlist_size(&slout);
+	}
 
-		/* HW decompression success */
-		inbuf += hdr->sizes[i];
-		outbuf += csbcpb->csb.processed_byte_count;
-		continue;
-
-sw:
-		/* software decompression */
-		size = max_sync_size;
-		ret = sw842_decompress(
-			(unsigned char *)inbuf, hdr->sizes[i],
-			(unsigned char *)outbuf, &size, wmem);
-		if (ret)
-			pr_debug("%s: sw842_decompress failed with %d\n",
-				__func__, ret);
-
-		if (ret) {
-			if (ret != -ENOSPC && ret != -EINVAL &&
-					ret != -EMSGSIZE)
-				ret = -EIO;
-			goto unlock;
-		}
+	/* Send request to pHyp */
+	ret = vio_h_cop_sync(local_devdata->vdev, &op);
 
-		/* SW decompression success */
-		inbuf += hdr->sizes[i];
-		outbuf += size;
+	/* Check for pHyp error */
+	if (ret) {
+		dev_dbg(dev, "%s: vio_h_cop_sync error (ret=%d, hret=%ld)\n",
+			__func__, ret, op.hcall_err);
+		goto unlock;
 	}
 
-	*outlen = (unsigned int)(outbuf - (unsigned long)out);
+	/* Check for hardware error */
+	ret = nx842_validate_result(dev, &csbcpb->csb);
+	if (ret)
+		goto unlock;
+
+	*outlen = csbcpb->csb.processed_byte_count;
 
 unlock:
 	if (ret)
 		/* decompress fail */
 		nx842_inc_decomp_failed(local_devdata);
 	else {
-		if (!dev)
-			/* software decompress */
-			nx842_inc_swdecomp(local_devdata);
 		nx842_inc_decomp_complete(local_devdata);
 		ibm_nx842_incr_hist(local_devdata->counters->decomp_times,
-			(get_tb() - start_time) / tb_ticks_per_usec);
+			(get_tb() - start) / tb_ticks_per_usec);
 	}
 
 	rcu_read_unlock();
@@ -829,9 +680,9 @@ static int nx842_OF_upd_maxsyncop(struct nx842_devdata *devdata,
 					maxsynccop->decomp_data_limit);
 
 	devdata->max_sync_size = min_t(unsigned int, devdata->max_sync_size,
-					SIZE_64K);
+					65536);
 
-	if (devdata->max_sync_size < SIZE_4K) {
+	if (devdata->max_sync_size < 4096) {
 		dev_err(devdata->dev, "%s: hardware max data size (%u) is "
 				"less than the driver minimum, unable to use "
 				"the hardware device\n",
@@ -1220,17 +1071,17 @@ static int __exit nx842_remove(struct vio_dev *viodev)
 	return 0;
 }
 
-static struct vio_device_id nx842_driver_ids[] = {
+static struct vio_device_id nx842_vio_driver_ids[] = {
 	{NX842_PSERIES_COMPAT_NAME "-v1", NX842_PSERIES_COMPAT_NAME},
 	{"", ""},
 };
 
-static struct vio_driver nx842_driver = {
+static struct vio_driver nx842_vio_driver = {
 	.name = MODULE_NAME,
 	.probe = nx842_probe,
 	.remove = __exit_p(nx842_remove),
 	.get_desired_dma = nx842_get_desired_dma,
-	.id_table = nx842_driver_ids,
+	.id_table = nx842_vio_driver_ids,
 };
 
 static int __init nx842_init(void)
@@ -1249,7 +1100,7 @@ static int __init nx842_init(void)
 	new_devdata->status = UNAVAILABLE;
 	RCU_INIT_POINTER(devdata, new_devdata);
 
-	return vio_register_driver(&nx842_driver);
+	return vio_register_driver(&nx842_vio_driver);
 }
 
 module_init(nx842_init);
@@ -1266,336 +1117,12 @@ static void __exit nx842_exit(void)
 	RCU_INIT_POINTER(devdata, NULL);
 	spin_unlock_irqrestore(&devdata_mutex, flags);
 	synchronize_rcu();
-	if (old_devdata)
+	if (old_devdata && old_devdata->dev)
 		dev_set_drvdata(old_devdata->dev, NULL);
 	kfree(old_devdata);
 	nx842_unregister_driver(&nx842_pseries_driver);
-	vio_unregister_driver(&nx842_driver);
+	vio_unregister_driver(&nx842_vio_driver);
 }
 
 module_exit(nx842_exit);
 
-/*********************************
- * 842 software decompressor
-*********************************/
-typedef int (*sw842_template_op)(const char **, int *, unsigned char **,
-						struct sw842_fifo *);
-
-static int sw842_data8(const char **, int *, unsigned char **,
-						struct sw842_fifo *);
-static int sw842_data4(const char **, int *, unsigned char **,
-						struct sw842_fifo *);
-static int sw842_data2(const char **, int *, unsigned char **,
-						struct sw842_fifo *);
-static int sw842_ptr8(const char **, int *, unsigned char **,
-						struct sw842_fifo *);
-static int sw842_ptr4(const char **, int *, unsigned char **,
-						struct sw842_fifo *);
-static int sw842_ptr2(const char **, int *, unsigned char **,
-						struct sw842_fifo *);
-
-/* special templates */
-#define SW842_TMPL_REPEAT 0x1B
-#define SW842_TMPL_ZEROS 0x1C
-#define SW842_TMPL_EOF 0x1E
-
-static sw842_template_op sw842_tmpl_ops[26][4] = {
-	{ sw842_data8, NULL}, /* 0 (00000) */
-	{ sw842_data4, sw842_data2, sw842_ptr2,  NULL},
-	{ sw842_data4, sw842_ptr2,  sw842_data2, NULL},
-	{ sw842_data4, sw842_ptr2,  sw842_ptr2,  NULL},
-	{ sw842_data4, sw842_ptr4,  NULL},
-	{ sw842_data2, sw842_ptr2,  sw842_data4, NULL},
-	{ sw842_data2, sw842_ptr2,  sw842_data2, sw842_ptr2},
-	{ sw842_data2, sw842_ptr2,  sw842_ptr2,  sw842_data2},
-	{ sw842_data2, sw842_ptr2,  sw842_ptr2,  sw842_ptr2,},
-	{ sw842_data2, sw842_ptr2,  sw842_ptr4,  NULL},
-	{ sw842_ptr2,  sw842_data2, sw842_data4, NULL}, /* 10 (01010) */
-	{ sw842_ptr2,  sw842_data4, sw842_ptr2,  NULL},
-	{ sw842_ptr2,  sw842_data2, sw842_ptr2,  sw842_data2},
-	{ sw842_ptr2,  sw842_data2, sw842_ptr2,  sw842_ptr2},
-	{ sw842_ptr2,  sw842_data2, sw842_ptr4,  NULL},
-	{ sw842_ptr2,  sw842_ptr2,  sw842_data4, NULL},
-	{ sw842_ptr2,  sw842_ptr2,  sw842_data2, sw842_ptr2},
-	{ sw842_ptr2,  sw842_ptr2,  sw842_ptr2,  sw842_data2},
-	{ sw842_ptr2,  sw842_ptr2,  sw842_ptr2,  sw842_ptr2},
-	{ sw842_ptr2,  sw842_ptr2,  sw842_ptr4,  NULL},
-	{ sw842_ptr4,  sw842_data4, NULL}, /* 20 (10100) */
-	{ sw842_ptr4,  sw842_data2, sw842_ptr2,  NULL},
-	{ sw842_ptr4,  sw842_ptr2,  sw842_data2, NULL},
-	{ sw842_ptr4,  sw842_ptr2,  sw842_ptr2,  NULL},
-	{ sw842_ptr4,  sw842_ptr4,  NULL},
-	{ sw842_ptr8,  NULL}
-};
-
-/* Software decompress helpers */
-
-static uint8_t sw842_get_byte(const char *buf, int bit)
-{
-	uint8_t tmpl;
-	uint16_t tmp;
-	tmp = htons(*(uint16_t *)(buf));
-	tmp = (uint16_t)(tmp << bit);
-	tmp = ntohs(tmp);
-	memcpy(&tmpl, &tmp, 1);
-	return tmpl;
-}
-
-static uint8_t sw842_get_template(const char **buf, int *bit)
-{
-	uint8_t byte;
-	byte = sw842_get_byte(*buf, *bit);
-	byte = byte >> 3;
-	byte &= 0x1F;
-	*buf += (*bit + 5) / 8;
-	*bit = (*bit + 5) % 8;
-	return byte;
-}
-
-/* repeat_count happens to be 5-bit too (like the template) */
-static uint8_t sw842_get_repeat_count(const char **buf, int *bit)
-{
-	uint8_t byte;
-	byte = sw842_get_byte(*buf, *bit);
-	byte = byte >> 2;
-	byte &= 0x3F;
-	*buf += (*bit + 6) / 8;
-	*bit = (*bit + 6) % 8;
-	return byte;
-}
-
-static uint8_t sw842_get_ptr2(const char **buf, int *bit)
-{
-	uint8_t ptr;
-	ptr = sw842_get_byte(*buf, *bit);
-	(*buf)++;
-	return ptr;
-}
-
-static uint16_t sw842_get_ptr4(const char **buf, int *bit,
-		struct sw842_fifo *fifo)
-{
-	uint16_t ptr;
-	ptr = htons(*(uint16_t *)(*buf));
-	ptr = (uint16_t)(ptr << *bit);
-	ptr = ptr >> 7;
-	ptr &= 0x01FF;
-	*buf += (*bit + 9) / 8;
-	*bit = (*bit + 9) % 8;
-	return ptr;
-}
-
-static uint8_t sw842_get_ptr8(const char **buf, int *bit,
-		struct sw842_fifo *fifo)
-{
-	return sw842_get_ptr2(buf, bit);
-}
-
-/* Software decompress template ops */
-
-static int sw842_data8(const char **inbuf, int *inbit,
-		unsigned char **outbuf, struct sw842_fifo *fifo)
-{
-	int ret;
-
-	ret = sw842_data4(inbuf, inbit, outbuf, fifo);
-	if (ret)
-		return ret;
-	ret = sw842_data4(inbuf, inbit, outbuf, fifo);
-	return ret;
-}
-
-static int sw842_data4(const char **inbuf, int *inbit,
-		unsigned char **outbuf, struct sw842_fifo *fifo)
-{
-	int ret;
-
-	ret = sw842_data2(inbuf, inbit, outbuf, fifo);
-	if (ret)
-		return ret;
-	ret = sw842_data2(inbuf, inbit, outbuf, fifo);
-	return ret;
-}
-
-static int sw842_data2(const char **inbuf, int *inbit,
-		unsigned char **outbuf, struct sw842_fifo *fifo)
-{
-	**outbuf = sw842_get_byte(*inbuf, *inbit);
-	(*inbuf)++;
-	(*outbuf)++;
-	**outbuf = sw842_get_byte(*inbuf, *inbit);
-	(*inbuf)++;
-	(*outbuf)++;
-	return 0;
-}
-
-static int sw842_ptr8(const char **inbuf, int *inbit,
-		unsigned char **outbuf, struct sw842_fifo *fifo)
-{
-	uint8_t ptr;
-	ptr = sw842_get_ptr8(inbuf, inbit, fifo);
-	if (!fifo->f84_full && (ptr >= fifo->f8_count))
-		return 1;
-	memcpy(*outbuf, fifo->f8[ptr], 8);
-	*outbuf += 8;
-	return 0;
-}
-
-static int sw842_ptr4(const char **inbuf, int *inbit,
-		unsigned char **outbuf, struct sw842_fifo *fifo)
-{
-	uint16_t ptr;
-	ptr = sw842_get_ptr4(inbuf, inbit, fifo);
-	if (!fifo->f84_full && (ptr >= fifo->f4_count))
-		return 1;
-	memcpy(*outbuf, fifo->f4[ptr], 4);
-	*outbuf += 4;
-	return 0;
-}
-
-static int sw842_ptr2(const char **inbuf, int *inbit,
-		unsigned char **outbuf, struct sw842_fifo *fifo)
-{
-	uint8_t ptr;
-	ptr = sw842_get_ptr2(inbuf, inbit);
-	if (!fifo->f2_full && (ptr >= fifo->f2_count))
-		return 1;
-	memcpy(*outbuf, fifo->f2[ptr], 2);
-	*outbuf += 2;
-	return 0;
-}
-
-static void sw842_copy_to_fifo(const char *buf, struct sw842_fifo *fifo)
-{
-	unsigned char initial_f2count = fifo->f2_count;
-
-	memcpy(fifo->f8[fifo->f8_count], buf, 8);
-	fifo->f4_count += 2;
-	fifo->f8_count += 1;
-
-	if (!fifo->f84_full && fifo->f4_count >= 512) {
-		fifo->f84_full = 1;
-		fifo->f4_count /= 512;
-	}
-
-	memcpy(fifo->f2[fifo->f2_count++], buf, 2);
-	memcpy(fifo->f2[fifo->f2_count++], buf + 2, 2);
-	memcpy(fifo->f2[fifo->f2_count++], buf + 4, 2);
-	memcpy(fifo->f2[fifo->f2_count++], buf + 6, 2);
-	if (fifo->f2_count < initial_f2count)
-		fifo->f2_full = 1;
-}
-
-static int sw842_decompress(const unsigned char *src, int srclen,
-			unsigned char *dst, int *destlen,
-			const void *wrkmem)
-{
-	uint8_t tmpl;
-	const char *inbuf;
-	int inbit = 0;
-	unsigned char *outbuf, *outbuf_end, *origbuf, *prevbuf;
-	const char *inbuf_end;
-	sw842_template_op op;
-	int opindex;
-	int i, repeat_count;
-	struct sw842_fifo *fifo;
-	int ret = 0;
-
-	fifo = &((struct nx842_workmem *)(wrkmem))->swfifo;
-	memset(fifo, 0, sizeof(*fifo));
-
-	origbuf = NULL;
-	inbuf = src;
-	inbuf_end = src + srclen;
-	outbuf = dst;
-	outbuf_end = dst + *destlen;
-
-	while ((tmpl = sw842_get_template(&inbuf, &inbit)) != SW842_TMPL_EOF) {
-		if (inbuf >= inbuf_end) {
-			ret = -EINVAL;
-			goto out;
-		}
-
-		opindex = 0;
-		prevbuf = origbuf;
-		origbuf = outbuf;
-		switch (tmpl) {
-		case SW842_TMPL_REPEAT:
-			if (prevbuf == NULL) {
-				ret = -EINVAL;
-				goto out;
-			}
-
-			repeat_count = sw842_get_repeat_count(&inbuf,
-								&inbit) + 1;
-
-			/* Did the repeat count advance past the end of input */
-			if (inbuf > inbuf_end) {
-				ret = -EINVAL;
-				goto out;
-			}
-
-			for (i = 0; i < repeat_count; i++) {
-				/* Would this overflow the output buffer */
-				if ((outbuf + 8) > outbuf_end) {
-					ret = -ENOSPC;
-					goto out;
-				}
-
-				memcpy(outbuf, prevbuf, 8);
-				sw842_copy_to_fifo(outbuf, fifo);
-				outbuf += 8;
-			}
-			break;
-
-		case SW842_TMPL_ZEROS:
-			/* Would this overflow the output buffer */
-			if ((outbuf + 8) > outbuf_end) {
-				ret = -ENOSPC;
-				goto out;
-			}
-
-			memset(outbuf, 0, 8);
-			sw842_copy_to_fifo(outbuf, fifo);
-			outbuf += 8;
-			break;
-
-		default:
-			if (tmpl > 25) {
-				ret = -EINVAL;
-				goto out;
-			}
-
-			/* Does this go past the end of the input buffer */
-			if ((inbuf + 2) > inbuf_end) {
-				ret = -EINVAL;
-				goto out;
-			}
-
-			/* Would this overflow the output buffer */
-			if ((outbuf + 8) > outbuf_end) {
-				ret = -ENOSPC;
-				goto out;
-			}
-
-			while (opindex < 4 &&
-				(op = sw842_tmpl_ops[tmpl][opindex++])
-					!= NULL) {
-				ret = (*op)(&inbuf, &inbit, &outbuf, fifo);
-				if (ret) {
-					ret = -EINVAL;
-					goto out;
-				}
-				sw842_copy_to_fifo(origbuf, fifo);
-			}
-		}
-	}
-
-out:
-	if (!ret)
-		*destlen = (unsigned int)(outbuf - dst);
-	else
-		*destlen = 0;
-
-	return ret;
-}
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 10/10] drivers/crypto/nx: add hardware 842 crypto comp alg
  2015-05-06 16:50 ` [PATCHv2 00/10] add 842 hw compression for PowerNV platform Dan Streetman
                     ` (8 preceding siblings ...)
  2015-05-06 16:51   ` [PATCH 09/10] drivers/crypto/nx: simplify pSeries nx842 driver Dan Streetman
@ 2015-05-06 16:51   ` Dan Streetman
  2015-05-07  3:12     ` Herbert Xu
  2015-05-07 17:49   ` [PATCHv3 00/10] add 842 hw compression for PowerNV platform Dan Streetman
  10 siblings, 1 reply; 46+ messages in thread
From: Dan Streetman @ 2015-05-06 16:51 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras
  Cc: Dan Streetman, Seth Jennings, Robert Jennings, linux-crypto,
	linuxppc-dev, linux-kernel

Add crypto compression alg for 842 hardware compression and decompression.

This crypto compression alg is named "nx842" to indicate it uses hardware
to perform the compression and decompression, while the software 842
compression alg is named "sw842".  However, since before this split there
was only one 842 compression alg named "842" which only used hardware,
this is also aliased "842" for backwards compatibility.

This uses only the PowerPC coprocessor hardware for 842 compression.  It
also uses the hardware for decompression, but if the hardware fails it will
fall back to the 842 software decompression library, so that decompression
never fails (for valid 842 compressed buffers).  A header must be used in
most cases, due to the hardware's restrictions on the buffers being
specifically aligned and sized.

Due to the header this driver adds, compressed buffers it creates cannot be
directly passed to the 842 software library for decompression.  However,
compressed buffers created by the software 842 library can be passed to
this driver for hardware 842 decompression (with the exception of buffers
containing the "short data" template, as lib/842/842.h explains).

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
---
 drivers/crypto/nx/Kconfig         |  10 +
 drivers/crypto/nx/Makefile        |   2 +
 drivers/crypto/nx/nx-842-crypto.c | 603 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 615 insertions(+)
 create mode 100644 drivers/crypto/nx/nx-842-crypto.c

diff --git a/drivers/crypto/nx/Kconfig b/drivers/crypto/nx/Kconfig
index ee9e259..3e621ad 100644
--- a/drivers/crypto/nx/Kconfig
+++ b/drivers/crypto/nx/Kconfig
@@ -50,4 +50,14 @@ config CRYPTO_DEV_NX_COMPRESS_POWERNV
 	  algorithm.  This supports NX hardware on the PowerNV platform.
 	  If you choose 'M' here, this module will be called nx_compress_powernv.
 
+config CRYPTO_DEV_NX_COMPRESS_CRYPTO
+	tristate "Compression acceleration cryptographic interface"
+	select CRYPTO_ALGAPI
+	select 842_DECOMPRESS
+	default y
+	help
+	  Support for PowerPC Nest (NX) accelerators using the cryptographic
+	  API.  If you choose 'M' here, this module will be called
+	  nx_compress_crypto.
+
 endif
diff --git a/drivers/crypto/nx/Makefile b/drivers/crypto/nx/Makefile
index 6619787..868b5e6 100644
--- a/drivers/crypto/nx/Makefile
+++ b/drivers/crypto/nx/Makefile
@@ -13,6 +13,8 @@ nx-crypto-objs := nx.o \
 obj-$(CONFIG_CRYPTO_DEV_NX_COMPRESS) += nx-compress.o
 obj-$(CONFIG_CRYPTO_DEV_NX_COMPRESS_PSERIES) += nx-compress-pseries.o
 obj-$(CONFIG_CRYPTO_DEV_NX_COMPRESS_POWERNV) += nx-compress-powernv.o
+obj-$(CONFIG_CRYPTO_DEV_NX_COMPRESS_CRYPTO) += nx-compress-crypto.o
 nx-compress-objs := nx-842.o
 nx-compress-pseries-objs := nx-842-pseries.o
 nx-compress-powernv-objs := nx-842-powernv.o
+nx-compress-crypto-objs := nx-842-crypto.o
diff --git a/drivers/crypto/nx/nx-842-crypto.c b/drivers/crypto/nx/nx-842-crypto.c
new file mode 100644
index 0000000..42d0da8
--- /dev/null
+++ b/drivers/crypto/nx/nx-842-crypto.c
@@ -0,0 +1,603 @@
+/*
+ * Cryptographic API for the NX-842 hardware compression.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * Copyright (C) IBM Corporation, 2011-2015
+ *
+ * Original Authors: Robert Jennings <rcj@linux.vnet.ibm.com>
+ *                   Seth Jennings <sjenning@linux.vnet.ibm.com>
+ *
+ * Rewrite: Dan Streetman <ddstreet@ieee.org>
+ *
+ * This is an interface to the NX-842 compression hardware in PowerPC
+ * processors.  Most of the complexity of this drvier is due to the fact that
+ * the NX-842 compression hardware requires the input and output data buffers
+ * to be specifically aligned, to be a specific multiple in length, and within
+ * specific minimum and maximum lengths.  Those restrictions, provided by the
+ * nx-842 driver via nx842_constraints, mean this driver must use bounce
+ * buffers and headers to correct misaligned in or out buffers, and to split
+ * input buffers that are too large.
+ *
+ * This driver will fall back to software decompression if the hardware
+ * decompression fails, so this driver's decompression should never fail as
+ * long as the provided compressed buffer is valid.  Any compressed buffer
+ * created by this driver will have a header (except ones where the input
+ * perfectly matches the constraints); so users of this driver cannot simply
+ * pass a compressed buffer created by this driver over to the 842 software
+ * decompression library.  Instead, users must use this driver to decompress;
+ * if the hardware fails or is unavailable, the compressed buffer will be
+ * parsed and the header removed, and the raw 842 buffer(s) passed to the 842
+ * software decompression library.
+ *
+ * This does not fall back to software compression, however, since the caller
+ * of this function is specifically requesting hardware compression; if the
+ * hardware compression fails, the caller can fall back to software
+ * compression, and the raw 842 compressed buffer that the software compressor
+ * creates can be passed to this driver for hardware decompression; any
+ * buffer without our specific header magic is assumed to be a raw 842 buffer
+ * and passed directly to the hardware.  Note that the software compression
+ * library will produce a compressed buffer that is incompatible with the
+ * hardware decompressor if the original input buffer length is not a multiple
+ * of 8; if such a compressed buffer is passed to this driver for
+ * decompression, the hardware will reject it and this driver will then pass
+ * it over to the software library for decompression.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/crypto.h>
+#include <linux/vmalloc.h>
+#include <linux/nx842.h>
+#include <linux/sw842.h>
+#include <linux/ratelimit.h>
+
+/* Use the name "nx842" to register as a crypto compression alg. */
+#define NX842_CRYPTO_NAME	"nx842"
+/* Also alias as "842" to keep backwards compatibility; before the crypto 842
+ * software compression alg (named "sw842") was added, the existing crypto
+ * compression alg was named "842" and used this hardware 842 compression.
+ * This allows users to continue using "842" to access this hardware 842
+ * compression.
+ */
+#define NX842_CRYPTO_ALIAS	"842"
+
+/* The first 5 bits of this magic are 0x1f, which is an invalid 842 5-bit
+ * template (see lib/842/842.h), so this magic number will never appear at
+ * the start of a raw 842 compressed buffer.  That is important, as any buffer
+ * passed to us without this magic is assumed to be a raw 842 compressed
+ * buffer, and passed directly to the hardware to decompress.
+ */
+#define NX842_CRYPTO_MAGIC	(0xf842)
+#define NX842_CRYPTO_GROUP_MAX	(0x20)
+#define NX842_CRYPTO_HEADER_SIZE(g)				\
+	(sizeof(struct nx842_crypto_header) +			\
+	 sizeof(struct nx842_crypto_header_group) * (g))
+#define NX842_CRYPTO_HEADER_MAX_SIZE				\
+	NX842_CRYPTO_HEADER_SIZE(NX842_CRYPTO_GROUP_MAX)
+
+/* bounce buffer size */
+#define BOUNCE_BUFFER_ORDER	(2)
+#define BOUNCE_BUFFER_SIZE					\
+	((unsigned int)(PAGE_SIZE << BOUNCE_BUFFER_ORDER))
+
+/* try longer on comp because we can fallback to sw decomp if hw is busy */
+#define COMP_BUSY_TIMEOUT	(250) /* ms */
+#define DECOMP_BUSY_TIMEOUT	(50) /* ms */
+
+struct nx842_crypto_header_group {
+	__be16 padding;			/* unused bytes at start of group */
+	__be32 compressed_length;	/* compressed bytes in group */
+	__be32 uncompressed_length;	/* bytes after decompression */
+} __packed;
+
+struct nx842_crypto_header {
+	__be16 magic;		/* NX842_CRYPTO_MAGIC */
+	__be16 ignore;		/* decompressed end bytes to ignore */
+	u8 groups;		/* total groups in this header */
+	struct nx842_crypto_header_group group[];
+} __packed;
+
+struct nx842_crypto_param {
+	u8 *in;
+	unsigned int iremain;
+	u8 *out;
+	unsigned int oremain;
+	unsigned int ototal;
+};
+
+static int update_param(struct nx842_crypto_param *p,
+			unsigned int slen, unsigned int dlen)
+{
+	if (p->iremain < slen)
+		return -EOVERFLOW;
+	if (p->oremain < dlen)
+		return -ENOSPC;
+
+	p->in += slen;
+	p->iremain -= slen;
+	p->out += dlen;
+	p->oremain -= dlen;
+	p->ototal += dlen;
+
+	return 0;
+}
+
+struct nx842_crypto_ctx {
+	u8 *wmem;
+	u8 *sbounce, *dbounce;
+
+	struct nx842_crypto_header header;
+	struct nx842_crypto_header_group group[NX842_CRYPTO_GROUP_MAX];
+};
+
+static int nx842_crypto_init(struct crypto_tfm *tfm)
+{
+	struct nx842_crypto_ctx *ctx = crypto_tfm_ctx(tfm);
+
+	ctx->wmem = kmalloc(NX842_MEM_COMPRESS, GFP_KERNEL);
+	ctx->sbounce = (u8 *)__get_free_pages(GFP_KERNEL, BOUNCE_BUFFER_ORDER);
+	ctx->dbounce = (u8 *)__get_free_pages(GFP_KERNEL, BOUNCE_BUFFER_ORDER);
+	if (!ctx->wmem || !ctx->sbounce || !ctx->dbounce) {
+		kfree(ctx->wmem);
+		free_page((unsigned long)ctx->sbounce);
+		free_page((unsigned long)ctx->dbounce);
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static void nx842_crypto_exit(struct crypto_tfm *tfm)
+{
+	struct nx842_crypto_ctx *ctx = crypto_tfm_ctx(tfm);
+
+	kfree(ctx->wmem);
+	free_page((unsigned long)ctx->sbounce);
+	free_page((unsigned long)ctx->dbounce);
+}
+
+static int read_constraints(struct nx842_constraints *c)
+{
+	int ret;
+
+	ret = nx842_constraints(c);
+	if (ret) {
+		pr_err_ratelimited("could not get nx842 constraints : %d\n",
+				   ret);
+		return ret;
+	}
+
+	/* limit maximum, to always have enough bounce buffer to decompress */
+	if (c->maximum > BOUNCE_BUFFER_SIZE) {
+		c->maximum = BOUNCE_BUFFER_SIZE;
+		pr_info_once("limiting nx842 maximum to %x\n", c->maximum);
+	}
+
+	return 0;
+}
+
+static int nx842_crypto_add_header(struct nx842_crypto_header *hdr, u8 *buf)
+{
+	int s = NX842_CRYPTO_HEADER_SIZE(hdr->groups);
+
+	/* compress should have added space for header */
+	if (s > be16_to_cpu(hdr->group[0].padding)) {
+		pr_err("Internal error: no space for header\n");
+		return -EINVAL;
+	}
+
+	memcpy(buf, hdr, s);
+
+	print_hex_dump_debug("header ", DUMP_PREFIX_OFFSET, 16, 1, buf, s, 0);
+
+	return 0;
+}
+
+static int compress(struct nx842_crypto_ctx *ctx,
+		    struct nx842_crypto_param *p,
+		    struct nx842_crypto_header_group *g,
+		    struct nx842_constraints *c,
+		    u16 *ignore,
+		    unsigned int hdrsize)
+{
+	unsigned int slen = p->iremain, dlen = p->oremain, tmplen;
+	unsigned int adj_slen = slen;
+	u8 *src = p->in, *dst = p->out;
+	int ret, dskip = 0;
+	ktime_t timeout;
+
+	if (p->iremain == 0)
+		return -EOVERFLOW;
+
+	if (p->oremain == 0 || hdrsize + c->minimum > dlen)
+		return -ENOSPC;
+
+	if (slen % c->multiple)
+		adj_slen = round_up(slen, c->multiple);
+	if (slen < c->minimum)
+		adj_slen = c->minimum;
+	if (slen > c->maximum)
+		adj_slen = slen = c->maximum;
+	if (adj_slen > slen || (u64)src % c->alignment) {
+		adj_slen = min(adj_slen, BOUNCE_BUFFER_SIZE);
+		slen = min(slen, BOUNCE_BUFFER_SIZE);
+		if (adj_slen > slen)
+			memset(ctx->sbounce + slen, 0, adj_slen - slen);
+		memcpy(ctx->sbounce, src, slen);
+		src = ctx->sbounce;
+		slen = adj_slen;
+		pr_debug("using comp sbounce buffer, len %x\n", slen);
+	}
+
+	dst += hdrsize;
+	dlen -= hdrsize;
+
+	if ((u64)dst % c->alignment) {
+		dskip = (int)(PTR_ALIGN(dst, c->alignment) - dst);
+		dst += dskip;
+		dlen -= dskip;
+	}
+	if (dlen % c->multiple)
+		dlen = round_down(dlen, c->multiple);
+	if (dlen < c->minimum) {
+nospc:
+		dst = ctx->dbounce;
+		dlen = min(p->oremain, BOUNCE_BUFFER_SIZE);
+		dlen = round_down(dlen, c->multiple);
+		dskip = 0;
+		pr_debug("using comp dbounce buffer, len %x\n", dlen);
+	}
+	if (dlen > c->maximum)
+		dlen = c->maximum;
+
+	tmplen = dlen;
+	timeout = ktime_add_ms(ktime_get(), COMP_BUSY_TIMEOUT);
+	do {
+		dlen = tmplen; /* reset dlen, if we're retrying */
+		ret = nx842_compress(src, slen, dst, &dlen, ctx->wmem);
+		/* possibly we should reduce the slen here, instead of
+		 * retrying with the dbounce buffer?
+		 */
+		if (ret == -ENOSPC && dst != ctx->dbounce)
+			goto nospc;
+	} while (ret == -EBUSY && ktime_before(ktime_get(), timeout));
+	if (ret)
+		return ret;
+
+	dskip += hdrsize;
+
+	if (dst == ctx->dbounce)
+		memcpy(p->out + dskip, dst, dlen);
+
+	g->padding = cpu_to_be16(dskip);
+	g->compressed_length = cpu_to_be32(dlen);
+	g->uncompressed_length = cpu_to_be32(slen);
+
+	if (p->iremain < slen) {
+		*ignore = slen - p->iremain;
+		slen = p->iremain;
+	}
+
+	pr_debug("compress slen %x ignore %x dlen %x padding %x\n",
+		 slen, *ignore, dlen, dskip);
+
+	return update_param(p, slen, dskip + dlen);
+}
+
+static int nx842_crypto_compress(struct crypto_tfm *tfm,
+				 const u8 *src, unsigned int slen,
+				 u8 *dst, unsigned int *dlen)
+{
+	struct nx842_crypto_ctx *ctx = crypto_tfm_ctx(tfm);
+	struct nx842_crypto_header *hdr = &ctx->header;
+	struct nx842_crypto_param p;
+	struct nx842_constraints c;
+	unsigned int groups, hdrsize, h;
+	int ret, n;
+	bool add_header;
+	u16 ignore = 0;
+
+	if (!tfm || !src || !slen || !dst || !dlen)
+		return -EINVAL;
+
+	p.in = (u8 *)src;
+	p.iremain = slen;
+	p.out = dst;
+	p.oremain = *dlen;
+	p.ototal = 0;
+
+	*dlen = 0;
+
+	ret = read_constraints(&c);
+	if (ret)
+		return ret;
+
+	groups = min_t(unsigned int, NX842_CRYPTO_GROUP_MAX,
+		       DIV_ROUND_UP(p.iremain, c.maximum));
+	hdrsize = NX842_CRYPTO_HEADER_SIZE(groups);
+
+	/* skip adding header if the buffers meet all constraints */
+	add_header = (p.iremain % c.multiple	||
+		      p.iremain < c.minimum	||
+		      p.iremain > c.maximum	||
+		      (u64)p.in % c.alignment	||
+		      p.oremain % c.multiple	||
+		      p.oremain < c.minimum	||
+		      p.oremain > c.maximum	||
+		      (u64)p.out % c.alignment);
+
+	hdr->magic = cpu_to_be16(NX842_CRYPTO_MAGIC);
+	hdr->groups = 0;
+	hdr->ignore = 0;
+
+	while (p.iremain > 0) {
+		n = hdr->groups++;
+		if (hdr->groups > NX842_CRYPTO_GROUP_MAX)
+			return -ENOSPC;
+
+		/* header goes before first group */
+		h = !n && add_header ? hdrsize : 0;
+
+		if (ignore)
+			pr_warn("interal error, ignore is set %x\n", ignore);
+
+		ret = compress(ctx, &p, &hdr->group[n], &c, &ignore, h);
+		if (ret)
+			return ret;
+	}
+
+	if (!add_header && hdr->groups > 1) {
+		pr_err("Internal error: No header but multiple groups\n");
+		return -EINVAL;
+	}
+
+	/* ignore indicates the input stream needed to be padded */
+	hdr->ignore = cpu_to_be16(ignore);
+	if (ignore)
+		pr_debug("marked %d bytes as ignore\n", ignore);
+
+	if (add_header)
+		ret = nx842_crypto_add_header(hdr, dst);
+	if (ret)
+		return ret;
+
+	*dlen = p.ototal;
+
+	pr_debug("compress total slen %x dlen %x\n", slen, *dlen);
+
+	return 0;
+}
+
+static int decompress(struct nx842_crypto_ctx *ctx,
+		      struct nx842_crypto_param *p,
+		      struct nx842_crypto_header_group *g,
+		      struct nx842_constraints *c,
+		      u16 ignore,
+		      bool usehw)
+{
+	unsigned int slen = be32_to_cpu(g->compressed_length);
+	unsigned int required_len = be32_to_cpu(g->uncompressed_length);
+	unsigned int dlen = p->oremain, tmplen;
+	unsigned int adj_slen = slen;
+	u8 *src = p->in, *dst = p->out;
+	u16 padding = be16_to_cpu(g->padding);
+	int ret, spadding = 0, dpadding = 0;
+	ktime_t timeout;
+
+	if (!slen || !required_len)
+		return -EINVAL;
+
+	if (p->iremain <= 0 || padding + slen > p->iremain)
+		return -EOVERFLOW;
+
+	if (p->oremain <= 0 || required_len - ignore > p->oremain)
+		return -ENOSPC;
+
+	src += padding;
+
+	if (!usehw)
+		goto usesw;
+
+	if (slen % c->multiple)
+		adj_slen = round_up(slen, c->multiple);
+	if (slen < c->minimum)
+		adj_slen = c->minimum;
+	if (slen > c->maximum)
+		goto usesw;
+	if (slen < adj_slen || (u64)src % c->alignment) {
+		/* we can append padding bytes because the 842 format defines
+		 * an "end" template (see lib/842/842_decompress.c) and will
+		 * ignore any bytes following it.
+		 */
+		if (slen < adj_slen)
+			memset(ctx->sbounce + slen, 0, adj_slen - slen);
+		memcpy(ctx->sbounce, src, slen);
+		src = ctx->sbounce;
+		spadding = adj_slen - slen;
+		slen = adj_slen;
+		pr_debug("using decomp sbounce buffer, len %x\n", slen);
+	}
+
+	if (dlen % c->multiple)
+		dlen = round_down(dlen, c->multiple);
+	if (dlen < required_len || (u64)dst % c->alignment) {
+		dst = ctx->dbounce;
+		dlen = min(required_len, BOUNCE_BUFFER_SIZE);
+		pr_debug("using decomp dbounce buffer, len %x\n", dlen);
+	}
+	if (dlen < c->minimum)
+		goto usesw;
+	if (dlen > c->maximum)
+		dlen = c->maximum;
+
+	tmplen = dlen;
+	timeout = ktime_add_ms(ktime_get(), DECOMP_BUSY_TIMEOUT);
+	do {
+		dlen = tmplen; /* reset dlen, if we're retrying */
+		ret = nx842_decompress(src, slen, dst, &dlen, ctx->wmem);
+	} while (ret == -EBUSY && ktime_before(ktime_get(), timeout));
+	if (ret) {
+usesw:
+		/* reset everything, sw doesn't have constraints */
+		src = p->in + padding;
+		slen = be32_to_cpu(g->compressed_length);
+		spadding = 0;
+		dst = p->out;
+		dlen = p->oremain;
+		dpadding = 0;
+		if (dlen < required_len) { /* have ignore bytes */
+			dst = ctx->dbounce;
+			dlen = BOUNCE_BUFFER_SIZE;
+		}
+		pr_info_ratelimited("using software 842 decompression\n");
+		ret = sw842_decompress(src, slen, dst, &dlen);
+	}
+	if (ret)
+		return ret;
+
+	slen -= spadding;
+
+	dlen -= ignore;
+	if (ignore)
+		pr_debug("ignoring last %x bytes\n", ignore);
+
+	if (dst == ctx->dbounce)
+		memcpy(p->out, dst, dlen);
+
+	pr_debug("decompress slen %x padding %x dlen %x ignore %x\n",
+		 slen, padding, dlen, ignore);
+
+	return update_param(p, slen + padding, dlen);
+}
+
+static int nx842_crypto_decompress(struct crypto_tfm *tfm,
+				   const u8 *src, unsigned int slen,
+				   u8 *dst, unsigned int *dlen)
+{
+	struct nx842_crypto_ctx *ctx = crypto_tfm_ctx(tfm);
+	struct nx842_crypto_header *hdr;
+	struct nx842_crypto_param p;
+	struct nx842_constraints c;
+	int n, ret, hdr_len;
+	u16 ignore = 0;
+	bool usehw = true;
+
+	if (!tfm || !src || !slen || !dst || !dlen)
+		return -EINVAL;
+
+	p.in = (u8 *)src;
+	p.iremain = slen;
+	p.out = dst;
+	p.oremain = *dlen;
+	p.ototal = 0;
+
+	*dlen = 0;
+
+	if (read_constraints(&c))
+		usehw = false;
+
+	hdr = (struct nx842_crypto_header *)src;
+
+	/* If it doesn't start with our header magic number, assume it's a raw
+	 * 842 compressed buffer and pass it directly to the hardware driver
+	 */
+	if (be16_to_cpu(hdr->magic) != NX842_CRYPTO_MAGIC) {
+		struct nx842_crypto_header_group g = {
+			.padding =		0,
+			.compressed_length =	cpu_to_be32(p.iremain),
+			.uncompressed_length =	cpu_to_be32(p.oremain),
+		};
+
+		ret = decompress(ctx, &p, &g, &c, 0, usehw);
+		if (ret)
+			return ret;
+
+		*dlen = p.ototal;
+
+		return 0;
+	}
+
+	if (!hdr->groups) {
+		pr_err("header has no groups\n");
+		return -EINVAL;
+	}
+	if (hdr->groups > NX842_CRYPTO_GROUP_MAX) {
+		pr_err("header has too many groups %x, max %x\n",
+		       hdr->groups, NX842_CRYPTO_GROUP_MAX);
+		return -EINVAL;
+	}
+
+	hdr_len = NX842_CRYPTO_HEADER_SIZE(hdr->groups);
+	if (hdr_len > slen)
+		return -EOVERFLOW;
+
+	memcpy(&ctx->header, src, hdr_len);
+	hdr = &ctx->header;
+
+	for (n = 0; n < hdr->groups; n++) {
+		/* ignore applies to last group */
+		if (n + 1 == hdr->groups)
+			ignore = be16_to_cpu(hdr->ignore);
+
+		ret = decompress(ctx, &p, &hdr->group[n], &c, ignore, usehw);
+		if (ret)
+			return ret;
+	}
+
+	*dlen = p.ototal;
+
+	pr_debug("decompress total slen %x dlen %x\n", slen, *dlen);
+
+	return 0;
+}
+
+static struct crypto_alg algs[2] = { {
+	.cra_name		= NX842_CRYPTO_NAME,
+	.cra_flags		= CRYPTO_ALG_TYPE_COMPRESS,
+	.cra_ctxsize		= sizeof(struct nx842_crypto_ctx),
+	.cra_module		= THIS_MODULE,
+	.cra_init		= nx842_crypto_init,
+	.cra_exit		= nx842_crypto_exit,
+	.cra_u			= { .compress = {
+	.coa_compress		= nx842_crypto_compress,
+	.coa_decompress		= nx842_crypto_decompress } }
+}, {
+	.cra_name		= NX842_CRYPTO_ALIAS,
+	.cra_flags		= CRYPTO_ALG_TYPE_COMPRESS,
+	.cra_ctxsize		= sizeof(struct nx842_crypto_ctx),
+	.cra_module		= THIS_MODULE,
+	.cra_init		= nx842_crypto_init,
+	.cra_exit		= nx842_crypto_exit,
+	.cra_u			= { .compress = {
+	.coa_compress		= nx842_crypto_compress,
+	.coa_decompress		= nx842_crypto_decompress } }
+} };
+
+static int __init nx842_crypto_mod_init(void)
+{
+	return crypto_register_algs(algs, ARRAY_SIZE(algs));
+}
+module_init(nx842_crypto_mod_init);
+
+static void __exit nx842_crypto_mod_exit(void)
+{
+	crypto_unregister_algs(algs, ARRAY_SIZE(algs));
+}
+module_exit(nx842_crypto_mod_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("IBM PowerPC Nest (NX) 842 Hardware Compression Interface");
+MODULE_ALIAS_CRYPTO(NX842_CRYPTO_NAME);
+MODULE_ALIAS_CRYPTO(NX842_CRYPTO_ALIAS);
+MODULE_AUTHOR("Dan Streetman <ddstreet@ieee.org>");
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH 04/10] crypto: change 842 alg to use software
  2015-05-06 16:51   ` [PATCH 04/10] crypto: change 842 alg to use software Dan Streetman
@ 2015-05-07  3:07     ` Herbert Xu
  0 siblings, 0 replies; 46+ messages in thread
From: Herbert Xu @ 2015-05-07  3:07 UTC (permalink / raw)
  To: Dan Streetman
  Cc: David S. Miller, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Seth Jennings, Robert Jennings, linux-crypto,
	linuxppc-dev, linux-kernel

On Wed, May 06, 2015 at 12:51:00PM -0400, Dan Streetman wrote:
> Change the crypto 842 compression alg to use the software 842 compression
> and decompression library.  Change the name of this crypto alg to "sw842".
> Remove the fallback to LZO compression.

That's not how the name works.  All implementations of 842 should
bear that name.  They should differentiate themselves based on
cra_driver_name.  For example, we generally call the software
implementation of foo "foo-generic".

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 10/10] drivers/crypto/nx: add hardware 842 crypto comp alg
  2015-05-06 16:51   ` [PATCH 10/10] drivers/crypto/nx: add hardware 842 crypto comp alg Dan Streetman
@ 2015-05-07  3:12     ` Herbert Xu
  2015-05-07 15:06       ` Dan Streetman
  0 siblings, 1 reply; 46+ messages in thread
From: Herbert Xu @ 2015-05-07  3:12 UTC (permalink / raw)
  To: Dan Streetman
  Cc: David S. Miller, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Seth Jennings, Robert Jennings, linux-crypto,
	linuxppc-dev, linux-kernel

On Wed, May 06, 2015 at 12:51:06PM -0400, Dan Streetman wrote:
> Add crypto compression alg for 842 hardware compression and decompression.
> 
> This crypto compression alg is named "nx842" to indicate it uses hardware
> to perform the compression and decompression, while the software 842
> compression alg is named "sw842".  However, since before this split there
> was only one 842 compression alg named "842" which only used hardware,
> this is also aliased "842" for backwards compatibility.

This should still be called 842.  You can set the driver name to
nx842 or 842-nx.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 10/10] drivers/crypto/nx: add hardware 842 crypto comp alg
  2015-05-07  3:12     ` Herbert Xu
@ 2015-05-07 15:06       ` Dan Streetman
  2015-05-08  2:32         ` Herbert Xu
  0 siblings, 1 reply; 46+ messages in thread
From: Dan Streetman @ 2015-05-07 15:06 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David S. Miller, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Seth Jennings, Robert Jennings, linux-crypto,
	linuxppc-dev, linux-kernel

On Wed, May 6, 2015 at 11:12 PM, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Wed, May 06, 2015 at 12:51:06PM -0400, Dan Streetman wrote:
>> Add crypto compression alg for 842 hardware compression and decompression.
>>
>> This crypto compression alg is named "nx842" to indicate it uses hardware
>> to perform the compression and decompression, while the software 842
>> compression alg is named "sw842".  However, since before this split there
>> was only one 842 compression alg named "842" which only used hardware,
>> this is also aliased "842" for backwards compatibility.
>
> This should still be called 842.  You can set the driver name to
> nx842 or 842-nx.

ah, ok, will do.

So, I'm wondering about the common NX 842 frontend driver, for the
pSeries and PowerNV platform drivers.  The current setup is:

[ crypto "842-nx" driver ]
           v
[ nx-842 main driver ]
          v
[ nx-842 pSeries driver | nx-842 PowerNV driver ]

The main reason for that is that the HW has specific constraints,
specifically each input and output buffer passed to it for comp or
decomp has to:
-be located at a specific alignment
-have a length of a specific multiple
-have a length between a specific minimum and maximum

The crypto 842-nx has (significant) code in it to handle any alignment
and length input buffers, to match them to what the driver requires.
Would it be better to move that into the crypto code, so that any
crypto compression hw driver can request buffers be specifically
aligned/sized?  I did have to use a header on each compressed buffer
that needed re-alignment or re-sizing, so maybe it's not appropriate
for common crypto compression code.

Since there doesn't seem to be any other hw compression drivers (yet),
maybe it should stay in the 842-nx code, at least for now.  Hopefully
any future compression hw won't have alignment or length multiple
restrictions...

>
> Cheers,
> --
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCHv3 00/10] add 842 hw compression for PowerNV platform
  2015-05-06 16:50 ` [PATCHv2 00/10] add 842 hw compression for PowerNV platform Dan Streetman
                     ` (9 preceding siblings ...)
  2015-05-06 16:51   ` [PATCH 10/10] drivers/crypto/nx: add hardware 842 crypto comp alg Dan Streetman
@ 2015-05-07 17:49   ` Dan Streetman
  2015-05-07 17:49     ` [PATCH 01/10] powerpc: export of_get_ibm_chip_id function Dan Streetman
                       ` (10 more replies)
  10 siblings, 11 replies; 46+ messages in thread
From: Dan Streetman @ 2015-05-07 17:49 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras
  Cc: Seth Jennings, Robert Jennings, linux-crypto, linuxppc-dev,
	linux-kernel, Dan Streetman

IBM PowerPC processors starting at version P7+ contain a NX coprocessor
that provides various hw-accelerated functions, one of which is memory
compression to the IBM "842" compression format.  This NX-842 coprocessor
is already supported on the pSeries platform, by the nx-842.c driver and
the crypto compression interface at crypto/842.c.  This patch set adds
support for NX-842 on the PowerNV (Non-Virtualized) platform, as well as
adding a full software 842 compression/decompression implementation.

Quick summary of changes: the current 842 crypto compression interface uses
only the 842 hardware on pSeries platforms, and can handle only page-sized
and page-aligned uncompressed buffers.  These patches add a full software
842 impementation, change the crypto/ directory 842 interface to a
software only implementation, add a 842 hardware crypto compression
interface that can handle any size and alignment buffers, add a
driver for 842 hardware on PowerNV platforms, and create a common
interface for both 842 hardware platform drivers.

The existing pSeries platform NX-842 driver could not be re-used for the
PowerNV platform driver, as there are fundamentally different interfaces;
on pSeries the system hypervisor (pHyp) provides the interface and manages
communication with the coprocessor, while on PowerNV the kernel talks directly
to the coprocessor using the ICSWX instruction.  The data structures used to
describe each compression or decompression request to the coprocessor are
also different between pHyp's interface and direct communication with ICSWX.
So, different drivers for pSeries and PowerNV are required.  Adding the new
PowerNV driver but keeping the interface to the drivers the same required
adding a new common frontend interface, to which only one of the platform
drivers will connect (based on what platform the kernel is currently running
on), and moving some functionality out of the existing pSeries driver into a
more common location.

The existing crypto/842.c interface is in the wrong place, since crypto/
should only contain software implementations; so lib/842/ is added
containing a reference (i.e. rather slow) implementation in software
of both 842 compression and 842 decompression.  The crypto/842.c interface
is changed to use only that software implementation.

The hardware 842 crypto compression interface is moved to
drivers/crypto/nx/nx-842-crypto.c.  It is also modified to be able to
handle any alignment/length input or output buffer; currently it is only
able to handle page-size and page-aligned (uncompressed) buffers, due to
restrictions in the pSeries 842 hardware driver.

v3 changes the sw and hw crypto drivers to use the same alg name "842",
and different driver names, "842-generic" and "842-nx"


Dan Streetman (10):
  powerpc: export of_get_ibm_chip_id function
  powerpc: Add ICSWX instruction
  lib: add software 842 compression/decompression
  crypto: change 842 alg to use software
  drivers/crypto/nx: rename nx-842.c to nx-842-pseries.c
  drivers/crypto/nx: add NX-842 platform frontend driver
  drivers/crypto/nx: add nx842 constraints
  drivers/crypto/nx: add PowerNV platform NX-842 driver
  drivers/crypto/nx: simplify pSeries nx842 driver
  drivers/crypto/nx: add hardware 842 crypto comp alg

 MAINTAINERS                           |    5 +-
 arch/powerpc/include/asm/icswx.h      |  184 ++++
 arch/powerpc/include/asm/ppc-opcode.h |   13 +
 arch/powerpc/kernel/prom.c            |    1 +
 crypto/842.c                          |  174 +---
 crypto/Kconfig                        |    7 +-
 drivers/crypto/Kconfig                |   10 +-
 drivers/crypto/nx/Kconfig             |   55 +-
 drivers/crypto/nx/Makefile            |    6 +
 drivers/crypto/nx/nx-842-crypto.c     |  585 ++++++++++++
 drivers/crypto/nx/nx-842-powernv.c    |  625 +++++++++++++
 drivers/crypto/nx/nx-842-pseries.c    | 1128 +++++++++++++++++++++++
 drivers/crypto/nx/nx-842.c            | 1623 +++------------------------------
 drivers/crypto/nx/nx-842.h            |  131 +++
 include/linux/nx842.h                 |   21 +-
 include/linux/sw842.h                 |   12 +
 lib/842/842.h                         |  127 +++
 lib/842/842_compress.c                |  626 +++++++++++++
 lib/842/842_debugfs.h                 |   52 ++
 lib/842/842_decompress.c              |  405 ++++++++
 lib/842/Makefile                      |    2 +
 lib/Kconfig                           |    6 +
 lib/Makefile                          |    2 +
 23 files changed, 4120 insertions(+), 1680 deletions(-)
 create mode 100644 arch/powerpc/include/asm/icswx.h
 create mode 100644 drivers/crypto/nx/nx-842-crypto.c
 create mode 100644 drivers/crypto/nx/nx-842-powernv.c
 create mode 100644 drivers/crypto/nx/nx-842-pseries.c
 create mode 100644 drivers/crypto/nx/nx-842.h
 create mode 100644 include/linux/sw842.h
 create mode 100644 lib/842/842.h
 create mode 100644 lib/842/842_compress.c
 create mode 100644 lib/842/842_debugfs.h
 create mode 100644 lib/842/842_decompress.c
 create mode 100644 lib/842/Makefile

-- 
2.1.0

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 01/10] powerpc: export of_get_ibm_chip_id function
  2015-05-07 17:49   ` [PATCHv3 00/10] add 842 hw compression for PowerNV platform Dan Streetman
@ 2015-05-07 17:49     ` Dan Streetman
  2015-05-07 17:49     ` [PATCH 02/10] powerpc: Add ICSWX instruction Dan Streetman
                       ` (9 subsequent siblings)
  10 siblings, 0 replies; 46+ messages in thread
From: Dan Streetman @ 2015-05-07 17:49 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras
  Cc: Rob Herring, linux-kernel, Robert Jennings, linux-crypto,
	linuxppc-dev, Seth Jennings, Dan Streetman, Anton Blanchard

Export the of_get_ibm_chip_id() function.  This will be used by the
PowerNV NX-842 driver.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
---
 arch/powerpc/kernel/prom.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 308c5e1..ea2cea7 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -800,6 +800,7 @@ int of_get_ibm_chip_id(struct device_node *np)
 	}
 	return -1;
 }
+EXPORT_SYMBOL(of_get_ibm_chip_id);
 
 /**
  * cpu_to_chip_id - Return the cpus chip-id
-- 
2.1.0

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 02/10] powerpc: Add ICSWX instruction
  2015-05-07 17:49   ` [PATCHv3 00/10] add 842 hw compression for PowerNV platform Dan Streetman
  2015-05-07 17:49     ` [PATCH 01/10] powerpc: export of_get_ibm_chip_id function Dan Streetman
@ 2015-05-07 17:49     ` Dan Streetman
  2015-05-07 17:49     ` [PATCH 03/10] lib: add software 842 compression/decompression Dan Streetman
                       ` (8 subsequent siblings)
  10 siblings, 0 replies; 46+ messages in thread
From: Dan Streetman @ 2015-05-07 17:49 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras
  Cc: Seth Jennings, Robert Jennings, linux-crypto, linuxppc-dev,
	linux-kernel, Dan Streetman

Add the asm ICSWX and ICSWEPX opcodes.  Add definitions for the
Coprocessor Request structures needed to use the icswx calls to
coprocessors.  Add icswx() function to perform the ICSWX asm
using the provided Coprocessor Command Word value and
Coprocessor Request Block structure.

This is required for communication with the NX-842 coprocessor on
a PowerNV system.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
---
 arch/powerpc/include/asm/icswx.h      | 184 ++++++++++++++++++++++++++++++++++
 arch/powerpc/include/asm/ppc-opcode.h |  13 +++
 2 files changed, 197 insertions(+)
 create mode 100644 arch/powerpc/include/asm/icswx.h

diff --git a/arch/powerpc/include/asm/icswx.h b/arch/powerpc/include/asm/icswx.h
new file mode 100644
index 0000000..9f8402b
--- /dev/null
+++ b/arch/powerpc/include/asm/icswx.h
@@ -0,0 +1,184 @@
+/*
+ * ICSWX api
+ *
+ * Copyright (C) 2015 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * This provides the Initiate Coprocessor Store Word Indexed (ICSWX)
+ * instruction.  This instruction is used to communicate with PowerPC
+ * coprocessors.  This also provides definitions of the structures used
+ * to communicate with the coprocessor.
+ *
+ * The RFC02130: Coprocessor Architecture document is the reference for
+ * everything in this file unless otherwise noted.
+ */
+#ifndef _ARCH_POWERPC_INCLUDE_ASM_ICSWX_H_
+#define _ARCH_POWERPC_INCLUDE_ASM_ICSWX_H_
+
+#include <asm/ppc-opcode.h> /* for PPC_ICSWX */
+
+/* Chapter 6.5.8 Coprocessor-Completion Block (CCB) */
+
+#define CCB_VALUE		(0x3fffffffffffffff)
+#define CCB_ADDRESS		(0xfffffffffffffff8)
+#define CCB_CM			(0x0000000000000007)
+#define CCB_CM0			(0x0000000000000004)
+#define CCB_CM12		(0x0000000000000003)
+
+#define CCB_CM0_ALL_COMPLETIONS	(0x0)
+#define CCB_CM0_LAST_IN_CHAIN	(0x4)
+#define CCB_CM12_STORE		(0x0)
+#define CCB_CM12_INTERRUPT	(0x1)
+
+#define CCB_SIZE		(0x10)
+#define CCB_ALIGN		CCB_SIZE
+
+struct coprocessor_completion_block {
+	__be64 value;
+	__be64 address;
+} __packed __aligned(CCB_ALIGN);
+
+
+/* Chapter 6.5.7 Coprocessor-Status Block (CSB) */
+
+#define CSB_V			(0x80)
+#define CSB_F			(0x04)
+#define CSB_CH			(0x03)
+#define CSB_CE_INCOMPLETE	(0x80)
+#define CSB_CE_TERMINATION	(0x40)
+#define CSB_CE_TPBC		(0x20)
+
+#define CSB_CC_SUCCESS		(0)
+#define CSB_CC_INVALID_ALIGN	(1)
+#define CSB_CC_OPERAND_OVERLAP	(2)
+#define CSB_CC_DATA_LENGTH	(3)
+#define CSB_CC_TRANSLATION	(5)
+#define CSB_CC_PROTECTION	(6)
+#define CSB_CC_RD_EXTERNAL	(7)
+#define CSB_CC_INVALID_OPERAND	(8)
+#define CSB_CC_PRIVILEGE	(9)
+#define CSB_CC_INTERNAL		(10)
+#define CSB_CC_WR_EXTERNAL	(12)
+#define CSB_CC_NOSPC		(13)
+#define CSB_CC_EXCESSIVE_DDE	(14)
+#define CSB_CC_WR_TRANSLATION	(15)
+#define CSB_CC_WR_PROTECTION	(16)
+#define CSB_CC_UNKNOWN_CODE	(17)
+#define CSB_CC_ABORT		(18)
+#define CSB_CC_TRANSPORT	(20)
+#define CSB_CC_SEGMENTED_DDL	(31)
+#define CSB_CC_PROGRESS_POINT	(32)
+#define CSB_CC_DDE_OVERFLOW	(33)
+#define CSB_CC_SESSION		(34)
+#define CSB_CC_PROVISION	(36)
+#define CSB_CC_CHAIN		(37)
+#define CSB_CC_SEQUENCE		(38)
+#define CSB_CC_HW		(39)
+
+#define CSB_SIZE		(0x10)
+#define CSB_ALIGN		CSB_SIZE
+
+struct coprocessor_status_block {
+	u8 flags;
+	u8 cs;
+	u8 cc;
+	u8 ce;
+	__be32 count;
+	__be64 address;
+} __packed __aligned(CSB_ALIGN);
+
+
+/* Chapter 6.5.10 Data-Descriptor List (DDL)
+ * each list contains one or more Data-Descriptor Entries (DDE)
+ */
+
+#define DDE_P			(0x8000)
+
+#define DDE_SIZE		(0x10)
+#define DDE_ALIGN		DDE_SIZE
+
+struct data_descriptor_entry {
+	__be16 flags;
+	u8 count;
+	u8 index;
+	__be32 length;
+	__be64 address;
+} __packed __aligned(DDE_ALIGN);
+
+
+/* Chapter 6.5.2 Coprocessor-Request Block (CRB) */
+
+#define CRB_SIZE		(0x80)
+#define CRB_ALIGN		(0x100) /* Errata: requires 256 alignment */
+
+/* Coprocessor Status Block field
+ *   ADDRESS	address of CSB
+ *   C		CCB is valid
+ *   AT		0 = addrs are virtual, 1 = addrs are phys
+ *   M		enable perf monitor
+ */
+#define CRB_CSB_ADDRESS		(0xfffffffffffffff0)
+#define CRB_CSB_C		(0x0000000000000008)
+#define CRB_CSB_AT		(0x0000000000000002)
+#define CRB_CSB_M		(0x0000000000000001)
+
+struct coprocessor_request_block {
+	__be32 ccw;
+	__be32 flags;
+	__be64 csb_addr;
+
+	struct data_descriptor_entry source;
+	struct data_descriptor_entry target;
+
+	struct coprocessor_completion_block ccb;
+
+	u8 reserved[48];
+
+	struct coprocessor_status_block csb;
+} __packed __aligned(CRB_ALIGN);
+
+
+/* RFC02167 Initiate Coprocessor Instructions document
+ * Chapter 8.2.1.1.1 RS
+ * Chapter 8.2.3 Coprocessor Directive
+ * Chapter 8.2.4 Execution
+ *
+ * The CCW must be converted to BE before passing to icswx()
+ */
+
+#define CCW_PS			(0xff000000)
+#define CCW_CT			(0x00ff0000)
+#define CCW_CD			(0x0000ffff)
+#define CCW_CL			(0x0000c000)
+
+
+/* RFC02167 Initiate Coprocessor Instructions document
+ * Chapter 8.2.1 Initiate Coprocessor Store Word Indexed (ICSWX)
+ * Chapter 8.2.4.1 Condition Register 0
+ */
+
+#define ICSWX_INITIATED		(0x8)
+#define ICSWX_BUSY		(0x4)
+#define ICSWX_REJECTED		(0x2)
+
+static inline int icswx(__be32 ccw, struct coprocessor_request_block *crb)
+{
+	__be64 ccw_reg = ccw;
+	u32 cr;
+
+	__asm__ __volatile__(
+	PPC_ICSWX(%1,0,%2) "\n"
+	"mfcr %0\n"
+	: "=r" (cr)
+	: "r" (ccw_reg), "r" (crb)
+	: "cr0", "memory");
+
+	return (int)((cr >> 28) & 0xf);
+}
+
+
+#endif /* _ARCH_POWERPC_INCLUDE_ASM_ICSWX_H_ */
diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index 5c93f69..8452335 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -136,6 +136,8 @@
 #define PPC_INST_DCBAL			0x7c2005ec
 #define PPC_INST_DCBZL			0x7c2007ec
 #define PPC_INST_ICBT			0x7c00002c
+#define PPC_INST_ICSWX			0x7c00032d
+#define PPC_INST_ICSWEPX		0x7c00076d
 #define PPC_INST_ISEL			0x7c00001e
 #define PPC_INST_ISEL_MASK		0xfc00003e
 #define PPC_INST_LDARX			0x7c0000a8
@@ -403,4 +405,15 @@
 #define MFTMR(tmr, r)		stringify_in_c(.long PPC_INST_MFTMR | \
 					       TMRN(tmr) | ___PPC_RT(r))
 
+/* Coprocessor instructions */
+#define PPC_ICSWX(s, a, b)	stringify_in_c(.long PPC_INST_ICSWX |	\
+					       ___PPC_RS(s) |		\
+					       ___PPC_RA(a) |		\
+					       ___PPC_RB(b))
+#define PPC_ICSWEPX(s, a, b)	stringify_in_c(.long PPC_INST_ICSWEPX | \
+					       ___PPC_RS(s) |		\
+					       ___PPC_RA(a) |		\
+					       ___PPC_RB(b))
+
+
 #endif /* _ASM_POWERPC_PPC_OPCODE_H */
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 03/10] lib: add software 842 compression/decompression
  2015-05-07 17:49   ` [PATCHv3 00/10] add 842 hw compression for PowerNV platform Dan Streetman
  2015-05-07 17:49     ` [PATCH 01/10] powerpc: export of_get_ibm_chip_id function Dan Streetman
  2015-05-07 17:49     ` [PATCH 02/10] powerpc: Add ICSWX instruction Dan Streetman
@ 2015-05-07 17:49     ` Dan Streetman
  2015-05-07 17:49     ` [PATCH 04/10] crypto: change 842 alg to use software Dan Streetman
                       ` (7 subsequent siblings)
  10 siblings, 0 replies; 46+ messages in thread
From: Dan Streetman @ 2015-05-07 17:49 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras
  Cc: Seth Jennings, Robert Jennings, Andrew Morton, Greg KH,
	linux-crypto, linuxppc-dev, linux-kernel, Dan Streetman

Add 842-format software compression and decompression functions.
Update the MAINTAINERS 842 section to include the new files.

The 842 compression function can compress any input data into the 842
compression format.  The 842 decompression function can decompress any
standard-format 842 compressed data - specifically, either a compressed
data buffer created by the 842 software compression function, or a
compressed data buffer created by the 842 hardware compressor (located
in PowerPC coprocessors).

The 842 compressed data format is explained in the header comments.

This is used in a later patch to provide a full software 842 compression
and decompression crypto interface.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
---
 MAINTAINERS              |   2 +
 include/linux/sw842.h    |  12 +
 lib/842/842.h            | 127 ++++++++++
 lib/842/842_compress.c   | 626 +++++++++++++++++++++++++++++++++++++++++++++++
 lib/842/842_debugfs.h    |  52 ++++
 lib/842/842_decompress.c | 405 ++++++++++++++++++++++++++++++
 lib/842/Makefile         |   2 +
 lib/Kconfig              |   6 +
 lib/Makefile             |   2 +
 9 files changed, 1234 insertions(+)
 create mode 100644 include/linux/sw842.h
 create mode 100644 lib/842/842.h
 create mode 100644 lib/842/842_compress.c
 create mode 100644 lib/842/842_debugfs.h
 create mode 100644 lib/842/842_decompress.c
 create mode 100644 lib/842/Makefile

diff --git a/MAINTAINERS b/MAINTAINERS
index 781e099..116af01 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4872,6 +4872,8 @@ M:	Dan Streetman <ddstreet@us.ibm.com>
 S:	Supported
 F:	drivers/crypto/nx/nx-842.c
 F:	include/linux/nx842.h
+F:	include/linux/sw842.h
+F:	lib/842/
 
 IBM Power Linux RAID adapter
 M:	Brian King <brking@us.ibm.com>
diff --git a/include/linux/sw842.h b/include/linux/sw842.h
new file mode 100644
index 0000000..109ba04
--- /dev/null
+++ b/include/linux/sw842.h
@@ -0,0 +1,12 @@
+#ifndef __SW842_H__
+#define __SW842_H__
+
+#define SW842_MEM_COMPRESS	(0xf000)
+
+int sw842_compress(const u8 *src, unsigned int srclen,
+		   u8 *dst, unsigned int *destlen, void *wmem);
+
+int sw842_decompress(const u8 *src, unsigned int srclen,
+		     u8 *dst, unsigned int *destlen);
+
+#endif
diff --git a/lib/842/842.h b/lib/842/842.h
new file mode 100644
index 0000000..7c20003
--- /dev/null
+++ b/lib/842/842.h
@@ -0,0 +1,127 @@
+
+#ifndef __842_H__
+#define __842_H__
+
+/* The 842 compressed format is made up of multiple blocks, each of
+ * which have the format:
+ *
+ * <template>[arg1][arg2][arg3][arg4]
+ *
+ * where there are between 0 and 4 template args, depending on the specific
+ * template operation.  For normal operations, each arg is either a specific
+ * number of data bytes to add to the output buffer, or an index pointing
+ * to a previously-written number of data bytes to copy to the output buffer.
+ *
+ * The template code is a 5-bit value.  This code indicates what to do with
+ * the following data.  Template codes from 0 to 0x19 should use the template
+ * table, the static "decomp_ops" table used in decompress.  For each template
+ * (table row), there are between 1 and 4 actions; each action corresponds to
+ * an arg following the template code bits.  Each action is either a "data"
+ * type action, or a "index" type action, and each action results in 2, 4, or 8
+ * bytes being written to the output buffer.  Each template (i.e. all actions
+ * in the table row) will add up to 8 bytes being written to the output buffer.
+ * Any row with less than 4 actions is padded with noop actions, indicated by
+ * N0 (for which there is no corresponding arg in the compressed data buffer).
+ *
+ * "Data" actions, indicated in the table by D2, D4, and D8, mean that the
+ * corresponding arg is 2, 4, or 8 bytes, respectively, in the compressed data
+ * buffer should be copied directly to the output buffer.
+ *
+ * "Index" actions, indicated in the table by I2, I4, and I8, mean the
+ * corresponding arg is an index parameter that points to, respectively, a 2,
+ * 4, or 8 byte value already in the output buffer, that should be copied to
+ * the end of the output buffer.  Essentially, the index points to a position
+ * in a ring buffer that contains the last N bytes of output buffer data.
+ * The number of bits for each index's arg are: 8 bits for I2, 9 bits for I4,
+ * and 8 bits for I8.  Since each index points to a 2, 4, or 8 byte section,
+ * this means that I2 can reference 512 bytes ((2^8 bits = 256) * 2 bytes), I4
+ * can reference 2048 bytes ((2^9 = 512) * 4 bytes), and I8 can reference 2048
+ * bytes ((2^8 = 256) * 8 bytes).  Think of it as a kind-of ring buffer for
+ * each of I2, I4, and I8 that are updated for each byte written to the output
+ * buffer.  In this implementation, the output buffer is directly used for each
+ * index; there is no additional memory required.  Note that the index is into
+ * a ring buffer, not a sliding window; for example, if there have been 260
+ * bytes written to the output buffer, an I2 index of 0 would index to byte 256
+ * in the output buffer, while an I2 index of 16 would index to byte 16 in the
+ * output buffer.
+ *
+ * There are also 3 special template codes; 0x1b for "repeat", 0x1c for
+ * "zeros", and 0x1e for "end".  The "repeat" operation is followed by a 6 bit
+ * arg N indicating how many times to repeat.  The last 8 bytes written to the
+ * output buffer are written again to the output buffer, N + 1 times.  The
+ * "zeros" operation, which has no arg bits, writes 8 zeros to the output
+ * buffer.  The "end" operation, which also has no arg bits, signals the end
+ * of the compressed data.  There may be some number of padding (don't care,
+ * but usually 0) bits after the "end" operation bits, to fill the buffer
+ * length to a specific byte multiple (usually a multiple of 8, 16, or 32
+ * bytes).
+ *
+ * This software implementation also uses one of the undefined template values,
+ * 0x1d as a special "short data" template code, to represent less than 8 bytes
+ * of uncompressed data.  It is followed by a 3 bit arg N indicating how many
+ * data bytes will follow, and then N bytes of data, which should be copied to
+ * the output buffer.  This allows the software 842 compressor to accept input
+ * buffers that are not an exact multiple of 8 bytes long.  However, those
+ * compressed buffers containing this sw-only template will be rejected by
+ * the 842 hardware decompressor, and must be decompressed with this software
+ * library.  The 842 software compression module includes a parameter to
+ * disable using this sw-only "short data" template, and instead simply
+ * reject any input buffer that is not a multiple of 8 bytes long.
+ *
+ * After all actions for each operation code are processed, another template
+ * code is in the next 5 bits.  The decompression ends once the "end" template
+ * code is detected.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/bitops.h>
+#include <asm/unaligned.h>
+
+#include <linux/sw842.h>
+
+/* special templates */
+#define OP_REPEAT	(0x1B)
+#define OP_ZEROS	(0x1C)
+#define OP_END		(0x1E)
+
+/* sw only template - this is not in the hw design; it's used only by this
+ * software compressor and decompressor, to allow input buffers that aren't
+ * a multiple of 8.
+ */
+#define OP_SHORT_DATA	(0x1D)
+
+/* additional bits of each op param */
+#define OP_BITS		(5)
+#define REPEAT_BITS	(6)
+#define SHORT_DATA_BITS	(3)
+#define I2_BITS		(8)
+#define I4_BITS		(9)
+#define I8_BITS		(8)
+
+#define REPEAT_BITS_MAX		(0x3f)
+#define SHORT_DATA_BITS_MAX	(0x7)
+
+/* Arbitrary values used to indicate action */
+#define OP_ACTION	(0x70)
+#define OP_ACTION_INDEX	(0x10)
+#define OP_ACTION_DATA	(0x20)
+#define OP_ACTION_NOOP	(0x40)
+#define OP_AMOUNT	(0x0f)
+#define OP_AMOUNT_0	(0x00)
+#define OP_AMOUNT_2	(0x02)
+#define OP_AMOUNT_4	(0x04)
+#define OP_AMOUNT_8	(0x08)
+
+#define D2		(OP_ACTION_DATA  | OP_AMOUNT_2)
+#define D4		(OP_ACTION_DATA  | OP_AMOUNT_4)
+#define D8		(OP_ACTION_DATA  | OP_AMOUNT_8)
+#define I2		(OP_ACTION_INDEX | OP_AMOUNT_2)
+#define I4		(OP_ACTION_INDEX | OP_AMOUNT_4)
+#define I8		(OP_ACTION_INDEX | OP_AMOUNT_8)
+#define N0		(OP_ACTION_NOOP  | OP_AMOUNT_0)
+
+/* the max of the regular templates - not including the special templates */
+#define OPS_MAX		(0x1a)
+
+#endif
diff --git a/lib/842/842_compress.c b/lib/842/842_compress.c
new file mode 100644
index 0000000..7ce6894
--- /dev/null
+++ b/lib/842/842_compress.c
@@ -0,0 +1,626 @@
+/*
+ * 842 Software Compression
+ *
+ * Copyright (C) 2015 Dan Streetman, IBM Corp
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * See 842.h for details of the 842 compressed format.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#define MODULE_NAME "842_compress"
+
+#include <linux/hashtable.h>
+
+#include "842.h"
+#include "842_debugfs.h"
+
+#define SW842_HASHTABLE8_BITS	(10)
+#define SW842_HASHTABLE4_BITS	(11)
+#define SW842_HASHTABLE2_BITS	(10)
+
+/* By default, we allow compressing input buffers of any length, but we must
+ * use the non-standard "short data" template so the decompressor can correctly
+ * reproduce the uncompressed data buffer at the right length.  However the
+ * hardware 842 compressor will not recognize the "short data" template, and
+ * will fail to decompress any compressed buffer containing it (I have no idea
+ * why anyone would want to use software to compress and hardware to decompress
+ * but that's beside the point).  This parameter forces the compression
+ * function to simply reject any input buffer that isn't a multiple of 8 bytes
+ * long, instead of using the "short data" template, so that all compressed
+ * buffers produced by this function will be decompressable by the 842 hardware
+ * decompressor.  Unless you have a specific need for that, leave this disabled
+ * so that any length buffer can be compressed.
+ */
+static bool sw842_strict;
+module_param_named(strict, sw842_strict, bool, 0644);
+
+static u8 comp_ops[OPS_MAX][5] = { /* params size in bits */
+	{ I8, N0, N0, N0, 0x19 }, /* 8 */
+	{ I4, I4, N0, N0, 0x18 }, /* 18 */
+	{ I4, I2, I2, N0, 0x17 }, /* 25 */
+	{ I2, I2, I4, N0, 0x13 }, /* 25 */
+	{ I2, I2, I2, I2, 0x12 }, /* 32 */
+	{ I4, I2, D2, N0, 0x16 }, /* 33 */
+	{ I4, D2, I2, N0, 0x15 }, /* 33 */
+	{ I2, D2, I4, N0, 0x0e }, /* 33 */
+	{ D2, I2, I4, N0, 0x09 }, /* 33 */
+	{ I2, I2, I2, D2, 0x11 }, /* 40 */
+	{ I2, I2, D2, I2, 0x10 }, /* 40 */
+	{ I2, D2, I2, I2, 0x0d }, /* 40 */
+	{ D2, I2, I2, I2, 0x08 }, /* 40 */
+	{ I4, D4, N0, N0, 0x14 }, /* 41 */
+	{ D4, I4, N0, N0, 0x04 }, /* 41 */
+	{ I2, I2, D4, N0, 0x0f }, /* 48 */
+	{ I2, D2, I2, D2, 0x0c }, /* 48 */
+	{ I2, D4, I2, N0, 0x0b }, /* 48 */
+	{ D2, I2, I2, D2, 0x07 }, /* 48 */
+	{ D2, I2, D2, I2, 0x06 }, /* 48 */
+	{ D4, I2, I2, N0, 0x03 }, /* 48 */
+	{ I2, D2, D4, N0, 0x0a }, /* 56 */
+	{ D2, I2, D4, N0, 0x05 }, /* 56 */
+	{ D4, I2, D2, N0, 0x02 }, /* 56 */
+	{ D4, D2, I2, N0, 0x01 }, /* 56 */
+	{ D8, N0, N0, N0, 0x00 }, /* 64 */
+};
+
+struct sw842_hlist_node8 {
+	struct hlist_node node;
+	u64 data;
+	u8 index;
+};
+
+struct sw842_hlist_node4 {
+	struct hlist_node node;
+	u32 data;
+	u16 index;
+};
+
+struct sw842_hlist_node2 {
+	struct hlist_node node;
+	u16 data;
+	u8 index;
+};
+
+#define INDEX_NOT_FOUND		(-1)
+#define INDEX_NOT_CHECKED	(-2)
+
+struct sw842_param {
+	u8 *in;
+	u8 *instart;
+	u64 ilen;
+	u8 *out;
+	u64 olen;
+	u8 bit;
+	u64 data8[1];
+	u32 data4[2];
+	u16 data2[4];
+	int index8[1];
+	int index4[2];
+	int index2[4];
+	DECLARE_HASHTABLE(htable8, SW842_HASHTABLE8_BITS);
+	DECLARE_HASHTABLE(htable4, SW842_HASHTABLE4_BITS);
+	DECLARE_HASHTABLE(htable2, SW842_HASHTABLE2_BITS);
+	struct sw842_hlist_node8 node8[1 << I8_BITS];
+	struct sw842_hlist_node4 node4[1 << I4_BITS];
+	struct sw842_hlist_node2 node2[1 << I2_BITS];
+};
+
+#define get_input_data(p, o, b)						\
+	be##b##_to_cpu(get_unaligned((__be##b *)((p)->in + (o))))
+
+#define init_hashtable_nodes(p, b)	do {			\
+	int _i;							\
+	hash_init((p)->htable##b);				\
+	for (_i = 0; _i < ARRAY_SIZE((p)->node##b); _i++) {	\
+		(p)->node##b[_i].index = _i;			\
+		(p)->node##b[_i].data = 0;			\
+		INIT_HLIST_NODE(&(p)->node##b[_i].node);	\
+	}							\
+} while (0)
+
+#define find_index(p, b, n)	({					\
+	struct sw842_hlist_node##b *_n;					\
+	p->index##b[n] = INDEX_NOT_FOUND;				\
+	hash_for_each_possible(p->htable##b, _n, node, p->data##b[n]) {	\
+		if (p->data##b[n] == _n->data) {			\
+			p->index##b[n] = _n->index;			\
+			break;						\
+		}							\
+	}								\
+	p->index##b[n] >= 0;						\
+})
+
+#define check_index(p, b, n)			\
+	((p)->index##b[n] == INDEX_NOT_CHECKED	\
+	 ? find_index(p, b, n)			\
+	 : (p)->index##b[n] >= 0)
+
+#define replace_hash(p, b, i, d)	do {				\
+	struct sw842_hlist_node##b *_n = &(p)->node##b[(i)+(d)];	\
+	hash_del(&_n->node);						\
+	_n->data = (p)->data##b[d];					\
+	pr_debug("add hash index%x %x pos %x data %lx\n", b,		\
+		 (unsigned int)_n->index,				\
+		 (unsigned int)((p)->in - (p)->instart),		\
+		 (unsigned long)_n->data);				\
+	hash_add((p)->htable##b, &_n->node, _n->data);			\
+} while (0)
+
+static u8 bmask[8] = { 0x00, 0x80, 0xc0, 0xe0, 0xf0, 0xf8, 0xfc, 0xfe };
+
+static int add_bits(struct sw842_param *p, u64 d, u8 n);
+
+static int __split_add_bits(struct sw842_param *p, u64 d, u8 n, u8 s)
+{
+	int ret;
+
+	if (n <= s)
+		return -EINVAL;
+
+	ret = add_bits(p, d >> s, n - s);
+	if (ret)
+		return ret;
+	return add_bits(p, d & GENMASK_ULL(s - 1, 0), s);
+}
+
+static int add_bits(struct sw842_param *p, u64 d, u8 n)
+{
+	int b = p->bit, bits = b + n, s = round_up(bits, 8) - bits;
+	u64 o;
+	u8 *out = p->out;
+
+	pr_debug("add %u bits %lx\n", (unsigned char)n, (unsigned long)d);
+
+	if (n > 64)
+		return -EINVAL;
+
+	/* split this up if writing to > 8 bytes (i.e. n == 64 && p->bit > 0),
+	 * or if we're at the end of the output buffer and would write past end
+	 */
+	if (bits > 64)
+		return __split_add_bits(p, d, n, 32);
+	else if (p->olen < 8 && bits > 32 && bits <= 56)
+		return __split_add_bits(p, d, n, 16);
+	else if (p->olen < 4 && bits > 16 && bits <= 24)
+		return __split_add_bits(p, d, n, 8);
+
+	if (DIV_ROUND_UP(bits, 8) > p->olen)
+		return -ENOSPC;
+
+	o = *out & bmask[b];
+	d <<= s;
+
+	if (bits <= 8)
+		*out = o | d;
+	else if (bits <= 16)
+		put_unaligned(cpu_to_be16(o << 8 | d), (__be16 *)out);
+	else if (bits <= 24)
+		put_unaligned(cpu_to_be32(o << 24 | d << 8), (__be32 *)out);
+	else if (bits <= 32)
+		put_unaligned(cpu_to_be32(o << 24 | d), (__be32 *)out);
+	else if (bits <= 40)
+		put_unaligned(cpu_to_be64(o << 56 | d << 24), (__be64 *)out);
+	else if (bits <= 48)
+		put_unaligned(cpu_to_be64(o << 56 | d << 16), (__be64 *)out);
+	else if (bits <= 56)
+		put_unaligned(cpu_to_be64(o << 56 | d << 8), (__be64 *)out);
+	else
+		put_unaligned(cpu_to_be64(o << 56 | d), (__be64 *)out);
+
+	p->bit += n;
+
+	if (p->bit > 7) {
+		p->out += p->bit / 8;
+		p->olen -= p->bit / 8;
+		p->bit %= 8;
+	}
+
+	return 0;
+}
+
+static int add_template(struct sw842_param *p, u8 c)
+{
+	int ret, i, b = 0;
+	u8 *t = comp_ops[c];
+	bool inv = false;
+
+	if (c >= OPS_MAX)
+		return -EINVAL;
+
+	pr_debug("template %x\n", t[4]);
+
+	ret = add_bits(p, t[4], OP_BITS);
+	if (ret)
+		return ret;
+
+	for (i = 0; i < 4; i++) {
+		pr_debug("op %x\n", t[i]);
+
+		switch (t[i] & OP_AMOUNT) {
+		case OP_AMOUNT_8:
+			if (b)
+				inv = true;
+			else if (t[i] & OP_ACTION_INDEX)
+				ret = add_bits(p, p->index8[0], I8_BITS);
+			else if (t[i] & OP_ACTION_DATA)
+				ret = add_bits(p, p->data8[0], 64);
+			else
+				inv = true;
+			break;
+		case OP_AMOUNT_4:
+			if (b == 2 && t[i] & OP_ACTION_DATA)
+				ret = add_bits(p, get_input_data(p, 2, 32), 32);
+			else if (b != 0 && b != 4)
+				inv = true;
+			else if (t[i] & OP_ACTION_INDEX)
+				ret = add_bits(p, p->index4[b >> 2], I4_BITS);
+			else if (t[i] & OP_ACTION_DATA)
+				ret = add_bits(p, p->data4[b >> 2], 32);
+			else
+				inv = true;
+			break;
+		case OP_AMOUNT_2:
+			if (b != 0 && b != 2 && b != 4 && b != 6)
+				inv = true;
+			if (t[i] & OP_ACTION_INDEX)
+				ret = add_bits(p, p->index2[b >> 1], I2_BITS);
+			else if (t[i] & OP_ACTION_DATA)
+				ret = add_bits(p, p->data2[b >> 1], 16);
+			else
+				inv = true;
+			break;
+		case OP_AMOUNT_0:
+			inv = (b != 8) || !(t[i] & OP_ACTION_NOOP);
+			break;
+		default:
+			inv = true;
+			break;
+		}
+
+		if (ret)
+			return ret;
+
+		if (inv) {
+			pr_err("Invalid templ %x op %d : %x %x %x %x\n",
+			       c, i, t[0], t[1], t[2], t[3]);
+			return -EINVAL;
+		}
+
+		b += t[i] & OP_AMOUNT;
+	}
+
+	if (b != 8) {
+		pr_err("Invalid template %x len %x : %x %x %x %x\n",
+		       c, b, t[0], t[1], t[2], t[3]);
+		return -EINVAL;
+	}
+
+	if (sw842_template_counts)
+		atomic_inc(&template_count[t[4]]);
+
+	return 0;
+}
+
+static int add_repeat_template(struct sw842_param *p, u8 r)
+{
+	int ret;
+
+	/* repeat param is 0-based */
+	if (!r || --r > REPEAT_BITS_MAX)
+		return -EINVAL;
+
+	ret = add_bits(p, OP_REPEAT, OP_BITS);
+	if (ret)
+		return ret;
+
+	ret = add_bits(p, r, REPEAT_BITS);
+	if (ret)
+		return ret;
+
+	if (sw842_template_counts)
+		atomic_inc(&template_repeat_count);
+
+	return 0;
+}
+
+static int add_short_data_template(struct sw842_param *p, u8 b)
+{
+	int ret, i;
+
+	if (!b || b > SHORT_DATA_BITS_MAX)
+		return -EINVAL;
+
+	ret = add_bits(p, OP_SHORT_DATA, OP_BITS);
+	if (ret)
+		return ret;
+
+	ret = add_bits(p, b, SHORT_DATA_BITS);
+	if (ret)
+		return ret;
+
+	for (i = 0; i < b; i++) {
+		ret = add_bits(p, p->in[i], 8);
+		if (ret)
+			return ret;
+	}
+
+	if (sw842_template_counts)
+		atomic_inc(&template_short_data_count);
+
+	return 0;
+}
+
+static int add_zeros_template(struct sw842_param *p)
+{
+	int ret = add_bits(p, OP_ZEROS, OP_BITS);
+
+	if (ret)
+		return ret;
+
+	if (sw842_template_counts)
+		atomic_inc(&template_zeros_count);
+
+	return 0;
+}
+
+static int add_end_template(struct sw842_param *p)
+{
+	int ret = add_bits(p, OP_END, OP_BITS);
+
+	if (ret)
+		return ret;
+
+	if (sw842_template_counts)
+		atomic_inc(&template_end_count);
+
+	return 0;
+}
+
+static bool check_template(struct sw842_param *p, u8 c)
+{
+	u8 *t = comp_ops[c];
+	int i, match, b = 0;
+
+	if (c >= OPS_MAX)
+		return false;
+
+	for (i = 0; i < 4; i++) {
+		if (t[i] & OP_ACTION_INDEX) {
+			if (t[i] & OP_AMOUNT_2)
+				match = check_index(p, 2, b >> 1);
+			else if (t[i] & OP_AMOUNT_4)
+				match = check_index(p, 4, b >> 2);
+			else if (t[i] & OP_AMOUNT_8)
+				match = check_index(p, 8, 0);
+			else
+				return false;
+			if (!match)
+				return false;
+		}
+
+		b += t[i] & OP_AMOUNT;
+	}
+
+	return true;
+}
+
+static void get_next_data(struct sw842_param *p)
+{
+	p->data8[0] = get_input_data(p, 0, 64);
+	p->data4[0] = get_input_data(p, 0, 32);
+	p->data4[1] = get_input_data(p, 4, 32);
+	p->data2[0] = get_input_data(p, 0, 16);
+	p->data2[1] = get_input_data(p, 2, 16);
+	p->data2[2] = get_input_data(p, 4, 16);
+	p->data2[3] = get_input_data(p, 6, 16);
+}
+
+/* update the hashtable entries.
+ * only call this after finding/adding the current template
+ * the dataN fields for the current 8 byte block must be already updated
+ */
+static void update_hashtables(struct sw842_param *p)
+{
+	u64 pos = p->in - p->instart;
+	u64 n8 = (pos >> 3) % (1 << I8_BITS);
+	u64 n4 = (pos >> 2) % (1 << I4_BITS);
+	u64 n2 = (pos >> 1) % (1 << I2_BITS);
+
+	replace_hash(p, 8, n8, 0);
+	replace_hash(p, 4, n4, 0);
+	replace_hash(p, 4, n4, 1);
+	replace_hash(p, 2, n2, 0);
+	replace_hash(p, 2, n2, 1);
+	replace_hash(p, 2, n2, 2);
+	replace_hash(p, 2, n2, 3);
+}
+
+/* find the next template to use, and add it
+ * the p->dataN fields must already be set for the current 8 byte block
+ */
+static int process_next(struct sw842_param *p)
+{
+	int ret, i;
+
+	p->index8[0] = INDEX_NOT_CHECKED;
+	p->index4[0] = INDEX_NOT_CHECKED;
+	p->index4[1] = INDEX_NOT_CHECKED;
+	p->index2[0] = INDEX_NOT_CHECKED;
+	p->index2[1] = INDEX_NOT_CHECKED;
+	p->index2[2] = INDEX_NOT_CHECKED;
+	p->index2[3] = INDEX_NOT_CHECKED;
+
+	/* check up to OPS_MAX - 1; last op is our fallback */
+	for (i = 0; i < OPS_MAX - 1; i++) {
+		if (check_template(p, i))
+			break;
+	}
+
+	ret = add_template(p, i);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
+/**
+ * sw842_compress
+ *
+ * Compress the uncompressed buffer of length @ilen at @in to the output buffer
+ * @out, using no more than @olen bytes, using the 842 compression format.
+ *
+ * Returns: 0 on success, error on failure.  The @olen parameter
+ * will contain the number of output bytes written on success, or
+ * 0 on error.
+ */
+int sw842_compress(const u8 *in, unsigned int ilen,
+		   u8 *out, unsigned int *olen, void *wmem)
+{
+	struct sw842_param *p = (struct sw842_param *)wmem;
+	int ret;
+	u64 last, next, pad, total;
+	u8 repeat_count = 0;
+
+	BUILD_BUG_ON(sizeof(*p) > SW842_MEM_COMPRESS);
+
+	init_hashtable_nodes(p, 8);
+	init_hashtable_nodes(p, 4);
+	init_hashtable_nodes(p, 2);
+
+	p->in = (u8 *)in;
+	p->instart = p->in;
+	p->ilen = ilen;
+	p->out = out;
+	p->olen = *olen;
+	p->bit = 0;
+
+	total = p->olen;
+
+	*olen = 0;
+
+	/* if using strict mode, we can only compress a multiple of 8 */
+	if (sw842_strict && (ilen % 8)) {
+		pr_err("Using strict mode, can't compress len %d\n", ilen);
+		return -EINVAL;
+	}
+
+	/* let's compress at least 8 bytes, mkay? */
+	if (unlikely(ilen < 8))
+		goto skip_comp;
+
+	/* make initial 'last' different so we don't match the first time */
+	last = ~get_unaligned((u64 *)p->in);
+
+	while (p->ilen > 7) {
+		next = get_unaligned((u64 *)p->in);
+
+		/* must get the next data, as we need to update the hashtable
+		 * entries with the new data every time
+		 */
+		get_next_data(p);
+
+		/* we don't care about endianness in last or next;
+		 * we're just comparing 8 bytes to another 8 bytes,
+		 * they're both the same endianness
+		 */
+		if (next == last) {
+			/* repeat count bits are 0-based, so we stop at +1 */
+			if (++repeat_count <= REPEAT_BITS_MAX)
+				goto repeat;
+		}
+		if (repeat_count) {
+			ret = add_repeat_template(p, repeat_count);
+			repeat_count = 0;
+			if (next == last) /* reached max repeat bits */
+				goto repeat;
+		}
+
+		if (next == 0)
+			ret = add_zeros_template(p);
+		else
+			ret = process_next(p);
+
+		if (ret)
+			return ret;
+
+repeat:
+		last = next;
+		update_hashtables(p);
+		p->in += 8;
+		p->ilen -= 8;
+	}
+
+	if (repeat_count) {
+		ret = add_repeat_template(p, repeat_count);
+		if (ret)
+			return ret;
+	}
+
+skip_comp:
+	if (p->ilen > 0) {
+		ret = add_short_data_template(p, p->ilen);
+		if (ret)
+			return ret;
+
+		p->in += p->ilen;
+		p->ilen = 0;
+	}
+
+	ret = add_end_template(p);
+	if (ret)
+		return ret;
+
+	if (p->bit) {
+		p->out++;
+		p->olen--;
+		p->bit = 0;
+	}
+
+	/* pad compressed length to multiple of 8 */
+	pad = (8 - ((total - p->olen) % 8)) % 8;
+	if (pad) {
+		if (pad > p->olen) /* we were so close! */
+			return -ENOSPC;
+		memset(p->out, 0, pad);
+		p->out += pad;
+		p->olen -= pad;
+	}
+
+	if (unlikely((total - p->olen) > UINT_MAX))
+		return -ENOSPC;
+
+	*olen = total - p->olen;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(sw842_compress);
+
+static int __init sw842_init(void)
+{
+	if (sw842_template_counts)
+		sw842_debugfs_create();
+
+	return 0;
+}
+module_init(sw842_init);
+
+static void __exit sw842_exit(void)
+{
+	if (sw842_template_counts)
+		sw842_debugfs_remove();
+}
+module_exit(sw842_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Software 842 Compressor");
+MODULE_AUTHOR("Dan Streetman <ddstreet@ieee.org>");
diff --git a/lib/842/842_debugfs.h b/lib/842/842_debugfs.h
new file mode 100644
index 0000000..e7f3bff
--- /dev/null
+++ b/lib/842/842_debugfs.h
@@ -0,0 +1,52 @@
+
+#ifndef __842_DEBUGFS_H__
+#define __842_DEBUGFS_H__
+
+#include <linux/debugfs.h>
+
+static bool sw842_template_counts;
+module_param_named(template_counts, sw842_template_counts, bool, 0444);
+
+static atomic_t template_count[OPS_MAX], template_repeat_count,
+	template_zeros_count, template_short_data_count, template_end_count;
+
+static struct dentry *sw842_debugfs_root;
+
+static int __init sw842_debugfs_create(void)
+{
+	umode_t m = S_IRUGO | S_IWUSR;
+	int i;
+
+	if (!debugfs_initialized())
+		return -ENODEV;
+
+	sw842_debugfs_root = debugfs_create_dir(MODULE_NAME, NULL);
+	if (IS_ERR(sw842_debugfs_root))
+		return PTR_ERR(sw842_debugfs_root);
+
+	for (i = 0; i < ARRAY_SIZE(template_count); i++) {
+		char name[32];
+
+		snprintf(name, 32, "template_%02x", i);
+		debugfs_create_atomic_t(name, m, sw842_debugfs_root,
+					&template_count[i]);
+	}
+	debugfs_create_atomic_t("template_repeat", m, sw842_debugfs_root,
+				&template_repeat_count);
+	debugfs_create_atomic_t("template_zeros", m, sw842_debugfs_root,
+				&template_zeros_count);
+	debugfs_create_atomic_t("template_short_data", m, sw842_debugfs_root,
+				&template_short_data_count);
+	debugfs_create_atomic_t("template_end", m, sw842_debugfs_root,
+				&template_end_count);
+
+	return 0;
+}
+
+static void __exit sw842_debugfs_remove(void)
+{
+	if (sw842_debugfs_root && !IS_ERR(sw842_debugfs_root))
+		debugfs_remove_recursive(sw842_debugfs_root);
+}
+
+#endif
diff --git a/lib/842/842_decompress.c b/lib/842/842_decompress.c
new file mode 100644
index 0000000..6b2b45a
--- /dev/null
+++ b/lib/842/842_decompress.c
@@ -0,0 +1,405 @@
+/*
+ * 842 Software Decompression
+ *
+ * Copyright (C) 2015 Dan Streetman, IBM Corp
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * See 842.h for details of the 842 compressed format.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#define MODULE_NAME "842_decompress"
+
+#include "842.h"
+#include "842_debugfs.h"
+
+/* rolling fifo sizes */
+#define I2_FIFO_SIZE	(2 * (1 << I2_BITS))
+#define I4_FIFO_SIZE	(4 * (1 << I4_BITS))
+#define I8_FIFO_SIZE	(8 * (1 << I8_BITS))
+
+static u8 decomp_ops[OPS_MAX][4] = {
+	{ D8, N0, N0, N0 },
+	{ D4, D2, I2, N0 },
+	{ D4, I2, D2, N0 },
+	{ D4, I2, I2, N0 },
+	{ D4, I4, N0, N0 },
+	{ D2, I2, D4, N0 },
+	{ D2, I2, D2, I2 },
+	{ D2, I2, I2, D2 },
+	{ D2, I2, I2, I2 },
+	{ D2, I2, I4, N0 },
+	{ I2, D2, D4, N0 },
+	{ I2, D4, I2, N0 },
+	{ I2, D2, I2, D2 },
+	{ I2, D2, I2, I2 },
+	{ I2, D2, I4, N0 },
+	{ I2, I2, D4, N0 },
+	{ I2, I2, D2, I2 },
+	{ I2, I2, I2, D2 },
+	{ I2, I2, I2, I2 },
+	{ I2, I2, I4, N0 },
+	{ I4, D4, N0, N0 },
+	{ I4, D2, I2, N0 },
+	{ I4, I2, D2, N0 },
+	{ I4, I2, I2, N0 },
+	{ I4, I4, N0, N0 },
+	{ I8, N0, N0, N0 }
+};
+
+struct sw842_param {
+	u8 *in;
+	u8 bit;
+	u64 ilen;
+	u8 *out;
+	u8 *ostart;
+	u64 olen;
+};
+
+#define beN_to_cpu(d, s)					\
+	((s) == 2 ? be16_to_cpu(get_unaligned((__be16 *)d)) :	\
+	 (s) == 4 ? be32_to_cpu(get_unaligned((__be32 *)d)) :	\
+	 (s) == 8 ? be64_to_cpu(get_unaligned((__be64 *)d)) :	\
+	 WARN(1, "pr_debug param err invalid size %x\n", s))
+
+static int next_bits(struct sw842_param *p, u64 *d, u8 n);
+
+static int __split_next_bits(struct sw842_param *p, u64 *d, u8 n, u8 s)
+{
+	u64 tmp = 0;
+	int ret;
+
+	if (n <= s) {
+		pr_debug("split_next_bits invalid n %u s %u\n", n, s);
+		return -EINVAL;
+	}
+
+	ret = next_bits(p, &tmp, n - s);
+	if (ret)
+		return ret;
+	ret = next_bits(p, d, s);
+	if (ret)
+		return ret;
+	*d |= tmp << s;
+	return 0;
+}
+
+static int next_bits(struct sw842_param *p, u64 *d, u8 n)
+{
+	u8 *in = p->in, b = p->bit, bits = b + n;
+
+	if (n > 64) {
+		pr_debug("next_bits invalid n %u\n", n);
+		return -EINVAL;
+	}
+
+	/* split this up if reading > 8 bytes, or if we're at the end of
+	 * the input buffer and would read past the end
+	 */
+	if (bits > 64)
+		return __split_next_bits(p, d, n, 32);
+	else if (p->ilen < 8 && bits > 32 && bits <= 56)
+		return __split_next_bits(p, d, n, 16);
+	else if (p->ilen < 4 && bits > 16 && bits <= 24)
+		return __split_next_bits(p, d, n, 8);
+
+	if (DIV_ROUND_UP(bits, 8) > p->ilen)
+		return -EOVERFLOW;
+
+	if (bits <= 8)
+		*d = *in >> (8 - bits);
+	else if (bits <= 16)
+		*d = be16_to_cpu(get_unaligned((__be16 *)in)) >> (16 - bits);
+	else if (bits <= 32)
+		*d = be32_to_cpu(get_unaligned((__be32 *)in)) >> (32 - bits);
+	else
+		*d = be64_to_cpu(get_unaligned((__be64 *)in)) >> (64 - bits);
+
+	*d &= GENMASK_ULL(n - 1, 0);
+
+	p->bit += n;
+
+	if (p->bit > 7) {
+		p->in += p->bit / 8;
+		p->ilen -= p->bit / 8;
+		p->bit %= 8;
+	}
+
+	return 0;
+}
+
+static int do_data(struct sw842_param *p, u8 n)
+{
+	u64 v;
+	int ret;
+
+	if (n > p->olen)
+		return -ENOSPC;
+
+	ret = next_bits(p, &v, n * 8);
+	if (ret)
+		return ret;
+
+	switch (n) {
+	case 2:
+		put_unaligned(cpu_to_be16((u16)v), (__be16 *)p->out);
+		break;
+	case 4:
+		put_unaligned(cpu_to_be32((u32)v), (__be32 *)p->out);
+		break;
+	case 8:
+		put_unaligned(cpu_to_be64((u64)v), (__be64 *)p->out);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	p->out += n;
+	p->olen -= n;
+
+	return 0;
+}
+
+static int __do_index(struct sw842_param *p, u8 size, u8 bits, u64 fsize)
+{
+	u64 index, offset, total = round_down(p->out - p->ostart, 8);
+	int ret;
+
+	ret = next_bits(p, &index, bits);
+	if (ret)
+		return ret;
+
+	offset = index * size;
+
+	/* a ring buffer of fsize is used; correct the offset */
+	if (total > fsize) {
+		/* this is where the current fifo is */
+		u64 section = round_down(total, fsize);
+		/* the current pos in the fifo */
+		u64 pos = total % fsize;
+
+		/* if the offset is past/at the pos, we need to
+		 * go back to the last fifo section
+		 */
+		if (offset >= pos)
+			section -= fsize;
+
+		offset += section;
+	}
+
+	if (offset + size > total) {
+		pr_debug("index%x %lx points past end %lx\n", size,
+			 (unsigned long)offset, (unsigned long)total);
+		return -EINVAL;
+	}
+
+	pr_debug("index%x to %lx off %lx adjoff %lx tot %lx data %lx\n",
+		 size, (unsigned long)index, (unsigned long)(index * size),
+		 (unsigned long)offset, (unsigned long)total,
+		 (unsigned long)beN_to_cpu(&p->ostart[offset], size));
+
+	memcpy(p->out, &p->ostart[offset], size);
+	p->out += size;
+	p->olen -= size;
+
+	return 0;
+}
+
+int do_index(struct sw842_param *p, u8 n)
+{
+	switch (n) {
+	case 2:
+		return __do_index(p, 2, I2_BITS, I2_FIFO_SIZE);
+	case 4:
+		return __do_index(p, 4, I4_BITS, I4_FIFO_SIZE);
+	case 8:
+		return __do_index(p, 8, I8_BITS, I8_FIFO_SIZE);
+	default:
+		return -EINVAL;
+	}
+}
+
+int do_op(struct sw842_param *p, u8 o)
+{
+	int i, ret = 0;
+
+	if (o >= OPS_MAX)
+		return -EINVAL;
+
+	for (i = 0; i < 4; i++) {
+		u8 op = decomp_ops[o][i];
+
+		pr_debug("op is %x\n", op);
+
+		switch (op & OP_ACTION) {
+		case OP_ACTION_DATA:
+			ret = do_data(p, op & OP_AMOUNT);
+			break;
+		case OP_ACTION_INDEX:
+			ret = do_index(p, op & OP_AMOUNT);
+			break;
+		case OP_ACTION_NOOP:
+			break;
+		default:
+			pr_err("Interal error, invalid op %x\n", op);
+			return -EINVAL;
+		}
+
+		if (ret)
+			return ret;
+	}
+
+	if (sw842_template_counts)
+		atomic_inc(&template_count[o]);
+
+	return 0;
+}
+
+/**
+ * sw842_decompress
+ *
+ * Decompress the 842-compressed buffer of length @ilen at @in
+ * to the output buffer @out, using no more than @olen bytes.
+ *
+ * The compressed buffer must be only a single 842-compressed buffer,
+ * with the standard format described in the comments in 842.h
+ * Processing will stop when the 842 "END" template is detected,
+ * not the end of the buffer.
+ *
+ * Returns: 0 on success, error on failure.  The @olen parameter
+ * will contain the number of output bytes written on success, or
+ * 0 on error.
+ */
+int sw842_decompress(const u8 *in, unsigned int ilen,
+		     u8 *out, unsigned int *olen)
+{
+	struct sw842_param p;
+	int ret;
+	u64 op, rep, tmp, bytes, total;
+
+	p.in = (u8 *)in;
+	p.bit = 0;
+	p.ilen = ilen;
+	p.out = out;
+	p.ostart = out;
+	p.olen = *olen;
+
+	total = p.olen;
+
+	*olen = 0;
+
+	do {
+		ret = next_bits(&p, &op, OP_BITS);
+		if (ret)
+			return ret;
+
+		pr_debug("template is %lx\n", (unsigned long)op);
+
+		switch (op) {
+		case OP_REPEAT:
+			ret = next_bits(&p, &rep, REPEAT_BITS);
+			if (ret)
+				return ret;
+
+			if (p.out == out) /* no previous bytes */
+				return -EINVAL;
+
+			/* copy rep + 1 */
+			rep++;
+
+			if (rep * 8 > p.olen)
+				return -ENOSPC;
+
+			while (rep-- > 0) {
+				memcpy(p.out, p.out - 8, 8);
+				p.out += 8;
+				p.olen -= 8;
+			}
+
+			if (sw842_template_counts)
+				atomic_inc(&template_repeat_count);
+
+			break;
+		case OP_ZEROS:
+			if (8 > p.olen)
+				return -ENOSPC;
+
+			memset(p.out, 0, 8);
+			p.out += 8;
+			p.olen -= 8;
+
+			if (sw842_template_counts)
+				atomic_inc(&template_zeros_count);
+
+			break;
+		case OP_SHORT_DATA:
+			ret = next_bits(&p, &bytes, SHORT_DATA_BITS);
+			if (ret)
+				return ret;
+
+			if (!bytes || bytes > SHORT_DATA_BITS_MAX)
+				return -EINVAL;
+
+			while (bytes-- > 0) {
+				ret = next_bits(&p, &tmp, 8);
+				if (ret)
+					return ret;
+				*p.out = (u8)tmp;
+				p.out++;
+				p.olen--;
+			}
+
+			if (sw842_template_counts)
+				atomic_inc(&template_short_data_count);
+
+			break;
+		case OP_END:
+			if (sw842_template_counts)
+				atomic_inc(&template_end_count);
+
+			break;
+		default: /* use template */
+			ret = do_op(&p, op);
+			if (ret)
+				return ret;
+			break;
+		}
+	} while (op != OP_END);
+
+	if (unlikely((total - p.olen) > UINT_MAX))
+		return -ENOSPC;
+
+	*olen = total - p.olen;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(sw842_decompress);
+
+static int __init sw842_init(void)
+{
+	if (sw842_template_counts)
+		sw842_debugfs_create();
+
+	return 0;
+}
+module_init(sw842_init);
+
+static void __exit sw842_exit(void)
+{
+	if (sw842_template_counts)
+		sw842_debugfs_remove();
+}
+module_exit(sw842_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Software 842 Decompressor");
+MODULE_AUTHOR("Dan Streetman <ddstreet@ieee.org>");
diff --git a/lib/842/Makefile b/lib/842/Makefile
new file mode 100644
index 0000000..5d24c0b
--- /dev/null
+++ b/lib/842/Makefile
@@ -0,0 +1,2 @@
+obj-$(CONFIG_842_COMPRESS) += 842_compress.o
+obj-$(CONFIG_842_DECOMPRESS) += 842_decompress.o
diff --git a/lib/Kconfig b/lib/Kconfig
index 601965a..34e332b 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -212,6 +212,12 @@ config RANDOM32_SELFTEST
 #
 # compression support is select'ed if needed
 #
+config 842_COMPRESS
+	tristate
+
+config 842_DECOMPRESS
+	tristate
+
 config ZLIB_INFLATE
 	tristate
 
diff --git a/lib/Makefile b/lib/Makefile
index 6c37933..ff37c8c 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -78,6 +78,8 @@ obj-$(CONFIG_LIBCRC32C)	+= libcrc32c.o
 obj-$(CONFIG_CRC8)	+= crc8.o
 obj-$(CONFIG_GENERIC_ALLOCATOR) += genalloc.o
 
+obj-$(CONFIG_842_COMPRESS) += 842/
+obj-$(CONFIG_842_DECOMPRESS) += 842/
 obj-$(CONFIG_ZLIB_INFLATE) += zlib_inflate/
 obj-$(CONFIG_ZLIB_DEFLATE) += zlib_deflate/
 obj-$(CONFIG_REED_SOLOMON) += reed_solomon/
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 04/10] crypto: change 842 alg to use software
  2015-05-07 17:49   ` [PATCHv3 00/10] add 842 hw compression for PowerNV platform Dan Streetman
                       ` (2 preceding siblings ...)
  2015-05-07 17:49     ` [PATCH 03/10] lib: add software 842 compression/decompression Dan Streetman
@ 2015-05-07 17:49     ` Dan Streetman
  2015-05-07 17:49     ` [PATCH 05/10] drivers/crypto/nx: rename nx-842.c to nx-842-pseries.c Dan Streetman
                       ` (6 subsequent siblings)
  10 siblings, 0 replies; 46+ messages in thread
From: Dan Streetman @ 2015-05-07 17:49 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras
  Cc: Seth Jennings, Robert Jennings, linux-crypto, linuxppc-dev,
	linux-kernel, Dan Streetman

Change the crypto 842 compression alg to use the software 842 compression
and decompression library.  Add the crypto driver_name as "842-generic".
Remove the fallback to LZO compression.

Previously, this crypto compression alg attemped 842 compression using
PowerPC hardware, and fell back to LZO compression and decompression if
the 842 PowerPC hardware was unavailable or failed.  This should not
fall back to any other compression method, however; users of this crypto
compression alg can fallback if desired, and transparent fallback tricks
callers into thinking they are getting 842 compression when they actually
get LZO compression - the failure of the 842 hardware should not be
transparent to the caller.

The crypto compression alg for a hardware device also should not be located
in crypto/ so this is now a software-only implementation that uses the 842
software compression/decompression library.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
---
 MAINTAINERS    |   1 +
 crypto/842.c   | 174 ++++++++++++---------------------------------------------
 crypto/Kconfig |   7 +--
 3 files changed, 41 insertions(+), 141 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 116af01..5a5c1dc 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4873,6 +4873,7 @@ S:	Supported
 F:	drivers/crypto/nx/nx-842.c
 F:	include/linux/nx842.h
 F:	include/linux/sw842.h
+F:	crypto/842.c
 F:	lib/842/
 
 IBM Power Linux RAID adapter
diff --git a/crypto/842.c b/crypto/842.c
index b48f4f1..98e387e 100644
--- a/crypto/842.c
+++ b/crypto/842.c
@@ -1,5 +1,5 @@
 /*
- * Cryptographic API for the 842 compression algorithm.
+ * Cryptographic API for the 842 software compression algorithm.
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -11,173 +11,73 @@
  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  * GNU General Public License for more details.
  *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ * Copyright (C) IBM Corporation, 2011-2015
  *
- * Copyright (C) IBM Corporation, 2011
+ * Original Authors: Robert Jennings <rcj@linux.vnet.ibm.com>
+ *                   Seth Jennings <sjenning@linux.vnet.ibm.com>
  *
- * Authors: Robert Jennings <rcj@linux.vnet.ibm.com>
- *          Seth Jennings <sjenning@linux.vnet.ibm.com>
+ * Rewrite: Dan Streetman <ddstreet@ieee.org>
+ *
+ * This is the software implementation of compression and decompression using
+ * the 842 format.  This uses the software 842 library at lib/842/ which is
+ * only a reference implementation, and is very, very slow as compared to other
+ * software compressors.  You probably do not want to use this software
+ * compression.  If you have access to the PowerPC 842 compression hardware, you
+ * want to use the 842 hardware compression interface, which is at:
+ * drivers/crypto/nx/nx-842-crypto.c
  */
 
 #include <linux/init.h>
 #include <linux/module.h>
 #include <linux/crypto.h>
-#include <linux/vmalloc.h>
-#include <linux/nx842.h>
-#include <linux/lzo.h>
-#include <linux/timer.h>
-
-static int nx842_uselzo;
-
-struct nx842_ctx {
-	void *nx842_wmem; /* working memory for 842/lzo */
-};
+#include <linux/sw842.h>
 
-enum nx842_crypto_type {
-	NX842_CRYPTO_TYPE_842,
-	NX842_CRYPTO_TYPE_LZO
+struct crypto842_ctx {
+	char wmem[SW842_MEM_COMPRESS];	/* working memory for compress */
 };
 
-#define NX842_SENTINEL 0xdeadbeef
-
-struct nx842_crypto_header {
-	unsigned int sentinel; /* debug */
-	enum nx842_crypto_type type;
-};
-
-static int nx842_init(struct crypto_tfm *tfm)
-{
-	struct nx842_ctx *ctx = crypto_tfm_ctx(tfm);
-	int wmemsize;
-
-	wmemsize = max_t(int, nx842_get_workmem_size(), LZO1X_MEM_COMPRESS);
-	ctx->nx842_wmem = kmalloc(wmemsize, GFP_NOFS);
-	if (!ctx->nx842_wmem)
-		return -ENOMEM;
-
-	return 0;
-}
-
-static void nx842_exit(struct crypto_tfm *tfm)
-{
-	struct nx842_ctx *ctx = crypto_tfm_ctx(tfm);
-
-	kfree(ctx->nx842_wmem);
-}
-
-static void nx842_reset_uselzo(unsigned long data)
+static int crypto842_compress(struct crypto_tfm *tfm,
+			      const u8 *src, unsigned int slen,
+			      u8 *dst, unsigned int *dlen)
 {
-	nx842_uselzo = 0;
-}
-
-static DEFINE_TIMER(failover_timer, nx842_reset_uselzo, 0, 0);
-
-static int nx842_crypto_compress(struct crypto_tfm *tfm, const u8 *src,
-			    unsigned int slen, u8 *dst, unsigned int *dlen)
-{
-	struct nx842_ctx *ctx = crypto_tfm_ctx(tfm);
-	struct nx842_crypto_header *hdr;
-	unsigned int tmp_len = *dlen;
-	size_t lzodlen; /* needed for lzo */
-	int err;
-
-	*dlen = 0;
-	hdr = (struct nx842_crypto_header *)dst;
-	hdr->sentinel = NX842_SENTINEL; /* debug */
-	dst += sizeof(struct nx842_crypto_header);
-	tmp_len -= sizeof(struct nx842_crypto_header);
-	lzodlen = tmp_len;
-
-	if (likely(!nx842_uselzo)) {
-		err = nx842_compress(src, slen, dst, &tmp_len, ctx->nx842_wmem);
-
-		if (likely(!err)) {
-			hdr->type = NX842_CRYPTO_TYPE_842;
-			*dlen = tmp_len + sizeof(struct nx842_crypto_header);
-			return 0;
-		}
-
-		/* hardware failed */
-		nx842_uselzo = 1;
+	struct crypto842_ctx *ctx = crypto_tfm_ctx(tfm);
 
-		/* set timer to check for hardware again in 1 second */
-		mod_timer(&failover_timer, jiffies + msecs_to_jiffies(1000));
-	}
-
-	/* no hardware, use lzo */
-	err = lzo1x_1_compress(src, slen, dst, &lzodlen, ctx->nx842_wmem);
-	if (err != LZO_E_OK)
-		return -EINVAL;
-
-	hdr->type = NX842_CRYPTO_TYPE_LZO;
-	*dlen = lzodlen + sizeof(struct nx842_crypto_header);
-	return 0;
+	return sw842_compress(src, slen, dst, dlen, ctx->wmem);
 }
 
-static int nx842_crypto_decompress(struct crypto_tfm *tfm, const u8 *src,
-			      unsigned int slen, u8 *dst, unsigned int *dlen)
+static int crypto842_decompress(struct crypto_tfm *tfm,
+				const u8 *src, unsigned int slen,
+				u8 *dst, unsigned int *dlen)
 {
-	struct nx842_ctx *ctx = crypto_tfm_ctx(tfm);
-	struct nx842_crypto_header *hdr;
-	unsigned int tmp_len = *dlen;
-	size_t lzodlen; /* needed for lzo */
-	int err;
-
-	*dlen = 0;
-	hdr = (struct nx842_crypto_header *)src;
-
-	if (unlikely(hdr->sentinel != NX842_SENTINEL))
-		return -EINVAL;
-
-	src += sizeof(struct nx842_crypto_header);
-	slen -= sizeof(struct nx842_crypto_header);
-
-	if (likely(hdr->type == NX842_CRYPTO_TYPE_842)) {
-		err = nx842_decompress(src, slen, dst, &tmp_len,
-			ctx->nx842_wmem);
-		if (err)
-			return -EINVAL;
-		*dlen = tmp_len;
-	} else if (hdr->type == NX842_CRYPTO_TYPE_LZO) {
-		lzodlen = tmp_len;
-		err = lzo1x_decompress_safe(src, slen, dst, &lzodlen);
-		if (err != LZO_E_OK)
-			return -EINVAL;
-		*dlen = lzodlen;
-	} else
-		return -EINVAL;
-
-	return 0;
+	return sw842_decompress(src, slen, dst, dlen);
 }
 
 static struct crypto_alg alg = {
 	.cra_name		= "842",
+	.cra_driver_name	= "842-generic",
+	.cra_priority		= 100,
 	.cra_flags		= CRYPTO_ALG_TYPE_COMPRESS,
-	.cra_ctxsize		= sizeof(struct nx842_ctx),
+	.cra_ctxsize		= sizeof(struct crypto842_ctx),
 	.cra_module		= THIS_MODULE,
-	.cra_init		= nx842_init,
-	.cra_exit		= nx842_exit,
 	.cra_u			= { .compress = {
-	.coa_compress		= nx842_crypto_compress,
-	.coa_decompress		= nx842_crypto_decompress } }
+	.coa_compress		= crypto842_compress,
+	.coa_decompress		= crypto842_decompress } }
 };
 
-static int __init nx842_mod_init(void)
+static int __init crypto842_mod_init(void)
 {
-	del_timer(&failover_timer);
 	return crypto_register_alg(&alg);
 }
+module_init(crypto842_mod_init);
 
-static void __exit nx842_mod_exit(void)
+static void __exit crypto842_mod_exit(void)
 {
 	crypto_unregister_alg(&alg);
 }
-
-module_init(nx842_mod_init);
-module_exit(nx842_mod_exit);
+module_exit(crypto842_mod_exit);
 
 MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("842 Compression Algorithm");
+MODULE_DESCRIPTION("842 Software Compression Algorithm");
 MODULE_ALIAS_CRYPTO("842");
+MODULE_ALIAS_CRYPTO("842-generic");
+MODULE_AUTHOR("Dan Streetman <ddstreet@ieee.org>");
diff --git a/crypto/Kconfig b/crypto/Kconfig
index 8aaf298..eba55b4 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -1412,10 +1412,9 @@ config CRYPTO_LZO
 
 config CRYPTO_842
 	tristate "842 compression algorithm"
-	depends on CRYPTO_DEV_NX_COMPRESS
-	# 842 uses lzo if the hardware becomes unavailable
-	select LZO_COMPRESS
-	select LZO_DECOMPRESS
+	select CRYPTO_ALGAPI
+	select 842_COMPRESS
+	select 842_DECOMPRESS
 	help
 	  This is the 842 algorithm.
 
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 05/10] drivers/crypto/nx: rename nx-842.c to nx-842-pseries.c
  2015-05-07 17:49   ` [PATCHv3 00/10] add 842 hw compression for PowerNV platform Dan Streetman
                       ` (3 preceding siblings ...)
  2015-05-07 17:49     ` [PATCH 04/10] crypto: change 842 alg to use software Dan Streetman
@ 2015-05-07 17:49     ` Dan Streetman
  2015-05-07 17:49     ` [PATCH 06/10] drivers/crypto/nx: add NX-842 platform frontend driver Dan Streetman
                       ` (5 subsequent siblings)
  10 siblings, 0 replies; 46+ messages in thread
From: Dan Streetman @ 2015-05-07 17:49 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras
  Cc: Seth Jennings, Robert Jennings, linux-crypto, linuxppc-dev,
	linux-kernel, Dan Streetman

Move the entire NX-842 driver for the pSeries platform from the file
nx-842.c to nx-842-pseries.c.  This is required by later patches that
add NX-842 support for the PowerNV platform.

This patch does not alter the content of the pSeries NX-842 driver at
all, it only changes the filename.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
---
 drivers/crypto/nx/Makefile         |    2 +-
 drivers/crypto/nx/nx-842-pseries.c | 1603 ++++++++++++++++++++++++++++++++++++
 drivers/crypto/nx/nx-842.c         | 1603 ------------------------------------
 3 files changed, 1604 insertions(+), 1604 deletions(-)
 create mode 100644 drivers/crypto/nx/nx-842-pseries.c
 delete mode 100644 drivers/crypto/nx/nx-842.c

diff --git a/drivers/crypto/nx/Makefile b/drivers/crypto/nx/Makefile
index bb770ea..8669ffa 100644
--- a/drivers/crypto/nx/Makefile
+++ b/drivers/crypto/nx/Makefile
@@ -11,4 +11,4 @@ nx-crypto-objs := nx.o \
 		  nx-sha512.o
 
 obj-$(CONFIG_CRYPTO_DEV_NX_COMPRESS) += nx-compress.o
-nx-compress-objs := nx-842.o
+nx-compress-objs := nx-842-pseries.o
diff --git a/drivers/crypto/nx/nx-842-pseries.c b/drivers/crypto/nx/nx-842-pseries.c
new file mode 100644
index 0000000..887196e
--- /dev/null
+++ b/drivers/crypto/nx/nx-842-pseries.c
@@ -0,0 +1,1603 @@
+/*
+ * Driver for IBM Power 842 compression accelerator
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright (C) IBM Corporation, 2012
+ *
+ * Authors: Robert Jennings <rcj@linux.vnet.ibm.com>
+ *          Seth Jennings <sjenning@linux.vnet.ibm.com>
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/nx842.h>
+#include <linux/of.h>
+#include <linux/slab.h>
+
+#include <asm/page.h>
+#include <asm/vio.h>
+
+#include "nx_csbcpb.h" /* struct nx_csbcpb */
+
+#define MODULE_NAME "nx-compress"
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Robert Jennings <rcj@linux.vnet.ibm.com>");
+MODULE_DESCRIPTION("842 H/W Compression driver for IBM Power processors");
+
+#define SHIFT_4K 12
+#define SHIFT_64K 16
+#define SIZE_4K (1UL << SHIFT_4K)
+#define SIZE_64K (1UL << SHIFT_64K)
+
+/* IO buffer must be 128 byte aligned */
+#define IO_BUFFER_ALIGN 128
+
+struct nx842_header {
+	int blocks_nr; /* number of compressed blocks */
+	int offset; /* offset of the first block (from beginning of header) */
+	int sizes[0]; /* size of compressed blocks */
+};
+
+static inline int nx842_header_size(const struct nx842_header *hdr)
+{
+	return sizeof(struct nx842_header) +
+			hdr->blocks_nr * sizeof(hdr->sizes[0]);
+}
+
+/* Macros for fields within nx_csbcpb */
+/* Check the valid bit within the csbcpb valid field */
+#define NX842_CSBCBP_VALID_CHK(x) (x & BIT_MASK(7))
+
+/* CE macros operate on the completion_extension field bits in the csbcpb.
+ * CE0 0=full completion, 1=partial completion
+ * CE1 0=CE0 indicates completion, 1=termination (output may be modified)
+ * CE2 0=processed_bytes is source bytes, 1=processed_bytes is target bytes */
+#define NX842_CSBCPB_CE0(x)	(x & BIT_MASK(7))
+#define NX842_CSBCPB_CE1(x)	(x & BIT_MASK(6))
+#define NX842_CSBCPB_CE2(x)	(x & BIT_MASK(5))
+
+/* The NX unit accepts data only on 4K page boundaries */
+#define NX842_HW_PAGE_SHIFT	SHIFT_4K
+#define NX842_HW_PAGE_SIZE	(ASM_CONST(1) << NX842_HW_PAGE_SHIFT)
+#define NX842_HW_PAGE_MASK	(~(NX842_HW_PAGE_SIZE-1))
+
+enum nx842_status {
+	UNAVAILABLE,
+	AVAILABLE
+};
+
+struct ibm_nx842_counters {
+	atomic64_t comp_complete;
+	atomic64_t comp_failed;
+	atomic64_t decomp_complete;
+	atomic64_t decomp_failed;
+	atomic64_t swdecomp;
+	atomic64_t comp_times[32];
+	atomic64_t decomp_times[32];
+};
+
+static struct nx842_devdata {
+	struct vio_dev *vdev;
+	struct device *dev;
+	struct ibm_nx842_counters *counters;
+	unsigned int max_sg_len;
+	unsigned int max_sync_size;
+	unsigned int max_sync_sg;
+	enum nx842_status status;
+} __rcu *devdata;
+static DEFINE_SPINLOCK(devdata_mutex);
+
+#define NX842_COUNTER_INC(_x) \
+static inline void nx842_inc_##_x( \
+	const struct nx842_devdata *dev) { \
+	if (dev) \
+		atomic64_inc(&dev->counters->_x); \
+}
+NX842_COUNTER_INC(comp_complete);
+NX842_COUNTER_INC(comp_failed);
+NX842_COUNTER_INC(decomp_complete);
+NX842_COUNTER_INC(decomp_failed);
+NX842_COUNTER_INC(swdecomp);
+
+#define NX842_HIST_SLOTS 16
+
+static void ibm_nx842_incr_hist(atomic64_t *times, unsigned int time)
+{
+	int bucket = fls(time);
+
+	if (bucket)
+		bucket = min((NX842_HIST_SLOTS - 1), bucket - 1);
+
+	atomic64_inc(&times[bucket]);
+}
+
+/* NX unit operation flags */
+#define NX842_OP_COMPRESS	0x0
+#define NX842_OP_CRC		0x1
+#define NX842_OP_DECOMPRESS	0x2
+#define NX842_OP_COMPRESS_CRC   (NX842_OP_COMPRESS | NX842_OP_CRC)
+#define NX842_OP_DECOMPRESS_CRC (NX842_OP_DECOMPRESS | NX842_OP_CRC)
+#define NX842_OP_ASYNC		(1<<23)
+#define NX842_OP_NOTIFY		(1<<22)
+#define NX842_OP_NOTIFY_INT(x)	((x & 0xff)<<8)
+
+static unsigned long nx842_get_desired_dma(struct vio_dev *viodev)
+{
+	/* No use of DMA mappings within the driver. */
+	return 0;
+}
+
+struct nx842_slentry {
+	unsigned long ptr; /* Real address (use __pa()) */
+	unsigned long len;
+};
+
+/* pHyp scatterlist entry */
+struct nx842_scatterlist {
+	int entry_nr; /* number of slentries */
+	struct nx842_slentry *entries; /* ptr to array of slentries */
+};
+
+/* Does not include sizeof(entry_nr) in the size */
+static inline unsigned long nx842_get_scatterlist_size(
+				struct nx842_scatterlist *sl)
+{
+	return sl->entry_nr * sizeof(struct nx842_slentry);
+}
+
+static inline unsigned long nx842_get_pa(void *addr)
+{
+	if (is_vmalloc_addr(addr))
+		return page_to_phys(vmalloc_to_page(addr))
+		       + offset_in_page(addr);
+	else
+		return __pa(addr);
+}
+
+static int nx842_build_scatterlist(unsigned long buf, int len,
+			struct nx842_scatterlist *sl)
+{
+	unsigned long nextpage;
+	struct nx842_slentry *entry;
+
+	sl->entry_nr = 0;
+
+	entry = sl->entries;
+	while (len) {
+		entry->ptr = nx842_get_pa((void *)buf);
+		nextpage = ALIGN(buf + 1, NX842_HW_PAGE_SIZE);
+		if (nextpage < buf + len) {
+			/* we aren't at the end yet */
+			if (IS_ALIGNED(buf, NX842_HW_PAGE_SIZE))
+				/* we are in the middle (or beginning) */
+				entry->len = NX842_HW_PAGE_SIZE;
+			else
+				/* we are at the beginning */
+				entry->len = nextpage - buf;
+		} else {
+			/* at the end */
+			entry->len = len;
+		}
+
+		len -= entry->len;
+		buf += entry->len;
+		sl->entry_nr++;
+		entry++;
+	}
+
+	return 0;
+}
+
+/*
+ * Working memory for software decompression
+ */
+struct sw842_fifo {
+	union {
+		char f8[256][8];
+		char f4[512][4];
+	};
+	char f2[256][2];
+	unsigned char f84_full;
+	unsigned char f2_full;
+	unsigned char f8_count;
+	unsigned char f2_count;
+	unsigned int f4_count;
+};
+
+/*
+ * Working memory for crypto API
+ */
+struct nx842_workmem {
+	char bounce[PAGE_SIZE]; /* bounce buffer for decompression input */
+	union {
+		/* hardware working memory */
+		struct {
+			/* scatterlist */
+			char slin[SIZE_4K];
+			char slout[SIZE_4K];
+			/* coprocessor status/parameter block */
+			struct nx_csbcpb csbcpb;
+		};
+		/* software working memory */
+		struct sw842_fifo swfifo; /* software decompression fifo */
+	};
+};
+
+int nx842_get_workmem_size(void)
+{
+	return sizeof(struct nx842_workmem) + NX842_HW_PAGE_SIZE;
+}
+EXPORT_SYMBOL_GPL(nx842_get_workmem_size);
+
+int nx842_get_workmem_size_aligned(void)
+{
+	return sizeof(struct nx842_workmem);
+}
+EXPORT_SYMBOL_GPL(nx842_get_workmem_size_aligned);
+
+static int nx842_validate_result(struct device *dev,
+	struct cop_status_block *csb)
+{
+	/* The csb must be valid after returning from vio_h_cop_sync */
+	if (!NX842_CSBCBP_VALID_CHK(csb->valid)) {
+		dev_err(dev, "%s: cspcbp not valid upon completion.\n",
+				__func__);
+		dev_dbg(dev, "valid:0x%02x cs:0x%02x cc:0x%02x ce:0x%02x\n",
+				csb->valid,
+				csb->crb_seq_number,
+				csb->completion_code,
+				csb->completion_extension);
+		dev_dbg(dev, "processed_bytes:%d address:0x%016lx\n",
+				csb->processed_byte_count,
+				(unsigned long)csb->address);
+		return -EIO;
+	}
+
+	/* Check return values from the hardware in the CSB */
+	switch (csb->completion_code) {
+	case 0:	/* Completed without error */
+		break;
+	case 64: /* Target bytes > Source bytes during compression */
+	case 13: /* Output buffer too small */
+		dev_dbg(dev, "%s: Compression output larger than input\n",
+					__func__);
+		return -ENOSPC;
+	case 66: /* Input data contains an illegal template field */
+	case 67: /* Template indicates data past the end of the input stream */
+		dev_dbg(dev, "%s: Bad data for decompression (code:%d)\n",
+					__func__, csb->completion_code);
+		return -EINVAL;
+	default:
+		dev_dbg(dev, "%s: Unspecified error (code:%d)\n",
+					__func__, csb->completion_code);
+		return -EIO;
+	}
+
+	/* Hardware sanity check */
+	if (!NX842_CSBCPB_CE2(csb->completion_extension)) {
+		dev_err(dev, "%s: No error returned by hardware, but "
+				"data returned is unusable, contact support.\n"
+				"(Additional info: csbcbp->processed bytes "
+				"does not specify processed bytes for the "
+				"target buffer.)\n", __func__);
+		return -EIO;
+	}
+
+	return 0;
+}
+
+/**
+ * nx842_compress - Compress data using the 842 algorithm
+ *
+ * Compression provide by the NX842 coprocessor on IBM Power systems.
+ * The input buffer is compressed and the result is stored in the
+ * provided output buffer.
+ *
+ * Upon return from this function @outlen contains the length of the
+ * compressed data.  If there is an error then @outlen will be 0 and an
+ * error will be specified by the return code from this function.
+ *
+ * @in: Pointer to input buffer, must be page aligned
+ * @inlen: Length of input buffer, must be PAGE_SIZE
+ * @out: Pointer to output buffer
+ * @outlen: Length of output buffer
+ * @wrkmem: ptr to buffer for working memory, size determined by
+ *          nx842_get_workmem_size()
+ *
+ * Returns:
+ *   0		Success, output of length @outlen stored in the buffer at @out
+ *   -ENOMEM	Unable to allocate internal buffers
+ *   -ENOSPC	Output buffer is to small
+ *   -EMSGSIZE	XXX Difficult to describe this limitation
+ *   -EIO	Internal error
+ *   -ENODEV	Hardware unavailable
+ */
+int nx842_compress(const unsigned char *in, unsigned int inlen,
+		       unsigned char *out, unsigned int *outlen, void *wmem)
+{
+	struct nx842_header *hdr;
+	struct nx842_devdata *local_devdata;
+	struct device *dev = NULL;
+	struct nx842_workmem *workmem;
+	struct nx842_scatterlist slin, slout;
+	struct nx_csbcpb *csbcpb;
+	int ret = 0, max_sync_size, i, bytesleft, size, hdrsize;
+	unsigned long inbuf, outbuf, padding;
+	struct vio_pfo_op op = {
+		.done = NULL,
+		.handle = 0,
+		.timeout = 0,
+	};
+	unsigned long start_time = get_tb();
+
+	/*
+	 * Make sure input buffer is 64k page aligned.  This is assumed since
+	 * this driver is designed for page compression only (for now).  This
+	 * is very nice since we can now use direct DDE(s) for the input and
+	 * the alignment is guaranteed.
+	*/
+	inbuf = (unsigned long)in;
+	if (!IS_ALIGNED(inbuf, PAGE_SIZE) || inlen != PAGE_SIZE)
+		return -EINVAL;
+
+	rcu_read_lock();
+	local_devdata = rcu_dereference(devdata);
+	if (!local_devdata || !local_devdata->dev) {
+		rcu_read_unlock();
+		return -ENODEV;
+	}
+	max_sync_size = local_devdata->max_sync_size;
+	dev = local_devdata->dev;
+
+	/* Create the header */
+	hdr = (struct nx842_header *)out;
+	hdr->blocks_nr = PAGE_SIZE / max_sync_size;
+	hdrsize = nx842_header_size(hdr);
+	outbuf = (unsigned long)out + hdrsize;
+	bytesleft = *outlen - hdrsize;
+
+	/* Init scatterlist */
+	workmem = (struct nx842_workmem *)ALIGN((unsigned long)wmem,
+		NX842_HW_PAGE_SIZE);
+	slin.entries = (struct nx842_slentry *)workmem->slin;
+	slout.entries = (struct nx842_slentry *)workmem->slout;
+
+	/* Init operation */
+	op.flags = NX842_OP_COMPRESS;
+	csbcpb = &workmem->csbcpb;
+	memset(csbcpb, 0, sizeof(*csbcpb));
+	op.csbcpb = nx842_get_pa(csbcpb);
+	op.out = nx842_get_pa(slout.entries);
+
+	for (i = 0; i < hdr->blocks_nr; i++) {
+		/*
+		 * Aligning the output blocks to 128 bytes does waste space,
+		 * but it prevents the need for bounce buffers and memory
+		 * copies.  It also simplifies the code a lot.  In the worst
+		 * case (64k page, 4k max_sync_size), you lose up to
+		 * (128*16)/64k = ~3% the compression factor. For 64k
+		 * max_sync_size, the loss would be at most 128/64k = ~0.2%.
+		 */
+		padding = ALIGN(outbuf, IO_BUFFER_ALIGN) - outbuf;
+		outbuf += padding;
+		bytesleft -= padding;
+		if (i == 0)
+			/* save offset into first block in header */
+			hdr->offset = padding + hdrsize;
+
+		if (bytesleft <= 0) {
+			ret = -ENOSPC;
+			goto unlock;
+		}
+
+		/*
+		 * NOTE: If the default max_sync_size is changed from 4k
+		 * to 64k, remove the "likely" case below, since a
+		 * scatterlist will always be needed.
+		 */
+		if (likely(max_sync_size == NX842_HW_PAGE_SIZE)) {
+			/* Create direct DDE */
+			op.in = nx842_get_pa((void *)inbuf);
+			op.inlen = max_sync_size;
+
+		} else {
+			/* Create indirect DDE (scatterlist) */
+			nx842_build_scatterlist(inbuf, max_sync_size, &slin);
+			op.in = nx842_get_pa(slin.entries);
+			op.inlen = -nx842_get_scatterlist_size(&slin);
+		}
+
+		/*
+		 * If max_sync_size != NX842_HW_PAGE_SIZE, an indirect
+		 * DDE is required for the outbuf.
+		 * If max_sync_size == NX842_HW_PAGE_SIZE, outbuf must
+		 * also be page aligned (1 in 128/4k=32 chance) in order
+		 * to use a direct DDE.
+		 * This is unlikely, just use an indirect DDE always.
+		 */
+		nx842_build_scatterlist(outbuf,
+			min(bytesleft, max_sync_size), &slout);
+		/* op.out set before loop */
+		op.outlen = -nx842_get_scatterlist_size(&slout);
+
+		/* Send request to pHyp */
+		ret = vio_h_cop_sync(local_devdata->vdev, &op);
+
+		/* Check for pHyp error */
+		if (ret) {
+			dev_dbg(dev, "%s: vio_h_cop_sync error (ret=%d, hret=%ld)\n",
+				__func__, ret, op.hcall_err);
+			ret = -EIO;
+			goto unlock;
+		}
+
+		/* Check for hardware error */
+		ret = nx842_validate_result(dev, &csbcpb->csb);
+		if (ret && ret != -ENOSPC)
+			goto unlock;
+
+		/* Handle incompressible data */
+		if (unlikely(ret == -ENOSPC)) {
+			if (bytesleft < max_sync_size) {
+				/*
+				 * Not enough space left in the output buffer
+				 * to store uncompressed block
+				 */
+				goto unlock;
+			} else {
+				/* Store incompressible block */
+				memcpy((void *)outbuf, (void *)inbuf,
+					max_sync_size);
+				hdr->sizes[i] = -max_sync_size;
+				outbuf += max_sync_size;
+				bytesleft -= max_sync_size;
+				/* Reset ret, incompressible data handled */
+				ret = 0;
+			}
+		} else {
+			/* Normal case, compression was successful */
+			size = csbcpb->csb.processed_byte_count;
+			dev_dbg(dev, "%s: processed_bytes=%d\n",
+				__func__, size);
+			hdr->sizes[i] = size;
+			outbuf += size;
+			bytesleft -= size;
+		}
+
+		inbuf += max_sync_size;
+	}
+
+	*outlen = (unsigned int)(outbuf - (unsigned long)out);
+
+unlock:
+	if (ret)
+		nx842_inc_comp_failed(local_devdata);
+	else {
+		nx842_inc_comp_complete(local_devdata);
+		ibm_nx842_incr_hist(local_devdata->counters->comp_times,
+			(get_tb() - start_time) / tb_ticks_per_usec);
+	}
+	rcu_read_unlock();
+	return ret;
+}
+EXPORT_SYMBOL_GPL(nx842_compress);
+
+static int sw842_decompress(const unsigned char *, int, unsigned char *, int *,
+			const void *);
+
+/**
+ * nx842_decompress - Decompress data using the 842 algorithm
+ *
+ * Decompression provide by the NX842 coprocessor on IBM Power systems.
+ * The input buffer is decompressed and the result is stored in the
+ * provided output buffer.  The size allocated to the output buffer is
+ * provided by the caller of this function in @outlen.  Upon return from
+ * this function @outlen contains the length of the decompressed data.
+ * If there is an error then @outlen will be 0 and an error will be
+ * specified by the return code from this function.
+ *
+ * @in: Pointer to input buffer, will use bounce buffer if not 128 byte
+ *      aligned
+ * @inlen: Length of input buffer
+ * @out: Pointer to output buffer, must be page aligned
+ * @outlen: Length of output buffer, must be PAGE_SIZE
+ * @wrkmem: ptr to buffer for working memory, size determined by
+ *          nx842_get_workmem_size()
+ *
+ * Returns:
+ *   0		Success, output of length @outlen stored in the buffer at @out
+ *   -ENODEV	Hardware decompression device is unavailable
+ *   -ENOMEM	Unable to allocate internal buffers
+ *   -ENOSPC	Output buffer is to small
+ *   -EINVAL	Bad input data encountered when attempting decompress
+ *   -EIO	Internal error
+ */
+int nx842_decompress(const unsigned char *in, unsigned int inlen,
+			 unsigned char *out, unsigned int *outlen, void *wmem)
+{
+	struct nx842_header *hdr;
+	struct nx842_devdata *local_devdata;
+	struct device *dev = NULL;
+	struct nx842_workmem *workmem;
+	struct nx842_scatterlist slin, slout;
+	struct nx_csbcpb *csbcpb;
+	int ret = 0, i, size, max_sync_size;
+	unsigned long inbuf, outbuf;
+	struct vio_pfo_op op = {
+		.done = NULL,
+		.handle = 0,
+		.timeout = 0,
+	};
+	unsigned long start_time = get_tb();
+
+	/* Ensure page alignment and size */
+	outbuf = (unsigned long)out;
+	if (!IS_ALIGNED(outbuf, PAGE_SIZE) || *outlen != PAGE_SIZE)
+		return -EINVAL;
+
+	rcu_read_lock();
+	local_devdata = rcu_dereference(devdata);
+	if (local_devdata)
+		dev = local_devdata->dev;
+
+	/* Get header */
+	hdr = (struct nx842_header *)in;
+
+	workmem = (struct nx842_workmem *)ALIGN((unsigned long)wmem,
+		NX842_HW_PAGE_SIZE);
+
+	inbuf = (unsigned long)in + hdr->offset;
+	if (likely(!IS_ALIGNED(inbuf, IO_BUFFER_ALIGN))) {
+		/* Copy block(s) into bounce buffer for alignment */
+		memcpy(workmem->bounce, in + hdr->offset, inlen - hdr->offset);
+		inbuf = (unsigned long)workmem->bounce;
+	}
+
+	/* Init scatterlist */
+	slin.entries = (struct nx842_slentry *)workmem->slin;
+	slout.entries = (struct nx842_slentry *)workmem->slout;
+
+	/* Init operation */
+	op.flags = NX842_OP_DECOMPRESS;
+	csbcpb = &workmem->csbcpb;
+	memset(csbcpb, 0, sizeof(*csbcpb));
+	op.csbcpb = nx842_get_pa(csbcpb);
+
+	/*
+	 * max_sync_size may have changed since compression,
+	 * so we can't read it from the device info. We need
+	 * to derive it from hdr->blocks_nr.
+	 */
+	max_sync_size = PAGE_SIZE / hdr->blocks_nr;
+
+	for (i = 0; i < hdr->blocks_nr; i++) {
+		/* Skip padding */
+		inbuf = ALIGN(inbuf, IO_BUFFER_ALIGN);
+
+		if (hdr->sizes[i] < 0) {
+			/* Negative sizes indicate uncompressed data blocks */
+			size = abs(hdr->sizes[i]);
+			memcpy((void *)outbuf, (void *)inbuf, size);
+			outbuf += size;
+			inbuf += size;
+			continue;
+		}
+
+		if (!dev)
+			goto sw;
+
+		/*
+		 * The better the compression, the more likely the "likely"
+		 * case becomes.
+		 */
+		if (likely((inbuf & NX842_HW_PAGE_MASK) ==
+			((inbuf + hdr->sizes[i] - 1) & NX842_HW_PAGE_MASK))) {
+			/* Create direct DDE */
+			op.in = nx842_get_pa((void *)inbuf);
+			op.inlen = hdr->sizes[i];
+		} else {
+			/* Create indirect DDE (scatterlist) */
+			nx842_build_scatterlist(inbuf, hdr->sizes[i] , &slin);
+			op.in = nx842_get_pa(slin.entries);
+			op.inlen = -nx842_get_scatterlist_size(&slin);
+		}
+
+		/*
+		 * NOTE: If the default max_sync_size is changed from 4k
+		 * to 64k, remove the "likely" case below, since a
+		 * scatterlist will always be needed.
+		 */
+		if (likely(max_sync_size == NX842_HW_PAGE_SIZE)) {
+			/* Create direct DDE */
+			op.out = nx842_get_pa((void *)outbuf);
+			op.outlen = max_sync_size;
+		} else {
+			/* Create indirect DDE (scatterlist) */
+			nx842_build_scatterlist(outbuf, max_sync_size, &slout);
+			op.out = nx842_get_pa(slout.entries);
+			op.outlen = -nx842_get_scatterlist_size(&slout);
+		}
+
+		/* Send request to pHyp */
+		ret = vio_h_cop_sync(local_devdata->vdev, &op);
+
+		/* Check for pHyp error */
+		if (ret) {
+			dev_dbg(dev, "%s: vio_h_cop_sync error (ret=%d, hret=%ld)\n",
+				__func__, ret, op.hcall_err);
+			dev = NULL;
+			goto sw;
+		}
+
+		/* Check for hardware error */
+		ret = nx842_validate_result(dev, &csbcpb->csb);
+		if (ret) {
+			dev = NULL;
+			goto sw;
+		}
+
+		/* HW decompression success */
+		inbuf += hdr->sizes[i];
+		outbuf += csbcpb->csb.processed_byte_count;
+		continue;
+
+sw:
+		/* software decompression */
+		size = max_sync_size;
+		ret = sw842_decompress(
+			(unsigned char *)inbuf, hdr->sizes[i],
+			(unsigned char *)outbuf, &size, wmem);
+		if (ret)
+			pr_debug("%s: sw842_decompress failed with %d\n",
+				__func__, ret);
+
+		if (ret) {
+			if (ret != -ENOSPC && ret != -EINVAL &&
+					ret != -EMSGSIZE)
+				ret = -EIO;
+			goto unlock;
+		}
+
+		/* SW decompression success */
+		inbuf += hdr->sizes[i];
+		outbuf += size;
+	}
+
+	*outlen = (unsigned int)(outbuf - (unsigned long)out);
+
+unlock:
+	if (ret)
+		/* decompress fail */
+		nx842_inc_decomp_failed(local_devdata);
+	else {
+		if (!dev)
+			/* software decompress */
+			nx842_inc_swdecomp(local_devdata);
+		nx842_inc_decomp_complete(local_devdata);
+		ibm_nx842_incr_hist(local_devdata->counters->decomp_times,
+			(get_tb() - start_time) / tb_ticks_per_usec);
+	}
+
+	rcu_read_unlock();
+	return ret;
+}
+EXPORT_SYMBOL_GPL(nx842_decompress);
+
+/**
+ * nx842_OF_set_defaults -- Set default (disabled) values for devdata
+ *
+ * @devdata - struct nx842_devdata to update
+ *
+ * Returns:
+ *  0 on success
+ *  -ENOENT if @devdata ptr is NULL
+ */
+static int nx842_OF_set_defaults(struct nx842_devdata *devdata)
+{
+	if (devdata) {
+		devdata->max_sync_size = 0;
+		devdata->max_sync_sg = 0;
+		devdata->max_sg_len = 0;
+		devdata->status = UNAVAILABLE;
+		return 0;
+	} else
+		return -ENOENT;
+}
+
+/**
+ * nx842_OF_upd_status -- Update the device info from OF status prop
+ *
+ * The status property indicates if the accelerator is enabled.  If the
+ * device is in the OF tree it indicates that the hardware is present.
+ * The status field indicates if the device is enabled when the status
+ * is 'okay'.  Otherwise the device driver will be disabled.
+ *
+ * @devdata - struct nx842_devdata to update
+ * @prop - struct property point containing the maxsyncop for the update
+ *
+ * Returns:
+ *  0 - Device is available
+ *  -EINVAL - Device is not available
+ */
+static int nx842_OF_upd_status(struct nx842_devdata *devdata,
+					struct property *prop) {
+	int ret = 0;
+	const char *status = (const char *)prop->value;
+
+	if (!strncmp(status, "okay", (size_t)prop->length)) {
+		devdata->status = AVAILABLE;
+	} else {
+		dev_info(devdata->dev, "%s: status '%s' is not 'okay'\n",
+				__func__, status);
+		devdata->status = UNAVAILABLE;
+	}
+
+	return ret;
+}
+
+/**
+ * nx842_OF_upd_maxsglen -- Update the device info from OF maxsglen prop
+ *
+ * Definition of the 'ibm,max-sg-len' OF property:
+ *  This field indicates the maximum byte length of a scatter list
+ *  for the platform facility. It is a single cell encoded as with encode-int.
+ *
+ * Example:
+ *  # od -x ibm,max-sg-len
+ *  0000000 0000 0ff0
+ *
+ *  In this example, the maximum byte length of a scatter list is
+ *  0x0ff0 (4,080).
+ *
+ * @devdata - struct nx842_devdata to update
+ * @prop - struct property point containing the maxsyncop for the update
+ *
+ * Returns:
+ *  0 on success
+ *  -EINVAL on failure
+ */
+static int nx842_OF_upd_maxsglen(struct nx842_devdata *devdata,
+					struct property *prop) {
+	int ret = 0;
+	const int *maxsglen = prop->value;
+
+	if (prop->length != sizeof(*maxsglen)) {
+		dev_err(devdata->dev, "%s: unexpected format for ibm,max-sg-len property\n", __func__);
+		dev_dbg(devdata->dev, "%s: ibm,max-sg-len is %d bytes long, expected %lu bytes\n", __func__,
+				prop->length, sizeof(*maxsglen));
+		ret = -EINVAL;
+	} else {
+		devdata->max_sg_len = (unsigned int)min(*maxsglen,
+				(int)NX842_HW_PAGE_SIZE);
+	}
+
+	return ret;
+}
+
+/**
+ * nx842_OF_upd_maxsyncop -- Update the device info from OF maxsyncop prop
+ *
+ * Definition of the 'ibm,max-sync-cop' OF property:
+ *  Two series of cells.  The first series of cells represents the maximums
+ *  that can be synchronously compressed. The second series of cells
+ *  represents the maximums that can be synchronously decompressed.
+ *  1. The first cell in each series contains the count of the number of
+ *     data length, scatter list elements pairs that follow – each being
+ *     of the form
+ *    a. One cell data byte length
+ *    b. One cell total number of scatter list elements
+ *
+ * Example:
+ *  # od -x ibm,max-sync-cop
+ *  0000000 0000 0001 0000 1000 0000 01fe 0000 0001
+ *  0000020 0000 1000 0000 01fe
+ *
+ *  In this example, compression supports 0x1000 (4,096) data byte length
+ *  and 0x1fe (510) total scatter list elements.  Decompression supports
+ *  0x1000 (4,096) data byte length and 0x1f3 (510) total scatter list
+ *  elements.
+ *
+ * @devdata - struct nx842_devdata to update
+ * @prop - struct property point containing the maxsyncop for the update
+ *
+ * Returns:
+ *  0 on success
+ *  -EINVAL on failure
+ */
+static int nx842_OF_upd_maxsyncop(struct nx842_devdata *devdata,
+					struct property *prop) {
+	int ret = 0;
+	const struct maxsynccop_t {
+		int comp_elements;
+		int comp_data_limit;
+		int comp_sg_limit;
+		int decomp_elements;
+		int decomp_data_limit;
+		int decomp_sg_limit;
+	} *maxsynccop;
+
+	if (prop->length != sizeof(*maxsynccop)) {
+		dev_err(devdata->dev, "%s: unexpected format for ibm,max-sync-cop property\n", __func__);
+		dev_dbg(devdata->dev, "%s: ibm,max-sync-cop is %d bytes long, expected %lu bytes\n", __func__, prop->length,
+				sizeof(*maxsynccop));
+		ret = -EINVAL;
+		goto out;
+	}
+
+	maxsynccop = (const struct maxsynccop_t *)prop->value;
+
+	/* Use one limit rather than separate limits for compression and
+	 * decompression. Set a maximum for this so as not to exceed the
+	 * size that the header can support and round the value down to
+	 * the hardware page size (4K) */
+	devdata->max_sync_size =
+			(unsigned int)min(maxsynccop->comp_data_limit,
+					maxsynccop->decomp_data_limit);
+
+	devdata->max_sync_size = min_t(unsigned int, devdata->max_sync_size,
+					SIZE_64K);
+
+	if (devdata->max_sync_size < SIZE_4K) {
+		dev_err(devdata->dev, "%s: hardware max data size (%u) is "
+				"less than the driver minimum, unable to use "
+				"the hardware device\n",
+				__func__, devdata->max_sync_size);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	devdata->max_sync_sg = (unsigned int)min(maxsynccop->comp_sg_limit,
+						maxsynccop->decomp_sg_limit);
+	if (devdata->max_sync_sg < 1) {
+		dev_err(devdata->dev, "%s: hardware max sg size (%u) is "
+				"less than the driver minimum, unable to use "
+				"the hardware device\n",
+				__func__, devdata->max_sync_sg);
+		ret = -EINVAL;
+		goto out;
+	}
+
+out:
+	return ret;
+}
+
+/**
+ *
+ * nx842_OF_upd -- Handle OF properties updates for the device.
+ *
+ * Set all properties from the OF tree.  Optionally, a new property
+ * can be provided by the @new_prop pointer to overwrite an existing value.
+ * The device will remain disabled until all values are valid, this function
+ * will return an error for updates unless all values are valid.
+ *
+ * @new_prop: If not NULL, this property is being updated.  If NULL, update
+ *  all properties from the current values in the OF tree.
+ *
+ * Returns:
+ *  0 - Success
+ *  -ENOMEM - Could not allocate memory for new devdata structure
+ *  -EINVAL - property value not found, new_prop is not a recognized
+ *	property for the device or property value is not valid.
+ *  -ENODEV - Device is not available
+ */
+static int nx842_OF_upd(struct property *new_prop)
+{
+	struct nx842_devdata *old_devdata = NULL;
+	struct nx842_devdata *new_devdata = NULL;
+	struct device_node *of_node = NULL;
+	struct property *status = NULL;
+	struct property *maxsglen = NULL;
+	struct property *maxsyncop = NULL;
+	int ret = 0;
+	unsigned long flags;
+
+	spin_lock_irqsave(&devdata_mutex, flags);
+	old_devdata = rcu_dereference_check(devdata,
+			lockdep_is_held(&devdata_mutex));
+	if (old_devdata)
+		of_node = old_devdata->dev->of_node;
+
+	if (!old_devdata || !of_node) {
+		pr_err("%s: device is not available\n", __func__);
+		spin_unlock_irqrestore(&devdata_mutex, flags);
+		return -ENODEV;
+	}
+
+	new_devdata = kzalloc(sizeof(*new_devdata), GFP_NOFS);
+	if (!new_devdata) {
+		dev_err(old_devdata->dev, "%s: Could not allocate memory for device data\n", __func__);
+		ret = -ENOMEM;
+		goto error_out;
+	}
+
+	memcpy(new_devdata, old_devdata, sizeof(*old_devdata));
+	new_devdata->counters = old_devdata->counters;
+
+	/* Set ptrs for existing properties */
+	status = of_find_property(of_node, "status", NULL);
+	maxsglen = of_find_property(of_node, "ibm,max-sg-len", NULL);
+	maxsyncop = of_find_property(of_node, "ibm,max-sync-cop", NULL);
+	if (!status || !maxsglen || !maxsyncop) {
+		dev_err(old_devdata->dev, "%s: Could not locate device properties\n", __func__);
+		ret = -EINVAL;
+		goto error_out;
+	}
+
+	/*
+	 * If this is a property update, there are only certain properties that
+	 * we care about. Bail if it isn't in the below list
+	 */
+	if (new_prop && (strncmp(new_prop->name, "status", new_prop->length) ||
+		         strncmp(new_prop->name, "ibm,max-sg-len", new_prop->length) ||
+		         strncmp(new_prop->name, "ibm,max-sync-cop", new_prop->length)))
+		goto out;
+
+	/* Perform property updates */
+	ret = nx842_OF_upd_status(new_devdata, status);
+	if (ret)
+		goto error_out;
+
+	ret = nx842_OF_upd_maxsglen(new_devdata, maxsglen);
+	if (ret)
+		goto error_out;
+
+	ret = nx842_OF_upd_maxsyncop(new_devdata, maxsyncop);
+	if (ret)
+		goto error_out;
+
+out:
+	dev_info(old_devdata->dev, "%s: max_sync_size new:%u old:%u\n",
+			__func__, new_devdata->max_sync_size,
+			old_devdata->max_sync_size);
+	dev_info(old_devdata->dev, "%s: max_sync_sg new:%u old:%u\n",
+			__func__, new_devdata->max_sync_sg,
+			old_devdata->max_sync_sg);
+	dev_info(old_devdata->dev, "%s: max_sg_len new:%u old:%u\n",
+			__func__, new_devdata->max_sg_len,
+			old_devdata->max_sg_len);
+
+	rcu_assign_pointer(devdata, new_devdata);
+	spin_unlock_irqrestore(&devdata_mutex, flags);
+	synchronize_rcu();
+	dev_set_drvdata(new_devdata->dev, new_devdata);
+	kfree(old_devdata);
+	return 0;
+
+error_out:
+	if (new_devdata) {
+		dev_info(old_devdata->dev, "%s: device disabled\n", __func__);
+		nx842_OF_set_defaults(new_devdata);
+		rcu_assign_pointer(devdata, new_devdata);
+		spin_unlock_irqrestore(&devdata_mutex, flags);
+		synchronize_rcu();
+		dev_set_drvdata(new_devdata->dev, new_devdata);
+		kfree(old_devdata);
+	} else {
+		dev_err(old_devdata->dev, "%s: could not update driver from hardware\n", __func__);
+		spin_unlock_irqrestore(&devdata_mutex, flags);
+	}
+
+	if (!ret)
+		ret = -EINVAL;
+	return ret;
+}
+
+/**
+ * nx842_OF_notifier - Process updates to OF properties for the device
+ *
+ * @np: notifier block
+ * @action: notifier action
+ * @update: struct pSeries_reconfig_prop_update pointer if action is
+ *	PSERIES_UPDATE_PROPERTY
+ *
+ * Returns:
+ *	NOTIFY_OK on success
+ *	NOTIFY_BAD encoded with error number on failure, use
+ *		notifier_to_errno() to decode this value
+ */
+static int nx842_OF_notifier(struct notifier_block *np, unsigned long action,
+			     void *data)
+{
+	struct of_reconfig_data *upd = data;
+	struct nx842_devdata *local_devdata;
+	struct device_node *node = NULL;
+
+	rcu_read_lock();
+	local_devdata = rcu_dereference(devdata);
+	if (local_devdata)
+		node = local_devdata->dev->of_node;
+
+	if (local_devdata &&
+			action == OF_RECONFIG_UPDATE_PROPERTY &&
+			!strcmp(upd->dn->name, node->name)) {
+		rcu_read_unlock();
+		nx842_OF_upd(upd->prop);
+	} else
+		rcu_read_unlock();
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block nx842_of_nb = {
+	.notifier_call = nx842_OF_notifier,
+};
+
+#define nx842_counter_read(_name)					\
+static ssize_t nx842_##_name##_show(struct device *dev,		\
+		struct device_attribute *attr,				\
+		char *buf) {						\
+	struct nx842_devdata *local_devdata;			\
+	int p = 0;							\
+	rcu_read_lock();						\
+	local_devdata = rcu_dereference(devdata);			\
+	if (local_devdata)						\
+		p = snprintf(buf, PAGE_SIZE, "%ld\n",			\
+		       atomic64_read(&local_devdata->counters->_name));	\
+	rcu_read_unlock();						\
+	return p;							\
+}
+
+#define NX842DEV_COUNTER_ATTR_RO(_name)					\
+	nx842_counter_read(_name);					\
+	static struct device_attribute dev_attr_##_name = __ATTR(_name,	\
+						0444,			\
+						nx842_##_name##_show,\
+						NULL);
+
+NX842DEV_COUNTER_ATTR_RO(comp_complete);
+NX842DEV_COUNTER_ATTR_RO(comp_failed);
+NX842DEV_COUNTER_ATTR_RO(decomp_complete);
+NX842DEV_COUNTER_ATTR_RO(decomp_failed);
+NX842DEV_COUNTER_ATTR_RO(swdecomp);
+
+static ssize_t nx842_timehist_show(struct device *,
+		struct device_attribute *, char *);
+
+static struct device_attribute dev_attr_comp_times = __ATTR(comp_times, 0444,
+		nx842_timehist_show, NULL);
+static struct device_attribute dev_attr_decomp_times = __ATTR(decomp_times,
+		0444, nx842_timehist_show, NULL);
+
+static ssize_t nx842_timehist_show(struct device *dev,
+		struct device_attribute *attr, char *buf) {
+	char *p = buf;
+	struct nx842_devdata *local_devdata;
+	atomic64_t *times;
+	int bytes_remain = PAGE_SIZE;
+	int bytes;
+	int i;
+
+	rcu_read_lock();
+	local_devdata = rcu_dereference(devdata);
+	if (!local_devdata) {
+		rcu_read_unlock();
+		return 0;
+	}
+
+	if (attr == &dev_attr_comp_times)
+		times = local_devdata->counters->comp_times;
+	else if (attr == &dev_attr_decomp_times)
+		times = local_devdata->counters->decomp_times;
+	else {
+		rcu_read_unlock();
+		return 0;
+	}
+
+	for (i = 0; i < (NX842_HIST_SLOTS - 2); i++) {
+		bytes = snprintf(p, bytes_remain, "%u-%uus:\t%ld\n",
+			       i ? (2<<(i-1)) : 0, (2<<i)-1,
+			       atomic64_read(&times[i]));
+		bytes_remain -= bytes;
+		p += bytes;
+	}
+	/* The last bucket holds everything over
+	 * 2<<(NX842_HIST_SLOTS - 2) us */
+	bytes = snprintf(p, bytes_remain, "%uus - :\t%ld\n",
+			2<<(NX842_HIST_SLOTS - 2),
+			atomic64_read(&times[(NX842_HIST_SLOTS - 1)]));
+	p += bytes;
+
+	rcu_read_unlock();
+	return p - buf;
+}
+
+static struct attribute *nx842_sysfs_entries[] = {
+	&dev_attr_comp_complete.attr,
+	&dev_attr_comp_failed.attr,
+	&dev_attr_decomp_complete.attr,
+	&dev_attr_decomp_failed.attr,
+	&dev_attr_swdecomp.attr,
+	&dev_attr_comp_times.attr,
+	&dev_attr_decomp_times.attr,
+	NULL,
+};
+
+static struct attribute_group nx842_attribute_group = {
+	.name = NULL,		/* put in device directory */
+	.attrs = nx842_sysfs_entries,
+};
+
+static int __init nx842_probe(struct vio_dev *viodev,
+				  const struct vio_device_id *id)
+{
+	struct nx842_devdata *old_devdata, *new_devdata = NULL;
+	unsigned long flags;
+	int ret = 0;
+
+	spin_lock_irqsave(&devdata_mutex, flags);
+	old_devdata = rcu_dereference_check(devdata,
+			lockdep_is_held(&devdata_mutex));
+
+	if (old_devdata && old_devdata->vdev != NULL) {
+		dev_err(&viodev->dev, "%s: Attempt to register more than one instance of the hardware\n", __func__);
+		ret = -1;
+		goto error_unlock;
+	}
+
+	dev_set_drvdata(&viodev->dev, NULL);
+
+	new_devdata = kzalloc(sizeof(*new_devdata), GFP_NOFS);
+	if (!new_devdata) {
+		dev_err(&viodev->dev, "%s: Could not allocate memory for device data\n", __func__);
+		ret = -ENOMEM;
+		goto error_unlock;
+	}
+
+	new_devdata->counters = kzalloc(sizeof(*new_devdata->counters),
+			GFP_NOFS);
+	if (!new_devdata->counters) {
+		dev_err(&viodev->dev, "%s: Could not allocate memory for performance counters\n", __func__);
+		ret = -ENOMEM;
+		goto error_unlock;
+	}
+
+	new_devdata->vdev = viodev;
+	new_devdata->dev = &viodev->dev;
+	nx842_OF_set_defaults(new_devdata);
+
+	rcu_assign_pointer(devdata, new_devdata);
+	spin_unlock_irqrestore(&devdata_mutex, flags);
+	synchronize_rcu();
+	kfree(old_devdata);
+
+	of_reconfig_notifier_register(&nx842_of_nb);
+
+	ret = nx842_OF_upd(NULL);
+	if (ret && ret != -ENODEV) {
+		dev_err(&viodev->dev, "could not parse device tree. %d\n", ret);
+		ret = -1;
+		goto error;
+	}
+
+	rcu_read_lock();
+	dev_set_drvdata(&viodev->dev, rcu_dereference(devdata));
+	rcu_read_unlock();
+
+	if (sysfs_create_group(&viodev->dev.kobj, &nx842_attribute_group)) {
+		dev_err(&viodev->dev, "could not create sysfs device attributes\n");
+		ret = -1;
+		goto error;
+	}
+
+	return 0;
+
+error_unlock:
+	spin_unlock_irqrestore(&devdata_mutex, flags);
+	if (new_devdata)
+		kfree(new_devdata->counters);
+	kfree(new_devdata);
+error:
+	return ret;
+}
+
+static int __exit nx842_remove(struct vio_dev *viodev)
+{
+	struct nx842_devdata *old_devdata;
+	unsigned long flags;
+
+	pr_info("Removing IBM Power 842 compression device\n");
+	sysfs_remove_group(&viodev->dev.kobj, &nx842_attribute_group);
+
+	spin_lock_irqsave(&devdata_mutex, flags);
+	old_devdata = rcu_dereference_check(devdata,
+			lockdep_is_held(&devdata_mutex));
+	of_reconfig_notifier_unregister(&nx842_of_nb);
+	RCU_INIT_POINTER(devdata, NULL);
+	spin_unlock_irqrestore(&devdata_mutex, flags);
+	synchronize_rcu();
+	dev_set_drvdata(&viodev->dev, NULL);
+	if (old_devdata)
+		kfree(old_devdata->counters);
+	kfree(old_devdata);
+	return 0;
+}
+
+static struct vio_device_id nx842_driver_ids[] = {
+	{"ibm,compression-v1", "ibm,compression"},
+	{"", ""},
+};
+
+static struct vio_driver nx842_driver = {
+	.name = MODULE_NAME,
+	.probe = nx842_probe,
+	.remove = __exit_p(nx842_remove),
+	.get_desired_dma = nx842_get_desired_dma,
+	.id_table = nx842_driver_ids,
+};
+
+static int __init nx842_init(void)
+{
+	struct nx842_devdata *new_devdata;
+	pr_info("Registering IBM Power 842 compression driver\n");
+
+	RCU_INIT_POINTER(devdata, NULL);
+	new_devdata = kzalloc(sizeof(*new_devdata), GFP_KERNEL);
+	if (!new_devdata) {
+		pr_err("Could not allocate memory for device data\n");
+		return -ENOMEM;
+	}
+	new_devdata->status = UNAVAILABLE;
+	RCU_INIT_POINTER(devdata, new_devdata);
+
+	return vio_register_driver(&nx842_driver);
+}
+
+module_init(nx842_init);
+
+static void __exit nx842_exit(void)
+{
+	struct nx842_devdata *old_devdata;
+	unsigned long flags;
+
+	pr_info("Exiting IBM Power 842 compression driver\n");
+	spin_lock_irqsave(&devdata_mutex, flags);
+	old_devdata = rcu_dereference_check(devdata,
+			lockdep_is_held(&devdata_mutex));
+	RCU_INIT_POINTER(devdata, NULL);
+	spin_unlock_irqrestore(&devdata_mutex, flags);
+	synchronize_rcu();
+	if (old_devdata)
+		dev_set_drvdata(old_devdata->dev, NULL);
+	kfree(old_devdata);
+	vio_unregister_driver(&nx842_driver);
+}
+
+module_exit(nx842_exit);
+
+/*********************************
+ * 842 software decompressor
+*********************************/
+typedef int (*sw842_template_op)(const char **, int *, unsigned char **,
+						struct sw842_fifo *);
+
+static int sw842_data8(const char **, int *, unsigned char **,
+						struct sw842_fifo *);
+static int sw842_data4(const char **, int *, unsigned char **,
+						struct sw842_fifo *);
+static int sw842_data2(const char **, int *, unsigned char **,
+						struct sw842_fifo *);
+static int sw842_ptr8(const char **, int *, unsigned char **,
+						struct sw842_fifo *);
+static int sw842_ptr4(const char **, int *, unsigned char **,
+						struct sw842_fifo *);
+static int sw842_ptr2(const char **, int *, unsigned char **,
+						struct sw842_fifo *);
+
+/* special templates */
+#define SW842_TMPL_REPEAT 0x1B
+#define SW842_TMPL_ZEROS 0x1C
+#define SW842_TMPL_EOF 0x1E
+
+static sw842_template_op sw842_tmpl_ops[26][4] = {
+	{ sw842_data8, NULL}, /* 0 (00000) */
+	{ sw842_data4, sw842_data2, sw842_ptr2,  NULL},
+	{ sw842_data4, sw842_ptr2,  sw842_data2, NULL},
+	{ sw842_data4, sw842_ptr2,  sw842_ptr2,  NULL},
+	{ sw842_data4, sw842_ptr4,  NULL},
+	{ sw842_data2, sw842_ptr2,  sw842_data4, NULL},
+	{ sw842_data2, sw842_ptr2,  sw842_data2, sw842_ptr2},
+	{ sw842_data2, sw842_ptr2,  sw842_ptr2,  sw842_data2},
+	{ sw842_data2, sw842_ptr2,  sw842_ptr2,  sw842_ptr2,},
+	{ sw842_data2, sw842_ptr2,  sw842_ptr4,  NULL},
+	{ sw842_ptr2,  sw842_data2, sw842_data4, NULL}, /* 10 (01010) */
+	{ sw842_ptr2,  sw842_data4, sw842_ptr2,  NULL},
+	{ sw842_ptr2,  sw842_data2, sw842_ptr2,  sw842_data2},
+	{ sw842_ptr2,  sw842_data2, sw842_ptr2,  sw842_ptr2},
+	{ sw842_ptr2,  sw842_data2, sw842_ptr4,  NULL},
+	{ sw842_ptr2,  sw842_ptr2,  sw842_data4, NULL},
+	{ sw842_ptr2,  sw842_ptr2,  sw842_data2, sw842_ptr2},
+	{ sw842_ptr2,  sw842_ptr2,  sw842_ptr2,  sw842_data2},
+	{ sw842_ptr2,  sw842_ptr2,  sw842_ptr2,  sw842_ptr2},
+	{ sw842_ptr2,  sw842_ptr2,  sw842_ptr4,  NULL},
+	{ sw842_ptr4,  sw842_data4, NULL}, /* 20 (10100) */
+	{ sw842_ptr4,  sw842_data2, sw842_ptr2,  NULL},
+	{ sw842_ptr4,  sw842_ptr2,  sw842_data2, NULL},
+	{ sw842_ptr4,  sw842_ptr2,  sw842_ptr2,  NULL},
+	{ sw842_ptr4,  sw842_ptr4,  NULL},
+	{ sw842_ptr8,  NULL}
+};
+
+/* Software decompress helpers */
+
+static uint8_t sw842_get_byte(const char *buf, int bit)
+{
+	uint8_t tmpl;
+	uint16_t tmp;
+	tmp = htons(*(uint16_t *)(buf));
+	tmp = (uint16_t)(tmp << bit);
+	tmp = ntohs(tmp);
+	memcpy(&tmpl, &tmp, 1);
+	return tmpl;
+}
+
+static uint8_t sw842_get_template(const char **buf, int *bit)
+{
+	uint8_t byte;
+	byte = sw842_get_byte(*buf, *bit);
+	byte = byte >> 3;
+	byte &= 0x1F;
+	*buf += (*bit + 5) / 8;
+	*bit = (*bit + 5) % 8;
+	return byte;
+}
+
+/* repeat_count happens to be 5-bit too (like the template) */
+static uint8_t sw842_get_repeat_count(const char **buf, int *bit)
+{
+	uint8_t byte;
+	byte = sw842_get_byte(*buf, *bit);
+	byte = byte >> 2;
+	byte &= 0x3F;
+	*buf += (*bit + 6) / 8;
+	*bit = (*bit + 6) % 8;
+	return byte;
+}
+
+static uint8_t sw842_get_ptr2(const char **buf, int *bit)
+{
+	uint8_t ptr;
+	ptr = sw842_get_byte(*buf, *bit);
+	(*buf)++;
+	return ptr;
+}
+
+static uint16_t sw842_get_ptr4(const char **buf, int *bit,
+		struct sw842_fifo *fifo)
+{
+	uint16_t ptr;
+	ptr = htons(*(uint16_t *)(*buf));
+	ptr = (uint16_t)(ptr << *bit);
+	ptr = ptr >> 7;
+	ptr &= 0x01FF;
+	*buf += (*bit + 9) / 8;
+	*bit = (*bit + 9) % 8;
+	return ptr;
+}
+
+static uint8_t sw842_get_ptr8(const char **buf, int *bit,
+		struct sw842_fifo *fifo)
+{
+	return sw842_get_ptr2(buf, bit);
+}
+
+/* Software decompress template ops */
+
+static int sw842_data8(const char **inbuf, int *inbit,
+		unsigned char **outbuf, struct sw842_fifo *fifo)
+{
+	int ret;
+
+	ret = sw842_data4(inbuf, inbit, outbuf, fifo);
+	if (ret)
+		return ret;
+	ret = sw842_data4(inbuf, inbit, outbuf, fifo);
+	return ret;
+}
+
+static int sw842_data4(const char **inbuf, int *inbit,
+		unsigned char **outbuf, struct sw842_fifo *fifo)
+{
+	int ret;
+
+	ret = sw842_data2(inbuf, inbit, outbuf, fifo);
+	if (ret)
+		return ret;
+	ret = sw842_data2(inbuf, inbit, outbuf, fifo);
+	return ret;
+}
+
+static int sw842_data2(const char **inbuf, int *inbit,
+		unsigned char **outbuf, struct sw842_fifo *fifo)
+{
+	**outbuf = sw842_get_byte(*inbuf, *inbit);
+	(*inbuf)++;
+	(*outbuf)++;
+	**outbuf = sw842_get_byte(*inbuf, *inbit);
+	(*inbuf)++;
+	(*outbuf)++;
+	return 0;
+}
+
+static int sw842_ptr8(const char **inbuf, int *inbit,
+		unsigned char **outbuf, struct sw842_fifo *fifo)
+{
+	uint8_t ptr;
+	ptr = sw842_get_ptr8(inbuf, inbit, fifo);
+	if (!fifo->f84_full && (ptr >= fifo->f8_count))
+		return 1;
+	memcpy(*outbuf, fifo->f8[ptr], 8);
+	*outbuf += 8;
+	return 0;
+}
+
+static int sw842_ptr4(const char **inbuf, int *inbit,
+		unsigned char **outbuf, struct sw842_fifo *fifo)
+{
+	uint16_t ptr;
+	ptr = sw842_get_ptr4(inbuf, inbit, fifo);
+	if (!fifo->f84_full && (ptr >= fifo->f4_count))
+		return 1;
+	memcpy(*outbuf, fifo->f4[ptr], 4);
+	*outbuf += 4;
+	return 0;
+}
+
+static int sw842_ptr2(const char **inbuf, int *inbit,
+		unsigned char **outbuf, struct sw842_fifo *fifo)
+{
+	uint8_t ptr;
+	ptr = sw842_get_ptr2(inbuf, inbit);
+	if (!fifo->f2_full && (ptr >= fifo->f2_count))
+		return 1;
+	memcpy(*outbuf, fifo->f2[ptr], 2);
+	*outbuf += 2;
+	return 0;
+}
+
+static void sw842_copy_to_fifo(const char *buf, struct sw842_fifo *fifo)
+{
+	unsigned char initial_f2count = fifo->f2_count;
+
+	memcpy(fifo->f8[fifo->f8_count], buf, 8);
+	fifo->f4_count += 2;
+	fifo->f8_count += 1;
+
+	if (!fifo->f84_full && fifo->f4_count >= 512) {
+		fifo->f84_full = 1;
+		fifo->f4_count /= 512;
+	}
+
+	memcpy(fifo->f2[fifo->f2_count++], buf, 2);
+	memcpy(fifo->f2[fifo->f2_count++], buf + 2, 2);
+	memcpy(fifo->f2[fifo->f2_count++], buf + 4, 2);
+	memcpy(fifo->f2[fifo->f2_count++], buf + 6, 2);
+	if (fifo->f2_count < initial_f2count)
+		fifo->f2_full = 1;
+}
+
+static int sw842_decompress(const unsigned char *src, int srclen,
+			unsigned char *dst, int *destlen,
+			const void *wrkmem)
+{
+	uint8_t tmpl;
+	const char *inbuf;
+	int inbit = 0;
+	unsigned char *outbuf, *outbuf_end, *origbuf, *prevbuf;
+	const char *inbuf_end;
+	sw842_template_op op;
+	int opindex;
+	int i, repeat_count;
+	struct sw842_fifo *fifo;
+	int ret = 0;
+
+	fifo = &((struct nx842_workmem *)(wrkmem))->swfifo;
+	memset(fifo, 0, sizeof(*fifo));
+
+	origbuf = NULL;
+	inbuf = src;
+	inbuf_end = src + srclen;
+	outbuf = dst;
+	outbuf_end = dst + *destlen;
+
+	while ((tmpl = sw842_get_template(&inbuf, &inbit)) != SW842_TMPL_EOF) {
+		if (inbuf >= inbuf_end) {
+			ret = -EINVAL;
+			goto out;
+		}
+
+		opindex = 0;
+		prevbuf = origbuf;
+		origbuf = outbuf;
+		switch (tmpl) {
+		case SW842_TMPL_REPEAT:
+			if (prevbuf == NULL) {
+				ret = -EINVAL;
+				goto out;
+			}
+
+			repeat_count = sw842_get_repeat_count(&inbuf,
+								&inbit) + 1;
+
+			/* Did the repeat count advance past the end of input */
+			if (inbuf > inbuf_end) {
+				ret = -EINVAL;
+				goto out;
+			}
+
+			for (i = 0; i < repeat_count; i++) {
+				/* Would this overflow the output buffer */
+				if ((outbuf + 8) > outbuf_end) {
+					ret = -ENOSPC;
+					goto out;
+				}
+
+				memcpy(outbuf, prevbuf, 8);
+				sw842_copy_to_fifo(outbuf, fifo);
+				outbuf += 8;
+			}
+			break;
+
+		case SW842_TMPL_ZEROS:
+			/* Would this overflow the output buffer */
+			if ((outbuf + 8) > outbuf_end) {
+				ret = -ENOSPC;
+				goto out;
+			}
+
+			memset(outbuf, 0, 8);
+			sw842_copy_to_fifo(outbuf, fifo);
+			outbuf += 8;
+			break;
+
+		default:
+			if (tmpl > 25) {
+				ret = -EINVAL;
+				goto out;
+			}
+
+			/* Does this go past the end of the input buffer */
+			if ((inbuf + 2) > inbuf_end) {
+				ret = -EINVAL;
+				goto out;
+			}
+
+			/* Would this overflow the output buffer */
+			if ((outbuf + 8) > outbuf_end) {
+				ret = -ENOSPC;
+				goto out;
+			}
+
+			while (opindex < 4 &&
+				(op = sw842_tmpl_ops[tmpl][opindex++])
+					!= NULL) {
+				ret = (*op)(&inbuf, &inbit, &outbuf, fifo);
+				if (ret) {
+					ret = -EINVAL;
+					goto out;
+				}
+				sw842_copy_to_fifo(origbuf, fifo);
+			}
+		}
+	}
+
+out:
+	if (!ret)
+		*destlen = (unsigned int)(outbuf - dst);
+	else
+		*destlen = 0;
+
+	return ret;
+}
diff --git a/drivers/crypto/nx/nx-842.c b/drivers/crypto/nx/nx-842.c
deleted file mode 100644
index 887196e..0000000
--- a/drivers/crypto/nx/nx-842.c
+++ /dev/null
@@ -1,1603 +0,0 @@
-/*
- * Driver for IBM Power 842 compression accelerator
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
- *
- * Copyright (C) IBM Corporation, 2012
- *
- * Authors: Robert Jennings <rcj@linux.vnet.ibm.com>
- *          Seth Jennings <sjenning@linux.vnet.ibm.com>
- */
-
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/nx842.h>
-#include <linux/of.h>
-#include <linux/slab.h>
-
-#include <asm/page.h>
-#include <asm/vio.h>
-
-#include "nx_csbcpb.h" /* struct nx_csbcpb */
-
-#define MODULE_NAME "nx-compress"
-MODULE_LICENSE("GPL");
-MODULE_AUTHOR("Robert Jennings <rcj@linux.vnet.ibm.com>");
-MODULE_DESCRIPTION("842 H/W Compression driver for IBM Power processors");
-
-#define SHIFT_4K 12
-#define SHIFT_64K 16
-#define SIZE_4K (1UL << SHIFT_4K)
-#define SIZE_64K (1UL << SHIFT_64K)
-
-/* IO buffer must be 128 byte aligned */
-#define IO_BUFFER_ALIGN 128
-
-struct nx842_header {
-	int blocks_nr; /* number of compressed blocks */
-	int offset; /* offset of the first block (from beginning of header) */
-	int sizes[0]; /* size of compressed blocks */
-};
-
-static inline int nx842_header_size(const struct nx842_header *hdr)
-{
-	return sizeof(struct nx842_header) +
-			hdr->blocks_nr * sizeof(hdr->sizes[0]);
-}
-
-/* Macros for fields within nx_csbcpb */
-/* Check the valid bit within the csbcpb valid field */
-#define NX842_CSBCBP_VALID_CHK(x) (x & BIT_MASK(7))
-
-/* CE macros operate on the completion_extension field bits in the csbcpb.
- * CE0 0=full completion, 1=partial completion
- * CE1 0=CE0 indicates completion, 1=termination (output may be modified)
- * CE2 0=processed_bytes is source bytes, 1=processed_bytes is target bytes */
-#define NX842_CSBCPB_CE0(x)	(x & BIT_MASK(7))
-#define NX842_CSBCPB_CE1(x)	(x & BIT_MASK(6))
-#define NX842_CSBCPB_CE2(x)	(x & BIT_MASK(5))
-
-/* The NX unit accepts data only on 4K page boundaries */
-#define NX842_HW_PAGE_SHIFT	SHIFT_4K
-#define NX842_HW_PAGE_SIZE	(ASM_CONST(1) << NX842_HW_PAGE_SHIFT)
-#define NX842_HW_PAGE_MASK	(~(NX842_HW_PAGE_SIZE-1))
-
-enum nx842_status {
-	UNAVAILABLE,
-	AVAILABLE
-};
-
-struct ibm_nx842_counters {
-	atomic64_t comp_complete;
-	atomic64_t comp_failed;
-	atomic64_t decomp_complete;
-	atomic64_t decomp_failed;
-	atomic64_t swdecomp;
-	atomic64_t comp_times[32];
-	atomic64_t decomp_times[32];
-};
-
-static struct nx842_devdata {
-	struct vio_dev *vdev;
-	struct device *dev;
-	struct ibm_nx842_counters *counters;
-	unsigned int max_sg_len;
-	unsigned int max_sync_size;
-	unsigned int max_sync_sg;
-	enum nx842_status status;
-} __rcu *devdata;
-static DEFINE_SPINLOCK(devdata_mutex);
-
-#define NX842_COUNTER_INC(_x) \
-static inline void nx842_inc_##_x( \
-	const struct nx842_devdata *dev) { \
-	if (dev) \
-		atomic64_inc(&dev->counters->_x); \
-}
-NX842_COUNTER_INC(comp_complete);
-NX842_COUNTER_INC(comp_failed);
-NX842_COUNTER_INC(decomp_complete);
-NX842_COUNTER_INC(decomp_failed);
-NX842_COUNTER_INC(swdecomp);
-
-#define NX842_HIST_SLOTS 16
-
-static void ibm_nx842_incr_hist(atomic64_t *times, unsigned int time)
-{
-	int bucket = fls(time);
-
-	if (bucket)
-		bucket = min((NX842_HIST_SLOTS - 1), bucket - 1);
-
-	atomic64_inc(&times[bucket]);
-}
-
-/* NX unit operation flags */
-#define NX842_OP_COMPRESS	0x0
-#define NX842_OP_CRC		0x1
-#define NX842_OP_DECOMPRESS	0x2
-#define NX842_OP_COMPRESS_CRC   (NX842_OP_COMPRESS | NX842_OP_CRC)
-#define NX842_OP_DECOMPRESS_CRC (NX842_OP_DECOMPRESS | NX842_OP_CRC)
-#define NX842_OP_ASYNC		(1<<23)
-#define NX842_OP_NOTIFY		(1<<22)
-#define NX842_OP_NOTIFY_INT(x)	((x & 0xff)<<8)
-
-static unsigned long nx842_get_desired_dma(struct vio_dev *viodev)
-{
-	/* No use of DMA mappings within the driver. */
-	return 0;
-}
-
-struct nx842_slentry {
-	unsigned long ptr; /* Real address (use __pa()) */
-	unsigned long len;
-};
-
-/* pHyp scatterlist entry */
-struct nx842_scatterlist {
-	int entry_nr; /* number of slentries */
-	struct nx842_slentry *entries; /* ptr to array of slentries */
-};
-
-/* Does not include sizeof(entry_nr) in the size */
-static inline unsigned long nx842_get_scatterlist_size(
-				struct nx842_scatterlist *sl)
-{
-	return sl->entry_nr * sizeof(struct nx842_slentry);
-}
-
-static inline unsigned long nx842_get_pa(void *addr)
-{
-	if (is_vmalloc_addr(addr))
-		return page_to_phys(vmalloc_to_page(addr))
-		       + offset_in_page(addr);
-	else
-		return __pa(addr);
-}
-
-static int nx842_build_scatterlist(unsigned long buf, int len,
-			struct nx842_scatterlist *sl)
-{
-	unsigned long nextpage;
-	struct nx842_slentry *entry;
-
-	sl->entry_nr = 0;
-
-	entry = sl->entries;
-	while (len) {
-		entry->ptr = nx842_get_pa((void *)buf);
-		nextpage = ALIGN(buf + 1, NX842_HW_PAGE_SIZE);
-		if (nextpage < buf + len) {
-			/* we aren't at the end yet */
-			if (IS_ALIGNED(buf, NX842_HW_PAGE_SIZE))
-				/* we are in the middle (or beginning) */
-				entry->len = NX842_HW_PAGE_SIZE;
-			else
-				/* we are at the beginning */
-				entry->len = nextpage - buf;
-		} else {
-			/* at the end */
-			entry->len = len;
-		}
-
-		len -= entry->len;
-		buf += entry->len;
-		sl->entry_nr++;
-		entry++;
-	}
-
-	return 0;
-}
-
-/*
- * Working memory for software decompression
- */
-struct sw842_fifo {
-	union {
-		char f8[256][8];
-		char f4[512][4];
-	};
-	char f2[256][2];
-	unsigned char f84_full;
-	unsigned char f2_full;
-	unsigned char f8_count;
-	unsigned char f2_count;
-	unsigned int f4_count;
-};
-
-/*
- * Working memory for crypto API
- */
-struct nx842_workmem {
-	char bounce[PAGE_SIZE]; /* bounce buffer for decompression input */
-	union {
-		/* hardware working memory */
-		struct {
-			/* scatterlist */
-			char slin[SIZE_4K];
-			char slout[SIZE_4K];
-			/* coprocessor status/parameter block */
-			struct nx_csbcpb csbcpb;
-		};
-		/* software working memory */
-		struct sw842_fifo swfifo; /* software decompression fifo */
-	};
-};
-
-int nx842_get_workmem_size(void)
-{
-	return sizeof(struct nx842_workmem) + NX842_HW_PAGE_SIZE;
-}
-EXPORT_SYMBOL_GPL(nx842_get_workmem_size);
-
-int nx842_get_workmem_size_aligned(void)
-{
-	return sizeof(struct nx842_workmem);
-}
-EXPORT_SYMBOL_GPL(nx842_get_workmem_size_aligned);
-
-static int nx842_validate_result(struct device *dev,
-	struct cop_status_block *csb)
-{
-	/* The csb must be valid after returning from vio_h_cop_sync */
-	if (!NX842_CSBCBP_VALID_CHK(csb->valid)) {
-		dev_err(dev, "%s: cspcbp not valid upon completion.\n",
-				__func__);
-		dev_dbg(dev, "valid:0x%02x cs:0x%02x cc:0x%02x ce:0x%02x\n",
-				csb->valid,
-				csb->crb_seq_number,
-				csb->completion_code,
-				csb->completion_extension);
-		dev_dbg(dev, "processed_bytes:%d address:0x%016lx\n",
-				csb->processed_byte_count,
-				(unsigned long)csb->address);
-		return -EIO;
-	}
-
-	/* Check return values from the hardware in the CSB */
-	switch (csb->completion_code) {
-	case 0:	/* Completed without error */
-		break;
-	case 64: /* Target bytes > Source bytes during compression */
-	case 13: /* Output buffer too small */
-		dev_dbg(dev, "%s: Compression output larger than input\n",
-					__func__);
-		return -ENOSPC;
-	case 66: /* Input data contains an illegal template field */
-	case 67: /* Template indicates data past the end of the input stream */
-		dev_dbg(dev, "%s: Bad data for decompression (code:%d)\n",
-					__func__, csb->completion_code);
-		return -EINVAL;
-	default:
-		dev_dbg(dev, "%s: Unspecified error (code:%d)\n",
-					__func__, csb->completion_code);
-		return -EIO;
-	}
-
-	/* Hardware sanity check */
-	if (!NX842_CSBCPB_CE2(csb->completion_extension)) {
-		dev_err(dev, "%s: No error returned by hardware, but "
-				"data returned is unusable, contact support.\n"
-				"(Additional info: csbcbp->processed bytes "
-				"does not specify processed bytes for the "
-				"target buffer.)\n", __func__);
-		return -EIO;
-	}
-
-	return 0;
-}
-
-/**
- * nx842_compress - Compress data using the 842 algorithm
- *
- * Compression provide by the NX842 coprocessor on IBM Power systems.
- * The input buffer is compressed and the result is stored in the
- * provided output buffer.
- *
- * Upon return from this function @outlen contains the length of the
- * compressed data.  If there is an error then @outlen will be 0 and an
- * error will be specified by the return code from this function.
- *
- * @in: Pointer to input buffer, must be page aligned
- * @inlen: Length of input buffer, must be PAGE_SIZE
- * @out: Pointer to output buffer
- * @outlen: Length of output buffer
- * @wrkmem: ptr to buffer for working memory, size determined by
- *          nx842_get_workmem_size()
- *
- * Returns:
- *   0		Success, output of length @outlen stored in the buffer at @out
- *   -ENOMEM	Unable to allocate internal buffers
- *   -ENOSPC	Output buffer is to small
- *   -EMSGSIZE	XXX Difficult to describe this limitation
- *   -EIO	Internal error
- *   -ENODEV	Hardware unavailable
- */
-int nx842_compress(const unsigned char *in, unsigned int inlen,
-		       unsigned char *out, unsigned int *outlen, void *wmem)
-{
-	struct nx842_header *hdr;
-	struct nx842_devdata *local_devdata;
-	struct device *dev = NULL;
-	struct nx842_workmem *workmem;
-	struct nx842_scatterlist slin, slout;
-	struct nx_csbcpb *csbcpb;
-	int ret = 0, max_sync_size, i, bytesleft, size, hdrsize;
-	unsigned long inbuf, outbuf, padding;
-	struct vio_pfo_op op = {
-		.done = NULL,
-		.handle = 0,
-		.timeout = 0,
-	};
-	unsigned long start_time = get_tb();
-
-	/*
-	 * Make sure input buffer is 64k page aligned.  This is assumed since
-	 * this driver is designed for page compression only (for now).  This
-	 * is very nice since we can now use direct DDE(s) for the input and
-	 * the alignment is guaranteed.
-	*/
-	inbuf = (unsigned long)in;
-	if (!IS_ALIGNED(inbuf, PAGE_SIZE) || inlen != PAGE_SIZE)
-		return -EINVAL;
-
-	rcu_read_lock();
-	local_devdata = rcu_dereference(devdata);
-	if (!local_devdata || !local_devdata->dev) {
-		rcu_read_unlock();
-		return -ENODEV;
-	}
-	max_sync_size = local_devdata->max_sync_size;
-	dev = local_devdata->dev;
-
-	/* Create the header */
-	hdr = (struct nx842_header *)out;
-	hdr->blocks_nr = PAGE_SIZE / max_sync_size;
-	hdrsize = nx842_header_size(hdr);
-	outbuf = (unsigned long)out + hdrsize;
-	bytesleft = *outlen - hdrsize;
-
-	/* Init scatterlist */
-	workmem = (struct nx842_workmem *)ALIGN((unsigned long)wmem,
-		NX842_HW_PAGE_SIZE);
-	slin.entries = (struct nx842_slentry *)workmem->slin;
-	slout.entries = (struct nx842_slentry *)workmem->slout;
-
-	/* Init operation */
-	op.flags = NX842_OP_COMPRESS;
-	csbcpb = &workmem->csbcpb;
-	memset(csbcpb, 0, sizeof(*csbcpb));
-	op.csbcpb = nx842_get_pa(csbcpb);
-	op.out = nx842_get_pa(slout.entries);
-
-	for (i = 0; i < hdr->blocks_nr; i++) {
-		/*
-		 * Aligning the output blocks to 128 bytes does waste space,
-		 * but it prevents the need for bounce buffers and memory
-		 * copies.  It also simplifies the code a lot.  In the worst
-		 * case (64k page, 4k max_sync_size), you lose up to
-		 * (128*16)/64k = ~3% the compression factor. For 64k
-		 * max_sync_size, the loss would be at most 128/64k = ~0.2%.
-		 */
-		padding = ALIGN(outbuf, IO_BUFFER_ALIGN) - outbuf;
-		outbuf += padding;
-		bytesleft -= padding;
-		if (i == 0)
-			/* save offset into first block in header */
-			hdr->offset = padding + hdrsize;
-
-		if (bytesleft <= 0) {
-			ret = -ENOSPC;
-			goto unlock;
-		}
-
-		/*
-		 * NOTE: If the default max_sync_size is changed from 4k
-		 * to 64k, remove the "likely" case below, since a
-		 * scatterlist will always be needed.
-		 */
-		if (likely(max_sync_size == NX842_HW_PAGE_SIZE)) {
-			/* Create direct DDE */
-			op.in = nx842_get_pa((void *)inbuf);
-			op.inlen = max_sync_size;
-
-		} else {
-			/* Create indirect DDE (scatterlist) */
-			nx842_build_scatterlist(inbuf, max_sync_size, &slin);
-			op.in = nx842_get_pa(slin.entries);
-			op.inlen = -nx842_get_scatterlist_size(&slin);
-		}
-
-		/*
-		 * If max_sync_size != NX842_HW_PAGE_SIZE, an indirect
-		 * DDE is required for the outbuf.
-		 * If max_sync_size == NX842_HW_PAGE_SIZE, outbuf must
-		 * also be page aligned (1 in 128/4k=32 chance) in order
-		 * to use a direct DDE.
-		 * This is unlikely, just use an indirect DDE always.
-		 */
-		nx842_build_scatterlist(outbuf,
-			min(bytesleft, max_sync_size), &slout);
-		/* op.out set before loop */
-		op.outlen = -nx842_get_scatterlist_size(&slout);
-
-		/* Send request to pHyp */
-		ret = vio_h_cop_sync(local_devdata->vdev, &op);
-
-		/* Check for pHyp error */
-		if (ret) {
-			dev_dbg(dev, "%s: vio_h_cop_sync error (ret=%d, hret=%ld)\n",
-				__func__, ret, op.hcall_err);
-			ret = -EIO;
-			goto unlock;
-		}
-
-		/* Check for hardware error */
-		ret = nx842_validate_result(dev, &csbcpb->csb);
-		if (ret && ret != -ENOSPC)
-			goto unlock;
-
-		/* Handle incompressible data */
-		if (unlikely(ret == -ENOSPC)) {
-			if (bytesleft < max_sync_size) {
-				/*
-				 * Not enough space left in the output buffer
-				 * to store uncompressed block
-				 */
-				goto unlock;
-			} else {
-				/* Store incompressible block */
-				memcpy((void *)outbuf, (void *)inbuf,
-					max_sync_size);
-				hdr->sizes[i] = -max_sync_size;
-				outbuf += max_sync_size;
-				bytesleft -= max_sync_size;
-				/* Reset ret, incompressible data handled */
-				ret = 0;
-			}
-		} else {
-			/* Normal case, compression was successful */
-			size = csbcpb->csb.processed_byte_count;
-			dev_dbg(dev, "%s: processed_bytes=%d\n",
-				__func__, size);
-			hdr->sizes[i] = size;
-			outbuf += size;
-			bytesleft -= size;
-		}
-
-		inbuf += max_sync_size;
-	}
-
-	*outlen = (unsigned int)(outbuf - (unsigned long)out);
-
-unlock:
-	if (ret)
-		nx842_inc_comp_failed(local_devdata);
-	else {
-		nx842_inc_comp_complete(local_devdata);
-		ibm_nx842_incr_hist(local_devdata->counters->comp_times,
-			(get_tb() - start_time) / tb_ticks_per_usec);
-	}
-	rcu_read_unlock();
-	return ret;
-}
-EXPORT_SYMBOL_GPL(nx842_compress);
-
-static int sw842_decompress(const unsigned char *, int, unsigned char *, int *,
-			const void *);
-
-/**
- * nx842_decompress - Decompress data using the 842 algorithm
- *
- * Decompression provide by the NX842 coprocessor on IBM Power systems.
- * The input buffer is decompressed and the result is stored in the
- * provided output buffer.  The size allocated to the output buffer is
- * provided by the caller of this function in @outlen.  Upon return from
- * this function @outlen contains the length of the decompressed data.
- * If there is an error then @outlen will be 0 and an error will be
- * specified by the return code from this function.
- *
- * @in: Pointer to input buffer, will use bounce buffer if not 128 byte
- *      aligned
- * @inlen: Length of input buffer
- * @out: Pointer to output buffer, must be page aligned
- * @outlen: Length of output buffer, must be PAGE_SIZE
- * @wrkmem: ptr to buffer for working memory, size determined by
- *          nx842_get_workmem_size()
- *
- * Returns:
- *   0		Success, output of length @outlen stored in the buffer at @out
- *   -ENODEV	Hardware decompression device is unavailable
- *   -ENOMEM	Unable to allocate internal buffers
- *   -ENOSPC	Output buffer is to small
- *   -EINVAL	Bad input data encountered when attempting decompress
- *   -EIO	Internal error
- */
-int nx842_decompress(const unsigned char *in, unsigned int inlen,
-			 unsigned char *out, unsigned int *outlen, void *wmem)
-{
-	struct nx842_header *hdr;
-	struct nx842_devdata *local_devdata;
-	struct device *dev = NULL;
-	struct nx842_workmem *workmem;
-	struct nx842_scatterlist slin, slout;
-	struct nx_csbcpb *csbcpb;
-	int ret = 0, i, size, max_sync_size;
-	unsigned long inbuf, outbuf;
-	struct vio_pfo_op op = {
-		.done = NULL,
-		.handle = 0,
-		.timeout = 0,
-	};
-	unsigned long start_time = get_tb();
-
-	/* Ensure page alignment and size */
-	outbuf = (unsigned long)out;
-	if (!IS_ALIGNED(outbuf, PAGE_SIZE) || *outlen != PAGE_SIZE)
-		return -EINVAL;
-
-	rcu_read_lock();
-	local_devdata = rcu_dereference(devdata);
-	if (local_devdata)
-		dev = local_devdata->dev;
-
-	/* Get header */
-	hdr = (struct nx842_header *)in;
-
-	workmem = (struct nx842_workmem *)ALIGN((unsigned long)wmem,
-		NX842_HW_PAGE_SIZE);
-
-	inbuf = (unsigned long)in + hdr->offset;
-	if (likely(!IS_ALIGNED(inbuf, IO_BUFFER_ALIGN))) {
-		/* Copy block(s) into bounce buffer for alignment */
-		memcpy(workmem->bounce, in + hdr->offset, inlen - hdr->offset);
-		inbuf = (unsigned long)workmem->bounce;
-	}
-
-	/* Init scatterlist */
-	slin.entries = (struct nx842_slentry *)workmem->slin;
-	slout.entries = (struct nx842_slentry *)workmem->slout;
-
-	/* Init operation */
-	op.flags = NX842_OP_DECOMPRESS;
-	csbcpb = &workmem->csbcpb;
-	memset(csbcpb, 0, sizeof(*csbcpb));
-	op.csbcpb = nx842_get_pa(csbcpb);
-
-	/*
-	 * max_sync_size may have changed since compression,
-	 * so we can't read it from the device info. We need
-	 * to derive it from hdr->blocks_nr.
-	 */
-	max_sync_size = PAGE_SIZE / hdr->blocks_nr;
-
-	for (i = 0; i < hdr->blocks_nr; i++) {
-		/* Skip padding */
-		inbuf = ALIGN(inbuf, IO_BUFFER_ALIGN);
-
-		if (hdr->sizes[i] < 0) {
-			/* Negative sizes indicate uncompressed data blocks */
-			size = abs(hdr->sizes[i]);
-			memcpy((void *)outbuf, (void *)inbuf, size);
-			outbuf += size;
-			inbuf += size;
-			continue;
-		}
-
-		if (!dev)
-			goto sw;
-
-		/*
-		 * The better the compression, the more likely the "likely"
-		 * case becomes.
-		 */
-		if (likely((inbuf & NX842_HW_PAGE_MASK) ==
-			((inbuf + hdr->sizes[i] - 1) & NX842_HW_PAGE_MASK))) {
-			/* Create direct DDE */
-			op.in = nx842_get_pa((void *)inbuf);
-			op.inlen = hdr->sizes[i];
-		} else {
-			/* Create indirect DDE (scatterlist) */
-			nx842_build_scatterlist(inbuf, hdr->sizes[i] , &slin);
-			op.in = nx842_get_pa(slin.entries);
-			op.inlen = -nx842_get_scatterlist_size(&slin);
-		}
-
-		/*
-		 * NOTE: If the default max_sync_size is changed from 4k
-		 * to 64k, remove the "likely" case below, since a
-		 * scatterlist will always be needed.
-		 */
-		if (likely(max_sync_size == NX842_HW_PAGE_SIZE)) {
-			/* Create direct DDE */
-			op.out = nx842_get_pa((void *)outbuf);
-			op.outlen = max_sync_size;
-		} else {
-			/* Create indirect DDE (scatterlist) */
-			nx842_build_scatterlist(outbuf, max_sync_size, &slout);
-			op.out = nx842_get_pa(slout.entries);
-			op.outlen = -nx842_get_scatterlist_size(&slout);
-		}
-
-		/* Send request to pHyp */
-		ret = vio_h_cop_sync(local_devdata->vdev, &op);
-
-		/* Check for pHyp error */
-		if (ret) {
-			dev_dbg(dev, "%s: vio_h_cop_sync error (ret=%d, hret=%ld)\n",
-				__func__, ret, op.hcall_err);
-			dev = NULL;
-			goto sw;
-		}
-
-		/* Check for hardware error */
-		ret = nx842_validate_result(dev, &csbcpb->csb);
-		if (ret) {
-			dev = NULL;
-			goto sw;
-		}
-
-		/* HW decompression success */
-		inbuf += hdr->sizes[i];
-		outbuf += csbcpb->csb.processed_byte_count;
-		continue;
-
-sw:
-		/* software decompression */
-		size = max_sync_size;
-		ret = sw842_decompress(
-			(unsigned char *)inbuf, hdr->sizes[i],
-			(unsigned char *)outbuf, &size, wmem);
-		if (ret)
-			pr_debug("%s: sw842_decompress failed with %d\n",
-				__func__, ret);
-
-		if (ret) {
-			if (ret != -ENOSPC && ret != -EINVAL &&
-					ret != -EMSGSIZE)
-				ret = -EIO;
-			goto unlock;
-		}
-
-		/* SW decompression success */
-		inbuf += hdr->sizes[i];
-		outbuf += size;
-	}
-
-	*outlen = (unsigned int)(outbuf - (unsigned long)out);
-
-unlock:
-	if (ret)
-		/* decompress fail */
-		nx842_inc_decomp_failed(local_devdata);
-	else {
-		if (!dev)
-			/* software decompress */
-			nx842_inc_swdecomp(local_devdata);
-		nx842_inc_decomp_complete(local_devdata);
-		ibm_nx842_incr_hist(local_devdata->counters->decomp_times,
-			(get_tb() - start_time) / tb_ticks_per_usec);
-	}
-
-	rcu_read_unlock();
-	return ret;
-}
-EXPORT_SYMBOL_GPL(nx842_decompress);
-
-/**
- * nx842_OF_set_defaults -- Set default (disabled) values for devdata
- *
- * @devdata - struct nx842_devdata to update
- *
- * Returns:
- *  0 on success
- *  -ENOENT if @devdata ptr is NULL
- */
-static int nx842_OF_set_defaults(struct nx842_devdata *devdata)
-{
-	if (devdata) {
-		devdata->max_sync_size = 0;
-		devdata->max_sync_sg = 0;
-		devdata->max_sg_len = 0;
-		devdata->status = UNAVAILABLE;
-		return 0;
-	} else
-		return -ENOENT;
-}
-
-/**
- * nx842_OF_upd_status -- Update the device info from OF status prop
- *
- * The status property indicates if the accelerator is enabled.  If the
- * device is in the OF tree it indicates that the hardware is present.
- * The status field indicates if the device is enabled when the status
- * is 'okay'.  Otherwise the device driver will be disabled.
- *
- * @devdata - struct nx842_devdata to update
- * @prop - struct property point containing the maxsyncop for the update
- *
- * Returns:
- *  0 - Device is available
- *  -EINVAL - Device is not available
- */
-static int nx842_OF_upd_status(struct nx842_devdata *devdata,
-					struct property *prop) {
-	int ret = 0;
-	const char *status = (const char *)prop->value;
-
-	if (!strncmp(status, "okay", (size_t)prop->length)) {
-		devdata->status = AVAILABLE;
-	} else {
-		dev_info(devdata->dev, "%s: status '%s' is not 'okay'\n",
-				__func__, status);
-		devdata->status = UNAVAILABLE;
-	}
-
-	return ret;
-}
-
-/**
- * nx842_OF_upd_maxsglen -- Update the device info from OF maxsglen prop
- *
- * Definition of the 'ibm,max-sg-len' OF property:
- *  This field indicates the maximum byte length of a scatter list
- *  for the platform facility. It is a single cell encoded as with encode-int.
- *
- * Example:
- *  # od -x ibm,max-sg-len
- *  0000000 0000 0ff0
- *
- *  In this example, the maximum byte length of a scatter list is
- *  0x0ff0 (4,080).
- *
- * @devdata - struct nx842_devdata to update
- * @prop - struct property point containing the maxsyncop for the update
- *
- * Returns:
- *  0 on success
- *  -EINVAL on failure
- */
-static int nx842_OF_upd_maxsglen(struct nx842_devdata *devdata,
-					struct property *prop) {
-	int ret = 0;
-	const int *maxsglen = prop->value;
-
-	if (prop->length != sizeof(*maxsglen)) {
-		dev_err(devdata->dev, "%s: unexpected format for ibm,max-sg-len property\n", __func__);
-		dev_dbg(devdata->dev, "%s: ibm,max-sg-len is %d bytes long, expected %lu bytes\n", __func__,
-				prop->length, sizeof(*maxsglen));
-		ret = -EINVAL;
-	} else {
-		devdata->max_sg_len = (unsigned int)min(*maxsglen,
-				(int)NX842_HW_PAGE_SIZE);
-	}
-
-	return ret;
-}
-
-/**
- * nx842_OF_upd_maxsyncop -- Update the device info from OF maxsyncop prop
- *
- * Definition of the 'ibm,max-sync-cop' OF property:
- *  Two series of cells.  The first series of cells represents the maximums
- *  that can be synchronously compressed. The second series of cells
- *  represents the maximums that can be synchronously decompressed.
- *  1. The first cell in each series contains the count of the number of
- *     data length, scatter list elements pairs that follow – each being
- *     of the form
- *    a. One cell data byte length
- *    b. One cell total number of scatter list elements
- *
- * Example:
- *  # od -x ibm,max-sync-cop
- *  0000000 0000 0001 0000 1000 0000 01fe 0000 0001
- *  0000020 0000 1000 0000 01fe
- *
- *  In this example, compression supports 0x1000 (4,096) data byte length
- *  and 0x1fe (510) total scatter list elements.  Decompression supports
- *  0x1000 (4,096) data byte length and 0x1f3 (510) total scatter list
- *  elements.
- *
- * @devdata - struct nx842_devdata to update
- * @prop - struct property point containing the maxsyncop for the update
- *
- * Returns:
- *  0 on success
- *  -EINVAL on failure
- */
-static int nx842_OF_upd_maxsyncop(struct nx842_devdata *devdata,
-					struct property *prop) {
-	int ret = 0;
-	const struct maxsynccop_t {
-		int comp_elements;
-		int comp_data_limit;
-		int comp_sg_limit;
-		int decomp_elements;
-		int decomp_data_limit;
-		int decomp_sg_limit;
-	} *maxsynccop;
-
-	if (prop->length != sizeof(*maxsynccop)) {
-		dev_err(devdata->dev, "%s: unexpected format for ibm,max-sync-cop property\n", __func__);
-		dev_dbg(devdata->dev, "%s: ibm,max-sync-cop is %d bytes long, expected %lu bytes\n", __func__, prop->length,
-				sizeof(*maxsynccop));
-		ret = -EINVAL;
-		goto out;
-	}
-
-	maxsynccop = (const struct maxsynccop_t *)prop->value;
-
-	/* Use one limit rather than separate limits for compression and
-	 * decompression. Set a maximum for this so as not to exceed the
-	 * size that the header can support and round the value down to
-	 * the hardware page size (4K) */
-	devdata->max_sync_size =
-			(unsigned int)min(maxsynccop->comp_data_limit,
-					maxsynccop->decomp_data_limit);
-
-	devdata->max_sync_size = min_t(unsigned int, devdata->max_sync_size,
-					SIZE_64K);
-
-	if (devdata->max_sync_size < SIZE_4K) {
-		dev_err(devdata->dev, "%s: hardware max data size (%u) is "
-				"less than the driver minimum, unable to use "
-				"the hardware device\n",
-				__func__, devdata->max_sync_size);
-		ret = -EINVAL;
-		goto out;
-	}
-
-	devdata->max_sync_sg = (unsigned int)min(maxsynccop->comp_sg_limit,
-						maxsynccop->decomp_sg_limit);
-	if (devdata->max_sync_sg < 1) {
-		dev_err(devdata->dev, "%s: hardware max sg size (%u) is "
-				"less than the driver minimum, unable to use "
-				"the hardware device\n",
-				__func__, devdata->max_sync_sg);
-		ret = -EINVAL;
-		goto out;
-	}
-
-out:
-	return ret;
-}
-
-/**
- *
- * nx842_OF_upd -- Handle OF properties updates for the device.
- *
- * Set all properties from the OF tree.  Optionally, a new property
- * can be provided by the @new_prop pointer to overwrite an existing value.
- * The device will remain disabled until all values are valid, this function
- * will return an error for updates unless all values are valid.
- *
- * @new_prop: If not NULL, this property is being updated.  If NULL, update
- *  all properties from the current values in the OF tree.
- *
- * Returns:
- *  0 - Success
- *  -ENOMEM - Could not allocate memory for new devdata structure
- *  -EINVAL - property value not found, new_prop is not a recognized
- *	property for the device or property value is not valid.
- *  -ENODEV - Device is not available
- */
-static int nx842_OF_upd(struct property *new_prop)
-{
-	struct nx842_devdata *old_devdata = NULL;
-	struct nx842_devdata *new_devdata = NULL;
-	struct device_node *of_node = NULL;
-	struct property *status = NULL;
-	struct property *maxsglen = NULL;
-	struct property *maxsyncop = NULL;
-	int ret = 0;
-	unsigned long flags;
-
-	spin_lock_irqsave(&devdata_mutex, flags);
-	old_devdata = rcu_dereference_check(devdata,
-			lockdep_is_held(&devdata_mutex));
-	if (old_devdata)
-		of_node = old_devdata->dev->of_node;
-
-	if (!old_devdata || !of_node) {
-		pr_err("%s: device is not available\n", __func__);
-		spin_unlock_irqrestore(&devdata_mutex, flags);
-		return -ENODEV;
-	}
-
-	new_devdata = kzalloc(sizeof(*new_devdata), GFP_NOFS);
-	if (!new_devdata) {
-		dev_err(old_devdata->dev, "%s: Could not allocate memory for device data\n", __func__);
-		ret = -ENOMEM;
-		goto error_out;
-	}
-
-	memcpy(new_devdata, old_devdata, sizeof(*old_devdata));
-	new_devdata->counters = old_devdata->counters;
-
-	/* Set ptrs for existing properties */
-	status = of_find_property(of_node, "status", NULL);
-	maxsglen = of_find_property(of_node, "ibm,max-sg-len", NULL);
-	maxsyncop = of_find_property(of_node, "ibm,max-sync-cop", NULL);
-	if (!status || !maxsglen || !maxsyncop) {
-		dev_err(old_devdata->dev, "%s: Could not locate device properties\n", __func__);
-		ret = -EINVAL;
-		goto error_out;
-	}
-
-	/*
-	 * If this is a property update, there are only certain properties that
-	 * we care about. Bail if it isn't in the below list
-	 */
-	if (new_prop && (strncmp(new_prop->name, "status", new_prop->length) ||
-		         strncmp(new_prop->name, "ibm,max-sg-len", new_prop->length) ||
-		         strncmp(new_prop->name, "ibm,max-sync-cop", new_prop->length)))
-		goto out;
-
-	/* Perform property updates */
-	ret = nx842_OF_upd_status(new_devdata, status);
-	if (ret)
-		goto error_out;
-
-	ret = nx842_OF_upd_maxsglen(new_devdata, maxsglen);
-	if (ret)
-		goto error_out;
-
-	ret = nx842_OF_upd_maxsyncop(new_devdata, maxsyncop);
-	if (ret)
-		goto error_out;
-
-out:
-	dev_info(old_devdata->dev, "%s: max_sync_size new:%u old:%u\n",
-			__func__, new_devdata->max_sync_size,
-			old_devdata->max_sync_size);
-	dev_info(old_devdata->dev, "%s: max_sync_sg new:%u old:%u\n",
-			__func__, new_devdata->max_sync_sg,
-			old_devdata->max_sync_sg);
-	dev_info(old_devdata->dev, "%s: max_sg_len new:%u old:%u\n",
-			__func__, new_devdata->max_sg_len,
-			old_devdata->max_sg_len);
-
-	rcu_assign_pointer(devdata, new_devdata);
-	spin_unlock_irqrestore(&devdata_mutex, flags);
-	synchronize_rcu();
-	dev_set_drvdata(new_devdata->dev, new_devdata);
-	kfree(old_devdata);
-	return 0;
-
-error_out:
-	if (new_devdata) {
-		dev_info(old_devdata->dev, "%s: device disabled\n", __func__);
-		nx842_OF_set_defaults(new_devdata);
-		rcu_assign_pointer(devdata, new_devdata);
-		spin_unlock_irqrestore(&devdata_mutex, flags);
-		synchronize_rcu();
-		dev_set_drvdata(new_devdata->dev, new_devdata);
-		kfree(old_devdata);
-	} else {
-		dev_err(old_devdata->dev, "%s: could not update driver from hardware\n", __func__);
-		spin_unlock_irqrestore(&devdata_mutex, flags);
-	}
-
-	if (!ret)
-		ret = -EINVAL;
-	return ret;
-}
-
-/**
- * nx842_OF_notifier - Process updates to OF properties for the device
- *
- * @np: notifier block
- * @action: notifier action
- * @update: struct pSeries_reconfig_prop_update pointer if action is
- *	PSERIES_UPDATE_PROPERTY
- *
- * Returns:
- *	NOTIFY_OK on success
- *	NOTIFY_BAD encoded with error number on failure, use
- *		notifier_to_errno() to decode this value
- */
-static int nx842_OF_notifier(struct notifier_block *np, unsigned long action,
-			     void *data)
-{
-	struct of_reconfig_data *upd = data;
-	struct nx842_devdata *local_devdata;
-	struct device_node *node = NULL;
-
-	rcu_read_lock();
-	local_devdata = rcu_dereference(devdata);
-	if (local_devdata)
-		node = local_devdata->dev->of_node;
-
-	if (local_devdata &&
-			action == OF_RECONFIG_UPDATE_PROPERTY &&
-			!strcmp(upd->dn->name, node->name)) {
-		rcu_read_unlock();
-		nx842_OF_upd(upd->prop);
-	} else
-		rcu_read_unlock();
-
-	return NOTIFY_OK;
-}
-
-static struct notifier_block nx842_of_nb = {
-	.notifier_call = nx842_OF_notifier,
-};
-
-#define nx842_counter_read(_name)					\
-static ssize_t nx842_##_name##_show(struct device *dev,		\
-		struct device_attribute *attr,				\
-		char *buf) {						\
-	struct nx842_devdata *local_devdata;			\
-	int p = 0;							\
-	rcu_read_lock();						\
-	local_devdata = rcu_dereference(devdata);			\
-	if (local_devdata)						\
-		p = snprintf(buf, PAGE_SIZE, "%ld\n",			\
-		       atomic64_read(&local_devdata->counters->_name));	\
-	rcu_read_unlock();						\
-	return p;							\
-}
-
-#define NX842DEV_COUNTER_ATTR_RO(_name)					\
-	nx842_counter_read(_name);					\
-	static struct device_attribute dev_attr_##_name = __ATTR(_name,	\
-						0444,			\
-						nx842_##_name##_show,\
-						NULL);
-
-NX842DEV_COUNTER_ATTR_RO(comp_complete);
-NX842DEV_COUNTER_ATTR_RO(comp_failed);
-NX842DEV_COUNTER_ATTR_RO(decomp_complete);
-NX842DEV_COUNTER_ATTR_RO(decomp_failed);
-NX842DEV_COUNTER_ATTR_RO(swdecomp);
-
-static ssize_t nx842_timehist_show(struct device *,
-		struct device_attribute *, char *);
-
-static struct device_attribute dev_attr_comp_times = __ATTR(comp_times, 0444,
-		nx842_timehist_show, NULL);
-static struct device_attribute dev_attr_decomp_times = __ATTR(decomp_times,
-		0444, nx842_timehist_show, NULL);
-
-static ssize_t nx842_timehist_show(struct device *dev,
-		struct device_attribute *attr, char *buf) {
-	char *p = buf;
-	struct nx842_devdata *local_devdata;
-	atomic64_t *times;
-	int bytes_remain = PAGE_SIZE;
-	int bytes;
-	int i;
-
-	rcu_read_lock();
-	local_devdata = rcu_dereference(devdata);
-	if (!local_devdata) {
-		rcu_read_unlock();
-		return 0;
-	}
-
-	if (attr == &dev_attr_comp_times)
-		times = local_devdata->counters->comp_times;
-	else if (attr == &dev_attr_decomp_times)
-		times = local_devdata->counters->decomp_times;
-	else {
-		rcu_read_unlock();
-		return 0;
-	}
-
-	for (i = 0; i < (NX842_HIST_SLOTS - 2); i++) {
-		bytes = snprintf(p, bytes_remain, "%u-%uus:\t%ld\n",
-			       i ? (2<<(i-1)) : 0, (2<<i)-1,
-			       atomic64_read(&times[i]));
-		bytes_remain -= bytes;
-		p += bytes;
-	}
-	/* The last bucket holds everything over
-	 * 2<<(NX842_HIST_SLOTS - 2) us */
-	bytes = snprintf(p, bytes_remain, "%uus - :\t%ld\n",
-			2<<(NX842_HIST_SLOTS - 2),
-			atomic64_read(&times[(NX842_HIST_SLOTS - 1)]));
-	p += bytes;
-
-	rcu_read_unlock();
-	return p - buf;
-}
-
-static struct attribute *nx842_sysfs_entries[] = {
-	&dev_attr_comp_complete.attr,
-	&dev_attr_comp_failed.attr,
-	&dev_attr_decomp_complete.attr,
-	&dev_attr_decomp_failed.attr,
-	&dev_attr_swdecomp.attr,
-	&dev_attr_comp_times.attr,
-	&dev_attr_decomp_times.attr,
-	NULL,
-};
-
-static struct attribute_group nx842_attribute_group = {
-	.name = NULL,		/* put in device directory */
-	.attrs = nx842_sysfs_entries,
-};
-
-static int __init nx842_probe(struct vio_dev *viodev,
-				  const struct vio_device_id *id)
-{
-	struct nx842_devdata *old_devdata, *new_devdata = NULL;
-	unsigned long flags;
-	int ret = 0;
-
-	spin_lock_irqsave(&devdata_mutex, flags);
-	old_devdata = rcu_dereference_check(devdata,
-			lockdep_is_held(&devdata_mutex));
-
-	if (old_devdata && old_devdata->vdev != NULL) {
-		dev_err(&viodev->dev, "%s: Attempt to register more than one instance of the hardware\n", __func__);
-		ret = -1;
-		goto error_unlock;
-	}
-
-	dev_set_drvdata(&viodev->dev, NULL);
-
-	new_devdata = kzalloc(sizeof(*new_devdata), GFP_NOFS);
-	if (!new_devdata) {
-		dev_err(&viodev->dev, "%s: Could not allocate memory for device data\n", __func__);
-		ret = -ENOMEM;
-		goto error_unlock;
-	}
-
-	new_devdata->counters = kzalloc(sizeof(*new_devdata->counters),
-			GFP_NOFS);
-	if (!new_devdata->counters) {
-		dev_err(&viodev->dev, "%s: Could not allocate memory for performance counters\n", __func__);
-		ret = -ENOMEM;
-		goto error_unlock;
-	}
-
-	new_devdata->vdev = viodev;
-	new_devdata->dev = &viodev->dev;
-	nx842_OF_set_defaults(new_devdata);
-
-	rcu_assign_pointer(devdata, new_devdata);
-	spin_unlock_irqrestore(&devdata_mutex, flags);
-	synchronize_rcu();
-	kfree(old_devdata);
-
-	of_reconfig_notifier_register(&nx842_of_nb);
-
-	ret = nx842_OF_upd(NULL);
-	if (ret && ret != -ENODEV) {
-		dev_err(&viodev->dev, "could not parse device tree. %d\n", ret);
-		ret = -1;
-		goto error;
-	}
-
-	rcu_read_lock();
-	dev_set_drvdata(&viodev->dev, rcu_dereference(devdata));
-	rcu_read_unlock();
-
-	if (sysfs_create_group(&viodev->dev.kobj, &nx842_attribute_group)) {
-		dev_err(&viodev->dev, "could not create sysfs device attributes\n");
-		ret = -1;
-		goto error;
-	}
-
-	return 0;
-
-error_unlock:
-	spin_unlock_irqrestore(&devdata_mutex, flags);
-	if (new_devdata)
-		kfree(new_devdata->counters);
-	kfree(new_devdata);
-error:
-	return ret;
-}
-
-static int __exit nx842_remove(struct vio_dev *viodev)
-{
-	struct nx842_devdata *old_devdata;
-	unsigned long flags;
-
-	pr_info("Removing IBM Power 842 compression device\n");
-	sysfs_remove_group(&viodev->dev.kobj, &nx842_attribute_group);
-
-	spin_lock_irqsave(&devdata_mutex, flags);
-	old_devdata = rcu_dereference_check(devdata,
-			lockdep_is_held(&devdata_mutex));
-	of_reconfig_notifier_unregister(&nx842_of_nb);
-	RCU_INIT_POINTER(devdata, NULL);
-	spin_unlock_irqrestore(&devdata_mutex, flags);
-	synchronize_rcu();
-	dev_set_drvdata(&viodev->dev, NULL);
-	if (old_devdata)
-		kfree(old_devdata->counters);
-	kfree(old_devdata);
-	return 0;
-}
-
-static struct vio_device_id nx842_driver_ids[] = {
-	{"ibm,compression-v1", "ibm,compression"},
-	{"", ""},
-};
-
-static struct vio_driver nx842_driver = {
-	.name = MODULE_NAME,
-	.probe = nx842_probe,
-	.remove = __exit_p(nx842_remove),
-	.get_desired_dma = nx842_get_desired_dma,
-	.id_table = nx842_driver_ids,
-};
-
-static int __init nx842_init(void)
-{
-	struct nx842_devdata *new_devdata;
-	pr_info("Registering IBM Power 842 compression driver\n");
-
-	RCU_INIT_POINTER(devdata, NULL);
-	new_devdata = kzalloc(sizeof(*new_devdata), GFP_KERNEL);
-	if (!new_devdata) {
-		pr_err("Could not allocate memory for device data\n");
-		return -ENOMEM;
-	}
-	new_devdata->status = UNAVAILABLE;
-	RCU_INIT_POINTER(devdata, new_devdata);
-
-	return vio_register_driver(&nx842_driver);
-}
-
-module_init(nx842_init);
-
-static void __exit nx842_exit(void)
-{
-	struct nx842_devdata *old_devdata;
-	unsigned long flags;
-
-	pr_info("Exiting IBM Power 842 compression driver\n");
-	spin_lock_irqsave(&devdata_mutex, flags);
-	old_devdata = rcu_dereference_check(devdata,
-			lockdep_is_held(&devdata_mutex));
-	RCU_INIT_POINTER(devdata, NULL);
-	spin_unlock_irqrestore(&devdata_mutex, flags);
-	synchronize_rcu();
-	if (old_devdata)
-		dev_set_drvdata(old_devdata->dev, NULL);
-	kfree(old_devdata);
-	vio_unregister_driver(&nx842_driver);
-}
-
-module_exit(nx842_exit);
-
-/*********************************
- * 842 software decompressor
-*********************************/
-typedef int (*sw842_template_op)(const char **, int *, unsigned char **,
-						struct sw842_fifo *);
-
-static int sw842_data8(const char **, int *, unsigned char **,
-						struct sw842_fifo *);
-static int sw842_data4(const char **, int *, unsigned char **,
-						struct sw842_fifo *);
-static int sw842_data2(const char **, int *, unsigned char **,
-						struct sw842_fifo *);
-static int sw842_ptr8(const char **, int *, unsigned char **,
-						struct sw842_fifo *);
-static int sw842_ptr4(const char **, int *, unsigned char **,
-						struct sw842_fifo *);
-static int sw842_ptr2(const char **, int *, unsigned char **,
-						struct sw842_fifo *);
-
-/* special templates */
-#define SW842_TMPL_REPEAT 0x1B
-#define SW842_TMPL_ZEROS 0x1C
-#define SW842_TMPL_EOF 0x1E
-
-static sw842_template_op sw842_tmpl_ops[26][4] = {
-	{ sw842_data8, NULL}, /* 0 (00000) */
-	{ sw842_data4, sw842_data2, sw842_ptr2,  NULL},
-	{ sw842_data4, sw842_ptr2,  sw842_data2, NULL},
-	{ sw842_data4, sw842_ptr2,  sw842_ptr2,  NULL},
-	{ sw842_data4, sw842_ptr4,  NULL},
-	{ sw842_data2, sw842_ptr2,  sw842_data4, NULL},
-	{ sw842_data2, sw842_ptr2,  sw842_data2, sw842_ptr2},
-	{ sw842_data2, sw842_ptr2,  sw842_ptr2,  sw842_data2},
-	{ sw842_data2, sw842_ptr2,  sw842_ptr2,  sw842_ptr2,},
-	{ sw842_data2, sw842_ptr2,  sw842_ptr4,  NULL},
-	{ sw842_ptr2,  sw842_data2, sw842_data4, NULL}, /* 10 (01010) */
-	{ sw842_ptr2,  sw842_data4, sw842_ptr2,  NULL},
-	{ sw842_ptr2,  sw842_data2, sw842_ptr2,  sw842_data2},
-	{ sw842_ptr2,  sw842_data2, sw842_ptr2,  sw842_ptr2},
-	{ sw842_ptr2,  sw842_data2, sw842_ptr4,  NULL},
-	{ sw842_ptr2,  sw842_ptr2,  sw842_data4, NULL},
-	{ sw842_ptr2,  sw842_ptr2,  sw842_data2, sw842_ptr2},
-	{ sw842_ptr2,  sw842_ptr2,  sw842_ptr2,  sw842_data2},
-	{ sw842_ptr2,  sw842_ptr2,  sw842_ptr2,  sw842_ptr2},
-	{ sw842_ptr2,  sw842_ptr2,  sw842_ptr4,  NULL},
-	{ sw842_ptr4,  sw842_data4, NULL}, /* 20 (10100) */
-	{ sw842_ptr4,  sw842_data2, sw842_ptr2,  NULL},
-	{ sw842_ptr4,  sw842_ptr2,  sw842_data2, NULL},
-	{ sw842_ptr4,  sw842_ptr2,  sw842_ptr2,  NULL},
-	{ sw842_ptr4,  sw842_ptr4,  NULL},
-	{ sw842_ptr8,  NULL}
-};
-
-/* Software decompress helpers */
-
-static uint8_t sw842_get_byte(const char *buf, int bit)
-{
-	uint8_t tmpl;
-	uint16_t tmp;
-	tmp = htons(*(uint16_t *)(buf));
-	tmp = (uint16_t)(tmp << bit);
-	tmp = ntohs(tmp);
-	memcpy(&tmpl, &tmp, 1);
-	return tmpl;
-}
-
-static uint8_t sw842_get_template(const char **buf, int *bit)
-{
-	uint8_t byte;
-	byte = sw842_get_byte(*buf, *bit);
-	byte = byte >> 3;
-	byte &= 0x1F;
-	*buf += (*bit + 5) / 8;
-	*bit = (*bit + 5) % 8;
-	return byte;
-}
-
-/* repeat_count happens to be 5-bit too (like the template) */
-static uint8_t sw842_get_repeat_count(const char **buf, int *bit)
-{
-	uint8_t byte;
-	byte = sw842_get_byte(*buf, *bit);
-	byte = byte >> 2;
-	byte &= 0x3F;
-	*buf += (*bit + 6) / 8;
-	*bit = (*bit + 6) % 8;
-	return byte;
-}
-
-static uint8_t sw842_get_ptr2(const char **buf, int *bit)
-{
-	uint8_t ptr;
-	ptr = sw842_get_byte(*buf, *bit);
-	(*buf)++;
-	return ptr;
-}
-
-static uint16_t sw842_get_ptr4(const char **buf, int *bit,
-		struct sw842_fifo *fifo)
-{
-	uint16_t ptr;
-	ptr = htons(*(uint16_t *)(*buf));
-	ptr = (uint16_t)(ptr << *bit);
-	ptr = ptr >> 7;
-	ptr &= 0x01FF;
-	*buf += (*bit + 9) / 8;
-	*bit = (*bit + 9) % 8;
-	return ptr;
-}
-
-static uint8_t sw842_get_ptr8(const char **buf, int *bit,
-		struct sw842_fifo *fifo)
-{
-	return sw842_get_ptr2(buf, bit);
-}
-
-/* Software decompress template ops */
-
-static int sw842_data8(const char **inbuf, int *inbit,
-		unsigned char **outbuf, struct sw842_fifo *fifo)
-{
-	int ret;
-
-	ret = sw842_data4(inbuf, inbit, outbuf, fifo);
-	if (ret)
-		return ret;
-	ret = sw842_data4(inbuf, inbit, outbuf, fifo);
-	return ret;
-}
-
-static int sw842_data4(const char **inbuf, int *inbit,
-		unsigned char **outbuf, struct sw842_fifo *fifo)
-{
-	int ret;
-
-	ret = sw842_data2(inbuf, inbit, outbuf, fifo);
-	if (ret)
-		return ret;
-	ret = sw842_data2(inbuf, inbit, outbuf, fifo);
-	return ret;
-}
-
-static int sw842_data2(const char **inbuf, int *inbit,
-		unsigned char **outbuf, struct sw842_fifo *fifo)
-{
-	**outbuf = sw842_get_byte(*inbuf, *inbit);
-	(*inbuf)++;
-	(*outbuf)++;
-	**outbuf = sw842_get_byte(*inbuf, *inbit);
-	(*inbuf)++;
-	(*outbuf)++;
-	return 0;
-}
-
-static int sw842_ptr8(const char **inbuf, int *inbit,
-		unsigned char **outbuf, struct sw842_fifo *fifo)
-{
-	uint8_t ptr;
-	ptr = sw842_get_ptr8(inbuf, inbit, fifo);
-	if (!fifo->f84_full && (ptr >= fifo->f8_count))
-		return 1;
-	memcpy(*outbuf, fifo->f8[ptr], 8);
-	*outbuf += 8;
-	return 0;
-}
-
-static int sw842_ptr4(const char **inbuf, int *inbit,
-		unsigned char **outbuf, struct sw842_fifo *fifo)
-{
-	uint16_t ptr;
-	ptr = sw842_get_ptr4(inbuf, inbit, fifo);
-	if (!fifo->f84_full && (ptr >= fifo->f4_count))
-		return 1;
-	memcpy(*outbuf, fifo->f4[ptr], 4);
-	*outbuf += 4;
-	return 0;
-}
-
-static int sw842_ptr2(const char **inbuf, int *inbit,
-		unsigned char **outbuf, struct sw842_fifo *fifo)
-{
-	uint8_t ptr;
-	ptr = sw842_get_ptr2(inbuf, inbit);
-	if (!fifo->f2_full && (ptr >= fifo->f2_count))
-		return 1;
-	memcpy(*outbuf, fifo->f2[ptr], 2);
-	*outbuf += 2;
-	return 0;
-}
-
-static void sw842_copy_to_fifo(const char *buf, struct sw842_fifo *fifo)
-{
-	unsigned char initial_f2count = fifo->f2_count;
-
-	memcpy(fifo->f8[fifo->f8_count], buf, 8);
-	fifo->f4_count += 2;
-	fifo->f8_count += 1;
-
-	if (!fifo->f84_full && fifo->f4_count >= 512) {
-		fifo->f84_full = 1;
-		fifo->f4_count /= 512;
-	}
-
-	memcpy(fifo->f2[fifo->f2_count++], buf, 2);
-	memcpy(fifo->f2[fifo->f2_count++], buf + 2, 2);
-	memcpy(fifo->f2[fifo->f2_count++], buf + 4, 2);
-	memcpy(fifo->f2[fifo->f2_count++], buf + 6, 2);
-	if (fifo->f2_count < initial_f2count)
-		fifo->f2_full = 1;
-}
-
-static int sw842_decompress(const unsigned char *src, int srclen,
-			unsigned char *dst, int *destlen,
-			const void *wrkmem)
-{
-	uint8_t tmpl;
-	const char *inbuf;
-	int inbit = 0;
-	unsigned char *outbuf, *outbuf_end, *origbuf, *prevbuf;
-	const char *inbuf_end;
-	sw842_template_op op;
-	int opindex;
-	int i, repeat_count;
-	struct sw842_fifo *fifo;
-	int ret = 0;
-
-	fifo = &((struct nx842_workmem *)(wrkmem))->swfifo;
-	memset(fifo, 0, sizeof(*fifo));
-
-	origbuf = NULL;
-	inbuf = src;
-	inbuf_end = src + srclen;
-	outbuf = dst;
-	outbuf_end = dst + *destlen;
-
-	while ((tmpl = sw842_get_template(&inbuf, &inbit)) != SW842_TMPL_EOF) {
-		if (inbuf >= inbuf_end) {
-			ret = -EINVAL;
-			goto out;
-		}
-
-		opindex = 0;
-		prevbuf = origbuf;
-		origbuf = outbuf;
-		switch (tmpl) {
-		case SW842_TMPL_REPEAT:
-			if (prevbuf == NULL) {
-				ret = -EINVAL;
-				goto out;
-			}
-
-			repeat_count = sw842_get_repeat_count(&inbuf,
-								&inbit) + 1;
-
-			/* Did the repeat count advance past the end of input */
-			if (inbuf > inbuf_end) {
-				ret = -EINVAL;
-				goto out;
-			}
-
-			for (i = 0; i < repeat_count; i++) {
-				/* Would this overflow the output buffer */
-				if ((outbuf + 8) > outbuf_end) {
-					ret = -ENOSPC;
-					goto out;
-				}
-
-				memcpy(outbuf, prevbuf, 8);
-				sw842_copy_to_fifo(outbuf, fifo);
-				outbuf += 8;
-			}
-			break;
-
-		case SW842_TMPL_ZEROS:
-			/* Would this overflow the output buffer */
-			if ((outbuf + 8) > outbuf_end) {
-				ret = -ENOSPC;
-				goto out;
-			}
-
-			memset(outbuf, 0, 8);
-			sw842_copy_to_fifo(outbuf, fifo);
-			outbuf += 8;
-			break;
-
-		default:
-			if (tmpl > 25) {
-				ret = -EINVAL;
-				goto out;
-			}
-
-			/* Does this go past the end of the input buffer */
-			if ((inbuf + 2) > inbuf_end) {
-				ret = -EINVAL;
-				goto out;
-			}
-
-			/* Would this overflow the output buffer */
-			if ((outbuf + 8) > outbuf_end) {
-				ret = -ENOSPC;
-				goto out;
-			}
-
-			while (opindex < 4 &&
-				(op = sw842_tmpl_ops[tmpl][opindex++])
-					!= NULL) {
-				ret = (*op)(&inbuf, &inbit, &outbuf, fifo);
-				if (ret) {
-					ret = -EINVAL;
-					goto out;
-				}
-				sw842_copy_to_fifo(origbuf, fifo);
-			}
-		}
-	}
-
-out:
-	if (!ret)
-		*destlen = (unsigned int)(outbuf - dst);
-	else
-		*destlen = 0;
-
-	return ret;
-}
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 06/10] drivers/crypto/nx: add NX-842 platform frontend driver
  2015-05-07 17:49   ` [PATCHv3 00/10] add 842 hw compression for PowerNV platform Dan Streetman
                       ` (4 preceding siblings ...)
  2015-05-07 17:49     ` [PATCH 05/10] drivers/crypto/nx: rename nx-842.c to nx-842-pseries.c Dan Streetman
@ 2015-05-07 17:49     ` Dan Streetman
  2015-05-07 17:49     ` [PATCH 07/10] drivers/crypto/nx: add nx842 constraints Dan Streetman
                       ` (4 subsequent siblings)
  10 siblings, 0 replies; 46+ messages in thread
From: Dan Streetman @ 2015-05-07 17:49 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras
  Cc: Seth Jennings, Robert Jennings, Andrew Morton, Greg KH,
	Mauro Carvalho Chehab, Arnd Bergmann, Joe Perches, linux-crypto,
	linuxppc-dev, linux-kernel, Dan Streetman

Add NX-842 frontend that allows using either the pSeries platform or
PowerNV platform driver (to be added by later patch) for the NX-842
hardware.  Update the MAINTAINERS file to include the new filenames.
Update Kconfig files to clarify titles and descriptions, and correct
dependencies.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
---
 MAINTAINERS                        |   2 +-
 drivers/crypto/Kconfig             |  10 +--
 drivers/crypto/nx/Kconfig          |  35 ++++++---
 drivers/crypto/nx/Makefile         |   4 +-
 drivers/crypto/nx/nx-842-pseries.c |  57 +++++++--------
 drivers/crypto/nx/nx-842.c         | 144 +++++++++++++++++++++++++++++++++++++
 drivers/crypto/nx/nx-842.h         |  32 +++++++++
 include/linux/nx842.h              |  10 +--
 8 files changed, 245 insertions(+), 49 deletions(-)
 create mode 100644 drivers/crypto/nx/nx-842.c
 create mode 100644 drivers/crypto/nx/nx-842.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 5a5c1dc..e71855f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4870,7 +4870,7 @@ F:	drivers/crypto/nx/
 IBM Power 842 compression accelerator
 M:	Dan Streetman <ddstreet@us.ibm.com>
 S:	Supported
-F:	drivers/crypto/nx/nx-842.c
+F:	drivers/crypto/nx/nx-842*
 F:	include/linux/nx842.h
 F:	include/linux/sw842.h
 F:	crypto/842.c
diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
index 033c0c8..872de26 100644
--- a/drivers/crypto/Kconfig
+++ b/drivers/crypto/Kconfig
@@ -312,11 +312,13 @@ config CRYPTO_DEV_S5P
 	  algorithms execution.
 
 config CRYPTO_DEV_NX
-	bool "Support for IBM Power7+ in-Nest cryptographic acceleration"
-	depends on PPC64 && IBMVIO && !CPU_LITTLE_ENDIAN
-	default n
+	bool "Support for IBM PowerPC Nest (NX) cryptographic acceleration"
+	depends on PPC64
 	help
-	  Support for Power7+ in-Nest cryptographic acceleration.
+	  This enables support for the NX hardware cryptographic accelerator
+	  coprocessor that is in IBM PowerPC P7+ or later processors.  This
+	  does not actually enable any drivers, it only allows you to select
+	  which acceleration type (encryption and/or compression) to enable.
 
 if CRYPTO_DEV_NX
 	source "drivers/crypto/nx/Kconfig"
diff --git a/drivers/crypto/nx/Kconfig b/drivers/crypto/nx/Kconfig
index f826166..34013f7 100644
--- a/drivers/crypto/nx/Kconfig
+++ b/drivers/crypto/nx/Kconfig
@@ -1,7 +1,9 @@
+
 config CRYPTO_DEV_NX_ENCRYPT
-	tristate "Encryption acceleration support"
-	depends on PPC64 && IBMVIO
+	tristate "Encryption acceleration support on pSeries platform"
+	depends on PPC_PSERIES && IBMVIO && !CPU_LITTLE_ENDIAN
 	default y
+	select CRYPTO_ALGAPI
 	select CRYPTO_AES
 	select CRYPTO_CBC
 	select CRYPTO_ECB
@@ -12,15 +14,30 @@ config CRYPTO_DEV_NX_ENCRYPT
 	select CRYPTO_SHA256
 	select CRYPTO_SHA512
 	help
-	  Support for Power7+ in-Nest encryption acceleration. This
-	  module supports acceleration for AES and SHA2 algorithms. If you
-	  choose 'M' here, this module will be called nx_crypto.
+	  Support for PowerPC Nest (NX) encryption acceleration. This
+	  module supports acceleration for AES and SHA2 algorithms on
+	  the pSeries platform.  If you choose 'M' here, this module
+	  will be called nx_crypto.
 
 config CRYPTO_DEV_NX_COMPRESS
 	tristate "Compression acceleration support"
-	depends on PPC64 && IBMVIO
 	default y
 	help
-	  Support for Power7+ in-Nest compression acceleration. This
-	  module supports acceleration for AES and SHA2 algorithms. If you
-	  choose 'M' here, this module will be called nx_compress.
+	  Support for PowerPC Nest (NX) compression acceleration. This
+	  module supports acceleration for compressing memory with the 842
+	  algorithm.  One of the platform drivers must be selected also.
+	  If you choose 'M' here, this module will be called nx_compress.
+
+if CRYPTO_DEV_NX_COMPRESS
+
+config CRYPTO_DEV_NX_COMPRESS_PSERIES
+	tristate "Compression acceleration support on pSeries platform"
+	depends on PPC_PSERIES && IBMVIO && !CPU_LITTLE_ENDIAN
+	default y
+	help
+	  Support for PowerPC Nest (NX) compression acceleration. This
+	  module supports acceleration for compressing memory with the 842
+	  algorithm.  This supports NX hardware on the pSeries platform.
+	  If you choose 'M' here, this module will be called nx_compress_pseries.
+
+endif
diff --git a/drivers/crypto/nx/Makefile b/drivers/crypto/nx/Makefile
index 8669ffa..5d9f4bc 100644
--- a/drivers/crypto/nx/Makefile
+++ b/drivers/crypto/nx/Makefile
@@ -11,4 +11,6 @@ nx-crypto-objs := nx.o \
 		  nx-sha512.o
 
 obj-$(CONFIG_CRYPTO_DEV_NX_COMPRESS) += nx-compress.o
-nx-compress-objs := nx-842-pseries.o
+obj-$(CONFIG_CRYPTO_DEV_NX_COMPRESS_PSERIES) += nx-compress-pseries.o
+nx-compress-objs := nx-842.o
+nx-compress-pseries-objs := nx-842-pseries.o
diff --git a/drivers/crypto/nx/nx-842-pseries.c b/drivers/crypto/nx/nx-842-pseries.c
index 887196e..9b83c9e 100644
--- a/drivers/crypto/nx/nx-842-pseries.c
+++ b/drivers/crypto/nx/nx-842-pseries.c
@@ -21,18 +21,13 @@
  *          Seth Jennings <sjenning@linux.vnet.ibm.com>
  */
 
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/nx842.h>
-#include <linux/of.h>
-#include <linux/slab.h>
-
 #include <asm/page.h>
 #include <asm/vio.h>
 
+#include "nx-842.h"
 #include "nx_csbcpb.h" /* struct nx_csbcpb */
 
-#define MODULE_NAME "nx-compress"
+#define MODULE_NAME NX842_PSERIES_MODULE_NAME
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Robert Jennings <rcj@linux.vnet.ibm.com>");
 MODULE_DESCRIPTION("842 H/W Compression driver for IBM Power processors");
@@ -236,18 +231,6 @@ struct nx842_workmem {
 	};
 };
 
-int nx842_get_workmem_size(void)
-{
-	return sizeof(struct nx842_workmem) + NX842_HW_PAGE_SIZE;
-}
-EXPORT_SYMBOL_GPL(nx842_get_workmem_size);
-
-int nx842_get_workmem_size_aligned(void)
-{
-	return sizeof(struct nx842_workmem);
-}
-EXPORT_SYMBOL_GPL(nx842_get_workmem_size_aligned);
-
 static int nx842_validate_result(struct device *dev,
 	struct cop_status_block *csb)
 {
@@ -300,7 +283,7 @@ static int nx842_validate_result(struct device *dev,
 }
 
 /**
- * nx842_compress - Compress data using the 842 algorithm
+ * nx842_pseries_compress - Compress data using the 842 algorithm
  *
  * Compression provide by the NX842 coprocessor on IBM Power systems.
  * The input buffer is compressed and the result is stored in the
@@ -315,7 +298,7 @@ static int nx842_validate_result(struct device *dev,
  * @out: Pointer to output buffer
  * @outlen: Length of output buffer
  * @wrkmem: ptr to buffer for working memory, size determined by
- *          nx842_get_workmem_size()
+ *          NX842_MEM_COMPRESS
  *
  * Returns:
  *   0		Success, output of length @outlen stored in the buffer at @out
@@ -325,8 +308,9 @@ static int nx842_validate_result(struct device *dev,
  *   -EIO	Internal error
  *   -ENODEV	Hardware unavailable
  */
-int nx842_compress(const unsigned char *in, unsigned int inlen,
-		       unsigned char *out, unsigned int *outlen, void *wmem)
+static int nx842_pseries_compress(const unsigned char *in, unsigned int inlen,
+				  unsigned char *out, unsigned int *outlen,
+				  void *wmem)
 {
 	struct nx842_header *hdr;
 	struct nx842_devdata *local_devdata;
@@ -493,13 +477,12 @@ unlock:
 	rcu_read_unlock();
 	return ret;
 }
-EXPORT_SYMBOL_GPL(nx842_compress);
 
 static int sw842_decompress(const unsigned char *, int, unsigned char *, int *,
 			const void *);
 
 /**
- * nx842_decompress - Decompress data using the 842 algorithm
+ * nx842_pseries_decompress - Decompress data using the 842 algorithm
  *
  * Decompression provide by the NX842 coprocessor on IBM Power systems.
  * The input buffer is decompressed and the result is stored in the
@@ -515,7 +498,7 @@ static int sw842_decompress(const unsigned char *, int, unsigned char *, int *,
  * @out: Pointer to output buffer, must be page aligned
  * @outlen: Length of output buffer, must be PAGE_SIZE
  * @wrkmem: ptr to buffer for working memory, size determined by
- *          nx842_get_workmem_size()
+ *          NX842_MEM_COMPRESS
  *
  * Returns:
  *   0		Success, output of length @outlen stored in the buffer at @out
@@ -525,8 +508,9 @@ static int sw842_decompress(const unsigned char *, int, unsigned char *, int *,
  *   -EINVAL	Bad input data encountered when attempting decompress
  *   -EIO	Internal error
  */
-int nx842_decompress(const unsigned char *in, unsigned int inlen,
-			 unsigned char *out, unsigned int *outlen, void *wmem)
+static int nx842_pseries_decompress(const unsigned char *in, unsigned int inlen,
+				    unsigned char *out, unsigned int *outlen,
+				    void *wmem)
 {
 	struct nx842_header *hdr;
 	struct nx842_devdata *local_devdata;
@@ -694,7 +678,6 @@ unlock:
 	rcu_read_unlock();
 	return ret;
 }
-EXPORT_SYMBOL_GPL(nx842_decompress);
 
 /**
  * nx842_OF_set_defaults -- Set default (disabled) values for devdata
@@ -1130,6 +1113,12 @@ static struct attribute_group nx842_attribute_group = {
 	.attrs = nx842_sysfs_entries,
 };
 
+static struct nx842_driver nx842_pseries_driver = {
+	.owner =	THIS_MODULE,
+	.compress =	nx842_pseries_compress,
+	.decompress =	nx842_pseries_decompress,
+};
+
 static int __init nx842_probe(struct vio_dev *viodev,
 				  const struct vio_device_id *id)
 {
@@ -1192,6 +1181,8 @@ static int __init nx842_probe(struct vio_dev *viodev,
 		goto error;
 	}
 
+	nx842_register_driver(&nx842_pseries_driver);
+
 	return 0;
 
 error_unlock:
@@ -1222,11 +1213,14 @@ static int __exit nx842_remove(struct vio_dev *viodev)
 	if (old_devdata)
 		kfree(old_devdata->counters);
 	kfree(old_devdata);
+
+	nx842_unregister_driver(&nx842_pseries_driver);
+
 	return 0;
 }
 
 static struct vio_device_id nx842_driver_ids[] = {
-	{"ibm,compression-v1", "ibm,compression"},
+	{NX842_PSERIES_COMPAT_NAME "-v1", NX842_PSERIES_COMPAT_NAME},
 	{"", ""},
 };
 
@@ -1243,6 +1237,8 @@ static int __init nx842_init(void)
 	struct nx842_devdata *new_devdata;
 	pr_info("Registering IBM Power 842 compression driver\n");
 
+	BUILD_BUG_ON(sizeof(struct nx842_workmem) > NX842_MEM_COMPRESS);
+
 	RCU_INIT_POINTER(devdata, NULL);
 	new_devdata = kzalloc(sizeof(*new_devdata), GFP_KERNEL);
 	if (!new_devdata) {
@@ -1272,6 +1268,7 @@ static void __exit nx842_exit(void)
 	if (old_devdata)
 		dev_set_drvdata(old_devdata->dev, NULL);
 	kfree(old_devdata);
+	nx842_unregister_driver(&nx842_pseries_driver);
 	vio_unregister_driver(&nx842_driver);
 }
 
diff --git a/drivers/crypto/nx/nx-842.c b/drivers/crypto/nx/nx-842.c
new file mode 100644
index 0000000..f1f378e
--- /dev/null
+++ b/drivers/crypto/nx/nx-842.c
@@ -0,0 +1,144 @@
+/*
+ * Driver frontend for IBM Power 842 compression accelerator
+ *
+ * Copyright (C) 2015 Dan Streetman, IBM Corp
+ *
+ * Designer of the Power data compression engine:
+ *   Bulent Abali <abali@us.ibm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include "nx-842.h"
+
+#define MODULE_NAME "nx-compress"
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Dan Streetman <ddstreet@ieee.org>");
+MODULE_DESCRIPTION("842 H/W Compression driver for IBM Power processors");
+
+/* Only one driver is expected, based on the HW platform */
+static struct nx842_driver *nx842_driver;
+static DEFINE_SPINLOCK(nx842_driver_lock); /* protects driver pointers */
+
+void nx842_register_driver(struct nx842_driver *driver)
+{
+	spin_lock(&nx842_driver_lock);
+
+	if (nx842_driver) {
+		pr_err("can't register driver %s, already using driver %s\n",
+		       driver->owner->name, nx842_driver->owner->name);
+	} else {
+		pr_info("registering driver %s\n", driver->owner->name);
+		nx842_driver = driver;
+	}
+
+	spin_unlock(&nx842_driver_lock);
+}
+EXPORT_SYMBOL_GPL(nx842_register_driver);
+
+void nx842_unregister_driver(struct nx842_driver *driver)
+{
+	spin_lock(&nx842_driver_lock);
+
+	if (nx842_driver == driver) {
+		pr_info("unregistering driver %s\n", driver->owner->name);
+		nx842_driver = NULL;
+	} else if (nx842_driver) {
+		pr_err("can't unregister driver %s, using driver %s\n",
+		       driver->owner->name, nx842_driver->owner->name);
+	} else {
+		pr_err("can't unregister driver %s, no driver in use\n",
+		       driver->owner->name);
+	}
+
+	spin_unlock(&nx842_driver_lock);
+}
+EXPORT_SYMBOL_GPL(nx842_unregister_driver);
+
+static struct nx842_driver *get_driver(void)
+{
+	struct nx842_driver *driver = NULL;
+
+	spin_lock(&nx842_driver_lock);
+
+	driver = nx842_driver;
+
+	if (driver && !try_module_get(driver->owner))
+		driver = NULL;
+
+	spin_unlock(&nx842_driver_lock);
+
+	return driver;
+}
+
+static void put_driver(struct nx842_driver *driver)
+{
+	module_put(driver->owner);
+}
+
+int nx842_compress(const unsigned char *in, unsigned int in_len,
+		   unsigned char *out, unsigned int *out_len,
+		   void *wrkmem)
+{
+	struct nx842_driver *driver = get_driver();
+	int ret;
+
+	if (!driver)
+		return -ENODEV;
+
+	ret = driver->compress(in, in_len, out, out_len, wrkmem);
+
+	put_driver(driver);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(nx842_compress);
+
+int nx842_decompress(const unsigned char *in, unsigned int in_len,
+		     unsigned char *out, unsigned int *out_len,
+		     void *wrkmem)
+{
+	struct nx842_driver *driver = get_driver();
+	int ret;
+
+	if (!driver)
+		return -ENODEV;
+
+	ret = driver->decompress(in, in_len, out, out_len, wrkmem);
+
+	put_driver(driver);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(nx842_decompress);
+
+static __init int nx842_init(void)
+{
+	pr_info("loading\n");
+
+	if (of_find_compatible_node(NULL, NULL, NX842_PSERIES_COMPAT_NAME))
+		request_module_nowait(NX842_PSERIES_MODULE_NAME);
+	else
+		pr_err("no nx842 driver found.\n");
+
+	pr_info("loaded\n");
+
+	return 0;
+}
+module_init(nx842_init);
+
+static void __exit nx842_exit(void)
+{
+	pr_info("NX842 unloaded\n");
+}
+module_exit(nx842_exit);
diff --git a/drivers/crypto/nx/nx-842.h b/drivers/crypto/nx/nx-842.h
new file mode 100644
index 0000000..2a5d4e1
--- /dev/null
+++ b/drivers/crypto/nx/nx-842.h
@@ -0,0 +1,32 @@
+
+#ifndef __NX_842_H__
+#define __NX_842_H__
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/nx842.h>
+#include <linux/of.h>
+#include <linux/slab.h>
+#include <linux/io.h>
+
+struct nx842_driver {
+	struct module *owner;
+
+	int (*compress)(const unsigned char *in, unsigned int in_len,
+			unsigned char *out, unsigned int *out_len,
+			void *wrkmem);
+	int (*decompress)(const unsigned char *in, unsigned int in_len,
+			  unsigned char *out, unsigned int *out_len,
+			  void *wrkmem);
+};
+
+void nx842_register_driver(struct nx842_driver *driver);
+void nx842_unregister_driver(struct nx842_driver *driver);
+
+
+/* To allow the main nx-compress module to load platform module */
+#define NX842_PSERIES_MODULE_NAME	"nx-compress-pseries"
+#define NX842_PSERIES_COMPAT_NAME	"ibm,compression"
+
+
+#endif /* __NX_842_H__ */
diff --git a/include/linux/nx842.h b/include/linux/nx842.h
index a4d324c..d919c22 100644
--- a/include/linux/nx842.h
+++ b/include/linux/nx842.h
@@ -1,11 +1,13 @@
 #ifndef __NX842_H__
 #define __NX842_H__
 
-int nx842_get_workmem_size(void);
-int nx842_get_workmem_size_aligned(void);
+#define __NX842_PSERIES_MEM_COMPRESS	((PAGE_SIZE * 2) + 10240)
+
+#define NX842_MEM_COMPRESS	__NX842_PSERIES_MEM_COMPRESS
+
 int nx842_compress(const unsigned char *in, unsigned int in_len,
-		unsigned char *out, unsigned int *out_len, void *wrkmem);
+		   unsigned char *out, unsigned int *out_len, void *wrkmem);
 int nx842_decompress(const unsigned char *in, unsigned int in_len,
-		unsigned char *out, unsigned int *out_len, void *wrkmem);
+		     unsigned char *out, unsigned int *out_len, void *wrkmem);
 
 #endif
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 07/10] drivers/crypto/nx: add nx842 constraints
  2015-05-07 17:49   ` [PATCHv3 00/10] add 842 hw compression for PowerNV platform Dan Streetman
                       ` (5 preceding siblings ...)
  2015-05-07 17:49     ` [PATCH 06/10] drivers/crypto/nx: add NX-842 platform frontend driver Dan Streetman
@ 2015-05-07 17:49     ` Dan Streetman
  2015-05-07 17:49     ` [PATCH 08/10] drivers/crypto/nx: add PowerNV platform NX-842 driver Dan Streetman
                       ` (3 subsequent siblings)
  10 siblings, 0 replies; 46+ messages in thread
From: Dan Streetman @ 2015-05-07 17:49 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras
  Cc: Seth Jennings, Robert Jennings, linux-crypto, linuxppc-dev,
	linux-kernel, Dan Streetman

Add "constraints" for the NX-842 driver.  The constraints are used to
indicate what the current NX-842 platform driver is capable of.  The
constraints tell the NX-842 user what alignment, min and max length, and
length multiple each provided buffers should conform to.  These are
required because the 842 hardware requires buffers to meet specific
constraints that vary based on platform - for example, the pSeries
max length is much lower than the PowerNV max length.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
---
 drivers/crypto/nx/nx-842-pseries.c | 10 ++++++++++
 drivers/crypto/nx/nx-842.c         | 38 ++++++++++++++++++++++++++++++++++++++
 drivers/crypto/nx/nx-842.h         |  2 ++
 include/linux/nx842.h              |  9 +++++++++
 4 files changed, 59 insertions(+)

diff --git a/drivers/crypto/nx/nx-842-pseries.c b/drivers/crypto/nx/nx-842-pseries.c
index 9b83c9e..cb481d8 100644
--- a/drivers/crypto/nx/nx-842-pseries.c
+++ b/drivers/crypto/nx/nx-842-pseries.c
@@ -40,6 +40,13 @@ MODULE_DESCRIPTION("842 H/W Compression driver for IBM Power processors");
 /* IO buffer must be 128 byte aligned */
 #define IO_BUFFER_ALIGN 128
 
+static struct nx842_constraints nx842_pseries_constraints = {
+	.alignment =	IO_BUFFER_ALIGN,
+	.multiple =	DDE_BUFFER_LAST_MULT,
+	.minimum =	IO_BUFFER_ALIGN,
+	.maximum =	PAGE_SIZE, /* dynamic, max_sync_size */
+};
+
 struct nx842_header {
 	int blocks_nr; /* number of compressed blocks */
 	int offset; /* offset of the first block (from beginning of header) */
@@ -842,6 +849,8 @@ static int nx842_OF_upd_maxsyncop(struct nx842_devdata *devdata,
 		goto out;
 	}
 
+	nx842_pseries_constraints.maximum = devdata->max_sync_size;
+
 	devdata->max_sync_sg = (unsigned int)min(maxsynccop->comp_sg_limit,
 						maxsynccop->decomp_sg_limit);
 	if (devdata->max_sync_sg < 1) {
@@ -1115,6 +1124,7 @@ static struct attribute_group nx842_attribute_group = {
 
 static struct nx842_driver nx842_pseries_driver = {
 	.owner =	THIS_MODULE,
+	.constraints =	&nx842_pseries_constraints,
 	.compress =	nx842_pseries_compress,
 	.decompress =	nx842_pseries_decompress,
 };
diff --git a/drivers/crypto/nx/nx-842.c b/drivers/crypto/nx/nx-842.c
index f1f378e..160fe2d 100644
--- a/drivers/crypto/nx/nx-842.c
+++ b/drivers/crypto/nx/nx-842.c
@@ -86,6 +86,44 @@ static void put_driver(struct nx842_driver *driver)
 	module_put(driver->owner);
 }
 
+/**
+ * nx842_constraints
+ *
+ * This provides the driver's constraints.  Different nx842 implementations
+ * may have varying requirements.  The constraints are:
+ *   @alignment:	All buffers should be aligned to this
+ *   @multiple:		All buffer lengths should be a multiple of this
+ *   @minimum:		Buffer lengths must not be less than this amount
+ *   @maximum:		Buffer lengths must not be more than this amount
+ *
+ * The constraints apply to all buffers and lengths, both input and output,
+ * for both compression and decompression, except for the minimum which
+ * only applies to compression input and decompression output; the
+ * compressed data can be less than the minimum constraint.  It can be
+ * assumed that compressed data will always adhere to the multiple
+ * constraint.
+ *
+ * The driver may succeed even if these constraints are violated;
+ * however the driver can return failure or suffer reduced performance
+ * if any constraint is not met.
+ */
+int nx842_constraints(struct nx842_constraints *c)
+{
+	struct nx842_driver *driver = get_driver();
+	int ret = 0;
+
+	if (!driver)
+		return -ENODEV;
+
+	BUG_ON(!c);
+	memcpy(c, driver->constraints, sizeof(*c));
+
+	put_driver(driver);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(nx842_constraints);
+
 int nx842_compress(const unsigned char *in, unsigned int in_len,
 		   unsigned char *out, unsigned int *out_len,
 		   void *wrkmem)
diff --git a/drivers/crypto/nx/nx-842.h b/drivers/crypto/nx/nx-842.h
index 2a5d4e1..c6ceb0f 100644
--- a/drivers/crypto/nx/nx-842.h
+++ b/drivers/crypto/nx/nx-842.h
@@ -12,6 +12,8 @@
 struct nx842_driver {
 	struct module *owner;
 
+	struct nx842_constraints *constraints;
+
 	int (*compress)(const unsigned char *in, unsigned int in_len,
 			unsigned char *out, unsigned int *out_len,
 			void *wrkmem);
diff --git a/include/linux/nx842.h b/include/linux/nx842.h
index d919c22..aa1a97e9 100644
--- a/include/linux/nx842.h
+++ b/include/linux/nx842.h
@@ -5,6 +5,15 @@
 
 #define NX842_MEM_COMPRESS	__NX842_PSERIES_MEM_COMPRESS
 
+struct nx842_constraints {
+	int alignment;
+	int multiple;
+	int minimum;
+	int maximum;
+};
+
+int nx842_constraints(struct nx842_constraints *constraints);
+
 int nx842_compress(const unsigned char *in, unsigned int in_len,
 		   unsigned char *out, unsigned int *out_len, void *wrkmem);
 int nx842_decompress(const unsigned char *in, unsigned int in_len,
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 08/10] drivers/crypto/nx: add PowerNV platform NX-842 driver
  2015-05-07 17:49   ` [PATCHv3 00/10] add 842 hw compression for PowerNV platform Dan Streetman
                       ` (6 preceding siblings ...)
  2015-05-07 17:49     ` [PATCH 07/10] drivers/crypto/nx: add nx842 constraints Dan Streetman
@ 2015-05-07 17:49     ` Dan Streetman
  2015-05-07 17:49     ` [PATCH 09/10] drivers/crypto/nx: simplify pSeries nx842 driver Dan Streetman
                       ` (2 subsequent siblings)
  10 siblings, 0 replies; 46+ messages in thread
From: Dan Streetman @ 2015-05-07 17:49 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras
  Cc: Seth Jennings, Robert Jennings, linux-crypto, linuxppc-dev,
	linux-kernel, Dan Streetman

Add driver for NX-842 hardware on the PowerNV platform.

This allows the use of the 842 compression hardware coprocessor on
the PowerNV platform.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
---
 drivers/crypto/nx/Kconfig          |  10 +
 drivers/crypto/nx/Makefile         |   2 +
 drivers/crypto/nx/nx-842-powernv.c | 625 +++++++++++++++++++++++++++++++++++++
 drivers/crypto/nx/nx-842-pseries.c |   9 -
 drivers/crypto/nx/nx-842.c         |   4 +-
 drivers/crypto/nx/nx-842.h         |  97 ++++++
 include/linux/nx842.h              |   6 +-
 7 files changed, 741 insertions(+), 12 deletions(-)
 create mode 100644 drivers/crypto/nx/nx-842-powernv.c

diff --git a/drivers/crypto/nx/Kconfig b/drivers/crypto/nx/Kconfig
index 34013f7..ee9e259 100644
--- a/drivers/crypto/nx/Kconfig
+++ b/drivers/crypto/nx/Kconfig
@@ -40,4 +40,14 @@ config CRYPTO_DEV_NX_COMPRESS_PSERIES
 	  algorithm.  This supports NX hardware on the pSeries platform.
 	  If you choose 'M' here, this module will be called nx_compress_pseries.
 
+config CRYPTO_DEV_NX_COMPRESS_POWERNV
+	tristate "Compression acceleration support on PowerNV platform"
+	depends on PPC_POWERNV
+	default y
+	help
+	  Support for PowerPC Nest (NX) compression acceleration. This
+	  module supports acceleration for compressing memory with the 842
+	  algorithm.  This supports NX hardware on the PowerNV platform.
+	  If you choose 'M' here, this module will be called nx_compress_powernv.
+
 endif
diff --git a/drivers/crypto/nx/Makefile b/drivers/crypto/nx/Makefile
index 5d9f4bc..6619787 100644
--- a/drivers/crypto/nx/Makefile
+++ b/drivers/crypto/nx/Makefile
@@ -12,5 +12,7 @@ nx-crypto-objs := nx.o \
 
 obj-$(CONFIG_CRYPTO_DEV_NX_COMPRESS) += nx-compress.o
 obj-$(CONFIG_CRYPTO_DEV_NX_COMPRESS_PSERIES) += nx-compress-pseries.o
+obj-$(CONFIG_CRYPTO_DEV_NX_COMPRESS_POWERNV) += nx-compress-powernv.o
 nx-compress-objs := nx-842.o
 nx-compress-pseries-objs := nx-842-pseries.o
+nx-compress-powernv-objs := nx-842-powernv.o
diff --git a/drivers/crypto/nx/nx-842-powernv.c b/drivers/crypto/nx/nx-842-powernv.c
new file mode 100644
index 0000000..6a9fb8b
--- /dev/null
+++ b/drivers/crypto/nx/nx-842-powernv.c
@@ -0,0 +1,625 @@
+/*
+ * Driver for IBM PowerNV 842 compression accelerator
+ *
+ * Copyright (C) 2015 Dan Streetman, IBM Corp
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include "nx-842.h"
+
+#include <linux/timer.h>
+
+#include <asm/prom.h>
+#include <asm/icswx.h>
+
+#define MODULE_NAME NX842_POWERNV_MODULE_NAME
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Dan Streetman <ddstreet@ieee.org>");
+MODULE_DESCRIPTION("842 H/W Compression driver for IBM PowerNV processors");
+
+#define WORKMEM_ALIGN	(CRB_ALIGN)
+#define CSB_WAIT_MAX	(5000) /* ms */
+
+struct nx842_workmem {
+	/* Below fields must be properly aligned */
+	struct coprocessor_request_block crb; /* CRB_ALIGN align */
+	struct data_descriptor_entry ddl_in[DDL_LEN_MAX]; /* DDE_ALIGN align */
+	struct data_descriptor_entry ddl_out[DDL_LEN_MAX]; /* DDE_ALIGN align */
+	/* Above fields must be properly aligned */
+
+	ktime_t start;
+
+	char padding[WORKMEM_ALIGN]; /* unused, to allow alignment */
+} __packed __aligned(WORKMEM_ALIGN);
+
+struct nx842_coproc {
+	unsigned int chip_id;
+	unsigned int ct;
+	unsigned int ci;
+	struct list_head list;
+};
+
+/* no cpu hotplug on powernv, so this list never changes after init */
+static LIST_HEAD(nx842_coprocs);
+static unsigned int nx842_ct;
+
+/**
+ * setup_indirect_dde - Setup an indirect DDE
+ *
+ * The DDE is setup with the the DDE count, byte count, and address of
+ * first direct DDE in the list.
+ */
+static void setup_indirect_dde(struct data_descriptor_entry *dde,
+			       struct data_descriptor_entry *ddl,
+			       unsigned int dde_count, unsigned int byte_count)
+{
+	dde->flags = 0;
+	dde->count = dde_count;
+	dde->index = 0;
+	dde->length = cpu_to_be32(byte_count);
+	dde->address = cpu_to_be64(nx842_get_pa(ddl));
+}
+
+/**
+ * setup_direct_dde - Setup single DDE from buffer
+ *
+ * The DDE is setup with the buffer and length.  The buffer must be properly
+ * aligned.  The used length is returned.
+ * Returns:
+ *   N    Successfully set up DDE with N bytes
+ */
+static unsigned int setup_direct_dde(struct data_descriptor_entry *dde,
+				     unsigned long pa, unsigned int len)
+{
+	unsigned int l = min_t(unsigned int, len, LEN_ON_PAGE(pa));
+
+	dde->flags = 0;
+	dde->count = 0;
+	dde->index = 0;
+	dde->length = cpu_to_be32(l);
+	dde->address = cpu_to_be64(pa);
+
+	return l;
+}
+
+/**
+ * setup_ddl - Setup DDL from buffer
+ *
+ * Returns:
+ *   0		Successfully set up DDL
+ */
+static int setup_ddl(struct data_descriptor_entry *dde,
+		     struct data_descriptor_entry *ddl,
+		     unsigned char *buf, unsigned int len,
+		     bool in)
+{
+	unsigned long pa = nx842_get_pa(buf);
+	int i, ret, total_len = len;
+
+	if (!IS_ALIGNED(pa, DDE_BUFFER_ALIGN)) {
+		pr_debug("%s buffer pa 0x%lx not 0x%x-byte aligned\n",
+			 in ? "input" : "output", pa, DDE_BUFFER_ALIGN);
+		return -EINVAL;
+	}
+
+	/* only need to check last mult; since buffer must be
+	 * DDE_BUFFER_ALIGN aligned, and that is a multiple of
+	 * DDE_BUFFER_SIZE_MULT, and pre-last page DDE buffers
+	 * are guaranteed a multiple of DDE_BUFFER_SIZE_MULT.
+	 */
+	if (len % DDE_BUFFER_LAST_MULT) {
+		pr_debug("%s buffer len 0x%x not a multiple of 0x%x\n",
+			 in ? "input" : "output", len, DDE_BUFFER_LAST_MULT);
+		if (in)
+			return -EINVAL;
+		len = round_down(len, DDE_BUFFER_LAST_MULT);
+	}
+
+	/* use a single direct DDE */
+	if (len <= LEN_ON_PAGE(pa)) {
+		ret = setup_direct_dde(dde, pa, len);
+		WARN_ON(ret < len);
+		return 0;
+	}
+
+	/* use the DDL */
+	for (i = 0; i < DDL_LEN_MAX && len > 0; i++) {
+		ret = setup_direct_dde(&ddl[i], pa, len);
+		buf += ret;
+		len -= ret;
+		pa = nx842_get_pa(buf);
+	}
+
+	if (len > 0) {
+		pr_debug("0x%x total %s bytes 0x%x too many for DDL.\n",
+			 total_len, in ? "input" : "output", len);
+		if (in)
+			return -EMSGSIZE;
+		total_len -= len;
+	}
+	setup_indirect_dde(dde, ddl, i, total_len);
+
+	return 0;
+}
+
+#define CSB_ERR(csb, msg, ...)					\
+	pr_err("ERROR: " msg " : %02x %02x %02x %02x %08x\n",	\
+	       ##__VA_ARGS__, (csb)->flags,			\
+	       (csb)->cs, (csb)->cc, (csb)->ce,			\
+	       be32_to_cpu((csb)->count))
+
+#define CSB_ERR_ADDR(csb, msg, ...)				\
+	CSB_ERR(csb, msg " at %lx", ##__VA_ARGS__,		\
+		(unsigned long)be64_to_cpu((csb)->address))
+
+/**
+ * wait_for_csb
+ */
+static int wait_for_csb(struct nx842_workmem *wmem,
+			struct coprocessor_status_block *csb)
+{
+	ktime_t start = wmem->start, now = ktime_get();
+	ktime_t timeout = ktime_add_ms(start, CSB_WAIT_MAX);
+
+	while (!(ACCESS_ONCE(csb->flags) & CSB_V)) {
+		cpu_relax();
+		now = ktime_get();
+		if (ktime_after(now, timeout))
+			break;
+	}
+
+	/* hw has updated csb and output buffer */
+	barrier();
+
+	/* check CSB flags */
+	if (!(csb->flags & CSB_V)) {
+		CSB_ERR(csb, "CSB still not valid after %ld us, giving up",
+			(long)ktime_us_delta(now, start));
+		return -ETIMEDOUT;
+	}
+	if (csb->flags & CSB_F) {
+		CSB_ERR(csb, "Invalid CSB format");
+		return -EPROTO;
+	}
+	if (csb->flags & CSB_CH) {
+		CSB_ERR(csb, "Invalid CSB chaining state");
+		return -EPROTO;
+	}
+
+	/* verify CSB completion sequence is 0 */
+	if (csb->cs) {
+		CSB_ERR(csb, "Invalid CSB completion sequence");
+		return -EPROTO;
+	}
+
+	/* check CSB Completion Code */
+	switch (csb->cc) {
+	/* no error */
+	case CSB_CC_SUCCESS:
+		break;
+	case CSB_CC_TPBC_GT_SPBC:
+		/* not an error, but the compressed data is
+		 * larger than the uncompressed data :(
+		 */
+		break;
+
+	/* input data errors */
+	case CSB_CC_OPERAND_OVERLAP:
+		/* input and output buffers overlap */
+		CSB_ERR(csb, "Operand Overlap error");
+		return -EINVAL;
+	case CSB_CC_INVALID_OPERAND:
+		CSB_ERR(csb, "Invalid operand");
+		return -EINVAL;
+	case CSB_CC_NOSPC:
+		/* output buffer too small */
+		return -ENOSPC;
+	case CSB_CC_ABORT:
+		CSB_ERR(csb, "Function aborted");
+		return -EINTR;
+	case CSB_CC_CRC_MISMATCH:
+		CSB_ERR(csb, "CRC mismatch");
+		return -EINVAL;
+	case CSB_CC_TEMPL_INVALID:
+		CSB_ERR(csb, "Compressed data template invalid");
+		return -EINVAL;
+	case CSB_CC_TEMPL_OVERFLOW:
+		CSB_ERR(csb, "Compressed data template shows data past end");
+		return -EINVAL;
+
+	/* these should not happen */
+	case CSB_CC_INVALID_ALIGN:
+		/* setup_ddl should have detected this */
+		CSB_ERR_ADDR(csb, "Invalid alignment");
+		return -EINVAL;
+	case CSB_CC_DATA_LENGTH:
+		/* setup_ddl should have detected this */
+		CSB_ERR(csb, "Invalid data length");
+		return -EINVAL;
+	case CSB_CC_WR_TRANSLATION:
+	case CSB_CC_TRANSLATION:
+	case CSB_CC_TRANSLATION_DUP1:
+	case CSB_CC_TRANSLATION_DUP2:
+	case CSB_CC_TRANSLATION_DUP3:
+	case CSB_CC_TRANSLATION_DUP4:
+	case CSB_CC_TRANSLATION_DUP5:
+	case CSB_CC_TRANSLATION_DUP6:
+		/* should not happen, we use physical addrs */
+		CSB_ERR_ADDR(csb, "Translation error");
+		return -EPROTO;
+	case CSB_CC_WR_PROTECTION:
+	case CSB_CC_PROTECTION:
+	case CSB_CC_PROTECTION_DUP1:
+	case CSB_CC_PROTECTION_DUP2:
+	case CSB_CC_PROTECTION_DUP3:
+	case CSB_CC_PROTECTION_DUP4:
+	case CSB_CC_PROTECTION_DUP5:
+	case CSB_CC_PROTECTION_DUP6:
+		/* should not happen, we use physical addrs */
+		CSB_ERR_ADDR(csb, "Protection error");
+		return -EPROTO;
+	case CSB_CC_PRIVILEGE:
+		/* shouldn't happen, we're in HYP mode */
+		CSB_ERR(csb, "Insufficient Privilege error");
+		return -EPROTO;
+	case CSB_CC_EXCESSIVE_DDE:
+		/* shouldn't happen, setup_ddl doesn't use many dde's */
+		CSB_ERR(csb, "Too many DDEs in DDL");
+		return -EINVAL;
+	case CSB_CC_TRANSPORT:
+		/* shouldn't happen, we setup CRB correctly */
+		CSB_ERR(csb, "Invalid CRB");
+		return -EINVAL;
+	case CSB_CC_SEGMENTED_DDL:
+		/* shouldn't happen, setup_ddl creates DDL right */
+		CSB_ERR(csb, "Segmented DDL error");
+		return -EINVAL;
+	case CSB_CC_DDE_OVERFLOW:
+		/* shouldn't happen, setup_ddl creates DDL right */
+		CSB_ERR(csb, "DDE overflow error");
+		return -EINVAL;
+	case CSB_CC_SESSION:
+		/* should not happen with ICSWX */
+		CSB_ERR(csb, "Session violation error");
+		return -EPROTO;
+	case CSB_CC_CHAIN:
+		/* should not happen, we don't use chained CRBs */
+		CSB_ERR(csb, "Chained CRB error");
+		return -EPROTO;
+	case CSB_CC_SEQUENCE:
+		/* should not happen, we don't use chained CRBs */
+		CSB_ERR(csb, "CRB seqeunce number error");
+		return -EPROTO;
+	case CSB_CC_UNKNOWN_CODE:
+		CSB_ERR(csb, "Unknown subfunction code");
+		return -EPROTO;
+
+	/* hardware errors */
+	case CSB_CC_RD_EXTERNAL:
+	case CSB_CC_RD_EXTERNAL_DUP1:
+	case CSB_CC_RD_EXTERNAL_DUP2:
+	case CSB_CC_RD_EXTERNAL_DUP3:
+		CSB_ERR_ADDR(csb, "Read error outside coprocessor");
+		return -EPROTO;
+	case CSB_CC_WR_EXTERNAL:
+		CSB_ERR_ADDR(csb, "Write error outside coprocessor");
+		return -EPROTO;
+	case CSB_CC_INTERNAL:
+		CSB_ERR(csb, "Internal error in coprocessor");
+		return -EPROTO;
+	case CSB_CC_PROVISION:
+		CSB_ERR(csb, "Storage provision error");
+		return -EPROTO;
+	case CSB_CC_HW:
+		CSB_ERR(csb, "Correctable hardware error");
+		return -EPROTO;
+
+	default:
+		CSB_ERR(csb, "Invalid CC %d", csb->cc);
+		return -EPROTO;
+	}
+
+	/* check Completion Extension state */
+	if (csb->ce & CSB_CE_TERMINATION) {
+		CSB_ERR(csb, "CSB request was terminated");
+		return -EPROTO;
+	}
+	if (csb->ce & CSB_CE_INCOMPLETE) {
+		CSB_ERR(csb, "CSB request not complete");
+		return -EPROTO;
+	}
+	if (!(csb->ce & CSB_CE_TPBC)) {
+		CSB_ERR(csb, "TPBC not provided, unknown target length");
+		return -EPROTO;
+	}
+
+	/* successful completion */
+	pr_debug_ratelimited("Processed %u bytes in %lu us\n", csb->count,
+			     (unsigned long)ktime_us_delta(now, start));
+
+	return 0;
+}
+
+/**
+ * nx842_powernv_function - compress/decompress data using the 842 algorithm
+ *
+ * (De)compression provided by the NX842 coprocessor on IBM PowerNV systems.
+ * This compresses or decompresses the provided input buffer into the provided
+ * output buffer.
+ *
+ * Upon return from this function @outlen contains the length of the
+ * output data.  If there is an error then @outlen will be 0 and an
+ * error will be specified by the return code from this function.
+ *
+ * The @workmem buffer should only be used by one function call at a time.
+ *
+ * @in: input buffer pointer
+ * @inlen: input buffer size
+ * @out: output buffer pointer
+ * @outlenp: output buffer size pointer
+ * @workmem: working memory buffer pointer, must be at least NX842_MEM_COMPRESS
+ * @fc: function code, see CCW Function Codes in nx-842.h
+ *
+ * Returns:
+ *   0		Success, output of length @outlenp stored in the buffer at @out
+ *   -ENODEV	Hardware unavailable
+ *   -ENOSPC	Output buffer is to small
+ *   -EMSGSIZE	Input buffer too large
+ *   -EINVAL	buffer constraints do not fix nx842_constraints
+ *   -EPROTO	hardware error during operation
+ *   -ETIMEDOUT	hardware did not complete operation in reasonable time
+ *   -EINTR	operation was aborted
+ */
+static int nx842_powernv_function(const unsigned char *in, unsigned int inlen,
+				  unsigned char *out, unsigned int *outlenp,
+				  void *workmem, int fc)
+{
+	struct coprocessor_request_block *crb;
+	struct coprocessor_status_block *csb;
+	struct nx842_workmem *wmem;
+	int ret;
+	u64 csb_addr;
+	u32 ccw;
+	unsigned int outlen = *outlenp;
+
+	wmem = PTR_ALIGN(workmem, WORKMEM_ALIGN);
+
+	*outlenp = 0;
+
+	/* shoudn't happen, we don't load without a coproc */
+	if (!nx842_ct) {
+		pr_err_ratelimited("coprocessor CT is 0");
+		return -ENODEV;
+	}
+
+	crb = &wmem->crb;
+	csb = &crb->csb;
+
+	/* Clear any previous values */
+	memset(crb, 0, sizeof(*crb));
+
+	/* set up DDLs */
+	ret = setup_ddl(&crb->source, wmem->ddl_in,
+			(unsigned char *)in, inlen, true);
+	if (ret)
+		return ret;
+	ret = setup_ddl(&crb->target, wmem->ddl_out,
+			out, outlen, false);
+	if (ret)
+		return ret;
+
+	/* set up CCW */
+	ccw = 0;
+	ccw = SET_FIELD(ccw, CCW_CT, nx842_ct);
+	ccw = SET_FIELD(ccw, CCW_CI_842, 0); /* use 0 for hw auto-selection */
+	ccw = SET_FIELD(ccw, CCW_FC_842, fc);
+
+	/* set up CRB's CSB addr */
+	csb_addr = nx842_get_pa(csb) & CRB_CSB_ADDRESS;
+	csb_addr |= CRB_CSB_AT; /* Addrs are phys */
+	crb->csb_addr = cpu_to_be64(csb_addr);
+
+	wmem->start = ktime_get();
+
+	/* do ICSWX */
+	ret = icswx(cpu_to_be32(ccw), crb);
+
+	pr_debug_ratelimited("icswx CR %x ccw %x crb->ccw %x\n", ret,
+			     (unsigned int)ccw,
+			     (unsigned int)be32_to_cpu(crb->ccw));
+
+	switch (ret) {
+	case ICSWX_INITIATED:
+		ret = wait_for_csb(wmem, csb);
+		break;
+	case ICSWX_BUSY:
+		pr_debug_ratelimited("842 Coprocessor busy\n");
+		ret = -EBUSY;
+		break;
+	case ICSWX_REJECTED:
+		pr_err_ratelimited("ICSWX rejected\n");
+		ret = -EPROTO;
+		break;
+	default:
+		pr_err_ratelimited("Invalid ICSWX return code %x\n", ret);
+		ret = -EPROTO;
+		break;
+	}
+
+	if (!ret)
+		*outlenp = be32_to_cpu(csb->count);
+
+	return ret;
+}
+
+/**
+ * nx842_powernv_compress - Compress data using the 842 algorithm
+ *
+ * Compression provided by the NX842 coprocessor on IBM PowerNV systems.
+ * The input buffer is compressed and the result is stored in the
+ * provided output buffer.
+ *
+ * Upon return from this function @outlen contains the length of the
+ * compressed data.  If there is an error then @outlen will be 0 and an
+ * error will be specified by the return code from this function.
+ *
+ * @in: input buffer pointer
+ * @inlen: input buffer size
+ * @out: output buffer pointer
+ * @outlenp: output buffer size pointer
+ * @workmem: working memory buffer pointer, must be at least NX842_MEM_COMPRESS
+ *
+ * Returns: see @nx842_powernv_function()
+ */
+static int nx842_powernv_compress(const unsigned char *in, unsigned int inlen,
+				  unsigned char *out, unsigned int *outlenp,
+				  void *wmem)
+{
+	return nx842_powernv_function(in, inlen, out, outlenp,
+				      wmem, CCW_FC_842_COMP_NOCRC);
+}
+
+/**
+ * nx842_powernv_decompress - Decompress data using the 842 algorithm
+ *
+ * Decompression provided by the NX842 coprocessor on IBM PowerNV systems.
+ * The input buffer is decompressed and the result is stored in the
+ * provided output buffer.
+ *
+ * Upon return from this function @outlen contains the length of the
+ * decompressed data.  If there is an error then @outlen will be 0 and an
+ * error will be specified by the return code from this function.
+ *
+ * @in: input buffer pointer
+ * @inlen: input buffer size
+ * @out: output buffer pointer
+ * @outlenp: output buffer size pointer
+ * @workmem: working memory buffer pointer, must be at least NX842_MEM_COMPRESS
+ *
+ * Returns: see @nx842_powernv_function()
+ */
+static int nx842_powernv_decompress(const unsigned char *in, unsigned int inlen,
+				    unsigned char *out, unsigned int *outlenp,
+				    void *wmem)
+{
+	return nx842_powernv_function(in, inlen, out, outlenp,
+				      wmem, CCW_FC_842_DECOMP_NOCRC);
+}
+
+static int __init nx842_powernv_probe(struct device_node *dn)
+{
+	struct nx842_coproc *coproc;
+	struct property *ct_prop, *ci_prop;
+	unsigned int ct, ci;
+	int chip_id;
+
+	chip_id = of_get_ibm_chip_id(dn);
+	if (chip_id < 0) {
+		pr_err("ibm,chip-id missing\n");
+		return -EINVAL;
+	}
+	ct_prop = of_find_property(dn, "ibm,842-coprocessor-type", NULL);
+	if (!ct_prop) {
+		pr_err("ibm,842-coprocessor-type missing\n");
+		return -EINVAL;
+	}
+	ct = be32_to_cpu(*(unsigned int *)ct_prop->value);
+	ci_prop = of_find_property(dn, "ibm,842-coprocessor-instance", NULL);
+	if (!ci_prop) {
+		pr_err("ibm,842-coprocessor-instance missing\n");
+		return -EINVAL;
+	}
+	ci = be32_to_cpu(*(unsigned int *)ci_prop->value);
+
+	coproc = kmalloc(sizeof(*coproc), GFP_KERNEL);
+	if (!coproc)
+		return -ENOMEM;
+
+	coproc->chip_id = chip_id;
+	coproc->ct = ct;
+	coproc->ci = ci;
+	INIT_LIST_HEAD(&coproc->list);
+	list_add(&coproc->list, &nx842_coprocs);
+
+	pr_info("coprocessor found on chip %d, CT %d CI %d\n", chip_id, ct, ci);
+
+	if (!nx842_ct)
+		nx842_ct = ct;
+	else if (nx842_ct != ct)
+		pr_err("NX842 chip %d, CT %d != first found CT %d\n",
+		       chip_id, ct, nx842_ct);
+
+	return 0;
+}
+
+static struct nx842_constraints nx842_powernv_constraints = {
+	.alignment =	DDE_BUFFER_ALIGN,
+	.multiple =	DDE_BUFFER_LAST_MULT,
+	.minimum =	DDE_BUFFER_LAST_MULT,
+	.maximum =	(DDL_LEN_MAX - 1) * PAGE_SIZE,
+};
+
+static struct nx842_driver nx842_powernv_driver = {
+	.owner =	THIS_MODULE,
+	.constraints =	&nx842_powernv_constraints,
+	.compress =	nx842_powernv_compress,
+	.decompress =	nx842_powernv_decompress,
+};
+
+static __init int nx842_powernv_init(void)
+{
+	struct device_node *dn;
+
+	/* verify workmem size/align restrictions */
+	BUILD_BUG_ON(sizeof(struct nx842_workmem) > NX842_MEM_COMPRESS);
+	BUILD_BUG_ON(WORKMEM_ALIGN % CRB_ALIGN);
+	BUILD_BUG_ON(CRB_ALIGN % DDE_ALIGN);
+	BUILD_BUG_ON(CRB_SIZE % DDE_ALIGN);
+	/* verify buffer size/align restrictions */
+	BUILD_BUG_ON(PAGE_SIZE % DDE_BUFFER_ALIGN);
+	BUILD_BUG_ON(DDE_BUFFER_ALIGN % DDE_BUFFER_SIZE_MULT);
+	BUILD_BUG_ON(DDE_BUFFER_SIZE_MULT % DDE_BUFFER_LAST_MULT);
+
+	pr_info("loading\n");
+
+	for_each_compatible_node(dn, NULL, NX842_POWERNV_COMPAT_NAME)
+		nx842_powernv_probe(dn);
+
+	if (!nx842_ct) {
+		pr_err("no coprocessors found\n");
+		return -ENODEV;
+	}
+
+	nx842_register_driver(&nx842_powernv_driver);
+
+	pr_info("loaded\n");
+
+	return 0;
+}
+module_init(nx842_powernv_init);
+
+static void __exit nx842_powernv_exit(void)
+{
+	struct nx842_coproc *coproc, *n;
+
+	nx842_unregister_driver(&nx842_powernv_driver);
+
+	list_for_each_entry_safe(coproc, n, &nx842_coprocs, list) {
+		list_del(&coproc->list);
+		kfree(coproc);
+	}
+
+	pr_info("unloaded\n");
+}
+module_exit(nx842_powernv_exit);
diff --git a/drivers/crypto/nx/nx-842-pseries.c b/drivers/crypto/nx/nx-842-pseries.c
index cb481d8..6db9992 100644
--- a/drivers/crypto/nx/nx-842-pseries.c
+++ b/drivers/crypto/nx/nx-842-pseries.c
@@ -160,15 +160,6 @@ static inline unsigned long nx842_get_scatterlist_size(
 	return sl->entry_nr * sizeof(struct nx842_slentry);
 }
 
-static inline unsigned long nx842_get_pa(void *addr)
-{
-	if (is_vmalloc_addr(addr))
-		return page_to_phys(vmalloc_to_page(addr))
-		       + offset_in_page(addr);
-	else
-		return __pa(addr);
-}
-
 static int nx842_build_scatterlist(unsigned long buf, int len,
 			struct nx842_scatterlist *sl)
 {
diff --git a/drivers/crypto/nx/nx-842.c b/drivers/crypto/nx/nx-842.c
index 160fe2d..bf2823c 100644
--- a/drivers/crypto/nx/nx-842.c
+++ b/drivers/crypto/nx/nx-842.c
@@ -164,7 +164,9 @@ static __init int nx842_init(void)
 {
 	pr_info("loading\n");
 
-	if (of_find_compatible_node(NULL, NULL, NX842_PSERIES_COMPAT_NAME))
+	if (of_find_compatible_node(NULL, NULL, NX842_POWERNV_COMPAT_NAME))
+		request_module_nowait(NX842_POWERNV_MODULE_NAME);
+	else if (of_find_compatible_node(NULL, NULL, NX842_PSERIES_COMPAT_NAME))
 		request_module_nowait(NX842_PSERIES_MODULE_NAME);
 	else
 		pr_err("no nx842 driver found.\n");
diff --git a/drivers/crypto/nx/nx-842.h b/drivers/crypto/nx/nx-842.h
index c6ceb0f..84b15b7 100644
--- a/drivers/crypto/nx/nx-842.h
+++ b/drivers/crypto/nx/nx-842.h
@@ -5,9 +5,104 @@
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/nx842.h>
+#include <linux/sw842.h>
 #include <linux/of.h>
 #include <linux/slab.h>
 #include <linux/io.h>
+#include <linux/mm.h>
+#include <linux/ratelimit.h>
+
+/* Restrictions on Data Descriptor List (DDL) and Entry (DDE) buffers
+ *
+ * From NX P8 workbook, sec 4.9.1 "842 details"
+ *   Each DDE buffer is 128 byte aligned
+ *   Each DDE buffer size is a multiple of 32 bytes (except the last)
+ *   The last DDE buffer size is a multiple of 8 bytes
+ */
+#define DDE_BUFFER_ALIGN	(128)
+#define DDE_BUFFER_SIZE_MULT	(32)
+#define DDE_BUFFER_LAST_MULT	(8)
+
+/* Arbitrary DDL length limit
+ * Allows max buffer size of MAX-1 to MAX pages
+ * (depending on alignment)
+ */
+#define DDL_LEN_MAX		(17)
+
+/* CCW 842 CI/FC masks
+ * NX P8 workbook, section 4.3.1, figure 4-6
+ * "CI/FC Boundary by NX CT type"
+ */
+#define CCW_CI_842		(0x00003ff8)
+#define CCW_FC_842		(0x00000007)
+
+/* CCW Function Codes (FC) for 842
+ * NX P8 workbook, section 4.9, table 4-28
+ * "Function Code Definitions for 842 Memory Compression"
+ */
+#define CCW_FC_842_COMP_NOCRC	(0)
+#define CCW_FC_842_COMP_CRC	(1)
+#define CCW_FC_842_DECOMP_NOCRC	(2)
+#define CCW_FC_842_DECOMP_CRC	(3)
+#define CCW_FC_842_MOVE		(4)
+
+/* CSB CC Error Types for 842
+ * NX P8 workbook, section 4.10.3, table 4-30
+ * "Reported Error Types Summary Table"
+ */
+/* These are all duplicates of existing codes defined in icswx.h. */
+#define CSB_CC_TRANSLATION_DUP1	(80)
+#define CSB_CC_TRANSLATION_DUP2	(82)
+#define CSB_CC_TRANSLATION_DUP3	(84)
+#define CSB_CC_TRANSLATION_DUP4	(86)
+#define CSB_CC_TRANSLATION_DUP5	(92)
+#define CSB_CC_TRANSLATION_DUP6	(94)
+#define CSB_CC_PROTECTION_DUP1	(81)
+#define CSB_CC_PROTECTION_DUP2	(83)
+#define CSB_CC_PROTECTION_DUP3	(85)
+#define CSB_CC_PROTECTION_DUP4	(87)
+#define CSB_CC_PROTECTION_DUP5	(93)
+#define CSB_CC_PROTECTION_DUP6	(95)
+#define CSB_CC_RD_EXTERNAL_DUP1	(89)
+#define CSB_CC_RD_EXTERNAL_DUP2	(90)
+#define CSB_CC_RD_EXTERNAL_DUP3	(91)
+/* These are specific to NX */
+/* 842 codes */
+#define CSB_CC_TPBC_GT_SPBC	(64) /* no error, but >1 comp ratio */
+#define CSB_CC_CRC_MISMATCH	(65) /* decomp crc mismatch */
+#define CSB_CC_TEMPL_INVALID	(66) /* decomp invalid template value */
+#define CSB_CC_TEMPL_OVERFLOW	(67) /* decomp template shows data after end */
+/* sym crypt codes */
+#define CSB_CC_DECRYPT_OVERFLOW	(64)
+/* asym crypt codes */
+#define CSB_CC_MINV_OVERFLOW	(128)
+/* These are reserved for hypervisor use */
+#define CSB_CC_HYP_RESERVE_START	(240)
+#define CSB_CC_HYP_RESERVE_END		(253)
+#define CSB_CC_HYP_NO_HW		(254)
+#define CSB_CC_HYP_HANG_ABORTED		(255)
+
+/* CCB Completion Modes (CM) for 842
+ * NX P8 workbook, section 4.3, figure 4-5
+ * "CRB Details - Normal Cop_Req (CL=00, C=1)"
+ */
+#define CCB_CM_EXTRA_WRITE	(CCB_CM0_ALL_COMPLETIONS & CCB_CM12_STORE)
+#define CCB_CM_INTERRUPT	(CCB_CM0_ALL_COMPLETIONS & CCB_CM12_INTERRUPT)
+
+#define LEN_ON_PAGE(pa)		(PAGE_SIZE - ((pa) & ~PAGE_MASK))
+
+static inline unsigned long nx842_get_pa(void *addr)
+{
+	if (!is_vmalloc_addr(addr))
+		return __pa(addr);
+
+	return page_to_phys(vmalloc_to_page(addr)) + offset_in_page(addr);
+}
+
+/* Get/Set bit fields */
+#define MASK_LSH(m)		(__builtin_ffsl(m) - 1)
+#define GET_FIELD(v, m)		(((v) & (m)) >> MASK_LSH(m))
+#define SET_FIELD(v, m, val)	(((v) & ~(m)) | (((val) << MASK_LSH(m)) & (m)))
 
 struct nx842_driver {
 	struct module *owner;
@@ -27,6 +122,8 @@ void nx842_unregister_driver(struct nx842_driver *driver);
 
 
 /* To allow the main nx-compress module to load platform module */
+#define NX842_POWERNV_MODULE_NAME	"nx-compress-powernv"
+#define NX842_POWERNV_COMPAT_NAME	"ibm,power-nx"
 #define NX842_PSERIES_MODULE_NAME	"nx-compress-pseries"
 #define NX842_PSERIES_COMPAT_NAME	"ibm,compression"
 
diff --git a/include/linux/nx842.h b/include/linux/nx842.h
index aa1a97e9..4ddf68d 100644
--- a/include/linux/nx842.h
+++ b/include/linux/nx842.h
@@ -1,9 +1,11 @@
 #ifndef __NX842_H__
 #define __NX842_H__
 
-#define __NX842_PSERIES_MEM_COMPRESS	((PAGE_SIZE * 2) + 10240)
+#define __NX842_PSERIES_MEM_COMPRESS	(10240)
+#define __NX842_POWERNV_MEM_COMPRESS	(1024)
 
-#define NX842_MEM_COMPRESS	__NX842_PSERIES_MEM_COMPRESS
+#define NX842_MEM_COMPRESS	(max_t(unsigned int,			\
+	__NX842_PSERIES_MEM_COMPRESS, __NX842_POWERNV_MEM_COMPRESS))
 
 struct nx842_constraints {
 	int alignment;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 09/10] drivers/crypto/nx: simplify pSeries nx842 driver
  2015-05-07 17:49   ` [PATCHv3 00/10] add 842 hw compression for PowerNV platform Dan Streetman
                       ` (7 preceding siblings ...)
  2015-05-07 17:49     ` [PATCH 08/10] drivers/crypto/nx: add PowerNV platform NX-842 driver Dan Streetman
@ 2015-05-07 17:49     ` Dan Streetman
  2015-05-07 17:49     ` [PATCH 10/10] drivers/crypto/nx: add hardware 842 crypto comp alg Dan Streetman
  2015-05-11  7:21     ` [PATCHv3 00/10] add 842 hw compression for PowerNV platform Herbert Xu
  10 siblings, 0 replies; 46+ messages in thread
From: Dan Streetman @ 2015-05-07 17:49 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras
  Cc: Seth Jennings, Robert Jennings, linux-crypto, linuxppc-dev,
	linux-kernel, Dan Streetman

Simplify the pSeries NX-842 driver: do not expect incoming buffers to be
exactly page-sized; do not break up input buffers to compress smaller
blocks; do not use any internal headers in the compressed data blocks;
remove the software decompression implementation; implement the pSeries
nx842_constraints.

This changes the pSeries NX-842 driver to perform constraints-based
compression so that it only needs to compress one entire input block at a
time.  This removes the need for it to split input data blocks into
multiple compressed data sections in the output buffer, and removes the
need for any extra header info in the compressed data; all that is moved
(in a later patch) into the main crypto 842 driver.  Additionally, the
842 software decompression implementation is no longer needed here, as
the crypto 842 driver will use the generic software 842 decompression
function as a fallback if any hardware 842 driver fails.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
---
 drivers/crypto/nx/nx-842-pseries.c | 779 ++++++++-----------------------------
 1 file changed, 153 insertions(+), 626 deletions(-)

diff --git a/drivers/crypto/nx/nx-842-pseries.c b/drivers/crypto/nx/nx-842-pseries.c
index 6db9992..85837e9 100644
--- a/drivers/crypto/nx/nx-842-pseries.c
+++ b/drivers/crypto/nx/nx-842-pseries.c
@@ -21,7 +21,6 @@
  *          Seth Jennings <sjenning@linux.vnet.ibm.com>
  */
 
-#include <asm/page.h>
 #include <asm/vio.h>
 
 #include "nx-842.h"
@@ -32,11 +31,6 @@ MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Robert Jennings <rcj@linux.vnet.ibm.com>");
 MODULE_DESCRIPTION("842 H/W Compression driver for IBM Power processors");
 
-#define SHIFT_4K 12
-#define SHIFT_64K 16
-#define SIZE_4K (1UL << SHIFT_4K)
-#define SIZE_64K (1UL << SHIFT_64K)
-
 /* IO buffer must be 128 byte aligned */
 #define IO_BUFFER_ALIGN 128
 
@@ -47,18 +41,52 @@ static struct nx842_constraints nx842_pseries_constraints = {
 	.maximum =	PAGE_SIZE, /* dynamic, max_sync_size */
 };
 
-struct nx842_header {
-	int blocks_nr; /* number of compressed blocks */
-	int offset; /* offset of the first block (from beginning of header) */
-	int sizes[0]; /* size of compressed blocks */
-};
-
-static inline int nx842_header_size(const struct nx842_header *hdr)
+static int check_constraints(unsigned long buf, unsigned int *len, bool in)
 {
-	return sizeof(struct nx842_header) +
-			hdr->blocks_nr * sizeof(hdr->sizes[0]);
+	if (!IS_ALIGNED(buf, nx842_pseries_constraints.alignment)) {
+		pr_debug("%s buffer 0x%lx not aligned to 0x%x\n",
+			 in ? "input" : "output", buf,
+			 nx842_pseries_constraints.alignment);
+		return -EINVAL;
+	}
+	if (*len % nx842_pseries_constraints.multiple) {
+		pr_debug("%s buffer len 0x%x not multiple of 0x%x\n",
+			 in ? "input" : "output", *len,
+			 nx842_pseries_constraints.multiple);
+		if (in)
+			return -EINVAL;
+		*len = round_down(*len, nx842_pseries_constraints.multiple);
+	}
+	if (*len < nx842_pseries_constraints.minimum) {
+		pr_debug("%s buffer len 0x%x under minimum 0x%x\n",
+			 in ? "input" : "output", *len,
+			 nx842_pseries_constraints.minimum);
+		return -EINVAL;
+	}
+	if (*len > nx842_pseries_constraints.maximum) {
+		pr_debug("%s buffer len 0x%x over maximum 0x%x\n",
+			 in ? "input" : "output", *len,
+			 nx842_pseries_constraints.maximum);
+		if (in)
+			return -EINVAL;
+		*len = nx842_pseries_constraints.maximum;
+	}
+	return 0;
 }
 
+/* I assume we need to align the CSB? */
+#define WORKMEM_ALIGN	(256)
+
+struct nx842_workmem {
+	/* scatterlist */
+	char slin[4096];
+	char slout[4096];
+	/* coprocessor status/parameter block */
+	struct nx_csbcpb csbcpb;
+
+	char padding[WORKMEM_ALIGN];
+} __aligned(WORKMEM_ALIGN);
+
 /* Macros for fields within nx_csbcpb */
 /* Check the valid bit within the csbcpb valid field */
 #define NX842_CSBCBP_VALID_CHK(x) (x & BIT_MASK(7))
@@ -72,8 +100,7 @@ static inline int nx842_header_size(const struct nx842_header *hdr)
 #define NX842_CSBCPB_CE2(x)	(x & BIT_MASK(5))
 
 /* The NX unit accepts data only on 4K page boundaries */
-#define NX842_HW_PAGE_SHIFT	SHIFT_4K
-#define NX842_HW_PAGE_SIZE	(ASM_CONST(1) << NX842_HW_PAGE_SHIFT)
+#define NX842_HW_PAGE_SIZE	(4096)
 #define NX842_HW_PAGE_MASK	(~(NX842_HW_PAGE_SIZE-1))
 
 enum nx842_status {
@@ -194,41 +221,6 @@ static int nx842_build_scatterlist(unsigned long buf, int len,
 	return 0;
 }
 
-/*
- * Working memory for software decompression
- */
-struct sw842_fifo {
-	union {
-		char f8[256][8];
-		char f4[512][4];
-	};
-	char f2[256][2];
-	unsigned char f84_full;
-	unsigned char f2_full;
-	unsigned char f8_count;
-	unsigned char f2_count;
-	unsigned int f4_count;
-};
-
-/*
- * Working memory for crypto API
- */
-struct nx842_workmem {
-	char bounce[PAGE_SIZE]; /* bounce buffer for decompression input */
-	union {
-		/* hardware working memory */
-		struct {
-			/* scatterlist */
-			char slin[SIZE_4K];
-			char slout[SIZE_4K];
-			/* coprocessor status/parameter block */
-			struct nx_csbcpb csbcpb;
-		};
-		/* software working memory */
-		struct sw842_fifo swfifo; /* software decompression fifo */
-	};
-};
-
 static int nx842_validate_result(struct device *dev,
 	struct cop_status_block *csb)
 {
@@ -291,8 +283,8 @@ static int nx842_validate_result(struct device *dev,
  * compressed data.  If there is an error then @outlen will be 0 and an
  * error will be specified by the return code from this function.
  *
- * @in: Pointer to input buffer, must be page aligned
- * @inlen: Length of input buffer, must be PAGE_SIZE
+ * @in: Pointer to input buffer
+ * @inlen: Length of input buffer
  * @out: Pointer to output buffer
  * @outlen: Length of output buffer
  * @wrkmem: ptr to buffer for working memory, size determined by
@@ -302,7 +294,6 @@ static int nx842_validate_result(struct device *dev,
  *   0		Success, output of length @outlen stored in the buffer at @out
  *   -ENOMEM	Unable to allocate internal buffers
  *   -ENOSPC	Output buffer is to small
- *   -EMSGSIZE	XXX Difficult to describe this limitation
  *   -EIO	Internal error
  *   -ENODEV	Hardware unavailable
  */
@@ -310,29 +301,26 @@ static int nx842_pseries_compress(const unsigned char *in, unsigned int inlen,
 				  unsigned char *out, unsigned int *outlen,
 				  void *wmem)
 {
-	struct nx842_header *hdr;
 	struct nx842_devdata *local_devdata;
 	struct device *dev = NULL;
 	struct nx842_workmem *workmem;
 	struct nx842_scatterlist slin, slout;
 	struct nx_csbcpb *csbcpb;
-	int ret = 0, max_sync_size, i, bytesleft, size, hdrsize;
-	unsigned long inbuf, outbuf, padding;
+	int ret = 0, max_sync_size;
+	unsigned long inbuf, outbuf;
 	struct vio_pfo_op op = {
 		.done = NULL,
 		.handle = 0,
 		.timeout = 0,
 	};
-	unsigned long start_time = get_tb();
+	unsigned long start = get_tb();
 
-	/*
-	 * Make sure input buffer is 64k page aligned.  This is assumed since
-	 * this driver is designed for page compression only (for now).  This
-	 * is very nice since we can now use direct DDE(s) for the input and
-	 * the alignment is guaranteed.
-	*/
 	inbuf = (unsigned long)in;
-	if (!IS_ALIGNED(inbuf, PAGE_SIZE) || inlen != PAGE_SIZE)
+	if (check_constraints(inbuf, &inlen, true))
+		return -EINVAL;
+
+	outbuf = (unsigned long)out;
+	if (check_constraints(outbuf, outlen, false))
 		return -EINVAL;
 
 	rcu_read_lock();
@@ -344,16 +332,8 @@ static int nx842_pseries_compress(const unsigned char *in, unsigned int inlen,
 	max_sync_size = local_devdata->max_sync_size;
 	dev = local_devdata->dev;
 
-	/* Create the header */
-	hdr = (struct nx842_header *)out;
-	hdr->blocks_nr = PAGE_SIZE / max_sync_size;
-	hdrsize = nx842_header_size(hdr);
-	outbuf = (unsigned long)out + hdrsize;
-	bytesleft = *outlen - hdrsize;
-
 	/* Init scatterlist */
-	workmem = (struct nx842_workmem *)ALIGN((unsigned long)wmem,
-		NX842_HW_PAGE_SIZE);
+	workmem = PTR_ALIGN(wmem, WORKMEM_ALIGN);
 	slin.entries = (struct nx842_slentry *)workmem->slin;
 	slout.entries = (struct nx842_slentry *)workmem->slout;
 
@@ -364,105 +344,48 @@ static int nx842_pseries_compress(const unsigned char *in, unsigned int inlen,
 	op.csbcpb = nx842_get_pa(csbcpb);
 	op.out = nx842_get_pa(slout.entries);
 
-	for (i = 0; i < hdr->blocks_nr; i++) {
-		/*
-		 * Aligning the output blocks to 128 bytes does waste space,
-		 * but it prevents the need for bounce buffers and memory
-		 * copies.  It also simplifies the code a lot.  In the worst
-		 * case (64k page, 4k max_sync_size), you lose up to
-		 * (128*16)/64k = ~3% the compression factor. For 64k
-		 * max_sync_size, the loss would be at most 128/64k = ~0.2%.
-		 */
-		padding = ALIGN(outbuf, IO_BUFFER_ALIGN) - outbuf;
-		outbuf += padding;
-		bytesleft -= padding;
-		if (i == 0)
-			/* save offset into first block in header */
-			hdr->offset = padding + hdrsize;
-
-		if (bytesleft <= 0) {
-			ret = -ENOSPC;
-			goto unlock;
-		}
-
-		/*
-		 * NOTE: If the default max_sync_size is changed from 4k
-		 * to 64k, remove the "likely" case below, since a
-		 * scatterlist will always be needed.
-		 */
-		if (likely(max_sync_size == NX842_HW_PAGE_SIZE)) {
-			/* Create direct DDE */
-			op.in = nx842_get_pa((void *)inbuf);
-			op.inlen = max_sync_size;
-
-		} else {
-			/* Create indirect DDE (scatterlist) */
-			nx842_build_scatterlist(inbuf, max_sync_size, &slin);
-			op.in = nx842_get_pa(slin.entries);
-			op.inlen = -nx842_get_scatterlist_size(&slin);
-		}
+	if ((inbuf & NX842_HW_PAGE_MASK) ==
+	    ((inbuf + inlen - 1) & NX842_HW_PAGE_MASK)) {
+		/* Create direct DDE */
+		op.in = nx842_get_pa((void *)inbuf);
+		op.inlen = inlen;
+	} else {
+		/* Create indirect DDE (scatterlist) */
+		nx842_build_scatterlist(inbuf, inlen, &slin);
+		op.in = nx842_get_pa(slin.entries);
+		op.inlen = -nx842_get_scatterlist_size(&slin);
+	}
 
-		/*
-		 * If max_sync_size != NX842_HW_PAGE_SIZE, an indirect
-		 * DDE is required for the outbuf.
-		 * If max_sync_size == NX842_HW_PAGE_SIZE, outbuf must
-		 * also be page aligned (1 in 128/4k=32 chance) in order
-		 * to use a direct DDE.
-		 * This is unlikely, just use an indirect DDE always.
-		 */
-		nx842_build_scatterlist(outbuf,
-			min(bytesleft, max_sync_size), &slout);
-		/* op.out set before loop */
+	if ((outbuf & NX842_HW_PAGE_MASK) ==
+	    ((outbuf + *outlen - 1) & NX842_HW_PAGE_MASK)) {
+		/* Create direct DDE */
+		op.out = nx842_get_pa((void *)outbuf);
+		op.outlen = *outlen;
+	} else {
+		/* Create indirect DDE (scatterlist) */
+		nx842_build_scatterlist(outbuf, *outlen, &slout);
+		op.out = nx842_get_pa(slout.entries);
 		op.outlen = -nx842_get_scatterlist_size(&slout);
+	}
 
-		/* Send request to pHyp */
-		ret = vio_h_cop_sync(local_devdata->vdev, &op);
-
-		/* Check for pHyp error */
-		if (ret) {
-			dev_dbg(dev, "%s: vio_h_cop_sync error (ret=%d, hret=%ld)\n",
-				__func__, ret, op.hcall_err);
-			ret = -EIO;
-			goto unlock;
-		}
-
-		/* Check for hardware error */
-		ret = nx842_validate_result(dev, &csbcpb->csb);
-		if (ret && ret != -ENOSPC)
-			goto unlock;
-
-		/* Handle incompressible data */
-		if (unlikely(ret == -ENOSPC)) {
-			if (bytesleft < max_sync_size) {
-				/*
-				 * Not enough space left in the output buffer
-				 * to store uncompressed block
-				 */
-				goto unlock;
-			} else {
-				/* Store incompressible block */
-				memcpy((void *)outbuf, (void *)inbuf,
-					max_sync_size);
-				hdr->sizes[i] = -max_sync_size;
-				outbuf += max_sync_size;
-				bytesleft -= max_sync_size;
-				/* Reset ret, incompressible data handled */
-				ret = 0;
-			}
-		} else {
-			/* Normal case, compression was successful */
-			size = csbcpb->csb.processed_byte_count;
-			dev_dbg(dev, "%s: processed_bytes=%d\n",
-				__func__, size);
-			hdr->sizes[i] = size;
-			outbuf += size;
-			bytesleft -= size;
-		}
+	/* Send request to pHyp */
+	ret = vio_h_cop_sync(local_devdata->vdev, &op);
 
-		inbuf += max_sync_size;
+	/* Check for pHyp error */
+	if (ret) {
+		dev_dbg(dev, "%s: vio_h_cop_sync error (ret=%d, hret=%ld)\n",
+			__func__, ret, op.hcall_err);
+		ret = -EIO;
+		goto unlock;
 	}
 
-	*outlen = (unsigned int)(outbuf - (unsigned long)out);
+	/* Check for hardware error */
+	ret = nx842_validate_result(dev, &csbcpb->csb);
+	if (ret)
+		goto unlock;
+
+	*outlen = csbcpb->csb.processed_byte_count;
+	dev_dbg(dev, "%s: processed_bytes=%d\n", __func__, *outlen);
 
 unlock:
 	if (ret)
@@ -470,15 +393,12 @@ unlock:
 	else {
 		nx842_inc_comp_complete(local_devdata);
 		ibm_nx842_incr_hist(local_devdata->counters->comp_times,
-			(get_tb() - start_time) / tb_ticks_per_usec);
+			(get_tb() - start) / tb_ticks_per_usec);
 	}
 	rcu_read_unlock();
 	return ret;
 }
 
-static int sw842_decompress(const unsigned char *, int, unsigned char *, int *,
-			const void *);
-
 /**
  * nx842_pseries_decompress - Decompress data using the 842 algorithm
  *
@@ -490,11 +410,10 @@ static int sw842_decompress(const unsigned char *, int, unsigned char *, int *,
  * If there is an error then @outlen will be 0 and an error will be
  * specified by the return code from this function.
  *
- * @in: Pointer to input buffer, will use bounce buffer if not 128 byte
- *      aligned
+ * @in: Pointer to input buffer
  * @inlen: Length of input buffer
- * @out: Pointer to output buffer, must be page aligned
- * @outlen: Length of output buffer, must be PAGE_SIZE
+ * @out: Pointer to output buffer
+ * @outlen: Length of output buffer
  * @wrkmem: ptr to buffer for working memory, size determined by
  *          NX842_MEM_COMPRESS
  *
@@ -510,43 +429,39 @@ static int nx842_pseries_decompress(const unsigned char *in, unsigned int inlen,
 				    unsigned char *out, unsigned int *outlen,
 				    void *wmem)
 {
-	struct nx842_header *hdr;
 	struct nx842_devdata *local_devdata;
 	struct device *dev = NULL;
 	struct nx842_workmem *workmem;
 	struct nx842_scatterlist slin, slout;
 	struct nx_csbcpb *csbcpb;
-	int ret = 0, i, size, max_sync_size;
+	int ret = 0, max_sync_size;
 	unsigned long inbuf, outbuf;
 	struct vio_pfo_op op = {
 		.done = NULL,
 		.handle = 0,
 		.timeout = 0,
 	};
-	unsigned long start_time = get_tb();
+	unsigned long start = get_tb();
 
 	/* Ensure page alignment and size */
+	inbuf = (unsigned long)in;
+	if (check_constraints(inbuf, &inlen, true))
+		return -EINVAL;
+
 	outbuf = (unsigned long)out;
-	if (!IS_ALIGNED(outbuf, PAGE_SIZE) || *outlen != PAGE_SIZE)
+	if (check_constraints(outbuf, outlen, false))
 		return -EINVAL;
 
 	rcu_read_lock();
 	local_devdata = rcu_dereference(devdata);
-	if (local_devdata)
-		dev = local_devdata->dev;
-
-	/* Get header */
-	hdr = (struct nx842_header *)in;
-
-	workmem = (struct nx842_workmem *)ALIGN((unsigned long)wmem,
-		NX842_HW_PAGE_SIZE);
-
-	inbuf = (unsigned long)in + hdr->offset;
-	if (likely(!IS_ALIGNED(inbuf, IO_BUFFER_ALIGN))) {
-		/* Copy block(s) into bounce buffer for alignment */
-		memcpy(workmem->bounce, in + hdr->offset, inlen - hdr->offset);
-		inbuf = (unsigned long)workmem->bounce;
+	if (!local_devdata || !local_devdata->dev) {
+		rcu_read_unlock();
+		return -ENODEV;
 	}
+	max_sync_size = local_devdata->max_sync_size;
+	dev = local_devdata->dev;
+
+	workmem = PTR_ALIGN(wmem, WORKMEM_ALIGN);
 
 	/* Init scatterlist */
 	slin.entries = (struct nx842_slentry *)workmem->slin;
@@ -558,119 +473,55 @@ static int nx842_pseries_decompress(const unsigned char *in, unsigned int inlen,
 	memset(csbcpb, 0, sizeof(*csbcpb));
 	op.csbcpb = nx842_get_pa(csbcpb);
 
-	/*
-	 * max_sync_size may have changed since compression,
-	 * so we can't read it from the device info. We need
-	 * to derive it from hdr->blocks_nr.
-	 */
-	max_sync_size = PAGE_SIZE / hdr->blocks_nr;
-
-	for (i = 0; i < hdr->blocks_nr; i++) {
-		/* Skip padding */
-		inbuf = ALIGN(inbuf, IO_BUFFER_ALIGN);
-
-		if (hdr->sizes[i] < 0) {
-			/* Negative sizes indicate uncompressed data blocks */
-			size = abs(hdr->sizes[i]);
-			memcpy((void *)outbuf, (void *)inbuf, size);
-			outbuf += size;
-			inbuf += size;
-			continue;
-		}
-
-		if (!dev)
-			goto sw;
-
-		/*
-		 * The better the compression, the more likely the "likely"
-		 * case becomes.
-		 */
-		if (likely((inbuf & NX842_HW_PAGE_MASK) ==
-			((inbuf + hdr->sizes[i] - 1) & NX842_HW_PAGE_MASK))) {
-			/* Create direct DDE */
-			op.in = nx842_get_pa((void *)inbuf);
-			op.inlen = hdr->sizes[i];
-		} else {
-			/* Create indirect DDE (scatterlist) */
-			nx842_build_scatterlist(inbuf, hdr->sizes[i] , &slin);
-			op.in = nx842_get_pa(slin.entries);
-			op.inlen = -nx842_get_scatterlist_size(&slin);
-		}
-
-		/*
-		 * NOTE: If the default max_sync_size is changed from 4k
-		 * to 64k, remove the "likely" case below, since a
-		 * scatterlist will always be needed.
-		 */
-		if (likely(max_sync_size == NX842_HW_PAGE_SIZE)) {
-			/* Create direct DDE */
-			op.out = nx842_get_pa((void *)outbuf);
-			op.outlen = max_sync_size;
-		} else {
-			/* Create indirect DDE (scatterlist) */
-			nx842_build_scatterlist(outbuf, max_sync_size, &slout);
-			op.out = nx842_get_pa(slout.entries);
-			op.outlen = -nx842_get_scatterlist_size(&slout);
-		}
-
-		/* Send request to pHyp */
-		ret = vio_h_cop_sync(local_devdata->vdev, &op);
-
-		/* Check for pHyp error */
-		if (ret) {
-			dev_dbg(dev, "%s: vio_h_cop_sync error (ret=%d, hret=%ld)\n",
-				__func__, ret, op.hcall_err);
-			dev = NULL;
-			goto sw;
-		}
+	if ((inbuf & NX842_HW_PAGE_MASK) ==
+	    ((inbuf + inlen - 1) & NX842_HW_PAGE_MASK)) {
+		/* Create direct DDE */
+		op.in = nx842_get_pa((void *)inbuf);
+		op.inlen = inlen;
+	} else {
+		/* Create indirect DDE (scatterlist) */
+		nx842_build_scatterlist(inbuf, inlen, &slin);
+		op.in = nx842_get_pa(slin.entries);
+		op.inlen = -nx842_get_scatterlist_size(&slin);
+	}
 
-		/* Check for hardware error */
-		ret = nx842_validate_result(dev, &csbcpb->csb);
-		if (ret) {
-			dev = NULL;
-			goto sw;
-		}
+	if ((outbuf & NX842_HW_PAGE_MASK) ==
+	    ((outbuf + *outlen - 1) & NX842_HW_PAGE_MASK)) {
+		/* Create direct DDE */
+		op.out = nx842_get_pa((void *)outbuf);
+		op.outlen = *outlen;
+	} else {
+		/* Create indirect DDE (scatterlist) */
+		nx842_build_scatterlist(outbuf, *outlen, &slout);
+		op.out = nx842_get_pa(slout.entries);
+		op.outlen = -nx842_get_scatterlist_size(&slout);
+	}
 
-		/* HW decompression success */
-		inbuf += hdr->sizes[i];
-		outbuf += csbcpb->csb.processed_byte_count;
-		continue;
-
-sw:
-		/* software decompression */
-		size = max_sync_size;
-		ret = sw842_decompress(
-			(unsigned char *)inbuf, hdr->sizes[i],
-			(unsigned char *)outbuf, &size, wmem);
-		if (ret)
-			pr_debug("%s: sw842_decompress failed with %d\n",
-				__func__, ret);
-
-		if (ret) {
-			if (ret != -ENOSPC && ret != -EINVAL &&
-					ret != -EMSGSIZE)
-				ret = -EIO;
-			goto unlock;
-		}
+	/* Send request to pHyp */
+	ret = vio_h_cop_sync(local_devdata->vdev, &op);
 
-		/* SW decompression success */
-		inbuf += hdr->sizes[i];
-		outbuf += size;
+	/* Check for pHyp error */
+	if (ret) {
+		dev_dbg(dev, "%s: vio_h_cop_sync error (ret=%d, hret=%ld)\n",
+			__func__, ret, op.hcall_err);
+		goto unlock;
 	}
 
-	*outlen = (unsigned int)(outbuf - (unsigned long)out);
+	/* Check for hardware error */
+	ret = nx842_validate_result(dev, &csbcpb->csb);
+	if (ret)
+		goto unlock;
+
+	*outlen = csbcpb->csb.processed_byte_count;
 
 unlock:
 	if (ret)
 		/* decompress fail */
 		nx842_inc_decomp_failed(local_devdata);
 	else {
-		if (!dev)
-			/* software decompress */
-			nx842_inc_swdecomp(local_devdata);
 		nx842_inc_decomp_complete(local_devdata);
 		ibm_nx842_incr_hist(local_devdata->counters->decomp_times,
-			(get_tb() - start_time) / tb_ticks_per_usec);
+			(get_tb() - start) / tb_ticks_per_usec);
 	}
 
 	rcu_read_unlock();
@@ -829,9 +680,9 @@ static int nx842_OF_upd_maxsyncop(struct nx842_devdata *devdata,
 					maxsynccop->decomp_data_limit);
 
 	devdata->max_sync_size = min_t(unsigned int, devdata->max_sync_size,
-					SIZE_64K);
+					65536);
 
-	if (devdata->max_sync_size < SIZE_4K) {
+	if (devdata->max_sync_size < 4096) {
 		dev_err(devdata->dev, "%s: hardware max data size (%u) is "
 				"less than the driver minimum, unable to use "
 				"the hardware device\n",
@@ -1220,17 +1071,17 @@ static int __exit nx842_remove(struct vio_dev *viodev)
 	return 0;
 }
 
-static struct vio_device_id nx842_driver_ids[] = {
+static struct vio_device_id nx842_vio_driver_ids[] = {
 	{NX842_PSERIES_COMPAT_NAME "-v1", NX842_PSERIES_COMPAT_NAME},
 	{"", ""},
 };
 
-static struct vio_driver nx842_driver = {
+static struct vio_driver nx842_vio_driver = {
 	.name = MODULE_NAME,
 	.probe = nx842_probe,
 	.remove = __exit_p(nx842_remove),
 	.get_desired_dma = nx842_get_desired_dma,
-	.id_table = nx842_driver_ids,
+	.id_table = nx842_vio_driver_ids,
 };
 
 static int __init nx842_init(void)
@@ -1249,7 +1100,7 @@ static int __init nx842_init(void)
 	new_devdata->status = UNAVAILABLE;
 	RCU_INIT_POINTER(devdata, new_devdata);
 
-	return vio_register_driver(&nx842_driver);
+	return vio_register_driver(&nx842_vio_driver);
 }
 
 module_init(nx842_init);
@@ -1266,336 +1117,12 @@ static void __exit nx842_exit(void)
 	RCU_INIT_POINTER(devdata, NULL);
 	spin_unlock_irqrestore(&devdata_mutex, flags);
 	synchronize_rcu();
-	if (old_devdata)
+	if (old_devdata && old_devdata->dev)
 		dev_set_drvdata(old_devdata->dev, NULL);
 	kfree(old_devdata);
 	nx842_unregister_driver(&nx842_pseries_driver);
-	vio_unregister_driver(&nx842_driver);
+	vio_unregister_driver(&nx842_vio_driver);
 }
 
 module_exit(nx842_exit);
 
-/*********************************
- * 842 software decompressor
-*********************************/
-typedef int (*sw842_template_op)(const char **, int *, unsigned char **,
-						struct sw842_fifo *);
-
-static int sw842_data8(const char **, int *, unsigned char **,
-						struct sw842_fifo *);
-static int sw842_data4(const char **, int *, unsigned char **,
-						struct sw842_fifo *);
-static int sw842_data2(const char **, int *, unsigned char **,
-						struct sw842_fifo *);
-static int sw842_ptr8(const char **, int *, unsigned char **,
-						struct sw842_fifo *);
-static int sw842_ptr4(const char **, int *, unsigned char **,
-						struct sw842_fifo *);
-static int sw842_ptr2(const char **, int *, unsigned char **,
-						struct sw842_fifo *);
-
-/* special templates */
-#define SW842_TMPL_REPEAT 0x1B
-#define SW842_TMPL_ZEROS 0x1C
-#define SW842_TMPL_EOF 0x1E
-
-static sw842_template_op sw842_tmpl_ops[26][4] = {
-	{ sw842_data8, NULL}, /* 0 (00000) */
-	{ sw842_data4, sw842_data2, sw842_ptr2,  NULL},
-	{ sw842_data4, sw842_ptr2,  sw842_data2, NULL},
-	{ sw842_data4, sw842_ptr2,  sw842_ptr2,  NULL},
-	{ sw842_data4, sw842_ptr4,  NULL},
-	{ sw842_data2, sw842_ptr2,  sw842_data4, NULL},
-	{ sw842_data2, sw842_ptr2,  sw842_data2, sw842_ptr2},
-	{ sw842_data2, sw842_ptr2,  sw842_ptr2,  sw842_data2},
-	{ sw842_data2, sw842_ptr2,  sw842_ptr2,  sw842_ptr2,},
-	{ sw842_data2, sw842_ptr2,  sw842_ptr4,  NULL},
-	{ sw842_ptr2,  sw842_data2, sw842_data4, NULL}, /* 10 (01010) */
-	{ sw842_ptr2,  sw842_data4, sw842_ptr2,  NULL},
-	{ sw842_ptr2,  sw842_data2, sw842_ptr2,  sw842_data2},
-	{ sw842_ptr2,  sw842_data2, sw842_ptr2,  sw842_ptr2},
-	{ sw842_ptr2,  sw842_data2, sw842_ptr4,  NULL},
-	{ sw842_ptr2,  sw842_ptr2,  sw842_data4, NULL},
-	{ sw842_ptr2,  sw842_ptr2,  sw842_data2, sw842_ptr2},
-	{ sw842_ptr2,  sw842_ptr2,  sw842_ptr2,  sw842_data2},
-	{ sw842_ptr2,  sw842_ptr2,  sw842_ptr2,  sw842_ptr2},
-	{ sw842_ptr2,  sw842_ptr2,  sw842_ptr4,  NULL},
-	{ sw842_ptr4,  sw842_data4, NULL}, /* 20 (10100) */
-	{ sw842_ptr4,  sw842_data2, sw842_ptr2,  NULL},
-	{ sw842_ptr4,  sw842_ptr2,  sw842_data2, NULL},
-	{ sw842_ptr4,  sw842_ptr2,  sw842_ptr2,  NULL},
-	{ sw842_ptr4,  sw842_ptr4,  NULL},
-	{ sw842_ptr8,  NULL}
-};
-
-/* Software decompress helpers */
-
-static uint8_t sw842_get_byte(const char *buf, int bit)
-{
-	uint8_t tmpl;
-	uint16_t tmp;
-	tmp = htons(*(uint16_t *)(buf));
-	tmp = (uint16_t)(tmp << bit);
-	tmp = ntohs(tmp);
-	memcpy(&tmpl, &tmp, 1);
-	return tmpl;
-}
-
-static uint8_t sw842_get_template(const char **buf, int *bit)
-{
-	uint8_t byte;
-	byte = sw842_get_byte(*buf, *bit);
-	byte = byte >> 3;
-	byte &= 0x1F;
-	*buf += (*bit + 5) / 8;
-	*bit = (*bit + 5) % 8;
-	return byte;
-}
-
-/* repeat_count happens to be 5-bit too (like the template) */
-static uint8_t sw842_get_repeat_count(const char **buf, int *bit)
-{
-	uint8_t byte;
-	byte = sw842_get_byte(*buf, *bit);
-	byte = byte >> 2;
-	byte &= 0x3F;
-	*buf += (*bit + 6) / 8;
-	*bit = (*bit + 6) % 8;
-	return byte;
-}
-
-static uint8_t sw842_get_ptr2(const char **buf, int *bit)
-{
-	uint8_t ptr;
-	ptr = sw842_get_byte(*buf, *bit);
-	(*buf)++;
-	return ptr;
-}
-
-static uint16_t sw842_get_ptr4(const char **buf, int *bit,
-		struct sw842_fifo *fifo)
-{
-	uint16_t ptr;
-	ptr = htons(*(uint16_t *)(*buf));
-	ptr = (uint16_t)(ptr << *bit);
-	ptr = ptr >> 7;
-	ptr &= 0x01FF;
-	*buf += (*bit + 9) / 8;
-	*bit = (*bit + 9) % 8;
-	return ptr;
-}
-
-static uint8_t sw842_get_ptr8(const char **buf, int *bit,
-		struct sw842_fifo *fifo)
-{
-	return sw842_get_ptr2(buf, bit);
-}
-
-/* Software decompress template ops */
-
-static int sw842_data8(const char **inbuf, int *inbit,
-		unsigned char **outbuf, struct sw842_fifo *fifo)
-{
-	int ret;
-
-	ret = sw842_data4(inbuf, inbit, outbuf, fifo);
-	if (ret)
-		return ret;
-	ret = sw842_data4(inbuf, inbit, outbuf, fifo);
-	return ret;
-}
-
-static int sw842_data4(const char **inbuf, int *inbit,
-		unsigned char **outbuf, struct sw842_fifo *fifo)
-{
-	int ret;
-
-	ret = sw842_data2(inbuf, inbit, outbuf, fifo);
-	if (ret)
-		return ret;
-	ret = sw842_data2(inbuf, inbit, outbuf, fifo);
-	return ret;
-}
-
-static int sw842_data2(const char **inbuf, int *inbit,
-		unsigned char **outbuf, struct sw842_fifo *fifo)
-{
-	**outbuf = sw842_get_byte(*inbuf, *inbit);
-	(*inbuf)++;
-	(*outbuf)++;
-	**outbuf = sw842_get_byte(*inbuf, *inbit);
-	(*inbuf)++;
-	(*outbuf)++;
-	return 0;
-}
-
-static int sw842_ptr8(const char **inbuf, int *inbit,
-		unsigned char **outbuf, struct sw842_fifo *fifo)
-{
-	uint8_t ptr;
-	ptr = sw842_get_ptr8(inbuf, inbit, fifo);
-	if (!fifo->f84_full && (ptr >= fifo->f8_count))
-		return 1;
-	memcpy(*outbuf, fifo->f8[ptr], 8);
-	*outbuf += 8;
-	return 0;
-}
-
-static int sw842_ptr4(const char **inbuf, int *inbit,
-		unsigned char **outbuf, struct sw842_fifo *fifo)
-{
-	uint16_t ptr;
-	ptr = sw842_get_ptr4(inbuf, inbit, fifo);
-	if (!fifo->f84_full && (ptr >= fifo->f4_count))
-		return 1;
-	memcpy(*outbuf, fifo->f4[ptr], 4);
-	*outbuf += 4;
-	return 0;
-}
-
-static int sw842_ptr2(const char **inbuf, int *inbit,
-		unsigned char **outbuf, struct sw842_fifo *fifo)
-{
-	uint8_t ptr;
-	ptr = sw842_get_ptr2(inbuf, inbit);
-	if (!fifo->f2_full && (ptr >= fifo->f2_count))
-		return 1;
-	memcpy(*outbuf, fifo->f2[ptr], 2);
-	*outbuf += 2;
-	return 0;
-}
-
-static void sw842_copy_to_fifo(const char *buf, struct sw842_fifo *fifo)
-{
-	unsigned char initial_f2count = fifo->f2_count;
-
-	memcpy(fifo->f8[fifo->f8_count], buf, 8);
-	fifo->f4_count += 2;
-	fifo->f8_count += 1;
-
-	if (!fifo->f84_full && fifo->f4_count >= 512) {
-		fifo->f84_full = 1;
-		fifo->f4_count /= 512;
-	}
-
-	memcpy(fifo->f2[fifo->f2_count++], buf, 2);
-	memcpy(fifo->f2[fifo->f2_count++], buf + 2, 2);
-	memcpy(fifo->f2[fifo->f2_count++], buf + 4, 2);
-	memcpy(fifo->f2[fifo->f2_count++], buf + 6, 2);
-	if (fifo->f2_count < initial_f2count)
-		fifo->f2_full = 1;
-}
-
-static int sw842_decompress(const unsigned char *src, int srclen,
-			unsigned char *dst, int *destlen,
-			const void *wrkmem)
-{
-	uint8_t tmpl;
-	const char *inbuf;
-	int inbit = 0;
-	unsigned char *outbuf, *outbuf_end, *origbuf, *prevbuf;
-	const char *inbuf_end;
-	sw842_template_op op;
-	int opindex;
-	int i, repeat_count;
-	struct sw842_fifo *fifo;
-	int ret = 0;
-
-	fifo = &((struct nx842_workmem *)(wrkmem))->swfifo;
-	memset(fifo, 0, sizeof(*fifo));
-
-	origbuf = NULL;
-	inbuf = src;
-	inbuf_end = src + srclen;
-	outbuf = dst;
-	outbuf_end = dst + *destlen;
-
-	while ((tmpl = sw842_get_template(&inbuf, &inbit)) != SW842_TMPL_EOF) {
-		if (inbuf >= inbuf_end) {
-			ret = -EINVAL;
-			goto out;
-		}
-
-		opindex = 0;
-		prevbuf = origbuf;
-		origbuf = outbuf;
-		switch (tmpl) {
-		case SW842_TMPL_REPEAT:
-			if (prevbuf == NULL) {
-				ret = -EINVAL;
-				goto out;
-			}
-
-			repeat_count = sw842_get_repeat_count(&inbuf,
-								&inbit) + 1;
-
-			/* Did the repeat count advance past the end of input */
-			if (inbuf > inbuf_end) {
-				ret = -EINVAL;
-				goto out;
-			}
-
-			for (i = 0; i < repeat_count; i++) {
-				/* Would this overflow the output buffer */
-				if ((outbuf + 8) > outbuf_end) {
-					ret = -ENOSPC;
-					goto out;
-				}
-
-				memcpy(outbuf, prevbuf, 8);
-				sw842_copy_to_fifo(outbuf, fifo);
-				outbuf += 8;
-			}
-			break;
-
-		case SW842_TMPL_ZEROS:
-			/* Would this overflow the output buffer */
-			if ((outbuf + 8) > outbuf_end) {
-				ret = -ENOSPC;
-				goto out;
-			}
-
-			memset(outbuf, 0, 8);
-			sw842_copy_to_fifo(outbuf, fifo);
-			outbuf += 8;
-			break;
-
-		default:
-			if (tmpl > 25) {
-				ret = -EINVAL;
-				goto out;
-			}
-
-			/* Does this go past the end of the input buffer */
-			if ((inbuf + 2) > inbuf_end) {
-				ret = -EINVAL;
-				goto out;
-			}
-
-			/* Would this overflow the output buffer */
-			if ((outbuf + 8) > outbuf_end) {
-				ret = -ENOSPC;
-				goto out;
-			}
-
-			while (opindex < 4 &&
-				(op = sw842_tmpl_ops[tmpl][opindex++])
-					!= NULL) {
-				ret = (*op)(&inbuf, &inbit, &outbuf, fifo);
-				if (ret) {
-					ret = -EINVAL;
-					goto out;
-				}
-				sw842_copy_to_fifo(origbuf, fifo);
-			}
-		}
-	}
-
-out:
-	if (!ret)
-		*destlen = (unsigned int)(outbuf - dst);
-	else
-		*destlen = 0;
-
-	return ret;
-}
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 10/10] drivers/crypto/nx: add hardware 842 crypto comp alg
  2015-05-07 17:49   ` [PATCHv3 00/10] add 842 hw compression for PowerNV platform Dan Streetman
                       ` (8 preceding siblings ...)
  2015-05-07 17:49     ` [PATCH 09/10] drivers/crypto/nx: simplify pSeries nx842 driver Dan Streetman
@ 2015-05-07 17:49     ` Dan Streetman
  2015-05-11  7:21     ` [PATCHv3 00/10] add 842 hw compression for PowerNV platform Herbert Xu
  10 siblings, 0 replies; 46+ messages in thread
From: Dan Streetman @ 2015-05-07 17:49 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras
  Cc: Seth Jennings, Robert Jennings, linux-crypto, linuxppc-dev,
	linux-kernel, Dan Streetman

Add crypto compression alg for 842 hardware compression and decompression,
using the alg name "842" and driver_name "842-nx".

This uses only the PowerPC coprocessor hardware for 842 compression.  It
also uses the hardware for decompression, but if the hardware fails it will
fall back to the 842 software decompression library, so that decompression
never fails (for valid 842 compressed buffers).  A header must be used in
most cases, due to the hardware's restrictions on the buffers being
specifically aligned and sized.

Due to the header this driver adds, compressed buffers it creates cannot be
directly passed to the 842 software library for decompression.  However,
compressed buffers created by the software 842 library can be passed to
this driver for hardware 842 decompression (with the exception of buffers
containing the "short data" template, as lib/842/842.h explains).

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
---
 drivers/crypto/nx/Kconfig         |  10 +
 drivers/crypto/nx/Makefile        |   2 +
 drivers/crypto/nx/nx-842-crypto.c | 585 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 597 insertions(+)
 create mode 100644 drivers/crypto/nx/nx-842-crypto.c

diff --git a/drivers/crypto/nx/Kconfig b/drivers/crypto/nx/Kconfig
index ee9e259..3e621ad 100644
--- a/drivers/crypto/nx/Kconfig
+++ b/drivers/crypto/nx/Kconfig
@@ -50,4 +50,14 @@ config CRYPTO_DEV_NX_COMPRESS_POWERNV
 	  algorithm.  This supports NX hardware on the PowerNV platform.
 	  If you choose 'M' here, this module will be called nx_compress_powernv.
 
+config CRYPTO_DEV_NX_COMPRESS_CRYPTO
+	tristate "Compression acceleration cryptographic interface"
+	select CRYPTO_ALGAPI
+	select 842_DECOMPRESS
+	default y
+	help
+	  Support for PowerPC Nest (NX) accelerators using the cryptographic
+	  API.  If you choose 'M' here, this module will be called
+	  nx_compress_crypto.
+
 endif
diff --git a/drivers/crypto/nx/Makefile b/drivers/crypto/nx/Makefile
index 6619787..868b5e6 100644
--- a/drivers/crypto/nx/Makefile
+++ b/drivers/crypto/nx/Makefile
@@ -13,6 +13,8 @@ nx-crypto-objs := nx.o \
 obj-$(CONFIG_CRYPTO_DEV_NX_COMPRESS) += nx-compress.o
 obj-$(CONFIG_CRYPTO_DEV_NX_COMPRESS_PSERIES) += nx-compress-pseries.o
 obj-$(CONFIG_CRYPTO_DEV_NX_COMPRESS_POWERNV) += nx-compress-powernv.o
+obj-$(CONFIG_CRYPTO_DEV_NX_COMPRESS_CRYPTO) += nx-compress-crypto.o
 nx-compress-objs := nx-842.o
 nx-compress-pseries-objs := nx-842-pseries.o
 nx-compress-powernv-objs := nx-842-powernv.o
+nx-compress-crypto-objs := nx-842-crypto.o
diff --git a/drivers/crypto/nx/nx-842-crypto.c b/drivers/crypto/nx/nx-842-crypto.c
new file mode 100644
index 0000000..cb177c3
--- /dev/null
+++ b/drivers/crypto/nx/nx-842-crypto.c
@@ -0,0 +1,585 @@
+/*
+ * Cryptographic API for the NX-842 hardware compression.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * Copyright (C) IBM Corporation, 2011-2015
+ *
+ * Original Authors: Robert Jennings <rcj@linux.vnet.ibm.com>
+ *                   Seth Jennings <sjenning@linux.vnet.ibm.com>
+ *
+ * Rewrite: Dan Streetman <ddstreet@ieee.org>
+ *
+ * This is an interface to the NX-842 compression hardware in PowerPC
+ * processors.  Most of the complexity of this drvier is due to the fact that
+ * the NX-842 compression hardware requires the input and output data buffers
+ * to be specifically aligned, to be a specific multiple in length, and within
+ * specific minimum and maximum lengths.  Those restrictions, provided by the
+ * nx-842 driver via nx842_constraints, mean this driver must use bounce
+ * buffers and headers to correct misaligned in or out buffers, and to split
+ * input buffers that are too large.
+ *
+ * This driver will fall back to software decompression if the hardware
+ * decompression fails, so this driver's decompression should never fail as
+ * long as the provided compressed buffer is valid.  Any compressed buffer
+ * created by this driver will have a header (except ones where the input
+ * perfectly matches the constraints); so users of this driver cannot simply
+ * pass a compressed buffer created by this driver over to the 842 software
+ * decompression library.  Instead, users must use this driver to decompress;
+ * if the hardware fails or is unavailable, the compressed buffer will be
+ * parsed and the header removed, and the raw 842 buffer(s) passed to the 842
+ * software decompression library.
+ *
+ * This does not fall back to software compression, however, since the caller
+ * of this function is specifically requesting hardware compression; if the
+ * hardware compression fails, the caller can fall back to software
+ * compression, and the raw 842 compressed buffer that the software compressor
+ * creates can be passed to this driver for hardware decompression; any
+ * buffer without our specific header magic is assumed to be a raw 842 buffer
+ * and passed directly to the hardware.  Note that the software compression
+ * library will produce a compressed buffer that is incompatible with the
+ * hardware decompressor if the original input buffer length is not a multiple
+ * of 8; if such a compressed buffer is passed to this driver for
+ * decompression, the hardware will reject it and this driver will then pass
+ * it over to the software library for decompression.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/crypto.h>
+#include <linux/vmalloc.h>
+#include <linux/nx842.h>
+#include <linux/sw842.h>
+#include <linux/ratelimit.h>
+
+/* The first 5 bits of this magic are 0x1f, which is an invalid 842 5-bit
+ * template (see lib/842/842.h), so this magic number will never appear at
+ * the start of a raw 842 compressed buffer.  That is important, as any buffer
+ * passed to us without this magic is assumed to be a raw 842 compressed
+ * buffer, and passed directly to the hardware to decompress.
+ */
+#define NX842_CRYPTO_MAGIC	(0xf842)
+#define NX842_CRYPTO_GROUP_MAX	(0x20)
+#define NX842_CRYPTO_HEADER_SIZE(g)				\
+	(sizeof(struct nx842_crypto_header) +			\
+	 sizeof(struct nx842_crypto_header_group) * (g))
+#define NX842_CRYPTO_HEADER_MAX_SIZE				\
+	NX842_CRYPTO_HEADER_SIZE(NX842_CRYPTO_GROUP_MAX)
+
+/* bounce buffer size */
+#define BOUNCE_BUFFER_ORDER	(2)
+#define BOUNCE_BUFFER_SIZE					\
+	((unsigned int)(PAGE_SIZE << BOUNCE_BUFFER_ORDER))
+
+/* try longer on comp because we can fallback to sw decomp if hw is busy */
+#define COMP_BUSY_TIMEOUT	(250) /* ms */
+#define DECOMP_BUSY_TIMEOUT	(50) /* ms */
+
+struct nx842_crypto_header_group {
+	__be16 padding;			/* unused bytes at start of group */
+	__be32 compressed_length;	/* compressed bytes in group */
+	__be32 uncompressed_length;	/* bytes after decompression */
+} __packed;
+
+struct nx842_crypto_header {
+	__be16 magic;		/* NX842_CRYPTO_MAGIC */
+	__be16 ignore;		/* decompressed end bytes to ignore */
+	u8 groups;		/* total groups in this header */
+	struct nx842_crypto_header_group group[];
+} __packed;
+
+struct nx842_crypto_param {
+	u8 *in;
+	unsigned int iremain;
+	u8 *out;
+	unsigned int oremain;
+	unsigned int ototal;
+};
+
+static int update_param(struct nx842_crypto_param *p,
+			unsigned int slen, unsigned int dlen)
+{
+	if (p->iremain < slen)
+		return -EOVERFLOW;
+	if (p->oremain < dlen)
+		return -ENOSPC;
+
+	p->in += slen;
+	p->iremain -= slen;
+	p->out += dlen;
+	p->oremain -= dlen;
+	p->ototal += dlen;
+
+	return 0;
+}
+
+struct nx842_crypto_ctx {
+	u8 *wmem;
+	u8 *sbounce, *dbounce;
+
+	struct nx842_crypto_header header;
+	struct nx842_crypto_header_group group[NX842_CRYPTO_GROUP_MAX];
+};
+
+static int nx842_crypto_init(struct crypto_tfm *tfm)
+{
+	struct nx842_crypto_ctx *ctx = crypto_tfm_ctx(tfm);
+
+	ctx->wmem = kmalloc(NX842_MEM_COMPRESS, GFP_KERNEL);
+	ctx->sbounce = (u8 *)__get_free_pages(GFP_KERNEL, BOUNCE_BUFFER_ORDER);
+	ctx->dbounce = (u8 *)__get_free_pages(GFP_KERNEL, BOUNCE_BUFFER_ORDER);
+	if (!ctx->wmem || !ctx->sbounce || !ctx->dbounce) {
+		kfree(ctx->wmem);
+		free_page((unsigned long)ctx->sbounce);
+		free_page((unsigned long)ctx->dbounce);
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static void nx842_crypto_exit(struct crypto_tfm *tfm)
+{
+	struct nx842_crypto_ctx *ctx = crypto_tfm_ctx(tfm);
+
+	kfree(ctx->wmem);
+	free_page((unsigned long)ctx->sbounce);
+	free_page((unsigned long)ctx->dbounce);
+}
+
+static int read_constraints(struct nx842_constraints *c)
+{
+	int ret;
+
+	ret = nx842_constraints(c);
+	if (ret) {
+		pr_err_ratelimited("could not get nx842 constraints : %d\n",
+				   ret);
+		return ret;
+	}
+
+	/* limit maximum, to always have enough bounce buffer to decompress */
+	if (c->maximum > BOUNCE_BUFFER_SIZE) {
+		c->maximum = BOUNCE_BUFFER_SIZE;
+		pr_info_once("limiting nx842 maximum to %x\n", c->maximum);
+	}
+
+	return 0;
+}
+
+static int nx842_crypto_add_header(struct nx842_crypto_header *hdr, u8 *buf)
+{
+	int s = NX842_CRYPTO_HEADER_SIZE(hdr->groups);
+
+	/* compress should have added space for header */
+	if (s > be16_to_cpu(hdr->group[0].padding)) {
+		pr_err("Internal error: no space for header\n");
+		return -EINVAL;
+	}
+
+	memcpy(buf, hdr, s);
+
+	print_hex_dump_debug("header ", DUMP_PREFIX_OFFSET, 16, 1, buf, s, 0);
+
+	return 0;
+}
+
+static int compress(struct nx842_crypto_ctx *ctx,
+		    struct nx842_crypto_param *p,
+		    struct nx842_crypto_header_group *g,
+		    struct nx842_constraints *c,
+		    u16 *ignore,
+		    unsigned int hdrsize)
+{
+	unsigned int slen = p->iremain, dlen = p->oremain, tmplen;
+	unsigned int adj_slen = slen;
+	u8 *src = p->in, *dst = p->out;
+	int ret, dskip = 0;
+	ktime_t timeout;
+
+	if (p->iremain == 0)
+		return -EOVERFLOW;
+
+	if (p->oremain == 0 || hdrsize + c->minimum > dlen)
+		return -ENOSPC;
+
+	if (slen % c->multiple)
+		adj_slen = round_up(slen, c->multiple);
+	if (slen < c->minimum)
+		adj_slen = c->minimum;
+	if (slen > c->maximum)
+		adj_slen = slen = c->maximum;
+	if (adj_slen > slen || (u64)src % c->alignment) {
+		adj_slen = min(adj_slen, BOUNCE_BUFFER_SIZE);
+		slen = min(slen, BOUNCE_BUFFER_SIZE);
+		if (adj_slen > slen)
+			memset(ctx->sbounce + slen, 0, adj_slen - slen);
+		memcpy(ctx->sbounce, src, slen);
+		src = ctx->sbounce;
+		slen = adj_slen;
+		pr_debug("using comp sbounce buffer, len %x\n", slen);
+	}
+
+	dst += hdrsize;
+	dlen -= hdrsize;
+
+	if ((u64)dst % c->alignment) {
+		dskip = (int)(PTR_ALIGN(dst, c->alignment) - dst);
+		dst += dskip;
+		dlen -= dskip;
+	}
+	if (dlen % c->multiple)
+		dlen = round_down(dlen, c->multiple);
+	if (dlen < c->minimum) {
+nospc:
+		dst = ctx->dbounce;
+		dlen = min(p->oremain, BOUNCE_BUFFER_SIZE);
+		dlen = round_down(dlen, c->multiple);
+		dskip = 0;
+		pr_debug("using comp dbounce buffer, len %x\n", dlen);
+	}
+	if (dlen > c->maximum)
+		dlen = c->maximum;
+
+	tmplen = dlen;
+	timeout = ktime_add_ms(ktime_get(), COMP_BUSY_TIMEOUT);
+	do {
+		dlen = tmplen; /* reset dlen, if we're retrying */
+		ret = nx842_compress(src, slen, dst, &dlen, ctx->wmem);
+		/* possibly we should reduce the slen here, instead of
+		 * retrying with the dbounce buffer?
+		 */
+		if (ret == -ENOSPC && dst != ctx->dbounce)
+			goto nospc;
+	} while (ret == -EBUSY && ktime_before(ktime_get(), timeout));
+	if (ret)
+		return ret;
+
+	dskip += hdrsize;
+
+	if (dst == ctx->dbounce)
+		memcpy(p->out + dskip, dst, dlen);
+
+	g->padding = cpu_to_be16(dskip);
+	g->compressed_length = cpu_to_be32(dlen);
+	g->uncompressed_length = cpu_to_be32(slen);
+
+	if (p->iremain < slen) {
+		*ignore = slen - p->iremain;
+		slen = p->iremain;
+	}
+
+	pr_debug("compress slen %x ignore %x dlen %x padding %x\n",
+		 slen, *ignore, dlen, dskip);
+
+	return update_param(p, slen, dskip + dlen);
+}
+
+static int nx842_crypto_compress(struct crypto_tfm *tfm,
+				 const u8 *src, unsigned int slen,
+				 u8 *dst, unsigned int *dlen)
+{
+	struct nx842_crypto_ctx *ctx = crypto_tfm_ctx(tfm);
+	struct nx842_crypto_header *hdr = &ctx->header;
+	struct nx842_crypto_param p;
+	struct nx842_constraints c;
+	unsigned int groups, hdrsize, h;
+	int ret, n;
+	bool add_header;
+	u16 ignore = 0;
+
+	if (!tfm || !src || !slen || !dst || !dlen)
+		return -EINVAL;
+
+	p.in = (u8 *)src;
+	p.iremain = slen;
+	p.out = dst;
+	p.oremain = *dlen;
+	p.ototal = 0;
+
+	*dlen = 0;
+
+	ret = read_constraints(&c);
+	if (ret)
+		return ret;
+
+	groups = min_t(unsigned int, NX842_CRYPTO_GROUP_MAX,
+		       DIV_ROUND_UP(p.iremain, c.maximum));
+	hdrsize = NX842_CRYPTO_HEADER_SIZE(groups);
+
+	/* skip adding header if the buffers meet all constraints */
+	add_header = (p.iremain % c.multiple	||
+		      p.iremain < c.minimum	||
+		      p.iremain > c.maximum	||
+		      (u64)p.in % c.alignment	||
+		      p.oremain % c.multiple	||
+		      p.oremain < c.minimum	||
+		      p.oremain > c.maximum	||
+		      (u64)p.out % c.alignment);
+
+	hdr->magic = cpu_to_be16(NX842_CRYPTO_MAGIC);
+	hdr->groups = 0;
+	hdr->ignore = 0;
+
+	while (p.iremain > 0) {
+		n = hdr->groups++;
+		if (hdr->groups > NX842_CRYPTO_GROUP_MAX)
+			return -ENOSPC;
+
+		/* header goes before first group */
+		h = !n && add_header ? hdrsize : 0;
+
+		if (ignore)
+			pr_warn("interal error, ignore is set %x\n", ignore);
+
+		ret = compress(ctx, &p, &hdr->group[n], &c, &ignore, h);
+		if (ret)
+			return ret;
+	}
+
+	if (!add_header && hdr->groups > 1) {
+		pr_err("Internal error: No header but multiple groups\n");
+		return -EINVAL;
+	}
+
+	/* ignore indicates the input stream needed to be padded */
+	hdr->ignore = cpu_to_be16(ignore);
+	if (ignore)
+		pr_debug("marked %d bytes as ignore\n", ignore);
+
+	if (add_header)
+		ret = nx842_crypto_add_header(hdr, dst);
+	if (ret)
+		return ret;
+
+	*dlen = p.ototal;
+
+	pr_debug("compress total slen %x dlen %x\n", slen, *dlen);
+
+	return 0;
+}
+
+static int decompress(struct nx842_crypto_ctx *ctx,
+		      struct nx842_crypto_param *p,
+		      struct nx842_crypto_header_group *g,
+		      struct nx842_constraints *c,
+		      u16 ignore,
+		      bool usehw)
+{
+	unsigned int slen = be32_to_cpu(g->compressed_length);
+	unsigned int required_len = be32_to_cpu(g->uncompressed_length);
+	unsigned int dlen = p->oremain, tmplen;
+	unsigned int adj_slen = slen;
+	u8 *src = p->in, *dst = p->out;
+	u16 padding = be16_to_cpu(g->padding);
+	int ret, spadding = 0, dpadding = 0;
+	ktime_t timeout;
+
+	if (!slen || !required_len)
+		return -EINVAL;
+
+	if (p->iremain <= 0 || padding + slen > p->iremain)
+		return -EOVERFLOW;
+
+	if (p->oremain <= 0 || required_len - ignore > p->oremain)
+		return -ENOSPC;
+
+	src += padding;
+
+	if (!usehw)
+		goto usesw;
+
+	if (slen % c->multiple)
+		adj_slen = round_up(slen, c->multiple);
+	if (slen < c->minimum)
+		adj_slen = c->minimum;
+	if (slen > c->maximum)
+		goto usesw;
+	if (slen < adj_slen || (u64)src % c->alignment) {
+		/* we can append padding bytes because the 842 format defines
+		 * an "end" template (see lib/842/842_decompress.c) and will
+		 * ignore any bytes following it.
+		 */
+		if (slen < adj_slen)
+			memset(ctx->sbounce + slen, 0, adj_slen - slen);
+		memcpy(ctx->sbounce, src, slen);
+		src = ctx->sbounce;
+		spadding = adj_slen - slen;
+		slen = adj_slen;
+		pr_debug("using decomp sbounce buffer, len %x\n", slen);
+	}
+
+	if (dlen % c->multiple)
+		dlen = round_down(dlen, c->multiple);
+	if (dlen < required_len || (u64)dst % c->alignment) {
+		dst = ctx->dbounce;
+		dlen = min(required_len, BOUNCE_BUFFER_SIZE);
+		pr_debug("using decomp dbounce buffer, len %x\n", dlen);
+	}
+	if (dlen < c->minimum)
+		goto usesw;
+	if (dlen > c->maximum)
+		dlen = c->maximum;
+
+	tmplen = dlen;
+	timeout = ktime_add_ms(ktime_get(), DECOMP_BUSY_TIMEOUT);
+	do {
+		dlen = tmplen; /* reset dlen, if we're retrying */
+		ret = nx842_decompress(src, slen, dst, &dlen, ctx->wmem);
+	} while (ret == -EBUSY && ktime_before(ktime_get(), timeout));
+	if (ret) {
+usesw:
+		/* reset everything, sw doesn't have constraints */
+		src = p->in + padding;
+		slen = be32_to_cpu(g->compressed_length);
+		spadding = 0;
+		dst = p->out;
+		dlen = p->oremain;
+		dpadding = 0;
+		if (dlen < required_len) { /* have ignore bytes */
+			dst = ctx->dbounce;
+			dlen = BOUNCE_BUFFER_SIZE;
+		}
+		pr_info_ratelimited("using software 842 decompression\n");
+		ret = sw842_decompress(src, slen, dst, &dlen);
+	}
+	if (ret)
+		return ret;
+
+	slen -= spadding;
+
+	dlen -= ignore;
+	if (ignore)
+		pr_debug("ignoring last %x bytes\n", ignore);
+
+	if (dst == ctx->dbounce)
+		memcpy(p->out, dst, dlen);
+
+	pr_debug("decompress slen %x padding %x dlen %x ignore %x\n",
+		 slen, padding, dlen, ignore);
+
+	return update_param(p, slen + padding, dlen);
+}
+
+static int nx842_crypto_decompress(struct crypto_tfm *tfm,
+				   const u8 *src, unsigned int slen,
+				   u8 *dst, unsigned int *dlen)
+{
+	struct nx842_crypto_ctx *ctx = crypto_tfm_ctx(tfm);
+	struct nx842_crypto_header *hdr;
+	struct nx842_crypto_param p;
+	struct nx842_constraints c;
+	int n, ret, hdr_len;
+	u16 ignore = 0;
+	bool usehw = true;
+
+	if (!tfm || !src || !slen || !dst || !dlen)
+		return -EINVAL;
+
+	p.in = (u8 *)src;
+	p.iremain = slen;
+	p.out = dst;
+	p.oremain = *dlen;
+	p.ototal = 0;
+
+	*dlen = 0;
+
+	if (read_constraints(&c))
+		usehw = false;
+
+	hdr = (struct nx842_crypto_header *)src;
+
+	/* If it doesn't start with our header magic number, assume it's a raw
+	 * 842 compressed buffer and pass it directly to the hardware driver
+	 */
+	if (be16_to_cpu(hdr->magic) != NX842_CRYPTO_MAGIC) {
+		struct nx842_crypto_header_group g = {
+			.padding =		0,
+			.compressed_length =	cpu_to_be32(p.iremain),
+			.uncompressed_length =	cpu_to_be32(p.oremain),
+		};
+
+		ret = decompress(ctx, &p, &g, &c, 0, usehw);
+		if (ret)
+			return ret;
+
+		*dlen = p.ototal;
+
+		return 0;
+	}
+
+	if (!hdr->groups) {
+		pr_err("header has no groups\n");
+		return -EINVAL;
+	}
+	if (hdr->groups > NX842_CRYPTO_GROUP_MAX) {
+		pr_err("header has too many groups %x, max %x\n",
+		       hdr->groups, NX842_CRYPTO_GROUP_MAX);
+		return -EINVAL;
+	}
+
+	hdr_len = NX842_CRYPTO_HEADER_SIZE(hdr->groups);
+	if (hdr_len > slen)
+		return -EOVERFLOW;
+
+	memcpy(&ctx->header, src, hdr_len);
+	hdr = &ctx->header;
+
+	for (n = 0; n < hdr->groups; n++) {
+		/* ignore applies to last group */
+		if (n + 1 == hdr->groups)
+			ignore = be16_to_cpu(hdr->ignore);
+
+		ret = decompress(ctx, &p, &hdr->group[n], &c, ignore, usehw);
+		if (ret)
+			return ret;
+	}
+
+	*dlen = p.ototal;
+
+	pr_debug("decompress total slen %x dlen %x\n", slen, *dlen);
+
+	return 0;
+}
+
+static struct crypto_alg alg = {
+	.cra_name		= "842",
+	.cra_driver_name	= "842-nx",
+	.cra_priority		= 300,
+	.cra_flags		= CRYPTO_ALG_TYPE_COMPRESS,
+	.cra_ctxsize		= sizeof(struct nx842_crypto_ctx),
+	.cra_module		= THIS_MODULE,
+	.cra_init		= nx842_crypto_init,
+	.cra_exit		= nx842_crypto_exit,
+	.cra_u			= { .compress = {
+	.coa_compress		= nx842_crypto_compress,
+	.coa_decompress		= nx842_crypto_decompress } }
+};
+
+static int __init nx842_crypto_mod_init(void)
+{
+	return crypto_register_alg(&alg);
+}
+module_init(nx842_crypto_mod_init);
+
+static void __exit nx842_crypto_mod_exit(void)
+{
+	crypto_unregister_alg(&alg);
+}
+module_exit(nx842_crypto_mod_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("IBM PowerPC Nest (NX) 842 Hardware Compression Interface");
+MODULE_ALIAS_CRYPTO("842");
+MODULE_ALIAS_CRYPTO("842-nx");
+MODULE_AUTHOR("Dan Streetman <ddstreet@ieee.org>");
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH 10/10] drivers/crypto/nx: add hardware 842 crypto comp alg
  2015-05-07 15:06       ` Dan Streetman
@ 2015-05-08  2:32         ` Herbert Xu
  0 siblings, 0 replies; 46+ messages in thread
From: Herbert Xu @ 2015-05-08  2:32 UTC (permalink / raw)
  To: Dan Streetman
  Cc: David S. Miller, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Seth Jennings, Robert Jennings, linux-crypto,
	linuxppc-dev, linux-kernel

On Thu, May 07, 2015 at 11:06:06AM -0400, Dan Streetman wrote:
> 
> The crypto 842-nx has (significant) code in it to handle any alignment
> and length input buffers, to match them to what the driver requires.
> Would it be better to move that into the crypto code, so that any
> crypto compression hw driver can request buffers be specifically
> aligned/sized?  I did have to use a header on each compressed buffer
> that needed re-alignment or re-sizing, so maybe it's not appropriate
> for common crypto compression code.

Yes we could certainly move this logic into the crypto layer, as
we do for ciphers and hashes.

But as you say we could make the next guy who writes a comp driver
do this :)

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCHv3 00/10] add 842 hw compression for PowerNV platform
  2015-05-07 17:49   ` [PATCHv3 00/10] add 842 hw compression for PowerNV platform Dan Streetman
                       ` (9 preceding siblings ...)
  2015-05-07 17:49     ` [PATCH 10/10] drivers/crypto/nx: add hardware 842 crypto comp alg Dan Streetman
@ 2015-05-11  7:21     ` Herbert Xu
  10 siblings, 0 replies; 46+ messages in thread
From: Herbert Xu @ 2015-05-11  7:21 UTC (permalink / raw)
  To: Dan Streetman
  Cc: David S. Miller, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Seth Jennings, Robert Jennings, linux-crypto,
	linuxppc-dev, linux-kernel

On Thu, May 07, 2015 at 01:49:11PM -0400, Dan Streetman wrote:
>
> v3 changes the sw and hw crypto drivers to use the same alg name "842",
> and different driver names, "842-generic" and "842-nx"

All applied.  Thanks a lot!
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2015-05-11  7:21 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-07 17:34 [PATCH 00/11] add 842 hw compression for PowerNV platform Dan Streetman
2015-04-07 17:34 ` [PATCH 01/11] powerpc: export of_get_ibm_chip_id function Dan Streetman
2015-04-07 17:34 ` [PATCH 02/11] powerpc: Add ICSWX instruction Dan Streetman
2015-04-07 17:34 ` [PATCH 03/11] crypto: add software 842 decompression Dan Streetman
2015-04-07 17:34 ` [PATCH 04/11] drivers/crypto/nx: move nx-842.c to nx-842-pseries.c Dan Streetman
2015-04-07 17:34 ` [PATCH 05/11] drivers/crypto/nx: add NX-842 platform frontend driver Dan Streetman
2015-04-07 17:34 ` [PATCH 06/11] drivers/crypto/nx: add nx842 constraints Dan Streetman
2015-04-07 17:34 ` [PATCH 07/11] drivers/crypto/nx: add PowerNV platform NX-842 driver Dan Streetman
2015-04-07 17:34 ` [PATCH 08/11] drivers/crypto/nx: simplify pSeries nx842 driver Dan Streetman
2015-04-07 17:34 ` [PATCH 09/11] crypto: remove LZO fallback from crypto 842 Dan Streetman
2015-04-08 14:16   ` Herbert Xu
2015-04-08 14:28     ` Dan Streetman
2015-04-08 14:38       ` Herbert Xu
2015-04-08 14:45         ` Dan Streetman
2015-04-08 14:48           ` Herbert Xu
2015-04-07 17:34 ` [PATCH 10/11] crypto: rewrite crypto 842 to use nx842 constraints Dan Streetman
2015-04-07 17:34 ` [PATCH 11/11] crypto: add crypto compression sefltest Dan Streetman
2015-04-08 14:16   ` Herbert Xu
2015-04-08 14:48     ` Dan Streetman
2015-05-06 16:50 ` [PATCHv2 00/10] add 842 hw compression for PowerNV platform Dan Streetman
2015-05-06 16:50   ` [PATCH 01/10] powerpc: export of_get_ibm_chip_id function Dan Streetman
2015-05-06 16:50   ` [PATCH 02/10] powerpc: Add ICSWX instruction Dan Streetman
2015-05-06 16:50   ` [PATCH 03/10] lib: add software 842 compression/decompression Dan Streetman
2015-05-06 16:51   ` [PATCH 04/10] crypto: change 842 alg to use software Dan Streetman
2015-05-07  3:07     ` Herbert Xu
2015-05-06 16:51   ` [PATCH 05/10] drivers/crypto/nx: rename nx-842.c to nx-842-pseries.c Dan Streetman
2015-05-06 16:51   ` [PATCH 06/10] drivers/crypto/nx: add NX-842 platform frontend driver Dan Streetman
2015-05-06 16:51   ` [PATCH 07/10] drivers/crypto/nx: add nx842 constraints Dan Streetman
2015-05-06 16:51   ` [PATCH 08/10] drivers/crypto/nx: add PowerNV platform NX-842 driver Dan Streetman
2015-05-06 16:51   ` [PATCH 09/10] drivers/crypto/nx: simplify pSeries nx842 driver Dan Streetman
2015-05-06 16:51   ` [PATCH 10/10] drivers/crypto/nx: add hardware 842 crypto comp alg Dan Streetman
2015-05-07  3:12     ` Herbert Xu
2015-05-07 15:06       ` Dan Streetman
2015-05-08  2:32         ` Herbert Xu
2015-05-07 17:49   ` [PATCHv3 00/10] add 842 hw compression for PowerNV platform Dan Streetman
2015-05-07 17:49     ` [PATCH 01/10] powerpc: export of_get_ibm_chip_id function Dan Streetman
2015-05-07 17:49     ` [PATCH 02/10] powerpc: Add ICSWX instruction Dan Streetman
2015-05-07 17:49     ` [PATCH 03/10] lib: add software 842 compression/decompression Dan Streetman
2015-05-07 17:49     ` [PATCH 04/10] crypto: change 842 alg to use software Dan Streetman
2015-05-07 17:49     ` [PATCH 05/10] drivers/crypto/nx: rename nx-842.c to nx-842-pseries.c Dan Streetman
2015-05-07 17:49     ` [PATCH 06/10] drivers/crypto/nx: add NX-842 platform frontend driver Dan Streetman
2015-05-07 17:49     ` [PATCH 07/10] drivers/crypto/nx: add nx842 constraints Dan Streetman
2015-05-07 17:49     ` [PATCH 08/10] drivers/crypto/nx: add PowerNV platform NX-842 driver Dan Streetman
2015-05-07 17:49     ` [PATCH 09/10] drivers/crypto/nx: simplify pSeries nx842 driver Dan Streetman
2015-05-07 17:49     ` [PATCH 10/10] drivers/crypto/nx: add hardware 842 crypto comp alg Dan Streetman
2015-05-11  7:21     ` [PATCHv3 00/10] add 842 hw compression for PowerNV platform Herbert Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).