* [PATCH] dm: verity target
@ 2011-12-22  0:36 Mandeep Singh Baines
From: Mandeep Singh Baines @ 2011-12-22  0:36 UTC
  To: Alasdair G Kergon, dm-devel, linux-kernel, Neil Brown
  Cc: Mandeep Singh Baines, Will Drewry, Elly Jones, Alasdair G Kergon,
	Milan Broz, Olof Johansson, Steffen Klassert

The verity target provides transparent integrity checking of block devices
using a cryptographic digest.

dm-verity is meant to be set up as part of a verified boot path.  This
may be anything ranging from a boot using tboot or trustedgrub to just
booting from a known-good device (like a USB drive or CD).

dm-verity is part of ChromeOS's verified boot path. It is used to verify
the integrity of the root filesystem on boot. The root filesystem is
mounted on a dm-verity partition, which transparently verifies each block
with a bootloader-verified hash passed into the kernel at boot.

Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: Elly Jones <ellyjones@chromium.org>
Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Milan Broz <mbroz@redhat.com>
Cc: Olof Johansson <olofj@chromium.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: dm-devel@redhat.com
---
 Documentation/device-mapper/dm-bht.txt    |   59 ++
 Documentation/device-mapper/dm-verity.txt |   76 +++
 drivers/md/Kconfig                        |   30 +
 drivers/md/Makefile                       |    2 +
 drivers/md/dm-bht.c                       |  559 +++++++++++++++
 drivers/md/dm-verity.c                    | 1043 +++++++++++++++++++++++++++++
 drivers/md/dm-verity.h                    |   45 ++
 include/linux/dm-bht.h                    |  166 +++++
 8 files changed, 1980 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/device-mapper/dm-bht.txt
 create mode 100644 Documentation/device-mapper/dm-verity.txt
 create mode 100644 drivers/md/dm-bht.c
 create mode 100644 drivers/md/dm-verity.c
 create mode 100644 drivers/md/dm-verity.h
 create mode 100644 include/linux/dm-bht.h

diff --git a/Documentation/device-mapper/dm-bht.txt b/Documentation/device-mapper/dm-bht.txt
new file mode 100644
index 0000000..21d929f
--- /dev/null
+++ b/Documentation/device-mapper/dm-bht.txt
@@ -0,0 +1,59 @@
+dm-bht
+======
+
+dm-bht provides a block hash tree implementation.  The use of dm-bht allows
+for integrity checking of a given block device without reading the entire
+set of blocks into memory before use.
+
+In particular, dm-bht supplies an interface for creating and verifying a tree
+of cryptographic digests with any algorithm supported by the kernel crypto API.
+
+The `verity' target is the motivating example.
+
+
+Theory of operation
+===================
+
+dm-bht is logically composed of multiple nodes organized in a tree-like
+structure.  Each node in the tree is a cryptographic hash.  If it is a leaf
+node, the hash is of some block data on disk.  If it is an intermediary node,
+then the hash is of a number of child nodes.
+
+dm-bht has a given depth starting at 1 (ignoring the root node).  Each level in
+the tree is concretely made up of dm_bht_entry structs.  Each entry in the tree
+is a collection of neighboring nodes that fit in one page-sized block.  The
+number of nodes per entry is determined by PAGE_SIZE and the digest size of
+the selected cryptographic algorithm: with 4 KiB pages and SHA-256 (32-byte
+digests), for example, each entry holds 128 hashes.  The hashes are linearly
+ordered in this entry, and any unaligned trailing space is ignored but
+included when calculating the parent node.
+
+The tree looks something like:
+
+alg= sha256, num_blocks = 32767
+                                 [   root    ]
+                                /    . . .    \
+                     [entry_0]                 [entry_1]
+                    /  . . .  \                 . . .   \
+         [entry_0_0]   . . .  [entry_0_127]    . . . .  [entry_1_127]
+           / ... \             /   . . .  \             /           \
+     blk_0 ... blk_127  blk_16256   blk_16383      blk_32640 . . . blk_32767
+
+The root is treated independently from the depth, and the data blocks are
+expected to be hashed and supplied to the dm-bht.  The hash blocks that make
+up the entry contents are expected to be read from disk.
+
+dm-bht does not handle I/O directly but instead expects the consumer to
+supply callbacks.  The read callback will always receive a page-aligned value
+to pass to the block device layer to read in a hash value.
+
+Usage
+=====
+
+The API provides mechanisms for reading and verifying a tree. When reading, all
+required data for the hash tree should be populated for a block before
+attempting a verify.  This can be done by calling dm_bht_populate().  When all
+data is ready, a call to dm_bht_verify_block() with the expected hash value will
+perform both the direct block hash check and the hashes of the parent and
+neighboring nodes where needed to ensure validity up to the root hash.  Note,
+dm_bht_set_root_hexdigest() should be called before any verification attempts
+occur.
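+
+As a minimal sketch of the expected call sequence (error handling and the
+actual I/O submission are omitted; my_read_cb, io_ctx, root_hexdigest and
+the other lowercase names are hypothetical placeholders supplied by the
+consumer):
+
+  #include <linux/dm-bht.h>
+
+  static int my_read_cb(void *ctx, sector_t start, u8 *dst,
+                        sector_t count, struct dm_bht_entry *entry)
+  {
+          /* Read count sectors at start into dst, then call
+           * dm_bht_read_completed(entry, status) when the I/O completes.
+           */
+          return 0;
+  }
+
+  struct dm_bht bht;
+
+  dm_bht_create(&bht, block_count, block_size, "sha256");
+  bht.read_cb = my_read_cb;
+  dm_bht_set_root_hexdigest(&bht, root_hexdigest);
+
+  dm_bht_populate(&bht, io_ctx, block);
+  /* ...wait until dm_bht_is_populated(&bht, block) returns true... */
+  dm_bht_verify_block(&bht, block, pg, offset);
+
+  dm_bht_destroy(&bht);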
diff --git a/Documentation/device-mapper/dm-verity.txt b/Documentation/device-mapper/dm-verity.txt
new file mode 100644
index 0000000..f33b984
--- /dev/null
+++ b/Documentation/device-mapper/dm-verity.txt
@@ -0,0 +1,76 @@
+dm-verity
+=========
+
+Device-Mapper's "verity" target provides transparent integrity checking of
+block devices using a cryptographic digest provided by the kernel crypto API.
+This target is read-only.
+
+Parameters: payload=<device path> hashtree=<hash device path> alg=<alg> \
+            salt=<salt> root_hexdigest=<root hash> \
+            [ hashstart=<hash start> error_behavior=<error behavior> ]
+
+<device path>
+    This is the device that is going to be integrity checked.  It may be
+    a subset of the full device as specified to dmsetup (start sector and count).
+    It may be specified as a path, like /dev/sdaX, or a device number,
+    <major>:<minor>.
+
+<hash device path>
+    This is the device that supplies the dm-bht hash data.  It may be
+    specified similarly to the device path and may be the same device.  If the
+    same device is used, the hash offset should be outside of the dm-verity
+    configured device size.
+
+<alg>
+    The cryptographic hash algorithm used for this device.  This should
+    be the name of the algorithm, like "sha1".
+
+<salt>
+    Salt value (in hex).
+
+<root hash>
+    The hexadecimal encoding of the cryptographic hash of all of the
+    neighboring nodes at the first level of the tree.  This hash should be
+    trusted, as there is no other source of authenticity beyond this point.
+
+<hash start>
+    Start address of hashes (default 0).
+
+<error behavior>
+    0 = return -EIO. 1 = panic. 2 = none. 3 = call notifier.  The value may
+    also be given by name: eio, panic, none, or notify.
+
+Theory of operation
+===================
+
+dm-verity is meant to be set up as part of a verified boot path.  This
+may be anything ranging from a boot using tboot or trustedgrub to just
+booting from a known-good device (like a USB drive or CD).
+
+When a dm-verity device is configured, it is expected that the caller
+has been authenticated in some way (cryptographic signatures, etc).
+After instantiation, all hashes will be verified on-demand during
+disk access.  If they cannot be verified up to the root node of the
+tree, the root hash, then the I/O will fail.  This should identify
+tampering with any data on the device or with the hash data.
+
+Cryptographic hashes are used to assert the integrity of the device on a
+per-block basis.  This allows for a lightweight hash computation on first read
+into the page cache.  Block hashes are stored linearly, aligned to the
+nearest page-sized block, level by level beginning with the single
+root-most hash block.
+
+For more information on the hashing process, see dm-bht.txt.
+
+
+Example
+=======
+
+Set up a device:
+[[
+  dmsetup create vroot --table \
+    "0 204800 verity payload=/dev/sda1 hashtree=/dev/sda2 alg=sha1 "\
+    "root_hexdigest=9f74809a2ee7607b16fcc70d9399a4de9725a727"
+]]
+
+A command line tool is available to compute the hash tree and return the
+root hash value.
+  http://git.chromium.org/cgi-bin/gitweb.cgi?p=dm-verity.git;a=tree
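+
+When error_behavior is set to notify, failures are reported through a
+blocking notifier chain before the final behavior is chosen.  A minimal
+sketch of a consumer (module boilerplate omitted; my_verity_handler and
+my_verity_nb are hypothetical names):
+
+[[
+  #include <linux/notifier.h>
+  #include "dm-verity.h"   /* from drivers/md */
+
+  static int my_verity_handler(struct notifier_block *nb,
+                               unsigned long transient, void *arg)
+  {
+          struct dm_verity_error_state *state = arg;
+
+          /* Inspect state->code, state->block, etc., then pick the final
+           * behavior.  Returning NOTIFY_DONE lets the behavior chosen
+           * here take effect.
+           */
+          state->behavior = DM_VERITY_ERROR_BEHAVIOR_EIO;
+          return NOTIFY_DONE;
+  }
+
+  static struct notifier_block my_verity_nb = {
+          .notifier_call = my_verity_handler,
+  };
+
+  /* ...in module init... */
+  dm_verity_register_error_notifier(&my_verity_nb);
+]]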
diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index faa4741..3cdf95c 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -370,4 +370,34 @@ config DM_FLAKEY
        ---help---
          A target that intermittently fails I/O for debugging purposes.
 
+config DM_BHT
+        tristate "Block hash tree support"
+        select CRYPTO
+        select CRYPTO_HASH
+        ---help---
+          Include support for device-mapper devices to use a block hash
+          tree for managing data integrity checks in a scalable way.
+
+          Targets that use this functionality should include it
+          automatically.
+
+          If unsure, say N.
+
+config DM_VERITY
+        tristate "Verity target support"
+        depends on BLK_DEV_DM
+        select DM_BHT
+        select CRYPTO
+        select CRYPTO_HASH
+        ---help---
+          This device-mapper target allows you to create a device that
+          transparently integrity checks the data on it. You'll need to
+          activate the digests you're going to use in the cryptoapi
+          configuration.
+
+          To compile this code as a module, choose M here: the module will
+          be called dm-verity.
+
+          If unsure, say N.
+
 endif # MD
diff --git a/drivers/md/Makefile b/drivers/md/Makefile
index 046860c..c069953 100644
--- a/drivers/md/Makefile
+++ b/drivers/md/Makefile
@@ -39,6 +39,8 @@ obj-$(CONFIG_DM_SNAPSHOT)	+= dm-snapshot.o
 obj-$(CONFIG_DM_PERSISTENT_DATA)	+= persistent-data/
 obj-$(CONFIG_DM_MIRROR)		+= dm-mirror.o dm-log.o dm-region-hash.o
 obj-$(CONFIG_DM_LOG_USERSPACE)	+= dm-log-userspace.o
+obj-$(CONFIG_DM_BHT)            += dm-bht.o
+obj-$(CONFIG_DM_VERITY)         += dm-verity.o
 obj-$(CONFIG_DM_ZERO)		+= dm-zero.o
 obj-$(CONFIG_DM_RAID)	+= dm-raid.o
 obj-$(CONFIG_DM_THIN_PROVISIONING)	+= dm-thin-pool.o
diff --git a/drivers/md/dm-bht.c b/drivers/md/dm-bht.c
new file mode 100644
index 0000000..6eb2be3
--- /dev/null
+++ b/drivers/md/dm-bht.c
@@ -0,0 +1,559 @@
+/*
+ * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ *
+ * Device-Mapper block hash tree interface.
+ * See Documentation/device-mapper/dm-bht.txt for details.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <crypto/hash.h>
+#include <linux/atomic.h>
+#include <linux/bitops.h>
+#include <linux/bug.h>
+#include <linux/cpumask.h>
+#include <linux/device-mapper.h>
+#include <linux/dm-bht.h>
+#include <linux/err.h>
+#include <linux/errno.h>
+#include <linux/gfp.h>
+#include <linux/highmem.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/mm_types.h>
+#include <linux/scatterlist.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+
+#define DM_MSG_PREFIX "dm bht"
+
+/*
+ * Utilities
+ */
+
+static u8 from_hex(u8 ch)
+{
+	if ((ch >= '0') && (ch <= '9'))
+		return ch - '0';
+	if ((ch >= 'a') && (ch <= 'f'))
+		return ch - 'a' + 10;
+	if ((ch >= 'A') && (ch <= 'F'))
+		return ch - 'A' + 10;
+	return -1;
+}
+
+/**
+ * dm_bht_bin_to_hex - converts a binary stream to human-readable hex
+ * @binary:	a byte array of length @binary_len
+ * @hex:	a byte array of length @binary_len * 2 + 1
+ */
+static void dm_bht_bin_to_hex(u8 *binary, u8 *hex, unsigned int binary_len)
+{
+	while (binary_len-- > 0) {
+		sprintf((char *)hex, "%02hhx", (int)*binary);
+		hex += 2;
+		binary++;
+	}
+}
+
+/**
+ * dm_bht_hex_to_bin - converts a hex stream to binary
+ * @binary:	destination byte array of length @binary_len
+ * @hex:	source byte array of at least @binary_len * 2 bytes
+ */
+static void dm_bht_hex_to_bin(u8 *binary, const u8 *hex,
+			      unsigned int binary_len)
+{
+	while (binary_len-- > 0) {
+		*binary = from_hex(*(hex++));
+		*binary *= 16;
+		*binary += from_hex(*(hex++));
+		binary++;
+	}
+}
+
+static void dm_bht_log_mismatch(struct dm_bht *bht, u8 *given, u8 *computed)
+{
+	u8 given_hex[DM_BHT_MAX_DIGEST_SIZE * 2 + 1];
+	u8 computed_hex[DM_BHT_MAX_DIGEST_SIZE * 2 + 1];
+
+	dm_bht_bin_to_hex(given, given_hex, bht->digest_size);
+	dm_bht_bin_to_hex(computed, computed_hex, bht->digest_size);
+	DMERR_LIMIT("%s != %s", given_hex, computed_hex);
+}
+
+/**
+ * dm_bht_compute_hash - hashes one block of data
+ */
+static int dm_bht_compute_hash(struct dm_bht *bht, struct page *pg,
+			       unsigned int offset, u8 *digest)
+{
+	struct shash_desc *hash_desc = bht->hash_desc[smp_processor_id()];
+	void *data;
+	int err;
+
+	/* Note, this is synchronous. */
+	if (crypto_shash_init(hash_desc)) {
+		DMCRIT("failed to reinitialize crypto hash (proc:%d)",
+			smp_processor_id());
+		return -EINVAL;
+	}
+	data = kmap_atomic(pg);
+	err = crypto_shash_update(hash_desc, data + offset, bht->block_size);
+	kunmap_atomic(data);
+	if (err) {
+		DMCRIT("crypto_shash_update failed");
+		return -EINVAL;
+	}
+	if (crypto_shash_update(hash_desc, bht->salt, sizeof(bht->salt))) {
+		DMCRIT("crypto_shash_update failed");
+		return -EINVAL;
+	}
+	if (crypto_shash_final(hash_desc, digest)) {
+		DMCRIT("crypto_shash_final failed");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/*
+ * Implementation functions
+ */
+
+static int dm_bht_initialize_entries(struct dm_bht *bht)
+{
+	/* last represents the index of the last digest stored in the tree.
+	 * By walking the tree with that index, it is possible to compute the
+	 * total number of entries at each level.
+	 *
+	 * Since each entry will contain up to |node_count| nodes of the tree,
+	 * it is possible that the last index may not be at the end of a given
+	 * entry->nodes.  In that case, it is assumed the value is padded.
+	 *
+	 * Note, we treat both the tree root (1 hash) and the tree leaves
+	 * independently from the bht data structures.  Logically, the root is
+	 * depth=-1 and the block layer level is depth=bht->depth
+	 */
+	unsigned int last = bht->block_count;
+	int depth;
+
+	/* check that the largest level->count can't result in an int overflow
+	 * on allocation or sector calculation.
+	 */
+	if (((last >> bht->node_count_shift) + 1) >
+	    UINT_MAX / max((unsigned int)sizeof(struct dm_bht_entry),
+			   (unsigned int)to_sector(bht->block_size))) {
+		DMCRIT("required entries %u is too large", last + 1);
+		return -EINVAL;
+	}
+
+	/* Track the current sector location for each level so we don't have to
+	 * compute it during traversals.
+	 */
+	bht->sectors = 0;
+	for (depth = 0; depth < bht->depth; ++depth) {
+		struct dm_bht_level *level = &bht->levels[depth];
+
+		level->count = dm_bht_index_at_level(bht, depth, last) + 1;
+		level->entries = (struct dm_bht_entry *)
+				 kcalloc(level->count,
+					 sizeof(struct dm_bht_entry),
+					 GFP_KERNEL);
+		if (!level->entries) {
+			DMERR("failed to allocate entries for depth %d", depth);
+			return -ENOMEM;
+		}
+		level->sector = bht->sectors;
+		bht->sectors += level->count * to_sector(bht->block_size);
+	}
+
+	return 0;
+}
+
+/**
+ * dm_bht_create - prepares @bht for use
+ * @bht:	pointer to the dm_bht to initialize
+ * @block_count:the number of block hashes / tree leaves
+ * @block_size:	size of one hash block, in bytes
+ * @alg_name:	crypto hash algorithm name
+ *
+ * Returns 0 on success.
+ *
+ * Callers can offset into devices by storing the data in the io callbacks.
+ */
+int dm_bht_create(struct dm_bht *bht, unsigned int block_count,
+		  unsigned int block_size, const char *alg_name)
+{
+	struct crypto_shash *tfm;
+	int size, cpu, status = 0;
+
+	bht->block_size = block_size;
+	/* Verify that PAGE_SIZE >= block_size >= SECTOR_SIZE. */
+	if ((block_size > PAGE_SIZE) ||
+	    (PAGE_SIZE % block_size) ||
+	    (to_sector(block_size) == 0))
+		return -EINVAL;
+
+	tfm = crypto_alloc_shash(alg_name, 0, 0);
+	if (IS_ERR(tfm)) {
+		DMERR("failed to allocate crypto hash '%s'", alg_name);
+		return -ENOMEM;
+	}
+	size = sizeof(struct shash_desc) + crypto_shash_descsize(tfm);
+
+	/* Pre-allocate per-cpu crypto contexts to avoid having to
+	 * kmalloc/kfree a context for every hash operation.
+	 */
+	for (cpu = 0; cpu < nr_cpu_ids; ++cpu) {
+		struct shash_desc *hash_desc = kmalloc(size, GFP_KERNEL);
+
+		bht->hash_desc[cpu] = hash_desc;
+		if (!hash_desc) {
+			DMERR("failed to allocate crypto hash contexts");
+			status = -ENOMEM;
+			goto bad_hash_alloc;
+		}
+		hash_desc->tfm = tfm;
+		hash_desc->flags = 0x0;
+	}
+	bht->digest_size = crypto_shash_digestsize(tfm);
+	/* We expect to be able to pack >=2 hashes into a block */
+	if (block_size / bht->digest_size < 2) {
+		DMERR("too few hashes fit in a block");
+		status = -EINVAL;
+		goto bad_arg;
+	}
+
+	if (bht->digest_size > DM_BHT_MAX_DIGEST_SIZE) {
+		DMERR("DM_BHT_MAX_DIGEST_SIZE too small for chosen digest");
+		status = -EINVAL;
+		goto bad_arg;
+	}
+
+	/* Configure the tree */
+	bht->block_count = block_count;
+	if (block_count == 0) {
+		DMERR("block_count must be non-zero");
+		status = -EINVAL;
+		goto bad_arg;
+	}
+
+	/* Each dm_bht_entry->nodes is one block.  The node code tracks
+	 * how many nodes fit into one entry where a node is a single
+	 * hash (message digest).
+	 */
+	bht->node_count_shift = fls(block_size / bht->digest_size) - 1;
+	/* Round down to the nearest power of two.  This makes indexing
+	 * into the tree much less painful.
+	 */
+	bht->node_count = 1 << bht->node_count_shift;
+
+	/* This is unlikely to happen, but with 64k pages, who knows. */
+	if (bht->node_count > UINT_MAX / bht->digest_size) {
+		DMERR("node_count * hash_len exceeds UINT_MAX!");
+		status = -EINVAL;
+		goto bad_arg;
+	}
+
+	bht->depth = DIV_ROUND_UP(fls(block_count - 1), bht->node_count_shift);
+
+	/* Ensure that we can safely shift by this value. */
+	if (bht->depth * bht->node_count_shift >= sizeof(unsigned int) * 8) {
+		DMERR("specified depth and node_count_shift is too large");
+		status = -EINVAL;
+		goto bad_arg;
+	}
+
+	/* Allocate levels. Each level of the tree may have an arbitrary number
+	 * of dm_bht_entry structs.  Each entry contains node_count nodes.
+	 * Each node in the tree is a cryptographic digest of either node_count
+	 * nodes on the subsequent level or of a specific block on disk.
+	 */
+	bht->levels = (struct dm_bht_level *)
+			kcalloc(bht->depth,
+				sizeof(struct dm_bht_level), GFP_KERNEL);
+	if (!bht->levels) {
+		DMERR("failed to allocate tree levels");
+		status = -ENOMEM;
+		goto bad_level_alloc;
+	}
+
+	bht->read_cb = NULL;
+
+	status = dm_bht_initialize_entries(bht);
+	if (status)
+		goto bad_entries_alloc;
+
+	/* We compute depth such that there is only 1 block at level 0. */
+	BUG_ON(bht->levels[0].count != 1);
+
+	return 0;
+
+bad_entries_alloc:
+	while (bht->depth-- > 0)
+		kfree(bht->levels[bht->depth].entries);
+	kfree(bht->levels);
+bad_level_alloc:
+bad_arg:
+bad_hash_alloc:
+	for (cpu = 0; cpu < nr_cpu_ids && bht->hash_desc[cpu]; ++cpu)
+		kfree(bht->hash_desc[cpu]);
+	crypto_free_shash(tfm);
+	return status;
+}
+EXPORT_SYMBOL(dm_bht_create);
+
+/**
+ * dm_bht_read_completed
+ * @entry:	pointer to the entry that's been loaded
+ * @status:	I/O status. Non-zero is failure.
+ * MUST always be called after a read_cb completes.
+ */
+void dm_bht_read_completed(struct dm_bht_entry *entry, int status)
+{
+	if (status) {
+		/* TODO(wad) add retry support */
+		DMCRIT("an I/O error occurred while reading entry");
+		atomic_set(&entry->state, DM_BHT_ENTRY_ERROR_IO);
+		/* entry->nodes will be freed later */
+		return;
+	}
+	BUG_ON(atomic_read(&entry->state) != DM_BHT_ENTRY_PENDING);
+	atomic_set(&entry->state, DM_BHT_ENTRY_READY);
+}
+EXPORT_SYMBOL(dm_bht_read_completed);
+
+/**
+ * dm_bht_verify_block - checks that all nodes in the path for @block are valid
+ * @bht:	pointer to a dm_bht_create()d bht
+ * @block:	the block that the data is expected from
+ * @pg:		page holding the block data
+ * @offset:	offset into the page
+ *
+ * Returns 0 on success, DM_BHT_ENTRY_ERROR_MISMATCH on error.
+ */
+int dm_bht_verify_block(struct dm_bht *bht, unsigned int block,
+			struct page *pg, unsigned int offset)
+{
+	int state, depth = bht->depth;
+	u8 digest[DM_BHT_MAX_DIGEST_SIZE];
+	struct dm_bht_entry *entry;
+	void *node;
+
+	do {
+		/* Need to check that the hash of the current block is accurate
+		 * in its parent.
+		 */
+		entry = dm_bht_get_entry(bht, depth - 1, block);
+		state = atomic_read(&entry->state);
+		/* This call is only safe if all nodes along the path
+		 * are already populated (i.e. READY) via dm_bht_populate.
+		 */
+		BUG_ON(state < DM_BHT_ENTRY_READY);
+		node = dm_bht_get_node(bht, entry, depth, block);
+
+		if (dm_bht_compute_hash(bht, pg, offset, digest) ||
+		    memcmp(digest, node, bht->digest_size))
+			goto mismatch;
+
+		/* Keep the containing block of hashes to be verified in the
+		 * next pass.
+		 */
+		pg = virt_to_page(entry->nodes);
+		offset = offset_in_page(entry->nodes);
+	} while (--depth > 0 && state != DM_BHT_ENTRY_VERIFIED);
+
+	if (depth == 0 && state != DM_BHT_ENTRY_VERIFIED) {
+		if (dm_bht_compute_hash(bht, pg, offset, digest) ||
+		    memcmp(digest, bht->root_digest, bht->digest_size))
+			goto mismatch;
+		atomic_set(&entry->state, DM_BHT_ENTRY_VERIFIED);
+	}
+
+	/* Mark path to leaf as verified. */
+	for (depth++; depth < bht->depth; depth++) {
+		entry = dm_bht_get_entry(bht, depth, block);
+		/* At this point, entry can only be in VERIFIED or READY state.
+		 * So it is safe to use atomic_set instead of atomic_cmpxchg.
+		 */
+		atomic_set(&entry->state, DM_BHT_ENTRY_VERIFIED);
+	}
+
+	return 0;
+
+mismatch:
+	DMERR_LIMIT("verify_path: failed to verify hash (d=%d,bi=%u)",
+		    depth, block);
+	dm_bht_log_mismatch(bht, node, digest);
+	return DM_BHT_ENTRY_ERROR_MISMATCH;
+}
+EXPORT_SYMBOL(dm_bht_verify_block);
+
+/**
+ * dm_bht_is_populated - check that entries from disk needed to verify a given
+ *                       block are all ready
+ * @bht:	pointer to a dm_bht_create()d bht
+ * @block:	the block that the data is expected from
+ *
+ * Callers may wish to call dm_bht_is_populated() when checking an io
+ * for which entries were already pending.
+ */
+bool dm_bht_is_populated(struct dm_bht *bht, unsigned int block)
+{
+	int depth;
+
+	for (depth = bht->depth - 1; depth >= 0; depth--) {
+		struct dm_bht_entry *entry = dm_bht_get_entry(bht, depth,
+							      block);
+		if (atomic_read(&entry->state) < DM_BHT_ENTRY_READY)
+			return false;
+	}
+
+	return true;
+}
+EXPORT_SYMBOL(dm_bht_is_populated);
+
+/**
+ * dm_bht_populate - reads entries from disk needed to verify a given block
+ * @bht:	pointer to a dm_bht_create()d bht
+ * @ctx:        context used for all read_cb calls on this request
+ * @block:	the block that the data is expected from
+ *
+ * Returns negative value on error. Returns 0 on success.
+ */
+int dm_bht_populate(struct dm_bht *bht, void *ctx, unsigned int block)
+{
+	int depth, state;
+
+	BUG_ON(block >= bht->block_count);
+
+	for (depth = bht->depth - 1; depth >= 0; --depth) {
+		unsigned int index = dm_bht_index_at_level(bht, depth, block);
+		struct dm_bht_level *level = &bht->levels[depth];
+		struct dm_bht_entry *entry = dm_bht_get_entry(bht, depth,
+							      block);
+		state = atomic_cmpxchg(&entry->state,
+				       DM_BHT_ENTRY_UNALLOCATED,
+				       DM_BHT_ENTRY_PENDING);
+		if (state == DM_BHT_ENTRY_VERIFIED)
+			break;
+		if (state <= DM_BHT_ENTRY_ERROR)
+			goto error_state;
+		if (state != DM_BHT_ENTRY_UNALLOCATED)
+			continue;
+
+		/* Current entry is claimed for allocation and loading */
+		entry->nodes = kmalloc(bht->block_size, GFP_NOIO);
+		if (!entry->nodes)
+			goto nomem;
+
+		bht->read_cb(ctx,
+			     level->sector + to_sector(index * bht->block_size),
+			     entry->nodes, to_sector(bht->block_size), entry);
+	}
+
+	return 0;
+
+error_state:
+	DMCRIT("block %u at depth %d is in an error state", block, depth);
+	return -EPERM;
+
+nomem:
+	DMCRIT("failed to allocate memory for entry->nodes");
+	return -ENOMEM;
+}
+EXPORT_SYMBOL(dm_bht_populate);
+
+/**
+ * dm_bht_destroy - cleans up all memory used by @bht
+ * @bht:	pointer to a dm_bht_create()d bht
+ */
+void dm_bht_destroy(struct dm_bht *bht)
+{
+	int depth, cpu;
+
+	for (depth = 0; depth < bht->depth; depth++) {
+		struct dm_bht_entry *entry = bht->levels[depth].entries;
+		struct dm_bht_entry *entry_end = entry +
+						 bht->levels[depth].count;
+		for (; entry < entry_end; ++entry)
+			kfree(entry->nodes);
+		kfree(bht->levels[depth].entries);
+	}
+	kfree(bht->levels);
+	crypto_free_shash((bht->hash_desc[0])->tfm);
+	for (cpu = 0; cpu < nr_cpu_ids; ++cpu)
+		kfree(bht->hash_desc[cpu]);
+}
+EXPORT_SYMBOL(dm_bht_destroy);
+
+/*
+ * Accessors
+ */
+
+/**
+ * dm_bht_set_root_hexdigest - sets an unverified root digest hash from hex
+ * @bht:	pointer to a dm_bht_create()d bht
+ * @hexdigest:	array of u8s containing the new digest, in hex
+ * Returns non-zero on error.  hexdigest should be NUL terminated.
+ */
+int dm_bht_set_root_hexdigest(struct dm_bht *bht, const u8 *hexdigest)
+{
+	/* Make sure we have at least the bytes expected */
+	if (strnlen((char *)hexdigest, bht->digest_size * 2) !=
+	    bht->digest_size * 2) {
+		DMERR("root digest length does not match hash algorithm");
+		return -1;
+	}
+	dm_bht_hex_to_bin(bht->root_digest, hexdigest, bht->digest_size);
+	return 0;
+}
+EXPORT_SYMBOL(dm_bht_set_root_hexdigest);
+
+/**
+ * dm_bht_root_hexdigest - returns root digest in hex
+ * @bht:	pointer to a dm_bht_create()d bht
+ * @hexdigest:	u8 array of size @available
+ * @available:	must be bht->digest_size * 2 + 1
+ */
+int dm_bht_root_hexdigest(struct dm_bht *bht, u8 *hexdigest, int available)
+{
+	if (available < 0 ||
+	    ((unsigned int) available) < bht->digest_size * 2 + 1) {
+		DMERR("hexdigest has too few bytes available");
+		return -EINVAL;
+	}
+	dm_bht_bin_to_hex(bht->root_digest, hexdigest, bht->digest_size);
+	return 0;
+}
+EXPORT_SYMBOL(dm_bht_root_hexdigest);
+
+/**
+ * dm_bht_set_salt - sets the salt used, in hex
+ * @bht:      pointer to a dm_bht_create()d bht
+ * @hexsalt:  salt string, as hex; will be zero-padded or truncated to
+ *            DM_BHT_SALT_SIZE * 2 hex digits.
+ */
+void dm_bht_set_salt(struct dm_bht *bht, const char *hexsalt)
+{
+	size_t saltlen = min(strlen(hexsalt) / 2, sizeof(bht->salt));
+
+	memset(bht->salt, 0, sizeof(bht->salt));
+	dm_bht_hex_to_bin(bht->salt, (const u8 *)hexsalt, saltlen);
+}
+EXPORT_SYMBOL(dm_bht_set_salt);
+
+/**
+ * dm_bht_salt - returns the salt used, in hex
+ * @bht:      pointer to a dm_bht_create()d bht
+ * @hexsalt:  buffer to put salt into, of length DM_BHT_SALT_SIZE * 2 + 1.
+ */
+int dm_bht_salt(struct dm_bht *bht, char *hexsalt)
+{
+	dm_bht_bin_to_hex(bht->salt, (u8 *)hexsalt, sizeof(bht->salt));
+	return 0;
+}
+EXPORT_SYMBOL(dm_bht_salt);
+
diff --git a/drivers/md/dm-verity.c b/drivers/md/dm-verity.c
new file mode 100644
index 0000000..a9bd0e8
--- /dev/null
+++ b/drivers/md/dm-verity.c
@@ -0,0 +1,1043 @@
+/*
+ * Originally based on dm-crypt.c,
+ * Copyright (C) 2003 Christophe Saout <christophe@saout.de>
+ * Copyright (C) 2004 Clemens Fruhwirth <clemens@endorphin.org>
+ * Copyright (C) 2006-2008 Red Hat, Inc. All rights reserved.
+ * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ *                    All Rights Reserved.
+ *
+ * This file is released under the GPLv2.
+ *
+ * Implements a verifying transparent block device.
+ * See Documentation/device-mapper/dm-verity.txt
+ */
+#include <linux/async.h>
+#include <linux/atomic.h>
+#include <linux/bio.h>
+#include <linux/blkdev.h>
+#include <linux/delay.h>
+#include <linux/device.h>
+#include <linux/err.h>
+#include <linux/genhd.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/mempool.h>
+#include <linux/mm_types.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/workqueue.h>
+#include <linux/device-mapper.h>
+#include <linux/dm-bht.h>
+
+#include "dm-verity.h"
+
+#define DM_MSG_PREFIX "verity"
+
+/* Supports up to 512-bit digests */
+#define VERITY_MAX_DIGEST_SIZE 64
+
+/* TODO(wad) make both of these report the error line/file to a
+ *           verity_bug function.
+ */
+#define VERITY_BUG(msg...) BUG()
+#define VERITY_BUG_ON(cond, msg...) BUG_ON(cond)
+
+/* Helper for printing sector_t */
+#define ULL(x) ((unsigned long long)(x))
+
+#define MIN_IOS 32
+#define MIN_BIOS (MIN_IOS * 2)
+#define VERITY_DEFAULT_BLOCK_SIZE 4096
+
+/* Provide a lightweight means of specifying the global default for
+ * error behavior: eio, reboot, or none
+ * Legacy support for 0 = eio, 1 = reboot/panic, 2 = none, 3 = notify.
+ * This is matched to the enum in dm-verity.h.
+ */
+static const char * const allowed_error_behaviors[] = { "eio", "panic", "none",
+							"notify", NULL };
+static char *error_behavior = "eio";
+module_param(error_behavior, charp, 0644);
+MODULE_PARM_DESC(error_behavior, "Behavior on error "
+				 "(eio, panic, none, notify)");
+
+/* Controls whether verity_get_device will wait forever for a device. */
+static int dev_wait;
+module_param(dev_wait, bool, 0444);
+MODULE_PARM_DESC(dev_wait, "Wait forever for a backing device");
+
+/* per-requested-bio private data */
+enum verity_io_flags {
+	VERITY_IOFLAGS_CLONED = 0x1,	/* original bio has been cloned */
+};
+
+struct dm_verity_io {
+	struct dm_target *target;
+	struct bio *bio;
+	struct delayed_work work;
+	unsigned int flags;
+
+	int error;
+	atomic_t pending;
+
+	u64 block;  /* aligned block index */
+	u64 count;  /* aligned count in blocks */
+};
+
+struct verity_config {
+	struct dm_dev *dev;
+	sector_t start;
+	sector_t size;
+
+	struct dm_dev *hash_dev;
+	sector_t hash_start;
+
+	struct dm_bht bht;
+
+	/* Pool required for io contexts */
+	mempool_t *io_pool;
+	/* Pool and bios required for making sure that backing device reads are
+	 * in PAGE_SIZE increments.
+	 */
+	struct bio_set *bs;
+
+	char hash_alg[CRYPTO_MAX_ALG_NAME];
+
+	int error_behavior;
+};
+
+static struct kmem_cache *_verity_io_pool;
+static struct workqueue_struct *kveritydq, *kverityd_ioq;
+
+static void kverityd_verify(struct work_struct *work);
+static void kverityd_io(struct work_struct *work);
+static void kverityd_io_bht_populate(struct dm_verity_io *io);
+static void kverityd_io_bht_populate_end(struct bio *, int error);
+
+static BLOCKING_NOTIFIER_HEAD(verity_error_notifier);
+
+/*
+ * Exported interfaces
+ */
+
+int dm_verity_register_error_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_register(&verity_error_notifier, nb);
+}
+EXPORT_SYMBOL_GPL(dm_verity_register_error_notifier);
+
+int dm_verity_unregister_error_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_unregister(&verity_error_notifier, nb);
+}
+EXPORT_SYMBOL_GPL(dm_verity_unregister_error_notifier);
+
+/*
+ * Allocation and utility functions
+ */
+
+static void kverityd_src_io_read_end(struct bio *clone, int error);
+
+/* Shared destructor for all internal bios */
+static void dm_verity_bio_destructor(struct bio *bio)
+{
+	struct dm_verity_io *io = bio->bi_private;
+	struct verity_config *vc = io->target->private;
+	bio_free(bio, vc->bs);
+}
+
+static struct bio *verity_alloc_bioset(struct verity_config *vc, gfp_t gfp_mask,
+				       int nr_iovecs)
+{
+	return bio_alloc_bioset(gfp_mask, nr_iovecs, vc->bs);
+}
+
+static struct dm_verity_io *verity_io_alloc(struct dm_target *ti,
+					    struct bio *bio)
+{
+	struct verity_config *vc = ti->private;
+	sector_t sector = bio->bi_sector - ti->begin;
+	struct dm_verity_io *io;
+
+	io = mempool_alloc(vc->io_pool, GFP_NOIO);
+	if (unlikely(!io))
+		return NULL;
+	io->flags = 0;
+	io->target = ti;
+	io->bio = bio;
+	io->error = 0;
+
+	/* Adjust the sector by the virtual starting sector */
+	io->block = to_bytes(sector) / vc->bht.block_size;
+	io->count = bio->bi_size / vc->bht.block_size;
+
+	atomic_set(&io->pending, 0);
+
+	return io;
+}
+
+static struct bio *verity_bio_clone(struct dm_verity_io *io)
+{
+	struct verity_config *vc = io->target->private;
+	struct bio *bio = io->bio;
+	struct bio *clone = verity_alloc_bioset(vc, GFP_NOIO, bio->bi_max_vecs);
+
+	if (!clone)
+		return NULL;
+
+	__bio_clone(clone, bio);
+	clone->bi_private = io;
+	clone->bi_end_io  = kverityd_src_io_read_end;
+	clone->bi_bdev    = vc->dev->bdev;
+	clone->bi_sector += vc->start - io->target->begin;
+	clone->bi_destructor = dm_verity_bio_destructor;
+
+	return clone;
+}
+
+/* If the request is not successful, this handler takes action.
+ * TODO make this call a registered handler.
+ */
+static void verity_error(struct verity_config *vc, struct dm_verity_io *io,
+			 int error)
+{
+	const char *message;
+	int error_mode = DM_VERITY_ERROR_BEHAVIOR_PANIC;
+	dev_t devt = 0;
+	u64 block = ~0;
+	int transient = 1;
+	struct dm_verity_error_state error_state;
+
+	if (vc) {
+		devt = vc->dev->bdev->bd_dev;
+		error_mode = vc->error_behavior;
+	}
+
+	if (io) {
+		io->error = -EIO;
+		block = io->block;
+	}
+
+	switch (error) {
+	case -ENOMEM:
+		message = "out of memory";
+		break;
+	case -EBUSY:
+		message = "pending data seen during verify";
+		break;
+	case -EFAULT:
+		message = "crypto operation failure";
+		break;
+	case -EACCES:
+		message = "integrity failure";
+		/* Image is bad. */
+		transient = 0;
+		break;
+	case -EPERM:
+		message = "hash tree population failure";
+		/* Should be dm-bht specific errors */
+		transient = 0;
+		break;
+	case -EINVAL:
+		message = "unexpected missing/invalid data";
+		/* The device was configured incorrectly - fallback. */
+		transient = 0;
+		break;
+	default:
+		/* Other errors can be passed through as IO errors */
+		message = "unknown or I/O error";
+		return;
+	}
+
+	DMERR_LIMIT("verification failure occurred: %s", message);
+
+	if (error_mode == DM_VERITY_ERROR_BEHAVIOR_NOTIFY) {
+		error_state.code = error;
+		error_state.transient = transient;
+		error_state.block = block;
+		error_state.message = message;
+		error_state.dev_start = vc->start;
+		error_state.dev_len = vc->size;
+		error_state.dev = vc->dev->bdev;
+		error_state.hash_dev_start = vc->hash_start;
+		error_state.hash_dev_len = vc->bht.sectors;
+		error_state.hash_dev = vc->hash_dev->bdev;
+
+		/* Set default fallthrough behavior. */
+		error_state.behavior = DM_VERITY_ERROR_BEHAVIOR_PANIC;
+		error_mode = DM_VERITY_ERROR_BEHAVIOR_PANIC;
+
+		if (!blocking_notifier_call_chain(
+		    &verity_error_notifier, transient, &error_state)) {
+			error_mode = error_state.behavior;
+		}
+	}
+
+	switch (error_mode) {
+	case DM_VERITY_ERROR_BEHAVIOR_EIO:
+		break;
+	case DM_VERITY_ERROR_BEHAVIOR_NONE:
+		if (error != -EIO && io)
+			io->error = 0;
+		break;
+	default:
+		goto do_panic;
+	}
+	return;
+
+do_panic:
+	panic("dm-verity failure: "
+	      "device:%u:%u error:%d block:%llu message:%s",
+	      MAJOR(devt), MINOR(devt), error, ULL(block), message);
+}
+
+/**
+ * verity_parse_error_behavior - parse a behavior charp to the enum
+ * @behavior:	NUL-terminated char array
+ *
+ * Checks if the behavior is valid either as text or as an index digit
+ * and returns the proper enum value or -1 on error.
+ */
+static int verity_parse_error_behavior(const char *behavior)
+{
+	const char * const *allowed = allowed_error_behaviors;
+	char index = '0';
+
+	for (; *allowed; allowed++, index++)
+		if (!strcmp(*allowed, behavior) || behavior[0] == index)
+			break;
+
+	if (!*allowed)
+		return -1;
+
+	/* Convert to the integer index matching the enum. */
+	return allowed - allowed_error_behaviors;
+}
+
+/*
+ * Reverse flow of requests into the device.
+ *
+ * (Start at the bottom with verity_map and work your way upward).
+ */
+
+static void verity_inc_pending(struct dm_verity_io *io);
+
+static void verity_return_bio_to_caller(struct dm_verity_io *io)
+{
+	struct verity_config *vc = io->target->private;
+
+	if (io->error)
+		verity_error(vc, io, io->error);
+
+	bio_endio(io->bio, io->error);
+	mempool_free(io, vc->io_pool);
+}
+
+/* Check for any missing bht hashes. */
+static bool verity_is_bht_populated(struct dm_verity_io *io)
+{
+	struct verity_config *vc = io->target->private;
+	u64 block;
+
+	for (block = io->block; block < io->block + io->count; ++block)
+		if (!dm_bht_is_populated(&vc->bht, block))
+			return false;
+
+	return true;
+}
+
+/* verity_dec_pending manages the lifetime of all dm_verity_io structs.
+ * Non-bug error handling and all passage from workqueue to workqueue
+ * are centralized through this interface.
+ */
+static void verity_dec_pending(struct dm_verity_io *io)
+{
+	if (!atomic_dec_and_test(&io->pending))
+		goto done;
+
+	if (unlikely(io->error))
+		goto io_error;
+
+	/* I/Os that were pending may now be ready */
+	if (verity_is_bht_populated(io)) {
+		INIT_DELAYED_WORK(&io->work, kverityd_verify);
+		queue_delayed_work(kveritydq, &io->work, 0);
+	} else {
+		INIT_DELAYED_WORK(&io->work, kverityd_io);
+		queue_delayed_work(kverityd_ioq, &io->work, HZ/10);
+	}
+
+done:
+	return;
+
+io_error:
+	verity_return_bio_to_caller(io);
+}
+
+/* Walks the data set and computes the hash of the data read from the
+ * untrusted source device.  The computed hash is then passed to dm-bht
+ * for verification.
+ */
+static int verity_verify(struct verity_config *vc,
+			 struct dm_verity_io *io)
+{
+	unsigned int block_size = vc->bht.block_size;
+	struct bio *bio = io->bio;
+	u64 block = io->block;
+	unsigned int idx;
+	int r;
+
+	for (idx = bio->bi_idx; idx < bio->bi_vcnt; idx++) {
+		struct bio_vec *bv = bio_iovec_idx(bio, idx);
+		unsigned int offset = bv->bv_offset;
+		unsigned int len = bv->bv_len;
+
+		VERITY_BUG_ON(offset % block_size);
+		VERITY_BUG_ON(len % block_size);
+
+		while (len) {
+			r = dm_bht_verify_block(&vc->bht, block,
+						bv->bv_page, offset);
+			if (r)
+				goto bad_return;
+
+			offset += block_size;
+			len -= block_size;
+			block++;
+			cond_resched();
+		}
+	}
+
+	return 0;
+
+bad_return:
+	/* dm_bht functions aren't expected to return errno-friendly
+	 * values.  They are converted here for uniformity.
+	 */
+	if (r > 0) {
+		DMERR("Pending data for block %llu seen at verify", ULL(block));
+		r = -EBUSY;
+	} else {
+		DMERR_LIMIT("Block hash does not match!");
+		r = -EACCES;
+	}
+	return r;
+}
+
+/* Services the verify workqueue */
+static void kverityd_verify(struct work_struct *work)
+{
+	struct delayed_work *dwork = container_of(work, struct delayed_work,
+						  work);
+	struct dm_verity_io *io = container_of(dwork, struct dm_verity_io,
+					       work);
+	struct verity_config *vc = io->target->private;
+
+	io->error = verity_verify(vc, io);
+
+	/* Free up the bio and tag with the return value */
+	verity_return_bio_to_caller(io);
+}
+
+/* Asynchronously called upon the completion of dm-bht I/O.  The status
+ * of the operation is passed back to dm-bht and the next steps are
+ * decided by verity_dec_pending.
+ */
+static void kverityd_io_bht_populate_end(struct bio *bio, int error)
+{
+	struct dm_bht_entry *entry = (struct dm_bht_entry *) bio->bi_private;
+	struct dm_verity_io *io = (struct dm_verity_io *) entry->io_context;
+
+	/* Tell the tree to atomically update now that we've populated
+	 * the given entry.
+	 */
+	dm_bht_read_completed(entry, error);
+
+	/* Clean up for reuse when reading data to be checked */
+	bio->bi_vcnt = 0;
+	bio->bi_io_vec->bv_offset = 0;
+	bio->bi_io_vec->bv_len = 0;
+	bio->bi_io_vec->bv_page = NULL;
+	/* Restore the private data to I/O so the destructor can be shared. */
+	bio->bi_private = (void *) io;
+	bio_put(bio);
+
+	/* We bail but assume the tree has been marked bad. */
+	if (unlikely(error)) {
+		DMERR("Failed to read for sector %llu (%u)",
+		      ULL(io->bio->bi_sector), io->bio->bi_size);
+		io->error = error;
+		/* Pass through the error to verity_dec_pending below */
+	}
+	/* When pending = 0, it will transition to reading real data */
+	verity_dec_pending(io);
+}
+
+/* Called by dm-bht (via dm_bht_populate), this function provides
+ * the message digests to dm-bht that are stored on disk.
+ */
+static int kverityd_bht_read_callback(void *ctx, sector_t start, u8 *dst,
+				      sector_t count,
+				      struct dm_bht_entry *entry)
+{
+	struct dm_verity_io *io = ctx;  /* I/O for this batch */
+	struct verity_config *vc;
+	struct bio *bio;
+
+	vc = io->target->private;
+
+	/* The I/O context is nested inside the entry so that we don't need one
+	 * io context per page read.
+	 */
+	entry->io_context = ctx;
+
+	/* We should only get page size requests at present. */
+	verity_inc_pending(io);
+	bio = verity_alloc_bioset(vc, GFP_NOIO, 1);
+	if (unlikely(!bio)) {
+		DMCRIT("Out of memory at bio_alloc_bioset");
+		dm_bht_read_completed(entry, -ENOMEM);
+		return -ENOMEM;
+	}
+	bio->bi_private = (void *) entry;
+	bio->bi_idx = 0;
+	bio->bi_size = vc->bht.block_size;
+	bio->bi_sector = vc->hash_start + start;
+	bio->bi_bdev = vc->hash_dev->bdev;
+	bio->bi_end_io = kverityd_io_bht_populate_end;
+	bio->bi_rw = REQ_META;
+	/* Only need to free the bio since the page is managed by bht */
+	bio->bi_destructor = dm_verity_bio_destructor;
+	bio->bi_vcnt = 1;
+	bio->bi_io_vec->bv_offset = offset_in_page(dst);
+	bio->bi_io_vec->bv_len = to_bytes(count);
+	/* dst is guaranteed to be a page_pool allocation */
+	bio->bi_io_vec->bv_page = virt_to_page(dst);
+	/* Track that this I/O is in use.  There should be no risk of the io
+	 * being removed prior since this is called synchronously.
+	 */
+	generic_make_request(bio);
+	return 0;
+}
+
+/* Submits an io request for each missing block of block hashes.
+ * The last one to return will then enqueue this on the io workqueue.
+ */
+static void kverityd_io_bht_populate(struct dm_verity_io *io)
+{
+	struct verity_config *vc = io->target->private;
+	u64 block;
+
+	for (block = io->block; block < io->block + io->count; ++block) {
+		int ret = dm_bht_populate(&vc->bht, io, block);
+
+		if (ret < 0) {
+			/* verity_dec_pending will handle the error case. */
+			io->error = ret;
+			break;
+		}
+	}
+}
+
+/* Asynchronously called upon the completion of I/O issued
+ * from kverityd_src_io_read. verity_dec_pending() acts as
+ * the scheduler/flow manager.
+ */
+static void kverityd_src_io_read_end(struct bio *clone, int error)
+{
+	struct dm_verity_io *io = clone->bi_private;
+
+	if (unlikely(!bio_flagged(clone, BIO_UPTODATE) && !error))
+		error = -EIO;
+
+	if (unlikely(error)) {
+		DMERR("Error occurred: %d (%llu, %u)",
+			error, ULL(clone->bi_sector), clone->bi_size);
+		io->error = error;
+	}
+
+	/* Release the clone, which keeps the block layer from leaving
+	 * offsets, etc. in unexpected states.
+	 */
+	bio_put(clone);
+
+	verity_dec_pending(io);
+}
+
+/* If not yet underway, an I/O request will be issued to the vc->dev
+ * device for the data needed. It is cloned to avoid unexpected changes
+ * to the original bio struct.
+ */
+static void kverityd_src_io_read(struct dm_verity_io *io)
+{
+	struct bio *clone;
+
+	/* Check if the read is already issued. */
+	if (io->flags & VERITY_IOFLAGS_CLONED)
+		return;
+
+	io->flags |= VERITY_IOFLAGS_CLONED;
+
+	/* Clone the bio. The block layer may modify the bvec array. */
+	clone = verity_bio_clone(io);
+	if (unlikely(!clone)) {
+		io->error = -ENOMEM;
+		return;
+	}
+
+	verity_inc_pending(io);
+
+	generic_make_request(clone);
+}
+
+/* kverityd_io services the I/O workqueue. For each pass through
+ * the I/O workqueue, a call to populate both the origin drive
+ * data and the hash tree data is made.
+ */
+static void kverityd_io(struct work_struct *work)
+{
+	struct delayed_work *dwork = container_of(work, struct delayed_work,
+						  work);
+	struct dm_verity_io *io = container_of(dwork, struct dm_verity_io,
+					       work);
+
+	/* Issue requests asynchronously. */
+	verity_inc_pending(io);
+	kverityd_src_io_read(io);
+	kverityd_io_bht_populate(io);
+	verity_dec_pending(io);
+}
+
+/* Paired with verity_dec_pending, the pending value in the io dictates the
+ * lifetime of a request and when it is ready to be processed on the
+ * workqueues.
+ */
+static void verity_inc_pending(struct dm_verity_io *io)
+{
+	atomic_inc(&io->pending);
+}
+
+/* Block-level requests start here. */
+static int verity_map(struct dm_target *ti, struct bio *bio,
+		      union map_info *map_context)
+{
+	struct dm_verity_io *io;
+	struct verity_config *vc;
+	struct request_queue *r_queue;
+
+	if (unlikely(!ti)) {
+		DMERR("dm_target was NULL");
+		return -EIO;
+	}
+
+	vc = ti->private;
+	r_queue = bdev_get_queue(vc->dev->bdev);
+
+	if (bio_data_dir(bio) == WRITE) {
+		/* If we silently drop writes, then the VFS layer will cache
+		 * the write and persist it in memory. While it doesn't change
+		 * the underlying storage, it still may be contrary to the
+		 * behavior expected by a verified, read-only device.
+		 */
+		DMWARN_LIMIT("write request received. rejecting with -EIO.");
+		verity_error(vc, NULL, -EIO);
+		return -EIO;
+	} else {
+		/* Queue up the request to be verified */
+		io = verity_io_alloc(ti, bio);
+		if (!io) {
+			DMERR_LIMIT("Failed to allocate and init IO data");
+			return DM_MAPIO_REQUEUE;
+		}
+		INIT_DELAYED_WORK(&io->work, kverityd_io);
+		queue_delayed_work(kverityd_ioq, &io->work, 0);
+	}
+
+	return DM_MAPIO_SUBMITTED;
+}
+
+static void splitarg(char *arg, char **key, char **val)
+{
+	*key = strsep(&arg, "=");
+	*val = strsep(&arg, "");
+}
+
+/*
+ * Non-block interfaces and device-mapper specific code
+ */
+
+/**
+ * verity_ctr - Construct a verified mapping
+ * @ti:   Target being created
+ * @argc: Number of elements in argv
+ * @argv: Vector of key-value pairs (see below).
+ *
+ * Accepts the following keys:
+ * @payload:        hashed device
+ * @hashtree:       device hashtree is stored on
+ * @hashstart:      start address of hashes (default 0)
+ * @block_size:     size of a hash block
+ * @alg:            hash algorithm
+ * @root_hexdigest: top-level hash of the tree
+ * @error_behavior: what to do when verification fails [optional]
+ * @salt:           salt, in hex [optional]
+ *
+ * E.g.,
+ * payload=/dev/sda2 hashtree=/dev/sda3 alg=sha256
+ * root_hexdigest=f08aa4a3695290c569eb1b0ac032ae1040150afb527abbeb0a3da33d82fb2c6e
+ *
+ * TODO(wad):
+ * - Boot time addition
+ * - Track block verification to free block_hashes if memory use is a concern
+ * Testing needed:
+ * - Regular slub_debug tracing (on checkins)
+ * - Improper block hash padding
+ * - Improper bundle padding
+ * - Improper hash layout
+ * - Missing padding at end of device
+ * - Improperly sized underlying devices
+ * - Out of memory conditions (make sure this isn't too flaky under high load!)
+ * - Incorrect superhash
+ * - Incorrect block hashes
+ * - Incorrect bundle hashes
+ * - Boot-up read speed; sustained read speeds
+ */
+static int verity_ctr(struct dm_target *ti, unsigned int argc, char **argv)
+{
+	struct verity_config *vc = NULL;
+	int ret = 0;
+	sector_t blocks;
+	unsigned int block_size = VERITY_DEFAULT_BLOCK_SIZE;
+	const char *payload = NULL;
+	const char *hashtree = NULL;
+	unsigned long hashstart = 0;
+	const char *alg = NULL;
+	const char *root_hexdigest = NULL;
+	const char *dev_error_behavior = error_behavior;
+	const char *hexsalt = "";
+	int i;
+
+	for (i = 0; i < argc; ++i) {
+		char *key, *val;
+		DMWARN("Argument %d: '%s'", i, argv[i]);
+		splitarg(argv[i], &key, &val);
+		if (!key) {
+			DMWARN("Bad argument %d: missing key?", i);
+			break;
+		}
+		if (!val) {
+			DMWARN("Bad argument %d='%s': missing value", i, key);
+			break;
+		}
+
+		if (!strcmp(key, "alg")) {
+			alg = val;
+		} else if (!strcmp(key, "payload")) {
+			payload = val;
+		} else if (!strcmp(key, "hashtree")) {
+			hashtree = val;
+		} else if (!strcmp(key, "root_hexdigest")) {
+			root_hexdigest = val;
+		} else if (!strcmp(key, "hashstart")) {
+			if (strict_strtoul(val, 10, &hashstart)) {
+				ti->error = "Invalid hashstart";
+				return -EINVAL;
+			}
+		} else if (!strcmp(key, "block_size")) {
+			unsigned long tmp;
+			if (strict_strtoul(val, 10, &tmp) ||
+			    (tmp > UINT_MAX)) {
+				ti->error = "Invalid block_size";
+				return -EINVAL;
+			}
+			block_size = (unsigned int)tmp;
+		} else if (!strcmp(key, "error_behavior")) {
+			dev_error_behavior = val;
+		} else if (!strcmp(key, "salt")) {
+			hexsalt = val;
+		}
+	}
+
+#define NEEDARG(n) \
+	if (!(n)) { \
+		ti->error = "Missing argument: " #n; \
+		return -EINVAL; \
+	}
+
+	NEEDARG(alg);
+	NEEDARG(payload);
+	NEEDARG(hashtree);
+	NEEDARG(root_hexdigest);
+
+#undef NEEDARG
+
+	/* The device mapper device should be setup read-only */
+	if ((dm_table_get_mode(ti->table) & ~FMODE_READ) != 0) {
+		ti->error = "Must be created readonly.";
+		return -EINVAL;
+	}
+
+	vc = kzalloc(sizeof(*vc), GFP_KERNEL);
+	if (!vc) {
+		/* TODO(wad) if this is called from the setup helper, then we
+		 * catch these errors and do a CrOS specific thing. if not, we
+		 * need to have this call the error handler.
+		 */
+		return -EINVAL;
+	}
+
+	/* Calculate the blocks from the given device size */
+	vc->size = ti->len;
+	blocks = to_bytes(vc->size) / block_size;
+	if (dm_bht_create(&vc->bht, blocks, block_size, alg)) {
+		DMERR("failed to create required bht");
+		goto bad_bht;
+	}
+	if (dm_bht_set_root_hexdigest(&vc->bht, root_hexdigest)) {
+		DMERR("root hexdigest error");
+		goto bad_root_hexdigest;
+	}
+	dm_bht_set_salt(&vc->bht, hexsalt);
+	vc->bht.read_cb = kverityd_bht_read_callback;
+
+	/* payload: device to verify */
+	vc->start = 0;  /* TODO: should this support a starting offset? */
+	/* We only ever grab the device in read-only mode. */
+	ret = dm_get_device(ti, payload,
+			    dm_table_get_mode(ti->table), &vc->dev);
+	if (ret) {
+		DMERR("Failed to acquire device '%s': %d", payload, ret);
+		ti->error = "Device lookup failed";
+		goto bad_verity_dev;
+	}
+
+	if ((to_bytes(vc->start) % block_size) ||
+	    (to_bytes(vc->size) % block_size)) {
+		ti->error = "Device must be block_size divisible/aligned";
+		goto bad_hash_start;
+	}
+
+	vc->hash_start = (sector_t)hashstart;
+
+	/* hashtree: device with hashes.
+	 * Note, payload == hashtree is okay as long as the size of
+	 *       ti->len passed to device mapper does not include
+	 *       the hashes.
+	 */
+	if (dm_get_device(ti, hashtree,
+			  dm_table_get_mode(ti->table), &vc->hash_dev)) {
+		ti->error = "Hash device lookup failed";
+		goto bad_hash_dev;
+	}
+
+	/* alg: cryptographic digest algorithm */
+	if (snprintf(vc->hash_alg, CRYPTO_MAX_ALG_NAME, "%s", alg) >=
+	    CRYPTO_MAX_ALG_NAME) {
+		ti->error = "Hash algorithm name is too long";
+		goto bad_hash;
+	}
+
+	/* override with optional device-specific error behavior */
+	vc->error_behavior = verity_parse_error_behavior(dev_error_behavior);
+	if (vc->error_behavior == -1) {
+		ti->error = "Bad error_behavior supplied";
+		goto bad_err_behavior;
+	}
+
+	/* TODO: Maybe issue a request on the io queue for block 0? */
+
+	/* Argument processing is done, setup operational data */
+	/* Pool for dm_verity_io objects */
+	vc->io_pool = mempool_create_slab_pool(MIN_IOS, _verity_io_pool);
+	if (!vc->io_pool) {
+		ti->error = "Cannot allocate verity io mempool";
+		goto bad_slab_pool;
+	}
+
+	/* Allocate the bioset used for request padding */
+	/* TODO(wad) allocate a separate bioset for the first verify maybe */
+	vc->bs = bioset_create(MIN_BIOS, 0);
+	if (!vc->bs) {
+		ti->error = "Cannot allocate verity bioset";
+		goto bad_bs;
+	}
+
+	ti->num_flush_requests = 1;
+	ti->private = vc;
+
+	/* TODO(wad) add device and hash device names */
+	{
+		char hashdev[BDEVNAME_SIZE], vdev[BDEVNAME_SIZE];
+		bdevname(vc->hash_dev->bdev, hashdev);
+		bdevname(vc->dev->bdev, vdev);
+		DMINFO("dev:%s hash:%s [sectors:%llu blocks:%llu]", vdev,
+		       hashdev, ULL(vc->bht.sectors), ULL(blocks));
+	}
+	return 0;
+
+bad_bs:
+	mempool_destroy(vc->io_pool);
+bad_slab_pool:
+bad_err_behavior:
+bad_hash:
+	dm_put_device(ti, vc->hash_dev);
+bad_hash_dev:
+bad_hash_start:
+	dm_put_device(ti, vc->dev);
+bad_bht:
+bad_root_hexdigest:
+bad_verity_dev:
+	kfree(vc);   /* hash is not secret so no need to zero */
+	return -EINVAL;
+}
+
+static void verity_dtr(struct dm_target *ti)
+{
+	struct verity_config *vc = (struct verity_config *) ti->private;
+
+	bioset_free(vc->bs);
+	mempool_destroy(vc->io_pool);
+	dm_bht_destroy(&vc->bht);
+	dm_put_device(ti, vc->hash_dev);
+	dm_put_device(ti, vc->dev);
+	kfree(vc);
+}
+
+static int verity_status(struct dm_target *ti, status_type_t type,
+			char *result, unsigned int maxlen)
+{
+	struct verity_config *vc = (struct verity_config *) ti->private;
+	unsigned int sz = 0;
+	char hashdev[BDEVNAME_SIZE], vdev[BDEVNAME_SIZE];
+	u8 hexdigest[VERITY_MAX_DIGEST_SIZE * 2 + 1] = { 0 };
+
+	dm_bht_root_hexdigest(&vc->bht, hexdigest, sizeof(hexdigest));
+
+	switch (type) {
+	case STATUSTYPE_INFO:
+		break;
+	case STATUSTYPE_TABLE:
+		bdevname(vc->hash_dev->bdev, hashdev);
+		bdevname(vc->dev->bdev, vdev);
+		DMEMIT("/dev/%s /dev/%s %llu %u %s %s",
+			vdev,
+			hashdev,
+			ULL(vc->hash_start),
+			vc->bht.depth,
+			vc->hash_alg,
+			hexdigest);
+		break;
+	}
+	return 0;
+}
+
+static int verity_merge(struct dm_target *ti, struct bvec_merge_data *bvm,
+		       struct bio_vec *biovec, int max_size)
+{
+	struct verity_config *vc = ti->private;
+	struct request_queue *q = bdev_get_queue(vc->dev->bdev);
+
+	if (!q->merge_bvec_fn)
+		return max_size;
+
+	bvm->bi_bdev = vc->dev->bdev;
+	bvm->bi_sector = vc->start + bvm->bi_sector - ti->begin;
+
+	/* Optionally, this could just return 0 to stick to single pages. */
+	return min(max_size, q->merge_bvec_fn(q, bvm, biovec));
+}
+
+static int verity_iterate_devices(struct dm_target *ti,
+				 iterate_devices_callout_fn fn, void *data)
+{
+	struct verity_config *vc = ti->private;
+
+	return fn(ti, vc->dev, vc->start, ti->len, data);
+}
+
+static void verity_io_hints(struct dm_target *ti,
+			    struct queue_limits *limits)
+{
+	struct verity_config *vc = ti->private;
+	unsigned int block_size = vc->bht.block_size;
+
+	limits->logical_block_size = block_size;
+	limits->physical_block_size = block_size;
+	blk_limits_io_min(limits, block_size);
+}
+
+static struct target_type verity_target = {
+	.name   = "verity",
+	.version = {0, 1, 0},
+	.module = THIS_MODULE,
+	.ctr    = verity_ctr,
+	.dtr    = verity_dtr,
+	.map    = verity_map,
+	.merge  = verity_merge,
+	.status = verity_status,
+	.iterate_devices = verity_iterate_devices,
+	.io_hints = verity_io_hints,
+};
+
+#define VERITY_WQ_FLAGS (WQ_CPU_INTENSIVE|WQ_HIGHPRI)
+
+static int __init dm_verity_init(void)
+{
+	int r = -ENOMEM;
+
+	_verity_io_pool = KMEM_CACHE(dm_verity_io, 0);
+	if (!_verity_io_pool) {
+		DMERR("failed to allocate pool dm_verity_io");
+		goto bad_io_pool;
+	}
+
+	kverityd_ioq = alloc_workqueue("kverityd_io", VERITY_WQ_FLAGS, 1);
+	if (!kverityd_ioq) {
+		DMERR("failed to create workqueue kverityd_ioq");
+		goto bad_io_queue;
+	}
+
+	kveritydq = alloc_workqueue("kverityd", VERITY_WQ_FLAGS, 1);
+	if (!kveritydq) {
+		DMERR("failed to create workqueue kveritydq");
+		goto bad_verify_queue;
+	}
+
+	r = dm_register_target(&verity_target);
+	if (r < 0) {
+		DMERR("register failed %d", r);
+		goto register_failed;
+	}
+
+	DMINFO("version %u.%u.%u loaded", verity_target.version[0],
+	       verity_target.version[1], verity_target.version[2]);
+
+	return r;
+
+register_failed:
+	destroy_workqueue(kveritydq);
+bad_verify_queue:
+	destroy_workqueue(kverityd_ioq);
+bad_io_queue:
+	kmem_cache_destroy(_verity_io_pool);
+bad_io_pool:
+	return r;
+}
+
+static void __exit dm_verity_exit(void)
+{
+	destroy_workqueue(kveritydq);
+	destroy_workqueue(kverityd_ioq);
+
+	dm_unregister_target(&verity_target);
+	kmem_cache_destroy(_verity_io_pool);
+}
+
+module_init(dm_verity_init);
+module_exit(dm_verity_exit);
+
+MODULE_AUTHOR("The Chromium OS Authors <chromium-os-dev@chromium.org>");
+MODULE_DESCRIPTION(DM_NAME " target for transparent disk integrity checking");
+MODULE_LICENSE("GPL");
diff --git a/drivers/md/dm-verity.h b/drivers/md/dm-verity.h
new file mode 100644
index 0000000..e0664c9
--- /dev/null
+++ b/drivers/md/dm-verity.h
@@ -0,0 +1,45 @@
+/*
+ * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ *                    All Rights Reserved.
+ *
+ * This file is released under the GPLv2.
+ *
+ * Provide error types for use when creating a custom error handler.
+ * See Documentation/device-mapper/dm-verity.txt
+ */
+#ifndef DM_VERITY_H
+#define DM_VERITY_H
+
+#include <linux/notifier.h>
+
+struct dm_verity_error_state {
+	int code;
+	int transient;  /* Likely to not happen after a reboot */
+	u64 block;
+	const char *message;
+
+	sector_t dev_start;
+	sector_t dev_len;
+	struct block_device *dev;
+
+	sector_t hash_dev_start;
+	sector_t hash_dev_len;
+	struct block_device *hash_dev;
+
+	/* Final behavior after all notifications are completed. */
+	int behavior;
+};
+
+/* This enum must be matched to allowed_error_behaviors in dm-verity.c */
+enum dm_verity_error_behavior {
+	DM_VERITY_ERROR_BEHAVIOR_EIO = 0,
+	DM_VERITY_ERROR_BEHAVIOR_PANIC,
+	DM_VERITY_ERROR_BEHAVIOR_NONE,
+	DM_VERITY_ERROR_BEHAVIOR_NOTIFY
+};
+
+
+int dm_verity_register_error_notifier(struct notifier_block *nb);
+int dm_verity_unregister_error_notifier(struct notifier_block *nb);
+
+#endif  /* DM_VERITY_H */
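+
+/* A minimal sketch of a notifier consumer (illustrative only; the names
+ * below are hypothetical).  The void * argument passed to the callback
+ * is assumed to be the dm_verity_error_state above, with ->behavior
+ * rewritten by the notifier to choose the final action:
+ *
+ *   static int my_verity_recover(struct notifier_block *nb,
+ *                                unsigned long action, void *data)
+ *   {
+ *           struct dm_verity_error_state *err = data;
+ *
+ *           if (err->transient)
+ *                   err->behavior = DM_VERITY_ERROR_BEHAVIOR_EIO;
+ *           else
+ *                   err->behavior = DM_VERITY_ERROR_BEHAVIOR_PANIC;
+ *           return NOTIFY_OK;
+ *   }
+ *
+ *   static struct notifier_block my_verity_nb = {
+ *           .notifier_call = my_verity_recover,
+ *   };
+ *
+ *   ...
+ *   dm_verity_register_error_notifier(&my_verity_nb);
+ */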
diff --git a/include/linux/dm-bht.h b/include/linux/dm-bht.h
new file mode 100644
index 0000000..3a4b432
--- /dev/null
+++ b/include/linux/dm-bht.h
@@ -0,0 +1,166 @@
+/*
+ * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ *
+ * Device-Mapper block hash tree interface.
+ * See Documentation/device-mapper/dm-bht.txt for details.
+ *
+ * This file is released under the GPLv2.
+ */
+#ifndef __LINUX_DM_BHT_H
+#define __LINUX_DM_BHT_H
+
+#include <crypto/hash.h>
+#include <linux/compiler.h>
+#include <linux/types.h>
+
+/* To avoid allocating memory for digest tests, we just set up a
+ * maximum to use for now.
+ */
+#define DM_BHT_MAX_DIGEST_SIZE 128  /* 1024-bit digests are unlikely for now */
+#define DM_BHT_SALT_SIZE       32   /* 256 bits of salt is a lot */
+
+/* UNALLOCATED, PENDING, READY, and VERIFIED are valid states. All other
+ * values are entry-related return codes.
+ */
+#define DM_BHT_ENTRY_VERIFIED 8  /* 'nodes' has been checked against parent */
+#define DM_BHT_ENTRY_READY 4  /* 'nodes' is loaded and available */
+#define DM_BHT_ENTRY_PENDING 2  /* 'nodes' is being loaded */
+#define DM_BHT_ENTRY_UNALLOCATED 0 /* untouched */
+#define DM_BHT_ENTRY_ERROR -1 /* entry is unsuitable for use */
+#define DM_BHT_ENTRY_ERROR_IO -2 /* I/O error on load */
+
+/* Additional possible return codes */
+#define DM_BHT_ENTRY_ERROR_MISMATCH -3 /* Digest mismatch */
+
+/* dm_bht_entry
+ * Contains dm_bht->node_count tree nodes at a given tree depth.
+ * state is used to transactionally assure that data is paged in
+ * from disk.  Since dm_bht does not keep running crypto contexts for
+ * each level, the data must be loaded for on-demand verification.
+ */
+struct dm_bht_entry {
+	atomic_t state; /* see defines */
+	/* Keeping an extra pointer per entry wastes up to ~33k of
+	 * memory if 1M blocks are used (or ~66k on a 64-bit arch)
+	 */
+	void *io_context;  /* Reserve a pointer for use during io */
+	/* data should only be non-NULL if fully populated. */
+	void *nodes;  /* The hash data used to verify the children.
+		       * Guaranteed to be page-aligned.
+		       */
+};
+
+/* dm_bht_level
+ * Contains an array of entries which represent a page of hashes where
+ * each hash is a node in the tree at the given tree depth/level.
+ */
+struct dm_bht_level {
+	struct dm_bht_entry *entries;  /* array of entries of tree nodes */
+	unsigned int count;  /* number of entries at this level */
+	sector_t sector;  /* starting sector for this level */
+};
+
+/* opaque context, start, databuf, sector_count */
+typedef int(*dm_bht_callback)(void *,  /* external context */
+			      sector_t,  /* start sector */
+			      u8 *,  /* destination page */
+			      sector_t,  /* num sectors */
+			      struct dm_bht_entry *);
+/* dm_bht - Device mapper block hash tree
+ * dm_bht provides a fixed interface for comparing data blocks
+ * against cryptographic hashes stored in a hash tree. It
+ * optimizes the tree structure for storage on disk.
+ *
+ * The tree is built from the bottom up.  A collection of data,
+ * external to the tree, is hashed and these hashes are stored
+ * as the blocks in the tree.  For some number of these hashes,
+ * a parent node is created by hashing them.  These steps are
+ * repeated.
+ *
+ * TODO(wad): All hash storage memory is pre-allocated and freed once an
+ * entire branch has been verified.
+ */
+struct dm_bht {
+	/* Configured values */
+	int depth;  /* Depth of the tree including the root */
+	unsigned int block_count;  /* Number of blocks hashed */
+	unsigned int block_size;  /* Size of a hash block */
+	char hash_alg[CRYPTO_MAX_ALG_NAME];
+	unsigned char salt[DM_BHT_SALT_SIZE];
+
+	/* Computed values */
+	unsigned int node_count;  /* Data size (in hashes) for each entry */
+	unsigned int node_count_shift;  /* first bit set - 1 */
+	/* There is one per CPU so that verified can be simultaneous. */
+	struct shash_desc *hash_desc[NR_CPUS];  /* Container for the hash alg */
+	unsigned int digest_size;
+	sector_t sectors;  /* Number of disk sectors used */
+
+	/* bool verified;  Full tree is verified */
+	u8 root_digest[DM_BHT_MAX_DIGEST_SIZE];
+	struct dm_bht_level *levels;  /* in reverse order */
+	/* Callback for reading from the hash device */
+	dm_bht_callback read_cb;
+};
+
+/* Constructor for struct dm_bht instances. */
+int dm_bht_create(struct dm_bht *bht,
+		  unsigned int block_count,
+		  unsigned int block_size,
+		  const char *alg_name);
+/* Destructor for struct dm_bht instances.  Does not free @bht */
+void dm_bht_destroy(struct dm_bht *bht);
+
+/* Basic accessors for struct dm_bht */
+int dm_bht_set_root_hexdigest(struct dm_bht *bht, const u8 *hexdigest);
+int dm_bht_root_hexdigest(struct dm_bht *bht, u8 *hexdigest, int available);
+void dm_bht_set_salt(struct dm_bht *bht, const char *hexsalt);
+int dm_bht_salt(struct dm_bht *bht, char *hexsalt);
+
+/* Functions for loading in data from disk for verification */
+bool dm_bht_is_populated(struct dm_bht *bht, unsigned int block);
+int dm_bht_populate(struct dm_bht *bht, void *read_cb_ctx,
+		    unsigned int block);
+int dm_bht_verify_block(struct dm_bht *bht, unsigned int block,
+			struct page *pg, unsigned int offset);
+void dm_bht_read_completed(struct dm_bht_entry *entry, int status);
+
+/* Functions for converting indices to nodes. */
+
+static inline unsigned int dm_bht_get_level_shift(struct dm_bht *bht,
+						  int depth)
+{
+	return (bht->depth - depth) * bht->node_count_shift;
+}
+
+/* For the given depth, this is the entry index.  Evaluated at depth+1,
+ * it is the node index (modulo node_count) within the entry at that depth.
+ */
+static inline unsigned int dm_bht_index_at_level(struct dm_bht *bht,
+							int depth,
+							unsigned int leaf)
+{
+	return leaf >> dm_bht_get_level_shift(bht, depth);
+}
+
+static inline struct dm_bht_entry *dm_bht_get_entry(struct dm_bht *bht,
+						    int depth,
+						    unsigned int block)
+{
+	unsigned int index = dm_bht_index_at_level(bht, depth, block);
+	struct dm_bht_level *level = &bht->levels[depth];
+
+	return &level->entries[index];
+}
+
+static inline void *dm_bht_get_node(struct dm_bht *bht,
+				  struct dm_bht_entry *entry,
+				  int depth,
+				  unsigned int block)
+{
+	unsigned int index = dm_bht_index_at_level(bht, depth, block);
+	unsigned int node_index = index % bht->node_count;
+
+	return entry->nodes + (node_index * bht->digest_size);
+}
+#endif  /* __LINUX_DM_BHT_H */
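+
+/* Worked example of the index arithmetic above, assuming sha256
+ * (32-byte digests) and 4096-byte pages: node_count = 4096 / 32 = 128,
+ * so node_count_shift = 7.  For a tree of depth 3 and leaf block 70000:
+ *
+ *   dm_bht_index_at_level(bht, 2, 70000) = 70000 >> 7  = 546
+ *   dm_bht_index_at_level(bht, 1, 70000) = 70000 >> 14 = 4
+ *   dm_bht_index_at_level(bht, 0, 70000) = 70000 >> 21 = 0
+ *
+ * The leaf's hash is node 70000 % 128 = 112 of entry 546 at the deepest
+ * level; that entry's block is in turn node 546 % 128 = 34 of entry 4
+ * one level up, and so on to the single root entry.
+ */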
-- 
1.7.3.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH] dm: verity target
@ 2012-03-10  0:03 Mandeep Singh Baines
  0 siblings, 0 replies; 22+ messages in thread
From: Mandeep Singh Baines @ 2012-03-10  0:03 UTC (permalink / raw)
  To: Alasdair G Kergon, dm-devel, linux-kernel
  Cc: Mandeep Singh Baines, Will Drewry, Elly Jones, Milan Broz,
	Olof Johansson, Steffen Klassert, Andrew Morton, Mikulas Patocka,
	Tejun Heo

The verity target provides transparent integrity checking of block devices
using a cryptographic digest.

dm-verity is meant to be setup as part of a verified boot path.  This
may be anything ranging from a boot using tboot or trustedgrub to just
booting from a known-good device (like a USB drive or CD).

dm-verity is part of ChromeOS's verified boot path. It is used to verify
the integrity of the root filesystem on boot. The root filesystem is
mounted on a dm-verity partition which transparently verifies each block
with a bootloader verified hash passed into the kernel at boot.

Changes in V7:
* https://lkml.org/lkml/2012/3/8/454 (Mikulas Patocka)
  * Reported bug: can't assume workqueues don't migrate CPUs.
  * https://lkml.org/lkml/2012/3/9/580 (Tejun Heo)
    * Suggested fixing by flushing workqueue in hotcpu notifier.
Changes in V6:
  * Fixed bug in rmmod. Was freeing the same object NR_CPUS times.
  * Fixed example in documentation.
Changes in V5:
* https://lkml.org/lkml/2012/2/29/421 (Mikulas Patocka)
  * Fixed off-by-one error.
  * Added support for filesystems bigger than 4G (bug fix).
* https://lkml.org/lkml/2012/2/29/426 (Andrew Morton)
  * Fixed checkpatch errors/warning.
  * Made code cpu-hotplug-aware.
  * Remove NULL check before calling kfree.
  * No longer checking __GFP_WAIT allocations.
  * Propagate io->error instead of always EIO.
  * Remove unneeded and undesirable casts of void.
  * Use DMERR_LIMIT on io errors to avoid spamming dmesg.
  * Flush workqueue on rmmod.
Changes in V4:
* Discussion over phone (Alasdair G Kergon)
 * copy _ioctl fix from dm-linear
 * verity_status format fixes to match dm conventions
 * s/dm-bht/verity_tree
 * put everything into dm-verity.c
 * ctr changed to dm conventions
 * use hex2bin
 * use conventional dm names for function
  * s/dm_//
  * for example: verity_ctr versus dm_verity_ctr
 * use per_cpu API
Changes in V3:
* Discussion over irc (Alasdair G Kergon)
  * Implement ioctl hook
Changes in V2:
* https://lkml.org/lkml/2011/11/10/85 (Steffen Klassert)
  * Use shash API instead of older hash API

Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: Elly Jones <ellyjones@chromium.org>
Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Milan Broz <mbroz@redhat.com>
Cc: Olof Johansson <olofj@chromium.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: dm-devel@redhat.com
---
 Documentation/device-mapper/verity.txt |  151 ++++
 drivers/md/Kconfig                     |   16 +
 drivers/md/Makefile                    |    1 +
 drivers/md/dm-verity.c                 | 1390 ++++++++++++++++++++++++++++++++
 4 files changed, 1558 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/device-mapper/verity.txt
 create mode 100644 drivers/md/dm-verity.c

diff --git a/Documentation/device-mapper/verity.txt b/Documentation/device-mapper/verity.txt
new file mode 100644
index 0000000..6729102
--- /dev/null
+++ b/Documentation/device-mapper/verity.txt
@@ -0,0 +1,151 @@
+dm-verity
+==========
+
+Device-Mapper's "verity" target provides transparent integrity checking of
+block devices using a cryptographic digest provided by the kernel crypto API.
+This target is read-only.
+
+Parameters:
+    <version> <dev> <hash_dev> <hash_start> <block_size> <alg> <digest> <salt>
+
+<version>
+    This is the version number of the on-disk format. Currently, there is
+    only version 0.
+
+<dev>
+    This is the device that is going to be integrity checked.  It may be
+    a subset of the full device as specified to dmsetup (start sector and
+    count).  It may be specified as a path, like /dev/sdaX, or a device
+    number, <major>:<minor>.
+
+<hash_dev>
+    This is the device that supplies the hash tree data.  It may be
+    specified similarly to the device path and may be the same device.  If the
+    same device is used, the hash offset should be outside of the dm-verity
+    configured device size.
+
+<hash_start>
+    This is the offset, in 512-byte sectors, from the start of hash_dev to
+    the root block of the hash tree.
+
+<block_size>
+    The size of a hash block. Also, the size of a block to be hashed.
+
+<alg>
+    The cryptographic hash algorithm used for this device.  This should
+    be the name of the algorithm, like "sha1".
+
+<digest>
+    The hexadecimal encoding of the cryptographic hash of all of the
+    neighboring nodes at the first level of the tree.  This hash must be
+    trusted, as there is no other check of authenticity beyond this point.
+
+<salt>
+    The hexadecimal encoding of the salt value.
+
+Theory of operation
+===================
+
+dm-verity is meant to be set up as part of a verified boot path.  This
+may be anything ranging from a boot using tboot or trustedgrub to just
+booting from a known-good device (like a USB drive or CD).
+
+When a dm-verity device is configured, it is expected that the caller
+has been authenticated in some way (cryptographic signatures, etc).
+After instantiation, all hashes will be verified on-demand during
+disk access.  If they cannot be verified up to the root node of the
+tree, the root hash, then the I/O will fail.  This should identify
+tampering with any data on the device, including the hash blocks.
+
+Cryptographic hashes are used to assert the integrity of the device on a
+per-block basis.  This allows for a lightweight hash computation on first read
+into the page cache.  Block hashes are stored linearly, aligned to the nearest
+block boundary.
+
+Hash Tree
+---------
+
+Each node in the tree is a cryptographic hash.  If it is a leaf node, the hash
+is of some block data on disk.  If it is an intermediary node, then the hash is
+of a number of child nodes.
+
+Each entry in the tree is a collection of neighboring nodes that fit in one
+block.  The number is determined based on block_size and the size of the
+selected cryptographic digest algorithm.  The hashes are linearly ordered in
+this entry and any unaligned trailing space is ignored but included when
+calculating the parent node.
+
+The tree looks something like:
+
+alg = sha256, num_blocks = 32768, block_size = 4096
+
+                                 [   root    ]
+                                /    . . .    \
+                     [entry_0]                 [entry_1]
+                    /  . . .  \                 . . .   \
+         [entry_0_0]   . . .  [entry_0_127]    . . . .  [entry_1_127]
+           / ... \             /   . . .  \             /           \
+     blk_0 ... blk_127  blk_16256   blk_16383      blk_32640 . . . blk_32767
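+
+As a sanity check of the arithmetic for this example: each node is a
+32-byte sha256 digest, so one 4096-byte entry holds 4096 / 32 = 128
+nodes and node_count_shift is 7.  With num_blocks = 32768, the tree
+needs ceil(log2(32768) / 7) = ceil(15 / 7) = 3 levels: 256 entries at
+the deepest level, 2 above that, and a single root entry, i.e.
+(256 + 2 + 1) * 4096 bytes (just over 1 MiB) of hash data on disk.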
+
+On-disk format
+==============
+
+Below is the recommended on-disk format. The verity kernel code does not
+read the on-disk header. It only reads the hash blocks which directly
+follow the header. It is expected that a user-space tool will verify the
+integrity of the verity_header and then invoke dmsetup with the correct
+parameters. Alternatively, the header can be omitted and the dmsetup
+parameters can be passed via the kernel command-line in a rooted chain
+of trust where the command-line is verified.
+
+The on-disk format is especially useful in cases where the hash blocks
+are on a separate partition. The magic number allows easy identification
+of the partition contents. Alternatively, the hash blocks can be stored
+in the same partition as the data to be verified. In such a configuration
+the filesystem on the partition would be sized a little smaller than the
+full partition, leaving room for the hash blocks.
+
+struct verity_header {
+       uint64_t magic = 0x7665726974790a00;
+       uint32_t version;
+       uint32_t block_size;
+       char digest[128]; /* in hex-ascii, null-terminated or 128-bytes */
+       char salt[128]; /* in hex-ascii, null-terminated or 128-bytes */
+}
+
+struct verity_header_block {
+	struct verity_header;
+	char unused[block_size - sizeof(struct verity_header) - sizeof(sig)];
+	char sig[128]; /* in hex-ascii, null-terminated or 128-bytes */
+}
+
+Directly following the header are the hash blocks which are stored a depth
+at a time (starting from the root), sorted in order of increasing index.
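+
+For illustration, a user-space check of the header before invoking
+dmsetup might look like the sketch below.  This is not the ChromeOS
+tool; the field endianness and the absence of compiler padding are
+assumptions here:
+
+  #include <stdint.h>
+  #include <stdio.h>
+
+  struct verity_header {
+          uint64_t magic;
+          uint32_t version;
+          uint32_t block_size;
+          char digest[128];
+          char salt[128];
+  };
+
+  static int verity_header_ok(FILE *f, struct verity_header *hdr)
+  {
+          if (fread(hdr, sizeof(*hdr), 1, f) != 1)
+                  return 0;
+          return hdr->magic == 0x7665726974790a00ULL && hdr->version == 0;
+  }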
+
+Usage
+=====
+
+The API provides mechanisms for reading and verifying a tree. When reading, all
+required data for the hash tree should be populated for a block before
+attempting a verify.  This can be done by calling verity_tree_populate().  When
+all data is ready, a call to verity_tree_verify_block() with the expected hash
+value will perform both the direct block hash check and the hashes of the
+parent and neighboring nodes where needed to ensure validity up to the root
+hash.  Note, verity_tree_set_digest() should be called before any verification
+attempts occur.
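+
+Schematically, the flow for one block looks like the sketch below
+(error handling and the asynchronous read completion are elided; see
+kverityd_io and kverityd_verify in dm-verity.c for the real
+sequencing):
+
+  r = verity_tree_populate(vt, io_ctx, block);  /* issue hash reads */
+  if (r < 0)
+          return r;
+  /* ... wait until verity_tree_is_populated(vt, block) ... */
+  return verity_tree_verify_block(vt, block, pg, offset);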
+
+Example
+=======
+
+Set up a device:
+[[
+  dmsetup create vroot --table \
+    "0 `blockdev --getsize /dev/sda1` "\
+    "verity /dev/sda1 /dev/sda2 0 4096 sha256 "\
+    "4392712ba01368efdf14b05c76f9e4df0d53664630b5d48632ed17a137f39076 "\
+    "1234000000000000000000000000000000000000000000000000000000000000"
+]]
+
+A command line tool is available to compute the hash tree and return the
+root hash value.
+  http://git.chromium.org/cgi-bin/gitweb.cgi?p=dm-verity.git;a=tree
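+
+Once loaded, the active mapping can be inspected with dmsetup; given
+the parameters above, the table line reads back something like:
+[[
+  dmsetup table vroot
+  0 <sectors> verity 0 <dev> <hash_dev> 0 4096 sha256 <digest> <salt>
+]]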
diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index faa4741..b8bb690 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -370,4 +370,20 @@ config DM_FLAKEY
        ---help---
          A target that intermittently fails I/O for debugging purposes.
 
+config DM_VERITY
+        tristate "Verity target support"
+        depends on BLK_DEV_DM
+        select CRYPTO
+        select CRYPTO_HASH
+        ---help---
+          This device-mapper target allows you to create a device that
+          transparently integrity checks the data on it. You'll need to
+          activate the digests you're going to use in the cryptoapi
+          configuration.
+
+          To compile this code as a module, choose M here: the module will
+          be called dm-verity.
+
+          If unsure, say N.
+
 endif # MD
diff --git a/drivers/md/Makefile b/drivers/md/Makefile
index 046860c..70a29af 100644
--- a/drivers/md/Makefile
+++ b/drivers/md/Makefile
@@ -39,6 +39,7 @@ obj-$(CONFIG_DM_SNAPSHOT)	+= dm-snapshot.o
 obj-$(CONFIG_DM_PERSISTENT_DATA)	+= persistent-data/
 obj-$(CONFIG_DM_MIRROR)		+= dm-mirror.o dm-log.o dm-region-hash.o
 obj-$(CONFIG_DM_LOG_USERSPACE)	+= dm-log-userspace.o
+obj-$(CONFIG_DM_VERITY)         += dm-verity.o
 obj-$(CONFIG_DM_ZERO)		+= dm-zero.o
 obj-$(CONFIG_DM_RAID)	+= dm-raid.o
 obj-$(CONFIG_DM_THIN_PROVISIONING)	+= dm-thin-pool.o
diff --git a/drivers/md/dm-verity.c b/drivers/md/dm-verity.c
new file mode 100644
index 0000000..6728315
--- /dev/null
+++ b/drivers/md/dm-verity.c
@@ -0,0 +1,1390 @@
+/*
+ * Originally based on dm-crypt.c,
+ * Copyright (C) 2003 Christophe Saout <christophe@saout.de>
+ * Copyright (C) 2004 Clemens Fruhwirth <clemens@endorphin.org>
+ * Copyright (C) 2006-2008 Red Hat, Inc. All rights reserved.
+ * Copyright (C) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ *                    All Rights Reserved.
+ *
+ * This file is released under the GPLv2.
+ *
+ * Implements a verifying transparent block device.
+ * See Documentation/device-mapper/verity.txt
+ */
+#include <crypto/hash.h>
+#include <linux/atomic.h>
+#include <linux/bio.h>
+#include <linux/blkdev.h>
+#include <linux/cpu.h>
+#include <linux/genhd.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/mempool.h>
+#include <linux/module.h>
+#include <linux/percpu.h>
+#include <linux/workqueue.h>
+#include <linux/device-mapper.h>
+
+
+#define DM_MSG_PREFIX "verity"
+
+
+/* Helper for printing sector_t */
+#define ULL(x) ((unsigned long long)(x))
+
+#define MIN_IOS 32
+#define MIN_BIOS (MIN_IOS * 2)
+
+/* To avoid allocating memory for digest tests, we just set up a
+ * maximum to use for now.
+ */
+#define VERITY_MAX_DIGEST_SIZE 64   /* Supports up to 512-bit digests */
+#define VERITY_SALT_SIZE       32   /* 256 bits of salt is a lot */
+
+/* UNALLOCATED, PENDING, READY, and VERIFIED are valid states. All other
+ * values are entry-related return codes.
+ */
+#define VERITY_TREE_ENTRY_VERIFIED 8  /* 'nodes' checked against parent */
+#define VERITY_TREE_ENTRY_READY 4  /* 'nodes' is loaded and available */
+#define VERITY_TREE_ENTRY_PENDING 2  /* 'nodes' is being loaded */
+#define VERITY_TREE_ENTRY_UNALLOCATED 0 /* untouched */
+#define VERITY_TREE_ENTRY_ERROR -1 /* entry is unsuitable for use */
+#define VERITY_TREE_ENTRY_ERROR_IO -2 /* I/O error on load */
+
+/* Additional possible return codes */
+#define VERITY_TREE_ENTRY_ERROR_MISMATCH -3 /* Digest mismatch */
+
+
+struct verity_io {
+	struct dm_target *target;
+	struct bio *bio;
+	struct delayed_work work;
+	unsigned int flags;
+
+	int error;
+	atomic_t pending;
+
+	u64 block;  /* aligned block index */
+	u64 count;  /* aligned count in blocks */
+};
+
+/* verity_tree_entry
+ * Contains verity_tree->node_count tree nodes at a given tree depth.
+ * state is used to transactionally assure that data is paged in
+ * from disk.  Since verity_tree does not keep running crypto contexts
+ * for each level, the data must be loaded for on-demand verification.
+ */
+struct verity_tree_entry {
+	atomic_t state; /* see defines */
+	/* Keeping an extra pointer per entry wastes up to ~33k of
+	 * memory if 1M blocks are used (or ~66k on a 64-bit arch)
+	 */
+	struct verity_io *io_context;  /* Reserve a pointer for use during io */
+	/* data should only be non-NULL if fully populated. */
+	void *nodes;  /* The hash data used to verify the children.
+		       * Guaranteed to be page-aligned.
+		       */
+};
+
+/* verity_tree_level
+ * Contains an array of entries which represent a page of hashes where
+ * each hash is a node in the tree at the given tree depth/level.
+ */
+struct verity_tree_level {
+	struct verity_tree_entry *entries;  /* array of entries of tree nodes */
+	unsigned int count;  /* number of entries at this level */
+	sector_t sector;  /* starting sector for this level */
+};
+
+/* opaque context, start, databuf, sector_count */
+typedef int(*verity_tree_callback)(void *,  /* external context */
+			      sector_t,  /* start sector */
+			      u8 *,  /* destination page */
+			      sector_t,  /* num sectors */
+			      struct verity_tree_entry *);
+/* verity_tree - Device mapper block hash tree
+ * verity_tree provides a fixed interface for comparing data blocks
+ * against cryptographic hashes stored in a hash tree. It
+ * optimizes the tree structure for storage on disk.
+ *
+ * The tree is built from the bottom up.  A collection of data,
+ * external to the tree, is hashed and these hashes are stored
+ * as the blocks in the tree.  For some number of these hashes,
+ * a parent node is created by hashing them.  These steps are
+ * repeated.
+ */
+struct verity_tree {
+	/* Configured values */
+	int depth;  /* Depth of the tree including the root */
+	unsigned int block_size;  /* Size of a hash block */
+	u64 block_count;  /* Number of blocks hashed */
+	char hash_alg[CRYPTO_MAX_ALG_NAME];
+	u8 salt[VERITY_SALT_SIZE];
+
+	/* Computed values */
+	unsigned int node_count;  /* Data size (in hashes) for each entry */
+	unsigned int node_count_shift;  /* first bit set - 1 */
+	struct crypto_shash *tfm; /* hash for this device */
+	unsigned int hash_desc_size;
+	sector_t sectors;  /* Number of disk sectors used */
+	u8 digest[VERITY_MAX_DIGEST_SIZE];
+	unsigned int digest_size;
+
+	struct verity_tree_level *levels;
+
+	/* Callback for reading from the hash device */
+	verity_tree_callback read_cb;
+};
+
+/* Flags carried in the per-bio verity_io */
+enum verity_io_flags {
+	VERITY_IOFLAGS_CLONED = 0x1,	/* original bio has been cloned */
+};
+
+struct verity_config {
+	struct dm_dev *dev;
+	sector_t start;
+	sector_t size;
+
+	struct dm_dev *hash_dev;
+	sector_t hash_start;
+
+	struct verity_tree vt;
+
+	/* Pool required for io contexts */
+	mempool_t *io_pool;
+	/* Pool and bios required for making sure that backing device reads are
+	 * in PAGE_SIZE increments.
+	 */
+	struct bio_set *bs;
+
+	char hash_alg[CRYPTO_MAX_ALG_NAME];
+};
+
+
+static struct kmem_cache *_verity_io_pool;
+static struct workqueue_struct *kveritydq, *kverityd_ioq;
+
+static DEFINE_PER_CPU(struct shash_desc *, verity_hash_desc);
+static DEFINE_PER_CPU(unsigned int, verity_hash_size);
+
+static void kverityd_verify(struct work_struct *work);
+static void kverityd_io(struct work_struct *work);
+static void kverityd_io_vt_populate(struct verity_io *io);
+static void kverityd_io_vt_populate_end(struct bio *, int error);
+
+
+/*
+ * Utilities
+ */
+
+static void bin2hex(char *dst, const u8 *src, size_t count)
+{
+	while (count-- > 0) {
+		sprintf(dst, "%02hhx", (int)*src);
+		dst += 2;
+		src++;
+	}
+}
+
+/*
+ * Verity Tree
+ */
+
+/* Functions for converting indices to nodes. */
+
+static inline unsigned int verity_tree_get_level_shift(struct verity_tree *vt,
+						  int depth)
+{
+	return (vt->depth - depth) * vt->node_count_shift;
+}
+
+/* For the given depth, this is the entry index.  Evaluated at depth+1,
+ * it is the node index (modulo node_count) within the entry at that depth.
+ */
+static inline u64 verity_tree_index_at_level(struct verity_tree *vt,
+					     int depth, u64 leaf)
+{
+	return leaf >> verity_tree_get_level_shift(vt, depth);
+}
+
+static inline struct verity_tree_entry *verity_tree_get_entry(
+		struct verity_tree *vt,
+		int depth, u64 block)
+{
+	u64 index = verity_tree_index_at_level(vt, depth, block);
+	struct verity_tree_level *level = &vt->levels[depth];
+
+	return &level->entries[index];
+}
+
+static inline void *verity_tree_get_node(struct verity_tree *vt,
+					 struct verity_tree_entry *entry,
+					 int depth, unsigned int block)
+{
+	u64 index = verity_tree_index_at_level(vt, depth, block);
+	unsigned int node_index = (unsigned int)index % vt->node_count;
+
+	return entry->nodes + (node_index * vt->digest_size);
+}
+
+/**
+ * verity_tree_compute_hash: hashes one block of data
+ */
+static int verity_tree_compute_hash(struct verity_tree *vt, struct page *pg,
+				    unsigned int offset, u8 *digest)
+{
+	struct shash_desc **hash_descp = &__get_cpu_var(verity_hash_desc);
+	unsigned int *hash_sizep = &__get_cpu_var(verity_hash_size);
+	struct shash_desc *hash_desc;
+	void *data;
+	int err;
+
+	if (!*hash_descp || *hash_sizep < vt->hash_desc_size) {
+		kfree(*hash_descp);
+		*hash_descp = kmalloc(vt->hash_desc_size, GFP_KERNEL);
+		if (!*hash_descp) {
+			*hash_sizep = 0;
+			return -ENOMEM;
+		}
+		*hash_sizep = vt->hash_desc_size;
+	}
+	hash_desc = *hash_descp;
+	hash_desc->tfm = vt->tfm;
+	hash_desc->flags = 0x0;
+
+	if (crypto_shash_init(hash_desc)) {
+		DMCRIT("failed to reinitialize crypto hash (proc:%d)",
+			smp_processor_id());
+		return -EINVAL;
+	}
+	data = kmap_atomic(pg);
+	err = crypto_shash_update(hash_desc, data + offset, vt->block_size);
+	kunmap_atomic(data);
+	if (err) {
+		DMCRIT("crypto_hash_update failed");
+		return -EINVAL;
+	}
+	if (crypto_shash_update(hash_desc, vt->salt, sizeof(vt->salt))) {
+		DMCRIT("crypto_hash_update failed");
+		return -EINVAL;
+	}
+	if (crypto_shash_final(hash_desc, digest)) {
+		DMCRIT("crypto_hash_final failed");
+		return -EINVAL;
+	}
+
+	return 0;
+}
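+
+/* For reference, the equivalent user-space computation of a leaf hash
+ * (the digest of one data block with the zero-padded salt appended),
+ * sketched for the sha256 case with OpenSSL's SHA256_* API; the tool
+ * that builds the tree is outside this patch:
+ *
+ *   SHA256_CTX ctx;
+ *   uint8_t digest[SHA256_DIGEST_LENGTH];
+ *
+ *   SHA256_Init(&ctx);
+ *   SHA256_Update(&ctx, block, block_size);  // one data block
+ *   SHA256_Update(&ctx, salt, 32);           // VERITY_SALT_SIZE bytes
+ *   SHA256_Final(digest, &ctx);
+ */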
+
+static int verity_tree_initialize_entries(struct verity_tree *vt)
+{
+	/* last represents the index of the last digest stored in the tree.
+	 * By walking the tree with that index, it is possible to compute the
+	 * total number of entries at each level.
+	 *
+	 * Since each entry will contain up to |node_count| nodes of the tree,
+	 * it is possible that the last index may not be at the end of a given
+	 * entry->nodes.  In that case, it is assumed the value is padded.
+	 *
+	 * Note, we treat both the tree root (1 hash) and the tree leaves
+	 * independently from the vt data structures.  Logically, the root is
+	 * depth=-1 and the block layer level is depth=vt->depth
+	 */
+	u64 last = vt->block_count - 1;
+	int depth;
+
+	/* check that the largest level->count can't result in an int overflow
+	 * on allocation or sector calculation.
+	 */
+	if (((last >> vt->node_count_shift) + 1) >
+	    UINT_MAX / max_t(unsigned long,
+			     sizeof(struct verity_tree_entry),
+			     (unsigned long)to_sector(vt->block_size))) {
+		DMCRIT("required entries %llu is too large", vt->block_count);
+		return -EINVAL;
+	}
+
+	/* Track the current sector location for each level so we don't have to
+	 * compute it during traversals.
+	 */
+	vt->sectors = 0;
+	for (depth = 0; depth < vt->depth; ++depth) {
+		struct verity_tree_level *level = &vt->levels[depth];
+
+		level->count = verity_tree_index_at_level(vt, depth, last) + 1;
+		level->entries = kcalloc(level->count,
+					 sizeof(struct verity_tree_entry),
+					 GFP_KERNEL);
+		if (!level->entries) {
+			DMERR("failed to allocate entries for depth %d", depth);
+			return -ENOMEM;
+		}
+		level->sector = vt->sectors;
+		vt->sectors += level->count * to_sector(vt->block_size);
+	}
+
+	return 0;
+}
+
+/**
+ * verity_tree_create - prepares @vt for use
+ * @vt:	          pointer to the verity_tree to initialize
+ * @block_count:  the number of block hashes / tree leaves
+ * @block_size:   size of a hash block, in bytes
+ * @alg_name:	  crypto hash algorithm name
+ *
+ * Returns 0 on success.
+ *
+ * Callers can offset into devices by storing the data in the io callbacks.
+ */
+static int verity_tree_create(struct verity_tree *vt, u64 block_count,
+			      unsigned int block_size, const char *alg_name)
+{
+	int status = 0;
+
+	vt->block_size = block_size;
+	/* Verify that PAGE_SIZE >= block_size >= SECTOR_SIZE. */
+	if ((block_size > PAGE_SIZE) ||
+	    (PAGE_SIZE % block_size) ||
+	    (to_sector(block_size) == 0))
+		return -EINVAL;
+
+	vt->tfm = crypto_alloc_shash(alg_name, 0, 0);
+	if (IS_ERR(vt->tfm)) {
+		DMERR("failed to allocate crypto hash '%s'", alg_name);
+		return -ENOMEM;
+	}
+	vt->hash_desc_size = sizeof(struct shash_desc) +
+		crypto_shash_descsize(vt->tfm);
+
+	vt->digest_size = crypto_shash_digestsize(vt->tfm);
+	/* We expect to be able to pack >=2 hashes into a block */
+	if (block_size / vt->digest_size < 2) {
+		DMERR("too few hashes fit in a block");
+		status = -EINVAL;
+		goto bad_arg;
+	}
+
+	if (vt->digest_size > VERITY_MAX_DIGEST_SIZE) {
+		DMERR("VERITY_MAX_DIGEST_SIZE too small for digest");
+		status = -EINVAL;
+		goto bad_arg;
+	}
+
+	/* Configure the tree */
+	vt->block_count = block_count;
+	if (block_count == 0) {
+		DMERR("block_count must be non-zero");
+		status = -EINVAL;
+		goto bad_arg;
+	}
+
+	/* Each verity_tree_entry->nodes is one block.  The node code tracks
+	 * how many nodes fit into one entry where a node is a single
+	 * hash (message digest).
+	 */
+	vt->node_count_shift = fls(block_size / vt->digest_size) - 1;
+	/* Round down to the nearest power of two.  This makes indexing
+	 * into the tree much less painful.
+	 */
+	vt->node_count = 1 << vt->node_count_shift;
+
+	/* This is unlikely to happen, but with 64k pages, who knows. */
+	if (vt->node_count > UINT_MAX / vt->digest_size) {
+		DMERR("node_count * hash_len exceeds UINT_MAX!");
+		status = -EINVAL;
+		goto bad_arg;
+	}
+
+	vt->depth = DIV_ROUND_UP(fls64(block_count - 1), vt->node_count_shift);
+
+	/* Ensure that we can safely shift by this value. */
+	if (vt->depth * vt->node_count_shift >= sizeof(unsigned int) * 8) {
+		DMERR("specified depth and node_count_shift is too large");
+		status = -EINVAL;
+		goto bad_arg;
+	}
+
+	/* Allocate levels. Each level of the tree may have an arbitrary number
+	 * of verity_tree_entry structs.  Each entry contains node_count nodes.
+	 * Each node in the tree is a cryptographic digest of either node_count
+	 * nodes on the subsequent level or of a specific block on disk.
+	 */
+	vt->levels = kcalloc(vt->depth,
+			     sizeof(struct verity_tree_level), GFP_KERNEL);
+	if (!vt->levels) {
+		status = -ENOMEM;
+		goto bad_arg;
+	}
+
+	vt->read_cb = NULL;
+
+	status = verity_tree_initialize_entries(vt);
+	if (status)
+		goto bad_entries_alloc;
+
+	/* We compute depth such that there is only 1 block at level 0. */
+	BUG_ON(vt->levels[0].count != 1);
+
+	return 0;
+
+bad_entries_alloc:
+	while (vt->depth-- > 0)
+		kfree(vt->levels[vt->depth].entries);
+	kfree(vt->levels);
+bad_arg:
+	crypto_free_shash(vt->tfm);
+	return status;
+}
+
+/**
+ * verity_tree_read_completed
+ * @entry:   pointer to the entry that's been loaded
+ * @status:  I/O status. Non-zero is failure.
+ * MUST always be called after a read_cb completes.
+ */
+static void verity_tree_read_completed(struct verity_tree_entry *entry,
+				       int status)
+{
+	if (status) {
+		DMCRIT("an I/O error occurred while reading entry");
+		atomic_set(&entry->state, VERITY_TREE_ENTRY_ERROR_IO);
+		return;
+	}
+	BUG_ON(atomic_read(&entry->state) != VERITY_TREE_ENTRY_PENDING);
+	atomic_set(&entry->state, VERITY_TREE_ENTRY_READY);
+}
+
+/**
+ * verity_tree_verify_block - checks that all path nodes for @block are valid
+ * @vt:	     pointer to a verity_tree_create()d vt
+ * @block:   specific block data is expected from
+ * @pg:	     page holding the block data
+ * @offset:  offset into the page
+ *
+ * Returns 0 on success, VERITY_TREE_ENTRY_ERROR_MISMATCH on error.
+ */
+static int verity_tree_verify_block(struct verity_tree *vt, unsigned int block,
+				    struct page *pg, unsigned int offset)
+{
+	int state, depth = vt->depth;
+	u8 digest[VERITY_MAX_DIGEST_SIZE];
+	struct verity_tree_entry *entry;
+	void *node;
+
+	do {
+		/* Need to check that the hash of the current block is accurate
+		 * in its parent.
+		 */
+		entry = verity_tree_get_entry(vt, depth - 1, block);
+		state = atomic_read(&entry->state);
+		/* This call is only safe if all nodes along the path
+		 * are already populated (i.e. READY) via verity_tree_populate.
+		 */
+		BUG_ON(state < VERITY_TREE_ENTRY_READY);
+		node = verity_tree_get_node(vt, entry, depth, block);
+
+		if (verity_tree_compute_hash(vt, pg, offset, digest) ||
+		    memcmp(digest, node, vt->digest_size))
+			goto mismatch;
+
+		/* Keep the containing block of hashes to be verified in the
+		 * next pass.
+		 */
+		pg = virt_to_page(entry->nodes);
+		offset = offset_in_page(entry->nodes);
+	} while (--depth > 0 && state != VERITY_TREE_ENTRY_VERIFIED);
+
+	if (depth == 0 && state != VERITY_TREE_ENTRY_VERIFIED) {
+		if (verity_tree_compute_hash(vt, pg, offset, digest) ||
+		    memcmp(digest, vt->digest, vt->digest_size))
+			goto mismatch;
+		atomic_set(&entry->state, VERITY_TREE_ENTRY_VERIFIED);
+	}
+
+	/* Mark path to leaf as verified. */
+	for (depth++; depth < vt->depth; depth++) {
+		entry = verity_tree_get_entry(vt, depth, block);
+		/* At this point, entry can only be in VERIFIED or READY state.
+		 * So it is safe to use atomic_set instead of atomic_cmpxchg.
+		 */
+		atomic_set(&entry->state, VERITY_TREE_ENTRY_VERIFIED);
+	}
+
+	return 0;
+
+mismatch:
+	DMERR_LIMIT("verify_path: failed to verify hash (d=%d,bi=%u)",
+		    depth, block);
+	return VERITY_TREE_ENTRY_ERROR_MISMATCH;
+}
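+
+/* Example walk for the depth-3 sha256/4 KiB tree in the documentation:
+ * the first read of a block hashes the block and checks it against its
+ * leaf-level entry, then checks that entry's block against its parent,
+ * and the parent against the root entry and vt->digest, marking each
+ * entry VERIFIED on the way up.  A later read under the same ancestors
+ * stops at the first entry already marked VERIFIED, so steady-state
+ * reads cost one hash per data block.
+ */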
+
+/**
+ * verity_tree_is_populated - check that nodes needed to verify a given
+ *                            block are all ready
+ * @vt:	    pointer to a verity_tree_create()d vt
+ * @block:  specific block data is expected from
+ *
+ * Callers may wish to call verity_tree_is_populated() when checking an io
+ * for which entries were already pending.
+ */
+static bool verity_tree_is_populated(struct verity_tree *vt, unsigned int block)
+{
+	int depth;
+
+	for (depth = vt->depth - 1; depth >= 0; depth--) {
+		struct verity_tree_entry *entry;
+		entry = verity_tree_get_entry(vt, depth, block);
+		if (atomic_read(&entry->state) < VERITY_TREE_ENTRY_READY)
+			return false;
+	}
+
+	return true;
+}
+
+/**
+ * verity_tree_populate - reads entries from disk needed to verify a given block
+ * @vt:     pointer to a verity_tree_create()d vt
+ * @ctx:    context used for all read_cb calls on this request
+ * @block:  specific block data is expected from
+ *
+ * Returns negative value on error. Returns 0 on success.
+ */
+static int verity_tree_populate(struct verity_tree *vt, void *ctx,
+				unsigned int block)
+{
+	int depth, state;
+
+	BUG_ON(block >= vt->block_count);
+
+	for (depth = vt->depth - 1; depth >= 0; --depth) {
+		struct verity_tree_level *level;
+		struct verity_tree_entry *entry;
+		u64 index;
+
+		index = verity_tree_index_at_level(vt, depth, block);
+		level = &vt->levels[depth];
+		entry = verity_tree_get_entry(vt, depth, block);
+		state = atomic_cmpxchg(&entry->state,
+				       VERITY_TREE_ENTRY_UNALLOCATED,
+				       VERITY_TREE_ENTRY_PENDING);
+		if (state == VERITY_TREE_ENTRY_VERIFIED)
+			break;
+		if (state <= VERITY_TREE_ENTRY_ERROR)
+			goto error_state;
+		if (state != VERITY_TREE_ENTRY_UNALLOCATED)
+			continue;
+
+		/* Current entry is claimed for allocation and loading */
+		entry->nodes = kmalloc(vt->block_size, GFP_NOIO);
+		if (!entry->nodes) {
+			atomic_set(&entry->state, VERITY_TREE_ENTRY_ERROR);
+			goto error_state;
+		}
+
+		vt->read_cb(ctx,
+			    level->sector + to_sector(index * vt->block_size),
+			    entry->nodes, to_sector(vt->block_size), entry);
+	}
+
+	return 0;
+
+error_state:
+	DMCRIT("block %u at depth %d is in an error state", block, depth);
+	return -EPERM;
+}
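+
+/* Entry state transitions, as driven by verity_tree_populate and the
+ * read completion path:
+ *
+ *   UNALLOCATED --populate--> PENDING --read ok--> READY
+ *                                    \-read fail-> ERROR_IO
+ *   READY --verify ok--> VERIFIED
+ *
+ * The atomic_cmpxchg above guarantees that exactly one caller allocates
+ * and issues the read for a given entry; all others observe PENDING or
+ * a later state.
+ */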
+
+/**
+ * verity_tree_destroy - cleans up all memory used by @vt
+ * @vt:	 pointer to a verity_tree_create()d vt
+ */
+static void verity_tree_destroy(struct verity_tree *vt)
+{
+	int depth;
+
+	for (depth = 0; depth < vt->depth; depth++) {
+		struct verity_tree_entry *entry = vt->levels[depth].entries;
+		struct verity_tree_entry *entry_end = entry +
+			vt->levels[depth].count;
+		for (; entry < entry_end; ++entry)
+			kfree(entry->nodes);
+		kfree(vt->levels[depth].entries);
+	}
+	kfree(vt->levels);
+	crypto_free_shash(vt->tfm);
+}
+
+/*
+ * Verity Tree Accessors
+ */
+
+/**
+ * verity_tree_set_digest - sets an unverified root digest hash from hex
+ * @vt:	     pointer to a verity_tree_create()d vt
+ * @digest:  string containing the digest in hex
+ * Returns non-zero on error.
+ */
+static int verity_tree_set_digest(struct verity_tree *vt, const char *digest)
+{
+	/* Make sure we have at least the bytes expected */
+	if (strnlen(digest, vt->digest_size * 2) != vt->digest_size * 2) {
+		DMERR("root digest length does not match hash algorithm");
+		return -1;
+	}
+	return hex2bin(vt->digest, digest, vt->digest_size);
+}
+
+/**
+ * verity_tree_digest - returns root digest in hex
+ * @vt:	     pointer to a verity_tree_create()d vt
+ * @digest:  buffer to put the hex digest into, must be of length
+ *           VERITY_MAX_DIGEST_SIZE * 2 + 1.
+ */
+static int verity_tree_digest(struct verity_tree *vt, char *digest)
+{
+	bin2hex(digest, vt->digest, vt->digest_size);
+	return 0;
+}
+
+/**
+ * verity_tree_set_salt - sets the salt
+ * @vt:    pointer to a verity_tree_create()d vt
+ * @salt:  string containing the salt in hex
+ * Returns non-zero on error.
+ */
+static int verity_tree_set_salt(struct verity_tree *vt, const char *salt)
+{
+	size_t saltlen = min(strlen(salt) / 2, sizeof(vt->salt));
+
+	memset(vt->salt, 0, sizeof(vt->salt));
+	return hex2bin(vt->salt, salt, saltlen);
+}
+
+
+/**
+ * verity_tree_salt - returns the salt in hex
+ * @vt:    pointer to a verity_tree_create()d vt
+ * @salt:  buffer to put salt into, of length VERITY_SALT_SIZE * 2 + 1.
+ */
+static int verity_tree_salt(struct verity_tree *vt, char *salt)
+{
+	bin2hex(salt, vt->salt, sizeof(vt->salt));
+	return 0;
+}
+
+/*
+ * Allocation and utility functions
+ */
+
+static void kverityd_src_io_read_end(struct bio *clone, int error);
+
+/* Shared destructor for all internal bios */
+static void verity_bio_destructor(struct bio *bio)
+{
+	struct verity_io *io = bio->bi_private;
+	struct verity_config *vc = io->target->private;
+	bio_free(bio, vc->bs);
+}
+
+static struct bio *verity_alloc_bioset(struct verity_config *vc, gfp_t gfp_mask,
+				       int nr_iovecs)
+{
+	return bio_alloc_bioset(gfp_mask, nr_iovecs, vc->bs);
+}
+
+static struct verity_io *verity_io_alloc(struct dm_target *ti,
+					    struct bio *bio)
+{
+	struct verity_config *vc = ti->private;
+	sector_t sector = bio->bi_sector - ti->begin;
+	struct verity_io *io;
+	u64 tmp;
+
+	io = mempool_alloc(vc->io_pool, GFP_NOIO);
+	io->flags = 0;
+	io->target = ti;
+	io->bio = bio;
+	io->error = 0;
+
+	/* Adjust the sector by the virtual starting sector */
+	tmp = (u64)to_bytes(1) * sector;
+	do_div(tmp, vc->vt.block_size);
+	io->block = tmp;
+	io->count = bio->bi_size / vc->vt.block_size;
+
+	atomic_set(&io->pending, 0);
+
+	return io;
+}
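+
+/* For example, with 4 KiB blocks a bio that begins 1 MiB into the
+ * target (sector offset 2048) yields io->block = 256, and a 64 KiB bio
+ * gives io->count = 16.
+ */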
+
+static struct bio *verity_bio_clone(struct verity_io *io)
+{
+	struct verity_config *vc = io->target->private;
+	struct bio *bio = io->bio;
+	struct bio *clone = verity_alloc_bioset(vc, GFP_NOIO, bio->bi_max_vecs);
+
+	__bio_clone(clone, bio);
+	clone->bi_private = io;
+	clone->bi_end_io  = kverityd_src_io_read_end;
+	clone->bi_bdev    = vc->dev->bdev;
+	clone->bi_sector += vc->start - io->target->begin;
+	clone->bi_destructor = verity_bio_destructor;
+
+	return clone;
+}
+
+/*
+ * Reverse flow of requests into the device.
+ *
+ * (Start at the bottom with verity_map and work your way upward).
+ */
+
+static void verity_inc_pending(struct verity_io *io);
+
+static void verity_return_bio_to_caller(struct verity_io *io)
+{
+	struct verity_config *vc = io->target->private;
+
+	bio_endio(io->bio, io->error);
+	mempool_free(io, vc->io_pool);
+}
+
+/* Check for any missing vt hashes. */
+static bool verity_is_vt_populated(struct verity_io *io)
+{
+	struct verity_config *vc = io->target->private;
+	u64 block;
+
+	for (block = io->block; block < io->block + io->count; ++block)
+		if (!verity_tree_is_populated(&vc->vt, block))
+			return false;
+
+	return true;
+}
+
+/* verity_dec_pending manages the lifetime of all verity_io structs.
+ * Non-bug error handling is centralized through this interface, as is
+ * all passage from workqueue to workqueue.
+ */
+static void verity_dec_pending(struct verity_io *io)
+{
+	if (!atomic_dec_and_test(&io->pending))
+		goto done;
+
+	if (unlikely(io->error))
+		goto io_error;
+
+	/* I/Os that were pending may now be ready */
+	if (verity_is_vt_populated(io)) {
+		INIT_DELAYED_WORK(&io->work, kverityd_verify);
+		queue_delayed_work(kveritydq, &io->work, 0);
+	} else {
+		INIT_DELAYED_WORK(&io->work, kverityd_io);
+		queue_delayed_work(kverityd_ioq, &io->work, HZ/10);
+	}
+
+done:
+	return;
+
+io_error:
+	verity_return_bio_to_caller(io);
+}
+
+/* Walks the data set and computes the hash of the data read from the
+ * untrusted source device.  The computed hash is then passed to verity-tree
+ * for verification.
+ */
+static int verity_verify(struct verity_config *vc,
+			 struct verity_io *io)
+{
+	unsigned int block_size = vc->vt.block_size;
+	struct bio *bio = io->bio;
+	u64 block = io->block;
+	unsigned int idx;
+	int r;
+
+	for (idx = bio->bi_idx; idx < bio->bi_vcnt; idx++) {
+		struct bio_vec *bv = bio_iovec_idx(bio, idx);
+		unsigned int offset = bv->bv_offset;
+		unsigned int len = bv->bv_len;
+
+		BUG_ON(offset % block_size);
+		BUG_ON(len % block_size);
+
+		while (len) {
+			r = verity_tree_verify_block(&vc->vt, block,
+						bv->bv_page, offset);
+			if (r)
+				goto bad_return;
+
+			offset += block_size;
+			len -= block_size;
+			block++;
+			cond_resched();
+		}
+	}
+
+	return 0;
+
+bad_return:
+	/* verity_tree functions aren't expected to return errno friendly
+	 * values.  They are converted here for uniformity.
+	 */
+	if (r > 0) {
+		DMERR("Pending data for block %llu seen at verify", ULL(block));
+		r = -EBUSY;
+	} else {
+		DMERR_LIMIT("Block hash does not match!");
+		r = -EACCES;
+	}
+	return r;
+}
+
+/* Services the verify workqueue */
+static void kverityd_verify(struct work_struct *work)
+{
+	struct delayed_work *dwork = container_of(work, struct delayed_work,
+						  work);
+	struct verity_io *io = container_of(dwork, struct verity_io,
+					    work);
+	struct verity_config *vc = io->target->private;
+
+	io->error = verity_verify(vc, io);
+
+	/* Free up the bio and tag with the return value */
+	verity_return_bio_to_caller(io);
+}
+
+/* Asynchronously called upon the completion of verity-tree I/O. The status
+ * of the operation is passed back to verity-tree and the next steps are
+ * decided by verity_dec_pending.
+ */
+static void kverityd_io_vt_populate_end(struct bio *bio, int error)
+{
+	struct verity_tree_entry *entry = bio->bi_private;
+	struct verity_io *io = entry->io_context;
+
+	/* Tell the tree to atomically update now that we've populated
+	 * the given entry.
+	 */
+	verity_tree_read_completed(entry, error);
+
+	/* Clean up for reuse when reading data to be checked */
+	bio->bi_vcnt = 0;
+	bio->bi_io_vec->bv_offset = 0;
+	bio->bi_io_vec->bv_len = 0;
+	bio->bi_io_vec->bv_page = NULL;
+	/* Restore the private data to I/O so the destructor can be shared. */
+	bio->bi_private = io;
+	bio_put(bio);
+
+	/* We bail but assume the tree has been marked bad. */
+	if (unlikely(error)) {
+		DMERR("Failed to read for sector %llu (%u)",
+		      ULL(io->bio->bi_sector), io->bio->bi_size);
+		io->error = error;
+		/* Pass through the error to verity_dec_pending below */
+	}
+	/* When pending = 0, it will transition to reading real data */
+	verity_dec_pending(io);
+}
+
+/* Called by verity-tree (via verity_tree_populate), this function provides
+ * the message digests to verity-tree that are stored on disk.
+ */
+static int kverityd_vt_read_callback(void *ctx, sector_t start, u8 *dst,
+				      sector_t count,
+				      struct verity_tree_entry *entry)
+{
+	struct verity_io *io = ctx;  /* I/O for this batch */
+	struct verity_config *vc;
+	struct bio *bio;
+
+	vc = io->target->private;
+
+	/* The I/O context is nested inside the entry so that we don't need one
+	 * io context per page read.
+	 */
+	entry->io_context = ctx;
+
+	/* We should only get page size requests at present. */
+	verity_inc_pending(io);
+	bio = verity_alloc_bioset(vc, GFP_NOIO, 1);
+	bio->bi_private = entry;
+	bio->bi_idx = 0;
+	bio->bi_size = vc->vt.block_size;
+	bio->bi_sector = vc->hash_start + start;
+	bio->bi_bdev = vc->hash_dev->bdev;
+	bio->bi_end_io = kverityd_io_vt_populate_end;
+	bio->bi_rw = REQ_META;
+	/* Only need to free the bio since the page is managed by vt */
+	bio->bi_destructor = verity_bio_destructor;
+	bio->bi_vcnt = 1;
+	bio->bi_io_vec->bv_offset = offset_in_page(dst);
+	bio->bi_io_vec->bv_len = to_bytes(count);
+	/* dst is guaranteed to be a page_pool allocation */
+	bio->bi_io_vec->bv_page = virt_to_page(dst);
+	/* Track that this I/O is in use.  There should be no risk of the io
+	 * being removed beforehand, since this is called synchronously.
+	 */
+	generic_make_request(bio);
+	return 0;
+}
+
+/* Submits an io request for each missing block of block hashes.
+ * The last one to return will then enqueue this on the io workqueue.
+ */
+static void kverityd_io_vt_populate(struct verity_io *io)
+{
+	struct verity_config *vc = io->target->private;
+	u64 block;
+
+	for (block = io->block; block < io->block + io->count; ++block) {
+		int ret = verity_tree_populate(&vc->vt, io, block);
+
+		if (ret < 0) {
+			/* verity_dec_pending will handle the error case. */
+			io->error = ret;
+			break;
+		}
+	}
+}
+
+/* Asynchronously called upon the completion of I/O issued
+ * from kverityd_src_io_read. verity_dec_pending() acts as
+ * the scheduler/flow manager.
+ */
+static void kverityd_src_io_read_end(struct bio *clone, int error)
+{
+	struct verity_io *io = clone->bi_private;
+
+	if (unlikely(!bio_flagged(clone, BIO_UPTODATE) && !error))
+		error = -EIO;
+
+	if (unlikely(error)) {
+		DMERR_LIMIT("Error occurred: %d (%llu, %u)",
+			    error, ULL(clone->bi_sector), clone->bi_size);
+		io->error = error;
+	}
+
+	/* Release the clone, which keeps the block layer from leaving
+	 * offsets, etc. in unexpected states.
+	 */
+	bio_put(clone);
+
+	verity_dec_pending(io);
+}
+
+/* If not yet underway, an I/O request will be issued to the vc->dev
+ * device for the data needed. It is cloned to avoid unexpected changes
+ * to the original bio struct.
+ */
+static void kverityd_src_io_read(struct verity_io *io)
+{
+	struct bio *clone;
+
+	/* Check if the read is already issued. */
+	if (io->flags & VERITY_IOFLAGS_CLONED)
+		return;
+
+	io->flags |= VERITY_IOFLAGS_CLONED;
+
+	/* Clone the bio. The block layer may modify the bvec array. */
+	clone = verity_bio_clone(io);
+	if (unlikely(!clone)) {
+		io->error = -ENOMEM;
+		return;
+	}
+
+	verity_inc_pending(io);
+
+	generic_make_request(clone);
+}
+
+/* kverityd_io services the I/O workqueue. For each pass through
+ * the I/O workqueue, a call to populate both the origin drive
+ * data and the hash tree data is made.
+ */
+static void kverityd_io(struct work_struct *work)
+{
+	struct delayed_work *dwork = container_of(work, struct delayed_work,
+						  work);
+	struct verity_io *io = container_of(dwork, struct verity_io,
+					    work);
+
+	/* Issue requests asynchronously. */
+	verity_inc_pending(io);
+	kverityd_src_io_read(io);
+	kverityd_io_vt_populate(io);
+	verity_dec_pending(io);
+}
+
+/* Paired with verity_dec_pending, the pending value in the io dictates
+ * the lifetime of a request and when it is ready to be processed on the
+ * workqueues.
+ */
+static void verity_inc_pending(struct verity_io *io)
+{
+	atomic_inc(&io->pending);
+}
+
+/* Block-level requests start here. */
+static int verity_map(struct dm_target *ti, struct bio *bio,
+		      union map_info *map_context)
+{
+	struct verity_io *io;
+	struct verity_config *vc;
+	struct request_queue *r_queue;
+
+	if (unlikely(!ti)) {
+		DMERR("dm_target was NULL");
+		return -EIO;
+	}
+
+	vc = ti->private;
+	r_queue = bdev_get_queue(vc->dev->bdev);
+
+	if (bio_data_dir(bio) == WRITE) {
+		/* If we silently drop writes, then the VFS layer will cache
+		 * the write and persist it in memory. While it doesn't change
+		 * the underlying storage, it still may be contrary to the
+		 * behavior expected by a verified, read-only device.
+		 */
+		DMWARN_LIMIT("write request received. rejecting with -EIO.");
+		return -EIO;
+	} else {
+		/* Queue up the request to be verified */
+		io = verity_io_alloc(ti, bio);
+		if (!io) {
+			DMERR_LIMIT("Failed to allocate and init IO data");
+			return DM_MAPIO_REQUEUE;
+		}
+		INIT_DELAYED_WORK(&io->work, kverityd_io);
+		queue_delayed_work(kverityd_ioq, &io->work, 0);
+	}
+
+	return DM_MAPIO_SUBMITTED;
+}
+
+/*
+ * Non-block interfaces and device-mapper specific code
+ */
+
+/*
+ * Verity target parameters:
+ *
+ * <version> <dev> <hash_dev> <hash_start> <block_size> <alg> <digest> <salt>
+ *
+ * version:        version of the hash tree on-disk format
+ * dev:            device to verify
+ * hash_dev:       device hashtree is stored on
+ * hash_start:     start address of hashes
+ * block_size:     size of a hash block
+ * alg:            hash algorithm
+ * digest:         toplevel hash of the tree
+ * salt:           salt
+ */
+static int verity_ctr(struct dm_target *ti, unsigned int argc, char **argv)
+{
+	struct verity_config *vc = NULL;
+	const char *dev, *hash_dev, *alg, *digest, *salt;
+	unsigned long hash_start, block_size, version;
+	sector_t blocks;
+	int ret;
+
+	if (argc != 8) {
+		ti->error = "Invalid argument count";
+		return -EINVAL;
+	}
+
+	if (kstrtoul(argv[0], 10, &version) || (version != 0)) {
+		ti->error = "Invalid version";
+		return -EINVAL;
+	}
+	dev = argv[1];
+	hash_dev = argv[2];
+	if (kstrtoul(argv[3], 10, &hash_start)) {
+		ti->error = "Invalid hash_start";
+		return -EINVAL;
+	}
+	if (kstrtoul(argv[4], 10, &block_size) || (block_size > UINT_MAX)) {
+		ti->error = "Invalid block_size";
+		return -EINVAL;
+	}
+	alg = argv[5];
+	digest = argv[6];
+	salt = argv[7];
+
+	/* The device mapper device should be setup read-only */
+	if ((dm_table_get_mode(ti->table) & ~FMODE_READ) != 0) {
+		ti->error = "Must be created readonly.";
+		return -EINVAL;
+	}
+
+	vc = kzalloc(sizeof(*vc), GFP_KERNEL);
+	if (!vc)
+		return -ENOMEM;
+
+	/* Calculate the blocks from the given device size */
+	vc->size = ti->len;
+	blocks = to_bytes(vc->size) / block_size;
+	if (verity_tree_create(&vc->vt, blocks, block_size, alg)) {
+		DMERR("failed to create required vt");
+		goto bad_vt;
+	}
+	if (verity_tree_set_digest(&vc->vt, digest)) {
+		DMERR("digest error");
+		goto bad_digest;
+	}
+	verity_tree_set_salt(&vc->vt, salt);
+	vc->vt.read_cb = kverityd_vt_read_callback;
+
+	vc->start = 0;
+	/* We only ever grab the device in read-only mode. */
+	ret = dm_get_device(ti, dev, dm_table_get_mode(ti->table), &vc->dev);
+	if (ret) {
+		DMERR("Failed to acquire device '%s': %d", dev, ret);
+		ti->error = "Device lookup failed";
+		goto bad_verity_dev;
+	}
+
+	if ((to_bytes(vc->start) % block_size) ||
+	    (to_bytes(vc->size) % block_size)) {
+		ti->error = "Device must be block_size divisble/aligned";
+		goto bad_hash_start;
+	}
+
+	vc->hash_start = (sector_t)hash_start;
+
+	/*
+	 * Note, dev == hash_dev is okay as long as the size of
+	 *       ti->len passed to device mapper does not include
+	 *       the hashes.
+	 */
+	if (dm_get_device(ti, hash_dev,
+			  dm_table_get_mode(ti->table), &vc->hash_dev)) {
+		ti->error = "Hash device lookup failed";
+		goto bad_hash_dev;
+	}
+
+	if (snprintf(vc->hash_alg, CRYPTO_MAX_ALG_NAME, "%s", alg) >=
+	    CRYPTO_MAX_ALG_NAME) {
+		ti->error = "Hash algorithm name is too long";
+		goto bad_hash;
+	}
+
+	vc->io_pool = mempool_create_slab_pool(MIN_IOS, _verity_io_pool);
+	if (!vc->io_pool) {
+		ti->error = "Cannot allocate verity io mempool";
+		goto bad_slab_pool;
+	}
+
+	vc->bs = bioset_create(MIN_BIOS, 0);
+	if (!vc->bs) {
+		ti->error = "Cannot allocate verity bioset";
+		goto bad_bs;
+	}
+
+	ti->private = vc;
+
+	return 0;
+
+bad_bs:
+	mempool_destroy(vc->io_pool);
+bad_slab_pool:
+bad_hash:
+	dm_put_device(ti, vc->hash_dev);
+bad_hash_dev:
+bad_hash_start:
+	dm_put_device(ti, vc->dev);
+bad_verity_dev:
+bad_digest:
+	verity_tree_destroy(&vc->vt);  /* also frees the tfm */
+bad_vt:
+	kfree(vc);   /* hash is not secret so no need to zero */
+	return -EINVAL;
+}
+
+static void verity_dtr(struct dm_target *ti)
+{
+	struct verity_config *vc = ti->private;
+
+	bioset_free(vc->bs);
+	mempool_destroy(vc->io_pool);
+	verity_tree_destroy(&vc->vt);
+	dm_put_device(ti, vc->hash_dev);
+	dm_put_device(ti, vc->dev);
+	kfree(vc);
+}
+
+static int verity_ioctl(struct dm_target *ti, unsigned int cmd,
+			unsigned long arg)
+{
+	struct verity_config *vc = ti->private;
+	struct dm_dev *dev = vc->dev;
+	int r = 0;
+
+	/*
+	 * Only pass ioctls through if the device sizes match exactly.
+	 */
+	if (vc->start ||
+	    ti->len != i_size_read(dev->bdev->bd_inode) >> SECTOR_SHIFT)
+		r = scsi_verify_blk_ioctl(NULL, cmd);
+
+	return r ? : __blkdev_driver_ioctl(dev->bdev, dev->mode, cmd, arg);
+}
+
+static int verity_status(struct dm_target *ti, status_type_t type,
+			char *result, unsigned int maxlen)
+{
+	struct verity_config *vc = ti->private;
+	char digest[VERITY_MAX_DIGEST_SIZE * 2 + 1] = { 0 };
+	char salt[VERITY_SALT_SIZE * 2 + 1] = { 0 };
+	unsigned int sz = 0;
+
+	verity_tree_digest(&vc->vt, digest);
+	verity_tree_salt(&vc->vt, salt);
+
+	switch (type) {
+	case STATUSTYPE_INFO:
+		result[0] = '\0';
+		break;
+	case STATUSTYPE_TABLE:
+		DMEMIT("0 %s %s %llu %llu %s %s %s",
+		       vc->dev->name,
+		       vc->hash_dev->name,
+		       ULL(vc->hash_start),
+		       ULL(vc->vt.block_size),
+		       vc->hash_alg,
+		       digest,
+		       salt);
+		break;
+	}
+	return 0;
+}
+
+static int verity_merge(struct dm_target *ti, struct bvec_merge_data *bvm,
+		       struct bio_vec *biovec, int max_size)
+{
+	struct verity_config *vc = ti->private;
+	struct request_queue *q = bdev_get_queue(vc->dev->bdev);
+
+	if (!q->merge_bvec_fn)
+		return max_size;
+
+	bvm->bi_bdev = vc->dev->bdev;
+	bvm->bi_sector = vc->start + bvm->bi_sector - ti->begin;
+
+	/* Optionally, this could just return 0 to stick to single pages. */
+	return min(max_size, q->merge_bvec_fn(q, bvm, biovec));
+}
+
+static int verity_iterate_devices(struct dm_target *ti,
+				 iterate_devices_callout_fn fn, void *data)
+{
+	struct verity_config *vc = ti->private;
+
+	return fn(ti, vc->dev, vc->start, ti->len, data);
+}
+
+static void verity_io_hints(struct dm_target *ti,
+			    struct queue_limits *limits)
+{
+	struct verity_config *vc = ti->private;
+	unsigned int block_size = vc->vt.block_size;
+
+	limits->logical_block_size = block_size;
+	limits->physical_block_size = block_size;
+	blk_limits_io_min(limits, block_size);
+}
+
+static struct target_type verity_target = {
+	.name   = "verity",
+	.version = {0, 1, 0},
+	.module = THIS_MODULE,
+	.ctr    = verity_ctr,
+	.dtr    = verity_dtr,
+	.ioctl  = verity_ioctl,
+	.map    = verity_map,
+	.merge  = verity_merge,
+	.status = verity_status,
+	.iterate_devices = verity_iterate_devices,
+	.io_hints = verity_io_hints,
+};
+
+static int __cpuinit verity_cpu_callback(struct notifier_block *nfb,
+				  unsigned long action,
+				  void *hcpu)
+{
+	switch (action) {
+	case CPU_DOWN_PREPARE:
+		/* Temporary-fix: https://lkml.org/lkml/2012/3/9/580 */
+		flush_workqueue(kveritydq);
+		break;
+	}
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block verity_cpu_nfb __cpuinitdata = {
+	.notifier_call	= verity_cpu_callback,
+	.priority	= 0,
+};
+
+#define VERITY_WQ_FLAGS (WQ_CPU_INTENSIVE|WQ_HIGHPRI)
+
+static int __init verity_init(void)
+{
+	int r = -ENOMEM;
+
+	_verity_io_pool = KMEM_CACHE(verity_io, 0);
+	if (!_verity_io_pool) {
+		DMERR("failed to allocate pool verity_io");
+		goto bad_io_pool;
+	}
+
+	kverityd_ioq = alloc_workqueue("kverityd_io", VERITY_WQ_FLAGS, 1);
+	if (!kverityd_ioq) {
+		DMERR("failed to create workqueue kverityd_ioq");
+		goto bad_io_queue;
+	}
+
+	kveritydq = alloc_workqueue("kverityd", VERITY_WQ_FLAGS, 1);
+	if (!kveritydq) {
+		DMERR("failed to create workqueue kveritydq");
+		goto bad_verify_queue;
+	}
+
+	r = dm_register_target(&verity_target);
+	if (r < 0) {
+		DMERR("register failed %d", r);
+		goto register_failed;
+	}
+
+	register_hotcpu_notifier(&verity_cpu_nfb);
+
+	DMINFO("version %u.%u.%u loaded", verity_target.version[0],
+	       verity_target.version[1], verity_target.version[2]);
+
+	return r;
+
+register_failed:
+	destroy_workqueue(kveritydq);
+bad_verify_queue:
+	destroy_workqueue(kverityd_ioq);
+bad_io_queue:
+	kmem_cache_destroy(_verity_io_pool);
+bad_io_pool:
+	return r;
+}
+
+static void __exit verity_exit(void)
+{
+	int cpu;
+
+	unregister_hotcpu_notifier(&verity_cpu_nfb);
+
+	flush_workqueue(kverityd_ioq);
+	flush_workqueue(kveritydq);
+	destroy_workqueue(kveritydq);
+	destroy_workqueue(kverityd_ioq);
+
+	for_each_possible_cpu(cpu)
+		kfree(per_cpu(verity_hash_desc, cpu));
+
+	dm_unregister_target(&verity_target);
+	kmem_cache_destroy(_verity_io_pool);
+}
+
+module_init(verity_init);
+module_exit(verity_exit);
+
+MODULE_AUTHOR("The Chromium OS Authors <chromium-os-dev@chromium.org>");
+MODULE_DESCRIPTION(DM_NAME " target for transparent disk integrity checking");
+MODULE_LICENSE("GPL");
-- 
1.7.7.3



* [PATCH] dm: verity target
  2012-03-02  0:33 Mandeep Singh Baines
@ 2012-03-02 16:08 ` Mandeep Singh Baines
  0 siblings, 0 replies; 22+ messages in thread
From: Mandeep Singh Baines @ 2012-03-02 16:08 UTC (permalink / raw)
  To: Alasdair G Kergon, dm-devel, linux-kernel, Andrew Morton,
	Mikulas Patocka
  Cc: Mandeep Singh Baines, Will Drewry, Elly Jones, Milan Broz,
	Olof Johansson, Steffen Klassert

The verity target provides transparent integrity checking of block devices
using a cryptographic digest.

dm-verity is meant to be set up as part of a verified boot path.  This
may be anything ranging from a boot using tboot or trustedgrub to just
booting from a known-good device (like a USB drive or CD).

dm-verity is part of ChromeOS's verified boot path. It is used to verify
the integrity of the root filesystem on boot. The root filesystem is
mounted on a dm-verity partition which transparently verifies each block
with a bootloader verified hash passed into the kernel at boot.

Changes in V6:
  * Fixed bug in rmmod. Was freeing the same object NR_CPUS times.
  * Fixed example in documentation.
Changes in V5:
* https://lkml.org/lkml/2012/2/29/421 (Mikulas Patocka)
  * Fixed off-by-one error.
  * Added support for filesystems bigger than 4G (bug fix).
* https://lkml.org/lkml/2012/2/29/426 (Andrew Morton)
  * Fixed checkpatch errors/warning.
  * Made code cpu-hotplug-aware.
  * Remove NULL check before calling kfree.
  * No longer checking __GFP_WAIT allocations.
  * Propagate io->error instead of always EIO.
  * Remove unneeded and undesirable casts of void.
  * Use DMERR_LIMIT on io errors to avoid spamming dmesg.
  * Flush workqueue on rmmod.
Changes in V4:
* Discussion over phone (Alasdair G Kergon)
 * copy _ioctl fix from dm-linear
 * verity_status format fixes to match dm conventions
 * s/dm-bht/verity_tree
 * put everything into dm-verity.c
 * ctr changed to dm conventions
 * use hex2bin
 * use conventional dm names for function
  * s/dm_//
  * for example: verity_ctr versus dm_verity_ctr
 * use per_cpu API
Changes in V3:
* Discussion over irc (Alasdair G Kergon)
  * Implement ioctl hook
Changes in V2:
* https://lkml.org/lkml/2011/11/10/85 (Steffen Klassert)
  * Use shash API instead of older hash API

Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: Elly Jones <ellyjones@chromium.org>
Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Milan Broz <mbroz@redhat.com>
Cc: Olof Johansson <olofj@chromium.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: dm-devel@redhat.com
---
 Documentation/device-mapper/verity.txt |  151 ++++
 drivers/md/Kconfig                     |   16 +
 drivers/md/Makefile                    |    1 +
 drivers/md/dm-verity.c                 | 1366 ++++++++++++++++++++++++++++++++
 4 files changed, 1534 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/device-mapper/verity.txt
 create mode 100644 drivers/md/dm-verity.c

diff --git a/Documentation/device-mapper/verity.txt b/Documentation/device-mapper/verity.txt
new file mode 100644
index 0000000..6729102
--- /dev/null
+++ b/Documentation/device-mapper/verity.txt
@@ -0,0 +1,151 @@
+dm-verity
+==========
+
+Device-Mapper's "verity" target provides transparent integrity checking of
+block devices using a cryptographic digest provided by the kernel crypto API.
+This target is read-only.
+
+Parameters:
+    <version> <dev> <hash_dev> <hash_start> <block_size> <alg> <digest> <salt>
+
+<version>
+    This is the version number of the on-disk format. Currently, there is
+    only version 0.
+
+<dev>
+    This is the device that is going to be integrity checked.  It may be
+    a subset of the full device as specified to dmsetup (start sector and count).
+    It may be specified as a path, like /dev/sdaX, or a device number,
+    <major>:<minor>.
+
+<hash_dev>
+    This is the device that supplies the hash tree data.  It may be
+    specified similarly to the device path and may be the same device.  If the
+    same device is used, the hash offset should be outside of the dm-verity
+    configured device size.
+
+<hash_start>
+    This is the offset, in 512-byte sectors, from the start of hash_dev to
+    the root block of the hash tree.
+
+<block_size>
+    The size of a hash block. Also, the size of a block to be hashed.
+
+<alg>
+    The cryptographic hash algorithm used for this device.  This should
+    be the name of the algorithm, like "sha1".
+
+<digest>
+    The hexadecimal encoding of the cryptographic hash of all of the
+    neighboring nodes at the first level of the tree.  This hash should be
+    trusted as there is no other authenticity check beyond this point.
+
+<salt>
+    The hexadecimal encoding of the salt value.
+
+Theory of operation
+===================
+
+dm-verity is meant to be set up as part of a verified boot path.  This
+may be anything ranging from a boot using tboot or trustedgrub to just
+booting from a known-good device (like a USB drive or CD).
+
+When a dm-verity device is configured, it is expected that the caller
+has been authenticated in some way (cryptographic signatures, etc).
+After instantiation, all hashes will be verified on-demand during
+disk access.  If they cannot be verified up to the root node of the
+tree, the root hash, then the I/O will fail.  This should identify
+tampering with any data on the device and the hash data.
+
+Cryptographic hashes are used to assert the integrity of the device on a
+per-block basis.  This allows for a lightweight hash computation on first read
+into the page cache.  Block hashes are stored linearly, padded out to the
+nearest page-sized block.
+
+Hash Tree
+---------
+
+Each node in the tree is a cryptographic hash.  If it is a leaf node, the hash
+is of some block data on disk.  If it is an intermediary node, then the hash is
+of a number of child nodes.
+
+Each entry in the tree is a collection of neighboring nodes that fit in one
+block.  The number is determined based on block_size and the size of the
+selected cryptographic digest algorithm.  The hashes are linearly ordered in
+this entry and any unaligned trailing space is ignored but included when
+calculating the parent node.
+
+The tree looks something like:
+
+alg = sha256, num_blocks = 32768, block_size = 4096
+
+                                 [   root    ]
+                                /    . . .    \
+                     [entry_0]                 [entry_1]
+                    /  . . .  \                 . . .   \
+         [entry_0_0]   . . .  [entry_0_127]    . . . .  [entry_1_127]
+           / ... \             /   . . .  \             /           \
+     blk_0 ... blk_127  blk_16256   blk_16383      blk_32640 . . . blk_32767
+
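+As a worked sketch of the same parameters (derived from the construction
+above, not normative): sha256 produces 32-byte digests, so one 4096-byte
+entry holds 4096 / 32 = 128 hashes and each level fans out by a factor of
+128.  For num_blocks = 32768 that gives three levels of entries below the
+root digest: 32768 / 128 = 256 leaf entries, 2 entries above them, and a
+single top entry whose hash is the root digest.
+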
+On-disk format
+==============
+
+Below is the recommended on-disk format. The verity kernel code does not
+read the on-disk header. It only reads the hash blocks which directly
+follow the header. It is expected that a user-space tool will verify the
+integrity of the verity_header and then call dmsetup with the correct
+parameters. Alternatively, the header can be omitted and the dmsetup
+parameters can be passed via the kernel command-line in a rooted chain
+of trust where the command-line is verified.
+
+The on-disk format is especially useful in cases where the hash blocks
+are on a separate partition. The magic number allows easy identification
+of the partition contents. Alternatively, the hash blocks can be stored
+in the same partition as the data to be verified. In such a configuration
+the filesystem on the partition would be sized a little smaller than
+the full partition, leaving room for the hash blocks.
+
+struct verity_header {
+       uint64_t magic = 0x7665726974790a00;
+       uint32_t version;
+       uint32_t block_size;
+       char digest[128]; /* in hex-ascii, null-terminated or 128-bytes */
+       char salt[128]; /* in hex-ascii, null-terminated or 128-bytes */
+}
+
+struct verity_header_block {
+	struct verity_header;
+	char unused[block_size - sizeof(struct verity_header) - sizeof(sig)];
+	char sig[128]; /* in hex-ascii, null-terminated or 128-bytes */
+}
+
+Directly following the header are the hash blocks which are stored a depth
+at a time (starting from the root), sorted in order of increasing index.
+
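+As an illustration, a user-space tool might sanity-check the header along
+these lines (a minimal sketch against the struct above; the function and
+macro names are hypothetical, not part of this patch):
+
+[[
+  #include <stdint.h>
+
+  #define VERITY_MAGIC 0x7665726974790a00ULL
+
+  /* Return 0 if the header looks plausible, -1 otherwise. */
+  static int check_verity_header(const struct verity_header *h)
+  {
+          if (h->magic != VERITY_MAGIC)
+                  return -1;      /* not a verity hash partition */
+          if (h->version != 0)
+                  return -1;      /* unknown on-disk format version */
+          if (h->block_size == 0 || (h->block_size & (h->block_size - 1)))
+                  return -1;      /* block size must be a power of two */
+          return 0;
+  }
+]]
+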
+Usage
+=====
+
+The API provides mechanisms for reading and verifying a tree. When reading, all
+required data for the hash tree should be populated for a block before
+attempting a verify.  This can be done by calling verity_tree_populate().  When
+all data is ready, a call to verity_tree_verify_block() with the expected hash
+value will perform both the direct block hash check and the hashes of the
+parent and neighboring nodes where needed to ensure validity up to the root
+hash.  Note, verity_tree_set_digest() should be called before any verification
+attempts occur.
+
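+As a rough sketch of that flow using this patch's function names (error
+handling omitted):
+
+[[
+  /* for each block covered by the request ... */
+  verity_tree_populate(vt, io, block);    /* issue async reads for any
+                                             missing hash blocks */
+  /* ... once every entry on the path is READY ... */
+  r = verity_tree_verify_block(vt, block, page, offset); /* 0 on success */
+]]
+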
+Example
+=======
+
+Set up a device:
+[[
+  dmsetup create vroot --table \
+    "0 `blockdev --getsize /dev/sda1` "\
+    "verity 0 /dev/sda1 /dev/sda2 0 4096 sha256 "\
+    "4392712ba01368efdf14b05c76f9e4df0d53664630b5d48632ed17a137f39076 "\
+    "1234000000000000000000000000000000000000000000000000000000000000"
+]]
+
+A command line tool is available to compute the hash tree and return the
+root hash value.
+  http://git.chromium.org/cgi-bin/gitweb.cgi?p=dm-verity.git;a=tree
diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index faa4741..b8bb690 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -370,4 +370,20 @@ config DM_FLAKEY
        ---help---
          A target that intermittently fails I/O for debugging purposes.
 
+config DM_VERITY
+        tristate "Verity target support"
+        depends on BLK_DEV_DM
+        select CRYPTO
+        select CRYPTO_HASH
+        ---help---
+          This device-mapper target allows you to create a device that
+          transparently integrity checks the data on it. You'll need to
+          activate the digests you're going to use in the cryptoapi
+          configuration.
+
+          To compile this code as a module, choose M here: the module will
+          be called dm-verity.
+
+          If unsure, say N.
+
 endif # MD
diff --git a/drivers/md/Makefile b/drivers/md/Makefile
index 046860c..70a29af 100644
--- a/drivers/md/Makefile
+++ b/drivers/md/Makefile
@@ -39,6 +39,7 @@ obj-$(CONFIG_DM_SNAPSHOT)	+= dm-snapshot.o
 obj-$(CONFIG_DM_PERSISTENT_DATA)	+= persistent-data/
 obj-$(CONFIG_DM_MIRROR)		+= dm-mirror.o dm-log.o dm-region-hash.o
 obj-$(CONFIG_DM_LOG_USERSPACE)	+= dm-log-userspace.o
+obj-$(CONFIG_DM_VERITY)         += dm-verity.o
 obj-$(CONFIG_DM_ZERO)		+= dm-zero.o
 obj-$(CONFIG_DM_RAID)	+= dm-raid.o
 obj-$(CONFIG_DM_THIN_PROVISIONING)	+= dm-thin-pool.o
diff --git a/drivers/md/dm-verity.c b/drivers/md/dm-verity.c
new file mode 100644
index 0000000..3f9fed9
--- /dev/null
+++ b/drivers/md/dm-verity.c
@@ -0,0 +1,1366 @@
+/*
+ * Originally based on dm-crypt.c,
+ * Copyright (C) 2003 Christophe Saout <christophe@saout.de>
+ * Copyright (C) 2004 Clemens Fruhwirth <clemens@endorphin.org>
+ * Copyright (C) 2006-2008 Red Hat, Inc. All rights reserved.
+ * Copyright (C) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ *                    All Rights Reserved.
+ *
+ * This file is released under the GPLv2.
+ *
+ * Implements a verifying transparent block device.
+ * See Documentation/device-mapper/verity.txt
+ */
+#include <crypto/hash.h>
+#include <linux/atomic.h>
+#include <linux/bio.h>
+#include <linux/blkdev.h>
+#include <linux/genhd.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/mempool.h>
+#include <linux/module.h>
+#include <linux/percpu.h>
+#include <linux/workqueue.h>
+#include <linux/device-mapper.h>
+
+
+#define DM_MSG_PREFIX "verity"
+
+
+/* Helper for printing sector_t */
+#define ULL(x) ((unsigned long long)(x))
+
+#define MIN_IOS 32
+#define MIN_BIOS (MIN_IOS * 2)
+
+/* To avoid allocating memory for digests dynamically, we just set up
+ * fixed maximums to use for now.
+ */
+#define VERITY_MAX_DIGEST_SIZE 64   /* Supports up to 512-bit digests */
+#define VERITY_SALT_SIZE       32   /* 256 bits of salt is a lot */
+
+/* UNALLOCATED, PENDING, READY, and VERIFIED are valid states. All other
+ * values are entry-related return codes.
+ */
+#define VERITY_TREE_ENTRY_VERIFIED 8  /* 'nodes' checked against parent */
+#define VERITY_TREE_ENTRY_READY 4  /* 'nodes' is loaded and available */
+#define VERITY_TREE_ENTRY_PENDING 2  /* 'nodes' is being loaded */
+#define VERITY_TREE_ENTRY_UNALLOCATED 0 /* untouched */
+#define VERITY_TREE_ENTRY_ERROR -1 /* entry is unsuitable for use */
+#define VERITY_TREE_ENTRY_ERROR_IO -2 /* I/O error on load */
+
+/* Additional possible return codes */
+#define VERITY_TREE_ENTRY_ERROR_MISMATCH -3 /* Digest mismatch */
+
+
+struct verity_io {
+	struct dm_target *target;
+	struct bio *bio;
+	struct delayed_work work;
+	unsigned int flags;
+
+	int error;
+	atomic_t pending;
+
+	u64 block;  /* aligned block index */
+	u64 count;  /* aligned count in blocks */
+};
+
+/* verity_tree_entry
+ * Contains verity_tree->node_count tree nodes at a given tree depth.
+ * state is used to transactionally assure that data is paged in
+ * from disk.  Since verity_tree does not keep running crypto contexts for
+ * each level, we need to load in the data for on-demand verification.
+ */
+struct verity_tree_entry {
+	atomic_t state; /* see defines */
+	/* Keeping an extra pointer per entry wastes up to ~33k of
+	 * memory if 1M blocks are used (or ~66k on a 64-bit arch)
+	 */
+	struct verity_io *io_context;  /* Reserve a pointer for use during io */
+	/* data should only be non-NULL if fully populated. */
+	void *nodes;  /* The hash data used to verify the children.
+		       * Guaranteed to be page-aligned.
+		       */
+};
+
+/* verity_tree_level
+ * Contains an array of entries which represent a page of hashes where
+ * each hash is a node in the tree at the given tree depth/level.
+ */
+struct verity_tree_level {
+	struct verity_tree_entry *entries;  /* array of entries of tree nodes */
+	unsigned int count;  /* number of entries at this level */
+	sector_t sector;  /* starting sector for this level */
+};
+
+/* opaque context, start, databuf, sector_count */
+typedef int (*verity_tree_callback)(void *,  /* external context */
+			      sector_t,  /* start sector */
+			      u8 *,  /* destination page */
+			      sector_t,  /* num sectors */
+			      struct verity_tree_entry *);
+/* verity_tree - Device mapper block hash tree
+ * verity_tree provides a fixed interface for comparing data blocks
+ * against cryptographic hashes stored in a hash tree.  It
+ * optimizes the tree structure for storage on disk.
+ *
+ * The tree is built from the bottom up.  A collection of data,
+ * external to the tree, is hashed and these hashes are stored
+ * as the leaves of the tree.  For some number of these hashes,
+ * a parent node is created by hashing them.  These steps are
+ * repeated.
+ */
+struct verity_tree {
+	/* Configured values */
+	int depth;  /* Depth of the tree including the root */
+	unsigned int block_size;  /* Size of a hash block */
+	u64 block_count;  /* Number of blocks hashed */
+	char hash_alg[CRYPTO_MAX_ALG_NAME];
+	u8 salt[VERITY_SALT_SIZE];
+
+	/* Computed values */
+	unsigned int node_count;  /* Data size (in hashes) for each entry */
+	unsigned int node_count_shift;  /* first bit set - 1 */
+	struct crypto_shash *tfm; /* hash for this device */
+	unsigned int hash_desc_size;
+	sector_t sectors;  /* Number of disk sectors used */
+	u8 digest[VERITY_MAX_DIGEST_SIZE];
+	unsigned int digest_size;
+
+	struct verity_tree_level *levels;
+
+	/* Callback for reading from the hash device */
+	verity_tree_callback read_cb;
+};
+
+/* per-requested-bio private data */
+enum verity_io_flags {
+	VERITY_IOFLAGS_CLONED = 0x1,	/* original bio has been cloned */
+};
+
+struct verity_config {
+	struct dm_dev *dev;
+	sector_t start;
+	sector_t size;
+
+	struct dm_dev *hash_dev;
+	sector_t hash_start;
+
+	struct verity_tree vt;
+
+	/* Pool required for io contexts */
+	mempool_t *io_pool;
+	/* Pool and bios required for making sure that backing device reads are
+	 * in PAGE_SIZE increments.
+	 */
+	struct bio_set *bs;
+
+	char hash_alg[CRYPTO_MAX_ALG_NAME];
+};
+
+
+static struct kmem_cache *_verity_io_pool;
+static struct workqueue_struct *kveritydq, *kverityd_ioq;
+
+static DEFINE_PER_CPU(struct shash_desc *, verity_hash_desc);
+static DEFINE_PER_CPU(unsigned int, verity_hash_size);
+
+static void kverityd_verify(struct work_struct *work);
+static void kverityd_io(struct work_struct *work);
+static void kverityd_io_vt_populate(struct verity_io *io);
+static void kverityd_io_vt_populate_end(struct bio *, int error);
+
+
+/*
+ * Utilities
+ */
+
+static void bin2hex(char *dst, const u8 *src, size_t count)
+{
+	while (count-- > 0) {
+		sprintf(dst, "%02hhx", (int)*src);
+		dst += 2;
+		src++;
+	}
+}
+
+/*
+ * Verity Tree
+ */
+
+/* Functions for converting indices to nodes. */
+
+static inline unsigned int verity_tree_get_level_shift(struct verity_tree *vt,
+						  int depth)
+{
+	return (vt->depth - depth) * vt->node_count_shift;
+}
+
+/* For the given depth, this is the entry index.  At depth+1 it is the node
+ * index for depth.
+ */
+static inline u64 verity_tree_index_at_level(struct verity_tree *vt,
+					     int depth, u64 leaf)
+{
+	return leaf >> verity_tree_get_level_shift(vt, depth);
+}
+
+static inline struct verity_tree_entry *verity_tree_get_entry(
+		struct verity_tree *vt,
+		int depth, u64 block)
+{
+	u64 index = verity_tree_index_at_level(vt, depth, block);
+	struct verity_tree_level *level = &vt->levels[depth];
+
+	return &level->entries[index];
+}
+
+static inline void *verity_tree_get_node(struct verity_tree *vt,
+					 struct verity_tree_entry *entry,
+					 int depth, unsigned int block)
+{
+	u64 index = verity_tree_index_at_level(vt, depth, block);
+	unsigned int node_index = (unsigned int)index % vt->node_count;
+
+	return entry->nodes + (node_index * vt->digest_size);
+}
+
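+/* Worked example (illustrative only): with sha256 and 4096-byte blocks,
+ * node_count = 128 and node_count_shift = 7.  In a depth-3 tree, the
+ * hash of data block 300 lives at depth 2 in entry 300 >> 7 = 2, at
+ * node index 300 % 128 = 44 within that entry's block of hashes.
+ */
+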
+/**
+ * verity_tree_compute_hash: hashes one block of data
+ */
+static int verity_tree_compute_hash(struct verity_tree *vt, struct page *pg,
+				    unsigned int offset, u8 *digest)
+{
+	struct shash_desc **hash_descp = &__get_cpu_var(verity_hash_desc);
+	unsigned int *hash_sizep = &__get_cpu_var(verity_hash_size);
+	struct shash_desc *hash_desc;
+	void *data;
+	int err;
+
+	if (!*hash_descp || *hash_sizep < vt->hash_desc_size) {
+		kfree(*hash_descp);
+		*hash_descp = kmalloc(vt->hash_desc_size, GFP_KERNEL);
+		*hash_sizep = vt->hash_desc_size;
+	}
+	hash_desc = *hash_descp;
+	hash_desc->tfm = vt->tfm;
+	hash_desc->flags = 0x0;
+
+	if (crypto_shash_init(hash_desc)) {
+		DMCRIT("failed to reinitialize crypto hash (proc:%d)",
+			smp_processor_id());
+		return -EINVAL;
+	}
+	data = kmap_atomic(pg);
+	err = crypto_shash_update(hash_desc, data + offset, vt->block_size);
+	kunmap_atomic(data);
+	if (err) {
+		DMCRIT("crypto_hash_update failed");
+		return -EINVAL;
+	}
+	if (crypto_shash_update(hash_desc, vt->salt, sizeof(vt->salt))) {
+		DMCRIT("crypto_hash_update failed");
+		return -EINVAL;
+	}
+	if (crypto_shash_final(hash_desc, digest)) {
+		DMCRIT("crypto_hash_final failed");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
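+/* Note: the digest computed above is hash(block_data || salt), with the
+ * salt always fed in at its full VERITY_SALT_SIZE length (short salts
+ * are zero-padded by verity_tree_set_salt).  A user-space tree generator
+ * must use the same construction for the digests to match.
+ */
+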
+static int verity_tree_initialize_entries(struct verity_tree *vt)
+{
+	/* last represents the index of the last digest store in the tree.
+	 * By walking the tree with that index, it is possible to compute the
+	 * total number of entries at each level.
+	 *
+	 * Since each entry will contain up to |node_count| nodes of the tree,
+	 * it is possible that the last index may not be at the end of a given
+	 * entry->nodes.  In that case, it is assumed the value is padded.
+	 *
+	 * Note, we treat both the tree root (1 hash) and the tree leaves
+	 * independently from the vt data structures.  Logically, the root is
+	 * depth=-1 and the block layer level is depth=vt->depth
+	 */
+	u64 last = vt->block_count - 1;
+	int depth;
+
+	/* check that the largest level->count can't result in an int overflow
+	 * on allocation or sector calculation.
+	 */
+	if (((last >> vt->node_count_shift) + 1) >
+	    UINT_MAX / max_t(unsigned long,
+			     sizeof(struct verity_tree_entry),
+			     (unsigned long)to_sector(vt->block_size))) {
+		DMCRIT("required entries %llu is too large", vt->block_count);
+		return -EINVAL;
+	}
+
+	/* Track the current sector location for each level so we don't have to
+	 * compute it during traversals.
+	 */
+	vt->sectors = 0;
+	for (depth = 0; depth < vt->depth; ++depth) {
+		struct verity_tree_level *level = &vt->levels[depth];
+
+		level->count = verity_tree_index_at_level(vt, depth, last) + 1;
+		level->entries = kcalloc(level->count,
+					 sizeof(struct verity_tree_entry),
+					 GFP_KERNEL);
+		if (!level->entries) {
+			DMERR("failed to allocate entries for depth %d", depth);
+			return -ENOMEM;
+		}
+		level->sector = vt->sectors;
+		vt->sectors += level->count * to_sector(vt->block_size);
+	}
+
+	return 0;
+}
+
+/**
+ * verity_tree_create - prepares @vt for use
+ * @vt:	          pointer to a verity_tree to initialize
+ * @block_count:  the number of block hashes / tree leaves
+ * @block_size:   size in bytes of a hash block
+ * @alg_name:	  crypto hash algorithm name
+ *
+ * Returns 0 on success.
+ *
+ * Callers can offset into devices by storing the data in the io callbacks.
+ */
+static int verity_tree_create(struct verity_tree *vt, u64 block_count,
+			      unsigned int block_size, const char *alg_name)
+{
+	int status = 0;
+
+	vt->block_size = block_size;
+	/* Verify that PAGE_SIZE >= block_size >= SECTOR_SIZE. */
+	if ((block_size > PAGE_SIZE) ||
+	    (PAGE_SIZE % block_size) ||
+	    (to_sector(block_size) == 0))
+		return -EINVAL;
+
+	vt->tfm = crypto_alloc_shash(alg_name, 0, 0);
+	if (IS_ERR(vt->tfm)) {
+		DMERR("failed to allocate crypto hash '%s'", alg_name);
+		return -ENOMEM;
+	}
+	vt->hash_desc_size = sizeof(struct shash_desc) +
+		crypto_shash_descsize(vt->tfm);
+
+	vt->digest_size = crypto_shash_digestsize(vt->tfm);
+	/* We expect to be able to pack >=2 hashes into a block */
+	if (block_size / vt->digest_size < 2) {
+		DMERR("too few hashes fit in a block");
+		status = -EINVAL;
+		goto bad_arg;
+	}
+
+	if (vt->digest_size > VERITY_MAX_DIGEST_SIZE) {
+		DMERR("VERITY_MAX_DIGEST_SIZE too small for digest");
+		status = -EINVAL;
+		goto bad_arg;
+	}
+
+	/* Configure the tree */
+	vt->block_count = block_count;
+	if (block_count == 0) {
+		DMERR("block_count must be non-zero");
+		status = -EINVAL;
+		goto bad_arg;
+	}
+
+	/* Each verity_tree_entry->nodes is one block.  The node code tracks
+	 * how many nodes fit into one entry where a node is a single
+	 * hash (message digest).
+	 */
+	vt->node_count_shift = fls(block_size / vt->digest_size) - 1;
+	/* Round down to the nearest power of two.  This makes indexing
+	 * into the tree much less painful.
+	 */
+	vt->node_count = 1 << vt->node_count_shift;
+
+	/* This is unlikely to happen, but with 64k pages, who knows. */
+	if (vt->node_count > UINT_MAX / vt->digest_size) {
+		DMERR("node_count * hash_len exceeds UINT_MAX!");
+		status = -EINVAL;
+		goto bad_arg;
+	}
+
+	vt->depth = DIV_ROUND_UP(fls64(block_count - 1), vt->node_count_shift);
+
+	/* Ensure that we can safely shift by this value. */
+	if (vt->depth * vt->node_count_shift >= sizeof(unsigned int) * 8) {
+		DMERR("specified depth and node_count_shift is too large");
+		status = -EINVAL;
+		goto bad_arg;
+	}
+
+	/* Allocate levels. Each level of the tree may have an arbitrary number
+	 * of verity_tree_entry structs.  Each entry contains node_count nodes.
+	 * Each node in the tree is a cryptographic digest of either node_count
+	 * nodes on the subsequent level or of a specific block on disk.
+	 */
+	vt->levels = kcalloc(vt->depth,
+			     sizeof(struct verity_tree_level), GFP_KERNEL);
+
+	vt->read_cb = NULL;
+
+	status = verity_tree_initialize_entries(vt);
+	if (status)
+		goto bad_entries_alloc;
+
+	/* We compute depth such that there is only 1 block at level 0. */
+	BUG_ON(vt->levels[0].count != 1);
+
+	return 0;
+
+bad_entries_alloc:
+	while (vt->depth-- > 0)
+		kfree(vt->levels[vt->depth].entries);
+	kfree(vt->levels);
+bad_arg:
+	crypto_free_shash(vt->tfm);
+	return status;
+}
+
+/**
+ * verity_tree_read_completed
+ * @entry:   pointer to the entry that's been loaded
+ * @status:  I/O status. Non-zero is failure.
+ * MUST always be called after a read_cb completes.
+ */
+static void verity_tree_read_completed(struct verity_tree_entry *entry,
+				       int status)
+{
+	if (status) {
+		DMCRIT("an I/O error occurred while reading entry");
+		atomic_set(&entry->state, VERITY_TREE_ENTRY_ERROR_IO);
+		return;
+	}
+	BUG_ON(atomic_read(&entry->state) != VERITY_TREE_ENTRY_PENDING);
+	atomic_set(&entry->state, VERITY_TREE_ENTRY_READY);
+}
+
+/**
+ * verity_tree_verify_block - checks that all path nodes for @block are valid
+ * @vt:	     pointer to a verity_tree_create()d vt
+ * @block:   specific block data is expected from
+ * @pg:	     page holding the block data
+ * @offset:  offset into the page
+ *
+ * Returns 0 on success, VERITY_TREE_ENTRY_ERROR_MISMATCH on error.
+ */
+static int verity_tree_verify_block(struct verity_tree *vt, unsigned int block,
+				    struct page *pg, unsigned int offset)
+{
+	int state, depth = vt->depth;
+	u8 digest[VERITY_MAX_DIGEST_SIZE];
+	struct verity_tree_entry *entry;
+	void *node;
+
+	do {
+		/* Need to check that the hash of the current block is accurate
+		 * in its parent.
+		 */
+		entry = verity_tree_get_entry(vt, depth - 1, block);
+		state = atomic_read(&entry->state);
+		/* This call is only safe if all nodes along the path
+		 * are already populated (i.e. READY) via verity_tree_populate.
+		 */
+		BUG_ON(state < VERITY_TREE_ENTRY_READY);
+		node = verity_tree_get_node(vt, entry, depth, block);
+
+		if (verity_tree_compute_hash(vt, pg, offset, digest) ||
+		    memcmp(digest, node, vt->digest_size))
+			goto mismatch;
+
+		/* Keep the containing block of hashes to be verified in the
+		 * next pass.
+		 */
+		pg = virt_to_page(entry->nodes);
+		offset = offset_in_page(entry->nodes);
+	} while (--depth > 0 && state != VERITY_TREE_ENTRY_VERIFIED);
+
+	if (depth == 0 && state != VERITY_TREE_ENTRY_VERIFIED) {
+		if (verity_tree_compute_hash(vt, pg, offset, digest) ||
+		    memcmp(digest, vt->digest, vt->digest_size))
+			goto mismatch;
+		atomic_set(&entry->state, VERITY_TREE_ENTRY_VERIFIED);
+	}
+
+	/* Mark path to leaf as verified. */
+	for (depth++; depth < vt->depth; depth++) {
+		entry = verity_tree_get_entry(vt, depth, block);
+		/* At this point, entry can only be in VERIFIED or READY state.
+		 * So it is safe to use atomic_set instead of atomic_cmpxchg.
+		 */
+		atomic_set(&entry->state, VERITY_TREE_ENTRY_VERIFIED);
+	}
+
+	return 0;
+
+mismatch:
+	DMERR_LIMIT("verify_path: failed to verify hash (d=%d,bi=%u)",
+		    depth, block);
+	return VERITY_TREE_ENTRY_ERROR_MISMATCH;
+}
+
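+/* Sketch of the walk above for a depth-3 tree and block 300 (illustrative
+ * only): pass one hashes the data block and compares it with node 44 of
+ * entry 2 at depth 2; pass two hashes that entry's block of hashes and
+ * compares it with the matching node at depth 1; the final check compares
+ * the depth-0 entry's hash with vt->digest, the trusted root.  Entries on
+ * the path are then marked VERIFIED so later I/O can stop the walk early.
+ */
+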
+/**
+ * verity_tree_is_populated - check that nodes needed to verify a given
+ *                            block are all ready
+ * @vt:	    pointer to a verity_tree_create()d vt
+ * @block:  specific block data is expected from
+ *
+ * Callers may wish to call verity_tree_is_populated() when checking an io
+ * for which entries were already pending.
+ */
+static bool verity_tree_is_populated(struct verity_tree *vt, unsigned int block)
+{
+	int depth;
+
+	for (depth = vt->depth - 1; depth >= 0; depth--) {
+		struct verity_tree_entry *entry;
+		entry = verity_tree_get_entry(vt, depth, block);
+		if (atomic_read(&entry->state) < VERITY_TREE_ENTRY_READY)
+			return false;
+	}
+
+	return true;
+}
+
+/**
+ * verity_tree_populate - reads entries from disk needed to verify a given block
+ * @vt:     pointer to a verity_tree_create()d vt
+ * @ctx:    context used for all read_cb calls on this request
+ * @block:  specific block data is expected from
+ *
+ * Returns negative value on error. Returns 0 on success.
+ */
+static int verity_tree_populate(struct verity_tree *vt, void *ctx,
+				unsigned int block)
+{
+	int depth, state;
+
+	BUG_ON(block >= vt->block_count);
+
+	for (depth = vt->depth - 1; depth >= 0; --depth) {
+		struct verity_tree_level *level;
+		struct verity_tree_entry *entry;
+		u64 index;
+
+		index = verity_tree_index_at_level(vt, depth, block);
+		level = &vt->levels[depth];
+		entry = verity_tree_get_entry(vt, depth, block);
+		state = atomic_cmpxchg(&entry->state,
+				       VERITY_TREE_ENTRY_UNALLOCATED,
+				       VERITY_TREE_ENTRY_PENDING);
+		if (state == VERITY_TREE_ENTRY_VERIFIED)
+			break;
+		if (state <= VERITY_TREE_ENTRY_ERROR)
+			goto error_state;
+		if (state != VERITY_TREE_ENTRY_UNALLOCATED)
+			continue;
+
+		/* Current entry is claimed for allocation and loading */
+		entry->nodes = kmalloc(vt->block_size, GFP_NOIO);
+
+		vt->read_cb(ctx,
+			    level->sector + to_sector(index * vt->block_size),
+			    entry->nodes, to_sector(vt->block_size), entry);
+	}
+
+	return 0;
+
+error_state:
+	DMCRIT("block %u at depth %d is in an error state", block, depth);
+	return -EPERM;
+}
+
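+/* Note on the cmpxchg above: only the caller that moves an entry from
+ * UNALLOCATED to PENDING allocates and reads it; callers that observe
+ * PENDING or READY continue up the tree, and a VERIFIED entry means the
+ * rest of the path to the root is already trusted, so the walk stops.
+ */
+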
+/**
+ * verity_tree_destroy - cleans up all memory used by @vt
+ * @vt:	 pointer to a verity_tree_create()d vt
+ */
+static void verity_tree_destroy(struct verity_tree *vt)
+{
+	int depth;
+
+	for (depth = 0; depth < vt->depth; depth++) {
+		struct verity_tree_entry *entry = vt->levels[depth].entries;
+		struct verity_tree_entry *entry_end = entry +
+			vt->levels[depth].count;
+		for (; entry < entry_end; ++entry)
+			kfree(entry->nodes);
+		kfree(vt->levels[depth].entries);
+	}
+	kfree(vt->levels);
+	crypto_free_shash(vt->tfm);
+}
+
+/*
+ * Verity Tree Accessors
+ */
+
+/**
+ * verity_tree_set_digest - sets an unverified root digest hash from hex
+ * @vt:	     pointer to a verity_tree_create()d vt
+ * @digest:  string containing the digest in hex
+ * Returns non-zero on error.
+ */
+static int verity_tree_set_digest(struct verity_tree *vt, const char *digest)
+{
+	/* Make sure we have at least the bytes expected */
+	if (strnlen(digest, vt->digest_size * 2) != vt->digest_size * 2) {
+		DMERR("root digest length does not match hash algorithm");
+		return -1;
+	}
+	return hex2bin(vt->digest, digest, vt->digest_size);
+}
+
+/**
+ * verity_tree_digest - returns root digest in hex
+ * @vt:	     pointer to a verity_tree_create()d vt
+ * @digest:  buffer to put the digest into, of length
+ *           VERITY_MAX_DIGEST_SIZE * 2 + 1.
+ */
+static int verity_tree_digest(struct verity_tree *vt, char *digest)
+{
+	bin2hex(digest, vt->digest, vt->digest_size);
+	return 0;
+}
+
+/**
+ * verity_tree_set_salt - sets the salt
+ * @vt:    pointer to a verity_tree_create()d vt
+ * @salt:  string containing the salt in hex
+ * Returns non-zero on error.
+ */
+static int verity_tree_set_salt(struct verity_tree *vt, const char *salt)
+{
+	size_t saltlen = min(strlen(salt) / 2, sizeof(vt->salt));
+	memset(vt->salt, 0, sizeof(vt->salt));
+	return hex2bin(vt->salt, salt, saltlen);
+}
+
+
+/**
+ * verity_tree_salt - returns the salt in hex
+ * @vt:    pointer to a verity_tree_create()d vt
+ * @salt:  buffer to put salt into, of length VERITY_SALT_SIZE * 2 + 1.
+ */
+static int verity_tree_salt(struct verity_tree *vt, char *salt)
+{
+	bin2hex(salt, vt->salt, sizeof(vt->salt));
+	return 0;
+}
+
+/*
+ * Allocation and utility functions
+ */
+
+static void kverityd_src_io_read_end(struct bio *clone, int error);
+
+/* Shared destructor for all internal bios */
+static void verity_bio_destructor(struct bio *bio)
+{
+	struct verity_io *io = bio->bi_private;
+	struct verity_config *vc = io->target->private;
+	bio_free(bio, vc->bs);
+}
+
+static struct bio *verity_alloc_bioset(struct verity_config *vc, gfp_t gfp_mask,
+				       int nr_iovecs)
+{
+	return bio_alloc_bioset(gfp_mask, nr_iovecs, vc->bs);
+}
+
+static struct verity_io *verity_io_alloc(struct dm_target *ti,
+					    struct bio *bio)
+{
+	struct verity_config *vc = ti->private;
+	sector_t sector = bio->bi_sector - ti->begin;
+	struct verity_io *io;
+	u64 tmp;
+
+	io = mempool_alloc(vc->io_pool, GFP_NOIO);
+	io->flags = 0;
+	io->target = ti;
+	io->bio = bio;
+	io->error = 0;
+
+	/* Adjust the sector by the virtual starting sector */
+	tmp = (u64)to_bytes(1) * sector;
+	do_div(tmp, vc->vt.block_size);
+	io->block = tmp;
+	io->count = bio->bi_size / vc->vt.block_size;
+
+	atomic_set(&io->pending, 0);
+
+	return io;
+}
+
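+/* Example of the block arithmetic above (illustrative only): with
+ * 512-byte sectors and a 4096-byte block size, a bio at relative sector
+ * 2048 maps to io->block = (2048 * 512) / 4096 = 256, and a 16 KiB bio
+ * covers io->count = 16384 / 4096 = 4 blocks.
+ */
+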
+static struct bio *verity_bio_clone(struct verity_io *io)
+{
+	struct verity_config *vc = io->target->private;
+	struct bio *bio = io->bio;
+	struct bio *clone = verity_alloc_bioset(vc, GFP_NOIO, bio->bi_max_vecs);
+
+	__bio_clone(clone, bio);
+	clone->bi_private = io;
+	clone->bi_end_io  = kverityd_src_io_read_end;
+	clone->bi_bdev    = vc->dev->bdev;
+	clone->bi_sector += vc->start - io->target->begin;
+	clone->bi_destructor = verity_bio_destructor;
+
+	return clone;
+}
+
+/*
+ * Reverse flow of requests into the device.
+ *
+ * (Start at the bottom with verity_map and work your way upward).
+ */
+
+static void verity_inc_pending(struct verity_io *io);
+
+static void verity_return_bio_to_caller(struct verity_io *io)
+{
+	struct verity_config *vc = io->target->private;
+
+	bio_endio(io->bio, io->error);
+	mempool_free(io, vc->io_pool);
+}
+
+/* Check for any missing vt hashes. */
+static bool verity_is_vt_populated(struct verity_io *io)
+{
+	struct verity_config *vc = io->target->private;
+	u64 block;
+
+	for (block = io->block; block < io->block + io->count; ++block)
+		if (!verity_tree_is_populated(&vc->vt, block))
+			return false;
+
+	return true;
+}
+
+/* verity_dec_pending manages the lifetime of all verity_io structs.
+ * Non-bug error handling is centralized through this interface and
+ * all passage from workqueue to workqueue.
+ */
+static void verity_dec_pending(struct verity_io *io)
+{
+	if (!atomic_dec_and_test(&io->pending))
+		goto done;
+
+	if (unlikely(io->error))
+		goto io_error;
+
+	/* I/Os that were pending may now be ready */
+	if (verity_is_vt_populated(io)) {
+		INIT_DELAYED_WORK(&io->work, kverityd_verify);
+		queue_delayed_work(kveritydq, &io->work, 0);
+	} else {
+		INIT_DELAYED_WORK(&io->work, kverityd_io);
+		queue_delayed_work(kverityd_ioq, &io->work, HZ/10);
+	}
+
+done:
+	return;
+
+io_error:
+	verity_return_bio_to_caller(io);
+}
+
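+/* Lifecycle summary (one reading of the flow above): verity_map queues
+ * the io on kverityd_ioq; kverityd_io clones and issues the data read
+ * and any missing hash-block reads, each bumping io->pending.  Each
+ * completion calls verity_dec_pending; the final decrement either
+ * requeues the io on kverityd_ioq (hashes still pending) or moves it to
+ * kveritydq for verification, after which the original bio is completed.
+ */
+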
+/* Walks the data set and computes the hash of the data read from the
+ * untrusted source device.  The computed hash is then passed to verity-tree
+ * for verification.
+ */
+static int verity_verify(struct verity_config *vc,
+			 struct verity_io *io)
+{
+	unsigned int block_size = vc->vt.block_size;
+	struct bio *bio = io->bio;
+	u64 block = io->block;
+	unsigned int idx;
+	int r;
+
+	for (idx = bio->bi_idx; idx < bio->bi_vcnt; idx++) {
+		struct bio_vec *bv = bio_iovec_idx(bio, idx);
+		unsigned int offset = bv->bv_offset;
+		unsigned int len = bv->bv_len;
+
+		BUG_ON(offset % block_size);
+		BUG_ON(len % block_size);
+
+		while (len) {
+			r = verity_tree_verify_block(&vc->vt, block,
+						bv->bv_page, offset);
+			if (r)
+				goto bad_return;
+
+			offset += block_size;
+			len -= block_size;
+			block++;
+			cond_resched();
+		}
+	}
+
+	return 0;
+
+bad_return:
+	/* verity_tree functions aren't expected to return errno friendly
+	 * values.  They are converted here for uniformity.
+	 */
+	if (r > 0) {
+		DMERR("Pending data for block %llu seen at verify", ULL(block));
+		r = -EBUSY;
+	} else {
+		DMERR_LIMIT("Block hash does not match!");
+		r = -EACCES;
+	}
+	return r;
+}
+
+/* Services the verify workqueue */
+static void kverityd_verify(struct work_struct *work)
+{
+	struct delayed_work *dwork = container_of(work, struct delayed_work,
+						  work);
+	struct verity_io *io = container_of(dwork, struct verity_io,
+					    work);
+	struct verity_config *vc = io->target->private;
+
+	io->error = verity_verify(vc, io);
+
+	/* Free up the bio and tag with the return value */
+	verity_return_bio_to_caller(io);
+}
+
+/* Asynchronously called upon the completion of verity-tree I/O. The status
+ * of the operation is passed back to verity-tree and the next steps are
+ * decided by verity_dec_pending.
+ */
+static void kverityd_io_vt_populate_end(struct bio *bio, int error)
+{
+	struct verity_tree_entry *entry = bio->bi_private;
+	struct verity_io *io = entry->io_context;
+
+	/* Tell the tree to atomically update now that we've populated
+	 * the given entry.
+	 */
+	verity_tree_read_completed(entry, error);
+
+	/* Clean up for reuse when reading data to be checked */
+	bio->bi_vcnt = 0;
+	bio->bi_io_vec->bv_offset = 0;
+	bio->bi_io_vec->bv_len = 0;
+	bio->bi_io_vec->bv_page = NULL;
+	/* Restore the private data to I/O so the destructor can be shared. */
+	bio->bi_private = io;
+	bio_put(bio);
+
+	/* We bail but assume the tree has been marked bad. */
+	if (unlikely(error)) {
+		DMERR("Failed to read for sector %llu (%u)",
+		      ULL(io->bio->bi_sector), io->bio->bi_size);
+		io->error = error;
+		/* Pass through the error to verity_dec_pending below */
+	}
+	/* When pending = 0, it will transition to reading real data */
+	verity_dec_pending(io);
+}
+
+/* Called by verity-tree (via verity_tree_populate), this function provides
+ * the message digests to verity-tree that are stored on disk.
+ */
+static int kverityd_vt_read_callback(void *ctx, sector_t start, u8 *dst,
+				      sector_t count,
+				      struct verity_tree_entry *entry)
+{
+	struct verity_io *io = ctx;  /* I/O for this batch */
+	struct verity_config *vc;
+	struct bio *bio;
+
+	vc = io->target->private;
+
+	/* The I/O context is nested inside the entry so that we don't need one
+	 * io context per page read.
+	 */
+	entry->io_context = ctx;
+
+	/* We should only get page size requests at present. */
+	verity_inc_pending(io);
+	bio = verity_alloc_bioset(vc, GFP_NOIO, 1);
+	bio->bi_private = entry;
+	bio->bi_idx = 0;
+	bio->bi_size = vc->vt.block_size;
+	bio->bi_sector = vc->hash_start + start;
+	bio->bi_bdev = vc->hash_dev->bdev;
+	bio->bi_end_io = kverityd_io_vt_populate_end;
+	bio->bi_rw = REQ_META;
+	/* Only need to free the bio since the page is managed by vt */
+	bio->bi_destructor = verity_bio_destructor;
+	bio->bi_vcnt = 1;
+	bio->bi_io_vec->bv_offset = offset_in_page(dst);
+	bio->bi_io_vec->bv_len = to_bytes(count);
+	/* dst is guaranteed to be a page_pool allocation */
+	bio->bi_io_vec->bv_page = virt_to_page(dst);
+	/* Track that this I/O is in use.  There should be no risk of the io
+	 * being removed prior since this is called synchronously.
+	 */
+	generic_make_request(bio);
+	return 0;
+}
+
+/* Submits an io request for each missing block of block hashes.
+ * The last one to return will then enqueue this on the io workqueue.
+ */
+static void kverityd_io_vt_populate(struct verity_io *io)
+{
+	struct verity_config *vc = io->target->private;
+	u64 block;
+
+	for (block = io->block; block < io->block + io->count; ++block) {
+		int ret = verity_tree_populate(&vc->vt, io, block);
+
+		if (ret < 0) {
+			/* verity_dec_pending will handle the error case. */
+			io->error = ret;
+			break;
+		}
+	}
+}
+
+/* Asynchronously called upon the completion of I/O issued
+ * from kverityd_src_io_read. verity_dec_pending() acts as
+ * the scheduler/flow manager.
+ */
+static void kverityd_src_io_read_end(struct bio *clone, int error)
+{
+	struct verity_io *io = clone->bi_private;
+
+	if (unlikely(!bio_flagged(clone, BIO_UPTODATE) && !error))
+		error = -EIO;
+
+	if (unlikely(error)) {
+		DMERR_LIMIT("Error occurred: %d (%llu, %u)",
+			    error, ULL(clone->bi_sector), clone->bi_size);
+		io->error = error;
+	}
+
+	/* Release the clone; this just keeps the block layer from
+	 * leaving offsets, etc in unexpected states.
+	 */
+	bio_put(clone);
+
+	verity_dec_pending(io);
+}
+
+/* If not yet underway, an I/O request will be issued to the vc->dev
+ * device for the data needed. It is cloned to avoid unexpected changes
+ * to the original bio struct.
+ */
+static void kverityd_src_io_read(struct verity_io *io)
+{
+	struct bio *clone;
+
+	/* Check if the read is already issued. */
+	if (io->flags & VERITY_IOFLAGS_CLONED)
+		return;
+
+	io->flags |= VERITY_IOFLAGS_CLONED;
+
+	/* Clone the bio. The block layer may modify the bvec array. */
+	clone = verity_bio_clone(io);
+	if (unlikely(!clone)) {
+		io->error = -ENOMEM;
+		return;
+	}
+
+	verity_inc_pending(io);
+
+	generic_make_request(clone);
+}
+
+/* kverityd_io services the I/O workqueue. For each pass through
+ * the I/O workqueue, a call to populate both the origin drive
+ * data and the hash tree data is made.
+ */
+static void kverityd_io(struct work_struct *work)
+{
+	struct delayed_work *dwork = container_of(work, struct delayed_work,
+						  work);
+	struct verity_io *io = container_of(dwork, struct verity_io,
+					    work);
+
+	/* Issue requests asynchronously. */
+	verity_inc_pending(io);
+	kverityd_src_io_read(io);
+	kverityd_io_vt_populate(io);
+	verity_dec_pending(io);
+}
+
+/* Paired with verity_dec_pending, the pending value in the io dictates the
+ * lifetime of a request and when it is ready to be processed on the
+ * workqueues.
+ */
+static void verity_inc_pending(struct verity_io *io)
+{
+	atomic_inc(&io->pending);
+}
+
+/* Block-level requests start here. */
+static int verity_map(struct dm_target *ti, struct bio *bio,
+		      union map_info *map_context)
+{
+	struct verity_io *io;
+	struct verity_config *vc;
+	struct request_queue *r_queue;
+
+	if (unlikely(!ti)) {
+		DMERR("dm_target was NULL");
+		return -EIO;
+	}
+
+	vc = ti->private;
+	r_queue = bdev_get_queue(vc->dev->bdev);
+
+	if (bio_data_dir(bio) == WRITE) {
+		/* If we silently drop writes, then the VFS layer will cache
+		 * the write and persist it in memory. While it doesn't change
+		 * the underlying storage, it still may be contrary to the
+		 * behavior expected by a verified, read-only device.
+		 */
+		DMWARN_LIMIT("write request received. rejecting with -EIO.");
+		return -EIO;
+	} else {
+		/* Queue up the request to be verified */
+		io = verity_io_alloc(ti, bio);
+		if (!io) {
+			DMERR_LIMIT("Failed to allocate and init IO data");
+			return DM_MAPIO_REQUEUE;
+		}
+		INIT_DELAYED_WORK(&io->work, kverityd_io);
+		queue_delayed_work(kverityd_ioq, &io->work, 0);
+	}
+
+	return DM_MAPIO_SUBMITTED;
+}
+
+/*
+ * Non-block interfaces and device-mapper specific code
+ */
+
+/*
+ * Verity target parameters:
+ *
+ * <version> <dev> <hash_dev> <hash_start> <block_size> <alg> <digest> <salt>
+ *
+ * version:        version of the hash tree on-disk format
+ * dev:            device to verify
+ * hash_dev:       device hashtree is stored on
+ * hash_start:     start address of hashes
+ * block_size:     size of a hash block
+ * alg:            hash algorithm
+ * digest:         toplevel hash of the tree
+ * salt:           salt
+ */
+static int verity_ctr(struct dm_target *ti, unsigned int argc, char **argv)
+{
+	struct verity_config *vc = NULL;
+	const char *dev, *hash_dev, *alg, *digest, *salt;
+	unsigned long hash_start, block_size, version;
+	sector_t blocks;
+	int ret;
+
+	if (argc != 8) {
+		ti->error = "Invalid argument count";
+		return -EINVAL;
+	}
+
+	if (kstrtoul(argv[0], 10, &version) || (version != 0)) {
+		ti->error = "Invalid version";
+		return -EINVAL;
+	}
+	dev = argv[1];
+	hash_dev = argv[2];
+	if (kstrtoul(argv[3], 10, &hash_start)) {
+		ti->error = "Invalid hash_start";
+		return -EINVAL;
+	}
+	if (kstrtoul(argv[4], 10, &block_size) || (block_size > UINT_MAX)) {
+		ti->error = "Invalid block_size";
+		return -EINVAL;
+	}
+	alg = argv[5];
+	digest = argv[6];
+	salt = argv[7];
+
+	/* The device mapper device should be setup read-only */
+	if ((dm_table_get_mode(ti->table) & ~FMODE_READ) != 0) {
+		ti->error = "Must be created readonly.";
+		return -EINVAL;
+	}
+
+	vc = kzalloc(sizeof(*vc), GFP_KERNEL);
+	if (!vc)
+		return -ENOMEM;
+
+	/* Calculate the blocks from the given device size */
+	vc->size = ti->len;
+	blocks = to_bytes(vc->size) / block_size;
+	if (verity_tree_create(&vc->vt, blocks, block_size, alg)) {
+		DMERR("failed to create required vt");
+		goto bad_vt;
+	}
+	if (verity_tree_set_digest(&vc->vt, digest)) {
+		DMERR("digest error");
+		goto bad_digest;
+	}
+	verity_tree_set_salt(&vc->vt, salt);
+	vc->vt.read_cb = kverityd_vt_read_callback;
+
+	vc->start = 0;
+	/* We only ever grab the device in read-only mode. */
+	ret = dm_get_device(ti, dev, dm_table_get_mode(ti->table), &vc->dev);
+	if (ret) {
+		DMERR("Failed to acquire device '%s': %d", dev, ret);
+		ti->error = "Device lookup failed";
+		goto bad_verity_dev;
+	}
+
+	if ((to_bytes(vc->start) % block_size) ||
+	    (to_bytes(vc->size) % block_size)) {
+		ti->error = "Device must be block_size divisble/aligned";
+		goto bad_hash_start;
+	}
+
+	vc->hash_start = (sector_t)hash_start;
+
+	/*
+	 * Note, dev == hash_dev is okay as long as the size of
+	 *       ti->len passed to device mapper does not include
+	 *       the hashes.
+	 */
+	if (dm_get_device(ti, hash_dev,
+			  dm_table_get_mode(ti->table), &vc->hash_dev)) {
+		ti->error = "Hash device lookup failed";
+		goto bad_hash_dev;
+	}
+
+	if (snprintf(vc->hash_alg, CRYPTO_MAX_ALG_NAME, "%s", alg) >=
+	    CRYPTO_MAX_ALG_NAME) {
+		ti->error = "Hash algorithm name is too long";
+		goto bad_hash;
+	}
+
+	vc->io_pool = mempool_create_slab_pool(MIN_IOS, _verity_io_pool);
+	if (!vc->io_pool) {
+		ti->error = "Cannot allocate verity io mempool";
+		goto bad_slab_pool;
+	}
+
+	vc->bs = bioset_create(MIN_BIOS, 0);
+	if (!vc->bs) {
+		ti->error = "Cannot allocate verity bioset";
+		goto bad_bs;
+	}
+
+	ti->private = vc;
+
+	return 0;
+
+bad_bs:
+	mempool_destroy(vc->io_pool);
+bad_slab_pool:
+bad_hash:
+	dm_put_device(ti, vc->hash_dev);
+bad_hash_dev:
+bad_hash_start:
+	dm_put_device(ti, vc->dev);
+bad_vt:
+bad_digest:
+bad_verity_dev:
+	kfree(vc);   /* hash is not secret so no need to zero */
+	return -EINVAL;
+}
+
+static void verity_dtr(struct dm_target *ti)
+{
+	struct verity_config *vc = ti->private;
+
+	bioset_free(vc->bs);
+	mempool_destroy(vc->io_pool);
+	verity_tree_destroy(&vc->vt);
+	dm_put_device(ti, vc->hash_dev);
+	dm_put_device(ti, vc->dev);
+	kfree(vc);
+}
+
+static int verity_ioctl(struct dm_target *ti, unsigned int cmd,
+			unsigned long arg)
+{
+	struct verity_config *vc = ti->private;
+	struct dm_dev *dev = vc->dev;
+	int r = 0;
+
+	/*
+	 * Only pass ioctls through if the device sizes match exactly.
+	 */
+	if (vc->start ||
+	    ti->len != i_size_read(dev->bdev->bd_inode) >> SECTOR_SHIFT)
+		r = scsi_verify_blk_ioctl(NULL, cmd);
+
+	return r ? : __blkdev_driver_ioctl(dev->bdev, dev->mode, cmd, arg);
+}
+
+static int verity_status(struct dm_target *ti, status_type_t type,
+			char *result, unsigned int maxlen)
+{
+	struct verity_config *vc = ti->private;
+	char digest[VERITY_MAX_DIGEST_SIZE * 2 + 1] = { 0 };
+	char salt[VERITY_SALT_SIZE * 2 + 1] = { 0 };
+	unsigned int sz = 0;
+
+	verity_tree_digest(&vc->vt, digest);
+	verity_tree_salt(&vc->vt, salt);
+
+	switch (type) {
+	case STATUSTYPE_INFO:
+		result[0] = '\0';
+		break;
+	case STATUSTYPE_TABLE:
+		DMEMIT("%s %s %llu %llu %s %s %s",
+		       vc->dev->name,
+		       vc->hash_dev->name,
+		       ULL(vc->hash_start),
+		       ULL(vc->vt.block_size),
+		       vc->hash_alg,
+		       digest,
+		       salt);
+		break;
+	}
+	return 0;
+}
+
+static int verity_merge(struct dm_target *ti, struct bvec_merge_data *bvm,
+		       struct bio_vec *biovec, int max_size)
+{
+	struct verity_config *vc = ti->private;
+	struct request_queue *q = bdev_get_queue(vc->dev->bdev);
+
+	if (!q->merge_bvec_fn)
+		return max_size;
+
+	bvm->bi_bdev = vc->dev->bdev;
+	bvm->bi_sector = vc->start + bvm->bi_sector - ti->begin;
+
+	/* Optionally, this could just return 0 to stick to single pages. */
+	return min(max_size, q->merge_bvec_fn(q, bvm, biovec));
+}
+
+static int verity_iterate_devices(struct dm_target *ti,
+				 iterate_devices_callout_fn fn, void *data)
+{
+	struct verity_config *vc = ti->private;
+
+	return fn(ti, vc->dev, vc->start, ti->len, data);
+}
+
+static void verity_io_hints(struct dm_target *ti,
+			    struct queue_limits *limits)
+{
+	struct verity_config *vc = ti->private;
+	unsigned int block_size = vc->vt.block_size;
+
+	limits->logical_block_size = block_size;
+	limits->physical_block_size = block_size;
+	blk_limits_io_min(limits, block_size);
+}
+
+static struct target_type verity_target = {
+	.name   = "verity",
+	.version = {0, 1, 0},
+	.module = THIS_MODULE,
+	.ctr    = verity_ctr,
+	.dtr    = verity_dtr,
+	.ioctl  = verity_ioctl,
+	.map    = verity_map,
+	.merge  = verity_merge,
+	.status = verity_status,
+	.iterate_devices = verity_iterate_devices,
+	.io_hints = verity_io_hints,
+};
+
+#define VERITY_WQ_FLAGS (WQ_CPU_INTENSIVE|WQ_HIGHPRI)
+
+static int __init verity_init(void)
+{
+	int r = -ENOMEM;
+
+	_verity_io_pool = KMEM_CACHE(verity_io, 0);
+	if (!_verity_io_pool) {
+		DMERR("failed to allocate pool verity_io");
+		goto bad_io_pool;
+	}
+
+	kverityd_ioq = alloc_workqueue("kverityd_io", VERITY_WQ_FLAGS, 1);
+	if (!kverityd_ioq) {
+		DMERR("failed to create workqueue kverityd_ioq");
+		goto bad_io_queue;
+	}
+
+	kveritydq = alloc_workqueue("kverityd", VERITY_WQ_FLAGS, 1);
+	if (!kveritydq) {
+		DMERR("failed to create workqueue kveritydq");
+		goto bad_verify_queue;
+	}
+
+	r = dm_register_target(&verity_target);
+	if (r < 0) {
+		DMERR("register failed %d", r);
+		goto register_failed;
+	}
+
+	DMINFO("version %u.%u.%u loaded", verity_target.version[0],
+	       verity_target.version[1], verity_target.version[2]);
+
+	return r;
+
+register_failed:
+	destroy_workqueue(kveritydq);
+bad_verify_queue:
+	destroy_workqueue(kverityd_ioq);
+bad_io_queue:
+	kmem_cache_destroy(_verity_io_pool);
+bad_io_pool:
+	return r;
+}
+
+static void __exit verity_exit(void)
+{
+	int cpu;
+
+	flush_workqueue(kverityd_ioq);
+	flush_workqueue(kveritydq);
+	destroy_workqueue(kveritydq);
+	destroy_workqueue(kverityd_ioq);
+
+	for_each_possible_cpu(cpu)
+		kfree(per_cpu(verity_hash_desc, cpu));
+
+	dm_unregister_target(&verity_target);
+	kmem_cache_destroy(_verity_io_pool);
+}
+
+module_init(verity_init);
+module_exit(verity_exit);
+
+MODULE_AUTHOR("The Chromium OS Authors <chromium-os-dev@chromium.org>");
+MODULE_DESCRIPTION(DM_NAME " target for transparent disk integrity checking");
+MODULE_LICENSE("GPL");
-- 
1.7.7.3



* [PATCH] dm: verity target
@ 2012-03-02  0:33 Mandeep Singh Baines
  2012-03-02 16:08 ` Mandeep Singh Baines
  0 siblings, 1 reply; 22+ messages in thread
From: Mandeep Singh Baines @ 2012-03-02  0:33 UTC (permalink / raw)
  To: linux-kernel, dm-devel, Alasdair G Kergon
  Cc: Mandeep Singh Baines, Will Drewry, Elly Jones, Milan Broz,
	Olof Johansson, Steffen Klassert, Andrew Morton, Mikulas Patocka

The verity target provides transparent integrity checking of block devices
using a cryptographic digest.

dm-verity is meant to be set up as part of a verified boot path.  This
may be anything ranging from a boot using tboot or trustedgrub to just
booting from a known-good device (like a USB drive or CD).

dm-verity is part of ChromeOS's verified boot path. It is used to verify
the integrity of the root filesystem on boot. The root filesystem is
mounted on a dm-verity partition which transparently verifies each block
with a bootloader verified hash passed into the kernel at boot.

Changes in V5:
* https://lkml.org/lkml/2012/2/29/421 (Mikulas Patocka)
  * Fixed off-by-one error.
  * Added support for filesystems bigger than 4G (bug fix).
* https://lkml.org/lkml/2012/2/29/426 (Andrew Morton)
  * Fixed checkpatch errors/warning.
  * Made code cpu-hotplug-aware.
  * Remove NULL check before calling kfree.
  * No longer checking __GFP_WAIT allocations.
  * Propagate io->error instead of always EIO.
  * Remove unneeded and undesirable casts of void.
  * Use DMERR_LIMIT on io errors to avoid spamming dmesg.
  * Flush workqueue on rmmod.
Changes in V4:
* Discussion over phone (Alasdair G Kergon)
 * copy _ioctl fix from dm-linear
 * verity_status format fixes to match dm conventions
 * s/dm-bht/verity_tree
 * put everything into dm-verity.c
 * ctr changed to dm conventions
 * use hex2bin
 * use conventional dm names for function
  * s/dm_//
  * for example: verity_ctr versus dm_verity_ctr
 * use per_cpu API
Changes in V3:
* Discussion over irc (Alasdair G Kergon)
  * Implement ioctl hook
Changes in V2:
* https://lkml.org/lkml/2011/11/10/85 (Steffen Klassert)
  * Use shash API instead of older hash API

Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: Elly Jones <ellyjones@chromium.org>
Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Milan Broz <mbroz@redhat.com>
Cc: Olof Johansson <olofj@chromium.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: dm-devel@redhat.com
---
 Documentation/device-mapper/verity.txt |  149 ++++
 drivers/md/Kconfig                     |   16 +
 drivers/md/Makefile                    |    1 +
 drivers/md/dm-verity.c                 | 1367 ++++++++++++++++++++++++++++++++
 4 files changed, 1533 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/device-mapper/verity.txt
 create mode 100644 drivers/md/dm-verity.c

diff --git a/Documentation/device-mapper/verity.txt b/Documentation/device-mapper/verity.txt
new file mode 100644
index 0000000..b631f12
--- /dev/null
+++ b/Documentation/device-mapper/verity.txt
@@ -0,0 +1,149 @@
+dm-verity
+==========
+
+Device-Mapper's "verity" target provides transparent integrity checking of
+block devices using a cryptographic digest provided by the kernel crypto API.
+This target is read-only.
+
+Parameters:
+    <version> <dev> <hash_dev> <hash_start> <block_size> <alg> <digest> <salt>
+
+<version>
+    This is the version number of the on-disk format. Currently, there is
+    only version 0.
+
+<dev>
+    This is the device that is going to be integrity checked.  It may be
+    a subset of the full device as specified to dmsetup (start sector and count).
+    It may be specified as a path, like /dev/sdaX, or a device number,
+    <major>:<minor>.
+
+<hash_dev>
+    This is the device that supplies the hash tree data.  It may be
+    specified similarly to the device path and may be the same device.  If the
+    same device is used, the hash offset should be outside of the dm-verity
+    configured device size.
+
+<hash_start>
+    This is the offset, in 512-byte sectors, from the start of hash_dev to
+    the root block of the hash tree.
+
+<block_size>
+    The size of a hash block. Also, the size of a block to be hashed.
+
+<alg>
+    The cryptographic hash algorithm used for this device.  This should
+    be the name of the algorithm, like "sha1".
+
+<digest>
+    The hexadecimal encoding of the cryptographic hash of all of the
+    neighboring nodes at the first level of the tree.  This hash should be
+    trusted as there is no other authenticity check beyond this point.
+
+<salt>
+    The hexadecimal encoding of the salt value.
+
+Theory of operation
+===================
+
+dm-verity is meant to be set up as part of a verified boot path.  This
+may be anything ranging from a boot using tboot or trustedgrub to just
+booting from a known-good device (like a USB drive or CD).
+
+When a dm-verity device is configured, it is expected that the caller
+has been authenticated in some way (cryptographic signatures, etc).
+After instantiation, all hashes will be verified on-demand during
+disk access.  If they cannot be verified up to the root node of the
+tree, the root hash, then the I/O will fail.  This should identify
+tampering with any data on the device or with the hash data.
+
+Cryptographic hashes are used to assert the integrity of the device on a
+per-block basis.  This allows for a lightweight hash computation on first read
+into the page cache.  Block hashes are stored linearly, aligned to the
+nearest block boundary.
+
+Hash Tree
+---------
+
+Each node in the tree is a cryptographic hash.  If it is a leaf node, the hash
+is of some block data on disk.  If it is an intermediary node, then the hash is
+of a number of child nodes.
+
+Each entry in the tree is a collection of neighboring nodes that fit in one
+block.  The number is determined based on block_size and the size of the
+selected cryptographic digest algorithm.  The hashes are linearly ordered in
+this entry and any unaligned trailing space is ignored but included when
+calculating the parent node.
+
+The tree looks something like:
+
+alg = sha256, num_blocks = 32768, block_size = 4096
+
+                                 [   root    ]
+                                /    . . .    \
+                     [entry_0]                 [entry_1]
+                    /  . . .  \                 . . .   \
+         [entry_0_0]   . . .  [entry_0_127]    . . . .  [entry_1_127]
+           / ... \             /   . . .  \             /           \
+     blk_0 ... blk_127  blk_16256   blk_16383      blk_32640 . . . blk_32767
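+
+Working through this example: each entry holds 4096 / 32 = 128 sha256
+hashes, so node_count is 128 and each leaf entry covers 128 data
+blocks.  The depth below the root is ceil(log2(32768) / log2(128)) =
+ceil(15 / 7) = 3, giving 1 entry at level 0, 2 entries at level 1, and
+256 entries at level 2.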
+
+On-disk format
+==============
+
+Below is the recommended on-disk format. The verity kernel code does not
+read the on-disk header. It only reads the hash blocks which directly
+follow the header. It is expected that a user-space tool will verify the
+integrity of the verity_header and then call dmsetup with the correct
+parameters. Alternatively, the header can be omitted and the dmsetup
+parameters can be passed via the kernel command-line in a rooted chain
+of trust where the command-line is verified.
+
+The on-disk format is especially useful in cases where the hash blocks
+are on a separate partition. The magic number allows easy identification
+of the partition contents. Alternatively, the hash blocks can be stored
+in the same partition as the data to be verified. In such a configuration
+the filesystem on the partition would be sized a little smaller than
+the full partition, leaving room for the hash blocks.
+
+struct verity_header {
+       uint64_t magic = 0x7665726974790a00;
+       uint32_t version;
+       uint32_t block_size;
+       char digest[128]; /* in hex-ascii, null-terminated or 128-bytes */
+       char salt[128]; /* in hex-ascii, null-terminated or 128-bytes */
+}
+
+struct verity_header_block {
+	struct verity_header;
+	char unused[block_size - sizeof(struct verity_header) - sizeof(sig)];
+	char sig[128]; /* in hex-ascii, null-terminated or 128-bytes */
+}
+
+Directly following the header are the hash blocks which are stored a depth
+at a time (starting from the root), sorted in order of increasing index.
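+
+For the example tree above, the hash area therefore contains
+1 + 2 + 256 = 259 hash blocks: block 0 holds level 0, blocks 1-2 hold
+level 1, and blocks 3-258 hold level 2.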
+
+Usage
+=====
+
+The API provides mechanisms for reading and verifying a tree. When reading, all
+required data for the hash tree should be populated for a block before
+attempting a verify.  This can be done by calling verity_tree_populate().  When
+all data is ready, a call to verity_tree_verify_block() will perform both the
+direct block hash check and the hashes of the parent and neighboring nodes
+where needed to ensure validity up to the root hash.  Note,
+verity_tree_set_digest() should be called before any verification attempts
+occur.
+
+Example
+=======
+
+Set up a device (the table line follows the parameter order documented
+above; the digest and salt are illustrative):
+[[
+  dmsetup create vroot --table \
+    "0 204800 verity 0 /dev/sda1 /dev/sda2 0 4096 sha1 "\
+    "9f74809a2ee7607b16fcc70d9399a4de9725a727 "\
+    "1234000000000000000000000000000000000000000000000000000000000000"
+]]
+
+A command line tool is available to compute the hash tree and return the
+root hash value.
+  http://git.chromium.org/cgi-bin/gitweb.cgi?p=dm-verity.git;a=tree
diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index faa4741..b8bb690 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -370,4 +370,20 @@ config DM_FLAKEY
        ---help---
          A target that intermittently fails I/O for debugging purposes.
 
+config DM_VERITY
+        tristate "Verity target support"
+        depends on BLK_DEV_DM
+        select CRYPTO
+        select CRYPTO_HASH
+        ---help---
+          This device-mapper target allows you to create a device that
+          transparently integrity checks the data on it. You'll need to
+          activate the digests you're going to use in the cryptoapi
+          configuration.
+
+          To compile this code as a module, choose M here: the module will
+          be called dm-verity.
+
+          If unsure, say N.
+
 endif # MD
diff --git a/drivers/md/Makefile b/drivers/md/Makefile
index 046860c..70a29af 100644
--- a/drivers/md/Makefile
+++ b/drivers/md/Makefile
@@ -39,6 +39,7 @@ obj-$(CONFIG_DM_SNAPSHOT)	+= dm-snapshot.o
 obj-$(CONFIG_DM_PERSISTENT_DATA)	+= persistent-data/
 obj-$(CONFIG_DM_MIRROR)		+= dm-mirror.o dm-log.o dm-region-hash.o
 obj-$(CONFIG_DM_LOG_USERSPACE)	+= dm-log-userspace.o
+obj-$(CONFIG_DM_VERITY)         += dm-verity.o
 obj-$(CONFIG_DM_ZERO)		+= dm-zero.o
 obj-$(CONFIG_DM_RAID)	+= dm-raid.o
 obj-$(CONFIG_DM_THIN_PROVISIONING)	+= dm-thin-pool.o
diff --git a/drivers/md/dm-verity.c b/drivers/md/dm-verity.c
new file mode 100644
index 0000000..b72f350
--- /dev/null
+++ b/drivers/md/dm-verity.c
@@ -0,0 +1,1367 @@
+/*
+ * Originally based on dm-crypt.c,
+ * Copyright (C) 2003 Christophe Saout <christophe@saout.de>
+ * Copyright (C) 2004 Clemens Fruhwirth <clemens@endorphin.org>
+ * Copyright (C) 2006-2008 Red Hat, Inc. All rights reserved.
+ * Copyright (C) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ *                    All Rights Reserved.
+ *
+ * This file is released under the GPLv2.
+ *
+ * Implements a verifying transparent block device.
+ * See Documentation/device-mapper/verity.txt
+ */
+#include <crypto/hash.h>
+#include <linux/atomic.h>
+#include <linux/bio.h>
+#include <linux/blkdev.h>
+#include <linux/genhd.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/mempool.h>
+#include <linux/module.h>
+#include <linux/percpu.h>
+#include <linux/workqueue.h>
+#include <linux/device-mapper.h>
+
+
+#define DM_MSG_PREFIX "verity"
+
+
+/* Helper for printing sector_t */
+#define ULL(x) ((unsigned long long)(x))
+
+#define MIN_IOS 32
+#define MIN_BIOS (MIN_IOS * 2)
+
+/* To avoid allocating memory for digest tests, we just set up a
+ * max to use for now.
+ */
+#define VERITY_MAX_DIGEST_SIZE 64   /* Supports up to 512-bit digests */
+#define VERITY_SALT_SIZE       32   /* 256 bits of salt is a lot */
+
+/* UNALLOCATED, PENDING, READY, and VERIFIED are valid states. All other
+ * values are entry-related return codes.
+ */
+#define VERITY_TREE_ENTRY_VERIFIED 8  /* 'nodes' checked against parent */
+#define VERITY_TREE_ENTRY_READY 4  /* 'nodes' is loaded and available */
+#define VERITY_TREE_ENTRY_PENDING 2  /* 'nodes' is being loaded */
+#define VERITY_TREE_ENTRY_UNALLOCATED 0 /* untouched */
+#define VERITY_TREE_ENTRY_ERROR -1 /* entry is unsuitable for use */
+#define VERITY_TREE_ENTRY_ERROR_IO -2 /* I/O error on load */
+
+/* Additional possible return codes */
+#define VERITY_TREE_ENTRY_ERROR_MISMATCH -3 /* Digest mismatch */
+
+
+struct verity_io {
+	struct dm_target *target;
+	struct bio *bio;
+	struct delayed_work work;
+	unsigned int flags;
+
+	int error;
+	atomic_t pending;
+
+	u64 block;  /* aligned block index */
+	u64 count;  /* aligned count in blocks */
+};
+
+/* verity_tree_entry
+ * Contains verity_tree->node_count tree nodes at a given tree depth.
+ * state is used to transactionally assure that data is paged in
+ * from disk.  Since verity_tree does not keep running crypto contexts
+ * for each level, the data must be loaded for on-demand verification.
+ */
+struct verity_tree_entry {
+	atomic_t state; /* see defines */
+	/* Keeping an extra pointer per entry wastes up to ~33k of
+	 * memory if 1M blocks are used (~66k on a 64-bit arch).
+	 */
+	struct verity_io *io_context;  /* Reserve a pointer for use during io */
+	/* nodes should only be non-NULL if fully populated. */
+	void *nodes;  /* The hash data used to verify the children.
+		       * Never crosses a page boundary.
+		       */
+};
+
+/* verity_tree_level
+ * Contains an array of entries which represent a page of hashes where
+ * each hash is a node in the tree at the given tree depth/level.
+ */
+struct verity_tree_level {
+	struct verity_tree_entry *entries;  /* array of entries of tree nodes */
+	unsigned int count;  /* number of entries at this level */
+	sector_t sector;  /* starting sector for this level */
+};
+
+/* opaque context, start, databuf, sector_count */
+typedef int(*verity_tree_callback)(void *,  /* external context */
+			      sector_t,  /* start sector */
+			      u8 *,  /* destination page */
+			      sector_t,  /* num sectors */
+			      struct verity_tree_entry *);
+/* verity_tree - Device mapper block hash tree
+ * verity_tree provides a fixed interface for comparing data blocks
+ * against cryptographic hashes stored in a hash tree. It
+ * optimizes the tree structure for storage on disk.
+ *
+ * The tree is built from the bottom up.  A collection of data,
+ * external to the tree, is hashed and these hashes are stored
+ * as the blocks in the tree.  For some number of these hashes,
+ * a parent node is created by hashing them.  These steps are
+ * repeated.
+ */
+struct verity_tree {
+	/* Configured values */
+	int depth;  /* Depth of the tree including the root */
+	unsigned int block_size;  /* Size of a hash block */
+	u64 block_count;  /* Number of blocks hashed */
+	char hash_alg[CRYPTO_MAX_ALG_NAME];
+	u8 salt[VERITY_SALT_SIZE];
+
+	/* Computed values */
+	unsigned int node_count;  /* Data size (in hashes) for each entry */
+	unsigned int node_count_shift;  /* first bit set - 1 */
+	struct crypto_shash *tfm; /* hash for this device */
+	unsigned int hash_desc_size;
+	sector_t sectors;  /* Number of disk sectors used */
+	u8 digest[VERITY_MAX_DIGEST_SIZE];
+	unsigned int digest_size;
+
+	struct verity_tree_level *levels;
+
+	/* Callback for reading from the hash device */
+	verity_tree_callback read_cb;
+};
+
+/* per-requested-bio private data */
+enum verity_io_flags {
+	VERITY_IOFLAGS_CLONED = 0x1,	/* original bio has been cloned */
+};
+
+struct verity_config {
+	struct dm_dev *dev;
+	sector_t start;
+	sector_t size;
+
+	struct dm_dev *hash_dev;
+	sector_t hash_start;
+
+	struct verity_tree vt;
+
+	/* Pool required for io contexts */
+	mempool_t *io_pool;
+	/* Pool and bios required for making sure that backing device reads are
+	 * in PAGE_SIZE increments.
+	 */
+	struct bio_set *bs;
+
+	char hash_alg[CRYPTO_MAX_ALG_NAME];
+};
+
+
+static struct kmem_cache *_verity_io_pool;
+static struct workqueue_struct *kveritydq, *kverityd_ioq;
+
+static DEFINE_PER_CPU(struct shash_desc *, verity_hash_desc);
+static DEFINE_PER_CPU(unsigned int, verity_hash_size);
+
+static void kverityd_verify(struct work_struct *work);
+static void kverityd_io(struct work_struct *work);
+static void kverityd_io_vt_populate(struct verity_io *io);
+static void kverityd_io_vt_populate_end(struct bio *, int error);
+
+
+/*
+ * Utilities
+ */
+
+static void bin2hex(char *dst, const u8 *src, size_t count)
+{
+	while (count-- > 0) {
+		sprintf(dst, "%02hhx", (int)*src);
+		dst += 2;
+		src++;
+	}
+}
+
+/*
+ * Verity Tree
+ */
+
+/* Functions for converting indices to nodes. */
+
+static inline unsigned int verity_tree_get_level_shift(struct verity_tree *vt,
+						  int depth)
+{
+	return (vt->depth - depth) * vt->node_count_shift;
+}
+
+/* For the given depth, this is the entry index.  At depth+1 it is the node
+ * index for depth.
+ */
+static inline u64 verity_tree_index_at_level(struct verity_tree *vt,
+					     int depth, u64 leaf)
+{
+	return leaf >> verity_tree_get_level_shift(vt, depth);
+}
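+
+/* Example: with depth = 3 and node_count_shift = 7, leaf block 300
+ * maps to entry index 2 at depth 2 (300 >> 7), index 0 at depth 1
+ * (300 >> 14), and index 0 at depth 0 (300 >> 21).
+ */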
+
+static inline struct verity_tree_entry *verity_tree_get_entry(
+		struct verity_tree *vt,
+		int depth, u64 block)
+{
+	u64 index = verity_tree_index_at_level(vt, depth, block);
+	struct verity_tree_level *level = &vt->levels[depth];
+
+	return &level->entries[index];
+}
+
+static inline void *verity_tree_get_node(struct verity_tree *vt,
+					 struct verity_tree_entry *entry,
+					 int depth, unsigned int block)
+{
+	u64 index = verity_tree_index_at_level(vt, depth, block);
+	unsigned int node_index = (unsigned int)index % vt->node_count;
+
+	return entry->nodes + (node_index * vt->digest_size);
+}
+
+/**
+ * verity_tree_compute_hash: hashes one block of data followed by the salt
+ */
+static int verity_tree_compute_hash(struct verity_tree *vt, struct page *pg,
+				    unsigned int offset, u8 *digest)
+{
+	struct shash_desc **hash_descp = &__get_cpu_var(verity_hash_desc);
+	unsigned int *hash_sizep = &__get_cpu_var(verity_hash_size);
+	struct shash_desc *hash_desc;
+	void *data;
+	int err;
+
+	if (!*hash_descp || *hash_sizep < vt->hash_desc_size) {
+		kfree(*hash_descp);
+		*hash_descp = kmalloc(vt->hash_desc_size, GFP_KERNEL);
+		if (!*hash_descp) {
+			*hash_sizep = 0;
+			return -ENOMEM;
+		}
+		*hash_sizep = vt->hash_desc_size;
+	}
+	hash_desc = *hash_descp;
+	hash_desc->tfm = vt->tfm;
+	hash_desc->flags = 0x0;
+
+	if (crypto_shash_init(hash_desc)) {
+		DMCRIT("failed to reinitialize crypto hash (proc:%d)",
+			smp_processor_id());
+		return -EINVAL;
+	}
+	data = kmap_atomic(pg);
+	err = crypto_shash_update(hash_desc, data + offset, vt->block_size);
+	kunmap_atomic(data);
+	if (err) {
+		DMCRIT("crypto_hash_update failed");
+		return -EINVAL;
+	}
+	if (crypto_shash_update(hash_desc, vt->salt, sizeof(vt->salt))) {
+		DMCRIT("crypto_hash_update failed");
+		return -EINVAL;
+	}
+	if (crypto_shash_final(hash_desc, digest)) {
+		DMCRIT("crypto_hash_final failed");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int verity_tree_initialize_entries(struct verity_tree *vt)
+{
+	/* last represents the index of the last digest store in the tree.
+	 * By walking the tree with that index, it is possible to compute the
+	 * total number of entries at each level.
+	 *
+	 * Since each entry will contain up to |node_count| nodes of the tree,
+	 * it is possible that the last index may not be at the end of a given
+	 * entry->nodes.  In that case, it is assumed the value is padded.
+	 *
+	 * Note, we treat both the tree root (1 hash) and the tree leaves
+	 * independently from the vt data structures.  Logically, the root is
+	 * depth=-1 and the block layer level is depth=vt->depth
+	 */
+	u64 last = vt->block_count - 1;
+	int depth;
+
+	/* check that the largest level->count can't result in an int overflow
+	 * on allocation or sector calculation.
+	 */
+	if (((last >> vt->node_count_shift) + 1) >
+	    UINT_MAX / max_t(unsigned long,
+			     sizeof(struct verity_tree_entry),
+			     (unsigned long)to_sector(vt->block_size))) {
+		DMCRIT("required entries %llu is too large", vt->block_count);
+		return -EINVAL;
+	}
+
+	/* Track the current sector location for each level so we don't have to
+	 * compute it during traversals.
+	 */
+	vt->sectors = 0;
+	for (depth = 0; depth < vt->depth; ++depth) {
+		struct verity_tree_level *level = &vt->levels[depth];
+
+		level->count = verity_tree_index_at_level(vt, depth, last) + 1;
+		level->entries = kcalloc(level->count,
+					 sizeof(struct verity_tree_entry),
+					 GFP_KERNEL);
+		if (!level->entries) {
+			DMERR("failed to allocate entries for depth %d", depth);
+			return -ENOMEM;
+		}
+		level->sector = vt->sectors;
+		vt->sectors += level->count * to_sector(vt->block_size);
+	}
+
+	return 0;
+}
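+
+/* For the example in Documentation/device-mapper/verity.txt (32768
+ * data blocks, sha256, 4096-byte blocks), the levels land back-to-back
+ * on the hash device: level 0 at sector 0 (1 block), level 1 at
+ * sector 8 (2 blocks), and level 2 at sector 24 (256 blocks).
+ */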
+
+/**
+ * verity_tree_create - prepares @vt for use
+ * @vt:	          pointer to the verity_tree to initialize
+ * @block_count:  the number of block hashes / tree leaves
+ * @block_size:   size, in bytes, of a hash block
+ * @alg_name:	  crypto hash algorithm name
+ *
+ * Returns 0 on success.
+ *
+ * Callers can offset into devices by storing the data in the io callbacks.
+ */
+static int verity_tree_create(struct verity_tree *vt, u64 block_count,
+			      unsigned int block_size, const char *alg_name)
+{
+	int status = 0;
+
+	vt->block_size = block_size;
+	/* Verify that SECTOR_SIZE <= block_size <= PAGE_SIZE and that
+	 * block_size evenly divides PAGE_SIZE.
+	 */
+	if ((block_size > PAGE_SIZE) ||
+	    (PAGE_SIZE % block_size) ||
+	    (to_sector(block_size) == 0))
+		return -EINVAL;
+
+	vt->tfm = crypto_alloc_shash(alg_name, 0, 0);
+	if (IS_ERR(vt->tfm)) {
+		DMERR("failed to allocate crypto hash '%s'", alg_name);
+		return -ENOMEM;
+	}
+	vt->hash_desc_size = sizeof(struct shash_desc) +
+		crypto_shash_descsize(vt->tfm);
+
+	vt->digest_size = crypto_shash_digestsize(vt->tfm);
+	/* We expect to be able to pack >=2 hashes into a block */
+	if (block_size / vt->digest_size < 2) {
+		DMERR("too few hashes fit in a block");
+		status = -EINVAL;
+		goto bad_arg;
+	}
+
+	if (vt->digest_size > VERITY_MAX_DIGEST_SIZE) {
+		DMERR("VERITY_MAX_DIGEST_SIZE too small for digest");
+		status = -EINVAL;
+		goto bad_arg;
+	}
+
+	/* Configure the tree */
+	vt->block_count = block_count;
+	if (block_count == 0) {
+		DMERR("block_count must be non-zero");
+		status = -EINVAL;
+		goto bad_arg;
+	}
+
+	/* Each verity_tree_entry->nodes is one block.  The node code tracks
+	 * how many nodes fit into one entry where a node is a single
+	 * hash (message digest).
+	 */
+	vt->node_count_shift = fls(block_size / vt->digest_size) - 1;
+	/* Round down to the nearest power of two.  This makes indexing
+	 * into the tree much less painful.
+	 */
+	vt->node_count = 1 << vt->node_count_shift;
+
+	/* This is unlikely to happen, but with 64k pages, who knows. */
+	if (vt->node_count > UINT_MAX / vt->digest_size) {
+		DMERR("node_count * hash_len exceeds UINT_MAX!");
+		status = -EINVAL;
+		goto bad_arg;
+	}
+
+	vt->depth = DIV_ROUND_UP(fls64(block_count - 1), vt->node_count_shift);
+
+	/* Ensure that we can safely shift by this value. */
+	if (vt->depth * vt->node_count_shift >= sizeof(unsigned int) * 8) {
+		DMERR("specified depth and node_count_shift is too large");
+		status = -EINVAL;
+		goto bad_arg;
+	}
+
+	/* Allocate levels. Each level of the tree may have an arbitrary number
+	 * of verity_tree_entry structs.  Each entry contains node_count nodes.
+	 * Each node in the tree is a cryptographic digest of either node_count
+	 * nodes on the subsequent level or of a specific block on disk.
+	 */
+	vt->levels = kcalloc(vt->depth,
+			     sizeof(struct verity_tree_level), GFP_KERNEL);
+	if (!vt->levels) {
+		status = -ENOMEM;
+		goto bad_arg;
+	}
+
+	vt->read_cb = NULL;
+
+	status = verity_tree_initialize_entries(vt);
+	if (status)
+		goto bad_entries_alloc;
+
+	/* We compute depth such that there is only 1 block at level 0. */
+	BUG_ON(vt->levels[0].count != 1);
+
+	return 0;
+
+bad_entries_alloc:
+	while (vt->depth-- > 0)
+		kfree(vt->levels[vt->depth].entries);
+	kfree(vt->levels);
+bad_arg:
+	crypto_free_shash(vt->tfm);
+	return status;
+}
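+
+/* Sketch of the expected call sequence (see verity_ctr() below for
+ * the real thing; digest_hex and salt_hex stand in for caller-supplied
+ * hex strings):
+ *
+ *	struct verity_tree vt;
+ *
+ *	if (verity_tree_create(&vt, blocks, 4096, "sha256") ||
+ *	    verity_tree_set_digest(&vt, digest_hex) ||
+ *	    verity_tree_set_salt(&vt, salt_hex))
+ *		return -EINVAL;
+ *	vt.read_cb = kverityd_vt_read_callback;
+ */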
+
+/**
+ * verity_tree_read_completed
+ * @entry:   pointer to the entry that's been loaded
+ * @status:  I/O status. Non-zero is failure.
+ * MUST always be called after a read_cb completes.
+ */
+static void verity_tree_read_completed(struct verity_tree_entry *entry,
+				       int status)
+{
+	if (status) {
+		DMCRIT("an I/O error occurred while reading entry");
+		atomic_set(&entry->state, VERITY_TREE_ENTRY_ERROR_IO);
+		return;
+	}
+	BUG_ON(atomic_read(&entry->state) != VERITY_TREE_ENTRY_PENDING);
+	atomic_set(&entry->state, VERITY_TREE_ENTRY_READY);
+}
+
+/**
+ * verity_tree_verify_block - checks that all path nodes for @block are valid
+ * @vt:	     pointer to a verity_tree_create()d vt
+ * @block:   specific block data is expected from
+ * @pg:	     page holding the block data
+ * @offset:  offset into the page
+ *
+ * Returns 0 on success, VERITY_TREE_ENTRY_ERROR_MISMATCH on error.
+ */
+static int verity_tree_verify_block(struct verity_tree *vt, unsigned int block,
+				    struct page *pg, unsigned int offset)
+{
+	int state, depth = vt->depth;
+	u8 digest[VERITY_MAX_DIGEST_SIZE];
+	struct verity_tree_entry *entry;
+	void *node;
+
+	do {
+		/* Need to check that the hash of the current block is accurate
+		 * in its parent.
+		 */
+		entry = verity_tree_get_entry(vt, depth - 1, block);
+		state = atomic_read(&entry->state);
+		/* This call is only safe if all nodes along the path
+		 * are already populated (i.e. READY) via verity_tree_populate.
+		 */
+		BUG_ON(state < VERITY_TREE_ENTRY_READY);
+		node = verity_tree_get_node(vt, entry, depth, block);
+
+		if (verity_tree_compute_hash(vt, pg, offset, digest) ||
+		    memcmp(digest, node, vt->digest_size))
+			goto mismatch;
+
+		/* Keep the containing block of hashes to be verified in the
+		 * next pass.
+		 */
+		pg = virt_to_page(entry->nodes);
+		offset = offset_in_page(entry->nodes);
+	} while (--depth > 0 && state != VERITY_TREE_ENTRY_VERIFIED);
+
+	if (depth == 0 && state != VERITY_TREE_ENTRY_VERIFIED) {
+		if (verity_tree_compute_hash(vt, pg, offset, digest) ||
+		    memcmp(digest, vt->digest, vt->digest_size))
+			goto mismatch;
+		atomic_set(&entry->state, VERITY_TREE_ENTRY_VERIFIED);
+	}
+
+	/* Mark path to leaf as verified. */
+	for (depth++; depth < vt->depth; depth++) {
+		entry = verity_tree_get_entry(vt, depth, block);
+		/* At this point, entry can only be in VERIFIED or READY state.
+		 * So it is safe to use atomic_set instead of atomic_cmpxchg.
+		 */
+		atomic_set(&entry->state, VERITY_TREE_ENTRY_VERIFIED);
+	}
+
+	return 0;
+
+mismatch:
+	DMERR_LIMIT("verify_path: failed to verify hash (d=%d,bi=%u)",
+		    depth, block);
+	return VERITY_TREE_ENTRY_ERROR_MISMATCH;
+}
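+
+/* Example walk for vt->depth == 3: the data block is hashed and
+ * checked against its node in the depth-2 entry; that entry's block
+ * is then hashed and checked at depth 1; and so on until a VERIFIED
+ * entry or the root digest terminates the climb.
+ */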
+
+/**
+ * verity_tree_is_populated - check that nodes needed to verify a given
+ *                            block are all ready
+ * @vt:	    pointer to a verity_tree_create()d vt
+ * @block:  specific block data is expected from
+ *
+ * Callers may wish to call verity_tree_is_populated() when checking an io
+ * for which entries were already pending.
+ */
+static bool verity_tree_is_populated(struct verity_tree *vt, unsigned int block)
+{
+	int depth;
+
+	for (depth = vt->depth - 1; depth >= 0; depth--) {
+		struct verity_tree_entry *entry;
+		entry = verity_tree_get_entry(vt, depth, block);
+		if (atomic_read(&entry->state) < VERITY_TREE_ENTRY_READY)
+			return false;
+	}
+
+	return true;
+}
+
+/**
+ * verity_tree_populate - reads entries from disk needed to verify a given block
+ * @vt:     pointer to a verity_tree_create()d vt
+ * @ctx:    context used for all read_cb calls on this request
+ * @block:  specific block data is expected from
+ *
+ * Returns negative value on error. Returns 0 on success.
+ */
+static int verity_tree_populate(struct verity_tree *vt, void *ctx,
+				unsigned int block)
+{
+	int depth, state;
+
+	BUG_ON(block >= vt->block_count);
+
+	for (depth = vt->depth - 1; depth >= 0; --depth) {
+		struct verity_tree_level *level;
+		struct verity_tree_entry *entry;
+		u64 index;
+
+		index = verity_tree_index_at_level(vt, depth, block);
+		level = &vt->levels[depth];
+		entry = verity_tree_get_entry(vt, depth, block);
+		state = atomic_cmpxchg(&entry->state,
+				       VERITY_TREE_ENTRY_UNALLOCATED,
+				       VERITY_TREE_ENTRY_PENDING);
+		if (state == VERITY_TREE_ENTRY_VERIFIED)
+			break;
+		if (state <= VERITY_TREE_ENTRY_ERROR)
+			goto error_state;
+		if (state != VERITY_TREE_ENTRY_UNALLOCATED)
+			continue;
+
+		/* Current entry is claimed for allocation and loading */
+		entry->nodes = kmalloc(vt->block_size, GFP_NOIO);
+		if (!entry->nodes) {
+			atomic_set(&entry->state, VERITY_TREE_ENTRY_ERROR);
+			goto error_state;
+		}
+
+		vt->read_cb(ctx,
+			    level->sector + to_sector(index * vt->block_size),
+			    entry->nodes, to_sector(vt->block_size), entry);
+	}
+
+	return 0;
+
+error_state:
+	DMCRIT("block %u at depth %d is in an error state", block, depth);
+	return -EPERM;
+}
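+
+/* Usage sketch: callers first claim and load every entry on a block's
+ * path with verity_tree_populate(); once verity_tree_is_populated()
+ * reports them all READY, verity_tree_verify_block() may be called.
+ * verity_dec_pending() below sequences these two phases.
+ */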
+
+/**
+ * verity_tree_destroy - cleans up all memory used by @vt
+ * @vt:	 pointer to a verity_tree_create()d vt
+ */
+static void verity_tree_destroy(struct verity_tree *vt)
+{
+	int depth;
+
+	for (depth = 0; depth < vt->depth; depth++) {
+		struct verity_tree_entry *entry = vt->levels[depth].entries;
+		struct verity_tree_entry *entry_end = entry +
+			vt->levels[depth].count;
+		for (; entry < entry_end; ++entry)
+			kfree(entry->nodes);
+		kfree(vt->levels[depth].entries);
+	}
+	kfree(vt->levels);
+	crypto_free_shash(vt->tfm);
+}
+
+/*
+ * Verity Tree Accessors
+ */
+
+/**
+ * verity_tree_set_digest - sets an unverified root digest hash from hex
+ * @vt:	     pointer to a verity_tree_create()d vt
+ * @digest:  string containing the digest in hex
+ * Returns non-zero on error.
+ */
+static int verity_tree_set_digest(struct verity_tree *vt, const char *digest)
+{
+	/* Make sure we have at least the bytes expected */
+	if (strnlen(digest, vt->digest_size * 2) != vt->digest_size * 2) {
+		DMERR("root digest length does not match hash algorithm");
+		return -1;
+	}
+	return hex2bin(vt->digest, digest, vt->digest_size);
+}
+
+/**
+ * verity_tree_digest - returns root digest in hex
+ * @vt:	     pointer to a verity_tree_create()d vt
+ * @digest:  buffer to put the hex digest into; must hold at least
+ *           VERITY_MAX_DIGEST_SIZE * 2 + 1 bytes.
+ */
+static int verity_tree_digest(struct verity_tree *vt, char *digest)
+{
+	bin2hex(digest, vt->digest, vt->digest_size);
+	return 0;
+}
+
+/**
+ * verity_tree_set_salt - sets the salt
+ * @vt:    pointer to a verity_tree_create()d vt
+ * @salt:  string containing the salt in hex
+ * Returns non-zero on error.
+ */
+static int verity_tree_set_salt(struct verity_tree *vt, const char *salt)
+{
+	size_t saltlen = min(strlen(salt) / 2, sizeof(vt->salt));
+
+	memset(vt->salt, 0, sizeof(vt->salt));
+	return hex2bin(vt->salt, salt, saltlen);
+}
+
+
+/**
+ * verity_tree_salt - returns the salt in hex
+ * @vt:    pointer to a verity_tree_create()d vt
+ * @salt:  buffer to put salt into, of length VERITY_SALT_SIZE * 2 + 1.
+ */
+static int verity_tree_salt(struct verity_tree *vt, char *salt)
+{
+	bin2hex(salt, vt->salt, sizeof(vt->salt));
+	return 0;
+}
+
+/*
+ * Allocation and utility functions
+ */
+
+static void kverityd_src_io_read_end(struct bio *clone, int error);
+
+/* Shared destructor for all internal bios */
+static void verity_bio_destructor(struct bio *bio)
+{
+	struct verity_io *io = bio->bi_private;
+	struct verity_config *vc = io->target->private;
+	bio_free(bio, vc->bs);
+}
+
+static struct bio *verity_alloc_bioset(struct verity_config *vc, gfp_t gfp_mask,
+				       int nr_iovecs)
+{
+	return bio_alloc_bioset(gfp_mask, nr_iovecs, vc->bs);
+}
+
+static struct verity_io *verity_io_alloc(struct dm_target *ti,
+					    struct bio *bio)
+{
+	struct verity_config *vc = ti->private;
+	sector_t sector = bio->bi_sector - ti->begin;
+	struct verity_io *io;
+	u64 tmp;
+
+	io = mempool_alloc(vc->io_pool, GFP_NOIO);
+	io->flags = 0;
+	io->target = ti;
+	io->bio = bio;
+	io->error = 0;
+
+	/* Adjust the sector by the virtual starting sector */
+	tmp = (u64)to_bytes(1) * sector;
+	do_div(tmp, vc->vt.block_size);
+	io->block = tmp;
+	io->count = bio->bi_size / vc->vt.block_size;
+
+	atomic_set(&io->pending, 0);
+
+	return io;
+}
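+
+/* Example: with a 4096-byte block_size, a bio at relative sector 264
+ * with bi_size 8192 yields io->block = 264 * 512 / 4096 = 33 and
+ * io->count = 8192 / 4096 = 2.
+ */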
+
+static struct bio *verity_bio_clone(struct verity_io *io)
+{
+	struct verity_config *vc = io->target->private;
+	struct bio *bio = io->bio;
+	struct bio *clone = verity_alloc_bioset(vc, GFP_NOIO, bio->bi_max_vecs);
+
+	__bio_clone(clone, bio);
+	clone->bi_private = io;
+	clone->bi_end_io  = kverityd_src_io_read_end;
+	clone->bi_bdev    = vc->dev->bdev;
+	clone->bi_sector += vc->start - io->target->begin;
+	clone->bi_destructor = verity_bio_destructor;
+
+	return clone;
+}
+
+/*
+ * Reverse flow of requests into the device.
+ *
+ * (Start at the bottom with verity_map and work your way upward).
+ */
+
+static void verity_inc_pending(struct verity_io *io);
+
+static void verity_return_bio_to_caller(struct verity_io *io)
+{
+	struct verity_config *vc = io->target->private;
+
+	bio_endio(io->bio, io->error);
+	mempool_free(io, vc->io_pool);
+}
+
+/* Check for any missing vt hashes. */
+static bool verity_is_vt_populated(struct verity_io *io)
+{
+	struct verity_config *vc = io->target->private;
+	u64 block;
+
+	for (block = io->block; block < io->block + io->count; ++block)
+		if (!verity_tree_is_populated(&vc->vt, block))
+			return false;
+
+	return true;
+}
+
+/* verity_dec_pending manages the lifetime of all verity_io structs.
+ * Non-bug error handling is centralized through this interface, as is
+ * all passage from workqueue to workqueue.
+ */
+static void verity_dec_pending(struct verity_io *io)
+{
+	if (!atomic_dec_and_test(&io->pending))
+		goto done;
+
+	if (unlikely(io->error))
+		goto io_error;
+
+	/* I/Os that were pending may now be ready */
+	if (verity_is_vt_populated(io)) {
+		INIT_DELAYED_WORK(&io->work, kverityd_verify);
+		queue_delayed_work(kveritydq, &io->work, 0);
+	} else {
+		INIT_DELAYED_WORK(&io->work, kverityd_io);
+		queue_delayed_work(kverityd_ioq, &io->work, HZ/10);
+	}
+
+done:
+	return;
+
+io_error:
+	verity_return_bio_to_caller(io);
+}
+
+/* Walks the data set and computes the hash of the data read from the
+ * untrusted source device.  The computed hash is then passed to verity-tree
+ * for verification.
+ */
+static int verity_verify(struct verity_config *vc,
+			 struct verity_io *io)
+{
+	unsigned int block_size = vc->vt.block_size;
+	struct bio *bio = io->bio;
+	u64 block = io->block;
+	unsigned int idx;
+	int r;
+
+	for (idx = bio->bi_idx; idx < bio->bi_vcnt; idx++) {
+		struct bio_vec *bv = bio_iovec_idx(bio, idx);
+		unsigned int offset = bv->bv_offset;
+		unsigned int len = bv->bv_len;
+
+		BUG_ON(offset % block_size);
+		BUG_ON(len % block_size);
+
+		while (len) {
+			r = verity_tree_verify_block(&vc->vt, block,
+						bv->bv_page, offset);
+			if (r)
+				goto bad_return;
+
+			offset += block_size;
+			len -= block_size;
+			block++;
+			cond_resched();
+		}
+	}
+
+	return 0;
+
+bad_return:
+	/* verity_tree functions aren't expected to return errno-friendly
+	 * values.  They are converted here for uniformity.
+	 */
+	if (r > 0) {
+		DMERR("Pending data for block %llu seen at verify", ULL(block));
+		r = -EBUSY;
+	} else {
+		DMERR_LIMIT("Block hash does not match!");
+		r = -EACCES;
+	}
+	return r;
+}
+
+/* Services the verify workqueue */
+static void kverityd_verify(struct work_struct *work)
+{
+	struct delayed_work *dwork = container_of(work, struct delayed_work,
+						  work);
+	struct verity_io *io = container_of(dwork, struct verity_io,
+					    work);
+	struct verity_config *vc = io->target->private;
+
+	io->error = verity_verify(vc, io);
+
+	/* Free up the bio and tag with the return value */
+	verity_return_bio_to_caller(io);
+}
+
+/* Asynchronously called upon the completion of verity-tree I/O. The status
+ * of the operation is passed back to verity-tree and the next steps are
+ * decided by verity_dec_pending.
+ */
+static void kverityd_io_vt_populate_end(struct bio *bio, int error)
+{
+	struct verity_tree_entry *entry = bio->bi_private;
+	struct verity_io *io = entry->io_context;
+
+	/* Tell the tree to atomically update now that we've populated
+	 * the given entry.
+	 */
+	verity_tree_read_completed(entry, error);
+
+	/* Clean up for reuse when reading data to be checked */
+	bio->bi_vcnt = 0;
+	bio->bi_io_vec->bv_offset = 0;
+	bio->bi_io_vec->bv_len = 0;
+	bio->bi_io_vec->bv_page = NULL;
+	/* Restore the private data to I/O so the destructor can be shared. */
+	bio->bi_private = io;
+	bio_put(bio);
+
+	/* We bail but assume the tree has been marked bad. */
+	if (unlikely(error)) {
+		DMERR("Failed to read for sector %llu (%u)",
+		      ULL(io->bio->bi_sector), io->bio->bi_size);
+		io->error = error;
+		/* Pass through the error to verity_dec_pending below */
+	}
+	/* When pending = 0, it will transition to reading real data */
+	verity_dec_pending(io);
+}
+
+/* Called by verity-tree (via verity_tree_populate), this function provides
+ * the message digests to verity-tree that are stored on disk.
+ */
+static int kverityd_vt_read_callback(void *ctx, sector_t start, u8 *dst,
+				      sector_t count,
+				      struct verity_tree_entry *entry)
+{
+	struct verity_io *io = ctx;  /* I/O for this batch */
+	struct verity_config *vc;
+	struct bio *bio;
+
+	vc = io->target->private;
+
+	/* The I/O context is nested inside the entry so that we don't need one
+	 * io context per page read.
+	 */
+	entry->io_context = ctx;
+
+	/* We should only get page size requests at present. */
+	verity_inc_pending(io);
+	bio = verity_alloc_bioset(vc, GFP_NOIO, 1);
+	bio->bi_private = entry;
+	bio->bi_idx = 0;
+	bio->bi_size = vc->vt.block_size;
+	bio->bi_sector = vc->hash_start + start;
+	bio->bi_bdev = vc->hash_dev->bdev;
+	bio->bi_end_io = kverityd_io_vt_populate_end;
+	bio->bi_rw = REQ_META;
+	/* Only need to free the bio since the page is managed by vt */
+	bio->bi_destructor = verity_bio_destructor;
+	bio->bi_vcnt = 1;
+	bio->bi_io_vec->bv_offset = offset_in_page(dst);
+	bio->bi_io_vec->bv_len = to_bytes(count);
+	/* dst is a kmalloc()ed, block-sized buffer that does not cross a
+	 * page boundary, so virt_to_page() is safe here.
+	 */
+	bio->bi_io_vec->bv_page = virt_to_page(dst);
+	/* Track that this I/O is in use.  There should be no risk of the io
+	 * being removed prior since this is called synchronously.
+	 */
+	generic_make_request(bio);
+	return 0;
+}
+
+/* Submits an io request for each missing block of block hashes.
+ * The last one to return will then enqueue this on the io workqueue.
+ */
+static void kverityd_io_vt_populate(struct verity_io *io)
+{
+	struct verity_config *vc = io->target->private;
+	u64 block;
+
+	for (block = io->block; block < io->block + io->count; ++block) {
+		int ret = verity_tree_populate(&vc->vt, io, block);
+
+		if (ret < 0) {
+			/* verity_dec_pending will handle the error case. */
+			io->error = ret;
+			break;
+		}
+	}
+}
+
+/* Asynchronously called upon the completion of I/O issued
+ * from kverityd_src_io_read. verity_dec_pending() acts as
+ * the scheduler/flow manager.
+ */
+static void kverityd_src_io_read_end(struct bio *clone, int error)
+{
+	struct verity_io *io = clone->bi_private;
+
+	if (unlikely(!bio_flagged(clone, BIO_UPTODATE) && !error))
+		error = -EIO;
+
+	if (unlikely(error)) {
+		DMERR_LIMIT("Error occurred: %d (%llu, %u)",
+			    error, ULL(clone->bi_sector), clone->bi_size);
+		io->error = error;
+	}
+
+	/* Release the clone; this just keeps the block layer from
+	 * leaving offsets, etc. in unexpected states.
+	 */
+	bio_put(clone);
+
+	verity_dec_pending(io);
+}
+
+/* If not yet underway, an I/O request will be issued to the vc->dev
+ * device for the data needed. It is cloned to avoid unexpected changes
+ * to the original bio struct.
+ */
+static void kverityd_src_io_read(struct verity_io *io)
+{
+	struct bio *clone;
+
+	/* Check if the read is already issued. */
+	if (io->flags & VERITY_IOFLAGS_CLONED)
+		return;
+
+	io->flags |= VERITY_IOFLAGS_CLONED;
+
+	/* Clone the bio. The block layer may modify the bvec array. */
+	clone = verity_bio_clone(io);
+	if (unlikely(!clone)) {
+		io->error = -ENOMEM;
+		return;
+	}
+
+	verity_inc_pending(io);
+
+	generic_make_request(clone);
+}
+
+/* kverityd_io services the I/O workqueue. For each pass through
+ * the I/O workqueue, a call to populate both the origin drive
+ * data and the hash tree data is made.
+ */
+static void kverityd_io(struct work_struct *work)
+{
+	struct delayed_work *dwork = container_of(work, struct delayed_work,
+						  work);
+	struct verity_io *io = container_of(dwork, struct verity_io,
+					    work);
+
+	/* Issue requests asynchronously. */
+	verity_inc_pending(io);
+	kverityd_src_io_read(io);
+	kverityd_io_vt_populate(io);
+	verity_dec_pending(io);
+}
+
+/* Paired with verity_dec_pending, the pending value in the io dictates
+ * the lifetime of a request and when it is ready to be processed on
+ * the workqueues.
+ */
+static void verity_inc_pending(struct verity_io *io)
+{
+	atomic_inc(&io->pending);
+}
+
+/* Block-level requests start here. */
+static int verity_map(struct dm_target *ti, struct bio *bio,
+		      union map_info *map_context)
+{
+	struct verity_io *io;
+	struct verity_config *vc;
+	struct request_queue *r_queue;
+
+	if (unlikely(!ti)) {
+		DMERR("dm_target was NULL");
+		return -EIO;
+	}
+
+	vc = ti->private;
+	r_queue = bdev_get_queue(vc->dev->bdev);
+
+	if (bio_data_dir(bio) == WRITE) {
+		/* If we silently drop writes, then the VFS layer will cache
+		 * the write and persist it in memory. While it doesn't change
+		 * the underlying storage, it still may be contrary to the
+		 * behavior expected by a verified, read-only device.
+		 */
+		DMWARN_LIMIT("write request received. rejecting with -EIO.");
+		return -EIO;
+	} else {
+		/* Queue up the request to be verified */
+		io = verity_io_alloc(ti, bio);
+		if (!io) {
+			DMERR_LIMIT("Failed to allocate and init IO data");
+			return DM_MAPIO_REQUEUE;
+		}
+		INIT_DELAYED_WORK(&io->work, kverityd_io);
+		queue_delayed_work(kverityd_ioq, &io->work, 0);
+	}
+
+	return DM_MAPIO_SUBMITTED;
+}
+
+/*
+ * Non-block interfaces and device-mapper specific code
+ */
+
+/*
+ * Verity target parameters:
+ *
+ * <version> <dev> <hash_dev> <hash_start> <block_size> <alg> <digest> <salt>
+ *
+ * version:        version of the hash tree on-disk format
+ * dev:            device to verify
+ * hash_dev:       device hashtree is stored on
+ * hash_start:     start address of hashes
+ * block_size:     size of a hash block
+ * alg:            hash algorithm
+ * digest:         toplevel hash of the tree
+ * salt:           salt
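+ *
+ * Example table line (illustrative paths; the last two fields are the
+ * root digest and salt in hex):
+ *   0 262144 verity 0 /dev/sda1 /dev/sda2 0 4096 sha256 <digest> <salt>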
+ */
+static int verity_ctr(struct dm_target *ti, unsigned int argc, char **argv)
+{
+	struct verity_config *vc = NULL;
+	const char *dev, *hash_dev, *alg, *digest, *salt;
+	unsigned long hash_start, block_size, version;
+	sector_t blocks;
+	int ret;
+
+	if (argc != 8) {
+		ti->error = "Invalid argument count";
+		return -EINVAL;
+	}
+
+	if (kstrtoul(argv[0], 10, &version) || (version != 0)) {
+		ti->error = "Invalid version";
+		return -EINVAL;
+	}
+	dev = argv[1];
+	hash_dev = argv[2];
+	if (kstrtoul(argv[3], 10, &hash_start)) {
+		ti->error = "Invalid hash_start";
+		return -EINVAL;
+	}
+	if (kstrtoul(argv[4], 10, &block_size) || (block_size > UINT_MAX)) {
+		ti->error = "Invalid block_size";
+		return -EINVAL;
+	}
+	alg = argv[5];
+	digest = argv[6];
+	salt = argv[7];
+
+	/* The device mapper device should be setup read-only */
+	if ((dm_table_get_mode(ti->table) & ~FMODE_READ) != 0) {
+		ti->error = "Must be created readonly.";
+		return -EINVAL;
+	}
+
+	vc = kzalloc(sizeof(*vc), GFP_KERNEL);
+	if (!vc)
+		return -ENOMEM;
+
+	/* Calculate the blocks from the given device size */
+	vc->size = ti->len;
+	blocks = to_bytes(vc->size) / block_size;
+	if (verity_tree_create(&vc->vt, blocks, block_size, alg)) {
+		DMERR("failed to create required vt");
+		goto bad_vt;
+	}
+	if (verity_tree_set_digest(&vc->vt, digest)) {
+		DMERR("digest error");
+		goto bad_digest;
+	}
+	verity_tree_set_salt(&vc->vt, salt);
+	vc->vt.read_cb = kverityd_vt_read_callback;
+
+	vc->start = 0;
+	/* We only ever grab the device in read-only mode. */
+	ret = dm_get_device(ti, dev, dm_table_get_mode(ti->table), &vc->dev);
+	if (ret) {
+		DMERR("Failed to acquire device '%s': %d", dev, ret);
+		ti->error = "Device lookup failed";
+		goto bad_verity_dev;
+	}
+
+	if ((to_bytes(vc->start) % block_size) ||
+	    (to_bytes(vc->size) % block_size)) {
+		ti->error = "Device must be block_size divisible/aligned";
+		goto bad_hash_start;
+	}
+
+	vc->hash_start = (sector_t)hash_start;
+
+	/*
+	 * Note, dev == hash_dev is okay as long as the size of
+	 *       ti->len passed to device mapper does not include
+	 *       the hashes.
+	 */
+	if (dm_get_device(ti, hash_dev,
+			  dm_table_get_mode(ti->table), &vc->hash_dev)) {
+		ti->error = "Hash device lookup failed";
+		goto bad_hash_dev;
+	}
+
+	if (snprintf(vc->hash_alg, CRYPTO_MAX_ALG_NAME, "%s", alg) >=
+	    CRYPTO_MAX_ALG_NAME) {
+		ti->error = "Hash algorithm name is too long";
+		goto bad_hash;
+	}
+
+	vc->io_pool = mempool_create_slab_pool(MIN_IOS, _verity_io_pool);
+	if (!vc->io_pool) {
+		ti->error = "Cannot allocate verity io mempool";
+		goto bad_slab_pool;
+	}
+
+	vc->bs = bioset_create(MIN_BIOS, 0);
+	if (!vc->bs) {
+		ti->error = "Cannot allocate verity bioset";
+		goto bad_bs;
+	}
+
+	ti->private = vc;
+
+	return 0;
+
+bad_bs:
+	mempool_destroy(vc->io_pool);
+bad_slab_pool:
+bad_hash:
+	dm_put_device(ti, vc->hash_dev);
+bad_hash_dev:
+bad_hash_start:
+	dm_put_device(ti, vc->dev);
+bad_verity_dev:
+bad_digest:
+	verity_tree_destroy(&vc->vt);
+bad_vt:
+	kfree(vc);   /* hash is not secret so no need to zero */
+	return -EINVAL;
+}
+
+static void verity_dtr(struct dm_target *ti)
+{
+	struct verity_config *vc = ti->private;
+
+	bioset_free(vc->bs);
+	mempool_destroy(vc->io_pool);
+	verity_tree_destroy(&vc->vt);
+	dm_put_device(ti, vc->hash_dev);
+	dm_put_device(ti, vc->dev);
+	kfree(vc);
+}
+
+static int verity_ioctl(struct dm_target *ti, unsigned int cmd,
+			unsigned long arg)
+{
+	struct verity_config *vc = ti->private;
+	struct dm_dev *dev = vc->dev;
+	int r = 0;
+
+	/*
+	 * Only pass ioctls through if the device sizes match exactly.
+	 */
+	if (vc->start ||
+	    ti->len != i_size_read(dev->bdev->bd_inode) >> SECTOR_SHIFT)
+		r = scsi_verify_blk_ioctl(NULL, cmd);
+
+	return r ? : __blkdev_driver_ioctl(dev->bdev, dev->mode, cmd, arg);
+}
+
+static int verity_status(struct dm_target *ti, status_type_t type,
+			char *result, unsigned int maxlen)
+{
+	struct verity_config *vc = ti->private;
+	char digest[VERITY_MAX_DIGEST_SIZE * 2 + 1] = { 0 };
+	char salt[VERITY_SALT_SIZE * 2 + 1] = { 0 };
+	unsigned int sz = 0;
+
+	verity_tree_digest(&vc->vt, digest);
+	verity_tree_salt(&vc->vt, salt);
+
+	switch (type) {
+	case STATUSTYPE_INFO:
+		result[0] = '\0';
+		break;
+	case STATUSTYPE_TABLE:
+		/* Mirror the constructor argument order; only on-disk
+		 * format version 0 exists today.
+		 */
+		DMEMIT("0 %s %s %llu %llu %s %s %s",
+		       vc->dev->name,
+		       vc->hash_dev->name,
+		       ULL(vc->hash_start),
+		       ULL(vc->vt.block_size),
+		       vc->hash_alg,
+		       digest,
+		       salt);
+		break;
+	}
+	return 0;
+}
+
+static int verity_merge(struct dm_target *ti, struct bvec_merge_data *bvm,
+		       struct bio_vec *biovec, int max_size)
+{
+	struct verity_config *vc = ti->private;
+	struct request_queue *q = bdev_get_queue(vc->dev->bdev);
+
+	if (!q->merge_bvec_fn)
+		return max_size;
+
+	bvm->bi_bdev = vc->dev->bdev;
+	bvm->bi_sector = vc->start + bvm->bi_sector - ti->begin;
+
+	/* Optionally, this could just return 0 to stick to single pages. */
+	return min(max_size, q->merge_bvec_fn(q, bvm, biovec));
+}
+
+static int verity_iterate_devices(struct dm_target *ti,
+				 iterate_devices_callout_fn fn, void *data)
+{
+	struct verity_config *vc = ti->private;
+
+	return fn(ti, vc->dev, vc->start, ti->len, data);
+}
+
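+/* Advertise block_size as both the logical/physical block size and
+ * the minimum I/O size so that bios arrive aligned to whole hash
+ * blocks, as the BUG_ON()s in verity_verify() assume.
+ */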
+static void verity_io_hints(struct dm_target *ti,
+			    struct queue_limits *limits)
+{
+	struct verity_config *vc = ti->private;
+	unsigned int block_size = vc->vt.block_size;
+
+	limits->logical_block_size = block_size;
+	limits->physical_block_size = block_size;
+	blk_limits_io_min(limits, block_size);
+}
+
+static struct target_type verity_target = {
+	.name   = "verity",
+	.version = {0, 1, 0},
+	.module = THIS_MODULE,
+	.ctr    = verity_ctr,
+	.dtr    = verity_dtr,
+	.ioctl  = verity_ioctl,
+	.map    = verity_map,
+	.merge  = verity_merge,
+	.status = verity_status,
+	.iterate_devices = verity_iterate_devices,
+	.io_hints = verity_io_hints,
+};
+
+#define VERITY_WQ_FLAGS (WQ_CPU_INTENSIVE|WQ_HIGHPRI)
+
+static int __init verity_init(void)
+{
+	int r = -ENOMEM;
+
+	_verity_io_pool = KMEM_CACHE(verity_io, 0);
+	if (!_verity_io_pool) {
+		DMERR("failed to allocate pool verity_io");
+		goto bad_io_pool;
+	}
+
+	kverityd_ioq = alloc_workqueue("kverityd_io", VERITY_WQ_FLAGS, 1);
+	if (!kverityd_ioq) {
+		DMERR("failed to create workqueue kverityd_ioq");
+		goto bad_io_queue;
+	}
+
+	kveritydq = alloc_workqueue("kverityd", VERITY_WQ_FLAGS, 1);
+	if (!kveritydq) {
+		DMERR("failed to create workqueue kveritydq");
+		goto bad_verify_queue;
+	}
+
+	r = dm_register_target(&verity_target);
+	if (r < 0) {
+		DMERR("register failed %d", r);
+		goto register_failed;
+	}
+
+	DMINFO("version %u.%u.%u loaded", verity_target.version[0],
+	       verity_target.version[1], verity_target.version[2]);
+
+	return r;
+
+register_failed:
+	destroy_workqueue(kveritydq);
+bad_verify_queue:
+	destroy_workqueue(kverityd_ioq);
+bad_io_queue:
+	kmem_cache_destroy(_verity_io_pool);
+bad_io_pool:
+	return r;
+}
+
+static void __exit verity_exit(void)
+{
+	int cpu;
+
+	flush_workqueue(kverityd_ioq);
+	flush_workqueue(kveritydq);
+
	for_each_possible_cpu(cpu)
		kfree(per_cpu(verity_hash_desc, cpu));
+
+	destroy_workqueue(kveritydq);
+	destroy_workqueue(kverityd_ioq);
+
+	dm_unregister_target(&verity_target);
+	kmem_cache_destroy(_verity_io_pool);
+}
+
+module_init(verity_init);
+module_exit(verity_exit);
+
+MODULE_AUTHOR("The Chromium OS Authors <chromium-os-dev@chromium.org>");
+MODULE_DESCRIPTION(DM_NAME " target for transparent disk integrity checking");
+MODULE_LICENSE("GPL");
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] dm: verity target
  2012-02-29 21:16 ` Mikulas Patocka
@ 2012-03-01  6:24   ` Mandeep Singh Baines
  0 siblings, 0 replies; 22+ messages in thread
From: Mandeep Singh Baines @ 2012-03-01  6:24 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Mandeep Singh Baines, Alasdair G Kergon, dm-devel, linux-kernel,
	Will Drewry, Elly Jones, Milan Broz, Olof Johansson,
	Steffen Klassert, Andrew Morton

Mikulas Patocka (mpatocka@redhat.com) wrote:
> Hi
> 
> This crashes if the device size is 64MiB (and sha256 hash is used).
> 

Hi Mikulas,

Thanks for catching this! Below is a fix:

diff --git a/drivers/md/dm-verity.c b/drivers/md/dm-verity.c
index 263070d..74616b7 100644
--- a/drivers/md/dm-verity.c
+++ b/drivers/md/dm-verity.c
@@ -278,7 +278,7 @@ static int verity_tree_initialize_entries(struct verity_tree *vt)
         * independently from the vt data structures.  Logically, the root is
         * depth=-1 and the block layer level is depth=vt->depth
         */
-       unsigned int last = vt->block_count;
+       unsigned int last = vt->block_count - 1;
        int depth;
 
        /* check that the largest level->count can't result in an int overflow
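
(With last == vt->block_count, a device whose block count is an exact
power of node_count, e.g. 64MiB / 4KiB = 16384 = 2^14 blocks with
128-hash entries, computes 2 entries at level 0 and trips the
BUG_ON(vt->levels[0].count != 1) in verity_tree_create().)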

I'm still working on addressing akpm's feedback so will send out a v5
once that is done.

Regards,
Mandeep

> I tested it with the userspace utility and it doesn't work with device 
> >= 128MiB, it fails to verify the output of the utility.
> 
> I run this (/dev/vg1/verity_long_data has 128MiB size):
> ./verity mode=create alg=sha256 payload=/dev/vg1/verity_long_data 
> hashtree=/dev/vg1/verity_long_hash 
> salt=1234000000000000000000000000000000000000000000000000000000000000
> dm:dm bht[DEBUG] Setting block_count 32768
> dm:dm bht[DEBUG] Setting depth to 3.
> dm:dm bht[DEBUG] depth: 0 entries: 1
> dm:dm bht[DEBUG] depth: 1 entries: 2
> dm:dm bht[DEBUG] depth: 2 entries: 256
> 0 262144 verity payload=ROOT_DEV hashtree=HASH_DEV hashstart=262144 
> alg=sha256 
> root_hexdigest=6e46e106b288812a881a9da3f11180433f90ce264c4f1e8fa191fb40409846fb 
> salt=1234000000000000000000000000000000000000000000000000000000000000
> 
> dmsetup -r create verity --table "0 `blockdev --getsize 
> /dev/vg1/verity_long_data` verity 0 /dev/vg1/verity_long_data 
> /dev/vg1/verity_long_hash 0 4096 sha256 
> 6e46e106b288812a881a9da3f11180433f90ce264c4f1e8fa191fb40409846fb 
> 1234000000000000000000000000000000000000000000000000000000000000"
> 
> and get a lot of log messages "failed to verify hash (d=3,bi=0)"
> 
> If the device is smaller than 128MiB, it works (except for 64MiB device 
> where it crashes).
> 
> Mikulas
> 
> dmsetup -r create verity --table "0 131072 verity 0 
> /dev/vg1/verity_long_data /dev/vg1/verity_long_hash 0 4096 sha256 
> d821fec17e151a6e7b91c4a7a71487760185c25af49a74955a3e7a718c1f97dd 
> 1234000000000000000000000000000000000000000000000000000000000000"
> 
> [14356.819694] ------------[ cut here ]------------
> [14356.819747] kernel BUG at drivers/md/dm-verity2.c:439!
> [14356.819796] invalid opcode: 0000 [#1] PREEMPT SMP
> [14356.819879] CPU 5
> [14356.819897] Modules linked in: dm_verity2 md5 sha1_generic cryptomgr 
> aead sha256_generic dm_zero dm_bufio crypto_hash crypto_algapi crypto 
> dm_loop dm_mod parport_pc parport powernow_k8 mperf cpufreq_stats 
> cpufreq_powersave cpufreq_userspace cpufreq_conservative cpufreq_ondemand 
> freq_table snd_usb_audio snd_pcm_oss snd_mixer_oss snd_pcm snd_timer 
> snd_page_alloc snd_hwdep snd_usbmidi_lib snd_rawmidi snd soundcore fuse 
> raid0 md_mod lm85 hwmon_vid ide_cd_mod cdrom ohci_hcd ehci_hcd sata_svw 
> libata serverworks ide_core usbcore floppy usb_common rtc_cmos e100 tg3 
> mii libphy k10temp skge hwmon button i2c_piix4 processor unix [last 
> unloaded: dm_verity2]
> [14356.820536]
> [14356.820580] Pid: 10909, comm: dmsetup Not tainted 3.2.0 #19 empty 
> empty/S3992-E
> [14356.820680] RIP: 0010:[<ffffffffa0195908>]  [<ffffffffa0195908>] 
> verity_ctr+0x808/0x860 [dm_verity2]
> [14356.820772] RSP: 0018:ffff880141b7bcc8  EFLAGS: 00010202
> [14356.820821] RAX: 0000000000000408 RBX: ffff8802afedcc28 RCX: 
> 0000000000000002[14356.820875] RDX: 0000000000000081 RSI: ffff88044685a540 
> RDI: ffff8802afc4b000[14356.820929] RBP: ffff8802afedcc00 R08: 
> 0000000000000000 R09: ffff8802afc4a000[14356.820987] R10: 0000000000000001 
> R11: ffffff9000000018 R12: ffffffff8147a630[14356.821041] R13: 
> ffff88044685a558 R14: 0000000000004000 R15: 0000000000000002[14356.821114] 
> FS:  00007f1c9ea567a0(0000) GS:ffff880447c80000(0000) 
> knlGS:0000000000000000
> [14356.821199] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [14356.821257] CR2: 00007f1c9e199f80 CR3: 00000003fe98e000 CR4: 
> 00000000000006e0[14356.821330] DR0: 0000000000000000 DR1: 0000000000000000 
> DR2: 0000000000000000[14356.821394] DR3: 0000000000000000 DR6: 
> 00000000ffff0ff0 DR7: 0000000000000400[14356.821464] Process dmsetup (pid: 
> 10909, threadinfo ffff880141b7a000, task ffff88023d8dec90)
> [14356.821557] Stack:
> [14356.821606]  ffff880141b7bd34 0000000000000007 ffffc90012ac9040 
> ffff880429987340
> [14356.821700]  ffffc90012ac41a4 0000000000020000 ffffc90012ac419d 
> ffffc90012ac4162
> [14356.821787]  ffffc90012ac417c ffffc90012ac41e5 0000000000020000 
> 0000000000000000
> [14356.821905] Call Trace:
> [14356.821965]  [<ffffffffa01d2cab>] ? dm_table_add_target+0x19b/0x450 
> [dm_mod] [14356.822034]  [<ffffffffa01d5690>] ? table_clear+0x80/0x80 
> [dm_mod]
> [14356.822090]  [<ffffffffa01d5762>] ? table_load+0xd2/0x330 [dm_mod]
> [14356.822150]  [<ffffffffa01d5690>] ? table_clear+0x80/0x80 [dm_mod]
> [14356.822213]  [<ffffffffa01d6bf9>] ? ctl_ioctl+0x159/0x2a0 [dm_mod]
> [14356.822286]  [<ffffffff8117351d>] ? ipc_addid+0x4d/0xd0
> [14356.822338]  [<ffffffffa01d6d4e>] ? dm_ctl_ioctl+0xe/0x20 [dm_mod]
> [14356.822411]  [<ffffffff81105a9e>] ? do_vfs_ioctl+0x8e/0x4f0
> [14356.822474]  [<ffffffff8110a980>] ? dput+0x20/0x230
> [14356.822533]  [<ffffffff810f6282>] ? fput+0x162/0x220
> [14356.822590]  [<ffffffff81105f49>] ? sys_ioctl+0x49/0x90
> [14356.822652]  [<ffffffff81313abb>] ? system_call_fastpath+0x16/0x1b
> [14356.822702] Code: 38 ce 65 19 a0 e9 61 fb ff ff 48 c7 c7 d8 62 19 a0 31 
> c0 e8 d9 80 17 e1 48 c7 c7 28 63 19 a0 31 c0 e8 cb 80 17 e1 e9 40 fb ff ff 
> <0f> 0b 48 c7 c7 b8 60 19 a0 31 c0 e8 b6 80 17 e1 e9 9b fa ff ff
> [14356.823081] RIP  [<ffffffffa0195908>] verity_ctr+0x808/0x860 
> [dm_verity2]
> [14356.823146]  RSP <ffff880141b7bcc8>
> [14356.823530] ---[ end trace 773c24b9dbd5cfff ]---
> 
> 
> On Tue, 28 Feb 2012, Mandeep Singh Baines wrote:
> 
> > The verity target provides transparent integrity checking of block devices
> > using a cryptographic digest.
> > 
> > dm-verity is meant to be setup as part of a verified boot path.  This
> > may be anything ranging from a boot using tboot or trustedgrub to just
> > booting from a known-good device (like a USB drive or CD).
> > 
> > dm-verity is part of ChromeOS's verified boot path. It is used to verify
> > the integrity of the root filesystem on boot. The root filesystem is
> > mounted on a dm-verity partition which transparently verifies each block
> > with a bootloader verified hash passed into the kernel at boot.
> > 
> > Changes in V4:
> > * Discussion over phone (Alasdair G Kergon)
> >  * copy _ioctl fix from dm-linear
> >  * verity_status format fixes to match dm conventions
> >  * s/dm-bht/verity_tree
> >  * put everything into dm-verity.c
> >  * ctr changed to dm conventions
> >  * use hex2bin
> >  * use conventional dm names for function
> >   * s/dm_//
> >   * for example: verity_ctr versus dm_verity_ctr
> >  * use per_cpu API
> > Changes in V3:
> > * Discussion over irc (Alasdair G Kergon)
> >   * Implement ioctl hook
> > Changes in V2:
> > * https://lkml.org/lkml/2011/11/10/85 (Steffen Klassert)
> >   * Use shash API instead of older hash API
> > 
> > Signed-off-by: Will Drewry <wad@chromium.org>
> > Signed-off-by: Elly Jones <ellyjones@chromium.org>
> > Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
> > Cc: Alasdair G Kergon <agk@redhat.com>
> > Cc: Milan Broz <mbroz@redhat.com>
> > Cc: Olof Johansson <olofj@chromium.org>
> > Cc: Steffen Klassert <steffen.klassert@secunet.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Mikulas Patocka <mpatocka@redhat.com>
> > Cc: dm-devel@redhat.com
> > ---
> >  Documentation/device-mapper/verity.txt |  149 ++++
> >  drivers/md/Kconfig                     |   16 +
> >  drivers/md/Makefile                    |    1 +
> >  drivers/md/dm-verity.c                 | 1411 ++++++++++++++++++++++++++++++++
> >  4 files changed, 1577 insertions(+), 0 deletions(-)
> >  create mode 100644 Documentation/device-mapper/verity.txt
> >  create mode 100644 drivers/md/dm-verity.c
> > 
> > diff --git a/Documentation/device-mapper/verity.txt b/Documentation/device-mapper/verity.txt
> > new file mode 100644
> > index 0000000..b631f12
> > --- /dev/null
> > +++ b/Documentation/device-mapper/verity.txt
> > @@ -0,0 +1,149 @@
> > +dm-verity
> > +==========
> > +
> > +Device-Mapper's "verity" target provides transparent integrity checking of
> > +block devices using a cryptographic digest provided by the kernel crypto API.
> > +This target is read-only.
> > +
> > +Parameters:
> > +    <version> <dev> <hash_dev> <hash_start> <block_size> <alg> <digest> <salt>
> > +
> > +<version>
> > +    This is the version number of the on-disk format. Currently, there is
> > +    only version 0.
> > +
> > +<dev>
> > +    This is the device that is going to be integrity checked.  It may be
> > +    a subset of the full device as specified to dmsetup (start sector and count)
> > +    It may be specified as a path, like /dev/sdaX, or a device number,
> > +    <major>:<minor>.
> > +
> > +<hash_dev>
> > +    This is the device that that supplies the hash tree data.  It may be
> > +    specified similarly to the device path and may be the same device.  If the
> > +    same device is used, the hash offset should be outside of the dm-verity
> > +    configured device size.
> > +
> > +<hash_start>
> > +    This is the offset, in 512-byte sectors, from the start of hash_dev to
> > +    the root block of the hash tree.
> > +
> > +<block_size>
> > +    The size of a hash block. Also, the size of a block to be hashed.
> > +
> > +<alg>
> > +    The cryptographic hash algorithm used for this device.  This should
> > +    be the name of the algorithm, like "sha1".
> > +
> > +<digest>
> > +    The hexadecimal encoding of the cryptographic hash of all of the
> > +    neighboring nodes at the first level of the tree.  This hash should be
> > +    trusted as there is no other authenticity beyond this point.
> > +
> > +<salt>
> > +    The hexadecimal encoding of the salt value.
> > +
> > +Theory of operation
> > +===================
> > +
> > +dm-verity is meant to be setup as part of a verified boot path.  This
> > +may be anything ranging from a boot using tboot or trustedgrub to just
> > +booting from a known-good device (like a USB drive or CD).
> > +
> > +When a dm-verity device is configured, it is expected that the caller
> > +has been authenticated in some way (cryptographic signatures, etc).
> > +After instantiation, all hashes will be verified on-demand during
> > +disk access.  If they cannot be verified up to the root node of the
> > +tree, the root hash, then the I/O will fail.  This should identify
> > +tampering with any data on the device and the hash data.
> > +
> > +Cryptographic hashes are used to assert the integrity of the device on a
> > +per-block basis.  This allows for a lightweight hash computation on first read
> > +into the page cache.  Block hashes are stored linearly aligned to the nearest
> > +block the size of a page.
> > +
> > +Hash Tree
> > +---------
> > +
> > +Each node in the tree is a cryptographic hash.  If it is a leaf node, the hash
> > +is of some block data on disk.  If it is an intermediary node, then the hash is
> > +of a number of child nodes.
> > +
> > +Each entry in the tree is a collection of neighboring nodes that fit in one
> > +block.  The number is determined based on block_size and the size of the
> > +selected cryptographic digest algorithm.  The hashes are linearly ordered in
> > +this entry and any unaligned trailing space is ignored but included when
> > +calculating the parent node.
> > +
> > +The tree looks something like:
> > +
> > +alg = sha256, num_blocks = 32768, block_size = 4096
> > +
> > +                                 [   root    ]
> > +                                /    . . .    \
> > +                     [entry_0]                 [entry_1]
> > +                    /  . . .  \                 . . .   \
> > +         [entry_0_0]   . . .  [entry_0_127]    . . . .  [entry_1_127]
> > +           / ... \             /   . . .  \             /           \
> > +     blk_0 ... blk_127  blk_16256   blk_16383      blk_32640 . . . blk_32767
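
(Checking the fan-out for this example: sha256 produces a 32-byte
digest, so node_count = 4096 / 32 = 128 = 2^7, and the 32768 leaf
hashes collapse 32768 -> 256 -> 2 -> 1: 128 hashes per entry_x_y,
128 entries per entry_x, and a single root block.)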
> > +
> > +On-disk format
> > +==============
> > +
> > +Below is the recommended on-disk format. The verity kernel code does not
> > +read the on-disk header. It only reads the hash blocks which directly
> > +follow the header. It is expected that a user-space tool will verify the
> > +integrity of the verity_header and then call dmsetup with the correct
> > +parameters. Alternatively, the header can be omitted and the dmsetup
> > +parameters can be passed via the kernel command-line in a rooted chain
> > +of trust where the command-line is verified.
> > +
> > +The on-disk format is especially useful in cases where the hash blocks
> > +are on a separate partition. The magic number allows easy identification
> > +of the partition contents. Alternatively, the hash blocks can be stored
> > +in the same partition as the data to be verified. In such a configuration
> > +the filesystem on the partition would be sized a little smaller than
> > +the full partition, leaving room for the hash blocks.
> > +
> > +struct verity_header {
> > +       uint64_t magic = 0x7665726974790a00;
> > +       uint32_t version;
> > +       uint32_t block_size;
> > +       char digest[128]; /* in hex-ascii, null-terminated or 128 bytes */
> > +       char salt[128]; /* in hex-ascii, null-terminated or 128 bytes */
> > +}
> > +
> > +struct verity_header_block {
> > +       struct verity_header;
> > +       char unused[block_size - sizeof(struct verity_header) - sizeof(sig)];
> > +       char sig[128]; /* in hex-ascii, null-terminated or 128 bytes */
> > +}
> > +
> > +Directly following the header are the hash blocks which are stored a depth
> > +at a time (starting from the root), sorted in order of increasing index.
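
A user-space tool reading this layout might start with something like
the sketch below (hypothetical code, not part of the patch; the patch
does not specify the header's endianness, so this assumes native byte
order):

  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>

  #define VERITY_MAGIC 0x7665726974790a00ULL	/* "verity\n\0" */

  struct verity_header {
  	uint64_t magic;
  	uint32_t version;
  	uint32_t block_size;
  	char digest[128];	/* hex-ascii */
  	char salt[128];		/* hex-ascii */
  } __attribute__((packed));

  /* digest/salt are "null-terminated or 128 bytes", so copy them into
   * a larger buffer before using them as C strings. */
  static void field_to_cstr(char out[129], const char in[128])
  {
  	memcpy(out, in, 128);
  	out[128] = '\0';
  }

  static int read_verity_header(FILE *f, struct verity_header *vh)
  {
  	if (fread(vh, sizeof(*vh), 1, f) != 1)
  		return -1;
  	return vh->magic == VERITY_MAGIC ? 0 : -1;
  }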
> > +
> > +Usage
> > +=====
> > +
> > +The API provides mechanisms for reading and verifying a tree. When reading, all
> > +required data for the hash tree should be populated for a block before
> > +attempting a verify.  This can be done by calling verity_tree_populate().
> > +When all data is ready, a call to verity_tree_verify_block() will perform
> > +both the direct block hash check and the hashes of the parent and
> > +neighboring nodes where needed to ensure validity up to the root hash.
> > +Note, verity_tree_set_digest() should be called before any verification
> > +attempts occur.
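
In code form, the expected sequence is roughly (sketch only, error
handling omitted):

  verity_tree_set_digest(vt, root_hexdigest);	/* before any verify */
  verity_tree_populate(vt, io_ctx, block);	/* issue hash block reads */
  /* ... wait for the read callbacks to complete ... */
  verity_tree_verify_block(vt, block, pg, offset);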
> > +
> > +Example
> > +=======
> > +
> > +Set up a device:
> > +[[
> > +  dmsetup create vroot --table \
> > +    "0 204800 verity 0 /dev/sda1 /dev/sda2 0 4096 sha1 "\
> > +    "9f74809a2ee7607b16fcc70d9399a4de9725a727 <salt>"
> > +]]
> > +
> > +A command line tool is available to compute the hash tree and return the
> > +root hash value.
> > +  http://git.chromium.org/cgi-bin/gitweb.cgi?p=dm-verity.git;a=tree
> > diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
> > index faa4741..b8bb690 100644
> > --- a/drivers/md/Kconfig
> > +++ b/drivers/md/Kconfig
> > @@ -370,4 +370,20 @@ config DM_FLAKEY
> >         ---help---
> >           A target that intermittently fails I/O for debugging purposes.
> >  
> > +config DM_VERITY
> > +        tristate "Verity target support"
> > +        depends on BLK_DEV_DM
> > +        select CRYPTO
> > +        select CRYPTO_HASH
> > +        ---help---
> > +          This device-mapper target allows you to create a device that
> > +          transparently integrity checks the data on it. You'll need to
> > +          activate the digests you're going to use in the cryptoapi
> > +          configuration.
> > +
> > +          To compile this code as a module, choose M here: the module will
> > +          be called dm-verity.
> > +
> > +          If unsure, say N.
> > +
> >  endif # MD
> > diff --git a/drivers/md/Makefile b/drivers/md/Makefile
> > index 046860c..70a29af 100644
> > --- a/drivers/md/Makefile
> > +++ b/drivers/md/Makefile
> > @@ -39,6 +39,7 @@ obj-$(CONFIG_DM_SNAPSHOT)	+= dm-snapshot.o
> >  obj-$(CONFIG_DM_PERSISTENT_DATA)	+= persistent-data/
> >  obj-$(CONFIG_DM_MIRROR)		+= dm-mirror.o dm-log.o dm-region-hash.o
> >  obj-$(CONFIG_DM_LOG_USERSPACE)	+= dm-log-userspace.o
> > +obj-$(CONFIG_DM_VERITY)         += dm-verity.o
> >  obj-$(CONFIG_DM_ZERO)		+= dm-zero.o
> >  obj-$(CONFIG_DM_RAID)	+= dm-raid.o
> >  obj-$(CONFIG_DM_THIN_PROVISIONING)	+= dm-thin-pool.o
> > diff --git a/drivers/md/dm-verity.c b/drivers/md/dm-verity.c
> > new file mode 100644
> > index 0000000..87b7958
> > --- /dev/null
> > +++ b/drivers/md/dm-verity.c
> > @@ -0,0 +1,1411 @@
> > +/*
> > + * Originally based on dm-crypt.c,
> > + * Copyright (C) 2003 Christophe Saout <christophe@saout.de>
> > + * Copyright (C) 2004 Clemens Fruhwirth <clemens@endorphin.org>
> > + * Copyright (C) 2006-2008 Red Hat, Inc. All rights reserved.
> > + * Copyright (C) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org>
> > + *                    All Rights Reserved.
> > + *
> > + * This file is released under the GPLv2.
> > + *
> > + * Implements a verifying transparent block device.
> > + * See Documentation/device-mapper/verity.txt
> > + */
> > +#include <crypto/hash.h>
> > +#include <linux/atomic.h>
> > +#include <linux/bio.h>
> > +#include <linux/blkdev.h>
> > +#include <linux/genhd.h>
> > +#include <linux/init.h>
> > +#include <linux/kernel.h>
> > +#include <linux/mempool.h>
> > +#include <linux/module.h>
> > +#include <linux/workqueue.h>
> > +#include <linux/device-mapper.h>
> > +
> > +
> > +#define DM_MSG_PREFIX "verity"
> > +
> > +
> > +/* Helper for printing sector_t */
> > +#define ULL(x) ((unsigned long long)(x))
> > +
> > +#define MIN_IOS 32
> > +#define MIN_BIOS (MIN_IOS * 2)
> > +
> > +/* To avoid allocating memory for digest tests, we just set a
> > + * maximum size to use for now.
> > + */
> > +#define VERITY_MAX_DIGEST_SIZE 64   /* Supports up to 512-bit digests */
> > +#define VERITY_SALT_SIZE       32   /* 256 bits of salt is a lot */
> > +
> > +/* UNALLOCATED, PENDING, READY, and VERIFIED are valid states. All other
> > + * values are entry-related return codes.
> > + */
> > +#define VERITY_TREE_ENTRY_VERIFIED 8  /* 'nodes' checked against parent */
> > +#define VERITY_TREE_ENTRY_READY 4  /* 'nodes' is loaded and available */
> > +#define VERITY_TREE_ENTRY_PENDING 2  /* 'nodes' is being loaded */
> > +#define VERITY_TREE_ENTRY_UNALLOCATED 0 /* untouched */
> > +#define VERITY_TREE_ENTRY_ERROR -1 /* entry is unsuitable for use */
> > +#define VERITY_TREE_ENTRY_ERROR_IO -2 /* I/O error on load */
> > +
> > +/* Additional possible return codes */
> > +#define VERITY_TREE_ENTRY_ERROR_MISMATCH -3 /* Digest mismatch */
> > +
> > +
> > +/* verity_tree_entry
> > + * Contains verity_tree->node_count tree nodes at a given tree depth.
> > + * state is used to transactionally assure that data is paged in
> > + * from disk.  Since verity_tree does not keep running crypto contexts
> > + * for each level, we need to load the data for on-demand verification.
> > + */
> > +struct verity_tree_entry {
> > +	atomic_t state; /* see defines */
> > +	/* Keeping an extra pointer per entry wastes up to ~33k of
> > +	 * memory if 1M blocks are used (or ~66k on a 64-bit arch).
> > +	 */
> > +	void *io_context;  /* Reserve a pointer for use during io */
> > +	/* data should only be non-NULL if fully populated. */
> > +	void *nodes;  /* The hash data used to verify the children.
> > +		       * Guaranteed to be page-aligned.
> > +		       */
> > +};
> > +
> > +/* verity_tree_level
> > + * Contains an array of entries which represent a page of hashes where
> > + * each hash is a node in the tree at the given tree depth/level.
> > + */
> > +struct verity_tree_level {
> > +	struct verity_tree_entry *entries;  /* array of entries of tree nodes */
> > +	unsigned int count;  /* number of entries at this level */
> > +	sector_t sector;  /* starting sector for this level */
> > +};
> > +
> > +/* opaque context, start, databuf, sector_count */
> > +typedef int(*verity_tree_callback)(void *,  /* external context */
> > +			      sector_t,  /* start sector */
> > +			      u8 *,  /* destination page */
> > +			      sector_t,  /* num sectors */
> > +			      struct verity_tree_entry *);
> > +/* verity_tree - Device mapper block hash tree
> > + * verity_tree provides a fixed interface for comparing data blocks
> > + * against cryptographic hashes stored in a hash tree. It
> > + * optimizes the tree structure for storage on disk.
> > + *
> > + * The tree is built from the bottom up.  A collection of data,
> > + * external to the tree, is hashed and these hashes are stored
> > + * as the blocks in the tree.  For some number of these hashes,
> > + * a parent node is created by hashing them.  These steps are
> > + * repeated.
> > + */
> > +struct verity_tree {
> > +	/* Configured values */
> > +	int depth;  /* Depth of the tree including the root */
> > +	unsigned int block_count;  /* Number of blocks hashed */
> > +	unsigned int block_size;  /* Size of a hash block */
> > +	char hash_alg[CRYPTO_MAX_ALG_NAME];
> > +	u8 salt[VERITY_SALT_SIZE];
> > +
> > +	/* Computed values */
> > +	unsigned int node_count;  /* Data size (in hashes) for each entry */
> > +	unsigned int node_count_shift;  /* log2(node_count) */
> > +	/*
> > +	 * There is one per CPU so that verification can run in parallel.
> > +	 * Access through per_cpu_ptr() only.
> > +	 */
> > +	struct shash_desc * __percpu *hash_desc; /* Container for hash alg */
> > +	unsigned int digest_size;
> > +	sector_t sectors;  /* Number of disk sectors used */
> > +
> > +	/* bool verified;  Full tree is verified */
> > +	u8 digest[VERITY_MAX_DIGEST_SIZE];
> > +	struct verity_tree_level *levels;  /* in reverse order */
> > +	/* Callback for reading from the hash device */
> > +	verity_tree_callback read_cb;
> > +};
> > +
> > +/* per-requested-bio private data */
> > +enum verity_io_flags {
> > +	VERITY_IOFLAGS_CLONED = 0x1,	/* original bio has been cloned */
> > +};
> > +
> > +struct verity_io {
> > +	struct dm_target *target;
> > +	struct bio *bio;
> > +	struct delayed_work work;
> > +	unsigned int flags;
> > +
> > +	int error;
> > +	atomic_t pending;
> > +
> > +	u64 block;  /* aligned block index */
> > +	u64 count;  /* aligned count in blocks */
> > +};
> > +
> > +struct verity_config {
> > +	struct dm_dev *dev;
> > +	sector_t start;
> > +	sector_t size;
> > +
> > +	struct dm_dev *hash_dev;
> > +	sector_t hash_start;
> > +
> > +	struct verity_tree bht;
> > +
> > +	/* Pool required for io contexts */
> > +	mempool_t *io_pool;
> > +	/* Pool and bios required for making sure that backing device reads are
> > +	 * in PAGE_SIZE increments.
> > +	 */
> > +	struct bio_set *bs;
> > +
> > +	char hash_alg[CRYPTO_MAX_ALG_NAME];
> > +};
> > +
> > +
> > +static struct kmem_cache *_verity_io_pool;
> > +static struct workqueue_struct *kveritydq, *kverityd_ioq;
> > +
> > +
> > +static void kverityd_verify(struct work_struct *work);
> > +static void kverityd_io(struct work_struct *work);
> > +static void kverityd_io_bht_populate(struct verity_io *io);
> > +static void kverityd_io_bht_populate_end(struct bio *, int error);
> > +
> > +
> > +/*
> > + * Utilities
> > + */
> > +
> > +static void bin2hex(char *dst, const u8 *src, size_t count)
> > +{
> > +	while (count-- > 0) {
> > +		sprintf(dst, "%02hhx", (int)*src);
> > +		dst += 2;
> > +		src++;
> > +	}
> > +}
> > +
> > +/*
> > + * Verity Tree
> > + */
> > +
> > +/* Functions for converting indices to nodes. */
> > +
> > +static inline unsigned int verity_tree_get_level_shift(struct verity_tree *bht,
> > +						  int depth)
> > +{
> > +	return (bht->depth - depth) * bht->node_count_shift;
> > +}
> > +
> > +/* For the given depth, this is the entry index.  At depth+1 it is the node
> > + * index for depth.
> > + */
> > +static inline unsigned int verity_tree_index_at_level(struct verity_tree *bht,
> > +						      int depth,
> > +						      unsigned int leaf)
> > +{
> > +	return leaf >> verity_tree_get_level_shift(bht, depth);
> > +}
> > +
> > +static inline struct verity_tree_entry *verity_tree_get_entry(
> > +		struct verity_tree *bht,
> > +		int depth,
> > +		unsigned int block)
> > +{
> > +	unsigned int index = verity_tree_index_at_level(bht, depth, block);
> > +	struct verity_tree_level *level = &bht->levels[depth];
> > +
> > +	return &level->entries[index];
> > +}
> > +
> > +static inline void *verity_tree_get_node(struct verity_tree *bht,
> > +					 struct verity_tree_entry *entry,
> > +					 int depth, unsigned int block)
> > +{
> > +	unsigned int index = verity_tree_index_at_level(bht, depth, block);
> > +	unsigned int node_index = index % bht->node_count;
> > +
> > +	return entry->nodes + (node_index * bht->digest_size);
> > +}
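
(Concretely, with the example tree from the documentation -- depth 3,
node_count_shift 7 -- the leaf hash for block 16300 lives in entry
16300 >> 7 = 127 of the bottom level, at node index 16300 % 128 = 44;
that is entry_0_127, which covers blk_16256 .. blk_16383.)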
> > +/**
> > + * verity_tree_compute_hash: hashes one block of data
> > + */
> > +static int verity_tree_compute_hash(struct verity_tree *vt, struct page *pg,
> > +				    unsigned int offset, u8 *digest)
> > +{
> > +	struct shash_desc *hash_desc;
> > +	void *data;
> > +	int err;
> > +
> > +	hash_desc = *per_cpu_ptr(vt->hash_desc, smp_processor_id());
> > +
> > +	if (crypto_shash_init(hash_desc)) {
> > +		DMCRIT("failed to reinitialize crypto hash (proc:%d)",
> > +			smp_processor_id());
> > +		return -EINVAL;
> > +	}
> > +	data = kmap_atomic(pg);
> > +	err = crypto_shash_update(hash_desc, data + offset, vt->block_size);
> > +	kunmap_atomic(data);
> > +	if (err) {
> > +		DMCRIT("crypto_shash_update failed");
> > +		return -EINVAL;
> > +	}
> > +	if (crypto_shash_update(hash_desc, vt->salt, sizeof(vt->salt))) {
> > +		DMCRIT("crypto_shash_update failed");
> > +		return -EINVAL;
> > +	}
> > +	if (crypto_shash_final(hash_desc, digest)) {
> > +		DMCRIT("crypto_shash_final failed");
> > +		return -EINVAL;
> > +	}
> > +
> > +	return 0;
> > +}
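
So the on-disk digest of a block is H(block_data || salt): the salt is
appended, not prepended, and it is always the full zero-padded
VERITY_SALT_SIZE buffer (see verity_tree_set_salt below).  A user-space
equivalent for alg=sha256, sketched with OpenSSL (illustrative only,
not part of the patch):

  #include <openssl/sha.h>

  static void block_digest(const unsigned char *block, size_t block_size,
  			 const unsigned char salt[32] /* VERITY_SALT_SIZE */,
  			 unsigned char out[SHA256_DIGEST_LENGTH])
  {
  	SHA256_CTX ctx;

  	SHA256_Init(&ctx);
  	SHA256_Update(&ctx, block, block_size);
  	SHA256_Update(&ctx, salt, 32);	/* salt appended last */
  	SHA256_Final(out, &ctx);
  }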
> > +
> > +static int verity_tree_initialize_entries(struct verity_tree *vt)
> > +{
> > +	/* last represents the index of the last digest stored in the tree.
> > +	 * By walking the tree with that index, it is possible to compute the
> > +	 * total number of entries at each level.
> > +	 *
> > +	 * Since each entry will contain up to |node_count| nodes of the tree,
> > +	 * it is possible that the last index may not be at the end of a given
> > +	 * entry->nodes.  In that case, it is assumed the value is padded.
> > +	 *
> > +	 * Note, we treat both the tree root (1 hash) and the tree leaves
> > +	 * independently from the vt data structures.  Logically, the root is
> > +	 * depth=-1 and the block layer level is depth=vt->depth
> > +	 */
> > +	unsigned int last = vt->block_count;
> > +	int depth;
> > +
> > +	/* check that the largest level->count can't result in an int overflow
> > +	 * on allocation or sector calculation.
> > +	 */
> > +	if (((last >> vt->node_count_shift) + 1) >
> > +	    UINT_MAX / max((unsigned int)sizeof(struct verity_tree_entry),
> > +			   (unsigned int)to_sector(vt->block_size))) {
> > +		DMCRIT("required entries %u is too large", last + 1);
> > +		return -EINVAL;
> > +	}
> > +
> > +	/* Track the current sector location for each level so we don't have to
> > +	 * compute it during traversals.
> > +	 */
> > +	vt->sectors = 0;
> > +	for (depth = 0; depth < vt->depth; ++depth) {
> > +		struct verity_tree_level *level = &vt->levels[depth];
> > +
> > +		level->count = verity_tree_index_at_level(vt, depth, last) + 1;
> > +		level->entries = (struct verity_tree_entry *)
> > +				 kcalloc(level->count,
> > +					 sizeof(struct verity_tree_entry),
> > +					 GFP_KERNEL);
> > +		if (!level->entries) {
> > +			DMERR("failed to allocate entries for depth %d", depth);
> > +			return -ENOMEM;
> > +		}
> > +		level->sector = vt->sectors;
> > +		vt->sectors += level->count * to_sector(vt->block_size);
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +/**
> > + * verity_tree_create - prepares @vt for use
> > + * @vt:	          pointer to the verity_tree to initialize
> > + * @block_count:  the number of block hashes / tree leaves
> > + * @block_size:   size of a hash block, in bytes
> > + * @alg_name:	  crypto hash algorithm name
> > + *
> > + * Returns 0 on success.
> > + *
> > + * Callers can offset into devices by storing the data in the io callbacks.
> > + */
> > +static int verity_tree_create(struct verity_tree *vt, unsigned int block_count,
> > +			      unsigned int block_size, const char *alg_name)
> > +{
> > +	struct crypto_shash *tfm;
> > +	int size, cpu, status = 0;
> > +
> > +	vt->block_size = block_size;
> > +	/* Verify that PAGE_SIZE >= block_size >= SECTOR_SIZE. */
> > +	if ((block_size > PAGE_SIZE) ||
> > +	    (PAGE_SIZE % block_size) ||
> > +	    (to_sector(block_size) == 0))
> > +		return -EINVAL;
> > +
> > +	tfm = crypto_alloc_shash(alg_name, 0, 0);
> > +	if (IS_ERR(tfm)) {
> > +		DMERR("failed to allocate crypto hash '%s'", alg_name);
> > +		return -ENOMEM;
> > +	}
> > +	size = sizeof(struct shash_desc) + crypto_shash_descsize(tfm);
> > +
> > +	vt->hash_desc = alloc_percpu(struct shash_desc *);
> > +	if (!vt->hash_desc) {
> > +		DMERR("Failed to allocate per cpu hash_desc");
> > +		status = -ENOMEM;
> > +		goto bad_per_cpu;
> > +	}
> > +
> > +	/* Pre-allocate per-cpu crypto contexts to avoid having to
> > +	 * kmalloc/kfree a context for every hash operation.
> > +	 */
> > +	for_each_possible_cpu(cpu) {
> > +		struct shash_desc *hash_desc = kmalloc(size, GFP_KERNEL);
> > +
> > +		*per_cpu_ptr(vt->hash_desc, cpu) = hash_desc;
> > +		if (!hash_desc) {
> > +			DMERR("failed to allocate crypto hash contexts");
> > +			status = -ENOMEM;
> > +			goto bad_hash_alloc;
> > +		}
> > +		hash_desc->tfm = tfm;
> > +		hash_desc->flags = 0x0;
> > +	}
> > +	vt->digest_size = crypto_shash_digestsize(tfm);
> > +	/* We expect to be able to pack >=2 hashes into a block */
> > +	if (block_size / vt->digest_size < 2) {
> > +		DMERR("too few hashes fit in a block");
> > +		status = -EINVAL;
> > +		goto bad_arg;
> > +	}
> > +
> > +	if (vt->digest_size > VERITY_MAX_DIGEST_SIZE) {
> > +		DMERR("VERITY_MAX_DIGEST_SIZE too small for digest");
> > +		status = -EINVAL;
> > +		goto bad_arg;
> > +	}
> > +
> > +	/* Configure the tree */
> > +	vt->block_count = block_count;
> > +	if (block_count == 0) {
> > +		DMERR("block_count must be non-zero");
> > +		status = -EINVAL;
> > +		goto bad_arg;
> > +	}
> > +
> > +	/* Each verity_tree_entry->nodes is one block.  The node code tracks
> > +	 * how many nodes fit into one entry where a node is a single
> > +	 * hash (message digest).
> > +	 */
> > +	vt->node_count_shift = fls(block_size / vt->digest_size) - 1;
> > +	/* Round down to the nearest power of two.  This makes indexing
> > +	 * into the tree much less painful.
> > +	 */
> > +	vt->node_count = 1 << vt->node_count_shift;
> > +
> > +	/* This is unlikely to happen, but with 64k pages, who knows. */
> > +	if (vt->node_count > UINT_MAX / vt->digest_size) {
> > +		DMERR("node_count * hash_len exceeds UINT_MAX!");
> > +		status = -EINVAL;
> > +		goto bad_arg;
> > +	}
> > +
> > +	vt->depth = DIV_ROUND_UP(fls(block_count - 1), vt->node_count_shift);
> > +
> > +	/* Ensure that we can safely shift by this value. */
> > +	if (vt->depth * vt->node_count_shift >= sizeof(unsigned int) * 8) {
> > +		DMERR("specified depth and node_count_shift is too large");
> > +		status = -EINVAL;
> > +		goto bad_arg;
> > +	}
> > +
> > +	/* Allocate levels. Each level of the tree may have an arbitrary number
> > +	 * of verity_tree_entry structs.  Each entry contains node_count nodes.
> > +	 * Each node in the tree is a cryptographic digest of either node_count
> > +	 * nodes on the subsequent level or of a specific block on disk.
> > +	 */
> > +	vt->levels = (struct verity_tree_level *)
> > +			kcalloc(vt->depth,
> > +				sizeof(struct verity_tree_level), GFP_KERNEL);
> > +	if (!vt->levels) {
> > +		DMERR("failed to allocate tree levels");
> > +		status = -ENOMEM;
> > +		goto bad_level_alloc;
> > +	}
> > +
> > +	vt->read_cb = NULL;
> > +
> > +	status = verity_tree_initialize_entries(vt);
> > +	if (status)
> > +		goto bad_entries_alloc;
> > +
> > +	/* We compute depth such that there is only 1 block at level 0. */
> > +	BUG_ON(vt->levels[0].count != 1);
> > +
> > +	return 0;
> > +
> > +bad_entries_alloc:
> > +	while (vt->depth-- > 0)
> > +		kfree(vt->levels[vt->depth].entries);
> > +	kfree(vt->levels);
> > +bad_level_alloc:
> > +bad_arg:
> > +bad_hash_alloc:
> > +	for_each_possible_cpu(cpu)
> > +		if (*per_cpu_ptr(vt->hash_desc, cpu))
> > +			kfree(*per_cpu_ptr(vt->hash_desc, cpu));
> > +	free_percpu(vt->hash_desc);
> > +bad_per_cpu:
> > +	crypto_free_shash(tfm);
> > +	return status;
> > +}
> > +
> > +/**
> > + * verity_tree_read_completed
> > + * @entry:   pointer to the entry that's been loaded
> > + * @status:  I/O status. Non-zero is failure.
> > + * MUST always be called after a read_cb completes.
> > + */
> > +static void verity_tree_read_completed(struct verity_tree_entry *entry,
> > +				       int status)
> > +{
> > +	if (status) {
> > +		DMCRIT("an I/O error occurred while reading entry");
> > +		atomic_set(&entry->state, VERITY_TREE_ENTRY_ERROR_IO);
> > +		return;
> > +	}
> > +	BUG_ON(atomic_read(&entry->state) != VERITY_TREE_ENTRY_PENDING);
> > +	atomic_set(&entry->state, VERITY_TREE_ENTRY_READY);
> > +}
> > +
> > +/**
> > + * verity_tree_verify_block - checks that all path nodes for @block are valid
> > + * @vt:	     pointer to a verity_tree_create()d vt
> > + * @block:   specific block data is expected from
> > + * @pg:	     page holding the block data
> > + * @offset:  offset into the page
> > + *
> > + * Returns 0 on success, VERITY_TREE_ENTRY_ERROR_MISMATCH on error.
> > + */
> > +static int verity_tree_verify_block(struct verity_tree *vt, unsigned int block,
> > +				    struct page *pg, unsigned int offset)
> > +{
> > +	int state, depth = vt->depth;
> > +	u8 digest[VERITY_MAX_DIGEST_SIZE];
> > +	struct verity_tree_entry *entry;
> > +	void *node;
> > +
> > +	do {
> > +		/* Need to check that the hash of the current block is accurate
> > +		 * in its parent.
> > +		 */
> > +		entry = verity_tree_get_entry(vt, depth - 1, block);
> > +		state = atomic_read(&entry->state);
> > +		/* This call is only safe if all nodes along the path
> > +		 * are already populated (i.e. READY) via verity_tree_populate.
> > +		 */
> > +		BUG_ON(state < VERITY_TREE_ENTRY_READY);
> > +		node = verity_tree_get_node(vt, entry, depth, block);
> > +
> > +		if (verity_tree_compute_hash(vt, pg, offset, digest) ||
> > +		    memcmp(digest, node, vt->digest_size))
> > +			goto mismatch;
> > +
> > +		/* Keep the containing block of hashes to be verified in the
> > +		 * next pass.
> > +		 */
> > +		pg = virt_to_page(entry->nodes);
> > +		offset = offset_in_page(entry->nodes);
> > +	} while (--depth > 0 && state != VERITY_TREE_ENTRY_VERIFIED);
> > +
> > +	if (depth == 0 && state != VERITY_TREE_ENTRY_VERIFIED) {
> > +		if (verity_tree_compute_hash(vt, pg, offset, digest) ||
> > +		    memcmp(digest, vt->digest, vt->digest_size))
> > +			goto mismatch;
> > +		atomic_set(&entry->state, VERITY_TREE_ENTRY_VERIFIED);
> > +	}
> > +
> > +	/* Mark path to leaf as verified. */
> > +	for (depth++; depth < vt->depth; depth++) {
> > +		entry = verity_tree_get_entry(vt, depth, block);
> > +		/* At this point, entry can only be in VERIFIED or READY state.
> > +		 * So it is safe to use atomic_set instead of atomic_cmpxchg.
> > +		 */
> > +		atomic_set(&entry->state, VERITY_TREE_ENTRY_VERIFIED);
> > +	}
> > +
> > +	return 0;
> > +
> > +mismatch:
> > +	DMERR_LIMIT("verify_path: failed to verify hash (d=%d,bi=%u)",
> > +		    depth, block);
> > +	return VERITY_TREE_ENTRY_ERROR_MISMATCH;
> > +}
> > +
> > +/**
> > + * verity_tree_is_populated - check that nodes needed to verify a given
> > + *                            block are all ready
> > + * @vt:	    pointer to a verity_tree_create()d vt
> > + * @block:  specific block data is expected from
> > + *
> > + * Callers may wish to call verity_tree_is_populated() when checking an io
> > + * for which entries were already pending.
> > + */
> > +static bool verity_tree_is_populated(struct verity_tree *vt, unsigned int block)
> > +{
> > +	int depth;
> > +
> > +	for (depth = vt->depth - 1; depth >= 0; depth--) {
> > +		struct verity_tree_entry *entry;
> > +		entry = verity_tree_get_entry(vt, depth, block);
> > +		if (atomic_read(&entry->state) < VERITY_TREE_ENTRY_READY)
> > +			return false;
> > +	}
> > +
> > +	return true;
> > +}
> > +
> > +/**
> > + * verity_tree_populate - reads entries from disk needed to verify a given block
> > + * @vt:     pointer to a verity_tree_create()d vt
> > + * @ctx:    context used for all read_cb calls on this request
> > + * @block:  specific block data is expected from
> > + *
> > + * Returns negative value on error. Returns 0 on success.
> > + */
> > +static int verity_tree_populate(struct verity_tree *vt, void *ctx,
> > +				unsigned int block)
> > +{
> > +	int depth, state;
> > +
> > +	BUG_ON(block >= vt->block_count);
> > +
> > +	for (depth = vt->depth - 1; depth >= 0; --depth) {
> > +		unsigned int index;
> > +		struct verity_tree_level *level;
> > +		struct verity_tree_entry *entry;
> > +
> > +		index = verity_tree_index_at_level(vt, depth, block);
> > +		level = &vt->levels[depth];
> > +		entry = verity_tree_get_entry(vt, depth, block);
> > +		state = atomic_cmpxchg(&entry->state,
> > +				       VERITY_TREE_ENTRY_UNALLOCATED,
> > +				       VERITY_TREE_ENTRY_PENDING);
> > +		if (state == VERITY_TREE_ENTRY_VERIFIED)
> > +			break;
> > +		if (state <= VERITY_TREE_ENTRY_ERROR)
> > +			goto error_state;
> > +		if (state != VERITY_TREE_ENTRY_UNALLOCATED)
> > +			continue;
> > +
> > +		/* Current entry is claimed for allocation and loading */
> > +		entry->nodes = kmalloc(vt->block_size, GFP_NOIO);
> > +		if (!entry->nodes)
> > +			goto nomem;
> > +
> > +		vt->read_cb(ctx,
> > +			    level->sector + to_sector(index * vt->block_size),
> > +			    entry->nodes, to_sector(vt->block_size), entry);
> > +	}
> > +
> > +	return 0;
> > +
> > +error_state:
> > +	DMCRIT("block %u at depth %d is in an error state", block, depth);
> > +	return -EPERM;
> > +
> > +nomem:
> > +	DMCRIT("failed to allocate memory for entry->nodes");
> > +	return -ENOMEM;
> > +}
> > +
> > +/**
> > + * verity_tree_destroy - cleans up all memory used by @vt
> > + * @vt:	 pointer to a verity_tree_create()d vt
> > + */
> > +static void verity_tree_destroy(struct verity_tree *vt)
> > +{
> > +	int depth, cpu;
> > +
> > +	for (depth = 0; depth < vt->depth; depth++) {
> > +		struct verity_tree_entry *entry = vt->levels[depth].entries;
> > +		struct verity_tree_entry *entry_end = entry +
> > +			vt->levels[depth].count;
> > +		for (; entry < entry_end; ++entry)
> > +			kfree(entry->nodes);
> > +		kfree(vt->levels[depth].entries);
> > +	}
> > +	kfree(vt->levels);
> > +	crypto_free_shash((*per_cpu_ptr(vt->hash_desc, 0))->tfm);
> > +	for_each_possible_cpu(cpu)
> > +		kfree(*per_cpu_ptr(vt->hash_desc, cpu));
> > +}
> > +
> > +/*
> > + * Verity Tree Accessors
> > + */
> > +
> > +/**
> > + * verity_tree_set_digest - sets an unverified root digest hash from hex
> > + * @vt:	     pointer to a verity_tree_create()d vt
> > + * @digest:  string containing the digest in hex
> > + * Returns non-zero on error.
> > + */
> > +static int verity_tree_set_digest(struct verity_tree *vt, const char *digest)
> > +{
> > +	/* Make sure we have at least the bytes expected */
> > +	if (strnlen(digest, vt->digest_size * 2) !=
> > +	    vt->digest_size * 2) {
> > +		DMERR("root digest length does not match hash algorithm");
> > +		return -1;
> > +	}
> > +	return hex2bin(vt->digest, digest, vt->digest_size);
> > +}
> > +
> > +/**
> > + * verity_tree_digest - returns root digest in hex
> > + * @vt:	     pointer to a verity_tree_create()d vt
> > + * @digest:  buffer for the hex digest, must be of length
> > + *           VERITY_MAX_DIGEST_SIZE * 2 + 1.
> > + */
> > +int verity_tree_digest(struct verity_tree *vt, char *digest)
> > +{
> > +	bin2hex(digest, vt->digest, vt->digest_size);
> > +	return 0;
> > +}
> > +
> > +/**
> > + * verity_tree_set_salt - sets the salt
> > + * @vt:    pointer to a verity_tree_create()d vt
> > + * @salt:  string containing the salt in hex
> > + * Returns non-zero on error.
> > + */
> > +int verity_tree_set_salt(struct verity_tree *vt, const char *salt)
> > +{
> > +	size_t saltlen = min(strlen(salt) / 2, sizeof(vt->salt));
> > +	memset(vt->salt, 0, sizeof(vt->salt));
> > +	return hex2bin(vt->salt, salt, saltlen);
> > +}
> > +
> > +
> > +/**
> > + * verity_tree_salt - returns the salt in hex
> > + * @vt:    pointer to a verity_tree_create()d vt
> > + * @salt:  buffer to put salt into, of length VERITY_SALT_SIZE * 2 + 1.
> > + */
> > +int verity_tree_salt(struct verity_tree *vt, char *salt)
> > +{
> > +	bin2hex(salt, vt->salt, sizeof(vt->salt));
> > +	return 0;
> > +}
> > +
> > +/*
> > + * Allocation and utility functions
> > + */
> > +
> > +static void kverityd_src_io_read_end(struct bio *clone, int error);
> > +
> > +/* Shared destructor for all internal bios */
> > +static void verity_bio_destructor(struct bio *bio)
> > +{
> > +	struct verity_io *io = bio->bi_private;
> > +	struct verity_config *vc = io->target->private;
> > +	bio_free(bio, vc->bs);
> > +}
> > +
> > +static struct bio *verity_alloc_bioset(struct verity_config *vc, gfp_t gfp_mask,
> > +				       int nr_iovecs)
> > +{
> > +	return bio_alloc_bioset(gfp_mask, nr_iovecs, vc->bs);
> > +}
> > +
> > +static struct verity_io *verity_io_alloc(struct dm_target *ti,
> > +					    struct bio *bio)
> > +{
> > +	struct verity_config *vc = ti->private;
> > +	sector_t sector = bio->bi_sector - ti->begin;
> > +	struct verity_io *io;
> > +
> > +	io = mempool_alloc(vc->io_pool, GFP_NOIO);
> > +	if (unlikely(!io))
> > +		return NULL;
> > +	io->flags = 0;
> > +	io->target = ti;
> > +	io->bio = bio;
> > +	io->error = 0;
> > +
> > +	/* Adjust the sector by the virtual starting sector */
> > +	io->block = to_bytes(sector) / vc->bht.block_size;
> > +	io->count = bio->bi_size / vc->bht.block_size;
> > +
> > +	atomic_set(&io->pending, 0);
> > +
> > +	return io;
> > +}
> > +
> > +static struct bio *verity_bio_clone(struct verity_io *io)
> > +{
> > +	struct verity_config *vc = io->target->private;
> > +	struct bio *bio = io->bio;
> > +	struct bio *clone = verity_alloc_bioset(vc, GFP_NOIO, bio->bi_max_vecs);
> > +
> > +	if (!clone)
> > +		return NULL;
> > +
> > +	__bio_clone(clone, bio);
> > +	clone->bi_private = io;
> > +	clone->bi_end_io  = kverityd_src_io_read_end;
> > +	clone->bi_bdev    = vc->dev->bdev;
> > +	clone->bi_sector += vc->start - io->target->begin;
> > +	clone->bi_destructor = verity_bio_destructor;
> > +
> > +	return clone;
> > +}
> > +
> > +/*
> > + * Reverse flow of requests into the device.
> > + *
> > + * (Start at the bottom with verity_map and work your way upward).
> > + */
> > +
> > +static void verity_inc_pending(struct verity_io *io);
> > +
> > +static void verity_return_bio_to_caller(struct verity_io *io)
> > +{
> > +	struct verity_config *vc = io->target->private;
> > +
> > +	if (io->error)
> > +		io->error = -EIO;
> > +
> > +	bio_endio(io->bio, io->error);
> > +	mempool_free(io, vc->io_pool);
> > +}
> > +
> > +/* Check for any missing bht hashes. */
> > +static bool verity_is_bht_populated(struct verity_io *io)
> > +{
> > +	struct verity_config *vc = io->target->private;
> > +	u64 block;
> > +
> > +	for (block = io->block; block < io->block + io->count; ++block)
> > +		if (!verity_tree_is_populated(&vc->bht, block))
> > +			return false;
> > +
> > +	return true;
> > +}
> > +
> > +/* verity_dec_pending manages the lifetime of all verity_io structs.
> > + * Non-bug error handling is centralized through this interface, as is
> > + * all passage from workqueue to workqueue.
> > + */
> > +static void verity_dec_pending(struct verity_io *io)
> > +{
> > +	if (!atomic_dec_and_test(&io->pending))
> > +		goto done;
> > +
> > +	if (unlikely(io->error))
> > +		goto io_error;
> > +
> > +	/* I/Os that were pending may now be ready */
> > +	if (verity_is_bht_populated(io)) {
> > +		INIT_DELAYED_WORK(&io->work, kverityd_verify);
> > +		queue_delayed_work(kveritydq, &io->work, 0);
> > +	} else {
> > +		INIT_DELAYED_WORK(&io->work, kverityd_io);
> > +		queue_delayed_work(kverityd_ioq, &io->work, HZ/10);
> > +	}
> > +
> > +done:
> > +	return;
> > +
> > +io_error:
> > +	verity_return_bio_to_caller(io);
> > +}
> > +
> > +/* Walks the data set and computes the hash of the data read from the
> > + * untrusted source device.  The computed hash is then passed to verity-tree
> > + * for verification.
> > + */
> > +static int verity_verify(struct verity_config *vc,
> > +			 struct verity_io *io)
> > +{
> > +	unsigned int block_size = vc->bht.block_size;
> > +	struct bio *bio = io->bio;
> > +	u64 block = io->block;
> > +	unsigned int idx;
> > +	int r;
> > +
> > +	for (idx = bio->bi_idx; idx < bio->bi_vcnt; idx++) {
> > +		struct bio_vec *bv = bio_iovec_idx(bio, idx);
> > +		unsigned int offset = bv->bv_offset;
> > +		unsigned int len = bv->bv_len;
> > +
> > +		BUG_ON(offset % block_size);
> > +		BUG_ON(len % block_size);
> > +
> > +		while (len) {
> > +			r = verity_tree_verify_block(&vc->bht, block,
> > +						bv->bv_page, offset);
> > +			if (r)
> > +				goto bad_return;
> > +
> > +			offset += block_size;
> > +			len -= block_size;
> > +			block++;
> > +			cond_resched();
> > +		}
> > +	}
> > +
> > +	return 0;
> > +
> > +bad_return:
> > +	/* verity_tree functions aren't expected to return errno-friendly
> > +	 * values.  They are converted here for uniformity.
> > +	 */
> > +	if (r > 0) {
> > +		DMERR("Pending data for block %llu seen at verify", ULL(block));
> > +		r = -EBUSY;
> > +	} else {
> > +		DMERR_LIMIT("Block hash does not match!");
> > +		r = -EACCES;
> > +	}
> > +	return r;
> > +}
> > +
> > +/* Services the verify workqueue */
> > +static void kverityd_verify(struct work_struct *work)
> > +{
> > +	struct delayed_work *dwork = container_of(work, struct delayed_work,
> > +						  work);
> > +	struct verity_io *io = container_of(dwork, struct verity_io,
> > +					    work);
> > +	struct verity_config *vc = io->target->private;
> > +
> > +	io->error = verity_verify(vc, io);
> > +
> > +	/* Free up the bio and tag with the return value */
> > +	verity_return_bio_to_caller(io);
> > +}
> > +
> > +/* Asynchronously called upon the completion of verity-tree I/O. The status
> > + * of the operation is passed back to verity-tree and the next steps are
> > + * decided by verity_dec_pending.
> > + */
> > +static void kverityd_io_bht_populate_end(struct bio *bio, int error)
> > +{
> > +	struct verity_tree_entry *entry;
> > +	struct verity_io *io;
> > +
> > +	entry = (struct verity_tree_entry *) bio->bi_private;
> > +	io = (struct verity_io *) entry->io_context;
> > +
> > +	/* Tell the tree to atomically update now that we've populated
> > +	 * the given entry.
> > +	 */
> > +	verity_tree_read_completed(entry, error);
> > +
> > +	/* Clean up for reuse when reading data to be checked */
> > +	bio->bi_vcnt = 0;
> > +	bio->bi_io_vec->bv_offset = 0;
> > +	bio->bi_io_vec->bv_len = 0;
> > +	bio->bi_io_vec->bv_page = NULL;
> > +	/* Restore the private data to I/O so the destructor can be shared. */
> > +	bio->bi_private = (void *) io;
> > +	bio_put(bio);
> > +
> > +	/* We bail but assume the tree has been marked bad. */
> > +	if (unlikely(error)) {
> > +		DMERR("Failed to read for sector %llu (%u)",
> > +		      ULL(io->bio->bi_sector), io->bio->bi_size);
> > +		io->error = error;
> > +		/* Pass through the error to verity_dec_pending below */
> > +	}
> > +	/* When pending = 0, it will transition to reading real data */
> > +	verity_dec_pending(io);
> > +}
> > +
> > +/* Called by verity-tree (via verity_tree_populate), this function provides
> > + * the message digests to verity-tree that are stored on disk.
> > + */
> > +static int kverityd_bht_read_callback(void *ctx, sector_t start, u8 *dst,
> > +				      sector_t count,
> > +				      struct verity_tree_entry *entry)
> > +{
> > +	struct verity_io *io = ctx;  /* I/O for this batch */
> > +	struct verity_config *vc;
> > +	struct bio *bio;
> > +
> > +	vc = io->target->private;
> > +
> > +	/* The I/O context is nested inside the entry so that we don't need one
> > +	 * io context per page read.
> > +	 */
> > +	entry->io_context = ctx;
> > +
> > +	/* We should only get block-sized requests at present. */
> > +	verity_inc_pending(io);
> > +	bio = verity_alloc_bioset(vc, GFP_NOIO, 1);
> > +	if (unlikely(!bio)) {
> > +		DMCRIT("Out of memory at bio_alloc_bioset");
> > +		verity_tree_read_completed(entry, -ENOMEM);
> > +		return -ENOMEM;
> > +	}
> > +	bio->bi_private = (void *) entry;
> > +	bio->bi_idx = 0;
> > +	bio->bi_size = vc->bht.block_size;
> > +	bio->bi_sector = vc->hash_start + start;
> > +	bio->bi_bdev = vc->hash_dev->bdev;
> > +	bio->bi_end_io = kverityd_io_bht_populate_end;
> > +	bio->bi_rw = REQ_META;
> > +	/* Only need to free the bio since the page is managed by bht */
> > +	bio->bi_destructor = verity_bio_destructor;
> > +	bio->bi_vcnt = 1;
> > +	bio->bi_io_vec->bv_offset = offset_in_page(dst);
> > +	bio->bi_io_vec->bv_len = to_bytes(count);
> > +	/* dst is guaranteed to be a kmalloc()ed, block-sized buffer */
> > +	bio->bi_io_vec->bv_page = virt_to_page(dst);
> > +	/* Track that this I/O is in use.  There should be no risk of the io
> > +	 * being removed prior since this is called synchronously.
> > +	 */
> > +	generic_make_request(bio);
> > +	return 0;
> > +}
> > +
> > +/* Submits an io request for each missing block of block hashes.
> > + * The last one to return will then enqueue this on the io workqueue.
> > + */
> > +static void kverityd_io_bht_populate(struct verity_io *io)
> > +{
> > +	struct verity_config *vc = io->target->private;
> > +	u64 block;
> > +
> > +	for (block = io->block; block < io->block + io->count; ++block) {
> > +		int ret = verity_tree_populate(&vc->bht, io, block);
> > +
> > +		if (ret < 0) {
> > +			/* verity_dec_pending will handle the error case. */
> > +			io->error = ret;
> > +			break;
> > +		}
> > +	}
> > +}
> > +
> > +/* Asynchronously called upon the completion of I/O issued
> > + * from kverityd_src_io_read. verity_dec_pending() acts as
> > + * the scheduler/flow manager.
> > + */
> > +static void kverityd_src_io_read_end(struct bio *clone, int error)
> > +{
> > +	struct verity_io *io = clone->bi_private;
> > +
> > +	if (unlikely(!bio_flagged(clone, BIO_UPTODATE) && !error))
> > +		error = -EIO;
> > +
> > +	if (unlikely(error)) {
> > +		DMERR("Error occurred: %d (%llu, %u)",
> > +			error, ULL(clone->bi_sector), clone->bi_size);
> > +		io->error = error;
> > +	}
> > +
> > +	/* Release the clone; this keeps the block layer from leaving
> > +	 * offsets, etc. in unexpected states.
> > +	 */
> > +	bio_put(clone);
> > +
> > +	verity_dec_pending(io);
> > +}
> > +
> > +/* If not yet underway, an I/O request will be issued to the vc->dev
> > + * device for the data needed. It is cloned to avoid unexpected changes
> > + * to the original bio struct.
> > + */
> > +static void kverityd_src_io_read(struct verity_io *io)
> > +{
> > +	struct bio *clone;
> > +
> > +	/* Check if the read is already issued. */
> > +	if (io->flags & VERITY_IOFLAGS_CLONED)
> > +		return;
> > +
> > +	io->flags |= VERITY_IOFLAGS_CLONED;
> > +
> > +	/* Clone the bio. The block layer may modify the bvec array. */
> > +	clone = verity_bio_clone(io);
> > +	if (unlikely(!clone)) {
> > +		io->error = -ENOMEM;
> > +		return;
> > +	}
> > +
> > +	verity_inc_pending(io);
> > +
> > +	generic_make_request(clone);
> > +}
> > +
> > +/* kverityd_io services the I/O workqueue. For each pass through
> > + * the I/O workqueue, a call to populate both the origin drive
> > + * data and the hash tree data is made.
> > + */
> > +static void kverityd_io(struct work_struct *work)
> > +{
> > +	struct delayed_work *dwork = container_of(work, struct delayed_work,
> > +						  work);
> > +	struct verity_io *io = container_of(dwork, struct verity_io,
> > +					    work);
> > +
> > +	/* Issue requests asynchronously. */
> > +	verity_inc_pending(io);
> > +	kverityd_src_io_read(io);
> > +	kverityd_io_bht_populate(io);
> > +	verity_dec_pending(io);
> > +}
> > +
> > +/* Paired with verity_dec_pending, the pending value in the io dictates the
> > + * lifetime of a request and when it is ready to be processed on the
> > + * workqueues.
> > + */
> > +static void verity_inc_pending(struct verity_io *io)
> > +{
> > +	atomic_inc(&io->pending);
> > +}
> > +
> > +/* Block-level requests start here. */
> > +static int verity_map(struct dm_target *ti, struct bio *bio,
> > +		      union map_info *map_context)
> > +{
> > +	struct verity_io *io;
> > +	struct verity_config *vc;
> > +	struct request_queue *r_queue;
> > +
> > +	if (unlikely(!ti)) {
> > +		DMERR("dm_target was NULL");
> > +		return -EIO;
> > +	}
> > +
> > +	vc = ti->private;
> > +	r_queue = bdev_get_queue(vc->dev->bdev);
> > +
> > +	if (bio_data_dir(bio) == WRITE) {
> > +		/* If we silently drop writes, then the VFS layer will cache
> > +		 * the write and persist it in memory. While it doesn't change
> > +		 * the underlying storage, it still may be contrary to the
> > +		 * behavior expected by a verified, read-only device.
> > +		 */
> > +		DMWARN_LIMIT("write request received. rejecting with -EIO.");
> > +		return -EIO;
> > +	} else {
> > +		/* Queue up the request to be verified */
> > +		io = verity_io_alloc(ti, bio);
> > +		if (!io) {
> > +			DMERR_LIMIT("Failed to allocate and init IO data");
> > +			return DM_MAPIO_REQUEUE;
> > +		}
> > +		INIT_DELAYED_WORK(&io->work, kverityd_io);
> > +		queue_delayed_work(kverityd_ioq, &io->work, 0);
> > +	}
> > +
> > +	return DM_MAPIO_SUBMITTED;
> > +}
> > +
> > +/*
> > + * Non-block interfaces and device-mapper specific code
> > + */
> > +
> > +/*
> > + * Verity target parameters:
> > + *
> > + * <version> <dev> <hash_dev> <hash_start> <block_size> <alg> <digest> <salt>
> > + *
> > + * version:        version of the hash tree on-disk format
> > + * dev:            device to verify
> > + * hash_dev:       device hashtree is stored on
> > + * hash_start:     start address of hashes
> > + * block_size:     size of a hash block
> > + * alg:            hash algorithm
> > + * digest:         toplevel hash of the tree
> > + * salt:           salt
> > + */
> > +static int verity_ctr(struct dm_target *ti, unsigned int argc, char **argv)
> > +{
> > +	struct verity_config *vc = NULL;
> > +	const char *dev, *hash_dev, *alg, *digest, *salt;
> > +	unsigned long hash_start, block_size, version;
> > +	sector_t blocks;
> > +	int ret;
> > +
> > +	if (argc != 8) {
> > +		ti->error = "Invalid argument count";
> > +		return -EINVAL;
> > +	}
> > +
> > +	if (strict_strtoul(argv[0], 10, &version) ||
> > +	    (version != 0)) {
> > +		ti->error = "Invalid version";
> > +		return -EINVAL;
> > +	}
> > +	dev = argv[1];
> > +	hash_dev = argv[2];
> > +	if (strict_strtoul(argv[3], 10, &hash_start)) {
> > +		ti->error = "Invalid hash_start";
> > +		return -EINVAL;
> > +	}
> > +	if (strict_strtoul(argv[4], 10, &block_size) ||
> > +	    (block_size > UINT_MAX)) {
> > +		ti->error = "Invalid block_size";
> > +		return -EINVAL;
> > +	}
> > +	alg = argv[5];
> > +	digest = argv[6];
> > +	salt = argv[7];
> > +
> > +	/* The device mapper device should be setup read-only */
> > +	if ((dm_table_get_mode(ti->table) & ~FMODE_READ) != 0) {
> > +		ti->error = "Must be created readonly.";
> > +		return -EINVAL;
> > +	}
> > +
> > +	vc = kzalloc(sizeof(*vc), GFP_KERNEL);
> > +	if (!vc)
> > +		return -ENOMEM;
> > +
> > +	/* Calculate the blocks from the given device size */
> > +	vc->size = ti->len;
> > +	blocks = to_bytes(vc->size) / block_size;
> > +	if (verity_tree_create(&vc->bht, blocks, block_size, alg)) {
> > +		DMERR("failed to create required bht");
> > +		goto bad_bht;
> > +	}
> > +	if (verity_tree_set_digest(&vc->bht, digest)) {
> > +		DMERR("digest error");
> > +		goto bad_digest;
> > +	}
> > +	verity_tree_set_salt(&vc->bht, salt);
> > +	vc->bht.read_cb = kverityd_bht_read_callback;
> > +
> > +	vc->start = 0;
> > +	/* We only ever grab the device in read-only mode. */
> > +	ret = dm_get_device(ti, dev, dm_table_get_mode(ti->table), &vc->dev);
> > +	if (ret) {
> > +		DMERR("Failed to acquire device '%s': %d", dev, ret);
> > +		ti->error = "Device lookup failed";
> > +		goto bad_verity_dev;
> > +	}
> > +
> > +	if ((to_bytes(vc->start) % block_size) ||
> > +	    (to_bytes(vc->size) % block_size)) {
> > +		ti->error = "Device must be block_size divisble/aligned";
> > +		goto bad_hash_start;
> > +	}
> > +
> > +	vc->hash_start = (sector_t)hash_start;
> > +
> > +	/*
> > +	 * Note, dev == hash_dev is okay as long as the size of
> > +	 *       ti->len passed to device mapper does not include
> > +	 *       the hashes.
> > +	 */
> > +	if (dm_get_device(ti, hash_dev,
> > +			  dm_table_get_mode(ti->table), &vc->hash_dev)) {
> > +		ti->error = "Hash device lookup failed";
> > +		goto bad_hash_dev;
> > +	}
> > +
> > +	if (snprintf(vc->hash_alg, CRYPTO_MAX_ALG_NAME, "%s", alg) >=
> > +	    CRYPTO_MAX_ALG_NAME) {
> > +		ti->error = "Hash algorithm name is too long";
> > +		goto bad_hash;
> > +	}
> > +
> > +	vc->io_pool = mempool_create_slab_pool(MIN_IOS, _verity_io_pool);
> > +	if (!vc->io_pool) {
> > +		ti->error = "Cannot allocate verity io mempool";
> > +		goto bad_slab_pool;
> > +	}
> > +
> > +	vc->bs = bioset_create(MIN_BIOS, 0);
> > +	if (!vc->bs) {
> > +		ti->error = "Cannot allocate verity bioset";
> > +		goto bad_bs;
> > +	}
> > +
> > +	ti->private = vc;
> > +
> > +	return 0;
> > +
> > +bad_bs:
> > +	mempool_destroy(vc->io_pool);
> > +bad_slab_pool:
> > +bad_hash:
> > +	dm_put_device(ti, vc->hash_dev);
> > +bad_hash_dev:
> > +bad_hash_start:
> > +	dm_put_device(ti, vc->dev);
> > +bad_bht:
> > +bad_digest:
> > +bad_verity_dev:
> > +	kfree(vc);   /* hash is not secret so no need to zero */
> > +	return -EINVAL;
> > +}
> > +
> > +static void verity_dtr(struct dm_target *ti)
> > +{
> > +	struct verity_config *vc = (struct verity_config *) ti->private;
> > +
> > +	bioset_free(vc->bs);
> > +	mempool_destroy(vc->io_pool);
> > +	verity_tree_destroy(&vc->bht);
> > +	dm_put_device(ti, vc->hash_dev);
> > +	dm_put_device(ti, vc->dev);
> > +	kfree(vc);
> > +}
> > +
> > +static int verity_ioctl(struct dm_target *ti, unsigned int cmd,
> > +			unsigned long arg)
> > +{
> > +	struct verity_config *vc = (struct verity_config *) ti->private;
> > +	struct dm_dev *dev = vc->dev;
> > +	int r = 0;
> > +
> > +	/*
> > +	 * Only pass ioctls through if the device sizes match exactly.
> > +	 */
> > +	if (vc->start ||
> > +	    ti->len != i_size_read(dev->bdev->bd_inode) >> SECTOR_SHIFT)
> > +		r = scsi_verify_blk_ioctl(NULL, cmd);
> > +
> > +	return r ? : __blkdev_driver_ioctl(dev->bdev, dev->mode, cmd, arg);
> > +}
> > +
> > +static int verity_status(struct dm_target *ti, status_type_t type,
> > +			char *result, unsigned int maxlen)
> > +{
> > +	struct verity_config *vc = (struct verity_config *) ti->private;
> > +	unsigned int sz = 0;
> > +	char digest[VERITY_MAX_DIGEST_SIZE * 2 + 1] = { 0 };
> > +	char salt[VERITY_SALT_SIZE * 2 + 1] = { 0 };
> > +
> > +	verity_tree_digest(&vc->bht, digest);
> > +	verity_tree_salt(&vc->bht, salt);
> > +
> > +	switch (type) {
> > +	case STATUSTYPE_INFO:
> > +		result[0] = '\0';
> > +		break;
> > +	case STATUSTYPE_TABLE:
> > +		DMEMIT("%s %s %llu %llu %s %s %s",
> > +		       vc->dev->name,
> > +		       vc->hash_dev->name,
> > +		       ULL(vc->hash_start),
> > +		       ULL(vc->bht.block_size),
> > +		       vc->hash_alg,
> > +		       digest,
> > +		       salt);
> > +		break;
> > +	}
> > +	return 0;
> > +}
> > +
> > +static int verity_merge(struct dm_target *ti, struct bvec_merge_data *bvm,
> > +		       struct bio_vec *biovec, int max_size)
> > +{
> > +	struct verity_config *vc = ti->private;
> > +	struct request_queue *q = bdev_get_queue(vc->dev->bdev);
> > +
> > +	if (!q->merge_bvec_fn)
> > +		return max_size;
> > +
> > +	bvm->bi_bdev = vc->dev->bdev;
> > +	bvm->bi_sector = vc->start + bvm->bi_sector - ti->begin;
> > +
> > +	/* Optionally, this could just return 0 to stick to single pages. */
> > +	return min(max_size, q->merge_bvec_fn(q, bvm, biovec));
> > +}
> > +
> > +static int verity_iterate_devices(struct dm_target *ti,
> > +				 iterate_devices_callout_fn fn, void *data)
> > +{
> > +	struct verity_config *vc = ti->private;
> > +
> > +	return fn(ti, vc->dev, vc->start, ti->len, data);
> > +}
> > +
> > +static void verity_io_hints(struct dm_target *ti,
> > +			    struct queue_limits *limits)
> > +{
> > +	struct verity_config *vc = ti->private;
> > +	unsigned int block_size = vc->bht.block_size;
> > +
> > +	limits->logical_block_size = block_size;
> > +	limits->physical_block_size = block_size;
> > +	blk_limits_io_min(limits, block_size);
> > +}
> > +
> > +static struct target_type verity_target = {
> > +	.name   = "verity",
> > +	.version = {0, 1, 0},
> > +	.module = THIS_MODULE,
> > +	.ctr    = verity_ctr,
> > +	.dtr    = verity_dtr,
> > +	.ioctl  = verity_ioctl,
> > +	.map    = verity_map,
> > +	.merge  = verity_merge,
> > +	.status = verity_status,
> > +	.iterate_devices = verity_iterate_devices,
> > +	.io_hints = verity_io_hints,
> > +};
> > +
> > +#define VERITY_WQ_FLAGS (WQ_CPU_INTENSIVE|WQ_HIGHPRI)
> > +
> > +static int __init verity_init(void)
> > +{
> > +	int r = -ENOMEM;
> > +
> > +	_verity_io_pool = KMEM_CACHE(verity_io, 0);
> > +	if (!_verity_io_pool) {
> > +		DMERR("failed to allocate pool verity_io");
> > +		goto bad_io_pool;
> > +	}
> > +
> > +	kverityd_ioq = alloc_workqueue("kverityd_io", VERITY_WQ_FLAGS, 1);
> > +	if (!kverityd_ioq) {
> > +		DMERR("failed to create workqueue kverityd_ioq");
> > +		goto bad_io_queue;
> > +	}
> > +
> > +	kveritydq = alloc_workqueue("kverityd", VERITY_WQ_FLAGS, 1);
> > +	if (!kveritydq) {
> > +		DMERR("failed to create workqueue kveritydq");
> > +		goto bad_verify_queue;
> > +	}
> > +
> > +	r = dm_register_target(&verity_target);
> > +	if (r < 0) {
> > +		DMERR("register failed %d", r);
> > +		goto register_failed;
> > +	}
> > +
> > +	DMINFO("version %u.%u.%u loaded", verity_target.version[0],
> > +	       verity_target.version[1], verity_target.version[2]);
> > +
> > +	return r;
> > +
> > +register_failed:
> > +	destroy_workqueue(kveritydq);
> > +bad_verify_queue:
> > +	destroy_workqueue(kverityd_ioq);
> > +bad_io_queue:
> > +	kmem_cache_destroy(_verity_io_pool);
> > +bad_io_pool:
> > +	return r;
> > +}
> > +
> > +static void __exit verity_exit(void)
> > +{
> > +	destroy_workqueue(kveritydq);
> > +	destroy_workqueue(kverityd_ioq);
> > +
> > +	dm_unregister_target(&verity_target);
> > +	kmem_cache_destroy(_verity_io_pool);
> > +}
> > +
> > +module_init(verity_init);
> > +module_exit(verity_exit);
> > +
> > +MODULE_AUTHOR("The Chromium OS Authors <chromium-os-dev@chromium.org>");
> > +MODULE_DESCRIPTION(DM_NAME " target for transparent disk integrity checking");
> > +MODULE_LICENSE("GPL");
> > -- 
> > 1.7.7.3
> > 

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] dm: verity target
  2012-02-28 22:57 Mandeep Singh Baines
  2012-02-29 21:16 ` Mikulas Patocka
@ 2012-02-29 21:30 ` Andrew Morton
  1 sibling, 0 replies; 22+ messages in thread
From: Andrew Morton @ 2012-02-29 21:30 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: Alasdair G Kergon, dm-devel, linux-kernel, Will Drewry,
	Elly Jones, Milan Broz, Olof Johansson, Steffen Klassert,
	Mikulas Patocka

On Tue, 28 Feb 2012 14:57:52 -0800
Mandeep Singh Baines <msb@chromium.org> wrote:

> The verity target provides transparent integrity checking of block devices
> using a cryptographic digest.
> 
> dm-verity is meant to be setup as part of a verified boot path.  This
> may be anything ranging from a boot using tboot or trustedgrub to just
> booting from a known-good device (like a USB drive or CD).
> 
> dm-verity is part of ChromeOS's verified boot path. It is used to verify
> the integrity of the root filesystem on boot. The root filesystem is
> mounted on a dm-verity partition which transparently verifies each block
> with a bootloader verified hash passed into the kernel at boot.

I brought my towering knowledge of DM drivers to bear upon your patch!

The documentation in this patch is brilliant.  You usefully documented
the data structures!  Never seen that before.

>
> ...
>
> +static int verity_tree_create(struct verity_tree *vt, unsigned int block_count,
> +			      unsigned int block_size, const char *alg_name)
> +{
> +	struct crypto_shash *tfm;
> +	int size, cpu, status = 0;
> +
> +	vt->block_size = block_size;
> +	/* Verify that PAGE_SIZE >= block_size >= SECTOR_SIZE. */
> +	if ((block_size > PAGE_SIZE) ||
> +	    (PAGE_SIZE % block_size) ||
> +	    (to_sector(block_size) == 0))
> +		return -EINVAL;
> +
> +	tfm = crypto_alloc_shash(alg_name, 0, 0);
> +	if (IS_ERR(tfm)) {
> +		DMERR("failed to allocate crypto hash '%s'", alg_name);
> +		return -ENOMEM;
> +	}
> +	size = sizeof(struct shash_desc) + crypto_shash_descsize(tfm);
> +
> +	vt->hash_desc = alloc_percpu(struct shash_desc *);
> +	if (!vt->hash_desc) {
> +		DMERR("Failed to allocate per cpu hash_desc");
> +		status = -ENOMEM;
> +		goto bad_per_cpu;
> +	}
> +
> +	/* Pre-allocate per-cpu crypto contexts to avoid having to
> +	 * kmalloc/kfree a context for every hash operation.
> +	 */
> +	for_each_possible_cpu(cpu) {

Is lame.  Can/should it be made cpu-hotplug-aware, so we use
for_each_online_cpu()?
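
A hotplug-aware version would be roughly (untested sketch, error
handling omitted):

	static int verity_cpu_callback(struct notifier_block *nb,
				       unsigned long action, void *hcpu)
	{
		unsigned int cpu = (unsigned long)hcpu;

		switch (action) {
		case CPU_UP_PREPARE:
			/* kmalloc the shash_desc for @cpu here */
			break;
		case CPU_UP_CANCELED:
		case CPU_DEAD:
			/* kfree the shash_desc for @cpu here */
			break;
		}
		return NOTIFY_OK;
	}

	/* ... register_cpu_notifier(&verity_cpu_nb) at init time ... */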

> +		struct shash_desc *hash_desc = kmalloc(size, GFP_KERNEL);
> +
> +		*per_cpu_ptr(vt->hash_desc, cpu) = hash_desc;
> +		if (!hash_desc) {
> +			DMERR("failed to allocate crypto hash contexts");
> +			status = -ENOMEM;
> +			goto bad_hash_alloc;
> +		}
> +		hash_desc->tfm = tfm;
> +		hash_desc->flags = 0x0;
> +	}
> +	vt->digest_size = crypto_shash_digestsize(tfm);
> +	/* We expect to be able to pack >=2 hashes into a block */
> +	if (block_size / vt->digest_size < 2) {
> +		DMERR("too few hashes fit in a block");
> +		status = -EINVAL;
> +		goto bad_arg;
> +	}
> +
> +	if (vt->digest_size > VERITY_MAX_DIGEST_SIZE) {
> +		DMERR("VERITY_MAX_DIGEST_SIZE too small for digest");
> +		status = -EINVAL;
> +		goto bad_arg;
> +	}
> +
> +	/* Configure the tree */
> +	vt->block_count = block_count;
> +	if (block_count == 0) {
> +		DMERR("block_count must be non-zero");
> +		status = -EINVAL;
> +		goto bad_arg;
> +	}
> +
> +	/* Each verity_tree_entry->nodes is one block.  The node code tracks
> +	 * how many nodes fit into one entry where a node is a single
> +	 * hash (message digest).
> +	 */
> +	vt->node_count_shift = fls(block_size / vt->digest_size) - 1;
> +	/* Round down to the nearest power of two.  This makes indexing
> +	 * into the tree much less painful.
> +	 */
> +	vt->node_count = 1 << vt->node_count_shift;
> +
> +	/* This is unlikely to happen, but with 64k pages, who knows. */
> +	if (vt->node_count > UINT_MAX / vt->digest_size) {
> +		DMERR("node_count * hash_len exceeds UINT_MAX!");
> +		status = -EINVAL;
> +		goto bad_arg;
> +	}
> +
> +	vt->depth = DIV_ROUND_UP(fls(block_count - 1), vt->node_count_shift);
> +
> +	/* Ensure that we can safely shift by this value. */
> +	if (vt->depth * vt->node_count_shift >= sizeof(unsigned int) * 8) {
> +		DMERR("specified depth and node_count_shift is too large");
> +		status = -EINVAL;
> +		goto bad_arg;
> +	}
> +
> +	/* Allocate levels. Each level of the tree may have an arbitrary number
> +	 * of verity_tree_entry structs.  Each entry contains node_count nodes.
> +	 * Each node in the tree is a cryptographic digest of either node_count
> +	 * nodes on the subsequent level or of a specific block on disk.
> +	 */
> +	vt->levels = (struct verity_tree_level *)
> +			kcalloc(vt->depth,
> +				sizeof(struct verity_tree_level), GFP_KERNEL);
> +	if (!vt->levels) {
> +		DMERR("failed to allocate tree levels");
> +		status = -ENOMEM;
> +		goto bad_level_alloc;
> +	}
> +
> +	vt->read_cb = NULL;
> +
> +	status = verity_tree_initialize_entries(vt);
> +	if (status)
> +		goto bad_entries_alloc;
> +
> +	/* We compute depth such that there will be only 1 block at level 0. */
> +	BUG_ON(vt->levels[0].count != 1);
> +
> +	return 0;
> +
> +bad_entries_alloc:
> +	while (vt->depth-- > 0)
> +		kfree(vt->levels[vt->depth].entries);
> +	kfree(vt->levels);
> +bad_level_alloc:
> +bad_arg:
> +bad_hash_alloc:
> +	for_each_possible_cpu(cpu)
> +		if (*per_cpu_ptr(vt->hash_desc, cpu))

This test assumes that alloc_percpu() zeroed out the target memory. 
Not true, is it?

> +			kfree(*per_cpu_ptr(vt->hash_desc, cpu));

Also, kfree(NULL) is OK, so the test was unneeded.  But if that memory
really were uninitialized garbage, it would crash the kernel either way ;)
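
For what it's worth, alloc_percpu() is documented to return zero-filled
memory, so the test happens to work.  But since kfree(NULL) is a no-op,
the whole cleanup collapses to:

	for_each_possible_cpu(cpu)
		kfree(*per_cpu_ptr(vt->hash_desc, cpu));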

> +	free_percpu(vt->hash_desc);
> +bad_per_cpu:
> +	crypto_free_shash(tfm);
> +	return status;
> +}
> +
>
> ...
>
> +static struct verity_io *verity_io_alloc(struct dm_target *ti,
> +					    struct bio *bio)
> +{
> +	struct verity_config *vc = ti->private;
> +	sector_t sector = bio->bi_sector - ti->begin;
> +	struct verity_io *io;
> +
> +	io = mempool_alloc(vc->io_pool, GFP_NOIO);
> +	if (unlikely(!io))
> +		return NULL;

Actually, mempool_alloc(..., __GFP_WAIT) cannot fail.  But I wouldn't
trust it either ;)

> +	io->flags = 0;
> +	io->target = ti;
> +	io->bio = bio;
> +	io->error = 0;
> +
> +	/* Adjust the sector by the virtual starting sector */
> +	io->block = to_bytes(sector) / vc->bht.block_size;
> +	io->count = bio->bi_size / vc->bht.block_size;
> +
> +	atomic_set(&io->pending, 0);
> +
> +	return io;
> +}
> +
>
> ...
>
> +static void verity_return_bio_to_caller(struct verity_io *io)
> +{
> +	struct verity_config *vc = io->target->private;
> +
> +	if (io->error)
> +		io->error = -EIO;

That's odd.  Why overwrite a potentially useful errno?

> +	bio_endio(io->bio, io->error);
> +	mempool_free(io, vc->io_pool);
> +}
> +
>
> ...
>
> +static void kverityd_io_bht_populate_end(struct bio *bio, int error)
> +{
> +	struct verity_tree_entry *entry;
> +	struct verity_io *io;
> +
> +	entry = (struct verity_tree_entry *) bio->bi_private;

Unneeded and undesirable cast of void*.

> +	io = (struct verity_io *) entry->io_context;

Ditto.
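
i.e. plain assignments do the job, since void * converts implicitly:

	entry = bio->bi_private;
	io = entry->io_context;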

> +	/* Tell the tree to atomically update now that we've populated
> +	 * the given entry.
> +	 */
> +	verity_tree_read_completed(entry, error);
> +
> +	/* Clean up for reuse when reading data to be checked */
> +	bio->bi_vcnt = 0;
> +	bio->bi_io_vec->bv_offset = 0;
> +	bio->bi_io_vec->bv_len = 0;
> +	bio->bi_io_vec->bv_page = NULL;
> +	/* Restore the private data to I/O so the destructor can be shared. */
> +	bio->bi_private = (void *) io;
> +	bio_put(bio);
> +
> +	/* We bail but assume the tree has been marked bad. */
> +	if (unlikely(error)) {
> +		DMERR("Failed to read for sector %llu (%u)",
> +		      ULL(io->bio->bi_sector), io->bio->bi_size);
> +		io->error = error;
> +		/* Pass through the error to verity_dec_pending below */
> +	}
> +	/* When pending = 0, it will transition to reading real data */
> +	verity_dec_pending(io);
> +}
> +
> +/* Called by verity-tree (via verity_tree_populate), this function provides
> + * the message digests to verity-tree that are stored on disk.
> + */
> +static int kverityd_bht_read_callback(void *ctx, sector_t start, u8 *dst,
> +				      sector_t count,
> +				      struct verity_tree_entry *entry)
> +{
> +	struct verity_io *io = ctx;  /* I/O for this batch */
> +	struct verity_config *vc;
> +	struct bio *bio;
> +
> +	vc = io->target->private;
> +
> +	/* The I/O context is nested inside the entry so that we don't need one
> +	 * io context per page read.
> +	 */
> +	entry->io_context = ctx;
> +
> +	/* We should only get page size requests at present. */
> +	verity_inc_pending(io);
> +	bio = verity_alloc_bioset(vc, GFP_NOIO, 1);
> +	if (unlikely(!bio)) {
> +		DMCRIT("Out of memory at bio_alloc_bioset");
> +		verity_tree_read_completed(entry, -ENOMEM);
> +		return -ENOMEM;
> +	}
> +	bio->bi_private = (void *) entry;

Another unneeded cast.  And it's "undesirable" because the cast defeats
typechecking.  Suppose someone were to change "entry"'s type to "long":
the cast would keep this line compiling silently, whereas a plain
assignment would trigger a warning.

> +	bio->bi_idx = 0;
> +	bio->bi_size = vc->bht.block_size;
> +	bio->bi_sector = vc->hash_start + start;
> +	bio->bi_bdev = vc->hash_dev->bdev;
> +	bio->bi_end_io = kverityd_io_bht_populate_end;
> +	bio->bi_rw = REQ_META;
> +	/* Only need to free the bio since the page is managed by bht */
> +	bio->bi_destructor = verity_bio_destructor;
> +	bio->bi_vcnt = 1;
> +	bio->bi_io_vec->bv_offset = offset_in_page(dst);
> +	bio->bi_io_vec->bv_len = to_bytes(count);
> +	/* dst is guaranteed to be a page_pool allocation */
> +	bio->bi_io_vec->bv_page = virt_to_page(dst);
> +	/* Track that this I/O is in use.  There should be no risk of the io
> +	 * being removed prior since this is called synchronously.
> +	 */
> +	generic_make_request(bio);
> +	return 0;
> +}
> +
>
> ...
>
> +static void kverityd_src_io_read_end(struct bio *clone, int error)
> +{
> +	struct verity_io *io = clone->bi_private;
> +
> +	if (unlikely(!bio_flagged(clone, BIO_UPTODATE) && !error))
> +		error = -EIO;
> +
> +	if (unlikely(error)) {
> +		DMERR("Error occurred: %d (%llu, %u)",
> +			error, ULL(clone->bi_sector), clone->bi_size);

Doing a printk() on each I/O error is often a bad idea - if a device
dies it can cause enormous uncontrollable log storms. 
printk_ratelimited(), perhaps?
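
dm already has a wrapper for exactly this -- the patch uses it elsewhere:

	DMERR_LIMIT("Error occurred: %d (%llu, %u)",
		    error, ULL(clone->bi_sector), clone->bi_size);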

> +		io->error = error;
> +	}
> +
> +	/* Release the clone, which just keeps the block layer from
> +	 * leaving offsets, etc. in unexpected states.
> +	 */
> +	bio_put(clone);
> +
> +	verity_dec_pending(io);
> +}
> +
>
> ...
>
> +static int verity_map(struct dm_target *ti, struct bio *bio,
> +		      union map_info *map_context)
> +{
> +	struct verity_io *io;
> +	struct verity_config *vc;
> +	struct request_queue *r_queue;
> +
> +	if (unlikely(!ti)) {
> +		DMERR("dm_target was NULL");
> +		return -EIO;
> +	}
> +
> +	vc = ti->private;
> +	r_queue = bdev_get_queue(vc->dev->bdev);
> +
> +	if (bio_data_dir(bio) == WRITE) {
> +		/* If we silently drop writes, then the VFS layer will cache
> +		 * the write and persist it in memory. While it doesn't change
> +		 * the underlying storage, it still may be contrary to the
> +		 * behavior expected by a verified, read-only device.
> +		 */
> +		DMWARN_LIMIT("write request received. rejecting with -EIO.");
> +		return -EIO;
> +	} else {
> +		/* Queue up the request to be verified */
> +		io = verity_io_alloc(ti, bio);
> +		if (!io) {
> +			DMERR_LIMIT("Failed to allocate and init IO data");
> +			return DM_MAPIO_REQUEUE;
> +		}
> +		INIT_DELAYED_WORK(&io->work, kverityd_io);
> +		queue_delayed_work(kverityd_ioq, &io->work, 0);

hm, I'm seeing delayed works being queued but I'm not seeing anywhere
which explicitly flushes them all out on the shutdown/rmmod paths.  Are
you sure we can't accidentally leave works in flight?
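
At minimum, something like this on the exit path (a sketch -- it still
doesn't drain in-flight bios whose completion requeues work):

	flush_workqueue(kverityd_ioq);
	flush_workqueue(kveritydq);
	destroy_workqueue(kveritydq);
	destroy_workqueue(kverityd_ioq);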

> +	}
> +
> +	return DM_MAPIO_SUBMITTED;
> +}
> +
>
> ...
>
> +static int verity_ctr(struct dm_target *ti, unsigned int argc, char **argv)
> +{
> +	struct verity_config *vc = NULL;
> +	const char *dev, *hash_dev, *alg, *digest, *salt;
> +	unsigned long hash_start, block_size, version;
> +	sector_t blocks;
> +	int ret;
> +
> +	if (argc != 8) {
> +		ti->error = "Invalid argument count";
> +		return -EINVAL;
> +	}
> +
> +	if (strict_strtoul(argv[0], 10, &version) ||

There's no point in me telling you things which checkpatch knows about ;)
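
For the record, checkpatch flags strict_strtoul() as deprecated;
kstrtoul() is a drop-in replacement here:

	if (kstrtoul(argv[0], 10, &version) || version != 0) {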

> +	    (version != 0)) {
> +		ti->error = "Invalid version";
> +		return -EINVAL;
> +	}
>
> ...
>
> +static int verity_ioctl(struct dm_target *ti, unsigned int cmd,
> +			unsigned long arg)
> +{
> +	struct verity_config *vc = (struct verity_config *) ti->private;

Another unneeded/undesirable cast.  Multiple instances of this one.

> +	struct dm_dev *dev = vc->dev;
> +	int r = 0;
> +
> +	/*
> +	 * Only pass ioctls through if the device sizes match exactly.
> +	 */
> +	if (vc->start ||
> +	    ti->len != i_size_read(dev->bdev->bd_inode) >> SECTOR_SHIFT)
> +		r = scsi_verify_blk_ioctl(NULL, cmd);
> +
> +	return r ? : __blkdev_driver_ioctl(dev->bdev, dev->mode, cmd, arg);
> +}
> +
> +static int verity_status(struct dm_target *ti, status_type_t type,
> +			char *result, unsigned int maxlen)
> +{
> +	struct verity_config *vc = (struct verity_config *) ti->private;
> +	unsigned int sz = 0;

unused

> +	char digest[VERITY_MAX_DIGEST_SIZE * 2 + 1] = { 0 };
> +	char salt[VERITY_SALT_SIZE * 2 + 1] = { 0 };
> +
> +	verity_tree_digest(&vc->bht, digest);
> +	verity_tree_salt(&vc->bht, salt);
> +
> +	switch (type) {
> +	case STATUSTYPE_INFO:
> +		result[0] = '\0';
> +		break;
> +	case STATUSTYPE_TABLE:
> +		DMEMIT("%s %s %llu %llu %s %s %s",
> +		       vc->dev->name,
> +		       vc->hash_dev->name,
> +		       ULL(vc->hash_start),
> +		       ULL(vc->bht.block_size),
> +		       vc->hash_alg,
> +		       digest,
> +		       salt);
> +		break;
> +	}
> +	return 0;
> +}
> +
>
> ...
>
> +static void verity_io_hints(struct dm_target *ti,
> +			    struct queue_limits *limits)
> +{
> +	struct verity_config *vc = ti->private;

Did it right that time!

> +	unsigned int block_size = vc->bht.block_size;
> +
> +	limits->logical_block_size = block_size;
> +	limits->physical_block_size = block_size;
> +	blk_limits_io_min(limits, block_size);
> +}
> +
>
> ...
>



* Re: [PATCH] dm: verity target
  2012-02-28 22:57 Mandeep Singh Baines
@ 2012-02-29 21:16 ` Mikulas Patocka
  2012-03-01  6:24   ` Mandeep Singh Baines
  2012-02-29 21:30 ` Andrew Morton
  1 sibling, 1 reply; 22+ messages in thread
From: Mikulas Patocka @ 2012-02-29 21:16 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: Alasdair G Kergon, dm-devel, linux-kernel, Will Drewry,
	Elly Jones, Milan Broz, Olof Johansson, Steffen Klassert,
	Andrew Morton

Hi

This crashes if the device size is 64MiB (and sha256 hash is used).

I tested it with the userspace utility and it doesn't work with devices 
>= 128MiB: the kernel fails to verify the output of the utility.

I ran this (/dev/vg1/verity_long_data is 128MiB):
./verity mode=create alg=sha256 payload=/dev/vg1/verity_long_data 
hashtree=/dev/vg1/verity_long_hash 
salt=1234000000000000000000000000000000000000000000000000000000000000
dm:dm bht[DEBUG] Setting block_count 32768
dm:dm bht[DEBUG] Setting depth to 3.
dm:dm bht[DEBUG] depth: 0 entries: 1
dm:dm bht[DEBUG] depth: 1 entries: 2
dm:dm bht[DEBUG] depth: 2 entries: 256
0 262144 verity payload=ROOT_DEV hashtree=HASH_DEV hashstart=262144 
alg=sha256 
root_hexdigest=6e46e106b288812a881a9da3f11180433f90ce264c4f1e8fa191fb40409846fb 
salt=1234000000000000000000000000000000000000000000000000000000000000

dmsetup -r create verity --table "0 `blockdev --getsize 
/dev/vg1/verity_long_data` verity 0 /dev/vg1/verity_long_data 
/dev/vg1/verity_long_hash 0 4096 sha256 
6e46e106b288812a881a9da3f11180433f90ce264c4f1e8fa191fb40409846fb 
1234000000000000000000000000000000000000000000000000000000000000"

and get a lot of log messages "failed to verify hash (d=3,bi=0)"

If the device is smaller than 128MiB, it works (except for a 64MiB device, 
where it crashes).

Mikulas

dmsetup -r create verity --table "0 131072 verity 0 
/dev/vg1/verity_long_data /dev/vg1/verity_long_hash 0 4096 sha256 
d821fec17e151a6e7b91c4a7a71487760185c25af49a74955a3e7a718c1f97dd 
1234000000000000000000000000000000000000000000000000000000000000"

[14356.819694] ------------[ cut here ]------------
[14356.819747] kernel BUG at drivers/md/dm-verity2.c:439!
[14356.819796] invalid opcode: 0000 [#1] PREEMPT SMP
[14356.819879] CPU 5
[14356.819897] Modules linked in: dm_verity2 md5 sha1_generic cryptomgr 
aead sha256_generic dm_zero dm_bufio crypto_hash crypto_algapi crypto 
dm_loop dm_mod parport_pc parport powernow_k8 mperf cpufreq_stats 
cpufreq_powersave cpufreq_userspace cpufreq_conservative cpufreq_ondemand 
freq_table snd_usb_audio snd_pcm_oss snd_mixer_oss snd_pcm snd_timer 
snd_page_alloc snd_hwdep snd_usbmidi_lib snd_rawmidi snd soundcore fuse 
raid0 md_mod lm85 hwmon_vid ide_cd_mod cdrom ohci_hcd ehci_hcd sata_svw 
libata serverworks ide_core usbcore floppy usb_common rtc_cmos e100 tg3 
mii libphy k10temp skge hwmon button i2c_piix4 processor unix [last 
unloaded: dm_verity2]
[14356.820536]
[14356.820580] Pid: 10909, comm: dmsetup Not tainted 3.2.0 #19 empty 
empty/S3992-E
[14356.820680] RIP: 0010:[<ffffffffa0195908>]  [<ffffffffa0195908>] verity_ctr+0x808/0x860 [dm_verity2]
[14356.820772] RSP: 0018:ffff880141b7bcc8  EFLAGS: 00010202
[14356.820821] RAX: 0000000000000408 RBX: ffff8802afedcc28 RCX: 0000000000000002
[14356.820875] RDX: 0000000000000081 RSI: ffff88044685a540 RDI: ffff8802afc4b000
[14356.820929] RBP: ffff8802afedcc00 R08: 0000000000000000 R09: ffff8802afc4a000
[14356.820987] R10: 0000000000000001 R11: ffffff9000000018 R12: ffffffff8147a630
[14356.821041] R13: ffff88044685a558 R14: 0000000000004000 R15: 0000000000000002
[14356.821114] FS:  00007f1c9ea567a0(0000) GS:ffff880447c80000(0000) knlGS:0000000000000000
[14356.821199] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[14356.821257] CR2: 00007f1c9e199f80 CR3: 00000003fe98e000 CR4: 00000000000006e0
[14356.821330] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[14356.821394] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[14356.821464] Process dmsetup (pid: 10909, threadinfo ffff880141b7a000, task ffff88023d8dec90)
[14356.821557] Stack:
[14356.821606]  ffff880141b7bd34 0000000000000007 ffffc90012ac9040 ffff880429987340
[14356.821700]  ffffc90012ac41a4 0000000000020000 ffffc90012ac419d ffffc90012ac4162
[14356.821787]  ffffc90012ac417c ffffc90012ac41e5 0000000000020000 0000000000000000
[14356.821905] Call Trace:
[14356.821965]  [<ffffffffa01d2cab>] ? dm_table_add_target+0x19b/0x450 [dm_mod]
[14356.822034]  [<ffffffffa01d5690>] ? table_clear+0x80/0x80 [dm_mod]
[14356.822090]  [<ffffffffa01d5762>] ? table_load+0xd2/0x330 [dm_mod]
[14356.822150]  [<ffffffffa01d5690>] ? table_clear+0x80/0x80 [dm_mod]
[14356.822213]  [<ffffffffa01d6bf9>] ? ctl_ioctl+0x159/0x2a0 [dm_mod]
[14356.822286]  [<ffffffff8117351d>] ? ipc_addid+0x4d/0xd0
[14356.822338]  [<ffffffffa01d6d4e>] ? dm_ctl_ioctl+0xe/0x20 [dm_mod]
[14356.822411]  [<ffffffff81105a9e>] ? do_vfs_ioctl+0x8e/0x4f0
[14356.822474]  [<ffffffff8110a980>] ? dput+0x20/0x230
[14356.822533]  [<ffffffff810f6282>] ? fput+0x162/0x220
[14356.822590]  [<ffffffff81105f49>] ? sys_ioctl+0x49/0x90
[14356.822652]  [<ffffffff81313abb>] ? system_call_fastpath+0x16/0x1b
[14356.822702] Code: 38 ce 65 19 a0 e9 61 fb ff ff 48 c7 c7 d8 62 19 a0 31 c0 e8 d9 80 17 e1 48 c7 c7 28 63 19 a0 31 c0 e8 cb 80 17 e1 e9 40 fb ff ff <0f> 0b 48 c7 c7 b8 60 19 a0 31 c0 e8 b6 80 17 e1 e9 9b fa ff ff
[14356.823081] RIP  [<ffffffffa0195908>] verity_ctr+0x808/0x860 [dm_verity2]
[14356.823146]  RSP <ffff880141b7bcc8>
[14356.823530] ---[ end trace 773c24b9dbd5cfff ]---


On Tue, 28 Feb 2012, Mandeep Singh Baines wrote:

> The verity target provides transparent integrity checking of block devices
> using a cryptographic digest.
> 
> dm-verity is meant to be setup as part of a verified boot path.  This
> may be anything ranging from a boot using tboot or trustedgrub to just
> booting from a known-good device (like a USB drive or CD).
> 
> dm-verity is part of ChromeOS's verified boot path. It is used to verify
> the integrity of the root filesystem on boot. The root filesystem is
> mounted on a dm-verity partition which transparently verifies each block
> with a bootloader verified hash passed into the kernel at boot.
> 
> Changes in V4:
> * Discussion over phone (Alasdair G Kergon)
>  * copy _ioctl fix from dm-linear
>  * verity_status format fixes to match dm conventions
>  * s/dm-bht/verity_tree
>  * put everything into dm-verity.c
>  * ctr changed to dm conventions
>  * use hex2bin
>  * use conventional dm names for function
>   * s/dm_//
>   * for example: verity_ctr versus dm_verity_ctr
>  * use per_cpu API
> Changes in V3:
> * Discussion over irc (Alasdair G Kergon)
>   * Implement ioctl hook
> Changes in V2:
> * https://lkml.org/lkml/2011/11/10/85 (Steffen Klassert)
>   * Use shash API instead of older hash API
> 
> Signed-off-by: Will Drewry <wad@chromium.org>
> Signed-off-by: Elly Jones <ellyjones@chromium.org>
> Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
> Cc: Alasdair G Kergon <agk@redhat.com>
> Cc: Milan Broz <mbroz@redhat.com>
> Cc: Olof Johansson <olofj@chromium.org>
> Cc: Steffen Klassert <steffen.klassert@secunet.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Mikulas Patocka <mpatocka@redhat.com>
> Cc: dm-devel@redhat.com
> ---
>  Documentation/device-mapper/verity.txt |  149 ++++
>  drivers/md/Kconfig                     |   16 +
>  drivers/md/Makefile                    |    1 +
>  drivers/md/dm-verity.c                 | 1411 ++++++++++++++++++++++++++++++++
>  4 files changed, 1577 insertions(+), 0 deletions(-)
>  create mode 100644 Documentation/device-mapper/verity.txt
>  create mode 100644 drivers/md/dm-verity.c
> 
> diff --git a/Documentation/device-mapper/verity.txt b/Documentation/device-mapper/verity.txt
> new file mode 100644
> index 0000000..b631f12
> --- /dev/null
> +++ b/Documentation/device-mapper/verity.txt
> @@ -0,0 +1,149 @@
> +dm-verity
> +==========
> +
> +Device-Mapper's "verity" target provides transparent integrity checking of
> +block devices using a cryptographic digest provided by the kernel crypto API.
> +This target is read-only.
> +
> +Parameters:
> +    <version> <dev> <hash_dev> <hash_start> <block_size> <alg> <digest> <salt>
> +
> +<version>
> +    This is the version number of the on-disk format. Currently, there is
> +    only version 0.
> +
> +<dev>
> +    This is the device that is going to be integrity checked.  It may be
> +    a subset of the full device as specified to dmsetup (start sector and count).
> +    It may be specified as a path, like /dev/sdaX, or a device number,
> +    <major>:<minor>.
> +
> +<hash_dev>
> +    This is the device that supplies the hash tree data.  It may be
> +    specified similarly to the device path and may be the same device.  If the
> +    same device is used, the hash offset should be outside of the dm-verity
> +    configured device size.
> +
> +<hash_start>
> +    This is the offset, in 512-byte sectors, from the start of hash_dev to
> +    the root block of the hash tree.
> +
> +<block_size>
> +    The size of a hash block. Also, the size of a block to be hashed.
> +
> +<alg>
> +    The cryptographic hash algorithm used for this device.  This should
> +    be the name of the algorithm, like "sha1".
> +
> +<digest>
> +    The hexadecimal encoding of the cryptographic hash of all of the
> +    neighboring nodes at the first level of the tree.  This hash must come
> +    from a trusted source, as there is no further root of trust beyond this
> +    point.
> +
> +<salt>
> +    The hexadecimal encoding of the salt value.
> +
> +Theory of operation
> +===================
> +
> +dm-verity is meant to be setup as part of a verified boot path.  This
> +may be anything ranging from a boot using tboot or trustedgrub to just
> +booting from a known-good device (like a USB drive or CD).
> +
> +When a dm-verity device is configured, it is expected that the caller
> +has been authenticated in some way (cryptographic signatures, etc).
> +After instantiation, all hashes will be verified on-demand during
> +disk access.  If they cannot be verified up to the root node of the
> +tree, the root hash, then the I/O will fail.  This detects tampering
> +with both the data on the device and with the hash data itself.
> +
> +Cryptographic hashes are used to assert the integrity of the device on a
> +per-block basis.  This allows for a lightweight hash computation on first read
> +into the page cache.  Block hashes are stored linearly, aligned to the
> +nearest block-sized boundary.
> +
> +Hash Tree
> +---------
> +
> +Each node in the tree is a cryptographic hash.  If it is a leaf node, the hash
> +is of some block data on disk.  If it is an intermediary node, then the hash is
> +of a number of child nodes.
> +
> +Each entry in the tree is a collection of neighboring nodes that fit in one
> +block.  The number is determined based on block_size and the size of the
> +selected cryptographic digest algorithm.  The hashes are linearly ordered in
> +this entry and any unaligned trailing space is ignored but included when
> +calculating the parent node.
> +
> +The tree looks something like:
> +
> +alg = sha256, num_blocks = 32768, block_size = 4096
> +
> +                                 [   root    ]
> +                                /    . . .    \
> +                     [entry_0]                 [entry_1]
> +                    /  . . .  \                 . . .   \
> +         [entry_0_0]   . . .  [entry_0_127]    . . . .  [entry_1_127]
> +           / ... \             /   . . .  \             /           \
> +     blk_0 ... blk_127  blk_16256   blk_16383      blk_32640 . . . blk_32767
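> +
> +(Worked through: with sha256, 4096 / 32 = 128 hashes fit in one block, so
> +the 32768 data blocks above need 32768 / 128 = 256 leaf entries, then
> +256 / 128 = 2 entries at the next level, and a single root entry --
> +matching a depth of 3.)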
> +
> +On-disk format
> +==============
> +
> +Below is the recommended on-disk format. The verity kernel code does not
> +read the on-disk header. It only reads the hash blocks which directly
> +follow the header. It is expected that a user-space tool will verify the
> +integrity of the verity_header and then call dmsetup with the correct
> +parameters. Alternatively, the header can be omitted and the dmsetup
> +parameters can be passed via the kernel command-line in a rooted chain
> +of trust where the command-line is verified.
> +
> +The on-disk format is especially useful in cases where the hash blocks
> +are on a separate partition. The magic number allows easy identification
> +of the partition contents. Alternatively, the hash blocks can be stored
> +in the same partition as the data to be verified. In such a configuration
> +the filesystem on the partition would be sized a little smaller than
> +the full partition, leaving room for the hash blocks.
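> +
> +For the example tree above (32768 data blocks of 4096 bytes, sha256),
> +that comes to 256 + 2 + 1 = 259 hash blocks, or a little over 1MiB of
> +reserved space.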
> +
> +struct verity_header {
> +       uint64_t magic = 0x7665726974790a00;
> +       uint32_t version;
> +       uint32_t block_size;
> +       char digest[128]; /* in hex-ascii, null-terminated or 128-bytes */
> +       char salt[128]; /* in hex-ascii, null-terminated or 128-bytes */
> +}
> +
> +struct verity_header_block {
> +	struct verity_header;
> +	char unused[block_size - sizeof(struct verity_header) - sizeof(sig)];
> +	char sig[128]; /* in hex-ascii, null-terminated or 128-bytes */
> +}
> +
> +Directly following the header are the hash blocks which are stored a depth
> +at a time (starting from the root), sorted in order of increasing index.
> +
> +Usage
> +=====
> +
> +The API provides mechanisms for reading and verifying a tree. When reading, all
> +required data for the hash tree should be populated for a block before
> +attempting a verify.  This can be done by calling verity_tree_populate().
> +When all data is ready, a call to verity_tree_verify_block() with the
> +expected hash value will perform both the direct block hash check and the
> +hashes of the parent and neighboring nodes as needed to ensure validity up
> +to the root hash.  Note, verity_tree_set_digest() must be called before
> +any verification attempts occur.
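> +
> +A sketch of the expected call order (error handling and waiting for read
> +completions omitted; io_ctx, page and offset are caller-supplied):
> +
> +  verity_tree_set_digest(&vt, root_hexdigest);
> +  verity_tree_populate(&vt, io_ctx, block);   /* issues read_cb I/O */
> +  /* ... read_cb completions invoke verity_tree_read_completed() ... */
> +  verity_tree_verify_block(&vt, block, page, offset);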
> +
> +Example
> +=======
> +
> +Set up a device:
> +[[
> +  dmsetup create vroot --table \
> +    "0 204800 verity 0 /dev/sda1 /dev/sda2 0 4096 sha1 "\
> +    "9f74809a2ee7607b16fcc70d9399a4de9725a727 "\
> +    "0000000000000000000000000000000000000000000000000000000000000000"
> +]]
> +
> +A command line tool is available to compute the hash tree and return the
> +root hash value.
> +  http://git.chromium.org/cgi-bin/gitweb.cgi?p=dm-verity.git;a=tree
> diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
> index faa4741..b8bb690 100644
> --- a/drivers/md/Kconfig
> +++ b/drivers/md/Kconfig
> @@ -370,4 +370,20 @@ config DM_FLAKEY
>         ---help---
>           A target that intermittently fails I/O for debugging purposes.
>  
> +config DM_VERITY
> +        tristate "Verity target support"
> +        depends on BLK_DEV_DM
> +        select CRYPTO
> +        select CRYPTO_HASH
> +        ---help---
> +          This device-mapper target allows you to create a device that
> +          transparently integrity checks the data on it. You'll need to
> +          activate the digests you're going to use in the cryptoapi
> +          configuration.
> +
> +          To compile this code as a module, choose M here: the module will
> +          be called dm-verity.
> +
> +          If unsure, say N.
> +
>  endif # MD
> diff --git a/drivers/md/Makefile b/drivers/md/Makefile
> index 046860c..70a29af 100644
> --- a/drivers/md/Makefile
> +++ b/drivers/md/Makefile
> @@ -39,6 +39,7 @@ obj-$(CONFIG_DM_SNAPSHOT)	+= dm-snapshot.o
>  obj-$(CONFIG_DM_PERSISTENT_DATA)	+= persistent-data/
>  obj-$(CONFIG_DM_MIRROR)		+= dm-mirror.o dm-log.o dm-region-hash.o
>  obj-$(CONFIG_DM_LOG_USERSPACE)	+= dm-log-userspace.o
> +obj-$(CONFIG_DM_VERITY)         += dm-verity.o
>  obj-$(CONFIG_DM_ZERO)		+= dm-zero.o
>  obj-$(CONFIG_DM_RAID)	+= dm-raid.o
>  obj-$(CONFIG_DM_THIN_PROVISIONING)	+= dm-thin-pool.o
> diff --git a/drivers/md/dm-verity.c b/drivers/md/dm-verity.c
> new file mode 100644
> index 0000000..87b7958
> --- /dev/null
> +++ b/drivers/md/dm-verity.c
> @@ -0,0 +1,1411 @@
> +/*
> + * Originally based on dm-crypt.c,
> + * Copyright (C) 2003 Christophe Saout <christophe@saout.de>
> + * Copyright (C) 2004 Clemens Fruhwirth <clemens@endorphin.org>
> + * Copyright (C) 2006-2008 Red Hat, Inc. All rights reserved.
> + * Copyright (C) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org>
> + *                    All Rights Reserved.
> + *
> + * This file is released under the GPLv2.
> + *
> + * Implements a verifying transparent block device.
> + * See Documentation/device-mapper/dm-verity.txt
> + */
> +#include <crypto/hash.h>
> +#include <linux/atomic.h>
> +#include <linux/bio.h>
> +#include <linux/blkdev.h>
> +#include <linux/genhd.h>
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/mempool.h>
> +#include <linux/module.h>
> +#include <linux/workqueue.h>
> +#include <linux/device-mapper.h>
> +
> +
> +#define DM_MSG_PREFIX "verity"
> +
> +
> +/* Helper for printing sector_t */
> +#define ULL(x) ((unsigned long long)(x))
> +
> +#define MIN_IOS 32
> +#define MIN_BIOS (MIN_IOS * 2)
> +
> +/* To avoid allocating memory for digest tests, we just setup a
> + * max to use for now.
> + */
> +#define VERITY_MAX_DIGEST_SIZE 64   /* Supports up to 512-bit digests */
> +#define VERITY_SALT_SIZE       32   /* 256 bits of salt is a lot */
> +
> +/* UNALLOCATED, PENDING, READY, and VERIFIED are valid states. All other
> + * values are entry-related return codes.
> + */
> +#define VERITY_TREE_ENTRY_VERIFIED 8  /* 'nodes' checked against parent */
> +#define VERITY_TREE_ENTRY_READY 4  /* 'nodes' is loaded and available */
> +#define VERITY_TREE_ENTRY_PENDING 2  /* 'nodes' is being loaded */
> +#define VERITY_TREE_ENTRY_UNALLOCATED 0 /* untouched */
> +#define VERITY_TREE_ENTRY_ERROR -1 /* entry is unsuitable for use */
> +#define VERITY_TREE_ENTRY_ERROR_IO -2 /* I/O error on load */
> +
> +/* Additional possible return codes */
> +#define VERITY_TREE_ENTRY_ERROR_MISMATCH -3 /* Digest mismatch */
> +
> +
> +/* verity_tree_entry
> + * Contains verity_tree->node_count tree nodes at a given tree depth.
> + * state is used to transactionally assure that data is paged in
> + * from disk.  Since verity_tree does not keep running crypto contexts for
> + * each level, we need to load in the data for on-demand verification.
> + */
> +struct verity_tree_entry {
> +	atomic_t state; /* see defines */
> +	/* Keeping an extra pointer per entry wastes up to ~33k of
> +	 * memory if 1M blocks are used (or ~66k on a 64-bit arch).
> +	 */
> +	void *io_context;  /* Reserve a pointer for use during io */
> +	/* data should only be non-NULL if fully populated. */
> +	void *nodes;  /* The hash data used to verify the children.
> +		       * Guaranteed to be page-aligned.
> +		       */
> +};
> +
> +/* verity_tree_level
> + * Contains an array of entries which represent a page of hashes where
> + * each hash is a node in the tree at the given tree depth/level.
> + */
> +struct verity_tree_level {
> +	struct verity_tree_entry *entries;  /* array of entries of tree nodes */
> +	unsigned int count;  /* number of entries at this level */
> +	sector_t sector;  /* starting sector for this level */
> +};
> +
> +/* opaque context, start, databuf, sector_count */
> +typedef int(*verity_tree_callback)(void *,  /* external context */
> +			      sector_t,  /* start sector */
> +			      u8 *,  /* destination page */
> +			      sector_t,  /* num sectors */
> +			      struct verity_tree_entry *);
> +/* verity_tree - Device mapper block hash tree
> + * verity_tree provides a fixed interface for comparing data blocks
> + * against cryptographic hashes stored in a hash tree. It
> + * optimizes the tree structure for storage on disk.
> + *
> + * The tree is built from the bottom up.  A collection of data,
> + * external to the tree, is hashed and these hashes are stored
> + * as the blocks in the tree.  For some number of these hashes,
> + * a parent node is created by hashing them.  These steps are
> + * repeated.
> + */
> +struct verity_tree {
> +	/* Configured values */
> +	int depth;  /* Depth of the tree including the root */
> +	unsigned int block_count;  /* Number of blocks hashed */
> +	unsigned int block_size;  /* Size of a hash block */
> +	char hash_alg[CRYPTO_MAX_ALG_NAME];
> +	u8 salt[VERITY_SALT_SIZE];
> +
> +	/* Computed values */
> +	unsigned int node_count;  /* Data size (in hashes) for each entry */
> +	unsigned int node_count_shift;  /* first bit set - 1 */
> +	/*
> +	 * There is one per CPU so that verification can run concurrently.
> +	 * Access through per_cpu_ptr() only
> +	 */
> +	struct shash_desc * __percpu *hash_desc; /* Container for hash alg */
> +	unsigned int digest_size;
> +	sector_t sectors;  /* Number of disk sectors used */
> +
> +	/* bool verified;  Full tree is verified */
> +	u8 digest[VERITY_MAX_DIGEST_SIZE];
> +	struct verity_tree_level *levels;  /* in reverse order */
> +	/* Callback for reading from the hash device */
> +	verity_tree_callback read_cb;
> +};
> +
> +/* per-requested-bio private data */
> +enum verity_io_flags {
> +	VERITY_IOFLAGS_CLONED = 0x1,	/* original bio has been cloned */
> +};
> +
> +struct verity_io {
> +	struct dm_target *target;
> +	struct bio *bio;
> +	struct delayed_work work;
> +	unsigned int flags;
> +
> +	int error;
> +	atomic_t pending;
> +
> +	u64 block;  /* aligned block index */
> +	u64 count;  /* aligned count in blocks */
> +};
> +
> +struct verity_config {
> +	struct dm_dev *dev;
> +	sector_t start;
> +	sector_t size;
> +
> +	struct dm_dev *hash_dev;
> +	sector_t hash_start;
> +
> +	struct verity_tree bht;
> +
> +	/* Pool required for io contexts */
> +	mempool_t *io_pool;
> +	/* Pool and bios required for making sure that backing device reads are
> +	 * in PAGE_SIZE increments.
> +	 */
> +	struct bio_set *bs;
> +
> +	char hash_alg[CRYPTO_MAX_ALG_NAME];
> +};
> +
> +
> +static struct kmem_cache *_verity_io_pool;
> +static struct workqueue_struct *kveritydq, *kverityd_ioq;
> +
> +
> +static void kverityd_verify(struct work_struct *work);
> +static void kverityd_io(struct work_struct *work);
> +static void kverityd_io_bht_populate(struct verity_io *io);
> +static void kverityd_io_bht_populate_end(struct bio *, int error);
> +
> +
> +/*
> + * Utilities
> + */
> +
> +static void bin2hex(char *dst, const u8 *src, size_t count)
> +{
> +	while (count-- > 0) {
> +		sprintf(dst, "%02hhx", (int)*src);
> +		dst += 2;
> +		src++;
> +	}
> +}
> +
> +/*
> + * Verity Tree
> + */
> +
> +/* Functions for converting indices to nodes. */
> +
> +static inline unsigned int verity_tree_get_level_shift(struct verity_tree *bht,
> +						  int depth)
> +{
> +	return (bht->depth - depth) * bht->node_count_shift;
> +}
> +
> +/* For the given depth, this is the entry index.  At depth+1 it is the node
> + * index for depth.
> + */
> +static inline unsigned int verity_tree_index_at_level(struct verity_tree *bht,
> +						      int depth,
> +						      unsigned int leaf)
> +{
> +	return leaf >> verity_tree_get_level_shift(bht, depth);
> +}
> +
> +static inline struct verity_tree_entry *verity_tree_get_entry(
> +		struct verity_tree *bht,
> +		int depth,
> +		unsigned int block)
> +{
> +	unsigned int index = verity_tree_index_at_level(bht, depth, block);
> +	struct verity_tree_level *level = &bht->levels[depth];
> +
> +	return &level->entries[index];
> +}
> +
> +static inline void *verity_tree_get_node(struct verity_tree *bht,
> +					 struct verity_tree_entry *entry,
> +					 int depth, unsigned int block)
> +{
> +	unsigned int index = verity_tree_index_at_level(bht, depth, block);
> +	unsigned int node_index = index % bht->node_count;
> +
> +	return entry->nodes + (node_index * bht->digest_size);
> +}
> +/**
> + * verity_tree_compute_hash: hashes a page of data
> + */
> +static int verity_tree_compute_hash(struct verity_tree *vt, struct page *pg,
> +				    unsigned int offset, u8 *digest)
> +{
> +	struct shash_desc *hash_desc;
> +	void *data;
> +	int err;
> +
> +	hash_desc = *per_cpu_ptr(vt->hash_desc, smp_processor_id());
> +
> +	if (crypto_shash_init(hash_desc)) {
> +		DMCRIT("failed to reinitialize crypto hash (proc:%d)",
> +			smp_processor_id());
> +		return -EINVAL;
> +	}
> +	data = kmap_atomic(pg);
> +	err = crypto_shash_update(hash_desc, data + offset, PAGE_SIZE);
> +	kunmap_atomic(data);
> +	if (err) {
> +		DMCRIT("crypto_hash_update failed");
> +		return -EINVAL;
> +	}
> +	if (crypto_shash_update(hash_desc, vt->salt, sizeof(vt->salt))) {
> +		DMCRIT("crypto_hash_update failed");
> +		return -EINVAL;
> +	}
> +	if (crypto_shash_final(hash_desc, digest)) {
> +		DMCRIT("crypto_hash_final failed");
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
> +static int verity_tree_initialize_entries(struct verity_tree *vt)
> +{
> +	/* last represents the index of the last digest stored in the tree.
> +	 * By walking the tree with that index, it is possible to compute the
> +	 * total number of entries at each level.
> +	 *
> +	 * Since each entry will contain up to |node_count| nodes of the tree,
> +	 * it is possible that the last index may not be at the end of a given
> +	 * entry->nodes.  In that case, it is assumed the value is padded.
> +	 *
> +	 * Note, we treat both the tree root (1 hash) and the tree leaves
> +	 * independently from the vt data structures.  Logically, the root is
> +	 * depth=-1 and the block layer level is depth=vt->depth
> +	 */
> +	unsigned int last = vt->block_count;
> +	int depth;
> +
> +	/* check that the largest level->count can't result in an int overflow
> +	 * on allocation or sector calculation.
> +	 */
> +	if (((last >> vt->node_count_shift) + 1) >
> +	    UINT_MAX / max((unsigned int)sizeof(struct verity_tree_entry),
> +			   (unsigned int)to_sector(vt->block_size))) {
> +		DMCRIT("required entries %u is too large", last + 1);
> +		return -EINVAL;
> +	}
> +
> +	/* Track the current sector location for each level so we don't have to
> +	 * compute it during traversals.
> +	 */
> +	vt->sectors = 0;
> +	for (depth = 0; depth < vt->depth; ++depth) {
> +		struct verity_tree_level *level = &vt->levels[depth];
> +
> +		level->count = verity_tree_index_at_level(vt, depth, last) + 1;
> +		level->entries = (struct verity_tree_entry *)
> +				 kcalloc(level->count,
> +					 sizeof(struct verity_tree_entry),
> +					 GFP_KERNEL);
> +		if (!level->entries) {
> +			DMERR("failed to allocate entries for depth %d", depth);
> +			return -ENOMEM;
> +		}
> +		level->sector = vt->sectors;
> +		vt->sectors += level->count * to_sector(vt->block_size);
> +	}
> +
> +	return 0;
> +}
> +
> +/**
> + * verity_tree_create - prepares @vt for use
> + * @vt:	          pointer to the verity_tree to initialize
> + * @block_count:  the number of block hashes / tree leaves
> + * @block_size:	  size in bytes of a hash block / block to be hashed
> + * @alg_name:	  crypto hash algorithm name
> + *
> + * Returns 0 on success.
> + *
> + * Callers can offset into devices by storing the data in the io callbacks.
> + */
> +static int verity_tree_create(struct verity_tree *vt, unsigned int block_count,
> +			      unsigned int block_size, const char *alg_name)
> +{
> +	struct crypto_shash *tfm;
> +	int size, cpu, status = 0;
> +
> +	vt->block_size = block_size;
> +	/* Verify that PAGE_SIZE >= block_size >= SECTOR_SIZE. */
> +	if ((block_size > PAGE_SIZE) ||
> +	    (PAGE_SIZE % block_size) ||
> +	    (to_sector(block_size) == 0))
> +		return -EINVAL;
> +
> +	tfm = crypto_alloc_shash(alg_name, 0, 0);
> +	if (IS_ERR(tfm)) {
> +		DMERR("failed to allocate crypto hash '%s'", alg_name);
> +		return -ENOMEM;
> +	}
> +	size = sizeof(struct shash_desc) + crypto_shash_descsize(tfm);
> +
> +	vt->hash_desc = alloc_percpu(struct shash_desc *);
> +	if (!vt->hash_desc) {
> +		DMERR("Failed to allocate per cpu hash_desc");
> +		status = -ENOMEM;
> +		goto bad_per_cpu;
> +	}
> +
> +	/* Pre-allocate per-cpu crypto contexts to avoid having to
> +	 * kmalloc/kfree a context for every hash operation.
> +	 */
> +	for_each_possible_cpu(cpu) {
> +		struct shash_desc *hash_desc = kmalloc(size, GFP_KERNEL);
> +
> +		*per_cpu_ptr(vt->hash_desc, cpu) = hash_desc;
> +		if (!hash_desc) {
> +			DMERR("failed to allocate crypto hash contexts");
> +			status = -ENOMEM;
> +			goto bad_hash_alloc;
> +		}
> +		hash_desc->tfm = tfm;
> +		hash_desc->flags = 0x0;
> +	}
> +	vt->digest_size = crypto_shash_digestsize(tfm);
> +	/* We expect to be able to pack >=2 hashes into a block */
> +	if (block_size / vt->digest_size < 2) {
> +		DMERR("too few hashes fit in a block");
> +		status = -EINVAL;
> +		goto bad_arg;
> +	}
> +
> +	if (vt->digest_size > VERITY_MAX_DIGEST_SIZE) {
> +		DMERR("VERITY_MAX_DIGEST_SIZE too small for digest");
> +		status = -EINVAL;
> +		goto bad_arg;
> +	}
> +
> +	/* Configure the tree */
> +	vt->block_count = block_count;
> +	if (block_count == 0) {
> +		DMERR("block_count must be non-zero");
> +		status = -EINVAL;
> +		goto bad_arg;
> +	}
> +
> +	/* Each verity_tree_entry->nodes is one block.  The node code tracks
> +	 * how many nodes fit into one entry where a node is a single
> +	 * hash (message digest).
> +	 */
> +	vt->node_count_shift = fls(block_size / vt->digest_size) - 1;
> +	/* Round down to the nearest power of two.  This makes indexing
> +	 * into the tree much less painful.
> +	 */
> +	vt->node_count = 1 << vt->node_count_shift;
> +
> +	/* This is unlikely to happen, but with 64k pages, who knows. */
> +	if (vt->node_count > UINT_MAX / vt->digest_size) {
> +		DMERR("node_count * hash_len exceeds UINT_MAX!");
> +		status = -EINVAL;
> +		goto bad_arg;
> +	}
> +
> +	vt->depth = DIV_ROUND_UP(fls(block_count - 1), vt->node_count_shift);
> +
> +	/* Ensure that we can safely shift by this value. */
> +	if (vt->depth * vt->node_count_shift >= sizeof(unsigned int) * 8) {
> +		DMERR("specified depth and node_count_shift is too large");
> +		status = -EINVAL;
> +		goto bad_arg;
> +	}
> +
> +	/* Allocate levels. Each level of the tree may have an arbitrary number
> +	 * of verity_tree_entry structs.  Each entry contains node_count nodes.
> +	 * Each node in the tree is a cryptographic digest of either node_count
> +	 * nodes on the subsequent level or of a specific block on disk.
> +	 */
> +	vt->levels = (struct verity_tree_level *)
> +			kcalloc(vt->depth,
> +				sizeof(struct verity_tree_level), GFP_KERNEL);
> +	if (!vt->levels) {
> +		DMERR("failed to allocate tree levels");
> +		status = -ENOMEM;
> +		goto bad_level_alloc;
> +	}
> +
> +	vt->read_cb = NULL;
> +
> +	status = verity_tree_initialize_entries(vt);
> +	if (status)
> +		goto bad_entries_alloc;
> +
> +	/* We compute depth such that there will be only 1 block at level 0. */
> +	BUG_ON(vt->levels[0].count != 1);
> +
> +	return 0;
> +
> +bad_entries_alloc:
> +	while (vt->depth-- > 0)
> +		kfree(vt->levels[vt->depth].entries);
> +	kfree(vt->levels);
> +bad_level_alloc:
> +bad_arg:
> +bad_hash_alloc:
> +	for_each_possible_cpu(cpu)
> +		if (*per_cpu_ptr(vt->hash_desc, cpu))
> +			kfree(*per_cpu_ptr(vt->hash_desc, cpu));
> +	free_percpu(vt->hash_desc);
> +bad_per_cpu:
> +	crypto_free_shash(tfm);
> +	return status;
> +}
> +
> +/**
> + * verity_tree_read_completed
> + * @entry:   pointer to the entry that's been loaded
> + * @status:  I/O status. Non-zero is failure.
> + * MUST always be called after a read_cb completes.
> + */
> +static void verity_tree_read_completed(struct verity_tree_entry *entry,
> +				       int status)
> +{
> +	if (status) {
> +		DMCRIT("an I/O error occurred while reading entry");
> +		atomic_set(&entry->state, VERITY_TREE_ENTRY_ERROR_IO);
> +		return;
> +	}
> +	BUG_ON(atomic_read(&entry->state) != VERITY_TREE_ENTRY_PENDING);
> +	atomic_set(&entry->state, VERITY_TREE_ENTRY_READY);
> +}
> +
> +/**
> + * verity_tree_verify_block - checks that all path nodes for @block are valid
> + * @vt:	     pointer to a verity_tree_create()d vt
> + * @block:   specific block data is expected from
> + * @pg:	     page holding the block data
> + * @offset:  offset into the page
> + *
> + * Returns 0 on success, VERITY_TREE_ENTRY_ERROR_MISMATCH on error.
> + */
> +static int verity_tree_verify_block(struct verity_tree *vt, unsigned int block,
> +				    struct page *pg, unsigned int offset)
> +{
> +	int state, depth = vt->depth;
> +	u8 digest[VERITY_MAX_DIGEST_SIZE];
> +	struct verity_tree_entry *entry;
> +	void *node;
> +
> +	do {
> +		/* Need to check that the hash of the current block is accurate
> +		 * in its parent.
> +		 */
> +		entry = verity_tree_get_entry(vt, depth - 1, block);
> +		state = atomic_read(&entry->state);
> +		/* This call is only safe if all nodes along the path
> +		 * are already populated (i.e. READY) via verity_tree_populate.
> +		 */
> +		BUG_ON(state < VERITY_TREE_ENTRY_READY);
> +		node = verity_tree_get_node(vt, entry, depth, block);
> +
> +		if (verity_tree_compute_hash(vt, pg, offset, digest) ||
> +		    memcmp(digest, node, vt->digest_size))
> +			goto mismatch;
> +
> +		/* Keep the containing block of hashes to be verified in the
> +		 * next pass.
> +		 */
> +		pg = virt_to_page(entry->nodes);
> +		offset = offset_in_page(entry->nodes);
> +	} while (--depth > 0 && state != VERITY_TREE_ENTRY_VERIFIED);
> +
> +	if (depth == 0 && state != VERITY_TREE_ENTRY_VERIFIED) {
> +		if (verity_tree_compute_hash(vt, pg, offset, digest) ||
> +		    memcmp(digest, vt->digest, vt->digest_size))
> +			goto mismatch;
> +		atomic_set(&entry->state, VERITY_TREE_ENTRY_VERIFIED);
> +	}
> +
> +	/* Mark path to leaf as verified. */
> +	for (depth++; depth < vt->depth; depth++) {
> +		entry = verity_tree_get_entry(vt, depth, block);
> +		/* At this point, entry can only be in VERIFIED or READY state.
> +		 * So it is safe to use atomic_set instead of atomic_cmpxchg.
> +		 */
> +		atomic_set(&entry->state, VERITY_TREE_ENTRY_VERIFIED);
> +	}
> +
> +	return 0;
> +
> +mismatch:
> +	DMERR_LIMIT("verify_path: failed to verify hash (d=%d,bi=%u)",
> +		    depth, block);
> +	return VERITY_TREE_ENTRY_ERROR_MISMATCH;
> +}
> +
> +/**
> + * verity_tree_is_populated - check that nodes needed to verify a given
> + *                            block are all ready
> + * @vt:	    pointer to a verity_tree_create()d vt
> + * @block:  specific block data is expected from
> + *
> + * Callers may wish to call verity_tree_is_populated() when checking an io
> + * for which entries were already pending.
> + */
> +static bool verity_tree_is_populated(struct verity_tree *vt, unsigned int block)
> +{
> +	int depth;
> +
> +	for (depth = vt->depth - 1; depth >= 0; depth--) {
> +		struct verity_tree_entry *entry;
> +		entry = verity_tree_get_entry(vt, depth, block);
> +		if (atomic_read(&entry->state) < VERITY_TREE_ENTRY_READY)
> +			return false;
> +	}
> +
> +	return true;
> +}
> +
> +/**
> + * verity_tree_populate - reads entries from disk needed to verify a given block
> + * @vt:     pointer to a verity_tree_create()d vt
> + * @ctx:    context used for all read_cb calls on this request
> + * @block:  specific block data is expected from
> + *
> + * Returns negative value on error. Returns 0 on success.
> + */
> +static int verity_tree_populate(struct verity_tree *vt, void *ctx,
> +				unsigned int block)
> +{
> +	int depth, state;
> +
> +	BUG_ON(block >= vt->block_count);
> +
> +	for (depth = vt->depth - 1; depth >= 0; --depth) {
> +		unsigned int index;
> +		struct verity_tree_level *level;
> +		struct verity_tree_entry *entry;
> +
> +		index = verity_tree_index_at_level(vt, depth, block);
> +		level = &vt->levels[depth];
> +		entry = verity_tree_get_entry(vt, depth, block);
> +		state = atomic_cmpxchg(&entry->state,
> +				       VERITY_TREE_ENTRY_UNALLOCATED,
> +				       VERITY_TREE_ENTRY_PENDING);
> +		if (state == VERITY_TREE_ENTRY_VERIFIED)
> +			break;
> +		if (state <= VERITY_TREE_ENTRY_ERROR)
> +			goto error_state;
> +		if (state != VERITY_TREE_ENTRY_UNALLOCATED)
> +			continue;
> +
> +		/* Current entry is claimed for allocation and loading */
> +		entry->nodes = kmalloc(vt->block_size, GFP_NOIO);
> +		if (!entry->nodes)
> +			goto nomem;
> +
> +		vt->read_cb(ctx,
> +			    level->sector + to_sector(index * vt->block_size),
> +			    entry->nodes, to_sector(vt->block_size), entry);
> +	}
> +
> +	return 0;
> +
> +error_state:
> +	DMCRIT("block %u at depth %d is in an error state", block, depth);
> +	return -EPERM;
> +
> +nomem:
> +	DMCRIT("failed to allocate memory for entry->nodes");
> +	return -ENOMEM;
> +}
> +
> +/**
> + * verity_tree_destroy - cleans up all memory used by @vt
> + * @vt:	 pointer to a verity_tree_create()d vt
> + */
> +static void verity_tree_destroy(struct verity_tree *vt)
> +{
> +	int depth, cpu;
> +
> +	for (depth = 0; depth < vt->depth; depth++) {
> +		struct verity_tree_entry *entry = vt->levels[depth].entries;
> +		struct verity_tree_entry *entry_end = entry +
> +			vt->levels[depth].count;
> +		for (; entry < entry_end; ++entry)
> +			kfree(entry->nodes);
> +		kfree(vt->levels[depth].entries);
> +	}
> +	kfree(vt->levels);
> +	crypto_free_shash((*per_cpu_ptr(vt->hash_desc, 0))->tfm);
> +	for_each_possible_cpu(cpu)
> +		kfree(*per_cpu_ptr(vt->hash_desc, cpu));
> +}
> +
> +/*
> + * Verity Tree Accessors
> + */
> +
> +/**
> + * verity_tree_set_digest - sets an unverified root digest hash from hex
> + * @vt:	     pointer to a verity_tree_create()d vt
> + * @digest:  string containing the digest in hex
> + * Returns non-zero on error.
> + */
> +static int verity_tree_set_digest(struct verity_tree *vt, const char *digest)
> +{
> +	/* Make sure we have at least the bytes expected */
> +	if (strnlen((char *)digest, vt->digest_size * 2) !=
> +	    vt->digest_size * 2) {
> +		DMERR("root digest length does not match hash algorithm");
> +		return -1;
> +	}
> +	return hex2bin(vt->digest, digest, vt->digest_size);
> +}
> +
> +/**
> + * verity_tree_digest - returns root digest in hex
> + * @vt:	     pointer to a verity_tree_create()d vt
> + * @digest:  buffer to put the digest into, of length VERITY_MAX_DIGEST_SIZE * 2 + 1.
> + */
> +int verity_tree_digest(struct verity_tree *vt, char *digest)
> +{
> +	bin2hex(digest, vt->digest, vt->digest_size);
> +	return 0;
> +}
> +
> +/**
> + * verity_tree_set_salt - sets the salt
> + * @vt:    pointer to a verity_tree_create()d vt
> + * @salt:  string containing the salt in hex
> + * Returns non-zero on error.
> + */
> +int verity_tree_set_salt(struct verity_tree *vt, const char *salt)
> +{
> +	size_t saltlen = min(strlen(salt) / 2, sizeof(vt->salt));
> +	memset(vt->salt, 0, sizeof(vt->salt));
> +	return hex2bin(vt->salt, salt, saltlen);
> +}
> +
> +
> +/**
> + * verity_tree_salt - returns the salt in hex
> + * @vt:    pointer to a verity_tree_create()d vt
> + * @salt:  buffer to put salt into, of length VERITY_SALT_SIZE * 2 + 1.
> + */
> +int verity_tree_salt(struct verity_tree *vt, char *salt)
> +{
> +	bin2hex(salt, vt->salt, sizeof(vt->salt));
> +	return 0;
> +}
> +
> +/*
> + * Allocation and utility functions
> + */
> +
> +static void kverityd_src_io_read_end(struct bio *clone, int error);
> +
> +/* Shared destructor for all internal bios */
> +static void verity_bio_destructor(struct bio *bio)
> +{
> +	struct verity_io *io = bio->bi_private;
> +	struct verity_config *vc = io->target->private;
> +	bio_free(bio, vc->bs);
> +}
> +
> +static struct bio *verity_alloc_bioset(struct verity_config *vc, gfp_t gfp_mask,
> +				       int nr_iovecs)
> +{
> +	return bio_alloc_bioset(gfp_mask, nr_iovecs, vc->bs);
> +}
> +
> +static struct verity_io *verity_io_alloc(struct dm_target *ti,
> +					    struct bio *bio)
> +{
> +	struct verity_config *vc = ti->private;
> +	sector_t sector = bio->bi_sector - ti->begin;
> +	struct verity_io *io;
> +
> +	io = mempool_alloc(vc->io_pool, GFP_NOIO);
> +	if (unlikely(!io))
> +		return NULL;
> +	io->flags = 0;
> +	io->target = ti;
> +	io->bio = bio;
> +	io->error = 0;
> +
> +	/* Adjust the sector by the virtual starting sector */
> +	io->block = to_bytes(sector) / vc->bht.block_size;
> +	io->count = bio->bi_size / vc->bht.block_size;
> +
> +	atomic_set(&io->pending, 0);
> +
> +	return io;
> +}
> +
> +static struct bio *verity_bio_clone(struct verity_io *io)
> +{
> +	struct verity_config *vc = io->target->private;
> +	struct bio *bio = io->bio;
> +	struct bio *clone = verity_alloc_bioset(vc, GFP_NOIO, bio->bi_max_vecs);
> +
> +	if (!clone)
> +		return NULL;
> +
> +	__bio_clone(clone, bio);
> +	clone->bi_private = io;
> +	clone->bi_end_io  = kverityd_src_io_read_end;
> +	clone->bi_bdev    = vc->dev->bdev;
> +	clone->bi_sector += vc->start - io->target->begin;
> +	clone->bi_destructor = verity_bio_destructor;
> +
> +	return clone;
> +}
> +
> +/*
> + * Reverse flow of requests into the device.
> + *
> + * (Start at the bottom with verity_map and work your way upward).
> + */
> +
> +static void verity_inc_pending(struct verity_io *io);
> +
> +static void verity_return_bio_to_caller(struct verity_io *io)
> +{
> +	struct verity_config *vc = io->target->private;
> +
> +	if (io->error)
> +		io->error = -EIO;
> +
> +	bio_endio(io->bio, io->error);
> +	mempool_free(io, vc->io_pool);
> +}
> +
> +/* Check for any missing bht hashes. */
> +static bool verity_is_bht_populated(struct verity_io *io)
> +{
> +	struct verity_config *vc = io->target->private;
> +	u64 block;
> +
> +	for (block = io->block; block < io->block + io->count; ++block)
> +		if (!verity_tree_is_populated(&vc->bht, block))
> +			return false;
> +
> +	return true;
> +}
> +
> +/* verity_dec_pending manages the lifetime of all verity_io structs.
> + * Non-bug error handling is centralized through this interface, as is
> + * all passage from workqueue to workqueue.
> + */
> +static void verity_dec_pending(struct verity_io *io)
> +{
> +	if (!atomic_dec_and_test(&io->pending))
> +		goto done;
> +
> +	if (unlikely(io->error))
> +		goto io_error;
> +
> +	/* I/Os that were pending may now be ready */
> +	if (verity_is_bht_populated(io)) {
> +		INIT_DELAYED_WORK(&io->work, kverityd_verify);
> +		queue_delayed_work(kveritydq, &io->work, 0);
> +	} else {
> +		INIT_DELAYED_WORK(&io->work, kverityd_io);
> +		queue_delayed_work(kverityd_ioq, &io->work, HZ/10);
> +	}
> +
> +done:
> +	return;
> +
> +io_error:
> +	verity_return_bio_to_caller(io);
> +}
> +
> +/* Walks the data set and computes the hash of the data read from the
> + * untrusted source device.  The computed hash is then passed to verity-tree
> + * for verification.
> + */
> +static int verity_verify(struct verity_config *vc,
> +			 struct verity_io *io)
> +{
> +	unsigned int block_size = vc->bht.block_size;
> +	struct bio *bio = io->bio;
> +	u64 block = io->block;
> +	unsigned int idx;
> +	int r;
> +
> +	for (idx = bio->bi_idx; idx < bio->bi_vcnt; idx++) {
> +		struct bio_vec *bv = bio_iovec_idx(bio, idx);
> +		unsigned int offset = bv->bv_offset;
> +		unsigned int len = bv->bv_len;
> +
> +		BUG_ON(offset % block_size);
> +		BUG_ON(len % block_size);
> +
> +		while (len) {
> +			r = verity_tree_verify_block(&vc->bht, block,
> +						bv->bv_page, offset);
> +			if (r)
> +				goto bad_return;
> +
> +			offset += block_size;
> +			len -= block_size;
> +			block++;
> +			cond_resched();
> +		}
> +	}
> +
> +	return 0;
> +
> +bad_return:
> +	/* verity_tree functions aren't expected to return errno friendly
> +	 * values.  They are converted here for uniformity.
> +	 */
> +	if (r > 0) {
> +		DMERR("Pending data for block %llu seen at verify", ULL(block));
> +		r = -EBUSY;
> +	} else {
> +		DMERR_LIMIT("Block hash does not match!");
> +		r = -EACCES;
> +	}
> +	return r;
> +}
> +
> +/* Services the verify workqueue */
> +static void kverityd_verify(struct work_struct *work)
> +{
> +	struct delayed_work *dwork = container_of(work, struct delayed_work,
> +						  work);
> +	struct verity_io *io = container_of(dwork, struct verity_io,
> +					    work);
> +	struct verity_config *vc = io->target->private;
> +
> +	io->error = verity_verify(vc, io);
> +
> +	/* Free up the bio and tag with the return value */
> +	verity_return_bio_to_caller(io);
> +}
> +
> +/* Asynchronously called upon the completion of verity-tree I/O. The status
> + * of the operation is passed back to verity-tree and the next steps are
> + * decided by verity_dec_pending.
> + */
> +static void kverityd_io_bht_populate_end(struct bio *bio, int error)
> +{
> +	struct verity_tree_entry *entry;
> +	struct verity_io *io;
> +
> +	entry = (struct verity_tree_entry *) bio->bi_private;
> +	io = (struct verity_io *) entry->io_context;
> +
> +	/* Tell the tree to atomically update now that we've populated
> +	 * the given entry.
> +	 */
> +	verity_tree_read_completed(entry, error);
> +
> +	/* Clean up for reuse when reading data to be checked */
> +	bio->bi_vcnt = 0;
> +	bio->bi_io_vec->bv_offset = 0;
> +	bio->bi_io_vec->bv_len = 0;
> +	bio->bi_io_vec->bv_page = NULL;
> +	/* Restore the private data to I/O so the destructor can be shared. */
> +	bio->bi_private = (void *) io;
> +	bio_put(bio);
> +
> +	/* We bail but assume the tree has been marked bad. */
> +	if (unlikely(error)) {
> +		DMERR("Failed to read for sector %llu (%u)",
> +		      ULL(io->bio->bi_sector), io->bio->bi_size);
> +		io->error = error;
> +		/* Pass through the error to verity_dec_pending below */
> +	}
> +	/* When pending = 0, it will transition to reading real data */
> +	verity_dec_pending(io);
> +}
> +
> +/* Called by verity-tree (via verity_tree_populate), this function provides
> + * the message digests to verity-tree that are stored on disk.
> + */
> +static int kverityd_bht_read_callback(void *ctx, sector_t start, u8 *dst,
> +				      sector_t count,
> +				      struct verity_tree_entry *entry)
> +{
> +	struct verity_io *io = ctx;  /* I/O for this batch */
> +	struct verity_config *vc;
> +	struct bio *bio;
> +
> +	vc = io->target->private;
> +
> +	/* The I/O context is nested inside the entry so that we don't need one
> +	 * io context per page read.
> +	 */
> +	entry->io_context = ctx;
> +
> +	/* We should only get page size requests at present. */
> +	verity_inc_pending(io);
> +	bio = verity_alloc_bioset(vc, GFP_NOIO, 1);
> +	if (unlikely(!bio)) {
> +		DMCRIT("Out of memory at bio_alloc_bioset");
> +		verity_tree_read_completed(entry, -ENOMEM);
> +		return -ENOMEM;
> +	}
> +	bio->bi_private = (void *) entry;
> +	bio->bi_idx = 0;
> +	bio->bi_size = vc->bht.block_size;
> +	bio->bi_sector = vc->hash_start + start;
> +	bio->bi_bdev = vc->hash_dev->bdev;
> +	bio->bi_end_io = kverityd_io_bht_populate_end;
> +	bio->bi_rw = REQ_META;
> +	/* Only need to free the bio since the page is managed by bht */
> +	bio->bi_destructor = verity_bio_destructor;
> +	bio->bi_vcnt = 1;
> +	bio->bi_io_vec->bv_offset = offset_in_page(dst);
> +	bio->bi_io_vec->bv_len = to_bytes(count);
> +	/* dst is guaranteed to be a page_pool allocation */
> +	bio->bi_io_vec->bv_page = virt_to_page(dst);
> +	/* Track that this I/O is in use.  There should be no risk of the io
> +	 * being removed prior since this is called synchronously.
> +	 */
> +	generic_make_request(bio);
> +	return 0;
> +}
> +
> +/* Submits an io request for each missing block of block hashes.
> + * The last one to return will then enqueue this on the io workqueue.
> + */
> +static void kverityd_io_bht_populate(struct verity_io *io)
> +{
> +	struct verity_config *vc = io->target->private;
> +	u64 block;
> +
> +	for (block = io->block; block < io->block + io->count; ++block) {
> +		int ret = verity_tree_populate(&vc->bht, io, block);
> +
> +		if (ret < 0) {
> +			/* verity_dec_pending will handle the error case. */
> +			io->error = ret;
> +			break;
> +		}
> +	}
> +}
> +
> +/* Asynchronously called upon the completion of I/O issued
> + * from kverityd_src_io_read. verity_dec_pending() acts as
> + * the scheduler/flow manager.
> + */
> +static void kverityd_src_io_read_end(struct bio *clone, int error)
> +{
> +	struct verity_io *io = clone->bi_private;
> +
> +	if (unlikely(!bio_flagged(clone, BIO_UPTODATE) && !error))
> +		error = -EIO;
> +
> +	if (unlikely(error)) {
> +		DMERR("Error occurred: %d (%llu, %u)",
> +			error, ULL(clone->bi_sector), clone->bi_size);
> +		io->error = error;
> +	}
> +
> +	/* Release the clone which just avoids the block layer from
> +	 * leaving offsets, etc in unexpected states.
> +	 */
> +	bio_put(clone);
> +
> +	verity_dec_pending(io);
> +}
> +
> +/* If not yet underway, an I/O request will be issued to the vc->dev
> + * device for the data needed. It is cloned to avoid unexpected changes
> + * to the original bio struct.
> + */
> +static void kverityd_src_io_read(struct verity_io *io)
> +{
> +	struct bio *clone;
> +
> +	/* Check if the read is already issued. */
> +	if (io->flags & VERITY_IOFLAGS_CLONED)
> +		return;
> +
> +	io->flags |= VERITY_IOFLAGS_CLONED;
> +
> +	/* Clone the bio. The block layer may modify the bvec array. */
> +	clone = verity_bio_clone(io);
> +	if (unlikely(!clone)) {
> +		io->error = -ENOMEM;
> +		return;
> +	}
> +
> +	verity_inc_pending(io);
> +
> +	generic_make_request(clone);
> +}
> +
> +/* kverityd_io services the I/O workqueue. For each pass through
> + * the I/O workqueue, a call to populate both the origin drive
> + * data and the hash tree data is made.
> + */
> +static void kverityd_io(struct work_struct *work)
> +{
> +	struct delayed_work *dwork = container_of(work, struct delayed_work,
> +						  work);
> +	struct verity_io *io = container_of(dwork, struct verity_io,
> +					    work);
> +
> +	/* Issue requests asynchronously. */
> +	verity_inc_pending(io);
> +	kverityd_src_io_read(io);
> +	kverityd_io_bht_populate(io);
> +	verity_dec_pending(io);
> +}
> +
> +/* Paired with verity_dec_pending, the pending value in the io dictates the
> + * lifetime of a request and when it is ready to be processed on the
> + * workqueues.
> + */
> +static void verity_inc_pending(struct verity_io *io)
> +{
> +	atomic_inc(&io->pending);
> +}
> +
> +/* Block-level requests start here. */
> +static int verity_map(struct dm_target *ti, struct bio *bio,
> +		      union map_info *map_context)
> +{
> +	struct verity_io *io;
> +	struct verity_config *vc;
> +	struct request_queue *r_queue;
> +
> +	if (unlikely(!ti)) {
> +		DMERR("dm_target was NULL");
> +		return -EIO;
> +	}
> +
> +	vc = ti->private;
> +	r_queue = bdev_get_queue(vc->dev->bdev);
> +
> +	if (bio_data_dir(bio) == WRITE) {
> +		/* If we silently drop writes, then the VFS layer will cache
> +		 * the write and persist it in memory. While it doesn't change
> +		 * the underlying storage, it still may be contrary to the
> +		 * behavior expected by a verified, read-only device.
> +		 */
> +		DMWARN_LIMIT("write request received; rejecting with -EIO");
> +		return -EIO;
> +	} else {
> +		/* Queue up the request to be verified */
> +		io = verity_io_alloc(ti, bio);
> +		if (!io) {
> +			DMERR_LIMIT("Failed to allocate and init IO data");
> +			return DM_MAPIO_REQUEUE;
> +		}
> +		INIT_DELAYED_WORK(&io->work, kverityd_io);
> +		queue_delayed_work(kverityd_ioq, &io->work, 0);
> +	}
> +
> +	return DM_MAPIO_SUBMITTED;
> +}
> +
> +/*
> + * Non-block interfaces and device-mapper specific code
> + */
> +
> +/*
> + * Verity target parameters:
> + *
> + * <version> <dev> <hash_dev> <hash_start> <block_size> <alg> <digest> <salt>
> + *
> + * version:        version of the hash tree on-disk format
> + * dev:            device to verify
> + * hash_dev:       device hashtree is stored on
> + * hash_start:     start address of hashes
> + * block_size:     size of a hash block
> + * alg:            hash algorithm
> + * digest:         toplevel hash of the tree
> + * salt:           salt
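> + *
> + * Example table line (hypothetical values):
> + *   0 409600 verity 0 /dev/sda1 /dev/sda2 0 4096 sha256 <digest> <salt>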
> + */
> +static int verity_ctr(struct dm_target *ti, unsigned int argc, char **argv)
> +{
> +	struct verity_config *vc = NULL;
> +	const char *dev, *hash_dev, *alg, *digest, *salt;
> +	unsigned long hash_start, block_size, version;
> +	sector_t blocks;
> +	int ret;
> +
> +	if (argc != 8) {
> +		ti->error = "Invalid argument count";
> +		return -EINVAL;
> +	}
> +
> +	if (strict_strtoul(argv[0], 10, &version) ||
> +	    (version != 0)) {
> +		ti->error = "Invalid version";
> +		return -EINVAL;
> +	}
> +	dev = argv[1];
> +	hash_dev = argv[2];
> +	if (strict_strtoul(argv[3], 10, &hash_start)) {
> +		ti->error = "Invalid hash_start";
> +		return -EINVAL;
> +	}
> +	if (strict_strtoul(argv[4], 10, &block_size) ||
> +	    (block_size > UINT_MAX)) {
> +		ti->error = "Invalid block_size";
> +		return -EINVAL;
> +	}
> +	alg = argv[5];
> +	digest = argv[6];
> +	salt = argv[7];
> +
> +	/* The device mapper device should be setup read-only */
> +	if ((dm_table_get_mode(ti->table) & ~FMODE_READ) != 0) {
> +		ti->error = "Must be created readonly.";
> +		return -EINVAL;
> +	}
> +
> +	vc = kzalloc(sizeof(*vc), GFP_KERNEL);
> +	if (!vc)
> +		return -ENOMEM;
> +
> +	/* Calculate the blocks from the given device size */
> +	vc->size = ti->len;
> +	blocks = to_bytes(vc->size) / block_size;
> +	if (verity_tree_create(&vc->bht, blocks, block_size, alg)) {
> +		DMERR("failed to create required bht");
> +		goto bad_bht;
> +	}
> +	if (verity_tree_set_digest(&vc->bht, digest)) {
> +		DMERR("digest error");
> +		goto bad_digest;
> +	}
> +	verity_tree_set_salt(&vc->bht, salt);
> +	vc->bht.read_cb = kverityd_bht_read_callback;
> +
> +	vc->start = 0;
> +	/* We only ever grab the device in read-only mode. */
> +	ret = dm_get_device(ti, dev, dm_table_get_mode(ti->table), &vc->dev);
> +	if (ret) {
> +		DMERR("Failed to acquire device '%s': %d", dev, ret);
> +		ti->error = "Device lookup failed";
> +		goto bad_verity_dev;
> +	}
> +
> +	if ((to_bytes(vc->start) % block_size) ||
> +	    (to_bytes(vc->size) % block_size)) {
> +		ti->error = "Device must be block_size divisble/aligned";
> +		goto bad_hash_start;
> +	}
> +
> +	vc->hash_start = (sector_t)hash_start;
> +
> +	/*
> +	 * Note, dev == hash_dev is okay as long as the size of
> +	 *       ti->len passed to device mapper does not include
> +	 *       the hashes.
> +	 */
> +	if (dm_get_device(ti, hash_dev,
> +			  dm_table_get_mode(ti->table), &vc->hash_dev)) {
> +		ti->error = "Hash device lookup failed";
> +		goto bad_hash_dev;
> +	}
> +
> +	if (snprintf(vc->hash_alg, CRYPTO_MAX_ALG_NAME, "%s", alg) >=
> +	    CRYPTO_MAX_ALG_NAME) {
> +		ti->error = "Hash algorithm name is too long";
> +		goto bad_hash;
> +	}
> +
> +	vc->io_pool = mempool_create_slab_pool(MIN_IOS, _verity_io_pool);
> +	if (!vc->io_pool) {
> +		ti->error = "Cannot allocate verity io mempool";
> +		goto bad_slab_pool;
> +	}
> +
> +	vc->bs = bioset_create(MIN_BIOS, 0);
> +	if (!vc->bs) {
> +		ti->error = "Cannot allocate verity bioset";
> +		goto bad_bs;
> +	}
> +
> +	ti->private = vc;
> +
> +	return 0;
> +
> +bad_bs:
> +	mempool_destroy(vc->io_pool);
> +bad_slab_pool:
> +bad_hash:
> +	dm_put_device(ti, vc->hash_dev);
> +bad_hash_dev:
> +bad_hash_start:
> +	dm_put_device(ti, vc->dev);
> +bad_bht:
> +bad_digest:
> +bad_verity_dev:
> +	kfree(vc);   /* hash is not secret so no need to zero */
> +	return -EINVAL;
> +}
> +
> +static void verity_dtr(struct dm_target *ti)
> +{
> +	struct verity_config *vc = (struct verity_config *) ti->private;
> +
> +	bioset_free(vc->bs);
> +	mempool_destroy(vc->io_pool);
> +	verity_tree_destroy(&vc->bht);
> +	dm_put_device(ti, vc->hash_dev);
> +	dm_put_device(ti, vc->dev);
> +	kfree(vc);
> +}
> +
> +static int verity_ioctl(struct dm_target *ti, unsigned int cmd,
> +			unsigned long arg)
> +{
> +	struct verity_config *vc = (struct verity_config *) ti->private;
> +	struct dm_dev *dev = vc->dev;
> +	int r = 0;
> +
> +	/*
> +	 * Only pass ioctls through if the device sizes match exactly.
> +	 */
> +	if (vc->start ||
> +	    ti->len != i_size_read(dev->bdev->bd_inode) >> SECTOR_SHIFT)
> +		r = scsi_verify_blk_ioctl(NULL, cmd);
> +
> +	return r ? : __blkdev_driver_ioctl(dev->bdev, dev->mode, cmd, arg);
> +}
> +
> +static int verity_status(struct dm_target *ti, status_type_t type,
> +			char *result, unsigned int maxlen)
> +{
> +	struct verity_config *vc = (struct verity_config *) ti->private;
> +	unsigned int sz = 0;
> +	char digest[VERITY_MAX_DIGEST_SIZE * 2 + 1] = { 0 };
> +	char salt[VERITY_SALT_SIZE * 2 + 1] = { 0 };
> +
> +	verity_tree_digest(&vc->bht, digest);
> +	verity_tree_salt(&vc->bht, salt);
> +
> +	switch (type) {
> +	case STATUSTYPE_INFO:
> +		result[0] = '\0';
> +		break;
> +	case STATUSTYPE_TABLE:
> +		DMEMIT("%s %s %llu %llu %s %s %s",
> +		       vc->dev->name,
> +		       vc->hash_dev->name,
> +		       ULL(vc->hash_start),
> +		       ULL(vc->bht.block_size),
> +		       vc->hash_alg,
> +		       digest,
> +		       salt);
> +		break;
> +	}
> +	return 0;
> +}
> +
> +static int verity_merge(struct dm_target *ti, struct bvec_merge_data *bvm,
> +		       struct bio_vec *biovec, int max_size)
> +{
> +	struct verity_config *vc = ti->private;
> +	struct request_queue *q = bdev_get_queue(vc->dev->bdev);
> +
> +	if (!q->merge_bvec_fn)
> +		return max_size;
> +
> +	bvm->bi_bdev = vc->dev->bdev;
> +	bvm->bi_sector = vc->start + bvm->bi_sector - ti->begin;
> +
> +	/* Optionally, this could just return 0 to stick to single pages. */
> +	return min(max_size, q->merge_bvec_fn(q, bvm, biovec));
> +}
> +
> +static int verity_iterate_devices(struct dm_target *ti,
> +				 iterate_devices_callout_fn fn, void *data)
> +{
> +	struct verity_config *vc = ti->private;
> +
> +	return fn(ti, vc->dev, vc->start, ti->len, data);
> +}
> +
> +static void verity_io_hints(struct dm_target *ti,
> +			    struct queue_limits *limits)
> +{
> +	struct verity_config *vc = ti->private;
> +	unsigned int block_size = vc->bht.block_size;
> +
> +	limits->logical_block_size = block_size;
> +	limits->physical_block_size = block_size;
> +	blk_limits_io_min(limits, block_size);
> +}
> +
> +static struct target_type verity_target = {
> +	.name   = "verity",
> +	.version = {0, 1, 0},
> +	.module = THIS_MODULE,
> +	.ctr    = verity_ctr,
> +	.dtr    = verity_dtr,
> +	.ioctl  = verity_ioctl,
> +	.map    = verity_map,
> +	.merge  = verity_merge,
> +	.status = verity_status,
> +	.iterate_devices = verity_iterate_devices,
> +	.io_hints = verity_io_hints,
> +};
> +
> +#define VERITY_WQ_FLAGS (WQ_CPU_INTENSIVE|WQ_HIGHPRI)
> +
> +static int __init verity_init(void)
> +{
> +	int r = -ENOMEM;
> +
> +	_verity_io_pool = KMEM_CACHE(verity_io, 0);
> +	if (!_verity_io_pool) {
> +		DMERR("failed to allocate pool verity_io");
> +		goto bad_io_pool;
> +	}
> +
> +	kverityd_ioq = alloc_workqueue("kverityd_io", VERITY_WQ_FLAGS, 1);
> +	if (!kverityd_ioq) {
> +		DMERR("failed to create workqueue kverityd_ioq");
> +		goto bad_io_queue;
> +	}
> +
> +	kveritydq = alloc_workqueue("kverityd", VERITY_WQ_FLAGS, 1);
> +	if (!kveritydq) {
> +		DMERR("failed to create workqueue kveritydq");
> +		goto bad_verify_queue;
> +	}
> +
> +	r = dm_register_target(&verity_target);
> +	if (r < 0) {
> +		DMERR("register failed %d", r);
> +		goto register_failed;
> +	}
> +
> +	DMINFO("version %u.%u.%u loaded", verity_target.version[0],
> +	       verity_target.version[1], verity_target.version[2]);
> +
> +	return r;
> +
> +register_failed:
> +	destroy_workqueue(kveritydq);
> +bad_verify_queue:
> +	destroy_workqueue(kverityd_ioq);
> +bad_io_queue:
> +	kmem_cache_destroy(_verity_io_pool);
> +bad_io_pool:
> +	return r;
> +}
> +
> +static void __exit verity_exit(void)
> +{
> +	destroy_workqueue(kveritydq);
> +	destroy_workqueue(kverityd_ioq);
> +
> +	dm_unregister_target(&verity_target);
> +	kmem_cache_destroy(_verity_io_pool);
> +}
> +
> +module_init(verity_init);
> +module_exit(verity_exit);
> +
> +MODULE_AUTHOR("The Chromium OS Authors <chromium-os-dev@chromium.org>");
> +MODULE_DESCRIPTION(DM_NAME " target for transparent disk integrity checking");
> +MODULE_LICENSE("GPL");
> -- 
> 1.7.7.3
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH] dm: verity target
@ 2012-02-28 22:57 Mandeep Singh Baines
  2012-02-29 21:16 ` Mikulas Patocka
  2012-02-29 21:30 ` Andrew Morton
  0 siblings, 2 replies; 22+ messages in thread
From: Mandeep Singh Baines @ 2012-02-28 22:57 UTC (permalink / raw)
  To: Alasdair G Kergon, dm-devel, linux-kernel
  Cc: Mandeep Singh Baines, Will Drewry, Elly Jones, Milan Broz,
	Olof Johansson, Steffen Klassert, Andrew Morton, Mikulas Patocka

The verity target provides transparent integrity checking of block devices
using a cryptographic digest.

dm-verity is meant to be setup as part of a verified boot path.  This
may be anything ranging from a boot using tboot or trustedgrub to just
booting from a known-good device (like a USB drive or CD).

dm-verity is part of ChromeOS's verified boot path. It is used to verify
the integrity of the root filesystem on boot. The root filesystem is
mounted on a dm-verity partition which transparently verifies each block
with a bootloader verified hash passed into the kernel at boot.

Changes in V4:
* Discussion over phone (Alasdair G Kergon)
 * copy _ioctl fix from dm-linear
 * verity_status format fixes to match dm conventions
 * s/dm-bht/verity_tree
 * put everything into dm-verity.c
 * ctr changed to dm conventions
 * use hex2bin
 * use conventional dm names for function
  * s/dm_//
  * for example: verity_ctr versus dm_verity_ctr
 * use per_cpu API
Changes in V3:
* Discussion over irc (Alasdair G Kergon)
  * Implement ioctl hook
Changes in V2:
* https://lkml.org/lkml/2011/11/10/85 (Steffen Klassert)
  * Use shash API instead of older hash API

Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: Elly Jones <ellyjones@chromium.org>
Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Milan Broz <mbroz@redhat.com>
Cc: Olof Johansson <olofj@chromium.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: dm-devel@redhat.com
---
 Documentation/device-mapper/verity.txt |  149 ++++
 drivers/md/Kconfig                     |   16 +
 drivers/md/Makefile                    |    1 +
 drivers/md/dm-verity.c                 | 1411 ++++++++++++++++++++++++++++++++
 4 files changed, 1577 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/device-mapper/verity.txt
 create mode 100644 drivers/md/dm-verity.c

diff --git a/Documentation/device-mapper/verity.txt b/Documentation/device-mapper/verity.txt
new file mode 100644
index 0000000..b631f12
--- /dev/null
+++ b/Documentation/device-mapper/verity.txt
@@ -0,0 +1,149 @@
+dm-verity
+==========
+
+Device-Mapper's "verity" target provides transparent integrity checking of
+block devices using a cryptographic digest provided by the kernel crypto API.
+This target is read-only.
+
+Parameters:
+    <version> <dev> <hash_dev> <hash_start> <block_size> <alg> <digest> <salt>
+
+<version>
+    This is the version number of the on-disk format. Currently, there is
+    only version 0.
+
+<dev>
+    This is the device that is going to be integrity checked.  It may be
+    a subset of the full device as specified to dmsetup (start sector and count)
+    It may be specified as a path, like /dev/sdaX, or a device number,
+    <major>:<minor>.
+
+<hash_dev>
+    This is the device that supplies the hash tree data.  It may be
+    specified similarly to the device path and may be the same device.  If the
+    same device is used, the hash offset should be outside of the dm-verity
+    configured device size.
+
+<hash_start>
+    This is the offset, in 512-byte sectors, from the start of hash_dev to
+    the root block of the hash tree.
+
+<block_size>
+    The size of a hash block. Also, the size of a block to be hashed.
+
+<alg>
+    The cryptographic hash algorithm used for this device.  This should
+    be the name of the algorithm, like "sha1".
+
+<digest>
+    The hexadecimal encoding of the cryptographic hash of all of the
+    neighboring nodes at the first level of the tree.  This hash should be
+    trusted as there is no other authenticity check beyond this point.
+
+<salt>
+    The hexadecimal encoding of the salt value.
+
+Theory of operation
+===================
+
+dm-verity is meant to be setup as part of a verified boot path.  This
+may be anything ranging from a boot using tboot or trustedgrub to just
+booting from a known-good device (like a USB drive or CD).
+
+When a dm-verity device is configured, it is expected that the caller
+has been authenticated in some way (cryptographic signatures, etc).
+After instantiation, all hashes will be verified on-demand during
+disk access.  If they cannot be verified up to the root node of the
+tree, the root hash, then the I/O will fail.  This should identify
+tampering with any data on the device and the hash data.
+
+Cryptographic hashes are used to assert the integrity of the device on a
+per-block basis.  This allows for a lightweight hash computation on first read
+into the page cache.  Block hashes are stored linearly, aligned to
+page-sized blocks.
+
+Hash Tree
+---------
+
+Each node in the tree is a cryptographic hash.  If it is a leaf node, the hash
+is of some block data on disk.  If it is an intermediary node, then the hash is
+of a number of child nodes.
+
+Each entry in the tree is a collection of neighboring nodes that fit in one
+block.  The number is determined based on block_size and the size of the
+selected cryptographic digest algorithm.  The hashes are linearly ordered in
+this entry and any unaligned trailing space is ignored but included when
+calculating the parent node.
+
+The tree looks something like:
+
+alg = sha256, num_blocks = 32768, block_size = 4096
+
+                                 [   root    ]
+                                /    . . .    \
+                     [entry_0]                 [entry_1]
+                    /  . . .  \                 . . .   \
+         [entry_0_0]   . . .  [entry_0_127]    . . . .  [entry_1_127]
+           / ... \             /   . . .  \             /           \
+     blk_0 ... blk_127  blk_16256   blk_16383      blk_32640 . . . blk_32767
+
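+A quick check of the geometry above, using the arithmetic from the
+verity code: sha256 digests are 32 bytes, so one 4096-byte entry packs
+4096 / 32 = 128 hashes (node_count_shift = 7).  For 32768 data blocks,
+depth = DIV_ROUND_UP(fls(32768 - 1), 7) = DIV_ROUND_UP(15, 7) = 3
+levels of entries sit below the root digest, matching the diagram.
+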
+On-disk format
+==============
+
+Below is the recommended on-disk format. The verity kernel code does not
+read the on-disk header. It only reads the hash blocks which directly
+follow the header. It is expected that a user-space tool will verify the
+integrity of the verity_header and then call dmsetup with the correct
+parameters. Alternatively, the header can be omitted and the dmsetup
+parameters can be passed via the kernel command-line in a rooted chain
+of trust where the command-line is verified.
+
+The on-disk format is especially useful in cases where the hash blocks
+are on a separate partition. The magic number allows easy identification
+of the partition contents. Alternatively, the hash blocks can be stored
+in the same partition as the data to be verified. In such a configuration
+the filesystem on the partition would be sized a little smaller than
+the full partition, leaving room for the hash blocks.
+
+struct verity_header {
+       uint64_t magic = 0x7665726974790a00;
+       uint32_t version;
+       uint32_t block_size;
+       char digest[128]; /* in hex-ascii, null-terminated or 128-bytes */
+       char salt[128]; /* in hex-ascii, null-terminated or 128-bytes */
+}
+
+struct verity_header_block {
+	struct verity_header;
+	char unused[block_size - sizeof(struct verity_header) - sizeof(sig)];
+	char sig[128]; /* in hex-ascii, null-terminated or 128-bytes */
+}
+
+Directly following the header are the hash blocks which are stored a depth
+at a time (starting from the root), sorted in order of increasing index.
+
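+The per-level layout can be computed with a short sketch (illustrative
+userspace code, not part of this patch; it mirrors the arithmetic in
+verity_tree_initialize_entries()):
+[[
+  #include <stdio.h>
+
+  int main(void)
+  {
+      unsigned int shift = 7;       /* log2(hashes per block): 4096/32 */
+      unsigned int last = 32768;    /* block_count */
+      unsigned int levels = 3;      /* tree depth */
+      unsigned int depth, sector = 0;
+
+      for (depth = 0; depth < levels; depth++) {
+          /* count = verity_tree_index_at_level(vt, depth, last) + 1 */
+          unsigned int count = (last >> ((levels - depth) * shift)) + 1;
+
+          printf("level %u: %u hash blocks at sector %u\n",
+                 depth, count, sector);
+          sector += count * 8;      /* one 4096-byte block = 8 sectors */
+      }
+      return 0;
+  }
+]]
+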
+Usage
+=====
+
+The API provides mechanisms for reading and verifying a tree. When reading, all
+required data for the hash tree should be populated for a block before
+attempting a verify.  This can be done by calling verity_tree_populate().  When
+all data is ready, a call to verity_tree_verify_block() with the expected hash
+value will perform both the direct block hash check and the hashes of the
+parent and neighboring nodes where needed to ensure validity up to the root
+hash.  Note, verity_tree_set_digest() should be called before any verification
+attempts occur.
+
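+A sketch of that flow as kernel-style code (illustrative only; error
+handling and the asynchronous completion path are elided):
+[[
+  /* populate: issue hash-block reads for every data block in the io */
+  for (block = io->block; block < io->block + io->count; block++)
+      verity_tree_populate(&vc->bht, io, block);
+
+  /* ...each read completion calls verity_tree_read_completed()... */
+
+  /* verify: check each data block up to the root digest */
+  for (block = io->block; block < io->block + io->count; block++)
+      r = verity_tree_verify_block(&vc->bht, block, page, offset);
+]]
+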
+Example
+=======
+
+Set up a device:
+[[
+  dmsetup create vroot --table \
+    "0 204800 verity 0 /dev/sda1 /dev/sda2 0 4096 sha1 "\
+    "9f74809a2ee7607b16fcc70d9399a4de9725a727 <salt>"
+]]
+
+A command line tool is available to compute the hash tree and return the
+root hash value.
+  http://git.chromium.org/cgi-bin/gitweb.cgi?p=dm-verity.git;a=tree
diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index faa4741..b8bb690 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -370,4 +370,20 @@ config DM_FLAKEY
        ---help---
          A target that intermittently fails I/O for debugging purposes.
 
+config DM_VERITY
+        tristate "Verity target support"
+        depends on BLK_DEV_DM
+        select CRYPTO
+        select CRYPTO_HASH
+        ---help---
+          This device-mapper target allows you to create a device that
+          transparently integrity checks the data on it. You'll need to
+          activate the digests you're going to use in the cryptoapi
+          configuration.
+
+          To compile this code as a module, choose M here: the module will
+          be called dm-verity.
+
+          If unsure, say N.
+
 endif # MD
diff --git a/drivers/md/Makefile b/drivers/md/Makefile
index 046860c..70a29af 100644
--- a/drivers/md/Makefile
+++ b/drivers/md/Makefile
@@ -39,6 +39,7 @@ obj-$(CONFIG_DM_SNAPSHOT)	+= dm-snapshot.o
 obj-$(CONFIG_DM_PERSISTENT_DATA)	+= persistent-data/
 obj-$(CONFIG_DM_MIRROR)		+= dm-mirror.o dm-log.o dm-region-hash.o
 obj-$(CONFIG_DM_LOG_USERSPACE)	+= dm-log-userspace.o
+obj-$(CONFIG_DM_VERITY)         += dm-verity.o
 obj-$(CONFIG_DM_ZERO)		+= dm-zero.o
 obj-$(CONFIG_DM_RAID)	+= dm-raid.o
 obj-$(CONFIG_DM_THIN_PROVISIONING)	+= dm-thin-pool.o
diff --git a/drivers/md/dm-verity.c b/drivers/md/dm-verity.c
new file mode 100644
index 0000000..87b7958
--- /dev/null
+++ b/drivers/md/dm-verity.c
@@ -0,0 +1,1411 @@
+/*
+ * Originally based on dm-crypt.c,
+ * Copyright (C) 2003 Christophe Saout <christophe@saout.de>
+ * Copyright (C) 2004 Clemens Fruhwirth <clemens@endorphin.org>
+ * Copyright (C) 2006-2008 Red Hat, Inc. All rights reserved.
+ * Copyright (C) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ *                    All Rights Reserved.
+ *
+ * This file is released under the GPLv2.
+ *
+ * Implements a verifying transparent block device.
+ * See Documentation/device-mapper/dm-verity.txt
+ */
+#include <crypto/hash.h>
+#include <linux/atomic.h>
+#include <linux/bio.h>
+#include <linux/blkdev.h>
+#include <linux/genhd.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/mempool.h>
+#include <linux/module.h>
+#include <linux/workqueue.h>
+#include <linux/device-mapper.h>
+
+
+#define DM_MSG_PREFIX "verity"
+
+
+/* Helper for printing sector_t */
+#define ULL(x) ((unsigned long long)(x))
+
+#define MIN_IOS 32
+#define MIN_BIOS (MIN_IOS * 2)
+
+/* To avoid allocating memory for digest tests, we just setup a
+ * max to use for now.
+ */
+#define VERITY_MAX_DIGEST_SIZE 64   /* Supports up to 512-bit digests */
+#define VERITY_SALT_SIZE       32   /* 256 bits of salt is a lot */
+
+/* UNALLOCATED, PENDING, READY, and VERIFIED are valid states. All other
+ * values are entry-related return codes.
+ */
+#define VERITY_TREE_ENTRY_VERIFIED 8  /* 'nodes' checked against parent */
+#define VERITY_TREE_ENTRY_READY 4  /* 'nodes' is loaded and available */
+#define VERITY_TREE_ENTRY_PENDING 2  /* 'nodes' is being loaded */
+#define VERITY_TREE_ENTRY_UNALLOCATED 0 /* untouched */
+#define VERITY_TREE_ENTRY_ERROR -1 /* entry is unsuitable for use */
+#define VERITY_TREE_ENTRY_ERROR_IO -2 /* I/O error on load */
+
+/* Additional possible return codes */
+#define VERITY_TREE_ENTRY_ERROR_MISMATCH -3 /* Digest mismatch */
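+
+/* Typical entry lifecycle:
+ * UNALLOCATED -> PENDING (claimed for reading) -> READY (nodes loaded)
+ * -> VERIFIED (nodes checked against the parent); the ERROR states are
+ * reached on I/O failure or digest mismatch.
+ */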
+
+
+/* verity_tree_entry
+ * Contains verity_tree->node_count tree nodes at a given tree depth.
+ * state is used to transactionally assure that data is paged in
+ * from disk.  Unless verity_tree kept running crypto contexts for each
+ * level, we need to load in the data for on-demand verification.
+ */
+struct verity_tree_entry {
+	atomic_t state; /* see defines */
+	/* Keeping an extra pointer per entry wastes up to ~33k of
+	 * memory if 1M blocks are used (or ~66k on a 64-bit arch)
+	 */
+	void *io_context;  /* Reserve a pointer for use during io */
+	/* data should only be non-NULL if fully populated. */
+	void *nodes;  /* The hash data used to verify the children.
+		       * Guaranteed to be page-aligned.
+		       */
+};
+
+/* verity_tree_level
+ * Contains an array of entries which represent a page of hashes where
+ * each hash is a node in the tree at the given tree depth/level.
+ */
+struct verity_tree_level {
+	struct verity_tree_entry *entries;  /* array of entries of tree nodes */
+	unsigned int count;  /* number of entries at this level */
+	sector_t sector;  /* starting sector for this level */
+};
+
+/* opaque context, start, databuf, sector_count */
+typedef int(*verity_tree_callback)(void *,  /* external context */
+			      sector_t,  /* start sector */
+			      u8 *,  /* destination page */
+			      sector_t,  /* num sectors */
+			      struct verity_tree_entry *);
+/* verity_tree - Device mapper block hash tree
+ * verity_tree provides a fixed interface for comparing data blocks
+ * against cryptographic hashes stored in a hash tree. It
+ * optimizes the tree structure for storage on disk.
+ *
+ * The tree is built from the bottom up.  A collection of data,
+ * external to the tree, is hashed and these hashes are stored
+ * as the blocks in the tree.  For some number of these hashes,
+ * a parent node is created by hashing them.  These steps are
+ * repeated.
+ */
+struct verity_tree {
+	/* Configured values */
+	int depth;  /* Depth of the tree including the root */
+	unsigned int block_count;  /* Number of blocks hashed */
+	unsigned int block_size;  /* Size of a hash block */
+	char hash_alg[CRYPTO_MAX_ALG_NAME];
+	u8 salt[VERITY_SALT_SIZE];
+
+	/* Computed values */
+	unsigned int node_count;  /* Data size (in hashes) for each entry */
+	unsigned int node_count_shift;  /* first bit set - 1 */
+	/*
+	 * There is one per CPU so that verification can run concurrently.
+	 * Access through per_cpu_ptr() only
+	 */
+	struct shash_desc * __percpu *hash_desc; /* Container for hash alg */
+	unsigned int digest_size;
+	sector_t sectors;  /* Number of disk sectors used */
+
+	/* bool verified;  Full tree is verified */
+	u8 digest[VERITY_MAX_DIGEST_SIZE];
+	struct verity_tree_level *levels;  /* in reverse order */
+	/* Callback for reading from the hash device */
+	verity_tree_callback read_cb;
+};
+
+/* per-requested-bio private data */
+enum verity_io_flags {
+	VERITY_IOFLAGS_CLONED = 0x1,	/* original bio has been cloned */
+};
+
+struct verity_io {
+	struct dm_target *target;
+	struct bio *bio;
+	struct delayed_work work;
+	unsigned int flags;
+
+	int error;
+	atomic_t pending;
+
+	u64 block;  /* aligned block index */
+	u64 count;  /* aligned count in blocks */
+};
+
+struct verity_config {
+	struct dm_dev *dev;
+	sector_t start;
+	sector_t size;
+
+	struct dm_dev *hash_dev;
+	sector_t hash_start;
+
+	struct verity_tree bht;
+
+	/* Pool required for io contexts */
+	mempool_t *io_pool;
+	/* Pool and bios required for making sure that backing device reads are
+	 * in PAGE_SIZE increments.
+	 */
+	struct bio_set *bs;
+
+	char hash_alg[CRYPTO_MAX_ALG_NAME];
+};
+
+
+static struct kmem_cache *_verity_io_pool;
+static struct workqueue_struct *kveritydq, *kverityd_ioq;
+
+
+static void kverityd_verify(struct work_struct *work);
+static void kverityd_io(struct work_struct *work);
+static void kverityd_io_bht_populate(struct verity_io *io);
+static void kverityd_io_bht_populate_end(struct bio *, int error);
+
+
+/*
+ * Utilities
+ */
+
+static void bin2hex(char *dst, const u8 *src, size_t count)
+{
+	while (count-- > 0) {
+		sprintf(dst, "%02hhx", (int)*src);
+		dst += 2;
+		src++;
+	}
+}
+
+/*
+ * Verity Tree
+ */
+
+/* Functions for converting indices to nodes. */
+
+static inline unsigned int verity_tree_get_level_shift(struct verity_tree *bht,
+						  int depth)
+{
+	return (bht->depth - depth) * bht->node_count_shift;
+}
+
+/* For the given depth, this is the entry index.  At depth+1 it is the node
+ * index for depth.
+ */
+static inline unsigned int verity_tree_index_at_level(struct verity_tree *bht,
+						      int depth,
+						      unsigned int leaf)
+{
+	return leaf >> verity_tree_get_level_shift(bht, depth);
+}
+
+static inline struct verity_tree_entry *verity_tree_get_entry(
+		struct verity_tree *bht,
+		int depth,
+		unsigned int block)
+{
+	unsigned int index = verity_tree_index_at_level(bht, depth, block);
+	struct verity_tree_level *level = &bht->levels[depth];
+
+	return &level->entries[index];
+}
+
+static inline void *verity_tree_get_node(struct verity_tree *bht,
+					 struct verity_tree_entry *entry,
+					 int depth, unsigned int block)
+{
+	unsigned int index = verity_tree_index_at_level(bht, depth, block);
+	unsigned int node_index = index % bht->node_count;
+
+	return entry->nodes + (node_index * bht->digest_size);
+}
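+
+/* Example of the index math above: with node_count_shift = 7 and
+ * depth = 3, block 300's leaf hash lives at index 300 >> 7 = 2 in the
+ * deepest level's entries, and at node 300 % 128 = 44 within that
+ * entry's block of hashes.
+ */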
+/**
+ * verity_tree_compute_hash: hashes one block of data followed by the salt
+ */
+static int verity_tree_compute_hash(struct verity_tree *vt, struct page *pg,
+				    unsigned int offset, u8 *digest)
+{
+	struct shash_desc *hash_desc;
+	void *data;
+	int err;
+
+	hash_desc = *per_cpu_ptr(vt->hash_desc, smp_processor_id());
+
+	if (crypto_shash_init(hash_desc)) {
+		DMCRIT("failed to reinitialize crypto hash (proc:%d)",
+			smp_processor_id());
+		return -EINVAL;
+	}
+	data = kmap_atomic(pg);
+	err = crypto_shash_update(hash_desc, data + offset, vt->block_size);
+	kunmap_atomic(data);
+	if (err) {
+		DMCRIT("crypto_hash_update failed");
+		return -EINVAL;
+	}
+	if (crypto_shash_update(hash_desc, vt->salt, sizeof(vt->salt))) {
+		DMCRIT("crypto_hash_update failed");
+		return -EINVAL;
+	}
+	if (crypto_shash_final(hash_desc, digest)) {
+		DMCRIT("crypto_hash_final failed");
+		return -EINVAL;
+	}
+
+	return 0;
+}
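+
+/* Net effect of verity_tree_compute_hash(): digest = H(block || salt),
+ * i.e. the salt is appended, not prepended, to every hashed block.
+ */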
+
+static int verity_tree_initialize_entries(struct verity_tree *vt)
+{
+	/* last represents the index of the last digest stored in the tree.
+	 * By walking the tree with that index, it is possible to compute the
+	 * total number of entries at each level.
+	 *
+	 * Since each entry will contain up to |node_count| nodes of the tree,
+	 * it is possible that the last index may not be at the end of a given
+	 * entry->nodes.  In that case, it is assumed the value is padded.
+	 *
+	 * Note, we treat both the tree root (1 hash) and the tree leaves
+	 * independently from the vt data structures.  Logically, the root is
+	 * depth=-1 and the block layer level is depth=vt->depth
+	 */
+	unsigned int last = vt->block_count;
+	int depth;
+
+	/* check that the largest level->count can't result in an int overflow
+	 * on allocation or sector calculation.
+	 */
+	if (((last >> vt->node_count_shift) + 1) >
+	    UINT_MAX / max((unsigned int)sizeof(struct verity_tree_entry),
+			   (unsigned int)to_sector(vt->block_size))) {
+		DMCRIT("required entries %u is too large", last + 1);
+		return -EINVAL;
+	}
+
+	/* Track the current sector location for each level so we don't have to
+	 * compute it during traversals.
+	 */
+	vt->sectors = 0;
+	for (depth = 0; depth < vt->depth; ++depth) {
+		struct verity_tree_level *level = &vt->levels[depth];
+
+		level->count = verity_tree_index_at_level(vt, depth, last) + 1;
+		level->entries = (struct verity_tree_entry *)
+				 kcalloc(level->count,
+					 sizeof(struct verity_tree_entry),
+					 GFP_KERNEL);
+		if (!level->entries) {
+			DMERR("failed to allocate entries for depth %d", depth);
+			return -ENOMEM;
+		}
+		level->sector = vt->sectors;
+		vt->sectors += level->count * to_sector(vt->block_size);
+	}
+
+	return 0;
+}
+
+/**
+ * verity_tree_create - prepares @vt for use
+ * @vt:	          pointer to the verity_tree to initialize
+ * @block_count:  the number of block hashes / tree leaves
+ * @block_size:   size of one hash block in bytes
+ * @alg_name:	  crypto hash algorithm name
+ *
+ * Returns 0 on success.
+ *
+ * Callers can offset into devices by storing the data in the io callbacks.
+ */
+static int verity_tree_create(struct verity_tree *vt, unsigned int block_count,
+			      unsigned int block_size, const char *alg_name)
+{
+	struct crypto_shash *tfm;
+	int size, cpu, status = 0;
+
+	vt->block_size = block_size;
+	/* Verify that PAGE_SIZE >= block_size >= SECTOR_SIZE. */
+	if ((block_size > PAGE_SIZE) ||
+	    (PAGE_SIZE % block_size) ||
+	    (to_sector(block_size) == 0))
+		return -EINVAL;
+
+	tfm = crypto_alloc_shash(alg_name, 0, 0);
+	if (IS_ERR(tfm)) {
+		DMERR("failed to allocate crypto hash '%s'", alg_name);
+		return -ENOMEM;
+	}
+	size = sizeof(struct shash_desc) + crypto_shash_descsize(tfm);
+
+	vt->hash_desc = alloc_percpu(struct shash_desc *);
+	if (!vt->hash_desc) {
+		DMERR("Failed to allocate per cpu hash_desc");
+		status = -ENOMEM;
+		goto bad_per_cpu;
+	}
+
+	/* Pre-allocate per-cpu crypto contexts to avoid having to
+	 * kmalloc/kfree a context for every hash operation.
+	 */
+	for_each_possible_cpu(cpu) {
+		struct shash_desc *hash_desc = kmalloc(size, GFP_KERNEL);
+
+		*per_cpu_ptr(vt->hash_desc, cpu) = hash_desc;
+		if (!hash_desc) {
+			DMERR("failed to allocate crypto hash contexts");
+			status = -ENOMEM;
+			goto bad_hash_alloc;
+		}
+		hash_desc->tfm = tfm;
+		hash_desc->flags = 0x0;
+	}
+	vt->digest_size = crypto_shash_digestsize(tfm);
+	/* We expect to be able to pack >=2 hashes into a block */
+	if (block_size / vt->digest_size < 2) {
+		DMERR("too few hashes fit in a block");
+		status = -EINVAL;
+		goto bad_arg;
+	}
+
+	if (vt->digest_size > VERITY_MAX_DIGEST_SIZE) {
+		DMERR("VERITY_MAX_DIGEST_SIZE too small for digest");
+		status = -EINVAL;
+		goto bad_arg;
+	}
+
+	/* Configure the tree */
+	vt->block_count = block_count;
+	if (block_count == 0) {
+		DMERR("block_count must be non-zero");
+		status = -EINVAL;
+		goto bad_arg;
+	}
+
+	/* Each verity_tree_entry->nodes is one block.  The node code tracks
+	 * how many nodes fit into one entry where a node is a single
+	 * hash (message digest).
+	 */
+	vt->node_count_shift = fls(block_size / vt->digest_size) - 1;
+	/* Round down to the nearest power of two.  This makes indexing
+	 * into the tree much less painful.
+	 */
+	vt->node_count = 1 << vt->node_count_shift;
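+	/* e.g. 4096-byte blocks and 32-byte (sha256) digests give
+	 * node_count_shift = 7 and node_count = 128.
+	 */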
+
+	/* This is unlikely to happen, but with 64k pages, who knows. */
+	if (vt->node_count > UINT_MAX / vt->digest_size) {
+		DMERR("node_count * hash_len exceeds UINT_MAX!");
+		status = -EINVAL;
+		goto bad_arg;
+	}
+
+	vt->depth = DIV_ROUND_UP(fls(block_count - 1), vt->node_count_shift);
+
+	/* Ensure that we can safely shift by this value. */
+	if (vt->depth * vt->node_count_shift >= sizeof(unsigned int) * 8) {
+		DMERR("specified depth and node_count_shift is too large");
+		status = -EINVAL;
+		goto bad_arg;
+	}
+
+	/* Allocate levels. Each level of the tree may have an arbitrary number
+	 * of verity_tree_entry structs.  Each entry contains node_count nodes.
+	 * Each node in the tree is a cryptographic digest of either node_count
+	 * nodes on the subsequent level or of a specific block on disk.
+	 */
+	vt->levels = (struct verity_tree_level *)
+			kcalloc(vt->depth,
+				sizeof(struct verity_tree_level), GFP_KERNEL);
+	if (!vt->levels) {
+		DMERR("failed to allocate tree levels");
+		status = -ENOMEM;
+		goto bad_level_alloc;
+	}
+
+	vt->read_cb = NULL;
+
+	status = verity_tree_initialize_entries(vt);
+	if (status)
+		goto bad_entries_alloc;
+
+	/* We compute depth such that there is only 1 block at level 0. */
+	BUG_ON(vt->levels[0].count != 1);
+
+	return 0;
+
+bad_entries_alloc:
+	while (vt->depth-- > 0)
+		kfree(vt->levels[vt->depth].entries);
+	kfree(vt->levels);
+bad_level_alloc:
+bad_arg:
+bad_hash_alloc:
+	for_each_possible_cpu(cpu)
+		if (*per_cpu_ptr(vt->hash_desc, cpu))
+			kfree(*per_cpu_ptr(vt->hash_desc, cpu));
+	free_percpu(vt->hash_desc);
+bad_per_cpu:
+	crypto_free_shash(tfm);
+	return status;
+}
+
+/**
+ * verity_tree_read_completed
+ * @entry:   pointer to the entry that's been loaded
+ * @status:  I/O status. Non-zero is failure.
+ * MUST always be called after a read_cb completes.
+ */
+static void verity_tree_read_completed(struct verity_tree_entry *entry,
+				       int status)
+{
+	if (status) {
+		DMCRIT("an I/O error occurred while reading entry");
+		atomic_set(&entry->state, VERITY_TREE_ENTRY_ERROR_IO);
+		return;
+	}
+	BUG_ON(atomic_read(&entry->state) != VERITY_TREE_ENTRY_PENDING);
+	atomic_set(&entry->state, VERITY_TREE_ENTRY_READY);
+}
+
+/**
+ * verity_tree_verify_block - checks that all path nodes for @block are valid
+ * @vt:	     pointer to a verity_tree_create()d vt
+ * @block:   specific block data is expected from
+ * @pg:	     page holding the block data
+ * @offset:  offset into the page
+ *
+ * Returns 0 on success, VERITY_TREE_ENTRY_ERROR_MISMATCH on error.
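+ *
+ * Example walk (depth = 3): @block's hash is checked against its parent
+ * entry at depth 2, that entry's block of hashes is checked against
+ * depth 1, and so on until a VERIFIED entry or the root digest is
+ * reached; the entries on the path are then marked VERIFIED so later
+ * lookups can stop early.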
+ */
+static int verity_tree_verify_block(struct verity_tree *vt, unsigned int block,
+				    struct page *pg, unsigned int offset)
+{
+	int state, depth = vt->depth;
+	u8 digest[VERITY_MAX_DIGEST_SIZE];
+	struct verity_tree_entry *entry;
+	void *node;
+
+	do {
+		/* Need to check that the hash of the current block is accurate
+		 * in its parent.
+		 */
+		entry = verity_tree_get_entry(vt, depth - 1, block);
+		state = atomic_read(&entry->state);
+		/* This call is only safe if all nodes along the path
+		 * are already populated (i.e. READY) via verity_tree_populate.
+		 */
+		BUG_ON(state < VERITY_TREE_ENTRY_READY);
+		node = verity_tree_get_node(vt, entry, depth, block);
+
+		if (verity_tree_compute_hash(vt, pg, offset, digest) ||
+		    memcmp(digest, node, vt->digest_size))
+			goto mismatch;
+
+		/* Keep the containing block of hashes to be verified in the
+		 * next pass.
+		 */
+		pg = virt_to_page(entry->nodes);
+		offset = offset_in_page(entry->nodes);
+	} while (--depth > 0 && state != VERITY_TREE_ENTRY_VERIFIED);
+
+	if (depth == 0 && state != VERITY_TREE_ENTRY_VERIFIED) {
+		if (verity_tree_compute_hash(vt, pg, offset, digest) ||
+		    memcmp(digest, vt->digest, vt->digest_size))
+			goto mismatch;
+		atomic_set(&entry->state, VERITY_TREE_ENTRY_VERIFIED);
+	}
+
+	/* Mark path to leaf as verified. */
+	for (depth++; depth < vt->depth; depth++) {
+		entry = verity_tree_get_entry(vt, depth, block);
+		/* At this point, entry can only be in VERIFIED or READY state.
+		 * So it is safe to use atomic_set instead of atomic_cmpxchg.
+		 */
+		atomic_set(&entry->state, VERITY_TREE_ENTRY_VERIFIED);
+	}
+
+	return 0;
+
+mismatch:
+	DMERR_LIMIT("verify_path: failed to verify hash (d=%d,bi=%u)",
+		    depth, block);
+	return VERITY_TREE_ENTRY_ERROR_MISMATCH;
+}
+
+/**
+ * verity_tree_is_populated - check that nodes needed to verify a given
+ *                            block are all ready
+ * @vt:	    pointer to a verity_tree_create()d vt
+ * @block:  specific block data is expected from
+ *
+ * Callers may wish to call verity_tree_is_populated() when checking an io
+ * for which entries were already pending.
+ */
+static bool verity_tree_is_populated(struct verity_tree *vt, unsigned int block)
+{
+	int depth;
+
+	for (depth = vt->depth - 1; depth >= 0; depth--) {
+		struct verity_tree_entry *entry;
+		entry = verity_tree_get_entry(vt, depth, block);
+		if (atomic_read(&entry->state) < VERITY_TREE_ENTRY_READY)
+			return false;
+	}
+
+	return true;
+}
+
+/**
+ * verity_tree_populate - reads entries from disk needed to verify a given block
+ * @vt:     pointer to a verity_tree_create()d vt
+ * @ctx:    context used for all read_cb calls on this request
+ * @block:  specific block data is expected from
+ *
+ * Returns negative value on error. Returns 0 on success.
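+ *
+ * Only the caller whose cmpxchg moves an entry from UNALLOCATED to
+ * PENDING issues the read for it; racing callers observe PENDING (or a
+ * later state) and skip it, so each hash block is read at most once.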
+ */
+static int verity_tree_populate(struct verity_tree *vt, void *ctx,
+				unsigned int block)
+{
+	int depth, state;
+
+	BUG_ON(block >= vt->block_count);
+
+	for (depth = vt->depth - 1; depth >= 0; --depth) {
+		unsigned int index;
+		struct verity_tree_level *level;
+		struct verity_tree_entry *entry;
+
+		index = verity_tree_index_at_level(vt, depth, block);
+		level = &vt->levels[depth];
+		entry = verity_tree_get_entry(vt, depth, block);
+		state = atomic_cmpxchg(&entry->state,
+				       VERITY_TREE_ENTRY_UNALLOCATED,
+				       VERITY_TREE_ENTRY_PENDING);
+		if (state == VERITY_TREE_ENTRY_VERIFIED)
+			break;
+		if (state <= VERITY_TREE_ENTRY_ERROR)
+			goto error_state;
+		if (state != VERITY_TREE_ENTRY_UNALLOCATED)
+			continue;
+
+		/* Current entry is claimed for allocation and loading */
+		entry->nodes = kmalloc(vt->block_size, GFP_NOIO);
+		if (!entry->nodes)
+			goto nomem;
+
+		vt->read_cb(ctx,
+			    level->sector + to_sector(index * vt->block_size),
+			    entry->nodes, to_sector(vt->block_size), entry);
+	}
+
+	return 0;
+
+error_state:
+	DMCRIT("block %u at depth %d is in an error state", block, depth);
+	return -EPERM;
+
+nomem:
+	DMCRIT("failed to allocate memory for entry->nodes");
+	return -ENOMEM;
+}
+
+/**
+ * verity_tree_destroy - cleans up all memory used by @vt
+ * @vt:	 pointer to a verity_tree_create()d vt
+ */
+static void verity_tree_destroy(struct verity_tree *vt)
+{
+	int depth, cpu;
+
+	for (depth = 0; depth < vt->depth; depth++) {
+		struct verity_tree_entry *entry = vt->levels[depth].entries;
+		struct verity_tree_entry *entry_end = entry +
+			vt->levels[depth].count;
+		for (; entry < entry_end; ++entry)
+			kfree(entry->nodes);
+		kfree(vt->levels[depth].entries);
+	}
+	kfree(vt->levels);
+	crypto_free_shash((*per_cpu_ptr(vt->hash_desc, 0))->tfm);
+	for_each_possible_cpu(cpu)
+		kfree(*per_cpu_ptr(vt->hash_desc, cpu));
+}
+
+/*
+ * Verity Tree Accessors
+ */
+
+/**
+ * verity_tree_set_digest - sets an unverified root digest hash from hex
+ * @vt:	     pointer to a verity_tree_create()d vt
+ * @digest:  string containing the digest in hex
+ * Returns non-zero on error.
+ */
+static int verity_tree_set_digest(struct verity_tree *vt, const char *digest)
+{
+	/* Make sure we have at least the bytes expected */
+	if (strnlen(digest, vt->digest_size * 2) !=
+	    vt->digest_size * 2) {
+		DMERR("root digest length does not match hash algorithm");
+		return -1;
+	}
+	return hex2bin(vt->digest, digest, vt->digest_size);
+}
+
+/**
+ * verity_tree_digest - returns root digest in hex
+ * @vt:	     pointer to a verity_tree_create()d vt
+ * @digest:  buffer to write into, of length VERITY_MAX_DIGEST_SIZE * 2 + 1.
+ */
+static int verity_tree_digest(struct verity_tree *vt, char *digest)
+{
+	bin2hex(digest, vt->digest, vt->digest_size);
+	return 0;
+}
+
+/**
+ * verity_tree_set_salt - sets the salt
+ * @vt:    pointer to a verity_tree_create()d vt
+ * @salt:  string containing the salt in hex
+ * Returns non-zero on error.
+ */
+static int verity_tree_set_salt(struct verity_tree *vt, const char *salt)
+{
+	size_t saltlen = min(strlen(salt) / 2, sizeof(vt->salt));
+	memset(vt->salt, 0, sizeof(vt->salt));
+	return hex2bin(vt->salt, salt, saltlen);
+}
+
+
+/**
+ * verity_tree_salt - returns the salt in hex
+ * @vt:    pointer to a verity_tree_create()d vt
+ * @salt:  buffer to put salt into, of length VERITY_SALT_SIZE * 2 + 1.
+ */
+static int verity_tree_salt(struct verity_tree *vt, char *salt)
+{
+	bin2hex(salt, vt->salt, sizeof(vt->salt));
+	return 0;
+}
+
+/*
+ * Allocation and utility functions
+ */
+
+static void kverityd_src_io_read_end(struct bio *clone, int error);
+
+/* Shared destructor for all internal bios */
+static void verity_bio_destructor(struct bio *bio)
+{
+	struct verity_io *io = bio->bi_private;
+	struct verity_config *vc = io->target->private;
+	bio_free(bio, vc->bs);
+}
+
+static struct bio *verity_alloc_bioset(struct verity_config *vc, gfp_t gfp_mask,
+				       int nr_iovecs)
+{
+	return bio_alloc_bioset(gfp_mask, nr_iovecs, vc->bs);
+}
+
+static struct verity_io *verity_io_alloc(struct dm_target *ti,
+					    struct bio *bio)
+{
+	struct verity_config *vc = ti->private;
+	sector_t sector = bio->bi_sector - ti->begin;
+	struct verity_io *io;
+
+	io = mempool_alloc(vc->io_pool, GFP_NOIO);
+	if (unlikely(!io))
+		return NULL;
+	io->flags = 0;
+	io->target = ti;
+	io->bio = bio;
+	io->error = 0;
+
+	/* Adjust the sector by the virtual starting sector */
+	io->block = to_bytes(sector) / vc->bht.block_size;
+	io->count = bio->bi_size / vc->bht.block_size;
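+	/* e.g. with 4 KiB blocks, a 16 KiB bio at relative sector 8
+	 * yields io->block = 1 and io->count = 4.
+	 */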
+
+	atomic_set(&io->pending, 0);
+
+	return io;
+}
+
+static struct bio *verity_bio_clone(struct verity_io *io)
+{
+	struct verity_config *vc = io->target->private;
+	struct bio *bio = io->bio;
+	struct bio *clone = verity_alloc_bioset(vc, GFP_NOIO, bio->bi_max_vecs);
+
+	if (!clone)
+		return NULL;
+
+	__bio_clone(clone, bio);
+	clone->bi_private = io;
+	clone->bi_end_io  = kverityd_src_io_read_end;
+	clone->bi_bdev    = vc->dev->bdev;
+	clone->bi_sector += vc->start - io->target->begin;
+	clone->bi_destructor = verity_bio_destructor;
+
+	return clone;
+}
+
+/*
+ * Reverse flow of requests into the device.
+ *
+ * (Start at the bottom with verity_map and work your way upward).
+ */
+
+static void verity_inc_pending(struct verity_io *io);
+
+static void verity_return_bio_to_caller(struct verity_io *io)
+{
+	struct verity_config *vc = io->target->private;
+
+	if (io->error)
+		io->error = -EIO;
+
+	bio_endio(io->bio, io->error);
+	mempool_free(io, vc->io_pool);
+}
+
+/* Check for any missing bht hashes. */
+static bool verity_is_bht_populated(struct verity_io *io)
+{
+	struct verity_config *vc = io->target->private;
+	u64 block;
+
+	for (block = io->block; block < io->block + io->count; ++block)
+		if (!verity_tree_is_populated(&vc->bht, block))
+			return false;
+
+	return true;
+}
+
+/* verity_dec_pending manages the lifetime of all verity_io structs.
+ * Non-bug error handling is centralized through this interface, as is
+ * all passage from workqueue to workqueue.
+ */
+static void verity_dec_pending(struct verity_io *io)
+{
+	if (!atomic_dec_and_test(&io->pending))
+		goto done;
+
+	if (unlikely(io->error))
+		goto io_error;
+
+	/* I/Os that were pending may now be ready */
+	if (verity_is_bht_populated(io)) {
+		INIT_DELAYED_WORK(&io->work, kverityd_verify);
+		queue_delayed_work(kveritydq, &io->work, 0);
+	} else {
+		INIT_DELAYED_WORK(&io->work, kverityd_io);
+		queue_delayed_work(kverityd_ioq, &io->work, HZ/10);
+	}
+
+done:
+	return;
+
+io_error:
+	verity_return_bio_to_caller(io);
+}
+
+/* Walks the data set and computes the hash of the data read from the
+ * untrusted source device.  The computed hash is then passed to verity-tree
+ * for verification.
+ */
+static int verity_verify(struct verity_config *vc,
+			 struct verity_io *io)
+{
+	unsigned int block_size = vc->bht.block_size;
+	struct bio *bio = io->bio;
+	u64 block = io->block;
+	unsigned int idx;
+	int r;
+
+	for (idx = bio->bi_idx; idx < bio->bi_vcnt; idx++) {
+		struct bio_vec *bv = bio_iovec_idx(bio, idx);
+		unsigned int offset = bv->bv_offset;
+		unsigned int len = bv->bv_len;
+
+		BUG_ON(offset % block_size);
+		BUG_ON(len % block_size);
+
+		while (len) {
+			r = verity_tree_verify_block(&vc->bht, block,
+						bv->bv_page, offset);
+			if (r)
+				goto bad_return;
+
+			offset += block_size;
+			len -= block_size;
+			block++;
+			cond_resched();
+		}
+	}
+
+	return 0;
+
+bad_return:
+	/* verity_tree functions aren't expected to return errno-friendly
+	 * values.  They are converted here for uniformity.
+	 */
+	if (r > 0) {
+		DMERR("Pending data for block %llu seen at verify", ULL(block));
+		r = -EBUSY;
+	} else {
+		DMERR_LIMIT("Block hash does not match!");
+		r = -EACCES;
+	}
+	return r;
+}
+
+/* Services the verify workqueue */
+static void kverityd_verify(struct work_struct *work)
+{
+	struct delayed_work *dwork = container_of(work, struct delayed_work,
+						  work);
+	struct verity_io *io = container_of(dwork, struct verity_io,
+					    work);
+	struct verity_config *vc = io->target->private;
+
+	io->error = verity_verify(vc, io);
+
+	/* Free up the bio and tag with the return value */
+	verity_return_bio_to_caller(io);
+}
+
+/* Asynchronously called upon the completion of verity-tree I/O. The status
+ * of the operation is passed back to verity-tree and the next steps are
+ * decided by verity_dec_pending.
+ */
+static void kverityd_io_bht_populate_end(struct bio *bio, int error)
+{
+	struct verity_tree_entry *entry;
+	struct verity_io *io;
+
+	entry = (struct verity_tree_entry *) bio->bi_private;
+	io = (struct verity_io *) entry->io_context;
+
+	/* Tell the tree to atomically update now that we've populated
+	 * the given entry.
+	 */
+	verity_tree_read_completed(entry, error);
+
+	/* Clean up for reuse when reading data to be checked */
+	bio->bi_vcnt = 0;
+	bio->bi_io_vec->bv_offset = 0;
+	bio->bi_io_vec->bv_len = 0;
+	bio->bi_io_vec->bv_page = NULL;
+	/* Restore the private data to I/O so the destructor can be shared. */
+	bio->bi_private = (void *) io;
+	bio_put(bio);
+
+	/* We bail but assume the tree has been marked bad. */
+	if (unlikely(error)) {
+		DMERR("Failed to read for sector %llu (%u)",
+		      ULL(io->bio->bi_sector), io->bio->bi_size);
+		io->error = error;
+		/* Pass through the error to verity_dec_pending below */
+	}
+	/* When pending = 0, it will transition to reading real data */
+	verity_dec_pending(io);
+}
+
+/* Called by verity-tree (via verity_tree_populate), this function provides
+ * the message digests to verity-tree that are stored on disk.
+ */
+static int kverityd_bht_read_callback(void *ctx, sector_t start, u8 *dst,
+				      sector_t count,
+				      struct verity_tree_entry *entry)
+{
+	struct verity_io *io = ctx;  /* I/O for this batch */
+	struct verity_config *vc;
+	struct bio *bio;
+
+	vc = io->target->private;
+
+	/* The I/O context is nested inside the entry so that we don't need one
+	 * io context per page read.
+	 */
+	entry->io_context = ctx;
+
+	/* We should only get page size requests at present. */
+	verity_inc_pending(io);
+	bio = verity_alloc_bioset(vc, GFP_NOIO, 1);
+	if (unlikely(!bio)) {
+		DMCRIT("Out of memory at bio_alloc_bioset");
+		verity_tree_read_completed(entry, -ENOMEM);
+		return -ENOMEM;
+	}
+	bio->bi_private = (void *) entry;
+	bio->bi_idx = 0;
+	bio->bi_size = vc->bht.block_size;
+	bio->bi_sector = vc->hash_start + start;
+	bio->bi_bdev = vc->hash_dev->bdev;
+	bio->bi_end_io = kverityd_io_bht_populate_end;
+	bio->bi_rw = REQ_META;
+	/* Only need to free the bio since the page is managed by bht */
+	bio->bi_destructor = verity_bio_destructor;
+	bio->bi_vcnt = 1;
+	bio->bi_io_vec->bv_offset = offset_in_page(dst);
+	bio->bi_io_vec->bv_len = to_bytes(count);
+	/* dst is guaranteed to be a page_pool allocation */
+	bio->bi_io_vec->bv_page = virt_to_page(dst);
+	/* Track that this I/O is in use.  There should be no risk of the io
+	 * being removed prior since this is called synchronously.
+	 */
+	generic_make_request(bio);
+	return 0;
+}
+
+/* Submits an io request for each missing block of block hashes.
+ * The last one to return will then enqueue this on the io workqueue.
+ */
+static void kverityd_io_bht_populate(struct verity_io *io)
+{
+	struct verity_config *vc = io->target->private;
+	u64 block;
+
+	for (block = io->block; block < io->block + io->count; ++block) {
+		int ret = verity_tree_populate(&vc->bht, io, block);
+
+		if (ret < 0) {
+			/* verity_dec_pending will handle the error case. */
+			io->error = ret;
+			break;
+		}
+	}
+}
+
+/* Asynchronously called upon the completion of I/O issued
+ * from kverityd_src_io_read. verity_dec_pending() acts as
+ * the scheduler/flow manager.
+ */
+static void kverityd_src_io_read_end(struct bio *clone, int error)
+{
+	struct verity_io *io = clone->bi_private;
+
+	if (unlikely(!bio_flagged(clone, BIO_UPTODATE) && !error))
+		error = -EIO;
+
+	if (unlikely(error)) {
+		DMERR("Error occurred: %d (%llu, %u)",
+			error, ULL(clone->bi_sector), clone->bi_size);
+		io->error = error;
+	}
+
+	/* Release the clone which just avoids the block layer from
+	 * leaving offsets, etc in unexpected states.
+	 */
+	bio_put(clone);
+
+	verity_dec_pending(io);
+}
+
+/* If not yet underway, an I/O request will be issued to the vc->dev
+ * device for the data needed. It is cloned to avoid unexpected changes
+ * to the original bio struct.
+ */
+static void kverityd_src_io_read(struct verity_io *io)
+{
+	struct bio *clone;
+
+	/* Check if the read is already issued. */
+	if (io->flags & VERITY_IOFLAGS_CLONED)
+		return;
+
+	io->flags |= VERITY_IOFLAGS_CLONED;
+
+	/* Clone the bio. The block layer may modify the bvec array. */
+	clone = verity_bio_clone(io);
+	if (unlikely(!clone)) {
+		io->error = -ENOMEM;
+		return;
+	}
+
+	verity_inc_pending(io);
+
+	generic_make_request(clone);
+}
+
+/* kverityd_io services the I/O workqueue. For each pass through
+ * the I/O workqueue, a call to populate both the origin drive
+ * data and the hash tree data is made.
+ */
+static void kverityd_io(struct work_struct *work)
+{
+	struct delayed_work *dwork = container_of(work, struct delayed_work,
+						  work);
+	struct verity_io *io = container_of(dwork, struct verity_io,
+					    work);
+
+	/* Issue requests asynchronously. */
+	verity_inc_pending(io);
+	kverityd_src_io_read(io);
+	kverityd_io_bht_populate(io);
+	verity_dec_pending(io);
+}
+
+/* Paired with verity_dec_pending, the pending value in the io dictates the
+ * lifetime of a request and when it is ready to be processed on the
+ * workqueues.
+ */
+static void verity_inc_pending(struct verity_io *io)
+{
+	atomic_inc(&io->pending);
+}
+
+/* Block-level requests start here. */
+static int verity_map(struct dm_target *ti, struct bio *bio,
+		      union map_info *map_context)
+{
+	struct verity_io *io;
+	struct verity_config *vc;
+	struct request_queue *r_queue;
+
+	if (unlikely(!ti)) {
+		DMERR("dm_target was NULL");
+		return -EIO;
+	}
+
+	vc = ti->private;
+	r_queue = bdev_get_queue(vc->dev->bdev);
+
+	if (bio_data_dir(bio) == WRITE) {
+		/* If we silently drop writes, then the VFS layer will cache
+		 * the write and persist it in memory. While it doesn't change
+		 * the underlying storage, it still may be contrary to the
+		 * behavior expected by a verified, read-only device.
+		 */
+		DMWARN_LIMIT("write request received; rejecting with -EIO");
+		return -EIO;
+	} else {
+		/* Queue up the request to be verified */
+		io = verity_io_alloc(ti, bio);
+		if (!io) {
+			DMERR_LIMIT("Failed to allocate and init IO data");
+			return DM_MAPIO_REQUEUE;
+		}
+		INIT_DELAYED_WORK(&io->work, kverityd_io);
+		queue_delayed_work(kverityd_ioq, &io->work, 0);
+	}
+
+	return DM_MAPIO_SUBMITTED;
+}
+
+/*
+ * Non-block interfaces and device-mapper specific code
+ */
+
+/*
+ * Verity target parameters:
+ *
+ * <version> <dev> <hash_dev> <hash_start> <block_size> <alg> <digest> <salt>
+ *
+ * version:        version of the hash tree on-disk format
+ * dev:            device to verify
+ * hash_dev:       device hashtree is stored on
+ * hash_start:     start address of hashes
+ * block_size:     size of a hash block
+ * alg:            hash algorithm
+ * digest:         toplevel hash of the tree
+ * salt:           salt
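+ *
+ * Example table line (hypothetical values):
+ *   0 409600 verity 0 /dev/sda1 /dev/sda2 0 4096 sha256 <digest> <salt>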
+ */
+static int verity_ctr(struct dm_target *ti, unsigned int argc, char **argv)
+{
+	struct verity_config *vc = NULL;
+	const char *dev, *hash_dev, *alg, *digest, *salt;
+	unsigned long hash_start, block_size, version;
+	sector_t blocks;
+	int ret;
+
+	if (argc != 8) {
+		ti->error = "Invalid argument count";
+		return -EINVAL;
+	}
+
+	if (strict_strtoul(argv[0], 10, &version) ||
+	    (version != 0)) {
+		ti->error = "Invalid version";
+		return -EINVAL;
+	}
+	dev = argv[1];
+	hash_dev = argv[2];
+	if (strict_strtoul(argv[3], 10, &hash_start)) {
+		ti->error = "Invalid hash_start";
+		return -EINVAL;
+	}
+	if (strict_strtoul(argv[4], 10, &block_size) ||
+	    (block_size > UINT_MAX)) {
+		ti->error = "Invalid block_size";
+		return -EINVAL;
+	}
+	alg = argv[5];
+	digest = argv[6];
+	salt = argv[7];
+
+	/* The device mapper device should be setup read-only */
+	if ((dm_table_get_mode(ti->table) & ~FMODE_READ) != 0) {
+		ti->error = "Must be created readonly.";
+		return -EINVAL;
+	}
+
+	vc = kzalloc(sizeof(*vc), GFP_KERNEL);
+	if (!vc)
+		return -ENOMEM;
+
+	/* Calculate the blocks from the given device size */
+	vc->size = ti->len;
+	blocks = to_bytes(vc->size) / block_size;
+	if (verity_tree_create(&vc->bht, blocks, block_size, alg)) {
+		DMERR("failed to create required bht");
+		goto bad_bht;
+	}
+	if (verity_tree_set_digest(&vc->bht, digest)) {
+		DMERR("digest error");
+		goto bad_digest;
+	}
+	verity_tree_set_salt(&vc->bht, salt);
+	vc->bht.read_cb = kverityd_bht_read_callback;
+
+	vc->start = 0;
+	/* We only ever grab the device in read-only mode. */
+	ret = dm_get_device(ti, dev, dm_table_get_mode(ti->table), &vc->dev);
+	if (ret) {
+		DMERR("Failed to acquire device '%s': %d", dev, ret);
+		ti->error = "Device lookup failed";
+		goto bad_verity_dev;
+	}
+
+	if ((to_bytes(vc->start) % block_size) ||
+	    (to_bytes(vc->size) % block_size)) {
+		ti->error = "Device must be block_size divisble/aligned";
+		goto bad_hash_start;
+	}
+
+	vc->hash_start = (sector_t)hash_start;
+
+	/*
+	 * Note, dev == hash_dev is okay as long as the size of
+	 *       ti->len passed to device mapper does not include
+	 *       the hashes.
+	 */
+	if (dm_get_device(ti, hash_dev,
+			  dm_table_get_mode(ti->table), &vc->hash_dev)) {
+		ti->error = "Hash device lookup failed";
+		goto bad_hash_dev;
+	}
+
+	if (snprintf(vc->hash_alg, CRYPTO_MAX_ALG_NAME, "%s", alg) >=
+	    CRYPTO_MAX_ALG_NAME) {
+		ti->error = "Hash algorithm name is too long";
+		goto bad_hash;
+	}
+
+	vc->io_pool = mempool_create_slab_pool(MIN_IOS, _verity_io_pool);
+	if (!vc->io_pool) {
+		ti->error = "Cannot allocate verity io mempool";
+		goto bad_slab_pool;
+	}
+
+	vc->bs = bioset_create(MIN_BIOS, 0);
+	if (!vc->bs) {
+		ti->error = "Cannot allocate verity bioset";
+		goto bad_bs;
+	}
+
+	ti->private = vc;
+
+	return 0;
+
+bad_bs:
+	mempool_destroy(vc->io_pool);
+bad_slab_pool:
+bad_hash:
+	dm_put_device(ti, vc->hash_dev);
+bad_hash_dev:
+bad_hash_start:
+	dm_put_device(ti, vc->dev);
+bad_verity_dev:
+bad_digest:
+	verity_tree_destroy(&vc->bht);
+bad_bht:
+	kfree(vc);   /* hash is not secret so no need to zero */
+	return -EINVAL;
+}
+
+static void verity_dtr(struct dm_target *ti)
+{
+	struct verity_config *vc = (struct verity_config *) ti->private;
+
+	bioset_free(vc->bs);
+	mempool_destroy(vc->io_pool);
+	verity_tree_destroy(&vc->bht);
+	dm_put_device(ti, vc->hash_dev);
+	dm_put_device(ti, vc->dev);
+	kfree(vc);
+}
+
+static int verity_ioctl(struct dm_target *ti, unsigned int cmd,
+			unsigned long arg)
+{
+	struct verity_config *vc = (struct verity_config *) ti->private;
+	struct dm_dev *dev = vc->dev;
+	int r = 0;
+
+	/*
+	 * Only pass ioctls through if the device sizes match exactly.
+	 */
+	if (vc->start ||
+	    ti->len != i_size_read(dev->bdev->bd_inode) >> SECTOR_SHIFT)
+		r = scsi_verify_blk_ioctl(NULL, cmd);
+
+	return r ? : __blkdev_driver_ioctl(dev->bdev, dev->mode, cmd, arg);
+}
+
+static int verity_status(struct dm_target *ti, status_type_t type,
+			char *result, unsigned int maxlen)
+{
+	struct verity_config *vc = (struct verity_config *) ti->private;
+	unsigned int sz = 0;
+	char digest[VERITY_MAX_DIGEST_SIZE * 2 + 1] = { 0 };
+	char salt[VERITY_SALT_SIZE * 2 + 1] = { 0 };
+
+	verity_tree_digest(&vc->bht, digest);
+	verity_tree_salt(&vc->bht, salt);
+
+	switch (type) {
+	case STATUSTYPE_INFO:
+		result[0] = '\0';
+		break;
+	case STATUSTYPE_TABLE:
+		DMEMIT("%s %s %llu %llu %s %s %s",
+		       vc->dev->name,
+		       vc->hash_dev->name,
+		       ULL(vc->hash_start),
+		       ULL(vc->bht.block_size),
+		       vc->hash_alg,
+		       digest,
+		       salt);
+		break;
+	}
+	return 0;
+}
+
+static int verity_merge(struct dm_target *ti, struct bvec_merge_data *bvm,
+		       struct bio_vec *biovec, int max_size)
+{
+	struct verity_config *vc = ti->private;
+	struct request_queue *q = bdev_get_queue(vc->dev->bdev);
+
+	if (!q->merge_bvec_fn)
+		return max_size;
+
+	bvm->bi_bdev = vc->dev->bdev;
+	bvm->bi_sector = vc->start + bvm->bi_sector - ti->begin;
+
+	/* Optionally, this could just return 0 to stick to single pages. */
+	return min(max_size, q->merge_bvec_fn(q, bvm, biovec));
+}
+
+static int verity_iterate_devices(struct dm_target *ti,
+				 iterate_devices_callout_fn fn, void *data)
+{
+	struct verity_config *vc = ti->private;
+
+	return fn(ti, vc->dev, vc->start, ti->len, data);
+}
+
+static void verity_io_hints(struct dm_target *ti,
+			    struct queue_limits *limits)
+{
+	struct verity_config *vc = ti->private;
+	unsigned int block_size = vc->bht.block_size;
+
+	limits->logical_block_size = block_size;
+	limits->physical_block_size = block_size;
+	blk_limits_io_min(limits, block_size);
+}
+
+static struct target_type verity_target = {
+	.name   = "verity",
+	.version = {0, 1, 0},
+	.module = THIS_MODULE,
+	.ctr    = verity_ctr,
+	.dtr    = verity_dtr,
+	.ioctl  = verity_ioctl,
+	.map    = verity_map,
+	.merge  = verity_merge,
+	.status = verity_status,
+	.iterate_devices = verity_iterate_devices,
+	.io_hints = verity_io_hints,
+};
+
+#define VERITY_WQ_FLAGS (WQ_CPU_INTENSIVE|WQ_HIGHPRI)
+
+static int __init verity_init(void)
+{
+	int r = -ENOMEM;
+
+	_verity_io_pool = KMEM_CACHE(verity_io, 0);
+	if (!_verity_io_pool) {
+		DMERR("failed to allocate pool verity_io");
+		goto bad_io_pool;
+	}
+
+	kverityd_ioq = alloc_workqueue("kverityd_io", VERITY_WQ_FLAGS, 1);
+	if (!kverityd_ioq) {
+		DMERR("failed to create workqueue kverityd_ioq");
+		goto bad_io_queue;
+	}
+
+	kveritydq = alloc_workqueue("kverityd", VERITY_WQ_FLAGS, 1);
+	if (!kveritydq) {
+		DMERR("failed to create workqueue kveritydq");
+		goto bad_verify_queue;
+	}
+
+	r = dm_register_target(&verity_target);
+	if (r < 0) {
+		DMERR("register failed %d", r);
+		goto register_failed;
+	}
+
+	DMINFO("version %u.%u.%u loaded", verity_target.version[0],
+	       verity_target.version[1], verity_target.version[2]);
+
+	return r;
+
+register_failed:
+	destroy_workqueue(kveritydq);
+bad_verify_queue:
+	destroy_workqueue(kverityd_ioq);
+bad_io_queue:
+	kmem_cache_destroy(_verity_io_pool);
+bad_io_pool:
+	return r;
+}
+
+static void __exit verity_exit(void)
+{
+	/* Unregister first so no new I/O can reach the workqueues. */
+	dm_unregister_target(&verity_target);
+
+	destroy_workqueue(kveritydq);
+	destroy_workqueue(kverityd_ioq);
+	kmem_cache_destroy(_verity_io_pool);
+}
+
+module_init(verity_init);
+module_exit(verity_exit);
+
+MODULE_AUTHOR("The Chromium OS Authors <chromium-os-dev@chromium.org>");
+MODULE_DESCRIPTION(DM_NAME " target for transparent disk integrity checking");
+MODULE_LICENSE("GPL");
-- 
1.7.7.3



* [PATCH] dm: verity target
  2012-01-04 21:49 Mandeep Singh Baines
@ 2012-01-04 22:42 ` Mandeep Singh Baines
  0 siblings, 0 replies; 22+ messages in thread
From: Mandeep Singh Baines @ 2012-01-04 22:42 UTC (permalink / raw)
  To: Alasdair G Kergon, dm-devel, linux-kernel
  Cc: Mandeep Singh Baines, Will Drewry, Elly Jones, Alasdair G Kergon,
	Milan Broz, Olof Johansson, Steffen Klassert

D'oh. Forgot to pass -a when running git commit earlier. Ignore the
earlier post and use this instead.

---

The verity target provides transparent integrity checking of block devices
using a cryptographic digest.

dm-verity is meant to be setup as part of a verified boot path.  This
may be anything ranging from a boot using tboot or trustedgrub to just
booting from a known-good device (like a USB drive or CD).

dm-verity is part of ChromeOS's verified boot path. It is used to verify
the integrity of the root filesystem on boot. The root filesystem is
mounted on a dm-verity partition which transparently verifies each block
with a bootloader verified hash passed into the kernel at boot.

Changes in V3:
* Discussion over irc (Alasdair G Kergon)
  * Implement ioctl hook
Changes in V2:
* https://lkml.org/lkml/2011/11/10/85 (Steffen Klassert)
  * Use shash API instead of older hash API

Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: Elly Jones <ellyjones@chromium.org>
Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Milan Broz <mbroz@redhat.com>
Cc: Olof Johansson <olofj@chromium.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: dm-devel@redhat.com
---
 Documentation/device-mapper/dm-bht.txt    |   59 ++
 Documentation/device-mapper/dm-verity.txt |   76 +++
 drivers/md/Kconfig                        |   30 +
 drivers/md/Makefile                       |    2 +
 drivers/md/dm-bht.c                       |  559 +++++++++++++++
 drivers/md/dm-verity.c                    | 1051 +++++++++++++++++++++++++++++
 drivers/md/dm-verity.h                    |   45 ++
 include/linux/dm-bht.h                    |  166 +++++
 8 files changed, 1988 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/device-mapper/dm-bht.txt
 create mode 100644 Documentation/device-mapper/dm-verity.txt
 create mode 100644 drivers/md/dm-bht.c
 create mode 100644 drivers/md/dm-verity.c
 create mode 100644 drivers/md/dm-verity.h
 create mode 100644 include/linux/dm-bht.h

diff --git a/Documentation/device-mapper/dm-bht.txt b/Documentation/device-mapper/dm-bht.txt
new file mode 100644
index 0000000..21d929f
--- /dev/null
+++ b/Documentation/device-mapper/dm-bht.txt
@@ -0,0 +1,59 @@
+dm-bht
+======
+
+dm-bht provides a block hash tree implementation.  The use of dm-bht allows
+for integrity checking of a given block device without reading the entire
+set of blocks into memory before use.
+
+In particular, dm-bht supplies an interface for creating and verifying a tree
+of cryptographic digests with any algorithm supported by the kernel crypto API.
+
+The `verity' target is the motivating example.
+
+
+Theory of operation
+===================
+
+dm-bht is logically comprised of multiple nodes organized in a tree-like
+structure.  Each node in the tree is a cryptographic hash.  If it is a leaf
+node, the hash is of some block data on disk.  If it is an intermediary node,
+then the hash is of a number of child nodes.
+
+dm-bht has a given depth starting at 1 (ignoring the root node).  Each level in
+the tree is concretely made up of dm_bht_entry structs.  Each entry in the tree
+is a collection of neighboring nodes that fit in one page-sized block.  The
+number is determined based on PAGE_SIZE and the size of the selected
+cryptographic digest algorithm.  The hashes are linearly ordered in this entry
+and any unaligned trailing space is ignored but included when calculating the
+parent node.
+
+The tree looks something like:
+
+alg = sha256, num_blocks = 32767
+                                 [   root    ]
+                                /    . . .    \
+                     [entry_0]                 [entry_1]
+                    /  . . .  \                 . . .   \
+         [entry_0_0]   . . .  [entry_0_127]    . . . .  [entry_1_127]
+           / ... \             /   . . .  \             /           \
+     blk_0 ... blk_127  blk_16256   blk_16383      blk_32640 . . . blk_32767
+
+The root is treated independently from the depth, and the data blocks are
+expected to be hashed and supplied to the dm-bht.  Hash blocks that make up
+the entry contents are expected to be read from disk.
+
+dm-bht does not handle I/O directly but instead expects the consumer to
+supply callbacks.  The read callback will always receive a page-aligned value
+to pass to the block device layer to read in a hash value.
+
+Usage
+=====
+
+The API provides mechanisms for reading and verifying a tree.  When reading,
+all required hash data for a block should be populated before attempting a
+verify.  This is done by calling dm_bht_populate().  When all data is ready,
+a call to dm_bht_verify_block() will both hash the block data directly and
+hash the parent and neighboring nodes where needed to ensure validity up to
+the root hash.  Note, dm_bht_set_root_hexdigest() should be called before any
+verification attempts occur.
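+
+As a rough sketch of the expected call sequence (error handling omitted;
+my_read_cb, io_ctx and the other names are illustrative):
+
+  struct dm_bht bht;
+
+  dm_bht_create(&bht, block_count, block_size, "sha256");
+  dm_bht_set_root_hexdigest(&bht, root_hexdigest);
+  bht.read_cb = my_read_cb;  /* must call dm_bht_read_completed() */
+
+  dm_bht_populate(&bht, io_ctx, block);
+  /* ... wait until dm_bht_is_populated(&bht, block) ... */
+  dm_bht_verify_block(&bht, block, pg, offset);
+
+  dm_bht_destroy(&bht);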
diff --git a/Documentation/device-mapper/dm-verity.txt b/Documentation/device-mapper/dm-verity.txt
new file mode 100644
index 0000000..f33b984
--- /dev/null
+++ b/Documentation/device-mapper/dm-verity.txt
@@ -0,0 +1,76 @@
+dm-verity
+=========
+
+Device-Mapper's "verity" target provides transparent integrity checking of
+block devices using a cryptographic digest provided by the kernel crypto API.
+This target is read-only.
+
+Parameters: payload=<device path> hashtree=<hash device path> alg=<alg> \
+            salt=<salt> root_hexdigest=<root hash> \
+            [ hashstart=<hash start> error_behavior=<error behavior> ]
+
+<device path>
+    This is the device that is going to be integrity checked.  It may be
+    a subset of the full device as specified to dmsetup (start sector and count).
+    It may be specified as a path, like /dev/sdaX, or a device number,
+    <major>:<minor>.
+
+<hash device path>
+    This is the device that supplies the dm-bht hash data.  It may be
+    specified similarly to the device path and may be the same device.  If the
+    same device is used, the hash offset should be outside of the dm-verity
+    configured device size.
+
+<alg>
+    The cryptographic hash algorithm used for this device.  This should
+    be the name of the algorithm, like "sha1".
+
+<salt>
+    Salt value (in hex).
+
+<root hash>
+    The hexadecimal encoding of the cryptographic hash of all of the
+    neighboring nodes at the first level of the tree.  This hash should be
+    trusted as there is no other authenticity beyond this point.
+
+<hash start>
+    Start address of hashes (default 0).
+
+<error behavior>
+    What to do when verification fails: 0 ("eio") = return -EIO,
+    1 ("panic") = panic, 2 ("none") = ignore the error, 3 ("notify") =
+    call the registered error notifier.  Either the name or the digit
+    may be given.
+
+Theory of operation
+===================
+
+dm-verity is meant to be setup as part of a verified boot path.  This
+may be anything ranging from a boot using tboot or trustedgrub to just
+booting from a known-good device (like a USB drive or CD).
+
+When a dm-verity device is configured, it is expected that the caller
+has been authenticated in some way (cryptographic signatures, etc).
+After instantiation, all hashes will be verified on-demand during
+disk access.  If they cannot be verified up to the root node of the
+tree, the root hash, then the I/O will fail.  This should identify
+tampering with any data on the device and the hash data.
+
+Cryptographic hashes are used to assert the integrity of the device on a
+per-block basis.  This allows for a lightweight hash computation on first read
+into the page cache.  Block hashes are stored linearly, aligned to the
+nearest page-sized block.
+
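+Concretely, each block digest is computed as hash(block_data || salt)
+with the configured algorithm; see dm_bht_compute_hash() in dm-bht.c.
+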
+For more information on the hashing process, see dm-bht.txt.
+
+
+Example
+=======
+
+Set up a device:
+[[
+  dmsetup create vroot --table \
+    "0 204800 verity payload=/dev/sda1 hashtree=/dev/sda2 alg=sha1 "\
+    "root_hexdigest=9f74809a2ee7607b16fcc70d9399a4de9725a727"
+]]
+
+A command line tool is available to compute the hash tree and return the
+root hash value.
+  http://git.chromium.org/cgi-bin/gitweb.cgi?p=dm-verity.git;a=tree
diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index faa4741..3cdf95c 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -370,4 +370,34 @@ config DM_FLAKEY
        ---help---
          A target that intermittently fails I/O for debugging purposes.
 
+config DM_BHT
+        tristate "Block hash tree support"
+        select CRYPTO
+        select CRYPTO_HASH
+        ---help---
+          Include support for device-mapper devices to use a block hash
+          tree for managing data integrity checks in a scalable way.
+
+          Targets that use this functionality should include it
+          automatically.
+
+          If unsure, say N.
+
+config DM_VERITY
+        tristate "Verity target support"
+        depends on BLK_DEV_DM
+        select DM_BHT
+        select CRYPTO
+        select CRYPTO_HASH
+        ---help---
+          This device-mapper target allows you to create a device that
+          transparently integrity checks the data on it. You'll need to
+          activate the digests you're going to use in the cryptoapi
+          configuration.
+
+          To compile this code as a module, choose M here: the module will
+          be called dm-verity.
+
+          If unsure, say N.
+
 endif # MD
diff --git a/drivers/md/Makefile b/drivers/md/Makefile
index 046860c..c069953 100644
--- a/drivers/md/Makefile
+++ b/drivers/md/Makefile
@@ -39,6 +39,8 @@ obj-$(CONFIG_DM_SNAPSHOT)	+= dm-snapshot.o
 obj-$(CONFIG_DM_PERSISTENT_DATA)	+= persistent-data/
 obj-$(CONFIG_DM_MIRROR)		+= dm-mirror.o dm-log.o dm-region-hash.o
 obj-$(CONFIG_DM_LOG_USERSPACE)	+= dm-log-userspace.o
+obj-$(CONFIG_DM_BHT)            += dm-bht.o
+obj-$(CONFIG_DM_VERITY)         += dm-verity.o
 obj-$(CONFIG_DM_ZERO)		+= dm-zero.o
 obj-$(CONFIG_DM_RAID)	+= dm-raid.o
 obj-$(CONFIG_DM_THIN_PROVISIONING)	+= dm-thin-pool.o
diff --git a/drivers/md/dm-bht.c b/drivers/md/dm-bht.c
new file mode 100644
index 0000000..6eb2be3
--- /dev/null
+++ b/drivers/md/dm-bht.c
@@ -0,0 +1,559 @@
+/*
+ * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ *
+ * Device-Mapper block hash tree interface.
+ * See Documentation/device-mapper/dm-bht.txt for details.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <crypto/hash.h>
+#include <linux/atomic.h>
+#include <linux/bitops.h>
+#include <linux/bug.h>
+#include <linux/cpumask.h>
+#include <linux/device-mapper.h>
+#include <linux/dm-bht.h>
+#include <linux/err.h>
+#include <linux/errno.h>
+#include <linux/gfp.h>
+#include <linux/highmem.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/mm_types.h>
+#include <linux/scatterlist.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+
+#define DM_MSG_PREFIX "dm bht"
+
+
+/*
+ * Utilities
+ */
+
+static u8 from_hex(u8 ch)
+{
+	if ((ch >= '0') && (ch <= '9'))
+		return ch - '0';
+	if ((ch >= 'a') && (ch <= 'f'))
+		return ch - 'a' + 10;
+	if ((ch >= 'A') && (ch <= 'F'))
+		return ch - 'A' + 10;
+	return -1;
+}
+
+/**
+ * dm_bht_bin_to_hex - converts a binary stream to human-readable hex
+ * @binary:	a byte array of length @binary_len
+ * @hex:	a byte array of length @binary_len * 2 + 1
+ * @binary_len:	number of bytes in @binary to convert
+ */
+static void dm_bht_bin_to_hex(u8 *binary, u8 *hex, unsigned int binary_len)
+{
+	while (binary_len-- > 0) {
+		sprintf((char *)hex, "%02hhx", (int)*binary);
+		hex += 2;
+		binary++;
+	}
+}
+
+/**
+ * dm_bht_hex_to_bin - converts a hex stream to binary
+ * @binary:	a byte array of length @binary_len
+ * @hex:	a byte array of length @binary_len * 2 + 1
+ * @binary_len:	number of bytes of @binary to produce
+ */
+static void dm_bht_hex_to_bin(u8 *binary, const u8 *hex,
+			      unsigned int binary_len)
+{
+	while (binary_len-- > 0) {
+		*binary = from_hex(*(hex++));
+		*binary *= 16;
+		*binary += from_hex(*(hex++));
+		binary++;
+	}
+}
+
+static void dm_bht_log_mismatch(struct dm_bht *bht, u8 *given, u8 *computed)
+{
+	u8 given_hex[DM_BHT_MAX_DIGEST_SIZE * 2 + 1];
+	u8 computed_hex[DM_BHT_MAX_DIGEST_SIZE * 2 + 1];
+
+	dm_bht_bin_to_hex(given, given_hex, bht->digest_size);
+	dm_bht_bin_to_hex(computed, computed_hex, bht->digest_size);
+	DMERR_LIMIT("%s != %s", given_hex, computed_hex);
+}
+
+/**
+ * dm_bht_compute_hash - hashes one block of data, with the salt appended
+ */
+static int dm_bht_compute_hash(struct dm_bht *bht, struct page *pg,
+			       unsigned int offset, u8 *digest)
+{
+	struct shash_desc *hash_desc = bht->hash_desc[smp_processor_id()];
+	void *data;
+	int err;
+
+	/* Note, this is synchronous. */
+	if (crypto_shash_init(hash_desc)) {
+		DMCRIT("failed to reinitialize crypto hash (proc:%d)",
+			smp_processor_id());
+		return -EINVAL;
+	}
+	data = kmap_atomic(pg);
+	err = crypto_shash_update(hash_desc, data + offset, bht->block_size);
+	kunmap_atomic(data);
+	if (err) {
+		DMCRIT("crypto_shash_update failed");
+		return -EINVAL;
+	}
+	if (crypto_shash_update(hash_desc, bht->salt, sizeof(bht->salt))) {
+		DMCRIT("crypto_shash_update failed");
+		return -EINVAL;
+	}
+	if (crypto_shash_final(hash_desc, digest)) {
+		DMCRIT("crypto_shash_final failed");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/*
+ * Implementation functions
+ */
+
+static int dm_bht_initialize_entries(struct dm_bht *bht)
+{
+	/* last represents the index of the last digest stored in the tree.
+	 * By walking the tree with that index, it is possible to compute the
+	 * total number of entries at each level.
+	 *
+	 * Since each entry will contain up to |node_count| nodes of the tree,
+	 * it is possible that the last index may not be at the end of a given
+	 * entry->nodes.  In that case, it is assumed the value is padded.
+	 *
+	 * Note, we treat both the tree root (1 hash) and the tree leaves
+	 * independently from the bht data structures.  Logically, the root is
+	 * depth=-1 and the block layer level is depth=bht->depth
+	 */
+	unsigned int last = bht->block_count;
+	int depth;
+
+	/* check that the largest level->count can't result in an int overflow
+	 * on allocation or sector calculation.
+	 */
+	if (((last >> bht->node_count_shift) + 1) >
+	    UINT_MAX / max((unsigned int)sizeof(struct dm_bht_entry),
+			   (unsigned int)to_sector(bht->block_size))) {
+		DMCRIT("required entries %u is too large", last + 1);
+		return -EINVAL;
+	}
+
+	/* Track the current sector location for each level so we don't have to
+	 * compute it during traversals.
+	 */
+	bht->sectors = 0;
+	for (depth = 0; depth < bht->depth; ++depth) {
+		struct dm_bht_level *level = &bht->levels[depth];
+
+		level->count = dm_bht_index_at_level(bht, depth, last) + 1;
+		level->entries = (struct dm_bht_entry *)
+				 kcalloc(level->count,
+					 sizeof(struct dm_bht_entry),
+					 GFP_KERNEL);
+		if (!level->entries) {
+			DMERR("failed to allocate entries for depth %d", depth);
+			return -ENOMEM;
+		}
+		level->sector = bht->sectors;
+		bht->sectors += level->count * to_sector(bht->block_size);
+	}
+
+	return 0;
+}
+
+/**
+ * dm_bht_create - prepares @bht for use
+ * @bht:	pointer to the dm_bht to initialize
+ * @block_count:the number of block hashes / tree leaves
+ * @block_size:	the size of a hash block in bytes
+ * @alg_name:	crypto hash algorithm name
+ *
+ * Returns 0 on success.
+ *
+ * Callers can offset into devices by storing the data in the io callbacks.
+ */
+int dm_bht_create(struct dm_bht *bht, unsigned int block_count,
+		  unsigned int block_size, const char *alg_name)
+{
+	struct crypto_shash *tfm;
+	int size, cpu, status = 0;
+
+	bht->block_size = block_size;
+	/* Verify that PAGE_SIZE >= block_size >= SECTOR_SIZE. */
+	if ((block_size > PAGE_SIZE) ||
+	    (PAGE_SIZE % block_size) ||
+	    (to_sector(block_size) == 0))
+		return -EINVAL;
+
+	tfm = crypto_alloc_shash(alg_name, 0, 0);
+	if (IS_ERR(tfm)) {
+		DMERR("failed to allocate crypto hash '%s'", alg_name);
+		return -ENOMEM;
+	}
+	size = sizeof(struct shash_desc) + crypto_shash_descsize(tfm);
+
+	/* Pre-allocate per-cpu crypto contexts to avoid having to
+	 * kmalloc/kfree a context for every hash operation.
+	 */
+	for (cpu = 0; cpu < nr_cpu_ids; ++cpu) {
+		struct shash_desc *hash_desc = kmalloc(size, GFP_KERNEL);
+
+		bht->hash_desc[cpu] = hash_desc;
+		if (!hash_desc) {
+			DMERR("failed to allocate crypto hash contexts");
+			status = -ENOMEM;
+			goto bad_hash_alloc;
+		}
+		hash_desc->tfm = tfm;
+		hash_desc->flags = 0x0;
+	}
+	bht->digest_size = crypto_shash_digestsize(tfm);
+	/* We expect to be able to pack >=2 hashes into a block */
+	if (block_size / bht->digest_size < 2) {
+		DMERR("too few hashes fit in a block");
+		status = -EINVAL;
+		goto bad_arg;
+	}
+
+	if (bht->digest_size > DM_BHT_MAX_DIGEST_SIZE) {
+		DMERR("DM_BHT_MAX_DIGEST_SIZE too small for chosen digest");
+		status = -EINVAL;
+		goto bad_arg;
+	}
+
+	/* Configure the tree */
+	bht->block_count = block_count;
+	if (block_count == 0) {
+		DMERR("block_count must be non-zero");
+		status = -EINVAL;
+		goto bad_arg;
+	}
+
+	/* Each dm_bht_entry->nodes is one block.  The node code tracks
+	 * how many nodes fit into one entry where a node is a single
+	 * hash (message digest).
+	 */
+	bht->node_count_shift = fls(block_size / bht->digest_size) - 1;
+	/* Round down to the nearest power of two.  This makes indexing
+	 * into the tree much less painful.
+	 */
+	bht->node_count = 1 << bht->node_count_shift;
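+	/* E.g. (illustrative case): a 4096-byte block with sha256
+	 * (32-byte digests) fits 128 hashes, so node_count_shift = 7
+	 * and node_count = 128.
+	 */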
+
+	/* This is unlikely to happen, but with 64k pages, who knows. */
+	if (bht->node_count > UINT_MAX / bht->digest_size) {
+		DMERR("node_count * hash_len exceeds UINT_MAX!");
+		status = -EINVAL;
+		goto bad_arg;
+	}
+
+	bht->depth = DIV_ROUND_UP(fls(block_count - 1), bht->node_count_shift);
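+	/* Continuing the example above: 32767 blocks with node_count_shift = 7
+	 * gives depth = DIV_ROUND_UP(fls(32766), 7) = DIV_ROUND_UP(15, 7) = 3.
+	 */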
+
+	/* Ensure that we can safely shift by this value. */
+	if (bht->depth * bht->node_count_shift >= sizeof(unsigned int) * 8) {
+		DMERR("specified depth and node_count_shift is too large");
+		status = -EINVAL;
+		goto bad_arg;
+	}
+
+	/* Allocate levels. Each level of the tree may have an arbitrary number
+	 * of dm_bht_entry structs.  Each entry contains node_count nodes.
+	 * Each node in the tree is a cryptographic digest of either node_count
+	 * nodes on the subsequent level or of a specific block on disk.
+	 */
+	bht->levels = (struct dm_bht_level *)
+			kcalloc(bht->depth,
+				sizeof(struct dm_bht_level), GFP_KERNEL);
+	if (!bht->levels) {
+		DMERR("failed to allocate tree levels");
+		status = -ENOMEM;
+		goto bad_level_alloc;
+	}
+
+	bht->read_cb = NULL;
+
+	status = dm_bht_initialize_entries(bht);
+	if (status)
+		goto bad_entries_alloc;
+
+	/* We compute depth such that there is only 1 block at level 0. */
+	BUG_ON(bht->levels[0].count != 1);
+
+	return 0;
+
+bad_entries_alloc:
+	while (bht->depth-- > 0)
+		kfree(bht->levels[bht->depth].entries);
+	kfree(bht->levels);
+bad_level_alloc:
+bad_arg:
+bad_hash_alloc:
+	for (cpu = 0; cpu < nr_cpu_ids && bht->hash_desc[cpu]; ++cpu)
+		kfree(bht->hash_desc[cpu]);
+	crypto_free_shash(tfm);
+	return status;
+}
+EXPORT_SYMBOL(dm_bht_create);
+
+/**
+ * dm_bht_read_completed
+ * @entry:	pointer to the entry that's been loaded
+ * @status:	I/O status. Non-zero is failure.
+ * MUST always be called after a read_cb completes.
+ */
+void dm_bht_read_completed(struct dm_bht_entry *entry, int status)
+{
+	if (status) {
+		/* TODO(wad) add retry support */
+		DMCRIT("an I/O error occurred while reading entry");
+		atomic_set(&entry->state, DM_BHT_ENTRY_ERROR_IO);
+		/* entry->nodes will be freed later */
+		return;
+	}
+	BUG_ON(atomic_read(&entry->state) != DM_BHT_ENTRY_PENDING);
+	atomic_set(&entry->state, DM_BHT_ENTRY_READY);
+}
+EXPORT_SYMBOL(dm_bht_read_completed);
+
+/**
+ * dm_bht_verify_block - checks that all nodes in the path for @block are valid
+ * @bht:	pointer to a dm_bht_create()d bht
+ * @block:	index of the block to verify
+ * @pg:		page holding the block data
+ * @offset:	offset into the page
+ *
+ * Returns 0 on success, DM_BHT_ENTRY_ERROR_MISMATCH on error.
+ */
+int dm_bht_verify_block(struct dm_bht *bht, unsigned int block,
+			struct page *pg, unsigned int offset)
+{
+	int state, depth = bht->depth;
+	u8 digest[DM_BHT_MAX_DIGEST_SIZE];
+	struct dm_bht_entry *entry;
+	void *node;
+
+	do {
+		/* Need to check that the hash of the current block is accurate
+		 * in its parent.
+		 */
+		entry = dm_bht_get_entry(bht, depth - 1, block);
+		state = atomic_read(&entry->state);
+		/* This call is only safe if all nodes along the path
+		 * are already populated (i.e. READY) via dm_bht_populate.
+		 */
+		BUG_ON(state < DM_BHT_ENTRY_READY);
+		node = dm_bht_get_node(bht, entry, depth, block);
+
+		if (dm_bht_compute_hash(bht, pg, offset, digest) ||
+		    memcmp(digest, node, bht->digest_size))
+			goto mismatch;
+
+		/* Keep the containing block of hashes to be verified in the
+		 * next pass.
+		 */
+		pg = virt_to_page(entry->nodes);
+		offset = offset_in_page(entry->nodes);
+	} while (--depth > 0 && state != DM_BHT_ENTRY_VERIFIED);
+
+	if (depth == 0 && state != DM_BHT_ENTRY_VERIFIED) {
+		if (dm_bht_compute_hash(bht, pg, offset, digest) ||
+		    memcmp(digest, bht->root_digest, bht->digest_size))
+			goto mismatch;
+		atomic_set(&entry->state, DM_BHT_ENTRY_VERIFIED);
+	}
+
+	/* Mark path to leaf as verified. */
+	for (depth++; depth < bht->depth; depth++) {
+		entry = dm_bht_get_entry(bht, depth, block);
+		/* At this point, entry can only be in VERIFIED or READY state.
+		 * So it is safe to use atomic_set instead of atomic_cmpxchg.
+		 */
+		atomic_set(&entry->state, DM_BHT_ENTRY_VERIFIED);
+	}
+
+	return 0;
+
+mismatch:
+	DMERR_LIMIT("verify_path: failed to verify hash (d=%d,bi=%u)",
+		    depth, block);
+	dm_bht_log_mismatch(bht, node, digest);
+	return DM_BHT_ENTRY_ERROR_MISMATCH;
+}
+EXPORT_SYMBOL(dm_bht_verify_block);
+
+/**
+ * dm_bht_is_populated - check that entries from disk needed to verify a given
+ *                       block are all ready
+ * @bht:	pointer to a dm_bht_create()d bht
+ * @block:	index of the block being checked
+ *
+ * Callers may wish to call dm_bht_is_populated() when checking an io
+ * for which entries were already pending.
+ */
+bool dm_bht_is_populated(struct dm_bht *bht, unsigned int block)
+{
+	int depth;
+
+	for (depth = bht->depth - 1; depth >= 0; depth--) {
+		struct dm_bht_entry *entry = dm_bht_get_entry(bht, depth,
+							      block);
+		if (atomic_read(&entry->state) < DM_BHT_ENTRY_READY)
+			return false;
+	}
+
+	return true;
+}
+EXPORT_SYMBOL(dm_bht_is_populated);
+
+/**
+ * dm_bht_populate - reads entries from disk needed to verify a given block
+ * @bht:	pointer to a dm_bht_create()d bht
+ * @ctx:        context used for all read_cb calls on this request
+ * @block:	index of the block whose hash data is needed
+ *
+ * Returns negative value on error. Returns 0 on success.
+ */
+int dm_bht_populate(struct dm_bht *bht, void *ctx, unsigned int block)
+{
+	int depth, state;
+
+	BUG_ON(block >= bht->block_count);
+
+	for (depth = bht->depth - 1; depth >= 0; --depth) {
+		unsigned int index = dm_bht_index_at_level(bht, depth, block);
+		struct dm_bht_level *level = &bht->levels[depth];
+		struct dm_bht_entry *entry = dm_bht_get_entry(bht, depth,
+							      block);
+		state = atomic_cmpxchg(&entry->state,
+				       DM_BHT_ENTRY_UNALLOCATED,
+				       DM_BHT_ENTRY_PENDING);
+		if (state == DM_BHT_ENTRY_VERIFIED)
+			break;
+		if (state <= DM_BHT_ENTRY_ERROR)
+			goto error_state;
+		if (state != DM_BHT_ENTRY_UNALLOCATED)
+			continue;
+
+		/* Current entry is claimed for allocation and loading */
+		entry->nodes = kmalloc(bht->block_size, GFP_NOIO);
+		if (!entry->nodes)
+			goto nomem;
+
+		bht->read_cb(ctx,
+			     level->sector + to_sector(index * bht->block_size),
+			     entry->nodes, to_sector(bht->block_size), entry);
+	}
+
+	return 0;
+
+error_state:
+	DMCRIT("block %u at depth %d is in an error state", block, depth);
+	return -EPERM;
+
+nomem:
+	DMCRIT("failed to allocate memory for entry->nodes");
+	return -ENOMEM;
+}
+EXPORT_SYMBOL(dm_bht_populate);
+
+/**
+ * dm_bht_destroy - cleans up all memory used by @bht
+ * @bht:	pointer to a dm_bht_create()d bht
+ */
+void dm_bht_destroy(struct dm_bht *bht)
+{
+	int depth, cpu;
+
+	for (depth = 0; depth < bht->depth; depth++) {
+		struct dm_bht_entry *entry = bht->levels[depth].entries;
+		struct dm_bht_entry *entry_end = entry +
+						 bht->levels[depth].count;
+		for (; entry < entry_end; ++entry)
+			kfree(entry->nodes);
+		kfree(bht->levels[depth].entries);
+	}
+	kfree(bht->levels);
+	crypto_free_shash((bht->hash_desc[0])->tfm);
+	for (cpu = 0; cpu < nr_cpu_ids; ++cpu)
+		kfree(bht->hash_desc[cpu]);
+}
+EXPORT_SYMBOL(dm_bht_destroy);
+
+/*
+ * Accessors
+ */
+
+/**
+ * dm_bht_set_root_hexdigest - sets an unverified root digest hash from hex
+ * @bht:	pointer to a dm_bht_create()d bht
+ * @hexdigest:	array of u8s containing the new digest in hex
+ * Returns non-zero on error.  hexdigest should be NUL terminated.
+ */
+int dm_bht_set_root_hexdigest(struct dm_bht *bht, const u8 *hexdigest)
+{
+	/* Make sure we have at least the bytes expected */
+	if (strnlen((char *)hexdigest, bht->digest_size * 2) !=
+	    bht->digest_size * 2) {
+		DMERR("root digest length does not match hash algorithm");
+		return -1;
+	}
+	dm_bht_hex_to_bin(bht->root_digest, hexdigest, bht->digest_size);
+	return 0;
+}
+EXPORT_SYMBOL(dm_bht_set_root_hexdigest);
+
+/**
+ * dm_bht_root_hexdigest - returns root digest in hex
+ * @bht:	pointer to a dm_bht_create()d bht
+ * @hexdigest:	u8 array of size @available
+ * @available:	must be at least bht->digest_size * 2 + 1
+ */
+int dm_bht_root_hexdigest(struct dm_bht *bht, u8 *hexdigest, int available)
+{
+	if (available < 0 ||
+	    ((unsigned int) available) < bht->digest_size * 2 + 1) {
+		DMERR("hexdigest has too few bytes available");
+		return -EINVAL;
+	}
+	dm_bht_bin_to_hex(bht->root_digest, hexdigest, bht->digest_size);
+	return 0;
+}
+EXPORT_SYMBOL(dm_bht_root_hexdigest);
+
+/**
+ * dm_bht_set_salt - sets the salt used, in hex
+ * @bht:      pointer to a dm_bht_create()d bht
+ * @hexsalt:  salt string, as hex; will be zero-padded or truncated to
+ *            DM_BHT_SALT_SIZE * 2 hex digits.
+ */
+void dm_bht_set_salt(struct dm_bht *bht, const char *hexsalt)
+{
+	size_t saltlen = min(strlen(hexsalt) / 2, sizeof(bht->salt));
+
+	memset(bht->salt, 0, sizeof(bht->salt));
+	dm_bht_hex_to_bin(bht->salt, (const u8 *)hexsalt, saltlen);
+}
+EXPORT_SYMBOL(dm_bht_set_salt);
+
+/**
+ * dm_bht_salt - returns the salt used, in hex
+ * @bht:      pointer to a dm_bht_create()d bht
+ * @hexsalt:  buffer to put salt into, of length DM_BHT_SALT_SIZE * 2 + 1.
+ */
+int dm_bht_salt(struct dm_bht *bht, char *hexsalt)
+{
+	dm_bht_bin_to_hex(bht->salt, (u8 *)hexsalt, sizeof(bht->salt));
+	return 0;
+}
+EXPORT_SYMBOL(dm_bht_salt);
diff --git a/drivers/md/dm-verity.c b/drivers/md/dm-verity.c
new file mode 100644
index 0000000..8f1e8dc
--- /dev/null
+++ b/drivers/md/dm-verity.c
@@ -0,0 +1,1051 @@
+/*
+ * Originally based on dm-crypt.c,
+ * Copyright (C) 2003 Christophe Saout <christophe@saout.de>
+ * Copyright (C) 2004 Clemens Fruhwirth <clemens@endorphin.org>
+ * Copyright (C) 2006-2008 Red Hat, Inc. All rights reserved.
+ * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ *                    All Rights Reserved.
+ *
+ * This file is released under the GPLv2.
+ *
+ * Implements a verifying transparent block device.
+ * See Documentation/device-mapper/dm-verity.txt
+ */
+#include <linux/async.h>
+#include <linux/atomic.h>
+#include <linux/bio.h>
+#include <linux/blkdev.h>
+#include <linux/delay.h>
+#include <linux/device.h>
+#include <linux/err.h>
+#include <linux/genhd.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/mempool.h>
+#include <linux/mm_types.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/workqueue.h>
+#include <linux/device-mapper.h>
+#include <linux/dm-bht.h>
+
+#include "dm-verity.h"
+
+#define DM_MSG_PREFIX "verity"
+
+/* Supports up to 512-bit digests */
+#define VERITY_MAX_DIGEST_SIZE 64
+
+/* TODO(wad) make both of these report the error line/file to a
+ *           verity_bug function.
+ */
+#define VERITY_BUG(msg...) BUG()
+#define VERITY_BUG_ON(cond, msg...) BUG_ON(cond)
+
+/* Helper for printing sector_t */
+#define ULL(x) ((unsigned long long)(x))
+
+#define MIN_IOS 32
+#define MIN_BIOS (MIN_IOS * 2)
+#define VERITY_DEFAULT_BLOCK_SIZE 4096
+
+/* Provide a lightweight means of specifying the global default for
+ * error behavior: eio, reboot, or none
+ * Legacy support for 0 = eio, 1 = reboot/panic, 2 = none, 3 = notify.
+ * This is matched to the enum in dm-verity.h.
+ */
+static const char * const allowed_error_behaviors[] = { "eio", "panic", "none",
+							"notify", NULL };
+static char *error_behavior = "eio";
+module_param(error_behavior, charp, 0644);
+MODULE_PARM_DESC(error_behavior, "Behavior on error "
+				 "(eio, panic, none, notify)");
+
+/* Controls whether verity_get_device will wait forever for a device. */
+static bool dev_wait;
+module_param(dev_wait, bool, 0444);
+MODULE_PARM_DESC(dev_wait, "Wait forever for a backing device");
+
+/* per-requested-bio private data */
+enum verity_io_flags {
+	VERITY_IOFLAGS_CLONED = 0x1,	/* original bio has been cloned */
+};
+
+struct dm_verity_io {
+	struct dm_target *target;
+	struct bio *bio;
+	struct delayed_work work;
+	unsigned int flags;
+
+	int error;
+	atomic_t pending;
+
+	u64 block;  /* aligned block index */
+	u64 count;  /* aligned count in blocks */
+};
+
+struct verity_config {
+	struct dm_dev *dev;
+	sector_t start;
+	sector_t size;
+
+	struct dm_dev *hash_dev;
+	sector_t hash_start;
+
+	struct dm_bht bht;
+
+	/* Pool required for io contexts */
+	mempool_t *io_pool;
+	/* Pool and bios required for making sure that backing device reads are
+	 * in PAGE_SIZE increments.
+	 */
+	struct bio_set *bs;
+
+	char hash_alg[CRYPTO_MAX_ALG_NAME];
+
+	int error_behavior;
+};
+
+static struct kmem_cache *_verity_io_pool;
+static struct workqueue_struct *kveritydq, *kverityd_ioq;
+
+static void kverityd_verify(struct work_struct *work);
+static void kverityd_io(struct work_struct *work);
+static void kverityd_io_bht_populate(struct dm_verity_io *io);
+static void kverityd_io_bht_populate_end(struct bio *, int error);
+
+static BLOCKING_NOTIFIER_HEAD(verity_error_notifier);
+
+/*
+ * Exported interfaces
+ */
+
+int dm_verity_register_error_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_register(&verity_error_notifier, nb);
+}
+EXPORT_SYMBOL_GPL(dm_verity_register_error_notifier);
+
+int dm_verity_unregister_error_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_unregister(&verity_error_notifier, nb);
+}
+EXPORT_SYMBOL_GPL(dm_verity_unregister_error_notifier);
+
+/*
+ * Allocation and utility functions
+ */
+
+static void kverityd_src_io_read_end(struct bio *clone, int error);
+
+/* Shared destructor for all internal bios */
+static void dm_verity_bio_destructor(struct bio *bio)
+{
+	struct dm_verity_io *io = bio->bi_private;
+	struct verity_config *vc = io->target->private;
+	bio_free(bio, vc->bs);
+}
+
+static struct bio *verity_alloc_bioset(struct verity_config *vc, gfp_t gfp_mask,
+				       int nr_iovecs)
+{
+	return bio_alloc_bioset(gfp_mask, nr_iovecs, vc->bs);
+}
+
+static struct dm_verity_io *verity_io_alloc(struct dm_target *ti,
+					    struct bio *bio)
+{
+	struct verity_config *vc = ti->private;
+	sector_t sector = bio->bi_sector - ti->begin;
+	struct dm_verity_io *io;
+
+	io = mempool_alloc(vc->io_pool, GFP_NOIO);
+	if (unlikely(!io))
+		return NULL;
+	io->flags = 0;
+	io->target = ti;
+	io->bio = bio;
+	io->error = 0;
+
+	/* Adjust the sector by the virtual starting sector */
+	io->block = to_bytes(sector) / vc->bht.block_size;
+	io->count = bio->bi_size / vc->bht.block_size;
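+	/* E.g. (illustrative numbers): with 4096-byte blocks, a bio at
+	 * relative sector 8 with bi_size 8192 maps to block 1, count 2.
+	 */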
+
+	atomic_set(&io->pending, 0);
+
+	return io;
+}
+
+static struct bio *verity_bio_clone(struct dm_verity_io *io)
+{
+	struct verity_config *vc = io->target->private;
+	struct bio *bio = io->bio;
+	struct bio *clone = verity_alloc_bioset(vc, GFP_NOIO, bio->bi_max_vecs);
+
+	if (!clone)
+		return NULL;
+
+	__bio_clone(clone, bio);
+	clone->bi_private = io;
+	clone->bi_end_io  = kverityd_src_io_read_end;
+	clone->bi_bdev    = vc->dev->bdev;
+	clone->bi_sector += vc->start - io->target->begin;
+	clone->bi_destructor = dm_verity_bio_destructor;
+
+	return clone;
+}
+
+/* If the request is not successful, this handler takes action.
+ * TODO make this call a registered handler.
+ */
+static void verity_error(struct verity_config *vc, struct dm_verity_io *io,
+			 int error)
+{
+	const char *message;
+	int error_mode = DM_VERITY_ERROR_BEHAVIOR_PANIC;
+	dev_t devt = 0;
+	u64 block = ~0;
+	int transient = 1;
+	struct dm_verity_error_state error_state;
+
+	if (vc) {
+		devt = vc->dev->bdev->bd_dev;
+		error_mode = vc->error_behavior;
+	}
+
+	if (io) {
+		io->error = -EIO;
+		block = io->block;
+	}
+
+	switch (error) {
+	case -ENOMEM:
+		message = "out of memory";
+		break;
+	case -EBUSY:
+		message = "pending data seen during verify";
+		break;
+	case -EFAULT:
+		message = "crypto operation failure";
+		break;
+	case -EACCES:
+		message = "integrity failure";
+		/* Image is bad. */
+		transient = 0;
+		break;
+	case -EPERM:
+		message = "hash tree population failure";
+		/* Should be dm-bht specific errors */
+		transient = 0;
+		break;
+	case -EINVAL:
+		message = "unexpected missing/invalid data";
+		/* The device was configured incorrectly - fallback. */
+		transient = 0;
+		break;
+	default:
+		/* Other errors can be passed through as IO errors */
+		message = "unknown or I/O error";
+		return;
+	}
+
+	DMERR_LIMIT("verification failure occurred: %s", message);
+
+	if (error_mode == DM_VERITY_ERROR_BEHAVIOR_NOTIFY) {
+		error_state.code = error;
+		error_state.transient = transient;
+		error_state.block = block;
+		error_state.message = message;
+		error_state.dev_start = vc->start;
+		error_state.dev_len = vc->size;
+		error_state.dev = vc->dev->bdev;
+		error_state.hash_dev_start = vc->hash_start;
+		error_state.hash_dev_len = vc->bht.sectors;
+		error_state.hash_dev = vc->hash_dev->bdev;
+
+		/* Set default fallthrough behavior. */
+		error_state.behavior = DM_VERITY_ERROR_BEHAVIOR_PANIC;
+		error_mode = DM_VERITY_ERROR_BEHAVIOR_PANIC;
+
+		if (!blocking_notifier_call_chain(
+		    &verity_error_notifier, transient, &error_state)) {
+			error_mode = error_state.behavior;
+		}
+	}
+
+	switch (error_mode) {
+	case DM_VERITY_ERROR_BEHAVIOR_EIO:
+		break;
+	case DM_VERITY_ERROR_BEHAVIOR_NONE:
+		if (error != -EIO && io)
+			io->error = 0;
+		break;
+	default:
+		goto do_panic;
+	}
+	return;
+
+do_panic:
+	panic("dm-verity failure: "
+	      "device:%u:%u error:%d block:%llu message:%s",
+	      MAJOR(devt), MINOR(devt), error, ULL(block), message);
+}
+
+/**
+ * verity_parse_error_behavior - parse a behavior charp to the enum
+ * @behavior:	NUL-terminated char array
+ *
+ * Checks if the behavior is valid either as text or as an index digit
+ * and returns the proper enum value or -1 on error.
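+ * E.g. both "panic" and "1" map to DM_VERITY_ERROR_BEHAVIOR_PANIC.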
+ */
+static int verity_parse_error_behavior(const char *behavior)
+{
+	const char * const *allowed = allowed_error_behaviors;
+	char index = '0';
+
+	for (; *allowed; allowed++, index++)
+		if (!strcmp(*allowed, behavior) || behavior[0] == index)
+			break;
+
+	if (!*allowed)
+		return -1;
+
+	/* Convert to the integer index matching the enum. */
+	return allowed - allowed_error_behaviors;
+}
+
+/*
+ * Reverse flow of requests into the device.
+ *
+ * (Start at the bottom with verity_map and work your way upward).
+ */
+
+static void verity_inc_pending(struct dm_verity_io *io);
+
+static void verity_return_bio_to_caller(struct dm_verity_io *io)
+{
+	struct verity_config *vc = io->target->private;
+
+	if (io->error)
+		verity_error(vc, io, io->error);
+
+	bio_endio(io->bio, io->error);
+	mempool_free(io, vc->io_pool);
+}
+
+/* Check for any missing bht hashes. */
+static bool verity_is_bht_populated(struct dm_verity_io *io)
+{
+	struct verity_config *vc = io->target->private;
+	u64 block;
+
+	for (block = io->block; block < io->block + io->count; ++block)
+		if (!dm_bht_is_populated(&vc->bht, block))
+			return false;
+
+	return true;
+}
+
+/* verity_dec_pending manages the lifetime of all dm_verity_io structs.
+ * Non-bug error handling is centralized through this interface, as is
+ * all passage from workqueue to workqueue.
+ */
+static void verity_dec_pending(struct dm_verity_io *io)
+{
+	if (!atomic_dec_and_test(&io->pending))
+		goto done;
+
+	if (unlikely(io->error))
+		goto io_error;
+
+	/* I/Os that were pending may now be ready */
+	if (verity_is_bht_populated(io)) {
+		INIT_DELAYED_WORK(&io->work, kverityd_verify);
+		queue_delayed_work(kveritydq, &io->work, 0);
+	} else {
+		INIT_DELAYED_WORK(&io->work, kverityd_io);
+		queue_delayed_work(kverityd_ioq, &io->work, HZ/10);
+	}
+
+done:
+	return;
+
+io_error:
+	verity_return_bio_to_caller(io);
+}
+
+/* Walks the data set and computes the hash of the data read from the
+ * untrusted source device.  The computed hash is then passed to dm-bht
+ * for verification.
+ */
+static int verity_verify(struct verity_config *vc,
+			 struct dm_verity_io *io)
+{
+	unsigned int block_size = vc->bht.block_size;
+	struct bio *bio = io->bio;
+	u64 block = io->block;
+	unsigned int idx;
+	int r;
+
+	for (idx = bio->bi_idx; idx < bio->bi_vcnt; idx++) {
+		struct bio_vec *bv = bio_iovec_idx(bio, idx);
+		unsigned int offset = bv->bv_offset;
+		unsigned int len = bv->bv_len;
+
+		VERITY_BUG_ON(offset % block_size);
+		VERITY_BUG_ON(len % block_size);
+
+		while (len) {
+			r = dm_bht_verify_block(&vc->bht, block,
+						bv->bv_page, offset);
+			if (r)
+				goto bad_return;
+
+			offset += block_size;
+			len -= block_size;
+			block++;
+			cond_resched();
+		}
+	}
+
+	return 0;
+
+bad_return:
+	/* dm_bht functions aren't expected to return errno-friendly
+	 * values.  They are converted here for uniformity.
+	 */
+	if (r > 0) {
+		DMERR("Pending data for block %llu seen at verify", ULL(block));
+		r = -EBUSY;
+	} else {
+		DMERR_LIMIT("Block hash does not match!");
+		r = -EACCES;
+	}
+	return r;
+}
+
+/* Services the verify workqueue */
+static void kverityd_verify(struct work_struct *work)
+{
+	struct delayed_work *dwork = container_of(work, struct delayed_work,
+						  work);
+	struct dm_verity_io *io = container_of(dwork, struct dm_verity_io,
+					       work);
+	struct verity_config *vc = io->target->private;
+
+	io->error = verity_verify(vc, io);
+
+	/* Free up the bio and tag with the return value */
+	verity_return_bio_to_caller(io);
+}
+
+/* Asynchronously called upon the completion of dm-bht I/O.  The status
+ * of the operation is passed back to dm-bht and the next steps are
+ * decided by verity_dec_pending.
+ */
+static void kverityd_io_bht_populate_end(struct bio *bio, int error)
+{
+	struct dm_bht_entry *entry = (struct dm_bht_entry *) bio->bi_private;
+	struct dm_verity_io *io = (struct dm_verity_io *) entry->io_context;
+
+	/* Tell the tree to atomically update now that we've populated
+	 * the given entry.
+	 */
+	dm_bht_read_completed(entry, error);
+
+	/* Clean up for reuse when reading data to be checked */
+	bio->bi_vcnt = 0;
+	bio->bi_io_vec->bv_offset = 0;
+	bio->bi_io_vec->bv_len = 0;
+	bio->bi_io_vec->bv_page = NULL;
+	/* Restore the private data to I/O so the destructor can be shared. */
+	bio->bi_private = (void *) io;
+	bio_put(bio);
+
+	/* We bail but assume the tree has been marked bad. */
+	if (unlikely(error)) {
+		DMERR("Failed to read for sector %llu (%u)",
+		      ULL(io->bio->bi_sector), io->bio->bi_size);
+		io->error = error;
+		/* Pass through the error to verity_dec_pending below */
+	}
+	/* When pending = 0, it will transition to reading real data */
+	verity_dec_pending(io);
+}
+
+/* Called by dm-bht (via dm_bht_populate), this function provides
+ * the message digests to dm-bht that are stored on disk.
+ */
+static int kverityd_bht_read_callback(void *ctx, sector_t start, u8 *dst,
+				      sector_t count,
+				      struct dm_bht_entry *entry)
+{
+	struct dm_verity_io *io = ctx;  /* I/O for this batch */
+	struct verity_config *vc;
+	struct bio *bio;
+
+	vc = io->target->private;
+
+	/* The I/O context is nested inside the entry so that we don't need one
+	 * io context per page read.
+	 */
+	entry->io_context = ctx;
+
+	/* We should only get page size requests at present. */
+	verity_inc_pending(io);
+	bio = verity_alloc_bioset(vc, GFP_NOIO, 1);
+	if (unlikely(!bio)) {
+		DMCRIT("Out of memory at bio_alloc_bioset");
+		dm_bht_read_completed(entry, -ENOMEM);
+		return -ENOMEM;
+	}
+	bio->bi_private = (void *) entry;
+	bio->bi_idx = 0;
+	bio->bi_size = vc->bht.block_size;
+	bio->bi_sector = vc->hash_start + start;
+	bio->bi_bdev = vc->hash_dev->bdev;
+	bio->bi_end_io = kverityd_io_bht_populate_end;
+	bio->bi_rw = REQ_META;
+	/* Only need to free the bio since the page is managed by bht */
+	bio->bi_destructor = dm_verity_bio_destructor;
+	bio->bi_vcnt = 1;
+	bio->bi_io_vec->bv_offset = offset_in_page(dst);
+	bio->bi_io_vec->bv_len = to_bytes(count);
+	/* dst is guaranteed to be a page_pool allocation */
+	bio->bi_io_vec->bv_page = virt_to_page(dst);
+	/* Track that this I/O is in use.  There should be no risk of the io
+	 * being removed prior since this is called synchronously.
+	 */
+	generic_make_request(bio);
+	return 0;
+}
+
+/* Submits an io request for each missing block of block hashes.
+ * The last one to return will then enqueue this on the io workqueue.
+ */
+static void kverityd_io_bht_populate(struct dm_verity_io *io)
+{
+	struct verity_config *vc = io->target->private;
+	u64 block;
+
+	for (block = io->block; block < io->block + io->count; ++block) {
+		int ret = dm_bht_populate(&vc->bht, io, block);
+
+		if (ret < 0) {
+			/* verity_dec_pending will handle the error case. */
+			io->error = ret;
+			break;
+		}
+	}
+}
+
+/* Asynchronously called upon the completion of I/O issued
+ * from kverityd_src_io_read. verity_dec_pending() acts as
+ * the scheduler/flow manager.
+ */
+static void kverityd_src_io_read_end(struct bio *clone, int error)
+{
+	struct dm_verity_io *io = clone->bi_private;
+
+	if (unlikely(!bio_flagged(clone, BIO_UPTODATE) && !error))
+		error = -EIO;
+
+	if (unlikely(error)) {
+		DMERR("Error occurred: %d (%llu, %u)",
+			error, ULL(clone->bi_sector), clone->bi_size);
+		io->error = error;
+	}
+
+	/* Release the clone, which keeps the block layer from leaving
+	 * offsets, etc. in unexpected states.
+	 */
+	bio_put(clone);
+
+	verity_dec_pending(io);
+}
+
+/* If not yet underway, an I/O request will be issued to the vc->dev
+ * device for the data needed. It is cloned to avoid unexpected changes
+ * to the original bio struct.
+ */
+static void kverityd_src_io_read(struct dm_verity_io *io)
+{
+	struct bio *clone;
+
+	/* Check if the read is already issued. */
+	if (io->flags & VERITY_IOFLAGS_CLONED)
+		return;
+
+	io->flags |= VERITY_IOFLAGS_CLONED;
+
+	/* Clone the bio. The block layer may modify the bvec array. */
+	clone = verity_bio_clone(io);
+	if (unlikely(!clone)) {
+		io->error = -ENOMEM;
+		return;
+	}
+
+	verity_inc_pending(io);
+
+	generic_make_request(clone);
+}
+
+/* kverityd_io services the I/O workqueue. For each pass through
+ * the I/O workqueue, a call to populate both the origin drive
+ * data and the hash tree data is made.
+ */
+static void kverityd_io(struct work_struct *work)
+{
+	struct delayed_work *dwork = container_of(work, struct delayed_work,
+						  work);
+	struct dm_verity_io *io = container_of(dwork, struct dm_verity_io,
+					       work);
+
+	/* Issue requests asynchronously. */
+	verity_inc_pending(io);
+	kverityd_src_io_read(io);
+	kverityd_io_bht_populate(io);
+	verity_dec_pending(io);
+}
+
+/* Paired with verity_dec_pending, the pending value in the io dictates the
+ * lifetime of a request and when it is ready to be processed on the
+ * workqueues.
+ */
+static void verity_inc_pending(struct dm_verity_io *io)
+{
+	atomic_inc(&io->pending);
+}
+
+/* Block-level requests start here. */
+static int verity_map(struct dm_target *ti, struct bio *bio,
+		      union map_info *map_context)
+{
+	struct dm_verity_io *io;
+	struct verity_config *vc;
+	struct request_queue *r_queue;
+
+	if (unlikely(!ti)) {
+		DMERR("dm_target was NULL");
+		return -EIO;
+	}
+
+	vc = ti->private;
+	r_queue = bdev_get_queue(vc->dev->bdev);
+
+	if (bio_data_dir(bio) == WRITE) {
+		/* If we silently drop writes, then the VFS layer will cache
+		 * the write and persist it in memory. While it doesn't change
+		 * the underlying storage, it still may be contrary to the
+		 * behavior expected by a verified, read-only device.
+		 */
+		DMWARN_LIMIT("write request received. rejecting with -EIO.");
+		verity_error(vc, NULL, -EIO);
+		return -EIO;
+	} else {
+		/* Queue up the request to be verified */
+		io = verity_io_alloc(ti, bio);
+		if (!io) {
+			DMERR_LIMIT("Failed to allocate and init IO data");
+			return DM_MAPIO_REQUEUE;
+		}
+		INIT_DELAYED_WORK(&io->work, kverityd_io);
+		queue_delayed_work(kverityd_ioq, &io->work, 0);
+	}
+
+	return DM_MAPIO_SUBMITTED;
+}
+
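+/* Split "key=val" in place; e.g. "alg=sha1" yields key "alg", val "sha1". */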
+static void splitarg(char *arg, char **key, char **val)
+{
+	*key = strsep(&arg, "=");
+	*val = strsep(&arg, "");
+}
+
+/*
+ * Non-block interfaces and device-mapper specific code
+ */
+
+/**
+ * verity_ctr - Construct a verified mapping
+ * @ti:   Target being created
+ * @argc: Number of elements in argv
+ * @argv: Vector of key-value pairs (see below).
+ *
+ * Accepts the following keys:
+ * @payload:        hashed device
+ * @hashtree:       device hashtree is stored on
+ * @hashstart:      start address of hashes (default 0)
+ * @block_size:     size of a hash block
+ * @alg:            hash algorithm
+ * @root_hexdigest: toplevel hash of the tree
+ * @error_behavior: what to do when verification fails [optional]
+ * @salt:           salt, in hex [optional]
+ *
+ * E.g.,
+ * payload=/dev/sda2 hashtree=/dev/sda3 alg=sha256
+ * root_hexdigest=f08aa4a3695290c569eb1b0ac032ae1040150afb527abbeb0a3da33d82fb2c6e
+ *
+ * TODO(wad):
+ * - Boot time addition
+ * - Track block verification to free block_hashes if memory use is a concern
+ * Testing needed:
+ * - Regular slub_debug tracing (on checkins)
+ * - Improper block hash padding
+ * - Improper bundle padding
+ * - Improper hash layout
+ * - Missing padding at end of device
+ * - Improperly sized underlying devices
+ * - Out of memory conditions (make sure this isn't too flaky under high load!)
+ * - Incorrect superhash
+ * - Incorrect block hashes
+ * - Incorrect bundle hashes
+ * - Boot-up read speed; sustained read speeds
+ */
+static int verity_ctr(struct dm_target *ti, unsigned int argc, char **argv)
+{
+	struct verity_config *vc = NULL;
+	int ret = 0;
+	sector_t blocks;
+	unsigned int block_size = VERITY_DEFAULT_BLOCK_SIZE;
+	const char *payload = NULL;
+	const char *hashtree = NULL;
+	unsigned long hashstart = 0;
+	const char *alg = NULL;
+	const char *root_hexdigest = NULL;
+	const char *dev_error_behavior = error_behavior;
+	const char *hexsalt = "";
+	int i;
+
+	for (i = 0; i < argc; ++i) {
+		char *key, *val;
+		DMWARN("Argument %d: '%s'", i, argv[i]);
+		splitarg(argv[i], &key, &val);
+		if (!key) {
+			DMWARN("Bad argument %d: missing key?", i);
+			break;
+		}
+		if (!val) {
+			DMWARN("Bad argument %d='%s': missing value", i, key);
+			break;
+		}
+
+		if (!strcmp(key, "alg")) {
+			alg = val;
+		} else if (!strcmp(key, "payload")) {
+			payload = val;
+		} else if (!strcmp(key, "hashtree")) {
+			hashtree = val;
+		} else if (!strcmp(key, "root_hexdigest")) {
+			root_hexdigest = val;
+		} else if (!strcmp(key, "hashstart")) {
+			if (strict_strtoul(val, 10, &hashstart)) {
+				ti->error = "Invalid hashstart";
+				return -EINVAL;
+			}
+		} else if (!strcmp(key, "block_size")) {
+			unsigned long tmp;
+			if (strict_strtoul(val, 10, &tmp) ||
+			    (tmp > UINT_MAX)) {
+				ti->error = "Invalid block_size";
+				return -EINVAL;
+			}
+			block_size = (unsigned int)tmp;
+		} else if (!strcmp(key, "error_behavior")) {
+			dev_error_behavior = val;
+		} else if (!strcmp(key, "salt")) {
+			hexsalt = val;
+		} else if (!strcmp(key, "error_behavior")) {
+			dev_error_behavior = val;
+		}
+	}
+
+#define NEEDARG(n) \
+	if (!(n)) { \
+		ti->error = "Missing argument: " #n; \
+		return -EINVAL; \
+	}
+
+	NEEDARG(alg);
+	NEEDARG(payload);
+	NEEDARG(hashtree);
+	NEEDARG(root_hexdigest);
+
+#undef NEEDARG
+
+	/* The device mapper device should be setup read-only */
+	if ((dm_table_get_mode(ti->table) & ~FMODE_READ) != 0) {
+		ti->error = "Must be created readonly.";
+		return -EINVAL;
+	}
+
+	vc = kzalloc(sizeof(*vc), GFP_KERNEL);
+	if (!vc) {
+		/* TODO(wad) if this is called from the setup helper, then we
+		 * catch these errors and do a CrOS specific thing. if not, we
+		 * need to have this call the error handler.
+		 */
+		return -EINVAL;
+	}
+
+	/* Calculate the blocks from the given device size */
+	vc->size = ti->len;
+	blocks = to_bytes(vc->size) / block_size;
+	if (dm_bht_create(&vc->bht, blocks, block_size, alg)) {
+		DMERR("failed to create required bht");
+		goto bad_bht;
+	}
+	if (dm_bht_set_root_hexdigest(&vc->bht, root_hexdigest)) {
+		DMERR("root hexdigest error");
+		goto bad_root_hexdigest;
+	}
+	dm_bht_set_salt(&vc->bht, hexsalt);
+	vc->bht.read_cb = kverityd_bht_read_callback;
+
+	/* payload: device to verify */
+	vc->start = 0;  /* TODO: should this support a starting offset? */
+	/* We only ever grab the device in read-only mode. */
+	ret = dm_get_device(ti, payload,
+			    dm_table_get_mode(ti->table), &vc->dev);
+	if (ret) {
+		DMERR("Failed to acquire device '%s': %d", payload, ret);
+		ti->error = "Device lookup failed";
+		goto bad_verity_dev;
+	}
+
+	if ((to_bytes(vc->start) % block_size) ||
+	    (to_bytes(vc->size) % block_size)) {
+		ti->error = "Device must be block_size divisble/aligned";
+		goto bad_hash_start;
+	}
+
+	vc->hash_start = (sector_t)hashstart;
+
+	/* hashtree: device with hashes.
+	 * Note, payload == hashtree is okay as long as the size of
+	 *       ti->len passed to device mapper does not include
+	 *       the hashes.
+	 */
+	if (dm_get_device(ti, hashtree,
+			  dm_table_get_mode(ti->table), &vc->hash_dev)) {
+		ti->error = "Hash device lookup failed";
+		goto bad_hash_dev;
+	}
+
+	/* alg: cryptographic digest algorithm */
+	if (snprintf(vc->hash_alg, CRYPTO_MAX_ALG_NAME, "%s", alg) >=
+	    CRYPTO_MAX_ALG_NAME) {
+		ti->error = "Hash algorithm name is too long";
+		goto bad_hash;
+	}
+
+	/* override with optional device-specific error behavior */
+	vc->error_behavior = verity_parse_error_behavior(dev_error_behavior);
+	if (vc->error_behavior == -1) {
+		ti->error = "Bad error_behavior supplied";
+		goto bad_err_behavior;
+	}
+
+	/* TODO: Maybe issue a request on the io queue for block 0? */
+
+	/* Argument processing is done, setup operational data */
+	/* Pool for dm_verity_io objects */
+	vc->io_pool = mempool_create_slab_pool(MIN_IOS, _verity_io_pool);
+	if (!vc->io_pool) {
+		ti->error = "Cannot allocate verity io mempool";
+		goto bad_slab_pool;
+	}
+
+	/* Allocate the bioset used for request padding */
+	/* TODO(wad) allocate a separate bioset for the first verify maybe */
+	vc->bs = bioset_create(MIN_BIOS, 0);
+	if (!vc->bs) {
+		ti->error = "Cannot allocate verity bioset";
+		goto bad_bs;
+	}
+
+	ti->num_flush_requests = 1;
+	ti->private = vc;
+
+	/* TODO(wad) add device and hash device names */
+	{
+		char hashdev[BDEVNAME_SIZE], vdev[BDEVNAME_SIZE];
+		bdevname(vc->hash_dev->bdev, hashdev);
+		bdevname(vc->dev->bdev, vdev);
+		DMINFO("dev:%s hash:%s [sectors:%llu blocks:%llu]", vdev,
+		       hashdev, ULL(vc->bht.sectors), ULL(blocks));
+	}
+	return 0;
+
+bad_bs:
+	mempool_destroy(vc->io_pool);
+bad_slab_pool:
+bad_err_behavior:
+bad_hash:
+	dm_put_device(ti, vc->hash_dev);
+bad_hash_dev:
+bad_hash_start:
+	dm_put_device(ti, vc->dev);
+bad_verity_dev:
+bad_root_hexdigest:
+	dm_bht_destroy(&vc->bht);
+bad_bht:
+	kfree(vc);   /* hash is not secret so no need to zero */
+	return -EINVAL;
+}
+
+static void verity_dtr(struct dm_target *ti)
+{
+	struct verity_config *vc = (struct verity_config *) ti->private;
+
+	bioset_free(vc->bs);
+	mempool_destroy(vc->io_pool);
+	dm_bht_destroy(&vc->bht);
+	dm_put_device(ti, vc->hash_dev);
+	dm_put_device(ti, vc->dev);
+	kfree(vc);
+}
+
+static int verity_ioctl(struct dm_target *ti, unsigned int cmd,
+			unsigned long arg)
+{
+	struct verity_config *vc = (struct verity_config *) ti->private;
+	return __blkdev_driver_ioctl(vc->dev->bdev, vc->dev->mode, cmd, arg);
+}
+
+static int verity_status(struct dm_target *ti, status_type_t type,
+			char *result, unsigned int maxlen)
+{
+	struct verity_config *vc = (struct verity_config *) ti->private;
+	unsigned int sz = 0;
+	char hashdev[BDEVNAME_SIZE], vdev[BDEVNAME_SIZE];
+	u8 hexdigest[VERITY_MAX_DIGEST_SIZE * 2 + 1] = { 0 };
+
+	dm_bht_root_hexdigest(&vc->bht, hexdigest, sizeof(hexdigest));
+
+	switch (type) {
+	case STATUSTYPE_INFO:
+		break;
+	case STATUSTYPE_TABLE:
+		bdevname(vc->hash_dev->bdev, hashdev);
+		bdevname(vc->dev->bdev, vdev);
+		DMEMIT("/dev/%s /dev/%s %llu %u %s %s",
+			vdev,
+			hashdev,
+			ULL(vc->hash_start),
+			vc->bht.depth,
+			vc->hash_alg,
+			hexdigest);
+		break;
+	}
+	return 0;
+}
+
+static int verity_merge(struct dm_target *ti, struct bvec_merge_data *bvm,
+		       struct bio_vec *biovec, int max_size)
+{
+	struct verity_config *vc = ti->private;
+	struct request_queue *q = bdev_get_queue(vc->dev->bdev);
+
+	if (!q->merge_bvec_fn)
+		return max_size;
+
+	bvm->bi_bdev = vc->dev->bdev;
+	bvm->bi_sector = vc->start + bvm->bi_sector - ti->begin;
+
+	/* Optionally, this could just return 0 to stick to single pages. */
+	return min(max_size, q->merge_bvec_fn(q, bvm, biovec));
+}
+
+static int verity_iterate_devices(struct dm_target *ti,
+				 iterate_devices_callout_fn fn, void *data)
+{
+	struct verity_config *vc = ti->private;
+
+	return fn(ti, vc->dev, vc->start, ti->len, data);
+}
+
+static void verity_io_hints(struct dm_target *ti,
+			    struct queue_limits *limits)
+{
+	struct verity_config *vc = ti->private;
+	unsigned int block_size = vc->bht.block_size;
+
+	limits->logical_block_size = block_size;
+	limits->physical_block_size = block_size;
+	blk_limits_io_min(limits, block_size);
+}
+
+static struct target_type verity_target = {
+	.name   = "verity",
+	.version = {0, 1, 0},
+	.module = THIS_MODULE,
+	.ctr    = verity_ctr,
+	.dtr    = verity_dtr,
+	.ioctl  = verity_ioctl,
+	.map    = verity_map,
+	.merge  = verity_merge,
+	.status = verity_status,
+	.iterate_devices = verity_iterate_devices,
+	.io_hints = verity_io_hints,
+};
+
+#define VERITY_WQ_FLAGS (WQ_CPU_INTENSIVE|WQ_HIGHPRI)
+
+static int __init dm_verity_init(void)
+{
+	int r = -ENOMEM;
+
+	_verity_io_pool = KMEM_CACHE(dm_verity_io, 0);
+	if (!_verity_io_pool) {
+		DMERR("failed to allocate pool dm_verity_io");
+		goto bad_io_pool;
+	}
+
+	kverityd_ioq = alloc_workqueue("kverityd_io", VERITY_WQ_FLAGS, 1);
+	if (!kverityd_ioq) {
+		DMERR("failed to create workqueue kverityd_ioq");
+		goto bad_io_queue;
+	}
+
+	kveritydq = alloc_workqueue("kverityd", VERITY_WQ_FLAGS, 1);
+	if (!kveritydq) {
+		DMERR("failed to create workqueue kveritydq");
+		goto bad_verify_queue;
+	}
+
+	r = dm_register_target(&verity_target);
+	if (r < 0) {
+		DMERR("register failed %d", r);
+		goto register_failed;
+	}
+
+	DMINFO("version %u.%u.%u loaded", verity_target.version[0],
+	       verity_target.version[1], verity_target.version[2]);
+
+	return r;
+
+register_failed:
+	destroy_workqueue(kveritydq);
+bad_verify_queue:
+	destroy_workqueue(kverityd_ioq);
+bad_io_queue:
+	kmem_cache_destroy(_verity_io_pool);
+bad_io_pool:
+	return r;
+}
+
+static void __exit dm_verity_exit(void)
+{
+	destroy_workqueue(kveritydq);
+	destroy_workqueue(kverityd_ioq);
+
+	dm_unregister_target(&verity_target);
+	kmem_cache_destroy(_verity_io_pool);
+}
+
+module_init(dm_verity_init);
+module_exit(dm_verity_exit);
+
+MODULE_AUTHOR("The Chromium OS Authors <chromium-os-dev@chromium.org>");
+MODULE_DESCRIPTION(DM_NAME " target for transparent disk integrity checking");
+MODULE_LICENSE("GPL");
diff --git a/drivers/md/dm-verity.h b/drivers/md/dm-verity.h
new file mode 100644
index 0000000..e0664c9
--- /dev/null
+++ b/drivers/md/dm-verity.h
@@ -0,0 +1,45 @@
+/*
+ * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ *                    All Rights Reserved.
+ *
+ * This file is released under the GPLv2.
+ *
+ * Provide error types for use when creating a custom error handler.
+ * See Documentation/device-mapper/dm-verity.txt
+ */
+#ifndef DM_VERITY_H
+#define DM_VERITY_H
+
+#include <linux/notifier.h>
+
+struct dm_verity_error_state {
+	int code;
+	int transient;  /* Likely to not happen after a reboot */
+	u64 block;
+	const char *message;
+
+	sector_t dev_start;
+	sector_t dev_len;
+	struct block_device *dev;
+
+	sector_t hash_dev_start;
+	sector_t hash_dev_len;
+	struct block_device *hash_dev;
+
+	/* Final behavior after all notifications are completed. */
+	int behavior;
+};
+
+/* This enum must be matched to allowed_error_behaviors in dm-verity.c */
+enum dm_verity_error_behavior {
+	DM_VERITY_ERROR_BEHAVIOR_EIO = 0,
+	DM_VERITY_ERROR_BEHAVIOR_PANIC,
+	DM_VERITY_ERROR_BEHAVIOR_NONE,
+	DM_VERITY_ERROR_BEHAVIOR_NOTIFY
+};
+
+
+int dm_verity_register_error_notifier(struct notifier_block *nb);
+int dm_verity_unregister_error_notifier(struct notifier_block *nb);
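+
+/* A minimal sketch of a notifier consumer (my_verity_cb and my_nb are
+ * hypothetical names, not part of this interface).  The chain is only
+ * consulted when error_behavior is "notify"; returning NOTIFY_DONE lets
+ * dm-verity honor the behavior written into the error state:
+ *
+ *	static int my_verity_cb(struct notifier_block *nb,
+ *				unsigned long transient, void *data)
+ *	{
+ *		struct dm_verity_error_state *es = data;
+ *
+ *		es->behavior = DM_VERITY_ERROR_BEHAVIOR_EIO;
+ *		return NOTIFY_DONE;
+ *	}
+ *
+ *	static struct notifier_block my_nb = {
+ *		.notifier_call = my_verity_cb,
+ *	};
+ *	...
+ *	dm_verity_register_error_notifier(&my_nb);
+ */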
+
+#endif  /* DM_VERITY_H */
diff --git a/include/linux/dm-bht.h b/include/linux/dm-bht.h
new file mode 100644
index 0000000..3a4b432
--- /dev/null
+++ b/include/linux/dm-bht.h
@@ -0,0 +1,166 @@
+/*
+ * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ *
+ * Device-Mapper block hash tree interface.
+ * See Documentation/device-mapper/dm-bht.txt for details.
+ *
+ * This file is released under the GPLv2.
+ */
+#ifndef __LINUX_DM_BHT_H
+#define __LINUX_DM_BHT_H
+
+#include <crypto/hash.h>
+#include <linux/compiler.h>
+#include <linux/types.h>
+
+/* To avoid allocating memory for digest tests, we just set up a
+ * max to use for now.
+ */
+#define DM_BHT_MAX_DIGEST_SIZE 128  /* 1k hashes are unlikely for now */
+#define DM_BHT_SALT_SIZE       32   /* 256 bits of salt is a lot */
+
+/* UNALLOCATED, PENDING, READY, and VERIFIED are valid states. All other
+ * values are entry-related return codes.
+ */
+#define DM_BHT_ENTRY_VERIFIED 8  /* 'nodes' has been checked against parent */
+#define DM_BHT_ENTRY_READY 4  /* 'nodes' is loaded and available */
+#define DM_BHT_ENTRY_PENDING 2  /* 'nodes' is being loaded */
+#define DM_BHT_ENTRY_UNALLOCATED 0 /* untouched */
+#define DM_BHT_ENTRY_ERROR -1 /* entry is unsuitable for use */
+#define DM_BHT_ENTRY_ERROR_IO -2 /* I/O error on load */
+
+/* Additional possible return codes */
+#define DM_BHT_ENTRY_ERROR_MISMATCH -3 /* Digest mismatch */
+
+/* dm_bht_entry
+ * Contains dm_bht->node_count tree nodes at a given tree depth.
+ * state is used to transactionally assure that data is paged in
+ * from disk.  Since dm_bht does not keep running crypto contexts for
+ * each level, the data must be loaded for on-demand verification.
+ */
+struct dm_bht_entry {
+	atomic_t state; /* see defines */
+	/* Keeping an extra pointer per entry wastes up to ~33k of
+	 * memory if 1M blocks are used (or ~66k on a 64-bit arch)
+	 */
+	void *io_context;  /* Reserve a pointer for use during io */
+	/* data should only be non-NULL if fully populated. */
+	void *nodes;  /* The hash data used to verify the children.
+		       * Guaranteed to be page-aligned.
+		       */
+};
+
+/* dm_bht_level
+ * Contains an array of entries which represent a page of hashes where
+ * each hash is a node in the tree at the given tree depth/level.
+ */
+struct dm_bht_level {
+	struct dm_bht_entry *entries;  /* array of entries of tree nodes */
+	unsigned int count;  /* number of entries at this level */
+	sector_t sector;  /* starting sector for this level */
+};
+
+/* opaque context, start, databuf, sector_count */
+typedef int(*dm_bht_callback)(void *,  /* external context */
+			      sector_t,  /* start sector */
+			      u8 *,  /* destination page */
+			      sector_t,  /* num sectors */
+			      struct dm_bht_entry *);
+/* dm_bht - Device mapper block hash tree
+ * dm_bht provides a fixed interface for comparing data blocks
+ * against cryptographic hashes stored in a hash tree. It
+ * optimizes the tree structure for storage on disk.
+ *
+ * The tree is built from the bottom up.  A collection of data,
+ * external to the tree, is hashed and these hashes are stored
+ * as the blocks in the tree.  For some number of these hashes,
+ * a parent node is created by hashing them.  These steps are
+ * repeated.
+ *
+ * TODO(wad): All hash storage memory is pre-allocated and freed once an
+ * entire branch has been verified.
+ */
+struct dm_bht {
+	/* Configured values */
+	int depth;  /* Depth of the tree including the root */
+	unsigned int block_count;  /* Number of blocks hashed */
+	unsigned int block_size;  /* Size of a hash block */
+	char hash_alg[CRYPTO_MAX_ALG_NAME];
+	unsigned char salt[DM_BHT_SALT_SIZE];
+
+	/* Computed values */
+	unsigned int node_count;  /* Data size (in hashes) for each entry */
+	unsigned int node_count_shift;  /* first bit set - 1 */
+	/* There is one per CPU so that verification can run concurrently. */
+	struct shash_desc *hash_desc[NR_CPUS];  /* Container for the hash alg */
+	unsigned int digest_size;
+	sector_t sectors;  /* Number of disk sectors used */
+
+	/* bool verified;  Full tree is verified */
+	u8 root_digest[DM_BHT_MAX_DIGEST_SIZE];
+	struct dm_bht_level *levels;  /* in reverse order */
+	/* Callback for reading from the hash device */
+	dm_bht_callback read_cb;
+};
+
+/* Constructor for struct dm_bht instances. */
+int dm_bht_create(struct dm_bht *bht,
+		  unsigned int block_count,
+		  unsigned int block_size,
+		  const char *alg_name);
+/* Destructor for struct dm_bht instances.  Does not free @bht */
+void dm_bht_destroy(struct dm_bht *bht);
+
+/* Basic accessors for struct dm_bht */
+int dm_bht_set_root_hexdigest(struct dm_bht *bht, const u8 *hexdigest);
+int dm_bht_root_hexdigest(struct dm_bht *bht, u8 *hexdigest, int available);
+void dm_bht_set_salt(struct dm_bht *bht, const char *hexsalt);
+int dm_bht_salt(struct dm_bht *bht, char *hexsalt);
+
+/* Functions for loading in data from disk for verification */
+bool dm_bht_is_populated(struct dm_bht *bht, unsigned int block);
+int dm_bht_populate(struct dm_bht *bht, void *read_cb_ctx,
+		    unsigned int block);
+int dm_bht_verify_block(struct dm_bht *bht, unsigned int block,
+			struct page *pg, unsigned int offset);
+void dm_bht_read_completed(struct dm_bht_entry *entry, int status);
+
+/* Functions for converting indices to nodes. */
+
+static inline unsigned int dm_bht_get_level_shift(struct dm_bht *bht,
+						  int depth)
+{
+	return (bht->depth - depth) * bht->node_count_shift;
+}
+
+/* For the given depth, this is the entry index.  At depth+1 it is the node
+ * index for depth.
+ */
+static inline unsigned int dm_bht_index_at_level(struct dm_bht *bht,
+							int depth,
+							unsigned int leaf)
+{
+	return leaf >> dm_bht_get_level_shift(bht, depth);
+}
+
+static inline struct dm_bht_entry *dm_bht_get_entry(struct dm_bht *bht,
+						    int depth,
+						    unsigned int block)
+{
+	unsigned int index = dm_bht_index_at_level(bht, depth, block);
+	struct dm_bht_level *level = &bht->levels[depth];
+
+	return &level->entries[index];
+}
+
+static inline void *dm_bht_get_node(struct dm_bht *bht,
+				  struct dm_bht_entry *entry,
+				  int depth,
+				  unsigned int block)
+{
+	unsigned int index = dm_bht_index_at_level(bht, depth, block);
+	unsigned int node_index = index % bht->node_count;
+
+	return entry->nodes + (node_index * bht->digest_size);
+}
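+
+/* Worked example (a sketch, assuming sha256 over 4096-byte blocks):
+ * digest_size = 32, so node_count = 128 and node_count_shift = 7.  In a
+ * tree of depth 3, leaf block 300 falls in entry 300 >> 7 = 2 at depth 2,
+ * and dm_bht_get_node() at depth 3 picks node 300 % 128 = 44 within that
+ * entry's block of hashes.
+ */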
+#endif  /* __LINUX_DM_BHT_H */
-- 
1.7.3.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH] dm: verity target
@ 2012-01-04 21:49 Mandeep Singh Baines
  2012-01-04 22:42 ` Mandeep Singh Baines
  0 siblings, 1 reply; 22+ messages in thread
From: Mandeep Singh Baines @ 2012-01-04 21:49 UTC (permalink / raw)
  To: dm-devel, Alasdair G Kergon, linux-kernel
  Cc: Mandeep Singh Baines, Will Drewry, Elly Jones, Alasdair G Kergon,
	Milan Broz, Olof Johansson, Steffen Klassert

The verity target provides transparent integrity checking of block devices
using a cryptographic digest.

dm-verity is meant to be set up as part of a verified boot path.  This
may be anything ranging from a boot using tboot or trustedgrub to just
booting from a known-good device (like a USB drive or CD).

dm-verity is part of ChromeOS's verified boot path. It is used to verify
the integrity of the root filesystem on boot. The root filesystem is
mounted on a dm-verity partition which transparently verifies each block
with a bootloader verified hash passed into the kernel at boot.

Changes in V3:
* Discussion over irc (Alasdair G Kergon)
  * Implement ioctl hook
Changes in V2:
* https://lkml.org/lkml/2011/11/10/85 (Steffen Klassert)
  * Use shash API instead of older hash API

Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: Elly Jones <ellyjones@chromium.org>
Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Milan Broz <mbroz@redhat.com>
Cc: Olof Johansson <olofj@chromium.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: dm-devel@redhat.com
---
 Documentation/device-mapper/dm-bht.txt    |   59 ++
 Documentation/device-mapper/dm-verity.txt |   76 +++
 drivers/md/Kconfig                        |   30 +
 drivers/md/Makefile                       |    2 +
 drivers/md/dm-bht.c                       |  559 +++++++++++++++
 drivers/md/dm-verity.c                    | 1043 +++++++++++++++++++++++++++++
 drivers/md/dm-verity.h                    |   45 ++
 include/linux/dm-bht.h                    |  166 +++++
 8 files changed, 1980 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/device-mapper/dm-bht.txt
 create mode 100644 Documentation/device-mapper/dm-verity.txt
 create mode 100644 drivers/md/dm-bht.c
 create mode 100644 drivers/md/dm-verity.c
 create mode 100644 drivers/md/dm-verity.h
 create mode 100644 include/linux/dm-bht.h

diff --git a/Documentation/device-mapper/dm-bht.txt b/Documentation/device-mapper/dm-bht.txt
new file mode 100644
index 0000000..21d929f
--- /dev/null
+++ b/Documentation/device-mapper/dm-bht.txt
@@ -0,0 +1,59 @@
+dm-bht
+======
+
+dm-bht provides a block hash tree implementation.  The use of dm-bht allows
+for integrity checking of a given block device without reading the entire
+set of blocks into memory before use.
+
+In particular, dm-bht supplies an interface for creating and verifying a tree
+of cryptographic digests with any algorithm supported by the kernel crypto API.
+
+The `verity' target is the motivating example.
+
+
+Theory of operation
+===================
+
+dm-bht is logically comprised of multiple nodes organized in a tree-like
+structure.  Each node in the tree is a cryptographic hash.  If it is a leaf
+node, the hash is of some block data on disk.  If it is an intermediary node,
+then the hash is of a number of child nodes.
+
+dm-bht has a given depth starting at 1 (ignoring the root node).  Each level in
+the tree is concretely made up of dm_bht_entry structs.  Each entry in the tree
+is a collection of neighboring nodes that fit in one page-sized block.  The
+number is determined based on PAGE_SIZE and the size of the selected
+cryptographic digest algorithm.  The hashes are linearly ordered in this entry
+and any unaligned trailing space is ignored but included when calculating the
+parent node.
+
+The tree looks something like:
+
+alg=sha256, num_blocks=32767
+                                 [   root    ]
+                                /    . . .    \
+                     [entry_0]                 [entry_1]
+                    /  . . .  \                 . . .   \
+         [entry_0_0]   . . .  [entry_0_127]    . . . .  [entry_1_127]
+           / ... \             /   . . .  \             /           \
+     blk_0 ... blk_127  blk_16256   blk_16383      blk_32640 . . . blk_32767
+
+The root is treated independently from the depth, and the blocks are expected
+to be hashed and supplied to the dm-bht.  The hash blocks that make up the
+entry contents are expected to be read from disk.
+
+dm-bht does not handle I/O directly but instead expects the consumer to
+supply callbacks.  The read callback will always receive a page-aligned value
+to pass to the block device layer to read in a hash value.
+
+Usage
+=====
+
+The API provides mechanisms for reading and verifying a tree. When reading, all
+required data for the hash tree should be populated for a block before
+attempting a verify.  This can be done by calling dm_bht_populate().  When all
+data is ready, a call to dm_bht_verify_block() with the expected hash value will
+perform both the direct block hash check and the hashes of the parent and
+neighboring nodes where needed to ensure validity up to the root hash.  Note,
+dm_bht_set_root_hexdigest() should be called before any verification attempts
+occur.
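+
+As a rough sketch of the call flow (not drop-in code; my_read_cb, ctx, pg,
+block_count, block_size, block and the trusted root digest are assumptions
+supplied by the consumer):
+
+[[
+  struct dm_bht bht;
+
+  dm_bht_create(&bht, block_count, block_size, "sha256");
+  bht.read_cb = my_read_cb;           /* consumer-supplied I/O callback */
+  dm_bht_set_root_hexdigest(&bht, trusted_root_hexdigest);
+
+  dm_bht_populate(&bht, ctx, block);  /* issues reads via read_cb */
+  /* ... wait until dm_bht_is_populated(&bht, block) is true ... */
+  if (dm_bht_verify_block(&bht, block, pg, 0))
+          return -EACCES;  /* the data or the tree is corrupt */
+]]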
diff --git a/Documentation/device-mapper/dm-verity.txt b/Documentation/device-mapper/dm-verity.txt
new file mode 100644
index 0000000..f33b984
--- /dev/null
+++ b/Documentation/device-mapper/dm-verity.txt
@@ -0,0 +1,76 @@
+dm-verity
+=========
+
+Device-Mapper's "verity" target provides transparent integrity checking of
+block devices using a cryptographic digest provided by the kernel crypto API.
+This target is read-only.
+
+Parameters: payload=<device path> hashtree=<hash device path> alg=<alg> \
+            root_hexdigest=<root hash> \
+            [ hashstart=<hash start> block_size=<block size> \
+              salt=<salt> error_behavior=<error behavior> ]
+
+<device path>
+    This is the device that is going to be integrity checked.  It may be
+    a subset of the full device as specified to dmsetup (start sector and
+    count).
+    It may be specified as a path, like /dev/sdaX, or a device number,
+    <major>:<minor>.
+
+<hash device path>
+    This is the device that supplies the dm-bht hash data.  It may be
+    specified similarly to the device path and may be the same device.  If the
+    same device is used, the hash offset should be outside of the dm-verity
+    configured device size.
+
+<alg>
+    The cryptographic hash algorithm used for this device.  This should
+    be the name of the algorithm, like "sha1".
+
+<salt>
+    Salt value (in hex).  Optional; when omitted, an all-zero salt is used.
+
+<block size>
+    Size in bytes of each hash block (default 4096).
+
+<root hash>
+    The hexadecimal encoding of the cryptographic hash of all of the
+    neighboring nodes at the first level of the tree.  This hash should be
+    trusted, as there is no other source of authenticity beyond this point.
+
+<hash start>
+    Start address of hashes (default 0).
+
+<error behavior>
+    0 = return -EIO. 1 = panic. 2 = none. 3 = call notifier.  The names
+    (eio, panic, none, notify) may be given instead of the digits.
+
+Theory of operation
+===================
+
+dm-verity is meant to be set up as part of a verified boot path.  This
+may be anything ranging from a boot using tboot or trustedgrub to just
+booting from a known-good device (like a USB drive or CD).
+
+When a dm-verity device is configured, it is expected that the caller
+has been authenticated in some way (cryptographic signatures, etc).
+After instantiation, all hashes will be verified on-demand during
+disk access.  If they cannot be verified up to the root node of the
+tree, the root hash, then the I/O will fail.  This should identify
+tampering with any data on the device and the hash data.
+
+Cryptographic hashes are used to assert the integrity of the device on a
+per-block basis.  This allows for a lightweight hash computation on first read
+into the page cache.  Block hashes are stored linearly, aligned to the
+nearest page-sized block.
+
+For more information on the hashing process, see dm-bht.txt.
+
+
+Example
+=======
+
+Set up a device:
+[[
+  dmsetup create vroot --table \
+    "0 204800 verity payload=/dev/sda1 hashtree=/dev/sda2 alg=sha1 "\
+    "root_hexdigest=9f74809a2ee7607b16fcc70d9399a4de9725a727"
+]]
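+
+A variant using the optional keys (a sketch; the digest and salt values
+here are placeholders rather than real output):
+[[
+  dmsetup create vroot --table \
+    "0 204800 verity payload=/dev/sda1 hashtree=/dev/sda2 alg=sha256 "\
+    "root_hexdigest=<64 hex characters> salt=<hex salt> "\
+    "error_behavior=eio"
+]]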
+
+A command line tool is available to compute the hash tree and return the
+root hash value.
+  http://git.chromium.org/cgi-bin/gitweb.cgi?p=dm-verity.git;a=tree
diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index faa4741..3cdf95c 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -370,4 +370,34 @@ config DM_FLAKEY
        ---help---
          A target that intermittently fails I/O for debugging purposes.
 
+config DM_BHT
+        tristate "Block hash tree support"
+        select CRYPTO
+        select CRYPTO_HASH
+        ---help---
+          Include support for device-mapper devices to use a block hash
+          tree for managing data integrity checks in a scalable way.
+
+          Targets that use this functionality should include it
+          automatically.
+
+          If unsure, say N.
+
+config DM_VERITY
+        tristate "Verity target support"
+        depends on BLK_DEV_DM
+        select DM_BHT
+        select CRYPTO
+        select CRYPTO_HASH
+        ---help---
+          This device-mapper target allows you to create a device that
+          transparently integrity checks the data on it. You'll need to
+          activate the digests you're going to use in the cryptoapi
+          configuration.
+
+          To compile this code as a module, choose M here: the module will
+          be called dm-verity.
+
+          If unsure, say N.
+
 endif # MD
diff --git a/drivers/md/Makefile b/drivers/md/Makefile
index 046860c..c069953 100644
--- a/drivers/md/Makefile
+++ b/drivers/md/Makefile
@@ -39,6 +39,8 @@ obj-$(CONFIG_DM_SNAPSHOT)	+= dm-snapshot.o
 obj-$(CONFIG_DM_PERSISTENT_DATA)	+= persistent-data/
 obj-$(CONFIG_DM_MIRROR)		+= dm-mirror.o dm-log.o dm-region-hash.o
 obj-$(CONFIG_DM_LOG_USERSPACE)	+= dm-log-userspace.o
+obj-$(CONFIG_DM_BHT)            += dm-bht.o
+obj-$(CONFIG_DM_VERITY)         += dm-verity.o
 obj-$(CONFIG_DM_ZERO)		+= dm-zero.o
 obj-$(CONFIG_DM_RAID)	+= dm-raid.o
 obj-$(CONFIG_DM_THIN_PROVISIONING)	+= dm-thin-pool.o
diff --git a/drivers/md/dm-bht.c b/drivers/md/dm-bht.c
new file mode 100644
index 0000000..6eb2be3
--- /dev/null
+++ b/drivers/md/dm-bht.c
@@ -0,0 +1,559 @@
+/*
+ * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ *
+ * Device-Mapper block hash tree interface.
+ * See Documentation/device-mapper/dm-bht.txt for details.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <crypto/hash.h>
+#include <linux/atomic.h>
+#include <linux/bitops.h>
+#include <linux/bug.h>
+#include <linux/cpumask.h>
+#include <linux/device-mapper.h>
+#include <linux/dm-bht.h>
+#include <linux/err.h>
+#include <linux/errno.h>
+#include <linux/gfp.h>
+#include <linux/highmem.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/mm_types.h>
+#include <linux/scatterlist.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+
+#define DM_MSG_PREFIX "dm bht"
+
+
+/*
+ * Utilities
+ */
+
+static u8 from_hex(u8 ch)
+{
+	if ((ch >= '0') && (ch <= '9'))
+		return ch - '0';
+	if ((ch >= 'a') && (ch <= 'f'))
+		return ch - 'a' + 10;
+	if ((ch >= 'A') && (ch <= 'F'))
+		return ch - 'A' + 10;
+	return -1;
+}
+
+/**
+ * dm_bht_bin_to_hex - converts a binary stream to human-readable hex
+ * @binary:	a byte array of length @binary_len
+ * @hex:	a byte array of length @binary_len * 2 + 1
+ */
+static void dm_bht_bin_to_hex(u8 *binary, u8 *hex, unsigned int binary_len)
+{
+	while (binary_len-- > 0) {
+		sprintf((char *)hex, "%02hhx", (int)*binary);
+		hex += 2;
+		binary++;
+	}
+}
+
+/**
+ * dm_bht_hex_to_bin - converts a hex stream to binary
+ * @binary:	a byte array of length @binary_len
+ * @hex:	a byte array of length @binary_len * 2 + 1
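+ *
+ * For example, the hex string "2af0" decodes to the bytes {0x2a, 0xf0}.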
+ */
+static void dm_bht_hex_to_bin(u8 *binary, const u8 *hex,
+			      unsigned int binary_len)
+{
+	while (binary_len-- > 0) {
+		*binary = from_hex(*(hex++));
+		*binary *= 16;
+		*binary += from_hex(*(hex++));
+		binary++;
+	}
+}
+
+static void dm_bht_log_mismatch(struct dm_bht *bht, u8 *given, u8 *computed)
+{
+	u8 given_hex[DM_BHT_MAX_DIGEST_SIZE * 2 + 1];
+	u8 computed_hex[DM_BHT_MAX_DIGEST_SIZE * 2 + 1];
+
+	dm_bht_bin_to_hex(given, given_hex, bht->digest_size);
+	dm_bht_bin_to_hex(computed, computed_hex, bht->digest_size);
+	DMERR_LIMIT("%s != %s", given_hex, computed_hex);
+}
+
+/**
+ * dm_bht_compute_hash: hashes a page of data
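+ *
+ * The result is digest = H(block data || salt), so the salt must match
+ * the one used when the tree was generated.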
+ */
+static int dm_bht_compute_hash(struct dm_bht *bht, struct page *pg,
+			       unsigned int offset, u8 *digest)
+{
+	struct shash_desc *hash_desc = bht->hash_desc[smp_processor_id()];
+	void *data;
+	int err;
+
+	/* Note, this is synchronous. */
+	if (crypto_shash_init(hash_desc)) {
+		DMCRIT("failed to reinitialize crypto hash (proc:%d)",
+			smp_processor_id());
+		return -EINVAL;
+	}
+	data = kmap_atomic(pg);
+	err = crypto_shash_update(hash_desc, data + offset, bht->block_size);
+	kunmap_atomic(data);
+	if (err) {
+		DMCRIT("crypto_shash_update failed");
+		return -EINVAL;
+	}
+	if (crypto_shash_update(hash_desc, bht->salt, sizeof(bht->salt))) {
+		DMCRIT("crypto_shash_update failed");
+		return -EINVAL;
+	}
+	if (crypto_shash_final(hash_desc, digest)) {
+		DMCRIT("crypto_shash_final failed");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/*
+ * Implementation functions
+ */
+
+static int dm_bht_initialize_entries(struct dm_bht *bht)
+{
+	/* last represents the index of the last digest stored in the tree.
+	 * By walking the tree with that index, it is possible to compute the
+	 * total number of entries at each level.
+	 *
+	 * Since each entry will contain up to |node_count| nodes of the tree,
+	 * it is possible that the last index may not be at the end of a given
+	 * entry->nodes.  In that case, it is assumed the value is padded.
+	 *
+	 * Note, we treat both the tree root (1 hash) and the tree leaves
+	 * independently from the bht data structures.  Logically, the root is
+	 * depth=-1 and the block layer level is depth=bht->depth
+	 */
+	unsigned int last = bht->block_count;
+	int depth;
+
+	/* check that the largest level->count can't result in an int overflow
+	 * on allocation or sector calculation.
+	 */
+	if (((last >> bht->node_count_shift) + 1) >
+	    UINT_MAX / max((unsigned int)sizeof(struct dm_bht_entry),
+			   (unsigned int)to_sector(bht->block_size))) {
+		DMCRIT("required entries %u is too large", last + 1);
+		return -EINVAL;
+	}
+
+	/* Track the current sector location for each level so we don't have to
+	 * compute it during traversals.
+	 */
+	bht->sectors = 0;
+	for (depth = 0; depth < bht->depth; ++depth) {
+		struct dm_bht_level *level = &bht->levels[depth];
+
+		level->count = dm_bht_index_at_level(bht, depth, last) + 1;
+		level->entries = (struct dm_bht_entry *)
+				 kcalloc(level->count,
+					 sizeof(struct dm_bht_entry),
+					 GFP_KERNEL);
+		if (!level->entries) {
+			DMERR("failed to allocate entries for depth %d", depth);
+			return -ENOMEM;
+		}
+		level->sector = bht->sectors;
+		bht->sectors += level->count * to_sector(bht->block_size);
+	}
+
+	return 0;
+}
+
+/**
+ * dm_bht_create - prepares @bht for use
+ * @bht:	pointer to the dm_bht to initialize
+ * @block_count:the number of block hashes / tree leaves
+ * @block_size:	the size of a hash block, in bytes
+ * @alg_name:	crypto hash algorithm name
+ *
+ * Returns 0 on success.
+ *
+ * Callers can offset into devices by storing the data in the io callbacks.
+ */
+int dm_bht_create(struct dm_bht *bht, unsigned int block_count,
+		  unsigned int block_size, const char *alg_name)
+{
+	struct crypto_shash *tfm;
+	int size, cpu, status = 0;
+
+	bht->block_size = block_size;
+	/* Verify that PAGE_SIZE >= block_size >= SECTOR_SIZE. */
+	if ((block_size > PAGE_SIZE) ||
+	    (PAGE_SIZE % block_size) ||
+	    (to_sector(block_size) == 0))
+		return -EINVAL;
+
+	tfm = crypto_alloc_shash(alg_name, 0, 0);
+	if (IS_ERR(tfm)) {
+		DMERR("failed to allocate crypto hash '%s'", alg_name);
+		return -ENOMEM;
+	}
+	size = sizeof(struct shash_desc) + crypto_shash_descsize(tfm);
+
+	/* Pre-allocate per-cpu crypto contexts to avoid having to
+	 * kmalloc/kfree a context for every hash operation.
+	 */
+	for (cpu = 0; cpu < nr_cpu_ids; ++cpu) {
+		struct shash_desc *hash_desc = kmalloc(size, GFP_KERNEL);
+
+		bht->hash_desc[cpu] = hash_desc;
+		if (!hash_desc) {
+			DMERR("failed to allocate crypto hash contexts");
+			status = -ENOMEM;
+			goto bad_hash_alloc;
+		}
+		hash_desc->tfm = tfm;
+		hash_desc->flags = 0x0;
+	}
+	bht->digest_size = crypto_shash_digestsize(tfm);
+	/* We expect to be able to pack >=2 hashes into a block */
+	if (block_size / bht->digest_size < 2) {
+		DMERR("too few hashes fit in a block");
+		status = -EINVAL;
+		goto bad_arg;
+	}
+
+	if (bht->digest_size > DM_BHT_MAX_DIGEST_SIZE) {
+		DMERR("DM_BHT_MAX_DIGEST_SIZE too small for chosen digest");
+		status = -EINVAL;
+		goto bad_arg;
+	}
+
+	/* Configure the tree */
+	bht->block_count = block_count;
+	if (block_count == 0) {
+		DMERR("block_count must be non-zero");
+		status = -EINVAL;
+		goto bad_arg;
+	}
+
+	/* Each dm_bht_entry->nodes is one block.  The node count tracks
+	 * how many nodes fit into one entry where a node is a single
+	 * hash (message digest).
+	 */
+	bht->node_count_shift = fls(block_size / bht->digest_size) - 1;
+	/* Round down to the nearest power of two.  This makes indexing
+	 * into the tree much less painful.
+	 */
+	bht->node_count = 1 << bht->node_count_shift;
+
+	/* This is unlikely to happen, but with 64k pages, who knows. */
+	if (bht->node_count > UINT_MAX / bht->digest_size) {
+		DMERR("node_count * hash_len exceeds UINT_MAX!");
+		status = -EINVAL;
+		goto bad_arg;
+	}
+
+	bht->depth = DIV_ROUND_UP(fls(block_count - 1), bht->node_count_shift);
+
+	/* Ensure that we can safely shift by this value. */
+	if (bht->depth * bht->node_count_shift >= sizeof(unsigned int) * 8) {
+		DMERR("specified depth and node_count_shift is too large");
+		status = -EINVAL;
+		goto bad_arg;
+	}
+
+	/* Allocate levels. Each level of the tree may have an arbitrary number
+	 * of dm_bht_entry structs.  Each entry contains node_count nodes.
+	 * Each node in the tree is a cryptographic digest of either node_count
+	 * nodes on the subsequent level or of a specific block on disk.
+	 */
+	bht->levels = (struct dm_bht_level *)
+			kcalloc(bht->depth,
+				sizeof(struct dm_bht_level), GFP_KERNEL);
+	if (!bht->levels) {
+		DMERR("failed to allocate tree levels");
+		status = -ENOMEM;
+		goto bad_level_alloc;
+	}
+
+	bht->read_cb = NULL;
+
+	status = dm_bht_initialize_entries(bht);
+	if (status)
+		goto bad_entries_alloc;
+
+	/* We compute depth such that there is only 1 block at level 0. */
+	BUG_ON(bht->levels[0].count != 1);
+
+	return 0;
+
+bad_entries_alloc:
+	while (bht->depth-- > 0)
+		kfree(bht->levels[bht->depth].entries);
+	kfree(bht->levels);
+bad_level_alloc:
+bad_arg:
+bad_hash_alloc:
+	for (cpu = 0; cpu < nr_cpu_ids && bht->hash_desc[cpu]; ++cpu)
+		kfree(bht->hash_desc[cpu]);
+	crypto_free_shash(tfm);
+	return status;
+}
+EXPORT_SYMBOL(dm_bht_create);
+
+/**
+ * dm_bht_read_completed
+ * @entry:	pointer to the entry that's been loaded
+ * @status:	I/O status. Non-zero is failure.
+ * MUST always be called after a read_cb completes.
+ */
+void dm_bht_read_completed(struct dm_bht_entry *entry, int status)
+{
+	if (status) {
+		/* TODO(wad) add retry support */
+		DMCRIT("an I/O error occurred while reading entry");
+		atomic_set(&entry->state, DM_BHT_ENTRY_ERROR_IO);
+		/* entry->nodes will be freed later */
+		return;
+	}
+	BUG_ON(atomic_read(&entry->state) != DM_BHT_ENTRY_PENDING);
+	atomic_set(&entry->state, DM_BHT_ENTRY_READY);
+}
+EXPORT_SYMBOL(dm_bht_read_completed);
+
+/**
+ * dm_bht_verify_block - checks that all nodes in the path for @block are valid
+ * @bht:	pointer to a dm_bht_create()d bht
+ * @block:	specific block data is expected from
+ * @pg:		page holding the block data
+ * @offset:	offset into the page
+ *
+ * Returns 0 on success, DM_BHT_ENTRY_ERROR_MISMATCH on error.
+ */
+int dm_bht_verify_block(struct dm_bht *bht, unsigned int block,
+			struct page *pg, unsigned int offset)
+{
+	int state, depth = bht->depth;
+	u8 digest[DM_BHT_MAX_DIGEST_SIZE];
+	struct dm_bht_entry *entry;
+	void *node;
+
+	do {
+		/* Need to check that the hash of the current block is accurate
+		 * in its parent.
+		 */
+		entry = dm_bht_get_entry(bht, depth - 1, block);
+		state = atomic_read(&entry->state);
+		/* This call is only safe if all nodes along the path
+		 * are already populated (i.e. READY) via dm_bht_populate.
+		 */
+		BUG_ON(state < DM_BHT_ENTRY_READY);
+		node = dm_bht_get_node(bht, entry, depth, block);
+
+		if (dm_bht_compute_hash(bht, pg, offset, digest) ||
+		    memcmp(digest, node, bht->digest_size))
+			goto mismatch;
+
+		/* Keep the containing block of hashes to be verified in the
+		 * next pass.
+		 */
+		pg = virt_to_page(entry->nodes);
+		offset = offset_in_page(entry->nodes);
+	} while (--depth > 0 && state != DM_BHT_ENTRY_VERIFIED);
+
+	if (depth == 0 && state != DM_BHT_ENTRY_VERIFIED) {
+		if (dm_bht_compute_hash(bht, pg, offset, digest) ||
+		    memcmp(digest, bht->root_digest, bht->digest_size))
+			goto mismatch;
+		atomic_set(&entry->state, DM_BHT_ENTRY_VERIFIED);
+	}
+
+	/* Mark path to leaf as verified. */
+	for (depth++; depth < bht->depth; depth++) {
+		entry = dm_bht_get_entry(bht, depth, block);
+		/* At this point, entry can only be in VERIFIED or READY state.
+		 * So it is safe to use atomic_set instead of atomic_cmpxchg.
+		 */
+		atomic_set(&entry->state, DM_BHT_ENTRY_VERIFIED);
+	}
+
+	return 0;
+
+mismatch:
+	DMERR_LIMIT("verify_block: failed to verify hash (d=%d,bi=%u)",
+		    depth, block);
+	dm_bht_log_mismatch(bht, node, digest);
+	return DM_BHT_ENTRY_ERROR_MISMATCH;
+}
+EXPORT_SYMBOL(dm_bht_verify_block);
+
+/**
+ * dm_bht_is_populated - check that entries from disk needed to verify a given
+ *                       block are all ready
+ * @bht:	pointer to a dm_bht_create()d bht
+ * @block:	specific block data is expected from
+ *
+ * Callers may wish to call dm_bht_is_populated() when checking an io
+ * for which entries were already pending.
+ */
+bool dm_bht_is_populated(struct dm_bht *bht, unsigned int block)
+{
+	int depth;
+
+	for (depth = bht->depth - 1; depth >= 0; depth--) {
+		struct dm_bht_entry *entry = dm_bht_get_entry(bht, depth,
+							      block);
+		if (atomic_read(&entry->state) < DM_BHT_ENTRY_READY)
+			return false;
+	}
+
+	return true;
+}
+EXPORT_SYMBOL(dm_bht_is_populated);
+
+/**
+ * dm_bht_populate - reads entries from disk needed to verify a given block
+ * @bht:	pointer to a dm_bht_create()d bht
+ * @ctx:        context used for all read_cb calls on this request
+ * @block:	specific block data is expected from
+ *
+ * Returns negative value on error. Returns 0 on success.
+ */
+int dm_bht_populate(struct dm_bht *bht, void *ctx, unsigned int block)
+{
+	int depth, state;
+
+	BUG_ON(block >= bht->block_count);
+
+	for (depth = bht->depth - 1; depth >= 0; --depth) {
+		unsigned int index = dm_bht_index_at_level(bht, depth, block);
+		struct dm_bht_level *level = &bht->levels[depth];
+		struct dm_bht_entry *entry = dm_bht_get_entry(bht, depth,
+							      block);
+		state = atomic_cmpxchg(&entry->state,
+				       DM_BHT_ENTRY_UNALLOCATED,
+				       DM_BHT_ENTRY_PENDING);
+		if (state == DM_BHT_ENTRY_VERIFIED)
+			break;
+		if (state <= DM_BHT_ENTRY_ERROR)
+			goto error_state;
+		if (state != DM_BHT_ENTRY_UNALLOCATED)
+			continue;
+
+		/* Current entry is claimed for allocation and loading */
+		entry->nodes = kmalloc(bht->block_size, GFP_NOIO);
+		if (!entry->nodes)
+			goto nomem;
+
+		bht->read_cb(ctx,
+			     level->sector + to_sector(index * bht->block_size),
+			     entry->nodes, to_sector(bht->block_size), entry);
+	}
+
+	return 0;
+
+error_state:
+	DMCRIT("block %u at depth %d is in an error state", block, depth);
+	return -EPERM;
+
+nomem:
+	DMCRIT("failed to allocate memory for entry->nodes");
+	return -ENOMEM;
+}
+EXPORT_SYMBOL(dm_bht_populate);
+
+/**
+ * dm_bht_destroy - cleans up all memory used by @bht
+ * @bht:	pointer to a dm_bht_create()d bht
+ */
+void dm_bht_destroy(struct dm_bht *bht)
+{
+	int depth, cpu;
+
+	for (depth = 0; depth < bht->depth; depth++) {
+		struct dm_bht_entry *entry = bht->levels[depth].entries;
+		struct dm_bht_entry *entry_end = entry +
+						 bht->levels[depth].count;
+		for (; entry < entry_end; ++entry)
+			kfree(entry->nodes);
+		kfree(bht->levels[depth].entries);
+	}
+	kfree(bht->levels);
+	crypto_free_shash((bht->hash_desc[0])->tfm);
+	for (cpu = 0; cpu < nr_cpu_ids; ++cpu)
+		kfree(bht->hash_desc[cpu]);
+}
+EXPORT_SYMBOL(dm_bht_destroy);
+
+/*
+ * Accessors
+ */
+
+/**
+ * dm_bht_set_root_hexdigest - sets an unverified root digest hash from hex
+ * @bht:	pointer to a dm_bht_create()d bht
+ * @hexdigest:	array of u8s containing the new digest in hex
+ * Returns non-zero on error.  hexdigest should be NUL terminated.
+ */
+int dm_bht_set_root_hexdigest(struct dm_bht *bht, const u8 *hexdigest)
+{
+	/* Make sure we have at least the bytes expected */
+	if (strnlen((char *)hexdigest, bht->digest_size * 2) !=
+	    bht->digest_size * 2) {
+		DMERR("root digest length does not match hash algorithm");
+		return -1;
+	}
+	dm_bht_hex_to_bin(bht->root_digest, hexdigest, bht->digest_size);
+	return 0;
+}
+EXPORT_SYMBOL(dm_bht_set_root_hexdigest);
+
+/**
+ * dm_bht_root_hexdigest - returns root digest in hex
+ * @bht:	pointer to a dm_bht_create()d bht
+ * @hexdigest:	u8 array of size @available
+ * @available:	must be bht->digest_size * 2 + 1
+ */
+int dm_bht_root_hexdigest(struct dm_bht *bht, u8 *hexdigest, int available)
+{
+	if (available < 0 ||
+	    ((unsigned int) available) < bht->digest_size * 2 + 1) {
+		DMERR("hexdigest has too few bytes available");
+		return -EINVAL;
+	}
+	dm_bht_bin_to_hex(bht->root_digest, hexdigest, bht->digest_size);
+	return 0;
+}
+EXPORT_SYMBOL(dm_bht_root_hexdigest);
+
+/**
+ * dm_bht_set_salt - sets the salt used, in hex
+ * @bht:      pointer to a dm_bht_create()d bht
+ * @hexsalt:  salt string, as hex; will be zero-padded or truncated to
+ *            DM_BHT_SALT_SIZE * 2 hex digits.
+ */
+void dm_bht_set_salt(struct dm_bht *bht, const char *hexsalt)
+{
+	size_t saltlen = min(strlen(hexsalt) / 2, sizeof(bht->salt));
+
+	memset(bht->salt, 0, sizeof(bht->salt));
+	dm_bht_hex_to_bin(bht->salt, (const u8 *)hexsalt, saltlen);
+}
+EXPORT_SYMBOL(dm_bht_set_salt);
+
+/**
+ * dm_bht_salt - returns the salt used, in hex
+ * @bht:      pointer to a dm_bht_create()d bht
+ * @hexsalt:  buffer to put salt into, of length DM_BHT_SALT_SIZE * 2 + 1.
+ */
+int dm_bht_salt(struct dm_bht *bht, char *hexsalt)
+{
+	dm_bht_bin_to_hex(bht->salt, (u8 *)hexsalt, sizeof(bht->salt));
+	return 0;
+}
+EXPORT_SYMBOL(dm_bht_salt);
+
diff --git a/drivers/md/dm-verity.c b/drivers/md/dm-verity.c
new file mode 100644
index 0000000..a9bd0e8
--- /dev/null
+++ b/drivers/md/dm-verity.c
@@ -0,0 +1,1043 @@
+/*
+ * Originally based on dm-crypt.c,
+ * Copyright (C) 2003 Christophe Saout <christophe@saout.de>
+ * Copyright (C) 2004 Clemens Fruhwirth <clemens@endorphin.org>
+ * Copyright (C) 2006-2008 Red Hat, Inc. All rights reserved.
+ * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ *                    All Rights Reserved.
+ *
+ * This file is released under the GPLv2.
+ *
+ * Implements a verifying transparent block device.
+ * See Documentation/device-mapper/dm-verity.txt
+ */
+#include <linux/async.h>
+#include <linux/atomic.h>
+#include <linux/bio.h>
+#include <linux/blkdev.h>
+#include <linux/delay.h>
+#include <linux/device.h>
+#include <linux/err.h>
+#include <linux/genhd.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/mempool.h>
+#include <linux/mm_types.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/workqueue.h>
+#include <linux/device-mapper.h>
+#include <linux/dm-bht.h>
+
+#include "dm-verity.h"
+
+#define DM_MSG_PREFIX "verity"
+
+/* Supports up to 512-bit digests */
+#define VERITY_MAX_DIGEST_SIZE 64
+
+/* TODO(wad) make both of these report the error line/file to a
+ *           verity_bug function.
+ */
+#define VERITY_BUG(msg...) BUG()
+#define VERITY_BUG_ON(cond, msg...) BUG_ON(cond)
+
+/* Helper for printing sector_t */
+#define ULL(x) ((unsigned long long)(x))
+
+#define MIN_IOS 32
+#define MIN_BIOS (MIN_IOS * 2)
+#define VERITY_DEFAULT_BLOCK_SIZE 4096
+
+/* Provide a lightweight means of specifying the global default for
+ * error behavior: eio, panic, none, or notify.
+ * Legacy support for 0 = eio, 1 = reboot/panic, 2 = none, 3 = notify.
+ * This is matched to the enum in dm-verity.h.
+ */
+static const char * const allowed_error_behaviors[] = { "eio", "panic", "none",
+							"notify", NULL };
+static char *error_behavior = "eio";
+module_param(error_behavior, charp, 0644);
+MODULE_PARM_DESC(error_behavior, "Behavior on error "
+				 "(eio, panic, none, notify)");
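+
+/* For example (a sketch): "modprobe dm-verity error_behavior=panic", or
+ * dm_verity.error_behavior=panic on the kernel command line when built
+ * in, overrides the "eio" default for targets that do not set their own
+ * error_behavior.
+ */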
+
+/* Controls whether verity_get_device will wait forever for a device. */
+static bool dev_wait;
+module_param(dev_wait, bool, 0444);
+MODULE_PARM_DESC(dev_wait, "Wait forever for a backing device");
+
+/* per-requested-bio private data */
+enum verity_io_flags {
+	VERITY_IOFLAGS_CLONED = 0x1,	/* original bio has been cloned */
+};
+
+struct dm_verity_io {
+	struct dm_target *target;
+	struct bio *bio;
+	struct delayed_work work;
+	unsigned int flags;
+
+	int error;
+	atomic_t pending;
+
+	u64 block;  /* aligned block index */
+	u64 count;  /* aligned count in blocks */
+};
+
+struct verity_config {
+	struct dm_dev *dev;
+	sector_t start;
+	sector_t size;
+
+	struct dm_dev *hash_dev;
+	sector_t hash_start;
+
+	struct dm_bht bht;
+
+	/* Pool required for io contexts */
+	mempool_t *io_pool;
+	/* Pool and bios required for making sure that backing device reads are
+	 * in PAGE_SIZE increments.
+	 */
+	struct bio_set *bs;
+
+	char hash_alg[CRYPTO_MAX_ALG_NAME];
+
+	int error_behavior;
+};
+
+static struct kmem_cache *_verity_io_pool;
+static struct workqueue_struct *kveritydq, *kverityd_ioq;
+
+static void kverityd_verify(struct work_struct *work);
+static void kverityd_io(struct work_struct *work);
+static void kverityd_io_bht_populate(struct dm_verity_io *io);
+static void kverityd_io_bht_populate_end(struct bio *, int error);
+
+static BLOCKING_NOTIFIER_HEAD(verity_error_notifier);
+
+/*
+ * Exported interfaces
+ */
+
+int dm_verity_register_error_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_register(&verity_error_notifier, nb);
+}
+EXPORT_SYMBOL_GPL(dm_verity_register_error_notifier);
+
+int dm_verity_unregister_error_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_unregister(&verity_error_notifier, nb);
+}
+EXPORT_SYMBOL_GPL(dm_verity_unregister_error_notifier);
+
+/*
+ * Allocation and utility functions
+ */
+
+static void kverityd_src_io_read_end(struct bio *clone, int error);
+
+/* Shared destructor for all internal bios */
+static void dm_verity_bio_destructor(struct bio *bio)
+{
+	struct dm_verity_io *io = bio->bi_private;
+	struct verity_config *vc = io->target->private;
+	bio_free(bio, vc->bs);
+}
+
+static struct bio *verity_alloc_bioset(struct verity_config *vc, gfp_t gfp_mask,
+				       int nr_iovecs)
+{
+	return bio_alloc_bioset(gfp_mask, nr_iovecs, vc->bs);
+}
+
+static struct dm_verity_io *verity_io_alloc(struct dm_target *ti,
+					    struct bio *bio)
+{
+	struct verity_config *vc = ti->private;
+	sector_t sector = bio->bi_sector - ti->begin;
+	struct dm_verity_io *io;
+
+	io = mempool_alloc(vc->io_pool, GFP_NOIO);
+	if (unlikely(!io))
+		return NULL;
+	io->flags = 0;
+	io->target = ti;
+	io->bio = bio;
+	io->error = 0;
+
+	/* Adjust the sector by the virtual starting sector */
+	io->block = to_bytes(sector) / vc->bht.block_size;
+	io->count = bio->bi_size / vc->bht.block_size;
+
+	atomic_set(&io->pending, 0);
+
+	return io;
+}
+
+static struct bio *verity_bio_clone(struct dm_verity_io *io)
+{
+	struct verity_config *vc = io->target->private;
+	struct bio *bio = io->bio;
+	struct bio *clone = verity_alloc_bioset(vc, GFP_NOIO, bio->bi_max_vecs);
+
+	if (!clone)
+		return NULL;
+
+	__bio_clone(clone, bio);
+	clone->bi_private = io;
+	clone->bi_end_io  = kverityd_src_io_read_end;
+	clone->bi_bdev    = vc->dev->bdev;
+	clone->bi_sector += vc->start - io->target->begin;
+	clone->bi_destructor = dm_verity_bio_destructor;
+
+	return clone;
+}
+
+/* If the request is not successful, this handler takes action.
+ * TODO make this call a registered handler.
+ */
+static void verity_error(struct verity_config *vc, struct dm_verity_io *io,
+			 int error)
+{
+	const char *message;
+	int error_mode = DM_VERITY_ERROR_BEHAVIOR_PANIC;
+	dev_t devt = 0;
+	u64 block = ~0;
+	int transient = 1;
+	struct dm_verity_error_state error_state;
+
+	if (vc) {
+		devt = vc->dev->bdev->bd_dev;
+		error_mode = vc->error_behavior;
+	}
+
+	if (io) {
+		io->error = -EIO;
+		block = io->block;
+	}
+
+	switch (error) {
+	case -ENOMEM:
+		message = "out of memory";
+		break;
+	case -EBUSY:
+		message = "pending data seen during verify";
+		break;
+	case -EFAULT:
+		message = "crypto operation failure";
+		break;
+	case -EACCES:
+		message = "integrity failure";
+		/* Image is bad. */
+		transient = 0;
+		break;
+	case -EPERM:
+		message = "hash tree population failure";
+		/* Should be dm-bht specific errors */
+		transient = 0;
+		break;
+	case -EINVAL:
+		message = "unexpected missing/invalid data";
+		/* The device was configured incorrectly - fallback. */
+		transient = 0;
+		break;
+	default:
+		/* Other errors can be passed through as IO errors */
+		message = "unknown or I/O error";
+		return;
+	}
+
+	DMERR_LIMIT("verification failure occurred: %s", message);
+
+	if (error_mode == DM_VERITY_ERROR_BEHAVIOR_NOTIFY) {
+		error_state.code = error;
+		error_state.transient = transient;
+		error_state.block = block;
+		error_state.message = message;
+		error_state.dev_start = vc->start;
+		error_state.dev_len = vc->size;
+		error_state.dev = vc->dev->bdev;
+		error_state.hash_dev_start = vc->hash_start;
+		error_state.hash_dev_len = vc->bht.sectors;
+		error_state.hash_dev = vc->hash_dev->bdev;
+
+		/* Set default fallthrough behavior. */
+		error_state.behavior = DM_VERITY_ERROR_BEHAVIOR_PANIC;
+		error_mode = DM_VERITY_ERROR_BEHAVIOR_PANIC;
+
+		if (!blocking_notifier_call_chain(
+		    &verity_error_notifier, transient, &error_state)) {
+			error_mode = error_state.behavior;
+		}
+	}
+
+	switch (error_mode) {
+	case DM_VERITY_ERROR_BEHAVIOR_EIO:
+		break;
+	case DM_VERITY_ERROR_BEHAVIOR_NONE:
+		if (error != -EIO && io)
+			io->error = 0;
+		break;
+	default:
+		goto do_panic;
+	}
+	return;
+
+do_panic:
+	panic("dm-verity failure: "
+	      "device:%u:%u error:%d block:%llu message:%s",
+	      MAJOR(devt), MINOR(devt), error, ULL(block), message);
+}
+
+/**
+ * verity_parse_error_behavior - parse a behavior charp to the enum
+ * @behavior:	NUL-terminated char array
+ *
+ * Checks if the behavior is valid either as text or as an index digit
+ * and returns the proper enum value or -1 on error.
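+ *
+ * For example, both "panic" and "1" map to DM_VERITY_ERROR_BEHAVIOR_PANIC.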
+ */
+static int verity_parse_error_behavior(const char *behavior)
+{
+	const char * const *allowed = allowed_error_behaviors;
+	char index = '0';
+
+	for (; *allowed; allowed++, index++)
+		if (!strcmp(*allowed, behavior) || behavior[0] == index)
+			break;
+
+	if (!*allowed)
+		return -1;
+
+	/* Convert to the integer index matching the enum. */
+	return allowed - allowed_error_behaviors;
+}
+
+/*
+ * Reverse flow of requests into the device.
+ *
+ * (Start at the bottom with verity_map and work your way upward).
+ */
+
+static void verity_inc_pending(struct dm_verity_io *io);
+
+static void verity_return_bio_to_caller(struct dm_verity_io *io)
+{
+	struct verity_config *vc = io->target->private;
+
+	if (io->error)
+		verity_error(vc, io, io->error);
+
+	bio_endio(io->bio, io->error);
+	mempool_free(io, vc->io_pool);
+}
+
+/* Check for any missing bht hashes. */
+static bool verity_is_bht_populated(struct dm_verity_io *io)
+{
+	struct verity_config *vc = io->target->private;
+	u64 block;
+
+	for (block = io->block; block < io->block + io->count; ++block)
+		if (!dm_bht_is_populated(&vc->bht, block))
+			return false;
+
+	return true;
+}
+
+/* verity_dec_pending manages the lifetime of all dm_verity_io structs.
+ * Non-bug error handling is centralized through this interface, as is
+ * all passage from workqueue to workqueue.
+ */
+static void verity_dec_pending(struct dm_verity_io *io)
+{
+	if (!atomic_dec_and_test(&io->pending))
+		goto done;
+
+	if (unlikely(io->error))
+		goto io_error;
+
+	/* I/Os that were pending may now be ready */
+	if (verity_is_bht_populated(io)) {
+		INIT_DELAYED_WORK(&io->work, kverityd_verify);
+		queue_delayed_work(kveritydq, &io->work, 0);
+	} else {
+		INIT_DELAYED_WORK(&io->work, kverityd_io);
+		queue_delayed_work(kverityd_ioq, &io->work, HZ/10);
+	}
+
+done:
+	return;
+
+io_error:
+	verity_return_bio_to_caller(io);
+}
+
+/* Walks the data set and computes the hash of the data read from the
+ * untrusted source device.  The computed hash is then passed to dm-bht
+ * for verification.
+ */
+static int verity_verify(struct verity_config *vc,
+			 struct dm_verity_io *io)
+{
+	unsigned int block_size = vc->bht.block_size;
+	struct bio *bio = io->bio;
+	u64 block = io->block;
+	unsigned int idx;
+	int r;
+
+	for (idx = bio->bi_idx; idx < bio->bi_vcnt; idx++) {
+		struct bio_vec *bv = bio_iovec_idx(bio, idx);
+		unsigned int offset = bv->bv_offset;
+		unsigned int len = bv->bv_len;
+
+		VERITY_BUG_ON(offset % block_size);
+		VERITY_BUG_ON(len % block_size);
+
+		while (len) {
+			r = dm_bht_verify_block(&vc->bht, block,
+						bv->bv_page, offset);
+			if (r)
+				goto bad_return;
+
+			offset += block_size;
+			len -= block_size;
+			block++;
+			cond_resched();
+		}
+	}
+
+	return 0;
+
+bad_return:
+	/* dm_bht functions aren't expected to return errno-friendly
+	 * values.  They are converted here for uniformity.
+	 */
+	if (r > 0) {
+		DMERR("Pending data for block %llu seen at verify", ULL(block));
+		r = -EBUSY;
+	} else {
+		DMERR_LIMIT("Block hash does not match!");
+		r = -EACCES;
+	}
+	return r;
+}
+
+/* Services the verify workqueue */
+static void kverityd_verify(struct work_struct *work)
+{
+	struct delayed_work *dwork = container_of(work, struct delayed_work,
+						  work);
+	struct dm_verity_io *io = container_of(dwork, struct dm_verity_io,
+					       work);
+	struct verity_config *vc = io->target->private;
+
+	io->error = verity_verify(vc, io);
+
+	/* Free up the bio and tag with the return value */
+	verity_return_bio_to_caller(io);
+}
+
+/* Asynchronously called upon the completion of dm-bht I/O.  The status
+ * of the operation is passed back to dm-bht and the next steps are
+ * decided by verity_dec_pending.
+ */
+static void kverityd_io_bht_populate_end(struct bio *bio, int error)
+{
+	struct dm_bht_entry *entry = (struct dm_bht_entry *) bio->bi_private;
+	struct dm_verity_io *io = (struct dm_verity_io *) entry->io_context;
+
+	/* Tell the tree to atomically update now that we've populated
+	 * the given entry.
+	 */
+	dm_bht_read_completed(entry, error);
+
+	/* Clean up for reuse when reading data to be checked */
+	bio->bi_vcnt = 0;
+	bio->bi_io_vec->bv_offset = 0;
+	bio->bi_io_vec->bv_len = 0;
+	bio->bi_io_vec->bv_page = NULL;
+	/* Restore the private data to I/O so the destructor can be shared. */
+	bio->bi_private = (void *) io;
+	bio_put(bio);
+
+	/* We bail but assume the tree has been marked bad. */
+	if (unlikely(error)) {
+		DMERR("Failed to read for sector %llu (%u)",
+		      ULL(io->bio->bi_sector), io->bio->bi_size);
+		io->error = error;
+		/* Pass through the error to verity_dec_pending below */
+	}
+	/* When pending = 0, it will transition to reading real data */
+	verity_dec_pending(io);
+}
+
+/* Called by dm-bht (via dm_bht_populate), this function provides
+ * the message digests to dm-bht that are stored on disk.
+ */
+static int kverityd_bht_read_callback(void *ctx, sector_t start, u8 *dst,
+				      sector_t count,
+				      struct dm_bht_entry *entry)
+{
+	struct dm_verity_io *io = ctx;  /* I/O for this batch */
+	struct verity_config *vc;
+	struct bio *bio;
+
+	vc = io->target->private;
+
+	/* The I/O context is nested inside the entry so that we don't need one
+	 * io context per page read.
+	 */
+	entry->io_context = ctx;
+
+	/* We should only get page size requests at present. */
+	verity_inc_pending(io);
+	bio = verity_alloc_bioset(vc, GFP_NOIO, 1);
+	if (unlikely(!bio)) {
+		DMCRIT("Out of memory at bio_alloc_bioset");
+		dm_bht_read_completed(entry, -ENOMEM);
+		return -ENOMEM;
+	}
+	bio->bi_private = (void *) entry;
+	bio->bi_idx = 0;
+	bio->bi_size = vc->bht.block_size;
+	bio->bi_sector = vc->hash_start + start;
+	bio->bi_bdev = vc->hash_dev->bdev;
+	bio->bi_end_io = kverityd_io_bht_populate_end;
+	bio->bi_rw = REQ_META;
+	/* Only need to free the bio since the page is managed by bht */
+	bio->bi_destructor = dm_verity_bio_destructor;
+	bio->bi_vcnt = 1;
+	bio->bi_io_vec->bv_offset = offset_in_page(dst);
+	bio->bi_io_vec->bv_len = to_bytes(count);
+	/* dst is guaranteed to be a page_pool allocation */
+	bio->bi_io_vec->bv_page = virt_to_page(dst);
+	/* Track that this I/O is in use.  There should be no risk of the io
+	 * being removed prior since this is called synchronously.
+	 */
+	generic_make_request(bio);
+	return 0;
+}
+
+/* Submits an io request for each missing block of block hashes.
+ * The last one to return will then enqueue this on the io workqueue.
+ */
+static void kverityd_io_bht_populate(struct dm_verity_io *io)
+{
+	struct verity_config *vc = io->target->private;
+	u64 block;
+
+	for (block = io->block; block < io->block + io->count; ++block) {
+		int ret = dm_bht_populate(&vc->bht, io, block);
+
+		if (ret < 0) {
+			/* verity_dec_pending will handle the error case. */
+			io->error = ret;
+			break;
+		}
+	}
+}
+
+/* Asynchronously called upon the completion of I/O issued
+ * from kverityd_src_io_read. verity_dec_pending() acts as
+ * the scheduler/flow manager.
+ */
+static void kverityd_src_io_read_end(struct bio *clone, int error)
+{
+	struct dm_verity_io *io = clone->bi_private;
+
+	if (unlikely(!bio_flagged(clone, BIO_UPTODATE) && !error))
+		error = -EIO;
+
+	if (unlikely(error)) {
+		DMERR("Error occurred: %d (%llu, %u)",
+			error, ULL(clone->bi_sector), clone->bi_size);
+		io->error = error;
+	}
+
+	/* Release the clone, which prevents the block layer from
+	 * leaving offsets, etc. in unexpected states.
+	 */
+	bio_put(clone);
+
+	verity_dec_pending(io);
+}
+
+/* If not yet underway, an I/O request will be issued to the vc->dev
+ * device for the data needed. It is cloned to avoid unexpected changes
+ * to the original bio struct.
+ */
+static void kverityd_src_io_read(struct dm_verity_io *io)
+{
+	struct bio *clone;
+
+	/* Check if the read is already issued. */
+	if (io->flags & VERITY_IOFLAGS_CLONED)
+		return;
+
+	io->flags |= VERITY_IOFLAGS_CLONED;
+
+	/* Clone the bio. The block layer may modify the bvec array. */
+	clone = verity_bio_clone(io);
+	if (unlikely(!clone)) {
+		io->error = -ENOMEM;
+		return;
+	}
+
+	verity_inc_pending(io);
+
+	generic_make_request(clone);
+}
+
+/* kverityd_io services the I/O workqueue. For each pass through
+ * the I/O workqueue, a call to populate both the origin drive
+ * data and the hash tree data is made.
+ */
+static void kverityd_io(struct work_struct *work)
+{
+	struct delayed_work *dwork = container_of(work, struct delayed_work,
+						  work);
+	struct dm_verity_io *io = container_of(dwork, struct dm_verity_io,
+					       work);
+
+	/* Issue requests asynchronously. */
+	verity_inc_pending(io);
+	kverityd_src_io_read(io);
+	kverityd_io_bht_populate(io);
+	verity_dec_pending(io);
+}
+
+/* Paired with verity_dec_pending, the pending value in the io dictates
+ * the lifetime of a request and when it is ready to be processed on the
+ * workqueues.
+ */
+static void verity_inc_pending(struct dm_verity_io *io)
+{
+	atomic_inc(&io->pending);
+}
+
+/* Block-level requests start here. */
+static int verity_map(struct dm_target *ti, struct bio *bio,
+		      union map_info *map_context)
+{
+	struct dm_verity_io *io;
+	struct verity_config *vc;
+	struct request_queue *r_queue;
+
+	if (unlikely(!ti)) {
+		DMERR("dm_target was NULL");
+		return -EIO;
+	}
+
+	vc = ti->private;
+	r_queue = bdev_get_queue(vc->dev->bdev);
+
+	if (bio_data_dir(bio) == WRITE) {
+		/* If we silently drop writes, then the VFS layer will cache
+		 * the write and persist it in memory. While it doesn't change
+		 * the underlying storage, it still may be contrary to the
+		 * behavior expected by a verified, read-only device.
+		 */
+		DMWARN_LIMIT("write request received. rejecting with -EIO.");
+		verity_error(vc, NULL, -EIO);
+		return -EIO;
+	} else {
+		/* Queue up the request to be verified */
+		io = verity_io_alloc(ti, bio);
+		if (!io) {
+			DMERR_LIMIT("Failed to allocate and init IO data");
+			return DM_MAPIO_REQUEUE;
+		}
+		INIT_DELAYED_WORK(&io->work, kverityd_io);
+		queue_delayed_work(kverityd_ioq, &io->work, 0);
+	}
+
+	return DM_MAPIO_SUBMITTED;
+}
+
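+/* Splits "key=value" at the first '='; *val is left NULL when no '=' is
+ * present.
+ */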
+static void splitarg(char *arg, char **key, char **val)
+{
+	*key = strsep(&arg, "=");
+	*val = strsep(&arg, "");
+}
+
+/*
+ * Non-block interfaces and device-mapper specific code
+ */
+
+/**
+ * verity_ctr - Construct a verified mapping
+ * @ti:   Target being created
+ * @argc: Number of elements in argv
+ * @argv: Vector of key-value pairs (see below).
+ *
+ * Accepts the following keys:
+ * @payload:        hashed device
+ * @hashtree:       device hashtree is stored on
+ * @hashstart:      start address of hashes (default 0)
+ * @block_size:     size of a hash block
+ * @alg:            hash algorithm
+ * @root_hexdigest: toplevel hash of the tree
+ * @error_behavior: what to do when verification fails [optional]
+ * @salt:           salt, in hex [optional]
+ *
+ * E.g.,
+ * payload=/dev/sda2 hashtree=/dev/sda3 alg=sha256
+ * root_hexdigest=f08aa4a3695290c569eb1b0ac032ae1040150afb527abbeb0a3da33d82fb2c6e
+ *
+ * TODO(wad):
+ * - Boot time addition
+ * - Track block verification to free block_hashes if memory use is a concern
+ * Testing needed:
+ * - Regular slub_debug tracing (on checkins)
+ * - Improper block hash padding
+ * - Improper bundle padding
+ * - Improper hash layout
+ * - Missing padding at end of device
+ * - Improperly sized underlying devices
+ * - Out of memory conditions (make sure this isn't too flaky under high load!)
+ * - Incorrect superhash
+ * - Incorrect block hashes
+ * - Incorrect bundle hashes
+ * - Boot-up read speed; sustained read speeds
+ */
+static int verity_ctr(struct dm_target *ti, unsigned int argc, char **argv)
+{
+	struct verity_config *vc = NULL;
+	int ret = 0;
+	sector_t blocks;
+	unsigned int block_size = VERITY_DEFAULT_BLOCK_SIZE;
+	const char *payload = NULL;
+	const char *hashtree = NULL;
+	unsigned long hashstart = 0;
+	const char *alg = NULL;
+	const char *root_hexdigest = NULL;
+	const char *dev_error_behavior = error_behavior;
+	const char *hexsalt = "";
+	int i;
+
+	for (i = 0; i < argc; ++i) {
+		char *key, *val;
+		DMWARN("Argument %d: '%s'", i, argv[i]);
+		splitarg(argv[i], &key, &val);
+		if (!key) {
+			DMWARN("Bad argument %d: missing key?", i);
+			break;
+		}
+		if (!val) {
+			DMWARN("Bad argument %d='%s': missing value", i, key);
+			break;
+		}
+
+		if (!strcmp(key, "alg")) {
+			alg = val;
+		} else if (!strcmp(key, "payload")) {
+			payload = val;
+		} else if (!strcmp(key, "hashtree")) {
+			hashtree = val;
+		} else if (!strcmp(key, "root_hexdigest")) {
+			root_hexdigest = val;
+		} else if (!strcmp(key, "hashstart")) {
+			if (strict_strtoul(val, 10, &hashstart)) {
+				ti->error = "Invalid hashstart";
+				return -EINVAL;
+			}
+		} else if (!strcmp(key, "block_size")) {
+			unsigned long tmp;
+			if (strict_strtoul(val, 10, &tmp) ||
+			    (tmp > UINT_MAX)) {
+				ti->error = "Invalid block_size";
+				return -EINVAL;
+			}
+			block_size = (unsigned int)tmp;
+		} else if (!strcmp(key, "error_behavior")) {
+			dev_error_behavior = val;
+		} else if (!strcmp(key, "salt")) {
+			hexsalt = val;
+		}
+	}
+
+#define NEEDARG(n) \
+	if (!(n)) { \
+		ti->error = "Missing argument: " #n; \
+		return -EINVAL; \
+	}
+
+	NEEDARG(alg);
+	NEEDARG(payload);
+	NEEDARG(hashtree);
+	NEEDARG(root_hexdigest);
+
+#undef NEEDARG
+
+	/* The device mapper device should be set up read-only */
+	if ((dm_table_get_mode(ti->table) & ~FMODE_READ) != 0) {
+		ti->error = "Must be created readonly.";
+		return -EINVAL;
+	}
+
+	vc = kzalloc(sizeof(*vc), GFP_KERNEL);
+	if (!vc) {
+		/* TODO(wad) if this is called from the setup helper, then we
+		 * catch these errors and do a CrOS specific thing. if not, we
+		 * need to have this call the error handler.
+		 */
+		return -EINVAL;
+	}
+
+	/* Calculate the blocks from the given device size */
+	vc->size = ti->len;
+	blocks = to_bytes(vc->size) / block_size;
+	if (dm_bht_create(&vc->bht, blocks, block_size, alg)) {
+		DMERR("failed to create required bht");
+		goto bad_bht;
+	}
+	if (dm_bht_set_root_hexdigest(&vc->bht, root_hexdigest)) {
+		DMERR("root hexdigest error");
+		goto bad_root_hexdigest;
+	}
+	dm_bht_set_salt(&vc->bht, hexsalt);
+	vc->bht.read_cb = kverityd_bht_read_callback;
+
+	/* payload: device to verify */
+	vc->start = 0;  /* TODO: should this support a starting offset? */
+	/* We only ever grab the device in read-only mode. */
+	ret = dm_get_device(ti, payload,
+			    dm_table_get_mode(ti->table), &vc->dev);
+	if (ret) {
+		DMERR("Failed to acquire device '%s': %d", payload, ret);
+		ti->error = "Device lookup failed";
+		goto bad_verity_dev;
+	}
+
+	if ((to_bytes(vc->start) % block_size) ||
+	    (to_bytes(vc->size) % block_size)) {
+		ti->error = "Device must be block_size divisible/aligned";
+		goto bad_hash_start;
+	}
+
+	vc->hash_start = (sector_t)hashstart;
+
+	/* hashtree: device with hashes.
+	 * Note, payload == hashtree is okay as long as the size of
+	 *       ti->len passed to device mapper does not include
+	 *       the hashes.
+	 */
+	if (dm_get_device(ti, hashtree,
+			  dm_table_get_mode(ti->table), &vc->hash_dev)) {
+		ti->error = "Hash device lookup failed";
+		goto bad_hash_dev;
+	}
+
+	/* alg: cryptographic digest algorithm */
+	if (snprintf(vc->hash_alg, CRYPTO_MAX_ALG_NAME, "%s", alg) >=
+	    CRYPTO_MAX_ALG_NAME) {
+		ti->error = "Hash algorithm name is too long";
+		goto bad_hash;
+	}
+
+	/* override with optional device-specific error behavior */
+	vc->error_behavior = verity_parse_error_behavior(dev_error_behavior);
+	if (vc->error_behavior == -1) {
+		ti->error = "Bad error_behavior supplied";
+		goto bad_err_behavior;
+	}
+
+	/* TODO: Maybe issue a request on the io queue for block 0? */
+
+	/* Argument processing is done, setup operational data */
+	/* Pool for dm_verity_io objects */
+	vc->io_pool = mempool_create_slab_pool(MIN_IOS, _verity_io_pool);
+	if (!vc->io_pool) {
+		ti->error = "Cannot allocate verity io mempool";
+		goto bad_slab_pool;
+	}
+
+	/* Allocate the bioset used for request padding */
+	/* TODO(wad) allocate a separate bioset for the first verify maybe */
+	vc->bs = bioset_create(MIN_BIOS, 0);
+	if (!vc->bs) {
+		ti->error = "Cannot allocate verity bioset";
+		goto bad_bs;
+	}
+
+	ti->num_flush_requests = 1;
+	ti->private = vc;
+
+	/* TODO(wad) add device and hash device names */
+	{
+		char hashdev[BDEVNAME_SIZE], vdev[BDEVNAME_SIZE];
+		bdevname(vc->hash_dev->bdev, hashdev);
+		bdevname(vc->dev->bdev, vdev);
+		DMINFO("dev:%s hash:%s [sectors:%llu blocks:%llu]", vdev,
+		       hashdev, ULL(vc->bht.sectors), ULL(blocks));
+	}
+	return 0;
+
+bad_bs:
+	mempool_destroy(vc->io_pool);
+bad_slab_pool:
+bad_err_behavior:
+bad_hash:
+	dm_put_device(ti, vc->hash_dev);
+bad_hash_dev:
+bad_hash_start:
+	dm_put_device(ti, vc->dev);
+bad_bht:
+bad_root_hexdigest:
+bad_verity_dev:
+	kfree(vc);   /* hash is not secret so no need to zero */
+	return -EINVAL;
+}
+
+static void verity_dtr(struct dm_target *ti)
+{
+	struct verity_config *vc = (struct verity_config *) ti->private;
+
+	bioset_free(vc->bs);
+	mempool_destroy(vc->io_pool);
+	dm_bht_destroy(&vc->bht);
+	dm_put_device(ti, vc->hash_dev);
+	dm_put_device(ti, vc->dev);
+	kfree(vc);
+}
+
+static int verity_status(struct dm_target *ti, status_type_t type,
+			char *result, unsigned int maxlen)
+{
+	struct verity_config *vc = (struct verity_config *) ti->private;
+	unsigned int sz = 0;
+	char hashdev[BDEVNAME_SIZE], vdev[BDEVNAME_SIZE];
+	u8 hexdigest[VERITY_MAX_DIGEST_SIZE * 2 + 1] = { 0 };
+
+	dm_bht_root_hexdigest(&vc->bht, hexdigest, sizeof(hexdigest));
+
+	switch (type) {
+	case STATUSTYPE_INFO:
+		break;
+	case STATUSTYPE_TABLE:
+		bdevname(vc->hash_dev->bdev, hashdev);
+		bdevname(vc->dev->bdev, vdev);
+		DMEMIT("/dev/%s /dev/%s %llu %u %s %s",
+			vdev,
+			hashdev,
+			ULL(vc->hash_start),
+			vc->bht.depth,
+			vc->hash_alg,
+			hexdigest);
+		break;
+	}
+	return 0;
+}
+
+static int verity_merge(struct dm_target *ti, struct bvec_merge_data *bvm,
+		       struct bio_vec *biovec, int max_size)
+{
+	struct verity_config *vc = ti->private;
+	struct request_queue *q = bdev_get_queue(vc->dev->bdev);
+
+	if (!q->merge_bvec_fn)
+		return max_size;
+
+	bvm->bi_bdev = vc->dev->bdev;
+	bvm->bi_sector = vc->start + bvm->bi_sector - ti->begin;
+
+	/* Optionally, this could just return 0 to stick to single pages. */
+	return min(max_size, q->merge_bvec_fn(q, bvm, biovec));
+}
+
+static int verity_iterate_devices(struct dm_target *ti,
+				 iterate_devices_callout_fn fn, void *data)
+{
+	struct verity_config *vc = ti->private;
+
+	return fn(ti, vc->dev, vc->start, ti->len, data);
+}
+
+static void verity_io_hints(struct dm_target *ti,
+			    struct queue_limits *limits)
+{
+	struct verity_config *vc = ti->private;
+	unsigned int block_size = vc->bht.block_size;
+
+	limits->logical_block_size = block_size;
+	limits->physical_block_size = block_size;
+	blk_limits_io_min(limits, block_size);
+}
+
+static struct target_type verity_target = {
+	.name   = "verity",
+	.version = {0, 1, 0},
+	.module = THIS_MODULE,
+	.ctr    = verity_ctr,
+	.dtr    = verity_dtr,
+	.map    = verity_map,
+	.merge  = verity_merge,
+	.status = verity_status,
+	.iterate_devices = verity_iterate_devices,
+	.io_hints = verity_io_hints,
+};
+
+#define VERITY_WQ_FLAGS (WQ_CPU_INTENSIVE|WQ_HIGHPRI)
+
+static int __init dm_verity_init(void)
+{
+	int r = -ENOMEM;
+
+	_verity_io_pool = KMEM_CACHE(dm_verity_io, 0);
+	if (!_verity_io_pool) {
+		DMERR("failed to allocate pool dm_verity_io");
+		goto bad_io_pool;
+	}
+
+	kverityd_ioq = alloc_workqueue("kverityd_io", VERITY_WQ_FLAGS, 1);
+	if (!kverityd_ioq) {
+		DMERR("failed to create workqueue kverityd_ioq");
+		goto bad_io_queue;
+	}
+
+	kveritydq = alloc_workqueue("kverityd", VERITY_WQ_FLAGS, 1);
+	if (!kveritydq) {
+		DMERR("failed to create workqueue kveritydq");
+		goto bad_verify_queue;
+	}
+
+	r = dm_register_target(&verity_target);
+	if (r < 0) {
+		DMERR("register failed %d", r);
+		goto register_failed;
+	}
+
+	DMINFO("version %u.%u.%u loaded", verity_target.version[0],
+	       verity_target.version[1], verity_target.version[2]);
+
+	return r;
+
+register_failed:
+	destroy_workqueue(kveritydq);
+bad_verify_queue:
+	destroy_workqueue(kverityd_ioq);
+bad_io_queue:
+	kmem_cache_destroy(_verity_io_pool);
+bad_io_pool:
+	return r;
+}
+
+static void __exit dm_verity_exit(void)
+{
+	destroy_workqueue(kveritydq);
+	destroy_workqueue(kverityd_ioq);
+
+	dm_unregister_target(&verity_target);
+	kmem_cache_destroy(_verity_io_pool);
+}
+
+module_init(dm_verity_init);
+module_exit(dm_verity_exit);
+
+MODULE_AUTHOR("The Chromium OS Authors <chromium-os-dev@chromium.org>");
+MODULE_DESCRIPTION(DM_NAME " target for transparent disk integrity checking");
+MODULE_LICENSE("GPL");
diff --git a/drivers/md/dm-verity.h b/drivers/md/dm-verity.h
new file mode 100644
index 0000000..e0664c9
--- /dev/null
+++ b/drivers/md/dm-verity.h
@@ -0,0 +1,45 @@
+/*
+ * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ *                    All Rights Reserved.
+ *
+ * This file is released under the GPLv2.
+ *
+ * Provide error types for use when creating a custom error handler.
+ * See Documentation/device-mapper/dm-verity.txt
+ */
+#ifndef DM_VERITY_H
+#define DM_VERITY_H
+
+#include <linux/notifier.h>
+
+struct dm_verity_error_state {
+	int code;
+	int transient;  /* Likely to not happen after a reboot */
+	u64 block;
+	const char *message;
+
+	sector_t dev_start;
+	sector_t dev_len;
+	struct block_device *dev;
+
+	sector_t hash_dev_start;
+	sector_t hash_dev_len;
+	struct block_device *hash_dev;
+
+	/* Final behavior after all notifications are completed. */
+	int behavior;
+};
+
+/* This enum must be matched to allowed_error_behaviors in dm-verity.c */
+enum dm_verity_error_behavior {
+	DM_VERITY_ERROR_BEHAVIOR_EIO = 0,
+	DM_VERITY_ERROR_BEHAVIOR_PANIC,
+	DM_VERITY_ERROR_BEHAVIOR_NONE,
+	DM_VERITY_ERROR_BEHAVIOR_NOTIFY
+};
+
+
+int dm_verity_register_error_notifier(struct notifier_block *nb);
+int dm_verity_unregister_error_notifier(struct notifier_block *nb);
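+
+/* Example (illustrative only) of a notifier that logs the failure and
+ * requests eio behavior.  Note, the requested behavior is only honored
+ * when the chain returns NOTIFY_DONE (see verity_error):
+ *
+ *	static int my_verity_cb(struct notifier_block *nb,
+ *				unsigned long transient, void *arg)
+ *	{
+ *		struct dm_verity_error_state *state = arg;
+ *
+ *		pr_err("verity failure on block %llu\n",
+ *		       (unsigned long long)state->block);
+ *		state->behavior = DM_VERITY_ERROR_BEHAVIOR_EIO;
+ *		return NOTIFY_DONE;
+ *	}
+ *
+ *	static struct notifier_block my_verity_nb = {
+ *		.notifier_call = my_verity_cb,
+ *	};
+ *	...
+ *	dm_verity_register_error_notifier(&my_verity_nb);
+ */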
+
+#endif  /* DM_VERITY_H */
diff --git a/include/linux/dm-bht.h b/include/linux/dm-bht.h
new file mode 100644
index 0000000..3a4b432
--- /dev/null
+++ b/include/linux/dm-bht.h
@@ -0,0 +1,166 @@
+/*
+ * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ *
+ * Device-Mapper block hash tree interface.
+ * See Documentation/device-mapper/dm-bht.txt for details.
+ *
+ * This file is released under the GPLv2.
+ */
+#ifndef __LINUX_DM_BHT_H
+#define __LINUX_DM_BHT_H
+
+#include <crypto/hash.h>
+#include <linux/compiler.h>
+#include <linux/types.h>
+
+/* To avoid allocating memory for digest tests, we just set a
+ * max to use for now.
+ */
+#define DM_BHT_MAX_DIGEST_SIZE 128  /* 1k hashes are unlikely for now */
+#define DM_BHT_SALT_SIZE       32   /* 256 bits of salt is a lot */
+
+/* UNALLOCATED, PENDING, READY, and VERIFIED are valid states. All other
+ * values are entry-related return codes.
+ */
+#define DM_BHT_ENTRY_VERIFIED 8  /* 'nodes' has been checked against parent */
+#define DM_BHT_ENTRY_READY 4  /* 'nodes' is loaded and available */
+#define DM_BHT_ENTRY_PENDING 2  /* 'nodes' is being loaded */
+#define DM_BHT_ENTRY_UNALLOCATED 0 /* untouched */
+#define DM_BHT_ENTRY_ERROR -1 /* entry is unsuitable for use */
+#define DM_BHT_ENTRY_ERROR_IO -2 /* I/O error on load */
+
+/* Additional possible return codes */
+#define DM_BHT_ENTRY_ERROR_MISMATCH -3 /* Digest mismatch */
+
+/* dm_bht_entry
+ * Contains dm_bht->node_count tree nodes at a given tree depth.
+ * state is used to transactionally assure that data is paged in
+ * from disk.  Since dm_bht does not keep running crypto contexts for
+ * each level, we need to load in the data for on-demand verification.
+ */
+struct dm_bht_entry {
+	atomic_t state; /* see defines */
+	/* Keeping an extra pointer per entry wastes up to ~33k of
+	 * memory if 1M blocks are used (or ~66k on a 64-bit arch).
+	 */
+	void *io_context;  /* Reserve a pointer for use during io */
+	/* nodes should only be non-NULL if fully populated. */
+	void *nodes;  /* The hash data used to verify the children.
+		       * Guaranteed to be page-aligned.
+		       */
+};
+
+/* dm_bht_level
+ * Contains an array of entries which represent a page of hashes where
+ * each hash is a node in the tree at the given tree depth/level.
+ */
+struct dm_bht_level {
+	struct dm_bht_entry *entries;  /* array of entries of tree nodes */
+	unsigned int count;  /* number of entries at this level */
+	sector_t sector;  /* starting sector for this level */
+};
+
+/* opaque context, start, databuf, sector_count */
+typedef int (*dm_bht_callback)(void *,  /* external context */
+			      sector_t,  /* start sector */
+			      u8 *,  /* destination page */
+			      sector_t,  /* num sectors */
+			      struct dm_bht_entry *);
+/* dm_bht - Device mapper block hash tree
+ * dm_bht provides a fixed interface for comparing data blocks
+ * against cryptographic hashes stored in a hash tree. It
+ * optimizes the tree structure for storage on disk.
+ *
+ * The tree is built from the bottom up.  A collection of data,
+ * external to the tree, is hashed and these hashes are stored
+ * as the blocks in the tree.  For some number of these hashes,
+ * a parent node is created by hashing them.  These steps are
+ * repeated.
+ *
+ * TODO(wad): All hash storage memory is pre-allocated and freed once an
+ * entire branch has been verified.
+ */
+struct dm_bht {
+	/* Configured values */
+	int depth;  /* Depth of the tree including the root */
+	unsigned int block_count;  /* Number of blocks hashed */
+	unsigned int block_size;  /* Size of a hash block */
+	char hash_alg[CRYPTO_MAX_ALG_NAME];
+	unsigned char salt[DM_BHT_SALT_SIZE];
+
+	/* Computed values */
+	unsigned int node_count;  /* Data size (in hashes) for each entry */
+	unsigned int node_count_shift;  /* first bit set - 1 */
+	/* There is one per CPU so that verification can run simultaneously. */
+	struct shash_desc *hash_desc[NR_CPUS];  /* Container for the hash alg */
+	unsigned int digest_size;
+	sector_t sectors;  /* Number of disk sectors used */
+
+	/* bool verified;  Full tree is verified */
+	u8 root_digest[DM_BHT_MAX_DIGEST_SIZE];
+	struct dm_bht_level *levels;  /* in reverse order */
+	/* Callback for reading from the hash device */
+	dm_bht_callback read_cb;
+};
+
+/* Constructor for struct dm_bht instances. */
+int dm_bht_create(struct dm_bht *bht,
+		  unsigned int block_count,
+		  unsigned int block_size,
+		  const char *alg_name);
+/* Destructor for struct dm_bht instances.  Does not free @bht */
+void dm_bht_destroy(struct dm_bht *bht);
+
+/* Basic accessors for struct dm_bht */
+int dm_bht_set_root_hexdigest(struct dm_bht *bht, const u8 *hexdigest);
+int dm_bht_root_hexdigest(struct dm_bht *bht, u8 *hexdigest, int available);
+void dm_bht_set_salt(struct dm_bht *bht, const char *hexsalt);
+int dm_bht_salt(struct dm_bht *bht, char *hexsalt);
+
+/* Functions for loading in data from disk for verification */
+bool dm_bht_is_populated(struct dm_bht *bht, unsigned int block);
+int dm_bht_populate(struct dm_bht *bht, void *read_cb_ctx,
+		    unsigned int block);
+int dm_bht_verify_block(struct dm_bht *bht, unsigned int block,
+			struct page *pg, unsigned int offset);
+void dm_bht_read_completed(struct dm_bht_entry *entry, int status);
+
+/* Functions for converting indices to nodes. */
+
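+/* Worked example (illustrative): with 4k blocks and sha256 (32-byte
+ * digests), node_count = 128 and node_count_shift = 7.  For leaf block
+ * 300, the entry one level above the leaves (depth = bht->depth - 1) has
+ * index 300 >> 7 = 2, and block 300's node within that entry is
+ * 300 % 128 = 44.
+ */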
+static inline unsigned int dm_bht_get_level_shift(struct dm_bht *bht,
+						  int depth)
+{
+	return (bht->depth - depth) * bht->node_count_shift;
+}
+
+/* For the given depth, this is the entry index.  At depth+1 it is the node
+ * index for depth.
+ */
+static inline unsigned int dm_bht_index_at_level(struct dm_bht *bht,
+							int depth,
+							unsigned int leaf)
+{
+	return leaf >> dm_bht_get_level_shift(bht, depth);
+}
+
+static inline struct dm_bht_entry *dm_bht_get_entry(struct dm_bht *bht,
+						    int depth,
+						    unsigned int block)
+{
+	unsigned int index = dm_bht_index_at_level(bht, depth, block);
+	struct dm_bht_level *level = &bht->levels[depth];
+
+	return &level->entries[index];
+}
+
+static inline void *dm_bht_get_node(struct dm_bht *bht,
+				  struct dm_bht_entry *entry,
+				  int depth,
+				  unsigned int block)
+{
+	unsigned int index = dm_bht_index_at_level(bht, depth, block);
+	unsigned int node_index = index % bht->node_count;
+
+	return entry->nodes + (node_index * bht->digest_size);
+}
+#endif  /* __LINUX_DM_BHT_H */
-- 
1.7.3.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] dm: verity target
  2011-11-10  7:44 ` Steffen Klassert
@ 2011-11-10 14:42   ` Will Drewry
  0 siblings, 0 replies; 22+ messages in thread
From: Will Drewry @ 2011-11-10 14:42 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Mandeep Singh Baines, Alasdair G Kergon, dm-devel, Elly Jones,
	Milan Broz, Olof Johansson, linux-kernel

On Thu, Nov 10, 2011 at 1:44 AM, Steffen Klassert
<steffen.klassert@secunet.com> wrote:
> On Wed, Nov 09, 2011 at 09:18:10PM -0800, Mandeep Singh Baines wrote:
>>
>> + * TODO(wad): All hash storage memory is pre-allocated and freed once an
>> + * entire branch has been verified.
>> + */
>> +struct dm_bht {
>> +     /* Configured values */
>> +     int depth;  /* Depth of the tree including the root */
>> +     unsigned int block_count;  /* Number of blocks hashed */
>> +     unsigned int block_size;  /* Size of a hash block */
>> +     char hash_alg[CRYPTO_MAX_ALG_NAME];
>> +     unsigned char salt[DM_BHT_SALT_SIZE];
>> +
>> +     /* Computed values */
>> +     unsigned int node_count;  /* Data size (in hashes) for each entry */
>> +     unsigned int node_count_shift;  /* first bit set - 1 */
>> +     /* There is one per CPU so that verified can be simultaneous. */
>> +     struct hash_desc hash_desc[NR_CPUS];  /* Container for the hash alg */
>
> Please don't add a new user for the old hash interface. If the hashes can
> be done asynchronously you can use ahash, if not use shash. Both interfaces
> are reentrant, that's probably what you want to have here. You don't
> need to have this in a per cpu manner.

I'll check out the two interfaces.  I didn't realize hash_desc was
deprecated specifically in favor of one of the others.  I'm interested
in seeing if it is possible to not keep a per-cpu desc though with the
other apis.  We do this now to avoid contention across multiple kernel
threads on crypto operations sharing the hash_desc (e.g., by wrapping
it in a mutex).

Thanks!

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] dm: verity target
  2011-11-10  5:18 Mandeep Singh Baines
@ 2011-11-10  7:44 ` Steffen Klassert
  2011-11-10 14:42   ` Will Drewry
  0 siblings, 1 reply; 22+ messages in thread
From: Steffen Klassert @ 2011-11-10  7:44 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: Alasdair G Kergon, dm-devel, Will Drewry, Elly Jones, Milan Broz,
	Olof Johansson, linux-kernel

On Wed, Nov 09, 2011 at 09:18:10PM -0800, Mandeep Singh Baines wrote:
>
> + * TODO(wad): All hash storage memory is pre-allocated and freed once an
> + * entire branch has been verified.
> + */
> +struct dm_bht {
> +	/* Configured values */
> +	int depth;  /* Depth of the tree including the root */
> +	unsigned int block_count;  /* Number of blocks hashed */
> +	unsigned int block_size;  /* Size of a hash block */
> +	char hash_alg[CRYPTO_MAX_ALG_NAME];
> +	unsigned char salt[DM_BHT_SALT_SIZE];
> +
> +	/* Computed values */
> +	unsigned int node_count;  /* Data size (in hashes) for each entry */
> +	unsigned int node_count_shift;  /* first bit set - 1 */
> +	/* There is one per CPU so that verified can be simultaneous. */
> +	struct hash_desc hash_desc[NR_CPUS];  /* Container for the hash alg */

Please don't add a new user for the old hash interface. If the hashes can
be done asynchronously you can use ahash, if not use shash. Both interfaces
are reentrant, that's probably what you want to have here. You don't
need to have this in a per cpu manner.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH] dm: verity target
@ 2011-11-10  5:18 Mandeep Singh Baines
  2011-11-10  7:44 ` Steffen Klassert
  0 siblings, 1 reply; 22+ messages in thread
From: Mandeep Singh Baines @ 2011-11-10  5:18 UTC (permalink / raw)
  To: Alasdair G Kergon, dm-devel
  Cc: Mandeep Singh Baines, Will Drewry, Elly Jones, Alasdair G Kergon,
	Milan Broz, Olof Johansson, linux-kernel

The verity target provides transparent integrity checking of block devices
using a cryptographic digest.

dm-verity is meant to be setup as part of a verified boot path.  This
may be anything ranging from a boot using tboot or trustedgrub to just
booting from a known-good device (like a USB drive or CD).

dm-verity is part of ChromeOS's verified boot path. It is used to verify
the integrity of the root filesystem on boot. The root filesystem is
mounted on a dm-verity partition which transparently verifies each block
with a bootloader verified hash passed into the kernel at boot.

Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: Elly Jones <ellyjones@chromium.org>
Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Milan Broz <mbroz@redhat.com>
Cc: Olof Johansson <olofj@chromium.org>
Cc: dm-devel@redhat.com
Cc: linux-kernel@vger.kernel.org
---
 Documentation/device-mapper/dm-bht.txt    |   59 ++
 Documentation/device-mapper/dm-verity.txt |   76 +++
 drivers/md/Kconfig                        |   30 +
 drivers/md/Makefile                       |    2 +
 drivers/md/dm-bht.c                       |  542 +++++++++++++++
 drivers/md/dm-verity.c                    | 1043 +++++++++++++++++++++++++++++
 drivers/md/dm-verity.h                    |   45 ++
 include/linux/dm-bht.h                    |  166 +++++
 8 files changed, 1963 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/device-mapper/dm-bht.txt
 create mode 100644 Documentation/device-mapper/dm-verity.txt
 create mode 100644 drivers/md/dm-bht.c
 create mode 100644 drivers/md/dm-verity.c
 create mode 100644 drivers/md/dm-verity.h
 create mode 100644 include/linux/dm-bht.h

diff --git a/Documentation/device-mapper/dm-bht.txt b/Documentation/device-mapper/dm-bht.txt
new file mode 100644
index 0000000..21d929f
--- /dev/null
+++ b/Documentation/device-mapper/dm-bht.txt
@@ -0,0 +1,59 @@
+dm-bht
+======
+
+dm-bht provides a block hash tree implementation.  The use of dm-bht allows
+for integrity checking of a given block device without reading the entire
+set of blocks into memory before use.
+
+In particular, dm-bht supplies an interface for creating and verifying a tree
+of cryptographic digests with any algorithm supported by the kernel crypto API.
+
+The `verity' target is the motivating example.
+
+
+Theory of operation
+===================
+
+dm-bht is logically comprised of multiple nodes organized in a tree-like
+structure.  Each node in the tree is a cryptographic hash.  If it is a leaf
+node, the hash is of some block data on disk.  If it is an intermediary node,
+then the hash is of a number of child nodes.
+
+dm-bht has a given depth starting at 1 (ignoring the root node).  Each level in
+the tree is concretely made up of dm_bht_entry structs.  Each entry in the tree
+is a collection of neighboring nodes that fit in one page-sized block.  The
+number is determined based on PAGE_SIZE and the size of the selected
+cryptographic digest algorithm.  The hashes are linearly ordered in this entry
+and any unaligned trailing space is ignored but included when calculating the
+parent node.
+
+The tree looks something like:
+
+alg= sha256, num_blocks = 32767
+                                 [   root    ]
+                                /    . . .    \
+                     [entry_0]                 [entry_1]
+                    /  . . .  \                 . . .   \
+         [entry_0_0]   . . .  [entry_0_127]    . . . .  [entry_1_127]
+           / ... \             /   . . .  \             /           \
+     blk_0 ... blk_127  blk_16256   blk_16383      blk_32640 . . . blk_32767
+
+The root is treated independently of the depth, and the leaf blocks are
+expected to be hashed and supplied to the dm-bht.  Hash blocks that make up
+the entry contents are expected to be read from disk.
+
+dm-bht does not handle I/O directly but instead expects the consumer to
+supply callbacks.  The read callback will always receive a page-aligned value
+to pass to the block device layer to read in a hash value.
+
+Usage
+=====
+
+The API provides mechanisms for reading and verifying a tree. When reading, all
+required data for the hash tree should be populated for a block before
+attempting a verify.  This can be done by calling dm_bht_populate().  When all
+data is ready, a call to dm_bht_verify_block() will check the block's hash
+directly and, where needed, hash the parent and neighboring nodes to
+establish validity up to the root hash.  Note,
+dm_bht_set_root_hexdigest() should be called before any verification attempts
+occur.
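+
+As a rough sketch (illustrative only; error handling omitted, and
+my_read_cb/my_ctx stand in for the consumer's callback and context):
+
+  struct dm_bht bht;
+
+  dm_bht_create(&bht, block_count, block_size, "sha256");
+  dm_bht_set_root_hexdigest(&bht, root_hexdigest);
+  bht.read_cb = my_read_cb;
+  dm_bht_populate(&bht, my_ctx, block);  /* issues reads for hash blocks */
+  /* ...once dm_bht_is_populated(&bht, block) is true... */
+  dm_bht_verify_block(&bht, block, pg, offset);
+  dm_bht_destroy(&bht);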
diff --git a/Documentation/device-mapper/dm-verity.txt b/Documentation/device-mapper/dm-verity.txt
new file mode 100644
index 0000000..f33b984
--- /dev/null
+++ b/Documentation/device-mapper/dm-verity.txt
@@ -0,0 +1,76 @@
+dm-verity
+==========
+
+Device-Mapper's "verity" target provides transparent integrity checking of
+block devices using a cryptographic digest provided by the kernel crypto API.
+This target is read-only.
+
+Parameters: payload=<device path> hashtree=<hash device path> alg=<alg> \
+            salt=<salt> root_hexdigest=<root hash> \
+            [ hashstart=<hash start> error_behavior=<error behavior> ]
+
+<device path>
+    This is the device that is going to be integrity checked.  It may be
+    a subset of the full device as specified to dmsetup (start sector and count)
+    It may be specified as a path, like /dev/sdaX, or a device number,
+    <major>:<minor>.
+
+<hash device path>
+    This is the device that supplies the dm-bht hash data.  It may be
+    specified similarly to the device path and may be the same device.  If the
+    same device is used, the hash offset should be outside of the dm-verity
+    configured device size.
+
+<alg>
+    The cryptographic hash algorithm used for this device.  This should
+    be the name of the algorithm, like "sha1".
+
+<salt>
+    Salt value (in hex).
+
+<root hash>
+    The hexadecimal encoding of the cryptographic hash of all of the
+    neighboring nodes at the first level of the tree.  This hash should be
+    trusted as there is no other authenticity beyond this point.
+
+<hash start>
+    Start address of hashes (default 0).
+
+<error behavior>
+    0 = return -EIO. 1 = panic. 2 = none. 3 = call notifier.  The matching
+    name (eio, panic, none, notify) may be used instead of the digit.
+
+Theory of operation
+===================
+
+dm-verity is meant to be setup as part of a verified boot path.  This
+may be anything ranging from a boot using tboot or trustedgrub to just
+booting from a known-good device (like a USB drive or CD).
+
+When a dm-verity device is configured, it is expected that the caller
+has been authenticated in some way (cryptographic signatures, etc).
+After instantiation, all hashes will be verified on-demand during
+disk access.  If they cannot be verified up to the root node of the
+tree, the root hash, then the I/O will fail.  This should identify
+tampering with any data on the device and the hash data.
+
+Cryptographic hashes are used to assert the integrity of the device on a
+per-block basis.  This allows for a lightweight hash computation on first read
+into the page cache.  Block hashes are stored linearly aligned to the nearest
+block the size of a page.
+
+For more information on the hashing process, see dm-bht.txt.
+
+
+Example
+=======
+
+Set up a device:
+[[
+  dmsetup create vroot --table \
+    "0 204800 verity payload=/dev/sda1 hashtree=/dev/sda2 alg=sha1 "\
+    "root_hexdigest=9f74809a2ee7607b16fcc70d9399a4de9725a727"
+]]
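+
+With the optional parameters appended (values here are illustrative only):
+[[
+  dmsetup create vroot --table \
+    "0 204800 verity payload=/dev/sda1 hashtree=/dev/sda2 alg=sha1 "\
+    "root_hexdigest=9f74809a2ee7607b16fcc70d9399a4de9725a727 "\
+    "salt=ab1f hashstart=0 error_behavior=eio"
+]]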
+
+A command line tool is available to compute the hash tree and return the
+root hash value.
+  http://git.chromium.org/cgi-bin/gitweb.cgi?p=dm-verity.git;a=tree
diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index faa4741..3cdf95c 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -370,4 +370,34 @@ config DM_FLAKEY
        ---help---
          A target that intermittently fails I/O for debugging purposes.
 
+config DM_BHT
+        tristate "Block hash tree support"
+        select CRYPTO
+        select CRYPTO_HASH
+        ---help---
+          Include support for device-mapper devices to use a block hash
+          tree for managing data integrity checks in a scalable way.
+
+          Targets that use this functionality should include it
+          automatically.
+
+          If unsure, say N.
+
+config DM_VERITY
+        tristate "Verity target support"
+        depends on BLK_DEV_DM
+        select DM_BHT
+        select CRYPTO
+        select CRYPTO_HASH
+        ---help---
+          This device-mapper target allows you to create a device that
+          transparently integrity checks the data on it. You'll need to
+          activate the digests you're going to use in the cryptoapi
+          configuration.
+
+          To compile this code as a module, choose M here: the module will
+          be called dm-verity.
+
+          If unsure, say N.
+
 endif # MD
diff --git a/drivers/md/Makefile b/drivers/md/Makefile
index 046860c..c069953 100644
--- a/drivers/md/Makefile
+++ b/drivers/md/Makefile
@@ -39,6 +39,8 @@ obj-$(CONFIG_DM_SNAPSHOT)	+= dm-snapshot.o
 obj-$(CONFIG_DM_PERSISTENT_DATA)	+= persistent-data/
 obj-$(CONFIG_DM_MIRROR)		+= dm-mirror.o dm-log.o dm-region-hash.o
 obj-$(CONFIG_DM_LOG_USERSPACE)	+= dm-log-userspace.o
+obj-$(CONFIG_DM_BHT)            += dm-bht.o
+obj-$(CONFIG_DM_VERITY)         += dm-verity.o
 obj-$(CONFIG_DM_ZERO)		+= dm-zero.o
 obj-$(CONFIG_DM_RAID)	+= dm-raid.o
 obj-$(CONFIG_DM_THIN_PROVISIONING)	+= dm-thin-pool.o
diff --git a/drivers/md/dm-bht.c b/drivers/md/dm-bht.c
new file mode 100644
index 0000000..fd853db
--- /dev/null
+++ b/drivers/md/dm-bht.c
@@ -0,0 +1,542 @@
+/*
+ * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ *
+ * Device-Mapper block hash tree interface.
+ * See Documentation/device-mapper/dm-bht.txt for details.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/atomic.h>
+#include <linux/bitops.h>
+#include <linux/bug.h>
+#include <linux/cpumask.h>
+#include <linux/device-mapper.h>
+#include <linux/dm-bht.h>
+#include <linux/err.h>
+#include <linux/errno.h>
+#include <linux/gfp.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/mm_types.h>
+#include <linux/scatterlist.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+
+#define DM_MSG_PREFIX "dm bht"
+
+
+/*
+ * Utilities
+ */
+
+static u8 from_hex(u8 ch)
+{
+	if ((ch >= '0') && (ch <= '9'))
+		return ch - '0';
+	if ((ch >= 'a') && (ch <= 'f'))
+		return ch - 'a' + 10;
+	if ((ch >= 'A') && (ch <= 'F'))
+		return ch - 'A' + 10;
+	return -1;
+}
+
+/**
+ * dm_bht_bin_to_hex - converts a binary stream to human-readable hex
+ * @binary:	a byte array of length @binary_len
+ * @hex:	a byte array of length @binary_len * 2 + 1
+ */
+static void dm_bht_bin_to_hex(u8 *binary, u8 *hex, unsigned int binary_len)
+{
+	while (binary_len-- > 0) {
+		sprintf((char *)hex, "%02hhx", (int)*binary);
+		hex += 2;
+		binary++;
+	}
+}
+
+/**
+ * dm_bht_hex_to_bin - converts a hex stream to binary
+ * @binary:	a byte array of length @binary_len
+ * @hex:	a byte array of length @binary_len * 2 + 1
+ */
+static void dm_bht_hex_to_bin(u8 *binary, const u8 *hex,
+			      unsigned int binary_len)
+{
+	while (binary_len-- > 0) {
+		*binary = from_hex(*(hex++));
+		*binary *= 16;
+		*binary += from_hex(*(hex++));
+		binary++;
+	}
+}
+
+static void dm_bht_log_mismatch(struct dm_bht *bht, u8 *given, u8 *computed)
+{
+	u8 given_hex[DM_BHT_MAX_DIGEST_SIZE * 2 + 1];
+	u8 computed_hex[DM_BHT_MAX_DIGEST_SIZE * 2 + 1];
+
+	dm_bht_bin_to_hex(given, given_hex, bht->digest_size);
+	dm_bht_bin_to_hex(computed, computed_hex, bht->digest_size);
+	DMERR_LIMIT("%s != %s", given_hex, computed_hex);
+}
+
+/**
+ * dm_bht_compute_hash: hashes a page of data
+ */
+static int dm_bht_compute_hash(struct dm_bht *bht, struct page *pg,
+			       unsigned int offset, u8 *digest)
+{
+	struct hash_desc *hash_desc = &bht->hash_desc[smp_processor_id()];
+	struct scatterlist sg;
+
+	sg_init_table(&sg, 1);
+	sg_set_page(&sg, pg, bht->block_size, offset);
+	/* Note, this is synchronous. */
+	if (crypto_hash_init(hash_desc)) {
+		DMCRIT("failed to reinitialize crypto hash (proc:%d)",
+			smp_processor_id());
+		return -EINVAL;
+	}
+	if (crypto_hash_update(hash_desc, &sg, bht->block_size)) {
+		DMCRIT("crypto_hash_update failed");
+		return -EINVAL;
+	}
+	sg_set_buf(&sg, bht->salt, sizeof(bht->salt));
+	if (crypto_hash_update(hash_desc, &sg, sizeof(bht->salt))) {
+		DMCRIT("crypto_hash_update failed");
+		return -EINVAL;
+	}
+	if (crypto_hash_final(hash_desc, digest)) {
+		DMCRIT("crypto_hash_final failed");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/*
+ * Implementation functions
+ */
+
+static int dm_bht_initialize_entries(struct dm_bht *bht)
+{
+	/* last represents the index of the last digest stored in the tree.
+	 * By walking the tree with that index, it is possible to compute the
+	 * total number of entries at each level.
+	 *
+	 * Since each entry will contain up to |node_count| nodes of the tree,
+	 * it is possible that the last index may not be at the end of a given
+	 * entry->nodes.  In that case, it is assumed the value is padded.
+	 *
+	 * Note, we treat both the tree root (1 hash) and the tree leaves
+	 * independently from the bht data structures.  Logically, the root is
+	 * depth=-1 and the block layer level is depth=bht->depth
+	 */
+	unsigned int last = bht->block_count;
+	int depth;
+
+	/* check that the largest level->count can't result in an int overflow
+	 * on allocation or sector calculation.
+	 */
+	if (((last >> bht->node_count_shift) + 1) >
+	    UINT_MAX / max((unsigned int)sizeof(struct dm_bht_entry),
+			   (unsigned int)to_sector(bht->block_size))) {
+		DMCRIT("required entries %u is too large", last + 1);
+		return -EINVAL;
+	}
+
+	/* Track the current sector location for each level so we don't have to
+	 * compute it during traversals.
+	 */
+	bht->sectors = 0;
+	for (depth = 0; depth < bht->depth; ++depth) {
+		struct dm_bht_level *level = &bht->levels[depth];
+
+		level->count = dm_bht_index_at_level(bht, depth, last) + 1;
+		level->entries = (struct dm_bht_entry *)
+				 kcalloc(level->count,
+					 sizeof(struct dm_bht_entry),
+					 GFP_KERNEL);
+		if (!level->entries) {
+			DMERR("failed to allocate entries for depth %d", depth);
+			return -ENOMEM;
+		}
+		level->sector = bht->sectors;
+		bht->sectors += level->count * to_sector(bht->block_size);
+	}
+
+	return 0;
+}
+
+/**
+ * dm_bht_create - prepares @bht for use
+ * @bht:	pointer to the dm_bht to initialize
+ * @block_count: the number of block hashes / tree leaves
+ * @block_size:	size of a hash block, in bytes
+ * @alg_name:	crypto hash algorithm name
+ *
+ * Returns 0 on success.
+ *
+ * Callers can offset into devices by storing the data in the io callbacks.
+ */
+int dm_bht_create(struct dm_bht *bht, unsigned int block_count,
+		  unsigned int block_size, const char *alg_name)
+{
+	int cpu, status;
+
+	bht->block_size = block_size;
+	/* Verify that PAGE_SIZE >= block_size >= SECTOR_SIZE. */
+	if ((block_size > PAGE_SIZE) ||
+	    (PAGE_SIZE % block_size) ||
+	    (to_sector(block_size) == 0))
+		return -EINVAL;
+
+	/* Setup the hash first. Its length determines much of the bht layout */
+	for (cpu = 0; cpu < nr_cpu_ids; ++cpu) {
+		bht->hash_desc[cpu].tfm = crypto_alloc_hash(alg_name, 0, 0);
+		if (IS_ERR(bht->hash_desc[cpu].tfm)) {
+			DMERR("failed to allocate crypto hash '%s'", alg_name);
+			status = -ENOMEM;
+			bht->hash_desc[cpu].tfm = NULL;
+			goto bad_arg;
+		}
+	}
+	bht->digest_size = crypto_hash_digestsize(bht->hash_desc[0].tfm);
+	/* We expect to be able to pack >=2 hashes into a block */
+	if (block_size / bht->digest_size < 2) {
+		DMERR("too few hashes fit in a block");
+		status = -EINVAL;
+		goto bad_arg;
+	}
+
+	if (bht->digest_size > DM_BHT_MAX_DIGEST_SIZE) {
+		DMERR("DM_BHT_MAX_DIGEST_SIZE too small for chosen digest");
+		status = -EINVAL;
+		goto bad_arg;
+	}
+
+	/* Configure the tree */
+	bht->block_count = block_count;
+	if (block_count == 0) {
+		DMERR("block_count must be non-zero");
+		status = -EINVAL;
+		goto bad_arg;
+	}
+
+	/* Each dm_bht_entry->nodes is one block.  The node code tracks
+	 * how many nodes fit into one entry where a node is a single
+	 * hash (message digest).
+	 */
+	bht->node_count_shift = fls(block_size / bht->digest_size) - 1;
+	/* Round down to the nearest power of two.  This makes indexing
+	 * into the tree much less painful.
+	 */
+	bht->node_count = 1 << bht->node_count_shift;
+
+	/* This is unlikely to happen, but with 64k pages, who knows. */
+	if (bht->node_count > UINT_MAX / bht->digest_size) {
+		DMERR("node_count * hash_len exceeds UINT_MAX!");
+		status = -EINVAL;
+		goto bad_arg;
+	}
+
+	bht->depth = DIV_ROUND_UP(fls(block_count - 1), bht->node_count_shift);
+
+	/* Ensure that we can safely shift by this value. */
+	if (bht->depth * bht->node_count_shift >= sizeof(unsigned int) * 8) {
+		DMERR("specified depth and node_count_shift is too large");
+		status = -EINVAL;
+		goto bad_arg;
+	}
+
+	/* Allocate levels. Each level of the tree may have an arbitrary number
+	 * of dm_bht_entry structs.  Each entry contains node_count nodes.
+	 * Each node in the tree is a cryptographic digest of either node_count
+	 * nodes on the subsequent level or of a specific block on disk.
+	 */
+	bht->levels = (struct dm_bht_level *)
+			kcalloc(bht->depth,
+				sizeof(struct dm_bht_level), GFP_KERNEL);
+	if (!bht->levels) {
+		DMERR("failed to allocate tree levels");
+		status = -ENOMEM;
+		goto bad_level_alloc;
+	}
+
+	bht->read_cb = NULL;
+
+	status = dm_bht_initialize_entries(bht);
+	if (status)
+		goto bad_entries_alloc;
+
+	/* We compute depth such that there is only 1 block at level 0. */
+	BUG_ON(bht->levels[0].count != 1);
+
+	return 0;
+
+bad_entries_alloc:
+	while (bht->depth-- > 0)
+		kfree(bht->levels[bht->depth].entries);
+	kfree(bht->levels);
+bad_level_alloc:
+bad_arg:
+	for (cpu = 0; cpu < nr_cpu_ids; ++cpu)
+		if (bht->hash_desc[cpu].tfm)
+			crypto_free_hash(bht->hash_desc[cpu].tfm);
+	return status;
+}
+EXPORT_SYMBOL(dm_bht_create);
+
+/**
+ * dm_bht_read_completed
+ * @entry:	pointer to the entry that's been loaded
+ * @status:	I/O status. Non-zero is failure.
+ * MUST always be called after a read_cb completes.
+ */
+void dm_bht_read_completed(struct dm_bht_entry *entry, int status)
+{
+	if (status) {
+		/* TODO(wad) add retry support */
+		DMCRIT("an I/O error occurred while reading entry");
+		atomic_set(&entry->state, DM_BHT_ENTRY_ERROR_IO);
+		/* entry->nodes will be freed later */
+		return;
+	}
+	BUG_ON(atomic_read(&entry->state) != DM_BHT_ENTRY_PENDING);
+	atomic_set(&entry->state, DM_BHT_ENTRY_READY);
+}
+EXPORT_SYMBOL(dm_bht_read_completed);
+
+/**
+ * dm_bht_verify_block - checks that all nodes in the path for @block are valid
+ * @bht:	pointer to a dm_bht_create()d bht
+ * @block:	specific block data is expected from
+ * @pg:		page holding the block data
+ * @offset:	offset into the page
+ *
+ * Returns 0 on success, DM_BHT_ENTRY_ERROR_MISMATCH on error.
+ */
+int dm_bht_verify_block(struct dm_bht *bht, unsigned int block,
+			struct page *pg, unsigned int offset)
+{
+	int state, depth = bht->depth;
+	u8 digest[DM_BHT_MAX_DIGEST_SIZE];
+	struct dm_bht_entry *entry;
+	void *node;
+
+	do {
+		/* Need to check that the hash of the current block is accurate
+		 * in its parent.
+		 */
+		entry = dm_bht_get_entry(bht, depth - 1, block);
+		state = atomic_read(&entry->state);
+		/* This call is only safe if all nodes along the path
+		 * are already populated (i.e. READY) via dm_bht_populate.
+		 */
+		BUG_ON(state < DM_BHT_ENTRY_READY);
+		node = dm_bht_get_node(bht, entry, depth, block);
+
+		if (dm_bht_compute_hash(bht, pg, offset, digest) ||
+		    memcmp(digest, node, bht->digest_size))
+			goto mismatch;
+
+		/* Keep the containing block of hashes to be verified in the
+		 * next pass.
+		 */
+		pg = virt_to_page(entry->nodes);
+		offset = offset_in_page(entry->nodes);
+	} while (--depth > 0 && state != DM_BHT_ENTRY_VERIFIED);
+
+	if (depth == 0 && state != DM_BHT_ENTRY_VERIFIED) {
+		if (dm_bht_compute_hash(bht, pg, offset, digest) ||
+		    memcmp(digest, bht->root_digest, bht->digest_size))
+			goto mismatch;
+		atomic_set(&entry->state, DM_BHT_ENTRY_VERIFIED);
+	}
+
+	/* Mark path to leaf as verified. */
+	for (depth++; depth < bht->depth; depth++) {
+		entry = dm_bht_get_entry(bht, depth, block);
+		/* At this point, entry can only be in VERIFIED or READY state.
+		 * So it is safe to use atomic_set instead of atomic_cmpxchg.
+		 */
+		atomic_set(&entry->state, DM_BHT_ENTRY_VERIFIED);
+	}
+
+	return 0;
+
+mismatch:
+	DMERR_LIMIT("verify_path: failed to verify hash (d=%d,bi=%u)",
+		    depth, block);
+	dm_bht_log_mismatch(bht, node, digest);
+	return DM_BHT_ENTRY_ERROR_MISMATCH;
+}
+EXPORT_SYMBOL(dm_bht_verify_block);
+
+/**
+ * dm_bht_is_populated - check that entries from disk needed to verify a given
+ *                       block are all ready
+ * @bht:	pointer to a dm_bht_create()d bht
+ * @block:	specific block data is expected from
+ *
+ * Callers may wish to call dm_bht_is_populated() when checking an io
+ * for which entries were already pending.
+ */
+bool dm_bht_is_populated(struct dm_bht *bht, unsigned int block)
+{
+	int depth;
+
+	for (depth = bht->depth - 1; depth >= 0; depth--) {
+		struct dm_bht_entry *entry = dm_bht_get_entry(bht, depth,
+							      block);
+		if (atomic_read(&entry->state) < DM_BHT_ENTRY_READY)
+			return false;
+	}
+
+	return true;
+}
+EXPORT_SYMBOL(dm_bht_is_populated);
+
+/**
+ * dm_bht_populate - reads entries from disk needed to verify a given block
+ * @bht:	pointer to a dm_bht_create()d bht
+ * @ctx:        context used for all read_cb calls on this request
+ * @block:	specific block data is expected from
+ *
+ * Returns negative value on error. Returns 0 on success.
+ */
+int dm_bht_populate(struct dm_bht *bht, void *ctx, unsigned int block)
+{
+	int depth, state;
+
+	BUG_ON(block >= bht->block_count);
+
+	for (depth = bht->depth - 1; depth >= 0; --depth) {
+		unsigned int index = dm_bht_index_at_level(bht, depth, block);
+		struct dm_bht_level *level = &bht->levels[depth];
+		struct dm_bht_entry *entry = dm_bht_get_entry(bht, depth,
+							      block);
+		state = atomic_cmpxchg(&entry->state,
+				       DM_BHT_ENTRY_UNALLOCATED,
+				       DM_BHT_ENTRY_PENDING);
+		if (state == DM_BHT_ENTRY_VERIFIED)
+			break;
+		if (state <= DM_BHT_ENTRY_ERROR)
+			goto error_state;
+		if (state != DM_BHT_ENTRY_UNALLOCATED)
+			continue;
+
+		/* Current entry is claimed for allocation and loading */
+		entry->nodes = kmalloc(bht->block_size, GFP_NOIO);
+		if (!entry->nodes)
+			goto nomem;
+
+		bht->read_cb(ctx,
+			     level->sector + to_sector(index * bht->block_size),
+			     entry->nodes, to_sector(bht->block_size), entry);
+	}
+
+	return 0;
+
+error_state:
+	DMCRIT("block %u at depth %d is in an error state", block, depth);
+	return -EPERM;
+
+nomem:
+	DMCRIT("failed to allocate memory for entry->nodes");
+	return -ENOMEM;
+}
+EXPORT_SYMBOL(dm_bht_populate);
+
+/**
+ * dm_bht_destroy - cleans up all memory used by @bht
+ * @bht:	pointer to a dm_bht_create()d bht
+ */
+void dm_bht_destroy(struct dm_bht *bht)
+{
+	int depth, cpu;
+
+	for (depth = 0; depth < bht->depth; depth++) {
+		struct dm_bht_entry *entry = bht->levels[depth].entries;
+		struct dm_bht_entry *entry_end = entry +
+						 bht->levels[depth].count;
+		for (; entry < entry_end; ++entry)
+			kfree(entry->nodes);
+		kfree(bht->levels[depth].entries);
+	}
+	kfree(bht->levels);
+	for (cpu = 0; cpu < nr_cpu_ids; ++cpu)
+		if (bht->hash_desc[cpu].tfm)
+			crypto_free_hash(bht->hash_desc[cpu].tfm);
+}
+EXPORT_SYMBOL(dm_bht_destroy);
+
+/*
+ * Accessors
+ */
+
+/**
+ * dm_bht_set_root_hexdigest - sets an unverified root digest hash from hex
+ * @bht:	pointer to a dm_bht_create()d bht
+ * @hexdigest:	array of u8s containing the new digest in hex
+ * Returns non-zero on error.  hexdigest should be NUL terminated.
+ */
+int dm_bht_set_root_hexdigest(struct dm_bht *bht, const u8 *hexdigest)
+{
+	/* Make sure we have at least the bytes expected */
+	if (strnlen((char *)hexdigest, bht->digest_size * 2) !=
+	    bht->digest_size * 2) {
+		DMERR("root digest length does not match hash algorithm");
+		return -1;
+	}
+	dm_bht_hex_to_bin(bht->root_digest, hexdigest, bht->digest_size);
+	return 0;
+}
+EXPORT_SYMBOL(dm_bht_set_root_hexdigest);
+
+/**
+ * dm_bht_root_hexdigest - returns root digest in hex
+ * @bht:	pointer to a dm_bht_create()d bht
+ * @hexdigest:	u8 array of size @available
+ * @available:	must be bht->digest_size * 2 + 1
+ */
+int dm_bht_root_hexdigest(struct dm_bht *bht, u8 *hexdigest, int available)
+{
+	if (available < 0 ||
+	    ((unsigned int) available) < bht->digest_size * 2 + 1) {
+		DMERR("hexdigest has too few bytes available");
+		return -EINVAL;
+	}
+	dm_bht_bin_to_hex(bht->root_digest, hexdigest, bht->digest_size);
+	return 0;
+}
+EXPORT_SYMBOL(dm_bht_root_hexdigest);
+
+/**
+ * dm_bht_set_salt - sets the salt used, in hex
+ * @bht:      pointer to a dm_bht_create()d bht
+ * @hexsalt:  salt string, as hex; will be zero-padded or truncated to
+ *            DM_BHT_SALT_SIZE * 2 hex digits.
+ */
+void dm_bht_set_salt(struct dm_bht *bht, const char *hexsalt)
+{
+	size_t saltlen = min(strlen(hexsalt) / 2, sizeof(bht->salt));
+
+	memset(bht->salt, 0, sizeof(bht->salt));
+	dm_bht_hex_to_bin(bht->salt, (const u8 *)hexsalt, saltlen);
+}
+EXPORT_SYMBOL(dm_bht_set_salt);
+
+/**
+ * dm_bht_salt - returns the salt used, in hex
+ * @bht:      pointer to a dm_bht_create()d bht
+ * @hexsalt:  buffer to put salt into, of length DM_BHT_SALT_SIZE * 2 + 1.
+ */
+int dm_bht_salt(struct dm_bht *bht, char *hexsalt)
+{
+	dm_bht_bin_to_hex(bht->salt, (u8 *)hexsalt, sizeof(bht->salt));
+	return 0;
+}
+EXPORT_SYMBOL(dm_bht_salt);
+
diff --git a/drivers/md/dm-verity.c b/drivers/md/dm-verity.c
new file mode 100644
index 0000000..a9bd0e8
--- /dev/null
+++ b/drivers/md/dm-verity.c
@@ -0,0 +1,1043 @@
+/*
+ * Originally based on dm-crypt.c,
+ * Copyright (C) 2003 Christophe Saout <christophe@saout.de>
+ * Copyright (C) 2004 Clemens Fruhwirth <clemens@endorphin.org>
+ * Copyright (C) 2006-2008 Red Hat, Inc. All rights reserved.
+ * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ *                    All Rights Reserved.
+ *
+ * This file is released under the GPLv2.
+ *
+ * Implements a verifying transparent block device.
+ * See Documentation/device-mapper/dm-verity.txt
+ */
+#include <linux/async.h>
+#include <linux/atomic.h>
+#include <linux/bio.h>
+#include <linux/blkdev.h>
+#include <linux/delay.h>
+#include <linux/device.h>
+#include <linux/err.h>
+#include <linux/genhd.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/mempool.h>
+#include <linux/mm_types.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/workqueue.h>
+#include <linux/device-mapper.h>
+#include <linux/dm-bht.h>
+
+#include "dm-verity.h"
+
+#define DM_MSG_PREFIX "verity"
+
+/* Supports up to 512-bit digests */
+#define VERITY_MAX_DIGEST_SIZE 64
+
+/* TODO(wad) make both of these report the error line/file to a
+ *           verity_bug function.
+ */
+#define VERITY_BUG(msg...) BUG()
+#define VERITY_BUG_ON(cond, msg...) BUG_ON(cond)
+
+/* Helper for printing sector_t */
+#define ULL(x) ((unsigned long long)(x))
+
+#define MIN_IOS 32
+#define MIN_BIOS (MIN_IOS * 2)
+#define VERITY_DEFAULT_BLOCK_SIZE 4096
+
+/* Provide a lightweight means of specifying the global default for
+ * error behavior: eio, panic, none, or notify.
+ * Legacy support for 0 = eio, 1 = reboot/panic, 2 = none, 3 = notify.
+ * This is matched to the enum in dm-verity.h.
+ */
+static const char * const allowed_error_behaviors[] = { "eio", "panic", "none",
+							"notify", NULL };
+static char *error_behavior = "eio";
+module_param(error_behavior, charp, 0644);
+MODULE_PARM_DESC(error_behavior, "Behavior on error "
+				 "(eio, panic, none, notify)");
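+/* e.g. (illustrative): modprobe dm-verity error_behavior=panic */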
+
+/* Controls whether verity_get_device will wait forever for a device. */
+static int dev_wait;
+module_param(dev_wait, bool, 0444);
+MODULE_PARM_DESC(dev_wait, "Wait forever for a backing device");
+
+/* per-requested-bio private data */
+enum verity_io_flags {
+	VERITY_IOFLAGS_CLONED = 0x1,	/* original bio has been cloned */
+};
+
+struct dm_verity_io {
+	struct dm_target *target;
+	struct bio *bio;
+	struct delayed_work work;
+	unsigned int flags;
+
+	int error;
+	atomic_t pending;
+
+	u64 block;  /* aligned block index */
+	u64 count;  /* aligned count in blocks */
+};
+
+struct verity_config {
+	struct dm_dev *dev;
+	sector_t start;
+	sector_t size;
+
+	struct dm_dev *hash_dev;
+	sector_t hash_start;
+
+	struct dm_bht bht;
+
+	/* Pool required for io contexts */
+	mempool_t *io_pool;
+	/* Pool and bios required for making sure that backing device reads are
+	 * in PAGE_SIZE increments.
+	 */
+	struct bio_set *bs;
+
+	char hash_alg[CRYPTO_MAX_ALG_NAME];
+
+	int error_behavior;
+};
+
+static struct kmem_cache *_verity_io_pool;
+static struct workqueue_struct *kveritydq, *kverityd_ioq;
+
+static void kverityd_verify(struct work_struct *work);
+static void kverityd_io(struct work_struct *work);
+static void kverityd_io_bht_populate(struct dm_verity_io *io);
+static void kverityd_io_bht_populate_end(struct bio *, int error);
+
+static BLOCKING_NOTIFIER_HEAD(verity_error_notifier);
+
+/*
+ * Exported interfaces
+ */
+
+int dm_verity_register_error_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_register(&verity_error_notifier, nb);
+}
+EXPORT_SYMBOL_GPL(dm_verity_register_error_notifier);
+
+int dm_verity_unregister_error_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_unregister(&verity_error_notifier, nb);
+}
+EXPORT_SYMBOL_GPL(dm_verity_unregister_error_notifier);
+
+/*
+ * Allocation and utility functions
+ */
+
+static void kverityd_src_io_read_end(struct bio *clone, int error);
+
+/* Shared destructor for all internal bios */
+static void dm_verity_bio_destructor(struct bio *bio)
+{
+	struct dm_verity_io *io = bio->bi_private;
+	struct verity_config *vc = io->target->private;
+	bio_free(bio, vc->bs);
+}
+
+static struct bio *verity_alloc_bioset(struct verity_config *vc, gfp_t gfp_mask,
+				       int nr_iovecs)
+{
+	return bio_alloc_bioset(gfp_mask, nr_iovecs, vc->bs);
+}
+
+static struct dm_verity_io *verity_io_alloc(struct dm_target *ti,
+					    struct bio *bio)
+{
+	struct verity_config *vc = ti->private;
+	sector_t sector = bio->bi_sector - ti->begin;
+	struct dm_verity_io *io;
+
+	io = mempool_alloc(vc->io_pool, GFP_NOIO);
+	if (unlikely(!io))
+		return NULL;
+	io->flags = 0;
+	io->target = ti;
+	io->bio = bio;
+	io->error = 0;
+
+	/* Adjust the sector by the virtual starting sector */
+	io->block = to_bytes(sector) / vc->bht.block_size;
+	io->count = bio->bi_size / vc->bht.block_size;
+
+	atomic_set(&io->pending, 0);
+
+	return io;
+}
+
+static struct bio *verity_bio_clone(struct dm_verity_io *io)
+{
+	struct verity_config *vc = io->target->private;
+	struct bio *bio = io->bio;
+	struct bio *clone = verity_alloc_bioset(vc, GFP_NOIO, bio->bi_max_vecs);
+
+	if (!clone)
+		return NULL;
+
+	__bio_clone(clone, bio);
+	clone->bi_private = io;
+	clone->bi_end_io  = kverityd_src_io_read_end;
+	clone->bi_bdev    = vc->dev->bdev;
+	clone->bi_sector += vc->start - io->target->begin;
+	clone->bi_destructor = dm_verity_bio_destructor;
+
+	return clone;
+}
+
+/* If the request is not successful, this handler takes action.
+ * TODO make this call a registered handler.
+ */
+static void verity_error(struct verity_config *vc, struct dm_verity_io *io,
+			 int error)
+{
+	const char *message;
+	int error_mode = DM_VERITY_ERROR_BEHAVIOR_PANIC;
+	dev_t devt = 0;
+	u64 block = ~0;
+	int transient = 1;
+	struct dm_verity_error_state error_state;
+
+	if (vc) {
+		devt = vc->dev->bdev->bd_dev;
+		error_mode = vc->error_behavior;
+	}
+
+	if (io) {
+		io->error = -EIO;
+		block = io->block;
+	}
+
+	switch (error) {
+	case -ENOMEM:
+		message = "out of memory";
+		break;
+	case -EBUSY:
+		message = "pending data seen during verify";
+		break;
+	case -EFAULT:
+		message = "crypto operation failure";
+		break;
+	case -EACCES:
+		message = "integrity failure";
+		/* Image is bad. */
+		transient = 0;
+		break;
+	case -EPERM:
+		message = "hash tree population failure";
+		/* Should be dm-bht specific errors */
+		transient = 0;
+		break;
+	case -EINVAL:
+		message = "unexpected missing/invalid data";
+		/* The device was configured incorrectly - fallback. */
+		transient = 0;
+		break;
+	default:
+		/* Other errors can be passed through as IO errors */
+		message = "unknown or I/O error";
+		return;
+	}
+
+	DMERR_LIMIT("verification failure occurred: %s", message);
+
+	if (error_mode == DM_VERITY_ERROR_BEHAVIOR_NOTIFY) {
+		error_state.code = error;
+		error_state.transient = transient;
+		error_state.block = block;
+		error_state.message = message;
+		error_state.dev_start = vc->start;
+		error_state.dev_len = vc->size;
+		error_state.dev = vc->dev->bdev;
+		error_state.hash_dev_start = vc->hash_start;
+		error_state.hash_dev_len = vc->bht.sectors;
+		error_state.hash_dev = vc->hash_dev->bdev;
+
+		/* Set default fallthrough behavior. */
+		error_state.behavior = DM_VERITY_ERROR_BEHAVIOR_PANIC;
+		error_mode = DM_VERITY_ERROR_BEHAVIOR_PANIC;
+
+		if (!blocking_notifier_call_chain(
+		    &verity_error_notifier, transient, &error_state)) {
+			error_mode = error_state.behavior;
+		}
+	}
+
+	switch (error_mode) {
+	case DM_VERITY_ERROR_BEHAVIOR_EIO:
+		break;
+	case DM_VERITY_ERROR_BEHAVIOR_NONE:
+		if (error != -EIO && io)
+			io->error = 0;
+		break;
+	default:
+		goto do_panic;
+	}
+	return;
+
+do_panic:
+	panic("dm-verity failure: "
+	      "device:%u:%u error:%d block:%llu message:%s",
+	      MAJOR(devt), MINOR(devt), error, ULL(block), message);
+}
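+
+/* Example (illustrative): with error_behavior=notify, an -EACCES
+ * (integrity failure) fills in dm_verity_error_state and runs the
+ * notifier chain; a notifier may select the final action by setting
+ * error_state.behavior, and if none does, the default above is a panic.
+ */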
+
+/**
+ * verity_parse_error_behavior - parse a behavior charp to the enum
+ * @behavior:	NUL-terminated char array
+ *
+ * Checks if the behavior is valid either as text or as an index digit
+ * and returns the proper enum value or -1 on error.
+ */
+static int verity_parse_error_behavior(const char *behavior)
+{
+	const char * const *allowed = allowed_error_behaviors;
+	char index = '0';
+
+	for (; *allowed; allowed++, index++)
+		if (!strcmp(*allowed, behavior) || behavior[0] == index)
+			break;
+
+	if (!*allowed)
+		return -1;
+
+	/* Convert to the integer index matching the enum. */
+	return allowed - allowed_error_behaviors;
+}
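+
+/* Example (illustrative; assumes allowed_error_behaviors is the
+ * NULL-terminated table { "eio", "panic", "none", "notify" } matching
+ * enum dm_verity_error_behavior): both "panic" and "1" parse to
+ * DM_VERITY_ERROR_BEHAVIOR_PANIC, while an unknown string returns -1.
+ */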
+
+/*
+ * Reverse flow of requests into the device.
+ *
+ * (Start at the bottom with verity_map and work your way upward).
+ */
+
+static void verity_inc_pending(struct dm_verity_io *io);
+
+static void verity_return_bio_to_caller(struct dm_verity_io *io)
+{
+	struct verity_config *vc = io->target->private;
+
+	if (io->error)
+		verity_error(vc, io, io->error);
+
+	bio_endio(io->bio, io->error);
+	mempool_free(io, vc->io_pool);
+}
+
+/* Check for any missing bht hashes. */
+static bool verity_is_bht_populated(struct dm_verity_io *io)
+{
+	struct verity_config *vc = io->target->private;
+	u64 block;
+
+	for (block = io->block; block < io->block + io->count; ++block)
+		if (!dm_bht_is_populated(&vc->bht, block))
+			return false;
+
+	return true;
+}
+
+/* verity_dec_pending manages the lifetime of all dm_verity_io structs.
+ * Non-bug error handling is centralized through this interface, as is
+ * all passage from workqueue to workqueue.
+ */
+static void verity_dec_pending(struct dm_verity_io *io)
+{
+	if (!atomic_dec_and_test(&io->pending))
+		goto done;
+
+	if (unlikely(io->error))
+		goto io_error;
+
+	/* I/Os that were pending may now be ready */
+	if (verity_is_bht_populated(io)) {
+		INIT_DELAYED_WORK(&io->work, kverityd_verify);
+		queue_delayed_work(kveritydq, &io->work, 0);
+	} else {
+		INIT_DELAYED_WORK(&io->work, kverityd_io);
+		queue_delayed_work(kverityd_ioq, &io->work, HZ/10);
+	}
+
+done:
+	return;
+
+io_error:
+	verity_return_bio_to_caller(io);
+}
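+
+/* Illustrative summary of the reference counting above: kverityd_io
+ * takes one reference, kverityd_src_io_read takes one for the cloned
+ * data read, and each hash-block read takes one more.  The completion
+ * that drops the count to zero either requeues the io for more hash
+ * I/O or hands it to the verify workqueue.
+ */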
+
+/* Walks the data set and computes the hash of the data read from the
+ * untrusted source device.  The computed hash is then passed to dm-bht
+ * for verification.
+ */
+static int verity_verify(struct verity_config *vc,
+			 struct dm_verity_io *io)
+{
+	unsigned int block_size = vc->bht.block_size;
+	struct bio *bio = io->bio;
+	u64 block = io->block;
+	unsigned int idx;
+	int r;
+
+	for (idx = bio->bi_idx; idx < bio->bi_vcnt; idx++) {
+		struct bio_vec *bv = bio_iovec_idx(bio, idx);
+		unsigned int offset = bv->bv_offset;
+		unsigned int len = bv->bv_len;
+
+		VERITY_BUG_ON(offset % block_size);
+		VERITY_BUG_ON(len % block_size);
+
+		while (len) {
+			r = dm_bht_verify_block(&vc->bht, block,
+						bv->bv_page, offset);
+			if (r)
+				goto bad_return;
+
+			offset += block_size;
+			len -= block_size;
+			block++;
+			cond_resched();
+		}
+	}
+
+	return 0;
+
+bad_return:
+	/* dm_bht functions aren't expected to return errno-friendly
+	 * values.  They are converted here for uniformity.
+	 */
+	if (r > 0) {
+		DMERR("Pending data for block %llu seen at verify", ULL(block));
+		r = -EBUSY;
+	} else {
+		DMERR_LIMIT("Block hash does not match!");
+		r = -EACCES;
+	}
+	return r;
+}
+
+/* Services the verify workqueue */
+static void kverityd_verify(struct work_struct *work)
+{
+	struct delayed_work *dwork = container_of(work, struct delayed_work,
+						  work);
+	struct dm_verity_io *io = container_of(dwork, struct dm_verity_io,
+					       work);
+	struct verity_config *vc = io->target->private;
+
+	io->error = verity_verify(vc, io);
+
+	/* Free up the bio and tag with the return value */
+	verity_return_bio_to_caller(io);
+}
+
+/* Asynchronously called upon the completion of dm-bht I/O.  The status
+ * of the operation is passed back to dm-bht and the next steps are
+ * decided by verity_dec_pending.
+ */
+static void kverityd_io_bht_populate_end(struct bio *bio, int error)
+{
+	struct dm_bht_entry *entry = (struct dm_bht_entry *) bio->bi_private;
+	struct dm_verity_io *io = (struct dm_verity_io *) entry->io_context;
+
+	/* Tell the tree to atomically update now that we've populated
+	 * the given entry.
+	 */
+	dm_bht_read_completed(entry, error);
+
+	/* Clean up for reuse when reading data to be checked */
+	bio->bi_vcnt = 0;
+	bio->bi_io_vec->bv_offset = 0;
+	bio->bi_io_vec->bv_len = 0;
+	bio->bi_io_vec->bv_page = NULL;
+	/* Restore the private data to I/O so the destructor can be shared. */
+	bio->bi_private = (void *) io;
+	bio_put(bio);
+
+	/* We bail but assume the tree has been marked bad. */
+	if (unlikely(error)) {
+		DMERR("Failed to read for sector %llu (%u)",
+		      ULL(io->bio->bi_sector), io->bio->bi_size);
+		io->error = error;
+		/* Pass through the error to verity_dec_pending below */
+	}
+	/* When pending = 0, it will transition to reading real data */
+	verity_dec_pending(io);
+}
+
+/* Called by dm-bht (via dm_bht_populate), this function provides
+ * the message digests to dm-bht that are stored on disk.
+ */
+static int kverityd_bht_read_callback(void *ctx, sector_t start, u8 *dst,
+				      sector_t count,
+				      struct dm_bht_entry *entry)
+{
+	struct dm_verity_io *io = ctx;  /* I/O for this batch */
+	struct verity_config *vc;
+	struct bio *bio;
+
+	vc = io->target->private;
+
+	/* The I/O context is nested inside the entry so that we don't need one
+	 * io context per page read.
+	 */
+	entry->io_context = ctx;
+
+	/* We should only get page size requests at present. */
+	verity_inc_pending(io);
+	bio = verity_alloc_bioset(vc, GFP_NOIO, 1);
+	if (unlikely(!bio)) {
+		DMCRIT("Out of memory at bio_alloc_bioset");
+		dm_bht_read_completed(entry, -ENOMEM);
+		return -ENOMEM;
+	}
+	bio->bi_private = (void *) entry;
+	bio->bi_idx = 0;
+	bio->bi_size = vc->bht.block_size;
+	bio->bi_sector = vc->hash_start + start;
+	bio->bi_bdev = vc->hash_dev->bdev;
+	bio->bi_end_io = kverityd_io_bht_populate_end;
+	bio->bi_rw = REQ_META;
+	/* Only need to free the bio since the page is managed by bht */
+	bio->bi_destructor = dm_verity_bio_destructor;
+	bio->bi_vcnt = 1;
+	bio->bi_io_vec->bv_offset = offset_in_page(dst);
+	bio->bi_io_vec->bv_len = to_bytes(count);
+	/* dst is guaranteed to be a page_pool allocation */
+	bio->bi_io_vec->bv_page = virt_to_page(dst);
+	/* Track that this I/O is in use.  There should be no risk of the io
+	 * being removed prior since this is called synchronously.
+	 */
+	generic_make_request(bio);
+	return 0;
+}
+
+/* Submits an io request for each missing block of block hashes.
+ * The last one to return will then enqueue this on the io workqueue.
+ */
+static void kverityd_io_bht_populate(struct dm_verity_io *io)
+{
+	struct verity_config *vc = io->target->private;
+	u64 block;
+
+	for (block = io->block; block < io->block + io->count; ++block) {
+		int ret = dm_bht_populate(&vc->bht, io, block);
+
+		if (ret < 0) {
+			/* verity_dec_pending will handle the error case. */
+			io->error = ret;
+			break;
+		}
+	}
+}
+
+/* Asynchronously called upon the completion of I/O issued
+ * from kverityd_src_io_read. verity_dec_pending() acts as
+ * the scheduler/flow manager.
+ */
+static void kverityd_src_io_read_end(struct bio *clone, int error)
+{
+	struct dm_verity_io *io = clone->bi_private;
+
+	if (unlikely(!bio_flagged(clone, BIO_UPTODATE) && !error))
+		error = -EIO;
+
+	if (unlikely(error)) {
+		DMERR("Error occurred: %d (%llu, %u)",
+			error, ULL(clone->bi_sector), clone->bi_size);
+		io->error = error;
+	}
+
+	/* Release the clone; this keeps the block layer from leaving
+	 * offsets, etc. in unexpected states.
+	 */
+	bio_put(clone);
+
+	verity_dec_pending(io);
+}
+
+/* If not yet underway, an I/O request will be issued to the vc->dev
+ * device for the data needed. It is cloned to avoid unexpected changes
+ * to the original bio struct.
+ */
+static void kverityd_src_io_read(struct dm_verity_io *io)
+{
+	struct bio *clone;
+
+	/* Check if the read is already issued. */
+	if (io->flags & VERITY_IOFLAGS_CLONED)
+		return;
+
+	io->flags |= VERITY_IOFLAGS_CLONED;
+
+	/* Clone the bio. The block layer may modify the bvec array. */
+	clone = verity_bio_clone(io);
+	if (unlikely(!clone)) {
+		io->error = -ENOMEM;
+		return;
+	}
+
+	verity_inc_pending(io);
+
+	generic_make_request(clone);
+}
+
+/* kverityd_io services the I/O workqueue. For each pass through
+ * the I/O workqueue, a call to populate both the origin drive
+ * data and the hash tree data is made.
+ */
+static void kverityd_io(struct work_struct *work)
+{
+	struct delayed_work *dwork = container_of(work, struct delayed_work,
+						  work);
+	struct dm_verity_io *io = container_of(dwork, struct dm_verity_io,
+					       work);
+
+	/* Issue requests asynchronously. */
+	verity_inc_pending(io);
+	kverityd_src_io_read(io);
+	kverityd_io_bht_populate(io);
+	verity_dec_pending(io);
+}
+
+/* Paired with verity_dec_pending, the pending value in the io dictates the
+ * lifetime of a request and when it is ready to be processed on the
+ * workqueues.
+ */
+static void verity_inc_pending(struct dm_verity_io *io)
+{
+	atomic_inc(&io->pending);
+}
+
+/* Block-level requests start here. */
+static int verity_map(struct dm_target *ti, struct bio *bio,
+		      union map_info *map_context)
+{
+	struct dm_verity_io *io;
+	struct verity_config *vc;
+	struct request_queue *r_queue;
+
+	if (unlikely(!ti)) {
+		DMERR("dm_target was NULL");
+		return -EIO;
+	}
+
+	vc = ti->private;
+	r_queue = bdev_get_queue(vc->dev->bdev);
+
+	if (bio_data_dir(bio) == WRITE) {
+		/* If we silently drop writes, then the VFS layer will cache
+		 * the write and persist it in memory. While it doesn't change
+		 * the underlying storage, it still may be contrary to the
+		 * behavior expected by a verified, read-only device.
+		 */
+		DMWARN_LIMIT("write request received. rejecting with -EIO.");
+		verity_error(vc, NULL, -EIO);
+		return -EIO;
+	} else {
+		/* Queue up the request to be verified */
+		io = verity_io_alloc(ti, bio);
+		if (!io) {
+			DMERR_LIMIT("Failed to allocate and init IO data");
+			return DM_MAPIO_REQUEUE;
+		}
+		INIT_DELAYED_WORK(&io->work, kverityd_io);
+		queue_delayed_work(kverityd_ioq, &io->work, 0);
+	}
+
+	return DM_MAPIO_SUBMITTED;
+}
+
+static void splitarg(char *arg, char **key, char **val)
+{
+	*key = strsep(&arg, "=");
+	*val = strsep(&arg, "");
+}
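+
+/* Example (illustrative): splitarg("payload=/dev/sda2", &key, &val)
+ * leaves key = "payload" and val = "/dev/sda2"; if no '=' is present,
+ * val comes back NULL and the caller warns and stops parsing.
+ */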
+
+/*
+ * Non-block interfaces and device-mapper specific code
+ */
+
+/**
+ * verity_ctr - Construct a verified mapping
+ * @ti:   Target being created
+ * @argc: Number of elements in argv
+ * @argv: Vector of key-value pairs (see below).
+ *
+ * Accepts the following keys:
+ * @payload:        hashed device
+ * @hashtree:       device hashtree is stored on
+ * @hashstart:      start address of hashes (default 0)
+ * @block_size:     size of a hash block
+ * @alg:            hash algorithm
+ * @root_hexdigest: toplevel hash of the tree
+ * @error_behavior: what to do when verification fails [optional]
+ * @salt:           salt, in hex [optional]
+ *
+ * E.g.,
+ * payload=/dev/sda2 hashtree=/dev/sda3 alg=sha256
+ * root_hexdigest=f08aa4a3695290c569eb1b0ac032ae1040150afb527abbeb0a3da33d82fb2c6e
+ *
+ * TODO(wad):
+ * - Boot time addition
+ * - Track block verification to free block_hashes if memory use is a concern
+ * Testing needed:
+ * - Regular slub_debug tracing (on checkins)
+ * - Improper block hash padding
+ * - Improper bundle padding
+ * - Improper hash layout
+ * - Missing padding at end of device
+ * - Improperly sized underlying devices
+ * - Out of memory conditions (make sure this isn't too flaky under high load!)
+ * - Incorrect superhash
+ * - Incorrect block hashes
+ * - Incorrect bundle hashes
+ * - Boot-up read speed; sustained read speeds
+ */
+static int verity_ctr(struct dm_target *ti, unsigned int argc, char **argv)
+{
+	struct verity_config *vc = NULL;
+	int ret = 0;
+	sector_t blocks;
+	unsigned int block_size = VERITY_DEFAULT_BLOCK_SIZE;
+	const char *payload = NULL;
+	const char *hashtree = NULL;
+	unsigned long hashstart = 0;
+	const char *alg = NULL;
+	const char *root_hexdigest = NULL;
+	const char *dev_error_behavior = error_behavior;
+	const char *hexsalt = "";
+	int i;
+
+	for (i = 0; i < argc; ++i) {
+		char *key, *val;
+		DMWARN("Argument %d: '%s'", i, argv[i]);
+		splitarg(argv[i], &key, &val);
+		if (!key) {
+			DMWARN("Bad argument %d: missing key?", i);
+			break;
+		}
+		if (!val) {
+			DMWARN("Bad argument %d='%s': missing value", i, key);
+			break;
+		}
+
+		if (!strcmp(key, "alg")) {
+			alg = val;
+		} else if (!strcmp(key, "payload")) {
+			payload = val;
+		} else if (!strcmp(key, "hashtree")) {
+			hashtree = val;
+		} else if (!strcmp(key, "root_hexdigest")) {
+			root_hexdigest = val;
+		} else if (!strcmp(key, "hashstart")) {
+			if (strict_strtoul(val, 10, &hashstart)) {
+				ti->error = "Invalid hashstart";
+				return -EINVAL;
+			}
+		} else if (!strcmp(key, "block_size")) {
+			unsigned long tmp;
+			if (strict_strtoul(val, 10, &tmp) ||
+			    (tmp > UINT_MAX)) {
+				ti->error = "Invalid block_size";
+				return -EINVAL;
+			}
+			block_size = (unsigned int)tmp;
+		} else if (!strcmp(key, "error_behavior")) {
+			dev_error_behavior = val;
+		} else if (!strcmp(key, "salt")) {
+			hexsalt = val;
+		} else if (!strcmp(key, "error_behavior")) {
+			dev_error_behavior = val;
+		}
+	}
+
+#define NEEDARG(n) \
+	if (!(n)) { \
+		ti->error = "Missing argument: " #n; \
+		return -EINVAL; \
+	}
+
+	NEEDARG(alg);
+	NEEDARG(payload);
+	NEEDARG(hashtree);
+	NEEDARG(root_hexdigest);
+
+#undef NEEDARG
+
+	/* The device mapper device should be setup read-only */
+	if ((dm_table_get_mode(ti->table) & ~FMODE_READ) != 0) {
+		ti->error = "Must be created readonly.";
+		return -EINVAL;
+	}
+
+	vc = kzalloc(sizeof(*vc), GFP_KERNEL);
+	if (!vc) {
+		/* TODO(wad) if this is called from the setup helper, then we
+		 * catch these errors and do a CrOS specific thing. if not, we
+		 * need to have this call the error handler.
+		 */
+		return -EINVAL;
+	}
+
+	/* Calculate the blocks from the given device size */
+	vc->size = ti->len;
+	blocks = to_bytes(vc->size) / block_size;
+	if (dm_bht_create(&vc->bht, blocks, block_size, alg)) {
+		DMERR("failed to create required bht");
+		goto bad_bht;
+	}
+	if (dm_bht_set_root_hexdigest(&vc->bht, root_hexdigest)) {
+		DMERR("root hexdigest error");
+		goto bad_root_hexdigest;
+	}
+	dm_bht_set_salt(&vc->bht, hexsalt);
+	vc->bht.read_cb = kverityd_bht_read_callback;
+
+	/* payload: device to verify */
+	vc->start = 0;  /* TODO: should this support a starting offset? */
+	/* We only ever grab the device in read-only mode. */
+	ret = dm_get_device(ti, payload,
+			    dm_table_get_mode(ti->table), &vc->dev);
+	if (ret) {
+		DMERR("Failed to acquire device '%s': %d", payload, ret);
+		ti->error = "Device lookup failed";
+		goto bad_verity_dev;
+	}
+
+	if ((to_bytes(vc->start) % block_size) ||
+	    (to_bytes(vc->size) % block_size)) {
+		ti->error = "Device must be block_size divisble/aligned";
+		goto bad_hash_start;
+	}
+
+	vc->hash_start = (sector_t)hashstart;
+
+	/* hashtree: device with hashes.
+	 * Note, payload == hashtree is okay as long as the size of
+	 *       ti->len passed to device mapper does not include
+	 *       the hashes.
+	 */
+	if (dm_get_device(ti, hashtree,
+			  dm_table_get_mode(ti->table), &vc->hash_dev)) {
+		ti->error = "Hash device lookup failed";
+		goto bad_hash_dev;
+	}
+
+	/* alg: cryptographic digest algorithm */
+	if (snprintf(vc->hash_alg, CRYPTO_MAX_ALG_NAME, "%s", alg) >=
+	    CRYPTO_MAX_ALG_NAME) {
+		ti->error = "Hash algorithm name is too long";
+		goto bad_hash;
+	}
+
+	/* override with optional device-specific error behavior */
+	vc->error_behavior = verity_parse_error_behavior(dev_error_behavior);
+	if (vc->error_behavior == -1) {
+		ti->error = "Bad error_behavior supplied";
+		goto bad_err_behavior;
+	}
+
+	/* TODO: Maybe issue a request on the io queue for block 0? */
+
+	/* Argument processing is done, setup operational data */
+	/* Pool for dm_verity_io objects */
+	vc->io_pool = mempool_create_slab_pool(MIN_IOS, _verity_io_pool);
+	if (!vc->io_pool) {
+		ti->error = "Cannot allocate verity io mempool";
+		goto bad_slab_pool;
+	}
+
+	/* Allocate the bioset used for request padding */
+	/* TODO(wad) allocate a separate bioset for the first verify maybe */
+	vc->bs = bioset_create(MIN_BIOS, 0);
+	if (!vc->bs) {
+		ti->error = "Cannot allocate verity bioset";
+		goto bad_bs;
+	}
+
+	ti->num_flush_requests = 1;
+	ti->private = vc;
+
+	/* TODO(wad) add device and hash device names */
+	{
+		char hashdev[BDEVNAME_SIZE], vdev[BDEVNAME_SIZE];
+		bdevname(vc->hash_dev->bdev, hashdev);
+		bdevname(vc->dev->bdev, vdev);
+		DMINFO("dev:%s hash:%s [sectors:%llu blocks:%llu]", vdev,
+		       hashdev, ULL(vc->bht.sectors), ULL(blocks));
+	}
+	return 0;
+
+bad_bs:
+	mempool_destroy(vc->io_pool);
+bad_slab_pool:
+bad_err_behavior:
+bad_hash:
+	dm_put_device(ti, vc->hash_dev);
+bad_hash_dev:
+bad_hash_start:
+	dm_put_device(ti, vc->dev);
+bad_bht:
+bad_root_hexdigest:
+bad_verity_dev:
+	kfree(vc);   /* hash is not secret so no need to zero */
+	return -EINVAL;
+}
+
+static void verity_dtr(struct dm_target *ti)
+{
+	struct verity_config *vc = (struct verity_config *) ti->private;
+
+	bioset_free(vc->bs);
+	mempool_destroy(vc->io_pool);
+	dm_bht_destroy(&vc->bht);
+	dm_put_device(ti, vc->hash_dev);
+	dm_put_device(ti, vc->dev);
+	kfree(vc);
+}
+
+static int verity_status(struct dm_target *ti, status_type_t type,
+			char *result, unsigned int maxlen)
+{
+	struct verity_config *vc = (struct verity_config *) ti->private;
+	unsigned int sz = 0;
+	char hashdev[BDEVNAME_SIZE], vdev[BDEVNAME_SIZE];
+	u8 hexdigest[VERITY_MAX_DIGEST_SIZE * 2 + 1] = { 0 };
+
+	dm_bht_root_hexdigest(&vc->bht, hexdigest, sizeof(hexdigest));
+
+	switch (type) {
+	case STATUSTYPE_INFO:
+		break;
+	case STATUSTYPE_TABLE:
+		bdevname(vc->hash_dev->bdev, hashdev);
+		bdevname(vc->dev->bdev, vdev);
+		DMEMIT("/dev/%s /dev/%s %llu %u %s %s",
+			vdev,
+			hashdev,
+			ULL(vc->hash_start),
+			vc->bht.depth,
+			vc->hash_alg,
+			hexdigest);
+		break;
+	}
+	return 0;
+}
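+
+/* Example (illustrative; device names and digest are made up): the
+ * STATUSTYPE_TABLE case above emits a line such as
+ *   /dev/sda1 /dev/sda2 0 3 sha256 9f74809a2ee7607b16fcc70d9399a4de...
+ * i.e. payload dev, hash dev, hash_start, tree depth, algorithm, and
+ * root hexdigest.
+ */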
+
+static int verity_merge(struct dm_target *ti, struct bvec_merge_data *bvm,
+		       struct bio_vec *biovec, int max_size)
+{
+	struct verity_config *vc = ti->private;
+	struct request_queue *q = bdev_get_queue(vc->dev->bdev);
+
+	if (!q->merge_bvec_fn)
+		return max_size;
+
+	bvm->bi_bdev = vc->dev->bdev;
+	bvm->bi_sector = vc->start + bvm->bi_sector - ti->begin;
+
+	/* Optionally, this could just return 0 to stick to single pages. */
+	return min(max_size, q->merge_bvec_fn(q, bvm, biovec));
+}
+
+static int verity_iterate_devices(struct dm_target *ti,
+				 iterate_devices_callout_fn fn, void *data)
+{
+	struct verity_config *vc = ti->private;
+
+	return fn(ti, vc->dev, vc->start, ti->len, data);
+}
+
+static void verity_io_hints(struct dm_target *ti,
+			    struct queue_limits *limits)
+{
+	struct verity_config *vc = ti->private;
+	unsigned int block_size = vc->bht.block_size;
+
+	limits->logical_block_size = block_size;
+	limits->physical_block_size = block_size;
+	blk_limits_io_min(limits, block_size);
+}
+
+static struct target_type verity_target = {
+	.name   = "verity",
+	.version = {0, 1, 0},
+	.module = THIS_MODULE,
+	.ctr    = verity_ctr,
+	.dtr    = verity_dtr,
+	.map    = verity_map,
+	.merge  = verity_merge,
+	.status = verity_status,
+	.iterate_devices = verity_iterate_devices,
+	.io_hints = verity_io_hints,
+};
+
+#define VERITY_WQ_FLAGS (WQ_CPU_INTENSIVE|WQ_HIGHPRI)
+
+static int __init dm_verity_init(void)
+{
+	int r = -ENOMEM;
+
+	_verity_io_pool = KMEM_CACHE(dm_verity_io, 0);
+	if (!_verity_io_pool) {
+		DMERR("failed to allocate pool dm_verity_io");
+		goto bad_io_pool;
+	}
+
+	kverityd_ioq = alloc_workqueue("kverityd_io", VERITY_WQ_FLAGS, 1);
+	if (!kverityd_ioq) {
+		DMERR("failed to create workqueue kverityd_ioq");
+		goto bad_io_queue;
+	}
+
+	kveritydq = alloc_workqueue("kverityd", VERITY_WQ_FLAGS, 1);
+	if (!kveritydq) {
+		DMERR("failed to create workqueue kveritydq");
+		goto bad_verify_queue;
+	}
+
+	r = dm_register_target(&verity_target);
+	if (r < 0) {
+		DMERR("register failed %d", r);
+		goto register_failed;
+	}
+
+	DMINFO("version %u.%u.%u loaded", verity_target.version[0],
+	       verity_target.version[1], verity_target.version[2]);
+
+	return r;
+
+register_failed:
+	destroy_workqueue(kveritydq);
+bad_verify_queue:
+	destroy_workqueue(kverityd_ioq);
+bad_io_queue:
+	kmem_cache_destroy(_verity_io_pool);
+bad_io_pool:
+	return r;
+}
+
+static void __exit dm_verity_exit(void)
+{
+	destroy_workqueue(kveritydq);
+	destroy_workqueue(kverityd_ioq);
+
+	dm_unregister_target(&verity_target);
+	kmem_cache_destroy(_verity_io_pool);
+}
+
+module_init(dm_verity_init);
+module_exit(dm_verity_exit);
+
+MODULE_AUTHOR("The Chromium OS Authors <chromium-os-dev@chromium.org>");
+MODULE_DESCRIPTION(DM_NAME " target for transparent disk integrity checking");
+MODULE_LICENSE("GPL");
diff --git a/drivers/md/dm-verity.h b/drivers/md/dm-verity.h
new file mode 100644
index 0000000..e0664c9
--- /dev/null
+++ b/drivers/md/dm-verity.h
@@ -0,0 +1,45 @@
+/*
+ * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ *                    All Rights Reserved.
+ *
+ * This file is released under the GPLv2.
+ *
+ * Provide error types for use when creating a custom error handler.
+ * See Documentation/device-mapper/dm-verity.txt
+ */
+#ifndef DM_VERITY_H
+#define DM_VERITY_H
+
+#include <linux/notifier.h>
+
+struct dm_verity_error_state {
+	int code;
+	int transient;  /* Likely to not happen after a reboot */
+	u64 block;
+	const char *message;
+
+	sector_t dev_start;
+	sector_t dev_len;
+	struct block_device *dev;
+
+	sector_t hash_dev_start;
+	sector_t hash_dev_len;
+	struct block_device *hash_dev;
+
+	/* Final behavior after all notifications are completed. */
+	int behavior;
+};
+
+/* This enum must be matched to allowed_error_behaviors in dm-verity.c */
+enum dm_verity_error_behavior {
+	DM_VERITY_ERROR_BEHAVIOR_EIO = 0,
+	DM_VERITY_ERROR_BEHAVIOR_PANIC,
+	DM_VERITY_ERROR_BEHAVIOR_NONE,
+	DM_VERITY_ERROR_BEHAVIOR_NOTIFY
+};
+
+
+int dm_verity_register_error_notifier(struct notifier_block *nb);
+int dm_verity_unregister_error_notifier(struct notifier_block *nb);
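+
+/* A minimal sketch of a notifier (hypothetical; my_verity_cb and my_nb
+ * are illustrative names, not part of this patch).  Returning NOTIFY_DONE
+ * lets dm-verity honor the behavior chosen in the state struct:
+ *
+ *   static int my_verity_cb(struct notifier_block *nb,
+ *                           unsigned long transient, void *arg)
+ *   {
+ *           struct dm_verity_error_state *state = arg;
+ *
+ *           state->behavior = DM_VERITY_ERROR_BEHAVIOR_EIO;
+ *           return NOTIFY_DONE;
+ *   }
+ *   static struct notifier_block my_nb = { .notifier_call = my_verity_cb };
+ *
+ *   ...
+ *   dm_verity_register_error_notifier(&my_nb);
+ */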
+
+#endif  /* DM_VERITY_H */
diff --git a/include/linux/dm-bht.h b/include/linux/dm-bht.h
new file mode 100644
index 0000000..0595911
--- /dev/null
+++ b/include/linux/dm-bht.h
@@ -0,0 +1,166 @@
+/*
+ * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ *
+ * Device-Mapper block hash tree interface.
+ * See Documentation/device-mapper/dm-bht.txt for details.
+ *
+ * This file is released under the GPLv2.
+ */
+#ifndef __LINUX_DM_BHT_H
+#define __LINUX_DM_BHT_H
+
+#include <linux/compiler.h>
+#include <linux/crypto.h>
+#include <linux/types.h>
+
+/* To avoid allocating memory for digest tests, we just set up a
+ * max to use for now.
+ */
+#define DM_BHT_MAX_DIGEST_SIZE 128  /* 1k hashes are unlikely for now */
+#define DM_BHT_SALT_SIZE       32   /* 256 bits of salt is a lot */
+
+/* UNALLOCATED, PENDING, READY, and VERIFIED are valid states. All other
+ * values are entry-related return codes.
+ */
+#define DM_BHT_ENTRY_VERIFIED 8  /* 'nodes' has been checked against parent */
+#define DM_BHT_ENTRY_READY 4  /* 'nodes' is loaded and available */
+#define DM_BHT_ENTRY_PENDING 2  /* 'nodes' is being loaded */
+#define DM_BHT_ENTRY_UNALLOCATED 0 /* untouched */
+#define DM_BHT_ENTRY_ERROR -1 /* entry is unsuitable for use */
+#define DM_BHT_ENTRY_ERROR_IO -2 /* I/O error on load */
+
+/* Additional possible return codes */
+#define DM_BHT_ENTRY_ERROR_MISMATCH -3 /* Digest mismatch */
+
+/* dm_bht_entry
+ * Contains dm_bht->node_count tree nodes at a given tree depth.
+ * state is used to transactionally assure that data is paged in
+ * from disk.  Unless dm_bht kept running crypto contexts for each
+ * level, we need to load in the data for on-demand verification.
+ */
+struct dm_bht_entry {
+	atomic_t state; /* see defines */
+	/* Keeping an extra pointer per entry wastes up to ~33k of
+	 * memory if 1M blocks are used (or ~66k on a 64-bit arch).
+	 */
+	void *io_context;  /* Reserve a pointer for use during io */
+	/* data should only be non-NULL if fully populated. */
+	void *nodes;  /* The hash data used to verify the children.
+		       * Guaranteed to be page-aligned.
+		       */
+};
+
+/* dm_bht_level
+ * Contains an array of entries which represent a page of hashes where
+ * each hash is a node in the tree at the given tree depth/level.
+ */
+struct dm_bht_level {
+	struct dm_bht_entry *entries;  /* array of entries of tree nodes */
+	unsigned int count;  /* number of entries at this level */
+	sector_t sector;  /* starting sector for this level */
+};
+
+/* opaque context, start, databuf, sector_count */
+typedef int(*dm_bht_callback)(void *,  /* external context */
+			      sector_t,  /* start sector */
+			      u8 *,  /* destination page */
+			      sector_t,  /* num sectors */
+			      struct dm_bht_entry *);
+/* dm_bht - Device mapper block hash tree
+ * dm_bht provides a fixed interface for comparing data blocks
+ * against cryptographic hashes stored in a hash tree. It
+ * optimizes the tree structure for storage on disk.
+ *
+ * The tree is built from the bottom up.  A collection of data,
+ * external to the tree, is hashed and these hashes are stored
+ * as the blocks in the tree.  For some number of these hashes,
+ * a parent node is created by hashing them.  These steps are
+ * repeated.
+ *
+ * TODO(wad): All hash storage memory is pre-allocated and freed once an
+ * entire branch has been verified.
+ */
+struct dm_bht {
+	/* Configured values */
+	int depth;  /* Depth of the tree including the root */
+	unsigned int block_count;  /* Number of blocks hashed */
+	unsigned int block_size;  /* Size of a hash block */
+	char hash_alg[CRYPTO_MAX_ALG_NAME];
+	unsigned char salt[DM_BHT_SALT_SIZE];
+
+	/* Computed values */
+	unsigned int node_count;  /* Data size (in hashes) for each entry */
+	unsigned int node_count_shift;  /* first bit set - 1 */
+	/* There is one per CPU so that verification can run simultaneously. */
+	struct hash_desc hash_desc[NR_CPUS];  /* Container for the hash alg */
+	unsigned int digest_size;
+	sector_t sectors;  /* Number of disk sectors used */
+
+	/* bool verified;  Full tree is verified */
+	u8 root_digest[DM_BHT_MAX_DIGEST_SIZE];
+	struct dm_bht_level *levels;  /* in reverse order */
+	/* Callback for reading from the hash device */
+	dm_bht_callback read_cb;
+};
+
+/* Constructor for struct dm_bht instances. */
+int dm_bht_create(struct dm_bht *bht,
+		  unsigned int block_count,
+		  unsigned int block_size,
+		  const char *alg_name);
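+
+/* Illustrative sizing (a worked example, not part of the interface):
+ * with block_size = 4096 and sha256 (32-byte digests), 128 hashes fit
+ * per block, so node_count_shift = 7; for block_count = 32768 the tree
+ * needs DIV_ROUND_UP(fls(32767), 7) = DIV_ROUND_UP(15, 7) = 3 levels
+ * below the root.
+ */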
+/* Destructor for struct dm_bht instances.  Does not free @bht */
+void dm_bht_destroy(struct dm_bht *bht);
+
+/* Basic accessors for struct dm_bht */
+int dm_bht_set_root_hexdigest(struct dm_bht *bht, const u8 *hexdigest);
+int dm_bht_root_hexdigest(struct dm_bht *bht, u8 *hexdigest, int available);
+void dm_bht_set_salt(struct dm_bht *bht, const char *hexsalt);
+int dm_bht_salt(struct dm_bht *bht, char *hexsalt);
+
+/* Functions for loading in data from disk for verification */
+bool dm_bht_is_populated(struct dm_bht *bht, unsigned int block);
+int dm_bht_populate(struct dm_bht *bht, void *read_cb_ctx,
+		    unsigned int block);
+int dm_bht_verify_block(struct dm_bht *bht, unsigned int block,
+			struct page *pg, unsigned int offset);
+void dm_bht_read_completed(struct dm_bht_entry *entry, int status);
+
+/* Functions for converting indices to nodes. */
+
+static inline unsigned int dm_bht_get_level_shift(struct dm_bht *bht,
+						  int depth)
+{
+	return (bht->depth - depth) * bht->node_count_shift;
+}
+
+/* For the given depth, this is the entry index.  At depth+1 it is the node
+ * index for depth.
+ */
+static inline unsigned int dm_bht_index_at_level(struct dm_bht *bht,
+							int depth,
+							unsigned int leaf)
+{
+	return leaf >> dm_bht_get_level_shift(bht, depth);
+}
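+
+/* Example (illustrative): with node_count_shift = 7 and depth = 3,
+ * leaf block 300 maps to entry index 300 >> ((3 - 2) * 7) = 2 at
+ * depth 2, and to entry index 300 >> ((3 - 1) * 7) = 0 at depth 1.
+ */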
+
+static inline struct dm_bht_entry *dm_bht_get_entry(struct dm_bht *bht,
+						    int depth,
+						    unsigned int block)
+{
+	unsigned int index = dm_bht_index_at_level(bht, depth, block);
+	struct dm_bht_level *level = &bht->levels[depth];
+
+	return &level->entries[index];
+}
+
+static inline void *dm_bht_get_node(struct dm_bht *bht,
+				  struct dm_bht_entry *entry,
+				  int depth,
+				  unsigned int block)
+{
+	unsigned int index = dm_bht_index_at_level(bht, depth, block);
+	unsigned int node_index = index % bht->node_count;
+
+	return entry->nodes + (node_index * bht->digest_size);
+}
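+
+/* Example (illustrative, continuing the above): at the leaf depth the
+ * level shift is 0, so block 300 gives node_index = 300 % 128 = 44 and
+ * the node starts 44 * digest_size bytes into entry->nodes.
+ */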
+#endif  /* __LINUX_DM_BHT_H */
-- 
1.7.3.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] dm: verity target
  2011-09-28 21:30   ` Valdis.Kletnieks
  2011-09-29  1:07     ` John Stoffel
@ 2011-09-29 17:31     ` Mandeep Singh Baines
  1 sibling, 0 replies; 22+ messages in thread
From: Mandeep Singh Baines @ 2011-09-29 17:31 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: Will Drewry, Mandeep Singh Baines, Alasdair G Kergon, Milan Broz,
	dm-devel, Elly Jones, Olof Johansson, linux-kernel

Valdis.Kletnieks@vt.edu (Valdis.Kletnieks@vt.edu) wrote:
> On Tue, 27 Sep 2011 14:02:05 CDT, Will Drewry said:
> 
> > I was just curious if there is any interest in pulling this change, or
> > if not, if there is any particular set of concerns, fixes, etc.
> 
> Out of curiosity, how much of the stack does this end up eating?  My root
> filesystem is already ext4 on an LVM partition that's on a LUKS/dm-crypt
> partition on a hard drive, and I'm sure somebody out there will have used xfs
> instead - and then exported it via NFS or something. Are we going to get weird
> stack overflows if people throw dm-verity into this sort of mix?
> 

No. dm-verity uses very little stack since most of the code is running
in a separate workqueue context. The _map call is pretty light.

> > realize it's not a small amount of code to digest (though it is
> > smaller than the post from last year[1]).   Would re-posting with an
> > added blurb explaining the name be useful,
> 
> Probably will need it to be merged, unless you set up an auto-reply that says
> "Patch rejected, 'verity' is *not* a typo for 'verify'" ;)
> 
> I'll hopefully have some more comments over the weekend if I get some spare
> cycles.
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] dm: verity target
  2011-09-28 21:30   ` Valdis.Kletnieks
@ 2011-09-29  1:07     ` John Stoffel
  2011-09-29 17:31     ` Mandeep Singh Baines
  1 sibling, 0 replies; 22+ messages in thread
From: John Stoffel @ 2011-09-29  1:07 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: Will Drewry, Mandeep Singh Baines, Alasdair G Kergon, Milan Broz,
	dm-devel, Elly Jones, Olof Johansson, linux-kernel

>>>>> "Valdis" == Valdis Kletnieks <Valdis.Kletnieks@vt.edu> writes:

Valdis> On Tue, 27 Sep 2011 14:02:05 CDT, Will Drewry said:
>> I was just curious if there is any interest in pulling this change, or
>> if not, if there is any particular set of concerns, fixes, etc.

Valdis> Out of curiosity, how much of the stack does this end up eating?  My root
Valdis> filesystem is already ext4 on an LVM partition that's on a LUKS/dm-crypt
Valdis> partition on a hard drive, and I'm sure somebody out there will have used xfs
Valdis> instead - and then exported it via NFS or something. Are we going to get weird
Valdis> stack overflows if people throw dm-verity into this sort of mix?

>> realize it's not a small amount of code to digest (though it is
>> smaller than the post from last year[1]).   Would re-posting with an
>> added blob explaining the name be useful,

Valdis> Probably will need it to be merged, unless you set up an
Valdis> auto-reply that says "Patch rejected, 'verity' is *not* a typo
Valdis> for 'verify'" ;)

God, I've been reading this as veriFy all along.  I think your name
stinks because it's too close to Verify, and too obscure otherwise.

John

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] dm: verity target
  2011-09-27 19:02 ` Will Drewry
  2011-09-27 19:13   ` Alasdair G Kergon
@ 2011-09-28 21:30   ` Valdis.Kletnieks
  2011-09-29  1:07     ` John Stoffel
  2011-09-29 17:31     ` Mandeep Singh Baines
  1 sibling, 2 replies; 22+ messages in thread
From: Valdis.Kletnieks @ 2011-09-28 21:30 UTC (permalink / raw)
  To: Will Drewry
  Cc: Mandeep Singh Baines, Alasdair G Kergon, Milan Broz, dm-devel,
	Elly Jones, Olof Johansson, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 989 bytes --]

On Tue, 27 Sep 2011 14:02:05 CDT, Will Drewry said:

> I was just curious if there is any interest in pulling this change, or
> if not, if there is any particular set of concerns, fixes, etc.

Out of curiosity, how much of the stack does this end up eating?  My root
filesystem is already ext4 on an LVM partition that's on a LUKS/dm-crypt
partition on a hard drive, and I'm sure somebody out there will have used xfs
instead - and then exported it via NFS or something. Are we going to get weird
stack overflows if people throw dm-verity into this sort of mix?

> realize it's not a small amount of code to digest (though it is
> smaller than the post from last year[1]).   Would re-posting with an
> > added blurb explaining the name be useful,

Probably will need it to be merged, unless you set up an auto-reply that says
"Patch rejected, 'verity' is *not* a typo for 'verify'" ;)

I'll hopefully have some more comments over the weekend if I get some spare
cycles.


[-- Attachment #2: Type: application/pgp-signature, Size: 227 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] dm: verity target
  2011-09-27 19:13   ` Alasdair G Kergon
@ 2011-09-27 19:31     ` Will Drewry
  0 siblings, 0 replies; 22+ messages in thread
From: Will Drewry @ 2011-09-27 19:31 UTC (permalink / raw)
  To: Will Drewry, Mandeep Singh Baines, Alasdair G Kergon, Milan Broz,
	dm-devel, Elly Jones, Olof Johansson, linux-kernel

On Tue, Sep 27, 2011 at 2:13 PM, Alasdair G Kergon <agk@redhat.com> wrote:
> Well I intend to look at it seriously (and dm-switch too) once we're past the
> next merge window.
>
> Until then, my priority is finalising things scheduled for the upcoming merge
> window - in particular the new thin provisioning target.


Thanks - that makes perfect sense! (I'm quite excited to see the thin
provisioning target land and mature, as well.)

cheers!
will

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] dm: verity target
  2011-09-27 19:02 ` Will Drewry
@ 2011-09-27 19:13   ` Alasdair G Kergon
  2011-09-27 19:31     ` Will Drewry
  2011-09-28 21:30   ` Valdis.Kletnieks
  1 sibling, 1 reply; 22+ messages in thread
From: Alasdair G Kergon @ 2011-09-27 19:13 UTC (permalink / raw)
  To: Will Drewry
  Cc: Mandeep Singh Baines, Alasdair G Kergon, Milan Broz, dm-devel,
	Elly Jones, Olof Johansson, linux-kernel

Well I intend to look at it seriously (and dm-switch too) once we're past the
next merge window.

Until then, my priority is finalising things scheduled for the upcoming merge
window - in particular the new thin provisioning target.

Alasdair

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] dm: verity target
  2011-09-15 18:45 Mandeep Singh Baines
  2011-09-16 17:54 ` Valdis.Kletnieks
@ 2011-09-27 19:02 ` Will Drewry
  2011-09-27 19:13   ` Alasdair G Kergon
  2011-09-28 21:30   ` Valdis.Kletnieks
  1 sibling, 2 replies; 22+ messages in thread
From: Will Drewry @ 2011-09-27 19:02 UTC (permalink / raw)
  To: Mandeep Singh Baines, Alasdair G Kergon, Milan Broz
  Cc: dm-devel, Elly Jones, Olof Johansson, linux-kernel

Hi all!

I was just curious if there is any interest in pulling this change, or
if not, if there is any particular set of concerns, fixes, etc.  I
realize it's not a small amount of code to digest (though it is
smaller than the post from last year[1]).   Would re-posting with an
added blob explaining the name be useful, or, perhaps, a name change,
or is there anything further that would be beneficial to
consideration?  Jonathan Corbet was kind enough to wade through the
docs and code to write an article[2] which may help.  Additionally,
Mandeep and I presented[3] at the Security Summit and the Filesystems
track of Plumbers on the topic which I hope helped show the value of
this patch (everything from layering with EVM to providing tboot users
with a fast, efficient way to verify their system images without
requiring immutable media).

As usual, any and all guidance/feedback/flames will be appreciated - thanks!
will


1 - http://thread.gmane.org/gmane.linux.kernel/989307
2 - http://lwn.net/Articles/459420/
3 - http://selinuxproject.org/~jmorris/lss2011_slides/LSS_11_Integrity_checked_block_devices.pdf

On Thu, Sep 15, 2011 at 1:45 PM, Mandeep Singh Baines <msb@chromium.org> wrote:
> The verity target provides transparent integrity checking of block devices
> using a cryptographic digest.
>
> dm-verity is meant to be setup as part of a verified boot path.  This
> may be anything ranging from a boot using tboot or trustedgrub to just
> booting from a known-good device (like a USB drive or CD).
>
> dm-verity is part of ChromeOS's verified boot path. It is used to verify
> the integrity of the root filesystem on boot. The root filesystem is
> mounted on a dm-verity partition which transparently verifies each block
> with a bootloader verified hash passed into the kernel at boot.
>
> Signed-off-by: Will Drewry <wad@chromium.org>
> Signed-off-by: Elly Jones <ellyjones@chromium.org>
> Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
> Cc: Alasdair G Kergon <agk@redhat.com>
> Cc: Milan Broz <mbroz@redhat.com>
> Cc: Olof Johansson <olofj@chromium.org>
> Cc: dm-devel@redhat.com
> Cc: linux-kernel@vger.kernel.org
> ---
>  Documentation/device-mapper/dm-bht.txt    |   59 ++
>  Documentation/device-mapper/dm-verity.txt |   76 +++
>  drivers/md/Kconfig                        |   30 +
>  drivers/md/Makefile                       |    2 +
>  drivers/md/dm-bht.c                       |  541 +++++++++++++++
>  drivers/md/dm-verity.c                    | 1043 +++++++++++++++++++++++++++++
>  drivers/md/dm-verity.h                    |   45 ++
>  include/linux/dm-bht.h                    |  166 +++++
>  8 files changed, 1962 insertions(+), 0 deletions(-)
>  create mode 100644 Documentation/device-mapper/dm-bht.txt
>  create mode 100644 Documentation/device-mapper/dm-verity.txt
>  create mode 100644 drivers/md/dm-bht.c
>  create mode 100644 drivers/md/dm-verity.c
>  create mode 100644 drivers/md/dm-verity.h
>  create mode 100644 include/linux/dm-bht.h
>
> diff --git a/Documentation/device-mapper/dm-bht.txt b/Documentation/device-mapper/dm-bht.txt
> new file mode 100644
> index 0000000..21d929f
> --- /dev/null
> +++ b/Documentation/device-mapper/dm-bht.txt
> @@ -0,0 +1,59 @@
> +dm-bht
> +======
> +
> +dm-bht provides a block hash tree implementation.  The use of dm-bht allows
> +for integrity checking of a given block device without reading the entire
> +set of blocks into memory before use.
> +
> +In particular, dm-bht supplies an interface for creating and verifying a tree
> +of cryptographic digests with any algorithm supported by the kernel crypto API.
> +
> +The `verity' target is the motivating example.
> +
> +
> +Theory of operation
> +===================
> +
> +dm-bht is logically comprised of multiple nodes organized in a tree-like
> +structure.  Each node in the tree is a cryptographic hash.  If it is a leaf
> +node, the hash is of some block data on disk.  If it is an intermediary node,
> +then the hash is of a number of child nodes.
> +
> +dm-bht has a given depth starting at 1 (ignoring the root node).  Each level in
> +the tree is concretely made up of dm_bht_entry structs.  Each entry in the tree
> +is a collection of neighboring nodes that fit in one page-sized block.  The
> +number is determined based on PAGE_SIZE and the size of the selected
> +cryptographic digest algorithm.  The hashes are linearly ordered in this entry
> +and any unaligned trailing space is ignored but included when calculating the
> +parent node.
> +
> +The tree looks something like:
> +
> +alg= sha256, num_blocks = 32767
> +                                 [   root    ]
> +                                /    . . .    \
> +                     [entry_0]                 [entry_1]
> +                    /  . . .  \                 . . .   \
> +         [entry_0_0]   . . .  [entry_0_127]    . . . .  [entry_1_127]
> +           / ... \             /   . . .  \             /           \
> +     blk_0 ... blk_127  blk_16256   blk_16383      blk_32640 . . . blk_32767
> +
> +root is treated independently from the depth and the blocks are expected to
> +be hashed and supplied to the dm-bht.  Hash blocks that make up the entry
> +contents are expected to be read from disk.
> +
> +dm-bht does not handle I/O directly but instead expects the consumer to
> +supply callbacks.  The read callback will always receive a page-aligned value
> +to pass to the block device layer to read in a hash value.
> +
> +Usage
> +=====
> +
> +The API provides mechanisms for reading and verifying a tree. When reading, all
> +required data for the hash tree should be populated for a block before
> +attempting a verify.  This can be done by calling dm_bht_populate().  When all
> +data is ready, a call to dm_bht_verify_block() with the expected hash value will
> +perform both the direct block hash check and the hashes of the parent and
> +neighboring nodes where needed to ensure validity up to the root hash.  Note,
> +dm_bht_set_root_hexdigest() should be called before any verification attempts
> +occur.
> diff --git a/Documentation/device-mapper/dm-verity.txt b/Documentation/device-mapper/dm-verity.txt
> new file mode 100644
> index 0000000..f33b984
> --- /dev/null
> +++ b/Documentation/device-mapper/dm-verity.txt
> @@ -0,0 +1,76 @@
> +dm-verity
> +==========
> +
> +Device-Mapper's "verity" target provides transparent integrity checking of
> +block devices using a cryptographic digest provided by the kernel crypto API.
> +This target is read-only.
> +
> +Parameters: payload=<device path> hashtree=<hash device path> alg=<alg> \
> +            salt=<salt> root_hexdigest=<root hash> \
> +            [ hashstart=<hash start> error_behavior=<error behavior> ]
> +
> +<device path>
> +    This is the device that is going to be integrity checked.  It may be
> +    a subset of the full device as specified to dmsetup (start sector and count)
> +    It may be specified as a path, like /dev/sdaX, or a device number,
> +    <major>:<minor>.
> +
> +<hash device path>
> +    This is the device that supplies the dm-bht hash data.  It may be
> +    specified similarly to the device path and may be the same device.  If the
> +    same device is used, the hash offset should be outside of the dm-verity
> +    configured device size.
> +
> +<alg>
> +    The cryptographic hash algorithm used for this device.  This should
> +    be the name of the algorithm, like "sha1".
> +
> +<salt>
> +    Salt value (in hex).
> +
> +<root hash>
> +    The hexadecimal encoding of the cryptographic hash of all of the
> +    neighboring nodes at the first level of the tree.  This hash should be
> +    trusted, as there is no other guarantee of authenticity beyond this point.
> +
> +<hash start>
> +    Start address of hashes (default 0).
> +
> +<error behavior>
> +    0 = return -EIO. 1 = panic. 2 = none. 3 = call notifier.
> +
> +Theory of operation
> +===================
> +
> +dm-verity is meant to be setup as part of a verified boot path.  This
> +may be anything ranging from a boot using tboot or trustedgrub to just
> +booting from a known-good device (like a USB drive or CD).
> +
> +When a dm-verity device is configured, it is expected that the caller
> +has been authenticated in some way (cryptographic signatures, etc).
> +After instantiation, all hashes will be verified on-demand during
> +disk access.  If they cannot be verified up to the root node of the
> +tree, the root hash, then the I/O will fail.  This should identify
> +tampering with any data on the device and the hash data.
> +
> +Cryptographic hashes are used to assert the integrity of the device on a
> +per-block basis.  This allows for a lightweight hash computation on first read
> +into the page cache.  Block hashes are stored linearly aligned to the nearest
> +block the size of a page.
> +
> +For more information on the hashing process, see dm-bht.txt.
> +
> +
> +Example
> +=======
> +
> +Set up a device:
> +[[
> +  dmsetup create vroot --table \
> +    "0 204800 verity payload=/dev/sda1 hashtree=/dev/sda2 alg=sha1 "\
> +    "root_hexdigest=9f74809a2ee7607b16fcc70d9399a4de9725a727"
> +]]
> +
> +A command line tool is available to compute the hash tree and return the
> +root hash value.
> +  http://git.chromium.org/cgi-bin/gitweb.cgi?p=dm-verity.git;a=tree
> diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
> index f75a66e..cb5f425 100644
> --- a/drivers/md/Kconfig
> +++ b/drivers/md/Kconfig
> @@ -334,4 +334,34 @@ config DM_FLAKEY
>        ---help---
>          A target that intermittently fails I/O for debugging purposes.
>
> +config DM_BHT
> +        tristate "Block hash tree support"
> +        select CRYPTO
> +        select CRYPTO_HASH
> +        ---help---
> +          Include support for device-mapper devices to use a block hash
> +          tree for managing data integrity checks in a scalable way.
> +
> +          Targets that use this functionality should include it
> +          automatically.
> +
> +          If unsure, say N.
> +
> +config DM_VERITY
> +        tristate "Verity target support"
> +        depends on BLK_DEV_DM
> +        select DM_BHT
> +        select CRYPTO
> +        select CRYPTO_HASH
> +        ---help---
> +          This device-mapper target allows you to create a device that
> +          transparently integrity checks the data on it. You'll need to
> +          activate the digests you're going to use in the cryptoapi
> +          configuration.
> +
> +          To compile this code as a module, choose M here: the module will
> +          be called dm-verity.
> +
> +          If unsure, say N.
> +
>  endif # MD
> diff --git a/drivers/md/Makefile b/drivers/md/Makefile
> index 448838b..58eb088 100644
> --- a/drivers/md/Makefile
> +++ b/drivers/md/Makefile
> @@ -36,6 +36,8 @@ obj-$(CONFIG_DM_MULTIPATH_ST) += dm-service-time.o
>  obj-$(CONFIG_DM_SNAPSHOT)      += dm-snapshot.o
>  obj-$(CONFIG_DM_MIRROR)                += dm-mirror.o dm-log.o dm-region-hash.o
>  obj-$(CONFIG_DM_LOG_USERSPACE) += dm-log-userspace.o
> +obj-$(CONFIG_DM_BHT)            += dm-bht.o
> +obj-$(CONFIG_DM_VERITY)         += dm-verity.o
>  obj-$(CONFIG_DM_ZERO)          += dm-zero.o
>  obj-$(CONFIG_DM_RAID)  += dm-raid.o
>
> diff --git a/drivers/md/dm-bht.c b/drivers/md/dm-bht.c
> new file mode 100644
> index 0000000..32b8ccf
> --- /dev/null
> +++ b/drivers/md/dm-bht.c
> @@ -0,0 +1,541 @@
> + /*
> + * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev@chromium.org>
> + *
> + * Device-Mapper block hash tree interface.
> + * See Documentation/device-mapper/dm-bht.txt for details.
> + *
> + * This file is released under the GPLv2.
> + */
> +
> +#include <linux/atomic.h>
> +#include <linux/bitops.h>
> +#include <linux/bug.h>
> +#include <linux/cpumask.h>
> +#include <linux/device-mapper.h>
> +#include <linux/dm-bht.h>
> +#include <linux/err.h>
> +#include <linux/errno.h>
> +#include <linux/gfp.h>
> +#include <linux/kernel.h>
> +#include <linux/mm_types.h>
> +#include <linux/scatterlist.h>
> +#include <linux/slab.h>
> +#include <linux/string.h>
> +
> +#define DM_MSG_PREFIX "dm bht"
> +
> +
> +/*
> + * Utilities
> + */
> +
> +static u8 from_hex(u8 ch)
> +{
> +       if ((ch >= '0') && (ch <= '9'))
> +               return ch - '0';
> +       if ((ch >= 'a') && (ch <= 'f'))
> +               return ch - 'a' + 10;
> +       if ((ch >= 'A') && (ch <= 'F'))
> +               return ch - 'A' + 10;
> +       return -1;
> +}
> +
> +/**
> + * dm_bht_bin_to_hex - converts a binary stream to human-readable hex
> + * @binary:    a byte array of length @binary_len
> + * @hex:       a byte array of length @binary_len * 2 + 1
> + */
> +static void dm_bht_bin_to_hex(u8 *binary, u8 *hex, unsigned int binary_len)
> +{
> +       while (binary_len-- > 0) {
> +               sprintf((char *)hex, "%02hhx", (int)*binary);
> +               hex += 2;
> +               binary++;
> +       }
> +}
> +
> +/**
> + * dm_bht_hex_to_bin - converts a hex stream to binary
> + * @binary:    a byte array of length @binary_len
> + * @hex:       a byte array of length @binary_len * 2 + 1
> + */
> +static void dm_bht_hex_to_bin(u8 *binary, const u8 *hex,
> +                             unsigned int binary_len)
> +{
> +       while (binary_len-- > 0) {
> +               *binary = from_hex(*(hex++));
> +               *binary *= 16;
> +               *binary += from_hex(*(hex++));
> +               binary++;
> +       }
> +}
> +
> +static void dm_bht_log_mismatch(struct dm_bht *bht, u8 *given, u8 *computed)
> +{
> +       u8 given_hex[DM_BHT_MAX_DIGEST_SIZE * 2 + 1];
> +       u8 computed_hex[DM_BHT_MAX_DIGEST_SIZE * 2 + 1];
> +
> +       dm_bht_bin_to_hex(given, given_hex, bht->digest_size);
> +       dm_bht_bin_to_hex(computed, computed_hex, bht->digest_size);
> +       DMERR_LIMIT("%s != %s", given_hex, computed_hex);
> +}
> +
> +/**
> + * dm_bht_compute_hash: hashes a page of data
> + */
> +static int dm_bht_compute_hash(struct dm_bht *bht, struct page *pg,
> +                              unsigned int offset, u8 *digest)
> +{
> +       struct hash_desc *hash_desc = &bht->hash_desc[smp_processor_id()];
> +       struct scatterlist sg;
> +
> +       sg_init_table(&sg, 1);
> +       sg_set_page(&sg, pg, bht->block_size, offset);
> +       /* Note, this is synchronous. */
> +       if (crypto_hash_init(hash_desc)) {
> +               DMCRIT("failed to reinitialize crypto hash (proc:%d)",
> +                       smp_processor_id());
> +               return -EINVAL;
> +       }
> +       if (crypto_hash_update(hash_desc, &sg, bht->block_size)) {
> +               DMCRIT("crypto_hash_update failed");
> +               return -EINVAL;
> +       }
> +       sg_set_buf(&sg, bht->salt, sizeof(bht->salt));
> +       if (crypto_hash_update(hash_desc, &sg, sizeof(bht->salt))) {
> +               DMCRIT("crypto_hash_update failed");
> +               return -EINVAL;
> +       }
> +       if (crypto_hash_final(hash_desc, digest)) {
> +               DMCRIT("crypto_hash_final failed");
> +               return -EINVAL;
> +       }
> +
> +       return 0;
> +}
> +
> +/*
> + * Implementation functions
> + */
> +
> +static int dm_bht_initialize_entries(struct dm_bht *bht)
> +{
> + *     last represents the index of the last digest stored in the tree.
> +        * By walking the tree with that index, it is possible to compute the
> +        * total number of entries at each level.
> +        *
> +        * Since each entry will contain up to |node_count| nodes of the tree,
> +        * it is possible that the last index may not be at the end of a given
> +        * entry->nodes.  In that case, it is assumed the value is padded.
> +        *
> +        * Note, we treat both the tree root (1 hash) and the tree leaves
> +        * independently from the bht data structures.  Logically, the root is
> +        * depth=-1 and the block layer level is depth=bht->depth
> +        */
> +       unsigned int last = bht->block_count;
> +       int depth;
> +
> +       /* check that the largest level->count can't result in an int overflow
> +        * on allocation or sector calculation.
> +        */
> +       if (((last >> bht->node_count_shift) + 1) >
> +           UINT_MAX / max((unsigned int)sizeof(struct dm_bht_entry),
> +                          (unsigned int)to_sector(bht->block_size))) {
> +               DMCRIT("required entries %u is too large", last + 1);
> +               return -EINVAL;
> +       }
> +
> +       /* Track the current sector location for each level so we don't have to
> +        * compute it during traversals.
> +        */
> +       bht->sectors = 0;
> +       for (depth = 0; depth < bht->depth; ++depth) {
> +               struct dm_bht_level *level = &bht->levels[depth];
> +
> +               level->count = dm_bht_index_at_level(bht, depth, last) + 1;
> +               level->entries = (struct dm_bht_entry *)
> +                                kcalloc(level->count,
> +                                        sizeof(struct dm_bht_entry),
> +                                        GFP_KERNEL);
> +               if (!level->entries) {
> +                       DMERR("failed to allocate entries for depth %d", depth);
> +                       return -ENOMEM;
> +               }
> +               level->sector = bht->sectors;
> +               bht->sectors += level->count * to_sector(bht->block_size);
> +       }
> +
> +       return 0;
> +}
> +
> +/**
> + * dm_bht_create - prepares @bht for use
> + * @bht:       pointer to the dm_bht to initialize
> + * @block_count: the number of block hashes / tree leaves
> + * @block_size: size of a hash block, in bytes
> + * @alg_name:  crypto hash algorithm name
> + *
> + * Returns 0 on success.
> + *
> + * Callers can offset into devices by storing the data in the io callbacks.
> + */
> +int dm_bht_create(struct dm_bht *bht, unsigned int block_count,
> +                 unsigned int block_size, const char *alg_name)
> +{
> +       int cpu, status;
> +
> +       bht->block_size = block_size;
> +       /* Verify that PAGE_SIZE >= block_size >= SECTOR_SIZE. */
> +       if ((block_size > PAGE_SIZE) ||
> +           (PAGE_SIZE % block_size) ||
> +           (to_sector(block_size) == 0))
> +               return -EINVAL;
> +
> +       /* Setup the hash first. Its length determines much of the bht layout */
> +       for (cpu = 0; cpu < nr_cpu_ids; ++cpu) {
> +               bht->hash_desc[cpu].tfm = crypto_alloc_hash(alg_name, 0, 0);
> +               if (IS_ERR(bht->hash_desc[cpu].tfm)) {
> +                       DMERR("failed to allocate crypto hash '%s'", alg_name);
> +                       status = -ENOMEM;
> +                       bht->hash_desc[cpu].tfm = NULL;
> +                       goto bad_arg;
> +               }
> +       }
> +       bht->digest_size = crypto_hash_digestsize(bht->hash_desc[0].tfm);
> +       /* We expect to be able to pack >=2 hashes into a block */
> +       if (block_size / bht->digest_size < 2) {
> +               DMERR("too few hashes fit in a block");
> +               status = -EINVAL;
> +               goto bad_arg;
> +       }
> +
> +       if (bht->digest_size > DM_BHT_MAX_DIGEST_SIZE) {
> +               DMERR("DM_BHT_MAX_DIGEST_SIZE too small for chosen digest");
> +               status = -EINVAL;
> +               goto bad_arg;
> +       }
> +
> +       /* Configure the tree */
> +       bht->block_count = block_count;
> +       if (block_count == 0) {
> +               DMERR("block_count must be non-zero");
> +               status = -EINVAL;
> +               goto bad_arg;
> +       }
> +
> +       /* Each dm_bht_entry->nodes is one block.  The node code tracks
> +        * how many nodes fit into one entry where a node is a single
> +        * hash (message digest).
> +        */
> +       bht->node_count_shift = fls(block_size / bht->digest_size) - 1;
> +       /* Round down to the nearest power of two.  This makes indexing
> +        * into the tree much less painful.
> +        */
> +       bht->node_count = 1 << bht->node_count_shift;
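> +       /* For example, a 4096-byte block holds 128 32-byte (sha256)
> +        * digests, so node_count_shift = 7 and node_count = 128.
> +        */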
> +
> +       /* This is unlikely to happen, but with 64k pages, who knows. */
> +       if (bht->node_count > UINT_MAX / bht->digest_size) {
> +               DMERR("node_count * hash_len exceeds UINT_MAX!");
> +               status = -EINVAL;
> +               goto bad_arg;
> +       }
> +
> +       bht->depth = DIV_ROUND_UP(fls(block_count - 1), bht->node_count_shift);
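> +       /* e.g. 2^20 data blocks with node_count_shift = 7 gives
> +        * depth = DIV_ROUND_UP(20, 7) = 3 levels of hash blocks.
> +        */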
> +
> +       /* Ensure that we can safely shift by this value. */
> +       if (bht->depth * bht->node_count_shift >= sizeof(unsigned int) * 8) {
> +               DMERR("specified depth and node_count_shift is too large");
> +               status = -EINVAL;
> +               goto bad_arg;
> +       }
> +
> +       /* Allocate levels. Each level of the tree may have an arbitrary number
> +        * of dm_bht_entry structs.  Each entry contains node_count nodes.
> +        * Each node in the tree is a cryptographic digest of either node_count
> +        * nodes on the subsequent level or of a specific block on disk.
> +        */
> +       bht->levels = (struct dm_bht_level *)
> +                       kcalloc(bht->depth,
> +                               sizeof(struct dm_bht_level), GFP_KERNEL);
> +       if (!bht->levels) {
> +               DMERR("failed to allocate tree levels");
> +               status = -ENOMEM;
> +               goto bad_level_alloc;
> +       }
> +
> +       bht->read_cb = NULL;
> +
> +       status = dm_bht_initialize_entries(bht);
> +       if (status)
> +               goto bad_entries_alloc;
> +
> +       /* We compute depth such that there is only 1 block at level 0. */
> +       BUG_ON(bht->levels[0].count != 1);
> +
> +       return 0;
> +
> +bad_entries_alloc:
> +       while (bht->depth-- > 0)
> +               kfree(bht->levels[bht->depth].entries);
> +       kfree(bht->levels);
> +bad_level_alloc:
> +bad_arg:
> +       for (cpu = 0; cpu < nr_cpu_ids; ++cpu)
> +               if (bht->hash_desc[cpu].tfm)
> +                       crypto_free_hash(bht->hash_desc[cpu].tfm);
> +       return status;
> +}
> +EXPORT_SYMBOL(dm_bht_create);
> +
> +/**
> + * dm_bht_read_completed
> + * @entry:     pointer to the entry that's been loaded
> + * @status:    I/O status. Non-zero is failure.
> + * MUST always be called after a read_cb completes.
> + */
> +void dm_bht_read_completed(struct dm_bht_entry *entry, int status)
> +{
> +       if (status) {
> +               /* TODO(wad) add retry support */
> +               DMCRIT("an I/O error occurred while reading entry");
> +               atomic_set(&entry->state, DM_BHT_ENTRY_ERROR_IO);
> +               /* entry->nodes will be freed later */
> +               return;
> +       }
> +       BUG_ON(atomic_read(&entry->state) != DM_BHT_ENTRY_PENDING);
> +       atomic_set(&entry->state, DM_BHT_ENTRY_READY);
> +}
> +EXPORT_SYMBOL(dm_bht_read_completed);
> +
> +/**
> + * dm_bht_verify_block - checks that all nodes in the path for @block are valid
> + * @bht:       pointer to a dm_bht_create()d bht
> + * @block:     the block whose data is being verified
> + * @pg:        page holding the block data
> + * @offset:    offset into the page
> + *
> + * Returns 0 on success, DM_BHT_ENTRY_ERROR_MISMATCH on error.
> + */
> +int dm_bht_verify_block(struct dm_bht *bht, unsigned int block,
> +                       struct page *pg, unsigned int offset)
> +{
> +       int state, depth = bht->depth;
> +       u8 digest[DM_BHT_MAX_DIGEST_SIZE];
> +       struct dm_bht_entry *entry;
> +       void *node;
> +
> +       do {
> +               /* Need to check that the hash of the current block is accurate
> +                * in its parent.
> +                */
> +               entry = dm_bht_get_entry(bht, depth - 1, block);
> +               state = atomic_read(&entry->state);
> +               /* This call is only safe if all nodes along the path
> +                * are already populated (i.e. READY) via dm_bht_populate.
> +                */
> +               BUG_ON(state < DM_BHT_ENTRY_READY);
> +               node = dm_bht_get_node(bht, entry, depth, block);
> +
> +               if (dm_bht_compute_hash(bht, pg, offset, digest) ||
> +                   memcmp(digest, node, bht->digest_size))
> +                       goto mismatch;
> +
> +               /* Keep the containing block of hashes to be verified in the
> +                * next pass.
> +                */
> +               pg = virt_to_page(entry->nodes);
> +               offset = offset_in_page(entry->nodes);
> +       } while (--depth > 0 && state != DM_BHT_ENTRY_VERIFIED);
> +
> +       if (depth == 0 && state != DM_BHT_ENTRY_VERIFIED) {
> +               if (dm_bht_compute_hash(bht, pg, offset, digest) ||
> +                   memcmp(digest, bht->root_digest, bht->digest_size))
> +                       goto mismatch;
> +               atomic_set(&entry->state, DM_BHT_ENTRY_VERIFIED);
> +       }
> +
> +       /* Mark path to leaf as verified. */
> +       for (depth++; depth < bht->depth; depth++) {
> +               entry = dm_bht_get_entry(bht, depth, block);
> +               /* At this point, entry can only be in VERIFIED or READY state.
> +                * So it is safe to use atomic_set instead of atomic_cmpxchg.
> +                */
> +               atomic_set(&entry->state, DM_BHT_ENTRY_VERIFIED);
> +       }
> +
> +       return 0;
> +
> +mismatch:
> +       DMERR_LIMIT("verify_path: failed to verify hash (d=%d,bi=%u)",
> +                   depth, block);
> +       dm_bht_log_mismatch(bht, node, digest);
> +       return DM_BHT_ENTRY_ERROR_MISMATCH;
> +}
> +EXPORT_SYMBOL(dm_bht_verify_block);
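> +
> +/* Usage sketch: assumes the path for @block was loaded via dm_bht_populate()
> + * and every read_cb completion called dm_bht_read_completed(), so all
> + * entries on the path are at least READY:
> + *
> + *     if (dm_bht_verify_block(bht, block, pg, offset))
> + *             handle_mismatch();          (hypothetical handler)
> + */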
> +
> +/**
> + * dm_bht_is_populated - check that entries from disk needed to verify a given
> + *                       block are all ready
> + * @bht:       pointer to a dm_bht_create()d bht
> + * @block:     the block whose hash-path entries are checked
> + *
> + * Callers may wish to call dm_bht_is_populated() when checking an io
> + * for which entries were already pending.
> + */
> +bool dm_bht_is_populated(struct dm_bht *bht, unsigned int block)
> +{
> +       int depth;
> +
> +       for (depth = bht->depth - 1; depth >= 0; depth--) {
> +               struct dm_bht_entry *entry = dm_bht_get_entry(bht, depth,
> +                                                             block);
> +               if (atomic_read(&entry->state) < DM_BHT_ENTRY_READY)
> +                       return false;
> +       }
> +
> +       return true;
> +}
> +EXPORT_SYMBOL(dm_bht_is_populated);
> +
> +/**
> + * dm_bht_populate - reads entries from disk needed to verify a given block
> + * @bht:       pointer to a dm_bht_create()d bht
> + * @ctx:       context used for all read_cb calls on this request
> + * @block:     the block whose hash path should be loaded
> + *
> + * Returns negative value on error. Returns 0 on success.
> + */
> +int dm_bht_populate(struct dm_bht *bht, void *ctx, unsigned int block)
> +{
> +       int depth, state;
> +
> +       BUG_ON(block >= bht->block_count);
> +
> +       for (depth = bht->depth - 1; depth >= 0; --depth) {
> +               unsigned int index = dm_bht_index_at_level(bht, depth, block);
> +               struct dm_bht_level *level = &bht->levels[depth];
> +               struct dm_bht_entry *entry = dm_bht_get_entry(bht, depth,
> +                                                             block);
> +               state = atomic_cmpxchg(&entry->state,
> +                                      DM_BHT_ENTRY_UNALLOCATED,
> +                                      DM_BHT_ENTRY_PENDING);
> +               if (state == DM_BHT_ENTRY_VERIFIED)
> +                       break;
> +               if (state <= DM_BHT_ENTRY_ERROR)
> +                       goto error_state;
> +               if (state != DM_BHT_ENTRY_UNALLOCATED)
> +                       continue;
> +
> +               /* Current entry is claimed for allocation and loading */
> +               entry->nodes = kmalloc(bht->block_size, GFP_NOIO);
> +               if (!entry->nodes)
> +                       goto nomem;
> +
> +               bht->read_cb(ctx,
> +                            level->sector + to_sector(index * bht->block_size),
> +                            entry->nodes, to_sector(bht->block_size), entry);
> +       }
> +
> +       return 0;
> +
> +error_state:
> +       DMCRIT("block %u at depth %d is in an error state", block, depth);
> +       return -EPERM;
> +
> +nomem:
> +       DMCRIT("failed to allocate memory for entry->nodes");
> +       return -ENOMEM;
> +}
> +EXPORT_SYMBOL(dm_bht_populate);
> +
> +/**
> + * dm_bht_destroy - cleans up all memory used by @bht
> + * @bht:       pointer to a dm_bht_create()d bht
> + */
> +void dm_bht_destroy(struct dm_bht *bht)
> +{
> +       int depth, cpu;
> +
> +       for (depth = 0; depth < bht->depth; depth++) {
> +               struct dm_bht_entry *entry = bht->levels[depth].entries;
> +               struct dm_bht_entry *entry_end = entry +
> +                                                bht->levels[depth].count;
> +               for (; entry < entry_end; ++entry)
> +                       kfree(entry->nodes);
> +               kfree(bht->levels[depth].entries);
> +       }
> +       kfree(bht->levels);
> +       for (cpu = 0; cpu < nr_cpu_ids; ++cpu)
> +               if (bht->hash_desc[cpu].tfm)
> +                       crypto_free_hash(bht->hash_desc[cpu].tfm);
> +}
> +EXPORT_SYMBOL(dm_bht_destroy);
> +
> +/*
> + * Accessors
> + */
> +
> +/**
> + * dm_bht_set_root_hexdigest - sets an unverified root digest hash from hex
> + * @bht:       pointer to a dm_bht_create()d bht
> + * @hexdigest: array of u8s containing the new digest, in hex
> + * Returns non-zero on error.  hexdigest should be NUL terminated.
> + */
> +int dm_bht_set_root_hexdigest(struct dm_bht *bht, const u8 *hexdigest)
> +{
> +       /* Make sure we have at least the bytes expected */
> +       if (strnlen((char *)hexdigest, bht->digest_size * 2) !=
> +           bht->digest_size * 2) {
> +               DMERR("root digest length does not match hash algorithm");
> +               return -1;
> +       }
> +       dm_bht_hex_to_bin(bht->root_digest, hexdigest, bht->digest_size);
> +       return 0;
> +}
> +EXPORT_SYMBOL(dm_bht_set_root_hexdigest);
> +
> +/**
> + * dm_bht_root_hexdigest - returns root digest in hex
> + * @bht:       pointer to a dm_bht_create()d bht
> + * @hexdigest: u8 array of size @available
> + * @available: must be at least bht->digest_size * 2 + 1
> + */
> +int dm_bht_root_hexdigest(struct dm_bht *bht, u8 *hexdigest, int available)
> +{
> +       if (available < 0 ||
> +           ((unsigned int) available) < bht->digest_size * 2 + 1) {
> +               DMERR("hexdigest has too few bytes available");
> +               return -EINVAL;
> +       }
> +       dm_bht_bin_to_hex(bht->root_digest, hexdigest, bht->digest_size);
> +       return 0;
> +}
> +EXPORT_SYMBOL(dm_bht_root_hexdigest);
> +
> +/**
> + * dm_bht_set_salt - sets the salt used, in hex
> + * @bht:      pointer to a dm_bht_create()d bht
> + * @hexsalt:  salt string, as hex; will be zero-padded or truncated to
> + *            DM_BHT_SALT_SIZE * 2 hex digits.
> + */
> +void dm_bht_set_salt(struct dm_bht *bht, const char *hexsalt)
> +{
> +       size_t saltlen = min(strlen(hexsalt) / 2, sizeof(bht->salt));
> +
> +       memset(bht->salt, 0, sizeof(bht->salt));
> +       dm_bht_hex_to_bin(bht->salt, (const u8 *)hexsalt, saltlen);
> +}
> +EXPORT_SYMBOL(dm_bht_set_salt);
> +
> +/**
> + * dm_bht_salt - returns the salt used, in hex
> + * @bht:      pointer to a dm_bht_create()d bht
> + * @hexsalt:  buffer to put salt into, of length DM_BHT_SALT_SIZE * 2 + 1.
> + */
> +int dm_bht_salt(struct dm_bht *bht, char *hexsalt)
> +{
> +       dm_bht_bin_to_hex(bht->salt, (u8 *)hexsalt, sizeof(bht->salt));
> +       return 0;
> +}
> +EXPORT_SYMBOL(dm_bht_salt);
> +
> diff --git a/drivers/md/dm-verity.c b/drivers/md/dm-verity.c
> new file mode 100644
> index 0000000..a9bd0e8
> --- /dev/null
> +++ b/drivers/md/dm-verity.c
> @@ -0,0 +1,1043 @@
> +/*
> + * Originally based on dm-crypt.c,
> + * Copyright (C) 2003 Christophe Saout <christophe@saout.de>
> + * Copyright (C) 2004 Clemens Fruhwirth <clemens@endorphin.org>
> + * Copyright (C) 2006-2008 Red Hat, Inc. All rights reserved.
> + * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev@chromium.org>
> + *                    All Rights Reserved.
> + *
> + * This file is released under the GPLv2.
> + *
> + * Implements a verifying transparent block device.
> + * See Documentation/device-mapper/dm-verity.txt
> + */
> +#include <linux/async.h>
> +#include <linux/atomic.h>
> +#include <linux/bio.h>
> +#include <linux/blkdev.h>
> +#include <linux/delay.h>
> +#include <linux/device.h>
> +#include <linux/err.h>
> +#include <linux/genhd.h>
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/mempool.h>
> +#include <linux/mm_types.h>
> +#include <linux/module.h>
> +#include <linux/slab.h>
> +#include <linux/workqueue.h>
> +#include <linux/device-mapper.h>
> +#include <linux/dm-bht.h>
> +
> +#include "dm-verity.h"
> +
> +#define DM_MSG_PREFIX "verity"
> +
> +/* Supports up to 512-bit digests */
> +#define VERITY_MAX_DIGEST_SIZE 64
> +
> +/* TODO(wad) make both of these report the error line/file to a
> + *           verity_bug function.
> + */
> +#define VERITY_BUG(msg...) BUG()
> +#define VERITY_BUG_ON(cond, msg...) BUG_ON(cond)
> +
> +/* Helper for printing sector_t */
> +#define ULL(x) ((unsigned long long)(x))
> +
> +#define MIN_IOS 32
> +#define MIN_BIOS (MIN_IOS * 2)
> +#define VERITY_DEFAULT_BLOCK_SIZE 4096
> +
> +/* Provide a lightweight means of specifying the global default for
> + * error behavior: eio, panic, none, or notify.
> + * Legacy support for 0 = eio, 1 = reboot/panic, 2 = none, 3 = notify.
> + * This is matched to the enum in dm-verity.h.
> + */
> +static const char * const allowed_error_behaviors[] = { "eio", "panic", "none",
> +                                                       "notify", NULL };
> +static char *error_behavior = "eio";
> +module_param(error_behavior, charp, 0644);
> +MODULE_PARM_DESC(error_behavior, "Behavior on error "
> +                                "(eio, panic, none, notify)");
> +
> +/* Controls whether verity_get_device will wait forever for a device. */
> +static int dev_wait;
> +module_param(dev_wait, bool, 0444);
> +MODULE_PARM_DESC(dev_wait, "Wait forever for a backing device");
> +
> +/* per-requested-bio private data */
> +enum verity_io_flags {
> +       VERITY_IOFLAGS_CLONED = 0x1,    /* original bio has been cloned */
> +};
> +
> +struct dm_verity_io {
> +       struct dm_target *target;
> +       struct bio *bio;
> +       struct delayed_work work;
> +       unsigned int flags;
> +
> +       int error;
> +       atomic_t pending;
> +
> +       u64 block;  /* aligned block index */
> +       u64 count;  /* aligned count in blocks */
> +};
> +
> +struct verity_config {
> +       struct dm_dev *dev;
> +       sector_t start;
> +       sector_t size;
> +
> +       struct dm_dev *hash_dev;
> +       sector_t hash_start;
> +
> +       struct dm_bht bht;
> +
> +       /* Pool required for io contexts */
> +       mempool_t *io_pool;
> +       /* Pool and bios required for making sure that backing device reads are
> +        * in PAGE_SIZE increments.
> +        */
> +       struct bio_set *bs;
> +
> +       char hash_alg[CRYPTO_MAX_ALG_NAME];
> +
> +       int error_behavior;
> +};
> +
> +static struct kmem_cache *_verity_io_pool;
> +static struct workqueue_struct *kveritydq, *kverityd_ioq;
> +
> +static void kverityd_verify(struct work_struct *work);
> +static void kverityd_io(struct work_struct *work);
> +static void kverityd_io_bht_populate(struct dm_verity_io *io);
> +static void kverityd_io_bht_populate_end(struct bio *, int error);
> +
> +static BLOCKING_NOTIFIER_HEAD(verity_error_notifier);
> +
> +/*
> + * Exported interfaces
> + */
> +
> +int dm_verity_register_error_notifier(struct notifier_block *nb)
> +{
> +       return blocking_notifier_chain_register(&verity_error_notifier, nb);
> +}
> +EXPORT_SYMBOL_GPL(dm_verity_register_error_notifier);
> +
> +int dm_verity_unregister_error_notifier(struct notifier_block *nb)
> +{
> +       return blocking_notifier_chain_unregister(&verity_error_notifier, nb);
> +}
> +EXPORT_SYMBOL_GPL(dm_verity_unregister_error_notifier);
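> +
> +/* Registration sketch for a hypothetical client module.  verity_error()
> + * invokes the chain with the transient flag and a dm_verity_error_state
> + * whose .behavior field the callback may override (the chain must return
> + * 0 for the override to take effect):
> + *
> + *     static int my_cb(struct notifier_block *nb, unsigned long transient,
> + *                      void *arg)
> + *     {
> + *             struct dm_verity_error_state *state = arg;
> + *
> + *             state->behavior = DM_VERITY_ERROR_BEHAVIOR_EIO;
> + *             return NOTIFY_DONE;
> + *     }
> + *     static struct notifier_block my_nb = { .notifier_call = my_cb };
> + *
> + *     dm_verity_register_error_notifier(&my_nb);
> + */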
> +
> +/*
> + * Allocation and utility functions
> + */
> +
> +static void kverityd_src_io_read_end(struct bio *clone, int error);
> +
> +/* Shared destructor for all internal bios */
> +static void dm_verity_bio_destructor(struct bio *bio)
> +{
> +       struct dm_verity_io *io = bio->bi_private;
> +       struct verity_config *vc = io->target->private;
> +       bio_free(bio, vc->bs);
> +}
> +
> +static struct bio *verity_alloc_bioset(struct verity_config *vc, gfp_t gfp_mask,
> +                                      int nr_iovecs)
> +{
> +       return bio_alloc_bioset(gfp_mask, nr_iovecs, vc->bs);
> +}
> +
> +static struct dm_verity_io *verity_io_alloc(struct dm_target *ti,
> +                                           struct bio *bio)
> +{
> +       struct verity_config *vc = ti->private;
> +       sector_t sector = bio->bi_sector - ti->begin;
> +       struct dm_verity_io *io;
> +
> +       io = mempool_alloc(vc->io_pool, GFP_NOIO);
> +       if (unlikely(!io))
> +               return NULL;
> +       io->flags = 0;
> +       io->target = ti;
> +       io->bio = bio;
> +       io->error = 0;
> +
> +       /* Adjust the sector by the virtual starting sector */
> +       io->block = to_bytes(sector) / vc->bht.block_size;
> +       io->count = bio->bi_size / vc->bht.block_size;
> +
> +       atomic_set(&io->pending, 0);
> +
> +       return io;
> +}
> +
> +static struct bio *verity_bio_clone(struct dm_verity_io *io)
> +{
> +       struct verity_config *vc = io->target->private;
> +       struct bio *bio = io->bio;
> +       struct bio *clone = verity_alloc_bioset(vc, GFP_NOIO, bio->bi_max_vecs);
> +
> +       if (!clone)
> +               return NULL;
> +
> +       __bio_clone(clone, bio);
> +       clone->bi_private = io;
> +       clone->bi_end_io  = kverityd_src_io_read_end;
> +       clone->bi_bdev    = vc->dev->bdev;
> +       clone->bi_sector += vc->start - io->target->begin;
> +       clone->bi_destructor = dm_verity_bio_destructor;
> +
> +       return clone;
> +}
> +
> +/* If the request is not successful, this handler takes action.
> + * TODO make this call a registered handler.
> + */
> +static void verity_error(struct verity_config *vc, struct dm_verity_io *io,
> +                        int error)
> +{
> +       const char *message;
> +       int error_mode = DM_VERITY_ERROR_BEHAVIOR_PANIC;
> +       dev_t devt = 0;
> +       u64 block = ~0;
> +       int transient = 1;
> +       struct dm_verity_error_state error_state;
> +
> +       if (vc) {
> +               devt = vc->dev->bdev->bd_dev;
> +               error_mode = vc->error_behavior;
> +       }
> +
> +       if (io) {
> +               io->error = -EIO;
> +               block = io->block;
> +       }
> +
> +       switch (error) {
> +       case -ENOMEM:
> +               message = "out of memory";
> +               break;
> +       case -EBUSY:
> +               message = "pending data seen during verify";
> +               break;
> +       case -EFAULT:
> +               message = "crypto operation failure";
> +               break;
> +       case -EACCES:
> +               message = "integrity failure";
> +               /* Image is bad. */
> +               transient = 0;
> +               break;
> +       case -EPERM:
> +               message = "hash tree population failure";
> +               /* Should be dm-bht specific errors */
> +               transient = 0;
> +               break;
> +       case -EINVAL:
> +               message = "unexpected missing/invalid data";
> +               /* The device was configured incorrectly - fallback. */
> +               transient = 0;
> +               break;
> +       default:
> +               /* Other errors can be passed through as IO errors */
> +               message = "unknown or I/O error";
> +               return;
> +       }
> +
> +       DMERR_LIMIT("verification failure occurred: %s", message);
> +
> +       if (error_mode == DM_VERITY_ERROR_BEHAVIOR_NOTIFY) {
> +               error_state.code = error;
> +               error_state.transient = transient;
> +               error_state.block = block;
> +               error_state.message = message;
> +               error_state.dev_start = vc->start;
> +               error_state.dev_len = vc->size;
> +               error_state.dev = vc->dev->bdev;
> +               error_state.hash_dev_start = vc->hash_start;
> +               error_state.hash_dev_len = vc->bht.sectors;
> +               error_state.hash_dev = vc->hash_dev->bdev;
> +
> +               /* Set default fallthrough behavior. */
> +               error_state.behavior = DM_VERITY_ERROR_BEHAVIOR_PANIC;
> +               error_mode = DM_VERITY_ERROR_BEHAVIOR_PANIC;
> +
> +               if (!blocking_notifier_call_chain(
> +                   &verity_error_notifier, transient, &error_state)) {
> +                       error_mode = error_state.behavior;
> +               }
> +       }
> +
> +       switch (error_mode) {
> +       case DM_VERITY_ERROR_BEHAVIOR_EIO:
> +               break;
> +       case DM_VERITY_ERROR_BEHAVIOR_NONE:
> +               if (error != -EIO && io)
> +                       io->error = 0;
> +               break;
> +       default:
> +               goto do_panic;
> +       }
> +       return;
> +
> +do_panic:
> +       panic("dm-verity failure: "
> +             "device:%u:%u error:%d block:%llu message:%s",
> +             MAJOR(devt), MINOR(devt), error, ULL(block), message);
> +}
> +
> +/**
> + * verity_parse_error_behavior - parse a behavior charp to the enum
> + * @behavior:  NUL-terminated char array
> + *
> + * Checks if the behavior is valid either as text or as an index digit
> + * and returns the proper enum value or -1 on error.
> + */
> +static int verity_parse_error_behavior(const char *behavior)
> +{
> +       const char * const *allowed = allowed_error_behaviors;
> +       char index = '0';
> +
> +       for (; *allowed; allowed++, index++)
> +               if (!strcmp(*allowed, behavior) || behavior[0] == index)
> +                       break;
> +
> +       if (!*allowed)
> +               return -1;
> +
> +       /* Convert to the integer index matching the enum. */
> +       return allowed - allowed_error_behaviors;
> +}
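> +
> +/* e.g. both "panic" and "1" parse to DM_VERITY_ERROR_BEHAVIOR_PANIC (1). */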
> +
> +/*
> + * Reverse flow of requests into the device.
> + *
> + * (Start at the bottom with verity_map and work your way upward).
> + */
> +
> +static void verity_inc_pending(struct dm_verity_io *io);
> +
> +static void verity_return_bio_to_caller(struct dm_verity_io *io)
> +{
> +       struct verity_config *vc = io->target->private;
> +
> +       if (io->error)
> +               verity_error(vc, io, io->error);
> +
> +       bio_endio(io->bio, io->error);
> +       mempool_free(io, vc->io_pool);
> +}
> +
> +/* Check for any missing bht hashes. */
> +static bool verity_is_bht_populated(struct dm_verity_io *io)
> +{
> +       struct verity_config *vc = io->target->private;
> +       u64 block;
> +
> +       for (block = io->block; block < io->block + io->count; ++block)
> +               if (!dm_bht_is_populated(&vc->bht, block))
> +                       return false;
> +
> +       return true;
> +}
> +
> +/* verity_dec_pending manages the lifetime of all dm_verity_io structs.
> + * Non-bug error handling is centralized through this interface, as is
> + * all passage from workqueue to workqueue.
> + */
> +static void verity_dec_pending(struct dm_verity_io *io)
> +{
> +       if (!atomic_dec_and_test(&io->pending))
> +               goto done;
> +
> +       if (unlikely(io->error))
> +               goto io_error;
> +
> +       /* I/Os that were pending may now be ready */
> +       if (verity_is_bht_populated(io)) {
> +               INIT_DELAYED_WORK(&io->work, kverityd_verify);
> +               queue_delayed_work(kveritydq, &io->work, 0);
> +       } else {
> +               INIT_DELAYED_WORK(&io->work, kverityd_io);
> +               queue_delayed_work(kverityd_ioq, &io->work, HZ/10);
> +       }
> +
> +done:
> +       return;
> +
> +io_error:
> +       verity_return_bio_to_caller(io);
> +}
> +
> +/* Walks the data set and computes the hash of the data read from the
> + * untrusted source device.  The computed hash is then passed to dm-bht
> + * for verification.
> + */
> +static int verity_verify(struct verity_config *vc,
> +                        struct dm_verity_io *io)
> +{
> +       unsigned int block_size = vc->bht.block_size;
> +       struct bio *bio = io->bio;
> +       u64 block = io->block;
> +       unsigned int idx;
> +       int r;
> +
> +       for (idx = bio->bi_idx; idx < bio->bi_vcnt; idx++) {
> +               struct bio_vec *bv = bio_iovec_idx(bio, idx);
> +               unsigned int offset = bv->bv_offset;
> +               unsigned int len = bv->bv_len;
> +
> +               VERITY_BUG_ON(offset % block_size);
> +               VERITY_BUG_ON(len % block_size);
> +
> +               while (len) {
> +                       r = dm_bht_verify_block(&vc->bht, block,
> +                                               bv->bv_page, offset);
> +                       if (r)
> +                               goto bad_return;
> +
> +                       offset += block_size;
> +                       len -= block_size;
> +                       block++;
> +                       cond_resched();
> +               }
> +       }
> +
> +       return 0;
> +
> +bad_return:
> +       /* dm_bht functions aren't expected to return errno-friendly
> +        * values.  They are converted here for uniformity.
> +        */
> +       if (r > 0) {
> +               DMERR("Pending data for block %llu seen at verify", ULL(block));
> +               r = -EBUSY;
> +       } else {
> +               DMERR_LIMIT("Block hash does not match!");
> +               r = -EACCES;
> +       }
> +       return r;
> +}
> +
> +/* Services the verify workqueue */
> +static void kverityd_verify(struct work_struct *work)
> +{
> +       struct delayed_work *dwork = container_of(work, struct delayed_work,
> +                                                 work);
> +       struct dm_verity_io *io = container_of(dwork, struct dm_verity_io,
> +                                              work);
> +       struct verity_config *vc = io->target->private;
> +
> +       io->error = verity_verify(vc, io);
> +
> +       /* Free up the bio and tag with the return value */
> +       verity_return_bio_to_caller(io);
> +}
> +
> +/* Asynchronously called upon the completion of dm-bht I/O.  The status
> + * of the operation is passed back to dm-bht and the next steps are
> + * decided by verity_dec_pending.
> + */
> +static void kverityd_io_bht_populate_end(struct bio *bio, int error)
> +{
> +       struct dm_bht_entry *entry = (struct dm_bht_entry *) bio->bi_private;
> +       struct dm_verity_io *io = (struct dm_verity_io *) entry->io_context;
> +
> +       /* Tell the tree to atomically update now that we've populated
> +        * the given entry.
> +        */
> +       dm_bht_read_completed(entry, error);
> +
> +       /* Clean up for reuse when reading data to be checked */
> +       bio->bi_vcnt = 0;
> +       bio->bi_io_vec->bv_offset = 0;
> +       bio->bi_io_vec->bv_len = 0;
> +       bio->bi_io_vec->bv_page = NULL;
> +       /* Restore the private data to I/O so the destructor can be shared. */
> +       bio->bi_private = (void *) io;
> +       bio_put(bio);
> +
> +       /* We bail but assume the tree has been marked bad. */
> +       if (unlikely(error)) {
> +               DMERR("Failed to read for sector %llu (%u)",
> +                     ULL(io->bio->bi_sector), io->bio->bi_size);
> +               io->error = error;
> +               /* Pass through the error to verity_dec_pending below */
> +       }
> +       /* When pending = 0, it will transition to reading real data */
> +       verity_dec_pending(io);
> +}
> +
> +/* Called by dm-bht (via dm_bht_populate), this function provides
> + * the message digests to dm-bht that are stored on disk.
> + */
> +static int kverityd_bht_read_callback(void *ctx, sector_t start, u8 *dst,
> +                                     sector_t count,
> +                                     struct dm_bht_entry *entry)
> +{
> +       struct dm_verity_io *io = ctx;  /* I/O for this batch */
> +       struct verity_config *vc;
> +       struct bio *bio;
> +
> +       vc = io->target->private;
> +
> +       /* The I/O context is nested inside the entry so that we don't need one
> +        * io context per page read.
> +        */
> +       entry->io_context = ctx;
> +
> +       /* We should only get page size requests at present. */
> +       verity_inc_pending(io);
> +       bio = verity_alloc_bioset(vc, GFP_NOIO, 1);
> +       if (unlikely(!bio)) {
> +               DMCRIT("Out of memory at bio_alloc_bioset");
> +               dm_bht_read_completed(entry, -ENOMEM);
> +               return -ENOMEM;
> +       }
> +       bio->bi_private = (void *) entry;
> +       bio->bi_idx = 0;
> +       bio->bi_size = vc->bht.block_size;
> +       bio->bi_sector = vc->hash_start + start;
> +       bio->bi_bdev = vc->hash_dev->bdev;
> +       bio->bi_end_io = kverityd_io_bht_populate_end;
> +       bio->bi_rw = REQ_META;
> +       /* Only need to free the bio since the page is managed by bht */
> +       bio->bi_destructor = dm_verity_bio_destructor;
> +       bio->bi_vcnt = 1;
> +       bio->bi_io_vec->bv_offset = offset_in_page(dst);
> +       bio->bi_io_vec->bv_len = to_bytes(count);
> +       /* dst is guaranteed to be a page_pool allocation */
> +       bio->bi_io_vec->bv_page = virt_to_page(dst);
> +       /* Track that this I/O is in use.  There should be no risk of the io
> +        * being freed beforehand, since this is called synchronously.
> +        */
> +       generic_make_request(bio);
> +       return 0;
> +}
> +
> +/* Submits an io request for each missing block of block hashes.
> + * The last one to return will then enqueue this on the io workqueue.
> + */
> +static void kverityd_io_bht_populate(struct dm_verity_io *io)
> +{
> +       struct verity_config *vc = io->target->private;
> +       u64 block;
> +
> +       for (block = io->block; block < io->block + io->count; ++block) {
> +               int ret = dm_bht_populate(&vc->bht, io, block);
> +
> +               if (ret < 0) {
> +                       /* verity_dec_pending will handle the error case. */
> +                       io->error = ret;
> +                       break;
> +               }
> +       }
> +}
> +
> +/* Asynchronously called upon the completion of I/O issued
> + * from kverityd_src_io_read. verity_dec_pending() acts as
> + * the scheduler/flow manager.
> + */
> +static void kverityd_src_io_read_end(struct bio *clone, int error)
> +{
> +       struct dm_verity_io *io = clone->bi_private;
> +
> +       if (unlikely(!bio_flagged(clone, BIO_UPTODATE) && !error))
> +               error = -EIO;
> +
> +       if (unlikely(error)) {
> +               DMERR("Error occurred: %d (%llu, %u)",
> +                       error, ULL(clone->bi_sector), clone->bi_size);
> +               io->error = error;
> +       }
> +
> +       /* Release the clone, which keeps the block layer from
> +        * leaving offsets, etc. in unexpected states.
> +        */
> +       bio_put(clone);
> +
> +       verity_dec_pending(io);
> +}
> +
> +/* If not yet underway, an I/O request will be issued to the vc->dev
> + * device for the data needed. It is cloned to avoid unexpected changes
> + * to the original bio struct.
> + */
> +static void kverityd_src_io_read(struct dm_verity_io *io)
> +{
> +       struct bio *clone;
> +
> +       /* Check if the read is already issued. */
> +       if (io->flags & VERITY_IOFLAGS_CLONED)
> +               return;
> +
> +       io->flags |= VERITY_IOFLAGS_CLONED;
> +
> +       /* Clone the bio. The block layer may modify the bvec array. */
> +       clone = verity_bio_clone(io);
> +       if (unlikely(!clone)) {
> +               io->error = -ENOMEM;
> +               return;
> +       }
> +
> +       verity_inc_pending(io);
> +
> +       generic_make_request(clone);
> +}
> +
> +/* kverityd_io services the I/O workqueue. For each pass through
> + * the I/O workqueue, a call to populate both the origin drive
> + * data and the hash tree data is made.
> + */
> +static void kverityd_io(struct work_struct *work)
> +{
> +       struct delayed_work *dwork = container_of(work, struct delayed_work,
> +                                                 work);
> +       struct dm_verity_io *io = container_of(dwork, struct dm_verity_io,
> +                                              work);
> +
> +       /* Issue requests asynchronously. */
> +       verity_inc_pending(io);
> +       kverityd_src_io_read(io);
> +       kverityd_io_bht_populate(io);
> +       verity_dec_pending(io);
> +}
> +
> +/* Paired with verity_dec_pending, the pending value in the io dictates the
> + * lifetime of a request and when it is ready to be processed on the
> + * workqueues.
> + */
> +static void verity_inc_pending(struct dm_verity_io *io)
> +{
> +       atomic_inc(&io->pending);
> +}
> +
> +/* Block-level requests start here. */
> +static int verity_map(struct dm_target *ti, struct bio *bio,
> +                     union map_info *map_context)
> +{
> +       struct dm_verity_io *io;
> +       struct verity_config *vc;
> +       struct request_queue *r_queue;
> +
> +       if (unlikely(!ti)) {
> +               DMERR("dm_target was NULL");
> +               return -EIO;
> +       }
> +
> +       vc = ti->private;
> +       r_queue = bdev_get_queue(vc->dev->bdev);
> +
> +       if (bio_data_dir(bio) == WRITE) {
> +               /* If we silently drop writes, then the VFS layer will cache
> +                * the write and persist it in memory. While it doesn't change
> +                * the underlying storage, it still may be contrary to the
> +                * behavior expected by a verified, read-only device.
> +                */
> +               DMWARN_LIMIT("write request received. rejecting with -EIO.");
> +               verity_error(vc, NULL, -EIO);
> +               return -EIO;
> +       } else {
> +               /* Queue up the request to be verified */
> +               io = verity_io_alloc(ti, bio);
> +               if (!io) {
> +                       DMERR_LIMIT("Failed to allocate and init IO data");
> +                       return DM_MAPIO_REQUEUE;
> +               }
> +               INIT_DELAYED_WORK(&io->work, kverityd_io);
> +               queue_delayed_work(kverityd_ioq, &io->work, 0);
> +       }
> +
> +       return DM_MAPIO_SUBMITTED;
> +}
> +
> +static void splitarg(char *arg, char **key, char **val)
> +{
> +       *key = strsep(&arg, "=");
> +       *val = strsep(&arg, "");
> +}
> +
> +/*
> + * Non-block interfaces and device-mapper specific code
> + */
> +
> +/**
> + * verity_ctr - Construct a verified mapping
> + * @ti:   Target being created
> + * @argc: Number of elements in argv
> + * @argv: Vector of key-value pairs (see below).
> + *
> + * Accepts the following keys:
> + * @payload:        hashed device
> + * @hashtree:       device the hashtree is stored on
> + * @hashstart:      starting sector of the hashes on the hash device (default 0)
> + * @block_size:     size of a hash block
> + * @alg:            hash algorithm
> + * @root_hexdigest: toplevel hash of the tree
> + * @error_behavior: what to do when verification fails [optional]
> + * @salt:           salt, in hex [optional]
> + *
> + * E.g.,
> + * payload=/dev/sda2 hashtree=/dev/sda3 alg=sha256
> + * root_hexdigest=f08aa4a3695290c569eb1b0ac032ae1040150afb527abbeb0a3da33d82fb2c6e
> + *
> + * TODO(wad):
> + * - Boot time addition
> + * - Track block verification to free block_hashes if memory use is a concern
> + * Testing needed:
> + * - Regular slub_debug tracing (on checkins)
> + * - Improper block hash padding
> + * - Improper bundle padding
> + * - Improper hash layout
> + * - Missing padding at end of device
> + * - Improperly sized underlying devices
> + * - Out of memory conditions (make sure this isn't too flaky under high load!)
> + * - Incorrect superhash
> + * - Incorrect block hashes
> + * - Incorrect bundle hashes
> + * - Boot-up read speed; sustained read speeds
> + */
> +static int verity_ctr(struct dm_target *ti, unsigned int argc, char **argv)
> +{
> +       struct verity_config *vc = NULL;
> +       int ret = 0;
> +       sector_t blocks;
> +       unsigned int block_size = VERITY_DEFAULT_BLOCK_SIZE;
> +       const char *payload = NULL;
> +       const char *hashtree = NULL;
> +       unsigned long hashstart = 0;
> +       const char *alg = NULL;
> +       const char *root_hexdigest = NULL;
> +       const char *dev_error_behavior = error_behavior;
> +       const char *hexsalt = "";
> +       int i;
> +
> +       for (i = 0; i < argc; ++i) {
> +               char *key, *val;
> +               DMWARN("Argument %d: '%s'", i, argv[i]);
> +               splitarg(argv[i], &key, &val);
> +               if (!key) {
> +                       DMWARN("Bad argument %d: missing key?", i);
> +                       break;
> +               }
> +               if (!val) {
> +                       DMWARN("Bad argument %d='%s': missing value", i, key);
> +                       break;
> +               }
> +
> +               if (!strcmp(key, "alg")) {
> +                       alg = val;
> +               } else if (!strcmp(key, "payload")) {
> +                       payload = val;
> +               } else if (!strcmp(key, "hashtree")) {
> +                       hashtree = val;
> +               } else if (!strcmp(key, "root_hexdigest")) {
> +                       root_hexdigest = val;
> +               } else if (!strcmp(key, "hashstart")) {
> +                       if (strict_strtoul(val, 10, &hashstart)) {
> +                               ti->error = "Invalid hashstart";
> +                               return -EINVAL;
> +                       }
> +               } else if (!strcmp(key, "block_size")) {
> +                       unsigned long tmp;
> +                       if (strict_strtoul(val, 10, &tmp) ||
> +                           (tmp > UINT_MAX)) {
> +                               ti->error = "Invalid block_size";
> +                               return -EINVAL;
> +                       }
> +                       block_size = (unsigned int)tmp;
> +               } else if (!strcmp(key, "error_behavior")) {
> +                       dev_error_behavior = val;
> +               } else if (!strcmp(key, "salt")) {
> +                       hexsalt = val;
> +               } else if (!strcmp(key, "error_behavior")) {
> +                       dev_error_behavior = val;
> +               }
> +       }
> +
> +#define NEEDARG(n) \
> +       if (!(n)) { \
> +               ti->error = "Missing argument: " #n; \
> +               return -EINVAL; \
> +       }
> +
> +       NEEDARG(alg);
> +       NEEDARG(payload);
> +       NEEDARG(hashtree);
> +       NEEDARG(root_hexdigest);
> +
> +#undef NEEDARG
> +
> +       /* The device mapper device should be set up read-only */
> +       if ((dm_table_get_mode(ti->table) & ~FMODE_READ) != 0) {
> +               ti->error = "Must be created readonly.";
> +               return -EINVAL;
> +       }
> +
> +       vc = kzalloc(sizeof(*vc), GFP_KERNEL);
> +       if (!vc) {
> +               /* TODO(wad) if this is called from the setup helper, then we
> +                * catch these errors and do a CrOS specific thing. if not, we
> +                * need to have this call the error handler.
> +                */
> +               return -EINVAL;
> +       }
> +
> +       /* Calculate the blocks from the given device size */
> +       vc->size = ti->len;
> +       blocks = to_bytes(vc->size) / block_size;
> +       if (dm_bht_create(&vc->bht, blocks, block_size, alg)) {
> +               DMERR("failed to create required bht");
> +               goto bad_bht;
> +       }
> +       if (dm_bht_set_root_hexdigest(&vc->bht, root_hexdigest)) {
> +               DMERR("root hexdigest error");
> +               goto bad_root_hexdigest;
> +       }
> +       dm_bht_set_salt(&vc->bht, hexsalt);
> +       vc->bht.read_cb = kverityd_bht_read_callback;
> +
> +       /* payload: device to verify */
> +       vc->start = 0;  /* TODO: should this support a starting offset? */
> +       /* We only ever grab the device in read-only mode. */
> +       ret = dm_get_device(ti, payload,
> +                           dm_table_get_mode(ti->table), &vc->dev);
> +       if (ret) {
> +               DMERR("Failed to acquire device '%s': %d", payload, ret);
> +               ti->error = "Device lookup failed";
> +               goto bad_verity_dev;
> +       }
> +
> +       if ((to_bytes(vc->start) % block_size) ||
> +           (to_bytes(vc->size) % block_size)) {
> +               ti->error = "Device must be block_size divisble/aligned";
> +               goto bad_hash_start;
> +       }
> +
> +       vc->hash_start = (sector_t)hashstart;
> +
> +       /* hashtree: device with hashes.
> +        * Note, payload == hashtree is okay as long as the size of
> +        *       ti->len passed to device mapper does not include
> +        *       the hashes.
> +        */
> +       if (dm_get_device(ti, hashtree,
> +                         dm_table_get_mode(ti->table), &vc->hash_dev)) {
> +               ti->error = "Hash device lookup failed";
> +               goto bad_hash_dev;
> +       }
> +
> +       /* arg4: cryptographic digest algorithm */
> +       if (snprintf(vc->hash_alg, CRYPTO_MAX_ALG_NAME, "%s", alg) >=
> +           CRYPTO_MAX_ALG_NAME) {
> +               ti->error = "Hash algorithm name is too long";
> +               goto bad_hash;
> +       }
> +
> +       /* override with optional device-specific error behavior */
> +       vc->error_behavior = verity_parse_error_behavior(dev_error_behavior);
> +       if (vc->error_behavior == -1) {
> +               ti->error = "Bad error_behavior supplied";
> +               goto bad_err_behavior;
> +       }
> +
> +       /* TODO: Maybe issue a request on the io queue for block 0? */
> +
> +       /* Argument processing is done, setup operational data */
> +       /* Pool for dm_verity_io objects */
> +       vc->io_pool = mempool_create_slab_pool(MIN_IOS, _verity_io_pool);
> +       if (!vc->io_pool) {
> +               ti->error = "Cannot allocate verity io mempool";
> +               goto bad_slab_pool;
> +       }
> +
> +       /* Allocate the bioset used for request padding */
> +       /* TODO(wad) allocate a separate bioset for the first verify maybe */
> +       vc->bs = bioset_create(MIN_BIOS, 0);
> +       if (!vc->bs) {
> +               ti->error = "Cannot allocate verity bioset";
> +               goto bad_bs;
> +       }
> +
> +       ti->num_flush_requests = 1;
> +       ti->private = vc;
> +
> +       /* TODO(wad) add device and hash device names */
> +       {
> +               char hashdev[BDEVNAME_SIZE], vdev[BDEVNAME_SIZE];
> +               bdevname(vc->hash_dev->bdev, hashdev);
> +               bdevname(vc->dev->bdev, vdev);
> +               DMINFO("dev:%s hash:%s [sectors:%llu blocks:%llu]", vdev,
> +                      hashdev, ULL(vc->bht.sectors), ULL(blocks));
> +       }
> +       return 0;
> +
> +bad_bs:
> +       mempool_destroy(vc->io_pool);
> +bad_slab_pool:
> +bad_err_behavior:
> +bad_hash:
> +       dm_put_device(ti, vc->hash_dev);
> +bad_hash_dev:
> +bad_hash_start:
> +       dm_put_device(ti, vc->dev);
> +bad_bht:
> +bad_root_hexdigest:
> +bad_verity_dev:
> +       kfree(vc);   /* hash is not secret so no need to zero */
> +       return -EINVAL;
> +}
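> +
> +/* A hypothetical table line for the constructor above (devices, length,
> + * digest, and salt are placeholders):
> + *
> + *   dmsetup create vroot --readonly --table \
> + *     "0 2097152 verity payload=/dev/sda2 hashtree=/dev/sda3 hashstart=0 \
> + *      alg=sha256 root_hexdigest=<64 hex chars> salt=<hex>"
> + */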
> +
> +static void verity_dtr(struct dm_target *ti)
> +{
> +       struct verity_config *vc = (struct verity_config *) ti->private;
> +
> +       bioset_free(vc->bs);
> +       mempool_destroy(vc->io_pool);
> +       dm_bht_destroy(&vc->bht);
> +       dm_put_device(ti, vc->hash_dev);
> +       dm_put_device(ti, vc->dev);
> +       kfree(vc);
> +}
> +
> +static int verity_status(struct dm_target *ti, status_type_t type,
> +                       char *result, unsigned int maxlen)
> +{
> +       struct verity_config *vc = (struct verity_config *) ti->private;
> +       unsigned int sz = 0;
> +       char hashdev[BDEVNAME_SIZE], vdev[BDEVNAME_SIZE];
> +       u8 hexdigest[VERITY_MAX_DIGEST_SIZE * 2 + 1] = { 0 };
> +
> +       dm_bht_root_hexdigest(&vc->bht, hexdigest, sizeof(hexdigest));
> +
> +       switch (type) {
> +       case STATUSTYPE_INFO:
> +               break;
> +       case STATUSTYPE_TABLE:
> +               bdevname(vc->hash_dev->bdev, hashdev);
> +               bdevname(vc->dev->bdev, vdev);
> +               DMEMIT("/dev/%s /dev/%s %llu %u %s %s",
> +                       vdev,
> +                       hashdev,
> +                       ULL(vc->hash_start),
> +                       vc->bht.depth,
> +                       vc->hash_alg,
> +                       hexdigest);
> +               break;
> +       }
> +       return 0;
> +}
> +
> +static int verity_merge(struct dm_target *ti, struct bvec_merge_data *bvm,
> +                      struct bio_vec *biovec, int max_size)
> +{
> +       struct verity_config *vc = ti->private;
> +       struct request_queue *q = bdev_get_queue(vc->dev->bdev);
> +
> +       if (!q->merge_bvec_fn)
> +               return max_size;
> +
> +       bvm->bi_bdev = vc->dev->bdev;
> +       bvm->bi_sector = vc->start + bvm->bi_sector - ti->begin;
> +
> +       /* Optionally, this could just return 0 to stick to single pages. */
> +       return min(max_size, q->merge_bvec_fn(q, bvm, biovec));
> +}
> +
> +static int verity_iterate_devices(struct dm_target *ti,
> +                                iterate_devices_callout_fn fn, void *data)
> +{
> +       struct verity_config *vc = ti->private;
> +
> +       return fn(ti, vc->dev, vc->start, ti->len, data);
> +}
> +
> +static void verity_io_hints(struct dm_target *ti,
> +                           struct queue_limits *limits)
> +{
> +       struct verity_config *vc = ti->private;
> +       unsigned int block_size = vc->bht.block_size;
> +
> +       limits->logical_block_size = block_size;
> +       limits->physical_block_size = block_size;
> +       blk_limits_io_min(limits, block_size);
> +}
> +
> +static struct target_type verity_target = {
> +       .name   = "verity",
> +       .version = {0, 1, 0},
> +       .module = THIS_MODULE,
> +       .ctr    = verity_ctr,
> +       .dtr    = verity_dtr,
> +       .map    = verity_map,
> +       .merge  = verity_merge,
> +       .status = verity_status,
> +       .iterate_devices = verity_iterate_devices,
> +       .io_hints = verity_io_hints,
> +};
> +
> +#define VERITY_WQ_FLAGS (WQ_CPU_INTENSIVE|WQ_HIGHPRI)
> +
> +static int __init dm_verity_init(void)
> +{
> +       int r = -ENOMEM;
> +
> +       _verity_io_pool = KMEM_CACHE(dm_verity_io, 0);
> +       if (!_verity_io_pool) {
> +               DMERR("failed to allocate pool dm_verity_io");
> +               goto bad_io_pool;
> +       }
> +
> +       kverityd_ioq = alloc_workqueue("kverityd_io", VERITY_WQ_FLAGS, 1);
> +       if (!kverityd_ioq) {
> +               DMERR("failed to create workqueue kverityd_ioq");
> +               goto bad_io_queue;
> +       }
> +
> +       kveritydq = alloc_workqueue("kverityd", VERITY_WQ_FLAGS, 1);
> +       if (!kveritydq) {
> +               DMERR("failed to create workqueue kveritydq");
> +               goto bad_verify_queue;
> +       }
> +
> +       r = dm_register_target(&verity_target);
> +       if (r < 0) {
> +               DMERR("register failed %d", r);
> +               goto register_failed;
> +       }
> +
> +       DMINFO("version %u.%u.%u loaded", verity_target.version[0],
> +              verity_target.version[1], verity_target.version[2]);
> +
> +       return r;
> +
> +register_failed:
> +       destroy_workqueue(kveritydq);
> +bad_verify_queue:
> +       destroy_workqueue(kverityd_ioq);
> +bad_io_queue:
> +       kmem_cache_destroy(_verity_io_pool);
> +bad_io_pool:
> +       return r;
> +}
> +
> +static void __exit dm_verity_exit(void)
> +{
> +       destroy_workqueue(kveritydq);
> +       destroy_workqueue(kverityd_ioq);
> +
> +       dm_unregister_target(&verity_target);
> +       kmem_cache_destroy(_verity_io_pool);
> +}
> +
> +module_init(dm_verity_init);
> +module_exit(dm_verity_exit);
> +
> +MODULE_AUTHOR("The Chromium OS Authors <chromium-os-dev@chromium.org>");
> +MODULE_DESCRIPTION(DM_NAME " target for transparent disk integrity checking");
> +MODULE_LICENSE("GPL");
> diff --git a/drivers/md/dm-verity.h b/drivers/md/dm-verity.h
> new file mode 100644
> index 0000000..e0664c9
> --- /dev/null
> +++ b/drivers/md/dm-verity.h
> @@ -0,0 +1,45 @@
> +/*
> + * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev@chromium.org>
> + *                    All Rights Reserved.
> + *
> + * This file is released under the GPLv2.
> + *
> + * Provide error types for use when creating a custom error handler.
> + * See Documentation/device-mapper/dm-verity.txt
> + */
> +#ifndef DM_VERITY_H
> +#define DM_VERITY_H
> +
> +#include <linux/notifier.h>
> +
> +struct dm_verity_error_state {
> +       int code;
> +       int transient;  /* Likely to not happen after a reboot */
> +       u64 block;
> +       const char *message;
> +
> +       sector_t dev_start;
> +       sector_t dev_len;
> +       struct block_device *dev;
> +
> +       sector_t hash_dev_start;
> +       sector_t hash_dev_len;
> +       struct block_device *hash_dev;
> +
> +       /* Final behavior after all notifications are completed. */
> +       int behavior;
> +};
> +
> +/* This enum must be matched to allowed_error_behaviors in dm-verity.c */
> +enum dm_verity_error_behavior {
> +       DM_VERITY_ERROR_BEHAVIOR_EIO = 0,
> +       DM_VERITY_ERROR_BEHAVIOR_PANIC,
> +       DM_VERITY_ERROR_BEHAVIOR_NONE,
> +       DM_VERITY_ERROR_BEHAVIOR_NOTIFY
> +};
> +
> +
> +int dm_verity_register_error_notifier(struct notifier_block *nb);
> +int dm_verity_unregister_error_notifier(struct notifier_block *nb);
> +
> +#endif  /* DM_VERITY_H */
> diff --git a/include/linux/dm-bht.h b/include/linux/dm-bht.h
> new file mode 100644
> index 0000000..0595911
> --- /dev/null
> +++ b/include/linux/dm-bht.h
> @@ -0,0 +1,166 @@
> +/*
> + * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev@chromium.org>
> + *
> + * Device-Mapper block hash tree interface.
> + * See Documentation/device-mapper/dm-bht.txt for details.
> + *
> + * This file is released under the GPLv2.
> + */
> +#ifndef __LINUX_DM_BHT_H
> +#define __LINUX_DM_BHT_H
> +
> +#include <linux/compiler.h>
> +#include <linux/crypto.h>
> +#include <linux/types.h>
> +
> +/* To avoid allocating memory for digest tests, we just set up a
> + * max to use for now.
> + */
> +#define DM_BHT_MAX_DIGEST_SIZE 128  /* 1k hashes are unlikely for now */
> +#define DM_BHT_SALT_SIZE       32   /* 256 bits of salt is a lot */
> +
> +/* UNALLOCATED, PENDING, READY, and VERIFIED are valid states. All other
> + * values are entry-related return codes.
> + */
> +#define DM_BHT_ENTRY_VERIFIED 8  /* 'nodes' has been checked against parent */
> +#define DM_BHT_ENTRY_READY 4  /* 'nodes' is loaded and available */
> +#define DM_BHT_ENTRY_PENDING 2  /* 'nodes' is being loaded */
> +#define DM_BHT_ENTRY_UNALLOCATED 0 /* untouched */
> +#define DM_BHT_ENTRY_ERROR -1 /* entry is unsuitable for use */
> +#define DM_BHT_ENTRY_ERROR_IO -2 /* I/O error on load */
> +
> +/* Additional possible return codes */
> +#define DM_BHT_ENTRY_ERROR_MISMATCH -3 /* Digest mismatch */
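> +
> +/* The numeric ordering of these states is load-bearing; callers test, e.g.,
> + *
> + *     atomic_read(&entry->state) < DM_BHT_ENTRY_READY    (not yet loaded)
> + *     state <= DM_BHT_ENTRY_ERROR                        (any error code)
> + */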
> +
> +/* dm_bht_entry
> + * Contains dm_bht->node_count tree nodes at a given tree depth.
> + * state is used to transactionally assure that data is paged in
> + * from disk.  Since dm_bht does not keep running crypto contexts for
> + * each level, we need to load in the data for on-demand verification.
> + */
> +struct dm_bht_entry {
> +       atomic_t state; /* see defines */
> +       /* Keeping an extra pointer per entry wastes up to ~33k of
> +        * memory if 1M blocks are used (or ~66k on a 64-bit arch)
> +        */
> +       void *io_context;  /* Reserve a pointer for use during io */
> +       /* data should only be non-NULL if fully populated. */
> +       void *nodes;  /* The hash data used to verify the children.
> +                      * Guaranteed to be page-aligned.
> +                      */
> +};
> +
> +/* dm_bht_level
> + * Contains an array of entries which represent a page of hashes where
> + * each hash is a node in the tree at the given tree depth/level.
> + */
> +struct dm_bht_level {
> +       struct dm_bht_entry *entries;  /* array of entries of tree nodes */
> +       unsigned int count;  /* number of entries at this level */
> +       sector_t sector;  /* starting sector for this level */
> +};
> +
> +/* opaque context, start, databuf, sector_count */
> +typedef int(*dm_bht_callback)(void *,  /* external context */
> +                             sector_t,  /* start sector */
> +                             u8 *,  /* destination page */
> +                             sector_t,  /* num sectors */
> +                             struct dm_bht_entry *);
> +/* dm_bht - Device mapper block hash tree
> + * dm_bht provides a fixed interface for comparing data blocks
> + * against cryptographic hashes stored in a hash tree.  It
> + * optimizes the tree structure for storage on disk.
> + *
> + * The tree is built from the bottom up.  A collection of data,
> + * external to the tree, is hashed and these hashes are stored
> + * as the blocks in the tree.  For some number of these hashes,
> + * a parent node is created by hashing them.  These steps are
> + * repeated.
> + *
> + * TODO(wad): All hash storage memory is pre-allocated and freed once an
> + * entire branch has been verified.
> + */
> +struct dm_bht {
> +       /* Configured values */
> +       int depth;  /* Depth of the tree including the root */
> +       unsigned int block_count;  /* Number of blocks hashed */
> +       unsigned int block_size;  /* Size of a hash block */
> +       char hash_alg[CRYPTO_MAX_ALG_NAME];
> +       unsigned char salt[DM_BHT_SALT_SIZE];
> +
> +       /* Computed values */
> +       unsigned int node_count;  /* Data size (in hashes) for each entry */
> +       unsigned int node_count_shift;  /* first bit set - 1 */
> +       /* There is one per CPU so that verification can run concurrently. */
> +       struct hash_desc hash_desc[NR_CPUS];  /* Container for the hash alg */
> +       unsigned int digest_size;
> +       sector_t sectors;  /* Number of disk sectors used */
> +
> +       /* bool verified;  Full tree is verified */
> +       u8 root_digest[DM_BHT_MAX_DIGEST_SIZE];
> +       struct dm_bht_level *levels;  /* in reverse order */
> +       /* Callback for reading from the hash device */
> +       dm_bht_callback read_cb;
> +};
> +
> +/* Constructor for struct dm_bht instances. */
> +int dm_bht_create(struct dm_bht *bht,
> +                 unsigned int block_count,
> +                 unsigned int block_size,
> +                 const char *alg_name);
> +/* Destructor for struct dm_bht instances.  Does not free @bht */
> +void dm_bht_destroy(struct dm_bht *bht);
> +
> +/* Basic accessors for struct dm_bht */
> +int dm_bht_set_root_hexdigest(struct dm_bht *bht, const u8 *hexdigest);
> +int dm_bht_root_hexdigest(struct dm_bht *bht, u8 *hexdigest, int available);
> +void dm_bht_set_salt(struct dm_bht *bht, const char *hexsalt);
> +int dm_bht_salt(struct dm_bht *bht, char *hexsalt);
> +
> +/* Functions for loading in data from disk for verification */
> +bool dm_bht_is_populated(struct dm_bht *bht, unsigned int block);
> +int dm_bht_populate(struct dm_bht *bht, void *read_cb_ctx,
> +                   unsigned int block);
> +int dm_bht_verify_block(struct dm_bht *bht, unsigned int block,
> +                       struct page *pg, unsigned int offset);
> +void dm_bht_read_completed(struct dm_bht_entry *entry, int status);
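> +
> +/* Typical lifecycle (sketch; my_read_cb and ctx are caller-supplied):
> + *
> + *     dm_bht_create(&bht, block_count, block_size, "sha256");
> + *     dm_bht_set_root_hexdigest(&bht, root_hexdigest);
> + *     bht.read_cb = my_read_cb;
> + *     dm_bht_populate(&bht, ctx, block);
> + *     ... wait until dm_bht_is_populated(&bht, block) ...
> + *     dm_bht_verify_block(&bht, block, pg, offset);
> + *     dm_bht_destroy(&bht);
> + */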
> +
> +/* Functions for converting indices to nodes. */
> +
> +static inline unsigned int dm_bht_get_level_shift(struct dm_bht *bht,
> +                                                 int depth)
> +{
> +       return (bht->depth - depth) * bht->node_count_shift;
> +}
> +
> +/* For the given depth, this is the entry index at that level.  Evaluated
> + * at depth+1, it is the node index within the entry at @depth.
> + */
> +static inline unsigned int dm_bht_index_at_level(struct dm_bht *bht,
> +                                                       int depth,
> +                                                       unsigned int leaf)
> +{
> +       return leaf >> dm_bht_get_level_shift(bht, depth);
> +}
> +
> +static inline struct dm_bht_entry *dm_bht_get_entry(struct dm_bht *bht,
> +                                                   int depth,
> +                                                   unsigned int block)
> +{
> +       unsigned int index = dm_bht_index_at_level(bht, depth, block);
> +       struct dm_bht_level *level = &bht->levels[depth];
> +
> +       return &level->entries[index];
> +}
> +
> +static inline void *dm_bht_get_node(struct dm_bht *bht,
> +                                 struct dm_bht_entry *entry,
> +                                 int depth,
> +                                 unsigned int block)
> +{
> +       unsigned int index = dm_bht_index_at_level(bht, depth, block);
> +       unsigned int node_index = index % bht->node_count;
> +
> +       return entry->nodes + (node_index * bht->digest_size);
> +}
> +#endif  /* __LINUX_DM_BHT_H */
> --
> 1.7.3.1
>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] dm: verity target
  2011-09-15 18:45 Mandeep Singh Baines
@ 2011-09-16 17:54 ` Valdis.Kletnieks
  2011-09-27 19:02 ` Will Drewry
  1 sibling, 0 replies; 22+ messages in thread
From: Valdis.Kletnieks @ 2011-09-16 17:54 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: dm-devel, Will Drewry, Elly Jones, Alasdair G Kergon, Milan Broz,
	Olof Johansson, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 400 bytes --]

On Thu, 15 Sep 2011 11:45:59 PDT, Mandeep Singh Baines said:
> The verity target provides transparent integrity checking of block devices
> using a cryptographic digest.

I just had this mental image of Dr Henry Jones saying: "But in Latin, verify is
spelled with an f..." ;)

Might want to add something to device-mapper/dm-verity.txt explaining
where the name came from and that it's *not* a typo?

[-- Attachment #2: Type: application/pgp-signature, Size: 227 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] dm: verity target
@ 2011-09-15 22:02 Wesley Miaw
  0 siblings, 0 replies; 22+ messages in thread
From: Wesley Miaw @ 2011-09-15 22:02 UTC (permalink / raw)
  To: linux-kernel

On 2011 Sep 15, at 11:45 AM, Mandeep Singh Baines wrote:

> The verity target provides transparent integrity checking of block devices
> using a cryptographic digest.
> 
> dm-verity is meant to be setup as part of a verified boot path.  This
> may be anything ranging from a boot using tboot or trustedgrub to just
> booting from a known-good device (like a USB drive or CD).
> 
> dm-verity is part of ChromeOS's verified boot path. It is used to verify
> the integrity of the root filesystem on boot. The root filesystem is
> mounted on a dm-verity partition which transparently verifies each block
> with a bootloader verified hash passed into the kernel at boot.

Netflix would like dm-verity to be included in the Linux kernel. Over the
past year, we have been working with Google and porting dm-verity onto a
number of consumer electronics devices running embedded Linux. Demand for
this feature has been high and we see a lot of benefit associated with
making dm-verity part of the official kernel.
--
Wesley Miaw

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH] dm: verity target
@ 2011-09-15 18:45 Mandeep Singh Baines
  2011-09-16 17:54 ` Valdis.Kletnieks
  2011-09-27 19:02 ` Will Drewry
  0 siblings, 2 replies; 22+ messages in thread
From: Mandeep Singh Baines @ 2011-09-15 18:45 UTC (permalink / raw)
  To: dm-devel
  Cc: Mandeep Singh Baines, Will Drewry, Elly Jones, Alasdair G Kergon,
	Milan Broz, Olof Johansson, linux-kernel

The verity target provides transparent integrity checking of block devices
using a cryptographic digest.

dm-verity is meant to be set up as part of a verified boot path.  This
may be anything ranging from a boot using tboot or trustedgrub to just
booting from a known-good device (like a USB drive or CD).

dm-verity is part of ChromeOS's verified boot path. It is used to verify
the integrity of the root filesystem on boot. The root filesystem is
mounted on a dm-verity partition which transparently verifies each block
with a bootloader verified hash passed into the kernel at boot.

Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: Elly Jones <ellyjones@chromium.org>
Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Milan Broz <mbroz@redhat.com>
Cc: Olof Johansson <olofj@chromium.org>
Cc: dm-devel@redhat.com
Cc: linux-kernel@vger.kernel.org
---
 Documentation/device-mapper/dm-bht.txt    |   59 ++
 Documentation/device-mapper/dm-verity.txt |   76 +++
 drivers/md/Kconfig                        |   30 +
 drivers/md/Makefile                       |    2 +
 drivers/md/dm-bht.c                       |  541 +++++++++++++++
 drivers/md/dm-verity.c                    | 1043 +++++++++++++++++++++++++++++
 drivers/md/dm-verity.h                    |   45 ++
 include/linux/dm-bht.h                    |  166 +++++
 8 files changed, 1962 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/device-mapper/dm-bht.txt
 create mode 100644 Documentation/device-mapper/dm-verity.txt
 create mode 100644 drivers/md/dm-bht.c
 create mode 100644 drivers/md/dm-verity.c
 create mode 100644 drivers/md/dm-verity.h
 create mode 100644 include/linux/dm-bht.h

diff --git a/Documentation/device-mapper/dm-bht.txt b/Documentation/device-mapper/dm-bht.txt
new file mode 100644
index 0000000..21d929f
--- /dev/null
+++ b/Documentation/device-mapper/dm-bht.txt
@@ -0,0 +1,59 @@
+dm-bht
+======
+
+dm-bht provides a block hash tree implementation.  The use of dm-bht allows
+for integrity checking of a given block device without reading the entire
+set of blocks into memory before use.
+
+In particular, dm-bht supplies an interface for creating and verifying a tree
+of cryptographic digests with any algorithm supported by the kernel crypto API.
+
+The `verity' target is the motivating example.
+
+
+Theory of operation
+===================
+
+dm-bht is logically comprised of multiple nodes organized in a tree-like
+structure.  Each node in the tree is a cryptographic hash.  If it is a leaf
+node, the hash is of some block data on disk.  If it is an intermediary node,
+then the hash is of a number of child nodes.
+
+dm-bht has a given depth starting at 1 (ignoring the root node).  Each level in
+the tree is concretely made up of dm_bht_entry structs.  Each entry in the tree
+is a collection of neighboring nodes that fit in one page-sized block.  The
+number is determined based on PAGE_SIZE and the size of the selected
+cryptographic digest algorithm.  The hashes are linearly ordered in this entry
+and any unaligned trailing space is ignored but included when calculating the
+parent node.
+
+The tree looks something like:
+
+alg= sha256, num_blocks = 32767
+                                 [   root    ]
+                                /    . . .    \
+                     [entry_0]                 [entry_1]
+                    /  . . .  \                 . . .   \
+         [entry_0_0]   . . .  [entry_0_127]    . . . .  [entry_1_127]
+           / ... \             /   . . .  \             /           \
+     blk_0 ... blk_127  blk_16256   blk_16383      blk_32640 . . . blk_32767
+
+The root is treated independently of the depth, and the data blocks are
+expected to be hashed and supplied to dm-bht.  Hash blocks that make up the
+entry contents are expected to be read from disk.
+
+dm-bht does not handle I/O directly but instead expects the consumer to
+supply callbacks.  The read callback will always receive a page-aligned
+value to pass to the block device layer to read in a hash value.
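+
+For reference, a read callback has the shape of dm_bht_callback from
+include/linux/dm-bht.h (parameter names here are illustrative):
+
+  int (*dm_bht_callback)(void *ctx, sector_t start, u8 *dst,
+                         sector_t count, struct dm_bht_entry *entry);
+
+Once the read finishes, the completion path must call
+dm_bht_read_completed() on the entry.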
+
+Usage
+=====
+
+The API provides mechanisms for reading and verifying a tree. When reading, all
+required data for the hash tree should be populated for a block before
+attempting a verify.  This can be done by calling dm_bht_populate().  When all
+data is ready, a call to dm_bht_verify_block() will perform the direct block
+hash check and compute the hashes of the parent and neighboring nodes where
+needed to ensure validity up to the root hash.  Note that
+dm_bht_set_root_hexdigest() should be called before any verification attempts
+occur.
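+
+A minimal consumer might look like the following sketch (error handling
+elided; my_read_cb, my_ctx, pg, and offset are hypothetical stand-ins for
+the caller's I/O plumbing):
+
+  struct dm_bht bht;
+
+  dm_bht_create(&bht, block_count, PAGE_SIZE, "sha256");
+  bht.read_cb = my_read_cb;
+  dm_bht_set_root_hexdigest(&bht, root_hexdigest);
+
+  dm_bht_populate(&bht, my_ctx, block);
+  /* ... wait until dm_bht_is_populated(&bht, block) ... */
+  dm_bht_verify_block(&bht, block, pg, offset);
+
+  dm_bht_destroy(&bht);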
diff --git a/Documentation/device-mapper/dm-verity.txt b/Documentation/device-mapper/dm-verity.txt
new file mode 100644
index 0000000..f33b984
--- /dev/null
+++ b/Documentation/device-mapper/dm-verity.txt
@@ -0,0 +1,76 @@
+dm-verity
+=========
+
+Device-Mapper's "verity" target provides transparent integrity checking of
+block devices using a cryptographic digest provided by the kernel crypto API.
+This target is read-only.
+
+Parameters: payload=<device path> hashtree=<hash device path> alg=<alg> \
+            salt=<salt> root_hexdigest=<root hash> \
+            [ hashstart=<hash start> error_behavior=<error behavior> ]
+
+<device path>
+    This is the device that is going to be integrity checked.  It may be
+    a subset of the full device as specified to dmsetup (start sector and count).
+    It may be specified as a path, like /dev/sdaX, or a device number,
+    <major>:<minor>.
+
+<hash device path>
+    This is the device that supplies the dm-bht hash data.  It may be
+    specified similarly to the device path and may be the same device.  If the
+    same device is used, the hash offset should be outside of the dm-verity
+    configured device size.
+
+<alg>
+    The cryptographic hash algorithm used for this device.  This should
+    be the name of the algorithm, like "sha1".
+
+<salt>
+    Salt value (in hex).
+
+<root hash>
+    The hexadecimal encoding of the cryptographic hash of all of the
+    neighboring nodes at the first level of the tree.  This hash must come
+    from a trusted source, as there is no other means of establishing
+    authenticity beyond this point.
+
+<hash start>
+    Start address of hashes (default 0).
+
+<error behavior>
+    0 = return -EIO, 1 = panic, 2 = none, 3 = call notifier.  The textual
+    names (eio, panic, none, notify) are accepted as well.
+
+Theory of operation
+===================
+
+dm-verity is meant to be set up as part of a verified boot path.  This
+may be anything ranging from a boot using tboot or trustedgrub to just
+booting from a known-good device (like a USB drive or CD).
+
+When a dm-verity device is configured, it is expected that the caller
+has been authenticated in some way (cryptographic signatures, etc).
+After instantiation, all hashes will be verified on-demand during
+disk access.  If a block cannot be verified up to the root of the
+tree (the root hash), the I/O will fail.  This detects tampering with
+both the device data and the hash data.
+
+Cryptographic hashes are used to assert the integrity of the device on a
+per-block basis.  This allows for a lightweight hash computation on first read
+into the page cache.  Block hashes are stored linearly, aligned to the
+nearest page-sized block.
+
+For more information on the hashing process, see dm-bht.txt.
+
+
+Example
+=======
+
+Set up a device:
+[[
+  dmsetup create vroot --table \
+    "0 204800 verity payload=/dev/sda1 hashtree=/dev/sda2 alg=sha1 "\
+    "root_hexdigest=9f74809a2ee7607b16fcc70d9399a4de9725a727"
+]]
+
+A command line tool is available to compute the hash tree and return the
+root hash value.
+  http://git.chromium.org/cgi-bin/gitweb.cgi?p=dm-verity.git;a=tree
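+
+The optional parameters use the same key=value syntax.  For example, with
+hypothetical salt and error behavior values:
+[[
+  dmsetup create vroot --table \
+    "0 204800 verity payload=/dev/sda1 hashtree=/dev/sda2 alg=sha1 "\
+    "root_hexdigest=9f74809a2ee7607b16fcc70d9399a4de9725a727 "\
+    "salt=8e5cd11f error_behavior=eio"
+]]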
diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index f75a66e..cb5f425 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -334,4 +334,34 @@ config DM_FLAKEY
        ---help---
          A target that intermittently fails I/O for debugging purposes.
 
+config DM_BHT
+        tristate "Block hash tree support"
+        select CRYPTO
+        select CRYPTO_HASH
+        ---help---
+          Include support for device-mapper devices to use a block hash
+          tree for managing data integrity checks in a scalable way.
+
+          Targets that use this functionality should include it
+          automatically.
+
+          If unsure, say N.
+
+config DM_VERITY
+        tristate "Verity target support"
+        depends on BLK_DEV_DM
+        select DM_BHT
+        select CRYPTO
+        select CRYPTO_HASH
+        ---help---
+          This device-mapper target allows you to create a device that
+          transparently integrity checks the data on it. You'll need to
+          activate the digests you're going to use in the cryptoapi
+          configuration.
+
+          To compile this code as a module, choose M here: the module will
+          be called dm-verity.
+
+          If unsure, say N.
+
 endif # MD
diff --git a/drivers/md/Makefile b/drivers/md/Makefile
index 448838b..58eb088 100644
--- a/drivers/md/Makefile
+++ b/drivers/md/Makefile
@@ -36,6 +36,8 @@ obj-$(CONFIG_DM_MULTIPATH_ST)	+= dm-service-time.o
 obj-$(CONFIG_DM_SNAPSHOT)	+= dm-snapshot.o
 obj-$(CONFIG_DM_MIRROR)		+= dm-mirror.o dm-log.o dm-region-hash.o
 obj-$(CONFIG_DM_LOG_USERSPACE)	+= dm-log-userspace.o
+obj-$(CONFIG_DM_BHT)            += dm-bht.o
+obj-$(CONFIG_DM_VERITY)         += dm-verity.o
 obj-$(CONFIG_DM_ZERO)		+= dm-zero.o
 obj-$(CONFIG_DM_RAID)	+= dm-raid.o
 
diff --git a/drivers/md/dm-bht.c b/drivers/md/dm-bht.c
new file mode 100644
index 0000000..32b8ccf
--- /dev/null
+++ b/drivers/md/dm-bht.c
@@ -0,0 +1,541 @@
+/*
+ * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ *
+ * Device-Mapper block hash tree interface.
+ * See Documentation/device-mapper/dm-bht.txt for details.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/atomic.h>
+#include <linux/bitops.h>
+#include <linux/bug.h>
+#include <linux/cpumask.h>
+#include <linux/device-mapper.h>
+#include <linux/dm-bht.h>
+#include <linux/err.h>
+#include <linux/errno.h>
+#include <linux/gfp.h>
+#include <linux/kernel.h>
+#include <linux/mm_types.h>
+#include <linux/scatterlist.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+
+#define DM_MSG_PREFIX "dm bht"
+
+
+/*
+ * Utilities
+ */
+
+static u8 from_hex(u8 ch)
+{
+	if ((ch >= '0') && (ch <= '9'))
+		return ch - '0';
+	if ((ch >= 'a') && (ch <= 'f'))
+		return ch - 'a' + 10;
+	if ((ch >= 'A') && (ch <= 'F'))
+		return ch - 'A' + 10;
+	return -1;
+}
+
+/**
+ * dm_bht_bin_to_hex - converts a binary stream to human-readable hex
+ * @binary:	a byte array of length @binary_len
+ * @hex:	a byte array of length @binary_len * 2 + 1
+ */
+static void dm_bht_bin_to_hex(u8 *binary, u8 *hex, unsigned int binary_len)
+{
+	while (binary_len-- > 0) {
+		sprintf((char *)hex, "%02hhx", (int)*binary);
+		hex += 2;
+		binary++;
+	}
+}
+
+/**
+ * dm_bht_hex_to_bin - converts a hex stream to binary
+ * @binary:	a byte array of length @binary_len
+ * @hex:	a byte array of length @binary_len * 2
+ */
+static void dm_bht_hex_to_bin(u8 *binary, const u8 *hex,
+			      unsigned int binary_len)
+{
+	while (binary_len-- > 0) {
+		*binary = from_hex(*(hex++));
+		*binary *= 16;
+		*binary += from_hex(*(hex++));
+		binary++;
+	}
+}
+
+static void dm_bht_log_mismatch(struct dm_bht *bht, u8 *given, u8 *computed)
+{
+	u8 given_hex[DM_BHT_MAX_DIGEST_SIZE * 2 + 1];
+	u8 computed_hex[DM_BHT_MAX_DIGEST_SIZE * 2 + 1];
+
+	dm_bht_bin_to_hex(given, given_hex, bht->digest_size);
+	dm_bht_bin_to_hex(computed, computed_hex, bht->digest_size);
+	DMERR_LIMIT("%s != %s", given_hex, computed_hex);
+}
+
+/**
+ * dm_bht_compute_hash: hashes a page of data
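+ *
+ * The computed digest covers the block data followed by the
+ * (zero-padded) salt, i.e. digest = H(data || salt).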
+ */
+static int dm_bht_compute_hash(struct dm_bht *bht, struct page *pg,
+			       unsigned int offset, u8 *digest)
+{
+	struct hash_desc *hash_desc = &bht->hash_desc[smp_processor_id()];
+	struct scatterlist sg;
+
+	sg_init_table(&sg, 1);
+	sg_set_page(&sg, pg, bht->block_size, offset);
+	/* Note, this is synchronous. */
+	if (crypto_hash_init(hash_desc)) {
+		DMCRIT("failed to reinitialize crypto hash (proc:%d)",
+			smp_processor_id());
+		return -EINVAL;
+	}
+	if (crypto_hash_update(hash_desc, &sg, bht->block_size)) {
+		DMCRIT("crypto_hash_update failed");
+		return -EINVAL;
+	}
+	sg_set_buf(&sg, bht->salt, sizeof(bht->salt));
+	if (crypto_hash_update(hash_desc, &sg, sizeof(bht->salt))) {
+		DMCRIT("crypto_hash_update failed");
+		return -EINVAL;
+	}
+	if (crypto_hash_final(hash_desc, digest)) {
+		DMCRIT("crypto_hash_final failed");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/*
+ * Implementation functions
+ */
+
+static int dm_bht_initialize_entries(struct dm_bht *bht)
+{
+	/* last represents the index of the last digest store in the tree.
+	 * By walking the tree with that index, it is possible to compute the
+	 * total number of entries at each level.
+	 *
+	 * Since each entry will contain up to |node_count| nodes of the tree,
+	 * it is possible that the last index may not be at the end of a given
+	 * entry->nodes.  In that case, it is assumed the value is padded.
+	 *
+	 * Note, we treat both the tree root (1 hash) and the tree leaves
+	 * independently from the bht data structures.  Logically, the root is
+	 * depth=-1 and the block layer level is depth=bht->depth
+	 */
+	unsigned int last = bht->block_count;
+	int depth;
+
+	/* check that the largest level->count can't result in an int overflow
+	 * on allocation or sector calculation.
+	 */
+	if (((last >> bht->node_count_shift) + 1) >
+	    UINT_MAX / max((unsigned int)sizeof(struct dm_bht_entry),
+			   (unsigned int)to_sector(bht->block_size))) {
+		DMCRIT("required entries %u is too large", last + 1);
+		return -EINVAL;
+	}
+
+	/* Track the current sector location for each level so we don't have to
+	 * compute it during traversals.
+	 */
+	bht->sectors = 0;
+	for (depth = 0; depth < bht->depth; ++depth) {
+		struct dm_bht_level *level = &bht->levels[depth];
+
+		level->count = dm_bht_index_at_level(bht, depth, last) + 1;
+		level->entries = (struct dm_bht_entry *)
+				 kcalloc(level->count,
+					 sizeof(struct dm_bht_entry),
+					 GFP_KERNEL);
+		if (!level->entries) {
+			DMERR("failed to allocate entries for depth %d", depth);
+			return -ENOMEM;
+		}
+		level->sector = bht->sectors;
+		bht->sectors += level->count * to_sector(bht->block_size);
+	}
+
+	return 0;
+}
+
+/**
+ * dm_bht_create - prepares @bht for use
+ * @bht:	pointer to the dm_bht to initialize
+ * @block_count: the number of block hashes / tree leaves
+ * @block_size:	size of a hash block, in bytes
+ * @alg_name:	crypto hash algorithm name
+ *
+ * Returns 0 on success.
+ *
+ * Callers can offset into devices by storing the data in the io callbacks.
+ */
+int dm_bht_create(struct dm_bht *bht, unsigned int block_count,
+		  unsigned int block_size, const char *alg_name)
+{
+	int cpu, status;
+
+	bht->block_size = block_size;
+	/* Verify that PAGE_SIZE >= block_size >= SECTOR_SIZE. */
+	if ((block_size > PAGE_SIZE) ||
+	    (PAGE_SIZE % block_size) ||
+	    (to_sector(block_size) == 0))
+		return -EINVAL;
+
+	/* Set up the hash first.  Its length determines much of the bht layout. */
+	for (cpu = 0; cpu < nr_cpu_ids; ++cpu) {
+		bht->hash_desc[cpu].tfm = crypto_alloc_hash(alg_name, 0, 0);
+		if (IS_ERR(bht->hash_desc[cpu].tfm)) {
+			DMERR("failed to allocate crypto hash '%s'", alg_name);
+			status = -ENOMEM;
+			bht->hash_desc[cpu].tfm = NULL;
+			goto bad_arg;
+		}
+	}
+	bht->digest_size = crypto_hash_digestsize(bht->hash_desc[0].tfm);
+	/* We expect to be able to pack >=2 hashes into a block */
+	if (block_size / bht->digest_size < 2) {
+		DMERR("too few hashes fit in a block");
+		status = -EINVAL;
+		goto bad_arg;
+	}
+
+	if (bht->digest_size > DM_BHT_MAX_DIGEST_SIZE) {
+		DMERR("DM_BHT_MAX_DIGEST_SIZE too small for chosen digest");
+		status = -EINVAL;
+		goto bad_arg;
+	}
+
+	/* Configure the tree */
+	bht->block_count = block_count;
+	if (block_count == 0) {
+		DMERR("block_count must be non-zero");
+		status = -EINVAL;
+		goto bad_arg;
+	}
+
+	/* Each dm_bht_entry->nodes is one block.  node_count tracks
+	 * how many nodes fit into one entry where a node is a single
+	 * hash (message digest).
+	 */
+	bht->node_count_shift = fls(block_size / bht->digest_size) - 1;
+	/* Round down to the nearest power of two.  This makes indexing
+	 * into the tree much less painful.
+	 */
+	bht->node_count = 1 << bht->node_count_shift;
+
+	/* This is unlikely to happen, but with 64k pages, who knows. */
+	if (bht->node_count > UINT_MAX / bht->digest_size) {
+		DMERR("node_count * hash_len exceeds UINT_MAX!");
+		status = -EINVAL;
+		goto bad_arg;
+	}
+
+	bht->depth = DIV_ROUND_UP(fls(block_count - 1), bht->node_count_shift);
+
+	/* Ensure that we can safely shift by this value. */
+	if (bht->depth * bht->node_count_shift >= sizeof(unsigned int) * 8) {
+		DMERR("specified depth and node_count_shift is too large");
+		status = -EINVAL;
+		goto bad_arg;
+	}
+
+	/* Allocate levels. Each level of the tree may have an arbitrary number
+	 * of dm_bht_entry structs.  Each entry contains node_count nodes.
+	 * Each node in the tree is a cryptographic digest of either node_count
+	 * nodes on the subsequent level or of a specific block on disk.
+	 */
+	bht->levels = (struct dm_bht_level *)
+			kcalloc(bht->depth,
+				sizeof(struct dm_bht_level), GFP_KERNEL);
+	if (!bht->levels) {
+		DMERR("failed to allocate tree levels");
+		status = -ENOMEM;
+		goto bad_level_alloc;
+	}
+
+	bht->read_cb = NULL;
+
+	status = dm_bht_initialize_entries(bht);
+	if (status)
+		goto bad_entries_alloc;
+
+	/* We compute depth such that there is only 1 block at level 0. */
+	BUG_ON(bht->levels[0].count != 1);
+
+	return 0;
+
+bad_entries_alloc:
+	while (bht->depth-- > 0)
+		kfree(bht->levels[bht->depth].entries);
+	kfree(bht->levels);
+bad_level_alloc:
+bad_arg:
+	for (cpu = 0; cpu < nr_cpu_ids; ++cpu)
+		if (bht->hash_desc[cpu].tfm)
+			crypto_free_hash(bht->hash_desc[cpu].tfm);
+	return status;
+}
+EXPORT_SYMBOL(dm_bht_create);
+
+/**
+ * dm_bht_read_completed
+ * @entry:	pointer to the entry that's been loaded
+ * @status:	I/O status. Non-zero is failure.
+ * MUST always be called after a read_cb completes.
+ */
+void dm_bht_read_completed(struct dm_bht_entry *entry, int status)
+{
+	if (status) {
+		/* TODO(wad) add retry support */
+		DMCRIT("an I/O error occurred while reading entry");
+		atomic_set(&entry->state, DM_BHT_ENTRY_ERROR_IO);
+		/* entry->nodes will be freed later */
+		return;
+	}
+	BUG_ON(atomic_read(&entry->state) != DM_BHT_ENTRY_PENDING);
+	atomic_set(&entry->state, DM_BHT_ENTRY_READY);
+}
+EXPORT_SYMBOL(dm_bht_read_completed);
+
+/**
+ * dm_bht_verify_block - checks that all nodes in the path for @block are valid
+ * @bht:	pointer to a dm_bht_create()d bht
+ * @block:	the specific block the data is expected from
+ * @pg:		page holding the block data
+ * @offset:	offset into the page
+ *
+ * Returns 0 on success, DM_BHT_ENTRY_ERROR_MISMATCH on error.
+ */
+int dm_bht_verify_block(struct dm_bht *bht, unsigned int block,
+			struct page *pg, unsigned int offset)
+{
+	int state, depth = bht->depth;
+	u8 digest[DM_BHT_MAX_DIGEST_SIZE];
+	struct dm_bht_entry *entry;
+	void *node;
+
+	do {
+		/* Need to check that the hash of the current block is accurate
+		 * in its parent.
+		 */
+		entry = dm_bht_get_entry(bht, depth - 1, block);
+		state = atomic_read(&entry->state);
+		/* This call is only safe if all nodes along the path
+		 * are already populated (i.e. READY) via dm_bht_populate.
+		 */
+		BUG_ON(state < DM_BHT_ENTRY_READY);
+		node = dm_bht_get_node(bht, entry, depth, block);
+
+		if (dm_bht_compute_hash(bht, pg, offset, digest) ||
+		    memcmp(digest, node, bht->digest_size))
+			goto mismatch;
+
+		/* Keep the containing block of hashes to be verified in the
+		 * next pass.
+		 */
+		pg = virt_to_page(entry->nodes);
+		offset = offset_in_page(entry->nodes);
+	} while (--depth > 0 && state != DM_BHT_ENTRY_VERIFIED);
+
+	if (depth == 0 && state != DM_BHT_ENTRY_VERIFIED) {
+		if (dm_bht_compute_hash(bht, pg, offset, digest) ||
+		    memcmp(digest, bht->root_digest, bht->digest_size))
+			goto mismatch;
+		atomic_set(&entry->state, DM_BHT_ENTRY_VERIFIED);
+	}
+
+	/* Mark path to leaf as verified. */
+	for (depth++; depth < bht->depth; depth++) {
+		entry = dm_bht_get_entry(bht, depth, block);
+		/* At this point, entry can only be in VERIFIED or READY state.
+		 * So it is safe to use atomic_set instead of atomic_cmpxchg.
+		 */
+		atomic_set(&entry->state, DM_BHT_ENTRY_VERIFIED);
+	}
+
+	return 0;
+
+mismatch:
+	DMERR_LIMIT("verify_path: failed to verify hash (d=%d,bi=%u)",
+		    depth, block);
+	dm_bht_log_mismatch(bht, node, digest);
+	return DM_BHT_ENTRY_ERROR_MISMATCH;
+}
+EXPORT_SYMBOL(dm_bht_verify_block);
+
+/**
+ * dm_bht_is_populated - check that entries from disk needed to verify a given
+ *                       block are all ready
+ * @bht:	pointer to a dm_bht_create()d bht
+ * @block:	the specific block the data is expected from
+ *
+ * Callers may wish to call dm_bht_is_populated() when checking an io
+ * for which entries were already pending.
+ */
+bool dm_bht_is_populated(struct dm_bht *bht, unsigned int block)
+{
+	int depth;
+
+	for (depth = bht->depth - 1; depth >= 0; depth--) {
+		struct dm_bht_entry *entry = dm_bht_get_entry(bht, depth,
+							      block);
+		if (atomic_read(&entry->state) < DM_BHT_ENTRY_READY)
+			return false;
+	}
+
+	return true;
+}
+EXPORT_SYMBOL(dm_bht_is_populated);
+
+/**
+ * dm_bht_populate - reads entries from disk needed to verify a given block
+ * @bht:	pointer to a dm_bht_create()d bht
+ * @ctx:        context used for all read_cb calls on this request
+ * @block:	the specific block the data is expected from
+ *
+ * Returns negative value on error. Returns 0 on success.
+ */
+int dm_bht_populate(struct dm_bht *bht, void *ctx, unsigned int block)
+{
+	int depth, state;
+
+	BUG_ON(block >= bht->block_count);
+
+	for (depth = bht->depth - 1; depth >= 0; --depth) {
+		unsigned int index = dm_bht_index_at_level(bht, depth, block);
+		struct dm_bht_level *level = &bht->levels[depth];
+		struct dm_bht_entry *entry = dm_bht_get_entry(bht, depth,
+							      block);
+		state = atomic_cmpxchg(&entry->state,
+				       DM_BHT_ENTRY_UNALLOCATED,
+				       DM_BHT_ENTRY_PENDING);
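+		/* Entries above a verified entry were already populated
+		 * when it was first verified, so the walk can stop here.
+		 */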
+		if (state == DM_BHT_ENTRY_VERIFIED)
+			break;
+		if (state <= DM_BHT_ENTRY_ERROR)
+			goto error_state;
+		if (state != DM_BHT_ENTRY_UNALLOCATED)
+			continue;
+
+		/* Current entry is claimed for allocation and loading */
+		entry->nodes = kmalloc(bht->block_size, GFP_NOIO);
+		if (!entry->nodes)
+			goto nomem;
+
+		bht->read_cb(ctx,
+			     level->sector + to_sector(index * bht->block_size),
+			     entry->nodes, to_sector(bht->block_size), entry);
+	}
+
+	return 0;
+
+error_state:
+	DMCRIT("block %u at depth %d is in an error state", block, depth);
+	return -EPERM;
+
+nomem:
+	DMCRIT("failed to allocate memory for entry->nodes");
+	return -ENOMEM;
+}
+EXPORT_SYMBOL(dm_bht_populate);
+
+/**
+ * dm_bht_destroy - cleans up all memory used by @bht
+ * @bht:	pointer to a dm_bht_create()d bht
+ */
+void dm_bht_destroy(struct dm_bht *bht)
+{
+	int depth, cpu;
+
+	for (depth = 0; depth < bht->depth; depth++) {
+		struct dm_bht_entry *entry = bht->levels[depth].entries;
+		struct dm_bht_entry *entry_end = entry +
+						 bht->levels[depth].count;
+		for (; entry < entry_end; ++entry)
+			kfree(entry->nodes);
+		kfree(bht->levels[depth].entries);
+	}
+	kfree(bht->levels);
+	for (cpu = 0; cpu < nr_cpu_ids; ++cpu)
+		if (bht->hash_desc[cpu].tfm)
+			crypto_free_hash(bht->hash_desc[cpu].tfm);
+}
+EXPORT_SYMBOL(dm_bht_destroy);
+
+/*
+ * Accessors
+ */
+
+/**
+ * dm_bht_set_root_hexdigest - sets an unverified root digest hash from hex
+ * @bht:	pointer to a dm_bht_create()d bht
+ * @hexdigest:	NUL-terminated array of u8s containing the new digest in hex
+ * Returns non-zero on error.
+ */
+int dm_bht_set_root_hexdigest(struct dm_bht *bht, const u8 *hexdigest)
+{
+	/* Make sure we have at least the bytes expected */
+	if (strnlen((char *)hexdigest, bht->digest_size * 2) !=
+	    bht->digest_size * 2) {
+		DMERR("root digest length does not match hash algorithm");
+		return -1;
+	}
+	dm_bht_hex_to_bin(bht->root_digest, hexdigest, bht->digest_size);
+	return 0;
+}
+EXPORT_SYMBOL(dm_bht_set_root_hexdigest);
+
+/**
+ * dm_bht_root_hexdigest - returns root digest in hex
+ * @bht:	pointer to a dm_bht_create()d bht
+ * @hexdigest:	u8 array of size @available
+ * @available:	must be bht->digest_size * 2 + 1
+ */
+int dm_bht_root_hexdigest(struct dm_bht *bht, u8 *hexdigest, int available)
+{
+	if (available < 0 ||
+	    ((unsigned int) available) < bht->digest_size * 2 + 1) {
+		DMERR("hexdigest has too few bytes available");
+		return -EINVAL;
+	}
+	dm_bht_bin_to_hex(bht->root_digest, hexdigest, bht->digest_size);
+	return 0;
+}
+EXPORT_SYMBOL(dm_bht_root_hexdigest);
+
+/**
+ * dm_bht_set_salt - sets the salt used, in hex
+ * @bht:      pointer to a dm_bht_create()d bht
+ * @hexsalt:  salt string, as hex; will be zero-padded or truncated to
+ *            DM_BHT_SALT_SIZE * 2 hex digits.
+ */
+void dm_bht_set_salt(struct dm_bht *bht, const char *hexsalt)
+{
+	size_t saltlen = min(strlen(hexsalt) / 2, sizeof(bht->salt));
+
+	memset(bht->salt, 0, sizeof(bht->salt));
+	dm_bht_hex_to_bin(bht->salt, (const u8 *)hexsalt, saltlen);
+}
+EXPORT_SYMBOL(dm_bht_set_salt);
+
+/**
+ * dm_bht_salt - returns the salt used, in hex
+ * @bht:      pointer to a dm_bht_create()d bht
+ * @hexsalt:  buffer to put salt into, of length DM_BHT_SALT_SIZE * 2 + 1.
+ */
+int dm_bht_salt(struct dm_bht *bht, char *hexsalt)
+{
+	dm_bht_bin_to_hex(bht->salt, (u8 *)hexsalt, sizeof(bht->salt));
+	return 0;
+}
+EXPORT_SYMBOL(dm_bht_salt);
diff --git a/drivers/md/dm-verity.c b/drivers/md/dm-verity.c
new file mode 100644
index 0000000..a9bd0e8
--- /dev/null
+++ b/drivers/md/dm-verity.c
@@ -0,0 +1,1043 @@
+/*
+ * Originally based on dm-crypt.c,
+ * Copyright (C) 2003 Christophe Saout <christophe@saout.de>
+ * Copyright (C) 2004 Clemens Fruhwirth <clemens@endorphin.org>
+ * Copyright (C) 2006-2008 Red Hat, Inc. All rights reserved.
+ * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ *                    All Rights Reserved.
+ *
+ * This file is released under the GPLv2.
+ *
+ * Implements a verifying transparent block device.
+ * See Documentation/device-mapper/dm-verity.txt
+ */
+#include <linux/async.h>
+#include <linux/atomic.h>
+#include <linux/bio.h>
+#include <linux/blkdev.h>
+#include <linux/delay.h>
+#include <linux/device.h>
+#include <linux/err.h>
+#include <linux/genhd.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/mempool.h>
+#include <linux/mm_types.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/workqueue.h>
+#include <linux/device-mapper.h>
+#include <linux/dm-bht.h>
+
+#include "dm-verity.h"
+
+#define DM_MSG_PREFIX "verity"
+
+/* Supports up to 512-bit digests */
+#define VERITY_MAX_DIGEST_SIZE 64
+
+/* TODO(wad) make both of these report the error line/file to a
+ *           verity_bug function.
+ */
+#define VERITY_BUG(msg...) BUG()
+#define VERITY_BUG_ON(cond, msg...) BUG_ON(cond)
+
+/* Helper for printing sector_t */
+#define ULL(x) ((unsigned long long)(x))
+
+#define MIN_IOS 32
+#define MIN_BIOS (MIN_IOS * 2)
+#define VERITY_DEFAULT_BLOCK_SIZE 4096
+
+/* Provide a lightweight means of specifying the global default for
+ * error behavior: eio, panic, none, or notify.
+ * Legacy support for 0 = eio, 1 = reboot/panic, 2 = none, 3 = notify.
+ * This is matched to the enum in dm-verity.h.
+ */
+static const char * const allowed_error_behaviors[] = { "eio", "panic", "none",
+							"notify", NULL };
+static char *error_behavior = "eio";
+module_param(error_behavior, charp, 0644);
+MODULE_PARM_DESC(error_behavior, "Behavior on error "
+				 "(eio, panic, none, notify)");
+
+/* Controls whether verity_get_device will wait forever for a device. */
+static int dev_wait;
+module_param(dev_wait, bool, 0444);
+MODULE_PARM_DESC(dev_wait, "Wait forever for a backing device");
+
+/* per-requested-bio private data */
+enum verity_io_flags {
+	VERITY_IOFLAGS_CLONED = 0x1,	/* original bio has been cloned */
+};
+
+struct dm_verity_io {
+	struct dm_target *target;
+	struct bio *bio;
+	struct delayed_work work;
+	unsigned int flags;
+
+	int error;
+	atomic_t pending;
+
+	u64 block;  /* aligned block index */
+	u64 count;  /* aligned count in blocks */
+};
+
+struct verity_config {
+	struct dm_dev *dev;
+	sector_t start;
+	sector_t size;
+
+	struct dm_dev *hash_dev;
+	sector_t hash_start;
+
+	struct dm_bht bht;
+
+	/* Pool required for io contexts */
+	mempool_t *io_pool;
+	/* Pool and bios required for making sure that backing device reads are
+	 * in PAGE_SIZE increments.
+	 */
+	struct bio_set *bs;
+
+	char hash_alg[CRYPTO_MAX_ALG_NAME];
+
+	int error_behavior;
+};
+
+static struct kmem_cache *_verity_io_pool;
+static struct workqueue_struct *kveritydq, *kverityd_ioq;
+
+static void kverityd_verify(struct work_struct *work);
+static void kverityd_io(struct work_struct *work);
+static void kverityd_io_bht_populate(struct dm_verity_io *io);
+static void kverityd_io_bht_populate_end(struct bio *, int error);
+
+static BLOCKING_NOTIFIER_HEAD(verity_error_notifier);
+
+/*
+ * Exported interfaces
+ */
+
+int dm_verity_register_error_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_register(&verity_error_notifier, nb);
+}
+EXPORT_SYMBOL_GPL(dm_verity_register_error_notifier);
+
+int dm_verity_unregister_error_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_unregister(&verity_error_notifier, nb);
+}
+EXPORT_SYMBOL_GPL(dm_verity_unregister_error_notifier);
+
+/*
+ * Allocation and utility functions
+ */
+
+static void kverityd_src_io_read_end(struct bio *clone, int error);
+
+/* Shared destructor for all internal bios */
+static void dm_verity_bio_destructor(struct bio *bio)
+{
+	struct dm_verity_io *io = bio->bi_private;
+	struct verity_config *vc = io->target->private;
+	bio_free(bio, vc->bs);
+}
+
+static struct bio *verity_alloc_bioset(struct verity_config *vc, gfp_t gfp_mask,
+				       int nr_iovecs)
+{
+	return bio_alloc_bioset(gfp_mask, nr_iovecs, vc->bs);
+}
+
+static struct dm_verity_io *verity_io_alloc(struct dm_target *ti,
+					    struct bio *bio)
+{
+	struct verity_config *vc = ti->private;
+	sector_t sector = bio->bi_sector - ti->begin;
+	struct dm_verity_io *io;
+
+	io = mempool_alloc(vc->io_pool, GFP_NOIO);
+	if (unlikely(!io))
+		return NULL;
+	io->flags = 0;
+	io->target = ti;
+	io->bio = bio;
+	io->error = 0;
+
+	/* Adjust the sector by the virtual starting sector */
+	io->block = to_bytes(sector) / vc->bht.block_size;
+	io->count = bio->bi_size / vc->bht.block_size;
+
+	atomic_set(&io->pending, 0);
+
+	return io;
+}
+
+static struct bio *verity_bio_clone(struct dm_verity_io *io)
+{
+	struct verity_config *vc = io->target->private;
+	struct bio *bio = io->bio;
+	struct bio *clone = verity_alloc_bioset(vc, GFP_NOIO, bio->bi_max_vecs);
+
+	if (!clone)
+		return NULL;
+
+	__bio_clone(clone, bio);
+	clone->bi_private = io;
+	clone->bi_end_io  = kverityd_src_io_read_end;
+	clone->bi_bdev    = vc->dev->bdev;
+	clone->bi_sector += vc->start - io->target->begin;
+	clone->bi_destructor = dm_verity_bio_destructor;
+
+	return clone;
+}
+
+/* If the request is not successful, this handler takes action.
+ * TODO make this call a registered handler.
+ */
+static void verity_error(struct verity_config *vc, struct dm_verity_io *io,
+			 int error)
+{
+	const char *message;
+	int error_mode = DM_VERITY_ERROR_BEHAVIOR_PANIC;
+	dev_t devt = 0;
+	u64 block = ~0;
+	int transient = 1;
+	struct dm_verity_error_state error_state;
+
+	if (vc) {
+		devt = vc->dev->bdev->bd_dev;
+		error_mode = vc->error_behavior;
+	}
+
+	if (io) {
+		io->error = -EIO;
+		block = io->block;
+	}
+
+	switch (error) {
+	case -ENOMEM:
+		message = "out of memory";
+		break;
+	case -EBUSY:
+		message = "pending data seen during verify";
+		break;
+	case -EFAULT:
+		message = "crypto operation failure";
+		break;
+	case -EACCES:
+		message = "integrity failure";
+		/* Image is bad. */
+		transient = 0;
+		break;
+	case -EPERM:
+		message = "hash tree population failure";
+		/* Should be dm-bht specific errors */
+		transient = 0;
+		break;
+	case -EINVAL:
+		message = "unexpected missing/invalid data";
+		/* The device was configured incorrectly - fallback. */
+		transient = 0;
+		break;
+	default:
+		/* Other errors can be passed through as IO errors */
+		message = "unknown or I/O error";
+		return;
+	}
+
+	DMERR_LIMIT("verification failure occurred: %s", message);
+
+	if (error_mode == DM_VERITY_ERROR_BEHAVIOR_NOTIFY) {
+		error_state.code = error;
+		error_state.transient = transient;
+		error_state.block = block;
+		error_state.message = message;
+		error_state.dev_start = vc->start;
+		error_state.dev_len = vc->size;
+		error_state.dev = vc->dev->bdev;
+		error_state.hash_dev_start = vc->hash_start;
+		error_state.hash_dev_len = vc->bht.sectors;
+		error_state.hash_dev = vc->hash_dev->bdev;
+
+		/* Set default fallthrough behavior. */
+		error_state.behavior = DM_VERITY_ERROR_BEHAVIOR_PANIC;
+		error_mode = DM_VERITY_ERROR_BEHAVIOR_PANIC;
+
+		if (!blocking_notifier_call_chain(
+		    &verity_error_notifier, transient, &error_state)) {
+			error_mode = error_state.behavior;
+		}
+	}
+
+	switch (error_mode) {
+	case DM_VERITY_ERROR_BEHAVIOR_EIO:
+		break;
+	case DM_VERITY_ERROR_BEHAVIOR_NONE:
+		if (error != -EIO && io)
+			io->error = 0;
+		break;
+	default:
+		goto do_panic;
+	}
+	return;
+
+do_panic:
+	panic("dm-verity failure: "
+	      "device:%u:%u error:%d block:%llu message:%s",
+	      MAJOR(devt), MINOR(devt), error, ULL(block), message);
+}
+
+/**
+ * verity_parse_error_behavior - parse a behavior charp to the enum
+ * @behavior:	NUL-terminated char array
+ *
+ * Checks if the behavior is valid either as text or as an index digit
+ * and returns the proper enum value or -1 on error.
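+ * For example, both "none" and "2" map to DM_VERITY_ERROR_BEHAVIOR_NONE.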
+ */
+static int verity_parse_error_behavior(const char *behavior)
+{
+	const char * const *allowed = allowed_error_behaviors;
+	char index = '0';
+
+	for (; *allowed; allowed++, index++)
+		if (!strcmp(*allowed, behavior) || behavior[0] == index)
+			break;
+
+	if (!*allowed)
+		return -1;
+
+	/* Convert to the integer index matching the enum. */
+	return allowed - allowed_error_behaviors;
+}
+
+/*
+ * Reverse flow of requests into the device.
+ *
+ * (Start at the bottom with verity_map and work your way upward).
+ */
+
+static void verity_inc_pending(struct dm_verity_io *io);
+
+static void verity_return_bio_to_caller(struct dm_verity_io *io)
+{
+	struct verity_config *vc = io->target->private;
+
+	if (io->error)
+		verity_error(vc, io, io->error);
+
+	bio_endio(io->bio, io->error);
+	mempool_free(io, vc->io_pool);
+}
+
+/* Check for any missing bht hashes. */
+static bool verity_is_bht_populated(struct dm_verity_io *io)
+{
+	struct verity_config *vc = io->target->private;
+	u64 block;
+
+	for (block = io->block; block < io->block + io->count; ++block)
+		if (!dm_bht_is_populated(&vc->bht, block))
+			return false;
+
+	return true;
+}
+
+/* verity_dec_pending manages the lifetime of all dm_verity_io structs.
+ * Non-bug error handling and all passage from workqueue to workqueue
+ * are centralized through this interface.
+ */
+static void verity_dec_pending(struct dm_verity_io *io)
+{
+	if (!atomic_dec_and_test(&io->pending))
+		goto done;
+
+	if (unlikely(io->error))
+		goto io_error;
+
+	/* I/Os that were pending may now be ready */
+	if (verity_is_bht_populated(io)) {
+		INIT_DELAYED_WORK(&io->work, kverityd_verify);
+		queue_delayed_work(kveritydq, &io->work, 0);
+	} else {
+		INIT_DELAYED_WORK(&io->work, kverityd_io);
+		queue_delayed_work(kverityd_ioq, &io->work, HZ/10);
+	}
+
+done:
+	return;
+
+io_error:
+	verity_return_bio_to_caller(io);
+}
+
+/* Walks the data set and computes the hash of the data read from the
+ * untrusted source device.  The computed hash is then passed to dm-bht
+ * for verification.
+ */
+static int verity_verify(struct verity_config *vc,
+			 struct dm_verity_io *io)
+{
+	unsigned int block_size = vc->bht.block_size;
+	struct bio *bio = io->bio;
+	u64 block = io->block;
+	unsigned int idx;
+	int r;
+
+	for (idx = bio->bi_idx; idx < bio->bi_vcnt; idx++) {
+		struct bio_vec *bv = bio_iovec_idx(bio, idx);
+		unsigned int offset = bv->bv_offset;
+		unsigned int len = bv->bv_len;
+
+		VERITY_BUG_ON(offset % block_size);
+		VERITY_BUG_ON(len % block_size);
+
+		while (len) {
+			r = dm_bht_verify_block(&vc->bht, block,
+						bv->bv_page, offset);
+			if (r)
+				goto bad_return;
+
+			offset += block_size;
+			len -= block_size;
+			block++;
+			cond_resched();
+		}
+	}
+
+	return 0;
+
+bad_return:
+	/* dm_bht functions aren't expected to return errno-friendly
+	 * values.  They are converted here for uniformity.
+	 */
+	if (r > 0) {
+		DMERR("Pending data for block %llu seen at verify", ULL(block));
+		r = -EBUSY;
+	} else {
+		DMERR_LIMIT("Block hash does not match!");
+		r = -EACCES;
+	}
+	return r;
+}
+
+/* Services the verify workqueue */
+static void kverityd_verify(struct work_struct *work)
+{
+	struct delayed_work *dwork = container_of(work, struct delayed_work,
+						  work);
+	struct dm_verity_io *io = container_of(dwork, struct dm_verity_io,
+					       work);
+	struct verity_config *vc = io->target->private;
+
+	io->error = verity_verify(vc, io);
+
+	/* Free up the bio and tag with the return value */
+	verity_return_bio_to_caller(io);
+}
+
+/* Asynchronously called upon the completion of dm-bht I/O.  The status
+ * of the operation is passed back to dm-bht and the next steps are
+ * decided by verity_dec_pending.
+ */
+static void kverityd_io_bht_populate_end(struct bio *bio, int error)
+{
+	struct dm_bht_entry *entry = (struct dm_bht_entry *) bio->bi_private;
+	struct dm_verity_io *io = (struct dm_verity_io *) entry->io_context;
+
+	/* Tell the tree to atomically update now that we've populated
+	 * the given entry.
+	 */
+	dm_bht_read_completed(entry, error);
+
+	/* Clean up for reuse when reading data to be checked */
+	bio->bi_vcnt = 0;
+	bio->bi_io_vec->bv_offset = 0;
+	bio->bi_io_vec->bv_len = 0;
+	bio->bi_io_vec->bv_page = NULL;
+	/* Restore the private data to I/O so the destructor can be shared. */
+	bio->bi_private = (void *) io;
+	bio_put(bio);
+
+	/* We bail but assume the tree has been marked bad. */
+	if (unlikely(error)) {
+		DMERR("Failed to read for sector %llu (%u)",
+		      ULL(io->bio->bi_sector), io->bio->bi_size);
+		io->error = error;
+		/* Pass through the error to verity_dec_pending below */
+	}
+	/* When pending = 0, it will transition to reading real data */
+	verity_dec_pending(io);
+}
+
+/* Called by dm-bht (via dm_bht_populate), this function provides
+ * the message digests to dm-bht that are stored on disk.
+ */
+static int kverityd_bht_read_callback(void *ctx, sector_t start, u8 *dst,
+				      sector_t count,
+				      struct dm_bht_entry *entry)
+{
+	struct dm_verity_io *io = ctx;  /* I/O for this batch */
+	struct verity_config *vc;
+	struct bio *bio;
+
+	vc = io->target->private;
+
+	/* The I/O context is nested inside the entry so that we don't need one
+	 * io context per page read.
+	 */
+	entry->io_context = ctx;
+
+	/* We should only get page size requests at present. */
+	verity_inc_pending(io);
+	bio = verity_alloc_bioset(vc, GFP_NOIO, 1);
+	if (unlikely(!bio)) {
+		DMCRIT("Out of memory at bio_alloc_bioset");
+		dm_bht_read_completed(entry, -ENOMEM);
+		return -ENOMEM;
+	}
+	bio->bi_private = (void *) entry;
+	bio->bi_idx = 0;
+	bio->bi_size = vc->bht.block_size;
+	bio->bi_sector = vc->hash_start + start;
+	bio->bi_bdev = vc->hash_dev->bdev;
+	bio->bi_end_io = kverityd_io_bht_populate_end;
+	bio->bi_rw = REQ_META;
+	/* Only need to free the bio since the page is managed by bht */
+	bio->bi_destructor = dm_verity_bio_destructor;
+	bio->bi_vcnt = 1;
+	bio->bi_io_vec->bv_offset = offset_in_page(dst);
+	bio->bi_io_vec->bv_len = to_bytes(count);
+	/* dst is guaranteed to be a page_pool allocation */
+	bio->bi_io_vec->bv_page = virt_to_page(dst);
+	/* Track that this I/O is in use.  There should be no risk of the io
+	 * being removed prior since this is called synchronously.
+	 */
+	generic_make_request(bio);
+	return 0;
+}
+
+/* Submits an io request for each missing block of block hashes.
+ * The last one to return will then enqueue this on the io workqueue.
+ */
+static void kverityd_io_bht_populate(struct dm_verity_io *io)
+{
+	struct verity_config *vc = io->target->private;
+	u64 block;
+
+	for (block = io->block; block < io->block + io->count; ++block) {
+		int ret = dm_bht_populate(&vc->bht, io, block);
+
+		if (ret < 0) {
+			/* verity_dec_pending will handle the error case. */
+			io->error = ret;
+			break;
+		}
+	}
+}
+
+/* Asynchronously called upon the completion of I/O issued
+ * from kverityd_src_io_read. verity_dec_pending() acts as
+ * the scheduler/flow manager.
+ */
+static void kverityd_src_io_read_end(struct bio *clone, int error)
+{
+	struct dm_verity_io *io = clone->bi_private;
+
+	if (unlikely(!bio_flagged(clone, BIO_UPTODATE) && !error))
+		error = -EIO;
+
+	if (unlikely(error)) {
+		DMERR("Error occurred: %d (%llu, %u)",
+			error, ULL(clone->bi_sector), clone->bi_size);
+		io->error = error;
+	}
+
+	/* Release the clone; this keeps the block layer from leaving
+	 * offsets, etc. in unexpected states.
+	 */
+	bio_put(clone);
+
+	verity_dec_pending(io);
+}
+
+/* If not yet underway, an I/O request will be issued to the vc->dev
+ * device for the data needed. It is cloned to avoid unexpected changes
+ * to the original bio struct.
+ */
+static void kverityd_src_io_read(struct dm_verity_io *io)
+{
+	struct bio *clone;
+
+	/* Check if the read is already issued. */
+	if (io->flags & VERITY_IOFLAGS_CLONED)
+		return;
+
+	io->flags |= VERITY_IOFLAGS_CLONED;
+
+	/* Clone the bio. The block layer may modify the bvec array. */
+	clone = verity_bio_clone(io);
+	if (unlikely(!clone)) {
+		io->error = -ENOMEM;
+		return;
+	}
+
+	verity_inc_pending(io);
+
+	generic_make_request(clone);
+}
+
+/* kverityd_io services the I/O workqueue. For each pass through
+ * the I/O workqueue, a call to populate both the origin drive
+ * data and the hash tree data is made.
+ */
+static void kverityd_io(struct work_struct *work)
+{
+	struct delayed_work *dwork = container_of(work, struct delayed_work,
+						  work);
+	struct dm_verity_io *io = container_of(dwork, struct dm_verity_io,
+					       work);
+
+	/* Issue requests asynchronously. */
+	verity_inc_pending(io);
+	kverityd_src_io_read(io);
+	kverityd_io_bht_populate(io);
+	verity_dec_pending(io);
+}
+
+/* Paired with verity_dec_pending, the pending value in the io dictates the
+ * lifetime of a request and when it is ready to be processed on the
+ * workqueues.
+ */
+static void verity_inc_pending(struct dm_verity_io *io)
+{
+	atomic_inc(&io->pending);
+}
+
+/* Block-level requests start here. */
+static int verity_map(struct dm_target *ti, struct bio *bio,
+		      union map_info *map_context)
+{
+	struct dm_verity_io *io;
+	struct verity_config *vc;
+	struct request_queue *r_queue;
+
+	if (unlikely(!ti)) {
+		DMERR("dm_target was NULL");
+		return -EIO;
+	}
+
+	vc = ti->private;
+	r_queue = bdev_get_queue(vc->dev->bdev);
+
+	if (bio_data_dir(bio) == WRITE) {
+		/* If we silently drop writes, then the VFS layer will cache
+		 * the write and persist it in memory. While it doesn't change
+		 * the underlying storage, it still may be contrary to the
+		 * behavior expected by a verified, read-only device.
+		 */
+		DMWARN_LIMIT("write request received. rejecting with -EIO.");
+		verity_error(vc, NULL, -EIO);
+		return -EIO;
+	} else {
+		/* Queue up the request to be verified */
+		io = verity_io_alloc(ti, bio);
+		if (!io) {
+			DMERR_LIMIT("Failed to allocate and init IO data");
+			return DM_MAPIO_REQUEUE;
+		}
+		INIT_DELAYED_WORK(&io->work, kverityd_io);
+		queue_delayed_work(kverityd_ioq, &io->work, 0);
+	}
+
+	return DM_MAPIO_SUBMITTED;
+}
+
+static void splitarg(char *arg, char **key, char **val)
+{
+	*key = strsep(&arg, "=");
+	*val = strsep(&arg, "");
+}
+
+/*
+ * Non-block interfaces and device-mapper specific code
+ */
+
+/**
+ * verity_ctr - Construct a verified mapping
+ * @ti:   Target being created
+ * @argc: Number of elements in argv
+ * @argv: Vector of key-value pairs (see below).
+ *
+ * Accepts the following keys:
+ * @payload:        hashed device
+ * @hashtree:       device hashtree is stored on
+ * @hashstart:      start address of hashes (default 0)
+ * @block_size:     size of a hash block
+ * @alg:            hash algorithm
+ * @root_hexdigest: toplevel hash of the tree
+ * @error_behavior: what to do when verification fails [optional]
+ * @salt:           salt, in hex [optional]
+ *
+ * E.g.,
+ * payload=/dev/sda2 hashtree=/dev/sda3 alg=sha256
+ * root_hexdigest=f08aa4a3695290c569eb1b0ac032ae1040150afb527abbeb0a3da33d82fb2c6e
+ *
+ * TODO(wad):
+ * - Boot time addition
+ * - Track block verification to free block_hashes if memory use is a concern
+ * Testing needed:
+ * - Regular slub_debug tracing (on checkins)
+ * - Improper block hash padding
+ * - Improper bundle padding
+ * - Improper hash layout
+ * - Missing padding at end of device
+ * - Improperly sized underlying devices
+ * - Out of memory conditions (make sure this isn't too flaky under high load!)
+ * - Incorrect superhash
+ * - Incorrect block hashes
+ * - Incorrect bundle hashes
+ * - Boot-up read speed; sustained read speeds
+ */
+static int verity_ctr(struct dm_target *ti, unsigned int argc, char **argv)
+{
+	struct verity_config *vc = NULL;
+	int ret = 0;
+	sector_t blocks;
+	unsigned int block_size = VERITY_DEFAULT_BLOCK_SIZE;
+	const char *payload = NULL;
+	const char *hashtree = NULL;
+	unsigned long hashstart = 0;
+	const char *alg = NULL;
+	const char *root_hexdigest = NULL;
+	const char *dev_error_behavior = error_behavior;
+	const char *hexsalt = "";
+	int i;
+
+	for (i = 0; i < argc; ++i) {
+		char *key, *val;
+		DMWARN("Argument %d: '%s'", i, argv[i]);
+		splitarg(argv[i], &key, &val);
+		if (!key) {
+			DMWARN("Bad argument %d: missing key?", i);
+			break;
+		}
+		if (!val) {
+			DMWARN("Bad argument %d='%s': missing value", i, key);
+			break;
+		}
+
+		if (!strcmp(key, "alg")) {
+			alg = val;
+		} else if (!strcmp(key, "payload")) {
+			payload = val;
+		} else if (!strcmp(key, "hashtree")) {
+			hashtree = val;
+		} else if (!strcmp(key, "root_hexdigest")) {
+			root_hexdigest = val;
+		} else if (!strcmp(key, "hashstart")) {
+			if (strict_strtoul(val, 10, &hashstart)) {
+				ti->error = "Invalid hashstart";
+				return -EINVAL;
+			}
+		} else if (!strcmp(key, "block_size")) {
+			unsigned long tmp;
+			if (strict_strtoul(val, 10, &tmp) ||
+			    (tmp > UINT_MAX)) {
+				ti->error = "Invalid block_size";
+				return -EINVAL;
+			}
+			block_size = (unsigned int)tmp;
+		} else if (!strcmp(key, "error_behavior")) {
+			dev_error_behavior = val;
+		} else if (!strcmp(key, "salt")) {
+			hexsalt = val;
+		}
+	}
+
+#define NEEDARG(n) \
+	if (!(n)) { \
+		ti->error = "Missing argument: " #n; \
+		return -EINVAL; \
+	}
+
+	NEEDARG(alg);
+	NEEDARG(payload);
+	NEEDARG(hashtree);
+	NEEDARG(root_hexdigest);
+
+#undef NEEDARG
+
+	/* The device mapper device should be set up read-only */
+	if ((dm_table_get_mode(ti->table) & ~FMODE_READ) != 0) {
+		ti->error = "Must be created readonly.";
+		return -EINVAL;
+	}
+
+	vc = kzalloc(sizeof(*vc), GFP_KERNEL);
+	if (!vc) {
+		/* TODO(wad) if this is called from the setup helper, then we
+		 * catch these errors and do a CrOS specific thing. if not, we
+		 * need to have this call the error handler.
+		 */
+		return -EINVAL;
+	}
+
+	/* Calculate the blocks from the given device size */
+	vc->size = ti->len;
+	blocks = to_bytes(vc->size) / block_size;
+	if (dm_bht_create(&vc->bht, blocks, block_size, alg)) {
+		DMERR("failed to create required bht");
+		goto bad_bht;
+	}
+	if (dm_bht_set_root_hexdigest(&vc->bht, root_hexdigest)) {
+		DMERR("root hexdigest error");
+		goto bad_root_hexdigest;
+	}
+	dm_bht_set_salt(&vc->bht, hexsalt);
+	vc->bht.read_cb = kverityd_bht_read_callback;
+
+	/* payload: device to verify */
+	vc->start = 0;  /* TODO: should this support a starting offset? */
+	/* We only ever grab the device in read-only mode. */
+	ret = dm_get_device(ti, payload,
+			    dm_table_get_mode(ti->table), &vc->dev);
+	if (ret) {
+		DMERR("Failed to acquire device '%s': %d", payload, ret);
+		ti->error = "Device lookup failed";
+		goto bad_verity_dev;
+	}
+
+	if ((to_bytes(vc->start) % block_size) ||
+	    (to_bytes(vc->size) % block_size)) {
+		ti->error = "Device must be block_size divisible/aligned";
+		goto bad_hash_start;
+	}
+
+	vc->hash_start = (sector_t)hashstart;
+
+	/* hashtree: device with hashes.
+	 * Note, payload == hashtree is okay as long as the size of
+	 *       ti->len passed to device mapper does not include
+	 *       the hashes.
+	 */
+	if (dm_get_device(ti, hashtree,
+			  dm_table_get_mode(ti->table), &vc->hash_dev)) {
+		ti->error = "Hash device lookup failed";
+		goto bad_hash_dev;
+	}
+
+	/* alg: cryptographic digest algorithm */
+	if (snprintf(vc->hash_alg, CRYPTO_MAX_ALG_NAME, "%s", alg) >=
+	    CRYPTO_MAX_ALG_NAME) {
+		ti->error = "Hash algorithm name is too long";
+		goto bad_hash;
+	}
+
+	/* override with optional device-specific error behavior */
+	vc->error_behavior = verity_parse_error_behavior(dev_error_behavior);
+	if (vc->error_behavior == -1) {
+		ti->error = "Bad error_behavior supplied";
+		goto bad_err_behavior;
+	}
+
+	/* TODO: Maybe issue a request on the io queue for block 0? */
+
+	/* Argument processing is done, setup operational data */
+	/* Pool for dm_verity_io objects */
+	vc->io_pool = mempool_create_slab_pool(MIN_IOS, _verity_io_pool);
+	if (!vc->io_pool) {
+		ti->error = "Cannot allocate verity io mempool";
+		goto bad_slab_pool;
+	}
+
+	/* Allocate the bioset used for request padding */
+	/* TODO(wad) allocate a separate bioset for the first verify maybe */
+	vc->bs = bioset_create(MIN_BIOS, 0);
+	if (!vc->bs) {
+		ti->error = "Cannot allocate verity bioset";
+		goto bad_bs;
+	}
+
+	ti->num_flush_requests = 1;
+	ti->private = vc;
+
+	/* TODO(wad) add device and hash device names */
+	{
+		char hashdev[BDEVNAME_SIZE], vdev[BDEVNAME_SIZE];
+		bdevname(vc->hash_dev->bdev, hashdev);
+		bdevname(vc->dev->bdev, vdev);
+		DMINFO("dev:%s hash:%s [sectors:%llu blocks:%llu]", vdev,
+		       hashdev, ULL(vc->bht.sectors), ULL(blocks));
+	}
+	return 0;
+
+bad_bs:
+	mempool_destroy(vc->io_pool);
+bad_slab_pool:
+bad_err_behavior:
+bad_hash:
+	dm_put_device(ti, vc->hash_dev);
+bad_hash_dev:
+bad_hash_start:
+	dm_put_device(ti, vc->dev);
+bad_verity_dev:
+bad_root_hexdigest:
+	/* The bht was fully created by this point; free its allocations. */
+	dm_bht_destroy(&vc->bht);
+bad_bht:
+	kfree(vc);   /* hash is not secret so no need to zero */
+	return -EINVAL;
+}
+
+static void verity_dtr(struct dm_target *ti)
+{
+	struct verity_config *vc = (struct verity_config *) ti->private;
+
+	bioset_free(vc->bs);
+	mempool_destroy(vc->io_pool);
+	dm_bht_destroy(&vc->bht);
+	dm_put_device(ti, vc->hash_dev);
+	dm_put_device(ti, vc->dev);
+	kfree(vc);
+}
+
+static int verity_status(struct dm_target *ti, status_type_t type,
+			char *result, unsigned int maxlen)
+{
+	struct verity_config *vc = (struct verity_config *) ti->private;
+	unsigned int sz = 0;
+	char hashdev[BDEVNAME_SIZE], vdev[BDEVNAME_SIZE];
+	u8 hexdigest[VERITY_MAX_DIGEST_SIZE * 2 + 1] = { 0 };
+
+	dm_bht_root_hexdigest(&vc->bht, hexdigest, sizeof(hexdigest));
+
+	switch (type) {
+	case STATUSTYPE_INFO:
+		break;
+	case STATUSTYPE_TABLE:
+		bdevname(vc->hash_dev->bdev, hashdev);
+		bdevname(vc->dev->bdev, vdev);
+		DMEMIT("/dev/%s /dev/%s %llu %u %s %s",
+			vdev,
+			hashdev,
+			ULL(vc->hash_start),
+			vc->bht.depth,
+			vc->hash_alg,
+			hexdigest);
+		break;
+	}
+	return 0;
+}
+
+static int verity_merge(struct dm_target *ti, struct bvec_merge_data *bvm,
+		       struct bio_vec *biovec, int max_size)
+{
+	struct verity_config *vc = ti->private;
+	struct request_queue *q = bdev_get_queue(vc->dev->bdev);
+
+	if (!q->merge_bvec_fn)
+		return max_size;
+
+	bvm->bi_bdev = vc->dev->bdev;
+	bvm->bi_sector = vc->start + bvm->bi_sector - ti->begin;
+
+	/* Optionally, this could just return 0 to stick to single pages. */
+	return min(max_size, q->merge_bvec_fn(q, bvm, biovec));
+}
+
+static int verity_iterate_devices(struct dm_target *ti,
+				 iterate_devices_callout_fn fn, void *data)
+{
+	struct verity_config *vc = ti->private;
+
+	return fn(ti, vc->dev, vc->start, ti->len, data);
+}
+
+static void verity_io_hints(struct dm_target *ti,
+			    struct queue_limits *limits)
+{
+	struct verity_config *vc = ti->private;
+	unsigned int block_size = vc->bht.block_size;
+
+	limits->logical_block_size = block_size;
+	limits->physical_block_size = block_size;
+	blk_limits_io_min(limits, block_size);
+}
+
+static struct target_type verity_target = {
+	.name   = "verity",
+	.version = {0, 1, 0},
+	.module = THIS_MODULE,
+	.ctr    = verity_ctr,
+	.dtr    = verity_dtr,
+	.map    = verity_map,
+	.merge  = verity_merge,
+	.status = verity_status,
+	.iterate_devices = verity_iterate_devices,
+	.io_hints = verity_io_hints,
+};
+
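+/* The verify work is CPU-bound (hashing), so the workqueues are created
+ * CPU-intensive and high-priority; presumably this keeps boot-critical
+ * root-filesystem reads low-latency.
+ */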
+#define VERITY_WQ_FLAGS (WQ_CPU_INTENSIVE|WQ_HIGHPRI)
+
+static int __init dm_verity_init(void)
+{
+	int r = -ENOMEM;
+
+	_verity_io_pool = KMEM_CACHE(dm_verity_io, 0);
+	if (!_verity_io_pool) {
+		DMERR("failed to allocate pool dm_verity_io");
+		goto bad_io_pool;
+	}
+
+	kverityd_ioq = alloc_workqueue("kverityd_io", VERITY_WQ_FLAGS, 1);
+	if (!kverityd_ioq) {
+		DMERR("failed to create workqueue kverityd_ioq");
+		goto bad_io_queue;
+	}
+
+	kveritydq = alloc_workqueue("kverityd", VERITY_WQ_FLAGS, 1);
+	if (!kveritydq) {
+		DMERR("failed to create workqueue kveritydq");
+		goto bad_verify_queue;
+	}
+
+	r = dm_register_target(&verity_target);
+	if (r < 0) {
+		DMERR("register failed %d", r);
+		goto register_failed;
+	}
+
+	DMINFO("version %u.%u.%u loaded", verity_target.version[0],
+	       verity_target.version[1], verity_target.version[2]);
+
+	return r;
+
+register_failed:
+	destroy_workqueue(kveritydq);
+bad_verify_queue:
+	destroy_workqueue(kverityd_ioq);
+bad_io_queue:
+	kmem_cache_destroy(_verity_io_pool);
+bad_io_pool:
+	return r;
+}
+
+static void __exit dm_verity_exit(void)
+{
+	destroy_workqueue(kveritydq);
+	destroy_workqueue(kverityd_ioq);
+
+	dm_unregister_target(&verity_target);
+	kmem_cache_destroy(_verity_io_pool);
+}
+
+module_init(dm_verity_init);
+module_exit(dm_verity_exit);
+
+MODULE_AUTHOR("The Chromium OS Authors <chromium-os-dev@chromium.org>");
+MODULE_DESCRIPTION(DM_NAME " target for transparent disk integrity checking");
+MODULE_LICENSE("GPL");
diff --git a/drivers/md/dm-verity.h b/drivers/md/dm-verity.h
new file mode 100644
index 0000000..e0664c9
--- /dev/null
+++ b/drivers/md/dm-verity.h
@@ -0,0 +1,45 @@
+/*
+ * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ *                    All Rights Reserved.
+ *
+ * This file is released under the GPLv2.
+ *
+ * Provide error types for use when creating a custom error handler.
+ * See Documentation/device-mapper/dm-verity.txt
+ */
+#ifndef DM_VERITY_H
+#define DM_VERITY_H
+
+#include <linux/notifier.h>
+
+struct dm_verity_error_state {
+	int code;
+	int transient;  /* Unlikely to recur after a reboot */
+	u64 block;
+	const char *message;
+
+	sector_t dev_start;
+	sector_t dev_len;
+	struct block_device *dev;
+
+	sector_t hash_dev_start;
+	sector_t hash_dev_len;
+	struct block_device *hash_dev;
+
+	/* Final behavior after all notifications are completed. */
+	int behavior;
+};
+
+/* This enum must be matched to allowed_error_behaviors in dm-verity.c */
+enum dm_verity_error_behavior {
+	DM_VERITY_ERROR_BEHAVIOR_EIO = 0,
+	DM_VERITY_ERROR_BEHAVIOR_PANIC,
+	DM_VERITY_ERROR_BEHAVIOR_NONE,
+	DM_VERITY_ERROR_BEHAVIOR_NOTIFY
+};
+
+int dm_verity_register_error_notifier(struct notifier_block *nb);
+int dm_verity_unregister_error_notifier(struct notifier_block *nb);
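+
+/* Illustrative sketch of a notify-mode consumer; the callback and
+ * notifier_block names are hypothetical, and this assumes the notifier
+ * chain passes a struct dm_verity_error_state as the notifier data:
+ *
+ *	static int example_verity_cb(struct notifier_block *nb,
+ *				     unsigned long code, void *param)
+ *	{
+ *		struct dm_verity_error_state *err = param;
+ *
+ *		pr_err("verity error at block %llu: %s\n",
+ *		       (unsigned long long)err->block, err->message);
+ *		err->behavior = DM_VERITY_ERROR_BEHAVIOR_PANIC;
+ *		return NOTIFY_STOP;
+ *	}
+ *
+ *	static struct notifier_block example_verity_nb = {
+ *		.notifier_call = example_verity_cb,
+ *	};
+ *
+ * and, at module init time:
+ *
+ *	dm_verity_register_error_notifier(&example_verity_nb);
+ */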
+
+#endif  /* DM_VERITY_H */
diff --git a/include/linux/dm-bht.h b/include/linux/dm-bht.h
new file mode 100644
index 0000000..0595911
--- /dev/null
+++ b/include/linux/dm-bht.h
@@ -0,0 +1,166 @@
+/*
+ * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ *
+ * Device-Mapper block hash tree interface.
+ * See Documentation/device-mapper/dm-bht.txt for details.
+ *
+ * This file is released under the GPLv2.
+ */
+#ifndef __LINUX_DM_BHT_H
+#define __LINUX_DM_BHT_H
+
+#include <linux/compiler.h>
+#include <linux/crypto.h>
+#include <linux/types.h>
+
+/* To avoid allocating memory based on the digest size, we just set up
+ * a maximum to use for now.
+ */
+#define DM_BHT_MAX_DIGEST_SIZE 128  /* 1024-bit hashes are unlikely for now */
+#define DM_BHT_SALT_SIZE       32   /* 256 bits of salt is a lot */
+
+/* UNALLOCATED, PENDING, READY, and VERIFIED are valid states. All other
+ * values are entry-related return codes.
+ */
+#define DM_BHT_ENTRY_VERIFIED 8  /* 'nodes' has been checked against parent */
+#define DM_BHT_ENTRY_READY 4  /* 'nodes' is loaded and available */
+#define DM_BHT_ENTRY_PENDING 2  /* 'nodes' is being loaded */
+#define DM_BHT_ENTRY_UNALLOCATED 0 /* untouched */
+#define DM_BHT_ENTRY_ERROR -1 /* entry is unsuitable for use */
+#define DM_BHT_ENTRY_ERROR_IO -2 /* I/O error on load */
+
+/* Additional possible return codes */
+#define DM_BHT_ENTRY_ERROR_MISMATCH -3 /* Digest mismatch */
+
+/* dm_bht_entry
+ * Contains dm_bht->node_count tree nodes at a given tree depth.
+ * state is used to transactionally assure that data is paged in
+ * from disk.  Since dm_bht does not keep a running crypto context for
+ * each level, the data must be loaded on demand for verification.
+ */
+struct dm_bht_entry {
+	atomic_t state; /* see defines */
+	/* Keeping an extra pointer per entry wastes up to ~33k of
+	 * memory if 1M blocks are used (or ~66k on a 64-bit arch)
+	 */
+	void *io_context;  /* Reserve a pointer for use during io */
+	/* nodes should only be non-NULL if fully populated. */
+	void *nodes;  /* The hash data used to verify the children.
+		       * Guaranteed to be page-aligned.
+		       */
+};
+
+/* dm_bht_level
+ * Contains an array of entries which represent a page of hashes where
+ * each hash is a node in the tree at the given tree depth/level.
+ */
+struct dm_bht_level {
+	struct dm_bht_entry *entries;  /* array of entries of tree nodes */
+	unsigned int count;  /* number of entries at this level */
+	sector_t sector;  /* starting sector for this level */
+};
+
+/* opaque context, start sector, destination buffer, sector count, entry */
+typedef int (*dm_bht_callback)(void *,  /* external context */
+			      sector_t,  /* start sector */
+			      u8 *,  /* destination page */
+			      sector_t,  /* num sectors */
+			      struct dm_bht_entry *);
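+
+/* Illustrative sketch of a read_cb implementation; names are hypothetical,
+ * to_bytes() is assumed from linux/device-mapper.h, and the hash data is
+ * assumed to be already in memory.  A real callback, like the verity
+ * target's, would issue asynchronous block I/O and call
+ * dm_bht_read_completed() from its completion path:
+ *
+ *	static int example_read_cb(void *ctx, sector_t start, u8 *dst,
+ *				   sector_t count, struct dm_bht_entry *entry)
+ *	{
+ *		u8 *hash_data = ctx;
+ *
+ *		memcpy(dst, hash_data + to_bytes(start), to_bytes(count));
+ *		dm_bht_read_completed(entry, 0);
+ *		return 0;
+ *	}
+ */
+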
+/* dm_bht - Device mapper block hash tree
+ * dm_bht provides a fixed interface for comparing data blocks
+ * against cryptographic hashes stored in a hash tree.  It
+ * optimizes the tree structure for storage on disk.
+ *
+ * The tree is built from the bottom up.  A collection of data,
+ * external to the tree, is hashed and these hashes are stored
+ * as the blocks in the tree.  For some number of these hashes,
+ * a parent node is created by hashing them.  These steps are
+ * repeated until only a single root hash remains.
+ *
+ * TODO(wad): All hash storage memory is pre-allocated and freed once an
+ * entire branch has been verified.
+ */
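+/* Worked example (illustrative): with 4 KiB blocks and a 32-byte digest
+ * (e.g. sha256), each entry holds node_count = 4096 / 32 = 128 hashes,
+ * so node_count_shift = 7.  A 4 GiB device has 2^20 blocks, needing
+ * 2^20 / 128 = 8192 leaf entries, 64 entries above that, then 1, i.e.
+ * three hash levels below the root.
+ */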
+struct dm_bht {
+	/* Configured values */
+	int depth;  /* Depth of the tree including the root */
+	unsigned int block_count;  /* Number of blocks hashed */
+	unsigned int block_size;  /* Size of a hash block */
+	char hash_alg[CRYPTO_MAX_ALG_NAME];
+	unsigned char salt[DM_BHT_SALT_SIZE];
+
+	/* Computed values */
+	unsigned int node_count;  /* Data size (in hashes) for each entry */
+	unsigned int node_count_shift;  /* ffs(node_count) - 1, i.e. log2 */
+	/* One per CPU so that verification can run concurrently. */
+	struct hash_desc hash_desc[NR_CPUS];  /* Container for the hash alg */
+	unsigned int digest_size;
+	sector_t sectors;  /* Number of disk sectors used */
+
+	/* bool verified;  Full tree is verified */
+	u8 root_digest[DM_BHT_MAX_DIGEST_SIZE];
+	struct dm_bht_level *levels;  /* in reverse order */
+	/* Callback for reading from the hash device */
+	dm_bht_callback read_cb;
+};
+
+/* Constructor for struct dm_bht instances. */
+int dm_bht_create(struct dm_bht *bht,
+		  unsigned int block_count,
+		  unsigned int block_size,
+		  const char *alg_name);
+/* Destructor for struct dm_bht instances.  Does not free @bht */
+void dm_bht_destroy(struct dm_bht *bht);
+
+/* Basic accessors for struct dm_bht */
+int dm_bht_set_root_hexdigest(struct dm_bht *bht, const u8 *hexdigest);
+int dm_bht_root_hexdigest(struct dm_bht *bht, u8 *hexdigest, int available);
+void dm_bht_set_salt(struct dm_bht *bht, const char *hexsalt);
+int dm_bht_salt(struct dm_bht *bht, char *hexsalt);
+
+/* Functions for loading in data from disk for verification */
+bool dm_bht_is_populated(struct dm_bht *bht, unsigned int block);
+int dm_bht_populate(struct dm_bht *bht, void *read_cb_ctx,
+		    unsigned int block);
+int dm_bht_verify_block(struct dm_bht *bht, unsigned int block,
+			struct page *pg, unsigned int offset);
+void dm_bht_read_completed(struct dm_bht_entry *entry, int status);
+
+/* Functions for converting indices to nodes. */
+
+static inline unsigned int dm_bht_get_level_shift(struct dm_bht *bht,
+						  int depth)
+{
+	return (bht->depth - depth) * bht->node_count_shift;
+}
+
+/* At the given depth, this returns the entry index.  Called with depth+1,
+ * it returns the node index within the entry at depth.
+ */
+static inline unsigned int dm_bht_index_at_level(struct dm_bht *bht,
+							int depth,
+							unsigned int leaf)
+{
+	return leaf >> dm_bht_get_level_shift(bht, depth);
+}
+
+static inline struct dm_bht_entry *dm_bht_get_entry(struct dm_bht *bht,
+						    int depth,
+						    unsigned int block)
+{
+	unsigned int index = dm_bht_index_at_level(bht, depth, block);
+	struct dm_bht_level *level = &bht->levels[depth];
+
+	return &level->entries[index];
+}
+
+static inline void *dm_bht_get_node(struct dm_bht *bht,
+				  struct dm_bht_entry *entry,
+				  int depth,
+				  unsigned int block)
+{
+	unsigned int index = dm_bht_index_at_level(bht, depth, block);
+	unsigned int node_index = index % bht->node_count;
+
+	return entry->nodes + (node_index * bht->digest_size);
+}
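+
+/* Illustrative lookup, mirroring the entry/node comment above: the leaf
+ * hash for a hypothetical data block `blk' lives in the entry at
+ * depth - 1, indexed as a node with depth:
+ *
+ *	struct dm_bht_entry *e = dm_bht_get_entry(bht, bht->depth - 1, blk);
+ *	u8 *digest = dm_bht_get_node(bht, e, bht->depth, blk);
+ */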
+#endif  /* __LINUX_DM_BHT_H */
-- 
1.7.3.1

