* [RFC PATCH 0/4] Data integrity protection with dm-integrity and dm-crypt
@ 2017-01-04 19:23 Milan Broz
  2017-01-04 19:23 ` [RFC PATCH 1/4] dm-table: Add flag to allow own target handling of integrity metadata Milan Broz
                   ` (11 more replies)
  0 siblings, 12 replies; 14+ messages in thread
From: Milan Broz @ 2017-01-04 19:23 UTC (permalink / raw)
  To: dm-devel; +Cc: Milan Broz

Hello,

the following patches are the first iteration of our work on providing
data integrity protection at the block device level.

This cover letter is only a brief introduction to the motivation for this
work; a more rigorous description will come later in the form of a paper.

Our main goal is to provide an option to implement strong data integrity
protection at the low block device level, without the need for any
specialized hardware - just as an extension of the device-mapper
infrastructure.

The solution is meant to be used over a standard commercial off-the-shelf
device. N.B. we are neither replacing nor disputing the need to implement
data integrity protection at a higher level. But the state of higher-level
integrity protection in Linux is generally sad, and we believe that
this target can provide at least something to the rescue :)

My special focus (and my academic motivation) is to provide cryptographically
strong data protection in combination with full disk encryption algorithms.

In principle, so-called authenticated encryption guarantees that data
were not tampered with, so only an authorised user can decrypt, change or
verify data integrity.
N.B. Current full disk encryption algorithms provide only confidentiality;
they do NOT provide any data integrity protection (because of their
length-preserving nature).
Corrupted data cannot be easily detected (beyond noticing that the
decrypted data seem to be garbage).

I. dm-integrity - a "simulated" data integrity profile
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Any integrity protection requires additional space to store data protection
tags. Because we are operating at the sector level, we need additional
per-sector metadata for any integrity (or other) tags.

The attached patches include a new dm-integrity target that provides the
basic logic to store additional per-sector metadata.

On top of it we can provide data integrity protection, either inside
dm-integrity itself or in a stacked device like dm-crypt.

The basic idea of dm-integrity is to "simulate" a DIF/IP profile over
a normal block device and store the per-sector metadata in interleaved
metadata sectors. It uses the existing implementation of integrity profile
metadata already present in the kernel, just registering its own
configurable profile.
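
For illustration, here is a minimal sketch (not code from the patches;
the helper and values are made up) of what registering such a
pass-through profile looks like with the block-layer integrity API:

  #include <linux/blkdev.h>

  /* No generate_fn/verify_fn: the tags just pass through, to be
   * filled in and checked by whatever layer sits above. */
  static const struct blk_integrity_profile example_profile = {
          .name        = "EXAMPLE-EXT-TAG",
          .generate_fn = NULL,
          .verify_fn   = NULL,
  };

  static void example_register(struct gendisk *disk, unsigned char tag_size)
  {
          struct blk_integrity bi = {
                  .profile      = &example_profile,
                  .tuple_size   = tag_size, /* metadata bytes per sector */
                  .tag_size     = tag_size, /* all usable as tag space */
                  .interval_exp = 9,        /* one tuple per 512-byte sector */
          };

          blk_integrity_register(disk, &bi);
  }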

To provide atomic updates of data + metadata sectors, dm-integrity also
implements an optional data journal.

The dm-integrity target can operate in two modes:
    * it can use the metadata itself and calculate a non-cryptographic data
      integrity checksum (for example a CRC or hash-based checksum)

    * it can receive bio structures with attached integrity fields and
      just provide a handler for this metadata (read/store it to the
      metadata sectors). It is up to the higher target how the metadata
      is used.

The second mode is used for the dm-crypt authenticated encryption extension.
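
For an illustration of this second mode, here is a hedged sketch
(modelled on the bio integrity API; names starting with example_ are
made up) of how a stacked target could attach its own tag buffer to a
bio before passing it down:

  #include <linux/bio.h>
  #include <linux/mm.h>

  /* Attach a caller-owned buffer of per-sector tags to a bio, so the
   * lower dm-integrity device just stores or loads it. */
  static int example_attach_tags(struct bio *bio, void *tags,
                                 unsigned tag_len)
  {
          struct bio_integrity_payload *bip;

          bip = bio_integrity_alloc(bio, GFP_NOIO, 1);
          if (IS_ERR(bip))
                  return PTR_ERR(bip);

          bip->bip_iter.bi_size = tag_len;
          bip->bip_iter.bi_sector = bio->bi_iter.bi_sector;

          /* bio_integrity_add_page() returns the length it added */
          if (bio_integrity_add_page(bio, virt_to_page(tags), tag_len,
                                     offset_in_page(tags)) != tag_len)
                  return -ENOMEM;
          return 0;
  }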

II. dm-crypt and authenticated encryption
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The second part adds the possibility to use Authenticated Encryption with
Additional Data (AEAD) algorithms in dm-crypt.

These algorithms encrypt data and also produce an additional authentication
tag that provides cryptographically sound integrity protection.
There can also be additional data that are only authenticated but not
encrypted; in disk encryption this is for example the sector number (so
the sector fails integrity validation if the ciphertext sector is
relocated).

This authentication tag is then stored in the metadata space provided
by the underlying dm-integrity target. On top of this we can also store
other metadata, like a randomly generated Initialization Vector (IV).
(This could increase the security of the disk-encryption model. Basically,
if the IV is randomly regenerated on every write, the ciphertext sector
changes pseudo-randomly on every write, even when the same data are
written. This is the best achievable security scenario for disk
encryption.)

In principle we can use any authenticated mode provided by the kernel
crypto API (but not all authenticated modes are secure in this
environment; this applies mainly to the standard GCM mode).

We can also compose an AEAD mode from the existing disk encryption mode
(XTS) combined with a keyed hash (HMAC), such as HMAC(SHA256).
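
With the kernel crypto API, this composition can be expressed through
the generic authenc() template; a minimal sketch (error handling and
key setup omitted):

  #include <crypto/aead.h>

  /* One AEAD transform: XTS(AES) for encryption, HMAC(SHA256) for
   * the authentication tag. */
  static struct crypto_aead *example_alloc_tfm(void)
  {
          return crypto_alloc_aead("authenc(hmac(sha256),xts(aes))", 0, 0);
  }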

For illustration, the dm-crypt authenticated encryption uses the following format:

 |-ADDITIONAL DATA-|------ DATA -------|-- AUTH TAG --| 
 | (authenticated) | (auth+encryption) |              |
 | sector_LE |  IV |  sector in/out    |  tag in/out  |

Sector_LE is the sector number in little-endian; the IV is authenticated as well.

The logic is in principle compatible with the IEEE 1619.1 standard recommendation.
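
For concreteness, here is a hypothetical C view of the same per-sector
layout (field sizes are examples only - IV and tag sizes depend on the
chosen cipher, and in the real request these are separate scatterlist
segments, not one contiguous struct):

  struct example_sector_layout {
          __le64 sector;       /* AAD: authenticated only */
          __u8   iv[16];       /* AAD: authenticated only */
          __u8   data[512];    /* authenticated and encrypted, in/out */
          __u8   auth_tag[32]; /* e.g. HMAC(SHA256) output, stored in
                                  the dm-integrity per-sector metadata */
  };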

~~~~~~~~~~~~~~~~~~~

For more information please see the individual patches.

The price for this solution is obviously storage space and performance.
Our solution can be quite easily extended with other information stored in
the metadata, for example Forward Error Correction (FEC) codes.
(I plan to experiment with this later.)

I will try to provide some testing examples soon (we already have
experimental support in LUKS userspace, based on the experimental LUKS2
format).

(I hope I can show more in a DevConf.cz talk later this January in Brno;
I hope we can discuss it there as well.)

Any constructive feedback welcome!

Thanks,
Milan (and Mikulas)

Mikulas Patocka (2):
  Add sector start offset to dm-bufio interface.
  Add the dm-integrity target

Milan Broz (2):
  dm-table: Add flag to allow own target handling of integrity metadata.
  Add cryptographic data integrity protection (authenticated encryption)
    to dm-crypt.

 Documentation/device-mapper/dm-crypt.txt     |   16 +
 Documentation/device-mapper/dm-integrity.txt |  189 ++
 drivers/md/Kconfig                           |   10 +
 drivers/md/Makefile                          |    1 +
 drivers/md/dm-bufio.c                        |   51 +-
 drivers/md/dm-bufio.h                        |    7 +
 drivers/md/dm-crypt.c                        |  857 ++++++--
 drivers/md/dm-integrity.c                    | 3031 ++++++++++++++++++++++++++
 drivers/md/dm-table.c                        |   11 +
 include/linux/device-mapper.h                |    6 +
 10 files changed, 4029 insertions(+), 150 deletions(-)
 create mode 100644 Documentation/device-mapper/dm-integrity.txt
 create mode 100644 drivers/md/dm-integrity.c

-- 
2.11.0


* [RFC PATCH 1/4] dm-table: Add flag to allow own target handling of integrity metadata.
  2017-01-04 19:23 [RFC PATCH 0/4] Data integrity protection with dm-integrity and dm-crypt Milan Broz
@ 2017-01-04 19:23 ` Milan Broz
  2017-01-04 19:23 ` [RFC PATCH 2/4] Add sector start offset to dm-bufio interface Milan Broz
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Milan Broz @ 2017-01-04 19:23 UTC (permalink / raw)
  To: dm-devel; +Cc: Milan Broz

This patch adds a DM_TARGET_INTEGRITY flag that specifies that
bio integrity metadata is not inherited but is implemented by the target
itself.
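
For illustration (not part of this patch), a target would opt in by
setting the flag in its target_type, roughly like this hypothetical
target (ctr/dtr/map hooks omitted):

  /* A made-up target that handles integrity metadata itself, so
   * dm-table skips integrity profile inheritance for it. */
  static struct target_type example_target = {
          .name     = "example",
          .version  = {1, 0, 0},
          .features = DM_TARGET_INTEGRITY,
          .module   = THIS_MODULE,
  };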

Signed-off-by: Milan Broz <gmazyland@gmail.com>
---
 drivers/md/dm-table.c         | 11 +++++++++++
 include/linux/device-mapper.h |  6 ++++++
 2 files changed, 17 insertions(+)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 0a427de23ed2..95094c133d6c 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -47,6 +47,7 @@ struct dm_table {
 	bool integrity_supported:1;
 	bool singleton:1;
 	bool all_blk_mq:1;
+	unsigned integrity_added:1;
 
 	/*
 	 * Indicates the rw permissions for the new logical
@@ -725,6 +726,9 @@ int dm_table_add_target(struct dm_table *t, const char *type,
 		t->immutable_target_type = tgt->type;
 	}
 
+	if (dm_target_has_integrity(tgt->type))
+		t->integrity_added = 1;
+
 	tgt->table = t;
 	tgt->begin = start;
 	tgt->len = len;
@@ -1168,6 +1172,10 @@ static int dm_table_register_integrity(struct dm_table *t)
 	struct mapped_device *md = t->md;
 	struct gendisk *template_disk = NULL;
 
+	/* If target handles integrity itself do not register it here. */
+	if (t->integrity_added)
+		return 0;
+
 	template_disk = dm_table_get_integrity_disk(t);
 	if (!template_disk)
 		return 0;
@@ -1394,6 +1402,9 @@ static void dm_table_verify_integrity(struct dm_table *t)
 {
 	struct gendisk *template_disk = NULL;
 
+	if (t->integrity_added)
+		return;
+
 	if (t->integrity_supported) {
 		/*
 		 * Verify that the original integrity profile
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index ef7962e84444..b508e8169133 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -224,6 +224,12 @@ struct target_type {
  */
 typedef unsigned (*dm_num_write_bios_fn) (struct dm_target *ti, struct bio *bio);
 
+/*
+ * A target implements its own bio data integrity.
+ */
+#define DM_TARGET_INTEGRITY		0x00000010
+#define dm_target_has_integrity(type)	((type)->features & DM_TARGET_INTEGRITY)
+
 struct dm_target {
 	struct dm_table *table;
 	struct target_type *type;
-- 
2.11.0


* [RFC PATCH 2/4] Add sector start offset to dm-bufio interface.
  2017-01-04 19:23 [RFC PATCH 0/4] Data integrity protection with dm-integrity and dm-crypt Milan Broz
  2017-01-04 19:23 ` [RFC PATCH 1/4] dm-table: Add flag to allow own target handling of integrity metadata Milan Broz
@ 2017-01-04 19:23 ` Milan Broz
  2017-01-04 19:23 ` [RFC PATCH 3/4] Add the dm-integrity target Milan Broz
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Milan Broz @ 2017-01-04 19:23 UTC (permalink / raw)
  To: dm-devel; +Cc: Mikulas Patocka, Milan Broz

From: Mikulas Patocka <mpatocka@redhat.com>

This patch adds the possibility to set a sector offset for a dm-bufio
client.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Milan Broz <gmazyland@gmail.com>
---
 drivers/md/dm-bufio.c | 51 ++++++++++++++++++++++++++++++++-------------------
 drivers/md/dm-bufio.h |  7 +++++++
 2 files changed, 39 insertions(+), 19 deletions(-)

diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
index 84d2f0e4c754..0105f9196ab3 100644
--- a/drivers/md/dm-bufio.c
+++ b/drivers/md/dm-bufio.c
@@ -109,6 +109,8 @@ struct dm_bufio_client {
 	struct rb_root buffer_tree;
 	wait_queue_head_t free_buffer_wait;
 
+	sector_t start;
+
 	int async_write_error;
 
 	struct list_head client_list;
@@ -556,8 +558,8 @@ static void dmio_complete(unsigned long error, void *context)
 	b->bio.bi_end_io(&b->bio);
 }
 
-static void use_dmio(struct dm_buffer *b, int rw, sector_t block,
-		     bio_end_io_t *end_io)
+static void use_dmio(struct dm_buffer *b, int rw, sector_t sector,
+		     unsigned n_sectors, bio_end_io_t *end_io)
 {
 	int r;
 	struct dm_io_request io_req = {
@@ -569,8 +571,8 @@ static void use_dmio(struct dm_buffer *b, int rw, sector_t block,
 	};
 	struct dm_io_region region = {
 		.bdev = b->c->bdev,
-		.sector = block << b->c->sectors_per_block_bits,
-		.count = b->c->block_size >> SECTOR_SHIFT,
+		.sector = sector,
+		.count = n_sectors,
 	};
 
 	if (b->data_mode != DATA_MODE_VMALLOC) {
@@ -605,14 +607,14 @@ static void inline_endio(struct bio *bio)
 	end_fn(bio);
 }
 
-static void use_inline_bio(struct dm_buffer *b, int rw, sector_t block,
-			   bio_end_io_t *end_io)
+static void use_inline_bio(struct dm_buffer *b, int rw, sector_t sector,
+			   unsigned n_sectors, bio_end_io_t *end_io)
 {
 	char *ptr;
 	int len;
 
 	bio_init(&b->bio, b->bio_vec, DM_BUFIO_INLINE_VECS);
-	b->bio.bi_iter.bi_sector = block << b->c->sectors_per_block_bits;
+	b->bio.bi_iter.bi_sector = sector;
 	b->bio.bi_bdev = b->c->bdev;
 	b->bio.bi_end_io = inline_endio;
 	/*
@@ -627,7 +629,7 @@ static void use_inline_bio(struct dm_buffer *b, int rw, sector_t block,
 	 * If len < PAGE_SIZE the buffer doesn't cross page boundary.
 	 */
 	ptr = b->data;
-	len = b->c->block_size;
+	len = n_sectors << SECTOR_SHIFT;
 
 	if (len >= PAGE_SIZE)
 		BUG_ON((unsigned long)ptr & (PAGE_SIZE - 1));
@@ -639,7 +641,7 @@ static void use_inline_bio(struct dm_buffer *b, int rw, sector_t block,
 				  len < PAGE_SIZE ? len : PAGE_SIZE,
 				  offset_in_page(ptr))) {
 			BUG_ON(b->c->block_size <= PAGE_SIZE);
-			use_dmio(b, rw, block, end_io);
+			use_dmio(b, rw, sector, n_sectors, end_io);
 			return;
 		}
 
@@ -650,17 +652,22 @@ static void use_inline_bio(struct dm_buffer *b, int rw, sector_t block,
 	submit_bio(&b->bio);
 }
 
-static void submit_io(struct dm_buffer *b, int rw, sector_t block,
-		      bio_end_io_t *end_io)
+static void submit_io(struct dm_buffer *b, int rw, bio_end_io_t *end_io)
 {
+	unsigned n_sectors;
+	sector_t sector;
+
 	if (rw == WRITE && b->c->write_callback)
 		b->c->write_callback(b);
 
-	if (b->c->block_size <= DM_BUFIO_INLINE_VECS * PAGE_SIZE &&
+	sector = (b->block << b->c->sectors_per_block_bits) + b->c->start;
+	n_sectors = 1 << b->c->sectors_per_block_bits;
+
+	if (n_sectors <= ((DM_BUFIO_INLINE_VECS * PAGE_SIZE) >> SECTOR_SHIFT) &&
 	    b->data_mode != DATA_MODE_VMALLOC)
-		use_inline_bio(b, rw, block, end_io);
+		use_inline_bio(b, rw, sector, n_sectors, end_io);
 	else
-		use_dmio(b, rw, block, end_io);
+		use_dmio(b, rw, sector, n_sectors, end_io);
 }
 
 /*----------------------------------------------------------------
@@ -712,7 +719,7 @@ static void __write_dirty_buffer(struct dm_buffer *b,
 	wait_on_bit_lock_io(&b->state, B_WRITING, TASK_UNINTERRUPTIBLE);
 
 	if (!write_list)
-		submit_io(b, WRITE, b->block, write_endio);
+		submit_io(b, WRITE, write_endio);
 	else
 		list_add_tail(&b->write_list, write_list);
 }
@@ -725,7 +732,7 @@ static void __flush_write_list(struct list_head *write_list)
 		struct dm_buffer *b =
 			list_entry(write_list->next, struct dm_buffer, write_list);
 		list_del(&b->write_list);
-		submit_io(b, WRITE, b->block, write_endio);
+		submit_io(b, WRITE, write_endio);
 		cond_resched();
 	}
 	blk_finish_plug(&plug);
@@ -1093,7 +1100,7 @@ static void *new_read(struct dm_bufio_client *c, sector_t block,
 		return NULL;
 
 	if (need_submit)
-		submit_io(b, READ, b->block, read_endio);
+		submit_io(b, READ, read_endio);
 
 	wait_on_bit_io(&b->state, B_READING, TASK_UNINTERRUPTIBLE);
 
@@ -1163,7 +1170,7 @@ void dm_bufio_prefetch(struct dm_bufio_client *c,
 			dm_bufio_unlock(c);
 
 			if (need_submit)
-				submit_io(b, READ, b->block, read_endio);
+				submit_io(b, READ, read_endio);
 			dm_bufio_release(b);
 
 			cond_resched();
@@ -1404,7 +1411,7 @@ void dm_bufio_release_move(struct dm_buffer *b, sector_t new_block)
 		old_block = b->block;
 		__unlink_buffer(b);
 		__link_buffer(b, new_block, b->list_mode);
-		submit_io(b, WRITE, new_block, write_endio);
+		submit_io(b, WRITE, write_endio);
 		wait_on_bit_io(&b->state, B_WRITING,
 			       TASK_UNINTERRUPTIBLE);
 		__unlink_buffer(b);
@@ -1761,6 +1768,12 @@ void dm_bufio_client_destroy(struct dm_bufio_client *c)
 }
 EXPORT_SYMBOL_GPL(dm_bufio_client_destroy);
 
+void dm_bufio_set_sector_offset(struct dm_bufio_client *c, sector_t start)
+{
+	c->start = start;
+}
+EXPORT_SYMBOL_GPL(dm_bufio_set_sector_offset);
+
 static unsigned get_max_age_hz(void)
 {
 	unsigned max_age = ACCESS_ONCE(dm_bufio_max_age);
diff --git a/drivers/md/dm-bufio.h b/drivers/md/dm-bufio.h
index c096779a7292..b6d8f53ec15b 100644
--- a/drivers/md/dm-bufio.h
+++ b/drivers/md/dm-bufio.h
@@ -32,6 +32,13 @@ dm_bufio_client_create(struct block_device *bdev, unsigned block_size,
 void dm_bufio_client_destroy(struct dm_bufio_client *c);
 
 /*
+ * Set the sector range.
+ * When this function is called, there must be no I/O in progress on the bufio
+ * client.
+ */
+void dm_bufio_set_sector_offset(struct dm_bufio_client *c, sector_t start);
+
+/*
  * WARNING: to avoid deadlocks, these conditions are observed:
  *
  * - At most one thread can hold at most "reserved_buffers" simultaneously.
-- 
2.11.0


* [RFC PATCH 3/4] Add the dm-integrity target
  2017-01-04 19:23 [RFC PATCH 0/4] Data integrity protection with dm-integrity and dm-crypt Milan Broz
  2017-01-04 19:23 ` [RFC PATCH 1/4] dm-table: Add flag to allow own target handling of integrity metadata Milan Broz
  2017-01-04 19:23 ` [RFC PATCH 2/4] Add sector start offset to dm-bufio interface Milan Broz
@ 2017-01-04 19:23 ` Milan Broz
  2017-01-04 19:23 ` [RFC PATCH 4/4] Add cryptographic data integrity protection (authenticated encryption) to dm-crypt Milan Broz
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Milan Broz @ 2017-01-04 19:23 UTC (permalink / raw)
  To: dm-devel; +Cc: Mikulas Patocka, Milan Broz

From: Mikulas Patocka <mpatocka@redhat.com>

The dm-integrity target emulates a block device that has additional
per-sector tags that can be used for storing integrity information.

A general problem with storing integrity tags with every sector is that
writing the sector and the integrity tag must be atomic - i.e. in case of
crash, either both the sector and integrity tag are written or neither is.

To guarantee write atomicity, the dm-integrity target uses a journal: it
writes sector data and integrity tags into the journal, commits the journal
and then copies the data and integrity tags to their respective locations.

The dm-integrity target can be used with the dm-crypt target - in this
situation the dm-crypt target creates the integrity data and passes it
to the dm-integrity target via a bio_integrity_payload attached to the bio.
In this mode, the dm-crypt and dm-integrity targets provide authenticated
disk encryption - if an attacker modifies the encrypted device, an I/O
error is returned instead of random data.

The dm-integrity target can also be used as a standalone target; in this
mode it calculates and verifies the integrity tag internally. In this
mode, the dm-integrity target can be used to detect silent data
corruption on the disk or in the I/O path.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Milan Broz <gmazyland@gmail.com>
---
 Documentation/device-mapper/dm-integrity.txt |  189 ++
 drivers/md/Kconfig                           |   10 +
 drivers/md/Makefile                          |    1 +
 drivers/md/dm-integrity.c                    | 3031 ++++++++++++++++++++++++++
 4 files changed, 3231 insertions(+)
 create mode 100644 Documentation/device-mapper/dm-integrity.txt
 create mode 100644 drivers/md/dm-integrity.c

diff --git a/Documentation/device-mapper/dm-integrity.txt b/Documentation/device-mapper/dm-integrity.txt
new file mode 100644
index 000000000000..2406f56501dc
--- /dev/null
+++ b/Documentation/device-mapper/dm-integrity.txt
@@ -0,0 +1,189 @@
+The dm-integrity target emulates a block device that has additional
+per-sector tags that can be used for storing integrity information.
+
+A general problem with storing integrity tags with every sector is that
+writing the sector and the integrity tag must be atomic - i.e. in case of
+crash, either both the sector and integrity tag are written or neither is.
+
+To guarantee write atomicity, the dm-integrity target uses a journal: it
+writes sector data and integrity tags into the journal, commits the journal
+and then copies the data and integrity tags to their respective locations.
+
+The dm-integrity target can be used with the dm-crypt target - in this
+situation the dm-crypt target creates the integrity data and passes it
+to the dm-integrity target via a bio_integrity_payload attached to the bio.
+In this mode, the dm-crypt and dm-integrity targets provide authenticated
+disk encryption - if an attacker modifies the encrypted device, an I/O
+error is returned instead of random data.
+
+The dm-integrity target can also be used as a standalone target; in this
+mode it calculates and verifies the integrity tag internally. In this
+mode, the dm-integrity target can be used to detect silent data
+corruption on the disk or in the I/O path.
+
+
+When loading the target for the first time, the kernel driver will format
+the device. But it will only format the device if the superblock contains
+zeroes. If the superblock is neither valid nor zeroed, the dm-integrity
+target can't be loaded.
+
+To use the target for the first time:
+1. overwrite the superblock with zeroes
+2. load the dm-integrity target with a one-sector size; the kernel driver
+	will format the device
+3. unload the dm-integrity target
+4. read the "provided_data_sectors" value from the superblock
+5. load the dm-integrity target with the target size
+	"provided_data_sectors"
+6. if you want to use dm-integrity with dm-crypt, load the dm-crypt target
+	with the size "provided_data_sectors"
+
+
+Target arguments:
+
+1. the underlying block device
+
+2. the number of reserved sectors at the beginning of the device - the
+	dm-integrity target won't read or write these sectors
+
+3. the size of the integrity tag (if "-" is used, the size is taken from
+	the internal-hash algorithm)
+
+4. mode:
+	D - direct writes (without journal) - in this mode, journaling is
+		not used and data sectors and integrity tags are written
+		separately. In case of crash, it is possible that the data
+		and the integrity tag don't match.
+	J - journaled writes - data and integrity tags are written to the
+		journal and atomicity is guaranteed. In case of crash,
+		either both the data and tag are written or neither is. The
+		journaled mode halves write throughput because the data
+		have to be written twice.
+
+5. the number of additional arguments
+
+Additional arguments:
+
+journal-sectors:number
+	The size of the journal. This argument is used only when formatting
+	the device. If the device is already formatted, the value from the
+	superblock is used.
+
+interleave-sectors:number
+	The number of interleaved sectors. This value is rounded down to
+	a power of two. If the device is already formatted, the value from
+	the superblock is used.
+
+buffer-sectors:number
+	The number of sectors in one buffer. The value is rounded down to
+	a power of two.
+
+	The tag area is accessed using buffers; the buffer size is
+	configurable. A larger buffer size means that each I/O will be
+	larger, but fewer I/Os may be issued.
+
+journal-watermark:number
+	The journal watermark in percent. When the size of the journal
+	exceeds this watermark, the thread that flushes the journal will
+	be started.
+
+commit-time:number
+	Commit time in milliseconds. When this time passes, the journal is
+	written. The journal is also written immediately if a FLUSH
+	request is received.
+
+internal-hash:algorithm(:key)	(the key is optional)
+	Use internal hash or crc.
+	When this argument is used, the dm-integrity target won't accept
+	integrity tags from the upper target, but it will automatically
+	generate and verify the integrity tags.
+
+	You can use a crc algorithm (such as crc32); then the integrity
+	target will protect the data against accidental corruption.
+	You can also use an hmac algorithm (for example
+	"hmac(sha256):0123456789abcdef"); in this mode it will provide
+	cryptographic authentication of the data without encryption.
+
+	When this argument is not used, the integrity tags are accepted
+	from an upper layer target, such as dm-crypt. The upper layer
+	target should check the validity of the integrity tags.
+
+journal-crypt:algorithm(:key)	(the key is optional)
+	Encrypt the journal using the given algorithm, to make sure that
+	an attacker can't read the journal. You can use a block cipher here
+	(such as "cbc(aes)") or a stream cipher (for example "chacha20",
+	"salsa20", "ctr(aes)" or "ecb(arc4)").
+
+	The journal contains a history of the last writes to the block
+	device; an attacker reading the journal could see the last sector
+	numbers that were written. From the sector numbers, the attacker
+	can infer the size of files that were written. To protect against
+	this situation, you can encrypt the journal.
+
+journal-mac:algorithm(:key)	(the key is optional)
+	Protect sector numbers in the journal from accidental or malicious
+	modification. To protect against accidental modification, use a
+	crc algorithm; to protect against malicious modification, use an
+	hmac algorithm with a key.
+
+	This option is not needed when using internal-hash because in this
+	mode, the integrity of journal entries is checked when replaying
+	the journal. Thus, a modified sector number would be detected at
+	this stage.
+
+
+The journal mode (D/J), buffer-sectors, journal-watermark, commit-time can
+be changed when reloading the target (load an inactive table and swap the
+tables with suspend and resume). The other arguments should not be changed
+when reloading the target because the layout of disk data depends on them
+and the reloaded target would be non-functional.
+
+
+The layout of the formatted block device:
+* reserved sectors (they are not used by this target, they can be used for
+  storing LUKS metadata or for other purposes); the size of the reserved
+  area is specified in the target arguments
+* superblock (4kiB)
+	* magic string - identifies that the device was formatted
+	* version
+	* log2(interleave sectors)
+	* integrity tag size
+	* the number of journal sections
+	* provided data sectors - the number of sectors that this target
+	  provides (i.e. the size of the device minus the size of all
+	  metadata and padding). The user of this target should not send
+	  bios that access data beyond the "provided data sectors" limit.
+	* flags - a flag is set if journal-mac is used
+* journal
+	The journal is divided into sections, each section contains:
+	* metadata area (4kiB), it contains journal entries
+	  every journal entry contains:
+		* logical sector (specifies where the data and tag should
+		  be written)
+		* last 8 bytes of data
+		* integrity tag (the size is specified in the superblock)
+	    every metadata sector ends with
+		* mac (8-bytes), all the macs in 8 metadata sectors form a
+		  64-byte value. It is used to store an hmac of the sector
+		  numbers in the journal section, to protect against the
+		  possibility that an attacker tampers with sector
+		  numbers in the journal.
+		* commit id
+	* data area (the size is variable; it depends on how many journal
+	  entries fit into the metadata area)
+	    every sector in the data area contains:
+		* data (504 bytes of data, the last 8 bytes are stored in
+		  the journal entry)
+		* commit id
+	To test if the whole journal section was written correctly, every
+	512-byte sector of the journal ends with an 8-byte commit id. If the
+	commit id matches on all sectors in a journal section, then it is
+	assumed that the section was written correctly. If the commit id
+	doesn't match, the section was written partially and it should not
+	be replayed.
+* one or more runs of interleaved tags and data. Each run contains:
+	* tag area - it contains integrity tags. There is one tag for each
+	  sector in the data area
+	* data area - it contains data sectors. The number of data sectors
+	  in one run must be a power of two. log2 of this value is stored
+	  in the superblock.
diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index b7767da50c26..0bac9ca2819d 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -508,4 +508,14 @@ config DM_LOG_WRITES
 
 	  If unsure, say N.
 
+config DM_INTEGRITY
+	tristate "Integrity target"
+	depends on BLK_DEV_DM
+	select BLK_DEV_INTEGRITY
+	select DM_BUFIO
+	select CRYPTO
+	select ASYNC_XOR
+	---help---
+	   This is the integrity target.
+
 endif # MD
diff --git a/drivers/md/Makefile b/drivers/md/Makefile
index 3cbda1af87a0..61270eadf5ec 100644
--- a/drivers/md/Makefile
+++ b/drivers/md/Makefile
@@ -59,6 +59,7 @@ obj-$(CONFIG_DM_CACHE_SMQ)	+= dm-cache-smq.o
 obj-$(CONFIG_DM_CACHE_CLEANER)	+= dm-cache-cleaner.o
 obj-$(CONFIG_DM_ERA)		+= dm-era.o
 obj-$(CONFIG_DM_LOG_WRITES)	+= dm-log-writes.o
+obj-$(CONFIG_DM_INTEGRITY)	+= dm-integrity.o
 
 ifeq ($(CONFIG_DM_UEVENT),y)
 dm-mod-objs			+= dm-uevent.o
diff --git a/drivers/md/dm-integrity.c b/drivers/md/dm-integrity.c
new file mode 100644
index 000000000000..0816deb453a0
--- /dev/null
+++ b/drivers/md/dm-integrity.c
@@ -0,0 +1,3031 @@
+/*
+ * Copyright (C) 2016-2017 Red Hat, Inc. All rights reserved.
+ * Copyright (C) 2016-2017 Milan Broz
+ * Copyright (C) 2016-2017 Mikulas Patocka
+ *
+ * This file is released under the GPL.
+ */
+
+#include <linux/module.h>
+#include <linux/device-mapper.h>
+#include <linux/dm-io.h>
+#include <linux/vmalloc.h>
+#include <linux/sort.h>
+#include <linux/rbtree.h>
+#include <linux/delay.h>
+#include <linux/random.h>
+#include <crypto/hash.h>
+#include <crypto/skcipher.h>
+#include <linux/async_tx.h>
+#include "dm-bufio.h"
+
+#define DM_MSG_PREFIX "integrity"
+
+#define DEFAULT_INTERLEAVE_SECTORS	32768
+#define DEFAULT_JOURNAL_SIZE_FACTOR	7
+#define DEFAULT_BUFFER_SECTORS		128
+#define DEFAULT_JOURNAL_WATERMARK	50
+#define DEFAULT_SYNC_MSEC		10000
+#define DEFAULT_MAX_JOURNAL_SECTORS	131072
+#define MIN_INTERLEAVE_SECTORS		3
+#define MAX_INTERLEAVE_SECTORS		31
+#define METADATA_WORKQUEUE_MAX_ACTIVE	16
+
+/*
+ * Warning - DEBUG_PRINT prints security-sensitive data to the log,
+ * so it should not be enabled in the official kernel
+ */
+//#define DEBUG_PRINT
+//#define INTERNAL_VERIFY
+
+/*
+ * On disk structures
+ */
+
+#define SB_MAGIC			"integrt"
+#define SB_VERSION			1
+#define SB_SECTORS			8
+
+struct superblock {
+	__u8 magic[8];
+	__u8 version;
+	__u8 log2_interleave_sectors;
+	__u16 integrity_tag_size;
+	__u32 journal_sections;
+	__u64 provided_data_sectors;	/* userspace uses this value */
+	__u32 flags;
+};
+
+#define SB_FLAG_HAVE_JOURNAL_MAC	0x1
+
+#define	JOURNAL_ENTRY_ROUNDUP		8
+
+typedef __u64 commit_id_t;
+#define JOURNAL_MAC_PER_SECTOR		8
+
+struct journal_entry {
+	union {
+		struct {
+			__u32 sector_lo;
+			__u32 sector_hi;
+		} s;
+		__u64 sector;
+	} u;
+	commit_id_t last_bytes;
+	__u8 tag[0];
+};
+
+#if BITS_PER_LONG == 64
+#define journal_entry_set_sector(je, x)		do { smp_wmb(); ACCESS_ONCE((je)->u.sector) = cpu_to_le64(x); } while (0)
+#define journal_entry_get_sector(je)		le64_to_cpu((je)->u.sector)
+#elif defined(CONFIG_LBDAF)
+#define journal_entry_set_sector(je, x)		do { (je)->u.s.sector_lo = cpu_to_le32(x); smp_wmb(); ACCESS_ONCE((je)->u.s.sector_hi) = cpu_to_le32((x) >> 32); } while (0)
+#define journal_entry_get_sector(je)		le64_to_cpu((je)->u.sector)
+#else
+#define journal_entry_set_sector(je, x)		do { (je)->u.s.sector_lo = cpu_to_le32(x); smp_wmb(); ACCESS_ONCE((je)->u.s.sector_hi) = cpu_to_le32(0); } while (0)
+#define journal_entry_get_sector(je)		le32_to_cpu((je)->u.s.sector_lo)
+#endif
+#define journal_entry_is_unused(je)		((je)->u.s.sector_hi == cpu_to_le32(-1))
+#define journal_entry_set_unused(je)		do { ((je)->u.s.sector_hi = cpu_to_le32(-1)); } while (0)
+#define journal_entry_is_inprogress(je)		((je)->u.s.sector_hi == cpu_to_le32(-2))
+#define journal_entry_set_inprogress(je)	do { ((je)->u.s.sector_hi = cpu_to_le32(-2)); } while (0)
+
+#define JOURNAL_BLOCK_SECTORS		8
+#define JOURNAL_SECTOR_DATA		((1 << SECTOR_SHIFT) - sizeof(commit_id_t))
+#define JOURNAL_MAC_SIZE		(JOURNAL_MAC_PER_SECTOR * JOURNAL_BLOCK_SECTORS)
+
+struct journal_sector {
+	__u8 entries[JOURNAL_SECTOR_DATA - JOURNAL_MAC_PER_SECTOR];
+	__u8 mac[JOURNAL_MAC_PER_SECTOR];
+	commit_id_t commit_id;
+};
+
+#define MAX_TAG_SIZE			(JOURNAL_SECTOR_DATA - JOURNAL_MAC_PER_SECTOR - offsetof(struct journal_entry, tag))
+
+#define METADATA_PADDING_SECTORS	8
+
+#define N_COMMIT_IDS			4
+
+static unsigned char prev_commit_seq(unsigned char seq)
+{
+	return (seq + N_COMMIT_IDS - 1) % N_COMMIT_IDS;
+}
+
+static unsigned char next_commit_seq(unsigned char seq)
+{
+	return (seq + 1) % N_COMMIT_IDS;
+}
+
+/*
+ * In-memory structures
+ */
+
+struct journal_node {
+	struct rb_node node;
+	sector_t sector;
+};
+
+struct alg_spec {
+	char *alg_string;
+	char *key_string;
+	__u8 *key;
+	unsigned key_size;
+};
+
+struct dm_integrity_c {
+	struct dm_dev *dev;
+	unsigned tag_size;
+	__s8 log2_tag_size;
+	sector_t start;
+	mempool_t *journal_io_mempool;
+	struct dm_io_client *io;
+	struct dm_bufio_client *bufio;
+	struct workqueue_struct *metadata_wq;
+	struct superblock *sb;
+	unsigned journal_pages;
+	struct page_list *journal;
+	struct page_list *journal_io;
+	struct page_list *journal_xor;
+
+	struct crypto_skcipher *journal_crypt;
+	struct scatterlist **journal_scatterlist;
+	struct scatterlist **journal_io_scatterlist;
+	struct skcipher_request **sk_requests;
+
+	struct crypto_shash *journal_mac;
+
+	struct journal_node *journal_tree;
+	struct rb_root journal_tree_root;
+
+	sector_t provided_data_sectors;
+
+	unsigned short journal_entry_size;
+	unsigned char journal_entries_per_sector;
+	unsigned char journal_section_entries;
+	unsigned char journal_section_sectors;
+	unsigned journal_sections;
+	unsigned journal_entries;
+	sector_t device_sectors;
+	unsigned initial_sectors;
+	unsigned metadata_run;
+	__s8 log2_metadata_run;
+	__u8 log2_buffer_sectors;
+
+	bool direct_writes;
+	bool suspending;
+
+	int failed;
+
+	struct crypto_shash *internal_hash;
+
+	/* these variables are locked with endio_wait.lock */
+	struct rb_root in_progress;
+	wait_queue_head_t endio_wait;
+	struct workqueue_struct *wait_wq;
+
+	unsigned char commit_seq;
+	commit_id_t commit_ids[N_COMMIT_IDS];
+
+	unsigned committed_section;
+	unsigned n_committed_sections;
+
+	unsigned uncommitted_section;
+	unsigned n_uncommitted_sections;
+
+	unsigned free_section;
+	unsigned char free_section_entry;
+	unsigned free_sectors;
+
+	unsigned free_sectors_threshold;
+
+	struct workqueue_struct *commit_wq;
+	struct work_struct commit_work;
+
+	struct workqueue_struct *writer_wq;
+	struct work_struct writer_work;
+
+	struct bio_list flush_bio_list;
+
+	unsigned long autocommit_jiffies;
+	struct timer_list autocommit_timer;
+	unsigned autocommit_msec;
+
+	wait_queue_head_t copy_to_journal_wait;
+
+	struct completion crypto_backoff;
+
+	bool journal_uptodate;
+	bool just_formatted;
+
+	struct alg_spec internal_hash_alg;
+	struct alg_spec journal_crypt_alg;
+	struct alg_spec journal_mac_alg;
+};
+
+struct dm_integrity_range {
+	sector_t logical_sector;
+	unsigned n_sectors;
+	struct rb_node node;
+};
+
+struct dm_integrity_io {
+	struct work_struct work;
+
+	struct dm_integrity_c *ic;
+	bool write;
+	bool fua;
+
+	struct dm_integrity_range range;
+
+	sector_t metadata_block;
+	unsigned metadata_offset;
+
+	atomic_t in_flight;
+	int bi_error;
+
+	struct completion *completion;
+
+	struct block_device *orig_bi_bdev;
+	bio_end_io_t *orig_bi_end_io;
+	struct bio_integrity_payload *orig_bi_integrity;
+	struct bvec_iter orig_bi_iter;
+};
+
+struct journal_completion {
+	struct dm_integrity_c *ic;
+	atomic_t in_flight;
+	struct completion comp;
+};
+
+struct journal_io {
+	struct dm_integrity_range range;
+	struct journal_completion *comp;
+};
+
+static struct kmem_cache *journal_io_cache;
+
+#define JOURNAL_IO_MEMPOOL	32
+
+#ifdef DEBUG_PRINT
+#define DEBUG_print(x, ...)	printk(KERN_DEBUG x, ##__VA_ARGS__)
+static void __DEBUG_bytes(__u8 *bytes, size_t len, const char *msg, ...)
+{
+	va_list args;
+	va_start(args, msg);
+	vprintk(msg, args);
+	va_end(args);
+	if (len)
+		pr_cont(":");
+	while (len) {
+		pr_cont(" %02x", *bytes);
+		bytes++;
+		len--;
+	}
+	pr_cont("\n");
+}
+#define DEBUG_bytes(bytes, len, msg, ...)	__DEBUG_bytes(bytes, len, KERN_DEBUG msg, ##__VA_ARGS__)
+#else
+#define DEBUG_print(x, ...)			do { } while (0)
+#define DEBUG_bytes(bytes, len, msg, ...)	do { } while (0)
+#endif
+
+/*
+ * DM Integrity profile; protection is performed in the layer above (dm-crypt)
+ */
+static struct blk_integrity_profile dm_integrity_profile = {
+	.name			= "DM-DIF-EXT-TAG",
+	.generate_fn		= NULL,
+	.verify_fn		= NULL,
+};
+
+static void dm_integrity_map_continue(struct dm_integrity_io *dio, bool from_map);
+static void integrity_bio_wait(struct work_struct *w);
+static void dm_integrity_dtr(struct dm_target *ti);
+
+static void dm_integrity_io_error(struct dm_integrity_c *ic, const char *msg, int err)
+{
+	if (!cmpxchg(&ic->failed, 0, err))
+		DMERR("Error on %s: %d", msg, err);
+}
+
+static int dm_integrity_failed(struct dm_integrity_c *ic)
+{
+	return ACCESS_ONCE(ic->failed);
+}
+
+static commit_id_t dm_integrity_commit_id(struct dm_integrity_c *ic, unsigned i, unsigned j, unsigned char seq)
+{
+	/*
+	 * Xor the number with section and sector, so that if a piece of
+	 * journal is written at a wrong place, it is detected.
+	 */
+	return ic->commit_ids[seq] ^ cpu_to_le64(((__u64)i << 32) ^ j);
+}
+
+static void get_area_and_offset(struct dm_integrity_c *ic, sector_t data_sector, sector_t *area, sector_t *offset)
+{
+	__u8 log2_interleave_sectors = ic->sb->log2_interleave_sectors;
+	*area = data_sector >> log2_interleave_sectors;
+	*offset = (unsigned)data_sector & ((1U << log2_interleave_sectors) - 1);
+}
+
+static __u64 get_metadata_sector_and_offset(struct dm_integrity_c *ic, sector_t area, sector_t offset, unsigned *metadata_offset)
+{
+	__u64 ms;
+	unsigned mo;
+
+	ms = area << ic->sb->log2_interleave_sectors;
+	if (likely(ic->log2_metadata_run >= 0))
+		ms += area << ic->log2_metadata_run;
+	else
+		ms += area * ic->metadata_run;
+	ms >>= ic->log2_buffer_sectors;
+
+	if (likely(ic->log2_tag_size >= 0)) {
+		ms += offset >> (SECTOR_SHIFT + ic->log2_buffer_sectors - ic->log2_tag_size);
+		mo = (offset << ic->log2_tag_size) & ((1U << SECTOR_SHIFT << ic->log2_buffer_sectors) - 1);
+	} else {
+		ms += (__u64)offset * ic->tag_size >> (SECTOR_SHIFT + ic->log2_buffer_sectors);
+		mo = (offset * ic->tag_size) & ((1U << SECTOR_SHIFT << ic->log2_buffer_sectors) - 1);
+	}
+	*metadata_offset = mo;
+	return ms;
+}
+
+static sector_t get_data_sector(struct dm_integrity_c *ic, sector_t area, sector_t offset)
+{
+	sector_t result;
+
+	result = area << ic->sb->log2_interleave_sectors;
+	if (likely(ic->log2_metadata_run >= 0))
+		result += (area + 1) << ic->log2_metadata_run;
+	else
+		result += (area + 1) * ic->metadata_run;
+
+	result += (sector_t)ic->initial_sectors + offset;
+	return result;
+}
+
+static void wraparound_section(struct dm_integrity_c *ic, unsigned *sec_ptr)
+{
+	if (unlikely(*sec_ptr >= ic->journal_sections))
+		*sec_ptr -= ic->journal_sections;
+}
+
+static int sync_rw_sb(struct dm_integrity_c *ic, int op, int op_flags)
+{
+	struct dm_io_request io_req;
+	struct dm_io_region io_loc;
+
+	io_req.bi_op = op;
+	io_req.bi_op_flags = op_flags;
+	io_req.mem.type = DM_IO_KMEM;
+	io_req.mem.ptr.addr = ic->sb;
+	io_req.notify.fn = NULL;
+	io_req.client = ic->io;
+	io_loc.bdev = ic->dev->bdev;
+	io_loc.sector = ic->start;
+	io_loc.count = SB_SECTORS;
+
+	return dm_io(&io_req, 1, &io_loc, NULL);
+}
+
+static void access_journal_check(struct dm_integrity_c *ic, unsigned section, unsigned offset, bool e, const char *function)
+{
+#if defined(CONFIG_DM_DEBUG) || defined(INTERNAL_VERIFY)
+	unsigned limit = e ? ic->journal_section_entries : ic->journal_section_sectors;
+
+	if (unlikely(section >= ic->journal_sections) ||
+	    unlikely(offset >= limit)) {
+		printk(KERN_CRIT "%s: invalid access at (%u,%u), limit (%u,%u)\n",
+			function, section, offset, ic->journal_sections, limit);
+		BUG();
+	}
+#endif
+}
+
+static void page_list_location(struct dm_integrity_c *ic, unsigned section, unsigned offset, unsigned *pl_index, unsigned *pl_offset)
+{
+	unsigned sector;
+
+	access_journal_check(ic, section, offset, false, "access_journal");
+
+	sector = section * ic->journal_section_sectors + offset;
+
+	*pl_index = sector >> (PAGE_SHIFT - SECTOR_SHIFT);
+	*pl_offset = (sector << SECTOR_SHIFT) & (PAGE_SIZE - 1);
+}
+
+static struct journal_sector *access_page_list(struct dm_integrity_c *ic, struct page_list *pl, unsigned section, unsigned offset, unsigned *n_sectors)
+{
+	unsigned pl_index, pl_offset;
+	char *va;
+
+	page_list_location(ic, section, offset, &pl_index, &pl_offset);
+
+	if (n_sectors)
+		*n_sectors = (PAGE_SIZE - pl_offset) >> SECTOR_SHIFT;
+
+	va = lowmem_page_address(pl[pl_index].page);
+
+	return (struct journal_sector *)(va + pl_offset);
+}
+
+static struct journal_sector *access_journal(struct dm_integrity_c *ic, unsigned section, unsigned offset)
+{
+	return access_page_list(ic, ic->journal, section, offset, NULL);
+}
+
+static struct journal_entry *access_journal_entry(struct dm_integrity_c *ic, unsigned section, unsigned n)
+{
+	unsigned rel_sector, offset;
+	struct journal_sector *js;
+
+	access_journal_check(ic, section, n, true, "access_journal_entry");
+
+	rel_sector = n % JOURNAL_BLOCK_SECTORS;
+	offset = n / JOURNAL_BLOCK_SECTORS;
+
+	js = access_journal(ic, section, rel_sector);
+	return (struct journal_entry *)((char *)js + offset * ic->journal_entry_size);
+}
+
+static struct journal_sector *access_journal_data(struct dm_integrity_c *ic, unsigned section, unsigned n)
+{
+	access_journal_check(ic, section, n, true, "access_journal_data");
+
+	return access_journal(ic, section, n + JOURNAL_BLOCK_SECTORS);
+}
+
+static void section_mac(struct dm_integrity_c *ic, unsigned section, __u8 result[JOURNAL_MAC_SIZE])
+{
+	SHASH_DESC_ON_STACK(desc, ic->journal_mac);
+	int r;
+	unsigned j, size;
+
+	desc->tfm = ic->journal_mac;
+	desc->flags = CRYPTO_TFM_REQ_MAY_SLEEP;
+
+	r = crypto_shash_init(desc);
+	if (unlikely(r)) {
+		dm_integrity_io_error(ic, "crypto_shash_init", r);
+		goto err;
+	}
+
+	for (j = 0; j < ic->journal_section_entries; j++) {
+		struct journal_entry *je = access_journal_entry(ic, section, j);
+		r = crypto_shash_update(desc, (__u8 *)&je->u.sector, sizeof je->u.sector);
+		if (unlikely(r)) {
+			dm_integrity_io_error(ic, "crypto_shash_update", r);
+			goto err;
+		}
+	}
+
+	size = crypto_shash_digestsize(ic->journal_mac);
+
+	if (likely(size <= JOURNAL_MAC_SIZE)) {
+		r = crypto_shash_final(desc, result);
+		if (unlikely(r)) {
+			dm_integrity_io_error(ic, "crypto_shash_final", r);
+			goto err;
+		}
+		memset(result + size, 0, JOURNAL_MAC_SIZE - size);
+	} else {
+		__u8 digest[size];
+		r = crypto_shash_final(desc, digest);
+		if (unlikely(r)) {
+			dm_integrity_io_error(ic, "crypto_shash_final", r);
+			goto err;
+		}
+		memcpy(result, digest, JOURNAL_MAC_SIZE);
+	}
+
+	return;
+
+err:
+	memset(result, 0, JOURNAL_MAC_SIZE);
+}
+
+static void rw_section_mac(struct dm_integrity_c *ic, unsigned section, bool wr)
+{
+	__u8 result[JOURNAL_MAC_SIZE];
+	unsigned j;
+
+	if (!ic->journal_mac)
+		return;
+
+	section_mac(ic, section, result);
+
+	for (j = 0; j < JOURNAL_BLOCK_SECTORS; j++) {
+		struct journal_sector *js = access_journal(ic, section, j);
+		if (likely(wr)) {
+			memcpy(&js->mac, result + (j * JOURNAL_MAC_PER_SECTOR), JOURNAL_MAC_PER_SECTOR);
+		} else {
+			if (memcmp(&js->mac, result + (j * JOURNAL_MAC_PER_SECTOR), JOURNAL_MAC_PER_SECTOR))
+				dm_integrity_io_error(ic, "journal mac", -EIO);
+		}
+	}
+}
+
+static void complete_journal_op(void *context)
+{
+	struct journal_completion *comp = context;
+	BUG_ON(!atomic_read(&comp->in_flight));
+	if (likely(atomic_dec_and_test(&comp->in_flight)))
+		complete(&comp->comp);
+}
+
+static void xor_journal(struct dm_integrity_c *ic, bool encrypt, unsigned section, unsigned n_sections, struct journal_completion *comp)
+{
+	struct async_submit_ctl submit;
+	size_t n_bytes = (size_t)(n_sections * ic->journal_section_sectors) << SECTOR_SHIFT;
+	unsigned pl_index, pl_offset, section_index;
+	struct page_list *source_pl, *target_pl;
+
+	if (likely(encrypt)) {
+		source_pl = ic->journal;
+		target_pl = ic->journal_io;
+	} else {
+		source_pl = ic->journal_io;
+		target_pl = ic->journal;
+	}
+
+	page_list_location(ic, section, 0, &pl_index, &pl_offset);
+
+	atomic_add(roundup(pl_offset + n_bytes, PAGE_SIZE) >> PAGE_SHIFT, &comp->in_flight);
+
+	init_async_submit(&submit, ASYNC_TX_XOR_ZERO_DST, NULL, complete_journal_op, comp, NULL);
+
+	section_index = pl_index;
+
+	do {
+		size_t this_step;
+		struct page *src_pages[2];
+		struct page *dst_page;
+
+		while (unlikely(pl_index == section_index)) {
+			unsigned dummy;
+			if (likely(encrypt))
+				rw_section_mac(ic, section, true);
+			section++;
+			n_sections--;
+			if (!n_sections)
+				break;
+			page_list_location(ic, section, 0, &section_index, &dummy);
+		}
+
+		this_step = min(n_bytes, (size_t)PAGE_SIZE - pl_offset);
+		dst_page = target_pl[pl_index].page;
+		src_pages[0] = source_pl[pl_index].page;
+		src_pages[1] = ic->journal_xor[pl_index].page;
+
+		async_xor(dst_page, src_pages, pl_offset, 2, this_step, &submit);
+
+		pl_index++;
+		pl_offset = 0;
+		n_bytes -= this_step;
+	} while (n_bytes);
+
+	BUG_ON(n_sections);
+
+	async_tx_issue_pending_all();
+}
+
+static void complete_journal_encrypt(struct crypto_async_request *req, int err)
+{
+	struct journal_completion *comp = req->data;
+	if (unlikely(err)) {
+		if (likely(err == -EINPROGRESS)) {
+			complete(&comp->ic->crypto_backoff);
+			return;
+		}
+		dm_integrity_io_error(comp->ic, "asynchronous encrypt", err);
+	}
+	complete_journal_op(comp);
+}
+
+static char do_crypt(bool encrypt, struct skcipher_request *req, struct journal_completion *comp)
+{
+	int r;
+	skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG | CRYPTO_TFM_REQ_MAY_SLEEP, complete_journal_encrypt, comp);
+	if (likely(encrypt))
+		r = crypto_skcipher_encrypt(req);
+	else
+		r = crypto_skcipher_decrypt(req);
+	if (likely(!r))
+		return false;
+	if (likely(r == -EINPROGRESS))
+		return true;
+	if (likely(r == -EBUSY)) {
+		wait_for_completion(&comp->ic->crypto_backoff);
+		reinit_completion(&comp->ic->crypto_backoff);
+		return true;
+	}
+	dm_integrity_io_error(comp->ic, "encrypt", r);
+	return false;
+}
+
+static void crypt_journal(struct dm_integrity_c *ic, bool encrypt, unsigned section, unsigned n_sections, struct journal_completion *comp)
+{
+	struct scatterlist **source_sg;
+	struct scatterlist **target_sg;
+
+	atomic_add(2, &comp->in_flight);
+
+	if (likely(encrypt)) {
+		source_sg = ic->journal_scatterlist;
+		target_sg = ic->journal_io_scatterlist;
+	} else {
+		source_sg = ic->journal_io_scatterlist;
+		target_sg = ic->journal_scatterlist;
+	}
+
+	do {
+		struct skcipher_request *req;
+		unsigned ivsize;
+		char *iv;
+
+		if (likely(encrypt))
+			rw_section_mac(ic, section, true);
+
+		req = ic->sk_requests[section];
+		ivsize = crypto_skcipher_ivsize(ic->journal_crypt);
+		iv = req->iv;
+
+		memcpy(iv, iv + ivsize, ivsize);
+
+		req->src = source_sg[section];
+		req->dst = target_sg[section];
+
+		if (unlikely(do_crypt(encrypt, req, comp)))
+			atomic_inc(&comp->in_flight);
+
+		section++;
+		n_sections--;
+	} while (n_sections);
+
+	atomic_dec(&comp->in_flight);
+	complete_journal_op(comp);
+}
+
+static void encrypt_journal(struct dm_integrity_c *ic, bool encrypt, unsigned section, unsigned n_sections, struct journal_completion *comp)
+{
+	if (ic->journal_xor)
+		return xor_journal(ic, encrypt, section, n_sections, comp);
+	else
+		return crypt_journal(ic, encrypt, section, n_sections, comp);
+}
+
+static void complete_journal_io(unsigned long error, void *context)
+{
+	struct journal_completion *comp = context;
+	if (unlikely(error != 0))
+		dm_integrity_io_error(comp->ic, "writing journal", -EIO);
+	complete_journal_op(comp);
+}
+
+static void rw_journal(struct dm_integrity_c *ic, int op, int op_flags, unsigned section, unsigned n_sections, struct journal_completion *comp)
+{
+	struct dm_io_request io_req;
+	struct dm_io_region io_loc;
+	unsigned sector, n_sectors, pl_index, pl_offset;
+	int r;
+
+	if (unlikely(dm_integrity_failed(ic))) {
+		if (comp)
+			complete_journal_io(-1UL, comp);
+		return;
+	}
+
+	sector = section * ic->journal_section_sectors;
+	n_sectors = n_sections * ic->journal_section_sectors;
+
+	pl_index = sector >> (PAGE_SHIFT - SECTOR_SHIFT);
+	pl_offset = (sector << SECTOR_SHIFT) & (PAGE_SIZE - 1);
+
+	io_req.bi_op = op;
+	io_req.bi_op_flags = op_flags;
+	io_req.mem.type = DM_IO_PAGE_LIST;
+	if (ic->journal_io)
+		io_req.mem.ptr.pl = &ic->journal_io[pl_index];
+	else
+		io_req.mem.ptr.pl = &ic->journal[pl_index];
+	io_req.mem.offset = pl_offset;
+	if (likely(comp != NULL)) {
+		io_req.notify.fn = complete_journal_io;
+		io_req.notify.context = comp;
+	} else {
+		io_req.notify.fn = NULL;
+	}
+	io_req.client = ic->io;
+	io_loc.bdev = ic->dev->bdev;
+	io_loc.sector = ic->start + SB_SECTORS + sector;
+	io_loc.count = n_sectors;
+
+	r = dm_io(&io_req, 1, &io_loc, NULL);
+	if (unlikely(r)) {
+		dm_integrity_io_error(ic, op == REQ_OP_READ ? "reading journal" : "writing journal", r);
+		if (comp) {
+			WARN_ONCE(1, "asynchronous dm_io failed: %d", r);
+			complete_journal_io(-1UL, comp);
+		}
+	}
+}
+
+static void write_journal(struct dm_integrity_c *ic, unsigned commit_start, unsigned commit_sections)
+{
+	struct journal_completion io_comp;
+	struct journal_completion crypt_comp_1;
+	struct journal_completion crypt_comp_2;
+	unsigned i;
+
+	io_comp.ic = ic;
+	io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp);
+
+	if (commit_start + commit_sections <= ic->journal_sections) {
+		io_comp.in_flight = (atomic_t)ATOMIC_INIT(1);
+		if (ic->journal_io) {
+			crypt_comp_1.ic = ic;
+			crypt_comp_1.comp = COMPLETION_INITIALIZER_ONSTACK(crypt_comp_1.comp);
+			crypt_comp_1.in_flight = (atomic_t)ATOMIC_INIT(0);
+			encrypt_journal(ic, true, commit_start, commit_sections, &crypt_comp_1);
+			wait_for_completion_io(&crypt_comp_1.comp);
+		} else {
+			for (i = 0; i < commit_sections; i++)
+				rw_section_mac(ic, commit_start + i, true);
+		}
+		rw_journal(ic, REQ_OP_WRITE, REQ_FUA, commit_start, commit_sections, &io_comp);
+	} else {
+		unsigned to_end;
+		io_comp.in_flight = (atomic_t)ATOMIC_INIT(2);
+		to_end = ic->journal_sections - commit_start;
+		if (ic->journal_io) {
+			crypt_comp_1.ic = ic;
+			crypt_comp_1.comp = COMPLETION_INITIALIZER_ONSTACK(crypt_comp_1.comp);
+			crypt_comp_1.in_flight = (atomic_t)ATOMIC_INIT(0);
+			encrypt_journal(ic, true, commit_start, to_end, &crypt_comp_1);
+			if (try_wait_for_completion(&crypt_comp_1.comp)) {
+				rw_journal(ic, REQ_OP_WRITE, REQ_FUA, commit_start, to_end, &io_comp);
+				crypt_comp_1.comp = COMPLETION_INITIALIZER_ONSTACK(crypt_comp_1.comp);
+				crypt_comp_1.in_flight = (atomic_t)ATOMIC_INIT(0);
+				encrypt_journal(ic, true, 0, commit_sections - to_end, &crypt_comp_1);
+				wait_for_completion_io(&crypt_comp_1.comp);
+			} else {
+				crypt_comp_2.ic = ic;
+				crypt_comp_2.comp = COMPLETION_INITIALIZER_ONSTACK(crypt_comp_2.comp);
+				crypt_comp_2.in_flight = (atomic_t)ATOMIC_INIT(0);
+				encrypt_journal(ic, true, 0, commit_sections - to_end, &crypt_comp_2);
+				wait_for_completion_io(&crypt_comp_1.comp);
+				rw_journal(ic, REQ_OP_WRITE, REQ_FUA, commit_start, to_end, &io_comp);
+				wait_for_completion_io(&crypt_comp_2.comp);
+			}
+		} else {
+			for (i = 0; i < to_end; i++)
+				rw_section_mac(ic, commit_start + i, true);
+			rw_journal(ic, REQ_OP_WRITE, REQ_FUA, commit_start, to_end, &io_comp);
+			for (i = 0; i < commit_sections - to_end; i++)
+				rw_section_mac(ic, i, true);
+		}
+		rw_journal(ic, REQ_OP_WRITE, REQ_FUA, 0, commit_sections - to_end, &io_comp);
+	}
+
+	wait_for_completion_io(&io_comp.comp);
+}
+
+static void copy_from_journal(struct dm_integrity_c *ic, unsigned section, unsigned offset, unsigned n_sectors, sector_t target, io_notify_fn fn, void *data)
+{
+	struct dm_io_request io_req;
+	struct dm_io_region io_loc;
+	int r;
+	unsigned sector, pl_index, pl_offset;
+
+	if (unlikely(dm_integrity_failed(ic))) {
+		fn(-1UL, data);
+		return;
+	}
+
+	sector = section * ic->journal_section_sectors + JOURNAL_BLOCK_SECTORS + offset;
+
+	pl_index = sector >> (PAGE_SHIFT - SECTOR_SHIFT);
+	pl_offset = (sector << SECTOR_SHIFT) & (PAGE_SIZE - 1);
+
+	io_req.bi_op = REQ_OP_WRITE;
+	io_req.bi_op_flags = 0;
+	io_req.mem.type = DM_IO_PAGE_LIST;
+	io_req.mem.ptr.pl = &ic->journal[pl_index];
+	io_req.mem.offset = pl_offset;
+	io_req.notify.fn = fn;
+	io_req.notify.context = data;
+	io_req.client = ic->io;
+	io_loc.bdev = ic->dev->bdev;
+	io_loc.sector = ic->start + target;
+	io_loc.count = n_sectors;
+
+	r = dm_io(&io_req, 1, &io_loc, NULL);
+	if (unlikely(r)) {
+		WARN_ONCE(1, "asynchronous dm_io failed: %d", r);
+		fn(-1UL, data);
+	}
+}
+
+static bool add_new_range(struct dm_integrity_c *ic, struct dm_integrity_range *new_range)
+{
+	struct rb_node **n = &ic->in_progress.rb_node;
+	struct rb_node *parent;
+
+	parent = NULL;
+
+	while (*n) {
+		struct dm_integrity_range *range = container_of(*n, struct dm_integrity_range, node);
+		parent = *n;
+		if (new_range->logical_sector + new_range->n_sectors <= range->logical_sector) {
+			n = &range->node.rb_left;
+		} else if (new_range->logical_sector >= range->logical_sector + range->n_sectors) {
+			n = &range->node.rb_right;
+		} else {
+			return false;
+		}
+	}
+
+	rb_link_node(&new_range->node, parent, n);
+	rb_insert_color(&new_range->node, &ic->in_progress);
+
+	return true;
+}
+
+static void remove_range_unlocked(struct dm_integrity_c *ic, struct dm_integrity_range *range)
+{
+	rb_erase(&range->node, &ic->in_progress);
+	wake_up_locked(&ic->endio_wait);
+}
+
+static void remove_range(struct dm_integrity_c *ic, struct dm_integrity_range *range)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&ic->endio_wait.lock, flags);
+	remove_range_unlocked(ic, range);
+	spin_unlock_irqrestore(&ic->endio_wait.lock, flags);
+}
+
+static void init_journal_node(struct journal_node *node)
+{
+	RB_CLEAR_NODE(&node->node);
+	node->sector = (sector_t)-1;
+}
+
+static void add_journal_node(struct dm_integrity_c *ic, struct journal_node *node, sector_t sector)
+{
+	struct rb_node **link;
+	struct rb_node *parent;
+
+	node->sector = sector;
+	BUG_ON(!RB_EMPTY_NODE(&node->node));
+
+	link = &ic->journal_tree_root.rb_node;
+	parent = NULL;
+
+	while (*link) {
+		struct journal_node *j;
+		parent = *link;
+		j = container_of(parent, struct journal_node, node);
+		if (sector < j->sector)
+			link = &j->node.rb_left;
+		else
+			link = &j->node.rb_right;
+	}
+
+	rb_link_node(&node->node, parent, link);
+	rb_insert_color(&node->node, &ic->journal_tree_root);
+}
+
+static void remove_journal_node(struct dm_integrity_c *ic, struct journal_node *node)
+{
+	BUG_ON(RB_EMPTY_NODE(&node->node));
+	rb_erase(&node->node, &ic->journal_tree_root);
+	init_journal_node(node);
+}
+
+#define NOT_FOUND	(-1U)
+
+static unsigned find_journal_node(struct dm_integrity_c *ic, sector_t sector, sector_t *next_sector)
+{
+	struct rb_node *n = ic->journal_tree_root.rb_node;
+	unsigned found = NOT_FOUND;
+	*next_sector = (sector_t)-1;
+	while (n) {
+		struct journal_node *j = container_of(n, struct journal_node, node);
+		if (sector == j->sector) {
+			found = j - ic->journal_tree;
+		}
+		if (sector < j->sector) {
+			*next_sector = j->sector;
+			n = j->node.rb_left;
+		} else {
+			n = j->node.rb_right;
+		}
+	}
+
+	return found;
+}
+
+static bool test_journal_node(struct dm_integrity_c *ic, unsigned pos, sector_t sector)
+{
+	struct journal_node *node, *next_node;
+	struct rb_node *next;
+
+	if (unlikely(pos >= ic->journal_entries))
+		return false;
+	node = &ic->journal_tree[pos];
+	if (unlikely(RB_EMPTY_NODE(&node->node)))
+		return false;
+	if (unlikely(node->sector != sector))
+		return false;
+
+	next = rb_next(&node->node);
+	if (unlikely(!next))
+		return true;
+
+	next_node = container_of(next, struct journal_node, node);
+	return next_node->sector != sector;
+}
+
+static bool find_newer_committed_node(struct dm_integrity_c *ic, struct journal_node *node)
+{
+	struct rb_node *next;
+	struct journal_node *next_node;
+	unsigned next_section;
+
+	BUG_ON(RB_EMPTY_NODE(&node->node));
+
+	next = rb_next(&node->node);
+	if (unlikely(!next))
+		return false;
+
+	next_node = container_of(next, struct journal_node, node);
+
+	if (next_node->sector != node->sector)
+		return false;
+
+	next_section = (unsigned)(next_node - ic->journal_tree) / ic->journal_section_entries;
+	if (next_section >= ic->committed_section && next_section < ic->committed_section + ic->n_committed_sections)
+		return true;
+	if (next_section + ic->journal_sections < ic->committed_section + ic->n_committed_sections)
+		return true;
+
+	return false;
+}
+
+#define TAG_READ	0
+#define TAG_WRITE	1
+#define TAG_CMP		2
+
+static int dm_integrity_rw_tag(struct dm_integrity_c *ic, unsigned char *tag, sector_t *metadata_block, unsigned *metadata_offset, unsigned total_size, int op)
+{
+	do {
+		unsigned char *data, *dp;
+		struct dm_buffer *b;
+		unsigned to_copy;
+		int r;
+
+		r = dm_integrity_failed(ic);
+		if (unlikely(r))
+			return r;
+
+		data = dm_bufio_read(ic->bufio, *metadata_block, &b);
+		if (unlikely(IS_ERR(data)))
+			return PTR_ERR(data);
+
+		to_copy = min((1U << SECTOR_SHIFT << ic->log2_buffer_sectors) - *metadata_offset, total_size);
+		dp = data + *metadata_offset;
+		if (op == TAG_READ) {
+			memcpy(tag, dp, to_copy);
+		} else if (op == TAG_WRITE) {
+			memcpy(dp, tag, to_copy);
+			dm_bufio_mark_buffer_dirty(b);
+		} else /*if (op == TAG_CMP)*/ {
+			if (unlikely(memcmp(dp, tag, to_copy))) {
+				unsigned i;
+				for (i = 0; i < to_copy; i++) {
+					if (dp[i] != tag[i])
+						break;
+					total_size--;
+				}
+				dm_bufio_release(b);
+				return total_size;
+			}
+		}
+		dm_bufio_release(b);
+
+		tag += to_copy;
+		*metadata_offset += to_copy;
+		if (unlikely(*metadata_offset == 1U << SECTOR_SHIFT << ic->log2_buffer_sectors)) {
+			(*metadata_block)++;
+			*metadata_offset = 0;
+		}
+		total_size -= to_copy;
+	} while (unlikely(total_size));
+
+	return 0;
+}
+
+static void dm_integrity_flush_buffers(struct dm_integrity_c *ic)
+{
+	int r;
+	r = dm_bufio_write_dirty_buffers(ic->bufio);
+	if (unlikely(r))
+		dm_integrity_io_error(ic, "writing tags", r);
+}
+
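+/*
+ * Wait for an in-progress range to be removed.  Called with
+ * ic->endio_wait.lock held; the lock is dropped around io_schedule()
+ * and re-acquired before returning.
+ */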
+static void sleep_on_endio_wait(struct dm_integrity_c *ic)
+{
+	DECLARE_WAITQUEUE(wait, current);
+	__add_wait_queue(&ic->endio_wait, &wait);
+	__set_current_state(TASK_UNINTERRUPTIBLE);
+	spin_unlock_irq(&ic->endio_wait.lock);
+	io_schedule();
+	spin_lock_irq(&ic->endio_wait.lock);
+	__remove_wait_queue(&ic->endio_wait, &wait);
+}
+
+static void autocommit_fn(unsigned long data)
+{
+	struct dm_integrity_c *ic = (struct dm_integrity_c *)data;
+	if (likely(!dm_integrity_failed(ic)))
+		queue_work(ic->commit_wq, &ic->commit_work);
+}
+
+static void schedule_autocommit(struct dm_integrity_c *ic)
+{
+	if (!timer_pending(&ic->autocommit_timer))
+		mod_timer(&ic->autocommit_timer, jiffies + ic->autocommit_jiffies);
+}
+
+static void submit_flush_bio(struct dm_integrity_c *ic, struct dm_integrity_io *dio)
+{
+	struct bio *bio;
+	spin_lock_irq(&ic->endio_wait.lock);
+	bio = dm_bio_from_per_bio_data(dio, sizeof(struct dm_integrity_io));
+	bio_list_add(&ic->flush_bio_list, bio);
+	spin_unlock_irq(&ic->endio_wait.lock);
+	queue_work(ic->commit_wq, &ic->commit_work);
+}
+
+static void do_endio(struct dm_integrity_c *ic, struct bio *bio)
+{
+	int r = dm_integrity_failed(ic);
+	if (unlikely(r) && !bio->bi_error)
+		bio->bi_error = r;
+	bio_endio(bio);
+}
+
+static void do_endio_flush(struct dm_integrity_c *ic, struct dm_integrity_io *dio)
+{
+	struct bio *bio = dm_bio_from_per_bio_data(dio, sizeof(struct dm_integrity_io));
+	if (unlikely(dio->fua) && likely(!bio->bi_error) && likely(!dm_integrity_failed(ic)))
+		submit_flush_bio(ic, dio);
+	else
+		do_endio(ic, bio);
+}
+
+static void dec_in_flight(struct dm_integrity_io *dio)
+{
+	if (atomic_dec_and_test(&dio->in_flight)) {
+		struct dm_integrity_c *ic = dio->ic;
+		struct bio *bio;
+
+		remove_range(ic, &dio->range);
+
+		if (unlikely(dio->write))
+			schedule_autocommit(ic);
+
+		bio = dm_bio_from_per_bio_data(dio, sizeof(struct dm_integrity_io));
+
+		if (unlikely(dio->bi_error) && !bio->bi_error)
+			bio->bi_error = dio->bi_error;
+		if (likely(!bio->bi_error) && unlikely(bio_sectors(bio) != dio->range.n_sectors)) {
+			dio->range.logical_sector += dio->range.n_sectors;
+			bio_advance(bio, dio->range.n_sectors << SECTOR_SHIFT);
+			INIT_WORK(&dio->work, integrity_bio_wait);
+			queue_work(ic->wait_wq, &dio->work);
+			return;
+		}
+		do_endio_flush(ic, dio);
+	}
+}
+
+static void integrity_end_io(struct bio *bio)
+{
+	struct dm_integrity_io *dio = dm_per_bio_data(bio, sizeof(struct dm_integrity_io));
+
+	bio->bi_iter = dio->orig_bi_iter;
+	bio->bi_bdev = dio->orig_bi_bdev;
+	if (dio->orig_bi_integrity) {
+		bio->bi_integrity = dio->orig_bi_integrity;
+		bio->bi_opf |= REQ_INTEGRITY;
+	}
+	bio->bi_end_io = dio->orig_bi_end_io;
+
+	if (dio->completion)
+		complete(dio->completion);
+
+	dec_in_flight(dio);
+}
+
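+/*
+ * Compute the internal per-sector checksum: the (optionally keyed) shash
+ * over the little-endian sector number followed by the sector data.
+ * Digests shorter than the tag size are zero-padded.
+ */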
+static void integrity_sector_checksum(struct dm_integrity_c *ic, sector_t sector, const char *data, char *result)
+{
+	__le64 sector_le = cpu_to_le64(sector);
+	SHASH_DESC_ON_STACK(req, ic->internal_hash);
+	int r;
+	unsigned digest_size;
+
+	req->tfm = ic->internal_hash;
+	req->flags = 0;
+
+	r = crypto_shash_init(req);
+	if (unlikely(r < 0)) {
+		dm_integrity_io_error(ic, "crypto_shash_init", r);
+		goto failed;
+	}
+
+	r = crypto_shash_update(req, (const __u8 *)&sector_le, sizeof sector_le);
+	if (unlikely(r < 0)) {
+		dm_integrity_io_error(ic, "crypto_shash_update", r);
+		goto failed;
+	}
+
+	r = crypto_shash_update(req, data, 1 << SECTOR_SHIFT);
+	if (unlikely(r < 0)) {
+		dm_integrity_io_error(ic, "crypto_shash_update", r);
+		goto failed;
+	}
+
+	r = crypto_shash_final(req, result);
+	if (unlikely(r < 0)) {
+		dm_integrity_io_error(ic, "crypto_shash_final", r);
+		goto failed;
+	}
+
+	digest_size = crypto_shash_digestsize(ic->internal_hash);
+	if (unlikely(digest_size < ic->tag_size))
+		memset(result + digest_size, 0, ic->tag_size - digest_size);
+
+	return;
+
+failed:
+	/*
+	 * This shouldn't happen anyway, the hash functions have no reason to
+	 * fail.  Fill the tag with random bytes so that a subsequent compare
+	 * fails instead of silently passing.
+	 */
+	get_random_bytes(result, ic->tag_size);
+}
+
+static void integrity_metadata(struct work_struct *w)
+{
+	struct dm_integrity_io *dio = container_of(w, struct dm_integrity_io, work);
+	struct dm_integrity_c *ic = dio->ic;
+
+	int r;
+
+	if (ic->internal_hash) {
+		struct bvec_iter iter;
+		struct bio_vec bv;
+		unsigned digest_size = crypto_shash_digestsize(ic->internal_hash);
+		struct bio *bio = dm_bio_from_per_bio_data(dio, sizeof(struct dm_integrity_io));
+		char *checksums;
+		unsigned extra_space = digest_size > ic->tag_size ? digest_size - ic->tag_size : 0;
+		char checksums_onstack[ic->tag_size + extra_space];
+		unsigned sectors_to_process = dio->range.n_sectors;
+		sector_t sector = dio->range.logical_sector;
+
+		/*
+		 * Try to allocate a buffer for a page worth of checksums; if
+		 * the allocation fails, fall back to the on-stack buffer and
+		 * process one sector at a time.
+		 */
+		checksums = kmalloc((PAGE_SIZE >> SECTOR_SHIFT) * ic->tag_size + extra_space, GFP_NOIO | __GFP_NORETRY | __GFP_NOWARN);
+		if (!checksums)
+			checksums = checksums_onstack;
+
+		__bio_for_each_segment(bv, bio, iter, dio->orig_bi_iter) {
+			unsigned pos;
+			char *mem, *checksums_ptr;
+
+again:
+			mem = (char *)kmap_atomic(bv.bv_page) + bv.bv_offset;
+			pos = 0;
+			checksums_ptr = checksums;
+			do {
+				integrity_sector_checksum(ic, sector, mem + pos, checksums_ptr);
+				checksums_ptr += ic->tag_size;
+				sectors_to_process--;
+				pos += 1 << SECTOR_SHIFT;
+				sector++;
+			} while (pos < bv.bv_len && sectors_to_process && checksums != checksums_onstack);
+			kunmap_atomic(mem);
+
+			r = dm_integrity_rw_tag(ic, checksums, &dio->metadata_block, &dio->metadata_offset, checksums_ptr - checksums, !dio->write ? TAG_CMP : TAG_WRITE);
+			if (unlikely(r)) {
+				if (r > 0) {
+					DMERR("Checksum failed at sector 0x%llx", (unsigned long long)(sector - ((r + ic->tag_size - 1) / ic->tag_size)));
+					r = -EIO;
+				}
+				if (likely(checksums != checksums_onstack))
+					kfree(checksums);
+				goto error;
+			}
+
+			if (!sectors_to_process)
+				break;
+
+			if (unlikely(pos < bv.bv_len)) {
+				bv.bv_offset += pos;
+				bv.bv_len -= pos;
+				goto again;
+			}
+		}
+
+		if (likely(checksums != checksums_onstack))
+			kfree(checksums);
+	} else {
+		struct bio_integrity_payload *bip = dio->orig_bi_integrity;
+		if (bip) {
+			struct bio_vec biv;
+			struct bvec_iter iter;
+			unsigned data_to_process = dio->range.n_sectors * ic->tag_size;
+
+			bip_for_each_vec(biv, bip, iter) {
+				unsigned char *tag;
+				unsigned this_len;
+
+				BUG_ON(PageHighMem(biv.bv_page));
+				tag = lowmem_page_address(biv.bv_page) + biv.bv_offset;
+				this_len = min(biv.bv_len, data_to_process);
+				r = dm_integrity_rw_tag(ic, tag, &dio->metadata_block, &dio->metadata_offset, this_len, !dio->write ? TAG_READ : TAG_WRITE);
+				if (unlikely(r))
+					goto error;
+				data_to_process -= this_len;
+				if (!data_to_process)
+					break;
+			}
+		}
+	}
+	dec_in_flight(dio);
+	return;
+
+error:
+	dio->bi_error = r;
+	dec_in_flight(dio);
+}
+
+static int dm_integrity_map(struct dm_target *ti, struct bio *bio)
+{
+	struct dm_integrity_c *ic = ti->private;
+	struct dm_integrity_io *dio = dm_per_bio_data(bio, sizeof(struct dm_integrity_io));
+
+	sector_t area, offset;
+
+	dio->ic = ic;
+	dio->bi_error = 0;
+
+	if (unlikely(bio->bi_opf & REQ_PREFLUSH)) {
+		submit_flush_bio(ic, dio);
+		return DM_MAPIO_SUBMITTED;
+	}
+
+	dio->range.logical_sector = dm_target_offset(ti, bio->bi_iter.bi_sector);
+	dio->write = bio_op(bio) == REQ_OP_WRITE;
+	dio->fua = dio->write && (bio->bi_opf & REQ_FUA);
+	if (unlikely(dio->fua)) {
+		/*
+		 * Don't pass down the FUA flag because we have to flush
+		 * disk cache anyway.
+		 */
+		bio->bi_opf &= ~REQ_FUA;
+	}
+	if (unlikely(dio->range.logical_sector + bio_sectors(bio) > ic->provided_data_sectors)) {
+		DMERR("Too big sector number: 0x%llx + 0x%x > 0x%llx", (unsigned long long)dio->range.logical_sector, bio_sectors(bio), (unsigned long long)ic->provided_data_sectors);
+		return -EIO;
+	}
+
+	get_area_and_offset(ic, dio->range.logical_sector, &area, &offset);
+	dio->metadata_block = get_metadata_sector_and_offset(ic, area, offset, &dio->metadata_offset);
+	bio->bi_iter.bi_sector = get_data_sector(ic, area, offset);
+
+	dm_integrity_map_continue(dio, true);
+	return DM_MAPIO_SUBMITTED;
+}
+
+static void dm_integrity_map_continue(struct dm_integrity_io *dio, bool from_map)
+{
+	struct dm_integrity_c *ic = dio->ic;
+	struct bio *bio = dm_bio_from_per_bio_data(dio, sizeof(struct dm_integrity_io));
+
+	unsigned journal_section, journal_entry;
+	unsigned journal_read_pos;
+
+	sector_t logical_sector;
+	unsigned n_sectors;
+
+	struct completion read_comp;
+
+	bool need_sync_io = ic->internal_hash && !dio->write;
+
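+	/*
+	 * Reads that are verified against an internal hash are processed
+	 * synchronously below; we must not wait in the map path, so offload
+	 * them to a workqueue first.
+	 */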
+	if (need_sync_io && from_map) {
+		INIT_WORK(&dio->work, integrity_bio_wait);
+		queue_work(ic->metadata_wq, &dio->work);
+		return;
+	}
+
+lock_retry:
+	spin_lock_irq(&ic->endio_wait.lock);
+retry:
+	if (unlikely(dm_integrity_failed(ic))) {
+		spin_unlock_irq(&ic->endio_wait.lock);
+		do_endio(ic, bio);
+		return;
+	}
+	dio->range.n_sectors = bio_sectors(bio);
+	journal_read_pos = NOT_FOUND;
+	if (likely(!ic->direct_writes)) {
+		if (dio->write) {
+			unsigned next_entry, i, pos;
+			unsigned ws, we;
+
+			dio->range.n_sectors = min(dio->range.n_sectors, ic->free_sectors);
+			if (unlikely(!dio->range.n_sectors))
+				goto sleep;
+			ic->free_sectors -= dio->range.n_sectors;
+			journal_section = ic->free_section;
+			journal_entry = ic->free_section_entry;
+
+			next_entry = ic->free_section_entry + dio->range.n_sectors;
+			ic->free_section_entry = next_entry % ic->journal_section_entries;
+			ic->free_section += next_entry / ic->journal_section_entries;
+			ic->n_uncommitted_sections += next_entry / ic->journal_section_entries;
+			wraparound_section(ic, &ic->free_section);
+
+			pos = journal_section * ic->journal_section_entries + journal_entry;
+			ws = journal_section;
+			we = journal_entry;
+			for (i = 0; i < dio->range.n_sectors; i++) {
+				struct journal_entry *je;
+
+				add_journal_node(ic, &ic->journal_tree[pos], dio->range.logical_sector + i);
+				pos++;
+				if (unlikely(pos >= ic->journal_entries))
+					pos = 0;
+
+				je = access_journal_entry(ic, ws, we);
+				BUG_ON(!journal_entry_is_unused(je));
+				journal_entry_set_inprogress(je);
+				we++;
+				if (unlikely(we == ic->journal_section_entries)) {
+					we = 0;
+					ws++;
+					wraparound_section(ic, &ws);
+				}
+			}
+
+			spin_unlock_irq(&ic->endio_wait.lock);
+			goto journal_read_write;
+		} else {
+			sector_t next_sector;
+			journal_read_pos = find_journal_node(ic, dio->range.logical_sector, &next_sector);
+			if (likely(journal_read_pos == NOT_FOUND)) {
+				if (unlikely(dio->range.n_sectors > next_sector - dio->range.logical_sector))
+					dio->range.n_sectors = next_sector - dio->range.logical_sector;
+			} else {
+				unsigned i;
+				for (i = 1; i < dio->range.n_sectors; i++) {
+					if (!test_journal_node(ic, journal_read_pos + i, dio->range.logical_sector + i))
+						break;
+				}
+				dio->range.n_sectors = i;
+			}
+		}
+	}
+	if (unlikely(!add_new_range(ic, &dio->range))) {
+		/*
+		 * We must not sleep in the request routine because it could
+		 * stall bios on current->bio_list.
+		 * So, we offload the bio to a workqueue if we have to sleep.
+		 */
+sleep:
+		if (from_map) {
+			spin_unlock_irq(&ic->endio_wait.lock);
+			INIT_WORK(&dio->work, integrity_bio_wait);
+			queue_work(ic->wait_wq, &dio->work);
+			return;
+		} else {
+			sleep_on_endio_wait(ic);
+			goto retry;
+		}
+	}
+	spin_unlock_irq(&ic->endio_wait.lock);
+
+	if (unlikely(journal_read_pos != NOT_FOUND)) {
+		journal_section = journal_read_pos / ic->journal_section_entries;
+		journal_entry = journal_read_pos % ic->journal_section_entries;
+		goto journal_read_write;
+	}
+
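+	/* one reference is put by integrity_end_io, the other by integrity_metadata */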
+	dio->in_flight = (atomic_t)ATOMIC_INIT(2);
+
+	if (need_sync_io) {
+		read_comp = COMPLETION_INITIALIZER_ONSTACK(read_comp);
+		dio->completion = &read_comp;
+	} else {
+		dio->completion = NULL;
+	}
+
+	dio->orig_bi_iter = bio->bi_iter;
+
+	dio->orig_bi_bdev = bio->bi_bdev;
+	bio->bi_bdev = ic->dev->bdev;
+
+	dio->orig_bi_integrity = bio_integrity(bio);
+	bio->bi_integrity = NULL;
+	bio->bi_opf &= ~REQ_INTEGRITY;
+
+	dio->orig_bi_end_io = bio->bi_end_io;
+	bio->bi_end_io = integrity_end_io;
+
+	bio->bi_iter.bi_size = dio->range.n_sectors << SECTOR_SHIFT;
+	bio->bi_iter.bi_sector += ic->start;
+	generic_make_request(bio);
+
+	if (need_sync_io) {
+		wait_for_completion_io(&read_comp);
+		integrity_metadata(&dio->work);
+	} else {
+		INIT_WORK(&dio->work, integrity_metadata);
+		queue_work(ic->metadata_wq, &dio->work);
+	}
+
+	return;
+
+journal_read_write:
+	logical_sector = dio->range.logical_sector;
+	n_sectors = dio->range.n_sectors;
+	do {
+		struct bio_vec bv = bio_iovec(bio);
+		char *mem;
+
+		if (unlikely(bv.bv_len >> SECTOR_SHIFT > n_sectors))
+			bv.bv_len = n_sectors << SECTOR_SHIFT;
+		n_sectors -= bv.bv_len >> SECTOR_SHIFT;
+		bio_advance_iter(bio, &bio->bi_iter, bv.bv_len);
+
+retry_kmap:
+		mem = kmap_atomic(bv.bv_page);
+		if (likely(dio->write))
+			flush_dcache_page(bv.bv_page);
+
+		do {
+			struct journal_entry *je = access_journal_entry(ic, journal_section, journal_entry);
+
+			if (unlikely(!dio->write)) {
+				struct journal_sector *js;
+				if (unlikely(journal_entry_is_inprogress(je))) {
+					flush_dcache_page(bv.bv_page);
+					kunmap_atomic(mem);
+
+					__io_wait_event(ic->copy_to_journal_wait, !journal_entry_is_inprogress(je));
+					goto retry_kmap;
+				}
+				smp_rmb();
+				BUG_ON(journal_entry_get_sector(je) != logical_sector);
+				js = access_journal_data(ic, journal_section, journal_entry);
+				memcpy(mem + bv.bv_offset, js, JOURNAL_SECTOR_DATA);
+				memcpy(mem + bv.bv_offset + JOURNAL_SECTOR_DATA, &je->last_bytes, sizeof je->last_bytes);
+#ifdef INTERNAL_VERIFY
+				if (ic->internal_hash) {
+					char checksums_onstack[max(crypto_shash_digestsize(ic->internal_hash), ic->tag_size)];
+					integrity_sector_checksum(ic, logical_sector, mem + bv.bv_offset, checksums_onstack);
+					if (unlikely(memcmp(checksums_onstack, je->tag, ic->tag_size))) {
+						DMERR("Checksum failed when reading from journal, at sector 0x%llx", (unsigned long long)logical_sector);
+					}
+				}
+#endif
+			}
+
+			if (!ic->internal_hash) {
+				struct bio_integrity_payload *bip = bio_integrity(bio);
+				unsigned tag_todo = ic->tag_size;
+				char *tag_ptr = je->tag;
+				if (bip) {
+					do {
+						struct bio_vec biv = bvec_iter_bvec(bip->bip_vec, bip->bip_iter);
+						unsigned tag_now = min(biv.bv_len, tag_todo);
+						char *tag_addr;
+
+						BUG_ON(PageHighMem(biv.bv_page));
+						tag_addr = lowmem_page_address(biv.bv_page) + biv.bv_offset;
+						if (likely(dio->write))
+							memcpy(tag_ptr, tag_addr, tag_now);
+						else
+							memcpy(tag_addr, tag_ptr, tag_now);
+						bvec_iter_advance(bip->bip_vec, &bip->bip_iter, tag_now);
+						tag_ptr += tag_now;
+						tag_todo -= tag_now;
+					} while (unlikely(tag_todo));
+				} else if (likely(dio->write)) {
+					memset(tag_ptr, 0, tag_todo);
+				}
+			}
+
+			if (likely(dio->write)) {
+				struct journal_sector *js;
+				js = access_journal_data(ic, journal_section, journal_entry);
+				memcpy(js, mem + bv.bv_offset, 1 << SECTOR_SHIFT);
+				je->last_bytes = js->commit_id;
+
+				if (ic->internal_hash) {
+					unsigned digest_size = crypto_shash_digestsize(ic->internal_hash);
+					if (unlikely(digest_size > ic->tag_size)) {
+						char checksums_onstack[digest_size];
+						integrity_sector_checksum(ic, logical_sector, (char *)js, checksums_onstack);
+						memcpy(je->tag, checksums_onstack, ic->tag_size);
+					} else {
+						integrity_sector_checksum(ic, logical_sector, (char *)js, je->tag);
+					}
+				}
+
+				journal_entry_set_sector(je, logical_sector);
+			}
+			logical_sector++;
+
+			journal_entry++;
+			if (unlikely(journal_entry == ic->journal_section_entries)) {
+				journal_entry = 0;
+				journal_section++;
+				wraparound_section(ic, &journal_section);
+			}
+
+			bv.bv_offset += 1 << SECTOR_SHIFT;
+		} while (bv.bv_len -= 1 << SECTOR_SHIFT);
+
+		if (unlikely(!dio->write))
+			flush_dcache_page(bv.bv_page);
+		kunmap_atomic(mem);
+	} while (n_sectors);
+
+	if (likely(dio->write)) {
+		smp_mb();
+		if (unlikely(waitqueue_active(&ic->copy_to_journal_wait)))
+			wake_up(&ic->copy_to_journal_wait);
+		if (ACCESS_ONCE(ic->free_sectors) <= ic->free_sectors_threshold) {
+			queue_work(ic->commit_wq, &ic->commit_work);
+		} else {
+			schedule_autocommit(ic);
+		}
+	} else {
+		remove_range(ic, &dio->range);
+	}
+
+	if (unlikely(bio->bi_iter.bi_size)) {
+		sector_t area, offset;
+		dio->range.logical_sector = logical_sector;
+		get_area_and_offset(ic, dio->range.logical_sector, &area, &offset);
+		dio->metadata_block = get_metadata_sector_and_offset(ic, area, offset, &dio->metadata_offset);
+		goto lock_retry;
+	}
+
+	do_endio_flush(ic, dio);
+}
+
+static void integrity_bio_wait(struct work_struct *w)
+{
+	struct dm_integrity_io *dio = container_of(w, struct dm_integrity_io, work);
+
+	dm_integrity_map_continue(dio, false);
+}
+
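+/*
+ * Advance the free position to the next section boundary so that a
+ * partially filled section is included in the commit.  The skipped
+ * entries are charged against free_sectors and are reclaimed once the
+ * section has been written back by the journal writer.
+ */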
+static void pad_uncommitted(struct dm_integrity_c *ic)
+{
+	if (ic->free_section_entry) {
+		ic->free_sectors -= ic->journal_section_entries - ic->free_section_entry;
+		ic->free_section_entry = 0;
+		ic->free_section++;
+		wraparound_section(ic, &ic->free_section);
+		ic->n_uncommitted_sections++;
+	}
+}
+
+static void integrity_commit(struct work_struct *w)
+{
+	struct dm_integrity_c *ic = container_of(w, struct dm_integrity_c, commit_work);
+	unsigned commit_start, commit_sections;
+	unsigned i, j, n;
+	struct bio *flushes;
+
+	del_timer(&ic->autocommit_timer);
+
+	spin_lock_irq(&ic->endio_wait.lock);
+	flushes = bio_list_get(&ic->flush_bio_list);
+	if (unlikely(ic->direct_writes)) {
+		spin_unlock_irq(&ic->endio_wait.lock);
+		dm_integrity_flush_buffers(ic);
+		goto release_flush_bios;
+	}
+
+	pad_uncommitted(ic);
+	commit_start = ic->uncommitted_section;
+	commit_sections = ic->n_uncommitted_sections;
+	spin_unlock_irq(&ic->endio_wait.lock);
+
+	if (!commit_sections)
+		goto release_flush_bios;
+
+	i = commit_start;
+	for (n = 0; n < commit_sections; n++) {
+		for (j = 0; j < ic->journal_section_entries; j++) {
+			struct journal_entry *je;
+			je = access_journal_entry(ic, i, j);
+			io_wait_event(ic->copy_to_journal_wait, !journal_entry_is_inprogress(je));
+		}
+		for (j = 0; j < ic->journal_section_sectors; j++) {
+			struct journal_sector *js;
+			js = access_journal(ic, i, j);
+			js->commit_id = dm_integrity_commit_id(ic, i, j, ic->commit_seq);
+		}
+		i++;
+		if (unlikely(i >= ic->journal_sections))
+			ic->commit_seq = next_commit_seq(ic->commit_seq);
+		wraparound_section(ic, &i);
+	}
+	smp_rmb();
+
+	write_journal(ic, commit_start, commit_sections);
+
+	spin_lock_irq(&ic->endio_wait.lock);
+	ic->uncommitted_section += commit_sections;
+	wraparound_section(ic, &ic->uncommitted_section);
+	ic->n_uncommitted_sections -= commit_sections;
+	ic->n_committed_sections += commit_sections;
+	spin_unlock_irq(&ic->endio_wait.lock);
+
+	if (ACCESS_ONCE(ic->free_sectors) <= ic->free_sectors_threshold)
+		queue_work(ic->writer_wq, &ic->writer_work);
+
+release_flush_bios:
+	while (flushes) {
+		struct bio *next = flushes->bi_next;
+		flushes->bi_next = NULL;
+		do_endio(ic, flushes);
+		flushes = next;
+	}
+}
+
+static void complete_copy_from_journal(unsigned long error, void *context)
+{
+	struct journal_io *io = context;
+	struct journal_completion *comp = io->comp;
+	struct dm_integrity_c *ic = comp->ic;
+	remove_range(ic, &io->range);
+	mempool_free(io, ic->journal_io_mempool);
+	if (unlikely(error != 0))
+		dm_integrity_io_error(ic, "copying from journal", -EIO);
+	complete_journal_op(comp);
+}
+
+static void do_journal_write(struct dm_integrity_c *ic, unsigned write_start, unsigned write_sections, bool from_replay)
+{
+	unsigned i, j, n;
+	struct journal_completion comp;
+
+	comp.ic = ic;
+	comp.in_flight = (atomic_t)ATOMIC_INIT(1);
+	comp.comp = COMPLETION_INITIALIZER_ONSTACK(comp.comp);
+
+	i = write_start;
+	for (n = 0; n < write_sections; n++, i++, wraparound_section(ic, &i)) {
+#ifndef INTERNAL_VERIFY
+		if (unlikely(from_replay))
+#endif
+			rw_section_mac(ic, i, false);
+		for (j = 0; j < ic->journal_section_entries; j++) {
+			struct journal_entry *je = access_journal_entry(ic, i, j);
+			sector_t sec, area, offset;
+			unsigned k, l, next_loop;
+
+			sector_t metadata_block;
+			unsigned metadata_offset;
+
+			struct journal_io *io;
+
+			if (journal_entry_is_unused(je))
+				continue;
+			BUG_ON(unlikely(journal_entry_is_inprogress(je)) && !from_replay);
+			sec = journal_entry_get_sector(je);
+			get_area_and_offset(ic, sec, &area, &offset);
+			access_journal_data(ic, i, j)->commit_id = je->last_bytes;
+			for (k = j + 1; k < ic->journal_section_entries; k++) {
+				struct journal_entry *je2 = access_journal_entry(ic, i, k);
+				sector_t sec2, area2, offset2;
+				if (journal_entry_is_unused(je2))
+					break;
+				BUG_ON(unlikely(journal_entry_is_inprogress(je2)) && !from_replay);
+				sec2 = journal_entry_get_sector(je2);
+				get_area_and_offset(ic, sec2, &area2, &offset2);
+				if (area2 != area || offset2 != offset + (k - j))
+					break;
+				access_journal_data(ic, i, k)->commit_id = je2->last_bytes;
+			}
+			next_loop = k - 1;
+
+			io = mempool_alloc(ic->journal_io_mempool, GFP_NOIO);
+			io->comp = &comp;
+			io->range.logical_sector = sec;
+			io->range.n_sectors = k - j;
+
+			spin_lock_irq(&ic->endio_wait.lock);
+			while (unlikely(!add_new_range(ic, &io->range)))
+				sleep_on_endio_wait(ic);
+
+			if (likely(!from_replay)) {
+				struct journal_node *section_node = &ic->journal_tree[i * ic->journal_section_entries];
+
+				/* don't write if there is newer committed sector */
+				while (j < k && find_newer_committed_node(ic, &section_node[j])) {
+					struct journal_entry *je2 = access_journal_entry(ic, i, j);
+					journal_entry_set_unused(je2);
+					remove_journal_node(ic, &section_node[j]);
+					j++;
+					sec++;
+					offset++;
+				}
+				while (j < k && find_newer_committed_node(ic, &section_node[k - 1])) {
+					struct journal_entry *je2 = access_journal_entry(ic, i, k - 1);
+					journal_entry_set_unused(je2);
+					remove_journal_node(ic, &section_node[k - 1]);
+					k--;
+				}
+				if (j == k) {
+					remove_range_unlocked(ic, &io->range);
+					spin_unlock_irq(&ic->endio_wait.lock);
+					mempool_free(io, ic->journal_io_mempool);
+					goto skip_io;
+				}
+				for (l = j; l < k; l++) {
+					remove_journal_node(ic, &section_node[l]);
+				}
+			}
+			spin_unlock_irq(&ic->endio_wait.lock);
+
+			metadata_block = get_metadata_sector_and_offset(ic, area, offset, &metadata_offset);
+			for (l = j; l < k; l++) {
+				int r;
+				struct journal_entry *je2 = access_journal_entry(ic, i, l);
+
+				if (
+#ifndef INTERNAL_VERIFY
+				    unlikely(from_replay) &&
+#endif
+				    ic->internal_hash) {
+					unsigned char test_tag[ic->tag_size];
+					integrity_sector_checksum(ic, sec + (l - j), (char *)access_journal_data(ic, i, l), test_tag);
+					if (unlikely(memcmp(test_tag, je2->tag, ic->tag_size)))
+						dm_integrity_io_error(ic, "tag mismatch when replaying journal", -EIO);
+				}
+
+				journal_entry_set_unused(je2);
+				r = dm_integrity_rw_tag(ic, je2->tag, &metadata_block, &metadata_offset, ic->tag_size, TAG_WRITE);
+				if (unlikely(r)) {
+					dm_integrity_io_error(ic, "writing tags", r);
+				}
+			}
+
+			atomic_inc(&comp.in_flight);
+			copy_from_journal(ic, i, j, k - j, get_data_sector(ic, area, offset), complete_copy_from_journal, io);
+
+skip_io:
+			j = next_loop;
+		}
+	}
+
+	dm_bufio_write_dirty_buffers_async(ic->bufio);
+
+	complete_journal_op(&comp);
+	wait_for_completion_io(&comp.comp);
+
+	dm_integrity_flush_buffers(ic);
+}
+
+static void integrity_writer(struct work_struct *w)
+{
+	struct dm_integrity_c *ic = container_of(w, struct dm_integrity_c, writer_work);
+	unsigned write_start, write_sections;
+
+	unsigned prev_free_sectors;
+
+	/* this test is not strictly needed, but it exercises the journal replay code on resume */
+	if (ACCESS_ONCE(ic->suspending))
+		return;
+
+	spin_lock_irq(&ic->endio_wait.lock);
+	write_start = ic->committed_section;
+	write_sections = ic->n_committed_sections;
+	spin_unlock_irq(&ic->endio_wait.lock);
+
+	if (!write_sections)
+		return;
+
+	do_journal_write(ic, write_start, write_sections, false);
+
+	spin_lock_irq(&ic->endio_wait.lock);
+
+	ic->committed_section += write_sections;
+	wraparound_section(ic, &ic->committed_section);
+	ic->n_committed_sections -= write_sections;
+
+	prev_free_sectors = ic->free_sectors;
+	ic->free_sectors += write_sections * ic->journal_section_entries;
+	if (unlikely(!prev_free_sectors))
+		wake_up_locked(&ic->endio_wait);
+
+	spin_unlock_irq(&ic->endio_wait.lock);
+}
+
+static void init_journal(struct dm_integrity_c *ic, unsigned start_section, unsigned n_sections, unsigned char commit_seq)
+{
+	unsigned i, j, n;
+
+	if (!n_sections)
+		return;
+
+	for (n = 0; n < n_sections; n++) {
+		i = start_section + n;
+		wraparound_section(ic, &i);
+		for (j = 0; j < ic->journal_section_sectors; j++) {
+			struct journal_sector *js = access_journal(ic, i, j);
+			memset(&js->entries, 0, JOURNAL_SECTOR_DATA);
+			js->commit_id = dm_integrity_commit_id(ic, i, j, commit_seq);
+		}
+		for (j = 0; j < ic->journal_section_entries; j++) {
+			struct journal_entry *je = access_journal_entry(ic, i, j);
+			journal_entry_set_unused(je);
+		}
+	}
+
+	write_journal(ic, start_section, n_sections);
+}
+
+static int find_commit_seq(struct dm_integrity_c *ic, unsigned i, unsigned j, commit_id_t id)
+{
+	unsigned char k;
+	for (k = 0; k < N_COMMIT_IDS; k++) {
+		if (dm_integrity_commit_id(ic, i, j, k) == id)
+			return k;
+	}
+	dm_integrity_io_error(ic, "journal commit id", -EIO);
+	return -EIO;
+}
+
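+/*
+ * Scan the journal on resume: find the newest fully committed sequence
+ * from the per-sector commit ids, replay the committed sections to their
+ * final location and rebuild the in-memory state.  If the journal is
+ * inconsistent (e.g. after a crash while it was being written), only the
+ * consistent prefix is replayed; on unrecoverable errors the journal is
+ * reinitialized.
+ */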
+static void replay_journal(struct dm_integrity_c *ic)
+{
+	unsigned i, j;
+	bool used_commit_ids[N_COMMIT_IDS];
+	unsigned max_commit_id_sections[N_COMMIT_IDS];
+	unsigned write_start, write_sections;
+	unsigned continue_section;
+	bool journal_empty;
+	unsigned char unused, last_used, want_commit_seq;
+
+	if (ic->journal_uptodate)
+		return;
+
+	last_used = 0;
+	write_start = 0;
+
+	if (!ic->just_formatted) {
+		DEBUG_print("reading journal\n");
+		rw_journal(ic, REQ_OP_READ, 0, 0, ic->journal_sections, NULL);
+		if (ic->journal_io) {
+			struct journal_completion crypt_comp;
+
+			DEBUG_bytes(lowmem_page_address(ic->journal_io[0].page), 64, "read journal");
+			crypt_comp.ic = ic;
+			crypt_comp.comp = COMPLETION_INITIALIZER_ONSTACK(crypt_comp.comp);
+			crypt_comp.in_flight = (atomic_t)ATOMIC_INIT(0);
+			encrypt_journal(ic, false, 0, ic->journal_sections, &crypt_comp);
+			wait_for_completion(&crypt_comp.comp);
+		}
+		DEBUG_bytes(lowmem_page_address(ic->journal[0].page), 64, "decrypted journal");
+	}
+
+	if (dm_integrity_failed(ic))
+		goto clear_journal;
+
+	journal_empty = true;
+	memset(used_commit_ids, 0, sizeof used_commit_ids);
+	memset(max_commit_id_sections, 0, sizeof max_commit_id_sections);
+	for (i = 0; i < ic->journal_sections; i++) {
+		for (j = 0; j < ic->journal_section_sectors; j++) {
+			int k;
+			struct journal_sector *js = access_journal(ic, i, j);
+			k = find_commit_seq(ic, i, j, js->commit_id);
+			if (k < 0)
+				goto clear_journal;
+			used_commit_ids[k] = true;
+			max_commit_id_sections[k] = i;
+		}
+		if (journal_empty) {
+			for (j = 0; j < ic->journal_section_entries; j++) {
+				struct journal_entry *je = access_journal_entry(ic, i, j);
+				if (!journal_entry_is_unused(je)) {
+					journal_empty = false;
+					break;
+				}
+			}
+		}
+	}
+
+	if (!used_commit_ids[N_COMMIT_IDS - 1]) {
+		unused = N_COMMIT_IDS - 1;
+		while (unused && !used_commit_ids[unused - 1])
+			unused--;
+	} else {
+		for (unused = 0; unused < N_COMMIT_IDS; unused++)
+			if (!used_commit_ids[unused])
+				break;
+		if (unused == N_COMMIT_IDS) {
+			dm_integrity_io_error(ic, "journal commit ids", -EIO);
+			goto clear_journal;
+		}
+	}
+	DEBUG_print("first unused commit seq %d [%d,%d,%d,%d]\n", unused, used_commit_ids[0], used_commit_ids[1], used_commit_ids[2], used_commit_ids[3]);
+
+	last_used = prev_commit_seq(unused);
+	want_commit_seq = prev_commit_seq(last_used);
+
+	if (!used_commit_ids[want_commit_seq] && used_commit_ids[prev_commit_seq(want_commit_seq)])
+		journal_empty = true;
+
+	write_start = max_commit_id_sections[last_used] + 1;
+	if (unlikely(write_start >= ic->journal_sections))
+		want_commit_seq = next_commit_seq(want_commit_seq);
+	wraparound_section(ic, &write_start);
+
+	i = write_start;
+	for (write_sections = 0; write_sections < ic->journal_sections; write_sections++) {
+		for (j = 0; j < ic->journal_section_sectors; j++) {
+			struct journal_sector *js = access_journal(ic, i, j);
+			if (js->commit_id != dm_integrity_commit_id(ic, i, j, want_commit_seq)) {
+				/*
+				 * This could be caused by crash during writing.
+				 * We won't replay the inconsistent part of the
+				 * journal.
+				 */
+				DEBUG_print("commit id mismatch at position (%u, %u): %d != %d\n", i, j, find_commit_seq(ic, i, j, js->commit_id), want_commit_seq);
+				goto brk;
+			}
+		}
+		i++;
+		if (unlikely(i >= ic->journal_sections))
+			want_commit_seq = next_commit_seq(want_commit_seq);
+		wraparound_section(ic, &i);
+	}
+brk:
+
+	if (!journal_empty) {
+		DEBUG_print("replaying %u sections, starting at %u, commit seq %d\n", write_sections, write_start, want_commit_seq);
+		do_journal_write(ic, write_start, write_sections, true);
+	}
+
+	if (write_sections == ic->journal_sections && (!ic->direct_writes || journal_empty)) {
+		continue_section = write_start;
+		ic->commit_seq = want_commit_seq;
+		DEBUG_print("continuing from section %u, commit seq %d\n", write_start, ic->commit_seq);
+	} else {
+		unsigned s;
+		unsigned char erase_seq;
+clear_journal:
+		DEBUG_print("clearing journal\n");
+
+		erase_seq = prev_commit_seq(prev_commit_seq(last_used));
+		s = write_start;
+		init_journal(ic, s, 1, erase_seq);
+		s++;
+		wraparound_section(ic, &s);
+		if (ic->journal_sections >= 2) {
+			init_journal(ic, s, ic->journal_sections - 2, erase_seq);
+			s += ic->journal_sections - 2;
+			wraparound_section(ic, &s);
+			init_journal(ic, s, 1, erase_seq);
+		}
+
+		continue_section = 0;
+		ic->commit_seq = next_commit_seq(erase_seq);
+	}
+
+	ic->committed_section = continue_section;
+	ic->n_committed_sections = 0;
+
+	ic->uncommitted_section = continue_section;
+	ic->n_uncommitted_sections = 0;
+
+	ic->free_section = continue_section;
+	ic->free_section_entry = 0;
+	ic->free_sectors = ic->journal_entries;
+
+	ic->journal_tree_root = RB_ROOT;
+	for (i = 0; i < ic->journal_entries; i++)
+		init_journal_node(&ic->journal_tree[i]);
+}
+
+static void dm_integrity_postsuspend(struct dm_target *ti)
+{
+	struct dm_integrity_c *ic = (struct dm_integrity_c *)ti->private;
+
+	del_timer_sync(&ic->autocommit_timer);
+
+	ic->suspending = true;
+
+	queue_work(ic->commit_wq, &ic->commit_work);
+	drain_workqueue(ic->commit_wq);
+
+	if (!ic->direct_writes) {
+		drain_workqueue(ic->writer_wq);
+		dm_integrity_flush_buffers(ic);
+	}
+
+	ic->suspending = false;
+
+	BUG_ON(!RB_EMPTY_ROOT(&ic->in_progress));
+
+	ic->journal_uptodate = true;
+}
+
+static void dm_integrity_resume(struct dm_target *ti)
+{
+	struct dm_integrity_c *ic = (struct dm_integrity_c *)ti->private;
+
+	replay_journal(ic);
+}
+
+static void dm_integrity_status(struct dm_target *ti, status_type_t type,
+				unsigned status_flags, char *result, unsigned maxlen)
+{
+	struct dm_integrity_c *ic = (struct dm_integrity_c *)ti->private;
+	unsigned arg_count;
+	size_t sz = 0;
+
+	switch (type) {
+	case STATUSTYPE_INFO:
+		result[0] = '\0';
+		break;
+
+	case STATUSTYPE_TABLE: {
+		__u64 watermark_percentage = (__u64)(ic->journal_entries - ic->free_sectors_threshold) * 100;
+		watermark_percentage += ic->journal_entries / 2;
+		do_div(watermark_percentage, ic->journal_entries);
+		arg_count = 5;
+		arg_count += !!ic->internal_hash_alg.alg_string;
+		arg_count += !!ic->journal_crypt_alg.alg_string;
+		arg_count += !!ic->journal_mac_alg.alg_string;
+		DMEMIT("%s %llu %u %c %u", ic->dev->name, (unsigned long long)ic->start, ic->tag_size, ic->direct_writes ? 'D' : 'J', arg_count);
+		DMEMIT(" journal-sectors:%u", ic->initial_sectors - SB_SECTORS);
+		DMEMIT(" interleave-sectors:%u", 1U << ic->sb->log2_interleave_sectors);
+		DMEMIT(" buffer-sectors:%u", 1U << ic->log2_buffer_sectors);
+		DMEMIT(" journal-watermark:%u", (unsigned)watermark_percentage);
+		DMEMIT(" commit-time:%u", ic->autocommit_msec);
+#define EMIT_ALG(a, n)							\
+		do {							\
+			if (ic->a.alg_string) {				\
+				DMEMIT(" %s:%s", n, ic->a.alg_string);	\
+				if (ic->a.key_string)			\
+					DMEMIT(":%s", ic->a.key_string);\
+			}						\
+		} while (0)
+		EMIT_ALG(internal_hash_alg, "internal-hash");
+		EMIT_ALG(journal_crypt_alg, "journal-crypt");
+		EMIT_ALG(journal_mac_alg, "journal-mac");
+		break;
+	}
+	}
+}
+
+static int dm_integrity_iterate_devices(struct dm_target *ti,
+					iterate_devices_callout_fn fn, void *data)
+{
+	struct dm_integrity_c *ic = ti->private;
+
+	return fn(ti, ic->dev, ic->start + ic->initial_sectors + ic->metadata_run, ti->len, data);
+}
+
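+/*
+ * Derive the journal geometry from the superblock: each section consists
+ * of JOURNAL_BLOCK_SECTORS sectors of entry descriptors (minus optional
+ * per-sector MAC space) plus one data sector per journal entry.
+ */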
+static void calculate_journal_section_size(struct dm_integrity_c *ic)
+{
+	unsigned sector_space;
+	ic->journal_sections = le32_to_cpu(ic->sb->journal_sections);
+	ic->journal_entry_size = roundup(offsetof(struct journal_entry, tag) + ic->tag_size, JOURNAL_ENTRY_ROUNDUP);
+	sector_space = JOURNAL_SECTOR_DATA;
+	if (ic->sb->flags & cpu_to_le32(SB_FLAG_HAVE_JOURNAL_MAC))
+		sector_space -= JOURNAL_MAC_PER_SECTOR;
+	ic->journal_entries_per_sector = sector_space / ic->journal_entry_size;
+	ic->journal_section_entries = ic->journal_entries_per_sector * JOURNAL_BLOCK_SECTORS;
+	ic->journal_section_sectors = ic->journal_section_entries + JOURNAL_BLOCK_SECTORS;
+	ic->journal_entries = ic->journal_section_entries * ic->journal_sections;
+}
+
+static int calculate_device_limits(struct dm_integrity_c *ic)
+{
+	__u64 initial_sectors;
+	sector_t last_sector, last_area, last_offset;
+
+	calculate_journal_section_size(ic);
+	initial_sectors = SB_SECTORS + (__u64)ic->journal_section_sectors * ic->journal_sections;
+	if (initial_sectors + METADATA_PADDING_SECTORS >= ic->device_sectors || initial_sectors > UINT_MAX)
+		return -EINVAL;
+	ic->initial_sectors = initial_sectors;
+
+	ic->metadata_run = roundup((__u64)ic->tag_size << ic->sb->log2_interleave_sectors, (__u64)(1 << SECTOR_SHIFT << METADATA_PADDING_SECTORS)) >> SECTOR_SHIFT;
+	if (!(ic->metadata_run & (ic->metadata_run - 1)))
+		ic->log2_metadata_run = __ffs(ic->metadata_run);
+	else
+		ic->log2_metadata_run = -1;
+
+	get_area_and_offset(ic, ic->provided_data_sectors - 1, &last_area, &last_offset);
+	last_sector = get_data_sector(ic, last_area, last_offset);
+
+	if (ic->start + last_sector < last_sector ||
+	    ic->start + last_sector >= ic->device_sectors) {
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static int initialize_superblock(struct dm_integrity_c *ic, unsigned journal_sectors, unsigned interleave_sectors)
+{
+	unsigned journal_sections;
+	int test_bit;
+
+	memcpy(ic->sb->magic, SB_MAGIC, 8);
+	ic->sb->version = SB_VERSION;
+	ic->sb->integrity_tag_size = cpu_to_le16(ic->tag_size);
+	if (ic->journal_mac_alg.alg_string)
+		ic->sb->flags |= cpu_to_le32(SB_FLAG_HAVE_JOURNAL_MAC);
+
+	calculate_journal_section_size(ic);
+	journal_sections = journal_sectors / ic->journal_section_sectors;
+	if (!journal_sections)
+		journal_sections = 1;
+	ic->sb->journal_sections = cpu_to_le32(journal_sections);
+
+	ic->sb->log2_interleave_sectors = __fls(interleave_sectors);
+	ic->sb->log2_interleave_sectors = max((__u8)MIN_INTERLEAVE_SECTORS, ic->sb->log2_interleave_sectors);
+	ic->sb->log2_interleave_sectors = min((__u8)MAX_INTERLEAVE_SECTORS, ic->sb->log2_interleave_sectors);
+
+	ic->provided_data_sectors = 0;
+	for (test_bit = fls64(ic->device_sectors) - 1; test_bit >= 3; test_bit--) {
+		__u64 prev_data_sectors = ic->provided_data_sectors;
+		ic->provided_data_sectors |= (sector_t)1 << test_bit;
+		if (calculate_device_limits(ic))
+			ic->provided_data_sectors = prev_data_sectors;
+	}
+
+	if (!ic->provided_data_sectors)
+		return -EINVAL;
+
+	ic->sb->provided_data_sectors = cpu_to_le64(ic->provided_data_sectors);
+
+	return 0;
+}
+
+static void dm_integrity_set(struct dm_target *ti, struct dm_integrity_c *ic)
+{
+	struct gendisk *disk = dm_disk(dm_table_get_md(ti->table));
+	struct blk_integrity bi;
+
+	bi.profile = &dm_integrity_profile;
+	bi.tuple_size = ic->tag_size * (queue_logical_block_size(disk->queue) >> SECTOR_SHIFT);
+	bi.tag_size = ic->tag_size;
+
+	blk_integrity_register(disk, &bi);
+	blk_queue_max_integrity_segments(disk->queue, UINT_MAX);
+}
+
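+/*
+ * Allocate with kmalloc where feasible (avoiding warnings and retries
+ * for larger sizes) and fall back to vmalloc.
+ */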
+static void *dm_integrity_kvmalloc(size_t size, gfp_t gfp)
+{
+	void *ptr = NULL;
+	if (size <= PAGE_SIZE)
+		ptr = kmalloc(size, GFP_KERNEL | gfp);
+	if (!ptr && size <= KMALLOC_MAX_SIZE)
+		ptr = kmalloc(size, GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY | gfp);
+	if (!ptr)
+		ptr = __vmalloc(size, GFP_KERNEL | gfp, PAGE_KERNEL);
+	return ptr;
+}
+
+static void dm_integrity_free_page_list(struct dm_integrity_c *ic, struct page_list *pl)
+{
+	unsigned i;
+	if (!pl)
+		return;
+	for (i = 0; i < ic->journal_pages; i++)
+		if (pl[i].page)
+			__free_page(pl[i].page);
+	kvfree(pl);
+}
+
+static struct page_list *dm_integrity_alloc_page_list(struct dm_integrity_c *ic)
+{
+	size_t page_list_desc_size = ic->journal_pages * sizeof(struct page_list);
+	struct page_list *pl;
+	unsigned i;
+
+	pl = dm_integrity_kvmalloc(page_list_desc_size, __GFP_ZERO);
+	if (!pl)
+		return NULL;
+
+	for (i = 0; i < ic->journal_pages; i++) {
+		pl[i].page = alloc_page(GFP_KERNEL);
+		if (!pl[i].page) {
+			dm_integrity_free_page_list(ic, pl);
+			return NULL;
+		}
+		if (i)
+			pl[i - 1].next = &pl[i];
+	}
+
+	return pl;
+}
+
+static void dm_integrity_free_journal_scatterlist(struct dm_integrity_c *ic, struct scatterlist **sl)
+{
+	unsigned i;
+	for (i = 0; i < ic->journal_sections; i++)
+		kvfree(sl[i]);
+	kfree(sl);
+}
+
+static struct scatterlist **dm_integrity_alloc_journal_scatterlist(struct dm_integrity_c *ic, struct page_list *pl)
+{
+	struct scatterlist **sl;
+	unsigned i;
+
+	sl = dm_integrity_kvmalloc(ic->journal_sections * sizeof(struct scatterlist *), __GFP_ZERO);
+	if (!sl)
+		return NULL;
+
+	for (i = 0; i < ic->journal_sections; i++) {
+		struct scatterlist *s;
+		unsigned start_index, start_offset;
+		unsigned end_index, end_offset;
+		unsigned n_pages;
+		unsigned idx;
+
+		page_list_location(ic, i, 0, &start_index, &start_offset);
+		page_list_location(ic, i, ic->journal_section_sectors - 1, &end_index, &end_offset);
+
+		n_pages = (end_index - start_index + 1);
+
+		s = dm_integrity_kvmalloc(n_pages * sizeof(struct scatterlist), 0);
+		if (!s) {
+			dm_integrity_free_journal_scatterlist(ic, sl);
+			return NULL;
+		}
+
+		sg_init_table(s, n_pages);
+		for (idx = start_index; idx <= end_index; idx++) {
+			char *va = lowmem_page_address(pl[idx].page);
+			unsigned start = 0, end = PAGE_SIZE;
+			if (idx == start_index)
+				start = start_offset;
+			if (idx == end_index)
+				end = end_offset + (1 << SECTOR_SHIFT);
+			sg_set_buf(&s[idx - start_index], va + start, end - start);
+		}
+
+		sl[i] = s;
+	}
+
+	return sl;
+}
+
+static void free_alg(struct alg_spec *a)
+{
+	kzfree(a->alg_string);
+	kzfree(a->key);
+	memset(a, 0, sizeof *a);
+}
+
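+/*
+ * Parse an "option:algorithm[:key]" argument; the optional key is
+ * hex-encoded.  a->key_string points into a->alg_string, so freeing
+ * alg_string releases both strings.
+ */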
+static int get_alg_and_key(const char *arg, struct alg_spec *a, char **error, char *error_inval)
+{
+	char *k;
+
+	free_alg(a);
+
+	a->alg_string = kstrdup(strchr(arg, ':') + 1, GFP_KERNEL);
+	if (!a->alg_string)
+		goto nomem;
+
+	k = strchr(a->alg_string, ':');
+	if (k) {
+		unsigned i;
+
+		*k = 0;
+		a->key_string = k + 1;
+		if (strlen(a->key_string) & 1)
+			goto inval;
+
+		a->key_size = strlen(a->key_string) / 2;
+		a->key = kmalloc(a->key_size, GFP_KERNEL);
+		if (!a->key)
+			goto nomem;
+		for (i = 0; i < a->key_size; i++) {
+			char digit[3];
+			digit[0] = a->key_string[i * 2];
+			digit[1] = a->key_string[i * 2 + 1];
+			digit[2] = 0;
+			if (strspn(digit, "0123456789abcdefABCDEF") != 2)
+				goto inval;
+			if (kstrtou8(digit, 16, &a->key[i]))
+				goto inval;
+		}
+	}
+
+	return 0;
+
+inval:
+	*error = error_inval;
+	return -EINVAL;
+
+nomem:
+	*error = "Out of memory for an argument";
+	return -ENOMEM;
+}
+
+static int get_mac(struct crypto_shash **hash, struct alg_spec *a, char **error, char *error_alg, char *error_key)
+{
+	int r;
+
+	if (a->alg_string) {
+		*hash = crypto_alloc_shash(a->alg_string, 0, CRYPTO_ALG_ASYNC);
+		if (IS_ERR(*hash)) {
+			*error = error_alg;
+			r = PTR_ERR(*hash);
+			*hash = NULL;
+			return r;
+		}
+
+		if (a->key) {
+			r = crypto_shash_setkey(*hash, a->key, a->key_size);
+			if (r) {
+				*error = error_key;
+				return r;
+			}
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * Construct an integrity mapping: <dev_path> <offset> <tag_size> <mode>
+ *
+ * Arguments:
+ *	device
+ *	offset from the start of the device
+ *	tag size
+ *	mode: D - direct writes, J - journal writes
+ *	number of optional arguments
+ *	optional arguments:
+ *		journal-sectors
+ *		interleave-sectors
+ *		buffer-sectors
+ *		journal-watermark
+ *		commit-time
+ *		internal-hash
+ *		journal-crypt
+ *		journal-mac
+ */
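+/*
+ * Example table line (hypothetical; the device, length and target name
+ * "integrity" are illustrative and must match the actual setup):
+ *
+ *	dmsetup create int0 --table \
+ *	    "0 1000000 integrity /dev/sdb 0 - J 1 internal-hash:sha256"
+ *
+ * A tag size of "-" lets the target derive the tag size from the
+ * internal hash digest size.
+ */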
+static int dm_integrity_ctr(struct dm_target *ti, unsigned argc, char **argv)
+{
+	struct dm_integrity_c *ic;
+	char dummy;
+	int r;
+	unsigned i;
+	unsigned extra_args;
+	struct dm_arg_set as;
+	static struct dm_arg _args[] = {
+		{0, 8, "Invalid number of feature args"},
+	};
+	unsigned journal_sectors, interleave_sectors, buffer_sectors, journal_watermark, sync_msec;
+	bool should_write_sb;
+	__u64 journal_pages, journal_desc_size, journal_tree_size;
+	__u64 threshold;
+	unsigned long long start;
+
+#define DIRECT_ARGUMENTS	4
+
+	if (argc <= DIRECT_ARGUMENTS) {
+		ti->error = "Invalid argument count";
+		return -EINVAL;
+	}
+
+	ic = kzalloc(sizeof(struct dm_integrity_c), GFP_KERNEL);
+	if (!ic) {
+		ti->error = "Cannot allocate integrity context";
+		return -ENOMEM;
+	}
+	ti->private = ic;
+	ti->per_io_data_size = sizeof(struct dm_integrity_io);
+
+	ic->commit_ids[0] = cpu_to_le64(0x1111111111111111ULL);
+	ic->commit_ids[1] = cpu_to_le64(0x2222222222222222ULL);
+	ic->commit_ids[2] = cpu_to_le64(0x3333333333333333ULL);
+	ic->commit_ids[3] = cpu_to_le64(0x4444444444444444ULL);
+
+	ic->in_progress = RB_ROOT;
+	init_waitqueue_head(&ic->endio_wait);
+	bio_list_init(&ic->flush_bio_list);
+	init_waitqueue_head(&ic->copy_to_journal_wait);
+	init_completion(&ic->crypto_backoff);
+
+	r = dm_get_device(ti, argv[0], dm_table_get_mode(ti->table), &ic->dev);
+	if (r) {
+		ti->error = "Device lookup failed";
+		goto bad;
+	}
+
+	if (sscanf(argv[1], "%llu%c", &start, &dummy) != 1 || start != (sector_t)start) {
+		ti->error = "Invalid starting offset";
+		r = -EINVAL;
+		goto bad;
+	}
+	ic->start = start;
+
+	if (strcmp(argv[2], "-")) {
+		if (sscanf(argv[2], "%u%c", &ic->tag_size, &dummy) != 1 || !ic->tag_size) {
+			ti->error = "Invalid tag size";
+			r = -EINVAL;
+			goto bad;
+		}
+	}
+
+	if (!strcasecmp(argv[3], "J"))
+		ic->direct_writes = false;
+	else if (!strcasecmp(argv[3], "D"))
+		ic->direct_writes = true;
+	else {
+		ti->error = "Invalid mode (expecting J or D)";
+		r = -EINVAL;
+		goto bad;
+	}
+
+	ic->device_sectors = i_size_read(ic->dev->bdev->bd_inode) >> SECTOR_SHIFT;
+	journal_sectors = min((sector_t)DEFAULT_MAX_JOURNAL_SECTORS,
+			ic->device_sectors >> DEFAULT_JOURNAL_SIZE_FACTOR);
+	interleave_sectors = DEFAULT_INTERLEAVE_SECTORS;
+	buffer_sectors = DEFAULT_BUFFER_SECTORS;
+	journal_watermark = DEFAULT_JOURNAL_WATERMARK;
+	sync_msec = DEFAULT_SYNC_MSEC;
+
+	as.argc = argc - DIRECT_ARGUMENTS;
+	as.argv = argv + DIRECT_ARGUMENTS;
+	r = dm_read_arg_group(_args, &as, &extra_args, &ti->error);
+	if (r)
+		goto bad;
+
+	while (extra_args--) {
+		const char *opt_string;
+		unsigned val;
+		opt_string = dm_shift_arg(&as);
+		if (!opt_string) {
+			r = -EINVAL;
+			ti->error = "Not enough feature arguments";
+			goto bad;
+		}
+		if (sscanf(opt_string, "journal-sectors:%u%c", &val, &dummy) == 1)
+			journal_sectors = val;
+		else if (sscanf(opt_string, "interleave-sectors:%u%c", &val, &dummy) == 1)
+			interleave_sectors = val;
+		else if (sscanf(opt_string, "buffer-sectors:%u%c", &val, &dummy) == 1)
+			buffer_sectors = val;
+		else if (sscanf(opt_string, "journal-watermark:%u%c", &val, &dummy) == 1 && val <= 100)
+			journal_watermark = val;
+		else if (sscanf(opt_string, "commit-time:%u%c", &val, &dummy) == 1)
+			sync_msec = val;
+		else if (!memcmp(opt_string, "internal-hash:", strlen("internal-hash:"))) {
+			r = get_alg_and_key(opt_string, &ic->internal_hash_alg, &ti->error, "Invalid internal-hash argument");
+			if (r)
+				goto bad;
+		} else if (!memcmp(opt_string, "journal-crypt:", strlen("journal-crypt:"))) {
+			r = get_alg_and_key(opt_string, &ic->journal_crypt_alg, &ti->error, "Invalid journal-crypt argument");
+			if (r)
+				goto bad;
+		} else if (!memcmp(opt_string, "journal-mac:", strlen("journal-mac:"))) {
+			r = get_alg_and_key(opt_string, &ic->journal_mac_alg,  &ti->error, "Invalid journal-mac argument");
+			if (r)
+				goto bad;
+		} else {
+			r = -EINVAL;
+			ti->error = "Invalid argument";
+			goto bad;
+		}
+	}
+
+	r = get_mac(&ic->internal_hash, &ic->internal_hash_alg, &ti->error, "Invalid internal hash", "Error setting internal hash key");
+	if (r)
+		goto bad;
+
+	r = get_mac(&ic->journal_mac, &ic->journal_mac_alg, &ti->error, "Invalid journal mac", "Error setting journal mac key");
+	if (r)
+		goto bad;
+
+	if (!ic->tag_size) {
+		if (!ic->internal_hash) {
+			ti->error = "Unknown tag size";
+			r = -EINVAL;
+			goto bad;
+		}
+		ic->tag_size = crypto_shash_digestsize(ic->internal_hash);
+	}
+	if (ic->tag_size > MAX_TAG_SIZE) {
+		ti->error = "Too big tag size";
+		r = -EINVAL;
+		goto bad;
+	}
+	if (!(ic->tag_size & (ic->tag_size - 1)))
+		ic->log2_tag_size = __ffs(ic->tag_size);
+	else
+		ic->log2_tag_size = -1;
+
+	ic->autocommit_jiffies = msecs_to_jiffies(sync_msec);
+	ic->autocommit_msec = sync_msec;
+	setup_timer(&ic->autocommit_timer, autocommit_fn, (unsigned long)ic);
+
+	ic->io = dm_io_client_create();
+	if (IS_ERR(ic->io)) {
+		r = PTR_ERR(ic->io);
+		ic->io = NULL;
+		ti->error = "Cannot allocate dm io";
+		goto bad;
+	}
+
+	ic->journal_io_mempool = mempool_create_slab_pool(JOURNAL_IO_MEMPOOL, journal_io_cache);
+	if (!ic->journal_io_mempool) {
+		r = -ENOMEM;
+		ti->error = "Cannot allocate mempool";
+		goto bad;
+	}
+
+	ic->metadata_wq = alloc_workqueue("dm-integrity-metadata", WQ_MEM_RECLAIM, METADATA_WORKQUEUE_MAX_ACTIVE);
+	if (!ic->metadata_wq) {
+		ti->error = "Cannot allocate workqueue";
+		r = -ENOMEM;
+		goto bad;
+	}
+
+	/*
+	 * If this workqueue were percpu, it would cause bio reordering
+	 * and reduced performance.
+	 */
+	ic->wait_wq = alloc_workqueue("dm-integrity-wait", WQ_MEM_RECLAIM | WQ_UNBOUND, 1);
+	if (!ic->wait_wq) {
+		ti->error = "Cannot allocate workqueue";
+		r = -ENOMEM;
+		goto bad;
+	}
+
+	ic->commit_wq = alloc_workqueue("dm-integrity-commit", WQ_MEM_RECLAIM, 1);
+	if (!ic->commit_wq) {
+		ti->error = "Cannot allocate workqueue";
+		r = -ENOMEM;
+		goto bad;
+	}
+	INIT_WORK(&ic->commit_work, integrity_commit);
+
+	if (!ic->direct_writes) {
+		ic->writer_wq = alloc_workqueue("dm-integrity-writer", WQ_MEM_RECLAIM, 1);
+		if (!ic->writer_wq) {
+			ti->error = "Cannot allocate workqueue";
+			r = -ENOMEM;
+			goto bad;
+		}
+		INIT_WORK(&ic->writer_work, integrity_writer);
+	}
+
+	ic->sb = alloc_pages_exact(SB_SECTORS << SECTOR_SHIFT, GFP_KERNEL);
+	if (!ic->sb) {
+		r = -ENOMEM;
+		ti->error = "Cannot allocate superblock area";
+		goto bad;
+	}
+
+	r = sync_rw_sb(ic, REQ_OP_READ, 0);
+	if (r) {
+		ti->error = "Error reading superblock";
+		goto bad;
+	}
+	if (!memcmp(ic->sb->magic, SB_MAGIC, 8)) {
+		should_write_sb = false;
+	} else {
+		for (i = 0; i < 512; i += 8) {
+			if (*(__u64 *)((__u8 *)ic->sb + i)) {
+				r = -EINVAL;
+				ti->error = "The device is not initialized";
+				goto bad;
+			}
+		}
+
+		r = initialize_superblock(ic, journal_sectors, interleave_sectors);
+		if (r) {
+			ti->error = "Could not initialize superblock";
+			goto bad;
+		}
+		should_write_sb = true;
+	}
+
+	if (ic->sb->version != SB_VERSION) {
+		r = -EINVAL;
+		ti->error = "Unknown version";
+		goto bad;
+	}
+	if (le16_to_cpu(ic->sb->integrity_tag_size) != ic->tag_size) {
+		r = -EINVAL;
+		ti->error = "Invalid tag size";
+		goto bad;
+	}
+	/* make sure that ti->max_io_len doesn't overflow */
+	if (ic->sb->log2_interleave_sectors != -1 && (ic->sb->log2_interleave_sectors < MIN_INTERLEAVE_SECTORS || ic->sb->log2_interleave_sectors > MAX_INTERLEAVE_SECTORS)) {
+		r = -EINVAL;
+		ti->error = "Invalid interleave_sectors in the superblock";
+		goto bad;
+	}
+	ic->provided_data_sectors = le64_to_cpu(ic->sb->provided_data_sectors);
+	if (ic->provided_data_sectors != le64_to_cpu(ic->sb->provided_data_sectors)) {	/* test for overflow */
+		r = -EINVAL;
+		ti->error = "The superblock has 64-bit device size, but the kernel was compiled with 32-bit sectors";
+		goto bad;
+	}
+	if (!!(ic->sb->flags & cpu_to_le32(SB_FLAG_HAVE_JOURNAL_MAC)) != !!ic->journal_mac_alg.alg_string) {
+		r = -EINVAL;
+		ti->error = "Journal mac mismatch";
+		goto bad;
+	}
+	r = calculate_device_limits(ic);
+	if (r) {
+		ti->error = "The device is too small";
+		goto bad;
+	}
+
+	if (!buffer_sectors)
+		buffer_sectors = 1;
+	ic->log2_buffer_sectors = min3((int)__fls(buffer_sectors), (int)__ffs(ic->metadata_run), 31 - SECTOR_SHIFT);
+
+	threshold = (__u64)ic->journal_entries * (100 - journal_watermark);
+	threshold += 50;
+	do_div(threshold, 100);
+	ic->free_sectors_threshold = threshold;
+
+	DEBUG_print("initialized:\n");
+	DEBUG_print("	integrity_tag_size %u\n", le16_to_cpu(ic->sb->integrity_tag_size));
+	DEBUG_print("	journal_entry_size %u\n", ic->journal_entry_size);
+	DEBUG_print("	journal_entries_per_sector %u\n", ic->journal_entries_per_sector);
+	DEBUG_print("	journal_section_entries %u\n", ic->journal_section_entries);
+	DEBUG_print("	journal_section_sectors %u\n", ic->journal_section_sectors);
+	DEBUG_print("	journal_sections %u\n", (unsigned)le32_to_cpu(ic->sb->journal_sections));
+	DEBUG_print("	journal_entries %u\n", ic->journal_entries);
+	DEBUG_print("	log2_interleave_sectors %d\n", ic->sb->log2_interleave_sectors);
+	DEBUG_print("	device_sectors 0x%llx\n", (unsigned long long)ic->device_sectors);
+	DEBUG_print("	initial_sectors 0x%x\n", ic->initial_sectors);
+	DEBUG_print("	metadata_run 0x%x\n", ic->metadata_run);
+	DEBUG_print("	log2_metadata_run %d\n", ic->log2_metadata_run);
+	DEBUG_print("	provided_data_sectors 0x%llx (%llu)\n", (unsigned long long)ic->provided_data_sectors, (unsigned long long)ic->provided_data_sectors);
+	DEBUG_print("	log2_buffer_sectors %u\n", ic->log2_buffer_sectors);
+
+	ic->bufio = dm_bufio_client_create(ic->dev->bdev, 1U << (SECTOR_SHIFT + ic->log2_buffer_sectors), 1, 0, NULL, NULL);
+	if (IS_ERR(ic->bufio)) {
+		r = PTR_ERR(ic->bufio);
+		ti->error = "Cannot initialize dm-bufio";
+		ic->bufio = NULL;
+		goto bad;
+	}
+	dm_bufio_set_sector_offset(ic->bufio, ic->start + ic->initial_sectors);
+
+	journal_pages = roundup((__u64)ic->journal_sections * ic->journal_section_sectors, PAGE_SIZE >> SECTOR_SHIFT) >> (PAGE_SHIFT - SECTOR_SHIFT);
+	journal_desc_size = journal_pages * sizeof(struct page_list);
+	if (journal_pages >= totalram_pages - totalhigh_pages || journal_desc_size > ULONG_MAX) {
+		ti->error = "Journal doesn't fit into memory";
+		r = -ENOMEM;
+		goto bad;
+	}
+	ic->journal_pages = journal_pages;
+
+	ic->journal = dm_integrity_alloc_page_list(ic);
+	if (!ic->journal) {
+		ti->error = "Could not allocate memory for journal";
+		r = -ENOMEM;
+		goto bad;
+	}
+	if (ic->journal_crypt_alg.alg_string) {
+		unsigned ivsize, blocksize;
+		struct journal_completion comp;
+		comp.ic = ic;
+
+		ic->journal_crypt = crypto_alloc_skcipher(ic->journal_crypt_alg.alg_string, 0, 0);
+		if (IS_ERR(ic->journal_crypt)) {
+			ti->error = "Invalid journal cipher";
+			r = PTR_ERR(ic->journal_crypt);
+			ic->journal_crypt = NULL;
+			goto bad;
+		}
+		ivsize = crypto_skcipher_ivsize(ic->journal_crypt);
+		blocksize = crypto_skcipher_blocksize(ic->journal_crypt);
+
+		if (ic->journal_crypt_alg.key) {
+			r = crypto_skcipher_setkey(ic->journal_crypt, ic->journal_crypt_alg.key, ic->journal_crypt_alg.key_size);
+			if (r) {
+				ti->error = "Error setting encryption key";
+				goto bad;
+			}
+		}
+		DEBUG_print("cipher %s, block size %u iv size %u\n", ic->journal_crypt_alg.alg_string, blocksize, ivsize);
+
+		ic->journal_io = dm_integrity_alloc_page_list(ic);
+		if (!ic->journal_io) {
+			ti->error = "Could not allocate memory for journal io";
+			r = -ENOMEM;
+			goto bad;
+		}
+
+		if (blocksize == 1) {
+			struct scatterlist *sg;
+			SKCIPHER_REQUEST_ON_STACK(req, ic->journal_crypt);
+			unsigned char iv[ivsize];
+			skcipher_request_set_tfm(req, ic->journal_crypt);
+
+			ic->journal_xor = dm_integrity_alloc_page_list(ic);
+			if (!ic->journal_xor) {
+				ti->error = "Could not allocate memory for journal xor";
+				r = -ENOMEM;
+				goto bad;
+			}
+
+			sg = dm_integrity_kvmalloc((ic->journal_pages + 1) * sizeof(struct scatterlist), 0);
+			if (!sg) {
+				ti->error = "Unable to allocate sg list";
+				r = -ENOMEM;
+				goto bad;
+			}
+			sg_init_table(sg, ic->journal_pages + 1);
+			for (i = 0; i < ic->journal_pages; i++) {
+				char *va = lowmem_page_address(ic->journal_xor[i].page);
+				clear_page(va);
+				sg_set_buf(&sg[i], va, PAGE_SIZE);
+			}
+			sg_set_buf(&sg[i], &ic->commit_ids, sizeof ic->commit_ids);
+			memset(iv, 0x00, ivsize);
+
+			skcipher_request_set_crypt(req, sg, sg, PAGE_SIZE * ic->journal_pages + sizeof ic->commit_ids, iv);
+			comp.comp = COMPLETION_INITIALIZER_ONSTACK(comp.comp);
+			comp.in_flight = (atomic_t)ATOMIC_INIT(1);
+			if (do_crypt(true, req, &comp))
+				wait_for_completion(&comp.comp);
+			kvfree(sg);
+			if ((r = dm_integrity_failed(ic))) {
+				ti->error = "Unable to encrypt journal";
+				goto bad;
+			}
+			DEBUG_bytes(lowmem_page_address(ic->journal_xor[0].page), 64, "xor data");
+
+			crypto_free_skcipher(ic->journal_crypt);
+			ic->journal_crypt = NULL;
+		} else {
+			SKCIPHER_REQUEST_ON_STACK(req, ic->journal_crypt);
+			unsigned char iv[ivsize];
+			unsigned crypt_len = roundup(ivsize, blocksize);
+			unsigned char crypt_data[crypt_len];
+
+			skcipher_request_set_tfm(req, ic->journal_crypt);
+
+			ic->journal_scatterlist = dm_integrity_alloc_journal_scatterlist(ic, ic->journal);
+			if (!ic->journal_scatterlist) {
+				ti->error = "Unable to allocate sg list";
+				r = -ENOMEM;
+				goto bad;
+			}
+			ic->journal_io_scatterlist = dm_integrity_alloc_journal_scatterlist(ic, ic->journal_io);
+			if (!ic->journal_io_scatterlist) {
+				ti->error = "Unable to allocate sg list";
+				r = -ENOMEM;
+				goto bad;
+			}
+			ic->sk_requests = dm_integrity_kvmalloc(ic->journal_sections * sizeof(struct skcipher_request *), __GFP_ZERO);
+			if (!ic->sk_requests) {
+				ti->error = "Unable to allocate sk requests";
+				r = -ENOMEM;
+				goto bad;
+			}
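+			/*
+			 * Derive a per-section IV by encrypting the zero-padded
+			 * section number, and keep a preallocated skcipher
+			 * request for every journal section.
+			 */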
+			for (i = 0; i < ic->journal_sections; i++) {
+				struct scatterlist sg;
+				struct skcipher_request *section_req;
+				__le32 section_le = cpu_to_le32(i);
+
+				memset(iv, 0x00, ivsize);
+				memset(crypt_data, 0x00, crypt_len);
+				memcpy(crypt_data, &section_le, min((size_t)crypt_len, sizeof(section_le)));
+
+				sg_init_one(&sg, crypt_data, crypt_len);
+				skcipher_request_set_crypt(req, &sg, &sg, crypt_len, iv);
+				comp.comp = COMPLETION_INITIALIZER_ONSTACK(comp.comp);
+				comp.in_flight = (atomic_t)ATOMIC_INIT(1);
+				if (do_crypt(true, req, &comp))
+					wait_for_completion(&comp.comp);
+
+				if ((r = dm_integrity_failed(ic))) {
+					ti->error = "Unable to generate iv";
+					goto bad;
+				}
+
+				section_req = skcipher_request_alloc(ic->journal_crypt, GFP_KERNEL);
+				if (!section_req) {
+					ti->error = "Unable to allocate crypt request";
+					r = -ENOMEM;
+					goto bad;
+				}
+				section_req->iv = kmalloc(ivsize * 2, GFP_KERNEL);
+				if (!section_req->iv) {
+					skcipher_request_free(section_req);
+					ti->error = "Unable to allocate iv";
+					r = -ENOMEM;
+					goto bad;
+				}
+				memcpy(section_req->iv + ivsize, crypt_data, ivsize);
+				section_req->cryptlen = (size_t)ic->journal_section_sectors << SECTOR_SHIFT;
+				ic->sk_requests[i] = section_req;
+				DEBUG_bytes(crypt_data, ivsize, "iv(%u)", i);
+			}
+		}
+	}
+
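+	/*
+	 * Make sure all commit ids are unique; duplicate ids would make
+	 * journal section replay ambiguous.
+	 */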
+	for (i = 0; i < N_COMMIT_IDS; i++) {
+		unsigned j;
+retest_commit_id:
+		for (j = 0; j < i; j++) {
+			if (ic->commit_ids[j] == ic->commit_ids[i]) {
+				ic->commit_ids[i] = cpu_to_le64(le64_to_cpu(ic->commit_ids[i]) + 1);
+				goto retest_commit_id;
+			}
+		}
+		DEBUG_print("commit id %u: %016llx\n", i, ic->commit_ids[i]);
+	}
+
+	journal_tree_size = (__u64)ic->journal_entries * sizeof(struct journal_node);
+	if (journal_tree_size > ULONG_MAX) {
+		ti->error = "Journal doesn't fit into memory";
+		r = -ENOMEM;
+		goto bad;
+	}
+	ic->journal_tree = dm_integrity_kvmalloc(journal_tree_size, 0);
+	if (!ic->journal_tree) {
+		ti->error = "Could not allocate memory for journal tree";
+		r = -ENOMEM;
+		goto bad;
+	}
+
+	if (should_write_sb) {
+		init_journal(ic, 0, ic->journal_sections, 0);
+		r = dm_integrity_failed(ic);
+		if (unlikely(r)) {
+			ti->error = "Error initializing journal";
+			goto bad;
+		}
+		r = sync_rw_sb(ic, REQ_OP_WRITE, REQ_FUA);
+		if (r) {
+			ti->error = "Error initializing superblock";
+			goto bad;
+		}
+		ic->just_formatted = true;
+	}
+
+	r = dm_set_target_max_io_len(ti, 1U << ic->sb->log2_interleave_sectors);
+	if (r)
+		goto bad;
+#if 0
+	ti->split_discard_bios = true;
+#endif
+
+	if (!ic->internal_hash)
+		dm_integrity_set(ti, ic);
+
+	ti->num_flush_bios = 1;
+	ti->flush_supported = true;
+#if 0
+	ti->num_discard_bios = 1;
+#endif
+
+	return 0;
+
+bad:
+	dm_integrity_dtr(ti);
+	return r;
+}
+
+static void dm_integrity_dtr(struct dm_target *ti)
+{
+	struct dm_integrity_c *ic = ti->private;
+
+	BUG_ON(!RB_EMPTY_ROOT(&ic->in_progress));
+
+	if (ic->metadata_wq)
+		destroy_workqueue(ic->metadata_wq);
+	if (ic->wait_wq)
+		destroy_workqueue(ic->wait_wq);
+	if (ic->commit_wq)
+		destroy_workqueue(ic->commit_wq);
+	if (ic->writer_wq)
+		destroy_workqueue(ic->writer_wq);
+	if (ic->bufio)
+		dm_bufio_client_destroy(ic->bufio);
+	mempool_destroy(ic->journal_io_mempool);
+	if (ic->io)
+		dm_io_client_destroy(ic->io);
+	if (ic->dev)
+		dm_put_device(ti, ic->dev);
+	dm_integrity_free_page_list(ic, ic->journal);
+	dm_integrity_free_page_list(ic, ic->journal_io);
+	dm_integrity_free_page_list(ic, ic->journal_xor);
+	if (ic->journal_scatterlist)
+		dm_integrity_free_journal_scatterlist(ic, ic->journal_scatterlist);
+	if (ic->journal_io_scatterlist)
+		dm_integrity_free_journal_scatterlist(ic, ic->journal_io_scatterlist);
+	if (ic->sk_requests) {
+		unsigned i;
+		for (i = 0; i < ic->journal_sections; i++) {
+			struct skcipher_request *req = ic->sk_requests[i];
+			if (req) {
+				kzfree(req->iv);
+				skcipher_request_free(req);
+			}
+		}
+		kvfree(ic->sk_requests);
+	}
+	kvfree(ic->journal_tree);
+	if (ic->sb)
+		free_pages_exact(ic->sb, SB_SECTORS << SECTOR_SHIFT);
+
+	if (ic->internal_hash)
+		crypto_free_shash(ic->internal_hash);
+	free_alg(&ic->internal_hash_alg);
+
+	if (ic->journal_crypt)
+		crypto_free_skcipher(ic->journal_crypt);
+	free_alg(&ic->journal_crypt_alg);
+
+	if (ic->journal_mac)
+		crypto_free_shash(ic->journal_mac);
+	free_alg(&ic->journal_mac_alg);
+
+	kfree(ic);
+}
+
+static struct target_type integrity_target = {
+	.name			= "integrity",
+	.version		= {0, 0, 1},
+	.module			= THIS_MODULE,
+	.features		= DM_TARGET_SINGLETON | DM_TARGET_INTEGRITY,
+	.ctr			= dm_integrity_ctr,
+	.dtr			= dm_integrity_dtr,
+	.map			= dm_integrity_map,
+	.postsuspend		= dm_integrity_postsuspend,
+	.resume			= dm_integrity_resume,
+	.status			= dm_integrity_status,
+	.iterate_devices	= dm_integrity_iterate_devices,
+};
+
+int __init dm_integrity_init(void)
+{
+	int r;
+
+	journal_io_cache = kmem_cache_create("integrity_journal_io", sizeof(struct journal_io), 0, 0, NULL);
+	if (!journal_io_cache) {
+		DMERR("can't allocate journal io cache");
+		return -ENOMEM;
+	}
+
+	r = dm_register_target(&integrity_target);
+
+	if (r < 0)
+		DMERR("register failed %d", r);
+
+	return r;
+}
+
+void dm_integrity_exit(void)
+{
+	dm_unregister_target(&integrity_target);
+	kmem_cache_destroy(journal_io_cache);
+}
+
+module_init(dm_integrity_init);
+module_exit(dm_integrity_exit);
+
+MODULE_AUTHOR("Milan Broz");
+MODULE_AUTHOR("Mikulas Patocka");
+MODULE_DESCRIPTION(DM_NAME " target for integrity tags extension");
+MODULE_LICENSE("GPL");
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH 4/4] Add cryptographic data integrity protection (authenticated encryption) to dm-crypt.
  2017-01-04 19:23 [RFC PATCH 0/4] Data integrity protection with dm-integrity and dm-crypt Milan Broz
                   ` (2 preceding siblings ...)
  2017-01-04 19:23 ` [RFC PATCH 3/4] Add the dm-integrity target Milan Broz
@ 2017-01-04 19:23 ` Milan Broz
  2017-03-16 14:39 ` [PATCH 0/7] Data integrity protection with dm-integrity and dm-crypt Milan Broz
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Milan Broz @ 2017-01-04 19:23 UTC (permalink / raw)
  To: dm-devel; +Cc: Vashek Matyas, Ondrej Mosnacek, Milan Broz

This patch allows dm-crypt to use per-sector metadata provided by the dm-integrity
module for integrity protection and for a persistently stored per-sector
Initialization Vector (IV). The underlying device must support the
"DM-DIF-EXT-TAG" dm-integrity profile.

The per-bio integrity metadata are allocated by dm-crypt for every bio.

Example of low-level mapping table for various types of use:
 DEV=/dev/sdb
 SIZE=417792

 # Additional HMAC with CBC-ESSIV, key is concatenated encryption key + HMAC key
 SIZE_INT=389952
 dmsetup create x --table "0 $SIZE_INT integrity $DEV 0 32 J 0"
 dmsetup create y --table "0 $SIZE_INT crypt aes-cbc-essiv:sha256 \
 11ff33c6fb942655efb3e30cf4c0fd95f5ef483afca72166c530ae26151dd83b \
 00112233445566778899aabbccddeeff00112233445566778899aabbccddeeff \
 0 /dev/mapper/x 0 1 integrity:32:hmac(sha256)"

 # AEAD (Authenticated Encryption with Additional Data) - GCM with random IVs
 # GCM in kernel uses a 96-bit IV and we store a 128-bit auth tag (so 28 bytes of metadata space)
 SIZE_INT=393024
 dmsetup create x --table "0 $SIZE_INT integrity $DEV 0 28 J 0"
 dmsetup create y --table "0 $SIZE_INT crypt aes-gcm-random \
 11ff33c6fb942655efb3e30cf4c0fd95f5ef483afca72166c530ae26151dd83b \
 0 /dev/mapper/x 0 1 integrity:28:aead"

 # Random IV only for XTS mode (no integrity protection but provides atomic random sector change)
 SIZE_INT=401272
 dmsetup create x --table "0 $SIZE_INT integrity $DEV 0 16 J 0"
 dmsetup create y --table "0 $SIZE_INT crypt aes-xts-random \
 11ff33c6fb942655efb3e30cf4c0fd95f5ef483afca72166c530ae26151dd83b \
 0 /dev/mapper/x 0 1 integrity:16:none"

 # Random IV with XTS + HMAC integrity protection
 SIZE_INT=377656
 dmsetup create x --table "0 $SIZE_INT integrity $DEV 0 48 J 0"
 dmsetup create y --table "0 $SIZE_INT crypt aes-xts-random \
 11ff33c6fb942655efb3e30cf4c0fd95f5ef483afca72166c530ae26151dd83b \
 00112233445566778899aabbccddeeff00112233445566778899aabbccddeeff \
 0 /dev/mapper/x 0 1 integrity:48:hmac(sha256)"

Both AEAD and HMAC protection authenticates not only data but also sector metadata.

HMAC protection is implemented through the authenc wrapper (so it is processed
the same way as an authenticated mode).

In HMAC mode there are two keys (concatenated in dm-crypt mapping table).
First is the encryption key and the second is the key for authentication (HMAC).
(It is userspace decision if these keys are independent or somehow derived.)

The sector request for AEAD/HMAC authenticated encryption looks like this:
 |----- AAD -------|------ DATA -------|-- AUTH TAG --|
 | (authenticated) | (auth+encryption) |              |
 | sector_LE |  IV |  sector in/out    |  tag in/out  |
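
For illustration, a simplified sketch of how one sector maps to the kernel
crypto API (it mirrors crypt_convert_block_aead() in the patch below; the
buffer names and sizes here are illustrative):

  struct scatterlist sg[4];

  sg_init_table(sg, 4);
  sg_set_buf(&sg[0], sector_le, sizeof(uint64_t));  /* AAD: sector number */
  sg_set_buf(&sg[1], org_iv, iv_size);              /* AAD: stored IV */
  sg_set_page(&sg[2], data_page, 512, data_offset); /* sector data */
  sg_set_buf(&sg[3], tag, tag_size);                /* authentication tag */

  aead_request_set_ad(req, sizeof(uint64_t) + iv_size);
  /* writes encrypt 512 bytes; reads decrypt 512 + tag_size bytes */
  aead_request_set_crypt(req, sg, sg, 512, iv);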

For writes, the integrity fields are calculated during AEAD encryption of every sector
and stored in bio integrity fields and sent to underlying dm-integrity target for storage.

For reads, the integrity metadata are verified during AEAD decryption of every sector
(they are filled in by dm-integrity, but the integrity fields are pre-allocated in dm-crypt).

There is also experimental support in the cryptsetup utility (wip-integrity branch)
for more user-friendly configuration (part of the LUKS2 format).

Because the integrity fields are not valid on initial creation, the device must be
"formatted". This can be done by direct-io writes to the device.
For now, a trivial tool to do this is available at https://github.com/mbroz/dm_int_tools
(but dd in direct-io mode can be used as well).
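
For example, the whole device can be wiped (and thus formatted) with dd in
direct-io mode; the device name follows the examples above:

  dd if=/dev/zero of=/dev/mapper/y oflag=direct bs=1M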

Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com>
Signed-off-by: Vashek Matyas <matyas@fi.muni.cz>
---
 Documentation/device-mapper/dm-crypt.txt |  16 +
 drivers/md/dm-crypt.c                    | 857 ++++++++++++++++++++++++++-----
 2 files changed, 742 insertions(+), 131 deletions(-)

diff --git a/Documentation/device-mapper/dm-crypt.txt b/Documentation/device-mapper/dm-crypt.txt
index ff1f87bf26e8..a2a6627aa659 100644
--- a/Documentation/device-mapper/dm-crypt.txt
+++ b/Documentation/device-mapper/dm-crypt.txt
@@ -93,6 +93,22 @@ submit_from_crypt_cpus
     thread because it benefits CFQ to have writes submitted using the
     same context.
 
+integrity:<bytes>:<type>
+    Calculates and verifies integrity for the encrypted device (uses
+    authenticated encryption). This mode requires metadata stored in per-bio
+    integrity structure of <bytes> in size.
+
+    This option requires that the underlying device is created by dm-integrity
+    target and provides exactly <bytes> of per-sector metadata.
+
+    There can be two options for <type>. The first one is used when encryption
+    mode is Authenticated mode (AEAD mode), then type must be just "aead".
+    The second option is integrity calculated by keyed hash (HMAC), then
+    <type> is for example "hmac(sha256)".
+
+    If random IV is used (persistently stored IV in metadata per-sector),
+    then <bytes> includes both space for random IV and authentication tag.
+
 Example scripts
 ===============
 LUKS (Linux Unified Key Setup) is now the preferred way to set up disk
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 7c6c57216bf2..de2fc7225939 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -1,8 +1,8 @@
 /*
  * Copyright (C) 2003 Jana Saout <jana@saout.de>
  * Copyright (C) 2004 Clemens Fruhwirth <clemens@endorphin.org>
- * Copyright (C) 2006-2015 Red Hat, Inc. All rights reserved.
- * Copyright (C) 2013 Milan Broz <gmazyland@gmail.com>
+ * Copyright (C) 2006-2017 Red Hat, Inc. All rights reserved.
+ * Copyright (C) 2013-2017 Milan Broz <gmazyland@gmail.com>
  *
  * This file is released under the GPL.
  */
@@ -31,6 +31,9 @@
 #include <crypto/md5.h>
 #include <crypto/algapi.h>
 #include <crypto/skcipher.h>
+#include <crypto/aead.h>
+#include <crypto/authenc.h>
+#include <linux/rtnetlink.h> /* for struct rtattr and RTA macros only */
 #include <keys/user-type.h>
 
 #include <linux/device-mapper.h>
@@ -48,7 +51,11 @@ struct convert_context {
 	struct bvec_iter iter_out;
 	sector_t cc_sector;
 	atomic_t cc_pending;
-	struct skcipher_request *req;
+	union {
+		struct skcipher_request *req;
+		struct aead_request *req_aead;
+	} r;
+
 };
 
 /*
@@ -57,6 +64,8 @@ struct convert_context {
 struct dm_crypt_io {
 	struct crypt_config *cc;
 	struct bio *base_bio;
+	u8 *integrity_metadata;
+	bool integrity_metadata_from_pool;
 	struct work_struct work;
 
 	struct convert_context ctx;
@@ -70,8 +79,8 @@ struct dm_crypt_io {
 
 struct dm_crypt_request {
 	struct convert_context *ctx;
-	struct scatterlist sg_in;
-	struct scatterlist sg_out;
+	struct scatterlist sg_in[4];
+	struct scatterlist sg_out[4];
 	sector_t iv_sector;
 };
 
@@ -118,6 +127,11 @@ struct iv_tcw_private {
 enum flags { DM_CRYPT_SUSPENDED, DM_CRYPT_KEY_VALID,
 	     DM_CRYPT_SAME_CPU, DM_CRYPT_NO_OFFLOAD };
 
+enum cipher_flags {
+	CRYPT_MODE_INTEGRITY_AEAD,	/* Use authenticated mode for cipher */
+	CRYPT_MODE_INTEGRITY_HMAC,	/* Compose authenticated mode from normal mode and HMAC */
+};
+
 /*
  * The fields in here must be read only after initialization.
  */
@@ -126,11 +140,14 @@ struct crypt_config {
 	sector_t start;
 
 	/*
-	 * pool for per bio private data, crypto requests and
-	 * encryption requeusts/buffer pages
+	 * pool for per bio private data, crypto requests,
+	 * encryption requests/buffer pages and integrity tags
 	 */
 	mempool_t *req_pool;
 	mempool_t *page_pool;
+	mempool_t *tag_pool;
+	unsigned tag_pool_max_sectors;
+
 	struct bio_set *bs;
 	struct mutex bio_alloc_lock;
 
@@ -143,6 +160,7 @@ struct crypt_config {
 
 	char *cipher;
 	char *cipher_string;
+	char *cipher_auth;
 	char *key_string;
 
 	const struct crypt_iv_operations *iv_gen_ops;
@@ -157,8 +175,12 @@ struct crypt_config {
 
 	/* ESSIV: struct crypto_cipher *essiv_tfm */
 	void *iv_private;
-	struct crypto_skcipher **tfms;
+	union {
+		struct crypto_skcipher **tfms;
+		struct crypto_aead **tfms_aead;
+	} cipher_tfm;
 	unsigned tfms_count;
+	unsigned long cipher_flags;
 
 	/*
 	 * Layout of each crypto request:
@@ -181,21 +203,36 @@ struct crypt_config {
 	unsigned int key_size;
 	unsigned int key_parts;      /* independent parts in key buffer */
 	unsigned int key_extra_size; /* additional keys length */
+	unsigned int key_mac_size;   /* MAC key size for authenc(...) */
+
+	unsigned int integrity_tag_size;
+	unsigned int integrity_iv_size;
+	unsigned int on_disk_tag_size;
+
+	u8 *authenc_key; /* space for keys in authenc() format (if used) */
 	u8 key[0];
 };
 
-#define MIN_IOS        64
+#define MIN_IOS		64
+#define MAX_TAG_SIZE	480
+#define POOL_ENTRY_SIZE	512
 
 static void clone_init(struct dm_crypt_io *, struct bio *);
 static void kcryptd_queue_crypt(struct dm_crypt_io *io);
-static u8 *iv_of_dmreq(struct crypt_config *cc, struct dm_crypt_request *dmreq);
+static struct scatterlist *crypt_get_sg_data(struct crypt_config *cc,
+					     struct scatterlist *sg);
 
 /*
  * Use this to access cipher attributes that are the same for each CPU.
  */
 static struct crypto_skcipher *any_tfm(struct crypt_config *cc)
 {
-	return cc->tfms[0];
+	return cc->cipher_tfm.tfms[0];
+}
+
+static struct crypto_aead *any_tfm_aead(struct crypt_config *cc)
+{
+	return cc->cipher_tfm.tfms_aead[0];
 }
 
 /*
@@ -325,8 +362,7 @@ static struct crypto_cipher *setup_essiv_cpu(struct crypt_config *cc,
 		return essiv_tfm;
 	}
 
-	if (crypto_cipher_blocksize(essiv_tfm) !=
-	    crypto_skcipher_ivsize(any_tfm(cc))) {
+	if (crypto_cipher_blocksize(essiv_tfm) != cc->iv_size) {
 		ti->error = "Block size of ESSIV cipher does "
 			    "not match IV size of block cipher";
 		crypto_free_cipher(essiv_tfm);
@@ -395,6 +431,7 @@ static int crypt_iv_essiv_ctr(struct crypt_config *cc, struct dm_target *ti,
 
 	essiv_tfm = setup_essiv_cpu(cc, ti, salt,
 				crypto_ahash_digestsize(hash_tfm));
+
 	if (IS_ERR(essiv_tfm)) {
 		crypt_iv_essiv_dtr(cc);
 		return PTR_ERR(essiv_tfm);
@@ -585,12 +622,14 @@ static int crypt_iv_lmk_one(struct crypt_config *cc, u8 *iv,
 static int crypt_iv_lmk_gen(struct crypt_config *cc, u8 *iv,
 			    struct dm_crypt_request *dmreq)
 {
+	struct scatterlist *sg;
 	u8 *src;
 	int r = 0;
 
 	if (bio_data_dir(dmreq->ctx->bio_in) == WRITE) {
-		src = kmap_atomic(sg_page(&dmreq->sg_in));
-		r = crypt_iv_lmk_one(cc, iv, dmreq, src + dmreq->sg_in.offset);
+		sg = crypt_get_sg_data(cc, dmreq->sg_in);
+		src = kmap_atomic(sg_page(sg));
+		r = crypt_iv_lmk_one(cc, iv, dmreq, src + sg->offset);
 		kunmap_atomic(src);
 	} else
 		memset(iv, 0, cc->iv_size);
@@ -601,18 +640,20 @@ static int crypt_iv_lmk_gen(struct crypt_config *cc, u8 *iv,
 static int crypt_iv_lmk_post(struct crypt_config *cc, u8 *iv,
 			     struct dm_crypt_request *dmreq)
 {
+	struct scatterlist *sg;
 	u8 *dst;
 	int r;
 
 	if (bio_data_dir(dmreq->ctx->bio_in) == WRITE)
 		return 0;
 
-	dst = kmap_atomic(sg_page(&dmreq->sg_out));
-	r = crypt_iv_lmk_one(cc, iv, dmreq, dst + dmreq->sg_out.offset);
+	sg = crypt_get_sg_data(cc, dmreq->sg_out);
+	dst = kmap_atomic(sg_page(sg));
+	r = crypt_iv_lmk_one(cc, iv, dmreq, dst + sg->offset);
 
 	/* Tweak the first block of plaintext sector */
 	if (!r)
-		crypto_xor(dst + dmreq->sg_out.offset, iv, cc->iv_size);
+		crypto_xor(dst + sg->offset, iv, cc->iv_size);
 
 	kunmap_atomic(dst);
 	return r;
@@ -724,6 +765,7 @@ static int crypt_iv_tcw_whitening(struct crypt_config *cc,
 static int crypt_iv_tcw_gen(struct crypt_config *cc, u8 *iv,
 			    struct dm_crypt_request *dmreq)
 {
+	struct scatterlist *sg;
 	struct iv_tcw_private *tcw = &cc->iv_gen_private.tcw;
 	__le64 sector = cpu_to_le64(dmreq->iv_sector);
 	u8 *src;
@@ -731,8 +773,9 @@ static int crypt_iv_tcw_gen(struct crypt_config *cc, u8 *iv,
 
 	/* Remove whitening from ciphertext */
 	if (bio_data_dir(dmreq->ctx->bio_in) != WRITE) {
-		src = kmap_atomic(sg_page(&dmreq->sg_in));
-		r = crypt_iv_tcw_whitening(cc, dmreq, src + dmreq->sg_in.offset);
+		sg = crypt_get_sg_data(cc, dmreq->sg_in);
+		src = kmap_atomic(sg_page(sg));
+		r = crypt_iv_tcw_whitening(cc, dmreq, src + sg->offset);
 		kunmap_atomic(src);
 	}
 
@@ -748,6 +791,7 @@ static int crypt_iv_tcw_gen(struct crypt_config *cc, u8 *iv,
 static int crypt_iv_tcw_post(struct crypt_config *cc, u8 *iv,
 			     struct dm_crypt_request *dmreq)
 {
+	struct scatterlist *sg;
 	u8 *dst;
 	int r;
 
@@ -755,13 +799,22 @@ static int crypt_iv_tcw_post(struct crypt_config *cc, u8 *iv,
 		return 0;
 
 	/* Apply whitening on ciphertext */
-	dst = kmap_atomic(sg_page(&dmreq->sg_out));
-	r = crypt_iv_tcw_whitening(cc, dmreq, dst + dmreq->sg_out.offset);
+	sg = crypt_get_sg_data(cc, dmreq->sg_out);
+	dst = kmap_atomic(sg_page(sg));
+	r = crypt_iv_tcw_whitening(cc, dmreq, dst + sg->offset);
 	kunmap_atomic(dst);
 
 	return r;
 }
 
+static int crypt_iv_random_gen(struct crypt_config *cc, u8 *iv,
+				struct dm_crypt_request *dmreq)
+{
+	/* Used only for writes; there must be additional space to store the IV */
+	get_random_bytes(iv, cc->iv_size);
+	return 0;
+}
+
 static const struct crypt_iv_operations crypt_iv_plain_ops = {
 	.generator = crypt_iv_plain_gen
 };
@@ -806,6 +859,108 @@ static const struct crypt_iv_operations crypt_iv_tcw_ops = {
 	.post	   = crypt_iv_tcw_post
 };
 
+static struct crypt_iv_operations crypt_iv_random_ops = {
+	.generator = crypt_iv_random_gen
+};
+
+/*
+ * Integrity extensions
+ */
+static bool crypt_integrity_aead(struct crypt_config *cc)
+{
+	return test_bit(CRYPT_MODE_INTEGRITY_AEAD, &cc->cipher_flags);
+}
+
+static bool crypt_integrity_hmac(struct crypt_config *cc)
+{
+	return test_bit(CRYPT_MODE_INTEGRITY_HMAC, &cc->cipher_flags);
+}
+
+static bool crypt_integrity_mode(struct crypt_config *cc)
+{
+	return crypt_integrity_aead(cc) || crypt_integrity_hmac(cc);
+}
+
+/*
+ * Get the sg entry that holds sector data; for AEAD requests the layout
+ * is [ sector | iv | data | tag ], so the data is the third entry.
+ */
+static struct scatterlist *crypt_get_sg_data(struct crypt_config *cc,
+					     struct scatterlist *sg)
+{
+	if (unlikely(crypt_integrity_mode(cc)))
+		return &sg[2];
+
+	return sg;
+}
+
+static int dm_crypt_integrity_io_alloc(struct dm_crypt_io *io, struct bio *bio)
+{
+	struct bio_integrity_payload *bip;
+	unsigned int tag_len;
+	int ret;
+
+	if (!bio_sectors(bio) || !io->cc->on_disk_tag_size)
+		return 0;
+
+	bip = bio_integrity_alloc(bio, GFP_NOIO, 1);
+	if (IS_ERR(bip))
+		return PTR_ERR(bip);
+
+	tag_len = io->cc->on_disk_tag_size * bio_sectors(bio);
+
+	bip->bip_iter.bi_size = tag_len;
+	bip->bip_iter.bi_sector = io->cc->start + io->sector;
+
+	/* We own the metadata; do not let bio_free release it */
+	bip->bip_flags &= ~BIP_BLOCK_INTEGRITY;
+
+	ret = bio_integrity_add_page(bio, virt_to_page(io->integrity_metadata),
+				     tag_len, offset_in_page(io->integrity_metadata));
+	if (unlikely(ret != tag_len))
+		return -ENOMEM;
+
+	return 0;
+}
+
+static int crypt_integrity_ctr(struct crypt_config *cc, struct dm_target *ti)
+{
+#ifdef CONFIG_BLK_DEV_INTEGRITY
+	struct blk_integrity *bi = blk_get_integrity(cc->dev->bdev->bd_disk);
+
+	/* From now on we require an underlying device with our integrity profile */
+	if (!bi || strcasecmp(bi->profile->name, "DM-DIF-EXT-TAG")) {
+		ti->error = "Integrity profile not supported.";
+		return -EINVAL;
+	}
+
+	if (bi->tag_size != cc->on_disk_tag_size) {
+		ti->error = "Integrity profile tag size mismatch.";
+		return -EINVAL;
+	}
+
+	if (crypt_integrity_mode(cc)) {
+		cc->integrity_tag_size = cc->on_disk_tag_size - cc->integrity_iv_size;
+		DMINFO("Integrity AEAD, tag size %u, IV size %u.",
+		       cc->integrity_tag_size, cc->integrity_iv_size);
+
+		if (crypto_aead_setauthsize(any_tfm_aead(cc), cc->integrity_tag_size)) {
+			ti->error = "Integrity AEAD auth tag size is not supported.";
+			return -EINVAL;
+		}
+	} else if (cc->integrity_iv_size)
+		DMINFO("Additional per-sector space %u bytes for IV.",
+		       cc->integrity_iv_size);
+
+	if ((cc->integrity_tag_size + cc->integrity_iv_size) != bi->tag_size) {
+		ti->error = "Not enough space for integrity tag in the profile.";
+		return -EINVAL;
+	}
+
+	return 0;
+#else
+	ti->error = "Integrity profile not supported.";
+	return -EINVAL;
+#endif
+}
+
 static void crypt_convert_init(struct crypt_config *cc,
 			       struct convert_context *ctx,
 			       struct bio *bio_out, struct bio *bio_in,
@@ -822,58 +977,206 @@ static void crypt_convert_init(struct crypt_config *cc,
 }
 
 static struct dm_crypt_request *dmreq_of_req(struct crypt_config *cc,
-					     struct skcipher_request *req)
+					     void *req)
 {
 	return (struct dm_crypt_request *)((char *)req + cc->dmreq_start);
 }
 
-static struct skcipher_request *req_of_dmreq(struct crypt_config *cc,
-					       struct dm_crypt_request *dmreq)
+static void *req_of_dmreq(struct crypt_config *cc, struct dm_crypt_request *dmreq)
 {
-	return (struct skcipher_request *)((char *)dmreq - cc->dmreq_start);
+	return (void *)((char *)dmreq - cc->dmreq_start);
 }
 
 static u8 *iv_of_dmreq(struct crypt_config *cc,
 		       struct dm_crypt_request *dmreq)
 {
-	return (u8 *)ALIGN((unsigned long)(dmreq + 1),
-		crypto_skcipher_alignmask(any_tfm(cc)) + 1);
+	if (crypt_integrity_mode(cc))
+		return (u8 *)ALIGN((unsigned long)(dmreq + 1),
+			crypto_aead_alignmask(any_tfm_aead(cc)) + 1);
+	else
+		return (u8 *)ALIGN((unsigned long)(dmreq + 1),
+			crypto_skcipher_alignmask(any_tfm(cc)) + 1);
 }
 
-static int crypt_convert_block(struct crypt_config *cc,
-			       struct convert_context *ctx,
-			       struct skcipher_request *req)
+static u8 *org_iv_of_dmreq(struct crypt_config *cc,
+		       struct dm_crypt_request *dmreq)
+{
+	return iv_of_dmreq(cc, dmreq) + cc->iv_size;
+}
+
+static uint64_t *org_sector_of_dmreq(struct crypt_config *cc,
+		       struct dm_crypt_request *dmreq)
+{
+	u8 *ptr = iv_of_dmreq(cc, dmreq) + cc->iv_size + cc->iv_size;
+	return (uint64_t*) ptr;
+}
+
+static unsigned int *org_tag_of_dmreq(struct crypt_config *cc,
+		       struct dm_crypt_request *dmreq)
+{
+	u8 *ptr = iv_of_dmreq(cc, dmreq) + cc->iv_size +
+		  cc->iv_size + sizeof(uint64_t);
+	return (unsigned int*)ptr;
+}
+
+static void *tag_from_dmreq(struct crypt_config *cc,
+				struct dm_crypt_request *dmreq)
+{
+	struct convert_context *ctx = dmreq->ctx;
+	struct dm_crypt_io *io = container_of(ctx, struct dm_crypt_io, ctx);
+
+	return &io->integrity_metadata[*org_tag_of_dmreq(cc, dmreq) *
+		cc->on_disk_tag_size];
+}
+
+static void *iv_tag_from_dmreq(struct crypt_config *cc,
+			       struct dm_crypt_request *dmreq)
+{
+	return tag_from_dmreq(cc, dmreq) + cc->integrity_tag_size;
+}
+
+static int crypt_convert_block_aead(struct crypt_config *cc,
+				     struct convert_context *ctx,
+				     struct aead_request *req,
+				     unsigned int tag_offset)
 {
 	struct bio_vec bv_in = bio_iter_iovec(ctx->bio_in, ctx->iter_in);
 	struct bio_vec bv_out = bio_iter_iovec(ctx->bio_out, ctx->iter_out);
 	struct dm_crypt_request *dmreq;
-	u8 *iv;
-	int r;
+	unsigned int data_len = 1 << SECTOR_SHIFT;
+	u8 *iv, *org_iv, *tag_iv, *tag;
+	uint64_t *sector;
+	int r = 0;
+
+	BUG_ON(cc->integrity_iv_size && cc->integrity_iv_size != cc->iv_size);
 
 	dmreq = dmreq_of_req(cc, req);
+	dmreq->iv_sector = ctx->cc_sector;
+	dmreq->ctx = ctx;
+
+	*org_tag_of_dmreq(cc, dmreq) = tag_offset;
+
+	sector = org_sector_of_dmreq(cc, dmreq);
+	*sector = cpu_to_le64(ctx->cc_sector - cc->iv_offset);
+
 	iv = iv_of_dmreq(cc, dmreq);
+	org_iv = org_iv_of_dmreq(cc, dmreq);
+	tag = tag_from_dmreq(cc, dmreq);
+	tag_iv = iv_tag_from_dmreq(cc, dmreq);
+
+	/* AEAD request:
+	 *  |----- AAD -------|------ DATA -------|-- AUTH TAG --|
+	 *  | (authenticated) | (auth+encryption) |              |
+	 *  | sector_LE |  IV |  sector in/out    |  tag in/out  |
+	 */
+	sg_init_table(dmreq->sg_in, 4);
+	sg_set_buf(&dmreq->sg_in[0], sector, sizeof(uint64_t));
+	sg_set_buf(&dmreq->sg_in[1], org_iv, cc->iv_size);
+	sg_set_page(&dmreq->sg_in[2], bv_in.bv_page, data_len, bv_in.bv_offset);
+	sg_set_buf(&dmreq->sg_in[3], tag, cc->integrity_tag_size);
+
+	sg_init_table(dmreq->sg_out, 4);
+	sg_set_buf(&dmreq->sg_out[0], sector, sizeof(uint64_t));
+	sg_set_buf(&dmreq->sg_out[1], org_iv, cc->iv_size);
+	sg_set_page(&dmreq->sg_out[2], bv_out.bv_page, data_len, bv_out.bv_offset);
+	sg_set_buf(&dmreq->sg_out[3], tag, cc->integrity_tag_size);
+
+	if (cc->iv_gen_ops) {
+		/* For READs use IV stored in integrity metadata */
+		if (cc->integrity_iv_size && bio_data_dir(ctx->bio_in) != WRITE) {
+			memcpy(org_iv, tag_iv, cc->iv_size);
+		} else {
+			r = cc->iv_gen_ops->generator(cc, org_iv, dmreq);
+			if (r < 0)
+				return r;
+			/* Store generated IV in integrity metadata */
+			if (cc->integrity_iv_size)
+				memcpy(tag_iv, org_iv, cc->iv_size);
+		}
+		/* Working copy of IV, to be modified in crypto API */
+		memcpy(iv, org_iv, cc->iv_size);
+	}
+
+	aead_request_set_ad(req, sizeof(uint64_t) + cc->iv_size);
+	if (bio_data_dir(ctx->bio_in) == WRITE) {
+		aead_request_set_crypt(req, dmreq->sg_in, dmreq->sg_out,
+				       data_len, iv);
+		r = crypto_aead_encrypt(req);
+		if (cc->integrity_tag_size + cc->integrity_iv_size != cc->on_disk_tag_size)
+			memset(tag + cc->integrity_tag_size + cc->integrity_iv_size, 0,
+			       cc->on_disk_tag_size - (cc->integrity_tag_size + cc->integrity_iv_size));
+	} else {
+		aead_request_set_crypt(req, dmreq->sg_in, dmreq->sg_out,
+				       data_len + cc->integrity_tag_size, iv);
+		r = crypto_aead_decrypt(req);
+	}
+	if (r == -EBADMSG)
+		DMERR("INTEGRITY AEAD ERROR, sector %llu",
+		      (unsigned long long)le64_to_cpu(*sector));
+
+	if (!r && cc->iv_gen_ops && cc->iv_gen_ops->post)
+		r = cc->iv_gen_ops->post(cc, org_iv, dmreq);
+
+	bio_advance_iter(ctx->bio_in, &ctx->iter_in, data_len);
+	bio_advance_iter(ctx->bio_out, &ctx->iter_out, data_len);
+
+	return r;
+}
+
+static int crypt_convert_block_skcipher(struct crypt_config *cc,
+					struct convert_context *ctx,
+					struct skcipher_request *req,
+					unsigned int tag_offset)
+{
+	struct bio_vec bv_in = bio_iter_iovec(ctx->bio_in, ctx->iter_in);
+	struct bio_vec bv_out = bio_iter_iovec(ctx->bio_out, ctx->iter_out);
+	struct scatterlist *sg_in, *sg_out;
+	struct dm_crypt_request *dmreq;
+	unsigned int data_len = 1 << SECTOR_SHIFT;
+	u8 *iv, *org_iv, *tag_iv;
+	uint64_t *sector;
+	int r = 0;
 
+	dmreq = dmreq_of_req(cc, req);
 	dmreq->iv_sector = ctx->cc_sector;
 	dmreq->ctx = ctx;
-	sg_init_table(&dmreq->sg_in, 1);
-	sg_set_page(&dmreq->sg_in, bv_in.bv_page, 1 << SECTOR_SHIFT,
-		    bv_in.bv_offset);
 
-	sg_init_table(&dmreq->sg_out, 1);
-	sg_set_page(&dmreq->sg_out, bv_out.bv_page, 1 << SECTOR_SHIFT,
-		    bv_out.bv_offset);
+	*org_tag_of_dmreq(cc, dmreq) = tag_offset;
+
+	iv = iv_of_dmreq(cc, dmreq);
+	org_iv = org_iv_of_dmreq(cc, dmreq);
+	tag_iv = iv_tag_from_dmreq(cc, dmreq);
+
+	sector = org_sector_of_dmreq(cc, dmreq);
+	*sector = cpu_to_le64(ctx->cc_sector - cc->iv_offset);
+
+	/* For skcipher we use only the first sg item */
+	sg_in  = &dmreq->sg_in[0];
+	sg_out = &dmreq->sg_out[0];
 
-	bio_advance_iter(ctx->bio_in, &ctx->iter_in, 1 << SECTOR_SHIFT);
-	bio_advance_iter(ctx->bio_out, &ctx->iter_out, 1 << SECTOR_SHIFT);
+	sg_init_table(sg_in, 1);
+	sg_set_page(sg_in, bv_in.bv_page, data_len, bv_in.bv_offset);
+
+	sg_init_table(sg_out, 1);
+	sg_set_page(sg_out, bv_out.bv_page, data_len, bv_out.bv_offset);
 
 	if (cc->iv_gen_ops) {
-		r = cc->iv_gen_ops->generator(cc, iv, dmreq);
-		if (r < 0)
-			return r;
+		/* For READs use IV stored in integrity metadata */
+		if (cc->integrity_iv_size && bio_data_dir(ctx->bio_in) != WRITE) {
+			memcpy(org_iv, tag_iv, cc->integrity_iv_size);
+		} else {
+			r = cc->iv_gen_ops->generator(cc, org_iv, dmreq);
+			if (r < 0)
+				return r;
+			/* Store generated IV in integrity metadata */
+			if (cc->integrity_iv_size)
+				memcpy(tag_iv, org_iv, cc->integrity_iv_size);
+		}
+		/* Working copy of IV, to be modified in crypto API */
+		memcpy(iv, org_iv, cc->iv_size);
 	}
 
-	skcipher_request_set_crypt(req, &dmreq->sg_in, &dmreq->sg_out,
-				   1 << SECTOR_SHIFT, iv);
+	skcipher_request_set_crypt(req, sg_in, sg_out, data_len, iv);
 
 	if (bio_data_dir(ctx->bio_in) == WRITE)
 		r = crypto_skcipher_encrypt(req);
@@ -881,7 +1184,10 @@ static int crypt_convert_block(struct crypt_config *cc,
 		r = crypto_skcipher_decrypt(req);
 
 	if (!r && cc->iv_gen_ops && cc->iv_gen_ops->post)
-		r = cc->iv_gen_ops->post(cc, iv, dmreq);
+		r = cc->iv_gen_ops->post(cc, org_iv, dmreq);
+
+	bio_advance_iter(ctx->bio_in, &ctx->iter_in, data_len);
+	bio_advance_iter(ctx->bio_out, &ctx->iter_out, data_len);
 
 	return r;
 }
@@ -889,27 +1195,53 @@ static int crypt_convert_block(struct crypt_config *cc,
 static void kcryptd_async_done(struct crypto_async_request *async_req,
 			       int error);
 
-static void crypt_alloc_req(struct crypt_config *cc,
-			    struct convert_context *ctx)
+static void crypt_alloc_req_skcipher(struct crypt_config *cc,
+				     struct convert_context *ctx)
 {
 	unsigned key_index = ctx->cc_sector & (cc->tfms_count - 1);
 
-	if (!ctx->req)
-		ctx->req = mempool_alloc(cc->req_pool, GFP_NOIO);
+	if (!ctx->r.req)
+		ctx->r.req = mempool_alloc(cc->req_pool, GFP_NOIO);
 
-	skcipher_request_set_tfm(ctx->req, cc->tfms[key_index]);
+	skcipher_request_set_tfm(ctx->r.req, cc->cipher_tfm.tfms[key_index]);
 
 	/*
 	 * Use REQ_MAY_BACKLOG so a cipher driver internally backlogs
 	 * requests if driver request queue is full.
 	 */
-	skcipher_request_set_callback(ctx->req,
+	skcipher_request_set_callback(ctx->r.req,
 	    CRYPTO_TFM_REQ_MAY_BACKLOG | CRYPTO_TFM_REQ_MAY_SLEEP,
-	    kcryptd_async_done, dmreq_of_req(cc, ctx->req));
+	    kcryptd_async_done, dmreq_of_req(cc, ctx->r.req));
 }
 
-static void crypt_free_req(struct crypt_config *cc,
-			   struct skcipher_request *req, struct bio *base_bio)
+static void crypt_alloc_req_aead(struct crypt_config *cc,
+				 struct convert_context *ctx)
+{
+	if (!ctx->r.req_aead)
+		ctx->r.req_aead = mempool_alloc(cc->req_pool, GFP_NOIO);
+
+	aead_request_set_tfm(ctx->r.req_aead, cc->cipher_tfm.tfms_aead[0]);
+
+	/*
+	 * Use REQ_MAY_BACKLOG so a cipher driver internally backlogs
+	 * requests if driver request queue is full.
+	 */
+	aead_request_set_callback(ctx->r.req_aead,
+	    CRYPTO_TFM_REQ_MAY_BACKLOG | CRYPTO_TFM_REQ_MAY_SLEEP,
+	    kcryptd_async_done, dmreq_of_req(cc, ctx->r.req_aead));
+}
+
+static void crypt_alloc_req(struct crypt_config *cc,
+			    struct convert_context *ctx)
+{
+	if (crypt_integrity_mode(cc))
+		crypt_alloc_req_aead(cc, ctx);
+	else
+		crypt_alloc_req_skcipher(cc, ctx);
+}
+
+static void crypt_free_req_skcipher(struct crypt_config *cc,
+				    struct skcipher_request *req, struct bio *base_bio)
 {
 	struct dm_crypt_io *io = dm_per_bio_data(base_bio, cc->per_bio_data_size);
 
@@ -917,12 +1249,30 @@ static void crypt_free_req(struct crypt_config *cc,
 		mempool_free(req, cc->req_pool);
 }
 
+static void crypt_free_req_aead(struct crypt_config *cc,
+				struct aead_request *req, struct bio *base_bio)
+{
+	struct dm_crypt_io *io = dm_per_bio_data(base_bio, cc->per_bio_data_size);
+
+	if ((struct aead_request *)(io + 1) != req)
+		mempool_free(req, cc->req_pool);
+}
+
+static void crypt_free_req(struct crypt_config *cc, void *req, struct bio *base_bio)
+{
+	if (crypt_integrity_mode(cc))
+		crypt_free_req_aead(cc, req, base_bio);
+	else
+		crypt_free_req_skcipher(cc, req, base_bio);
+}
+
 /*
  * Encrypt / decrypt data from one bio to another one (can be the same one)
  */
 static int crypt_convert(struct crypt_config *cc,
 			 struct convert_context *ctx)
 {
+	unsigned int tag_offset = 0;
 	int r;
 
 	atomic_set(&ctx->cc_pending, 1);
@@ -933,7 +1283,10 @@ static int crypt_convert(struct crypt_config *cc,
 
 		atomic_inc(&ctx->cc_pending);
 
-		r = crypt_convert_block(cc, ctx, ctx->req);
+		if (crypt_integrity_mode(cc))
+			r = crypt_convert_block_aead(cc, ctx, ctx->r.req_aead, tag_offset);
+		else
+			r = crypt_convert_block_skcipher(cc, ctx, ctx->r.req, tag_offset);
 
 		switch (r) {
 		/*
@@ -949,8 +1302,9 @@ static int crypt_convert(struct crypt_config *cc,
 		 * completion function kcryptd_async_done() will be called.
 		 */
 		case -EINPROGRESS:
-			ctx->req = NULL;
+			ctx->r.req = NULL;
 			ctx->cc_sector++;
+			tag_offset++;
 			continue;
 		/*
 		 * The request was already processed (synchronously).
@@ -958,6 +1312,7 @@ static int crypt_convert(struct crypt_config *cc,
 		case 0:
 			atomic_dec(&ctx->cc_pending);
 			ctx->cc_sector++;
+			tag_offset++;
 			cond_resched();
 			continue;
 
@@ -1031,6 +1386,13 @@ static struct bio *crypt_alloc_buffer(struct dm_crypt_io *io, unsigned size)
 	if (unlikely(gfp_mask & __GFP_DIRECT_RECLAIM))
 		mutex_unlock(&cc->bio_alloc_lock);
 
+	/* Allocate space for integrity tags */
+	if (dm_crypt_integrity_io_alloc(io, clone)) {
+		crypt_free_buffer_pages(cc, clone);
+		bio_put(clone);
+		clone = NULL;
+	}
+
 	return clone;
 }
 
@@ -1053,7 +1415,9 @@ static void crypt_io_init(struct dm_crypt_io *io, struct crypt_config *cc,
 	io->base_bio = bio;
 	io->sector = sector;
 	io->error = 0;
-	io->ctx.req = NULL;
+	io->ctx.r.req = NULL;
+	io->integrity_metadata = NULL;
+	io->integrity_metadata_from_pool = false;
 	atomic_set(&io->io_pending, 0);
 }
 
@@ -1075,8 +1439,13 @@ static void crypt_dec_pending(struct dm_crypt_io *io)
 	if (!atomic_dec_and_test(&io->io_pending))
 		return;
 
-	if (io->ctx.req)
-		crypt_free_req(cc, io->ctx.req, base_bio);
+	if (io->ctx.r.req)
+		crypt_free_req(cc, io->ctx.r.req, base_bio);
+
+	if (unlikely(io->integrity_metadata_from_pool))
+		mempool_free(io->integrity_metadata, io->cc->tag_pool);
+	else
+		kfree(io->integrity_metadata);
 
 	base_bio->bi_error = error;
 	bio_endio(base_bio);
@@ -1156,6 +1525,12 @@ static int kcryptd_io_read(struct dm_crypt_io *io, gfp_t gfp)
 	clone_init(io, clone);
 	clone->bi_iter.bi_sector = cc->start + io->sector;
 
+	if (dm_crypt_integrity_io_alloc(io, clone)) {
+		crypt_dec_pending(io);
+		bio_put(clone);
+		return 1;
+	}
+
 	generic_make_request(clone);
 	return 0;
 }
@@ -1371,8 +1746,12 @@ static void kcryptd_async_done(struct crypto_async_request *async_req,
 		return;
 	}
 
+	if (error == -EBADMSG)
+		DMERR("INTEGRITY AEAD ERROR, sector %llu",
+		      (unsigned long long)le64_to_cpu(*org_sector_of_dmreq(cc, dmreq)));
+
 	if (!error && cc->iv_gen_ops && cc->iv_gen_ops->post)
-		error = cc->iv_gen_ops->post(cc, iv_of_dmreq(cc, dmreq), dmreq);
+		error = cc->iv_gen_ops->post(cc, org_iv_of_dmreq(cc, dmreq), dmreq);
 
 	if (error < 0)
 		io->error = -EIO;
@@ -1430,37 +1809,59 @@ static int crypt_decode_key(u8 *key, char *hex, unsigned int size)
 	return 0;
 }
 
-static void crypt_free_tfms(struct crypt_config *cc)
+static void crypt_free_tfms_aead(struct crypt_config *cc)
+{
+	if (!cc->cipher_tfm.tfms_aead)
+		return;
+
+	if (cc->cipher_tfm.tfms_aead[0] && !IS_ERR(cc->cipher_tfm.tfms_aead[0])) {
+		crypto_free_aead(cc->cipher_tfm.tfms_aead[0]);
+		cc->cipher_tfm.tfms_aead[0] = NULL;
+	}
+
+	kfree(cc->cipher_tfm.tfms_aead);
+	cc->cipher_tfm.tfms_aead = NULL;
+}
+
+static void crypt_free_tfms_skcipher(struct crypt_config *cc)
 {
 	unsigned i;
 
-	if (!cc->tfms)
+	if (!cc->cipher_tfm.tfms)
 		return;
 
 	for (i = 0; i < cc->tfms_count; i++)
-		if (cc->tfms[i] && !IS_ERR(cc->tfms[i])) {
-			crypto_free_skcipher(cc->tfms[i]);
-			cc->tfms[i] = NULL;
+		if (cc->cipher_tfm.tfms[i] && !IS_ERR(cc->cipher_tfm.tfms[i])) {
+			crypto_free_skcipher(cc->cipher_tfm.tfms[i]);
+			cc->cipher_tfm.tfms[i] = NULL;
 		}
 
-	kfree(cc->tfms);
-	cc->tfms = NULL;
+	kfree(cc->cipher_tfm.tfms);
+	cc->cipher_tfm.tfms = NULL;
 }
 
-static int crypt_alloc_tfms(struct crypt_config *cc, char *ciphermode)
+static void crypt_free_tfms(struct crypt_config *cc)
+{
+	if (crypt_integrity_mode(cc))
+		crypt_free_tfms_aead(cc);
+	else
+		crypt_free_tfms_skcipher(cc);
+}
+
+static int crypt_alloc_tfms_skcipher(struct crypt_config *cc, char *ciphermode)
 {
 	unsigned i;
 	int err;
 
-	cc->tfms = kzalloc(cc->tfms_count * sizeof(struct crypto_skcipher *),
-			   GFP_KERNEL);
-	if (!cc->tfms)
+	cc->cipher_tfm.tfms = kzalloc(cc->tfms_count *
+				      sizeof(struct crypto_skcipher *), GFP_KERNEL);
+	if (!cc->cipher_tfm.tfms)
 		return -ENOMEM;
 
 	for (i = 0; i < cc->tfms_count; i++) {
-		cc->tfms[i] = crypto_alloc_skcipher(ciphermode, 0, 0);
-		if (IS_ERR(cc->tfms[i])) {
-			err = PTR_ERR(cc->tfms[i]);
+		cc->cipher_tfm.tfms[i] = crypto_alloc_skcipher(ciphermode, 0, 0);
+		if (IS_ERR(cc->cipher_tfm.tfms[i])) {
+			err = PTR_ERR(cc->cipher_tfm.tfms[i]);
 			crypt_free_tfms(cc);
 			return err;
 		}
@@ -1469,22 +1870,111 @@ static int crypt_alloc_tfms(struct crypt_config *cc, char *ciphermode)
 	return 0;
 }
 
+static int crypt_alloc_tfms_aead(struct crypt_config *cc, char *ciphermode)
+{
+	char *authenc = NULL;
+	int err;
+
+	cc->cipher_tfm.tfms = kmalloc(sizeof(struct crypto_aead *), GFP_KERNEL);
+	if (!cc->cipher_tfm.tfms)
+		return -ENOMEM;
+
+	/* Compose AEAD cipher with authenc(authenticator,cipher) structure */
+	if (crypt_integrity_hmac(cc)) {
+		authenc = kmalloc(CRYPTO_MAX_ALG_NAME, GFP_KERNEL);
+		if (!authenc)
+			return -ENOMEM;
+		err = snprintf(authenc, CRYPTO_MAX_ALG_NAME,
+		       "authenc(%s,%s)", cc->cipher_auth, ciphermode);
+		if (err < 0) {
+			kzfree(authenc);
+			return err;
+		}
+		ciphermode = authenc;
+	}
+
+	cc->cipher_tfm.tfms_aead[0] = crypto_alloc_aead(ciphermode, 0, 0);
+	if (IS_ERR(cc->cipher_tfm.tfms_aead[0])) {
+		err = PTR_ERR(cc->cipher_tfm.tfms_aead[0]);
+		crypt_free_tfms(cc);
+		return err;
+	}
+
+	kzfree(authenc);
+	return 0;
+}
+
+static int crypt_alloc_tfms(struct crypt_config *cc, char *ciphermode)
+{
+	if (crypt_integrity_mode(cc))
+		return crypt_alloc_tfms_aead(cc, ciphermode);
+	else
+		return crypt_alloc_tfms_skcipher(cc, ciphermode);
+}
+
+static unsigned crypt_subkey_size(struct crypt_config *cc)
+{
+	return (cc->key_size - cc->key_extra_size) >> ilog2(cc->tfms_count);
+}
+
+static unsigned crypt_authenckey_size(struct crypt_config *cc)
+{
+	return crypt_subkey_size(cc) + RTA_SPACE(sizeof(struct crypto_authenc_key_param));
+}
+
+/*
+ * If AEAD is composed like authenc(hmac(sha256),xts(aes)),
+ * the key must be passed to the authenc template in a special format:
+ * an rtattr header with enckeylen, then the authentication key, and then
+ * the encryption key. This function converts cc->key to this format.
+ */
+static void crypt_copy_authenckey(char *p, const void *key,
+				  unsigned enckeylen, unsigned authkeylen)
+{
+	struct crypto_authenc_key_param *param;
+	struct rtattr *rta;
+
+	rta = (struct rtattr *)p;
+	param = RTA_DATA(rta);
+	param->enckeylen = cpu_to_be32(enckeylen);
+	rta->rta_len = RTA_LENGTH(sizeof(*param));
+	rta->rta_type = CRYPTO_AUTHENC_KEYA_PARAM;
+	p += RTA_SPACE(sizeof(*param));
+	memcpy(p, key + enckeylen, authkeylen);
+	p += authkeylen;
+	memcpy(p, key, enckeylen);
+}
+
 static int crypt_setkey(struct crypt_config *cc)
 {
 	unsigned subkey_size;
 	int err = 0, i, r;
 
 	/* Ignore extra keys (which are used for IV etc) */
-	subkey_size = (cc->key_size - cc->key_extra_size) >> ilog2(cc->tfms_count);
+	subkey_size = crypt_subkey_size(cc);
 
+	if (crypt_integrity_hmac(cc))
+		crypt_copy_authenckey(cc->authenc_key, cc->key,
+				      subkey_size - cc->key_mac_size,
+				      cc->key_mac_size);
 	for (i = 0; i < cc->tfms_count; i++) {
-		r = crypto_skcipher_setkey(cc->tfms[i],
-					   cc->key + (i * subkey_size),
-					   subkey_size);
+		if (crypt_integrity_aead(cc))
+			r = crypto_aead_setkey(cc->cipher_tfm.tfms_aead[i],
+						   cc->key + (i * subkey_size),
+						   subkey_size);
+		else if (crypt_integrity_hmac(cc))
+			r = crypto_aead_setkey(cc->cipher_tfm.tfms_aead[i],
+				cc->authenc_key, crypt_authenckey_size(cc));
+		else
+			r = crypto_skcipher_setkey(cc->cipher_tfm.tfms[i],
+						   cc->key + (i * subkey_size),
+						   subkey_size);
 		if (r)
 			err = r;
 	}
 
+	if (crypt_integrity_hmac(cc))
+		memzero_explicit(cc->authenc_key, crypt_authenckey_size(cc));
+
 	return err;
 }
 
@@ -1681,6 +2171,7 @@ static void crypt_dtr(struct dm_target *ti)
 
 	mempool_destroy(cc->page_pool);
 	mempool_destroy(cc->req_pool);
+	mempool_destroy(cc->tag_pool);
 
 	if (cc->iv_gen_ops && cc->iv_gen_ops->dtr)
 		cc->iv_gen_ops->dtr(cc);
@@ -1691,6 +2182,8 @@ static void crypt_dtr(struct dm_target *ti)
 	kzfree(cc->cipher);
 	kzfree(cc->cipher_string);
 	kzfree(cc->key_string);
+	kzfree(cc->cipher_auth);
+	kzfree(cc->authenc_key);
 
 	/* Must zero key material before freeing */
 	kzfree(cc);
@@ -1731,7 +2224,6 @@ static int crypt_ctr_cipher(struct dm_target *ti,
 		return -EINVAL;
 	}
 	cc->key_parts = cc->tfms_count;
-	cc->key_extra_size = 0;
 
 	cc->cipher = kstrdup(cipher, GFP_KERNEL);
 	if (!cc->cipher)
@@ -1777,7 +2269,20 @@ static int crypt_ctr_cipher(struct dm_target *ti,
 	}
 
 	/* Initialize IV */
-	cc->iv_size = crypto_skcipher_ivsize(any_tfm(cc));
+	if (crypt_integrity_mode(cc))
+		cc->iv_size = crypto_aead_ivsize(any_tfm_aead(cc));
+	else
+		cc->iv_size = crypto_skcipher_ivsize(any_tfm(cc));
+
+	if (crypt_integrity_hmac(cc)) {
+		cc->authenc_key = kmalloc(crypt_authenckey_size(cc), GFP_KERNEL);
+		if (!cc->authenc_key) {
+			ret = -ENOMEM;
+			ti->error = "Error allocating authenc key space";
+			goto bad;
+		}
+	}
+
 	if (cc->iv_size)
 		/* at least a 64 bit sector number should fit in our buffer */
 		cc->iv_size = max(cc->iv_size,
@@ -1816,6 +2321,10 @@ static int crypt_ctr_cipher(struct dm_target *ti,
 		cc->iv_gen_ops = &crypt_iv_tcw_ops;
 		cc->key_parts += 2; /* IV + whitening */
 		cc->key_extra_size = cc->iv_size + TCW_WHITENING_SIZE;
+	} else if (strcmp(ivmode, "random") == 0) {
+		cc->iv_gen_ops = &crypt_iv_random_ops;
+		/* Need storage space in integrity fields. */
+		cc->integrity_iv_size = cc->iv_size;
 	} else {
 		ret = -EINVAL;
 		ti->error = "Invalid IV mode";
@@ -1857,6 +2366,75 @@ static int crypt_ctr_cipher(struct dm_target *ti,
 	return -ENOMEM;
 }
 
+static int crypt_ctr_optional(struct dm_target *ti, unsigned int argc, char **argv)
+{
+	struct crypt_config *cc = ti->private;
+	struct dm_arg_set as;
+	static struct dm_arg _args[] = {
+		{0, 3, "Invalid number of feature args"},
+	};
+	unsigned int opt_params, val;
+	const char *opt_string, *sval;
+	int ret;
+
+	/* Optional parameters */
+	as.argc = argc;
+	as.argv = argv;
+
+	ret = dm_read_arg_group(_args, &as, &opt_params, &ti->error);
+	if (ret)
+		return ret;
+
+	while (opt_params--) {
+		opt_string = dm_shift_arg(&as);
+		if (!opt_string) {
+			ti->error = "Not enough feature arguments";
+			return -EINVAL;
+		}
+
+		if (!strcasecmp(opt_string, "allow_discards"))
+			ti->num_discard_bios = 1;
+
+		else if (!strcasecmp(opt_string, "same_cpu_crypt"))
+			set_bit(DM_CRYPT_SAME_CPU, &cc->flags);
+
+		else if (!strcasecmp(opt_string, "submit_from_crypt_cpus"))
+			set_bit(DM_CRYPT_NO_OFFLOAD, &cc->flags);
+		else if (sscanf(opt_string, "integrity:%u:", &val) == 1) {
+			cc->on_disk_tag_size = val;
+			sval = strchr(opt_string + strlen("integrity:"), ':') + 1;
+			if (val == 0 || val > MAX_TAG_SIZE || !sval) {
+				ti->error = "Invalid integrity arguments";
+				return -EINVAL;
+			}
+			if (!strcasecmp(sval, "aead")) {
+				set_bit(CRYPT_MODE_INTEGRITY_AEAD, &cc->cipher_flags);
+			} else  if (!strncasecmp(sval, "hmac(", strlen("hmac("))) {
+				struct crypto_ahash *hmac_tfm = crypto_alloc_ahash(sval, 0, 0);
+				if (IS_ERR(hmac_tfm)) {
+					ti->error = "Error initializing HMAC integrity hash.";
+					return PTR_ERR(hmac_tfm);
+				}
+				cc->key_mac_size = crypto_ahash_digestsize(hmac_tfm);
+				crypto_free_ahash(hmac_tfm);
+				set_bit(CRYPT_MODE_INTEGRITY_HMAC, &cc->cipher_flags);
+			} else  if (strcasecmp(sval, "none")) {
+				ti->error = "Unknown integrity profile";
+				return -EINVAL;
+			}
+
+			cc->cipher_auth = kstrdup(sval, GFP_KERNEL);
+			if (!cc->cipher_auth)
+				return -ENOMEM;
+		}  else {
+			ti->error = "Invalid feature arguments";
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
 /*
  * Construct an encryption mapping:
  * <cipher> [<key>|:<key_size>:<user|logon>:<key_description>] <iv_offset> <dev_path> <start>
@@ -1865,18 +2443,12 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 {
 	struct crypt_config *cc;
 	int key_size;
-	unsigned int opt_params;
+	unsigned int align_mask;
 	unsigned long long tmpll;
 	int ret;
-	size_t iv_size_padding;
-	struct dm_arg_set as;
-	const char *opt_string;
+	size_t iv_size_padding, additional_req_size;
 	char dummy;
 
-	static struct dm_arg _args[] = {
-		{0, 3, "Invalid number of feature args"},
-	};
-
 	if (argc < 5) {
 		ti->error = "Not enough arguments";
 		return -EINVAL;
@@ -1896,38 +2468,59 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	cc->key_size = key_size;
 
 	ti->private = cc;
+
+	/* Optional parameters need to be read before cipher constructor */
+	if (argc > 5) {
+		ret = crypt_ctr_optional(ti, argc - 5, &argv[5]);
+		if (ret)
+			goto bad;
+	}
+
 	ret = crypt_ctr_cipher(ti, argv[0], argv[1]);
 	if (ret < 0)
 		goto bad;
 
-	cc->dmreq_start = sizeof(struct skcipher_request);
-	cc->dmreq_start += crypto_skcipher_reqsize(any_tfm(cc));
+	if (crypt_integrity_mode(cc)) {
+		cc->dmreq_start = sizeof(struct aead_request);
+		cc->dmreq_start += crypto_aead_reqsize(any_tfm_aead(cc));
+		align_mask = crypto_aead_alignmask(any_tfm_aead(cc));
+	} else {
+		cc->dmreq_start = sizeof(struct skcipher_request);
+		cc->dmreq_start += crypto_skcipher_reqsize(any_tfm(cc));
+		align_mask = crypto_skcipher_alignmask(any_tfm(cc));
+	}
 	cc->dmreq_start = ALIGN(cc->dmreq_start, __alignof__(struct dm_crypt_request));
 
-	if (crypto_skcipher_alignmask(any_tfm(cc)) < CRYPTO_MINALIGN) {
+	if (align_mask < CRYPTO_MINALIGN) {
 		/* Allocate the padding exactly */
 		iv_size_padding = -(cc->dmreq_start + sizeof(struct dm_crypt_request))
-				& crypto_skcipher_alignmask(any_tfm(cc));
+				& align_mask;
 	} else {
 		/*
 		 * If the cipher requires greater alignment than kmalloc
 		 * alignment, we don't know the exact position of the
 		 * initialization vector. We must assume worst case.
 		 */
-		iv_size_padding = crypto_skcipher_alignmask(any_tfm(cc));
+		iv_size_padding = align_mask;
 	}
 
 	ret = -ENOMEM;
-	cc->req_pool = mempool_create_kmalloc_pool(MIN_IOS, cc->dmreq_start +
-			sizeof(struct dm_crypt_request) + iv_size_padding + cc->iv_size);
+
+	/*  ...| IV + padding | original IV | original sec. number | bio tag offset | */
+	additional_req_size = sizeof(struct dm_crypt_request) +
+		iv_size_padding + cc->iv_size +
+		cc->iv_size +
+		sizeof(uint64_t) +
+		sizeof(unsigned int);
+
+	cc->req_pool = mempool_create_kmalloc_pool(MIN_IOS, cc->dmreq_start + additional_req_size);
 	if (!cc->req_pool) {
 		ti->error = "Cannot allocate crypt request mempool";
 		goto bad;
 	}
 
 	cc->per_bio_data_size = ti->per_io_data_size =
-		ALIGN(sizeof(struct dm_crypt_io) + cc->dmreq_start +
-		      sizeof(struct dm_crypt_request) + iv_size_padding + cc->iv_size,
+		ALIGN(sizeof(struct dm_crypt_io) + cc->dmreq_start + additional_req_size,
 		      ARCH_KMALLOC_MINALIGN);
 
 	cc->page_pool = mempool_create_page_pool(BIO_MAX_PAGES, 0);
@@ -1964,39 +2557,20 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	}
 	cc->start = tmpll;
 
-	argv += 5;
-	argc -= 5;
-
-	/* Optional parameters */
-	if (argc) {
-		as.argc = argc;
-		as.argv = argv;
-
-		ret = dm_read_arg_group(_args, &as, &opt_params, &ti->error);
+	if (crypt_integrity_mode(cc) || cc->integrity_iv_size) {
+		ret = crypt_integrity_ctr(cc, ti);
 		if (ret)
 			goto bad;
 
-		ret = -EINVAL;
-		while (opt_params--) {
-			opt_string = dm_shift_arg(&as);
-			if (!opt_string) {
-				ti->error = "Not enough feature arguments";
-				goto bad;
-			}
-
-			if (!strcasecmp(opt_string, "allow_discards"))
-				ti->num_discard_bios = 1;
-
-			else if (!strcasecmp(opt_string, "same_cpu_crypt"))
-				set_bit(DM_CRYPT_SAME_CPU, &cc->flags);
-
-			else if (!strcasecmp(opt_string, "submit_from_crypt_cpus"))
-				set_bit(DM_CRYPT_NO_OFFLOAD, &cc->flags);
+		cc->tag_pool_max_sectors = POOL_ENTRY_SIZE / cc->on_disk_tag_size;
+		if (!cc->tag_pool_max_sectors)
+			cc->tag_pool_max_sectors = 1;
 
-			else {
-				ti->error = "Invalid feature arguments";
-				goto bad;
-			}
+		cc->tag_pool = mempool_create_kmalloc_pool(MIN_IOS,
+			cc->tag_pool_max_sectors * cc->on_disk_tag_size);
+		if (!cc->tag_pool) {
+			ti->error = "Cannot allocate integrity tags mempool";
+			goto bad;
 		}
 	}
 
@@ -2062,12 +2636,29 @@ static int crypt_map(struct dm_target *ti, struct bio *bio)
 	 * Check if bio is too large, split as needed.
 	 */
 	if (unlikely(bio->bi_iter.bi_size > (BIO_MAX_PAGES << PAGE_SHIFT)) &&
-	    bio_data_dir(bio) == WRITE)
+	    (bio_data_dir(bio) == WRITE || cc->on_disk_tag_size))
 		dm_accept_partial_bio(bio, ((BIO_MAX_PAGES << PAGE_SHIFT) >> SECTOR_SHIFT));
 
 	io = dm_per_bio_data(bio, cc->per_bio_data_size);
 	crypt_io_init(io, cc, bio, dm_target_offset(ti, bio->bi_iter.bi_sector));
-	io->ctx.req = (struct skcipher_request *)(io + 1);
+
+	if (cc->on_disk_tag_size) {
+		unsigned tag_len = cc->on_disk_tag_size * bio_sectors(bio);
+
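+		/*
+		 * Try a plain kmalloc first; if the allocation fails (or the
+		 * tag area is too large), split the bio and fall back to the
+		 * preallocated mempool.
+		 */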
+		if (unlikely(tag_len > KMALLOC_MAX_SIZE) ||
+		    unlikely(!(io->integrity_metadata = kmalloc(tag_len,
+				GFP_NOIO | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN)))) {
+			if (bio_sectors(bio) > cc->tag_pool_max_sectors)
+				dm_accept_partial_bio(bio, cc->tag_pool_max_sectors);
+			io->integrity_metadata = mempool_alloc(cc->tag_pool, GFP_NOIO);
+			io->integrity_metadata_from_pool = true;
+		}
+	}
+
+	if (crypt_integrity_mode(cc))
+		io->ctx.r.req_aead = (struct aead_request *)(io + 1);
+	else
+		io->ctx.r.req = (struct skcipher_request *)(io + 1);
 
 	if (bio_data_dir(io->base_bio) == READ) {
 		if (kcryptd_io_read(io, GFP_NOWAIT))
@@ -2108,6 +2699,8 @@ static void crypt_status(struct dm_target *ti, status_type_t type,
 		num_feature_args += !!ti->num_discard_bios;
 		num_feature_args += test_bit(DM_CRYPT_SAME_CPU, &cc->flags);
 		num_feature_args += test_bit(DM_CRYPT_NO_OFFLOAD, &cc->flags);
+		if (cc->on_disk_tag_size)
+			num_feature_args++;
 		if (num_feature_args) {
 			DMEMIT(" %d", num_feature_args);
 			if (ti->num_discard_bios)
@@ -2116,6 +2709,8 @@ static void crypt_status(struct dm_target *ti, status_type_t type,
 				DMEMIT(" same_cpu_crypt");
 			if (test_bit(DM_CRYPT_NO_OFFLOAD, &cc->flags))
 				DMEMIT(" submit_from_crypt_cpus");
+			if (cc->on_disk_tag_size)
+				DMEMIT(" integrity:%u:%s", cc->on_disk_tag_size, cc->cipher_auth);
 		}
 
 		break;
@@ -2216,7 +2811,7 @@ static void crypt_io_hints(struct dm_target *ti, struct queue_limits *limits)
 
 static struct target_type crypt_target = {
 	.name   = "crypt",
-	.version = {1, 15, 0},
+	.version = {1, 16, 0},
 	.module = THIS_MODULE,
 	.ctr    = crypt_ctr,
 	.dtr    = crypt_dtr,
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 0/7] Data integrity protection with dm-integrity and dm-crypt
  2017-01-04 19:23 [RFC PATCH 0/4] Data integrity protection with dm-integrity and dm-crypt Milan Broz
                   ` (3 preceding siblings ...)
  2017-01-04 19:23 ` [RFC PATCH 4/4] Add cryptographic data integrity protection (authenticated encryption) to dm-crypt Milan Broz
@ 2017-03-16 14:39 ` Milan Broz
  2017-03-16 19:12   ` Mike Snitzer
  2017-03-16 14:39 ` [PATCH 1/7] dm-crypt: Fix documentation of integrity table option Milan Broz
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 14+ messages in thread
From: Milan Broz @ 2017-03-16 14:39 UTC (permalink / raw)
  To: dm-devel; +Cc: Milan Broz

These are additional patches to the previously submitted integrity protection commits,
based on the for-next branch of Mike's DM tree.

Most of them can probably be folded in; the patches are separated for better readability.

The last patch, for large encryption sectors, is a new feature, but because it
depends on all the previous patches it is posted in this series.

Milan Broz (7):
  dm-crypt: Fix documentation of integrity table option.
  dm-crypt: Move IV constructor to separate function.
  dm-crypt: Introduce new format of cipher with capi: prefix.
  dm-crypt: Compute HMAC key size in a separate function.
  dm-crypt: Parse cipher specification according to AEAD flag.
  dm-crypt: Remove obsolete integrity_mode function.
  dm-crypt: optionally support larger encryption sector size

 Documentation/device-mapper/dm-crypt.txt |  65 +++--
 drivers/md/dm-crypt.c                    | 479 +++++++++++++++++++++----------
 2 files changed, 371 insertions(+), 173 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/7] dm-crypt: Fix documentation of integrity table option.
  2017-01-04 19:23 [RFC PATCH 0/4] Data integrity protection with dm-integrity and dm-crypt Milan Broz
                   ` (4 preceding siblings ...)
  2017-03-16 14:39 ` [PATCH 0/7] Data integrity protection with dm-integrity and dm-crypt Milan Broz
@ 2017-03-16 14:39 ` Milan Broz
  2017-03-16 14:39 ` [PATCH 2/7] dm-crypt: Move IV constructor to separate function Milan Broz
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Milan Broz @ 2017-03-16 14:39 UTC (permalink / raw)
  To: dm-devel; +Cc: Milan Broz

This patch updates the old documentation to match the implemented version;
the previous "hmac" option was merged into the same processing path.

Signed-off-by: Milan Broz <gmazyland@gmail.com>
---
 Documentation/device-mapper/dm-crypt.txt | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/Documentation/device-mapper/dm-crypt.txt b/Documentation/device-mapper/dm-crypt.txt
index a2a6627aa659..058f26ddf875 100644
--- a/Documentation/device-mapper/dm-crypt.txt
+++ b/Documentation/device-mapper/dm-crypt.txt
@@ -94,20 +94,16 @@ submit_from_crypt_cpus
     same context.
 
 integrity:<bytes>:<type>
-    Calculates and verifies integrity for the encrypted device (uses
-    authenticated encryption). This mode requires metadata stored in per-bio
-    integrity structure of <bytes> in size.
+    The device requires additional <bytes> metadata per sector stored
+    in a per-bio integrity structure. This metadata must be provided
+    by the underlying dm-integrity target.
 
-    This option requires that the underlying device is created by dm-integrity
-    target and provides exactly <bytes> of per-sector metadata.
+    The <type> can be "none" if the metadata is used only for a persistent IV.
 
-    There can be two options for <type>. The first one is used when encryption
-    mode is Authenticated mode (AEAD mode), then type must be just "aead".
-    The second option is integrity calculated by keyed hash (HMAC), then
-    <type> is for example "hmac(sha256)".
-
-    If random IV is used (persistently stored IV in metadata per-sector),
-    then <bytes> includes both space for random IV and authentication tag.
+    For Authenticated Encryption with Additional Data (AEAD)
+    the <type> is "aead". An AEAD mode additionally calculates and verifies
+    integrity for the encrypted device. The additional space is then
+    used for storing the authentication tag (and a persistent IV if needed).
 
 Example scripts
 ===============
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/7] dm-crypt: Move IV constructor to separate function.
  2017-01-04 19:23 [RFC PATCH 0/4] Data integrity protection with dm-integrity and dm-crypt Milan Broz
                   ` (5 preceding siblings ...)
  2017-03-16 14:39 ` [PATCH 1/7] dm-crypt: Fix documentation of integrity table option Milan Broz
@ 2017-03-16 14:39 ` Milan Broz
  2017-03-16 14:39 ` [PATCH 3/7] dm-crypt: Introduce new format of cipher with capi: prefix Milan Broz
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Milan Broz @ 2017-03-16 14:39 UTC (permalink / raw)
  To: dm-devel; +Cc: Milan Broz

No functional change in this patch, just preparation for the next patches.

Signed-off-by: Milan Broz <gmazyland@gmail.com>
---
 drivers/md/dm-crypt.c | 130 +++++++++++++++++++++++++++-----------------------
 1 file changed, 69 insertions(+), 61 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 9ec6d50603f6..faec408dcf50 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -2196,6 +2196,73 @@ static void crypt_dtr(struct dm_target *ti)
 	kzfree(cc);
 }
 
+static int crypt_ctr_ivmode(struct dm_target *ti, const char *ivmode)
+{
+	struct crypt_config *cc = ti->private;
+
+	if (crypt_integrity_mode(cc))
+		cc->iv_size = crypto_aead_ivsize(any_tfm_aead(cc));
+	else
+		cc->iv_size = crypto_skcipher_ivsize(any_tfm(cc));
+
+	if (crypt_integrity_hmac(cc)) {
+		cc->authenc_key = kmalloc(crypt_authenckey_size(cc), GFP_KERNEL);
+		if (!cc->authenc_key) {
+			ti->error = "Error allocating authenc key space";
+			return -ENOMEM;
+		}
+	}
+
+	if (cc->iv_size)
+		/* at least a 64 bit sector number should fit in our buffer */
+		cc->iv_size = max(cc->iv_size,
+				  (unsigned int)(sizeof(u64) / sizeof(u8)));
+	else if (ivmode) {
+		DMWARN("Selected cipher does not support IVs");
+		ivmode = NULL;
+	}
+
+	/* Choose ivmode, see comments at iv code. */
+	if (ivmode == NULL)
+		cc->iv_gen_ops = NULL;
+	else if (strcmp(ivmode, "plain") == 0)
+		cc->iv_gen_ops = &crypt_iv_plain_ops;
+	else if (strcmp(ivmode, "plain64") == 0)
+		cc->iv_gen_ops = &crypt_iv_plain64_ops;
+	else if (strcmp(ivmode, "essiv") == 0)
+		cc->iv_gen_ops = &crypt_iv_essiv_ops;
+	else if (strcmp(ivmode, "benbi") == 0)
+		cc->iv_gen_ops = &crypt_iv_benbi_ops;
+	else if (strcmp(ivmode, "null") == 0)
+		cc->iv_gen_ops = &crypt_iv_null_ops;
+	else if (strcmp(ivmode, "lmk") == 0) {
+		cc->iv_gen_ops = &crypt_iv_lmk_ops;
+		/*
+		 * Version 2 and 3 is recognised according
+		 * to length of provided multi-key string.
+		 * If present (version 3), last key is used as IV seed.
+		 * All keys (including IV seed) are always the same size.
+		 */
+		if (cc->key_size % cc->key_parts) {
+			cc->key_parts++;
+			cc->key_extra_size = cc->key_size / cc->key_parts;
+		}
+	} else if (strcmp(ivmode, "tcw") == 0) {
+		cc->iv_gen_ops = &crypt_iv_tcw_ops;
+		cc->key_parts += 2; /* IV + whitening */
+		cc->key_extra_size = cc->iv_size + TCW_WHITENING_SIZE;
+	} else if (strcmp(ivmode, "random") == 0) {
+		cc->iv_gen_ops = &crypt_iv_random_ops;
+		/* Need storage space in integrity fields. */
+		cc->integrity_iv_size = cc->iv_size;
+	} else {
+		ti->error = "Invalid IV mode";
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 static int crypt_ctr_cipher(struct dm_target *ti,
 			    char *cipher_in, char *key)
 {
@@ -2205,7 +2272,6 @@ static int crypt_ctr_cipher(struct dm_target *ti,
 	int ret = -EINVAL;
 	char dummy;
 
-	/* Convert to crypto api definition? */
 	if (strchr(cipher_in, '(')) {
 		ti->error = "Bad cipher specification";
 		return -EINVAL;
@@ -2276,67 +2342,9 @@ static int crypt_ctr_cipher(struct dm_target *ti,
 	}
 
 	/* Initialize IV */
-	if (crypt_integrity_mode(cc))
-		cc->iv_size = crypto_aead_ivsize(any_tfm_aead(cc));
-	else
-		cc->iv_size = crypto_skcipher_ivsize(any_tfm(cc));
-
-	if (crypt_integrity_hmac(cc)) {
-		cc->authenc_key = kmalloc(crypt_authenckey_size(cc), GFP_KERNEL);
-		if (!cc->authenc_key) {
-			ret = -ENOMEM;
-			ti->error = "Error allocating authenc key space";
-			goto bad;
-		}
-	}
-
-	if (cc->iv_size)
-		/* at least a 64 bit sector number should fit in our buffer */
-		cc->iv_size = max(cc->iv_size,
-				  (unsigned int)(sizeof(u64) / sizeof(u8)));
-	else if (ivmode) {
-		DMWARN("Selected cipher does not support IVs");
-		ivmode = NULL;
-	}
-
-	/* Choose ivmode, see comments at iv code. */
-	if (ivmode == NULL)
-		cc->iv_gen_ops = NULL;
-	else if (strcmp(ivmode, "plain") == 0)
-		cc->iv_gen_ops = &crypt_iv_plain_ops;
-	else if (strcmp(ivmode, "plain64") == 0)
-		cc->iv_gen_ops = &crypt_iv_plain64_ops;
-	else if (strcmp(ivmode, "essiv") == 0)
-		cc->iv_gen_ops = &crypt_iv_essiv_ops;
-	else if (strcmp(ivmode, "benbi") == 0)
-		cc->iv_gen_ops = &crypt_iv_benbi_ops;
-	else if (strcmp(ivmode, "null") == 0)
-		cc->iv_gen_ops = &crypt_iv_null_ops;
-	else if (strcmp(ivmode, "lmk") == 0) {
-		cc->iv_gen_ops = &crypt_iv_lmk_ops;
-		/*
-		 * Version 2 and 3 is recognised according
-		 * to length of provided multi-key string.
-		 * If present (version 3), last key is used as IV seed.
-		 * All keys (including IV seed) are always the same size.
-		 */
-		if (cc->key_size % cc->key_parts) {
-			cc->key_parts++;
-			cc->key_extra_size = cc->key_size / cc->key_parts;
-		}
-	} else if (strcmp(ivmode, "tcw") == 0) {
-		cc->iv_gen_ops = &crypt_iv_tcw_ops;
-		cc->key_parts += 2; /* IV + whitening */
-		cc->key_extra_size = cc->iv_size + TCW_WHITENING_SIZE;
-	} else if (strcmp(ivmode, "random") == 0) {
-		cc->iv_gen_ops = &crypt_iv_random_ops;
-		/* Need storage space in integrity fields. */
-		cc->integrity_iv_size = cc->iv_size;
-	} else {
-		ret = -EINVAL;
-		ti->error = "Invalid IV mode";
+	ret = crypt_ctr_ivmode(ti, ivmode);
+	if (ret < 0)
 		goto bad;
-	}
 
 	/* Initialize and set key */
 	ret = crypt_set_key(cc, key);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/7] dm-crypt: Introduce new format of cipher with capi: prefix.
  2017-01-04 19:23 [RFC PATCH 0/4] Data integrity protection with dm-integrity and dm-crypt Milan Broz
                   ` (6 preceding siblings ...)
  2017-03-16 14:39 ` [PATCH 2/7] dm-crypt: Move IV constructor to separate function Milan Broz
@ 2017-03-16 14:39 ` Milan Broz
  2017-03-16 14:39 ` [PATCH 4/7] dm-crypt: Compute HMAC key size in a separate function Milan Broz
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Milan Broz @ 2017-03-16 14:39 UTC (permalink / raw)
  To: dm-devel; +Cc: Milan Broz

For the new authenticated encryption we have to support generic composed
modes (a combination of an encryption algorithm and an authenticator),
because this is how the kernel crypto API accesses such algorithms.

To simplify the interface, we accept an algorithm directly in crypto API
format.  The new format is recognised by the "capi:" prefix.
The dm-crypt internal IV specification is the same as for the old format.

The crypto API cipher specification format is:
     capi:cipher_api_spec-ivmode[:ivopts]
Examples:
     capi:cbc(aes)-essiv:sha256 (equivalent to old aes-cbc-essiv:sha256)
     capi:xts(aes)-plain64      (equivalent to old aes-xts-plain64)
Examples of authenticated modes:
     capi:gcm(aes)-random
     capi:authenc(hmac(sha256),xts(aes))-random
     capi:rfc7539(chacha20,poly1305)-random

Authenticated modes can be configured only with the new cipher format.

Authenticated encryption algorithms can be of two types: either native
modes (like GCM) that perform both encryption and authentication
internally, or composed modes where the user composes an AEAD from
a separate encryption algorithm and authenticator.

Note that this policy allows the user to specify arbitrary combinations
that can be insecure. (The policy decision is made in cryptsetup userspace.)

(Note that HMAC composed modes require the following patches in this series,
which are split out for better readability of the changes.)

Signed-off-by: Milan Broz <gmazyland@gmail.com>
---
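A short usage sketch of the two formats (example values only; the
authenticated mode assumes an underlying dm-integrity device
/dev/mapper/dm-int with 28 bytes of per-sector tag space, 16 bytes
for the GCM authentication tag plus 12 bytes for the stored random IV):

  # the same non-authenticated mode in old and new notation
  dmsetup create test_old --table "0 $DEV_SIZE crypt aes-xts-plain64 $KEY 0 $DEV 0"
  dmsetup create test_new --table "0 $DEV_SIZE crypt capi:xts(aes)-plain64 $KEY 0 $DEV 0"

  # an authenticated mode (expressible only in the new notation)
  dmsetup create test_aead --table "0 $DEV_SIZE crypt capi:gcm(aes)-random \
      $KEY 0 /dev/mapper/dm-int 0 1 integrity:28:aead"
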
 Documentation/device-mapper/dm-crypt.txt |  27 +++++--
 drivers/md/dm-crypt.c                    | 134 +++++++++++++++++++++++++------
 2 files changed, 132 insertions(+), 29 deletions(-)

diff --git a/Documentation/device-mapper/dm-crypt.txt b/Documentation/device-mapper/dm-crypt.txt
index 058f26ddf875..8140b71f3c54 100644
--- a/Documentation/device-mapper/dm-crypt.txt
+++ b/Documentation/device-mapper/dm-crypt.txt
@@ -11,14 +11,31 @@ Parameters: <cipher> <key> <iv_offset> <device path> \
 	      <offset> [<#opt_params> <opt_params>]
 
 <cipher>
-    Encryption cipher and an optional IV generation mode.
-    (In format cipher[:keycount]-chainmode-ivmode[:ivopts]).
+    Encryption cipher, encryption mode and Initial Vector (IV) generator.
+
+    The cipher specification format is:
+       cipher[:keycount]-chainmode-ivmode[:ivopts]
     Examples:
-       des
        aes-cbc-essiv:sha256
-       twofish-ecb
+       aes-xts-plain64
+       serpent-xts-plain64
+
+    The cipher format also supports direct specification in the kernel
+    crypto API format (selected by the capi: prefix). The IV specification
+    is the same as for the first format type.
+    This format is mainly used to specify authenticated modes.
 
-    /proc/crypto contains supported crypto modes
+    The crypto API cipher specification format is:
+        capi:cipher_api_spec-ivmode[:ivopts]
+    Examples:
+        capi:cbc(aes)-essiv:sha256
+        capi:xts(aes)-plain64
+    Examples of authenticated modes:
+        capi:gcm(aes)-random
+        capi:authenc(hmac(sha256),xts(aes))-random
+        capi:rfc7539(chacha20,poly1305)-random
+
+    The file /proc/crypto contains a list of currently loaded crypto modes.
 
 <key>
     Key used for encryption. It is encoded either as a hexadecimal number
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index faec408dcf50..0c7d07e17b81 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -2263,11 +2263,82 @@ static int crypt_ctr_ivmode(struct dm_target *ti, const char *ivmode)
 	return 0;
 }
 
-static int crypt_ctr_cipher(struct dm_target *ti,
-			    char *cipher_in, char *key)
+/*
+ * Workaround to parse cipher algorithm from crypto API spec.
+ * The cc->cipher is currently used only in ESSIV.
+ * This should probably be done by crypto API calls (once available...)
+ */
+static int crypt_ctr_blkdev_cipher(struct crypt_config *cc)
+{
+	const char *alg_name = crypto_tfm_alg_name(crypto_skcipher_tfm(any_tfm(cc)));
+	char *start, *end;
+
+	start = strchr(alg_name, '(');
+	end = strchr(alg_name, ')');
+
+	if (!start && !end) {
+		cc->cipher = kstrdup(alg_name, GFP_KERNEL);
+		return cc->cipher ? 0 : -ENOMEM;
+	}
+
+	if (!start || !end || ++start >= end)
+		return -EINVAL;
+
+	cc->cipher = kzalloc(end - start + 1, GFP_KERNEL);
+	if (!cc->cipher)
+		return -ENOMEM;
+
+	strncpy(cc->cipher, start, end - start);
+
+	return 0;
+}
+
+static int crypt_ctr_cipher_new(struct dm_target *ti, char *cipher_in, char *key,
+				char **ivmode, char **ivopts)
 {
 	struct crypt_config *cc = ti->private;
-	char *tmp, *cipher, *chainmode, *ivmode, *ivopts, *keycount;
+	char *tmp, *cipher_api;
+	int ret = -EINVAL;
+
+	cc->tfms_count = 1;
+
+	/*
+	 * New format (capi: prefix)
+	 * capi:cipher_api_spec-iv:ivopts
+	 */
+	tmp = &cipher_in[strlen("capi:")];
+	cipher_api = strsep(&tmp, "-");
+	*ivmode = strsep(&tmp, ":");
+	*ivopts = tmp;
+
+	if (*ivmode && !strcmp(*ivmode, "lmk"))
+		cc->tfms_count = 64;
+
+	cc->key_parts = cc->tfms_count;
+
+	/* Allocate cipher */
+	ret = crypt_alloc_tfms(cc, cipher_api);
+	if (ret < 0) {
+		ti->error = "Error allocating crypto tfm";
+		return ret;
+	}
+
+	cc->iv_size = crypto_skcipher_ivsize(any_tfm(cc));
+
+	ret = crypt_ctr_blkdev_cipher(cc);
+	if (ret < 0) {
+		ti->error = "Cannot allocate cipher string";
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static int crypt_ctr_cipher_old(struct dm_target *ti, char *cipher_in, char *key,
+				char **ivmode, char **ivopts)
+{
+	struct crypt_config *cc = ti->private;
+	char *tmp, *cipher, *chainmode, *keycount;
 	char *cipher_api = NULL;
 	int ret = -EINVAL;
 	char dummy;
@@ -2277,10 +2348,6 @@ static int crypt_ctr_cipher(struct dm_target *ti,
 		return -EINVAL;
 	}
 
-	cc->cipher_string = kstrdup(cipher_in, GFP_KERNEL);
-	if (!cc->cipher_string)
-		goto bad_mem;
-
 	/*
 	 * Legacy dm-crypt cipher specification
 	 * cipher[:keycount]-mode-iv:ivopts
@@ -2303,8 +2370,8 @@ static int crypt_ctr_cipher(struct dm_target *ti,
 		goto bad_mem;
 
 	chainmode = strsep(&tmp, "-");
-	ivopts = strsep(&tmp, "-");
-	ivmode = strsep(&ivopts, ":");
+	*ivopts = strsep(&tmp, "-");
+	*ivmode = strsep(&*ivopts, ":");
 
 	if (tmp)
 		DMWARN("Ignoring unexpected additional cipher options");
@@ -2313,12 +2380,12 @@ static int crypt_ctr_cipher(struct dm_target *ti,
 	 * For compatibility with the original dm-crypt mapping format, if
 	 * only the cipher name is supplied, use cbc-plain.
 	 */
-	if (!chainmode || (!strcmp(chainmode, "plain") && !ivmode)) {
+	if (!chainmode || (!strcmp(chainmode, "plain") && !*ivmode)) {
 		chainmode = "cbc";
-		ivmode = "plain";
+		*ivmode = "plain";
 	}
 
-	if (strcmp(chainmode, "ecb") && !ivmode) {
+	if (strcmp(chainmode, "ecb") && !*ivmode) {
 		ti->error = "IV mechanism required";
 		return -EINVAL;
 	}
@@ -2338,19 +2405,45 @@ static int crypt_ctr_cipher(struct dm_target *ti,
 	ret = crypt_alloc_tfms(cc, cipher_api);
 	if (ret < 0) {
 		ti->error = "Error allocating crypto tfm";
-		goto bad;
+		kfree(cipher_api);
+		return ret;
 	}
 
+	return 0;
+bad_mem:
+	ti->error = "Cannot allocate cipher strings";
+	return -ENOMEM;
+}
+
+static int crypt_ctr_cipher(struct dm_target *ti, char *cipher_in, char *key)
+{
+	struct crypt_config *cc = ti->private;
+	char *ivmode = NULL, *ivopts = NULL;
+	int ret;
+
+	cc->cipher_string = kstrdup(cipher_in, GFP_KERNEL);
+	if (!cc->cipher_string) {
+		ti->error = "Cannot allocate cipher strings";
+		return -ENOMEM;
+	}
+
+	if (strstarts(cipher_in, "capi:"))
+		ret = crypt_ctr_cipher_new(ti, cipher_in, key, &ivmode, &ivopts);
+	else
+		ret = crypt_ctr_cipher_old(ti, cipher_in, key, &ivmode, &ivopts);
+	if (ret)
+		return ret;
+
 	/* Initialize IV */
 	ret = crypt_ctr_ivmode(ti, ivmode);
 	if (ret < 0)
-		goto bad;
+		return ret;
 
 	/* Initialize and set key */
 	ret = crypt_set_key(cc, key);
 	if (ret < 0) {
 		ti->error = "Error decoding and setting key";
-		goto bad;
+		return ret;
 	}
 
 	/* Allocate IV */
@@ -2358,7 +2451,7 @@ static int crypt_ctr_cipher(struct dm_target *ti,
 		ret = cc->iv_gen_ops->ctr(cc, ti, ivopts);
 		if (ret < 0) {
 			ti->error = "Error creating IV";
-			goto bad;
+			return ret;
 		}
 	}
 
@@ -2367,18 +2460,11 @@ static int crypt_ctr_cipher(struct dm_target *ti,
 		ret = cc->iv_gen_ops->init(cc);
 		if (ret < 0) {
 			ti->error = "Error initialising IV";
-			goto bad;
+			return ret;
 		}
 	}
 
-	ret = 0;
-bad:
-	kfree(cipher_api);
 	return ret;
-
-bad_mem:
-	ti->error = "Cannot allocate cipher strings";
-	return -ENOMEM;
 }
 
 static int crypt_ctr_optional(struct dm_target *ti, unsigned int argc, char **argv)
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 4/7] dm-crypt: Compute HMAC key size in a separate function.
  2017-01-04 19:23 [RFC PATCH 0/4] Data integrity protection with dm-integrity and dm-crypt Milan Broz
                   ` (7 preceding siblings ...)
  2017-03-16 14:39 ` [PATCH 3/7] dm-crypt: Introduce new format of cipher with capi: prefix Milan Broz
@ 2017-03-16 14:39 ` Milan Broz
  2017-03-16 14:39 ` [PATCH 5/7] dm-crypt: Parse cipher specification according to AEAD flag Milan Broz
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Milan Broz @ 2017-03-16 14:39 UTC (permalink / raw)
  To: dm-devel; +Cc: Milan Broz

For composed authenticated modes with HMAC (a length-preserving encryption
mode like XTS plus HMAC as an authenticator) we have to calculate the
HMAC digest size (the separate authentication key is the same size
as the HMAC digest).

This patch introduces a workaround that parses the crypto API string
to get the HMAC algorithm and retrieve the digest size from it.

Signed-off-by: Milan Broz <gmazyland@gmail.com>
---
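The size calculation can be illustrated with a worked example; the concrete
numbers below are our own sketch, not taken from the patch:

  #   cipher:  capi:authenc(hmac(sha256),xts(aes))-random
  #   hmac(sha256) digest size       = 32 bytes
  #   -> separate authentication key = 32 bytes (appended to the volume key)
  #   xts(aes) encryption key        = 64 bytes (AES-256 in XTS mode)
  #   -> key field in the table      = 96 bytes (192 hex characters)
  #   on-disk metadata per sector    = 32-byte HMAC tag + 16-byte stored
  #                                    random IV -> integrity:48:aead
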
 drivers/md/dm-crypt.c | 50 +++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 49 insertions(+), 1 deletion(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 0c7d07e17b81..48e8dfe91c53 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -2293,6 +2293,45 @@ static int crypt_ctr_blkdev_cipher(struct crypt_config *cc)
 	return 0;
 }
 
+/*
+ * Workaround to parse HMAC algorithm from AEAD crypto API spec.
+ * The HMAC is needed to calculate tag size (HMAC digest size).
+ * This should probably be done by crypto API calls (once available...)
+ */
+static int crypt_ctr_auth_cipher(struct crypt_config *cc, char *cipher_api)
+{
+	char *start, *end, *mac_alg = NULL;
+	struct crypto_ahash *mac;
+
+	if (!strstarts(cipher_api, "authenc("))
+		return 0;
+
+	start = strchr(cipher_api, '(');
+	end = strchr(cipher_api, ',');
+	if (!start || !end || ++start > end)
+		return -EINVAL;
+
+	mac_alg = kzalloc(end - start + 1, GFP_KERNEL);
+	if (!mac_alg)
+		return -ENOMEM;
+	strncpy(mac_alg, start, end - start);
+
+	mac = crypto_alloc_ahash(mac_alg, 0, 0);
+	kfree(mac_alg);
+
+	if (IS_ERR(mac))
+		return PTR_ERR(mac);
+
+	cc->key_mac_size = crypto_ahash_digestsize(mac);
+	crypto_free_ahash(mac);
+
+	cc->authenc_key = kmalloc(crypt_authenckey_size(cc), GFP_KERNEL);
+	if (!cc->authenc_key)
+		return -ENOMEM;
+
+	return 0;
+}
+
 static int crypt_ctr_cipher_new(struct dm_target *ti, char *cipher_in, char *key,
 				char **ivmode, char **ivopts)
 {
@@ -2323,7 +2362,16 @@ static int crypt_ctr_cipher_new(struct dm_target *ti, char *cipher_in, char *key
 		return ret;
 	}
 
-	cc->iv_size = crypto_skcipher_ivsize(any_tfm(cc));
+	/* Alloc AEAD, can be used only in new format. */
+	if (crypt_integrity_aead(cc)) {
+		ret = crypt_ctr_auth_cipher(cc, cipher_api);
+		if (ret < 0) {
+			ti->error = "Invalid AEAD cipher spec";
+			return -ENOMEM;
+		}
+		cc->iv_size = crypto_aead_ivsize(any_tfm_aead(cc));
+	} else
+		cc->iv_size = crypto_skcipher_ivsize(any_tfm(cc));
 
 	ret = crypt_ctr_blkdev_cipher(cc);
 	if (ret < 0) {
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 5/7] dm-crypt: Parse cipher specification according to AEAD flag.
  2017-01-04 19:23 [RFC PATCH 0/4] Data integrity protection with dm-integrity and dm-crypt Milan Broz
                   ` (8 preceding siblings ...)
  2017-03-16 14:39 ` [PATCH 4/7] dm-crypt: Compute HMAC key size in a separate function Milan Broz
@ 2017-03-16 14:39 ` Milan Broz
  2017-03-16 14:39 ` [PATCH 6/7] dm-crypt: Remove obsolete integrity_mode function Milan Broz
  2017-03-16 14:39 ` [PATCH 7/7] dm-crypt: optionally support larger encryption sector size Milan Broz
  11 siblings, 0 replies; 14+ messages in thread
From: Milan Broz @ 2017-03-16 14:39 UTC (permalink / raw)
  To: dm-devel; +Cc: Milan Broz

This patch simplifies the allocation of the HMAC composed mode by parsing
the new cipher format directly.

For the native AEAD mode (like GCM), we can use the crypto_tfm_alg_name()
API to get the cipher specification; for the HMAC composed mode we need
to parse the crypto API string to get the cipher mode nested in the
specification.

Signed-off-by: Milan Broz <gmazyland@gmail.com>
---
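As a sketch of how the nested mode is recovered from a composed
specification (an illustration, not code from the patch):

  # crypto API string:        authenc(hmac(sha256),xts(aes))
  # HMAC algorithm (patch 4): between the first '(' and ',' -> hmac(sha256)
  # nested cipher mode:       after the ','                 -> xts(aes)
  # cipher name for ESSIV:    between its '(' and ')'       -> aes
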
 drivers/md/dm-crypt.c | 49 +++++++++++++++++--------------------------------
 1 file changed, 17 insertions(+), 32 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 48e8dfe91c53..3a4bf5791a3b 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -873,12 +873,12 @@ static bool crypt_integrity_aead(struct crypt_config *cc)
 
 static bool crypt_integrity_hmac(struct crypt_config *cc)
 {
-	return test_bit(CRYPT_MODE_INTEGRITY_HMAC, &cc->cipher_flags);
+	return crypt_integrity_aead(cc) && cc->key_mac_size;
 }
 
 static bool crypt_integrity_mode(struct crypt_config *cc)
 {
-	return crypt_integrity_aead(cc) || crypt_integrity_hmac(cc);
+	return crypt_integrity_aead(cc);
 }
 
 /* Get sg containing data */
@@ -1879,27 +1879,12 @@ static int crypt_alloc_tfms_skcipher(struct crypt_config *cc, char *ciphermode)
 
 static int crypt_alloc_tfms_aead(struct crypt_config *cc, char *ciphermode)
 {
-	char *authenc = NULL;
 	int err;
 
 	cc->cipher_tfm.tfms = kmalloc(sizeof(struct crypto_aead *), GFP_KERNEL);
 	if (!cc->cipher_tfm.tfms)
 		return -ENOMEM;
 
-	/* Compose AEAD cipher with autenc(authenticator,cipher) structure */
-	if (crypt_integrity_hmac(cc)) {
-		authenc = kmalloc(CRYPTO_MAX_ALG_NAME, GFP_KERNEL);
-		if (!authenc)
-			return -ENOMEM;
-		err = snprintf(authenc, CRYPTO_MAX_ALG_NAME,
-		       "authenc(%s,%s)", cc->cipher_auth, ciphermode);
-		if (err < 0) {
-			kzfree(authenc);
-			return err;
-		}
-		ciphermode = authenc;
-	}
-
 	cc->cipher_tfm.tfms_aead[0] = crypto_alloc_aead(ciphermode, 0, 0);
 	if (IS_ERR(cc->cipher_tfm.tfms_aead[0])) {
 		err = PTR_ERR(cc->cipher_tfm.tfms_aead[0]);
@@ -1907,7 +1892,6 @@ static int crypt_alloc_tfms_aead(struct crypt_config *cc, char *ciphermode)
 		return err;
 	}
 
-	kzfree(authenc);
 	return 0;
 }
 
@@ -1964,13 +1948,13 @@ static int crypt_setkey(struct crypt_config *cc)
 				      subkey_size - cc->key_mac_size,
 				      cc->key_mac_size);
 	for (i = 0; i < cc->tfms_count; i++) {
-		if (crypt_integrity_aead(cc))
-			r = crypto_aead_setkey(cc->cipher_tfm.tfms_aead[i],
-						   cc->key + (i * subkey_size),
-						   subkey_size);
-		else if (crypt_integrity_hmac(cc))
+		if (crypt_integrity_hmac(cc))
 			r = crypto_aead_setkey(cc->cipher_tfm.tfms_aead[i],
 				cc->authenc_key, crypt_authenckey_size(cc));
+		else if (crypt_integrity_aead(cc))
+			r = crypto_aead_setkey(cc->cipher_tfm.tfms_aead[i],
+						cc->key + (i * subkey_size),
+						subkey_size);
 		else
 			r = crypto_skcipher_setkey(cc->cipher_tfm.tfms[i],
 						   cc->key + (i * subkey_size),
@@ -2205,14 +2189,6 @@ static int crypt_ctr_ivmode(struct dm_target *ti, const char *ivmode)
 	else
 		cc->iv_size = crypto_skcipher_ivsize(any_tfm(cc));
 
-	if (crypt_integrity_hmac(cc)) {
-		cc->authenc_key = kmalloc(crypt_authenckey_size(cc), GFP_KERNEL);
-		if (!cc->authenc_key) {
-			ti->error = "Error allocating authenc key space";
-			return -ENOMEM;
-		}
-	}
-
 	if (cc->iv_size)
 		/* at least a 64 bit sector number should fit in our buffer */
 		cc->iv_size = max(cc->iv_size,
@@ -2270,9 +2246,18 @@ static int crypt_ctr_ivmode(struct dm_target *ti, const char *ivmode)
  */
 static int crypt_ctr_blkdev_cipher(struct crypt_config *cc)
 {
-	const char *alg_name = crypto_tfm_alg_name(crypto_skcipher_tfm(any_tfm(cc)));
+	const char *alg_name = NULL;
 	char *start, *end;
 
+	if (crypt_integrity_aead(cc)) {
+		if (!(alg_name = crypto_tfm_alg_name(crypto_aead_tfm(any_tfm_aead(cc)))))
+			return -EINVAL;
+		if (crypt_integrity_hmac(cc) && !(alg_name = strchr(alg_name, ',')))
+			return -EINVAL;
+		alg_name++;
+	} else if (!(alg_name = crypto_tfm_alg_name(crypto_skcipher_tfm(any_tfm(cc)))))
+		return -EINVAL;
+
 	start = strchr(alg_name, '(');
 	end = strchr(alg_name, ')');
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 6/7] dm-crypt: Remove obsolete integrity_mode function.
  2017-01-04 19:23 [RFC PATCH 0/4] Data integrity protection with dm-integrity and dm-crypt Milan Broz
                   ` (9 preceding siblings ...)
  2017-03-16 14:39 ` [PATCH 5/7] dm-crypt: Parse cipher specification according to AEAD flag Milan Broz
@ 2017-03-16 14:39 ` Milan Broz
  2017-03-16 14:39 ` [PATCH 7/7] dm-crypt: optionally support larger encryption sector size Milan Broz
  11 siblings, 0 replies; 14+ messages in thread
From: Milan Broz @ 2017-03-16 14:39 UTC (permalink / raw)
  To: dm-devel; +Cc: Milan Broz

The HMAC composed mode is now processed the same way as the normal AEAD
mode, so this patch removes the no-longer-needed flag and the "hmac"
specification for the table integrity argument.

Signed-off-by: Milan Broz <gmazyland@gmail.com>
---
 drivers/md/dm-crypt.c | 41 +++++++++++++----------------------------
 1 file changed, 13 insertions(+), 28 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 3a4bf5791a3b..270adba14717 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -129,7 +129,6 @@ enum flags { DM_CRYPT_SUSPENDED, DM_CRYPT_KEY_VALID,
 
 enum cipher_flags {
 	CRYPT_MODE_INTEGRITY_AEAD,	/* Use authenticated mode for cihper */
-	CRYPT_MODE_INTEGRITY_HMAC,	/* Compose authenticated mode from normal mode and HMAC */
 };
 
 /*
@@ -876,16 +875,11 @@ static bool crypt_integrity_hmac(struct crypt_config *cc)
 	return crypt_integrity_aead(cc) && cc->key_mac_size;
 }
 
-static bool crypt_integrity_mode(struct crypt_config *cc)
-{
-	return crypt_integrity_aead(cc);
-}
-
 /* Get sg containing data */
 static struct scatterlist *crypt_get_sg_data(struct crypt_config *cc,
 					     struct scatterlist *sg)
 {
-	if (unlikely(crypt_integrity_mode(cc)))
+	if (unlikely(crypt_integrity_aead(cc)))
 		return &sg[2];
 
 	return sg;
@@ -936,7 +930,7 @@ static int crypt_integrity_ctr(struct crypt_config *cc, struct dm_target *ti)
 		return -EINVAL;
 	}
 
-	if (crypt_integrity_mode(cc)) {
+	if (crypt_integrity_aead(cc)) {
 		cc->integrity_tag_size = cc->on_disk_tag_size - cc->integrity_iv_size;
 		DMINFO("Integrity AEAD, tag size %u, IV size %u.",
 		       cc->integrity_tag_size, cc->integrity_iv_size);
@@ -990,7 +984,7 @@ static void *req_of_dmreq(struct crypt_config *cc, struct dm_crypt_request *dmre
 static u8 *iv_of_dmreq(struct crypt_config *cc,
 		       struct dm_crypt_request *dmreq)
 {
-	if (crypt_integrity_mode(cc))
+	if (crypt_integrity_aead(cc))
 		return (u8 *)ALIGN((unsigned long)(dmreq + 1),
 			crypto_aead_alignmask(any_tfm_aead(cc)) + 1);
 	else
@@ -1235,7 +1229,7 @@ static void crypt_alloc_req_aead(struct crypt_config *cc,
 static void crypt_alloc_req(struct crypt_config *cc,
 			    struct convert_context *ctx)
 {
-	if (crypt_integrity_mode(cc))
+	if (crypt_integrity_aead(cc))
 		crypt_alloc_req_aead(cc, ctx);
 	else
 		crypt_alloc_req_skcipher(cc, ctx);
@@ -1261,7 +1255,7 @@ static void crypt_free_req_aead(struct crypt_config *cc,
 
 static void crypt_free_req(struct crypt_config *cc, void *req, struct bio *base_bio)
 {
-	if (crypt_integrity_mode(cc))
+	if (crypt_integrity_aead(cc))
 		crypt_free_req_aead(cc, req, base_bio);
 	else
 		crypt_free_req_skcipher(cc, req, base_bio);
@@ -1284,7 +1278,7 @@ static int crypt_convert(struct crypt_config *cc,
 
 		atomic_inc(&ctx->cc_pending);
 
-		if (crypt_integrity_mode(cc))
+		if (crypt_integrity_aead(cc))
 			r = crypt_convert_block_aead(cc, ctx, ctx->r.req_aead, tag_offset);
 		else
 			r = crypt_convert_block_skcipher(cc, ctx, ctx->r.req, tag_offset);
@@ -1849,7 +1843,7 @@ static void crypt_free_tfms_skcipher(struct crypt_config *cc)
 
 static void crypt_free_tfms(struct crypt_config *cc)
 {
-	if (crypt_integrity_mode(cc))
+	if (crypt_integrity_aead(cc))
 		crypt_free_tfms_aead(cc);
 	else
 		crypt_free_tfms_skcipher(cc);
@@ -1897,7 +1891,7 @@ static int crypt_alloc_tfms_aead(struct crypt_config *cc, char *ciphermode)
 
 static int crypt_alloc_tfms(struct crypt_config *cc, char *ciphermode)
 {
-	if (crypt_integrity_mode(cc))
+	if (crypt_integrity_aead(cc))
 		return crypt_alloc_tfms_aead(cc, ciphermode);
 	else
 		return crypt_alloc_tfms_skcipher(cc, ciphermode);
@@ -2184,7 +2178,7 @@ static int crypt_ctr_ivmode(struct dm_target *ti, const char *ivmode)
 {
 	struct crypt_config *cc = ti->private;
 
-	if (crypt_integrity_mode(cc))
+	if (crypt_integrity_aead(cc))
 		cc->iv_size = crypto_aead_ivsize(any_tfm_aead(cc));
 	else
 		cc->iv_size = crypto_skcipher_ivsize(any_tfm(cc));
@@ -2376,7 +2370,7 @@ static int crypt_ctr_cipher_old(struct dm_target *ti, char *cipher_in, char *key
 	int ret = -EINVAL;
 	char dummy;
 
-	if (strchr(cipher_in, '(')) {
+	if (strchr(cipher_in, '(') || crypt_integrity_aead(cc)) {
 		ti->error = "Bad cipher specification";
 		return -EINVAL;
 	}
@@ -2543,15 +2537,6 @@ static int crypt_ctr_optional(struct dm_target *ti, unsigned int argc, char **ar
 			}
 			if (!strcasecmp(sval, "aead")) {
 				set_bit(CRYPT_MODE_INTEGRITY_AEAD, &cc->cipher_flags);
-			} else  if (!strncasecmp(sval, "hmac(", strlen("hmac("))) {
-				struct crypto_ahash *hmac_tfm = crypto_alloc_ahash(sval, 0, 0);
-				if (IS_ERR(hmac_tfm)) {
-					ti->error = "Error initializing HMAC integrity hash.";
-					return PTR_ERR(hmac_tfm);
-				}
-				cc->key_mac_size = crypto_ahash_digestsize(hmac_tfm);
-				crypto_free_ahash(hmac_tfm);
-				set_bit(CRYPT_MODE_INTEGRITY_HMAC, &cc->cipher_flags);
 			} else  if (strcasecmp(sval, "none")) {
 				ti->error = "Unknown integrity profile";
 				return -EINVAL;
@@ -2614,7 +2599,7 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	if (ret < 0)
 		goto bad;
 
-	if (crypt_integrity_mode(cc)) {
+	if (crypt_integrity_aead(cc)) {
 		cc->dmreq_start = sizeof(struct aead_request);
 		cc->dmreq_start += crypto_aead_reqsize(any_tfm_aead(cc));
 		align_mask = crypto_aead_alignmask(any_tfm_aead(cc));
@@ -2691,7 +2676,7 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	}
 	cc->start = tmpll;
 
-	if (crypt_integrity_mode(cc) || cc->integrity_iv_size) {
+	if (crypt_integrity_aead(cc) || cc->integrity_iv_size) {
 		ret = crypt_integrity_ctr(cc, ti);
 		if (ret)
 			goto bad;
@@ -2789,7 +2774,7 @@ static int crypt_map(struct dm_target *ti, struct bio *bio)
 		}
 	}
 
-	if (crypt_integrity_mode(cc))
+	if (crypt_integrity_aead(cc))
 		io->ctx.r.req_aead = (struct aead_request *)(io + 1);
 	else
 		io->ctx.r.req = (struct skcipher_request *)(io + 1);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 7/7] dm-crypt: optionally support larger encryption sector size
  2017-01-04 19:23 [RFC PATCH 0/4] Data integrity protection with dm-integrity and dm-crypt Milan Broz
                   ` (10 preceding siblings ...)
  2017-03-16 14:39 ` [PATCH 6/7] dm-crypt: Remove obsolete integrity_mode function Milan Broz
@ 2017-03-16 14:39 ` Milan Broz
  11 siblings, 0 replies; 14+ messages in thread
From: Milan Broz @ 2017-03-16 14:39 UTC (permalink / raw)
  To: dm-devel; +Cc: Milan Broz

This patch adds an optional "sector_size" parameter that specifies the
encryption sector size (the atomic unit of block device encryption).

The parameter can be in the range of 512 to 4096 bytes and must be a power
of two. For compatibility reasons, the maximal IO must fit into a single
page, so the upper limit is set to the minimal possible page size (4096 bytes).

NOTE: such a device cannot yet be handled directly by cryptsetup
if this parameter is set.

The IV for a sector is calculated from the 512-byte sector offset
unless the iv_large_sectors option is used.

Test script using dmsetup:

  DEV="/dev/sdb"
  DEV_SIZE=$(blockdev --getsz $DEV)
  KEY="9c1185a5c5e9fc54612808977ee8f548b2258d31ddadef707ba62c166051b9e3cd0294c27515f2bccee924e8823ca6e124b8fc3167ed478bca702babe4e130ac"
  BLOCK_SIZE=4096

  dmsetup create test_crypt --table "0 $DEV_SIZE crypt aes-xts-plain64 $KEY 0 $DEV 0 1 sector_size:$BLOCK_SIZE"
  #dmsetup table --showkeys test_crypt

Signed-off-by: Milan Broz <gmazyland@gmail.com>
---
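A variant of the test script above exercising both new options (a sketch
with the same example key and device assumptions) might end with:

  dmsetup create test_crypt --table "0 $DEV_SIZE crypt aes-xts-plain64 \
      $KEY 0 $DEV 0 2 sector_size:$BLOCK_SIZE iv_large_sectors"
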
 Documentation/device-mapper/dm-crypt.txt |  14 +++++
 drivers/md/dm-crypt.c                    | 105 ++++++++++++++++++++++++-------
 2 files changed, 96 insertions(+), 23 deletions(-)

diff --git a/Documentation/device-mapper/dm-crypt.txt b/Documentation/device-mapper/dm-crypt.txt
index 8140b71f3c54..3b3e1de21c9c 100644
--- a/Documentation/device-mapper/dm-crypt.txt
+++ b/Documentation/device-mapper/dm-crypt.txt
@@ -122,6 +122,20 @@ integrity:<bytes>:<type>
     integrity for the encrypted device. The additional space is then
     used for storing authentication tag (and persistent IV if needed).
 
+sector_size:<bytes>
+    Use <bytes> as the encryption unit instead of 512-byte sectors.
+    This option can be in the range 512 - 4096 bytes and must be a power of two.
+    The virtual device will announce this size as minimal IO and logical sector size.
+
+iv_large_sectors
+   IV generators will use the sector number counted in <sector_size> units
+   instead of the default 512-byte sectors.
+
+   For example, if <sector_size> is 4096 bytes, the plain64 IV for the
+   second sector will be 8 (without the flag) and 1 if iv_large_sectors is
+   present. The <iv_offset> must be a multiple of <sector_size> (in 512-byte
+   units) if this flag is specified.
+
 Example scripts
 ===============
 LUKS (Linux Unified Key Setup) is now the preferred way to set up disk
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 270adba14717..83fa03a2e60b 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -129,6 +129,7 @@ enum flags { DM_CRYPT_SUSPENDED, DM_CRYPT_KEY_VALID,
 
 enum cipher_flags {
 	CRYPT_MODE_INTEGRITY_AEAD,	/* Use authenticated mode for cihper */
+	CRYPT_IV_LARGE_SECTORS,		/* Calculate IV from sector_size, not 512B sectors */
 };
 
 /*
@@ -171,6 +172,7 @@ struct crypt_config {
 	} iv_gen_private;
 	sector_t iv_offset;
 	unsigned int iv_size;
+	unsigned int sector_size;
 
 	/* ESSIV: struct crypto_cipher *essiv_tfm */
 	void *iv_private;
@@ -524,6 +526,11 @@ static int crypt_iv_lmk_ctr(struct crypt_config *cc, struct dm_target *ti,
 {
 	struct iv_lmk_private *lmk = &cc->iv_gen_private.lmk;
 
+	if (cc->sector_size != (1 << SECTOR_SHIFT)) {
+		ti->error = "Unsupported sector size for LMK";
+		return -EINVAL;
+	}
+
 	lmk->hash_tfm = crypto_alloc_shash("md5", 0, 0);
 	if (IS_ERR(lmk->hash_tfm)) {
 		ti->error = "Error initializing LMK hash";
@@ -677,6 +684,11 @@ static int crypt_iv_tcw_ctr(struct crypt_config *cc, struct dm_target *ti,
 {
 	struct iv_tcw_private *tcw = &cc->iv_gen_private.tcw;
 
+	if (cc->sector_size != (1 << SECTOR_SHIFT)) {
+		ti->error = "Unsupported sector size for TCW";
+		return -EINVAL;
+	}
+
 	if (cc->key_size <= (cc->iv_size + TCW_WHITENING_SIZE)) {
 		ti->error = "Wrong key size for TCW";
 		return -EINVAL;
@@ -1037,15 +1049,20 @@ static int crypt_convert_block_aead(struct crypt_config *cc,
 	struct bio_vec bv_in = bio_iter_iovec(ctx->bio_in, ctx->iter_in);
 	struct bio_vec bv_out = bio_iter_iovec(ctx->bio_out, ctx->iter_out);
 	struct dm_crypt_request *dmreq;
-	unsigned int data_len = 1 << SECTOR_SHIFT;
 	u8 *iv, *org_iv, *tag_iv, *tag;
 	uint64_t *sector;
 	int r = 0;
 
 	BUG_ON(cc->integrity_iv_size && cc->integrity_iv_size != cc->iv_size);
 
+	/* Reject unexpected unaligned bio. */
+	if (unlikely(bv_in.bv_offset & (cc->sector_size - 1)))
+		return -EIO;
+
 	dmreq = dmreq_of_req(cc, req);
 	dmreq->iv_sector = ctx->cc_sector;
+	if (test_bit(CRYPT_IV_LARGE_SECTORS, &cc->cipher_flags))
+		sector_div(dmreq->iv_sector, cc->sector_size >> SECTOR_SHIFT);
 	dmreq->ctx = ctx;
 
 	*org_tag_of_dmreq(cc, dmreq) = tag_offset;
@@ -1066,13 +1083,13 @@ static int crypt_convert_block_aead(struct crypt_config *cc,
 	sg_init_table(dmreq->sg_in, 4);
 	sg_set_buf(&dmreq->sg_in[0], sector, sizeof(uint64_t));
 	sg_set_buf(&dmreq->sg_in[1], org_iv, cc->iv_size);
-	sg_set_page(&dmreq->sg_in[2], bv_in.bv_page, data_len, bv_in.bv_offset);
+	sg_set_page(&dmreq->sg_in[2], bv_in.bv_page, cc->sector_size, bv_in.bv_offset);
 	sg_set_buf(&dmreq->sg_in[3], tag, cc->integrity_tag_size);
 
 	sg_init_table(dmreq->sg_out, 4);
 	sg_set_buf(&dmreq->sg_out[0], sector, sizeof(uint64_t));
 	sg_set_buf(&dmreq->sg_out[1], org_iv, cc->iv_size);
-	sg_set_page(&dmreq->sg_out[2], bv_out.bv_page, data_len, bv_out.bv_offset);
+	sg_set_page(&dmreq->sg_out[2], bv_out.bv_page, cc->sector_size, bv_out.bv_offset);
 	sg_set_buf(&dmreq->sg_out[3], tag, cc->integrity_tag_size);
 
 	if (cc->iv_gen_ops) {
@@ -1094,14 +1111,14 @@ static int crypt_convert_block_aead(struct crypt_config *cc,
 	aead_request_set_ad(req, sizeof(uint64_t) + cc->iv_size);
 	if (bio_data_dir(ctx->bio_in) == WRITE) {
 		aead_request_set_crypt(req, dmreq->sg_in, dmreq->sg_out,
-				       data_len, iv);
+				       cc->sector_size, iv);
 		r = crypto_aead_encrypt(req);
 		if (cc->integrity_tag_size + cc->integrity_iv_size != cc->on_disk_tag_size)
 			memset(tag + cc->integrity_tag_size + cc->integrity_iv_size, 0,
 			       cc->on_disk_tag_size - (cc->integrity_tag_size + cc->integrity_iv_size));
 	} else {
 		aead_request_set_crypt(req, dmreq->sg_in, dmreq->sg_out,
-				       data_len + cc->integrity_tag_size, iv);
+				       cc->sector_size + cc->integrity_tag_size, iv);
 		r = crypto_aead_decrypt(req);
 	}
 
@@ -1112,8 +1129,8 @@ static int crypt_convert_block_aead(struct crypt_config *cc,
 	if (!r && cc->iv_gen_ops && cc->iv_gen_ops->post)
 		r = cc->iv_gen_ops->post(cc, org_iv, dmreq);
 
-	bio_advance_iter(ctx->bio_in, &ctx->iter_in, data_len);
-	bio_advance_iter(ctx->bio_out, &ctx->iter_out, data_len);
+	bio_advance_iter(ctx->bio_in, &ctx->iter_in, cc->sector_size);
+	bio_advance_iter(ctx->bio_out, &ctx->iter_out, cc->sector_size);
 
 	return r;
 }
@@ -1127,13 +1144,18 @@ static int crypt_convert_block_skcipher(struct crypt_config *cc,
 	struct bio_vec bv_out = bio_iter_iovec(ctx->bio_out, ctx->iter_out);
 	struct scatterlist *sg_in, *sg_out;
 	struct dm_crypt_request *dmreq;
-	unsigned int data_len = 1 << SECTOR_SHIFT;
 	u8 *iv, *org_iv, *tag_iv;
 	uint64_t *sector;
 	int r = 0;
 
+	/* Reject unexpected unaligned bio. */
+	if (unlikely(bv_in.bv_offset & (cc->sector_size - 1)))
+		return -EIO;
+
 	dmreq = dmreq_of_req(cc, req);
 	dmreq->iv_sector = ctx->cc_sector;
+	if (test_bit(CRYPT_IV_LARGE_SECTORS, &cc->cipher_flags))
+		sector_div(dmreq->iv_sector, cc->sector_size >> SECTOR_SHIFT);
 	dmreq->ctx = ctx;
 
 	*org_tag_of_dmreq(cc, dmreq) = tag_offset;
@@ -1150,10 +1172,10 @@ static int crypt_convert_block_skcipher(struct crypt_config *cc,
 	sg_out = &dmreq->sg_out[0];
 
 	sg_init_table(sg_in, 1);
-	sg_set_page(sg_in, bv_in.bv_page, data_len, bv_in.bv_offset);
+	sg_set_page(sg_in, bv_in.bv_page, cc->sector_size, bv_in.bv_offset);
 
 	sg_init_table(sg_out, 1);
-	sg_set_page(sg_out, bv_out.bv_page, data_len, bv_out.bv_offset);
+	sg_set_page(sg_out, bv_out.bv_page, cc->sector_size, bv_out.bv_offset);
 
 	if (cc->iv_gen_ops) {
 		/* For READs use IV stored in integrity metadata */
@@ -1171,7 +1193,7 @@ static int crypt_convert_block_skcipher(struct crypt_config *cc,
 		memcpy(iv, org_iv, cc->iv_size);
 	}
 
-	skcipher_request_set_crypt(req, sg_in, sg_out, data_len, iv);
+	skcipher_request_set_crypt(req, sg_in, sg_out, cc->sector_size, iv);
 
 	if (bio_data_dir(ctx->bio_in) == WRITE)
 		r = crypto_skcipher_encrypt(req);
@@ -1181,8 +1203,8 @@ static int crypt_convert_block_skcipher(struct crypt_config *cc,
 	if (!r && cc->iv_gen_ops && cc->iv_gen_ops->post)
 		r = cc->iv_gen_ops->post(cc, org_iv, dmreq);
 
-	bio_advance_iter(ctx->bio_in, &ctx->iter_in, data_len);
-	bio_advance_iter(ctx->bio_out, &ctx->iter_out, data_len);
+	bio_advance_iter(ctx->bio_in, &ctx->iter_in, cc->sector_size);
+	bio_advance_iter(ctx->bio_out, &ctx->iter_out, cc->sector_size);
 
 	return r;
 }
@@ -1268,6 +1290,7 @@ static int crypt_convert(struct crypt_config *cc,
 			 struct convert_context *ctx)
 {
 	unsigned int tag_offset = 0;
+	unsigned int sector_step = cc->sector_size / (1 << SECTOR_SHIFT);
 	int r;
 
 	atomic_set(&ctx->cc_pending, 1);
@@ -1275,7 +1298,6 @@ static int crypt_convert(struct crypt_config *cc,
 	while (ctx->iter_in.bi_size && ctx->iter_out.bi_size) {
 
 		crypt_alloc_req(cc, ctx);
-
 		atomic_inc(&ctx->cc_pending);
 
 		if (crypt_integrity_aead(cc))
@@ -1298,16 +1320,16 @@ static int crypt_convert(struct crypt_config *cc,
 		 */
 		case -EINPROGRESS:
 			ctx->r.req = NULL;
-			ctx->cc_sector++;
-			tag_offset++;
+			ctx->cc_sector += sector_step;
+			tag_offset += sector_step;
 			continue;
 		/*
 		 * The request was already processed (synchronously).
 		 */
 		case 0:
 			atomic_dec(&ctx->cc_pending);
-			ctx->cc_sector++;
-			tag_offset++;
+			ctx->cc_sector += sector_step;
+			tag_offset += sector_step;
 			cond_resched();
 			continue;
 		/*
@@ -2499,10 +2521,11 @@ static int crypt_ctr_optional(struct dm_target *ti, unsigned int argc, char **ar
 	struct crypt_config *cc = ti->private;
 	struct dm_arg_set as;
 	static struct dm_arg _args[] = {
-		{0, 3, "Invalid number of feature args"},
+		{0, 6, "Invalid number of feature args"},
 	};
 	unsigned int opt_params, val;
 	const char *opt_string, *sval;
+	char dummy;
 	int ret;
 
 	/* Optional parameters */
@@ -2545,7 +2568,16 @@ static int crypt_ctr_optional(struct dm_target *ti, unsigned int argc, char **ar
 			cc->cipher_auth = kstrdup(sval, GFP_KERNEL);
 			if (!cc->cipher_auth)
 				return -ENOMEM;
-		}  else {
+		} else if (sscanf(opt_string, "sector_size:%u%c", &cc->sector_size, &dummy) == 1) {
+			if (cc->sector_size < (1 << SECTOR_SHIFT) ||
+			    cc->sector_size > 4096 ||
+			    (1 << ilog2(cc->sector_size) != cc->sector_size)) {
+				ti->error = "Invalid feature value for sector_size";
+				return -EINVAL;
+			}
+		} else if (!strcasecmp(opt_string, "iv_large_sectors"))
+			set_bit(CRYPT_IV_LARGE_SECTORS, &cc->cipher_flags);
+		else {
 			ti->error = "Invalid feature arguments";
 			return -EINVAL;
 		}
@@ -2585,6 +2617,7 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 		return -ENOMEM;
 	}
 	cc->key_size = key_size;
+	cc->sector_size = (1 << SECTOR_SHIFT);
 
 	ti->private = cc;
 
@@ -2657,7 +2690,8 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	mutex_init(&cc->bio_alloc_lock);
 
 	ret = -EINVAL;
-	if (sscanf(argv[2], "%llu%c", &tmpll, &dummy) != 1) {
+	if ((sscanf(argv[2], "%llu%c", &tmpll, &dummy) != 1) ||
+	    (tmpll & ((cc->sector_size >> SECTOR_SHIFT) - 1))) {
 		ti->error = "Invalid iv_offset sector";
 		goto bad;
 	}
@@ -2758,6 +2792,16 @@ static int crypt_map(struct dm_target *ti, struct bio *bio)
 	    (bio_data_dir(bio) == WRITE || cc->on_disk_tag_size))
 		dm_accept_partial_bio(bio, ((BIO_MAX_PAGES << PAGE_SHIFT) >> SECTOR_SHIFT));
 
+	/*
+	 * Ensure that bio is a multiple of internal sector encryption size
+	 * and is aligned to this size as defined in IO hints.
+	 */
+	if (unlikely((bio->bi_iter.bi_sector & ((cc->sector_size >> SECTOR_SHIFT) - 1)) != 0))
+		return -EIO;
+
+	if (unlikely(bio->bi_iter.bi_size & (cc->sector_size - 1)))
+		return -EIO;
+
 	io = dm_per_bio_data(bio, cc->per_bio_data_size);
 	crypt_io_init(io, cc, bio, dm_target_offset(ti, bio->bi_iter.bi_sector));
 
@@ -2765,12 +2809,13 @@ static int crypt_map(struct dm_target *ti, struct bio *bio)
 		unsigned tag_len = cc->on_disk_tag_size * bio_sectors(bio);
 
 		if (unlikely(tag_len > KMALLOC_MAX_SIZE) ||
-		    unlikely(!(io->integrity_metadata = kmalloc(tag_len,
+		    unlikely(!(io->integrity_metadata = kzalloc(tag_len,
 				GFP_NOIO | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN)))) {
 			if (bio_sectors(bio) > cc->tag_pool_max_sectors)
 				dm_accept_partial_bio(bio, cc->tag_pool_max_sectors);
 			io->integrity_metadata = mempool_alloc(cc->tag_pool, GFP_NOIO);
 			io->integrity_metadata_from_pool = true;
+			memset(io->integrity_metadata, 0, cc->tag_pool_max_sectors * (1 << SECTOR_SHIFT));
 		}
 	}
 
@@ -2818,6 +2863,8 @@ static void crypt_status(struct dm_target *ti, status_type_t type,
 		num_feature_args += !!ti->num_discard_bios;
 		num_feature_args += test_bit(DM_CRYPT_SAME_CPU, &cc->flags);
 		num_feature_args += test_bit(DM_CRYPT_NO_OFFLOAD, &cc->flags);
+		num_feature_args += (cc->sector_size != (1 << SECTOR_SHIFT)) ? 1 : 0;
+		num_feature_args += test_bit(CRYPT_IV_LARGE_SECTORS, &cc->cipher_flags);
 		if (cc->on_disk_tag_size)
 			num_feature_args++;
 		if (num_feature_args) {
@@ -2830,6 +2877,10 @@ static void crypt_status(struct dm_target *ti, status_type_t type,
 				DMEMIT(" submit_from_crypt_cpus");
 			if (cc->on_disk_tag_size)
 				DMEMIT(" integrity:%u:%s", cc->on_disk_tag_size, cc->cipher_auth);
+			if (cc->sector_size != (1 << SECTOR_SHIFT))
+				DMEMIT(" sector_size:%d", cc->sector_size);
+			if (test_bit(CRYPT_IV_LARGE_SECTORS, &cc->cipher_flags))
+				DMEMIT(" iv_large_sectors");
 		}
 
 		break;
@@ -2919,6 +2970,8 @@ static int crypt_iterate_devices(struct dm_target *ti,
 
 static void crypt_io_hints(struct dm_target *ti, struct queue_limits *limits)
 {
+	struct crypt_config *cc = ti->private;
+
 	/*
 	 * Unfortunate constraint that is required to avoid the potential
 	 * for exceeding underlying device's max_segments limits -- due to
@@ -2926,11 +2979,17 @@ static void crypt_io_hints(struct dm_target *ti, struct queue_limits *limits)
 	 * bio that are not as physically contiguous as the original bio.
 	 */
 	limits->max_segment_size = PAGE_SIZE;
+
+	if (cc->sector_size != (1 << SECTOR_SHIFT)) {
+		limits->logical_block_size = cc->sector_size;
+		limits->physical_block_size = cc->sector_size;
+		blk_limits_io_min(limits, cc->sector_size);
+	}
 }
 
 static struct target_type crypt_target = {
 	.name   = "crypt",
-	.version = {1, 16, 0},
+	.version = {1, 17, 0},
 	.module = THIS_MODULE,
 	.ctr    = crypt_ctr,
 	.dtr    = crypt_dtr,
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/7] Data integrity protection with dm-integrity and dm-crypt
  2017-03-16 14:39 ` [PATCH 0/7] Data integrity protection with dm-integrity and dm-crypt Milan Broz
@ 2017-03-16 19:12   ` Mike Snitzer
  0 siblings, 0 replies; 14+ messages in thread
From: Mike Snitzer @ 2017-03-16 19:12 UTC (permalink / raw)
  To: Milan Broz; +Cc: dm-devel

On Thu, Mar 16 2017 at 10:39am -0400,
Milan Broz <gmazyland@gmail.com> wrote:

> These are additional patches on top of the previously submitted integrity
> protection commits, based on the for-next branch of Mike's DM tree.
> 
> Most of them can probably be folded in; the patches are separated for better
> readability.
> 
> The last patch, for large encryption sectors, is a new feature, but because
> it depends on all the previous patches it is posted in this series.

I've picked all these up and staged in linux-next and dm-4.12, see:
https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.12

(you'll note that I folded them and tweaked headers accordingly)

^ permalink raw reply	[flat|nested] 14+ messages in thread
