* [PATCH 00/24] dm cache: proposed changes for v3.13 merge
@ 2013-10-24 18:30 Mike Snitzer
  2013-10-24 18:30 ` [PATCH 01/24] dm: nest targets used for testing under DM_TEST_TARGETS Mike Snitzer
                   ` (23 more replies)
  0 siblings, 24 replies; 49+ messages in thread
From: Mike Snitzer @ 2013-10-24 18:30 UTC (permalink / raw)
  To: dm-devel; +Cc: Morgan Mears, Heinz Mauelshagen, Joe Thornber, Mike Snitzer

The following patches have been published to the 'for-next' branch of
the device-mapper git repository:
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git

The patches marked as RFC (19-24) could use the most review.  But all
review would be appreciated.

Thanks,
Mike

Heinz Mauelshagen (11):
  dm cache policy mq: return NULL if mq->free list is empty in alloc_entry
  dm cache policy: variable hints support
  dm cache policy: have policy_writeback_work return -ENODATA by default
  dm cache: use is_write_io() in more places
  dm cache: use cell_defer() boolean argument consistently
  dm cache: log error message if dm_kcopyd_copy() fails
  dm cache: use a boolean when setting cache->quiescing
  dm cache: optimize commit_if_needed
  dm cache: add hints policy
  dm cache: add cache block invalidation API
  dm cache policy era: add cache block invalidation support

Joe Thornber (9):
  dm: nest targets used for testing under DM_TEST_TARGETS
  dm space map disk: optimise sm_disk_dec_block
  dm cache policy: remove return from void policy_remove_mapping
  dm cache policy mq: a few small fixes
  dm cache policy mq: implement writeback_work() and mq_{set,clear}_dirty()
  dm cache: be much more aggressive about promoting writes to discarded blocks
  dm cache metadata: return bool from __superblock_all_zeroes
  dm cache metadata: check the metadata version when reading the superblock
  dm cache: add passthrough mode

Mike Snitzer (1):
  dm table: print error on preresume failure

Morgan Mears (3):
  dm cache: support for stackable caching policies
  dm cache: add era policy shim
  dm cache: add trc policy shim

 drivers/md/Kconfig                             |  68 ++-
 drivers/md/Makefile                            |   9 +-
 drivers/md/dm-cache-metadata.c                 | 189 +++++-
 drivers/md/dm-cache-metadata.h                 |  28 +-
 drivers/md/dm-cache-policy-cleaner.c           |  10 +-
 drivers/md/dm-cache-policy-era.c               | 542 +++++++++++++++++
 drivers/md/dm-cache-policy-hints.c             | 769 +++++++++++++++++++++++++
 drivers/md/dm-cache-policy-internal.h          |  31 +-
 drivers/md/dm-cache-policy-mq.c                | 343 ++++++++---
 drivers/md/dm-cache-policy-trc.c               | 263 +++++++++
 drivers/md/dm-cache-policy.c                   |  66 ++-
 drivers/md/dm-cache-policy.h                   |  77 ++-
 drivers/md/dm-cache-shim-utils.c               | 217 +++++++
 drivers/md/dm-cache-shim-utils.h               |  73 +++
 drivers/md/dm-cache-stack-utils.c              | 239 ++++++++
 drivers/md/dm-cache-stack-utils.h              |  34 ++
 drivers/md/dm-cache-target.c                   | 417 ++++++++++++--
 drivers/md/dm-table.c                          |   5 +-
 drivers/md/persistent-data/dm-space-map-disk.c |  18 +-
 19 files changed, 3167 insertions(+), 231 deletions(-)
 create mode 100644 drivers/md/dm-cache-policy-era.c
 create mode 100644 drivers/md/dm-cache-policy-hints.c
 create mode 100644 drivers/md/dm-cache-policy-trc.c
 create mode 100644 drivers/md/dm-cache-shim-utils.c
 create mode 100644 drivers/md/dm-cache-shim-utils.h
 create mode 100644 drivers/md/dm-cache-stack-utils.c
 create mode 100644 drivers/md/dm-cache-stack-utils.h

-- 
1.8.1.4

* [PATCH 01/24] dm: nest targets used for testing under DM_TEST_TARGETS
  2013-10-24 18:30 [PATCH 00/24] dm cache: proposed changes for v3.13 merge Mike Snitzer
@ 2013-10-24 18:30 ` Mike Snitzer
  2013-10-24 23:17   ` Alasdair G Kergon
  2013-10-24 23:51   ` Alasdair G Kergon
  2013-10-24 18:30 ` [PATCH 02/24] dm space map disk: optimise sm_disk_dec_block Mike Snitzer
                   ` (22 subsequent siblings)
  23 siblings, 2 replies; 49+ messages in thread
From: Mike Snitzer @ 2013-10-24 18:30 UTC (permalink / raw)
  To: dm-devel; +Cc: Morgan Mears, Heinz Mauelshagen, Joe Thornber, Mike Snitzer

From: Joe Thornber <ejt@redhat.com>

Both the flakey and delay targets are generally used for testing.  Other
test targets (and cache policies) will be introduced in the future.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/Kconfig | 36 +++++++++++++++++++++---------------
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index 30b426e..816e023 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -371,27 +371,12 @@ config DM_MULTIPATH_ST
 
 	  If unsure, say N.
 
-config DM_DELAY
-	tristate "I/O delaying target"
-	depends on BLK_DEV_DM
-	---help---
-	A target that delays reads and/or writes and can send
-	them to different devices.  Useful for testing.
-
-	If unsure, say N.
-
 config DM_UEVENT
 	bool "DM uevents"
 	depends on BLK_DEV_DM
 	---help---
 	Generate udev events for DM events.
 
-config DM_FLAKEY
-       tristate "Flakey target"
-       depends on BLK_DEV_DM
-       ---help---
-         A target that intermittently fails I/O for debugging purposes.
-
 config DM_VERITY
 	tristate "Verity target support"
 	depends on BLK_DEV_DM
@@ -426,4 +411,25 @@ config DM_SWITCH
 
 	  If unsure, say N.
 
+config DM_TEST_TARGETS
+       bool "DM test targets"
+       depends on BLK_DEV_DM
+       ---help---
+         Targets that are only useful for testing.
+
+config DM_FLAKEY
+       tristate "Flakey target (EXPERIMENTAL)"
+       depends on DM_TEST_TARGETS
+       ---help---
+         A target that intermittently fails I/O for debugging purposes.
+
+config DM_DELAY
+	tristate "I/O delaying target (EXPERIMENTAL)"
+	depends on DM_TEST_TARGETS
+	---help---
+	A target that delays reads and/or writes and can send
+	them to different devices.  Useful for testing.
+
+	If unsure, say N.
+
 endif # MD
-- 
1.8.1.4

* [PATCH 02/24] dm space map disk: optimise sm_disk_dec_block
  2013-10-24 18:30 [PATCH 00/24] dm cache: proposed changes for v3.13 merge Mike Snitzer
  2013-10-24 18:30 ` [PATCH 01/24] dm: nest targets used for testing under DM_TEST_TARGETS Mike Snitzer
@ 2013-10-24 18:30 ` Mike Snitzer
  2013-10-24 18:30 ` [PATCH 03/24] dm cache policy: remove return from void policy_remove_mapping Mike Snitzer
                   ` (21 subsequent siblings)
  23 siblings, 0 replies; 49+ messages in thread
From: Mike Snitzer @ 2013-10-24 18:30 UTC (permalink / raw)
  To: dm-devel; +Cc: Morgan Mears, Heinz Mauelshagen, Joe Thornber, Mike Snitzer

From: Joe Thornber <ejt@redhat.com>

Don't waste time spotting blocks that have been allocated and then freed
in the same transaction.

The extra lookup is expensive, and I don't think it really gives us much.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/persistent-data/dm-space-map-disk.c | 18 +-----------------
 1 file changed, 1 insertion(+), 17 deletions(-)

diff --git a/drivers/md/persistent-data/dm-space-map-disk.c b/drivers/md/persistent-data/dm-space-map-disk.c
index e735a6d..cfbf961 100644
--- a/drivers/md/persistent-data/dm-space-map-disk.c
+++ b/drivers/md/persistent-data/dm-space-map-disk.c
@@ -140,26 +140,10 @@ static int sm_disk_inc_block(struct dm_space_map *sm, dm_block_t b)
 
 static int sm_disk_dec_block(struct dm_space_map *sm, dm_block_t b)
 {
-	int r;
-	uint32_t old_count;
 	enum allocation_event ev;
 	struct sm_disk *smd = container_of(sm, struct sm_disk, sm);
 
-	r = sm_ll_dec(&smd->ll, b, &ev);
-	if (!r && (ev == SM_FREE)) {
-		/*
-		 * It's only free if it's also free in the last
-		 * transaction.
-		 */
-		r = sm_ll_lookup(&smd->old_ll, b, &old_count);
-		if (r)
-			return r;
-
-		if (!old_count)
-			smd->nr_allocated_this_transaction--;
-	}
-
-	return r;
+	return sm_ll_dec(&smd->ll, b, &ev);
 }
 
 static int sm_disk_new_block(struct dm_space_map *sm, dm_block_t *b)
-- 
1.8.1.4

* [PATCH 03/24] dm cache policy: remove return from void policy_remove_mapping
  2013-10-24 18:30 [PATCH 00/24] dm cache: proposed changes for v3.13 merge Mike Snitzer
  2013-10-24 18:30 ` [PATCH 01/24] dm: nest targets used for testing under DM_TEST_TARGETS Mike Snitzer
  2013-10-24 18:30 ` [PATCH 02/24] dm space map disk: optimise sm_disk_dec_block Mike Snitzer
@ 2013-10-24 18:30 ` Mike Snitzer
  2013-10-24 18:30 ` [PATCH 04/24] dm cache policy mq: a few small fixes Mike Snitzer
                   ` (20 subsequent siblings)
  23 siblings, 0 replies; 49+ messages in thread
From: Mike Snitzer @ 2013-10-24 18:30 UTC (permalink / raw)
  To: dm-devel; +Cc: Morgan Mears, Heinz Mauelshagen, Joe Thornber, Mike Snitzer

From: Joe Thornber <ejt@redhat.com>

No need to return from a void function.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-cache-policy-internal.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/dm-cache-policy-internal.h b/drivers/md/dm-cache-policy-internal.h
index 0928abd..a75f7e7 100644
--- a/drivers/md/dm-cache-policy-internal.h
+++ b/drivers/md/dm-cache-policy-internal.h
@@ -61,7 +61,7 @@ static inline int policy_writeback_work(struct dm_cache_policy *p,
 
 static inline void policy_remove_mapping(struct dm_cache_policy *p, dm_oblock_t oblock)
 {
-	return p->remove_mapping(p, oblock);
+	p->remove_mapping(p, oblock);
 }
 
 static inline void policy_force_mapping(struct dm_cache_policy *p,
-- 
1.8.1.4

* [PATCH 04/24] dm cache policy mq: a few small fixes
  2013-10-24 18:30 [PATCH 00/24] dm cache: proposed changes for v3.13 merge Mike Snitzer
                   ` (2 preceding siblings ...)
  2013-10-24 18:30 ` [PATCH 03/24] dm cache policy: remove return from void policy_remove_mapping Mike Snitzer
@ 2013-10-24 18:30 ` Mike Snitzer
  2013-10-24 18:30 ` [PATCH 05/24] dm cache policy mq: implement writeback_work() and mq_{set, clear}_dirty() Mike Snitzer
                   ` (19 subsequent siblings)
  23 siblings, 0 replies; 49+ messages in thread
From: Mike Snitzer @ 2013-10-24 18:30 UTC (permalink / raw)
  To: dm-devel; +Cc: Morgan Mears, Heinz Mauelshagen, Joe Thornber, Mike Snitzer

From: Joe Thornber <ejt@redhat.com>

Rename takeout_queue to concat_queue.

Fix a harmless bug in the mq policy's pop() function.  Currently pop()
always succeeds; with upcoming changes this won't be the case (see the
snippet below).

Fix a typo in the comment above the pre_cache_to_cache prototype.
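
For context, pop()'s callers later in the series check for the NULL
return; compare this fragment modelled on demote_cblock() from the next
patch:

	struct entry *demoted = pop(mq, &mq->cache_clean);

	if (!demoted)
		return -ENOSPC;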

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-cache-policy-mq.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/drivers/md/dm-cache-policy-mq.c b/drivers/md/dm-cache-policy-mq.c
index 4296155..0410a08 100644
--- a/drivers/md/dm-cache-policy-mq.c
+++ b/drivers/md/dm-cache-policy-mq.c
@@ -311,7 +311,7 @@ struct mq_policy {
 
 /*----------------------------------------------------------------*/
 /* Free/alloc mq cache entry structures. */
-static void takeout_queue(struct list_head *lh, struct queue *q)
+static void concat_queue(struct list_head *lh, struct queue *q)
 {
 	unsigned level;
 
@@ -323,8 +323,8 @@ static void free_entries(struct mq_policy *mq)
 {
 	struct entry *e, *tmp;
 
-	takeout_queue(&mq->free, &mq->pre_cache);
-	takeout_queue(&mq->free, &mq->cache);
+	concat_queue(&mq->free, &mq->pre_cache);
+	concat_queue(&mq->free, &mq->cache);
 
 	list_for_each_entry_safe(e, tmp, &mq->free, list)
 		kmem_cache_free(mq_entry_cache, e);
@@ -531,14 +531,16 @@ static void del(struct mq_policy *mq, struct entry *e)
  */
 static struct entry *pop(struct mq_policy *mq, struct queue *q)
 {
-	struct entry *e = container_of(queue_pop(q), struct entry, list);
+	struct entry *e;
+	struct list_head *h = queue_pop(q);
 
-	if (e) {
-		hash_remove(e);
+	if (!h)
+		return NULL;
 
-		if (e->in_cache)
-			free_cblock(mq, e->cblock);
-	}
+	e = container_of(h, struct entry, list);
+	hash_remove(e);
+	if (e->in_cache)
+		free_cblock(mq, e->cblock);
 
 	return e;
 }
@@ -697,7 +699,7 @@ static int cache_entry_found(struct mq_policy *mq,
 }
 
 /*
- * Moves and entry from the pre_cache to the cache.  The main work is
+ * Moves an entry from the pre_cache to the cache.  The main work is
  * finding which cache block to use.
  */
 static int pre_cache_to_cache(struct mq_policy *mq, struct entry *e,
-- 
1.8.1.4

* [PATCH 05/24] dm cache policy mq: implement writeback_work() and mq_{set, clear}_dirty()
  2013-10-24 18:30 [PATCH 00/24] dm cache: proposed changes for v3.13 merge Mike Snitzer
                   ` (3 preceding siblings ...)
  2013-10-24 18:30 ` [PATCH 04/24] dm cache policy mq: a few small fixes Mike Snitzer
@ 2013-10-24 18:30 ` Mike Snitzer
  2013-10-25 16:06   ` Alasdair G Kergon
  2013-10-24 18:30 ` [PATCH 06/24] dm cache policy mq: return NULL if mq->free list is empty in alloc_entry Mike Snitzer
                   ` (18 subsequent siblings)
  23 siblings, 1 reply; 49+ messages in thread
From: Mike Snitzer @ 2013-10-24 18:30 UTC (permalink / raw)
  To: dm-devel; +Cc: Morgan Mears, Heinz Mauelshagen, Joe Thornber, Mike Snitzer

From: Joe Thornber <ejt@redhat.com>

There are now two multiqueues for in-cache blocks: a clean one and a
dirty one.

writeback_work comes from the dirty queue; demotions come from the
clean one.

There are two benefits:
- Improved performance, since demoting a clean block is a no-op.
- The cache cleans itself when I/O load is light (see the sketch below).
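
As an illustration of how the split is consumed, a minimal sketch (the
function name is invented; policy_writeback_work() is the wrapper from
dm-cache-policy-internal.h and cache->policy follows the target code):

	static void writeback_some_dirty_blocks(struct cache *cache)
	{
		dm_oblock_t oblock;
		dm_cblock_t cblock;

		/*
		 * policy_writeback_work() fills in the pair to copy and
		 * returns -ENODATA once the dirty queue is empty.
		 */
		while (!policy_writeback_work(cache->policy, &oblock, &cblock))
			; /* kick off a copy of cblock back to oblock here */
	}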

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-cache-policy-mq.c | 140 ++++++++++++++++++++++++++++++++++------
 1 file changed, 122 insertions(+), 18 deletions(-)

diff --git a/drivers/md/dm-cache-policy-mq.c b/drivers/md/dm-cache-policy-mq.c
index 0410a08..108380d 100644
--- a/drivers/md/dm-cache-policy-mq.c
+++ b/drivers/md/dm-cache-policy-mq.c
@@ -224,6 +224,7 @@ struct entry {
 	 * FIXME: pack these better
 	 */
 	bool in_cache:1;
+	bool dirty:1;
 	unsigned hit_count;
 	unsigned generation;
 	unsigned tick;
@@ -238,13 +239,15 @@ struct mq_policy {
 	struct io_tracker tracker;
 
 	/*
-	 * We maintain two queues of entries.  The cache proper contains
-	 * the currently active mappings.  Whereas the pre_cache tracks
-	 * blocks that are being hit frequently and potential candidates
-	 * for promotion to the cache.
+	 * We maintain three queues of entries.  The cache proper,
+	 * consisting of a clean and dirty queue, contains the currently
+	 * active mappings.  Whereas the pre_cache tracks blocks that
+	 * are being hit frequently and potential candidates for promotion
+	 * to the cache.
 	 */
 	struct queue pre_cache;
-	struct queue cache;
+	struct queue cache_clean;
+	struct queue cache_dirty;
 
 	/*
 	 * Keeps track of time, incremented by the core.  We use this to
@@ -324,7 +327,8 @@ static void free_entries(struct mq_policy *mq)
 	struct entry *e, *tmp;
 
 	concat_queue(&mq->free, &mq->pre_cache);
-	concat_queue(&mq->free, &mq->cache);
+	concat_queue(&mq->free, &mq->cache_clean);
+	concat_queue(&mq->free, &mq->cache_dirty);
 
 	list_for_each_entry_safe(e, tmp, &mq->free, list)
 		kmem_cache_free(mq_entry_cache, e);
@@ -508,7 +512,8 @@ static void push(struct mq_policy *mq, struct entry *e)
 
 	if (e->in_cache) {
 		alloc_cblock(mq, e->cblock);
-		queue_push(&mq->cache, queue_level(e), &e->list);
+		queue_push(e->dirty ? &mq->cache_dirty : &mq->cache_clean,
+			   queue_level(e), &e->list);
 	} else
 		queue_push(&mq->pre_cache, queue_level(e), &e->list);
 }
@@ -580,7 +585,16 @@ static void check_generation(struct mq_policy *mq)
 		mq->generation++;
 
 		for (level = 0; level < NR_QUEUE_LEVELS && count < MAX_TO_AVERAGE; level++) {
-			head = mq->cache.qs + level;
+			head = mq->cache_clean.qs + level;
+			list_for_each_entry(e, head, list) {
+				nr++;
+				total += e->hit_count;
+
+				if (++count >= MAX_TO_AVERAGE)
+					break;
+			}
+
+			head = mq->cache_dirty.qs + level;
 			list_for_each_entry(e, head, list) {
 				nr++;
 				total += e->hit_count;
@@ -633,19 +647,28 @@ static void requeue_and_update_tick(struct mq_policy *mq, struct entry *e)
  * - set the hit count to a hard coded value other than 1, eg, is it better
  *   if it goes in at level 2?
  */
-static dm_cblock_t demote_cblock(struct mq_policy *mq, dm_oblock_t *oblock)
+static int demote_cblock(struct mq_policy *mq, dm_oblock_t *oblock, dm_cblock_t *cblock)
 {
-	dm_cblock_t result;
-	struct entry *demoted = pop(mq, &mq->cache);
+	struct entry *demoted = pop(mq, &mq->cache_clean);
 
-	BUG_ON(!demoted);
-	result = demoted->cblock;
+	if (!demoted)
+		/*
+		 * We could get a block from mq->cache_dirty, but that
+		 * would add extra latency to the triggering bio as it
+		 * waits for the writeback.  Better to not promote this
+		 * time and hope there's a clean block next time this block
+		 * is hit.
+		 */
+		return -ENOSPC;
+
+	*cblock = demoted->cblock;
 	*oblock = demoted->oblock;
 	demoted->in_cache = false;
+	demoted->dirty = false;
 	demoted->hit_count = 1;
 	push(mq, demoted);
 
-	return result;
+	return 0;
 }
 
 /*
@@ -705,11 +728,16 @@ static int cache_entry_found(struct mq_policy *mq,
 static int pre_cache_to_cache(struct mq_policy *mq, struct entry *e,
 			      struct policy_result *result)
 {
+	int r;
 	dm_cblock_t cblock;
 
 	if (find_free_cblock(mq, &cblock) == -ENOSPC) {
 		result->op = POLICY_REPLACE;
-		cblock = demote_cblock(mq, &result->old_oblock);
+		r = demote_cblock(mq, &result->old_oblock, &cblock);
+		if (r) {
+			result->op = POLICY_MISS;
+			return 0;
+		}
 	} else
 		result->op = POLICY_NEW;
 
@@ -717,6 +745,7 @@ static int pre_cache_to_cache(struct mq_policy *mq, struct entry *e,
 
 	del(mq, e);
 	e->in_cache = true;
+	e->dirty = false;
 	push(mq, e);
 
 	return 0;
@@ -760,6 +789,7 @@ static void insert_in_pre_cache(struct mq_policy *mq,
 	}
 
 	e->in_cache = false;
+	e->dirty = false;
 	e->oblock = oblock;
 	e->hit_count = 1;
 	e->generation = mq->generation;
@@ -787,6 +817,7 @@ static void insert_in_cache(struct mq_policy *mq, dm_oblock_t oblock,
 	e->oblock = oblock;
 	e->cblock = cblock;
 	e->in_cache = true;
+	e->dirty = false;
 	e->hit_count = 1;
 	e->generation = mq->generation;
 	push(mq, e);
@@ -917,6 +948,36 @@ static int mq_lookup(struct dm_cache_policy *p, dm_oblock_t oblock, dm_cblock_t
 	return r;
 }
 
+/* FIXME: can __mq_set_clear_dirty block? */
+static void __mq_set_clear_dirty(struct dm_cache_policy *p, dm_oblock_t oblock, bool set)
+{
+	struct mq_policy *mq = to_mq_policy(p);
+	struct entry *e;
+
+	mutex_lock(&mq->lock);
+	e = hash_lookup(mq, oblock);
+	if (!e)
+		DMWARN("__mq_set_clear_dirty called for a block that isn't in the cache");
+	else {
+		BUG_ON(!e->in_cache);
+
+		del(mq, e);
+		e->dirty = set;
+		push(mq, e);
+	}
+	mutex_unlock(&mq->lock);
+}
+
+static void mq_set_dirty(struct dm_cache_policy *p, dm_oblock_t oblock)
+{
+	__mq_set_clear_dirty(p, oblock, true);
+}
+
+static void mq_clear_dirty(struct dm_cache_policy *p, dm_oblock_t oblock)
+{
+	__mq_set_clear_dirty(p, oblock, false);
+}
+
 static int mq_load_mapping(struct dm_cache_policy *p,
 			   dm_oblock_t oblock, dm_cblock_t cblock,
 			   uint32_t hint, bool hint_valid)
@@ -931,6 +992,7 @@ static int mq_load_mapping(struct dm_cache_policy *p,
 	e->cblock = cblock;
 	e->oblock = oblock;
 	e->in_cache = true;
+	e->dirty = true;	/* this gets corrected in a minute */
 	e->hit_count = hint_valid ? hint : 1;
 	e->generation = mq->generation;
 	push(mq, e);
@@ -949,7 +1011,14 @@ static int mq_walk_mappings(struct dm_cache_policy *p, policy_walk_fn fn,
 	mutex_lock(&mq->lock);
 
 	for (level = 0; level < NR_QUEUE_LEVELS; level++)
-		list_for_each_entry(e, &mq->cache.qs[level], list) {
+		list_for_each_entry(e, &mq->cache_clean.qs[level], list) {
+			r = fn(context, e->cblock, e->oblock, e->hit_count);
+			if (r)
+				goto out;
+		}
+
+	for (level = 0; level < NR_QUEUE_LEVELS; level++)
+		list_for_each_entry(e, &mq->cache_dirty.qs[level], list) {
 			r = fn(context, e->cblock, e->oblock, e->hit_count);
 			if (r)
 				goto out;
@@ -974,9 +1043,39 @@ static void mq_remove_mapping(struct dm_cache_policy *p, dm_oblock_t oblock)
 
 	del(mq, e);
 	e->in_cache = false;
+	e->dirty = false;
+	push(mq, e);
+
+	mutex_unlock(&mq->lock);
+}
+
+static int __mq_writeback_work(struct mq_policy *mq, dm_oblock_t *oblock,
+			      dm_cblock_t *cblock)
+{
+	struct entry *e = pop(mq, &mq->cache_dirty);
+
+	if (!e)
+		return -ENODATA;
+
+	*oblock = e->oblock;
+	*cblock = e->cblock;
+	e->dirty = false;
 	push(mq, e);
 
+	return 0;
+}
+
+static int mq_writeback_work(struct dm_cache_policy *p, dm_oblock_t *oblock,
+			     dm_cblock_t *cblock)
+{
+	int r;
+	struct mq_policy *mq = to_mq_policy(p);
+
+	mutex_lock(&mq->lock);
+	r = __mq_writeback_work(mq, oblock, cblock);
 	mutex_unlock(&mq->lock);
+
+	return r;
 }
 
 static void force_mapping(struct mq_policy *mq,
@@ -988,6 +1087,7 @@ static void force_mapping(struct mq_policy *mq,
 
 	del(mq, e);
 	e->oblock = new_oblock;
+	e->dirty = true;
 	push(mq, e);
 }
 
@@ -1059,10 +1159,12 @@ static void init_policy_functions(struct mq_policy *mq)
 	mq->policy.destroy = mq_destroy;
 	mq->policy.map = mq_map;
 	mq->policy.lookup = mq_lookup;
+	mq->policy.set_dirty = mq_set_dirty;
+	mq->policy.clear_dirty = mq_clear_dirty;
 	mq->policy.load_mapping = mq_load_mapping;
 	mq->policy.walk_mappings = mq_walk_mappings;
 	mq->policy.remove_mapping = mq_remove_mapping;
-	mq->policy.writeback_work = NULL;
+	mq->policy.writeback_work = mq_writeback_work;
 	mq->policy.force_mapping = mq_force_mapping;
 	mq->policy.residency = mq_residency;
 	mq->policy.tick = mq_tick;
@@ -1095,7 +1197,9 @@ static struct dm_cache_policy *mq_create(dm_cblock_t cache_size,
 	mq->find_free_last_word = 0;
 
 	queue_init(&mq->pre_cache);
-	queue_init(&mq->cache);
+	queue_init(&mq->cache_clean);
+	queue_init(&mq->cache_dirty);
+
 	mq->generation_period = max((unsigned) from_cblock(cache_size), 1024U);
 
 	mq->nr_entries = 2 * from_cblock(cache_size);
-- 
1.8.1.4

* [PATCH 06/24] dm cache policy mq: return NULL if mq->free list is empty in alloc_entry
  2013-10-24 18:30 [PATCH 00/24] dm cache: proposed changes for v3.13 merge Mike Snitzer
                   ` (4 preceding siblings ...)
  2013-10-24 18:30 ` [PATCH 05/24] dm cache policy mq: implement writeback_work() and mq_{set, clear}_dirty() Mike Snitzer
@ 2013-10-24 18:30 ` Mike Snitzer
  2013-10-25 16:37   ` Alasdair G Kergon
  2013-10-29 14:49   ` [PATCH 06/24 v2] dm cache policy mq: return NULL from alloc_entry if cache is full Mike Snitzer
  2013-10-24 18:30 ` [PATCH 07/24] dm cache: be much more aggressive about promoting writes to discarded blocks Mike Snitzer
                   ` (17 subsequent siblings)
  23 siblings, 2 replies; 49+ messages in thread
From: Mike Snitzer @ 2013-10-24 18:30 UTC (permalink / raw)
  To: dm-devel; +Cc: Morgan Mears, Heinz Mauelshagen, Joe Thornber, Mike Snitzer

From: Heinz Mauelshagen <heinzm@redhat.com>

Addresses the callers' (insert_in*cache()) requirement that alloc_entry()
return NULL when an entry cannot be allocated; see the fragment below.
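
For illustration, the shape of the NULL handling that callers rely on
(mirroring insert_in_cache() as reworked in the next patch):

	e = alloc_entry(mq);
	if (unlikely(!e)) {
		result->op = POLICY_MISS;
		return;
	}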

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-cache-policy-mq.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/md/dm-cache-policy-mq.c b/drivers/md/dm-cache-policy-mq.c
index 108380d..35e6798 100644
--- a/drivers/md/dm-cache-policy-mq.c
+++ b/drivers/md/dm-cache-policy-mq.c
@@ -399,18 +399,20 @@ static void hash_remove(struct entry *e)
  */
 static struct entry *alloc_entry(struct mq_policy *mq)
 {
-	struct entry *e;
+	struct entry *e = NULL;
 
 	if (mq->nr_entries_allocated >= mq->nr_entries) {
 		BUG_ON(!list_empty(&mq->free));
 		return NULL;
 	}
 
-	e = list_entry(list_pop(&mq->free), struct entry, list);
-	INIT_LIST_HEAD(&e->list);
-	INIT_HLIST_NODE(&e->hlist);
+	if (!list_empty(&mq->free)) {
+		e = list_entry(list_pop(&mq->free), struct entry, list);
+		INIT_LIST_HEAD(&e->list);
+		INIT_HLIST_NODE(&e->hlist);
+		mq->nr_entries_allocated++;
+	}
 
-	mq->nr_entries_allocated++;
 	return e;
 }
 
-- 
1.8.1.4

* [PATCH 07/24] dm cache: be much more aggressive about promoting writes to discarded blocks
  2013-10-24 18:30 [PATCH 00/24] dm cache: proposed changes for v3.13 merge Mike Snitzer
                   ` (5 preceding siblings ...)
  2013-10-24 18:30 ` [PATCH 06/24] dm cache policy mq: return NULL if mq->free list is empty in alloc_entry Mike Snitzer
@ 2013-10-24 18:30 ` Mike Snitzer
  2013-10-24 18:30 ` [PATCH 08/24] dm cache metadata: return bool from __superblock_all_zeroes Mike Snitzer
                   ` (16 subsequent siblings)
  23 siblings, 0 replies; 49+ messages in thread
From: Mike Snitzer @ 2013-10-24 18:30 UTC (permalink / raw)
  To: dm-devel; +Cc: Morgan Mears, Heinz Mauelshagen, Joe Thornber, Mike Snitzer

From: Joe Thornber <ejt@redhat.com>

Previously these promotions only got priority if there were unused cache
blocks.  Now we give them priority if there are any clean blocks in the
cache.

The fio_soak_test in the device-mapper-test-suite now gives uniform
performance across subvolumes (~16 seconds).
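
For reference, the promotion decision itself reduces to a hit-count
comparison against this threshold; a simplified sketch consistent with
should_promote() in the code below:

	static bool should_promote(struct mq_policy *mq, struct entry *e,
				   bool discarded_oblock, int data_dir)
	{
		return e->hit_count >=
			adjusted_promote_threshold(mq, discarded_oblock, data_dir);
	}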

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-cache-policy-mq.c | 84 ++++++++++++++++++++++++++++++-----------
 1 file changed, 63 insertions(+), 21 deletions(-)

diff --git a/drivers/md/dm-cache-policy-mq.c b/drivers/md/dm-cache-policy-mq.c
index 35e6798..152e979 100644
--- a/drivers/md/dm-cache-policy-mq.c
+++ b/drivers/md/dm-cache-policy-mq.c
@@ -151,6 +151,21 @@ static void queue_init(struct queue *q)
 }
 
 /*
+ * Checks to see if the queue is empty.
+ * FIXME: reduce cpu usage.
+ */
+static bool queue_empty(struct queue *q)
+{
+	unsigned i;
+
+	for (i = 0; i < NR_QUEUE_LEVELS; i++)
+		if (!list_empty(q->qs + i))
+			return false;
+
+	return true;
+}
+
+/*
  * Insert an entry to the back of the given level.
  */
 static void queue_push(struct queue *q, unsigned level, struct list_head *elt)
@@ -444,6 +459,11 @@ static bool any_free_cblocks(struct mq_policy *mq)
 	return mq->nr_cblocks_allocated < from_cblock(mq->cache_size);
 }
 
+static bool any_clean_cblocks(struct mq_policy *mq)
+{
+	return !queue_empty(&mq->cache_clean);
+}
+
 /*
  * Fills result out with a cache block that isn't in use, or return
  * -ENOSPC.  This does _not_ mark the cblock as allocated, the caller is
@@ -689,17 +709,18 @@ static int demote_cblock(struct mq_policy *mq, dm_oblock_t *oblock, dm_cblock_t
 static unsigned adjusted_promote_threshold(struct mq_policy *mq,
 					   bool discarded_oblock, int data_dir)
 {
-	if (discarded_oblock && any_free_cblocks(mq) && data_dir == WRITE)
+	if (data_dir == READ)
+		return mq->promote_threshold + READ_PROMOTE_THRESHOLD;
+
+	if (discarded_oblock && (any_free_cblocks(mq) || any_clean_cblocks(mq))) {
 		/*
 		 * We don't need to do any copying at all, so give this a
-		 * very low threshold.  In practice this only triggers
-		 * during initial population after a format.
+		 * very low threshold.
 		 */
 		return DISCARDED_PROMOTE_THRESHOLD;
+	}
 
-	return data_dir == READ ?
-		(mq->promote_threshold + READ_PROMOTE_THRESHOLD) :
-		(mq->promote_threshold + WRITE_PROMOTE_THRESHOLD);
+	return mq->promote_threshold + WRITE_PROMOTE_THRESHOLD;
 }
 
 static bool should_promote(struct mq_policy *mq, struct entry *e,
@@ -773,6 +794,17 @@ static int pre_cache_entry_found(struct mq_policy *mq, struct entry *e,
 	return r;
 }
 
+static void insert_entry_in_pre_cache(struct mq_policy *mq,
+				      struct entry *e, dm_oblock_t oblock)
+{
+	e->in_cache = false;
+	e->dirty = false;
+	e->oblock = oblock;
+	e->hit_count = 1;
+	e->generation = mq->generation;
+	push(mq, e);
+}
+
 static void insert_in_pre_cache(struct mq_policy *mq,
 				dm_oblock_t oblock)
 {
@@ -790,30 +822,41 @@ static void insert_in_pre_cache(struct mq_policy *mq,
 		return;
 	}
 
-	e->in_cache = false;
-	e->dirty = false;
-	e->oblock = oblock;
-	e->hit_count = 1;
-	e->generation = mq->generation;
-	push(mq, e);
+	insert_entry_in_pre_cache(mq, e, oblock);
 }
 
 static void insert_in_cache(struct mq_policy *mq, dm_oblock_t oblock,
 			    struct policy_result *result)
 {
+	int r;
 	struct entry *e;
 	dm_cblock_t cblock;
 
 	if (find_free_cblock(mq, &cblock) == -ENOSPC) {
-		result->op = POLICY_MISS;
-		insert_in_pre_cache(mq, oblock);
-		return;
-	}
+		r = demote_cblock(mq, &result->old_oblock, &cblock);
+		if (unlikely(r)) {
+			result->op = POLICY_MISS;
+			insert_in_pre_cache(mq, oblock);
+			return;
+		}
 
-	e = alloc_entry(mq);
-	if (unlikely(!e)) {
-		result->op = POLICY_MISS;
-		return;
+		/*
+		 * This will always succeed, since we've just demoted.
+		 */
+		e = pop(mq, &mq->pre_cache);
+		result->op = POLICY_REPLACE;
+
+	} else {
+		e = alloc_entry(mq);
+		if (unlikely(!e))
+			e = pop(mq, &mq->pre_cache);
+
+		if (unlikely(!e)) {
+			result->op = POLICY_MISS;
+			return;
+		}
+
+		result->op = POLICY_NEW;
 	}
 
 	e->oblock = oblock;
@@ -824,7 +867,6 @@ static void insert_in_cache(struct mq_policy *mq, dm_oblock_t oblock,
 	e->generation = mq->generation;
 	push(mq, e);
 
-	result->op = POLICY_NEW;
 	result->cblock = e->cblock;
 }
 
-- 
1.8.1.4

* [PATCH 08/24] dm cache metadata: return bool from __superblock_all_zeroes
  2013-10-24 18:30 [PATCH 00/24] dm cache: proposed changes for v3.13 merge Mike Snitzer
                   ` (6 preceding siblings ...)
  2013-10-24 18:30 ` [PATCH 07/24] dm cache: be much more aggressive about promoting writes to discarded blocks Mike Snitzer
@ 2013-10-24 18:30 ` Mike Snitzer
  2013-10-24 18:30 ` [PATCH 09/24] dm cache metadata: check the metadata version when reading the superblock Mike Snitzer
                   ` (15 subsequent siblings)
  23 siblings, 0 replies; 49+ messages in thread
From: Mike Snitzer @ 2013-10-24 18:30 UTC (permalink / raw)
  To: dm-devel; +Cc: Morgan Mears, Heinz Mauelshagen, Joe Thornber, Mike Snitzer

From: Joe Thornber <ejt@redhat.com>

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-cache-metadata.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/md/dm-cache-metadata.c b/drivers/md/dm-cache-metadata.c
index 1af7255..2262b4e 100644
--- a/drivers/md/dm-cache-metadata.c
+++ b/drivers/md/dm-cache-metadata.c
@@ -198,7 +198,7 @@ static int superblock_lock(struct dm_cache_metadata *cmd,
 
 /*----------------------------------------------------------------*/
 
-static int __superblock_all_zeroes(struct dm_block_manager *bm, int *result)
+static int __superblock_all_zeroes(struct dm_block_manager *bm, bool *result)
 {
 	int r;
 	unsigned i;
@@ -214,10 +214,10 @@ static int __superblock_all_zeroes(struct dm_block_manager *bm, int *result)
 		return r;
 
 	data_le = dm_block_data(b);
-	*result = 1;
+	*result = true;
 	for (i = 0; i < sb_block_size; i++) {
 		if (data_le[i] != zero) {
-			*result = 0;
+			*result = false;
 			break;
 		}
 	}
@@ -411,7 +411,8 @@ bad:
 static int __open_or_format_metadata(struct dm_cache_metadata *cmd,
 				     bool format_device)
 {
-	int r, unformatted;
+	int r;
+	bool unformatted = false;
 
 	r = __superblock_all_zeroes(cmd->bm, &unformatted);
 	if (r)
-- 
1.8.1.4

* [PATCH 09/24] dm cache metadata: check the metadata version when reading the superblock
  2013-10-24 18:30 [PATCH 00/24] dm cache: proposed changes for v3.13 merge Mike Snitzer
                   ` (7 preceding siblings ...)
  2013-10-24 18:30 ` [PATCH 08/24] dm cache metadata: return bool from __superblock_all_zeroes Mike Snitzer
@ 2013-10-24 18:30 ` Mike Snitzer
  2013-10-25 17:07   ` Alasdair G Kergon
  2013-10-24 18:30 ` [PATCH 10/24] dm cache policy: variable hints support Mike Snitzer
                   ` (14 subsequent siblings)
  23 siblings, 1 reply; 49+ messages in thread
From: Mike Snitzer @ 2013-10-24 18:30 UTC (permalink / raw)
  To: dm-devel; +Cc: Morgan Mears, Heinz Mauelshagen, Joe Thornber, Mike Snitzer

From: Joe Thornber <ejt@redhat.com>

We need to check the version to verify that the on-disk metadata format
is supported.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-cache-metadata.c | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/drivers/md/dm-cache-metadata.c b/drivers/md/dm-cache-metadata.c
index 2262b4e..c409c1a 100644
--- a/drivers/md/dm-cache-metadata.c
+++ b/drivers/md/dm-cache-metadata.c
@@ -20,7 +20,13 @@
 
 #define CACHE_SUPERBLOCK_MAGIC 06142003
 #define CACHE_SUPERBLOCK_LOCATION 0
-#define CACHE_VERSION 1
+
+/*
+ * defines a range of metadata versions that this module can handle.
+ */
+#define MIN_CACHE_VERSION 1
+#define MAX_CACHE_VERSION 1
+
 #define CACHE_METADATA_CACHE_SIZE 64
 
 /*
@@ -134,6 +140,18 @@ static void sb_prepare_for_write(struct dm_block_validator *v,
 						      SUPERBLOCK_CSUM_XOR));
 }
 
+static int check_metadata_version(struct cache_disk_superblock *disk_super)
+{
+	uint32_t metadata_version = le32_to_cpu(disk_super->version);
+	if (metadata_version < MIN_CACHE_VERSION || metadata_version > MAX_CACHE_VERSION) {
+		DMERR("kernel does not support this version of cache metadata (%u)",
+		      metadata_version);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 static int sb_check(struct dm_block_validator *v,
 		    struct dm_block *b,
 		    size_t sb_block_size)
@@ -164,7 +182,7 @@ static int sb_check(struct dm_block_validator *v,
 		return -EILSEQ;
 	}
 
-	return 0;
+	return check_metadata_version(disk_super);
 }
 
 static struct dm_block_validator sb_validator = {
@@ -270,7 +288,7 @@ static int __write_initial_superblock(struct dm_cache_metadata *cmd)
 	disk_super->flags = 0;
 	memset(disk_super->uuid, 0, sizeof(disk_super->uuid));
 	disk_super->magic = cpu_to_le64(CACHE_SUPERBLOCK_MAGIC);
-	disk_super->version = cpu_to_le32(CACHE_VERSION);
+	disk_super->version = cpu_to_le32(MAX_CACHE_VERSION);
 	memset(disk_super->policy_name, 0, sizeof(disk_super->policy_name));
 	memset(disk_super->policy_version, 0, sizeof(disk_super->policy_version));
 	disk_super->policy_hint_size = 0;
-- 
1.8.1.4

* [PATCH 10/24] dm cache policy: variable hints support
  2013-10-24 18:30 [PATCH 00/24] dm cache: proposed changes for v3.13 merge Mike Snitzer
                   ` (8 preceding siblings ...)
  2013-10-24 18:30 ` [PATCH 09/24] dm cache metadata: check the metadata version when reading the superblock Mike Snitzer
@ 2013-10-24 18:30 ` Mike Snitzer
  2013-10-24 18:30 ` [PATCH 11/24] dm table: print error on preresume failure Mike Snitzer
                   ` (13 subsequent siblings)
  23 siblings, 0 replies; 49+ messages in thread
From: Mike Snitzer @ 2013-10-24 18:30 UTC (permalink / raw)
  To: dm-devel; +Cc: Morgan Mears, Heinz Mauelshagen, Joe Thornber, Mike Snitzer

From: Heinz Mauelshagen <heinzm@redhat.com>

Policies can now specify a hint size other than 0 or 4 bytes.  The
maximum, DM_CACHE_POLICY_MAX_HINT_SIZE, is 128 bytes.

Upcoming policy stack support will make use of variable hints.
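
As a sketch of how a policy module opts in (the policy name and create
function are invented for illustration; the fields are those of struct
dm_cache_policy_type in dm-cache-policy.h):

	static struct dm_cache_policy_type example_policy_type = {
		.name = "example",
		.version = {1, 0, 0},
		.hint_size = 16,	/* multiple of 4, up to DM_CACHE_POLICY_MAX_HINT_SIZE */
		.owner = THIS_MODULE,
		.create = example_create,
	};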

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-cache-metadata.c        | 104 +++++++++++++++++++++++++++-------
 drivers/md/dm-cache-metadata.h        |  23 +++++---
 drivers/md/dm-cache-policy-cleaner.c  |   2 +-
 drivers/md/dm-cache-policy-internal.h |   5 +-
 drivers/md/dm-cache-policy-mq.c       |  44 ++++++++------
 drivers/md/dm-cache-policy.c          |  19 ++++++-
 drivers/md/dm-cache-policy.h          |  14 +++--
 drivers/md/dm-cache-target.c          |   8 ++-
 8 files changed, 159 insertions(+), 60 deletions(-)

diff --git a/drivers/md/dm-cache-metadata.c b/drivers/md/dm-cache-metadata.c
index c409c1a..822972c 100644
--- a/drivers/md/dm-cache-metadata.c
+++ b/drivers/md/dm-cache-metadata.c
@@ -119,6 +119,7 @@ struct dm_cache_metadata {
 	char policy_name[CACHE_POLICY_NAME_SIZE];
 	unsigned policy_version[CACHE_POLICY_VERSION_SIZE];
 	size_t policy_hint_size;
+	void *policy_hint_value_buffer;
 	struct dm_cache_statistics stats;
 };
 
@@ -243,7 +244,7 @@ static int __superblock_all_zeroes(struct dm_block_manager *bm, bool *result)
 	return dm_bm_unlock(b);
 }
 
-static void __setup_mapping_info(struct dm_cache_metadata *cmd)
+static int __setup_mapping_info(struct dm_cache_metadata *cmd)
 {
 	struct dm_btree_value_type vt;
 
@@ -255,9 +256,30 @@ static void __setup_mapping_info(struct dm_cache_metadata *cmd)
 	dm_array_info_init(&cmd->info, cmd->tm, &vt);
 
 	if (cmd->policy_hint_size) {
-		vt.size = sizeof(__le32);
+		if (cmd->policy_hint_size > DM_CACHE_POLICY_MAX_HINT_SIZE ||
+		    cmd->policy_hint_size % 4) {
+			DMERR("hint size not divisible by 4 or is larger than %d",
+			      (int) DM_CACHE_POLICY_MAX_HINT_SIZE);
+			return -EINVAL;
+		}
+
+		vt.size = cmd->policy_hint_size;
 		dm_array_info_init(&cmd->hint_info, cmd->tm, &vt);
-	}
+
+		cmd->policy_hint_value_buffer = kmalloc(cmd->policy_hint_size, GFP_KERNEL);
+		if (!cmd->policy_hint_value_buffer) {
+			DMERR("unable to allocate hint value buffer");
+			return -ENOMEM;
+		}
+	} else
+		cmd->policy_hint_value_buffer = NULL;
+
+	return 0;
+}
+
+static void __destroy_mapping_info(struct dm_cache_metadata *cmd)
+{
+	kfree(cmd->policy_hint_value_buffer);
 }
 
 static int __write_initial_superblock(struct dm_cache_metadata *cmd)
@@ -330,7 +352,9 @@ static int __format_metadata(struct dm_cache_metadata *cmd)
 		return r;
 	}
 
-	__setup_mapping_info(cmd);
+	r = __setup_mapping_info(cmd);
+	if (r < 0)
+		goto bad_mapping_info;
 
 	r = dm_array_empty(&cmd->info, &cmd->root);
 	if (r < 0)
@@ -353,6 +377,8 @@ static int __format_metadata(struct dm_cache_metadata *cmd)
 	return 0;
 
 bad:
+	__destroy_mapping_info(cmd);
+bad_mapping_info:
 	dm_tm_destroy(cmd->tm);
 	dm_sm_destroy(cmd->metadata_sm);
 
@@ -387,6 +413,12 @@ static int __check_incompat_features(struct cache_disk_superblock *disk_super,
 	return 0;
 }
 
+static bool using_variable_size_hints(struct cache_disk_superblock *disk_super)
+{
+	unsigned long iflags = le32_to_cpu(disk_super->incompat_flags);
+	return test_bit(DM_CACHE_VARIABLE_HINT_SIZE, &iflags);
+}
+
 static int __open_metadata(struct dm_cache_metadata *cmd)
 {
 	int r;
@@ -415,7 +447,18 @@ static int __open_metadata(struct dm_cache_metadata *cmd)
 		goto bad;
 	}
 
-	__setup_mapping_info(cmd);
+	/*
+	 * We need to set the hint size before calling __setup_mapping_info()
+	 */
+	if (using_variable_size_hints(disk_super))
+		cmd->policy_hint_size = le32_to_cpu(disk_super->policy_hint_size);
+	else
+		cmd->policy_hint_size = DM_CACHE_POLICY_DEF_HINT_SIZE;
+
+	r = __setup_mapping_info(cmd);
+	if (r < 0)
+		goto bad;
+
 	dm_disk_bitset_init(cmd->tm, &cmd->discard_info);
 	sb_flags = le32_to_cpu(disk_super->flags);
 	cmd->clean_when_opened = test_bit(CLEAN_SHUTDOWN, &sb_flags);
@@ -503,7 +546,16 @@ static void read_superblock_fields(struct dm_cache_metadata *cmd,
 	cmd->policy_version[0] = le32_to_cpu(disk_super->policy_version[0]);
 	cmd->policy_version[1] = le32_to_cpu(disk_super->policy_version[1]);
 	cmd->policy_version[2] = le32_to_cpu(disk_super->policy_version[2]);
-	cmd->policy_hint_size = le32_to_cpu(disk_super->policy_hint_size);
+
+	if (using_variable_size_hints(disk_super))
+		cmd->policy_hint_size = le32_to_cpu(disk_super->policy_hint_size);
+	else {
+		/*
+		 * Must establish policy_hint_size because older superblock
+		 * wouldn't have it.
+		 */
+		cmd->policy_hint_size = DM_CACHE_POLICY_DEF_HINT_SIZE;
+	}
 
 	cmd->stats.read_hits = le32_to_cpu(disk_super->read_hits);
 	cmd->stats.read_misses = le32_to_cpu(disk_super->read_misses);
@@ -601,6 +653,15 @@ static int __commit_transaction(struct dm_cache_metadata *cmd,
 	disk_super->policy_version[1] = cpu_to_le32(cmd->policy_version[1]);
 	disk_super->policy_version[2] = cpu_to_le32(cmd->policy_version[2]);
 
+	if (cmd->policy_hint_size != DM_CACHE_POLICY_DEF_HINT_SIZE) {
+		unsigned long iflags = 0;
+		set_bit(DM_CACHE_VARIABLE_HINT_SIZE, &iflags);
+		disk_super->incompat_flags = cpu_to_le32(iflags);
+	} else
+		disk_super->incompat_flags = cpu_to_le32(0u);
+
+	disk_super->policy_hint_size =  cpu_to_le32(cmd->policy_hint_size);
+
 	disk_super->read_hits = cpu_to_le32(cmd->stats.read_hits);
 	disk_super->read_misses = cpu_to_le32(cmd->stats.read_misses);
 	disk_super->write_hits = cpu_to_le32(cmd->stats.write_hits);
@@ -666,6 +727,7 @@ struct dm_cache_metadata *dm_cache_metadata_open(struct block_device *bdev,
 
 	r = __create_persistent_data_objects(cmd, may_format_device);
 	if (r) {
+		__destroy_mapping_info(cmd);
 		kfree(cmd);
 		return ERR_PTR(r);
 	}
@@ -682,6 +744,7 @@ struct dm_cache_metadata *dm_cache_metadata_open(struct block_device *bdev,
 void dm_cache_metadata_close(struct dm_cache_metadata *cmd)
 {
 	__destroy_persistent_data_objects(cmd);
+	__destroy_mapping_info(cmd);
 	kfree(cmd);
 }
 
@@ -927,7 +990,6 @@ static int __load_mapping(void *context, uint64_t cblock, void *leaf)
 	int r = 0;
 	bool dirty;
 	__le64 value;
-	__le32 hint_value = 0;
 	dm_oblock_t oblock;
 	unsigned flags;
 	struct thunk *thunk = context;
@@ -939,14 +1001,14 @@ static int __load_mapping(void *context, uint64_t cblock, void *leaf)
 	if (flags & M_VALID) {
 		if (thunk->hints_valid) {
 			r = dm_array_get_value(&cmd->hint_info, cmd->hint_root,
-					       cblock, &hint_value);
+					       cblock, cmd->policy_hint_value_buffer);
 			if (r && r != -ENODATA)
 				return r;
 		}
 
 		dirty = thunk->respect_dirty_flags ? (flags & M_DIRTY) : true;
 		r = thunk->fn(thunk->context, oblock, to_cblock(cblock),
-			      dirty, le32_to_cpu(hint_value), thunk->hints_valid);
+			      dirty, cmd->policy_hint_value_buffer, thunk->hints_valid);
 	}
 
 	return r;
@@ -1122,8 +1184,6 @@ int dm_cache_get_metadata_dev_size(struct dm_cache_metadata *cmd,
 static int begin_hints(struct dm_cache_metadata *cmd, struct dm_cache_policy *policy)
 {
 	int r;
-	__le32 value;
-	size_t hint_size;
 	const char *policy_name = dm_cache_policy_get_name(policy);
 	const unsigned *policy_version = dm_cache_policy_get_version(policy);
 
@@ -1132,6 +1192,8 @@ static int begin_hints(struct dm_cache_metadata *cmd, struct dm_cache_policy *po
 		return -EINVAL;
 
 	if (!policy_unchanged(cmd, policy)) {
+		size_t hint_size;
+
 		strncpy(cmd->policy_name, policy_name, sizeof(cmd->policy_name));
 		memcpy(cmd->policy_version, policy_version, sizeof(cmd->policy_version));
 
@@ -1150,11 +1212,11 @@ static int begin_hints(struct dm_cache_metadata *cmd, struct dm_cache_policy *po
 		if (r)
 			return r;
 
-		value = cpu_to_le32(0);
+		memset(cmd->policy_hint_value_buffer, 0, hint_size);
-		__dm_bless_for_disk(&value);
+		__dm_bless_for_disk(cmd->policy_hint_value_buffer);
 		r = dm_array_resize(&cmd->hint_info, cmd->hint_root, 0,
 				    from_cblock(cmd->cache_blocks),
-				    &value, &cmd->hint_root);
+				    cmd->policy_hint_value_buffer, &cmd->hint_root);
 		if (r)
 			return r;
 	}
@@ -1173,27 +1235,27 @@ int dm_cache_begin_hints(struct dm_cache_metadata *cmd, struct dm_cache_policy *
 	return r;
 }
 
-static int save_hint(struct dm_cache_metadata *cmd, dm_cblock_t cblock,
-		     uint32_t hint)
+static int save_hint(struct dm_cache_metadata *cmd, dm_cblock_t cblock, void *hint)
+	__dm_written_to_disk(hint)
 {
 	int r;
-	__le32 value = cpu_to_le32(hint);
-	__dm_bless_for_disk(&value);
 
 	r = dm_array_set_value(&cmd->hint_info, cmd->hint_root,
-			       from_cblock(cblock), &value, &cmd->hint_root);
+			       from_cblock(cblock), hint, &cmd->hint_root);
 	cmd->changed = true;
 
 	return r;
 }
 
-int dm_cache_save_hint(struct dm_cache_metadata *cmd, dm_cblock_t cblock,
-		       uint32_t hint)
+int dm_cache_save_hint(struct dm_cache_metadata *cmd, dm_cblock_t cblock, void *hint)
+	__dm_written_to_disk(hint)
 {
 	int r;
 
-	if (!hints_array_initialized(cmd))
+	if (!hints_array_initialized(cmd)) {
+		__dm_unbless_for_disk(hint);
 		return 0;
+	}
 
 	down_write(&cmd->root_lock);
 	r = save_hint(cmd, cblock, hint);
diff --git a/drivers/md/dm-cache-metadata.h b/drivers/md/dm-cache-metadata.h
index f45cef2..44fd4bf 100644
--- a/drivers/md/dm-cache-metadata.h
+++ b/drivers/md/dm-cache-metadata.h
@@ -49,7 +49,12 @@
  */
 #define DM_CACHE_FEATURE_COMPAT_SUPP	  0UL
 #define DM_CACHE_FEATURE_COMPAT_RO_SUPP	  0UL
-#define DM_CACHE_FEATURE_INCOMPAT_SUPP	  0UL
+
+enum dm_cache_incompat_bits {
+	DM_CACHE_VARIABLE_HINT_SIZE = 0
+};
+
+#define DM_CACHE_FEATURE_INCOMPAT_SUPP	  (1 << DM_CACHE_VARIABLE_HINT_SIZE)
 
 /*
  * Reopens or creates a new, empty metadata volume.
@@ -87,7 +92,7 @@ int dm_cache_changed_this_transaction(struct dm_cache_metadata *cmd);
 
 typedef int (*load_mapping_fn)(void *context, dm_oblock_t oblock,
 			       dm_cblock_t cblock, bool dirty,
-			       uint32_t hint, bool hint_valid);
+			       void *hint, bool hint_valid);
 int dm_cache_load_mappings(struct dm_cache_metadata *cmd,
 			   struct dm_cache_policy *policy,
 			   load_mapping_fn fn,
@@ -118,9 +123,10 @@ int dm_cache_get_metadata_dev_size(struct dm_cache_metadata *cmd,
 void dm_cache_dump(struct dm_cache_metadata *cmd);
 
 /*
- * The policy is invited to save a 32bit hint value for every cblock (eg,
- * for a hit count).  These are stored against the policy name.  If
- * policies are changed, then hints will be lost.  If the machine crashes,
+ * The policy is invited to save a hint (void* sequence of bytes) for every
+ * cblock (eg, for a hit count) and is responsible for endianness conversions.
+ * These are stored against the policy name.
+ * If policies are changed, then hints will be lost.  If the machine crashes,
  * hints will be lost.
  *
  * The hints are indexed by the cblock, but many policies will not
@@ -132,10 +138,13 @@ void dm_cache_dump(struct dm_cache_metadata *cmd);
 int dm_cache_begin_hints(struct dm_cache_metadata *cmd, struct dm_cache_policy *p);
 
 /*
- * requests hints for every cblock and stores in the metadata device.
+ * Saves the hint for a given cblock in the metadata device.  Policy
+ * modules must perform any endian conversions needed and bless the hints
+ * for disk.
  */
 int dm_cache_save_hint(struct dm_cache_metadata *cmd,
-		       dm_cblock_t cblock, uint32_t hint);
+		       dm_cblock_t cblock, void *hint)
+	__dm_written_to_disk(hint);
 
 /*----------------------------------------------------------------*/
 
diff --git a/drivers/md/dm-cache-policy-cleaner.c b/drivers/md/dm-cache-policy-cleaner.c
index b04d1f9..7e5983c 100644
--- a/drivers/md/dm-cache-policy-cleaner.c
+++ b/drivers/md/dm-cache-policy-cleaner.c
@@ -274,7 +274,7 @@ static void add_cache_entry(struct policy *p, struct wb_cache_entry *e)
 
 static int wb_load_mapping(struct dm_cache_policy *pe,
 			   dm_oblock_t oblock, dm_cblock_t cblock,
-			   uint32_t hint, bool hint_valid)
+			   void *hint, bool hint_valid)
 {
 	int r;
 	struct policy *p = to_policy(pe);
diff --git a/drivers/md/dm-cache-policy-internal.h b/drivers/md/dm-cache-policy-internal.h
index a75f7e7..0f749e8 100644
--- a/drivers/md/dm-cache-policy-internal.h
+++ b/drivers/md/dm-cache-policy-internal.h
@@ -41,7 +41,7 @@ static inline void policy_clear_dirty(struct dm_cache_policy *p, dm_oblock_t obl
 
 static inline int policy_load_mapping(struct dm_cache_policy *p,
 				      dm_oblock_t oblock, dm_cblock_t cblock,
-				      uint32_t hint, bool hint_valid)
+				      void *hint, bool hint_valid)
 {
 	return p->load_mapping(p, oblock, cblock, hint, hint_valid);
 }
@@ -119,6 +119,9 @@ const char *dm_cache_policy_get_name(struct dm_cache_policy *p);
 
 const unsigned *dm_cache_policy_get_version(struct dm_cache_policy *p);
 
+#define DM_CACHE_POLICY_DEF_HINT_SIZE 4U
+#define DM_CACHE_POLICY_MAX_HINT_SIZE 128U
+int    dm_cache_policy_set_hint_size(struct dm_cache_policy *p, unsigned hint_size);
 size_t dm_cache_policy_get_hint_size(struct dm_cache_policy *p);
 
 /*----------------------------------------------------------------*/
diff --git a/drivers/md/dm-cache-policy-mq.c b/drivers/md/dm-cache-policy-mq.c
index 152e979..9f2589e 100644
--- a/drivers/md/dm-cache-policy-mq.c
+++ b/drivers/md/dm-cache-policy-mq.c
@@ -6,6 +6,7 @@
 
 #include "dm-cache-policy.h"
 #include "dm.h"
+#include "persistent-data/dm-btree.h"
 
 #include <linux/hash.h>
 #include <linux/module.h>
@@ -1024,7 +1025,7 @@ static void mq_clear_dirty(struct dm_cache_policy *p, dm_oblock_t oblock)
 
 static int mq_load_mapping(struct dm_cache_policy *p,
 			   dm_oblock_t oblock, dm_cblock_t cblock,
-			   uint32_t hint, bool hint_valid)
+			   void *hint, bool hint_valid)
 {
 	struct mq_policy *mq = to_mq_policy(p);
 	struct entry *e;
@@ -1037,38 +1038,45 @@ static int mq_load_mapping(struct dm_cache_policy *p,
 	e->oblock = oblock;
 	e->in_cache = true;
 	e->dirty = true;	/* this gets corrected in a minute */
-	e->hit_count = hint_valid ? hint : 1;
+	e->hit_count = hint_valid ? le32_to_cpu(*((__le32 *) hint)) : 1;
 	e->generation = mq->generation;
 	push(mq, e);
 
 	return 0;
 }
 
+static int mq_save_hints(struct mq_policy *mq, struct queue *q,
+			 policy_walk_fn fn, void *context)
+{
+	int r;
+	unsigned level;
+	struct entry *e;
+
+	for (level = 0; level < NR_QUEUE_LEVELS; level++)
+		list_for_each_entry(e, q->qs + level, list) {
+			__le32 value = cpu_to_le32(e->hit_count);
+			__dm_bless_for_disk(&value);
+
+			r = fn(context, e->cblock, e->oblock, &value);
+			if (r)
+				return r;
+		}
+
+	return 0;
+}
+
 static int mq_walk_mappings(struct dm_cache_policy *p, policy_walk_fn fn,
 			    void *context)
 {
 	struct mq_policy *mq = to_mq_policy(p);
 	int r = 0;
-	struct entry *e;
-	unsigned level;
 
 	mutex_lock(&mq->lock);
 
-	for (level = 0; level < NR_QUEUE_LEVELS; level++)
-		list_for_each_entry(e, &mq->cache_clean.qs[level], list) {
-			r = fn(context, e->cblock, e->oblock, e->hit_count);
-			if (r)
-				goto out;
-		}
-
-	for (level = 0; level < NR_QUEUE_LEVELS; level++)
-		list_for_each_entry(e, &mq->cache_dirty.qs[level], list) {
-			r = fn(context, e->cblock, e->oblock, e->hit_count);
-			if (r)
-				goto out;
-		}
+	r = mq_save_hints(mq, &mq->cache_clean, fn, context);
+	if (!r)
+		r = mq_save_hints(mq, &mq->cache_dirty, fn, context);
 
-out:
 	mutex_unlock(&mq->lock);
 
 	return r;
diff --git a/drivers/md/dm-cache-policy.c b/drivers/md/dm-cache-policy.c
index 21c03c5..8e84d08 100644
--- a/drivers/md/dm-cache-policy.c
+++ b/drivers/md/dm-cache-policy.c
@@ -80,9 +80,10 @@ int dm_cache_policy_register(struct dm_cache_policy_type *type)
 {
 	int r;
 
-	/* One size fits all for now */
-	if (type->hint_size != 0 && type->hint_size != 4) {
-		DMWARN("hint size must be 0 or 4 but %llu supplied.", (unsigned long long) type->hint_size);
+	if (type->hint_size > DM_CACHE_POLICY_MAX_HINT_SIZE) {
+		DMWARN("hint size must be <= %llu but %llu was supplied.",
+		       (unsigned long long) DM_CACHE_POLICY_MAX_HINT_SIZE,
+		       (unsigned long long) type->hint_size);
 		return -EINVAL;
 	}
 
@@ -166,4 +167,16 @@ size_t dm_cache_policy_get_hint_size(struct dm_cache_policy *p)
 }
 EXPORT_SYMBOL_GPL(dm_cache_policy_get_hint_size);
 
+int dm_cache_policy_set_hint_size(struct dm_cache_policy *p, unsigned hint_size)
+{
+	struct dm_cache_policy_type *t = p->private;
+
+	if (hint_size > DM_CACHE_POLICY_MAX_HINT_SIZE)
+		return -EPERM;
+
+	t->hint_size = hint_size;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(dm_cache_policy_set_hint_size);
+
 /*----------------------------------------------------------------*/
diff --git a/drivers/md/dm-cache-policy.h b/drivers/md/dm-cache-policy.h
index 33369ca..6779ea7 100644
--- a/drivers/md/dm-cache-policy.h
+++ b/drivers/md/dm-cache-policy.h
@@ -8,6 +8,7 @@
 #define DM_CACHE_POLICY_H
 
 #include "dm-cache-block-types.h"
+#include "persistent-data/dm-btree.h"
 
 #include <linux/device-mapper.h>
 
@@ -79,7 +80,8 @@ struct policy_result {
 };
 
 typedef int (*policy_walk_fn)(void *context, dm_cblock_t cblock,
-			      dm_oblock_t oblock, uint32_t hint);
+			      dm_oblock_t oblock, void *hint)
+	__dm_written_to_disk(hint);
 
 /*
  * The cache policy object.  Just a bunch of methods.  It is envisaged that
@@ -146,7 +148,7 @@ struct dm_cache_policy {
 	 * mapping from the metadata device into the policy.
 	 */
 	int (*load_mapping)(struct dm_cache_policy *p, dm_oblock_t oblock,
-			    dm_cblock_t cblock, uint32_t hint, bool hint_valid);
+			    dm_cblock_t cblock, void *hint, bool hint_valid);
 
 	int (*walk_mappings)(struct dm_cache_policy *p, policy_walk_fn fn,
 			     void *context);
@@ -210,9 +212,9 @@ struct dm_cache_policy_type {
 	unsigned version[CACHE_POLICY_VERSION_SIZE];
 
 	/*
-	 * Policies may store a hint for each each cache block.
-	 * Currently the size of this hint must be 0 or 4 bytes but we
-	 * expect to relax this in future.
+	 * Policies may store a hint for each cache block.
+	 * Currently the size of this hint must be <=
+	 * DM_CACHE_POLICY_MAX_HINT_SIZE bytes.
 	 */
 	size_t hint_size;
 
@@ -227,4 +229,4 @@ void dm_cache_policy_unregister(struct dm_cache_policy_type *type);
 
 /*----------------------------------------------------------------*/
 
-#endif	/* DM_CACHE_POLICY_H */
+#endif /* DM_CACHE_POLICY_H */
diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c
index 2956976..6fa45a8 100644
--- a/drivers/md/dm-cache-target.c
+++ b/drivers/md/dm-cache-target.c
@@ -2304,9 +2304,11 @@ static int write_discard_bitset(struct cache *cache)
 }
 
 static int save_hint(void *context, dm_cblock_t cblock, dm_oblock_t oblock,
-		     uint32_t hint)
+		     void *hint)
 {
 	struct cache *cache = context;
+
+	__dm_bless_for_disk(hint);
 	return dm_cache_save_hint(cache->cmd, cblock, hint);
 }
 
@@ -2374,7 +2376,7 @@ static void cache_postsuspend(struct dm_target *ti)
 }
 
 static int load_mapping(void *context, dm_oblock_t oblock, dm_cblock_t cblock,
-			bool dirty, uint32_t hint, bool hint_valid)
+			bool dirty, void *hint, bool hint_valid)
 {
 	int r;
 	struct cache *cache = context;
@@ -2630,7 +2632,7 @@ static void cache_io_hints(struct dm_target *ti, struct queue_limits *limits)
 
 static struct target_type cache_target = {
 	.name = "cache",
-	.version = {1, 1, 1},
+	.version = {1, 2, 0},
 	.module = THIS_MODULE,
 	.ctr = cache_ctr,
 	.dtr = cache_dtr,
-- 
1.8.1.4

* [PATCH 11/24] dm table: print error on preresume failure
  2013-10-24 18:30 [PATCH 00/24] dm cache: proposed changes for v3.13 merge Mike Snitzer
                   ` (9 preceding siblings ...)
  2013-10-24 18:30 ` [PATCH 10/24] dm cache policy: variable hints support Mike Snitzer
@ 2013-10-24 18:30 ` Mike Snitzer
  2013-10-25 19:22   ` Alasdair G Kergon
  2013-10-24 18:30 ` [PATCH 12/24] dm cache: add passthrough mode Mike Snitzer
                   ` (12 subsequent siblings)
  23 siblings, 1 reply; 49+ messages in thread
From: Mike Snitzer @ 2013-10-24 18:30 UTC (permalink / raw)
  To: dm-devel; +Cc: Morgan Mears, Heinz Mauelshagen, Joe Thornber, Mike Snitzer

If preresume fails it is worth logging an error given that a device is
left suspended due to the failure.

This change was motivated by local preresume error logging that was
added to the cache target.  Elevating this error logging to DM core
makes sense.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
---
 drivers/md/dm-table.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 8f87835..9bd75dc 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1548,8 +1548,11 @@ int dm_table_resume_targets(struct dm_table *t)
 			continue;
 
 		r = ti->type->preresume(ti);
-		if (r)
+		if (r) {
+			DMERR("%s: %s: preresume failed, error = %d",
+			      dm_device_name(t->md), ti->type->name, r);
 			return r;
+		}
 	}
 
 	for (i = 0; i < t->num_targets; i++) {
-- 
1.8.1.4

* [PATCH 12/24] dm cache: add passthrough mode
  2013-10-24 18:30 [PATCH 00/24] dm cache: proposed changes for v3.13 merge Mike Snitzer
                   ` (10 preceding siblings ...)
  2013-10-24 18:30 ` [PATCH 11/24] dm table: print error on preresume failure Mike Snitzer
@ 2013-10-24 18:30 ` Mike Snitzer
  2013-10-24 18:30 ` [PATCH 13/24] dm cache policy: have policy_writeback_work return -ENODATA by default Mike Snitzer
                   ` (11 subsequent siblings)
  23 siblings, 0 replies; 49+ messages in thread
From: Mike Snitzer @ 2013-10-24 18:30 UTC (permalink / raw)
  To: dm-devel; +Cc: Morgan Mears, Heinz Mauelshagen, Joe Thornber, Mike Snitzer

From: Joe Thornber <ejt@redhat.com>

"Passthrough" is a dm-cache operating mode (like writethrough or
writeback) which is intended to be used when the cache contents are not
known to be coherent with the origin device.  It behaves as follows:

* All reads are served from the origin device (all reads miss the cache)
* All writes are forwarded to the origin device; additionally, write
  hits cause cache block invalidates

This mode decouples cache coherency checks from cache device creation,
largely to avoid having to perform coherency checks while booting.  Boot
scripts can create cache devices in passthrough mode and put them into
service (mount cached filesystems, for example) without having to worry
about coherency.  Any coherency that exists is maintained, although
the cache will gradually cool as writes take place.

Later, applications can perform coherency checks, the nature of which
will depend on the type of the underlying storage.  If coherency can be
verified, the cache device can be transitioned to writethrough or
writeback mode while still warm; otherwise, the cache contents can be
discarded prior to transitioning to the desired operating mode.
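
To make the rules above concrete, here is a minimal sketch of the
per-bio decision this mode implies (a hypothetical helper, not part of
the patch; the real logic lands in process_bio() and cache_map() below):

  /* Sketch only: write hits invalidate, everything else maps to origin. */
  static bool passthrough_must_invalidate(struct bio *bio, bool cache_hit)
  {
  	return bio_data_dir(bio) == WRITE && cache_hit;
  }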

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Morgan Mears <Morgan.Mears@netapp.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-cache-metadata.c |  52 +++++++++++
 drivers/md/dm-cache-metadata.h |   5 +
 drivers/md/dm-cache-target.c   | 208 +++++++++++++++++++++++++++++++++++------
 3 files changed, 234 insertions(+), 31 deletions(-)

diff --git a/drivers/md/dm-cache-metadata.c b/drivers/md/dm-cache-metadata.c
index 822972c..136fbe9 100644
--- a/drivers/md/dm-cache-metadata.c
+++ b/drivers/md/dm-cache-metadata.c
@@ -748,6 +748,53 @@ void dm_cache_metadata_close(struct dm_cache_metadata *cmd)
 	kfree(cmd);
 }
 
+/*
+ * Checks that the given cache block is either unmapped, or clean.
+ */
+static int block_unmapped_or_clean(struct dm_cache_metadata *cmd, dm_cblock_t b,
+				   bool *result)
+{
+	int r;
+	__le64 value;
+	dm_oblock_t ob;
+	unsigned flags;
+
+	r = dm_array_get_value(&cmd->info, cmd->root, from_cblock(b), &value);
+	if (r) {
+		DMERR("block_unmapped_or_clean failed");
+		return r;
+	}
+
+	unpack_value(value, &ob, &flags);
+	*result = !((flags & M_VALID) && (flags & M_DIRTY));
+
+	return 0;
+}
+
+static int blocks_are_unmapped_or_clean(struct dm_cache_metadata *cmd,
+					dm_cblock_t begin,
+					dm_cblock_t end,
+					bool *result)
+{
+	int r;
+
+	while (begin != end) {
+		r = block_unmapped_or_clean(cmd, begin, result);
+		if (r)
+			return r;
+
+		if (!*result) {
+			DMERR("cache block %llu is dirty",
+			      (unsigned long long) from_cblock(begin));
+			return 0;
+		}
+
+		begin = to_cblock(from_cblock(begin) + 1);
+	}
+
+	return 0;
+}
+
 int dm_cache_resize(struct dm_cache_metadata *cmd, dm_cblock_t new_cache_size)
 {
 	int r;
@@ -1263,3 +1310,8 @@ int dm_cache_save_hint(struct dm_cache_metadata *cmd, dm_cblock_t cblock, void *
 
 	return r;
 }
+
+int dm_cache_metadata_all_clean(struct dm_cache_metadata *cmd, bool *result)
+{
+	return blocks_are_unmapped_or_clean(cmd, 0, cmd->cache_blocks, result);
+}
diff --git a/drivers/md/dm-cache-metadata.h b/drivers/md/dm-cache-metadata.h
index 44fd4bf..4e5f435 100644
--- a/drivers/md/dm-cache-metadata.h
+++ b/drivers/md/dm-cache-metadata.h
@@ -146,6 +146,11 @@ int dm_cache_save_hint(struct dm_cache_metadata *cmd,
 		       dm_cblock_t cblock, void *hint)
 	__dm_written_to_disk(hint);
 
+/*
+ * Query method.  Are all the blocks in the cache clean?
+ */
+int dm_cache_metadata_all_clean(struct dm_cache_metadata *cmd, bool *result);
+
 /*----------------------------------------------------------------*/
 
 #endif /* DM_CACHE_METADATA_H */
diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c
index 6fa45a8..3bf8cc7 100644
--- a/drivers/md/dm-cache-target.c
+++ b/drivers/md/dm-cache-target.c
@@ -76,14 +76,37 @@ static void free_bitset(unsigned long *bits)
 /*
  * FIXME: the cache is read/write for the time being.
  */
-enum cache_mode {
+enum cache_metadata_mode {
 	CM_WRITE,		/* metadata may be changed */
 	CM_READ_ONLY,		/* metadata may not be changed */
 };
 
+enum cache_io_mode {
+	/*
+	 * Data is written to cached blocks only.  These blocks are marked
+	 * dirty.  If you lose the cache device you will lose data.
+	 * Potential performance increase for both reads and writes.
+	 */
+	CM_IO_WRITEBACK,
+
+	/*
+	 * Data is written to both cache and origin.  Blocks are never
+	 * dirty.  Potential performance benefit for reads only.
+	 */
+	CM_IO_WRITETHROUGH,
+
+	/*
+	 * A degraded mode useful for various cache coherency situations
+	 * (eg, rolling back snapshots).  Reads and writes always go to the
+	 * origin.  If a write goes to a cached oblock, then the cache
+	 * block is invalidated.
+	 */
+	CM_IO_PASSTHROUGH
+};
+
 struct cache_features {
-	enum cache_mode mode;
-	bool write_through:1;
+	enum cache_metadata_mode mode;
+	enum cache_io_mode io_mode;
 };
 
 struct cache_stats {
@@ -533,9 +556,24 @@ static void save_stats(struct cache *cache)
 #define PB_DATA_SIZE_WB (offsetof(struct per_bio_data, cache))
 #define PB_DATA_SIZE_WT (sizeof(struct per_bio_data))
 
+static bool writethrough_mode(struct cache_features *f)
+{
+	return f->io_mode == CM_IO_WRITETHROUGH;
+}
+
+static bool writeback_mode(struct cache_features *f)
+{
+	return f->io_mode == CM_IO_WRITEBACK;
+}
+
+static bool passthrough_mode(struct cache_features *f)
+{
+	return f->io_mode == CM_IO_PASSTHROUGH;
+}
+
 static size_t get_per_bio_data_size(struct cache *cache)
 {
-	return cache->features.write_through ? PB_DATA_SIZE_WT : PB_DATA_SIZE_WB;
+	return writethrough_mode(&cache->features) ? PB_DATA_SIZE_WT : PB_DATA_SIZE_WB;
 }
 
 static struct per_bio_data *get_per_bio_data(struct bio *bio, size_t data_size)
@@ -1047,6 +1085,31 @@ static void demote_then_promote(struct cache *cache, struct prealloc *structs,
 	quiesce_migration(mg);
 }
 
+/*
+ * Invalidate a cache entry.  No writeback occurs; any changes in the cache
+ * block are thrown away.
+ */
+static void invalidate(struct cache *cache, struct prealloc *structs,
+		       dm_oblock_t oblock, dm_cblock_t cblock,
+		       struct dm_bio_prison_cell *cell)
+{
+	struct dm_cache_migration *mg = prealloc_get_migration(structs);
+
+	mg->err = false;
+	mg->writeback = false;
+	mg->demote = true;
+	mg->promote = false;
+	mg->cache = cache;
+	mg->old_oblock = oblock;
+	mg->cblock = cblock;
+	mg->old_ocell = cell;
+	mg->new_ocell = NULL;
+	mg->start_jiffies = jiffies;
+
+	inc_nr_migrations(cache);
+	quiesce_migration(mg);
+}
+
 /*----------------------------------------------------------------
  * bio processing
  *--------------------------------------------------------------*/
@@ -1109,11 +1172,9 @@ static bool spare_migration_bandwidth(struct cache *cache)
 	return current_volume < cache->migration_threshold;
 }
 
-static bool is_writethrough_io(struct cache *cache, struct bio *bio,
-			       dm_cblock_t cblock)
+static bool is_write_io(struct bio *bio)
 {
-	return bio_data_dir(bio) == WRITE &&
-		cache->features.write_through && !is_dirty(cache, cblock);
+	return bio_data_dir(bio) == WRITE;
 }
 
 static void inc_hit_counter(struct cache *cache, struct bio *bio)
@@ -1128,6 +1189,15 @@ static void inc_miss_counter(struct cache *cache, struct bio *bio)
 		   &cache->stats.read_miss : &cache->stats.write_miss);
 }
 
+static void issue_cache_bio(struct cache *cache, struct bio *bio,
+			    struct per_bio_data *pb,
+			    dm_oblock_t oblock, dm_cblock_t cblock)
+{
+	pb->all_io_entry = dm_deferred_entry_inc(cache->all_io_ds);
+	remap_to_cache_dirty(cache, bio, oblock, cblock);
+	issue(cache, bio);
+}
+
 static void process_bio(struct cache *cache, struct prealloc *structs,
 			struct bio *bio)
 {
@@ -1139,7 +1209,8 @@ static void process_bio(struct cache *cache, struct prealloc *structs,
 	size_t pb_data_size = get_per_bio_data_size(cache);
 	struct per_bio_data *pb = get_per_bio_data(bio, pb_data_size);
 	bool discarded_block = is_discarded_oblock(cache, block);
-	bool can_migrate = discarded_block || spare_migration_bandwidth(cache);
+	bool passthrough = passthrough_mode(&cache->features);
+	bool can_migrate = !passthrough && (discarded_block || spare_migration_bandwidth(cache));
 
 	/*
 	 * Check to see if that block is currently migrating.
@@ -1160,15 +1231,39 @@ static void process_bio(struct cache *cache, struct prealloc *structs,
 
 	switch (lookup_result.op) {
 	case POLICY_HIT:
-		inc_hit_counter(cache, bio);
-		pb->all_io_entry = dm_deferred_entry_inc(cache->all_io_ds);
+		if (passthrough) {
+			inc_miss_counter(cache, bio);
 
-		if (is_writethrough_io(cache, bio, lookup_result.cblock))
-			remap_to_origin_then_cache(cache, bio, block, lookup_result.cblock);
-		else
-			remap_to_cache_dirty(cache, bio, block, lookup_result.cblock);
+			/*
+			 * Passthrough always maps to the origin,
+			 * invalidating any cache blocks that are written
+			 * to.
+			 */
+
+			if (is_write_io(bio)) {
+				atomic_inc(&cache->stats.demotion);
+				invalidate(cache, structs, block, lookup_result.cblock, new_ocell);
+				release_cell = false;
+
+			} else {
+				/* FIXME: factor out issue_origin() */
+				pb->all_io_entry = dm_deferred_entry_inc(cache->all_io_ds);
+				remap_to_origin_clear_discard(cache, bio, block);
+				issue(cache, bio);
+			}
+		} else {
+			inc_hit_counter(cache, bio);
+
+			if (is_write_io(bio) &&
+			    writethrough_mode(&cache->features) &&
+			    !is_dirty(cache, lookup_result.cblock)) {
+				pb->all_io_entry = dm_deferred_entry_inc(cache->all_io_ds);
+				remap_to_origin_then_cache(cache, bio, block, lookup_result.cblock);
+				issue(cache, bio);
+			} else
+				issue_cache_bio(cache, bio, pb, block, lookup_result.cblock);
+		}
 
-		issue(cache, bio);
 		break;
 
 	case POLICY_MISS:
@@ -1715,7 +1810,7 @@ static int parse_block_size(struct cache_args *ca, struct dm_arg_set *as,
 static void init_features(struct cache_features *cf)
 {
 	cf->mode = CM_WRITE;
-	cf->write_through = false;
+	cf->io_mode = CM_IO_WRITEBACK;
 }
 
 static int parse_features(struct cache_args *ca, struct dm_arg_set *as,
@@ -1740,10 +1835,13 @@ static int parse_features(struct cache_args *ca, struct dm_arg_set *as,
 		arg = dm_shift_arg(as);
 
 		if (!strcasecmp(arg, "writeback"))
-			cf->write_through = false;
+			cf->io_mode = CM_IO_WRITEBACK;
 
 		else if (!strcasecmp(arg, "writethrough"))
-			cf->write_through = true;
+			cf->io_mode = CM_IO_WRITETHROUGH;
+
+		else if (!strcasecmp(arg, "passthrough"))
+			cf->io_mode = CM_IO_PASSTHROUGH;
 
 		else {
 			*error = "Unrecognised cache feature requested";
@@ -1995,6 +2093,22 @@ static int cache_create(struct cache_args *ca, struct cache **result)
 	}
 	cache->cmd = cmd;
 
+	if (passthrough_mode(&cache->features)) {
+		bool all_clean;
+
+		r = dm_cache_metadata_all_clean(cache->cmd, &all_clean);
+		if (r) {
+			*error = "dm_cache_metadata_all_clean() failed";
+			goto bad;
+		}
+
+		if (!all_clean) {
+			*error = "Cannot enter passthrough mode unless all blocks are clean";
+			r = -EINVAL;
+			goto bad;
+		}
+	}
+
 	spin_lock_init(&cache->lock);
 	bio_list_init(&cache->deferred_bios);
 	bio_list_init(&cache->deferred_flush_bios);
@@ -2207,17 +2321,40 @@ static int cache_map(struct dm_target *ti, struct bio *bio)
 		return DM_MAPIO_SUBMITTED;
 	}
 
+	r = DM_MAPIO_REMAPPED;
 	switch (lookup_result.op) {
 	case POLICY_HIT:
-		inc_hit_counter(cache, bio);
-		pb->all_io_entry = dm_deferred_entry_inc(cache->all_io_ds);
+		if (passthrough_mode(&cache->features)) {
+			if (is_write_io(bio)) {
+				/*
+				 * We need to invalidate this block, so
+				 * defer for the worker thread.
+				 */
+				cell_defer(cache, cell, true);
+				r = DM_MAPIO_SUBMITTED;
+
+			} else {
+				pb->all_io_entry = dm_deferred_entry_inc(cache->all_io_ds);
+				inc_miss_counter(cache, bio);
+				remap_to_origin_clear_discard(cache, bio, block);
+
+				cell_defer(cache, cell, false);
+			}
 
-		if (is_writethrough_io(cache, bio, lookup_result.cblock))
-			remap_to_origin_then_cache(cache, bio, block, lookup_result.cblock);
-		else
-			remap_to_cache_dirty(cache, bio, block, lookup_result.cblock);
+		} else {
+			inc_hit_counter(cache, bio);
 
-		cell_defer(cache, cell, false);
+			if (is_write_io(bio) &&
+			    writethrough_mode(&cache->features) &&
+			    !is_dirty(cache, lookup_result.cblock))
+				remap_to_origin_then_cache(cache, bio, block, lookup_result.cblock);
+
+			else
+				remap_to_cache_dirty(cache, bio, block, lookup_result.cblock);
+
+			cell_defer(cache, cell, false);
+
+		}
 		break;
 
 	case POLICY_MISS:
@@ -2242,10 +2379,10 @@ static int cache_map(struct dm_target *ti, struct bio *bio)
 		DMERR_LIMIT("%s: erroring bio: unknown policy op: %u", __func__,
 			    (unsigned) lookup_result.op);
 		bio_io_error(bio);
-		return DM_MAPIO_SUBMITTED;
+		r = DM_MAPIO_SUBMITTED;
 	}
 
-	return DM_MAPIO_REMAPPED;
+	return r;
 }
 
 static int cache_end_io(struct dm_target *ti, struct bio *bio, int error)
@@ -2520,10 +2657,19 @@ static void cache_status(struct dm_target *ti, status_type_t type,
 		       (unsigned long long) from_cblock(residency),
 		       cache->nr_dirty);
 
-		if (cache->features.write_through)
+		if (writethrough_mode(&cache->features))
 			DMEMIT("1 writethrough ");
-		else
-			DMEMIT("0 ");
+
+		else if (passthrough_mode(&cache->features))
+			DMEMIT("1 passthrough ");
+
+		else if (writeback_mode(&cache->features))
+			DMEMIT("1 writeback ");
+
+		else {
+			DMERR("internal error: unknown io mode: %d", (int) cache->features.io_mode);
+			goto err;
+		}
 
 		DMEMIT("2 migration_threshold %llu ", (unsigned long long) cache->migration_threshold);
 		if (sz < maxlen) {
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 13/24] dm cache policy: have policy_writeback_work return -ENODATA by default
  2013-10-24 18:30 [PATCH 00/24] dm cache: proposed changes for v3.13 merge Mike Snitzer
                   ` (11 preceding siblings ...)
  2013-10-24 18:30 ` [PATCH 12/24] dm cache: add passthrough mode Mike Snitzer
@ 2013-10-24 18:30 ` Mike Snitzer
  2013-10-24 18:30 ` [PATCH 14/24] dm cache: use is_write_io() in more places Mike Snitzer
                   ` (10 subsequent siblings)
  23 siblings, 0 replies; 49+ messages in thread
From: Mike Snitzer @ 2013-10-24 18:30 UTC (permalink / raw)
  To: dm-devel; +Cc: Morgan Mears, Heinz Mauelshagen, Joe Thornber, Mike Snitzer

From: Heinz Mauelshagen <heinzm@redhat.com>

The caller of policy_writeback_work() ignores the specific error
code, so this doesn't actually fix anything; it just makes the set of
return values used more consistent (which helps debugging).

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-cache-policy-internal.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/dm-cache-policy-internal.h b/drivers/md/dm-cache-policy-internal.h
index 0f749e8..9b1473b 100644
--- a/drivers/md/dm-cache-policy-internal.h
+++ b/drivers/md/dm-cache-policy-internal.h
@@ -56,7 +56,7 @@ static inline int policy_writeback_work(struct dm_cache_policy *p,
 					dm_oblock_t *oblock,
 					dm_cblock_t *cblock)
 {
-	return p->writeback_work ? p->writeback_work(p, oblock, cblock) : -ENOENT;
+	return p->writeback_work ? p->writeback_work(p, oblock, cblock) : -ENODATA;
 }
 
 static inline void policy_remove_mapping(struct dm_cache_policy *p, dm_oblock_t oblock)
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 14/24] dm cache: use is_write_io() in more places
  2013-10-24 18:30 [PATCH 00/24] dm cache: proposed changes for v3.13 merge Mike Snitzer
                   ` (12 preceding siblings ...)
  2013-10-24 18:30 ` [PATCH 13/24] dm cache policy: have policy_writeback_work return -ENODATA by default Mike Snitzer
@ 2013-10-24 18:30 ` Mike Snitzer
  2013-10-25 19:53   ` Alasdair G Kergon
  2013-10-24 18:30 ` [PATCH 15/24] dm cache: use cell_defer() boolean argument consistently Mike Snitzer
                   ` (9 subsequent siblings)
  23 siblings, 1 reply; 49+ messages in thread
From: Mike Snitzer @ 2013-10-24 18:30 UTC (permalink / raw)
  To: dm-devel; +Cc: Morgan Mears, Heinz Mauelshagen, Joe Thornber, Mike Snitzer

From: Heinz Mauelshagen <heinzm@redhat.com>

Use is_write_io() consistently wherever it makes sense.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-cache-target.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c
index 3bf8cc7..5fb4d64 100644
--- a/drivers/md/dm-cache-target.c
+++ b/drivers/md/dm-cache-target.c
@@ -631,12 +631,17 @@ static void check_if_tick_bio_needed(struct cache *cache, struct bio *bio)
 	spin_unlock_irqrestore(&cache->lock, flags);
 }
 
+static bool is_write_io(struct bio *bio)
+{
+	return bio_data_dir(bio) == WRITE;
+}
+
 static void remap_to_origin_clear_discard(struct cache *cache, struct bio *bio,
 				  dm_oblock_t oblock)
 {
 	check_if_tick_bio_needed(cache, bio);
 	remap_to_origin(cache, bio);
-	if (bio_data_dir(bio) == WRITE)
+	if (is_write_io(bio))
 		clear_discard(cache, oblock_to_dblock(cache, oblock));
 }
 
@@ -644,7 +649,7 @@ static void remap_to_cache_dirty(struct cache *cache, struct bio *bio,
 				 dm_oblock_t oblock, dm_cblock_t cblock)
 {
 	remap_to_cache(cache, bio, cblock);
-	if (bio_data_dir(bio) == WRITE) {
+	if (is_write_io(bio)) {
 		set_dirty(cache, oblock, cblock);
 		clear_discard(cache, oblock_to_dblock(cache, oblock));
 	}
@@ -1172,21 +1177,16 @@ static bool spare_migration_bandwidth(struct cache *cache)
 	return current_volume < cache->migration_threshold;
 }
 
-static bool is_write_io(struct bio *bio)
-{
-	return bio_data_dir(bio) == WRITE;
-}
-
 static void inc_hit_counter(struct cache *cache, struct bio *bio)
 {
-	atomic_inc(bio_data_dir(bio) == READ ?
-		   &cache->stats.read_hit : &cache->stats.write_hit);
+	atomic_inc(is_write_io(bio) ?
+		   &cache->stats.write_hit : &cache->stats.read_hit);
 }
 
 static void inc_miss_counter(struct cache *cache, struct bio *bio)
 {
-	atomic_inc(bio_data_dir(bio) == READ ?
-		   &cache->stats.read_miss : &cache->stats.write_miss);
+	atomic_inc(is_write_io(bio) ?
+		   &cache->stats.write_miss : &cache->stats.read_miss);
 }
 
 static void issue_cache_bio(struct cache *cache, struct bio *bio,
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 15/24] dm cache: use cell_defer() boolean argument consistently
  2013-10-24 18:30 [PATCH 00/24] dm cache: proposed changes for v3.13 merge Mike Snitzer
                   ` (13 preceding siblings ...)
  2013-10-24 18:30 ` [PATCH 14/24] dm cache: use is_write_io() in more places Mike Snitzer
@ 2013-10-24 18:30 ` Mike Snitzer
  2013-10-24 18:30 ` [PATCH 16/24] dm cache: log error message if dm_kcopyd_copy() fails Mike Snitzer
                   ` (8 subsequent siblings)
  23 siblings, 0 replies; 49+ messages in thread
From: Mike Snitzer @ 2013-10-24 18:30 UTC (permalink / raw)
  To: dm-devel; +Cc: Morgan Mears, Heinz Mauelshagen, Joe Thornber, Mike Snitzer

From: Heinz Mauelshagen <heinzm@redhat.com>

Fix a few cell_defer() calls that weren't passing a bool.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-cache-target.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c
index 5fb4d64..db27dfa 100644
--- a/drivers/md/dm-cache-target.c
+++ b/drivers/md/dm-cache-target.c
@@ -808,13 +808,13 @@ static void migration_failure(struct dm_cache_migration *mg)
 		DMWARN_LIMIT("demotion failed; couldn't copy block");
 		policy_force_mapping(cache->policy, mg->new_oblock, mg->old_oblock);
 
-		cell_defer(cache, mg->old_ocell, mg->promote ? 0 : 1);
+		cell_defer(cache, mg->old_ocell, mg->promote ? false : true);
 		if (mg->promote)
-			cell_defer(cache, mg->new_ocell, 1);
+			cell_defer(cache, mg->new_ocell, true);
 	} else {
 		DMWARN_LIMIT("promotion failed; couldn't copy block");
 		policy_remove_mapping(cache->policy, mg->new_oblock);
-		cell_defer(cache, mg->new_ocell, 1);
+		cell_defer(cache, mg->new_ocell, true);
 	}
 
 	cleanup_migration(mg);
@@ -866,7 +866,7 @@ static void migration_success_post_commit(struct dm_cache_migration *mg)
 		return;
 
 	} else if (mg->demote) {
-		cell_defer(cache, mg->old_ocell, mg->promote ? 0 : 1);
+		cell_defer(cache, mg->old_ocell, mg->promote ? false : true);
 
 		if (mg->promote) {
 			mg->demote = false;
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 16/24] dm cache: log error message if dm_kcopyd_copy() fails
  2013-10-24 18:30 [PATCH 00/24] dm cache: proposed changes for v3.13 merge Mike Snitzer
                   ` (14 preceding siblings ...)
  2013-10-24 18:30 ` [PATCH 15/24] dm cache: use cell_defer() boolean argument consistently Mike Snitzer
@ 2013-10-24 18:30 ` Mike Snitzer
  2013-11-08 13:33   ` Alasdair G Kergon
  2013-10-24 18:30 ` [PATCH 17/24] dm cache: use a boolean when setting cache->quiescing Mike Snitzer
                   ` (7 subsequent siblings)
  23 siblings, 1 reply; 49+ messages in thread
From: Mike Snitzer @ 2013-10-24 18:30 UTC (permalink / raw)
  To: dm-devel; +Cc: Morgan Mears, Heinz Mauelshagen, Joe Thornber, Mike Snitzer

From: Heinz Mauelshagen <heinzm@redhat.com>

A migration failure should be logged (albeit rate-limited).

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-cache-target.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c
index db27dfa..afeda1b 100644
--- a/drivers/md/dm-cache-target.c
+++ b/drivers/md/dm-cache-target.c
@@ -924,8 +924,10 @@ static void issue_copy_real(struct dm_cache_migration *mg)
 		r = dm_kcopyd_copy(cache->copier, &o_region, 1, &c_region, 0, copy_complete, mg);
 	}
 
-	if (r < 0)
+	if (r < 0) {
+		DMERR_LIMIT("issuing migration failed");
 		migration_failure(mg);
+	}
 }
 
 static void avoid_copy(struct dm_cache_migration *mg)
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 17/24] dm cache: use a boolean when setting cache->quiescing
  2013-10-24 18:30 [PATCH 00/24] dm cache: proposed changes for v3.13 merge Mike Snitzer
                   ` (15 preceding siblings ...)
  2013-10-24 18:30 ` [PATCH 16/24] dm cache: log error message if dm_kcopyd_copy() fails Mike Snitzer
@ 2013-10-24 18:30 ` Mike Snitzer
  2013-11-06 15:02   ` Alasdair G Kergon
  2013-10-24 18:30 ` [PATCH 18/24] dm cache: optimize commit_if_needed Mike Snitzer
                   ` (6 subsequent siblings)
  23 siblings, 1 reply; 49+ messages in thread
From: Mike Snitzer @ 2013-10-24 18:30 UTC (permalink / raw)
  To: dm-devel; +Cc: Morgan Mears, Heinz Mauelshagen, Joe Thornber, Mike Snitzer

From: Heinz Mauelshagen <heinzm@redhat.com>

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-cache-target.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c
index afeda1b..f999ddd 100644
--- a/drivers/md/dm-cache-target.c
+++ b/drivers/md/dm-cache-target.c
@@ -1448,7 +1448,7 @@ static void start_quiescing(struct cache *cache)
 	unsigned long flags;
 
 	spin_lock_irqsave(&cache->lock, flags);
-	cache->quiescing = 1;
+	cache->quiescing = true;
 	spin_unlock_irqrestore(&cache->lock, flags);
 }
 
@@ -1457,7 +1457,7 @@ static void stop_quiescing(struct cache *cache)
 	unsigned long flags;
 
 	spin_lock_irqsave(&cache->lock, flags);
-	cache->quiescing = 0;
+	cache->quiescing = false;
 	spin_unlock_irqrestore(&cache->lock, flags);
 }
 
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 18/24] dm cache: optimize commit_if_needed
  2013-10-24 18:30 [PATCH 00/24] dm cache: proposed changes for v3.13 merge Mike Snitzer
                   ` (16 preceding siblings ...)
  2013-10-24 18:30 ` [PATCH 17/24] dm cache: use a boolean when setting cache->quiescing Mike Snitzer
@ 2013-10-24 18:30 ` Mike Snitzer
  2013-10-24 18:30 ` [PATCH 19/24] dm cache: support for stackable caching policies Mike Snitzer
                   ` (5 subsequent siblings)
  23 siblings, 0 replies; 49+ messages in thread
From: Mike Snitzer @ 2013-10-24 18:30 UTC (permalink / raw)
  To: dm-devel; +Cc: Morgan Mears, Heinz Mauelshagen, Joe Thornber, Mike Snitzer

From: Heinz Mauelshagen <heinzm@redhat.com>

Check the commit_requested flag _before_ calling
dm_cache_changed_this_transaction(), so the latter isn't called
superfluously.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-cache-target.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c
index f999ddd..502ae64 100644
--- a/drivers/md/dm-cache-target.c
+++ b/drivers/md/dm-cache-target.c
@@ -1324,8 +1324,8 @@ static int need_commit_due_to_time(struct cache *cache)
 
 static int commit_if_needed(struct cache *cache)
 {
-	if (dm_cache_changed_this_transaction(cache->cmd) &&
-	    (cache->commit_requested || need_commit_due_to_time(cache))) {
+	if ((cache->commit_requested || need_commit_due_to_time(cache)) &&
+	    dm_cache_changed_this_transaction(cache->cmd)) {
 		atomic_inc(&cache->stats.commit_count);
 		cache->last_commit_jiffies = jiffies;
 		cache->commit_requested = false;
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 19/24] dm cache: support for stackable caching policies
  2013-10-24 18:30 [PATCH 00/24] dm cache: proposed changes for v3.13 merge Mike Snitzer
                   ` (17 preceding siblings ...)
  2013-10-24 18:30 ` [PATCH 18/24] dm cache: optimize commit_if_needed Mike Snitzer
@ 2013-10-24 18:30 ` Mike Snitzer
  2013-10-24 18:30 ` [PATCH 20/24] dm cache: add era policy shim Mike Snitzer
                   ` (4 subsequent siblings)
  23 siblings, 0 replies; 49+ messages in thread
From: Mike Snitzer @ 2013-10-24 18:30 UTC (permalink / raw)
  To: dm-devel; +Cc: Morgan Mears, Heinz Mauelshagen, Joe Thornber, Mike Snitzer

From: Morgan Mears <morgan.mears@netapp.com>

This commit implements support for stacking caching policies by
concatenating policy names.

A policy stack includes zero or more non-terminal policies, or shims,
followed by exactly one terminal policy which actually maintains the
cache-to-origin block mappings.

Non-terminal policy shims will be added in later patches.  Each policy
shim must set the DM_CACHE_POLICY_SHIM feature flag in the "features"
member of the dm_cache_policy_type structure.
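
For orientation, a hypothetical no-op shim built on these helpers might
look as follows ("noop" and noop_create are illustrative only, and
#includes are elided; a real shim would also register this type with
dm_cache_policy_register() from its module init):

  /* Hypothetical no-op shim: delegates every operation to its child. */
  static struct dm_cache_policy *noop_create(dm_cblock_t cache_size,
  					     sector_t origin_size,
  					     sector_t block_size)
  {
  	struct dm_cache_policy *p = kzalloc(sizeof(*p), GFP_KERNEL);

  	if (!p)
  		return NULL;

  	dm_cache_shim_utils_init_shim_policy(p);
  	return p;
  }

  static struct dm_cache_policy_type noop_policy_type = {
  	.name      = "noop",
  	.version   = {1, 0, 0},
  	.hint_size = 0,
  	.create    = noop_create,
  	.features  = DM_CACHE_POLICY_SHIM,
  };

A stack name such as "noop+mq" would then resolve to this shim stacked
on the mq terminal policy.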

Signed-off-by: Morgan Mears <morgan.mears@netapp.com>
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/Makefile                   |   3 +-
 drivers/md/dm-cache-policy-internal.h |   6 +
 drivers/md/dm-cache-policy.c          |  47 ++++++-
 drivers/md/dm-cache-policy.h          |  17 +++
 drivers/md/dm-cache-shim-utils.c      | 210 +++++++++++++++++++++++++++++
 drivers/md/dm-cache-shim-utils.h      |  73 +++++++++++
 drivers/md/dm-cache-stack-utils.c     | 239 ++++++++++++++++++++++++++++++++++
 drivers/md/dm-cache-stack-utils.h     |  34 +++++
 8 files changed, 626 insertions(+), 3 deletions(-)
 create mode 100644 drivers/md/dm-cache-shim-utils.c
 create mode 100644 drivers/md/dm-cache-shim-utils.h
 create mode 100644 drivers/md/dm-cache-stack-utils.c
 create mode 100644 drivers/md/dm-cache-stack-utils.h

diff --git a/drivers/md/Makefile b/drivers/md/Makefile
index 2acc43f..5f6dfc3 100644
--- a/drivers/md/Makefile
+++ b/drivers/md/Makefile
@@ -11,7 +11,8 @@ dm-mirror-y	+= dm-raid1.o
 dm-log-userspace-y \
 		+= dm-log-userspace-base.o dm-log-userspace-transfer.o
 dm-thin-pool-y	+= dm-thin.o dm-thin-metadata.o
-dm-cache-y	+= dm-cache-target.o dm-cache-metadata.o dm-cache-policy.o
+dm-cache-y	+= dm-cache-target.o dm-cache-metadata.o dm-cache-policy.o \
+		   dm-cache-shim-utils.o dm-cache-stack-utils.o
 dm-cache-mq-y   += dm-cache-policy-mq.o
 dm-cache-cleaner-y += dm-cache-policy-cleaner.o
 md-mod-y	+= md.o bitmap.o
diff --git a/drivers/md/dm-cache-policy-internal.h b/drivers/md/dm-cache-policy-internal.h
index 9b1473b..996b2b5 100644
--- a/drivers/md/dm-cache-policy-internal.h
+++ b/drivers/md/dm-cache-policy-internal.h
@@ -124,6 +124,12 @@ const unsigned *dm_cache_policy_get_version(struct dm_cache_policy *p);
 int    dm_cache_policy_set_hint_size(struct dm_cache_policy *p, unsigned hint_size);
 size_t dm_cache_policy_get_hint_size(struct dm_cache_policy *p);
 
+/*
+ * Return a bool that reflects whether or not the policy is only a shim
+ * layer in a policy stack.
+ */
+bool dm_cache_policy_is_shim(struct dm_cache_policy *p);
+
 /*----------------------------------------------------------------*/
 
 #endif /* DM_CACHE_POLICY_INTERNAL_H */
diff --git a/drivers/md/dm-cache-policy.c b/drivers/md/dm-cache-policy.c
index 8e84d08..f2d8d33 100644
--- a/drivers/md/dm-cache-policy.c
+++ b/drivers/md/dm-cache-policy.c
@@ -5,6 +5,7 @@
  */
 
 #include "dm-cache-policy-internal.h"
+#include "dm-cache-stack-utils.h"
 #include "dm.h"
 
 #include <linux/module.h>
@@ -54,6 +55,9 @@ static struct dm_cache_policy_type *get_policy_once(const char *name)
 static struct dm_cache_policy_type *get_policy(const char *name)
 {
 	struct dm_cache_policy_type *t;
+	char name_wo_delim[CACHE_POLICY_NAME_SIZE];
+	char *p_delim;
+	int n;
 
 	t = get_policy_once(name);
 	if (IS_ERR(t))
@@ -68,6 +72,28 @@ static struct dm_cache_policy_type *get_policy(const char *name)
 	if (IS_ERR(t))
 		return NULL;
 
+	if (t)
+		return t;
+
+	/*
+	 * We also need to check for dm-cache-<@name> with no trailing
+	 * DM_CACHE_POLICY_STACK_DELIM if @name has one, in order to
+	 * support loadable policy shims.
+	 */
+	n = strlcpy(name_wo_delim, name, sizeof(name_wo_delim));
+	if (n >= sizeof(name_wo_delim))
+		return NULL;
+	p_delim = strchr(name_wo_delim, DM_CACHE_POLICY_STACK_DELIM);
+	if (!p_delim || (p_delim[1] != '\0'))
+		return NULL;
+	p_delim[0] = '\0';
+
+	request_module("dm-cache-%s", name_wo_delim);
+
+	t = get_policy_once(name);
+	if (IS_ERR(t))
+		return NULL;
+
 	return t;
 }
 
@@ -117,6 +143,11 @@ struct dm_cache_policy *dm_cache_policy_create(const char *name,
 	struct dm_cache_policy *p = NULL;
 	struct dm_cache_policy_type *type;
 
+	if (dm_cache_stack_utils_string_is_policy_stack(name))
+		return dm_cache_stack_utils_policy_stack_create(name, cache_size,
+								origin_size,
+								cache_block_size);
+
 	type = get_policy(name);
 	if (!type) {
 		DMWARN("unknown policy type");
@@ -138,8 +169,12 @@ void dm_cache_policy_destroy(struct dm_cache_policy *p)
 {
 	struct dm_cache_policy_type *t = p->private;
 
-	p->destroy(p);
-	put_policy(t);
+	if (dm_cache_stack_utils_string_is_policy_stack(t->name))
+		dm_cache_stack_utils_policy_stack_destroy(p);
+	else {
+		p->destroy(p);
+		put_policy(t);
+	}
 }
 EXPORT_SYMBOL_GPL(dm_cache_policy_destroy);
 
@@ -179,4 +214,12 @@ int dm_cache_policy_set_hint_size(struct dm_cache_policy *p, unsigned hint_size)
 }
 EXPORT_SYMBOL_GPL(dm_cache_policy_set_hint_size);
 
+bool dm_cache_policy_is_shim(struct dm_cache_policy *p)
+{
+	struct dm_cache_policy_type *t = p->private;
+
+	return (t->features & DM_CACHE_POLICY_SHIM) ? true : false;
+}
+EXPORT_SYMBOL_GPL(dm_cache_policy_is_shim);
+
 /*----------------------------------------------------------------*/
diff --git a/drivers/md/dm-cache-policy.h b/drivers/md/dm-cache-policy.h
index 6779ea7..83ec775 100644
--- a/drivers/md/dm-cache-policy.h
+++ b/drivers/md/dm-cache-policy.h
@@ -190,11 +190,26 @@ struct dm_cache_policy {
 	 * Book keeping ptr for the policy register, not for general use.
 	 */
 	void *private;
+
+	/*
+	 * Support for stackable policies. A policy stack consists of 0 or more
+	 * "non-terminal" policies (which can intercept requests to provide
+	 * additional functionality, but ultimately hand them down the stack)
+	 * followed by one "terminal" policy which actually runs a caching
+	 * algorithm.  This is the pointer to the "next" policy in a
+	 * non-terminal policy.  It will always be NULL in a terminal policy.
+	 */
+	struct dm_cache_policy *child;
 };
 
 /*----------------------------------------------------------------*/
 
 /*
+ * Indicates that a policy is only a shim layer in a policy stack.
+ */
+#define	DM_CACHE_POLICY_SHIM	 (1 << 0)
+
+/*
  * We maintain a little register of the different policy types.
  */
 #define CACHE_POLICY_NAME_SIZE 16
@@ -222,6 +237,8 @@ struct dm_cache_policy_type {
 	struct dm_cache_policy *(*create)(dm_cblock_t cache_size,
 					  sector_t origin_size,
 					  sector_t block_size);
+
+	unsigned long features;
 };
 
 int dm_cache_policy_register(struct dm_cache_policy_type *type);
diff --git a/drivers/md/dm-cache-shim-utils.c b/drivers/md/dm-cache-shim-utils.c
new file mode 100644
index 0000000..4151883
--- /dev/null
+++ b/drivers/md/dm-cache-shim-utils.c
@@ -0,0 +1,210 @@
+/*
+ * Copyright 2013 NetApp, Inc. All Rights Reserved, contribution by
+ * Morgan Mears.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or (at your
+ * option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details
+ *
+ */
+
+#include "dm-cache-policy.h"
+#include "dm-cache-policy-internal.h"
+#include "dm-cache-shim-utils.h"
+#include "dm.h"
+
+#include <linux/module.h>
+
+#define DM_MSG_PREFIX "cache-shim-utils"
+
+/*----------------------------------------------------------------*/
+
+static int shim_nested_walk_apply(void *context, dm_cblock_t cblock,
+				  dm_oblock_t oblock, void *hint)
+{
+	struct shim_walk_map_ctx *ctx = context;
+	struct dm_cache_policy *p;
+	int child_hint_size;
+	void *my_hint;
+
+	/* Save off our child's hint */
+	if (ctx->child_hint_buf) {
+		p = ctx->my_policy;
+		child_hint_size = dm_cache_policy_get_hint_size(p->child);
+		if (child_hint_size && hint)
+			memcpy(&ctx->child_hint_buf[0], hint, child_hint_size);
+	}
+
+	/* Provide my hint or NULL up the stack */
+	my_hint = ctx->cblock_to_hint_fn ?
+		ctx->cblock_to_hint_fn(ctx, cblock, oblock) : NULL;
+
+	/* Reverse recurse, unless short-circuited */
+	return (ctx->parent_fn) ?
+		(*ctx->parent_fn)(ctx->parent_ctx, cblock, oblock, my_hint) : 0;
+}
+
+/*----------------------------------------------------------------*/
+
+/*
+ * Public interface, via the policy struct.  See dm-cache-policy.h for a
+ * description of these.
+ */
+
+static void shim_destroy(struct dm_cache_policy *p)
+{
+	kfree(p);
+}
+
+static int shim_map(struct dm_cache_policy *p, dm_oblock_t oblock,
+		    bool can_block, bool can_migrate, bool discarded_oblock,
+		    struct bio *bio, struct policy_result *result)
+{
+	return policy_map(p->child, oblock, can_block, can_migrate,
+			  discarded_oblock, bio, result);
+}
+
+static int shim_lookup(struct dm_cache_policy *p, dm_oblock_t oblock,
+		       dm_cblock_t *cblock)
+{
+	return policy_lookup(p->child, oblock, cblock);
+}
+
+static void shim_set_dirty(struct dm_cache_policy *p, dm_oblock_t oblock)
+{
+	policy_set_dirty(p->child, oblock);
+}
+
+static void shim_clear_dirty(struct dm_cache_policy *p, dm_oblock_t oblock)
+{
+	policy_clear_dirty(p->child, oblock);
+}
+
+static int shim_load_mapping(struct dm_cache_policy *p,
+			     dm_oblock_t oblock, dm_cblock_t cblock,
+			     void *hint, bool hint_valid)
+{
+	return policy_load_mapping(p->child, oblock, cblock, hint, hint_valid);
+}
+
+static int shim_walk_mappings(struct dm_cache_policy *p, policy_walk_fn fn,
+			      void *context)
+{
+	struct shim_walk_map_ctx my_ctx, *parent_ctx;
+	int my_hint_size;
+
+	parent_ctx = (struct shim_walk_map_ctx *)context;
+	my_hint_size = dm_cache_policy_get_hint_size(p);
+
+	my_ctx.parent_ctx = parent_ctx;
+	my_ctx.parent_fn = fn;
+	my_ctx.my_policy = p;
+	my_ctx.child_hint_buf = (parent_ctx->child_hint_buf) ?
+		&parent_ctx->child_hint_buf[my_hint_size] : NULL;
+	my_ctx.cblock_to_hint_fn = NULL;
+
+	return policy_walk_mappings(p->child, shim_nested_walk_apply, &my_ctx);
+}
+
+static void shim_remove_mapping(struct dm_cache_policy *p, dm_oblock_t oblock)
+{
+	policy_remove_mapping(p->child, oblock);
+}
+
+static int shim_writeback_work(struct dm_cache_policy *p, dm_oblock_t *oblock,
+			       dm_cblock_t *cblock)
+{
+	return policy_writeback_work(p->child, oblock, cblock);
+}
+
+static void shim_force_mapping(struct dm_cache_policy *p,
+			       dm_oblock_t current_oblock,
+			       dm_oblock_t new_oblock)
+{
+	policy_force_mapping(p->child, current_oblock, new_oblock);
+}
+
+static dm_cblock_t shim_residency(struct dm_cache_policy *p)
+{
+	return policy_residency(p->child);
+}
+
+static void shim_tick(struct dm_cache_policy *p)
+{
+	policy_tick(p->child);
+}
+
+static int shim_set_config_value(struct dm_cache_policy *p,
+				 const char *key, const char *value)
+{
+	return policy_set_config_value(p->child, key, value);
+}
+
+static int shim_emit_config_values(struct dm_cache_policy *p, char *result,
+				   unsigned maxlen)
+{
+	return policy_emit_config_values(p->child, result, maxlen);
+}
+
+void dm_cache_shim_utils_init_shim_policy(struct dm_cache_policy *p)
+{
+	p->destroy = shim_destroy;
+	p->map = shim_map;
+	p->lookup = shim_lookup;
+	p->set_dirty = shim_set_dirty;
+	p->clear_dirty = shim_clear_dirty;
+	p->load_mapping = shim_load_mapping;
+	p->walk_mappings = shim_walk_mappings;
+	p->remove_mapping = shim_remove_mapping;
+	p->writeback_work = shim_writeback_work;
+	p->force_mapping = shim_force_mapping;
+	p->residency = shim_residency;
+	p->tick = shim_tick;
+	p->emit_config_values = shim_emit_config_values;
+	p->set_config_value = shim_set_config_value;
+}
+EXPORT_SYMBOL_GPL(dm_cache_shim_utils_init_shim_policy);
+
+int dm_cache_shim_utils_walk_map_with_ctx(struct shim_walk_map_ctx *ctx)
+{
+	struct dm_cache_policy *p = ctx->my_policy;
+
+	/*
+	 * Used by the stack root policy in its walk_mappings implementation,
+	 * to provide the top-level context that contains the buffer used to
+	 * consolidate hint data from all of the shims and the terminal policy.
+	 */
+	return policy_walk_mappings(p->child, shim_nested_walk_apply, ctx);
+}
+EXPORT_SYMBOL_GPL(dm_cache_shim_utils_walk_map_with_ctx);
+
+int dm_cache_shim_utils_walk_map(struct dm_cache_policy *p, policy_walk_fn fn,
+				 void *context, cblock_to_hint_fn_t hint_fn)
+{
+	struct shim_walk_map_ctx my_ctx, *parent_ctx;
+	int my_hint_size;
+
+	/*
+	 * Used by shim policies for their walk_mappings implementations.
+	 * Handles packing up the hint data, in conjunction with
+	 * shim_nested_walk_apply.
+	 */
+	parent_ctx = (struct shim_walk_map_ctx *)context;
+	my_hint_size = dm_cache_policy_get_hint_size(p);
+
+	my_ctx.parent_ctx = parent_ctx;
+	my_ctx.parent_fn = fn;
+	my_ctx.my_policy = p;
+	my_ctx.child_hint_buf = (parent_ctx && parent_ctx->child_hint_buf) ?
+		&parent_ctx->child_hint_buf[my_hint_size] : NULL;
+	my_ctx.cblock_to_hint_fn = hint_fn;
+
+	return policy_walk_mappings(p->child, shim_nested_walk_apply, &my_ctx);
+}
+EXPORT_SYMBOL_GPL(dm_cache_shim_utils_walk_map);
diff --git a/drivers/md/dm-cache-shim-utils.h b/drivers/md/dm-cache-shim-utils.h
new file mode 100644
index 0000000..92f2f21
--- /dev/null
+++ b/drivers/md/dm-cache-shim-utils.h
@@ -0,0 +1,73 @@
+/*
+ * Copyright 2013 NetApp, Inc. All Rights Reserved, contribution by
+ * Morgan Mears.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or (at your
+ * option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details
+ *
+ */
+
+#ifndef DM_CACHE_SHIM_UTILS_H
+#define DM_CACHE_SHIM_UTILS_H
+
+#include "dm-cache-policy.h"
+
+struct shim_walk_map_ctx;
+
+typedef void *(*cblock_to_hint_fn_t)(struct shim_walk_map_ctx *,
+				     dm_cblock_t,
+				     dm_oblock_t);
+
+/*
+ * For walk_mappings to work with a policy stack, every non-terminal policy
+ * has to start its context with one of these.  There are no requirements for
+ * the context used by the terminal policy.
+ */
+struct shim_walk_map_ctx {
+	void *parent_ctx;
+	policy_walk_fn parent_fn;
+	struct dm_cache_policy *my_policy;
+	char *child_hint_buf;
+	cblock_to_hint_fn_t cblock_to_hint_fn;
+	union {
+		__le64 le64_buf;
+		__le32 le32_buf;
+		__le16 le16_buf;
+	};
+};
+
+/*
+ * Populate a shim (non-terminal) policy structure with functions that just
+ * hand off to the child policy.  Caller can then override just those
+ * functions of interest.
+ */
+void dm_cache_shim_utils_init_shim_policy(struct dm_cache_policy *p);
+
+/*
+ * Launch a "walk_mappings" leg using the context provided by our caller.
+ * Typically used at the bottom of a policy stack, so caller can provide
+ * the hint buffer.
+ */
+int dm_cache_shim_utils_walk_map_with_ctx(struct shim_walk_map_ctx *ctx);
+
+/*
+ * Initialize a context appropriately and launch a "walk_mappings" leg.
+ * Typically used to implement walk_mappings in shim policies. The
+ * framework will call hint_fn at the appropriate point, and it should
+ * return a pointer to the disk-ready hint for the given cblock.  The
+ * leXX_bufs in the shim_walk_map_ctx structure can be used to store the
+ * disk-ready hint if it will fit.
+ */
+int dm_cache_shim_utils_walk_map(struct dm_cache_policy *p,
+				 policy_walk_fn fn,
+				 void *context,
+				 cblock_to_hint_fn_t hint_fn);
+
+#endif /* DM_CACHE_SHIM_UTILS_H */
diff --git a/drivers/md/dm-cache-stack-utils.c b/drivers/md/dm-cache-stack-utils.c
new file mode 100644
index 0000000..82cc3af
--- /dev/null
+++ b/drivers/md/dm-cache-stack-utils.c
@@ -0,0 +1,239 @@
+/*
+ * Copyright 2013 NetApp, Inc. All Rights Reserved, contribution by
+ * Morgan Mears.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or (at your
+ * option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details
+ *
+ */
+
+#include "dm-cache-policy-internal.h"
+#include "dm-cache-shim-utils.h"
+#include "dm-cache-stack-utils.h"
+#include "dm-cache-policy.h"
+#include "dm.h"
+
+#include <linux/delay.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+
+#define DM_MSG_PREFIX "cache-stack-utils"
+
+struct stack_root_policy {
+	struct dm_cache_policy policy;
+	struct dm_cache_policy_type type;
+};
+
+/*----------------------------------------------------------------*/
+
+static void *stack_root_cblock_to_hint(struct shim_walk_map_ctx *ctx,
+				       dm_cblock_t cblock, dm_oblock_t oblock)
+{
+	return ctx->child_hint_buf;
+}
+
+static int stack_root_walk_mappings(struct dm_cache_policy *p,
+				    policy_walk_fn fn, void *context)
+{
+	struct shim_walk_map_ctx ctx;
+	size_t hint_size;
+	int r;
+
+	ctx.parent_ctx = context;
+	ctx.parent_fn = fn;
+	ctx.my_policy = p;
+	ctx.child_hint_buf = NULL;
+	ctx.cblock_to_hint_fn = stack_root_cblock_to_hint;
+
+	hint_size = dm_cache_policy_get_hint_size(p);
+	if (hint_size) {
+		ctx.child_hint_buf = kzalloc(hint_size, GFP_KERNEL);
+		if (!ctx.child_hint_buf)
+			return -ENOMEM;
+	}
+
+	r = dm_cache_shim_utils_walk_map_with_ctx(&ctx);
+
+	kfree(ctx.child_hint_buf);
+
+	return r;
+}
+
+static struct dm_cache_policy *stack_root_create(const char *policy_stack_str,
+						 struct dm_cache_policy *head)
+{
+	struct stack_root_policy *p = kzalloc(sizeof(*p), GFP_KERNEL);
+	struct dm_cache_policy *child;
+	struct dm_cache_policy_type *t;
+	const unsigned *version;
+	const char *seg_name;
+	size_t canonical_name_len, hint_size;
+	int i;
+
+	if (!p)
+		return NULL;
+
+	t = &p->type;
+	dm_cache_shim_utils_init_shim_policy(&p->policy);
+	p->policy.walk_mappings = stack_root_walk_mappings;
+	p->policy.child = head;
+
+	/*
+	 * We compose the canonical name for this policy stack by removing
+	 * any shim policies that do not have hint data.  This is intended
+	 * to allow for a class of shim policies that can be inserted into,
+	 * or removed from, the policy stack without causing the in-flash
+	 * metadata to be invalidated.  The thought is to allow debug or
+	 * tracing shims to be inserted or removed without dropping the cache.
+	 * The composite version numbers of a policy stack do not include the
+	 * versions of the hintless policies for the same reason.
+	 */
+	canonical_name_len = 0;
+	for (child = head; child; child = child->child) {
+		hint_size = dm_cache_policy_get_hint_size(child);
+
+#if 0
+		/* FIXME: omits the policy name from t->name, thus leaving a non-destroyable stack. */
+		if (!hint_size && child->child)
+			continue;
+#endif
+
+		t->hint_size += hint_size;
+
+		seg_name = dm_cache_policy_get_name(child);
+		canonical_name_len += strlen(seg_name) + (dm_cache_policy_is_shim(child) ? 1 : 0);
+
+		if (canonical_name_len >= sizeof(t->name)) {
+			DMWARN("policy stack string '%s' is too long",
+			       policy_stack_str);
+			kfree(p);
+			return NULL;
+		}
+
+		strcat(t->name, seg_name);
+
+		if (dm_cache_policy_is_shim(child)) {
+			t->name[canonical_name_len - 1] = DM_CACHE_POLICY_STACK_DELIM;
+			t->name[canonical_name_len] = '\0';
+		}
+
+		version = dm_cache_policy_get_version(child);
+
+		for (i = 0; i < CACHE_POLICY_VERSION_SIZE; i++)
+			t->version[i] += version[i];
+	}
+
+	p->policy.private = t;
+	return &p->policy;
+}
+
+static void stack_root_destroy(struct dm_cache_policy *p)
+{
+	kfree(p);
+}
+
+/*----------------------------------------------------------------*/
+
+int dm_cache_stack_utils_string_is_policy_stack(const char *string)
+{
+	const char *delim;
+
+	/*
+	 * A string specifies a policy stack instead of a policy if it
+	 * contains a policy delimiter (+) anywhere but at the end.  The
+	 * latter is needed to properly distinguish between policy stacks and
+	 * individual shim policies, since this function is called on them
+	 * when the policy stack is constructed from the specified string.
+	 */
+	delim = strchr(string, DM_CACHE_POLICY_STACK_DELIM);
+	if (!delim || (delim[1] == '\0'))
+		return false;
+
+	return true;
+}
+
+static void __policy_destroy_stack(struct dm_cache_policy *head_p)
+{
+	struct dm_cache_policy *cur_p, *next_p;
+
+	for (cur_p = head_p; cur_p; cur_p = next_p) {
+		next_p = cur_p->child;
+		dm_cache_policy_destroy(cur_p);
+	}
+}
+
+struct dm_cache_policy *
+dm_cache_stack_utils_policy_stack_create(const char *policy_stack_str,
+					 dm_cblock_t cache_size,
+					 sector_t origin_size,
+					 sector_t cache_block_size)
+{
+	char policy_name_buf[CACHE_POLICY_NAME_SIZE];
+	struct dm_cache_policy *p, *head_p, *next_p;
+	char *policy_name, *delim;
+	int n;
+
+	n = strlcpy(policy_name_buf, policy_stack_str, sizeof(policy_name_buf));
+	if (n >= sizeof(policy_name_buf)) {
+		DMWARN("policy stack string is too long");
+		return NULL;
+	}
+
+	policy_name = policy_name_buf;
+	p = head_p = next_p = NULL;
+
+	do {
+		delim = strchr(policy_name, DM_CACHE_POLICY_STACK_DELIM);
+		if (delim)
+			*delim = '\0';
+
+		next_p = dm_cache_policy_create(policy_name, cache_size,
+						origin_size, cache_block_size);
+		if (!next_p)
+			goto cleanup;
+
+		next_p->child = NULL;
+		if (p)
+			p->child = next_p;
+		else
+			head_p = next_p;
+		p = next_p;
+
+		if (delim) {
+			if (!dm_cache_policy_is_shim(next_p)) {
+				DMERR("%s is no shim policy", policy_name);
+				goto cleanup;
+			}
+
+			*delim = DM_CACHE_POLICY_STACK_DELIM;
+			policy_name = delim + 1;
+		}
+	} while (delim);
+
+	if (head_p->child) {
+		next_p = stack_root_create(policy_stack_str, head_p);
+		if (!next_p)
+			goto cleanup;
+
+		head_p = next_p;
+	}
+
+	return head_p;
+
+cleanup:
+	__policy_destroy_stack(head_p);
+	return NULL;
+}
+
+void dm_cache_stack_utils_policy_stack_destroy(struct dm_cache_policy *p)
+{
+	__policy_destroy_stack(p->child);
+	stack_root_destroy(p);
+}
diff --git a/drivers/md/dm-cache-stack-utils.h b/drivers/md/dm-cache-stack-utils.h
new file mode 100644
index 0000000..54c767e
--- /dev/null
+++ b/drivers/md/dm-cache-stack-utils.h
@@ -0,0 +1,34 @@
+/*
+ * Copyright 2013 NetApp, Inc. All Rights Reserved, contribution by
+ * Morgan Mears.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or (at your
+ * option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details
+ *
+ */
+
+#ifndef DM_CACHE_STACK_UTILS_H
+#define DM_CACHE_STACK_UTILS_H
+
+#include "dm-cache-policy.h"
+
+#define DM_CACHE_POLICY_STACK_DELIM '+'
+
+int dm_cache_stack_utils_string_is_policy_stack(const char *string);
+
+struct dm_cache_policy *dm_cache_stack_utils_policy_stack_create(
+				const char *policy_stack_string,
+				dm_cblock_t cache_size,
+				sector_t origin_size,
+				sector_t cache_block_size);
+
+void dm_cache_stack_utils_policy_stack_destroy(struct dm_cache_policy *p);
+
+#endif /* DM_CACHE_STACK_UTILS_H */
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 20/24] dm cache: add era policy shim
  2013-10-24 18:30 [PATCH 00/24] dm cache: proposed changes for v3.13 merge Mike Snitzer
                   ` (18 preceding siblings ...)
  2013-10-24 18:30 ` [PATCH 19/24] dm cache: support for stackable caching policies Mike Snitzer
@ 2013-10-24 18:30 ` Mike Snitzer
  2013-10-24 18:30 ` [PATCH 21/24] dm cache: add trc " Mike Snitzer
                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 49+ messages in thread
From: Mike Snitzer @ 2013-10-24 18:30 UTC (permalink / raw)
  To: dm-devel; +Cc: Morgan Mears, Heinz Mauelshagen, Joe Thornber, Mike Snitzer

From: Morgan Mears <morgan.mears@netapp.com>

This commit includes a non-terminal policy (aka "shim") called era that
may be stacked on top of a terminal policy (e.g. mq).

The era policy adds:
- an era number to every cache block that gets updated on write hits
- an interface that allows an application to read and increment the
  current era value
- an interface to invalidate cache blocks that have been written to
  before or after a given era

This functionality can be used to partially invalidate the cache
contents to restore cache coherency after a snapshot rollback.
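
Because era is selected purely by name (using the stacking support from
the previous patch), no new interfaces are needed to enable it; cache
creation internally reduces to something like this sketch (variable
names illustrative, error handling elided):

  /* Sketch: stack the era shim on top of the mq terminal policy. */
  struct dm_cache_policy *p;

  p = dm_cache_policy_create("era+mq", cache_size,
  			     origin_size, cache_block_size);
  if (!p)
  	return -ENOMEM;	/* or otherwise fail target construction */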

Signed-off-by: Morgan Mears <morgan.mears@netapp.com>
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/Kconfig               |  17 ++
 drivers/md/Makefile              |   2 +
 drivers/md/dm-cache-policy-era.c | 428 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 447 insertions(+)
 create mode 100644 drivers/md/dm-cache-policy-era.c

diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index 816e023..ad32101 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -282,6 +282,23 @@ config DM_CACHE_MQ
          This is meant to be a general purpose policy.  It prioritises
          reads over writes.
 
+config DM_CACHE_ERA
+       tristate "ERA Cache Policy shim (EXPERIMENTAL)"
+       depends on DM_CACHE
+       ---help---
+	 A cache policy shim that adds an "era" property to the
+	 per-cache-block metadata, to facilitate the implementation of
+	 cache coherency validation and recovery tools.	 This mechanism
+	 works as follows.  There is a monotonically increasing 32-bit
+	 era counter associated with each cache instance.  Each cache
+	 block is tagged with the era during which it was last written.
+	 A device mapper message interface is provided to obtain the
+	 current era, advance to the next era, and invalidate blocks
+	 from before or after a given era.  Note that you can use this
+	 policy shim to add the era functionality to any cache policy
+	 via name concatenation -- specify era+mq instead of just mq to
+	 add the era mechanism to the mq policy, for example.
+
 config DM_CACHE_CLEANER
        tristate "Cleaner Cache Policy (EXPERIMENTAL)"
        depends on DM_CACHE
diff --git a/drivers/md/Makefile b/drivers/md/Makefile
index 5f6dfc3..0ae00bd 100644
--- a/drivers/md/Makefile
+++ b/drivers/md/Makefile
@@ -15,6 +15,7 @@ dm-cache-y	+= dm-cache-target.o dm-cache-metadata.o dm-cache-policy.o \
 		   dm-cache-shim-utils.o dm-cache-stack-utils.o
 dm-cache-mq-y   += dm-cache-policy-mq.o
 dm-cache-cleaner-y += dm-cache-policy-cleaner.o
+dm-cache-era-y	+= dm-cache-policy-era.o
 md-mod-y	+= md.o bitmap.o
 raid456-y	+= raid5.o
 
@@ -53,6 +54,7 @@ obj-$(CONFIG_DM_VERITY)		+= dm-verity.o
 obj-$(CONFIG_DM_CACHE)		+= dm-cache.o
 obj-$(CONFIG_DM_CACHE_MQ)	+= dm-cache-mq.o
 obj-$(CONFIG_DM_CACHE_CLEANER)	+= dm-cache-cleaner.o
+obj-$(CONFIG_DM_CACHE_ERA)	+= dm-cache-era.o
 
 ifeq ($(CONFIG_DM_UEVENT),y)
 dm-mod-objs			+= dm-uevent.o
diff --git a/drivers/md/dm-cache-policy-era.c b/drivers/md/dm-cache-policy-era.c
new file mode 100644
index 0000000..427514c
--- /dev/null
+++ b/drivers/md/dm-cache-policy-era.c
@@ -0,0 +1,428 @@
+/*
+ * Copyright 2013 NetApp, Inc. All Rights Reserved, contribution by
+ * Morgan Mears.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or (at your
+ * option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details
+ *
+ */
+
+#include "dm-cache-policy.h"
+#include "dm-cache-policy-internal.h"
+#include "dm-cache-shim-utils.h"
+#include "dm.h"
+
+#include <linux/hash.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
+
+#include <linux/delay.h>
+
+#define DEBUG_ERA 0
+
+#define DM_MSG_PREFIX "cache-policy-era"
+
+typedef uint32_t era_t;
+#define ERA_MAX_ERA UINT_MAX
+
+struct era_policy {
+	struct dm_cache_policy policy;
+
+	struct mutex lock;	/* FIXME: spinlock? */
+
+	dm_cblock_t cache_size;
+
+	era_t *cb_to_era;
+
+	era_t era_counter;
+};
+
+/*----------------------------------------------------------------*/
+
+static struct era_policy *to_era_policy(struct dm_cache_policy *p)
+{
+	return container_of(p, struct era_policy, policy);
+}
+
+static int incr_era_counter(struct era_policy *era, const char *curr_era_str)
+{
+	era_t curr_era_counter;
+	int r;
+
+	/*
+	 * If the era counter value provided by the user matches the current
+	 * counter value while under lock, increment the counter (intention
+	 * is to prevent races).  Rollover problems are avoided by locking
+	 * the counter at a maximum value.  The application must take
+	 * appropriate action on this error to preserve correctness, but
+	 * a properly behaved set of applications will never trigger it;
+	 * the era counter is meant to increment less than once a second
+	 * and is 32 bits.
+	 */
+
+	if (kstrtou32(curr_era_str, 10, &curr_era_counter))
+		return -EINVAL;
+
+	smp_rmb();
+	if (era->era_counter != curr_era_counter)
+		r = -ECANCELED;
+	else if (era->era_counter >= ERA_MAX_ERA)
+		r = -EOVERFLOW;
+	else {
+		era->era_counter++;
+		smp_wmb();
+		r = 0;
+	}
+
+	return r;
+}
+
+static void *era_cblock_to_hint(struct shim_walk_map_ctx *ctx,
+				dm_cblock_t cblock, dm_oblock_t oblock)
+{
+	struct era_policy *era = to_era_policy(ctx->my_policy);
+	era_t era_val;
+	era_val = era->cb_to_era[from_cblock(cblock)];
+#if DEBUG_ERA
+	DMDEBUG("storing era %u for cblock %u.", era_val, cblock);
+#endif
+	ctx->le32_buf = cpu_to_le32(era_val);
+	return &ctx->le32_buf;
+}
+
+static int era_is_gt_value(era_t era, era_t value)
+{
+	return era > value;
+}
+
+static int era_is_gte_value(era_t era, era_t value)
+{
+	return era >= value;
+}
+
+static int era_is_lte_value(era_t era, era_t value)
+{
+	return era <= value;
+}
+
+static int era_is_lt_value(era_t era, era_t value)
+{
+	return era < value;
+}
+
+typedef int (*era_match_fn_t)(era_t, era_t);
+
+struct inval_oblocks_ctx {
+	struct era_policy *era;
+	era_match_fn_t era_match_fn;
+	era_t test_era;
+};
+
+static int era_inval_oblocks(void *context, dm_cblock_t cblock,
+			     dm_oblock_t oblock, void *unused)
+{
+	struct inval_oblocks_ctx *ctx = (struct inval_oblocks_ctx *)context;
+	struct dm_cache_policy *child;
+	era_t act_era;
+
+	act_era = ctx->era->cb_to_era[from_cblock(cblock)];
+	if (ctx->era_match_fn(act_era, ctx->test_era)) {
+#if DEBUG_ERA
+		DMDEBUG("cblock %u has era %u matching test_era %u; "
+			"marking mapping to be removed for oblock %llu.",
+			from_cblock(cblock), act_era, ctx->test_era, from_oblock(oblock));
+#endif
+		child = ctx->era->policy.child;
+
+		/*
+		 * This deadlocks (lock against self) because child is calling
+		 * us via the walk_mappings context callback, child's
+		 * walk_mappings holds child's lock, and child's remove_mappings
+		 * tries to get it again.  Not fixing because I believe the
+		 * invalidate API is going to change.
+		 */
+		/* child->remove_mapping(child, oblock); */
+	}
+
+	return 0;
+}
+
+static int cond_unmap_by_era(struct era_policy *era, const char *test_era_str,
+			     era_match_fn_t era_match_fn)
+{
+	struct shim_walk_map_ctx ctx;
+	struct inval_oblocks_ctx io_ctx;
+	era_t test_era;
+	int r;
+
+	/*
+	 * Unmap blocks with eras matching the given era, according to the
+	 * given matching function.
+	 */
+
+	if (kstrtou32(test_era_str, 10, &test_era))
+		return -EINVAL;
+
+	io_ctx.era = era;
+	io_ctx.era_match_fn = era_match_fn;
+	io_ctx.test_era = test_era;
+
+	ctx.parent_ctx = &io_ctx;
+	ctx.parent_fn = era_inval_oblocks;
+	ctx.my_policy = &era->policy;
+	ctx.child_hint_buf = NULL;
+	ctx.cblock_to_hint_fn = NULL;
+
+	mutex_lock(&era->lock);
+	r = dm_cache_shim_utils_walk_map_with_ctx(&ctx);
+	mutex_unlock(&era->lock);
+
+	return r;
+}
+
+/*
+ * Public interface, via the policy struct.  See dm-cache-policy.h for a
+ * description of these.
+ */
+
+static void era_destroy(struct dm_cache_policy *p)
+{
+	struct era_policy *era = to_era_policy(p);
+#if DEBUG_ERA
+	DMDEBUG("destroyed era %p", era);
+#endif
+	kfree(era->cb_to_era);
+	kfree(era);
+}
+
+static int era_map(struct dm_cache_policy *p, dm_oblock_t oblock,
+		      bool can_block, bool can_migrate, bool discarded_oblock,
+		      struct bio *bio, struct policy_result *result)
+{
+	struct era_policy *era = to_era_policy(p);
+	uint32_t cb_idx;
+	int r;
+
+	result->op = POLICY_MISS;
+
+	if (can_block)
+		mutex_lock(&era->lock);
+	else if (!mutex_trylock(&era->lock))
+		return -EWOULDBLOCK;
+
+	/* Check for a mapping */
+	r = policy_map(p->child, oblock, can_block, can_migrate,
+		       discarded_oblock, bio, result);
+
+	/* If we got a hit and this is a write, update the era for the block */
+	if (!r && (bio_data_dir(bio) == WRITE) && (result->op == POLICY_HIT)) {
+		cb_idx = from_cblock(result->cblock);
+		BUG_ON(cb_idx >= from_cblock(era->cache_size));
+		smp_rmb();
+		era->cb_to_era[cb_idx] = era->era_counter;
+#if DEBUG_ERA
+		DMDEBUG("assigned era %u to cblock %u, oblock %llu due to write hit.",
+			era->era_counter, from_cblock(result->cblock), from_oblock(oblock));
+#endif
+	}
+
+	mutex_unlock(&era->lock);
+
+	return r;
+}
+
+static int era_load_mapping(struct dm_cache_policy *p,
+			    dm_oblock_t oblock, dm_cblock_t cblock,
+			    void *hint, bool hint_valid)
+{
+	struct era_policy *era = to_era_policy(p);
+	struct dm_cache_policy *child;
+	__le32 *le32_hint;
+	era_t recovered_era;
+	int r;
+
+	child = era->policy.child;
+
+	le32_hint = (__le32 *)hint;	/* first 4 bytes hold our era hint */
+	hint = &le32_hint[1];		/* the child policy's hint data follows */
+
+	r = policy_load_mapping(child, oblock, cblock, hint, hint_valid);
+
+	if (!r && hint_valid &&
+	    (from_cblock(cblock) < from_cblock(era->cache_size))) {
+		recovered_era = le32_to_cpu(*le32_hint);
+#if DEBUG_ERA
+		DMDEBUG("recovered era %u for cblock %u.", recovered_era, cblock);
+#endif
+		era->cb_to_era[from_cblock(cblock)] = recovered_era;
+
+		/*
+		 * Make sure the era counter starts higher than the highest
+		 * persisted era.
+		 */
+		smp_rmb();
+		if (recovered_era >= era->era_counter) {
+			era->era_counter = recovered_era;
+			if (era->era_counter < ERA_MAX_ERA)
+				era->era_counter++;
+			smp_wmb();
+#if DEBUG_ERA
+			DMDEBUG("set era_counter to %u.", era->era_counter);
+#endif
+		}
+	}
+
+	return r;
+}
+
+static int era_walk_mappings(struct dm_cache_policy *p, policy_walk_fn fn,
+			     void *context)
+{
+	return dm_cache_shim_utils_walk_map(p, fn, context, era_cblock_to_hint);
+}
+
+static void era_force_mapping(struct dm_cache_policy *p, dm_oblock_t old_oblock,
+			      dm_oblock_t new_oblock)
+{
+	struct era_policy *era = to_era_policy(p);
+	dm_cblock_t cblock;
+
+	mutex_lock(&era->lock);
+
+	if (!policy_lookup(p->child, old_oblock, &cblock)) {
+		smp_rmb();
+		era->cb_to_era[from_cblock(cblock)] = era->era_counter;
+#if DEBUG_ERA
+		DMDEBUG("assigned era %u to cblock %u, oblock %llu "
+			"(old_oblock %llu) due to force_mapping.",
+			era->era_counter, from_cblock(cblock), from_oblock(new_oblock), from_oblock(old_oblock));
+#endif
+	}
+
+	policy_force_mapping(p->child, old_oblock, new_oblock);
+
+	mutex_unlock(&era->lock);
+}
+
+static int era_set_config_value(struct dm_cache_policy *p, const char *key,
+				const char *value)
+{
+	struct era_policy *era = to_era_policy(p);
+	int r;
+
+	if (!strcasecmp(key, "increment_era_counter"))
+		r = incr_era_counter(era, value);
+	else if (!strcasecmp(key, "unmap_blocks_from_later_eras"))
+		r = cond_unmap_by_era(era, value, era_is_gt_value);
+	else if (!strcasecmp(key, "unmap_blocks_from_this_era_and_later"))
+		r = cond_unmap_by_era(era, value, era_is_gte_value);
+	else if (!strcasecmp(key, "unmap_blocks_from_this_era_and_earlier"))
+		r = cond_unmap_by_era(era, value, era_is_lte_value);
+	else if (!strcasecmp(key, "unmap_blocks_from_earlier_eras"))
+		r = cond_unmap_by_era(era, value, era_is_lt_value);
+	else
+		r = policy_set_config_value(p->child, key, value);
+
+	return r;
+}
+
+static int era_emit_config_values(struct dm_cache_policy *p, char *result,
+				  unsigned maxlen)
+{
+	struct era_policy *era = to_era_policy(p);
+	ssize_t sz = 0;
+
+	smp_rmb();
+	DMEMIT("era_counter %u ", era->era_counter);
+	return policy_emit_config_values(p->child, result + sz, maxlen - sz);
+}
+
+/* Init the policy plugin interface function pointers. */
+static void init_policy_functions(struct era_policy *era)
+{
+	dm_cache_shim_utils_init_shim_policy(&era->policy);
+	era->policy.destroy = era_destroy;
+	era->policy.map = era_map;
+	era->policy.load_mapping = era_load_mapping;
+	era->policy.walk_mappings = era_walk_mappings;
+	era->policy.force_mapping = era_force_mapping;
+	era->policy.emit_config_values = era_emit_config_values;
+	era->policy.set_config_value = era_set_config_value;
+}
+
+static struct dm_cache_policy *era_create(dm_cblock_t cache_size,
+					  sector_t origin_size,
+					  sector_t cache_block_size)
+{
+	struct era_policy *era = kzalloc(sizeof(*era), GFP_KERNEL);
+
+	if (!era)
+		return NULL;
+
+	init_policy_functions(era);
+	era->cache_size = cache_size;
+	mutex_init(&era->lock);
+
+	era->cb_to_era = kcalloc(from_cblock(era->cache_size),
+				 sizeof(*(era->cb_to_era)), GFP_KERNEL);
+	if (!era->cb_to_era)
+		goto bad_alloc_cb_to_era;
+	era->era_counter = 1;
+
+	return &era->policy;
+
+bad_alloc_cb_to_era:
+	kfree(era);
+	return NULL;
+}
+
+/*----------------------------------------------------------------*/
+
+static struct dm_cache_policy_type era_policy_type = {
+	.name = "era",
+	.version = {1, 0, 0},
+	.hint_size = 4,
+	.owner = THIS_MODULE,
+	.create = era_create,
+	.features = DM_CACHE_POLICY_SHIM
+};
+
+static int __init era_init(void)
+{
+	int r;
+
+	r = dm_cache_policy_register(&era_policy_type);
+	if (!r) {
+		DMINFO("version %u.%u.%u loaded",
+		       era_policy_type.version[0],
+		       era_policy_type.version[1],
+		       era_policy_type.version[2]);
+		return 0;
+	}
+
+	DMERR("register failed %d", r);
+
+	/* nothing was registered, so nothing to unregister */
+	return r;
+}
+
+static void __exit era_exit(void)
+{
+	dm_cache_policy_unregister(&era_policy_type);
+}
+
+module_init(era_init);
+module_exit(era_exit);
+
+MODULE_AUTHOR("Morgan Mears <dm-devel@redhat.com>");
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("era cache policy shim");
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 21/24] dm cache: add trc policy shim
  2013-10-24 18:30 [PATCH 00/24] dm cache: proposed changes for v3.13 merge Mike Snitzer
                   ` (19 preceding siblings ...)
  2013-10-24 18:30 ` [PATCH 20/24] dm cache: add era policy shim Mike Snitzer
@ 2013-10-24 18:30 ` Mike Snitzer
  2013-10-25 20:13   ` Alasdair G Kergon
  2013-10-24 18:30 ` [PATCH 22/24] dm cache: add hints policy Mike Snitzer
                   ` (2 subsequent siblings)
  23 siblings, 1 reply; 49+ messages in thread
From: Mike Snitzer @ 2013-10-24 18:30 UTC (permalink / raw)
  To: dm-devel; +Cc: Morgan Mears, Heinz Mauelshagen, Joe Thornber, Mike Snitzer

From: Morgan Mears <morgan.mears@netapp.com>

This commit includes a non-terminal policy (aka "shim") called trc that
may be stacked on top of a terminal policy (e.g. mq).

The trc shim adds function-call-level tracing to the policy stack.  By
default, an INFO-level log message, including the function name and its
parameters, is generated on entry to most of the cache policy interface
functions.  An interface to increase or decrease the verbosity of the
trace output is also provided, as the example below shows.
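
For example, assuming the usual dm-cache message plumbing (the cache
target forwards unrecognized <key> <value> message pairs to the policy
stack's set_config_value), the verbosity could be raised at runtime with
something like:

  dmsetup message <cache-dev> 0 set_trace_level 2

where 2 corresponds to DM_TRC_LEV_VERBOSE in the code below.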

Signed-off-by: Morgan Mears <morgan.mears@netapp.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/Kconfig               |   8 ++
 drivers/md/Makefile              |   2 +
 drivers/md/dm-cache-policy-trc.c | 255 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 265 insertions(+)
 create mode 100644 drivers/md/dm-cache-policy-trc.c

diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index ad32101..d63539c 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -449,4 +449,12 @@ config DM_DELAY
 
 	If unsure, say N.
 
+config DM_CACHE_TRC
+       tristate "Tracing Cache Policy shim (EXPERIMENTAL)"
+       depends on DM_CACHE
+       depends on DM_TEST_TARGETS
+       ---help---
+         A cache policy shim that adds debug tracing for all cache policy
+	 interface function calls, along with a configurable trace level.
+
 endif # MD
diff --git a/drivers/md/Makefile b/drivers/md/Makefile
index 0ae00bd..d68fdf7 100644
--- a/drivers/md/Makefile
+++ b/drivers/md/Makefile
@@ -16,6 +16,7 @@ dm-cache-y	+= dm-cache-target.o dm-cache-metadata.o dm-cache-policy.o \
 dm-cache-mq-y   += dm-cache-policy-mq.o
 dm-cache-cleaner-y += dm-cache-policy-cleaner.o
 dm-cache-era-y	+= dm-cache-policy-era.o
+dm-cache-trc-y	+= dm-cache-policy-trc.o
 md-mod-y	+= md.o bitmap.o
 raid456-y	+= raid5.o
 
@@ -55,6 +56,7 @@ obj-$(CONFIG_DM_CACHE)		+= dm-cache.o
 obj-$(CONFIG_DM_CACHE_MQ)	+= dm-cache-mq.o
 obj-$(CONFIG_DM_CACHE_CLEANER)	+= dm-cache-cleaner.o
 obj-$(CONFIG_DM_CACHE_ERA)	+= dm-cache-era.o
+obj-$(CONFIG_DM_CACHE_TRC)	+= dm-cache-trc.o
 
 ifeq ($(CONFIG_DM_UEVENT),y)
 dm-mod-objs			+= dm-uevent.o
diff --git a/drivers/md/dm-cache-policy-trc.c b/drivers/md/dm-cache-policy-trc.c
new file mode 100644
index 0000000..8b16061
--- /dev/null
+++ b/drivers/md/dm-cache-policy-trc.c
@@ -0,0 +1,255 @@
+/*
+ * Copyright 2013 NetApp, Inc. All Rights Reserved, contribution by
+ * Morgan Mears.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or (at your
+ * option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details
+ *
+ */
+
+#include "dm-cache-policy.h"
+#include "dm-cache-policy-internal.h"
+#include "dm-cache-shim-utils.h"
+#include "dm.h"
+
+#include <linux/module.h>
+
+#define DM_MSG_PREFIX "cache-policy-trc"
+#define DM_TRC_OUT(lev, p, f, arg...) \
+	do { \
+		if (to_trc_policy(p)->trace_level >= lev) \
+			DMINFO("%s: " f, __func__, ## arg); \
+	} while (0)
+
+enum dm_trace_lev_e {
+	DM_TRC_LEV_OFF		= 0,
+	DM_TRC_LEV_NORMAL	= 1,
+	DM_TRC_LEV_VERBOSE	= 2
+};
+
+struct trc_policy {
+	struct dm_cache_policy policy;
+	int trace_level;
+};
+
+/*----------------------------------------------------------------*/
+
+static struct trc_policy *to_trc_policy(struct dm_cache_policy *p)
+{
+	return container_of(p, struct trc_policy, policy);
+}
+
+static int set_trace_level(struct dm_cache_policy *p, const char *str)
+{
+	uint32_t val;
+
+	if (kstrtou32(str, 10, &val))
+		return -EINVAL;
+	to_trc_policy(p)->trace_level = val;
+	return 0;
+}
+
+/*
+ * Public interface, via the policy struct.  See dm-cache-policy.h for a
+ * description of these.
+ */
+
+static void trc_destroy(struct dm_cache_policy *p)
+{
+	DM_TRC_OUT(DM_TRC_LEV_NORMAL, p, "%p", p);
+	kfree(p);
+}
+
+static int trc_map(struct dm_cache_policy *p, dm_oblock_t oblock,
+		   bool can_block, bool can_migrate, bool discarded_oblock,
+		   struct bio *bio, struct policy_result *result)
+{
+	DM_TRC_OUT(DM_TRC_LEV_NORMAL, p, "%p %llu %u %u %u %p %p", p,
+		   from_oblock(oblock), can_block, can_migrate, discarded_oblock,
+		   bio, result);
+	return policy_map(p->child, oblock, can_block, can_migrate,
+			  discarded_oblock, bio, result);
+}
+
+static int trc_lookup(struct dm_cache_policy *p, dm_oblock_t oblock,
+		      dm_cblock_t *cblock)
+{
+	DM_TRC_OUT(DM_TRC_LEV_NORMAL, p, "%p %llu %p", p, oblock, cblock);
+	return policy_lookup(p->child, oblock, cblock);
+}
+
+static void trc_set_dirty(struct dm_cache_policy *p, dm_oblock_t oblock)
+{
+	DM_TRC_OUT(DM_TRC_LEV_NORMAL, p, "%p %llu", p, oblock);
+	policy_set_dirty(p->child, oblock);
+}
+
+static void trc_clear_dirty(struct dm_cache_policy *p, dm_oblock_t oblock)
+{
+	DM_TRC_OUT(DM_TRC_LEV_NORMAL, p, "%p %llu", p, oblock);
+	policy_clear_dirty(p->child, oblock);
+}
+
+static int trc_load_mapping(struct dm_cache_policy *p,
+			    dm_oblock_t oblock, dm_cblock_t cblock,
+			    void *hint, bool hint_valid)
+{
+	DM_TRC_OUT(DM_TRC_LEV_NORMAL, p, "%p %llu %u %p %u", p, oblock,
+		   cblock, hint, hint_valid);
+	return policy_load_mapping(p->child, oblock, cblock, hint, hint_valid);
+}
+
+static int trc_walk_mappings(struct dm_cache_policy *p, policy_walk_fn fn,
+			     void *context)
+{
+	DM_TRC_OUT(DM_TRC_LEV_NORMAL, p, "%p %p %p", p, fn, context);
+	return dm_cache_shim_utils_walk_map(p, fn, context, NULL);
+}
+
+static void trc_remove_mapping(struct dm_cache_policy *p, dm_oblock_t oblock)
+{
+	DM_TRC_OUT(DM_TRC_LEV_NORMAL, p, "%p %llu", p, oblock);
+	policy_remove_mapping(p->child, oblock);
+}
+
+static int trc_writeback_work(struct dm_cache_policy *p, dm_oblock_t *oblock,
+			      dm_cblock_t *cblock)
+{
+	DM_TRC_OUT(DM_TRC_LEV_VERBOSE, p, "%p %p %p", p, oblock, cblock);
+	return policy_writeback_work(p->child, oblock, cblock);
+}
+
+static void trc_force_mapping(struct dm_cache_policy *p,
+			       dm_oblock_t old_oblock,
+			       dm_oblock_t new_oblock)
+{
+	DM_TRC_OUT(DM_TRC_LEV_NORMAL, p, "%p %llu %llu",
+		   p, from_oblock(old_oblock), from_oblock(new_oblock));
+	policy_force_mapping(p->child, old_oblock, new_oblock);
+}
+
+static dm_cblock_t trc_residency(struct dm_cache_policy *p)
+{
+	DM_TRC_OUT(DM_TRC_LEV_NORMAL, p, "%p", p);
+	return policy_residency(p->child);
+}
+
+static void trc_tick(struct dm_cache_policy *p)
+{
+	DM_TRC_OUT(DM_TRC_LEV_VERBOSE, p, "%p", p);
+	policy_tick(p->child);
+}
+
+static int trc_set_config_value(struct dm_cache_policy *p,
+				const char *key,
+				const char *value)
+{
+	int r;
+
+	DM_TRC_OUT(DM_TRC_LEV_NORMAL, p, "%p %s %s", p, key, value);
+
+	if (!strcasecmp(key, "set_trace_level"))
+		r = set_trace_level(p, value);
+	else
+		r = policy_set_config_value(p->child, key, value);
+
+	return r;
+}
+
+static int trc_emit_config_values(struct dm_cache_policy *p, char *result,
+				  unsigned maxlen)
+{
+	struct trc_policy *trc = to_trc_policy(p);
+	ssize_t sz = 0;
+
+	DM_TRC_OUT(DM_TRC_LEV_NORMAL, p, "%p %p %u", p, result, maxlen);
+
+	DMEMIT("trace_level %u ", trc->trace_level);
+	return policy_emit_config_values(p->child, result + sz, maxlen - sz);
+}
+
+/* Init the policy plugin interface function pointers. */
+static void init_policy_functions(struct trc_policy *trc)
+{
+	dm_cache_shim_utils_init_shim_policy(&trc->policy);
+	trc->policy.destroy = trc_destroy;
+	trc->policy.map = trc_map;
+	trc->policy.lookup = trc_lookup;
+	trc->policy.set_dirty = trc_set_dirty;
+	trc->policy.clear_dirty = trc_clear_dirty;
+	trc->policy.load_mapping = trc_load_mapping;
+	trc->policy.walk_mappings = trc_walk_mappings;
+	trc->policy.remove_mapping = trc_remove_mapping;
+	trc->policy.writeback_work = trc_writeback_work;
+	trc->policy.force_mapping = trc_force_mapping;
+	trc->policy.residency = trc_residency;
+	trc->policy.tick = trc_tick;
+	trc->policy.emit_config_values = trc_emit_config_values;
+	trc->policy.set_config_value = trc_set_config_value;
+}
+
+static struct dm_cache_policy *trc_create(dm_cblock_t cache_size,
+					  sector_t origin_size,
+					  sector_t cache_block_size)
+{
+	struct trc_policy *trc = kzalloc(sizeof(*trc), GFP_KERNEL);
+
+	if (!trc)
+		return NULL;
+
+	init_policy_functions(trc);
+	trc->trace_level = DM_TRC_LEV_NORMAL;
+
+	return &trc->policy;
+}
+
+/*----------------------------------------------------------------*/
+
+static struct dm_cache_policy_type trc_policy_type = {
+	.name = "trc",
+	.version = {1, 0, 0},
+	.hint_size = 0,
+	.owner = THIS_MODULE,
+	.create = trc_create,
+	.features = DM_CACHE_POLICY_SHIM
+};
+
+static int __init trc_init(void)
+{
+	int r;
+
+	r = dm_cache_policy_register(&trc_policy_type);
+	if (!r) {
+		DMINFO("version %u.%u.%u loaded",
+		       trc_policy_type.version[0],
+		       trc_policy_type.version[1],
+		       trc_policy_type.version[2]);
+		return 0;
+	}
+
+	DMERR("register failed %d", r);
+
+	/* nothing was registered, so nothing to unregister */
+	return r;
+}
+
+static void __exit trc_exit(void)
+{
+	dm_cache_policy_unregister(&trc_policy_type);
+}
+
+module_init(trc_init);
+module_exit(trc_exit);
+
+MODULE_AUTHOR("Morgan Mears <dm-devel@redhat.com>");
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("trc cache policy shim");
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 22/24] dm cache: add hints policy
  2013-10-24 18:30 [PATCH 00/24] dm cache: proposed changes for v3.13 merge Mike Snitzer
                   ` (20 preceding siblings ...)
  2013-10-24 18:30 ` [PATCH 21/24] dm cache: add trc " Mike Snitzer
@ 2013-10-24 18:30 ` Mike Snitzer
  2013-10-24 18:30 ` [PATCH 23/24] dm cache: add cache block invalidation API Mike Snitzer
  2013-10-24 18:30 ` [PATCH 24/24] dm cache policy era: add cache block invalidation support Mike Snitzer
  23 siblings, 0 replies; 49+ messages in thread
From: Mike Snitzer @ 2013-10-24 18:30 UTC (permalink / raw)
  To: dm-devel; +Cc: Morgan Mears, Heinz Mauelshagen, Joe Thornber, Mike Snitzer

From: Heinz Mauelshagen <heinzm@redhat.com>

A dumb cache policy used only to test variable hint size support.
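
The policy accepts a single config key, "hint_size", and only until the
size has been fixed (after the first hint transfer or the first
successful set).  A hypothetical test run might therefore set it once,
right after the cache table is loaded, with something like:

  dmsetup message <cache-dev> 0 hint_size 60

(60 is just an arbitrary example value; later attempts return -EPERM.)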

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/Kconfig                 |   7 +
 drivers/md/Makefile                |   2 +
 drivers/md/dm-cache-policy-hints.c | 769 +++++++++++++++++++++++++++++++++++++
 3 files changed, 778 insertions(+)
 create mode 100644 drivers/md/dm-cache-policy-hints.c

diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index d63539c..6019570 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -449,6 +449,13 @@ config DM_DELAY
 
 	If unsure, say N.
 
+config DM_CACHE_HINTS
+       tristate "Hint Size Test Cache Policy (EXPERIMENTAL)"
+       depends on DM_CACHE
+       depends on DM_TEST_TARGETS
+       ---help---
+         A dumb cache policy just to test variable hint size
+
 config DM_CACHE_TRC
        tristate "Tracing Cache Policy shim (EXPERIMENTAL)"
        depends on DM_CACHE
diff --git a/drivers/md/Makefile b/drivers/md/Makefile
index d68fdf7..8c71863 100644
--- a/drivers/md/Makefile
+++ b/drivers/md/Makefile
@@ -17,6 +17,7 @@ dm-cache-mq-y   += dm-cache-policy-mq.o
 dm-cache-cleaner-y += dm-cache-policy-cleaner.o
 dm-cache-era-y	+= dm-cache-policy-era.o
 dm-cache-trc-y	+= dm-cache-policy-trc.o
+dm-cache-hints-y += dm-cache-policy-hints.o
 md-mod-y	+= md.o bitmap.o
 raid456-y	+= raid5.o
 
@@ -57,6 +58,7 @@ obj-$(CONFIG_DM_CACHE_MQ)	+= dm-cache-mq.o
 obj-$(CONFIG_DM_CACHE_CLEANER)	+= dm-cache-cleaner.o
 obj-$(CONFIG_DM_CACHE_ERA)	+= dm-cache-era.o
 obj-$(CONFIG_DM_CACHE_TRC)	+= dm-cache-trc.o
+obj-$(CONFIG_DM_CACHE_HINTS)	+= dm-cache-hints.o
 
 ifeq ($(CONFIG_DM_UEVENT),y)
 dm-mod-objs			+= dm-uevent.o
diff --git a/drivers/md/dm-cache-policy-hints.c b/drivers/md/dm-cache-policy-hints.c
new file mode 100644
index 0000000..312cfb7
--- /dev/null
+++ b/drivers/md/dm-cache-policy-hints.c
@@ -0,0 +1,769 @@
+/*
+ * Copyright (C) 2013 Red Hat. All rights reserved.
+ *
+ * This file is released under the GPL.
+ *
+ * TESTING! NOT FOR PRODUCTION USE!
+ *
+ * "hints" policy to test variable hint size.
+ */
+
+#include "dm.h"
+#include "dm-cache-policy.h"
+#include "dm-cache-policy-internal.h"
+
+#include <linux/hash.h>
+#include <linux/list.h>
+#include <linux/module.h>
+
+#define DM_MSG_PREFIX "cache-policy-hints"
+
+/*----------------------------------------------------------------*/
+
+static struct kmem_cache *hints_entry_cache;
+
+/*----------------------------------------------------------------*/
+
+static unsigned next_power(unsigned n, unsigned min)
+{
+	return roundup_pow_of_two(max(n, min));
+}
+
+struct hash {
+	struct hlist_head *table;
+	dm_block_t hash_bits;
+	unsigned nr_buckets;
+};
+
+struct entry {
+	struct hlist_node hlist;
+	struct list_head list;
+	dm_oblock_t oblock;
+	dm_cblock_t cblock;
+};
+
+#define	DEFAULT_HINT_SIZE DM_CACHE_POLICY_MAX_HINT_SIZE
+struct policy {
+	struct dm_cache_policy policy;
+	struct mutex lock;
+
+	sector_t origin_size, block_size;
+
+	/* To optimize search in the allocation bitset */
+	unsigned find_free_nr_words, find_free_last_word;
+	unsigned long *allocation_bitset;
+
+	dm_cblock_t nr_cblocks_allocated;
+	dm_cblock_t cache_size;
+
+	struct {
+		struct list_head free; /* Free cache entry list */
+		struct list_head used; /* Used cache entry list */
+	} queues;
+
+	/* The cache hash */
+	struct hash chash;
+
+	void *hints_buffer;
+	unsigned hint_counter[4];
+
+	/* Flag to block (re)setting hint_size via the message interface */
+	bool hint_size_set;
+};
+
+/*----------------------------------------------------------------------------*/
+/* Low-level queue function. */
+static struct entry *queue_pop(struct list_head *q)
+{
+	if (!list_empty(q)) {
+		struct list_head *elt = q->next;
+
+		list_del(elt);
+		return list_entry(elt, struct entry, list);
+	}
+
+	return NULL;
+}
+/*----------------------------------------------------------------------------*/
+
+/* Allocate/free various resources. */
+static int alloc_hash(struct hash *hash, unsigned elts)
+{
+	hash->nr_buckets = next_power(elts >> 4, 16);
+	hash->hash_bits = ffs(hash->nr_buckets) - 1;
+	hash->table = vzalloc(sizeof(*hash->table) * hash->nr_buckets);
+
+	return hash->table ? 0 : -ENOMEM;
+}
+
+static void free_hash(struct hash *hash)
+{
+	vfree(hash->table);
+}
+
+/* Free/alloc basic cache entry structures. */
+static void __free_cache_entries(struct list_head *q)
+{
+	struct entry *e;
+
+	while ((e = queue_pop(q)))
+		kmem_cache_free(hints_entry_cache, e);
+}
+
+static void free_cache_entries(struct policy *p)
+{
+	__free_cache_entries(&p->queues.free);
+	__free_cache_entries(&p->queues.used);
+}
+
+static int alloc_cache_blocks_with_hash(struct policy *p, unsigned cache_size)
+{
+	int r = -ENOMEM;
+	unsigned u = cache_size;
+
+	p->nr_cblocks_allocated = to_cblock(0);
+
+	while (u--) {
+		struct entry *e = kmem_cache_zalloc(hints_entry_cache, GFP_KERNEL);
+
+		if (!e)
+			goto bad_cache_alloc;
+
+		list_add(&e->list, &p->queues.free);
+	}
+
+	/* Cache entries hash. */
+	r = alloc_hash(&p->chash, cache_size);
+	if (r)
+		goto bad_cache_alloc;
+
+	return 0;
+
+bad_cache_alloc:
+	free_cache_entries(p);
+
+	return r;
+}
+
+static void free_cache_blocks_and_hash(struct policy *p)
+{
+	free_hash(&p->chash);
+	free_cache_entries(p);
+}
+
+static void alloc_cblock(struct policy *p, dm_cblock_t cblock)
+{
+	BUG_ON(from_cblock(cblock) >= from_cblock(p->cache_size));
+	BUG_ON(test_bit(from_cblock(cblock), p->allocation_bitset));
+	set_bit(from_cblock(cblock), p->allocation_bitset);
+}
+
+static void free_cblock(struct policy *p, dm_cblock_t cblock)
+{
+	BUG_ON(from_cblock(cblock) >= from_cblock(p->cache_size));
+	BUG_ON(!test_bit(from_cblock(cblock), p->allocation_bitset));
+	clear_bit(from_cblock(cblock), p->allocation_bitset);
+}
+
+/*----------------------------------------------------------------------------*/
+/* Low-level functions. */
+static struct policy *to_policy(struct dm_cache_policy *p)
+{
+	return container_of(p, struct policy, policy);
+}
+
+/*----------------------------------------------------------------*/
+
+static unsigned bit_set_nr_words(unsigned long nr_cblocks)
+{
+	return dm_div_up(nr_cblocks, BITS_PER_LONG);
+}
+
+static unsigned long *alloc_bitset(unsigned nr_cblocks)
+{
+	return vzalloc(sizeof(unsigned long) * bit_set_nr_words(nr_cblocks));
+}
+
+static void free_bitset(unsigned long *bits)
+{
+	vfree(bits);
+}
+/*----------------------------------------------------------------------------*/
+
+/* Hash functions (lookup, insert, remove). */
+static struct entry *lookup_cache_entry(struct policy *p, dm_oblock_t oblock)
+{
+	struct hash *hash = &p->chash;
+	unsigned h = hash_64(from_oblock(oblock), hash->hash_bits);
+	struct entry *cur;
+	struct hlist_head *bucket = &hash->table[h];
+
+	hlist_for_each_entry(cur, bucket, hlist) {
+		if (cur->oblock == oblock) {
/* Move to the front of the bucket for faster access. */
+			hlist_del(&cur->hlist);
+			hlist_add_head(&cur->hlist, bucket);
+			return cur;
+		}
+	}
+
+	return NULL;
+}
+
+static void insert_cache_hash_entry(struct policy *p, struct entry *e)
+{
+	unsigned h = hash_64(from_oblock(e->oblock), p->chash.hash_bits);
+
+	hlist_add_head(&e->hlist, &p->chash.table[h]);
+}
+
+static void remove_cache_hash_entry(struct policy *p, struct entry *e)
+{
+	hlist_del(&e->hlist);
+}
+
+
+/*----------------------------------------------------------------------------*/
+/*
+ * This doesn't allocate the block.
+ */
+static int __find_free_cblock(struct policy *p, unsigned begin, unsigned end,
+			      dm_cblock_t *result, unsigned *last_word)
+{
+	int r = -ENOSPC;
+	unsigned w;
+
+	for (w = begin; w < end; w++) {
+		/*
+		 * ffz is undefined if no zero exists
+		 */
+		if (p->allocation_bitset[w] != ULONG_MAX) {
+			*last_word = w;
+			*result = to_cblock((w * BITS_PER_LONG) + ffz(p->allocation_bitset[w]));
+			if (from_cblock(*result) < from_cblock(p->cache_size))
+				r = 0;
+
+			break;
+		}
+	}
+
+	return r;
+}
+
+static int find_free_cblock(struct policy *p, dm_cblock_t *result)
+{
+	int r = __find_free_cblock(p, p->find_free_last_word, p->find_free_nr_words, result, &p->find_free_last_word);
+
+	if (r == -ENOSPC && p->find_free_last_word)
+		r = __find_free_cblock(p, 0, p->find_free_last_word, result, &p->find_free_last_word);
+
+	return r;
+}
+
+static struct entry *alloc_cache_entry(struct policy *p)
+{
+	struct entry *e = queue_pop(&p->queues.free);
+
+	if (e) {
+		BUG_ON(from_cblock(p->nr_cblocks_allocated) >= from_cblock(p->cache_size));
+		p->nr_cblocks_allocated = to_cblock(from_cblock(p->nr_cblocks_allocated) + 1);
+	}
+
+	return e;
+}
+
+static void alloc_cblock_and_insert_cache(struct policy *p, struct entry *e)
+{
+	alloc_cblock(p, e->cblock);
+	insert_cache_hash_entry(p, e);
+}
+
+static void add_cache_entry(struct policy *p, struct entry *e)
+{
+	list_add_tail(&e->list, &p->queues.used);
+	alloc_cblock_and_insert_cache(p, e);
+}
+
+static void remove_cache_entry(struct policy *p, struct entry *e)
+{
+	remove_cache_hash_entry(p, e);
+	free_cblock(p, e->cblock);
+}
+
+static struct entry *evict_cache_entry(struct policy *p)
+{
+	struct entry *e = queue_pop(&p->queues.used);
+
+	BUG_ON(!e);
+	remove_cache_entry(p, e);
+
+	return e;
+}
+
+static void get_cache_block(struct policy *p, dm_oblock_t oblock, struct bio *bio,
+			    struct policy_result *result)
+{
+	struct entry *e = alloc_cache_entry(p);
+
+	if (e) {
+		int r = find_free_cblock(p, &e->cblock);
+
+		BUG_ON(r);
+		result->op = POLICY_NEW;
+
+	} else {
+		e = evict_cache_entry(p);
+		result->old_oblock = e->oblock;
+		result->op = POLICY_REPLACE;
+	}
+
+	result->cblock = e->cblock;
+	e->oblock = oblock;
+	add_cache_entry(p, e);
+}
+
+static bool in_cache(struct policy *p, dm_oblock_t oblock, dm_cblock_t *cblock)
+{
+	struct entry *e = lookup_cache_entry(p, oblock);
+
+	if (!e)
+		return false;
+
+	*cblock = e->cblock;
+	return true;
+}
+
+/*----------------------------------------------------------------------------*/
+
+/* Public interface (see dm-cache-policy.h) */
+static int hints_map(struct dm_cache_policy *pe, dm_oblock_t oblock,
+		     bool can_block, bool can_migrate, bool discarded_oblock,
+		     struct bio *bio, struct policy_result *result)
+{
+	int r = 0;
+	struct policy *p = to_policy(pe);
+
+	result->op = POLICY_MISS;
+
+	if (can_block)
+		mutex_lock(&p->lock);
+
+	else if (!mutex_trylock(&p->lock))
+		return -EWOULDBLOCK;
+
+
+	if (in_cache(p, oblock, &result->cblock))
+		result->op = POLICY_HIT;
+
+	else if (!can_migrate)
+		r = -EWOULDBLOCK;
+
+	else
+		get_cache_block(p, oblock, bio, result);
+
+	mutex_unlock(&p->lock);
+
+	return r;
+}
+
+static int hints_lookup(struct dm_cache_policy *pe, dm_oblock_t oblock, dm_cblock_t *cblock)
+{
+	int r;
+	struct policy *p = to_policy(pe);
+
+	if (!mutex_trylock(&p->lock))
+		return -EWOULDBLOCK;
+
+	r = in_cache(p, oblock, cblock) ? 0 : -ENOENT;
+
+	mutex_unlock(&p->lock);
+
+	return r;
+}
+
+static void hints_destroy(struct dm_cache_policy *pe)
+{
+	struct policy *p = to_policy(pe);
+
+	free_bitset(p->allocation_bitset);
+	free_cache_blocks_and_hash(p);
+	kfree(p->hints_buffer);
+	kfree(p);
+}
+
+/*----------------------------------------------------------------------------*/
+
+/* Hints endianness conversions */
+#define __le8 uint8_t
+struct hints_ptrs {
+	__le64 *le64_hints;
+	__le32 *le32_hints;
+	__le16 *le16_hints;
+	__le8  *le8_hints;
+
+	uint64_t *u64_hints;
+	uint32_t *u32_hints;
+	uint16_t *u16_hints;
+	uint8_t  *u8_hints;
+};
+
+typedef int (*hints_xfer_fn_t) (struct hints_ptrs*, unsigned, unsigned, bool);
+
+#define cpu_to_le8(x) (x)
+#define le8_to_cpu(x) (x)
+
+#define HINTS_XFER(width) \
+static int hints_ ## width ## _xfer(struct hints_ptrs *p, unsigned idx, unsigned val, bool to_disk) \
+{ \
+	if (to_disk) \
+		p->le ## width ## _hints[idx] = cpu_to_le ## width(val); \
+\
+	else { \
+		p->u ## width ## _hints[idx] = le ## width ## _to_cpu(p->le ## width ## _hints[idx]); \
+		if (p->u ## width ## _hints[idx] != val) { \
+			DMERR_LIMIT("%s -- hint value %llu != %u", __func__, \
+				    (long long unsigned) p->u ## width ## _hints[idx], val); \
+			return -EINVAL; \
+		} \
+	} \
+\
+	return 0; \
+}
+
+HINTS_XFER(64)
+HINTS_XFER(32)
+HINTS_XFER(16)
+HINTS_XFER(8)
+
+static void calc_hint_value_counters(struct policy *p)
+{
+	unsigned div, rest = dm_cache_policy_get_hint_size(&p->policy), u;
+
+	for (u = 3, div = sizeof(uint64_t); rest; u--, div >>= 1) {
+		p->hint_counter[u] = rest / div;
+		rest -= p->hint_counter[u] * div;
+	}
+}
+
+/* Set hint ptrs for the LHS width from the RHS ptrs (RHS entries are twice as wide). */
+#define PTR_INC(lhs, rhs, c) \
+	inc = 2 * p->hint_counter[c]; \
+	ptrs->le ## lhs ## _hints = (__le ## lhs  *) ptrs->le ## rhs ## _hints + inc; \
+	ptrs->u ## lhs ## _hints  = (uint ## lhs ## _t *) ptrs->u ## rhs ## _hints  + inc;
+
+static void set_hints_ptrs(struct policy *p, struct hints_ptrs *ptrs)
+{
+	unsigned inc;
+
+	ptrs->le64_hints = p->hints_buffer;
+	ptrs->u64_hints  = p->hints_buffer;
+
+	PTR_INC(32, 64, 3)
+	PTR_INC(16, 32, 2)
+	PTR_INC(8,  16, 1)
+}
+
+static void __hints_xfer_disk(struct policy *p, bool to_disk)
+{
+	unsigned idx, u, val;
+	hints_xfer_fn_t hints_xfer_fns[] = {
+		hints_8_xfer,
+		hints_16_xfer,
+		hints_32_xfer,
+		hints_64_xfer
+	};
+
+	struct hints_ptrs hints_ptrs;
+
+	if (!p->hint_size_set) {
+		calc_hint_value_counters(p);
+		p->hint_size_set = true;
+	}
+
+	/* Must happen after calc_hint_value_counters()! */
+	set_hints_ptrs(p, &hints_ptrs);
+
+	val = 1;
+	u = ARRAY_SIZE(hints_xfer_fns);
+	while (u--) {
+		for (idx = 0; idx < p->hint_counter[u]; idx++) {
+			/*
+			 * val is only suitable because of the 256 hint value limitation.
+			 *
+			 * A uint8_t maxes out at 255, so we could theoretically
+			 * test hint sizes up to 2023 bytes with this limitation.
+			 */
+			if (hints_xfer_fns[u](&hints_ptrs, idx, val, to_disk))
+				return;
+
+			val++;
+		}
+	}
+
+	return;
+}
+
+static void hints_preset_and_to_disk(struct policy *p)
+{
+	__hints_xfer_disk(p, true);
+}
+
+static void hints_from_disk_and_check(struct policy *p)
+{
+	__hints_xfer_disk(p, false);
+}
+
+static int hints_load_mapping(struct dm_cache_policy *pe,
+			      dm_oblock_t oblock, dm_cblock_t cblock,
+			      void *hint, bool hint_valid)
+{
+	struct policy *p = to_policy(pe);
+	struct entry *e;
+
+	e = alloc_cache_entry(p);
+	if (!e)
+		return -ENOMEM;
+
+	e->cblock = cblock;
+	e->oblock = oblock;
+
+	if (hint_valid) {
+		void *tmp = p->hints_buffer;
+
+		p->hints_buffer = hint;
+		hints_from_disk_and_check(p);
+		p->hints_buffer = tmp;
+	}
+
+	alloc_cblock_and_insert_cache(p, e);
+
+	return 0;
+}
+
+/* Walk mappings */
+static int hints_walk_mappings(struct dm_cache_policy *pe, policy_walk_fn fn, void *context)
+{
+	int r = 0;
+	struct policy *p = to_policy(pe);
+	struct entry *e;
+
+	hints_preset_and_to_disk(p);
+
+	mutex_lock(&p->lock);
+
+	list_for_each_entry(e, &p->queues.used, list) {
+		__dm_bless_for_disk(p->hints_buffer);
+		r = fn(context, e->cblock, e->oblock, (void *) p->hints_buffer);
+		if (r)
+			break;
+	}
+
+	mutex_unlock(&p->lock);
+
+	return r;
+}
+
+static struct entry *__hints_force_remove_mapping(struct policy *p, dm_oblock_t oblock)
+{
+	struct entry *e = lookup_cache_entry(p, oblock);
+
+	BUG_ON(!e);
+
+	list_del(&e->list);
+	remove_cache_entry(p, e);
+
+	return e;
+}
+
+static void hints_remove_mapping(struct dm_cache_policy *pe, dm_oblock_t oblock)
+{
+	struct policy *p = to_policy(pe);
+	struct entry *e;
+
+	mutex_lock(&p->lock);
+	e = __hints_force_remove_mapping(p, oblock);
+	list_add_tail(&e->list, &p->queues.free);
+
+	BUG_ON(!from_cblock(p->nr_cblocks_allocated));
+	p->nr_cblocks_allocated = to_cblock(from_cblock(p->nr_cblocks_allocated) - 1);
+	mutex_unlock(&p->lock);
+}
+
+static void hints_force_mapping(struct dm_cache_policy *pe,
+				dm_oblock_t current_oblock, dm_oblock_t oblock)
+{
+	struct policy *p = to_policy(pe);
+	struct entry *e;
+
+	mutex_lock(&p->lock);
+
+	e = __hints_force_remove_mapping(p, current_oblock);
+	e->oblock = oblock;
+	add_cache_entry(p, e);
+
+	mutex_unlock(&p->lock);
+}
+
+static dm_cblock_t hints_residency(struct dm_cache_policy *pe)
+{
+	/* FIXME: lock mutex, not sure we can block here. */
+	return to_policy(pe)->nr_cblocks_allocated;
+}
+
+static int hints_set_config_value(struct dm_cache_policy *pe,
+				  const char *key, const char *value)
+{
+	if (!strcasecmp(key, "hint_size")) {
+		struct policy *p = to_policy(pe);
+
+		if (p->hint_size_set)
+			return -EPERM;
+
+		else {
+			unsigned tmp;
+
+			if (kstrtou32(value, 10, &tmp))
+				return -EINVAL;
+
+			else {
+				int r = dm_cache_policy_set_hint_size(pe, tmp);
+
+				if (!r) {
+					calc_hint_value_counters(p);
+					p->hint_size_set = true;
+				}
+
+				return r;
+			}
+		}
+	}
+
+	return -EINVAL;
+}
+
+static int hints_emit_config_values(struct dm_cache_policy *pe, char *result, unsigned maxlen)
+{
+	ssize_t sz = 0;
+
+	DMEMIT("2 hint_size %llu", (long long unsigned) dm_cache_policy_get_hint_size(pe));
+	return 0;
+}
+
+/* Init the policy plugin interface function pointers. */
+static void init_policy_functions(struct policy *p)
+{
+	p->policy.destroy = hints_destroy;
+	p->policy.map = hints_map;
+	p->policy.lookup = hints_lookup;
+#if 0
+	p->policy.set_dirty = NULL;
+	p->policy.clear_dirty = NULL;
+#endif
+	p->policy.load_mapping = hints_load_mapping;
+	p->policy.walk_mappings = hints_walk_mappings;
+	p->policy.remove_mapping = hints_remove_mapping;
+	p->policy.writeback_work = NULL;
+	p->policy.force_mapping = hints_force_mapping;
+	p->policy.residency = hints_residency;
+	p->policy.tick = NULL;
+	p->policy.emit_config_values = hints_emit_config_values;
+	p->policy.set_config_value = hints_set_config_value;
+}
+
+static struct dm_cache_policy *hints_policy_create(dm_cblock_t cache_size,
+						   sector_t origin_size,
+						   sector_t block_size)
+{
+	int r;
+	struct policy *p;
+
+	BUILD_BUG_ON(DEFAULT_HINT_SIZE > 2023);
+
+	p = kzalloc(sizeof(*p), GFP_KERNEL);
+	if (!p)
+		return NULL;
+
+	init_policy_functions(p);
+
+	p->cache_size = cache_size;
+	p->find_free_nr_words = bit_set_nr_words(from_cblock(cache_size));
+	p->find_free_last_word = 0;
+	p->block_size = block_size;
+	p->origin_size = origin_size;
+	mutex_init(&p->lock);
+	INIT_LIST_HEAD(&p->queues.free);
+	INIT_LIST_HEAD(&p->queues.used);
+
+	/* Allocate cache entry structs and add them to free list. */
+	r = alloc_cache_blocks_with_hash(p, from_cblock(cache_size));
+	if (r)
+		goto bad_free_policy;
+
+	/* Cache allocation bitset. */
+	p->allocation_bitset = alloc_bitset(from_cblock(cache_size));
+	if (!p->allocation_bitset)
+		goto bad_free_cache_blocks_and_hash;
+
+	p->hints_buffer = kzalloc(DM_CACHE_POLICY_MAX_HINT_SIZE, GFP_KERNEL);
+	if (!p->hints_buffer)
+		goto bad_free_allocation_bitset;
+
+	p->hint_size_set = false;
+
+	return &p->policy;
+
+bad_free_allocation_bitset:
+	free_bitset(p->allocation_bitset);
+bad_free_cache_blocks_and_hash:
+	free_cache_blocks_and_hash(p);
+bad_free_policy:
+	kfree(p);
+
+	return NULL;
+}
+
+/*----------------------------------------------------------------------------*/
+static struct dm_cache_policy_type hints_policy_type = {
+	.name = "hints",
+	.version = {1, 0, 0},
+	.hint_size = DEFAULT_HINT_SIZE,
+	.owner = THIS_MODULE,
+	.create = hints_policy_create
+};
+
+static int __init hints_init(void)
+{
+	int r = -ENOMEM;
+
+	hints_entry_cache = kmem_cache_create("dm_hints_policy_cache_entry",
+					      sizeof(struct entry),
+					      __alignof__(struct entry),
+					      0, NULL);
+	if (hints_entry_cache) {
+		r = dm_cache_policy_register(&hints_policy_type);
+		if (r)
+			kmem_cache_destroy(hints_entry_cache);
+
+		else {
+			DMINFO("version %u.%u.%u loaded",
+			       hints_policy_type.version[0],
+			       hints_policy_type.version[1],
+			       hints_policy_type.version[2]);
+		}
+	}
+
+	return r;
+}
+
+static void __exit hints_exit(void)
+{
+	dm_cache_policy_unregister(&hints_policy_type);
+	kmem_cache_destroy(hints_entry_cache);
+}
+
+module_init(hints_init);
+module_exit(hints_exit);
+
+MODULE_AUTHOR("Heinz Mauelshagen <dm-devel@redhat.com>");
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("hint size test cache policy");
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 23/24] dm cache: add cache block invalidation API
  2013-10-24 18:30 [PATCH 00/24] dm cache: proposed changes for v3.13 merge Mike Snitzer
                   ` (21 preceding siblings ...)
  2013-10-24 18:30 ` [PATCH 22/24] dm cache: add hints policy Mike Snitzer
@ 2013-10-24 18:30 ` Mike Snitzer
  2013-10-24 18:30 ` [PATCH 24/24] dm cache policy era: add cache block invalidation support Mike Snitzer
  23 siblings, 0 replies; 49+ messages in thread
From: Mike Snitzer @ 2013-10-24 18:30 UTC (permalink / raw)
  To: dm-devel; +Cc: Morgan Mears, Heinz Mauelshagen, Joe Thornber, Mike Snitzer

From: Heinz Mauelshagen <heinzm@redhat.com>

This commit introduces an invalidation API to dm-cache in order to allow
for full functionality of the stackable "era" policy shim.

It adds:
- a core target worker function, invalidate_mappings(), to invalidate a
  range of blocks requested via a message
- a corresponding policy_invalidate_mapping() function to carry out a
  cache block invalidation (see the sketch below)
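
As a rough illustration of the return-code contract (condensed from the
invalidate_mappings() worker added below, with warnings and on-disk
metadata error handling elided):

	while (oblock != end) {
		dm_oblock_t given_oblock = oblock;
		dm_cblock_t cblock;
		int r = policy_invalidate_mapping(cache->policy,
						  &given_oblock, &cblock);

		if (r == -EINVAL || r == -ENODATA)
			break;	/* unsupported, or nothing left to invalidate */
		else if (!r) {
			/* drop the on-disk mapping for cblock, clear dirty */
		}
		/* on -ENOENT there is no mapping; simply move on */

		oblock = to_oblock(from_oblock(oblock) + 1);
	}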

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-cache-policy-cleaner.c  |   8 +-
 drivers/md/dm-cache-policy-internal.h |  16 ++--
 drivers/md/dm-cache-policy-mq.c       |  85 ++++++++++++------
 drivers/md/dm-cache-policy-trc.c      |  16 +++-
 drivers/md/dm-cache-policy.h          |  46 +++++++++-
 drivers/md/dm-cache-shim-utils.c      |  15 +++-
 drivers/md/dm-cache-target.c          | 165 ++++++++++++++++++++++++++++++++--
 7 files changed, 297 insertions(+), 54 deletions(-)

diff --git a/drivers/md/dm-cache-policy-cleaner.c b/drivers/md/dm-cache-policy-cleaner.c
index 7e5983c..e6273bb 100644
--- a/drivers/md/dm-cache-policy-cleaner.c
+++ b/drivers/md/dm-cache-policy-cleaner.c
@@ -243,7 +243,7 @@ static void __set_clear_dirty(struct dm_cache_policy *pe, dm_oblock_t oblock, bo
 	}
 }
 
-static void wb_set_dirty(struct dm_cache_policy *pe, dm_oblock_t oblock)
+static int wb_set_dirty(struct dm_cache_policy *pe, dm_oblock_t oblock)
 {
 	struct policy *p = to_policy(pe);
 	unsigned long flags;
@@ -251,9 +251,11 @@ static void wb_set_dirty(struct dm_cache_policy *pe, dm_oblock_t oblock)
 	spin_lock_irqsave(&p->lock, flags);
 	__set_clear_dirty(pe, oblock, true);
 	spin_unlock_irqrestore(&p->lock, flags);
+
+	return 0;
 }
 
-static void wb_clear_dirty(struct dm_cache_policy *pe, dm_oblock_t oblock)
+static int wb_clear_dirty(struct dm_cache_policy *pe, dm_oblock_t oblock)
 {
 	struct policy *p = to_policy(pe);
 	unsigned long flags;
@@ -261,6 +263,8 @@ static void wb_clear_dirty(struct dm_cache_policy *pe, dm_oblock_t oblock)
 	spin_lock_irqsave(&p->lock, flags);
 	__set_clear_dirty(pe, oblock, false);
 	spin_unlock_irqrestore(&p->lock, flags);
+
+	return 0;
 }
 
 static void add_cache_entry(struct policy *p, struct wb_cache_entry *e)
diff --git a/drivers/md/dm-cache-policy-internal.h b/drivers/md/dm-cache-policy-internal.h
index 996b2b5..4245a38 100644
--- a/drivers/md/dm-cache-policy-internal.h
+++ b/drivers/md/dm-cache-policy-internal.h
@@ -27,16 +27,14 @@ static inline int policy_lookup(struct dm_cache_policy *p, dm_oblock_t oblock, d
 	return p->lookup(p, oblock, cblock);
 }
 
-static inline void policy_set_dirty(struct dm_cache_policy *p, dm_oblock_t oblock)
+static inline int policy_set_dirty(struct dm_cache_policy *p, dm_oblock_t oblock)
 {
-	if (p->set_dirty)
-		p->set_dirty(p, oblock);
+	return p->set_dirty ? p->set_dirty(p, oblock) : -EINVAL;
 }
 
-static inline void policy_clear_dirty(struct dm_cache_policy *p, dm_oblock_t oblock)
+static inline int policy_clear_dirty(struct dm_cache_policy *p, dm_oblock_t oblock)
 {
-	if (p->clear_dirty)
-		p->clear_dirty(p, oblock);
+	return p->clear_dirty ? p->clear_dirty(p, oblock) : -EINVAL;
 }
 
 static inline int policy_load_mapping(struct dm_cache_policy *p,
@@ -70,6 +68,12 @@ static inline void policy_force_mapping(struct dm_cache_policy *p,
 	return p->force_mapping(p, current_oblock, new_oblock);
 }
 
+static inline int policy_invalidate_mapping(struct dm_cache_policy *p,
+					    dm_oblock_t *oblock, dm_cblock_t *cblock)
+{
+	return p->invalidate_mapping ? p->invalidate_mapping(p, oblock, cblock) : -EINVAL;
+}
+
 static inline dm_cblock_t policy_residency(struct dm_cache_policy *p)
 {
 	return p->residency(p);
diff --git a/drivers/md/dm-cache-policy-mq.c b/drivers/md/dm-cache-policy-mq.c
index 9f2589e..88f3bc0e 100644
--- a/drivers/md/dm-cache-policy-mq.c
+++ b/drivers/md/dm-cache-policy-mq.c
@@ -994,16 +994,18 @@ static int mq_lookup(struct dm_cache_policy *p, dm_oblock_t oblock, dm_cblock_t
 }
 
 // FIXME: can __mq_set_clear_dirty block?
-static void __mq_set_clear_dirty(struct dm_cache_policy *p, dm_oblock_t oblock, bool set)
+static int __mq_set_clear_dirty(struct dm_cache_policy *p, dm_oblock_t oblock, bool set)
 {
+	int r = 0;
 	struct mq_policy *mq = to_mq_policy(p);
 	struct entry *e;
 
 	mutex_lock(&mq->lock);
 	e = hash_lookup(mq, oblock);
-	if (!e)
+	if (!e) {
+		r = -ENOENT;
 		DMWARN("__mq_set_clear_dirty called for a block that isn't in the cache");
-	else {
+	} else {
 		BUG_ON(!e->in_cache);
 
 		del(mq, e);
@@ -1011,16 +1013,18 @@ static void __mq_set_clear_dirty(struct dm_cache_policy *p, dm_oblock_t oblock,
 		push(mq, e);
 	}
 	mutex_unlock(&mq->lock);
+
+	return r;
 }
 
-static void mq_set_dirty(struct dm_cache_policy *p, dm_oblock_t oblock)
+static int mq_set_dirty(struct dm_cache_policy *p, dm_oblock_t oblock)
 {
-	__mq_set_clear_dirty(p, oblock, true);
+	return __mq_set_clear_dirty(p, oblock, true);
 }
 
-static void mq_clear_dirty(struct dm_cache_policy *p, dm_oblock_t oblock)
+static int mq_clear_dirty(struct dm_cache_policy *p, dm_oblock_t oblock)
 {
-	__mq_set_clear_dirty(p, oblock, false);
+	return __mq_set_clear_dirty(p, oblock, false);
 }
 
 static int mq_load_mapping(struct dm_cache_policy *p,
@@ -1082,23 +1086,40 @@ static int mq_walk_mappings(struct dm_cache_policy *p, policy_walk_fn fn,
 	return r;
 }
 
-static void mq_remove_mapping(struct dm_cache_policy *p, dm_oblock_t oblock)
+static int __remove_mapping(struct mq_policy *mq,
+			    dm_oblock_t oblock, dm_cblock_t *cblock)
 {
-	struct mq_policy *mq = to_mq_policy(p);
 	struct entry *e;
 
-	mutex_lock(&mq->lock);
-
 	e = hash_lookup(mq, oblock);
 
-	BUG_ON(!e || !e->in_cache);
+	if (e && e->in_cache) {
+		del(mq, e);
+		e->in_cache = false;
+		e->dirty = false;
 
-	del(mq, e);
-	e->in_cache = false;
-	e->dirty = false;
-	push(mq, e);
+		if (cblock) {
+			*cblock = e->cblock;
+			list_add(&e->list, &mq->free);
+		} else
+			push(mq, e);
+
+		return 0;
+	}
+
+	return -ENOENT;
+}
+
+static void mq_remove_mapping(struct dm_cache_policy *p, dm_oblock_t oblock)
+{
+	int r;
+	struct mq_policy *mq = to_mq_policy(p);
 
+	mutex_lock(&mq->lock);
+	r = __remove_mapping(mq, oblock, NULL);
 	mutex_unlock(&mq->lock);
+
+	BUG_ON(r);
 }
 
 static int __mq_writeback_work(struct mq_policy *mq, dm_oblock_t *oblock,
@@ -1130,17 +1151,17 @@ static int mq_writeback_work(struct dm_cache_policy *p, dm_oblock_t *oblock,
 	return r;
 }
 
-static void force_mapping(struct mq_policy *mq,
-			  dm_oblock_t current_oblock, dm_oblock_t new_oblock)
+static void __force_mapping(struct mq_policy *mq,
+			    dm_oblock_t current_oblock, dm_oblock_t new_oblock)
 {
 	struct entry *e = hash_lookup(mq, current_oblock);
 
-	BUG_ON(!e || !e->in_cache);
-
-	del(mq, e);
-	e->oblock = new_oblock;
-	e->dirty = true;
-	push(mq, e);
+	if (e && e->in_cache) {
+		del(mq, e);
+		e->oblock = new_oblock;
+		e->dirty = true;
+		push(mq, e);
+	}
 }
 
 static void mq_force_mapping(struct dm_cache_policy *p,
@@ -1149,10 +1170,23 @@ static void mq_force_mapping(struct dm_cache_policy *p,
 	struct mq_policy *mq = to_mq_policy(p);
 
 	mutex_lock(&mq->lock);
-	force_mapping(mq, current_oblock, new_oblock);
+	__force_mapping(mq, current_oblock, new_oblock);
 	mutex_unlock(&mq->lock);
 }
 
+static int mq_invalidate_mapping(struct dm_cache_policy *p,
+				 dm_oblock_t *oblock, dm_cblock_t *cblock)
+{
+	int r;
+	struct mq_policy *mq = to_mq_policy(p);
+
+	mutex_lock(&mq->lock);
+	r = __remove_mapping(mq, *oblock, cblock);
+	mutex_unlock(&mq->lock);
+
+	return r;
+}
+
 static dm_cblock_t mq_residency(struct dm_cache_policy *p)
 {
 	struct mq_policy *mq = to_mq_policy(p);
@@ -1218,6 +1252,7 @@ static void init_policy_functions(struct mq_policy *mq)
 	mq->policy.remove_mapping = mq_remove_mapping;
 	mq->policy.writeback_work = mq_writeback_work;
 	mq->policy.force_mapping = mq_force_mapping;
+	mq->policy.invalidate_mapping = mq_invalidate_mapping;
 	mq->policy.residency = mq_residency;
 	mq->policy.tick = mq_tick;
 	mq->policy.emit_config_values = mq_emit_config_values;
diff --git a/drivers/md/dm-cache-policy-trc.c b/drivers/md/dm-cache-policy-trc.c
index 8b16061..83bc8c3 100644
--- a/drivers/md/dm-cache-policy-trc.c
+++ b/drivers/md/dm-cache-policy-trc.c
@@ -87,16 +87,16 @@ static int trc_lookup(struct dm_cache_policy *p, dm_oblock_t oblock,
 	return policy_lookup(p->child, oblock, cblock);
 }
 
-static void trc_set_dirty(struct dm_cache_policy *p, dm_oblock_t oblock)
+static int trc_set_dirty(struct dm_cache_policy *p, dm_oblock_t oblock)
 {
 	DM_TRC_OUT(DM_TRC_LEV_NORMAL, p, "%p %llu", p, from_oblock(oblock));
-	policy_set_dirty(p->child, oblock);
+	return policy_set_dirty(p->child, oblock);
 }
 
-static void trc_clear_dirty(struct dm_cache_policy *p, dm_oblock_t oblock)
+static int trc_clear_dirty(struct dm_cache_policy *p, dm_oblock_t oblock)
 {
 	DM_TRC_OUT(DM_TRC_LEV_NORMAL, p, "%p %llu", p, from_oblock(oblock));
-	policy_clear_dirty(p->child, oblock);
+	return policy_clear_dirty(p->child, oblock);
 }
 
 static int trc_load_mapping(struct dm_cache_policy *p,
@@ -137,6 +137,13 @@ static void trc_force_mapping(struct dm_cache_policy *p,
 	policy_force_mapping(p->child, old_oblock, new_oblock);
 }
 
+static int trc_invalidate_mapping(struct dm_cache_policy *p,
+				  dm_oblock_t *oblock, dm_cblock_t *cblock)
+{
+	DM_TRC_OUT(DM_TRC_LEV_NORMAL, p, "%p %llu %u", p, from_oblock(*oblock), from_cblock(*cblock));
+	return policy_invalidate_mapping(p->child, oblock, cblock);
+}
+
 static dm_cblock_t trc_residency(struct dm_cache_policy *p)
 {
 	DM_TRC_OUT(DM_TRC_LEV_NORMAL, p, "%p", p);
@@ -191,6 +198,7 @@ static void init_policy_functions(struct trc_policy *trc)
 	trc->policy.remove_mapping = trc_remove_mapping;
 	trc->policy.writeback_work = trc_writeback_work;
 	trc->policy.force_mapping = trc_force_mapping;
+	trc->policy.invalidate_mapping = trc_invalidate_mapping;
 	trc->policy.residency = trc_residency;
 	trc->policy.tick = trc_tick;
 	trc->policy.emit_config_values = trc_emit_config_values;
diff --git a/drivers/md/dm-cache-policy.h b/drivers/md/dm-cache-policy.h
index 83ec775..33ddb69 100644
--- a/drivers/md/dm-cache-policy.h
+++ b/drivers/md/dm-cache-policy.h
@@ -138,10 +138,21 @@ struct dm_cache_policy {
 	int (*lookup)(struct dm_cache_policy *p, dm_oblock_t oblock, dm_cblock_t *cblock);
 
 	/*
-	 * oblock must be a mapped block.  Must not block.
+	 * Set/clear a block's dirty state.
+	 *
+	 * oblock is the block we want to change state for.  Must not block.
+	 *
+	 * Returns:
+	 *
+	 * 0       if the block is in the cache _and_ the set/clear succeeded
+	 *
+	 * -EINVAL if the block is in the cache _but_ was already dirty on a
+	 *         set call / already clean on a clear call
+	 *
+	 * -ENOENT if the block is not in the cache
 	 */
-	void (*set_dirty)(struct dm_cache_policy *p, dm_oblock_t oblock);
-	void (*clear_dirty)(struct dm_cache_policy *p, dm_oblock_t oblock);
+	int (*set_dirty)(struct dm_cache_policy *p, dm_oblock_t oblock);
+	int (*clear_dirty)(struct dm_cache_policy *p, dm_oblock_t oblock);
 
 	/*
 	 * Called when a cache target is first created.  Used to load a
@@ -161,8 +172,35 @@ struct dm_cache_policy {
 	void (*force_mapping)(struct dm_cache_policy *p, dm_oblock_t current_oblock,
 			      dm_oblock_t new_oblock);
 
-	int (*writeback_work)(struct dm_cache_policy *p, dm_oblock_t *oblock, dm_cblock_t *cblock);
+	/*
+	 * Invalidate mapping for an origin block.
+	 *
+	 * Returns:
+	 *
+	 * 0 and @cblock,@oblock: if mapped, the policy returns the cache block
+	 *			  and optionally changes the original block (e.g. era)
+	 *
+	 * -EINVAL: invalidation not supported
+	 *
+	 * -ENOENT: no entry for @oblock in the cache
+	 *
+	 * -ENODATA: all possible invalidation requests processed
+	 *
+	 * May return a _different_ oblock than the requested one
+	 * to let the policy decide which block to invalidate (e.g. era).
+	 */
+	int (*invalidate_mapping)(struct dm_cache_policy *p, dm_oblock_t *oblock, dm_cblock_t *cblock);
 
+	/*
+	 * Provide a dirty block to be written back by the core target.
+	 *
+	 * Returns:
+	 *
+	 * 0 and @cblock,@oblock: block to write back provided
+	 *
+	 * -ENODATA: no dirty blocks available
+	 */
+	int (*writeback_work)(struct dm_cache_policy *p, dm_oblock_t *oblock, dm_cblock_t *cblock);
 
 	/*
 	 * How full is the cache?
diff --git a/drivers/md/dm-cache-shim-utils.c b/drivers/md/dm-cache-shim-utils.c
index 4151883..8b8d5d5 100644
--- a/drivers/md/dm-cache-shim-utils.c
+++ b/drivers/md/dm-cache-shim-utils.c
@@ -76,14 +76,14 @@ static int shim_lookup(struct dm_cache_policy *p, dm_oblock_t oblock,
 	return policy_lookup(p->child, oblock, cblock);
 }
 
-static void shim_set_dirty(struct dm_cache_policy *p, dm_oblock_t oblock)
+static int shim_set_dirty(struct dm_cache_policy *p, dm_oblock_t oblock)
 {
-	policy_set_dirty(p->child, oblock);
+	return policy_set_dirty(p->child, oblock);
 }
 
-static void shim_clear_dirty(struct dm_cache_policy *p, dm_oblock_t oblock)
+static int shim_clear_dirty(struct dm_cache_policy *p, dm_oblock_t oblock)
 {
-	policy_clear_dirty(p->child, oblock);
+	return policy_clear_dirty(p->child, oblock);
 }
 
 static int shim_load_mapping(struct dm_cache_policy *p,
@@ -130,6 +130,12 @@ static void shim_force_mapping(struct dm_cache_policy *p,
 	policy_force_mapping(p->child, current_oblock, new_oblock);
 }
 
+static int shim_invalidate_mapping(struct dm_cache_policy *p,
+				   dm_oblock_t *oblock, dm_cblock_t *cblock)
+{
+	return policy_invalidate_mapping(p->child, oblock, cblock);
+}
+
 static dm_cblock_t shim_residency(struct dm_cache_policy *p)
 {
 	return policy_residency(p->child);
@@ -164,6 +170,7 @@ void dm_cache_shim_utils_init_shim_policy(struct dm_cache_policy *p)
 	p->remove_mapping = shim_remove_mapping;
 	p->writeback_work = shim_writeback_work;
 	p->force_mapping = shim_force_mapping;
+	p->invalidate_mapping = shim_invalidate_mapping;
 	p->residency = shim_residency;
 	p->tick = shim_tick;
 	p->emit_config_values = shim_emit_config_values;
diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c
index 502ae64..e3f474a 100644
--- a/drivers/md/dm-cache-target.c
+++ b/drivers/md/dm-cache-target.c
@@ -155,6 +155,12 @@ struct cache {
 	dm_cblock_t cache_size;
 
 	/*
+	 * Begin/end range of original blocks whose mapped cache entries are to be invalidated.
+	 */
+	dm_oblock_t begin_invalidate;
+	dm_oblock_t end_invalidate;
+
+	/*
 	 * Fields for converting from sectors to blocks.
 	 */
 	uint32_t sectors_per_block;
@@ -210,6 +216,7 @@ struct cache {
 	bool need_tick_bio:1;
 	bool sized:1;
 	bool quiescing:1;
+	bool invalidate:1;
 	bool commit_requested:1;
 	bool loaded_mappings:1;
 	bool loaded_discards:1;
@@ -251,6 +258,7 @@ struct dm_cache_migration {
 	bool writeback:1;
 	bool demote:1;
 	bool promote:1;
+	bool invalidate:1;
 
 	struct dm_bio_prison_cell *old_ocell;
 	struct dm_bio_prison_cell *new_ocell;
@@ -841,6 +849,7 @@ static void migration_success_pre_commit(struct dm_cache_migration *mg)
 			cleanup_migration(mg);
 			return;
 		}
+
 	} else {
 		if (dm_cache_insert_mapping(cache->cmd, mg->cblock, mg->new_oblock)) {
 			DMWARN_LIMIT("promotion failed; couldn't update on disk metadata");
@@ -875,8 +884,11 @@ static void migration_success_post_commit(struct dm_cache_migration *mg)
 			list_add_tail(&mg->list, &cache->quiesced_migrations);
 			spin_unlock_irqrestore(&cache->lock, flags);
 
-		} else
+		} else {
+			if (mg->invalidate)
+				policy_remove_mapping(cache->policy, mg->old_oblock);
 			cleanup_migration(mg);
+		}
 
 	} else {
 		cell_defer(cache, mg->new_ocell, true);
@@ -1036,6 +1048,7 @@ static void promote(struct cache *cache, struct prealloc *structs,
 	mg->writeback = false;
 	mg->demote = false;
 	mg->promote = true;
+	mg->invalidate = false;
 	mg->cache = cache;
 	mg->new_oblock = oblock;
 	mg->cblock = cblock;
@@ -1057,6 +1070,7 @@ static void writeback(struct cache *cache, struct prealloc *structs,
 	mg->writeback = true;
 	mg->demote = false;
 	mg->promote = false;
+	mg->invalidate = false;
 	mg->cache = cache;
 	mg->old_oblock = oblock;
 	mg->cblock = cblock;
@@ -1080,6 +1094,7 @@ static void demote_then_promote(struct cache *cache, struct prealloc *structs,
 	mg->writeback = false;
 	mg->demote = true;
 	mg->promote = true;
+	mg->invalidate = false;
 	mg->cache = cache;
 	mg->old_oblock = old_oblock;
 	mg->new_oblock = new_oblock;
@@ -1106,6 +1121,7 @@ static void invalidate(struct cache *cache, struct prealloc *structs,
 	mg->writeback = false;
 	mg->demote = true;
 	mg->promote = false;
+	mg->invalidate = true;
 	mg->cache = cache;
 	mg->old_oblock = oblock;
 	mg->cblock = cblock;
@@ -1324,15 +1340,17 @@ static int need_commit_due_to_time(struct cache *cache)
 
 static int commit_if_needed(struct cache *cache)
 {
+	int r = 0;
+
 	if ((cache->commit_requested || need_commit_due_to_time(cache)) &&
 	    dm_cache_changed_this_transaction(cache->cmd)) {
 		atomic_inc(&cache->stats.commit_count);
-		cache->last_commit_jiffies = jiffies;
 		cache->commit_requested = false;
-		return dm_cache_commit(cache->cmd, false);
+		r = dm_cache_commit(cache->cmd, false);
+		cache->last_commit_jiffies = jiffies;
 	}
 
-	return 0;
+	return r;
 }
 
 static void process_deferred_bios(struct cache *cache)
@@ -1497,6 +1515,60 @@ static void requeue_deferred_io(struct cache *cache)
 		bio_endio(bio, DM_ENDIO_REQUEUE);
 }
 
+static void invalidate_mappings(struct cache *cache)
+{
+	dm_oblock_t oblock, end;
+	unsigned long long count = 0;
+
+	smp_rmb();
+
+	if (!cache->invalidate)
+		return;
+
+	oblock = cache->begin_invalidate;
+	end    = to_oblock(from_oblock(cache->end_invalidate) + 1);
+
+	while (oblock != end) {
+		int r;
+		dm_cblock_t cblock;
+		dm_oblock_t given_oblock = oblock;
+
+		r = policy_invalidate_mapping(cache->policy, &given_oblock, &cblock);
+		/*
+		 * Policy either doesn't support invalidation (yet) or
+		 * doesn't offer any more blocks to invalidate (e.g. era).
+		 */
+		if (r == -EINVAL) {
+			DMWARN("policy doesn't support invalidation (yet).");
+			break;
+		}
+
+		if (r == -ENODATA)
+			break;
+
+		else if (!r) {
+			if (dm_cache_remove_mapping(cache->cmd, cblock)) {
+				DMWARN_LIMIT("invalidation failed; couldn't update on disk metadata");
+				r = policy_load_mapping(cache->policy, given_oblock, cblock, NULL, false);
+				BUG_ON(r);
+
+			} else {
+				/*
+				 * FIXME: we are cautious and keep this even though all
+				 *        blocks _should_ be clean in passthrough mode.
+				 */
+				clear_dirty(cache, given_oblock, cblock);
+				cache->commit_requested = true;
+				count++;
+			}
+		}
+
+		oblock = to_oblock(from_oblock(oblock) + 1);
+	}
+
+	cache->invalidate = false;
+}
+
 static int more_work(struct cache *cache)
 {
 	if (is_quiescing(cache))
@@ -1509,7 +1581,8 @@ static int more_work(struct cache *cache)
 			!bio_list_empty(&cache->deferred_writethrough_bios) ||
 			!list_empty(&cache->quiesced_migrations) ||
 			!list_empty(&cache->completed_migrations) ||
-			!list_empty(&cache->need_commit_migrations);
+			!list_empty(&cache->need_commit_migrations) ||
+			cache->invalidate;
 }
 
 static void do_worker(struct work_struct *ws)
@@ -1527,6 +1600,8 @@ static void do_worker(struct work_struct *ws)
 
 		process_deferred_writethrough_bios(cache);
 
+		invalidate_mappings(cache);
+
 		if (commit_if_needed(cache)) {
 			process_deferred_flush_bios(cache, false);
 
@@ -2181,6 +2256,7 @@ static int cache_create(struct cache_args *ca, struct cache **result)
 	cache->need_tick_bio = true;
 	cache->sized = false;
 	cache->quiescing = false;
+	cache->invalidate = false;
 	cache->commit_requested = false;
 	cache->loaded_mappings = false;
 	cache->loaded_discards = false;
@@ -2702,8 +2778,73 @@ err:
 	DMEMIT("Error");
 }
 
+static int get_origin_block(struct cache *cache, const char *what,
+			    char *arg, unsigned long long *val)
+{
+	unsigned long long last_block = from_oblock(cache->origin_blocks) - 1;
+
+	if (!strcmp(arg, "begin"))
+		*val = 0;
+
+	else if (!strcmp(arg, "end"))
+		*val = last_block;
+
+	else if (kstrtoull(arg, 10, val)) {
+		DMERR("%s origin block invalid", what);
+		return -EINVAL;
+
+	} else if (*val > last_block) {
+		*val = last_block;
+		DMERR("%s origin block adjusted to EOD=%llu", what, *val);
+	}
+
+	return 0;
+}
+
+static int set_invalidate_mappings(struct cache *cache, char **argv)
+{
+	unsigned long long begin, end;
+
+	if (strcasecmp(argv[0], "invalidate_mappings"))
+		return -EINVAL;
+
+	if (!passthrough_mode(&cache->features)) {
+		DMERR("cache has to be in passthrough mode for invalidation!");
+		return -EPERM;
+	}
+
+	if (cache->invalidate) {
+		DMERR("cache is processing invalidation");
+		return -EPERM;
+	}
+
+	if (get_origin_block(cache, "begin", argv[1], &begin) ||
+	    get_origin_block(cache, "end", argv[2], &end))
+		return -EINVAL;
+
+	if (begin > end) {
+		DMERR("begin origin block > end origin block");
+		return -EINVAL;
+	}
+
+	/*
+	 * Pass begin and end origin blocks to the worker and wake it.
+	 */
+	cache->begin_invalidate = to_oblock(begin);
+	cache->end_invalidate = to_oblock(end);
+	cache->invalidate = true;
+	smp_wmb();
+
+	wake_worker(cache);
+
+	return 0;
+}
+
 /*
- * Supports <key> <value>.
+ * Supports
+ *	"<key> <value>"
+ * and
+ *	"invalidate_mappings <begin_origin_block> <end_origin_block>".
  *
  * The key migration_threshold is supported by the cache target core.
  */
@@ -2711,10 +2852,16 @@ static int cache_message(struct dm_target *ti, unsigned argc, char **argv)
 {
 	struct cache *cache = ti->private;
 
-	if (argc != 2)
-		return -EINVAL;
+	switch (argc) {
+	case 2:
+		return set_config_value(cache, argv[0], argv[1]);
 
-	return set_config_value(cache, argv[0], argv[1]);
+	case 3:
+		return set_invalidate_mappings(cache, argv);
+
+	default:
+		return -EINVAL;
+	}
 }
 
 static int cache_iterate_devices(struct dm_target *ti,
-- 
1.8.1.4
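
For context, a user would presumably drive this with a target message
along the lines of "dmsetup message <cache-dev> 0 invalidate_mappings
<begin> <end>" (with the keywords "begin" and "end" accepted as the
range bounds).  The hand-off between the message path and the worker is
a publish-then-flag pattern; a minimal free-standing sketch of it, with
illustrative names rather than the driver's own:

#include <linux/types.h>	/* bool */
#include <asm/barrier.h>	/* smp_wmb()/smp_rmb() */

struct range_request {
	unsigned long long begin, end;	/* inclusive origin block range */
	bool pending;			/* written last, examined first */
};

/* Message path: publish the range, then raise the flag. */
static void publish_range(struct range_request *rq,
			  unsigned long long begin, unsigned long long end)
{
	rq->begin = begin;
	rq->end = end;
	smp_wmb();		/* order the range stores before the flag */
	rq->pending = true;
}

/* Worker: observe the flag, then (and only then) read the range. */
static void consume_range(struct range_request *rq)
{
	if (!rq->pending)
		return;
	smp_rmb();		/* pairs with smp_wmb() in publish_range() */

	/* ... invalidate mappings for rq->begin .. rq->end ... */

	rq->pending = false;
}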

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 24/24] dm cache policy era: add cache block invalidation support
  2013-10-24 18:30 [PATCH 00/24] dm cache: proposed changes for v3.13 merge Mike Snitzer
                   ` (22 preceding siblings ...)
  2013-10-24 18:30 ` [PATCH 23/24] dm cache: add cache block invalidation API Mike Snitzer
@ 2013-10-24 18:30 ` Mike Snitzer
  23 siblings, 0 replies; 49+ messages in thread
From: Mike Snitzer @ 2013-10-24 18:30 UTC (permalink / raw)
  To: dm-devel; +Cc: Morgan Mears, Heinz Mauelshagen, Joe Thornber, Mike Snitzer

From: Heinz Mauelshagen <heinzm@redhat.com>

This commit adds support for the cache block invalidation API to the
"era" policy shim.

It provides:
- a temporary store containing selected cache blocks filtered based on a
  message requesting eras equal to, less than, less equal, greater than
  or greater equal to an era number
- provisioning of the entries, to be invalidated in that temporary
  store, to the cache core target

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-cache-policy-era.c | 194 +++++++++++++++++++++++++++++++--------
 1 file changed, 154 insertions(+), 40 deletions(-)

diff --git a/drivers/md/dm-cache-policy-era.c b/drivers/md/dm-cache-policy-era.c
index 427514c..486717a 100644
--- a/drivers/md/dm-cache-policy-era.c
+++ b/drivers/md/dm-cache-policy-era.c
@@ -2,6 +2,8 @@
  * Copyright 2013 NetApp, Inc. All Rights Reserved, contribution by
  * Morgan Mears.
  *
+ * Copyright 2013 Red Hat, Inc. All Rights Reserved.
+ *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms of the GNU General Public License as published by the
  * Free Software Foundation; either version 2 of the License, or (at your
@@ -44,6 +46,13 @@ struct era_policy {
 	era_t *cb_to_era;
 
 	era_t era_counter;
+
+	/* Temporary store for unmap information during invalidation. */
+	struct {
+		unsigned long *bitset;
+		dm_oblock_t *oblocks;
+		unsigned long last_cblock;
+	} invalidate;
 };
 
 /*----------------------------------------------------------------*/
@@ -53,7 +62,69 @@ static struct era_policy *to_era_policy(struct dm_cache_policy *p)
 	return container_of(p, struct era_policy, policy);
 }
 
-static int incr_era_counter(struct era_policy *era, const char *curr_era_str)
+static unsigned long *alloc_bitset(unsigned nr_entries)
+{
+	size_t s = sizeof(unsigned long) * dm_div_up(nr_entries, BITS_PER_LONG);
+	return vzalloc(s);
+}
+
+static void free_bitset(unsigned long *bits)
+{
+	vfree(bits);
+}
+
+static dm_oblock_t *alloc_oblocks(unsigned nr_entries)
+{
+	size_t s = sizeof(dm_oblock_t) * nr_entries;
+	return vmalloc(s);
+}
+
+static void free_oblocks(dm_oblock_t *blocks)
+{
+	vfree(blocks);
+}
+
+static void free_invalidate(struct era_policy *era)
+{
+	if (era->invalidate.oblocks) {
+		free_oblocks(era->invalidate.oblocks);
+		era->invalidate.oblocks = NULL;
+	}
+
+	if (era->invalidate.bitset) {
+		free_bitset(era->invalidate.bitset);
+		era->invalidate.bitset = NULL; /* Being checked for! */
+	}
+}
+
+static int alloc_invalidate(struct era_policy *era)
+{
+	/* FIXME: memory consumption! */
+	era->invalidate.oblocks = alloc_oblocks(from_cblock(era->cache_size));
+	if (!era->invalidate.oblocks) {
+		DMERR("failed to allocate original blocks unmap array");
+		goto err;
+	}
+
+	era->invalidate.bitset = alloc_bitset(from_cblock(era->cache_size));
+	if (!era->invalidate.bitset) {
+		DMERR("failed to allocate cache blocks unmap bitset");
+		goto err;
+	}
+
+	era->invalidate.last_cblock = 0;
+	return 0;
+
+err:
+	free_invalidate(era);
+	return -ENOMEM;
+}
+
+
+typedef int (*era_match_fn_t)(era_t, era_t);
+
+static int incr_era_counter(struct era_policy *era, const char *curr_era_str,
+			    era_match_fn_t dummy)
 {
 	era_t curr_era_counter;
 	int r;
@@ -119,8 +190,6 @@ static int era_is_lt_value(era_t era, era_t value)
 	return era < value;
 }
 
-typedef int (*era_match_fn_t)(era_t, era_t);
-
 struct inval_oblocks_ctx {
 	struct era_policy *era;
 	era_match_fn_t era_match_fn;
@@ -131,26 +200,16 @@ static int era_inval_oblocks(void *context, dm_cblock_t cblock,
 			     dm_oblock_t oblock, void *unused)
 {
 	struct inval_oblocks_ctx *ctx = (struct inval_oblocks_ctx *)context;
-	struct dm_cache_policy *child;
-	era_t act_era;
+	era_t act_era = ctx->era->cb_to_era[from_cblock(cblock)];
 
-	act_era = ctx->era->cb_to_era[from_cblock(cblock)];
 	if (ctx->era_match_fn(act_era, ctx->test_era)) {
 #if DEBUG_ERA
 		DMDEBUG("cblock %u has era %u matching test_era %u; "
 			"marking mapping to be removed for oblock %llu.",
 			from_cblock(cblock), act_era, ctx->test_era, oblock);
 #endif
-		child = ctx->era->policy.child;
-
-		/*
-		 * This deadlocks (lock against self) because child is calling
-		 * us via the walk_mappings context callback, child's
-		 * walk_mappings holds child's lock, and child's remove_mappings
-		 * tries to get it again.  Not fixing because I believe the
-		 * invalidate API is going to change.
-		 */
-		/* child->remove_mapping(child, oblock); */
+		set_bit(from_cblock(cblock), ctx->era->invalidate.bitset);
+		ctx->era->invalidate.oblocks[from_cblock(cblock)] = oblock;
 	}
 
 	return 0;
@@ -164,6 +223,11 @@ static int cond_unmap_by_era(struct era_policy *era, const char *test_era_str,
 	era_t test_era;
 	int r;
 
+	if (era->invalidate.bitset) {
+		DMERR("previous unmap request exists");
+		return -EPERM;
+	}
+
 	/*
 	 * Unmap blocks with eras matching the given era, according to the
 	 * given matching function.
@@ -172,6 +236,10 @@ static int cond_unmap_by_era(struct era_policy *era, const char *test_era_str,
 	if (kstrtou32(test_era_str, 10, &test_era))
 		return -EINVAL;
 
+	r = alloc_invalidate(era);
+	if (r)
+		return r;
+
 	io_ctx.era = era;
 	io_ctx.era_match_fn = era_match_fn;
 	io_ctx.test_era = test_era;
@@ -197,10 +265,12 @@ static int cond_unmap_by_era(struct era_policy *era, const char *test_era_str,
 static void era_destroy(struct dm_cache_policy *p)
 {
 	struct era_policy *era = to_era_policy(p);
+
 #if DEBUG_ERA
-	DMDEBUG("destroyed era %p", era);
+	DMDEBUG("destroying era %p", era);
 #endif
-	kfree(era->cb_to_era);
+	free_invalidate(era);
+	vfree(era->cb_to_era);
 	kfree(era);
 }
 
@@ -216,6 +286,7 @@ static int era_map(struct dm_cache_policy *p, dm_oblock_t oblock,
 
 	if (can_block)
 		mutex_lock(&era->lock);
+
 	else if (!mutex_trylock(&era->lock))
 		return -EWOULDBLOCK;
 
@@ -257,6 +328,7 @@ static int era_load_mapping(struct dm_cache_policy *p,
 
 	r = policy_load_mapping(child, oblock, cblock, hint, hint_valid);
 
+	/* FIXME: is the recovered era valid on a reload called from the cache core's invalidate-mapping error path? */
 	if (!r && hint_valid &&
 	    (from_cblock(cblock) < from_cblock(era->cache_size))) {
 		recovered_era = le32_to_cpu(*le32_hint);
@@ -313,28 +385,71 @@ static void era_force_mapping(struct dm_cache_policy *p, dm_oblock_t old_oblock,
 	mutex_unlock(&era->lock);
 }
 
-static int era_set_config_value(struct dm_cache_policy *p, const char *key,
-				const char *value)
+/* Find next block to invalidate. */
+static int __find_invalidate_block(struct era_policy *era, dm_cblock_t *cblock)
+{
+	int bit = find_next_bit(era->invalidate.bitset, from_cblock(era->cache_size),
+				era->invalidate.last_cblock);
+
+	*cblock = to_cblock(bit);
+	era->invalidate.last_cblock = bit;
+	return bit < from_cblock(era->cache_size) ? 0 : -ENODATA;
+}
+
+static int era_invalidate_mapping(struct dm_cache_policy *p,
+				  dm_oblock_t *oblock, dm_cblock_t *cblock)
 {
 	struct era_policy *era = to_era_policy(p);
 	int r;
 
-	if (!strcasecmp(key, "increment_era_counter"))
-		r = incr_era_counter(era, value);
-	else if (!strcasecmp(key, "unmap_blocks_from_later_eras"))
-		r = cond_unmap_by_era(era, value, era_is_gt_value);
-	else if (!strcasecmp(key, "unmap_blocks_from_this_era_and_later"))
-		r = cond_unmap_by_era(era, value, era_is_gte_value);
-	else if (!strcasecmp(key, "unmap_blocks_from_this_era_and_earlier"))
-		r = cond_unmap_by_era(era, value, era_is_lte_value);
-	else if (!strcasecmp(key, "unmap_blocks_from_earlier_eras"))
-		r = cond_unmap_by_era(era, value, era_is_lt_value);
-	else
-		r = policy_set_config_value(p->child, key, value);
+	if (!era->invalidate.bitset)
+		return -ENODATA;
+
+	r = __find_invalidate_block(era, cblock);
+	if (r < 0)
+		free_invalidate(era);
+	else {
+		BUG_ON(from_cblock(*cblock) >= from_cblock(era->cache_size));
+		BUG_ON(!test_bit(from_cblock(*cblock), era->invalidate.bitset));
+		clear_bit(from_cblock(*cblock), era->invalidate.bitset);
+		*oblock = era->invalidate.oblocks[from_cblock(*cblock)];
+		r = policy_invalidate_mapping(p->child, oblock, cblock);
+#if DEBUG_ERA
+		DMDEBUG("unmapped cblock=%u oblock=%llu", from_cblock(*cblock), from_oblock(*oblock));
+#endif
+	}
 
 	return r;
 }
 
+struct config_value_handler {
+	const char *cmd;
+	int (*handler_fn)(struct era_policy *, const char *, era_match_fn_t);
+	era_match_fn_t match_fn;
+};
+
+/* FIXME: is a delete unmap request needed or is reloading the mapping sufficient to achieve it? */
+static int era_set_config_value(struct dm_cache_policy *p, const char *key,
+				const char *value)
+{
+	struct era_policy *era = to_era_policy(p);
+	struct config_value_handler *vh, value_handlers[] = {
+		{ "increment_era_counter",                  incr_era_counter,  NULL },
+		{ "unmap_blocks_from_later_eras",           cond_unmap_by_era, era_is_gt_value },
+		{ "unmap_blocks_from_this_era_and_later",   cond_unmap_by_era, era_is_gte_value },
+		{ "unmap_blocks_from_this_era_and_earlier", cond_unmap_by_era, era_is_lte_value },
+		{ "unmap_blocks_from_earlier_eras",         cond_unmap_by_era, era_is_lt_value },
+		{ NULL }
+	};
+
+	for (vh = value_handlers; vh->cmd; vh++) {
+		if (!strcasecmp(key, vh->cmd))
+			return vh->handler_fn(era, value, vh->match_fn);
+	}
+
+	return policy_set_config_value(p->child, key, value);
+}
+
 static int era_emit_config_values(struct dm_cache_policy *p, char *result,
 				  unsigned maxlen)
 {
@@ -355,6 +470,7 @@ static void init_policy_functions(struct era_policy *era)
 	era->policy.load_mapping = era_load_mapping;
 	era->policy.walk_mappings = era_walk_mappings;
 	era->policy.force_mapping = era_force_mapping;
+	era->policy.invalidate_mapping = era_invalidate_mapping;
 	era->policy.emit_config_values = era_emit_config_values;
 	era->policy.set_config_value = era_set_config_value;
 }
@@ -372,15 +488,13 @@ static struct dm_cache_policy *era_create(dm_cblock_t cache_size,
 	era->cache_size = cache_size;
 	mutex_init(&era->lock);
 
-	era->cb_to_era = kzalloc(from_cblock(era->cache_size) *
-				 sizeof(*(era->cb_to_era)), GFP_KERNEL);
-	if (!era->cb_to_era)
-		goto bad_alloc_cb_to_era;
-	era->era_counter = 1;
-
-	return &era->policy;
+	era->cb_to_era = vzalloc(from_cblock(era->cache_size) *
+				 sizeof(*era->cb_to_era));
+	if (era->cb_to_era) {
+		era->era_counter = 1;
+		return &era->policy;
+	}
 
-bad_alloc_cb_to_era:
 	kfree(era);
 	return NULL;
 }
-- 
1.8.1.4
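
The temporary store added here amounts to a mark-then-drain structure:
the walk_mappings callback marks matching cache blocks in a bitset and
records their origin blocks, and era_invalidate_mapping() later drains
one marked block per call until find_next_bit() runs off the end.  A
free-standing sketch of that pattern (names are illustrative, not the
policy's own):

#include <linux/bitops.h>
#include <linux/errno.h>

struct inval_store {
	unsigned long *bitset;		/* one bit per cache block */
	unsigned long long *oblocks;	/* cblock index -> origin block */
	unsigned long cursor;		/* where the last scan stopped */
	unsigned long nr_cblocks;
};

/* walk_mappings callback side: remember a matching block. */
static void mark_block(struct inval_store *s, unsigned long cblock,
		       unsigned long long oblock)
{
	set_bit(cblock, s->bitset);
	s->oblocks[cblock] = oblock;
}

/* invalidate_mapping side: hand out one marked block per call. */
static int pop_block(struct inval_store *s, unsigned long *cblock,
		     unsigned long long *oblock)
{
	unsigned long bit = find_next_bit(s->bitset, s->nr_cblocks, s->cursor);

	if (bit >= s->nr_cblocks)
		return -ENODATA;	/* drained: caller frees the store */

	clear_bit(bit, s->bitset);
	s->cursor = bit;		/* resume the scan here next time */
	*cblock = bit;
	*oblock = s->oblocks[bit];
	return 0;
}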

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH 01/24] dm: nest targets used for testing under DM_TEST_TARGETS
  2013-10-24 18:30 ` [PATCH 01/24] dm: nest targets used for testing under DM_TEST_TARGETS Mike Snitzer
@ 2013-10-24 23:17   ` Alasdair G Kergon
  2013-10-25 19:25     ` Mike Snitzer
  2013-10-24 23:51   ` Alasdair G Kergon
  1 sibling, 1 reply; 49+ messages in thread
From: Alasdair G Kergon @ 2013-10-24 23:17 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Morgan Mears, Heinz Mauelshagen, dm-devel, Joe Thornber

> --- a/drivers/md/Kconfig
> +++ b/drivers/md/Kconfig
> @@ -371,27 +371,12 @@ config DM_MULTIPATH_ST
  	  
> -config DM_DELAY
> -	tristate "I/O delaying target"

> +config DM_DELAY
> +	tristate "I/O delaying target (EXPERIMENTAL)"

Why has EXPERIMENTAL reappeared???
See d57916a00fd749ccd354a7f754c2aba98f86d064

Please regenerate these patches against an up-to-date upstream base!

Thanks,
Alasdair

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 01/24] dm: nest targets used for testing under DM_TEST_TARGETS
  2013-10-24 18:30 ` [PATCH 01/24] dm: nest targets used for testing under DM_TEST_TARGETS Mike Snitzer
  2013-10-24 23:17   ` Alasdair G Kergon
@ 2013-10-24 23:51   ` Alasdair G Kergon
  1 sibling, 0 replies; 49+ messages in thread
From: Alasdair G Kergon @ 2013-10-24 23:51 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Morgan Mears, Heinz Mauelshagen, dm-devel, Joe Thornber

On Thu, Oct 24, 2013 at 02:30:14PM -0400, Mike Snitzer wrote:
> Both the flakey and delay targets are generally used for testing.  
                                        ^^^^^^^^^
Indeed - not exclusively.

> --- a/drivers/md/Kconfig

> +config DM_TEST_TARGETS

> +         Targets that are only useful for testing.

Could we delete the word 'only'?  Or replace it with something like
'primarily'?

(I don't like this classification for DM_DELAY: I think it can have uses in
non-testing scenarios.)

Alasdair

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 05/24] dm cache policy mq: implement writeback_work() and mq_{set, clear}_dirty()
  2013-10-24 18:30 ` [PATCH 05/24] dm cache policy mq: implement writeback_work() and mq_{set, clear}_dirty() Mike Snitzer
@ 2013-10-25 16:06   ` Alasdair G Kergon
  2013-10-25 19:18     ` Mike Snitzer
  0 siblings, 1 reply; 49+ messages in thread
From: Alasdair G Kergon @ 2013-10-25 16:06 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Morgan Mears, Heinz Mauelshagen, dm-devel, Joe Thornber

On Thu, Oct 24, 2013 at 02:30:18PM -0400, Mike Snitzer wrote:
> From: Joe Thornber <ejt@redhat.com>
 
> --- a/drivers/md/dm-cache-policy-mq.c

> -	 * We maintain two queues of entries.  The cache proper contains

> +	 * We maintain three queues of entries.  The cache proper,

Does the Documentation file need updating too?

: The multiqueue policy has two sets of 16 queues: one set for entries
: waiting for the cache and another one for those in the cache.

> @@ -580,7 +585,16 @@ static void check_generation(struct mq_policy *mq)

Is the comment about this function no longer accurate?

: * At the moment the threshold is taken by averaging the hit counts of some
: * of the entries in the cache (the first 20 entries of the first level).

- It looks at clean before dirty?
- It looks at more than just the first level if it needs to?

> +// FIXME: can __mq_set_clear_dirty block?

Still to check this before inclusion upstream?

Alasdair

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 06/24] dm cache policy mq: return NULL if mq->free list is empty in alloc_entry
  2013-10-24 18:30 ` [PATCH 06/24] dm cache policy mq: return NULL if mq->free list is empty in alloc_entry Mike Snitzer
@ 2013-10-25 16:37   ` Alasdair G Kergon
  2013-10-25 20:44     ` Mike Snitzer
  2013-10-29 14:49   ` [PATCH 06/24 v2] dm cache policy mq: return NULL from alloc_entry if cache is full Mike Snitzer
  1 sibling, 1 reply; 49+ messages in thread
From: Alasdair G Kergon @ 2013-10-25 16:37 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Morgan Mears, Heinz Mauelshagen, dm-devel, Joe Thornber

On Thu, Oct 24, 2013 at 02:30:19PM -0400, Mike Snitzer wrote:
> From: Heinz Mauelshagen <heinzm@redhat.com>
> Addresses callers' (insert_in*cache()) requirement that alloc_entry()
> return NULL when an entry isn't able to be allocated.
 
What is the code path that leads to the requirement for this patch?

> +++ b/drivers/md/dm-cache-policy-mq.c
>  static struct entry *alloc_entry(struct mq_policy *mq)

> +	struct entry *e = NULL;
>  
>  	if (mq->nr_entries_allocated >= mq->nr_entries) {
>  		BUG_ON(!list_empty(&mq->free));
>  		return NULL;
>  	}
>  
> -	e = list_entry(list_pop(&mq->free), struct entry, list);
> -	INIT_LIST_HEAD(&e->list);
> -	INIT_HLIST_NODE(&e->hlist);
> +	if (!list_empty(&mq->free)) {
> +		e = list_entry(list_pop(&mq->free), struct entry, list);
> +		INIT_LIST_HEAD(&e->list);
> +		INIT_HLIST_NODE(&e->hlist);
> +		mq->nr_entries_allocated++;
> +	}

In other words, under what circumstances is mq->nr_entries_allocated
less than mq->nr_entries, yet the mq->free list is empty?

Is it better to apply a patch like this, or rather to fix whatever situation
leads to those circumstances?  Has the bug/race always been there or is it a
regression?

Alasdair

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 09/24] dm cache metadata: check the metadata version when reading the superblock
  2013-10-24 18:30 ` [PATCH 09/24] dm cache metadata: check the metadata version when reading the superblock Mike Snitzer
@ 2013-10-25 17:07   ` Alasdair G Kergon
  2013-10-25 19:53     ` Mike Snitzer
  0 siblings, 1 reply; 49+ messages in thread
From: Alasdair G Kergon @ 2013-10-25 17:07 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Morgan Mears, Heinz Mauelshagen, dm-devel, Joe Thornber

On Thu, Oct 24, 2013 at 02:30:22PM -0400, Mike Snitzer wrote:
> From: Joe Thornber <ejt@redhat.com>

> +++ b/drivers/md/dm-cache-metadata.c
> @@ -20,7 +20,13 @@

> + * defines a range of metadata versions that this module can handle.

> +{
> +	uint32_t metadata_version = le32_to_cpu(disk_super->version);

+

> +	if (metadata_version < MIN_CACHE_VERSION || metadata_version > MAX_CACHE_VERSION) {
> +		DMERR("kernel does not support this version of cache metadata (%u)",
> +		      metadata_version);

- Also state the range supported?
   "Cache metadata version %u found, but only versions between %u and %u supported."

Alasdair
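
A sketch of the check with the suggested wording folded in; the function
name, the superblock struct name and the -EINVAL return are assumed from
context rather than taken from the posted patch:

static int check_metadata_version(struct cache_disk_superblock *disk_super)
{
	uint32_t metadata_version = le32_to_cpu(disk_super->version);

	if (metadata_version < MIN_CACHE_VERSION ||
	    metadata_version > MAX_CACHE_VERSION) {
		DMERR("Cache metadata version %u found, but only versions between %u and %u supported.",
		      metadata_version, MIN_CACHE_VERSION, MAX_CACHE_VERSION);
		return -EINVAL;
	}

	return 0;
}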

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 05/24] dm cache policy mq: implement writeback_work() and mq_{set, clear}_dirty()
  2013-10-25 16:06   ` Alasdair G Kergon
@ 2013-10-25 19:18     ` Mike Snitzer
  0 siblings, 0 replies; 49+ messages in thread
From: Mike Snitzer @ 2013-10-25 19:18 UTC (permalink / raw)
  To: dm-devel, Morgan Mears, Heinz Mauelshagen, Joe Thornber

On Fri, Oct 25 2013 at 12:06pm -0400,
Alasdair G Kergon <agk@redhat.com> wrote:

> On Thu, Oct 24, 2013 at 02:30:18PM -0400, Mike Snitzer wrote:
> > From: Joe Thornber <ejt@redhat.com>
>  
> > --- a/drivers/md/dm-cache-policy-mq.c
> 
> > -	 * We maintain two queues of entries.  The cache proper contains
> 
> > +	 * We maintain three queues of entries.  The cache proper,
> 
> Does the Documentation file need updating too?

will fix.

> : The multiqueue policy has two sets of 16 queues: one set for entries
> : waiting for the cache and another one for those in the cache.
> 
> > @@ -580,7 +585,16 @@ static void check_generation(struct mq_policy *mq)
> 
> Is the comment about this function no longer accurate?
> 
> : * At the moment the threshold is taken by averaging the hit counts of some
> : * of the entries in the cache (the first 20 entries of the first level).
> 
> - It looks at clean before dirty?

yes, that is new.  will fix.

> - It looks at more than just the first level if it needs to?

was always the case, so docs were already stale.  will fix.

> > +// FIXME: can __mq_set_clear_dirty block?
> 
> Still to check this before inclusion upstream?

yeah, Joe said he'd have a look.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 11/24] dm table: print error on preresume failure
  2013-10-24 18:30 ` [PATCH 11/24] dm table: print error on preresume failure Mike Snitzer
@ 2013-10-25 19:22   ` Alasdair G Kergon
  2013-10-25 19:58     ` Mike Snitzer
  0 siblings, 1 reply; 49+ messages in thread
From: Alasdair G Kergon @ 2013-10-25 19:22 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Morgan Mears, Heinz Mauelshagen, dm-devel, Joe Thornber

On Thu, Oct 24, 2013 at 02:30:24PM -0400, Mike Snitzer wrote:
> If preresume fails it is worth logging an error given that a device is
> left suspended due to the failure.
 
Indeed.

> This change was motivated by local preresume error logging that was
> added to the cache target.  Elevating this error logging to DM core
> makes sense.
 
But here I disagree: I think it makes more sense not to elevate this
logging, but to leave the responsibility for it with the targets
themselves.

> +			DMERR("%s: %s: preresume failed, error = %d",
> +			      dm_device_name(t->md), ti->type->name, r);

Elevated, you're only getting a single number displayed.

If the targets retain the responsibility, then you should get a
more-detailed error message such as:

                DMERR("aborting resume - crypt key is not set.");

                        DMERR("Unable to resume snapshot source until "
                              "handover completes.");

                        DMERR("Unable to perform snapshot handover until "
                              "source is suspended.");

                        DMERR("could not resize cache metadata");
                        DMERR("could not load cache mappings");
                        DMERR("could not load origin discards");

I think we should retain the requirement that each target must log
a meaningful message whenever preresume fails.

Alasdair

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 01/24] dm: nest targets used for testing under DM_TEST_TARGETS
  2013-10-24 23:17   ` Alasdair G Kergon
@ 2013-10-25 19:25     ` Mike Snitzer
  2013-10-25 19:29       ` Alasdair G Kergon
  0 siblings, 1 reply; 49+ messages in thread
From: Mike Snitzer @ 2013-10-25 19:25 UTC (permalink / raw)
  To: dm-devel, Morgan Mears, Heinz Mauelshagen, Joe Thornber

On Thu, Oct 24 2013 at  7:17pm -0400,
Alasdair G Kergon <agk@redhat.com> wrote:

> > --- a/drivers/md/Kconfig
> > +++ b/drivers/md/Kconfig
> > @@ -371,27 +371,12 @@ config DM_MULTIPATH_ST
>   	  
> > -config DM_DELAY
> > -	tristate "I/O delaying target"
> 
> > +config DM_DELAY
> > +	tristate "I/O delaying target (EXPERIMENTAL)"
> 
> Why has EXPERIMENTAL reappeared???
> See d57916a00fd749ccd354a7f754c2aba98f86d064
> 
> Please regenerate these patches against an up-to-date upstream base!

The base for these patches is v3.12-rc5, certainly recent enough.

What you took issue with here has nothing to do with an older base being
used for the patches.  It has everything to do with Joe having made this
change in his thin-dev branch quite a while ago -- when I rebased I
didn't change the text the patch moved.  Will remove the EXPERIMENTALs.

As for delay being useful for non-test use-cases, that may be, but
classifying it as a target that is used for testing is fair (hence
"Useful for testing." in the DM_DELAY help text).  That said, if you'd
really prefer DM_DELAY not be nested under DM_TEST_TARGETS I can revert
it.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 01/24] dm: nest targets used for testing under DM_TEST_TARGETS
  2013-10-25 19:25     ` Mike Snitzer
@ 2013-10-25 19:29       ` Alasdair G Kergon
  0 siblings, 0 replies; 49+ messages in thread
From: Alasdair G Kergon @ 2013-10-25 19:29 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Morgan Mears, Heinz Mauelshagen, dm-devel, Joe Thornber

On Fri, Oct 25, 2013 at 03:25:56PM -0400, Mike Snitzer wrote:
> As for delay being useful for non-test use-cases, that may be but
> classifying it as a target that is used for testing is fair (hence
> "Useful for testing." in the DM_DELAY help text).  That said, if you'd
> really prefer DM_DELAY not be nested under DM_TEST_TARGETS I can revert
> it.
 
I'm not objecting to its inclusion with the TEST TARGETS, as that remains
the primary use of it, but to the statement that the target is then meant
only for testing.

Alasdair

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 14/24] dm cache: use is_write_io() in more places
  2013-10-24 18:30 ` [PATCH 14/24] dm cache: use is_write_io() in more places Mike Snitzer
@ 2013-10-25 19:53   ` Alasdair G Kergon
  2013-10-25 20:11     ` Mike Snitzer
  0 siblings, 1 reply; 49+ messages in thread
From: Alasdair G Kergon @ 2013-10-25 19:53 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Jens Axboe, Morgan Mears, Heinz Mauelshagen, Joe Thornber,
	Christoph Hellwig, dm-devel

On Thu, Oct 24, 2013 at 02:30:27PM -0400, Mike Snitzer wrote:
> From: Heinz Mauelshagen <heinzm@redhat.com>
 
> Use is_write_io() consistently where it makes sense.
 
> +++ b/drivers/md/dm-cache-target.c

> +static bool is_write_io(struct bio *bio)
> +{
> +	return bio_data_dir(bio) == WRITE;
> +}

I'm not really convinced about this one, though I can see some superficial
attractiveness.  I think introducing it in a single file (rather than
kernel-wide) is obfuscating the code rather than making it more readable.

If a change like this were to happen, wouldn't fs.h be a better place?

Alasdair

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 09/24] dm cache metadata: check the metadata version when reading the superblock
  2013-10-25 17:07   ` Alasdair G Kergon
@ 2013-10-25 19:53     ` Mike Snitzer
  0 siblings, 0 replies; 49+ messages in thread
From: Mike Snitzer @ 2013-10-25 19:53 UTC (permalink / raw)
  To: dm-devel, Morgan Mears, Heinz Mauelshagen, Joe Thornber

On Fri, Oct 25 2013 at  1:07pm -0400,
Alasdair G Kergon <agk@redhat.com> wrote:

> On Thu, Oct 24, 2013 at 02:30:22PM -0400, Mike Snitzer wrote:
> > From: Joe Thornber <ejt@redhat.com>
> 
> > +++ b/drivers/md/dm-cache-metadata.c
> > @@ -20,7 +20,13 @@
> 
> > + * defines a range of metadata versions that this module can handle.
> 
> > +{
> > +	uint32_t metadata_version = le32_to_cpu(disk_super->version);
> 
> +
> 
> > +	if (metadata_version < MIN_CACHE_VERSION || metadata_version > MAX_CACHE_VERSION) {
> > +		DMERR("kernel does not support this version of cache metadata (%u)",
> > +		      metadata_version);
> 
> - Also state the range supported?
>    "Cache metadata version %u found, but only versions between %u and %u supported."

Done.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 11/24] dm table: print error on preresume failure
  2013-10-25 19:22   ` Alasdair G Kergon
@ 2013-10-25 19:58     ` Mike Snitzer
  0 siblings, 0 replies; 49+ messages in thread
From: Mike Snitzer @ 2013-10-25 19:58 UTC (permalink / raw)
  To: dm-devel, Morgan Mears, Heinz Mauelshagen, ejt

On Fri, Oct 25 2013 at  3:22pm -0400,
Alasdair G Kergon <agk@redhat.com> wrote:

> On Thu, Oct 24, 2013 at 02:30:24PM -0400, Mike Snitzer wrote:
> > If preresume fails it is worth logging an error given that a device is
> > left suspended due to the failure.
>  
> Indeed.
> 
> > This change was motivated by local preresume error logging that was
> > added to the cache target.  Elevating this error logging to DM core
> > makes sense.
>  
> But here I disagree: I think it makes more sense not to elevate this
> logging, but to leave the responsibility for it with the targets
> themselves.
> 
> > +			DMERR("%s: %s: preresume failed, error = %d",
> > +			      dm_device_name(t->md), ti->type->name, r);
> 
> Elevated, you're only getting a single number displayed.

"Elevated" was overstated; I'll revise the header.  I was trying to say
that the target shouldn't need to worry about stating that the failure
happens to be in the preresume method on the target.

> If the targets retain the responsibility, then you should get a
> more-detailed error message such as:
> 
>                 DMERR("aborting resume - crypt key is not set.");
> 
>                         DMERR("Unable to resume snapshot source until "
>                               "handover completes.");
> 
>                         DMERR("Unable to perform snapshot handover until "
>                               "source is suspended.");
> 
>                         DMERR("could not resize cache metadata");
>                         DMERR("could not load cache mappings");
>                         DMERR("could not load origin discards");
> 
> I think we should retain the requirement that each target must log
> a meaningful message whenever preresume fails.

Yeah, I agree.  I just want DM core to make it clear that the target
failure resulted in preresume failing.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 14/24] dm cache: use is_write_io() in more places
  2013-10-25 19:53   ` Alasdair G Kergon
@ 2013-10-25 20:11     ` Mike Snitzer
  0 siblings, 0 replies; 49+ messages in thread
From: Mike Snitzer @ 2013-10-25 20:11 UTC (permalink / raw)
  To: dm-devel, Morgan Mears, Heinz Mauelshagen, Joe Thornber,
	Jens Axboe, Christoph Hellwig

On Fri, Oct 25 2013 at  3:53pm -0400,
Alasdair G Kergon <agk@redhat.com> wrote:

> On Thu, Oct 24, 2013 at 02:30:27PM -0400, Mike Snitzer wrote:
> > From: Heinz Mauelshagen <heinzm@redhat.com>
>  
> > Use is_write_io() consistently where it makes sense.
>  
> > +++ b/drivers/md/dm-cache-target.c
> 
> > +static bool is_write_io(struct bio *bio)
> > +{
> > +	return bio_data_dir(bio) == WRITE;
> > +}
> 
> I'm not really convinced about this one, though I can see some superficial
> attractiveness.  I think introducing it in a single file (rather than
> kernel-wide) is obfuscating the code rather than making it more readable.
> 
> If a change like this were to happen, wouldn't fs.h be a better place?

is_write_io() was already in dm-cache-target.c, this patch just makes
use of it in more applicable places.

But I'm fine with completely eliminating is_write_io()... whatever
others would prefer.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 21/24] dm cache: add trc policy shim
  2013-10-24 18:30 ` [PATCH 21/24] dm cache: add trc " Mike Snitzer
@ 2013-10-25 20:13   ` Alasdair G Kergon
  2013-10-25 21:08     ` Mike Snitzer
  0 siblings, 1 reply; 49+ messages in thread
From: Alasdair G Kergon @ 2013-10-25 20:13 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Morgan Mears, Heinz Mauelshagen, dm-devel, Joe Thornber

On Thu, Oct 24, 2013 at 02:30:34PM -0400, Mike Snitzer wrote:
> From: Morgan Mears <morgan.mears@netapp.com>
 
> This commit includes a non-terminal policy (aka "shim") called trc that
> may be stacked on top of a terminal policy (e.g. mq).
> The second shim, trc, adds function-call-level tracing to the policy
> stack.  By default, an INFO-level log message including function name
> and parameter(s) will be generated whenever most of the cache policy
> interface functions are called.  An interface to increase or decrease
> the verbosity of the trace output is also provided.
 
Firstly, why not call it 'trace' in full, rather than abbreviating it to 3
consonants?

> +++ b/drivers/md/dm-cache-policy-trc.c

> +#define DM_TRC_OUT(lev, p, f, arg...) \
> +	do { \
> +		if (to_trc_policy(p)->trace_level >= lev) \
> +			DMINFO("%s: " f, __func__, ## arg); \
> +	} while (0)

OK for private debugging, but I can't pretend to be very keen on this
one going upstream in this form.   Might this not need to support high
volumes of messages sometimes?  Were other upstream mechanisms
considered?  (E.g. see Documentation/trace).

Alasdair

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 06/24] dm cache policy mq: return NULL if mq->free list is empty in alloc_entry
  2013-10-25 16:37   ` Alasdair G Kergon
@ 2013-10-25 20:44     ` Mike Snitzer
  2013-10-25 22:36       ` Heinz Mauelshagen
  0 siblings, 1 reply; 49+ messages in thread
From: Mike Snitzer @ 2013-10-25 20:44 UTC (permalink / raw)
  To: dm-devel, Morgan Mears, Heinz Mauelshagen, Joe Thornber

On Fri, Oct 25 2013 at 12:37pm -0400,
Alasdair G Kergon <agk@redhat.com> wrote:

> On Thu, Oct 24, 2013 at 02:30:19PM -0400, Mike Snitzer wrote:
> > From: Heinz Mauelshagen <heinzm@redhat.com>
> > Addresses callers' (insert_in*cache()) requirement that alloc_entry()
> > return NULL when an entry isn't able to be allocated.
>  
> What is the code path that leads to the requirement for this patch?
> 
> > +++ b/drivers/md/dm-cache-policy-mq.c
> >  static struct entry *alloc_entry(struct mq_policy *mq)
> 
> > +	struct entry *e = NULL;
> >  
> >  	if (mq->nr_entries_allocated >= mq->nr_entries) {
> >  		BUG_ON(!list_empty(&mq->free));
> >  		return NULL;
> >  	}
> >  
> > -	e = list_entry(list_pop(&mq->free), struct entry, list);
> > -	INIT_LIST_HEAD(&e->list);
> > -	INIT_HLIST_NODE(&e->hlist);
> > +	if (!list_empty(&mq->free)) {
> > +		e = list_entry(list_pop(&mq->free), struct entry, list);
> > +		INIT_LIST_HEAD(&e->list);
> > +		INIT_HLIST_NODE(&e->hlist);
> > +		mq->nr_entries_allocated++;
> > +	}
> 
> In other words, under what circumstances is mq->nr_entries_allocated
> less then mq->nr_entries, yet the mq->free list is empty?
> 
> Is it better to apply a patch like this, or rather to fix whatever situation
> leads to those circumstances?  Has the bug/race always been there or is it a
> regression?

Fair questions, Heinz should explain further.

IIRC this change was needed as a prereq for the conversion of his
out-of-tree "background" policy to a shim (layered ontop of mq).

So I will drop it for now and revisit if/when the need is clearer (e.g.
when the background policy goes upstream).  TBD if we need it in the
near-term; dropping the patch until more context from Heinz is provided.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 21/24] dm cache: add trc policy shim
  2013-10-25 20:13   ` Alasdair G Kergon
@ 2013-10-25 21:08     ` Mike Snitzer
  2013-10-25 22:44       ` Mike Snitzer
  0 siblings, 1 reply; 49+ messages in thread
From: Mike Snitzer @ 2013-10-25 21:08 UTC (permalink / raw)
  To: Alasdair G Kergon; +Cc: Morgan Mears, Heinz Mauelshagen, dm-devel, Joe Thornber

On Fri, Oct 25 2013 at  4:13pm -0400,
Alasdair G Kergon <agk@redhat.com> wrote:

> On Thu, Oct 24, 2013 at 02:30:34PM -0400, Mike Snitzer wrote:
> > From: Morgan Mears <morgan.mears@netapp.com>
>  
> > This commit includes a non-terminal policy (aka "shim") called trc that
> > may be stacked on top of a terminal policy (e.g. mq).
> > The second shim, trc, adds function-call-level tracing to the policy
> > stack.  By default, an INFO-level log message including function name
> > and parameter(s) will be generated whenever most of the cache policy
> > interface functions are called.  An interface to increase or decrease
> > the verbosity of the trace output is also provided.
>  
> Firstly, why not call it 'trace' in full, rather than abbreviating it to 3
> consonants?

We can easily rename.
 
> > +++ b/drivers/md/dm-cache-policy-trc.c
> 
> > +#define DM_TRC_OUT(lev, p, f, arg...) \
> > +	do { \
> > +		if (to_trc_policy(p)->trace_level >= lev) \
> > +			DMINFO("%s: " f, __func__, ## arg); \
> > +	} while (0)
> 
> OK for private debugging, but I can't pretend to be very keen on this
> one going upstream in this form.   Might this not need to support high
> volumes of messages sometimes?  Were other upstream mechanisms
> considered?  (E.g. see Documentation/trace).

Think this would take care of it (not full-blown tracepoints like
blktrace but certainly more performant than standard printk):

---
 drivers/md/dm-cache-policy-trc.c |    2 +-
 include/linux/device-mapper.h    |    3 +++
 2 files changed, 4 insertions(+), 1 deletion(-)

Index: linux/drivers/md/dm-cache-policy-trc.c
===================================================================
--- linux.orig/drivers/md/dm-cache-policy-trc.c
+++ linux/drivers/md/dm-cache-policy-trc.c
@@ -27,7 +27,7 @@
 #define DM_TRC_OUT(lev, p, f, arg...) \
 	do { \
 		if (to_trc_policy(p)->trace_level >= lev) \
-			DMINFO("%s: " f, __func__, ## arg); \
+			DMTRACE("%s: " f, __func__, ## arg); \
 	} while (0)
 
 enum dm_trace_lev_e {
Index: linux/include/linux/device-mapper.h
===================================================================
--- linux.orig/include/linux/device-mapper.h
+++ linux/include/linux/device-mapper.h
@@ -520,6 +520,9 @@ extern struct ratelimit_state dm_ratelim
 			       "\n", ## arg); \
 	} while (0)
 
+#define DMTRACE(f, arg...) \
+	trace_printk(DM_NAME ": " DM_MSG_PREFIX ": " f "\n", ## arg)
+
 #ifdef CONFIG_DM_DEBUG
 #  define DMDEBUG(f, arg...) \
 	printk(KERN_DEBUG DM_NAME ": " DM_MSG_PREFIX " DEBUG: " f "\n", ## arg)

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 06/24] dm cache policy mq: return NULL if mq->free list is empty in alloc_entry
  2013-10-25 20:44     ` Mike Snitzer
@ 2013-10-25 22:36       ` Heinz Mauelshagen
  0 siblings, 0 replies; 49+ messages in thread
From: Heinz Mauelshagen @ 2013-10-25 22:36 UTC (permalink / raw)
  To: dm-devel


On 10/25/2013 10:44 PM, Mike Snitzer wrote:
> On Fri, Oct 25 2013 at 12:37pm -0400,
> Alasdair G Kergon <agk@redhat.com> wrote:
>
>> On Thu, Oct 24, 2013 at 02:30:19PM -0400, Mike Snitzer wrote:
>>> From: Heinz Mauelshagen <heinzm@redhat.com>
>>> Addresses callers' (insert_in*cache()) requirement that alloc_entry()
>>> return NULL when an entry isn't able to be allocated.
>>   
>> What is the code path that leads to the requirement for this patch?
>>
>>> +++ b/drivers/md/dm-cache-policy-mq.c
>>>   static struct entry *alloc_entry(struct mq_policy *mq)
>>> +	struct entry *e = NULL;
>>>   
>>>   	if (mq->nr_entries_allocated >= mq->nr_entries) {
>>>   		BUG_ON(!list_empty(&mq->free));
>>>   		return NULL;
>>>   	}
>>>   
>>> -	e = list_entry(list_pop(&mq->free), struct entry, list);
>>> -	INIT_LIST_HEAD(&e->list);
>>> -	INIT_HLIST_NODE(&e->hlist);
>>> +	if (!list_empty(&mq->free)) {
>>> +		e = list_entry(list_pop(&mq->free), struct entry, list);
>>> +		INIT_LIST_HEAD(&e->list);
>>> +		INIT_HLIST_NODE(&e->hlist);
>>> +		mq->nr_entries_allocated++;
>>> +	}
>> In other words, under what circumstances is mq->nr_entries_allocated
>> less then mq->nr_entries, yet the mq->free list is empty?
>>
>> Is it better to apply a patch like this, or rather to fix whatever situation
>> leads to those circumstances?  Has the bug/race always been there or is it a
>> regression?
> Fair questions, Heinz should explain further.
>
> IIRC this change was needed as a prereq for the conversion of his
> out-of-tree "background" policy to a shim (layered ontop of mq).

Has nothing to do with that.

It is a fix for existing callers, which presume that alloc_entry() can
return NULL; the callee did not handle this case properly.


>
> So will drop for now, can revisit if/when the need is clearer (e.g. when
> background policy goes upstream).  TBD if we need it in the near-term,
> but will table this for now.  Dropping patch until more context from
> Heinz is provided.

Leave it in for the aforementioned reason.

Heinz

>
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
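
In other words, the callers already treat a NULL return as "cache is
full"; the bug is that alloc_entry() could reach list_pop() on an empty
free list instead of producing that NULL.  A minimal sketch of the
caller side of the contract (queue_entry() and the error code are
illustrative, not the mq code itself):

static int insert_in_cache_sketch(struct mq_policy *mq, dm_oblock_t oblock)
{
	struct entry *e = alloc_entry(mq);

	if (!e)			/* cache full: mq->free was empty */
		return -ENOSPC;	/* caller falls back, e.g. to the pre-cache */

	e->oblock = oblock;
	queue_entry(mq, e);	/* illustrative: add to the cache proper */
	return 0;
}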

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 21/24] dm cache: add trc policy shim
  2013-10-25 21:08     ` Mike Snitzer
@ 2013-10-25 22:44       ` Mike Snitzer
  2013-11-01 21:39         ` Steven Rostedt
  0 siblings, 1 reply; 49+ messages in thread
From: Mike Snitzer @ 2013-10-25 22:44 UTC (permalink / raw)
  To: Alasdair G Kergon; +Cc: Morgan Mears, Heinz Mauelshagen, dm-devel, Joe Thornber

On Fri, Oct 25 2013 at  5:08pm -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Fri, Oct 25 2013 at  4:13pm -0400,
> Alasdair G Kergon <agk@redhat.com> wrote:
> 
> > On Thu, Oct 24, 2013 at 02:30:34PM -0400, Mike Snitzer wrote:
> > > From: Morgan Mears <morgan.mears@netapp.com>
> >  
> > > This commit includes a non-terminal policy (aka "shim") called trc that
> > > may be stacked on top of a terminal policy (e.g. mq).
> > > The second shim, trc, adds function-call-level tracing to the policy
> > > stack.  By default, an INFO-level log message including function name
> > > and parameter(s) will be generated whenever most of the cache policy
> > > interface functions are called.  An interface to increase or decrease
> > > the verbosity of the trace output is also provided.
> >  
> > Firstly, why not call it 'trace' in full, rather than abbreviating it to 3
> > consonants?
> 
> We can easily rename.
>  
> > > +++ b/drivers/md/dm-cache-policy-trc.c
> > 
> > > +#define DM_TRC_OUT(lev, p, f, arg...) \
> > > +	do { \
> > > +		if (to_trc_policy(p)->trace_level >= lev) \
> > > +			DMINFO("%s: " f, __func__, ## arg); \
> > > +	} while (0)
> > 
> > OK for private debugging, but I can't pretend to be very keen on this
> > one going upstream in this form.   Might this not need to support high
> > volumes of messages sometimes?  Were other upstream mechanisms
> > considered?  (E.g. see Documentation/trace).
> 
> Think this would take care of it (not full-blown tracepoints like
> blktrace but certainly more performant than standard printk):

I folded this trace_printk change in and renamed from trc to trace, see:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=for-next&id=f6f5db50495b16ff23a0febf38f9ee9d964b12dd

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 06/24 v2] dm cache policy mq: return NULL from alloc_entry if cache is full
  2013-10-24 18:30 ` [PATCH 06/24] dm cache policy mq: return NULL if mq->free list is empty in alloc_entry Mike Snitzer
  2013-10-25 16:37   ` Alasdair G Kergon
@ 2013-10-29 14:49   ` Mike Snitzer
  1 sibling, 0 replies; 49+ messages in thread
From: Mike Snitzer @ 2013-10-29 14:49 UTC (permalink / raw)
  To: dm-devel; +Cc: Morgan Mears, Heinz Mauelshagen, Joe Thornber

From: Heinz Mauelshagen <heinzm@redhat.com>

alloc_entry() must return NULL if the cache is full (mq->free list is
empty).  To be clear, it is not a bug if the alloc_entry() logic tries
to allocate from the free list before it falls back to popping from the pre-cache.
The free list is simply a resource priority list that alloc_entry()
tries to claim a block from but sometimes cannot (e.g. cache is full).

This fix addresses callers' (insert_in*cache()) requirement that
alloc_entry() return NULL when an entry isn't able to be allocated.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # v3.9+
---
 drivers/md/dm-cache-policy-mq.c |   12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

Index: linux/drivers/md/dm-cache-policy-mq.c
===================================================================
--- linux.orig/drivers/md/dm-cache-policy-mq.c
+++ linux/drivers/md/dm-cache-policy-mq.c
@@ -415,18 +415,20 @@ static void hash_remove(struct entry *e)
  */
 static struct entry *alloc_entry(struct mq_policy *mq)
 {
-	struct entry *e;
+	struct entry *e = NULL;
 
 	if (mq->nr_entries_allocated >= mq->nr_entries) {
 		BUG_ON(!list_empty(&mq->free));
 		return NULL;
 	}
 
-	e = list_entry(list_pop(&mq->free), struct entry, list);
-	INIT_LIST_HEAD(&e->list);
-	INIT_HLIST_NODE(&e->hlist);
+	if (!list_empty(&mq->free)) {
+		e = list_entry(list_pop(&mq->free), struct entry, list);
+		INIT_LIST_HEAD(&e->list);
+		INIT_HLIST_NODE(&e->hlist);
+		mq->nr_entries_allocated++;
+	}
 
-	mq->nr_entries_allocated++;
 	return e;
 }
 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 21/24] dm cache: add trc policy shim
  2013-10-25 22:44       ` Mike Snitzer
@ 2013-11-01 21:39         ` Steven Rostedt
  2013-11-01 23:38           ` Mike Snitzer
  0 siblings, 1 reply; 49+ messages in thread
From: Steven Rostedt @ 2013-11-01 21:39 UTC (permalink / raw)
  To: device-mapper development
  Cc: Morgan Mears, Heinz Mauelshagen, Thornber, Joe, Alasdair G Kergon

On Fri, 2013-10-25 at 18:44 -0400, Mike Snitzer wrote:
> On Fri, Oct 25 2013 at  5:08pm -0400,
> Mike Snitzer <snitzer@redhat.com> wrote:
> 
> > On Fri, Oct 25 2013 at  4:13pm -0400,
> > Alasdair G Kergon <agk@redhat.com> wrote:
> > 
> > > On Thu, Oct 24, 2013 at 02:30:34PM -0400, Mike Snitzer wrote:
> > > > From: Morgan Mears <morgan.mears@netapp.com>
> > >  
> > > > This commit includes a non-terminal policy (aka "shim") called trc that
> > > > may be stacked on top of a terminal policy (e.g. mq).
> > > > The second shim, trc, adds function-call-level tracing to the policy
> > > > stack.  By default, an INFO-level log message including function name
> > > > and parameter(s) will be generated whenever most of the cache policy
> > > > interface functions are called.  An interface to increase or decrease
> > > > the verbosity of the trace output is also provided.
> > >  
> > > Firstly, why not call it 'trace' in full, rather than abbreviating it to 3
> > > consonants?
> > 
> > We can easily rename.
> >  
> > > > +++ b/drivers/md/dm-cache-policy-trc.c
> > > 
> > > > +#define DM_TRC_OUT(lev, p, f, arg...) \
> > > > +	do { \
> > > > +		if (to_trc_policy(p)->trace_level >= lev) \
> > > > +			DMINFO("%s: " f, __func__, ## arg); \
> > > > +	} while (0)
> > > 
> > > OK for private debugging, but I can't pretend to be very keen on this
> > > one going upstream in this form.   Might this not need to support high
> > > volumes of messages sometimes?  Were other upstream mechanisms
> > > considered?  (E.g. see Documentation/trace).
> > 
> > Think this would take care of it (not full-blown tracepoints like
> > blktrace but certainly more performant than standard printk):

What's wrong with full blown tracepoints?

> 
> I folded this trace_printk change in and renamed from trc to trace, see:
> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=for-next&id=f6f5db50495b16ff23a0febf38f9ee9d964b12dd
> 

Please please please DO NOT USE TRACE_PRINTK!

It's a debugging tool, and should never be used in the kernel proper. It
was meant for users to use it for debugging and then strip it out.

Now what you could do is set up tracepoints to record the information
you want, either in the functions themselves, or in the wrapper kept as
a separate policy. Not sure if that would be any help or not.

But unless this is something that is considered "debug use only" and
never put into a production kernel, do not use trace_printk().

Note, tracepoints will also give you the ability to pick and choose the
traces instead of doing it by trace_level.

Thanks,

-- Steve
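
For reference, the tracepoint route being suggested would look roughly
like the sketch below: a trace header defining one static event for a
policy call.  The event name, fields and file layout are illustrative,
not taken from the posted series.

#undef TRACE_SYSTEM
#define TRACE_SYSTEM dm_cache

#if !defined(_TRACE_DM_CACHE_H) || defined(TRACE_HEADER_MULTI_READ)
#define _TRACE_DM_CACHE_H

#include <linux/tracepoint.h>

TRACE_EVENT(dm_cache_policy_map,

	TP_PROTO(unsigned long long oblock, int result),

	TP_ARGS(oblock, result),

	TP_STRUCT__entry(
		__field(unsigned long long, oblock)
		__field(int, result)
	),

	TP_fast_assign(
		__entry->oblock = oblock;
		__entry->result = result;
	),

	TP_printk("oblock %llu result %d", __entry->oblock, __entry->result)
);

#endif /* _TRACE_DM_CACHE_H */

/* Must be outside the ifdef; set TRACE_INCLUDE_PATH/TRACE_INCLUDE_FILE
 * if the header does not live in include/trace/events/. */
#include <trace/define_trace.h>

Call sites would then invoke trace_dm_cache_policy_map(oblock, r), and
individual events can be enabled at runtime under the tracing debugfs
(events/dm_cache/) rather than through a module-wide trace_level.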

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 21/24] dm cache: add trc policy shim
  2013-11-01 21:39         ` Steven Rostedt
@ 2013-11-01 23:38           ` Mike Snitzer
  0 siblings, 0 replies; 49+ messages in thread
From: Mike Snitzer @ 2013-11-01 23:38 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Morgan Mears, Heinz Mauelshagen, Thornber,
	device-mapper development, Joe, Alasdair G Kergon

On Fri, Nov 01 2013 at  5:39pm -0400,
Steven Rostedt <srostedt@redhat.com> wrote:

> > > 
> > > Think this would take care of it (not full-blown tracepoints like
> > > blktrace but certainly more performant than standard printk):
> 
> What's wrong with full blown tracepoints?

Nothing.  Just wasn't as quick to switch to while keeping the intent of
the original implementation.

> > I folded this trace_printk change in and renamed from trc to trace, see:
> > https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=for-next&id=f6f5db50495b16ff23a0febf38f9ee9d964b12dd
> > 
> 
> Please please please DO NOT USE TRACE_PRINTK!

Sure, I'll have a look at using proper tracepoints.  Worst case, in the
near-term, we drop this type of patch for v3.13 considering how close we
are to merge.

> It's a debugging tool, and should never be used in the kernel proper. It
> was meant for users to use it for debugging and then strip it out.
> 
> Now what you could do is to set up tracepoints to record the information
> you want, either in the function itself, or add the wrapper function too
> as a separate policy. Not sure if that would be any help or not.
> 
> But unless this is something that is considered "debug use only" and
> never put into a production kernel, do not use trace_printk().

It really is debug only, but I cannot promise nobody will ever add a
trace layer in production -- especially if it is as easy as using
"trace+mq" vs "mq" when specifying the policy name.

> Note, tracepoints will also give you the ability to pick and choose the
> traces instead of doing it by trace_level.

Yeap, I'm aware.

Thanks for your feedback Steve.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 17/24] dm cache: use a boolean when setting cache->quiescing
  2013-10-24 18:30 ` [PATCH 17/24] dm cache: use a boolean when setting cache->quiescing Mike Snitzer
@ 2013-11-06 15:02   ` Alasdair G Kergon
  2013-11-06 15:25     ` Mike Snitzer
  0 siblings, 1 reply; 49+ messages in thread
From: Alasdair G Kergon @ 2013-11-06 15:02 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Morgan Mears, Heinz Mauelshagen, dm-devel, Joe Thornber

On Thu, Oct 24, 2013 at 02:30:30PM -0400, Mike Snitzer wrote:
> From: Heinz Mauelshagen <heinzm@redhat.com>
> 
> Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
> Signed-off-by: Mike Snitzer <snitzer@redhat.com>

Please could we drop this one?

It looks pointless to me, with this one following:
  dm cache: improve effciency of quiescing flag management

Alasdair

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 17/24] dm cache: use a boolean when setting cache->quiescing
  2013-11-06 15:02   ` Alasdair G Kergon
@ 2013-11-06 15:25     ` Mike Snitzer
  0 siblings, 0 replies; 49+ messages in thread
From: Mike Snitzer @ 2013-11-06 15:25 UTC (permalink / raw)
  To: dm-devel, Morgan Mears, Heinz Mauelshagen, Joe Thornber

On Wed, Nov 06 2013 at 10:02am -0500,
Alasdair G Kergon <agk@redhat.com> wrote:

> On Thu, Oct 24, 2013 at 02:30:30PM -0400, Mike Snitzer wrote:
> > From: Heinz Mauelshagen <heinzm@redhat.com>
> > 
> > Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
> > Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> 
> Please could we drop this one?
> 
> It looks pointless to me, with this one following:
  dm cache: improve efficiency of quiescing flag management

It was just a change for consistency that happened during the natural
flow of development.  Really didn't see a problem with preserving the
evolution of the code...

But sure, I can drop it.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 16/24] dm cache: log error message if dm_kcopyd_copy() fails
  2013-10-24 18:30 ` [PATCH 16/24] dm cache: log error message if dm_kcopyd_copy() fails Mike Snitzer
@ 2013-11-08 13:33   ` Alasdair G Kergon
  0 siblings, 0 replies; 49+ messages in thread
From: Alasdair G Kergon @ 2013-11-08 13:33 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Morgan Mears, Heinz Mauelshagen, dm-devel, Joe Thornber

On Thu, Oct 24, 2013 at 02:30:29PM -0400, Mike Snitzer wrote:
> A migration failure should be logged (albeit limited).
 
Does this *really* add anything useful?

migration_failure() already always logs a message, so you'll now get
two (throttled) messages instead of one every time.

Would it be better to log not just the type of failure but also some
details about the data block location?

And if you really need to know "issuing migration failed" then why not
just add the string to the existing log message?
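
Something like the following would cover both (a sketch only; the field
and helper names approximate what dm-cache-target.c already has):

DMERR_LIMIT("migration failed: oblock %llu -> cblock %llu",
	    (unsigned long long) from_oblock(mg->old_oblock),
	    (unsigned long long) from_cblock(mg->cblock));

i.e. a single throttled message carrying the failure type and the block
location, instead of two.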

Alasdair

end of thread

Thread overview: 49+ messages
2013-10-24 18:30 [PATCH 00/24] dm cache: proposed changes for v3.13 merge Mike Snitzer
2013-10-24 18:30 ` [PATCH 01/24] dm: nest targets used for testing under DM_TEST_TARGETS Mike Snitzer
2013-10-24 23:17   ` Alasdair G Kergon
2013-10-25 19:25     ` Mike Snitzer
2013-10-25 19:29       ` Alasdair G Kergon
2013-10-24 23:51   ` Alasdair G Kergon
2013-10-24 18:30 ` [PATCH 02/24] dm space map disk: optimise sm_disk_dec_block Mike Snitzer
2013-10-24 18:30 ` [PATCH 03/24] dm cache policy: remove return from void policy_remove_mapping Mike Snitzer
2013-10-24 18:30 ` [PATCH 04/24] dm cache policy mq: a few small fixes Mike Snitzer
2013-10-24 18:30 ` [PATCH 05/24] dm cache policy mq: implement writeback_work() and mq_{set, clear}_dirty() Mike Snitzer
2013-10-25 16:06   ` Alasdair G Kergon
2013-10-25 19:18     ` Mike Snitzer
2013-10-24 18:30 ` [PATCH 06/24] dm cache policy mq: return NULL if mq->free list is empty in alloc_entry Mike Snitzer
2013-10-25 16:37   ` Alasdair G Kergon
2013-10-25 20:44     ` Mike Snitzer
2013-10-25 22:36       ` Heinz Mauelshagen
2013-10-29 14:49   ` [PATCH 06/24 v2] dm cache policy mq: return NULL from alloc_entry if cache is full Mike Snitzer
2013-10-24 18:30 ` [PATCH 07/24] dm cache: be much more aggressive about promoting writes to discarded blocks Mike Snitzer
2013-10-24 18:30 ` [PATCH 08/24] dm cache metadata: return bool from __superblock_all_zeroes Mike Snitzer
2013-10-24 18:30 ` [PATCH 09/24] dm cache metadata: check the metadata version when reading the superblock Mike Snitzer
2013-10-25 17:07   ` Alasdair G Kergon
2013-10-25 19:53     ` Mike Snitzer
2013-10-24 18:30 ` [PATCH 10/24] dm cache policy: variable hints support Mike Snitzer
2013-10-24 18:30 ` [PATCH 11/24] dm table: print error on preresume failure Mike Snitzer
2013-10-25 19:22   ` Alasdair G Kergon
2013-10-25 19:58     ` Mike Snitzer
2013-10-24 18:30 ` [PATCH 12/24] dm cache: add passthrough mode Mike Snitzer
2013-10-24 18:30 ` [PATCH 13/24] dm cache policy: have policy_writeback_work return -ENODATA by default Mike Snitzer
2013-10-24 18:30 ` [PATCH 14/24] dm cache: use is_write_io() in more places Mike Snitzer
2013-10-25 19:53   ` Alasdair G Kergon
2013-10-25 20:11     ` Mike Snitzer
2013-10-24 18:30 ` [PATCH 15/24] dm cache: use cell_defer() boolean argument consistently Mike Snitzer
2013-10-24 18:30 ` [PATCH 16/24] dm cache: log error message if dm_kcopyd_copy() fails Mike Snitzer
2013-11-08 13:33   ` Alasdair G Kergon
2013-10-24 18:30 ` [PATCH 17/24] dm cache: use a boolean when setting cache->quiescing Mike Snitzer
2013-11-06 15:02   ` Alasdair G Kergon
2013-11-06 15:25     ` Mike Snitzer
2013-10-24 18:30 ` [PATCH 18/24] dm cache: optimize commit_if_needed Mike Snitzer
2013-10-24 18:30 ` [PATCH 19/24] dm cache: support for stackable caching policies Mike Snitzer
2013-10-24 18:30 ` [PATCH 20/24] dm cache: add era policy shim Mike Snitzer
2013-10-24 18:30 ` [PATCH 21/24] dm cache: add trc " Mike Snitzer
2013-10-25 20:13   ` Alasdair G Kergon
2013-10-25 21:08     ` Mike Snitzer
2013-10-25 22:44       ` Mike Snitzer
2013-11-01 21:39         ` Steven Rostedt
2013-11-01 23:38           ` Mike Snitzer
2013-10-24 18:30 ` [PATCH 22/24] dm cache: add hints policy Mike Snitzer
2013-10-24 18:30 ` [PATCH 23/24] dm cache: add cache block invalidation API Mike Snitzer
2013-10-24 18:30 ` [PATCH 24/24] dm cache policy era: add cache block invalidation support Mike Snitzer
