* [PATCH v2 4/4] libibverbs: Undo changes in memory range tree when madvise() fails
@ 2010-02-01  5:57 Alex Vainman
From: Alex Vainman @ 2010-02-01  5:57 UTC
  To: roland, Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	alexr-smomgflXvOZWk0Htik3J/w

ibv_madvise_range() doesn't clean up if madvise() fails.
This patch rolls back the changes made to the memory range tree
before the madvise() failure:

madvise() may fail on only one portion of the range the user asked to
modify. By that point ibv_madvise_range() may already have successfully
modified several tree nodes covering the sub-ranges that precede the
failing portion (this can happen when the user's range overlaps ranges
that were previously added to the memory tree). In that case it is not
enough to undo the split and merge operations performed on the node that
caused the failure; the function must also undo every change made to the
preceding ranges, from the start pointer up to the current location.
The patch reverts all of these changes by re-running the function itself
from the start pointer to the current location with inc negated (and the
advice inverted to match).
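The roll-back strategy can be illustrated with a toy model. What follows
is a hypothetical sketch, not the libibverbs code: a fixed array (ranges)
stands in for the memory range tree, nodes are never split or merged, and
fake_madvise() fails only on the range whose start equals fail_at. The
names apply(), madvise_range(), fake_madvise() and fail_at are invented
for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct ibv_mem_node: a fixed array of
 * page ranges, each with a fork-protection reference count. */
struct range {
	uintptr_t start, end;	/* inclusive bounds */
	int refcnt;
};

#define NRANGES 4

static struct range ranges[NRANGES] = {
	{ 0x0000, 0x0fff, 0 },
	{ 0x1000, 0x1fff, 1 },	/* already covered by an earlier call */
	{ 0x2000, 0x2fff, 0 },
	{ 0x3000, 0x3fff, 0 },
};

/* Hypothetical madvise() stand-in: fails only for the range starting
 * at fail_at, to simulate a failure partway through the request. */
static uintptr_t fail_at = 0x2000;

static int fake_madvise(uintptr_t start)
{
	return start == fail_at ? -1 : 0;
}

/* Apply inc (+1 ~ MADV_DONTFORK, -1 ~ MADV_DOFORK) to ranges
 * [first, last]. madvise() is only needed on the 0 -> 1 and 1 -> 0
 * refcnt transitions. Returns the index of the range whose madvise()
 * failed (its refcnt is left untouched), or -1 if all succeeded. */
static int apply(int first, int last, int inc)
{
	assert(inc == 1 || inc == -1);
	for (int i = first; i <= last; i++) {
		struct range *r = &ranges[i];

		if ((inc == 1 && r->refcnt == 0) ||
		    (inc == -1 && r->refcnt == 1)) {
			if (fake_madvise(r->start))
				return i;
		}
		r->refcnt += inc;
	}
	return -1;
}

/* On a partial failure, re-run over the ranges that were already
 * changed with inc negated, so they end up as they started.
 * A failure during the roll-back itself is not retried. */
static int madvise_range(int first, int last, int inc)
{
	int failed = apply(first, last, inc);

	if (failed < 0)
		return 0;
	if (failed > first)
		apply(first, failed - 1, -inc);
	return -1;
}
```

Starting from refcnts {0, 1, 0, 0}, madvise_range(0, 3, 1) fails on the
third range and returns -1 with the refcnts back at {0, 1, 0, 0}; without
the roll-back the first two ranges would be left at {1, 2}.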

Signed-off-by: Alex Vainman <alexv-smomgflXvOZWk0Htik3J/w@public.gmane.org>
---
 src/memory.c |   39 ++++++++++++++++++++++++++++++++++++++-
 1 files changed, 38 insertions(+), 1 deletions(-)

diff --git a/src/memory.c b/src/memory.c
index 4dd9bdd..876dced 100644
--- a/src/memory.c
+++ b/src/memory.c
@@ -522,12 +522,43 @@ static struct ibv_mem_node *undo_node(struct ibv_mem_node *node,
 	return node;
 }
 
+/*
+ * This function is called when madvise() fails.
+ * The node which caused madvise() to fail may cover
+ * just a sub-range of [start-end].
+ * So we need to undo all the successful changes (if any)
+ * already performed on the range [start - (node->prev)->end].
+ * This function finds the node to begin rescanning from,
+ * finds the end of the range to rescan and inverts
+ * the operation type.
+ */
+static struct ibv_mem_node *prepare_to_roll_back(struct ibv_mem_node *node,
+						 uintptr_t start,
+						 uintptr_t *end,
+						 int *inc,
+						 int *advice)
+{
+	struct ibv_mem_node *tmp = NULL;
+
+	*inc *= -1;
+	*advice = *inc == 1 ? MADV_DONTFORK : MADV_DOFORK;
+	tmp = __mm_prev(node);
+	node = NULL;
+	if (tmp) {
+		*end = tmp->end;
+		if (start <= *end)
+			node = get_start_node(start, *end, *inc);
+	}
+	return node;
+}
+
 static int ibv_madvise_range(void *base, size_t size, int advice)
 {
 	uintptr_t start, end;
 	struct ibv_mem_node *node, *tmp;
 	int inc;
 	int ret = 0;
+	int rolling_back = 0;
 
 	if (!size)
 		return 0;
@@ -576,7 +607,13 @@ static int ibv_madvise_range(void *base, size_t size, int advice)
 					      advice);
 			if (ret) {
 				node = undo_node(node, start, inc);
-				goto out;
+				if (rolling_back || !node)
+					goto out;
+
+				node = prepare_to_roll_back(node, start, &end,
+							    &inc, &advice);
+				rolling_back = 1;
+				continue;
 			}
 		}
 
-- 
1.6.5.3


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH v2 4/4] libibverbs: Undo changes in memory range tree when madvise() fails
@ 2010-03-19 18:15 Roland Dreier
From: Roland Dreier @ 2010-03-19 18:15 UTC
  To: alexv-smomgflXvOZWk0Htik3J/w
  Cc: roland, linux-rdma-u79uwXL29TY76Z2rM5mHXA, alexr-smomgflXvOZWk0Htik3J/w

 > +static struct ibv_mem_node *prepare_to_roll_back(struct ibv_mem_node *node,
 > +						 uintptr_t start,
 > +						 uintptr_t *end,
 > +						 int *inc,
 > +						 int *advice)
 > +{
 > +	struct ibv_mem_node *tmp = NULL;
 > +
 > +	*inc *= -1;
 > +	*advice = *inc == 1 ? MADV_DONTFORK : MADV_DOFORK;
 > +	tmp = __mm_prev(node);
 > +	node = NULL;
 > +	if (tmp) {
 > +		*end = tmp->end;
 > +		if (start <= *end)
 > +			node = get_start_node(start, *end, *inc);
 > +	}
 > +	return node;
 > +}

I don't think this really makes sense as a separate function.

You could slightly rearrange the start of ibv_madvise_range:

static int ibv_madvise_range(void *base, size_t size, int advice)
{
	uintptr_t start, end;
	struct ibv_mem_node *node, *tmp;
	int inc;
+	int rolling_back = 0;
	int ret = 0;

	if (!size)
		return 0;

+	start = (uintptr_t) base & ~(page_size - 1);
+	end   = ((uintptr_t) (base + size + page_size - 1) &
+		 ~(page_size - 1)) - 1;

+	pthread_mutex_lock(&mm_mutex);

+again:
	inc = advice == MADV_DONTFORK ? 1 : -1;

	node = get_start_node(start, end, inc);
	if (!node) {
		ret = -1;
		goto out;
	}

and then for the rollback part, do:

+				if (rolling_back || !node)
+					goto out;
+
+				/* madvise failed, roll back previous changes */
+				rolling_back = 1;
+				advice = advice == MADV_DONTFORK ? MADV_DOFORK : MADV_DONTFORK;
+				tmp = __mm_prev(node);
+				if (!tmp)
+					goto out;
+				end = tmp->end;
+				goto again;

(All this untested/uncompiled etc).
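The page-rounding step of the rearranged prologue can be checked on its
own with a small sketch. page_bounds() is a hypothetical helper, and
page_size is passed in as a parameter rather than taken from the
page_size variable the fragment uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round [base, base + size) outward to whole pages and return the
 * inclusive page-aligned bounds, as in the start/end computation of
 * the rearranged prologue. page_size must be a power of two. */
static void page_bounds(uintptr_t base, size_t size, uintptr_t page_size,
			uintptr_t *start, uintptr_t *end)
{
	assert(page_size && (page_size & (page_size - 1)) == 0);
	*start = base & ~(page_size - 1);
	*end = ((base + size + page_size - 1) & ~(page_size - 1)) - 1;
}
```

With 4 KiB pages, a 0x20-byte buffer at 0x1ff0 straddles the 0x2000
boundary, so it maps to [0x1000, 0x2fff]: two pages, with an inclusive
end. Casting base to uintptr_t before the addition avoids arithmetic on
void *, which is a GNU C extension.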
-- 
Roland Dreier  <rolandd-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
