All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] xenstore: support reading directory with many children
@ 2016-10-25  5:52 Juergen Gross
  2016-10-25  5:52 ` [PATCH 1/2] xenstore: add support for " Juergen Gross
  2016-10-25  5:52 ` [PATCH 2/2] xenstore: support XS_DIRECTORY_PART in libxenstore Juergen Gross
  0 siblings, 2 replies; 9+ messages in thread
From: Juergen Gross @ 2016-10-25  5:52 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, wei.liu2, andrew.cooper3, ian.jackson,
	david.vrabel, jbeulich

Reading the children list of a xenstore node with the length of that
list exceeding 4096 bytes is currently not possible. This can be a
problem for a large host with a huge number of domains as Xen tools
will no longer by capable to scan some directories of xenstore (e.g.
/local/domain).

This small patch series adds a new xs wire command to read a directory
in multiple chunks. libxenstore is modified in a compatible way to
show an unmodified result in case xenstored doesn't support the new
command.

The patch set has been verified to work by using the following shell script:

xenstore-write /test "test"

for i in `seq 100 500`
do
    xenstore-write /test/entry_with_very_long_name_$i $i
done

xenstore-ls
xenstore-rm /test


Juergen Gross (2):
  xenstore: add support for reading directory with many children
  xenstore: support XS_DIRECTORY_PART in libxenstore

 tools/xenstore/xenstored_core.c        |  4 ++
 tools/xenstore/xenstored_transaction.c | 57 ++++++++++++++++++++++++++
 tools/xenstore/xenstored_transaction.h |  1 +
 tools/xenstore/xs.c                    | 74 ++++++++++++++++++++++++++++++----
 xen/include/public/io/xs_wire.h        |  1 +
 5 files changed, 129 insertions(+), 8 deletions(-)

-- 
2.6.6


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/2] xenstore: add support for reading directory with many children
  2016-10-25  5:52 [PATCH 0/2] xenstore: support reading directory with many children Juergen Gross
@ 2016-10-25  5:52 ` Juergen Gross
  2016-10-25  9:06   ` Jan Beulich
       [not found]   ` <580F3CC902000078001195F3@suse.com>
  2016-10-25  5:52 ` [PATCH 2/2] xenstore: support XS_DIRECTORY_PART in libxenstore Juergen Gross
  1 sibling, 2 replies; 9+ messages in thread
From: Juergen Gross @ 2016-10-25  5:52 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, wei.liu2, andrew.cooper3, ian.jackson,
	david.vrabel, jbeulich

As the payload size for one xenstore wire command is limited to 4096
bytes it is impossible to read the children names of a node with a
large number of children (e.g. /local/domain in case of a host with
more than about 2000 domains). This effectively limits the maximum
number of domains a host can support.

In order to support such long directory outputs add a new wire command
XS_DIRECTORY_PART which will return only some entries in each call and
can be called in a loop to get all entries. For this to work reliably
the loop using XS_DIRECTORY_PART until no further entries are returned
must be in one transaction.

Using XS_DIRECTORY_PART with a valid path will start the transmission
by copying the list of children to a xenstored internal buffer linked
to the transaction. Further calls of XS_DIRECTORY_PART with a path "@"
will advance in the buffer. The end of the output is indicated by an
empty child name. The internal buffer is released when:

- the buffer is exhausted
- XS_DIRECTORY or XS_DIRECTORY_PART with a valid path is called
  (this will allocate a new buffer, of course)
- the transaction is terminated (explicit or implicit termination)

The number of entries returned for each call is implementation
specific. The only guarantee is that no call will exceed the limit of
4096 bytes returned.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 tools/xenstore/xenstored_core.c        |  4 +++
 tools/xenstore/xenstored_transaction.c | 57 ++++++++++++++++++++++++++++++++++
 tools/xenstore/xenstored_transaction.h |  1 +
 xen/include/public/io/xs_wire.h        |  1 +
 4 files changed, 63 insertions(+)

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 3df977b..9667ce5 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -1327,6 +1327,10 @@ static void process_message(struct connection *conn, struct buffered_data *in)
 		do_reset_watches(conn, in);
 		break;
 
+	case XS_DIRECTORY_PART:
+		send_directory_part(conn, in);
+		break;
+
 	default:
 		eprintf("Client unknown operation %i", in->hdr.msg.type);
 		send_error(conn, ENOSYS);
diff --git a/tools/xenstore/xenstored_transaction.c b/tools/xenstore/xenstored_transaction.c
index 34720fa..22a51da 100644
--- a/tools/xenstore/xenstored_transaction.c
+++ b/tools/xenstore/xenstored_transaction.c
@@ -79,6 +79,11 @@ struct transaction
 
 	/* List of changed domains - to record the changed domain entry number */
 	struct list_head changed_domains;
+
+	/* Temporary buffer for XS_DIRECTORY_PART. */
+	char *dirpart_buf;
+	unsigned buf_off;
+	unsigned buf_len;
 };
 
 extern int quota_max_transaction;
@@ -280,6 +285,58 @@ void conn_delete_all_transactions(struct connection *conn)
 	conn->transaction_started = 0;
 }
 
+void send_directory_part(struct connection *conn, struct buffered_data *in)
+{
+	struct transaction *trans = conn->transaction;
+	struct node *node;
+	const char *name = onearg(in);
+	unsigned len;
+
+	if (name == NULL || trans == NULL) {
+		send_error(conn, EINVAL);
+		return;
+	}
+
+	if (name[0] == '@' && name[1] == 0) {
+		if (trans->dirpart_buf == NULL) {
+			send_error(conn, EINVAL);
+			return;
+		}
+	} else {
+		if (trans->dirpart_buf) {
+			talloc_free(trans->dirpart_buf);
+			trans->dirpart_buf = NULL;
+		}
+
+		name = canonicalize(conn, name);
+		node = get_node(conn, in, name, XS_PERM_READ);
+		if (!node) {
+			send_error(conn, errno);
+			return;
+		}
+		trans->dirpart_buf = talloc_array(trans, char,
+						  node->childlen + 1);
+		memcpy(trans->dirpart_buf, node->children, node->childlen);
+		trans->dirpart_buf[node->childlen] = 0;
+		trans->buf_off = 0;
+		trans->buf_len = node->childlen + 1;
+	}
+
+	if (trans->buf_len - trans->buf_off > 1024)
+		len = strlen(trans->dirpart_buf + trans->buf_off) + 1;
+	else
+		len = trans->buf_len - trans->buf_off;
+
+	send_reply(conn, XS_DIRECTORY_PART, trans->dirpart_buf + trans->buf_off,
+		   len);
+
+	trans->buf_off += len;
+	if (trans->buf_off == trans->buf_len) {
+		talloc_free(trans->dirpart_buf);
+		trans->dirpart_buf = NULL;
+	}
+}
+
 /*
  * Local variables:
  *  c-file-style: "linux"
diff --git a/tools/xenstore/xenstored_transaction.h b/tools/xenstore/xenstored_transaction.h
index 0c868ee..7e9e187 100644
--- a/tools/xenstore/xenstored_transaction.h
+++ b/tools/xenstore/xenstored_transaction.h
@@ -38,5 +38,6 @@ void add_change_node(struct transaction *trans, const char *node,
 TDB_CONTEXT *tdb_transaction_context(struct transaction *trans);
 
 void conn_delete_all_transactions(struct connection *conn);
+void send_directory_part(struct connection *conn, struct buffered_data *in);
 
 #endif /* _XENSTORED_TRANSACTION_H */
diff --git a/xen/include/public/io/xs_wire.h b/xen/include/public/io/xs_wire.h
index 0a0cdbc..545f916 100644
--- a/xen/include/public/io/xs_wire.h
+++ b/xen/include/public/io/xs_wire.h
@@ -50,6 +50,7 @@ enum xsd_sockmsg_type
     XS_SET_TARGET,
     XS_RESTRICT,
     XS_RESET_WATCHES,
+    XS_DIRECTORY_PART,
 
     XS_INVALID = 0xffff /* Guaranteed to remain an invalid type */
 };
-- 
2.6.6


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/2] xenstore: support XS_DIRECTORY_PART in libxenstore
  2016-10-25  5:52 [PATCH 0/2] xenstore: support reading directory with many children Juergen Gross
  2016-10-25  5:52 ` [PATCH 1/2] xenstore: add support for " Juergen Gross
@ 2016-10-25  5:52 ` Juergen Gross
  1 sibling, 0 replies; 9+ messages in thread
From: Juergen Gross @ 2016-10-25  5:52 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, wei.liu2, andrew.cooper3, ian.jackson,
	david.vrabel, jbeulich

This will enable all users of libxenstore to handle xenstore nodes
with a huge amount of children.

In order to not depend completely on the XS_DIRECTORY_PART
functionality use it only in case of E2BIG returned by XS_DIRECTORY.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 tools/xenstore/xs.c | 74 +++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 66 insertions(+), 8 deletions(-)

diff --git a/tools/xenstore/xs.c b/tools/xenstore/xs.c
index d1e01ba..e64ded8 100644
--- a/tools/xenstore/xs.c
+++ b/tools/xenstore/xs.c
@@ -558,15 +558,10 @@ static bool xs_bool(char *reply)
 	return true;
 }
 
-char **xs_directory(struct xs_handle *h, xs_transaction_t t,
-		    const char *path, unsigned int *num)
+static char **xs_directory_common(char *strings, unsigned int len,
+				  unsigned int *num)
 {
-	char *strings, *p, **ret;
-	unsigned int len;
-
-	strings = xs_single(h, t, XS_DIRECTORY, path, &len);
-	if (!strings)
-		return NULL;
+	char *p, **ret;
 
 	/* Count the strings. */
 	*num = xs_count_strings(strings, len);
@@ -586,6 +581,69 @@ char **xs_directory(struct xs_handle *h, xs_transaction_t t,
 	return ret;
 }
 
+static char **xs_directory_part(struct xs_handle *h, xs_transaction_t t,
+				const char *path, unsigned int *num)
+{
+	char *strings, *p, **ret;
+	unsigned int len, p_len;
+	int saved_errno;
+
+	if (t == XBT_NULL) {
+		for (;;) {
+			t = xs_transaction_start(h);
+			if (t == XBT_NULL)
+				return NULL;
+
+			ret = xs_directory_part(h, t, path, num);
+			saved_errno = errno;
+
+			if (xs_transaction_end(h, t, false) || ret == NULL) {
+				errno = saved_errno;
+				return ret;
+			}
+
+			free(ret);
+		}
+	}
+
+	strings = xs_single(h, t, XS_DIRECTORY_PART, path, &len);
+	if (!strings) {
+		/* If XS_DIRECTORY_PART is not supported return E2BIG. */
+		if (errno == ENOSYS)
+			errno = E2BIG;
+		return NULL;
+	}
+
+	while (len > 1 && strings[len - 2] != 0) {
+		p = xs_single(h, t, XS_DIRECTORY_PART, "@", &p_len);
+		if (!p) {
+			free_no_errno(strings);
+			return NULL;
+		}
+		strings = realloc(strings, len + p_len);
+		memcpy(strings + len, p, p_len);
+		len += p_len;
+	}
+
+	return xs_directory_common(strings, len - 1, num);
+}
+
+char **xs_directory(struct xs_handle *h, xs_transaction_t t,
+		    const char *path, unsigned int *num)
+{
+	char *strings;
+	unsigned int len;
+
+	strings = xs_single(h, t, XS_DIRECTORY, path, &len);
+	if (!strings) {
+		if (errno != E2BIG)
+			return NULL;
+		return xs_directory_part(h, t, path, num);
+	}
+
+	return xs_directory_common(strings, len, num);
+}
+
 /* Get the value of a single file, nul terminated.
  * Returns a malloced value: call free() on it after use.
  * len indicates length in bytes, not including the nul.
-- 
2.6.6


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 1/2] xenstore: add support for reading directory with many children
  2016-10-25  5:52 ` [PATCH 1/2] xenstore: add support for " Juergen Gross
@ 2016-10-25  9:06   ` Jan Beulich
       [not found]   ` <580F3CC902000078001195F3@suse.com>
  1 sibling, 0 replies; 9+ messages in thread
From: Jan Beulich @ 2016-10-25  9:06 UTC (permalink / raw)
  To: Juergen Gross
  Cc: ian.jackson, andrew.cooper3, wei.liu2, david.vrabel, xen-devel

>>> On 25.10.16 at 07:52, <JGross@suse.com> wrote:
> --- a/tools/xenstore/xenstored_transaction.c
> +++ b/tools/xenstore/xenstored_transaction.c
> @@ -79,6 +79,11 @@ struct transaction
>  
>  	/* List of changed domains - to record the changed domain entry number */
>  	struct list_head changed_domains;
> +
> +	/* Temporary buffer for XS_DIRECTORY_PART. */
> +	char *dirpart_buf;
> +	unsigned buf_off;
> +	unsigned buf_len;
>  };

The buffer you allocate for this can - by nature of this change - be
huge, and there's one per transaction. Isn't this causing a resource
utilization issue?

> @@ -280,6 +285,58 @@ void conn_delete_all_transactions(struct connection *conn)
>  	conn->transaction_started = 0;
>  }
>  
> +void send_directory_part(struct connection *conn, struct buffered_data *in)
> +{
> +	struct transaction *trans = conn->transaction;
> +	struct node *node;
> +	const char *name = onearg(in);
> +	unsigned len;
> +
> +	if (name == NULL || trans == NULL) {
> +		send_error(conn, EINVAL);
> +		return;
> +	}
> +
> +	if (name[0] == '@' && name[1] == 0) {
> +		if (trans->dirpart_buf == NULL) {
> +			send_error(conn, EINVAL);
> +			return;
> +		}
> +	} else {
> +		if (trans->dirpart_buf) {
> +			talloc_free(trans->dirpart_buf);
> +			trans->dirpart_buf = NULL;
> +		}
> +
> +		name = canonicalize(conn, name);
> +		node = get_node(conn, in, name, XS_PERM_READ);
> +		if (!node) {
> +			send_error(conn, errno);
> +			return;
> +		}
> +		trans->dirpart_buf = talloc_array(trans, char,
> +						  node->childlen + 1);

Once again, especially considering the buffer possibly being huge,
shouldn't you check for allocation failure here?

> +		memcpy(trans->dirpart_buf, node->children, node->childlen);
> +		trans->dirpart_buf[node->childlen] = 0;
> +		trans->buf_off = 0;
> +		trans->buf_len = node->childlen + 1;
> +	}
> +
> +	if (trans->buf_len - trans->buf_off > 1024)

What has this 1024 been derived from? In particular, why is it not
(close to) the 4096 limit mentioned in the description?

> +		len = strlen(trans->dirpart_buf + trans->buf_off) + 1;

Looking at construct_node() I'm getting the impression that there
are embedded nul characters in the ->children array, in which case
this strlen() would likely chop off values which could still fit, requiring
needlessly many iterations. And the explicit nul termination after the
allocation above would then also appear to be unnecessary.

And then - why did you put this function here instead of in
xenstored_core.c, e.g. next to send_directory()?

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] xenstore: add support for reading directory with many children
       [not found]   ` <580F3CC902000078001195F3@suse.com>
@ 2016-10-25 11:41     ` Juergen Gross
  2016-10-25 13:20       ` Jan Beulich
       [not found]       ` <580F785502000078001197C7@suse.com>
  0 siblings, 2 replies; 9+ messages in thread
From: Juergen Gross @ 2016-10-25 11:41 UTC (permalink / raw)
  To: Jan Beulich
  Cc: ian.jackson, andrew.cooper3, wei.liu2, david.vrabel, xen-devel

On 25/10/16 11:06, Jan Beulich wrote:
>>>> On 25.10.16 at 07:52, <JGross@suse.com> wrote:
>> --- a/tools/xenstore/xenstored_transaction.c
>> +++ b/tools/xenstore/xenstored_transaction.c
>> @@ -79,6 +79,11 @@ struct transaction
>>  
>>  	/* List of changed domains - to record the changed domain entry number */
>>  	struct list_head changed_domains;
>> +
>> +	/* Temporary buffer for XS_DIRECTORY_PART. */
>> +	char *dirpart_buf;
>> +	unsigned buf_off;
>> +	unsigned buf_len;
>>  };
> 
> The buffer you allocate for this can - by nature of this change - be
> huge, and there's one per transaction. Isn't this causing a resource
> utilization issue?

It will be allocated only when needed and its size is less than that of
the node to be read.

>> @@ -280,6 +285,58 @@ void conn_delete_all_transactions(struct connection *conn)
>>  	conn->transaction_started = 0;
>>  }
>>  
>> +void send_directory_part(struct connection *conn, struct buffered_data *in)
>> +{
>> +	struct transaction *trans = conn->transaction;
>> +	struct node *node;
>> +	const char *name = onearg(in);
>> +	unsigned len;
>> +
>> +	if (name == NULL || trans == NULL) {
>> +		send_error(conn, EINVAL);
>> +		return;
>> +	}
>> +
>> +	if (name[0] == '@' && name[1] == 0) {
>> +		if (trans->dirpart_buf == NULL) {
>> +			send_error(conn, EINVAL);
>> +			return;
>> +		}
>> +	} else {
>> +		if (trans->dirpart_buf) {
>> +			talloc_free(trans->dirpart_buf);
>> +			trans->dirpart_buf = NULL;
>> +		}
>> +
>> +		name = canonicalize(conn, name);
>> +		node = get_node(conn, in, name, XS_PERM_READ);
>> +		if (!node) {
>> +			send_error(conn, errno);
>> +			return;
>> +		}
>> +		trans->dirpart_buf = talloc_array(trans, char,
>> +						  node->childlen + 1);
> 
> Once again, especially considering the buffer possibly being huge,
> shouldn't you check for allocation failure here?

I followed coding style from xenstored. Modifying this style should be
another patch set IMHO (I'd be fine doing this).

> 
>> +		memcpy(trans->dirpart_buf, node->children, node->childlen);
>> +		trans->dirpart_buf[node->childlen] = 0;
>> +		trans->buf_off = 0;
>> +		trans->buf_len = node->childlen + 1;
>> +	}
>> +
>> +	if (trans->buf_len - trans->buf_off > 1024)
> 
> What has this 1024 been derived from? In particular, why is it not
> (close to) the 4096 limit mentioned in the description?

In theory a single entry could be up to 2048 bytes long. I wanted to
keep the logic simple by not trying to iterate until the limit is
reached. I can change that, of course.

> 
>> +		len = strlen(trans->dirpart_buf + trans->buf_off) + 1;
> 
> Looking at construct_node() I'm getting the impression that there
> are embedded nul characters in the ->children array, in which case
> this strlen() would likely chop off values which could still fit, requiring
> needlessly many iterations. And the explicit nul termination after the
> allocation above would then also appear to be unnecessary.

Each children member is a nul terminated string, so I need to keep
the nul bytes in the result. And the final nul byte is the indicator
for the last entry being reached.

> And then - why did you put this function here instead of in
> xenstored_core.c, e.g. next to send_directory()?

It is using the xenstored_transaction.c private struct transaction.
Putting it in xenstored_core.c would have required to move the
struct definition into a header file.


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] xenstore: add support for reading directory with many children
  2016-10-25 11:41     ` Juergen Gross
@ 2016-10-25 13:20       ` Jan Beulich
       [not found]       ` <580F785502000078001197C7@suse.com>
  1 sibling, 0 replies; 9+ messages in thread
From: Jan Beulich @ 2016-10-25 13:20 UTC (permalink / raw)
  To: Juergen Gross
  Cc: ian.jackson, andrew.cooper3, wei.liu2, david.vrabel, xen-devel

>>> On 25.10.16 at 13:41, <JGross@suse.com> wrote:
> On 25/10/16 11:06, Jan Beulich wrote:
>>>>> On 25.10.16 at 07:52, <JGross@suse.com> wrote:
>>> --- a/tools/xenstore/xenstored_transaction.c
>>> +++ b/tools/xenstore/xenstored_transaction.c
>>> @@ -79,6 +79,11 @@ struct transaction
>>>  
>>>  	/* List of changed domains - to record the changed domain entry number */
>>>  	struct list_head changed_domains;
>>> +
>>> +	/* Temporary buffer for XS_DIRECTORY_PART. */
>>> +	char *dirpart_buf;
>>> +	unsigned buf_off;
>>> +	unsigned buf_len;
>>>  };
>> 
>> The buffer you allocate for this can - by nature of this change - be
>> huge, and there's one per transaction. Isn't this causing a resource
>> utilization issue?
> 
> It will be allocated only when needed and its size is less than that of
> the node to be read.

That's not addressing my concern. A DomU could have multiple
transactions each with such an operation in progress. And the
space here doesn't get accounted against any of the quota.

>>> @@ -280,6 +285,58 @@ void conn_delete_all_transactions(struct connection *conn)
>>>  	conn->transaction_started = 0;
>>>  }
>>>  
>>> +void send_directory_part(struct connection *conn, struct buffered_data *in)
>>> +{
>>> +	struct transaction *trans = conn->transaction;
>>> +	struct node *node;
>>> +	const char *name = onearg(in);
>>> +	unsigned len;
>>> +
>>> +	if (name == NULL || trans == NULL) {
>>> +		send_error(conn, EINVAL);
>>> +		return;
>>> +	}
>>> +
>>> +	if (name[0] == '@' && name[1] == 0) {
>>> +		if (trans->dirpart_buf == NULL) {
>>> +			send_error(conn, EINVAL);
>>> +			return;
>>> +		}
>>> +	} else {
>>> +		if (trans->dirpart_buf) {
>>> +			talloc_free(trans->dirpart_buf);
>>> +			trans->dirpart_buf = NULL;
>>> +		}
>>> +
>>> +		name = canonicalize(conn, name);
>>> +		node = get_node(conn, in, name, XS_PERM_READ);
>>> +		if (!node) {
>>> +			send_error(conn, errno);
>>> +			return;
>>> +		}
>>> +		trans->dirpart_buf = talloc_array(trans, char,
>>> +						  node->childlen + 1);
>> 
>> Once again, especially considering the buffer possibly being huge,
>> shouldn't you check for allocation failure here?
> 
> I followed coding style from xenstored. Modifying this style should be
> another patch set IMHO (I'd be fine doing this).

Cloning buggy code is never a good idea imo.

>>> +		memcpy(trans->dirpart_buf, node->children, node->childlen);
>>> +		trans->dirpart_buf[node->childlen] = 0;
>>> +		trans->buf_off = 0;
>>> +		trans->buf_len = node->childlen + 1;
>>> +	}
>>> +
>>> +	if (trans->buf_len - trans->buf_off > 1024)
>> 
>> What has this 1024 been derived from? In particular, why is it not
>> (close to) the 4096 limit mentioned in the description?
> 
> In theory a single entry could be up to 2048 bytes long. I wanted to
> keep the logic simple by not trying to iterate until the limit is
> reached. I can change that, of course.

While I think that would be worthwhile, I don't think this addresses
my comment: You don't really explain where the 1024 is coming from.
Instead to me your response looks to be related to the following
point.

>>> +		len = strlen(trans->dirpart_buf + trans->buf_off) + 1;
>> 
>> Looking at construct_node() I'm getting the impression that there
>> are embedded nul characters in the ->children array, in which case
>> this strlen() would likely chop off values which could still fit, requiring
>> needlessly many iterations. And the explicit nul termination after the
>> allocation above would then also appear to be unnecessary.
> 
> Each children member is a nul terminated string, so I need to keep
> the nul bytes in the result. And the final nul byte is the indicator
> for the last entry being reached.

Right, but this means you transfer only a single entry here. That was
the point of my original concern.

>> And then - why did you put this function here instead of in
>> xenstored_core.c, e.g. next to send_directory()?
> 
> It is using the xenstored_transaction.c private struct transaction.
> Putting it in xenstored_core.c would have required to move the
> struct definition into a header file.

Yeah, I've realized this after having sent the reply, but then again
this also relates to the first question above (whether this should be
a per-transaction item in the first place). In particular I don't think
this being per transaction helps with allowing the guest to (easily)
run multiple such queries in parallel, as commonly simple operations
get run with null transactions.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] xenstore: add support for reading directory with many children
       [not found]       ` <580F785502000078001197C7@suse.com>
@ 2016-10-25 13:47         ` Juergen Gross
  2016-10-25 14:02           ` Jan Beulich
       [not found]           ` <580F820E0200007800119823@suse.com>
  0 siblings, 2 replies; 9+ messages in thread
From: Juergen Gross @ 2016-10-25 13:47 UTC (permalink / raw)
  To: Jan Beulich
  Cc: ian.jackson, andrew.cooper3, wei.liu2, david.vrabel, xen-devel

On 25/10/16 15:20, Jan Beulich wrote:
>>>> On 25.10.16 at 13:41, <JGross@suse.com> wrote:
>> On 25/10/16 11:06, Jan Beulich wrote:
>>>>>> On 25.10.16 at 07:52, <JGross@suse.com> wrote:
>>>> --- a/tools/xenstore/xenstored_transaction.c
>>>> +++ b/tools/xenstore/xenstored_transaction.c
>>>> @@ -79,6 +79,11 @@ struct transaction
>>>>  
>>>>  	/* List of changed domains - to record the changed domain entry number */
>>>>  	struct list_head changed_domains;
>>>> +
>>>> +	/* Temporary buffer for XS_DIRECTORY_PART. */
>>>> +	char *dirpart_buf;
>>>> +	unsigned buf_off;
>>>> +	unsigned buf_len;
>>>>  };
>>>
>>> The buffer you allocate for this can - by nature of this change - be
>>> huge, and there's one per transaction. Isn't this causing a resource
>>> utilization issue?
>>
>> It will be allocated only when needed and its size is less than that of
>> the node to be read.
> 
> That's not addressing my concern. A DomU could have multiple
> transactions each with such an operation in progress. And the
> space here doesn't get accounted against any of the quota.

It is accounted against the maximum number of transactions.

Additionally the buffer will be kept only if not all of the data
could be transferred in the first iteration of the read.

> 
>>>> @@ -280,6 +285,58 @@ void conn_delete_all_transactions(struct connection *conn)
>>>>  	conn->transaction_started = 0;
>>>>  }
>>>>  
>>>> +void send_directory_part(struct connection *conn, struct buffered_data *in)
>>>> +{
>>>> +	struct transaction *trans = conn->transaction;
>>>> +	struct node *node;
>>>> +	const char *name = onearg(in);
>>>> +	unsigned len;
>>>> +
>>>> +	if (name == NULL || trans == NULL) {
>>>> +		send_error(conn, EINVAL);
>>>> +		return;
>>>> +	}
>>>> +
>>>> +	if (name[0] == '@' && name[1] == 0) {
>>>> +		if (trans->dirpart_buf == NULL) {
>>>> +			send_error(conn, EINVAL);
>>>> +			return;
>>>> +		}
>>>> +	} else {
>>>> +		if (trans->dirpart_buf) {
>>>> +			talloc_free(trans->dirpart_buf);
>>>> +			trans->dirpart_buf = NULL;
>>>> +		}
>>>> +
>>>> +		name = canonicalize(conn, name);
>>>> +		node = get_node(conn, in, name, XS_PERM_READ);
>>>> +		if (!node) {
>>>> +			send_error(conn, errno);
>>>> +			return;
>>>> +		}
>>>> +		trans->dirpart_buf = talloc_array(trans, char,
>>>> +						  node->childlen + 1);
>>>
>>> Once again, especially considering the buffer possibly being huge,
>>> shouldn't you check for allocation failure here?
>>
>> I followed coding style from xenstored. Modifying this style should be
>> another patch set IMHO (I'd be fine doing this).
> 
> Cloning buggy code is never a good idea imo.
> 
>>>> +		memcpy(trans->dirpart_buf, node->children, node->childlen);
>>>> +		trans->dirpart_buf[node->childlen] = 0;
>>>> +		trans->buf_off = 0;
>>>> +		trans->buf_len = node->childlen + 1;
>>>> +	}
>>>> +
>>>> +	if (trans->buf_len - trans->buf_off > 1024)
>>>
>>> What has this 1024 been derived from? In particular, why is it not
>>> (close to) the 4096 limit mentioned in the description?
>>
>> In theory a single entry could be up to 2048 bytes long. I wanted to
>> keep the logic simple by not trying to iterate until the limit is
>> reached. I can change that, of course.
> 
> While I think that would be worthwhile, I don't think this addresses
> my comment: You don't really explain where the 1024 is coming from.
> Instead to me your response looks to be related to the following
> point.

The 1024 is some arbitrary choice. I'll change it with the iteration
towards the allowed maximum added.

> 
>>>> +		len = strlen(trans->dirpart_buf + trans->buf_off) + 1;
>>>
>>> Looking at construct_node() I'm getting the impression that there
>>> are embedded nul characters in the ->children array, in which case
>>> this strlen() would likely chop off values which could still fit, requiring
>>> needlessly many iterations. And the explicit nul termination after the
>>> allocation above would then also appear to be unnecessary.
>>
>> Each children member is a nul terminated string, so I need to keep
>> the nul bytes in the result. And the final nul byte is the indicator
>> for the last entry being reached.
> 
> Right, but this means you transfer only a single entry here. That was
> the point of my original concern.

And only now I realize that you are right. I meant to transfer at
least 1024 bytes but didn't do so. I'll correct this.

> 
>>> And then - why did you put this function here instead of in
>>> xenstored_core.c, e.g. next to send_directory()?
>>
>> It is using the xenstored_transaction.c private struct transaction.
>> Putting it in xenstored_core.c would have required to move the
>> struct definition into a header file.
> 
> Yeah, I've realized this after having sent the reply, but then again
> this also relates to the first question above (whether this should be
> a per-transaction item in the first place). In particular I don't think
> this being per transaction helps with allowing the guest to (easily)
> run multiple such queries in parallel, as commonly simple operations
> get run with null transactions.

The XS_DIRECTORY_PART command is valid in a transaction only. This is
a prerequisite to be able to split the operation into multiple pieces
while keeping the consistency.


Juergen


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] xenstore: add support for reading directory with many children
  2016-10-25 13:47         ` Juergen Gross
@ 2016-10-25 14:02           ` Jan Beulich
       [not found]           ` <580F820E0200007800119823@suse.com>
  1 sibling, 0 replies; 9+ messages in thread
From: Jan Beulich @ 2016-10-25 14:02 UTC (permalink / raw)
  To: Juergen Gross
  Cc: ian.jackson, andrew.cooper3, wei.liu2, david.vrabel, xen-devel

>>> On 25.10.16 at 15:47, <JGross@suse.com> wrote:
> On 25/10/16 15:20, Jan Beulich wrote:
>>>>> On 25.10.16 at 13:41, <JGross@suse.com> wrote:
>>> On 25/10/16 11:06, Jan Beulich wrote:
>>>>>>> On 25.10.16 at 07:52, <JGross@suse.com> wrote:
>>>>> --- a/tools/xenstore/xenstored_transaction.c
>>>>> +++ b/tools/xenstore/xenstored_transaction.c
>>>>> @@ -79,6 +79,11 @@ struct transaction
>>>>>  
>>>>>  	/* List of changed domains - to record the changed domain entry number */
>>>>>  	struct list_head changed_domains;
>>>>> +
>>>>> +	/* Temporary buffer for XS_DIRECTORY_PART. */
>>>>> +	char *dirpart_buf;
>>>>> +	unsigned buf_off;
>>>>> +	unsigned buf_len;
>>>>>  };
>>>>
>>>> The buffer you allocate for this can - by nature of this change - be
>>>> huge, and there's one per transaction. Isn't this causing a resource
>>>> utilization issue?
>>>
>>> It will be allocated only when needed and its size is less than that of
>>> the node to be read.
>> 
>> That's not addressing my concern. A DomU could have multiple
>> transactions each with such an operation in progress. And the
>> space here doesn't get accounted against any of the quota.
> 
> It is accounted against the maximum number of transactions.
> 
> Additionally the buffer will be kept only if not all of the data
> could be transferred in the first iteration of the read.

Which nevertheless means an ill behaved guest could (if its quota
allow) create a node with sufficiently many children and then start
as many transactions as it can, and do a partial directory listing
over each of them. Perhaps, considering that you mainly need this
for Dom0, the new verb could be restricted to the control domain
for now?

>>>> And then - why did you put this function here instead of in
>>>> xenstored_core.c, e.g. next to send_directory()?
>>>
>>> It is using the xenstored_transaction.c private struct transaction.
>>> Putting it in xenstored_core.c would have required to move the
>>> struct definition into a header file.
>> 
>> Yeah, I've realized this after having sent the reply, but then again
>> this also relates to the first question above (whether this should be
>> a per-transaction item in the first place). In particular I don't think
>> this being per transaction helps with allowing the guest to (easily)
>> run multiple such queries in parallel, as commonly simple operations
>> get run with null transactions.
> 
> The XS_DIRECTORY_PART command is valid in a transaction only. This is
> a prerequisite to be able to split the operation into multiple pieces
> while keeping the consistency.

But I'm sure other models could be come up with, ideally without
requiring meaningful amounts of temporary storage. For instance
nodes could be versioned, and if the version changed between
iterations, the caller could be signaled to start over. And it looks
like such a version wouldn't even need to be bumped for child
additions, but only for child removals.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] xenstore: add support for reading directory with many children
       [not found]           ` <580F820E0200007800119823@suse.com>
@ 2016-10-25 14:48             ` Juergen Gross
  0 siblings, 0 replies; 9+ messages in thread
From: Juergen Gross @ 2016-10-25 14:48 UTC (permalink / raw)
  To: Jan Beulich
  Cc: ian.jackson, andrew.cooper3, wei.liu2, david.vrabel, xen-devel

On 25/10/16 16:02, Jan Beulich wrote:
>>>> On 25.10.16 at 15:47, <JGross@suse.com> wrote:
>> On 25/10/16 15:20, Jan Beulich wrote:
>>>>>> On 25.10.16 at 13:41, <JGross@suse.com> wrote:
>>>> On 25/10/16 11:06, Jan Beulich wrote:
>>>>>>>> On 25.10.16 at 07:52, <JGross@suse.com> wrote:
>>>>>> --- a/tools/xenstore/xenstored_transaction.c
>>>>>> +++ b/tools/xenstore/xenstored_transaction.c
>>>>>> @@ -79,6 +79,11 @@ struct transaction
>>>>>>  
>>>>>>  	/* List of changed domains - to record the changed domain entry number */
>>>>>>  	struct list_head changed_domains;
>>>>>> +
>>>>>> +	/* Temporary buffer for XS_DIRECTORY_PART. */
>>>>>> +	char *dirpart_buf;
>>>>>> +	unsigned buf_off;
>>>>>> +	unsigned buf_len;
>>>>>>  };
>>>>>
>>>>> The buffer you allocate for this can - by nature of this change - be
>>>>> huge, and there's one per transaction. Isn't this causing a resource
>>>>> utilization issue?
>>>>
>>>> It will be allocated only when needed and its size is less than that of
>>>> the node to be read.
>>>
>>> That's not addressing my concern. A DomU could have multiple
>>> transactions each with such an operation in progress. And the
>>> space here doesn't get accounted against any of the quota.
>>
>> It is accounted against the maximum number of transactions.
>>
>> Additionally the buffer will be kept only if not all of the data
>> could be transferred in the first iteration of the read.
> 
> Which nevertheless means an ill behaved guest could (if its quota
> allow) create a node with sufficiently many children and then start
> as many transactions as it can, and do a partial directory listing
> over each of them. Perhaps, considering that you mainly need this
> for Dom0, the new verb could be restricted to the control domain
> for now?

What about driver domains? They might need it, too.

Regarding memory usage: current limits for a guest if no quota have been
specified are:

- 1000 nodes
- max. entry size of 2048 bytes
- 10 transactions

So each node will need max. 4kB (2048 bytes entry, 2048 bytes name),
total node memory will be 4MB for this guest.

Lets assume the domU puts all those nodes in just one directory and
tries to read it in all possible 10 transactions. For each
transaction we'll need:

- 2MB temp buffer (1000 nodes * 2048 bytes name)
- at least 4MB data base copy (each transaction needs a complete
  copy of the data base, which is one of the main reasons I want to
  rewrite the transaction handling)

So an ill-behaved domU can force xenstored to use at least 64MB of
memory. The (minimal) memory amount for n ill-behaved domUs is:

n * (24 MB + n * 40MB)

n=1:  64MB
n=2: 208MB
n=3: 432MB
n=4: 736MB

Without the XS_DIRECTORY_PART support the numbers are rather similar:

n * n * 40MB total memory, so e.g. 640MB for 4 evil domUs.

> 
>>>>> And then - why did you put this function here instead of in
>>>>> xenstored_core.c, e.g. next to send_directory()?
>>>>
>>>> It is using the xenstored_transaction.c private struct transaction.
>>>> Putting it in xenstored_core.c would have required to move the
>>>> struct definition into a header file.
>>>
>>> Yeah, I've realized this after having sent the reply, but then again
>>> this also relates to the first question above (whether this should be
>>> a per-transaction item in the first place). In particular I don't think
>>> this being per transaction helps with allowing the guest to (easily)
>>> run multiple such queries in parallel, as commonly simple operations
>>> get run with null transactions.
>>
>> The XS_DIRECTORY_PART command is valid in a transaction only. This is
>> a prerequisite to be able to split the operation into multiple pieces
>> while keeping the consistency.
> 
> But I'm sure other models could be come up with, ideally without
> requiring meaningful amounts of temporary storage. For instance
> nodes could be versioned, and if the version changed between
> iterations, the caller could be signaled to start over. And it looks
> like such a version wouldn't even need to be bumped for child
> additions, but only for child removals.

In case nobody objects to that idea I'll look into it. The node version
might even help for better transaction support.


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2016-10-25 14:48 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-25  5:52 [PATCH 0/2] xenstore: support reading directory with many children Juergen Gross
2016-10-25  5:52 ` [PATCH 1/2] xenstore: add support for " Juergen Gross
2016-10-25  9:06   ` Jan Beulich
     [not found]   ` <580F3CC902000078001195F3@suse.com>
2016-10-25 11:41     ` Juergen Gross
2016-10-25 13:20       ` Jan Beulich
     [not found]       ` <580F785502000078001197C7@suse.com>
2016-10-25 13:47         ` Juergen Gross
2016-10-25 14:02           ` Jan Beulich
     [not found]           ` <580F820E0200007800119823@suse.com>
2016-10-25 14:48             ` Juergen Gross
2016-10-25  5:52 ` [PATCH 2/2] xenstore: support XS_DIRECTORY_PART in libxenstore Juergen Gross

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.