xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 00/13] tools/xenstore: rework internal accounting
@ 2023-01-20 10:00 Juergen Gross
  2023-01-20 10:00 ` [PATCH v2 01/13] tools/xenstore: don't allow creating too many nodes in a transaction Juergen Gross
                   ` (12 more replies)
  0 siblings, 13 replies; 40+ messages in thread
From: Juergen Gross @ 2023-01-20 10:00 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Wei Liu, Julien Grall, Anthony PERARD,
	Andrew Cooper, George Dunlap, Jan Beulich, Stefano Stabellini

This series reworks the Xenstore internal accounting to use a uniform
generic framework. It is adding some additional useful diagnostic
information, like accounting trace and max. per-domain and global quota
values seen.

Changes in V2:
- added patch 1 (leftover from previous series)
- rebase

Juergen Gross (13):
  tools/xenstore: don't allow creating too many nodes in a transaction
  tools/xenstore: manage per-transaction domain accounting data in an
    array
  tools/xenstore: introduce accounting data array for per-domain values
  tools/xenstore: add framework to commit accounting data on success
    only
  tools/xenstore: use accounting buffering for node accounting
  tools/xenstore: add current connection to domain_memory_add()
    parameters
  tools/xenstore: use accounting data array for per-domain values
  tools/xenstore: add accounting trace support
  tools/xenstore: add TDB access trace support
  tools/xenstore: switch transaction accounting to generic accounting
  tools/xenstore: remember global and per domain max accounting values
  tools/xenstore: use generic accounting for remaining quotas
  tools/xenstore: switch quota management to be table based

 docs/misc/xenstore.txt                 |   5 +-
 tools/xenstore/xenstored_control.c     |  65 ++--
 tools/xenstore/xenstored_core.c        | 164 +++++-----
 tools/xenstore/xenstored_core.h        |  23 +-
 tools/xenstore/xenstored_domain.c      | 435 ++++++++++++++++++-------
 tools/xenstore/xenstored_domain.h      |  55 +++-
 tools/xenstore/xenstored_transaction.c |  22 +-
 tools/xenstore/xenstored_watch.c       |  15 +-
 8 files changed, 506 insertions(+), 278 deletions(-)

-- 
2.35.3



^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v2 01/13] tools/xenstore: don't allow creating too many nodes in a transaction
  2023-01-20 10:00 [PATCH v2 00/13] tools/xenstore: rework internal accounting Juergen Gross
@ 2023-01-20 10:00 ` Juergen Gross
  2023-02-20  9:46   ` Julien Grall
  2023-01-20 10:00 ` [PATCH v2 02/13] tools/xenstore: manage per-transaction domain accounting data in an array Juergen Gross
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 40+ messages in thread
From: Juergen Gross @ 2023-01-20 10:00 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, Wei Liu, Julien Grall, Anthony PERARD

The accounting for the number of nodes of a domain in an active
transaction is not working correctly, as it allows to create arbitrary
number of nodes. The transaction will finally fail due to exceeding
the number of nodes quota, but before closing the transaction an
unprivileged guest could cause Xenstore to use a lot of memory.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 tools/xenstore/xenstored_domain.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
index 9ef41ede03..7eb9cd077b 100644
--- a/tools/xenstore/xenstored_domain.c
+++ b/tools/xenstore/xenstored_domain.c
@@ -1116,9 +1116,8 @@ int domain_nbentry_fix(unsigned int domid, int num, bool update)
 
 int domain_nbentry(struct connection *conn)
 {
-	return (domain_is_unprivileged(conn))
-		? conn->domain->nbentry
-		: 0;
+	return domain_is_unprivileged(conn)
+	       ? domain_nbentry_add(conn, conn->id, 0, true) : 0;
 }
 
 static bool domain_chk_quota(struct domain *domain, int mem)
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 02/13] tools/xenstore: manage per-transaction domain accounting data in an array
  2023-01-20 10:00 [PATCH v2 00/13] tools/xenstore: rework internal accounting Juergen Gross
  2023-01-20 10:00 ` [PATCH v2 01/13] tools/xenstore: don't allow creating too many nodes in a transaction Juergen Gross
@ 2023-01-20 10:00 ` Juergen Gross
  2023-02-17 18:49   ` Julien Grall
  2023-01-20 10:00 ` [PATCH v2 03/13] tools/xenstore: introduce accounting data array for per-domain values Juergen Gross
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 40+ messages in thread
From: Juergen Gross @ 2023-01-20 10:00 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, Wei Liu, Julien Grall, Anthony PERARD

In order to prepare keeping accounting data in an array instead of
using independent fields, switch the struct changed_domain accounting
data to that scheme, for now only using an array with one element.

In order to be able to extend this scheme add the needed indexing enum
to xenstored_domain.h.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
V2:
- make "what" parameter of acc_add_changed_dom() an enum type, and
  assert() that it won't exceed the accounting array (Julien Grall)
---
 tools/xenstore/xenstored_domain.c | 19 +++++++++++--------
 tools/xenstore/xenstored_domain.h |  5 +++++
 2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
index 7eb9cd077b..44e72937fa 100644
--- a/tools/xenstore/xenstored_domain.c
+++ b/tools/xenstore/xenstored_domain.c
@@ -99,8 +99,8 @@ struct changed_domain
 	/* Identifier of the changed domain. */
 	unsigned int domid;
 
-	/* Amount by which this domain's nbentry field has changed. */
-	int nbentry;
+	/* Accounting data. */
+	int acc[ACC_TR_N];
 };
 
 static struct hashtable *domhash;
@@ -550,7 +550,7 @@ int acc_fix_domains(struct list_head *head, bool update)
 	int cnt;
 
 	list_for_each_entry(cd, head, list) {
-		cnt = domain_nbentry_fix(cd->domid, cd->nbentry, update);
+		cnt = domain_nbentry_fix(cd->domid, cd->acc[ACC_NODES], update);
 		if (!update) {
 			if (cnt >= quota_nb_entry_per_domain)
 				return ENOSPC;
@@ -595,19 +595,21 @@ static struct changed_domain *acc_get_changed_domain(const void *ctx,
 	return cd;
 }
 
-static int acc_add_dom_nbentry(const void *ctx, struct list_head *head, int val,
-			       unsigned int domid)
+static int acc_add_changed_dom(const void *ctx, struct list_head *head,
+			       enum accitem what, int val, unsigned int domid)
 {
 	struct changed_domain *cd;
 
+	assert(what < ARRAY_SIZE(cd->acc));
+
 	cd = acc_get_changed_domain(ctx, head, domid);
 	if (!cd)
 		return 0;
 
 	errno = 0;
-	cd->nbentry += val;
+	cd->acc[what] += val;
 
-	return cd->nbentry;
+	return cd->acc[what];
 }
 
 static void domain_conn_reset(struct domain *domain)
@@ -1071,7 +1073,8 @@ static int domain_nbentry_add(struct connection *conn, unsigned int domid,
 
 	if (conn && conn->transaction) {
 		head = transaction_get_changed_domains(conn->transaction);
-		ret = acc_add_dom_nbentry(conn->transaction, head, add, domid);
+		ret = acc_add_changed_dom(conn->transaction, head, ACC_NODES,
+					  add, domid);
 		if (errno) {
 			fail_transaction(conn->transaction);
 			return -1;
diff --git a/tools/xenstore/xenstored_domain.h b/tools/xenstore/xenstored_domain.h
index dc4660861e..6a2b76a85b 100644
--- a/tools/xenstore/xenstored_domain.h
+++ b/tools/xenstore/xenstored_domain.h
@@ -19,6 +19,11 @@
 #ifndef _XENSTORED_DOMAIN_H
 #define _XENSTORED_DOMAIN_H
 
+enum accitem {
+	ACC_NODES,
+	ACC_TR_N	/* Number of elements per transaction and domain. */
+};
+
 void handle_event(void);
 
 void check_domains(void);
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 03/13] tools/xenstore: introduce accounting data array for per-domain values
  2023-01-20 10:00 [PATCH v2 00/13] tools/xenstore: rework internal accounting Juergen Gross
  2023-01-20 10:00 ` [PATCH v2 01/13] tools/xenstore: don't allow creating too many nodes in a transaction Juergen Gross
  2023-01-20 10:00 ` [PATCH v2 02/13] tools/xenstore: manage per-transaction domain accounting data in an array Juergen Gross
@ 2023-01-20 10:00 ` Juergen Gross
  2023-02-17 19:29   ` Julien Grall
  2023-01-20 10:00 ` [PATCH v2 04/13] tools/xenstore: add framework to commit accounting data on success only Juergen Gross
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 40+ messages in thread
From: Juergen Gross @ 2023-01-20 10:00 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, Wei Liu, Julien Grall, Anthony PERARD

Introduce the scheme of an accounting data array for per-domain
accounting data and use it initially for the number of nodes owned by
a domain.

Make the accounting data type to be unsigned int, as no data is allowed
to be negative at any time.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 tools/xenstore/xenstored_domain.c | 71 ++++++++++++++++++-------------
 tools/xenstore/xenstored_domain.h |  5 ++-
 2 files changed, 45 insertions(+), 31 deletions(-)

diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
index 44e72937fa..f459c5aabb 100644
--- a/tools/xenstore/xenstored_domain.c
+++ b/tools/xenstore/xenstored_domain.c
@@ -69,8 +69,8 @@ struct domain
 	/* Has domain been officially introduced? */
 	bool introduced;
 
-	/* number of entry from this domain in the store */
-	int nbentry;
+	/* Accounting data for this domain. */
+	unsigned int acc[ACC_N];
 
 	/* Amount of memory allocated for this domain. */
 	int memory;
@@ -246,7 +246,7 @@ static int domain_tree_remove_sub(const void *ctx, struct connection *conn,
 
 	if (keep_orphans) {
 		set_tdb_key(node->name, &key);
-		domain->nbentry--;
+		domain_nbentry_dec(NULL, domain->domid);
 		node->perms.p[0].id = priv_domid;
 		node->acc.memory = 0;
 		domain_nbentry_inc(NULL, priv_domid);
@@ -270,7 +270,7 @@ static int domain_tree_remove_sub(const void *ctx, struct connection *conn,
 		ret = WALK_TREE_SKIP_CHILDREN;
 	}
 
-	return domain->nbentry > 0 ? ret : WALK_TREE_SUCCESS_STOP;
+	return domain->acc[ACC_NODES] ? ret : WALK_TREE_SUCCESS_STOP;
 }
 
 static void domain_tree_remove(struct domain *domain)
@@ -278,7 +278,7 @@ static void domain_tree_remove(struct domain *domain)
 	int ret;
 	struct walk_funcs walkfuncs = { .enter = domain_tree_remove_sub };
 
-	if (domain->nbentry > 0) {
+	if (domain->acc[ACC_NODES]) {
 		ret = walk_node_tree(domain, NULL, "/", &walkfuncs, domain);
 		if (ret == WALK_TREE_ERROR_STOP)
 			syslog(LOG_ERR,
@@ -437,7 +437,7 @@ int domain_get_quota(const void *ctx, struct connection *conn,
 	resp = talloc_asprintf_append(resp, "%-16s: %8d\n", #t, e); \
 	if (!resp) return ENOMEM
 
-	ent(nodes, d->nbentry);
+	ent(nodes, d->acc[ACC_NODES]);
 	ent(watches, d->nbwatch);
 	ent(transactions, ta);
 	ent(outstanding, d->nboutstanding);
@@ -1047,8 +1047,28 @@ int domain_adjust_node_perms(struct node *node)
 	return 0;
 }
 
-static int domain_nbentry_add(struct connection *conn, unsigned int domid,
-			      int add, bool no_dom_alloc)
+static int domain_acc_add_chk(struct domain *d, enum accitem what, int add,
+			      unsigned int domid)
+{
+	assert(what < ARRAY_SIZE(d->acc));
+
+	if ((add < 0 && -add > d->acc[what]) ||
+	    (d->acc[what] + add) > INT_MAX) {
+		/*
+		 * In a transaction when a node is being added/removed AND the
+		 * same node has been added/removed outside the transaction in
+		 * parallel, the resulting value will be wrong. This is no
+		 * problem, as the transaction will fail due to the resulting
+		 * conflict.
+		 */
+		return (add < 0) ? 0 : INT_MAX;
+	}
+
+	return d->acc[what] + add;
+}
+
+static int domain_acc_add(struct connection *conn, unsigned int domid,
+			  enum accitem what, int add, bool no_dom_alloc)
 {
 	struct domain *d;
 	struct list_head *head;
@@ -1071,56 +1091,49 @@ static int domain_nbentry_add(struct connection *conn, unsigned int domid,
 		}
 	}
 
-	if (conn && conn->transaction) {
+	if (conn && conn->transaction && what < ACC_TR_N) {
 		head = transaction_get_changed_domains(conn->transaction);
-		ret = acc_add_changed_dom(conn->transaction, head, ACC_NODES,
+		ret = acc_add_changed_dom(conn->transaction, head, what,
 					  add, domid);
 		if (errno) {
 			fail_transaction(conn->transaction);
 			return -1;
 		}
-		/*
-		 * In a transaction when a node is being added/removed AND the
-		 * same node has been added/removed outside the transaction in
-		 * parallel, the resulting number of nodes will be wrong. This
-		 * is no problem, as the transaction will fail due to the
-		 * resulting conflict.
-		 * In the node remove case the resulting number can be even
-		 * negative, which should be avoided.
-		 */
-		return max(d->nbentry + ret, 0);
+		return domain_acc_add_chk(d, what, ret, domid);
 	}
 
-	d->nbentry += add;
+	d->acc[what] = domain_acc_add_chk(d, what, add, domid);
 
-	return d->nbentry;
+	return d->acc[what];
 }
 
 int domain_nbentry_inc(struct connection *conn, unsigned int domid)
 {
-	return (domain_nbentry_add(conn, domid, 1, false) < 0) ? errno : 0;
+	return (domain_acc_add(conn, domid, ACC_NODES, 1, false) < 0)
+	       ? errno : 0;
 }
 
 int domain_nbentry_dec(struct connection *conn, unsigned int domid)
 {
-	return (domain_nbentry_add(conn, domid, -1, true) < 0) ? errno : 0;
+	return (domain_acc_add(conn, domid, ACC_NODES, -1, true) < 0)
+	       ? errno : 0;
 }
 
 int domain_nbentry_fix(unsigned int domid, int num, bool update)
 {
 	int ret;
 
-	ret = domain_nbentry_add(NULL, domid, update ? num : 0, update);
+	ret = domain_acc_add(NULL, domid, ACC_NODES, update ? num : 0, update);
 	if (ret < 0 || update)
 		return ret;
 
 	return domid_is_unprivileged(domid) ? ret + num : 0;
 }
 
-int domain_nbentry(struct connection *conn)
+unsigned int domain_nbentry(struct connection *conn)
 {
 	return domain_is_unprivileged(conn)
-	       ? domain_nbentry_add(conn, conn->id, 0, true) : 0;
+	       ? domain_acc_add(conn, conn->id, ACC_NODES, 0, true) : 0;
 }
 
 static bool domain_chk_quota(struct domain *domain, int mem)
@@ -1597,7 +1610,7 @@ static int domain_check_acc_init_sub(const void *k, void *v, void *arg)
 	 * If everything is correct incrementing the value for each node will
 	 * result in dom->nodes being 0 at the end.
 	 */
-	dom->nodes = -d->nbentry;
+	dom->nodes = -d->acc[ACC_NODES];
 
 	if (!hashtable_insert(domains, &dom->domid, dom)) {
 		talloc_free(dom);
@@ -1652,7 +1665,7 @@ static int domain_check_acc_cb(const void *k, void *v, void *arg)
 	if (!d)
 		return 0;
 
-	d->nbentry += dom->nodes;
+	d->acc[ACC_NODES] += dom->nodes;
 
 	return 0;
 }
diff --git a/tools/xenstore/xenstored_domain.h b/tools/xenstore/xenstored_domain.h
index 6a2b76a85b..8259c114b0 100644
--- a/tools/xenstore/xenstored_domain.h
+++ b/tools/xenstore/xenstored_domain.h
@@ -21,7 +21,8 @@
 
 enum accitem {
 	ACC_NODES,
-	ACC_TR_N	/* Number of elements per transaction and domain. */
+	ACC_TR_N,        /* Number of elements per transaction and domain. */
+	ACC_N = ACC_TR_N /* Number of elements per domain. */
 };
 
 void handle_event(void);
@@ -72,7 +73,7 @@ int domain_alloc_permrefs(struct node_perms *perms);
 int domain_nbentry_inc(struct connection *conn, unsigned int domid);
 int domain_nbentry_dec(struct connection *conn, unsigned int domid);
 int domain_nbentry_fix(unsigned int domid, int num, bool update);
-int domain_nbentry(struct connection *conn);
+unsigned int domain_nbentry(struct connection *conn);
 int domain_memory_add(unsigned int domid, int mem, bool no_quota_check);
 
 /*
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 04/13] tools/xenstore: add framework to commit accounting data on success only
  2023-01-20 10:00 [PATCH v2 00/13] tools/xenstore: rework internal accounting Juergen Gross
                   ` (2 preceding siblings ...)
  2023-01-20 10:00 ` [PATCH v2 03/13] tools/xenstore: introduce accounting data array for per-domain values Juergen Gross
@ 2023-01-20 10:00 ` Juergen Gross
  2023-02-20 22:50   ` Julien Grall
  2023-01-20 10:00 ` [PATCH v2 05/13] tools/xenstore: use accounting buffering for node accounting Juergen Gross
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 40+ messages in thread
From: Juergen Gross @ 2023-01-20 10:00 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, Wei Liu, Julien Grall, Anthony PERARD

Instead of modifying accounting data and undo those modifications in
case of an error during further processing, add a framework for
collecting the needed changes and commit them only when the whole
operation has succeeded.

This scheme can reuse large parts of the per transaction accounting.
The changed_domain handling can be reused, but the array size of the
accounting data should be possible to be different for both use cases.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 tools/xenstore/xenstored_core.c   |  5 +++
 tools/xenstore/xenstored_core.h   |  3 ++
 tools/xenstore/xenstored_domain.c | 64 +++++++++++++++++++++++++++----
 tools/xenstore/xenstored_domain.h |  5 ++-
 4 files changed, 68 insertions(+), 9 deletions(-)

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 27dfbe9593..2d10cacf35 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -1023,6 +1023,9 @@ static void send_error(struct connection *conn, int error)
 			break;
 		}
 	}
+
+	acc_drop(conn);
+
 	send_reply(conn, XS_ERROR, xsd_errors[i].errstring,
 			  strlen(xsd_errors[i].errstring) + 1);
 }
@@ -1060,6 +1063,7 @@ void send_reply(struct connection *conn, enum xsd_sockmsg_type type,
 	}
 
 	conn->in = NULL;
+	acc_commit(conn);
 
 	/* Update relevant header fields and fill in the message body. */
 	bdata->hdr.msg.type = type;
@@ -2195,6 +2199,7 @@ struct connection *new_connection(const struct interface_funcs *funcs)
 	new->is_stalled = false;
 	new->transaction_started = 0;
 	INIT_LIST_HEAD(&new->out_list);
+	INIT_LIST_HEAD(&new->acc_list);
 	INIT_LIST_HEAD(&new->ref_list);
 	INIT_LIST_HEAD(&new->watches);
 	INIT_LIST_HEAD(&new->transaction_list);
diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
index c59b06551f..1f811f38cb 100644
--- a/tools/xenstore/xenstored_core.h
+++ b/tools/xenstore/xenstored_core.h
@@ -139,6 +139,9 @@ struct connection
 	struct list_head out_list;
 	uint64_t timeout_msec;
 
+	/* Not yet committed accounting data (valid if in != NULL). */
+	struct list_head acc_list;
+
 	/* Referenced requests no longer pending. */
 	struct list_head ref_list;
 
diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
index f459c5aabb..33105dba8f 100644
--- a/tools/xenstore/xenstored_domain.c
+++ b/tools/xenstore/xenstored_domain.c
@@ -100,7 +100,7 @@ struct changed_domain
 	unsigned int domid;
 
 	/* Accounting data. */
-	int acc[ACC_TR_N];
+	int acc[];
 };
 
 static struct hashtable *domhash;
@@ -577,6 +577,7 @@ static struct changed_domain *acc_find_changed_domain(struct list_head *head,
 
 static struct changed_domain *acc_get_changed_domain(const void *ctx,
 						     struct list_head *head,
+						     enum accitem acc_sz,
 						     unsigned int domid)
 {
 	struct changed_domain *cd;
@@ -585,7 +586,7 @@ static struct changed_domain *acc_get_changed_domain(const void *ctx,
 	if (cd)
 		return cd;
 
-	cd = talloc_zero(ctx, struct changed_domain);
+	cd = talloc_zero_size(ctx, sizeof(*cd) + acc_sz * sizeof(cd->acc[0]));
 	if (!cd)
 		return NULL;
 
@@ -596,13 +597,13 @@ static struct changed_domain *acc_get_changed_domain(const void *ctx,
 }
 
 static int acc_add_changed_dom(const void *ctx, struct list_head *head,
-			       enum accitem what, int val, unsigned int domid)
+			       enum accitem acc_sz, enum accitem what,
+			       int val, unsigned int domid)
 {
 	struct changed_domain *cd;
 
-	assert(what < ARRAY_SIZE(cd->acc));
-
-	cd = acc_get_changed_domain(ctx, head, domid);
+	assert(what < acc_sz);
+	cd = acc_get_changed_domain(ctx, head, acc_sz, domid);
 	if (!cd)
 		return 0;
 
@@ -1071,6 +1072,7 @@ static int domain_acc_add(struct connection *conn, unsigned int domid,
 			  enum accitem what, int add, bool no_dom_alloc)
 {
 	struct domain *d;
+	struct changed_domain *cd;
 	struct list_head *head;
 	int ret;
 
@@ -1091,10 +1093,26 @@ static int domain_acc_add(struct connection *conn, unsigned int domid,
 		}
 	}
 
+	/* Temporary accounting data until final commit? */
+	if (conn && conn->in && what < ACC_REQ_N) {
+		/* Consider transaction local data. */
+		ret = 0;
+		if (conn->transaction && what < ACC_TR_N) {
+			head = transaction_get_changed_domains(
+				conn->transaction);
+			cd = acc_find_changed_domain(head, domid);
+			if (cd)
+				ret = cd->acc[what];
+		}
+		ret += acc_add_changed_dom(conn->in, &conn->acc_list, ACC_REQ_N,
+					  what, add, domid);
+		return errno ? -1 : domain_acc_add_chk(d, what, ret, domid);
+	}
+
 	if (conn && conn->transaction && what < ACC_TR_N) {
 		head = transaction_get_changed_domains(conn->transaction);
-		ret = acc_add_changed_dom(conn->transaction, head, what,
-					  add, domid);
+		ret = acc_add_changed_dom(conn->transaction, head, ACC_TR_N,
+					  what, add, domid);
 		if (errno) {
 			fail_transaction(conn->transaction);
 			return -1;
@@ -1107,6 +1125,36 @@ static int domain_acc_add(struct connection *conn, unsigned int domid,
 	return d->acc[what];
 }
 
+void acc_drop(struct connection *conn)
+{
+	struct changed_domain *cd;
+
+	while ((cd = list_top(&conn->acc_list, struct changed_domain, list))) {
+		list_del(&cd->list);
+		talloc_free(cd);
+	}
+}
+
+void acc_commit(struct connection *conn)
+{
+	struct changed_domain *cd;
+	struct buffered_data *in = conn->in;
+	enum accitem what;
+
+	conn->in = NULL; /* Avoid recursion. */
+	while ((cd = list_top(&conn->acc_list, struct changed_domain, list))) {
+		list_del(&cd->list);
+		for (what = 0; what < ACC_REQ_N; what++)
+			if (cd->acc[what])
+				domain_acc_add(conn, cd->domid, what,
+					       cd->acc[what], true);
+
+		talloc_free(cd);
+	}
+
+	conn->in = in;
+}
+
 int domain_nbentry_inc(struct connection *conn, unsigned int domid)
 {
 	return (domain_acc_add(conn, domid, ACC_NODES, 1, false) < 0)
diff --git a/tools/xenstore/xenstored_domain.h b/tools/xenstore/xenstored_domain.h
index 8259c114b0..cd85bd0cad 100644
--- a/tools/xenstore/xenstored_domain.h
+++ b/tools/xenstore/xenstored_domain.h
@@ -20,7 +20,8 @@
 #define _XENSTORED_DOMAIN_H
 
 enum accitem {
-	ACC_NODES,
+	ACC_REQ_N,       /* Number of elements per request and domain. */
+	ACC_NODES = ACC_REQ_N,
 	ACC_TR_N,        /* Number of elements per transaction and domain. */
 	ACC_N = ACC_TR_N /* Number of elements per domain. */
 };
@@ -103,6 +104,8 @@ void domain_outstanding_domid_dec(unsigned int domid);
 int domain_get_quota(const void *ctx, struct connection *conn,
 		     unsigned int domid);
 int acc_fix_domains(struct list_head *head, bool update);
+void acc_drop(struct connection *conn);
+void acc_commit(struct connection *conn);
 
 /* Write rate limiting */
 
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 05/13] tools/xenstore: use accounting buffering for node accounting
  2023-01-20 10:00 [PATCH v2 00/13] tools/xenstore: rework internal accounting Juergen Gross
                   ` (3 preceding siblings ...)
  2023-01-20 10:00 ` [PATCH v2 04/13] tools/xenstore: add framework to commit accounting data on success only Juergen Gross
@ 2023-01-20 10:00 ` Juergen Gross
  2023-01-20 10:00 ` [PATCH v2 06/13] tools/xenstore: add current connection to domain_memory_add() parameters Juergen Gross
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 40+ messages in thread
From: Juergen Gross @ 2023-01-20 10:00 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, Wei Liu, Julien Grall, Anthony PERARD

Add the node accounting to the accounting information buffering in
order to avoid having to undo it in case of failure.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 tools/xenstore/xenstored_core.c   | 21 ++-------------------
 tools/xenstore/xenstored_domain.h |  4 ++--
 2 files changed, 4 insertions(+), 21 deletions(-)

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 2d10cacf35..250ecd1199 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -1452,7 +1452,6 @@ static void destroy_node_rm(struct connection *conn, struct node *node)
 static int destroy_node(struct connection *conn, struct node *node)
 {
 	destroy_node_rm(conn, node);
-	domain_nbentry_dec(conn, get_node_owner(node));
 
 	/*
 	 * It is not possible to easily revert the changes in a transaction.
@@ -1797,27 +1796,11 @@ static int do_set_perms(const void *ctx, struct connection *conn,
 	old_perms = node->perms;
 	domain_nbentry_dec(conn, get_node_owner(node));
 	node->perms = perms;
-	if (domain_nbentry_inc(conn, get_node_owner(node))) {
-		node->perms = old_perms;
-		/*
-		 * This should never fail because we had a reference on the
-		 * domain before and Xenstored is single-threaded.
-		 */
-		domain_nbentry_inc(conn, get_node_owner(node));
+	if (domain_nbentry_inc(conn, get_node_owner(node)))
 		return ENOMEM;
-	}
 
-	if (write_node(conn, node, false)) {
-		int saved_errno = errno;
-
-		domain_nbentry_dec(conn, get_node_owner(node));
-		node->perms = old_perms;
-		/* No failure possible as above. */
-		domain_nbentry_inc(conn, get_node_owner(node));
-
-		errno = saved_errno;
+	if (write_node(conn, node, false))
 		return errno;
-	}
 
 	fire_watches(conn, ctx, name, node, false, &old_perms);
 	send_ack(conn, XS_SET_PERMS);
diff --git a/tools/xenstore/xenstored_domain.h b/tools/xenstore/xenstored_domain.h
index cd85bd0cad..5df6ae72f8 100644
--- a/tools/xenstore/xenstored_domain.h
+++ b/tools/xenstore/xenstored_domain.h
@@ -20,9 +20,9 @@
 #define _XENSTORED_DOMAIN_H
 
 enum accitem {
+	ACC_NODES,
 	ACC_REQ_N,       /* Number of elements per request and domain. */
-	ACC_NODES = ACC_REQ_N,
-	ACC_TR_N,        /* Number of elements per transaction and domain. */
+	ACC_TR_N = ACC_REQ_N, /* Number of elements per transaction and domain. */
 	ACC_N = ACC_TR_N /* Number of elements per domain. */
 };
 
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 06/13] tools/xenstore: add current connection to domain_memory_add() parameters
  2023-01-20 10:00 [PATCH v2 00/13] tools/xenstore: rework internal accounting Juergen Gross
                   ` (4 preceding siblings ...)
  2023-01-20 10:00 ` [PATCH v2 05/13] tools/xenstore: use accounting buffering for node accounting Juergen Gross
@ 2023-01-20 10:00 ` Juergen Gross
  2023-01-20 10:00 ` [PATCH v2 07/13] tools/xenstore: use accounting data array for per-domain values Juergen Gross
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 40+ messages in thread
From: Juergen Gross @ 2023-01-20 10:00 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, Wei Liu, Julien Grall, Anthony PERARD

In order to enable switching memory accounting to the generic array
based accounting, add the current connection to the parameters of
domain_memory_add().

This requires to add the connection to some other functions, too.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 tools/xenstore/xenstored_core.c   | 28 ++++++++++++++++------------
 tools/xenstore/xenstored_domain.c |  3 ++-
 tools/xenstore/xenstored_domain.h | 14 +++++++++-----
 tools/xenstore/xenstored_watch.c  | 11 ++++++-----
 4 files changed, 33 insertions(+), 23 deletions(-)

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 250ecd1199..1958df9147 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -246,7 +246,8 @@ static void free_buffered_data(struct buffered_data *out,
 		}
 	}
 
-	domain_memory_add_nochk(conn->id, -out->hdr.msg.len - sizeof(out->hdr));
+	domain_memory_add_nochk(conn, conn->id,
+				-out->hdr.msg.len - sizeof(out->hdr));
 
 	if (out->hdr.msg.type == XS_WATCH_EVENT) {
 		req = out->pend.req;
@@ -631,24 +632,25 @@ int do_tdb_write(struct connection *conn, TDB_DATA *key, TDB_DATA *data,
 	 * nodes to new owners.
 	 */
 	if (old_acc.memory)
-		domain_memory_add_nochk(old_domid,
+		domain_memory_add_nochk(conn, old_domid,
 					-old_acc.memory - key->dsize);
-	ret = domain_memory_add(new_domid, data->dsize + key->dsize,
-				no_quota_check);
+	ret = domain_memory_add(conn, new_domid,
+				data->dsize + key->dsize, no_quota_check);
 	if (ret) {
 		/* Error path, so no quota check. */
 		if (old_acc.memory)
-			domain_memory_add_nochk(old_domid,
+			domain_memory_add_nochk(conn, old_domid,
 						old_acc.memory + key->dsize);
 		return ret;
 	}
 
 	/* TDB should set errno, but doesn't even set ecode AFAICT. */
 	if (tdb_store(tdb_ctx, *key, *data, TDB_REPLACE) != 0) {
-		domain_memory_add_nochk(new_domid, -data->dsize - key->dsize);
+		domain_memory_add_nochk(conn, new_domid,
+					-data->dsize - key->dsize);
 		/* Error path, so no quota check. */
 		if (old_acc.memory)
-			domain_memory_add_nochk(old_domid,
+			domain_memory_add_nochk(conn, old_domid,
 						old_acc.memory + key->dsize);
 		errno = EIO;
 		return errno;
@@ -683,7 +685,7 @@ int do_tdb_delete(struct connection *conn, TDB_DATA *key,
 
 	if (acc->memory) {
 		domid = get_acc_domid(conn, key, acc->domid);
-		domain_memory_add_nochk(domid, -acc->memory - key->dsize);
+		domain_memory_add_nochk(conn, domid, -acc->memory - key->dsize);
 	}
 
 	return 0;
@@ -1052,11 +1054,13 @@ void send_reply(struct connection *conn, enum xsd_sockmsg_type type,
 	if (len <= DEFAULT_BUFFER_SIZE) {
 		bdata->buffer = bdata->default_buffer;
 		/* Don't check quota, path might be used for returning error. */
-		domain_memory_add_nochk(conn->id, len + sizeof(bdata->hdr));
+		domain_memory_add_nochk(conn, conn->id,
+					len + sizeof(bdata->hdr));
 	} else {
 		bdata->buffer = talloc_array(bdata, char, len);
 		if (!bdata->buffer ||
-		    domain_memory_add_chk(conn->id, len + sizeof(bdata->hdr))) {
+		    domain_memory_add_chk(conn, conn->id,
+					  len + sizeof(bdata->hdr))) {
 			send_error(conn, ENOMEM);
 			return;
 		}
@@ -1122,7 +1126,7 @@ void send_event(struct buffered_data *req, struct connection *conn,
 		}
 	}
 
-	if (domain_memory_add_chk(conn->id, len + sizeof(bdata->hdr))) {
+	if (domain_memory_add_chk(conn, conn->id, len + sizeof(bdata->hdr))) {
 		talloc_free(bdata);
 		return;
 	}
@@ -3307,7 +3311,7 @@ static void add_buffered_data(struct buffered_data *bdata,
 	 * be smaller. So ignore it. The limit will be applied for any resource
 	 * after the state has been fully restored.
 	 */
-	domain_memory_add_nochk(conn->id, len + sizeof(bdata->hdr));
+	domain_memory_add_nochk(conn, conn->id, len + sizeof(bdata->hdr));
 }
 
 void read_state_buffered_data(const void *ctx, struct connection *conn,
diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
index 33105dba8f..9e7252372d 100644
--- a/tools/xenstore/xenstored_domain.c
+++ b/tools/xenstore/xenstored_domain.c
@@ -1233,7 +1233,8 @@ static bool domain_chk_quota(struct domain *domain, int mem)
 	return false;
 }
 
-int domain_memory_add(unsigned int domid, int mem, bool no_quota_check)
+int domain_memory_add(struct connection *conn, unsigned int domid, int mem,
+		      bool no_quota_check)
 {
 	struct domain *domain;
 
diff --git a/tools/xenstore/xenstored_domain.h b/tools/xenstore/xenstored_domain.h
index 5df6ae72f8..11569dbd1b 100644
--- a/tools/xenstore/xenstored_domain.h
+++ b/tools/xenstore/xenstored_domain.h
@@ -75,25 +75,29 @@ int domain_nbentry_inc(struct connection *conn, unsigned int domid);
 int domain_nbentry_dec(struct connection *conn, unsigned int domid);
 int domain_nbentry_fix(unsigned int domid, int num, bool update);
 unsigned int domain_nbentry(struct connection *conn);
-int domain_memory_add(unsigned int domid, int mem, bool no_quota_check);
+int domain_memory_add(struct connection *conn, unsigned int domid, int mem,
+		      bool no_quota_check);
 
 /*
  * domain_memory_add_chk(): to be used when memory quota should be checked.
  * Not to be used when specifying a negative mem value, as lowering the used
  * memory should always be allowed.
  */
-static inline int domain_memory_add_chk(unsigned int domid, int mem)
+static inline int domain_memory_add_chk(struct connection *conn,
+					unsigned int domid, int mem)
 {
-	return domain_memory_add(domid, mem, false);
+	return domain_memory_add(conn, domid, mem, false);
 }
+
 /*
  * domain_memory_add_nochk(): to be used when memory quota should not be
  * checked, e.g. when lowering memory usage, or in an error case for undoing
  * a previous memory adjustment.
  */
-static inline void domain_memory_add_nochk(unsigned int domid, int mem)
+static inline void domain_memory_add_nochk(struct connection *conn,
+					   unsigned int domid, int mem)
 {
-	domain_memory_add(domid, mem, true);
+	domain_memory_add(conn, domid, mem, true);
 }
 void domain_watch_inc(struct connection *conn);
 void domain_watch_dec(struct connection *conn);
diff --git a/tools/xenstore/xenstored_watch.c b/tools/xenstore/xenstored_watch.c
index 8ad0229df6..e30cd89be3 100644
--- a/tools/xenstore/xenstored_watch.c
+++ b/tools/xenstore/xenstored_watch.c
@@ -199,7 +199,7 @@ static struct watch *add_watch(struct connection *conn, char *path, char *token,
 	watch->token = talloc_strdup(watch, token);
 	if (!watch->node || !watch->token)
 		goto nomem;
-	if (domain_memory_add(conn->id, strlen(path) + strlen(token),
+	if (domain_memory_add(conn, conn->id, strlen(path) + strlen(token),
 			      no_quota_check))
 		goto nomem;
 
@@ -274,8 +274,9 @@ int do_unwatch(const void *ctx, struct connection *conn,
 	list_for_each_entry(watch, &conn->watches, list) {
 		if (streq(watch->node, node) && streq(watch->token, vec[1])) {
 			list_del(&watch->list);
-			domain_memory_add_nochk(conn->id, -strlen(watch->node) -
-							  strlen(watch->token));
+			domain_memory_add_nochk(conn, conn->id,
+						-strlen(watch->node) -
+						strlen(watch->token));
 			talloc_free(watch);
 			domain_watch_dec(conn);
 			send_ack(conn, XS_UNWATCH);
@@ -291,8 +292,8 @@ void conn_delete_all_watches(struct connection *conn)
 
 	while ((watch = list_top(&conn->watches, struct watch, list))) {
 		list_del(&watch->list);
-		domain_memory_add_nochk(conn->id, -strlen(watch->node) -
-						  strlen(watch->token));
+		domain_memory_add_nochk(conn, conn->id, -strlen(watch->node) -
+							strlen(watch->token));
 		talloc_free(watch);
 		domain_watch_dec(conn);
 	}
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 07/13] tools/xenstore: use accounting data array for per-domain values
  2023-01-20 10:00 [PATCH v2 00/13] tools/xenstore: rework internal accounting Juergen Gross
                   ` (5 preceding siblings ...)
  2023-01-20 10:00 ` [PATCH v2 06/13] tools/xenstore: add current connection to domain_memory_add() parameters Juergen Gross
@ 2023-01-20 10:00 ` Juergen Gross
  2023-01-20 10:00 ` [PATCH v2 08/13] tools/xenstore: add accounting trace support Juergen Gross
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 40+ messages in thread
From: Juergen Gross @ 2023-01-20 10:00 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, Wei Liu, Julien Grall, Anthony PERARD

Add the accounting of per-domain usage of Xenstore memory, watches, and
outstanding requests to the array based mechanism.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 tools/xenstore/xenstored_core.c   |   8 +--
 tools/xenstore/xenstored_domain.c | 111 +++++++++++-------------------
 tools/xenstore/xenstored_domain.h |  10 +--
 3 files changed, 52 insertions(+), 77 deletions(-)

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 1958df9147..6ef60179fa 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -255,7 +255,7 @@ static void free_buffered_data(struct buffered_data *out,
 			req->pend.ref.event_cnt--;
 			if (!req->pend.ref.event_cnt && !req->on_out_list) {
 				if (req->on_ref_list) {
-					domain_outstanding_domid_dec(
+					domain_outstanding_dec(conn,
 						req->pend.ref.domid);
 					list_del(&req->list);
 				}
@@ -271,7 +271,7 @@ static void free_buffered_data(struct buffered_data *out,
 		out->on_ref_list = true;
 		return;
 	} else
-		domain_outstanding_dec(conn);
+		domain_outstanding_dec(conn, conn->id);
 
 	talloc_free(out);
 }
@@ -1077,7 +1077,7 @@ void send_reply(struct connection *conn, enum xsd_sockmsg_type type,
 	/* Queue for later transmission. */
 	list_add_tail(&bdata->list, &conn->out_list);
 	bdata->on_out_list = true;
-	domain_outstanding_inc(conn);
+	domain_outstanding_inc(conn, conn->id);
 }
 
 /*
@@ -3305,7 +3305,7 @@ static void add_buffered_data(struct buffered_data *bdata,
 	 * request have been delivered.
 	 */
 	if (bdata->hdr.msg.type != XS_WATCH_EVENT)
-		domain_outstanding_inc(conn);
+		domain_outstanding_inc(conn, conn->id);
 	/*
 	 * We are restoring the state after Live-Update and the new quota may
 	 * be smaller. So ignore it. The limit will be applied for any resource
diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
index 9e7252372d..b1e29edb7e 100644
--- a/tools/xenstore/xenstored_domain.c
+++ b/tools/xenstore/xenstored_domain.c
@@ -72,19 +72,12 @@ struct domain
 	/* Accounting data for this domain. */
 	unsigned int acc[ACC_N];
 
-	/* Amount of memory allocated for this domain. */
-	int memory;
+	/* Memory quota data for this domain. */
 	bool soft_quota_reported;
 	bool hard_quota_reported;
 	time_t mem_last_msg;
 #define MEM_WARN_MINTIME_SEC 10
 
-	/* number of watch for this domain */
-	int nbwatch;
-
-	/* Number of outstanding requests. */
-	int nboutstanding;
-
 	/* write rate limit */
 	wrl_creditt wrl_credit; /* [ -wrl_config_writecost, +_dburst ] */
 	struct wrl_timestampt wrl_timestamp;
@@ -200,14 +193,15 @@ static bool domain_can_write(struct connection *conn)
 
 static bool domain_can_read(struct connection *conn)
 {
-	struct xenstore_domain_interface *intf = conn->domain->interface;
+	struct domain *domain = conn->domain;
+	struct xenstore_domain_interface *intf = domain->interface;
 
 	if (domain_is_unprivileged(conn)) {
-		if (conn->domain->wrl_credit < 0)
+		if (domain->wrl_credit < 0)
 			return false;
-		if (conn->domain->nboutstanding >= quota_req_outstanding)
+		if (domain->acc[ACC_OUTST] >= quota_req_outstanding)
 			return false;
-		if (conn->domain->memory >= quota_memory_per_domain_hard &&
+		if (domain->acc[ACC_MEM] >= quota_memory_per_domain_hard &&
 		    quota_memory_per_domain_hard)
 			return false;
 	}
@@ -438,10 +432,10 @@ int domain_get_quota(const void *ctx, struct connection *conn,
 	if (!resp) return ENOMEM
 
 	ent(nodes, d->acc[ACC_NODES]);
-	ent(watches, d->nbwatch);
+	ent(watches, d->acc[ACC_WATCH]);
 	ent(transactions, ta);
-	ent(outstanding, d->nboutstanding);
-	ent(memory, d->memory);
+	ent(outstanding, d->acc[ACC_OUTST]);
+	ent(memory, d->acc[ACC_MEM]);
 
 #undef ent
 
@@ -1184,14 +1178,16 @@ unsigned int domain_nbentry(struct connection *conn)
 	       ? domain_acc_add(conn, conn->id, ACC_NODES, 0, true) : 0;
 }
 
-static bool domain_chk_quota(struct domain *domain, int mem)
+static bool domain_chk_quota(struct connection *conn, unsigned int mem)
 {
 	time_t now;
+	struct domain *domain;
 
-	if (!domain || !domid_is_unprivileged(domain->domid) ||
-	    (domain->conn && domain->conn->is_ignored))
+	if (!conn || !domid_is_unprivileged(conn->id) ||
+	    conn->is_ignored)
 		return false;
 
+	domain = conn->domain;
 	now = time(NULL);
 
 	if (mem >= quota_memory_per_domain_hard &&
@@ -1236,80 +1232,57 @@ static bool domain_chk_quota(struct domain *domain, int mem)
 int domain_memory_add(struct connection *conn, unsigned int domid, int mem,
 		      bool no_quota_check)
 {
-	struct domain *domain;
+	int ret;
 
-	domain = find_domain_struct(domid);
-	if (domain) {
-		/*
-		 * domain_chk_quota() will print warning and also store whether
-		 * the soft/hard quota has been hit. So check no_quota_check
-		 * *after*.
-		 */
-		if (domain_chk_quota(domain, domain->memory + mem) &&
-		    !no_quota_check)
-			return ENOMEM;
-		domain->memory += mem;
-	} else {
-		/*
-		 * The domain the memory is to be accounted for should always
-		 * exist, as accounting is done either for a domain related to
-		 * the current connection, or for the domain owning a node
-		 * (which is always existing, as the owner of the node is
-		 * tested to exist and deleted or replaced by domid 0 if not).
-		 * So not finding the related domain MUST be an error in the
-		 * data base.
-		 */
-		errno = ENOENT;
-		corrupt(NULL, "Accounting called for non-existing domain %u\n",
-			domid);
-		return ENOENT;
-	}
+	ret = domain_acc_add(conn, domid, ACC_MEM, 0, true);
+	if (ret < 0)
+		return -ret;
+
+	/*
+	 * domain_chk_quota() will print warning and also store whether the
+	 * soft/hard quota has been hit. So check no_quota_check *after*.
+	 */
+	if (domain_chk_quota(conn, ret + mem) && !no_quota_check)
+		return ENOMEM;
+
+	/*
+	 * The domain the memory is to be accounted for should always exist,
+	 * as accounting is done either for a domain related to the current
+	 * connection, or for the domain owning a node (which is always
+	 * existing, as the owner of the node is tested to exist and deleted
+	 * or replaced by domid 0 if not).
+	 * So not finding the related domain MUST be an error in the data base.
+	 */
+	domain_acc_add(conn, domid, ACC_MEM, mem, true);
 
 	return 0;
 }
 
 void domain_watch_inc(struct connection *conn)
 {
-	if (!conn || !conn->domain)
-		return;
-	conn->domain->nbwatch++;
+	domain_acc_add(conn, conn->id, ACC_WATCH, 1, true);
 }
 
 void domain_watch_dec(struct connection *conn)
 {
-	if (!conn || !conn->domain)
-		return;
-	if (conn->domain->nbwatch)
-		conn->domain->nbwatch--;
+	domain_acc_add(conn, conn->id, ACC_WATCH, -1, true);
 }
 
 int domain_watch(struct connection *conn)
 {
 	return (domain_is_unprivileged(conn))
-		? conn->domain->nbwatch
+		? domain_acc_add(conn, conn->id, ACC_WATCH, 0, true)
 		: 0;
 }
 
-void domain_outstanding_inc(struct connection *conn)
+void domain_outstanding_inc(struct connection *conn, unsigned int domid)
 {
-	if (!conn || !conn->domain)
-		return;
-	conn->domain->nboutstanding++;
+	domain_acc_add(conn, domid, ACC_OUTST, 1, true);
 }
 
-void domain_outstanding_dec(struct connection *conn)
+void domain_outstanding_dec(struct connection *conn, unsigned int domid)
 {
-	if (!conn || !conn->domain)
-		return;
-	conn->domain->nboutstanding--;
-}
-
-void domain_outstanding_domid_dec(unsigned int domid)
-{
-	struct domain *d = find_domain_by_domid(domid);
-
-	if (d)
-		d->nboutstanding--;
+	domain_acc_add(conn, domid, ACC_OUTST, -1, true);
 }
 
 static wrl_creditt wrl_config_writecost      = WRL_FACTOR;
diff --git a/tools/xenstore/xenstored_domain.h b/tools/xenstore/xenstored_domain.h
index 11569dbd1b..37e14d1c8c 100644
--- a/tools/xenstore/xenstored_domain.h
+++ b/tools/xenstore/xenstored_domain.h
@@ -23,7 +23,10 @@ enum accitem {
 	ACC_NODES,
 	ACC_REQ_N,       /* Number of elements per request and domain. */
 	ACC_TR_N = ACC_REQ_N, /* Number of elements per transaction and domain. */
-	ACC_N = ACC_TR_N /* Number of elements per domain. */
+	ACC_WATCH = ACC_TR_N,
+	ACC_OUTST,
+	ACC_MEM,
+	ACC_N            /* Number of elements per domain. */
 };
 
 void handle_event(void);
@@ -102,9 +105,8 @@ static inline void domain_memory_add_nochk(struct connection *conn,
 void domain_watch_inc(struct connection *conn);
 void domain_watch_dec(struct connection *conn);
 int domain_watch(struct connection *conn);
-void domain_outstanding_inc(struct connection *conn);
-void domain_outstanding_dec(struct connection *conn);
-void domain_outstanding_domid_dec(unsigned int domid);
+void domain_outstanding_inc(struct connection *conn, unsigned int domid);
+void domain_outstanding_dec(struct connection *conn, unsigned int domid);
 int domain_get_quota(const void *ctx, struct connection *conn,
 		     unsigned int domid);
 int acc_fix_domains(struct list_head *head, bool update);
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 08/13] tools/xenstore: add accounting trace support
  2023-01-20 10:00 [PATCH v2 00/13] tools/xenstore: rework internal accounting Juergen Gross
                   ` (6 preceding siblings ...)
  2023-01-20 10:00 ` [PATCH v2 07/13] tools/xenstore: use accounting data array for per-domain values Juergen Gross
@ 2023-01-20 10:00 ` Juergen Gross
  2023-02-20 22:57   ` Julien Grall
  2023-01-20 10:00 ` [PATCH v2 09/13] tools/xenstore: add TDB access " Juergen Gross
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 40+ messages in thread
From: Juergen Gross @ 2023-01-20 10:00 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, Wei Liu, Julien Grall, Anthony PERARD

Add a new trace switch "acc" and the related trace calls.

The "acc" switch is off per default.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 tools/xenstore/xenstored_core.c   |  2 +-
 tools/xenstore/xenstored_core.h   |  1 +
 tools/xenstore/xenstored_domain.c | 10 ++++++++++
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 6ef60179fa..558ef491b1 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -2746,7 +2746,7 @@ static void set_quota(const char *arg, bool soft)
 
 /* Sorted by bit values of TRACE_* flags. Flag is (1u << index). */
 const char *const trace_switches[] = {
-	"obj", "io", "wrl",
+	"obj", "io", "wrl", "acc",
 	NULL
 };
 
diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
index 1f811f38cb..3e0734a6c6 100644
--- a/tools/xenstore/xenstored_core.h
+++ b/tools/xenstore/xenstored_core.h
@@ -302,6 +302,7 @@ extern unsigned int trace_flags;
 #define TRACE_OBJ	0x00000001
 #define TRACE_IO	0x00000002
 #define TRACE_WRL	0x00000004
+#define TRACE_ACC	0x00000008
 extern const char *const trace_switches[];
 int set_trace_switch(const char *arg);
 
diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
index b1e29edb7e..d461fd8cc8 100644
--- a/tools/xenstore/xenstored_domain.c
+++ b/tools/xenstore/xenstored_domain.c
@@ -538,6 +538,12 @@ static struct domain *find_domain_by_domid(unsigned int domid)
 	return (d && d->introduced) ? d : NULL;
 }
 
+#define trace_acc(...)				\
+do {						\
+	if (trace_flags & TRACE_ACC)		\
+		trace("acc: " __VA_ARGS__);	\
+} while (0)
+
 int acc_fix_domains(struct list_head *head, bool update)
 {
 	struct changed_domain *cd;
@@ -602,6 +608,8 @@ static int acc_add_changed_dom(const void *ctx, struct list_head *head,
 		return 0;
 
 	errno = 0;
+	trace_acc("local change domid %u: what=%u %d add %d\n", domid, what,
+		  cd->acc[what], val);
 	cd->acc[what] += val;
 
 	return cd->acc[what];
@@ -1114,6 +1122,8 @@ static int domain_acc_add(struct connection *conn, unsigned int domid,
 		return domain_acc_add_chk(d, what, ret, domid);
 	}
 
+	trace_acc("global change domid %u: what=%u %u add %d\n", domid, what,
+		  d->acc[what], add);
 	d->acc[what] = domain_acc_add_chk(d, what, add, domid);
 
 	return d->acc[what];
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 09/13] tools/xenstore: add TDB access trace support
  2023-01-20 10:00 [PATCH v2 00/13] tools/xenstore: rework internal accounting Juergen Gross
                   ` (7 preceding siblings ...)
  2023-01-20 10:00 ` [PATCH v2 08/13] tools/xenstore: add accounting trace support Juergen Gross
@ 2023-01-20 10:00 ` Juergen Gross
  2023-02-20 22:59   ` Julien Grall
  2023-01-20 10:00 ` [PATCH v2 10/13] tools/xenstore: switch transaction accounting to generic accounting Juergen Gross
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 40+ messages in thread
From: Juergen Gross @ 2023-01-20 10:00 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, Wei Liu, Julien Grall, Anthony PERARD

Add a new trace switch "tdb" and the related trace calls.

The "tdb" switch is off per default.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 tools/xenstore/xenstored_core.c        | 8 +++++++-
 tools/xenstore/xenstored_core.h        | 6 ++++++
 tools/xenstore/xenstored_transaction.c | 7 ++++++-
 3 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 558ef491b1..49e196e7ae 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -589,6 +589,8 @@ static void get_acc_data(TDB_DATA *key, struct node_account_data *acc)
 		if (old_data.dptr == NULL) {
 			acc->memory = 0;
 		} else {
+			trace_tdb("read %s size %zu\n", key->dptr,
+				  old_data.dsize + key->dsize);
 			hdr = (void *)old_data.dptr;
 			acc->memory = old_data.dsize;
 			acc->domid = hdr->perms[0].id;
@@ -655,6 +657,7 @@ int do_tdb_write(struct connection *conn, TDB_DATA *key, TDB_DATA *data,
 		errno = EIO;
 		return errno;
 	}
+	trace_tdb("store %s size %zu\n", key->dptr, data->dsize + key->dsize);
 
 	if (acc) {
 		/* Don't use new_domid, as it might be a transaction node. */
@@ -682,6 +685,7 @@ int do_tdb_delete(struct connection *conn, TDB_DATA *key,
 		errno = EIO;
 		return errno;
 	}
+	trace_tdb("delete %s\n", key->dptr);
 
 	if (acc->memory) {
 		domid = get_acc_domid(conn, key, acc->domid);
@@ -731,6 +735,8 @@ struct node *read_node(struct connection *conn, const void *ctx,
 		goto error;
 	}
 
+	trace_tdb("read %s size %zu\n", key.dptr, data.dsize + key.dsize);
+
 	node->parent = NULL;
 	talloc_steal(node, data.dptr);
 
@@ -2746,7 +2752,7 @@ static void set_quota(const char *arg, bool soft)
 
 /* Sorted by bit values of TRACE_* flags. Flag is (1u << index). */
 const char *const trace_switches[] = {
-	"obj", "io", "wrl", "acc",
+	"obj", "io", "wrl", "acc", "tdb",
 	NULL
 };
 
diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
index 3e0734a6c6..419a144396 100644
--- a/tools/xenstore/xenstored_core.h
+++ b/tools/xenstore/xenstored_core.h
@@ -303,8 +303,14 @@ extern unsigned int trace_flags;
 #define TRACE_IO	0x00000002
 #define TRACE_WRL	0x00000004
 #define TRACE_ACC	0x00000008
+#define TRACE_TDB	0x00000010
 extern const char *const trace_switches[];
 int set_trace_switch(const char *arg);
+#define trace_tdb(...)				\
+do {						\
+	if (trace_flags & TRACE_TDB)		\
+		trace("tdb: " __VA_ARGS__);	\
+} while (0)
 
 extern TDB_CONTEXT *tdb_ctx;
 extern int dom0_domid;
diff --git a/tools/xenstore/xenstored_transaction.c b/tools/xenstore/xenstored_transaction.c
index 1aa9d3cb3d..19a1175d1b 100644
--- a/tools/xenstore/xenstored_transaction.c
+++ b/tools/xenstore/xenstored_transaction.c
@@ -366,8 +366,11 @@ static int finalize_transaction(struct connection *conn,
 				if (tdb_error(tdb_ctx) != TDB_ERR_NOEXIST)
 					return EIO;
 				gen = NO_GENERATION;
-			} else
+			} else {
+				trace_tdb("read %s size %zu\n", key.dptr,
+					  key.dsize + data.dsize);
 				gen = hdr->generation;
+			}
 			talloc_free(data.dptr);
 			if (i->generation != gen)
 				return EAGAIN;
@@ -391,6 +394,8 @@ static int finalize_transaction(struct connection *conn,
 			set_tdb_key(i->trans_name, &ta_key);
 			data = tdb_fetch(tdb_ctx, ta_key);
 			if (data.dptr) {
+				trace_tdb("read %s size %zu\n", ta_key.dptr,
+					  ta_key.dsize + data.dsize);
 				hdr = (void *)data.dptr;
 				hdr->generation = ++generation;
 				*is_corrupt |= do_tdb_write(conn, &key, &data,
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 10/13] tools/xenstore: switch transaction accounting to generic accounting
  2023-01-20 10:00 [PATCH v2 00/13] tools/xenstore: rework internal accounting Juergen Gross
                   ` (8 preceding siblings ...)
  2023-01-20 10:00 ` [PATCH v2 09/13] tools/xenstore: add TDB access " Juergen Gross
@ 2023-01-20 10:00 ` Juergen Gross
  2023-01-20 10:00 ` [PATCH v2 11/13] tools/xenstore: remember global and per domain max accounting values Juergen Gross
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 40+ messages in thread
From: Juergen Gross @ 2023-01-20 10:00 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, Wei Liu, Julien Grall, Anthony PERARD

As transaction accounting is active for unprivileged domains only, it
can easily be added to the generic per-domain accounting.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 tools/xenstore/xenstored_core.c        |  3 +--
 tools/xenstore/xenstored_core.h        |  1 -
 tools/xenstore/xenstored_domain.c      | 21 ++++++++++++++++++---
 tools/xenstore/xenstored_domain.h      |  4 ++++
 tools/xenstore/xenstored_transaction.c | 12 +++++-------
 5 files changed, 28 insertions(+), 13 deletions(-)

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 49e196e7ae..5b85b69eb2 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -2083,7 +2083,7 @@ static void consider_message(struct connection *conn)
 	 * stalled. This will ignore new requests until Live-Update happened
 	 * or it was aborted.
 	 */
-	if (lu_is_pending() && conn->transaction_started == 0 &&
+	if (lu_is_pending() && conn->ta_start_time == 0 &&
 	    conn->in->hdr.msg.type == XS_TRANSACTION_START) {
 		trace("Delaying transaction start for connection %p req_id %u\n",
 		      conn, conn->in->hdr.msg.req_id);
@@ -2190,7 +2190,6 @@ struct connection *new_connection(const struct interface_funcs *funcs)
 	new->funcs = funcs;
 	new->is_ignored = false;
 	new->is_stalled = false;
-	new->transaction_started = 0;
 	INIT_LIST_HEAD(&new->out_list);
 	INIT_LIST_HEAD(&new->acc_list);
 	INIT_LIST_HEAD(&new->ref_list);
diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
index 419a144396..6465105b4d 100644
--- a/tools/xenstore/xenstored_core.h
+++ b/tools/xenstore/xenstored_core.h
@@ -151,7 +151,6 @@ struct connection
 	/* List of in-progress transactions. */
 	struct list_head transaction_list;
 	uint32_t next_transaction_id;
-	unsigned int transaction_started;
 	time_t ta_start_time;
 
 	/* List of delayed requests. */
diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
index d461fd8cc8..4d48e7a9f4 100644
--- a/tools/xenstore/xenstored_domain.c
+++ b/tools/xenstore/xenstored_domain.c
@@ -417,12 +417,10 @@ int domain_get_quota(const void *ctx, struct connection *conn,
 {
 	struct domain *d = find_domain_struct(domid);
 	char *resp;
-	int ta;
 
 	if (!d)
 		return ENOENT;
 
-	ta = d->conn ? d->conn->transaction_started : 0;
 	resp = talloc_asprintf(ctx, "Domain %u:\n", domid);
 	if (!resp)
 		return ENOMEM;
@@ -433,7 +431,7 @@ int domain_get_quota(const void *ctx, struct connection *conn,
 
 	ent(nodes, d->acc[ACC_NODES]);
 	ent(watches, d->acc[ACC_WATCH]);
-	ent(transactions, ta);
+	ent(transactions, d->acc[ACC_TRANS]);
 	ent(outstanding, d->acc[ACC_OUTST]);
 	ent(memory, d->acc[ACC_MEM]);
 
@@ -1295,6 +1293,23 @@ void domain_outstanding_dec(struct connection *conn, unsigned int domid)
 	domain_acc_add(conn, domid, ACC_OUTST, -1, true);
 }
 
+void domain_transaction_inc(struct connection *conn)
+{
+	domain_acc_add(conn, conn->id, ACC_TRANS, 1, true);
+}
+
+void domain_transaction_dec(struct connection *conn)
+{
+	domain_acc_add(conn, conn->id, ACC_TRANS, -1, true);
+}
+
+unsigned int domain_transaction_get(struct connection *conn)
+{
+	return (domain_is_unprivileged(conn))
+		? domain_acc_add(conn, conn->id, ACC_TRANS, 0, true)
+		: 0;
+}
+
 static wrl_creditt wrl_config_writecost      = WRL_FACTOR;
 static wrl_creditt wrl_config_rate           = WRL_RATE   * WRL_FACTOR;
 static wrl_creditt wrl_config_dburst         = WRL_DBURST * WRL_FACTOR;
diff --git a/tools/xenstore/xenstored_domain.h b/tools/xenstore/xenstored_domain.h
index 37e14d1c8c..e0562b0b4d 100644
--- a/tools/xenstore/xenstored_domain.h
+++ b/tools/xenstore/xenstored_domain.h
@@ -26,6 +26,7 @@ enum accitem {
 	ACC_WATCH = ACC_TR_N,
 	ACC_OUTST,
 	ACC_MEM,
+	ACC_TRANS,
 	ACC_N            /* Number of elements per domain. */
 };
 
@@ -107,6 +108,9 @@ void domain_watch_dec(struct connection *conn);
 int domain_watch(struct connection *conn);
 void domain_outstanding_inc(struct connection *conn, unsigned int domid);
 void domain_outstanding_dec(struct connection *conn, unsigned int domid);
+void domain_transaction_inc(struct connection *conn);
+void domain_transaction_dec(struct connection *conn);
+unsigned int domain_transaction_get(struct connection *conn);
 int domain_get_quota(const void *ctx, struct connection *conn,
 		     unsigned int domid);
 int acc_fix_domains(struct list_head *head, bool update);
diff --git a/tools/xenstore/xenstored_transaction.c b/tools/xenstore/xenstored_transaction.c
index 19a1175d1b..ce6a12b576 100644
--- a/tools/xenstore/xenstored_transaction.c
+++ b/tools/xenstore/xenstored_transaction.c
@@ -471,8 +471,7 @@ int do_transaction_start(const void *ctx, struct connection *conn,
 	if (conn->transaction)
 		return EBUSY;
 
-	if (domain_is_unprivileged(conn) &&
-	    conn->transaction_started > quota_max_transaction)
+	if (domain_transaction_get(conn) > quota_max_transaction)
 		return ENOSPC;
 
 	/* Attach transaction to ctx for autofree until it's complete */
@@ -497,9 +496,9 @@ int do_transaction_start(const void *ctx, struct connection *conn,
 	list_add_tail(&trans->list, &conn->transaction_list);
 	talloc_steal(conn, trans);
 	talloc_set_destructor(trans, destroy_transaction);
-	if (!conn->transaction_started)
+	if (!conn->ta_start_time)
 		conn->ta_start_time = time(NULL);
-	conn->transaction_started++;
+	domain_transaction_inc(conn);
 	wrl_ntransactions++;
 
 	snprintf(id_str, sizeof(id_str), "%u", trans->id);
@@ -524,8 +523,8 @@ int do_transaction_end(const void *ctx, struct connection *conn,
 
 	conn->transaction = NULL;
 	list_del(&trans->list);
-	conn->transaction_started--;
-	if (!conn->transaction_started)
+	domain_transaction_dec(conn);
+	if (list_empty(&conn->transaction_list))
 		conn->ta_start_time = 0;
 
 	/* Attach transaction to ctx for auto-cleanup */
@@ -576,7 +575,6 @@ void conn_delete_all_transactions(struct connection *conn)
 
 	assert(conn->transaction == NULL);
 
-	conn->transaction_started = 0;
 	conn->ta_start_time = 0;
 }
 
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 11/13] tools/xenstore: remember global and per domain max accounting values
  2023-01-20 10:00 [PATCH v2 00/13] tools/xenstore: rework internal accounting Juergen Gross
                   ` (9 preceding siblings ...)
  2023-01-20 10:00 ` [PATCH v2 10/13] tools/xenstore: switch transaction accounting to generic accounting Juergen Gross
@ 2023-01-20 10:00 ` Juergen Gross
  2023-01-20 10:00 ` [PATCH v2 12/13] tools/xenstore: use generic accounting for remaining quotas Juergen Gross
  2023-01-20 10:00 ` [PATCH v2 13/13] tools/xenstore: switch quota management to be table based Juergen Gross
  12 siblings, 0 replies; 40+ messages in thread
From: Juergen Gross @ 2023-01-20 10:00 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Andrew Cooper, George Dunlap, Jan Beulich,
	Julien Grall, Stefano Stabellini, Wei Liu, Anthony PERARD

Add saving the maximum values of the different accounting data seen
per domain and (for unprivileged domains) globally, and print those
values via the xenstore-control quota command. Add a sub-command for
resetting the global maximum values seen.

This should help for a decision how to set the related quotas.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 docs/misc/xenstore.txt             |   5 +-
 tools/xenstore/xenstored_control.c |  22 ++++++-
 tools/xenstore/xenstored_domain.c  | 100 +++++++++++++++++++++++------
 tools/xenstore/xenstored_domain.h  |   2 +
 4 files changed, 108 insertions(+), 21 deletions(-)

diff --git a/docs/misc/xenstore.txt b/docs/misc/xenstore.txt
index 8887e7df88..44a02f6724 100644
--- a/docs/misc/xenstore.txt
+++ b/docs/misc/xenstore.txt
@@ -423,7 +423,7 @@ CONTROL			<command>|[<parameters>|]
 	print|<string>
 		print <string> to syslog (xenstore runs as daemon) or
 		to console (xenstore runs as stubdom)
-	quota|[set <name> <val>|<domid>]
+	quota|[set <name> <val>|<domid>|max [-r]]
 		without parameters: print the current quota settings
 		with "set <name> <val>": set the quota <name> to new value
 		<val> (The admin should make sure all the domain usage is
@@ -432,6 +432,9 @@ CONTROL			<command>|[<parameters>|]
 		violating the new quota setting isn't increased further)
 		with "<domid>": print quota related accounting data for
 		the domain <domid>
+		with "max [-r]": show global per-domain maximum values of all
+		unprivileged domains, optionally reset the values by adding
+		"-r"
 	quota-soft|[set <name> <val>]
 		like the "quota" command, but for soft-quota.
 	help			<supported-commands>
diff --git a/tools/xenstore/xenstored_control.c b/tools/xenstore/xenstored_control.c
index cbd62556c3..a2ba64a15c 100644
--- a/tools/xenstore/xenstored_control.c
+++ b/tools/xenstore/xenstored_control.c
@@ -306,6 +306,22 @@ static int quota_get(const void *ctx, struct connection *conn,
 	return domain_get_quota(ctx, conn, atoi(vec[0]));
 }
 
+static int quota_max(const void *ctx, struct connection *conn,
+		     char **vec, int num)
+{
+	if (num > 1)
+		return EINVAL;
+
+	if (num == 1) {
+		if (!strcmp(vec[0], "-r"))
+			domain_reset_global_acc();
+		else
+			return EINVAL;
+	}
+
+	return domain_max_global_acc(ctx, conn);
+}
+
 static int do_control_quota(const void *ctx, struct connection *conn,
 			    char **vec, int num)
 {
@@ -315,6 +331,9 @@ static int do_control_quota(const void *ctx, struct connection *conn,
 	if (!strcmp(vec[0], "set"))
 		return quota_set(ctx, conn, vec + 1, num - 1, hard_quotas);
 
+	if (!strcmp(vec[0], "max"))
+		return quota_max(ctx, conn, vec + 1, num - 1);
+
 	return quota_get(ctx, conn, vec, num);
 }
 
@@ -978,7 +997,8 @@ static struct cmd_s cmds[] = {
 	{ "memreport", do_control_memreport, "[<file>]" },
 #endif
 	{ "print", do_control_print, "<string>" },
-	{ "quota", do_control_quota, "[set <name> <val>|<domid>]" },
+	{ "quota", do_control_quota,
+		"[set <name> <val>|<domid>|max [-r]]" },
 	{ "quota-soft", do_control_quota_s, "[set <name> <val>]" },
 	{ "help", do_control_help, "" },
 };
diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
index 4d48e7a9f4..91695b6313 100644
--- a/tools/xenstore/xenstored_domain.c
+++ b/tools/xenstore/xenstored_domain.c
@@ -43,6 +43,8 @@ static evtchn_port_t virq_port;
 
 xenevtchn_handle *xce_handle = NULL;
 
+static unsigned int acc_global_max[ACC_N];
+
 struct domain
 {
 	/* The id of this domain */
@@ -70,7 +72,10 @@ struct domain
 	bool introduced;
 
 	/* Accounting data for this domain. */
-	unsigned int acc[ACC_N];
+	struct acc {
+		unsigned int val;
+		unsigned int max;
+	} acc[ACC_N];
 
 	/* Memory quota data for this domain. */
 	bool soft_quota_reported;
@@ -199,9 +204,9 @@ static bool domain_can_read(struct connection *conn)
 	if (domain_is_unprivileged(conn)) {
 		if (domain->wrl_credit < 0)
 			return false;
-		if (domain->acc[ACC_OUTST] >= quota_req_outstanding)
+		if (domain->acc[ACC_OUTST].val >= quota_req_outstanding)
 			return false;
-		if (domain->acc[ACC_MEM] >= quota_memory_per_domain_hard &&
+		if (domain->acc[ACC_MEM].val >= quota_memory_per_domain_hard &&
 		    quota_memory_per_domain_hard)
 			return false;
 	}
@@ -264,7 +269,7 @@ static int domain_tree_remove_sub(const void *ctx, struct connection *conn,
 		ret = WALK_TREE_SKIP_CHILDREN;
 	}
 
-	return domain->acc[ACC_NODES] ? ret : WALK_TREE_SUCCESS_STOP;
+	return domain->acc[ACC_NODES].val ? ret : WALK_TREE_SUCCESS_STOP;
 }
 
 static void domain_tree_remove(struct domain *domain)
@@ -272,7 +277,7 @@ static void domain_tree_remove(struct domain *domain)
 	int ret;
 	struct walk_funcs walkfuncs = { .enter = domain_tree_remove_sub };
 
-	if (domain->acc[ACC_NODES]) {
+	if (domain->acc[ACC_NODES].val) {
 		ret = walk_node_tree(domain, NULL, "/", &walkfuncs, domain);
 		if (ret == WALK_TREE_ERROR_STOP)
 			syslog(LOG_ERR,
@@ -426,14 +431,41 @@ int domain_get_quota(const void *ctx, struct connection *conn,
 		return ENOMEM;
 
 #define ent(t, e) \
-	resp = talloc_asprintf_append(resp, "%-16s: %8d\n", #t, e); \
+	resp = talloc_asprintf_append(resp, "%-16s: %8u (max: %8u\n", #t, \
+				      d->acc[e].val, d->acc[e].max); \
+	if (!resp) return ENOMEM
+
+	ent(nodes, ACC_NODES);
+	ent(watches, ACC_WATCH);
+	ent(transactions, ACC_TRANS);
+	ent(outstanding, ACC_OUTST);
+	ent(memory, ACC_MEM);
+
+#undef ent
+
+	send_reply(conn, XS_CONTROL, resp, strlen(resp) + 1);
+
+	return 0;
+}
+
+int domain_max_global_acc(const void *ctx, struct connection *conn)
+{
+	char *resp;
+
+	resp = talloc_asprintf(ctx, "Max. seen accounting values:\n");
+	if (!resp)
+		return ENOMEM;
+
+#define ent(t, e) \
+	resp = talloc_asprintf_append(resp, "%-16s: %8u\n", #t,   \
+				      acc_global_max[e]);         \
 	if (!resp) return ENOMEM
 
-	ent(nodes, d->acc[ACC_NODES]);
-	ent(watches, d->acc[ACC_WATCH]);
-	ent(transactions, d->acc[ACC_TRANS]);
-	ent(outstanding, d->acc[ACC_OUTST]);
-	ent(memory, d->acc[ACC_MEM]);
+	ent(nodes, ACC_NODES);
+	ent(watches, ACC_WATCH);
+	ent(transactions, ACC_TRANS);
+	ent(outstanding, ACC_OUTST);
+	ent(memory, ACC_MEM);
 
 #undef ent
 
@@ -1051,10 +1083,12 @@ int domain_adjust_node_perms(struct node *node)
 static int domain_acc_add_chk(struct domain *d, enum accitem what, int add,
 			      unsigned int domid)
 {
+	unsigned int val;
+
 	assert(what < ARRAY_SIZE(d->acc));
 
-	if ((add < 0 && -add > d->acc[what]) ||
-	    (d->acc[what] + add) > INT_MAX) {
+	if ((add < 0 && -add > d->acc[what].val) ||
+	    (d->acc[what].val + add) > INT_MAX) {
 		/*
 		 * In a transaction when a node is being added/removed AND the
 		 * same node has been added/removed outside the transaction in
@@ -1065,7 +1099,13 @@ static int domain_acc_add_chk(struct domain *d, enum accitem what, int add,
 		return (add < 0) ? 0 : INT_MAX;
 	}
 
-	return d->acc[what] + add;
+	val = d->acc[what].val + add;
+	if (val > d->acc[what].max)
+		d->acc[what].max = val;
+	if (val > acc_global_max[what] && domid_is_unprivileged(domid))
+		acc_global_max[what] = val;
+
+	return val;
 }
 
 static int domain_acc_add(struct connection *conn, unsigned int domid,
@@ -1121,10 +1161,10 @@ static int domain_acc_add(struct connection *conn, unsigned int domid,
 	}
 
 	trace_acc("global change domid %u: what=%u %u add %d\n", domid, what,
-		  d->acc[what], add);
-	d->acc[what] = domain_acc_add_chk(d, what, add, domid);
+		  d->acc[what].val, add);
+	d->acc[what].val = domain_acc_add_chk(d, what, add, domid);
 
-	return d->acc[what];
+	return d->acc[what].val;
 }
 
 void acc_drop(struct connection *conn)
@@ -1157,6 +1197,28 @@ void acc_commit(struct connection *conn)
 	conn->in = in;
 }
 
+static int domain_reset_global_acc_sub(const void *k, void *v, void *arg)
+{
+	struct domain *d = v;
+	unsigned int i;
+
+	for (i = 0; i < ACC_N; i++)
+		d->acc[i].max = d->acc[i].val;
+
+	return 0;
+}
+
+void domain_reset_global_acc(void)
+{
+	unsigned int i;
+
+	for (i = 0; i < ACC_N; i++)
+		acc_global_max[i] = 0;
+
+	/* Set current max values seen. */
+	hashtable_iterate(domhash, domain_reset_global_acc_sub, NULL);
+}
+
 int domain_nbentry_inc(struct connection *conn, unsigned int domid)
 {
 	return (domain_acc_add(conn, domid, ACC_NODES, 1, false) < 0)
@@ -1657,7 +1719,7 @@ static int domain_check_acc_init_sub(const void *k, void *v, void *arg)
 	 * If everything is correct incrementing the value for each node will
 	 * result in dom->nodes being 0 at the end.
 	 */
-	dom->nodes = -d->acc[ACC_NODES];
+	dom->nodes = -d->acc[ACC_NODES].val;
 
 	if (!hashtable_insert(domains, &dom->domid, dom)) {
 		talloc_free(dom);
@@ -1712,7 +1774,7 @@ static int domain_check_acc_cb(const void *k, void *v, void *arg)
 	if (!d)
 		return 0;
 
-	d->acc[ACC_NODES] += dom->nodes;
+	d->acc[ACC_NODES].val += dom->nodes;
 
 	return 0;
 }
diff --git a/tools/xenstore/xenstored_domain.h b/tools/xenstore/xenstored_domain.h
index e0562b0b4d..162e7dc0d0 100644
--- a/tools/xenstore/xenstored_domain.h
+++ b/tools/xenstore/xenstored_domain.h
@@ -116,6 +116,8 @@ int domain_get_quota(const void *ctx, struct connection *conn,
 int acc_fix_domains(struct list_head *head, bool update);
 void acc_drop(struct connection *conn);
 void acc_commit(struct connection *conn);
+int domain_max_global_acc(const void *ctx, struct connection *conn);
+void domain_reset_global_acc(void);
 
 /* Write rate limiting */
 
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 12/13] tools/xenstore: use generic accounting for remaining quotas
  2023-01-20 10:00 [PATCH v2 00/13] tools/xenstore: rework internal accounting Juergen Gross
                   ` (10 preceding siblings ...)
  2023-01-20 10:00 ` [PATCH v2 11/13] tools/xenstore: remember global and per domain max accounting values Juergen Gross
@ 2023-01-20 10:00 ` Juergen Gross
  2023-01-20 10:00 ` [PATCH v2 13/13] tools/xenstore: switch quota management to be table based Juergen Gross
  12 siblings, 0 replies; 40+ messages in thread
From: Juergen Gross @ 2023-01-20 10:00 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, Wei Liu, Julien Grall, Anthony PERARD

The maxrequests, node size, number of node permissions, and path length
quota are a little bit special, as they are either active in
transactions only (maxrequests), or they are just per item instead of
count values. Nevertheless being able to know the maximum number of
those quota related values per domain would be beneficial, so add them
to the generic accounting.

The per domain value will never show current numbers other than zero,
but the maximum number seen can be gathered the same way as the number
of nodes during a transaction.

To be able to use the const qualifier for a new function switch
domain_is_unprivileged() to take a const pointer, too.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 tools/xenstore/xenstored_core.c        | 14 ++++-----
 tools/xenstore/xenstored_core.h        |  2 +-
 tools/xenstore/xenstored_domain.c      | 39 ++++++++++++++++++++------
 tools/xenstore/xenstored_domain.h      |  6 ++++
 tools/xenstore/xenstored_transaction.c |  4 +--
 tools/xenstore/xenstored_watch.c       |  2 +-
 6 files changed, 48 insertions(+), 19 deletions(-)

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 5b85b69eb2..c34de5ca3a 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -799,8 +799,8 @@ int write_node_raw(struct connection *conn, TDB_DATA *key, struct node *node,
 		+ node->perms.num * sizeof(node->perms.p[0])
 		+ node->datalen + node->childlen;
 
-	if (!no_quota_check && domain_is_unprivileged(conn) &&
-	    data.dsize >= quota_max_entry_size) {
+	if (domain_max_chk(conn, ACC_NODESZ, data.dsize, quota_max_entry_size)
+	    && !no_quota_check) {
 		errno = ENOSPC;
 		return errno;
 	}
@@ -1168,7 +1168,7 @@ static bool valid_chars(const char *node)
 		       "0123456789-/_@") == strlen(node));
 }
 
-bool is_valid_nodename(const char *node)
+bool is_valid_nodename(const struct connection *conn, const char *node)
 {
 	int local_off = 0;
 	unsigned int domid;
@@ -1188,7 +1188,8 @@ bool is_valid_nodename(const char *node)
 	if (sscanf(node, "/local/domain/%5u/%n", &domid, &local_off) != 1)
 		local_off = 0;
 
-	if (strlen(node) > local_off + quota_max_path_len)
+	if (domain_max_chk(conn, ACC_PATHLEN, strlen(node) - local_off,
+			   quota_max_path_len))
 		return false;
 
 	return valid_chars(node);
@@ -1250,7 +1251,7 @@ static struct node *get_node_canonicalized(struct connection *conn,
 	*canonical_name = canonicalize(conn, ctx, name);
 	if (!*canonical_name)
 		return NULL;
-	if (!is_valid_nodename(*canonical_name)) {
+	if (!is_valid_nodename(conn, *canonical_name)) {
 		errno = EINVAL;
 		return NULL;
 	}
@@ -1775,8 +1776,7 @@ static int do_set_perms(const void *ctx, struct connection *conn,
 		return EINVAL;
 
 	perms.num--;
-	if (domain_is_unprivileged(conn) &&
-	    perms.num > quota_nb_perms_per_node)
+	if (domain_max_chk(conn, ACC_NPERM, perms.num, quota_nb_perms_per_node))
 		return ENOSPC;
 
 	permstr = in->buffer + strlen(in->buffer) + 1;
diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
index 6465105b4d..0140c25880 100644
--- a/tools/xenstore/xenstored_core.h
+++ b/tools/xenstore/xenstored_core.h
@@ -258,7 +258,7 @@ void check_store(void);
 void corrupt(struct connection *conn, const char *fmt, ...);
 
 /* Is this a valid node name? */
-bool is_valid_nodename(const char *node);
+bool is_valid_nodename(const struct connection *conn, const char *node);
 
 /* Get name of parent node. */
 char *get_parent(const void *ctx, const char *node);
diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
index 91695b6313..f431076505 100644
--- a/tools/xenstore/xenstored_domain.c
+++ b/tools/xenstore/xenstored_domain.c
@@ -431,7 +431,7 @@ int domain_get_quota(const void *ctx, struct connection *conn,
 		return ENOMEM;
 
 #define ent(t, e) \
-	resp = talloc_asprintf_append(resp, "%-16s: %8u (max: %8u\n", #t, \
+	resp = talloc_asprintf_append(resp, "%-17s: %8u (max: %8u\n", #t, \
 				      d->acc[e].val, d->acc[e].max); \
 	if (!resp) return ENOMEM
 
@@ -440,6 +440,7 @@ int domain_get_quota(const void *ctx, struct connection *conn,
 	ent(transactions, ACC_TRANS);
 	ent(outstanding, ACC_OUTST);
 	ent(memory, ACC_MEM);
+	ent(transaction-nodes, ACC_TRANSNODES);
 
 #undef ent
 
@@ -457,7 +458,7 @@ int domain_max_global_acc(const void *ctx, struct connection *conn)
 		return ENOMEM;
 
 #define ent(t, e) \
-	resp = talloc_asprintf_append(resp, "%-16s: %8u\n", #t,   \
+	resp = talloc_asprintf_append(resp, "%-17s: %8u\n", #t,   \
 				      acc_global_max[e]);         \
 	if (!resp) return ENOMEM
 
@@ -466,6 +467,7 @@ int domain_max_global_acc(const void *ctx, struct connection *conn)
 	ent(transactions, ACC_TRANS);
 	ent(outstanding, ACC_OUTST);
 	ent(memory, ACC_MEM);
+	ent(transaction-nodes, ACC_TRANSNODES);
 
 #undef ent
 
@@ -1080,13 +1082,23 @@ int domain_adjust_node_perms(struct node *node)
 	return 0;
 }
 
+static void domain_acc_chk_max(struct domain *d, enum accitem what,
+			       unsigned int val, unsigned int domid)
+{
+	assert(what < ARRAY_SIZE(d->acc));
+	assert(what < ARRAY_SIZE(acc_global_max));
+
+	if (val > d->acc[what].max)
+		d->acc[what].max = val;
+	if (val > acc_global_max[what] && domid_is_unprivileged(domid))
+		acc_global_max[what] = val;
+}
+
 static int domain_acc_add_chk(struct domain *d, enum accitem what, int add,
 			      unsigned int domid)
 {
 	unsigned int val;
 
-	assert(what < ARRAY_SIZE(d->acc));
-
 	if ((add < 0 && -add > d->acc[what].val) ||
 	    (d->acc[what].val + add) > INT_MAX) {
 		/*
@@ -1100,10 +1112,7 @@ static int domain_acc_add_chk(struct domain *d, enum accitem what, int add,
 	}
 
 	val = d->acc[what].val + add;
-	if (val > d->acc[what].max)
-		d->acc[what].max = val;
-	if (val > acc_global_max[what] && domid_is_unprivileged(domid))
-		acc_global_max[what] = val;
+	domain_acc_chk_max(d, what, val, domid);
 
 	return val;
 }
@@ -1219,6 +1228,20 @@ void domain_reset_global_acc(void)
 	hashtable_iterate(domhash, domain_reset_global_acc_sub, NULL);
 }
 
+bool domain_max_chk(const struct connection *conn, enum accitem what,
+		    unsigned int val, unsigned int quota)
+{
+	if (!conn || !conn->domain)
+		return false;
+
+	if (domain_is_unprivileged(conn) && val > quota)
+		return true;
+
+	domain_acc_chk_max(conn->domain, what, val, conn->id);
+
+	return false;
+}
+
 int domain_nbentry_inc(struct connection *conn, unsigned int domid)
 {
 	return (domain_acc_add(conn, domid, ACC_NODES, 1, false) < 0)
diff --git a/tools/xenstore/xenstored_domain.h b/tools/xenstore/xenstored_domain.h
index 162e7dc0d0..ff341dd8bf 100644
--- a/tools/xenstore/xenstored_domain.h
+++ b/tools/xenstore/xenstored_domain.h
@@ -27,6 +27,10 @@ enum accitem {
 	ACC_OUTST,
 	ACC_MEM,
 	ACC_TRANS,
+	ACC_TRANSNODES,
+	ACC_NPERM,
+	ACC_PATHLEN,
+	ACC_NODESZ,
 	ACC_N            /* Number of elements per domain. */
 };
 
@@ -118,6 +122,8 @@ void acc_drop(struct connection *conn);
 void acc_commit(struct connection *conn);
 int domain_max_global_acc(const void *ctx, struct connection *conn);
 void domain_reset_global_acc(void);
+bool domain_max_chk(const struct connection *conn, unsigned int what,
+		    unsigned int val, unsigned int quota);
 
 /* Write rate limiting */
 
diff --git a/tools/xenstore/xenstored_transaction.c b/tools/xenstore/xenstored_transaction.c
index ce6a12b576..7967770ca2 100644
--- a/tools/xenstore/xenstored_transaction.c
+++ b/tools/xenstore/xenstored_transaction.c
@@ -244,8 +244,8 @@ int access_node(struct connection *conn, struct node *node,
 
 	i = find_accessed_node(trans, node->name);
 	if (!i) {
-		if (trans->nodes >= quota_trans_nodes &&
-		    domain_is_unprivileged(conn)) {
+		if (domain_max_chk(conn, ACC_TRANSNODES, trans->nodes + 1,
+				   quota_trans_nodes)) {
 			ret = ENOSPC;
 			goto err;
 		}
diff --git a/tools/xenstore/xenstored_watch.c b/tools/xenstore/xenstored_watch.c
index e30cd89be3..61b1e3421e 100644
--- a/tools/xenstore/xenstored_watch.c
+++ b/tools/xenstore/xenstored_watch.c
@@ -176,7 +176,7 @@ static int check_watch_path(struct connection *conn, const void *ctx,
 		*path = canonicalize(conn, ctx, *path);
 		if (!*path)
 			return errno;
-		if (!is_valid_nodename(*path))
+		if (!is_valid_nodename(conn, *path))
 			goto inval;
 	}
 
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 13/13] tools/xenstore: switch quota management to be table based
  2023-01-20 10:00 [PATCH v2 00/13] tools/xenstore: rework internal accounting Juergen Gross
                   ` (11 preceding siblings ...)
  2023-01-20 10:00 ` [PATCH v2 12/13] tools/xenstore: use generic accounting for remaining quotas Juergen Gross
@ 2023-01-20 10:00 ` Juergen Gross
  12 siblings, 0 replies; 40+ messages in thread
From: Juergen Gross @ 2023-01-20 10:00 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, Wei Liu, Julien Grall, Anthony PERARD

Instead of having individual quota variables switch to a table based
approach like the generic accounting. Include all the related data in
the same table and add accessor functions.

This enables to use the command line --quota parameter for setting all
possible quota values, keeping the previous parameters for
compatibility.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
V2:
- new patch
One further remark: it would be rather easy to add soft-quota for all
the other quotas (similar to the memory one). This could be used as
an early warning for the need to raise global quota.
---
 tools/xenstore/xenstored_control.c     |  43 ++------
 tools/xenstore/xenstored_core.c        |  85 ++++++++--------
 tools/xenstore/xenstored_core.h        |  10 --
 tools/xenstore/xenstored_domain.c      | 132 +++++++++++++++++--------
 tools/xenstore/xenstored_domain.h      |  12 ++-
 tools/xenstore/xenstored_transaction.c |   5 +-
 tools/xenstore/xenstored_watch.c       |   2 +-
 7 files changed, 155 insertions(+), 134 deletions(-)

diff --git a/tools/xenstore/xenstored_control.c b/tools/xenstore/xenstored_control.c
index a2ba64a15c..75f51a80db 100644
--- a/tools/xenstore/xenstored_control.c
+++ b/tools/xenstore/xenstored_control.c
@@ -221,35 +221,6 @@ static int do_control_log(const void *ctx, struct connection *conn,
 	return 0;
 }
 
-struct quota {
-	const char *name;
-	int *quota;
-	const char *descr;
-};
-
-static const struct quota hard_quotas[] = {
-	{ "nodes", &quota_nb_entry_per_domain, "Nodes per domain" },
-	{ "watches", &quota_nb_watch_per_domain, "Watches per domain" },
-	{ "transactions", &quota_max_transaction, "Transactions per domain" },
-	{ "outstanding", &quota_req_outstanding,
-		"Outstanding requests per domain" },
-	{ "transaction-nodes", &quota_trans_nodes,
-		"Max. number of accessed nodes per transaction" },
-	{ "memory", &quota_memory_per_domain_hard,
-		"Total Xenstore memory per domain (error level)" },
-	{ "node-size", &quota_max_entry_size, "Max. size of a node" },
-	{ "path-max", &quota_max_path_len, "Max. length of a node path" },
-	{ "permissions", &quota_nb_perms_per_node,
-		"Max. number of permissions per node" },
-	{ NULL, NULL, NULL }
-};
-
-static const struct quota soft_quotas[] = {
-	{ "memory", &quota_memory_per_domain_soft,
-		"Total Xenstore memory per domain (warning level)" },
-	{ NULL, NULL, NULL }
-};
-
 static int quota_show_current(const void *ctx, struct connection *conn,
 			      const struct quota *quotas)
 {
@@ -260,9 +231,11 @@ static int quota_show_current(const void *ctx, struct connection *conn,
 	if (!resp)
 		return ENOMEM;
 
-	for (i = 0; quotas[i].quota; i++) {
+	for (i = 0; i < ACC_N; i++) {
+		if (!quotas[i].name)
+			continue;
 		resp = talloc_asprintf_append(resp, "%-17s: %8d %s\n",
-					      quotas[i].name, *quotas[i].quota,
+					      quotas[i].name, quotas[i].val,
 					      quotas[i].descr);
 		if (!resp)
 			return ENOMEM;
@@ -274,7 +247,7 @@ static int quota_show_current(const void *ctx, struct connection *conn,
 }
 
 static int quota_set(const void *ctx, struct connection *conn,
-		     char **vec, int num, const struct quota *quotas)
+		     char **vec, int num, struct quota *quotas)
 {
 	unsigned int i;
 	int val;
@@ -286,9 +259,9 @@ static int quota_set(const void *ctx, struct connection *conn,
 	if (val < 1)
 		return EINVAL;
 
-	for (i = 0; quotas[i].quota; i++) {
-		if (!strcmp(vec[0], quotas[i].name)) {
-			*quotas[i].quota = val;
+	for (i = 0; i < ACC_N; i++) {
+		if (quotas[i].name && !strcmp(vec[0], quotas[i].name)) {
+			quotas[i].val = val;
 			send_ack(conn, XS_CONTROL);
 			return 0;
 		}
diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index c34de5ca3a..6d71cb5a2c 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -89,17 +89,6 @@ unsigned int trace_flags = TRACE_OBJ | TRACE_IO;
 
 static const char *sockmsg_string(enum xsd_sockmsg_type type);
 
-int quota_nb_entry_per_domain = 1000;
-int quota_nb_watch_per_domain = 128;
-int quota_max_entry_size = 2048; /* 2K */
-int quota_max_transaction = 10;
-int quota_nb_perms_per_node = 5;
-int quota_trans_nodes = 1024;
-int quota_max_path_len = XENSTORE_REL_PATH_MAX;
-int quota_req_outstanding = 20;
-int quota_memory_per_domain_soft = 2 * 1024 * 1024; /* 2 MB */
-int quota_memory_per_domain_hard = 2 * 1024 * 1024 + 512 * 1024; /* 2.5 MB */
-
 unsigned int timeout_watch_event_msec = 20000;
 
 void trace(const char *fmt, ...)
@@ -799,7 +788,7 @@ int write_node_raw(struct connection *conn, TDB_DATA *key, struct node *node,
 		+ node->perms.num * sizeof(node->perms.p[0])
 		+ node->datalen + node->childlen;
 
-	if (domain_max_chk(conn, ACC_NODESZ, data.dsize, quota_max_entry_size)
+	if (domain_max_chk(conn, ACC_NODESZ, data.dsize)
 	    && !no_quota_check) {
 		errno = ENOSPC;
 		return errno;
@@ -1188,8 +1177,7 @@ bool is_valid_nodename(const struct connection *conn, const char *node)
 	if (sscanf(node, "/local/domain/%5u/%n", &domid, &local_off) != 1)
 		local_off = 0;
 
-	if (domain_max_chk(conn, ACC_PATHLEN, strlen(node) - local_off,
-			   quota_max_path_len))
+	if (domain_max_chk(conn, ACC_PATHLEN, strlen(node) - local_off))
 		return false;
 
 	return valid_chars(node);
@@ -1501,7 +1489,7 @@ static struct node *create_node(struct connection *conn, const void *ctx,
 	for (i = node; i; i = i->parent) {
 		/* i->parent is set for each new node, so check quota. */
 		if (i->parent &&
-		    domain_nbentry(conn) >= quota_nb_entry_per_domain) {
+		    domain_nbentry(conn) >= hard_quotas[ACC_NODES].val) {
 			ret = ENOSPC;
 			goto err;
 		}
@@ -1776,7 +1764,7 @@ static int do_set_perms(const void *ctx, struct connection *conn,
 		return EINVAL;
 
 	perms.num--;
-	if (domain_max_chk(conn, ACC_NPERM, perms.num, quota_nb_perms_per_node))
+	if (domain_max_chk(conn, ACC_NPERM, perms.num))
 		return ENOSPC;
 
 	permstr = in->buffer + strlen(in->buffer) + 1;
@@ -2644,7 +2632,16 @@ static void usage(void)
 "                          memory: total used memory per domain for nodes,\n"
 "                                  transactions, watches and requests, above\n"
 "                                  which Xenstore will stop talking to domain\n"
+"                          nodes: number nodes owned by a domain\n"
+"                          node-permissions: number of access permissions per\n"
+"                                            node\n"
+"                          node-size: total size of a node (permissions +\n"
+"                                     children names + content)\n"
 "                          outstanding: number of outstanding requests\n"
+"                          path-length: length of a node path\n"
+"                          transactions: number of concurrent transactions\n"
+"                                        per domain\n"
+"                          watches: number of watches per domain"
 "  -q, --quota-soft <what>=<nb> set a soft quota <what> to the value <nb>,\n"
 "                          causing a warning to be issued via syslog() if the\n"
 "                          limit is violated, allowed quotas are:\n"
@@ -2695,12 +2692,12 @@ int dom0_domid = 0;
 int dom0_event = 0;
 int priv_domid = 0;
 
-static int get_optval_int(const char *arg)
+static unsigned int get_optval_int(const char *arg)
 {
 	char *end;
-	long val;
+	unsigned long val;
 
-	val = strtol(arg, &end, 10);
+	val = strtoul(arg, &end, 10);
 	if (!*arg || *end || val < 0 || val > INT_MAX)
 		barf("invalid parameter value \"%s\"\n", arg);
 
@@ -2709,15 +2706,19 @@ static int get_optval_int(const char *arg)
 
 static bool what_matches(const char *arg, const char *what)
 {
-	unsigned int what_len = strlen(what);
+	unsigned int what_len;
+
+	if (!what)
+		false;
 
+	what_len = strlen(what);
 	return !strncmp(arg, what, what_len) && arg[what_len] == '=';
 }
 
 static void set_timeout(const char *arg)
 {
 	const char *eq = strchr(arg, '=');
-	int val;
+	unsigned int val;
 
 	if (!eq)
 		barf("quotas must be specified via <what>=<seconds>\n");
@@ -2731,22 +2732,22 @@ static void set_timeout(const char *arg)
 static void set_quota(const char *arg, bool soft)
 {
 	const char *eq = strchr(arg, '=');
-	int val;
+	struct quota *q = soft ? soft_quotas : hard_quotas;
+	unsigned int val;
+	unsigned int i;
 
 	if (!eq)
 		barf("quotas must be specified via <what>=<nb>\n");
 	val = get_optval_int(eq + 1);
-	if (what_matches(arg, "outstanding") && !soft)
-		quota_req_outstanding = val;
-	else if (what_matches(arg, "transaction-nodes") && !soft)
-		quota_trans_nodes = val;
-	else if (what_matches(arg, "memory")) {
-		if (soft)
-			quota_memory_per_domain_soft = val;
-		else
-			quota_memory_per_domain_hard = val;
-	} else
-		barf("unknown quota \"%s\"\n", arg);
+
+	for (i = 0; i < ACC_N; i++) {
+		if (what_matches(arg, q[i].name)) {
+			q[i].val = val;
+			return;
+		}
+	}
+
+	barf("unknown quota \"%s\"\n", arg);
 }
 
 /* Sorted by bit values of TRACE_* flags. Flag is (1u << index). */
@@ -2808,7 +2809,7 @@ int main(int argc, char *argv[])
 			no_domain_init = true;
 			break;
 		case 'E':
-			quota_nb_entry_per_domain = strtol(optarg, NULL, 10);
+			hard_quotas[ACC_NODES].val = strtoul(optarg, NULL, 10);
 			break;
 		case 'F':
 			pidfile = optarg;
@@ -2826,10 +2827,10 @@ int main(int argc, char *argv[])
 			recovery = false;
 			break;
 		case 'S':
-			quota_max_entry_size = strtol(optarg, NULL, 10);
+			hard_quotas[ACC_NODESZ].val = strtoul(optarg, NULL, 10);
 			break;
 		case 't':
-			quota_max_transaction = strtol(optarg, NULL, 10);
+			hard_quotas[ACC_TRANS].val = strtoul(optarg, NULL, 10);
 			break;
 		case 'T':
 			tracefile = optarg;
@@ -2849,15 +2850,17 @@ int main(int argc, char *argv[])
 			verbose = true;
 			break;
 		case 'W':
-			quota_nb_watch_per_domain = strtol(optarg, NULL, 10);
+			hard_quotas[ACC_WATCH].val = strtoul(optarg, NULL, 10);
 			break;
 		case 'A':
-			quota_nb_perms_per_node = strtol(optarg, NULL, 10);
+			hard_quotas[ACC_NPERM].val = strtoul(optarg, NULL, 10);
 			break;
 		case 'M':
-			quota_max_path_len = strtol(optarg, NULL, 10);
-			quota_max_path_len = min(XENSTORE_REL_PATH_MAX,
-						 quota_max_path_len);
+			hard_quotas[ACC_PATHLEN].val =
+				strtoul(optarg, NULL, 10);
+			hard_quotas[ACC_PATHLEN].val =
+				 min((unsigned int)XENSTORE_REL_PATH_MAX,
+				     hard_quotas[ACC_PATHLEN].val);
 			break;
 		case 'Q':
 			set_quota(optarg, false);
diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
index 0140c25880..693622ec68 100644
--- a/tools/xenstore/xenstored_core.h
+++ b/tools/xenstore/xenstored_core.h
@@ -315,16 +315,6 @@ extern TDB_CONTEXT *tdb_ctx;
 extern int dom0_domid;
 extern int dom0_event;
 extern int priv_domid;
-extern int quota_nb_watch_per_domain;
-extern int quota_max_transaction;
-extern int quota_max_entry_size;
-extern int quota_nb_perms_per_node;
-extern int quota_max_path_len;
-extern int quota_nb_entry_per_domain;
-extern int quota_req_outstanding;
-extern int quota_trans_nodes;
-extern int quota_memory_per_domain_soft;
-extern int quota_memory_per_domain_hard;
 extern bool keep_orphans;
 
 extern unsigned int timeout_watch_event_msec;
diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
index f431076505..3906047e6b 100644
--- a/tools/xenstore/xenstored_domain.c
+++ b/tools/xenstore/xenstored_domain.c
@@ -43,7 +43,61 @@ static evtchn_port_t virq_port;
 
 xenevtchn_handle *xce_handle = NULL;
 
-static unsigned int acc_global_max[ACC_N];
+struct quota hard_quotas[ACC_N] = {
+	[ACC_NODES] = {
+		.name = "nodes",
+		.descr = "Nodes per domain",
+		.val = 1000,
+	},
+	[ACC_WATCH] = {
+		.name = "watches",
+		.descr = "Watches per domain",
+		.val = 128,
+	},
+	[ACC_OUTST] = {
+		.name = "outstanding",
+		.descr = "Outstanding requests per domain",
+		.val = 20,
+	},
+	[ACC_MEM] = {
+		.name = "memory",
+		.descr = "Total Xenstore memory per domain (error level)",
+		.val = 2 * 1024 * 1024 + 512 * 1024,	/* 2.5 MB */
+	},
+	[ACC_TRANS] = {
+		.name = "transactions",
+		.descr = "Active transactions per domain",
+		.val = 10,
+	},
+	[ACC_TRANSNODES] = {
+		.name = "transaction-nodes",
+		.descr = "Max. number of accessed nodes per transaction",
+		.val = 1024,
+	},
+	[ACC_NPERM] = {
+		.name = "node-permissions",
+		.descr = "Max. number of permissions per node",
+		.val = 5,
+	},
+	[ACC_PATHLEN] = {
+		.name = "path-max",
+		.descr = "Max. length of a node path",
+		.val = XENSTORE_REL_PATH_MAX,
+	},
+	[ACC_NODESZ] = {
+		.name = "node-size",
+		.descr = "Max. size of a node",
+		.val = 2048,
+	},
+};
+
+struct quota soft_quotas[ACC_N] = {
+	[ACC_MEM] = {
+		.name = "memory",
+		.descr = "Total Xenstore memory per domain (warning level)",
+		.val = 2 * 1024 * 1024,			/* 2.0 MB */
+	},
+};
 
 struct domain
 {
@@ -204,10 +258,10 @@ static bool domain_can_read(struct connection *conn)
 	if (domain_is_unprivileged(conn)) {
 		if (domain->wrl_credit < 0)
 			return false;
-		if (domain->acc[ACC_OUTST].val >= quota_req_outstanding)
+		if (domain->acc[ACC_OUTST].val >= hard_quotas[ACC_OUTST].val)
 			return false;
-		if (domain->acc[ACC_MEM].val >= quota_memory_per_domain_hard &&
-		    quota_memory_per_domain_hard)
+		if (domain->acc[ACC_MEM].val >= hard_quotas[ACC_MEM].val &&
+		    hard_quotas[ACC_MEM].val)
 			return false;
 	}
 
@@ -422,6 +476,7 @@ int domain_get_quota(const void *ctx, struct connection *conn,
 {
 	struct domain *d = find_domain_struct(domid);
 	char *resp;
+	unsigned int i;
 
 	if (!d)
 		return ENOENT;
@@ -430,19 +485,15 @@ int domain_get_quota(const void *ctx, struct connection *conn,
 	if (!resp)
 		return ENOMEM;
 
-#define ent(t, e) \
-	resp = talloc_asprintf_append(resp, "%-17s: %8u (max: %8u\n", #t, \
-				      d->acc[e].val, d->acc[e].max); \
-	if (!resp) return ENOMEM
-
-	ent(nodes, ACC_NODES);
-	ent(watches, ACC_WATCH);
-	ent(transactions, ACC_TRANS);
-	ent(outstanding, ACC_OUTST);
-	ent(memory, ACC_MEM);
-	ent(transaction-nodes, ACC_TRANSNODES);
-
-#undef ent
+	for (i = 0; i < ACC_N; i++) {
+		if (!hard_quotas[i].name)
+			continue;
+		resp = talloc_asprintf_append(resp, "%-17s: %8u (max %8u)\n",
+					      hard_quotas[i].name,
+					      d->acc[i].val, d->acc[i].max);
+		if (!resp)
+			return ENOMEM;
+	}
 
 	send_reply(conn, XS_CONTROL, resp, strlen(resp) + 1);
 
@@ -452,24 +503,21 @@ int domain_get_quota(const void *ctx, struct connection *conn,
 int domain_max_global_acc(const void *ctx, struct connection *conn)
 {
 	char *resp;
+	unsigned int i;
 
 	resp = talloc_asprintf(ctx, "Max. seen accounting values:\n");
 	if (!resp)
 		return ENOMEM;
 
-#define ent(t, e) \
-	resp = talloc_asprintf_append(resp, "%-17s: %8u\n", #t,   \
-				      acc_global_max[e]);         \
-	if (!resp) return ENOMEM
-
-	ent(nodes, ACC_NODES);
-	ent(watches, ACC_WATCH);
-	ent(transactions, ACC_TRANS);
-	ent(outstanding, ACC_OUTST);
-	ent(memory, ACC_MEM);
-	ent(transaction-nodes, ACC_TRANSNODES);
-
-#undef ent
+	for (i = 0; i < ACC_N; i++) {
+		if (!hard_quotas[i].name)
+			continue;
+		resp = talloc_asprintf_append(resp, "%-17s: %8u\n",
+					      hard_quotas[i].name,
+					      hard_quotas[i].max);
+		if (!resp)
+			return ENOMEM;
+	}
 
 	send_reply(conn, XS_CONTROL, resp, strlen(resp) + 1);
 
@@ -584,7 +632,7 @@ int acc_fix_domains(struct list_head *head, bool update)
 	list_for_each_entry(cd, head, list) {
 		cnt = domain_nbentry_fix(cd->domid, cd->acc[ACC_NODES], update);
 		if (!update) {
-			if (cnt >= quota_nb_entry_per_domain)
+			if (cnt >= hard_quotas[ACC_NODES].val)
 				return ENOSPC;
 			if (cnt < 0)
 				return ENOMEM;
@@ -1086,12 +1134,12 @@ static void domain_acc_chk_max(struct domain *d, enum accitem what,
 			       unsigned int val, unsigned int domid)
 {
 	assert(what < ARRAY_SIZE(d->acc));
-	assert(what < ARRAY_SIZE(acc_global_max));
+	assert(what < ARRAY_SIZE(hard_quotas));
 
 	if (val > d->acc[what].max)
 		d->acc[what].max = val;
-	if (val > acc_global_max[what] && domid_is_unprivileged(domid))
-		acc_global_max[what] = val;
+	if (val > hard_quotas[what].max && domid_is_unprivileged(domid))
+		hard_quotas[what].max = val;
 }
 
 static int domain_acc_add_chk(struct domain *d, enum accitem what, int add,
@@ -1222,19 +1270,19 @@ void domain_reset_global_acc(void)
 	unsigned int i;
 
 	for (i = 0; i < ACC_N; i++)
-		acc_global_max[i] = 0;
+		hard_quotas[i].max = 0;
 
 	/* Set current max values seen. */
 	hashtable_iterate(domhash, domain_reset_global_acc_sub, NULL);
 }
 
 bool domain_max_chk(const struct connection *conn, enum accitem what,
-		    unsigned int val, unsigned int quota)
+		    unsigned int val)
 {
 	if (!conn || !conn->domain)
 		return false;
 
-	if (domain_is_unprivileged(conn) && val > quota)
+	if (domain_is_unprivileged(conn) && val > hard_quotas[what].val)
 		return true;
 
 	domain_acc_chk_max(conn->domain, what, val, conn->id);
@@ -1283,8 +1331,7 @@ static bool domain_chk_quota(struct connection *conn, unsigned int mem)
 	domain = conn->domain;
 	now = time(NULL);
 
-	if (mem >= quota_memory_per_domain_hard &&
-	    quota_memory_per_domain_hard) {
+	if (mem >= hard_quotas[ACC_MEM].val && hard_quotas[ACC_MEM].val) {
 		if (domain->hard_quota_reported)
 			return true;
 		syslog(LOG_ERR, "Domain %u exceeds hard memory quota, Xenstore interface to domain stalled\n",
@@ -1301,15 +1348,14 @@ static bool domain_chk_quota(struct connection *conn, unsigned int mem)
 			syslog(LOG_INFO, "Domain %u below hard memory quota again\n",
 			       domain->domid);
 		}
-		if (mem >= quota_memory_per_domain_soft &&
-		    quota_memory_per_domain_soft &&
-		    !domain->soft_quota_reported) {
+		if (mem >= soft_quotas[ACC_MEM].val &&
+		    soft_quotas[ACC_MEM].val && !domain->soft_quota_reported) {
 			domain->mem_last_msg = now;
 			domain->soft_quota_reported = true;
 			syslog(LOG_WARNING, "Domain %u exceeds soft memory quota\n",
 			       domain->domid);
 		}
-		if (mem < quota_memory_per_domain_soft &&
+		if (mem < soft_quotas[ACC_MEM].val &&
 		    domain->soft_quota_reported) {
 			domain->mem_last_msg = now;
 			domain->soft_quota_reported = false;
diff --git a/tools/xenstore/xenstored_domain.h b/tools/xenstore/xenstored_domain.h
index ff341dd8bf..3989d4a038 100644
--- a/tools/xenstore/xenstored_domain.h
+++ b/tools/xenstore/xenstored_domain.h
@@ -34,6 +34,16 @@ enum accitem {
 	ACC_N            /* Number of elements per domain. */
 };
 
+struct quota {
+	const char *name;
+	const char *descr;
+	unsigned int val;
+	unsigned int max;
+};
+
+extern struct quota hard_quotas[ACC_N];
+extern struct quota soft_quotas[ACC_N];
+
 void handle_event(void);
 
 void check_domains(void);
@@ -123,7 +133,7 @@ void acc_commit(struct connection *conn);
 int domain_max_global_acc(const void *ctx, struct connection *conn);
 void domain_reset_global_acc(void);
 bool domain_max_chk(const struct connection *conn, unsigned int what,
-		    unsigned int val, unsigned int quota);
+		    unsigned int val);
 
 /* Write rate limiting */
 
diff --git a/tools/xenstore/xenstored_transaction.c b/tools/xenstore/xenstored_transaction.c
index 7967770ca2..13fabe030d 100644
--- a/tools/xenstore/xenstored_transaction.c
+++ b/tools/xenstore/xenstored_transaction.c
@@ -244,8 +244,7 @@ int access_node(struct connection *conn, struct node *node,
 
 	i = find_accessed_node(trans, node->name);
 	if (!i) {
-		if (domain_max_chk(conn, ACC_TRANSNODES, trans->nodes + 1,
-				   quota_trans_nodes)) {
+		if (domain_max_chk(conn, ACC_TRANSNODES, trans->nodes + 1)) {
 			ret = ENOSPC;
 			goto err;
 		}
@@ -471,7 +470,7 @@ int do_transaction_start(const void *ctx, struct connection *conn,
 	if (conn->transaction)
 		return EBUSY;
 
-	if (domain_transaction_get(conn) > quota_max_transaction)
+	if (domain_transaction_get(conn) > hard_quotas[ACC_TRANS].val)
 		return ENOSPC;
 
 	/* Attach transaction to ctx for autofree until it's complete */
diff --git a/tools/xenstore/xenstored_watch.c b/tools/xenstore/xenstored_watch.c
index 61b1e3421e..e8eb35de02 100644
--- a/tools/xenstore/xenstored_watch.c
+++ b/tools/xenstore/xenstored_watch.c
@@ -239,7 +239,7 @@ int do_watch(const void *ctx, struct connection *conn, struct buffered_data *in)
 			return EEXIST;
 	}
 
-	if (domain_watch(conn) > quota_nb_watch_per_domain)
+	if (domain_watch(conn) > hard_quotas[ACC_WATCH].val)
 		return E2BIG;
 
 	watch = add_watch(conn, vec[0], vec[1], relative, false);
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 02/13] tools/xenstore: manage per-transaction domain accounting data in an array
  2023-01-20 10:00 ` [PATCH v2 02/13] tools/xenstore: manage per-transaction domain accounting data in an array Juergen Gross
@ 2023-02-17 18:49   ` Julien Grall
  0 siblings, 0 replies; 40+ messages in thread
From: Julien Grall @ 2023-02-17 18:49 UTC (permalink / raw)
  To: Juergen Gross, xen-devel; +Cc: Wei Liu, Anthony PERARD

Hi Juergen,

On 20/01/2023 10:00, Juergen Gross wrote:
> In order to prepare keeping accounting data in an array instead of
> using independent fields, switch the struct changed_domain accounting
> data to that scheme, for now only using an array with one element.
> 
> In order to be able to extend this scheme add the needed indexing enum
> to xenstored_domain.h.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>

Reviewed-by: Julien Grall <jgrall@amazon.com>

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 03/13] tools/xenstore: introduce accounting data array for per-domain values
  2023-01-20 10:00 ` [PATCH v2 03/13] tools/xenstore: introduce accounting data array for per-domain values Juergen Gross
@ 2023-02-17 19:29   ` Julien Grall
  2023-02-20 11:20     ` Juergen Gross
  0 siblings, 1 reply; 40+ messages in thread
From: Julien Grall @ 2023-02-17 19:29 UTC (permalink / raw)
  To: Juergen Gross, xen-devel; +Cc: Wei Liu, Anthony PERARD

Hi Juergen,

On 20/01/2023 10:00, Juergen Gross wrote:
> Introduce the scheme of an accounting data array for per-domain
> accounting data and use it initially for the number of nodes owned by
> a domain.
> 
> Make the accounting data type to be unsigned int, as no data is allowed
> to be negative at any time.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
>   tools/xenstore/xenstored_domain.c | 71 ++++++++++++++++++-------------
>   tools/xenstore/xenstored_domain.h |  5 ++-
>   2 files changed, 45 insertions(+), 31 deletions(-)
> 
> diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
> index 44e72937fa..f459c5aabb 100644
> --- a/tools/xenstore/xenstored_domain.c
> +++ b/tools/xenstore/xenstored_domain.c
> @@ -69,8 +69,8 @@ struct domain
>   	/* Has domain been officially introduced? */
>   	bool introduced;
>   
> -	/* number of entry from this domain in the store */
> -	int nbentry;
> +	/* Accounting data for this domain. */
> +	unsigned int acc[ACC_N];
>   
>   	/* Amount of memory allocated for this domain. */
>   	int memory;
> @@ -246,7 +246,7 @@ static int domain_tree_remove_sub(const void *ctx, struct connection *conn,
>   
>   	if (keep_orphans) {
>   		set_tdb_key(node->name, &key);
> -		domain->nbentry--;
> +		domain_nbentry_dec(NULL, domain->domid);
>   		node->perms.p[0].id = priv_domid;
>   		node->acc.memory = 0;
>   		domain_nbentry_inc(NULL, priv_domid);
> @@ -270,7 +270,7 @@ static int domain_tree_remove_sub(const void *ctx, struct connection *conn,
>   		ret = WALK_TREE_SKIP_CHILDREN;
>   	}
>   
> -	return domain->nbentry > 0 ? ret : WALK_TREE_SUCCESS_STOP;
> +	return domain->acc[ACC_NODES] ? ret : WALK_TREE_SUCCESS_STOP;
>   }
>   
>   static void domain_tree_remove(struct domain *domain)
> @@ -278,7 +278,7 @@ static void domain_tree_remove(struct domain *domain)
>   	int ret;
>   	struct walk_funcs walkfuncs = { .enter = domain_tree_remove_sub };
>   
> -	if (domain->nbentry > 0) {
> +	if (domain->acc[ACC_NODES]) {
>   		ret = walk_node_tree(domain, NULL, "/", &walkfuncs, domain);
>   		if (ret == WALK_TREE_ERROR_STOP)
>   			syslog(LOG_ERR,
> @@ -437,7 +437,7 @@ int domain_get_quota(const void *ctx, struct connection *conn,
>   	resp = talloc_asprintf_append(resp, "%-16s: %8d\n", #t, e); \
>   	if (!resp) return ENOMEM
>   
> -	ent(nodes, d->nbentry);
> +	ent(nodes, d->acc[ACC_NODES]);
>   	ent(watches, d->nbwatch);
>   	ent(transactions, ta);
>   	ent(outstanding, d->nboutstanding);
> @@ -1047,8 +1047,28 @@ int domain_adjust_node_perms(struct node *node)
>   	return 0;
>   }
>   
> -static int domain_nbentry_add(struct connection *conn, unsigned int domid,
> -			      int add, bool no_dom_alloc)
> +static int domain_acc_add_chk(struct domain *d, enum accitem what, int add,
> +			      unsigned int domid)

You are passing the domid but this doesn't seem to be used within the 
function. Also, from just reading at this prototype, it is not clear to 
me whether 'domid' is meant to correspond to the one of 'd'.

The name is also a bit confusing because below you have a function call 
domain_acc_add() that will update "d->acc[what]" so I would expect this 
function to also update it after a sanity get.

I would suggest to rename it to domain_acc_get_chk() or similar (see 
below for some context).

> +{
> +	assert(what < ARRAY_SIZE(d->acc));
> +
> +	if ((add < 0 && -add > d->acc[what]) ||
> +	    (d->acc[what] + add) > INT_MAX) {

NIT: Even if I know that 'add' will unlikely be INT_MAX, it would be 
better to use '(INT_MAX - d->acc[what]) < add)'. So there is no overflow 
possible.

> +		/*
> +		 * In a transaction when a node is being added/removed AND the
> +		 * same node has been added/removed outside the transaction in
> +		 * parallel, the resulting value will be wrong. This is no
> +		 * problem, as the transaction will fail due to the resulting
> +		 * conflict.
> +		 */
> +		return (add < 0) ? 0 : INT_MAX;
> +	}
> +
> +	return d->acc[what] + add;
> +}
> +
> +static int domain_acc_add(struct connection *conn, unsigned int domid,
> +			  enum accitem what, int add, bool no_dom_alloc)
>   {
>   	struct domain *d;
>   	struct list_head *head;
> @@ -1071,56 +1091,49 @@ static int domain_nbentry_add(struct connection *conn, unsigned int domid,
>   		}
>   	}
>   
> -	if (conn && conn->transaction) {
> +	if (conn && conn->transaction && what < ACC_TR_N) {

Do you have a use case where 'what' is >= ACC_TR_N and you want to 
modify d->acc?

>   		head = transaction_get_changed_domains(conn->transaction);
> -		ret = acc_add_changed_dom(conn->transaction, head, ACC_NODES,
> +		ret = acc_add_changed_dom(conn->transaction, head, what,
>   					  add, domid);
>   		if (errno) {
>   			fail_transaction(conn->transaction);
>   			return -1;
>   		}
> -		/*
> -		 * In a transaction when a node is being added/removed AND the
> -		 * same node has been added/removed outside the transaction in
> -		 * parallel, the resulting number of nodes will be wrong. This
> -		 * is no problem, as the transaction will fail due to the
> -		 * resulting conflict.
> -		 * In the node remove case the resulting number can be even
> -		 * negative, which should be avoided.
> -		 */
> -		return max(d->nbentry + ret, 0);
> +		return domain_acc_add_chk(d, what, ret, domid);

I was going to ask here why we are updating d->acc[what]. However the 
function is not doing what I was expecting from the name. You are only 
returning the number of entries adjusted.

>   	}
>   
> -	d->nbentry += add;
> +	d->acc[what] = domain_acc_add_chk(d, what, add, domid);
>   
> -	return d->nbentry;
> +	return d->acc[what];
>   }
>   
>   int domain_nbentry_inc(struct connection *conn, unsigned int domid)
>   {
> -	return (domain_nbentry_add(conn, domid, 1, false) < 0) ? errno : 0;
> +	return (domain_acc_add(conn, domid, ACC_NODES, 1, false) < 0)
> +	       ? errno : 0;
>   }
>   
>   int domain_nbentry_dec(struct connection *conn, unsigned int domid)
>   {
> -	return (domain_nbentry_add(conn, domid, -1, true) < 0) ? errno : 0;
> +	return (domain_acc_add(conn, domid, ACC_NODES, -1, true) < 0)
> +	       ? errno : 0;
>   }
>   
>   int domain_nbentry_fix(unsigned int domid, int num, bool update)
>   {
>   	int ret;
>   
> -	ret = domain_nbentry_add(NULL, domid, update ? num : 0, update);
> +	ret = domain_acc_add(NULL, domid, ACC_NODES, update ? num : 0, update);
>   	if (ret < 0 || update)
>   		return ret;
>   
>   	return domid_is_unprivileged(domid) ? ret + num : 0;
>   }
>   
> -int domain_nbentry(struct connection *conn)
> +unsigned int domain_nbentry(struct connection *conn)
>   {
>   	return domain_is_unprivileged(conn)
> -	       ? domain_nbentry_add(conn, conn->id, 0, true) : 0;
> +	       ? domain_acc_add(conn, conn->id, ACC_NODES, 0, true) : 0;
>   }
>   
>   static bool domain_chk_quota(struct domain *domain, int mem)
> @@ -1597,7 +1610,7 @@ static int domain_check_acc_init_sub(const void *k, void *v, void *arg)
>   	 * If everything is correct incrementing the value for each node will
>   	 * result in dom->nodes being 0 at the end.
>   	 */
> -	dom->nodes = -d->nbentry;
> +	dom->nodes = -d->acc[ACC_NODES];
>   
>   	if (!hashtable_insert(domains, &dom->domid, dom)) {
>   		talloc_free(dom);
> @@ -1652,7 +1665,7 @@ static int domain_check_acc_cb(const void *k, void *v, void *arg)
>   	if (!d)
>   		return 0;
>   
> -	d->nbentry += dom->nodes;
> +	d->acc[ACC_NODES] += dom->nodes;
>   
>   	return 0;
>   }
> diff --git a/tools/xenstore/xenstored_domain.h b/tools/xenstore/xenstored_domain.h
> index 6a2b76a85b..8259c114b0 100644
> --- a/tools/xenstore/xenstored_domain.h
> +++ b/tools/xenstore/xenstored_domain.h
> @@ -21,7 +21,8 @@
>   
>   enum accitem {
>   	ACC_NODES,
> -	ACC_TR_N	/* Number of elements per transaction and domain. */
> +	ACC_TR_N,        /* Number of elements per transaction and domain. */

The churn here could have been avoided if you add a "," even for the edn 
element and properly indented the comment in the original patch.

Also, was the comment indented to be updated to remove "and domain"?


> +	ACC_N = ACC_TR_N /* Number of elements per domain. */
>   };
>   
>   void handle_event(void);
> @@ -72,7 +73,7 @@ int domain_alloc_permrefs(struct node_perms *perms);
>   int domain_nbentry_inc(struct connection *conn, unsigned int domid);
>   int domain_nbentry_dec(struct connection *conn, unsigned int domid);
>   int domain_nbentry_fix(unsigned int domid, int num, bool update);
> -int domain_nbentry(struct connection *conn);
> +unsigned int domain_nbentry(struct connection *conn);
>   int domain_memory_add(unsigned int domid, int mem, bool no_quota_check);
>   
>   /*

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 01/13] tools/xenstore: don't allow creating too many nodes in a transaction
  2023-01-20 10:00 ` [PATCH v2 01/13] tools/xenstore: don't allow creating too many nodes in a transaction Juergen Gross
@ 2023-02-20  9:46   ` Julien Grall
  2023-02-20 11:04     ` Juergen Gross
  0 siblings, 1 reply; 40+ messages in thread
From: Julien Grall @ 2023-02-20  9:46 UTC (permalink / raw)
  To: Juergen Gross, xen-devel; +Cc: Wei Liu, Anthony PERARD

Hi Juergen,

On 20/01/2023 10:00, Juergen Gross wrote:
> The accounting for the number of nodes of a domain in an active
> transaction is not working correctly, as it allows to create arbitrary
> number of nodes. The transaction will finally fail due to exceeding
> the number of nodes quota, but before closing the transaction an
> unprivileged guest could cause Xenstore to use a lot of memory.

I know I said I would delay my decision on this patch. However, I was 
still expecting the commit message to be updated based on our previous 
discussion.

Also thinking more about it, "The transaction will finally fail due to 
exceeding the number of nodes quota" may not be true for a couple of 
reasons:
   1) The transaction may removed a node afterwards.
   2) A node may have been removed outside of the transaction.

In both situation, the transaction will still be committed. This will 
now be prevented by this patch.

While I understand, they may be edge cases, this is also true for what 
you are aiming to solve. So I am still not convinced about the benefits 
of this patch.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 01/13] tools/xenstore: don't allow creating too many nodes in a transaction
  2023-02-20  9:46   ` Julien Grall
@ 2023-02-20 11:04     ` Juergen Gross
  2023-02-20 12:07       ` Julien Grall
  0 siblings, 1 reply; 40+ messages in thread
From: Juergen Gross @ 2023-02-20 11:04 UTC (permalink / raw)
  To: Julien Grall, xen-devel; +Cc: Wei Liu, Anthony PERARD


[-- Attachment #1.1.1: Type: text/plain, Size: 1776 bytes --]

On 20.02.23 10:46, Julien Grall wrote:
> Hi Juergen,
> 
> On 20/01/2023 10:00, Juergen Gross wrote:
>> The accounting for the number of nodes of a domain in an active
>> transaction is not working correctly, as it allows to create arbitrary
>> number of nodes. The transaction will finally fail due to exceeding
>> the number of nodes quota, but before closing the transaction an
>> unprivileged guest could cause Xenstore to use a lot of memory.
> 
> I know I said I would delay my decision on this patch. However, I was still 
> expecting the commit message to be updated based on our previous discussion.

As said before, I was waiting on the settlement of our discussion before
doing the update.

> Also thinking more about it, "The transaction will finally fail due to exceeding 
> the number of nodes quota" may not be true for a couple of reasons:
>    1) The transaction may removed a node afterwards.

Yes, and? Just because it is a transaction, this is still a violation of
the quota. Even outside a transaction you could use the same reasoning,
but you don't (which is correct, of course).

In case you never finish the transaction, you are owner of more than
allowed nodes.

>    2) A node may have been removed outside of the transaction.

If the removed node hasn't been touched by the transaction, it will be
accounted for correctly. If it has been touched, the transaction will
fail anyway.

> In both situation, the transaction will still be committed. This will now be 
> prevented by this patch.

As said above, only in the first case.

> While I understand, they may be edge cases, this is also true for what you are 
> aiming to solve. So I am still not convinced about the benefits of this patch.


Juergen


[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3149 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 03/13] tools/xenstore: introduce accounting data array for per-domain values
  2023-02-17 19:29   ` Julien Grall
@ 2023-02-20 11:20     ` Juergen Gross
  2023-02-20 12:13       ` Julien Grall
  0 siblings, 1 reply; 40+ messages in thread
From: Juergen Gross @ 2023-02-20 11:20 UTC (permalink / raw)
  To: Julien Grall, xen-devel; +Cc: Wei Liu, Anthony PERARD


[-- Attachment #1.1.1: Type: text/plain, Size: 10664 bytes --]

On 17.02.23 20:29, Julien Grall wrote:
> Hi Juergen,
> 
> On 20/01/2023 10:00, Juergen Gross wrote:
>> Introduce the scheme of an accounting data array for per-domain
>> accounting data and use it initially for the number of nodes owned by
>> a domain.
>>
>> Make the accounting data type to be unsigned int, as no data is allowed
>> to be negative at any time.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>>   tools/xenstore/xenstored_domain.c | 71 ++++++++++++++++++-------------
>>   tools/xenstore/xenstored_domain.h |  5 ++-
>>   2 files changed, 45 insertions(+), 31 deletions(-)
>>
>> diff --git a/tools/xenstore/xenstored_domain.c 
>> b/tools/xenstore/xenstored_domain.c
>> index 44e72937fa..f459c5aabb 100644
>> --- a/tools/xenstore/xenstored_domain.c
>> +++ b/tools/xenstore/xenstored_domain.c
>> @@ -69,8 +69,8 @@ struct domain
>>       /* Has domain been officially introduced? */
>>       bool introduced;
>> -    /* number of entry from this domain in the store */
>> -    int nbentry;
>> +    /* Accounting data for this domain. */
>> +    unsigned int acc[ACC_N];
>>       /* Amount of memory allocated for this domain. */
>>       int memory;
>> @@ -246,7 +246,7 @@ static int domain_tree_remove_sub(const void *ctx, struct 
>> connection *conn,
>>       if (keep_orphans) {
>>           set_tdb_key(node->name, &key);
>> -        domain->nbentry--;
>> +        domain_nbentry_dec(NULL, domain->domid);
>>           node->perms.p[0].id = priv_domid;
>>           node->acc.memory = 0;
>>           domain_nbentry_inc(NULL, priv_domid);
>> @@ -270,7 +270,7 @@ static int domain_tree_remove_sub(const void *ctx, struct 
>> connection *conn,
>>           ret = WALK_TREE_SKIP_CHILDREN;
>>       }
>> -    return domain->nbentry > 0 ? ret : WALK_TREE_SUCCESS_STOP;
>> +    return domain->acc[ACC_NODES] ? ret : WALK_TREE_SUCCESS_STOP;
>>   }
>>   static void domain_tree_remove(struct domain *domain)
>> @@ -278,7 +278,7 @@ static void domain_tree_remove(struct domain *domain)
>>       int ret;
>>       struct walk_funcs walkfuncs = { .enter = domain_tree_remove_sub };
>> -    if (domain->nbentry > 0) {
>> +    if (domain->acc[ACC_NODES]) {
>>           ret = walk_node_tree(domain, NULL, "/", &walkfuncs, domain);
>>           if (ret == WALK_TREE_ERROR_STOP)
>>               syslog(LOG_ERR,
>> @@ -437,7 +437,7 @@ int domain_get_quota(const void *ctx, struct connection 
>> *conn,
>>       resp = talloc_asprintf_append(resp, "%-16s: %8d\n", #t, e); \
>>       if (!resp) return ENOMEM
>> -    ent(nodes, d->nbentry);
>> +    ent(nodes, d->acc[ACC_NODES]);
>>       ent(watches, d->nbwatch);
>>       ent(transactions, ta);
>>       ent(outstanding, d->nboutstanding);
>> @@ -1047,8 +1047,28 @@ int domain_adjust_node_perms(struct node *node)
>>       return 0;
>>   }
>> -static int domain_nbentry_add(struct connection *conn, unsigned int domid,
>> -                  int add, bool no_dom_alloc)
>> +static int domain_acc_add_chk(struct domain *d, enum accitem what, int add,
>> +                  unsigned int domid)
> 
> You are passing the domid but this doesn't seem to be used within the function.

That was added in order to avoid more code churn in a later patch, which is
making use of domid.

> Also, from just reading at this prototype, it is not clear to me whether 'domid' 
> is meant to correspond to the one of 'd'.

It is corresponding to 'd', so I think I can remove the domid parameter without
problem (I think I had it this way because d could be NULL in my initial
version, which changed afterwards - I'm not sure about the history, though).

> The name is also a bit confusing because below you have a function call 
> domain_acc_add() that will update "d->acc[what]" so I would expect this function 
> to also update it after a sanity get.
> 
> I would suggest to rename it to domain_acc_get_chk() or similar (see below for 
> some context).

I could do that, but check is done based on the current value plus an "add"
value, hence the current name. Would domain_acc_add_valid() be a better
name?

> 
>> +{
>> +    assert(what < ARRAY_SIZE(d->acc));
>> +
>> +    if ((add < 0 && -add > d->acc[what]) ||
>> +        (d->acc[what] + add) > INT_MAX) {
> 
> NIT: Even if I know that 'add' will unlikely be INT_MAX, it would be better to 
> use '(INT_MAX - d->acc[what]) < add)'. So there is no overflow possible.

Okay.

> 
>> +        /*
>> +         * In a transaction when a node is being added/removed AND the
>> +         * same node has been added/removed outside the transaction in
>> +         * parallel, the resulting value will be wrong. This is no
>> +         * problem, as the transaction will fail due to the resulting
>> +         * conflict.
>> +         */
>> +        return (add < 0) ? 0 : INT_MAX;
>> +    }
>> +
>> +    return d->acc[what] + add;
>> +}
>> +
>> +static int domain_acc_add(struct connection *conn, unsigned int domid,
>> +              enum accitem what, int add, bool no_dom_alloc)
>>   {
>>       struct domain *d;
>>       struct list_head *head;
>> @@ -1071,56 +1091,49 @@ static int domain_nbentry_add(struct connection *conn, 
>> unsigned int domid,
>>           }
>>       }
>> -    if (conn && conn->transaction) {
>> +    if (conn && conn->transaction && what < ACC_TR_N) {
> 
> Do you have a use case where 'what' is >= ACC_TR_N and you want to modify d->acc?

Yes, please see patch 7.

> 
>>           head = transaction_get_changed_domains(conn->transaction);
>> -        ret = acc_add_changed_dom(conn->transaction, head, ACC_NODES,
>> +        ret = acc_add_changed_dom(conn->transaction, head, what,
>>                         add, domid);
>>           if (errno) {
>>               fail_transaction(conn->transaction);
>>               return -1;
>>           }
>> -        /*
>> -         * In a transaction when a node is being added/removed AND the
>> -         * same node has been added/removed outside the transaction in
>> -         * parallel, the resulting number of nodes will be wrong. This
>> -         * is no problem, as the transaction will fail due to the
>> -         * resulting conflict.
>> -         * In the node remove case the resulting number can be even
>> -         * negative, which should be avoided.
>> -         */
>> -        return max(d->nbentry + ret, 0);
>> +        return domain_acc_add_chk(d, what, ret, domid);
> 
> I was going to ask here why we are updating d->acc[what]. However the function 
> is not doing what I was expecting from the name. You are only returning the 
> number of entries adjusted.

Huh?

In the transaction case acc_add_changed_dom() is updating the value in the
changed_domain data of the transaction, while in the non-transaction case ...

> 
>>       }
>> -    d->nbentry += add;
>> +    d->acc[what] = domain_acc_add_chk(d, what, add, domid);
>> -    return d->nbentry;
>> +    return d->acc[what];

... the domain data is updated here.

>>   }
>>   int domain_nbentry_inc(struct connection *conn, unsigned int domid)
>>   {
>> -    return (domain_nbentry_add(conn, domid, 1, false) < 0) ? errno : 0;
>> +    return (domain_acc_add(conn, domid, ACC_NODES, 1, false) < 0)
>> +           ? errno : 0;
>>   }
>>   int domain_nbentry_dec(struct connection *conn, unsigned int domid)
>>   {
>> -    return (domain_nbentry_add(conn, domid, -1, true) < 0) ? errno : 0;
>> +    return (domain_acc_add(conn, domid, ACC_NODES, -1, true) < 0)
>> +           ? errno : 0;
>>   }
>>   int domain_nbentry_fix(unsigned int domid, int num, bool update)
>>   {
>>       int ret;
>> -    ret = domain_nbentry_add(NULL, domid, update ? num : 0, update);
>> +    ret = domain_acc_add(NULL, domid, ACC_NODES, update ? num : 0, update);
>>       if (ret < 0 || update)
>>           return ret;
>>       return domid_is_unprivileged(domid) ? ret + num : 0;
>>   }
>> -int domain_nbentry(struct connection *conn)
>> +unsigned int domain_nbentry(struct connection *conn)
>>   {
>>       return domain_is_unprivileged(conn)
>> -           ? domain_nbentry_add(conn, conn->id, 0, true) : 0;
>> +           ? domain_acc_add(conn, conn->id, ACC_NODES, 0, true) : 0;
>>   }
>>   static bool domain_chk_quota(struct domain *domain, int mem)
>> @@ -1597,7 +1610,7 @@ static int domain_check_acc_init_sub(const void *k, void 
>> *v, void *arg)
>>        * If everything is correct incrementing the value for each node will
>>        * result in dom->nodes being 0 at the end.
>>        */
>> -    dom->nodes = -d->nbentry;
>> +    dom->nodes = -d->acc[ACC_NODES];
>>       if (!hashtable_insert(domains, &dom->domid, dom)) {
>>           talloc_free(dom);
>> @@ -1652,7 +1665,7 @@ static int domain_check_acc_cb(const void *k, void *v, 
>> void *arg)
>>       if (!d)
>>           return 0;
>> -    d->nbentry += dom->nodes;
>> +    d->acc[ACC_NODES] += dom->nodes;
>>       return 0;
>>   }
>> diff --git a/tools/xenstore/xenstored_domain.h 
>> b/tools/xenstore/xenstored_domain.h
>> index 6a2b76a85b..8259c114b0 100644
>> --- a/tools/xenstore/xenstored_domain.h
>> +++ b/tools/xenstore/xenstored_domain.h
>> @@ -21,7 +21,8 @@
>>   enum accitem {
>>       ACC_NODES,
>> -    ACC_TR_N    /* Number of elements per transaction and domain. */
>> +    ACC_TR_N,        /* Number of elements per transaction and domain. */
> 
> The churn here could have been avoided if you add a "," even for the edn element 
> and properly indented the comment in the original patch.

Okay.

> 
> Also, was the comment indented to be updated to remove "and domain"?

No, the new indentation was meant to keep the comment start aligned with
the one of the following line.

> 
> 
>> +    ACC_N = ACC_TR_N /* Number of elements per domain. */


Juergen


[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3149 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 01/13] tools/xenstore: don't allow creating too many nodes in a transaction
  2023-02-20 11:04     ` Juergen Gross
@ 2023-02-20 12:07       ` Julien Grall
  2023-02-20 13:49         ` Juergen Gross
  0 siblings, 1 reply; 40+ messages in thread
From: Julien Grall @ 2023-02-20 12:07 UTC (permalink / raw)
  To: Juergen Gross, xen-devel; +Cc: Wei Liu, Anthony PERARD

Hi Juergen,

On 20/02/2023 11:04, Juergen Gross wrote:
> On 20.02.23 10:46, Julien Grall wrote:
>> Hi Juergen,
>>
>> On 20/01/2023 10:00, Juergen Gross wrote:
>>> The accounting for the number of nodes of a domain in an active
>>> transaction is not working correctly, as it allows to create arbitrary
>>> number of nodes. The transaction will finally fail due to exceeding
>>> the number of nodes quota, but before closing the transaction an
>>> unprivileged guest could cause Xenstore to use a lot of memory.
>>
>> I know I said I would delay my decision on this patch. However, I was 
>> still expecting the commit message to be updated based on our previous 
>> discussion.
> 
> As said before, I was waiting on the settlement of our discussion before
> doing the update.

This is not a very good practice to resend a patch without recording the 
disagreement because new reviewers may not be aware of the disagreement 
and this could end up to be committed by mistake...

> 
>> Also thinking more about it, "The transaction will finally fail due to 
>> exceeding the number of nodes quota" may not be true for a couple of 
>> reasons:
>>    1) The transaction may removed a node afterwards.
> 
> Yes, and? Just because it is a transaction, this is still a violation of
> the quota. Even outside a transaction you could use the same reasoning,
> but you don't (which is correct, of course).

I can understand your point. However, to me, the goal of the transaction 
is to commit everything in one go at the end. So the violations in the 
middle should not matter.

Furthermore, I would expect a transaction to work on a snapshot of the 
system. So it feels really strange to me that we are constantly checking 
the quota with the updated values rather than the initial one.

> 
> In case you never finish the transaction, you are owner of more than
> allowed nodes.
How so? The nodes would not be committed so you don't really own them.
We also have quotas to limit the number of nodes accessed in a transaction.

> 
>>    2) A node may have been removed outside of the transaction.
> 
> If the removed node hasn't been touched by the transaction, it will be
> accounted for correctly.

It depends on when the node was removed. If it is removed *after* the 
quota was hit then your transaction will fail.

>  If it has been touched, the transaction will
> fail anyway.
So the transaction will fail to commit because there is a conflict. So 
the client is expected to retry it. We should not expected the client to 
retry for error like quota.

So the overall behavior is changed.

> 
>> In both situation, the transaction will still be committed. This will 
>> now be prevented by this patch.
> 
> As said above, only in the first case.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 03/13] tools/xenstore: introduce accounting data array for per-domain values
  2023-02-20 11:20     ` Juergen Gross
@ 2023-02-20 12:13       ` Julien Grall
  0 siblings, 0 replies; 40+ messages in thread
From: Julien Grall @ 2023-02-20 12:13 UTC (permalink / raw)
  To: Juergen Gross, xen-devel; +Cc: Wei Liu, Anthony PERARD



On 20/02/2023 11:20, Juergen Gross wrote:
> On 17.02.23 20:29, Julien Grall wrote:
>> Hi Juergen,
>>
>> On 20/01/2023 10:00, Juergen Gross wrote:
>>> Introduce the scheme of an accounting data array for per-domain
>>> accounting data and use it initially for the number of nodes owned by
>>> a domain.
>>>
>>> Make the accounting data type to be unsigned int, as no data is allowed
>>> to be negative at any time.
>>>
>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>> ---
>>>   tools/xenstore/xenstored_domain.c | 71 ++++++++++++++++++-------------
>>>   tools/xenstore/xenstored_domain.h |  5 ++-
>>>   2 files changed, 45 insertions(+), 31 deletions(-)
>>>
>>> diff --git a/tools/xenstore/xenstored_domain.c 
>>> b/tools/xenstore/xenstored_domain.c
>>> index 44e72937fa..f459c5aabb 100644
>>> --- a/tools/xenstore/xenstored_domain.c
>>> +++ b/tools/xenstore/xenstored_domain.c
>>> @@ -69,8 +69,8 @@ struct domain
>>>       /* Has domain been officially introduced? */
>>>       bool introduced;
>>> -    /* number of entry from this domain in the store */
>>> -    int nbentry;
>>> +    /* Accounting data for this domain. */
>>> +    unsigned int acc[ACC_N];
>>>       /* Amount of memory allocated for this domain. */
>>>       int memory;
>>> @@ -246,7 +246,7 @@ static int domain_tree_remove_sub(const void 
>>> *ctx, struct connection *conn,
>>>       if (keep_orphans) {
>>>           set_tdb_key(node->name, &key);
>>> -        domain->nbentry--;
>>> +        domain_nbentry_dec(NULL, domain->domid);
>>>           node->perms.p[0].id = priv_domid;
>>>           node->acc.memory = 0;
>>>           domain_nbentry_inc(NULL, priv_domid);
>>> @@ -270,7 +270,7 @@ static int domain_tree_remove_sub(const void 
>>> *ctx, struct connection *conn,
>>>           ret = WALK_TREE_SKIP_CHILDREN;
>>>       }
>>> -    return domain->nbentry > 0 ? ret : WALK_TREE_SUCCESS_STOP;
>>> +    return domain->acc[ACC_NODES] ? ret : WALK_TREE_SUCCESS_STOP;
>>>   }
>>>   static void domain_tree_remove(struct domain *domain)
>>> @@ -278,7 +278,7 @@ static void domain_tree_remove(struct domain 
>>> *domain)
>>>       int ret;
>>>       struct walk_funcs walkfuncs = { .enter = domain_tree_remove_sub };
>>> -    if (domain->nbentry > 0) {
>>> +    if (domain->acc[ACC_NODES]) {
>>>           ret = walk_node_tree(domain, NULL, "/", &walkfuncs, domain);
>>>           if (ret == WALK_TREE_ERROR_STOP)
>>>               syslog(LOG_ERR,
>>> @@ -437,7 +437,7 @@ int domain_get_quota(const void *ctx, struct 
>>> connection *conn,
>>>       resp = talloc_asprintf_append(resp, "%-16s: %8d\n", #t, e); \
>>>       if (!resp) return ENOMEM
>>> -    ent(nodes, d->nbentry);
>>> +    ent(nodes, d->acc[ACC_NODES]);
>>>       ent(watches, d->nbwatch);
>>>       ent(transactions, ta);
>>>       ent(outstanding, d->nboutstanding);
>>> @@ -1047,8 +1047,28 @@ int domain_adjust_node_perms(struct node *node)
>>>       return 0;
>>>   }
>>> -static int domain_nbentry_add(struct connection *conn, unsigned int 
>>> domid,
>>> -                  int add, bool no_dom_alloc)
>>> +static int domain_acc_add_chk(struct domain *d, enum accitem what, 
>>> int add,
>>> +                  unsigned int domid)
>>
>> You are passing the domid but this doesn't seem to be used within the 
>> function.
> 
> That was added in order to avoid more code churn in a later patch, which is
> making use of domid.
> 
>> Also, from just reading at this prototype, it is not clear to me 
>> whether 'domid' is meant to correspond to the one of 'd'.
> 
> It is corresponding to 'd', so I think I can remove the domid parameter 
> without
> problem (I think I had it this way because d could be NULL in my initial
> version, which changed afterwards - I'm not sure about the history, 
> though).

Ok. Please mention it in the commit message if you want to keep it.

> 
>> The name is also a bit confusing because below you have a function 
>> call domain_acc_add() that will update "d->acc[what]" so I would 
>> expect this function to also update it after a sanity get.
>>
>> I would suggest to rename it to domain_acc_get_chk() or similar (see 
>> below for some context).
> 
> I could do that, but check is done based on the current value plus an "add"
> value, hence the current name. Would domain_acc_add_valid() be a better
> name?

Fine with me.

> 
>>
>>> +{
>>> +    assert(what < ARRAY_SIZE(d->acc));
>>> +
>>> +    if ((add < 0 && -add > d->acc[what]) ||
>>> +        (d->acc[what] + add) > INT_MAX) {
>>
>> NIT: Even if I know that 'add' will unlikely be INT_MAX, it would be 
>> better to use '(INT_MAX - d->acc[what]) < add)'. So there is no 
>> overflow possible.
> 
> Okay.
> 
>>
>>> +        /*
>>> +         * In a transaction when a node is being added/removed AND the
>>> +         * same node has been added/removed outside the transaction in
>>> +         * parallel, the resulting value will be wrong. This is no
>>> +         * problem, as the transaction will fail due to the resulting
>>> +         * conflict.
>>> +         */
>>> +        return (add < 0) ? 0 : INT_MAX;
>>> +    }
>>> +
>>> +    return d->acc[what] + add;
>>> +}
>>> +
>>> +static int domain_acc_add(struct connection *conn, unsigned int domid,
>>> +              enum accitem what, int add, bool no_dom_alloc)
>>>   {
>>>       struct domain *d;
>>>       struct list_head *head;
>>> @@ -1071,56 +1091,49 @@ static int domain_nbentry_add(struct 
>>> connection *conn, unsigned int domid,
>>>           }
>>>       }
>>> -    if (conn && conn->transaction) {
>>> +    if (conn && conn->transaction && what < ACC_TR_N) {
>>
>> Do you have a use case where 'what' is >= ACC_TR_N and you want to 
>> modify d->acc?
> 
> Yes, please see patch 7.
> 
>>
>>>           head = transaction_get_changed_domains(conn->transaction);
>>> -        ret = acc_add_changed_dom(conn->transaction, head, ACC_NODES,
>>> +        ret = acc_add_changed_dom(conn->transaction, head, what,
>>>                         add, domid);
>>>           if (errno) {
>>>               fail_transaction(conn->transaction);
>>>               return -1;
>>>           }
>>> -        /*
>>> -         * In a transaction when a node is being added/removed AND the
>>> -         * same node has been added/removed outside the transaction in
>>> -         * parallel, the resulting number of nodes will be wrong. This
>>> -         * is no problem, as the transaction will fail due to the
>>> -         * resulting conflict.
>>> -         * In the node remove case the resulting number can be even
>>> -         * negative, which should be avoided.
>>> -         */
>>> -        return max(d->nbentry + ret, 0);
>>> +        return domain_acc_add_chk(d, what, ret, domid);
>>
>> I was going to ask here why we are updating d->acc[what]. However the 
>> function is not doing what I was expecting from the name. You are only 
>> returning the number of entries adjusted.
> 
> Huh?
> 
> In the transaction case acc_add_changed_dom() is updating the value in the
> changed_domain data of the transaction, while in the non-transaction 
> case ...

I was referring to the name of the function domain_acc_add_chk() which 
lead me to think the helper was doing something different. If you rename 
it, then the confusion is removed.

> 
>>
>>>       }
>>> -    d->nbentry += add;
>>> +    d->acc[what] = domain_acc_add_chk(d, what, add, domid);
>>> -    return d->nbentry;
>>> +    return d->acc[what];
> 
> ... the domain data is updated here.
> 
>>>   }
>>>   int domain_nbentry_inc(struct connection *conn, unsigned int domid)
>>>   {
>>> -    return (domain_nbentry_add(conn, domid, 1, false) < 0) ? errno : 0;
>>> +    return (domain_acc_add(conn, domid, ACC_NODES, 1, false) < 0)
>>> +           ? errno : 0;
>>>   }
>>>   int domain_nbentry_dec(struct connection *conn, unsigned int domid)
>>>   {
>>> -    return (domain_nbentry_add(conn, domid, -1, true) < 0) ? errno : 0;
>>> +    return (domain_acc_add(conn, domid, ACC_NODES, -1, true) < 0)
>>> +           ? errno : 0;
>>>   }
>>>   int domain_nbentry_fix(unsigned int domid, int num, bool update)
>>>   {
>>>       int ret;
>>> -    ret = domain_nbentry_add(NULL, domid, update ? num : 0, update);
>>> +    ret = domain_acc_add(NULL, domid, ACC_NODES, update ? num : 0, 
>>> update);
>>>       if (ret < 0 || update)
>>>           return ret;
>>>       return domid_is_unprivileged(domid) ? ret + num : 0;
>>>   }
>>> -int domain_nbentry(struct connection *conn)
>>> +unsigned int domain_nbentry(struct connection *conn)
>>>   {
>>>       return domain_is_unprivileged(conn)
>>> -           ? domain_nbentry_add(conn, conn->id, 0, true) : 0;
>>> +           ? domain_acc_add(conn, conn->id, ACC_NODES, 0, true) : 0;
>>>   }
>>>   static bool domain_chk_quota(struct domain *domain, int mem)
>>> @@ -1597,7 +1610,7 @@ static int domain_check_acc_init_sub(const void 
>>> *k, void *v, void *arg)
>>>        * If everything is correct incrementing the value for each 
>>> node will
>>>        * result in dom->nodes being 0 at the end.
>>>        */
>>> -    dom->nodes = -d->nbentry;
>>> +    dom->nodes = -d->acc[ACC_NODES];
>>>       if (!hashtable_insert(domains, &dom->domid, dom)) {
>>>           talloc_free(dom);
>>> @@ -1652,7 +1665,7 @@ static int domain_check_acc_cb(const void *k, 
>>> void *v, void *arg)
>>>       if (!d)
>>>           return 0;
>>> -    d->nbentry += dom->nodes;
>>> +    d->acc[ACC_NODES] += dom->nodes;
>>>       return 0;
>>>   }
>>> diff --git a/tools/xenstore/xenstored_domain.h 
>>> b/tools/xenstore/xenstored_domain.h
>>> index 6a2b76a85b..8259c114b0 100644
>>> --- a/tools/xenstore/xenstored_domain.h
>>> +++ b/tools/xenstore/xenstored_domain.h
>>> @@ -21,7 +21,8 @@
>>>   enum accitem {
>>>       ACC_NODES,
>>> -    ACC_TR_N    /* Number of elements per transaction and domain. */
>>> +    ACC_TR_N,        /* Number of elements per transaction and 
>>> domain. */
>>
>> The churn here could have been avoided if you add a "," even for the 
>> edn element and properly indented the comment in the original patch.
> 
> Okay.
> 
>>
>> Also, was the comment indented to be updated to remove "and domain"?
> 
> No, the new indentation was meant to keep the comment start aligned with
> the one of the following line.

Sorry, I meant "intended" rather than "indented".

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 01/13] tools/xenstore: don't allow creating too many nodes in a transaction
  2023-02-20 12:07       ` Julien Grall
@ 2023-02-20 13:49         ` Juergen Gross
  2023-02-20 14:06           ` Juergen Gross
  2023-02-20 14:15           ` Julien Grall
  0 siblings, 2 replies; 40+ messages in thread
From: Juergen Gross @ 2023-02-20 13:49 UTC (permalink / raw)
  To: Julien Grall, xen-devel; +Cc: Wei Liu, Anthony PERARD


[-- Attachment #1.1.1: Type: text/plain, Size: 4683 bytes --]

On 20.02.23 13:07, Julien Grall wrote:
> Hi Juergen,
> 
> On 20/02/2023 11:04, Juergen Gross wrote:
>> On 20.02.23 10:46, Julien Grall wrote:
>>> Hi Juergen,
>>>
>>> On 20/01/2023 10:00, Juergen Gross wrote:
>>>> The accounting for the number of nodes of a domain in an active
>>>> transaction is not working correctly, as it allows to create arbitrary
>>>> number of nodes. The transaction will finally fail due to exceeding
>>>> the number of nodes quota, but before closing the transaction an
>>>> unprivileged guest could cause Xenstore to use a lot of memory.
>>>
>>> I know I said I would delay my decision on this patch. However, I was still 
>>> expecting the commit message to be updated based on our previous discussion.
>>
>> As said before, I was waiting on the settlement of our discussion before
>> doing the update.
> 
> This is not a very good practice to resend a patch without recording the 
> disagreement because new reviewers may not be aware of the disagreement and this 
> could end up to be committed by mistake...

You said you wanted to look into this patch in detail after the previous
series, so I move it into this series. The wording update would strongly
depend on the outcome of the discussion IMO, so I didn't do it yet.

Not adding the patch in this series would require some additional rebase
effort, so I kept the patch as is.

>>> Also thinking more about it, "The transaction will finally fail due to 
>>> exceeding the number of nodes quota" may not be true for a couple of reasons:
>>>    1) The transaction may removed a node afterwards.
>>
>> Yes, and? Just because it is a transaction, this is still a violation of
>> the quota. Even outside a transaction you could use the same reasoning,
>> but you don't (which is correct, of course).
> 
> I can understand your point. However, to me, the goal of the transaction is to 
> commit everything in one go at the end. So the violations in the middle should 
> not matter.

Of course they should.

We wouldn't allow to write over-sized nodes, even if they could be rewritten in
the same transaction with a smaller size.

We wouldn't allow to create nodes which would violate overall memory
consumption.

We wouldn't allow nodes with more permission entries than allowed, even if they
could be reduced in the same transaction again.

I don't see why the number of nodes shouldn't be taken into account.

> Furthermore, I would expect a transaction to work on a snapshot of the system. 
> So it feels really strange to me that we are constantly checking the quota with 
> the updated values rather than the initial one.

You are aware that the code without this patch is just neglecting the nodes
created in the transaction? It just is using the current number of nodes
outside any transaction for quota check. So I could easily create a scenario
where my new code will succeed, but the current one is failing:

Assume node quota is 1000, and at start of the transaction the guest is owning
999 nodes. In the transaction the guest is deleting 10 nodes, then dom0 is
creating 5 additional nodes owned by the guest. The central node counter is now
1004 (over quota), while the in-transaction count is 994. In the transaction the
guest can now happily create a new node (#995) with my patch, but will fail to
do so without (it sees the 1004 due to the transaction count being ignored).

>> In case you never finish the transaction, you are owner of more than
>> allowed nodes.
> How so? The nodes would not be committed so you don't really own them.

But you can use them inside the transaction.

> We also have quotas to limit the number of nodes accessed in a transaction.

Yes, but you can use the excess nodes in the transaction until that quota
is reached.

>>>    2) A node may have been removed outside of the transaction.
>>
>> If the removed node hasn't been touched by the transaction, it will be
>> accounted for correctly.
> 
> It depends on when the node was removed. If it is removed *after* the quota was 
> hit then your transaction will fail.

Yes. That is why the quota check at the finalization of the transaction has
to be kept.

> 
>>  If it has been touched, the transaction will
>> fail anyway.
> So the transaction will fail to commit because there is a conflict. So the 
> client is expected to retry it. We should not expected the client to retry for 
> error like quota.

Correct. The failure I'm mentioning isn't due to quota, but due to a conflict.

> So the overall behavior is changed.

We do so for any bug-fix. And IMO my patch is a bug-fix.


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3149 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 01/13] tools/xenstore: don't allow creating too many nodes in a transaction
  2023-02-20 13:49         ` Juergen Gross
@ 2023-02-20 14:06           ` Juergen Gross
  2023-02-20 14:15           ` Julien Grall
  1 sibling, 0 replies; 40+ messages in thread
From: Juergen Gross @ 2023-02-20 14:06 UTC (permalink / raw)
  To: Julien Grall, xen-devel; +Cc: Wei Liu, Anthony PERARD


[-- Attachment #1.1.1: Type: text/plain, Size: 1210 bytes --]

On 20.02.23 14:49, Juergen Gross wrote:
> Assume node quota is 1000, and at start of the transaction the guest is owning
> 999 nodes. In the transaction the guest is deleting 10 nodes, then dom0 is
> creating 5 additional nodes owned by the guest. The central node counter is now
> 1004 (over quota), while the in-transaction count is 994. In the transaction the
> guest can now happily create a new node (#995) with my patch, but will fail to
> do so without (it sees the 1004 due to the transaction count being ignored).

It is even worse, so I'd like to suggest the following commit message:

   tools/xenstore: take transaction internal nodes into account for quota

   The accounting for the number of nodes of a domain in an active
   transaction is not working correctly, as it is checking the node quota
   only against the number of nodes outside the transaction.

   This can result in the transaction finally failing, as node quota is
   checked at the end of the transaction again.

   On the other hand even in a transaction deleting many nodes, new nodes
   might not be creatable, in case the node quota was already reached at
   the start of the transaction.


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3149 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 01/13] tools/xenstore: don't allow creating too many nodes in a transaction
  2023-02-20 13:49         ` Juergen Gross
  2023-02-20 14:06           ` Juergen Gross
@ 2023-02-20 14:15           ` Julien Grall
  2023-02-20 14:21             ` Juergen Gross
  1 sibling, 1 reply; 40+ messages in thread
From: Julien Grall @ 2023-02-20 14:15 UTC (permalink / raw)
  To: Juergen Gross, xen-devel; +Cc: Wei Liu, Anthony PERARD

On 20/02/2023 13:49, Juergen Gross wrote:
> On 20.02.23 13:07, Julien Grall wrote:
>> Hi Juergen,
>>
>> On 20/02/2023 11:04, Juergen Gross wrote:
>>> On 20.02.23 10:46, Julien Grall wrote:
>>>> Hi Juergen,
>>>>
>>>> On 20/01/2023 10:00, Juergen Gross wrote:
>>>>> The accounting for the number of nodes of a domain in an active
>>>>> transaction is not working correctly, as it allows to create arbitrary
>>>>> number of nodes. The transaction will finally fail due to exceeding
>>>>> the number of nodes quota, but before closing the transaction an
>>>>> unprivileged guest could cause Xenstore to use a lot of memory.
>>>>
>>>> I know I said I would delay my decision on this patch. However, I 
>>>> was still expecting the commit message to be updated based on our 
>>>> previous discussion.
>>>
>>> As said before, I was waiting on the settlement of our discussion before
>>> doing the update.
>>
>> This is not a very good practice to resend a patch without recording 
>> the disagreement because new reviewers may not be aware of the 
>> disagreement and this could end up to be committed by mistake...
> 
> You said you wanted to look into this patch in detail after the previous
> series, so I move it into this series. The wording update would strongly
> depend on the outcome of the discussion IMO, so I didn't do it yet.
This could have been mentioned after ---. This could allow people to 
understand the concern and then participate.

> 
>>>> Also thinking more about it, "The transaction will finally fail due 
>>>> to exceeding the number of nodes quota" may not be true for a couple 
>>>> of reasons:
>>>>    1) The transaction may removed a node afterwards.
>>>
>>> Yes, and? Just because it is a transaction, this is still a violation of
>>> the quota. Even outside a transaction you could use the same reasoning,
>>> but you don't (which is correct, of course).
>>
>> I can understand your point. However, to me, the goal of the 
>> transaction is to commit everything in one go at the end. So the 
>> violations in the middle should not matter.
> 
> Of course they should.
> 
> We wouldn't allow to write over-sized nodes, even if they could be 
> rewritten in
> the same transaction with a smaller size.

I view the two different.

> We wouldn't allow to create nodes which would violate overall memory
> consumption.
> 
> We wouldn't allow nodes with more permission entries than allowed, even 
> if they
> could be reduced in the same transaction again.
> 
> I don't see why the number of nodes shouldn't be taken into account.
> 
>> Furthermore, I would expect a transaction to work on a snapshot of the 
>> system. So it feels really strange to me that we are constantly 
>> checking the quota with the updated values rather than the initial one.
> 
> You are aware that the code without this patch is just neglecting the nodes
> created in the transaction? It just is using the current number of nodes
> outside any transaction for quota check.

Are you referring to the quota check within the transaction?

> So I could easily create a 
> scenario
> where my new code will succeed, but the current one is failing:
> Assume node quota is 1000, and at start of the transaction the guest is 
> owning
> 999 nodes. In the transaction the guest is deleting 10 nodes, then dom0 is
> creating 5 additional nodes owned by the guest. The central node counter 
> is now
> 1004 (over quota), while the in-transaction count is 994.
> In the 
> transaction the
> guest can now happily create a new node (#995) with my patch, but will 
> fail to
> do so without (it sees the 1004 due to the transaction count being 
> ignored).

This doesn't sound correct. To do you have any test I could use to check 
the behavior?

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 01/13] tools/xenstore: don't allow creating too many nodes in a transaction
  2023-02-20 14:15           ` Julien Grall
@ 2023-02-20 14:21             ` Juergen Gross
  2023-02-20 18:01               ` Julien Grall
  0 siblings, 1 reply; 40+ messages in thread
From: Juergen Gross @ 2023-02-20 14:21 UTC (permalink / raw)
  To: Julien Grall, xen-devel; +Cc: Wei Liu, Anthony PERARD


[-- Attachment #1.1.1: Type: text/plain, Size: 4217 bytes --]

On 20.02.23 15:15, Julien Grall wrote:
> On 20/02/2023 13:49, Juergen Gross wrote:
>> On 20.02.23 13:07, Julien Grall wrote:
>>> Hi Juergen,
>>>
>>> On 20/02/2023 11:04, Juergen Gross wrote:
>>>> On 20.02.23 10:46, Julien Grall wrote:
>>>>> Hi Juergen,
>>>>>
>>>>> On 20/01/2023 10:00, Juergen Gross wrote:
>>>>>> The accounting for the number of nodes of a domain in an active
>>>>>> transaction is not working correctly, as it allows to create arbitrary
>>>>>> number of nodes. The transaction will finally fail due to exceeding
>>>>>> the number of nodes quota, but before closing the transaction an
>>>>>> unprivileged guest could cause Xenstore to use a lot of memory.
>>>>>
>>>>> I know I said I would delay my decision on this patch. However, I was still 
>>>>> expecting the commit message to be updated based on our previous discussion.
>>>>
>>>> As said before, I was waiting on the settlement of our discussion before
>>>> doing the update.
>>>
>>> This is not a very good practice to resend a patch without recording the 
>>> disagreement because new reviewers may not be aware of the disagreement and 
>>> this could end up to be committed by mistake...
>>
>> You said you wanted to look into this patch in detail after the previous
>> series, so I move it into this series. The wording update would strongly
>> depend on the outcome of the discussion IMO, so I didn't do it yet.
> This could have been mentioned after ---. This could allow people to understand 
> the concern and then participate.

Will do so in future.

> 
>>
>>>>> Also thinking more about it, "The transaction will finally fail due to 
>>>>> exceeding the number of nodes quota" may not be true for a couple of reasons:
>>>>>    1) The transaction may removed a node afterwards.
>>>>
>>>> Yes, and? Just because it is a transaction, this is still a violation of
>>>> the quota. Even outside a transaction you could use the same reasoning,
>>>> but you don't (which is correct, of course).
>>>
>>> I can understand your point. However, to me, the goal of the transaction is 
>>> to commit everything in one go at the end. So the violations in the middle 
>>> should not matter.
>>
>> Of course they should.
>>
>> We wouldn't allow to write over-sized nodes, even if they could be rewritten in
>> the same transaction with a smaller size.
> 
> I view the two different.
> 
>> We wouldn't allow to create nodes which would violate overall memory
>> consumption.
>>
>> We wouldn't allow nodes with more permission entries than allowed, even if they
>> could be reduced in the same transaction again.
>>
>> I don't see why the number of nodes shouldn't be taken into account.
>>
>>> Furthermore, I would expect a transaction to work on a snapshot of the 
>>> system. So it feels really strange to me that we are constantly checking the 
>>> quota with the updated values rather than the initial one.
>>
>> You are aware that the code without this patch is just neglecting the nodes
>> created in the transaction? It just is using the current number of nodes
>> outside any transaction for quota check.
> 
> Are you referring to the quota check within the transaction?

I'm referring to the quota check in create_node().

> 
>> So I could easily create a scenario
>> where my new code will succeed, but the current one is failing:
>> Assume node quota is 1000, and at start of the transaction the guest is owning
>> 999 nodes. In the transaction the guest is deleting 10 nodes, then dom0 is
>> creating 5 additional nodes owned by the guest. The central node counter is now
>> 1004 (over quota), while the in-transaction count is 994.
>> In the transaction the
>> guest can now happily create a new node (#995) with my patch, but will fail to
>> do so without (it sees the 1004 due to the transaction count being ignored).
> 
> This doesn't sound correct. To do you have any test I could use to check the 
> behavior?

Try it:

- create nodes in a guest until you hit the ENOSPC return code due to too many
   nodes
- start a transaction deleting some nodes and then trying to create another
   one, which fail fail with ENOSPC.


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3149 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 01/13] tools/xenstore: don't allow creating too many nodes in a transaction
  2023-02-20 14:21             ` Juergen Gross
@ 2023-02-20 18:01               ` Julien Grall
  2023-02-21  8:10                 ` Juergen Gross
  0 siblings, 1 reply; 40+ messages in thread
From: Julien Grall @ 2023-02-20 18:01 UTC (permalink / raw)
  To: Juergen Gross, xen-devel; +Cc: Wei Liu, Anthony PERARD



On 20/02/2023 14:21, Juergen Gross wrote:
> On 20.02.23 15:15, Julien Grall wrote:
>> On 20/02/2023 13:49, Juergen Gross wrote:
>>> On 20.02.23 13:07, Julien Grall wrote:
>>>> Hi Juergen,
>>>>
>>>> On 20/02/2023 11:04, Juergen Gross wrote:
>>>>> On 20.02.23 10:46, Julien Grall wrote:
>>>>>> Hi Juergen,
>>>>>>
>>>>>> On 20/01/2023 10:00, Juergen Gross wrote:
>>>>>>> The accounting for the number of nodes of a domain in an active
>>>>>>> transaction is not working correctly, as it allows to create 
>>>>>>> arbitrary
>>>>>>> number of nodes. The transaction will finally fail due to exceeding
>>>>>>> the number of nodes quota, but before closing the transaction an
>>>>>>> unprivileged guest could cause Xenstore to use a lot of memory.
>>>>>>
>>>>>> I know I said I would delay my decision on this patch. However, I 
>>>>>> was still expecting the commit message to be updated based on our 
>>>>>> previous discussion.
>>>>>
>>>>> As said before, I was waiting on the settlement of our discussion 
>>>>> before
>>>>> doing the update.
>>>>
>>>> This is not a very good practice to resend a patch without recording 
>>>> the disagreement because new reviewers may not be aware of the 
>>>> disagreement and this could end up to be committed by mistake...
>>>
>>> You said you wanted to look into this patch in detail after the previous
>>> series, so I move it into this series. The wording update would strongly
>>> depend on the outcome of the discussion IMO, so I didn't do it yet.
>> This could have been mentioned after ---. This could allow people to 
>> understand the concern and then participate.
> 
> Will do so in future.
> 
>>
>>>
>>>>>> Also thinking more about it, "The transaction will finally fail 
>>>>>> due to exceeding the number of nodes quota" may not be true for a 
>>>>>> couple of reasons:
>>>>>>    1) The transaction may removed a node afterwards.
>>>>>
>>>>> Yes, and? Just because it is a transaction, this is still a 
>>>>> violation of
>>>>> the quota. Even outside a transaction you could use the same 
>>>>> reasoning,
>>>>> but you don't (which is correct, of course).
>>>>
>>>> I can understand your point. However, to me, the goal of the 
>>>> transaction is to commit everything in one go at the end. So the 
>>>> violations in the middle should not matter.
>>>
>>> Of course they should.
>>>
>>> We wouldn't allow to write over-sized nodes, even if they could be 
>>> rewritten in
>>> the same transaction with a smaller size.
>>
>> I view the two different.
>>
>>> We wouldn't allow to create nodes which would violate overall memory
>>> consumption.
>>>
>>> We wouldn't allow nodes with more permission entries than allowed, 
>>> even if they
>>> could be reduced in the same transaction again.
>>>
>>> I don't see why the number of nodes shouldn't be taken into account.
>>>
>>>> Furthermore, I would expect a transaction to work on a snapshot of 
>>>> the system. So it feels really strange to me that we are constantly 
>>>> checking the quota with the updated values rather than the initial one.
>>>
>>> You are aware that the code without this patch is just neglecting the 
>>> nodes
>>> created in the transaction? It just is using the current number of nodes
>>> outside any transaction for quota check.
>>
>> Are you referring to the quota check within the transaction?
> 
> I'm referring to the quota check in create_node(). >
>>
>>> So I could easily create a scenario
>>> where my new code will succeed, but the current one is failing:
>>> Assume node quota is 1000, and at start of the transaction the guest 
>>> is owning
>>> 999 nodes. In the transaction the guest is deleting 10 nodes, then 
>>> dom0 is
>>> creating 5 additional nodes owned by the guest. The central node 
>>> counter is now
>>> 1004 (over quota), while the in-transaction count is 994.
>>> In the transaction the
>>> guest can now happily create a new node (#995) with my patch, but 
>>> will fail to
>>> do so without (it sees the 1004 due to the transaction count being 
>>> ignored).
>>
>> This doesn't sound correct. To do you have any test I could use to 
>> check the behavior?
> 
> Try it:
> 
> - create nodes in a guest until you hit the ENOSPC return code due to 
> too many
>    nodes
> - start a transaction deleting some nodes and then trying to create another
>    one, which fail fail with ENOSPC.

So I have recreated an XTF test which I think match what you wrote [1].

It is indeed failing without your patch. But then there are still some 
weird behavior here.

I would expect the creation of the node would also fail if instead of 
removing the node if removed outside of the transaction.

This is not the case because we are looking at the current quota. So 
shouldn't we snapshot the global count?

Cheers,

[1]
#include <xtf.h>

const char test_title[] = "Test xenstore-transaction-limit-1";

#define BASELINE_DIR "data"
#define BASELINE_MAX_DIR BASELINE_DIR"/max"

#define MAX_NODES 2000

static bool max_out_nodes(void)
{
     unsigned int parent_id = 0, child_id = 0, nr_nodes = 0;
     xs_transaction_t tid = XBT_NULL;

     printk("Maxing out nodes\n");

     do
     {
         int rc;
         char path[256];

         rc = snprintf(path, ARRAY_SIZE(path), "%s/%u/%u",
                       BASELINE_MAX_DIR, parent_id, child_id);

         if ( rc >= (int)ARRAY_SIZE(path) )
         {
             xtf_error("Unable to create the path\n");
             return false;
         }

         rc = xenstore_mkdir(tid, path);

         /* Xenstored will return ENOSPC if we exceed a quota */
         if ( rc == ENOSPC )
         {
             /*
              * If we can't write the first child, then this likely means
              * we exceed the maximum of nodes quota. Consider it a
              * success.
              *
              * Otherwise, we may hit the maximum size of the parent.
              * Switch to a different parent.
              */
             if ( child_id == 0 )
             {
                 printk("Stopped after %u iterations\n", nr_nodes);
                 return true;
             }
             else
             {
                 printk("Parent ID %u: created %u children\n",
                        parent_id, child_id);
                 parent_id++;
                 child_id = 0;
                 continue;
             }
         }
         else if ( rc )
         {
             xtf_error("Unexpected error %d\n", rc);
             return false;
         }
         else
         {
             nr_nodes++;
             child_id++;
         }
     } while ( nr_nodes < MAX_NODES );

     xtf_error("Created %u nodes and the quota is still not reached?\n",
               nr_nodes);

     return false;
}

void test_main(void)
{
     xs_transaction_t tid;
     int rc;

     if ( !max_out_nodes() )
         return;

     tid = xenstore_transaction_start();
     if ( tid == XBT_NULL )
     {
         xtf_error("Cannot start transaction\n");
         return;
     }

     /* Remove one of the node within the transaction */
     rc = xenstore_rm(tid, "data/max/0/0");
     if ( rc )
     {
         xtf_error("Cannot remove the node data/max/0/0 (rc = %d)\n", rc);
         return;
     }

     /* Creating a new node should work because we removed one */
     rc = xenstore_write(tid, "data/foo", "bar");
     if ( rc )
     {
         xtf_error("Cannot create node data/foo (rc = %d)\n", rc);
         return;
     }

     xtf_success(NULL);
}



> 
> 
> Juergen

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 04/13] tools/xenstore: add framework to commit accounting data on success only
  2023-01-20 10:00 ` [PATCH v2 04/13] tools/xenstore: add framework to commit accounting data on success only Juergen Gross
@ 2023-02-20 22:50   ` Julien Grall
  2023-02-21  8:37     ` Juergen Gross
  0 siblings, 1 reply; 40+ messages in thread
From: Julien Grall @ 2023-02-20 22:50 UTC (permalink / raw)
  To: Juergen Gross, xen-devel; +Cc: Wei Liu, Anthony PERARD

Hi Juergen,

On 20/01/2023 10:00, Juergen Gross wrote:
> Instead of modifying accounting data and undo those modifications in
> case of an error during further processing, add a framework for
> collecting the needed changes and commit them only when the whole
> operation has succeeded.
> 
> This scheme can reuse large parts of the per transaction accounting.
> The changed_domain handling can be reused, but the array size of the
> accounting data should be possible to be different for both use cases.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
>   tools/xenstore/xenstored_core.c   |  5 +++
>   tools/xenstore/xenstored_core.h   |  3 ++
>   tools/xenstore/xenstored_domain.c | 64 +++++++++++++++++++++++++++----
>   tools/xenstore/xenstored_domain.h |  5 ++-
>   4 files changed, 68 insertions(+), 9 deletions(-)
> 
> diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
> index 27dfbe9593..2d10cacf35 100644
> --- a/tools/xenstore/xenstored_core.c
> +++ b/tools/xenstore/xenstored_core.c
> @@ -1023,6 +1023,9 @@ static void send_error(struct connection *conn, int error)
>   			break;
>   		}
>   	}
> +
> +	acc_drop(conn);
> +
>   	send_reply(conn, XS_ERROR, xsd_errors[i].errstring,
>   			  strlen(xsd_errors[i].errstring) + 1);
>   }
> @@ -1060,6 +1063,7 @@ void send_reply(struct connection *conn, enum xsd_sockmsg_type type,
>   	}
>   
>   	conn->in = NULL;
> +	acc_commit(conn);

AFAIU, if send_reply() is called then we would need to commit the 
accounting even if we can't send the reply (i.e. send_reply()). So 
shouldn't this be call right at the beginning of send_reply()?

>   
>   	/* Update relevant header fields and fill in the message body. */
>   	bdata->hdr.msg.type = type;
> @@ -2195,6 +2199,7 @@ struct connection *new_connection(const struct interface_funcs *funcs)
>   	new->is_stalled = false;
>   	new->transaction_started = 0;
>   	INIT_LIST_HEAD(&new->out_list);
> +	INIT_LIST_HEAD(&new->acc_list);
>   	INIT_LIST_HEAD(&new->ref_list);
>   	INIT_LIST_HEAD(&new->watches);
>   	INIT_LIST_HEAD(&new->transaction_list);
> diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
> index c59b06551f..1f811f38cb 100644
> --- a/tools/xenstore/xenstored_core.h
> +++ b/tools/xenstore/xenstored_core.h
> @@ -139,6 +139,9 @@ struct connection
>   	struct list_head out_list;
>   	uint64_t timeout_msec;
>   
> +	/* Not yet committed accounting data (valid if in != NULL). */
> +	struct list_head acc_list;
> +
>   	/* Referenced requests no longer pending. */
>   	struct list_head ref_list;
>   
> diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
> index f459c5aabb..33105dba8f 100644
> --- a/tools/xenstore/xenstored_domain.c
> +++ b/tools/xenstore/xenstored_domain.c
> @@ -100,7 +100,7 @@ struct changed_domain
>   	unsigned int domid;
>   
>   	/* Accounting data. */
> -	int acc[ACC_TR_N];
> +	int acc[];

Is this actually worth it? How much memory would we save?

>   };
>   
>   static struct hashtable *domhash;
> @@ -577,6 +577,7 @@ static struct changed_domain *acc_find_changed_domain(struct list_head *head,
>   
>   static struct changed_domain *acc_get_changed_domain(const void *ctx,
>   						     struct list_head *head,
> +						     enum accitem acc_sz,
>   						     unsigned int domid)
>   {
>   	struct changed_domain *cd;
> @@ -585,7 +586,7 @@ static struct changed_domain *acc_get_changed_domain(const void *ctx,
>   	if (cd)
>   		return cd;
>   
> -	cd = talloc_zero(ctx, struct changed_domain);
> +	cd = talloc_zero_size(ctx, sizeof(*cd) + acc_sz * sizeof(cd->acc[0]));
>   	if (!cd)
>   		return NULL;
>   
> @@ -596,13 +597,13 @@ static struct changed_domain *acc_get_changed_domain(const void *ctx,
>   }
>   
>   static int acc_add_changed_dom(const void *ctx, struct list_head *head,
> -			       enum accitem what, int val, unsigned int domid)
> +			       enum accitem acc_sz, enum accitem what,
> +			       int val, unsigned int domid)
>   {
>   	struct changed_domain *cd;
>   
> -	assert(what < ARRAY_SIZE(cd->acc));
> -
> -	cd = acc_get_changed_domain(ctx, head, domid);
> +	assert(what < acc_sz);
> +	cd = acc_get_changed_domain(ctx, head, acc_sz, domid);
>   	if (!cd)
>   		return 0;
>   
> @@ -1071,6 +1072,7 @@ static int domain_acc_add(struct connection *conn, unsigned int domid,
>   			  enum accitem what, int add, bool no_dom_alloc)
>   {
>   	struct domain *d;
> +	struct changed_domain *cd;
>   	struct list_head *head;
>   	int ret;
>   
> @@ -1091,10 +1093,26 @@ static int domain_acc_add(struct connection *conn, unsigned int domid,
>   		}
>   	}
>   
> +	/* Temporary accounting data until final commit? */
> +	if (conn && conn->in && what < ACC_REQ_N) {
> +		/* Consider transaction local data. */
> +		ret = 0;
> +		if (conn->transaction && what < ACC_TR_N) {
> +			head = transaction_get_changed_domains(
> +				conn->transaction);
> +			cd = acc_find_changed_domain(head, domid);
> +			if (cd)
> +				ret = cd->acc[what];
> +		}
> +		ret += acc_add_changed_dom(conn->in, &conn->acc_list, ACC_REQ_N,
> +					  what, add, domid);
> +		return errno ? -1 : domain_acc_add_chk(d, what, ret, domid);
> +	}
> +
>   	if (conn && conn->transaction && what < ACC_TR_N) {
>   		head = transaction_get_changed_domains(conn->transaction);
> -		ret = acc_add_changed_dom(conn->transaction, head, what,
> -					  add, domid);
> +		ret = acc_add_changed_dom(conn->transaction, head, ACC_TR_N,
> +					  what, add, domid);
>   		if (errno) {
>   			fail_transaction(conn->transaction);
>   			return -1;
> @@ -1107,6 +1125,36 @@ static int domain_acc_add(struct connection *conn, unsigned int domid,
>   	return d->acc[what];
>   }
>   
> +void acc_drop(struct connection *conn)
> +{
> +	struct changed_domain *cd;
> +
> +	while ((cd = list_top(&conn->acc_list, struct changed_domain, list))) {

NIT: You could use list_for_each_safe();

> +		list_del(&cd->list);
> +		talloc_free(cd);
> +	}
> +}
> +
> +void acc_commit(struct connection *conn)
> +{
> +	struct changed_domain *cd;
> +	struct buffered_data *in = conn->in;
> +	enum accitem what;
> +
> +	conn->in = NULL; /* Avoid recursion. */

I am not sure I understand this comment. Can you provide more details 
where the recursion would happen?

> +	while ((cd = list_top(&conn->acc_list, struct changed_domain, list))) {

NIT: You could use list_for_each_safe();

> +		list_del(&cd->list);
> +		for (what = 0; what < ACC_REQ_N; what++)

There is a chance that some compiler will complain about this line 
because ACC_REQ_N = 0. So this would always be true. Therefore...

> +			if (cd->acc[what])
> +				domain_acc_add(conn, cd->domid, what,
> +					       cd->acc[what], true);
> +
> +		talloc_free(cd);
> +	}
> +
> +	conn->in = in;
> +}
> +
>   int domain_nbentry_inc(struct connection *conn, unsigned int domid)
>   {
>   	return (domain_acc_add(conn, domid, ACC_NODES, 1, false) < 0)
> diff --git a/tools/xenstore/xenstored_domain.h b/tools/xenstore/xenstored_domain.h
> index 8259c114b0..cd85bd0cad 100644
> --- a/tools/xenstore/xenstored_domain.h
> +++ b/tools/xenstore/xenstored_domain.h
> @@ -20,7 +20,8 @@
>   #define _XENSTORED_DOMAIN_H
>   
>   enum accitem {
> -	ACC_NODES,
> +	ACC_REQ_N,       /* Number of elements per request and domain. */
> +	ACC_NODES = ACC_REQ_N,

I would bring forward the change in patch #5. I mean:

ACC_NODES,
ACC_REQ_N

>   	ACC_TR_N,        /* Number of elements per transaction and domain. */
>   	ACC_N = ACC_TR_N /* Number of elements per domain. */
>   };

This enum is getting extremely confusing. There are many "number of 
elements per ... domain". Can you clarify?

> @@ -103,6 +104,8 @@ void domain_outstanding_domid_dec(unsigned int domid);
>   int domain_get_quota(const void *ctx, struct connection *conn,
>   		     unsigned int domid);
>   int acc_fix_domains(struct list_head *head, bool update);
> +void acc_drop(struct connection *conn);
> +void acc_commit(struct connection *conn);
>   
>   /* Write rate limiting */
>   

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 08/13] tools/xenstore: add accounting trace support
  2023-01-20 10:00 ` [PATCH v2 08/13] tools/xenstore: add accounting trace support Juergen Gross
@ 2023-02-20 22:57   ` Julien Grall
  2023-02-21  8:40     ` Juergen Gross
  0 siblings, 1 reply; 40+ messages in thread
From: Julien Grall @ 2023-02-20 22:57 UTC (permalink / raw)
  To: Juergen Gross, xen-devel; +Cc: Wei Liu, Anthony PERARD

Hi Juergen,

On 20/01/2023 10:00, Juergen Gross wrote:
> Add a new trace switch "acc" and the related trace calls.
> 
> The "acc" switch is off per default.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>

With one reamrk (see below):

Reviewed-by: Julien Grall <jgrall@amazon.com>

> ---
>   tools/xenstore/xenstored_core.c   |  2 +-
>   tools/xenstore/xenstored_core.h   |  1 +
>   tools/xenstore/xenstored_domain.c | 10 ++++++++++
>   3 files changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
> index 6ef60179fa..558ef491b1 100644
> --- a/tools/xenstore/xenstored_core.c
> +++ b/tools/xenstore/xenstored_core.c
> @@ -2746,7 +2746,7 @@ static void set_quota(const char *arg, bool soft)
>   
>   /* Sorted by bit values of TRACE_* flags. Flag is (1u << index). */
>   const char *const trace_switches[] = {
> -	"obj", "io", "wrl",
> +	"obj", "io", "wrl", "acc",
>   	NULL
>   };
>   
> diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
> index 1f811f38cb..3e0734a6c6 100644
> --- a/tools/xenstore/xenstored_core.h
> +++ b/tools/xenstore/xenstored_core.h
> @@ -302,6 +302,7 @@ extern unsigned int trace_flags;
>   #define TRACE_OBJ	0x00000001
>   #define TRACE_IO	0x00000002
>   #define TRACE_WRL	0x00000004
> +#define TRACE_ACC	0x00000008
>   extern const char *const trace_switches[];
>   int set_trace_switch(const char *arg);
>   
> diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
> index b1e29edb7e..d461fd8cc8 100644
> --- a/tools/xenstore/xenstored_domain.c
> +++ b/tools/xenstore/xenstored_domain.c
> @@ -538,6 +538,12 @@ static struct domain *find_domain_by_domid(unsigned int domid)
>   	return (d && d->introduced) ? d : NULL;
>   }
>   
> +#define trace_acc(...)				\

The indentation of '\' looks odd.

> +do {						\
> +	if (trace_flags & TRACE_ACC)		\
> +		trace("acc: " __VA_ARGS__);	\
> +} while (0)
> +
>   int acc_fix_domains(struct list_head *head, bool update)
>   {
>   	struct changed_domain *cd;
> @@ -602,6 +608,8 @@ static int acc_add_changed_dom(const void *ctx, struct list_head *head,
>   		return 0;
>   
>   	errno = 0;
> +	trace_acc("local change domid %u: what=%u %d add %d\n", domid, what,
> +		  cd->acc[what], val);
>   	cd->acc[what] += val;
>   
>   	return cd->acc[what];
> @@ -1114,6 +1122,8 @@ static int domain_acc_add(struct connection *conn, unsigned int domid,
>   		return domain_acc_add_chk(d, what, ret, domid);
>   	}
>   
> +	trace_acc("global change domid %u: what=%u %u add %d\n", domid, what,
> +		  d->acc[what], add);
>   	d->acc[what] = domain_acc_add_chk(d, what, add, domid);
>   
>   	return d->acc[what];
Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 09/13] tools/xenstore: add TDB access trace support
  2023-01-20 10:00 ` [PATCH v2 09/13] tools/xenstore: add TDB access " Juergen Gross
@ 2023-02-20 22:59   ` Julien Grall
  2023-02-21  8:41     ` Juergen Gross
  0 siblings, 1 reply; 40+ messages in thread
From: Julien Grall @ 2023-02-20 22:59 UTC (permalink / raw)
  To: Juergen Gross, xen-devel; +Cc: Wei Liu, Anthony PERARD

Hi,

On 20/01/2023 10:00, Juergen Gross wrote:
> Add a new trace switch "tdb" and the related trace calls.
> 
> The "tdb" switch is off per default.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>

With one remark (see below):

Reviewed-by: Julien Grall <jgrall@amazon.com>

> ---
>   tools/xenstore/xenstored_core.c        | 8 +++++++-
>   tools/xenstore/xenstored_core.h        | 6 ++++++
>   tools/xenstore/xenstored_transaction.c | 7 ++++++-
>   3 files changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
> index 558ef491b1..49e196e7ae 100644
> --- a/tools/xenstore/xenstored_core.c
> +++ b/tools/xenstore/xenstored_core.c
> @@ -589,6 +589,8 @@ static void get_acc_data(TDB_DATA *key, struct node_account_data *acc)
>   		if (old_data.dptr == NULL) {
>   			acc->memory = 0;
>   		} else {
> +			trace_tdb("read %s size %zu\n", key->dptr,
> +				  old_data.dsize + key->dsize);
>   			hdr = (void *)old_data.dptr;
>   			acc->memory = old_data.dsize;
>   			acc->domid = hdr->perms[0].id;
> @@ -655,6 +657,7 @@ int do_tdb_write(struct connection *conn, TDB_DATA *key, TDB_DATA *data,
>   		errno = EIO;
>   		return errno;
>   	}
> +	trace_tdb("store %s size %zu\n", key->dptr, data->dsize + key->dsize);
>   
>   	if (acc) {
>   		/* Don't use new_domid, as it might be a transaction node. */
> @@ -682,6 +685,7 @@ int do_tdb_delete(struct connection *conn, TDB_DATA *key,
>   		errno = EIO;
>   		return errno;
>   	}
> +	trace_tdb("delete %s\n", key->dptr);
>   
>   	if (acc->memory) {
>   		domid = get_acc_domid(conn, key, acc->domid);
> @@ -731,6 +735,8 @@ struct node *read_node(struct connection *conn, const void *ctx,
>   		goto error;
>   	}
>   
> +	trace_tdb("read %s size %zu\n", key.dptr, data.dsize + key.dsize);
> +
>   	node->parent = NULL;
>   	talloc_steal(node, data.dptr);
>   
> @@ -2746,7 +2752,7 @@ static void set_quota(const char *arg, bool soft)
>   
>   /* Sorted by bit values of TRACE_* flags. Flag is (1u << index). */
>   const char *const trace_switches[] = {
> -	"obj", "io", "wrl", "acc",
> +	"obj", "io", "wrl", "acc", "tdb",
>   	NULL
>   };
>   
> diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
> index 3e0734a6c6..419a144396 100644
> --- a/tools/xenstore/xenstored_core.h
> +++ b/tools/xenstore/xenstored_core.h
> @@ -303,8 +303,14 @@ extern unsigned int trace_flags;
>   #define TRACE_IO	0x00000002
>   #define TRACE_WRL	0x00000004
>   #define TRACE_ACC	0x00000008
> +#define TRACE_TDB	0x00000010
>   extern const char *const trace_switches[];
>   int set_trace_switch(const char *arg);

Add a newline here.

> +#define trace_tdb(...)				\
> +do {						\
> +	if (trace_flags & TRACE_TDB)		\
> +		trace("tdb: " __VA_ARGS__);	\
> +} while (0)
>   
>   extern TDB_CONTEXT *tdb_ctx;
>   extern int dom0_domid;
> diff --git a/tools/xenstore/xenstored_transaction.c b/tools/xenstore/xenstored_transaction.c
> index 1aa9d3cb3d..19a1175d1b 100644
> --- a/tools/xenstore/xenstored_transaction.c
> +++ b/tools/xenstore/xenstored_transaction.c
> @@ -366,8 +366,11 @@ static int finalize_transaction(struct connection *conn,
>   				if (tdb_error(tdb_ctx) != TDB_ERR_NOEXIST)
>   					return EIO;
>   				gen = NO_GENERATION;
> -			} else
> +			} else {
> +				trace_tdb("read %s size %zu\n", key.dptr,
> +					  key.dsize + data.dsize);
>   				gen = hdr->generation;
> +			}
>   			talloc_free(data.dptr);
>   			if (i->generation != gen)
>   				return EAGAIN;
> @@ -391,6 +394,8 @@ static int finalize_transaction(struct connection *conn,
>   			set_tdb_key(i->trans_name, &ta_key);
>   			data = tdb_fetch(tdb_ctx, ta_key);
>   			if (data.dptr) {
> +				trace_tdb("read %s size %zu\n", ta_key.dptr,
> +					  ta_key.dsize + data.dsize);
>   				hdr = (void *)data.dptr;
>   				hdr->generation = ++generation;
>   				*is_corrupt |= do_tdb_write(conn, &key, &data,

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 01/13] tools/xenstore: don't allow creating too many nodes in a transaction
  2023-02-20 18:01               ` Julien Grall
@ 2023-02-21  8:10                 ` Juergen Gross
  2023-02-21 22:36                   ` Julien Grall
  0 siblings, 1 reply; 40+ messages in thread
From: Juergen Gross @ 2023-02-21  8:10 UTC (permalink / raw)
  To: Julien Grall, xen-devel; +Cc: Wei Liu, Anthony PERARD


[-- Attachment #1.1.1: Type: text/plain, Size: 5409 bytes --]

On 20.02.23 19:01, Julien Grall wrote:
> 
> 
> On 20/02/2023 14:21, Juergen Gross wrote:
>> On 20.02.23 15:15, Julien Grall wrote:
>>> On 20/02/2023 13:49, Juergen Gross wrote:
>>>> On 20.02.23 13:07, Julien Grall wrote:
>>>>> Hi Juergen,
>>>>>
>>>>> On 20/02/2023 11:04, Juergen Gross wrote:
>>>>>> On 20.02.23 10:46, Julien Grall wrote:
>>>>>>> Hi Juergen,
>>>>>>>
>>>>>>> On 20/01/2023 10:00, Juergen Gross wrote:
>>>>>>>> The accounting for the number of nodes of a domain in an active
>>>>>>>> transaction is not working correctly, as it allows to create arbitrary
>>>>>>>> number of nodes. The transaction will finally fail due to exceeding
>>>>>>>> the number of nodes quota, but before closing the transaction an
>>>>>>>> unprivileged guest could cause Xenstore to use a lot of memory.
>>>>>>>
>>>>>>> I know I said I would delay my decision on this patch. However, I was 
>>>>>>> still expecting the commit message to be updated based on our previous 
>>>>>>> discussion.
>>>>>>
>>>>>> As said before, I was waiting on the settlement of our discussion before
>>>>>> doing the update.
>>>>>
>>>>> This is not a very good practice to resend a patch without recording the 
>>>>> disagreement because new reviewers may not be aware of the disagreement and 
>>>>> this could end up to be committed by mistake...
>>>>
>>>> You said you wanted to look into this patch in detail after the previous
>>>> series, so I move it into this series. The wording update would strongly
>>>> depend on the outcome of the discussion IMO, so I didn't do it yet.
>>> This could have been mentioned after ---. This could allow people to 
>>> understand the concern and then participate.
>>
>> Will do so in future.
>>
>>>
>>>>
>>>>>>> Also thinking more about it, "The transaction will finally fail due to 
>>>>>>> exceeding the number of nodes quota" may not be true for a couple of 
>>>>>>> reasons:
>>>>>>>    1) The transaction may removed a node afterwards.
>>>>>>
>>>>>> Yes, and? Just because it is a transaction, this is still a violation of
>>>>>> the quota. Even outside a transaction you could use the same reasoning,
>>>>>> but you don't (which is correct, of course).
>>>>>
>>>>> I can understand your point. However, to me, the goal of the transaction is 
>>>>> to commit everything in one go at the end. So the violations in the middle 
>>>>> should not matter.
>>>>
>>>> Of course they should.
>>>>
>>>> We wouldn't allow to write over-sized nodes, even if they could be rewritten in
>>>> the same transaction with a smaller size.
>>>
>>> I view the two different.
>>>
>>>> We wouldn't allow to create nodes which would violate overall memory
>>>> consumption.
>>>>
>>>> We wouldn't allow nodes with more permission entries than allowed, even if they
>>>> could be reduced in the same transaction again.
>>>>
>>>> I don't see why the number of nodes shouldn't be taken into account.
>>>>
>>>>> Furthermore, I would expect a transaction to work on a snapshot of the 
>>>>> system. So it feels really strange to me that we are constantly checking 
>>>>> the quota with the updated values rather than the initial one.
>>>>
>>>> You are aware that the code without this patch is just neglecting the nodes
>>>> created in the transaction? It just is using the current number of nodes
>>>> outside any transaction for quota check.
>>>
>>> Are you referring to the quota check within the transaction?
>>
>> I'm referring to the quota check in create_node(). >
>>>
>>>> So I could easily create a scenario
>>>> where my new code will succeed, but the current one is failing:
>>>> Assume node quota is 1000, and at start of the transaction the guest is owning
>>>> 999 nodes. In the transaction the guest is deleting 10 nodes, then dom0 is
>>>> creating 5 additional nodes owned by the guest. The central node counter is now
>>>> 1004 (over quota), while the in-transaction count is 994.
>>>> In the transaction the
>>>> guest can now happily create a new node (#995) with my patch, but will fail to
>>>> do so without (it sees the 1004 due to the transaction count being ignored).
>>>
>>> This doesn't sound correct. To do you have any test I could use to check the 
>>> behavior?
>>
>> Try it:
>>
>> - create nodes in a guest until you hit the ENOSPC return code due to too many
>>    nodes
>> - start a transaction deleting some nodes and then trying to create another
>>    one, which fail fail with ENOSPC.
> 
> So I have recreated an XTF test which I think match what you wrote [1].
> 
> It is indeed failing without your patch. But then there are still some weird 
> behavior here.
> 
> I would expect the creation of the node would also fail if instead of removing 
> the node if removed outside of the transaction.
> 
> This is not the case because we are looking at the current quota. So shouldn't 
> we snapshot the global count?

As we don't do a global snapshot of the data base for a transaction (this was
changed due to huge memory needs, bad performance, and a higher transaction
failure rate), I don't think we should snapshot the count either.

A transaction is performed atomically at the time it is finished. Therefore
seeing the current global state inside the transaction (with the transaction
private state on top like an overlay) is absolutely fine IMO.


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3149 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 04/13] tools/xenstore: add framework to commit accounting data on success only
  2023-02-20 22:50   ` Julien Grall
@ 2023-02-21  8:37     ` Juergen Gross
  2023-02-21 22:42       ` Julien Grall
  0 siblings, 1 reply; 40+ messages in thread
From: Juergen Gross @ 2023-02-21  8:37 UTC (permalink / raw)
  To: Julien Grall, xen-devel; +Cc: Wei Liu, Anthony PERARD


[-- Attachment #1.1.1: Type: text/plain, Size: 10767 bytes --]

On 20.02.23 23:50, Julien Grall wrote:
> Hi Juergen,
> 
> On 20/01/2023 10:00, Juergen Gross wrote:
>> Instead of modifying accounting data and undo those modifications in
>> case of an error during further processing, add a framework for
>> collecting the needed changes and commit them only when the whole
>> operation has succeeded.
>>
>> This scheme can reuse large parts of the per transaction accounting.
>> The changed_domain handling can be reused, but the array size of the
>> accounting data should be possible to be different for both use cases.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>>   tools/xenstore/xenstored_core.c   |  5 +++
>>   tools/xenstore/xenstored_core.h   |  3 ++
>>   tools/xenstore/xenstored_domain.c | 64 +++++++++++++++++++++++++++----
>>   tools/xenstore/xenstored_domain.h |  5 ++-
>>   4 files changed, 68 insertions(+), 9 deletions(-)
>>
>> diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
>> index 27dfbe9593..2d10cacf35 100644
>> --- a/tools/xenstore/xenstored_core.c
>> +++ b/tools/xenstore/xenstored_core.c
>> @@ -1023,6 +1023,9 @@ static void send_error(struct connection *conn, int error)
>>               break;
>>           }
>>       }
>> +
>> +    acc_drop(conn);
>> +
>>       send_reply(conn, XS_ERROR, xsd_errors[i].errstring,
>>                 strlen(xsd_errors[i].errstring) + 1);
>>   }
>> @@ -1060,6 +1063,7 @@ void send_reply(struct connection *conn, enum 
>> xsd_sockmsg_type type,
>>       }
>>       conn->in = NULL;
>> +    acc_commit(conn);
> 
> AFAIU, if send_reply() is called then we would need to commit the accounting 
> even if we can't send the reply (i.e. send_reply()). So shouldn't this be call 
> right at the beginning of send_reply()?

Oh, indeed. Good catch!

> 
>>       /* Update relevant header fields and fill in the message body. */
>>       bdata->hdr.msg.type = type;
>> @@ -2195,6 +2199,7 @@ struct connection *new_connection(const struct 
>> interface_funcs *funcs)
>>       new->is_stalled = false;
>>       new->transaction_started = 0;
>>       INIT_LIST_HEAD(&new->out_list);
>> +    INIT_LIST_HEAD(&new->acc_list);
>>       INIT_LIST_HEAD(&new->ref_list);
>>       INIT_LIST_HEAD(&new->watches);
>>       INIT_LIST_HEAD(&new->transaction_list);
>> diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
>> index c59b06551f..1f811f38cb 100644
>> --- a/tools/xenstore/xenstored_core.h
>> +++ b/tools/xenstore/xenstored_core.h
>> @@ -139,6 +139,9 @@ struct connection
>>       struct list_head out_list;
>>       uint64_t timeout_msec;
>> +    /* Not yet committed accounting data (valid if in != NULL). */
>> +    struct list_head acc_list;
>> +
>>       /* Referenced requests no longer pending. */
>>       struct list_head ref_list;
>> diff --git a/tools/xenstore/xenstored_domain.c 
>> b/tools/xenstore/xenstored_domain.c
>> index f459c5aabb..33105dba8f 100644
>> --- a/tools/xenstore/xenstored_domain.c
>> +++ b/tools/xenstore/xenstored_domain.c
>> @@ -100,7 +100,7 @@ struct changed_domain
>>       unsigned int domid;
>>       /* Accounting data. */
>> -    int acc[ACC_TR_N];
>> +    int acc[];
> 
> Is this actually worth it? How much memory would we save?

Hmm, true. While being more generic, the saved memory is actually zero,
as ACC_TR_N and ACC_REQ_N have the same value. And even if they should
differ in future, we can just use the higher of both values here.

> 
>>   };
>>   static struct hashtable *domhash;
>> @@ -577,6 +577,7 @@ static struct changed_domain 
>> *acc_find_changed_domain(struct list_head *head,
>>   static struct changed_domain *acc_get_changed_domain(const void *ctx,
>>                                struct list_head *head,
>> +                             enum accitem acc_sz,
>>                                unsigned int domid)
>>   {
>>       struct changed_domain *cd;
>> @@ -585,7 +586,7 @@ static struct changed_domain *acc_get_changed_domain(const 
>> void *ctx,
>>       if (cd)
>>           return cd;
>> -    cd = talloc_zero(ctx, struct changed_domain);
>> +    cd = talloc_zero_size(ctx, sizeof(*cd) + acc_sz * sizeof(cd->acc[0]));
>>       if (!cd)
>>           return NULL;
>> @@ -596,13 +597,13 @@ static struct changed_domain 
>> *acc_get_changed_domain(const void *ctx,
>>   }
>>   static int acc_add_changed_dom(const void *ctx, struct list_head *head,
>> -                   enum accitem what, int val, unsigned int domid)
>> +                   enum accitem acc_sz, enum accitem what,
>> +                   int val, unsigned int domid)
>>   {
>>       struct changed_domain *cd;
>> -    assert(what < ARRAY_SIZE(cd->acc));
>> -
>> -    cd = acc_get_changed_domain(ctx, head, domid);
>> +    assert(what < acc_sz);
>> +    cd = acc_get_changed_domain(ctx, head, acc_sz, domid);
>>       if (!cd)
>>           return 0;
>> @@ -1071,6 +1072,7 @@ static int domain_acc_add(struct connection *conn, 
>> unsigned int domid,
>>                 enum accitem what, int add, bool no_dom_alloc)
>>   {
>>       struct domain *d;
>> +    struct changed_domain *cd;
>>       struct list_head *head;
>>       int ret;
>> @@ -1091,10 +1093,26 @@ static int domain_acc_add(struct connection *conn, 
>> unsigned int domid,
>>           }
>>       }
>> +    /* Temporary accounting data until final commit? */
>> +    if (conn && conn->in && what < ACC_REQ_N) {
>> +        /* Consider transaction local data. */
>> +        ret = 0;
>> +        if (conn->transaction && what < ACC_TR_N) {
>> +            head = transaction_get_changed_domains(
>> +                conn->transaction);
>> +            cd = acc_find_changed_domain(head, domid);
>> +            if (cd)
>> +                ret = cd->acc[what];
>> +        }
>> +        ret += acc_add_changed_dom(conn->in, &conn->acc_list, ACC_REQ_N,
>> +                      what, add, domid);
>> +        return errno ? -1 : domain_acc_add_chk(d, what, ret, domid);
>> +    }
>> +
>>       if (conn && conn->transaction && what < ACC_TR_N) {
>>           head = transaction_get_changed_domains(conn->transaction);
>> -        ret = acc_add_changed_dom(conn->transaction, head, what,
>> -                      add, domid);
>> +        ret = acc_add_changed_dom(conn->transaction, head, ACC_TR_N,
>> +                      what, add, domid);
>>           if (errno) {
>>               fail_transaction(conn->transaction);
>>               return -1;
>> @@ -1107,6 +1125,36 @@ static int domain_acc_add(struct connection *conn, 
>> unsigned int domid,
>>       return d->acc[what];
>>   }
>> +void acc_drop(struct connection *conn)
>> +{
>> +    struct changed_domain *cd;
>> +
>> +    while ((cd = list_top(&conn->acc_list, struct changed_domain, list))) {
> 
> NIT: You could use list_for_each_safe();

Yes, at the cost of an additional variable.

> 
>> +        list_del(&cd->list);
>> +        talloc_free(cd);
>> +    }
>> +}
>> +
>> +void acc_commit(struct connection *conn)
>> +{
>> +    struct changed_domain *cd;
>> +    struct buffered_data *in = conn->in;
>> +    enum accitem what;
>> +
>> +    conn->in = NULL; /* Avoid recursion. */
> 
> I am not sure I understand this comment. Can you provide more details where the 
> recursion would happen?

domain_acc_add() might do temporary accounting if conn->in isn't NULL.

> 
>> +    while ((cd = list_top(&conn->acc_list, struct changed_domain, list))) {
> 
> NIT: You could use list_for_each_safe();

Like above.

> 
>> +        list_del(&cd->list);
>> +        for (what = 0; what < ACC_REQ_N; what++)
> 
> There is a chance that some compiler will complain about this line because 
> ACC_REQ_N = 0. So this would always be true. Therefore...

Huh? It would always be false.

> 
>> +            if (cd->acc[what])
>> +                domain_acc_add(conn, cd->domid, what,
>> +                           cd->acc[what], true);
>> +
>> +        talloc_free(cd);
>> +    }
>> +
>> +    conn->in = in;
>> +}
>> +
>>   int domain_nbentry_inc(struct connection *conn, unsigned int domid)
>>   {
>>       return (domain_acc_add(conn, domid, ACC_NODES, 1, false) < 0)
>> diff --git a/tools/xenstore/xenstored_domain.h 
>> b/tools/xenstore/xenstored_domain.h
>> index 8259c114b0..cd85bd0cad 100644
>> --- a/tools/xenstore/xenstored_domain.h
>> +++ b/tools/xenstore/xenstored_domain.h
>> @@ -20,7 +20,8 @@
>>   #define _XENSTORED_DOMAIN_H
>>   enum accitem {
>> -    ACC_NODES,
>> +    ACC_REQ_N,       /* Number of elements per request and domain. */
>> +    ACC_NODES = ACC_REQ_N,
> 
> I would bring forward the change in patch #5. I mean:
> 
> ACC_NODES,
> ACC_REQ_N

I'm not sure this is a good idea. This would activate the temporary
accounting for nodes, but keeping the error handling active. I'd rather
activate temporary accounting for nodes together with removing the
accounting correction in the error handling.

> 
>>       ACC_TR_N,        /* Number of elements per transaction and domain. */
>>       ACC_N = ACC_TR_N /* Number of elements per domain. */
>>   };
> 
> This enum is getting extremely confusing. There are many "number of elements per 
> ... domain". Can you clarify?

I will add some more comments to the header. Would you like it better to have
the limits indented more? something like:

enum accitem {
         ACC_NODES,
		/* Number of elements per request and domain. */
		ACC_REQ_N,
		/* Number of elements per transaction and domain. */
         	ACC_TR_N = ACC_REQ_N,
         ACC_WATCH = ACC_TR_N,
         ACC_OUTST,
         ACC_MEM,
         ACC_TRANS,
         ACC_TRANSNODES,
         ACC_NPERM,
         ACC_PATHLEN,
         ACC_NODESZ,
		/* Number of elements per domain. */
         	ACC_N
};


Juergen


[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3149 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 08/13] tools/xenstore: add accounting trace support
  2023-02-20 22:57   ` Julien Grall
@ 2023-02-21  8:40     ` Juergen Gross
  2023-02-21 22:45       ` Julien Grall
  0 siblings, 1 reply; 40+ messages in thread
From: Juergen Gross @ 2023-02-21  8:40 UTC (permalink / raw)
  To: Julien Grall, xen-devel; +Cc: Wei Liu, Anthony PERARD


[-- Attachment #1.1.1: Type: text/plain, Size: 2163 bytes --]

On 20.02.23 23:57, Julien Grall wrote:
> Hi Juergen,
> 
> On 20/01/2023 10:00, Juergen Gross wrote:
>> Add a new trace switch "acc" and the related trace calls.
>>
>> The "acc" switch is off per default.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
> 
> With one reamrk (see below):
> 
> Reviewed-by: Julien Grall <jgrall@amazon.com>
> 
>> ---
>>   tools/xenstore/xenstored_core.c   |  2 +-
>>   tools/xenstore/xenstored_core.h   |  1 +
>>   tools/xenstore/xenstored_domain.c | 10 ++++++++++
>>   3 files changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
>> index 6ef60179fa..558ef491b1 100644
>> --- a/tools/xenstore/xenstored_core.c
>> +++ b/tools/xenstore/xenstored_core.c
>> @@ -2746,7 +2746,7 @@ static void set_quota(const char *arg, bool soft)
>>   /* Sorted by bit values of TRACE_* flags. Flag is (1u << index). */
>>   const char *const trace_switches[] = {
>> -    "obj", "io", "wrl",
>> +    "obj", "io", "wrl", "acc",
>>       NULL
>>   };
>> diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
>> index 1f811f38cb..3e0734a6c6 100644
>> --- a/tools/xenstore/xenstored_core.h
>> +++ b/tools/xenstore/xenstored_core.h
>> @@ -302,6 +302,7 @@ extern unsigned int trace_flags;
>>   #define TRACE_OBJ    0x00000001
>>   #define TRACE_IO    0x00000002
>>   #define TRACE_WRL    0x00000004
>> +#define TRACE_ACC    0x00000008
>>   extern const char *const trace_switches[];
>>   int set_trace_switch(const char *arg);
>> diff --git a/tools/xenstore/xenstored_domain.c 
>> b/tools/xenstore/xenstored_domain.c
>> index b1e29edb7e..d461fd8cc8 100644
>> --- a/tools/xenstore/xenstored_domain.c
>> +++ b/tools/xenstore/xenstored_domain.c
>> @@ -538,6 +538,12 @@ static struct domain *find_domain_by_domid(unsigned int 
>> domid)
>>       return (d && d->introduced) ? d : NULL;
>>   }
>> +#define trace_acc(...)                \
> 
> The indentation of '\' looks odd.

Not for me. Maybe you have a different tab setting?


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3149 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 09/13] tools/xenstore: add TDB access trace support
  2023-02-20 22:59   ` Julien Grall
@ 2023-02-21  8:41     ` Juergen Gross
  0 siblings, 0 replies; 40+ messages in thread
From: Juergen Gross @ 2023-02-21  8:41 UTC (permalink / raw)
  To: Julien Grall, xen-devel; +Cc: Wei Liu, Anthony PERARD


[-- Attachment #1.1.1: Type: text/plain, Size: 3281 bytes --]

On 20.02.23 23:59, Julien Grall wrote:
> Hi,
> 
> On 20/01/2023 10:00, Juergen Gross wrote:
>> Add a new trace switch "tdb" and the related trace calls.
>>
>> The "tdb" switch is off per default.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
> 
> With one remark (see below):
> 
> Reviewed-by: Julien Grall <jgrall@amazon.com>
> 
>> ---
>>   tools/xenstore/xenstored_core.c        | 8 +++++++-
>>   tools/xenstore/xenstored_core.h        | 6 ++++++
>>   tools/xenstore/xenstored_transaction.c | 7 ++++++-
>>   3 files changed, 19 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
>> index 558ef491b1..49e196e7ae 100644
>> --- a/tools/xenstore/xenstored_core.c
>> +++ b/tools/xenstore/xenstored_core.c
>> @@ -589,6 +589,8 @@ static void get_acc_data(TDB_DATA *key, struct 
>> node_account_data *acc)
>>           if (old_data.dptr == NULL) {
>>               acc->memory = 0;
>>           } else {
>> +            trace_tdb("read %s size %zu\n", key->dptr,
>> +                  old_data.dsize + key->dsize);
>>               hdr = (void *)old_data.dptr;
>>               acc->memory = old_data.dsize;
>>               acc->domid = hdr->perms[0].id;
>> @@ -655,6 +657,7 @@ int do_tdb_write(struct connection *conn, TDB_DATA *key, 
>> TDB_DATA *data,
>>           errno = EIO;
>>           return errno;
>>       }
>> +    trace_tdb("store %s size %zu\n", key->dptr, data->dsize + key->dsize);
>>       if (acc) {
>>           /* Don't use new_domid, as it might be a transaction node. */
>> @@ -682,6 +685,7 @@ int do_tdb_delete(struct connection *conn, TDB_DATA *key,
>>           errno = EIO;
>>           return errno;
>>       }
>> +    trace_tdb("delete %s\n", key->dptr);
>>       if (acc->memory) {
>>           domid = get_acc_domid(conn, key, acc->domid);
>> @@ -731,6 +735,8 @@ struct node *read_node(struct connection *conn, const void 
>> *ctx,
>>           goto error;
>>       }
>> +    trace_tdb("read %s size %zu\n", key.dptr, data.dsize + key.dsize);
>> +
>>       node->parent = NULL;
>>       talloc_steal(node, data.dptr);
>> @@ -2746,7 +2752,7 @@ static void set_quota(const char *arg, bool soft)
>>   /* Sorted by bit values of TRACE_* flags. Flag is (1u << index). */
>>   const char *const trace_switches[] = {
>> -    "obj", "io", "wrl", "acc",
>> +    "obj", "io", "wrl", "acc", "tdb",
>>       NULL
>>   };
>> diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
>> index 3e0734a6c6..419a144396 100644
>> --- a/tools/xenstore/xenstored_core.h
>> +++ b/tools/xenstore/xenstored_core.h
>> @@ -303,8 +303,14 @@ extern unsigned int trace_flags;
>>   #define TRACE_IO    0x00000002
>>   #define TRACE_WRL    0x00000004
>>   #define TRACE_ACC    0x00000008
>> +#define TRACE_TDB    0x00000010
>>   extern const char *const trace_switches[];
>>   int set_trace_switch(const char *arg);
> 
> Add a newline here.

Okay.


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3149 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 01/13] tools/xenstore: don't allow creating too many nodes in a transaction
  2023-02-21  8:10                 ` Juergen Gross
@ 2023-02-21 22:36                   ` Julien Grall
  2023-02-22  8:36                     ` Juergen Gross
  0 siblings, 1 reply; 40+ messages in thread
From: Julien Grall @ 2023-02-21 22:36 UTC (permalink / raw)
  To: Juergen Gross, xen-devel; +Cc: Wei Liu, Anthony PERARD, Edwin Török

Hi Juergen,

On 21/02/2023 08:10, Juergen Gross wrote:
> On 20.02.23 19:01, Julien Grall wrote:
>> So I have recreated an XTF test which I think match what you wrote [1].
>>
>> It is indeed failing without your patch. But then there are still some 
>> weird behavior here.
>>
>> I would expect the creation of the node would also fail if instead of 
>> removing the node if removed outside of the transaction.
>>
>> This is not the case because we are looking at the current quota. So 
>> shouldn't we snapshot the global count?
> 
> As we don't do a global snapshot of the data base for a transaction 
> (this was
> changed due to huge memory needs, bad performance, and a higher transaction
> failure rate), 

I am a bit surprised that the only way to do proper transaction is to 
have a global snapshot. Instead, you could have an overlay.

> I don't think we should snapshot the count either.

But that would mean that the quota will change depending on modification 
of the global database while the transaction is inflight.

I guess this is not better nor worse that the current situation. But it 
is still really confusing for a client because:
   1) The error could happen at random point
   2) You may see an inconsistent database as nodes are only cached when 
they are first accessed

> A transaction is performed atomically at the time it is finished. Therefore
> seeing the current global state inside the transaction (with the 
> transaction
> private state on top like an overlay) is absolutely fine IMO.

To me it is just showing that our concept of transaction is very broken 
in C Xenstored. I am curious to know whether OXenstored is behaving the 
same way.

Anyway, I agree this is not better nor worse than the current situation. 
So I will acked this patch:

Acked-by: Julien Grall <jgrall@amazon.com>

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 04/13] tools/xenstore: add framework to commit accounting data on success only
  2023-02-21  8:37     ` Juergen Gross
@ 2023-02-21 22:42       ` Julien Grall
  2023-02-22  8:52         ` Juergen Gross
  0 siblings, 1 reply; 40+ messages in thread
From: Julien Grall @ 2023-02-21 22:42 UTC (permalink / raw)
  To: Juergen Gross, xen-devel; +Cc: Wei Liu, Anthony PERARD

Hi Juergen,

On 21/02/2023 08:37, Juergen Gross wrote:
> On 20.02.23 23:50, Julien Grall wrote:
>>> +        list_del(&cd->list);
>>> +        talloc_free(cd);
>>> +    }
>>> +}
>>> +
>>> +void acc_commit(struct connection *conn)
>>> +{
>>> +    struct changed_domain *cd;
>>> +    struct buffered_data *in = conn->in;
>>> +    enum accitem what;
>>> +
>>> +    conn->in = NULL; /* Avoid recursion. */
>>
>> I am not sure I understand this comment. Can you provide more details 
>> where the recursion would happen?
> 
> domain_acc_add() might do temporary accounting if conn->in isn't NULL.

I am confused. To me recursion means the function (or the caller) will 
call itself. But here you seem to say you just want to avoid temporary 
accounting.

What did I miss?

> 
>>
>>> +    while ((cd = list_top(&conn->acc_list, struct changed_domain, 
>>> list))) {
>>
>> NIT: You could use list_for_each_safe();
> 
> Like above.
> 
>>
>>> +        list_del(&cd->list);
>>> +        for (what = 0; what < ACC_REQ_N; what++)
>>
>> There is a chance that some compiler will complain about this line 
>> because ACC_REQ_N = 0. So this would always be true. Therefore...
> 
> Huh? It would always be false.

Yes false sorry. This doesn't change about the potential (temporary) 
warning.

> 
>>
>>> +            if (cd->acc[what])
>>> +                domain_acc_add(conn, cd->domid, what,
>>> +                           cd->acc[what], true);
>>> +
>>> +        talloc_free(cd);
>>> +    }
>>> +
>>> +    conn->in = in;
>>> +}
>>> +
>>>   int domain_nbentry_inc(struct connection *conn, unsigned int domid)
>>>   {
>>>       return (domain_acc_add(conn, domid, ACC_NODES, 1, false) < 0)
>>> diff --git a/tools/xenstore/xenstored_domain.h 
>>> b/tools/xenstore/xenstored_domain.h
>>> index 8259c114b0..cd85bd0cad 100644
>>> --- a/tools/xenstore/xenstored_domain.h
>>> +++ b/tools/xenstore/xenstored_domain.h
>>> @@ -20,7 +20,8 @@
>>>   #define _XENSTORED_DOMAIN_H
>>>   enum accitem {
>>> -    ACC_NODES,
>>> +    ACC_REQ_N,       /* Number of elements per request and domain. */
>>> +    ACC_NODES = ACC_REQ_N,
>>
>> I would bring forward the change in patch #5. I mean:
>>
>> ACC_NODES,
>> ACC_REQ_N
> 
> I'm not sure this is a good idea. This would activate the temporary
> accounting for nodes, but keeping the error handling active. I'd rather
> activate temporary accounting for nodes together with removing the
> accounting correction in the error handling.

I am not sure I fully understand what you would rather do. Can you clarify?

> 
>>
>>>       ACC_TR_N,        /* Number of elements per transaction and 
>>> domain. */
>>>       ACC_N = ACC_TR_N /* Number of elements per domain. */
>>>   };
>>
>> This enum is getting extremely confusing. There are many "number of 
>> elements per ... domain". Can you clarify?
> 
> I will add some more comments to the header. Would you like it better to 
> have
> the limits indented more? something like:

I am afraid it doesn't help understanding the comment. For instance,

> 
> enum accitem {
>          ACC_NODES,
>          /* Number of elements per request and domain. */

you wrote "per request and domain" here. But...

>          ACC_REQ_N,
>          /* Number of elements per transaction and domain. */

... here this is "per transaction and domain". Should not nbe "elements 
per transaction"? And if not, then why don't we had "per request, 
transaction and domain" above?

>              ACC_TR_N = ACC_REQ_N,
>          ACC_WATCH = ACC_TR_N,
>          ACC_OUTST,
>          ACC_MEM,
>          ACC_TRANS,
>          ACC_TRANSNODES,
>          ACC_NPERM,
>          ACC_PATHLEN,
>          ACC_NODESZ,
>          /* Number of elements per domain. */
>              ACC_N
> };
> 
> 
> Juergen
> 

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 08/13] tools/xenstore: add accounting trace support
  2023-02-21  8:40     ` Juergen Gross
@ 2023-02-21 22:45       ` Julien Grall
  0 siblings, 0 replies; 40+ messages in thread
From: Julien Grall @ 2023-02-21 22:45 UTC (permalink / raw)
  To: Juergen Gross, xen-devel; +Cc: Wei Liu, Anthony PERARD

Hi Juergen,

On 21/02/2023 08:40, Juergen Gross wrote:
> On 20.02.23 23:57, Julien Grall wrote:
>> Hi Juergen,
>>
>> On 20/01/2023 10:00, Juergen Gross wrote:
>>> Add a new trace switch "acc" and the related trace calls.
>>>
>>> The "acc" switch is off per default.
>>>
>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>
>> With one reamrk (see below):
>>
>> Reviewed-by: Julien Grall <jgrall@amazon.com>
>>
>>> ---
>>>   tools/xenstore/xenstored_core.c   |  2 +-
>>>   tools/xenstore/xenstored_core.h   |  1 +
>>>   tools/xenstore/xenstored_domain.c | 10 ++++++++++
>>>   3 files changed, 12 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/tools/xenstore/xenstored_core.c 
>>> b/tools/xenstore/xenstored_core.c
>>> index 6ef60179fa..558ef491b1 100644
>>> --- a/tools/xenstore/xenstored_core.c
>>> +++ b/tools/xenstore/xenstored_core.c
>>> @@ -2746,7 +2746,7 @@ static void set_quota(const char *arg, bool soft)
>>>   /* Sorted by bit values of TRACE_* flags. Flag is (1u << index). */
>>>   const char *const trace_switches[] = {
>>> -    "obj", "io", "wrl",
>>> +    "obj", "io", "wrl", "acc",
>>>       NULL
>>>   };
>>> diff --git a/tools/xenstore/xenstored_core.h 
>>> b/tools/xenstore/xenstored_core.h
>>> index 1f811f38cb..3e0734a6c6 100644
>>> --- a/tools/xenstore/xenstored_core.h
>>> +++ b/tools/xenstore/xenstored_core.h
>>> @@ -302,6 +302,7 @@ extern unsigned int trace_flags;
>>>   #define TRACE_OBJ    0x00000001
>>>   #define TRACE_IO    0x00000002
>>>   #define TRACE_WRL    0x00000004
>>> +#define TRACE_ACC    0x00000008
>>>   extern const char *const trace_switches[];
>>>   int set_trace_switch(const char *arg);
>>> diff --git a/tools/xenstore/xenstored_domain.c 
>>> b/tools/xenstore/xenstored_domain.c
>>> index b1e29edb7e..d461fd8cc8 100644
>>> --- a/tools/xenstore/xenstored_domain.c
>>> +++ b/tools/xenstore/xenstored_domain.c
>>> @@ -538,6 +538,12 @@ static struct domain 
>>> *find_domain_by_domid(unsigned int domid)
>>>       return (d && d->introduced) ? d : NULL;
>>>   }
>>> +#define trace_acc(...)                \
>>
>> The indentation of '\' looks odd.
> 
> Not for me. Maybe you have a different tab setting?

I only looked at the code from my mail client. In my editor, it looks OK.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 01/13] tools/xenstore: don't allow creating too many nodes in a transaction
  2023-02-21 22:36                   ` Julien Grall
@ 2023-02-22  8:36                     ` Juergen Gross
  2023-02-22  8:52                       ` Julien Grall
  0 siblings, 1 reply; 40+ messages in thread
From: Juergen Gross @ 2023-02-22  8:36 UTC (permalink / raw)
  To: Julien Grall, xen-devel; +Cc: Wei Liu, Anthony PERARD, Edwin Török


[-- Attachment #1.1.1: Type: text/plain, Size: 2538 bytes --]

On 21.02.23 23:36, Julien Grall wrote:
> Hi Juergen,
> 
> On 21/02/2023 08:10, Juergen Gross wrote:
>> On 20.02.23 19:01, Julien Grall wrote:
>>> So I have recreated an XTF test which I think match what you wrote [1].
>>>
>>> It is indeed failing without your patch. But then there are still some weird 
>>> behavior here.
>>>
>>> I would expect the creation of the node would also fail if instead of 
>>> removing the node if removed outside of the transaction.
>>>
>>> This is not the case because we are looking at the current quota. So 
>>> shouldn't we snapshot the global count?
>>
>> As we don't do a global snapshot of the data base for a transaction (this was
>> changed due to huge memory needs, bad performance, and a higher transaction
>> failure rate), 
> 
> I am a bit surprised that the only way to do proper transaction is to have a 
> global snapshot. Instead, you could have an overlay.

I didn't say that a global snapshot is the only way. And we are using an
overlay.

> 
>> I don't think we should snapshot the count either.
> 
> But that would mean that the quota will change depending on modification of the 
> global database while the transaction is inflight.

I really don't see the problem with that. But it seems our views are different
in this case.

> I guess this is not better nor worse that the current situation. But it is still 
> really confusing for a client because:
>    1) The error could happen at random point

Yes, like without a transaction.

>    2) You may see an inconsistent database as nodes are only cached when they 
> are first accessed

It isn't inconsistent at all. The same could happen if such other nodes are
added/modified/removed just a microsecond before you start the transaction.
You won't be able to tell the difference. You can only reason about nodes
being accessed in the transaction, and those are transaction-local.

>> A transaction is performed atomically at the time it is finished. Therefore
>> seeing the current global state inside the transaction (with the transaction
>> private state on top like an overlay) is absolutely fine IMO.
> 
> To me it is just showing that our concept of transaction is very broken in C 
> Xenstored. I am curious to know whether OXenstored is behaving the same way.

I don't know either.

> Anyway, I agree this is not better nor worse than the current situation. So I 
> will acked this patch:
> 
> Acked-by: Julien Grall <jgrall@amazon.com>

Thanks,


Juergen


[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3149 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 04/13] tools/xenstore: add framework to commit accounting data on success only
  2023-02-21 22:42       ` Julien Grall
@ 2023-02-22  8:52         ` Juergen Gross
  0 siblings, 0 replies; 40+ messages in thread
From: Juergen Gross @ 2023-02-22  8:52 UTC (permalink / raw)
  To: Julien Grall, xen-devel; +Cc: Wei Liu, Anthony PERARD


[-- Attachment #1.1.1: Type: text/plain, Size: 4892 bytes --]

On 21.02.23 23:42, Julien Grall wrote:
> Hi Juergen,
> 
> On 21/02/2023 08:37, Juergen Gross wrote:
>> On 20.02.23 23:50, Julien Grall wrote:
>>>> +        list_del(&cd->list);
>>>> +        talloc_free(cd);
>>>> +    }
>>>> +}
>>>> +
>>>> +void acc_commit(struct connection *conn)
>>>> +{
>>>> +    struct changed_domain *cd;
>>>> +    struct buffered_data *in = conn->in;
>>>> +    enum accitem what;
>>>> +
>>>> +    conn->in = NULL; /* Avoid recursion. */
>>>
>>> I am not sure I understand this comment. Can you provide more details where 
>>> the recursion would happen?
>>
>> domain_acc_add() might do temporary accounting if conn->in isn't NULL.
> 
> I am confused. To me recursion means the function (or the caller) will call 
> itself. But here you seem to say you just want to avoid temporary accounting.

It is basically data recursion. Trying to apply temporary accounting data
leading to that temporary accounting data being modified again might result
in endless loops.

> What did I miss?
> 
>>
>>>
>>>> +    while ((cd = list_top(&conn->acc_list, struct changed_domain, list))) {
>>>
>>> NIT: You could use list_for_each_safe();
>>
>> Like above.
>>
>>>
>>>> +        list_del(&cd->list);
>>>> +        for (what = 0; what < ACC_REQ_N; what++)
>>>
>>> There is a chance that some compiler will complain about this line because 
>>> ACC_REQ_N = 0. So this would always be true. Therefore...
>>
>> Huh? It would always be false.
> 
> Yes false sorry. This doesn't change about the potential (temporary) warning.

Shouldn't that rather result in dead code elimination instead?

> 
>>
>>>
>>>> +            if (cd->acc[what])
>>>> +                domain_acc_add(conn, cd->domid, what,
>>>> +                           cd->acc[what], true);
>>>> +
>>>> +        talloc_free(cd);
>>>> +    }
>>>> +
>>>> +    conn->in = in;
>>>> +}
>>>> +
>>>>   int domain_nbentry_inc(struct connection *conn, unsigned int domid)
>>>>   {
>>>>       return (domain_acc_add(conn, domid, ACC_NODES, 1, false) < 0)
>>>> diff --git a/tools/xenstore/xenstored_domain.h 
>>>> b/tools/xenstore/xenstored_domain.h
>>>> index 8259c114b0..cd85bd0cad 100644
>>>> --- a/tools/xenstore/xenstored_domain.h
>>>> +++ b/tools/xenstore/xenstored_domain.h
>>>> @@ -20,7 +20,8 @@
>>>>   #define _XENSTORED_DOMAIN_H
>>>>   enum accitem {
>>>> -    ACC_NODES,
>>>> +    ACC_REQ_N,       /* Number of elements per request and domain. */
>>>> +    ACC_NODES = ACC_REQ_N,
>>>
>>> I would bring forward the change in patch #5. I mean:
>>>
>>> ACC_NODES,
>>> ACC_REQ_N
>>
>> I'm not sure this is a good idea. This would activate the temporary
>> accounting for nodes, but keeping the error handling active. I'd rather
>> activate temporary accounting for nodes together with removing the
>> accounting correction in the error handling.
> 
> I am not sure I fully understand what you would rather do. Can you clarify?

Having ACC_REQ_N > 0 will result in temporary accounting being active for
ACC_NODES (which is 0), due to "what < ACC_REQ_N".

At the same time there are still all the places where in error cases the
node accounting is corrected again (the mentioned error handling). So we
would have both error handling mechanisms (explicit and temp accounting)
active at the same time for nodes.

> 
>>
>>>
>>>>       ACC_TR_N,        /* Number of elements per transaction and domain. */
>>>>       ACC_N = ACC_TR_N /* Number of elements per domain. */
>>>>   };
>>>
>>> This enum is getting extremely confusing. There are many "number of elements 
>>> per ... domain". Can you clarify?
>>
>> I will add some more comments to the header. Would you like it better to have
>> the limits indented more? something like:
> 
> I am afraid it doesn't help understanding the comment. For instance,
> 
>>
>> enum accitem {
>>          ACC_NODES,
>>          /* Number of elements per request and domain. */
> 
> you wrote "per request and domain" here. But...
> 
>>          ACC_REQ_N,
>>          /* Number of elements per transaction and domain. */
> 
> ... here this is "per transaction and domain". Should not nbe "elements per 
> transaction"? And if not, then why don't we had "per request, transaction and 
> domain" above?

This is due to the nesting: the outermost nesting level is "all accounting
items", which covers everything, and is tracked on a per domain basis.

The next level is "per transaction" (I can drop the "and per domain", as
everything is per domain).

The innermost level is "per request". A request can be either part of a
transaction, or not. This doesn't matter.


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3149 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 01/13] tools/xenstore: don't allow creating too many nodes in a transaction
  2023-02-22  8:36                     ` Juergen Gross
@ 2023-02-22  8:52                       ` Julien Grall
  2023-02-22  9:37                         ` Juergen Gross
  0 siblings, 1 reply; 40+ messages in thread
From: Julien Grall @ 2023-02-22  8:52 UTC (permalink / raw)
  To: Juergen Gross, xen-devel; +Cc: Wei Liu, Anthony PERARD, Edwin Török

Hi Juergen,

On 22/02/2023 08:36, Juergen Gross wrote:
> On 21.02.23 23:36, Julien Grall wrote:
>> Hi Juergen,
>>
>> On 21/02/2023 08:10, Juergen Gross wrote:
>>> On 20.02.23 19:01, Julien Grall wrote:
>>>> So I have recreated an XTF test which I think match what you wrote [1].
>>>>
>>>> It is indeed failing without your patch. But then there are still 
>>>> some weird behavior here.
>>>>
>>>> I would expect the creation of the node would also fail if instead 
>>>> of removing the node if removed outside of the transaction.
>>>>
>>>> This is not the case because we are looking at the current quota. So 
>>>> shouldn't we snapshot the global count?
>>>
>>> As we don't do a global snapshot of the data base for a transaction 
>>> (this was
>>> changed due to huge memory needs, bad performance, and a higher 
>>> transaction
>>> failure rate), 
>>
>> I am a bit surprised that the only way to do proper transaction is to 
>> have a global snapshot. Instead, you could have an overlay.
> 
> I didn't say that a global snapshot is the only way. And we are using an
> overlay.
> 
>>
>>> I don't think we should snapshot the count either.
>>
>> But that would mean that the quota will change depending on 
>> modification of the global database while the transaction is inflight.
> 
> I really don't see the problem with that. But it seems our views are 
> different
> in this case.

See below.

> 
>> I guess this is not better nor worse that the current situation. But 
>> it is still really confusing for a client because:
>>    1) The error could happen at random point
> 
> Yes, like without a transaction.
> 
>>    2) You may see an inconsistent database as nodes are only cached 
>> when they are first accessed
> 
> It isn't inconsistent at all. The same could happen if such other nodes are
> added/modified/removed just a microsecond before you start the transaction.
> You won't be able to tell the difference. You can only reason about nodes
> being accessed in the transaction, and those are transaction-local.

I am not talking about the case a node is added/modified/removed outside 
of a transaction. I am talking about the in-transaction case. For 
example, let say we have a transaction A that remove node 1, 2 and 
transaction B to access 1, 2 (it may do more).

Now let's imagine the following sequence with the initial state is node 
1 and 2 exists:

  - TA started
  - TA remove 1
  - TA remove 2
  - TA remove 3
  - TB started
  - TB read 1
  - TA ended
  - TB read 2

With the above, one would expect that transaction B can read 2 as 
transaction A didn't commit before B started. But this is not what's 
happening.

My point here is that a protocol could require that if 1 is present then 
2 is. So it would be valid for a client to error out because the other 
side was considered to have misbehaved. However, here this is just how 
Xenstored behave and AFAICT it is undocumented.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 01/13] tools/xenstore: don't allow creating too many nodes in a transaction
  2023-02-22  8:52                       ` Julien Grall
@ 2023-02-22  9:37                         ` Juergen Gross
  0 siblings, 0 replies; 40+ messages in thread
From: Juergen Gross @ 2023-02-22  9:37 UTC (permalink / raw)
  To: Julien Grall, xen-devel; +Cc: Wei Liu, Anthony PERARD, Edwin Török


[-- Attachment #1.1.1: Type: text/plain, Size: 3576 bytes --]

On 22.02.23 09:52, Julien Grall wrote:
> Hi Juergen,
> 
> On 22/02/2023 08:36, Juergen Gross wrote:
>> On 21.02.23 23:36, Julien Grall wrote:
>>> Hi Juergen,
>>>
>>> On 21/02/2023 08:10, Juergen Gross wrote:
>>>> On 20.02.23 19:01, Julien Grall wrote:
>>>>> So I have recreated an XTF test which I think match what you wrote [1].
>>>>>
>>>>> It is indeed failing without your patch. But then there are still some 
>>>>> weird behavior here.
>>>>>
>>>>> I would expect the creation of the node would also fail if instead of 
>>>>> removing the node if removed outside of the transaction.
>>>>>
>>>>> This is not the case because we are looking at the current quota. So 
>>>>> shouldn't we snapshot the global count?
>>>>
>>>> As we don't do a global snapshot of the data base for a transaction (this was
>>>> changed due to huge memory needs, bad performance, and a higher transaction
>>>> failure rate), 
>>>
>>> I am a bit surprised that the only way to do proper transaction is to have a 
>>> global snapshot. Instead, you could have an overlay.
>>
>> I didn't say that a global snapshot is the only way. And we are using an
>> overlay.
>>
>>>
>>>> I don't think we should snapshot the count either.
>>>
>>> But that would mean that the quota will change depending on modification of 
>>> the global database while the transaction is inflight.
>>
>> I really don't see the problem with that. But it seems our views are different
>> in this case.
> 
> See below.
> 
>>
>>> I guess this is not better nor worse that the current situation. But it is 
>>> still really confusing for a client because:
>>>    1) The error could happen at random point
>>
>> Yes, like without a transaction.
>>
>>>    2) You may see an inconsistent database as nodes are only cached when they 
>>> are first accessed
>>
>> It isn't inconsistent at all. The same could happen if such other nodes are
>> added/modified/removed just a microsecond before you start the transaction.
>> You won't be able to tell the difference. You can only reason about nodes
>> being accessed in the transaction, and those are transaction-local.
> 
> I am not talking about the case a node is added/modified/removed outside of a 
> transaction. I am talking about the in-transaction case. For example, let say we 
> have a transaction A that remove node 1, 2 and transaction B to access 1, 2 (it 
> may do more).
> 
> Now let's imagine the following sequence with the initial state is node 1 and 2 
> exists:
> 
>   - TA started
>   - TA remove 1
>   - TA remove 2
>   - TA remove 3
>   - TB started
>   - TB read 1
>   - TA ended
>   - TB read 2
> 
> With the above, one would expect that transaction B can read 2 as transaction A 
> didn't commit before B started. But this is not what's happening.
> 
> My point here is that a protocol could require that if 1 is present then 2 is. 
> So it would be valid for a client to error out because the other side was 
> considered to have misbehaved. However, here this is just how Xenstored behave 
> and AFAICT it is undocumented.

Ah, okay.

Yes, I can see your point.

I was thinking for some time now to switch over to a "Copy on Write" scheme for
transactions. This will require to get rid of TDB, as otherwise it will be quite
hard to setup the needed links between nodes.

I'm quite close to succeed on the TDB removal. Switching the transaction
mechanism will need some more thoughts, though.

So this is nothing I'll be able to solve soon. ;-)


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3149 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2023-02-22  9:37 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-20 10:00 [PATCH v2 00/13] tools/xenstore: rework internal accounting Juergen Gross
2023-01-20 10:00 ` [PATCH v2 01/13] tools/xenstore: don't allow creating too many nodes in a transaction Juergen Gross
2023-02-20  9:46   ` Julien Grall
2023-02-20 11:04     ` Juergen Gross
2023-02-20 12:07       ` Julien Grall
2023-02-20 13:49         ` Juergen Gross
2023-02-20 14:06           ` Juergen Gross
2023-02-20 14:15           ` Julien Grall
2023-02-20 14:21             ` Juergen Gross
2023-02-20 18:01               ` Julien Grall
2023-02-21  8:10                 ` Juergen Gross
2023-02-21 22:36                   ` Julien Grall
2023-02-22  8:36                     ` Juergen Gross
2023-02-22  8:52                       ` Julien Grall
2023-02-22  9:37                         ` Juergen Gross
2023-01-20 10:00 ` [PATCH v2 02/13] tools/xenstore: manage per-transaction domain accounting data in an array Juergen Gross
2023-02-17 18:49   ` Julien Grall
2023-01-20 10:00 ` [PATCH v2 03/13] tools/xenstore: introduce accounting data array for per-domain values Juergen Gross
2023-02-17 19:29   ` Julien Grall
2023-02-20 11:20     ` Juergen Gross
2023-02-20 12:13       ` Julien Grall
2023-01-20 10:00 ` [PATCH v2 04/13] tools/xenstore: add framework to commit accounting data on success only Juergen Gross
2023-02-20 22:50   ` Julien Grall
2023-02-21  8:37     ` Juergen Gross
2023-02-21 22:42       ` Julien Grall
2023-02-22  8:52         ` Juergen Gross
2023-01-20 10:00 ` [PATCH v2 05/13] tools/xenstore: use accounting buffering for node accounting Juergen Gross
2023-01-20 10:00 ` [PATCH v2 06/13] tools/xenstore: add current connection to domain_memory_add() parameters Juergen Gross
2023-01-20 10:00 ` [PATCH v2 07/13] tools/xenstore: use accounting data array for per-domain values Juergen Gross
2023-01-20 10:00 ` [PATCH v2 08/13] tools/xenstore: add accounting trace support Juergen Gross
2023-02-20 22:57   ` Julien Grall
2023-02-21  8:40     ` Juergen Gross
2023-02-21 22:45       ` Julien Grall
2023-01-20 10:00 ` [PATCH v2 09/13] tools/xenstore: add TDB access " Juergen Gross
2023-02-20 22:59   ` Julien Grall
2023-02-21  8:41     ` Juergen Gross
2023-01-20 10:00 ` [PATCH v2 10/13] tools/xenstore: switch transaction accounting to generic accounting Juergen Gross
2023-01-20 10:00 ` [PATCH v2 11/13] tools/xenstore: remember global and per domain max accounting values Juergen Gross
2023-01-20 10:00 ` [PATCH v2 12/13] tools/xenstore: use generic accounting for remaining quotas Juergen Gross
2023-01-20 10:00 ` [PATCH v2 13/13] tools/xenstore: switch quota management to be table based Juergen Gross

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).