linux-kernel.vger.kernel.org archive mirror
* [PATCH v12 0/3] ipc: Increase IPCMNI limit
@ 2019-02-28 18:47 Waiman Long
  2019-02-28 18:47 ` [PATCH v12 1/3] ipc: Allow boot time extension of IPCMNI from 32k to 16M Waiman Long
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Waiman Long @ 2019-02-28 18:47 UTC (permalink / raw)
  To: Luis R. Rodriguez, Kees Cook, Andrew Morton, Jonathan Corbet
  Cc: linux-kernel, linux-fsdevel, linux-doc, Al Viro, Matthew Wilcox,
	Eric W. Biederman, Takashi Iwai, Davidlohr Bueso, Manfred Spraul,
	Waiman Long

v11->v12:
 - As suggested by Matthew, change patch 2 to increment the sequence
   number when there is a wraparound in the generated ID instead of
   after an ID deletion.

v10->v11:
 - Remove the sysctl parameter. Now the delete mode is the only way
   the sequence number is updated. The only choice users have to make
   is whether or not to specify ipcmni_extend on the boot command line.
 - Cyclical id allocation is enabled only in the ipcmni_extend mode.
 - Increase max # of ids in ipcmni_extend mode to 16M.

v9  patch: https://lkml.org/lkml/2018/9/7/1141
v10 patch: https://lkml.org/lkml/2018/11/5/791
v11 patch: https://lkml.org/lkml/2018/11/10/32

There are users out there requesting an increase in the IPCMNI value
to more than 32k. This patchset does that with a boot-time kernel
parameter, "ipcmni_extend", which raises the IPCMNI limit from 32k to
16M when specified.

Patch 1 adds an "ipcmni_extend" boot command line parameter to extend
the IPCMNI limit from 32k to 16M.

Patch 2 changes how the sequence number within an id is generated by
incrementing it only when the newly generated id is not greater than
the previous one. That reduces the chance of id reuse whether
"ipcmni_extend" is set or not.

Patch 3 makes identifier allocation cycle through the entire 24-bit
id space with "ipcmni_extend" to further reduce the chance of id
reuse, though probably with a slight memory and performance overhead.

The cyclical id allocation isn't done in non-ipcmni_extend mode, as
the potential memory and performance overhead may be problematic on
systems with slow CPUs and little memory. Systems that run
applications needing more than 32k IPC identifiers can certainly
afford the extra overhead.

Waiman Long (3):
  ipc: Allow boot time extension of IPCMNI from 32k to 16M
  ipc: Conserve sequence numbers in ipcmni_extend mode
  ipc: Do cyclic id allocation with ipcmni_extend mode

 Documentation/admin-guide/kernel-parameters.txt |  6 ++++
 include/linux/ipc_namespace.h                   |  1 +
 ipc/ipc_sysctl.c                                | 14 +++++++-
 ipc/util.c                                      | 27 ++++++++++-----
 ipc/util.h                                      | 46 ++++++++++++++++++++-----
 5 files changed, 76 insertions(+), 18 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v12 1/3] ipc: Allow boot time extension of IPCMNI from 32k to 16M
  2019-02-28 18:47 [PATCH v12 0/3] ipc: Increase IPCMNI limit Waiman Long
@ 2019-02-28 18:47 ` Waiman Long
  2019-02-28 18:47 ` [PATCH v12 2/3] ipc: Conserve sequence numbers in ipcmni_extend mode Waiman Long
  2019-02-28 18:47 ` [PATCH v12 3/3] ipc: Do cyclic id allocation with " Waiman Long
  2 siblings, 0 replies; 12+ messages in thread
From: Waiman Long @ 2019-02-28 18:47 UTC (permalink / raw)
  To: Luis R. Rodriguez, Kees Cook, Andrew Morton, Jonathan Corbet
  Cc: linux-kernel, linux-fsdevel, linux-doc, Al Viro, Matthew Wilcox,
	Eric W. Biederman, Takashi Iwai, Davidlohr Bueso, Manfred Spraul,
	Waiman Long

The maximum number of unique System V IPC identifiers was limited to
32k.  That limit should be big enough for most use cases.

However, there are some users out there requesting more, especially
those migrating from Solaris, which uses 24 bits for unique
identifiers. To satisfy the needs of those users, a new boot time
kernel option "ipcmni_extend" is added to extend the IPCMNI value to
16M. This is a 512X increase, which should be big enough for users
that need a large number of unique IPC identifiers.

The use of this new option will change the pattern of the IPC
identifiers returned by functions like shmget(2). An application that
depends on such a pattern may not work properly, so the option should
only be used if users really need more than 32k unique IPC
identifiers.

This new option does have the side effect of reducing the maximum number
of unique sequence numbers from 64k down to 128. So it is a trade-off.

The computation of a new IPC id is not done in a performance-critical
path, so a little bit of additional overhead shouldn't have any real
performance impact.

Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Manfred Spraul <manfred@colorfullife.com>
---
 Documentation/admin-guide/kernel-parameters.txt |  3 ++
 ipc/ipc_sysctl.c                                | 12 ++++++-
 ipc/util.c                                      | 10 +++---
 ipc/util.h                                      | 44 ++++++++++++++++++++-----
 4 files changed, 54 insertions(+), 15 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 858b6c0..074b775 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1812,6 +1812,9 @@
 	ip=		[IP_PNP]
 			See Documentation/filesystems/nfs/nfsroot.txt.
 
+	ipcmni_extend	[KNL] Extend the maximum number of unique System V
+			IPC identifiers from 32,768 to 16,777,216.
+
 	irqaffinity=	[SMP] Set the default irq affinity mask
 			The argument is a cpu list, as described above.
 
diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
index 49f9bf4..73b7782 100644
--- a/ipc/ipc_sysctl.c
+++ b/ipc/ipc_sysctl.c
@@ -120,7 +120,8 @@ static int proc_ipc_sem_dointvec(struct ctl_table *table, int write,
 static int zero;
 static int one = 1;
 static int int_max = INT_MAX;
-static int ipc_mni = IPCMNI;
+int ipc_mni = IPCMNI;
+int ipc_mni_shift = IPCMNI_SHIFT;
 
 static struct ctl_table ipc_kern_table[] = {
 	{
@@ -246,3 +247,12 @@ static int __init ipc_sysctl_init(void)
 }
 
 device_initcall(ipc_sysctl_init);
+
+static int __init ipc_mni_extend(char *str)
+{
+	ipc_mni = IPCMNI_EXTEND;
+	ipc_mni_shift = IPCMNI_EXTEND_SHIFT;
+	pr_info("IPCMNI extended to %d.\n", ipc_mni);
+	return 0;
+}
+early_param("ipcmni_extend", ipc_mni_extend);
diff --git a/ipc/util.c b/ipc/util.c
index 0af0575..07ae117 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -110,7 +110,7 @@ static int __init ipc_init(void)
  * @ids: ipc identifier set
  *
  * Set up the sequence range to use for the ipc identifier range (limited
- * below IPCMNI) then initialise the keys hashtable and ids idr.
+ * below ipc_mni) then initialise the keys hashtable and ids idr.
  */
 void ipc_init_ids(struct ipc_ids *ids)
 {
@@ -226,7 +226,7 @@ static inline int ipc_idr_alloc(struct ipc_ids *ids, struct kern_ipc_perm *new)
 				0, GFP_NOWAIT);
 	}
 	if (idx >= 0)
-		new->id = SEQ_MULTIPLIER * new->seq + idx;
+		new->id = (new->seq << IPCMNI_SEQ_SHIFT) + idx;
 	return idx;
 }
 
@@ -254,8 +254,8 @@ int ipc_addid(struct ipc_ids *ids, struct kern_ipc_perm *new, int limit)
 	/* 1) Initialize the refcount so that ipc_rcu_putref works */
 	refcount_set(&new->refcount, 1);
 
-	if (limit > IPCMNI)
-		limit = IPCMNI;
+	if (limit > ipc_mni)
+		limit = ipc_mni;
 
 	if (ids->in_use >= limit)
 		return -ENOSPC;
@@ -738,7 +738,7 @@ static struct kern_ipc_perm *sysvipc_find_ipc(struct ipc_ids *ids, loff_t pos,
 	if (total >= ids->in_use)
 		return NULL;
 
-	for (; pos < IPCMNI; pos++) {
+	for (; pos < ipc_mni; pos++) {
 		ipc = idr_find(&ids->ipcs_idr, pos);
 		if (ipc != NULL) {
 			*new_pos = pos + 1;
diff --git a/ipc/util.h b/ipc/util.h
index d768fdb..6a88d51 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -15,8 +15,34 @@
 #include <linux/err.h>
 #include <linux/ipc_namespace.h>
 
-#define IPCMNI 32768  /* <= MAX_INT limit for ipc arrays (including sysctl changes) */
-#define SEQ_MULTIPLIER	(IPCMNI)
+/*
+ * The IPC ID contains 2 separate numbers - index and sequence number.
+ * By default,
+ *   bits  0-14: index (32k, 15 bits)
+ *   bits 15-30: sequence number (64k, 16 bits)
+ *
+ * When IPCMNI extension mode is turned on, the composition changes:
+ *   bits  0-23: index (16M, 24 bits)
+ *   bits 24-30: sequence number (128, 7 bits)
+ */
+#define IPCMNI_SHIFT		15
+#define IPCMNI_EXTEND_SHIFT	24
+#define IPCMNI			(1 << IPCMNI_SHIFT)
+#define IPCMNI_EXTEND		(1 << IPCMNI_EXTEND_SHIFT)
+
+#ifdef CONFIG_SYSVIPC_SYSCTL
+extern int ipc_mni;
+extern int ipc_mni_shift;
+
+#define IPCMNI_SEQ_SHIFT	ipc_mni_shift
+#define IPCMNI_IDX_MASK		((1 << ipc_mni_shift) - 1)
+
+#else /* CONFIG_SYSVIPC_SYSCTL */
+
+#define ipc_mni			IPCMNI
+#define IPCMNI_SEQ_SHIFT	IPCMNI_SHIFT
+#define IPCMNI_IDX_MASK		((1 << IPCMNI_SHIFT) - 1)
+#endif /* CONFIG_SYSVIPC_SYSCTL */
 
 void sem_init(void);
 void msg_init(void);
@@ -96,9 +122,9 @@ void __init ipc_init_proc_interface(const char *path, const char *header,
 #define IPC_MSG_IDS	1
 #define IPC_SHM_IDS	2
 
-#define ipcid_to_idx(id) ((id) % SEQ_MULTIPLIER)
-#define ipcid_to_seqx(id) ((id) / SEQ_MULTIPLIER)
-#define IPCID_SEQ_MAX min_t(int, INT_MAX/SEQ_MULTIPLIER, USHRT_MAX)
+#define ipcid_to_idx(id)  ((id) & IPCMNI_IDX_MASK)
+#define ipcid_to_seqx(id) ((id) >> IPCMNI_SEQ_SHIFT)
+#define IPCID_SEQ_MAX	  (INT_MAX >> IPCMNI_SEQ_SHIFT)
 
 /* must be called with ids->rwsem acquired for writing */
 int ipc_addid(struct ipc_ids *, struct kern_ipc_perm *, int);
@@ -123,8 +149,8 @@ static inline int ipc_get_maxidx(struct ipc_ids *ids)
 	if (ids->in_use == 0)
 		return -1;
 
-	if (ids->in_use == IPCMNI)
-		return IPCMNI - 1;
+	if (ids->in_use == ipc_mni)
+		return ipc_mni - 1;
 
 	return ids->max_idx;
 }
@@ -219,10 +245,10 @@ void free_ipcs(struct ipc_namespace *ns, struct ipc_ids *ids,
 
 static inline int sem_check_semmni(struct ipc_namespace *ns) {
 	/*
-	 * Check semmni range [0, IPCMNI]
+	 * Check semmni range [0, ipc_mni]
 	 * semmni is the last element of sem_ctls[4] array
 	 */
-	return ((ns->sem_ctls[3] < 0) || (ns->sem_ctls[3] > IPCMNI))
+	return ((ns->sem_ctls[3] < 0) || (ns->sem_ctls[3] > ipc_mni))
 		? -ERANGE : 0;
 }
 
-- 
1.8.3.1



* [PATCH v12 2/3] ipc: Conserve sequence numbers in ipcmni_extend mode
  2019-02-28 18:47 [PATCH v12 0/3] ipc: Increase IPCMNI limit Waiman Long
  2019-02-28 18:47 ` [PATCH v12 1/3] ipc: Allow boot time extension of IPCMNI from 32k to 16M Waiman Long
@ 2019-02-28 18:47 ` Waiman Long
  2019-03-16 18:52   ` Manfred Spraul
  2019-02-28 18:47 ` [PATCH v12 3/3] ipc: Do cyclic id allocation with " Waiman Long
  2 siblings, 1 reply; 12+ messages in thread
From: Waiman Long @ 2019-02-28 18:47 UTC (permalink / raw)
  To: Luis R. Rodriguez, Kees Cook, Andrew Morton, Jonathan Corbet
  Cc: linux-kernel, linux-fsdevel, linux-doc, Al Viro, Matthew Wilcox,
	Eric W. Biederman, Takashi Iwai, Davidlohr Bueso, Manfred Spraul,
	Waiman Long

The mixing in of a sequence number into the IPC IDs is probably meant
to avoid ID reuse in userspace as much as possible. With ipcmni_extend
mode, the number of usable sequence numbers is greatly reduced, leading
to a higher chance of ID reuse.

To address this issue, we need to conserve the sequence number space
as much as possible. Right now, the sequence number is incremented for
every new ID created. In reality, we only need to increment the
sequence number when the newly allocated ID is not greater than the
last one allocated, as only then may the new ID collide with an
existing one. This is done irrespective of the ipcmni mode.

Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Waiman Long <longman@redhat.com>
---
 include/linux/ipc_namespace.h |  1 +
 ipc/util.c                    | 12 +++++++++---
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h
index 6ab8c1b..c309f43 100644
--- a/include/linux/ipc_namespace.h
+++ b/include/linux/ipc_namespace.h
@@ -19,6 +19,7 @@ struct ipc_ids {
 	struct rw_semaphore rwsem;
 	struct idr ipcs_idr;
 	int max_idx;
+	int last_idx;	/* For wrap around detection */
 #ifdef CONFIG_CHECKPOINT_RESTORE
 	int next_id;
 #endif
diff --git a/ipc/util.c b/ipc/util.c
index 07ae117..0a835a4 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -120,6 +120,7 @@ void ipc_init_ids(struct ipc_ids *ids)
 	rhashtable_init(&ids->key_ht, &ipc_kht_params);
 	idr_init(&ids->ipcs_idr);
 	ids->max_idx = -1;
+	ids->last_idx = -1;
 #ifdef CONFIG_CHECKPOINT_RESTORE
 	ids->next_id = -1;
 #endif
@@ -193,6 +194,10 @@ static struct kern_ipc_perm *ipc_findkey(struct ipc_ids *ids, key_t key)
  *
  * The caller must own kern_ipc_perm.lock.of the new object.
  * On error, the function returns a (negative) error code.
+ *
+ * To conserve sequence number space, especially with extended ipc_mni,
+ * the sequence number is incremented only when the returned ID is less than
+ * the last one.
  */
 static inline int ipc_idr_alloc(struct ipc_ids *ids, struct kern_ipc_perm *new)
 {
@@ -216,10 +221,11 @@ static inline int ipc_idr_alloc(struct ipc_ids *ids, struct kern_ipc_perm *new)
 	 */
 
 	if (next_id < 0) { /* !CHECKPOINT_RESTORE or next_id is unset */
-		new->seq = ids->seq++;
-		if (ids->seq > IPCID_SEQ_MAX)
-			ids->seq = 0;
 		idx = idr_alloc(&ids->ipcs_idr, new, 0, 0, GFP_NOWAIT);
+		if ((idx <= ids->last_idx) && (++ids->seq > IPCID_SEQ_MAX))
+			ids->seq = 0;
+		new->seq = ids->seq;
+		ids->last_idx = idx;
 	} else {
 		new->seq = ipcid_to_seqx(next_id);
 		idx = idr_alloc(&ids->ipcs_idr, new, ipcid_to_idx(next_id),
-- 
1.8.3.1



* [PATCH v12 3/3] ipc: Do cyclic id allocation with ipcmni_extend mode
  2019-02-28 18:47 [PATCH v12 0/3] ipc: Increase IPCMNI limit Waiman Long
  2019-02-28 18:47 ` [PATCH v12 1/3] ipc: Allow boot time extension of IPCMNI from 32k to 16M Waiman Long
  2019-02-28 18:47 ` [PATCH v12 2/3] ipc: Conserve sequence numbers in ipcmni_extend mode Waiman Long
@ 2019-02-28 18:47 ` Waiman Long
  2019-03-17 18:27   ` Manfred Spraul
  2 siblings, 1 reply; 12+ messages in thread
From: Waiman Long @ 2019-02-28 18:47 UTC (permalink / raw)
  To: Luis R. Rodriguez, Kees Cook, Andrew Morton, Jonathan Corbet
  Cc: linux-kernel, linux-fsdevel, linux-doc, Al Viro, Matthew Wilcox,
	Eric W. Biederman, Takashi Iwai, Davidlohr Bueso, Manfred Spraul,
	Waiman Long

For ipcmni_extend mode, the sequence number space is only 7 bits, so
the chance of id reuse is relatively high compared with the
non-extended mode.

To alleviate this id reuse problem, the id allocation will be done
cyclically, cycling through the entire 24-bit id space before wrapping
around, when in ipcmni_extend mode. This may use more memory in terms
of the number of xa_nodes allocated, as well as potentially more
cachelines, as the xa_nodes may be spread more sparsely in this case.

There is probably a slight memory and performance cost in doing cyclic
id allocation. For applications that really need more than 32k unique IPC
identifiers, this is a small price to pay to avoid the id reuse problem.

As a result, the chance of id reuse should be even smaller in
ipcmni_extend mode. Users who worry about id reuse can turn on
ipcmni_extend mode even if they don't need more than 32k IPC
identifiers.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 Documentation/admin-guide/kernel-parameters.txt | 5 ++++-
 ipc/ipc_sysctl.c                                | 2 ++
 ipc/util.c                                      | 7 ++++++-
 ipc/util.h                                      | 2 ++
 4 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 074b775..bb851d0 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1813,7 +1813,10 @@
 			See Documentation/filesystems/nfs/nfsroot.txt.
 
 	ipcmni_extend	[KNL] Extend the maximum number of unique System V
-			IPC identifiers from 32,768 to 16,777,216.
+			IPC identifiers from 32,768 to 16,777,216. Also do
+			cyclical identifier allocation through the entire
+			24-bit identifier space to reduce the chance of
+			identifier reuse.
 
 	irqaffinity=	[SMP] Set the default irq affinity mask
 			The argument is a cpu list, as described above.
diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
index 73b7782..d9ac6ca 100644
--- a/ipc/ipc_sysctl.c
+++ b/ipc/ipc_sysctl.c
@@ -122,6 +122,7 @@ static int proc_ipc_sem_dointvec(struct ctl_table *table, int write,
 static int int_max = INT_MAX;
 int ipc_mni = IPCMNI;
 int ipc_mni_shift = IPCMNI_SHIFT;
+bool ipc_mni_extended;
 
 static struct ctl_table ipc_kern_table[] = {
 	{
@@ -252,6 +253,7 @@ static int __init ipc_mni_extend(char *str)
 {
 	ipc_mni = IPCMNI_EXTEND;
 	ipc_mni_shift = IPCMNI_EXTEND_SHIFT;
+	ipc_mni_extended = true;
 	pr_info("IPCMNI extended to %d.\n", ipc_mni);
 	return 0;
 }
diff --git a/ipc/util.c b/ipc/util.c
index 0a835a4..78e14ac 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -221,7 +221,12 @@ static inline int ipc_idr_alloc(struct ipc_ids *ids, struct kern_ipc_perm *new)
 	 */
 
 	if (next_id < 0) { /* !CHECKPOINT_RESTORE or next_id is unset */
-		idx = idr_alloc(&ids->ipcs_idr, new, 0, 0, GFP_NOWAIT);
+		if (ipc_mni_extended)
+			idx = idr_alloc_cyclic(&ids->ipcs_idr, new, 0, ipc_mni,
+						GFP_NOWAIT);
+		else
+			idx = idr_alloc(&ids->ipcs_idr, new, 0, 0, GFP_NOWAIT);
+
 		if ((idx <= ids->last_idx) && (++ids->seq > IPCID_SEQ_MAX))
 			ids->seq = 0;
 		new->seq = ids->seq;
diff --git a/ipc/util.h b/ipc/util.h
index 6a88d51..9f0dd79 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -33,6 +33,7 @@
 #ifdef CONFIG_SYSVIPC_SYSCTL
 extern int ipc_mni;
 extern int ipc_mni_shift;
+extern bool ipc_mni_extended;
 
 #define IPCMNI_SEQ_SHIFT	ipc_mni_shift
 #define IPCMNI_IDX_MASK		((1 << ipc_mni_shift) - 1)
@@ -40,6 +41,7 @@
 #else /* CONFIG_SYSVIPC_SYSCTL */
 
 #define ipc_mni			IPCMNI
+#define ipc_mni_extended	false
 #define IPCMNI_SEQ_SHIFT	IPCMNI_SHIFT
 #define IPCMNI_IDX_MASK		((1 << IPCMNI_SHIFT) - 1)
 #endif /* CONFIG_SYSVIPC_SYSCTL */
-- 
1.8.3.1



* Re: [PATCH v12 2/3] ipc: Conserve sequence numbers in ipcmni_extend mode
  2019-02-28 18:47 ` [PATCH v12 2/3] ipc: Conserve sequence numbers in ipcmni_extend mode Waiman Long
@ 2019-03-16 18:52   ` Manfred Spraul
  2019-03-18 18:57     ` Waiman Long
  2019-03-18 19:00     ` Waiman Long
  0 siblings, 2 replies; 12+ messages in thread
From: Manfred Spraul @ 2019-03-16 18:52 UTC (permalink / raw)
  To: Waiman Long, Luis R. Rodriguez, Kees Cook, Andrew Morton,
	Jonathan Corbet
  Cc: linux-kernel, linux-fsdevel, linux-doc, Al Viro, Matthew Wilcox,
	Eric W. Biederman, Takashi Iwai, Davidlohr Bueso, 1vier1

[-- Attachment #1: Type: text/plain, Size: 1021 bytes --]

Hi,

On 2/28/19 7:47 PM, Waiman Long wrote:
> @@ -216,10 +221,11 @@ static inline int ipc_idr_alloc(struct ipc_ids *ids, struct kern_ipc_perm *new)
>   	 */
>   
>   	if (next_id < 0) { /* !CHECKPOINT_RESTORE or next_id is unset */
> -		new->seq = ids->seq++;
> -		if (ids->seq > IPCID_SEQ_MAX)
> -			ids->seq = 0;
>   		idx = idr_alloc(&ids->ipcs_idr, new, 0, 0, GFP_NOWAIT);
> +		if ((idx <= ids->last_idx) && (++ids->seq > IPCID_SEQ_MAX))
> +			ids->seq = 0;

I'm always impressed by such lines:

Everything in just two lines, use "++a", etc.

But: How did you test it?

idr_alloc() can fail, the code doesn't handle that :-(


> +		new->seq = ids->seq;

As written this morning:

Writing new->seq after inserting "new" into the idr creates races 
without any good reason.

I could not spot a bug, even find_alloc_undo() appears to be safe, but 
why should we take this risk?


Attached is:

- proposed replacement for this patch.

- the test patch that I have used to check the error handling.


--

     Manfred


[-- Attachment #2: patch-debug-idr_alloc_failure --]
[-- Type: text/plain, Size: 871 bytes --]

diff --git a/ipc/util.c b/ipc/util.c
index 6e0fe3410423..5dafe4bc78a1 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -309,6 +309,7 @@ int ipc_addid(struct ipc_ids *ids, struct kern_ipc_perm *new, int limit)
 		}
 	}
 	if (idx < 0) {
+pr_info("failed allocation.\n");
 		new->deleted = true;
 		spin_unlock(&new->lock);
 		rcu_read_unlock();
diff --git a/lib/idr.c b/lib/idr.c
index cb1db9b8d3f6..ba274baa87e3 100644
--- a/lib/idr.c
+++ b/lib/idr.c
@@ -83,6 +83,17 @@ int idr_alloc(struct idr *idr, void *ptr, int start, int end, gfp_t gfp)
 	if (WARN_ON_ONCE(start < 0))
 		return -EINVAL;
 
+	{
+		u64 a = get_jiffies_64();
+
+		if (time_after64(a, (u64)INITIAL_JIFFIES+40*HZ)) {
+			if (a%5 < 2) {
+				pr_info("idr_alloc:Failing.\n");
+				return -ENOSPC;
+			}
+		}
+	}
+
 	ret = idr_alloc_u32(idr, ptr, &id, end > 0 ? end - 1 : INT_MAX, gfp);
 	if (ret)
 		return ret;

[-- Attachment #3: 0001-ipc-Conserve-sequence-numbers-in-ipcmni_extend-mode.patch --]
[-- Type: text/x-patch, Size: 5210 bytes --]

From edee319b2d5c96af14b8b8899e5dde324861e4e4 Mon Sep 17 00:00:00 2001
From: Manfred Spraul <manfred@colorfullife.com>
Date: Sat, 16 Mar 2019 10:18:53 +0100
Subject: [PATCH] ipc: Conserve sequence numbers in ipcmni_extend mode

Rewrite, based on the patch from Waiman Long:

The mixing in of a sequence number into the IPC IDs is probably meant
to avoid ID reuse in userspace as much as possible. With ipcmni_extend
mode, the number of usable sequence numbers is greatly reduced, leading
to a higher chance of ID reuse.

To address this issue, we need to conserve the sequence number space
as much as possible. Right now, the sequence number is incremented for
every new ID created. In reality, we only need to increment the
sequence number when the newly allocated ID is not greater than the
last one allocated, as only then may the new ID collide with an
existing one. This is done irrespective of the ipcmni mode.

In order to avoid any races, the index is first allocated and
then the pointer is replaced.

Changes compared to the initial patch:
- Handle failures from idr_alloc().
- Prevent concurrent operations from seeing the wrong sequence
  number. (This is achieved by using idr_replace().)
- IPCMNI_SEQ_SHIFT is not a constant, thus renamed to
	ipcmni_seq_shift().
- IPCID_SEQ_MAX is not a constant, thus renamed to
	ipcid_seq_max().

Suggested-by: Matthew Wilcox <willy@infradead.org>
Original-patch-from: Waiman Long <longman@redhat.com>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
---
 include/linux/ipc_namespace.h |  1 +
 ipc/util.c                    | 35 ++++++++++++++++++++++++++++++-----
 ipc/util.h                    |  8 ++++----
 3 files changed, 35 insertions(+), 9 deletions(-)

diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h
index 6ab8c1bada3f..c309f43bde45 100644
--- a/include/linux/ipc_namespace.h
+++ b/include/linux/ipc_namespace.h
@@ -19,6 +19,7 @@ struct ipc_ids {
 	struct rw_semaphore rwsem;
 	struct idr ipcs_idr;
 	int max_idx;
+	int last_idx;	/* For wrap around detection */
 #ifdef CONFIG_CHECKPOINT_RESTORE
 	int next_id;
 #endif
diff --git a/ipc/util.c b/ipc/util.c
index 07ae117ccdc0..6e0fe3410423 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -120,6 +120,7 @@ void ipc_init_ids(struct ipc_ids *ids)
 	rhashtable_init(&ids->key_ht, &ipc_kht_params);
 	idr_init(&ids->ipcs_idr);
 	ids->max_idx = -1;
+	ids->last_idx = -1;
 #ifdef CONFIG_CHECKPOINT_RESTORE
 	ids->next_id = -1;
 #endif
@@ -193,6 +194,10 @@ static struct kern_ipc_perm *ipc_findkey(struct ipc_ids *ids, key_t key)
  *
  * The caller must own kern_ipc_perm.lock.of the new object.
  * On error, the function returns a (negative) error code.
+ *
+ * To conserve sequence number space, especially with extended ipc_mni,
+ * the sequence number is incremented only when the returned ID is less than
+ * the last one.
  */
 static inline int ipc_idr_alloc(struct ipc_ids *ids, struct kern_ipc_perm *new)
 {
@@ -216,17 +221,37 @@ static inline int ipc_idr_alloc(struct ipc_ids *ids, struct kern_ipc_perm *new)
 	 */
 
 	if (next_id < 0) { /* !CHECKPOINT_RESTORE or next_id is unset */
-		new->seq = ids->seq++;
-		if (ids->seq > IPCID_SEQ_MAX)
-			ids->seq = 0;
-		idx = idr_alloc(&ids->ipcs_idr, new, 0, 0, GFP_NOWAIT);
+
+		/* allocate the idx, with a NULL struct kern_ipc_perm */
+		idx = idr_alloc(&ids->ipcs_idr, NULL, 0, 0, GFP_NOWAIT);
+
+		if (idx >= 0) {
+			/*
+			 * idx got allocated successfully.
+			 * Now calculate the sequence number and set the
+			 * pointer for real.
+			 */
+			if (idx <= ids->last_idx) {
+				ids->seq++;
+				if (ids->seq >= ipcid_seq_max())
+					ids->seq = 0;
+			}
+			ids->last_idx = idx;
+
+			new->seq = ids->seq;
+			/* no need for smp_wmb(), this is done
+			 * inside idr_replace, as part of
+			 * rcu_assign_pointer
+			 */
+			idr_replace(&ids->ipcs_idr, new, idx);
+		}
 	} else {
 		new->seq = ipcid_to_seqx(next_id);
 		idx = idr_alloc(&ids->ipcs_idr, new, ipcid_to_idx(next_id),
 				0, GFP_NOWAIT);
 	}
 	if (idx >= 0)
-		new->id = (new->seq << IPCMNI_SEQ_SHIFT) + idx;
+		new->id = (new->seq << ipcmni_seq_shift()) + idx;
 	return idx;
 }
 
diff --git a/ipc/util.h b/ipc/util.h
index 9746886757de..8c834ed39012 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -34,13 +34,13 @@
 extern int ipc_mni;
 extern int ipc_mni_shift;
 
-#define IPCMNI_SEQ_SHIFT	ipc_mni_shift
+#define ipcmni_seq_shift()	ipc_mni_shift
 #define IPCMNI_IDX_MASK		((1 << ipc_mni_shift) - 1)
 
 #else /* CONFIG_SYSVIPC_SYSCTL */
 
 #define ipc_mni			IPCMNI
-#define IPCMNI_SEQ_SHIFT	IPCMNI_SHIFT
+#define ipcmni_seq_shift()	IPCMNI_SHIFT
 #define IPCMNI_IDX_MASK		((1 << IPCMNI_SHIFT) - 1)
 #endif /* CONFIG_SYSVIPC_SYSCTL */
 
@@ -123,8 +123,8 @@ struct pid_namespace *ipc_seq_pid_ns(struct seq_file *);
 #define IPC_SHM_IDS	2
 
 #define ipcid_to_idx(id)  ((id) & IPCMNI_IDX_MASK)
-#define ipcid_to_seqx(id) ((id) >> IPCMNI_SEQ_SHIFT)
-#define IPCID_SEQ_MAX	  (INT_MAX >> IPCMNI_SEQ_SHIFT)
+#define ipcid_to_seqx(id) ((id) >> ipcmni_seq_shift())
+#define ipcid_seq_max()	  (INT_MAX >> ipcmni_seq_shift())
 
 /* must be called with ids->rwsem acquired for writing */
 int ipc_addid(struct ipc_ids *, struct kern_ipc_perm *, int);
-- 
2.17.2



* Re: [PATCH v12 3/3] ipc: Do cyclic id allocation with ipcmni_extend mode
  2019-02-28 18:47 ` [PATCH v12 3/3] ipc: Do cyclic id allocation with " Waiman Long
@ 2019-03-17 18:27   ` Manfred Spraul
  2019-03-18 18:37     ` Waiman Long
       [not found]     ` <728b5e85-3129-9707-3802-306f66093c78@redhat.com>
  0 siblings, 2 replies; 12+ messages in thread
From: Manfred Spraul @ 2019-03-17 18:27 UTC (permalink / raw)
  To: Waiman Long, Luis R. Rodriguez, Kees Cook, Andrew Morton,
	Jonathan Corbet
  Cc: linux-kernel, linux-fsdevel, linux-doc, Al Viro, Matthew Wilcox,
	Eric W. Biederman, Takashi Iwai, Davidlohr Bueso

[-- Attachment #1: Type: text/plain, Size: 1859 bytes --]

Hi Waiman,

On 2/28/19 7:47 PM, Waiman Long wrote:
> For ipcmni_extend mode, the sequence number space is only 7 bits. So
> the chance of id reuse is relatively high compared with the non-extended
> mode.
>
> To alleviate this id reuse problem, the id allocation will be done
> cyclically to cycle through all the 24-bit id space before wrapping
> around when in ipcmni_extend mode. This may cause the use of more memory
> in term of the number of xa_nodes allocated as well as potentially more
> cachelines used as the xa_nodes may be spread more sparsely in this case.
>
> There is probably a slight memory and performance cost in doing cyclic
> id allocation. For applications that really need more than 32k unique IPC
> identifiers, this is a small price to pay to avoid the id reuse problem.

Have you measured it?

I have observed -3% for semop() for a 4-level radix tree compared to
a 1-level radix tree, and I'm a bit reluctant to accept that,
especially as the percentage will increase if the syscall overhead
goes down again (-> less spectre impact).

[...]

> --- a/ipc/util.c
> +++ b/ipc/util.c
> @@ -221,7 +221,12 @@ static inline int ipc_idr_alloc(struct ipc_ids *ids, struct kern_ipc_perm *new)
>   	 */
>   
>   	if (next_id < 0) { /* !CHECKPOINT_RESTORE or next_id is unset */
> -		idx = idr_alloc(&ids->ipcs_idr, new, 0, 0, GFP_NOWAIT);
> +		if (ipc_mni_extended)
> +			idx = idr_alloc_cyclic(&ids->ipcs_idr, new, 0, ipc_mni,
> +						GFP_NOWAIT);
> +		else
> +			idx = idr_alloc(&ids->ipcs_idr, new, 0, 0, GFP_NOWAIT);
> +
>   		if ((idx <= ids->last_idx) && (++ids->seq > IPCID_SEQ_MAX))
>   			ids->seq = 0;

I don't like it that there are two different codepaths.

Attached is a different proposal:

Always use cyclic allocation, with some logic to minimize the additional 
radix tree levels.

What do you think?

--

     Manfred


[-- Attachment #2: 0002-ipc-Do-cyclic-id-allocation-for-the-ipc-objects.patch --]
[-- Type: text/x-patch, Size: 4846 bytes --]

From 491ea87cc3022e50c02caae009f9aeba2b6ddcb4 Mon Sep 17 00:00:00 2001
From: Manfred Spraul <manfred@colorfullife.com>
Date: Sun, 17 Mar 2019 06:29:00 +0100
Subject: [PATCH 2/2] ipc: Do cyclic id allocation for the ipc objects.

For ipcmni_extend mode, the sequence number space is only 7 bits. So
the chance of id reuse is relatively high compared with the non-extended
mode.

To alleviate this id reuse problem, this patch enables cyclic
allocation for the index into the radix tree (idx).
The disadvantage is that this can cause a slight slow-down of the fast
path, as the radix tree could be higher than necessary.

To limit the radix tree height, I have chosen the following limits:
- 1) The cycling is done over in_use*1.5.
- 2) At a minimum, the cycling is done over:
   "normal" ipcmni mode: RADIX_TREE_MAP_SIZE elements
   "ipcmni_extend" mode: 4096 elements

Result:
- for normal mode:
	No change for <= 42 active ipc elements. With more than 42
	active ipc elements, a 2nd level would be added to the radix
	tree.
	Without cyclic allocation, a 2nd level would be added only with
	more than 63 active elements.

- for extended mode:
	Cycling creates always at least a 2-level radix tree.
	With more than 2730 active objects, a 3rd level would be
	added, instead of > 4095 active objects until the 3rd level
	is added without cyclic allocation.

For a 2-level radix tree compared to a 1-level radix tree, I have
observed < 1% performance impact. I consider this as a good
compromise.

Notes:
1) Normal "x=semget();y=semget();" is unaffected: then the idx
  is e.g. a and a+1, regardless of whether idr_alloc() or
  idr_alloc_cyclic() is used.

2) The -1% happens in a microbenchmark after this situation:
	x=semget();
	for(i=0;i<4000;i++) {t=semget();semctl(t,0,IPC_RMID);}
	y=semget();
	Now perform semget calls on x and y that do not sleep.

3) The worst-case reuse cycle time is unfortunately unaffected:
   If you have 2^24-1 ipc objects allocated, and get/remove the last
   possible element in a loop, then the id is reused after 128
   get/remove pairs.

Performance check:
A microbenchmark that performs no-op semop() calls randomly on two
IDs, with only these two IDs allocated.
The IDs were set using /proc/sys/kernel/sem_next_id.
The test was run 5 times; averages are shown.

1 & 2: Base (6.22 seconds for 10.000.000 semops)
1 & 40: -0.2%
1 & 3348: - 0.8%
1 & 27348: - 1.6%
1 & 15777204: - 3.2%

Or: ~12.6 cpu cycles per additional radix tree level.
The cpu is an Intel i3-5010U. ~1300 cpu cycles/syscall is slower
than what I remember (spectre impact?).

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
---
 ipc/ipc_sysctl.c |  2 ++
 ipc/util.c       | 10 +++++++++-
 ipc/util.h       |  3 +++
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
index 73b7782eccf4..bfaae457810c 100644
--- a/ipc/ipc_sysctl.c
+++ b/ipc/ipc_sysctl.c
@@ -122,6 +122,7 @@ static int one = 1;
 static int int_max = INT_MAX;
 int ipc_mni = IPCMNI;
 int ipc_mni_shift = IPCMNI_SHIFT;
+int ipc_min_cycle = RADIX_TREE_MAP_SIZE;
 
 static struct ctl_table ipc_kern_table[] = {
 	{
@@ -252,6 +253,7 @@ static int __init ipc_mni_extend(char *str)
 {
 	ipc_mni = IPCMNI_EXTEND;
 	ipc_mni_shift = IPCMNI_EXTEND_SHIFT;
+	ipc_min_cycle = IPCMNI_EXTEND_MIN_CYCLE;
 	pr_info("IPCMNI extended to %d.\n", ipc_mni);
 	return 0;
 }
diff --git a/ipc/util.c b/ipc/util.c
index 6e0fe3410423..90764b67af51 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -221,9 +221,17 @@ static inline int ipc_idr_alloc(struct ipc_ids *ids, struct kern_ipc_perm *new)
 	 */
 
 	if (next_id < 0) { /* !CHECKPOINT_RESTORE or next_id is unset */
+		int max_idx;
+
+		max_idx = ids->in_use*3/2;
+		if (max_idx > ipc_mni)
+			max_idx = ipc_mni;
+		if (max_idx < ipc_min_cycle)
+			max_idx = ipc_min_cycle;
 
 		/* allocate the idx, with a NULL struct kern_ipc_perm */
-		idx = idr_alloc(&ids->ipcs_idr, NULL, 0, 0, GFP_NOWAIT);
+		idx = idr_alloc_cyclic(&ids->ipcs_idr, NULL, 0, max_idx,
+					GFP_NOWAIT);
 
 		if (idx >= 0) {
 			/*
diff --git a/ipc/util.h b/ipc/util.h
index 8c834ed39012..ef4e86bb2db8 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -27,12 +27,14 @@
  */
 #define IPCMNI_SHIFT		15
 #define IPCMNI_EXTEND_SHIFT	24
+#define IPCMNI_EXTEND_MIN_CYCLE	(2 << 12)
 #define IPCMNI			(1 << IPCMNI_SHIFT)
 #define IPCMNI_EXTEND		(1 << IPCMNI_EXTEND_SHIFT)
 
 #ifdef CONFIG_SYSVIPC_SYSCTL
 extern int ipc_mni;
 extern int ipc_mni_shift;
+extern int ipc_min_cycle;
 
 #define ipcmni_seq_shift()	ipc_mni_shift
 #define IPCMNI_IDX_MASK		((1 << ipc_mni_shift) - 1)
@@ -40,6 +42,7 @@ extern int ipc_mni_shift;
 #else /* CONFIG_SYSVIPC_SYSCTL */
 
 #define ipc_mni			IPCMNI
+#define ipc_min_cycle		RADIX_TREE_MAP_SIZE
 #define ipcmni_seq_shift()	IPCMNI_SHIFT
 #define IPCMNI_IDX_MASK		((1 << IPCMNI_SHIFT) - 1)
 #endif /* CONFIG_SYSVIPC_SYSCTL */
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v12 3/3] ipc: Do cyclic id allocation with ipcmni_extend mode
  2019-03-17 18:27   ` Manfred Spraul
@ 2019-03-18 18:37     ` Waiman Long
  2019-03-18 18:53       ` Waiman Long
       [not found]     ` <728b5e85-3129-9707-3802-306f66093c78@redhat.com>
  1 sibling, 1 reply; 12+ messages in thread
From: Waiman Long @ 2019-03-18 18:37 UTC (permalink / raw)
  To: Manfred Spraul, Luis R. Rodriguez, Kees Cook, Andrew Morton,
	Jonathan Corbet
  Cc: linux-kernel, linux-fsdevel, linux-doc, Al Viro, Matthew Wilcox,
	Eric W. Biederman, Takashi Iwai, Davidlohr Bueso

On 03/17/2019 02:27 PM, Manfred Spraul wrote:
> Hi Waiman,
>
> On 2/28/19 7:47 PM, Waiman Long wrote:
>> For ipcmni_extend mode, the sequence number space is only 7 bits. So
>> the chance of id reuse is relatively high compared with the non-extended
>> mode.
>>
>> To alleviate this id reuse problem, the id allocation will be done
>> cyclically to cycle through all the 24-bit id space before wrapping
>> around when in ipcmni_extend mode. This may cause the use of more memory
>> in terms of the number of xa_nodes allocated as well as potentially more
>> cachelines used as the xa_nodes may be spread more sparsely in this
>> case.
>>
>> There is probably a slight memory and performance cost in doing cyclic
>> id allocation. For applications that really need more than 32k unique
>> IPC
>> identifiers, this is a small price to pay to avoid the id reuse problem.
>
> Have you measured it?
>
> I have observed -3% for semop() for a 4 level radix tree compared to a
> 1-level radix tree, and I'm a bit reluctant to accept that.
> Especially as the percentage will increase if the syscall overhead
> goes down again (-> less spectre impact).
>

It is both Spectre (retpoline) and Meltdown (PTI). PTI is not needed on
AMD CPUs, so you may see a slightly higher slowdown.

> [...]
>
>> --- a/ipc/util.c
>> +++ b/ipc/util.c
>> @@ -221,7 +221,12 @@ static inline int ipc_idr_alloc(struct ipc_ids
>> *ids, struct kern_ipc_perm *new)
>>        */
>>         if (next_id < 0) { /* !CHECKPOINT_RESTORE or next_id is unset */
>> -        idx = idr_alloc(&ids->ipcs_idr, new, 0, 0, GFP_NOWAIT);
>> +        if (ipc_mni_extended)
>> +            idx = idr_alloc_cyclic(&ids->ipcs_idr, new, 0, ipc_mni,
>> +                        GFP_NOWAIT);
>> +        else
>> +            idx = idr_alloc(&ids->ipcs_idr, new, 0, 0, GFP_NOWAIT);
>> +
>>           if ((idx <= ids->last_idx) && (++ids->seq > IPCID_SEQ_MAX))
>>               ids->seq = 0;
>
> I don't like it that there are two different codepaths.
>
> Attached is a different proposal:
>
> Always use cyclic allocation, with some logic to minimize the
> additional radix tree levels.
>
> What do you think?

Your proposed patch looks good. I saw that you use max_idx to limit the
radix tree nesting level, which mitigates my concern about higher memory
use and slower performance. I do have some minor comments about the patch
in a later email.

-Longman




* Re: [PATCH v12 3/3] ipc: Do cyclic id allocation with ipcmni_extend mode
  2019-03-18 18:37     ` Waiman Long
@ 2019-03-18 18:53       ` Waiman Long
  0 siblings, 0 replies; 12+ messages in thread
From: Waiman Long @ 2019-03-18 18:53 UTC (permalink / raw)
  To: Manfred Spraul, Luis R. Rodriguez, Kees Cook, Andrew Morton,
	Jonathan Corbet
  Cc: linux-kernel, linux-fsdevel, linux-doc, Al Viro, Matthew Wilcox,
	Eric W. Biederman, Takashi Iwai, Davidlohr Bueso

On 03/18/2019 02:37 PM, Waiman Long wrote:
> On 03/17/2019 02:27 PM, Manfred Spraul wrote:
>> Hi Waiman,
>>
>> On 2/28/19 7:47 PM, Waiman Long wrote:
>>> For ipcmni_extend mode, the sequence number space is only 7 bits. So
>>> the chance of id reuse is relatively high compared with the non-extended
>>> mode.
>>>
>>> To alleviate this id reuse problem, the id allocation will be done
>>> cyclically to cycle through all the 24-bit id space before wrapping
>>> around when in ipcmni_extend mode. This may cause the use of more memory
>>> in terms of the number of xa_nodes allocated as well as potentially more
>>> cachelines used as the xa_nodes may be spread more sparsely in this
>>> case.
>>>
>>> There is probably a slight memory and performance cost in doing cyclic
>>> id allocation. For applications that really need more than 32k unique
>>> IPC
>>> identifiers, this is a small price to pay to avoid the id reuse problem.
>> Have you measured it?
>>
>> I have observed -3% for semop() for a 4 level radix tree compared to a
>> 1-level radix tree, and I'm a bit reluctant to accept that.
>> Especially as the percentage will increase if the syscall overhead
>> goes down again (-> less spectre impact).
>>
> It is both Spectre (retpoline) and Meltdown (PTI). PTI is not needed on
> AMD CPUs, so you may see a slightly higher slowdown.

The use of idr_replace() in your previous patch may also slow down the
code path a bit, which would narrow the performance difference that you
saw. This is actually my main concern with using idr_replace() as
suggested by Matthew, but I am OK with using it if people think absolute
correctness is more important.

Cheers,
Longman



* Re: [PATCH v12 2/3] ipc: Conserve sequence numbers in ipcmni_extend mode
  2019-03-16 18:52   ` Manfred Spraul
@ 2019-03-18 18:57     ` Waiman Long
  2019-03-18 19:00     ` Waiman Long
  1 sibling, 0 replies; 12+ messages in thread
From: Waiman Long @ 2019-03-18 18:57 UTC (permalink / raw)
  To: Manfred Spraul, Luis R. Rodriguez, Kees Cook, Andrew Morton,
	Jonathan Corbet
  Cc: linux-kernel, linux-fsdevel, linux-doc, Al Viro, Matthew Wilcox,
	Eric W. Biederman, Takashi Iwai, Davidlohr Bueso, 1vier1

On 03/16/2019 02:52 PM, Manfred Spraul wrote:
> Hi,
>
> On 2/28/19 7:47 PM, Waiman Long wrote:
>> @@ -216,10 +221,11 @@ static inline int ipc_idr_alloc(struct ipc_ids
>> *ids, struct kern_ipc_perm *new)
>>        */
>>         if (next_id < 0) { /* !CHECKPOINT_RESTORE or next_id is unset */
>> -        new->seq = ids->seq++;
>> -        if (ids->seq > IPCID_SEQ_MAX)
>> -            ids->seq = 0;
>>           idx = idr_alloc(&ids->ipcs_idr, new, 0, 0, GFP_NOWAIT);
>> +        if ((idx <= ids->last_idx) && (++ids->seq > IPCID_SEQ_MAX))
>> +            ids->seq = 0;
>
> I'm always impressed by such lines:
>
> Everything in just two lines, use "++a", etc.
>
> But: How did you test it?
>
> idr_alloc() can fail, the code doesn't handle that :-(
>
>

You are right. I should have checked for the error case.

Thanks for spotting that.

Cheers,
Longman



* Re: [PATCH v12 2/3] ipc: Conserve sequence numbers in ipcmni_extend mode
  2019-03-16 18:52   ` Manfred Spraul
  2019-03-18 18:57     ` Waiman Long
@ 2019-03-18 19:00     ` Waiman Long
  1 sibling, 0 replies; 12+ messages in thread
From: Waiman Long @ 2019-03-18 19:00 UTC (permalink / raw)
  To: Manfred Spraul, Luis R. Rodriguez, Kees Cook, Andrew Morton,
	Jonathan Corbet
  Cc: linux-kernel, linux-fsdevel, linux-doc, Al Viro, Matthew Wilcox,
	Eric W. Biederman, Takashi Iwai, Davidlohr Bueso, 1vier1

On 03/16/2019 02:52 PM, Manfred Spraul wrote:
> From edee319b2d5c96af14b8b8899e5dde324861e4e4 Mon Sep 17 00:00:00 2001
> From: Manfred Spraul <manfred@colorfullife.com>
> Date: Sat, 16 Mar 2019 10:18:53 +0100
> Subject: [PATCH] ipc: Conserve sequence numbers in ipcmni_extend mode
>
> Rewrite, based on the patch from Waiman Long:
>
> The mixing in of a sequence number into the IPC IDs is probably to
> avoid ID reuse in userspace as much as possible. With ipcmni_extend
> mode, the number of usable sequence numbers is greatly reduced leading
> to higher chance of ID reuse.
>
> To address this issue, we need to conserve the sequence number space
> as much as possible. Right now, the sequence number is incremented for
> every new ID created. In reality, we only need to increment the sequence
> number when the newly allocated ID is not greater than the last one
> allocated. It is in such a case that the new ID may collide with an
> existing one. This is being done irrespective of the ipcmni mode.
>
> In order to avoid any races, the index is first allocated and
> then the pointer is replaced.
>
> Changes compared to the initial patch:
> - Handle failures from idr_alloc().
> - Avoid that concurrent operations can see the wrong
>   sequence number.
> (This is achieved by using idr_replace()).
> - IPCMNI_SEQ_SHIFT is not a constant, thus renamed to
> 	ipcmni_seq_shift().
> - IPCMNI_SEQ_MAX is not a constant, thus renamed to
> 	ipcmni_seq_max().
>
> Suggested-by: Matthew Wilcox <willy@infradead.org>
> Original-patch-from: Waiman Long <longman@redhat.com>
> Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
> ---
>  include/linux/ipc_namespace.h |  1 +
>  ipc/util.c                    | 35 ++++++++++++++++++++++++++++++-----
>  ipc/util.h                    |  8 ++++----
>  3 files changed, 35 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h
> index 6ab8c1bada3f..c309f43bde45 100644
> --- a/include/linux/ipc_namespace.h
> +++ b/include/linux/ipc_namespace.h
> @@ -19,6 +19,7 @@ struct ipc_ids {
>  	struct rw_semaphore rwsem;
>  	struct idr ipcs_idr;
>  	int max_idx;
> +	int last_idx;	/* For wrap around detection */
>  #ifdef CONFIG_CHECKPOINT_RESTORE
>  	int next_id;
>  #endif
> diff --git a/ipc/util.c b/ipc/util.c
> index 07ae117ccdc0..6e0fe3410423 100644
> --- a/ipc/util.c
> +++ b/ipc/util.c
> @@ -120,6 +120,7 @@ void ipc_init_ids(struct ipc_ids *ids)
>  	rhashtable_init(&ids->key_ht, &ipc_kht_params);
>  	idr_init(&ids->ipcs_idr);
>  	ids->max_idx = -1;
> +	ids->last_idx = -1;
>  #ifdef CONFIG_CHECKPOINT_RESTORE
>  	ids->next_id = -1;
>  #endif
> @@ -193,6 +194,10 @@ static struct kern_ipc_perm *ipc_findkey(struct ipc_ids *ids, key_t key)
>   *
>   * The caller must own kern_ipc_perm.lock.of the new object.
>   * On error, the function returns a (negative) error code.
> + *
> + * To conserve sequence number space, especially with extended ipc_mni,
> + * the sequence number is incremented only when the returned ID is less than
> + * the last one.
>   */
>  static inline int ipc_idr_alloc(struct ipc_ids *ids, struct kern_ipc_perm *new)
>  {
> @@ -216,17 +221,37 @@ static inline int ipc_idr_alloc(struct ipc_ids *ids, struct kern_ipc_perm *new)
>  	 */
>  
>  	if (next_id < 0) { /* !CHECKPOINT_RESTORE or next_id is unset */
> -		new->seq = ids->seq++;
> -		if (ids->seq > IPCID_SEQ_MAX)
> -			ids->seq = 0;
> -		idx = idr_alloc(&ids->ipcs_idr, new, 0, 0, GFP_NOWAIT);
> +
> +		/* allocate the idx, with a NULL struct kern_ipc_perm */
> +		idx = idr_alloc(&ids->ipcs_idr, NULL, 0, 0, GFP_NOWAIT);
> +
> +		if (idx >= 0) {
> +			/*
> +			 * idx got allocated successfully.
> +			 * Now calculate the sequence number and set the
> +			 * pointer for real.
> +			 */
> +			if (idx <= ids->last_idx) {
> +				ids->seq++;
> +				if (ids->seq >= ipcid_seq_max())
> +					ids->seq = 0;
> +			}
> +			ids->last_idx = idx;
> +
> +			new->seq = ids->seq;
> +			/* no need for smp_wmb(), this is done
> +			 * inside idr_replace, as part of
> +			 * rcu_assign_pointer
> +			 */
> +			idr_replace(&ids->ipcs_idr, new, idx);
> +		}
>  	} else {
>  		new->seq = ipcid_to_seqx(next_id);
>  		idx = idr_alloc(&ids->ipcs_idr, new, ipcid_to_idx(next_id),
>  				0, GFP_NOWAIT);
>  	}
>  	if (idx >= 0)
> -		new->id = (new->seq << IPCMNI_SEQ_SHIFT) + idx;
> +		new->id = (new->seq << ipcmni_seq_shift()) + idx;
>  	return idx;
>  }
>  
> diff --git a/ipc/util.h b/ipc/util.h
> index 9746886757de..8c834ed39012 100644
> --- a/ipc/util.h
> +++ b/ipc/util.h
> @@ -34,13 +34,13 @@
>  extern int ipc_mni;
>  extern int ipc_mni_shift;
>  
> -#define IPCMNI_SEQ_SHIFT	ipc_mni_shift
> +#define ipcmni_seq_shift()	ipc_mni_shift
>  #define IPCMNI_IDX_MASK		((1 << ipc_mni_shift) - 1)
>  
>  #else /* CONFIG_SYSVIPC_SYSCTL */
>  
>  #define ipc_mni			IPCMNI
> -#define IPCMNI_SEQ_SHIFT	IPCMNI_SHIFT
> +#define ipcmni_seq_shift()	IPCMNI_SHIFT
>  #define IPCMNI_IDX_MASK		((1 << IPCMNI_SHIFT) - 1)
>  #endif /* CONFIG_SYSVIPC_SYSCTL */
>  
> @@ -123,8 +123,8 @@ struct pid_namespace *ipc_seq_pid_ns(struct seq_file *);
>  #define IPC_SHM_IDS	2
>  
>  #define ipcid_to_idx(id)  ((id) & IPCMNI_IDX_MASK)
> -#define ipcid_to_seqx(id) ((id) >> IPCMNI_SEQ_SHIFT)
> -#define IPCID_SEQ_MAX	  (INT_MAX >> IPCMNI_SEQ_SHIFT)
> +#define ipcid_to_seqx(id) ((id) >> ipcmni_seq_shift())
> +#define ipcid_seq_max()	  (INT_MAX >> ipcmni_seq_shift())
>  
>  /* must be called with ids->rwsem acquired for writing */
>  int ipc_addid(struct ipc_ids *, struct kern_ipc_perm *, int);

Acked-by: Waiman Long <longman@redhat.com>

I am fine with this patch replacing mine.

Cheers,
Longman



* Re: [PATCH v12 3/3] ipc: Do cyclic id allocation with ipcmni_extend mode
       [not found]     ` <728b5e85-3129-9707-3802-306f66093c78@redhat.com>
@ 2019-03-19 18:18       ` Manfred Spraul
  2019-03-19 18:46         ` Waiman Long
  0 siblings, 1 reply; 12+ messages in thread
From: Manfred Spraul @ 2019-03-19 18:18 UTC (permalink / raw)
  To: Waiman Long, Luis R. Rodriguez, Kees Cook, Andrew Morton,
	Jonathan Corbet
  Cc: linux-kernel, linux-fsdevel, linux-doc, Al Viro, Matthew Wilcox,
	Eric W. Biederman, Takashi Iwai, Davidlohr Bueso

[-- Attachment #1: Type: text/plain, Size: 1373 bytes --]

Hi Waiman,


On 3/18/19 7:46 PM, Waiman Long wrote:
> --- a/ipc/util.c
>> +++ b/ipc/util.c
>> @@ -221,9 +221,17 @@ static inline int ipc_idr_alloc(struct ipc_ids *ids, struct kern_ipc_perm *new)
>>   	 */
>>   
>>   	if (next_id < 0) { /* !CHECKPOINT_RESTORE or next_id is unset */
>> +		int max_idx;
>> +
>> +		max_idx = ids->in_use*3/2;
>> +		if (max_idx > ipc_mni)
>> +			max_idx = ipc_mni;
>> +		if (max_idx < ipc_min_cycle)
>> +			max_idx = ipc_min_cycle;
>
> Why don't you use the min() and max() macros which will make it easier 
> to read?
>
Changed.
>>   
>>   		/* allocate the idx, with a NULL struct kern_ipc_perm */
>> -		idx = idr_alloc(&ids->ipcs_idr, NULL, 0, 0, GFP_NOWAIT);
>> +		idx = idr_alloc_cyclic(&ids->ipcs_idr, NULL, 0, max_idx,
>> +					GFP_NOWAIT);
>>   
>>   		if (idx >= 0) {
>>   			/*
>> diff --git a/ipc/util.h b/ipc/util.h
>> index 8c834ed39012..ef4e86bb2db8 100644
>> --- a/ipc/util.h
>> +++ b/ipc/util.h
>> @@ -27,12 +27,14 @@
>>    */
>>   #define IPCMNI_SHIFT		15
>>   #define IPCMNI_EXTEND_SHIFT	24
>> +#define IPCMNI_EXTEND_MIN_CYCLE	(2 << 12)
>
> How about
>
> #define IPCMNI_EXTEND_MIN_CYCLE    (RADIX_TREE_MAP_SIZE * 
> RADIX_TREE_MAP_SIZE)
>
Good idea.
Actually, "2<<12" was the initial guess.

And then I noticed that this ends up as a two-level radix tree during
testing :-)


Updated patch attached.

--

     Manfred


[-- Attachment #2: 0002-ipc-Do-cyclic-id-allocation-for-the-ipc-object.patch --]
[-- Type: text/x-patch, Size: 4890 bytes --]

From 844c9d78cea41983a89c820bd5265ceded59883b Mon Sep 17 00:00:00 2001
From: Manfred Spraul <manfred@colorfullife.com>
Date: Sun, 17 Mar 2019 06:29:00 +0100
Subject: [PATCH 2/2] ipc: Do cyclic id allocation for the ipc object.

For ipcmni_extend mode, the sequence number space is only 7 bits. So
the chance of id reuse is relatively high compared with the non-extended
mode.

To alleviate this id reuse problem, this patch enables cyclic allocation
for the index to the radix tree (idx).
The disadvantage is that this can cause a slight slow-down of the fast
path, as the radix tree could be higher than necessary.

To limit the radix tree height, I have chosen the following limits:
- 1) The cycling is done over in_use*1.5.
- 2) At a minimum, the cycling is done over:
   "normal" ipcmni mode: RADIX_TREE_MAP_SIZE elements
   "ipcmni_extended": 4096 elements

Result:
- for normal mode:
	No change for <= 42 active ipc elements. With more than 42
	active ipc elements, a 2nd level would be added to the radix
	tree.
	Without cyclic allocation, a 2nd level would be added only with
	more than 63 active elements.

- for extended mode:
	Cycling always creates at least a 2-level radix tree.
	With more than 2730 active objects, a 3rd level would be
	added; without cyclic allocation, the 3rd level is added
	only with more than 4095 active objects.

For a 2-level radix tree compared to a 1-level radix tree, I have
observed < 1% performance impact.

Notes:
1) Normal "x=semget();y=semget();" is unaffected: Then the idx
  is e.g. a and a+1, regardless of whether idr_alloc() or
  idr_alloc_cyclic() is used.

2) The -1% happens in a microbenchmark after this situation:
	x=semget();
	for(i=0;i<4000;i++) {t=semget();semctl(t,0,IPC_RMID);}
	y=semget();
	Now perform semget calls on x and y that do not sleep.

3) The worst-case reuse cycle time is unfortunately unaffected:
   If you have 2^24-1 ipc objects allocated, and get/remove the last
   possible element in a loop, then the id is reused after 128
   get/remove pairs.

Performance check:
A microbenchmark that performs no-op semop() randomly on two IDs,
with only these two IDs allocated.
The IDs were set using /proc/sys/kernel/sem_next_id.
The test was run 5 times, averages are shown.

1 & 2: Base (6.22 seconds for 10.000.000 semops)
1 & 40: -0.2%
1 & 3348: - 0.8%
1 & 27348: - 1.6%
1 & 15777204: - 3.2%

Or: ~12.6 cpu cycles per additional radix tree level.
The cpu is an Intel I3-5010U. ~1300 cpu cycles/syscall is slower
than what I remember (spectre impact?).

V2 of the patch:
- use "min" and "max"
- use RADIX_TREE_MAP_SIZE * RADIX_TREE_MAP_SIZE instead of
	(2<<12).

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
---
 ipc/ipc_sysctl.c | 2 ++
 ipc/util.c       | 7 ++++++-
 ipc/util.h       | 3 +++
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
index 73b7782eccf4..bfaae457810c 100644
--- a/ipc/ipc_sysctl.c
+++ b/ipc/ipc_sysctl.c
@@ -122,6 +122,7 @@ static int one = 1;
 static int int_max = INT_MAX;
 int ipc_mni = IPCMNI;
 int ipc_mni_shift = IPCMNI_SHIFT;
+int ipc_min_cycle = RADIX_TREE_MAP_SIZE;
 
 static struct ctl_table ipc_kern_table[] = {
 	{
@@ -252,6 +253,7 @@ static int __init ipc_mni_extend(char *str)
 {
 	ipc_mni = IPCMNI_EXTEND;
 	ipc_mni_shift = IPCMNI_EXTEND_SHIFT;
+	ipc_min_cycle = IPCMNI_EXTEND_MIN_CYCLE;
 	pr_info("IPCMNI extended to %d.\n", ipc_mni);
 	return 0;
 }
diff --git a/ipc/util.c b/ipc/util.c
index 6e0fe3410423..1a492afb1d8b 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -221,9 +221,14 @@ static inline int ipc_idr_alloc(struct ipc_ids *ids, struct kern_ipc_perm *new)
 	 */
 
 	if (next_id < 0) { /* !CHECKPOINT_RESTORE or next_id is unset */
+		int max_idx;
+
+		max_idx = max(ids->in_use*3/2, ipc_min_cycle);
+		max_idx = min(max_idx, ipc_mni);
 
 		/* allocate the idx, with a NULL struct kern_ipc_perm */
-		idx = idr_alloc(&ids->ipcs_idr, NULL, 0, 0, GFP_NOWAIT);
+		idx = idr_alloc_cyclic(&ids->ipcs_idr, NULL, 0, max_idx,
+					GFP_NOWAIT);
 
 		if (idx >= 0) {
 			/*
diff --git a/ipc/util.h b/ipc/util.h
index 8c834ed39012..d316399f0c32 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -27,12 +27,14 @@
  */
 #define IPCMNI_SHIFT		15
 #define IPCMNI_EXTEND_SHIFT	24
+#define IPCMNI_EXTEND_MIN_CYCLE	(RADIX_TREE_MAP_SIZE * RADIX_TREE_MAP_SIZE)
 #define IPCMNI			(1 << IPCMNI_SHIFT)
 #define IPCMNI_EXTEND		(1 << IPCMNI_EXTEND_SHIFT)
 
 #ifdef CONFIG_SYSVIPC_SYSCTL
 extern int ipc_mni;
 extern int ipc_mni_shift;
+extern int ipc_min_cycle;
 
 #define ipcmni_seq_shift()	ipc_mni_shift
 #define IPCMNI_IDX_MASK		((1 << ipc_mni_shift) - 1)
@@ -40,6 +42,7 @@ extern int ipc_mni_shift;
 #else /* CONFIG_SYSVIPC_SYSCTL */
 
 #define ipc_mni			IPCMNI
+#define ipc_min_cycle		RADIX_TREE_MAP_SIZE
 #define ipcmni_seq_shift()	IPCMNI_SHIFT
 #define IPCMNI_IDX_MASK		((1 << IPCMNI_SHIFT) - 1)
 #endif /* CONFIG_SYSVIPC_SYSCTL */
-- 
2.17.2



* Re: [PATCH v12 3/3] ipc: Do cyclic id allocation with ipcmni_extend mode
  2019-03-19 18:18       ` Manfred Spraul
@ 2019-03-19 18:46         ` Waiman Long
  0 siblings, 0 replies; 12+ messages in thread
From: Waiman Long @ 2019-03-19 18:46 UTC (permalink / raw)
  To: Manfred Spraul, Luis R. Rodriguez, Kees Cook, Andrew Morton,
	Jonathan Corbet
  Cc: linux-kernel, linux-fsdevel, linux-doc, Al Viro, Matthew Wilcox,
	Eric W. Biederman, Takashi Iwai, Davidlohr Bueso

On 03/19/2019 02:18 PM, Manfred Spraul wrote:
> From 844c9d78cea41983a89c820bd5265ceded59883b Mon Sep 17 00:00:00 2001
> From: Manfred Spraul <manfred@colorfullife.com>
> Date: Sun, 17 Mar 2019 06:29:00 +0100
> Subject: [PATCH 2/2] ipc: Do cyclic id allocation for the ipc object.
>
> For ipcmni_extend mode, the sequence number space is only 7 bits. So
> the chance of id reuse is relatively high compared with the non-extended
> mode.
>
> To alleviate this id reuse problem, this patch enables cyclic allocation
> for the index to the radix tree (idx).
> The disadvantage is that this can cause a slight slow-down of the fast
> path, as the radix tree could be higher than necessary.
>
> To limit the radix tree height, I have chosen the following limits:
> - 1) The cycling is done over in_use*1.5.
> - 2) At a minimum, the cycling is done over:
>    "normal" ipcmni mode: RADIX_TREE_MAP_SIZE elements
>    "ipcmni_extended": 4096 elements
>
> Result:
> - for normal mode:
> 	No change for <= 42 active ipc elements. With more than 42
> 	active ipc elements, a 2nd level would be added to the radix
> 	tree.
> 	Without cyclic allocation, a 2nd level would be added only with
> 	more than 63 active elements.
>
> - for extended mode:
> 	Cycling always creates at least a 2-level radix tree.
> 	With more than 2730 active objects, a 3rd level would be
> 	added; without cyclic allocation, the 3rd level is added
> 	only with more than 4095 active objects.
>
> For a 2-level radix tree compared to a 1-level radix tree, I have
> observed < 1% performance impact.
>
> Notes:
> 1) Normal "x=semget();y=semget();" is unaffected: Then the idx
>   is e.g. a and a+1, regardless of whether idr_alloc() or
>   idr_alloc_cyclic() is used.
>
> 2) The -1% happens in a microbenchmark after this situation:
> 	x=semget();
> 	for(i=0;i<4000;i++) {t=semget();semctl(t,0,IPC_RMID);}
> 	y=semget();
> 	Now perform semget calls on x and y that do not sleep.
>
> 3) The worst-case reuse cycle time is unfortunately unaffected:
>    If you have 2^24-1 ipc objects allocated, and get/remove the last
>    possible element in a loop, then the id is reused after 128
>    get/remove pairs.
>
> Performance check:
> A microbenchmark that performs no-op semop() randomly on two IDs,
> with only these two IDs allocated.
> The IDs were set using /proc/sys/kernel/sem_next_id.
> The test was run 5 times, averages are shown.
>
> 1 & 2: Base (6.22 seconds for 10.000.000 semops)
> 1 & 40: -0.2%
> 1 & 3348: - 0.8%
> 1 & 27348: - 1.6%
> 1 & 15777204: - 3.2%
>
> Or: ~12.6 cpu cycles per additional radix tree level.
> The cpu is an Intel I3-5010U. ~1300 cpu cycles/syscall is slower
> than what I remember (spectre impact?).
>
> V2 of the patch:
> - use "min" and "max"
> - use RADIX_TREE_MAP_SIZE * RADIX_TREE_MAP_SIZE instead of
> 	(2<<12).
>
> Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
> ---
>  ipc/ipc_sysctl.c | 2 ++
>  ipc/util.c       | 7 ++++++-
>  ipc/util.h       | 3 +++
>  3 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
> index 73b7782eccf4..bfaae457810c 100644
> --- a/ipc/ipc_sysctl.c
> +++ b/ipc/ipc_sysctl.c
> @@ -122,6 +122,7 @@ static int one = 1;
>  static int int_max = INT_MAX;
>  int ipc_mni = IPCMNI;
>  int ipc_mni_shift = IPCMNI_SHIFT;
> +int ipc_min_cycle = RADIX_TREE_MAP_SIZE;
>  
>  static struct ctl_table ipc_kern_table[] = {
>  	{
> @@ -252,6 +253,7 @@ static int __init ipc_mni_extend(char *str)
>  {
>  	ipc_mni = IPCMNI_EXTEND;
>  	ipc_mni_shift = IPCMNI_EXTEND_SHIFT;
> +	ipc_min_cycle = IPCMNI_EXTEND_MIN_CYCLE;
>  	pr_info("IPCMNI extended to %d.\n", ipc_mni);
>  	return 0;
>  }
> diff --git a/ipc/util.c b/ipc/util.c
> index 6e0fe3410423..1a492afb1d8b 100644
> --- a/ipc/util.c
> +++ b/ipc/util.c
> @@ -221,9 +221,14 @@ static inline int ipc_idr_alloc(struct ipc_ids *ids, struct kern_ipc_perm *new)
>  	 */
>  
>  	if (next_id < 0) { /* !CHECKPOINT_RESTORE or next_id is unset */
> +		int max_idx;
> +
> +		max_idx = max(ids->in_use*3/2, ipc_min_cycle);
> +		max_idx = min(max_idx, ipc_mni);
>  
>  		/* allocate the idx, with a NULL struct kern_ipc_perm */
> -		idx = idr_alloc(&ids->ipcs_idr, NULL, 0, 0, GFP_NOWAIT);
> +		idx = idr_alloc_cyclic(&ids->ipcs_idr, NULL, 0, max_idx,
> +					GFP_NOWAIT);
>  
>  		if (idx >= 0) {
>  			/*
> diff --git a/ipc/util.h b/ipc/util.h
> index 8c834ed39012..d316399f0c32 100644
> --- a/ipc/util.h
> +++ b/ipc/util.h
> @@ -27,12 +27,14 @@
>   */
>  #define IPCMNI_SHIFT		15
>  #define IPCMNI_EXTEND_SHIFT	24
> +#define IPCMNI_EXTEND_MIN_CYCLE	(RADIX_TREE_MAP_SIZE * RADIX_TREE_MAP_SIZE)
>  #define IPCMNI			(1 << IPCMNI_SHIFT)
>  #define IPCMNI_EXTEND		(1 << IPCMNI_EXTEND_SHIFT)
>  
>  #ifdef CONFIG_SYSVIPC_SYSCTL
>  extern int ipc_mni;
>  extern int ipc_mni_shift;
> +extern int ipc_min_cycle;
>  
>  #define ipcmni_seq_shift()	ipc_mni_shift
>  #define IPCMNI_IDX_MASK		((1 << ipc_mni_shift) - 1)
> @@ -40,6 +42,7 @@ extern int ipc_mni_shift;
>  #else /* CONFIG_SYSVIPC_SYSCTL */
>  
>  #define ipc_mni			IPCMNI
> +#define ipc_min_cycle		RADIX_TREE_MAP_SIZE
>  #define ipcmni_seq_shift()	IPCMNI_SHIFT
>  #define IPCMNI_IDX_MASK		((1 << IPCMNI_SHIFT) - 1)
>  #endif /* CONFIG_SYSVIPC_SYSCTL */
> --
> 2.17.2

Acked-by: Waiman Long <longman@redhat.com>



end of thread, other threads:[~2019-03-19 18:46 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-28 18:47 [PATCH v12 0/3] ipc: Increase IPCMNI limit Waiman Long
2019-02-28 18:47 ` [PATCH v12 1/3] ipc: Allow boot time extension of IPCMNI from 32k to 16M Waiman Long
2019-02-28 18:47 ` [PATCH v12 2/3] ipc: Conserve sequence numbers in ipcmni_extend mode Waiman Long
2019-03-16 18:52   ` Manfred Spraul
2019-03-18 18:57     ` Waiman Long
2019-03-18 19:00     ` Waiman Long
2019-02-28 18:47 ` [PATCH v12 3/3] ipc: Do cyclic id allocation with " Waiman Long
2019-03-17 18:27   ` Manfred Spraul
2019-03-18 18:37     ` Waiman Long
2019-03-18 18:53       ` Waiman Long
     [not found]     ` <728b5e85-3129-9707-3802-306f66093c78@redhat.com>
2019-03-19 18:18       ` Manfred Spraul
2019-03-19 18:46         ` Waiman Long
