linux-kernel.vger.kernel.org archive mirror
* [PATCH v11 0/3] ipc: Increase IPCMNI limit & IPC id generation modes
@ 2018-11-09 20:11 Waiman Long
  2018-11-09 20:11 ` [PATCH v11 1/3] ipc: Allow boot time extension of IPCMNI from 32k to 16M Waiman Long
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Waiman Long @ 2018-11-09 20:11 UTC
  To: Luis R. Rodriguez, Kees Cook, Andrew Morton, Jonathan Corbet
  Cc: linux-kernel, linux-fsdevel, linux-doc, Al Viro, Matthew Wilcox,
	Eric W. Biederman, Takashi Iwai, Davidlohr Bueso, Manfred Spraul,
	Waiman Long

v10->v11:
 - Remove the sysctl parameter. Now delete mode is the only way the
   sequence number is updated. The only choice users have to make is
   whether to specify ipcmni_extend on the boot command line or not.
 - Cyclical id allocation is enabled only in the ipcmni_extend mode.
 - Increase max # of ids in ipcmni_extend mode to 16M.

v8  patch: https://lkml.org/lkml/2018/6/18/706
v9  patch: https://lkml.org/lkml/2018/9/7/1141
v10 patch: https://lkml.org/lkml/2018/11/5/791

There are users out there requesting an increase in the IPCMNI value to
more than 32k. This patchset does that with a boot kernel parameter,
"ipcmni_extend", which raises the IPCMNI limit from 32k to 16M when it
is specified on the boot command line.

Patch 1 adds an "ipcmni_extend" boot command line parameter to extend
the IPCMNI limit from 32k to 16M.

Patch 2 changes how the sequence number within an id is generated: it
is incremented only when one or more ids have been deleted previously,
which reduces the chance of id reuse whether "ipcmni_extend" is set or
not.

Patch 3 makes identifier allocation cycle through the entire 24-bit id
space, in "ipcmni_extend" mode only, to further reduce the chance of id
reuse, probably at the cost of a slight memory and performance overhead.

The cyclical id allocation isn't done for non-ipcmni_extend mode as the
potential memory and performance overhead may be problematic on systems
with slow CPUs and little memory. Systems that run applications which need
more than 32k IPC identifiers can certainly afford the extra overhead.
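
As a rough illustration of the id layout this series introduces, the
userspace sketch below packs a sequence number and an index into one id
and splits it again. The shift values 15 and 24 come from patch 1; the
helper names are made up for this example and are not kernel functions.

```c
#include <limits.h>

/* Shift values taken from patch 1; everything else here is hypothetical. */
#define IPCMNI_SHIFT		15
#define IPCMNI_EXTEND_SHIFT	24

/* Build a user-visible id: sequence number in the high bits, index below. */
static inline int ipc_make_id(int seq, int idx, int shift)
{
	return (seq << shift) + idx;
}

/* Recover the index (low bits) from an id. */
static inline int ipc_id_to_idx(int id, int shift)
{
	return id & ((1 << shift) - 1);
}

/* Recover the sequence number (high bits) from an id. */
static inline int ipc_id_to_seq(int id, int shift)
{
	return id >> shift;
}

/* Largest sequence number that still keeps the id a non-negative int. */
static inline int ipc_seq_max(int shift)
{
	return INT_MAX >> shift;
}
```

With a shift of 24 the index takes 24 bits and only 7 bits are left for
the sequence number, which is the trade-off described for patch 1.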

Waiman Long (3):
  ipc: Allow boot time extension of IPCMNI from 32k to 16M
  ipc: Conserve sequence numbers in ipcmni_extend mode
  ipc: Do cyclic id allocation with ipcmni_extend mode

 Documentation/admin-guide/kernel-parameters.txt |  6 ++++
 include/linux/ipc_namespace.h                   |  1 +
 ipc/ipc_sysctl.c                                | 14 +++++++-
 ipc/util.c                                      | 32 ++++++++++++-----
 ipc/util.h                                      | 46 ++++++++++++++++++++-----
 5 files changed, 80 insertions(+), 19 deletions(-)

-- 
1.8.3.1



* [PATCH v11 1/3] ipc: Allow boot time extension of IPCMNI from 32k to 16M
  2018-11-09 20:11 [PATCH v11 0/3] ipc: Increase IPCMNI limit & IPC id generation modes Waiman Long
@ 2018-11-09 20:11 ` Waiman Long
  2018-11-20 19:45   ` Manfred Spraul
  2018-11-09 20:11 ` [PATCH v11 2/3] ipc: Conserve sequence numbers in ipcmni_extend mode Waiman Long
  2018-11-09 20:11 ` [PATCH v11 3/3] ipc: Do cyclic id allocation with " Waiman Long
  2 siblings, 1 reply; 9+ messages in thread
From: Waiman Long @ 2018-11-09 20:11 UTC
  To: Luis R. Rodriguez, Kees Cook, Andrew Morton, Jonathan Corbet
  Cc: linux-kernel, linux-fsdevel, linux-doc, Al Viro, Matthew Wilcox,
	Eric W. Biederman, Takashi Iwai, Davidlohr Bueso, Manfred Spraul,
	Waiman Long

The maximum number of unique System V IPC identifiers was limited to
32k.  That limit should be big enough for most use cases.

However, there are some users out there requesting more, especially
those that are migrating from Solaris which uses 24 bits for unique
identifiers. To satisfy the need of those users, a new boot time kernel
option "ipcmni_extend" is added to extend the IPCMNI value to 16M. This
is a 512X increase which should be big enough for users out there that
need a large number of unique IPC identifiers.

The use of this new option will change the pattern of the IPC identifiers
returned by functions like shmget(2). An application that depends on
such a pattern may not work properly.  So it should only be used if the
users really need more than 32k unique IPC identifiers.

This new option does have the side effect of reducing the maximum number
of unique sequence numbers from 64k down to 128. So it is a trade-off.

The computation of a new IPC id is not done in the performance critical
path.  So a little bit of additional overhead shouldn't have any real
performance impact.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 Documentation/admin-guide/kernel-parameters.txt |  3 ++
 ipc/ipc_sysctl.c                                | 12 ++++++-
 ipc/util.c                                      | 10 +++---
 ipc/util.h                                      | 44 ++++++++++++++++++++-----
 4 files changed, 54 insertions(+), 15 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 81d1d5a..93d1454 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1799,6 +1799,9 @@
 	ip=		[IP_PNP]
 			See Documentation/filesystems/nfs/nfsroot.txt.
 
+	ipcmni_extend	[KNL] Extend the maximum number of unique System V
+			IPC identifiers from 32,768 to 16,777,216.
+
 	irqaffinity=	[SMP] Set the default irq affinity mask
 			The argument is a cpu list, as described above.
 
diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
index 49f9bf4..73b7782 100644
--- a/ipc/ipc_sysctl.c
+++ b/ipc/ipc_sysctl.c
@@ -120,7 +120,8 @@ static int proc_ipc_sem_dointvec(struct ctl_table *table, int write,
 static int zero;
 static int one = 1;
 static int int_max = INT_MAX;
-static int ipc_mni = IPCMNI;
+int ipc_mni = IPCMNI;
+int ipc_mni_shift = IPCMNI_SHIFT;
 
 static struct ctl_table ipc_kern_table[] = {
 	{
@@ -246,3 +247,12 @@ static int __init ipc_sysctl_init(void)
 }
 
 device_initcall(ipc_sysctl_init);
+
+static int __init ipc_mni_extend(char *str)
+{
+	ipc_mni = IPCMNI_EXTEND;
+	ipc_mni_shift = IPCMNI_EXTEND_SHIFT;
+	pr_info("IPCMNI extended to %d.\n", ipc_mni);
+	return 0;
+}
+early_param("ipcmni_extend", ipc_mni_extend);
diff --git a/ipc/util.c b/ipc/util.c
index 0af0575..07ae117 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -110,7 +110,7 @@ static int __init ipc_init(void)
  * @ids: ipc identifier set
  *
  * Set up the sequence range to use for the ipc identifier range (limited
- * below IPCMNI) then initialise the keys hashtable and ids idr.
+ * below ipc_mni) then initialise the keys hashtable and ids idr.
  */
 void ipc_init_ids(struct ipc_ids *ids)
 {
@@ -226,7 +226,7 @@ static inline int ipc_idr_alloc(struct ipc_ids *ids, struct kern_ipc_perm *new)
 				0, GFP_NOWAIT);
 	}
 	if (idx >= 0)
-		new->id = SEQ_MULTIPLIER * new->seq + idx;
+		new->id = (new->seq << IPCMNI_SEQ_SHIFT) + idx;
 	return idx;
 }
 
@@ -254,8 +254,8 @@ int ipc_addid(struct ipc_ids *ids, struct kern_ipc_perm *new, int limit)
 	/* 1) Initialize the refcount so that ipc_rcu_putref works */
 	refcount_set(&new->refcount, 1);
 
-	if (limit > IPCMNI)
-		limit = IPCMNI;
+	if (limit > ipc_mni)
+		limit = ipc_mni;
 
 	if (ids->in_use >= limit)
 		return -ENOSPC;
@@ -738,7 +738,7 @@ static struct kern_ipc_perm *sysvipc_find_ipc(struct ipc_ids *ids, loff_t pos,
 	if (total >= ids->in_use)
 		return NULL;
 
-	for (; pos < IPCMNI; pos++) {
+	for (; pos < ipc_mni; pos++) {
 		ipc = idr_find(&ids->ipcs_idr, pos);
 		if (ipc != NULL) {
 			*new_pos = pos + 1;
diff --git a/ipc/util.h b/ipc/util.h
index d768fdb..6a88d51 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -15,8 +15,34 @@
 #include <linux/err.h>
 #include <linux/ipc_namespace.h>
 
-#define IPCMNI 32768  /* <= MAX_INT limit for ipc arrays (including sysctl changes) */
-#define SEQ_MULTIPLIER	(IPCMNI)
+/*
+ * The IPC ID contains 2 separate numbers - index and sequence number.
+ * By default,
+ *   bits  0-14: index (32k, 15 bits)
+ *   bits 15-30: sequence number (64k, 16 bits)
+ *
+ * When IPCMNI extension mode is turned on, the composition changes:
+ *   bits  0-23: index (16M, 24 bits)
+ *   bits 24-30: sequence number (128, 7 bits)
+ */
+#define IPCMNI_SHIFT		15
+#define IPCMNI_EXTEND_SHIFT	24
+#define IPCMNI			(1 << IPCMNI_SHIFT)
+#define IPCMNI_EXTEND		(1 << IPCMNI_EXTEND_SHIFT)
+
+#ifdef CONFIG_SYSVIPC_SYSCTL
+extern int ipc_mni;
+extern int ipc_mni_shift;
+
+#define IPCMNI_SEQ_SHIFT	ipc_mni_shift
+#define IPCMNI_IDX_MASK		((1 << ipc_mni_shift) - 1)
+
+#else /* CONFIG_SYSVIPC_SYSCTL */
+
+#define ipc_mni			IPCMNI
+#define IPCMNI_SEQ_SHIFT	IPCMNI_SHIFT
+#define IPCMNI_IDX_MASK		((1 << IPCMNI_SHIFT) - 1)
+#endif /* CONFIG_SYSVIPC_SYSCTL */
 
 void sem_init(void);
 void msg_init(void);
@@ -96,9 +122,9 @@ void __init ipc_init_proc_interface(const char *path, const char *header,
 #define IPC_MSG_IDS	1
 #define IPC_SHM_IDS	2
 
-#define ipcid_to_idx(id) ((id) % SEQ_MULTIPLIER)
-#define ipcid_to_seqx(id) ((id) / SEQ_MULTIPLIER)
-#define IPCID_SEQ_MAX min_t(int, INT_MAX/SEQ_MULTIPLIER, USHRT_MAX)
+#define ipcid_to_idx(id)  ((id) & IPCMNI_IDX_MASK)
+#define ipcid_to_seqx(id) ((id) >> IPCMNI_SEQ_SHIFT)
+#define IPCID_SEQ_MAX	  (INT_MAX >> IPCMNI_SEQ_SHIFT)
 
 /* must be called with ids->rwsem acquired for writing */
 int ipc_addid(struct ipc_ids *, struct kern_ipc_perm *, int);
@@ -123,8 +149,8 @@ static inline int ipc_get_maxidx(struct ipc_ids *ids)
 	if (ids->in_use == 0)
 		return -1;
 
-	if (ids->in_use == IPCMNI)
-		return IPCMNI - 1;
+	if (ids->in_use == ipc_mni)
+		return ipc_mni - 1;
 
 	return ids->max_idx;
 }
@@ -219,10 +245,10 @@ void free_ipcs(struct ipc_namespace *ns, struct ipc_ids *ids,
 
 static inline int sem_check_semmni(struct ipc_namespace *ns) {
 	/*
-	 * Check semmni range [0, IPCMNI]
+	 * Check semmni range [0, ipc_mni]
 	 * semmni is the last element of sem_ctls[4] array
 	 */
-	return ((ns->sem_ctls[3] < 0) || (ns->sem_ctls[3] > IPCMNI))
+	return ((ns->sem_ctls[3] < 0) || (ns->sem_ctls[3] > ipc_mni))
 		? -ERANGE : 0;
 }
 
-- 
1.8.3.1



* [PATCH v11 2/3] ipc: Conserve sequence numbers in ipcmni_extend mode
  2018-11-09 20:11 [PATCH v11 0/3] ipc: Increase IPCMNI limit & IPC id generation modes Waiman Long
  2018-11-09 20:11 ` [PATCH v11 1/3] ipc: Allow boot time extension of IPCMNI from 32k to 16M Waiman Long
@ 2018-11-09 20:11 ` Waiman Long
  2018-11-10  7:41   ` Matthew Wilcox
  2018-11-09 20:11 ` [PATCH v11 3/3] ipc: Do cyclic id allocation with " Waiman Long
  2 siblings, 1 reply; 9+ messages in thread
From: Waiman Long @ 2018-11-09 20:11 UTC
  To: Luis R. Rodriguez, Kees Cook, Andrew Morton, Jonathan Corbet
  Cc: linux-kernel, linux-fsdevel, linux-doc, Al Viro, Matthew Wilcox,
	Eric W. Biederman, Takashi Iwai, Davidlohr Bueso, Manfred Spraul,
	Waiman Long

The mixing in of a sequence number into the IPC IDs is probably to
avoid ID reuse in userspace as much as possible. With ipcmni_extend
mode, the number of usable sequence numbers is greatly reduced, leading
to a higher chance of ID reuse.

To address this issue, we need to conserve the sequence number space
as much as possible. Right now, the sequence number is incremented
for every new ID created. In reality, we only need to increment the
sequence number when one or more IDs have been removed previously to
make sure that those IDs will not be reused when a new one is built.
This is being done irrespective of the ipcmni mode.
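
The rule above can be modelled in a few lines of userspace C. This is
only a sketch: struct ids_model and the helper names are hypothetical,
and SEQ_MAX assumes the 7-bit sequence space of ipcmni_extend mode.

```c
#include <stdbool.h>

#define SEQ_MAX 127	/* assumed: 7-bit sequence space */

/* Hypothetical stand-in for the two fields of struct ipc_ids used here. */
struct ids_model {
	unsigned short seq;	/* current sequence number */
	bool deleted;		/* an id was removed since the last bump */
};

/* Record that an id has been removed. */
static inline void model_rmid(struct ids_model *ids)
{
	ids->deleted = true;
}

/* Return the sequence number to mix into the next new id. */
static inline unsigned short model_next_seq(struct ids_model *ids)
{
	if (ids->deleted) {
		/* Bump once, no matter how many ids were removed. */
		ids->seq++;
		if (ids->seq > SEQ_MAX)
			ids->seq = 0;
		ids->deleted = false;
	}
	return ids->seq;
}
```

Creating many ids in a row without any removal keeps reusing the same
sequence number, so the small sequence space is consumed more slowly.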

Signed-off-by: Waiman Long <longman@redhat.com>
---
 include/linux/ipc_namespace.h |  1 +
 ipc/util.c                    | 16 +++++++++++++---
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h
index 6ab8c1b..7d5f553 100644
--- a/include/linux/ipc_namespace.h
+++ b/include/linux/ipc_namespace.h
@@ -16,6 +16,7 @@
 struct ipc_ids {
 	int in_use;
 	unsigned short seq;
+	unsigned short deleted;
 	struct rw_semaphore rwsem;
 	struct idr ipcs_idr;
 	int max_idx;
diff --git a/ipc/util.c b/ipc/util.c
index 07ae117..00000a1 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -115,6 +115,7 @@ static int __init ipc_init(void)
 void ipc_init_ids(struct ipc_ids *ids)
 {
 	ids->in_use = 0;
+	ids->deleted = false;
 	ids->seq = 0;
 	init_rwsem(&ids->rwsem);
 	rhashtable_init(&ids->key_ht, &ipc_kht_params);
@@ -193,6 +194,10 @@ static struct kern_ipc_perm *ipc_findkey(struct ipc_ids *ids, key_t key)
  *
  * The caller must own kern_ipc_perm.lock.of the new object.
  * On error, the function returns a (negative) error code.
+ *
+ * To conserve sequence number space, especially with extended ipc_mni,
+ * the sequence number is incremented only when one or more IDs have been
+ * removed previously.
  */
 static inline int ipc_idr_alloc(struct ipc_ids *ids, struct kern_ipc_perm *new)
 {
@@ -216,9 +221,13 @@ static inline int ipc_idr_alloc(struct ipc_ids *ids, struct kern_ipc_perm *new)
 	 */
 
 	if (next_id < 0) { /* !CHECKPOINT_RESTORE or next_id is unset */
-		new->seq = ids->seq++;
-		if (ids->seq > IPCID_SEQ_MAX)
-			ids->seq = 0;
+		if (ids->deleted) {
+			ids->seq++;
+			if (ids->seq > IPCID_SEQ_MAX)
+				ids->seq = 0;
+			ids->deleted = false;
+		}
+		new->seq = ids->seq;
 		idx = idr_alloc(&ids->ipcs_idr, new, 0, 0, GFP_NOWAIT);
 	} else {
 		new->seq = ipcid_to_seqx(next_id);
@@ -436,6 +445,7 @@ void ipc_rmid(struct ipc_ids *ids, struct kern_ipc_perm *ipcp)
 	idr_remove(&ids->ipcs_idr, idx);
 	ipc_kht_remove(ids, ipcp);
 	ids->in_use--;
+	ids->deleted = true;
 	ipcp->deleted = true;
 
 	if (unlikely(idx == ids->max_idx)) {
-- 
1.8.3.1



* [PATCH v11 3/3] ipc: Do cyclic id allocation with ipcmni_extend mode
  2018-11-09 20:11 [PATCH v11 0/3] ipc: Increase IPCMNI limit & IPC id generation modes Waiman Long
  2018-11-09 20:11 ` [PATCH v11 1/3] ipc: Allow boot time extension of IPCMNI from 32k to 16M Waiman Long
  2018-11-09 20:11 ` [PATCH v11 2/3] ipc: Conserve sequence numbers in ipcmni_extend mode Waiman Long
@ 2018-11-09 20:11 ` Waiman Long
  2 siblings, 0 replies; 9+ messages in thread
From: Waiman Long @ 2018-11-09 20:11 UTC
  To: Luis R. Rodriguez, Kees Cook, Andrew Morton, Jonathan Corbet
  Cc: linux-kernel, linux-fsdevel, linux-doc, Al Viro, Matthew Wilcox,
	Eric W. Biederman, Takashi Iwai, Davidlohr Bueso, Manfred Spraul,
	Waiman Long

For ipcmni_extend mode, the sequence number space is only 7 bits. So
the chance of id reuse is relatively high compared with the non-extended
mode.

To alleviate this id reuse problem, id allocation in ipcmni_extend mode
is done cyclically, going through the whole 24-bit id space before
wrapping around. This may use more memory in terms of the number of
xa_nodes allocated, and potentially more cachelines, as the xa_nodes
may be spread more sparsely in this case.

There is probably a slight memory and performance cost in doing cyclic
id allocation. For applications that really need more than 32k unique IPC
identifiers, this is a small price to pay to avoid the id reuse problem.

As a result, the chance of id reuse should be even smaller in the
ipcmni_extend mode. Users who worry about id reuse can turn on
ipcmni_extend mode even if they don't need more than 32k IPC
identifiers.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 Documentation/admin-guide/kernel-parameters.txt | 5 ++++-
 ipc/ipc_sysctl.c                                | 2 ++
 ipc/util.c                                      | 6 +++++-
 ipc/util.h                                      | 2 ++
 4 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 93d1454..49620b9 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1800,7 +1800,10 @@
 			See Documentation/filesystems/nfs/nfsroot.txt.
 
 	ipcmni_extend	[KNL] Extend the maximum number of unique System V
-			IPC identifiers from 32,768 to 16,777,216.
+			IPC identifiers from 32,768 to 16,777,216. Also do
+			cyclical identifier allocation through the entire
+			24-bit identifier space to reduce the chance of
+			identifier reuse.
 
 	irqaffinity=	[SMP] Set the default irq affinity mask
 			The argument is a cpu list, as described above.
diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
index 73b7782..d9ac6ca 100644
--- a/ipc/ipc_sysctl.c
+++ b/ipc/ipc_sysctl.c
@@ -122,6 +122,7 @@ static int proc_ipc_sem_dointvec(struct ctl_table *table, int write,
 static int int_max = INT_MAX;
 int ipc_mni = IPCMNI;
 int ipc_mni_shift = IPCMNI_SHIFT;
+bool ipc_mni_extended;
 
 static struct ctl_table ipc_kern_table[] = {
 	{
@@ -252,6 +253,7 @@ static int __init ipc_mni_extend(char *str)
 {
 	ipc_mni = IPCMNI_EXTEND;
 	ipc_mni_shift = IPCMNI_EXTEND_SHIFT;
+	ipc_mni_extended = true;
 	pr_info("IPCMNI extended to %d.\n", ipc_mni);
 	return 0;
 }
diff --git a/ipc/util.c b/ipc/util.c
index 00000a1..634b190 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -228,7 +228,11 @@ static inline int ipc_idr_alloc(struct ipc_ids *ids, struct kern_ipc_perm *new)
 			ids->deleted = false;
 		}
 		new->seq = ids->seq;
-		idx = idr_alloc(&ids->ipcs_idr, new, 0, 0, GFP_NOWAIT);
+		if (ipc_mni_extended)
+			idx = idr_alloc_cyclic(&ids->ipcs_idr, new, 0, ipc_mni,
+						GFP_NOWAIT);
+		else
+			idx = idr_alloc(&ids->ipcs_idr, new, 0, 0, GFP_NOWAIT);
 	} else {
 		new->seq = ipcid_to_seqx(next_id);
 		idx = idr_alloc(&ids->ipcs_idr, new, ipcid_to_idx(next_id),
diff --git a/ipc/util.h b/ipc/util.h
index 6a88d51..9f0dd79 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -33,6 +33,7 @@
 #ifdef CONFIG_SYSVIPC_SYSCTL
 extern int ipc_mni;
 extern int ipc_mni_shift;
+extern bool ipc_mni_extended;
 
 #define IPCMNI_SEQ_SHIFT	ipc_mni_shift
 #define IPCMNI_IDX_MASK		((1 << ipc_mni_shift) - 1)
@@ -40,6 +41,7 @@
 #else /* CONFIG_SYSVIPC_SYSCTL */
 
 #define ipc_mni			IPCMNI
+#define ipc_mni_extended	false
 #define IPCMNI_SEQ_SHIFT	IPCMNI_SHIFT
 #define IPCMNI_IDX_MASK		((1 << IPCMNI_SHIFT) - 1)
 #endif /* CONFIG_SYSVIPC_SYSCTL */
-- 
1.8.3.1



* Re: [PATCH v11 2/3] ipc: Conserve sequence numbers in ipcmni_extend mode
  2018-11-09 20:11 ` [PATCH v11 2/3] ipc: Conserve sequence numbers in ipcmni_extend mode Waiman Long
@ 2018-11-10  7:41   ` Matthew Wilcox
  2018-11-10 13:55     ` Waiman Long
  2018-11-20 19:41     ` Manfred Spraul
  0 siblings, 2 replies; 9+ messages in thread
From: Matthew Wilcox @ 2018-11-10  7:41 UTC
  To: Waiman Long
  Cc: Luis R. Rodriguez, Kees Cook, Andrew Morton, Jonathan Corbet,
	linux-kernel, linux-fsdevel, linux-doc, Al Viro,
	Eric W. Biederman, Takashi Iwai, Davidlohr Bueso, Manfred Spraul

On Fri, Nov 09, 2018 at 03:11:31PM -0500, Waiman Long wrote:
> The mixing in of a sequence number into the IPC IDs is probably to
> avoid ID reuse in userspace as much as possible. With ipcmni_extend
> mode, the number of usable sequence numbers is greatly reduced leading
> to higher chance of ID reuse.
> 
> To address this issue, we need to conserve the sequence number space
> as much as possible. Right now, the sequence number is incremented
> for every new ID created. In reality, we only need to increment the
> sequence number when one or more IDs have been removed previously to
> make sure that those IDs will not be reused when a new one is built.
> This is being done irrespective of the ipcmni mode.

That's not what I said.  Increment the sequence ID when the cursor wraps,
not when there's been a deletion.
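
A minimal userspace model of that suggestion, assuming a small fixed
index table and a cursor. The struct and function names are
hypothetical and this is not the idr API; it only shows where the
sequence number would be bumped.

```c
#include <stdbool.h>

#define NSLOTS	4	/* tiny index space for illustration only */
#define SEQ_MAX	127	/* assumed: 7-bit sequence space */

/* Hypothetical model: slot table, cyclic cursor and sequence number. */
struct cyc_model {
	bool used[NSLOTS];
	int in_use;
	int next;	/* cursor: next index to try */
	int seq;	/* bumped only when the cursor wraps */
};

/*
 * Allocate the first free index at or after the cursor.
 * Returns the index, or -1 if the table is full. *seq receives the
 * sequence number that would be mixed into the new id.
 */
static inline int cyc_alloc(struct cyc_model *m, int *seq)
{
	if (m->in_use == NSLOTS)
		return -1;
	for (;;) {
		if (m->next >= NSLOTS) {	/* cursor wrapped around */
			m->next = 0;
			if (++m->seq > SEQ_MAX)
				m->seq = 0;
		}
		int idx = m->next++;
		if (!m->used[idx]) {
			m->used[idx] = true;
			m->in_use++;
			*seq = m->seq;
			return idx;
		}
	}
}

/* Free an index; the sequence number is deliberately left untouched. */
static inline void cyc_free(struct cyc_model *m, int idx)
{
	m->used[idx] = false;
	m->in_use--;
}
```

In this model an index can only be handed out again after the cursor
has wrapped, and by then the sequence number has changed, so the
resulting id differs from the one that was freed.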


* Re: [PATCH v11 2/3] ipc: Conserve sequence numbers in ipcmni_extend mode
  2018-11-10  7:41   ` Matthew Wilcox
@ 2018-11-10 13:55     ` Waiman Long
  2018-11-20 19:41     ` Manfred Spraul
  1 sibling, 0 replies; 9+ messages in thread
From: Waiman Long @ 2018-11-10 13:55 UTC
  To: Matthew Wilcox
  Cc: Luis R. Rodriguez, Kees Cook, Andrew Morton, Jonathan Corbet,
	linux-kernel, linux-fsdevel, linux-doc, Al Viro,
	Eric W. Biederman, Takashi Iwai, Davidlohr Bueso, Manfred Spraul

On 11/10/2018 02:41 AM, Matthew Wilcox wrote:
> On Fri, Nov 09, 2018 at 03:11:31PM -0500, Waiman Long wrote:
>> The mixing in of a sequence number into the IPC IDs is probably to
>> avoid ID reuse in userspace as much as possible. With ipcmni_extend
>> mode, the number of usable sequence numbers is greatly reduced leading
>> to higher chance of ID reuse.
>>
>> To address this issue, we need to conserve the sequence number space
>> as much as possible. Right now, the sequence number is incremented
>> for every new ID created. In reality, we only need to increment the
>> sequence number when one or more IDs have been removed previously to
>> make sure that those IDs will not be reused when a new one is built.
>> This is being done irrespective of the ipcmni mode.
> That's not what I said.  Increment the sequence ID when the cursor wraps,
> not when there's been a deletion.

With non-cyclic idr allocation, the cursor never wraps back to 0;
each allocation just returns the lowest available integer. I can do
that with cyclic idr allocation.
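
To illustrate the point about non-cyclic allocation, here is a small
userspace model of a lowest-free policy. It is a sketch with a
hypothetical helper name, not the idr implementation.

```c
#include <stdbool.h>

#define NSLOTS 8	/* tiny index space for illustration only */

/* Lowest-free policy: always scan from 0, so there is no cursor to wrap. */
static inline int alloc_lowest(bool used[NSLOTS])
{
	for (int i = 0; i < NSLOTS; i++) {
		if (!used[i]) {
			used[i] = true;
			return i;
		}
	}
	return -1;	/* table full */
}
```

Freeing index 1 and allocating again returns index 1 right away, so
with this policy there is no wrap event to hook a sequence number
update onto.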

Cheers,
Longman



* Re: [PATCH v11 2/3] ipc: Conserve sequence numbers in ipcmni_extend mode
  2018-11-10  7:41   ` Matthew Wilcox
  2018-11-10 13:55     ` Waiman Long
@ 2018-11-20 19:41     ` Manfred Spraul
       [not found]       ` <0dd6a66b-4c7f-5224-bcf9-646b3a012a10@redhat.com>
  1 sibling, 1 reply; 9+ messages in thread
From: Manfred Spraul @ 2018-11-20 19:41 UTC
  To: Matthew Wilcox, Waiman Long
  Cc: Luis R. Rodriguez, Kees Cook, Andrew Morton, Jonathan Corbet,
	linux-kernel, linux-fsdevel, linux-doc, Al Viro,
	Eric W. Biederman, Takashi Iwai, Davidlohr Bueso

[-- Attachment #1: Type: text/plain, Size: 1121 bytes --]

Hi Matthew,

On 11/10/18 8:41 AM, Matthew Wilcox wrote:
> On Fri, Nov 09, 2018 at 03:11:31PM -0500, Waiman Long wrote:
>> The mixing in of a sequence number into the IPC IDs is probably to
>> avoid ID reuse in userspace as much as possible. With ipcmni_extend
>> mode, the number of usable sequence numbers is greatly reduced leading
>> to higher chance of ID reuse.
>>
>> To address this issue, we need to conserve the sequence number space
>> as much as possible. Right now, the sequence number is incremented
>> for every new ID created. In reality, we only need to increment the
>> sequence number when one or more IDs have been removed previously to
>> make sure that those IDs will not be reused when a new one is built.
>> This is being done irrespective of the ipcmni mode.
> That's not what I said.  Increment the sequence ID when the cursor wraps,
> not when there's been a deletion.

Something like the attached patch?

Unfortunately, idr_alloc_cyclic() cannot be used, so this copies some
code from lib/idr.c into ipc/util.c
[as potential replacement for patch 2 and 3 from the series]

--

     Manfred


[-- Attachment #2: 0002-ipc-util.c-use-idr_alloc_cyclic-for-ipc-allocations.patch --]
[-- Type: text/x-patch, Size: 3487 bytes --]

From 6bbade73d21884258a995698f21ad3128df8e98a Mon Sep 17 00:00:00 2001
From: Manfred Spraul <manfred@colorfullife.com>
Date: Sat, 29 Sep 2018 15:43:28 +0200
Subject: [PATCH 2/2] ipc/util.c: use idr_alloc_cyclic() for ipc allocations

A bit related to the patch that increases IPC_MNI, and
partially based on the mail from willy@infradead.org:

(User space) id reuse creates the risk of data corruption:

Process A: calls ipc function
Process A: sleeps just at the beginning of the syscall
Process B: Frees the ipc object (i.e.: calls ...ctl(IPC_RMID))
Process B: Creates a new ipc object (i.e.: calls ...get())
	<If new object and old object have the same id>
Process A: is woken up, and accesses the new object

To reduce the probability that the new and the old object have the
same id, the current implementation adds a sequence number to the
index of the object in the idr tree.

To further reduce the probability for a reuse, perform a cyclic
allocation, and increase the sequence number only when there is
a wrap-around. Unfortunately, idr_alloc_cyclic cannot be used,
because the sequence number must be increased when a wrap-around
occurs.

The patch cycles over at least RADIX_TREE_MAP_SIZE, i.e.
if there is only a small number of objects, the accesses
continue to be direct.
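
The size of the window that is cycled over can be sketched as a plain
clamp. MAP_SIZE is an assumed stand-in for RADIX_TREE_MAP_SIZE (64 is
used here for illustration), and the function name is hypothetical.

```c
#define MAP_SIZE 64	/* assumed stand-in for RADIX_TREE_MAP_SIZE */

/* Upper bound of the index window the allocator cycles through. */
static inline int ipc_idx_window(int in_use, int ipc_mni)
{
	int idx_max = in_use * 2;

	if (idx_max < MAP_SIZE)		/* never cycle over less than this */
		idx_max = MAP_SIZE;
	if (idx_max > ipc_mni)		/* never exceed the global limit */
		idx_max = ipc_mni;
	return idx_max;
}
```

With few objects the window stays at MAP_SIZE; it grows to twice the
number of objects in use and is capped at ipc_mni.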

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
---
 ipc/util.c | 48 ++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 44 insertions(+), 4 deletions(-)

diff --git a/ipc/util.c b/ipc/util.c
index 07ae117ccdc0..fa7b8fa7a14c 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -216,10 +216,49 @@ static inline int ipc_idr_alloc(struct ipc_ids *ids, struct kern_ipc_perm *new)
 	 */
 
 	if (next_id < 0) { /* !CHECKPOINT_RESTORE or next_id is unset */
-		new->seq = ids->seq++;
-		if (ids->seq > IPCID_SEQ_MAX)
-			ids->seq = 0;
-		idx = idr_alloc(&ids->ipcs_idr, new, 0, 0, GFP_NOWAIT);
+		int idx_max;
+
+		/*
+		 * If a user space visible id is reused, then this creates a
+		 * risk for data corruption. To reduce the probability that
+		 * a number is reused, three approaches are used:
+		 * 1) the idr index is allocated cyclically.
+		 * 2) the use space id is build by concatenating the
+		 *    internal idr index with a sequence number.
+		 * 3) The sequence number is only increased when the index
+		 *    wraps around.
+		 * Note that this code cannot use idr_alloc_cyclic:
+		 * new->seq must be set before the entry is inserted in the
+		 * idr.
+		 */
+		idx_max = ids->in_use*2;
+		if (idx_max < RADIX_TREE_MAP_SIZE)
+			idx_max = RADIX_TREE_MAP_SIZE;
+		if (idx_max > ipc_mni)
+			idx_max = ipc_mni;
+
+		if (ids->ipcs_idr.idr_next <= idx_max) {
+			new->seq = ids->seq;
+			idx = idr_alloc(&ids->ipcs_idr, new,
+						ids->ipcs_idr.idr_next,
+						idx_max, GFP_NOWAIT);
+		}
+
+		if ((idx == -ENOSPC) && (ids->ipcs_idr.idr_next > 0)) {
+			/*
+			 * A wrap around occurred.
+			 * Increase ids->seq, update new->seq
+			 */
+			ids->seq++;
+			if (ids->seq > IPCID_SEQ_MAX)
+				ids->seq = 0;
+			new->seq = ids->seq;
+
+			idx = idr_alloc(&ids->ipcs_idr, new, 0, idx_max,
+						GFP_NOWAIT);
+		}
+		if (idx >= 0)
+			ids->ipcs_idr.idr_next = idx+1;
 	} else {
 		new->seq = ipcid_to_seqx(next_id);
 		idx = idr_alloc(&ids->ipcs_idr, new, ipcid_to_idx(next_id),
@@ -227,6 +266,7 @@ static inline int ipc_idr_alloc(struct ipc_ids *ids, struct kern_ipc_perm *new)
 	}
 	if (idx >= 0)
 		new->id = (new->seq << IPCMNI_SEQ_SHIFT) + idx;
+
 	return idx;
 }
 
-- 
2.17.2



* Re: [PATCH v11 1/3] ipc: Allow boot time extension of IPCMNI from 32k to 16M
  2018-11-09 20:11 ` [PATCH v11 1/3] ipc: Allow boot time extension of IPCMNI from 32k to 16M Waiman Long
@ 2018-11-20 19:45   ` Manfred Spraul
  0 siblings, 0 replies; 9+ messages in thread
From: Manfred Spraul @ 2018-11-20 19:45 UTC
  To: Waiman Long, Luis R. Rodriguez, Kees Cook, Andrew Morton,
	Jonathan Corbet
  Cc: linux-kernel, linux-fsdevel, linux-doc, Al Viro, Matthew Wilcox,
	Eric W. Biederman, Takashi Iwai, Davidlohr Bueso

On 11/9/18 9:11 PM, Waiman Long wrote:
> The maximum number of unique System V IPC identifiers was limited to
> 32k.  That limit should be big enough for most use cases.
>
> However, there are some users out there requesting for more, especially
> those that are migrating from Solaris which uses 24 bits for unique
> identifiers. To satisfy the need of those users, a new boot time kernel
> option "ipcmni_extend" is added to extend the IPCMNI value to 16M. This
> is a 512X increase which should be big enough for users out there that
> need a large number of unique IPC identifier.
>
> The use of this new option will change the pattern of the IPC identifiers
> returned by functions like shmget(2). An application that depends on
> such pattern may not work properly.  So it should only be used if the
> users really need more than 32k of unique IPC numbers.
>
> This new option does have the side effect of reducing the maximum number
> of unique sequence numbers from 64k down to 128. So it is a trade-off.
>
> The computation of a new IPC id is not done in the performance critical
> path.  So a little bit of additional overhead shouldn't have any real
> performance impact.
>
> Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Manfred Spraul <manfred@colorfullife.com>


* Re: [PATCH v11 2/3] ipc: Conserve sequence numbers in ipcmni_extend mode
       [not found]       ` <0dd6a66b-4c7f-5224-bcf9-646b3a012a10@redhat.com>
@ 2019-03-10 12:47         ` Manfred Spraul
  0 siblings, 0 replies; 9+ messages in thread
From: Manfred Spraul @ 2019-03-10 12:47 UTC
  To: Waiman Long, Matthew Wilcox
  Cc: Luis R. Rodriguez, Kees Cook, Andrew Morton, Jonathan Corbet,
	linux-kernel, linux-fsdevel, linux-doc, Al Viro,
	Eric W. Biederman, Takashi Iwai, Davidlohr Bueso

On 2/27/19 9:30 PM, Waiman Long wrote:
> On 11/20/2018 02:41 PM, Manfred Spraul wrote:
>>  From 6bbade73d21884258a995698f21ad3128df8e98a Mon Sep 17 00:00:00 2001
>> From: Manfred Spraul <manfred@colorfullife.com>
>> Date: Sat, 29 Sep 2018 15:43:28 +0200
>> Subject: [PATCH 2/2] ipc/util.c: use idr_alloc_cyclic() for ipc allocations
>>
>> A bit related to the patch that increases IPC_MNI, and
>> partially based on the mail from willy@infradead.org:
>>
>> (User space) id reuse create the risk of data corruption:
>>
>> Process A: calls ipc function
>> Process A: sleeps just at the beginning of the syscall
>> Process B: Frees the ipc object (i.e.: calls ...ctl(IPC_RMID)
>> Process B: Creates a new ipc object (i.e.: calls ...get())
>> 	<If new object and old object have the same id>
>> Process A: is woken up, and accesses the new object
>>
>> To reduce the probability that the new and the old object have the
>> same id, the current implementation adds a sequence number to the
>> index of the object in the idr tree.
>>
>> To further reduce the probability for a reuse, perform a cyclic
>> allocation, and increase the sequence number only when there is
>> a wrap-around. Unfortunately, idr_alloc_cyclic cannot be used,
>> because the sequence number must be increased when a wrap-around
>> occurs.
>>
>> The patch cycles over at least RADIX_TREE_MAP_SIZE, i.e.
>> if there is only a small number of objects, the accesses
>> continue to be direct.
>>
>> Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
>> ---
>>   ipc/util.c | 48 ++++++++++++++++++++++++++++++++++++++++++++----
>>   1 file changed, 44 insertions(+), 4 deletions(-)
>>
>> diff --git a/ipc/util.c b/ipc/util.c
>> index 07ae117ccdc0..fa7b8fa7a14c 100644
>> --- a/ipc/util.c
>> +++ b/ipc/util.c
>> @@ -216,10 +216,49 @@ static inline int ipc_idr_alloc(struct ipc_ids *ids, struct kern_ipc_perm *new)
>>   	 */
>>   
>>   	if (next_id < 0) { /* !CHECKPOINT_RESTORE or next_id is unset */
>> -		new->seq = ids->seq++;
>> -		if (ids->seq > IPCID_SEQ_MAX)
>> -			ids->seq = 0;
>> -		idx = idr_alloc(&ids->ipcs_idr, new, 0, 0, GFP_NOWAIT);
>> +		int idx_max;
>> +
>> +		/*
>> +		 * If a user space visible id is reused, then this creates a
>> +		 * risk for data corruption. To reduce the probability that
>> +		 * a number is reused, three approaches are used:
>> +		 * 1) the idr index is allocated cyclically.
>> +		 * 2) the use space id is build by concatenating the
>> +		 *    internal idr index with a sequence number.
>> +		 * 3) The sequence number is only increased when the index
>> +		 *    wraps around.
>> +		 * Note that this code cannot use idr_alloc_cyclic:
>> +		 * new->seq must be set before the entry is inserted in the
>> +		 * idr.
>
> I don't think that is true. The IDR code just need to associate a 
> pointer to the given ID. It is not going to access anything inside. So 
> we don't need to set the seq number first before calling idr_alloc().
>
We must, sorry - there is even a CVE associated with that bug:

CVE-2015-7613, 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b9a532277938798b53178d5a66af6e2915cb27cf

The problem is not the IDR code, the problem is that 
ipc_obtain_object_check() calls ipc_checkid(), and ipc_checkid() 
accesses ipcp->seq.

And since the ipc_checkid() is called before acquiring any locks, 
everything must be fully initialized before idr_alloc().
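
A simplified userspace model of that check, assuming it compares the
sequence bits of the id with the seq field of the object as described
above. The struct and function names are hypothetical, and SEQ_SHIFT
uses the default shift of 15 from patch 1.

```c
#define SEQ_SHIFT 15	/* default (non-extended) shift from patch 1 */

/* Hypothetical stand-in for the one field of kern_ipc_perm used here. */
struct perm_model {
	int seq;
};

/* Nonzero means the sequence number in the id does not match the object. */
static inline int model_checkid(const struct perm_model *p, int id)
{
	return (id >> SEQ_SHIFT) != p->seq;
}
```

If seq were still unset when the object becomes visible in the idr, a
lockless lookup could run this comparison against a stale or zero
value, which is why the field has to be written first.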

>> +		 */
>> +		idx_max = ids->in_use*2;
>> +		if (idx_max < RADIX_TREE_MAP_SIZE)
>> +			idx_max = RADIX_TREE_MAP_SIZE;
>> +		if (idx_max > ipc_mni)
>> +			idx_max = ipc_mni;
>> +
>> +		if (ids->ipcs_idr.idr_next <= idx_max) {
>> +			new->seq = ids->seq;
>> +			idx = idr_alloc(&ids->ipcs_idr, new,
>> +						ids->ipcs_idr.idr_next,
>> +						idx_max, GFP_NOWAIT);
>> +		}
>> +
>> +		if ((idx == -ENOSPC) && (ids->ipcs_idr.idr_next > 0)) {
>> +			/*
>> +			 * A wrap around occurred.
>> +			 * Increase ids->seq, update new->seq
>> +			 */
>> +			ids->seq++;
>> +			if (ids->seq > IPCID_SEQ_MAX)
>> +				ids->seq = 0;
>> +			new->seq = ids->seq;
>> +
>> +			idx = idr_alloc(&ids->ipcs_idr, new, 0, idx_max,
>> +						GFP_NOWAIT);
>> +		}
>> +		if (idx >= 0)
>> +			ids->ipcs_idr.idr_next = idx+1;
>
> This code has dependence on the internal implementation of the IDR 
> code. So if the IDR code is changed and the one who does it forgets to 
> update the IPC code, we may have a problem. Using idr_alloc_cyclic() 
> for all will likely increase memory footprint which can be a problem 
> on IoT devices that have little memory. That is the main reason why I 
> opted to use idr_alloc_cyclic() only when in ipcmni_extend mode which 
> I am sure won't be activated on systems with little memory.
>
I know.

But IoT devices with little memory will compile out sysv (as is done
by Android).


--

     Manfred

