linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v6 0/8] ipc: Clamp *mni to the real IPCMNI limit & increase that limit
@ 2018-04-27 21:00 Waiman Long
  2018-04-27 21:00 ` [PATCH v6 1/8] sysctl: Add flags to support min/max range clamping Waiman Long
                   ` (8 more replies)
  0 siblings, 9 replies; 17+ messages in thread
From: Waiman Long @ 2018-04-27 21:00 UTC (permalink / raw)
  To: Luis R. Rodriguez, Kees Cook, Andrew Morton, Jonathan Corbet
  Cc: linux-kernel, linux-fsdevel, linux-doc, Al Viro, Matthew Wilcox,
	Eric W. Biederman, Waiman Long

v5->v6:
 - Consolidate the 3 ctl_table flags into 2.
 - Make similar changes to proc_doulongvec_minmax() and its associates
   to complete the clamping change.
 - Remove the sysctl registration failure test patch for now for later
   consideration.
 - Add extra braces to patch 1 to reduce code diff in a later patch.

v4->v5:
 - Revert the flags back to 16-bit so that there will be no change to
   the size of ctl_table.
 - Enhance the sysctl_check_flags() as requested by Luis to perform more
   checks to spot incorrect ctl_table entries.
 - Change the sysctl selftest to use dummy sysctls instead of production
   ones & enhance it to do more checks.
 - Add one more sysctl selftest for registration failure.
 - Add 2 ipc patches to add an extended mode to increase IPCMNI from
   32k to 2M.
 - Miscellaneous change to incorporate feedback comments from
   reviewers.

v3->v4:
 - Remove v3 patches 1 & 2 as they have been merged into the mm tree.
 - Change flags from uint16_t to unsigned int.
 - Remove CTL_FLAGS_OOR_WARNED and use pr_warn_ratelimited() instead.
 - Simplify the warning message code.
 - Add a new patch to fail the ctl_table registration with invalid flag.
 - Add a test case for range clamping in sysctl selftest.

v2->v3:
 - Fix kdoc comment errors.
 - Incorporate comments and suggestions from Luis R. Rodriguez.
 - Add a patch to fix a typo error in fs/proc/proc_sysctl.c.

v1->v2:
 - Add kdoc comments to the do_proc_do{u}intvec_minmax_conv_param
   structures.
 - Add a new flags field to the ctl_table structure for specifying
   whether range clamping should be activated instead of adding new
   sysctl parameter handlers.
 - Clamp the semmni value embedded in the multi-values sem parameter.

v1 patch: https://lkml.org/lkml/2018/2/19/453
v2 patch: https://lkml.org/lkml/2018/2/27/627
v3 patch: https://lkml.org/lkml/2018/3/1/716 
v4 patch: https://lkml.org/lkml/2018/3/12/867
v5 patch: https://lkml.org/lkml/2018/3/16/1106

The sysctl parameters msgmni, shmmni and semmni have an inherent limit
of IPC_MNI (32k). However, users may not be aware of that because they
can write a value much higher than that without getting any error or
notification. Reading the parameters back will show the newly written
values which are not real.

Enforcing the limit by failing sysctl parameter write, however, may
cause regressions if existing user setup scripts set those parameters
above 32k as those scripts will now fail in this case.

To address this delemma, a new flags field is introduced into
the ctl_table. The value CTL_FLAGS_CLAMP_RANGE can be added to any
ctl_table entries to enable a looser range clamping without returning
any error. For example,

  .flags = CTL_FLAGS_CLAMP_RANGE,

This flags value are now used for the range checking of shmmni,
msgmni and semmni without breaking existing applications. If any out
of range value is written to those sysctl parameters, the following
warning will be printed instead.

  sysctl: "shmmni" was set out of range [0, 32768], clamped to 32768.

Reading the values back will show 32768 instead of some fake values.

New sysctl selftests are added to exercise new code added by this
patchset.

There are users out there requesting increase in the IPCMNI value.
The last 2 patches attempt to do that by using a boot kernel parameter
"ipcmni_extend" to increase the IPCMNI limit from 32k to 2M.

Eric Biederman had posted an RFC patch to just scrap the IPCMNI limit
and open up the whole positive integer space for IPC IDs. A major
issue that I have with this approach is that SysV IPC had been in use
for over 20 years. We just don't know if there are user applications
that have dependency on the way that the IDs are built. So drastic
change like this may have the potential of breaking some applications.

I prefer a more conservative approach where users will observe no
change in behavior unless they explictly opt in to enable the extended
mode. I could open up the whole positive integer space in this case
like what Eric did, but that will make the code more complex.  So I
just extend IPCMNI to 2M in this case and keep similar ID generation
logic.


Waiman Long (8):
  sysctl: Add flags to support min/max range clamping
  proc/sysctl: Provide additional ctl_table.flags checks
  sysctl: Warn when a clamped sysctl parameter is set out of range
  ipc: Clamp msgmni and shmmni to the real IPCMNI limit
  ipc: Clamp semmni to the real IPCMNI limit
  test_sysctl: Add range clamping test
  ipc: Allow boot time extension of IPCMNI from 32k to 2M
  ipc: Conserve sequence numbers in extended IPCMNI mode

 Documentation/admin-guide/kernel-parameters.txt |   3 +
 fs/proc/proc_sysctl.c                           |  60 ++++++++++++++
 include/linux/ipc_namespace.h                   |   1 +
 include/linux/sysctl.h                          |  32 ++++++++
 ipc/ipc_sysctl.c                                |  33 +++++++-
 ipc/sem.c                                       |  25 ++++++
 ipc/util.c                                      |  41 +++++++---
 ipc/util.h                                      |  35 ++++++--
 kernel/sysctl.c                                 | 104 +++++++++++++++++++++---
 lib/test_sysctl.c                               |  29 +++++++
 tools/testing/selftests/sysctl/sysctl.sh        |  52 ++++++++++++
 11 files changed, 379 insertions(+), 36 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v6 1/8] sysctl: Add flags to support min/max range clamping
  2018-04-27 21:00 [PATCH v6 0/8] ipc: Clamp *mni to the real IPCMNI limit & increase that limit Waiman Long
@ 2018-04-27 21:00 ` Waiman Long
  2018-04-27 21:00 ` [PATCH v6 2/8] proc/sysctl: Provide additional ctl_table.flags checks Waiman Long
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Waiman Long @ 2018-04-27 21:00 UTC (permalink / raw)
  To: Luis R. Rodriguez, Kees Cook, Andrew Morton, Jonathan Corbet
  Cc: linux-kernel, linux-fsdevel, linux-doc, Al Viro, Matthew Wilcox,
	Eric W. Biederman, Waiman Long

When minimum/maximum values are specified for a sysctl parameter in
the ctl_table structure with proc_dointvec_minmax() handler, update
to that parameter will fail with error if the given value is outside
of the required range.

There are use cases where it may be better to clamp the value of
the sysctl parameter to the given range without failing the update,
especially if the users are not aware of the actual range limits.
Reading the value back after the update will now be a good practice
to see if the provided value exceeds the range limits.

To provide this less restrictive form of range checking, a new flags
field is added to the ctl_table structure. The new field is a 16-bit
value that just fits into the hole left by the 16-bit umode_t field
without increasing the size of the structure.

When either the CTL_FLAGS_CLAMP_SIGNED_RANGE or the
CTL_FLAGS_CLAMP_UNSIGNED_RANGE flag is set in the ctl_table entry, any
update from the userspace will be clamped to the given range without
error if either the proc_dointvec_minmax() or the proc_douintvec_minmax()
handlers is used respectively.

In the case of proc_doulongvec_minmax(), the out-of-range input value
is either ignored or clamped if the CTL_FLAGS_CLAMP_UNSIGNED_RANGE flag
is set.

The clamped value is either the maximum or minimum value that is
closest to the input value provided by the user.

This patch, by itself, does not require the use of separate signed
and unsigned flags.  However, the use of separate flags allows us to
perform more comprehensive checking in a later patch.

Extra braces are also used in this patch to make a latter patch easier
to read.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 include/linux/sysctl.h | 32 ++++++++++++++++++++++
 kernel/sysctl.c        | 74 ++++++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 94 insertions(+), 12 deletions(-)

diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index b769ecf..3a628cf 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -116,6 +116,7 @@ struct ctl_table
 	void *data;
 	int maxlen;
 	umode_t mode;
+	uint16_t flags;
 	struct ctl_table *child;	/* Deprecated */
 	proc_handler *proc_handler;	/* Callback for text formatting */
 	struct ctl_table_poll *poll;
@@ -123,6 +124,37 @@ struct ctl_table
 	void *extra2;
 } __randomize_layout;
 
+/**
+ * enum ctl_table_flags - flags for the ctl table (struct ctl_table.flags)
+ *
+ * @CTL_FLAGS_CLAMP_SIGNED_RANGE: Set to indicate that the entry holds a
+ *	signed value and should be flexibly clamped to the provided
+ *	min/max signed value in case the user provided a value outside
+ *	of the given range.  The clamped value is either the provided
+ *	minimum or maximum value that is closest to the input value.
+ *	No lower bound or upper bound checking will be done if the
+ *	corresponding minimum or maximum value isn't provided.
+ *
+ * @CTL_FLAGS_CLAMP_UNSIGNED_RANGE: Set to indicate that the entry holds
+ *	an unsigned value and should be flexibly clamped to the provided
+ *	min/max unsigned value in case the user provided a value outside
+ *	of the given range.  The clamped value is either the provided
+ *	minimum or maximum value that is closest to the input value.
+ *	No lower bound or upper bound checking will be done if the
+ *	corresponding minimum or maximum value isn't provided.
+ *
+ * At most 16 different flags are currently allowed.
+ */
+enum ctl_table_flags {
+	CTL_FLAGS_CLAMP_SIGNED_RANGE	= BIT(0),
+	CTL_FLAGS_CLAMP_UNSIGNED_RANGE	= BIT(1),
+	__CTL_FLAGS_MAX			= BIT(2),
+};
+
+#define CTL_FLAGS_CLAMP_RANGE	(CTL_FLAGS_CLAMP_SIGNED_RANGE|\
+				 CTL_FLAGS_CLAMP_UNSIGNED_RANGE)
+#define CTL_TABLE_FLAGS_ALL	(__CTL_FLAGS_MAX - 1)
+
 struct ctl_node {
 	struct rb_node node;
 	struct ctl_table_header *header;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 6a78cf7..5b84c1d 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -2515,6 +2515,7 @@ static int proc_dointvec_minmax_sysadmin(struct ctl_table *table, int write,
  * struct do_proc_dointvec_minmax_conv_param - proc_dointvec_minmax() range checking structure
  * @min: pointer to minimum allowable value
  * @max: pointer to maximum allowable value
+ * @flags: pointer to flags
  *
  * The do_proc_dointvec_minmax_conv_param structure provides the
  * minimum and maximum values for doing range checking for those sysctl
@@ -2523,6 +2524,7 @@ static int proc_dointvec_minmax_sysadmin(struct ctl_table *table, int write,
 struct do_proc_dointvec_minmax_conv_param {
 	int *min;
 	int *max;
+	uint16_t *flags;
 };
 
 static int do_proc_dointvec_minmax_conv(bool *negp, unsigned long *lvalp,
@@ -2532,9 +2534,23 @@ static int do_proc_dointvec_minmax_conv(bool *negp, unsigned long *lvalp,
 	struct do_proc_dointvec_minmax_conv_param *param = data;
 	if (write) {
 		int val = *negp ? -*lvalp : *lvalp;
-		if ((param->min && *param->min > val) ||
-		    (param->max && *param->max < val))
-			return -EINVAL;
+		bool clamp = param->flags &&
+			   (*param->flags & CTL_FLAGS_CLAMP_SIGNED_RANGE);
+
+		if (param->min && *param->min > val) {
+			if (clamp) {
+				val = *param->min;
+			} else {
+				return -EINVAL;
+			}
+		}
+		if (param->max && *param->max < val) {
+			if (clamp) {
+				val = *param->max;
+			} else {
+				return -EINVAL;
+			}
+		}
 		*valp = val;
 	} else {
 		int val = *valp;
@@ -2563,7 +2579,8 @@ static int do_proc_dointvec_minmax_conv(bool *negp, unsigned long *lvalp,
  * This routine will ensure the values are within the range specified by
  * table->extra1 (min) and table->extra2 (max).
  *
- * Returns 0 on success or -EINVAL on write when the range check fails.
+ * Returns 0 on success or -EINVAL on write when the range check fails
+ * without the CTL_FLAGS_CLAMP_SIGNED_RANGE flag.
  */
 int proc_dointvec_minmax(struct ctl_table *table, int write,
 		  void __user *buffer, size_t *lenp, loff_t *ppos)
@@ -2571,6 +2588,7 @@ int proc_dointvec_minmax(struct ctl_table *table, int write,
 	struct do_proc_dointvec_minmax_conv_param param = {
 		.min = (int *) table->extra1,
 		.max = (int *) table->extra2,
+		.flags = &table->flags,
 	};
 	return do_proc_dointvec(table, write, buffer, lenp, ppos,
 				do_proc_dointvec_minmax_conv, &param);
@@ -2580,6 +2598,7 @@ int proc_dointvec_minmax(struct ctl_table *table, int write,
  * struct do_proc_douintvec_minmax_conv_param - proc_douintvec_minmax() range checking structure
  * @min: pointer to minimum allowable value
  * @max: pointer to maximum allowable value
+ * @flags: pointer to flags
  *
  * The do_proc_douintvec_minmax_conv_param structure provides the
  * minimum and maximum values for doing range checking for those sysctl
@@ -2588,6 +2607,7 @@ int proc_dointvec_minmax(struct ctl_table *table, int write,
 struct do_proc_douintvec_minmax_conv_param {
 	unsigned int *min;
 	unsigned int *max;
+	uint16_t *flags;
 };
 
 static int do_proc_douintvec_minmax_conv(unsigned long *lvalp,
@@ -2598,14 +2618,26 @@ static int do_proc_douintvec_minmax_conv(unsigned long *lvalp,
 
 	if (write) {
 		unsigned int val = *lvalp;
+		bool clamp = param->flags &&
+			   (*param->flags & CTL_FLAGS_CLAMP_UNSIGNED_RANGE);
 
 		if (*lvalp > UINT_MAX)
 			return -EINVAL;
 
-		if ((param->min && *param->min > val) ||
-		    (param->max && *param->max < val))
-			return -ERANGE;
-
+		if (param->min && *param->min > val) {
+			if (clamp) {
+				val = *param->min;
+			} else {
+				return -ERANGE;
+			}
+		}
+		if (param->max && *param->max < val) {
+			if (clamp) {
+				val = *param->max;
+			} else {
+				return -ERANGE;
+			}
+		}
 		*valp = val;
 	} else {
 		unsigned int val = *valp;
@@ -2632,7 +2664,8 @@ static int do_proc_douintvec_minmax_conv(unsigned long *lvalp,
  * check for UINT_MAX to avoid having to support wrap around uses from
  * userspace.
  *
- * Returns 0 on success or -ERANGE on write when the range check fails.
+ * Returns 0 on success or -ERANGE on write when the range check fails
+ * without the CTL_FLAGS_CLAMP_UNSIGNED_RANGE flag.
  */
 int proc_douintvec_minmax(struct ctl_table *table, int write,
 			  void __user *buffer, size_t *lenp, loff_t *ppos)
@@ -2640,6 +2673,7 @@ int proc_douintvec_minmax(struct ctl_table *table, int write,
 	struct do_proc_douintvec_minmax_conv_param param = {
 		.min = (unsigned int *) table->extra1,
 		.max = (unsigned int *) table->extra2,
+		.flags = &table->flags,
 	};
 	return do_proc_douintvec(table, write, buffer, lenp, ppos,
 				 do_proc_douintvec_minmax_conv, &param);
@@ -2716,6 +2750,7 @@ static int __do_proc_doulongvec_minmax(void *data, struct ctl_table *table, int
 	int vleft, first = 1, err = 0;
 	size_t left;
 	char *kbuf = NULL, *p;
+	uint16_t flags;
 
 	if (!data || !table->maxlen || !*lenp || (*ppos && !write)) {
 		*lenp = 0;
@@ -2725,6 +2760,7 @@ static int __do_proc_doulongvec_minmax(void *data, struct ctl_table *table, int
 	i = (unsigned long *) data;
 	min = (unsigned long *) table->extra1;
 	max = (unsigned long *) table->extra2;
+	flags = table->flags;
 	vleft = table->maxlen / sizeof(unsigned long);
 	left = *lenp;
 
@@ -2755,8 +2791,20 @@ static int __do_proc_doulongvec_minmax(void *data, struct ctl_table *table, int
 			if (neg)
 				continue;
 			val = convmul * val / convdiv;
-			if ((min && val < *min) || (max && val > *max))
-				continue;
+			if (min && val < *min) {
+				if (flags & CTL_FLAGS_CLAMP_UNSIGNED_RANGE) {
+					val = *min;
+				} else {
+					continue;
+				}
+			}
+			if (max && val > *max) {
+				if (flags & CTL_FLAGS_CLAMP_UNSIGNED_RANGE) {
+					val = *max;
+				} else {
+					continue;
+				}
+			}
 			*i = val;
 		} else {
 			val = convdiv * (*i) / convmul;
@@ -2808,7 +2856,9 @@ static int do_proc_doulongvec_minmax(struct ctl_table *table, int write,
  * values from/to the user buffer, treated as an ASCII string.
  *
  * This routine will ensure the values are within the range specified by
- * table->extra1 (min) and table->extra2 (max).
+ * table->extra1 (min) and table->extra2 (max).  Values outside the range
+ * will be ignored or clamped to the given range if the
+ * CTL_FLAGS_CLAMP_UNSIGNED_RANGE flag is specified.
  *
  * Returns 0 on success.
  */
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v6 2/8] proc/sysctl: Provide additional ctl_table.flags checks
  2018-04-27 21:00 [PATCH v6 0/8] ipc: Clamp *mni to the real IPCMNI limit & increase that limit Waiman Long
  2018-04-27 21:00 ` [PATCH v6 1/8] sysctl: Add flags to support min/max range clamping Waiman Long
@ 2018-04-27 21:00 ` Waiman Long
  2018-04-27 21:00 ` [PATCH v6 3/8] sysctl: Warn when a clamped sysctl parameter is set out of range Waiman Long
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Waiman Long @ 2018-04-27 21:00 UTC (permalink / raw)
  To: Luis R. Rodriguez, Kees Cook, Andrew Morton, Jonathan Corbet
  Cc: linux-kernel, linux-fsdevel, linux-doc, Al Viro, Matthew Wilcox,
	Eric W. Biederman, Waiman Long

Checking code is added to provide the following additional
ctl_table.flags checks:

 1) No unknown flag is allowed.
 2) Minimum of a range cannot be larger than the maximum value.
 3) The signed and unsigned flags are mutually exclusive.
 4) The proc_handler should be consistent with the signed or unsigned
    flags.

The separation of signed and unsigned flags helps to provide more
comprehensive checking than it would have been if there is only one
flag available.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 fs/proc/proc_sysctl.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 8989936..fb09454 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -1092,6 +1092,64 @@ static int sysctl_check_table_array(const char *path, struct ctl_table *table)
 	return err;
 }
 
+/*
+ * This code assumes that only one integer value is allowed in an integer
+ * sysctl when one of the clamping flags is used. If that assumption is no
+ * longer true, we may need to add another flag to indicate the entry size.
+ */
+static int sysctl_check_flags(const char *path, struct ctl_table *table)
+{
+	int err = 0;
+
+	if ((table->flags & ~CTL_TABLE_FLAGS_ALL) ||
+	   ((table->flags & CTL_FLAGS_CLAMP_RANGE) == CTL_FLAGS_CLAMP_RANGE))
+		err = sysctl_err(path, table, "invalid flags");
+
+	if (table->flags & CTL_FLAGS_CLAMP_RANGE) {
+		int range_err = 0;
+		bool is_int = (table->maxlen == sizeof(int));
+
+		if (!is_int && (table->maxlen != sizeof(long))) {
+			range_err++;
+		} else if (!table->extra1 || !table->extra2) {
+			/* No min > max checking needed */
+		} else if (table->flags & CTL_FLAGS_CLAMP_UNSIGNED_RANGE) {
+			unsigned long min, max;
+
+			min = is_int ? *(unsigned int *)table->extra1
+				     : *(unsigned long *)table->extra1;
+			max = is_int ? *(unsigned int *)table->extra2
+				     : *(unsigned long *)table->extra2;
+			range_err += (min > max);
+		} else { /* table->flags & CTL_FLAGS_CLAMP_SIGNED_RANGE */
+
+			long min, max;
+
+			min = is_int ? *(int *)table->extra1
+				     : *(long *)table->extra1;
+			max = is_int ? *(int *)table->extra2
+				     : *(long *)table->extra2;
+			range_err += (min > max);
+		}
+
+		/*
+		 * proc_handler and flag consistency check.
+		 */
+		if (((table->proc_handler == proc_douintvec_minmax)   ||
+		     (table->proc_handler == proc_doulongvec_minmax)) &&
+		    !(table->flags & CTL_FLAGS_CLAMP_UNSIGNED_RANGE))
+			range_err++;
+
+		if ((table->proc_handler == proc_dointvec_minmax) &&
+		   !(table->flags & CTL_FLAGS_CLAMP_SIGNED_RANGE))
+			range_err++;
+
+		if (range_err)
+			err |= sysctl_err(path, table, "Invalid range");
+	}
+	return err;
+}
+
 static int sysctl_check_table(const char *path, struct ctl_table *table)
 {
 	int err = 0;
@@ -1111,6 +1169,8 @@ static int sysctl_check_table(const char *path, struct ctl_table *table)
 		    (table->proc_handler == proc_doulongvec_ms_jiffies_minmax)) {
 			if (!table->data)
 				err |= sysctl_err(path, table, "No data");
+			if (table->flags)
+				err |= sysctl_check_flags(path, table);
 			if (!table->maxlen)
 				err |= sysctl_err(path, table, "No maxlen");
 			else
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v6 3/8] sysctl: Warn when a clamped sysctl parameter is set out of range
  2018-04-27 21:00 [PATCH v6 0/8] ipc: Clamp *mni to the real IPCMNI limit & increase that limit Waiman Long
  2018-04-27 21:00 ` [PATCH v6 1/8] sysctl: Add flags to support min/max range clamping Waiman Long
  2018-04-27 21:00 ` [PATCH v6 2/8] proc/sysctl: Provide additional ctl_table.flags checks Waiman Long
@ 2018-04-27 21:00 ` Waiman Long
  2018-04-30 22:40   ` Kees Cook
  2018-04-27 21:00 ` [PATCH v6 4/8] ipc: Clamp msgmni and shmmni to the real IPCMNI limit Waiman Long
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 17+ messages in thread
From: Waiman Long @ 2018-04-27 21:00 UTC (permalink / raw)
  To: Luis R. Rodriguez, Kees Cook, Andrew Morton, Jonathan Corbet
  Cc: linux-kernel, linux-fsdevel, linux-doc, Al Viro, Matthew Wilcox,
	Eric W. Biederman, Waiman Long

Even with clamped sysctl parameters, it is still not that straight
forward to figure out the exact range of those parameters. One may
try to write extreme parameter values to see if they get clamped.
To make it easier, a warning with the expected range will now be
printed into the kernel ring buffer when a clamped sysctl parameter
receives an out of range value.

The pr_warn_ratelimited() macro is used to limit the number of warning
messages that can be printed within a given period of time.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/sysctl.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 5b84c1d..76b2f1b 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -17,6 +17,7 @@
  * The list_for_each() macro wasn't appropriate for the sysctl loop.
  *  Removed it and replaced it with older style, 03/23/00, Bill Wendling
  */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
 #include <linux/module.h>
 #include <linux/aio.h>
@@ -2516,6 +2517,7 @@ static int proc_dointvec_minmax_sysadmin(struct ctl_table *table, int write,
  * @min: pointer to minimum allowable value
  * @max: pointer to maximum allowable value
  * @flags: pointer to flags
+ * @name: sysctl parameter name
  *
  * The do_proc_dointvec_minmax_conv_param structure provides the
  * minimum and maximum values for doing range checking for those sysctl
@@ -2525,6 +2527,7 @@ struct do_proc_dointvec_minmax_conv_param {
 	int *min;
 	int *max;
 	uint16_t *flags;
+	const char *name;
 };
 
 static int do_proc_dointvec_minmax_conv(bool *negp, unsigned long *lvalp,
@@ -2534,12 +2537,14 @@ static int do_proc_dointvec_minmax_conv(bool *negp, unsigned long *lvalp,
 	struct do_proc_dointvec_minmax_conv_param *param = data;
 	if (write) {
 		int val = *negp ? -*lvalp : *lvalp;
+		bool clamped = false;
 		bool clamp = param->flags &&
 			   (*param->flags & CTL_FLAGS_CLAMP_SIGNED_RANGE);
 
 		if (param->min && *param->min > val) {
 			if (clamp) {
 				val = *param->min;
+				clamped = true;
 			} else {
 				return -EINVAL;
 			}
@@ -2547,11 +2552,17 @@ static int do_proc_dointvec_minmax_conv(bool *negp, unsigned long *lvalp,
 		if (param->max && *param->max < val) {
 			if (clamp) {
 				val = *param->max;
+				clamped = true;
 			} else {
 				return -EINVAL;
 			}
 		}
 		*valp = val;
+		if (clamped && param->name)
+			pr_warn_ratelimited("\"%s\" was set out of range [%d, %d], clamped to %d.\n",
+				param->name,
+				param->min ? *param->min : -INT_MAX,
+				param->max ? *param->max :  INT_MAX, val);
 	} else {
 		int val = *valp;
 		if (val < 0) {
@@ -2589,6 +2600,7 @@ int proc_dointvec_minmax(struct ctl_table *table, int write,
 		.min = (int *) table->extra1,
 		.max = (int *) table->extra2,
 		.flags = &table->flags,
+		.name  = table->procname,
 	};
 	return do_proc_dointvec(table, write, buffer, lenp, ppos,
 				do_proc_dointvec_minmax_conv, &param);
@@ -2599,6 +2611,7 @@ int proc_dointvec_minmax(struct ctl_table *table, int write,
  * @min: pointer to minimum allowable value
  * @max: pointer to maximum allowable value
  * @flags: pointer to flags
+ * @name: sysctl parameter name
  *
  * The do_proc_douintvec_minmax_conv_param structure provides the
  * minimum and maximum values for doing range checking for those sysctl
@@ -2608,6 +2621,7 @@ struct do_proc_douintvec_minmax_conv_param {
 	unsigned int *min;
 	unsigned int *max;
 	uint16_t *flags;
+	const char *name;
 };
 
 static int do_proc_douintvec_minmax_conv(unsigned long *lvalp,
@@ -2618,6 +2632,7 @@ static int do_proc_douintvec_minmax_conv(unsigned long *lvalp,
 
 	if (write) {
 		unsigned int val = *lvalp;
+		bool clamped = false;
 		bool clamp = param->flags &&
 			   (*param->flags & CTL_FLAGS_CLAMP_UNSIGNED_RANGE);
 
@@ -2627,6 +2642,7 @@ static int do_proc_douintvec_minmax_conv(unsigned long *lvalp,
 		if (param->min && *param->min > val) {
 			if (clamp) {
 				val = *param->min;
+				clamped = true;
 			} else {
 				return -ERANGE;
 			}
@@ -2634,11 +2650,17 @@ static int do_proc_douintvec_minmax_conv(unsigned long *lvalp,
 		if (param->max && *param->max < val) {
 			if (clamp) {
 				val = *param->max;
+				clamped = true;
 			} else {
 				return -ERANGE;
 			}
 		}
 		*valp = val;
+		if (clamped && param->name)
+			pr_warn_ratelimited("\"%s\" was set out of range [%u, %u], clamped to %u.\n",
+				param->name,
+				param->min ? *param->min : 0,
+				param->max ? *param->max : UINT_MAX, val);
 	} else {
 		unsigned int val = *valp;
 		*lvalp = (unsigned long) val;
@@ -2674,6 +2696,7 @@ int proc_douintvec_minmax(struct ctl_table *table, int write,
 		.min = (unsigned int *) table->extra1,
 		.max = (unsigned int *) table->extra2,
 		.flags = &table->flags,
+		.name  = table->procname,
 	};
 	return do_proc_douintvec(table, write, buffer, lenp, ppos,
 				 do_proc_douintvec_minmax_conv, &param);
@@ -2780,6 +2803,7 @@ static int __do_proc_doulongvec_minmax(void *data, struct ctl_table *table, int
 
 		if (write) {
 			bool neg;
+			bool clamped = false;
 
 			left -= proc_skip_spaces(&p);
 
@@ -2794,6 +2818,7 @@ static int __do_proc_doulongvec_minmax(void *data, struct ctl_table *table, int
 			if (min && val < *min) {
 				if (flags & CTL_FLAGS_CLAMP_UNSIGNED_RANGE) {
 					val = *min;
+					clamped = true;
 				} else {
 					continue;
 				}
@@ -2801,11 +2826,16 @@ static int __do_proc_doulongvec_minmax(void *data, struct ctl_table *table, int
 			if (max && val > *max) {
 				if (flags & CTL_FLAGS_CLAMP_UNSIGNED_RANGE) {
 					val = *max;
+					clamped = true;
 				} else {
 					continue;
 				}
 			}
 			*i = val;
+			if (clamped && table->procname)
+				pr_warn_ratelimited("\"%s\" was set out of range [%lu, %lu], clamped to %lu.\n",
+					table->procname, min ? *min : 0,
+					max ? *max : ULONG_MAX, val);
 		} else {
 			val = convdiv * (*i) / convmul;
 			if (!first) {
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v6 4/8] ipc: Clamp msgmni and shmmni to the real IPCMNI limit
  2018-04-27 21:00 [PATCH v6 0/8] ipc: Clamp *mni to the real IPCMNI limit & increase that limit Waiman Long
                   ` (2 preceding siblings ...)
  2018-04-27 21:00 ` [PATCH v6 3/8] sysctl: Warn when a clamped sysctl parameter is set out of range Waiman Long
@ 2018-04-27 21:00 ` Waiman Long
  2018-04-27 21:00 ` [PATCH v6 5/8] ipc: Clamp semmni " Waiman Long
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Waiman Long @ 2018-04-27 21:00 UTC (permalink / raw)
  To: Luis R. Rodriguez, Kees Cook, Andrew Morton, Jonathan Corbet
  Cc: linux-kernel, linux-fsdevel, linux-doc, Al Viro, Matthew Wilcox,
	Eric W. Biederman, Waiman Long

A user can write arbitrary integer values to msgmni and shmmni sysctl
parameters without getting error, but the actual limit is really
IPCMNI (32k). This can mislead users as they think they can get a
value that is not real.

Enforcing the limit by failing the sysctl parameter write, however,
can break existing user applications in case they are writing a value
greater than 32k. Instead, the range clamping flag is set to enforce
the limit without failing existing user code. Users can easily figure
out if the sysctl parameter value is out of range by either reading
back the parameter value or checking the kernel ring buffer for warning.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 ipc/ipc_sysctl.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
index 8ad93c2..d71f949 100644
--- a/ipc/ipc_sysctl.c
+++ b/ipc/ipc_sysctl.c
@@ -99,6 +99,7 @@ static int proc_ipc_auto_msgmni(struct ctl_table *table, int write,
 static int zero;
 static int one = 1;
 static int int_max = INT_MAX;
+static int ipc_mni = IPCMNI;
 
 static struct ctl_table ipc_kern_table[] = {
 	{
@@ -120,7 +121,10 @@ static int proc_ipc_auto_msgmni(struct ctl_table *table, int write,
 		.data		= &init_ipc_ns.shm_ctlmni,
 		.maxlen		= sizeof(init_ipc_ns.shm_ctlmni),
 		.mode		= 0644,
-		.proc_handler	= proc_ipc_dointvec,
+		.proc_handler	= proc_ipc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &ipc_mni,
+		.flags		= CTL_FLAGS_CLAMP_SIGNED_RANGE,
 	},
 	{
 		.procname	= "shm_rmid_forced",
@@ -147,7 +151,8 @@ static int proc_ipc_auto_msgmni(struct ctl_table *table, int write,
 		.mode		= 0644,
 		.proc_handler	= proc_ipc_dointvec_minmax,
 		.extra1		= &zero,
-		.extra2		= &int_max,
+		.extra2		= &ipc_mni,
+		.flags		= CTL_FLAGS_CLAMP_SIGNED_RANGE,
 	},
 	{
 		.procname	= "auto_msgmni",
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v6 5/8] ipc: Clamp semmni to the real IPCMNI limit
  2018-04-27 21:00 [PATCH v6 0/8] ipc: Clamp *mni to the real IPCMNI limit & increase that limit Waiman Long
                   ` (3 preceding siblings ...)
  2018-04-27 21:00 ` [PATCH v6 4/8] ipc: Clamp msgmni and shmmni to the real IPCMNI limit Waiman Long
@ 2018-04-27 21:00 ` Waiman Long
  2018-04-27 21:00 ` [PATCH v6 6/8] test_sysctl: Add range clamping test Waiman Long
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Waiman Long @ 2018-04-27 21:00 UTC (permalink / raw)
  To: Luis R. Rodriguez, Kees Cook, Andrew Morton, Jonathan Corbet
  Cc: linux-kernel, linux-fsdevel, linux-doc, Al Viro, Matthew Wilcox,
	Eric W. Biederman, Waiman Long

For SysV semaphores, the semmni value is the last part of the 4-element
sem number array. To make semmni behave in a similar way to msgmni
and shmmni, we can't directly use the _minmax handler. Instead,
a special sem specific handler is added to check the last argument
to make sure that it is clamped to the [0, IPCMNI] range and prints
a warning message once when an out-of-range value is being written.
This does require duplicating some of the code in the _minmax handlers.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 ipc/ipc_sysctl.c | 12 +++++++++++-
 ipc/sem.c        | 25 +++++++++++++++++++++++++
 ipc/util.h       |  4 ++++
 3 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
index d71f949..478e634 100644
--- a/ipc/ipc_sysctl.c
+++ b/ipc/ipc_sysctl.c
@@ -88,12 +88,22 @@ static int proc_ipc_auto_msgmni(struct ctl_table *table, int write,
 	return proc_dointvec_minmax(&ipc_table, write, buffer, lenp, ppos);
 }
 
+static int proc_ipc_sem_dointvec(struct ctl_table *table, int write,
+	void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+	int ret = proc_ipc_dointvec(table, write, buffer, lenp, ppos);
+
+	sem_check_semmni(table, current->nsproxy->ipc_ns);
+	return ret;
+}
+
 #else
 #define proc_ipc_doulongvec_minmax NULL
 #define proc_ipc_dointvec	   NULL
 #define proc_ipc_dointvec_minmax   NULL
 #define proc_ipc_dointvec_minmax_orphans   NULL
 #define proc_ipc_auto_msgmni	   NULL
+#define proc_ipc_sem_dointvec	   NULL
 #endif
 
 static int zero;
@@ -177,7 +187,7 @@ static int proc_ipc_auto_msgmni(struct ctl_table *table, int write,
 		.data		= &init_ipc_ns.sem_ctls,
 		.maxlen		= 4*sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= proc_ipc_dointvec,
+		.proc_handler	= proc_ipc_sem_dointvec,
 	},
 #ifdef CONFIG_CHECKPOINT_RESTORE
 	{
diff --git a/ipc/sem.c b/ipc/sem.c
index 06be75d..96bdec6 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -2397,3 +2397,28 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it)
 	return 0;
 }
 #endif
+
+#ifdef CONFIG_PROC_SYSCTL
+/*
+ * Check to see if semmni is out of range and clamp it if necessary.
+ */
+void sem_check_semmni(struct ctl_table *table, struct ipc_namespace *ns)
+{
+	bool clamped = false;
+
+	/*
+	 * Clamp semmni to the range [0, IPCMNI].
+	 */
+	if (ns->sc_semmni < 0) {
+		ns->sc_semmni = 0;
+		clamped = true;
+	}
+	if (ns->sc_semmni > IPCMNI) {
+		ns->sc_semmni = IPCMNI;
+		clamped = true;
+	}
+	if (clamped)
+		pr_warn_ratelimited("sysctl: \"sem[3]\" was set out of range [%d, %d], clamped to %d.\n",
+				     0, IPCMNI, ns->sc_semmni);
+}
+#endif
diff --git a/ipc/util.h b/ipc/util.h
index acc5159..7c20871 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -218,6 +218,10 @@ int ipcget(struct ipc_namespace *ns, struct ipc_ids *ids,
 void free_ipcs(struct ipc_namespace *ns, struct ipc_ids *ids,
 		void (*free)(struct ipc_namespace *, struct kern_ipc_perm *));
 
+#ifdef CONFIG_PROC_SYSCTL
+extern void sem_check_semmni(struct ctl_table *table, struct ipc_namespace *ns);
+#endif
+
 #ifdef CONFIG_COMPAT
 #include <linux/compat.h>
 struct compat_ipc_perm {
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v6 6/8] test_sysctl: Add range clamping test
  2018-04-27 21:00 [PATCH v6 0/8] ipc: Clamp *mni to the real IPCMNI limit & increase that limit Waiman Long
                   ` (4 preceding siblings ...)
  2018-04-27 21:00 ` [PATCH v6 5/8] ipc: Clamp semmni " Waiman Long
@ 2018-04-27 21:00 ` Waiman Long
  2018-04-27 21:00 ` [PATCH v6 7/8] ipc: Allow boot time extension of IPCMNI from 32k to 2M Waiman Long
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Waiman Long @ 2018-04-27 21:00 UTC (permalink / raw)
  To: Luis R. Rodriguez, Kees Cook, Andrew Morton, Jonathan Corbet
  Cc: linux-kernel, linux-fsdevel, linux-doc, Al Viro, Matthew Wilcox,
	Eric W. Biederman, Waiman Long

Add a range clamping test to verify that the input value will be
clamped if it exceeds the builtin maximum or minimum value.

Below is the expected test run result:

Running test: sysctl_test_0006 - run #0
Checking range minimum clamping ... ok
Checking range maximum clamping ... ok
Checking range minimum clamping ... ok
Checking range maximum clamping ... ok

Signed-off-by: Waiman Long <longman@redhat.com>
---
 lib/test_sysctl.c                        | 29 ++++++++++++++++++
 tools/testing/selftests/sysctl/sysctl.sh | 52 ++++++++++++++++++++++++++++++++
 2 files changed, 81 insertions(+)

diff --git a/lib/test_sysctl.c b/lib/test_sysctl.c
index 3dd801c..3c619b9 100644
--- a/lib/test_sysctl.c
+++ b/lib/test_sysctl.c
@@ -38,12 +38,18 @@
 
 static int i_zero;
 static int i_one_hundred = 100;
+static int signed_min = -10;
+static int signed_max = 10;
+static unsigned int unsigned_min = 10;
+static unsigned int unsigned_max = 30;
 
 struct test_sysctl_data {
 	int int_0001;
 	int int_0002;
 	int int_0003[4];
+	int range_0001;
 
+	unsigned int urange_0001;
 	unsigned int uint_0001;
 
 	char string_0001[65];
@@ -58,6 +64,9 @@ struct test_sysctl_data {
 	.int_0003[2] = 2,
 	.int_0003[3] = 3,
 
+	.range_0001 = 0,
+	.urange_0001 = 20,
+
 	.uint_0001 = 314,
 
 	.string_0001 = "(none)",
@@ -102,6 +111,26 @@ struct test_sysctl_data {
 		.mode		= 0644,
 		.proc_handler	= proc_dostring,
 	},
+	{
+		.procname	= "range_0001",
+		.data		= &test_data.range_0001,
+		.maxlen		= sizeof(test_data.range_0001),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.flags		= CTL_FLAGS_CLAMP_SIGNED_RANGE,
+		.extra1		= &signed_min,
+		.extra2		= &signed_max,
+	},
+	{
+		.procname	= "urange_0001",
+		.data		= &test_data.urange_0001,
+		.maxlen		= sizeof(test_data.urange_0001),
+		.mode		= 0644,
+		.proc_handler	= proc_douintvec_minmax,
+		.flags		= CTL_FLAGS_CLAMP_UNSIGNED_RANGE,
+		.extra1		= &unsigned_min,
+		.extra2		= &unsigned_max,
+	},
 	{ }
 };
 
diff --git a/tools/testing/selftests/sysctl/sysctl.sh b/tools/testing/selftests/sysctl/sysctl.sh
index ec232c3..1aa1bba 100755
--- a/tools/testing/selftests/sysctl/sysctl.sh
+++ b/tools/testing/selftests/sysctl/sysctl.sh
@@ -34,6 +34,7 @@ ALL_TESTS="$ALL_TESTS 0002:1:1"
 ALL_TESTS="$ALL_TESTS 0003:1:1"
 ALL_TESTS="$ALL_TESTS 0004:1:1"
 ALL_TESTS="$ALL_TESTS 0005:3:1"
+ALL_TESTS="$ALL_TESTS 0006:1:1"
 
 test_modprobe()
 {
@@ -543,6 +544,38 @@ run_stringtests()
 	test_rc
 }
 
+# TARGET, RANGE_MIN & RANGE_MAX need to be defined before running test.
+run_range_clamping_test()
+{
+	rc=0
+
+	echo -n "Checking range minimum clamping ... "
+	VAL=$((RANGE_MIN - 1))
+	echo -n $VAL > "${TARGET}" 2> /dev/null
+	EXITVAL=$?
+	NEWVAL=$(cat "${TARGET}")
+	if [[ $EXITVAL -ne 0 || $NEWVAL -ne $RANGE_MIN ]]; then
+		echo "FAIL" >&2
+		rc=1
+	else
+		echo "ok"
+	fi
+
+	echo -n "Checking range maximum clamping ... "
+	VAL=$((RANGE_MAX + 1))
+	echo -n $VAL > "${TARGET}" 2> /dev/null
+	EXITVAL=$?
+	NEWVAL=$(cat "${TARGET}")
+	if [[ $EXITVAL -ne 0 || $NEWVAL -ne $RANGE_MAX ]]; then
+		echo "FAIL" >&2
+		rc=1
+	else
+		echo "ok"
+	fi
+
+	test_rc
+}
+
 sysctl_test_0001()
 {
 	TARGET="${SYSCTL}/int_0001"
@@ -600,6 +633,25 @@ sysctl_test_0005()
 	run_limit_digit_int_array
 }
 
+sysctl_test_0006()
+{
+	TARGET="${SYSCTL}/range_0001"
+	ORIG=$(cat "${TARGET}")
+	RANGE_MIN=-10
+	RANGE_MAX=10
+
+	run_range_clamping_test
+	set_orig
+
+	TARGET="${SYSCTL}/urange_0001"
+	ORIG=$(cat "${TARGET}")
+	RANGE_MIN=10
+	RANGE_MAX=30
+
+	run_range_clamping_test
+	set_orig
+}
+
 list_tests()
 {
 	echo "Test ID list:"
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v6 7/8] ipc: Allow boot time extension of IPCMNI from 32k to 2M
  2018-04-27 21:00 [PATCH v6 0/8] ipc: Clamp *mni to the real IPCMNI limit & increase that limit Waiman Long
                   ` (5 preceding siblings ...)
  2018-04-27 21:00 ` [PATCH v6 6/8] test_sysctl: Add range clamping test Waiman Long
@ 2018-04-27 21:00 ` Waiman Long
  2018-04-29 15:54   ` kbuild test robot
  2018-04-27 21:00 ` [PATCH v6 8/8] ipc: Conserve sequence numbers in extended IPCMNI mode Waiman Long
  2018-05-02  2:18 ` [PATCH v6 0/8] ipc: Clamp *mni to the real IPCMNI limit & increase that limit Eric W. Biederman
  8 siblings, 1 reply; 17+ messages in thread
From: Waiman Long @ 2018-04-27 21:00 UTC (permalink / raw)
  To: Luis R. Rodriguez, Kees Cook, Andrew Morton, Jonathan Corbet
  Cc: linux-kernel, linux-fsdevel, linux-doc, Al Viro, Matthew Wilcox,
	Eric W. Biederman, Waiman Long

The maximum number of unique System V IPC identifiers was limited to
32k.  That limit should be big enough for most use cases.

However, there are some users out there requesting for more. To satisfy
the need of those users, a new boot time kernel option "ipcmni_extend"
is added to extend the IPCMNI value to 2M. This is a 64X increase which
hopefully is big enough for them.

This new option does have the side effect of reducing the maximum
number of unique sequence numbers from 64k down to 1k. So it is
a trade-off.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 Documentation/admin-guide/kernel-parameters.txt |  3 +++
 ipc/ipc_sysctl.c                                | 12 +++++++++-
 ipc/util.c                                      | 12 +++++-----
 ipc/util.h                                      | 30 ++++++++++++++++++-------
 4 files changed, 42 insertions(+), 15 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 11fc28e..00bc0cb 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1735,6 +1735,9 @@
 	ip=		[IP_PNP]
 			See Documentation/filesystems/nfs/nfsroot.txt.
 
+	ipcmni_extend	[KNL] Extend the maximum number of unique System V
+			IPC identifiers from 32768 to 2097152.
+
 	irqaffinity=	[SMP] Set the default irq affinity mask
 			The argument is a cpu list, as described above.
 
diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
index 478e634..4e2cb6d 100644
--- a/ipc/ipc_sysctl.c
+++ b/ipc/ipc_sysctl.c
@@ -109,7 +109,8 @@ static int proc_ipc_sem_dointvec(struct ctl_table *table, int write,
 static int zero;
 static int one = 1;
 static int int_max = INT_MAX;
-static int ipc_mni = IPCMNI;
+int ipc_mni __read_mostly = IPCMNI;
+int ipc_mni_shift __read_mostly = IPCMNI_SHIFT;
 
 static struct ctl_table ipc_kern_table[] = {
 	{
@@ -237,3 +238,12 @@ static int __init ipc_sysctl_init(void)
 }
 
 device_initcall(ipc_sysctl_init);
+
+static int __init ipc_mni_extend(char *str)
+{
+	ipc_mni = IPCMNI_EXTEND;
+	ipc_mni_shift = IPCMNI_EXTEND_SHIFT;
+	pr_info("IPCMNI extended to %d.\n", ipc_mni);
+	return 0;
+}
+early_param("ipcmni_extend", ipc_mni_extend);
diff --git a/ipc/util.c b/ipc/util.c
index 4e81182..782a8d0 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -113,7 +113,7 @@ static int __init ipc_init(void)
  * @ids: ipc identifier set
  *
  * Set up the sequence range to use for the ipc identifier range (limited
- * below IPCMNI) then initialise the keys hashtable and ids idr.
+ * below ipc_mni) then initialise the keys hashtable and ids idr.
  */
 int ipc_init_ids(struct ipc_ids *ids)
 {
@@ -214,7 +214,7 @@ static inline int ipc_buildid(int id, struct ipc_ids *ids,
 		ids->next_id = -1;
 	}
 
-	return SEQ_MULTIPLIER * new->seq + id;
+	return (new->seq << SEQ_SHIFT) + id;
 }
 
 #else
@@ -228,7 +228,7 @@ static inline int ipc_buildid(int id, struct ipc_ids *ids,
 	if (ids->seq > IPCID_SEQ_MAX)
 		ids->seq = 0;
 
-	return SEQ_MULTIPLIER * new->seq + id;
+	return (new->seq << SEQ_SHIFT) + id;
 }
 
 #endif /* CONFIG_CHECKPOINT_RESTORE */
@@ -252,8 +252,8 @@ int ipc_addid(struct ipc_ids *ids, struct kern_ipc_perm *new, int limit)
 	kgid_t egid;
 	int id, err;
 
-	if (limit > IPCMNI)
-		limit = IPCMNI;
+	if (limit > ipc_mni)
+		limit = ipc_mni;
 
 	if (!ids->tables_initialized || ids->in_use >= limit)
 		return -ENOSPC;
@@ -777,7 +777,7 @@ static struct kern_ipc_perm *sysvipc_find_ipc(struct ipc_ids *ids, loff_t pos,
 	if (total >= ids->in_use)
 		return NULL;
 
-	for (; pos < IPCMNI; pos++) {
+	for (; pos < ipc_mni; pos++) {
 		ipc = idr_find(&ids->ipcs_idr, pos);
 		if (ipc != NULL) {
 			*new_pos = pos + 1;
diff --git a/ipc/util.h b/ipc/util.h
index 7c20871..e4d14b6 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -15,8 +15,22 @@
 #include <linux/err.h>
 #include <linux/ipc_namespace.h>
 
-#define IPCMNI 32768  /* <= MAX_INT limit for ipc arrays (including sysctl changes) */
-#define SEQ_MULTIPLIER	(IPCMNI)
+/*
+ * By default, the ipc arrays can have up to 32k (15 bits) entries.
+ * When IPCMNI extension mode is turned on, the ipc arrays can have up
+ * to 2M (21 bits) entries. However, the space for sequence number will
+ * be shrunk from 16 bits to 10 bits.
+ */
+#define IPCMNI_SHIFT		15
+#define IPCMNI_EXTEND_SHIFT	21
+#define IPCMNI			(1 << IPCMNI_SHIFT)
+#define IPCMNI_EXTEND		(1 << IPCMNI_EXTEND_SHIFT)
+
+extern int ipc_mni;
+extern int ipc_mni_shift;
+
+#define SEQ_SHIFT	ipc_mni_shift
+#define SEQ_MASK	((1 << ipc_mni_shift) - 1)
 
 int sem_init(void);
 int msg_init(void);
@@ -96,9 +110,9 @@ void __init ipc_init_proc_interface(const char *path, const char *header,
 #define IPC_MSG_IDS	1
 #define IPC_SHM_IDS	2
 
-#define ipcid_to_idx(id) ((id) % SEQ_MULTIPLIER)
-#define ipcid_to_seqx(id) ((id) / SEQ_MULTIPLIER)
-#define IPCID_SEQ_MAX min_t(int, INT_MAX/SEQ_MULTIPLIER, USHRT_MAX)
+#define ipcid_to_idx(id)  ((id) & SEQ_MASK)
+#define ipcid_to_seqx(id) ((id) >> SEQ_SHIFT)
+#define IPCID_SEQ_MAX	  (INT_MAX >> SEQ_SHIFT)
 
 /* must be called with ids->rwsem acquired for writing */
 int ipc_addid(struct ipc_ids *, struct kern_ipc_perm *, int);
@@ -123,8 +137,8 @@ static inline int ipc_get_maxid(struct ipc_ids *ids)
 	if (ids->in_use == 0)
 		return -1;
 
-	if (ids->in_use == IPCMNI)
-		return IPCMNI - 1;
+	if (ids->in_use == ipc_mni)
+		return ipc_mni - 1;
 
 	return ids->max_id;
 }
@@ -175,7 +189,7 @@ static inline void ipc_update_pid(struct pid **pos, struct pid *pid)
 
 static inline int ipc_checkid(struct kern_ipc_perm *ipcp, int uid)
 {
-	return uid / SEQ_MULTIPLIER != ipcp->seq;
+	return (uid >> SEQ_SHIFT) != ipcp->seq;
 }
 
 static inline void ipc_lock_object(struct kern_ipc_perm *perm)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v6 8/8] ipc: Conserve sequence numbers in extended IPCMNI mode
  2018-04-27 21:00 [PATCH v6 0/8] ipc: Clamp *mni to the real IPCMNI limit & increase that limit Waiman Long
                   ` (6 preceding siblings ...)
  2018-04-27 21:00 ` [PATCH v6 7/8] ipc: Allow boot time extension of IPCMNI from 32k to 2M Waiman Long
@ 2018-04-27 21:00 ` Waiman Long
  2018-04-29 16:51   ` kbuild test robot
  2018-05-02  2:18 ` [PATCH v6 0/8] ipc: Clamp *mni to the real IPCMNI limit & increase that limit Eric W. Biederman
  8 siblings, 1 reply; 17+ messages in thread
From: Waiman Long @ 2018-04-27 21:00 UTC (permalink / raw)
  To: Luis R. Rodriguez, Kees Cook, Andrew Morton, Jonathan Corbet
  Cc: linux-kernel, linux-fsdevel, linux-doc, Al Viro, Matthew Wilcox,
	Eric W. Biederman, Waiman Long

The mixing in of a sequence number into the IPC IDs is probably to
avoid ID reuse in userspace as much as possible. With extended IPCMNI
mode, the number of usable sequence numbers is greatly reduced leading
to higher chance of ID reuse.

To address this issue, we need to conserve the sequence number space
as much as possible. Right now, the sequence number is incremented
for every new ID created. In reality, we only need to increment the
sequence number when one or more IDs have been removed previously to
make sure that those IDs will not be reused when a new one is built.
This is being done in the extended IPCMNI mode,

Signed-off-by: Waiman Long <longman@redhat.com>
---
 include/linux/ipc_namespace.h |  1 +
 ipc/ipc_sysctl.c              |  2 ++
 ipc/util.c                    | 29 ++++++++++++++++++++++-------
 ipc/util.h                    |  1 +
 4 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h
index b5630c8..9c86fd9 100644
--- a/include/linux/ipc_namespace.h
+++ b/include/linux/ipc_namespace.h
@@ -16,6 +16,7 @@
 struct ipc_ids {
 	int in_use;
 	unsigned short seq;
+	unsigned short deleted;
 	bool tables_initialized;
 	struct rw_semaphore rwsem;
 	struct idr ipcs_idr;
diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
index 4e2cb6d..b7fb38c 100644
--- a/ipc/ipc_sysctl.c
+++ b/ipc/ipc_sysctl.c
@@ -111,6 +111,7 @@ static int proc_ipc_sem_dointvec(struct ctl_table *table, int write,
 static int int_max = INT_MAX;
 int ipc_mni __read_mostly = IPCMNI;
 int ipc_mni_shift __read_mostly = IPCMNI_SHIFT;
+bool ipc_mni_extended __read_mostly;
 
 static struct ctl_table ipc_kern_table[] = {
 	{
@@ -243,6 +244,7 @@ static int __init ipc_mni_extend(char *str)
 {
 	ipc_mni = IPCMNI_EXTEND;
 	ipc_mni_shift = IPCMNI_EXTEND_SHIFT;
+	ipc_mni_extended = true;
 	pr_info("IPCMNI extended to %d.\n", ipc_mni);
 	return 0;
 }
diff --git a/ipc/util.c b/ipc/util.c
index 782a8d0..7c8e733 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -119,7 +119,8 @@ int ipc_init_ids(struct ipc_ids *ids)
 {
 	int err;
 	ids->in_use = 0;
-	ids->seq = 0;
+	ids->deleted = false;
+	ids->seq = ipc_mni_extended ? 0 : -1; /* seq # is pre-incremented */
 	init_rwsem(&ids->rwsem);
 	err = rhashtable_init(&ids->key_ht, &ipc_kht_params);
 	if (err)
@@ -193,6 +194,11 @@ static struct kern_ipc_perm *ipc_findkey(struct ipc_ids *ids, key_t key)
 	return NULL;
 }
 
+/*
+ * To conserve sequence number space with extended ipc_mni when new ID
+ * is built, the sequence number is incremented only when one or more
+ * IDs have been removed previously.
+ */
 #ifdef CONFIG_CHECKPOINT_RESTORE
 /*
  * Specify desired id for next allocated IPC object.
@@ -206,9 +212,13 @@ static inline int ipc_buildid(int id, struct ipc_ids *ids,
 			      struct kern_ipc_perm *new)
 {
 	if (ids->next_id < 0) { /* default, behave as !CHECKPOINT_RESTORE */
-		new->seq = ids->seq++;
-		if (ids->seq > IPCID_SEQ_MAX)
-			ids->seq = 0;
+		if (!ipc_mni_extended || ids->deleted) {
+			ids->seq++;
+			if (ids->seq > IPCID_SEQ_MAX)
+				ids->seq = 0;
+			ids->deleted = false;
+		}
+		new->seq = ids->seq;
 	} else {
 		new->seq = ipcid_to_seqx(ids->next_id);
 		ids->next_id = -1;
@@ -224,9 +234,13 @@ static inline int ipc_buildid(int id, struct ipc_ids *ids,
 static inline int ipc_buildid(int id, struct ipc_ids *ids,
 			      struct kern_ipc_perm *new)
 {
-	new->seq = ids->seq++;
-	if (ids->seq > IPCID_SEQ_MAX)
-		ids->seq = 0;
+	if (!ipc_mni_extended || ids->deleted) {
+		ids->seq++;
+		if (ids->seq > IPCID_SEQ_MAX)
+			ids->seq = 0;
+		ids->deleted = false;
+	}
+	new->seq = ids->seq;
 
 	return (new->seq << SEQ_SHIFT) + id;
 }
@@ -436,6 +450,7 @@ void ipc_rmid(struct ipc_ids *ids, struct kern_ipc_perm *ipcp)
 	idr_remove(&ids->ipcs_idr, lid);
 	ipc_kht_remove(ids, ipcp);
 	ids->in_use--;
+	ids->deleted = true;
 	ipcp->deleted = true;
 
 	if (unlikely(lid == ids->max_id)) {
diff --git a/ipc/util.h b/ipc/util.h
index e4d14b6..54a86fc 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -28,6 +28,7 @@
 
 extern int ipc_mni;
 extern int ipc_mni_shift;
+extern bool ipc_mni_extended;
 
 #define SEQ_SHIFT	ipc_mni_shift
 #define SEQ_MASK	((1 << ipc_mni_shift) - 1)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 7/8] ipc: Allow boot time extension of IPCMNI from 32k to 2M
  2018-04-27 21:00 ` [PATCH v6 7/8] ipc: Allow boot time extension of IPCMNI from 32k to 2M Waiman Long
@ 2018-04-29 15:54   ` kbuild test robot
  0 siblings, 0 replies; 17+ messages in thread
From: kbuild test robot @ 2018-04-29 15:54 UTC (permalink / raw)
  To: Waiman Long
  Cc: kbuild-all, Luis R. Rodriguez, Kees Cook, Andrew Morton,
	Jonathan Corbet, linux-kernel, linux-fsdevel, linux-doc, Al Viro,
	Matthew Wilcox, Eric W. Biederman, Waiman Long

[-- Attachment #1: Type: text/plain, Size: 1897 bytes --]

Hi Waiman,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.17-rc2 next-20180426]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Waiman-Long/ipc-Clamp-mni-to-the-real-IPCMNI-limit-increase-that-limit/20180429-175431
config: m68k-stmark2_defconfig (attached as .config)
compiler: m68k-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=m68k 

All errors (new ones prefixed by >>):

   ipc/util.o: In function `sysvipc_find_ipc':
>> util.c:(.text+0x2ca): undefined reference to `ipc_mni'
   ipc/util.o: In function `ipc_addid':
   util.c:(.text+0x448): undefined reference to `ipc_mni'
>> util.c:(.text+0x6d6): undefined reference to `ipc_mni_shift'
   ipc/util.o: In function `ipc_rmid':
   util.c:(.text+0x712): undefined reference to `ipc_mni_shift'
   ipc/util.o: In function `ipc_obtain_object_idr':
   util.c:(.text+0x97c): undefined reference to `ipc_mni_shift'
   ipc/util.o: In function `ipc_obtain_object_check':
   util.c:(.text+0x9f0): undefined reference to `ipc_mni_shift'
   ipc/msg.o: In function `ksys_msgctl':
>> msg.c:(.text+0x4da): undefined reference to `ipc_mni'
   ipc/sem.o: In function `semctl_info.isra.3':
>> sem.c:(.text+0x91c): undefined reference to `ipc_mni'
   ipc/shm.o: In function `ksys_shmctl':
>> shm.c:(.text+0xa36): undefined reference to `ipc_mni'
   shm.c:(.text+0xb0a): undefined reference to `ipc_mni'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 6908 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 8/8] ipc: Conserve sequence numbers in extended IPCMNI mode
  2018-04-27 21:00 ` [PATCH v6 8/8] ipc: Conserve sequence numbers in extended IPCMNI mode Waiman Long
@ 2018-04-29 16:51   ` kbuild test robot
  0 siblings, 0 replies; 17+ messages in thread
From: kbuild test robot @ 2018-04-29 16:51 UTC (permalink / raw)
  To: Waiman Long
  Cc: kbuild-all, Luis R. Rodriguez, Kees Cook, Andrew Morton,
	Jonathan Corbet, linux-kernel, linux-fsdevel, linux-doc, Al Viro,
	Matthew Wilcox, Eric W. Biederman, Waiman Long

[-- Attachment #1: Type: text/plain, Size: 2138 bytes --]

Hi Waiman,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.17-rc2 next-20180426]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Waiman-Long/ipc-Clamp-mni-to-the-real-IPCMNI-limit-increase-that-limit/20180429-175431
config: m68k-stmark2_defconfig (attached as .config)
compiler: m68k-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=m68k 

All errors (new ones prefixed by >>):

   ipc/util.o: In function `sysvipc_find_ipc':
   util.c:(.text+0x2ca): undefined reference to `ipc_mni'
   ipc/util.o: In function `ipc_init_ids':
>> util.c:(.text+0x3e2): undefined reference to `ipc_mni_extended'
   ipc/util.o: In function `ipc_addid':
   util.c:(.text+0x458): undefined reference to `ipc_mni'
   util.c:(.text+0x6d8): undefined reference to `ipc_mni_extended'
   util.c:(.text+0x6f0): undefined reference to `ipc_mni_shift'
   util.c:(.text+0x714): undefined reference to `ipc_mni_shift'
   ipc/util.o: In function `ipc_rmid':
   util.c:(.text+0x736): undefined reference to `ipc_mni_shift'
   ipc/util.o: In function `ipc_obtain_object_idr':
   util.c:(.text+0x9a6): undefined reference to `ipc_mni_shift'
   ipc/util.o: In function `ipc_obtain_object_check':
   util.c:(.text+0xa1a): undefined reference to `ipc_mni_shift'
   ipc/msg.o: In function `ksys_msgctl':
   msg.c:(.text+0x4da): undefined reference to `ipc_mni'
   ipc/sem.o: In function `semctl_info.isra.3':
   sem.c:(.text+0x91c): undefined reference to `ipc_mni'
   ipc/shm.o: In function `ksys_shmctl':
   shm.c:(.text+0xa36): undefined reference to `ipc_mni'
   shm.c:(.text+0xb0a): undefined reference to `ipc_mni'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 6908 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 3/8] sysctl: Warn when a clamped sysctl parameter is set out of range
  2018-04-27 21:00 ` [PATCH v6 3/8] sysctl: Warn when a clamped sysctl parameter is set out of range Waiman Long
@ 2018-04-30 22:40   ` Kees Cook
  2018-05-01 13:41     ` Waiman Long
  0 siblings, 1 reply; 17+ messages in thread
From: Kees Cook @ 2018-04-30 22:40 UTC (permalink / raw)
  To: Waiman Long
  Cc: Luis R. Rodriguez, Andrew Morton, Jonathan Corbet, LKML,
	linux-fsdevel, linux-doc, Al Viro, Matthew Wilcox,
	Eric W. Biederman

I like this series overall, thanks! No objections from me. One thing I
noted, though:

On Fri, Apr 27, 2018 at 2:00 PM, Waiman Long <longman@redhat.com> wrote:
>                 if (param->min && *param->min > val) {
>                         if (clamp) {
>                                 val = *param->min;
> +                               clamped = true;
>                         } else {
>                                 return -EINVAL;
>                         }

This appears as a common bit of logic in many places in the series. It
seems like it'd make sense to make this a helper of some kind?

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 3/8] sysctl: Warn when a clamped sysctl parameter is set out of range
  2018-04-30 22:40   ` Kees Cook
@ 2018-05-01 13:41     ` Waiman Long
  0 siblings, 0 replies; 17+ messages in thread
From: Waiman Long @ 2018-05-01 13:41 UTC (permalink / raw)
  To: Kees Cook
  Cc: Luis R. Rodriguez, Andrew Morton, Jonathan Corbet, LKML,
	linux-fsdevel, linux-doc, Al Viro, Matthew Wilcox,
	Eric W. Biederman

On 04/30/2018 06:40 PM, Kees Cook wrote:
> I like this series overall, thanks! No objections from me. One thing I
> noted, though:
>
> On Fri, Apr 27, 2018 at 2:00 PM, Waiman Long <longman@redhat.com> wrote:
>>                 if (param->min && *param->min > val) {
>>                         if (clamp) {
>>                                 val = *param->min;
>> +                               clamped = true;
>>                         } else {
>>                                 return -EINVAL;
>>                         }
> This appears as a common bit of logic in many places in the series. It
> seems like it'd make sense to make this a helper of some kind?
>
> -Kees
>
We can't have an inline helper function because the types are different.
We may be able to use a helper macro, but helper macro like that may be
not well accepted by the kernel community.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 0/8] ipc: Clamp *mni to the real IPCMNI limit & increase that limit
  2018-04-27 21:00 [PATCH v6 0/8] ipc: Clamp *mni to the real IPCMNI limit & increase that limit Waiman Long
                   ` (7 preceding siblings ...)
  2018-04-27 21:00 ` [PATCH v6 8/8] ipc: Conserve sequence numbers in extended IPCMNI mode Waiman Long
@ 2018-05-02  2:18 ` Eric W. Biederman
  2018-05-02 13:23   ` Waiman Long
  8 siblings, 1 reply; 17+ messages in thread
From: Eric W. Biederman @ 2018-05-02  2:18 UTC (permalink / raw)
  To: Waiman Long
  Cc: Luis R. Rodriguez, Kees Cook, Andrew Morton, Jonathan Corbet,
	linux-kernel, linux-fsdevel, linux-doc, Al Viro, Matthew Wilcox



> The sysctl parameters msgmni, shmmni and semmni have an inherent limit
> of IPC_MNI (32k). However, users may not be aware of that because they
> can write a value much higher than that without getting any error or
> notification. Reading the parameters back will show the newly written
> values which are not real.
>
> Enforcing the limit by failing sysctl parameter write, however, may
> cause regressions if existing user setup scripts set those parameters
> above 32k as those scripts will now fail in this case.

I have a serious problem with this approach.  Have you made any effort
to identify any code that sets these values above 32k?  Have you looked
to see if these applications actually care if you return an error when
a value is set too large?

Right now this seems like a lot of work to avoid breaking applications
and or users that may or may not exist.  If you can find something that
will care sure.  We need to avoid breaking userspace and causing
regressions.  However as this stands it looks you are making maintenance
of the kernel more difficult to avoid having to look to see if there are
monsters under the bed.



Eric

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 0/8] ipc: Clamp *mni to the real IPCMNI limit & increase that limit
  2018-05-02  2:18 ` [PATCH v6 0/8] ipc: Clamp *mni to the real IPCMNI limit & increase that limit Eric W. Biederman
@ 2018-05-02 13:23   ` Waiman Long
  2018-05-02 15:06     ` Eric W. Biederman
  0 siblings, 1 reply; 17+ messages in thread
From: Waiman Long @ 2018-05-02 13:23 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Luis R. Rodriguez, Kees Cook, Andrew Morton, Jonathan Corbet,
	linux-kernel, linux-fsdevel, linux-doc, Al Viro, Matthew Wilcox

On 05/01/2018 10:18 PM, Eric W. Biederman wrote:
>
>> The sysctl parameters msgmni, shmmni and semmni have an inherent limit
>> of IPC_MNI (32k). However, users may not be aware of that because they
>> can write a value much higher than that without getting any error or
>> notification. Reading the parameters back will show the newly written
>> values which are not real.
>>
>> Enforcing the limit by failing sysctl parameter write, however, may
>> cause regressions if existing user setup scripts set those parameters
>> above 32k as those scripts will now fail in this case.
> I have a serious problem with this approach.  Have you made any effort
> to identify any code that sets these values above 32k?  Have you looked
> to see if these applications actually care if you return an error when
> a value is set too large?

It is not that an application cares about if an error is returned or
not. Most applications don't care. It is that if an error is returned,
it means that the sysctl parameter isn't change at all instead of being
set to a large value and then internally clamped to a smaller number
which is still bigger than the original value. That is what can break an
application because the sysctl parameters may be just too small for the
application.

> Right now this seems like a lot of work to avoid breaking applications
> and or users that may or may not exist.  If you can find something that
> will care sure.  We need to avoid breaking userspace and causing
> regressions.  However as this stands it looks you are making maintenance
> of the kernel more difficult to avoid having to look to see if there are
> monsters under the bed.

I shall admit that it can be hard to find applications that will
explicitly need that as we usually don't have access to the applications
that the customers have. It is more a correctness issue where the
existing code is kind of lying about what can actually be supported. I
just want to make the users more aware of what the right limits are.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 0/8] ipc: Clamp *mni to the real IPCMNI limit & increase that limit
  2018-05-02 13:23   ` Waiman Long
@ 2018-05-02 15:06     ` Eric W. Biederman
  2018-05-07 19:14       ` Waiman Long
  0 siblings, 1 reply; 17+ messages in thread
From: Eric W. Biederman @ 2018-05-02 15:06 UTC (permalink / raw)
  To: Waiman Long
  Cc: Luis R. Rodriguez, Kees Cook, Andrew Morton, Jonathan Corbet,
	linux-kernel, linux-fsdevel, linux-doc, Al Viro, Matthew Wilcox

Waiman Long <longman@redhat.com> writes:

> On 05/01/2018 10:18 PM, Eric W. Biederman wrote:
>>
>>> The sysctl parameters msgmni, shmmni and semmni have an inherent limit
>>> of IPC_MNI (32k). However, users may not be aware of that because they
>>> can write a value much higher than that without getting any error or
>>> notification. Reading the parameters back will show the newly written
>>> values which are not real.
>>>
>>> Enforcing the limit by failing sysctl parameter write, however, may
>>> cause regressions if existing user setup scripts set those parameters
>>> above 32k as those scripts will now fail in this case.
>> I have a serious problem with this approach.  Have you made any effort
>> to identify any code that sets these values above 32k?  Have you looked
>> to see if these applications actually care if you return an error when
>> a value is set too large?
>
> It is not that an application cares about if an error is returned or
> not. Most applications don't care. It is that if an error is returned,
> it means that the sysctl parameter isn't change at all instead of being
> set to a large value and then internally clamped to a smaller number
> which is still bigger than the original value. That is what can break an
> application because the sysctl parameters may be just too small for the
> application.

Agreed that is a possibility.  The other possibility is like your
customer they will try to use all of the increased number of shared
memory segments it won't work and they will fail, and it will be
mysterious and weird.

I took a quick look to see if cargo culting bad settings was a common
thing and all I could see were examples of people setting the limits
to numbers smaller than 4096.

>> Right now this seems like a lot of work to avoid breaking applications
>> and or users that may or may not exist.  If you can find something that
>> will care sure.  We need to avoid breaking userspace and causing
>> regressions.  However as this stands it looks you are making maintenance
>> of the kernel more difficult to avoid having to look to see if there are
>> monsters under the bed.
>
> I shall admit that it can be hard to find applications that will
> explicitly need that as we usually don't have access to the applications
> that the customers have. It is more a correctness issue where the
> existing code is kind of lying about what can actually be supported. I
> just want to make the users more aware of what the right limits are.

You presume the kernel is lying to applications.  I admit the kernel
can lie to applications.  I don't see any evidence that the kernel is
actually doing so.  So far (to me) it looks like a large number of sysv
shared memory segments is not particulalry common.

So I would not be at all surprised if no regressions would be generated
if you simply deny setting the value past the maximum.

Eric

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 0/8] ipc: Clamp *mni to the real IPCMNI limit & increase that limit
  2018-05-02 15:06     ` Eric W. Biederman
@ 2018-05-07 19:14       ` Waiman Long
  0 siblings, 0 replies; 17+ messages in thread
From: Waiman Long @ 2018-05-07 19:14 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Luis R. Rodriguez, Kees Cook, Andrew Morton, Jonathan Corbet,
	linux-kernel, linux-fsdevel, linux-doc, Al Viro, Matthew Wilcox

On 05/02/2018 11:06 AM, Eric W. Biederman wrote:
>
>>> and or users that may or may not exist.  If you can find something that
>>> will care sure.  We need to avoid breaking userspace and causing
>>> regressions.  However as this stands it looks you are making maintenance
>>> of the kernel more difficult to avoid having to look to see if there are
>>> monsters under the bed.
>> I shall admit that it can be hard to find applications that will
>> explicitly need that as we usually don't have access to the applications
>> that the customers have. It is more a correctness issue where the
>> existing code is kind of lying about what can actually be supported. I
>> just want to make the users more aware of what the right limits are.
> You presume the kernel is lying to applications.  I admit the kernel
> can lie to applications.  I don't see any evidence that the kernel is
> actually doing so.  So far (to me) it looks like a large number of sysv
> shared memory segments is not particulalry common.
>
> So I would not be at all surprised if no regressions would be generated
> if you simply deny setting the value past the maximum.

Maybe you are right. I will update the patchset to fail the update if
the range is exceeded since I had added option of extending the limit if
the users choose to do so.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2018-05-07 19:14 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-27 21:00 [PATCH v6 0/8] ipc: Clamp *mni to the real IPCMNI limit & increase that limit Waiman Long
2018-04-27 21:00 ` [PATCH v6 1/8] sysctl: Add flags to support min/max range clamping Waiman Long
2018-04-27 21:00 ` [PATCH v6 2/8] proc/sysctl: Provide additional ctl_table.flags checks Waiman Long
2018-04-27 21:00 ` [PATCH v6 3/8] sysctl: Warn when a clamped sysctl parameter is set out of range Waiman Long
2018-04-30 22:40   ` Kees Cook
2018-05-01 13:41     ` Waiman Long
2018-04-27 21:00 ` [PATCH v6 4/8] ipc: Clamp msgmni and shmmni to the real IPCMNI limit Waiman Long
2018-04-27 21:00 ` [PATCH v6 5/8] ipc: Clamp semmni " Waiman Long
2018-04-27 21:00 ` [PATCH v6 6/8] test_sysctl: Add range clamping test Waiman Long
2018-04-27 21:00 ` [PATCH v6 7/8] ipc: Allow boot time extension of IPCMNI from 32k to 2M Waiman Long
2018-04-29 15:54   ` kbuild test robot
2018-04-27 21:00 ` [PATCH v6 8/8] ipc: Conserve sequence numbers in extended IPCMNI mode Waiman Long
2018-04-29 16:51   ` kbuild test robot
2018-05-02  2:18 ` [PATCH v6 0/8] ipc: Clamp *mni to the real IPCMNI limit & increase that limit Eric W. Biederman
2018-05-02 13:23   ` Waiman Long
2018-05-02 15:06     ` Eric W. Biederman
2018-05-07 19:14       ` Waiman Long

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).