* [PATCH 0/3] Misc updates to sysv ipc @ 2014-10-06 18:32 Manfred Spraul 2014-10-06 18:32 ` [PATCH 1/3] ipc/sem.c: Chance memory barrier in sem_lock() to smp_rmb() Manfred Spraul 0 siblings, 1 reply; 10+ messages in thread From: Manfred Spraul @ 2014-10-06 18:32 UTC (permalink / raw) To: Andrew Morton Cc: LKML, Davidlohr Bueso, Michael Kerrisk, Rafael Aquini, Rik van Riel, 1vier1, Manfred Spraul Hi Andrew, I've updated the patches based on the feedback I got. Could you include them in your tree and forward them to Linus? I consider them as ready for merging. 0001-ipc-sem.c-Chance-memory-barrier-in-sem_lock-to-smp_r.patch When I rewrote sem_lock(), I was more conservative than necessary: smp_rmb() is sufficient. 0002-ipc-sem.c-increase-SEMMSL-SEMMNI-SEMOPM.patch - increase all limits - update the documentation, it was stale 0003-ipc-msg-increase-MSGMNI-remove-scaling.patch - increase MSGMNI to 32000 - as a bonus, this removes around 300 lines - keep auto_msgmni, but document it as deprecated. I've dropped: ipc-namespace-copy-settings-from-parent-namespace.patch - user space must handle the old logic anyway. - it could break userspace -- Manfred ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/3] ipc/sem.c: Chance memory barrier in sem_lock() to smp_rmb() 2014-10-06 18:32 [PATCH 0/3] Misc updates to sysv ipc Manfred Spraul @ 2014-10-06 18:32 ` Manfred Spraul 2014-10-06 18:32 ` [PATCH 2/3] ipc/sem.c: increase SEMMSL, SEMMNI, SEMOPM Manfred Spraul ` (2 more replies) 0 siblings, 3 replies; 10+ messages in thread From: Manfred Spraul @ 2014-10-06 18:32 UTC (permalink / raw) To: Andrew Morton Cc: LKML, Davidlohr Bueso, Michael Kerrisk, Rafael Aquini, Rik van Riel, 1vier1, Manfred Spraul When I fixed bugs in the sem_lock() logic, I was more conservative than necessary. Therefore it is safe to replace the smp_mb() with smp_rmb(). And: With smp_rmb(), semop() syscalls are up to 10% faster. The race we must protect against is: sem->lock is free sma->complex_count = 0 sma->sem_perm.lock held by thread B thread A: A: spin_lock(&sem->lock) B: sma->complex_count++; (now 1) B: spin_unlock(&sma->sem_perm.lock); A: spin_is_locked(&sma->sem_perm.lock); A: XXXXX memory barrier A: if (sma->complex_count == 0) Thread A must read the increased complex_count value, i.e. the read must not be reordered with the read of sem_perm.lock done by spin_is_locked(). Since it's about ordering of reads, smp_rmb() is sufficient. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> --- ipc/sem.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/ipc/sem.c b/ipc/sem.c index 454f6c6..ffc71de 100644 --- a/ipc/sem.c +++ b/ipc/sem.c @@ -326,10 +326,16 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops, /* Then check that the global lock is free */ if (!spin_is_locked(&sma->sem_perm.lock)) { - /* spin_is_locked() is not a memory barrier */ - smp_mb(); + /* + * The next test must happen after the test for + * sem_perm.lock, otherwise we can race with another + * thread that does + * complex_count++;spin_unlock(sem_perm.lock); + */ + smp_rmb(); - /* Now repeat the test of complex_count: + /* + * Now repeat the test of complex_count: * It can't change anymore until we drop sem->lock. * Thus: if is now 0, then it will stay 0. */ -- 1.9.3 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 2/3] ipc/sem.c: increase SEMMSL, SEMMNI, SEMOPM 2014-10-06 18:32 ` [PATCH 1/3] ipc/sem.c: Chance memory barrier in sem_lock() to smp_rmb() Manfred Spraul @ 2014-10-06 18:32 ` Manfred Spraul 2014-10-06 18:32 ` [PATCH 3/3] ipc/msg: increase MSGMNI, remove scaling Manfred Spraul 2014-10-07 20:09 ` [PATCH 2/3] ipc/sem.c: increase SEMMSL, SEMMNI, SEMOPM Rafael Aquini 2014-10-07 20:08 ` [PATCH 1/3] ipc/sem.c: Chance memory barrier in sem_lock() to smp_rmb() Rafael Aquini 2014-10-09 23:07 ` Davidlohr Bueso 2 siblings, 2 replies; 10+ messages in thread From: Manfred Spraul @ 2014-10-06 18:32 UTC (permalink / raw) To: Andrew Morton Cc: LKML, Davidlohr Bueso, Michael Kerrisk, Rafael Aquini, Rik van Riel, 1vier1, Manfred Spraul a) SysV can be abused to allocate locked kernel memory. For most systems, a small limit doesn't make sense, see the discussion with regards to SHMMAX. Therefore: Increase the sysv sem limits so that all known applications will work with these defaults. b) With regards to the maximum supported: Some of the specified hard limits are not correct anymore, therefore the patch updates the documentation. - SEMMNI must stay below IPCMNI, which is 32768. As for SHMMAX: Stay a bit below this limit. - SEMMSL was limited to 8k, to ensure that the kmalloc for the kernel array was limited to 16 kB (order=2) This doesn't apply anymore: - the allocation size isn't sizeof(short)*nsems anymore. - ipc_alloc falls back to vmalloc - SEMOPM should stay below 1000, to limit the kmalloc in semtimedop() to an order=1 allocation. Therefore: Leave it at 500 (order=0 allocation). Note: If an administrator must limit the memory allocations, then he can set the values as necessary. Or he can disable sysv entirely (as e.g. done by Android). Signed-off-by: Manfred Spraul <manfred@colorfullife.com> --- include/uapi/linux/sem.h | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/include/uapi/linux/sem.h b/include/uapi/linux/sem.h index 541fce0..dd73b90 100644 --- a/include/uapi/linux/sem.h +++ b/include/uapi/linux/sem.h @@ -63,10 +63,22 @@ struct seminfo { int semaem; }; -#define SEMMNI 128 /* <= IPCMNI max # of semaphore identifiers */ -#define SEMMSL 250 /* <= 8 000 max num of semaphores per id */ +/* + * SEMMNI, SEMMSL and SEMMNS are default values which can be + * modified by sysctl. + * The values has been chosen to be larger than necessary for any + * known configuration. + * + * SEMOPM should not be increased beyond 1000, otherwise there is the + * risk that semop()/semtimedop() fails due to kernel memory fragmentation when + * allocating the sop array. + */ + + +#define SEMMNI 32000 /* <= IPCMNI max # of semaphore identifiers */ +#define SEMMSL 32000 /* <= INT_MAX max num of semaphores per id */ #define SEMMNS (SEMMNI*SEMMSL) /* <= INT_MAX max # of semaphores in system */ -#define SEMOPM 32 /* <= 1 000 max num of ops per semop call */ +#define SEMOPM 500 /* <= 1 000 max num of ops per semop call */ #define SEMVMX 32767 /* <= 32767 semaphore maximum value */ #define SEMAEM SEMVMX /* adjust on exit max value */ -- 1.9.3 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 3/3] ipc/msg: increase MSGMNI, remove scaling 2014-10-06 18:32 ` [PATCH 2/3] ipc/sem.c: increase SEMMSL, SEMMNI, SEMOPM Manfred Spraul @ 2014-10-06 18:32 ` Manfred Spraul 2014-10-07 20:09 ` [PATCH 2/3] ipc/sem.c: increase SEMMSL, SEMMNI, SEMOPM Rafael Aquini 1 sibling, 0 replies; 10+ messages in thread From: Manfred Spraul @ 2014-10-06 18:32 UTC (permalink / raw) To: Andrew Morton Cc: LKML, Davidlohr Bueso, Michael Kerrisk, Rafael Aquini, Rik van Riel, 1vier1, Manfred Spraul SysV can be abused to allocate locked kernel memory. For most systems, a small limit doesn't make sense, see the discussion with regards to SHMMAX. Therefore: increase MSGMNI to the maximum supported. And: If we ignore the risk of locking too much memory, then an automatic scaling of MSGMNI doesn't make sense. Therefore the logic can be removed. The code preserves auto_msgmni to avoid breaking any user space applications that expect that the value exists. Notes: 1) If an administrator must limit the memory allocations, then he can set MSGMNI as necessary. Or he can disable sysv entirely (as e.g. done by Android). 2) MSGMAX and MSGMNB are intentionally not increased, as these values are used to control latency vs. throughput: If MSGMNB is large, then msgsnd() just returns and more messages can be queued before a task switch to a task that calls msgrcv() is forced. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> --- Documentation/sysctl/kernel.txt | 10 +++-- include/linux/ipc_namespace.h | 20 --------- include/uapi/linux/msg.h | 28 ++++++++---- ipc/Makefile | 2 +- ipc/ipc_sysctl.c | 94 ++++++++--------------------------------- ipc/ipcns_notifier.c | 92 ---------------------------------------- ipc/msg.c | 36 +--------------- ipc/namespace.c | 22 ---------- ipc/util.c | 40 ------------------ 9 files changed, 45 insertions(+), 299 deletions(-) delete mode 100644 ipc/ipcns_notifier.c diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt index f79eb96..4ea9b48 100644 --- a/Documentation/sysctl/kernel.txt +++ b/Documentation/sysctl/kernel.txt @@ -115,10 +115,12 @@ set during run time. auto_msgmni: -Enables/Disables automatic recomputing of msgmni upon memory add/remove -or upon ipc namespace creation/removal (see the msgmni description -above). Echoing "1" into this file enables msgmni automatic recomputing. -Echoing "0" turns it off. auto_msgmni default value is 1. +This variable has no effect and may be removed in future kernel +releases. Reading it always returns 0. +Up to Linux 3.17, it enabled/disabled automatic recomputing of msgmni +upon memory add/remove or upon ipc namespace creation/removal. +Echoing "1" into this file enabled msgmni automatic recomputing. +Echoing "0" turned it off. auto_msgmni default value was 1. ============================================================== diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h index 35e7eca..e365d5e 100644 --- a/include/linux/ipc_namespace.h +++ b/include/linux/ipc_namespace.h @@ -7,15 +7,6 @@ #include <linux/notifier.h> #include <linux/nsproxy.h> -/* - * ipc namespace events - */ -#define IPCNS_MEMCHANGED 0x00000001 /* Notify lowmem size changed */ -#define IPCNS_CREATED 0x00000002 /* Notify new ipc namespace created */ -#define IPCNS_REMOVED 0x00000003 /* Notify ipc namespace removed */ - -#define IPCNS_CALLBACK_PRI 0 - struct user_namespace; struct ipc_ids { @@ -38,7 +29,6 @@ struct ipc_namespace { unsigned int msg_ctlmni; atomic_t msg_bytes; atomic_t msg_hdrs; - int auto_msgmni; size_t shm_ctlmax; size_t shm_ctlall; @@ -77,18 +67,8 @@ extern atomic_t nr_ipc_ns; extern spinlock_t mq_lock; #ifdef CONFIG_SYSVIPC -extern int register_ipcns_notifier(struct ipc_namespace *); -extern int cond_register_ipcns_notifier(struct ipc_namespace *); -extern void unregister_ipcns_notifier(struct ipc_namespace *); -extern int ipcns_notify(unsigned long); extern void shm_destroy_orphaned(struct ipc_namespace *ns); #else /* CONFIG_SYSVIPC */ -static inline int register_ipcns_notifier(struct ipc_namespace *ns) -{ return 0; } -static inline int cond_register_ipcns_notifier(struct ipc_namespace *ns) -{ return 0; } -static inline void unregister_ipcns_notifier(struct ipc_namespace *ns) { } -static inline int ipcns_notify(unsigned long l) { return 0; } static inline void shm_destroy_orphaned(struct ipc_namespace *ns) {} #endif /* CONFIG_SYSVIPC */ diff --git a/include/uapi/linux/msg.h b/include/uapi/linux/msg.h index a703755..2733ec8 100644 --- a/include/uapi/linux/msg.h +++ b/include/uapi/linux/msg.h @@ -51,16 +51,28 @@ struct msginfo { }; /* - * Scaling factor to compute msgmni: - * the memory dedicated to msg queues (msgmni * msgmnb) should occupy - * at most 1/MSG_MEM_SCALE of the lowmem (see the formula in ipc/msg.c): - * up to 8MB : msgmni = 16 (MSGMNI) - * 4 GB : msgmni = 8K - * more than 16 GB : msgmni = 32K (IPCMNI) + * MSGMNI, MSGMAX and MSGMNB are default values which can be + * modified by sysctl. + * + * MSGMNI is the upper limit for the number of messages queues per + * namespace. + * It has been chosen to be as large possible without facilitating + * scenarios where userspace causes overflows when adjusting the limits via + * operations of the form retrieve current limit; add X; update limit". + * + * MSGMNB is the default size of a new message queue. Non-root tasks can + * decrease the size with msgctl(IPC_SET), root tasks + * (actually: CAP_SYS_RESOURCE) can both increase and decrease the queue + * size. The optimal value is application dependant. + * 16384 is used because it was always used (since 0.99.10) + * + * MAXMAX is the maximum size of an individual message, it's a global + * (per-namespace) limit that applies for all message queues. + * It's set to 1/2 of MSGMNB, to ensure that at least two messages fit into + * the queue. This is also an arbitrary choice (since 2.6.0). */ -#define MSG_MEM_SCALE 32 -#define MSGMNI 16 /* <= IPCMNI */ /* max # of msg queue identifiers */ +#define MSGMNI 32000 /* <= IPCMNI */ /* max # of msg queue identifiers */ #define MSGMAX 8192 /* <= INT_MAX */ /* max size of message (bytes) */ #define MSGMNB 16384 /* <= INT_MAX */ /* default max size of a message queue */ diff --git a/ipc/Makefile b/ipc/Makefile index 9075e17..86c7300 100644 --- a/ipc/Makefile +++ b/ipc/Makefile @@ -3,7 +3,7 @@ # obj-$(CONFIG_SYSVIPC_COMPAT) += compat.o -obj-$(CONFIG_SYSVIPC) += util.o msgutil.o msg.o sem.o shm.o ipcns_notifier.o syscall.o +obj-$(CONFIG_SYSVIPC) += util.o msgutil.o msg.o sem.o shm.o syscall.o obj-$(CONFIG_SYSVIPC_SYSCTL) += ipc_sysctl.o obj_mq-$(CONFIG_COMPAT) += compat_mq.o obj-$(CONFIG_POSIX_MQUEUE) += mqueue.o msgutil.o $(obj_mq-y) diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c index c3f0326..8ad93c2 100644 --- a/ipc/ipc_sysctl.c +++ b/ipc/ipc_sysctl.c @@ -62,29 +62,6 @@ static int proc_ipc_dointvec_minmax_orphans(struct ctl_table *table, int write, return err; } -static int proc_ipc_callback_dointvec_minmax(struct ctl_table *table, int write, - void __user *buffer, size_t *lenp, loff_t *ppos) -{ - struct ctl_table ipc_table; - size_t lenp_bef = *lenp; - int rc; - - memcpy(&ipc_table, table, sizeof(ipc_table)); - ipc_table.data = get_ipc(table); - - rc = proc_dointvec_minmax(&ipc_table, write, buffer, lenp, ppos); - - if (write && !rc && lenp_bef == *lenp) - /* - * Tunable has successfully been changed by hand. Disable its - * automatic adjustment. This simply requires unregistering - * the notifiers that trigger recalculation. - */ - unregister_ipcns_notifier(current->nsproxy->ipc_ns); - - return rc; -} - static int proc_ipc_doulongvec_minmax(struct ctl_table *table, int write, void __user *buffer, size_t *lenp, loff_t *ppos) { @@ -96,55 +73,19 @@ static int proc_ipc_doulongvec_minmax(struct ctl_table *table, int write, lenp, ppos); } -/* - * Routine that is called when the file "auto_msgmni" has successfully been - * written. - * Two values are allowed: - * 0: unregister msgmni's callback routine from the ipc namespace notifier - * chain. This means that msgmni won't be recomputed anymore upon memory - * add/remove or ipc namespace creation/removal. - * 1: register back the callback routine. - */ -static void ipc_auto_callback(int val) -{ - if (!val) - unregister_ipcns_notifier(current->nsproxy->ipc_ns); - else { - /* - * Re-enable automatic recomputing only if not already - * enabled. - */ - recompute_msgmni(current->nsproxy->ipc_ns); - cond_register_ipcns_notifier(current->nsproxy->ipc_ns); - } -} - -static int proc_ipcauto_dointvec_minmax(struct ctl_table *table, int write, +static int proc_ipc_auto_msgmni(struct ctl_table *table, int write, void __user *buffer, size_t *lenp, loff_t *ppos) { struct ctl_table ipc_table; - size_t lenp_bef = *lenp; - int oldval; - int rc; + int dummy = 0; memcpy(&ipc_table, table, sizeof(ipc_table)); - ipc_table.data = get_ipc(table); - oldval = *((int *)(ipc_table.data)); + ipc_table.data = &dummy; - rc = proc_dointvec_minmax(&ipc_table, write, buffer, lenp, ppos); + if (write) + pr_info_once("writing to auto_msgmni has no effect"); - if (write && !rc && lenp_bef == *lenp) { - int newval = *((int *)(ipc_table.data)); - /* - * The file "auto_msgmni" has correctly been set. - * React by (un)registering the corresponding tunable, if the - * value has changed. - */ - if (newval != oldval) - ipc_auto_callback(newval); - } - - return rc; + return proc_dointvec_minmax(&ipc_table, write, buffer, lenp, ppos); } #else @@ -152,8 +93,7 @@ static int proc_ipcauto_dointvec_minmax(struct ctl_table *table, int write, #define proc_ipc_dointvec NULL #define proc_ipc_dointvec_minmax NULL #define proc_ipc_dointvec_minmax_orphans NULL -#define proc_ipc_callback_dointvec_minmax NULL -#define proc_ipcauto_dointvec_minmax NULL +#define proc_ipc_auto_msgmni NULL #endif static int zero; @@ -205,11 +145,20 @@ static struct ctl_table ipc_kern_table[] = { .data = &init_ipc_ns.msg_ctlmni, .maxlen = sizeof(init_ipc_ns.msg_ctlmni), .mode = 0644, - .proc_handler = proc_ipc_callback_dointvec_minmax, + .proc_handler = proc_ipc_dointvec_minmax, .extra1 = &zero, .extra2 = &int_max, }, { + .procname = "auto_msgmni", + .data = NULL, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_ipc_auto_msgmni, + .extra1 = &zero, + .extra2 = &one, + }, + { .procname = "msgmnb", .data = &init_ipc_ns.msg_ctlmnb, .maxlen = sizeof(init_ipc_ns.msg_ctlmnb), @@ -225,15 +174,6 @@ static struct ctl_table ipc_kern_table[] = { .mode = 0644, .proc_handler = proc_ipc_dointvec, }, - { - .procname = "auto_msgmni", - .data = &init_ipc_ns.auto_msgmni, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = proc_ipcauto_dointvec_minmax, - .extra1 = &zero, - .extra2 = &one, - }, #ifdef CONFIG_CHECKPOINT_RESTORE { .procname = "sem_next_id", diff --git a/ipc/ipcns_notifier.c b/ipc/ipcns_notifier.c deleted file mode 100644 index b9b31a4..0000000 --- a/ipc/ipcns_notifier.c +++ /dev/null @@ -1,92 +0,0 @@ -/* - * linux/ipc/ipcns_notifier.c - * Copyright (C) 2007 BULL SA. Nadia Derbey - * - * Notification mechanism for ipc namespaces: - * The callback routine registered in the memory chain invokes the ipcns - * notifier chain with the IPCNS_MEMCHANGED event. - * Each callback routine registered in the ipcns namespace recomputes msgmni - * for the owning namespace. - */ - -#include <linux/msg.h> -#include <linux/rcupdate.h> -#include <linux/notifier.h> -#include <linux/nsproxy.h> -#include <linux/ipc_namespace.h> - -#include "util.h" - - - -static BLOCKING_NOTIFIER_HEAD(ipcns_chain); - - -static int ipcns_callback(struct notifier_block *self, - unsigned long action, void *arg) -{ - struct ipc_namespace *ns; - - switch (action) { - case IPCNS_MEMCHANGED: /* amount of lowmem has changed */ - case IPCNS_CREATED: - case IPCNS_REMOVED: - /* - * It's time to recompute msgmni - */ - ns = container_of(self, struct ipc_namespace, ipcns_nb); - /* - * No need to get a reference on the ns: the 1st job of - * free_ipc_ns() is to unregister the callback routine. - * blocking_notifier_chain_unregister takes the wr lock to do - * it. - * When this callback routine is called the rd lock is held by - * blocking_notifier_call_chain. - * So the ipc ns cannot be freed while we are here. - */ - recompute_msgmni(ns); - break; - default: - break; - } - - return NOTIFY_OK; -} - -int register_ipcns_notifier(struct ipc_namespace *ns) -{ - int rc; - - memset(&ns->ipcns_nb, 0, sizeof(ns->ipcns_nb)); - ns->ipcns_nb.notifier_call = ipcns_callback; - ns->ipcns_nb.priority = IPCNS_CALLBACK_PRI; - rc = blocking_notifier_chain_register(&ipcns_chain, &ns->ipcns_nb); - if (!rc) - ns->auto_msgmni = 1; - return rc; -} - -int cond_register_ipcns_notifier(struct ipc_namespace *ns) -{ - int rc; - - memset(&ns->ipcns_nb, 0, sizeof(ns->ipcns_nb)); - ns->ipcns_nb.notifier_call = ipcns_callback; - ns->ipcns_nb.priority = IPCNS_CALLBACK_PRI; - rc = blocking_notifier_chain_cond_register(&ipcns_chain, - &ns->ipcns_nb); - if (!rc) - ns->auto_msgmni = 1; - return rc; -} - -void unregister_ipcns_notifier(struct ipc_namespace *ns) -{ - blocking_notifier_chain_unregister(&ipcns_chain, &ns->ipcns_nb); - ns->auto_msgmni = 0; -} - -int ipcns_notify(unsigned long val) -{ - return blocking_notifier_call_chain(&ipcns_chain, val, NULL); -} diff --git a/ipc/msg.c b/ipc/msg.c index c5d8e37..a7261d5 100644 --- a/ipc/msg.c +++ b/ipc/msg.c @@ -989,43 +989,12 @@ SYSCALL_DEFINE5(msgrcv, int, msqid, struct msgbuf __user *, msgp, size_t, msgsz, return do_msgrcv(msqid, msgp, msgsz, msgtyp, msgflg, do_msg_fill); } -/* - * Scale msgmni with the available lowmem size: the memory dedicated to msg - * queues should occupy at most 1/MSG_MEM_SCALE of lowmem. - * Also take into account the number of nsproxies created so far. - * This should be done staying within the (MSGMNI , IPCMNI/nr_ipc_ns) range. - */ -void recompute_msgmni(struct ipc_namespace *ns) -{ - struct sysinfo i; - unsigned long allowed; - int nb_ns; - - si_meminfo(&i); - allowed = (((i.totalram - i.totalhigh) / MSG_MEM_SCALE) * i.mem_unit) - / MSGMNB; - nb_ns = atomic_read(&nr_ipc_ns); - allowed /= nb_ns; - - if (allowed < MSGMNI) { - ns->msg_ctlmni = MSGMNI; - return; - } - - if (allowed > IPCMNI / nb_ns) { - ns->msg_ctlmni = IPCMNI / nb_ns; - return; - } - - ns->msg_ctlmni = allowed; -} void msg_init_ns(struct ipc_namespace *ns) { ns->msg_ctlmax = MSGMAX; ns->msg_ctlmnb = MSGMNB; - - recompute_msgmni(ns); + ns->msg_ctlmni = MSGMNI; atomic_set(&ns->msg_bytes, 0); atomic_set(&ns->msg_hdrs, 0); @@ -1069,9 +1038,6 @@ void __init msg_init(void) { msg_init_ns(&init_ipc_ns); - printk(KERN_INFO "msgmni has been set to %d\n", - init_ipc_ns.msg_ctlmni); - ipc_init_proc_interface("sysvipc/msg", " key msqid perms cbytes qnum lspid lrpid uid gid cuid cgid stime rtime ctime\n", IPC_MSG_IDS, sysvipc_msg_proc_show); diff --git a/ipc/namespace.c b/ipc/namespace.c index b54468e..1a3ffd4 100644 --- a/ipc/namespace.c +++ b/ipc/namespace.c @@ -45,14 +45,6 @@ static struct ipc_namespace *create_ipc_ns(struct user_namespace *user_ns, msg_init_ns(ns); shm_init_ns(ns); - /* - * msgmni has already been computed for the new ipc ns. - * Thus, do the ipcns creation notification before registering that - * new ipcns in the chain. - */ - ipcns_notify(IPCNS_CREATED); - register_ipcns_notifier(ns); - ns->user_ns = get_user_ns(user_ns); return ns; @@ -99,25 +91,11 @@ void free_ipcs(struct ipc_namespace *ns, struct ipc_ids *ids, static void free_ipc_ns(struct ipc_namespace *ns) { - /* - * Unregistering the hotplug notifier at the beginning guarantees - * that the ipc namespace won't be freed while we are inside the - * callback routine. Since the blocking_notifier_chain_XXX routines - * hold a rw lock on the notifier list, unregister_ipcns_notifier() - * won't take the rw lock before blocking_notifier_call_chain() has - * released the rd lock. - */ - unregister_ipcns_notifier(ns); sem_exit_ns(ns); msg_exit_ns(ns); shm_exit_ns(ns); atomic_dec(&nr_ipc_ns); - /* - * Do the ipcns removal notification after decrementing nr_ipc_ns in - * order to have a correct value when recomputing msgmni. - */ - ipcns_notify(IPCNS_REMOVED); put_user_ns(ns->user_ns); proc_free_inum(ns->proc_inum); kfree(ns); diff --git a/ipc/util.c b/ipc/util.c index 27d74e6..0001560 100644 --- a/ipc/util.c +++ b/ipc/util.c @@ -71,44 +71,6 @@ struct ipc_proc_iface { int (*show)(struct seq_file *, void *); }; -static void ipc_memory_notifier(struct work_struct *work) -{ - ipcns_notify(IPCNS_MEMCHANGED); -} - -static int ipc_memory_callback(struct notifier_block *self, - unsigned long action, void *arg) -{ - static DECLARE_WORK(ipc_memory_wq, ipc_memory_notifier); - - switch (action) { - case MEM_ONLINE: /* memory successfully brought online */ - case MEM_OFFLINE: /* or offline: it's time to recompute msgmni */ - /* - * This is done by invoking the ipcns notifier chain with the - * IPC_MEMCHANGED event. - * In order not to keep the lock on the hotplug memory chain - * for too long, queue a work item that will, when waken up, - * activate the ipcns notification chain. - */ - schedule_work(&ipc_memory_wq); - break; - case MEM_GOING_ONLINE: - case MEM_GOING_OFFLINE: - case MEM_CANCEL_ONLINE: - case MEM_CANCEL_OFFLINE: - default: - break; - } - - return NOTIFY_OK; -} - -static struct notifier_block ipc_memory_nb = { - .notifier_call = ipc_memory_callback, - .priority = IPC_CALLBACK_PRI, -}; - /** * ipc_init - initialise ipc subsystem * @@ -124,8 +86,6 @@ static int __init ipc_init(void) sem_init(); msg_init(); shm_init(); - register_hotmemory_notifier(&ipc_memory_nb); - register_ipcns_notifier(&init_ipc_ns); return 0; } device_initcall(ipc_init); -- 1.9.3 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 2/3] ipc/sem.c: increase SEMMSL, SEMMNI, SEMOPM 2014-10-06 18:32 ` [PATCH 2/3] ipc/sem.c: increase SEMMSL, SEMMNI, SEMOPM Manfred Spraul 2014-10-06 18:32 ` [PATCH 3/3] ipc/msg: increase MSGMNI, remove scaling Manfred Spraul @ 2014-10-07 20:09 ` Rafael Aquini 1 sibling, 0 replies; 10+ messages in thread From: Rafael Aquini @ 2014-10-07 20:09 UTC (permalink / raw) To: Manfred Spraul Cc: Andrew Morton, LKML, Davidlohr Bueso, Michael Kerrisk, Rik van Riel, 1vier1 On Mon, Oct 06, 2014 at 08:32:42PM +0200, Manfred Spraul wrote: > a) > SysV can be abused to allocate locked kernel memory. For most systems, a > small limit doesn't make sense, see the discussion with regards to SHMMAX. > > Therefore: Increase the sysv sem limits so that all known applications > will work with these defaults. > > b) > With regards to the maximum supported: > Some of the specified hard limits are not correct anymore, therefore the > patch updates the documentation. > > - SEMMNI must stay below IPCMNI, which is 32768. > As for SHMMAX: Stay a bit below this limit. > > - SEMMSL was limited to 8k, to ensure that the kmalloc for the kernel array > was limited to 16 kB (order=2) > > This doesn't apply anymore: > - the allocation size isn't sizeof(short)*nsems anymore. > - ipc_alloc falls back to vmalloc > > - SEMOPM should stay below 1000, to limit the kmalloc in semtimedop() to an > order=1 allocation. > Therefore: Leave it at 500 (order=0 allocation). > > Note: > If an administrator must limit the memory allocations, then he can set the > values as necessary. > > Or he can disable sysv entirely (as e.g. done by Android). > > Signed-off-by: Manfred Spraul <manfred@colorfullife.com> > --- > include/uapi/linux/sem.h | 18 +++++++++++++++--- > 1 file changed, 15 insertions(+), 3 deletions(-) > > diff --git a/include/uapi/linux/sem.h b/include/uapi/linux/sem.h > index 541fce0..dd73b90 100644 > --- a/include/uapi/linux/sem.h > +++ b/include/uapi/linux/sem.h > @@ -63,10 +63,22 @@ struct seminfo { > int semaem; > }; > > -#define SEMMNI 128 /* <= IPCMNI max # of semaphore identifiers */ > -#define SEMMSL 250 /* <= 8 000 max num of semaphores per id */ > +/* > + * SEMMNI, SEMMSL and SEMMNS are default values which can be > + * modified by sysctl. > + * The values has been chosen to be larger than necessary for any > + * known configuration. > + * > + * SEMOPM should not be increased beyond 1000, otherwise there is the > + * risk that semop()/semtimedop() fails due to kernel memory fragmentation when > + * allocating the sop array. > + */ > + > + > +#define SEMMNI 32000 /* <= IPCMNI max # of semaphore identifiers */ > +#define SEMMSL 32000 /* <= INT_MAX max num of semaphores per id */ > #define SEMMNS (SEMMNI*SEMMSL) /* <= INT_MAX max # of semaphores in system */ > -#define SEMOPM 32 /* <= 1 000 max num of ops per semop call */ > +#define SEMOPM 500 /* <= 1 000 max num of ops per semop call */ > #define SEMVMX 32767 /* <= 32767 semaphore maximum value */ > #define SEMAEM SEMVMX /* adjust on exit max value */ > > -- > 1.9.3 > Acked-by: Rafael Aquini <aquini@redhat.com> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] ipc/sem.c: Chance memory barrier in sem_lock() to smp_rmb() 2014-10-06 18:32 ` [PATCH 1/3] ipc/sem.c: Chance memory barrier in sem_lock() to smp_rmb() Manfred Spraul 2014-10-06 18:32 ` [PATCH 2/3] ipc/sem.c: increase SEMMSL, SEMMNI, SEMOPM Manfred Spraul @ 2014-10-07 20:08 ` Rafael Aquini 2014-10-09 23:07 ` Davidlohr Bueso 2 siblings, 0 replies; 10+ messages in thread From: Rafael Aquini @ 2014-10-07 20:08 UTC (permalink / raw) To: Manfred Spraul Cc: Andrew Morton, LKML, Davidlohr Bueso, Michael Kerrisk, Rik van Riel, 1vier1 On Mon, Oct 06, 2014 at 08:32:41PM +0200, Manfred Spraul wrote: > When I fixed bugs in the sem_lock() logic, I was more conservative than > necessary. > Therefore it is safe to replace the smp_mb() with smp_rmb(). > And: With smp_rmb(), semop() syscalls are up to 10% faster. > > The race we must protect against is: > > sem->lock is free > sma->complex_count = 0 > sma->sem_perm.lock held by thread B > > thread A: > > A: spin_lock(&sem->lock) > > B: sma->complex_count++; (now 1) > B: spin_unlock(&sma->sem_perm.lock); > > A: spin_is_locked(&sma->sem_perm.lock); > A: XXXXX memory barrier > A: if (sma->complex_count == 0) > > Thread A must read the increased complex_count value, i.e. the read must > not be reordered with the read of sem_perm.lock done by spin_is_locked(). > > Since it's about ordering of reads, smp_rmb() is sufficient. > > Signed-off-by: Manfred Spraul <manfred@colorfullife.com> > --- > ipc/sem.c | 12 +++++++++--- > 1 file changed, 9 insertions(+), 3 deletions(-) > > diff --git a/ipc/sem.c b/ipc/sem.c > index 454f6c6..ffc71de 100644 > --- a/ipc/sem.c > +++ b/ipc/sem.c > @@ -326,10 +326,16 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops, > > /* Then check that the global lock is free */ > if (!spin_is_locked(&sma->sem_perm.lock)) { > - /* spin_is_locked() is not a memory barrier */ > - smp_mb(); > + /* > + * The next test must happen after the test for > + * sem_perm.lock, otherwise we can race with another > + * thread that does > + * complex_count++;spin_unlock(sem_perm.lock); > + */ > + smp_rmb(); > > - /* Now repeat the test of complex_count: > + /* > + * Now repeat the test of complex_count: > * It can't change anymore until we drop sem->lock. > * Thus: if is now 0, then it will stay 0. > */ > -- > 1.9.3 > Acked-by: Rafael Aquini <aquini@redhat.com> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] ipc/sem.c: Chance memory barrier in sem_lock() to smp_rmb() 2014-10-06 18:32 ` [PATCH 1/3] ipc/sem.c: Chance memory barrier in sem_lock() to smp_rmb() Manfred Spraul 2014-10-06 18:32 ` [PATCH 2/3] ipc/sem.c: increase SEMMSL, SEMMNI, SEMOPM Manfred Spraul 2014-10-07 20:08 ` [PATCH 1/3] ipc/sem.c: Chance memory barrier in sem_lock() to smp_rmb() Rafael Aquini @ 2014-10-09 23:07 ` Davidlohr Bueso 2 siblings, 0 replies; 10+ messages in thread From: Davidlohr Bueso @ 2014-10-09 23:07 UTC (permalink / raw) To: Manfred Spraul Cc: Andrew Morton, LKML, Michael Kerrisk, Rafael Aquini, Rik van Riel, 1vier1 On Mon, 2014-10-06 at 20:32 +0200, Manfred Spraul wrote: > When I fixed bugs in the sem_lock() logic, I was more conservative than > necessary. > Therefore it is safe to replace the smp_mb() with smp_rmb(). > And: With smp_rmb(), semop() syscalls are up to 10% faster. > > The race we must protect against is: > > sem->lock is free > sma->complex_count = 0 > sma->sem_perm.lock held by thread B > > thread A: > > A: spin_lock(&sem->lock) > > B: sma->complex_count++; (now 1) > B: spin_unlock(&sma->sem_perm.lock); > > A: spin_is_locked(&sma->sem_perm.lock); > A: XXXXX memory barrier > A: if (sma->complex_count == 0) > > Thread A must read the increased complex_count value, i.e. the read must > not be reordered with the read of sem_perm.lock done by spin_is_locked(). > > Since it's about ordering of reads, smp_rmb() is sufficient. > > Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> With a suggestion below. > --- > ipc/sem.c | 12 +++++++++--- > 1 file changed, 9 insertions(+), 3 deletions(-) > > diff --git a/ipc/sem.c b/ipc/sem.c > index 454f6c6..ffc71de 100644 > --- a/ipc/sem.c > +++ b/ipc/sem.c > @@ -326,10 +326,16 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops, > > /* Then check that the global lock is free */ > if (!spin_is_locked(&sma->sem_perm.lock)) { > - /* spin_is_locked() is not a memory barrier */ > - smp_mb(); > + /* > + * The next test must happen after the test for > + * sem_perm.lock, otherwise we can race with another > + * thread that does > + * complex_count++;spin_unlock(sem_perm.lock); > + */ How about this comment instead: /* * The ipc object lock check must be visible on all cores before * rechecking the complex count. Otherwise we can race with * another thread that does: * complex_count++++; * spin_unlock(sem_perm.lock); */ ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 0/3] ipc: Further updates to sysv/mqueue limits @ 2014-08-12 7:29 Manfred Spraul 2014-08-12 7:29 ` [PATCH 1/3] ipc/msg: increase MSGMNI, remove scaling Manfred Spraul 0 siblings, 1 reply; 10+ messages in thread From: Manfred Spraul @ 2014-08-12 7:29 UTC (permalink / raw) To: Andrew Morton Cc: LKML, Davidlohr Bueso, Michael Kerrisk, Rafael Aquini, Rik van Riel, 1vier1, Manfred Spraul Hi Andrew, I got some positive and no negative feedback on my patches, thus: Could you add the patches to -mm and push them towards Linus? 0001-ipc-msg-increase-MSGMNI-remove-scaling.patch - increase MSGMNI to 32000 - as a bonus, this removes around 300 lines 0002-ipc-sem.c-increase-SEMMSL-SEMMNI-SEMOPM.patch - increase all limits - update the documentation, it was stale Note: Redhat/Oracle recommend values that exceed what I would recommend as safe upper limits. https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Tuning_and_Optimizing_Red_Hat_Enterprise_Linux_for_Oracle_9i_and_10g_Databases/sect-Oracle_9i_and_10g_Tuning_Guide-Setting_Semaphores-An_Example_of_Semaphore_Settings.html Unfortunately I didn't get any replies, thus I don't know how to proceed. One option would be to switch from kmalloc to ipc_alloc, then the risk of failing due to memory fragmentation would be resolved. 0003-ipc-namespace-copy-settings-from-parent-namespace.patch Change the namespace code so that a new namespace inherits the values from it's parent instead of the boot time defaults. Serge Hallyn and Michael Kerrisk also consider the new approach as better. One thing that is a bit surprising is that clone(CLONE_NEWIPC)k I've tested the changes both with unshare -i and with my own test apps. Unfortunately I didn't manage to get lxc to run on my busybox test setup :-( -- Manfred ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/3] ipc/msg: increase MSGMNI, remove scaling 2014-08-12 7:29 [PATCH 0/3] ipc: Further updates to sysv/mqueue limits Manfred Spraul @ 2014-08-12 7:29 ` Manfred Spraul 2014-08-12 7:29 ` [PATCH 2/3] ipc/sem.c: increase SEMMSL, SEMMNI, SEMOPM Manfred Spraul 0 siblings, 1 reply; 10+ messages in thread From: Manfred Spraul @ 2014-08-12 7:29 UTC (permalink / raw) To: Andrew Morton Cc: LKML, Davidlohr Bueso, Michael Kerrisk, Rafael Aquini, Rik van Riel, 1vier1, Manfred Spraul SysV can be abused to allocate locked kernel memory. For most systems, a small limit doesn't make sense, see the discussion with regards to SHMMAX. Therefore: increase MSGMNI to the maximum supported. And: If we ignore the risk of locking too much memory, then an automatic scaling of MSGMNI doesn't make sense. Therefore the logic can be removed. Notes: 1) If an administrator must limit the memory allocations, then he can set MSGMNI as necessary. Or he can disable sysv entirely (as e.g. done by Android). 2) MSGMAX and MSGMNB are intentionally not increased, as these values are used to control latency vs. throughput: If MSGMNB is large, then msgsnd() just returns and more messages can be queued before a task switch to a task that calls msgrcv() is forced. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> --- include/linux/ipc_namespace.h | 20 ---------- include/uapi/linux/msg.h | 28 +++++++++---- ipc/Makefile | 2 +- ipc/ipc_sysctl.c | 86 +--------------------------------------- ipc/ipcns_notifier.c | 92 ------------------------------------------- ipc/msg.c | 36 +---------------- ipc/namespace.c | 22 ----------- ipc/util.c | 40 ------------------- 8 files changed, 23 insertions(+), 303 deletions(-) delete mode 100644 ipc/ipcns_notifier.c diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h index 35e7eca..e365d5e 100644 --- a/include/linux/ipc_namespace.h +++ b/include/linux/ipc_namespace.h @@ -7,15 +7,6 @@ #include <linux/notifier.h> #include <linux/nsproxy.h> -/* - * ipc namespace events - */ -#define IPCNS_MEMCHANGED 0x00000001 /* Notify lowmem size changed */ -#define IPCNS_CREATED 0x00000002 /* Notify new ipc namespace created */ -#define IPCNS_REMOVED 0x00000003 /* Notify ipc namespace removed */ - -#define IPCNS_CALLBACK_PRI 0 - struct user_namespace; struct ipc_ids { @@ -38,7 +29,6 @@ struct ipc_namespace { unsigned int msg_ctlmni; atomic_t msg_bytes; atomic_t msg_hdrs; - int auto_msgmni; size_t shm_ctlmax; size_t shm_ctlall; @@ -77,18 +67,8 @@ extern atomic_t nr_ipc_ns; extern spinlock_t mq_lock; #ifdef CONFIG_SYSVIPC -extern int register_ipcns_notifier(struct ipc_namespace *); -extern int cond_register_ipcns_notifier(struct ipc_namespace *); -extern void unregister_ipcns_notifier(struct ipc_namespace *); -extern int ipcns_notify(unsigned long); extern void shm_destroy_orphaned(struct ipc_namespace *ns); #else /* CONFIG_SYSVIPC */ -static inline int register_ipcns_notifier(struct ipc_namespace *ns) -{ return 0; } -static inline int cond_register_ipcns_notifier(struct ipc_namespace *ns) -{ return 0; } -static inline void unregister_ipcns_notifier(struct ipc_namespace *ns) { } -static inline int ipcns_notify(unsigned long l) { return 0; } static inline void shm_destroy_orphaned(struct ipc_namespace *ns) {} #endif /* CONFIG_SYSVIPC */ diff --git a/include/uapi/linux/msg.h b/include/uapi/linux/msg.h index a703755..2733ec8 100644 --- a/include/uapi/linux/msg.h +++ b/include/uapi/linux/msg.h @@ -51,16 +51,28 @@ struct msginfo { }; /* - * Scaling factor to compute msgmni: - * the memory dedicated to msg queues (msgmni * msgmnb) should occupy - * at most 1/MSG_MEM_SCALE of the lowmem (see the formula in ipc/msg.c): - * up to 8MB : msgmni = 16 (MSGMNI) - * 4 GB : msgmni = 8K - * more than 16 GB : msgmni = 32K (IPCMNI) + * MSGMNI, MSGMAX and MSGMNB are default values which can be + * modified by sysctl. + * + * MSGMNI is the upper limit for the number of messages queues per + * namespace. + * It has been chosen to be as large possible without facilitating + * scenarios where userspace causes overflows when adjusting the limits via + * operations of the form retrieve current limit; add X; update limit". + * + * MSGMNB is the default size of a new message queue. Non-root tasks can + * decrease the size with msgctl(IPC_SET), root tasks + * (actually: CAP_SYS_RESOURCE) can both increase and decrease the queue + * size. The optimal value is application dependant. + * 16384 is used because it was always used (since 0.99.10) + * + * MAXMAX is the maximum size of an individual message, it's a global + * (per-namespace) limit that applies for all message queues. + * It's set to 1/2 of MSGMNB, to ensure that at least two messages fit into + * the queue. This is also an arbitrary choice (since 2.6.0). */ -#define MSG_MEM_SCALE 32 -#define MSGMNI 16 /* <= IPCMNI */ /* max # of msg queue identifiers */ +#define MSGMNI 32000 /* <= IPCMNI */ /* max # of msg queue identifiers */ #define MSGMAX 8192 /* <= INT_MAX */ /* max size of message (bytes) */ #define MSGMNB 16384 /* <= INT_MAX */ /* default max size of a message queue */ diff --git a/ipc/Makefile b/ipc/Makefile index 9075e17..86c7300 100644 --- a/ipc/Makefile +++ b/ipc/Makefile @@ -3,7 +3,7 @@ # obj-$(CONFIG_SYSVIPC_COMPAT) += compat.o -obj-$(CONFIG_SYSVIPC) += util.o msgutil.o msg.o sem.o shm.o ipcns_notifier.o syscall.o +obj-$(CONFIG_SYSVIPC) += util.o msgutil.o msg.o sem.o shm.o syscall.o obj-$(CONFIG_SYSVIPC_SYSCTL) += ipc_sysctl.o obj_mq-$(CONFIG_COMPAT) += compat_mq.o obj-$(CONFIG_POSIX_MQUEUE) += mqueue.o msgutil.o $(obj_mq-y) diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c index c3f0326..a00f9df 100644 --- a/ipc/ipc_sysctl.c +++ b/ipc/ipc_sysctl.c @@ -62,29 +62,6 @@ static int proc_ipc_dointvec_minmax_orphans(struct ctl_table *table, int write, return err; } -static int proc_ipc_callback_dointvec_minmax(struct ctl_table *table, int write, - void __user *buffer, size_t *lenp, loff_t *ppos) -{ - struct ctl_table ipc_table; - size_t lenp_bef = *lenp; - int rc; - - memcpy(&ipc_table, table, sizeof(ipc_table)); - ipc_table.data = get_ipc(table); - - rc = proc_dointvec_minmax(&ipc_table, write, buffer, lenp, ppos); - - if (write && !rc && lenp_bef == *lenp) - /* - * Tunable has successfully been changed by hand. Disable its - * automatic adjustment. This simply requires unregistering - * the notifiers that trigger recalculation. - */ - unregister_ipcns_notifier(current->nsproxy->ipc_ns); - - return rc; -} - static int proc_ipc_doulongvec_minmax(struct ctl_table *table, int write, void __user *buffer, size_t *lenp, loff_t *ppos) { @@ -96,64 +73,12 @@ static int proc_ipc_doulongvec_minmax(struct ctl_table *table, int write, lenp, ppos); } -/* - * Routine that is called when the file "auto_msgmni" has successfully been - * written. - * Two values are allowed: - * 0: unregister msgmni's callback routine from the ipc namespace notifier - * chain. This means that msgmni won't be recomputed anymore upon memory - * add/remove or ipc namespace creation/removal. - * 1: register back the callback routine. - */ -static void ipc_auto_callback(int val) -{ - if (!val) - unregister_ipcns_notifier(current->nsproxy->ipc_ns); - else { - /* - * Re-enable automatic recomputing only if not already - * enabled. - */ - recompute_msgmni(current->nsproxy->ipc_ns); - cond_register_ipcns_notifier(current->nsproxy->ipc_ns); - } -} - -static int proc_ipcauto_dointvec_minmax(struct ctl_table *table, int write, - void __user *buffer, size_t *lenp, loff_t *ppos) -{ - struct ctl_table ipc_table; - size_t lenp_bef = *lenp; - int oldval; - int rc; - - memcpy(&ipc_table, table, sizeof(ipc_table)); - ipc_table.data = get_ipc(table); - oldval = *((int *)(ipc_table.data)); - - rc = proc_dointvec_minmax(&ipc_table, write, buffer, lenp, ppos); - - if (write && !rc && lenp_bef == *lenp) { - int newval = *((int *)(ipc_table.data)); - /* - * The file "auto_msgmni" has correctly been set. - * React by (un)registering the corresponding tunable, if the - * value has changed. - */ - if (newval != oldval) - ipc_auto_callback(newval); - } - - return rc; -} #else #define proc_ipc_doulongvec_minmax NULL #define proc_ipc_dointvec NULL #define proc_ipc_dointvec_minmax NULL #define proc_ipc_dointvec_minmax_orphans NULL -#define proc_ipc_callback_dointvec_minmax NULL -#define proc_ipcauto_dointvec_minmax NULL #endif static int zero; @@ -205,7 +130,7 @@ static struct ctl_table ipc_kern_table[] = { .data = &init_ipc_ns.msg_ctlmni, .maxlen = sizeof(init_ipc_ns.msg_ctlmni), .mode = 0644, - .proc_handler = proc_ipc_callback_dointvec_minmax, + .proc_handler = proc_ipc_dointvec_minmax, .extra1 = &zero, .extra2 = &int_max, }, @@ -225,15 +150,6 @@ static struct ctl_table ipc_kern_table[] = { .mode = 0644, .proc_handler = proc_ipc_dointvec, }, - { - .procname = "auto_msgmni", - .data = &init_ipc_ns.auto_msgmni, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = proc_ipcauto_dointvec_minmax, - .extra1 = &zero, - .extra2 = &one, - }, #ifdef CONFIG_CHECKPOINT_RESTORE { .procname = "sem_next_id", diff --git a/ipc/ipcns_notifier.c b/ipc/ipcns_notifier.c deleted file mode 100644 index b9b31a4..0000000 --- a/ipc/ipcns_notifier.c +++ /dev/null @@ -1,92 +0,0 @@ -/* - * linux/ipc/ipcns_notifier.c - * Copyright (C) 2007 BULL SA. Nadia Derbey - * - * Notification mechanism for ipc namespaces: - * The callback routine registered in the memory chain invokes the ipcns - * notifier chain with the IPCNS_MEMCHANGED event. - * Each callback routine registered in the ipcns namespace recomputes msgmni - * for the owning namespace. - */ - -#include <linux/msg.h> -#include <linux/rcupdate.h> -#include <linux/notifier.h> -#include <linux/nsproxy.h> -#include <linux/ipc_namespace.h> - -#include "util.h" - - - -static BLOCKING_NOTIFIER_HEAD(ipcns_chain); - - -static int ipcns_callback(struct notifier_block *self, - unsigned long action, void *arg) -{ - struct ipc_namespace *ns; - - switch (action) { - case IPCNS_MEMCHANGED: /* amount of lowmem has changed */ - case IPCNS_CREATED: - case IPCNS_REMOVED: - /* - * It's time to recompute msgmni - */ - ns = container_of(self, struct ipc_namespace, ipcns_nb); - /* - * No need to get a reference on the ns: the 1st job of - * free_ipc_ns() is to unregister the callback routine. - * blocking_notifier_chain_unregister takes the wr lock to do - * it. - * When this callback routine is called the rd lock is held by - * blocking_notifier_call_chain. - * So the ipc ns cannot be freed while we are here. - */ - recompute_msgmni(ns); - break; - default: - break; - } - - return NOTIFY_OK; -} - -int register_ipcns_notifier(struct ipc_namespace *ns) -{ - int rc; - - memset(&ns->ipcns_nb, 0, sizeof(ns->ipcns_nb)); - ns->ipcns_nb.notifier_call = ipcns_callback; - ns->ipcns_nb.priority = IPCNS_CALLBACK_PRI; - rc = blocking_notifier_chain_register(&ipcns_chain, &ns->ipcns_nb); - if (!rc) - ns->auto_msgmni = 1; - return rc; -} - -int cond_register_ipcns_notifier(struct ipc_namespace *ns) -{ - int rc; - - memset(&ns->ipcns_nb, 0, sizeof(ns->ipcns_nb)); - ns->ipcns_nb.notifier_call = ipcns_callback; - ns->ipcns_nb.priority = IPCNS_CALLBACK_PRI; - rc = blocking_notifier_chain_cond_register(&ipcns_chain, - &ns->ipcns_nb); - if (!rc) - ns->auto_msgmni = 1; - return rc; -} - -void unregister_ipcns_notifier(struct ipc_namespace *ns) -{ - blocking_notifier_chain_unregister(&ipcns_chain, &ns->ipcns_nb); - ns->auto_msgmni = 0; -} - -int ipcns_notify(unsigned long val) -{ - return blocking_notifier_call_chain(&ipcns_chain, val, NULL); -} diff --git a/ipc/msg.c b/ipc/msg.c index c5d8e37..a7261d5 100644 --- a/ipc/msg.c +++ b/ipc/msg.c @@ -989,43 +989,12 @@ SYSCALL_DEFINE5(msgrcv, int, msqid, struct msgbuf __user *, msgp, size_t, msgsz, return do_msgrcv(msqid, msgp, msgsz, msgtyp, msgflg, do_msg_fill); } -/* - * Scale msgmni with the available lowmem size: the memory dedicated to msg - * queues should occupy at most 1/MSG_MEM_SCALE of lowmem. - * Also take into account the number of nsproxies created so far. - * This should be done staying within the (MSGMNI , IPCMNI/nr_ipc_ns) range. - */ -void recompute_msgmni(struct ipc_namespace *ns) -{ - struct sysinfo i; - unsigned long allowed; - int nb_ns; - - si_meminfo(&i); - allowed = (((i.totalram - i.totalhigh) / MSG_MEM_SCALE) * i.mem_unit) - / MSGMNB; - nb_ns = atomic_read(&nr_ipc_ns); - allowed /= nb_ns; - - if (allowed < MSGMNI) { - ns->msg_ctlmni = MSGMNI; - return; - } - - if (allowed > IPCMNI / nb_ns) { - ns->msg_ctlmni = IPCMNI / nb_ns; - return; - } - - ns->msg_ctlmni = allowed; -} void msg_init_ns(struct ipc_namespace *ns) { ns->msg_ctlmax = MSGMAX; ns->msg_ctlmnb = MSGMNB; - - recompute_msgmni(ns); + ns->msg_ctlmni = MSGMNI; atomic_set(&ns->msg_bytes, 0); atomic_set(&ns->msg_hdrs, 0); @@ -1069,9 +1038,6 @@ void __init msg_init(void) { msg_init_ns(&init_ipc_ns); - printk(KERN_INFO "msgmni has been set to %d\n", - init_ipc_ns.msg_ctlmni); - ipc_init_proc_interface("sysvipc/msg", " key msqid perms cbytes qnum lspid lrpid uid gid cuid cgid stime rtime ctime\n", IPC_MSG_IDS, sysvipc_msg_proc_show); diff --git a/ipc/namespace.c b/ipc/namespace.c index b54468e..1a3ffd4 100644 --- a/ipc/namespace.c +++ b/ipc/namespace.c @@ -45,14 +45,6 @@ static struct ipc_namespace *create_ipc_ns(struct user_namespace *user_ns, msg_init_ns(ns); shm_init_ns(ns); - /* - * msgmni has already been computed for the new ipc ns. - * Thus, do the ipcns creation notification before registering that - * new ipcns in the chain. - */ - ipcns_notify(IPCNS_CREATED); - register_ipcns_notifier(ns); - ns->user_ns = get_user_ns(user_ns); return ns; @@ -99,25 +91,11 @@ void free_ipcs(struct ipc_namespace *ns, struct ipc_ids *ids, static void free_ipc_ns(struct ipc_namespace *ns) { - /* - * Unregistering the hotplug notifier at the beginning guarantees - * that the ipc namespace won't be freed while we are inside the - * callback routine. Since the blocking_notifier_chain_XXX routines - * hold a rw lock on the notifier list, unregister_ipcns_notifier() - * won't take the rw lock before blocking_notifier_call_chain() has - * released the rd lock. - */ - unregister_ipcns_notifier(ns); sem_exit_ns(ns); msg_exit_ns(ns); shm_exit_ns(ns); atomic_dec(&nr_ipc_ns); - /* - * Do the ipcns removal notification after decrementing nr_ipc_ns in - * order to have a correct value when recomputing msgmni. - */ - ipcns_notify(IPCNS_REMOVED); put_user_ns(ns->user_ns); proc_free_inum(ns->proc_inum); kfree(ns); diff --git a/ipc/util.c b/ipc/util.c index 27d74e6..0001560 100644 --- a/ipc/util.c +++ b/ipc/util.c @@ -71,44 +71,6 @@ struct ipc_proc_iface { int (*show)(struct seq_file *, void *); }; -static void ipc_memory_notifier(struct work_struct *work) -{ - ipcns_notify(IPCNS_MEMCHANGED); -} - -static int ipc_memory_callback(struct notifier_block *self, - unsigned long action, void *arg) -{ - static DECLARE_WORK(ipc_memory_wq, ipc_memory_notifier); - - switch (action) { - case MEM_ONLINE: /* memory successfully brought online */ - case MEM_OFFLINE: /* or offline: it's time to recompute msgmni */ - /* - * This is done by invoking the ipcns notifier chain with the - * IPC_MEMCHANGED event. - * In order not to keep the lock on the hotplug memory chain - * for too long, queue a work item that will, when waken up, - * activate the ipcns notification chain. - */ - schedule_work(&ipc_memory_wq); - break; - case MEM_GOING_ONLINE: - case MEM_GOING_OFFLINE: - case MEM_CANCEL_ONLINE: - case MEM_CANCEL_OFFLINE: - default: - break; - } - - return NOTIFY_OK; -} - -static struct notifier_block ipc_memory_nb = { - .notifier_call = ipc_memory_callback, - .priority = IPC_CALLBACK_PRI, -}; - /** * ipc_init - initialise ipc subsystem * @@ -124,8 +86,6 @@ static int __init ipc_init(void) sem_init(); msg_init(); shm_init(); - register_hotmemory_notifier(&ipc_memory_nb); - register_ipcns_notifier(&init_ipc_ns); return 0; } device_initcall(ipc_init); -- 1.9.3 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 2/3] ipc/sem.c: increase SEMMSL, SEMMNI, SEMOPM 2014-08-12 7:29 ` [PATCH 1/3] ipc/msg: increase MSGMNI, remove scaling Manfred Spraul @ 2014-08-12 7:29 ` Manfred Spraul 2014-08-15 13:42 ` Rafael Aquini 0 siblings, 1 reply; 10+ messages in thread From: Manfred Spraul @ 2014-08-12 7:29 UTC (permalink / raw) To: Andrew Morton Cc: LKML, Davidlohr Bueso, Michael Kerrisk, Rafael Aquini, Rik van Riel, 1vier1, Manfred Spraul a) SysV can be abused to allocate locked kernel memory. For most systems, a small limit doesn't make sense, see the discussion with regards to SHMMAX. Therefore: Increase the sysv sem limits so that all known applications will work with these defaults. b) With regards to the maximum supported: Some of the specified hard limits are not correct anymore, therefore the patch updates the documentation. - SEMMNI must stay below IPCMNI, which is 32768. As for SHMMAX: Stay a bit below this limit. - SEMMSL was limited to 8k, to ensure that the kmalloc for the kernel array was limited to 16 kB (order=2) This doesn't apply anymore: - the allocation size isn't sizeof(short)*nsems anymore. - ipc_alloc falls back to vmalloc - SEMOPM should stay below 1000, to limit the kmalloc in semtimedop() to an order=1 allocation. Therefore: Leave it at 500 (order=0 allocation). Note: If an administrator must limit the memory allocations, then he can set the values as necessary. Or he can disable sysv entirely (as e.g. done by Android). Signed-off-by: Manfred Spraul <manfred@colorfullife.com> --- include/uapi/linux/sem.h | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/include/uapi/linux/sem.h b/include/uapi/linux/sem.h index 541fce0..dd73b90 100644 --- a/include/uapi/linux/sem.h +++ b/include/uapi/linux/sem.h @@ -63,10 +63,22 @@ struct seminfo { int semaem; }; -#define SEMMNI 128 /* <= IPCMNI max # of semaphore identifiers */ -#define SEMMSL 250 /* <= 8 000 max num of semaphores per id */ +/* + * SEMMNI, SEMMSL and SEMMNS are default values which can be + * modified by sysctl. + * The values has been chosen to be larger than necessary for any + * known configuration. + * + * SEMOPM should not be increased beyond 1000, otherwise there is the + * risk that semop()/semtimedop() fails due to kernel memory fragmentation when + * allocating the sop array. + */ + + +#define SEMMNI 32000 /* <= IPCMNI max # of semaphore identifiers */ +#define SEMMSL 32000 /* <= INT_MAX max num of semaphores per id */ #define SEMMNS (SEMMNI*SEMMSL) /* <= INT_MAX max # of semaphores in system */ -#define SEMOPM 32 /* <= 1 000 max num of ops per semop call */ +#define SEMOPM 500 /* <= 1 000 max num of ops per semop call */ #define SEMVMX 32767 /* <= 32767 semaphore maximum value */ #define SEMAEM SEMVMX /* adjust on exit max value */ -- 1.9.3 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 2/3] ipc/sem.c: increase SEMMSL, SEMMNI, SEMOPM 2014-08-12 7:29 ` [PATCH 2/3] ipc/sem.c: increase SEMMSL, SEMMNI, SEMOPM Manfred Spraul @ 2014-08-15 13:42 ` Rafael Aquini 0 siblings, 0 replies; 10+ messages in thread From: Rafael Aquini @ 2014-08-15 13:42 UTC (permalink / raw) To: Manfred Spraul Cc: Andrew Morton, LKML, Davidlohr Bueso, Michael Kerrisk, Rik van Riel, 1vier1 On Tue, Aug 12, 2014 at 09:29:16AM +0200, Manfred Spraul wrote: > a) > SysV can be abused to allocate locked kernel memory. For most systems, a > small limit doesn't make sense, see the discussion with regards to SHMMAX. > > Therefore: Increase the sysv sem limits so that all known applications > will work with these defaults. > > b) > With regards to the maximum supported: > Some of the specified hard limits are not correct anymore, therefore the > patch updates the documentation. > > - SEMMNI must stay below IPCMNI, which is 32768. > As for SHMMAX: Stay a bit below this limit. > > - SEMMSL was limited to 8k, to ensure that the kmalloc for the kernel array > was limited to 16 kB (order=2) > > This doesn't apply anymore: > - the allocation size isn't sizeof(short)*nsems anymore. > - ipc_alloc falls back to vmalloc > > - SEMOPM should stay below 1000, to limit the kmalloc in semtimedop() to an > order=1 allocation. > Therefore: Leave it at 500 (order=0 allocation). > > Note: > If an administrator must limit the memory allocations, then he can set the > values as necessary. > > Or he can disable sysv entirely (as e.g. done by Android). > > Signed-off-by: Manfred Spraul <manfred@colorfullife.com> > --- Acked-by: Rafael Aquini <aquini@redhat.com> > include/uapi/linux/sem.h | 18 +++++++++++++++--- > 1 file changed, 15 insertions(+), 3 deletions(-) > > diff --git a/include/uapi/linux/sem.h b/include/uapi/linux/sem.h > index 541fce0..dd73b90 100644 > --- a/include/uapi/linux/sem.h > +++ b/include/uapi/linux/sem.h > @@ -63,10 +63,22 @@ struct seminfo { > int semaem; > }; > > -#define SEMMNI 128 /* <= IPCMNI max # of semaphore identifiers */ > -#define SEMMSL 250 /* <= 8 000 max num of semaphores per id */ > +/* > + * SEMMNI, SEMMSL and SEMMNS are default values which can be > + * modified by sysctl. > + * The values has been chosen to be larger than necessary for any > + * known configuration. > + * > + * SEMOPM should not be increased beyond 1000, otherwise there is the > + * risk that semop()/semtimedop() fails due to kernel memory fragmentation when > + * allocating the sop array. > + */ > + > + > +#define SEMMNI 32000 /* <= IPCMNI max # of semaphore identifiers */ > +#define SEMMSL 32000 /* <= INT_MAX max num of semaphores per id */ > #define SEMMNS (SEMMNI*SEMMSL) /* <= INT_MAX max # of semaphores in system */ > -#define SEMOPM 32 /* <= 1 000 max num of ops per semop call */ > +#define SEMOPM 500 /* <= 1 000 max num of ops per semop call */ > #define SEMVMX 32767 /* <= 32767 semaphore maximum value */ > #define SEMAEM SEMVMX /* adjust on exit max value */ > > -- > 1.9.3 > ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 0/3] ipc: Further updates to sysv/mqueue limits @ 2014-05-29 18:46 Manfred Spraul 2014-05-29 18:46 ` [PATCH 1/3] ipc/msg: increase MSGMNI, remove scaling Manfred Spraul 0 siblings, 1 reply; 10+ messages in thread From: Manfred Spraul @ 2014-05-29 18:46 UTC (permalink / raw) To: Davidlohr Bueso, Michael Kerrisk, Rafael Aquini, Rik van Riel Cc: LKML, 1vier1, Manfred Spraul Hi all, a) If we increase SHMMAX/SHMALL, then it makes sense to increase MSGMNI, too. And: This allows to remove the automatic scaling (~300 lines) b) We can also increase SEMMSL, SEMMNI and SEMOPM c) I think it would make more sense if a namespace starts with the limits from it's parent: If an admin set the limits, then he probably wants that these limits also apply for a new child namespace. All patches are RFC - they compile, but that's it. TODO: - check if the sysv sem limits are sane. Especially the SEMOPM - if real users exist that pass > 1k ops, then switch from kmalloc to vmalloc. @the Redhat developers: Do you have any idea where this "often recommended" comes from? https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Tuning_and_Optimizing_Red_Hat_Enterprise_Linux_for_Oracle_9i_and_10g_Databases/sect-Oracle_9i_and_10g_Tuning_Guide-Setting_Semaphores-The_SEMOPM_Parameter.html - copy Davidlohrs explanation for the sysv shm limits to sysv msg and sysv sem. - check if we should also increase the limits for posix mqueue - decide if it would make sense to increase IPCMNI: Right now, it is 32768. This means that after 65536 pairs of semget()/semctl(IPC_RMID), semget() will return the same identifier again - and a semop(old_id) won't return -EINVAL, instead it will access the "new" array, which is probably now what the caller intended to do. The split is arbitrary - we could also split it 1048576/2048 or any other split we want. - test everything. -- Manfred ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/3] ipc/msg: increase MSGMNI, remove scaling 2014-05-29 18:46 [PATCH 0/3] ipc: Further updates to sysv/mqueue limits Manfred Spraul @ 2014-05-29 18:46 ` Manfred Spraul 2014-05-29 18:46 ` [PATCH 2/3] ipc/sem.c: increase SEMMSL, SEMMNI, SEMOPM Manfred Spraul 0 siblings, 1 reply; 10+ messages in thread From: Manfred Spraul @ 2014-05-29 18:46 UTC (permalink / raw) To: Davidlohr Bueso, Michael Kerrisk, Rafael Aquini, Rik van Riel Cc: LKML, 1vier1, Manfred Spraul SysV can be abused to allocated locked kernel memory. For most systems, a small limit doesn't make sense, see the discussion with regards to SHMMAX. Therefore: increase MSGMNI to the maximum supported. And: if we ignore the risk of locking too much memory, then an automatic scaling of MSGMNI doesn't make sense. Therefore: remove the whole logic. Note: If an administrator must limit the memory allocations, then he can set MSGMNI as necessary. Or he can disable sysv entirely (as e.g. done with Android). --- include/linux/ipc_namespace.h | 20 ---------- include/uapi/linux/msg.h | 12 +----- ipc/Makefile | 2 +- ipc/ipc_sysctl.c | 87 +--------------------------------------- ipc/ipcns_notifier.c | 92 ------------------------------------------- ipc/msg.c | 36 +---------------- ipc/namespace.c | 22 ----------- ipc/util.c | 40 ------------------- 8 files changed, 4 insertions(+), 307 deletions(-) delete mode 100644 ipc/ipcns_notifier.c diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h index 35e7eca..e365d5e 100644 --- a/include/linux/ipc_namespace.h +++ b/include/linux/ipc_namespace.h @@ -7,15 +7,6 @@ #include <linux/notifier.h> #include <linux/nsproxy.h> -/* - * ipc namespace events - */ -#define IPCNS_MEMCHANGED 0x00000001 /* Notify lowmem size changed */ -#define IPCNS_CREATED 0x00000002 /* Notify new ipc namespace created */ -#define IPCNS_REMOVED 0x00000003 /* Notify ipc namespace removed */ - -#define IPCNS_CALLBACK_PRI 0 - struct user_namespace; struct ipc_ids { @@ -38,7 +29,6 @@ struct ipc_namespace { unsigned int msg_ctlmni; atomic_t msg_bytes; atomic_t msg_hdrs; - int auto_msgmni; size_t shm_ctlmax; size_t shm_ctlall; @@ -77,18 +67,8 @@ extern atomic_t nr_ipc_ns; extern spinlock_t mq_lock; #ifdef CONFIG_SYSVIPC -extern int register_ipcns_notifier(struct ipc_namespace *); -extern int cond_register_ipcns_notifier(struct ipc_namespace *); -extern void unregister_ipcns_notifier(struct ipc_namespace *); -extern int ipcns_notify(unsigned long); extern void shm_destroy_orphaned(struct ipc_namespace *ns); #else /* CONFIG_SYSVIPC */ -static inline int register_ipcns_notifier(struct ipc_namespace *ns) -{ return 0; } -static inline int cond_register_ipcns_notifier(struct ipc_namespace *ns) -{ return 0; } -static inline void unregister_ipcns_notifier(struct ipc_namespace *ns) { } -static inline int ipcns_notify(unsigned long l) { return 0; } static inline void shm_destroy_orphaned(struct ipc_namespace *ns) {} #endif /* CONFIG_SYSVIPC */ diff --git a/include/uapi/linux/msg.h b/include/uapi/linux/msg.h index a703755..ae9ddd2 100644 --- a/include/uapi/linux/msg.h +++ b/include/uapi/linux/msg.h @@ -50,17 +50,7 @@ struct msginfo { unsigned short msgseg; }; -/* - * Scaling factor to compute msgmni: - * the memory dedicated to msg queues (msgmni * msgmnb) should occupy - * at most 1/MSG_MEM_SCALE of the lowmem (see the formula in ipc/msg.c): - * up to 8MB : msgmni = 16 (MSGMNI) - * 4 GB : msgmni = 8K - * more than 16 GB : msgmni = 32K (IPCMNI) - */ -#define MSG_MEM_SCALE 32 - -#define MSGMNI 16 /* <= IPCMNI */ /* max # of msg queue identifiers */ +#define MSGMNI 32000 /* <= IPCMNI */ /* max # of msg queue identifiers */ #define MSGMAX 8192 /* <= INT_MAX */ /* max size of message (bytes) */ #define MSGMNB 16384 /* <= INT_MAX */ /* default max size of a message queue */ diff --git a/ipc/Makefile b/ipc/Makefile index 9075e17..86c7300 100644 --- a/ipc/Makefile +++ b/ipc/Makefile @@ -3,7 +3,7 @@ # obj-$(CONFIG_SYSVIPC_COMPAT) += compat.o -obj-$(CONFIG_SYSVIPC) += util.o msgutil.o msg.o sem.o shm.o ipcns_notifier.o syscall.o +obj-$(CONFIG_SYSVIPC) += util.o msgutil.o msg.o sem.o shm.o syscall.o obj-$(CONFIG_SYSVIPC_SYSCTL) += ipc_sysctl.o obj_mq-$(CONFIG_COMPAT) += compat_mq.o obj-$(CONFIG_POSIX_MQUEUE) += mqueue.o msgutil.o $(obj_mq-y) diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c index 998d31b..cc34646 100644 --- a/ipc/ipc_sysctl.c +++ b/ipc/ipc_sysctl.c @@ -62,29 +62,6 @@ static int proc_ipc_dointvec_minmax_orphans(ctl_table *table, int write, return err; } -static int proc_ipc_callback_dointvec_minmax(ctl_table *table, int write, - void __user *buffer, size_t *lenp, loff_t *ppos) -{ - struct ctl_table ipc_table; - size_t lenp_bef = *lenp; - int rc; - - memcpy(&ipc_table, table, sizeof(ipc_table)); - ipc_table.data = get_ipc(table); - - rc = proc_dointvec_minmax(&ipc_table, write, buffer, lenp, ppos); - - if (write && !rc && lenp_bef == *lenp) - /* - * Tunable has successfully been changed by hand. Disable its - * automatic adjustment. This simply requires unregistering - * the notifiers that trigger recalculation. - */ - unregister_ipcns_notifier(current->nsproxy->ipc_ns); - - return rc; -} - static int proc_ipc_doulongvec_minmax(ctl_table *table, int write, void __user *buffer, size_t *lenp, loff_t *ppos) { @@ -96,64 +73,11 @@ static int proc_ipc_doulongvec_minmax(ctl_table *table, int write, lenp, ppos); } -/* - * Routine that is called when the file "auto_msgmni" has successfully been - * written. - * Two values are allowed: - * 0: unregister msgmni's callback routine from the ipc namespace notifier - * chain. This means that msgmni won't be recomputed anymore upon memory - * add/remove or ipc namespace creation/removal. - * 1: register back the callback routine. - */ -static void ipc_auto_callback(int val) -{ - if (!val) - unregister_ipcns_notifier(current->nsproxy->ipc_ns); - else { - /* - * Re-enable automatic recomputing only if not already - * enabled. - */ - recompute_msgmni(current->nsproxy->ipc_ns); - cond_register_ipcns_notifier(current->nsproxy->ipc_ns); - } -} - -static int proc_ipcauto_dointvec_minmax(ctl_table *table, int write, - void __user *buffer, size_t *lenp, loff_t *ppos) -{ - struct ctl_table ipc_table; - size_t lenp_bef = *lenp; - int oldval; - int rc; - - memcpy(&ipc_table, table, sizeof(ipc_table)); - ipc_table.data = get_ipc(table); - oldval = *((int *)(ipc_table.data)); - - rc = proc_dointvec_minmax(&ipc_table, write, buffer, lenp, ppos); - - if (write && !rc && lenp_bef == *lenp) { - int newval = *((int *)(ipc_table.data)); - /* - * The file "auto_msgmni" has correctly been set. - * React by (un)registering the corresponding tunable, if the - * value has changed. - */ - if (newval != oldval) - ipc_auto_callback(newval); - } - - return rc; -} - #else #define proc_ipc_doulongvec_minmax NULL #define proc_ipc_dointvec NULL #define proc_ipc_dointvec_minmax NULL #define proc_ipc_dointvec_minmax_orphans NULL -#define proc_ipc_callback_dointvec_minmax NULL -#define proc_ipcauto_dointvec_minmax NULL #endif static int zero; @@ -205,7 +129,7 @@ static struct ctl_table ipc_kern_table[] = { .data = &init_ipc_ns.msg_ctlmni, .maxlen = sizeof(init_ipc_ns.msg_ctlmni), .mode = 0644, - .proc_handler = proc_ipc_callback_dointvec_minmax, + .proc_handler = proc_ipc_dointvec_minmax, .extra1 = &zero, .extra2 = &int_max, }, @@ -225,15 +149,6 @@ static struct ctl_table ipc_kern_table[] = { .mode = 0644, .proc_handler = proc_ipc_dointvec, }, - { - .procname = "auto_msgmni", - .data = &init_ipc_ns.auto_msgmni, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = proc_ipcauto_dointvec_minmax, - .extra1 = &zero, - .extra2 = &one, - }, #ifdef CONFIG_CHECKPOINT_RESTORE { .procname = "sem_next_id", diff --git a/ipc/ipcns_notifier.c b/ipc/ipcns_notifier.c deleted file mode 100644 index b9b31a4..0000000 --- a/ipc/ipcns_notifier.c +++ /dev/null @@ -1,92 +0,0 @@ -/* - * linux/ipc/ipcns_notifier.c - * Copyright (C) 2007 BULL SA. Nadia Derbey - * - * Notification mechanism for ipc namespaces: - * The callback routine registered in the memory chain invokes the ipcns - * notifier chain with the IPCNS_MEMCHANGED event. - * Each callback routine registered in the ipcns namespace recomputes msgmni - * for the owning namespace. - */ - -#include <linux/msg.h> -#include <linux/rcupdate.h> -#include <linux/notifier.h> -#include <linux/nsproxy.h> -#include <linux/ipc_namespace.h> - -#include "util.h" - - - -static BLOCKING_NOTIFIER_HEAD(ipcns_chain); - - -static int ipcns_callback(struct notifier_block *self, - unsigned long action, void *arg) -{ - struct ipc_namespace *ns; - - switch (action) { - case IPCNS_MEMCHANGED: /* amount of lowmem has changed */ - case IPCNS_CREATED: - case IPCNS_REMOVED: - /* - * It's time to recompute msgmni - */ - ns = container_of(self, struct ipc_namespace, ipcns_nb); - /* - * No need to get a reference on the ns: the 1st job of - * free_ipc_ns() is to unregister the callback routine. - * blocking_notifier_chain_unregister takes the wr lock to do - * it. - * When this callback routine is called the rd lock is held by - * blocking_notifier_call_chain. - * So the ipc ns cannot be freed while we are here. - */ - recompute_msgmni(ns); - break; - default: - break; - } - - return NOTIFY_OK; -} - -int register_ipcns_notifier(struct ipc_namespace *ns) -{ - int rc; - - memset(&ns->ipcns_nb, 0, sizeof(ns->ipcns_nb)); - ns->ipcns_nb.notifier_call = ipcns_callback; - ns->ipcns_nb.priority = IPCNS_CALLBACK_PRI; - rc = blocking_notifier_chain_register(&ipcns_chain, &ns->ipcns_nb); - if (!rc) - ns->auto_msgmni = 1; - return rc; -} - -int cond_register_ipcns_notifier(struct ipc_namespace *ns) -{ - int rc; - - memset(&ns->ipcns_nb, 0, sizeof(ns->ipcns_nb)); - ns->ipcns_nb.notifier_call = ipcns_callback; - ns->ipcns_nb.priority = IPCNS_CALLBACK_PRI; - rc = blocking_notifier_chain_cond_register(&ipcns_chain, - &ns->ipcns_nb); - if (!rc) - ns->auto_msgmni = 1; - return rc; -} - -void unregister_ipcns_notifier(struct ipc_namespace *ns) -{ - blocking_notifier_chain_unregister(&ipcns_chain, &ns->ipcns_nb); - ns->auto_msgmni = 0; -} - -int ipcns_notify(unsigned long val) -{ - return blocking_notifier_call_chain(&ipcns_chain, val, NULL); -} diff --git a/ipc/msg.c b/ipc/msg.c index 6498531..ea7e0d1 100644 --- a/ipc/msg.c +++ b/ipc/msg.c @@ -76,43 +76,12 @@ static int newque(struct ipc_namespace *, struct ipc_params *); static int sysvipc_msg_proc_show(struct seq_file *s, void *it); #endif -/* - * Scale msgmni with the available lowmem size: the memory dedicated to msg - * queues should occupy at most 1/MSG_MEM_SCALE of lowmem. - * Also take into account the number of nsproxies created so far. - * This should be done staying within the (MSGMNI , IPCMNI/nr_ipc_ns) range. - */ -void recompute_msgmni(struct ipc_namespace *ns) -{ - struct sysinfo i; - unsigned long allowed; - int nb_ns; - - si_meminfo(&i); - allowed = (((i.totalram - i.totalhigh) / MSG_MEM_SCALE) * i.mem_unit) - / MSGMNB; - nb_ns = atomic_read(&nr_ipc_ns); - allowed /= nb_ns; - - if (allowed < MSGMNI) { - ns->msg_ctlmni = MSGMNI; - return; - } - - if (allowed > IPCMNI / nb_ns) { - ns->msg_ctlmni = IPCMNI / nb_ns; - return; - } - - ns->msg_ctlmni = allowed; -} void msg_init_ns(struct ipc_namespace *ns) { ns->msg_ctlmax = MSGMAX; ns->msg_ctlmnb = MSGMNB; - - recompute_msgmni(ns); + ns->msg_ctlmni = MSGMNI; atomic_set(&ns->msg_bytes, 0); atomic_set(&ns->msg_hdrs, 0); @@ -131,9 +100,6 @@ void __init msg_init(void) { msg_init_ns(&init_ipc_ns); - printk(KERN_INFO "msgmni has been set to %d\n", - init_ipc_ns.msg_ctlmni); - ipc_init_proc_interface("sysvipc/msg", " key msqid perms cbytes qnum lspid lrpid uid gid cuid cgid stime rtime ctime\n", IPC_MSG_IDS, sysvipc_msg_proc_show); diff --git a/ipc/namespace.c b/ipc/namespace.c index 59451c1..6c01c2a 100644 --- a/ipc/namespace.c +++ b/ipc/namespace.c @@ -45,14 +45,6 @@ static struct ipc_namespace *create_ipc_ns(struct user_namespace *user_ns, msg_init_ns(ns); shm_init_ns(ns); - /* - * msgmni has already been computed for the new ipc ns. - * Thus, do the ipcns creation notification before registering that - * new ipcns in the chain. - */ - ipcns_notify(IPCNS_CREATED); - register_ipcns_notifier(ns); - ns->user_ns = get_user_ns(user_ns); return ns; @@ -99,25 +91,11 @@ void free_ipcs(struct ipc_namespace *ns, struct ipc_ids *ids, static void free_ipc_ns(struct ipc_namespace *ns) { - /* - * Unregistering the hotplug notifier at the beginning guarantees - * that the ipc namespace won't be freed while we are inside the - * callback routine. Since the blocking_notifier_chain_XXX routines - * hold a rw lock on the notifier list, unregister_ipcns_notifier() - * won't take the rw lock before blocking_notifier_call_chain() has - * released the rd lock. - */ - unregister_ipcns_notifier(ns); sem_exit_ns(ns); msg_exit_ns(ns); shm_exit_ns(ns); atomic_dec(&nr_ipc_ns); - /* - * Do the ipcns removal notification after decrementing nr_ipc_ns in - * order to have a correct value when recomputing msgmni. - */ - ipcns_notify(IPCNS_REMOVED); put_user_ns(ns->user_ns); proc_free_inum(ns->proc_inum); kfree(ns); diff --git a/ipc/util.c b/ipc/util.c index 2eb0d1e..768e1fc75 100644 --- a/ipc/util.c +++ b/ipc/util.c @@ -71,44 +71,6 @@ struct ipc_proc_iface { int (*show)(struct seq_file *, void *); }; -static void ipc_memory_notifier(struct work_struct *work) -{ - ipcns_notify(IPCNS_MEMCHANGED); -} - -static int ipc_memory_callback(struct notifier_block *self, - unsigned long action, void *arg) -{ - static DECLARE_WORK(ipc_memory_wq, ipc_memory_notifier); - - switch (action) { - case MEM_ONLINE: /* memory successfully brought online */ - case MEM_OFFLINE: /* or offline: it's time to recompute msgmni */ - /* - * This is done by invoking the ipcns notifier chain with the - * IPC_MEMCHANGED event. - * In order not to keep the lock on the hotplug memory chain - * for too long, queue a work item that will, when waken up, - * activate the ipcns notification chain. - */ - schedule_work(&ipc_memory_wq); - break; - case MEM_GOING_ONLINE: - case MEM_GOING_OFFLINE: - case MEM_CANCEL_ONLINE: - case MEM_CANCEL_OFFLINE: - default: - break; - } - - return NOTIFY_OK; -} - -static struct notifier_block ipc_memory_nb = { - .notifier_call = ipc_memory_callback, - .priority = IPC_CALLBACK_PRI, -}; - /** * ipc_init - initialise ipc subsystem * @@ -124,8 +86,6 @@ static int __init ipc_init(void) sem_init(); msg_init(); shm_init(); - register_hotmemory_notifier(&ipc_memory_nb); - register_ipcns_notifier(&init_ipc_ns); return 0; } device_initcall(ipc_init); -- 1.9.3 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 2/3] ipc/sem.c: increase SEMMSL, SEMMNI, SEMOPM 2014-05-29 18:46 ` [PATCH 1/3] ipc/msg: increase MSGMNI, remove scaling Manfred Spraul @ 2014-05-29 18:46 ` Manfred Spraul 0 siblings, 0 replies; 10+ messages in thread From: Manfred Spraul @ 2014-05-29 18:46 UTC (permalink / raw) To: Davidlohr Bueso, Michael Kerrisk, Rafael Aquini, Rik van Riel Cc: LKML, 1vier1, Manfred Spraul SysV can be abused to allocated locked kernel memory. For most systems, a small limit doesn't make sense, see the discussion with regards to SHMMAX. Therefore: Increase the sysv sem limits to the maximum supported. With regards to the maximum supported: Some of the specified hard limits are not correct anymore. Therefore here is an updated list: - SEMMNI must stay below IPCMNI, which is 32768. As for SHMMAX: Stay a bit below this limit. - SEMMSL was limited to 8k, to ensure that the kmalloc for the kernel array was limited to 16 kB (order=2) This doesn't apply anymore: - the allocation size isn't sizeof(short)*nsems anymore. - ipc_alloc falls back to vmalloc - SEMOPM should stay below 1000, to limit the kmalloc in semtimedop() to an order=1 allocation. This might be a real issue: The Redhat documentation shows an example with SEMOPM=5010: https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Tuning_and_Optimizing_Red_Hat_Enterprise_Linux_for_Oracle_9i_and_10g_Databases/sect-Oracle_9i_and_10g_Tuning_Guide-Setting_Semaphores-An_Example_of_Semaphore_Settings.html If Oracle really uses operations with SEMOPM 5000 and larger, then semtimedop() should be converted from kmalloc to ipc_alloc. Note: If an administrator must limit the memory allocations, then he can set MSGMNI as necessary. Or he can disable sysv entirely (as e.g. done with Android). --- include/uapi/linux/sem.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/include/uapi/linux/sem.h b/include/uapi/linux/sem.h index 541fce0..a4f47d9 100644 --- a/include/uapi/linux/sem.h +++ b/include/uapi/linux/sem.h @@ -63,10 +63,10 @@ struct seminfo { int semaem; }; -#define SEMMNI 128 /* <= IPCMNI max # of semaphore identifiers */ -#define SEMMSL 250 /* <= 8 000 max num of semaphores per id */ +#define SEMMNI 32000 /* <= IPCMNI max # of semaphore identifiers */ +#define SEMMSL 32000 /* <= INT_MAX max num of semaphores per id */ #define SEMMNS (SEMMNI*SEMMSL) /* <= INT_MAX max # of semaphores in system */ -#define SEMOPM 32 /* <= 1 000 max num of ops per semop call */ +#define SEMOPM 1000 /* <= 1 000 max num of ops per semop call */ #define SEMVMX 32767 /* <= 32767 semaphore maximum value */ #define SEMAEM SEMVMX /* adjust on exit max value */ -- 1.9.3 ^ permalink raw reply related [flat|nested] 10+ messages in thread
end of thread, other threads:[~2014-10-09 23:07 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2014-10-06 18:32 [PATCH 0/3] Misc updates to sysv ipc Manfred Spraul 2014-10-06 18:32 ` [PATCH 1/3] ipc/sem.c: Chance memory barrier in sem_lock() to smp_rmb() Manfred Spraul 2014-10-06 18:32 ` [PATCH 2/3] ipc/sem.c: increase SEMMSL, SEMMNI, SEMOPM Manfred Spraul 2014-10-06 18:32 ` [PATCH 3/3] ipc/msg: increase MSGMNI, remove scaling Manfred Spraul 2014-10-07 20:09 ` [PATCH 2/3] ipc/sem.c: increase SEMMSL, SEMMNI, SEMOPM Rafael Aquini 2014-10-07 20:08 ` [PATCH 1/3] ipc/sem.c: Chance memory barrier in sem_lock() to smp_rmb() Rafael Aquini 2014-10-09 23:07 ` Davidlohr Bueso -- strict thread matches above, loose matches on Subject: below -- 2014-08-12 7:29 [PATCH 0/3] ipc: Further updates to sysv/mqueue limits Manfred Spraul 2014-08-12 7:29 ` [PATCH 1/3] ipc/msg: increase MSGMNI, remove scaling Manfred Spraul 2014-08-12 7:29 ` [PATCH 2/3] ipc/sem.c: increase SEMMSL, SEMMNI, SEMOPM Manfred Spraul 2014-08-15 13:42 ` Rafael Aquini 2014-05-29 18:46 [PATCH 0/3] ipc: Further updates to sysv/mqueue limits Manfred Spraul 2014-05-29 18:46 ` [PATCH 1/3] ipc/msg: increase MSGMNI, remove scaling Manfred Spraul 2014-05-29 18:46 ` [PATCH 2/3] ipc/sem.c: increase SEMMSL, SEMMNI, SEMOPM Manfred Spraul
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).