* [PATCH -next v3 0/4] rhashtable: guarantee initial allocation
@ 2018-06-21 21:28 Davidlohr Bueso
  2018-06-21 21:28 ` [PATCH 1/4] lib/rhashtable: simplify bucket_table_alloc() Davidlohr Bueso
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Davidlohr Bueso @ 2018-06-21 21:28 UTC (permalink / raw)
  To: akpm, torvalds
  Cc: tgraf, herbert, manfred, mhocko, guillaume.knispel, linux-api,
	linux-kernel, dave

Changes from v2:
lkml.kernel.org/r/20180601160125.30031-1-dave@stgolabs.net

- Dropped original patch 1 (which removed the EINVAL returns
  for WARN_ON) as well as patch 5.
- Added a new patch 1, which simplifies the alloc after mhocko's
  changes (originally suggested by Linus).
- Use max_t instead of min_t for the resize.

Changes from v1:
lkml.kernel.org/r/20180524211135.27760-1-dave@stgolabs.net

- patch 2 is reworked a bit based on the comments from Herbert Xu.
  o upon failure, retry immediately with GFP_NOFAIL (simpler)
  o the caller now passes the needed semantics, not bucket_table_alloc().
  o we consider min_size when resizing, not just HASH_MIN_SIZE.

- removed patch 3; not needed after Michal's patch.

Hi,

This series is the result of the discussion with Linus around ipc
subsystem initialization and how it handles an error return from
rhashtable_init()[1]. Instead of caring about the error or calling
the infamous BUG_ON, Linus suggested we guarantee the rhashtable
allocation.

The first two patches deal with the rhashtable_init() changes, while
patches 3 and 4 remove some ipc hacks we no longer need.
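
To illustrate the end result (just a sketch; the real changes are in
the patches below), initialization on the ipc side no longer needs any
error handling. For example, ipc_init_ids() ends up looking roughly
like:

	void ipc_init_ids(struct ipc_ids *ids)
	{
		ids->in_use = 0;
		ids->seq = 0;
		init_rwsem(&ids->rwsem);
		rhashtable_init(&ids->key_ht, &ipc_kht_params);
		idr_init(&ids->ipcs_idr);
		ids->max_id = -1;
	}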

Please consider for v4.19.

[1] https://lkml.org/lkml/2018/5/23/758

Thanks!

Davidlohr Bueso (4):
  lib/rhashtable: simplify bucket_table_alloc()
  lib/rhashtable: guarantee initial hashtable allocation
  ipc: get rid of ids->tables_initialized hack
  ipc: simplify ipc initialization

 include/linux/ipc_namespace.h |  1 -
 ipc/msg.c                     |  9 ++++-----
 ipc/namespace.c               | 20 ++++----------------
 ipc/sem.c                     | 10 ++++------
 ipc/shm.c                     |  9 ++++-----
 ipc/util.c                    | 41 +++++++++++++----------------------------
 ipc/util.h                    | 18 +++++++++---------
 lib/rhashtable.c              | 19 ++++++++++++-------
 8 files changed, 50 insertions(+), 77 deletions(-)

-- 
2.16.4


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/4] lib/rhashtable: simplify bucket_table_alloc()
  2018-06-21 21:28 [PATCH -next v3 0/4] rhashtable: guarantee initial allocation Davidlohr Bueso
@ 2018-06-21 21:28 ` Davidlohr Bueso
  2018-06-21 21:33   ` Randy Dunlap
                     ` (2 more replies)
  2018-06-21 21:28 ` [PATCH 2/4] lib/rhashtable: guarantee initial hashtable allocation Davidlohr Bueso
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 14+ messages in thread
From: Davidlohr Bueso @ 2018-06-21 21:28 UTC (permalink / raw)
  To: akpm, torvalds
  Cc: tgraf, herbert, manfred, mhocko, guillaume.knispel, linux-api,
	linux-kernel, dave, Davidlohr Bueso

As of ce91f6ee5 (mm: kvmalloc does not fallback to vmalloc for incompatible gfp flag),
we can simplify the caller and trust kvzalloc() to just do the right thing.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 lib/rhashtable.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 9427b5766134..26c9cd8a985a 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -175,10 +175,7 @@ static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
 	int i;
 
 	size = sizeof(*tbl) + nbuckets * sizeof(tbl->buckets[0]);
-	if (gfp != GFP_KERNEL)
-		tbl = kzalloc(size, gfp | __GFP_NOWARN | __GFP_NORETRY);
-	else
-		tbl = kvzalloc(size, gfp);
+	tbl = kvzalloc(size, gfp);
 
 	size = nbuckets;
 
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/4] lib/rhashtable: guarantee initial hashtable allocation
  2018-06-21 21:28 [PATCH -next v3 0/4] rhashtable: guarantee initial allocation Davidlohr Bueso
  2018-06-21 21:28 ` [PATCH 1/4] lib/rhashtable: simplify bucket_table_alloc() Davidlohr Bueso
@ 2018-06-21 21:28 ` Davidlohr Bueso
  2018-06-22  6:54   ` Herbert Xu
  2018-06-21 21:28 ` [PATCH 3/4] ipc: get rid of ids->tables_initialized hack Davidlohr Bueso
  2018-06-21 21:28 ` [PATCH 4/4] ipc: simplify ipc initialization Davidlohr Bueso
  3 siblings, 1 reply; 14+ messages in thread
From: Davidlohr Bueso @ 2018-06-21 21:28 UTC (permalink / raw)
  To: akpm, torvalds
  Cc: tgraf, herbert, manfred, mhocko, guillaume.knispel, linux-api,
	linux-kernel, dave, Davidlohr Bueso

rhashtable_init() may fail due to -ENOMEM, thus making the
entire api unusable. This patch removes this scenario,
however unlikely. In order to guarantee memory allocation,
this patch always ends up doing GFP_KERNEL|__GFP_NOFAIL
for both the tbl as well as alloc_bucket_spinlocks().

Upon the first table allocation failure, we shrink the
size to the smallest value that makes sense and retry with
__GFP_NOFAIL semantics. With the defaults, this means that
from 64 buckets, we retry with only 4. Any later issues
regarding performance due to collisions or larger table
resizing (when more memory becomes available) are the least
of our problems.
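
For reference, __GFP_NOFAIL instructs the allocator to keep retrying
rather than return NULL, so the second, shrunken attempt cannot fail.
A minimal sketch of the semantics (the actual change is in the diff
below):

	/* never returns NULL; blocks until the small allocation succeeds */
	tbl = kvzalloc(size, GFP_KERNEL | __GFP_NOFAIL);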

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 lib/rhashtable.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 26c9cd8a985a..411c4041ce83 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -179,10 +179,11 @@ static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
 
 	size = nbuckets;
 
-	if (tbl == NULL && gfp != GFP_KERNEL) {
+	if (tbl == NULL && (gfp & ~__GFP_NOFAIL) != GFP_KERNEL) {
 		tbl = nested_bucket_table_alloc(ht, nbuckets, gfp);
 		nbuckets = 0;
 	}
+
 	if (tbl == NULL)
 		return NULL;
 
@@ -1065,9 +1066,16 @@ int rhashtable_init(struct rhashtable *ht,
 		}
 	}
 
+	/*
+	 * This is api initialization and thus we need to guarantee the
+	 * initial rhashtable allocation. Upon failure, retry with the
+	 * smallest possible size with __GFP_NOFAIL semantics.
+	 */
 	tbl = bucket_table_alloc(ht, size, GFP_KERNEL);
-	if (tbl == NULL)
-		return -ENOMEM;
+	if (unlikely(tbl == NULL)) {
+		size = max_t(u16, ht->p.min_size, HASH_MIN_SIZE);
+		tbl = bucket_table_alloc(ht, size, GFP_KERNEL | __GFP_NOFAIL);
+	}
 
 	atomic_set(&ht->nelems, 0);
 
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/4] ipc: get rid of ids->tables_initialized hack
  2018-06-21 21:28 [PATCH -next v3 0/4] rhashtable: guarantee initial allocation Davidlohr Bueso
  2018-06-21 21:28 ` [PATCH 1/4] lib/rhashtable: simplify bucket_table_alloc() Davidlohr Bueso
  2018-06-21 21:28 ` [PATCH 2/4] lib/rhashtable: guarantee initial hashtable allocation Davidlohr Bueso
@ 2018-06-21 21:28 ` Davidlohr Bueso
  2018-06-21 21:28 ` [PATCH 4/4] ipc: simplify ipc initialization Davidlohr Bueso
  3 siblings, 0 replies; 14+ messages in thread
From: Davidlohr Bueso @ 2018-06-21 21:28 UTC (permalink / raw)
  To: akpm, torvalds
  Cc: tgraf, herbert, manfred, mhocko, guillaume.knispel, linux-api,
	linux-kernel, dave, Davidlohr Bueso

In sysvipc we have an ids->tables_initialized flag for the
rhashtable, introduced in:

    0cfb6aee70b (ipc: optimize semget/shmget/msgget for lots of keys).

It is there, specifically, to prevent NULL pointer dereferences
from using an uninitialized api. Considering how rhashtable_init()
can fail (probably due to ENOMEM, if anything), this made the
overall ipc initialization capable of failure as well. That alone
is ugly but fine; however, I've spotted a few issues regarding the
semantics of tables_initialized (however unlikely they may be):

- There is an inconsistency in what we return to userspace: ipc_addid()
returns ENOSPC, which is certainly _wrong_, while ipc_obtain_object_idr()
returns EINVAL.

- After we started using rhashtables, ipc_findkey() can return NULL upon
!tables_initialized, but the caller expects NULL when the ipc structure
isn't found, and can therefore call into the ipcget() callbacks.

Now that rhashtable initialization cannot fail, we can properly
get rid of the hack altogether.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 include/linux/ipc_namespace.h |  1 -
 ipc/util.c                    | 23 ++++++++---------------
 2 files changed, 8 insertions(+), 16 deletions(-)

diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h
index b5630c8eb2f3..37f3a4b7c637 100644
--- a/include/linux/ipc_namespace.h
+++ b/include/linux/ipc_namespace.h
@@ -16,7 +16,6 @@ struct user_namespace;
 struct ipc_ids {
 	int in_use;
 	unsigned short seq;
-	bool tables_initialized;
 	struct rw_semaphore rwsem;
 	struct idr ipcs_idr;
 	int max_id;
diff --git a/ipc/util.c b/ipc/util.c
index 4e81182fa0ac..823e09e72c58 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -125,7 +125,6 @@ int ipc_init_ids(struct ipc_ids *ids)
 	if (err)
 		return err;
 	idr_init(&ids->ipcs_idr);
-	ids->tables_initialized = true;
 	ids->max_id = -1;
 #ifdef CONFIG_CHECKPOINT_RESTORE
 	ids->next_id = -1;
@@ -178,19 +177,16 @@ void __init ipc_init_proc_interface(const char *path, const char *header,
  */
 static struct kern_ipc_perm *ipc_findkey(struct ipc_ids *ids, key_t key)
 {
-	struct kern_ipc_perm *ipcp = NULL;
+	struct kern_ipc_perm *ipcp;
 
-	if (likely(ids->tables_initialized))
-		ipcp = rhashtable_lookup_fast(&ids->key_ht, &key,
+	ipcp = rhashtable_lookup_fast(&ids->key_ht, &key,
 					      ipc_kht_params);
+	if (!ipcp)
+		return NULL;
 
-	if (ipcp) {
-		rcu_read_lock();
-		ipc_lock_object(ipcp);
-		return ipcp;
-	}
-
-	return NULL;
+	rcu_read_lock();
+	ipc_lock_object(ipcp);
+	return ipcp;
 }
 
 #ifdef CONFIG_CHECKPOINT_RESTORE
@@ -255,7 +251,7 @@ int ipc_addid(struct ipc_ids *ids, struct kern_ipc_perm *new, int limit)
 	if (limit > IPCMNI)
 		limit = IPCMNI;
 
-	if (!ids->tables_initialized || ids->in_use >= limit)
+	if (ids->in_use >= limit)
 		return -ENOSPC;
 
 	idr_preload(GFP_KERNEL);
@@ -566,9 +562,6 @@ struct kern_ipc_perm *ipc_obtain_object_idr(struct ipc_ids *ids, int id)
 	struct kern_ipc_perm *out;
 	int lid = ipcid_to_idx(id);
 
-	if (unlikely(!ids->tables_initialized))
-		return ERR_PTR(-EINVAL);
-
 	out = idr_find(&ids->ipcs_idr, lid);
 	if (!out)
 		return ERR_PTR(-EINVAL);
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 4/4] ipc: simplify ipc initialization
  2018-06-21 21:28 [PATCH -next v3 0/4] rhashtable: guarantee initial allocation Davidlohr Bueso
                   ` (2 preceding siblings ...)
  2018-06-21 21:28 ` [PATCH 3/4] ipc: get rid of ids->tables_initialized hack Davidlohr Bueso
@ 2018-06-21 21:28 ` Davidlohr Bueso
  3 siblings, 0 replies; 14+ messages in thread
From: Davidlohr Bueso @ 2018-06-21 21:28 UTC (permalink / raw)
  To: akpm, torvalds
  Cc: tgraf, herbert, manfred, mhocko, guillaume.knispel, linux-api,
	linux-kernel, dave, Davidlohr Bueso

Now that we know that rhashtable_init() will not fail, we
can get rid of a lot of the unnecessary cleanup paths that
dealt with the call erroring out.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 ipc/msg.c       |  9 ++++-----
 ipc/namespace.c | 20 ++++----------------
 ipc/sem.c       | 10 ++++------
 ipc/shm.c       |  9 ++++-----
 ipc/util.c      | 18 +++++-------------
 ipc/util.h      | 18 +++++++++---------
 6 files changed, 30 insertions(+), 54 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index 3b6545302598..62545ce19173 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -1228,7 +1228,7 @@ COMPAT_SYSCALL_DEFINE5(msgrcv, int, msqid, compat_uptr_t, msgp,
 }
 #endif
 
-int msg_init_ns(struct ipc_namespace *ns)
+void msg_init_ns(struct ipc_namespace *ns)
 {
 	ns->msg_ctlmax = MSGMAX;
 	ns->msg_ctlmnb = MSGMNB;
@@ -1236,7 +1236,7 @@ int msg_init_ns(struct ipc_namespace *ns)
 
 	atomic_set(&ns->msg_bytes, 0);
 	atomic_set(&ns->msg_hdrs, 0);
-	return ipc_init_ids(&ns->ids[IPC_MSG_IDS]);
+	ipc_init_ids(&ns->ids[IPC_MSG_IDS]);
 }
 
 #ifdef CONFIG_IPC_NS
@@ -1277,12 +1277,11 @@ static int sysvipc_msg_proc_show(struct seq_file *s, void *it)
 }
 #endif
 
-int __init msg_init(void)
+void __init msg_init(void)
 {
-	const int err = msg_init_ns(&init_ipc_ns);
+	msg_init_ns(&init_ipc_ns);
 
 	ipc_init_proc_interface("sysvipc/msg",
 				"       key      msqid perms      cbytes       qnum lspid lrpid   uid   gid  cuid  cgid      stime      rtime      ctime\n",
 				IPC_MSG_IDS, sysvipc_msg_proc_show);
-	return err;
 }
diff --git a/ipc/namespace.c b/ipc/namespace.c
index f59a89966f92..21607791d62c 100644
--- a/ipc/namespace.c
+++ b/ipc/namespace.c
@@ -55,28 +55,16 @@ static struct ipc_namespace *create_ipc_ns(struct user_namespace *user_ns,
 	ns->user_ns = get_user_ns(user_ns);
 	ns->ucounts = ucounts;
 
-	err = sem_init_ns(ns);
+	err = mq_init_ns(ns);
 	if (err)
 		goto fail_put;
-	err = msg_init_ns(ns);
-	if (err)
-		goto fail_destroy_sem;
-	err = shm_init_ns(ns);
-	if (err)
-		goto fail_destroy_msg;
 
-	err = mq_init_ns(ns);
-	if (err)
-		goto fail_destroy_shm;
+	sem_init_ns(ns);
+	msg_init_ns(ns);
+	shm_init_ns(ns);
 
 	return ns;
 
-fail_destroy_shm:
-	shm_exit_ns(ns);
-fail_destroy_msg:
-	msg_exit_ns(ns);
-fail_destroy_sem:
-	sem_exit_ns(ns);
 fail_put:
 	put_user_ns(ns->user_ns);
 	ns_free_inum(&ns->ns);
diff --git a/ipc/sem.c b/ipc/sem.c
index 5af1943ad782..0f3a09b56582 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -220,14 +220,14 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it);
 #define sc_semopm	sem_ctls[2]
 #define sc_semmni	sem_ctls[3]
 
-int sem_init_ns(struct ipc_namespace *ns)
+void sem_init_ns(struct ipc_namespace *ns)
 {
 	ns->sc_semmsl = SEMMSL;
 	ns->sc_semmns = SEMMNS;
 	ns->sc_semopm = SEMOPM;
 	ns->sc_semmni = SEMMNI;
 	ns->used_sems = 0;
-	return ipc_init_ids(&ns->ids[IPC_SEM_IDS]);
+	ipc_init_ids(&ns->ids[IPC_SEM_IDS]);
 }
 
 #ifdef CONFIG_IPC_NS
@@ -239,14 +239,12 @@ void sem_exit_ns(struct ipc_namespace *ns)
 }
 #endif
 
-int __init sem_init(void)
+void __init sem_init(void)
 {
-	const int err = sem_init_ns(&init_ipc_ns);
-
+	sem_init_ns(&init_ipc_ns);
 	ipc_init_proc_interface("sysvipc/sem",
 				"       key      semid perms      nsems   uid   gid  cuid  cgid      otime      ctime\n",
 				IPC_SEM_IDS, sysvipc_sem_proc_show);
-	return err;
 }
 
 /**
diff --git a/ipc/shm.c b/ipc/shm.c
index 051a3e1fb8df..e602a6fc3991 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -95,14 +95,14 @@ static void shm_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp);
 static int sysvipc_shm_proc_show(struct seq_file *s, void *it);
 #endif
 
-int shm_init_ns(struct ipc_namespace *ns)
+void shm_init_ns(struct ipc_namespace *ns)
 {
 	ns->shm_ctlmax = SHMMAX;
 	ns->shm_ctlall = SHMALL;
 	ns->shm_ctlmni = SHMMNI;
 	ns->shm_rmid_forced = 0;
 	ns->shm_tot = 0;
-	return ipc_init_ids(&shm_ids(ns));
+	ipc_init_ids(&shm_ids(ns));
 }
 
 /*
@@ -135,9 +135,8 @@ void shm_exit_ns(struct ipc_namespace *ns)
 
 static int __init ipc_ns_init(void)
 {
-	const int err = shm_init_ns(&init_ipc_ns);
-	WARN(err, "ipc: sysv shm_init_ns failed: %d\n", err);
-	return err;
+	shm_init_ns(&init_ipc_ns);
+	return 0;
 }
 
 pure_initcall(ipc_ns_init);
diff --git a/ipc/util.c b/ipc/util.c
index 823e09e72c58..06d7c575847c 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -87,16 +87,12 @@ struct ipc_proc_iface {
  */
 static int __init ipc_init(void)
 {
-	int err_sem, err_msg;
-
 	proc_mkdir("sysvipc", NULL);
-	err_sem = sem_init();
-	WARN(err_sem, "ipc: sysv sem_init failed: %d\n", err_sem);
-	err_msg = msg_init();
-	WARN(err_msg, "ipc: sysv msg_init failed: %d\n", err_msg);
+	sem_init();
+	msg_init();
 	shm_init();
 
-	return err_msg ? err_msg : err_sem;
+	return 0;
 }
 device_initcall(ipc_init);
 
@@ -115,21 +111,17 @@ static const struct rhashtable_params ipc_kht_params = {
  * Set up the sequence range to use for the ipc identifier range (limited
  * below IPCMNI) then initialise the keys hashtable and ids idr.
  */
-int ipc_init_ids(struct ipc_ids *ids)
+void ipc_init_ids(struct ipc_ids *ids)
 {
-	int err;
 	ids->in_use = 0;
 	ids->seq = 0;
 	init_rwsem(&ids->rwsem);
-	err = rhashtable_init(&ids->key_ht, &ipc_kht_params);
-	if (err)
-		return err;
+	rhashtable_init(&ids->key_ht, &ipc_kht_params);
 	idr_init(&ids->ipcs_idr);
 	ids->max_id = -1;
 #ifdef CONFIG_CHECKPOINT_RESTORE
 	ids->next_id = -1;
 #endif
-	return 0;
 }
 
 #ifdef CONFIG_PROC_FS
diff --git a/ipc/util.h b/ipc/util.h
index 0aba3230d007..65fad8a94da8 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -18,8 +18,8 @@
 #define IPCMNI 32768  /* <= MAX_INT limit for ipc arrays (including sysctl changes) */
 #define SEQ_MULTIPLIER	(IPCMNI)
 
-int sem_init(void);
-int msg_init(void);
+void sem_init(void);
+void msg_init(void);
 void shm_init(void);
 
 struct ipc_namespace;
@@ -34,17 +34,17 @@ static inline void mq_put_mnt(struct ipc_namespace *ns) { }
 #endif
 
 #ifdef CONFIG_SYSVIPC
-int sem_init_ns(struct ipc_namespace *ns);
-int msg_init_ns(struct ipc_namespace *ns);
-int shm_init_ns(struct ipc_namespace *ns);
+void sem_init_ns(struct ipc_namespace *ns);
+void msg_init_ns(struct ipc_namespace *ns);
+void shm_init_ns(struct ipc_namespace *ns);
 
 void sem_exit_ns(struct ipc_namespace *ns);
 void msg_exit_ns(struct ipc_namespace *ns);
 void shm_exit_ns(struct ipc_namespace *ns);
 #else
-static inline int sem_init_ns(struct ipc_namespace *ns) { return 0; }
-static inline int msg_init_ns(struct ipc_namespace *ns) { return 0; }
-static inline int shm_init_ns(struct ipc_namespace *ns) { return 0; }
+static inline void sem_init_ns(struct ipc_namespace *ns) { }
+static inline void msg_init_ns(struct ipc_namespace *ns) { }
+static inline void shm_init_ns(struct ipc_namespace *ns) { }
 
 static inline void sem_exit_ns(struct ipc_namespace *ns) { }
 static inline void msg_exit_ns(struct ipc_namespace *ns) { }
@@ -83,7 +83,7 @@ struct ipc_ops {
 struct seq_file;
 struct ipc_ids;
 
-int ipc_init_ids(struct ipc_ids *);
+void ipc_init_ids(struct ipc_ids *);
 #ifdef CONFIG_PROC_FS
 void __init ipc_init_proc_interface(const char *path, const char *header,
 		int ids, int (*show)(struct seq_file *, void *));
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] lib/rhashtable: simplify bucket_table_alloc()
  2018-06-21 21:28 ` [PATCH 1/4] lib/rhashtable: simplify bucket_table_alloc() Davidlohr Bueso
@ 2018-06-21 21:33   ` Randy Dunlap
  2018-06-22  6:04     ` NeilBrown
  2018-06-22 18:15   ` [PATCH v2 " Davidlohr Bueso
  2 siblings, 0 replies; 14+ messages in thread
From: Randy Dunlap @ 2018-06-21 21:33 UTC (permalink / raw)
  To: Davidlohr Bueso, akpm, torvalds
  Cc: tgraf, herbert, manfred, mhocko, guillaume.knispel, linux-api,
	linux-kernel, Davidlohr Bueso

On 06/21/2018 02:28 PM, Davidlohr Bueso wrote:
> As of ce91f6ee5 (mm: kvmalloc does not fallback to vmalloc for incompatible gfp flag),
> we can simplify the caller and trust kvzalloc() to just do the right thing.

Hi,
JFYI, we are using 12-digit (12-character) commit IDs nowadays...


> Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
> ---
>  lib/rhashtable.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/lib/rhashtable.c b/lib/rhashtable.c
> index 9427b5766134..26c9cd8a985a 100644
> --- a/lib/rhashtable.c
> +++ b/lib/rhashtable.c
> @@ -175,10 +175,7 @@ static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
>  	int i;
>  
>  	size = sizeof(*tbl) + nbuckets * sizeof(tbl->buckets[0]);
> -	if (gfp != GFP_KERNEL)
> -		tbl = kzalloc(size, gfp | __GFP_NOWARN | __GFP_NORETRY);
> -	else
> -		tbl = kvzalloc(size, gfp);
> +	tbl = kvzalloc(size, gfp);
>  
>  	size = nbuckets;
>  
> 


-- 
~Randy

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] lib/rhashtable: simplify bucket_table_alloc()
  2018-06-21 21:28 ` [PATCH 1/4] lib/rhashtable: simplify bucket_table_alloc() Davidlohr Bueso
@ 2018-06-22  6:04     ` NeilBrown
  2018-06-22  6:04     ` NeilBrown
  2018-06-22 18:15   ` [PATCH v2 " Davidlohr Bueso
  2 siblings, 0 replies; 14+ messages in thread
From: NeilBrown @ 2018-06-22  6:04 UTC (permalink / raw)
  To: Davidlohr Bueso, akpm, torvalds
  Cc: tgraf, herbert, manfred, mhocko, guillaume.knispel, linux-api,
	linux-kernel, dave, Davidlohr Bueso

[-- Attachment #1: Type: text/plain, Size: 1279 bytes --]

On Thu, Jun 21 2018, Davidlohr Bueso wrote:

> As of ce91f6ee5 (mm: kvmalloc does not fallback to vmalloc for incompatible gfp flag),
> we can simplify the caller and trust kvzalloc() to just do the right thing.

Hi,
 it isn't clear to me that this is true.
 With this change we lose __GFP_NOWARN and __GFP_NORETRY.
 I doubt the NORETRY is particularly important, as in this case, if it
 isn't GFP_KERNEL, then it is GFP_ATOMIC, which doesn't retry anyway.
 However I cannot see why this patch won't result in warnings when the
 kzalloc() fails.
 What am I missing?

Thanks,
NeilBrown


>
> Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
> ---
>  lib/rhashtable.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/lib/rhashtable.c b/lib/rhashtable.c
> index 9427b5766134..26c9cd8a985a 100644
> --- a/lib/rhashtable.c
> +++ b/lib/rhashtable.c
> @@ -175,10 +175,7 @@ static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
>  	int i;
>  
>  	size = sizeof(*tbl) + nbuckets * sizeof(tbl->buckets[0]);
> -	if (gfp != GFP_KERNEL)
> -		tbl = kzalloc(size, gfp | __GFP_NOWARN | __GFP_NORETRY);
> -	else
> -		tbl = kvzalloc(size, gfp);
> +	tbl = kvzalloc(size, gfp);
>  
>  	size = nbuckets;
>  
> -- 
> 2.16.4

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] lib/rhashtable: simplify bucket_table_alloc()
@ 2018-06-22  6:04     ` NeilBrown
  0 siblings, 0 replies; 14+ messages in thread
From: NeilBrown @ 2018-06-22  6:04 UTC (permalink / raw)
  To: akpm, torvalds
  Cc: tgraf, herbert, manfred, mhocko, guillaume.knispel, linux-api,
	linux-kernel, dave, Davidlohr Bueso

[-- Attachment #1: Type: text/plain, Size: 1279 bytes --]

On Thu, Jun 21 2018, Davidlohr Bueso wrote:

> As of ce91f6ee5 (mm: kvmalloc does not fallback to vmalloc for incompatible gfp flag),
> we can simplify the caller and trust kvzalloc() to just do the right thing.

Hi,
 it isn't clear to me that this is true.
 With this change we lose __GFP_NOWARN and __GFP_NORETRY.
 I doubt the NORETRY is particularly important, as in this case, if it
 isn't GFP_KERNEL, then it is GFP_ATOMIC, which doesn't retry anyway.
 However I cannot see why this patch won't result in warnings when the
 kzalloc() fails.
 What am I missing?

Thanks,
NeilBrown


>
> Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
> ---
>  lib/rhashtable.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/lib/rhashtable.c b/lib/rhashtable.c
> index 9427b5766134..26c9cd8a985a 100644
> --- a/lib/rhashtable.c
> +++ b/lib/rhashtable.c
> @@ -175,10 +175,7 @@ static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
>  	int i;
>  
>  	size = sizeof(*tbl) + nbuckets * sizeof(tbl->buckets[0]);
> -	if (gfp != GFP_KERNEL)
> -		tbl = kzalloc(size, gfp | __GFP_NOWARN | __GFP_NORETRY);
> -	else
> -		tbl = kvzalloc(size, gfp);
> +	tbl = kvzalloc(size, gfp);
>  
>  	size = nbuckets;
>  
> -- 
> 2.16.4

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] lib/rhashtable: simplify bucket_table_alloc()
  2018-06-22  6:04     ` NeilBrown
  (?)
@ 2018-06-22  6:36     ` Davidlohr Bueso
  -1 siblings, 0 replies; 14+ messages in thread
From: Davidlohr Bueso @ 2018-06-22  6:36 UTC (permalink / raw)
  To: NeilBrown
  Cc: akpm, torvalds, tgraf, herbert, manfred, mhocko,
	guillaume.knispel, linux-api, linux-kernel, Davidlohr Bueso

[-- Attachment #1: Type: text/plain, Size: 804 bytes --]

On Fri, 22 Jun 2018, NeilBrown wrote:

>On Thu, Jun 21 2018, Davidlohr Bueso wrote:
>
>> As of ce91f6ee5 (mm: kvmalloc does not fallback to vmalloc for incompatible gfp flag),
>> we can simplify the caller and trust kvzalloc() to just do the right thing.
>
>Hi,
> it isn't clear to me that this is true.
> With this change we lose __GFP_NOWARN and __GFP_NORETRY.
> I doubt the NORETRY is particularly important, as in this case, if it
> isn't GFP_KERNEL, then it is GFP_ATOMIC, which doesn't retry anyway.
> However I cannot see why this patch won't result in warnings when the
> kzalloc() fails.
> What am I missing?

You're right, it might be too aggressive to get rid of the GFP_NOWARN for
the callers that do GFP_ATOMIC.

I'll send a new version of this patch along with a better changelog.

Thanks,
Davidlohr

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/4] lib/rhashtable: guarantee initial hashtable allocation
  2018-06-21 21:28 ` [PATCH 2/4] lib/rhashtable: guarantee initial hashtable allocation Davidlohr Bueso
@ 2018-06-22  6:54   ` Herbert Xu
  0 siblings, 0 replies; 14+ messages in thread
From: Herbert Xu @ 2018-06-22  6:54 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: akpm, torvalds, tgraf, manfred, mhocko, guillaume.knispel,
	linux-api, linux-kernel, Davidlohr Bueso

On Thu, Jun 21, 2018 at 02:28:23PM -0700, Davidlohr Bueso wrote:
> rhashtable_init() may fail due to -ENOMEM, thus making the
> entire api unusable. This patch removes this scenario,
> however unlikely. In order to guarantee memory allocation,
> this patch always ends up doing GFP_KERNEL|__GFP_NOFAIL
> for both the tbl as well as alloc_bucket_spinlocks().
> 
> Upon the first table allocation failure, we shrink the
> size to the smallest value that makes sense and retry with
> __GFP_NOFAIL semantics. With the defaults, this means that
> from 64 buckets, we retry with only 4. Any later issues
> regarding performance due to collisions or larger table
> resizing (when more memory becomes available) are the least
> of our problems.
> 
> Signed-off-by: Davidlohr Bueso <dbueso@suse.de>

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 1/4] lib/rhashtable: simplify bucket_table_alloc()
  2018-06-21 21:28 ` [PATCH 1/4] lib/rhashtable: simplify bucket_table_alloc() Davidlohr Bueso
  2018-06-21 21:33   ` Randy Dunlap
  2018-06-22  6:04     ` NeilBrown
@ 2018-06-22 18:15   ` Davidlohr Bueso
  2018-06-22 18:16     ` Davidlohr Bueso
                       ` (2 more replies)
  2 siblings, 3 replies; 14+ messages in thread
From: Davidlohr Bueso @ 2018-06-22 18:15 UTC (permalink / raw)
  To: akpm, torvalds
  Cc: tgraf, herbert, manfred, mhocko, guillaume.knispel, linux-api,
	linux-kernel, Davidlohr Bueso

As of ce91f6ee5b3 (mm: kvmalloc does not fallback to vmalloc for incompatible gfp flag),
we can simplify the caller and trust kvzalloc() to just do the right thing. For
GFP_ATOMIC contexts, we can drop the __GFP_NORETRY flag for obvious reasons; for
__GFP_NOWARN, it is now the caller that passes the flag instead of
bucket_table_alloc() adding it.

This slightly changes the gfp flags passed on to nested_table_alloc(), which will now
also use GFP_ATOMIC | __GFP_NOWARN. However, I consider this a positive consequence,
as we want nowarn semantics there for the same reasons as in bucket_table_alloc().

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---

v2:
- Changes based on Neil's concerns about keeping nowarn flag.
- Better changelog.


 lib/rhashtable.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 9427b5766134..083f871491a1 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -175,10 +175,7 @@ static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
 	int i;
 
 	size = sizeof(*tbl) + nbuckets * sizeof(tbl->buckets[0]);
-	if (gfp != GFP_KERNEL)
-		tbl = kzalloc(size, gfp | __GFP_NOWARN | __GFP_NORETRY);
-	else
-		tbl = kvzalloc(size, gfp);
+	tbl = kvzalloc(size, gfp);
 
 	size = nbuckets;
 
@@ -459,7 +456,7 @@ static int rhashtable_insert_rehash(struct rhashtable *ht,
 
 	err = -ENOMEM;
 
-	new_tbl = bucket_table_alloc(ht, size, GFP_ATOMIC);
+	new_tbl = bucket_table_alloc(ht, size, GFP_ATOMIC | __GFP_NOWARN);
 	if (new_tbl == NULL)
 		goto fail;
 
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/4] lib/rhashtable: simplify bucket_table_alloc()
  2018-06-22 18:15   ` [PATCH v2 " Davidlohr Bueso
@ 2018-06-22 18:16     ` Davidlohr Bueso
  2018-06-22 18:35     ` Davidlohr Bueso
  2018-06-25  9:13     ` Michal Hocko
  2 siblings, 0 replies; 14+ messages in thread
From: Davidlohr Bueso @ 2018-06-22 18:16 UTC (permalink / raw)
  To: akpm, torvalds
  Cc: tgraf, herbert, manfred, mhocko, guillaume.knispel, linux-api,
	linux-kernel, Davidlohr Bueso, neilb

Cc'ing Neil.

On Fri, 22 Jun 2018, Davidlohr Bueso wrote:

>As of ce91f6ee5b3 (mm: kvmalloc does not fallback to vmalloc for incompatible gfp flag),
>we can simplify the caller and trust kvzalloc() to just do the right thing. For
>GFP_ATOMIC contexts, we can drop the __GFP_NORETRY flag for obvious reasons; for
>__GFP_NOWARN, it is now the caller that passes the flag instead of
>bucket_table_alloc() adding it.
>
>This slightly changes the gfp flags passed on to nested_table_alloc(), which will now
>also use GFP_ATOMIC | __GFP_NOWARN. However, I consider this a positive consequence,
>as we want nowarn semantics there for the same reasons as in bucket_table_alloc().
>
>Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
>---
>
>v2:
>- Changes based on Neil's concerns about keeping nowarn flag.
>- Better changelog.
>
>
>lib/rhashtable.c | 7 ++-----
>1 file changed, 2 insertions(+), 5 deletions(-)
>
>diff --git a/lib/rhashtable.c b/lib/rhashtable.c
>index 9427b5766134..083f871491a1 100644
>--- a/lib/rhashtable.c
>+++ b/lib/rhashtable.c
>@@ -175,10 +175,7 @@ static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
>	int i;
>
>	size = sizeof(*tbl) + nbuckets * sizeof(tbl->buckets[0]);
>-	if (gfp != GFP_KERNEL)
>-		tbl = kzalloc(size, gfp | __GFP_NOWARN | __GFP_NORETRY);
>-	else
>-		tbl = kvzalloc(size, gfp);
>+	tbl = kvzalloc(size, gfp);
>
>	size = nbuckets;
>
>@@ -459,7 +456,7 @@ static int rhashtable_insert_rehash(struct rhashtable *ht,
>
>	err = -ENOMEM;
>
>-	new_tbl = bucket_table_alloc(ht, size, GFP_ATOMIC);
>+	new_tbl = bucket_table_alloc(ht, size, GFP_ATOMIC | __GFP_NOWARN);
>	if (new_tbl == NULL)
>		goto fail;
>
>-- 
>2.16.4
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/4] lib/rhashtable: simplify bucket_table_alloc()
  2018-06-22 18:15   ` [PATCH v2 " Davidlohr Bueso
  2018-06-22 18:16     ` Davidlohr Bueso
@ 2018-06-22 18:35     ` Davidlohr Bueso
  2018-06-25  9:13     ` Michal Hocko
  2 siblings, 0 replies; 14+ messages in thread
From: Davidlohr Bueso @ 2018-06-22 18:35 UTC (permalink / raw)
  To: akpm, torvalds
  Cc: tgraf, herbert, manfred, mhocko, guillaume.knispel, linux-api,
	linux-kernel, Davidlohr Bueso, neilb

On Fri, 22 Jun 2018, Davidlohr Bueso wrote:

>This slightly changes the gfp flags passed on to nested_table_alloc(), which will now
>also use GFP_ATOMIC | __GFP_NOWARN. However, I consider this a positive consequence,
>as we want nowarn semantics there for the same reasons as in bucket_table_alloc().

If this is not acceptable, we can just keep the caller's current semantics - the
atomic flag could also be labeled 'rehash' or something considering that it comes
only from insert_rehash() when we get EAGAIN after trying to insert the first time:

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 9427b5766134..18740b052aec 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -172,17 +172,15 @@ static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
 {
 	struct bucket_table *tbl = NULL;
 	size_t size, max_locks;
+	bool atomic = (gfp == GFP_ATOMIC);
 	int i;
 
 	size = sizeof(*tbl) + nbuckets * sizeof(tbl->buckets[0]);
-	if (gfp != GFP_KERNEL)
-		tbl = kzalloc(size, gfp | __GFP_NOWARN | __GFP_NORETRY);
-	else
-		tbl = kvzalloc(size, gfp);
+	tbl = kvzalloc(size, atomic ? gfp | __GFP_NOWARN : gfp);
 
 	size = nbuckets;
 
-	if (tbl == NULL && gfp != GFP_KERNEL) {
+	if (tbl == NULL && atomic) {
 		tbl = nested_bucket_table_alloc(ht, nbuckets, gfp);
 		nbuckets = 0;
 	}



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/4] lib/rhashtable: simplify bucket_table_alloc()
  2018-06-22 18:15   ` [PATCH v2 " Davidlohr Bueso
  2018-06-22 18:16     ` Davidlohr Bueso
  2018-06-22 18:35     ` Davidlohr Bueso
@ 2018-06-25  9:13     ` Michal Hocko
  2 siblings, 0 replies; 14+ messages in thread
From: Michal Hocko @ 2018-06-25  9:13 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: akpm, torvalds, tgraf, herbert, manfred, guillaume.knispel,
	linux-api, linux-kernel, Davidlohr Bueso

On Fri 22-06-18 11:15:40, Davidlohr Bueso wrote:
> As of ce91f6ee5b3 (mm: kvmalloc does not fallback to vmalloc for incompatible gfp flag),
> we can simplify the caller and trust kvzalloc() to just do the right thing. For
> GFP_ATOMIC contexts, we can drop the __GFP_NORETRY flag for obvious reasons; for
> __GFP_NOWARN, it is now the caller that passes the flag instead of
> bucket_table_alloc() adding it.
> 
> This slightly changes the gfp flags passed on to nested_table_alloc(), which will now
> also use GFP_ATOMIC | __GFP_NOWARN. However, I consider this a positive consequence,
> as we want nowarn semantics there for the same reasons as in bucket_table_alloc().
> 
> Signed-off-by: Davidlohr Bueso <dbueso@suse.de>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
> 
> v2:
> - Changes based on Neil's concerns about keeping nowarn flag.
> - Better changelog.
> 
> 
> lib/rhashtable.c | 7 ++-----
> 1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/lib/rhashtable.c b/lib/rhashtable.c
> index 9427b5766134..083f871491a1 100644
> --- a/lib/rhashtable.c
> +++ b/lib/rhashtable.c
> @@ -175,10 +175,7 @@ static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
> 	int i;
> 
> 	size = sizeof(*tbl) + nbuckets * sizeof(tbl->buckets[0]);
> -	if (gfp != GFP_KERNEL)
> -		tbl = kzalloc(size, gfp | __GFP_NOWARN | __GFP_NORETRY);
> -	else
> -		tbl = kvzalloc(size, gfp);
> +	tbl = kvzalloc(size, gfp);
> 
> 	size = nbuckets;
> 
> @@ -459,7 +456,7 @@ static int rhashtable_insert_rehash(struct rhashtable *ht,
> 
> 	err = -ENOMEM;
> 
> -	new_tbl = bucket_table_alloc(ht, size, GFP_ATOMIC);
> +	new_tbl = bucket_table_alloc(ht, size, GFP_ATOMIC | __GFP_NOWARN);
> 	if (new_tbl == NULL)
> 		goto fail;
> 
> -- 
> 2.16.4

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2018-06-25  9:13 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-21 21:28 [PATCH -next v3 0/4] rhashtable: guarantee initial allocation Davidlohr Bueso
2018-06-21 21:28 ` [PATCH 1/4] lib/rhashtable: simplify bucket_table_alloc() Davidlohr Bueso
2018-06-21 21:33   ` Randy Dunlap
2018-06-22  6:04   ` NeilBrown
2018-06-22  6:04     ` NeilBrown
2018-06-22  6:36     ` Davidlohr Bueso
2018-06-22 18:15   ` [PATCH v2 " Davidlohr Bueso
2018-06-22 18:16     ` Davidlohr Bueso
2018-06-22 18:35     ` Davidlohr Bueso
2018-06-25  9:13     ` Michal Hocko
2018-06-21 21:28 ` [PATCH 2/4] lib/rhashtable: guarantee initial hashtable allocation Davidlohr Bueso
2018-06-22  6:54   ` Herbert Xu
2018-06-21 21:28 ` [PATCH 3/4] ipc: get rid of ids->tables_initialized hack Davidlohr Bueso
2018-06-21 21:28 ` [PATCH 4/4] ipc: simplify ipc initialization Davidlohr Bueso
