All of lore.kernel.org
 help / color / mirror / Atom feed
* [kernel-hardening] kmalloc() nofail allocations
@ 2011-08-17 19:15 Vasiliy Kulikov
  2011-08-20 14:27 ` [kernel-hardening] " Vasiliy Kulikov
  0 siblings, 1 reply; 8+ messages in thread
From: Vasiliy Kulikov @ 2011-08-17 19:15 UTC (permalink / raw)
  To: kernel-hardening

Solar,

As a follow up of setuid() subject - I'm thinking about such scheme of
making the kmalloc nofail allocation policy explicit.

Currently small allocations cannot fail:

static inline int
should_alloc_retry(gfp_t gfp_mask, unsigned int order,
				unsigned long pages_reclaimed)
{
    ...
	/*
	 * In this implementation, order <= PAGE_ALLOC_COSTLY_ORDER
	 * means __GFP_NOFAIL, but that may not be true in other
	 * implementations.
	 */
	if (order <= PAGE_ALLOC_COSTLY_ORDER)
		return 1;
    ...
}

But this policy is not explicit and many core developers even don't know
about it.  Last year I sent many patches fixing missing kmalloc()
checks, which were actually redundant because of should_alloc_retry()
logic.  But I got *no* complains that kmalloc() cannot fail.  So, either
it doesn't really guaranteed or a few people know about it.

If the former, we should do something with set_user() error path and
similar places.

If the latter (which is likely case), I suggest this scheme:

For each structure, which is expected to be small, explicitly define
the expectation and use specialized function for the allocations.


In header.h:

struct some_struct {
    ...
};
MALLOC_CANNOT_FAIL(some_struct);

In .c:

struct some_struct *x = kmalloc_nofail(sizeof(some_struct *), GFP_KERNEL);
x->a = 12; /* no check x==NULL */


In generic .h:

#define MALLOC_CANNOT_FAIL(x) \
    BUILD_BUG_ON(sizeof(x) <= (1 << PAGE_ALLOC_COSTLY_ORDER))

static inline
void *kmalloc_nofail(size_t size, int flags)
{
    void *p;
    BUILD_BUG_ON(!__buildin_const_p(size));
    BUILD_BUG_ON(size > (1 << PAGE_ALLOC_COSTLY_ORDER));
    p = kmalloc(size, flags);
    BUG_ON(p == NULL);
    return p;
}


So, this code does 3 things:

1) explicitly exploits the property of nofail allocation for reducing
error handling code.

2) ensures that (1) works in runtime.

3) ensures that "occasional" future changes of allocated structure type
(or subfield types) don't cross NOFAIL size limits.


I'm not sure about 2 things:

1) whether BUG_ON(p == 0) is OK in sense of permormance.  As compared to
the current behaviour, the check is simply moved into kmalloc_nofail()
without any slowdown, but someone may say "hey, why then have _any_
runtime check at all?".

2) kmalloc_nofail() interface.  There was __NOFAIL flag for kmalloc(),
which was removed for some reasons.  So, it needs some investigation.


I expect to get positive feedback from LKML folks as it resolves some
nasty hardly debuggable bugs, which are quite often.


What do you think about it?

Thanks,

-- 
Vasiliy

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [kernel-hardening] Re: kmalloc() nofail allocations
  2011-08-17 19:15 [kernel-hardening] kmalloc() nofail allocations Vasiliy Kulikov
@ 2011-08-20 14:27 ` Vasiliy Kulikov
  2011-08-20 16:31   ` [kernel-hardening] " Solar Designer
  0 siblings, 1 reply; 8+ messages in thread
From: Vasiliy Kulikov @ 2011-08-20 14:27 UTC (permalink / raw)
  To: kernel-hardening

Solar,

Here is a patch to do it.  I've implemented k(m|z)alloc() and
kmem_cache_{z,}alloc() nofail variants.  As a result, setuid() cannot
fail with any reason, but EACCES.

kernel/cred.c is partly moved to _nofail() too, just to show how much
error handling code it removes.

--
 include/linux/slab.h     |   26 ++++++++++++++++++++++++++
 include/linux/slub_def.h |   15 +++++++++++++++
 kernel/cred.c            |   28 ++++++----------------------
 kernel/sys.c             |    2 --
 kernel/user.c            |    8 +-------
 5 files changed, 48 insertions(+), 31 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 573c809..c2a967b 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -189,6 +189,32 @@ size_t ksize(const void *);
 #include <linux/slab_def.h>
 #endif
 
+/*
+ * *_nofail() is used for small allocations, which cannot fail because
+ * allocations of such size are infinitely retried up to the success.
+ * 
+ * Limitation of k*alloc_nofail(): `size' argument must be constant.
+ * If you have a dynamic `size', you have to use common k*alloc().
+ */
+#define kmalloc_nofail(size, flags) \
+({ \
+	void *p; \
+	(void)BUILD_BUG_ON_ZERO(size > (PAGE_SIZE << (PAGE_ALLOC_COSTLY_ORDER-1))); \
+	if (flags & __GFP_NORETRY) \
+		panic("Attempt to call kmalloc_nofail() with __GFP_NORETRY"); \
+	p = kmalloc(size, flags); \
+	if (p == NULL) \
+		panic("kmalloc_nofail() returned NULL\n"); \
+	p; \
+})
+
+#define kzalloc_nofail(size, flags) \
+	kmalloc_nofail(size, (flags | __GFP_ZERO))
+
+
+#define kmem_cache_zalloc_nofail(cache, flags) \
+	kmem_cache_alloc_nofail(cache, (flags | __GFP_ZERO))
+
 /**
  * kcalloc - allocate memory for an array. The memory is set to zero.
  * @n: number of elements.
diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index f58d641..9e63bc5 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -309,4 +309,19 @@ static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node)
 }
 #endif
 
+/* This is a slightly weaker check than kmalloc_nofail() as kmem check is runtime check :\ */
+#define kmem_cache_alloc_nofail(cache, flags) \
+({ \
+	void *p; \
+	if ((cache)->objsize > (PAGE_SIZE << (PAGE_ALLOC_COSTLY_ORDER-1))) \
+		panic("Too big size (%lu) for kmem_cache_alloc_nofail()!", \
+			(long)(cache)->objsize); \
+	if (flags & __GFP_NORETRY); \
+		panic("Attempt to call kmem_cache_alloc_nofail() with __GFP_NORETRY"); \
+	p = kmem_cache_alloc(cache, flags); \
+	if (p == NULL) \
+		panic("kmem_cache_alloc() returned NULL\n"); \
+	p; \
+})
+
 #endif /* _LINUX_SLUB_DEF_H */
diff --git a/kernel/cred.c b/kernel/cred.c
index 8ef31f5..4c2997a 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -239,16 +239,10 @@ struct cred *cred_alloc_blank(void)
 {
 	struct cred *new;
 
-	new = kmem_cache_zalloc(cred_jar, GFP_KERNEL);
-	if (!new)
-		return NULL;
+	new = kmem_cache_zalloc_nofail(cred_jar, GFP_KERNEL);
 
 #ifdef CONFIG_KEYS
-	new->tgcred = kzalloc(sizeof(*new->tgcred), GFP_KERNEL);
-	if (!new->tgcred) {
-		kmem_cache_free(cred_jar, new);
-		return NULL;
-	}
+	new->tgcred = kzalloc_nofail(sizeof(*new->tgcred), GFP_KERNEL);
 	atomic_set(&new->tgcred->usage, 1);
 #endif
 
@@ -289,9 +283,7 @@ struct cred *prepare_creds(void)
 
 	validate_process_creds();
 
-	new = kmem_cache_alloc(cred_jar, GFP_KERNEL);
-	if (!new)
-		return NULL;
+	new = kmem_cache_alloc_nofail(cred_jar, GFP_KERNEL);
 
 	kdebug("prepare_creds() alloc %p", new);
 
@@ -334,9 +326,7 @@ struct cred *prepare_exec_creds(void)
 	struct cred *new;
 
 #ifdef CONFIG_KEYS
-	tgcred = kmalloc(sizeof(*tgcred), GFP_KERNEL);
-	if (!tgcred)
-		return NULL;
+	tgcred = kmalloc_nofail(sizeof(*tgcred), GFP_KERNEL);
 #endif
 
 	new = prepare_creds();
@@ -430,11 +420,7 @@ int copy_creds(struct task_struct *p, unsigned long clone_flags)
 	 * a process - this is slightly icky as we violate COW credentials a
 	 * bit */
 	if (!(clone_flags & CLONE_THREAD)) {
-		tgcred = kmalloc(sizeof(*tgcred), GFP_KERNEL);
-		if (!tgcred) {
-			ret = -ENOMEM;
-			goto error_put;
-		}
+		tgcred = kmalloc_nofail(sizeof(*tgcred), GFP_KERNEL);
 		atomic_set(&tgcred->usage, 1);
 		spin_lock_init(&tgcred->lock);
 		tgcred->process_keyring = NULL;
@@ -647,9 +633,7 @@ struct cred *prepare_kernel_cred(struct task_struct *daemon)
 	const struct cred *old;
 	struct cred *new;
 
-	new = kmem_cache_alloc(cred_jar, GFP_KERNEL);
-	if (!new)
-		return NULL;
+	new = kmem_cache_alloc_nofail(cred_jar, GFP_KERNEL);
 
 	kdebug("prepare_kernel_cred() alloc %p", new);
 
diff --git a/kernel/sys.c b/kernel/sys.c
index dd948a1..d8bca2d 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -618,8 +618,6 @@ static int set_user(struct cred *new)
 	struct user_struct *new_user;
 
 	new_user = alloc_uid(current_user_ns(), new->uid);
-	if (!new_user)
-		return -EAGAIN;
 
 	/*
 	 * We don't fail in case of NPROC limit excess here because too many
diff --git a/kernel/user.c b/kernel/user.c
index 9e03e9c..d79a345 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -146,10 +146,7 @@ struct user_struct *alloc_uid(struct user_namespace *ns, uid_t uid)
 	spin_unlock_irq(&uidhash_lock);
 
 	if (!up) {
-		new = kmem_cache_zalloc(uid_cachep, GFP_KERNEL);
-		if (!new)
-			goto out_unlock;
-
+		new = kmem_cache_zalloc_nofail(uid_cachep, GFP_KERNEL);
 		new->uid = uid;
 		atomic_set(&new->__count, 1);
 
@@ -174,9 +171,6 @@ struct user_struct *alloc_uid(struct user_namespace *ns, uid_t uid)
 	}
 
 	return up;
-
-out_unlock:
-	return NULL;
 }
 
 static int __init uid_cache_init(void)

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [kernel-hardening] kmalloc() nofail allocations
  2011-08-20 14:27 ` [kernel-hardening] " Vasiliy Kulikov
@ 2011-08-20 16:31   ` Solar Designer
  2011-08-22  9:24     ` Vasiliy Kulikov
  0 siblings, 1 reply; 8+ messages in thread
From: Solar Designer @ 2011-08-20 16:31 UTC (permalink / raw)
  To: kernel-hardening

Vasiliy,

Overall, I like this.  My biggest concern is that it implies that the
kernel will always have a can't fail memory allocation interface.  So if
kernel developers at some point want to change this, they'll have a hard
time making such changes.  On the other hand, the current situation with
some code assuming that failures are impossible and some other code
checking for them anyway is inconsistent and not very helpful for
possible memory allocation semantics changes anyway.

So let's give this a try.

On Sat, Aug 20, 2011 at 06:27:23PM +0400, Vasiliy Kulikov wrote:
> Here is a patch to do it.  I've implemented k(m|z)alloc() and
> kmem_cache_{z,}alloc() nofail variants.  As a result, setuid() cannot
> fail with any reason, but EACCES.
> 
> kernel/cred.c is partly moved to _nofail() too, just to show how much
> error handling code it removes.

This mostly looks good to me.  Some comments are below:

> +#define kmalloc_nofail(size, flags) \
> +({ \
> +	void *p; \
> +	(void)BUILD_BUG_ON_ZERO(size > (PAGE_SIZE << (PAGE_ALLOC_COSTLY_ORDER-1))); \
> +	if (flags & __GFP_NORETRY) \
> +		panic("Attempt to call kmalloc_nofail() with __GFP_NORETRY"); \
> +	p = kmalloc(size, flags); \
> +	if (p == NULL) \
> +		panic("kmalloc_nofail() returned NULL\n"); \
> +	p; \
> +})

You need to enclose size and flags in braces in this macro.

Also, I think portions of this macro should be moved into a non-inline
function defined in some .c file - to reduce total code size.  Then the
macro (and thus duplicated code) could be shortened to:

#define kmalloc_nofail(size, flags) \
({ \
	(void)BUILD_BUG_ON_ZERO((size) > (PAGE_SIZE << (PAGE_ALLOC_COSTLY_ORDER-1))); \
	_kmalloc_nofail((size), (flags)); \
})

or to optimize it for speed at the cost of slight size increase:

#define kmalloc_nofail(size, flags) \
({ \
	void *p; \
	(void)BUILD_BUG_ON_ZERO((size) > (PAGE_SIZE << (PAGE_ALLOC_COSTLY_ORDER-1))); \
	p = kmalloc((size), (flags)); \
	if (((flags) & __GFP_NORETRY) || p == NULL) \
		kmalloc_nofail_panic((flags), p); \
	p; \
})

so you have only one instance of a function call on error instead of
two calls to panic() and you don't rely on the compiler and linker
combining the many instances of constant strings.

You can leave these changes for after your initial RFC postings to LKML
if you like, though.

> +#define kzalloc_nofail(size, flags) \
> +	kmalloc_nofail(size, (flags | __GFP_ZERO))

Need braces around flags.

> +#define kmem_cache_zalloc_nofail(cache, flags) \
> +	kmem_cache_alloc_nofail(cache, (flags | __GFP_ZERO))

Need braces around cache and flags.

> +/* This is a slightly weaker check than kmalloc_nofail() as kmem check is runtime check :\ */
> +#define kmem_cache_alloc_nofail(cache, flags) \
> +({ \
> +	void *p; \
> +	if ((cache)->objsize > (PAGE_SIZE << (PAGE_ALLOC_COSTLY_ORDER-1))) \
> +		panic("Too big size (%lu) for kmem_cache_alloc_nofail()!", \
> +			(long)(cache)->objsize); \
> +	if (flags & __GFP_NORETRY); \
> +		panic("Attempt to call kmem_cache_alloc_nofail() with __GFP_NORETRY"); \
> +	p = kmem_cache_alloc(cache, flags); \
> +	if (p == NULL) \
> +		panic("kmem_cache_alloc() returned NULL\n"); \
> +	p; \
> +})

Similar comments apply here.  Since the size check is runtime anyway,
perhaps it should be moved into a non-inline function.

Please fix at least the braces, and bring this to LKML for discussion.

Thanks,

Alexander

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [kernel-hardening] kmalloc() nofail allocations
  2011-08-20 16:31   ` [kernel-hardening] " Solar Designer
@ 2011-08-22  9:24     ` Vasiliy Kulikov
  2011-08-22  9:38       ` Solar Designer
  0 siblings, 1 reply; 8+ messages in thread
From: Vasiliy Kulikov @ 2011-08-22  9:24 UTC (permalink / raw)
  To: kernel-hardening

Solar,

Major problem with the idea in general:


static inline struct page *
__alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
	struct zonelist *zonelist, enum zone_type high_zoneidx,
	nodemask_t *nodemask, struct zone *preferred_zone,
	int migratetype)
{
    ...
	/* Avoid allocations with no watermarks from looping endlessly */
	if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
		goto nopage;
    ...
}


Unlikely, but fully possible case - we want some memory and the OOM killer
kills us for our wish.

Also there are some flags, which I didn't take into consideration in the
checks of the previous patch, like __GFP_WAIT, but these are solvable
problems.


We cannot use __GFP_NOFAIL as it is marked as a legacy flag, which
shouldn't be used in a new code.  Otherwise, we'd return the same old
behaviour as before.


However, now I want to bring the subject to LKML (it starts to be
interesting) to clarify (and document, etc.) the allocation bahaviour,
when it should be cheched for NULL, whether these are really "nofail"
allocations, etc.

Thanks,

-- 
Vasiliy

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [kernel-hardening] kmalloc() nofail allocations
  2011-08-22  9:24     ` Vasiliy Kulikov
@ 2011-08-22  9:38       ` Solar Designer
  2011-08-22  9:45         ` Vasiliy Kulikov
  0 siblings, 1 reply; 8+ messages in thread
From: Solar Designer @ 2011-08-22  9:38 UTC (permalink / raw)
  To: kernel-hardening

Vasiliy,

On Mon, Aug 22, 2011 at 01:24:29PM +0400, Vasiliy Kulikov wrote:
> Major problem with the idea in general:
...
> Unlikely, but fully possible case - we want some memory and the OOM killer
> kills us for our wish.

How/why is this a major problem with the idea in general?  I am probably
missing something.

Thanks,

Alexander

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [kernel-hardening] kmalloc() nofail allocations
  2011-08-22  9:38       ` Solar Designer
@ 2011-08-22  9:45         ` Vasiliy Kulikov
  2011-08-22  9:53           ` Solar Designer
  0 siblings, 1 reply; 8+ messages in thread
From: Vasiliy Kulikov @ 2011-08-22  9:45 UTC (permalink / raw)
  To: kernel-hardening

Solar,

On Mon, Aug 22, 2011 at 13:38 +0400, Solar Designer wrote:
> On Mon, Aug 22, 2011 at 01:24:29PM +0400, Vasiliy Kulikov wrote:
> > Major problem with the idea in general:
> ...
> > Unlikely, but fully possible case - we want some memory and the OOM killer
> > kills us for our wish.
> 
> How/why is this a major problem with the idea in general?  I am probably
> missing something.

Initially I wanted to "wrap" kmalloc calls, which cannot fail in any
case.  This would not change any mm code, but the caller (its
expectation).

Now I see that it's impossible without any mm code changes.  It needs at
least __GFP_NOFAIL flag addition, which is explicitly marked as "no new
uses".  Such kmalloc_nofail() wouldn't differ much from kmalloc(size,
flags | __GFP_NOFAIL).


Thanks,

-- 
Vasiliy

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [kernel-hardening] kmalloc() nofail allocations
  2011-08-22  9:45         ` Vasiliy Kulikov
@ 2011-08-22  9:53           ` Solar Designer
  2011-08-22 10:05             ` Vasiliy Kulikov
  0 siblings, 1 reply; 8+ messages in thread
From: Solar Designer @ 2011-08-22  9:53 UTC (permalink / raw)
  To: kernel-hardening

Vasiliy,

On Mon, Aug 22, 2011 at 01:45:58PM +0400, Vasiliy Kulikov wrote:
> On Mon, Aug 22, 2011 at 13:38 +0400, Solar Designer wrote:
> > On Mon, Aug 22, 2011 at 01:24:29PM +0400, Vasiliy Kulikov wrote:
> > > Major problem with the idea in general:
> > ...
> > > Unlikely, but fully possible case - we want some memory and the OOM killer
> > > kills us for our wish.
> > 
> > How/why is this a major problem with the idea in general?  I am probably
> > missing something.
> 
> Initially I wanted to "wrap" kmalloc calls, which cannot fail in any
> case.  This would not change any mm code, but the caller (its
> expectation).
> 
> Now I see that it's impossible without any mm code changes.  It needs at
> least __GFP_NOFAIL flag addition, which is explicitly marked as "no new
> uses".  Such kmalloc_nofail() wouldn't differ much from kmalloc(size,
> flags | __GFP_NOFAIL).

Thanks for the explanation.

However, is there any difference for the caller between kmalloc()
looping until success (and thus only returning on success) and it
OOM-killing the current process (and thus also only returning on
success)?  Or does this question somehow not apply to the problem you
discovered?

Alexander

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [kernel-hardening] kmalloc() nofail allocations
  2011-08-22  9:53           ` Solar Designer
@ 2011-08-22 10:05             ` Vasiliy Kulikov
  0 siblings, 0 replies; 8+ messages in thread
From: Vasiliy Kulikov @ 2011-08-22 10:05 UTC (permalink / raw)
  To: kernel-hardening

Solar,

On Mon, Aug 22, 2011 at 13:53 +0400, Solar Designer wrote:
> However, is there any difference for the caller between kmalloc()
> looping until success (and thus only returning on success) and it
> OOM-killing the current process (and thus also only returning on
> success)?  Or does this question somehow not apply to the problem you
> discovered?

The "cannot fail" loop stops if the current task is marked as
to-be-killed-by-oom-killer.  OOM killer doesn't immediately kill the
task, the task exits itself after the exit from kernel mode and a
scheduler step (AFAIU).


So, AFAICS, the loop is executed only if:

1) Appropriate flags are passed (no NOWAIT, etc.), sane NUMA node list is
passed (e.g. not GFP_THISNODE and node list without current node).

2) Size is not big (less than PAGE_SIZE << 3).

3) Debugging allocation failure injection is either off or has a
min size more than the current allocation size.

4) OOM killer didn't mark the task with TIF_MEMDIE.


Thanks,

-- 
Vasiliy

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2011-08-22 10:05 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-08-17 19:15 [kernel-hardening] kmalloc() nofail allocations Vasiliy Kulikov
2011-08-20 14:27 ` [kernel-hardening] " Vasiliy Kulikov
2011-08-20 16:31   ` [kernel-hardening] " Solar Designer
2011-08-22  9:24     ` Vasiliy Kulikov
2011-08-22  9:38       ` Solar Designer
2011-08-22  9:45         ` Vasiliy Kulikov
2011-08-22  9:53           ` Solar Designer
2011-08-22 10:05             ` Vasiliy Kulikov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.