linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Using TASK_SIZE for kernel threads
@ 2017-02-24 16:15 Martin Schwidefsky
  2017-02-24 16:15 ` [PATCH] s390: " Martin Schwidefsky
  2017-02-25 18:19 ` Using " Linus Torvalds
  0 siblings, 2 replies; 4+ messages in thread
From: Martin Schwidefsky @ 2017-02-24 16:15 UTC (permalink / raw)
  To: linux-kernel, Alexander Viro, Linus Torvalds; +Cc: Carsten Otte

Hello,

this week we had some fun with kernel 4.10 on s390. Carsten found that
the kernel kept crashing reproducibly on his system. Not on mine or any
other system, just his.

The kdevtmpfs kernel thread crashed in __queued_work as it tried to
terminate. The devtmpfsd function got an error on the sys_mount() call.
Why it crashes on termination is a different story, the interesing part
is why sys_mount() got an error.

After some more debugging Carsten found out that copy_mount_options()
only got 2 bytes of the option string, "mo" instead of "mode=0755".
It turned out that the s390 definition of TASK_SIZE together with the
size calculation in copy_mount_options causes this:

	#define TASK_SIZE_OF(tsk) ((tsk)->mm->context.asce_limit)
and
	size = TASK_SIZE - (unsigned long)data;

For a kernel thread (tsk)->mm is zero and the value located at
(0)->context.asce_limit happened to be close enough to the data
pointer that the 'size' result is a small number, in this case 2.

Now I fixed this in the s390 code, the patch is queued and will be
included in next weeks please-pull. But I am wondering about the use
of TASK_SIZE in kernel threads. For x86 copy_mount_options works
because the size calculation will give a negative result for 'data'
pointing to kernel space. Which is corrected by the size limit:

	if (size > PAGE_SIZE)
		size = PAGE_SIZE;

Wouldn't it be cleaner to test "get_fs()==KERNEL_DS" and just use
size=4096 in this case? The detour via TASK_SIZE does not make much
sense to me.

To find out how big the problem is, I have added a warning to TASK_SIZE
to create a console messsage if it is called for a task without an mm.
The only hit has been copy_mount_options.

For reference I have included the s390 patch.

Martin Schwidefsky (1):
  s390: TASK_SIZE for kernel threads

 arch/s390/include/asm/processor.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH] s390: TASK_SIZE for kernel threads
  2017-02-24 16:15 Using TASK_SIZE for kernel threads Martin Schwidefsky
@ 2017-02-24 16:15 ` Martin Schwidefsky
  2017-02-25 18:19 ` Using " Linus Torvalds
  1 sibling, 0 replies; 4+ messages in thread
From: Martin Schwidefsky @ 2017-02-24 16:15 UTC (permalink / raw)
  To: linux-kernel, Alexander Viro, Linus Torvalds; +Cc: Carsten Otte

Return a sensible value if TASK_SIZE if called from a kernel thread.

This gets us around an issue with copy_mount_options that does a magic
size calculation "TASK_SIZE - (unsigned long)data" while in a kernel
thread and data pointing to kernel space.

Cc: <stable@vger.kernel.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---
 arch/s390/include/asm/processor.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index dacba34..4045639 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -89,7 +89,8 @@ extern void execve_tail(void);
  * User space process size: 2GB for 31 bit, 4TB or 8PT for 64 bit.
  */
 
-#define TASK_SIZE_OF(tsk)	((tsk)->mm->context.asce_limit)
+#define TASK_SIZE_OF(tsk)	((tsk)->mm ? \
+				 (tsk)->mm->context.asce_limit : TASK_MAX_SIZE)
 #define TASK_UNMAPPED_BASE	(test_thread_flag(TIF_31BIT) ? \
 					(1UL << 30) : (1UL << 41))
 #define TASK_SIZE		TASK_SIZE_OF(current)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: Using TASK_SIZE for kernel threads
  2017-02-24 16:15 Using TASK_SIZE for kernel threads Martin Schwidefsky
  2017-02-24 16:15 ` [PATCH] s390: " Martin Schwidefsky
@ 2017-02-25 18:19 ` Linus Torvalds
  2017-02-27  6:21   ` Martin Schwidefsky
  1 sibling, 1 reply; 4+ messages in thread
From: Linus Torvalds @ 2017-02-25 18:19 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: Linux Kernel Mailing List, Alexander Viro, Carsten Otte

On Fri, Feb 24, 2017 at 8:15 AM, Martin Schwidefsky
<schwidefsky@de.ibm.com> wrote:
>
> Now I fixed this in the s390 code, the patch is queued and will be
> included in next weeks please-pull. But I am wondering about the use
> of TASK_SIZE in kernel threads. For x86 copy_mount_options works
> because the size calculation will give a negative result for 'data'
> pointing to kernel space. Which is corrected by the size limit:
>
>         if (size > PAGE_SIZE)
>                 size = PAGE_SIZE;
>
> Wouldn't it be cleaner to test "get_fs()==KERNEL_DS" and just use
> size=4096 in this case? The detour via TASK_SIZE does not make much
> sense to me.
>
> To find out how big the problem is, I have added a warning to TASK_SIZE
> to create a console messsage if it is called for a task without an mm.
> The only hit has been copy_mount_options.

So copy_mount_options() is a horrible hack. It doesn't have a size
limit, and it can copy binary data, so our good auto-limiting code in
strncpy_from_user() isn't usable either.

It probably *should* use the same user_addr_max() logic that
strncpy_from_user() uses, but that wouldn't actually have helped s390,
because s390 doesn't use the generic strncpy_from_user(), and doesn't
have that user_addr_max() thing.

So from everything I see, I think this is actually a s390 bug in every
way. Your TASK_SIZE_OF() implementation is simply bogus and broken,
and that's the core problem.

For example, you could have just had

   #define user_addr_max()   (current_thread_info()->addr_limit.seg)

like some other architectures, and it would have been all good.

If somebody is willing to add user_addr_max() to all architectures and
make copy_mount_options() use the same logic as
lib/strncpy_from_user.c, then that would certainly be acceptable to
me. As it is, I think it uses TASK_SIZE in ways that are not pretty,
but are what they are..

                Linus

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Using TASK_SIZE for kernel threads
  2017-02-25 18:19 ` Using " Linus Torvalds
@ 2017-02-27  6:21   ` Martin Schwidefsky
  0 siblings, 0 replies; 4+ messages in thread
From: Martin Schwidefsky @ 2017-02-27  6:21 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List, Alexander Viro, Carsten Otte

On Sat, 25 Feb 2017 10:19:04 -0800
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Fri, Feb 24, 2017 at 8:15 AM, Martin Schwidefsky
> <schwidefsky@de.ibm.com> wrote:
> >
> > Now I fixed this in the s390 code, the patch is queued and will be
> > included in next weeks please-pull. But I am wondering about the use
> > of TASK_SIZE in kernel threads. For x86 copy_mount_options works
> > because the size calculation will give a negative result for 'data'
> > pointing to kernel space. Which is corrected by the size limit:
> >
> >         if (size > PAGE_SIZE)
> >                 size = PAGE_SIZE;
> >
> > Wouldn't it be cleaner to test "get_fs()==KERNEL_DS" and just use
> > size=4096 in this case? The detour via TASK_SIZE does not make much
> > sense to me.
> >
> > To find out how big the problem is, I have added a warning to TASK_SIZE
> > to create a console messsage if it is called for a task without an mm.
> > The only hit has been copy_mount_options.  
> 
> So copy_mount_options() is a horrible hack. It doesn't have a size
> limit, and it can copy binary data, so our good auto-limiting code in
> strncpy_from_user() isn't usable either.
> 
> It probably *should* use the same user_addr_max() logic that
> strncpy_from_user() uses, but that wouldn't actually have helped s390,
> because s390 doesn't use the generic strncpy_from_user(), and doesn't
> have that user_addr_max() thing.

I see, set_fs(KERNEL_DS) sets a different address for user_addr_max to
return. That would work but requires that all architectures have the
define.

> So from everything I see, I think this is actually a s390 bug in every
> way. Your TASK_SIZE_OF() implementation is simply bogus and broken,
> and that's the core problem.
> 
> For example, you could have just had
> 
>    #define user_addr_max()   (current_thread_info()->addr_limit.seg)
> 
> like some other architectures, and it would have been all good.

The background is that TASK_SIZE on s390 is not a constant, it depends
on the layout of the mm. There are three, 2GB for 31-bit with a 2-level
page table, 4TB for a standard 64-bit process with a 3-level page table
and 8PB with 4 levels for a process that did a really large mmap.
The upgrade from 4TB to 8PB is at runtime, that is why the size
of the mm is stored in mm->context. It is an attribute of the mm, if
one thread changes it, it changes for all threads.

> If somebody is willing to add user_addr_max() to all architectures and
> make copy_mount_options() use the same logic as
> lib/strncpy_from_user.c, then that would certainly be acceptable to
> me. As it is, I think it uses TASK_SIZE in ways that are not pretty,
> but are what they are..

I guess that won't happen anytime soon. I will use the proposed fix
within the arch code. Thanks.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2017-02-27  9:43 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-24 16:15 Using TASK_SIZE for kernel threads Martin Schwidefsky
2017-02-24 16:15 ` [PATCH] s390: " Martin Schwidefsky
2017-02-25 18:19 ` Using " Linus Torvalds
2017-02-27  6:21   ` Martin Schwidefsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).