[1/3] mm/vmalloc: fix size check for remap_vmalloc_range_partial()
diff mbox series

Message ID 20190103145954.16942-2-rpenyaev@suse.de
State In Next
Commit 8a55abf0cdf0571c44c081e906cbd5f06249c403
Headers show
Series
  • mm/vmalloc: fix size check and few cleanups
Related show

Commit Message

Roman Penyaev Jan. 3, 2019, 2:59 p.m. UTC
area->size can include adjacent guard page but get_vm_area_size()
returns actual size of the area.

This fixes possible kernel crash when userspace tries to map area
on 1 page bigger: size check passes but the following vmalloc_to_page()
returns NULL on last guard (non-existing) page.

Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Joe Perches <joe@perches.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
---
 mm/vmalloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Michal Hocko Jan. 3, 2019, 3:13 p.m. UTC | #1
On Thu 03-01-19 15:59:52, Roman Penyaev wrote:
> area->size can include adjacent guard page but get_vm_area_size()
> returns actual size of the area.
> 
> This fixes possible kernel crash when userspace tries to map area
> on 1 page bigger: size check passes but the following vmalloc_to_page()
> returns NULL on last guard (non-existing) page.

Can this actually happen? I am not really familiar with all the callers
of this API but VM_NO_GUARD is not really used wildly in the kernel.
All I can see is kasan na arm64 which doesn't really seem to use it
for vmalloc.

So is the problem real or this is a mere cleanup?
 
> Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
> Cc: Joe Perches <joe@perches.com>
> Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
> Cc: stable@vger.kernel.org
> ---
>  mm/vmalloc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 871e41c55e23..2cd24186ba84 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2248,7 +2248,7 @@ int remap_vmalloc_range_partial(struct vm_area_struct *vma, unsigned long uaddr,
>  	if (!(area->flags & VM_USERMAP))
>  		return -EINVAL;
>  
> -	if (kaddr + size > area->addr + area->size)
> +	if (kaddr + size > area->addr + get_vm_area_size(area))
>  		return -EINVAL;
>  
>  	do {
> -- 
> 2.19.1
Roman Penyaev Jan. 3, 2019, 7:27 p.m. UTC | #2
On 2019-01-03 16:13, Michal Hocko wrote:
> On Thu 03-01-19 15:59:52, Roman Penyaev wrote:
>> area->size can include adjacent guard page but get_vm_area_size()
>> returns actual size of the area.
>> 
>> This fixes possible kernel crash when userspace tries to map area
>> on 1 page bigger: size check passes but the following 
>> vmalloc_to_page()
>> returns NULL on last guard (non-existing) page.
> 
> Can this actually happen? I am not really familiar with all the callers
> of this API but VM_NO_GUARD is not really used wildly in the kernel.

Exactly, by default (VM_NO_GUARD is not set) each area has guard page,
thus the area->size will be bigger.  The bug is not reproduced if
VM_NO_GUARD is set.

> All I can see is kasan na arm64 which doesn't really seem to use it
> for vmalloc.
> 
> So is the problem real or this is a mere cleanup?

This is the real problem, try this hunk for any file descriptor which 
provides
mapping, or say modify epoll as example:

--------------------------------
diff --git a/fs/eventpoll.c b/fs/eventpoll.c

+static int ep_mmap(struct file *file, struct vm_area_struct *vma)
+{
+       void *mem;
+
+       mem = vmalloc_user(4096);
+       BUG_ON(!mem);
+       /* Do not care about mem leak */
+
+       return remap_vmalloc_range(vma, mem, 0);
+}
+
  /* File callbacks that implement the eventpoll file behaviour */
  static const struct file_operations eventpoll_fops = {
  #ifdef CONFIG_PROC_FS
         .show_fdinfo    = ep_show_fdinfo,
  #endif
+       .mmap           = ep_mmap,
         .release        = ep_eventpoll_release,
--------------------------------

and the following code from userspace, which maps 2 pages,
instead of 1:

--------------------------------
epfd = epoll_create1(0);
assert(epfd >= 0);

p = mmap(NULL, 2<<12, PROT_WRITE|PROT_READ, MAP_PRIVATE, epfd, 0);
assert(p != MAP_FAILED);
--------------------------------

You immediately get the following oops:

[   38.894571] BUG: unable to handle kernel NULL pointer dereference at 
0000000000000008
[   38.899048] #PF error: [normal kernel read fault]
[   38.901487] PGD 0 P4D 0
[   38.902801] Oops: 0000 [#1] PREEMPT SMP PTI
[   38.904984] CPU: 2 PID: 399 Comm: mmap-epoll Not tainted 4.20.0-1 
#238
[   38.914064] RIP: 0010:vm_insert_page+0x3b/0x1d0
[   38.941181] Call Trace:
[   38.941656]  remap_vmalloc_range_partial+0x8d/0xd0
[   38.942417]  mmap_region+0x3c7/0x630
[   38.942982]  do_mmap+0x38d/0x560
[   38.943479]  vm_mmap_pgoff+0x9a/0xf0
[   38.944028]  ksys_mmap_pgoff+0x18e/0x220
[   38.944554]  do_syscall_64+0x48/0xf0
[   38.945076]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

--
Roman
Michal Hocko Jan. 3, 2019, 7:40 p.m. UTC | #3
On Thu 03-01-19 20:27:26, Roman Penyaev wrote:
> On 2019-01-03 16:13, Michal Hocko wrote:
> > On Thu 03-01-19 15:59:52, Roman Penyaev wrote:
> > > area->size can include adjacent guard page but get_vm_area_size()
> > > returns actual size of the area.
> > > 
> > > This fixes possible kernel crash when userspace tries to map area
> > > on 1 page bigger: size check passes but the following
> > > vmalloc_to_page()
> > > returns NULL on last guard (non-existing) page.
> > 
> > Can this actually happen? I am not really familiar with all the callers
> > of this API but VM_NO_GUARD is not really used wildly in the kernel.
> 
> Exactly, by default (VM_NO_GUARD is not set) each area has guard page,
> thus the area->size will be bigger.  The bug is not reproduced if
> VM_NO_GUARD is set.
> 
> > All I can see is kasan na arm64 which doesn't really seem to use it
> > for vmalloc.
> > 
> > So is the problem real or this is a mere cleanup?
> 
> This is the real problem, try this hunk for any file descriptor which
> provides
> mapping, or say modify epoll as example:

OK, my response was more confusing than I intended. I meant to say. Is
there any in kernel code that would allow the bug have had in mind?
In other words can userspace trick any existing code?
Michal Hocko Jan. 3, 2019, 7:59 p.m. UTC | #4
On Thu 03-01-19 15:59:52, Roman Penyaev wrote:
> area->size can include adjacent guard page but get_vm_area_size()
> returns actual size of the area.
> 
> This fixes possible kernel crash when userspace tries to map area
> on 1 page bigger: size check passes but the following vmalloc_to_page()
> returns NULL on last guard (non-existing) page.
> 
> Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
> Cc: Joe Perches <joe@perches.com>
> Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
> Cc: stable@vger.kernel.org

Forgot to add
Acked-by: Michal Hocko <mhocko@suse.com>
Although I am not really sure the stable backport is really needed as I
haven't seen any explicit example of a buggy kernel code to trigger the
issue.

> ---
>  mm/vmalloc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 871e41c55e23..2cd24186ba84 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2248,7 +2248,7 @@ int remap_vmalloc_range_partial(struct vm_area_struct *vma, unsigned long uaddr,
>  	if (!(area->flags & VM_USERMAP))
>  		return -EINVAL;
>  
> -	if (kaddr + size > area->addr + area->size)
> +	if (kaddr + size > area->addr + get_vm_area_size(area))
>  		return -EINVAL;
>  
>  	do {
> -- 
> 2.19.1
Roman Penyaev Jan. 3, 2019, 8:31 p.m. UTC | #5
On 2019-01-03 20:40, Michal Hocko wrote:
> On Thu 03-01-19 20:27:26, Roman Penyaev wrote:
>> On 2019-01-03 16:13, Michal Hocko wrote:
>> > On Thu 03-01-19 15:59:52, Roman Penyaev wrote:
>> > > area->size can include adjacent guard page but get_vm_area_size()
>> > > returns actual size of the area.
>> > >
>> > > This fixes possible kernel crash when userspace tries to map area
>> > > on 1 page bigger: size check passes but the following
>> > > vmalloc_to_page()
>> > > returns NULL on last guard (non-existing) page.
>> >
>> > Can this actually happen? I am not really familiar with all the callers
>> > of this API but VM_NO_GUARD is not really used wildly in the kernel.
>> 
>> Exactly, by default (VM_NO_GUARD is not set) each area has guard page,
>> thus the area->size will be bigger.  The bug is not reproduced if
>> VM_NO_GUARD is set.
>> 
>> > All I can see is kasan na arm64 which doesn't really seem to use it
>> > for vmalloc.
>> >
>> > So is the problem real or this is a mere cleanup?
>> 
>> This is the real problem, try this hunk for any file descriptor which
>> provides
>> mapping, or say modify epoll as example:
> 
> OK, my response was more confusing than I intended. I meant to say. Is
> there any in kernel code that would allow the bug have had in mind?
> In other words can userspace trick any existing code?

In theory any existing caller of remap_vmalloc_range() which does
not have an explicit size check should trigger an oops, e.g. this is
a good candidate:

*** drivers/media/usb/stkwebcam/stk-webcam.c:
v4l_stk_mmap[789]              ret = remap_vmalloc_range(vma, 
sbuf->buffer, 0);

According to the code no explicit size check, should be easy to 
reproduce:
mmap the frame buffer and you are done.

Other callers are not so easy to follow. But wait, here is another 
example:

(drivers/video/fbdev/core/fbmem.c)
static int
fb_mmap(struct file *file, struct vm_area_struct * vma)
         ...
    	res = fb->fb_mmap(info, vma);

(drivers/video/fbdev/vfb.c)
static int vfb_mmap(struct fb_info *info,
		    struct vm_area_struct *vma)
{
	return remap_vmalloc_range(vma, (void *)info->fix.smem_start, 
vma->vm_pgoff);
}

No checks, naked calls, should be also the candidate.


--
Roman
Michal Hocko Jan. 4, 2019, 9:38 a.m. UTC | #6
On Thu 03-01-19 21:31:58, Roman Penyaev wrote:
> On 2019-01-03 20:40, Michal Hocko wrote:
> > On Thu 03-01-19 20:27:26, Roman Penyaev wrote:
> > > On 2019-01-03 16:13, Michal Hocko wrote:
> > > > On Thu 03-01-19 15:59:52, Roman Penyaev wrote:
> > > > > area->size can include adjacent guard page but get_vm_area_size()
> > > > > returns actual size of the area.
> > > > >
> > > > > This fixes possible kernel crash when userspace tries to map area
> > > > > on 1 page bigger: size check passes but the following
> > > > > vmalloc_to_page()
> > > > > returns NULL on last guard (non-existing) page.
> > > >
> > > > Can this actually happen? I am not really familiar with all the callers
> > > > of this API but VM_NO_GUARD is not really used wildly in the kernel.
> > > 
> > > Exactly, by default (VM_NO_GUARD is not set) each area has guard page,
> > > thus the area->size will be bigger.  The bug is not reproduced if
> > > VM_NO_GUARD is set.
> > > 
> > > > All I can see is kasan na arm64 which doesn't really seem to use it
> > > > for vmalloc.
> > > >
> > > > So is the problem real or this is a mere cleanup?
> > > 
> > > This is the real problem, try this hunk for any file descriptor which
> > > provides
> > > mapping, or say modify epoll as example:
> > 
> > OK, my response was more confusing than I intended. I meant to say. Is
> > there any in kernel code that would allow the bug have had in mind?
> > In other words can userspace trick any existing code?
> 
> In theory any existing caller of remap_vmalloc_range() which does
> not have an explicit size check should trigger an oops, e.g. this is
> a good candidate:
> 
> *** drivers/media/usb/stkwebcam/stk-webcam.c:
> v4l_stk_mmap[789]              ret = remap_vmalloc_range(vma, sbuf->buffer,
> 0);

Hmm, sbuf->buffer is allocated in stk_setup_siobuf to have
buf->v4lbuf.length. mmap callback maps this buffer to the vma size and
that is indeed not enforced to be <= length AFAICS. So you are right!

Can we have an example in the changelog please?

Thanks!
Roman Penyaev Jan. 4, 2019, 10:21 a.m. UTC | #7
On 2019-01-04 10:38, Michal Hocko wrote:

[...]

>> >
>> > OK, my response was more confusing than I intended. I meant to say. Is
>> > there any in kernel code that would allow the bug have had in mind?
>> > In other words can userspace trick any existing code?
>> 
>> In theory any existing caller of remap_vmalloc_range() which does
>> not have an explicit size check should trigger an oops, e.g. this is
>> a good candidate:
>> 
>> *** drivers/media/usb/stkwebcam/stk-webcam.c:
>> v4l_stk_mmap[789]              ret = remap_vmalloc_range(vma, 
>> sbuf->buffer,
>> 0);
> 
> Hmm, sbuf->buffer is allocated in stk_setup_siobuf to have
> buf->v4lbuf.length. mmap callback maps this buffer to the vma size and
> that is indeed not enforced to be <= length AFAICS. So you are right!
> 
> Can we have an example in the changelog please?

You mean to resend this particular patch with the list of possible
candidates for oops in a comment message?  Sure thing.

--
Roman
Michal Hocko Jan. 4, 2019, 10:28 a.m. UTC | #8
On Fri 04-01-19 11:21:39, Roman Penyaev wrote:
> On 2019-01-04 10:38, Michal Hocko wrote:
> 
> [...]
> 
> > > >
> > > > OK, my response was more confusing than I intended. I meant to say. Is
> > > > there any in kernel code that would allow the bug have had in mind?
> > > > In other words can userspace trick any existing code?
> > > 
> > > In theory any existing caller of remap_vmalloc_range() which does
> > > not have an explicit size check should trigger an oops, e.g. this is
> > > a good candidate:
> > > 
> > > *** drivers/media/usb/stkwebcam/stk-webcam.c:
> > > v4l_stk_mmap[789]              ret = remap_vmalloc_range(vma,
> > > sbuf->buffer,
> > > 0);
> > 
> > Hmm, sbuf->buffer is allocated in stk_setup_siobuf to have
> > buf->v4lbuf.length. mmap callback maps this buffer to the vma size and
> > that is indeed not enforced to be <= length AFAICS. So you are right!
> > 
> > Can we have an example in the changelog please?
> 
> You mean to resend this particular patch with the list of possible
> candidates for oops in a comment message?  Sure thing.

I would just reply to the original patch with an updated changelog
wording (to include the above example and explain how the vma setup is
completely independent on the buffer allocation and ask Andrew to update
the changelog of the patch that is already in the mmotm tree).

Thanks!
Roman Penyaev Jan. 4, 2019, 11:06 a.m. UTC | #9
Hi Andrew,

Could you please update the commit message with something as
the following (I hope it will become a bit clearer now):

-------------
When VM_NO_GUARD is not set area->size includes adjacent guard page,
thus for correct size checking get_vm_area_size() should be used,
but not area->size.

This fixes possible kernel oops when userspace tries to mmap an area
on 1 page bigger than was allocated by vmalloc_user() call: the size
check inside remap_vmalloc_range_partial() accounts non-existing
guard page also, so check successfully passes but vmalloc_to_page()
returns NULL (guard page does not physically exist).

The following code pattern example should trigger an oops:

  static int oops_mmap(struct file *file, struct vm_area_struct *vma)
  {
        void *mem;

        mem = vmalloc_user(4096);
        BUG_ON(!mem);
        /* Do not care about mem leak */

        return remap_vmalloc_range(vma, mem, 0);
  }

And userspace simply mmaps size + PAGE_SIZE:

  mmap(NULL, 8192, PROT_WRITE|PROT_READ, MAP_PRIVATE, fd, 0);

Possible candidates for oops which do not have any explicit size
checks:

   *** drivers/media/usb/stkwebcam/stk-webcam.c:
   v4l_stk_mmap[789]   ret = remap_vmalloc_range(vma, sbuf->buffer, 0);

Or the following one:

   *** drivers/video/fbdev/core/fbmem.c
   static int
   fb_mmap(struct file *file, struct vm_area_struct * vma)
        ...
        res = fb->fb_mmap(info, vma);

Where fb_mmap callback calls remap_vmalloc_range() directly without
any explicit checks:

   *** drivers/video/fbdev/vfb.c
   static int vfb_mmap(struct fb_info *info,
             struct vm_area_struct *vma)
   {
       return remap_vmalloc_range(vma, (void *)info->fix.smem_start, 
vma->vm_pgoff);
   }
--------

--
Roman


On 2019-01-03 15:59, Roman Penyaev wrote:
> area->size can include adjacent guard page but get_vm_area_size()
> returns actual size of the area.
> 
> This fixes possible kernel crash when userspace tries to map area
> on 1 page bigger: size check passes but the following vmalloc_to_page()
> returns NULL on last guard (non-existing) page.
> 
> Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
> Cc: Joe Perches <joe@perches.com>
> Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
> Cc: stable@vger.kernel.org
> ---
>  mm/vmalloc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 871e41c55e23..2cd24186ba84 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2248,7 +2248,7 @@ int remap_vmalloc_range_partial(struct
> vm_area_struct *vma, unsigned long uaddr,
>  	if (!(area->flags & VM_USERMAP))
>  		return -EINVAL;
> 
> -	if (kaddr + size > area->addr + area->size)
> +	if (kaddr + size > area->addr + get_vm_area_size(area))
>  		return -EINVAL;
> 
>  	do {
Andrey Ryabinin Jan. 11, 2019, 7:19 p.m. UTC | #10
On 1/3/19 5:59 PM, Roman Penyaev wrote:
> area->size can include adjacent guard page but get_vm_area_size()
> returns actual size of the area.
> 
> This fixes possible kernel crash when userspace tries to map area
> on 1 page bigger: size check passes but the following vmalloc_to_page()
> returns NULL on last guard (non-existing) page.
> 
> Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
> Cc: Joe Perches <joe@perches.com>
> Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
> Cc: stable@vger.kernel.org
> ---

Fixes: e69e9d4aee71 ("vmalloc: introduce remap_vmalloc_range_partial")
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>

Patch
diff mbox series

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 871e41c55e23..2cd24186ba84 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2248,7 +2248,7 @@  int remap_vmalloc_range_partial(struct vm_area_struct *vma, unsigned long uaddr,
 	if (!(area->flags & VM_USERMAP))
 		return -EINVAL;
 
-	if (kaddr + size > area->addr + area->size)
+	if (kaddr + size > area->addr + get_vm_area_size(area))
 		return -EINVAL;
 
 	do {