linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* /dev/kmem BUG on mmap
@ 2012-07-29 22:28 Sasha Levin
  2012-07-30  9:43 ` Johannes Weiner
  0 siblings, 1 reply; 3+ messages in thread
From: Sasha Levin @ 2012-07-29 22:28 UTC (permalink / raw)
  To: gregkh, arnd; +Cc: linux-kernel

Hi all,

I was poking around /dev/kmem related code, and noticed the following in mmap_kmem():

        /* Turn a kernel-virtual address into a physical page frame */
        pfn = __pa((u64)vma->vm_pgoff << PAGE_SHIFT) >> PAGE_SHIFT;

Which looked odd since vm_pgoff is the offset into the mapping, so I'd assume that PAGE_OFFSET should be added to it as well, otherwise we get an invalid address.

I tested it by writing something like this:

	int main(void)
	{
	        int fd;
	        void *addr;
	
	        fd = open("/dev/kmem", O_RDONLY);
	        addr = mmap(NULL, 4096, PROT_READ, MAP_PRIVATE, fd, 4096);
	
	        return 0;
	}

Which indeed triggered a VM_BUG:

[   32.285431] kernel BUG at arch/x86/mm/physaddr.c:18!
[   32.285431] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[   32.285431] CPU 0
[   32.285431] Pid: 5643, comm: a.out Tainted: G        W    3.5.0-next-20120727-sasha #504
[   32.285431] RIP: 0010:[<ffffffff810acd97>]  [<ffffffff810acd97>] __phys_addr+0x57/0xa0
[   32.285431] RSP: 0018:ffff88000be67d68  EFLAGS: 00010213
[   32.285431] RAX: ffff87ffffffffff RBX: ffff88000d67cb00 RCX: 00000000000080d0
[   32.285431] RDX: 0000000000000071 RSI: ffff88000bfc8dc8 RDI: 0000000000001000
[   32.285431] RBP: ffff88000be67d68 R08: 0000000000000001 R09: ffff88000bfc8dc8
[   32.285431] R10: ffff88000bfc81f8 R11: 0000000000000002 R12: ffff88000bfc8dc8
[   32.285431] R13: 00007f26f80e6000 R14: ffff88000bf81000 R15: ffff88000bfc8dc8
[   32.285431] FS:  00007f26f80e8700(0000) GS:ffff88000d800000(0000) knlGS:0000000000000000
[   32.285431] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   32.285431] CR2: 00007f26f7c07d50 CR3: 000000000bfb8000 CR4: 00000000000406f0
[   32.285431] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   32.285431] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   32.285431] Process a.out (pid: 5643, threadinfo ffff88000be66000, task ffff88000bf2b000)
[   32.285431] Stack:
[   32.285431]  ffff88000be67d88 ffffffff81bb0737 ffff88000bfc95e0 ffff88000bfc95f0
[   32.285431]  ffff88000be67e48 ffffffff812163ae ffff88000d67cb00 0000000000000001
[   32.285431]  0000000000000000 0000000000000000 ffff88000be67dd8 ffff88000bfc81f8
[   32.285431] Call Trace:
[   32.285431]  [<ffffffff81bb0737>] mmap_kmem+0x27/0x90
[   32.285431]  [<ffffffff812163ae>] mmap_region+0x35e/0x5f0
[   32.285431]  [<ffffffff812168f9>] do_mmap_pgoff+0x2b9/0x350
[   32.285431]  [<ffffffff8120120c>] ? vm_mmap_pgoff+0x6c/0xb0
[   32.285431]  [<ffffffff81201224>] vm_mmap_pgoff+0x84/0xb0
[   32.285431]  [<ffffffff8124f280>] ? fget_raw+0x260/0x260
[   32.285431]  [<ffffffff81213d9e>] sys_mmap_pgoff+0x15e/0x190
[   32.285431]  [<ffffffff8198ab2e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[   32.285431]  [<ffffffff8106e4ed>] sys_mmap+0x1d/0x20
[   32.285431]  [<ffffffff8361f6f9>] system_call_fastpath+0x16/0x1b
[   32.285431] Code: 00 00 00 00 eb fe 66 0f 1f 44 00 00 48 03 05 91 02 78 03 eb 57 0f 1f 80 00 00 00 00 48 b8 ff ff ff ff ff 87 ff ff 48 39 c7 77 11 <0f> 0b 0f 1f 80 00 00 00 00 eb fe 66 0f 1f 44 00 00 48 b8 00 00
[   32.285431] RIP  [<ffffffff810acd97>] __phys_addr+0x57/0xa0
[   32.285431]  RSP <ffff88000be67d68>

I could send a patch to do what I think it's supposed to be doing, but I find it odd since apparently /dev/kmem has been broken for a while now - which doesn't make sense.

What am I missing?

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: /dev/kmem BUG on mmap
  2012-07-29 22:28 /dev/kmem BUG on mmap Sasha Levin
@ 2012-07-30  9:43 ` Johannes Weiner
  2012-07-31  2:06   ` Hugh Dickins
  0 siblings, 1 reply; 3+ messages in thread
From: Johannes Weiner @ 2012-07-30  9:43 UTC (permalink / raw)
  To: Sasha Levin; +Cc: gregkh, arnd, Hugh Dickins, linux-kernel

On Mon, Jul 30, 2012 at 12:28:35AM +0200, Sasha Levin wrote:
> Hi all,
> 
> I was poking around /dev/kmem related code, and noticed the following in mmap_kmem():
> 
>         /* Turn a kernel-virtual address into a physical page frame */
>         pfn = __pa((u64)vma->vm_pgoff << PAGE_SHIFT) >> PAGE_SHIFT;
> 
> Which looked odd since vm_pgoff is the offset into the mapping, so
> I'd assume that PAGE_OFFSET should be added to it as well, otherwise
> we get an invalid address.

It's supposed to be used with kernel offsets in the first place,
i.e. vma->vm_pgoff << PAGE_SHIFT should actually be a kernel virtual
address.  See 6d3154c Revert "[PATCH] Fix up mmap_kmem".

> I tested it by writing something like this:
> 
> 	int main(void)
> 	{
> 	        int fd;
> 	        void *addr;
> 	
> 	        fd = open("/dev/kmem", O_RDONLY);
> 	        addr = mmap(NULL, 4096, PROT_READ, MAP_PRIVATE, fd, 4096);
> 	
> 	        return 0;
> 	}
> 
> Which indeed triggered a VM_BUG:
> 
> [   32.285431] kernel BUG at arch/x86/mm/physaddr.c:18!

x86's debug-version of __pa() triggers that bug.  I'm reluctant to add
a whole lot of error checking to this interface, given that you should
already know what you are doing.  OTOH, crashing like this is not very
nice, either.

Is there a portable way to check if an address is a kernel virtual
one?  It looks like comparing to PAGE_OFFSET would work on most archs,
but not necessarily on powerpc for example.

Johannes

> [   32.285431] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> [   32.285431] CPU 0
> [   32.285431] Pid: 5643, comm: a.out Tainted: G        W    3.5.0-next-20120727-sasha #504
> [   32.285431] RIP: 0010:[<ffffffff810acd97>]  [<ffffffff810acd97>] __phys_addr+0x57/0xa0
> [   32.285431] RSP: 0018:ffff88000be67d68  EFLAGS: 00010213
> [   32.285431] RAX: ffff87ffffffffff RBX: ffff88000d67cb00 RCX: 00000000000080d0
> [   32.285431] RDX: 0000000000000071 RSI: ffff88000bfc8dc8 RDI: 0000000000001000
> [   32.285431] RBP: ffff88000be67d68 R08: 0000000000000001 R09: ffff88000bfc8dc8
> [   32.285431] R10: ffff88000bfc81f8 R11: 0000000000000002 R12: ffff88000bfc8dc8
> [   32.285431] R13: 00007f26f80e6000 R14: ffff88000bf81000 R15: ffff88000bfc8dc8
> [   32.285431] FS:  00007f26f80e8700(0000) GS:ffff88000d800000(0000) knlGS:0000000000000000
> [   32.285431] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   32.285431] CR2: 00007f26f7c07d50 CR3: 000000000bfb8000 CR4: 00000000000406f0
> [   32.285431] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   32.285431] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [   32.285431] Process a.out (pid: 5643, threadinfo ffff88000be66000, task ffff88000bf2b000)
> [   32.285431] Stack:
> [   32.285431]  ffff88000be67d88 ffffffff81bb0737 ffff88000bfc95e0 ffff88000bfc95f0
> [   32.285431]  ffff88000be67e48 ffffffff812163ae ffff88000d67cb00 0000000000000001
> [   32.285431]  0000000000000000 0000000000000000 ffff88000be67dd8 ffff88000bfc81f8
> [   32.285431] Call Trace:
> [   32.285431]  [<ffffffff81bb0737>] mmap_kmem+0x27/0x90
> [   32.285431]  [<ffffffff812163ae>] mmap_region+0x35e/0x5f0
> [   32.285431]  [<ffffffff812168f9>] do_mmap_pgoff+0x2b9/0x350
> [   32.285431]  [<ffffffff8120120c>] ? vm_mmap_pgoff+0x6c/0xb0
> [   32.285431]  [<ffffffff81201224>] vm_mmap_pgoff+0x84/0xb0
> [   32.285431]  [<ffffffff8124f280>] ? fget_raw+0x260/0x260
> [   32.285431]  [<ffffffff81213d9e>] sys_mmap_pgoff+0x15e/0x190
> [   32.285431]  [<ffffffff8198ab2e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [   32.285431]  [<ffffffff8106e4ed>] sys_mmap+0x1d/0x20
> [   32.285431]  [<ffffffff8361f6f9>] system_call_fastpath+0x16/0x1b
> [   32.285431] Code: 00 00 00 00 eb fe 66 0f 1f 44 00 00 48 03 05 91 02 78 03 eb 57 0f 1f 80 00 00 00 00 48 b8 ff ff ff ff ff 87 ff ff 48 39 c7 77 11 <0f> 0b 0f 1f 80 00 00 00 00 eb fe 66 0f 1f 44 00 00 48 b8 00 00
> [   32.285431] RIP  [<ffffffff810acd97>] __phys_addr+0x57/0xa0
> [   32.285431]  RSP <ffff88000be67d68>
> 
> I could send a patch to do what I think it's supposed to be doing, but I find it odd since apparently /dev/kmem has been broken for a while now - which doesn't make sense.
> 
> What am I missing?
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: /dev/kmem BUG on mmap
  2012-07-30  9:43 ` Johannes Weiner
@ 2012-07-31  2:06   ` Hugh Dickins
  0 siblings, 0 replies; 3+ messages in thread
From: Hugh Dickins @ 2012-07-31  2:06 UTC (permalink / raw)
  To: Johannes Weiner; +Cc: Sasha Levin, gregkh, arnd, linux-kernel

On Mon, 30 Jul 2012, Johannes Weiner wrote:
> On Mon, Jul 30, 2012 at 12:28:35AM +0200, Sasha Levin wrote:
> > Hi all,
> > 
> > I was poking around /dev/kmem related code, and noticed the following in mmap_kmem():
> > 
> >         /* Turn a kernel-virtual address into a physical page frame */
> >         pfn = __pa((u64)vma->vm_pgoff << PAGE_SHIFT) >> PAGE_SHIFT;
> > 
> > Which looked odd since vm_pgoff is the offset into the mapping, so
> > I'd assume that PAGE_OFFSET should be added to it as well, otherwise
> > we get an invalid address.
> 
> It's supposed to be used with kernel offsets in the first place,
> i.e. vma->vm_pgoff << PAGE_SHIFT should actually be a kernel virtual
> address.  See 6d3154c Revert "[PATCH] Fix up mmap_kmem".

Yes.  Some would say we should add a comment; but already it has one.

> 
> > I tested it by writing something like this:
> > 
> > 	int main(void)
> > 	{
> > 	        int fd;
> > 	        void *addr;
> > 	
> > 	        fd = open("/dev/kmem", O_RDONLY);
> > 	        addr = mmap(NULL, 4096, PROT_READ, MAP_PRIVATE, fd, 4096);
> > 	
> > 	        return 0;
> > 	}
> > 
> > Which indeed triggered a VM_BUG:
> > 
> > [   32.285431] kernel BUG at arch/x86/mm/physaddr.c:18!
> 
> x86's debug-version of __pa() triggers that bug.  I'm reluctant to add
> a whole lot of error checking to this interface, given that you should
> already know what you are doing.  OTOH, crashing like this is not very
> nice, either.
> 
> Is there a portable way to check if an address is a kernel virtual
> one?  It looks like comparing to PAGE_OFFSET would work on most archs,
> but not necessarily on powerpc for example.

I didn't look into powerpc; even on x86, comparing with PAGE_OFFSET
first would filter out the most likely crashes, but leave it crashing
on >= KERNEL_IMAGE_SIZE and !phys_addr_valid().

I think that's why it's so long said just __pa(), because different
architectures would not agree on the appropriate prior validation.
Debug crashes added at as low level as __pa() come as a surprise.

Thank you to Sasha for bringing this to our attention, and if there
were an obvious right answer, I'd definitely prefer to fail than crash
an out-of-range mmap arg here, even if only CAP_SYS_RAWIO gets this far.

You could say that the right answer is to add the __pa_nodebug()
to every architecture (or in asm-generic), and then use that here;
but is it worth bothering?

Once I read the DEBUG_VIRTUAL Kconfig entry:
	Enable some costly sanity checks in virtual to page code.
	This can catch mistakes with virt_to_page() and friends.
	If unsure, say N.
I'm inclined to think that few would turn DEBUG_VIRTUAL on, and
those who do so might as well welcome this crash as the costly
way in which it catches their mistakes - with apology to Sasha.

Not an answer I'm especially proud of, but doubt it's worth more.

Hugh

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2012-07-31  2:07 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-29 22:28 /dev/kmem BUG on mmap Sasha Levin
2012-07-30  9:43 ` Johannes Weiner
2012-07-31  2:06   ` Hugh Dickins

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).