linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* m(un)map kmalloc buffers to userspace
@ 2015-12-08 17:25 Sebastian Frias
  2015-12-09 13:55 ` Michal Hocko
  0 siblings, 1 reply; 13+ messages in thread
From: Sebastian Frias @ 2015-12-08 17:25 UTC (permalink / raw)
  To: linux-mm; +Cc: Marc Gonzalez, linux-kernel

Hi,

We are porting a driver from Linux 3.4.39+ to 4.1.13+, CPU is Cortex-A9.

The driver maps kmalloc'ed memory to user space.
The usermode sees a contiguous space, although in reality it could span 
several chunks of memory allocated with separate calls to kmalloc.
For simplicity, the below example supposes a single chunk of kmalloc is 
mmaped (the problem does not seems to lie on the partition, but on the 
handling of page faults and/or mapping)

[driver]   : kmalloc(size) some buffer: "kaddr"
[usermode] : umvaddr = mmap(NULL, size, PROT_READ|PROT_WRITE, 
MAP_SHARED, fd, some_offset);
NOTE: "some_offset" is used to encode some flags for the driver
[driver]   : the driver's mmap would do:
              - vma->vm_page_prot = pgprot_noncached(*prot)
              - setup vma->vm_ops with a .fault handler
              - vma->vm_flags |= VM_RESERVED
              - store 'start' (which will equal "umvaddr")
              the .fault handler is setup to:
              - determine if the fault is on a virtual address that's 
supposed to map to "kaddr", if yes:
                - page = virt_to_page((kaddr + (vmf->virtual_address - 
vma->vm_start)) & PAGE_MASK)
                - get_page(page); (for whatever reason, there are 
actually two successive calls to "get_page()")
                - vmf->page = page and return VM_FAULT_MINOR;
[usermode]  : will use "umvaddr"

That worked just fine on 3.4 and earlier, but it is causing crashes on 
4.1.13+.
The crash happens as soon as the usermode tries to write (reading from a 
file) into "umvaddr" buffer (see further below for the log).
However, it does not happens 100% of the time, sometimes we don't get a 
crash, like if we were missing some initialisation?

Since VM_RESERVED was removed, we redefined it to VM_IO | VM_DONTEXPAND 
| VM_DONTDUMP (see 
http://thread.gmane.org/gmane.linux.kernel/1335615/focus=1335625)

Questions:
1) Do you guys see something wrong in the above scenario?
2) Now that VM_RESERVED was removed, is there another recommended flag 
to replace it for the purposes above?
3) Since it was working before, we suppose that something that was 
previously done by default on the kernel it is not done anymore, could 
that be a remap_pfn_range during mmap or kmalloc?
4) We tried using remap_pfn_range inside mmap and while it seems to 
work, we still get occasional crashes due to corrupted memory (in this 
case the behaviour is the same between 4.1 and 3.4 when using the same 
modified driver), are we missing something?

Thanks in advance,


Crash log:
[  194.330390] page fault for umvaddr=0xb597a000, kvaddr=0xe6580000
[  194.335389] page 0xe7fc4000
[  194.350169] Unable to handle kernel paging request at virtual address 
90ed1d49
[  194.357445] pgd = e777c000
[  194.360162] [90ed1d49] *pgd=00000000
[  194.363773] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[  194.369109] Modules linked in: em8xxx(PO) llad(PO) [last unloaded: llad]
[  194.375862] CPU: 1 PID: 1161 Comm: test_rmfp Tainted: P           O 
4.1.13+ #2
[  194.383382] Hardware name: Sigma Tango DT
[  194.387409] task: e6c97180 ti: e6e84000 task.ti: e6e84000
[  194.392846] PC is at inode_to_bdi+0x14/0x4c
[  194.397055] LR is at balance_dirty_pages_ratelimited+0x1c/0x7f4
[  194.403004] pc : [<c00efda4>]    lr : [<c00a1864>]    psr: a00e0013
[  194.403004] sp : e6e85ba8  ip : e6e85bc0  fp : e6e85bbc
[  194.414543] r10: 00000000  r9 : e6e2d5e8  r8 : 0000017a
[  194.419791] r7 : e6580000  r6 : e762bce8  r5 : 90ed1d35  r4 : e6580000
[  194.426349] r3 : e7fc4000  r2 : 00000000  r1 : e7fc4000  r0 : 90ed1d35
[  194.432908] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM 
Segment user
[  194.440078] Control: 10c5387d  Table: a777c04a  DAC: 00000015
[  194.445850] Process test_rmfp (pid: 1161, stack limit = 0xe6e84218)
[  194.452147] Stack: (0xe6e85ba8 to 0xe6e86000)
[  194.456525] 5ba0:                   e6580000 e6e2d000 e6e85c34 
e6e85bc0 c00a1864 c00efd9c
[  194.464748] 5bc0: bf0227b8 bf021ad0 e7fc4000 e7fc4010 00000000 
c0444084 e6e85c44
...(snipped)...
[  194.744251] Backtrace:
[  194.746718] [<c00efd90>] (inode_to_bdi) from [<c00a1864>] 
(balance_dirty_pages_ratelimited+0x1c/0x7f4)
[  194.756071]  r5:e6e2d000 r4:e6580000
[  194.759671] [<c00a1848>] (balance_dirty_pages_ratelimited) from 
[<c00b79bc>] (handle_mm_fault+0x230/0x104c)
[  194.769461]  r10:00000000 r9:e6e2d5e8 r8:0000017a r7:b597a000 
r6:e762bce8 r5:e6e2d000
[  194.777346]  r4:e6580000
[  194.779894] [<c00b778c>] (handle_mm_fault) from [<c00221a4>] 
(do_page_fault+0x1cc/0x370)
[  194.788024]  r10:00000015 r9:e6d1d078 r8:b597a000 r7:e6c97180 
r6:00000817 r5:e6d1d040
[  194.795908]  r4:e6e85dc8
[  194.798455] [<c0021fd8>] (do_page_fault) from [<c000928c>] 
(do_DataAbort+0x40/0xc0)
[  194.806148]  r10:b597a000 r9:00001000 r8:e6e85dc8 r7:c0447334 
r6:b597a000 r5:c0021fd8
[  194.814030]  r4:00000817
[  194.816579] [<c000924c>] (do_DataAbort) from [<c00193d8>] 
(__dabt_svc+0x38/0x60)
[  194.824010] Exception stack(0xe6e85dc8 to 0xe6e85e10)
[  194.829086] 5dc0:                   00008000 00000000 00000000 
e7f7b020 e6e85efc 00001000
[  194.837307] 5de0: 00000000 e6e85ef4 00000000 00001000 b597a000 
e6e85e44 00000000 e6e85e10
[  194.845525] 5e00: c009970c c019e0f4 200f0013 ffffffff
[  194.850597]  r8:00000000 r7:e6e85dfc r6:ffffffff r5:200f0013 r4:c019e0f4
[  194.857353] [<c019e090>] (copy_page_to_iter) from [<c009970c>] 
(generic_file_read_iter+0x324/0x66c)
[  194.866442]  r10:e7f7b020 r9:e6daf940 r8:00000000 r7:00000000 
r6:e71d2e90 r5:e71d2f5c
[  194.874323]  r4:00001000
[  194.876874] [<c00993e8>] (generic_file_read_iter) from [<c01365a8>] 
(nfs_file_read+0x48/0x8c)
[  194.885440]  r10:00000000 r9:b597a000 r8:00008000 r7:e6e85f78 
r6:e6e85efc r5:e71d2e90
[  194.893322]  r4:e6e85f10
[  194.895872] [<c0136560>] (nfs_file_read) from [<c00cae90>] 
(__vfs_read+0xb0/0xd8)
[  194.903390]  r6:e6daf940 r5:00000000 r4:00000000 r3:c0136560
[  194.909089] [<c00cade0>] (__vfs_read) from [<c00cb548>] 
(vfs_read+0x7c/0xa0)
[  194.916170]  r8:00008000 r7:e6daf940 r6:e6e85f78 r5:b597a000 r4:e6daf940
[  194.922920] [<c00cb4cc>] (vfs_read) from [<c00cbb9c>] 
(SyS_read+0x44/0x98)
[  194.929826]  r6:e6daf943 r5:00000000 r4:00000000 r3:e6e85f78
[  194.935530] [<c00cbb58>] (SyS_read) from [<c0014ca0>] 
(ret_fast_syscall+0x0/0x3c)
[  194.943049]  r9:e6e84000 r8:c0014e68 r7:00000003 r6:b6a85000 
r5:be9e5a69 r4:be9dad40
[  194.950849] Code: e92dd830 e24cb004 e2505000 0a000006 (e5954014)
[  194.957132] ---[ end trace 34014c1bf96caa57 ]---


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: m(un)map kmalloc buffers to userspace
  2015-12-08 17:25 m(un)map kmalloc buffers to userspace Sebastian Frias
@ 2015-12-09 13:55 ` Michal Hocko
  2015-12-09 14:07   ` Marc Gonzalez
  0 siblings, 1 reply; 13+ messages in thread
From: Michal Hocko @ 2015-12-09 13:55 UTC (permalink / raw)
  To: Sebastian Frias; +Cc: linux-mm, Marc Gonzalez, linux-kernel

On Tue 08-12-15 18:25:31, Sebastian Frias wrote:
> Hi,
> 
> We are porting a driver from Linux 3.4.39+ to 4.1.13+, CPU is Cortex-A9.
> 
> The driver maps kmalloc'ed memory to user space.

This sounds like a terrible idea to me. Why don't you simply use the
page allocator directly? Try to imagine what would happen if you mmaped
a kmalloc with a size which is not page aligned? mmaped memory uses
whole page granularity.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: m(un)map kmalloc buffers to userspace
  2015-12-09 13:55 ` Michal Hocko
@ 2015-12-09 14:07   ` Marc Gonzalez
  2015-12-09 14:32     ` Michal Hocko
  0 siblings, 1 reply; 13+ messages in thread
From: Marc Gonzalez @ 2015-12-09 14:07 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Sebastian Frias, linux-mm, LKML

On 09/12/2015 14:55, Michal Hocko wrote:
> On Tue 08-12-15 18:25:31, Sebastian Frias wrote:
>> Hi,
>>
>> We are porting a driver from Linux 3.4.39+ to 4.1.13+, CPU is Cortex-A9.
>>
>> The driver maps kmalloc'ed memory to user space.
> 
> This sounds like a terrible idea to me. Why don't you simply use the
> page allocator directly? Try to imagine what would happen if you mmaped
> a kmalloc with a size which is not page aligned? mmaped memory uses
> whole page granularity.

According to the source code, this kernel module calls

  kmalloc(1 << 17, GFP_KERNEL | __GFP_REPEAT);

I suppose kmalloc() would return page-aligned memory?

(Note: the kernel module was originally written for 2.4 and was updated
inconsistently over the years.)

Regards.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: m(un)map kmalloc buffers to userspace
  2015-12-09 14:07   ` Marc Gonzalez
@ 2015-12-09 14:32     ` Michal Hocko
  2015-12-09 14:53       ` Sebastian Frias
  0 siblings, 1 reply; 13+ messages in thread
From: Michal Hocko @ 2015-12-09 14:32 UTC (permalink / raw)
  To: Marc Gonzalez; +Cc: Sebastian Frias, linux-mm, LKML

On Wed 09-12-15 15:07:50, Marc Gonzalez wrote:
> On 09/12/2015 14:55, Michal Hocko wrote:
> > On Tue 08-12-15 18:25:31, Sebastian Frias wrote:
> >> Hi,
> >>
> >> We are porting a driver from Linux 3.4.39+ to 4.1.13+, CPU is Cortex-A9.
> >>
> >> The driver maps kmalloc'ed memory to user space.
> > 
> > This sounds like a terrible idea to me. Why don't you simply use the
> > page allocator directly? Try to imagine what would happen if you mmaped
> > a kmalloc with a size which is not page aligned? mmaped memory uses
> > whole page granularity.
> 
> According to the source code, this kernel module calls
> 
>   kmalloc(1 << 17, GFP_KERNEL | __GFP_REPEAT);

So I guess you are mapping with 32pages granularity? If this is really
needed for internal usage you can use highorder page and map its
subpages directly.

> I suppose kmalloc() would return page-aligned memory?

I do not think there is any guarantee like that. AFAIK you only get
guarantee for the natural word alignment. Slab allocator is allowed
to use larger allocation and put its metadata or whatever before the
returned pointer.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: m(un)map kmalloc buffers to userspace
  2015-12-09 14:32     ` Michal Hocko
@ 2015-12-09 14:53       ` Sebastian Frias
  2015-12-09 15:12         ` Michal Hocko
  0 siblings, 1 reply; 13+ messages in thread
From: Sebastian Frias @ 2015-12-09 14:53 UTC (permalink / raw)
  To: Michal Hocko, Marc Gonzalez; +Cc: linux-mm, LKML

On 12/09/2015 03:32 PM, Michal Hocko wrote:
> On Wed 09-12-15 15:07:50, Marc Gonzalez wrote:
>> On 09/12/2015 14:55, Michal Hocko wrote:
>>> On Tue 08-12-15 18:25:31, Sebastian Frias wrote:
>>>> Hi,
>>>>
>>>> We are porting a driver from Linux 3.4.39+ to 4.1.13+, CPU is Cortex-A9.
>>>>
>>>> The driver maps kmalloc'ed memory to user space.
>>>
>>> This sounds like a terrible idea to me. Why don't you simply use the
>>> page allocator directly? Try to imagine what would happen if you mmaped
>>> a kmalloc with a size which is not page aligned? mmaped memory uses
>>> whole page granularity.
>>
>> According to the source code, this kernel module calls
>>
>>    kmalloc(1 << 17, GFP_KERNEL | __GFP_REPEAT);
>
> So I guess you are mapping with 32pages granularity? If this is really
> needed for internal usage you can use highorder page and map its
> subpages directly.
>
>> I suppose kmalloc() would return page-aligned memory?
>
> I do not think there is any guarantee like that. AFAIK you only get
> guarantee for the natural word alignment. Slab allocator is allowed
> to use larger allocation and put its metadata or whatever before the
> returned pointer.
>

Thanks for your answer.
Do you have any suggestions regarding the rest of the questions? 
(copy/pasted below for convenience)

2) Now that VM_RESERVED was removed, is there another recommended flag 
to replace it for the purposes above?
3) Since it was working before, we suppose that something that was 
previously done by default on the kernel it is not done anymore, could 
that be a remap_pfn_range during mmap or kmalloc?
4) We tried using remap_pfn_range inside mmap and while it seems to 
work, we still get occasional crashes due to corrupted memory (in this 
case the behaviour is the same between 4.1 and 3.4 when using the same 
modified driver), are we missing something?


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: m(un)map kmalloc buffers to userspace
  2015-12-09 14:53       ` Sebastian Frias
@ 2015-12-09 15:12         ` Michal Hocko
  2015-12-09 15:35           ` Sebastian Frias
  0 siblings, 1 reply; 13+ messages in thread
From: Michal Hocko @ 2015-12-09 15:12 UTC (permalink / raw)
  To: Sebastian Frias; +Cc: Marc Gonzalez, linux-mm, LKML

On Wed 09-12-15 15:53:22, Sebastian Frias wrote:
[...]
> 2) Now that VM_RESERVED was removed, is there another recommended flag to
> replace it for the purposes above?

VM_IO + potentially others depending on your usecase.

> 3) Since it was working before, we suppose that something that was
> previously done by default on the kernel it is not done anymore, could that
> be a remap_pfn_range during mmap or kmalloc?

VM_RESERVED removal was a cleanup which has removed the flag because it
was not needed and the same effect could be implied from either VM_IO or 
VM_DONTEXPAND | VM_DONTDUMP. See 314e51b9851b ("mm: kill vma flag
VM_RESERVED and mm->reserved_vm counter") for more detailed information.

> 4) We tried using remap_pfn_range inside mmap and while it seems to work, we
> still get occasional crashes due to corrupted memory (in this case the
> behaviour is the same between 4.1 and 3.4 when using the same modified
> driver), are we missing something?

This is hard to tell without knowing your driver. I would just encourage
you to look at other drivers which map kernel memory to userspace via
mmap. There are many of them. Maybe you can find a pattern which suites
your usecase.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: m(un)map kmalloc buffers to userspace
  2015-12-09 15:12         ` Michal Hocko
@ 2015-12-09 15:35           ` Sebastian Frias
  2015-12-10 11:40             ` Michal Hocko
  0 siblings, 1 reply; 13+ messages in thread
From: Sebastian Frias @ 2015-12-09 15:35 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Marc Gonzalez, linux-mm, LKML

On 12/09/2015 04:12 PM, Michal Hocko wrote:
> On Wed 09-12-15 15:53:22, Sebastian Frias wrote:
> [...]
>> 2) Now that VM_RESERVED was removed, is there another recommended flag to
>> replace it for the purposes above?
>
> VM_IO + potentially others depending on your usecase.
>
>> 3) Since it was working before, we suppose that something that was
>> previously done by default on the kernel it is not done anymore, could that
>> be a remap_pfn_range during mmap or kmalloc?
>
> VM_RESERVED removal was a cleanup which has removed the flag because it
> was not needed and the same effect could be implied from either VM_IO or
> VM_DONTEXPAND | VM_DONTDUMP. See 314e51b9851b ("mm: kill vma flag
> VM_RESERVED and mm->reserved_vm counter") for more detailed information.
>
>> 4) We tried using remap_pfn_range inside mmap and while it seems to work, we
>> still get occasional crashes due to corrupted memory (in this case the
>> behaviour is the same between 4.1 and 3.4 when using the same modified
>> driver), are we missing something?
>
> This is hard to tell without knowing your driver. I would just encourage
> you to look at other drivers which map kernel memory to userspace via
> mmap. There are many of them. Maybe you can find a pattern which suites
> your usecase.
>

Ok, thanks.
We've seen that drivers/media/pci/zoran/zoran_driver.c for example seems 
to be doing as us kmalloc+remap_pfn_range, is there any guarantee (or at 
least an advised heuristic) to determine if a driver is "current" (ie: 
uses the latest APIs and works)?


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: m(un)map kmalloc buffers to userspace
  2015-12-09 15:35           ` Sebastian Frias
@ 2015-12-10 11:40             ` Michal Hocko
  2015-12-10 12:04               ` Richard Weinberger
  2015-12-10 13:37               ` Sebastian Frias
  0 siblings, 2 replies; 13+ messages in thread
From: Michal Hocko @ 2015-12-10 11:40 UTC (permalink / raw)
  To: Sebastian Frias; +Cc: Marc Gonzalez, linux-mm, LKML

On Wed 09-12-15 16:35:53, Sebastian Frias wrote:
[...]
> We've seen that drivers/media/pci/zoran/zoran_driver.c for example seems to
> be doing as us kmalloc+remap_pfn_range,

This driver is broken - I will post a patch.

> is there any guarantee (or at least an advised heuristic) to determine
> if a driver is "current" (ie: uses the latest APIs and works)?

OK, it seems I was overly optimistic when directing you to existing
drivers. Sorry about that I wasn't aware you could find such a terrible
code there. Please refer to Linux Device Drivers book which should give
you a much better lead (e.g. http://www.makelinux.net/ldd3/chp-15-sect-2)
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: m(un)map kmalloc buffers to userspace
  2015-12-10 11:40             ` Michal Hocko
@ 2015-12-10 12:04               ` Richard Weinberger
  2015-12-10 13:37               ` Sebastian Frias
  1 sibling, 0 replies; 13+ messages in thread
From: Richard Weinberger @ 2015-12-10 12:04 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Sebastian Frias, Marc Gonzalez, linux-mm, LKML

On Thu, Dec 10, 2015 at 12:40 PM, Michal Hocko <mhocko@kernel.org> wrote:
> On Wed 09-12-15 16:35:53, Sebastian Frias wrote:
> [...]
>> We've seen that drivers/media/pci/zoran/zoran_driver.c for example seems to
>> be doing as us kmalloc+remap_pfn_range,
>
> This driver is broken - I will post a patch.
>
>> is there any guarantee (or at least an advised heuristic) to determine
>> if a driver is "current" (ie: uses the latest APIs and works)?
>
> OK, it seems I was overly optimistic when directing you to existing
> drivers. Sorry about that I wasn't aware you could find such a terrible
> code there. Please refer to Linux Device Drivers book which should give
> you a much better lead (e.g. http://www.makelinux.net/ldd3/chp-15-sect-2)

Also consider using UIO.

-- 
Thanks,
//richard

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: m(un)map kmalloc buffers to userspace
  2015-12-10 11:40             ` Michal Hocko
  2015-12-10 12:04               ` Richard Weinberger
@ 2015-12-10 13:37               ` Sebastian Frias
  2015-12-10 14:06                 ` Michal Hocko
  1 sibling, 1 reply; 13+ messages in thread
From: Sebastian Frias @ 2015-12-10 13:37 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Marc Gonzalez, linux-mm, LKML

On 12/10/2015 12:40 PM, Michal Hocko wrote:
> On Wed 09-12-15 16:35:53, Sebastian Frias wrote:
> [...]
>> We've seen that drivers/media/pci/zoran/zoran_driver.c for example seems to
>> be doing as us kmalloc+remap_pfn_range,
>
> This driver is broken - I will post a patch.

Ok, we'll be glad to see a good example, please keep us posted.

>
>> is there any guarantee (or at least an advised heuristic) to determine
>> if a driver is "current" (ie: uses the latest APIs and works)?
>
> OK, it seems I was overly optimistic when directing you to existing
> drivers. Sorry about that I wasn't aware you could find such a terrible
> code there. Please refer to Linux Device Drivers book which should give
> you a much better lead (e.g. http://www.makelinux.net/ldd3/chp-15-sect-2)
>

Thank you for the link.
The current code of our driver was has portions written following LDD3, 
however, we it seems that LDD3 advice is not relevant anymore.
Indeed, it talks about VM_RESERVED, it talks about using "nopage" and it 
says that remap_pfn_range cannot be used for pages from get_user_page 
(or kmalloc).
It seems such assertions are valid on older kernels, because the code 
stops working on 3.4+ if we use remap_pfn_range the same way than 
drivers/media/pci/zoran/zoran_driver.c
However, kmalloc+remap_pfn_range does work on 4.1.13+

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: m(un)map kmalloc buffers to userspace
  2015-12-10 13:37               ` Sebastian Frias
@ 2015-12-10 14:06                 ` Michal Hocko
  2015-12-10 16:48                   ` Sebastian Frias
  0 siblings, 1 reply; 13+ messages in thread
From: Michal Hocko @ 2015-12-10 14:06 UTC (permalink / raw)
  To: Sebastian Frias; +Cc: Marc Gonzalez, linux-mm, LKML

On Thu 10-12-15 14:37:38, Sebastian Frias wrote:
> On 12/10/2015 12:40 PM, Michal Hocko wrote:
> >On Wed 09-12-15 16:35:53, Sebastian Frias wrote:
> >[...]
> >>We've seen that drivers/media/pci/zoran/zoran_driver.c for example seems to
> >>be doing as us kmalloc+remap_pfn_range,
> >
> >This driver is broken - I will post a patch.
> 
> Ok, we'll be glad to see a good example, please keep us posted.
> 
> >
> >>is there any guarantee (or at least an advised heuristic) to determine
> >>if a driver is "current" (ie: uses the latest APIs and works)?
> >
> >OK, it seems I was overly optimistic when directing you to existing
> >drivers. Sorry about that I wasn't aware you could find such a terrible
> >code there. Please refer to Linux Device Drivers book which should give
> >you a much better lead (e.g. http://www.makelinux.net/ldd3/chp-15-sect-2)
> >
> 
> Thank you for the link.
> The current code of our driver was has portions written following LDD3,
> however, we it seems that LDD3 advice is not relevant anymore.
> Indeed, it talks about VM_RESERVED, it talks about using "nopage" and it
> says that remap_pfn_range cannot be used for pages from get_user_page (or
> kmalloc).

Heh, it seems that we are indeed outdated there as well. The memory
management code doesn't really require pages to be reserved and it
allows to use get_user_page(s) memory to be mapped to user ptes.
remap_pfn_range will set all the appropriate flags to make sure MM code
will not stumble over those pages and let's the driver to take care of
the memory deallocation.

> It seems such assertions are valid on older kernels, because the code stops
> working on 3.4+ if we use remap_pfn_range the same way than
> drivers/media/pci/zoran/zoran_driver.c
> However, kmalloc+remap_pfn_range does work on 4.1.13+

As I've said nothing will guarantee that the kmalloc returned address
will be page aligned so you might corrupt slab internal data structures.
You might allocate a larger buffer via kmalloc and make sure it is
aligned properly but I fail to see why should be kmalloc used in the
first place as you need a memory in page size unnits anyway.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: m(un)map kmalloc buffers to userspace
  2015-12-10 14:06                 ` Michal Hocko
@ 2015-12-10 16:48                   ` Sebastian Frias
  2015-12-11  9:42                     ` Michal Hocko
  0 siblings, 1 reply; 13+ messages in thread
From: Sebastian Frias @ 2015-12-10 16:48 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Marc Gonzalez, linux-mm, LKML

On 12/10/2015 03:06 PM, Michal Hocko wrote:
> On Thu 10-12-15 14:37:38, Sebastian Frias wrote:
>> On 12/10/2015 12:40 PM, Michal Hocko wrote:
>>> On Wed 09-12-15 16:35:53, Sebastian Frias wrote:
>>> [...]
>>>> We've seen that drivers/media/pci/zoran/zoran_driver.c for example seems to
>>>> be doing as us kmalloc+remap_pfn_range,
>>>
>>> This driver is broken - I will post a patch.
>>
>> Ok, we'll be glad to see a good example, please keep us posted.
>>
>>>
>>>> is there any guarantee (or at least an advised heuristic) to determine
>>>> if a driver is "current" (ie: uses the latest APIs and works)?
>>>
>>> OK, it seems I was overly optimistic when directing you to existing
>>> drivers. Sorry about that I wasn't aware you could find such a terrible
>>> code there. Please refer to Linux Device Drivers book which should give
>>> you a much better lead (e.g. http://www.makelinux.net/ldd3/chp-15-sect-2)
>>>
>>
>> Thank you for the link.
>> The current code of our driver was has portions written following LDD3,
>> however, we it seems that LDD3 advice is not relevant anymore.
>> Indeed, it talks about VM_RESERVED, it talks about using "nopage" and it
>> says that remap_pfn_range cannot be used for pages from get_user_page (or
>> kmalloc).
>
> Heh, it seems that we are indeed outdated there as well. The memory
> management code doesn't really require pages to be reserved and it
> allows to use get_user_page(s) memory to be mapped to user ptes.
> remap_pfn_range will set all the appropriate flags to make sure MM code
> will not stumble over those pages and let's the driver to take care of
> the memory deallocation.

Ok, just for information, do you know since when it is possible to use 
remap_pfn_range on kmalloc/get_user_page memory?

>
>> It seems such assertions are valid on older kernels, because the code stops
>> working on 3.4+ if we use remap_pfn_range the same way than
>> drivers/media/pci/zoran/zoran_driver.c
>> However, kmalloc+remap_pfn_range does work on 4.1.13+
>
> As I've said nothing will guarantee that the kmalloc returned address
> will be page aligned so you might corrupt slab internal data structures.
> You might allocate a larger buffer via kmalloc and make sure it is
> aligned properly but I fail to see why should be kmalloc used in the
> first place as you need a memory in page size unnits anyway.
>

Ok, so let's say we stop using kmalloc in favor of __get_user_pages, do 
you see other things that would need to be done to be compliant with 
current practices?

For instance, drivers/media/pci/zoran/zoran_driver.c is doing:

    for (off = 0; off < fh->buffers.buffer_size; off += PAGE_SIZE)
       SetPageReserved(virt_to_page(mem + off));

on the memory allocated with kmalloc, but we are not doing any of that, 
yet it was working. Would the switch to __get_user_pages require the 
calls to SetPageReserved?

Thanks for your help.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: m(un)map kmalloc buffers to userspace
  2015-12-10 16:48                   ` Sebastian Frias
@ 2015-12-11  9:42                     ` Michal Hocko
  0 siblings, 0 replies; 13+ messages in thread
From: Michal Hocko @ 2015-12-11  9:42 UTC (permalink / raw)
  To: Sebastian Frias; +Cc: Marc Gonzalez, linux-mm, LKML

On Thu 10-12-15 17:48:31, Sebastian Frias wrote:
> On 12/10/2015 03:06 PM, Michal Hocko wrote:
> >On Thu 10-12-15 14:37:38, Sebastian Frias wrote:
> >>On 12/10/2015 12:40 PM, Michal Hocko wrote:
> >>>On Wed 09-12-15 16:35:53, Sebastian Frias wrote:
> >>>[...]
> >>>>We've seen that drivers/media/pci/zoran/zoran_driver.c for example seems to
> >>>>be doing as us kmalloc+remap_pfn_range,
> >>>
> >>>This driver is broken - I will post a patch.
> >>
> >>Ok, we'll be glad to see a good example, please keep us posted.
> >>
> >>>
> >>>>is there any guarantee (or at least an advised heuristic) to determine
> >>>>if a driver is "current" (ie: uses the latest APIs and works)?
> >>>
> >>>OK, it seems I was overly optimistic when directing you to existing
> >>>drivers. Sorry about that I wasn't aware you could find such a terrible
> >>>code there. Please refer to Linux Device Drivers book which should give
> >>>you a much better lead (e.g. http://www.makelinux.net/ldd3/chp-15-sect-2)
> >>>
> >>
> >>Thank you for the link.
> >>The current code of our driver was has portions written following LDD3,
> >>however, we it seems that LDD3 advice is not relevant anymore.
> >>Indeed, it talks about VM_RESERVED, it talks about using "nopage" and it
> >>says that remap_pfn_range cannot be used for pages from get_user_page (or
> >>kmalloc).
> >
> >Heh, it seems that we are indeed outdated there as well. The memory
> >management code doesn't really require pages to be reserved and it
> >allows to use get_user_page(s) memory to be mapped to user ptes.
> >remap_pfn_range will set all the appropriate flags to make sure MM code
> >will not stumble over those pages and let's the driver to take care of
> >the memory deallocation.
> 
> Ok, just for information, do you know since when it is possible to use
> remap_pfn_range on kmalloc/get_user_page memory?

No from top of my head. But at least since 6aab341e0a28a (2.6.15)
remap_pfn_page sets PM_PFN which make vm_normal_page ignore those pages
in MM code.

> >>It seems such assertions are valid on older kernels, because the code stops
> >>working on 3.4+ if we use remap_pfn_range the same way than
> >>drivers/media/pci/zoran/zoran_driver.c
> >>However, kmalloc+remap_pfn_range does work on 4.1.13+
> >
> >As I've said nothing will guarantee that the kmalloc returned address
> >will be page aligned so you might corrupt slab internal data structures.
> >You might allocate a larger buffer via kmalloc and make sure it is
> >aligned properly but I fail to see why should be kmalloc used in the
> >first place as you need a memory in page size unnits anyway.
> >
> 
> Ok, so let's say we stop using kmalloc in favor of __get_user_pages, do you
> see other things that would need to be done to be compliant with current
> practices?

I think this should just work.

> For instance, drivers/media/pci/zoran/zoran_driver.c is doing:
> 
>    for (off = 0; off < fh->buffers.buffer_size; off += PAGE_SIZE)
>       SetPageReserved(virt_to_page(mem + off));
> 
> on the memory allocated with kmalloc, but we are not doing any of that, yet
> it was working. Would the switch to __get_user_pages require the calls to
> SetPageReserved?

I do not see much point of setting pages reserved. MM should ignore them
based on the vma flags AFAICS via vm_normal_page. Quick check of
PageReserved usage in the mm code shows that we use it very rarely. It
would be really a bug when mm would touch such a page even without
PageReserved. So this seems like a historical heritage.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2015-12-11  9:42 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-08 17:25 m(un)map kmalloc buffers to userspace Sebastian Frias
2015-12-09 13:55 ` Michal Hocko
2015-12-09 14:07   ` Marc Gonzalez
2015-12-09 14:32     ` Michal Hocko
2015-12-09 14:53       ` Sebastian Frias
2015-12-09 15:12         ` Michal Hocko
2015-12-09 15:35           ` Sebastian Frias
2015-12-10 11:40             ` Michal Hocko
2015-12-10 12:04               ` Richard Weinberger
2015-12-10 13:37               ` Sebastian Frias
2015-12-10 14:06                 ` Michal Hocko
2015-12-10 16:48                   ` Sebastian Frias
2015-12-11  9:42                     ` Michal Hocko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).