linux-api.vger.kernel.org archive mirror
* Re: [RFC PATCH] userfaultfd: Add feature to request for a signal delivery
       [not found]   ` <20170627153557.GB10091@rapoport-lnx>
@ 2017-06-27 16:01     ` Prakash Sangappa
       [not found]       ` <51508e99-d2dd-894f-8d8a-678e3747c1ee-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: Prakash Sangappa @ 2017-06-27 16:01 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Michal Hocko, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Andrea Arcangeli, Mike Kravetz,
	Dave Hansen, Christoph Hellwig, linux-api-u79uwXL29TY76Z2rM5mHXA

On 6/27/17 8:35 AM, Mike Rapoport wrote:

> On Tue, Jun 27, 2017 at 09:06:43AM +0200, Michal Hocko wrote:
>> This is an user visible API so let's CC linux-api mailing list.
>>
>> On Mon 26-06-17 12:46:13, Prakash Sangappa wrote:
>>> In some cases, userfaultfd mechanism should just deliver a SIGBUS signal
>>> to the faulting process, instead of the page-fault event. Dealing with
>>> page-fault event using a monitor thread can be an overhead in these
>>> cases. For example applications like the database could use the signaling
>>> mechanism for robustness purpose.
>> this is rather confusing. What is the reason that the monitor would be
>> slower than signal delivery and handling?
>>
>>> Database uses hugetlbfs for performance reason. Files on hugetlbfs
>>> filesystem are created and huge pages allocated using fallocate() API.
>>> Pages are deallocated/freed using fallocate() hole punching support.
>>> These files are mmapped and accessed by many processes as shared memory.
>>> The database keeps track of which offsets in the hugetlbfs file have
>>> pages allocated.
>>>
>>> Any access to mapped address over holes in the file, which can occur due
>>> to bugs in the application, is considered invalid and expect the process
>>> to simply receive a SIGBUS.  However, currently when a hole in the file is
>>> accessed via the mapped address, kernel/mm attempts to automatically
>>> allocate a page at page fault time, resulting in implicitly filling the
>>> hole in the file. This may not be the desired behavior for applications
>>> like the database that want to explicitly manage page allocations of
>>> hugetlbfs files.
>> So you register UFFD_FEATURE_SIGBUS on each region that you are unmapping
>> and then just let those offenders die?
>   
> If I understand correctly, the database will create the mapping, then it'll
> open userfaultfd and register those mappings with the userfault.
> Afterwards, when the application accesses a hole userfault will cause
> SIGBUS and the application will process it in whatever way it likes, e.g.
> just die.

Yes.

> What I don't understand is why won't you use userfault monitor process that
> will take care of the page fault events?
> It shouldn't be much overhead running it and it can keep track on all the
> userfault file descriptors for you and it will allow more versatile error
> handling than SIGBUS.
>

Co-ordination with the external monitor process by all the database
processes to send their userfaultfd is still an overhead.


>>> Using userfaultfd mechanism, with this support to get a signal, database
>>> application can prevent pages from being allocated implicitly when
>>> processes access mapped address over holes in the file.
>>>
>>> This patch adds the feature to request for a SIGBUS signal to userfaultfd
>>> mechanism.
>>>
>>> See following for previous discussion about the database requirement
>>> leading to this proposal as suggested by Andrea.
>>>
>>> http://www.spinics.net/lists/linux-mm/msg129224.html
>> Please make those requirements part of the changelog.
>>
>>> Signed-off-by: Prakash <prakash.sangappa-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>>> ---
>>>   fs/userfaultfd.c                 |  5 +++++
>>>   include/uapi/linux/userfaultfd.h | 10 +++++++++-
>>>   2 files changed, 14 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
>>> index 1d622f2..5686d6d2 100644
>>> --- a/fs/userfaultfd.c
>>> +++ b/fs/userfaultfd.c
>>> @@ -371,6 +371,11 @@ int handle_userfault(struct vm_fault *vmf, unsigned
>>> long reason)
>>>       VM_BUG_ON(reason & ~(VM_UFFD_MISSING|VM_UFFD_WP));
>>>       VM_BUG_ON(!(reason & VM_UFFD_MISSING) ^ !!(reason & VM_UFFD_WP));
>>>
>>> +    if (ctx->features & UFFD_FEATURE_SIGBUS) {
>>> +        goto out;
>>> +    }
>>> +
>>>       /*
>>>        * If it's already released don't get it. This avoids to loop
>>>        * in __get_user_pages if userfaultfd_release waits on the
>>> diff --git a/include/uapi/linux/userfaultfd.h
>>> b/include/uapi/linux/userfaultfd.h
>>> index 3b05953..d39d5db 100644
>>> --- a/include/uapi/linux/userfaultfd.h
>>> +++ b/include/uapi/linux/userfaultfd.h
>>> @@ -23,7 +23,8 @@
>>>                  UFFD_FEATURE_EVENT_REMOVE |    \
>>>                  UFFD_FEATURE_EVENT_UNMAP |        \
>>>                  UFFD_FEATURE_MISSING_HUGETLBFS |    \
>>> -               UFFD_FEATURE_MISSING_SHMEM)
>>> +               UFFD_FEATURE_MISSING_SHMEM |        \
>>> +               UFFD_FEATURE_SIGBUS)
>>>   #define UFFD_API_IOCTLS                \
>>>       ((__u64)1 << _UFFDIO_REGISTER |        \
>>>        (__u64)1 << _UFFDIO_UNREGISTER |    \
>>> @@ -153,6 +154,12 @@ struct uffdio_api {
>>>        * UFFD_FEATURE_MISSING_SHMEM works the same as
>>>        * UFFD_FEATURE_MISSING_HUGETLBFS, but it applies to shmem
>>>        * (i.e. tmpfs and other shmem based APIs).
>>> +     *
>>> +     * UFFD_FEATURE_SIGBUS feature means no page-fault
>>> +     * (UFFD_EVENT_PAGEFAULT) event will be delivered, instead
>>> +     * a SIGBUS signal will be sent to the faulting process.
>>> +     * The application process can enable this behavior by adding
>>> +     * it to uffdio_api.features.
>>>        */
>>>   #define UFFD_FEATURE_PAGEFAULT_FLAG_WP        (1<<0)
>>>   #define UFFD_FEATURE_EVENT_FORK            (1<<1)
>>> @@ -161,6 +168,7 @@ struct uffdio_api {
>>>   #define UFFD_FEATURE_MISSING_HUGETLBFS        (1<<4)
>>>   #define UFFD_FEATURE_MISSING_SHMEM        (1<<5)
>>>   #define UFFD_FEATURE_EVENT_UNMAP        (1<<6)
>>> +#define UFFD_FEATURE_SIGBUS            (1<<7)
>>>       __u64 features;
>>>
>>>       __u64 ioctls;
>>> -- 
>>> 2.7.4
>>>
>> -- 
>> Michal Hocko
>> SUSE Labs
>>
> --
> Sincerely yours,
> Mike.
>


* Re: [RFC PATCH] userfaultfd: Add feature to request for a signal delivery
       [not found]       ` <51508e99-d2dd-894f-8d8a-678e3747c1ee-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2017-06-28 13:18         ` Mike Rapoport
  2017-06-28 18:23           ` Prakash Sangappa
  0 siblings, 1 reply; 13+ messages in thread
From: Mike Rapoport @ 2017-06-28 13:18 UTC (permalink / raw)
  To: Prakash Sangappa
  Cc: Michal Hocko, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Andrea Arcangeli, Mike Kravetz,
	Dave Hansen, Christoph Hellwig, linux-api-u79uwXL29TY76Z2rM5mHXA

On Tue, Jun 27, 2017 at 09:01:20AM -0700, Prakash Sangappa wrote:
> On 6/27/17 8:35 AM, Mike Rapoport wrote:
> 
> >On Tue, Jun 27, 2017 at 09:06:43AM +0200, Michal Hocko wrote:
> >>This is an user visible API so let's CC linux-api mailing list.
> >>
> >>On Mon 26-06-17 12:46:13, Prakash Sangappa wrote:
> >>>In some cases, userfaultfd mechanism should just deliver a SIGBUS signal
> >>>to the faulting process, instead of the page-fault event. Dealing with
> >>>page-fault event using a monitor thread can be an overhead in these
> >>>cases. For example applications like the database could use the signaling
> >>>mechanism for robustness purpose.
> >>this is rather confusing. What is the reason that the monitor would be
> >>slower than signal delivery and handling?
> >>
> >>>Database uses hugetlbfs for performance reason. Files on hugetlbfs
> >>>filesystem are created and huge pages allocated using fallocate() API.
> >>>Pages are deallocated/freed using fallocate() hole punching support.
> >>>These files are mmapped and accessed by many processes as shared memory.
> >>>The database keeps track of which offsets in the hugetlbfs file have
> >>>pages allocated.
> >>>
> >>>Any access to mapped address over holes in the file, which can occur due
> >>>to bugs in the application, is considered invalid and expect the process
> >>>to simply receive a SIGBUS.  However, currently when a hole in the file is
> >>>accessed via the mapped address, kernel/mm attempts to automatically
> >>>allocate a page at page fault time, resulting in implicitly filling the
> >>>hole in the file. This may not be the desired behavior for applications
> >>>like the database that want to explicitly manage page allocations of
> >>>hugetlbfs files.
> >>So you register UFFD_FEATURE_SIGBUS on each region that you are unmapping
> >>and then just let those offenders die?
> >If I understand correctly, the database will create the mapping, then it'll
> >open userfaultfd and register those mappings with the userfault.
> >Afterwards, when the application accesses a hole userfault will cause
> >SIGBUS and the application will process it in whatever way it likes, e.g.
> >just die.
> 
> Yes.
>
> >What I don't understand is why won't you use userfault monitor process that
> >will take care of the page fault events?
> >It shouldn't be much overhead running it and it can keep track on all the
> >userfault file descriptors for you and it will allow more versatile error
> >handling than SIGBUS.
> >
> 
> Co-ordination with the external monitor process by all the database
> processes
> to send  their userfaultfd is still an overhead.

You are planning to register in userfaultfd only the holes you punch to
deallocate pages, am I right?

And the co-ordination of the userfault file descriptor with the monitor
would have been added after calls to fallocate() and userfaultfd_register()?

I've just been thinking that maybe it would be possible to use
UFFD_EVENT_REMOVE for this case. We anyway need to implement the generation
of UFFD_EVENT_REMOVE for the case of hole punching in hugetlbfs for
non-cooperative userfaultfd. It could be that it will solve your issue as
well.

> >>>Using userfaultfd mechanism, with this support to get a signal, database
> >>>application can prevent pages from being allocated implicitly when
> >>>processes access mapped address over holes in the file.
> >>>
> >>>This patch adds the feature to request for a SIGBUS signal to userfaultfd
> >>>mechanism.
> >>>
> >>>See following for previous discussion about the database requirement
> >>>leading to this proposal as suggested by Andrea.
> >>>
> >>>http://www.spinics.net/lists/linux-mm/msg129224.html
> >>Please make those requirements part of the changelog.
> >>
> >>>Signed-off-by: Prakash <prakash.sangappa-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
> >>>---
> >>>  fs/userfaultfd.c                 |  5 +++++
> >>>  include/uapi/linux/userfaultfd.h | 10 +++++++++-
> >>>  2 files changed, 14 insertions(+), 1 deletion(-)
> >>>
> >>>diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> >>>index 1d622f2..5686d6d2 100644
> >>>--- a/fs/userfaultfd.c
> >>>+++ b/fs/userfaultfd.c
> >>>@@ -371,6 +371,11 @@ int handle_userfault(struct vm_fault *vmf, unsigned
> >>>long reason)
> >>>      VM_BUG_ON(reason & ~(VM_UFFD_MISSING|VM_UFFD_WP));
> >>>      VM_BUG_ON(!(reason & VM_UFFD_MISSING) ^ !!(reason & VM_UFFD_WP));
> >>>
> >>>+    if (ctx->features & UFFD_FEATURE_SIGBUS) {
> >>>+        goto out;
> >>>+    }
> >>>+
> >>>      /*
> >>>       * If it's already released don't get it. This avoids to loop
> >>>       * in __get_user_pages if userfaultfd_release waits on the
> >>>diff --git a/include/uapi/linux/userfaultfd.h
> >>>b/include/uapi/linux/userfaultfd.h
> >>>index 3b05953..d39d5db 100644
> >>>--- a/include/uapi/linux/userfaultfd.h
> >>>+++ b/include/uapi/linux/userfaultfd.h
> >>>@@ -23,7 +23,8 @@
> >>>                 UFFD_FEATURE_EVENT_REMOVE |    \
> >>>                 UFFD_FEATURE_EVENT_UNMAP |        \
> >>>                 UFFD_FEATURE_MISSING_HUGETLBFS |    \
> >>>-               UFFD_FEATURE_MISSING_SHMEM)
> >>>+               UFFD_FEATURE_MISSING_SHMEM |        \
> >>>+               UFFD_FEATURE_SIGBUS)
> >>>  #define UFFD_API_IOCTLS                \
> >>>      ((__u64)1 << _UFFDIO_REGISTER |        \
> >>>       (__u64)1 << _UFFDIO_UNREGISTER |    \
> >>>@@ -153,6 +154,12 @@ struct uffdio_api {
> >>>       * UFFD_FEATURE_MISSING_SHMEM works the same as
> >>>       * UFFD_FEATURE_MISSING_HUGETLBFS, but it applies to shmem
> >>>       * (i.e. tmpfs and other shmem based APIs).
> >>>+     *
> >>>+     * UFFD_FEATURE_SIGBUS feature means no page-fault
> >>>+     * (UFFD_EVENT_PAGEFAULT) event will be delivered, instead
> >>>+     * a SIGBUS signal will be sent to the faulting process.
> >>>+     * The application process can enable this behavior by adding
> >>>+     * it to uffdio_api.features.
> >>>       */
> >>>  #define UFFD_FEATURE_PAGEFAULT_FLAG_WP        (1<<0)
> >>>  #define UFFD_FEATURE_EVENT_FORK            (1<<1)
> >>>@@ -161,6 +168,7 @@ struct uffdio_api {
> >>>  #define UFFD_FEATURE_MISSING_HUGETLBFS        (1<<4)
> >>>  #define UFFD_FEATURE_MISSING_SHMEM        (1<<5)
> >>>  #define UFFD_FEATURE_EVENT_UNMAP        (1<<6)
> >>>+#define UFFD_FEATURE_SIGBUS            (1<<7)
> >>>      __u64 features;
> >>>
> >>>      __u64 ioctls;
> >>>-- 
> >>>2.7.4
> >>>
> >>-- 
> >>Michal Hocko
> >>SUSE Labs
> >>
> >--
> >Sincerely yours,
> >Mike.
> >
> 


* Re: [RFC PATCH] userfaultfd: Add feature to request for a signal delivery
  2017-06-28 13:18         ` Mike Rapoport
@ 2017-06-28 18:23           ` Prakash Sangappa
  2017-06-29  8:09             ` Michal Hocko
  2017-06-29 10:46             ` Mike Rapoport
  0 siblings, 2 replies; 13+ messages in thread
From: Prakash Sangappa @ 2017-06-28 18:23 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Michal Hocko, linux-kernel, linux-mm, Andrea Arcangeli,
	Mike Kravetz, Dave Hansen, Christoph Hellwig, linux-api



On 6/28/17 6:18 AM, Mike Rapoport wrote:
> On Tue, Jun 27, 2017 at 09:01:20AM -0700, Prakash Sangappa wrote:
>> On 6/27/17 8:35 AM, Mike Rapoport wrote:
>>
>>> On Tue, Jun 27, 2017 at 09:06:43AM +0200, Michal Hocko wrote:
>>>> This is an user visible API so let's CC linux-api mailing list.
>>>>
>>>> On Mon 26-06-17 12:46:13, Prakash Sangappa wrote:
>>>>
>>>>> Any access to mapped address over holes in the file, which can occur due
>>>>> to bugs in the application, is considered invalid and expect the process
>>>>> to simply receive a SIGBUS.  However, currently when a hole in the file is
>>>>> accessed via the mapped address, kernel/mm attempts to automatically
>>>>> allocate a page at page fault time, resulting in implicitly filling the
>>>>> hole in the file. This may not be the desired behavior for applications
>>>>> like the database that want to explicitly manage page allocations of
>>>>> hugetlbfs files.
>>>> So you register UFFD_FEATURE_SIGBUS on each region that you are unmapping
>>>> and then just let those offenders die?
>>> If I understand correctly, the database will create the mapping, then it'll
>>> open userfaultfd and register those mappings with the userfault.
>>> Afterwards, when the application accesses a hole userfault will cause
>>> SIGBUS and the application will process it in whatever way it likes, e.g.
>>> just die.
>> Yes.
>>
>>> What I don't understand is why won't you use userfault monitor process that
>>> will take care of the page fault events?
>>> It shouldn't be much overhead running it and it can keep track on all the
>>> userfault file descriptors for you and it will allow more versatile error
>>> handling than SIGBUS.
>>>
>> Co-ordination with the external monitor process by all the database
>> processes
>> to send  their userfaultfd is still an overhead.
> You are planning to register in userfaultfd only the holes you punch to
> deallocate pages, am I right?


No, the entire mmap'ed region. The DB processes would mmap(MAP_NORESERVE)
hugetlbfs files and register the mapped address range with userfaultfd
once, right after the mmap() call.
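
A minimal sketch of the registration step described above, assuming the
UFFD_FEATURE_SIGBUS flag proposed in this patch is present in the installed
uapi headers (the code is guarded by #ifdef for headers that lack it). The
helper name uffd_register_sigbus() is hypothetical, and creating the
MAP_NORESERVE hugetlbfs mapping is left to the caller.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a userfaultfd, request the SIGBUS feature and
 * register [addr, addr + len) for missing-page faults.
 * Returns the userfaultfd descriptor, or -1 with errno set. The call can
 * fail for unrelated reasons, e.g. EPERM when unprivileged use is disabled.
 */
static int uffd_register_sigbus(void *addr, size_t len)
{
    assert(addr != NULL && len > 0);
#ifndef UFFD_FEATURE_SIGBUS
    errno = ENOSYS;                 /* headers predate the proposed flag */
    return -1;
#else
    int uffd = (int)syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    if (uffd < 0)
        return -1;

    /* API handshake: ask for SIGBUS delivery instead of page-fault events. */
    struct uffdio_api api = {
        .api = UFFD_API,
        .features = UFFD_FEATURE_SIGBUS,
    };
    if (ioctl(uffd, UFFDIO_API, &api) < 0)
        goto fail;

    /* Register the whole mapping, not just the holes. */
    struct uffdio_register reg = {
        .range = { .start = (unsigned long)addr, .len = len },
        .mode = UFFDIO_REGISTER_MODE_MISSING,
    };
    if (ioctl(uffd, UFFDIO_REGISTER, &reg) < 0)
        goto fail;

    return uffd;

fail:
    {
        int saved = errno;          /* keep the ioctl error across close() */
        close(uffd);
        errno = saved;
    }
    return -1;
#endif
}
```

The descriptor has to stay open for as long as the registration should
remain in effect; no thread needs to read from it in this mode.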

>
> And the co-ordination of the userfault file descriptor with the monitor
> would have been added after calls to fallocate() and userfaultfd_register()?

Well, the database application does not need to deal with a monitor.

>
> I've just been thinking that maybe it would be possible to use
> UFFD_EVENT_REMOVE for this case. We anyway need to implement the generation
> of UFFD_EVENT_REMOVE for the case of hole punching in hugetlbfs for
> non-cooperative userfaultfd. It could be that it will solve your issue as
> well.
>

Will this result in a signal delivery?

In the use case described, the database application does not need any event
for hole punching. Basically, just a signal for any invalid access to the
mapped area over holes in the file.
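
As an illustration of what "just a signal" could mean on the application
side, here is a hedged sketch of a SIGBUS handler that records the faulting
address and unwinds out of the bad access. All helper names are
hypothetical. Since the proposed feature may not be available, a read past
the end of an empty temporary file stands in for an access to a hugetlbfs
hole; that access raises SIGBUS on Linux without this patch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf sigbus_jmp;          /* where to resume after a fault */
static void *volatile sigbus_addr;     /* last faulting address seen */

static void sigbus_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    sigbus_addr = info->si_addr;       /* address reported by the kernel */
    siglongjmp(sigbus_jmp, 1);         /* abandon the faulting access */
}

/* Install the handler; returns 0 on success, -1 on failure. */
static int install_sigbus_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = sigbus_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}

/* Read one byte; returns 0 if the read worked, -1 if it raised SIGBUS. */
static int probe_read(const volatile char *p)
{
    if (sigsetjmp(sigbus_jmp, 1) != 0) /* 1: restore the signal mask too */
        return -1;
    char c = *p;                       /* volatile read, may fault */
    (void)c;
    return 0;
}

/*
 * Stand-in for touching a hole: map one page of an empty file and read it.
 * Returns 1 if SIGBUS was caught at the mapped address, 0 if the read
 * succeeded, -1 if the setup failed.
 */
static int demo_hole_access(void)
{
    if (install_sigbus_handler() != 0)
        return -1;

    FILE *f = tmpfile();
    if (f == NULL)
        return -1;

    size_t pg = (size_t)sysconf(_SC_PAGESIZE);
    char *m = mmap(NULL, pg, PROT_READ, MAP_SHARED, fileno(f), 0);
    if (m == MAP_FAILED) {
        fclose(f);
        return -1;
    }

    int faulted = probe_read(m) == -1 && sigbus_addr == (void *)m;

    munmap(m, pg);
    fclose(f);
    return faulted;
}
```

A real database process would more likely log the address and abort than
resume; the siglongjmp() here only keeps the sketch easy to exercise.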

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [RFC PATCH] userfaultfd: Add feature to request for a signal delivery
  2017-06-28 18:23           ` Prakash Sangappa
@ 2017-06-29  8:09             ` Michal Hocko
  2017-06-29 21:41               ` prakash.sangappa
  2017-06-29 10:46             ` Mike Rapoport
  1 sibling, 1 reply; 13+ messages in thread
From: Michal Hocko @ 2017-06-29  8:09 UTC (permalink / raw)
  To: Prakash Sangappa
  Cc: Mike Rapoport, linux-kernel, linux-mm, Andrea Arcangeli,
	Mike Kravetz, Dave Hansen, Christoph Hellwig, linux-api

On Wed 28-06-17 11:23:32, Prakash Sangappa wrote:
> 
> 
> On 6/28/17 6:18 AM, Mike Rapoport wrote:
[...]
> >I've just been thinking that maybe it would be possible to use
> >UFFD_EVENT_REMOVE for this case. We anyway need to implement the generation
> >of UFFD_EVENT_REMOVE for the case of hole punching in hugetlbfs for
> >non-cooperative userfaultfd. It could be that it will solve your issue as
> >well.
> >
> 
> Will this result in a signal delivery?
> 
> In the use case described, the database application does not need any event
> for  hole punching. Basically, just a signal for any invalid access to
> mapped area over holes in the file.

OK, but it would be better to think that through for other potential
usecases so that this doesn't end up as a single hugetlb feature. E.g.
what should happen if a regular anonymous memory gets swapped out?
Should we deliver a signal as well? How does userspace tell whether there
was no backing page at all or the backing page was just unavailable?
-- 
Michal Hocko
SUSE Labs


* Re: [RFC PATCH] userfaultfd: Add feature to request for a signal delivery
  2017-06-28 18:23           ` Prakash Sangappa
  2017-06-29  8:09             ` Michal Hocko
@ 2017-06-29 10:46             ` Mike Rapoport
  2017-06-29 21:49               ` prakash.sangappa
  1 sibling, 1 reply; 13+ messages in thread
From: Mike Rapoport @ 2017-06-29 10:46 UTC (permalink / raw)
  To: Prakash Sangappa
  Cc: Michal Hocko, linux-kernel, linux-mm, Andrea Arcangeli,
	Mike Kravetz, Dave Hansen, Christoph Hellwig, linux-api

On Wed, Jun 28, 2017 at 11:23:32AM -0700, Prakash Sangappa wrote:
> 
> 
> On 6/28/17 6:18 AM, Mike Rapoport wrote:
> >On Tue, Jun 27, 2017 at 09:01:20AM -0700, Prakash Sangappa wrote:
> >>On 6/27/17 8:35 AM, Mike Rapoport wrote:
> >>
> >>>On Tue, Jun 27, 2017 at 09:06:43AM +0200, Michal Hocko wrote:
> >>>>This is an user visible API so let's CC linux-api mailing list.
> >>>>
> >>>>On Mon 26-06-17 12:46:13, Prakash Sangappa wrote:
> >>>>
> >>>>>Any access to mapped address over holes in the file, which can occur due
> >>>>>to bugs in the application, is considered invalid and expect the process
> >>>>>to simply receive a SIGBUS.  However, currently when a hole in the file is
> >>>>>accessed via the mapped address, kernel/mm attempts to automatically
> >>>>>allocate a page at page fault time, resulting in implicitly filling the
> >>>>>hole in the file. This may not be the desired behavior for applications
> >>>>>like the database that want to explicitly manage page allocations of
> >>>>>hugetlbfs files.
> >>>>So you register UFFD_FEATURE_SIGBUS on each region that you are unmapping
> >>>>and then just let those offenders die?
> >>>If I understand correctly, the database will create the mapping, then it'll
> >>>open userfaultfd and register those mappings with the userfault.
> >>>Afterwards, when the application accesses a hole userfault will cause
> >>>SIGBUS and the application will process it in whatever way it likes, e.g.
> >>>just die.
> >>Yes.
> >>
> >>>What I don't understand is why won't you use userfault monitor process that
> >>>will take care of the page fault events?
> >>>It shouldn't be much overhead running it and it can keep track on all the
> >>>userfault file descriptors for you and it will allow more versatile error
> >>>handling than SIGBUS.
> >>>
> >>Co-ordination with the external monitor process by all the database
> >>processes
> >>to send  their userfaultfd is still an overhead.
> >You are planning to register in userfaultfd only the holes you punch to
> >deallocate pages, am I right?
> 
> 
> No, the entire mmap'ed region. The DB processes would mmap(MAP_NORESERVE)
> hugetlbfs files and register the mapped address range with userfaultfd
> once, right after the mmap() call.
> 
> >
> >And the co-ordination of the userfault file descriptor with the monitor
> >would have been added after calls to fallocate() and userfaultfd_register()?
> 
> Well, the database application does not need to deal with a monitor.
> 
> >
> >I've just been thinking that maybe it would be possible to use
> >UFFD_EVENT_REMOVE for this case. We anyway need to implement the generation
> >of UFFD_EVENT_REMOVE for the case of hole punching in hugetlbfs for
> >non-cooperative userfaultfd. It could be that it will solve your issue as
> >well.
> >
> 
> Will this result in a signal delivery?
> 
> In the use case described, the database application does not need any event
> for  hole punching. Basically, just a signal for any invalid access to
> mapped
> area over holes in the file.
 
Well, what I had in mind was using a single-process uffd monitor that will
track all the userfault file descriptors. With UFFD_EVENT_REMOVE this
process will know what areas are invalid and it will be able to process the
invalid access in any way it likes, e.g. send SIGBUS to the database
application.
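
One way the bookkeeping for such a monitor might look, as a sketch only.
The types and the function monitor_handle_msg() are hypothetical, the
removed range is treated as half-open, and the sketch never forgets a hole
once recorded, so it does not handle pages being allocated again. Reading
messages from the descriptors and actually sending the signal (for example
with kill()) are left out.

```c
#include <assert.h>
#include <linux/userfaultfd.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_HOLES 64                    /* arbitrary limit for the sketch */

struct hole {
    uint64_t start;                     /* inclusive */
    uint64_t end;                       /* exclusive (assumed) */
};

struct monitor_state {
    struct hole holes[MAX_HOLES];
    size_t nholes;
};

enum verdict {
    VERDICT_IGNORE,                     /* nothing to do for this message */
    VERDICT_RESOLVE,                    /* valid access: resolve the fault */
    VERDICT_SIGBUS,                     /* invalid access: signal the process */
};

/* Update the hole list or classify a fault from one userfaultfd message. */
static enum verdict monitor_handle_msg(struct monitor_state *st,
                                       const struct uffd_msg *msg)
{
    assert(st != NULL && msg != NULL);

    switch (msg->event) {
    case UFFD_EVENT_REMOVE:
        /* Remember the range that was just removed. */
        if (st->nholes < MAX_HOLES) {
            st->holes[st->nholes].start = msg->arg.remove.start;
            st->holes[st->nholes].end = msg->arg.remove.end;
            st->nholes++;
        }
        return VERDICT_IGNORE;

    case UFFD_EVENT_PAGEFAULT:
        /* A fault inside any recorded hole counts as an invalid access. */
        for (size_t i = 0; i < st->nholes; i++) {
            if (msg->arg.pagefault.address >= st->holes[i].start &&
                msg->arg.pagefault.address < st->holes[i].end)
                return VERDICT_SIGBUS;
        }
        return VERDICT_RESOLVE;

    default:
        return VERDICT_IGNORE;
    }
}
```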

If you mmap() and userfaultfd_register() only at the initialization time,
it might be also possible to avoid sending userfault file descriptors to
the monitor process with UFFD_FEATURE_EVENT_FORK.

--
Sincerely yours,
Mike.


* Re: [RFC PATCH] userfaultfd: Add feature to request for a signal delivery
  2017-06-29  8:09             ` Michal Hocko
@ 2017-06-29 21:41               ` prakash.sangappa
       [not found]                 ` <936bde7b-1913-5589-22f4-9bbfdb6a8dd5-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: prakash.sangappa @ 2017-06-29 21:41 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Mike Rapoport, linux-kernel, linux-mm, Andrea Arcangeli,
	Mike Kravetz, Dave Hansen, Christoph Hellwig, linux-api



On 06/29/2017 01:09 AM, Michal Hocko wrote:
> On Wed 28-06-17 11:23:32, Prakash Sangappa wrote:
>>
>> On 6/28/17 6:18 AM, Mike Rapoport wrote:
> [...]
>>> I've just been thinking that maybe it would be possible to use
>>> UFFD_EVENT_REMOVE for this case. We anyway need to implement the generation
>>> of UFFD_EVENT_REMOVE for the case of hole punching in hugetlbfs for
>>> non-cooperative userfaultfd. It could be that it will solve your issue as
>>> well.
>>>
>> Will this result in a signal delivery?
>>
>> In the use case described, the database application does not need any event
>> for  hole punching. Basically, just a signal for any invalid access to
>> mapped area over holes in the file.
> OK, but it would be better to think that through for other potential
> usecases so that this doesn't end up as a single hugetlb feature. E.g.
> what should happen if a regular anonymous memory gets swapped out?
> Should we deliver signal as well? How does userspace tell whether this
> was a no backing page from unavailable backing page?

This may not be useful in all cases. Potentially, it could be used
together with mlock() on anonymous memory to ensure that any access
to memory that is not locked is caught, again for robustness.


* Re: [RFC PATCH] userfaultfd: Add feature to request for a signal delivery
  2017-06-29 10:46             ` Mike Rapoport
@ 2017-06-29 21:49               ` prakash.sangappa
  0 siblings, 0 replies; 13+ messages in thread
From: prakash.sangappa @ 2017-06-29 21:49 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Michal Hocko, linux-kernel, linux-mm, Andrea Arcangeli,
	Mike Kravetz, Dave Hansen, Christoph Hellwig, linux-api



On 06/29/2017 03:46 AM, Mike Rapoport wrote:
> On Wed, Jun 28, 2017 at 11:23:32AM -0700, Prakash Sangappa wrote:
[...]
>>
>> Will this result in a signal delivery?
>>
>> In the use case described, the database application does not need any event
>> for  hole punching. Basically, just a signal for any invalid access to
>> mapped
>> area over holes in the file.
>   
> Well, what I had in mind was using a single-process uffd monitor that will
> track all the userfault file descriptors. With UFFD_EVENT_REMOVE this
> process will know what areas are invalid and it will be able to process the
> invalid access in any way it likes, e.g. send SIGBUS to the database
> application.


Use of a monitor process is also an overhead for the database.


>
> If you mmap() and userfaultfd_register() only at the initialization time,
> it might be also possible to avoid sending userfault file descriptors to
> the monitor process with UFFD_FEATURE_EVENT_FORK.

The new processes are always exec'd in the database case and these
processes could be mapping different files. So, not sure if
UFFD_FEATURE_EVENT_FORK will be useful.  Also, it may not be one
process spawning the other new processes.


>
> --
> Sincerely yours,
> Mike.
>


* Re: [RFC PATCH] userfaultfd: Add feature to request for a signal delivery
       [not found]                 ` <936bde7b-1913-5589-22f4-9bbfdb6a8dd5-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2017-06-30  9:47                   ` Michal Hocko
  2017-06-30 13:08                     ` Andrea Arcangeli
       [not found]                     ` <20170630094718.GE22917-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
  0 siblings, 2 replies; 13+ messages in thread
From: Michal Hocko @ 2017-06-30  9:47 UTC (permalink / raw)
  To: prakash.sangappa
  Cc: Mike Rapoport, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Andrea Arcangeli, Mike Kravetz,
	Dave Hansen, Christoph Hellwig, linux-api-u79uwXL29TY76Z2rM5mHXA,
	John Stultz

[CC John, the thread started
http://lkml.kernel.org/r/9363561f-a9cd-7ab6-9c11-ab9a99dc89f1-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org]

On Thu 29-06-17 14:41:22, prakash.sangappa wrote:
> 
> 
> On 06/29/2017 01:09 AM, Michal Hocko wrote:
> >On Wed 28-06-17 11:23:32, Prakash Sangappa wrote:
> >>
> >>On 6/28/17 6:18 AM, Mike Rapoport wrote:
> >[...]
> >>>I've just been thinking that maybe it would be possible to use
> >>>UFFD_EVENT_REMOVE for this case. We anyway need to implement the generation
> >>>of UFFD_EVENT_REMOVE for the case of hole punching in hugetlbfs for
> >>>non-cooperative userfaultfd. It could be that it will solve your issue as
> >>>well.
> >>>
> >>Will this result in a signal delivery?
> >>
> >>In the use case described, the database application does not need any event
> >>for  hole punching. Basically, just a signal for any invalid access to
> >>mapped area over holes in the file.
> >OK, but it would be better to think that through for other potential
> >usecases so that this doesn't end up as a single hugetlb feature. E.g.
> >what should happen if a regular anonymous memory gets swapped out?
> >Should we deliver signal as well? How does userspace tell whether this
> >was a no backing page from unavailable backing page?
> 
> This may not be useful in all cases. Potentially, it could be used
> with use of mlock() on anonymous memory to ensure any access
> to memory that is not locked is caught, again for robustness
> purpose.

The thing I wanted to point out is that not only should this not be a
single-usecase thing (I believe others will pop up as well - see below),
but it should also be well defined, as this is a user-visible API. Please
try to write a patch to the userfaultfd man page to clarify the exact
semantics. This should help the further discussion.

As an aside, I remember that prior to MADV_FREE there was a long
discussion about lazy freeing of memory from userspace. Some users
wanted to be signalled when their memory was freed by the system so that
they could rebuild the original content (e.g. uncompressed images in
memory). It seems like MADV_FREE + this signalling could be used for
that usecase. John would surely know more about those usecases.
-- 
Michal Hocko
SUSE Labs


* Re: [RFC PATCH] userfaultfd: Add feature to request for a signal delivery
  2017-06-30  9:47                   ` Michal Hocko
@ 2017-06-30 13:08                     ` Andrea Arcangeli
       [not found]                       ` <20170630130813.GA5738-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
       [not found]                     ` <20170630094718.GE22917-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
  1 sibling, 1 reply; 13+ messages in thread
From: Andrea Arcangeli @ 2017-06-30 13:08 UTC (permalink / raw)
  To: Michal Hocko
  Cc: prakash.sangappa, Mike Rapoport, linux-kernel, linux-mm,
	Mike Kravetz, Dave Hansen, Christoph Hellwig, linux-api,
	John Stultz

On Fri, Jun 30, 2017 at 11:47:35AM +0200, Michal Hocko wrote:
> [CC John, the thread started
> http://lkml.kernel.org/r/9363561f-a9cd-7ab6-9c11-ab9a99dc89f1@oracle.com]
> 
> On Thu 29-06-17 14:41:22, prakash.sangappa wrote:
> > 
> > 
> > On 06/29/2017 01:09 AM, Michal Hocko wrote:
> > >On Wed 28-06-17 11:23:32, Prakash Sangappa wrote:
> > >>
> > >>On 6/28/17 6:18 AM, Mike Rapoport wrote:
> > >[...]
> > >>>I've just been thinking that maybe it would be possible to use
> > >>>UFFD_EVENT_REMOVE for this case. We anyway need to implement the generation
> > >>>of UFFD_EVENT_REMOVE for the case of hole punching in hugetlbfs for
> > >>>non-cooperative userfaultfd. It could be that it will solve your issue as
> > >>>well.
> > >>>
> > >>Will this result in a signal delivery?
> > >>
> > >>In the use case described, the database application does not need any event
> > >>for  hole punching. Basically, just a signal for any invalid access to
> > >>mapped area over holes in the file.
> > >OK, but it would be better to think that through for other potential
> > >usecases so that this doesn't end up as a single hugetlb feature. E.g.
> > >what should happen if a regular anonymous memory gets swapped out?
> > >Should we deliver signal as well? How does userspace tell whether this
> > >was a no backing page from unavailable backing page?
> > 
> > This may not be useful in all cases. Potentially, it could be used
> > together with mlock() on anonymous memory to ensure any access
> > to memory that is not locked is caught, again for robustness
> > purposes.
> 
> The thing I wanted to point out is that not only should this not be a
> single usecase thing (I believe others will pop up as well - see below)
> but it should also be well defined, as this is a user visible API. Please
> try to write a patch to the userfaultfd man page to clarify the exact
> semantics. This should help the further discussion.
> 
> As an aside, I remember that prior to MADV_FREE there was a long
> discussion about lazy freeing of memory from userspace. Some users
> wanted to be signalled when their memory was freed by the system so that
> they could rebuild the original content (e.g. uncompressed images in
> memory). It seems like MADV_FREE + this signalling could be used for
> that usecase. John would surely know more about those usecases.

Agreed, that would provide an API equivalent to the one volatile pages
provided. So it would allow code (if any?) to be adapted more easily,
dropping the duplicate feature in the volatile pages code (however, it
would be faster if the userland code using the volatile pages lazy
reclaim mode were converted to poll the uffd, so the kernel talks
directly to the monitor without involving a SIGBUS signal handler,
which will cause spurious enter/exit compared to the signal-less uffd
API).

The main benefit in my view is not volatile pages but that
UFFD_FEATURE_SIGBUS would work equally well to enforce robustness on
all kinds of memory, not only hugetlbfs (so one could run the database
with robustness on THP over tmpfs), and the new cache can be injected
into the filesystem using UFFDIO_COPY, which is likely faster than
fallocate, as UFFDIO_COPY was already demonstrated to be faster even
than a regular page fault.

It's also simpler to handle backwards compatibility with the
UFFDIO_API call, which allows probing whether UFFD_FEATURE_SIGBUS is
supported by the running kernel regardless of kernel version (so it
can be backported and enabled by the database, without the database
noticing it's on an older kernel version).
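
As a rough illustration of the probing described above, here is a minimal
userspace sketch. It assumes the feature bit is exposed to userland under
the name UFFD_FEATURE_SIGBUS in <linux/userfaultfd.h>, as proposed in this
thread; the function name uffd_probe_sigbus() is hypothetical. It performs
the UFFDIO_API handshake with no features requested and checks which
feature bits the kernel reports back.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if the running kernel advertises
 * UFFD_FEATURE_SIGBUS, 0 if it does not (or the installed headers do
 * not know the flag), and -1 if userfaultfd itself is unavailable
 * (old kernel, seccomp filter, or restricted by sysctl).
 */
int uffd_probe_sigbus(void)
{
#ifndef UFFD_FEATURE_SIGBUS
	/* Assumption: headers without the flag mean we cannot request it. */
	return 0;
#else
	struct uffdio_api api = { .api = UFFD_API, .features = 0 };
	int fd = (int)syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);

	if (fd < 0)
		return -1;

	/* With features == 0 the kernel fills in everything it supports. */
	if (ioctl(fd, UFFDIO_API, &api) < 0) {
		close(fd);
		return -1;
	}
	close(fd);

	return (api.features & UFFD_FEATURE_SIGBUS) ? 1 : 0;
#endif
}
```

An application would call this once at startup and, if it returns 1, open
a fresh userfaultfd and repeat the UFFDIO_API handshake with
UFFD_FEATURE_SIGBUS set in the features field before registering ranges.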

So while this wasn't the intended way to use the userfault and I
already pointed out the possibility to use a single monitor to do all
this, I'm positive about UFFD_FEATURE_SIGBUS if the overhead of having
a monitor is so concerning.

Ultimately there are many pros and just a single con: the branch in
handle_userfault().

I wonder if it would be possible to use static_branch_enable() in
UFFDIO_API and static_branch_unlikely() in handle_userfault() to
eliminate that branch, but perhaps it's overkill: UFFDIO_API is
unprivileged and it would send an IPI to all CPUs. I don't think we
normally expose static_branch_enable() to unprivileged userland,
and making UFFD_FEATURE_SIGBUS a privileged op doesn't sound
attractive (although the alternative of altering a hugetlbfs mount
option would be a privileged op).

Thanks,
Andrea

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH] userfaultfd: Add feature to request for a signal delivery
       [not found]                       ` <20170630130813.GA5738-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2017-07-01  0:55                         ` prakash sangappa
       [not found]                           ` <5956F2EC.1000805-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: prakash sangappa @ 2017-07-01  0:55 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Michal Hocko, Mike Rapoport, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Mike Kravetz, Dave Hansen,
	Christoph Hellwig, linux-api-u79uwXL29TY76Z2rM5mHXA, John Stultz


On 6/30/2017 6:08 AM, Andrea Arcangeli wrote:
> On Fri, Jun 30, 2017 at 11:47:35AM +0200, Michal Hocko wrote:
[...]
>> As an aside, I remember that prior to MADV_FREE there was a long
>> discussion about lazy freeing of memory from userspace. Some users
>> wanted to be signalled when their memory was freed by the system so that
>> they could rebuild the original content (e.g. uncompressed images in
>> memory). It seems like MADV_FREE + this signalling could be used for
>> that usecase. John would surely know more about those usecases.
> Agreed, that would provide an API equivalent to the one volatile pages
> provided. So it would allow code (if any?) to be adapted more easily,
> dropping the duplicate feature in the volatile pages code (however, it
> would be faster if the userland code using the volatile pages lazy
> reclaim mode were converted to poll the uffd, so the kernel talks
> directly to the monitor without involving a SIGBUS signal handler,
> which will cause spurious enter/exit compared to the signal-less uffd
> API).
>
> The main benefit in my view is not volatile pages but that
> UFFD_FEATURE_SIGBUS would work equally well to enforce robustness on
> all kinds of memory, not only hugetlbfs (so one could run the database
> with robustness on THP over tmpfs) and the new cache can be injected
> in the filesystem using UFFDIO_COPY which is likely faster than
> fallocate as UFFDIO_COPY was already demonstrated to be faster even
> than a regular page fault.

Interesting that UFFDIO_COPY is faster than fallocate(). In the DB use
case the page does not need to be allocated at the time a process trips
on the hugetlbfs file hole and receives SIGBUS. fallocate() is called on
the hugetlbfs file by a separate process when more memory needs to be
allocated.

> It's also simpler to handle backwards compatibility with the
> UFFDIO_API call, that allows probing if UFFD_FEATURE_SIGBUS is
> supported by the running kernel regardless of kernel version (so it
> can be backported and enabled by the database, without the database
> noticing it's on an older kernel version).

Yes, this is useful, as this change will need to be backported.

> So while this wasn't the intended way to use the userfault and I
> already pointed out the possibility to use a single monitor to do all
> this, I'm positive about UFFD_FEATURE_SIGBUS if the overhead of having
> a monitor is so concerning.
>
> Ultimately there are many pros and just a single con: the branch in
> handle_userfault().
>
> I wonder if it would be possible to use static_branch_enable() in
> UFFDIO_API and static_branch_unlikely in handle_userfault() to
> eliminate that branch but perhaps it's overkill and UFFDIO_API is
> unprivileged and it would send an IPI to all CPUs. I don't think we
> normally expose the static_branch_enable() to unprivileged userland
> and making UFFD_FEATURE_SIGBUS a privileged op doesn't sound
> attractive (although the alternative of altering a hugetlbfs mount
> option would be a privileged op).

Regarding the hugetlbfs mount option, one consideration is to allow mounts
of hugetlbfs inside a user namespace's mount namespace, which would allow
non-privileged processes to mount hugetlbfs for use inside a user
namespace. This may be needed even for the 'min_size' mount option, with
which an application could reserve huge pages and mount a filesystem for
its own use without needing privileges, given the system has enough
hugepages configured. It seems that if non-privileged processes are
allowed to mount a hugetlbfs filesystem, then min_size should be subject
to some resource limits.

Mounting inside user namespace will be a different patch proposal later.


>
> Thanks,
> Andrea

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH] userfaultfd: Add feature to request for a signal delivery
       [not found]                           ` <5956F2EC.1000805-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2017-07-04 16:40                             ` Andrea Arcangeli
  2017-07-05 22:24                               ` prakash.sangappa
  0 siblings, 1 reply; 13+ messages in thread
From: Andrea Arcangeli @ 2017-07-04 16:40 UTC (permalink / raw)
  To: prakash sangappa
  Cc: Michal Hocko, Mike Rapoport, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Mike Kravetz, Dave Hansen,
	Christoph Hellwig, linux-api-u79uwXL29TY76Z2rM5mHXA, John Stultz

On Fri, Jun 30, 2017 at 05:55:08PM -0700, prakash sangappa wrote:
> Interesting that UFFDIO_COPY is faster than fallocate(). In the DB use
> case the page does not need to be allocated at the time a process trips
> on the hugetlbfs file hole and receives SIGBUS. fallocate() is called on
> the hugetlbfs file by a separate process when more memory needs to be
> allocated.

The major difference is that with UFFDIO_COPY the hugepage will be
immediately mapped into the virtual address without requiring any
further minor fault. So it's ideal if you could arrange to call
UFFDIO_COPY from the same process that is going to touch and use the
hugetlbfs data immediately after. You would eliminate a minor fault
that way.

UFFDIO_COPY at least for anon was measured to perform better than a
regular page fault too.
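
To make the point about avoiding the extra minor fault concrete, here is a
minimal sketch using anonymous memory rather than hugetlbfs, purely to keep
it small. The function name uffd_copy_demo() and the 0x5a fill pattern are
illustrative assumptions. It registers one page for missing-page tracking,
installs it with UFFDIO_COPY, and then reads it directly; since the page is
already mapped, the read should not raise a userfault.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical demo: returns 0 if the page was installed and read back
 * correctly, 1 if the contents did not match, and -1 if userfaultfd is
 * not usable in this environment or a setup step failed.
 */
int uffd_copy_demo(void)
{
	long psz = sysconf(_SC_PAGESIZE);
	struct uffdio_api api = { .api = UFFD_API, .features = 0 };
	struct uffdio_register reg;
	struct uffdio_copy copy;
	char *dst = MAP_FAILED, *src = MAP_FAILED;
	int ret = -1;
	int fd = (int)syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);

	if (fd < 0)
		return -1;
	if (ioctl(fd, UFFDIO_API, &api) < 0)
		goto out;

	/* dst stays untouched so its page is still missing; src holds data. */
	dst = mmap(NULL, (size_t)psz, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	src = mmap(NULL, (size_t)psz, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (dst == MAP_FAILED || src == MAP_FAILED)
		goto out;
	memset(src, 0x5a, (size_t)psz);

	memset(&reg, 0, sizeof(reg));
	reg.range.start = (unsigned long)dst;
	reg.range.len = (unsigned long)psz;
	reg.mode = UFFDIO_REGISTER_MODE_MISSING;
	if (ioctl(fd, UFFDIO_REGISTER, &reg) < 0)
		goto out;

	memset(&copy, 0, sizeof(copy));
	copy.dst = (unsigned long)dst;
	copy.src = (unsigned long)src;
	copy.len = (unsigned long)psz;
	if (ioctl(fd, UFFDIO_COPY, &copy) < 0 || copy.copy != (long long)psz)
		goto out;

	/* The page is mapped now, so this read does not hit the uffd. */
	ret = (dst[0] == 0x5a && dst[psz - 1] == 0x5a) ? 0 : 1;
out:
	if (dst != MAP_FAILED)
		munmap(dst, (size_t)psz);
	if (src != MAP_FAILED)
		munmap(src, (size_t)psz);
	close(fd);
	return ret;
}
```

Note that dst is only read after UFFDIO_COPY succeeds; touching it earlier
would block this thread, since nothing in the sketch services the fault.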

> Regarding the hugetlbfs mount option, one consideration is to allow mounts
> of hugetlbfs inside a user namespace's mount namespace, which would allow
> non-privileged processes to mount hugetlbfs for use inside a user
> namespace. This may be needed even for the 'min_size' mount option, with
> which an application could reserve huge pages and mount a filesystem for
> its own use without needing privileges, given the system has enough
> hugepages configured. It seems that if non-privileged processes are
> allowed to mount a hugetlbfs filesystem, then min_size should be subject
> to some resource limits.
> 
> Mounting inside user namespace will be a different patch proposal later.

There's no particular reason to make UFFD_FEATURE_SIGBUS a
privileged op unless we want to eliminate the branch with the static
key, so it's certainly simpler than dealing with hugetlbfs min_size
reserves.

I'm positive about the UFFD_FEATURE_SIGBUS tradeoffs, but others
feel free to comment.

If you could make a second patch to extend the selftest to exercise and
validate UFFD_FEATURE_SIGBUS in anon/shmem/hugetlbfs it'd be great.

Thanks,
Andrea

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH] userfaultfd: Add feature to request for a signal delivery
       [not found]                     ` <20170630094718.GE22917-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
@ 2017-07-05 18:41                       ` John Stultz
  0 siblings, 0 replies; 13+ messages in thread
From: John Stultz @ 2017-07-05 18:41 UTC (permalink / raw)
  To: Michal Hocko
  Cc: prakash.sangappa, Mike Rapoport, lkml,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Andrea Arcangeli, Mike Kravetz,
	Dave Hansen, Christoph Hellwig, Linux API

On Fri, Jun 30, 2017 at 2:47 AM, Michal Hocko <mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> [CC John, the thread started
> http://lkml.kernel.org/r/9363561f-a9cd-7ab6-9c11-ab9a99dc89f1@oracle.com]
>
> On Thu 29-06-17 14:41:22, prakash.sangappa wrote:
>>
>>
>> On 06/29/2017 01:09 AM, Michal Hocko wrote:
>> >On Wed 28-06-17 11:23:32, Prakash Sangappa wrote:
>> >>
>> >>On 6/28/17 6:18 AM, Mike Rapoport wrote:
>> >[...]
>> >>>I've just been thinking that maybe it would be possible to use
>> >>>UFFD_EVENT_REMOVE for this case. We anyway need to implement the generation
>> >>>of UFFD_EVENT_REMOVE for the case of hole punching in hugetlbfs for
>> >>>non-cooperative userfaultfd. It could be that it will solve your issue as
>> >>>well.
>> >>>
>> >>Will this result in a signal delivery?
>> >>
>> >>In the use case described, the database application does not need any event
>> >>for  hole punching. Basically, just a signal for any invalid access to
>> >>mapped area over holes in the file.
>> >OK, but it would be better to think that through for other potential
>> >usecases so that this doesn't end up as a single hugetlb feature. E.g.
>> >what should happen if a regular anonymous memory gets swapped out?
>> >Should we deliver signal as well? How does userspace tell whether this
>> >was a no backing page from unavailable backing page?
>>
>> This may not be useful in all cases. Potentially, it could be used
>> together with mlock() on anonymous memory to ensure any access
>> to memory that is not locked is caught, again for robustness
>> purposes.
>
> The thing I wanted to point out is that not only should this not be a
> single usecase thing (I believe others will pop up as well - see below)
> but it should also be well defined, as this is a user visible API. Please
> try to write a patch to the userfaultfd man page to clarify the exact
> semantics. This should help the further discussion.
>
> As an aside, I remember that prior to MADV_FREE there was a long
> discussion about lazy freeing of memory from userspace. Some users
> wanted to be signalled when their memory was freed by the system so that
> they could rebuild the original content (e.g. uncompressed images in
> memory). It seems like MADV_FREE + this signalling could be used for
> that usecase. John would surely know more about those usecases.

Sorry for being slow to reply here. The main usecase for Android is
explicit marking and unmarking of volatile pages, where userspace
is notified if any pages were purged when it sets a page range
non-volatile, and no accesses to volatile pages are made before they
are marked non-volatile.

As part of my generalization of the API, there were other users
interested in marking pages volatile and then optimistically
using the pages w/o marking them non-volatile. Only when the user
touched a purged volatile page would they get a signal, which they
could handle to mark the pages non-volatile and re-generate the data.

This second use case seems like it would be potentially doable with
the userfaultfd interface, but I'm not sure I see how we could fit the
first use case (which Android's ashmem provides) with it (at least in
an efficient way).
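
The second pattern above (touch optimistically, recover in a signal handler
when the page turns out to be gone) can be sketched without any new kernel
feature. This is only a stand-in under stated assumptions: it uses a read
from a shared mapping of an empty regular file, which raises SIGBUS on
Linux, in place of a purged volatile page or a userfaultfd-delivered
signal. The names touch_unbacked_page() and on_sigbus() and the /tmp path
are illustrative.

```c
#define _GNU_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf sigbus_env;

/* Jump back to the point of the optimistic access. */
static void on_sigbus(int sig)
{
	(void)sig;
	siglongjmp(sigbus_env, 1);
}

/*
 * Hypothetical demo: returns 1 if touching the unbacked page raised
 * SIGBUS and we recovered, 0 if the access succeeded, -1 on setup
 * failure.
 */
int touch_unbacked_page(void)
{
	char path[] = "/tmp/sigbus-sketch-XXXXXX";
	struct sigaction sa, old;
	long psz = sysconf(_SC_PAGESIZE);
	volatile char *p;
	int ret;
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);

	/* The file is empty, so no page backs this mapping. */
	p = mmap(NULL, (size_t)psz, PROT_READ, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		close(fd);
		return -1;
	}

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigbus;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGBUS, &sa, &old) < 0) {
		munmap((void *)p, (size_t)psz);
		close(fd);
		return -1;
	}

	if (sigsetjmp(sigbus_env, 1) == 0) {
		(void)p[0];	/* optimistic access */
		ret = 0;
	} else {
		/* A real user would re-generate the data here. */
		ret = 1;
	}

	sigaction(SIGBUS, &old, NULL);
	munmap((void *)p, (size_t)psz);
	close(fd);
	return ret;
}
```

Each such recovery costs a signal delivery and return, which is the
overhead a polling monitor thread would avoid.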

thanks
-john

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH] userfaultfd: Add feature to request for a signal delivery
  2017-07-04 16:40                             ` Andrea Arcangeli
@ 2017-07-05 22:24                               ` prakash.sangappa
  0 siblings, 0 replies; 13+ messages in thread
From: prakash.sangappa @ 2017-07-05 22:24 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Michal Hocko, Mike Rapoport, linux-kernel, linux-mm,
	Mike Kravetz, Dave Hansen, Christoph Hellwig, linux-api,
	John Stultz



On 07/04/2017 09:40 AM, Andrea Arcangeli wrote:
> On Fri, Jun 30, 2017 at 05:55:08PM -0700, prakash sangappa wrote:
>> Interesting that UFFDIO_COPY is faster than fallocate(). In the DB use
>> case the page does not need to be allocated at the time a process trips
>> on the hugetlbfs file hole and receives SIGBUS. fallocate() is called on
>> the hugetlbfs file by a separate process when more memory needs to be
>> allocated.
> The major difference is that with UFFDIO_COPY the hugepage will be
> immediately mapped into the virtual address without requiring any
> further minor fault. So it's ideal if you could arrange to call
> UFFDIO_COPY from the same process that is going to touch and use the
> hugetlbfs data immediately after. You would eliminate a minor fault
> that way.

Ok, we will see how it could be used in the DB use case.

>
> UFFDIO_COPY at least for anon was measured to perform better than a
> regular page fault too.
>
>> Regarding the hugetlbfs mount option, one consideration is to allow mounts
>> of hugetlbfs inside a user namespace's mount namespace, which would allow
>> non-privileged processes to mount hugetlbfs for use inside a user
>> namespace. This may be needed even for the 'min_size' mount option, with
>> which an application could reserve huge pages and mount a filesystem for
>> its own use without needing privileges, given the system has enough
>> hugepages configured. It seems that if non-privileged processes are
>> allowed to mount a hugetlbfs filesystem, then min_size should be subject
>> to some resource limits.
>>
>> Mounting inside user namespace will be a different patch proposal later.
> There's no particular reason to make UFFD_FEATURE_SIGBUS a
> privileged op unless we want to eliminate the branch with the static
> key, so it's certainly simpler than dealing with hugetlbfs min_size
> reserves.

OK, so for now I will not make UFFD_FEATURE_SIGBUS a privileged op
and will not use the static key to eliminate the branch.


> I'm positive about the UFFD_FEATURE_SIGBUS tradeoffs, but others
> feel free to comment.
>
> If you could make a second patch to extend the selftest to exercise and
> validate UFFD_FEATURE_SIGBUS in anon/shmem/hugetlbfs it'd be great.


Sure, I will update the tests and send a patch.

Thanks,
-Prakash.


>
> Thanks,
> Andrea

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2017-07-05 22:24 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <9363561f-a9cd-7ab6-9c11-ab9a99dc89f1@oracle.com>
     [not found] ` <20170627070643.GA28078@dhcp22.suse.cz>
     [not found]   ` <20170627153557.GB10091@rapoport-lnx>
2017-06-27 16:01     ` [RFC PATCH] userfaultfd: Add feature to request for a signal delivery Prakash Sangappa
     [not found]       ` <51508e99-d2dd-894f-8d8a-678e3747c1ee-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2017-06-28 13:18         ` Mike Rapoport
2017-06-28 18:23           ` Prakash Sangappa
2017-06-29  8:09             ` Michal Hocko
2017-06-29 21:41               ` prakash.sangappa
     [not found]                 ` <936bde7b-1913-5589-22f4-9bbfdb6a8dd5-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2017-06-30  9:47                   ` Michal Hocko
2017-06-30 13:08                     ` Andrea Arcangeli
     [not found]                       ` <20170630130813.GA5738-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-07-01  0:55                         ` prakash sangappa
     [not found]                           ` <5956F2EC.1000805-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2017-07-04 16:40                             ` Andrea Arcangeli
2017-07-05 22:24                               ` prakash.sangappa
     [not found]                     ` <20170630094718.GE22917-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2017-07-05 18:41                       ` John Stultz
2017-06-29 10:46             ` Mike Rapoport
2017-06-29 21:49               ` prakash.sangappa
