All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrea Arcangeli <aarcange@redhat.com>
To: Pavel Emelyanov <xemul@parallels.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux MM <linux-mm@kvack.org>,
	Linux API <linux-api@vger.kernel.org>,
	Sanidhya Kashyap <sanidhya.gatech@gmail.com>
Subject: Re: [PATCH 2/3] uffd: Introduce the v2 API
Date: Mon, 27 Apr 2015 23:12:37 +0200	[thread overview]
Message-ID: <20150427211236.GB24035@redhat.com> (raw)
In-Reply-To: <55389133.8070701@parallels.com>

Hello,

On Thu, Apr 23, 2015 at 09:29:07AM +0300, Pavel Emelyanov wrote:
> So your proposal is to always report 16 bytes per PF from read() and
> let userspace decide itself how to handle the result?

Reading 16bytes for each userfault (instead of 8) and sharing the same
read(2) protocol (UFFD_API) for both the cooperative and
non-cooperative usages, is something I just suggested for
consideration after reading your patchset.

The pros of using a single protocol for both is that it would reduce
amount of code and there would be just one file operation for the
.read method. The cons is that it will waste 8 bytes per userfault in
terms of memory footprint. The other major cons is that it would force
us to define the format of the non cooperative protocol now despite it's
not fully finished yet.

I'm also ok with two protocols if nobody else objects, but if we use
two protocols, we should at least use different file operation methods
and use __always_inline with constants passed as parameter to optimize
away the branches at build time. This way we get the reduced memory
footprint in the read syscall without other runtime overhead
associated with it.

> >> +struct uffd_v2_msg {
> >> +	__u64	type;
> >> +	__u64	arg;
> >> +};
> >> +
> >> +#define UFFD_PAGEFAULT	0x1
> >> +
> >> +#define UFFD_PAGEFAULT_BIT	(1 << (UFFD_PAGEFAULT - 1))
> >> +#define __UFFD_API_V2_BITS	(UFFD_PAGEFAULT_BIT)
> >> +
> >> +/*
> >> + * Lower PAGE_SHIFT bits are used to report those supported
> >> + * by the pagefault message itself. Other bits are used to
> >> + * report the message types v2 API supports
> >> + */
> >> +#define UFFD_API_V2_BITS	(__UFFD_API_V2_BITS << 12)
> >> +
> > 
> > And why exactly is this 12 hardcoded?
> 
> Ah, it should have been the PAGE_SHIFT one, but I was unsure whether it
> would be OK to have different shifts in different arches.
> 
> But taking into account your comment that bits field id bad for these
> values, if we introduce the new .features one for api message, then this
> 12 will just go away.

Ok.

> > And which field should be masked
> > with the bits? In the V1 protocol it was the "arg" (userfault address)
> > not the "type". So this is a bit confusing and probably requires
> > simplification.
> 
> I see. Actually I decided that since bits higher than 12th (for x86) is
> always 0 in api message (no bits allowed there, since pfn sits in this
> place), it would be OK to put non-PF bits there.

That was ok yes.

> Should I better introduce another .features field in uffd API message?

What about renaming "uffdio_api.bits" to "uffdio_api.features"?

And then we set uffdio_api.features to
UFFD_FEATURE_WRITE|UFFD_FEATURE_WP|UFFD_FEATURE_FORK as needed.

UFFD_FEATURE_WRITE would always be enabled, it's there only in case we
want to disable it later (mostly if some arch has trouble with it,
which is unlikely, but qemu doesn't need that bit of information at
all for example so qemu would be fine if UFFD_FEATURE_WRITE
disappears).

UFFD_FEATURE_WP would signal also that the wrprotection feature (not
implemented yet) is available (then later the register ioctl would
also show the new wrprotection ioctl numbers available to mangle the
wrprotection). The UFFD_FEATURE_WP feature in the cooperative usage
(qemu live snapshotting) can use the UFFD_API first protocol too.

UFFD_FEATURE_FORK would be returned if the UFFD_API_V2 was set in
uffdio.api, and it would be part of the incremental non-cooperative
patchset.

We could also not define "UFFD_FEATURE_FORK" at all and imply that
fork/mremap/MADV_DONTNEED are all available if UFFD_API_V2 uffdio_api
ioctl succeeds... That's only doable if we keep two different read
protocols though. UFFD_FEATURE_FORK (or UFFD_FEATURE_NON_COOPERATIVE)
are really strictly needed only if we share the same read(2) protocol
for both the cooperative and non-cooperative usages.

The idea is that there's not huge benefit of only having the "fork"
feature supported but missing "mremap" and "madv_dontneed".

In fact if a new syscall that works like mremap is added later (call
it mremap2), we would need to fail the UFFDIO_API_V2 and require a
UFFDIO_API_V3 for such kernel that can return a new mremap2 type of
event. Userland couldn't just assume it is ok to use postcopy live
migration for containers, because
UFFD_FEATURE_FORK|MREMAP|MADV_DONTNEED are present in the
uffdio.features when it asked for API_V2. There shall be something
that tells userland "hey there's a new mremap2 that the software
inside the container can run on top of this kernel, so you are going
to get a new mremap2 type of userfault event too".

In any case, regardless of how we solve the above,
"uffdio_api.features" sounds better than ".bits".

If we retain two different UFFD_API, we'll be able to freeze the
current one and decide later if
UFFD_FEATURE_FORK|UFFD_FEATURE_MREMAP|UFFD_FEATURE_MADV_DONTNEED shall
be added to the .features, or if to rely on UFFD_API_V2 succeeding to
let userland know that the non-cooperative usage is fully supported by
the kernel.

Not having to freeze these details now is the main benefit of having
two different UFFD_API after all...

WARNING: multiple messages have this Message-ID (diff)
From: Andrea Arcangeli <aarcange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: Pavel Emelyanov <xemul-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
Cc: Linux Kernel Mailing List
	<linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	Linux MM <linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org>,
	Linux API <linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	Sanidhya Kashyap
	<sanidhya.gatech-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Subject: Re: [PATCH 2/3] uffd: Introduce the v2 API
Date: Mon, 27 Apr 2015 23:12:37 +0200	[thread overview]
Message-ID: <20150427211236.GB24035@redhat.com> (raw)
In-Reply-To: <55389133.8070701-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>

Hello,

On Thu, Apr 23, 2015 at 09:29:07AM +0300, Pavel Emelyanov wrote:
> So your proposal is to always report 16 bytes per PF from read() and
> let userspace decide itself how to handle the result?

Reading 16bytes for each userfault (instead of 8) and sharing the same
read(2) protocol (UFFD_API) for both the cooperative and
non-cooperative usages, is something I just suggested for
consideration after reading your patchset.

The pros of using a single protocol for both is that it would reduce
amount of code and there would be just one file operation for the
.read method. The cons is that it will waste 8 bytes per userfault in
terms of memory footprint. The other major cons is that it would force
us to define the format of the non cooperative protocol now despite it's
not fully finished yet.

I'm also ok with two protocols if nobody else objects, but if we use
two protocols, we should at least use different file operation methods
and use __always_inline with constants passed as parameter to optimize
away the branches at build time. This way we get the reduced memory
footprint in the read syscall without other runtime overhead
associated with it.

> >> +struct uffd_v2_msg {
> >> +	__u64	type;
> >> +	__u64	arg;
> >> +};
> >> +
> >> +#define UFFD_PAGEFAULT	0x1
> >> +
> >> +#define UFFD_PAGEFAULT_BIT	(1 << (UFFD_PAGEFAULT - 1))
> >> +#define __UFFD_API_V2_BITS	(UFFD_PAGEFAULT_BIT)
> >> +
> >> +/*
> >> + * Lower PAGE_SHIFT bits are used to report those supported
> >> + * by the pagefault message itself. Other bits are used to
> >> + * report the message types v2 API supports
> >> + */
> >> +#define UFFD_API_V2_BITS	(__UFFD_API_V2_BITS << 12)
> >> +
> > 
> > And why exactly is this 12 hardcoded?
> 
> Ah, it should have been the PAGE_SHIFT one, but I was unsure whether it
> would be OK to have different shifts in different arches.
> 
> But taking into account your comment that bits field id bad for these
> values, if we introduce the new .features one for api message, then this
> 12 will just go away.

Ok.

> > And which field should be masked
> > with the bits? In the V1 protocol it was the "arg" (userfault address)
> > not the "type". So this is a bit confusing and probably requires
> > simplification.
> 
> I see. Actually I decided that since bits higher than 12th (for x86) is
> always 0 in api message (no bits allowed there, since pfn sits in this
> place), it would be OK to put non-PF bits there.

That was ok yes.

> Should I better introduce another .features field in uffd API message?

What about renaming "uffdio_api.bits" to "uffdio_api.features"?

And then we set uffdio_api.features to
UFFD_FEATURE_WRITE|UFFD_FEATURE_WP|UFFD_FEATURE_FORK as needed.

UFFD_FEATURE_WRITE would always be enabled, it's there only in case we
want to disable it later (mostly if some arch has trouble with it,
which is unlikely, but qemu doesn't need that bit of information at
all for example so qemu would be fine if UFFD_FEATURE_WRITE
disappears).

UFFD_FEATURE_WP would signal also that the wrprotection feature (not
implemented yet) is available (then later the register ioctl would
also show the new wrprotection ioctl numbers available to mangle the
wrprotection). The UFFD_FEATURE_WP feature in the cooperative usage
(qemu live snapshotting) can use the UFFD_API first protocol too.

UFFD_FEATURE_FORK would be returned if the UFFD_API_V2 was set in
uffdio.api, and it would be part of the incremental non-cooperative
patchset.

We could also not define "UFFD_FEATURE_FORK" at all and imply that
fork/mremap/MADV_DONTNEED are all available if UFFD_API_V2 uffdio_api
ioctl succeeds... That's only doable if we keep two different read
protocols though. UFFD_FEATURE_FORK (or UFFD_FEATURE_NON_COOPERATIVE)
are really strictly needed only if we share the same read(2) protocol
for both the cooperative and non-cooperative usages.

The idea is that there's not huge benefit of only having the "fork"
feature supported but missing "mremap" and "madv_dontneed".

In fact if a new syscall that works like mremap is added later (call
it mremap2), we would need to fail the UFFDIO_API_V2 and require a
UFFDIO_API_V3 for such kernel that can return a new mremap2 type of
event. Userland couldn't just assume it is ok to use postcopy live
migration for containers, because
UFFD_FEATURE_FORK|MREMAP|MADV_DONTNEED are present in the
uffdio.features when it asked for API_V2. There shall be something
that tells userland "hey there's a new mremap2 that the software
inside the container can run on top of this kernel, so you are going
to get a new mremap2 type of userfault event too".

In any case, regardless of how we solve the above,
"uffdio_api.features" sounds better than ".bits".

If we retain two different UFFD_API, we'll be able to freeze the
current one and decide later if
UFFD_FEATURE_FORK|UFFD_FEATURE_MREMAP|UFFD_FEATURE_MADV_DONTNEED shall
be added to the .features, or if to rely on UFFD_API_V2 succeeding to
let userland know that the non-cooperative usage is fully supported by
the kernel.

Not having to freeze these details now is the main benefit of having
two different UFFD_API after all...

WARNING: multiple messages have this Message-ID (diff)
From: Andrea Arcangeli <aarcange@redhat.com>
To: Pavel Emelyanov <xemul@parallels.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux MM <linux-mm@kvack.org>,
	Linux API <linux-api@vger.kernel.org>,
	Sanidhya Kashyap <sanidhya.gatech@gmail.com>
Subject: Re: [PATCH 2/3] uffd: Introduce the v2 API
Date: Mon, 27 Apr 2015 23:12:37 +0200	[thread overview]
Message-ID: <20150427211236.GB24035@redhat.com> (raw)
In-Reply-To: <55389133.8070701@parallels.com>

Hello,

On Thu, Apr 23, 2015 at 09:29:07AM +0300, Pavel Emelyanov wrote:
> So your proposal is to always report 16 bytes per PF from read() and
> let userspace decide itself how to handle the result?

Reading 16bytes for each userfault (instead of 8) and sharing the same
read(2) protocol (UFFD_API) for both the cooperative and
non-cooperative usages, is something I just suggested for
consideration after reading your patchset.

The pros of using a single protocol for both is that it would reduce
amount of code and there would be just one file operation for the
.read method. The cons is that it will waste 8 bytes per userfault in
terms of memory footprint. The other major cons is that it would force
us to define the format of the non cooperative protocol now despite it's
not fully finished yet.

I'm also ok with two protocols if nobody else objects, but if we use
two protocols, we should at least use different file operation methods
and use __always_inline with constants passed as parameter to optimize
away the branches at build time. This way we get the reduced memory
footprint in the read syscall without other runtime overhead
associated with it.

> >> +struct uffd_v2_msg {
> >> +	__u64	type;
> >> +	__u64	arg;
> >> +};
> >> +
> >> +#define UFFD_PAGEFAULT	0x1
> >> +
> >> +#define UFFD_PAGEFAULT_BIT	(1 << (UFFD_PAGEFAULT - 1))
> >> +#define __UFFD_API_V2_BITS	(UFFD_PAGEFAULT_BIT)
> >> +
> >> +/*
> >> + * Lower PAGE_SHIFT bits are used to report those supported
> >> + * by the pagefault message itself. Other bits are used to
> >> + * report the message types v2 API supports
> >> + */
> >> +#define UFFD_API_V2_BITS	(__UFFD_API_V2_BITS << 12)
> >> +
> > 
> > And why exactly is this 12 hardcoded?
> 
> Ah, it should have been the PAGE_SHIFT one, but I was unsure whether it
> would be OK to have different shifts in different arches.
> 
> But taking into account your comment that bits field id bad for these
> values, if we introduce the new .features one for api message, then this
> 12 will just go away.

Ok.

> > And which field should be masked
> > with the bits? In the V1 protocol it was the "arg" (userfault address)
> > not the "type". So this is a bit confusing and probably requires
> > simplification.
> 
> I see. Actually I decided that since bits higher than 12th (for x86) is
> always 0 in api message (no bits allowed there, since pfn sits in this
> place), it would be OK to put non-PF bits there.

That was ok yes.

> Should I better introduce another .features field in uffd API message?

What about renaming "uffdio_api.bits" to "uffdio_api.features"?

And then we set uffdio_api.features to
UFFD_FEATURE_WRITE|UFFD_FEATURE_WP|UFFD_FEATURE_FORK as needed.

UFFD_FEATURE_WRITE would always be enabled, it's there only in case we
want to disable it later (mostly if some arch has trouble with it,
which is unlikely, but qemu doesn't need that bit of information at
all for example so qemu would be fine if UFFD_FEATURE_WRITE
disappears).

UFFD_FEATURE_WP would signal also that the wrprotection feature (not
implemented yet) is available (then later the register ioctl would
also show the new wrprotection ioctl numbers available to mangle the
wrprotection). The UFFD_FEATURE_WP feature in the cooperative usage
(qemu live snapshotting) can use the UFFD_API first protocol too.

UFFD_FEATURE_FORK would be returned if the UFFD_API_V2 was set in
uffdio.api, and it would be part of the incremental non-cooperative
patchset.

We could also not define "UFFD_FEATURE_FORK" at all and imply that
fork/mremap/MADV_DONTNEED are all available if UFFD_API_V2 uffdio_api
ioctl succeeds... That's only doable if we keep two different read
protocols though. UFFD_FEATURE_FORK (or UFFD_FEATURE_NON_COOPERATIVE)
are really strictly needed only if we share the same read(2) protocol
for both the cooperative and non-cooperative usages.

The idea is that there's not huge benefit of only having the "fork"
feature supported but missing "mremap" and "madv_dontneed".

In fact if a new syscall that works like mremap is added later (call
it mremap2), we would need to fail the UFFDIO_API_V2 and require a
UFFDIO_API_V3 for such kernel that can return a new mremap2 type of
event. Userland couldn't just assume it is ok to use postcopy live
migration for containers, because
UFFD_FEATURE_FORK|MREMAP|MADV_DONTNEED are present in the
uffdio.features when it asked for API_V2. There shall be something
that tells userland "hey there's a new mremap2 that the software
inside the container can run on top of this kernel, so you are going
to get a new mremap2 type of userfault event too".

In any case, regardless of how we solve the above,
"uffdio_api.features" sounds better than ".bits".

If we retain two different UFFD_API, we'll be able to freeze the
current one and decide later if
UFFD_FEATURE_FORK|UFFD_FEATURE_MREMAP|UFFD_FEATURE_MADV_DONTNEED shall
be added to the .features, or if to rely on UFFD_API_V2 succeeding to
let userland know that the non-cooperative usage is fully supported by
the kernel.

Not having to freeze these details now is the main benefit of having
two different UFFD_API after all...

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2015-04-27 21:12 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-03-18 19:34 [PATCH 0/3] UserfaultFD: Extension for non cooperative uffd usage Pavel Emelyanov
2015-03-18 19:34 ` Pavel Emelyanov
2015-03-18 19:34 ` Pavel Emelyanov
2015-03-18 19:34 ` [PATCH 1/3] uffd: Tossing bits around Pavel Emelyanov
2015-03-18 19:34   ` Pavel Emelyanov
2015-03-18 19:34   ` Pavel Emelyanov
2015-03-18 19:35 ` [PATCH 2/3] uffd: Introduce the v2 API Pavel Emelyanov
2015-03-18 19:35   ` Pavel Emelyanov
2015-03-18 19:35   ` Pavel Emelyanov
2015-04-21 12:18   ` Andrea Arcangeli
2015-04-21 12:18     ` Andrea Arcangeli
2015-04-21 12:18     ` Andrea Arcangeli
2015-04-23  6:29     ` Pavel Emelyanov
2015-04-23  6:29       ` Pavel Emelyanov
2015-04-27 21:12       ` Andrea Arcangeli [this message]
2015-04-27 21:12         ` Andrea Arcangeli
2015-04-27 21:12         ` Andrea Arcangeli
2015-04-30  9:50         ` Pavel Emelyanov
2015-04-30  9:50           ` Pavel Emelyanov
2015-03-18 19:35 ` [PATCH 3/3] uffd: Introduce fork() notification Pavel Emelyanov
2015-03-18 19:35   ` Pavel Emelyanov
2015-03-18 19:35   ` Pavel Emelyanov
2015-04-21 12:02 ` [PATCH 0/3] UserfaultFD: Extension for non cooperative uffd usage Andrea Arcangeli
2015-04-21 12:02   ` Andrea Arcangeli
2015-04-21 12:02   ` Andrea Arcangeli
2015-04-23  6:34   ` Pavel Emelyanov
2015-04-23  6:34     ` Pavel Emelyanov
2015-04-23  6:34     ` Pavel Emelyanov
     [not found]     ` <20150427211650.GC24035@redhat.com>
2015-04-30 16:38       ` [PATCH] UserfaultFD: Rename uffd_api.bits into .features Pavel Emelyanov
2015-05-07 13:42         ` Andrea Arcangeli
2015-05-07 14:28           ` Pavel Emelyanov
2015-05-07 14:33             ` Andrea Arcangeli
2015-05-07 14:42               ` Pavel Emelyanov
2015-05-07 15:11                 ` Andrea Arcangeli
2015-05-07 15:20                   ` Pavel Emelyanov
2015-05-07 17:08                     ` Andrea Arcangeli
2015-05-07 18:35                       ` Pavel Emelyanov
2015-05-08 13:39                       ` Pavel Emelyanov
2015-05-08 14:07                         ` [PATCH] UserfaultFD: Fix stack corruption when zeroing uffd_msg Pavel Emelyanov
2015-05-08 17:54                           ` Andrea Arcangeli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150427211236.GB24035@redhat.com \
    --to=aarcange@redhat.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=sanidhya.gatech@gmail.com \
    --cc=xemul@parallels.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.