From: Jason Baron <jbaron@akamai.com>
To: Fam Zheng <famz@redhat.com>
Cc: linux-kernel@vger.kernel.org,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	x86@kernel.org, Alexander Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	Kees Cook <keescook@chromium.org>,
	Andy Lutomirski <luto@amacapital.net>,
	David Herrmann <dh.herrmann@gmail.com>,
	Alexei Starovoitov <ast@plumgrid.com>,
	Miklos Szeredi <mszeredi@suse.cz>,
	David Drysdale <drysdale@google.com>,
	Oleg Nesterov <oleg@redhat.com>,
	"David S. Miller" <davem@davemloft.net>,
	Vivek Goyal <vgoyal@redhat.com>,
	Mike Frysinger <vapier@gentoo.org>,
	"Theodore Ts'o" <tytso@mit.edu>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	Rasmus Villemoes <linux@rasmusvillemoes.dk>,
	Rashika Kheria <rashika.kheria@gmail.com>,
	Hugh Dickins <hughd@google.com>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Peter Zijlstra <peterz@infradead.org>,
	linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org,
	Josh Triplett <josh@joshtriplett.org>,
	"Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Omar Sandoval <osandov@osandov.com>,
	Jonathan Corbet <corbet@lwn.net>,
	shane.seymour@hp.com, dan.j.rosenberg@gmail.com
Subject: Re: [PATCH v4 0/9] epoll: Introduce new syscalls, epoll_ctl_batch and epoll_pwait1
Date: Fri, 13 Mar 2015 10:46:47 -0400
Message-ID: <5502F857.6050505@akamai.com>
In-Reply-To: <20150313113122.GA7427@ad.nay.redhat.com>


On 03/13/2015 07:31 AM, Fam Zheng wrote:
> On Thu, 03/12 11:02, Jason Baron wrote:
>> On 03/09/2015 09:49 PM, Fam Zheng wrote:
>>
>> Hi,
>>
>> So it sounds like you are comparing original qemu code (which was using
>> ppoll) vs. using epoll with these new syscalls. Curious if you have numbers
>> comparing the existing epoll (with say the timerfd in your epoll set), so
>> we can see the improvement relative to epoll.
> I did compare them, but the results are too close to show a difference. The
> improvement in epoll_pwait1 doesn't really help the hot path of guest I/O, but
> it does affect the precision of program timers, which are used in various
> device emulations in QEMU.
>
> Although it's kind of subtle and difficult to summarize here, I can give an
> example in the IO throttling implementation in QEMU, to show the significance:
>
> The throttling algorithm computes a duration for the next IO, which is used to
> arm a timer in order to delay the request a bit. As timers are always rounded
> *UP* to the effective granularity, the timeout being 1ms in epoll_pwait is just
> too coarse and will lead to severe inaccuracy. With epoll_pwait1, we can avoid
> the rounding-up.

Right, but we could use a timerfd here to get the desired precision.

> I think this idea could be pretty generally desired by other applications, too.
>
> Regarding the epoll_ctl_batch improvement, again, it is not going to disrupt
> the numbers in the small workload I managed to test.
>
> Of course, if you have a specific application scenario in mind, I will try it. :)

I want to understand what new functionality these syscalls offer over
what we have now. We could show a micro-benchmark where they matter,
but is that enough to justify new syscalls, given that I think we could
implement library wrappers around the existing interfaces to do what
you are proposing here?
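
As a rough illustration of such a wrapper, the sketch below applies a
list of commands with plain epoll_ctl() calls. The struct ctl_cmd layout
and the name epoll_ctl_batch_emul() are assumptions made for this
example; they are not the interface from the patch series.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/epoll.h>

/* Hypothetical command record for this sketch only. */
struct ctl_cmd {
    int op;                   /* EPOLL_CTL_ADD, EPOLL_CTL_MOD or EPOLL_CTL_DEL */
    int fd;                   /* target file descriptor */
    struct epoll_event event; /* event mask and user data */
    int result;               /* 0 on success, errno value on failure */
};

/*
 * Apply the commands in order and stop at the first failure.
 * Returns the number of commands that succeeded.
 */
static int epoll_ctl_batch_emul(int epfd, struct ctl_cmd *cmds, int ncmds)
{
    int i;

    assert(cmds != NULL || ncmds == 0);

    for (i = 0; i < ncmds; i++) {
        if (epoll_ctl(epfd, cmds[i].op, cmds[i].fd, &cmds[i].event) < 0) {
            cmds[i].result = errno;
            break;
        }
        cmds[i].result = 0;
    }
    return i;
}
```

This still issues one syscall per command, so it gives the same
functionality in userspace but none of the syscall-count savings.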

Thanks,

-Jason

