Message-ID: <5502F857.6050505@akamai.com>
Date: Fri, 13 Mar 2015 10:46:47 -0400
From: Jason Baron
To: Fam Zheng
CC: linux-kernel@vger.kernel.org, Thomas Gleixner, Ingo Molnar,
 "H. Peter Anvin", x86@kernel.org, Alexander Viro, Andrew Morton,
 Kees Cook, Andy Lutomirski, David Herrmann, Alexei Starovoitov,
 Miklos Szeredi, David Drysdale, Oleg Nesterov, "David S. Miller",
 Vivek Goyal, Mike Frysinger, "Theodore Ts'o", Heiko Carstens,
 Rasmus Villemoes, Rashika Kheria, Hugh Dickins, Mathieu Desnoyers,
 Peter Zijlstra, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org,
 Josh Triplett, "Michael Kerrisk (man-pages)", Paolo Bonzini,
 Omar Sandoval, Jonathan Corbet, shane.seymour@hp.com,
 dan.j.rosenberg@gmail.com
Subject: Re: [PATCH v4 0/9] epoll: Introduce new syscalls, epoll_ctl_batch and epoll_pwait1
References: <1425952155-27603-1-git-send-email-famz@redhat.com>
 <5501AA6B.2020209@akamai.com> <20150313113122.GA7427@ad.nay.redhat.com>
In-Reply-To: <20150313113122.GA7427@ad.nay.redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 03/13/2015 07:31 AM, Fam Zheng wrote:
> On Thu, 03/12 11:02, Jason Baron wrote:
>> On 03/09/2015 09:49 PM, Fam Zheng wrote:
>>
>> Hi,
>>
>> So it sounds like you are comparing original qemu code (which was using
>> ppoll) vs. using epoll with these new syscalls.
>> Curious if you have numbers
>> comparing the existing epoll (with say the timerfd in your epoll set), so
>> we can see the improvement relative to epoll.
> I did compare them, but they are too close to see differences. The improvements
> in epoll_pwait1 don't really help the hot path of guest IO, but they do
> affect the program timer precision that is used in various device emulations
> in QEMU.
>
> Although it's kind of subtle and difficult to summarize here, I can give an
> example in the IO throttling implementation in QEMU, to show the significance:
>
> The throttling algorithm computes a duration for the next IO, which is used to
> arm a timer in order to delay the request a bit. As timers are always rounded
> *UP* to the effective granularity, the timeout being 1ms in epoll_pwait is just
> too coarse and will lead to severe inaccuracy. With epoll_pwait1, we can avoid
> the rounding-up.

Right, but we could use the timerfd here to get the desired precision.

> I think this idea could be pretty generally desired by other applications, too.
>
> Regarding the epoll_ctl_batch improvement, again, it is not going to disrupt
> the numbers in the small workload I managed to test.
>
> Of course, if you have a specific application scenario in mind, I will try it. :)

I want to understand what new functionality these syscalls offer over what we
have now. I mean we could show a micro-benchmark where these matter, but is
that enough to justify these new syscalls, given that I think we could
implement library wrappers around what we have now to do what you are
proposing here?

Thanks,

-Jason
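
For illustration, the timerfd approach mentioned above could look roughly like
the sketch below: a timerfd sits in the epoll set and is armed with a
nanosecond-resolution timeout before each wait, so the 1ms granularity of the
epoll_wait timeout argument never comes into play. The names precise_epoll,
precise_epoll_init and precise_epoll_wait are hypothetical and exist only for
this example; this is not code from the patch series or from QEMU. The sketch
also assumes callers identify their descriptors through data.fd.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/epoll.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical wrapper state: an epoll set plus a timerfd registered in it. */
struct precise_epoll {
	int epfd;
	int tfd;
};

static int precise_epoll_init(struct precise_epoll *pe)
{
	struct epoll_event ev = { .events = EPOLLIN };

	pe->epfd = epoll_create1(EPOLL_CLOEXEC);
	if (pe->epfd < 0)
		return -1;

	pe->tfd = timerfd_create(CLOCK_MONOTONIC, TFD_NONBLOCK | TFD_CLOEXEC);
	if (pe->tfd < 0) {
		close(pe->epfd);
		return -1;
	}

	ev.data.fd = pe->tfd;
	if (epoll_ctl(pe->epfd, EPOLL_CTL_ADD, pe->tfd, &ev) < 0) {
		close(pe->tfd);
		close(pe->epfd);
		return -1;
	}
	return 0;
}

/*
 * Like epoll_wait(), but with a relative struct timespec timeout.
 * Returns the number of caller events, 0 on timeout, -1 on error.
 * Assumption: callers store their fd in data.fd, so the timer's own
 * event can be recognised and filtered out.
 */
static int precise_epoll_wait(struct precise_epoll *pe,
			      struct epoll_event *events, int maxevents,
			      const struct timespec *timeout)
{
	struct itimerspec its = { .it_value = *timeout };
	int is_poll = timeout->tv_sec == 0 && timeout->tv_nsec == 0;
	int i, n, out = 0;

	assert(maxevents > 0);

	/*
	 * Setting the timer also discards any stale expiration from a
	 * previous call; an all-zero it_value simply disarms it.
	 */
	if (timerfd_settime(pe->tfd, 0, &its, NULL) < 0)
		return -1;

	/* Block indefinitely: the timerfd provides the wakeup on timeout. */
	n = epoll_wait(pe->epfd, events, maxevents, is_poll ? 0 : -1);
	if (n < 0)
		return -1;

	/* Drop the timer's event so the caller only sees its own fds. */
	for (i = 0; i < n; i++) {
		if (events[i].data.fd != pe->tfd)
			events[out++] = events[i];
	}
	return out;
}
```

A caller would run precise_epoll_init() once, add its own descriptors to
pe.epfd with epoll_ctl() as usual, and then call precise_epoll_wait() with,
say, a 500us timespec. The cost relative to a single syscall with a timespec
argument is the extra timerfd_settime() call per wait, and the timer can take
up one slot in the events array.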