* [PATCH 2.6.20-rc4 0/4] futexes functionalities and improvements
@ 2007-01-09 16:15 Pierre Peiffer
2007-01-11 17:47 ` Ulrich Drepper
0 siblings, 1 reply; 11+ messages in thread
From: Pierre Peiffer @ 2007-01-09 16:15 UTC (permalink / raw)
To: LKML
Cc: Dinakar Guniguntala, Jean-Pierre Dion, Ingo Molnar,
Ulrich Drepper, Jakub Jelinek, Darren Hart,
Sébastien Dugué
Hi,
Today there are several futex functionalities and improvements included in the
-rt kernel tree which, I think, make sense to have in mainline.
Among them, there are:
* futex use of a prio list: allows threads to be woken in priority order
instead of FIFO order.
* futex_wait use of hrtimer: allows the use of a finer timer resolution.
* futex_requeue_pi functionality: allows the requeue optimisation to be used
with PI-mutexes/PI-futexes.
* futex64 syscall: allows the use of 64-bit futexes instead of 32-bit ones.
The following mails provide the corresponding patches.
Comments, suggestions, feedback, etc are welcome, as usual.
--
Pierre Peiffer
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2.6.20-rc4 0/4] futexes functionalities and improvements
2007-01-09 16:15 [PATCH 2.6.20-rc4 0/4] futexes functionalities and improvements Pierre Peiffer
@ 2007-01-11 17:47 ` Ulrich Drepper
[not found] ` <20070111134615.34902742.akpm@osdl.org>
0 siblings, 1 reply; 11+ messages in thread
From: Ulrich Drepper @ 2007-01-11 17:47 UTC (permalink / raw)
To: Andrew Morton; +Cc: Pierre Peiffer, LKML, Ingo Molnar, Jakub Jelinek
[-- Attachment #1: Type: text/plain, Size: 381 bytes --]
Andrew,
if the patches allow this, I'd like to see parts 2, 3, and 4 in -mm ASAP.
Especially the 64-bit variants are urgently needed. Just hold off on adding
the plist use; I am still not convinced that unconditional use is a good
thing, especially with one single global list.
--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 251 bytes --]
* Re: [PATCH 2.6.20-rc4 0/4] futexes functionalities and improvements
[not found] ` <20070111134615.34902742.akpm@osdl.org>
@ 2007-01-12 7:53 ` Pierre Peiffer
2007-01-12 7:58 ` Ingo Molnar
0 siblings, 1 reply; 11+ messages in thread
From: Pierre Peiffer @ 2007-01-12 7:53 UTC (permalink / raw)
To: LKML; +Cc: Andrew Morton, Ulrich Drepper, Ingo Molnar, Jakub Jelinek
Andrew Morton wrote:
> OK. Unfortunately patches 2-4 don't apply without #1 present and the fix
> is not immediately obvious, so we'll need a respin+retest, please.
Ok, I'll provide updated patches for -mm ASAP.
> On Thu, 11 Jan 2007 09:47:28 -0800
> Ulrich Drepper <drepper@redhat.com> wrote:
>> if the patches allow this, I'd like to see parts 2, 3, and 4 to be in
>> -mm ASAP. Especially the 64-bit variants are urgently needed. Just
>> hold off adding the plist use, I am still not convinced that
>> unconditional use is a good thing, especially with one single global list.
Just to avoid any misunderstanding (I really do understand your point about
the performance issue):

* the problem I mentioned, of several futexes hashing to the same key and thus
having all potential waiters queued on the same list, is _not_ a new problem
introduced by this patch: it already exists today with the simple list.

* measuring performance with pthread_broadcast (and thus with futex_requeue)
is a good choice for exposing the performance impact (well, maybe not a
realistic one when considering real applications (*)), better than threads
doing FUTEX_WAIT/FUTEX_WAKE: what is expensive with plist is the plist_add
operation (which occurs in FUTEX_WAIT), not plist_del (which occurs during
FUTEX_WAKE, so no big impact should be noticed there). Any measurement will be
difficult to do with only FUTEX_WAIT/WAKE.

=> futex_requeue does as many plist_del/plist_add operations as there are
waiting threads (minus 1), and thus has a direct impact on the time needed to
wake everybody (or, to be more precise, to wake the first thread).

(*) I'll try the volano bench, if I have time.
--
Pierre
* Re: [PATCH 2.6.20-rc4 0/4] futexes functionalities and improvements
2007-01-12 7:53 ` Pierre Peiffer
@ 2007-01-12 7:58 ` Ingo Molnar
2007-01-16 8:34 ` Pierre Peiffer
0 siblings, 1 reply; 11+ messages in thread
From: Ingo Molnar @ 2007-01-12 7:58 UTC (permalink / raw)
To: Pierre Peiffer; +Cc: LKML, Andrew Morton, Ulrich Drepper, Jakub Jelinek
[-- Attachment #1: Type: text/plain, Size: 563 bytes --]
* Pierre Peiffer <pierre.peiffer@bull.net> wrote:
> [...] Any measure will be difficult to do with only FUTEX_WAIT/WAKE.
that's not a problem - just do such a measurement and show that it does
/not/ impact performance measurably. That's what we want to know...
> (*) I'll try the volano bench, if I have time.
yeah. As an alternative, it might be a good idea to pthread-ify
hackbench.c - that should replicate the Volano workload pretty
accurately. I've attached hackbench.c. (It's process-based right now, so
it won't trigger contended futex ops.)
Ingo
[-- Attachment #2: hackbench.c --]
[-- Type: text/plain, Size: 5408 bytes --]
/* Test groups of 20 processes spraying to 20 receivers */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <sys/time.h>
#include <sys/poll.h>

#define DATASIZE 100

static unsigned int loops = 100;
static int use_pipes = 0;

static void barf(const char *msg)
{
	fprintf(stderr, "%s (error: %s)\n", msg, strerror(errno));
	exit(1);
}
static void fdpair(int fds[2])
{
	if (use_pipes) {
		if (pipe(fds) == 0)
			return;
	} else {
		if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == 0)
			return;
	}
	barf("Creating fdpair");
}
/* Block until we're ready to go */
static void ready(int ready_out, int wakefd)
{
	char dummy;
	struct pollfd pollfd = { .fd = wakefd, .events = POLLIN };

	/* Tell them we're ready. */
	if (write(ready_out, &dummy, 1) != 1)
		barf("CLIENT: ready write");

	/* Wait for "GO" signal */
	if (poll(&pollfd, 1, -1) != 1)
		barf("poll");
}
/* Sender sprays loops messages down each file descriptor */
static void sender(unsigned int num_fds,
		   int out_fd[num_fds],
		   int ready_out,
		   int wakefd)
{
	char data[DATASIZE];
	unsigned int i, j;

	ready(ready_out, wakefd);

	/* Now pump to every receiver. */
	for (i = 0; i < loops; i++) {
		for (j = 0; j < num_fds; j++) {
			int ret, done = 0;

again:
			ret = write(out_fd[j], data + done, sizeof(data) - done);
			if (ret < 0)
				barf("SENDER: write");
			done += ret;
			if (done < sizeof(data))
				goto again;
		}
	}
}
/* One receiver per fd */
static void receiver(unsigned int num_packets,
		     int in_fd,
		     int ready_out,
		     int wakefd)
{
	unsigned int i;

	/* Wait for start... */
	ready(ready_out, wakefd);

	/* Receive them all */
	for (i = 0; i < num_packets; i++) {
		char data[DATASIZE];
		int ret, done = 0;

again:
		ret = read(in_fd, data + done, DATASIZE - done);
		if (ret < 0)
			barf("SERVER: read");
		done += ret;
		if (done < DATASIZE)
			goto again;
	}
}
/* One group of senders and receivers */
static unsigned int group(unsigned int num_fds,
			  int ready_out,
			  int wakefd)
{
	unsigned int i;
	int out_fds[num_fds];

	for (i = 0; i < num_fds; i++) {
		int fds[2];

		/* Create the pipe between client and server */
		fdpair(fds);

		/* Fork the receiver. */
		switch (fork()) {
		case -1:
			barf("fork()");
		case 0:
			close(fds[1]);
			receiver(num_fds * loops, fds[0], ready_out, wakefd);
			exit(0);
		}

		out_fds[i] = fds[1];
		close(fds[0]);
	}

	/* Now we have all the fds, fork the senders */
	for (i = 0; i < num_fds; i++) {
		switch (fork()) {
		case -1:
			barf("fork()");
		case 0:
			sender(num_fds, out_fds, ready_out, wakefd);
			exit(0);
		}
	}

	/* Close the fds we have left */
	for (i = 0; i < num_fds; i++)
		close(out_fds[i]);

	/* Return number of children to reap */
	return num_fds * 2;
}
int main(int argc, char *argv[])
{
	unsigned int i, num_groups, total_children;
	struct timeval start, stop, diff;
	unsigned int num_fds = 20;
	int readyfds[2], wakefds[2];
	char dummy;

	if (argv[1] && strcmp(argv[1], "-pipe") == 0) {
		use_pipes = 1;
		argc--;
		argv++;
	}

	if (argc != 2 || (num_groups = atoi(argv[1])) == 0)
		barf("Usage: hackbench [-pipe] <num groups>\n");

	fdpair(readyfds);
	fdpair(wakefds);

	total_children = 0;
	for (i = 0; i < num_groups; i++)
		total_children += group(num_fds, readyfds[1], wakefds[0]);

	/* Wait for everyone to be ready */
	for (i = 0; i < total_children; i++)
		if (read(readyfds[0], &dummy, 1) != 1)
			barf("Reading for readyfds");

	gettimeofday(&start, NULL);

	/* Kick them off */
	if (write(wakefds[1], &dummy, 1) != 1)
		barf("Writing to start them");

	/* Reap them all */
	for (i = 0; i < total_children; i++) {
		int status;

		wait(&status);
		if (!WIFEXITED(status))
			exit(1);
	}

	gettimeofday(&stop, NULL);

	/* Print time... */
	timersub(&stop, &start, &diff);
	printf("Time: %lu.%03lu\n", diff.tv_sec, diff.tv_usec / 1000);

	exit(0);
}
* Re: [PATCH 2.6.20-rc4 0/4] futexes functionalities and improvements
2007-01-12 7:58 ` Ingo Molnar
@ 2007-01-16 8:34 ` Pierre Peiffer
2007-01-16 9:44 ` Ingo Molnar
2007-01-16 15:14 ` Ulrich Drepper
0 siblings, 2 replies; 11+ messages in thread
From: Pierre Peiffer @ 2007-01-16 8:34 UTC (permalink / raw)
To: LKML; +Cc: Ingo Molnar, Andrew Morton, Ulrich Drepper, Jakub Jelinek
Hi,
Ingo Molnar wrote:
> yeah. As an alternative, it might be a good idea to pthread-ify
> hackbench.c - that should replicate the Volano workload pretty
> accurately. I've attached hackbench.c. (it's process based right now, so
> it wont trigger contended futex ops)
Ok, thanks. I've adapted your test, Ingo, and done some measurements. (I only
replaced fork with pthread_create; I didn't use a condvar or barrier for the
initial synchronization.)
The modified hackbench is available here:
http://www.bullopensource.org/posix/pi-futex/hackbench_pth.c
I've run this bench 1000 times with pipe and 800 groups.
Here are the results:
Test1 - with simple list (i.e. without any futex patches)
=========================================================
Iterations=1000
Latency (s) min max avg stddev
26.67 27.89 27.14 0.19
Test2 - with plist (i.e. with only patch 1/4 as is)
===================================================
Iterations=1000
Latency (s) min max avg stddev
26.87 28.18 27.30 0.18
Test3 - with plist, but all SCHED_OTHER threads registered
with the same priority (MAX_RT_PRIO)
(i.e. with a modified patch 1/4, not yet posted here)
=========================================================
Iterations=1000
Latency (s) min max avg stddev
26.74 27.84 27.16 0.18
--
Pierre
* Re: [PATCH 2.6.20-rc4 0/4] futexes functionalities and improvements
2007-01-16 8:34 ` Pierre Peiffer
@ 2007-01-16 9:44 ` Ingo Molnar
2007-01-16 15:14 ` Ulrich Drepper
1 sibling, 0 replies; 11+ messages in thread
From: Ingo Molnar @ 2007-01-16 9:44 UTC (permalink / raw)
To: Pierre Peiffer; +Cc: LKML, Andrew Morton, Ulrich Drepper, Jakub Jelinek
* Pierre Peiffer <pierre.peiffer@bull.net> wrote:
> The modified hackbench is available here:
>
> http://www.bullopensource.org/posix/pi-futex/hackbench_pth.c
cool!
> I've run this bench 1000 times with pipe and 800 groups.
> Here are the results:
>
> Test1 - with simple list (i.e. without any futex patches)
> =========================================================
> Latency (s) min max avg stddev
> 26.67 27.89 27.14 0.19
> Test2 - with plist (i.e. with only patch 1/4 as is)
> 26.87 28.18 27.30 0.18
> Test3 - with plist but all SCHED_OTHER registered
> 26.74 27.84 27.16 0.18
ok, seems like the last one is the winner - it's the same as unmodified,
within noise.
Ingo
* Re: [PATCH 2.6.20-rc4 0/4] futexes functionalities and improvements
2007-01-16 8:34 ` Pierre Peiffer
2007-01-16 9:44 ` Ingo Molnar
@ 2007-01-16 15:14 ` Ulrich Drepper
2007-01-16 15:40 ` Ingo Molnar
1 sibling, 1 reply; 11+ messages in thread
From: Ulrich Drepper @ 2007-01-16 15:14 UTC (permalink / raw)
To: Pierre Peiffer; +Cc: LKML, Ingo Molnar, Andrew Morton, Jakub Jelinek
Pierre Peiffer wrote:
> I've run this bench 1000 times with pipe and 800 groups.
> Here are the results:
This is not what I'm mostly concerned about. The patches create a
bottleneck, since _all_ processes use the same resource. Plus, this code
has to be run on a machine with multiple processors to get RFOs into play.

So, please do this: on an SMP (4p or more) machine, rig the test so that
it runs for quite a while. Then, in a script, start the program a bunch of
times, all in parallel. Have the script wait until all program runs are
done and measure the time until the last program finishes.
--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
* Re: [PATCH 2.6.20-rc4 0/4] futexes functionalities and improvements
2007-01-16 15:14 ` Ulrich Drepper
@ 2007-01-16 15:40 ` Ingo Molnar
2007-01-16 17:46 ` Ulrich Drepper
0 siblings, 1 reply; 11+ messages in thread
From: Ingo Molnar @ 2007-01-16 15:40 UTC (permalink / raw)
To: Ulrich Drepper; +Cc: Pierre Peiffer, LKML, Andrew Morton, Jakub Jelinek
* Ulrich Drepper <drepper@redhat.com> wrote:
> Pierre Peiffer wrote:
> > I've run this bench 1000 times with pipe and 800 groups.
> > Here are the results:
>
> This is not what I'm mostly concerned about. The patches create a
> bottleneck since _all_ processes use the same resource. [...]
what do you mean by that - which is this same resource?
Ingo
* Re: [PATCH 2.6.20-rc4 0/4] futexes functionalities and improvements
2007-01-16 15:40 ` Ingo Molnar
@ 2007-01-16 17:46 ` Ulrich Drepper
2007-01-16 17:50 ` Ingo Molnar
0 siblings, 1 reply; 11+ messages in thread
From: Ulrich Drepper @ 2007-01-16 17:46 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Pierre Peiffer, LKML, Andrew Morton, Jakub Jelinek
Ingo Molnar wrote:
> what do you mean by that - which is this same resource?
From what has been said here before, all futexes are stored in the same
list or hash table or whatever it was. I want to see how that code
behaves if many separate processes concurrently use futexes.
--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
* Re: [PATCH 2.6.20-rc4 0/4] futexes functionalities and improvements
2007-01-16 17:46 ` Ulrich Drepper
@ 2007-01-16 17:50 ` Ingo Molnar
2007-01-17 7:50 ` Pierre Peiffer
0 siblings, 1 reply; 11+ messages in thread
From: Ingo Molnar @ 2007-01-16 17:50 UTC (permalink / raw)
To: Ulrich Drepper; +Cc: Pierre Peiffer, LKML, Andrew Morton, Jakub Jelinek
* Ulrich Drepper <drepper@redhat.com> wrote:
> > what do you mean by that - which is this same resource?
>
> From what has been said here before, all futexes are stored in the
> same list or hash table or whatever it was. I want to see how that
> code behaves if many separate processes concurrently use futexes.
futexes are stored in the bucket hash, and these patches do not change
that. The pi-list that was talked about is per-futex. So there's no
change to the way futexes are hashed nor should there be any scalability
impact - besides the micro-impact that was measured in a number of ways
- AFAICS.
Ingo
* Re: [PATCH 2.6.20-rc4 0/4] futexes functionalities and improvements
2007-01-16 17:50 ` Ingo Molnar
@ 2007-01-17 7:50 ` Pierre Peiffer
0 siblings, 0 replies; 11+ messages in thread
From: Pierre Peiffer @ 2007-01-17 7:50 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Ulrich Drepper, LKML, Andrew Morton, Jakub Jelinek
Ingo Molnar wrote:
> * Ulrich Drepper <drepper@redhat.com> wrote:
>
>>> what do you mean by that - which is this same resource?
>> From what has been said here before, all futexes are stored in the
>> same list or hash table or whatever it was. I want to see how that
>> code behaves if many separate processes concurrently use futexes.
>
> futexes are stored in the bucket hash, and these patches do not change
> that. The pi-list that was talked about is per-futex. So there's no
> change to the way futexes are hashed nor should there be any scalability
> impact - besides the micro-impact that was measured in a number of ways
> - AFAICS.
Yes, that's completely right!
--
Pierre