* [PATCH] Adding userspace_libaio_reap option
@ 2011-08-30  0:29 Dan Ehrenberg
  2011-08-30 13:45 ` Jens Axboe
  2011-08-30 14:07 ` Jeff Moyer
  0 siblings, 2 replies; 8+ messages in thread
From: Dan Ehrenberg @ 2011-08-30  0:29 UTC (permalink / raw)
  To: Jens Axboe; +Cc: fio, Dan Ehrenberg

When a single thread is reading from a libaio io_context_t object
in a non-blocking polling manner (that is, with the minimum number
of events to return being 0), then it is possible to safely read
events directly from user-space, taking advantage of the fact that
the io_context_t object is a pointer to memory with a certain layout.
This patch adds an option, userspace_libaio_reap, which allows
reading events in this manner when the libaio engine is used.

You can observe its effect by setting iodepth_batch_complete=0
and seeing the change in distribution of system/user time based on
whether this new flag is set. If userspace_libaio_reap=1, then
busy polling takes place in userspace, and there is a larger amount of
usr CPU. If userspace_libaio_reap=0 (the default), then there is a
larger amount of sys CPU from the polling in the kernel.
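
As an illustration only (not part of the patch), a job file for such a
comparison might look like the following; the target file and sizes are
placeholders:

[global]
ioengine=libaio
direct=1
iodepth=32
iodepth_batch_complete=0
userspace_libaio_reap=1

[reap-test]
filename=/tmp/fio-reap-test
size=1g
rw=randread
bs=4k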

Polling from a queue in this manner is several times faster. In my
testing, it took less than an eighth as much time to execute a
polling operation in user-space as with the io_getevents syscall.
---
 engines/libaio.c |   51 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 fio.h            |    2 ++
 options.c        |    9 +++++++++
 3 files changed, 61 insertions(+), 1 deletions(-)

diff --git a/engines/libaio.c b/engines/libaio.c
index c837ab6..b55bc55 100644
--- a/engines/libaio.c
+++ b/engines/libaio.c
@@ -58,6 +58,46 @@ static struct io_u *fio_libaio_event(struct thread_data *td, int event)
 	return io_u;
 }
 
+struct aio_ring {
+	unsigned id;		 /** kernel internal index number */
+	unsigned nr;		 /** number of io_events */
+	unsigned head;
+	unsigned tail;
+ 
+	unsigned magic;
+	unsigned compat_features;
+	unsigned incompat_features;
+	unsigned header_length;	/** size of aio_ring */
+
+	struct io_event events[0];
+};
+
+#define AIO_RING_MAGIC	0xa10a10a1
+
+static int user_io_getevents(io_context_t aio_ctx, unsigned int max,
+			struct io_event *events)
+{
+	long i = 0;
+	unsigned head;
+	struct aio_ring *ring = (struct aio_ring*)aio_ctx;
+
+	while (i < max) {
+		head = ring->head;
+
+		if (head == ring->tail) {
+			/* There are no more completions */
+			break;
+		} else {
+			/* There is another completion to reap */
+			events[i] = ring->events[head];
+    			ring->head = (head + 1) % ring->nr;
+			i++;
+		}
+	}
+
+	return i;
+}
+
 static int fio_libaio_getevents(struct thread_data *td, unsigned int min,
 				unsigned int max, struct timespec *t)
 {
@@ -66,7 +106,16 @@ static int fio_libaio_getevents(struct thread_data *td, unsigned int min,
 	int r, events = 0;
 
 	do {
-		r = io_getevents(ld->aio_ctx, actual_min, max, ld->aio_events + events, t);
+		if (td->o.userspace_libaio_reap == 1
+		    && actual_min == 0
+		    && ((struct aio_ring *)(ld->aio_ctx))->magic
+				== AIO_RING_MAGIC) {
+			r = user_io_getevents(ld->aio_ctx, max,
+				ld->aio_events + events);
+		} else {
+			r = io_getevents(ld->aio_ctx, actual_min,
+				max, ld->aio_events + events, t);
+		}
 		if (r >= 0)
 			events += r;
 		else if (r == -EAGAIN)
diff --git a/fio.h b/fio.h
index 9d2a61c..0c86f28 100644
--- a/fio.h
+++ b/fio.h
@@ -413,6 +413,8 @@ struct thread_options {
 	unsigned int gid;
 
 	unsigned int sync_file_range;
+
+	unsigned int userspace_libaio_reap;
 };
 
 #define FIO_VERROR_SIZE	128
diff --git a/options.c b/options.c
index 6a87e98..6f7c41e 100644
--- a/options.c
+++ b/options.c
@@ -2069,6 +2069,15 @@ static struct fio_option options[FIO_MAX_OPTS] = {
 		.off1	= td_var_offset(gid),
 		.help	= "Run job with this group ID",
 	},
+#ifdef FIO_HAVE_LIBAIO
+	{
+		.name	= "userspace_libaio_reap",
+		.type	= FIO_OPT_BOOL,
+		.off1	= td_var_offset(userspace_libaio_reap),
+		.help	= "When using the libaio engine with iodepth_batch_complete=0, enable userspace reaping",
+		.def	= "0",
+	},
+#endif
 	{
 		.name = NULL,
 	},
-- 
1.7.3.1




* Re: [PATCH] Adding userspace_libaio_reap option
  2011-08-30  0:29 [PATCH] Adding userspace_libaio_reap option Dan Ehrenberg
@ 2011-08-30 13:45 ` Jens Axboe
  2011-08-30 17:47   ` Daniel Ehrenberg
  2011-08-30 14:07 ` Jeff Moyer
  1 sibling, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2011-08-30 13:45 UTC (permalink / raw)
  To: Dan Ehrenberg; +Cc: fio

On 2011-08-29 18:29, Dan Ehrenberg wrote:
> When a single thread is reading from a libaio io_context_t object
> in a non-blocking polling manner (that is, with the minimum number
> of events to return being 0), then it is possible to safely read
> events directly from user-space, taking advantage of the fact that
> the io_context_t object is a pointer to memory with a certain layout.
> This patch adds an option, userspace_libaio_reap, which allows
> reading events in this manner when the libaio engine is used.
> 
> You can observe its effect by setting iodepth_batch_complete=0
> and seeing the change in distribution of system/user time based on
> whether this new flag is set. If userspace_libaio_reap=1, then
> busy polling takes place in userspace, and there is a larger amount of
> usr CPU. If userspace_libaio_reap=0 (the default), then there is a
> larger amount of sys CPU from the polling in the kernel.
> 
> Polling from a queue in this manner is several times faster. In my
> testing, it took less than an eighth as much time to execute a
> polling operation in user-space as with the io_getevents syscall.

Good stuff! The libaio side looks good, but I think we should add engine
specific options under the specific engine. With all the
commands/options that fio has, it quickly becomes a bit unwieldy. So,
idea would be to have:

ioengine=libaio:userspace_reap

I'll look into that.

One question on the code:

> +static int user_io_getevents(io_context_t aio_ctx, unsigned int max,
> +			struct io_event *events)
> +{
> +	long i = 0;
> +	unsigned head;
> +	struct aio_ring *ring = (struct aio_ring*)aio_ctx;
> +
> +	while (i < max) {
> +		head = ring->head;
> +
> +		if (head == ring->tail) {
> +			/* There are no more completions */
> +			break;
> +		} else {
> +			/* There is another completion to reap */
> +			events[i] = ring->events[head];
> +    			ring->head = (head + 1) % ring->nr;
> +			i++;
> +		}
> +	}

Don't we need a read barrier here before reading the head/tail?

-- 
Jens Axboe



* Re: [PATCH] Adding userspace_libaio_reap option
  2011-08-30  0:29 [PATCH] Adding userspace_libaio_reap option Dan Ehrenberg
  2011-08-30 13:45 ` Jens Axboe
@ 2011-08-30 14:07 ` Jeff Moyer
       [not found]   ` <CAAK6Zt0uPgY1V1tJihUwKxbjtNVNUqMZu0UDmUtRdJY4k_Lkmw@mail.gmail.com>
  1 sibling, 1 reply; 8+ messages in thread
From: Jeff Moyer @ 2011-08-30 14:07 UTC (permalink / raw)
  To: Dan Ehrenberg; +Cc: Jens Axboe, fio

Dan Ehrenberg <dehrenberg@google.com> writes:

> When a single thread is reading from a libaio io_context_t object
> in a non-blocking polling manner (that is, with the minimum number
> of events to return being 0), then it is possible to safely read
> events directly from user-space, taking advantage of the fact that
> the io_context_t object is a pointer to memory with a certain layout.
> This patch adds an option, userspace_libaio_reap, which allows
> reading events in this manner when the libaio engine is used.

I haven't yet tried to poke holes in your code, but I'm pretty sure I
can find some.  I have patches for the kernel and libaio which allow
user-space reaping of events.  Why don't I dust those off and post them,
and then fio won't have to change at all?  That seems like the proper
approach to solving the problem.

Cheers,
Jeff



* Re: [PATCH] Adding userspace_libaio_reap option
  2011-08-30 13:45 ` Jens Axboe
@ 2011-08-30 17:47   ` Daniel Ehrenberg
  2011-08-30 17:51     ` Jens Axboe
  0 siblings, 1 reply; 8+ messages in thread
From: Daniel Ehrenberg @ 2011-08-30 17:47 UTC (permalink / raw)
  To: Jens Axboe; +Cc: fio


On Tuesday, August 30, 2011, Jens Axboe <axboe@kernel.dk> wrote:
> On 2011-08-29 18:29, Dan Ehrenberg wrote:
>> When a single thread is reading from a libaio io_context_t object
>> in a non-blocking polling manner (that is, with the minimum number
>> of events to return being 0), then it is possible to safely read
>> events directly from user-space, taking advantage of the fact that
>> the io_context_t object is a pointer to memory with a certain layout.
>> This patch adds an option, userspace_libaio_reap, which allows
>> reading events in this manner when the libaio engine is used.
>>
>> You can observe its effect by setting iodepth_batch_complete=0
>> and seeing the change in distribution of system/user time based on
>> whether this new flag is set. If userspace_libaio_reap=1, then
>> busy polling takes place in userspace, and there is a larger amount of
>> usr CPU. If userspace_libaio_reap=0 (the default), then there is a
>> larger amount of sys CPU from the polling in the kernel.
>>
>> Polling from a queue in this manner is several times faster. In my
>> testing, it took less than an eighth as much time to execute a
>> polling operation in user-space as with the io_getevents syscall.
>
> Good stuff! The libaio side looks good, but I think we should add engine
> specific options under the specific engine. With all the
> commands/options that fio has, it quickly becomes a bit unwieldy. So,
> idea would be to have:
>
> ioengine=libaio:userspace_reap

Good idea. I was looking around for engine-specific options but didn't see
any examples. I like this convention.
>
> I'll look into that.
>
> One question on the code:
>
>> +static int user_io_getevents(io_context_t aio_ctx, unsigned int max,
>> +                     struct io_event *events)
>> +{
>> +     long i = 0;
>> +     unsigned head;
>> +     struct aio_ring *ring = (struct aio_ring*)aio_ctx;
>> +
>> +     while (i < max) {
>> +             head = ring->head;
>> +
>> +             if (head == ring->tail) {
>> +                     /* There are no more completions */
>> +                     break;
>> +             } else {
>> +                     /* There is another completion to reap */
>> +                     events[i] = ring->events[head];
>> +                     ring->head = (head + 1) % ring->nr;
>> +                     i++;
>> +             }
>> +     }
>
> Don't we need a read barrier here before reading the head/tail?
>
Of course; how did I forget that?

I can make a fine barrier to run on my x64 machines, but it would be much
better to not introduce an architectural dependency. Is there any kind of
free library for this? Google has one (used in V8) but it's C++ and probably
isn't on enough architectures. And of course the Linux kernel has one, but
it would be a small project to extract it for use in user-space--or has
someone done this work?
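
One portable-ish possibility -- offered only as a sketch, not something
settled here -- is the GCC builtin __sync_synchronize(), which expands to
a full memory barrier on every architecture the compiler supports (the
macro name below is made up for the example):

	/*
	 * Sketch: a compiler-provided full barrier, callable from plain C
	 * in user-space.  A full barrier is stronger (and costlier) than
	 * the read-only ordering the reap loop actually needs.
	 */
	#define ring_mem_barrier()	__sync_synchronize()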

> --
> Jens Axboe
>
>

Dan



* Re: [PATCH] Adding userspace_libaio_reap option
  2011-08-30 17:47   ` Daniel Ehrenberg
@ 2011-08-30 17:51     ` Jens Axboe
  0 siblings, 0 replies; 8+ messages in thread
From: Jens Axboe @ 2011-08-30 17:51 UTC (permalink / raw)
  To: Daniel Ehrenberg; +Cc: fio

On 2011-08-30 11:47, Daniel Ehrenberg wrote:
> On Tuesday, August 30, 2011, Jens Axboe <axboe@kernel.dk <mailto:axboe@kernel.dk>> wrote:
>> On 2011-08-29 18:29, Dan Ehrenberg wrote:
>>> When a single thread is reading from a libaio io_context_t object
>>> in a non-blocking polling manner (that is, with the minimum number
>>> of events to return being 0), then it is possible to safely read
>>> events directly from user-space, taking advantage of the fact that
>>> the io_context_t object is a pointer to memory with a certain layout.
>>> This patch adds an option, userspace_libaio_reap, which allows
>>> reading events in this manner when the libaio engine is used.
>>>
>>> You can observe its effect by setting iodepth_batch_complete=0
>>> and seeing the change in distribution of system/user time based on
>>> whether this new flag is set. If userspace_libaio_reap=1, then
>>> busy polling takes place in userspace, and there is a larger amount of
>>> usr CPU. If userspace_libaio_reap=0 (the default), then there is a
>>> larger amount of sys CPU from the polling in the kernel.
>>>
>>> Polling from a queue in this manner is several times faster. In my
>>> testing, it took less than an eighth as much time to execute a
>>> polling operation in user-space as with the io_getevents syscall.
>>
>> Good stuff! The libaio side looks good, but I think we should add engine
>> specific options under the specific engine. With all the
>> commands/options that fio has, it quickly becomes a bit unwieldy. So,
>> idea would be to have:
>>
>> ioengine=libaio:userspace_reap
> 
> Good idea. I was looking around for engine-specific options but didn't
> see any examples. I like this convention.

Optimally, we should be able to nest options under the engine option. But a
quicker hack should suffice; it can always be extended if need be.

>>
>> I'll look into that.
>>
>> One question on the code:
>>
>>> +static int user_io_getevents(io_context_t aio_ctx, unsigned int max,
>>> +                     struct io_event *events)
>>> +{
>>> +     long i = 0;
>>> +     unsigned head;
>>> +     struct aio_ring *ring = (struct aio_ring*)aio_ctx;
>>> +
>>> +     while (i < max) {
>>> +             head = ring->head;
>>> +
>>> +             if (head == ring->tail) {
>>> +                     /* There are no more completions */
>>> +                     break;
>>> +             } else {
>>> +                     /* There is another completion to reap */
>>> +                     events[i] = ring->events[head];
>>> +                     ring->head = (head + 1) % ring->nr;
>>> +                     i++;
>>> +             }
>>> +     }
>>
>> Don't we need a read barrier here before reading the head/tail?
>>
> Of course; how did I forget that?
> 
> I can make a fine barrier to run on my x64 machines, but it would be
> much better to not introduce an architectural dependency. Is there any
> kind of free library for this? Google has one (used in V8) but it's
> C++ and probably isn't on enough architectures. And of course the
> Linux kernel has one, but it would be a small project to extract it
> for use in user-space--or has someone done this work?

Fio already includes read and write barriers, they are called
read_barrier() and write_barrier().
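
As a sketch of how that might slot into the reap loop -- one possible
placement, not a tested patch:

	while (i < max) {
		head = ring->head;

		if (head == ring->tail) {
			/* There are no more completions */
			break;
		}
		/*
		 * Order the read of the tail above against the read of the
		 * event below, so we never copy a slot the kernel has not
		 * finished publishing.
		 */
		read_barrier();
		events[i] = ring->events[head];
		ring->head = (head + 1) % ring->nr;
		i++;
	}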

FWIW, I agree with Jeff that this would be best handled in the libaio
library code. But if we can make it work reliably with the generic
kernel code (and I think we should), then I want to carry it in fio. For
patches that aren't even merged yet, the road to a setup that already
has this included by default is very long.

-- 
Jens Axboe



* Re: [PATCH] Adding userspace_libaio_reap option
       [not found]     ` <CAAK6Zt3iVsTS9=YGSJ3dTvY3vSYBygQYR9HeJwh8Zivmkfa7dg@mail.gmail.com>
@ 2011-08-30 21:14       ` Jeff Moyer
       [not found]         ` <CAAK6Zt0SUBi_+_XRkP0pU5W6RYjcQhg-W+RkJP5qpteGSwPo4g@mail.gmail.com>
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff Moyer @ 2011-08-30 21:14 UTC (permalink / raw)
  To: Daniel Ehrenberg; +Cc: fio

Daniel Ehrenberg <dehrenberg@google.com> writes:

> On Tuesday, August 30, 2011, Jeff Moyer <jmoyer@redhat.com> wrote:
>> Dan Ehrenberg <dehrenberg@google.com> writes:
>>
>>> When a single thread is reading from a libaio io_context_t object
>>> in a non-blocking polling manner (that is, with the minimum number
>>> of events to return being 0), then it is possible to safely read
>>> events directly from user-space, taking advantage of the fact that
>>> the io_context_t object is a pointer to memory with a certain layout.
>>> This patch adds an option, userspace_libaio_reap, which allows
>>> reading events in this manner when the libaio engine is used.
>>
>> I haven't yet tried to poke holes in your code, but I'm pretty sure I
>> can find some.  I have patches for the kernel and libaio which allow
>> user-space reaping of events.  Why don't I dust those off and post them,
>> and then fio won't have to change at all?  That seems like the proper
>> approach to solving the problem.
>>
>> Cheers,
>> Jeff
>>
>
> Ken Chen posted some patches which accomplish this in 2007. However, I was a
> little concerned about his lock-free queue structure--it seems like an
> integer overflow might cause events to be lost; on the other hand this is
> very unlikely. Are you talking about Ken's patchset or another one?

I'm talking about another one, since I completely forgot about Ken's
patches.  Thanks for reminding me!

I'm not sure your approach will work on all architectures without kernel
modification.  I think at least a flush_dcache_page may be required.
More importantly, though, I'd like to discourage peeking into the
internals outside of the libaio library.  Doing this makes it really
hard to change things moving forward.

Cheers,
Jeff


* Re: [PATCH] Adding userspace_libaio_reap option
       [not found]         ` <CAAK6Zt0SUBi_+_XRkP0pU5W6RYjcQhg-W+RkJP5qpteGSwPo4g@mail.gmail.com>
@ 2011-08-30 21:35           ` Daniel Ehrenberg
       [not found]           ` <x49r5415x0j.fsf@segfault.boston.devel.redhat.com>
  1 sibling, 0 replies; 8+ messages in thread
From: Daniel Ehrenberg @ 2011-08-30 21:35 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: fio

On Tue, Aug 30, 2011 at 2:14 PM, Jeff Moyer <jmoyer@redhat.com> wrote:
> Daniel Ehrenberg <dehrenberg@google.com> writes:
>
>> On Tuesday, August 30, 2011, Jeff Moyer <jmoyer@redhat.com> wrote:
>>> Dan Ehrenberg <dehrenberg@google.com> writes:
>>>
>>>> When a single thread is reading from a libaio io_context_t object
>>>> in a non-blocking polling manner (that is, with the minimum number
>>>> of events to return being 0), then it is possible to safely read
>>>> events directly from user-space, taking advantage of the fact that
>>>> the io_context_t object is a pointer to memory with a certain layout.
>>>> This patch adds an option, userspace_libaio_reap, which allows
>>>> reading events in this manner when the libaio engine is used.
>>>
>>> I haven't yet tried to poke holes in your code, but I'm pretty sure I
>>> can find some.  I have patches for the kernel and libaio which allow
>>> user-space reaping of events.  Why don't I dust those off and post them,
>>> and then fio won't have to change at all?  That seems like the proper
>>> approach to solving the problem.
>>>
>>> Cheers,
>>> Jeff
>>>
>>
>> Ken Chen posted some patches which accomplish this in 2007. However, I was a
>> little concerned about his lock-free queue structure--it seems like an
>> integer overflow might cause events to be lost; on the other hand this is
>> very unlikely. Are you talking about Ken's patchset or another one?
>
> I'm talking about another one, since I completely forgot about Ken's
> patches.  Thanks for reminding me!
>
> I'm not sure your approach will work on all architectures without kernel
> modification.  I think at least a flush_dcache_page may be required.
> More importantly, though, I'd like to discourage peeking into the
> internals outside of the libaio library.  Doing this makes it really
> hard to change things moving forward.

Are you saying flush_dcache_page from the kernel or from user-space?
What kind of architecture will have problems? What kind of failures
could result?

About ABI dependence: I agree, it would be better to have things in
libaio rather than here. But to do reaping like this, with certain
restrictions about what context it's used in, I think we'd have to
make another function rather than just changing io_getevents. At
first, I was looking into changing io_getevents, but then I realized
that the application I'm working on optimizing, like FIO, is only
calling io_getevents in a certain pattern, making all of the
synchronization unnecessary. I don't think these two things are the
only users of io_getevents in this pattern. But maybe proper
synchronization can be made cheap enough that there's not a big
penalty for doing it properly all the time.
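
Purely to illustrate the shape such an interface might take -- the name
and signature below are hypothetical, not anything proposed for libaio:

	/*
	 * Hypothetical sketch only: a non-blocking, single-consumer reap
	 * call exported by libaio itself, so callers would not need to
	 * know the aio_ring layout.  Returns the number of events copied
	 * (possibly 0) and never blocks.
	 */
	int io_peekevents(io_context_t ctx, long max, struct io_event *events);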
>
> Cheers,
> Jeff
>


* Re: [PATCH] Adding userspace_libaio_reap option
       [not found]             ` <CAAK6Zt3cnfUra+vXrZG5ondMf49KDgKf3JOgCM5rbi=KxJvD_Q@mail.gmail.com>
@ 2011-08-31 22:58               ` Daniel Ehrenberg
  0 siblings, 0 replies; 8+ messages in thread
From: Daniel Ehrenberg @ 2011-08-31 22:58 UTC (permalink / raw)
  To: fio, Jeff Moyer

Looks like I accidentally sent a reply just to Jeff rather than to the
list, and we've had a little exchange this way. For the record, here's
what we discussed. Jeff, please reply to this email instead of the one
I just sent you.

---------- Forwarded message ----------
From: Daniel Ehrenberg <dehrenberg@google.com>
Date: Wed, Aug 31, 2011 at 3:55 PM
Subject: Re: [PATCH] Adding userspace_libaio_reap option
To: Jeff Moyer <jmoyer@redhat.com>


On Wed, Aug 31, 2011 at 10:08 AM, Jeff Moyer <jmoyer@redhat.com> wrote:
> Daniel Ehrenberg <dehrenberg@google.com> writes:
>
>> On Tue, Aug 30, 2011 at 2:14 PM, Jeff Moyer <jmoyer@redhat.com> wrote:
>>> Daniel Ehrenberg <dehrenberg@google.com> writes:
>>>
>>>> On Tuesday, August 30, 2011, Jeff Moyer <jmoyer@redhat.com> wrote:
>>>>> Dan Ehrenberg <dehrenberg@google.com> writes:
>>>>>
>>>>>> When a single thread is reading from a libaio io_context_t object
>>>>>> in a non-blocking polling manner (that is, with the minimum number
>>>>>> of events to return being 0), then it is possible to safely read
>>>>>> events directly from user-space, taking advantage of the fact that
>>>>>> the io_context_t object is a pointer to memory with a certain layout.
>>>>>> This patch adds an option, userspace_libaio_reap, which allows
>>>>>> reading events in this manner when the libaio engine is used.
>>>>>
>>>>> I haven't yet tried to poke holes in your code, but I'm pretty sure I
>>>>> can find some.  I have patches for the kernel and libaio which allow
>>>>> user-space reaping of events.  Why don't I dust those off and post them,
>>>>> and then fio won't have to change at all?  That seems like the proper
>>>>> approach to solving the problem.
>>>>>
>>>>> Cheers,
>>>>> Jeff
>>>>>
>>>>
>>>> Ken Chen posted some patches which accomplish this in 2007. However, I was a
>>>> little concerned about his lock-free queue structure--it seems like an
>>>> integer overflow might cause events to be lost; on the other hand this is
>>>> very unlikely. Are you talking about Ken's patchset or another one?
>>>
>>> I'm talking about another one, since I completely forgot about Ken's
>>> patches.  Thanks for reminding me!
>>>
>>> I'm not sure your approach will work on all architectures without kernel
>>> modification.  I think at least a flush_dcache_page may be required.
>>> More importantly, though, I'd like to discourage peeking into the
>>> internals outside of the libaio library.  Doing this makes it really
>>> hard to change things moving forward.
>>
>> Are you saying flush_dcache_page from the kernel or from user-space?
>> What kind of architecture will have problems? What kind of failures
>> could result?
>
> From the kernel.  Arm, ppc, mips, sparc64, etc would have issues.
> Basically everything that #defines ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1.
> Specifically, updates to the tail by the kernel would not be seen in
> userspace.

If updates to the tail are not seen in userspace immediately, then it
will just take longer for the reaping to occur, right? Eventually
something else will flush the cache, and then userspace can see that
there's a new event. A delay in observing an event is a performance bug,
not a correctness bug. Or am I understanding this wrong?
>
>> About ABI dependence: I agree, it would be better to have things in
>> libaio rather than here. But to do reaping like this, with certain
>> restrictions about what context it's used in, I think we'd have to
>> make another function rather than just changing io_getevents. At
>> first, I was looking into changing io_getevents, but then I realized
>> that the application I'm working on optimizing, like FIO, is only
>> calling io_getevents in a certain pattern, making all of the
>> synchronization unnecessary. I don't think these two things are the
>> only users of io_getevents in this pattern. But maybe proper
>> synchronization can be made cheap enough that there's not a big
>> penalty for doing it properly all the time.
>
> I'm not sure what penalty you think exists.  It's a matter of switching
> a spinlock in the kernel to an atomic cmpxchg.  Userspace would then
> also need atomic ops.  I'm pretty sure it'll be a net win over a system
> call.
>
> Cheers,
> Jeff
>

I was talking about the penalty of doing the atomic cmpxchg in
userspace rather than doing non-atomic operations. I agree that the
atomic operations in userspace will be a win over the system call,
since I've measured this. The atomic operations themselves actually
aren't that expensive, but they're not completely free; I don't have a
good idea of how much they cost relative to the rest of what goes on in
a real workload, but probably not much.
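
For concreteness, a sketch of what the user-space side of that could look
like with a GCC atomic builtin -- an assumption about the approach, not
code from Jeff's patches:

	/*
	 * Sketch only: advance the ring head with an atomic
	 * compare-and-swap instead of a plain store.  Returns 1 if this
	 * caller claimed events[head], 0 if another reaper got there
	 * first.  A complete version would also need barriers and a guard
	 * against the ring wrapping between copying the event and the
	 * cmpxchg -- the overflow concern raised about Ken Chen's patches
	 * earlier in the thread.
	 */
	static int claim_event(struct aio_ring *ring, unsigned head)
	{
		return __sync_bool_compare_and_swap(&ring->head, head,
						    (head + 1) % ring->nr);
	}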

Dan

