* [PATCH v2] Adding userspace_libaio_reap option
@ 2011-08-31 0:50 Dan Ehrenberg
2011-08-31 0:57 ` Jens Axboe
0 siblings, 1 reply; 4+ messages in thread
From: Dan Ehrenberg @ 2011-08-31 0:50 UTC (permalink / raw)
To: Jens Axboe; +Cc: fio, Dan Ehrenberg
When a single thread is reading from a libaio io_context_t object
in a non-blocking polling manner (that is, with the minimum number
of events to return being 0), then it is possible to safely read
events directly from user-space, taking advantage of the fact that
the io_context_t object is a pointer to memory with a certain layout.
This patch adds an option, userspace_libaio_reap, which allows
reading events in this manner when the libaio engine is used.
You can observe its effect by setting iodepth_batch_complete=0
and seeing the change in distribution of system/user time based on
whether this new flag is set. If userspace_libaio_reap=1, then
busy polling takes place in userspace, and there is a larger amount of
usr CPU. If userspace_libaio_reap=0 (the default), then there is a
larger amount of sys CPU from the polling in the kernel.
Polling from a queue in this manner is several times faster. In my
testing, it took less than an eighth as much time to execute a
polling operation in user-space than with the io_getevents syscall.
---
HOWTO | 7 +++++++
engines/libaio.c | 52 +++++++++++++++++++++++++++++++++++++++++++++++++++-
fio.h | 2 ++
options.c | 9 +++++++++
4 files changed, 69 insertions(+), 1 deletions(-)
---
Version 2 updates the HOWTO file and uses the necessary read barrier.
I am not sure if this needs a flush_dcache_page call in the kernel
for correctness. A future version should consider cleaning up the
option interface and consider moving the reaping code to libaio.
diff --git a/HOWTO b/HOWTO
index a1b2e8c..ad4e454 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1187,6 +1187,13 @@ uid=int Instead of running as the invoking user, set the user ID to
gid=int Set group ID, see uid.
+userspace_libaio_reap=bool Normally, with the libaio engine in use, fio
+ will use the io_getevents system call to reap newly returned
+ events. With this flag turned on, the AIO ring will be read
+ directly from user-space to reap events. The reaping mode is
+ only enabled when polling for a minimum of 0 events (eg when
+ iodepth_batch_complete=0).
+
6.0 Interpreting the output
---------------------------
diff --git a/engines/libaio.c b/engines/libaio.c
index c837ab6..ea05c63 100644
--- a/engines/libaio.c
+++ b/engines/libaio.c
@@ -58,6 +58,47 @@ static struct io_u *fio_libaio_event(struct thread_data *td, int event)
return io_u;
}
+struct aio_ring {
+ unsigned id; /** kernel internal index number */
+ unsigned nr; /** number of io_events */
+ unsigned head;
+ unsigned tail;
+
+ unsigned magic;
+ unsigned compat_features;
+ unsigned incompat_features;
+ unsigned header_length; /** size of aio_ring */
+
+ struct io_event events[0];
+};
+
+#define AIO_RING_MAGIC 0xa10a10a1
+
+static int user_io_getevents(io_context_t aio_ctx, unsigned int max,
+ struct io_event *events)
+{
+ long i = 0;
+ unsigned head;
+ struct aio_ring *ring = (struct aio_ring*)aio_ctx;
+
+ while (i < max) {
+ head = ring->head;
+
+ if (head == ring->tail) {
+ /* There are no more completions */
+ break;
+ } else {
+ /* There is another completion to reap */
+ events[i] = ring->events[head];
+ read_barrier();
+ ring->head = (head + 1) % ring->nr;
+ i++;
+ }
+ }
+
+ return i;
+}
+
static int fio_libaio_getevents(struct thread_data *td, unsigned int min,
unsigned int max, struct timespec *t)
{
@@ -66,7 +107,16 @@ static int fio_libaio_getevents(struct thread_data *td, unsigned int min,
int r, events = 0;
do {
- r = io_getevents(ld->aio_ctx, actual_min, max, ld->aio_events + events, t);
+ if (td->o.userspace_libaio_reap == 1
+ && actual_min == 0
+ && ((struct aio_ring *)(ld->aio_ctx))->magic
+ == AIO_RING_MAGIC) {
+ r = user_io_getevents(ld->aio_ctx, max,
+ ld->aio_events + events);
+ } else {
+ r = io_getevents(ld->aio_ctx, actual_min,
+ max, ld->aio_events + events, t);
+ }
if (r >= 0)
events += r;
else if (r == -EAGAIN)
diff --git a/fio.h b/fio.h
index 9d2a61c..0c86f28 100644
--- a/fio.h
+++ b/fio.h
@@ -413,6 +413,8 @@ struct thread_options {
unsigned int gid;
unsigned int sync_file_range;
+
+ unsigned int userspace_libaio_reap;
};
#define FIO_VERROR_SIZE 128
diff --git a/options.c b/options.c
index 6a87e98..6f7c41e 100644
--- a/options.c
+++ b/options.c
@@ -2069,6 +2069,15 @@ static struct fio_option options[FIO_MAX_OPTS] = {
.off1 = td_var_offset(gid),
.help = "Run job with this group ID",
},
+#ifdef FIO_HAVE_LIBAIO
+ {
+ .name = "userspace_libaio_reap",
+ .type = FIO_OPT_BOOL,
+ .off1 = td_var_offset(userspace_libaio_reap),
+ .help = "When using the libaio engine with iodepth_batch_complete=0, enable userspace reaping",
+ .def = "0",
+ },
+#endif
{
.name = NULL,
},
--
1.7.3.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH v2] Adding userspace_libaio_reap option
2011-08-31 0:50 [PATCH v2] Adding userspace_libaio_reap option Dan Ehrenberg
@ 2011-08-31 0:57 ` Jens Axboe
2011-08-31 1:09 ` Daniel Ehrenberg
0 siblings, 1 reply; 4+ messages in thread
From: Jens Axboe @ 2011-08-31 0:57 UTC (permalink / raw)
To: Dan Ehrenberg; +Cc: fio
On 2011-08-30 18:50, Dan Ehrenberg wrote:
> @@ -66,7 +107,16 @@ static int fio_libaio_getevents(struct thread_data *td, unsigned int min,
> int r, events = 0;
>
> do {
> - r = io_getevents(ld->aio_ctx, actual_min, max, ld->aio_events + events, t);
> + if (td->o.userspace_libaio_reap == 1
> + && actual_min == 0
> + && ((struct aio_ring *)(ld->aio_ctx))->magic
> + == AIO_RING_MAGIC) {
> + r = user_io_getevents(ld->aio_ctx, max,
> + ld->aio_events + events);
> + } else {
> + r = io_getevents(ld->aio_ctx, actual_min,
> + max, ld->aio_events + events, t);
> + }
One question here - why depend on actual_min == 0? The check is
practically free, would not hurt to do the user_io_getevents() first and
punt to io_getevents() afterwards if need be.
--
Jens Axboe
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH v2] Adding userspace_libaio_reap option
2011-08-31 0:57 ` Jens Axboe
@ 2011-08-31 1:09 ` Daniel Ehrenberg
2011-08-31 2:47 ` Jens Axboe
0 siblings, 1 reply; 4+ messages in thread
From: Daniel Ehrenberg @ 2011-08-31 1:09 UTC (permalink / raw)
To: Jens Axboe; +Cc: fio
On Tue, Aug 30, 2011 at 5:57 PM, Jens Axboe <axboe@kernel.dk> wrote:
> On 2011-08-30 18:50, Dan Ehrenberg wrote:
>> @@ -66,7 +107,16 @@ static int fio_libaio_getevents(struct thread_data *td, unsigned int min,
>> int r, events = 0;
>>
>> do {
>> - r = io_getevents(ld->aio_ctx, actual_min, max, ld->aio_events + events, t);
>> + if (td->o.userspace_libaio_reap == 1
>> + && actual_min == 0
>> + && ((struct aio_ring *)(ld->aio_ctx))->magic
>> + == AIO_RING_MAGIC) {
>> + r = user_io_getevents(ld->aio_ctx, max,
>> + ld->aio_events + events);
>> + } else {
>> + r = io_getevents(ld->aio_ctx, actual_min,
>> + max, ld->aio_events + events, t);
>> + }
>
> One question here - why depend on actual_min == 0? The check is
> practically free, would not hurt to do the user_io_getevents() first and
> punt to io_getevents() afterwards if need be.
I guess I don't need to depend on that, since you'll never be calling
both user_io_getevents() and sys_io_getevents() at the same time
still. It was just paranoia, really--if I'm not grabbing the ring_lock
spinlock that exists in the kernel, then I'll be really safe if I
never run code that wants to grab it. But I guess the condition is not
really necessary since the whole thing is single-threaded.
>
> --
> Jens Axboe
>
> --
> To unsubscribe from this list: send the line "unsubscribe fio" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH v2] Adding userspace_libaio_reap option
2011-08-31 1:09 ` Daniel Ehrenberg
@ 2011-08-31 2:47 ` Jens Axboe
0 siblings, 0 replies; 4+ messages in thread
From: Jens Axboe @ 2011-08-31 2:47 UTC (permalink / raw)
To: Daniel Ehrenberg; +Cc: fio
On 2011-08-30 19:09, Daniel Ehrenberg wrote:
> On Tue, Aug 30, 2011 at 5:57 PM, Jens Axboe <axboe@kernel.dk> wrote:
>> On 2011-08-30 18:50, Dan Ehrenberg wrote:
>>> @@ -66,7 +107,16 @@ static int fio_libaio_getevents(struct thread_data *td, unsigned int min,
>>> int r, events = 0;
>>>
>>> do {
>>> - r = io_getevents(ld->aio_ctx, actual_min, max, ld->aio_events + events, t);
>>> + if (td->o.userspace_libaio_reap == 1
>>> + && actual_min == 0
>>> + && ((struct aio_ring *)(ld->aio_ctx))->magic
>>> + == AIO_RING_MAGIC) {
>>> + r = user_io_getevents(ld->aio_ctx, max,
>>> + ld->aio_events + events);
>>> + } else {
>>> + r = io_getevents(ld->aio_ctx, actual_min,
>>> + max, ld->aio_events + events, t);
>>> + }
>>
>> One question here - why depend on actual_min == 0? The check is
>> practically free, would not hurt to do the user_io_getevents() first and
>> punt to io_getevents() afterwards if need be.
>
> I guess I don't need to depend on that, since you'll never be calling
> both user_io_getevents() and sys_io_getevents() at the same time
> still. It was just paranoia, really--if I'm not grabbing the ring_lock
> spinlock that exists in the kernel, then I'll be really safe if I
> never run code that wants to grab it. But I guess the condition is not
> really necessary since the whole thing is single-threaded.
I think it should be safe.
FWIW, I committed this version, so just base further improves on fio
git. I also half-assed a patch to add the sub-option support, and
converted this one. So you should use
ioengine=libaio:userspace_reap
to set it now. I say half-assed, since it's only really supported for
ioengine at the moment, and it's not included in --cmdhelp output. But
it should work, hopefully.
Please give it a spin :-)
--
Jens Axboe
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2011-08-31 2:47 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-08-31 0:50 [PATCH v2] Adding userspace_libaio_reap option Dan Ehrenberg
2011-08-31 0:57 ` Jens Axboe
2011-08-31 1:09 ` Daniel Ehrenberg
2011-08-31 2:47 ` Jens Axboe
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.