All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Theodore Ts'o" <tytso@mit.edu>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Benjamin LaHaise <bcrl@kvack.org>,
	Dave Chinner <david@fromorbit.com>,
	linux-aio@kvack.org,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux API <linux-api@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH 07/13] aio: enabled thread based async fsync
Date: Wed, 20 Jan 2016 00:02:25 -0500	[thread overview]
Message-ID: <20160120050225.GA28031@thunk.org> (raw)
In-Reply-To: <CA+55aFzRo3yztEBBvJ4CMCvVHAo6qEDhTHTc_LGyqmxbcFyNYw@mail.gmail.com>

On Tue, Jan 19, 2016 at 07:59:35PM -0800, Linus Torvalds wrote:
> 
> After thinking it over some more, I guess I'm ok with your approach.
> The table-driven patch makes me a bit happier, and I guess not very
> many people end up ever wanting to do async system calls anyway.
> 
> Are there other users outside of Solace? It would be good to get comments..

For async I/O?  We're using it inside Google, for networking and for
storage I/O's.  We don't need async fsync/fdatasync, but we do need
very fast, low overhead I/O's.  To that end, we have some patches to
batch block layer completion handling, which Kent tried upstreaming a
few years back but which everyone thought was too ugly to live.

(It *was* ugly, but we had access to some very fast storage devices
where it really mattered.  With upcoming NVMe devices, that sort of
hardware should be available to more folks, so it's something that
I've been meaning to revisit from an upstreaming perspective,
especially if I can get my hands on some publically available hardware
for benchmarking purposes to demonstrate why it's useful, even if it
is ugly.)

The other thing which we have which is a bit more experimental is that
we've plumbed through the aio priority bits to the block layer, as
well as aio_cancel.  The idea for the latter is if you are are
interested in low latency access to a clustered file system, where
sometimes a read request can get stuck behind other I/O requests if a
server has a long queue of requests to service.  So the client for
which low latency is very important fires off the request to more than
one server, and as soon as it gets an answer it sends a "never mind"
message to the other server(s).

The code to do aio_cancellation in the block layer is fairly well
tested, and was in Kent's git trees, but never got formally pushed
upstream.  The code to push the cancellation request all the way to
the HDD (for those hard disks / storage devices that support I/O
cancellation) is even more experimental, and needs a lot of cleanup
before it could be sent for review (it was done by someone who isn't
used to upstream coding standards).

The reason why we haven't tried to pushed more of these changes
upsream has been lack of resources, and the fact that the AIO code
*is* ugly, which means extensions tend to make the code at the very
least, more complex.  Especially since some of the folks working on
it, such as Kent, were really worried about performance at all costs,
and Kerningham's "it's twice as hard to debug code as to write it"
comment really applies here.  And since very few people outside of
Google seem to use AIO, and even fewer seem eager to review or work on
AIO, and our team is quite small for the work we need to do, it just
hasn't risen to the top of the priority list.

Still, it's fair to say that if you are using Google Hangouts, or
Google Mail, or Google Docs, AIO is most definitely getting used to
process your queries.

As far as comments, aside from the "we really care about performance",
and "the code is scary complex and barely on the edge of being
maintainable", the other comment I'd make is libaio is pretty awful,
and so as a result a number (most?) of our AIO users have elected to
use the raw system call interfaces and are *not* using the libaio
abstractions --- which, as near as I can tell, don't really buy you
much anyway.  (Do we really need to keep code that provides backwards
compatibility with kernels over 10+ years old at this point?)

Cheers,

					- Ted

WARNING: multiple messages have this Message-ID (diff)
From: Theodore Ts'o <tytso@mit.edu>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Benjamin LaHaise <bcrl@kvack.org>,
	Dave Chinner <david@fromorbit.com>,
	linux-aio@kvack.org,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux API <linux-api@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH 07/13] aio: enabled thread based async fsync
Date: Wed, 20 Jan 2016 00:02:25 -0500	[thread overview]
Message-ID: <20160120050225.GA28031@thunk.org> (raw)
In-Reply-To: <CA+55aFzRo3yztEBBvJ4CMCvVHAo6qEDhTHTc_LGyqmxbcFyNYw@mail.gmail.com>

On Tue, Jan 19, 2016 at 07:59:35PM -0800, Linus Torvalds wrote:
> 
> After thinking it over some more, I guess I'm ok with your approach.
> The table-driven patch makes me a bit happier, and I guess not very
> many people end up ever wanting to do async system calls anyway.
> 
> Are there other users outside of Solace? It would be good to get comments..

For async I/O?  We're using it inside Google, for networking and for
storage I/O's.  We don't need async fsync/fdatasync, but we do need
very fast, low overhead I/O's.  To that end, we have some patches to
batch block layer completion handling, which Kent tried upstreaming a
few years back but which everyone thought was too ugly to live.

(It *was* ugly, but we had access to some very fast storage devices
where it really mattered.  With upcoming NVMe devices, that sort of
hardware should be available to more folks, so it's something that
I've been meaning to revisit from an upstreaming perspective,
especially if I can get my hands on some publically available hardware
for benchmarking purposes to demonstrate why it's useful, even if it
is ugly.)

The other thing which we have which is a bit more experimental is that
we've plumbed through the aio priority bits to the block layer, as
well as aio_cancel.  The idea for the latter is if you are are
interested in low latency access to a clustered file system, where
sometimes a read request can get stuck behind other I/O requests if a
server has a long queue of requests to service.  So the client for
which low latency is very important fires off the request to more than
one server, and as soon as it gets an answer it sends a "never mind"
message to the other server(s).

The code to do aio_cancellation in the block layer is fairly well
tested, and was in Kent's git trees, but never got formally pushed
upstream.  The code to push the cancellation request all the way to
the HDD (for those hard disks / storage devices that support I/O
cancellation) is even more experimental, and needs a lot of cleanup
before it could be sent for review (it was done by someone who isn't
used to upstream coding standards).

The reason why we haven't tried to pushed more of these changes
upsream has been lack of resources, and the fact that the AIO code
*is* ugly, which means extensions tend to make the code at the very
least, more complex.  Especially since some of the folks working on
it, such as Kent, were really worried about performance at all costs,
and Kerningham's "it's twice as hard to debug code as to write it"
comment really applies here.  And since very few people outside of
Google seem to use AIO, and even fewer seem eager to review or work on
AIO, and our team is quite small for the work we need to do, it just
hasn't risen to the top of the priority list.

Still, it's fair to say that if you are using Google Hangouts, or
Google Mail, or Google Docs, AIO is most definitely getting used to
process your queries.

As far as comments, aside from the "we really care about performance",
and "the code is scary complex and barely on the edge of being
maintainable", the other comment I'd make is libaio is pretty awful,
and so as a result a number (most?) of our AIO users have elected to
use the raw system call interfaces and are *not* using the libaio
abstractions --- which, as near as I can tell, don't really buy you
much anyway.  (Do we really need to keep code that provides backwards
compatibility with kernels over 10+ years old at this point?)

Cheers,

					- Ted

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Theodore Ts'o <tytso-3s7WtUTddSA@public.gmane.org>
To: Linus Torvalds
	<torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Cc: Benjamin LaHaise <bcrl-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org>,
	Dave Chinner <david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org>,
	linux-aio-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
	linux-fsdevel
	<linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	Linux Kernel Mailing List
	<linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	Linux API <linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	linux-mm <linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org>,
	Alexander Viro
	<viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org>,
	Andrew Morton
	<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Subject: Re: [PATCH 07/13] aio: enabled thread based async fsync
Date: Wed, 20 Jan 2016 00:02:25 -0500	[thread overview]
Message-ID: <20160120050225.GA28031@thunk.org> (raw)
In-Reply-To: <CA+55aFzRo3yztEBBvJ4CMCvVHAo6qEDhTHTc_LGyqmxbcFyNYw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Tue, Jan 19, 2016 at 07:59:35PM -0800, Linus Torvalds wrote:
> 
> After thinking it over some more, I guess I'm ok with your approach.
> The table-driven patch makes me a bit happier, and I guess not very
> many people end up ever wanting to do async system calls anyway.
> 
> Are there other users outside of Solace? It would be good to get comments..

For async I/O?  We're using it inside Google, for networking and for
storage I/O's.  We don't need async fsync/fdatasync, but we do need
very fast, low overhead I/O's.  To that end, we have some patches to
batch block layer completion handling, which Kent tried upstreaming a
few years back but which everyone thought was too ugly to live.

(It *was* ugly, but we had access to some very fast storage devices
where it really mattered.  With upcoming NVMe devices, that sort of
hardware should be available to more folks, so it's something that
I've been meaning to revisit from an upstreaming perspective,
especially if I can get my hands on some publically available hardware
for benchmarking purposes to demonstrate why it's useful, even if it
is ugly.)

The other thing which we have which is a bit more experimental is that
we've plumbed through the aio priority bits to the block layer, as
well as aio_cancel.  The idea for the latter is if you are are
interested in low latency access to a clustered file system, where
sometimes a read request can get stuck behind other I/O requests if a
server has a long queue of requests to service.  So the client for
which low latency is very important fires off the request to more than
one server, and as soon as it gets an answer it sends a "never mind"
message to the other server(s).

The code to do aio_cancellation in the block layer is fairly well
tested, and was in Kent's git trees, but never got formally pushed
upstream.  The code to push the cancellation request all the way to
the HDD (for those hard disks / storage devices that support I/O
cancellation) is even more experimental, and needs a lot of cleanup
before it could be sent for review (it was done by someone who isn't
used to upstream coding standards).

The reason why we haven't tried to pushed more of these changes
upsream has been lack of resources, and the fact that the AIO code
*is* ugly, which means extensions tend to make the code at the very
least, more complex.  Especially since some of the folks working on
it, such as Kent, were really worried about performance at all costs,
and Kerningham's "it's twice as hard to debug code as to write it"
comment really applies here.  And since very few people outside of
Google seem to use AIO, and even fewer seem eager to review or work on
AIO, and our team is quite small for the work we need to do, it just
hasn't risen to the top of the priority list.

Still, it's fair to say that if you are using Google Hangouts, or
Google Mail, or Google Docs, AIO is most definitely getting used to
process your queries.

As far as comments, aside from the "we really care about performance",
and "the code is scary complex and barely on the edge of being
maintainable", the other comment I'd make is libaio is pretty awful,
and so as a result a number (most?) of our AIO users have elected to
use the raw system call interfaces and are *not* using the libaio
abstractions --- which, as near as I can tell, don't really buy you
much anyway.  (Do we really need to keep code that provides backwards
compatibility with kernels over 10+ years old at this point?)

Cheers,

					- Ted

  reply	other threads:[~2016-01-20  5:02 UTC|newest]

Thread overview: 133+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-01-11 22:06 [PATCH 00/13] aio: thread (work queue) based aio and new aio functionality Benjamin LaHaise
2016-01-11 22:06 ` Benjamin LaHaise
2016-01-11 22:06 ` [PATCH 01/13] signals: distinguish signals sent due to i/o via io_send_sig() Benjamin LaHaise
2016-01-11 22:06   ` Benjamin LaHaise
2016-01-11 22:06   ` Benjamin LaHaise
2016-01-11 22:06 ` [PATCH 02/13] aio: add aio_get_mm() helper Benjamin LaHaise
2016-01-11 22:06   ` Benjamin LaHaise
2016-01-11 22:06   ` Benjamin LaHaise
2016-01-11 22:06 ` [PATCH 03/13] aio: for async operations, make the iter argument persistent Benjamin LaHaise
2016-01-11 22:06   ` Benjamin LaHaise
2016-01-11 22:06   ` Benjamin LaHaise
2016-01-11 22:07 ` [PATCH 04/13] signals: add and use aio_get_task() to direct signals sent via io_send_sig() Benjamin LaHaise
2016-01-11 22:07   ` Benjamin LaHaise
2016-01-11 22:07   ` Benjamin LaHaise
2016-01-11 22:07 ` [PATCH 05/13] fs: make do_loop_readv_writev() non-static Benjamin LaHaise
2016-01-11 22:07   ` Benjamin LaHaise
2016-01-11 22:07   ` Benjamin LaHaise
2016-01-11 22:07 ` [PATCH 06/13] aio: add queue_work() based threaded aio support Benjamin LaHaise
2016-01-11 22:07   ` Benjamin LaHaise
2016-01-11 22:07   ` Benjamin LaHaise
2016-01-11 22:07 ` [PATCH 07/13] aio: enabled thread based async fsync Benjamin LaHaise
2016-01-11 22:07   ` Benjamin LaHaise
2016-01-11 22:07   ` Benjamin LaHaise
2016-01-12  1:11   ` Dave Chinner
2016-01-12  1:11     ` Dave Chinner
2016-01-12  1:20     ` Linus Torvalds
2016-01-12  1:20       ` Linus Torvalds
2016-01-12  2:25       ` Dave Chinner
2016-01-12  2:25         ` Dave Chinner
2016-01-12  2:25         ` Dave Chinner
2016-01-12  2:38         ` Linus Torvalds
2016-01-12  2:38           ` Linus Torvalds
2016-01-12  3:37           ` Dave Chinner
2016-01-12  3:37             ` Dave Chinner
2016-01-12  4:03             ` Linus Torvalds
2016-01-12  4:03               ` Linus Torvalds
2016-01-12  4:48               ` Linus Torvalds
2016-01-12  4:48                 ` Linus Torvalds
2016-01-12 22:50                 ` Benjamin LaHaise
2016-01-12 22:50                   ` Benjamin LaHaise
2016-01-12 22:50                   ` Benjamin LaHaise
2016-01-15 20:21                 ` Benjamin LaHaise
2016-01-15 20:21                   ` Benjamin LaHaise
2016-01-15 20:21                   ` Benjamin LaHaise
2016-01-20  3:59                   ` Linus Torvalds
2016-01-20  3:59                     ` Linus Torvalds
2016-01-20  3:59                     ` Linus Torvalds
2016-01-20  5:02                     ` Theodore Ts'o [this message]
2016-01-20  5:02                       ` Theodore Ts'o
2016-01-20  5:02                       ` Theodore Ts'o
2016-01-20 19:59                     ` Dave Chinner
2016-01-20 19:59                       ` Dave Chinner
2016-01-20 19:59                       ` Dave Chinner
2016-01-20 20:29                       ` Linus Torvalds
2016-01-20 20:29                         ` Linus Torvalds
2016-01-20 20:44                         ` Benjamin LaHaise
2016-01-20 20:44                           ` Benjamin LaHaise
2016-01-20 20:44                           ` Benjamin LaHaise
2016-01-20 21:45                           ` Dave Chinner
2016-01-20 21:45                             ` Dave Chinner
2016-01-20 21:56                             ` Benjamin LaHaise
2016-01-20 21:56                               ` Benjamin LaHaise
2016-01-20 21:56                               ` Benjamin LaHaise
2016-01-23  4:24                               ` Dave Chinner
2016-01-23  4:24                                 ` Dave Chinner
2016-01-23  4:50                                 ` Benjamin LaHaise
2016-01-23  4:50                                   ` Benjamin LaHaise
2016-01-23  4:50                                   ` Benjamin LaHaise
2016-01-23 22:22                                   ` Dave Chinner
2016-01-23 22:22                                     ` Dave Chinner
2016-01-23 22:22                                     ` Dave Chinner
2016-01-20 23:07                             ` Linus Torvalds
2016-01-23  4:39                               ` Dave Chinner
2016-01-23  4:39                                 ` Dave Chinner
2016-01-23  4:39                                 ` Dave Chinner
2016-03-14 17:17                                 ` aio openat " Benjamin LaHaise
2016-03-14 17:17                                   ` Benjamin LaHaise
2016-03-20  1:20                                   ` Linus Torvalds
2016-03-20  1:20                                     ` Linus Torvalds
2016-03-20  1:26                                     ` Al Viro
2016-03-20  1:26                                       ` Al Viro
2016-03-20  1:26                                       ` Al Viro
2016-03-20  1:45                                       ` Linus Torvalds
2016-03-20  1:45                                         ` Linus Torvalds
2016-03-20  1:45                                         ` Linus Torvalds
2016-03-20  1:55                                         ` Al Viro
2016-03-20  1:55                                           ` Al Viro
2016-03-20  2:03                                           ` Linus Torvalds
2016-03-20  2:03                                             ` Linus Torvalds
2016-03-20  2:03                                             ` Linus Torvalds
2016-01-20 21:57                         ` Dave Chinner
2016-01-20 21:57                           ` Dave Chinner
2016-01-20 21:57                           ` Dave Chinner
2016-01-22 15:41                     ` Andres Freund
2016-01-22 15:41                       ` Andres Freund
2016-01-12 22:59               ` Andy Lutomirski
2016-01-12 22:59                 ` Andy Lutomirski
2016-01-12 22:59                 ` Andy Lutomirski
2016-01-14  9:19       ` Paolo Bonzini
2016-01-14  9:19         ` Paolo Bonzini
2016-01-14  9:19         ` Paolo Bonzini
2016-01-12  1:30     ` Benjamin LaHaise
2016-01-12  1:30       ` Benjamin LaHaise
2016-01-12  1:30       ` Benjamin LaHaise
2016-01-22 15:31     ` Andres Freund
2016-01-22 15:31       ` Andres Freund
2016-01-22 15:31       ` Andres Freund
2016-01-11 22:07 ` [PATCH 08/13] aio: add support for aio poll via aio thread helper Benjamin LaHaise
2016-01-11 22:07   ` Benjamin LaHaise
2016-01-11 22:07 ` [PATCH 09/13] aio: add support for async openat() Benjamin LaHaise
2016-01-11 22:07   ` Benjamin LaHaise
2016-01-11 22:07   ` Benjamin LaHaise
2016-01-12  0:22   ` Linus Torvalds
2016-01-12  0:22     ` Linus Torvalds
2016-01-12  1:17     ` Benjamin LaHaise
2016-01-12  1:17       ` Benjamin LaHaise
2016-01-12  1:17       ` Benjamin LaHaise
2016-01-12  1:45     ` Chris Mason
2016-01-12  1:45       ` Chris Mason
2016-01-12  1:45       ` Chris Mason
2016-01-12  9:53     ` Ingo Molnar
2016-01-12  9:53       ` Ingo Molnar
2016-01-12  9:53       ` Ingo Molnar
2016-01-11 22:07 ` [PATCH 10/13] aio: add async unlinkat functionality Benjamin LaHaise
2016-01-11 22:07   ` Benjamin LaHaise
2016-01-11 22:07   ` Benjamin LaHaise
2016-01-11 22:07 ` [PATCH 11/13] mm: enable __do_page_cache_readahead() to include present pages Benjamin LaHaise
2016-01-11 22:07   ` Benjamin LaHaise
2016-01-11 22:07 ` [PATCH 12/13] aio: add support for aio readahead Benjamin LaHaise
2016-01-11 22:07   ` Benjamin LaHaise
2016-01-11 22:08 ` [PATCH 13/13] aio: add support for aio renameat operation Benjamin LaHaise
2016-01-11 22:08   ` Benjamin LaHaise
2016-01-11 22:08   ` Benjamin LaHaise

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160120050225.GA28031@thunk.org \
    --to=tytso@mit.edu \
    --cc=akpm@linux-foundation.org \
    --cc=bcrl@kvack.org \
    --cc=david@fromorbit.com \
    --cc=linux-aio@kvack.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.