linux-fsdevel.vger.kernel.org archive mirror
From: Chuck Lever <chuck.lever@oracle.com>
To: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jeff Layton <jlayton@poochiereds.net>,
	Kevin Wolf <kwolf@redhat.com>, NeilBrown <neilb@suse.com>,
	Rik van Riel <riel@redhat.com>,
	Christoph Hellwig <hch@infradead.org>,
	linux-mm <linux-mm@kvack.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	lsf-pc@lists.linux-foundation.org,
	Ric Wheeler <rwheeler@redhat.com>
Subject: Re: [Lsf-pc] [LSF/MM TOPIC] I/O error handling and fsync()
Date: Mon, 23 Jan 2017 12:53:23 -0500	[thread overview]
Message-ID: <FB649A96-2DD8-4B45-8A72-1454630E096B@oracle.com> (raw)
In-Reply-To: <20170123172500.itzbe7qgzcs6kgh2@thunk.org>


> On Jan 23, 2017, at 12:25 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> 
> On Mon, Jan 23, 2017 at 07:10:00AM -0500, Jeff Layton wrote:
>>>> Well, except for QEMU/KVM, Kevin has already confirmed that using
>>>> Direct I/O is a completely viable solution.  (And I'll add it solves a
>>>> bunch of other problems, including page cache efficiency....)
>> 
>> Sure, O_DIRECT does make this simpler (though it's not always the most
>> efficient way to do I/O). I'm more interested in whether we can improve
>> the error handling with buffered I/O.
> 
> I just want to make sure we're designing a solution that will actually
> be _used_, because it is a good fit for at least one real-world use
> case.
> 
> Is QEMU/KVM using volumes that are stored over NFS really used in the
> real world?

Yes. NFS has worked well for many years in pre-cloud virtualization
environments; in other words, environments that have supported guest
migration for much longer than OpenStack has been around.


> Especially one where you want a huge amount of
> reliability and recovery after some kind network failure?

These are largely data center-grade machine room area networks, not
WANs. Network failures are not as frequent as they used to be.

Most server systems ship with more than one Ethernet device anyway.
Adding a second LAN path between each client and its storage targets
is pretty straightforward.


> If we are
> talking about customers who are going to suspend the VM and restart it
> on another server, that presumes a fairly large installation size and
> enough servers that would they *really* want to use a single point of
> failure such as an NFS filer?

You certainly can make NFS more reliable by using a filer that supports
IP-based cluster failover, and has a reasonable amount of redundant
durable storage.

I don't see why we should presume anything about installation size.


> Even if it was a proprietary
> purpose-built NFS filer?  Why wouldn't they be using RADOS and Ceph
> instead, for example?

NFS is a fine, inexpensive solution for small deployments and
experimental setups.

It's much simpler for a single user with no administrative rights to
manage NFS-based files than to deal with creating LUNs or backing
objects, for instance.

Considering the various weirdnesses and inefficiencies involved in
turning an object store into something that has proper POSIX file
semantics, IMO NFS is a known quantity that is straightforward
and a natural fit for some cloud deployments. If it weren't, then
there would be no reason to provide object-to-NFS gateway services.


Wrt O_DIRECT, an NFS client can open the NFS file that backs a virtual
block device with O_DIRECT, and you get the same semantics as reading
or writing to a physical block device. There is no need for the server
to use O_DIRECT as well: the client uses the NFS protocol to control
when the server commits data to durable storage (like, immediately).


--
Chuck Lever




Thread overview: 42+ messages
2017-01-10 16:02 [LSF/MM TOPIC] I/O error handling and fsync() Kevin Wolf
2017-01-11  0:41 ` NeilBrown
2017-01-13 11:09   ` Kevin Wolf
2017-01-13 14:21     ` Theodore Ts'o
2017-01-13 16:00       ` Kevin Wolf
2017-01-13 22:28         ` NeilBrown
2017-01-14  6:18           ` Darrick J. Wong
2017-01-16 12:14           ` [Lsf-pc] " Jeff Layton
2017-01-22 22:44             ` NeilBrown
2017-01-22 23:31               ` Jeff Layton
2017-01-23  0:21                 ` Theodore Ts'o
2017-01-23 10:09                   ` Kevin Wolf
2017-01-23 12:10                     ` Jeff Layton
2017-01-23 17:25                       ` Theodore Ts'o
2017-01-23 17:53                         ` Chuck Lever [this message]
2017-01-23 22:40                         ` Jeff Layton
2017-01-23 22:35                     ` Jeff Layton
2017-01-23 23:09                       ` Trond Myklebust
2017-01-24  0:16                         ` NeilBrown
2017-01-24  0:46                           ` Jeff Layton
2017-01-24 21:58                             ` NeilBrown
2017-01-25 13:00                               ` Jeff Layton
2017-01-30  5:30                                 ` NeilBrown
2017-01-24  3:34                           ` Trond Myklebust
2017-01-25 18:35                             ` Theodore Ts'o
2017-01-26  0:36                               ` NeilBrown
2017-01-26  9:25                                 ` Jan Kara
2017-01-26 22:19                                   ` NeilBrown
2017-01-27  3:23                                     ` Theodore Ts'o
2017-01-27  6:03                                       ` NeilBrown
2017-01-30 16:04                                       ` Jan Kara
2017-01-13 18:40     ` Al Viro
2017-01-13 19:06       ` Kevin Wolf
2017-01-11  5:03 ` Theodore Ts'o
2017-01-11  9:47   ` [Lsf-pc] " Jan Kara
2017-01-11 15:45     ` Theodore Ts'o
2017-01-11 10:55   ` Chris Vest
2017-01-11 11:40   ` Kevin Wolf
2017-01-13  4:51     ` NeilBrown
2017-01-13 11:51       ` Kevin Wolf
2017-01-13 21:55         ` NeilBrown
2017-01-11 12:14   ` Chris Vest
