From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-fsdevel-owner@vger.kernel.org>
Received: from out3-smtp.messagingengine.com ([66.111.4.27]:34191 "EHLO
        out3-smtp.messagingengine.com" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S1753072AbeDLTzi (ORCPT
        <rfc822;linux-fsdevel@vger.kernel.org>);
        Thu, 12 Apr 2018 15:55:38 -0400
Date: Thu, 12 Apr 2018 12:55:36 -0700
From: Andres Freund <andres@anarazel.de>
To: "Theodore Y. Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger@dilger.ca>,
        Ext4 Developers List <linux-ext4@vger.kernel.org>,
        Linux FS Devel <linux-fsdevel@vger.kernel.org>,
        Jeff Layton <jlayton@redhat.com>,
        "Joshua D. Drake" <jd@commandprompt.com>
Subject: Re: fsync() errors is unsafe and risks data loss
Message-ID: <20180412195536.4nunjt5li2xb4rpw@alap3.anarazel.de>
References: <20180410220726.vunhvwuzxi5bm6e5@alap3.anarazel.de>
 <190CF56C-C03D-4504-8B35-5DB479801513@dilger.ca>
 <20180412021752.2wykkutkmzh4ikbf@alap3.anarazel.de>
 <20180412053445.GP2801@thunk.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20180412053445.GP2801@thunk.org>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>

Hi,

On 2018-04-12 01:34:45 -0400, Theodore Y. Ts'o wrote:
> The solution we use at Google is that we watch for I/O errors using a
> completely different process that is responsible for monitoring
> machine health.  It used to scrape dmesg, but we now arrange to have
> I/O errors get sent via a netlink channel to the machine health
> monitoring daemon.

Any pointers to the underlying netlink mechanism? If we can force
postgres to kill itself when such an error is detected (via a dedicated
monitoring process), I'd personally be happy enough.  It'd be nicer if
we could associate that knowledge with particular filesystems etc
(which'd possibly be hard through dm etc?), but this'd be much better than
nothing.


> The reality is that recovering from disk errors is tricky business,
> and I very much doubt most userspace applications, including distro
> package managers, are going to want to engineer for trying to detect
> and recover from disk errors.  If that were true, then Red Hat and/or
> SuSE have kernel engineers, and they would have implemented everything
> on your wish list.  They haven't, and that should tell you
> something.

The problem really isn't about *recovering* from disk errors. *Knowing*
about them is the crucial part. We do not want to tell clients that an
operation succeeded when it actually didn't. There
could be improvements beyond that, but as long as it's guaranteed that
"we" get the error (rather than just some kernel log we don't have
access to, which looks different due to config etc), it's ok. We can
throw our hands up in the air and give up.
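
As a minimal sketch of that contract (Python for brevity, not postgres
code; durable_write and DurabilityError are names invented purely for
illustration): success is reported to the caller only after fsync() has
returned without error, and any error is surfaced so the caller can
give up instead of acknowledging the operation.

```python
import os


class DurabilityError(Exception):
    """Raised when the kernel reports that a write could not be made durable."""


def durable_write(path: str, payload: bytes) -> None:
    """Append payload to path; return only once fsync() has succeeded.

    Any OSError from write() or fsync() becomes a DurabilityError. The
    caller is expected to stop and give up rather than retry, since this
    sketch does not assume a later fsync() would report the same failure.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        try:
            view = memoryview(payload)
            while view:
                # os.write() may write fewer bytes than requested.
                written = os.write(fd, view)
                view = view[written:]
            os.fsync(fd)
        except OSError as exc:
            raise DurabilityError(
                f"cannot confirm {path!r} is on disk: {exc}"
            ) from exc
    finally:
        os.close(fd)
```

A caller would acknowledge a commit to its client only after
durable_write() returns, and would treat DurabilityError as fatal. This
only helps if the kernel actually reports the error to the process that
calls fsync(), which is the guarantee being asked for here.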


> The other reality is that once a disk starts developing errors, in
> reality you will probably need to take the disk off-line, scrub it to
> find any other media errors, and there's a good chance you'll need to
> rewrite bad sectors (including some which are on top of file system
> metadata, so you probably will have to run fsck or reformat the whole
> file system).  I certainly don't think it's realistic to assume adding
> lots of sophistication to each and every userspace program.

> If you have tens or hundreds of thousands of disk drives, then you
> will need to do something automated, but I claim that you really
> don't want to smush all of that detailed exception handling and HDD
> repair technology into each database or cluster file system component.
> It really needs to be done in a separate health-monitor and
> machine-level management system.

Yea, agreed on all that. I don't think anybody actually involved in
postgres wants to do anything like that. Seems far outside of postgres'
remit.

Greetings,

Andres Freund