From mboxrd@z Thu Jan 1 00:00:00 1970
From: Roger Heflin
Subject: Re: Filesystem corruption on RAID1
Date: Thu, 17 Aug 2017 07:41:00 -0500
Message-ID:
References: <20170713214856.4a5c8778@natsu> <592f19bf608e9a959f9445f7f25c5dad@assyoma.it> <770b09d3-cff6-b6b2-0a51-5d11e8bac7e9@thelounge.net> <9eea45ddc0f80f4f4e238b5c2527a1fa@assyoma.it>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Return-path:
In-Reply-To:
Sender: linux-raid-owner@vger.kernel.org
To: Gionatan Danti
Cc: Reindl Harald , Roman Mamedov , Linux RAID
List-Id: linux-raid.ids

On Thu, Aug 17, 2017 at 3:23 AM, Gionatan Danti wrote:
> On 14/07/2017 12:46, Gionatan Danti wrote:
> Hi, so a premature/preventive
> drive detachment is not a silver bullet,
> but is this the right solution)?
> - how to deal with this problem (other than being 100% sure power is never
> lost by any disks)?
>
> Thank you all,
> regards.
>

Here is a guess based on what you determined was the cause.

The mid-layer does not know the writes were lost. The writes were in the
drive's write cache (already submitted to the drive and confirmed back to
the mid-layer as done, even though they were not yet on the platter), and
when the drive lost power and "rebooted", those writes disappeared. The
write(s) that the mid-layer had in progress, and that never got a "done"
from the drive, failed, were retried, and succeeded after the reset
completed.

In high-reliability RAID the solution is to turn off that write cache,
*but* if you do direct I/O writes (most databases) with the drive's write
cache off and no battery-backed cache between the two, then the drive
becomes horribly slow, since it must actually write the data to the
platter before telling the next level up that the data is safe.
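
To make the guess above concrete, here is a toy Python model of the sequence. It is only a sketch: the Drive class, its methods, and simulate() are names made up for illustration and do not correspond to any kernel or md interface. It assumes a drive that acknowledges writes as soon as they reach its volatile cache, and a mid-layer that retries only the writes that were never acknowledged.

```python
class Drive:
    """Hypothetical toy model of a disk with an optional volatile write cache."""

    def __init__(self, write_cache=True):
        self.write_cache = write_cache
        self.platter = {}  # persistent storage, survives power loss
        self.cache = {}    # volatile write cache, lost on power loss

    def write(self, lba, data):
        """Store data and return True, i.e. report the write as done."""
        if self.write_cache:
            # Acknowledged before the data has reached the platter.
            self.cache[lba] = data
        else:
            # Cache off: the data is on the platter before the ack.
            self.platter[lba] = data
        return True

    def flush(self):
        """Move everything in the cache to the platter."""
        self.platter.update(self.cache)
        self.cache.clear()

    def power_loss(self):
        """The drive loses power and "reboots": cached data is gone."""
        self.cache.clear()

    def read(self, lba):
        return self.cache.get(lba, self.platter.get(lba))


def simulate(write_cache):
    """Return (acknowledged LBAs that were lost, content of the retried LBA)."""
    drive = Drive(write_cache)
    acked = {}
    for lba in range(4):
        data = f"block-{lba}"
        if drive.write(lba, data):
            acked[lba] = data  # the mid-layer considers these writes done

    # A write to LBA 4 is in flight (not yet acknowledged) when power drops.
    drive.power_loss()

    # After the reset, the mid-layer retries only the unacknowledged write.
    drive.write(4, "block-4")
    drive.flush()

    lost = sorted(lba for lba, data in acked.items() if drive.read(lba) != data)
    return lost, drive.read(4)


if __name__ == "__main__":
    print("write cache on: ", simulate(True))
    print("write cache off:", simulate(False))
```

With the cache on, the four acknowledged blocks are missing while the retried one is present, and nothing above the drive ever saw an error for the missing ones. With the cache off, nothing is lost. The model says nothing about the speed cost of running with the cache off.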