From: Zygo Blaxell
To: Chris Murphy
Cc: Rich Rauenzahn, Btrfs BTRFS
Date: Mon, 4 May 2020 22:00:21 -0400
Subject: Re: Western Digital Red's SMR and btrfs?
Message-ID: <20200505020021.GR10769@hungrycats.org>
References: <20200504230857.GQ10769@hungrycats.org>
X-Mailing-List: linux-btrfs@vger.kernel.org

On Mon, May 04, 2020 at 05:24:11PM -0600, Chris Murphy wrote:
> On Mon, May 4, 2020 at 5:09 PM Zygo Blaxell wrote:
>
> > Some kinds of RAID rebuild don't provide sufficient idle time to complete
> > the CMR-to-SMR writeback, so the host gets throttled.
> > If the drive slows down too much, the kernel times out on IO, and reports
> > that the drive has failed. The RAID system running on top thinks the drive
> > is faulty (a false positive failure) and the fun begins (hope you don't
> > have two of these drives in the same array!).
>
> This came up on linux-raid@ list today also, and someone posted this
> smartmontools bug.
> https://www.smartmontools.org/ticket/1313
>
> It notes in part this error, which is not a time out.

Uhhh...wow. If that's not an individual broken disk, but the programmed
behavior of the firmware, that would mean the drive model is not usable
at all.

> [20809.396284] blk_update_request: I/O error, dev sdd, sector
> 3484334688 op 0x1:(WRITE) flags 0x700 phys_seg 2 prio class 0
>
> An explicit write error is a defective drive. But even slow downs
> resulting in link resets is defective. The marketing of DM-SMR says
> it's suitable without having to apply local customizations accounting
> for the drive being SMR.
>
>
> > Desktop CMR drives (which are not good in RAID arrays but people use
> > them anyway) have firmware hardcoded to retry reads for about 120
> > seconds before giving up. To use desktop CMR drives in RAID arrays,
> > you must increase the Linux kernel IO timeout to 180 seconds or risk
> > false positive rejections (i.e. multi-disk failures) from RAID arrays.
>
> I think we're way past the time when all desktop oriented Linux
> installations should have overridden the kernel default, using 180
> second timeouts instead. Even in the single disk case. The system is
> better off failing safe to slow response, rather than link resets and
> subsequent face plant. But these days most every laptop and desktop's
> sysroot is on an SSD of some kind.
>
>
> > Now here is the problem: DM-SMR drives have write latencies of up to 300
> > seconds in *non-error* cases. They are up to 10,000 times slower than
> > CMR in the worst case.
> > Assume that there's an additional 120 seconds for error recovery on top
> > of the non-error write latency, and add the extra 50% for safety, and the
> > SMR drive should be configured with a 630 second timeout (10.5 minutes)
> > in the Linux kernel to avoid false positive failures.
>
> Incredible.
>
>
> --
> Chris Murphy
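
The timeout arithmetic quoted above can be sketched in a few lines of
Python. This is only an illustration: the function names are made up
here, and it assumes the 180 second figure for desktop CMR drives is the
120 second firmware retry window plus the same 50% safety margin. The
sysfs attribute /sys/block/<dev>/device/timeout is where the Linux SCSI
layer exposes the per-device command timeout in seconds; writing it
requires root and does not persist across reboots.

```python
from pathlib import Path


def recommended_timeout(worst_case_latency_s, error_recovery_s=120, safety_margin=0.5):
    """Worst-case non-error latency plus error recovery, padded by a safety margin.

    The defaults (120 s recovery, 50% margin) are the figures from the thread;
    the function itself is a hypothetical helper, not an existing tool.
    """
    return int(round((worst_case_latency_s + error_recovery_s) * (1 + safety_margin)))


def timeout_path(dev, sysfs_root="/sys"):
    """Path of the SCSI command timeout attribute for a block device such as 'sdd'."""
    return Path(sysfs_root) / "block" / dev / "device" / "timeout"


def set_timeout(dev, seconds, sysfs_root="/sys"):
    """Write the timeout in seconds. Needs root when pointed at the real /sys."""
    timeout_path(dev, sysfs_root).write_text(f"{seconds}\n")


if __name__ == "__main__":
    # Desktop CMR: no extra non-error latency, 120 s of retries, 50% margin.
    print("desktop CMR:", recommended_timeout(0), "seconds")
    # DM-SMR: up to 300 s of non-error write latency on top of that.
    print("DM-SMR:", recommended_timeout(300), "seconds")
    # To apply it on a real system (as root), something like:
    #   set_timeout("sdd", recommended_timeout(300))
```

The sysfs_root parameter exists only so the write can be tried against a
scratch directory before touching a live system.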