To: linux-btrfs@vger.kernel.org
From: Phil Karn <karn@ka9q.net>
Subject: Extremely slow device removals
Message-ID: <8b647a7f-1223-fa9f-57c0-9a81a9bbeb27@ka9q.net>
Date: Tue, 28 Apr 2020 00:22:20 -0700

I've been running btrfs in RAID1 mode on four 6TB drives for years. They
have 35K+ hours (4 years) of running time, and while they're still passing
SMART scans, I wanted to stop tempting fate. They were also starting to get
full (about 92%), and performance was beginning to suffer.

My plan: replace them with two new 16TB EXOS (enterprise) drives from
Seagate.

My first false start was a "device add" of one of the new drives followed
by a "device remove" of an old one. (It had been a while, and I'd forgotten
about "device replace".) This went extremely slowly, and by morning it had
bombed with a message in the kernel log about running out of space on (I
think) the *old* drive. This seemed odd, since the new drive was still
mostly empty. The filesystem also refused to remount right away, but given
the furious drive activity I decided to be patient. It mounted by itself an
hour or so later. There were plenty of "task hung" messages in the kernel
log, but they all seemed to be warnings. No lost data. Whew.

By now I had remembered "device replace". But I'd already done "device add"
on the first new 16TB drive. That gave me five drives online and no spare
slot for the second new drive. I didn't want to repeat the "device remove"
for fear of another out-of-space failure. So I took a gamble: I pulled one
of the old 6TB drives to make room for the second new 16TB drive, brought
the array up in degraded mode, and started a "device replace missing"
operation onto the second new drive.
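For anyone retracing this, the sequence up to this point was roughly as
follows. The device names, devid, and mount point are placeholders rather
than my actual ones, and note that on the command line a missing source
device is specified by its devid rather than the word "missing":

  # False start: grow the array, then try to shrink it
  btrfs device add /dev/new1 /mnt/pool
  btrfs device remove /dev/old1 /mnt/pool   # the step that ran out of space

  # Second attempt: pull an old drive, mount degraded, and rebuild
  # its data onto the new drive that took its bay
  mount -o degraded /dev/old2 /mnt/pool
  btrfs filesystem show /mnt/pool           # note the devid reported missing
  btrfs replace start <missing devid> /dev/new2 /mnt/pool
  btrfs replace status /mnt/pool            # progress as a percentage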
'iostat' showed just what I expected: a burst of reads from one or more of
the three old drives, alternating with big writes to the new drive. The
data rates were reasonably consistent with the I/O bandwidth limits of my
10-year-old server. When it finished the next day, I pulled the old 6TB
drive and replaced it with the second new 16TB drive. So far so good.

I then began another "device replace". Since I wasn't forced to degrade the
array this time, I didn't. It's been several days, and it's nowhere near
half done. As far as I can tell it's making headway of maybe 100-200
GB/day, so at this rate it might take several weeks to finish!

Moreover, when I run 'iostat' I see lots of writes *to* the drive being
replaced, usually in parallel with the same amount of data going to one of
the other drives. I'd expect lots of *reads from* the drive being replaced,
but why are there any writes to it at all? Is this just to keep the
filesystem consistent in case of a crash?

I'd already run data and metadata balance operations up to about 95%. I
hesitate to tempt fate by forcing the system down to do another "device
replace missing" operation.

Can anyone explain why replacing a missing device is so much faster than
replacing an existing device? Is it simply because, with no redundancy left
against a drive loss, less work needs to (or can) be done to protect
against a crash?

Thanks,

Phil Karn

Here's some current system information:

Linux homer.ka9q.net 4.19.0-8-rt-amd64 #1 SMP PREEMPT RT
Debian 4.19.98-1 (2020-01-26) x86_64 GNU/Linux

btrfs-progs v4.20.1

Label: 'homer-btrfs'  uuid: 0d090428-8af8-4d23-99da-92f7176f82a7
        Total devices 5 FS bytes used 9.89TiB
        devid    1 size 5.46TiB used 3.81TiB path /dev/sdd3
        devid    2 size 0.00B used 2.72TiB path /dev/sde3 [device currently being replaced]
        devid    4 size 5.46TiB used 5.10TiB path /dev/sdc3
        devid    5 size 14.32TiB used 6.08TiB path /dev/sdb4
        devid    6 size 14.32TiB used 2.08TiB path /dev/sda4

Data, RAID1: total=9.84TiB, used=9.84TiB
System, RAID1: total=32.00MiB, used=1.73MiB
Metadata, RAID1: total=52.00GiB, used=48.32GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
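P.S. For completeness, the earlier balances and the monitoring mentioned
above were along these lines. The mount point and the usage thresholds are
illustrative, not my exact invocations:

  # Rebalance data and metadata, touching only chunks below ~95% utilization
  btrfs balance start -dusage=95 -musage=95 /mnt/pool

  # Watch the ongoing replace and the per-drive traffic
  btrfs replace status /mnt/pool
  iostat -x 60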