Subject: Re: RAID5->RAID6 reshape remains stuck at 0% (does nothing, not even start)
To: David Madore
Cc: Linux RAID mailing-list
References: <20200930014032.pd4csjwu3m7uihin@achernar.gro-tsen.net>
 <5F740390.7050005@youngman.org.uk>
 <20200930090031.6lzrs336fr4inpz4@achernar.gro-tsen.net>
 <90338e5b-9ed4-c86e-fa35-8acdd6768ca7@youngman.org.uk>
 <20200930185824.q6dphu2axpfcjjly@achernar.gro-tsen.net>
 <5F74D684.8020005@youngman.org.uk>
 <20200930194510.vki7zixjca6sxvin@achernar.gro-tsen.net>
From: antlists
Message-ID:
Date: Wed, 30 Sep 2020 21:16:10 +0100
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101 Thunderbird/68.12.0
MIME-Version: 1.0
In-Reply-To: <20200930194510.vki7zixjca6sxvin@achernar.gro-tsen.net>
Content-Type:
 text/plain; charset=utf-8; format=flowed
Content-Language: en-GB
Content-Transfer-Encoding: 7bit
Precedence: bulk
List-ID:
X-Mailing-List: linux-raid@vger.kernel.org

On 30/09/2020 20:45, David Madore wrote:
> On Wed, Sep 30, 2020 at 08:03:32PM +0100, Wols Lists wrote:
>> On 30/09/20 19:58, David Madore wrote:
>>> mdadm - v4.1 - 2018-10-01
>>>
>>> - which I think is roughly contemporaneous to the kernel version I'm
>>> using. But the problem still persists (with the exact same symptoms
>>> and details).
>>
>> Except that mdadm is NOT the problem. The problem is that the kernel and
>> mdadm are not matched date-wise, and because the kernel is a
>> franken-kernel you need to use a different kernel.
>
> I don't understand what you mean by "matched date-wise". The kernel
> I'm using is a longterm support branch (4.9) which was frozen at the
> same approximate date as the mdadm I just installed. And it was also
> the same longterm support branch that was used in the Debian oldstable
> (9 aka stretch). Do you mean that there is no mdadm version which is
> compatible with the 4.9 kernels? How often does the mdadm-kernel
> interface break compatibility?

The problem is that if you use mdadm 3.4 with kernel 4.9.237, the 237
means that your kernel has been heavily updated and is far too new. But
if you use mdadm 4.1 with kernel 4.9.237, the 4.9 means that the kernel
is basically a very old one - too old for mdadm 4.1.

As I said, the problem is the kernel - it is, at heart, an ancient
kernel. And it hasn't been regression tested for raid reshapes.

And what the problem is, we don't know exactly, nor do we particularly
care, sorry. So long as your data isn't lost, the response here is
pretty much the same as elsewhere, unfortunately - "run an up-to-date
system".

>
>> Use a rescue disk!!! That way, you get a kernel and an mdadm that are
>> the same approximate date. As it stands, your frankenkernel is too new
>> for mdadm 3.4, but too ancient for a modern kernel.
>
> Using a rescue disk would mean taking the system down for longer than
> I can afford (I can afford to have this particular partition down for
> a long time, but not the whole system... which unfortunately resides
> on the same disks). So I'd like to keep this as a very last resort,
> or at least, not consider it until I've fully understood what's going
> on. (It's especially problematic that I have absolutely no idea of
> the speed at which I can expect the reshape to take place, compared to
> an ordinary resync. If you could give me a ballpark figure, it would
> help me decide. My disks resync at ~120MB/sec, and the RAID array I
> wish to reshape is ~900GB in per partition, so it takes a few hours to
> do an "ordinary" resync: I assume a reshape will take much longer, but
> how much longer are we talking?)

What do you mean by a resync? Do you mean replacing a drive? Because I
can't speak for certain, but I wouldn't expect a reshape to take much
longer.

If you don't want to take the system down to use a rescue disk, I don't
really know what to suggest. You could revert your kernel back to a
4.9.x where x is a single digit, and it would probably work. Or you
could install a modern 5.9 or similar kernel, but that might well break
a load of other stuff. Or just upgrade to a new Debian/Ubuntu ... any of
them *should* work, but the only options we'd recommend would be to
upgrade your distro, or use a rescue disk. Sorry.

Cheers,
Wol
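
[Editorial sketch, not part of the original message.] The figures quoted
in the thread (~900GB per partition at ~120MB/sec) can be turned into a
rough baseline with a few lines of Python. This is only an illustration:
the function name is hypothetical, decimal units are assumed (1 GB =
1000 MB), and it covers one sequential pass over a partition at a
sustained rate. It says nothing about how fast a reshape really runs on
this kernel, which the thread leaves open.

```python
def transfer_hours(size_gb: float, rate_mb_per_s: float) -> float:
    """Hours for one sequential pass over size_gb at rate_mb_per_s.

    Assumes decimal units (1 GB = 1000 MB) and a constant rate; both are
    simplifications made for this sketch only.
    """
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    seconds = (size_gb * 1000.0) / rate_mb_per_s
    return seconds / 3600.0


if __name__ == "__main__":
    # Figures quoted in the thread: ~900GB per partition, ~120MB/sec.
    print(f"{transfer_hours(900, 120):.1f} h")  # prints "2.1 h"
```

That matches the "few hours" quoted for an ordinary resync. Halving the
assumed rate doubles the estimate, so it is best read as a lower bound
on a reshape rather than a prediction.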