From: Steve Dodd
Date: Sat, 2 Feb 2019 13:34:02 +0000
Subject: Re: [linux-lvm] Scrub errors after extending LVM RAID1 mirror [full email]
To: linux-lvm@redhat.com

Weirdly, I thought I had failed to reproduce this bug with a test LV, but my auto-scrub job ran this morning (it runs on the first Saturday of the month), and I got:

03:15:15: Starting scrub of rvg/test ...
03:15:15: ... scrub started ...
03:18:36: FAILED:       7926656 mismatches

So I really have no idea what's going on there. I will wade through my bash history to see what I did last week and what triggered this.
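
For reference, a minimal sketch of a monthly scrub job that would print log lines like the ones above. It is hypothetical (not the script that produced that log) and assumes a POSIX shell plus the lvs report fields raid_sync_action and raid_mismatch_count; the helper names lv_field and scrub are made up for illustration:

```shell
#!/bin/sh
# Hypothetical monthly scrub job for one RAID LV (not the original script).
# Assumes the lvs report fields raid_sync_action and raid_mismatch_count.

# Print a single lvs field for an LV, with the column padding removed.
lv_field() {
    lvs --noheadings -o "$2" "$1" | tr -d ' '
}

# Start a check on the LV, wait for it to finish, then report mismatches.
# Returns 0 if clean, 1 if mismatches were found, 2 if the check did not start.
scrub() {
    lv=$1
    echo "$(date +%T): Starting scrub of $lv ..."
    lvchange --syncaction check "$lv" || return 2
    echo "$(date +%T): ... scrub started ..."
    # Give the kernel a moment to leave "idle", then poll until the check ends.
    sleep 5
    while [ "$(lv_field "$lv" raid_sync_action)" != "idle" ]; do
        sleep 60
    done
    count=$(lv_field "$lv" raid_mismatch_count)
    if [ "$count" -eq 0 ]; then
        echo "$(date +%T): OK"
    else
        echo "$(date +%T): FAILED: $count mismatches"
        return 1
    fi
}

# Example (as root): scrub rvg/test
```

Polling raid_sync_action until it returns to "idle" is just one simple way to wait for the check to complete before reading the count.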

S.

On Wed, 23 Jan 2019 at 10:46, Steve Dodd <steved424@gmail.com> wrote:
> Sorry, user error sent the last email before I'd finished typing; trying again.

> Hi everyone,

> I am experiencing a mysterious scrub failure after extending a particular LV, a raid1-type mirror. I am using Ubuntu 18.04, LVM 2.02.176-4.1ubuntu3 and Ubuntu kernel 4.15.0-29-generic. I mentioned this on IRC, but thought an email might reach more people and allow me to provide more detail.

> As far as I can tell, the LV was *not* created with --nosync:

> # lvs rvg/backups
>   LV      VG  Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
>   backups rvg rwi-aor--- 96.64G                                      100.00
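
As I read the lvs(8) attribute table, the first lv_attr character already speaks to the --nosync question: 'r' marks a raid LV and 'R' a raid LV without initial sync, while an 'm' in the ninth (volume health) position means mismatches exist. A small hypothetical helper (the name describe_raid_attr is made up) that decodes just those two characters:

```shell
#!/bin/sh
# Decode the lv_attr characters relevant here, per the lvs(8) attribute table:
#   char 1: 'r' = raid, 'R' = raid without initial sync (--nosync)
#   char 9: 'm' = mismatches exist (RAID volume health)
describe_raid_attr() {
    attr=$1
    vol_type=$(printf '%s\n' "$attr" | cut -c1)
    health=$(printf '%s\n' "$attr" | cut -c9)
    case "$vol_type" in
        r) echo "raid LV (not created with --nosync)" ;;
        R) echo "raid LV created with --nosync" ;;
        *) echo "not a top-level raid LV" ;;
    esac
    if [ "$health" = "m" ]; then
        echo "mismatches recorded by the last check"
    fi
}

# Usage: describe_raid_attr "$(lvs --noheadings -o lv_attr rvg/backups | tr -d ' ')"
```

For the attr string shown above, rwi-aor---, it prints only "raid LV (not created with --nosync)", with no mismatch flag at the time that lvs was run.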

> The only unusual thing I tend to do is specify the extents for an extension manually, being a bit OCD about on-disk segment layouts. Having mined .bash_history, it seems that last time I ran:

> lvextend -l+2561 rvg/backups /dev/sdc3:20480-23041 /dev/sdb3:80097-82658
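
To cross-check a manual allocation like this, here is a hypothetical helper (range_extents is a made-up name) that counts the extents in a single PV:PE1-PE2 range, assuming both endpoints are inclusive:

```shell
#!/bin/sh
# Count the physical extents in a single "PV:PE1-PE2" range,
# assuming both endpoints are inclusive.
range_extents() {
    range=${1##*:}   # strip the PV path, keep "PE1-PE2"
    start=${range%-*}
    end=${range#*-}
    echo $((end - start + 1))
}

for spec in /dev/sdc3:20480-23041 /dev/sdb3:80097-82658; do
    echo "$spec: $(range_extents "$spec") extents"
done
```

Under that assumption both ranges hold 2562 extents, one more than the 2561 requested, so the request fits on each PV with one extent to spare. `lvs -a --segments -o +seg_pe_ranges rvg` shows where the segments actually landed.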

> After that, an *lvchange --syncaction check rvg/backups* showed a huge number for raid_mismatch_count (roughly consistent with the newly extended portion not having been synced), but dumping the actual filesystem from both legs of the mirror with partclone and piping each through md5sum showed no inconsistencies. The contents are mostly borg repositories, and for good measure I also verified the data in those using borg: no problems.
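
To judge whether a mismatch count is "roughly consistent" with an extension, both need the same unit. A minimal sketch, assuming the count is in 512-byte sectors (as the md documentation describes mismatch_cnt; I am assuming dm-raid reports it the same way) and the default 4 MiB extent size, which `vgs -o vg_extent_size rvg` would confirm or correct. The function names are made up for illustration:

```shell
#!/bin/sh
# Rough size comparison between a RAID mismatch count and an extension.
# Assumptions: mismatch counts are in 512-byte sectors, and the VG uses
# 4 MiB extents (check with: vgs -o vg_extent_size rvg).
SECTORS_PER_MIB=2048

# Whole MiB covered by a number of 512-byte sectors.
sectors_to_mib() {
    echo $(( $1 / SECTORS_PER_MIB ))
}

# Size in 512-byte sectors of N extents of EXTENT_MIB MiB each.
extents_to_sectors() {
    echo $(( $1 * $2 * SECTORS_PER_MIB ))
}

echo "2561-extent extension: $(sectors_to_mib "$(extents_to_sectors 2561 4)") MiB"
echo "rvg/test mismatches: $(sectors_to_mib 7926656) MiB"
```

With those assumptions the 2561-extent extension comes to 10244 MiB, and the 7926656 mismatches reported for rvg/test to 3870 MiB; whether the latter lines up with that LV's last extension depends on details not given here.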

> After a full resync, all is well again. This is the second time this has happened to me on the same LV (I think; certainly the same VG).

> Any clues? Are there any known bugs fixed recently that might not have made it into Ubuntu 18.04? I am trying to reproduce this with a test LV but can't. The only other thing I can think of that might be relevant is that the volume was mounted (but quiescent) at the time.

> Thanks,
> Steve