* parent transid verify failed
@ 2017-05-11 10:01 Massimo B.
  0 siblings, 0 replies; 22+ messages in thread
From: Massimo B. @ 2017-05-11 10:01 UTC (permalink / raw)
  To: linux-btrfs

Hello,

This is btrfs on LUKS, with a USB HDD as the block device.
I can't mount my btrfs any more; I keep getting the same syslog errors:

- Last output repeated twice -
May 11 07:58:25 [kernel] BTRFS error (device dm-3): failed to read block groups:
-5
May 11 07:58:25 [kernel] BTRFS error (device dm-3): open_ctree failed
May 11 07:58:31 [kernel] BTRFS info (device dm-3): use zlib compression
May 11 07:58:31 [kernel] BTRFS info (device dm-3): enabling auto defrag
May 11 07:58:31 [kernel] BTRFS info (device dm-3): disk space caching is enabled
May 11 07:58:31 [kernel] BTRFS info (device dm-3): has skinny extents
May 11 07:58:33 [kernel] BTRFS error (device dm-3): parent transid verify failed
on 541635395584 wanted 10388 found 10385

This is the last part of the output of btrfs check --repair (I know, highly
experimental, but I didn't get an alternative suggestion on #btrfs):

parent transid verify failed on 541577035776 wanted 10388 found 10384
parent transid verify failed on 541577035776 wanted 10388 found 10384
parent transid verify failed on 541577035776 wanted 10388 found 10384
parent transid verify failed on 541577035776 wanted 10388 found 10384
parent transid verify failed on 541577035776 wanted 10388 found 10384
Chunk[256, 228, 429526089728]: length(1073741824), offset(429526089728), type(1)
is not found in block group
Chunk[256, 228, 430599831552]: length(1073741824), offset(430599831552), type(1)
is not found in block group
Chunk[256, 228, 431673573376]: length(1073741824), offset(431673573376), type(1)
is not found in block group
Chunk[256, 228, 434894798848]: length(1073741824), offset(434894798848), type(1)
is not found in block group
Chunk[256, 228, 435968540672]: length(1073741824), offset(435968540672), type(1)
is not found in block group
Chunk[256, 228, 437042282496]: length(1073741824), offset(437042282496), type(1)
is not found in block group
Chunk[256, 228, 438116024320]: length(1073741824), offset(438116024320), type(1)
is not found in block group
ref mismatch on [429497528320 40960] extent item 0, found 1
Backref 429497528320 parent 858210304 owner 0 offset 0 num_refs 0 not found in
extent tree
Incorrect local backref count on 429497528320 parent 858210304 owner 0 offset 0
found 1 wanted 0 back 0x37aaefc0
backpointer mismatch on [429497528320 40960]
parent transid verify failed on 541635395584 wanted 10388 found 10385
Ignoring transid failure
Failed to find [541635395584, 168, 16384]
btrfs unable to find ref byte nr 541635395584 parent 0 root 2  owner 1 offset 0
failed to repair damaged filesystem, aborting

How did that happen?
Yesterday I sent a big snapshot from the local drive to a slower USB drive via
btrbk. That had already finished, but the USB drive was filled up to 99% and
was apparently still doing some I/O. Then I was not able to shut down the
machine. Shutdown was really slow; the umounts eventually completed, services
stopped, and the shutdown almost finished, but the machine never powered off.
I issued SysRq E, I, U, S, R, B with no reboot, and SysRq O did not even power
it off. As a last resort I disconnected the power supply.

The broken btrfs is actually only a snapshot receiver used for backups, but I
would still prefer to get it repaired. Seeing that btrfs is this sensitive to
being filled up to 99% usage, I'm worried about my production btrfs.

This is Gentoo Linux, kernel 4.10.14-ck, btrfs-progs 4.10.2.

Best regards,
Massimo


* Re: parent transid verify failed
  2022-01-02  4:03               ` Eric Levy
  2022-01-02  5:57                 ` Zygo Blaxell
@ 2022-01-03  7:41                 ` Chris Murphy
  1 sibling, 0 replies; 22+ messages in thread
From: Chris Murphy @ 2022-01-03  7:41 UTC (permalink / raw)
  To: Eric Levy; +Cc: linux-btrfs

On Sat, Jan 1, 2022 at 9:03 PM Eric Levy <contact@ericlevy.name> wrote:
>
> On Sat, 2022-01-01 at 22:27 -0500, Zygo Blaxell wrote:

> > Since you're transitioning from one disk to multiple disks, you
> > should
> > convert metadata to raid1, and ensure there's sufficient unallocated
> > space on the first drive to hold metadata as it grows.  This can be
> > done
>
> I thought it was the default for disk pools, fully duplicated metadata,
> and JBOD-style for payload data.

raid1 is the default metadata profile at mkfs time for multiple-device
filesystems.

However, dup is the default metadata profile at mkfs time for single-device
filesystems, and it is not automatically converted to raid1 when you add
additional devices.
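
For illustration (a minimal sketch; the mount point /fs is a placeholder), the
conversion has to be requested explicitly with a metadata balance once the
second device is present, along the lines of the command Zygo gives elsewhere
in this thread:

	btrfs balance start -mconvert=raid1,soft /fs

The "soft" modifier skips chunks that already have the target profile, so the
command can safely be re-run.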


-- 
Chris Murphy


* Re: parent transid verify failed
  2022-01-02  5:57                 ` Zygo Blaxell
@ 2022-01-02 10:17                   ` Eric Levy
  0 siblings, 0 replies; 22+ messages in thread
From: Eric Levy @ 2022-01-02 10:17 UTC (permalink / raw)
  To: linux-btrfs

On Sun, 2022-01-02 at 00:57 -0500, Zygo Blaxell wrote:

> Device order (mapping of /dev/sd? device to underlying storage)
> usually
> isn't guaranteed, but once it's set it tends to be persistent;
> otherwise,
> disastrous problems would occur with every multi-device Linux
> subsystem:
> btrfs, mdadm, lvm, and dm-crypt would all significantly damage their
> data
> if the mapping from subdevice to physical storage changed
> unexpectedly
> while they were active.
> 
> In disconnect-reconnect cases, the old device is reconnected only if
> it
> can be reliably identified; otherwise, an entirely new device is
> created
> (as in the case where sdc became sdf after iSCSI logout and
> reconnect).

After poking around, I made an interesting discovery:

$ ls  -l /dev/disk/by-path
lrwxrwxrwx 1 root root  9 Jan  2 04:47 <iscsi-tgt-name>-lun-1 -> ../../sdd
lrwxrwxrwx 1 root root  9 Jan  2 04:47 <iscsi-tgt-name>-lun-2 -> ../../sdc

It seems that somehow the devices are exposed to the system in a different
order from the one in which they are scanned in the iSCSI session.

Although I had hoped that giving the rescan command to the iSCSI daemon would
simply add a device, it actually did much more.

Obviously, there should be better support for predictability and safety in
these procedures. It's rather unhelpful that even after rebooting, the device
order remains inverted between the two views.
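
One way to sidestep the naming problem (a sketch only; the UUID and mount
point are the ones that appear later in this thread) is to refer to the
filesystem by UUID instead of by /dev/sdX name, e.g. in fstab or on the
command line:

	mount UUID=c6f83d24-1ac3-4417-bdd9-6249c899604d /srv/store

btrfs assembles the filesystem from whatever devices carry that UUID, so the
kernel name handed out by the iSCSI session stops mattering for mounting.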




* Re: parent transid verify failed
  2021-12-31 20:33   ` Eric Levy
  2021-12-31 23:09     ` Chris Murphy
  2022-01-01 20:56     ` Zygo Blaxell
@ 2022-01-02  7:31     ` Andrei Borzenkov
  2 siblings, 0 replies; 22+ messages in thread
From: Andrei Borzenkov @ 2022-01-02  7:31 UTC (permalink / raw)
  To: Eric Levy; +Cc: linux-btrfs

On Sun, Jan 2, 2022 at 4:11 AM Eric Levy <contact@ericlevy.name> wrote:
>
> Dec 29 21:01:09 hostname kernel: sd 4:0:0:1: Warning! Received an indication that the LUN assignments on this target have changed. The Linux SCSI layer does not automatical

The message is truncated (please always show the full log), but it indicates
that the iSCSI target configuration changed. You may need to update your
client to reflect this change.

...
> Dec 30 03:45:09 hostname kernel:  connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4461894664, last ping 4461895936, now 4461897216
> Dec 30 03:45:09 hostname kernel:  connection1:0: detected conn error (1022)
> Dec 30 03:47:10 hostname kernel:  session1: session recovery timed out after 120 secs

Looks like your system does not have access to this device any more.

> Dec 30 03:47:10 hostname kernel: sd 4:0:0:1: rejecting I/O to offline device
> Dec 30 03:47:10 hostname kernel: blk_update_request: I/O error, dev sdc, sector 523542288 op 0x1:(WRITE) flags 0x104000 phys_seg 128 prio class 0
> Dec 30 03:47:10 hostname kernel: blk_update_request: I/O error, dev sdc, sector 523540240 op 0x1:(WRITE) flags 0x100000 phys_seg 128 prio class 0
> Dec 30 03:47:10 hostname kernel: blk_update_request: I/O error, dev sdc, sector 523538192 op 0x1:(WRITE) flags 0x100000 phys_seg 128 prio class 0
> Dec 30 03:47:10 hostname kernel: blk_update_request: I/O error, dev sdc, sector 523537168 op 0x1:(WRITE) flags 0x104000 phys_seg 128 prio class 0
> Dec 30 03:47:10 hostname kernel: BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
> Dec 30 03:47:10 hostname kernel: blk_update_request: I/O error, dev sdc, sector 523541264 op 0x1:(WRITE) flags 0x104000 phys_seg 128 prio class 0
> Dec 30 03:47:10 hostname kernel: blk_update_request: I/O error, dev sdc, sector 523539216 op 0x1:(WRITE) flags 0x104000 phys_seg 128 prio class 0
> Dec 30 03:47:10 hostname kernel: BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
> Dec 30 03:47:10 hostname kernel: blk_update_request: I/O error, dev sdc, sector 523543312 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
> Dec 30 03:47:10 hostname kernel: BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
> Dec 30 03:47:10 hostname kernel: blk_update_request: I/O error, dev sdc, sector 523635088 op 0x1:(WRITE) flags 0x800 phys_seg 64 prio class 0
> Dec 30 03:47:10 hostname kernel: BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 4, rd 0, flush 0, corrupt 0, gen 0
> Dec 30 03:47:10 hostname kernel: blk_update_request: I/O error, dev sdc, sector 523543320 op 0x1:(WRITE) flags 0x100000 phys_seg 81 prio class 0
> Dec 30 03:47:10 hostname kernel: BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 5, rd 0, flush 0, corrupt 0, gen 0
> Dec 30 03:47:10 hostname kernel: blk_update_request: I/O error, dev sdc, sector 523543968 op 0x1:(WRITE) flags 0x104000 phys_seg 128 prio class 0
> Dec 30 03:47:10 hostname kernel: BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 6, rd 0, flush 0, corrupt 0, gen 0
> Dec 30 03:47:10 hostname kernel: BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 7, rd 0, flush 0, corrupt 0, gen 0
> Dec 30 03:47:10 hostname kernel: BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 8, rd 0, flush 0, corrupt 0, gen 0
> Dec 30 03:47:10 hostname kernel: BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 9, rd 0, flush 0, corrupt 0, gen 0
> Dec 30 03:47:10 hostname kernel: BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 10, rd 0, flush 0, corrupt 0, gen 0
> Dec 30 03:47:10 hostname kernel: BTRFS warning (device sdc1): error -5 while searching for dev_stats item for device /dev/sdc1
> Dec 30 03:47:10 hostname kernel: BTRFS warning (device sdc1): Skipping commit of aborted transaction.
> Dec 30 03:47:10 hostname kernel: BTRFS: error (device sdc1) in cleanup_transaction:1975: errno=-5 IO failure
> Dec 30 03:47:10 hostname kernel: BTRFS info (device sdc1): forced readonly
> Dec 30 05:11:10 hostname kernel: BTRFS warning (device sdc1): sdc1 checksum verify failed on 867254272 wanted 0x62c8b548 found 0x982a1375 level 0
> Dec 30 05:11:10 hostname kernel: BTRFS error (device sdc1): parent transid verify failed on 867254272 wanted 9212 found 8675
> Dec 30 05:11:10 hostname kernel: BTRFS warning (device sdc1): csum hole found for disk bytenr range [271567056896, 271567060992)
> Dec 30 05:11:10 hostname kernel: BTRFS error (device sdc1): parent transid verify failed on 867254272 wanted 9212 found 8675
> Dec 30 05:11:10 hostname kernel: BTRFS error (device sdc1): parent transid verify failed on 867254272 wanted 9212 found 8675
>

Those are probably just followup errors.


* Re: parent transid verify failed
  2022-01-02  4:03               ` Eric Levy
@ 2022-01-02  5:57                 ` Zygo Blaxell
  2022-01-02 10:17                   ` Eric Levy
  2022-01-03  7:41                 ` Chris Murphy
  1 sibling, 1 reply; 22+ messages in thread
From: Zygo Blaxell @ 2022-01-02  5:57 UTC (permalink / raw)
  To: Eric Levy; +Cc: linux-btrfs

On Sat, Jan 01, 2022 at 11:03:25PM -0500, Eric Levy wrote:
> On Sat, 2022-01-01 at 22:27 -0500, Zygo Blaxell wrote:
> 
> > Yes, that's normal with lazy umounts.  With a lazy umount, the mount
> > point is removed immediately, so you get all the behavior you
> > described,
> > but the filesystem is not actually umounted until after the last open
> > file descriptors referencing the filesystem are closed (that's the
> > "lazy" part).
> 
> All of the explicit umount calls I issued had no lazy flag. I don't fully
> understand your comments, as I am only aware of lazy umount as a fallback
> for failing umount calls in synchronous operations.
> 
> It seems the way mounts work, especially when interacting with user-space
> solutions such as FUSE and Gvfs, is often burdened by legacy design that was
> never updated to work properly. I have seen Gvfs pull down an entire system
> just from trying to access a file in a user mount, because the mount had
> been made before the network connection with the Samba server was dropped.
> 
> For laptops these kinds of vulnerabilities are quite debilitating.

Indeed.  A lot of users get bitten by this.

If there were block device connectivity issues (or udev integration issues),
then systemd or udev seems to "helpfully" lazy-umount filesystems in some
cases.

With other filesystems, multiple instances of the same filesystem uuid
can be mounted on different block devices and mount points, so the
lingering lazy-umounted filesystem can be ignored.  They take up some
kernel memory and make a little noise during writeback events, but they
are otherwise inert as no process can access them and they have no access
to any device they could write to.

With btrfs, this is a disaster, as each btrfs filesystem uuid can be
mounted exactly once in any given kernel, and all following mounts
of the same uuid are converted into bind mounts on subvols.  If the
filesystem gets into a bad state and is lazy-umounted, there's no easy
way to recover.
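
One rough way to tell whether a btrfs filesystem is still alive in the kernel
even though it has vanished from the mount table (a sketch, and an assumption
on my part rather than a guaranteed test) is to look for its sysfs entry,
which exists for as long as the filesystem is held open anywhere:

	ls /sys/fs/btrfs/
	ls /sys/fs/btrfs/<fsid>/devices/

If the fsid is still listed there after an apparently successful umount, the
kernel still has the filesystem open, and a new mount of the same uuid will
be turned into a bind mount of the existing instance.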

> > Since you're transitioning from one disk to multiple disks, you
> > should
> > convert metadata to raid1, and ensure there's sufficient unallocated
> > space on the first drive to hold metadata as it grows.  This can be
> > done
> 
> I thought it was the default for disk pools, fully duplicated metadata,
> and JBOD-style for payload data.

If you mkfs and start with a completely empty filesystem, that's true.

If you add a second disk to a previously single-disk filesystem, the
existing data will be in single and dup profiles.  New data will still
be single profile, but new metadata will be raid1, and that imposes the
requirement that there be sufficient space on _both_ disks to allocate
all existing and future metadata in raid1.  If the first disk is near
full when the second disk is added, this requirement is almost certainly
not met.

> > with two balance commands:
> > 
> > 	btrfs balance start -dlimit=16 /fs
> > 
> > 	btrfs balance start -mconvert=raid1,soft /fs
> 
> Does balance manage balanced utilization of disks? For specific
> scenarios, unbalanced disk usage is actually desirable, for example, my
> current one, in which the devices are logical volumes on a redundant
> back end. Using as much of one volume as possible simplifies
> reallocation of back end storage resources.

btrfs balance is a tool for relocating data, converting between RAID
profiles, or changing chunk allocations from data to metadata.  In this
case, we want to move a few GB of data to the second disk so that there
will be room for metadata on both disks.  With the limit filter, this
is only 3% of a full balance for a 500GB filesystem (though if you have
enough IO time to spare for that, you can go ahead and do a full balance
if that's simpler to manage).  This will get the filesystem close enough
to its final state with two disks that normal allocations can proceed
until the disks fill up.

btrfs balance does not manage future disk utilization, other than by
rearranging data and free space to get desirable allocator behavior
in the future.  The allocator algorithms are all fixed, and optimized
for maximum available space usage starting from an empty filesystem.
All use cases that are different from that are a bit of a challenge on
btrfs (e.g. SSD metadata, remote mirror devices, starting from non-initial
conditions after a profile change, adding a device, increasing the size
of a device...the list goes on).

Note that btrfs 'single' profile always fills the disk with the most
free space first ('raid1' profile fills the two disks with the most free
space first).  If you add a new disk to an existing array with single
data, all the new data will typically be allocated on the new disk
(unless the old disk is much larger than the new one).  This makes all
the space usable regardless of which profile is used for metadata.

If this is not what you want, you could use 'btrfs fi resize' to resize
the second device smaller, and resize it larger again over time as it
fills up, but you'll have to monitor that closely and make continuous
adjustments as you're forcing the allocator to do the opposite of its
normal behavior (don't resize the physical device, just the amount of
the device that btrfs will use).
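
As a concrete sketch of that approach (the device id and sizes are
placeholders), the usable portion of the second device can be shrunk and later
grown with:

	btrfs filesystem resize 2:100g /fs
	btrfs filesystem resize 2:+50g /fs
	btrfs filesystem resize 2:max /fs

The "devid:" prefix selects which member device the resize applies to; without
it the command acts on devid 1.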

> ---
> 
> I am noticing that the order of devices is now reversed. I am wondering
> whether what happened is that iSCSI, either on the client or server side,
> reversed the order, by some quirk or design flaw, while I had the live
> mount.

Device order (mapping of /dev/sd? device to underlying storage) usually
isn't guaranteed, but once it's set it tends to be persistent; otherwise,
disastrous problems would occur with every multi-device Linux subsystem:
btrfs, mdadm, lvm, and dm-crypt would all significantly damage their data
if the mapping from subdevice to physical storage changed unexpectedly
while they were active.

In disconnect-reconnect cases, the old device is reconnected only if it
can be reliably identified; otherwise, an entirely new device is created
(as in the case where sdc became sdf after iSCSI logout and reconnect).

> Still, it doesn't explain why I originally was able to see
> the new, unpartitioned device and add it to the pool.

Why would you not be able to see both raw and partition devices?
Normally you can use either, just not both at the same time.

> Another possibility is that the sequence of add/remove/add, which I
> explained earlier, caused some problem, but it also seems not clear
> how, since all of these operations are synchronous.

I don't know the fine details of what the lower storage layers you've
chosen are doing, so I can't comment on specific sequences of operations.

Some interesting things can happen if the last writes before a device is
closed are lost--in the device add and remove cases, the device would
or would not be marked as belonging to the filesystem, with possible
surprising results when the filesystem is reassembled.  On the other
hand, most of the possible results are "the filesystem goes splat and
can't be mounted any more" or "the filesystem permanently adopts a
device that wasn't expected" and it doesn't sound like either of those
events happened.


* Re: parent transid verify failed
  2022-01-02  3:27             ` Zygo Blaxell
@ 2022-01-02  4:03               ` Eric Levy
  2022-01-02  5:57                 ` Zygo Blaxell
  2022-01-03  7:41                 ` Chris Murphy
  0 siblings, 2 replies; 22+ messages in thread
From: Eric Levy @ 2022-01-02  4:03 UTC (permalink / raw)
  To: linux-btrfs

On Sat, 2022-01-01 at 22:27 -0500, Zygo Blaxell wrote:

> Yes, that's normal with lazy umounts.  With a lazy umount, the mount
> point is removed immediately, so you get all the behavior you
> described,
> but the filesystem is not actually umounted until after the last open
> file descriptors referencing the filesystem are closed (that's the
> "lazy" part).

All of the explicit umount calls I issued had no lazy flag. I don't fully
understand your comments, as I am only aware of lazy umount as a fallback for
failing umount calls in synchronous operations.

It seems the way mounts work, especially when interacting with user-space
solutions such as FUSE and Gvfs, is often burdened by legacy design that was
never updated to work properly. I have seen Gvfs pull down an entire system
just from trying to access a file in a user mount, because the mount had been
made before the network connection with the Samba server was dropped.

For laptops these kinds of vulnerabilities are quite debilitating.

> You can run btrfs scrub to verify the data and metadata.
> 
> The most probable outcome is that everything's OK.  The device layer
> reported failed writes to btrfs, and btrfs aborted the last
> transaction
> to protect the on-disk data.  I would not expect any errors in either
> btrfs check or scrub, as this scenario is one the filesystem is
> designed
> and tested for.

Great, thanks. Scrub reported no errors, and the check reported none,
as already mentioned.

> Since you're transitioning from one disk to multiple disks, you
> should
> convert metadata to raid1, and ensure there's sufficient unallocated
> space on the first drive to hold metadata as it grows.  This can be
> done

I thought it was the default for disk pools, fully duplicated metadata,
and JBOD-style for payload data.

> with two balance commands:
> 
> 	btrfs balance start -dlimit=16 /fs
> 
> 	btrfs balance start -mconvert=raid1,soft /fs

Does balance manage balanced utilization of disks? For specific
scenarios, unbalanced disk usage is actually desirable, for example, my
current one, in which the devices are logical volumes on a redundant
back end. Using as much of one volume as possible simplifies
reallocation of back end storage resources.

---

I am noticing that the order of devices is now reversed. I am wondering
whether what happened is that iSCSI, either on the client or server side,
reversed the order, by some quirk or design flaw, while I had the live mount.
Still, it doesn't explain why I was originally able to see the new,
unpartitioned device and add it to the pool.

Another possibility is that the sequence of add/remove/add, which I
explained earlier, caused some problem, but it also seems not clear
how, since all of these operations are synchronous.



* Re: parent transid verify failed
  2022-01-02  0:55           ` Eric Levy
@ 2022-01-02  3:27             ` Zygo Blaxell
  2022-01-02  4:03               ` Eric Levy
  0 siblings, 1 reply; 22+ messages in thread
From: Zygo Blaxell @ 2022-01-02  3:27 UTC (permalink / raw)
  To: Eric Levy; +Cc: linux-btrfs

On Sat, Jan 01, 2022 at 07:55:57PM -0500, Eric Levy wrote:
> On Sat, 2022-01-01 at 19:15 -0500, Zygo Blaxell wrote:
> > Try rebooting to make sure the old mount is truly gone.
> > 
> > If the filesystem has been lazily umounted with open files it can be
> > very difficult to find all the processes that are still holding the
> > files open and force them to close.  Tools like 'fuser' and 'lsof'
> > don't work after all the mount points have been deleted.
> 
> I couldn't find any indication that the file system remained mounted.
> The mount point was empty, umount commands failed for all target
> devices, as well as the mount point, and the mount table showed no
> relevant entries.

Yes, that's normal with lazy umounts.  With a lazy umount, the mount
point is removed immediately, so you get all the behavior you described,
but the filesystem is not actually umounted until after the last open
file descriptors referencing the filesystem are closed (that's the
"lazy" part).

It can get complicated when there are also errors in the lower storage
layers, as processes can sometimes get stuck in iowait state and become
unkillable, and then the only way to recover is to reboot.

Sometimes you can spot a filesystem that has been lazy-umounted in
/proc, by running

	ls -l /proc/*/fd/* /proc/*/{exe,cwd,root}

and looking for anomalous filenames in the symlinks (like the file
has the right base name but the early parts of the path are missing).
That's the only clue you'll have that this is happening, and it's
not even a reliable one.

With private mount namespaces it's even more confusing, because it's
possible for a filesystem to be mounted in a place that is not visible
to some processes.

> I rebooted, and the file system mounted automatically (as normal). I
> found no reports of problems in the log (below).
> 
> I performed a trivial test (writing "Hello world" to a file, and
> reading), and it performed correctly.
> 
> Is it likely that data written before the bad events is entirely
> intact, that all parts of the file tree not touched afterward have not
> been damaged? How confident should I be of not having corruption?

You can run btrfs scrub to verify the data and metadata.

The most probable outcome is that everything's OK.  The device layer
reported failed writes to btrfs, and btrfs aborted the last transaction
to protect the on-disk data.  I would not expect any errors in either
btrfs check or scrub, as this scenario is one the filesystem is designed
and tested for.
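
For reference (the mount point is the one used later in this thread), a
foreground scrub with per-device statistics looks like:

	btrfs scrub start -Bd /srv/store
	btrfs scrub status /srv/store

-B keeps the command in the foreground until the scrub finishes, and -d
reports statistics for each member device separately.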

Normally, btrfs transactions will protect most old data from earlier
transactions from damage by write errors, so I would not expect any
errors in this case.  The exceptions are nodatacow files (because
the transactional data integrity guarantees require datacow to work)
and raid5/6 (because many parity raid implementations violate btrfs's
transactional data integrity rules, notably including btrfs's built-in
raid5/6).

> Is it safe to try to continue to write data beyond the capacity of the
> first device, into the additional device?

Probably.  It should have been safe the first time, too.  If you've
resolved the lower storage layer issues, then it should be OK.

Since you're transitioning from one disk to multiple disks, you should
convert metadata to raid1, and ensure there's sufficient unallocated
space on the first drive to hold metadata as it grows.  This can be done
with two balance commands:

	btrfs balance start -dlimit=16 /fs

	btrfs balance start -mconvert=raid1,soft /fs

(replace "16" with the total data capacity of your filesystem divided by
30 GB, or 5 + the number of GB of metadata you already have, whichever is
larger)
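
As a worked example of that rule (the numbers are assumed for illustration
only): on a 500 GB filesystem with 3 GiB of metadata, 500 / 30 is about 16
and 5 + 3 = 8, so the larger value, 16, is the limit to use, which is where
the -dlimit=16 above comes from.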

If you run out of metadata space, then the filesystem will abort a
transaction and go read-only.  Adding a second disk to a full first disk
makes this likely--a balance is required to rearrange the data so that
all the space can be used.

> ---
> 
> 
> Jan 01 19:26:43 hostname kernel: Loading iSCSI transport class v2.0-870.
> Jan 01 19:26:43 hostname kernel: iscsi: registered transport (tcp)
> Jan 01 19:26:43 hostname kernel: aufs 5.x-rcN-20210809
> Jan 01 19:26:43 hostname kernel: scsi host4: iSCSI Initiator over TCP/IP
> Jan 01 19:26:43 hostname kernel: scsi host5: iSCSI Initiator over TCP/IP
> Jan 01 19:26:43 hostname kernel: scsi 4:0:0:2: Direct-Access     SYNOLOGY Storage          4.0  PQ: 0 ANSI: 5
> Jan 01 19:26:43 hostname kernel: sd 4:0:0:2: Attached scsi generic sg4 type 0
> Jan 01 19:26:43 hostname kernel: sd 4:0:0:2: [sdc] 524288000 512-byte logical blocks: (268 GB/250 GiB)
> Jan 01 19:26:43 hostname kernel: sd 4:0:0:2: [sdc] Write Protect is off
> Jan 01 19:26:43 hostname kernel: sd 4:0:0:2: [sdc] Mode Sense: 43 00 10 08
> Jan 01 19:26:43 hostname kernel: sd 4:0:0:2: [sdc] Write cache: enabled, read cache: enabled, supports DPO and FUA
> Jan 01 19:26:43 hostname kernel: sd 4:0:0:2: [sdc] Optimal transfer size 16384 logical blocks > dev_max (8192 logical blocks)
> Jan 01 19:26:43 hostname kernel: scsi 4:0:0:1: Direct-Access     SYNOLOGY iSCSI Storage    4.0  PQ: 0 ANSI: 5
> Jan 01 19:26:43 hostname kernel: sd 4:0:0:1: Attached scsi generic sg5 type 0
> Jan 01 19:26:43 hostname kernel: sd 4:0:0:1: [sdd] 524288000 512-byte logical blocks: (268 GB/250 GiB)
> Jan 01 19:26:43 hostname kernel: sd 4:0:0:1: [sdd] Write Protect is off
> Jan 01 19:26:43 hostname kernel: sd 4:0:0:1: [sdd] Mode Sense: 43 00 10 08
> Jan 01 19:26:43 hostname kernel: sd 4:0:0:1: [sdd] Write cache: enabled, read cache: enabled, supports DPO and FUA
> Jan 01 19:26:43 hostname kernel: sd 4:0:0:1: [sdd] Optimal transfer size 16384 logical blocks > dev_max (8192 logical blocks)
> Jan 01 19:26:43 hostname kernel:  sdd: sdd1
> Jan 01 19:26:43 hostname kernel: sd 4:0:0:2: [sdc] Attached SCSI disk
> Jan 01 19:26:43 hostname kernel: sd 4:0:0:1: [sdd] Attached SCSI disk
> Jan 01 19:26:43 hostname kernel: BTRFS: device fsid c6f83d24-1ac3-4417-bdd9-6249c899604d devid 2 transid 9211 /dev/sdc scanned by systemd-udevd (386)
> Jan 01 19:26:44 hostname kernel: BTRFS: device fsid c6f83d24-1ac3-4417-bdd9-6249c899604d devid 1 transid 9211 /dev/sdd1 scanned by systemd-udevd (396)

There's the dev scan with no duplicate devices...

> Jan 01 19:26:44 hostname kernel: kauditd_printk_skb: 13 callbacks suppressed
> Jan 01 19:26:44 hostname kernel: audit: type=1400 audit(1641083204.403:25): apparmor="DENIED" operation="capable" profile="/usr/sbin/cups-browsed" pid=670 comm="cups-browsed" capability=23  capname="sys_nice"
> Jan 01 19:26:47 hostname kernel: audit: type=1400 audit(1641083207.587:26): apparmor="STATUS" operation="profile_load" profile="unconfined" name="docker-default" pid=993 comm="apparmor_parser"
> Jan 01 19:26:49 hostname kernel: bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
> Jan 01 19:26:49 hostname kernel: Bridge firewalling registered
> Jan 01 19:26:49 hostname kernel: bpfilter: Loaded bpfilter_umh pid 1017
> Jan 01 19:26:49 hostname unknown: Started bpfilter
> Jan 01 19:26:50 hostname kernel: Initializing XFRM netlink socket
> Jan 01 19:28:45 hostname kernel: BTRFS info (device sdd1): flagging fs with big metadata feature
> Jan 01 19:28:45 hostname kernel: BTRFS info (device sdd1): disk space caching is enabled
> Jan 01 19:28:45 hostname kernel: BTRFS info (device sdd1): has skinny extents

...and there's the ctree open messages.  Looks good.

Note there are no errors recorded (i.e. no messages reporting error
counts during mount).  Those errors occurred in the last transaction
that was aborted, so their error counts are not persisted.
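
The persisted per-device error counters can be inspected at any time (path
taken from this thread) with:

	btrfs device stats /srv/store

which, if the counts really were not persisted, should still show zero write
errors here.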

> Jan 01 19:28:45 hostname kernel: FS-Cache: Loaded
> Jan 01 19:28:45 hostname kernel: FS-Cache: Netfs 'nfs' registered for caching
> Jan 01 19:28:45 hostname kernel: NFS: Registering the id_resolver key type
> Jan 01 19:28:45 hostname kernel: Key type id_resolver registered
> Jan 01 19:28:45 hostname kernel: Key type id_legacy registered
> Jan 01 19:28:45 hostname kernel: FS-Cache: Duplicate cookie detected
> Jan 01 19:28:45 hostname kernel: FS-Cache: O-cookie c=00000000eab55a10 [p=00000000197b94cc fl=222 nc=0 na=1]
> Jan 01 19:28:45 hostname kernel: FS-Cache: O-cookie d=000000007bc6e6c7 n=00000000d6efde4c
> Jan 01 19:28:45 hostname kernel: FS-Cache: O-key=[16] '0400000001000000020008010a040102'
> Jan 01 19:28:45 hostname kernel: FS-Cache: N-cookie c=00000000d8681119 [p=00000000197b94cc fl=2 nc=0 na=1]
> Jan 01 19:28:45 hostname kernel: FS-Cache: N-cookie d=000000007bc6e6c7 n=00000000fd46ace5
> Jan 01 19:28:45 hostname kernel: FS-Cache: N-key=[16] '0400000001000000020008010a040102'
> Jan 01 19:29:15 hostname kernel: BTRFS info (device sdd1): the free space cache file (274908315648) is invalid, skip it

Free space cache updates were lost too.  btrfs space_cache v1 expects
that to happen, and can detect it and rebuild the cache.
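
If the v1 cache keeps being flagged invalid, it can also be thrown away and
rebuilt explicitly with a one-time mount option (a sketch, not something this
thread shows to be necessary):

	mount -o clear_cache /dev/sdd1 /srv/store

clear_cache discards the existing free space cache and lets btrfs regenerate
it on subsequent commits.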


* Re: parent transid verify failed
  2022-01-02  0:15         ` Zygo Blaxell
@ 2022-01-02  0:55           ` Eric Levy
  2022-01-02  3:27             ` Zygo Blaxell
  0 siblings, 1 reply; 22+ messages in thread
From: Eric Levy @ 2022-01-02  0:55 UTC (permalink / raw)
  To: linux-btrfs

On Sat, 2022-01-01 at 19:15 -0500, Zygo Blaxell wrote:
> Try rebooting to make sure the old mount is truly gone.
> 
> If the filesystem has been lazily umounted with open files it can be
> very difficult to find all the processes that are still holding the
> files open and force them to close.  Tools like 'fuser' and 'lsof'
> don't work after all the mount points have been deleted.

I couldn't find any indication that the file system remained mounted.
The mount point was empty, umount commands failed for all target
devices, as well as the mount point, and the mount table showed no
relevant entries.

I rebooted, and the file system mounted automatically (as normal). I
found no reports of problems in the log (below).

I performed a trivial test (writing "Hello world" to a file, and
reading), and it performed correctly.

Is it likely that data written before the bad events is entirely
intact, that all parts of the file tree not touched afterward have not
been damaged? How confident should I be of not having corruption?

Is it safe to try to continue to write data beyond the capacity of the
first device, into the additional device?

---


Jan 01 19:26:43 hostname kernel: Loading iSCSI transport class v2.0-870.
Jan 01 19:26:43 hostname kernel: iscsi: registered transport (tcp)
Jan 01 19:26:43 hostname kernel: aufs 5.x-rcN-20210809
Jan 01 19:26:43 hostname kernel: scsi host4: iSCSI Initiator over TCP/IP
Jan 01 19:26:43 hostname kernel: scsi host5: iSCSI Initiator over TCP/IP
Jan 01 19:26:43 hostname kernel: scsi 4:0:0:2: Direct-Access     SYNOLOGY Storage          4.0  PQ: 0 ANSI: 5
Jan 01 19:26:43 hostname kernel: sd 4:0:0:2: Attached scsi generic sg4 type 0
Jan 01 19:26:43 hostname kernel: sd 4:0:0:2: [sdc] 524288000 512-byte logical blocks: (268 GB/250 GiB)
Jan 01 19:26:43 hostname kernel: sd 4:0:0:2: [sdc] Write Protect is off
Jan 01 19:26:43 hostname kernel: sd 4:0:0:2: [sdc] Mode Sense: 43 00 10 08
Jan 01 19:26:43 hostname kernel: sd 4:0:0:2: [sdc] Write cache: enabled, read cache: enabled, supports DPO and FUA
Jan 01 19:26:43 hostname kernel: sd 4:0:0:2: [sdc] Optimal transfer size 16384 logical blocks > dev_max (8192 logical blocks)
Jan 01 19:26:43 hostname kernel: scsi 4:0:0:1: Direct-Access     SYNOLOGY iSCSI Storage    4.0  PQ: 0 ANSI: 5
Jan 01 19:26:43 hostname kernel: sd 4:0:0:1: Attached scsi generic sg5 type 0
Jan 01 19:26:43 hostname kernel: sd 4:0:0:1: [sdd] 524288000 512-byte logical blocks: (268 GB/250 GiB)
Jan 01 19:26:43 hostname kernel: sd 4:0:0:1: [sdd] Write Protect is off
Jan 01 19:26:43 hostname kernel: sd 4:0:0:1: [sdd] Mode Sense: 43 00 10 08
Jan 01 19:26:43 hostname kernel: sd 4:0:0:1: [sdd] Write cache: enabled, read cache: enabled, supports DPO and FUA
Jan 01 19:26:43 hostname kernel: sd 4:0:0:1: [sdd] Optimal transfer size 16384 logical blocks > dev_max (8192 logical blocks)
Jan 01 19:26:43 hostname kernel:  sdd: sdd1
Jan 01 19:26:43 hostname kernel: sd 4:0:0:2: [sdc] Attached SCSI disk
Jan 01 19:26:43 hostname kernel: sd 4:0:0:1: [sdd] Attached SCSI disk
Jan 01 19:26:43 hostname kernel: BTRFS: device fsid c6f83d24-1ac3-4417-bdd9-6249c899604d devid 2 transid 9211 /dev/sdc scanned by systemd-udevd (386)
Jan 01 19:26:44 hostname kernel: BTRFS: device fsid c6f83d24-1ac3-4417-bdd9-6249c899604d devid 1 transid 9211 /dev/sdd1 scanned by systemd-udevd (396)
Jan 01 19:26:44 hostname kernel: kauditd_printk_skb: 13 callbacks suppressed
Jan 01 19:26:44 hostname kernel: audit: type=1400 audit(1641083204.403:25): apparmor="DENIED" operation="capable" profile="/usr/sbin/cups-browsed" pid=670 comm="cups-browsed" capability=23  capname="sys_nice"
Jan 01 19:26:47 hostname kernel: audit: type=1400 audit(1641083207.587:26): apparmor="STATUS" operation="profile_load" profile="unconfined" name="docker-default" pid=993 comm="apparmor_parser"
Jan 01 19:26:49 hostname kernel: bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
Jan 01 19:26:49 hostname kernel: Bridge firewalling registered
Jan 01 19:26:49 hostname kernel: bpfilter: Loaded bpfilter_umh pid 1017
Jan 01 19:26:49 hostname unknown: Started bpfilter
Jan 01 19:26:50 hostname kernel: Initializing XFRM netlink socket
Jan 01 19:28:45 hostname kernel: BTRFS info (device sdd1): flagging fs with big metadata feature
Jan 01 19:28:45 hostname kernel: BTRFS info (device sdd1): disk space caching is enabled
Jan 01 19:28:45 hostname kernel: BTRFS info (device sdd1): has skinny extents
Jan 01 19:28:45 hostname kernel: FS-Cache: Loaded
Jan 01 19:28:45 hostname kernel: FS-Cache: Netfs 'nfs' registered for caching
Jan 01 19:28:45 hostname kernel: NFS: Registering the id_resolver key type
Jan 01 19:28:45 hostname kernel: Key type id_resolver registered
Jan 01 19:28:45 hostname kernel: Key type id_legacy registered
Jan 01 19:28:45 hostname kernel: FS-Cache: Duplicate cookie detected
Jan 01 19:28:45 hostname kernel: FS-Cache: O-cookie c=00000000eab55a10 [p=00000000197b94cc fl=222 nc=0 na=1]
Jan 01 19:28:45 hostname kernel: FS-Cache: O-cookie d=000000007bc6e6c7 n=00000000d6efde4c
Jan 01 19:28:45 hostname kernel: FS-Cache: O-key=[16] '0400000001000000020008010a040102'
Jan 01 19:28:45 hostname kernel: FS-Cache: N-cookie c=00000000d8681119 [p=00000000197b94cc fl=2 nc=0 na=1]
Jan 01 19:28:45 hostname kernel: FS-Cache: N-cookie d=000000007bc6e6c7 n=00000000fd46ace5
Jan 01 19:28:45 hostname kernel: FS-Cache: N-key=[16] '0400000001000000020008010a040102'
Jan 01 19:29:15 hostname kernel: BTRFS info (device sdd1): the free space cache file (274908315648) is invalid, skip it



* Re: parent transid verify failed
  2022-01-01 21:58       ` Eric Levy
@ 2022-01-02  0:15         ` Zygo Blaxell
  2022-01-02  0:55           ` Eric Levy
  0 siblings, 1 reply; 22+ messages in thread
From: Zygo Blaxell @ 2022-01-02  0:15 UTC (permalink / raw)
  To: Eric Levy; +Cc: linux-btrfs

On Sat, Jan 01, 2022 at 04:58:58PM -0500, Eric Levy wrote:
> Here is the new information:
> 
> 
> 1) Previously, the system log from R/W mount attempt, which failed:
> 
> Dec 31 15:15:48 hostname sudo[1477409]:      user : TTY=pts/0 ; PWD=/home/user ; USER=root ; COMMAND=/usr/bin/mount -o skip_balance /dev/sdc1
> Dec 31 15:15:48 hostname sudo[1477409]: pam_unix(sudo:session): session opened for user root by (uid=0)
> Dec 31 15:15:48 hostname sudo[1477409]: pam_unix(sudo:session): session closed for user root
> Dec 31 15:15:48 hostname kernel: BTRFS info (device sdc1): disk space caching is enabled
> Dec 31 15:15:48 hostname kernel: BTRFS error (device sdc1): Remounting read-write after error is not allowed
> 
> 
> 2) Same, but in RO:
> 
> Dec 31 15:14:50 hostname sudo[1477248]:      user : TTY=pts/0 ; PWD=/home/user ; USER=root ; COMMAND=/usr/bin/mount -o skip_balance,ro /dev/sdc1
> Dec 31 15:14:50 hostname sudo[1477248]: pam_unix(sudo:session): session opened for user root by (uid=0)
> Dec 31 15:14:50 hostname sudo[1477248]: pam_unix(sudo:session): session closed for user root
> Dec 31 15:14:50 hostname kernel: BTRFS error (device sdc1): parent transid verify failed on 867434496 wanted 9212 found 8675
> Dec 31 15:14:50 hostname kernel: BTRFS error (device sdc1): parent transid verify failed on 867434496 wanted 9212 found 8675
> Dec 31 15:14:50 hostname kernel: BTRFS error (device sdc1): parent transid verify failed on 867434496 wanted 9212 found 8675
> Dec 31 15:14:50 hostname kernel: BTRFS error (device sdc1): parent transid verify failed on 867434496 wanted 9212 found 8675
> Dec 31 15:14:50 hostname kernel: BTRFS error (device sdc1): parent transid verify failed on 867434496 wanted 9212 found 8675
> Dec 31 15:14:50 hostname kernel: BTRFS error (device sdc1): parent transid verify failed on 867434496 wanted 9212 found 8675
> Dec 31 15:14:50 hostname kernel: BTRFS error (device sdc1): parent transid verify failed on 867434496 wanted 9212 found 8675
> Dec 31 15:14:50 hostname kernel: BTRFS error (device sdc1): parent transid verify failed on 867434496 wanted 9212 found 8675
> Dec 31 15:14:50 hostname kernel: BTRFS error (device sdc1): parent transid verify failed on 867434496 wanted 9212 found 8675
> Dec 31 15:14:50 hostname kernel: BTRFS error (device sdc1): parent transid verify failed on 867434496 wanted 9212 found 8675

There are no log messages from the ctree open, and it's complaining about
remounting, so you still have the filesystem mounted somewhere, i.e. you did
not umount it successfully.  This would also lead to assigning new SCSI device
IDs instead of reusing the old ones, and to duplicate device complaints during
device scan.

Maybe you did a lazy umount and there's still an open file on the
filesystem?

> 3) After running iSCSI logout, and login, the devices were assigned new
> names (sdc -> sdf, sdd -> sde). Then checking with readonly flag, using
> admin tools, while unmounted:
> 
> $ sudo btrfs check --readonly /dev/sdf1
> Opening filesystem to check...
> Checking filesystem on /dev/sdf1
> UUID: c6f83d24-1ac3-4417-bdd9-6249c899604d
> [1/7] checking root items
> [2/7] checking extents
> [3/7] checking free space cache
> [4/7] checking fs roots
> [5/7] checking only csums items (without verifying data)
> [6/7] checking root refs
> [7/7] checking quota groups skipped (not enabled on this FS)
> found 266107584512 bytes used, no error found
> total csum bytes: 259546380
> total tree bytes: 586268672
> total fs tree bytes: 214188032
> total extent tree bytes: 52609024
> btree space waste bytes: 89657360
> file data blocks allocated: 1019677446144
>  referenced 300654301184

That's a good sign that the data on disk is OK.

> 4) However, mount still fails. Here is log output from trying to mount
> /dev/sdf1:
> 
> kernel: BTRFS warning: duplicate device /dev/sdf1 devid 1 generation 9211 scanned by mount (1641108)

Again there are no ctree open messages, so you're adding a new subvol
mount point to a filesystem that is already mounted, which means you
didn't successfully umount earlier.

The old devices (/dev/sdc and /dev/sdd1) are still mounted, so btrfs device
scan and/or mount will complain about duplicate devices.

> 5) Same kind of results for /dev/sde:
> 
> kernel: BTRFS warning: duplicate device /dev/sde devid 2 generation 9211 scanned by mount (1642247)

Try rebooting to make sure the old mount is truly gone.

If the filesystem has been lazily umounted with open files it can be
very difficult to find all the processes that are still holding the
files open and force them to close.  Tools like 'fuser' and 'lsof'
don't work after all the mount points have been deleted.


* Re: parent transid verify failed
  2022-01-01 20:56     ` Zygo Blaxell
@ 2022-01-01 21:58       ` Eric Levy
  2022-01-02  0:15         ` Zygo Blaxell
  0 siblings, 1 reply; 22+ messages in thread
From: Eric Levy @ 2022-01-01 21:58 UTC (permalink / raw)
  To: linux-btrfs

Here is the new information:


1) Previously, the system log from R/W mount attempt, which failed:

Dec 31 15:15:48 hostname sudo[1477409]:      user : TTY=pts/0 ; PWD=/home/user ; USER=root ; COMMAND=/usr/bin/mount -o skip_balance /dev/sdc1
Dec 31 15:15:48 hostname sudo[1477409]: pam_unix(sudo:session): session opened for user root by (uid=0)
Dec 31 15:15:48 hostname sudo[1477409]: pam_unix(sudo:session): session closed for user root
Dec 31 15:15:48 hostname kernel: BTRFS info (device sdc1): disk space caching is enabled
Dec 31 15:15:48 hostname kernel: BTRFS error (device sdc1): Remounting read-write after error is not allowed


2) Same, but in RO:

Dec 31 15:14:50 hostname sudo[1477248]:      user : TTY=pts/0 ; PWD=/home/user ; USER=root ; COMMAND=/usr/bin/mount -o skip_balance,ro /dev/sdc1
Dec 31 15:14:50 hostname sudo[1477248]: pam_unix(sudo:session): session opened for user root by (uid=0)
Dec 31 15:14:50 hostname sudo[1477248]: pam_unix(sudo:session): session closed for user root
Dec 31 15:14:50 hostname kernel: BTRFS error (device sdc1): parent transid verify failed on 867434496 wanted 9212 found 8675
Dec 31 15:14:50 hostname kernel: BTRFS error (device sdc1): parent transid verify failed on 867434496 wanted 9212 found 8675
Dec 31 15:14:50 hostname kernel: BTRFS error (device sdc1): parent transid verify failed on 867434496 wanted 9212 found 8675
Dec 31 15:14:50 hostname kernel: BTRFS error (device sdc1): parent transid verify failed on 867434496 wanted 9212 found 8675
Dec 31 15:14:50 hostname kernel: BTRFS error (device sdc1): parent transid verify failed on 867434496 wanted 9212 found 8675
Dec 31 15:14:50 hostname kernel: BTRFS error (device sdc1): parent transid verify failed on 867434496 wanted 9212 found 8675
Dec 31 15:14:50 hostname kernel: BTRFS error (device sdc1): parent transid verify failed on 867434496 wanted 9212 found 8675
Dec 31 15:14:50 hostname kernel: BTRFS error (device sdc1): parent transid verify failed on 867434496 wanted 9212 found 8675
Dec 31 15:14:50 hostname kernel: BTRFS error (device sdc1): parent transid verify failed on 867434496 wanted 9212 found 8675
Dec 31 15:14:50 hostname kernel: BTRFS error (device sdc1): parent transid verify failed on 867434496 wanted 9212 found 8675


3) After running iSCSI logout, and login, the devices were assigned new
names (sdc -> sdf, sdd -> sde). Then checking with readonly flag, using
admin tools, while unmounted:

$ sudo btrfs check --readonly /dev/sdf1
Opening filesystem to check...
Checking filesystem on /dev/sdf1
UUID: c6f83d24-1ac3-4417-bdd9-6249c899604d
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 266107584512 bytes used, no error found
total csum bytes: 259546380
total tree bytes: 586268672
total fs tree bytes: 214188032
total extent tree bytes: 52609024
btree space waste bytes: 89657360
file data blocks allocated: 1019677446144
 referenced 300654301184


4) However, mount still fails. Here is log output from trying to mount
/dev/sdf1:

kernel: BTRFS warning: duplicate device /dev/sdf1 devid 1 generation 9211 scanned by mount (1641108)


5) Same kind of results for /dev/sde:

kernel: BTRFS warning: duplicate device /dev/sde devid 2 generation 9211 scanned by mount (1642247)



* Re: parent transid verify failed
  2022-01-01 20:49         ` Chris Murphy
@ 2022-01-01 21:57           ` Eric Levy
  0 siblings, 0 replies; 22+ messages in thread
From: Eric Levy @ 2022-01-01 21:57 UTC (permalink / raw)
  To: linux-btrfs

On Sat, 2022-01-01 at 13:49 -0700, Chris Murphy wrote:


> OK except we've got kernel messages clearly showing /dev/sdc is
> rejecting writes, and that's what btrfs is complaining about. Since
> there's only a partial dmesg to go on, I can't really say if there's
> something else going on, but it's pretty straightforward when there are
> multiple of these:

All I'm suggesting is that the write issue is far more likely to be
occurring at the logical-volume layer than at the physical media, based on
the symptoms.
 
If you go back a few messages, you can find one with the full output,
attached as a file. 


> Dec 30 03:47:10 hostname kernel: blk_update_request: I/O error, dev
> sdc, sector 523542288 op 0x1:(WRITE) flags 0x104000 phys_seg 128 prio
> class 0
> 
> That it will result in these:
> 
> Dec 30 03:47:10 hostname kernel: BTRFS error (device sdc1): bdev
> /dev/sdc1 errs: wr 10, rd 0, flush 0, corrupt 0, gen 0
> 
> Btrfs is telling us that writes are being dropped. I'm not sure why
> /dev/sdc is dropping writes but the block layer also knows they're
> being rejected and what sector, so I kinda suspect the block device
> itself is aware of and reporting the write failures with sector
> address to the kernel.

If the problem is at the logical layer, then I would suspect the same,
that the block layer knows whether block writes are committed.


> I can't tell if you've tried umounting this file system and then
> mounting it again cleanly (not remount) and what messages appear in
> dmesg if it fails.

Yes. Clean mount/umount only. No remount option.


> It's definitely an issue with /dev/sdc, but I can't tell from the
> available info why it's dropping writes, only that two parts of the
> kernel are seeing it, btrfs and blk_update_request

I'm simply drawing the inference that there must have been some block
writes on sdc(1) following the addition of sdd as a system device,
otherwise we would have seen failures trying to join the two devices
into the same file system. Right?

So there was some interval during which both block devices were
writable. Right?


> > If the problem is device-level, then there is much to try, including
> > renewing the iSCSI login. I can also restart the daemon, reboot the
> > host, even restart the iSCSI backend service or appliance.
> > 
> > Would any operations of such a kind be helpful to try?
> 
> iSCSI isn't my area of expertise, someone else might have an idea
> what
> could be going on.

Nor mine, but I'm trying to make inferences based on how the parts
fit together.





* Re: parent transid verify failed
  2021-12-31 20:33   ` Eric Levy
  2021-12-31 23:09     ` Chris Murphy
@ 2022-01-01 20:56     ` Zygo Blaxell
  2022-01-01 21:58       ` Eric Levy
  2022-01-02  7:31     ` Andrei Borzenkov
  2 siblings, 1 reply; 22+ messages in thread
From: Zygo Blaxell @ 2022-01-01 20:56 UTC (permalink / raw)
  To: Eric Levy; +Cc: linux-btrfs

On Fri, Dec 31, 2021 at 03:33:01PM -0500, Eric Levy wrote:
> On Fri, 2021-12-31 at 14:14 -0500, Zygo Blaxell wrote:
> 
> > To be clear, do the parent transid verify failed messages appear
> > _before_
> > or _after_ the filesystem switches to read-only?
> > 
> > "After" is fine.  
> 
> "Fine" in the sense that the file system is fully recoverable, or only
> that the files are not damaged, and may be copied to a different file
> system?

"Fine" in the sense that the "parent transid verify failed" error is
not arising from persistent storage, and everything on disk is OK (or
at least not damaged by previously undetected lost writes _specifically_).

More precisely:  we can't distinguish between p-t-v-f errors caused by
temporary in-memory data vs. errors caused by persistent on-disk data
after the filesystem is forced read-only.  If we want to determine whether
there are any persistent errors on disk, we must umount and mount again
to find out.  We can't rely on any errors reported between the switch
to read-only and umounting the filesystem.

In common cases there are no persistent errors on disk, so we can presume
that any p-t-v-f errors after the readonly switch are exclusively in
memory; however, from other things you've posted in this thread, this
specific case may not be a "common" one.

Depending on what the lower layers did, critical writes may have been
dropped, and unrecoverable damage to metadata may have occurred.

> It seems that the RO switch was before the first parent transid error.
> See logs below (excerpt from more complete capture sent previously).

> > Mount with '-o skip_balance'.  If you're in the "after" case then
> > this
> > will avoid running out of metadata space again during mount.
> 
> It seems not to work. So far, I succeeded in mounting only with the
> "ro" option. I can mount with "skip_balance" and "ro" in combination.
> Is that helpful? (I can also mount with "ro" only, or with "ro" and
> "usebackuproot" in combination.)

It means you can read the data from the filesystem, and it's possible
to update or complete your backups.
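
A minimal way to do that (the destination path is a placeholder; the mount
options are the ones already shown to work below) is to mount read-only and
copy the data off:

	mount -o ro,skip_balance /dev/sdc1 /srv/store
	rsync -aHAX /srv/store/ /path/to/backup/

Anything readable through the ro mount can be copied out even while the
underlying problem is still being diagnosed.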

> $ sudo mount -o skip_balance /dev/sdc1
> mount: /srv/store: wrong fs type, bad option, bad superblock on /dev/sdc1, missing codepage or helper program, or other error.

We'd need to see the log messages from the failing mounts.

> $ sudo mount -o skip_balance,ro /dev/sdc1
> 
> ---
> 
> Dec 29 21:01:09 hostname kernel: sd 4:0:0:1: Warning! Received an indication that the LUN assignments on this target have changed. The Linux SCSI layer does not automatical
> Dec 29 21:01:12 hostname kernel: scsi 4:0:0:2: Direct-Access     SYNOLOGY Storage          4.0  PQ: 0 ANSI: 5
> Dec 29 21:01:12 hostname kernel: sd 4:0:0:2: Attached scsi generic sg5 type 0
> Dec 29 21:01:12 hostname kernel: sd 4:0:0:2: [sdd] 524288000 512-byte logical blocks: (268 GB/250 GiB)
> Dec 29 21:01:12 hostname kernel: sd 4:0:0:2: [sdd] Write Protect is off
> Dec 29 21:01:12 hostname kernel: sd 4:0:0:2: [sdd] Mode Sense: 43 00 10 08
> Dec 29 21:01:12 hostname kernel: sd 4:0:0:2: [sdd] Write cache: enabled, read cache: enabled, supports DPO and FUA
> Dec 29 21:01:12 hostname kernel: sd 4:0:0:2: [sdd] Optimal transfer size 16384 logical blocks > dev_max (8192 logical blocks)
> Dec 29 21:01:12 hostname kernel: sd 4:0:0:2: [sdd] Attached SCSI disk
> Dec 29 21:05:56 hostname kernel:  sdd:
> Dec 29 21:05:56 hostname kernel:  sdd:
> Dec 29 21:05:56 hostname kernel:  sdd:
> Dec 29 21:06:19 hostname kernel:  sdd: sdd1
> Dec 29 21:06:20 hostname kernel:  sdd: sdd1
> Dec 29 21:06:20 hostname kernel:  sdd: sdd1
> Dec 29 21:06:20 hostname kernel: BTRFS: device fsid e523753b-f3fd-4586-971e-fd7333115ded devid 1 transid 5 /dev/sdd1 scanned by mkfs.btrfs (1191911)
> Dec 29 21:06:23 hostname kernel:  sdd: sdd1
> Dec 29 21:09:21 hostname kernel: BTRFS info (device sdc1): disk added /dev/sdd1
> Dec 29 22:26:47 hostname kernel: BTRFS info (device sdc1): device deleted: /dev/sdd1
> Dec 29 22:28:28 hostname kernel:  sdd: sdd1
> Dec 29 22:28:28 hostname kernel: BTRFS info (device sdc1): disk added /dev/sdd
> Dec 30 03:45:09 hostname kernel:  connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4461894664, last ping 4461895936, now 4461897216
> Dec 30 03:45:09 hostname kernel:  connection1:0: detected conn error (1022)
> Dec 30 03:47:10 hostname kernel:  session1: session recovery timed out after 120 secs
> Dec 30 03:47:10 hostname kernel: sd 4:0:0:1: rejecting I/O to offline device
> Dec 30 03:47:10 hostname kernel: blk_update_request: I/O error, dev sdc, sector 523542288 op 0x1:(WRITE) flags 0x104000 phys_seg 128 prio class 0
> Dec 30 03:47:10 hostname kernel: blk_update_request: I/O error, dev sdc, sector 523540240 op 0x1:(WRITE) flags 0x100000 phys_seg 128 prio class 0
> Dec 30 03:47:10 hostname kernel: blk_update_request: I/O error, dev sdc, sector 523538192 op 0x1:(WRITE) flags 0x100000 phys_seg 128 prio class 0
> Dec 30 03:47:10 hostname kernel: blk_update_request: I/O error, dev sdc, sector 523537168 op 0x1:(WRITE) flags 0x104000 phys_seg 128 prio class 0
> Dec 30 03:47:10 hostname kernel: BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
> Dec 30 03:47:10 hostname kernel: blk_update_request: I/O error, dev sdc, sector 523541264 op 0x1:(WRITE) flags 0x104000 phys_seg 128 prio class 0
> Dec 30 03:47:10 hostname kernel: blk_update_request: I/O error, dev sdc, sector 523539216 op 0x1:(WRITE) flags 0x104000 phys_seg 128 prio class 0
> Dec 30 03:47:10 hostname kernel: BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
> Dec 30 03:47:10 hostname kernel: blk_update_request: I/O error, dev sdc, sector 523543312 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
> Dec 30 03:47:10 hostname kernel: BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
> Dec 30 03:47:10 hostname kernel: blk_update_request: I/O error, dev sdc, sector 523635088 op 0x1:(WRITE) flags 0x800 phys_seg 64 prio class 0
> Dec 30 03:47:10 hostname kernel: BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 4, rd 0, flush 0, corrupt 0, gen 0
> Dec 30 03:47:10 hostname kernel: blk_update_request: I/O error, dev sdc, sector 523543320 op 0x1:(WRITE) flags 0x100000 phys_seg 81 prio class 0
> Dec 30 03:47:10 hostname kernel: BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 5, rd 0, flush 0, corrupt 0, gen 0
> Dec 30 03:47:10 hostname kernel: blk_update_request: I/O error, dev sdc, sector 523543968 op 0x1:(WRITE) flags 0x104000 phys_seg 128 prio class 0
> Dec 30 03:47:10 hostname kernel: BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 6, rd 0, flush 0, corrupt 0, gen 0
> Dec 30 03:47:10 hostname kernel: BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 7, rd 0, flush 0, corrupt 0, gen 0
> Dec 30 03:47:10 hostname kernel: BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 8, rd 0, flush 0, corrupt 0, gen 0
> Dec 30 03:47:10 hostname kernel: BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 9, rd 0, flush 0, corrupt 0, gen 0
> Dec 30 03:47:10 hostname kernel: BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 10, rd 0, flush 0, corrupt 0, gen 0
> Dec 30 03:47:10 hostname kernel: BTRFS warning (device sdc1): error -5 while searching for dev_stats item for device /dev/sdc1
> Dec 30 03:47:10 hostname kernel: BTRFS warning (device sdc1): Skipping commit of aborted transaction.
> Dec 30 03:47:10 hostname kernel: BTRFS: error (device sdc1) in cleanup_transaction:1975: errno=-5 IO failure
> Dec 30 03:47:10 hostname kernel: BTRFS info (device sdc1): forced readonly
> Dec 30 05:11:10 hostname kernel: BTRFS warning (device sdc1): sdc1 checksum verify failed on 867254272 wanted 0x62c8b548 found 0x982a1375 level 0
> Dec 30 05:11:10 hostname kernel: BTRFS error (device sdc1): parent transid verify failed on 867254272 wanted 9212 found 8675
> Dec 30 05:11:10 hostname kernel: BTRFS warning (device sdc1): csum hole found for disk bytenr range [271567056896, 271567060992)
> Dec 30 05:11:10 hostname kernel: BTRFS error (device sdc1): parent transid verify failed on 867254272 wanted 9212 found 8675
> Dec 30 05:11:10 hostname kernel: BTRFS error (device sdc1): parent transid verify failed on 867254272 wanted 9212 found 8675
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: parent transid verify failed
  2022-01-01  7:33       ` Eric Levy
@ 2022-01-01 20:49         ` Chris Murphy
  2022-01-01 21:57           ` Eric Levy
  0 siblings, 1 reply; 22+ messages in thread
From: Chris Murphy @ 2022-01-01 20:49 UTC (permalink / raw)
  To: Eric Levy; +Cc: linux-btrfs

On Sat, Jan 1, 2022 at 12:34 AM Eric Levy <contact@ericlevy.name> wrote:

> The hardware is an appliance with an array of SATA magnetic disks
> managed in a RAID6-like volume. The volume is formatted with a top-
> level Btrfs file system, on which is installed an OS that manages
> services and provides an administrative interface.
>
> All of these components are presently reported healthy through normal
> diagnostics and logging. More intensive diagnostics would include a
> S.M.A.R.T. extended test of each media device, and a scrubbing of the
> file system. All available indications suggest these operations would
> expose no problems.

OK, except we've got kernel messages clearly showing /dev/sdc is
rejecting writes, and that's what btrfs is complaining about. Since
there's only a partial dmesg to go on, I can't really say if there's
something else going on, but it's pretty straightforward when there are
multiple of these:

Dec 30 03:47:10 hostname kernel: blk_update_request: I/O error, dev
sdc, sector 523542288 op 0x1:(WRITE) flags 0x104000 phys_seg 128 prio
class 0

And that will result in these:

Dec 30 03:47:10 hostname kernel: BTRFS error (device sdc1): bdev
/dev/sdc1 errs: wr 10, rd 0, flush 0, corrupt 0, gen 0

Btrfs is telling us that writes are being dropped. I'm not sure why
/dev/sdc is dropping writes but the block layer also knows they're
being rejected and what sector, so I kinda suspect the block device
itself is aware of and reporting the write failures with sector
address to the kernel.


Dec 30 03:47:10 hostname kernel: BTRFS warning (device sdc1): error -5
while searching for dev_stats item for device /dev/sdc1
Dec 30 03:47:10 hostname kernel: BTRFS warning (device sdc1): Skipping
commit of aborted transaction.
Dec 30 03:47:10 hostname kernel: BTRFS: error (device sdc1) in
cleanup_transaction:1975: errno=-5 IO failure
Dec 30 03:47:10 hostname kernel: BTRFS info (device sdc1): forced readonly

After too many dropped writes, Btrfs decides it's going to stop trying,
so that the file system hopefully doesn't get corrupted with some writes
getting to stable media but not others. But as it turns out, maybe
some of those writes did get to stable media, hence the confusion.

I can't tell if you've tried umounting this file system and then
mounting it again cleanly (not remount) and what messages appear in
dmesg if it fails.


> Possible reasons for the device becoming read only include the
> following:
>
>    1. As a side effect of provisioning the new volume, write privileges
>       were effectively removed for the existing one.
>    2. The LV backend became confused and entered a protective state.
>    3. The iSCSI initiator was unable to add a new volume without affecting
>       the existing one.

Btrfs went read-only in this case because it detected dropped writes,
and it's deciding to stop further writes so that the confusion doesn't
end up on disk, hopefully. Going read-only is not the problem; that's
the fail-safe action Btrfs is trying to take as a result of the
problem. The problem is all these dropped writes from /dev/sdc.


>
> However, the logs show that between the detection of the new volume and
> the reversion to the RO state, several add and remove operations were
> completed. Without knowing about the mechanics of Btrfs, I would
> postulate that these operations depend on successful write access to
> all devices in the pool, including any added or removed. It appears to
> me as though the volumes were healthy and accessible for at least some
> time, which can be calculated from the log timestamps, long enough for
> me to issue commands.
>
> Thus, my first thought was a file-system issue, not a device issue.

It's definitely an issue with /dev/sdc, but I can't tell from the
available info why it's dropping writes, only that two parts of the
kernel are seeing it: btrfs and blk_update_request.


> Is the present indication more strongly that a) the driver refuses to
> mount the file system because its state shows as corrupt, or b) the
> driver aborts the mount operation because it fails to write at the
> block level?

You can check the consistency of the file system by umounting and then
running `btrfs check --readonly` against any device in the file system
(the command finds all member devices and the fs is checked on all of
them as needed).

>
> If the problem is device-level, then there is much to try, including
> renewing the iSCSI login. I can also restart the daemon, reboot the
> host, even restart the iSCSI backend service or appliance.
>
> Would any operations of such a kind be helpful to try?

iSCSI isn't my area of expertise; someone else might have an idea what
could be going on.


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: parent transid verify failed
  2021-12-30 21:47 ` Chris Murphy
@ 2022-01-01 15:11   ` devel
  0 siblings, 0 replies; 22+ messages in thread
From: devel @ 2022-01-01 15:11 UTC (permalink / raw)
  To: Chris Murphy, Eric Levy; +Cc: linux-btrfs

On 30/12/2021 21:47, Chris Murphy wrote:
> On Thu, Dec 30, 2021 at 2:10 PM Eric Levy <contact@ericlevy.name> wrote:
>> Hello.
>>
>> I had a simple Btrfs partition, with only one subvolume, about 250 Gb
>> in size. As it began to fill, I added a second volume, live. By the
>> time the size of the file system reached the limit for the first
>> volume, the file system reverted to read only.
>>
>> From journalctl, the following message has recurred, with the same
>> numeric values:
>>
>> BTRFS error (device sdc1): parent transid verify failed on 867434496
>> wanted 9212 found 8675
>>
>> Presently, the file system mounts only as read only. It will not mount
>> in read-write, even with the usebackuproot option.
>>
>> It seems that balance and scrub are not available, either due to read-
>> only mode, or some other reason. Both abort as soon as they begin to
>> run.
> Complete dmesg, at least since adding the 2nd device and subsequent read only.
> `btrfs fi us $mountpoint` for the ro mounted file system so we can see
> the space available and bg profiles
> `btrfs insp dump-s $anydev` for either of the two devices
> kernel version
> btrfs-progs version
> make/model of the two devices being used
>
>> What is the best next step for recovery?
> For now, take advantage of the fact it mounts read-only, backup
> anything important and prepare to restore from backup. Transid errors
> are unlikely to be repairable. What's weird in this case is there's no
> crash or powerfailure. For transid problems to show up during normal
> operation is unusual, as if one of the drives is dropping writes
> occasionally without even needing a crash or powerfail to become
> confused.
>
>
As well as that, I would suggest you run memtest86+, because those sorts
of bit flips can well be bad memory. I speak from experience.

-- 
==

D Alexander


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: parent transid verify failed
  2021-12-31 23:09     ` Chris Murphy
@ 2022-01-01  7:33       ` Eric Levy
  2022-01-01 20:49         ` Chris Murphy
  0 siblings, 1 reply; 22+ messages in thread
From: Eric Levy @ 2022-01-01  7:33 UTC (permalink / raw)
  To: linux-btrfs

On Fri, 2021-12-31 at 16:09 -0700, Chris Murphy wrote:
> Dec 29 21:01:09 hostname kernel: sd 4:0:0:1: Warning! Received an
> indication that the LUN assignments on this target have changed. The
> Linux SCSI layer does not automatical
> ...
> Dec 30 03:47:10 hostname kernel: sd 4:0:0:1: rejecting I/O to offline
> device
> Dec 30 03:47:10 hostname kernel: blk_update_request: I/O error, dev
> sdc, sector 523542288 op 0x1:(WRITE) flags 0x104000 phys_seg 128 prio
> class 0
> 
> 
> Can you tell us more about /dev/sdc a.k.a. scsi 4:0:0:1? Because this
> device seems to be in a bad state, and is rejecting writes.
> 
> 
> --
> Chris Murphy


The hardware is an appliance with an array of SATA magnetic disks
managed in a RAID6-like volume. The volume is formatted with a top-
level Btrfs file system, on which is installed an OS that manages
services and provides an administrative interface. 

All of these components are presently reported healthy through normal
diagnostics and logging. More intensive diagnostics would include a
S.M.A.R.T. extended test of each media device, and a scrubbing of the
file system. All available indications suggest these operations would
expose no problems.

Above these lower layers runs a logical-unit manager and an iSCSI
service, allowing provisioning of logical volumes. After a volume is
provisioned, it is given appropriate permissions (e.g. R/W), and mapped
to an iSCSI target. Adding it to an existing target allows it to be
detected by an initiator (the host using the volume) without any
administrative overhead on the remote side. However, the new mapping is
not broadcast, so the initiator must request a refreshed list of
items in the target.
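On the initiator side, that refresh amounts to a session rescan, roughly
(assuming the standard open-iscsi tools are in use here):

$ sudo iscsiadm -m session --rescan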

The log message you have reproduced shows the iSCSI daemon detecting a
new volume in the target, as a consequence of my instruction from the
administrative tool to refresh the list. The volume labeled 4:0:0:1, or
/dev/sdc, already existed. The one that was added was 4:0:0:2, or
/dev/sdd.

In this case, the logical units, one pre-existing and one new, were the
volumes for the failing file system.

Since all of these operations were new to me, I tried a variety of
smaller operations at each step to finally achieve the end result.  I
have no indication that I caused any problems, but it is possible there
were side effects due to my own missteps, or quirks in the design of
the administrative system.

While I think a problem of such kind is unlikely, it seems more likely
than a problem at a lower level.

Possible reasons for the device becoming read only include the
following:

   1. As a side effect of provisioning the new volume, write privileges
      were effectively removed for the existing one.
   2. The LV backend became confused and entered a protective state.
   3. The iSCSI initiator was unable to add a new volume without affecting
      the existing one.

However, the logs show that between the detection of the new volume and
the reversion to the RO state, several add and remove operations were
completed. Without knowing about the mechanics of Btrfs, I would
postulate that these operations depend on successful write access to
all devices in the pool, including any added or removed. It appears to
me as though the volumes were healthy and accessible for at least some
time, which can be calculated from the log timestamps, long enough for
me to issue commands.

Thus, my first thought was a file-system issue, not a device issue.

Is the present indication more strongly that a) the driver refuses to
mount the file system because its state shows as corrupt, or b) the
driver aborts the mount operation because it fails to write at the
block level?

If the problem is device-level, then there is much to try, including
renewing the iSCSI login. I can also restart the daemon, reboot the
host, even restart the iSCSI backend service or appliance. 

Would any operations of such a kind be helpful to try?
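Concretely, by renewing the login I mean something along the lines of
(open-iscsi again, with the target IQN as a placeholder):

$ sudo iscsiadm -m node -T <target-iqn> -u    # log out of the session
$ sudo iscsiadm -m node -T <target-iqn> -l    # log back in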





^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: parent transid verify failed
  2021-12-31 20:33   ` Eric Levy
@ 2021-12-31 23:09     ` Chris Murphy
  2022-01-01  7:33       ` Eric Levy
  2022-01-01 20:56     ` Zygo Blaxell
  2022-01-02  7:31     ` Andrei Borzenkov
  2 siblings, 1 reply; 22+ messages in thread
From: Chris Murphy @ 2021-12-31 23:09 UTC (permalink / raw)
  To: Eric Levy; +Cc: linux-btrfs

Dec 29 21:01:09 hostname kernel: sd 4:0:0:1: Warning! Received an
indication that the LUN assignments on this target have changed. The
Linux SCSI layer does not automatical
...
Dec 30 03:47:10 hostname kernel: sd 4:0:0:1: rejecting I/O to offline device
Dec 30 03:47:10 hostname kernel: blk_update_request: I/O error, dev
sdc, sector 523542288 op 0x1:(WRITE) flags 0x104000 phys_seg 128 prio
class 0


Can you tell us more about /dev/sdc a.k.a. scsi 4:0:0:1? Because this
device seems to be in a bad state, and is rejecting writes.


--
Chris Murphy

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: parent transid verify failed
  2021-12-31 19:14 ` Zygo Blaxell
@ 2021-12-31 20:33   ` Eric Levy
  2021-12-31 23:09     ` Chris Murphy
                       ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Eric Levy @ 2021-12-31 20:33 UTC (permalink / raw)
  To: linux-btrfs

On Fri, 2021-12-31 at 14:14 -0500, Zygo Blaxell wrote:

> To be clear, do the parent transid verify failed messages appear
> _before_
> or _after_ the filesystem switches to read-only?
> 
> "After" is fine.  

"Fine" in the sense that the file system is fully recoverable, or only
that the files are not damaged, and may be copied to a different file
system?

It seems that the RO switch was before the first parent transid error.
See logs below (excerpt from more complete capture sent previously).

I am also recalling that I originally created a partition on the new
volume, which I added, but then removed, in favor of adding the entire
unpartitioned device. Is it possible the second add operation clobbered
part of the file system, even though I submitted a remove operation in
between the two add operations?

$ sudo btrfs device add /dev/sdd1 ./ -f
$ sudo btrfs device remove /dev/sdd1 ./
$ sudo btrfs device add /dev/sdd ./ -f

> Mount with '-o skip_balance'.  If you're in the "after" case then
> this
> will avoid running out of metadata space again during mount.

It seems not to work. So far, I succeeded in mounting only with the
"ro" option. I can mount with "skip_balance" and "ro" in combination.
Is that helpful? (I can also mount with "ro" only, or with "ro" and
"usebackuproot" in combination.)

$ sudo mount -o skip_balance /dev/sdc1
mount: /srv/store: wrong fs type, bad option, bad superblock on /dev/sdc1, missing codepage or helper program, or other error.

$ sudo mount -o skip_balance,ro /dev/sdc1

---

Dec 29 21:01:09 hostname kernel: sd 4:0:0:1: Warning! Received an indication that the LUN assignments on this target have changed. The Linux SCSI layer does not automatical
Dec 29 21:01:12 hostname kernel: scsi 4:0:0:2: Direct-Access     SYNOLOGY Storage          4.0  PQ: 0 ANSI: 5
Dec 29 21:01:12 hostname kernel: sd 4:0:0:2: Attached scsi generic sg5 type 0
Dec 29 21:01:12 hostname kernel: sd 4:0:0:2: [sdd] 524288000 512-byte logical blocks: (268 GB/250 GiB)
Dec 29 21:01:12 hostname kernel: sd 4:0:0:2: [sdd] Write Protect is off
Dec 29 21:01:12 hostname kernel: sd 4:0:0:2: [sdd] Mode Sense: 43 00 10 08
Dec 29 21:01:12 hostname kernel: sd 4:0:0:2: [sdd] Write cache: enabled, read cache: enabled, supports DPO and FUA
Dec 29 21:01:12 hostname kernel: sd 4:0:0:2: [sdd] Optimal transfer size 16384 logical blocks > dev_max (8192 logical blocks)
Dec 29 21:01:12 hostname kernel: sd 4:0:0:2: [sdd] Attached SCSI disk
Dec 29 21:05:56 hostname kernel:  sdd:
Dec 29 21:05:56 hostname kernel:  sdd:
Dec 29 21:05:56 hostname kernel:  sdd:
Dec 29 21:06:19 hostname kernel:  sdd: sdd1
Dec 29 21:06:20 hostname kernel:  sdd: sdd1
Dec 29 21:06:20 hostname kernel:  sdd: sdd1
Dec 29 21:06:20 hostname kernel: BTRFS: device fsid e523753b-f3fd-4586-971e-fd7333115ded devid 1 transid 5 /dev/sdd1 scanned by mkfs.btrfs (1191911)
Dec 29 21:06:23 hostname kernel:  sdd: sdd1
Dec 29 21:09:21 hostname kernel: BTRFS info (device sdc1): disk added /dev/sdd1
Dec 29 22:26:47 hostname kernel: BTRFS info (device sdc1): device deleted: /dev/sdd1
Dec 29 22:28:28 hostname kernel:  sdd: sdd1
Dec 29 22:28:28 hostname kernel: BTRFS info (device sdc1): disk added /dev/sdd
Dec 30 03:45:09 hostname kernel:  connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4461894664, last ping 4461895936, now 4461897216
Dec 30 03:45:09 hostname kernel:  connection1:0: detected conn error (1022)
Dec 30 03:47:10 hostname kernel:  session1: session recovery timed out after 120 secs
Dec 30 03:47:10 hostname kernel: sd 4:0:0:1: rejecting I/O to offline device
Dec 30 03:47:10 hostname kernel: blk_update_request: I/O error, dev sdc, sector 523542288 op 0x1:(WRITE) flags 0x104000 phys_seg 128 prio class 0
Dec 30 03:47:10 hostname kernel: blk_update_request: I/O error, dev sdc, sector 523540240 op 0x1:(WRITE) flags 0x100000 phys_seg 128 prio class 0
Dec 30 03:47:10 hostname kernel: blk_update_request: I/O error, dev sdc, sector 523538192 op 0x1:(WRITE) flags 0x100000 phys_seg 128 prio class 0
Dec 30 03:47:10 hostname kernel: blk_update_request: I/O error, dev sdc, sector 523537168 op 0x1:(WRITE) flags 0x104000 phys_seg 128 prio class 0
Dec 30 03:47:10 hostname kernel: BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
Dec 30 03:47:10 hostname kernel: blk_update_request: I/O error, dev sdc, sector 523541264 op 0x1:(WRITE) flags 0x104000 phys_seg 128 prio class 0
Dec 30 03:47:10 hostname kernel: blk_update_request: I/O error, dev sdc, sector 523539216 op 0x1:(WRITE) flags 0x104000 phys_seg 128 prio class 0
Dec 30 03:47:10 hostname kernel: BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
Dec 30 03:47:10 hostname kernel: blk_update_request: I/O error, dev sdc, sector 523543312 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
Dec 30 03:47:10 hostname kernel: BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
Dec 30 03:47:10 hostname kernel: blk_update_request: I/O error, dev sdc, sector 523635088 op 0x1:(WRITE) flags 0x800 phys_seg 64 prio class 0
Dec 30 03:47:10 hostname kernel: BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 4, rd 0, flush 0, corrupt 0, gen 0
Dec 30 03:47:10 hostname kernel: blk_update_request: I/O error, dev sdc, sector 523543320 op 0x1:(WRITE) flags 0x100000 phys_seg 81 prio class 0
Dec 30 03:47:10 hostname kernel: BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 5, rd 0, flush 0, corrupt 0, gen 0
Dec 30 03:47:10 hostname kernel: blk_update_request: I/O error, dev sdc, sector 523543968 op 0x1:(WRITE) flags 0x104000 phys_seg 128 prio class 0
Dec 30 03:47:10 hostname kernel: BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 6, rd 0, flush 0, corrupt 0, gen 0
Dec 30 03:47:10 hostname kernel: BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 7, rd 0, flush 0, corrupt 0, gen 0
Dec 30 03:47:10 hostname kernel: BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 8, rd 0, flush 0, corrupt 0, gen 0
Dec 30 03:47:10 hostname kernel: BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 9, rd 0, flush 0, corrupt 0, gen 0
Dec 30 03:47:10 hostname kernel: BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 10, rd 0, flush 0, corrupt 0, gen 0
Dec 30 03:47:10 hostname kernel: BTRFS warning (device sdc1): error -5 while searching for dev_stats item for device /dev/sdc1
Dec 30 03:47:10 hostname kernel: BTRFS warning (device sdc1): Skipping commit of aborted transaction.
Dec 30 03:47:10 hostname kernel: BTRFS: error (device sdc1) in cleanup_transaction:1975: errno=-5 IO failure
Dec 30 03:47:10 hostname kernel: BTRFS info (device sdc1): forced readonly
Dec 30 05:11:10 hostname kernel: BTRFS warning (device sdc1): sdc1 checksum verify failed on 867254272 wanted 0x62c8b548 found 0x982a1375 level 0
Dec 30 05:11:10 hostname kernel: BTRFS error (device sdc1): parent transid verify failed on 867254272 wanted 9212 found 8675
Dec 30 05:11:10 hostname kernel: BTRFS warning (device sdc1): csum hole found for disk bytenr range [271567056896, 271567060992)
Dec 30 05:11:10 hostname kernel: BTRFS error (device sdc1): parent transid verify failed on 867254272 wanted 9212 found 8675
Dec 30 05:11:10 hostname kernel: BTRFS error (device sdc1): parent transid verify failed on 867254272 wanted 9212 found 8675


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: parent transid verify failed
  2021-12-30 21:10 Eric Levy
  2021-12-30 21:47 ` Chris Murphy
@ 2021-12-31 19:14 ` Zygo Blaxell
  2021-12-31 20:33   ` Eric Levy
  1 sibling, 1 reply; 22+ messages in thread
From: Zygo Blaxell @ 2021-12-31 19:14 UTC (permalink / raw)
  To: Eric Levy; +Cc: linux-btrfs

On Thu, Dec 30, 2021 at 04:10:23PM -0500, Eric Levy wrote:
> Hello.
> 
> I had a simple Btrfs partition, with only one subvolume, about 250 Gb
> in size. As it began to fill, I added a second volume, live. By the
> time the size of the file system reached the limit for the first
> volume, the file system reverted to read only.
> 
> From journalctl, the following message has recurred, with the same
> numeric values:
> 
> BTRFS error (device sdc1): parent transid verify failed on 867434496
> wanted 9212 found 8675

To be clear, do the parent transid verify failed messages appear _before_
or _after_ the filesystem switches to read-only?

"After" is fine.  When the kernel switches btrfs to read-only, it stops
updating the disk, so pointers in memory no longer match what's on disk
and you will get a whole stream of errors that only exist in btrfs's
kernel RAM.  A umount/mount will clear those.  This is most likely caused
by running out of metadata space because you didn't balance data block
groups on the first drive after (or in some cases before) adding the
second drive.

"Before" is the unrecoverable case.  Some drives silently dropped writes,
which is a failure btrfs can detect but not recover from except in
cases where the missing data is present on some other drive (i.e. RAID1
configurations).  Depending on how "added a second volume, live" was done,
writes could be interrupted or lost on the first drive without reporting
to the kernel (e.g. bumping the cables or browning out a power supply).

Since the "after" case can happen on healthy hardware in this scenario,
but the "before" case requires a hardware failure, it's more likely
you're in the "after" case, and the filesystem can be recovered by
carefully rearranging the data on the disks.  We'll need the output of
'btrfs fi usage' to see where to start with this.

> Presently, the file system mounts only as read only. It will not mount
> in read-write, even with the usebackuproot option. 
> 
> It seems that balance and scrub are not available, either due to read-
> only mode, or some other reason. Both abort as soon as they begin to
> run.

Mount with '-o skip_balance'.  If you're in the "after" case then this
will avoid running out of metadata space again during mount.

> What is the best next step for recovery?

Confirm whether the first "parent transid verify failed" message appears
before or after the filesystem is forced read-only.  If it's before,
the best next step is mkfs and restore your backups.
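One quick way to check that ordering is to pull both kinds of message out
of the journal and compare timestamps, e.g.:

$ sudo journalctl -k | grep -E 'forced readonly|parent transid verify failed'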

If it's after, try -o skip_balance and provide us with 'btrfs fi usage'
details.

You will need to rearrange free space (balance with filters, delete some
data, or add additional drives temporarily) so that you can do a data
balance, then balance data block groups until both drives have equal free
space on them.  Also you should convert all existing metadata to raid1
profile (there's no sane use case for dup metadata on multiple drives)
but you'll have to do that after making space with data balances.
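A rough sketch of that sequence (the usage thresholds are only examples;
substitute your actual mount point):

$ sudo btrfs balance start -dusage=20 /srv/store            # compact nearly-empty data block groups first
$ sudo btrfs balance start -dusage=60 /srv/store            # then work up through fuller ones
$ sudo btrfs balance start -mconvert=raid1,soft /srv/store  # convert metadata to raid1 once there is room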

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: parent transid verify failed
  2021-12-30 21:10 Eric Levy
@ 2021-12-30 21:47 ` Chris Murphy
  2022-01-01 15:11   ` devel
  2021-12-31 19:14 ` Zygo Blaxell
  1 sibling, 1 reply; 22+ messages in thread
From: Chris Murphy @ 2021-12-30 21:47 UTC (permalink / raw)
  To: Eric Levy; +Cc: linux-btrfs

On Thu, Dec 30, 2021 at 2:10 PM Eric Levy <contact@ericlevy.name> wrote:
>
> Hello.
>
> I had a simple Btrfs partition, with only one subvolume, about 250 Gb
> in size. As it began to fill, I added a second volume, live. By the
> time the size of the file system reached the limit for the first
> volume, the file system reverted to read only.
>
> From journalctl, the following message has recurred, with the same
> numeric values:
>
> BTRFS error (device sdc1): parent transid verify failed on 867434496
> wanted 9212 found 8675
>
> Presently, the file system mounts only as read only. It will not mount
> in read-write, even with the usebackuproot option.
>
> It seems that balance and scrub are not available, either due to read-
> only mode, or some other reason. Both abort as soon as they begin to
> run.

Complete dmesg, at least since adding the 2nd device and subsequent read only.
`btrfs fi us $mountpoint` for the ro mounted file system so we can see
the space available and bg profiles
`btrfs insp dump-s $anydev` for either of the two devices
kernel version
btrfs-progs version
make/model of the two devices being used
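Spelled out, the first two of those are (substitute your actual mount
point and either member device):

$ sudo btrfs filesystem usage /mnt/point
$ sudo btrfs inspect-internal dump-super /dev/sdX
$ uname -r
$ btrfs --version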

>
> What is the best next step for recovery?

For now, take advantage of the fact it mounts read-only, backup
anything important and prepare to restore from backup. Transid errors
are unlikely to be repairable. What's weird in this case is there's no
crash or powerfailure. For transid problems to show up during normal
operation is unusual, as if one of the drives is dropping writes
occasionally without even needing a crash or powerfail to become
confused.


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 22+ messages in thread

* parent transid verify failed
@ 2021-12-30 21:10 Eric Levy
  2021-12-30 21:47 ` Chris Murphy
  2021-12-31 19:14 ` Zygo Blaxell
  0 siblings, 2 replies; 22+ messages in thread
From: Eric Levy @ 2021-12-30 21:10 UTC (permalink / raw)
  To: linux-btrfs

Hello.

I had a simple Btrfs partition, with only one subvolume, about 250 Gb
in size. As it began to fill, I added a second volume, live. By the
time the size of the file system reached the limit for the first
volume, the file system reverted to read only.

From journalctl, the following message has recurred, with the same
numeric values:

BTRFS error (device sdc1): parent transid verify failed on 867434496
wanted 9212 found 8675

Presently, the file system mounts only as read only. It will not mount
in read-write, even with the usebackuproot option. 

It seems that balance and scrub are not available, either due to read-
only mode, or some other reason. Both abort as soon as they begin to
run.

What is the best next step for recovery?



^ permalink raw reply	[flat|nested] 22+ messages in thread

* parent transid verify failed
       [not found] <E18363B1-CD81-41F4-A03C-4D09AA669915@plack.net>
@ 2015-04-28 12:34 ` Anthony Plack
  0 siblings, 0 replies; 22+ messages in thread
From: Anthony Plack @ 2015-04-28 12:34 UTC (permalink / raw)
  To: linux-btrfs

I am running kernel 4.0 and btrfs-progs mainline.  I have a backup.

Of the following commands:

btrfs check --repair device
btrfsck --repair device
mount -t btrfs -o recovery device mount && btrfs scrub start mount

--none of them remove the "parent transid verify failed" errors from the disk.

The disk was going read-only.  The disk now mounts and seems to be fine.  However, these "errors" persist.

Is there any tool, other than zeroing the log, which will "repair" the log?
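For reference, the log-zeroing step referred to above would presumably be
something like:

btrfs rescue zero-log device

(which discards the log tree rather than repairing it; "device" is a
placeholder).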

^ permalink raw reply	[flat|nested] 22+ messages in thread

* parent transid verify failed
@ 2010-09-06 17:28 Jan Steffens
  0 siblings, 0 replies; 22+ messages in thread
From: Jan Steffens @ 2010-09-06 17:28 UTC (permalink / raw)
  To: linux-btrfs

After an unclean shutdown, my btrfs is now unmountable:

device label root devid 1 transid 375202 /dev/sdc4
parent transid verify failed on 53984886784 wanted 375202 found 375201
parent transid verify failed on 53984886784 wanted 375202 found 375201
parent transid verify failed on 53984886784 wanted 375202 found 375201
btrfs: open_ctree failed

btrfsck aborts:

couldn't open because of unsupported option features (2).
btrfsck: disk-io.c:682: open_ctree_fd: Assertion `!(1)' failed.
[1]    14899 abort      btrfsck /dev/sdc4

Is there any way to recover the filesystem?

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2022-01-03  7:41 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-11 10:01 parent transid verify failed Massimo B.
  -- strict thread matches above, loose matches on Subject: below --
2021-12-30 21:10 Eric Levy
2021-12-30 21:47 ` Chris Murphy
2022-01-01 15:11   ` devel
2021-12-31 19:14 ` Zygo Blaxell
2021-12-31 20:33   ` Eric Levy
2021-12-31 23:09     ` Chris Murphy
2022-01-01  7:33       ` Eric Levy
2022-01-01 20:49         ` Chris Murphy
2022-01-01 21:57           ` Eric Levy
2022-01-01 20:56     ` Zygo Blaxell
2022-01-01 21:58       ` Eric Levy
2022-01-02  0:15         ` Zygo Blaxell
2022-01-02  0:55           ` Eric Levy
2022-01-02  3:27             ` Zygo Blaxell
2022-01-02  4:03               ` Eric Levy
2022-01-02  5:57                 ` Zygo Blaxell
2022-01-02 10:17                   ` Eric Levy
2022-01-03  7:41                 ` Chris Murphy
2022-01-02  7:31     ` Andrei Borzenkov
     [not found] <E18363B1-CD81-41F4-A03C-4D09AA669915@plack.net>
2015-04-28 12:34 ` Anthony Plack
2010-09-06 17:28 Jan Steffens
