From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Andy Leadbetter <andy.leadbetter@theleadbetters.com>,
	linux-btrfs@vger.kernel.org
Subject: Re: [[Missing subject]]
Date: Fri, 23 Nov 2018 17:55:40 +0800
Message-ID: <7526c43f-dac7-3a32-9c53-42f3f6fa9072@gmx.com>
In-Reply-To: <CAJUWh6qyHerKg=-oaFN+USa10_Aag5+SYjBOeLCX1qM+WcDUwA@mail.gmail.com>


On 2018/11/23 2:41 PM, Andy Leadbetter wrote:
> I have a failing 2TB disk that is part of a 4 disk RAID 6 system.  I
> have added a new 2TB disk to the computer, and started a BTRFS replace
> for the old and new disk.  The process starts correctly however some
> hours into the job, there is an error and a kernel oops. The relevant
> log is below.
> 
> The disks are configured on top of bcache, in 5 arrays with a small
> 128GB SSD cache shared.  The system in this configuration has worked
> perfectly for 3 years, until 2 weeks ago csum errors started
> appearing.  I have a crashplan backup of all files on the disk, so I
> am not concerned about data loss, but I would like to avoid rebuilding
> the system.
> 
> btrfs dev stats shows
> [/dev/bcache0].write_io_errs    0
> [/dev/bcache0].read_io_errs     0
> [/dev/bcache0].flush_io_errs    0
> [/dev/bcache0].corruption_errs  0
> [/dev/bcache0].generation_errs  0
> [/dev/bcache1].write_io_errs    0
> [/dev/bcache1].read_io_errs     20
> [/dev/bcache1].flush_io_errs    0
> [/dev/bcache1].corruption_errs  0
> [/dev/bcache1].generation_errs  14

Unfortunately, these generation errors are not a sign of a degrading disk.
Something has gone really wrong and corrupted some metadata.

In such a case, it's recommended to run "btrfs check --readonly" to
show how serious the problem is.

It could be corruption in some subvolume or in some non-essential tree,
but either way the generation mismatch is a problem for which neither the
kernel nor btrfs-progs has a really good solution.

So please at least consider rebuilding the fs.

Apart from that, please provide the versions of all the kernels that have
run on this fs, along with the mount options used.
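
As a rough sketch of how the mount options could be collected, the script
below lists the btrfs entries from /proc/mounts-style text and reports the
running kernel release. The function names are made up for illustration and
are not part of btrfs-progs. It only sees the kernel that is running right
now, so earlier kernel versions still have to be listed by hand.

```python
import platform


def btrfs_mounts(mounts_text):
    """Return (device, mountpoint, options) for every btrfs line.

    Expects text in /proc/mounts layout:
    device mountpoint fstype options dump pass
    """
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[2] == "btrfs":
            entries.append((fields[0], fields[1], fields[3].split(",")))
    return entries


def report(mounts_path="/proc/mounts"):
    """Build a short text report: running kernel plus btrfs mount options."""
    with open(mounts_path) as f:
        mounts = btrfs_mounts(f.read())
    lines = ["running kernel: " + platform.release()]
    for device, mountpoint, options in mounts:
        lines.append(f"{device} on {mountpoint}: {','.join(options)}")
    return "\n".join(lines)
```

Calling print(report()) on the affected machine gives text that can be
pasted into a reply.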

We have had similar reports of such generation mismatches, but we still
don't have a convincing cause for them.
Suspects range from old kernels to space cache corruption to power loss
combined with space cache corruption.
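
To see at a glance which devices carry generation errors, the "btrfs dev
stats" output can be summarized with a few lines of Python. This is only a
sketch: the function names are hypothetical, and the sample text is an
excerpt of the counters quoted in this thread.

```python
import re

# Matches lines such as "[/dev/bcache1].generation_errs  14",
# with or without a leading "> " quote marker.
STAT_LINE = re.compile(r"\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)")


def parse_dev_stats(text):
    """Map device -> {counter name: value} from dev stats output."""
    stats = {}
    for line in text.splitlines():
        m = STAT_LINE.search(line)
        if m:
            stats.setdefault(m["dev"], {})[m["counter"]] = int(m["value"])
    return stats


def devices_with_errors(stats, counter="generation_errs"):
    """Return only the devices whose chosen counter is non-zero."""
    return {dev: c[counter] for dev, c in stats.items() if c.get(counter, 0) > 0}


# Excerpt of the counters reported in this thread.
SAMPLE = """\
[/dev/bcache0].generation_errs  0
[/dev/bcache1].read_io_errs     20
[/dev/bcache1].generation_errs  14
[/dev/bcache3].generation_errs  19
[/dev/bcache2].generation_errs  2
"""

print(devices_with_errors(parse_dev_stats(SAMPLE)))
```

With the excerpt above, three of the four devices show generation errors,
which fits metadata corruption better than a single failing disk.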

> [/dev/bcache3].write_io_errs    0
> [/dev/bcache3].read_io_errs     0
> [/dev/bcache3].flush_io_errs    0
> [/dev/bcache3].corruption_errs  0
> [/dev/bcache3].generation_errs  19
> [/dev/bcache2].write_io_errs    0
> [/dev/bcache2].read_io_errs     0
> [/dev/bcache2].flush_io_errs    0
> [/dev/bcache2].corruption_errs  0
> [/dev/bcache2].generation_errs  2
> 
> and a SMART test of the backing disk /dev/bcache1 shows a high read
> error rate, and a lot of reallocated sectors.  The disk is 10 years old,
> and has clearly started to fail.
> 
> I've tried the latest kernel, and the latest tools, but nothing will
> allow me to replace, or delete the failed disk.
> 
> [  884.171025] BTRFS info (device bcache0): dev_replace from
> /dev/bcache1 (devid 2) to /dev/bcache4 started
> [ 3301.101958] BTRFS error (device bcache0): parent transid verify
> failed on 8251260944384 wanted 640926 found 640907
> [ 3301.241214] BTRFS error (device bcache0): parent transid verify
> failed on 8251260944384 wanted 640926 found 640907
> [ 3301.241398] BTRFS error (device bcache0): parent transid verify
> failed on 8251260944384 wanted 640926 found 640907
> [ 3301.241513] BTRFS error (device bcache0): parent transid verify
> failed on 8251260944384 wanted 640926 found 640907

If btrfs check --readonly only reports this problem, it may be possible
for us to fix it.

Please also dump this tree block with:
# btrfs ins dump-tree -b 8251260944384 /dev/bcache0

If btrfs check --readonly reports a lot of problems, then it's strongly
recommended to rebuild the filesystem.
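
If the log contains many "parent transid verify failed" lines, a small
script can reduce them to the distinct blocks and print one dump-tree
command per block. This is a sketch under the assumption that the messages
have the exact wording seen in this thread. The helper names are invented
for illustration, and the sample is two of the quoted log entries.

```python
import re

TRANSID = re.compile(
    r"parent transid verify failed on (?P<bytenr>\d+) "
    r"wanted (?P<wanted>\d+) found (?P<found>\d+)"
)


def transid_failures(log_text):
    """Count occurrences of each (bytenr, wanted, found) triple.

    Mail clients wrap long log lines and add "> " quote markers, so the
    markers are stripped and the lines joined before matching.
    """
    flat = " ".join(ln.lstrip("> ").strip() for ln in log_text.splitlines())
    failures = {}
    for m in TRANSID.finditer(flat):
        key = (int(m["bytenr"]), int(m["wanted"]), int(m["found"]))
        failures[key] = failures.get(key, 0) + 1
    return failures


def dump_tree_commands(failures, device):
    """One dump-tree command line per distinct block address."""
    blocks = sorted({bytenr for bytenr, _, _ in failures})
    return [f"btrfs ins dump-tree -b {b} {device}" for b in blocks]


# Two of the log entries quoted in this thread, wrapped as in the mail.
SAMPLE = """\
> [ 3301.101958] BTRFS error (device bcache0): parent transid verify
> failed on 8251260944384 wanted 640926 found 640907
> [ 3301.241214] BTRFS error (device bcache0): parent transid verify
> failed on 8251260944384 wanted 640926 found 640907
"""

for (bytenr, wanted, found), count in transid_failures(SAMPLE).items():
    print(f"block {bytenr}: {wanted - found} generations behind, seen {count}x")
for cmd in dump_tree_commands(transid_failures(SAMPLE), "/dev/bcache0"):
    print(cmd)
```

Here all failures point at the same block, so only one dump is needed.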

Thanks,
Qu

> [ 3302.381094] BTRFS error (device bcache0):
> btrfs_scrub_dev(/dev/bcache1, 2, /dev/bcache4) failed -5
> [ 3302.394612] WARNING: CPU: 0 PID: 5936 at
> /build/linux-5s7Xkn/linux-4.15.0/fs/btrfs/dev-replace.c:413
> btrfs_dev_replace_start+0x281/0x320 [btrfs]
> [ 3302.394613] Modules linked in: btrfs zstd_compress xor raid6_pq
> bcache intel_rapl x86_pkg_temp_thermal intel_powerclamp
> snd_hda_codec_hdmi coretemp kvm_intel snd_hda_codec_realtek kvm
> snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core
> irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hwdep
> snd_pcm pcbc snd_seq_midi aesni_intel snd_seq_midi_event joydev
> input_leds aes_x86_64 snd_rawmidi crypto_simd glue_helper snd_seq
> eeepc_wmi cryptd asus_wmi snd_seq_device snd_timer wmi_bmof
> sparse_keymap snd intel_cstate intel_rapl_perf soundcore mei_me mei
> shpchp mac_hid sch_fq_codel acpi_pad parport_pc ppdev lp parport
> ip_tables x_tables autofs4 overlay nls_iso8859_1 dm_mirror
> dm_region_hash dm_log hid_generic usbhid hid uas usb_storage i915
> i2c_algo_bit drm_kms_helper syscopyarea sysfillrect
> [ 3302.394640]  sysimgblt fb_sys_fops r8169 mxm_wmi mii drm ahci
> libahci wmi video
> [ 3302.394646] CPU: 0 PID: 5936 Comm: btrfs Not tainted
> 4.15.0-20-generic #21-Ubuntu
> [ 3302.394646] Hardware name: System manufacturer System Product
> Name/H110M-R, BIOS 3404 10/10/2017
> [ 3302.394658] RIP: 0010:btrfs_dev_replace_start+0x281/0x320 [btrfs]
> [ 3302.394659] RSP: 0018:ffffa8b582b5fd18 EFLAGS: 00010282
> [ 3302.394660] RAX: 00000000fffffffb RBX: ffff927d3afe0000 RCX: 0000000000000000
> [ 3302.394660] RDX: 0000000000000001 RSI: 0000000000000296 RDI: ffff927d3afece90
> [ 3302.394661] RBP: ffffa8b582b5fd68 R08: 0000000000000000 R09: ffffa8b582b5fc18
> [ 3302.394662] R10: ffffa8b582b5fd10 R11: 0000000000000000 R12: ffff927d3afece20
> [ 3302.394662] R13: ffff927d34b59421 R14: ffff927d34b59020 R15: 0000000000000001
> [ 3302.394663] FS:  00007fba4831c8c0(0000) GS:ffff927df6c00000(0000)
> knlGS:0000000000000000
> [ 3302.394664] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 3302.394664] CR2: 00002b9b83db85b8 CR3: 0000000164d3a002 CR4: 00000000003606f0
> [ 3302.394665] Call Trace:
> [ 3302.394676]  btrfs_dev_replace_by_ioctl+0x39/0x60 [btrfs]
> [ 3302.394686]  btrfs_ioctl+0x1988/0x2080 [btrfs]
> [ 3302.394689]  ? iput+0x8d/0x220
> [ 3302.394690]  ? __blkdev_put+0x199/0x1f0
> [ 3302.394692]  do_vfs_ioctl+0xa8/0x630
> [ 3302.394701]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
> [ 3302.394703]  ? do_vfs_ioctl+0xa8/0x630
> [ 3302.394704]  ? do_sigaction+0xb4/0x1e0
> [ 3302.394706]  SyS_ioctl+0x79/0x90
> [ 3302.394708]  do_syscall_64+0x73/0x130
> [ 3302.394710]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> [ 3302.394711] RIP: 0033:0x7fba471085d7
> [ 3302.394712] RSP: 002b:00007ffe5af753b8 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000010
> [ 3302.394713] RAX: ffffffffffffffda RBX: 000055a8eecfb2a0 RCX: 00007fba471085d7
> [ 3302.394713] RDX: 00007ffe5af757f8 RSI: 00000000ca289435 RDI: 0000000000000003
> [ 3302.394714] RBP: 00000000ffffffff R08: 0000000000000000 R09: 00007fba47160f00
> [ 3302.394715] R10: 0000000000000008 R11: 0000000000000246 R12: 00007ffe5af7880d
> [ 3302.394715] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000004
> [ 3302.394716] Code: 88 98 00 00 00 48 8b b0 90 00 00 00 6a 01 e8 67
> 9b fe ff 48 89 df 89 c6 e8 0d f8 ff ff 83 f8 8d 5a 74 6d 85 c0 0f 84
> 19 fe ff ff <0f> 0b e9 12 fe ff ff 89 c2 48 c7 c6 81 c1 c1 c0 48 89 df
> e8 f7
> [ 3302.394736] ---[ end trace a5f8501fc7a5d644 ]---e
> 

