All of lore.kernel.org
 help / color / mirror / Atom feed
From: Chris Murphy <lists@colorremedies.com>
To: Christian Wimmer <telefonchris@icloud.com>
Cc: Chris Murphy <lists@colorremedies.com>,
	Qu Wenruo <quwenruo.btrfs@gmx.com>, Qu WenRuo <wqu@suse.com>,
	Anand Jain <anand.jain@oracle.com>,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: 12 TB btrfs file system on virtual machine broke again (third time)
Date: Sat, 11 Jan 2020 12:46:37 -0700	[thread overview]
Message-ID: <CAJCQCtRkZPq-k6pX3bCJmj25HY4eDdAEUcgLwGSh_Mi6VEqdiQ@mail.gmail.com> (raw)
In-Reply-To: <86147601-37F0-49C0-B6F8-0F5245750450@icloud.com>

On Sat, Jan 11, 2020 at 10:23 AM Christian Wimmer
<telefonchris@icloud.com> wrote:
>
> Hi guys,
>
> here the last lines before the suspend and after it:
>
>
> 66939.439945] Buffer I/O error on dev loop1, logical block 0, async page read
> [66939.621686] print_req_error: I/O error, dev loop1, sector 83885952
> [66939.621738] print_req_error: I/O error, dev loop1, sector 83885952
> [66939.621740] Buffer I/O error on dev loop1, logical block 10485744, async page read

I don't know what /dev/loop1 is or the origin of the i/o error, but
I'm gonna guess it's a problem with the underlying device, or a bug
related to either the VM or the host or guest kernels or some
interaction of the three.

> [509652.385601] sd 6:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [509652.385623] sd 6:0:0:0: [sdb] tag#0 Sense Key : Illegal Request [current]
> [509652.385628] sd 6:0:0:0: [sdb] tag#0 Add. Sense: Invalid command operation code
> [509652.385631] sd 6:0:0:0: [sdb] tag#0 CDB: Read(16) 88 00 00 00 00 00 00 16 b2 c0 00 00 00 20 00 00
> [509652.385634] print_req_error: critical target error, dev sdb, sector 1487552
> [509652.385642] BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
> [509652.386117] xhci_hcd 0000:00:1d.6: Mismatch between completed Set TR Deq Ptr command & xHCI internal state.
> [509652.386120] xhci_hcd 0000:00:1d.6: ep deq seg = ffff880bd3fbab40, deq ptr = ffff880bd3b65150
> [509652.386132] sd 6:0:0:0: [sdb] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [509652.386135] sd 6:0:0:0: [sdb] tag#1 Sense Key : Illegal Request [current]
> [509652.386137] sd 6:0:0:0: [sdb] tag#1 Add. Sense: Invalid command operation code
> [509652.386139] sd 6:0:0:0: [sdb] tag#1 CDB: Write(16) 8a 00 00 00 00 00 01 1d 9a 18 00 00 02 00 00 00
> [509652.386141] print_req_error: critical target error, dev sdb, sector 18717208
> [509652.386158] BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 1, rd 1, flush 0, corrupt 0, gen 0
> [509682.560535] xhci_hcd 0000:00:1d.6: Mismatch between completed Set TR Deq Ptr command & xHCI internal state.
> [509682.560537] xhci_hcd 0000:00:1d.6: ep deq seg = ffff880bd3fbab40, deq ptr = ffff880bd3b65190
> [509682.560566] sd 6:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [509682.560567] sd 6:0:0:0: [sdb] tag#0 Sense Key : Illegal Request [current]
> [509682.560569] sd 6:0:0:0: [sdb] tag#0 Add. Sense: Invalid command operation code
> [509682.560571] sd 6:0:0:0: [sdb] tag#0 CDB: Read(16) 88 00 00 00 00 00 00 36 b2 c0 00 00 00 20 00 00
> [509682.560572] print_req_error: critical target error, dev sdb, sector 3584704
> [509682.560579] BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 1, rd 2, flush 0, corrupt 0, gen 0
> [509682.560667] BTRFS: error (device sdb1) in btrfs_start_dirty_block_groups:3716: errno=-5 IO failure
> [509682.560669] BTRFS info (device sdb1): forced readonly
> [509712.672325] xhci_hcd 0000:00:1d.6: Mismatch between completed Set TR Deq Ptr command & xHCI internal state.
> [509712.672330] xhci_hcd 0000:00:1d.6: ep deq seg = ffff880bd3fbab40, deq ptr = ffff880bd3b651d0
> [509712.672373] sd 6:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [509712.672377] sd 6:0:0:0: [sdb] tag#0 Sense Key : Illegal Request [current]
> [509712.672379] sd 6:0:0:0: [sdb] tag#0 Add. Sense: Invalid command operation code
> [509712.672382] sd 6:0:0:0: [sdb] tag#0 CDB: Read(16) 88 00 00 00 00 00 00 16 b2 c0 00 00 00 20 00 00
> [509712.672384] print_req_error: critical target error, dev sdb, sector 1487552
> [509712.672388] BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 1, rd 3, flush 0, corrupt 0, gen 0
> [509742.783978] xhci_hcd 0000:00:1d.6: Mismatch between completed Set TR Deq Ptr command & xHCI internal state.
> [509742.783982] xhci_hcd 0000:00:1d.6: ep deq seg = ffff880bd3fbab40, deq ptr = ffff880bd3b65210
> [509742.784016] sd 6:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [509742.784019] sd 6:0:0:0: [sdb] tag#0 Sense Key : Illegal Request [current]
> [509742.784021] sd 6:0:0:0: [sdb] tag#0 Add. Sense: Invalid command operation code
> [509742.784024] sd 6:0:0:0: [sdb] tag#0 CDB: Read(16) 88 00 00 00 00 00 00 36 b2 c0 00 00 00 20 00 00
> [509742.784026] print_req_error: critical target error, dev sdb, sector 3584704
> [509742.784031] BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 1, rd 4, flush 0, corrupt 0, gen 0
> [509742.784079] BTRFS: error (device sdb1) in btrfs_start_dirty_block_groups:3716: errno=-5 IO failure
> [509742.784082] BTRFS warning (device sdb1): Skipping commit of aborted transaction.
> [509742.784084] BTRFS: error (device sdb1) in cleanup_transaction:1881: errno=-5 IO failure
> [509742.784084] BTRFS info (device sdb1): delayed_refs has NO entry
> [512066.481724] usb 4-1: USB disconnect, device number 2
> [512066.484604] sd 6:0:0:0: [sdb] Synchronizing SCSI cache
> [512066.661516] usb 2-1: USB disconnect, device number 2
> [512066.661725] usblp0: removed

There are multiple hardware related errors here. Timing wise, it looks
like some kind of USB bus related error "xhci_hcd 0000:00:1d.6:
Mismatch between..." and maybe in the course of that error is what
causes the drive, /dev/sdb, to think a command it received is invalid,
hence "sd 6:0:0:0: [sdb] tag#0 Add. Sense: Invalid command operation
code" and the subsequent sector error, and the subsequent Btrfs error.

So Btrfs is getting confused because lower layers it has no control
over are doing the wrong thing; resulting in either dropped reads or
writes. Dropped writes can cause permanent corruption, depending on
what the writes are.

And then it looks like a USB device is disconnected.

Is /dev/sdb a USB stick or  drive in a USB enclosure? That's not
working out well, whatever it is.


> [512066.716156] usb 1-1: USB disconnect, device number 2
> [512066.751428] sd 6:0:0:0: [sdb] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
> [512066.811207] usb 2-2: USB disconnect, device number 3
> [512066.811350] usblp1: removed
> [512066.963131] usb 2-3: USB disconnect, device number 4
> [512066.963296] usblp2: removed
> [512067.091114] usb 4-1: new SuperSpeed USB device number 3 using xhci_hcd
> [512067.107446] usb 1-1: new full-speed USB device number 3 using uhci_hcd
> [512067.113881] usb 2-4: USB disconnect, device number 5
> [512067.114143] usblp3: removed
> [512067.126193] usb 4-1: New USB device found, idVendor=2109, idProduct=0711
> [512067.126195] usb 4-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
> [512067.126196] usb 4-1: Product: 20231
> [512067.126196] usb 4-1: Manufacturer: Ugreen
> [512067.126197] usb 4-1: SerialNumber: 000000128DA4
> [512067.140176] scsi host9: uas
> [512067.141472] scsi 9:0:0:0: Direct-Access     ST6000DM 004-2EH11C       DN02 PQ: 0 ANSI: 6
> [512067.142615] sd 9:0:0:0:


The problem with snippets is that they're over too small of a time
period, and after problems already started, to determine the sequence
of what went wrong, and what else is involved in the confusion. The
problem with getting the entire dmesg is that it's tedious to read
through to try and find the problem.

The gist though is the setup is flawed and now you've had three
failures it's telling you something is wrong.

Any hardware errors inside a VM guest, can be either hardware or
software errors in the host. It could be the hypervisor. It could be
the host kernel. Have you checked the host kernel messages while these
errors are happening inside the VM? What about the hypervisor logs?





-- 
Chris Murphy

  reply	other threads:[~2020-01-11 19:47 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-06  3:44 [PATCH] btrfs-progs: Skip device tree when we failed to read it Qu Wenruo
2019-12-06  6:12 ` Anand Jain
2019-12-06 15:50   ` Christian Wimmer
2019-12-06 16:34   ` Christian Wimmer
     [not found]   ` <762365A0-8BDF-454B-ABA9-AB2F0C958106@icloud.com>
2019-12-07  1:16     ` Qu WenRuo
2019-12-07  3:47       ` Christian Wimmer
2019-12-07  4:31         ` Qu Wenruo
2019-12-07 13:03           ` Christian Wimmer
2019-12-07 14:10             ` Qu Wenruo
2019-12-07 14:25               ` Christian Wimmer
2019-12-07 16:44               ` Christian Wimmer
2019-12-08  1:21                 ` Qu WenRuo
2019-12-10 21:25                   ` Christian Wimmer
2019-12-11  0:36                     ` Qu Wenruo
2019-12-11 15:57                       ` Christian Wimmer
     [not found]           ` <9FB359ED-EAD4-41DD-B846-1422F2DC4242@icloud.com>
2020-01-04 17:07             ` 12 TB btrfs file system on virtual machine broke again Christian Wimmer
2020-01-05  4:03               ` Chris Murphy
2020-01-05 13:40                 ` Christian Wimmer
2020-01-05 14:07                   ` Martin Raiber
2020-01-05 14:14                     ` Christian Wimmer
2020-01-05 14:23                       ` Christian Wimmer
2020-01-05  4:25               ` Qu Wenruo
2020-01-05 14:17                 ` Christian Wimmer
2020-01-05 18:50                   ` Chris Murphy
2020-01-05 19:18                     ` Christian Wimmer
2020-01-05 19:36                       ` Chris Murphy
2020-01-05 19:49                         ` Christian Wimmer
2020-01-05 19:52                         ` Christian Wimmer
2020-01-05 20:34                           ` Chris Murphy
2020-01-05 20:36                             ` Chris Murphy
     [not found]                         ` <3F43DDB8-0372-4CDE-B143-D2727D3447BC@icloud.com>
2020-01-05 20:30                           ` Chris Murphy
2020-01-05 20:36                             ` Christian Wimmer
2020-01-05 21:13                               ` Chris Murphy
2020-01-05 21:58                                 ` Christian Wimmer
2020-01-05 22:28                                   ` Chris Murphy
2020-01-06  1:31                                     ` Christian Wimmer
2020-01-06  1:33                                     ` Christian Wimmer
2020-01-11 17:04                                     ` 12 TB btrfs file system on virtual machine broke again (third time) Christian Wimmer
2020-01-11 17:23                                     ` Christian Wimmer
2020-01-11 19:46                                       ` Chris Murphy [this message]
2020-01-13 19:41                                         ` 12 TB btrfs file system on virtual machine broke again (fourth time) Christian Wimmer
2020-01-13 20:03                                           ` Chris Murphy
2020-01-31 16:35                                             ` btrfs not booting any more Christian Wimmer
2020-05-08 12:20                                             ` btrfs reports bad key ordering after out of memory situation Christian Wimmer
2020-01-05 23:50                   ` 12 TB btrfs file system on virtual machine broke again Qu Wenruo
2020-01-06  1:32                     ` Christian Wimmer
2020-01-11  7:25                     ` Andrei Borzenkov
2021-10-15 21:01                     ` need help in a broken 2TB BTRFS partition Christian Wimmer
2021-10-16 10:08                       ` Qu Wenruo
2021-10-16 17:29                         ` Christian Wimmer
2021-10-16 22:55                           ` Qu Wenruo
2021-10-16 17:35                         ` Christian Wimmer
2021-10-16 23:27                           ` Qu Wenruo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAJCQCtRkZPq-k6pX3bCJmj25HY4eDdAEUcgLwGSh_Mi6VEqdiQ@mail.gmail.com \
    --to=lists@colorremedies.com \
    --cc=anand.jain@oracle.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=quwenruo.btrfs@gmx.com \
    --cc=telefonchris@icloud.com \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.