* Re: Rough (re)start with btrfs
       [not found] <em9eba60a7-2c0d-4399-8712-c134f0f50d4d@ryzen>
@ 2019-05-02 23:40 ` Qu Wenruo
  2019-05-03  5:41   ` Re[2]: " Hendrik Friedel
  2019-05-03  5:58   ` Chris Murphy
  2019-05-03  5:52 ` Re[2]: " Chris Murphy
  1 sibling, 2 replies; 9+ messages in thread
From: Qu Wenruo @ 2019-05-02 23:40 UTC (permalink / raw)
  To: Hendrik Friedel, Chris Murphy, Btrfs BTRFS


[-- Attachment #1.1: Type: text/plain, Size: 2800 bytes --]



On 2019/5/3 3:02 AM, Hendrik Friedel wrote:
> Hello,
> 
> thanks for your replies. I appreciate it!
>>>  I am using btrfs-progs v4.20.2 and debian stretch with
>>>  4.19.0-0.bpo.2-amd64 (I think this is the latest kernel available in
>>>  stretch; please correct me if I am wrong).
>>
>> What scheduler is being used for the drive?
>>
>> # cat /sys/block/<dev>/queue/scheduler
> [mq-deadline] none
> 
>> If it's none, then kernel version and scheduler aren't likely related
>> to what you're seeing.
>>
>> It's not immediately urgent, but I would still look for something
>> newer, just because the 4.19 series already has 37 upstream updates
>> released, each with dozens of fixes; easily there are over 1000 fixes
>> available in total. I'm not a Debian user, but I think there's
>> stretch-backports that has newer kernels?
>> http://jensd.be/818/linux/install-a-newer-kernel-in-debian-9-stretch-stable
>>
> 
> Unfortunately, backports provides 4.19 as the latest.
> I am now manually compiling 5.0. Last time I did that, I was less than
> half my current age :-)
> 
>> We need the entire dmesg so we can see if there are any earlier
>> complaints by the drive or the link. Can you attach the entire dmesg
>> as a file?
> Done (also the two smartctl outputs).
> 
>>Have you tried stopping the workload, to see if the timeout disappears?
> 
> Unfortunately not. I had the impression that the system did not react
> anymore. I Ctrl-C'd and rebooted.
> I was copying all the stuff from my old drive to the new one. I should
> say that the workload was high, but not exceptional: just one or two
> copy jobs.

Then it's some deadlock, not a regular high-load timeout.

> Also, the btrfs drive was at an advantage:
> 1) it had btrfs ;-) (the other had ext4)
> 2) it did not need to search
> 3) it was connected via SATA (and not USB3 as the source)
> 
> The drive does not seem to be an SMR drive (WD80EZAZ).
> 
>> If it just disappears after some time, then it's the disk being too slow
>> under too heavy a load, combined with btrfs' low-concurrency design,
>> leading to the problem.
> 
> I was tempted to ask whether this should be fixed. On the other hand, I
> am not even sure anything bad happened (except, well, the system -at
> least the copy- seemed to hang).

Definitely needs to be fixed.

With the full dmesg, it's now clear that this is a real deadlock.
Something is wrong with the free space cache, blocking the whole fs from
being committed.

If you still want to try btrfs, you could try the "nospace_cache" mount option.
The free space cache of btrfs is just an optimization; you can completely
ignore it with only a minor performance drop.
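
For example, a one-time mount with that option (the device and mount point
here are placeholders, not your actual setup):

# mount -o nospace_cache /dev/sdX1 /mnt

Adding nospace_cache to the fstab options for the filesystem makes it
permanent.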

Thanks,
Qu

> 
> By the way: I ran a scrub and a smartctl -t long. Both without errors.
> 
> Greetings,
> Hendrik


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]


* Re[2]: Rough (re)start with btrfs
  2019-05-02 23:40 ` Rough (re)start with btrfs Qu Wenruo
@ 2019-05-03  5:41   ` Hendrik Friedel
  2019-05-03  6:05     ` Chris Murphy
  2019-05-03  7:34     ` Qu Wenruo
  2019-05-03  5:58   ` Chris Murphy
  1 sibling, 2 replies; 9+ messages in thread
From: Hendrik Friedel @ 2019-05-03  5:41 UTC (permalink / raw)
  To: Qu Wenruo, Chris Murphy, Btrfs BTRFS

Hello,

By the way: I think my mail did not appear on the list, but only
reached Chris and Qu directly. I just tried to re-send it. Could this be
caused by
1) me not being a subscriber of the list,
2) combined with me sending attachments?
I did *not* get any error message from the server.

>>  I was tempted to ask whether this should be fixed. On the other hand, I
>>  am not even sure anything bad happened (except, well, the system -at
>>  least the copy- seemed to hang).
>
>Definitely needs to be fixed.
>
>With the full dmesg, it's now clear that this is a real deadlock.
>Something is wrong with the free space cache, blocking the whole fs from
>being committed.
>
So, what's the next step? Shall I open a bug report somewhere, or is it 
already on some list?

>If you still want to try btrfs, you could try the "nospace_cache" mount option.
>The free space cache of btrfs is just an optimization; you can completely
>ignore it with only a minor performance drop.
>
I will try that, yes.
Can you confirm that it is unlikely that I lost any data / damaged the
filesystem?

Regards,
Hendrik



* Re: Re[2]: Rough (re)start with btrfs
       [not found] <em9eba60a7-2c0d-4399-8712-c134f0f50d4d@ryzen>
  2019-05-02 23:40 ` Rough (re)start with btrfs Qu Wenruo
@ 2019-05-03  5:52 ` Chris Murphy
  1 sibling, 0 replies; 9+ messages in thread
From: Chris Murphy @ 2019-05-03  5:52 UTC (permalink / raw)
  To: Hendrik Friedel; +Cc: Chris Murphy, Btrfs BTRFS, Qu Wenruo

On Thu, May 2, 2019 at 1:02 PM Hendrik Friedel <hendrik@friedels.name> wrote:
>
> >What scheduler is being used for the drive?
> >
> ># cat /sys/block/<dev>/queue/scheduler
> [mq-deadline] none

At first I thought you might be running into this bug:
https://lwn.net/Articles/774440/

However:

[Mo Apr 29 20:44:47 2019]       Not tainted 4.19.0-0.bpo.2-amd64 #1
Debian 4.19.16-1~bpo9+1

This is actually based on 4.19.16, which has the fix for that.


[Mo Apr 29 06:44:32 2019] systemd[1]: apt-daily-upgrade.timer: Adding
36min 35.299087s random time.
[Mo Apr 29 20:44:47 2019] INFO: task btrfs-transacti:10227 blocked for
more than 120 seconds.

Literally nothing for hours before the blocking. And I don't see
anything off during device discovery.

Qu would know better, but usually developers ask for sysrq+w when
there are blocked tasks.
https://www.kernel.org/doc/html/v4.11/admin-guide/sysrq.html

Basically, as root, issue:
# echo 1 >/proc/sys/kernel/sysrq
# echo w > /proc/sysrq-trigger

What I do is run the first command and type out the second command but
do not press return; in another shell reproduce the hang, and then go
back to the first shell and hit return. That way it doesn't take a
minute or two to type out during the hang. The result appears in
dmesg, so stop the operation causing the hang if possible and then
'dmesg > dmesg.txt' and attach it. Also, you'll want to reboot with
'log_buf_len=1M' because the sysrq+w output that gets dumped to dmesg will
fill up the kernel message buffer.
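
One way to make that boot parameter persistent on a Debian-style setup is
via GRUB (a sketch; the existing "quiet" value in the variable is an
assumption). In /etc/default/grub:

GRUB_CMDLINE_LINUX_DEFAULT="quiet log_buf_len=1M"

then regenerate the config and reboot:

# update-grub

Alternatively, edit the kernel command line from the GRUB menu for a
single boot.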

> Done (also the two smartctl outputs).

I don't see anything weird there either. The errors are a little odd,
but they predate the Btrfs error by a lot.


> I was tempted to ask whether this should be fixed. On the other hand, I am not even sure anything bad happened (except, well, the system -at least the copy- seemed to hang).

It could be a bug somewhere, but the question is where. The workload is
only copying? That seems trivial and not prone to lock contention.

You know what? Try changing the scheduler from mq-deadline to none.
Change nothing else. Now try to reproduce. Let's see if it still
happens.
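
Something like this (sdX is a placeholder for your drive; the setting
does not survive a reboot):

# cat /sys/block/sdX/queue/scheduler
[mq-deadline] none
# echo none > /sys/block/sdX/queue/scheduler
# cat /sys/block/sdX/queue/scheduler
mq-deadline [none]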

Also, what are the mount options?

-- 
Chris Murphy


* Re: Rough (re)start with btrfs
  2019-05-02 23:40 ` Rough (re)start with btrfs Qu Wenruo
  2019-05-03  5:41   ` Re[2]: " Hendrik Friedel
@ 2019-05-03  5:58   ` Chris Murphy
  1 sibling, 0 replies; 9+ messages in thread
From: Chris Murphy @ 2019-05-03  5:58 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Hendrik Friedel, Chris Murphy, Btrfs BTRFS

On Thu, May 2, 2019 at 5:40 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> On 2019/5/3 3:02 AM, Hendrik Friedel wrote:
> > Hello,
> >
> > thanks for your replies. I appreciate it!
> >>>  I am using btrfs-progs v4.20.2 and debian stretch with
> >>>  4.19.0-0.bpo.2-amd64 (I think this is the latest kernel available in
> >>>  stretch; please correct me if I am wrong).
> >>
> >> What scheduler is being used for the drive?
> >>
> >> # cat /sys/block/<dev>/queue/scheduler
> > [mq-deadline] none
> >
> >> If it's none, then kernel version and scheduler aren't likely related
> >> to what you're seeing.
> >>
> >> It's not immediately urgent, but I would still look for something
> >> newer, just because the 4.19 series already has 37 upstream updates
> >> released, each with dozens of fixes; easily there are over 1000 fixes
> >> available in total. I'm not a Debian user, but I think there's
> >> stretch-backports that has newer kernels?
> >> http://jensd.be/818/linux/install-a-newer-kernel-in-debian-9-stretch-stable
> >>
> >
> > Unfortunately, backports provides 4.19 as the latest.
> > I am now manually compiling 5.0. Last time I did that, I was less than
> > half my current age :-)
> >
> >> We need the entire dmesg so we can see if there are any earlier
> >> complaints by the drive or the link. Can you attach the entire dmesg
> >> as a file?
> > Done (also the two smartctl outputs).
> >
> >>Have you tried stopping the workload, to see if the timeout disappears?
> >
> > Unfortunately not. I had the impression that the system did not react
> > anymore. I Ctrl-C'd and rebooted.
> > I was copying all the stuff from my old drive to the new one. I should
> > say that the workload was high, but not exceptional: just one or two
> > copy jobs.
>
> Then it's some deadlock, not a regular high-load timeout.
>
> > Also, the btrfs drive was at an advantage:
> > 1) it had btrfs ;-) (the other had ext4)
> > 2) it did not need to search
> > 3) it was connected via SATA (and not USB3 as the source)
> >
> > The drive does not seem to be an SMR drive (WD80EZAZ).
> >
> >> If it just disappears after some time, then it's the disk being too slow
> >> under too heavy a load, combined with btrfs' low-concurrency design,
> >> leading to the problem.
> >
> > I was tempted to ask whether this should be fixed. On the other hand, I
> > am not even sure anything bad happened (except, well, the system -at
> > least the copy- seemed to hang).
>
> Definitely needs to be fixed.
>
> With the full dmesg, it's now clear that this is a real deadlock.
> Something is wrong with the free space cache, blocking the whole fs from
> being committed.
>
> If you still want to try btrfs, you could try the "nospace_cache" mount option.
> The free space cache of btrfs is just an optimization; you can completely
> ignore it with only a minor performance drop.


I should have read this before replying earlier.

You can also do a one-time clean mount with '-o
clear_cache,space_cache=v2', which will remove the v1 (default) space
cache and create a v2 cache. Subsequent mounts will see the flag for
this feature and always use the v2 cache. It's a totally different
implementation and shouldn't have this problem.
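
For example (the device and mount point are placeholders):

# mount -o clear_cache,space_cache=v2 /dev/sdX1 /mnt

After that one mount, a plain 'mount /dev/sdX1 /mnt' keeps using the v2
cache.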


-- 
Chris Murphy


* Re: Re[2]: Rough (re)start with btrfs
  2019-05-03  5:41   ` Re[2]: " Hendrik Friedel
@ 2019-05-03  6:05     ` Chris Murphy
  2019-05-04  9:31       ` Re[4]: " Hendrik Friedel
  2019-05-03  7:34     ` Qu Wenruo
  1 sibling, 1 reply; 9+ messages in thread
From: Chris Murphy @ 2019-05-03  6:05 UTC (permalink / raw)
  To: Hendrik Friedel; +Cc: Qu Wenruo, Chris Murphy, Btrfs BTRFS

On Thu, May 2, 2019 at 11:41 PM Hendrik Friedel <hendrik@friedels.name> wrote:
>
> Hello,
>
> By the way: I think my mail did not appear on the list, but only
> reached Chris and Qu directly. I just tried to re-send it. Could this be
> caused by
> 1) me not being a subscriber of the list,
> 2) combined with me sending attachments?
> I did *not* get any error message from the server.
>
> >>  I was tempted to ask whether this should be fixed. On the other hand, I
> >>  am not even sure anything bad happened (except, well, the system -at
> >>  least the copy- seemed to hang).
> >
> >Definitely needs to be fixed.
> >
> >With the full dmesg, it's now clear that this is a real deadlock.
> >Something is wrong with the free space cache, blocking the whole fs from
> >being committed.
> >
> So, what's the next step? Shall I open a bug report somewhere, or is it
> already on some list?
>
> >If you still want to try btrfs, you could try the "nospace_cache" mount option.
> >The free space cache of btrfs is just an optimization; you can completely
> >ignore it with only a minor performance drop.
> >
> I will try that, yes.
> Can you confirm that it is unlikely that I lost any data / damaged the
> filesystem?

Not likely. You can do a scrub to check for metadata and data
corruption. And you can do an offline (unmounted) 'btrfs check
--readonly' to check the validity of the metadata. The Btrfs call
traces during the blocked task are INFO, not warnings or errors, so
the file system and data are likely fine. There are no read, write,
corruption, or generation errors in the dmesg; but you can also check
'btrfs dev stats <mountpoint>', which reports persistent counters for
this particular device.
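
Concretely, something like this (device and mount point are placeholders):

# btrfs scrub start -B /mnt           # foreground scrub, reports at the end
# btrfs dev stats /mnt                # persistent per-device error counters
# umount /mnt
# btrfs check --readonly /dev/sdX1    # offline metadata check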

-- 
Chris Murphy


* Re: Rough (re)start with btrfs
  2019-05-03  5:41   ` Re[2]: " Hendrik Friedel
  2019-05-03  6:05     ` Chris Murphy
@ 2019-05-03  7:34     ` Qu Wenruo
  1 sibling, 0 replies; 9+ messages in thread
From: Qu Wenruo @ 2019-05-03  7:34 UTC (permalink / raw)
  To: Hendrik Friedel, Chris Murphy, Btrfs BTRFS


[-- Attachment #1.1: Type: text/plain, Size: 1506 bytes --]



On 2019/5/3 1:41 PM, Hendrik Friedel wrote:
> Hello,
> 
> By the way: I think my mail did not appear on the list, but only
> reached Chris and Qu directly. I just tried to re-send it. Could this be
> caused by
> 1) me not being a subscriber of the list,
> 2) combined with me sending attachments?
> I did *not* get any error message from the server.
> 
>>>  I was tempted to ask whether this should be fixed. On the other
>>> hand, I
>>>  am not even sure anything bad happened (except, well, the system -at
>>>  least the copy- seemed to hang).
>>
>> Definitely needs to be fixed.
>>
>> With the full dmesg, it's now clear that this is a real deadlock.
>> Something is wrong with the free space cache, blocking the whole fs from
>> being committed.
>>
> So, what's the next step? Shall I open a bug report somewhere, or is it
> already on some list?

Not sure if anyone else is looking into this.

Btrfs bug tracking is somewhat tricky.
Some, like me, prefer bug reports directly on the mailing list; some
prefer the kernel bugzilla.

> 
>> If you still want to try btrfs, you could try the "nospace_cache" mount
>> option.
>> The free space cache of btrfs is just an optimization; you can completely
>> ignore it with only a minor performance drop.
>>
> I will try that, yes.
> Can you confirm that it is unlikely that I lost any data / damaged the
> filesystem?

You lost nothing except the new data which was going to be committed in
the blocked transaction.

Thanks,
Qu

> 
> Regards,
> Hendrik
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]


* Re[4]: Rough (re)start with btrfs
  2019-05-03  6:05     ` Chris Murphy
@ 2019-05-04  9:31       ` Hendrik Friedel
  2019-05-04 19:05         ` Chris Murphy
  0 siblings, 1 reply; 9+ messages in thread
From: Hendrik Friedel @ 2019-05-04  9:31 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Qu Wenruo, Chris Murphy, Btrfs BTRFS

Hello,

this:
 >Some, like me, prefer bug reports directly on the mailing list; some
 >prefer the kernel bugzilla.

and this:
 >Not sure if anyone else is looking into this.
 >Btrfs bug tracking is somewhat tricky.

may be related...


 >Not likely. You can do a scrub to check for metadata and data
 >corruption.

Did that. All good.

 >And you can do an offline (unmounted) 'btrfs check
 >--readonly' to check the validity of the metadata.

Will do that.

 > The Btrfs call
 >traces during the blocked task are INFO, not warnings or errors, so
 >the file system and data are likely fine. There are no read, write,
 >corruption, or generation errors in the dmesg; but you can also check
 >'btrfs dev stats <mountpoint>', which reports persistent counters for
 >this particular device.
[/dev/sdh1].write_io_errs 0
[/dev/sdh1].read_io_errs 0
[/dev/sdh1].flush_io_errs 0
[/dev/sdh1].corruption_errs 0
[/dev/sdh1].generation_errs 0


 >I should have read this before replying earlier.
 >
 >You can also do a one-time clean mount with '-o
 >clear_cache,space_cache=v2', which will remove the v1 (default) space
 >cache and create a v2 cache. Subsequent mounts will see the flag for
 >this feature and always use the v2 cache. It's a totally different
 >implementation and shouldn't have this problem.

So, you already have a suspicion about what caused the problem? Why is
v2 then not the default? Is it worth chasing the bug in v1?
For me, the question now is whether we should chase this bug or not. I
encountered it three times while filling an 8TB drive with 7TB. Now, I
have 1TB left and I am not sure I can reproduce it, but I can try.

 >Qu would know better, but usually developers ask for sysrq+w when
 >there are blocked tasks.

I am wondering whether there is - long term - a better way than this.
Ideally, btrfs would automatically create a
btrfs-bug-DD-MM-YY-hh-mm-ss.tar.gz with all the info you need and inform
the user about it and where to file the bug. I am aware that this is
tricky. But in order to further mature btrfs, I assume you need more
high-quality real-life data (that is, the right logs) without too much
work (asking for logs). What's your view on this?

 >You know what? Try changing the scheduler from mq-deadline to none.
 >Change nothing else. Now try to reproduce. Let's see if it still
 >happens.

Wouldn't it make sense to first try to reproduce it without changing
anything?

 >Also, what are the mount options?
rw,noatime,nospace_cache,subvolid=5,subvol=/
But I added noatime and nospace_cache just today.
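
For reference, the corresponding fstab entry now looks roughly like this
(the mount point here is a placeholder for my actual one):

/dev/sdh1  /mnt/data  btrfs  rw,noatime,nospace_cache,subvol=/  0  0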

Greetings,
Hendrik



* Re: Re[4]: Rough (re)start with btrfs
  2019-05-04  9:31       ` Re[4]: " Hendrik Friedel
@ 2019-05-04 19:05         ` Chris Murphy
  2019-05-06 18:39           ` Re[6]: " Hendrik Friedel
  0 siblings, 1 reply; 9+ messages in thread
From: Chris Murphy @ 2019-05-04 19:05 UTC (permalink / raw)
  To: Hendrik Friedel; +Cc: Chris Murphy, Qu Wenruo, Btrfs BTRFS

On Sat, May 4, 2019 at 3:31 AM Hendrik Friedel <hendrik@friedels.name> wrote:
>
>  >I should have read this before replying earlier.
>  >
>  >You can also do a one-time clean mount with '-o
>  >clear_cache,space_cache=v2', which will remove the v1 (default) space
>  >cache and create a v2 cache. Subsequent mounts will see the flag for
>  >this feature and always use the v2 cache. It's a totally different
>  >implementation and shouldn't have this problem.
>
> So, you already have a suspicion about what caused the problem? Why is
> v2 then not the default? Is it worth chasing the bug in v1?

v2 is expected to become the default soon.

There's known contention for certain workloads when using v1, because
the cache information is stored as if it were a hidden data file,
whereas v2 uses its own btree. But from the sound of it Qu has enough
information to maybe track down the v1 problem and fix it, and it
probably should be fixed, as v1 is the default and is still supported
and will be forever. But the time frame for a fix may be a while; I'm
not sure.
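
If you want to check which cache a filesystem is using, one way (a
sketch; the device is a placeholder and the exact output depends on the
btrfs-progs version) is to look at the superblock flags:

# btrfs inspect-internal dump-super /dev/sdX1 | grep -i free_space

A filesystem converted to v2 shows FREE_SPACE_TREE in compat_ro_flags.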


> For me, the question now is whether we should chase this bug or not. I
> encountered it three times while filling an 8TB drive with 7TB. Now, I
> have 1TB left and I am not sure I can reproduce it, but I can try.

I don't think it's necessary unless Qu specifically asks.


>
>  >Qu would know better, but usually developers ask for sysrq+w when
>  >there are blocked tasks.
>
> I am wondering whether there is - long term - a better way than this.
> Ideally, btrfs would automatically create a
> btrfs-bug-DD-MM-YY-hh-mm-ss.tar.gz with all the info you need and inform
> the user about it and where to file the bug.

No Linux file system has such a thing. Creating such a package would
happen in user space, not in kernel code. Most of Btrfs is kernel code,
same as ext4 and XFS and other file systems. Usually, if the file
system gets confused, it dumps information into the kernel messages,
and the file system developers control what kinds of info, error, and
warning messages get dumped into dmesg. Normally that's enough. But
since Btrfs is in the kernel, it depends on other things that happen in
the kernel, and it's sometimes necessary to get more information on
demand. There really isn't a way to automate sysrq - you wouldn't want
to constantly dump that amount of information into the kernel message
buffer and then burden the system logger with quite a lot of extraneous
information.



>
>  >You know what? Try changing the scheduler from mq-deadline to none.
>  >Change nothing else. Now try to reproduce. Let's see if it still
>  >happens.
>
> Wouldn't it make sense to first try to reproduce it without changing
> anything?

I assumed it was a persistent problem rather than a transient one. So
yes, you should first discover the steps to reproduce. That's ideal for
the developers too, because they often need to reproduce the problem on
their own systems to see what's going on, and oftentimes they have
Btrfs debug options set in their kernels, which most distros do not
enable, so they can see more things than we do.

Once you have a reproducer, then you can change the scheduler and see
if your reproduce steps still reproduce the problem.



>
>  >Also, what are the mount options?
> rw,noatime,nospace_cache,subvolid=5,subvol=/
> But I added noatime and nospace_cache just today.

OK that all looks good.


-- 
Chris Murphy


* Re[6]: Rough (re)start with btrfs
  2019-05-04 19:05         ` Chris Murphy
@ 2019-05-06 18:39           ` Hendrik Friedel
  0 siblings, 0 replies; 9+ messages in thread
From: Hendrik Friedel @ 2019-05-06 18:39 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Chris Murphy, Qu Wenruo, Btrfs BTRFS

Hello,

>v2 is expected to become the default soon.
That is good to hear.

>But from the sound of it Qu has enough
>information to maybe track down the v1 problem and fix it, and it
>probably should be fixed, as v1 is the default and is still supported
>and will be forever.
That's good to hear.

>>
>>  For me, the question now is whether we should chase this bug or not. I
>>  encountered it three times while filling an 8TB drive with 7TB. Now, I
>>  have 1TB left and I am not sure I can reproduce it, but I can try.
>
>I don't think it's necessary unless Qu specifically asks.
Let me know, Qu.

>you wouldn't want to constantly dump
>that amount of information into the kernel message buffer and then burden
>the system logger with quite a lot of extraneous information.
I understand. Still, a pity.

>Once you have a reproducer, then you can change the scheduler and see
>if your reproduce steps still reproduce the problem.
I will try and let you know. It's not persistent.

Greetings,
Hendrik




