* Re: Rough (re)start with btrfs
       [not found] <em9eba60a7-2c0d-4399-8712-c134f0f50d4d@ryzen>
@ 2019-05-02 23:40 ` Qu Wenruo
  2019-05-03  5:41   ` Re[2]: " Hendrik Friedel
  2019-05-03  5:58   ` Chris Murphy
  2019-05-03  5:52 ` Re[2]: " Chris Murphy
  1 sibling, 2 replies; 9+ messages in thread
From: Qu Wenruo @ 2019-05-02 23:40 UTC (permalink / raw)
  To: Hendrik Friedel, Chris Murphy, Btrfs BTRFS


[-- Attachment #1.1: Type: text/plain, Size: 2800 bytes --]



On 2019/5/3 3:02 AM, Hendrik Friedel wrote:
> Hello,
> 
> thanks for your replies. I appreciate it!
>>>  I am using btrfs-progs v4.20.2 and debian stretch with
>>>  4.19.0-0.bpo.2-amd64 (I think this is the latest kernel available in
>>>  stretch; please correct me if I am wrong).
>>
>> What scheduler is being used for the drive?
>>
>> # cat /sys/block/<dev>/queue/scheduler
> [mq-deadline] none
> 
>> If it's none, then kernel version and scheduler aren't likely related
>> to what you're seeing.
>>
>> It's not immediately urgent, but I would still look for something
>> newer, just because the 4.19 series already has 37 upstream updates
>> released, each with dozens of fixes; easily there are over 1000 fixes
>> available in total. I'm not a Debian user, but I think there's
>> stretch-backports that has newer kernels?
>> http://jensd.be/818/linux/install-a-newer-kernel-in-debian-9-stretch-stable
>>
> 
> Unfortunately, backports provides 4.19 as the latest.
> I am now manually compiling 5.0. Last time I did that, I was less than
> half my current age :-)
> 
>> We need the entire dmesg so we can see if there are any earlier
>> complaints by the drive or the link. Can you attach the entire dmesg
>> as a file?
> Done (also the two smartctl outputs).
> 
>>Have you tried stopping the workload, to see if the timeout disappears?
> 
> Unfortunately not. I had the impression that the system did not react
> anymore. I Ctrl-C'd and rebooted.
> I was copying all the stuff from my old drive to the new one. I should
> say that the workload was high, but not exceptional: just one or two
> copy jobs.

Then it's some deadlock, not a regular high-load timeout.

> Also, the btrfs drive was at an advantage:
> 1) it had btrfs ;-) (the other had ext4)
> 2) it did not need to search
> 3) it was connected via SATA (and not USB3 as the source)
> 
> The drive does not seem to be an SMR drive (WD80EZAZ).
> 
>> If it just disappears after some time, then it's the disk being too slow
>> under too heavy a load, combined with btrfs' low-concurrency design,
>> leading to the problem.
> 
> I was tempted to ask whether this should be fixed. On the other hand, I
> am not even sure anything bad happened (except, well, the system -at
> least the copy- seemed to hang).

Definitely needs to be fixed.

With the full dmesg, it's now clear that this is a real deadlock.
Something is wrong with the free space cache, blocking the whole fs from
being committed.

If you still want to try btrfs, you could try the "nospace_cache" mount option.
The free space cache of btrfs is just an optimization; you can completely
ignore it with only a minor performance drop.
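
For example, a one-time mount with that option (the device and mount point
here are placeholders, not your actual setup):

# mount -o nospace_cache /dev/sdX1 /mnt

Adding nospace_cache to the fstab options for the filesystem makes it
permanent.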

Thanks,
Qu

> 
> By the way: I ran a scrub and a smartctl -t long. Both without errors.
> 
> Greetings,
> Hendrik


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]


* Re[2]: Rough (re)start with btrfs
  2019-05-02 23:40 ` Rough (re)start with btrfs Qu Wenruo
@ 2019-05-03  5:41   ` Hendrik Friedel
  2019-05-03  6:05     ` Chris Murphy
  2019-05-03  7:34     ` Qu Wenruo
  2019-05-03  5:58   ` Chris Murphy
  1 sibling, 2 replies; 9+ messages in thread
From: Hendrik Friedel @ 2019-05-03  5:41 UTC (permalink / raw)
  To: Qu Wenruo, Chris Murphy, Btrfs BTRFS

Hello,

By the way: I think my mail did not appear on the list, but only
reached Chris and Qu directly. I just tried to re-send it. Could this be
caused by
1) me not being a subscriber of the list,
2) combined with me sending attachments?
I did *not* get any error message from the server.

>>  I was tempted to ask whether this should be fixed. On the other hand, I
>>  am not even sure anything bad happened (except, well, the system -at
>>  least the copy- seemed to hang).
>
>Definitely needs to be fixed.
>
>With the full dmesg, it's now clear that this is a real deadlock.
>Something is wrong with the free space cache, blocking the whole fs from
>being committed.
>
So, what's the next step? Shall I open a bug report somewhere, or is it 
already on some list?

>If you still want to try btrfs, you could try the "nospace_cache" mount option.
>The free space cache of btrfs is just an optimization; you can completely
>ignore it with only a minor performance drop.
>
I will try that, yes.
Can you confirm that it is unlikely that I lost any data / damaged the
filesystem?

Regards,
Hendrik



* Re: Re[2]: Rough (re)start with btrfs
       [not found] <em9eba60a7-2c0d-4399-8712-c134f0f50d4d@ryzen>
  2019-05-02 23:40 ` Rough (re)start with btrfs Qu Wenruo
@ 2019-05-03  5:52 ` Chris Murphy
  1 sibling, 0 replies; 9+ messages in thread
From: Chris Murphy @ 2019-05-03  5:52 UTC (permalink / raw)
  To: Hendrik Friedel; +Cc: Chris Murphy, Btrfs BTRFS, Qu Wenruo

On Thu, May 2, 2019 at 1:02 PM Hendrik Friedel <hendrik@friedels.name> wrote:
>
> >What scheduler is being used for the drive?
> >
> ># cat /sys/block/<dev>/queue/scheduler
> [mq-deadline] none

At first I thought you might be running into this bug:
https://lwn.net/Articles/774440/

However:

[Mo Apr 29 20:44:47 2019]       Not tainted 4.19.0-0.bpo.2-amd64 #1
Debian 4.19.16-1~bpo9+1

This is actually based on 4.19.16, which has the fix for that.


[Mo Apr 29 06:44:32 2019] systemd[1]: apt-daily-upgrade.timer: Adding
36min 35.299087s random time.
[Mo Apr 29 20:44:47 2019] INFO: task btrfs-transacti:10227 blocked for
more than 120 seconds.

Literally nothing for hours before the blocking. And I don't see
anything off during device discovery.

Qu would know better, but usually developers ask for sysrq+w when
there are blocked tasks.
https://www.kernel.org/doc/html/v4.11/admin-guide/sysrq.html

Basically, as root, issue:
# echo 1 >/proc/sys/kernel/sysrq
# echo w > /proc/sysrq-trigger

What I do is run the first command and type out the second command but
do not press return; in another shell reproduce the hang, and then go
back to the first shell and hit return. That way it doesn't take a
minute or two to type out during the hang. The result appears in
dmesg, so stop the operation causing the hang if possible and then
'dmesg > dmesg.txt' and attach it. Also, you'll want to reboot with
'log_buf_len=1M' because the sysrq+w output that gets dumped to dmesg will
fill up the kernel message buffer.
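
One way to make that boot parameter persistent on a Debian-style setup is
via GRUB (a sketch; the existing "quiet" value in the variable is an
assumption). In /etc/default/grub:

GRUB_CMDLINE_LINUX_DEFAULT="quiet log_buf_len=1M"

then regenerate the config and reboot:

# update-grub

Alternatively, edit the kernel command line from the GRUB menu for a
single boot.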

> Done (also the two smartctl outputs).

I don't see anything weird there either. The errors are a little odd,
but they predate the Btrfs error by a lot.


> I was tempted to ask whether this should be fixed. On the other hand, I am not even sure anything bad happened (except, well, the system -at least the copy- seemed to hang).

It could be a bug somewhere, but the question is where. The workload is
only copying? That seems trivial and not prone to lock contention.

You know what? Try changing the scheduler from mq-deadline to none.
Change nothing else. Now try to reproduce. Let's see if it still
happens.
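
Something like this (sdX is a placeholder for your drive; the setting
does not survive a reboot):

# cat /sys/block/sdX/queue/scheduler
[mq-deadline] none
# echo none > /sys/block/sdX/queue/scheduler
# cat /sys/block/sdX/queue/scheduler
mq-deadline [none]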

Also, what are the mount options?

-- 
Chris Murphy


* Re: Rough (re)start with btrfs
  2019-05-02 23:40 ` Rough (re)start with btrfs Qu Wenruo
  2019-05-03  5:41   ` Re[2]: " Hendrik Friedel
@ 2019-05-03  5:58   ` Chris Murphy
  1 sibling, 0 replies; 9+ messages in thread
From: Chris Murphy @ 2019-05-03  5:58 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Hendrik Friedel, Chris Murphy, Btrfs BTRFS

On Thu, May 2, 2019 at 5:40 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> On 2019/5/3 3:02 AM, Hendrik Friedel wrote:
> > Hello,
> >
> > thanks for your replies. I appreciate it!
> >>>  I am using btrfs-progs v4.20.2 and debian stretch with
> >>>  4.19.0-0.bpo.2-amd64 (I think this is the latest kernel available in
> >>>  stretch; please correct me if I am wrong).
> >>
> >> What scheduler is being used for the drive?
> >>
> >> # cat /sys/block/<dev>/queue/scheduler
> > [mq-deadline] none
> >
> >> If it's none, then kernel version and scheduler aren't likely related
> >> to what you're seeing.
> >>
> >> It's not immediately urgent, but I would still look for something
> >> newer, just because the 4.19 series already has 37 upstream updates
> >> released, each with dozens of fixes; easily there are over 1000 fixes
> >> available in total. I'm not a Debian user, but I think there's
> >> stretch-backports that has newer kernels?
> >> http://jensd.be/818/linux/install-a-newer-kernel-in-debian-9-stretch-stable
> >>
> >
> > Unfortunately, backports provides 4.19 as the latest.
> > I am now manually compiling 5.0. Last time I did that, I was less than
> > half my current age :-)
> >
> >> We need the entire dmesg so we can see if there are any earlier
> >> complaints by the drive or the link. Can you attach the entire dmesg
> >> as a file?
> > Done (also the two smartctl outputs).
> >
> >>Have you tried stopping the workload, to see if the timeout disappears?
> >
> > Unfortunately not. I had the impression that the system did not react
> > anymore. I Ctrl-C'd and rebooted.
> > I was copying all the stuff from my old drive to the new one. I should
> > say that the workload was high, but not exceptional: just one or two
> > copy jobs.
>
> Then it's some deadlock, not a regular high-load timeout.
>
> > Also, the btrfs drive was at an advantage:
> > 1) it had btrfs ;-) (the other had ext4)
> > 2) it did not need to search
> > 3) it was connected via SATA (and not USB3 as the source)
> >
> > The drive does not seem to be an SMR drive (WD80EZAZ).
> >
> >> If it just disappears after some time, then it's the disk being too slow
> >> under too heavy a load, combined with btrfs' low-concurrency design,
> >> leading to the problem.
> >
> > I was tempted to ask whether this should be fixed. On the other hand, I
> > am not even sure anything bad happened (except, well, the system -at
> > least the copy- seemed to hang).
>
> Definitely needs to be fixed.
>
> With the full dmesg, it's now clear that this is a real deadlock.
> Something is wrong with the free space cache, blocking the whole fs from
> being committed.
>
> If you still want to try btrfs, you could try the "nospace_cache" mount option.
> The free space cache of btrfs is just an optimization; you can completely
> ignore it with only a minor performance drop.


I should have read this before replying earlier.

You can also do a one-time clean mount with '-o
clear_cache,space_cache=v2', which will remove the v1 (default) space
cache and create a v2 cache. Subsequent mounts will see the flag for
this feature and always use the v2 cache. It's a totally different
implementation and shouldn't have this problem.
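
For example (the device and mount point are placeholders):

# mount -o clear_cache,space_cache=v2 /dev/sdX1 /mnt

After that one mount, a plain 'mount /dev/sdX1 /mnt' keeps using the v2
cache.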


-- 
Chris Murphy


* Re: Re[2]: Rough (re)start with btrfs
  2019-05-03  5:41   ` Re[2]: " Hendrik Friedel
@ 2019-05-03  6:05     ` Chris Murphy
  2019-05-04  9:31       ` Re[4]: " Hendrik Friedel
  2019-05-03  7:34     ` Qu Wenruo
  1 sibling, 1 reply; 9+ messages in thread
From: Chris Murphy @ 2019-05-03  6:05 UTC (permalink / raw)
  To: Hendrik Friedel; +Cc: Qu Wenruo, Chris Murphy, Btrfs BTRFS

On Thu, May 2, 2019 at 11:41 PM Hendrik Friedel <hendrik@friedels.name> wrote:
>
> Hello,
>
> By the way: I think my mail did not appear on the list, but only
> reached Chris and Qu directly. I just tried to re-send it. Could this be
> caused by
> 1) me not being a subscriber of the list,
> 2) combined with me sending attachments?
> I did *not* get any error message from the server.
>
> >>  I was tempted to ask whether this should be fixed. On the other hand, I
> >>  am not even sure anything bad happened (except, well, the system -at
> >>  least the copy- seemed to hang).
> >
> >Definitely needs to be fixed.
> >
> >With the full dmesg, it's now clear that this is a real deadlock.
> >Something is wrong with the free space cache, blocking the whole fs from
> >being committed.
> >
> So, what's the next step? Shall I open a bug report somewhere, or is it
> already on some list?
>
> >If you still want to try btrfs, you could try the "nospace_cache" mount option.
> >The free space cache of btrfs is just an optimization; you can completely
> >ignore it with only a minor performance drop.
> >
> I will try that, yes.
> Can you confirm that it is unlikely that I lost any data / damaged the
> filesystem?

Not likely. You can do a scrub to check for metadata and data
corruption. And you can do an offline (unmounted) 'btrfs check
--readonly' to check the validity of the metadata. The Btrfs call
traces during the blocked task are INFO, not warnings or errors, so
the file system and data are likely fine. There are no read, write,
corruption, or generation errors in the dmesg; but you can also check
'btrfs dev stats <mountpoint>', which reports persistent counters for
this particular device.
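
Concretely, something like this (device and mount point are placeholders):

# btrfs scrub start -B /mnt           # foreground scrub, reports at the end
# btrfs dev stats /mnt                # persistent per-device error counters
# umount /mnt
# btrfs check --readonly /dev/sdX1    # offline metadata check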

-- 
Chris Murphy


* Re: Rough (re)start with btrfs
  2019-05-03  5:41   ` Re[2]: " Hendrik Friedel
  2019-05-03  6:05     ` Chris Murphy
@ 2019-05-03  7:34     ` Qu Wenruo
  1 sibling, 0 replies; 9+ messages in thread
From: Qu Wenruo @ 2019-05-03  7:34 UTC (permalink / raw)
  To: Hendrik Friedel, Chris Murphy, Btrfs BTRFS


[-- Attachment #1.1: Type: text/plain, Size: 1506 bytes --]



On 2019/5/3 1:41 PM, Hendrik Friedel wrote:
> Hello,
> 
> By the way: I think my mail did not appear on the list, but only
> reached Chris and Qu directly. I just tried to re-send it. Could this be
> caused by
> 1) me not being a subscriber of the list,
> 2) combined with me sending attachments?
> I did *not* get any error message from the server.
> 
>>>  I was tempted to ask whether this should be fixed. On the other
>>> hand, I
>>>  am not even sure anything bad happened (except, well, the system -at
>>>  least the copy- seemed to hang).
>>
>> Definitely needs to be fixed.
>>
>> With the full dmesg, it's now clear that this is a real deadlock.
>> Something is wrong with the free space cache, blocking the whole fs from
>> being committed.
>>
> So, what's the next step? Shall I open a bug report somewhere, or is it
> already on some list?

Not sure if anyone else is looking into this.

Btrfs bug tracking is somewhat tricky.
Some, like me, prefer bug reports directly on the mailing list; some
prefer the kernel bugzilla.

> 
>> If you still want to try btrfs, you could try the "nospace_cache" mount
>> option.
>> The free space cache of btrfs is just an optimization; you can completely
>> ignore it with only a minor performance drop.
>>
> I will try that, yes.
> Can you confirm that it is unlikely that I lost any data / damaged the
> filesystem?

You lost nothing except the new data which was going to be committed in
the blocked transaction.

Thanks,
Qu

> 
> Regards,
> Hendrik
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]


* Re[4]: Rough (re)start with btrfs
  2019-05-03  6:05     ` Chris Murphy
@ 2019-05-04  9:31       ` Hendrik Friedel
  2019-05-04 19:05         ` Chris Murphy
  0 siblings, 1 reply; 9+ messages in thread
From: Hendrik Friedel @ 2019-05-04  9:31 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Qu Wenruo, Chris Murphy, Btrfs BTRFS

Hello,

this:
 >Some, like me, prefer bug reports directly on the mailing list; some
 >prefer the kernel bugzilla.

and this:
 >Not sure if anyone else is looking into this.
 >Btrfs bug tracking is somewhat tricky.

may be related...


 >Not likely. You can do a scrub to check for metadata and data
 >corruption.

Did that. All good.

 >And you can do an offline (unmounted) 'btrfs check
 >--readonly' to check the validity of the metadata.

Will do that.

 > The Btrfs call
 >traces during the blocked task are INFO, not warnings or errors, so
 >the file system and data are likely fine. There are no read, write,
 >corruption, or generation errors in the dmesg; but you can also check
 >'btrfs dev stats <mountpoint>', which reports persistent counters for
 >this particular device.
[/dev/sdh1].write_io_errs 0
[/dev/sdh1].read_io_errs 0
[/dev/sdh1].flush_io_errs 0
[/dev/sdh1].corruption_errs 0
[/dev/sdh1].generation_errs 0


 >I should have read this before replying earlier.
 >
 >You can also do a one-time clean mount with '-o
 >clear_cache,space_cache=v2', which will remove the v1 (default) space
 >cache and create a v2 cache. Subsequent mounts will see the flag for
 >this feature and always use the v2 cache. It's a totally different
 >implementation and shouldn't have this problem.

So, you already have a suspicion about what caused the problem? Why is
v2 then not the default? Is it worth chasing the bug in v1?
For me, the question now is whether we should chase this bug or not. I
encountered it three times while filling an 8TB drive with 7TB. Now, I
have 1TB left and I am not sure I can reproduce it, but I can try.

 >Qu would know better, but usually developers ask for sysrq+w when
 >there are blocked tasks.

I am wondering whether there is - long term - a better way than this.
Ideally, btrfs would automatically create a
btrfs-bug-DD-MM-YY-hh-mm-ss.tar.gz with all the info you need and inform
the user about it and where to file the bug. I am aware that this is
tricky. But in order to further mature btrfs, I assume you need more
high-quality real-life data (that is, the right logs) without too much
work (asking for logs). What's your view on this?

 >You know what? Try changing the scheduler from mq-deadline to none.
 >Change nothing else. Now try to reproduce. Let's see if it still
 >happens.

Wouldn't it make sense to first try to reproduce it without changing
anything?

 >Also, what are the mount options?
rw,noatime,nospace_cache,subvolid=5,subvol=/
But I added noatime and nospace_cache just today.
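
For reference, the corresponding fstab entry now looks roughly like this
(the mount point here is a placeholder for my actual one):

/dev/sdh1  /mnt/data  btrfs  rw,noatime,nospace_cache,subvol=/  0  0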

Greetings,
Hendrik



* Re: Re[4]: Rough (re)start with btrfs
  2019-05-04  9:31       ` Re[4]: " Hendrik Friedel
@ 2019-05-04 19:05         ` Chris Murphy
  2019-05-06 18:39           ` Re[6]: " Hendrik Friedel
  0 siblings, 1 reply; 9+ messages in thread
From: Chris Murphy @ 2019-05-04 19:05 UTC (permalink / raw)
  To: Hendrik Friedel; +Cc: Chris Murphy, Qu Wenruo, Btrfs BTRFS

On Sat, May 4, 2019 at 3:31 AM Hendrik Friedel <hendrik@friedels.name> wrote:
>
>  >I should have read this before replying earlier.
>  >
>  >You can also do a one-time clean mount with '-o
>  >clear_cache,space_cache=v2', which will remove the v1 (default) space
>  >cache and create a v2 cache. Subsequent mounts will see the flag for
>  >this feature and always use the v2 cache. It's a totally different
>  >implementation and shouldn't have this problem.
>
> So, you already have a suspicion about what caused the problem? Why is
> v2 then not the default? Is it worth chasing the bug in v1?

v2 is expected to become the default soon.

There's known contention for certain workloads when using v1, because
the cache information is stored as if it were a hidden data file,
whereas v2 uses its own btree. But from the sound of it Qu has enough
information to maybe track down the v1 problem and fix it, and it
probably should be fixed, as v1 is the default and is still supported
and will be forever. But the time frame for a fix may be a while; I'm
not sure.
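
If you want to check which cache a filesystem is using, one way (a
sketch; the device is a placeholder and the exact output depends on the
btrfs-progs version) is to look at the superblock flags:

# btrfs inspect-internal dump-super /dev/sdX1 | grep -i free_space

A filesystem converted to v2 shows FREE_SPACE_TREE in compat_ro_flags.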


> For me, the question now is whether we should chase this bug or not. I
> encountered it three times while filling an 8TB drive with 7TB. Now, I
> have 1TB left and I am not sure I can reproduce it, but I can try.

I don't think it's necessary unless Qu specifically asks.


>
>  >Qu would know better, but usually developers ask for sysrq+w when
>  >there are blocked tasks.
>
> I am wondering whether there is - long term - a better way than this.
> Ideally, btrfs would automatically create a
> btrfs-bug-DD-MM-YY-hh-mm-ss.tar.gz with all the info you need and inform
> the user about it and where to file the bug.

No Linux file system has such a thing. Creating such a package would
happen in user space, not in kernel code. Most of Btrfs is kernel code,
same as ext4 and XFS and other file systems. Usually, if the file
system gets confused, it dumps information into the kernel messages,
and the file system developers control what kinds of info, error, and
warning messages get dumped into dmesg. Normally that's enough. But
since Btrfs is in the kernel, it depends on other things that happen in
the kernel, and it's sometimes necessary to get more information on
demand. There really isn't a way to automate sysrq - you wouldn't want
to constantly dump that amount of information into the kernel message
buffer and then burden the system logger with quite a lot of extraneous
information.



>
>  >You know what? Try changing the scheduler from mq-deadline to none.
>  >Change nothing else. Now try to reproduce. Let's see if it still
>  >happens.
>
> Wouldn't it make sense to first try to reproduce it without changing
> anything?

I assumed it was a persistent problem rather than a transient one. So
yes, you should first discover the steps to reproduce. That's ideal for
the developers too, because they often need to reproduce the problem on
their own systems to see what's going on, and oftentimes they have
Btrfs debug options set in their kernels, which most distros do not
enable, so they can see more things than we do.

Once you have a reproducer, then you can change the scheduler and see
if your reproduce steps still reproduce the problem.



>
>  >Also, what are the mount options?
> rw,noatime,nospace_cache,subvolid=5,subvol=/
> But I added noatime and nospace_cache just today.

OK that all looks good.


-- 
Chris Murphy


* Re[6]: Rough (re)start with btrfs
  2019-05-04 19:05         ` Chris Murphy
@ 2019-05-06 18:39           ` Hendrik Friedel
  0 siblings, 0 replies; 9+ messages in thread
From: Hendrik Friedel @ 2019-05-06 18:39 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Chris Murphy, Qu Wenruo, Btrfs BTRFS

Hello,

>v2 is expected to become the default soon.
That is good to hear.

>But from the sound of it Qu has enough
>information to maybe track down the v1 problem and fix it, and it
>probably should be fixed, as v1 is the default and is still supported
>and will be forever.
That's good to hear.

>>
>>  For me, the question now is whether we should chase this bug or not. I
>>  encountered it three times while filling an 8TB drive with 7TB. Now, I
>>  have 1TB left and I am not sure I can reproduce it, but I can try.
>
>I don't think it's necessary unless Qu specifically asks.
Let me know, Qu.

>you wouldn't want to constantly dump
>that amount of information into the kernel message buffer and then burden
>the system logger with quite a lot of extraneous information.
I understand. Still, a pity.

>Once you have a reproducer, then you can change the scheduler and see
>if your reproduce steps still reproduce the problem.
I will try and let you know. It's not persistent.

Greetings,
Hendrik




