* BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists
@ 2017-07-11 6:21 Marc MERLIN
2017-07-11 16:00 ` Chris Murphy
2017-07-15 1:22 ` BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012) Marc MERLIN
0 siblings, 2 replies; 47+ messages in thread
From: Marc MERLIN @ 2017-07-11 6:21 UTC (permalink / raw)
To: linux-btrfs
Looks like btrfs has decided to give me hell.
I'm still recovering my system.
The biggest filesystem seems to work, but I just had it go read only:
------------[ cut here ]------------
WARNING: CPU: 5 PID: 3734 at fs/btrfs/extent-tree.c:2960 btrfs_run_delayed_refs+0xb6/0x1dc
BTRFS: Transaction aborted (error -17)
Modules linked in: udp_diag tcp_diag inet_diag veth ip6table_filter ip6_tables ebtable_nat ebtables ppdev lp xt_addrtype br_
fmt_misc ftdi_sio nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ipt_REJECT nf_reject_ipv4 xt_conntrack xt_mark xt_
ptable_mangle iptable_filter pl2303 dm_snapshot dm_bufio iptable_nat ip_tables nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
e_ipv4 nf_nat nf_conntrack x_tables sg st snd_pcm_oss snd_mixer_oss bcache kvm_intel kvm irqbypass snd_hda_codec_realtek snd
da_codec snd_cmipci rc_ati_x10 asus_wmi snd_hda_core snd_mpu401_uart snd_opl3_lib snd_hwdep snd_rawmidi snd_seq_device spars
l tpm_infineon snd tpm_tis hwmon tpm_tis_core usbnet rc_core i2c_i801 usbserial libphy soundcore wmi i915 lpc_ich mfd_cor
s evdev pcspkr parport_pc battery mei_me parport i2c_smbus e1000e ptp pps_core fuse raid456 multipath mmc_block mmc_core dm_
r async_memcpy async_tx crc32c_intel blowfish_x86_64 blowfish_common aesni_intel aes_x86_64 lrw glue_helper ablk_helper cryp
4 thermal usbcore mvsas libsas fjes scsi_transport_sas fan r8169 mii usb_common [last unloaded: ftdi_sio]
CPU: 1 PID: 3734 Comm: btrfs-transacti Tainted: G U W 4.8.10-amd64-preempt-sysrq-20161121vb3tj1 #12
Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
0000000000200286 000000003f87d529 ffff9dcc9838fd00 ffffffffbb39e738
ffff9dcc9838fd50 0000000000000000 ffff9dcc9838fd40 ffffffffbb066e08
00000b909838fdc0 ffff9dc94fdc9be0 0000000000000000 ffff9dcca0d93000
Call Trace:
[<ffffffffbb39e738>] dump_stack+0x63/0x7f
[<ffffffffbb066e08>] __warn+0xc2/0xdd
[<ffffffffbb066e7d>] warn_slowpath_fmt+0x5a/0x76
[<ffffffffbb291dc2>] btrfs_run_delayed_refs+0xb6/0x1dc
[<ffffffffbb2a4d1d>] btrfs_commit_transaction+0x5b/0x965
[<ffffffffbb2a030e>] transaction_kthread+0xf5/0x19f
[<ffffffffbb2a0219>] ? btrfs_cleanup_transaction+0x47b/0x47b
[<ffffffffbb081df3>] kthread+0xb4/0xbc
[<ffffffffbb6d23df>] ret_from_fork+0x1f/0x40
[<ffffffffbb081d3f>] ? init_completion+0x24/0x24
---[ end trace feb4b95c83ac065f ]---
BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists
BTRFS info (device dm-2): forced readonly
Yes, I'm back with 4.8 since I need to get back to a working state,
however this may be a totally unrelated bug that has been fixed since
4.8?
The filesystem seems fine though:
enabling repair mode
Checking filesystem on /dev/mapper/dshelf1
UUID: 36f5079e-ca6c-4855-8639-ccb82695c18d
checking extents
Fixed 0 roots.
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
checking csums
checking root refs
found 11452211699712 bytes used, no error found
total csum bytes: 11167908392
total tree bytes: 13463715840
total fs tree bytes: 712867840
total extent tree bytes: 478281728
btree space waste bytes: 1159679826
file data blocks allocated: 11888008564736
referenced 11908268208128
So I'm going to remount it read-write, but can someone explain the failure above?
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists
2017-07-11 6:21 BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists Marc MERLIN
@ 2017-07-11 16:00 ` Chris Murphy
2017-07-11 16:48 ` Marc MERLIN
2017-07-15 1:22 ` BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012) Marc MERLIN
1 sibling, 1 reply; 47+ messages in thread
From: Chris Murphy @ 2017-07-11 16:00 UTC (permalink / raw)
To: Marc MERLIN; +Cc: Btrfs BTRFS
On Tue, Jul 11, 2017 at 12:21 AM, Marc MERLIN <marc@merlins.org> wrote:
> Looks like btrfs has decided to give me hell.
> I'm still recovering my system.
> The biggest filesystem seems to work, but I just had it go read only:
>
> ------------[ cut here ]------------
> WARNING: CPU: 5 PID: 3734 at fs/btrfs/extent-tree.c:2960 btrfs_run_delayed_refs+0xb6/0x1dc
> BTRFS: Transaction aborted (error -17)
> Modules linked in: udp_diag tcp_diag inet_diag veth ip6table_filter ip6_tables ebtable_nat ebtables ppdev lp xt_addrtype br_
> fmt_misc ftdi_sio nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ipt_REJECT nf_reject_ipv4 xt_conntrack xt_mark xt_
> ptable_mangle iptable_filter pl2303 dm_snapshot dm_bufio iptable_nat ip_tables nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
> e_ipv4 nf_nat nf_conntrack x_tables sg st snd_pcm_oss snd_mixer_oss bcache kvm_intel kvm irqbypass snd_hda_codec_realtek snd
> da_codec snd_cmipci rc_ati_x10 asus_wmi snd_hda_core snd_mpu401_uart snd_opl3_lib snd_hwdep snd_rawmidi snd_seq_device spars
> l tpm_infineon snd tpm_tis hwmon tpm_tis_core usbnet rc_core i2c_i801 usbserial libphy soundcore wmi i915 lpc_ich mfd_cor
> s evdev pcspkr parport_pc battery mei_me parport i2c_smbus e1000e ptp pps_core fuse raid456 multipath mmc_block mmc_core dm_
> r async_memcpy async_tx crc32c_intel blowfish_x86_64 blowfish_common aesni_intel aes_x86_64 lrw glue_helper ablk_helper cryp
> 4 thermal usbcore mvsas libsas fjes scsi_transport_sas fan r8169 mii usb_common [last unloaded: ftdi_sio]
> CPU: 1 PID: 3734 Comm: btrfs-transacti Tainted: G U W 4.8.10-amd64-preempt-sysrq-20161121vb3tj1 #12
> Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> 0000000000200286 000000003f87d529 ffff9dcc9838fd00 ffffffffbb39e738
> ffff9dcc9838fd50 0000000000000000 ffff9dcc9838fd40 ffffffffbb066e08
> 00000b909838fdc0 ffff9dc94fdc9be0 0000000000000000 ffff9dcca0d93000
> Call Trace:
> [<ffffffffbb39e738>] dump_stack+0x63/0x7f
> [<ffffffffbb066e08>] __warn+0xc2/0xdd
> [<ffffffffbb066e7d>] warn_slowpath_fmt+0x5a/0x76
> [<ffffffffbb291dc2>] btrfs_run_delayed_refs+0xb6/0x1dc
> [<ffffffffbb2a4d1d>] btrfs_commit_transaction+0x5b/0x965
> [<ffffffffbb2a030e>] transaction_kthread+0xf5/0x19f
> [<ffffffffbb2a0219>] ? btrfs_cleanup_transaction+0x47b/0x47b
> [<ffffffffbb081df3>] kthread+0xb4/0xbc
> [<ffffffffbb6d23df>] ret_from_fork+0x1f/0x40
> [<ffffffffbb081d3f>] ? init_completion+0x24/0x24
> ---[ end trace feb4b95c83ac065f ]---
> BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists
> BTRFS info (device dm-2): forced readonly
You've already had this same traceback, not sure whether it's the same
file system or not, but it was 4.7.2 kernel.
> Yes, I'm back with 4.8 since I need to get back to a working state,
> however this may be a totally unrelated bug that has been fixed since
> 4.8?
Probably fixed in 4.9, no idea when. I would just use the most recent
4.9 kernel you can get or build. Less chance of regressions in
longterm, greater chance of bug fixes. Same for 4.4.
--
Chris Murphy
* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists
2017-07-11 16:00 ` Chris Murphy
@ 2017-07-11 16:48 ` Marc MERLIN
2017-07-11 22:43 ` Chris Murphy
2017-07-13 1:10 ` Marc MERLIN
0 siblings, 2 replies; 47+ messages in thread
From: Marc MERLIN @ 2017-07-11 16:48 UTC (permalink / raw)
To: Chris Murphy; +Cc: Btrfs BTRFS
On Tue, Jul 11, 2017 at 10:00:40AM -0600, Chris Murphy wrote:
> > ---[ end trace feb4b95c83ac065f ]---
> > BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists
> > BTRFS info (device dm-2): forced readonly
>
> You've already had this same traceback, not sure whether it's the same
> file system or not, but it was 4.7.2 kernel.
You have better memory than me. I'll admit that I'm kind of overwhelmed
by all the time I'm currently spending/wasting on btrfs recovery and
that came almost out of nowhere and hit me in 3 different places :-/
> Probably fixed in 4.9, no idea when. I would just use the most recent
> 4.9 kernel you can get or build. Less chance of regressions in
> longterm, greater chance of bug fixes. Same for 4.4.
Fair suggestion. I jumped from 4.8 to 4.11. I'll build a 4.9 then.
Thanks,
Marc
* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists
2017-07-11 16:48 ` Marc MERLIN
@ 2017-07-11 22:43 ` Chris Murphy
2017-07-11 23:04 ` Marc MERLIN
2017-07-13 1:10 ` Marc MERLIN
1 sibling, 1 reply; 47+ messages in thread
From: Chris Murphy @ 2017-07-11 22:43 UTC (permalink / raw)
To: Marc MERLIN; +Cc: Chris Murphy, Btrfs BTRFS
On Tue, Jul 11, 2017 at 10:48 AM, Marc MERLIN <marc@merlins.org> wrote:
> On Tue, Jul 11, 2017 at 10:00:40AM -0600, Chris Murphy wrote:
>> > ---[ end trace feb4b95c83ac065f ]---
>> > BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists
>> > BTRFS info (device dm-2): forced readonly
>>
>> You've already had this same traceback, not sure whether it's the same
>> file system or not, but it was 4.7.2 kernel.
>
> You have better memory than me. I'll admit that I'm kind of overwhelmed
> by all the time I'm currently spending/wasting on btrfs recovery and
> that came almost out of nowhere and hit me in 3 different places :-/
>
>> Probably fixed in 4.9, no idea when. I would just use the most recent
>> 4.9 kernel you can get or build. Less chance of regressions in
>> longterm, greater chance of bug fixes. Same for 4.4.
>
> Fair suggestion. I jumped from 4.8 to 4.11. I'll build a 4.9 then.
Assuming it works, settle on 4.9 until 4.14 shakes out a bit. Given
your setup and the penalty for even small problems, it's probably
better to go low risk and that means longterm kernels. Maybe one of
the three systems can use a newer kernel just to make sure your
regressions, if any, are contained, but otherwise avoid the
all-eggs-in-one-basket approach.
Another option is cutting down the size of the array and going with a
gluster or ceph approach so the rebuilds aren't so hideously invasive.
You could also optionally use a different storage layout and file
system for a small subset of the bricks, either XFS on LVM RAID or
ZoL. Again, fewer eggs in one basket. But even if they're all Btrfs,
merely breaking things down makes for faster rebuilds, less downtime,
less stress. Because whether it's an unexplained regression, the never
finished fsck, a hardware bug, or a legit drive failure, you will
inevitably have brick problems. Something's always going to go wrong
eventually. Haha. Just throw more drives at the problem and have
gluster do some distributed replication so you can more easily lose
entire bricks like this.
--
Chris Murphy
* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists
2017-07-11 22:43 ` Chris Murphy
@ 2017-07-11 23:04 ` Marc MERLIN
0 siblings, 0 replies; 47+ messages in thread
From: Marc MERLIN @ 2017-07-11 23:04 UTC (permalink / raw)
To: Chris Murphy; +Cc: Btrfs BTRFS
On Tue, Jul 11, 2017 at 04:43:06PM -0600, Chris Murphy wrote:
> Assuming it works, settle on 4.9 until 4.14 shakes out a bit. Given
> your setup and the penalty for even small problems, it's probably
> better to go low risk and that means longterm kernels. Maybe one of
> the three systems can use a newer kernel just to make sure your
> regressions, if any, are contained, but otherwise avoid the
> all-eggs-in-one-basket approach.
That's indeed what I was considering doing.
I guess I got complacent/too trusting after btrfs had worked for me without
real problems for over a year (maybe close to 2?)
My laptop had to be upgraded to 4.11 due to a kernel issue with nvme drives
that made any kernel before that hang on S3 sleep.
But my server can be on anything, and it seems that I'm going to leave it on
4.9 for a while indeed, even though it had been happily on 4.8 for a long time
(given this snapshot rotation bug, which caused it to remount a perfectly
good filesystem read-only, I indeed just moved it to 4.9.36).
> Another option is cutting down the size of the array and going with a
> gluster or ceph approach so the rebuilds aren't so hideously invasive.
Right, it's just personal stuff, I don't want the management to be
ridiculously high for something that ought to be simple.
I only have 2 raid5 arrays of 5 drives each (when back in the day, I
remember building a 26 drive array with SCSI SCA drives in 3 disk shelves
for a total of 2TB, woot!)
I don't really want to artificially cut that raid5 into smaller filesystems by
adding yet another layer like LVM and then concatenating several smaller btrfs
filesystems.
I know I might be a bit stubborn here, but with only 4 data drives, it should be
considered small enough, even if the drives are not super small.
> You could also optionally use a different storage layout and file
> system for a small subset of the bricks, either XFS on LVM RAID or
Yes, basically instead of having one media array and one backup array, I can
make multiple ones, and then take the penalty of moving data across them.
Been there in the past, don't really want to go back :-/
But as you said, there is no magic answer outside not having filesystems
that get corrupted so easily. I did have one flaky SAS card that did
probably slightly damage one of my arrays, but the other 2 (and the
filesystem on my laptop) don't have that hardware excuse.
Anyway, while it's not very helpful to the btrfs project, 4.9.36 indeed seems like
what's best for me for now.
Thanks for the replies.
Marc
* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists
2017-07-11 16:48 ` Marc MERLIN
2017-07-11 22:43 ` Chris Murphy
@ 2017-07-13 1:10 ` Marc MERLIN
2017-07-13 18:17 ` Chris Murphy
1 sibling, 1 reply; 47+ messages in thread
From: Marc MERLIN @ 2017-07-13 1:10 UTC (permalink / raw)
To: Chris Murphy; +Cc: Btrfs BTRFS
On Tue, Jul 11, 2017 at 09:48:12AM -0700, Marc MERLIN wrote:
> On Tue, Jul 11, 2017 at 10:00:40AM -0600, Chris Murphy wrote:
> > > ---[ end trace feb4b95c83ac065f ]---
> > > BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists
> > > BTRFS info (device dm-2): forced readonly
> >
> > You've already had this same traceback, not sure whether it's the same
> > file system or not, but it was 4.7.2 kernel.
>
> You have better memory than me. I'll admit that I'm kind of overwhelmed
> by all the time I'm currently spending/wasting on btrfs recovery and
> that came almost out of nowhere and hit me in 3 different places :-/
Ok, I'm on 4.9.36 and same problem :(
This is on an otherwise ok working filesystem that comes back clean
on btrfs check (although I haven't done lowmem but last time I tried lowmem it
reported problems that apparently weren't really problems)
Dear devs, what does this error mean exactly and what should I do about it besides
ignoring it and remounting my FS read-write?
On the plus side thanks for both
1) showing which device the error is on
2) not crashing the system :)
WARNING: CPU: 6 PID: 3730 at fs/btrfs/extent-tree.c:2967 btrfs_run_delayed_refs+0xbd/0x1be
BTRFS: Transaction aborted (error -17)
CPU: 0 PID: 3730 Comm: btrfs-cleaner Tainted: G U W 4.9.36-amd64-preempt-sysrq-20170
Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
ffffb55c679bfc88 ffffffff8239b00b ffffb55c679bfcd8 0000000000000000
ffffb55c679bfcc8 ffffffff82066769 00000b97679bfd48 ffffa07f61a5eaa0
ffffa086f217c800 00000000ffffffef ffffa086ad8b5a90 00000000000003a0
Call Trace:
[<ffffffff8239b00b>] dump_stack+0x61/0x7d
[<ffffffff82066769>] __warn+0xc2/0xdd
[<ffffffff820667de>] warn_slowpath_fmt+0x5a/0x76
[<ffffffff8228dd5f>] btrfs_run_delayed_refs+0xbd/0x1be
[<ffffffff8228b358>] ? walk_up_tree+0x87/0x10f
[<ffffffff8229fd8f>] btrfs_should_end_transaction+0x54/0x5d
[<ffffffff8228c8b5>] btrfs_drop_snapshot+0x380/0x65c
[<ffffffff822edf7c>] ? btrfs_kill_all_delayed_nodes+0x5f/0xd7
[<ffffffff826ecf8a>] ? _raw_spin_lock+0x15/0x17
[<ffffffff82292130>] ? btrfs_delete_unused_bgs+0x326/0x369
[<ffffffff822a0e29>] btrfs_clean_one_deleted_snapshot+0xce/0xdc
[<ffffffff82298c1e>] cleaner_kthread+0xaf/0x17c
[<ffffffff82298b6f>] ? btrfs_need_cleaner_sleep.isra.25+0x2c/0x2c
[<ffffffff82081e94>] kthread+0xd1/0xd9
[<ffffffff82081dc3>] ? init_completion+0x24/0x24
[<ffffffff82003add>] ? do_fast_syscall_32+0xb7/0xfe
[<ffffffff826ed4b5>] ret_from_fork+0x25/0x30
---[ end trace 59fd1c9a379f73bc ]---
BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2967: errno=-17 Object already exists
BTRFS info (device dm-2): forced readonly
* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists
2017-07-13 1:10 ` Marc MERLIN
@ 2017-07-13 18:17 ` Chris Murphy
2017-07-15 0:48 ` Marc MERLIN
0 siblings, 1 reply; 47+ messages in thread
From: Chris Murphy @ 2017-07-13 18:17 UTC (permalink / raw)
To: Marc MERLIN; +Cc: Chris Murphy, Btrfs BTRFS
On Wed, Jul 12, 2017 at 7:10 PM, Marc MERLIN <marc@merlins.org> wrote:
> On Tue, Jul 11, 2017 at 09:48:12AM -0700, Marc MERLIN wrote:
>> On Tue, Jul 11, 2017 at 10:00:40AM -0600, Chris Murphy wrote:
>> > > ---[ end trace feb4b95c83ac065f ]---
>> > > BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists
>> > > BTRFS info (device dm-2): forced readonly
>> >
>> > You've already had this same traceback, not sure whether it's the same
>> > file system or not, but it was 4.7.2 kernel.
>>
>> You have better memory than me. I'll admit that I'm kind of overwhelmed
>> by all the time I'm currently spending/wasting on btrfs recovery and
>> that came almost out of nowhere and hit me in 3 different places :-/
>
> Ok, I'm on 4.9.36 and same problem :(
>
> This is on an otherwise ok working filesystem that comes back clean
> on btrfs check (although I haven't done lowmem but last time I tried lowmem it
> reported problems that apparently weren't really problems)
>
> Dear devs, what does this error mean exactly and what should I do about it besides
> ignoring it and remounting my FS read-write?
> On the plus side thanks for both
> 1) showing which device the error is on
> 2) not crashing the system :)
>
> WARNING: CPU: 6 PID: 3730 at fs/btrfs/extent-tree.c:2967 btrfs_run_delayed_refs+0xbd/0x1be
> BTRFS: Transaction aborted (error -17)
> CPU: 0 PID: 3730 Comm: btrfs-cleaner Tainted: G U W 4.9.36-amd64-preempt-sysrq-20170
>
> Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> ffffb55c679bfc88 ffffffff8239b00b ffffb55c679bfcd8 0000000000000000
> ffffb55c679bfcc8 ffffffff82066769 00000b97679bfd48 ffffa07f61a5eaa0
> ffffa086f217c800 00000000ffffffef ffffa086ad8b5a90 00000000000003a0
> Call Trace:
> [<ffffffff8239b00b>] dump_stack+0x61/0x7d
> [<ffffffff82066769>] __warn+0xc2/0xdd
> [<ffffffff820667de>] warn_slowpath_fmt+0x5a/0x76
> [<ffffffff8228dd5f>] btrfs_run_delayed_refs+0xbd/0x1be
> [<ffffffff8228b358>] ? walk_up_tree+0x87/0x10f
> [<ffffffff8229fd8f>] btrfs_should_end_transaction+0x54/0x5d
> [<ffffffff8228c8b5>] btrfs_drop_snapshot+0x380/0x65c
> [<ffffffff822edf7c>] ? btrfs_kill_all_delayed_nodes+0x5f/0xd7
> [<ffffffff826ecf8a>] ? _raw_spin_lock+0x15/0x17
> [<ffffffff82292130>] ? btrfs_delete_unused_bgs+0x326/0x369
> [<ffffffff822a0e29>] btrfs_clean_one_deleted_snapshot+0xce/0xdc
> [<ffffffff82298c1e>] cleaner_kthread+0xaf/0x17c
> [<ffffffff82298b6f>] ? btrfs_need_cleaner_sleep.isra.25+0x2c/0x2c
> [<ffffffff82081e94>] kthread+0xd1/0xd9
> [<ffffffff82081dc3>] ? init_completion+0x24/0x24
> [<ffffffff82003add>] ? do_fast_syscall_32+0xb7/0xfe
> [<ffffffff826ed4b5>] ret_from_fork+0x25/0x30
> ---[ end trace 59fd1c9a379f73bc ]---
> BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2967: errno=-17 Object already exists
> BTRFS info (device dm-2): forced readonly
Well I'd say it's a bug, but that's not a revelation. Is there a
snapshot being deleted in the approximate time frame for this? I see a
snapshot is being cleaned up and chunks being removed. So I wonder if
this can be avoided or intentionally triggered by manipulating
snapshot deletion coinciding with the workload? Maybe it's a race, and
that's why it hits EEXIST, and if so then it's just getting confused
and needs to start from scratch - if true then it's OK to just umount
and mount (rw) again and continue on.
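
For readers decoding the numbers in these traces, here is a minimal sketch of how "error -17" maps to EEXIST. It uses only Python's standard errno table. The helper names are made up for illustration, and treating the stack word 00000000ffffffef from the 4.9.36 dump as the same error value is an assumption, not something the trace proves.

```python
import errno


def decode_kernel_errno(value: int) -> str:
    """Return the symbolic name for a kernel-style negative errno."""
    return errno.errorcode.get(abs(value), "UNKNOWN")


def as_signed32(raw: int) -> int:
    """Interpret the low 32 bits of a raw stack word as a signed integer."""
    raw &= 0xFFFFFFFF
    return raw - (1 << 32) if raw & 0x80000000 else raw


# "Transaction aborted (error -17)" from the traces in this thread.
print(decode_kernel_errno(-17))

# A word seen in the 4.9.36 stack dump; assumed (not proven) to be the
# same error code stored as a 32-bit value.
print(as_signed32(0x00000000FFFFFFEF))
```

This only names the error. It says nothing about which extent-tree item already existed or why.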
There are some changes in the code between 4.9.36 and 4.12.1 (not sure
when the change was introduced, or if it alters whether you hit this
bug)
fs/btrfs/extent-tree.c
@@ -2962,7 +2966,7 @@ again:
         delayed_refs->run_delayed_start = find_middle(&delayed_refs->root);
 #endif
         trans->can_flush_pending_bgs = false;
-        ret = __btrfs_run_delayed_refs(trans, root, count);
+        ret = __btrfs_run_delayed_refs(trans, fs_info, count);
         if (ret < 0) {
                 btrfs_abort_transaction(trans, ret);
                 return ret;
Another thing I'm not certain of is if the dm-2 reference is just how
it's referring to the file system, or if it's to be taken literally as
an issue with this device. My understanding of the code is really
weak, but I think this whole trace is within Btrfs logical block
handling, in which case it wouldn't know of a problem with a
particular device. It knows that it's in the weeds, but has no idea
what golf course it's on.
--
Chris Murphy
* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists
2017-07-13 18:17 ` Chris Murphy
@ 2017-07-15 0:48 ` Marc MERLIN
0 siblings, 0 replies; 47+ messages in thread
From: Marc MERLIN @ 2017-07-15 0:48 UTC (permalink / raw)
To: Chris Murphy; +Cc: Btrfs BTRFS
On Thu, Jul 13, 2017 at 12:17:16PM -0600, Chris Murphy wrote:
> Well I'd say it's a bug, but that's not a revelation. Is there a
> snapshot being deleted in the approximate time frame for this? I see a
Yep :)
I run btrfs-snaps and it happens right around that time.
It creates a snapshot and deletes the oldest one.
There is likely a race condition if you delete one or more snapshots just
after creating one on the same subvolume, although this has worked for
about 3 years up to now.
http://marc.merlins.org/perso/btrfs/post_2014-03-21_Btrfs-Tips_-How-To-Setup-Netapp-Style-Snapshots.html
http://marc.merlins.org/linux/scripts/btrfs-snaps
Sure, I can start adding sleeps between creation and deletion, but I
haven't had to so far.
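
As a rough sketch of what "adding sleeps between creation and deletion" could look like: the planner below only builds the command list for one rotation and runs nothing. It is not how the btrfs-snaps script works. The function name, paths, snapshot names and the 60 second default are all hypothetical, and whether a delay actually avoids the abort is untested.

```python
import shlex


def plan_rotation(subvol, snap_dir, existing, new_name, keep, delay_s=60):
    """Build the commands for one snapshot rotation, with a pause before deleting.

    existing: snapshot names, assumed to sort oldest-first by name.
    keep:     number of snapshots to retain, counting the new one (>= 1).
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    steps = [["btrfs", "subvolume", "snapshot", "-r", subvol,
              f"{snap_dir}/{new_name}"]]
    all_snaps = sorted(existing) + [new_name]
    doomed = all_snaps[:-keep]  # everything older than the newest `keep`
    if doomed:
        # Hypothetical mitigation: wait between creation and deletion.
        steps.append(["sleep", str(delay_s)])
    for name in doomed:
        steps.append(["btrfs", "subvolume", "delete", f"{snap_dir}/{name}"])
    return steps


# Hypothetical paths and names, printed as a dry run only.
for step in plan_rotation("/mnt/pool/home", "/mnt/pool/snaps",
                          ["home_hourly_01", "home_hourly_02", "home_hourly_03"],
                          "home_hourly_04", keep=3, delay_s=30):
    print(shlex.join(step))
```

Keeping this a dry-run planner means the ordering and delay can be reviewed before anything touches a real filesystem.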
> snapshot is being cleaned up and chunks being removed. So I wonder if
> this can be avoided or intentionally triggered by manipulating
> snapshot deletion coinciding with the workload? Maybe it's a race, and
> that's why it hits EEXIST, and if so then it's just getting confused
> and needs to start from scratch - if true then it's OK to just umount
> and mount (rw) again and continue on.
which is what I've been doing.
> There are some changes in the code between 4.9.36 and 4.12.1 (not sure
> when the change was introduced, or if it alters whether you hit this
> bug)
I can't say for sure whether I hit the bug with 4.11 or 4.12, since I didn't
stay on them long enough (I don't think I hit it on 4.11, but given the
corruption issues I had, which I'm still not sure were due to the kernel or
to other factors, I've rolled back as discussed earlier).
On my biggest system, I'm still debugging an issue where 3 of my 8 drives
get pseudo-randomly kicked out after returning corrupted data for a few
seconds. I'm pretty sure it's not an issue with the drives, but I'm not
sure if it's the disk carrier/enclosure, cables, or actual ports on the
SAS card (working through the option matrix to find out).
> Another thing I'm not certain of is if the dm-2 reference is just how
> it's referring to the file system, or if it's to be taken literally as
> an issue with this device. My understanding of the code is really
> weak, but I think this whole trace is within Btrfs logical block
> handling, in which case it wouldn't know of a problem with a
> particular device. It knows that it's in the weeds, but has no idea
> what golf course it's on.
dm-2 is correct, it does refer to the correct device.
gargamel:~# dmsetup status -v dshelf1
Name: dshelf1
State: ACTIVE
Read Ahead: 8192
Tables present: LIVE
Open count: 1
Event number: 1
Major, minor: 253, 2
Number of targets: 1
UUID: CRYPT-LUKS1-3cd9bbafa2bb44a587a658a77487ee73-dshelf1_unformatted
0 46883102704 crypt
gargamel:~# l /dev/mapper/dshelf1 /dev/dm-2
brw-rw---- 1 root disk 253, 2 Jul 14 06:30 /dev/dm-2
lrwxrwxrwx 1 root root 7 Jul 14 06:30 /dev/mapper/dshelf1 -> ../dm-2
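
The same check can be scripted. This is a small sketch using only the Python standard library: it follows the /dev/mapper symlink to the dm-N node and reads the major/minor numbers. The helper names are made up for illustration.

```python
import os


def dm_kernel_name(mapper_path):
    """Follow a /dev/mapper/<name> symlink and return the node name (e.g. dm-2)."""
    return os.path.basename(os.path.realpath(mapper_path))


def device_numbers(node_path):
    """Return (major, minor) for a device node, as shown by ls -l."""
    st = os.stat(node_path)
    return os.major(st.st_rdev), os.minor(st.st_rdev)


if __name__ == "__main__":
    path = "/dev/mapper/dshelf1"  # the mapping name used in this thread
    if os.path.exists(path):
        print(dm_kernel_name(path), device_numbers(path))
```

On the system above this should agree with the dmsetup output (253, 2), but that depends on the mapping being active.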
Marc
* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
2017-07-11 6:21 BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists Marc MERLIN
2017-07-11 16:00 ` Chris Murphy
@ 2017-07-15 1:22 ` Marc MERLIN
2017-07-15 23:12 ` Marc MERLIN
1 sibling, 1 reply; 47+ messages in thread
From: Marc MERLIN @ 2017-07-15 1:22 UTC (permalink / raw)
To: linux-btrfs, Chris Murphy, Kai Krakow, bepi, matt, mh, mkaganer,
david, tch, somethingsome2000
Cc: Chris Mason, bo.li.liu, fdmanana, Josef Bacik, David Sterba
Dear Chris and other developers,
Can you look at this bug, which has apparently been happening since 2012 on all kernels between at least
3.4 and 4.11?
I didn't look in detail at each thread (it took long enough to even find them all and paste them here), but they seem pretty
similar, although how they got there may differ, or at least may not be as benign as a race condition
between snapshot creation and deletion for those who do hourly snapshot rotations like me.
On the plus side, it looks like ever since 3.4 the code was already
smart enough not to crash you and just remounted the device read only.
On Mon, Jul 10, 2017 at 11:21:55PM -0700, Marc MERLIN wrote:
> Looks like btrfs has decided to give me hell.
> I'm still recovering my system.
> The biggest filesystem seems to work, but I just had it go read only:
>
> ------------[ cut here ]------------
> WARNING: CPU: 5 PID: 3734 at fs/btrfs/extent-tree.c:2960 btrfs_run_delayed_refs+0xb6/0x1dc
> BTRFS: Transaction aborted (error -17)
> Modules linked in: udp_diag tcp_diag inet_diag veth ip6table_filter ip6_tables ebtable_nat ebtables ppdev lp xt_addrtype br_
> fmt_misc ftdi_sio nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ipt_REJECT nf_reject_ipv4 xt_conntrack xt_mark xt_
> ptable_mangle iptable_filter pl2303 dm_snapshot dm_bufio iptable_nat ip_tables nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
> e_ipv4 nf_nat nf_conntrack x_tables sg st snd_pcm_oss snd_mixer_oss bcache kvm_intel kvm irqbypass snd_hda_codec_realtek snd
> da_codec snd_cmipci rc_ati_x10 asus_wmi snd_hda_core snd_mpu401_uart snd_opl3_lib snd_hwdep snd_rawmidi snd_seq_device spars
> l tpm_infineon snd tpm_tis hwmon tpm_tis_core usbnet rc_core i2c_i801 usbserial libphy soundcore wmi i915 lpc_ich mfd_cor
> s evdev pcspkr parport_pc battery mei_me parport i2c_smbus e1000e ptp pps_core fuse raid456 multipath mmc_block mmc_core dm_
> r async_memcpy async_tx crc32c_intel blowfish_x86_64 blowfish_common aesni_intel aes_x86_64 lrw glue_helper ablk_helper cryp
> 4 thermal usbcore mvsas libsas fjes scsi_transport_sas fan r8169 mii usb_common [last unloaded: ftdi_sio]
> CPU: 1 PID: 3734 Comm: btrfs-transacti Tainted: G U W 4.8.10-amd64-preempt-sysrq-20161121vb3tj1 #12
> Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> 0000000000200286 000000003f87d529 ffff9dcc9838fd00 ffffffffbb39e738
> ffff9dcc9838fd50 0000000000000000 ffff9dcc9838fd40 ffffffffbb066e08
> 00000b909838fdc0 ffff9dc94fdc9be0 0000000000000000 ffff9dcca0d93000
> Call Trace:
> [<ffffffffbb39e738>] dump_stack+0x63/0x7f
> [<ffffffffbb066e08>] __warn+0xc2/0xdd
> [<ffffffffbb066e7d>] warn_slowpath_fmt+0x5a/0x76
> [<ffffffffbb291dc2>] btrfs_run_delayed_refs+0xb6/0x1dc
> [<ffffffffbb2a4d1d>] btrfs_commit_transaction+0x5b/0x965
> [<ffffffffbb2a030e>] transaction_kthread+0xf5/0x19f
> [<ffffffffbb2a0219>] ? btrfs_cleanup_transaction+0x47b/0x47b
> [<ffffffffbb081df3>] kthread+0xb4/0xbc
> [<ffffffffbb6d23df>] ret_from_fork+0x1f/0x40
> [<ffffffffbb081d3f>] ? init_completion+0x24/0x24
> ---[ end trace feb4b95c83ac065f ]---
> BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists
> BTRFS info (device dm-2): forced readonly
Ok, please try this search in gmail or whatever archive you have
"btrfs_run_delayed_refs" "BTRFS: Transaction aborted" "Object already exists"
I had a look in the archives. I was wrong: I did hit the bug with 4.11 (pasted below),
and plenty of other people have hit it too, going all the way back to 3.4 (2012),
if all the reports I just found and pasted are ultimately the same problem (they may not be).
For me, it happens at snapshot rotation time; I think others triggered it in other ways.
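
For anyone searching their own logs rather than a mail archive, here is a minimal sketch of a scanner for the abort line. The regular expression only matches the message format shown in this thread; other kernel versions may word it differently. The names ABORT_RE and find_aborts are made up for illustration.

```python
import re

# Matches lines such as:
# BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists
ABORT_RE = re.compile(
    r"BTRFS: error \(device (?P<device>[^)]+)\) in "
    r"(?P<func>\w+):(?P<line>\d+): errno=(?P<errno>-?\d+) (?P<text>.*)"
)


def find_aborts(log_lines):
    """Return one dict per matching log line, with numeric fields converted."""
    hits = []
    for raw in log_lines:
        m = ABORT_RE.search(raw)
        if m:
            hits.append({
                "device": m.group("device"),
                "func": m.group("func"),
                "line": int(m.group("line")),
                "errno": int(m.group("errno")),
                "text": m.group("text").strip(),
            })
    return hits


sample = [
    "BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists",
    "BTRFS info (device dm-2): forced readonly",
]
for hit in find_aborts(sample):
    print(hit)
```

Grouping the hits by device and source line makes it easier to tell whether two reports come from the same call site or only share the errno.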
Kai Krakow <hurikhan77@gmail.com> 2016/08/28
[4.7.2] btrfs_run_delayed_refs:2963: errno=-17 Object already exists
[44819.903435] ------------[ cut here ]------------
[44819.903443] WARNING: CPU: 3 PID: 2787 at fs/btrfs/extent-tree.c:2963 btrfs_run_delayed_refs+0x26c/0x290
[44819.903444] BTRFS: Transaction aborted (error -17)
[44819.903484] CPU: 3 PID: 2787 Comm: BrowserBlocking Tainted: P O 4.7.2-gentoo #2
[44819.903485] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z68 Pro3, BIOS L2.16A 02/22/2013
[44819.903487] 0000000000000000 ffffffff8130af2d ffff8800b7d03d20 0000000000000000
[44819.903489] ffffffff810865fa ffff880409374428 ffff8800b7d03d70 ffff8803bf299760
[44819.903491] 0000000000000000 00000000ffffffef ffff8803f677f000 ffffffff8108666a
[44819.903493] Call Trace:
[44819.903496] [<ffffffff8130af2d>] ? dump_stack+0x46/0x59
[44819.903500] [<ffffffff810865fa>] ? __warn+0xba/0xe0
[44819.903502] [<ffffffff8108666a>] ? warn_slowpath_fmt+0x4a/0x50
[44819.903504] [<ffffffff8121351c>] ? btrfs_run_delayed_refs+0x26c/0x290
[44819.903507] [<ffffffff811feb1e>] ? btrfs_release_path+0xe/0x80
[44819.903509] [<ffffffff81216afa>] ? btrfs_start_dirty_block_groups+0x2da/0x420
[44819.903511] [<ffffffff812279f3>] ? btrfs_commit_transaction+0x143/0x990
[44819.903514] [<ffffffff8116a2c5>] ? kmem_cache_free+0x165/0x180
[44819.903516] [<ffffffff8124396c>] ? btrfs_wait_ordered_range+0x7c/0x110
[44819.903518] [<ffffffff8123ecf6>] ? btrfs_sync_file+0x286/0x360
[44819.903522] [<ffffffff811ae343>] ? do_fsync+0x33/0x60
[44819.903524] [<ffffffff811ae57a>] ? SyS_fdatasync+0xa/0x10
[44819.903528] [<ffffffff8162299b>] ? entry_SYSCALL_64_fastpath+0x13/0x8f
[44819.903529] ---[ end trace 6944811e170a0e57 ]---
[44819.903531] BTRFS: error (device bcache2) in btrfs_run_delayed_refs:2963: errno=-17 Object already exists
[44819.903533] BTRFS info (device bcache2): forced readonly
Me 2017/06/20
4.11.3: BTRFS critical (device dm-1): unable to add free space :-17 => btrfs check --repair runs clean
[846332.977964] ------------[ cut here ]------------
[846332.992285] WARNING: CPU: 4 PID: 4095 at fs/btrfs/free-space-cache.c:1476 tree_insert_offset+0x78/0xb1
[846333.402648] CPU: 4 PID: 4095 Comm: btrfs-transacti Tainted: G U 4.11.3-amd64-preempt-sysrq-20170406 #5
[846333.434917] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
[846333.463597] Call Trace:
[846333.469942] usb 2-1-port4: device 2-1.4 not suspended yet
[846333.489639] dump_stack+0x61/0x7d
[846333.500480] __warn+0xc2/0xdd
[846333.510956] warn_slowpath_null+0x1d/0x1f
[846333.524103] tree_insert_offset+0x78/0xb1
[846333.537337] link_free_space+0x2c/0x41
[846333.549991] __btrfs_add_free_space+0x89/0x3aa
[846333.564236] ? kmem_cache_free+0x3d/0x92
[846333.577702] btrfs_add_free_space+0x1d/0x1f
[846333.591179] unpin_extent_range+0xf3/0x2b0
[846333.605220] btrfs_finish_extent_commit+0xda/0x1d4
[846333.621324] btrfs_commit_transaction+0x629/0x79a
[846333.637205] ? add_wait_queue+0x44/0x44
[846333.649680] transaction_kthread+0xe2/0x178
[846333.663201] ? btrfs_cleanup_transaction+0x3e8/0x3e8
[846333.679033] kthread+0xfb/0x100
[846333.690261] ? init_completion+0x24/0x24
[846333.703239] ? do_fast_syscall_32+0xb7/0xfe
[846333.717649] ret_from_fork+0x2c/0x40
[846333.729656] ---[ end trace 27aa532d1886e536 ]---
[846333.744721] BTRFS critical (device dm-1): unable to add free space :-17
[847312.529660] BTRFS: Transaction aborted (error -17)
[847312.912784] CPU: 6 PID: 4094 Comm: btrfs-cleaner Tainted: G U W 4.11.3-amd64-preempt-sysrq-20170406 #5
[847312.913132] usb 2-1-port4: device 2-1.4 not suspended yet
[847312.962394] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
[847312.990936] Call Trace:
[847312.999347] dump_stack+0x61/0x7d
[847313.010383] __warn+0xc2/0xdd
[847313.020351] warn_slowpath_fmt+0x5a/0x76
[847313.033274] btrfs_run_delayed_refs+0xb1/0x1cc
[847313.047655] btrfs_should_end_transaction+0x50/0x57
[847313.063910] btrfs_drop_snapshot+0x38a/0x6c4
[847313.078619] ? btrfs_kill_all_delayed_nodes+0x5f/0xd7
[847313.094916] ? _raw_spin_lock+0x15/0x17
[847313.108325] btrfs_clean_one_deleted_snapshot+0xce/0xdc
[847313.125493] cleaner_kthread+0x91/0x14b
[847313.138228] ? btrfs_destroy_pinned_extent+0xd2/0xd2
[847313.154308] kthread+0xfb/0x100
[847313.164900] ? init_completion+0x24/0x24
[847313.177781] ? do_fast_syscall_32+0xb7/0xfe
[847313.191490] ret_from_fork+0x2c/0x40
[847313.203432] ---[ end trace 27aa532d1886e537 ]---
[847313.218391] BTRFS: error (device dm-1) in btrfs_run_delayed_refs:2961: errno=-17 Object already exists
[847313.247668] BTRFS info (device dm-1): forced readonly
Giuseppe Della Bianca 2016/12/18, 4.8.8
[CORRUPTION FILESYSTEM] Corrupted and unrecoverable file system during the snapshot receive
------------[ cut here ]------------
WARNING: CPU: 1 PID: 4325 at fs/btrfs/extent-tree.c:2960 btrfs_run_delayed_refs+0x283/0x2b0 [btrfs]
BTRFS: Transaction aborted (error -17)
Modules linked in: fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns nf_conntrack_br
soundcore acpi_cpufreq tpm_tis tpm_tis_core tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc ata_generic nouveau vide
CPU: 1 PID: 4325 Comm: umount Tainted: G W 4.8.8-100.fc23.x86_64 #1
Hardware name: System manufacturer System Product Name/M2N, BIOS 0902 02/16/2009
0000000000000286 00000000dd260fac ffff8ffa0d25bb60 ffffffffbc3e493e
ffff8ffa0d25bbb0 0000000000000000 ffff8ffa0d25bba0 ffffffffbc0a0ecb
00000b9000000049 ffff8ff9e61b40a0 ffff8ffa2da77800 ffffffffffffffff
Call Trace:
[<ffffffffbc3e493e>] dump_stack+0x63/0x85
[<ffffffffbc0a0ecb>] __warn+0xcb/0xf0
[<ffffffffbc0a0f4f>] warn_slowpath_fmt+0x5f/0x80
[<ffffffffc07eb4e3>] btrfs_run_delayed_refs+0x283/0x2b0 [btrfs]
[<ffffffffc07d62ec>] ? btrfs_cow_block+0x10c/0x1e0 [btrfs]
[<ffffffffc07ff62e>] commit_cowonly_roots+0xae/0x2e0 [btrfs]
[<ffffffffc07eb466>] ? btrfs_run_delayed_refs+0x206/0x2b0 [btrfs]
[<ffffffffc08706b4>] ? btrfs_qgroup_account_extents+0x84/0x180 [btrfs]
[<ffffffffc0802187>] btrfs_commit_transaction+0x547/0xa40 [btrfs]
[<ffffffffc07faa9f>] btrfs_commit_super+0x8f/0xa0 [btrfs]
[<ffffffffc07fcbcb>] close_ctree+0x2db/0x380 [btrfs]
[<ffffffffbc26d3da>] ? evict_inodes+0x15a/0x180
[<ffffffffc07ccf29>] btrfs_put_super+0x19/0x20 [btrfs]
[<ffffffffbc2520bf>] generic_shutdown_super+0x6f/0xf0
[<ffffffffbc2523b2>] kill_anon_super+0x12/0x20
[<ffffffffc07cdd98>] btrfs_kill_super+0x18/0x110 [btrfs]
[<ffffffffbc252763>] deactivate_locked_super+0x43/0x70
[<ffffffffbc2527ec>] deactivate_super+0x5c/0x60
[<ffffffffbc2711bf>] cleanup_mnt+0x3f/0x90
[<ffffffffbc271252>] __cleanup_mnt+0x12/0x20
[<ffffffffbc0bf0ce>] task_work_run+0x7e/0xa0
[<ffffffffbc0032d2>] exit_to_usermode_loop+0xc2/0xd0
[<ffffffffbc003bf1>] syscall_return_slowpath+0xa1/0xb0
[<ffffffffbc7ffb3a>] entry_SYSCALL_64_fastpath+0xa2/0xa4
---[ end trace f7eb2e818f727168 ]---
BTRFS: error (device sda3) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists
BTRFS info (device sda3): forced readonly
BTRFS warning (device sda3): Skipping commit of aborted transaction.
BTRFS: error (device sda3) in cleanup_transaction:1854: errno=-17 Object already exists
Matt McKinnon <matt@techsquare.com> 2017/08/09 kernel 4.7
BTRFS: error (device sda1) in btrfs_run_delayed_refs:2963: errno=-17 Object already exists
------------[ cut here ]------------
WARNING: CPU: 6 PID: 269 at fs/btrfs/extent-tree.c:2963 btrfs_run_delayed_refs+0x292/0x2d0 [btrfs]
BTRFS: Transaction aborted (error -17)
Modules linked in: ipt_REJECT nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables ipmi_devintf nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd dm_multipath joydev lpc_ich mei_me mei ioatdma wmi ipmi_si ipmi_msghandler shpchp mac_hid btrfs lp parport ses enclosure scsi_transport_sas raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor igb raid6_pq libcrc32c i2c_algo_bit raid1 hid_generic dca usbhid raid0 ptp hid ahci megaraid_sas multipath libahci pps_core linear dm_mirror dm_region_hash dm_log
CPU: 6 PID: 269 Comm: kworker/u18:5 Not tainted 4.7.0-custom #1
Hardware name: Supermicro X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b 04/28/2014
Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
0000000000000000 ffff88086a057ca0 ffffffff813b816c ffff88086a057cf0
0000000000000000 ffff88086a057ce0 ffffffff8107a321 00000b9325288170
ffff8808519eb000 ffff880825288170 ffff88086b2c1000 0000000000000020
Call Trace:
[<ffffffff813b816c>] dump_stack+0x63/0x87
[<ffffffff8107a321>] __warn+0xd1/0xf0
[<ffffffff8107a38f>] warn_slowpath_fmt+0x4f/0x60
[<ffffffffc01c6e52>] btrfs_run_delayed_refs+0x292/0x2d0 [btrfs]
[<ffffffffc01c6f24>] delayed_ref_async_start+0x94/0xb0 [btrfs]
[<ffffffffc020f780>] normal_work_helper+0xc0/0x2d0 [btrfs]
[<ffffffff81091082>] ? pwq_activate_delayed_work+0x42/0xb0
[<ffffffffc020fbc2>] btrfs_extent_refs_helper+0x12/0x20 [btrfs]
[<ffffffff81093173>] process_one_work+0x153/0x3f0
[<ffffffff8109392b>] worker_thread+0x12b/0x4b0
[<ffffffff81093800>] ? rescuer_thread+0x340/0x340
[<ffffffff81099109>] kthread+0xc9/0xe0
[<ffffffff817db85f>] ret_from_fork+0x1f/0x40
[<ffffffff81099040>] ? kthread_park+0x60/0x60
---[ end trace e2b0b8dc37502011 ]---
BTRFS: error (device sda1) in btrfs_run_delayed_refs:2963: errno=-17 Object already exists
BTRFS info (device sda1): forced readonly
Marc Haber <mh+linux-btrfs@zugschlus.de> 2015/12/11
Transaction aborted (error -17) during balance
WARNING: CPU: 4 PID: 5545 at /build/linux-eGTGmU/linux-4.3/fs/btrfs/extent-tree.c:2093 __btrfs_inc_extent_ref.isra.52+0x20e/0x280 [btrfs]()
BTRFS: Transaction aborted (error -17)
Modules linked in: ctr ccm tun rfcomm cpufreq_userspace binfmt_misc cpufreq_stats cpufreq_powersave cpufreq_conservative nf_conntrack_netlink nfnetlink bnep ip6table_filter ip6_tables xt_TCPMSS xt_tcpudp iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables bridge stp llc joydev arc4 iTCO_wdt iwldvm iTCO_vendor_support mac80211 snd_hda_codec_conexant intel_rapl snd_hda_codec_generic iosf_mbi x86_pkg_temp_thermal btusb intel_powerclamp btrtl snd_hda_intel iwlwifi btbcm kvm_intel snd_hda_codec btintel kvm snd_hda_core psmouse bluetooth snd_hwdep snd_pcm_oss pcspkr serio_raw i2c_i801 sg cfg80211 snd_mixer_oss lpc_ich snd_pcm mfd_core snd_timer mei_me shpchp mei thinkpad_acpi nvram
tpm_tis snd tpm soundcore rfkill evdev battery ac processor coretemp loop drbd lru_cache libcrc32c parport_pc ppdev lp parport autofs4 btrfs xor raid6_pq ext4 crc16 mbcache jbd2 algif_skcipher af_alg dm_crypt dm_mod md_mod hid_generic hid_logitech_hidpp hid_logitech_dj usbhid hid sd_mod uas usb_storage crct10dif_pclmul crc32_pclmul crc32c_intel jitterentropy_rng sha256_ssse3 sha256_generic hmac drbg ansi_cprng aesni_intel aes_x86_64 lrw gf128mul glue_helper i915 ahci ablk_helper cryptd libahci sdhci_pci i2c_algo_bit libata ehci_pci drm_kms_helper sdhci ehci_hcd scsi_mod mmc_core e1000e usbcore ptp usb_common drm pps_core thermal wmi video button
CPU: 4 PID: 5545 Comm: kworker/u16:1 Not tainted 4.3.0-trunk-amd64 #1 Debian 4.3-1~exp2
Hardware name: LENOVO 4240CTO/4240CTO, BIOS 8AET63WW (1.43 ) 05/08/2013
Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
ffffffffa0627250 ffffffff812c5319 ffff88020dc23ba0 ffffffff8106ebcd
ffff880406146000 ffff88020dc23bf0 ffff8803c90b9410 0000000000000000
0000000000000106 ffffffff8106ec4c ffffffffa0627420 ffffffff00000020
Call Trace:
[<ffffffff812c5319>] ? dump_stack+0x40/0x57
[<ffffffff8106ebcd>] ? warn_slowpath_common+0x7d/0xb0
[<ffffffff8106ec4c>] ? warn_slowpath_fmt+0x4c/0x50
[<ffffffffa058bdc9>] ? insert_tree_block_ref+0x49/0x60 [btrfs]
[<ffffffffa058fc6e>] ? __btrfs_inc_extent_ref.isra.52+0x20e/0x280 [btrfs]
[<ffffffffa0594e77>] ? __btrfs_run_delayed_refs+0xc47/0x1050 [btrfs]
[<ffffffff8101d3b5>] ? sched_clock+0x5/0x10
[<ffffffff81094130>] ? check_preempt_curr+0x50/0x90
[<ffffffff81094184>] ? ttwu_do_wakeup+0x14/0xc0
[<ffffffffa0597e98>] ? btrfs_run_delayed_refs+0x78/0x2a0 [btrfs]
[<ffffffffa05980f2>] ? delayed_ref_async_start+0x32/0x80 [btrfs]
[<ffffffffa05daeb8>] ? btrfs_scrubparity_helper+0xc8/0x260 [btrfs]
[<ffffffff810851df>] ? process_one_work+0x19f/0x3d0
[<ffffffff8108545d>] ? worker_thread+0x4d/0x450
[<ffffffff81085410>] ? process_one_work+0x3d0/0x3d0
[<ffffffff8108af5d>] ? kthread+0xbd/0xe0
[<ffffffff8108aea0>] ? kthread_create_on_node+0x170/0x170
[<ffffffff81553d0f>] ? ret_from_fork+0x3f/0x70
[<ffffffff8108aea0>] ? kthread_create_on_node+0x170/0x170
---[ end trace 6671e30ac2882b40 ]---
BTRFS: error (device dm-11) in __btrfs_inc_extent_ref:2093: errno=-17 Object already exists
BTRFS info (device dm-11): forced readonly
BTRFS: error (device dm-11) in btrfs_run_delayed_refs:2851: errno=-17 Object already exists
Mordechay Kaganer <mkaganer@gmail.com> 2015/11/16 kernel 4.2
Transaction aborted (error -17) after crash
[ 836.026606] BTRFS warning (device md1): block group 12969790406656 has wrong amount of free space
[ 836.026610] BTRFS warning (device md1): failed to load free space cache for block group 12969790406656, rebuild it now
[ 1033.619798] BTRFS warning (device md1): block group 15322358743040 has wrong amount of free space
[ 1033.619801] BTRFS warning (device md1): failed to load free space cache for block group 15322358743040, rebuild it now
[ 2052.843713] ------------[ cut here ]------------
[ 2052.843756] WARNING: CPU: 2 PID: 1725 at /home/kernel/COD/linux/fs/btrfs/extent-tree.c:2781 btrfs_run_delayed_refs.part.73+0x242/0x270 [btrfs]()
[ 2052.843758] BTRFS: Transaction aborted (error -17)
[ 2052.843827] CPU: 2 PID: 1725 Comm: btrfs-transacti Not tainted 4.2.5-040205-generic #201510270124
[ 2052.843829] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./EPC602D8A, BIOS P1.20 04/16/2014
[ 2052.843832] 0000000000000000 00000000df907816 ffff8808414dfcb8 ffffffff817d8d6d
[ 2052.843836] 0000000000000000 ffff8808414dfd10 ffff8808414dfcf8 ffffffff8107b3c6
[ 2052.843839] 0000000000001a0c ffff88049c5fe8a0 ffff88085577d800 ffff88082932cb80
[ 2052.843843] Call Trace:
[ 2052.843852] [<ffffffff817d8d6d>] dump_stack+0x45/0x57
[ 2052.843858] [<ffffffff8107b3c6>] warn_slowpath_common+0x86/0xc0
[ 2052.843862] [<ffffffff8107b455>] warn_slowpath_fmt+0x55/0x70
[ 2052.843878] [<ffffffffc022ecf2>] btrfs_run_delayed_refs.part.73+0x242/0x270 [btrfs]
[ 2052.843882] [<ffffffff810e54bc>] ? del_timer_sync+0x4c/0x60
[ 2052.843897] [<ffffffffc022ed35>] btrfs_run_delayed_refs+0x15/0x20 [btrfs]
[ 2052.843915] [<ffffffffc0243756>] btrfs_commit_transaction+0x56/0xb20 [btrfs]
[ 2052.843931] [<ffffffffc023ee19>] transaction_kthread+0x229/0x240 [btrfs]
[ 2052.843945] [<ffffffffc023ebf0>] ? btrfs_cleanup_transaction+0x550/0x550 [btrfs]
[ 2052.843949] [<ffffffff8109a798>] kthread+0xd8/0xf0
[ 2052.843953] [<ffffffff8109a6c0>] ? kthread_create_on_node+0x1b0/0x1b0
[ 2052.843957] [<ffffffff817dff9f>] ret_from_fork+0x3f/0x70
[ 2052.843960] [<ffffffff8109a6c0>] ? kthread_create_on_node+0x1b0/0x1b0
[ 2052.843962] ---[ end trace 6575cf272a151e61 ]---
[ 2052.843966] BTRFS: error (device md1) in btrfs_run_delayed_refs:2781: errno=-17 Object already exists
[ 2052.844024] BTRFS info (device md1): forced readonly
[ 2052.848397] pending csums is 7327744
David Goodwin <david@codepoets.co.uk> 2015/07/25 kernel 4.2
WARNING: CPU: 2 PID: 31502 at fs/btrfs/extent-tree.c:2025 __btrfs_inc_extent_ref.isra.51+0x210/0x280 [btrfs]()
BTRFS: Transaction aborted (error -17)
CPU: 2 PID: 31502 Comm: kworker/u16:1 Tainted: G O 4.2.0-rc3-dg1 #1
Hardware name: System manufacturer System Product Name/M5A88-M, BIOS 1101 03/16/2012
Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
0000000000000000 ffffffffa02b98a7 ffffffff81540a6f ffff880107383b28
ffffffff8106dfa1 ffff88040c955800 ffff8801003612f8 ffff8800441bfda0
00000a6f8acba000 0000000000003fa4 ffffffff8106e01a ffffffffa02bbc48
Call Trace:
[<ffffffff81540a6f>] ? dump_stack+0x40/0x50
[<ffffffff8106dfa1>] ? warn_slowpath_common+0x81/0xb0
[<ffffffff8106e01a>] ? warn_slowpath_fmt+0x4a/0x50
[<ffffffffa0222390>] ? __btrfs_inc_extent_ref.isra.51+0x210/0x280 [btrfs]
[<ffffffffa0229e1f>] ? __btrfs_run_delayed_refs+0xd1f/0x10a0 [btrfs]
[<ffffffff8101cc65>] ? sched_clock+0x5/0x10
[<ffffffff811bd0c2>] ? __sb_start_write+0x42/0xe0
[<ffffffffa022e26a>] ? btrfs_run_delayed_refs.part.73+0x6a/0x280 [btrfs]
[<ffffffffa022e518>] ? delayed_ref_async_start+0x78/0x90 [btrfs]
[<ffffffffa026eb6c>] ? normal_work_helper+0xbc/0x260 [btrfs]
[<ffffffff81084e01>] ? process_one_work+0x151/0x3d0
[<ffffffff81085805>] ? worker_thread+0x65/0x470
[<ffffffff8154226d>] ? __schedule+0x28d/0x8a0
[<ffffffff810857a0>] ? rescuer_thread+0x310/0x310
[<ffffffff8108ac23>] ? kthread+0xd3/0xf0
[<ffffffff8108ab50>] ? kthread_create_on_node+0x180/0x180
[<ffffffff8154699f>] ? ret_from_fork+0x3f/0x70
[<ffffffff8108ab50>] ? kthread_create_on_node+0x180/0x180
---[ end trace cc878b7b9dc6406e ]---
BTRFS: error (device sdc1) in __btrfs_inc_extent_ref:2025: errno=-17 Object already exists
BTRFS info (device sdc1): forced readonly
BTRFS: error (device sdc1) in btrfs_run_delayed_refs:2781: errno=-17 Object already exists
It keeps going, I ran out of motivation for pasting them all
Tomasz Chmielewski <tch@virtall.com> / 2013/12/20 kernel 3.13:
BTRFS debug (device sdb5): run_one_delayed_ref returned -17
------------[ cut here ]------------
WARNING: CPU: 0 PID: 15042 at fs/btrfs/super.c:254 __btrfs_abort_transaction+0x4d/0xff [btrfs]()
btrfs: Transaction aborted (error -17)
CPU: 0 PID: 15042 Comm: btrfs-transacti Tainted: G W 3.13.0-rc4 #1
Hardware name: System manufacturer System Product Name/P8H77-M PRO, BIOS 1101 02/04/2013
0000000000000009 ffff8800374ddc48 ffffffff8138a37d 0000000000000006
ffff8800374ddc98 ffff8800374ddc88 ffffffff810370a9 ffff8800374ddd80
ffffffffa020d524 00000000ffffffef ffff8807ead7d800 ffff8807ff0cc8c0
Call Trace:
[<ffffffff8138a37d>] dump_stack+0x46/0x58
[<ffffffff810370a9>] warn_slowpath_common+0x77/0x91
[<ffffffffa020d524>] ? __btrfs_abort_transaction+0x4d/0xff [btrfs]
[<ffffffff81037157>] warn_slowpath_fmt+0x41/0x43
[<ffffffffa020d524>] __btrfs_abort_transaction+0x4d/0xff [btrfs]
[<ffffffffa02226ed>] btrfs_run_delayed_refs+0x253/0x46f [btrfs]
[<ffffffffa022fdec>] btrfs_commit_transaction+0x36d/0x7df [btrfs]
[<ffffffffa022e345>] transaction_kthread+0xef/0x1c2 [btrfs]
[<ffffffffa022e256>] ? open_ctree+0x1ac7/0x1ac7 [btrfs]
[<ffffffff8104ee9a>] kthread+0xcd/0xd5
[<ffffffff8104edcd>] ? kthread_freezable_should_stop+0x43/0x43
[<ffffffff8138f17c>] ret_from_fork+0x7c/0xb0
[<ffffffff8104edcd>] ? kthread_freezable_should_stop+0x43/0x43
---[ end trace b552aca9a0cff3cb ]---
BTRFS error (device sdb5) in btrfs_run_delayed_refs:2730: errno=-17 Object already exists
BTRFS info (device sdb5): forced readonly
BTRFS warning (device sdb5): Skipping commit of aborted transaction.
BTRFS error (device sdb5) in cleanup_transaction:1553: errno=-17 Object already exists
Chester <somethingsome2000@gmail.com> / 2012/06/26
btrfs volume suddenly becomes read-only
btrfs: run_one_delayed_ref returned -17
------------[ cut here ]------------
WARNING: at fs/btrfs/super.c:221 __btrfs_abort_transaction+0x40/0x9d()
Hardware name: HP Pavilion dv6 Notebook PC
btrfs: Transaction aborted
Pid: 4491, comm: btrfs-endio-wri Not tainted 3.4.0-00091-gcb77fcd #1
Call Trace:
[<ffffffff8106382f>] warn_slowpath_common+0x7e/0x96
[<ffffffff810638db>] warn_slowpath_fmt+0x41/0x43
[<ffffffff8125e626>] __btrfs_abort_transaction+0x40/0x9d
[<ffffffff8126dd55>] btrfs_run_delayed_refs+0x267/0x34b
[<ffffffff8111e2f3>] ? virt_to_head_page+0x9/0x2c
[<ffffffff8127c241>] __btrfs_end_transaction+0x7f/0x21b
[<ffffffff8127c426>] btrfs_end_transaction+0x10/0x12
[<ffffffff812810c0>] btrfs_finish_ordered_io+0x295/0x2e5
[<ffffffff8167ce58>] ? schedule_timeout+0x9c/0xb6
[<ffffffff8106eb22>] ? usleep_range+0x3d/0x3d
[<ffffffff81281120>] finish_ordered_fn+0x10/0x12
[<ffffffff812a3256>] worker_loop+0x169/0x4a3
[<ffffffff812a30ed>] ? btrfs_queue_worker+0x283/0x283
[<ffffffff8107d0c0>] kthread+0x86/0x8e
[<ffffffff81685c64>] kernel_thread_helper+0x4/0x10
[<ffffffff8107d03a>] ? kthread_freezable_should_stop+0x43/0x43
[<ffffffff81685c60>] ? gs_change+0x13/0x13
---[ end trace fe73a333f7c68c2e ]---
BTRFS error (device sda6) in btrfs_run_delayed_refs:2454: Object already exists
btrfs is forced readonly
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
2017-07-15 1:22 ` BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012) Marc MERLIN
@ 2017-07-15 23:12 ` Marc MERLIN
2017-07-16 14:01 ` Giuseppe Della Bianca
2017-08-29 3:16 ` Marc MERLIN
0 siblings, 2 replies; 47+ messages in thread
From: Marc MERLIN @ 2017-07-15 23:12 UTC (permalink / raw)
To: linux-btrfs, Chris Murphy, Kai Krakow, bepi, matt, mh, mkaganer,
david, tch, somethingsome2000
Cc: Chris Mason, bo.li.liu, fdmanana, Josef Bacik, David Sterba
On Fri, Jul 14, 2017 at 06:22:16PM -0700, Marc MERLIN wrote:
> Dear Chris and other developers,
>
> Can you look at this bug which has been happening since 2012 on apparently all kernels between at least
> 3.4 and 4.11.
> I didn't look in detail at each thread (took long enough to even find them all and paste here), but they seem pretty
> similar although the reasons how they got there may be different, or at least not as benign as a race condition
> between snapshot creation and deletion for those who do hourly snapshot rotations like me.
I just finished 2 check repairs, one with each mode, they both come back
clean.
Yet my FS still remounts read only with the same
BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2967: errno=-17 Object already exists
BTRFS info (device dm-2): forced readonly
BTRFS warning (device dm-2): failed setting block group ro, ret=-30
So, given that I can reproduce this almost at will (actually I wish I could
stop it, for now I've turned off snapshots), and that the filesystem is deemed
clean, is there any patch/fix I can try?
Others on this thread with the same error: did anyone recover from this
without wiping the filesystem?
Is there a chance a balance might work around the bug, so that whatever
layout I have changes and the bug stops occurring?
gargamel:~# btrfs check --repair /dev/mapper/dshelf1
enabling repair mode
Checking filesystem on /dev/mapper/dshelf1
UUID: 36f5079e-ca6c-4855-8639-ccb82695c18d
checking extents
Fixed 0 roots.
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
checking csums
checking root refs
found 11454147125248 bytes used, no error found
total csum bytes: 11169793608
total tree bytes: 13468549120
total fs tree bytes: 715669504
total extent tree bytes: 478838784
btree space waste bytes: 1159606020
file data blocks allocated: 11917231079424
referenced 11938096029696
gargamel:~# btrfs check --mode=lowmem /dev/mapper/dshelf1
Checking filesystem on /dev/mapper/dshelf1
UUID: 36f5079e-ca6c-4855-8639-ccb82695c18d
checking extents
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
checking csums
checking root refs
found 11454147158016 bytes used, no error found
total csum bytes: 11169793608
total tree bytes: 13506461696
total fs tree bytes: 753549312
total extent tree bytes: 478871552
btree space waste bytes: 1165617982
file data blocks allocated: 13203054301184
referenced 13229588148224
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
2017-07-15 23:12 ` Marc MERLIN
@ 2017-07-16 14:01 ` Giuseppe Della Bianca
2017-07-16 16:06 ` Marc MERLIN
2017-08-29 3:16 ` Marc MERLIN
1 sibling, 1 reply; 47+ messages in thread
From: Giuseppe Della Bianca @ 2017-07-16 14:01 UTC (permalink / raw)
To: Marc MERLIN; +Cc: linux-btrfs
> On Fri, Jul 14, 2017 at 06:22:16PM -0700, Marc MERLIN wrote:
> > Dear Chris and other developers,
]zac[
> Others on this thread with the same error: did anyone recover from this
> without wiping the filesystem?
>
> Is there a chance a balance might work around the bug, so that whatever
> layout I have changes and the bug stops occurring?
]zac[
Every attempt, even just deleting files, made the situation worse.
I advise not wasting time on repairs and recreating the filesystem directly.
My workaround is to avoid having more than one btrfs tool running at a time.
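progResult=0
# List running btrfs subcommands that could race with each other.
while read -r proc; do
    if [ "$progResult" -eq 0 ]; then
        # Print the header once, before the first match.
        echo -e "\nbtrfs tools already running"
        progResult=222
    fi
    echo "$proc"
done < <(ps -ef | grep -e "btrfs \{1,\}\(subvolume\|send\|receive\|delete\)")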
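One way to enforce the "only one btrfs tool at a time" rule, rather than only detecting a violation after the fact, is to serialize the commands through a lock file. What follows is a minimal sketch, assuming util-linux `flock` is installed; the lock path and the `run_exclusive` helper name are hypothetical and are not something used in this thread.

```shell
#!/bin/bash
# Hypothetical lock file shared by every script that launches a btrfs tool.
BTRFS_LOCK="${BTRFS_LOCK:-/tmp/btrfs-tools.lock}"

# run_exclusive CMD [ARGS...]
# Runs CMD while holding an exclusive lock on $BTRFS_LOCK, so two callers
# wrapped this way never run at the same time. Returns 222 if the lock is
# already held (the same code the check above uses), otherwise CMD's status.
run_exclusive() {
    (
        # -n: fail immediately instead of waiting for the lock.
        flock -n 9 || exit 222
        "$@"
    ) 9>"$BTRFS_LOCK"
}
```

A rotation script would then call, for example, `run_exclusive btrfs subvolume delete /path/to/old_snapshot` (the path is a placeholder). This only protects against commands launched through the wrapper; anything started by hand or by another cron job bypasses the lock.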
* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
2017-07-16 14:01 ` Giuseppe Della Bianca
@ 2017-07-16 16:06 ` Marc MERLIN
2017-07-17 11:05 ` gius db
0 siblings, 1 reply; 47+ messages in thread
From: Marc MERLIN @ 2017-07-16 16:06 UTC (permalink / raw)
To: Giuseppe Della Bianca; +Cc: linux-btrfs
On Sun, Jul 16, 2017 at 04:01:53PM +0200, Giuseppe Della Bianca wrote:
> > On Fri, Jul 14, 2017 at 06:22:16PM -0700, Marc MERLIN wrote:
> > > Dear Chris and other developers,
> ]zac[
> > Others on this thread with the same error: did anyone recover from this
> > without wiping the filesystem?
> >
> > Is there a chance a balance might work around the bug, so that whatever
> > layout I have changes and the bug stops occurring?
> ]zac[
>
> Every attempt, even just deleting files, made the situation worse.
> I advise not wasting time on repairs and recreating the filesystem directly.
I see. So, this is a condition where the filesystem is clean as far as:
- check
- check lowmem
- scrub
are all concerned (at least in my case), but it's in a state where
touching something around a sensitive area causes the bug.
If so, this blows, and I'm not really wanting to recreate a clean 12TB
filesystem "just because", especially since this could just happen
again after I've rebuilt it.
> while read -r proc; do
>     if [ "$progResult" -eq 0 ]; then
>         echo -e "\nbtrfs tools already running"
>
>         progResult=222
>     fi
>
>     echo "$proc"
> done < <(ps -ef | grep -e "btrfs \{1,\}\(subvolume\|send\|receive\|delete\)")
Yeah, I probably hit that. I think you can also add scrub to that list.
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
2017-07-16 16:06 ` Marc MERLIN
@ 2017-07-17 11:05 ` gius db
0 siblings, 0 replies; 47+ messages in thread
From: gius db @ 2017-07-17 11:05 UTC (permalink / raw)
To: Marc MERLIN; +Cc: linux-btrfs
2017-07-16 18:06 GMT+02:00 Marc MERLIN <marc@merlins.org>:
> On Sun, Jul 16, 2017 at 04:01:53PM +0200, Giuseppe Della Bianca wrote:
>> > On Fri, Jul 14, 2017 at 06:22:16PM -0700, Marc MERLIN wrote:
>> > > Dear Chris and other developers,
>> ]zac[
>> > Others on this thread with the same error: did anyone recover from this
>> > without wiping the filesystem?
>> >
>> > Is there a chance a balance might work around the bug, so that whatever
>> > layout I have changes and the bug stops occurring?
>> ]zac[
>>
>> Every attempt, even just deleting files, made the situation worse.
>> I advise not wasting time on repairs and recreating the filesystem directly.
>
> I see. So, this is a condition where the filesystem is clean as far as:
> - check
> - check lowmem
> - scrub
> are all concerned (at least in my case), but it's in a state where
> touching something around a sensitive area causes the bug.
> If so, this blows, and I'm not really wanting to recreate a clean 12TB
> filesystem "just because", especially since this could just happen
> again after I've rebuilt it.
>
IMHO, rebuilding the snapshot receive filesystem from scratch 1-2 times
a year is inevitable.
For this reason, my snapshot receive filesystems serve only this
purpose and are no bigger than 1-2 TB.
>> while read -r proc; do
>>     if [ "$progResult" -eq 0 ]; then
>>         echo -e "\nbtrfs tools already running"
>>
>>         progResult=222
>>     fi
>>
>>     echo "$proc"
>> done < <(ps -ef | grep -e "btrfs \{1,\}\(subvolume\|send\|receive\|delete\)")
>
> Yeah, I probably hit that. I think you can also add scrub to that list.
>
Yes.
I did not add scrub because my scrubs are always read-only.
And I think the race condition is between snapshot receive and
subvolume delete.
I also suggest:
- Use btrfs subvolume delete with -c
- Try to add a sleep after subvolume delete and receive.
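The two suggestions above can be combined in a small wrapper. This is a minimal sketch, assuming a snapshot rotation script written in bash; the function name, the `BTRFS_BIN` and `SETTLE_SECONDS` variables, and the 30-second default are all hypothetical, and nothing in this thread confirms that a pause actually avoids the race.

```shell
#!/bin/bash
# Hypothetical knobs: which btrfs binary to run and how long to pause (seconds).
BTRFS_BIN="${BTRFS_BIN:-btrfs}"
SETTLE_SECONDS="${SETTLE_SECONDS:-30}"

# delete_snapshot_and_settle SUBVOLUME
# Deletes SUBVOLUME with -c (--commit-after: wait for a transaction commit
# at the end of the operation), then sleeps before returning so the next
# btrfs command does not start immediately afterwards.
delete_snapshot_and_settle() {
    local subvol="$1"
    # Propagate the failure status and skip the pause if the delete fails.
    "$BTRFS_BIN" subvolume delete -c "$subvol" || return $?
    sleep "$SETTLE_SECONDS"
}
```

The same pattern (run the command, then sleep) would apply after `btrfs receive`; the right pause length, if any helps, would have to be found by experiment.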
> Marc
Gdb
* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
2017-07-15 23:12 ` Marc MERLIN
2017-07-16 14:01 ` Giuseppe Della Bianca
@ 2017-08-29 3:16 ` Marc MERLIN
2017-08-29 14:30 ` Josef Bacik
1 sibling, 1 reply; 47+ messages in thread
From: Marc MERLIN @ 2017-08-29 3:16 UTC (permalink / raw)
To: linux-btrfs, Chris Murphy
Cc: Chris Mason, bo.li.liu, fdmanana, Josef Bacik, David Sterba
On Sat, Jul 15, 2017 at 04:12:45PM -0700, Marc MERLIN wrote:
> On Fri, Jul 14, 2017 at 06:22:16PM -0700, Marc MERLIN wrote:
> > Dear Chris and other developers,
> >
> > Can you look at this bug which has been happening since 2012 on apparently all kernels between at least
> > 3.4 and 4.11.
> > I didn't look in detail at each thread (took long enough to even find them all and paste here), but they seem pretty
> > similar although the reasons how they got there may be different, or at least not as benign as a race condition
> > between snapshot creation and deletion for those who do hourly snapshot rotations like me.
>
> I just finished 2 check repairs, one with each mode, they both come back
> clean.
> Yet my FS still remounts read only with the same
> BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2967: errno=-17 Object already exists
> BTRFS info (device dm-2): forced readonly
> BTRFS warning (device dm-2): failed setting block group ro, ret=-30
So this still happens pseudo-randomly, maybe every 2 weeks.
Last one is below.
It did not happen during a btrfs snapshot although I'm not entirely sure
what else was running at the time.
Any update on this problem?
------------[ cut here ]------------
WARNING: CPU: 6 PID: 3783 at fs/btrfs/extent-tree.c:2967 btrfs_run_delayed_refs+0xbd/0x1be
BTRFS: Transaction aborted (error -17)
Modules linked in: asix veth ip6table_filter ip6_tables ebtable_nat ebtables ppdev lp xt_addrtype br_netfilter bridge stp llc tun autofs4 softdog binfmt_misc ftdi_sio nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ipt_REJECT nf_reject_ipv4 xt_conntrack xt_mark xt_nat xt_tcpudp nf_log_ipv4 nf_log_common xt_LOG iptable_mangle iptable_filter lm85 hwmon_vid pl2303 dm_snapshot dm_bufio iptable_nat ip_tables nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_conntrack_ftp ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_nat nf_conntrack x_tables sg st snd_pcm_oss snd_mixer_oss bcache kvm_intel kvm irqbypass snd_hda_codec_realtek snd_hda_codec_generic snd_cmipci snd_mpu401_uart snd_hda_intel snd_opl3_lib snd_hda_codec snd_hda_core snd_hwdep eeepc_wmi snd_rawmidi snd_seq_device tpm_infineon tpm_tis
snd_pcm asus_wmi snd_timer tpm_tis_core rc_ati_x10 snd ati_remote sparse_keymap rfkill i2c_i801 usbserial hwmon usbnet libphy pcspkr wmi soundcore input_leds tpm rc_core parport_pc evdev i915 lpc_ich i2c_smbus parport battery mei_me e1000e ptp pps_core fuse raid456 multipath mmc_block mmc_core dm_crypt dm_mod async_raid6_recov async_pq async_xor async_memcpy async_tx crc32c_intel blowfish_x86_64 blowfish_common aesni_intel aes_x86_64 lrw glue_helper ablk_helper cryptd sata_sil24 fjes mvsas xhci_pci libsas xhci_hcd ehci_pci ehci_hcd thermal usbcore fan r8169 mii scsi_transport_sas [last unloaded: asix]
CPU: 2 PID: 3783 Comm: btrfs-transacti Tainted: G U 4.9.36-amd64-preempt-sysrq-20170406 #1
Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
ffffb7eb67affc98 ffffffffae39b00b ffffb7eb67affce8 0000000000000000
ffffb7eb67affcd8 ffffffffae066769 00000b9767affd58 ffff974f736da960
ffff9756319df000 00000000ffffffef ffff975302da7a50 ffffffffffffffff
Call Trace:
[<ffffffffae39b00b>] dump_stack+0x61/0x7d
[<ffffffffae066769>] __warn+0xc2/0xdd
[<ffffffffae0667de>] warn_slowpath_fmt+0x5a/0x76
[<ffffffffae28dd5f>] btrfs_run_delayed_refs+0xbd/0x1be
[<ffffffffae29ed64>] commit_cowonly_roots+0x10d/0x2b2
[<ffffffffae2fb5ed>] ? btrfs_qgroup_account_extents+0x131/0x181
[<ffffffffae28de48>] ? btrfs_run_delayed_refs+0x1a6/0x1be
[<ffffffffae2a131a>] btrfs_commit_transaction+0x46b/0x8fb
[<ffffffffae29c560>] transaction_kthread+0xf5/0x1a1
[<ffffffffae29c46b>] ? btrfs_cleanup_transaction+0x436/0x436
[<ffffffffae081e94>] kthread+0xd1/0xd9
[<ffffffffae081dc3>] ? init_completion+0x24/0x24
[<ffffffffae003add>] ? do_fast_syscall_32+0xb7/0xfe
[<ffffffffae6ed4b5>] ret_from_fork+0x25/0x30
---[ end trace 4c5fcb9daa07c11a ]---
BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2967: errno=-17 Object already exists
BTRFS info (device dm-2): forced readonly
BTRFS warning (device dm-2): Skipping commit of aborted transaction.
BTRFS: error (device dm-2) in cleanup_transaction:1850: errno=-17 Object already exists
BTRFS error (device dm-2): pending csums is 131072
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
2017-08-29 3:16 ` Marc MERLIN
@ 2017-08-29 14:30 ` Josef Bacik
2017-08-29 14:39 ` Marc MERLIN
0 siblings, 1 reply; 47+ messages in thread
From: Josef Bacik @ 2017-08-29 14:30 UTC (permalink / raw)
To: Marc MERLIN, linux-btrfs, Chris Murphy
Cc: Chris Mason, bo.li.liu, fdmanana, David Sterba
Sorry Marc, I’ll wire up a bcc script to try and catch when this happens. In order for it to work it’ll need to read the extent tree in before you mount the fs, is that something you’ll be able to swing or is this your root fs? Also is it the only btrfs fs on the system? Thanks,
Josef
On 8/28/17, 11:17 PM, "Marc MERLIN" <marc@merlins.org> wrote:
On Sat, Jul 15, 2017 at 04:12:45PM -0700, Marc MERLIN wrote:
> On Fri, Jul 14, 2017 at 06:22:16PM -0700, Marc MERLIN wrote:
> > Dear Chris and other developers,
> >
> > Can you look at this bug which has been happening since 2012 on apparently all kernels between at least
> > 3.4 and 4.11.
> > I didn't look in detail at each thread (took long enough to even find them all and paste here), but they seem pretty
> > similar although the reasons how they got there may be different, or at least not as benign as a race condition
> > between snapshot creation and deletion for those who do hourly snapshot rotations like me.
>
> I just finished 2 check repairs, one with each mode, they both come back
> clean.
> Yet my FS still remounts read only with the same
> BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2967: errno=-17 Object already exists
> BTRFS info (device dm-2): forced readonly
> BTRFS warning (device dm-2): failed setting block group ro, ret=-30
So this still happens pseudo randomly every 2 weeks maybe?
Last one is below.
It did not happen during a btrfs snapshot although I'm not entirely sure
what else was running at the time.
Any update on this problem?
------------[ cut here ]------------
WARNING: CPU: 6 PID: 3783 at fs/btrfs/extent-tree.c:2967 btrfs_run_delayed_refs+0xbd/0x1be
BTRFS: Transaction aborted (error -17)
Modules linked in: asix veth ip6table_filter ip6_tables ebtable_nat ebtables ppdev lp xt_addrtype br_netfilter bridge stp llc tun autofs4 softdog binfmt_misc ftdi_sio nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ipt_REJECT nf_reject_ipv4 xt_conntrack xt_mark xt_nat xt_tcpudp nf_log_ipv4 nf_log_common xt_LOG iptable_mangle iptable_filter lm85 hwmon_vid pl2303 dm_snapshot dm_bufio iptable_nat ip_tables nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_conntrack_ftp ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_nat nf_conntrack x_tables sg st snd_pcm_oss snd_mixer_oss bcache kvm_intel kvm irqbypass snd_hda_codec_realtek snd_hda_codec_generic snd_cmipci snd_mpu401_uart snd_hda_intel snd_opl3_lib snd_hda_codec snd_hda_core snd_hwdep eeepc_wmi snd_rawmidi snd_seq_device tpm_infineon tpm_tis
snd_pcm asus_wmi snd_timer tpm_tis_core rc_ati_x10 snd ati_remote sparse_keymap rfkill i2c_i801 usbserial hwmon usbnet libphy pcspkr wmi soundcore input_leds tpm rc_core parport_pc evdev i915 lpc_ich i2c_smbus parport battery mei_me e1000e ptp pps_core fuse raid456 multipath mmc_block mmc_core dm_crypt dm_mod async_raid6_recov async_pq async_xor async_memcpy async_tx crc32c_intel blowfish_x86_64 blowfish_common aesni_intel aes_x86_64 lrw glue_helper ablk_helper cryptd sata_sil24 fjes mvsas xhci_pci libsas xhci_hcd ehci_pci ehci_hcd thermal usbcore fan r8169 mii scsi_transport_sas [last unloaded: asix]
CPU: 2 PID: 3783 Comm: btrfs-transacti Tainted: G U 4.9.36-amd64-preempt-sysrq-20170406 #1
Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
ffffb7eb67affc98 ffffffffae39b00b ffffb7eb67affce8 0000000000000000
ffffb7eb67affcd8 ffffffffae066769 00000b9767affd58 ffff974f736da960
ffff9756319df000 00000000ffffffef ffff975302da7a50 ffffffffffffffff
Call Trace:
[<ffffffffae39b00b>] dump_stack+0x61/0x7d
[<ffffffffae066769>] __warn+0xc2/0xdd
[<ffffffffae0667de>] warn_slowpath_fmt+0x5a/0x76
[<ffffffffae28dd5f>] btrfs_run_delayed_refs+0xbd/0x1be
[<ffffffffae29ed64>] commit_cowonly_roots+0x10d/0x2b2
[<ffffffffae2fb5ed>] ? btrfs_qgroup_account_extents+0x131/0x181
[<ffffffffae28de48>] ? btrfs_run_delayed_refs+0x1a6/0x1be
[<ffffffffae2a131a>] btrfs_commit_transaction+0x46b/0x8fb
[<ffffffffae29c560>] transaction_kthread+0xf5/0x1a1
[<ffffffffae29c46b>] ? btrfs_cleanup_transaction+0x436/0x436
[<ffffffffae081e94>] kthread+0xd1/0xd9
[<ffffffffae081dc3>] ? init_completion+0x24/0x24
[<ffffffffae003add>] ? do_fast_syscall_32+0xb7/0xfe
[<ffffffffae6ed4b5>] ret_from_fork+0x25/0x30
---[ end trace 4c5fcb9daa07c11a ]---
BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2967: errno=-17 Object already exists
BTRFS info (device dm-2): forced readonly
BTRFS warning (device dm-2): Skipping commit of aborted transaction.
BTRFS: error (device dm-2) in cleanup_transaction:1850: errno=-17 Object already exists
BTRFS error (device dm-2): pending csums is 131072
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: https://urldefense.proofpoint.com/v2/url?u=http-3A__marc.merlins.org_&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=sDzg6MvHymKOUgI8SFIm4Q&m=qcSpYy8ZFdhWPMDeFU0pClrt2eWlHLnDl5rqwzlssdk&s=591MXZleq8AqL3ZpDgJYq2y-sRj1LSE4F_32mkIa9Pg&e=
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
2017-08-29 14:30 ` Josef Bacik
@ 2017-08-29 14:39 ` Marc MERLIN
2017-08-29 14:43 ` Josef Bacik
2017-08-29 18:22 ` Josef Bacik
0 siblings, 2 replies; 47+ messages in thread
From: Marc MERLIN @ 2017-08-29 14:39 UTC (permalink / raw)
To: Josef Bacik
Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
David Sterba
On Tue, Aug 29, 2017 at 02:30:19PM +0000, Josef Bacik wrote:
> Sorry Marc, I’ll wire up a bcc script to try and catch when this
> happens. In order for it to work it’ll need to read the extent tree in
> before you mount the fs, is that something you’ll be able to swing or is
> this your root fs? Also is it the only btrfs fs on the system? Thanks,
Hi Josef, thanks for your reply.
Thankfully it's not the root FS.
There are 3 btrfs filesystems on that system.
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
2017-08-29 14:39 ` Marc MERLIN
@ 2017-08-29 14:43 ` Josef Bacik
2017-08-29 18:22 ` Josef Bacik
1 sibling, 0 replies; 47+ messages in thread
From: Josef Bacik @ 2017-08-29 14:43 UTC (permalink / raw)
To: Marc MERLIN
Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
David Sterba
Alright I’ll figure out a way to differentiate between the fs’s, but being able to scan the fs before it’s mounted was the hardest part so that’s perfect. I’ll get something written up and tested today to make sure it won’t spit out false positives and send it to you this afternoon or tomorrow. Thanks,
Josef
On 8/29/17, 10:40 AM, "Marc MERLIN" <marc@merlins.org> wrote:
On Tue, Aug 29, 2017 at 02:30:19PM +0000, Josef Bacik wrote:
> Sorry Marc, I’ll wire up a bcc script to try and catch when this
> happens. In order for it to work it’ll need to read the extent tree in
> before you mount the fs, is that something you’ll be able to swing or is
> this your root fs? Also is it the only btrfs fs on the system? Thanks,
Hi Josef, thanks for your reply.
Thankfully it's not the root FS.
There are 3 btrfs filesystems on that system.
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: https://urldefense.proofpoint.com/v2/url?u=http-3A__marc.merlins.org_&d=DwIDaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=sDzg6MvHymKOUgI8SFIm4Q&m=Rb6fFZaTtI5fFzN4MD03GPvT0eSOYGuRNKKA4pDehzY&s=sMstwHEsJAdwf4N0fDnuUedvuGEPnDiEV-YmTYK8Zxc&e=
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
2017-08-29 14:39 ` Marc MERLIN
2017-08-29 14:43 ` Josef Bacik
@ 2017-08-29 18:22 ` Josef Bacik
2017-08-30 3:40 ` Marc MERLIN
1 sibling, 1 reply; 47+ messages in thread
From: Josef Bacik @ 2017-08-29 18:22 UTC (permalink / raw)
To: Marc MERLIN
Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
David Sterba
How much metadata do you have on this fs? I was going to hold everything in bpf hash trees, but I’m worried we’ll hit collisions and then the tracing will be useless. If it’s too big I’ll have to dump everything to userspace and let python take care of keeping everything in memory, so if you have a lot of metadata hopefully you have lots of memory too ;). Thanks,
Josef
On 8/29/17, 10:40 AM, "Marc MERLIN" <marc@merlins.org> wrote:
On Tue, Aug 29, 2017 at 02:30:19PM +0000, Josef Bacik wrote:
> Sorry Marc, I’ll wire up a bcc script to try and catch when this
> happens. In order for it to work it’ll need to read the extent tree in
> before you mount the fs, is that something you’ll be able to swing or is
> this your root fs? Also is it the only btrfs fs on the system? Thanks,
Hi Josef, thanks for your reply.
Thankfully it's not the root FS.
There are 3 btrfs filesystems on that system.
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: https://urldefense.proofpoint.com/v2/url?u=http-3A__marc.merlins.org_&d=DwIDaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=sDzg6MvHymKOUgI8SFIm4Q&m=Rb6fFZaTtI5fFzN4MD03GPvT0eSOYGuRNKKA4pDehzY&s=sMstwHEsJAdwf4N0fDnuUedvuGEPnDiEV-YmTYK8Zxc&e=
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
2017-08-29 18:22 ` Josef Bacik
@ 2017-08-30 3:40 ` Marc MERLIN
2017-08-31 14:52 ` Josef Bacik
0 siblings, 1 reply; 47+ messages in thread
From: Marc MERLIN @ 2017-08-30 3:40 UTC (permalink / raw)
To: Josef Bacik
Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
David Sterba
On Tue, Aug 29, 2017 at 06:22:38PM +0000, Josef Bacik wrote:
> How much metadata do you have on this fs? I was going to hold everything in bpf hash trees, but I’m worried we’ll hit collisions and then the tracing will be useless. If it’s too big I’ll have to dump everything to userspace and let python take care of keeping everything in memory, so if you have a lot of metadata hopefully you have lots of memory too ;). Thanks,
gargamel:~# btrfs fi df /mnt/btrfs_pool1
Data, single: total=10.60TiB, used=10.54TiB
System, DUP: total=32.00MiB, used=1.19MiB
Metadata, DUP: total=58.00GiB, used=12.69GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
2017-08-30 3:40 ` Marc MERLIN
@ 2017-08-31 14:52 ` Josef Bacik
2017-08-31 17:36 ` Marc MERLIN
0 siblings, 1 reply; 47+ messages in thread
From: Josef Bacik @ 2017-08-31 14:52 UTC (permalink / raw)
To: Marc MERLIN
Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
David Sterba
Hello,
Sorry I really thought I could accomplish this with BPF, but ref tracking is just too complicated to work properly with BPF. I forward ported my ref verification patch to the latest kernel, you can find it in the btrfs-readdir branch of my btrfs-next tree here
git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git
Just check that out, git checkout btrfs-readdir, build with CONFIG_BTRFS_FS_REF_VERIFY=y, and then mount the problematic fs with -o ref_verify and then grab the full output when it blows up and we should be able to work out what is happening from there. Thanks,
Josef
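[Editor's sketch, not part of Josef's message.] Before mounting with the ref_verify option it may be worth confirming that the kernel that was built really has CONFIG_BTRFS_FS_REF_VERIFY=y. The helper below is a minimal Python sketch that scans kernel config text for an option; the function name and the sample fragment are hypothetical, and where the config text comes from (for example a config file shipped next to the kernel image) is left to the reader.

```python
import re


def config_enabled(config_text: str, option: str) -> bool:
    """Return True if `option` is built in (=y) in kernel .config text."""
    # Anchor on the whole line so CONFIG_FOO does not match CONFIG_FOO_BAR.
    pattern = re.compile(rf"^{re.escape(option)}=y$", re.MULTILINE)
    return bool(pattern.search(config_text))


# Illustrative fragment only; it is not taken from a real build.
sample = """\
CONFIG_BTRFS_FS=y
# CONFIG_BTRFS_DEBUG is not set
CONFIG_BTRFS_FS_REF_VERIFY=y
"""

print(config_enabled(sample, "CONFIG_BTRFS_FS_REF_VERIFY"))
print(config_enabled(sample, "CONFIG_BTRFS_DEBUG"))
```

The whole-line match is deliberate: a plain substring test would report CONFIG_BTRFS_FS as enabled whenever any longer option starting with the same prefix is set.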
On 8/29/17, 11:41 PM, "Marc MERLIN" <marc@merlins.org> wrote:
On Tue, Aug 29, 2017 at 06:22:38PM +0000, Josef Bacik wrote:
> How much metadata do you have on this fs? I was going to hold everything in bpf hash trees, but I’m worried we’ll hit collisions and then the tracing will be useless. If it’s too big I’ll have to dump everything to userspace and let python take care of keeping everything in memory, so if you have a lot of metadata hopefully you have lots of memory too ;). Thanks,
gargamel:~# btrfs fi df /mnt/btrfs_pool1
Data, single: total=10.60TiB, used=10.54TiB
System, DUP: total=32.00MiB, used=1.19MiB
Metadata, DUP: total=58.00GiB, used=12.69GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: https://urldefense.proofpoint.com/v2/url?u=http-3A__marc.merlins.org_&d=DwIDaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=sDzg6MvHymKOUgI8SFIm4Q&m=q-HXS1ddbqcYmJLp6pXcQoJL7qBXplbRAFRQ4eGSQYw&s=yyIlFUXCBjQ2xLoWBYzasW3BtBiLrITfkKLWvnhqgOs&e= | PGP 1024R/763BE901
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
2017-08-31 14:52 ` Josef Bacik
@ 2017-08-31 17:36 ` Marc MERLIN
2017-08-31 17:48 ` Josef Bacik
0 siblings, 1 reply; 47+ messages in thread
From: Marc MERLIN @ 2017-08-31 17:36 UTC (permalink / raw)
To: Josef Bacik
Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
David Sterba
On Thu, Aug 31, 2017 at 02:52:56PM +0000, Josef Bacik wrote:
> Hello,
>
> Sorry I really thought I could accomplish this with BPF, but ref tracking is just too complicated to work properly with BPF. I forward ported my ref verification patch to the latest kernel, you can find it in the btrfs-readdir branch of my btrfs-next tree here
>
> git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git
Thanks.
Now, I have to ask: how safe is this kernel btrfs-wise? I'm ok if it
crashes, but much less so if it damages my filesystem.
I spent over a week recovering from the last corruption that happened when I
moved to 4.11 (and retreated back to 4.9).
From other reports you've seen, has 4.11/4.12 been stable enough for others,
and is 4.13-rc (which your branch is based on, correct?) safe enough in your
opinion?
(and yes, just asking for your opinion, I totally understand that you can't
predict all bugs, and you can't give me a 100% assurance)
I do have a backup, but it indeed takes days to recover, and over a week if
the kernel also damages the other FS on that system, which is smaller, but
has maybe 100x the amount of files.
For now, the problem in the subject line, happens rarely-ish (2-3 weeks?)
although if I remove sleeps in my snapshot creation and rotation, it may
start happening more often again.
Thanks,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
2017-08-31 17:36 ` Marc MERLIN
@ 2017-08-31 17:48 ` Josef Bacik
2017-09-01 20:43 ` Marc MERLIN
0 siblings, 1 reply; 47+ messages in thread
From: Josef Bacik @ 2017-08-31 17:48 UTC (permalink / raw)
To: Marc MERLIN
Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
David Sterba
We are using 4.11 in production at fb with backports from recent (a month ago?) stuff. I’m relatively certain nothing bad will happen, and this branch has the most recent fsync() corruption fix (which exists in your kernel so it’s not new). That said if you are uncomfortable I can rebase this patch onto whatever base you want and push out a branch, it’s your choice. Keep in mind this is going to hold a lot of shit in memory, so I hope you have enough, and I’d definitely remove the sleep’s from your script, there’s no telling if this is a race condition or not and the overhead of the ref-verify stuff may cause it to be less likely to happen. Thanks,
Josef
On 8/31/17, 1:36 PM, "Marc MERLIN" <marc@merlins.org> wrote:
On Thu, Aug 31, 2017 at 02:52:56PM +0000, Josef Bacik wrote:
> Hello,
>
> Sorry I really thought I could accomplish this with BPF, but ref tracking is just too complicated to work properly with BPF. I forward ported my ref verification patch to the latest kernel, you can find it in the btrfs-readdir branch of my btrfs-next tree here
>
> git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git
Thanks.
Now, I have to ask: how safe is this kernel btrfs-wise? I'm ok if it
crashes, but much less so if it damages my filesystem.
I spent over a week recovering from the last corruption that happened when I
moved to 4.11 (and retreated back to 4.9).
From other reports you've seen, has 4.11/4.12 been stable enough for others,
and is 4.13-rc (which your branch is based on, correct?) safe enough in your
opinion?
(and yes, just asking for your opinion, I totally understand that you can't
predict all bugs, and you can't give me a 100% assurance)
I do have a backup, but it indeed takes days to recover, and over a week if
the kernel also damages the other FS on that system, which is smaller, but
has maybe 100x the amount of files.
For now, the problem in the subject line, happens rarely-ish (2-3 weeks?)
although if I remove sleeps in my snapshot creation and rotation, it may
start happening more often again.
Thanks,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: https://urldefense.proofpoint.com/v2/url?u=http-3A__marc.merlins.org_&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=sDzg6MvHymKOUgI8SFIm4Q&m=dPglHkF4tnOYz0Vu1uAapAEiUpHQoQoBDXggxgitjhY&s=nlFmiXkCAu4Dlg2YpjTNdKNFgTA7NzdZJ3oTOPko2U0&e=
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
2017-08-31 17:48 ` Josef Bacik
@ 2017-09-01 20:43 ` Marc MERLIN
2017-09-01 23:01 ` Josef Bacik
0 siblings, 1 reply; 47+ messages in thread
From: Marc MERLIN @ 2017-09-01 20:43 UTC (permalink / raw)
To: Josef Bacik
Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
David Sterba
On Thu, Aug 31, 2017 at 05:48:23PM +0000, Josef Bacik wrote:
> We are using 4.11 in production at fb with backports from recent (a month ago?) stuff. I’m relatively certain nothing bad will happen, and this branch has the most recent fsync() corruption fix (which exists in your kernel so it’s not new). That said if you are uncomfortable I can rebase this patch onto whatever base you want and push out a branch, it’s your choice. Keep in mind this is going to hold a lot of shit in memory, so I hope you have enough, and I’d definitely remove the sleep’s from your script, there’s no telling if this is a race condition or not and the overhead of the ref-verify stuff may cause it to be less likely to happen. Thanks,
Thanks for the warning. I have 32GB of RAM in the server, and I probably use
8. Most of the rest is so that I can do btrfs check --repair without the
machine dying :-/
I am concerned that I have a lot more metadata than I have memory:
gargamel:~# btrfs fi df /mnt/btrfs_pool1
Data, single: total=10.66TiB, used=10.60TiB
System, DUP: total=32.00MiB, used=1.20MiB
Metadata, DUP: total=58.00GiB, used=12.76GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
gargamel:~# btrfs fi df /mnt/btrfs_pool2
Data, single: total=5.07TiB, used=4.78TiB
System, DUP: total=8.00MiB, used=640.00KiB
Metadata, DUP: total=70.50GiB, used=66.58GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
That's 13GB + 67GB.
Is it going to fall over if I only have 32GB of RAM?
If I stop mounting /mnt/btrfs_pool2 for a while, will 32GB of RAM
cover the 13GB of metadata from /mnt/btrfs_pool1 ?
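[Editor's sketch, not part of Marc's message.] The comparison above (13GB + 67GB of metadata against 32GB of RAM) can be reproduced from the two `btrfs fi df` listings. The parser below is a minimal Python sketch that assumes the exact line layout shown in this message; the names parse_fi_df, POOL1, POOL2 and RAM_GIB are made up for illustration, and the 32 figure is simply the RAM size Marc quotes.

```python
import re

# Binary unit multipliers as printed by `btrfs fi df`.
UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}

LINE_RE = re.compile(
    r"^(?P<kind>\w+), (?P<profile>\w+): "
    r"total=(?P<total>[\d.]+)(?P<tunit>[KMGT]iB|B), "
    r"used=(?P<used>[\d.]+)(?P<uunit>[KMGT]iB|B)$"
)


def parse_fi_df(text):
    """Map block-group type -> (total_bytes, used_bytes)."""
    result = {}
    for line in text.strip().splitlines():
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip anything that is not a usage line
        result[m["kind"]] = (
            float(m["total"]) * UNITS[m["tunit"]],
            float(m["used"]) * UNITS[m["uunit"]],
        )
    return result


# The two listings quoted in the message above.
POOL1 = """
Data, single: total=10.66TiB, used=10.60TiB
System, DUP: total=32.00MiB, used=1.20MiB
Metadata, DUP: total=58.00GiB, used=12.76GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
"""
POOL2 = """
Data, single: total=5.07TiB, used=4.78TiB
System, DUP: total=8.00MiB, used=640.00KiB
Metadata, DUP: total=70.50GiB, used=66.58GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
"""

RAM_GIB = 32  # figure quoted in the message, treated here as GiB

meta1_gib = parse_fi_df(POOL1)["Metadata"][1] / 2**30
meta2_gib = parse_fi_df(POOL2)["Metadata"][1] / 2**30
print(f"pool1 metadata used: {meta1_gib:.2f} GiB")
print(f"pool2 metadata used: {meta2_gib:.2f} GiB")
print(f"combined exceeds RAM: {meta1_gib + meta2_gib > RAM_GIB}")
print(f"pool1 alone fits in RAM: {meta1_gib < RAM_GIB}")
```

This only compares on-disk metadata with RAM size. How much memory ref-verify actually needs per byte of metadata is not something the listings can tell you.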
Thanks,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
2017-09-01 20:43 ` Marc MERLIN
@ 2017-09-01 23:01 ` Josef Bacik
2017-09-02 16:09 ` Marc MERLIN
0 siblings, 1 reply; 47+ messages in thread
From: Josef Bacik @ 2017-09-01 23:01 UTC (permalink / raw)
To: Marc MERLIN
Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
David Sterba
You'll be fine, it's only happening on the one fs right? That's 13gib of metadata with checksums and all that shit, it'll probably look like 8 or 9gib of ram worst case. I'd mount with -o ref_verify and check the slab amount in /proc/meminfo to get an idea of real usage. Once the mount is finished that'll be about as much metadata you will use, of course it'll grow as metadata usage grows but it should be nominal. Thanks,
Josef
Sent from my iPhone
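[Editor's sketch, not part of Josef's message.] Checking "the slab amount in /proc/meminfo" comes down to reading the Slab, SReclaimable and SUnreclaim fields, which that file reports in kB. The helper below is a minimal Python sketch; the function names are hypothetical and the SAMPLE numbers are invented for illustration, not measurements from this system.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: value_in_kB}."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def kb_to_gib(kb):
    """Convert a meminfo kB value (1024-byte units) to GiB."""
    return kb / 2**20


# Invented sample values, chosen only to make the arithmetic easy to follow.
SAMPLE = """\
MemTotal:       33554432 kB
Slab:            9437184 kB
SReclaimable:    1048576 kB
SUnreclaim:      8388608 kB
"""

info = parse_meminfo(SAMPLE)
for key in ("Slab", "SReclaimable", "SUnreclaim"):
    print(f"{key}: {kb_to_gib(info[key]):.2f} GiB")

# On a live Linux system the same function can be fed the real file:
#     with open("/proc/meminfo") as f:
#         info = parse_meminfo(f.read())
```

Sampling these fields once before the mount and again after it finishes gives the difference that can be attributed to the mount, which is the number being asked for here.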
> On Sep 1, 2017, at 4:43 PM, Marc MERLIN <marc@merlins.org> wrote:
>
>> On Thu, Aug 31, 2017 at 05:48:23PM +0000, Josef Bacik wrote:
>> We are using 4.11 in production at fb with backports from recent (a month ago?) stuff. I’m relatively certain nothing bad will happen, and this branch has the most recent fsync() corruption fix (which exists in your kernel so it’s not new). That said if you are uncomfortable I can rebase this patch onto whatever base you want and push out a branch, it’s your choice. Keep in mind this is going to hold a lot of shit in memory, so I hope you have enough, and I’d definitely remove the sleep’s from your script, there’s no telling if this is a race condition or not and the overhead of the ref-verify stuff may cause it to be less likely to happen. Thanks,
>
> Thanks for the warning. I have 32GB of RAM in the server, and I probably use
> 8. Most of the rest is so that I can do btrfs check --repair without the
> machine dying :-/
>
> I am concerned that I have a lot more metadata than I have memory:
> gargamel:~# btrfs fi df /mnt/btrfs_pool1
> Data, single: total=10.66TiB, used=10.60TiB
> System, DUP: total=32.00MiB, used=1.20MiB
> Metadata, DUP: total=58.00GiB, used=12.76GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
> gargamel:~# btrfs fi df /mnt/btrfs_pool2
> Data, single: total=5.07TiB, used=4.78TiB
> System, DUP: total=8.00MiB, used=640.00KiB
> Metadata, DUP: total=70.50GiB, used=66.58GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
> That's 13GB + 67GB.
> Is it going to fall over if I only have 32GB of RAM?
>
> If I stop mounting /mnt/btrfs_pool2 for a while, will 32GB of RAM
> cover the 13GB of metadata from /mnt/btrfs_pool1 ?
>
> Thanks,
> Marc
> --
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
> .... what McDonalds is to gourmet cooking
> Home page: https://urldefense.proofpoint.com/v2/url?u=http-3A__marc.merlins.org_&d=DwIDaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=sDzg6MvHymKOUgI8SFIm4Q&m=9sSxC-1zmDEfNiAWSOeOTrz03WlT5Fd1j_U0WK0kfPk&s=YbE1JGIKZGAAWnKVWJfwkj0Fu_GC6OYF7fmbfjcrqHY&e=
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
2017-09-01 23:01 ` Josef Bacik
@ 2017-09-02 16:09 ` Marc MERLIN
2017-09-02 16:52 ` Josef Bacik
0 siblings, 1 reply; 47+ messages in thread
From: Marc MERLIN @ 2017-09-02 16:09 UTC (permalink / raw)
To: Josef Bacik
Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
David Sterba
On Fri, Sep 01, 2017 at 11:01:30PM +0000, Josef Bacik wrote:
> You'll be fine, it's only happening on the one fs right? That's 13gib of metadata with checksums and all that shit, it'll probably look like 8 or 9gib of ram worst case. I'd mount with -o ref_verify and check the slab amount in /proc/meminfo to get an idea of real usage. Once the mount is finished that'll be about as much metadata you will use, of course it'll grow as metadata usage grows but it should be nominal. Thanks,
Looks like I don't have enough RAM :(
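[Editor's sketch, not part of Marc's message.] The Mem-Info section of the log below reports memory in pages, so the scale of the problem is easier to see after converting to GiB. The Python snippet below takes two counters from that log (slab_unreclaimable:8033273 and "8313052 pages RAM") and assumes 4 KiB pages, which matches the 4kB base size in the log's free lists; the function name is hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on a typical x86-64 kernel


def pages_to_gib(pages, page_size=PAGE_SIZE):
    """Convert a page count from an OOM Mem-Info dump to GiB."""
    return pages * page_size / 2**30


# Counters copied from the first Mem-Info dump in the log.
slab_unreclaimable = 8033273
total_ram_pages = 8313052

slab_gib = pages_to_gib(slab_unreclaimable)
ram_gib = pages_to_gib(total_ram_pages)
print(f"unreclaimable slab: {slab_gib:.2f} GiB")
print(f"total RAM:          {ram_gib:.2f} GiB")
print(f"share of RAM:       {slab_unreclaimable / total_ram_pages:.1%}")
```

Under that page-size assumption, unreclaimable slab works out to roughly 30.6 GiB of about 31.7 GiB, so nearly all of RAM was held by kernel slab allocations when the OOM killer ran.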
[ 80.964838] BTRFS info (device dm-2): bdev /dev/mapper/dshelf1 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
[ 1382.968986] bcache_writebac invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=0
[ 1383.003255] bcache_writebac cpuset=/ mems_allowed=0
[ 1383.018947] CPU: 6 PID: 2359 Comm: bcache_writebac Tainted: G U 4.13.0-rc5-amd64-preempt-sysrq-20170406+ #1
[ 1383.052448] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
[ 1383.080911] Call Trace:
[ 1383.089336] dump_stack+0x61/0x7d
[ 1383.100132] dump_header+0x97/0x239
[ 1383.111354] ? _raw_spin_unlock_irqrestore+0x14/0x24
[ 1383.127322] oom_kill_process+0x86/0x379
[ 1383.140208] out_of_memory+0x3b8/0x416
[ 1383.152581] __alloc_pages_slowpath+0x890/0xa55
[ 1383.166960] ? _raw_spin_unlock_irq+0x11/0x21
[ 1383.180806] __alloc_pages_nodemask+0x141/0x1f5
[ 1383.195144] alloc_pages_current+0x8d/0x96
[ 1383.208310] bio_alloc_pages+0x29/0x6a
[ 1383.220472] bch_writeback_thread+0x53b/0x6ff [bcache]
[ 1383.236942] ? write_dirty+0x90/0x90 [bcache]
[ 1383.250734] kthread+0xfb/0x100
[ 1383.261230] ? init_completion+0x24/0x24
[ 1383.273988] ? do_fast_syscall_32+0xb7/0xfe
[ 1383.287265] ret_from_fork+0x25/0x30
[ 1383.298733] Mem-Info:
[ 1383.306446] active_anon:1 inactive_anon:3 isolated_anon:0
[ 1383.306446] active_file:190 inactive_file:180 isolated_file:0
[ 1383.306446] unevictable:0 dirty:0 writeback:1 unstable:0
[ 1383.306446] slab_reclaimable:3436 slab_unreclaimable:8033273
[ 1383.306446] mapped:1 shmem:2 pagetables:74 bounce:0
[ 1383.306446] free:53127 free_pcp:0 free_cma:3741
[ 1383.406332] Node 0 active_anon:0kB inactive_anon:16kB active_file:896kB inactive_file:824kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:4kB dirty:0kB writeback:0kB shmem:8kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
[ 1383.486392] Node 0 DMA free:15880kB min:32kB low:44kB high:56kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15964kB managed:15880kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 1383.565818] lowmem_reserve[]: 0 3201 31832 31832 31832
[ 1383.581956] Node 0 DMA32 free:121256kB min:6788kB low:10064kB high:13340kB active_anon:0kB inactive_anon:0kB active_file:44kB inactive_file:52kB unevictable:0kB writepending:0kB present:3362068kB managed:3296500kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 1383.665831] lowmem_reserve[]: 0 0 28631 28631 28631
[ 1383.681212] Node 0 Normal free:75372kB min:60760kB low:90092kB high:119424kB active_anon:0kB inactive_anon:16kB active_file:788kB inactive_file:836kB unevictable:0kB writepending:0kB present:29874176kB managed:29337252kB mlocked:0kB kernel_stack:8048kB pagetables:296kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:14964kB
[ 1383.769793] lowmem_reserve[]: 0 0 0 0 0
[ 1383.782429] Node 0 DMA: 0*4kB 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15880kB
[ 1383.823171] Node 0 DMA32: 4*4kB (UM) 21*8kB (UME) 9*16kB (UME) 5*32kB (ME) 5*64kB (UME) 5*128kB (UME) 6*256kB (UME) 5*512kB (UME) 5*1024kB (ME) 4*2048kB (UME) 25*4096kB (M) = 121256kB
[ 1383.874564] Node 0 Normal: 773*4kB (UMEC) 494*8kB (ME) 373*16kB (UMEC) 284*32kB (MEC) 177*64kB (UMEC) 108*128kB (UME) 36*256kB (UME) 9*512kB (UMEC) 0*1024kB 1*2048kB (C) 3*4096kB (C) = 75412kB
[ 1383.927787] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 1383.954002] 467 total pagecache pages
[ 1383.965889] 3 pages in swap cache
[ 1383.976715] Swap cache stats: add 1253, delete 1250, find 21/36
[ 1383.995325] Free swap = 15610620kB
[ 1384.006675] Total swap = 15616764kB
[ 1384.018005] 8313052 pages RAM
[ 1384.027730] 0 pages HighMem/MovableOnly
[ 1384.040076] 150644 pages reserved
[ 1384.050845] 4096 pages cma reserved
[ 1384.062127] 0 pages hwpoisoned
[ 1384.072133] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[ 1384.098531] [ 983] 0 983 936 0 6 2 32 0 init
[ 1384.124971] [ 984] 0 984 941 1 5 2 98 0 rc
[ 1384.150843] [ 1103] 0 1103 920 1 5 2 188 -1000 udevd
[ 1384.177534] [ 1311] 0 1311 925 1 5 2 67 -1000 net.agent
[ 1384.205278] [ 1352] 0 1352 925 1 5 2 66 -1000 net.agent
[ 1384.233017] [ 1703] 0 1703 926 1 5 2 68 -1000 net.agent
[ 1384.260731] [ 1935] 0 1935 587 0 5 2 31 0 bootlogd
[ 1384.288190] [ 2469] 0 2469 993 0 5 2 262 -1000 udevd
[ 1384.314846] [ 2470] 0 2470 993 0 5 2 261 -1000 udevd
[ 1384.341494] [ 3049] 0 3049 1538 1 6 2 177 0 S13mountall.sh
[ 1384.370576] [ 3125] 0 3125 1718 0 7 2 128 0 mount
[ 1384.397360] [15456] 0 15456 124 0 3 2 10 -1000 sleep
[ 1384.424026] [15457] 0 15457 124 0 3 2 12 -1000 sleep
[ 1384.450650] [15458] 0 15458 124 1 3 2 10 -1000 sleep
[ 1384.477317] Out of memory: Kill process 3049 (S13mountall.sh) score 0 or sacrifice child
[ 1384.502384] Killed process 3125 (mount) total-vm:6872kB, anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[ 1384.535964] oom_reaper: reaped process 3125 (mount), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[ 1384.573082] bcache_writebac invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=0
[ 1384.607340] bcache_writebac cpuset=/ mems_allowed=0
[ 1384.623102] CPU: 0 PID: 2359 Comm: bcache_writebac Tainted: G U 4.13.0-rc5-amd64-preempt-sysrq-20170406+ #1
[ 1384.656825] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
[ 1384.685361] Call Trace:
[ 1384.693823] dump_stack+0x61/0x7d
[ 1384.704866] dump_header+0x97/0x239
[ 1384.716086] ? _raw_spin_unlock_irqrestore+0x14/0x24
[ 1384.731697] oom_kill_process+0x86/0x379
[ 1384.744201] out_of_memory+0x3b8/0x416
[ 1384.756259] __alloc_pages_slowpath+0x890/0xa55
[ 1384.770536] ? _raw_spin_unlock_irq+0x11/0x21
[ 1384.784302] __alloc_pages_nodemask+0x141/0x1f5
[ 1384.798539] alloc_pages_current+0x8d/0x96
[ 1384.811465] bio_alloc_pages+0x29/0x6a
[ 1384.823334] bch_writeback_thread+0x53b/0x6ff [bcache]
[ 1384.839334] ? write_dirty+0x90/0x90 [bcache]
[ 1384.852984] kthread+0xfb/0x100
[ 1384.862970] ? init_completion+0x24/0x24
[ 1384.875285] ? do_fast_syscall_32+0xb7/0xfe
[ 1384.888368] ret_from_fork+0x25/0x30
[ 1384.899696] Mem-Info:
[ 1384.907064] active_anon:0 inactive_anon:2 isolated_anon:0
[ 1384.907064] active_file:189 inactive_file:273 isolated_file:0
[ 1384.907064] unevictable:0 dirty:0 writeback:0 unstable:0
[ 1384.907064] slab_reclaimable:3414 slab_unreclaimable:8053934
[ 1384.907064] mapped:1 shmem:2 pagetables:74 bounce:0
[ 1384.907064] free:32075 free_pcp:25 free_cma:3741
[ 1384.922833] kworker/6:1H: page allocation failure: order:0, mode:0x1284020(GFP_ATOMIC|__GFP_COMP|__GFP_NOTRACK), nodemask=(null)
[ 1384.922836] kworker/6:1H cpuset=/ mems_allowed=0
[ 1384.922840] CPU: 6 PID: 400 Comm: kworker/6:1H Tainted: G U 4.13.0-rc5-amd64-preempt-sysrq-20170406+ #1
[ 1384.922841] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
[ 1384.922844] Workqueue: kblockd blk_mq_run_work_fn
[ 1384.922845] Call Trace:
[ 1384.922849] dump_stack+0x61/0x7d
[ 1384.922851] warn_alloc+0xfc/0x18c
[ 1384.922854] __alloc_pages_slowpath+0x9ca/0xa55
[ 1384.922856] ? __alloc_pages_slowpath+0x9ca/0xa55
[ 1384.922858] __alloc_pages_nodemask+0x141/0x1f5
[ 1384.922862] cache_grow_begin+0xa4/0x294
[ 1384.922863] fallback_alloc+0x154/0x196
[ 1384.922865] ? cache_grow_begin+0xa4/0x294
[ 1384.922867] ____cache_alloc_node+0xdd/0xe9
[ 1384.922869] kmem_cache_alloc+0x98/0x143
[ 1384.922873] sas_alloc_task+0x1d/0x32 [libsas]
[ 1384.922876] sas_ata_qc_issue+0x71/0x21c [libsas]
[ 1384.922878] ata_qc_issue+0x1fc/0x24c
[ 1384.922880] ? ata_scsi_write_same_xlat+0x2d1/0x2d1
[ 1384.922882] __ata_scsi_queuecmd+0x18f/0x1eb
[ 1384.922883] ata_sas_queuecmd+0x31/0x4d
[ 1384.922886] sas_queuecommand+0x83/0x1cf [libsas]
[ 1384.922889] ? blk_add_timer+0xcb/0x10f
[ 1384.922892] scsi_dispatch_cmd+0x141/0x210
[ 1384.922893] scsi_queue_rq+0x1c7/0x28f
[ 1384.922895] blk_mq_dispatch_rq_list+0x1a6/0x2cf
[ 1384.922896] ? find_next_bit+0xb/0xd
[ 1384.922899] blk_mq_sched_dispatch_requests+0x14e/0x1e7
[ 1384.922900] ? __switch_to+0x288/0x44b
[ 1384.922911] __blk_mq_run_hw_queue+0x4c/0x7f
[ 1384.922912] blk_mq_run_work_fn+0x2c/0x2e
[ 1384.922913] process_one_work+0x179/0x2a5
[ 1384.922915] ? rescuer_thread+0x273/0x273
[ 1384.922915] worker_thread+0x1a8/0x25b
[ 1384.922917] ? rescuer_thread+0x273/0x273
[ 1384.922917] kthread+0xfb/0x100
[ 1384.922918] ? init_completion+0x24/0x24
[ 1384.922919] ? do_fast_syscall_32+0xb7/0xfe
[ 1384.922920] ret_from_fork+0x25/0x30
[ 1384.922922] Mem-Info:
[ 1384.922924] active_anon:0 inactive_anon:2 isolated_anon:0
[ 1384.922924] active_file:199 inactive_file:263 isolated_file:0
[ 1384.922924] unevictable:0 dirty:0 writeback:0 unstable:0
[ 1384.922924] slab_reclaimable:3414 slab_unreclaimable:8055394
[ 1384.922924] mapped:1 shmem:2 pagetables:74 bounce:0
[ 1384.922924] free:30587 free_pcp:18 free_cma:3741
[ 1384.922926] Node 0 active_anon:0kB inactive_anon:8kB active_file:796kB inactive_file:1052kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:4kB dirty:0kB writeback:0kB shmem:8kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[ 1384.922926] Node 0 DMA free:15880kB min:32kB low:44kB high:56kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15964kB managed:15880kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 1384.922928] lowmem_reserve[]: 0 3201 31832 31832 31832
[ 1384.922930] Node 0 DMA32 free:91392kB min:6788kB low:10064kB high:13340kB active_anon:0kB inactive_anon:0kB active_file:56kB inactive_file:56kB unevictable:0kB writepending:0kB present:3362068kB managed:3296500kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:72kB local_pcp:72kB free_cma:0kB
[ 1384.922932] lowmem_reserve[]: 0 0 28631 28631 28631
[ 1384.922933] Node 0 Normal free:15076kB min:60760kB low:90092kB high:119424kB active_anon:0kB inactive_anon:8kB active_file:740kB inactive_file:996kB unevictable:0kB writepending:0kB present:29874176kB managed:29337252kB mlocked:0kB kernel_stack:8048kB pagetables:296kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:14964kB
[ 1384.922935] lowmem_reserve[]: 0 0 0 0 0
[ 1384.922936] Node 0 DMA: 0*4kB 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15880kB
[ 1384.922941] Node 0 DMA32: 2*4kB (UM) 3*8kB (ME) 4*16kB (UME) 5*32kB (ME) 4*64kB (ME) 4*128kB (ME) 5*256kB (ME) 4*512kB (ME) 5*1024kB (ME) 4*2048kB (UME) 18*4096kB (M) = 91392kB
[ 1384.922946] Node 0 Normal: 1*4kB (C) 0*8kB 1*16kB (C) 1*32kB (C) 1*64kB (C) 0*128kB 0*256kB 1*512kB (C) 0*1024kB 1*2048kB (C) 3*4096kB (C) = 14964kB
[ 1384.922951] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 1384.922951] 464 total pagecache pages
[ 1384.922953] 0 pages in swap cache
[ 1384.922954] Swap cache stats: add 1253, delete 1253, find 21/36
[ 1384.922954] Free swap = 15611132kB
[ 1384.922954] Total swap = 15616764kB
[ 1384.922955] 8313052 pages RAM
[ 1384.922955] 0 pages HighMem/MovableOnly
[ 1384.922955] 150644 pages reserved
[ 1384.922956] 4096 pages cma reserved
[ 1384.922956] 0 pages hwpoisoned
[ 1385.007958] ata17.00: exception Emask 0x0 SAct 0x10000 SErr 0x0 action 0x0
[ 1385.007961] ata17.00: failed command: READ FPDMA QUEUED
[ 1385.007965] ata17.00: cmd 60/20:80:90:81:6f/00:00:35:01:00/40 tag 16 ncq dma 16384 in
[ 1385.007965] res 40/00:78:10:2c:8d/00:00:f1:00:00/40 Emask 0x40 (internal error)
[ 1385.007966] ata17.00: status: { DRDY }
[ 1385.008982] ata17.00: Security Log not supported
[ 1385.010102] ata17.00: Security Log not supported
[ 1385.010104] ata17.00: configured for UDMA/133
[ 1385.010110] ata17: EH complete
[ 1385.010162] scsi_eh_10: page allocation failure: order:0, mode:0x1284020(GFP_ATOMIC|__GFP_COMP|__GFP_NOTRACK), nodemask=(null)
[ 1385.010164] scsi_eh_10 cpuset=/ mems_allowed=0
[ 1385.010175] CPU: 6 PID: 409 Comm: scsi_eh_10 Tainted: G U 4.13.0-rc5-amd64-preempt-sysrq-20170406+ #1
[ 1385.010175] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
[ 1385.010175] Call Trace:
[ 1385.010178] dump_stack+0x61/0x7d
[ 1385.010179] warn_alloc+0xfc/0x18c
[ 1385.010181] __alloc_pages_slowpath+0x9ca/0xa55
[ 1385.010182] ? __alloc_pages_slowpath+0x9ca/0xa55
[ 1385.010184] __alloc_pages_nodemask+0x141/0x1f5
[ 1385.010186] cache_grow_begin+0xa4/0x294
[ 1385.010187] fallback_alloc+0x154/0x196
[ 1385.010188] ? cache_grow_begin+0xa4/0x294
[ 1385.010189] ____cache_alloc_node+0xdd/0xe9
[ 1385.010191] kmem_cache_alloc+0x98/0x143
[ 1385.010193] sas_alloc_task+0x1d/0x32 [libsas]
[ 1385.010195] sas_ata_qc_issue+0x71/0x21c [libsas]
[ 1385.010196] ata_qc_issue+0x1fc/0x24c
[ 1385.010198] ? ata_scsi_write_same_xlat+0x2d1/0x2d1
[ 1385.010198] __ata_scsi_queuecmd+0x18f/0x1eb
[ 1385.010200] ata_sas_queuecmd+0x31/0x4d
[ 1385.010202] sas_queuecommand+0x83/0x1cf [libsas]
[ 1385.010203] ? blk_add_timer+0xcb/0x10f
[ 1385.010205] scsi_dispatch_cmd+0x141/0x210
[ 1385.010205] scsi_queue_rq+0x1c7/0x28f
[ 1385.010207] blk_mq_dispatch_rq_list+0x1a6/0x2cf
[ 1385.010208] blk_mq_sched_dispatch_requests+0x129/0x1e7
[ 1385.010209] __blk_mq_run_hw_queue+0x4c/0x7f
[ 1385.010210] __blk_mq_delay_run_hw_queue+0x5c/0xa2
[ 1385.010211] blk_mq_run_hw_queue+0x14/0x16
[ 1385.010212] blk_mq_run_hw_queues+0x2e/0x5e
[ 1385.010212] scsi_run_queue+0x236/0x2c1
[ 1385.010214] scsi_run_host_queues+0x1f/0x37
[ 1385.010215] scsi_error_handler+0x467/0x523
[ 1385.010216] ? __schedule+0x4f5/0x5c5
[ 1385.010217] ? scsi_eh_get_sense+0x1a9/0x1a9
[ 1385.010218] kthread+0xfb/0x100
[ 1385.010219] ? init_completion+0x24/0x24
[ 1385.010220] ret_from_fork+0x25/0x30
[ 1385.010260] ata17.00: exception Emask 0x0 SAct 0x20000 SErr 0x0 action 0x0
[ 1385.010263] ata17.00: failed command: READ FPDMA QUEUED
[ 1385.010266] ata17.00: cmd 60/20:88:90:81:6f/00:00:35:01:00/40 tag 17 ncq dma 16384 in
[ 1385.010266] res 50/00:01:30:00:00/00:00:00:00:00/a0 Emask 0x40 (internal error)
[ 1385.010267] ata17.00: status: { DRDY }
[ 1385.011259] ata17.00: Security Log not supported
[ 1385.012380] ata17.00: Security Log not supported
[ 1385.012382] ata17.00: configured for UDMA/133
[ 1385.012385] ata17: EH complete
[ 1385.335912] mount: page allocation failure: order:0, mode:0x1604040(GFP_NOFS|__GFP_COMP|__GFP_NOTRACK), nodemask=(null)
[ 1385.335916] mount cpuset=/ mems_allowed=0
[ 1385.335920] CPU: 7 PID: 3125 Comm: mount Tainted: G U 4.13.0-rc5-amd64-preempt-sysrq-20170406+ #1
[ 1385.335920] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
[ 1385.335921] Call Trace:
[ 1385.335927] dump_stack+0x61/0x7d
[ 1385.335930] warn_alloc+0xfc/0x18c
[ 1385.335933] ? call_timer_fn+0x140/0x140
[ 1385.335935] __alloc_pages_slowpath+0x9ca/0xa55
[ 1385.335939] __alloc_pages_nodemask+0x141/0x1f5
[ 1385.335943] cache_grow_begin+0xa4/0x294
[ 1385.335945] fallback_alloc+0x154/0x196
[ 1385.335946] ? cache_grow_begin+0xa4/0x294
[ 1385.335948] ____cache_alloc_node+0xdd/0xe9
[ 1385.335950] kmem_cache_alloc_trace+0xa0/0xfc
[ 1385.335953] add_tree_block+0x6a/0x1a1
[ 1385.335955] build_ref_tree_for_root+0x1aa/0x3c8
[ 1385.335956] btrfs_build_ref_tree+0x142/0x179
[ 1385.335958] open_ctree+0x19af/0x1ffe
[ 1385.335961] ? _raw_spin_unlock_bh+0x1a/0x1c
[ 1385.335964] btrfs_mount+0xa0e/0xb86
[ 1385.335965] ? btrfs_mount+0xa0e/0xb86
[ 1385.335967] ? find_next_bit+0xb/0xd
[ 1385.335970] mount_fs+0x67/0x111
[ 1385.335973] vfs_kern_mount+0x6b/0xd5
[ 1385.335974] btrfs_mount+0x1de/0xb86
[ 1385.335975] ? find_next_bit+0xb/0xd
[ 1385.335978] mount_fs+0x67/0x111
[ 1385.335979] vfs_kern_mount+0x6b/0xd5
[ 1385.335981] do_mount+0x6e9/0x987
[ 1385.335984] compat_SyS_mount+0x185/0x1ae
[ 1385.335986] do_fast_syscall_32+0xb7/0xfe
[ 1385.335988] entry_SYSENTER_compat+0x4c/0x5b
[ 1385.335990] RIP: 0023:0xf7f69c29
[ 1385.335991] RSP: 002b:00000000ffa6fed0 EFLAGS: 00000297 ORIG_RAX: 0000000000000015
[ 1385.335992] RAX: ffffffffffffffda RBX: 0000000009877050 RCX: 00000000098771e8
[ 1385.335993] RDX: 0000000009877370 RSI: 00000000c0ed0400 RDI: 00000000098bd548
[ 1385.335993] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 1385.335994] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 1385.335994] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 1387.789938] Node 0 active_anon:588kB inactive_anon:300kB active_file:3988kB inactive_file:1428kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:2184kB dirty:0kB writeback:0kB shmem:8kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
[ 1387.871500] Node 0 DMA free:0kB min:32kB low:44kB high:56kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15964kB managed:15880kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 1387.949345] lowmem_reserve[]: 0 3201 31832 31832 31832
[ 1387.965376] Node 0 DMA32 free:621628kB min:6788kB low:10064kB high:13340kB active_anon:0kB inactive_anon:0kB active_file:84kB inactive_file:28kB unevictable:0kB writepending:0kB present:3362068kB managed:3296500kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:640kB local_pcp:0kB free_cma:0kB
[ 1388.049300] lowmem_reserve[]: 0 0 28631 28631 28631
[ 1388.064560] Node 0 Normal free:4812428kB min:60760kB low:90092kB high:119424kB active_anon:588kB inactive_anon:300kB active_file:3904kB inactive_file:1400kB unevictable:0kB writepending:0kB present:29874176kB managed:29337252kB mlocked:0kB kernel_stack:8080kB pagetables:320kB bounce:0kB free_pcp:4124kB local_pcp:420kB free_cma:11288kB
[ 1388.155296] lowmem_reserve[]: 0 0 0 0 0
[ 1388.167479] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[ 1388.198947] Node 0 DMA32: 4430*4kB (U) 3580*8kB (U) 1883*16kB (U) 1009*32kB (U) 17*64kB (U) 16*128kB (U) 12*256kB (U) 10*512kB (U) 17*1024kB (U) 18*2048kB (U) 126*4096kB (U) = 690472kB
[ 1388.249622] Node 0 Normal: 71828*4kB (UC) 54033*8kB (UC) 34313*16kB (UC) 21097*32kB (UC) 10342*64kB (U) 2801*128kB (UC) 201*256kB (UC) 96*512kB (UC) 68*1024kB (U) 48*2048kB (UC) 457*4096kB (UC) = 5104520kB
[ 1388.305855] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 1388.331978] 1467 total pagecache pages
[ 1388.344026] 111 pages in swap cache
[ 1388.355282] Swap cache stats: add 1465, delete 1354, find 364/553
[ 1388.374360] Free swap = 15611132kB
[ 1388.385607] Total swap = 15616764kB
[ 1388.396863] 8313052 pages RAM
[ 1388.406556] 0 pages HighMem/MovableOnly
[ 1388.418861] 150644 pages reserved
[ 1388.429595] 4096 pages cma reserved
[ 1388.440853] 0 pages hwpoisoned
[ 1388.450807] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[ 1388.477200] [ 983] 0 983 936 0 6 2 32 0 init
[ 1388.503586] [ 984] 0 984 941 1 5 2 98 0 rc
[ 1388.529456] [ 1103] 0 1103 920 1 5 2 188 -1000 udevd
[ 1388.556123] [ 1311] 0 1311 925 443 5 2 24 -1000 net.agent
[ 1388.583800] [ 1352] 0 1352 925 441 5 2 26 -1000 net.agent
[ 1388.611490] [ 1703] 0 1703 926 442 5 2 26 -1000 net.agent
[ 1388.639176] [ 1935] 0 1935 587 0 5 2 31 0 bootlogd
[ 1388.666611] [ 2469] 0 2469 993 0 5 2 262 -1000 udevd
[ 1388.693254] [ 2470] 0 2470 993 0 5 2 261 -1000 udevd
[ 1388.719913] [ 3049] 0 3049 1538 1 6 2 177 0 S13mountall.sh
[ 1388.748886] [ 3125] 0 3125 1718 0 7 2 0 0 mount
[ 1388.775570] [15483] 0 15483 558 141 5 2 0 -1000 sleep
[ 1388.802207] [15484] 0 15484 558 146 4 2 0 -1000 sleep
[ 1388.828828] [15485] 0 15485 558 145 5 2 0 -1000 sleep
[ 1388.855456] Out of memory: Kill process 3049 (S13mountall.sh) score 0 or sacrifice child
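For scale: the Mem-Info counters in the dumps above are page counts, not bytes. A minimal sketch, assuming the usual 4 KiB base page size on x86-64 (the helper name `pages_to_gib` is made up for illustration), converts the figures from the dump at 1384.9229xx. The assumption can be cross-checked against the log itself, since the global `free:30587` page count should equal the sum of the per-zone `free:` values in kB.

```python
# Convert page counts from the OOM "Mem-Info" dump into GiB.
# Assumption: 4 KiB base pages (the x86-64 default); the log does not state it.
PAGE_SIZE_KIB = 4


def pages_to_gib(pages: int, page_size_kib: int = PAGE_SIZE_KIB) -> float:
    """Return the size of `pages` pages in GiB."""
    return pages * page_size_kib / 2**20


# Values copied from the dump at [ 1384.922924 ] .. [ 1384.922955 ].
slab_unreclaimable = 8055394   # "slab_unreclaimable:8055394"
total_ram_pages = 8313052      # "8313052 pages RAM"
free_pages = 30587             # "free:30587"

# Per-zone free memory in kB from the same dump (DMA, DMA32, Normal).
zone_free_kib = [15880, 91392, 15076]

# Sanity check of the page-size assumption: pages * 4 kB == sum of zones.
page_size_consistent = free_pages * PAGE_SIZE_KIB == sum(zone_free_kib)

slab_gib = pages_to_gib(slab_unreclaimable)
ram_gib = pages_to_gib(total_ram_pages)
slab_fraction = slab_unreclaimable / total_ram_pages

print(f"page size assumption consistent: {page_size_consistent}")
print(f"unreclaimable slab: {slab_gib:.1f} GiB")
print(f"total RAM:          {ram_gib:.1f} GiB")
print(f"slab share of RAM:  {slab_fraction:.0%}")
```

Under that assumption, unreclaimable slab accounts for roughly 30.7 GiB of about 31.7 GiB of RAM, which fits the order-0 allocation failures in the traces.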
And hopefully totally unrelated (but maybe not), after the boot continues, it
crashes with:
[ 1523.299228] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffffbd1d4c2c
[ 1523.299228]
[ 1523.334262] CPU: 2 PID: 19932 Comm: avahi-daemon Tainted: G U 4.13.0-rc5-amd64-preempt-sysrq-20170406+ #1
[ 1523.367142] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
[ 1523.395339] Call Trace:
[ 1523.403515] dump_stack+0x61/0x7d
[ 1523.414266] panic+0xe7/0x235
[ 1523.423982] ? compat_core_sys_select+0x25b/0x26d
[ 1523.438878] __stack_chk_fail+0x19/0x19
[ 1523.451168] compat_core_sys_select+0x25b/0x26d
[ 1523.465552] ? compat_SyS_select+0xe/0x10
[ 1523.478358] ? do_fast_syscall_32+0xb7/0xfe
[ 1523.491698] ? entry_SYSENTER_compat+0x4c/0x5b
[ 1523.505858] Kernel Offset: 0x3c000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 1523.538981] Rebooting in 20 seconds..
I did add stack-protector in 4.13, and it seems to be finding an unrelated bug.
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
2017-09-02 16:09 ` Marc MERLIN
@ 2017-09-02 16:52 ` Josef Bacik
[not found] ` <CAHKv19A=OVgCpQpDL2454T+f8QgLm9iynA8xZ4w4Kg8JjYS=UA@mail.gmail.com>
2017-09-02 23:53 ` Marc MERLIN
0 siblings, 2 replies; 47+ messages in thread
From: Josef Bacik @ 2017-09-02 16:52 UTC (permalink / raw)
To: Marc MERLIN
Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
David Sterba
Oops, OK, I've updated my tree so we no longer save the stack trace for the initial scan, which we don't need anyway. That should save a decent amount of memory in your case. It was an in-place update, so you'll need to blow away your local branch and pull the new one to get the new code. Thanks,
Josef
Sent from my iPhone
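The suggestion quoted below (mount with -o ref_verify, then check the slab amount in /proc/meminfo) could be scripted roughly as follows. This is only a sketch: the function names are hypothetical, and the sample text is not captured output. It is reconstructed from the page counts in the OOM dump, assuming 4 KiB pages.

```python
# Summarise slab usage from /proc/meminfo-style text.
import re

# Slab-related fields that /proc/meminfo reports, in kB.
_SLAB_LINE = re.compile(r"^(Slab|SReclaimable|SUnreclaim):\s+(\d+)\s+kB")


def slab_usage_kib(meminfo_text: str) -> dict:
    """Return the slab-related fields (in kB) found in meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        match = _SLAB_LINE.match(line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def read_slab_usage(path: str = "/proc/meminfo") -> dict:
    """Read the live counters; only meaningful on a Linux system."""
    with open(path) as f:
        return slab_usage_kib(f.read())


# Illustrative input only: derived from slab_reclaimable:3414 and
# slab_unreclaimable:8055394 pages in the dump, times an assumed 4 kB.
sample = """\
Slab:           32235232 kB
SReclaimable:      13656 kB
SUnreclaim:     32221576 kB
"""

usage = slab_usage_kib(sample)
for name, kib in usage.items():
    print(f"{name}: {kib / 2**20:.1f} GiB")
```

Running `read_slab_usage()` before and after the mount and comparing `SUnreclaim` would give a rough idea of how much memory the ref-verify tracking takes.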
> On Sep 2, 2017, at 12:10 PM, Marc MERLIN <marc@merlins.org> wrote:
>
>> On Fri, Sep 01, 2017 at 11:01:30PM +0000, Josef Bacik wrote:
>> You'll be fine, it's only happening on the one fs, right? That's 13GiB of metadata with checksums and all that shit; it'll probably look like 8 or 9GiB of RAM worst case. I'd mount with -o ref_verify and check the slab amount in /proc/meminfo to get an idea of real usage. Once the mount is finished, that'll be about as much metadata as you will use; of course it'll grow as metadata usage grows, but it should be nominal. Thanks,
>
> Looks like I don't have enough RAM :(
>
> [ 80.964838] BTRFS info (device dm-2): bdev /dev/mapper/dshelf1 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
> [ 1382.968986] bcache_writebac invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=0
> [ 1383.003255] bcache_writebac cpuset=/ mems_allowed=0
> [ 1383.018947] CPU: 6 PID: 2359 Comm: bcache_writebac Tainted: G U 4.13.0-rc5-amd64-preempt-sysrq-20170406+ #1
> [ 1383.052448] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> [ 1383.080911] Call Trace:
> [ 1383.089336] dump_stack+0x61/0x7d
> [ 1383.100132] dump_header+0x97/0x239
> [ 1383.111354] ? _raw_spin_unlock_irqrestore+0x14/0x24
> [ 1383.127322] oom_kill_process+0x86/0x379
> [ 1383.140208] out_of_memory+0x3b8/0x416
> [ 1383.152581] __alloc_pages_slowpath+0x890/0xa55
> [ 1383.166960] ? _raw_spin_unlock_irq+0x11/0x21
> [ 1383.180806] __alloc_pages_nodemask+0x141/0x1f5
> [ 1383.195144] alloc_pages_current+0x8d/0x96
> [ 1383.208310] bio_alloc_pages+0x29/0x6a
> [ 1383.220472] bch_writeback_thread+0x53b/0x6ff [bcache]
> [ 1383.236942] ? write_dirty+0x90/0x90 [bcache]
> [ 1383.250734] kthread+0xfb/0x100
> [ 1383.261230] ? init_completion+0x24/0x24
> [ 1383.273988] ? do_fast_syscall_32+0xb7/0xfe
> [ 1383.287265] ret_from_fork+0x25/0x30
> [ 1383.298733] Mem-Info:
> [ 1383.306446] active_anon:1 inactive_anon:3 isolated_anon:0
> [ 1383.306446] active_file:190 inactive_file:180 isolated_file:0
> [ 1383.306446] unevictable:0 dirty:0 writeback:1 unstable:0
> [ 1383.306446] slab_reclaimable:3436 slab_unreclaimable:8033273
> [ 1383.306446] mapped:1 shmem:2 pagetables:74 bounce:0
> [ 1383.306446] free:53127 free_pcp:0 free_cma:3741
> [ 1383.406332] Node 0 active_anon:0kB inactive_anon:16kB active_file:896kB inactive_file:824kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:4kB dirty:0kB writeback:0kB shmem:8kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
> [ 1383.486392] Node 0 DMA free:15880kB min:32kB low:44kB high:56kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15964kB managed:15880kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> [ 1383.565818] lowmem_reserve[]: 0 3201 31832 31832 31832
> [ 1383.581956] Node 0 DMA32 free:121256kB min:6788kB low:10064kB high:13340kB active_anon:0kB inactive_anon:0kB active_file:44kB inactive_file:52kB unevictable:0kB writepending:0kB present:3362068kB managed:3296500kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> [ 1383.665831] lowmem_reserve[]: 0 0 28631 28631 28631
> [ 1383.681212] Node 0 Normal free:75372kB min:60760kB low:90092kB high:119424kB active_anon:0kB inactive_anon:16kB active_file:788kB inactive_file:836kB unevictable:0kB writepending:0kB present:29874176kB managed:29337252kB mlocked:0kB kernel_stack:8048kB pagetables:296kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:14964kB
> [ 1383.769793] lowmem_reserve[]: 0 0 0 0 0
> [ 1383.782429] Node 0 DMA: 0*4kB 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15880kB
> [ 1383.823171] Node 0 DMA32: 4*4kB (UM) 21*8kB (UME) 9*16kB (UME) 5*32kB (ME) 5*64kB (UME) 5*128kB (UME) 6*256kB (UME) 5*512kB (UME) 5*1024kB (ME) 4*2048kB (UME) 25*4096kB (M) = 121256kB
> [ 1383.874564] Node 0 Normal: 773*4kB (UMEC) 494*8kB (ME) 373*16kB (UMEC) 284*32kB (MEC) 177*64kB (UMEC) 108*128kB (UME) 36*256kB (UME) 9*512kB (UMEC) 0*1024kB 1*2048kB (C) 3*4096kB (C) = 75412kB
> [ 1383.927787] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> [ 1383.954002] 467 total pagecache pages
> [ 1383.965889] 3 pages in swap cache
> [ 1383.976715] Swap cache stats: add 1253, delete 1250, find 21/36
> [ 1383.995325] Free swap = 15610620kB
> [ 1384.006675] Total swap = 15616764kB
> [ 1384.018005] 8313052 pages RAM
> [ 1384.027730] 0 pages HighMem/MovableOnly
> [ 1384.040076] 150644 pages reserved
> [ 1384.050845] 4096 pages cma reserved
> [ 1384.062127] 0 pages hwpoisoned
> [ 1384.072133] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
> [ 1384.098531] [ 983] 0 983 936 0 6 2 32 0 init
> [ 1384.124971] [ 984] 0 984 941 1 5 2 98 0 rc
> [ 1384.150843] [ 1103] 0 1103 920 1 5 2 188 -1000 udevd
> [ 1384.177534] [ 1311] 0 1311 925 1 5 2 67 -1000 net.agent
> [ 1384.205278] [ 1352] 0 1352 925 1 5 2 66 -1000 net.agent
> [ 1384.233017] [ 1703] 0 1703 926 1 5 2 68 -1000 net.agent
> [ 1384.260731] [ 1935] 0 1935 587 0 5 2 31 0 bootlogd
> [ 1384.288190] [ 2469] 0 2469 993 0 5 2 262 -1000 udevd
> [ 1384.314846] [ 2470] 0 2470 993 0 5 2 261 -1000 udevd
> [ 1384.341494] [ 3049] 0 3049 1538 1 6 2 177 0 S13mountall.sh
> [ 1384.370576] [ 3125] 0 3125 1718 0 7 2 128 0 mount
> [ 1384.397360] [15456] 0 15456 124 0 3 2 10 -1000 sleep
> [ 1384.424026] [15457] 0 15457 124 0 3 2 12 -1000 sleep
> [ 1384.450650] [15458] 0 15458 124 1 3 2 10 -1000 sleep
> [ 1384.477317] Out of memory: Kill process 3049 (S13mountall.sh) score 0 or sacrifice child
> [ 1384.502384] Killed process 3125 (mount) total-vm:6872kB, anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> [ 1384.535964] oom_reaper: reaped process 3125 (mount), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> [ 1384.573082] bcache_writebac invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=0
> [ 1384.607340] bcache_writebac cpuset=/ mems_allowed=0
> [ 1384.623102] CPU: 0 PID: 2359 Comm: bcache_writebac Tainted: G U 4.13.0-rc5-amd64-preempt-sysrq-20170406+ #1
> [ 1384.656825] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> [ 1384.685361] Call Trace:
> [ 1384.693823] dump_stack+0x61/0x7d
> [ 1384.704866] dump_header+0x97/0x239
> [ 1384.716086] ? _raw_spin_unlock_irqrestore+0x14/0x24
> [ 1384.731697] oom_kill_process+0x86/0x379
> [ 1384.744201] out_of_memory+0x3b8/0x416
> [ 1384.756259] __alloc_pages_slowpath+0x890/0xa55
> [ 1384.770536] ? _raw_spin_unlock_irq+0x11/0x21
> [ 1384.784302] __alloc_pages_nodemask+0x141/0x1f5
> [ 1384.798539] alloc_pages_current+0x8d/0x96
> [ 1384.811465] bio_alloc_pages+0x29/0x6a
> [ 1384.823334] bch_writeback_thread+0x53b/0x6ff [bcache]
> [ 1384.839334] ? write_dirty+0x90/0x90 [bcache]
> [ 1384.852984] kthread+0xfb/0x100
> [ 1384.862970] ? init_completion+0x24/0x24
> [ 1384.875285] ? do_fast_syscall_32+0xb7/0xfe
> [ 1384.888368] ret_from_fork+0x25/0x30
> [ 1384.899696] Mem-Info:
> [ 1384.907064] active_anon:0 inactive_anon:2 isolated_anon:0
> [ 1384.907064] active_file:189 inactive_file:273 isolated_file:0
> [ 1384.907064] unevictable:0 dirty:0 writeback:0 unstable:0
> [ 1384.907064] slab_reclaimable:3414 slab_unreclaimable:8053934
> [ 1384.907064] mapped:1 shmem:2 pagetables:74 bounce:0
> [ 1384.907064] free:32075 free_pcp:25 free_cma:3741
> [ 1384.922833] kworker/6:1H: page allocation failure: order:0, mode:0x1284020(GFP_ATOMIC|__GFP_COMP|__GFP_NOTRACK), nodemask=(null)
> [ 1384.922836] kworker/6:1H cpuset=/ mems_allowed=0
> [ 1384.922840] CPU: 6 PID: 400 Comm: kworker/6:1H Tainted: G U 4.13.0-rc5-amd64-preempt-sysrq-20170406+ #1
> [ 1384.922841] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> [ 1384.922844] Workqueue: kblockd blk_mq_run_work_fn
> [ 1384.922845] Call Trace:
> [ 1384.922849] dump_stack+0x61/0x7d
> [ 1384.922851] warn_alloc+0xfc/0x18c
> [ 1384.922854] __alloc_pages_slowpath+0x9ca/0xa55
> [ 1384.922856] ? __alloc_pages_slowpath+0x9ca/0xa55
> [ 1384.922858] __alloc_pages_nodemask+0x141/0x1f5
> [ 1384.922862] cache_grow_begin+0xa4/0x294
> [ 1384.922863] fallback_alloc+0x154/0x196
> [ 1384.922865] ? cache_grow_begin+0xa4/0x294
> [ 1384.922867] ____cache_alloc_node+0xdd/0xe9
> [ 1384.922869] kmem_cache_alloc+0x98/0x143
> [ 1384.922873] sas_alloc_task+0x1d/0x32 [libsas]
> [ 1384.922876] sas_ata_qc_issue+0x71/0x21c [libsas]
> [ 1384.922878] ata_qc_issue+0x1fc/0x24c
> [ 1384.922880] ? ata_scsi_write_same_xlat+0x2d1/0x2d1
> [ 1384.922882] __ata_scsi_queuecmd+0x18f/0x1eb
> [ 1384.922883] ata_sas_queuecmd+0x31/0x4d
> [ 1384.922886] sas_queuecommand+0x83/0x1cf [libsas]
> [ 1384.922889] ? blk_add_timer+0xcb/0x10f
> [ 1384.922892] scsi_dispatch_cmd+0x141/0x210
> [ 1384.922893] scsi_queue_rq+0x1c7/0x28f
> [ 1384.922895] blk_mq_dispatch_rq_list+0x1a6/0x2cf
> [ 1384.922896] ? find_next_bit+0xb/0xd
> [ 1384.922899] blk_mq_sched_dispatch_requests+0x14e/0x1e7
> [ 1384.922900] ? __switch_to+0x288/0x44b
> [ 1384.922911] __blk_mq_run_hw_queue+0x4c/0x7f
> [ 1384.922912] blk_mq_run_work_fn+0x2c/0x2e
> [ 1384.922913] process_one_work+0x179/0x2a5
> [ 1384.922915] ? rescuer_thread+0x273/0x273
> [ 1384.922915] worker_thread+0x1a8/0x25b
> [ 1384.922917] ? rescuer_thread+0x273/0x273
> [ 1384.922917] kthread+0xfb/0x100
> [ 1384.922918] ? init_completion+0x24/0x24
> [ 1384.922919] ? do_fast_syscall_32+0xb7/0xfe
> [ 1384.922920] ret_from_fork+0x25/0x30
> [ 1384.922922] Mem-Info:
> [ 1384.922924] active_anon:0 inactive_anon:2 isolated_anon:0
> [ 1384.922924] active_file:199 inactive_file:263 isolated_file:0
> [ 1384.922924] unevictable:0 dirty:0 writeback:0 unstable:0
> [ 1384.922924] slab_reclaimable:3414 slab_unreclaimable:8055394
> [ 1384.922924] mapped:1 shmem:2 pagetables:74 bounce:0
> [ 1384.922924] free:30587 free_pcp:18 free_cma:3741
> [ 1384.922926] Node 0 active_anon:0kB inactive_anon:8kB active_file:796kB inactive_file:1052kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:4kB dirty:0kB writeback:0kB shmem:8kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
> [ 1384.922926] Node 0 DMA free:15880kB min:32kB low:44kB high:56kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15964kB managed:15880kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> [ 1384.922928] lowmem_reserve[]: 0 3201 31832 31832 31832
> [ 1384.922930] Node 0 DMA32 free:91392kB min:6788kB low:10064kB high:13340kB active_anon:0kB inactive_anon:0kB active_file:56kB inactive_file:56kB unevictable:0kB writepending:0kB present:3362068kB managed:3296500kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:72kB local_pcp:72kB free_cma:0kB
> [ 1384.922932] lowmem_reserve[]: 0 0 28631 28631 28631
> [ 1384.922933] Node 0 Normal free:15076kB min:60760kB low:90092kB high:119424kB active_anon:0kB inactive_anon:8kB active_file:740kB inactive_file:996kB unevictable:0kB writepending:0kB present:29874176kB managed:29337252kB mlocked:0kB kernel_stack:8048kB pagetables:296kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:14964kB
> [ 1384.922935] lowmem_reserve[]: 0 0 0 0 0
> [ 1384.922936] Node 0 DMA: 0*4kB 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15880kB
> [ 1384.922941] Node 0 DMA32: 2*4kB (UM) 3*8kB (ME) 4*16kB (UME) 5*32kB (ME) 4*64kB (ME) 4*128kB (ME) 5*256kB (ME) 4*512kB (ME) 5*1024kB (ME) 4*2048kB (UME) 18*4096kB (M) = 91392kB
> [ 1384.922946] Node 0 Normal: 1*4kB (C) 0*8kB 1*16kB (C) 1*32kB (C) 1*64kB (C) 0*128kB 0*256kB 1*512kB (C) 0*1024kB 1*2048kB (C) 3*4096kB (C) = 14964kB
> [ 1384.922951] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> [ 1384.922951] 464 total pagecache pages
> [ 1384.922953] 0 pages in swap cache
> [ 1384.922954] Swap cache stats: add 1253, delete 1253, find 21/36
> [ 1384.922954] Free swap = 15611132kB
> [ 1384.922954] Total swap = 15616764kB
> [ 1384.922955] 8313052 pages RAM
> [ 1384.922955] 0 pages HighMem/MovableOnly
> [ 1384.922955] 150644 pages reserved
> [ 1384.922956] 4096 pages cma reserved
> [ 1384.922956] 0 pages hwpoisoned
> [ 1385.007958] ata17.00: exception Emask 0x0 SAct 0x10000 SErr 0x0 action 0x0
> [ 1385.007961] ata17.00: failed command: READ FPDMA QUEUED
> [ 1385.007965] ata17.00: cmd 60/20:80:90:81:6f/00:00:35:01:00/40 tag 16 ncq dma 16384 in
> [ 1385.007965] res 40/00:78:10:2c:8d/00:00:f1:00:00/40 Emask 0x40 (internal error)
> [ 1385.007966] ata17.00: status: { DRDY }
> [ 1385.008982] ata17.00: Security Log not supported
> [ 1385.010102] ata17.00: Security Log not supported
> [ 1385.010104] ata17.00: configured for UDMA/133
> [ 1385.010110] ata17: EH complete
> [ 1385.010162] scsi_eh_10: page allocation failure: order:0, mode:0x1284020(GFP_ATOMIC|__GFP_COMP|__GFP_NOTRACK), nodemask=(null)
> [ 1385.010164] scsi_eh_10 cpuset=/ mems_allowed=0
> [ 1385.010175] CPU: 6 PID: 409 Comm: scsi_eh_10 Tainted: G U 4.13.0-rc5-amd64-preempt-sysrq-20170406+ #1
> [ 1385.010175] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> [ 1385.010175] Call Trace:
> [ 1385.010178] dump_stack+0x61/0x7d
> [ 1385.010179] warn_alloc+0xfc/0x18c
> [ 1385.010181] __alloc_pages_slowpath+0x9ca/0xa55
> [ 1385.010182] ? __alloc_pages_slowpath+0x9ca/0xa55
> [ 1385.010184] __alloc_pages_nodemask+0x141/0x1f5
> [ 1385.010186] cache_grow_begin+0xa4/0x294
> [ 1385.010187] fallback_alloc+0x154/0x196
> [ 1385.010188] ? cache_grow_begin+0xa4/0x294
> [ 1385.010189] ____cache_alloc_node+0xdd/0xe9
> [ 1385.010191] kmem_cache_alloc+0x98/0x143
> [ 1385.010193] sas_alloc_task+0x1d/0x32 [libsas]
> [ 1385.010195] sas_ata_qc_issue+0x71/0x21c [libsas]
> [ 1385.010196] ata_qc_issue+0x1fc/0x24c
> [ 1385.010198] ? ata_scsi_write_same_xlat+0x2d1/0x2d1
> [ 1385.010198] __ata_scsi_queuecmd+0x18f/0x1eb
> [ 1385.010200] ata_sas_queuecmd+0x31/0x4d
> [ 1385.010202] sas_queuecommand+0x83/0x1cf [libsas]
> [ 1385.010203] ? blk_add_timer+0xcb/0x10f
> [ 1385.010205] scsi_dispatch_cmd+0x141/0x210
> [ 1385.010205] scsi_queue_rq+0x1c7/0x28f
> [ 1385.010207] blk_mq_dispatch_rq_list+0x1a6/0x2cf
> [ 1385.010208] blk_mq_sched_dispatch_requests+0x129/0x1e7
> [ 1385.010209] __blk_mq_run_hw_queue+0x4c/0x7f
> [ 1385.010210] __blk_mq_delay_run_hw_queue+0x5c/0xa2
> [ 1385.010211] blk_mq_run_hw_queue+0x14/0x16
> [ 1385.010212] blk_mq_run_hw_queues+0x2e/0x5e
> [ 1385.010212] scsi_run_queue+0x236/0x2c1
> [ 1385.010214] scsi_run_host_queues+0x1f/0x37
> [ 1385.010215] scsi_error_handler+0x467/0x523
> [ 1385.010216] ? __schedule+0x4f5/0x5c5
> [ 1385.010217] ? scsi_eh_get_sense+0x1a9/0x1a9
> [ 1385.010218] kthread+0xfb/0x100
> [ 1385.010219] ? init_completion+0x24/0x24
> [ 1385.010220] ret_from_fork+0x25/0x30
> [ 1385.010260] ata17.00: exception Emask 0x0 SAct 0x20000 SErr 0x0 action 0x0
> [ 1385.010263] ata17.00: failed command: READ FPDMA QUEUED
> [ 1385.010266] ata17.00: cmd 60/20:88:90:81:6f/00:00:35:01:00/40 tag 17 ncq dma 16384 in
> [ 1385.010266] res 50/00:01:30:00:00/00:00:00:00:00/a0 Emask 0x40 (internal error)
> [ 1385.010267] ata17.00: status: { DRDY }
> [ 1385.011259] ata17.00: Security Log not supported
> [ 1385.012380] ata17.00: Security Log not supported
> [ 1385.012382] ata17.00: configured for UDMA/133
> [ 1385.012385] ata17: EH complete
> [ 1385.335912] mount: page allocation failure: order:0, mode:0x1604040(GFP_NOFS|__GFP_COMP|__GFP_NOTRACK), nodemask=(null)
> [ 1385.335916] mount cpuset=/ mems_allowed=0
> [ 1385.335920] CPU: 7 PID: 3125 Comm: mount Tainted: G U 4.13.0-rc5-amd64-preempt-sysrq-20170406+ #1
> [ 1385.335920] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> [ 1385.335921] Call Trace:
> [ 1385.335927] dump_stack+0x61/0x7d
> [ 1385.335930] warn_alloc+0xfc/0x18c
> [ 1385.335933] ? call_timer_fn+0x140/0x140
> [ 1385.335935] __alloc_pages_slowpath+0x9ca/0xa55
> [ 1385.335939] __alloc_pages_nodemask+0x141/0x1f5
> [ 1385.335943] cache_grow_begin+0xa4/0x294
> [ 1385.335945] fallback_alloc+0x154/0x196
> [ 1385.335946] ? cache_grow_begin+0xa4/0x294
> [ 1385.335948] ____cache_alloc_node+0xdd/0xe9
> [ 1385.335950] kmem_cache_alloc_trace+0xa0/0xfc
> [ 1385.335953] add_tree_block+0x6a/0x1a1
> [ 1385.335955] build_ref_tree_for_root+0x1aa/0x3c8
> [ 1385.335956] btrfs_build_ref_tree+0x142/0x179
> [ 1385.335958] open_ctree+0x19af/0x1ffe
> [ 1385.335961] ? _raw_spin_unlock_bh+0x1a/0x1c
> [ 1385.335964] btrfs_mount+0xa0e/0xb86
> [ 1385.335965] ? btrfs_mount+0xa0e/0xb86
> [ 1385.335967] ? find_next_bit+0xb/0xd
> [ 1385.335970] mount_fs+0x67/0x111
> [ 1385.335973] vfs_kern_mount+0x6b/0xd5
> [ 1385.335974] btrfs_mount+0x1de/0xb86
> [ 1385.335975] ? find_next_bit+0xb/0xd
> [ 1385.335978] mount_fs+0x67/0x111
> [ 1385.335979] vfs_kern_mount+0x6b/0xd5
> [ 1385.335981] do_mount+0x6e9/0x987
> [ 1385.335984] compat_SyS_mount+0x185/0x1ae
> [ 1385.335986] do_fast_syscall_32+0xb7/0xfe
> [ 1385.335988] entry_SYSENTER_compat+0x4c/0x5b
> [ 1385.335990] RIP: 0023:0xf7f69c29
> [ 1385.335991] RSP: 002b:00000000ffa6fed0 EFLAGS: 00000297 ORIG_RAX: 0000000000000015
> [ 1385.335992] RAX: ffffffffffffffda RBX: 0000000009877050 RCX: 00000000098771e8
> [ 1385.335993] RDX: 0000000009877370 RSI: 00000000c0ed0400 RDI: 00000000098bd548
> [ 1385.335993] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> [ 1385.335994] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [ 1385.335994] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [ 1387.789938] Node 0 active_anon:588kB inactive_anon:300kB active_file:3988kB inactive_file:1428kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:2184kB dirty:0kB writeback:0kB shmem:8kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
> [ 1387.871500] Node 0 DMA free:0kB min:32kB low:44kB high:56kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15964kB managed:15880kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> [ 1387.949345] lowmem_reserve[]: 0 3201 31832 31832 31832
> [ 1387.965376] Node 0 DMA32 free:621628kB min:6788kB low:10064kB high:13340kB active_anon:0kB inactive_anon:0kB active_file:84kB inactive_file:28kB unevictable:0kB writepending:0kB present:3362068kB managed:3296500kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:640kB local_pcp:0kB free_cma:0kB
> [ 1388.049300] lowmem_reserve[]: 0 0 28631 28631 28631
> [ 1388.064560] Node 0 Normal free:4812428kB min:60760kB low:90092kB high:119424kB active_anon:588kB inactive_anon:300kB active_file:3904kB inactive_file:1400kB unevictable:0kB writepending:0kB present:29874176kB managed:29337252kB mlocked:0kB kernel_stack:8080kB pagetables:320kB bounce:0kB free_pcp:4124kB local_pcp:420kB free_cma:11288kB
> [ 1388.155296] lowmem_reserve[]: 0 0 0 0 0
> [ 1388.167479] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
> [ 1388.198947] Node 0 DMA32: 4430*4kB (U) 3580*8kB (U) 1883*16kB (U) 1009*32kB (U) 17*64kB (U) 16*128kB (U) 12*256kB (U) 10*512kB (U) 17*1024kB (U) 18*2048kB (U) 126*4096kB (U) = 690472kB
> [ 1388.249622] Node 0 Normal: 71828*4kB (UC) 54033*8kB (UC) 34313*16kB (UC) 21097*32kB (UC) 10342*64kB (U) 2801*128kB (UC) 201*256kB (UC) 96*512kB (UC) 68*1024kB (U) 48*2048kB (UC) 457*4096kB (UC) = 5104520kB
> [ 1388.305855] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> [ 1388.331978] 1467 total pagecache pages
> [ 1388.344026] 111 pages in swap cache
> [ 1388.355282] Swap cache stats: add 1465, delete 1354, find 364/553
> [ 1388.374360] Free swap = 15611132kB
> [ 1388.385607] Total swap = 15616764kB
> [ 1388.396863] 8313052 pages RAM
> [ 1388.406556] 0 pages HighMem/MovableOnly
> [ 1388.418861] 150644 pages reserved
> [ 1388.429595] 4096 pages cma reserved
> [ 1388.440853] 0 pages hwpoisoned
> [ 1388.450807] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
> [ 1388.477200] [ 983] 0 983 936 0 6 2 32 0 init
> [ 1388.503586] [ 984] 0 984 941 1 5 2 98 0 rc
> [ 1388.529456] [ 1103] 0 1103 920 1 5 2 188 -1000 udevd
> [ 1388.556123] [ 1311] 0 1311 925 443 5 2 24 -1000 net.agent
> [ 1388.583800] [ 1352] 0 1352 925 441 5 2 26 -1000 net.agent
> [ 1388.611490] [ 1703] 0 1703 926 442 5 2 26 -1000 net.agent
> [ 1388.639176] [ 1935] 0 1935 587 0 5 2 31 0 bootlogd
> [ 1388.666611] [ 2469] 0 2469 993 0 5 2 262 -1000 udevd
> [ 1388.693254] [ 2470] 0 2470 993 0 5 2 261 -1000 udevd
> [ 1388.719913] [ 3049] 0 3049 1538 1 6 2 177 0 S13mountall.sh
> [ 1388.748886] [ 3125] 0 3125 1718 0 7 2 0 0 mount
> [ 1388.775570] [15483] 0 15483 558 141 5 2 0 -1000 sleep
> [ 1388.802207] [15484] 0 15484 558 146 4 2 0 -1000 sleep
> [ 1388.828828] [15485] 0 15485 558 145 5 2 0 -1000 sleep
> [ 1388.855456] Out of memory: Kill process 3049 (S13mountall.sh) score 0 or sacrifice child
>
> And hopefully totally unrelated (but maybe not), after the boot continues, it
> crashes with:
> [ 1523.299228] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffffbd1d4c2c
> [ 1523.299228]
> [ 1523.334262] CPU: 2 PID: 19932 Comm: avahi-daemon Tainted: G U 4.13.0-rc5-amd64-preempt-sysrq-20170406+ #1
> [ 1523.367142] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> [ 1523.395339] Call Trace:
> [ 1523.403515] dump_stack+0x61/0x7d
> [ 1523.414266] panic+0xe7/0x235
> [ 1523.423982] ? compat_core_sys_select+0x25b/0x26d
> [ 1523.438878] __stack_chk_fail+0x19/0x19
> [ 1523.451168] compat_core_sys_select+0x25b/0x26d
> [ 1523.465552] ? compat_SyS_select+0xe/0x10
> [ 1523.478358] ? do_fast_syscall_32+0xb7/0xfe
> [ 1523.491698] ? entry_SYSENTER_compat+0x4c/0x5b
> [ 1523.505858] Kernel Offset: 0x3c000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [ 1523.538981] Rebooting in 20 seconds..
>
> I did add stack-protector in 4.13, and it seems to be finding an unrelated bug.
>
> Marc
> --
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
> .... what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
^ permalink raw reply [flat|nested] 47+ messages in thread
* Fwd: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
[not found] ` <CAHKv19A=OVgCpQpDL2454T+f8QgLm9iynA8xZ4w4Kg8JjYS=UA@mail.gmail.com>
@ 2017-09-02 18:55 ` George Joseph
0 siblings, 0 replies; 47+ messages in thread
From: George Joseph @ 2017-09-02 18:55 UTC (permalink / raw)
To: linux-btrfs
I've just had this happen for the 3rd time in 4 days. I wasn't
subscribed to the list, so I couldn't reply to the existing thread, but
here it is: http://www.spinics.net/lists/linux-btrfs/msg68662.html
I can do some limited testing. It's my main dev machine, though.
On Sat, Sep 2, 2017 at 10:52 AM, Josef Bacik <jbacik@fb.com> wrote:
>
> Oops, ok I've updated my tree so we don't save the stack trace of the initial scan, which we don't need anyway. That should save a decent amount of memory in your case. It was an in place update so you'll need to blow away your local branch and pull the new one to get the new code. Thanks,
>
> Josef
>
> Sent from my iPhone
>
> > On Sep 2, 2017, at 12:10 PM, Marc MERLIN <marc@merlins.org> wrote:
> >
> >> On Fri, Sep 01, 2017 at 11:01:30PM +0000, Josef Bacik wrote:
> >> You'll be fine, it's only happening on the one fs right? That's 13gib of metadata with checksums and all that shit, it'll probably look like 8 or 9gib of ram worst case. I'd mount with -o ref_verify and check the slab amount in /proc/meminfo to get an idea of real usage. Once the mount is finished that'll be about as much metadata you will use, of course it'll grow as metadata usage grows but it should be nominal. Thanks,
> >
> > Looks like I don't have enough RAM :(
> >
> > [ 80.964838] BTRFS info (device dm-2): bdev /dev/mapper/dshelf1 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
> > [ 1382.968986] bcache_writebac invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=0
> > [ 1383.003255] bcache_writebac cpuset=/ mems_allowed=0
> > [ 1383.018947] CPU: 6 PID: 2359 Comm: bcache_writebac Tainted: G U 4.13.0-rc5-amd64-preempt-sysrq-20170406+ #1
> > [ 1383.052448] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> > [ 1383.080911] Call Trace:
> > [ 1383.089336] dump_stack+0x61/0x7d
> > [ 1383.100132] dump_header+0x97/0x239
> > [ 1383.111354] ? _raw_spin_unlock_irqrestore+0x14/0x24
> > [ 1383.127322] oom_kill_process+0x86/0x379
> > [ 1383.140208] out_of_memory+0x3b8/0x416
> > [ 1383.152581] __alloc_pages_slowpath+0x890/0xa55
> > [ 1383.166960] ? _raw_spin_unlock_irq+0x11/0x21
> > [ 1383.180806] __alloc_pages_nodemask+0x141/0x1f5
> > [ 1383.195144] alloc_pages_current+0x8d/0x96
> > [ 1383.208310] bio_alloc_pages+0x29/0x6a
> > [ 1383.220472] bch_writeback_thread+0x53b/0x6ff [bcache]
> > [ 1383.236942] ? write_dirty+0x90/0x90 [bcache]
> > [ 1383.250734] kthread+0xfb/0x100
> > [ 1383.261230] ? init_completion+0x24/0x24
> > [ 1383.273988] ? do_fast_syscall_32+0xb7/0xfe
> > [ 1383.287265] ret_from_fork+0x25/0x30
> > [ 1383.298733] Mem-Info:
> > [ 1383.306446] active_anon:1 inactive_anon:3 isolated_anon:0
> > [ 1383.306446] active_file:190 inactive_file:180 isolated_file:0
> > [ 1383.306446] unevictable:0 dirty:0 writeback:1 unstable:0
> > [ 1383.306446] slab_reclaimable:3436 slab_unreclaimable:8033273
> > [ 1383.306446] mapped:1 shmem:2 pagetables:74 bounce:0
> > [ 1383.306446] free:53127 free_pcp:0 free_cma:3741
> > [ 1383.406332] Node 0 active_anon:0kB inactive_anon:16kB active_file:896kB inactive_file:824kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:4kB dirty:0kB writeback:0kB shmem:8kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
> > [ 1383.486392] Node 0 DMA free:15880kB min:32kB low:44kB high:56kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15964kB managed:15880kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> > [ 1383.565818] lowmem_reserve[]: 0 3201 31832 31832 31832
> > [ 1383.581956] Node 0 DMA32 free:121256kB min:6788kB low:10064kB high:13340kB active_anon:0kB inactive_anon:0kB active_file:44kB inactive_file:52kB unevictable:0kB writepending:0kB present:3362068kB managed:3296500kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> > [ 1383.665831] lowmem_reserve[]: 0 0 28631 28631 28631
> > [ 1383.681212] Node 0 Normal free:75372kB min:60760kB low:90092kB high:119424kB active_anon:0kB inactive_anon:16kB active_file:788kB inactive_file:836kB unevictable:0kB writepending:0kB present:29874176kB managed:29337252kB mlocked:0kB kernel_stack:8048kB pagetables:296kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:14964kB
> > [ 1383.769793] lowmem_reserve[]: 0 0 0 0 0
> > [ 1383.782429] Node 0 DMA: 0*4kB 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15880kB
> > [ 1383.823171] Node 0 DMA32: 4*4kB (UM) 21*8kB (UME) 9*16kB (UME) 5*32kB (ME) 5*64kB (UME) 5*128kB (UME) 6*256kB (UME) 5*512kB (UME) 5*1024kB (ME) 4*2048kB (UME) 25*4096kB (M) = 121256kB
> > [ 1383.874564] Node 0 Normal: 773*4kB (UMEC) 494*8kB (ME) 373*16kB (UMEC) 284*32kB (MEC) 177*64kB (UMEC) 108*128kB (UME) 36*256kB (UME) 9*512kB (UMEC) 0*1024kB 1*2048kB (C) 3*4096kB (C) = 75412kB
> > [ 1383.927787] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> > [ 1383.954002] 467 total pagecache pages
> > [ 1383.965889] 3 pages in swap cache
> > [ 1383.976715] Swap cache stats: add 1253, delete 1250, find 21/36
> > [ 1383.995325] Free swap = 15610620kB
> > [ 1384.006675] Total swap = 15616764kB
> > [ 1384.018005] 8313052 pages RAM
> > [ 1384.027730] 0 pages HighMem/MovableOnly
> > [ 1384.040076] 150644 pages reserved
> > [ 1384.050845] 4096 pages cma reserved
> > [ 1384.062127] 0 pages hwpoisoned
> > [ 1384.072133] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
> > [ 1384.098531] [ 983] 0 983 936 0 6 2 32 0 init
> > [ 1384.124971] [ 984] 0 984 941 1 5 2 98 0 rc
> > [ 1384.150843] [ 1103] 0 1103 920 1 5 2 188 -1000 udevd
> > [ 1384.177534] [ 1311] 0 1311 925 1 5 2 67 -1000 net.agent
> > [ 1384.205278] [ 1352] 0 1352 925 1 5 2 66 -1000 net.agent
> > [ 1384.233017] [ 1703] 0 1703 926 1 5 2 68 -1000 net.agent
> > [ 1384.260731] [ 1935] 0 1935 587 0 5 2 31 0 bootlogd
> > [ 1384.288190] [ 2469] 0 2469 993 0 5 2 262 -1000 udevd
> > [ 1384.314846] [ 2470] 0 2470 993 0 5 2 261 -1000 udevd
> > [ 1384.341494] [ 3049] 0 3049 1538 1 6 2 177 0 S13mountall.sh
> > [ 1384.370576] [ 3125] 0 3125 1718 0 7 2 128 0 mount
> > [ 1384.397360] [15456] 0 15456 124 0 3 2 10 -1000 sleep
> > [ 1384.424026] [15457] 0 15457 124 0 3 2 12 -1000 sleep
> > [ 1384.450650] [15458] 0 15458 124 1 3 2 10 -1000 sleep
> > [ 1384.477317] Out of memory: Kill process 3049 (S13mountall.sh) score 0 or sacrifice child
> > [ 1384.502384] Killed process 3125 (mount) total-vm:6872kB, anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> > [ 1384.535964] oom_reaper: reaped process 3125 (mount), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> > [ 1384.573082] bcache_writebac invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=0
> > [ 1384.607340] bcache_writebac cpuset=/ mems_allowed=0
> > [ 1384.623102] CPU: 0 PID: 2359 Comm: bcache_writebac Tainted: G U 4.13.0-rc5-amd64-preempt-sysrq-20170406+ #1
> > [ 1384.656825] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> > [ 1384.685361] Call Trace:
> > [ 1384.693823] dump_stack+0x61/0x7d
> > [ 1384.704866] dump_header+0x97/0x239
> > [ 1384.716086] ? _raw_spin_unlock_irqrestore+0x14/0x24
> > [ 1384.731697] oom_kill_process+0x86/0x379
> > [ 1384.744201] out_of_memory+0x3b8/0x416
> > [ 1384.756259] __alloc_pages_slowpath+0x890/0xa55
> > [ 1384.770536] ? _raw_spin_unlock_irq+0x11/0x21
> > [ 1384.784302] __alloc_pages_nodemask+0x141/0x1f5
> > [ 1384.798539] alloc_pages_current+0x8d/0x96
> > [ 1384.811465] bio_alloc_pages+0x29/0x6a
> > [ 1384.823334] bch_writeback_thread+0x53b/0x6ff [bcache]
> > [ 1384.839334] ? write_dirty+0x90/0x90 [bcache]
> > [ 1384.852984] kthread+0xfb/0x100
> > [ 1384.862970] ? init_completion+0x24/0x24
> > [ 1384.875285] ? do_fast_syscall_32+0xb7/0xfe
> > [ 1384.888368] ret_from_fork+0x25/0x30
> > [ 1384.899696] Mem-Info:
> > [ 1384.907064] active_anon:0 inactive_anon:2 isolated_anon:0
> > [ 1384.907064] active_file:189 inactive_file:273 isolated_file:0
> > [ 1384.907064] unevictable:0 dirty:0 writeback:0 unstable:0
> > [ 1384.907064] slab_reclaimable:3414 slab_unreclaimable:8053934
> > [ 1384.907064] mapped:1 shmem:2 pagetables:74 bounce:0
> > [ 1384.907064] free:32075 free_pcp:25 free_cma:3741
> > [ 1384.922833] kworker/6:1H: page allocation failure: order:0, mode:0x1284020(GFP_ATOMIC|__GFP_COMP|__GFP_NOTRACK), nodemask=(null)
> > [ 1384.922836] kworker/6:1H cpuset=/ mems_allowed=0
> > [ 1384.922840] CPU: 6 PID: 400 Comm: kworker/6:1H Tainted: G U 4.13.0-rc5-amd64-preempt-sysrq-20170406+ #1
> > [ 1384.922841] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> > [ 1384.922844] Workqueue: kblockd blk_mq_run_work_fn
> > [ 1384.922845] Call Trace:
> > [ 1384.922849] dump_stack+0x61/0x7d
> > [ 1384.922851] warn_alloc+0xfc/0x18c
> > [ 1384.922854] __alloc_pages_slowpath+0x9ca/0xa55
> > [ 1384.922856] ? __alloc_pages_slowpath+0x9ca/0xa55
> > [ 1384.922858] __alloc_pages_nodemask+0x141/0x1f5
> > [ 1384.922862] cache_grow_begin+0xa4/0x294
> > [ 1384.922863] fallback_alloc+0x154/0x196
> > [ 1384.922865] ? cache_grow_begin+0xa4/0x294
> > [ 1384.922867] ____cache_alloc_node+0xdd/0xe9
> > [ 1384.922869] kmem_cache_alloc+0x98/0x143
> > [ 1384.922873] sas_alloc_task+0x1d/0x32 [libsas]
> > [ 1384.922876] sas_ata_qc_issue+0x71/0x21c [libsas]
> > [ 1384.922878] ata_qc_issue+0x1fc/0x24c
> > [ 1384.922880] ? ata_scsi_write_same_xlat+0x2d1/0x2d1
> > [ 1384.922882] __ata_scsi_queuecmd+0x18f/0x1eb
> > [ 1384.922883] ata_sas_queuecmd+0x31/0x4d
> > [ 1384.922886] sas_queuecommand+0x83/0x1cf [libsas]
> > [ 1384.922889] ? blk_add_timer+0xcb/0x10f
> > [ 1384.922892] scsi_dispatch_cmd+0x141/0x210
> > [ 1384.922893] scsi_queue_rq+0x1c7/0x28f
> > [ 1384.922895] blk_mq_dispatch_rq_list+0x1a6/0x2cf
> > [ 1384.922896] ? find_next_bit+0xb/0xd
> > [ 1384.922899] blk_mq_sched_dispatch_requests+0x14e/0x1e7
> > [ 1384.922900] ? __switch_to+0x288/0x44b
> > [ 1384.922911] __blk_mq_run_hw_queue+0x4c/0x7f
> > [ 1384.922912] blk_mq_run_work_fn+0x2c/0x2e
> > [ 1384.922913] process_one_work+0x179/0x2a5
> > [ 1384.922915] ? rescuer_thread+0x273/0x273
> > [ 1384.922915] worker_thread+0x1a8/0x25b
> > [ 1384.922917] ? rescuer_thread+0x273/0x273
> > [ 1384.922917] kthread+0xfb/0x100
> > [ 1384.922918] ? init_completion+0x24/0x24
> > [ 1384.922919] ? do_fast_syscall_32+0xb7/0xfe
> > [ 1384.922920] ret_from_fork+0x25/0x30
> > [ 1384.922922] Mem-Info:
> > [ 1384.922924] active_anon:0 inactive_anon:2 isolated_anon:0
> > [ 1384.922924] active_file:199 inactive_file:263 isolated_file:0
> > [ 1384.922924] unevictable:0 dirty:0 writeback:0 unstable:0
> > [ 1384.922924] slab_reclaimable:3414 slab_unreclaimable:8055394
> > [ 1384.922924] mapped:1 shmem:2 pagetables:74 bounce:0
> > [ 1384.922924] free:30587 free_pcp:18 free_cma:3741
> > [ 1384.922926] Node 0 active_anon:0kB inactive_anon:8kB active_file:796kB inactive_file:1052kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:4kB dirty:0kB writeback:0kB shmem:8kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
> > [ 1384.922926] Node 0 DMA free:15880kB min:32kB low:44kB high:56kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15964kB managed:15880kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> > [ 1384.922928] lowmem_reserve[]: 0 3201 31832 31832 31832
> > [ 1384.922930] Node 0 DMA32 free:91392kB min:6788kB low:10064kB high:13340kB active_anon:0kB inactive_anon:0kB active_file:56kB inactive_file:56kB unevictable:0kB writepending:0kB present:3362068kB managed:3296500kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:72kB local_pcp:72kB free_cma:0kB
> > [ 1384.922932] lowmem_reserve[]: 0 0 28631 28631 28631
> > [ 1384.922933] Node 0 Normal free:15076kB min:60760kB low:90092kB high:119424kB active_anon:0kB inactive_anon:8kB active_file:740kB inactive_file:996kB unevictable:0kB writepending:0kB present:29874176kB managed:29337252kB mlocked:0kB kernel_stack:8048kB pagetables:296kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:14964kB
> > [ 1384.922935] lowmem_reserve[]: 0 0 0 0 0
> > [ 1384.922936] Node 0 DMA: 0*4kB 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15880kB
> > [ 1384.922941] Node 0 DMA32: 2*4kB (UM) 3*8kB (ME) 4*16kB (UME) 5*32kB (ME) 4*64kB (ME) 4*128kB (ME) 5*256kB (ME) 4*512kB (ME) 5*1024kB (ME) 4*2048kB (UME) 18*4096kB (M) = 91392kB
> > [ 1384.922946] Node 0 Normal: 1*4kB (C) 0*8kB 1*16kB (C) 1*32kB (C) 1*64kB (C) 0*128kB 0*256kB 1*512kB (C) 0*1024kB 1*2048kB (C) 3*4096kB (C) = 14964kB
> > [ 1384.922951] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> > [ 1384.922951] 464 total pagecache pages
> > [ 1384.922953] 0 pages in swap cache
> > [ 1384.922954] Swap cache stats: add 1253, delete 1253, find 21/36
> > [ 1384.922954] Free swap = 15611132kB
> > [ 1384.922954] Total swap = 15616764kB
> > [ 1384.922955] 8313052 pages RAM
> > [ 1384.922955] 0 pages HighMem/MovableOnly
> > [ 1384.922955] 150644 pages reserved
> > [ 1384.922956] 4096 pages cma reserved
> > [ 1384.922956] 0 pages hwpoisoned
> > [ 1385.007958] ata17.00: exception Emask 0x0 SAct 0x10000 SErr 0x0 action 0x0
> > [ 1385.007961] ata17.00: failed command: READ FPDMA QUEUED
> > [ 1385.007965] ata17.00: cmd 60/20:80:90:81:6f/00:00:35:01:00/40 tag 16 ncq dma 16384 in
> > [ 1385.007965] res 40/00:78:10:2c:8d/00:00:f1:00:00/40 Emask 0x40 (internal error)
> > [ 1385.007966] ata17.00: status: { DRDY }
> > [ 1385.008982] ata17.00: Security Log not supported
> > [ 1385.010102] ata17.00: Security Log not supported
> > [ 1385.010104] ata17.00: configured for UDMA/133
> > [ 1385.010110] ata17: EH complete
> > [ 1385.010162] scsi_eh_10: page allocation failure: order:0, mode:0x1284020(GFP_ATOMIC|__GFP_COMP|__GFP_NOTRACK), nodemask=(null)
> > [ 1385.010164] scsi_eh_10 cpuset=/ mems_allowed=0
> > [ 1385.010175] CPU: 6 PID: 409 Comm: scsi_eh_10 Tainted: G U 4.13.0-rc5-amd64-preempt-sysrq-20170406+ #1
> > [ 1385.010175] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> > [ 1385.010175] Call Trace:
> > [ 1385.010178] dump_stack+0x61/0x7d
> > [ 1385.010179] warn_alloc+0xfc/0x18c
> > [ 1385.010181] __alloc_pages_slowpath+0x9ca/0xa55
> > [ 1385.010182] ? __alloc_pages_slowpath+0x9ca/0xa55
> > [ 1385.010184] __alloc_pages_nodemask+0x141/0x1f5
> > [ 1385.010186] cache_grow_begin+0xa4/0x294
> > [ 1385.010187] fallback_alloc+0x154/0x196
> > [ 1385.010188] ? cache_grow_begin+0xa4/0x294
> > [ 1385.010189] ____cache_alloc_node+0xdd/0xe9
> > [ 1385.010191] kmem_cache_alloc+0x98/0x143
> > [ 1385.010193] sas_alloc_task+0x1d/0x32 [libsas]
> > [ 1385.010195] sas_ata_qc_issue+0x71/0x21c [libsas]
> > [ 1385.010196] ata_qc_issue+0x1fc/0x24c
> > [ 1385.010198] ? ata_scsi_write_same_xlat+0x2d1/0x2d1
> > [ 1385.010198] __ata_scsi_queuecmd+0x18f/0x1eb
> > [ 1385.010200] ata_sas_queuecmd+0x31/0x4d
> > [ 1385.010202] sas_queuecommand+0x83/0x1cf [libsas]
> > [ 1385.010203] ? blk_add_timer+0xcb/0x10f
> > [ 1385.010205] scsi_dispatch_cmd+0x141/0x210
> > [ 1385.010205] scsi_queue_rq+0x1c7/0x28f
> > [ 1385.010207] blk_mq_dispatch_rq_list+0x1a6/0x2cf
> > [ 1385.010208] blk_mq_sched_dispatch_requests+0x129/0x1e7
> > [ 1385.010209] __blk_mq_run_hw_queue+0x4c/0x7f
> > [ 1385.010210] __blk_mq_delay_run_hw_queue+0x5c/0xa2
> > [ 1385.010211] blk_mq_run_hw_queue+0x14/0x16
> > [ 1385.010212] blk_mq_run_hw_queues+0x2e/0x5e
> > [ 1385.010212] scsi_run_queue+0x236/0x2c1
> > [ 1385.010214] scsi_run_host_queues+0x1f/0x37
> > [ 1385.010215] scsi_error_handler+0x467/0x523
> > [ 1385.010216] ? __schedule+0x4f5/0x5c5
> > [ 1385.010217] ? scsi_eh_get_sense+0x1a9/0x1a9
> > [ 1385.010218] kthread+0xfb/0x100
> > [ 1385.010219] ? init_completion+0x24/0x24
> > [ 1385.010220] ret_from_fork+0x25/0x30
> > [ 1385.010260] ata17.00: exception Emask 0x0 SAct 0x20000 SErr 0x0 action 0x0
> > [ 1385.010263] ata17.00: failed command: READ FPDMA QUEUED
> > [ 1385.010266] ata17.00: cmd 60/20:88:90:81:6f/00:00:35:01:00/40 tag 17 ncq dma 16384 in
> > [ 1385.010266] res 50/00:01:30:00:00/00:00:00:00:00/a0 Emask 0x40 (internal error)
> > [ 1385.010267] ata17.00: status: { DRDY }
> > [ 1385.011259] ata17.00: Security Log not supported
> > [ 1385.012380] ata17.00: Security Log not supported
> > [ 1385.012382] ata17.00: configured for UDMA/133
> > [ 1385.012385] ata17: EH complete
> > [ 1385.335912] mount: page allocation failure: order:0, mode:0x1604040(GFP_NOFS|__GFP_COMP|__GFP_NOTRACK), nodemask=(null)
> > [ 1385.335916] mount cpuset=/ mems_allowed=0
> > [ 1385.335920] CPU: 7 PID: 3125 Comm: mount Tainted: G U 4.13.0-rc5-amd64-preempt-sysrq-20170406+ #1
> > [ 1385.335920] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> > [ 1385.335921] Call Trace:
> > [ 1385.335927] dump_stack+0x61/0x7d
> > [ 1385.335930] warn_alloc+0xfc/0x18c
> > [ 1385.335933] ? call_timer_fn+0x140/0x140
> > [ 1385.335935] __alloc_pages_slowpath+0x9ca/0xa55
> > [ 1385.335939] __alloc_pages_nodemask+0x141/0x1f5
> > [ 1385.335943] cache_grow_begin+0xa4/0x294
> > [ 1385.335945] fallback_alloc+0x154/0x196
> > [ 1385.335946] ? cache_grow_begin+0xa4/0x294
> > [ 1385.335948] ____cache_alloc_node+0xdd/0xe9
> > [ 1385.335950] kmem_cache_alloc_trace+0xa0/0xfc
> > [ 1385.335953] add_tree_block+0x6a/0x1a1
> > [ 1385.335955] build_ref_tree_for_root+0x1aa/0x3c8
> > [ 1385.335956] btrfs_build_ref_tree+0x142/0x179
> > [ 1385.335958] open_ctree+0x19af/0x1ffe
> > [ 1385.335961] ? _raw_spin_unlock_bh+0x1a/0x1c
> > [ 1385.335964] btrfs_mount+0xa0e/0xb86
> > [ 1385.335965] ? btrfs_mount+0xa0e/0xb86
> > [ 1385.335967] ? find_next_bit+0xb/0xd
> > [ 1385.335970] mount_fs+0x67/0x111
> > [ 1385.335973] vfs_kern_mount+0x6b/0xd5
> > [ 1385.335974] btrfs_mount+0x1de/0xb86
> > [ 1385.335975] ? find_next_bit+0xb/0xd
> > [ 1385.335978] mount_fs+0x67/0x111
> > [ 1385.335979] vfs_kern_mount+0x6b/0xd5
> > [ 1385.335981] do_mount+0x6e9/0x987
> > [ 1385.335984] compat_SyS_mount+0x185/0x1ae
> > [ 1385.335986] do_fast_syscall_32+0xb7/0xfe
> > [ 1385.335988] entry_SYSENTER_compat+0x4c/0x5b
> > [ 1385.335990] RIP: 0023:0xf7f69c29
> > [ 1385.335991] RSP: 002b:00000000ffa6fed0 EFLAGS: 00000297 ORIG_RAX: 0000000000000015
> > [ 1385.335992] RAX: ffffffffffffffda RBX: 0000000009877050 RCX: 00000000098771e8
> > [ 1385.335993] RDX: 0000000009877370 RSI: 00000000c0ed0400 RDI: 00000000098bd548
> > [ 1385.335993] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> > [ 1385.335994] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> > [ 1385.335994] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> > [ 1387.789938] Node 0 active_anon:588kB inactive_anon:300kB active_file:3988kB inactive_file:1428kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:2184kB dirty:0kB writeback:0kB shmem:8kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
> > [ 1387.871500] Node 0 DMA free:0kB min:32kB low:44kB high:56kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15964kB managed:15880kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> > [ 1387.949345] lowmem_reserve[]: 0 3201 31832 31832 31832
> > [ 1387.965376] Node 0 DMA32 free:621628kB min:6788kB low:10064kB high:13340kB active_anon:0kB inactive_anon:0kB active_file:84kB inactive_file:28kB unevictable:0kB writepending:0kB present:3362068kB managed:3296500kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:640kB local_pcp:0kB free_cma:0kB
> > [ 1388.049300] lowmem_reserve[]: 0 0 28631 28631 28631
> > [ 1388.064560] Node 0 Normal free:4812428kB min:60760kB low:90092kB high:119424kB active_anon:588kB inactive_anon:300kB active_file:3904kB inactive_file:1400kB unevictable:0kB writepending:0kB present:29874176kB managed:29337252kB mlocked:0kB kernel_stack:8080kB pagetables:320kB bounce:0kB free_pcp:4124kB local_pcp:420kB free_cma:11288kB
> > [ 1388.155296] lowmem_reserve[]: 0 0 0 0 0
> > [ 1388.167479] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
> > [ 1388.198947] Node 0 DMA32: 4430*4kB (U) 3580*8kB (U) 1883*16kB (U) 1009*32kB (U) 17*64kB (U) 16*128kB (U) 12*256kB (U) 10*512kB (U) 17*1024kB (U) 18*2048kB (U) 126*4096kB (U) = 690472kB
> > [ 1388.249622] Node 0 Normal: 71828*4kB (UC) 54033*8kB (UC) 34313*16kB (UC) 21097*32kB (UC) 10342*64kB (U) 2801*128kB (UC) 201*256kB (UC) 96*512kB (UC) 68*1024kB (U) 48*2048kB (UC) 457*4096kB (UC) = 5104520kB
> > [ 1388.305855] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> > [ 1388.331978] 1467 total pagecache pages
> > [ 1388.344026] 111 pages in swap cache
> > [ 1388.355282] Swap cache stats: add 1465, delete 1354, find 364/553
> > [ 1388.374360] Free swap = 15611132kB
> > [ 1388.385607] Total swap = 15616764kB
> > [ 1388.396863] 8313052 pages RAM
> > [ 1388.406556] 0 pages HighMem/MovableOnly
> > [ 1388.418861] 150644 pages reserved
> > [ 1388.429595] 4096 pages cma reserved
> > [ 1388.440853] 0 pages hwpoisoned
> > [ 1388.450807] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
> > [ 1388.477200] [ 983] 0 983 936 0 6 2 32 0 init
> > [ 1388.503586] [ 984] 0 984 941 1 5 2 98 0 rc
> > [ 1388.529456] [ 1103] 0 1103 920 1 5 2 188 -1000 udevd
> > [ 1388.556123] [ 1311] 0 1311 925 443 5 2 24 -1000 net.agent
> > [ 1388.583800] [ 1352] 0 1352 925 441 5 2 26 -1000 net.agent
> > [ 1388.611490] [ 1703] 0 1703 926 442 5 2 26 -1000 net.agent
> > [ 1388.639176] [ 1935] 0 1935 587 0 5 2 31 0 bootlogd
> > [ 1388.666611] [ 2469] 0 2469 993 0 5 2 262 -1000 udevd
> > [ 1388.693254] [ 2470] 0 2470 993 0 5 2 261 -1000 udevd
> > [ 1388.719913] [ 3049] 0 3049 1538 1 6 2 177 0 S13mountall.sh
> > [ 1388.748886] [ 3125] 0 3125 1718 0 7 2 0 0 mount
> > [ 1388.775570] [15483] 0 15483 558 141 5 2 0 -1000 sleep
> > [ 1388.802207] [15484] 0 15484 558 146 4 2 0 -1000 sleep
> > [ 1388.828828] [15485] 0 15485 558 145 5 2 0 -1000 sleep
> > [ 1388.855456] Out of memory: Kill process 3049 (S13mountall.sh) score 0 or sacrifice child
> >
> > And hopefully totally unrelated (but maybe not), after the boot continues, it
> > crashes with:
> > [ 1523.299228] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffffbd1d4c2c
> > [ 1523.299228]
> > [ 1523.334262] CPU: 2 PID: 19932 Comm: avahi-daemon Tainted: G U 4.13.0-rc5-amd64-preempt-sysrq-20170406+ #1
> > [ 1523.367142] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> > [ 1523.395339] Call Trace:
> > [ 1523.403515] dump_stack+0x61/0x7d
> > [ 1523.414266] panic+0xe7/0x235
> > [ 1523.423982] ? compat_core_sys_select+0x25b/0x26d
> > [ 1523.438878] __stack_chk_fail+0x19/0x19
> > [ 1523.451168] compat_core_sys_select+0x25b/0x26d
> > [ 1523.465552] ? compat_SyS_select+0xe/0x10
> > [ 1523.478358] ? do_fast_syscall_32+0xb7/0xfe
> > [ 1523.491698] ? entry_SYSENTER_compat+0x4c/0x5b
> > [ 1523.505858] Kernel Offset: 0x3c000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > [ 1523.538981] Rebooting in 20 seconds..
> >
> > I did add stack-protector in 4.13, and it seems to be finding an unrelated bug.
> >
> > Marc
> > --
> > "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> > Microsoft is to operating systems ....
> > .... what McDonalds is to gourmet cooking
> > Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
2017-09-02 16:52 ` Josef Bacik
[not found] ` <CAHKv19A=OVgCpQpDL2454T+f8QgLm9iynA8xZ4w4Kg8JjYS=UA@mail.gmail.com>
@ 2017-09-02 23:53 ` Marc MERLIN
2017-09-03 0:30 ` Josef Bacik
1 sibling, 1 reply; 47+ messages in thread
From: Marc MERLIN @ 2017-09-02 23:53 UTC (permalink / raw)
To: Josef Bacik
Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
David Sterba
On Sat, Sep 02, 2017 at 04:52:20PM +0000, Josef Bacik wrote:
> Oops, ok I've updated my tree so we don't save the stack trace of the initial scan, which we don't need anyway. That should save a decent amount of memory in your case. It was an in place update so you'll need to blow away your local branch and pull the new one to get the new code. Thanks,
Still did not work, unfortunately (on top of extra unrelated bugs in
4.13-rc5, as I was afraid).
Mounting the partition still eats all the memory:
[ 358.719722] bcache_writebac invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=0
[ 358.753716] bcache_writebac cpuset=/ mems_allowed=0
[ 358.769071] CPU: 3 PID: 2339 Comm: bcache_writebac Tainted: G U 4.13.0-rc5-amd64-stkreg-sysrq-20170902+ #2
[ 358.802040] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
[ 358.830082] Call Trace:
[ 358.838108] dump_stack+0x61/0x7d
[ 358.848728] dump_header+0x97/0x239
[ 358.859846] ? _raw_spin_unlock_irqrestore+0x14/0x24
[ 358.875398] oom_kill_process+0x86/0x379
[ 358.887838] out_of_memory+0x3a6/0x3ef
[ 358.899730] __alloc_pages_slowpath+0x86e/0xa1f
[ 358.913977] ? native_sched_clock+0x1a/0x37
[ 358.927197] __alloc_pages_nodemask+0x134/0x1d4
[ 358.941432] alloc_pages_current+0x8d/0x96
[ 358.954343] bio_alloc_pages+0x29/0x6a
[ 358.966194] bch_writeback_thread+0x51c/0x6d4 [bcache]
[ 358.982206] ? write_dirty+0x90/0x90 [bcache]
[ 358.995878] kthread+0xfb/0x100
[ 359.005899] ? init_completion+0x24/0x24
[ 359.018242] ? do_fast_syscall_32+0xb7/0xfe
[ 359.031360] ret_from_fork+0x25/0x30
[ 359.042723] Mem-Info:
[ 359.050529] active_anon:0 inactive_anon:2 isolated_anon:0
[ 359.050529] active_file:306 inactive_file:163 isolated_file:0
[ 359.050529] unevictable:0 dirty:0 writeback:0 unstable:0
[ 359.050529] slab_reclaimable:3430 slab_unreclaimable:8034083
[ 359.050529] mapped:1 shmem:2 pagetables:80 bounce:0
[ 359.050529] free:51932 free_pcp:46 free_cma:3741
[ 359.149971] Node 0 active_anon:0kB inactive_anon:8kB active_file:1128kB inactive_file:892kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:4kB dirty:0kB writeback:0kB shmem:8kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
[ 359.229593] Node 0 DMA free:15880kB min:32kB low:44kB high:56kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15964kB managed:15880kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 359.308570] lowmem_reserve[]: 0 3201 31832 31832 31832
[ 359.324706] Node 0 DMA32 free:121124kB min:6788kB low:10064kB high:13340kB active_anon:0kB inactive_anon:0kB active_file:100kB inactive_file:0kB unevictable:0kB writepending:0kB present:3362068kB managed:3296500kB mlocked:0kB kernel_stack:16kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 359.408498] lowmem_reserve[]: 0 0 28631 28631 28631
[ 359.423773] Node 0 Normal free:70792kB min:60760kB low:90092kB high:119424kB active_anon:0kB inactive_anon:8kB active_file:780kB inactive_file:808kB unevictable:0kB writepending:0kB present:29874176kB managed:29337512kB mlocked:0kB kernel_stack:4608kB pagetables:320kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:14964kB
[ 359.511284] lowmem_reserve[]: 0 0 0 0 0
[ 359.523514] Node 0 DMA: 0*4kB 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15880kB
[ 359.564260] Node 0 DMA32: 3*4kB (UME) 3*8kB (ME) 4*16kB (UME) 6*32kB (UME) 4*64kB (ME) 4*128kB (ME) 5*256kB (ME) 4*512kB (ME) 6*1024kB (UME) 4*2048kB (UME) 25*4096kB (M) = 121124kB
[ 359.614116] Node 0 Normal: 559*4kB (UMEC) 272*8kB (ME) 163*16kB (UMEC) 93*32kB (UMEC) 65*64kB (MEC) 71*128kB (UME) 37*256kB (ME) 18*512kB (UMC) 8*1024kB (M) 4*2048kB (MC) 3*4096kB (C) = 70604kB
[ 359.667377] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 359.693519] 456 total pagecache pages
[ 359.705331] 0 pages in swap cache
[ 359.716151] Swap cache stats: add 1184, delete 1184, find 4/8
[ 359.734213] Free swap = 15610620kB
[ 359.745499] Total swap = 15616764kB
[ 359.756879] 8313052 pages RAM
[ 359.766596] 0 pages HighMem/MovableOnly
[ 359.778927] 150579 pages reserved
[ 359.789686] 4096 pages cma reserved
[ 359.801052] 0 pages hwpoisoned
[ 359.811026] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[ 359.837419] [ 967] 0 967 936 0 6 2 32 0 init
[ 359.863819] [ 968] 0 968 941 1 5 2 98 0 rc
[ 359.889683] [ 1087] 0 1087 942 1 5 2 212 -1000 udevd
[ 359.916457] [ 1294] 0 1294 917 1 5 2 60 -1000 net.agent
[ 359.944236] [ 1340] 0 1340 917 1 5 2 59 -1000 net.agent
[ 359.971915] [ 1750] 0 1750 918 1 5 2 59 -1000 net.agent
[ 359.999603] [ 1915] 0 1915 587 0 5 2 31 0 bootlogd
[ 360.027033] [ 2442] 0 2442 942 0 5 2 211 -1000 udevd
[ 360.053685] [ 2443] 0 2443 942 0 5 2 211 -1000 udevd
[ 360.080467] [ 3023] 0 3023 1538 1 6 2 177 0 S13mountall.sh
[ 360.109446] [ 3078] 0 3078 1719 1 7 2 129 0 mount
[ 360.136111] [ 5722] 0 5722 558 0 5 2 16 -1000 sleep
[ 360.162742] [ 5723] 0 5723 558 0 5 2 17 -1000 sleep
[ 360.189358] [ 5724] 0 5724 558 0 5 2 17 -1000 sleep
[ 360.215977] Out of memory: Kill process 3023 (S13mountall.sh) score 0 or sacrifice child
[ 360.241102] Killed process 3078 (mount) total-vm:6876kB, anon-rss:0kB, file-rss:4kB, shmem-rss:0kB
[ 360.276193] oom_reaper: reaped process 3078 (mount), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[ 360.308435] bcache_writebac invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=0
[ 360.342339] bcache_writebac cpuset=/ mems_allowed=0
[ 360.357757] CPU: 1 PID: 2339 Comm: bcache_writebac Tainted: G U 4.13.0-rc5-amd64-stkreg-sysrq-20170902+ #2
[ 360.390847] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
[ 360.419010] Call Trace:
[ 360.427123] dump_stack+0x61/0x7d
[ 360.437815] dump_header+0x97/0x239
[ 360.449000] ? _raw_spin_unlock_irqrestore+0x14/0x24
[ 360.464705] oom_kill_process+0x86/0x379
[ 360.477180] out_of_memory+0x3a6/0x3ef
[ 360.489120] __alloc_pages_slowpath+0x86e/0xa1f
[ 360.503360] ? native_sched_clock+0x1a/0x37
[ 360.516712] __alloc_pages_nodemask+0x134/0x1d4
[ 360.530950] alloc_pages_current+0x8d/0x96
[ 360.543852] bio_alloc_pages+0x29/0x6a
[ 360.555696] bch_writeback_thread+0x51c/0x6d4 [bcache]
[ 360.571694] ? write_dirty+0x90/0x90 [bcache]
[ 360.585320] kthread+0xfb/0x100
[ 360.595281] ? init_completion+0x24/0x24
[ 360.607571] ? do_fast_syscall_32+0xb7/0xfe
[ 360.620725] ret_from_fork+0x25/0x30
[ 360.632000] Mem-Info:
[ 360.639511] active_anon:0 inactive_anon:2 isolated_anon:0
[ 360.639511] active_file:237 inactive_file:181 isolated_file:0
[ 360.639511] unevictable:0 dirty:0 writeback:0 unstable:0
[ 360.639511] slab_reclaimable:3428 slab_unreclaimable:8054968
[ 360.639511] mapped:1 shmem:2 pagetables:80 bounce:0
[ 360.639511] free:31221 free_pcp:20 free_cma:3741
[ 360.738644] Node 0 active_anon:0kB inactive_anon:8kB active_file:1016kB inactive_file:980kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:4kB dirty:0kB writeback:0kB shmem:8kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[ 360.818057] Node 0 DMA free:15880kB min:32kB low:44kB high:56kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15964kB managed:15880kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 360.897260] lowmem_reserve[]: 0 3201 31832 31832 31832
[ 360.913324] Node 0 DMA32 free:4760kB min:6788kB low:10064kB high:13340kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:324kB unevictable:0kB writepending:0kB present:3362068kB managed:3296500kB mlocked:0kB kernel_stack:16kB pagetables:0kB bounce:0kB free_pcp:28kB local_pcp:0kB free_cma:0kB
[ 360.968034] mount: page allocation failure: order:0, mode:0x1604040(GFP_NOFS|__GFP_COMP|__GFP_NOTRACK), nodemask=(null)
[ 360.968038] mount cpuset=/ mems_allowed=0
[ 360.968042] CPU: 0 PID: 3078 Comm: mount Tainted: G U 4.13.0-rc5-amd64-stkreg-sysrq-20170902+ #2
[ 360.968043] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
[ 360.968044] Call Trace:
[ 360.968050] dump_stack+0x61/0x7d
[ 360.968052] warn_alloc+0xe4/0x15d
[ 360.968056] ? call_timer_fn+0x140/0x140
[ 360.968058] __alloc_pages_slowpath+0x9a8/0xa1f
[ 360.968068] ? __rmqueue+0x285/0x297
[ 360.968071] __alloc_pages_nodemask+0x134/0x1d4
[ 360.968075] cache_grow_begin+0x95/0x26f
[ 360.968077] fallback_alloc+0x154/0x196
[ 360.968079] ____cache_alloc_node+0xdd/0xe9
[ 360.968081] kmem_cache_alloc_trace+0xa0/0xfc
[ 360.968084] add_tree_block+0x6a/0x1ac
[ 360.968086] build_ref_tree_for_root+0x19b/0x3a5
[ 360.968088] btrfs_build_ref_tree+0x133/0x156
[ 360.968090] open_ctree+0x1997/0x1fd2
[ 360.968093] btrfs_mount+0x9d5/0xb2d
[ 360.968094] ? btrfs_mount+0x9d5/0xb2d
[ 360.968096] ? find_next_bit+0xb/0xd
[ 360.968099] mount_fs+0x67/0x111
[ 360.968101] vfs_kern_mount+0x6b/0xd5
[ 360.968102] btrfs_mount+0x1c3/0xb2d
[ 360.968103] ? find_next_bit+0xb/0xd
[ 360.968106] mount_fs+0x67/0x111
[ 360.968107] vfs_kern_mount+0x6b/0xd5
[ 360.968109] do_mount+0x6da/0x964
[ 360.968111] ? slab_post_alloc_hook.isra.46+0xe/0x1d
[ 360.968113] compat_SyS_mount+0x185/0x1ae
[ 360.968116] do_fast_syscall_32+0xb7/0xfe
[ 360.968118] entry_SYSENTER_compat+0x4c/0x5b
[ 360.968119] RIP: 0023:0xf7f21c29
[ 360.968120] RSP: 002b:00000000ffd733e0 EFLAGS: 00000297 ORIG_RAX: 0000000000000015
[ 360.968122] RAX: ffffffffffffffda RBX: 0000000008ff32c8 RCX: 0000000008ff3118
[ 360.968122] RDX: 0000000008ff3010 RSI: 00000000c0ed0400 RDI: 0000000008ff6500
[ 360.968123] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 360.968124] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 360.968124] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 360.968126] Mem-Info:
[ 360.968130] active_anon:0 inactive_anon:2 isolated_anon:0
[ 360.968130] active_file:333 inactive_file:144 isolated_file:0
[ 360.968130] unevictable:0 dirty:0 writeback:0 unstable:0
[ 360.968130] slab_reclaimable:3428 slab_unreclaimable:8082489
[ 360.968130] mapped:1 shmem:2 pagetables:80 bounce:0
[ 360.968130] free:3717 free_pcp:9 free_cma:3741
[ 360.968132] Node 0 active_anon:0kB inactive_anon:8kB active_file:1332kB inactive_file:576kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:4kB dirty:0kB writeback:0kB shmem:8kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
[ 360.968133] Node 0 DMA free:0kB min:32kB low:44kB high:56kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15964kB managed:15880kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 360.968136] lowmem_reserve[]: 0 3201 31832 31832 31832
[ 360.968138] Node 0 DMA32 free:296kB min:6788kB low:10064kB high:13340kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:292kB unevictable:0kB writepending:0kB present:3362068kB managed:3296500kB mlocked:0kB kernel_stack:16kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 360.968141] lowmem_reserve[]: 0 0 28631 28631 28631
[ 360.968143] Node 0 Normal free:14572kB min:60760kB low:90092kB high:119424kB active_anon:0kB inactive_anon:8kB active_file:920kB inactive_file:504kB unevictable:0kB writepending:0kB present:29874176kB managed:29337512kB mlocked:0kB kernel_stack:4608kB pagetables:320kB bounce:0kB free_pcp:36kB local_pcp:0kB free_cma:14964kB
[ 360.968146] lowmem_reserve[]: 0 0 0 0 0
[ 360.968147] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[ 360.968152] Node 0 DMA32: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[ 360.968157] Node 0 Normal: 1*4kB (C) 0*8kB 1*16kB (C) 1*32kB (C) 1*64kB (C) 0*128kB 0*256kB 1*512kB (C) 0*1024kB 1*2048kB (C) 3*4096kB (C) = 14964kB
[ 360.968164] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 360.968165] 495 total pagecache pages
[ 360.968167] 0 pages in swap cache
[ 360.968168] Swap cache stats: add 1184, delete 1184, find 4/8
[ 360.968169] Free swap = 15611132kB
[ 360.968169] Total swap = 15616764kB
[ 360.968170] 8313052 pages RAM
[ 360.968170] 0 pages HighMem/MovableOnly
[ 360.968170] 150579 pages reserved
[ 360.968171] 4096 pages cma reserved
[ 360.968171] 0 pages hwpoisoned
[ 362.324699] lowmem_reserve[]: 0 0 28631 28631 28631
[ 362.340147] Node 0 Normal free:2422448kB min:60760kB low:90092kB high:119424kB active_anon:636kB inactive_anon:196kB active_file:3788kB inactive_file:1424kB unevictable:0kB writepending:0kB present:29874176kB managed:29337512kB mlocked:0kB kernel_stack:4608kB pagetables:324kB bounce:0kB free_pcp:4100kB local_pcp:48kB free_cma:11468kB
[ 362.431162] lowmem_reserve[]: 0 0 0 0 0
[ 362.443496] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[ 362.475115] Node 0 DMA32: 4469*4kB (U) 3639*8kB (U) 3116*16kB (U) 1233*32kB (U) 64*64kB (U) 9*128kB (U) 4*256kB (U) 6*512kB (U) 3*1024kB (U) 6*2048kB (U) 45*4096kB (U) = 345324kB
[ 362.524575] Node 0 Normal: 53138*4kB (UMC) 40392*8kB (UMC) 29104*16kB (UM) 16364*32kB (U) 4513*64kB (UC) 658*128kB (UC) 25*256kB (U) 18*512kB (U) 20*1024kB (UC) 19*2048kB (UC) 193*4096kB (UC) = 2763592kB
[ 362.580622] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 362.606863] 1429 total pagecache pages
[ 362.619062] 96 pages in swap cache
[ 362.630199] Swap cache stats: add 1396, delete 1300, find 237/380
[ 362.649433] Free swap = 15611132kB
[ 362.660833] Total swap = 15616764kB
[ 362.672274] 8313052 pages RAM
[ 362.682133] 0 pages HighMem/MovableOnly
[ 362.694909] 150579 pages reserved
[ 362.705741] 4096 pages cma reserved
[ 362.717084] 0 pages hwpoisoned
[ 362.727099] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[ 362.753547] [ 967] 0 967 936 0 6 2 32 0 init
[ 362.779963] [ 968] 0 968 941 1 5 2 98 0 rc
[ 362.805846] [ 1087] 0 1087 942 1 5 2 212 -1000 udevd
[ 362.832511] [ 1294] 0 1294 917 421 5 2 24 -1000 net.agent
[ 362.860203] [ 1340] 0 1340 917 396 5 2 28 -1000 net.agent
[ 362.887851] [ 1750] 0 1750 918 419 5 2 25 -1000 net.agent
[ 362.915478] [ 1915] 0 1915 587 0 5 2 31 0 bootlogd
[ 362.942839] [ 2442] 0 2442 942 0 5 2 211 -1000 udevd
[ 362.969390] [ 2443] 0 2443 942 0 5 2 211 -1000 udevd
[ 362.995922] [ 3023] 0 3023 1538 1 6 2 177 0 S13mountall.sh
[ 363.024797] [ 3078] 0 3078 1719 0 7 2 0 0 mount
[ 363.051309] [ 5743] 0 5743 558 147 5 2 0 -1000 sleep
[ 363.077802] [ 5744] 0 5744 558 148 5 2 0 -1000 sleep
[ 363.104305] [ 5745] 0 5745 558 143 5 2 0 -1000 sleep
[ 363.130752] Out of memory: Kill process 3023 (S13mountall.sh) score 0 or sacrifice child
[ 363.155660] Killed process 3023 (S13mountall.sh) total-vm:6152kB, anon-rss:0kB, file-rss:4kB, shmem-rss:0kB
/etc/init.d/rc: line 120: 3023 Killed $debug "$script" $action
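For scale, the slab_unreclaimable counters in the three Mem-Info blocks above can be converted from pages to GiB. This is only a minimal sketch, assuming 4 kB pages (consistent with the 4kB entries in the buddy lists above); the helper names are made up for illustration and the counters are copied by hand from the log.

```python
import re

PAGE_KB = 4  # assumption: 4 kB pages, as is usual on x86-64

# Counters copied from the three Mem-Info blocks in the log above.
MEM_INFO = """
[  359.050529]  slab_reclaimable:3430 slab_unreclaimable:8034083
[  360.639511]  slab_reclaimable:3428 slab_unreclaimable:8054968
[  360.968130]  slab_reclaimable:3428 slab_unreclaimable:8082489
"""
TOTAL_RAM_PAGES = 8313052  # from the "8313052 pages RAM" line


def slab_unreclaimable_pages(log):
    """Return every slab_unreclaimable page count found in the log text."""
    return [int(n) for n in re.findall(r"slab_unreclaimable:(\d+)", log)]


def pages_to_gib(pages):
    """Convert a page count to GiB using the assumed page size."""
    return pages * PAGE_KB / (1024 * 1024)


for pages in slab_unreclaimable_pages(MEM_INFO):
    share = 100 * pages / TOTAL_RAM_PAGES
    print(f"{pages_to_gib(pages):.1f} GiB unreclaimable slab ({share:.1f}% of RAM)")
```

Under that assumption, unreclaimable slab accounts for roughly 30.6 to 30.8 GiB, about 97% of RAM, which matches the mount trace failing its allocation in add_tree_block under btrfs_build_ref_tree.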
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
2017-09-02 23:53 ` Marc MERLIN
@ 2017-09-03 0:30 ` Josef Bacik
2017-09-03 1:01 ` Marc MERLIN
0 siblings, 1 reply; 47+ messages in thread
From: Josef Bacik @ 2017-09-03 0:30 UTC (permalink / raw)
To: Marc MERLIN
Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
David Sterba
My bad, I forgot that I don't dynamically allocate the stack trace space, so my patch did nothing. I blame the children for distracting me. I've dropped allocating the action altogether for the on-disk stuff, which should dramatically reduce the memory usage. You can just do a git pull since I made a new commit. You are mounting with -o ref_verify on only the one fs, right? Give this a try, and if it still doesn't work we can try a stripped-down version that doesn't build the initial tree and just hope that the problem lies in allocating a new block and not in modifying the refs for an existing block. Thanks,
Josef
Sent from my iPhone
> On Sep 2, 2017, at 7:54 PM, Marc MERLIN <marc@merlins.org> wrote:
>
>> On Sat, Sep 02, 2017 at 04:52:20PM +0000, Josef Bacik wrote:
>> Oops, ok I've updated my tree so we don't save the stack trace of the initial scan, which we don't need anyway. That should save a decent amount of memory in your case. It was an in place update so you'll need to blow away your local branch and pull the new one to get the new code. Thanks,
>
> Still did not work, unfortunately (on top of extra unrelated bugs in
> 4.13-rc5, as I was afraid).
>
> Mounting the partition still eats all the memory:
>
>
> --
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
> .... what McDonalds is to gourmet cooking
> Home page: https://urldefense.proofpoint.com/v2/url?u=http-3A__marc.merlins.org_&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=sDzg6MvHymKOUgI8SFIm4Q&m=1aDQg12_YvSWTRwtKEuju2jwfwBQHWUmF1TFzisZwyE&s=gOQVCOu1vW2YKYAvS2imou0jsDaSNerp6_GvMVfCh5Q&e= | PGP 1024R/763BE901
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
2017-09-03 0:30 ` Josef Bacik
@ 2017-09-03 1:01 ` Marc MERLIN
2017-09-03 3:26 ` Josef Bacik
0 siblings, 1 reply; 47+ messages in thread
From: Marc MERLIN @ 2017-09-03 1:01 UTC (permalink / raw)
To: Josef Bacik
Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
David Sterba
On Sun, Sep 03, 2017 at 12:30:07AM +0000, Josef Bacik wrote:
> My bad, I forgot I don't dynamically allocate the stack trace space so my patch did nothing, I blame the children for distracting me. I've dropped allocating the action altogether for the on disk stuff, that should dramatically reduce the memory usage. You can just do a git pull since I made a new commit. You are mounting with -o ref_verify on only the one fs right? Give this a try and if it still doesn't work we can try a stripped down version that doesn't build the initial tree and just hope that the problem exists in allocating a new block and not modifying the refs for an existing block. Thanks,
Good news, this time it booted without crashing on OOM.
I'll now get to see how it runs and hopefully it won't crash due to
other problems in 4.13
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
2017-09-03 1:01 ` Marc MERLIN
@ 2017-09-03 3:26 ` Josef Bacik
2017-09-03 14:31 ` Marc MERLIN
0 siblings, 1 reply; 47+ messages in thread
From: Josef Bacik @ 2017-09-03 3:26 UTC (permalink / raw)
To: Marc MERLIN
Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
David Sterba
I was looking through the code for other ways to cut down memory usage when I noticed we only catch improper re-allocations, not adding another ref for metadata which is what I suspect your problem is. I added another patch and pushed it out, sorry for the churn.
Josef
Sent from my iPhone
> On Sep 2, 2017, at 9:01 PM, Marc MERLIN <marc@merlins.org> wrote:
>
>> On Sun, Sep 03, 2017 at 12:30:07AM +0000, Josef Bacik wrote:
>> My bad, I forgot I don't dynamically allocate the stack trace space so my patch did nothing, I blame the children for distracting me. I've dropped allocating the action altogether for the on disk stuff, that should dramatically reduce the memory usage. You can just do a git pull since I made a new commit. You are mounting with -o ref_verify on only the one fs right? Give this a try and if it still doesn't work we can try a stripped down version that doesn't build the initial tree and just hope that the problem exists in allocating a new block and not modifying the refs for an existing block. Thanks,
>
> Good news, this time it booted without crashing on OOM.
>
> I'll now get to see how it runs and hopefully it won't crash due to
> other problems in 4.13
>
> Marc
> --
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
> .... what McDonalds is to gourmet cooking
> Home page: https://urldefense.proofpoint.com/v2/url?u=http-3A__marc.merlins.org_&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=sDzg6MvHymKOUgI8SFIm4Q&m=-zFzT4JPwAa-JY-PU1TRHuerYPlZf00HGKCTgtSRcxU&s=fyD-Ff-h7AsoFbRF2RqvzlURQJg38B1RTu7A_n0OLs8&e= | PGP 1024R/763BE901
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
2017-09-03 3:26 ` Josef Bacik
@ 2017-09-03 14:31 ` Marc MERLIN
2017-09-03 14:38 ` Josef Bacik
0 siblings, 1 reply; 47+ messages in thread
From: Marc MERLIN @ 2017-09-03 14:31 UTC (permalink / raw)
To: Josef Bacik
Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
David Sterba
On Sun, Sep 03, 2017 at 03:26:34AM +0000, Josef Bacik wrote:
> I was looking through the code for other ways to cut down memory usage when I noticed we only catch improper re-allocations, not adding another ref for metadata which is what I suspect your problem is. I added another patch and pushed it out, sorry for the churn.
Installed.
For now, I've seen this once, but otherwise no issues:
Dropping a ref for a root that doesn't have a ref on the block
Dumping block entry [26538725376 4096], num_refs 2, metadata 0, from disk 1
Ref root 0, parent 29818880, owner 23608, offset 0, num_refs 18446744073709551615
Ref root 0, parent 202129408, owner 23608, offset 0, num_refs 1
Ref root 418, parent 0, owner 23608, offset 0, num_refs 1
Root entry 418, num_refs 1
Root entry 69809, num_refs 0
Ref action 1, root 418, ref_root 0, parent 202129408, owner 23608, offset 0, num_refs 1
No stacktrace support
Ref action 2, root 69809, ref_root 0, parent 29818880, owner 23608, offset 0, num_refs 18446744073709551615
No stacktrace support
I'm assuming this was done by your patch?
Should I worry about 'No stacktrace support' ?
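A side note on reading the dump above: the value 18446744073709551615 is 2^64 - 1, which is what -1 looks like when printed as an unsigned 64-bit integer. It is only an assumption here that the ref-verify code records a dropped ref as a delta of -1; the arithmetic itself can be checked with a minimal sketch (the helper name as_signed_64 is made up for illustration):

```python
U64_MODULUS = 1 << 64  # number of distinct 64-bit values


def as_signed_64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as two's-complement signed."""
    value %= U64_MODULUS
    # Values with the top bit set represent negative numbers.
    return value - U64_MODULUS if value >= (1 << 63) else value


# num_refs as printed in the "Ref root 0, parent 29818880" line of the dump.
num_refs_from_log = 18446744073709551615
print(as_signed_64(num_refs_from_log))
```

Read that way, the "Ref action 2" line would be a drop of one reference rather than a huge reference count.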
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
2017-09-03 14:31 ` Marc MERLIN
@ 2017-09-03 14:38 ` Josef Bacik
2017-09-03 14:42 ` Marc MERLIN
0 siblings, 1 reply; 47+ messages in thread
From: Josef Bacik @ 2017-09-03 14:38 UTC (permalink / raw)
To: Marc MERLIN
Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
David Sterba
Oh yeah you need CONFIG_STACKTRACE turned on, otherwise this is going to be difficult ;). Thanks,
Josef
Sent from my iPhone
> On Sep 3, 2017, at 10:31 AM, Marc MERLIN <marc@merlins.org> wrote:
>
>> On Sun, Sep 03, 2017 at 03:26:34AM +0000, Josef Bacik wrote:
>> I was looking through the code for other ways to cut down memory usage when I noticed we only catch improper re-allocations, not adding another ref for metadata which is what I suspect your problem is. I added another patch and pushed it out, sorry for the churn.
>
> Installed.
>
> For now, I've seen this once, but otherwise no issues:
> Dropping a ref for a root that doesn't have a ref on the block
> Dumping block entry [26538725376 4096], num_refs 2, metadata 0, from disk 1
> Ref root 0, parent 29818880, owner 23608, offset 0, num_refs 18446744073709551615
> Ref root 0, parent 202129408, owner 23608, offset 0, num_refs 1
> Ref root 418, parent 0, owner 23608, offset 0, num_refs 1
> Root entry 418, num_refs 1
> Root entry 69809, num_refs 0
> Ref action 1, root 418, ref_root 0, parent 202129408, owner 23608, offset 0, num_refs 1
> No stacktrace support
> Ref action 2, root 69809, ref_root 0, parent 29818880, owner 23608, offset 0, num_refs 18446744073709551615
> No stacktrace support
>
>
> I'm assuming this was done by your patch?
> Should I worry about 'No stacktrace support' ?
>
> Marc
> --
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
> .... what McDonalds is to gourmet cooking
> Home page: https://urldefense.proofpoint.com/v2/url?u=http-3A__marc.merlins.org_&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=sDzg6MvHymKOUgI8SFIm4Q&m=LcpX_93P3Y777JowgGupu6UcijcbbvSYDebGKuuA1G8&s=w9rh7zu0AfB72bo7gMQ9oAj20iJYe8KIXuudlTWa_ek&e= | PGP 1024R/763BE901
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
2017-09-03 14:38 ` Josef Bacik
@ 2017-09-03 14:42 ` Marc MERLIN
2017-09-03 14:55 ` Josef Bacik
2017-09-03 17:33 ` Josef Bacik
0 siblings, 2 replies; 47+ messages in thread
From: Marc MERLIN @ 2017-09-03 14:42 UTC (permalink / raw)
To: Josef Bacik
Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
David Sterba
On Sun, Sep 03, 2017 at 02:38:57PM +0000, Josef Bacik wrote:
> Oh yeah you need CONFIG_STACKTRACE turned on, otherwise this is going to be difficult ;). Thanks,
Right, except that I thought I did:
saruman:/usr/src/linux-btrfs/btrfs-next# grep STACKTRACE .config
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_RELIABLE_STACKTRACE=y
CONFIG_STACKTRACE=y
CONFIG_USER_STACKTRACE_SUPPORT=y
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
2017-09-03 14:42 ` Marc MERLIN
@ 2017-09-03 14:55 ` Josef Bacik
2017-09-03 17:33 ` Josef Bacik
1 sibling, 0 replies; 47+ messages in thread
From: Josef Bacik @ 2017-09-03 14:55 UTC (permalink / raw)
To: Marc MERLIN
Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
David Sterba
Jesus Christ I misspelled it, I'll fix it up when I get home. Thanks,
Josef
Sent from my iPhone
> On Sep 3, 2017, at 10:42 AM, Marc MERLIN <marc@merlins.org> wrote:
>
>> On Sun, Sep 03, 2017 at 02:38:57PM +0000, Josef Bacik wrote:
>> Oh yeah you need CONFIG_STACKTRACE turned on, otherwise this is going to be difficult ;). Thanks,
>
> Right, except that I thought I did:
>
> saruman:/usr/src/linux-btrfs/btrfs-next# grep STACKTRACE .config
> CONFIG_STACKTRACE_SUPPORT=y
> CONFIG_HAVE_RELIABLE_STACKTRACE=y
> CONFIG_STACKTRACE=y
> CONFIG_USER_STACKTRACE_SUPPORT=y
>
> Marc
> --
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
> .... what McDonalds is to gourmet cooking
> Home page: https://urldefense.proofpoint.com/v2/url?u=http-3A__marc.merlins.org_&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=sDzg6MvHymKOUgI8SFIm4Q&m=6hYQEzNFsUwvT2CxYV_u4CrE2zAroYdvDkhnSNUI_aY&s=8wh8ci2P8k3BgZ3s_Fxsh3cZak4P3ESZslRm2vobnqs&e= | PGP 1024R/763BE901
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
2017-09-03 14:42 ` Marc MERLIN
2017-09-03 14:55 ` Josef Bacik
@ 2017-09-03 17:33 ` Josef Bacik
2017-09-03 20:20 ` Marc MERLIN
1 sibling, 1 reply; 47+ messages in thread
From: Josef Bacik @ 2017-09-03 17:33 UTC (permalink / raw)
To: Marc MERLIN
Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
David Sterba
Alright pushed, sorry about that.
Josef
Sent from my iPhone
> On Sep 3, 2017, at 10:42 AM, Marc MERLIN <marc@merlins.org> wrote:
>
>> On Sun, Sep 03, 2017 at 02:38:57PM +0000, Josef Bacik wrote:
>> Oh yeah you need CONFIG_STACKTRACE turned on, otherwise this is going to be difficult ;). Thanks,
>
> Right, except that I thought I did:
>
> saruman:/usr/src/linux-btrfs/btrfs-next# grep STACKTRACE .config
> CONFIG_STACKTRACE_SUPPORT=y
> CONFIG_HAVE_RELIABLE_STACKTRACE=y
> CONFIG_STACKTRACE=y
> CONFIG_USER_STACKTRACE_SUPPORT=y
>
> Marc
> --
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
> .... what McDonalds is to gourmet cooking
> Home page: https://urldefense.proofpoint.com/v2/url?u=http-3A__marc.merlins.org_&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=sDzg6MvHymKOUgI8SFIm4Q&m=6hYQEzNFsUwvT2CxYV_u4CrE2zAroYdvDkhnSNUI_aY&s=8wh8ci2P8k3BgZ3s_Fxsh3cZak4P3ESZslRm2vobnqs&e= | PGP 1024R/763BE901
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
2017-09-03 17:33 ` Josef Bacik
@ 2017-09-03 20:20 ` Marc MERLIN
2017-09-04 0:55 ` Josef Bacik
2017-09-05 18:19 ` Josef Bacik
0 siblings, 2 replies; 47+ messages in thread
From: Marc MERLIN @ 2017-09-03 20:20 UTC (permalink / raw)
To: Josef Bacik
Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
David Sterba
On Sun, Sep 03, 2017 at 05:33:33PM +0000, Josef Bacik wrote:
> Alright pushed, sorry about that.
I'm reasonably sure I'm running the new code, but still got this:
[ 2104.336513] Dropping a ref for a root that doesn't have a ref on the block
[ 2104.358226] Dumping block entry [115253923840 155648], num_refs 1, metadata 0, from disk 1
[ 2104.384037] Ref root 0, parent 3414272884736, owner 262813, offset 0, num_refs 18446744073709551615
[ 2104.412766] Ref root 418, parent 0, owner 262813, offset 0, num_refs 1
[ 2104.433888] Root entry 418, num_refs 1
[ 2104.446648] Root entry 69869, num_refs 0
[ 2104.459904] Ref action 2, root 69869, ref_root 0, parent 3414272884736, owner 262813, offset 0, num_refs 18446744073709551615
[ 2104.496244] No Stacktrace
Now, in the background I had a monthly md check of the underlying device
(mdadm raid 5), and got some of those. Obviously that's not good, and
I'm assuming that md raid5 may not have a checksum on blocks, so it won't know
which drive has the corrupted data.
Does that sound right?
Now, the good news is that btrfs on top does have checksums, so running a scrub should
hopefully find those corrupted blocks if they happen to be in use by the filesystem
(maybe they are free).
But as a reminder, this whole thread started with my FS maybe not being in a good state, but both
check --repair and scrub returning clean. Maybe I'll use the opportunity to re-run a check --repair
and a scrub after that to see what state things are in.
md6: mismatch sector in range 3581539536-3581539544
md6: mismatch sector in range 3581539544-3581539552
md6: mismatch sector in range 3581539552-3581539560
md6: mismatch sector in range 3581539560-3581539568
md6: mismatch sector in range 3581543792-3581543800
md6: mismatch sector in range 3581543800-3581543808
md6: mismatch sector in range 3581543808-3581543816
md6: mismatch sector in range 3581543816-3581543824
md6: mismatch sector in range 3581544112-3581544120
md6: mismatch sector in range 3581544120-3581544128
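To see how localized the mismatches are, the md log lines above can be merged into contiguous runs. This is only a sketch: the function name merge_mismatch_ranges is hypothetical, and the regular expression assumes the exact "mdN: mismatch sector in range A-B" wording shown in this log.

```python
import re

# Matches lines such as "md6: mismatch sector in range 3581539536-3581539544".
MISMATCH_RE = re.compile(r"(md\d+): mismatch sector in range (\d+)-(\d+)")


def merge_mismatch_ranges(lines):
    """Return sorted (start, end) sector ranges with adjacent ranges merged."""
    ranges = []
    for line in lines:
        match = MISMATCH_RE.search(line)
        if match:
            ranges.append((int(match.group(2)), int(match.group(3))))
    ranges.sort()

    merged = []
    for start, end in ranges:
        if merged and start <= merged[-1][1]:
            # This range touches or overlaps the previous one: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


log_lines = """\
md6: mismatch sector in range 3581539536-3581539544
md6: mismatch sector in range 3581539544-3581539552
md6: mismatch sector in range 3581539552-3581539560
md6: mismatch sector in range 3581539560-3581539568
md6: mismatch sector in range 3581543792-3581543800
md6: mismatch sector in range 3581543800-3581543808
md6: mismatch sector in range 3581543808-3581543816
md6: mismatch sector in range 3581543816-3581543824
md6: mismatch sector in range 3581544112-3581544120
md6: mismatch sector in range 3581544120-3581544128
""".splitlines()

for start, end in merge_mismatch_ranges(log_lines):
    print(f"sectors {start}-{end} ({end - start} sectors)")
```

For the ten lines above this yields three runs of 32, 32 and 16 sectors, all within a few thousand sectors of each other.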
As for your patch, no idea why it's not giving me a stacktrace, sorry :-/
Git log of my tree does show:
commit aa162d2908bd7452805ea812b7550232b0b6ed53
Author: Josef Bacik <jbacik@fb.com>
Date: Sun Sep 3 13:32:17 2017 -0400
Btrfs: use be->metadata just in case
I suspect we're not getting the owner in some cases, so we want to just
use the known value.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
2017-09-03 20:20 ` Marc MERLIN
@ 2017-09-04 0:55 ` Josef Bacik
2017-09-05 18:19 ` Josef Bacik
1 sibling, 0 replies; 47+ messages in thread
From: Josef Bacik @ 2017-09-04 0:55 UTC (permalink / raw)
To: Marc MERLIN
Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
David Sterba
Ok this output looked fishy and so I went and tested it on my box again. It looks like I wasn't testing modifying a snapshot with an existing fs so I never saw these errors, but I see them as well. I definitely fucked the building of the initial ref tree. It's too late tonight for me to rework it and have it working for you, but I should be able to get it into shape in the morning. I'll let you know when I have something useful to test, sorry about the mess,
Josef
Sent from my iPhone
> On Sep 3, 2017, at 4:21 PM, Marc MERLIN <marc@merlins.org> wrote:
>
>> On Sun, Sep 03, 2017 at 05:33:33PM +0000, Josef Bacik wrote:
>> Alright pushed, sorry about that.
>
> I'm reasonably sure I'm running the new code, but still got this:
> [ 2104.336513] Dropping a ref for a root that doesn't have a ref on the block
> [ 2104.358226] Dumping block entry [115253923840 155648], num_refs 1, metadata 0, from disk 1
> [ 2104.384037] Ref root 0, parent 3414272884736, owner 262813, offset 0, num_refs 18446744073709551615
> [ 2104.412766] Ref root 418, parent 0, owner 262813, offset 0, num_refs 1
> [ 2104.433888] Root entry 418, num_refs 1
> [ 2104.446648] Root entry 69869, num_refs 0
> [ 2104.459904] Ref action 2, root 69869, ref_root 0, parent 3414272884736, owner 262813, offset 0, num_refs 18446744073709551615
> [ 2104.496244] No Stacktrace
>
> Now, in the background I had a monthly md check of the underlying device
> (mdadm raid 5), and got some of those. Obviously that's not good, and
> I'm assuming that md raid5 may not have a checksum on blocks, so it won't know
> which drive has the corrupted data.
> Does that sound right?
>
> Now, the good news is that btrfs on top does have checksums, so running a scrub should
> hopefully find those corrupted blocks if they happen to be in use by the filesystem
> (maybe they are free).
> But as a reminder, this whole thread started with my FS maybe not being in a good state, but both
> check --repair and scrub returning clean. Maybe I'll use the opportunity to re-run a check --repair
> and a scrub after that to see what state things are in.
>
> md6: mismatch sector in range 3581539536-3581539544
> md6: mismatch sector in range 3581539544-3581539552
> md6: mismatch sector in range 3581539552-3581539560
> md6: mismatch sector in range 3581539560-3581539568
> md6: mismatch sector in range 3581543792-3581543800
> md6: mismatch sector in range 3581543800-3581543808
> md6: mismatch sector in range 3581543808-3581543816
> md6: mismatch sector in range 3581543816-3581543824
> md6: mismatch sector in range 3581544112-3581544120
> md6: mismatch sector in range 3581544120-3581544128
>
> As for your patch, no idea why it's not giving me a stacktrace, sorry :-/
>
> Git log of my tree does show:
> commit aa162d2908bd7452805ea812b7550232b0b6ed53
> Author: Josef Bacik <jbacik@fb.com>
> Date: Sun Sep 3 13:32:17 2017 -0400
>
> Btrfs: use be->metadata just in case
>
> I suspect we're not getting the owner in some cases, so we want to just
> use the known value.
>
> Signed-off-by: Josef Bacik <jbacik@fb.com>
>
> Marc
> --
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
> .... what McDonalds is to gourmet cooking
> Home page: https://urldefense.proofpoint.com/v2/url?u=http-3A__marc.merlins.org_&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=sDzg6MvHymKOUgI8SFIm4Q&m=BaH33jtavN-1wWyV3yseE5v7ImIAaTXLnjChSr4HnQw&s=3JczS4Mo254uip2aIsYiC_EUHsmGYcCJUUMl6si8NQ8&e= | PGP 1024R/763BE901
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
2017-09-03 20:20 ` Marc MERLIN
2017-09-04 0:55 ` Josef Bacik
@ 2017-09-05 18:19 ` Josef Bacik
2017-09-09 18:39 ` Marc MERLIN
1 sibling, 1 reply; 47+ messages in thread
From: Josef Bacik @ 2017-09-05 18:19 UTC (permalink / raw)
To: Marc MERLIN
Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
David Sterba
Alright I just reworked the build tree ref stuff and tested it to make sure it wasn’t going to give false positives again. Apparently I had only ever used this with very basic existing fs’es and nothing super complicated, so it was just broken for anything complex. I’ve pushed it to my tree, you can just pull and build and try again. This time the stack traces will even work! Thanks,
Josef
On 9/3/17, 4:21 PM, "Marc MERLIN" <marc@merlins.org> wrote:
On Sun, Sep 03, 2017 at 05:33:33PM +0000, Josef Bacik wrote:
> Alright pushed, sorry about that.
I'm reasonably sure I'm running the new code, but still got this:
[ 2104.336513] Dropping a ref for a root that doesn't have a ref on the block
[ 2104.358226] Dumping block entry [115253923840 155648], num_refs 1, metadata 0, from disk 1
[ 2104.384037] Ref root 0, parent 3414272884736, owner 262813, offset 0, num_refs 18446744073709551615
[ 2104.412766] Ref root 418, parent 0, owner 262813, offset 0, num_refs 1
[ 2104.433888] Root entry 418, num_refs 1
[ 2104.446648] Root entry 69869, num_refs 0
[ 2104.459904] Ref action 2, root 69869, ref_root 0, parent 3414272884736, owner 262813, offset 0, num_refs 18446744073709551615
[ 2104.496244] No Stacktrace
Now, in the background I had a monthly md check of the underlying device
(mdadm raid 5), and got some of those. Obviously that's not good, and
I'm assuming that md raid5 may not have a checksum on blocks, so it won't know
which drive has the corrupted data.
Does that sound right?
Now, the good news is that btrfs on top does have checksums, so running a scrub should
hopefully find those corrupted blocks if they happen to be in use by the filesystem
(maybe they are free).
But as a reminder, this whole thread started with my FS maybe not being in a good state, but both
check --repair and scrub returning clean. Maybe I'll use the opportunity to re-run a check --repair
and a scrub after that to see what state things are in.
md6: mismatch sector in range 3581539536-3581539544
md6: mismatch sector in range 3581539544-3581539552
md6: mismatch sector in range 3581539552-3581539560
md6: mismatch sector in range 3581539560-3581539568
md6: mismatch sector in range 3581543792-3581543800
md6: mismatch sector in range 3581543800-3581543808
md6: mismatch sector in range 3581543808-3581543816
md6: mismatch sector in range 3581543816-3581543824
md6: mismatch sector in range 3581544112-3581544120
md6: mismatch sector in range 3581544120-3581544128
As for your patch, no idea why it's not giving me a stacktrace, sorry :-/
Git log of my tree does show:
commit aa162d2908bd7452805ea812b7550232b0b6ed53
Author: Josef Bacik <jbacik@fb.com>
Date: Sun Sep 3 13:32:17 2017 -0400
Btrfs: use be->metadata just in case
I suspect we're not getting the owner in some cases, so we want to just
use the known value.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: https://urldefense.proofpoint.com/v2/url?u=http-3A__marc.merlins.org_&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=sDzg6MvHymKOUgI8SFIm4Q&m=BaH33jtavN-1wWyV3yseE5v7ImIAaTXLnjChSr4HnQw&s=3JczS4Mo254uip2aIsYiC_EUHsmGYcCJUUMl6si8NQ8&e= | PGP 1024R/763BE901
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
2017-09-05 18:19 ` Josef Bacik
@ 2017-09-09 18:39 ` Marc MERLIN
2017-09-09 22:56 ` Josef Bacik
0 siblings, 1 reply; 47+ messages in thread
From: Marc MERLIN @ 2017-09-09 18:39 UTC (permalink / raw)
To: Josef Bacik
Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
David Sterba
On Tue, Sep 05, 2017 at 06:19:25PM +0000, Josef Bacik wrote:
> Alright I just reworked the build tree ref stuff and tested it to make sure it wasn’t going to give false positives again. Apparently I had only ever used this with very basic existing fs’es and nothing super complicated, so it was just broken for anything complex. I’ve pushed it to my tree, you can just pull and build and try again. This time the stack traces will even work! Thanks,
Ok, so I found out that I just need to copy a bunch of data to the
filesystem to trigger the bug.
There you go:
[318400.507972] re-allocated a block that still has references to it!
[318400.527517] Dumping block entry [13282417065984 16384], num_refs 2, metadata 1, from disk 1
[318400.553751] Ref root 2, parent 0, owner 0, offset 0, num_refs 1
[318400.573208] Root entry 2, num_refs 1
[318400.585614] Root entry 7, num_refs 0
[318400.598028] Ref action 3, root 7, ref_root 7, parent 0, owner 1, offset 0, num_refs 1
[318400.623774] btrfs_alloc_tree_block+0x33e/0x3e1
[318400.639083] __btrfs_cow_block+0xf3/0x420
[318400.652817] btrfs_cow_block+0xcf/0x145
[318400.666024] btrfs_search_slot+0x269/0x6de
[318400.680041] btrfs_del_csums+0xac/0x2f9
[318400.693245] __btrfs_free_extent+0x88b/0xa0b
[318400.707718] __btrfs_run_delayed_refs+0xb4e/0xd20
[318400.723491] btrfs_run_delayed_refs+0x77/0x1a1
[318400.738993] btrfs_write_dirty_block_groups+0xf5/0x2c1
[318400.755994] commit_cowonly_roots+0x1da/0x273
[318400.770673] btrfs_commit_transaction+0x3dd/0x761
[318400.786397] transaction_kthread+0xe2/0x178
[318400.800515] kthread+0xfb/0x100
[318400.811487] ret_from_fork+0x25/0x30
[318400.823748] 0xffffffffffffffff
[318400.957574] ------------[ cut here ]------------
[318400.972498] WARNING: CPU: 2 PID: 3242 at fs/btrfs/extent-tree.c:3015 btrfs_run_delayed_refs+0xa2/0x1a1
[318401.001382] Modules linked in: veth ip6table_filter ip6_tables ebtable_nat ebtables ppdev lp xt_addrtype br_netfilter bridge stp llc tun autofs4 softdog binfmt_misc ftdi_sio nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ipt_REJECT nf_reject_ipv4 xt_conntrack xt_mark xt_nat xt_tcpudp nf_log_ipv4 nf_log_common xt_LOG iptable_mangle iptable_filter lm85 hwmon_vid pl2303 dm_snapshot dm_bufio iptable_nat ip_tables nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_conntrack_ftp ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_nat nf_conntrack x_tables sg st snd_pcm_oss snd_mixer_oss bcache kvm_intel kvm irqbypass snd_hda_codec_realtek snd_hda_codec_generic snd_cmipci snd_hda_intel snd_mpu401_uart snd_hda_codec snd_opl3_lib eeepc_wmi snd_hda_core tpm_infineon snd_rawmidi asix asus_wmi rc_ati_x10 tpm_tis
[318401.218357] snd_seq_device sparse_keymap snd_hwdep tpm_tis_core ati_remote usbnet parport_pc snd_pcm rfkill pcspkr i915 hwmon tpm parport rc_core libphy mei_me snd_timer lpc_ich wmi_bmof battery usbserial evdev wmi input_leds i2c_i801 snd soundcore e1000e ptp pps_core fuse raid456 multipath mmc_block mmc_core lrw ablk_helper dm_crypt dm_mod async_raid6_recov async_pq async_xor async_memcpy async_tx crc32c_intel blowfish_x86_64 blowfish_common pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd xhci_pci ehci_pci xhci_hcd ehci_hcd mvsas libsas r8169 sata_sil24 usbcore mii scsi_transport_sas thermal fan [last unloaded: ftdi_sio]
[318401.392440] CPU: 2 PID: 3242 Comm: btrfs-transacti Tainted: G U 4.13.0-rc5-amd64-stkreg-sysrq-20170902d+ #6
[318401.426262] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
[318401.454894] task: ffff948ef791e200 task.stack: ffffb18a091ec000
[318401.473918] RIP: 0010:btrfs_run_delayed_refs+0xa2/0x1a1
[318401.490849] RSP: 0018:ffffb18a091efd08 EFLAGS: 00010296
[318401.507751] RAX: 0000000000000026 RBX: ffff9488208be618 RCX: 0000000000000000
[318401.530384] RDX: ffff948f1e295e01 RSI: ffff948f1e28dd58 RDI: ffff948f1e28dd58
[318401.553548] RBP: ffffb18a091efd50 R08: 0003dc12ea8bcc57 R09: ffff948f1f50b868
[318401.576127] R10: ffff948b1f1cc460 R11: ffffffffaef37285 R12: 00000000ffffffef
[318401.598717] R13: ffffffffffffffff R14: ffff948edb7efd48 R15: ffff948cdbdeb000
[318401.621327] FS: 0000000000000000(0000) GS:ffff948f1e280000(0000) knlGS:0000000000000000
[318401.646737] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[318401.665149] CR2: 00000000f7f05001 CR3: 000000061f587000 CR4: 00000000001406e0
[318401.687684] Call Trace:
[318401.696148] btrfs_write_dirty_block_groups+0xf5/0x2c1
[318401.712745] ? btrfs_run_delayed_refs+0x127/0x1a1
[318401.727981] commit_cowonly_roots+0x1da/0x273
[318401.742183] btrfs_commit_transaction+0x3dd/0x761
[318401.757447] transaction_kthread+0xe2/0x178
[318401.771158] ? btrfs_cleanup_transaction+0x3c2/0x3c2
[318401.787169] kthread+0xfb/0x100
[318401.797769] ? init_completion+0x24/0x24
[318401.810718] ret_from_fork+0x25/0x30
[318401.822588] Code: 85 c0 41 89 c4 79 60 48 8b 43 60 f0 0f ba a8 d8 16 00 00 02 72 35 41 83 fc fb 74 13 44 89 e6 48 c7 c7 27 3f af ae e8 81 5d e1 ff <0f> ff eb 1c f6 05 2a da ab 00 04 74 13 48 8b 7b 60 44 89 e2 48
[318401.881182] ---[ end trace 47464f1fcc4796c5 ]---
[318401.896818] BTRFS: error (device dm-2) in btrfs_run_delayed_refs:3015: errno=-17 Object already exists
[318401.925978] BTRFS info (device dm-2): forced readonly
[318401.950682] BTRFS warning (device dm-2): Skipping commit of aborted transaction.
[318401.974102] BTRFS: error (device dm-2) in cleanup_transaction:1873: errno=-17 Object already exists
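For anyone decoding the numbers in this trace: errno 17 is EEXIST, and R12 (00000000ffffffef) appears to hold the same -17 as a 32-bit two's-complement value. That R12 carries the error code is an inference from the register dump, not something the log states. A small sketch of the conversion (as_signed_32 is a made-up helper name):

```python
import errno
import os


def as_signed_32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


r12 = 0x00000000FFFFFFEF          # R12 from the register dump above
code = as_signed_32(r12)          # kernel-style negative errno
name = errno.errorcode[-code]     # symbolic name for the positive errno

print(code, name, os.strerror(-code))
```

The text "Object already exists" in the BTRFS error line is the kernel's own wording for this errno; os.strerror() in userspace prints the libc string for the same code instead.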
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
2017-09-09 18:39 ` Marc MERLIN
@ 2017-09-09 22:56 ` Josef Bacik
2017-09-10 2:36 ` Marc MERLIN
0 siblings, 1 reply; 47+ messages in thread
From: Josef Bacik @ 2017-09-09 22:56 UTC (permalink / raw)
To: Marc MERLIN
Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
David Sterba
Well that's odd, a block allocated on disk is in the free space cache. Can I see the full output of the fsck? I want to make sure it's actually getting to the part where it checks the free space cache. If it does then I'll have to think of how to catch this kind of bug, because you've got a weird one. Thanks,
Josef
Sent from my iPhone
> On Sep 9, 2017, at 2:39 PM, Marc MERLIN <marc@merlins.org> wrote:
>
>> On Tue, Sep 05, 2017 at 06:19:25PM +0000, Josef Bacik wrote:
>> Alright I just reworked the build tree ref stuff and tested it to make sure it wasn't going to give false positives again. Apparently I had only ever used this with very basic existing fs'es and nothing super complicated, so it was just broken for anything complex. I've pushed it to my tree, you can just pull and build and try again. This time the stack traces will even work! Thanks,
>
> Ok, so I found out that I just need to copy a bunch of data to the
> filesystem to trigger the bug.
>
> There you go:
> [318400.507972] re-allocated a block that still has references to it!
> [318400.527517] Dumping block entry [13282417065984 16384], num_refs 2, metadata 1, from disk 1
> [318400.553751] Ref root 2, parent 0, owner 0, offset 0, num_refs 1
> [318400.573208] Root entry 2, num_refs 1
> [318400.585614] Root entry 7, num_refs 0
> [318400.598028] Ref action 3, root 7, ref_root 7, parent 0, owner 1, offset 0, num_refs 1
> [318400.623774] btrfs_alloc_tree_block+0x33e/0x3e1
> [318400.639083] __btrfs_cow_block+0xf3/0x420
> [318400.652817] btrfs_cow_block+0xcf/0x145
> [318400.666024] btrfs_search_slot+0x269/0x6de
> [318400.680041] btrfs_del_csums+0xac/0x2f9
> [318400.693245] __btrfs_free_extent+0x88b/0xa0b
> [318400.707718] __btrfs_run_delayed_refs+0xb4e/0xd20
> [318400.723491] btrfs_run_delayed_refs+0x77/0x1a1
> [318400.738993] btrfs_write_dirty_block_groups+0xf5/0x2c1
> [318400.755994] commit_cowonly_roots+0x1da/0x273
> [318400.770673] btrfs_commit_transaction+0x3dd/0x761
> [318400.786397] transaction_kthread+0xe2/0x178
> [318400.800515] kthread+0xfb/0x100
> [318400.811487] ret_from_fork+0x25/0x30
> [318400.823748] 0xffffffffffffffff
> [318400.957574] ------------[ cut here ]------------
> [318400.972498] WARNING: CPU: 2 PID: 3242 at fs/btrfs/extent-tree.c:3015 btrfs_run_delayed_refs+0xa2/0x1a1
> [318401.001382] Modules linked in: veth ip6table_filter ip6_tables ebtable_nat ebtables ppdev lp xt_addrtype br_netfilter bridge stp llc tun autofs4 softdog binfmt_misc ftdi_sio nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ipt_REJECT nf_reject_ipv4 xt_conntrack xt_mark xt_nat xt_tcpudp nf_log_ipv4 nf_log_common xt_LOG iptable_mangle iptable_filter lm85 hwmon_vid pl2303 dm_snapshot dm_bufio iptable_nat ip_tables nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_conntrack_ftp ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_nat nf_conntrack x_tables sg st snd_pcm_oss snd_mixer_oss bcache kvm_intel kvm irqbypass snd_hda_codec_realtek snd_hda_codec_generic snd_cmipci snd_hda_intel snd_mpu401_uart snd_hda_codec snd_opl3_lib eeepc_wmi snd_hda_core tpm_infineon snd_rawmidi asix asus_wmi rc_ati_x10 tpm_tis
> [318401.218357] snd_seq_device sparse_keymap snd_hwdep tpm_tis_core ati_remote usbnet parport_pc snd_pcm rfkill pcspkr i915 hwmon tpm parport rc_core libphy mei_me snd_timer lpc_ich wmi_bmof battery usbserial evdev wmi input_leds i2c_i801 snd soundcore e1000e ptp pps_core fuse raid456 multipath mmc_block mmc_core lrw ablk_helper dm_crypt dm_mod async_raid6_recov async_pq async_xor async_memcpy async_tx crc32c_intel blowfish_x86_64 blowfish_common pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd xhci_pci ehci_pci xhci_hcd ehci_hcd mvsas libsas r8169 sata_sil24 usbcore mii scsi_transport_sas thermal fan [last unloaded: ftdi_sio]
> [318401.392440] CPU: 2 PID: 3242 Comm: btrfs-transacti Tainted: G U 4.13.0-rc5-amd64-stkreg-sysrq-20170902d+ #6
> [318401.426262] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> [318401.454894] task: ffff948ef791e200 task.stack: ffffb18a091ec000
> [318401.473918] RIP: 0010:btrfs_run_delayed_refs+0xa2/0x1a1
> [318401.490849] RSP: 0018:ffffb18a091efd08 EFLAGS: 00010296
> [318401.507751] RAX: 0000000000000026 RBX: ffff9488208be618 RCX: 0000000000000000
> [318401.530384] RDX: ffff948f1e295e01 RSI: ffff948f1e28dd58 RDI: ffff948f1e28dd58
> [318401.553548] RBP: ffffb18a091efd50 R08: 0003dc12ea8bcc57 R09: ffff948f1f50b868
> [318401.576127] R10: ffff948b1f1cc460 R11: ffffffffaef37285 R12: 00000000ffffffef
> [318401.598717] R13: ffffffffffffffff R14: ffff948edb7efd48 R15: ffff948cdbdeb000
> [318401.621327] FS: 0000000000000000(0000) GS:ffff948f1e280000(0000) knlGS:0000000000000000
> [318401.646737] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [318401.665149] CR2: 00000000f7f05001 CR3: 000000061f587000 CR4: 00000000001406e0
> [318401.687684] Call Trace:
> [318401.696148] btrfs_write_dirty_block_groups+0xf5/0x2c1
> [318401.712745] ? btrfs_run_delayed_refs+0x127/0x1a1
> [318401.727981] commit_cowonly_roots+0x1da/0x273
> [318401.742183] btrfs_commit_transaction+0x3dd/0x761
> [318401.757447] transaction_kthread+0xe2/0x178
> [318401.771158] ? btrfs_cleanup_transaction+0x3c2/0x3c2
> [318401.787169] kthread+0xfb/0x100
> [318401.797769] ? init_completion+0x24/0x24
> [318401.810718] ret_from_fork+0x25/0x30
> [318401.822588] Code: 85 c0 41 89 c4 79 60 48 8b 43 60 f0 0f ba a8 d8 16 00 00 02 72 35 41 83 fc fb 74 13 44 89 e6 48 c7 c7 27 3f af ae e8 81 5d e1 ff <0f> ff eb 1c f6 05 2a da ab 00 04 74 13 48 8b 7b 60 44 89 e2 48
> [318401.881182] ---[ end trace 47464f1fcc4796c5 ]---
> [318401.896818] BTRFS: error (device dm-2) in btrfs_run_delayed_refs:3015: errno=-17 Object already exists
> [318401.925978] BTRFS info (device dm-2): forced readonly
> [318401.950682] BTRFS warning (device dm-2): Skipping commit of aborted transaction.
> [318401.974102] BTRFS: error (device dm-2) in cleanup_transaction:1873: errno=-17 Object already exists
>
> Marc
> --
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
> .... what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
2017-09-09 22:56 ` Josef Bacik
@ 2017-09-10 2:36 ` Marc MERLIN
2017-09-10 3:12 ` Josef Bacik
0 siblings, 1 reply; 47+ messages in thread
From: Marc MERLIN @ 2017-09-10 2:36 UTC (permalink / raw)
To: Josef Bacik
Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
David Sterba
On Sat, Sep 09, 2017 at 10:56:14PM +0000, Josef Bacik wrote:
> Well that's odd, a block allocated on disk is in the free space cache. Can I see the full output of the fsck? I want to make sure it's actually getting to the part where it checks the free space cache. If it does then I'll have to think of how to catch this kind of bug, because you've got a weird one. Thanks,
Well, btrfs check was clean before that, but now it returned this:
gargamel:~# time btrfs check /dev/mapper/dshelf1
Checking filesystem on /dev/mapper/dshelf1
UUID: 36f5079e-ca6c-4855-8639-ccb82695c18d
checking extents
checking free space cache
Wanted bytes 16384, found 196608 for off 13282417049600
Wanted bytes 536870912, found 196608 for off 13282417049600
cache appears valid but isn't 13282417049600
There is no free space entry for 13849889603584-13849889652736
There is no free space entry for 13849889603584-13850426474496
cache appears valid but isn't 13849889603584
Wanted bytes 5832704, found 81920 for off 13870290698240
Wanted bytes 536870912, found 81920 for off 13870290698240
cache appears valid but isn't 13870290698240
block group 13928272756736 has wrong amount of free space
failed to load free space cache for block group 13928272756736
Duplicate entries in free space cache
failed to load free space cache for block group 13962095624192
block group 14003434684416 has wrong amount of free space
failed to load free space cache for block group 14003434684416
block group 14470042615808 has wrong amount of free space
failed to load free space cache for block group 14470042615808
block group 14610702794752 has wrong amount of free space
failed to load free space cache for block group 14610702794752
block group 14612313407488 has wrong amount of free space
failed to load free space cache for block group 14612313407488
block group 14624661438464 has wrong amount of free space
failed to load free space cache for block group 14624661438464
block group 14648820629504 has wrong amount of free space
failed to load free space cache for block group 14648820629504
Wanted offset 14657410793472, found 14657410760704
Wanted offset 14657410793472, found 14657410760704
cache appears valid but isn't 14657410564096
block group 15886844952576 has wrong amount of free space
failed to load free space cache for block group 15886844952576
There is no free space entry for 15905635434496-15905636499456
There is no free space entry for 15905635434496-15906172305408
cache appears valid but isn't 15905635434496
block group 16542901207040 has wrong amount of free space
failed to load free space cache for block group 16542901207040
block group 16581019041792 has wrong amount of free space
failed to load free space cache for block group 16581019041792
block group 16616989392896 has wrong amount of free space
failed to load free space cache for block group 16616989392896
block group 16676582064128 has wrong amount of free space
failed to load free space cache for block group 16676582064128
block group 16697520029696 has wrong amount of free space
failed to load free space cache for block group 16697520029696
block group 16848380755968 has wrong amount of free space
failed to load free space cache for block group 16848380755968
ERROR: errors found in free space cache
found 11732749766656 bytes used, error(s) found
total csum bytes: 11441478452
total tree bytes: 13793296384
total fs tree bytes: 727580672
total extent tree bytes: 483426304
btree space waste bytes: 1194373662
file data blocks allocated: 12133646495744
referenced 12155707805696
real 100m12.252s
user 0m33.771s
sys 1m11.220s
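The first complaint in this output lines up with the block entry dumped in the earlier ref_verify trace ("Dumping block entry [13282417065984 16384]"). A rough sketch of that arithmetic, assuming "Wanted bytes" is the free length the checker expected at that offset and "found" is the length recorded in the cache (that reading of the message is an assumption, and the helper names are illustrative):

```python
def range_end(start: int, length: int) -> int:
    """Exclusive end of a byte range."""
    return start + length


def contains(outer_start: int, outer_len: int,
             inner_start: int, inner_len: int) -> bool:
    """True if the inner byte range lies entirely inside the outer one."""
    return (outer_start <= inner_start
            and range_end(inner_start, inner_len) <= range_end(outer_start, outer_len))


# From btrfs check: "Wanted bytes 16384, found 196608 for off 13282417049600"
CACHE_OFF = 13282417049600
WANTED_LEN = 16384   # expected free length (assumed to come from the extent tree)
CACHE_LEN = 196608   # length found (assumed to come from the free space cache)

# From the ref_verify dump: "Dumping block entry [13282417065984 16384]"
BLOCK_START = 13282417065984
BLOCK_LEN = 16384

# The block starts exactly where the expected free range ends.
print(range_end(CACHE_OFF, WANTED_LEN) == BLOCK_START)          # True
# The longer cached range covers the block.
print(contains(CACHE_OFF, CACHE_LEN, BLOCK_START, BLOCK_LEN))   # True
# The expected free range does not.
print(contains(CACHE_OFF, WANTED_LEN, BLOCK_START, BLOCK_LEN))  # False
```

Under that reading, the cache advertised as free a range that includes a metadata block still in use, which would fit the "re-allocated a block that still has references to it!" message.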
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
2017-09-10 2:36 ` Marc MERLIN
@ 2017-09-10 3:12 ` Josef Bacik
2017-09-10 13:14 ` Marc MERLIN
0 siblings, 1 reply; 47+ messages in thread
From: Josef Bacik @ 2017-09-10 3:12 UTC (permalink / raw)
To: Marc MERLIN
Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
David Sterba
Ok mount -o clear_cache, umount and run fsck again just to make sure. Then if it comes out clean mount with ref_verify again and wait for it to blow up again. Thanks,
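The sequence above can be written out as a dry-run plan. This is only a sketch: it builds the command lines and prints them, and runs nothing. The device path is the one used with btrfs check in this thread; the mount point /mnt/pool is a hypothetical placeholder, and ref_verify is assumed to be available as a mount option only on a kernel built with the ref-verify support being tested here.

```python
import shlex
from typing import List


def recheck_plan(device: str, mountpoint: str) -> List[List[str]]:
    """Return the commands for the clear-cache / fsck / ref_verify cycle, in order."""
    return [
        # 1. Mount once with clear_cache so the free space cache is rebuilt.
        ["mount", "-o", "clear_cache", device, mountpoint],
        # 2. Unmount so the check runs on an offline filesystem.
        ["umount", mountpoint],
        # 3. Run the read-only check again.
        ["btrfs", "check", device],
        # 4. Only if step 3 is clean: remount with ref_verify and wait.
        ["mount", "-o", "ref_verify", device, mountpoint],
    ]


if __name__ == "__main__":
    # /mnt/pool is a made-up mount point for illustration.
    for argv in recheck_plan("/dev/mapper/dshelf1", "/mnt/pool"):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Keeping it as a printed plan means step 4 stays a manual decision after reading the check output.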
Josef
Sent from my iPhone
> On Sep 9, 2017, at 10:37 PM, Marc MERLIN <marc@merlins.org> wrote:
>
>> On Sat, Sep 09, 2017 at 10:56:14PM +0000, Josef Bacik wrote:
>> Well that's odd, a block allocated on disk is in the free space cache. Can I see the full output of the fsck? I want to make sure it's actually getting to the part where it checks the free space cache. If it does then I'll have to think of how to catch this kind of bug, because you've got a weird one. Thanks,
>
> Well, btrfs check was clean before that, but now it returned this:
> gargamel:~# time btrfs check /dev/mapper/dshelf1
> Checking filesystem on /dev/mapper/dshelf1
> UUID: 36f5079e-ca6c-4855-8639-ccb82695c18d
> checking extents
> checking free space cache
> Wanted bytes 16384, found 196608 for off 13282417049600
> Wanted bytes 536870912, found 196608 for off 13282417049600
> cache appears valid but isn't 13282417049600
> There is no free space entry for 13849889603584-13849889652736
> There is no free space entry for 13849889603584-13850426474496
> cache appears valid but isn't 13849889603584
> Wanted bytes 5832704, found 81920 for off 13870290698240
> Wanted bytes 536870912, found 81920 for off 13870290698240
> cache appears valid but isn't 13870290698240
> block group 13928272756736 has wrong amount of free space
> failed to load free space cache for block group 13928272756736
> Duplicate entries in free space cache
> failed to load free space cache for block group 13962095624192
> block group 14003434684416 has wrong amount of free space
> failed to load free space cache for block group 14003434684416
> block group 14470042615808 has wrong amount of free space
> failed to load free space cache for block group 14470042615808
> block group 14610702794752 has wrong amount of free space
> failed to load free space cache for block group 14610702794752
> block group 14612313407488 has wrong amount of free space
> failed to load free space cache for block group 14612313407488
> block group 14624661438464 has wrong amount of free space
> failed to load free space cache for block group 14624661438464
> block group 14648820629504 has wrong amount of free space
> failed to load free space cache for block group 14648820629504
> Wanted offset 14657410793472, found 14657410760704
> Wanted offset 14657410793472, found 14657410760704
> cache appears valid but isn't 14657410564096
> block group 15886844952576 has wrong amount of free space
> failed to load free space cache for block group 15886844952576
> There is no free space entry for 15905635434496-15905636499456
> There is no free space entry for 15905635434496-15906172305408
> cache appears valid but isn't 15905635434496
> block group 16542901207040 has wrong amount of free space
> failed to load free space cache for block group 16542901207040
> block group 16581019041792 has wrong amount of free space
> failed to load free space cache for block group 16581019041792
> block group 16616989392896 has wrong amount of free space
> failed to load free space cache for block group 16616989392896
> block group 16676582064128 has wrong amount of free space
> failed to load free space cache for block group 16676582064128
> block group 16697520029696 has wrong amount of free space
> failed to load free space cache for block group 16697520029696
> block group 16848380755968 has wrong amount of free space
> failed to load free space cache for block group 16848380755968
> ERROR: errors found in free space cache
> found 11732749766656 bytes used, error(s) found
> total csum bytes: 11441478452
> total tree bytes: 13793296384
> total fs tree bytes: 727580672
> total extent tree bytes: 483426304
> btree space waste bytes: 1194373662
> file data blocks allocated: 12133646495744
> referenced 12155707805696
>
> real 100m12.252s
> user 0m33.771s
> sys 1m11.220s
>
> Marc
> --
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
> .... what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
2017-09-10 3:12 ` Josef Bacik
@ 2017-09-10 13:14 ` Marc MERLIN
2017-09-10 13:16 ` Josef Bacik
0 siblings, 1 reply; 47+ messages in thread
From: Marc MERLIN @ 2017-09-10 13:14 UTC (permalink / raw)
To: Josef Bacik
Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
David Sterba
On Sun, Sep 10, 2017 at 03:12:16AM +0000, Josef Bacik wrote:
> Ok mount -o clear_cache, umount and run fsck again just to make sure. Then if it comes out clean mount with ref_verify again and wait for it to blow up again. Thanks,
Ok, just did the 2nd fsck, came back clean after mount -o clear_cache
I'll re-trigger the exact same bug and repeat the whole cycle then.
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
2017-09-10 13:14 ` Marc MERLIN
@ 2017-09-10 13:16 ` Josef Bacik
2017-09-11 0:22 ` Marc MERLIN
0 siblings, 1 reply; 47+ messages in thread
From: Josef Bacik @ 2017-09-10 13:16 UTC (permalink / raw)
To: Marc MERLIN
Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
David Sterba
Great, if the free space cache is fucked again after the next go around then I need to expand the verifier to watch entries being added to the cache as well. Thanks,
Josef
Sent from my iPhone
> On Sep 10, 2017, at 9:14 AM, Marc MERLIN <marc@merlins.org> wrote:
>
>> On Sun, Sep 10, 2017 at 03:12:16AM +0000, Josef Bacik wrote:
>> Ok mount -o clear_cache, umount and run fsck again just to make sure. Then if it comes out clean mount with ref_verify again and wait for it to blow up again. Thanks,
>
> Ok, just did the 2nd fsck, came back clean after mount -o clear_cache
>
> I'll re-trigger the exact same bug and repeat the whole cycle then.
>
> Marc
> --
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
> .... what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
2017-09-10 13:16 ` Josef Bacik
@ 2017-09-11 0:22 ` Marc MERLIN
2017-09-27 18:01 ` Marc MERLIN
0 siblings, 1 reply; 47+ messages in thread
From: Marc MERLIN @ 2017-09-11 0:22 UTC (permalink / raw)
To: Josef Bacik
Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
David Sterba
On Sun, Sep 10, 2017 at 01:16:26PM +0000, Josef Bacik wrote:
> Great, if the free space cache is fucked again after the next go
> around then I need to expand the verifier to watch entries being added
> to the cache as well. Thanks,
Well, I copied about 1TB of data, and nothing happened.
So it seems clearing it and fsck may have fixed this fault I had been
carrying for quite a while.
If so, yeah!
I'm not sure if this needs a kernel fix to not get triggered and if
btrfs check should also be improved to catch this, but hopefully you
know what makes sense there.
Thanks,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
2017-09-11 0:22 ` Marc MERLIN
@ 2017-09-27 18:01 ` Marc MERLIN
0 siblings, 0 replies; 47+ messages in thread
From: Marc MERLIN @ 2017-09-27 18:01 UTC (permalink / raw)
To: Josef Bacik
Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
David Sterba
On Sun, Sep 10, 2017 at 05:22:14PM -0700, Marc MERLIN wrote:
> On Sun, Sep 10, 2017 at 01:16:26PM +0000, Josef Bacik wrote:
> > Great, if the free space cache is fucked again after the next go
> > around then I need to expand the verifier to watch entries being added
> > to the cache as well. Thanks,
>
> Well, I copied about 1TB of data, and nothing happened.
> So it seems clearing it and fsck may have fixed this fault I had been
> carrying for quite a while.
> If so, yeah!
>
> I'm not sure if this needs a kernel fix to not get triggered and if
> btrfs check should also be improved to catch this, but hopefully you
> know what makes sense there.
Just to report back, it's now been another 2 weeks, and no problem.
Seems that forcing the clear cache was actually the fix. Not sure if
the kernel should have found/detected/auto fixed the problem or if btrfs
check should have.
Either way, thanks for your help.
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
end of thread, other threads:[~2017-09-27 18:02 UTC | newest]
Thread overview: 47+ messages
2017-07-11 6:21 BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists Marc MERLIN
2017-07-11 16:00 ` Chris Murphy
2017-07-11 16:48 ` Marc MERLIN
2017-07-11 22:43 ` Chris Murphy
2017-07-11 23:04 ` Marc MERLIN
2017-07-13 1:10 ` Marc MERLIN
2017-07-13 18:17 ` Chris Murphy
2017-07-15 0:48 ` Marc MERLIN
2017-07-15 1:22 ` BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012) Marc MERLIN
2017-07-15 23:12 ` Marc MERLIN
2017-07-16 14:01 ` Giuseppe Della Bianca
2017-07-16 16:06 ` Marc MERLIN
2017-07-17 11:05 ` gius db
2017-08-29 3:16 ` Marc MERLIN
2017-08-29 14:30 ` Josef Bacik
2017-08-29 14:39 ` Marc MERLIN
2017-08-29 14:43 ` Josef Bacik
2017-08-29 18:22 ` Josef Bacik
2017-08-30 3:40 ` Marc MERLIN
2017-08-31 14:52 ` Josef Bacik
2017-08-31 17:36 ` Marc MERLIN
2017-08-31 17:48 ` Josef Bacik
2017-09-01 20:43 ` Marc MERLIN
2017-09-01 23:01 ` Josef Bacik
2017-09-02 16:09 ` Marc MERLIN
2017-09-02 16:52 ` Josef Bacik
[not found] ` <CAHKv19A=OVgCpQpDL2454T+f8QgLm9iynA8xZ4w4Kg8JjYS=UA@mail.gmail.com>
2017-09-02 18:55 ` Fwd: " George Joseph
2017-09-02 23:53 ` Marc MERLIN
2017-09-03 0:30 ` Josef Bacik
2017-09-03 1:01 ` Marc MERLIN
2017-09-03 3:26 ` Josef Bacik
2017-09-03 14:31 ` Marc MERLIN
2017-09-03 14:38 ` Josef Bacik
2017-09-03 14:42 ` Marc MERLIN
2017-09-03 14:55 ` Josef Bacik
2017-09-03 17:33 ` Josef Bacik
2017-09-03 20:20 ` Marc MERLIN
2017-09-04 0:55 ` Josef Bacik
2017-09-05 18:19 ` Josef Bacik
2017-09-09 18:39 ` Marc MERLIN
2017-09-09 22:56 ` Josef Bacik
2017-09-10 2:36 ` Marc MERLIN
2017-09-10 3:12 ` Josef Bacik
2017-09-10 13:14 ` Marc MERLIN
2017-09-10 13:16 ` Josef Bacik
2017-09-11 0:22 ` Marc MERLIN
2017-09-27 18:01 ` Marc MERLIN