* BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists
@ 2017-07-11  6:21 Marc MERLIN
  2017-07-11 16:00 ` Chris Murphy
  2017-07-15  1:22 ` BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012) Marc MERLIN
  0 siblings, 2 replies; 47+ messages in thread
From: Marc MERLIN @ 2017-07-11  6:21 UTC (permalink / raw)
  To: linux-btrfs

Looks like btrfs has decided to give me hell.
I'm still recovering my system.
The biggest filesystem seems to work, but I just had it go read only:

------------[ cut here ]------------
WARNING: CPU: 5 PID: 3734 at fs/btrfs/extent-tree.c:2960 btrfs_run_delayed_refs+0xb6/0x1dc
BTRFS: Transaction aborted (error -17)
Modules linked in: udp_diag tcp_diag inet_diag veth ip6table_filter ip6_tables ebtable_nat ebtables ppdev lp xt_addrtype br_
fmt_misc ftdi_sio nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ipt_REJECT nf_reject_ipv4 xt_conntrack xt_mark xt_
ptable_mangle iptable_filter pl2303 dm_snapshot dm_bufio iptable_nat ip_tables nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 
e_ipv4 nf_nat nf_conntrack x_tables sg st snd_pcm_oss snd_mixer_oss bcache kvm_intel kvm irqbypass snd_hda_codec_realtek snd
da_codec snd_cmipci rc_ati_x10 asus_wmi snd_hda_core snd_mpu401_uart snd_opl3_lib snd_hwdep snd_rawmidi snd_seq_device spars
l tpm_infineon snd tpm_tis hwmon tpm_tis_core usbnet rc_core i2c_i801 usbserial libphy soundcore wmi i915 lpc_ich mfd_cor
s evdev pcspkr parport_pc battery mei_me parport i2c_smbus e1000e ptp pps_core fuse raid456 multipath mmc_block mmc_core dm_
r async_memcpy async_tx crc32c_intel blowfish_x86_64 blowfish_common aesni_intel aes_x86_64 lrw glue_helper ablk_helper cryp
4 thermal usbcore mvsas libsas fjes scsi_transport_sas fan r8169 mii usb_common [last unloaded: ftdi_sio]
CPU: 1 PID: 3734 Comm: btrfs-transacti Tainted: G     U  W       4.8.10-amd64-preempt-sysrq-20161121vb3tj1 #12
Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
 0000000000200286 000000003f87d529 ffff9dcc9838fd00 ffffffffbb39e738
 ffff9dcc9838fd50 0000000000000000 ffff9dcc9838fd40 ffffffffbb066e08
 00000b909838fdc0 ffff9dc94fdc9be0 0000000000000000 ffff9dcca0d93000
Call Trace:
 [<ffffffffbb39e738>] dump_stack+0x63/0x7f
 [<ffffffffbb066e08>] __warn+0xc2/0xdd
 [<ffffffffbb066e7d>] warn_slowpath_fmt+0x5a/0x76
 [<ffffffffbb291dc2>] btrfs_run_delayed_refs+0xb6/0x1dc
 [<ffffffffbb2a4d1d>] btrfs_commit_transaction+0x5b/0x965
 [<ffffffffbb2a030e>] transaction_kthread+0xf5/0x19f
 [<ffffffffbb2a0219>] ? btrfs_cleanup_transaction+0x47b/0x47b
 [<ffffffffbb081df3>] kthread+0xb4/0xbc
 [<ffffffffbb6d23df>] ret_from_fork+0x1f/0x40
 [<ffffffffbb081d3f>] ? init_completion+0x24/0x24
---[ end trace feb4b95c83ac065f ]---
BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists
BTRFS info (device dm-2): forced readonly

Yes, I'm back with 4.8 since I need to get back to a working state,
however this may be a totally unrelated bug that has been fixed since
4.8?

The filesystem seems fine though:
enabling repair mode
Checking filesystem on /dev/mapper/dshelf1
UUID: 36f5079e-ca6c-4855-8639-ccb82695c18d
checking extents
Fixed 0 roots.
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
checking csums
checking root refs
found 11452211699712 bytes used, no error found
total csum bytes: 11167908392
total tree bytes: 13463715840
total fs tree bytes: 712867840
total extent tree bytes: 478281728
btree space waste bytes: 1159679826
file data blocks allocated: 11888008564736
 referenced 11908268208128

So I'm going to remount it read-write, but can someone explain the failure above?

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901


* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists
  2017-07-11  6:21 BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists Marc MERLIN
@ 2017-07-11 16:00 ` Chris Murphy
  2017-07-11 16:48   ` Marc MERLIN
  2017-07-15  1:22 ` BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012) Marc MERLIN
  1 sibling, 1 reply; 47+ messages in thread
From: Chris Murphy @ 2017-07-11 16:00 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: Btrfs BTRFS

On Tue, Jul 11, 2017 at 12:21 AM, Marc MERLIN <marc@merlins.org> wrote:
> Looks like btrfs has decided to give me hell.
> I'm still recovering my system.
> The biggest filesystem seems to work, but I just had it go read only:
>
> ------------[ cut here ]------------
> WARNING: CPU: 5 PID: 3734 at fs/btrfs/extent-tree.c:2960 btrfs_run_delayed_refs+0xb6/0x1dc
> BTRFS: Transaction aborted (error -17)
> Modules linked in: udp_diag tcp_diag inet_diag veth ip6table_filter ip6_tables ebtable_nat ebtables ppdev lp xt_addrtype br_
> fmt_misc ftdi_sio nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ipt_REJECT nf_reject_ipv4 xt_conntrack xt_mark xt_
> ptable_mangle iptable_filter pl2303 dm_snapshot dm_bufio iptable_nat ip_tables nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
> e_ipv4 nf_nat nf_conntrack x_tables sg st snd_pcm_oss snd_mixer_oss bcache kvm_intel kvm irqbypass snd_hda_codec_realtek snd
> da_codec snd_cmipci rc_ati_x10 asus_wmi snd_hda_core snd_mpu401_uart snd_opl3_lib snd_hwdep snd_rawmidi snd_seq_device spars
> l tpm_infineon snd tpm_tis hwmon tpm_tis_core usbnet rc_core i2c_i801 usbserial libphy soundcore wmi i915 lpc_ich mfd_cor
> s evdev pcspkr parport_pc battery mei_me parport i2c_smbus e1000e ptp pps_core fuse raid456 multipath mmc_block mmc_core dm_
> r async_memcpy async_tx crc32c_intel blowfish_x86_64 blowfish_common aesni_intel aes_x86_64 lrw glue_helper ablk_helper cryp
> 4 thermal usbcore mvsas libsas fjes scsi_transport_sas fan r8169 mii usb_common [last unloaded: ftdi_sio]
> CPU: 1 PID: 3734 Comm: btrfs-transacti Tainted: G     U  W       4.8.10-amd64-preempt-sysrq-20161121vb3tj1 #12
> Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
>  0000000000200286 000000003f87d529 ffff9dcc9838fd00 ffffffffbb39e738
>  ffff9dcc9838fd50 0000000000000000 ffff9dcc9838fd40 ffffffffbb066e08
>  00000b909838fdc0 ffff9dc94fdc9be0 0000000000000000 ffff9dcca0d93000
> Call Trace:
>  [<ffffffffbb39e738>] dump_stack+0x63/0x7f
>  [<ffffffffbb066e08>] __warn+0xc2/0xdd
>  [<ffffffffbb066e7d>] warn_slowpath_fmt+0x5a/0x76
>  [<ffffffffbb291dc2>] btrfs_run_delayed_refs+0xb6/0x1dc
>  [<ffffffffbb2a4d1d>] btrfs_commit_transaction+0x5b/0x965
>  [<ffffffffbb2a030e>] transaction_kthread+0xf5/0x19f
>  [<ffffffffbb2a0219>] ? btrfs_cleanup_transaction+0x47b/0x47b
>  [<ffffffffbb081df3>] kthread+0xb4/0xbc
>  [<ffffffffbb6d23df>] ret_from_fork+0x1f/0x40
>  [<ffffffffbb081d3f>] ? init_completion+0x24/0x24
> ---[ end trace feb4b95c83ac065f ]---
> BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists
> BTRFS info (device dm-2): forced readonly


You've already had this same traceback, not sure whether it's the same
file system or not, but it was 4.7.2 kernel.


> Yes, I'm back with 4.8 since I need to get back to a working state,
> however this may be a totally unrelated bug that has been fixed since
> 4.8?

Probably fixed in 4.9, no idea when. I would just use the most recent
4.9 kernel you can get or build. Less chance of regressions in
longterm, greater chance of bug fixes. Same for 4.4.


-- 
Chris Murphy


* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists
  2017-07-11 16:00 ` Chris Murphy
@ 2017-07-11 16:48   ` Marc MERLIN
  2017-07-11 22:43     ` Chris Murphy
  2017-07-13  1:10     ` Marc MERLIN
  0 siblings, 2 replies; 47+ messages in thread
From: Marc MERLIN @ 2017-07-11 16:48 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

On Tue, Jul 11, 2017 at 10:00:40AM -0600, Chris Murphy wrote:
> > ---[ end trace feb4b95c83ac065f ]---
> > BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists
> > BTRFS info (device dm-2): forced readonly
> 
> You've already had this same traceback, not sure whether it's the same
> file system or not, but it was 4.7.2 kernel.
 
You have better memory than me. I'll admit that I'm kind of overwhelmed
by all the time I'm currently spending/wasting on btrfs recovery and
that came almost out of nowhere and hit me in 3 different places :-/
 
> Probably fixed in 4.9, no idea when. I would just use the most recent
> 4.9 kernel you can get or build. Less chance of regressions in
> longterm, greater chance of bug fixes. Same for 4.4.

Fair suggestion. I jumped from 4.8 to 4.11. I'll build a 4.9 then.

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901


* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists
  2017-07-11 16:48   ` Marc MERLIN
@ 2017-07-11 22:43     ` Chris Murphy
  2017-07-11 23:04       ` Marc MERLIN
  2017-07-13  1:10     ` Marc MERLIN
  1 sibling, 1 reply; 47+ messages in thread
From: Chris Murphy @ 2017-07-11 22:43 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: Chris Murphy, Btrfs BTRFS

On Tue, Jul 11, 2017 at 10:48 AM, Marc MERLIN <marc@merlins.org> wrote:
> On Tue, Jul 11, 2017 at 10:00:40AM -0600, Chris Murphy wrote:
>> > ---[ end trace feb4b95c83ac065f ]---
>> > BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists
>> > BTRFS info (device dm-2): forced readonly
>>
>> You've already had this same traceback, not sure whether it's the same
>> file system or not, but it was 4.7.2 kernel.
>
> You have better memory than me. I'll admit that I'm kind of overwhelmed
> by all the time I'm currently spending/wasting on btrfs recovery and
> that came almost out of nowhere and hit me in 3 different places :-/
>
>> Probably fixed in 4.9, no idea when. I would just use the most recent
>> 4.9 kernel you can get or build. Less chance of regressions in
>> longterm, greater chance of bug fixes. Same for 4.4.
>
> Fair suggestion. I jumped from 4.8 to 4.11. I'll build a 4.9 then.

Assuming it works, settle on 4.9 until 4.14 shakes out a bit. Given
your setup and the penalty for even small problems, it's probably
better to go low risk and that means longterm kernels. Maybe one of
the three systems can use a newer kernel just to make sure your
regressions, if any, are contained, but otherwise avoid the all eggs in
one basket approach.

Another option is cutting down the size of the array and going with a
gluster or ceph approach so the rebuilds aren't so hideously invasive.
You could also optionally use a different storage layout and file
system for a small subset of the bricks, either XFS on LVM RAID or
ZoL. Again, fewer eggs in one basket. But even if they're all Btrfs,
merely breaking things down makes for faster rebuilds, less downtime,
less stress. Because whether it's an unexplained regression, the never
finished fsck, a hardware bug, or a legit drive failure, you will
inevitably have brick problems. Something's always going to go wrong
eventually. Haha. Just throw more drives at the problem and have
gluster do some distributed replication so you can more easily lose
entire bricks like this.

-- 
Chris Murphy


* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists
  2017-07-11 22:43     ` Chris Murphy
@ 2017-07-11 23:04       ` Marc MERLIN
  0 siblings, 0 replies; 47+ messages in thread
From: Marc MERLIN @ 2017-07-11 23:04 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

On Tue, Jul 11, 2017 at 04:43:06PM -0600, Chris Murphy wrote:
> Assuming it works, settle on 4.9 until 4.14 shakes out a bit. Given
> your setup and the penalty for even small problems, it's probably
> better to go low risk and that means longterm kernels. Maybe one of
> the three systems can use a newer kernel just to make sure your
> regressions, if any, are contained, but otherwise avoid the all eggs in
> one basket approach.
 
That's indeed what I was considering doing.
I guess I got complacent/too trusting after btrfs had worked for me without
real problems for over a year (maybe close to 2?)

My laptop had to be upgraded to 4.11 due to a kernel issue with nvme drives
that made any kernel before that hang on S3 sleep.
But my server can be on anything, and it seems that I'm going to leave it in
4.9 for a while indeed, even if it had been happily on 4.8 for a long time
(but given this snapshot rotation bug that caused it to remount a perfectly
good filesystem, as read only, I indeed just moved it to 4.9.36)

> Another option is cutting down the size of the array and going with a
> gluster or ceph approach so the rebuilds aren't so hideously invasive.

Right, it's just personal stuff, I don't want the management to be
ridiculously high for something that ought to be simple.
I only have 2 raid5 arrays of 5 drives each (when back in the day, I
remember building a 26 drive array with SCSI SCA drives in 3 disk shelves
for a total of 2TB, woot!)
I don't really want to artificially cut that raid5 into smaller filesystems by
adding yet another layer like LVM and then concatenating several smaller btrfs
filesystems.
I know I might be a bit stubborn here, but with only 4 data drives, it should be
considered small enough, even if the drives are not super small.

> You could also optionally use a different storage layout and file
> system for a small subset of the bricks, either XFS on LVM RAID or

Yes, basically instead of having one media array and one backup array, I can
make multiple ones, and then take the penalty of moving data across them.
Been there in the past, don't really want to go back :-/
But as you said, there is no magic answer outside not having filesystems
that get corrupted so easily. I did have one flaky SAS card that did
probably slightly damage one of my arrays, but the other 2 (and the
filesystem on my laptop) don't have that hardware excuse.

Anyway, while it's not very helpful to the btrfs project, 4.9.36 indeed seems
like what's best for me for now.

Thanks for the replies.
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists
  2017-07-11 16:48   ` Marc MERLIN
  2017-07-11 22:43     ` Chris Murphy
@ 2017-07-13  1:10     ` Marc MERLIN
  2017-07-13 18:17       ` Chris Murphy
  1 sibling, 1 reply; 47+ messages in thread
From: Marc MERLIN @ 2017-07-13  1:10 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

On Tue, Jul 11, 2017 at 09:48:12AM -0700, Marc MERLIN wrote:
> On Tue, Jul 11, 2017 at 10:00:40AM -0600, Chris Murphy wrote:
> > > ---[ end trace feb4b95c83ac065f ]---
> > > BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists
> > > BTRFS info (device dm-2): forced readonly
> >
> > You've already had this same traceback, not sure whether it's the same
> > file system or not, but it was 4.7.2 kernel.
>
> You have better memory than me. I'll admit that I'm kind of overwhelmed
> by all the time I'm currently spending/wasting on btrfs recovery and
> that came almost out of nowhere and hit me in 3 different places :-/

Ok, I'm on 4.9.36 and same problem :(

This is on an otherwise ok working filesystem that comes back clean 
on btrfs check (although I haven't done lowmem but last time I tried lowmem it
reported problems that apparently weren't really problems)

Dear devs, what does this error mean exactly and what should I do about it besides
ignoring it and remounting my FS read-write?
On the plus side thanks for both
1) showing which device the error is on
2) not crashing the system :)

WARNING: CPU: 6 PID: 3730 at fs/btrfs/extent-tree.c:2967 btrfs_run_delayed_refs+0xbd/0x1be
BTRFS: Transaction aborted (error -17)
CPU: 0 PID: 3730 Comm: btrfs-cleaner Tainted: G     U  W       4.9.36-amd64-preempt-sysrq-20170

Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
 ffffb55c679bfc88 ffffffff8239b00b ffffb55c679bfcd8 0000000000000000
 ffffb55c679bfcc8 ffffffff82066769 00000b97679bfd48 ffffa07f61a5eaa0
 ffffa086f217c800 00000000ffffffef ffffa086ad8b5a90 00000000000003a0
Call Trace:
 [<ffffffff8239b00b>] dump_stack+0x61/0x7d
 [<ffffffff82066769>] __warn+0xc2/0xdd
 [<ffffffff820667de>] warn_slowpath_fmt+0x5a/0x76
 [<ffffffff8228dd5f>] btrfs_run_delayed_refs+0xbd/0x1be
 [<ffffffff8228b358>] ? walk_up_tree+0x87/0x10f
 [<ffffffff8229fd8f>] btrfs_should_end_transaction+0x54/0x5d
 [<ffffffff8228c8b5>] btrfs_drop_snapshot+0x380/0x65c
 [<ffffffff822edf7c>] ? btrfs_kill_all_delayed_nodes+0x5f/0xd7
 [<ffffffff826ecf8a>] ? _raw_spin_lock+0x15/0x17
 [<ffffffff82292130>] ? btrfs_delete_unused_bgs+0x326/0x369
 [<ffffffff822a0e29>] btrfs_clean_one_deleted_snapshot+0xce/0xdc
 [<ffffffff82298c1e>] cleaner_kthread+0xaf/0x17c
 [<ffffffff82298b6f>] ? btrfs_need_cleaner_sleep.isra.25+0x2c/0x2c
 [<ffffffff82081e94>] kthread+0xd1/0xd9
 [<ffffffff82081dc3>] ? init_completion+0x24/0x24
 [<ffffffff82003add>] ? do_fast_syscall_32+0xb7/0xfe
 [<ffffffff826ed4b5>] ret_from_fork+0x25/0x30
---[ end trace 59fd1c9a379f73bc ]---
BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2967: errno=-17 Object already exists
BTRFS info (device dm-2): forced readonly
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/


* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists
  2017-07-13  1:10     ` Marc MERLIN
@ 2017-07-13 18:17       ` Chris Murphy
  2017-07-15  0:48         ` Marc MERLIN
  0 siblings, 1 reply; 47+ messages in thread
From: Chris Murphy @ 2017-07-13 18:17 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: Chris Murphy, Btrfs BTRFS

On Wed, Jul 12, 2017 at 7:10 PM, Marc MERLIN <marc@merlins.org> wrote:
> On Tue, Jul 11, 2017 at 09:48:12AM -0700, Marc MERLIN wrote:
>> On Tue, Jul 11, 2017 at 10:00:40AM -0600, Chris Murphy wrote:
>> > > ---[ end trace feb4b95c83ac065f ]---
>> > > BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists
>> > > BTRFS info (device dm-2): forced readonly
>> >
>> > You've already had this same traceback, not sure whether it's the same
>> > file system or not, but it was 4.7.2 kernel.
>>
>> You have better memory than me. I'll admit that I'm kind of overwhelmed
>> by all the time I'm currently spending/wasting on btrfs recovery and
>> that came almost out of nowhere and hit me in 3 different places :-/
>
> Ok, I'm on 4.9.36 and same problem :(
>
> This is on an otherwise ok working filesystem that comes back clean
> on btrfs check (although I haven't done lowmem but last time I tried lowmem it
> reported problems that apparently weren't really problems)
>
> Dear devs, what does this error mean exactly and what should I do about it besides
> ignoring it and remounting my FS read-write?
> On the plus side thanks for both
> 1) showing which device the error is on
> 2) not crashing the system :)
>
> WARNING: CPU: 6 PID: 3730 at fs/btrfs/extent-tree.c:2967 btrfs_run_delayed_refs+0xbd/0x1be
> BTRFS: Transaction aborted (error -17)
> CPU: 0 PID: 3730 Comm: btrfs-cleaner Tainted: G     U  W       4.9.36-amd64-preempt-sysrq-20170
>
> Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
>  ffffb55c679bfc88 ffffffff8239b00b ffffb55c679bfcd8 0000000000000000
>  ffffb55c679bfcc8 ffffffff82066769 00000b97679bfd48 ffffa07f61a5eaa0
>  ffffa086f217c800 00000000ffffffef ffffa086ad8b5a90 00000000000003a0
> Call Trace:
>  [<ffffffff8239b00b>] dump_stack+0x61/0x7d
>  [<ffffffff82066769>] __warn+0xc2/0xdd
>  [<ffffffff820667de>] warn_slowpath_fmt+0x5a/0x76
>  [<ffffffff8228dd5f>] btrfs_run_delayed_refs+0xbd/0x1be
>  [<ffffffff8228b358>] ? walk_up_tree+0x87/0x10f
>  [<ffffffff8229fd8f>] btrfs_should_end_transaction+0x54/0x5d
>  [<ffffffff8228c8b5>] btrfs_drop_snapshot+0x380/0x65c
>  [<ffffffff822edf7c>] ? btrfs_kill_all_delayed_nodes+0x5f/0xd7
>  [<ffffffff826ecf8a>] ? _raw_spin_lock+0x15/0x17
>  [<ffffffff82292130>] ? btrfs_delete_unused_bgs+0x326/0x369
>  [<ffffffff822a0e29>] btrfs_clean_one_deleted_snapshot+0xce/0xdc
>  [<ffffffff82298c1e>] cleaner_kthread+0xaf/0x17c
>  [<ffffffff82298b6f>] ? btrfs_need_cleaner_sleep.isra.25+0x2c/0x2c
>  [<ffffffff82081e94>] kthread+0xd1/0xd9
>  [<ffffffff82081dc3>] ? init_completion+0x24/0x24
>  [<ffffffff82003add>] ? do_fast_syscall_32+0xb7/0xfe
>  [<ffffffff826ed4b5>] ret_from_fork+0x25/0x30
> ---[ end trace 59fd1c9a379f73bc ]---
> BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2967: errno=-17 Object already exists
> BTRFS info (device dm-2): forced readonly


Well I'd say it's a bug, but that's not a revelation. Is there a
snapshot being deleted in the approximate time frame for this? I see a
snapshot is being cleaned up and chunks being removed. So I wonder if
this can be avoided or intentionally triggered by manipulating
snapshot deletion coinciding with the workload? Maybe it's a race, and
that's why it hits EEXIST, and if so then it's just getting confused
and needs to start from scratch - if true then it's OK to just umount
and mount (rw) again and continue on.
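
To make the error code concrete: the -17 in the messages above is the
kernel's negated EEXIST. The Python sketch below is only an illustration
(nothing here is taken from the btrfs source, and the helper names are made
up for this example). It decodes the value and shows how -17 looks as a
32-bit word, which matches the 00000000ffffffef visible in the quoted 4.9.36
stack dump.

```python
import errno
import os


def decode_kernel_errno(value):
    """Return (symbolic name, libc message) for a negative kernel return code.

    Hypothetical helper for illustration: the kernel returns -errno, so the
    sign is flipped before looking the code up.
    """
    code = -value
    return errno.errorcode[code], os.strerror(code)


def as_u32_hex(value):
    """Render an int the way it appears as a 32-bit word in a raw stack dump."""
    return f"{value & 0xFFFFFFFF:08x}"
```

Calling decode_kernel_errno(-17) gives the name EEXIST along with the C
library's message for that code. The "Object already exists" wording in the
log is btrfs's own text for the same errno, so the two strings need not match.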

There are some changes in the code between 4.9.36 and 4.12.1 (not sure
when the change was introduced, or if it alters whether you hit this
bug)

fs/btrfs/extent-tree.c
@@ -2962,7 +2966,7 @@ again:
 	delayed_refs->run_delayed_start = find_middle(&delayed_refs->root);
 #endif
 	trans->can_flush_pending_bgs = false;
-	ret = __btrfs_run_delayed_refs(trans, root, count);
+	ret = __btrfs_run_delayed_refs(trans, fs_info, count);
 	if (ret < 0) {
 		btrfs_abort_transaction(trans, ret);
 		return ret;

Another thing I'm not certain of is if the dm-2 reference is just how
it's referring to the file system, or if it's to be taken literally as
an issue with this device. My understanding of the code is really
weak, but I think this whole trace is within Btrfs logical block
handling, in which case it wouldn't know of a problem with a
particular device. It knows that it's in the weeds, but has no idea
what golf course it's on.



-- 
Chris Murphy


* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists
  2017-07-13 18:17       ` Chris Murphy
@ 2017-07-15  0:48         ` Marc MERLIN
  0 siblings, 0 replies; 47+ messages in thread
From: Marc MERLIN @ 2017-07-15  0:48 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

On Thu, Jul 13, 2017 at 12:17:16PM -0600, Chris Murphy wrote:
> Well I'd say it's a bug, but that's not a revelation. Is there a
> snapshot being deleted in the approximate time frame for this? I see a

Yep :)
I run btrfs-snaps and it happens right around that time.
It creates a snapshot and deletes the oldest one.
There is likely a race condition if you delete one or more snapshots just
after creating one on the same subvolume, although this has worked for
about 3 years up to now.
http://marc.merlins.org/perso/btrfs/post_2014-03-21_Btrfs-Tips_-How-To-Setup-Netapp-Style-Snapshots.html
http://marc.merlins.org/linux/scripts/btrfs-snaps

Sure, I can start adding sleeps between creation and deletion, but I
haven't had to so far.
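
A minimal sketch of what "adding sleeps between creation and deletion" could
look like. This is not the btrfs-snaps script linked above. The function
names, paths and snapshot naming scheme are assumptions made for the example,
and it assumes snapshot names sort chronologically (for instance because they
embed a timestamp). Only the two btrfs subcommands, `btrfs subvolume snapshot
-r` and `btrfs subvolume delete`, are real. Whether a pause actually avoids
the abort is untested.

```python
import time


def rotation_commands(subvol, snap_dir, new_name, existing, keep):
    """Build the argv lists for one rotation: one create, zero or more deletes.

    Assumption: snapshot names sort oldest-first (e.g. timestamped names).
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    create = ["btrfs", "subvolume", "snapshot", "-r", subvol,
              f"{snap_dir}/{new_name}"]
    snaps = sorted(list(existing) + [new_name])
    expired = snaps[:-keep]  # everything except the newest `keep` snapshots
    deletes = [["btrfs", "subvolume", "delete", f"{snap_dir}/{name}"]
               for name in expired]
    return create, deletes


def rotate(create, deletes, pause_s, run, sleep=time.sleep):
    """Run the create, pause, then run the deletes.

    `run` is any callable taking an argv list, so the ordering can be
    checked without touching a real filesystem.
    """
    run(create)
    if deletes:
        sleep(pause_s)  # gap between snapshot creation and deletion
    for cmd in deletes:
        run(cmd)
```

On a real system `run` could be something like
`lambda argv: subprocess.run(argv, check=True)`. Passing a list's `append`
instead gives a dry run that only records what would be executed.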

> snapshot is being cleaned up and chunks being removed. So I wonder if
> this can be avoided or intentionally triggered by manipulating
> snapshot deletion coinciding with the workload? Maybe it's a race, and
> that's why it hits EEXIST, and if so then it's just getting confused
> and needs to start from scratch - if true then it's OK to just umount
> and mount (rw) again and continue on.
 
which is what I've been doing.

> There are some changes in the code between 4.9.36 and 4.12.1 (not sure
> when the change was introduced, or if it alters whether you hit this
> bug)

I don't think I hit the bug with 4.11 or 4.12, but I didn't stay on them
long enough to know for sure. With the corruption issues I had (which I'm
still not sure were due to the kernel or to other factors), I've rolled
back as discussed earlier.

On my biggest system, I'm still debugging an issue where 3 of my 8 drives
get pseudo-randomly kicked out after returning corrupted data for a few
seconds. I'm pretty sure it's not an issue with the drives, but I'm not
sure if it's the disk carrier/enclosure, cables, or actual ports on the
SAS card (working through the option matrix to find out).

> Another thing I'm not certain of is if the dm-2 reference is just how
> it's referring to the file system, or if it's to be taken literally as
> an issue with this device. My understanding of the code is really
> weak, but I think this whole trace is within Btrfs logical block
> handling, in which case it wouldn't know of a problem with a
> particular device. It knows that it's in the weeds, but has no idea
> what golf course it's on.

dm-2 is correct, it does refer to the correct device.

gargamel:~# dmsetup status -v dshelf1
Name:              dshelf1
State:             ACTIVE
Read Ahead:        8192
Tables present:    LIVE
Open count:        1
Event number:      1
Major, minor:      253, 2
Number of targets: 1
UUID: CRYPT-LUKS1-3cd9bbafa2bb44a587a658a77487ee73-dshelf1_unformatted
0 46883102704 crypt 
gargamel:~# l /dev/mapper/dshelf1 /dev/dm-2 
brw-rw---- 1 root disk 253, 2 Jul 14 06:30 /dev/dm-2
lrwxrwxrwx 1 root root      7 Jul 14 06:30 /dev/mapper/dshelf1 -> ../dm-2
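
The mapping shown above can also be derived mechanically. The sketch below
parses the "Major, minor" line from `dmsetup status -v` output and builds the
dm-N name from the minor number, on the understanding that the kernel names
device-mapper nodes dm-<minor>. The function name is made up for this
example, and the major number (253 here) is assigned dynamically, so it
should not be hard-coded.

```python
import re


def dm_node_from_status(status_text):
    """Return (major, minor, node_name) from `dmsetup status -v` output.

    Hypothetical helper: it relies on the 'Major, minor: X, Y' line being
    present and on device-mapper nodes being named dm-<minor>.
    """
    match = re.search(r"^Major, minor:\s*(\d+),\s*(\d+)\s*$",
                      status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Major, minor' line found")
    major, minor = int(match.group(1)), int(match.group(2))
    return major, minor, f"dm-{minor}"
```

Fed the dshelf1 output above, this yields major 253, minor 2 and the name
dm-2, the same device the BTRFS error message names.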

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901


* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
  2017-07-11  6:21 BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists Marc MERLIN
  2017-07-11 16:00 ` Chris Murphy
@ 2017-07-15  1:22 ` Marc MERLIN
  2017-07-15 23:12   ` Marc MERLIN
  1 sibling, 1 reply; 47+ messages in thread
From: Marc MERLIN @ 2017-07-15  1:22 UTC (permalink / raw)
  To: linux-btrfs, Chris Murphy, Kai Krakow, bepi, matt, mh, mkaganer,
	david, tch, somethingsome2000
  Cc: Chris Mason, bo.li.liu, fdmanana, Josef Bacik, David Sterba

Dear Chris and other developers,

Can you look at this bug, which has been happening since 2012 on apparently all kernels between at least
3.4 and 4.11?
I didn't look in detail at each thread (it took long enough to even find them all and paste them here), but they
seem pretty similar, although how they got there may differ, and the cause may not be as benign as a race condition
between snapshot creation and deletion for those who do hourly snapshot rotations like me.

On the plus side, it looks like ever since 3.4 the code was already
smart enough not to crash you and just remounted the device read only.

On Mon, Jul 10, 2017 at 11:21:55PM -0700, Marc MERLIN wrote:
> Looks like btrfs has decided to give me hell.
> I'm still recovering my system.
> The biggest filesystem seems to work, but I just had it go read only:
> 
> ------------[ cut here ]------------
> WARNING: CPU: 5 PID: 3734 at fs/btrfs/extent-tree.c:2960 btrfs_run_delayed_refs+0xb6/0x1dc
> BTRFS: Transaction aborted (error -17)
> Modules linked in: udp_diag tcp_diag inet_diag veth ip6table_filter ip6_tables ebtable_nat ebtables ppdev lp xt_addrtype br_
> fmt_misc ftdi_sio nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ipt_REJECT nf_reject_ipv4 xt_conntrack xt_mark xt_
> ptable_mangle iptable_filter pl2303 dm_snapshot dm_bufio iptable_nat ip_tables nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 
> e_ipv4 nf_nat nf_conntrack x_tables sg st snd_pcm_oss snd_mixer_oss bcache kvm_intel kvm irqbypass snd_hda_codec_realtek snd
> da_codec snd_cmipci rc_ati_x10 asus_wmi snd_hda_core snd_mpu401_uart snd_opl3_lib snd_hwdep snd_rawmidi snd_seq_device spars
> l tpm_infineon snd tpm_tis hwmon tpm_tis_core usbnet rc_core i2c_i801 usbserial libphy soundcore wmi i915 lpc_ich mfd_cor
> s evdev pcspkr parport_pc battery mei_me parport i2c_smbus e1000e ptp pps_core fuse raid456 multipath mmc_block mmc_core dm_
> r async_memcpy async_tx crc32c_intel blowfish_x86_64 blowfish_common aesni_intel aes_x86_64 lrw glue_helper ablk_helper cryp
> 4 thermal usbcore mvsas libsas fjes scsi_transport_sas fan r8169 mii usb_common [last unloaded: ftdi_sio]
> CPU: 1 PID: 3734 Comm: btrfs-transacti Tainted: G     U  W       4.8.10-amd64-preempt-sysrq-20161121vb3tj1 #12
> Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
>  0000000000200286 000000003f87d529 ffff9dcc9838fd00 ffffffffbb39e738
>  ffff9dcc9838fd50 0000000000000000 ffff9dcc9838fd40 ffffffffbb066e08
>  00000b909838fdc0 ffff9dc94fdc9be0 0000000000000000 ffff9dcca0d93000
> Call Trace:
>  [<ffffffffbb39e738>] dump_stack+0x63/0x7f
>  [<ffffffffbb066e08>] __warn+0xc2/0xdd
>  [<ffffffffbb066e7d>] warn_slowpath_fmt+0x5a/0x76
>  [<ffffffffbb291dc2>] btrfs_run_delayed_refs+0xb6/0x1dc
>  [<ffffffffbb2a4d1d>] btrfs_commit_transaction+0x5b/0x965
>  [<ffffffffbb2a030e>] transaction_kthread+0xf5/0x19f
>  [<ffffffffbb2a0219>] ? btrfs_cleanup_transaction+0x47b/0x47b
>  [<ffffffffbb081df3>] kthread+0xb4/0xbc
>  [<ffffffffbb6d23df>] ret_from_fork+0x1f/0x40
>  [<ffffffffbb081d3f>] ? init_completion+0x24/0x24
> ---[ end trace feb4b95c83ac065f ]---
> BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists
> BTRFS info (device dm-2): forced readonly

Ok, please try this search in gmail or whatever archive you have
"btrfs_run_delayed_refs" "BTRFS: Transaction aborted" "Object already exists" 
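
For anyone who would rather scan saved kernel logs than mail archives, here
is a small sketch that pulls the same kind of abort line out of dmesg-style
text. The regular expression is modelled only on the "BTRFS: error (device
...) in function:line: errno=..." lines quoted in this thread, so other
kernel versions may format the message differently. The function name is
made up for the example.

```python
import re

# Pattern modelled on the abort lines quoted in this thread (an assumption,
# not an exhaustive description of every kernel's message format).
ABORT_RE = re.compile(
    r"BTRFS: error \(device (?P<device>[^)]+)\) in "
    r"(?P<function>\w+):(?P<line>\d+): "
    r"errno=(?P<errno>-?\d+) (?P<message>.*)"
)


def find_btrfs_aborts(log_text):
    """Return one dict per matching error line found in log_text."""
    events = []
    for match in ABORT_RE.finditer(log_text):
        events.append({
            "device": match.group("device"),
            "function": match.group("function"),
            "line": int(match.group("line")),
            "errno": int(match.group("errno")),
            "message": match.group("message").strip(),
        })
    return events
```

Because it uses a search rather than a full-line match, timestamps such as
"[44819.903531]" or "> " quote prefixes in front of the message do not get in
the way.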
 
I had a look in the archives. I was wrong, I did have the bug with 4.11 (pasted below),
and plenty of other people have had it too, going all the way back to 3.4 (2012),
if all the reports I just found and pasted are ultimately the same problem (they may not be).
For me it happens at snapshot rotation time; others triggered it in other ways, I think.


Kai Krakow <hurikhan77@gmail.com> 2016/08/28
[4.7.2] btrfs_run_delayed_refs:2963: errno=-17 Object already exists
[44819.903435] ------------[ cut here ]------------
[44819.903443] WARNING: CPU: 3 PID: 2787 at fs/btrfs/extent-tree.c:2963 btrfs_run_delayed_refs+0x26c/0x290
[44819.903444] BTRFS: Transaction aborted (error -17)
[44819.903484] CPU: 3 PID: 2787 Comm: BrowserBlocking Tainted: P           O    4.7.2-gentoo #2
[44819.903485] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z68 Pro3, BIOS L2.16A 02/22/2013
[44819.903487]  0000000000000000 ffffffff8130af2d ffff8800b7d03d20 0000000000000000
[44819.903489]  ffffffff810865fa ffff880409374428 ffff8800b7d03d70 ffff8803bf299760
[44819.903491]  0000000000000000 00000000ffffffef ffff8803f677f000 ffffffff8108666a
[44819.903493] Call Trace:
[44819.903496]  [<ffffffff8130af2d>] ? dump_stack+0x46/0x59
[44819.903500]  [<ffffffff810865fa>] ? __warn+0xba/0xe0
[44819.903502]  [<ffffffff8108666a>] ? warn_slowpath_fmt+0x4a/0x50
[44819.903504]  [<ffffffff8121351c>] ? btrfs_run_delayed_refs+0x26c/0x290
[44819.903507]  [<ffffffff811feb1e>] ? btrfs_release_path+0xe/0x80
[44819.903509]  [<ffffffff81216afa>] ? btrfs_start_dirty_block_groups+0x2da/0x420
[44819.903511]  [<ffffffff812279f3>] ? btrfs_commit_transaction+0x143/0x990
[44819.903514]  [<ffffffff8116a2c5>] ? kmem_cache_free+0x165/0x180
[44819.903516]  [<ffffffff8124396c>] ? btrfs_wait_ordered_range+0x7c/0x110
[44819.903518]  [<ffffffff8123ecf6>] ? btrfs_sync_file+0x286/0x360
[44819.903522]  [<ffffffff811ae343>] ? do_fsync+0x33/0x60
[44819.903524]  [<ffffffff811ae57a>] ? SyS_fdatasync+0xa/0x10
[44819.903528]  [<ffffffff8162299b>] ? entry_SYSCALL_64_fastpath+0x13/0x8f
[44819.903529] ---[ end trace 6944811e170a0e57 ]---
[44819.903531] BTRFS: error (device bcache2) in btrfs_run_delayed_refs:2963: errno=-17 Object already exists
[44819.903533] BTRFS info (device bcache2): forced readonly


Me 2017/06/20
4.11.3: BTRFS critical (device dm-1): unable to add free space :-17 => btrfs check --repair runs clean
[846332.977964] ------------[ cut here ]------------
[846332.992285] WARNING: CPU: 4 PID: 4095 at fs/btrfs/free-space-cache.c:1476 tree_insert_offset+0x78/0xb1
[846333.402648] CPU: 4 PID: 4095 Comm: btrfs-transacti Tainted: G     U          4.11.3-amd64-preempt-sysrq-20170406 #5
[846333.434917] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
[846333.463597] Call Trace:
[846333.469942] usb 2-1-port4: device 2-1.4 not suspended yet
[846333.489639]  dump_stack+0x61/0x7d
[846333.500480]  __warn+0xc2/0xdd
[846333.510956]  warn_slowpath_null+0x1d/0x1f
[846333.524103]  tree_insert_offset+0x78/0xb1
[846333.537337]  link_free_space+0x2c/0x41
[846333.549991]  __btrfs_add_free_space+0x89/0x3aa
[846333.564236]  ? kmem_cache_free+0x3d/0x92
[846333.577702]  btrfs_add_free_space+0x1d/0x1f
[846333.591179]  unpin_extent_range+0xf3/0x2b0
[846333.605220]  btrfs_finish_extent_commit+0xda/0x1d4
[846333.621324]  btrfs_commit_transaction+0x629/0x79a
[846333.637205]  ? add_wait_queue+0x44/0x44
[846333.649680]  transaction_kthread+0xe2/0x178
[846333.663201]  ? btrfs_cleanup_transaction+0x3e8/0x3e8
[846333.679033]  kthread+0xfb/0x100
[846333.690261]  ? init_completion+0x24/0x24
[846333.703239]  ? do_fast_syscall_32+0xb7/0xfe
[846333.717649]  ret_from_fork+0x2c/0x40
[846333.729656] ---[ end trace 27aa532d1886e536 ]---
[846333.744721] BTRFS critical (device dm-1): unable to add free space :-17

[847312.529660] BTRFS: Transaction aborted (error -17)
[847312.912784] CPU: 6 PID: 4094 Comm: btrfs-cleaner Tainted: G     U  W       4.11.3-amd64-preempt-sysrq-20170406 #5
[847312.913132] usb 2-1-port4: device 2-1.4 not suspended yet
[847312.962394] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
[847312.990936] Call Trace:
[847312.999347]  dump_stack+0x61/0x7d
[847313.010383]  __warn+0xc2/0xdd
[847313.020351]  warn_slowpath_fmt+0x5a/0x76
[847313.033274]  btrfs_run_delayed_refs+0xb1/0x1cc
[847313.047655]  btrfs_should_end_transaction+0x50/0x57
[847313.063910]  btrfs_drop_snapshot+0x38a/0x6c4
[847313.078619]  ? btrfs_kill_all_delayed_nodes+0x5f/0xd7
[847313.094916]  ? _raw_spin_lock+0x15/0x17
[847313.108325]  btrfs_clean_one_deleted_snapshot+0xce/0xdc
[847313.125493]  cleaner_kthread+0x91/0x14b
[847313.138228]  ? btrfs_destroy_pinned_extent+0xd2/0xd2
[847313.154308]  kthread+0xfb/0x100
[847313.164900]  ? init_completion+0x24/0x24
[847313.177781]  ? do_fast_syscall_32+0xb7/0xfe
[847313.191490]  ret_from_fork+0x2c/0x40
[847313.203432] ---[ end trace 27aa532d1886e537 ]---
[847313.218391] BTRFS: error (device dm-1) in btrfs_run_delayed_refs:2961: errno=-17 Object already exists
[847313.247668] BTRFS info (device dm-1): forced readonly


Giuseppe Della Bianca 2016/12/18, 4.8.8
[CORRUPTION FILESYSTEM] Corrupted and unrecoverable file system during the snapshot receive
------------[ cut here ]------------
WARNING: CPU: 1 PID: 4325 at fs/btrfs/extent-tree.c:2960 btrfs_run_delayed_refs+0x283/0x2b0 [btrfs]
BTRFS: Transaction aborted (error -17)
Modules linked in: fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns nf_conntrack_br
 soundcore acpi_cpufreq tpm_tis tpm_tis_core tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc ata_generic nouveau vide
CPU: 1 PID: 4325 Comm: umount Tainted: G        W       4.8.8-100.fc23.x86_64 #1
Hardware name: System manufacturer System Product Name/M2N, BIOS 0902    02/16/2009
 0000000000000286 00000000dd260fac ffff8ffa0d25bb60 ffffffffbc3e493e
 ffff8ffa0d25bbb0 0000000000000000 ffff8ffa0d25bba0 ffffffffbc0a0ecb
 00000b9000000049 ffff8ff9e61b40a0 ffff8ffa2da77800 ffffffffffffffff
Call Trace:
 [<ffffffffbc3e493e>] dump_stack+0x63/0x85
 [<ffffffffbc0a0ecb>] __warn+0xcb/0xf0
 [<ffffffffbc0a0f4f>] warn_slowpath_fmt+0x5f/0x80
 [<ffffffffc07eb4e3>] btrfs_run_delayed_refs+0x283/0x2b0 [btrfs]
 [<ffffffffc07d62ec>] ?  btrfs_cow_block+0x10c/0x1e0 [btrfs]
 [<ffffffffc07ff62e>] commit_cowonly_roots+0xae/0x2e0 [btrfs]
 [<ffffffffc07eb466>] ?  btrfs_run_delayed_refs+0x206/0x2b0 [btrfs]
 [<ffffffffc08706b4>] ?  btrfs_qgroup_account_extents+0x84/0x180 [btrfs]
 [<ffffffffc0802187>] btrfs_commit_transaction+0x547/0xa40 [btrfs]
 [<ffffffffc07faa9f>] btrfs_commit_super+0x8f/0xa0 [btrfs]
 [<ffffffffc07fcbcb>] close_ctree+0x2db/0x380 [btrfs]
 [<ffffffffbc26d3da>] ?  evict_inodes+0x15a/0x180
 [<ffffffffc07ccf29>] btrfs_put_super+0x19/0x20 [btrfs]
 [<ffffffffbc2520bf>] generic_shutdown_super+0x6f/0xf0
 [<ffffffffbc2523b2>] kill_anon_super+0x12/0x20
 [<ffffffffc07cdd98>] btrfs_kill_super+0x18/0x110 [btrfs]
 [<ffffffffbc252763>] deactivate_locked_super+0x43/0x70
 [<ffffffffbc2527ec>] deactivate_super+0x5c/0x60
 [<ffffffffbc2711bf>] cleanup_mnt+0x3f/0x90
 [<ffffffffbc271252>] __cleanup_mnt+0x12/0x20
 [<ffffffffbc0bf0ce>] task_work_run+0x7e/0xa0
 [<ffffffffbc0032d2>] exit_to_usermode_loop+0xc2/0xd0
 [<ffffffffbc003bf1>] syscall_return_slowpath+0xa1/0xb0
 [<ffffffffbc7ffb3a>] entry_SYSCALL_64_fastpath+0xa2/0xa4
---[ end trace f7eb2e818f727168 ]---
BTRFS: error (device sda3) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists
BTRFS info (device sda3): forced readonly
BTRFS warning (device sda3): Skipping commit of aborted transaction.
BTRFS: error (device sda3) in cleanup_transaction:1854: errno=-17 Object already exists


Matt McKinnon <matt@techsquare.com> 2017/08/09 kernel 4.7
BTRFS: error (device sda1) in btrfs_run_delayed_refs:2963: errno=-17 Object already exists
------------[ cut here ]------------
WARNING: CPU: 6 PID: 269 at fs/btrfs/extent-tree.c:2963 btrfs_run_delayed_refs+0x292/0x2d0 [btrfs]
BTRFS: Transaction aborted (error -17)
Modules linked in: ipt_REJECT nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables ipmi_devintf nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd dm_multipath joydev lpc_ich mei_me mei ioatdma wmi ipmi_si ipmi_msghandler shpchp mac_hid btrfs lp parport ses enclosure scsi_transport_sas raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor igb raid6_pq libcrc32c i2c_algo_bit raid1 hid_generic dca usbhid raid0 ptp hid ahci megaraid_sas multipath libahci pps_core linear dm_mirror dm_region_hash dm_log
CPU: 6 PID: 269 Comm: kworker/u18:5 Not tainted 4.7.0-custom #1
Hardware name: Supermicro X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b 04/28/2014
Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
 0000000000000000 ffff88086a057ca0 ffffffff813b816c ffff88086a057cf0
 0000000000000000 ffff88086a057ce0 ffffffff8107a321 00000b9325288170
 ffff8808519eb000 ffff880825288170 ffff88086b2c1000 0000000000000020
Call Trace:
 [<ffffffff813b816c>] dump_stack+0x63/0x87
 [<ffffffff8107a321>] __warn+0xd1/0xf0
 [<ffffffff8107a38f>] warn_slowpath_fmt+0x4f/0x60
 [<ffffffffc01c6e52>] btrfs_run_delayed_refs+0x292/0x2d0 [btrfs]
 [<ffffffffc01c6f24>] delayed_ref_async_start+0x94/0xb0 [btrfs]
 [<ffffffffc020f780>] normal_work_helper+0xc0/0x2d0 [btrfs]
 [<ffffffff81091082>] ? pwq_activate_delayed_work+0x42/0xb0
 [<ffffffffc020fbc2>] btrfs_extent_refs_helper+0x12/0x20 [btrfs]
 [<ffffffff81093173>] process_one_work+0x153/0x3f0
 [<ffffffff8109392b>] worker_thread+0x12b/0x4b0
 [<ffffffff81093800>] ? rescuer_thread+0x340/0x340
 [<ffffffff81099109>] kthread+0xc9/0xe0
 [<ffffffff817db85f>] ret_from_fork+0x1f/0x40
 [<ffffffff81099040>] ? kthread_park+0x60/0x60
---[ end trace e2b0b8dc37502011 ]---
BTRFS: error (device sda1) in btrfs_run_delayed_refs:2963: errno=-17 Object already exists
BTRFS info (device sda1): forced readonly


Marc Haber <mh+linux-btrfs@zugschlus.de> 2015/12/11
Transaction aborted (error -17) during balance
WARNING: CPU: 4 PID: 5545 at /build/linux-eGTGmU/linux-4.3/fs/btrfs/extent-tree.c:2093 __btrfs_inc_extent_ref.isra.52+0x20e/0x280 [btrfs]()
BTRFS: Transaction aborted (error -17)
Modules linked in: ctr ccm tun rfcomm cpufreq_userspace binfmt_misc cpufreq_stats cpufreq_powersave cpufreq_conservative nf_conntrack_netlink nfnetlink bnep ip6table_filter ip6_tables xt_TCPMSS xt_tcpudp iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables bridge stp llc joydev arc4 iTCO_wdt iwldvm iTCO_vendor_support mac80211 snd_hda_codec_conexant intel_rapl snd_hda_codec_generic iosf_mbi x86_pkg_temp_thermal btusb intel_powerclamp btrtl snd_hda_intel iwlwifi btbcm kvm_intel snd_hda_codec btintel kvm snd_hda_core psmouse bluetooth snd_hwdep snd_pcm_oss pcspkr serio_raw i2c_i801 sg cfg80211 snd_mixer_oss lpc_ich snd_pcm mfd_core snd_timer mei_me shpchp mei thinkpad_acpi nvram
 tpm_tis snd tpm soundcore rfkill evdev battery ac processor coretemp loop drbd lru_cache libcrc32c parport_pc ppdev lp parport autofs4 btrfs xor raid6_pq ext4 crc16 mbcache jbd2 algif_skcipher af_alg dm_crypt dm_mod md_mod hid_generic hid_logitech_hidpp hid_logitech_dj usbhid hid sd_mod uas usb_storage crct10dif_pclmul crc32_pclmul crc32c_intel jitterentropy_rng sha256_ssse3 sha256_generic hmac drbg ansi_cprng aesni_intel aes_x86_64 lrw gf128mul glue_helper i915 ahci ablk_helper cryptd libahci sdhci_pci i2c_algo_bit libata ehci_pci drm_kms_helper sdhci ehci_hcd scsi_mod mmc_core e1000e usbcore ptp usb_common drm pps_core thermal wmi video button
CPU: 4 PID: 5545 Comm: kworker/u16:1 Not tainted 4.3.0-trunk-amd64 #1 Debian 4.3-1~exp2
Hardware name: LENOVO 4240CTO/4240CTO, BIOS 8AET63WW (1.43 ) 05/08/2013
Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
 ffffffffa0627250 ffffffff812c5319 ffff88020dc23ba0 ffffffff8106ebcd
 ffff880406146000 ffff88020dc23bf0 ffff8803c90b9410 0000000000000000
 0000000000000106 ffffffff8106ec4c ffffffffa0627420 ffffffff00000020
Call Trace:
 [<ffffffff812c5319>] ? dump_stack+0x40/0x57
 [<ffffffff8106ebcd>] ? warn_slowpath_common+0x7d/0xb0
 [<ffffffff8106ec4c>] ? warn_slowpath_fmt+0x4c/0x50
 [<ffffffffa058bdc9>] ? insert_tree_block_ref+0x49/0x60 [btrfs]
 [<ffffffffa058fc6e>] ? __btrfs_inc_extent_ref.isra.52+0x20e/0x280 [btrfs]
 [<ffffffffa0594e77>] ? __btrfs_run_delayed_refs+0xc47/0x1050 [btrfs]
 [<ffffffff8101d3b5>] ? sched_clock+0x5/0x10
 [<ffffffff81094130>] ? check_preempt_curr+0x50/0x90
 [<ffffffff81094184>] ? ttwu_do_wakeup+0x14/0xc0
 [<ffffffffa0597e98>] ? btrfs_run_delayed_refs+0x78/0x2a0 [btrfs]
 [<ffffffffa05980f2>] ? delayed_ref_async_start+0x32/0x80 [btrfs]
 [<ffffffffa05daeb8>] ? btrfs_scrubparity_helper+0xc8/0x260 [btrfs]
 [<ffffffff810851df>] ? process_one_work+0x19f/0x3d0
 [<ffffffff8108545d>] ? worker_thread+0x4d/0x450
 [<ffffffff81085410>] ? process_one_work+0x3d0/0x3d0
 [<ffffffff8108af5d>] ? kthread+0xbd/0xe0
 [<ffffffff8108aea0>] ? kthread_create_on_node+0x170/0x170
 [<ffffffff81553d0f>] ? ret_from_fork+0x3f/0x70
 [<ffffffff8108aea0>] ? kthread_create_on_node+0x170/0x170
---[ end trace 6671e30ac2882b40 ]---
BTRFS: error (device dm-11) in __btrfs_inc_extent_ref:2093: errno=-17 Object already exists
BTRFS info (device dm-11): forced readonly
BTRFS: error (device dm-11) in btrfs_run_delayed_refs:2851: errno=-17 Object already exists


Mordechay Kaganer <mkaganer@gmail.com> 2015/11/16 kernel 4.2
Transaction aborted (error -17) after crash
[  836.026606] BTRFS warning (device md1): block group 12969790406656 has wrong amount of free space
[  836.026610] BTRFS warning (device md1): failed to load free space cache for block group 12969790406656, rebuild it now
[ 1033.619798] BTRFS warning (device md1): block group 15322358743040 has wrong amount of free space
[ 1033.619801] BTRFS warning (device md1): failed to load free space cache for block group 15322358743040, rebuild it now
[ 2052.843713] ------------[ cut here ]------------
[ 2052.843756] WARNING: CPU: 2 PID: 1725 at /home/kernel/COD/linux/fs/btrfs/extent-tree.c:2781 btrfs_run_delayed_refs.part.73+0x242/0x270 [btrfs]()
[ 2052.843758] BTRFS: Transaction aborted (error -17)
[ 2052.843827] CPU: 2 PID: 1725 Comm: btrfs-transacti Not tainted 4.2.5-040205-generic #201510270124
[ 2052.843829] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./EPC602D8A, BIOS P1.20 04/16/2014
[ 2052.843832]  0000000000000000 00000000df907816 ffff8808414dfcb8 ffffffff817d8d6d
[ 2052.843836]  0000000000000000 ffff8808414dfd10 ffff8808414dfcf8 ffffffff8107b3c6
[ 2052.843839]  0000000000001a0c ffff88049c5fe8a0 ffff88085577d800 ffff88082932cb80
[ 2052.843843] Call Trace:
[ 2052.843852]  [<ffffffff817d8d6d>] dump_stack+0x45/0x57
[ 2052.843858]  [<ffffffff8107b3c6>] warn_slowpath_common+0x86/0xc0
[ 2052.843862]  [<ffffffff8107b455>] warn_slowpath_fmt+0x55/0x70
[ 2052.843878]  [<ffffffffc022ecf2>] btrfs_run_delayed_refs.part.73+0x242/0x270 [btrfs]
[ 2052.843882]  [<ffffffff810e54bc>] ? del_timer_sync+0x4c/0x60
[ 2052.843897]  [<ffffffffc022ed35>] btrfs_run_delayed_refs+0x15/0x20 [btrfs]
[ 2052.843915]  [<ffffffffc0243756>] btrfs_commit_transaction+0x56/0xb20 [btrfs]
[ 2052.843931]  [<ffffffffc023ee19>] transaction_kthread+0x229/0x240 [btrfs]
[ 2052.843945]  [<ffffffffc023ebf0>] ?  btrfs_cleanup_transaction+0x550/0x550 [btrfs]
[ 2052.843949]  [<ffffffff8109a798>] kthread+0xd8/0xf0
[ 2052.843953]  [<ffffffff8109a6c0>] ? kthread_create_on_node+0x1b0/0x1b0
[ 2052.843957]  [<ffffffff817dff9f>] ret_from_fork+0x3f/0x70
[ 2052.843960]  [<ffffffff8109a6c0>] ? kthread_create_on_node+0x1b0/0x1b0
[ 2052.843962] ---[ end trace 6575cf272a151e61 ]---
[ 2052.843966] BTRFS: error (device md1) in btrfs_run_delayed_refs:2781: errno=-17 Object already exists
[ 2052.844024] BTRFS info (device md1): forced readonly
[ 2052.848397] pending csums is 7327744

David Goodwin <david@codepoets.co.uk>  2015/07/25 kernel 4.2
WARNING: CPU: 2 PID: 31502 at fs/btrfs/extent-tree.c:2025 __btrfs_inc_extent_ref.isra.51+0x210/0x280 [btrfs]()
BTRFS: Transaction aborted (error -17)
CPU: 2 PID: 31502 Comm: kworker/u16:1 Tainted: G           O 4.2.0-rc3-dg1 #1
Hardware name: System manufacturer System Product Name/M5A88-M, BIOS 1101    03/16/2012
Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
 0000000000000000 ffffffffa02b98a7 ffffffff81540a6f ffff880107383b28
 ffffffff8106dfa1 ffff88040c955800 ffff8801003612f8 ffff8800441bfda0
 00000a6f8acba000 0000000000003fa4 ffffffff8106e01a ffffffffa02bbc48
Call Trace:
 [<ffffffff81540a6f>] ? dump_stack+0x40/0x50
 [<ffffffff8106dfa1>] ? warn_slowpath_common+0x81/0xb0
 [<ffffffff8106e01a>] ? warn_slowpath_fmt+0x4a/0x50
 [<ffffffffa0222390>] ? __btrfs_inc_extent_ref.isra.51+0x210/0x280 [btrfs]
 [<ffffffffa0229e1f>] ? __btrfs_run_delayed_refs+0xd1f/0x10a0 [btrfs]
 [<ffffffff8101cc65>] ? sched_clock+0x5/0x10
 [<ffffffff811bd0c2>] ? __sb_start_write+0x42/0xe0
 [<ffffffffa022e26a>] ? btrfs_run_delayed_refs.part.73+0x6a/0x280 [btrfs]
 [<ffffffffa022e518>] ? delayed_ref_async_start+0x78/0x90 [btrfs]
 [<ffffffffa026eb6c>] ? normal_work_helper+0xbc/0x260 [btrfs]
 [<ffffffff81084e01>] ? process_one_work+0x151/0x3d0
 [<ffffffff81085805>] ? worker_thread+0x65/0x470
 [<ffffffff8154226d>] ? __schedule+0x28d/0x8a0
 [<ffffffff810857a0>] ? rescuer_thread+0x310/0x310
 [<ffffffff8108ac23>] ? kthread+0xd3/0xf0
 [<ffffffff8108ab50>] ? kthread_create_on_node+0x180/0x180
 [<ffffffff8154699f>] ? ret_from_fork+0x3f/0x70
 [<ffffffff8108ab50>] ? kthread_create_on_node+0x180/0x180
---[ end trace cc878b7b9dc6406e ]---
BTRFS: error (device sdc1) in __btrfs_inc_extent_ref:2025: errno=-17 Object already exists
BTRFS info (device sdc1): forced readonly
BTRFS: error (device sdc1) in btrfs_run_delayed_refs:2781: errno=-17 Object already exists


It keeps going, I ran out of motivation for pasting them all

Tomasz Chmielewski <tch@virtall.com> / 2013/12/20 kernel 3.13:
BTRFS debug (device sdb5): run_one_delayed_ref returned -17
------------[ cut here ]------------
WARNING: CPU: 0 PID: 15042 at fs/btrfs/super.c:254 __btrfs_abort_transaction+0x4d/0xff [btrfs]()
btrfs: Transaction aborted (error -17)
CPU: 0 PID: 15042 Comm: btrfs-transacti Tainted: G        W    3.13.0-rc4 #1
Hardware name: System manufacturer System Product Name/P8H77-M PRO, BIOS 1101 02/04/2013
 0000000000000009 ffff8800374ddc48 ffffffff8138a37d 0000000000000006
 ffff8800374ddc98 ffff8800374ddc88 ffffffff810370a9 ffff8800374ddd80
 ffffffffa020d524 00000000ffffffef ffff8807ead7d800 ffff8807ff0cc8c0
Call Trace:
 [<ffffffff8138a37d>] dump_stack+0x46/0x58
 [<ffffffff810370a9>] warn_slowpath_common+0x77/0x91
 [<ffffffffa020d524>] ? __btrfs_abort_transaction+0x4d/0xff [btrfs]
 [<ffffffff81037157>] warn_slowpath_fmt+0x41/0x43
 [<ffffffffa020d524>] __btrfs_abort_transaction+0x4d/0xff [btrfs]
 [<ffffffffa02226ed>] btrfs_run_delayed_refs+0x253/0x46f [btrfs]
 [<ffffffffa022fdec>] btrfs_commit_transaction+0x36d/0x7df [btrfs]
 [<ffffffffa022e345>] transaction_kthread+0xef/0x1c2 [btrfs]
 [<ffffffffa022e256>] ? open_ctree+0x1ac7/0x1ac7 [btrfs]
 [<ffffffff8104ee9a>] kthread+0xcd/0xd5
 [<ffffffff8104edcd>] ? kthread_freezable_should_stop+0x43/0x43
 [<ffffffff8138f17c>] ret_from_fork+0x7c/0xb0
 [<ffffffff8104edcd>] ? kthread_freezable_should_stop+0x43/0x43
---[ end trace b552aca9a0cff3cb ]---
BTRFS error (device sdb5) in btrfs_run_delayed_refs:2730: errno=-17 Object already exists
BTRFS info (device sdb5): forced readonly
BTRFS warning (device sdb5): Skipping commit of aborted transaction.
BTRFS error (device sdb5) in cleanup_transaction:1553: errno=-17 Object already exists


Chester <somethingsome2000@gmail.com> / 2012/06/26
btrfs volume suddenly becomes read-only
btrfs: run_one_delayed_ref returned -17
------------[ cut here ]------------
WARNING: at fs/btrfs/super.c:221 __btrfs_abort_transaction+0x40/0x9d()
Hardware name: HP Pavilion dv6 Notebook PC
btrfs: Transaction aborted
Pid: 4491, comm: btrfs-endio-wri Not tainted 3.4.0-00091-gcb77fcd #1
Call Trace:
 [<ffffffff8106382f>] warn_slowpath_common+0x7e/0x96
 [<ffffffff810638db>] warn_slowpath_fmt+0x41/0x43
 [<ffffffff8125e626>] __btrfs_abort_transaction+0x40/0x9d
 [<ffffffff8126dd55>] btrfs_run_delayed_refs+0x267/0x34b
 [<ffffffff8111e2f3>] ? virt_to_head_page+0x9/0x2c
 [<ffffffff8127c241>] __btrfs_end_transaction+0x7f/0x21b
 [<ffffffff8127c426>] btrfs_end_transaction+0x10/0x12
 [<ffffffff812810c0>] btrfs_finish_ordered_io+0x295/0x2e5
 [<ffffffff8167ce58>] ? schedule_timeout+0x9c/0xb6
 [<ffffffff8106eb22>] ? usleep_range+0x3d/0x3d
 [<ffffffff81281120>] finish_ordered_fn+0x10/0x12
 [<ffffffff812a3256>] worker_loop+0x169/0x4a3
 [<ffffffff812a30ed>] ? btrfs_queue_worker+0x283/0x283
 [<ffffffff8107d0c0>] kthread+0x86/0x8e
 [<ffffffff81685c64>] kernel_thread_helper+0x4/0x10
 [<ffffffff8107d03a>] ? kthread_freezable_should_stop+0x43/0x43
 [<ffffffff81685c60>] ? gs_change+0x13/0x13
---[ end trace fe73a333f7c68c2e ]---
BTRFS error (device sda6) in btrfs_run_delayed_refs:2454: Object already exists
btrfs is forced readonly
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901


* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
  2017-07-15  1:22 ` BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012) Marc MERLIN
@ 2017-07-15 23:12   ` Marc MERLIN
  2017-07-16 14:01     ` Giuseppe Della Bianca
  2017-08-29  3:16     ` Marc MERLIN
  0 siblings, 2 replies; 47+ messages in thread
From: Marc MERLIN @ 2017-07-15 23:12 UTC (permalink / raw)
  To: linux-btrfs, Chris Murphy, Kai Krakow, bepi, matt, mh, mkaganer,
	david, tch, somethingsome2000
  Cc: Chris Mason, bo.li.liu, fdmanana, Josef Bacik, David Sterba

On Fri, Jul 14, 2017 at 06:22:16PM -0700, Marc MERLIN wrote:
> Dear Chris and other developers,
> 
> Can you look at this bug which has been happening since 2012 on apparently all kernels between at least
> 3.4 and 4.11.
> I didn't look in detail at each thread (took long enough to even find them all and paste here), but they seem pretty
> similar although the reasons how they got there may be different, or at least not as benign as a race condition
> between snapshot creation and deletion for those who do hourly snapshot rotations like me.

I just finished 2 check repairs, one with each mode, they both come back
clean.
Yet my FS still remounts read only with the same
BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2967: errno=-17 Object already exists
BTRFS info (device dm-2): forced readonly
BTRFS warning (device dm-2): failed setting block group ro, ret=-30 

So, given that I can reproduce this almost at will (actually I wish I could
stop it, for now I've turned off snapshots), and that the filesystem is deemed
clean, is there any patch/fix I can try?

Others on this thread with the same error: did anyone recover from this
without wiping the filesystem?

Is there a chance a balance might work around the bug, so that whatever
layout I have changes and stops the bug from occurring?


gargamel:~# btrfs check --repair  /dev/mapper/dshelf1
enabling repair mode
Checking filesystem on /dev/mapper/dshelf1
UUID: 36f5079e-ca6c-4855-8639-ccb82695c18d
checking extents
Fixed 0 roots.
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
checking csums
checking root refs
found 11454147125248 bytes used, no error found
total csum bytes: 11169793608
total tree bytes: 13468549120
total fs tree bytes: 715669504
total extent tree bytes: 478838784
btree space waste bytes: 1159606020
file data blocks allocated: 11917231079424
 referenced 11938096029696

gargamel:~# btrfs check --mode=lowmem /dev/mapper/dshelf1
Checking filesystem on /dev/mapper/dshelf1
UUID: 36f5079e-ca6c-4855-8639-ccb82695c18d
checking extents
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
checking csums
checking root refs
found 11454147158016 bytes used, no error found
total csum bytes: 11169793608
total tree bytes: 13506461696
total fs tree bytes: 753549312
total extent tree bytes: 478871552
btree space waste bytes: 1165617982
file data blocks allocated: 13203054301184
 referenced 13229588148224

-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
  2017-07-15 23:12   ` Marc MERLIN
@ 2017-07-16 14:01     ` Giuseppe Della Bianca
  2017-07-16 16:06       ` Marc MERLIN
  2017-08-29  3:16     ` Marc MERLIN
  1 sibling, 1 reply; 47+ messages in thread
From: Giuseppe Della Bianca @ 2017-07-16 14:01 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: linux-btrfs

> On Fri, Jul 14, 2017 at 06:22:16PM -0700, Marc MERLIN wrote:
> > Dear Chris and other developers,
]zac[
> Others on this thread with the same error: did anyone recover from this
> without wiping the filesystem?
> 
> Is there a chance a balance might work around the bug, so that whatever
> layout I have changes and stops the bug from occurring?
]zac[

Every attempt, even just deleting files, has made the situation worse.
I advise not wasting time on repairs and recreating the filesystem directly.

My workaround is to avoid having more than one btrfs tool running at a time.

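 progResult=0

 # Requires bash (process substitution).
 # List every running btrfs subvolume/send/receive/delete command;
 # progResult becomes 222 as soon as one is found.
 while read -r proc; do
	 if [ "$progResult" -eq 0 ]; then
		 echo -e "\nbtrfs tools already running"

		 progResult=222
	 fi

	 echo "$proc"
 done < <(ps -ef | grep -e "btrfs \{1,\}\(subvolume\|send\|receive\|delete\)")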
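
A lock-based variant of the same idea is sketched here under stated assumptions. It uses flock(1) from util-linux, which the script above does not use, and the lock file path and function name are made up for illustration. It only serializes commands started through the wrapper; btrfs tools launched any other way are invisible to it.

```shell
#!/bin/bash
# Hypothetical lock file; any path writable by the calling user would do.
LOCK_FILE=${LOCK_FILE:-/tmp/btrfs-tools.lock}

# run_exclusive CMD [ARGS...]
# Runs CMD while holding an exclusive, non-blocking lock on LOCK_FILE.
# If another run_exclusive call already holds the lock, prints a message
# and returns 222 (the same value the script above uses) without
# running CMD. Otherwise returns the exit status of CMD.
run_exclusive() {
    (
        if ! flock -n 9; then
            echo "btrfs tools already running" >&2
            exit 222
        fi
        "$@"
    ) 9>"$LOCK_FILE"
}
```

A snapshot rotation script could then call, for instance, `run_exclusive btrfs subvolume delete /path/to/old-snapshot`, where the path is a placeholder.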


* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
  2017-07-16 14:01     ` Giuseppe Della Bianca
@ 2017-07-16 16:06       ` Marc MERLIN
  2017-07-17 11:05         ` gius db
  0 siblings, 1 reply; 47+ messages in thread
From: Marc MERLIN @ 2017-07-16 16:06 UTC (permalink / raw)
  To: Giuseppe Della Bianca; +Cc: linux-btrfs

On Sun, Jul 16, 2017 at 04:01:53PM +0200, Giuseppe Della Bianca wrote:
> > On Fri, Jul 14, 2017 at 06:22:16PM -0700, Marc MERLIN wrote:
> > > Dear Chris and other developers,
> ]zac[
> > Others on this thread with the same error: did anyone recover from this
> > without wiping the filesystem?
> > 
> > Is there a chance a balance might work around the bug, so that whatever
> > layout I have changes and stops the bug from occurring?
> ]zac[
> 
> Every attempt, even just deleting files, has made the situation worse.
> I advise not wasting time on repairs and recreating the filesystem directly.
 
I see. So, this is a condition where the filesystem is clean as far as:
- check 
- check lowmem
- scrub
are all concerned (at least in my case), but it's in a state where
touching something around a sensitive area causes the bug.
If so, this blows, and I'm not really wanting to recreate a clean 12TB
filesystem "just because", especially since this could just happen
again after I've rebuilt it.

>  while read -r proc; do
> 	 if [ "$progResult" -eq 0 ]; then
> 		 echo -e "\nbtrfs tools already running"
> 
> 		 progResult=222
> 	 fi
> 
> 	 echo "$proc"
>  done < <(ps -ef | grep -e "btrfs \{1,\}\(subvolume\|send\|receive\|delete\)")

Yeah, I probably hit that. I think you can also add scrub to that list.
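
As a rough sketch of that addition, the pattern can be extended with scrub and wrapped in a filter that reads process listings from stdin, which makes it easy to try on sample lines. The function name is made up for illustration, and the `\|` alternation relies on GNU grep, as the quoted script already does.

```shell
#!/bin/bash
# filter_btrfs_tools: hypothetical helper. Reads `ps -ef` style lines on
# stdin and prints those that look like a running btrfs subvolume, send,
# receive, delete or scrub command (scrub being the addition).
# Like grep itself, it returns non-zero when nothing matches.
filter_btrfs_tools() {
    grep -e "btrfs \{1,\}\(subvolume\|send\|receive\|delete\|scrub\)"
}
```

In the loop, `ps -ef | filter_btrfs_tools` would take the place of the `ps -ef | grep ...` pipeline.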

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901


* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
  2017-07-16 16:06       ` Marc MERLIN
@ 2017-07-17 11:05         ` gius db
  0 siblings, 0 replies; 47+ messages in thread
From: gius db @ 2017-07-17 11:05 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: linux-btrfs

2017-07-16 18:06 GMT+02:00 Marc MERLIN <marc@merlins.org>:
> On Sun, Jul 16, 2017 at 04:01:53PM +0200, Giuseppe Della Bianca wrote:
>> > On Fri, Jul 14, 2017 at 06:22:16PM -0700, Marc MERLIN wrote:
>> > > Dear Chris and other developers,
>> ]zac[
>> > Others on this thread with the same error: did anyone recover from this
>> > without wiping the filesystem?
>> >
>> > Is there a chance a balance might work around the bug, so that whatever
>> > layout I have changes and stops the bug from occurring?
>> ]zac[
>>
>> Every attempt, even just deleting files, has made the situation worse.
>> I advise not wasting time on repairs and recreating the filesystem directly.
>
> I see. So, this is a condition where the filesystem is clean as far as:
> - check
> - check lowmem
> - scrub
> are all concerned (at least in my case), but it's in a state where
> touching something around a sensitive area causes the bug.
> If so, this blows, and I'm not really wanting to recreate a clean 12TB
> filesystem "just because", especially since this could just happen
> again after I've rebuilt it.
>

IMHO, rebuilding the snapshot receive filesystem from scratch 1-2 times
a year is inevitable.

For this reason, my snapshot receive filesystems serve only this
purpose and are no bigger than 1-2 TB.


>>  while read -r proc; do
>>        if [ "$progResult" -eq 0 ]; then
>>                echo -e "\nbtrfs tools already running"
>>
>>                progResult=222
>>        fi
>>
>>        echo "$proc"
>>  done < <(ps -ef | grep -e "btrfs \{1,\}\(subvolume\|send\|receive\|delete\)")
>
> Yeah, I probably hit that. I think you can also add scrub to that list.
>

Yes.

I did not add scrub because my scrubs are always read-only.
And I think the race condition is between snapshot receive and
subvolume delete.

I also suggest:
- Use btrfs subvolume delete with -c
- Try to add a sleep after subvolume delete and receive.
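
The two suggestions could be combined roughly as follows. This is only a sketch: the function name, the BTRFS and SETTLE_SECS variables, and the 30 second default are assumptions made for illustration, not tested values. The -c option of btrfs subvolume delete waits for the transaction commit at the end of the operation.

```shell
#!/bin/bash
# BTRFS and SETTLE_SECS are hypothetical knobs for this sketch:
# BTRFS is the command to invoke, SETTLE_SECS the pause in seconds.
BTRFS=${BTRFS:-btrfs}
SETTLE_SECS=${SETTLE_SECS:-30}

# delete_subvolume_settled PATH
# Deletes the subvolume with -c (wait for the transaction commit at the
# end of the operation), then sleeps before returning so that the next
# delete or receive does not start immediately. If the delete fails,
# its exit status is returned and no sleep happens.
delete_subvolume_settled() {
    "$BTRFS" subvolume delete -c "$1" || return
    sleep "$SETTLE_SECS"
}
```

The same pause could be placed after each btrfs receive in the rotation script.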

> Marc


Gdb


* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
  2017-07-15 23:12   ` Marc MERLIN
  2017-07-16 14:01     ` Giuseppe Della Bianca
@ 2017-08-29  3:16     ` Marc MERLIN
  2017-08-29 14:30       ` Josef Bacik
  1 sibling, 1 reply; 47+ messages in thread
From: Marc MERLIN @ 2017-08-29  3:16 UTC (permalink / raw)
  To: linux-btrfs, Chris Murphy
  Cc: Chris Mason, bo.li.liu, fdmanana, Josef Bacik, David Sterba

On Sat, Jul 15, 2017 at 04:12:45PM -0700, Marc MERLIN wrote:
> On Fri, Jul 14, 2017 at 06:22:16PM -0700, Marc MERLIN wrote:
> > Dear Chris and other developers,
> > 
> > Can you look at this bug which has been happening since 2012 on apparently all kernels between at least
> > 3.4 and 4.11.
> > I didn't look in detail at each thread (took long enough to even find them all and paste here), but they seem pretty
> > similar although the reasons how they got there may be different, or at least not as benign as a race condition
> > between snapshot creation and deletion for those who do hourly snapshot rotations like me.
> 
> I just finished 2 check repairs, one with each mode, they both come back
> clean.
> Yet my FS still remounts read only with the same
> BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2967: errno=-17 Object already exists
> BTRFS info (device dm-2): forced readonly
> BTRFS warning (device dm-2): failed setting block group ro, ret=-30 

So this still happens pseudo-randomly, maybe every 2 weeks.

Last one is below.
It did not happen during a btrfs snapshot although I'm not entirely sure
what else was running at the time.

Any update on this problem?

------------[ cut here ]------------  
WARNING: CPU: 6 PID: 3783 at fs/btrfs/extent-tree.c:2967 btrfs_run_delayed_refs+0xbd/0x1be  
BTRFS: Transaction aborted (error -17)  
Modules linked in: asix veth ip6table_filter ip6_tables ebtable_nat ebtables ppdev lp xt_addrtype br_netfilter bridge stp llc tun autofs4 softdog binfmt_misc ftdi_sio nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ipt_REJECT nf_reject_ipv4 xt_conntrack xt_mark xt_nat xt_tcpudp nf_log_ipv4 nf_log_common xt_LOG iptable_mangle iptable_filter lm85 hwmon_vid pl2303 dm_snapshot dm_bufio iptable_nat ip_tables nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_conntrack_ftp ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_nat nf_conntrack x_tables sg st snd_pcm_oss snd_mixer_oss bcache kvm_intel kvm irqbypass snd_hda_codec_realtek snd_hda_codec_generic snd_cmipci snd_mpu401_uart snd_hda_intel snd_opl3_lib snd_hda_codec snd_hda_core snd_hwdep eeepc_wmi snd_rawmidi snd_seq_device tpm_infineon tpm_tis  
 snd_pcm asus_wmi snd_timer tpm_tis_core rc_ati_x10 snd ati_remote sparse_keymap rfkill i2c_i801 usbserial hwmon usbnet libphy pcspkr wmi soundcore input_leds tpm rc_core parport_pc evdev i915 lpc_ich i2c_smbus parport battery mei_me e1000e ptp pps_core fuse raid456 multipath mmc_block mmc_core dm_crypt dm_mod async_raid6_recov async_pq async_xor async_memcpy async_tx crc32c_intel blowfish_x86_64 blowfish_common aesni_intel aes_x86_64 lrw glue_helper ablk_helper cryptd sata_sil24 fjes mvsas xhci_pci libsas xhci_hcd ehci_pci ehci_hcd thermal usbcore fan r8169 mii scsi_transport_sas [last unloaded: asix]  
CPU: 2 PID: 3783 Comm: btrfs-transacti Tainted: G     U          4.9.36-amd64-preempt-sysrq-20170406 #1  
Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013  
 ffffb7eb67affc98 ffffffffae39b00b ffffb7eb67affce8 0000000000000000  
 ffffb7eb67affcd8 ffffffffae066769 00000b9767affd58 ffff974f736da960  
 ffff9756319df000 00000000ffffffef ffff975302da7a50 ffffffffffffffff  
Call Trace:  
 [<ffffffffae39b00b>] dump_stack+0x61/0x7d  
 [<ffffffffae066769>] __warn+0xc2/0xdd  
 [<ffffffffae0667de>] warn_slowpath_fmt+0x5a/0x76  
 [<ffffffffae28dd5f>] btrfs_run_delayed_refs+0xbd/0x1be  
 [<ffffffffae29ed64>] commit_cowonly_roots+0x10d/0x2b2  
 [<ffffffffae2fb5ed>] ? btrfs_qgroup_account_extents+0x131/0x181  
 [<ffffffffae28de48>] ? btrfs_run_delayed_refs+0x1a6/0x1be  
 [<ffffffffae2a131a>] btrfs_commit_transaction+0x46b/0x8fb  
 [<ffffffffae29c560>] transaction_kthread+0xf5/0x1a1  
 [<ffffffffae29c46b>] ? btrfs_cleanup_transaction+0x436/0x436  
 [<ffffffffae081e94>] kthread+0xd1/0xd9  
 [<ffffffffae081dc3>] ? init_completion+0x24/0x24  
 [<ffffffffae003add>] ? do_fast_syscall_32+0xb7/0xfe  
 [<ffffffffae6ed4b5>] ret_from_fork+0x25/0x30  
---[ end trace 4c5fcb9daa07c11a ]---  
BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2967: errno=-17 Object already exists  
BTRFS info (device dm-2): forced readonly  
BTRFS warning (device dm-2): Skipping commit of aborted transaction.  
BTRFS: error (device dm-2) in cleanup_transaction:1850: errno=-17 Object already exists  
BTRFS error (device dm-2): pending csums is 131072  

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
  2017-08-29  3:16     ` Marc MERLIN
@ 2017-08-29 14:30       ` Josef Bacik
  2017-08-29 14:39         ` Marc MERLIN
  0 siblings, 1 reply; 47+ messages in thread
From: Josef Bacik @ 2017-08-29 14:30 UTC (permalink / raw)
  To: Marc MERLIN, linux-btrfs, Chris Murphy
  Cc: Chris Mason, bo.li.liu, fdmanana, David Sterba

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 5281 bytes --]

Sorry Marc, I’ll wire up a bcc script to try and catch when this happens.  In order for it to work it’ll need to read the extent tree in before you mount the fs, is that something you’ll be able to swing or is this your root fs?  Also is it the only btrfs fs on the system?  Thanks,

Josef

On 8/28/17, 11:17 PM, "Marc MERLIN" <marc@merlins.org> wrote:

On Sat, Jul 15, 2017 at 04:12:45PM -0700, Marc MERLIN wrote:
> On Fri, Jul 14, 2017 at 06:22:16PM -0700, Marc MERLIN wrote:
> > Dear Chris and other developers,
> > 
> > Can you look at this bug which has been happening since 2012 on apparently all kernels between at least
> > 3.4 and 4.11.
> > I didn't look in detail at each thread (took long enough to even find them all and paste here), but they seem pretty
> > similar although the reasons how they got there may be different, or at least not as benign as a race condition
> > between snapshot creation and deletion for those who do hourly snapshot rotations like me.
> 
> I just finished 2 check repairs, one with each mode, they both come back
> clean.
> Yet my FS still remounts read only with the same
> BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2967: errno=-17 Object already exists
> BTRFS info (device dm-2): forced readonly
> BTRFS warning (device dm-2): failed setting block group ro, ret=-30 

So this still happens pseudo randomly every 2 weeks maybe?

Last one is below.
It did not happen during a btrfs snapshot although I'm not entirely sure
what else was running at the time.

Any update on this problem?

------------[ cut here ]------------  
WARNING: CPU: 6 PID: 3783 at fs/btrfs/extent-tree.c:2967 btrfs_run_delayed_refs+0xbd/0x1be  
BTRFS: Transaction aborted (error -17)  
Modules linked in: asix veth ip6table_filter ip6_tables ebtable_nat ebtables ppdev lp xt_addrtype br_netfilter bridge stp llc tun autofs4 softdog binfmt_misc ftdi_sio nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ipt_REJECT nf_reject_ipv4 xt_conntrack xt_mark xt_nat xt_tcpudp nf_log_ipv4 nf_log_common xt_LOG iptable_mangle iptable_filter lm85 hwmon_vid pl2303 dm_snapshot dm_bufio iptable_nat ip_tables nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_conntrack_ftp ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_nat nf_conntrack x_tables sg st snd_pcm_oss snd_mixer_oss bcache kvm_intel kvm irqbypass snd_hda_codec_realtek snd_hda_codec_generic snd_cmipci snd_mpu401_uart snd_hda_intel snd_opl3_lib snd_hda_codec snd_hda_core snd_hwdep eeepc_wmi snd_rawmidi snd_seq_device tpm_infineon tpm_tis  
 snd_pcm asus_wmi snd_timer tpm_tis_core rc_ati_x10 snd ati_remote sparse_keymap rfkill i2c_i801 usbserial hwmon usbnet libphy pcspkr wmi soundcore input_leds tpm rc_core parport_pc evdev i915 lpc_ich i2c_smbus parport battery mei_me e1000e ptp pps_core fuse raid456 multipath mmc_block mmc_core dm_crypt dm_mod async_raid6_recov async_pq async_xor async_memcpy async_tx crc32c_intel blowfish_x86_64 blowfish_common aesni_intel aes_x86_64 lrw glue_helper ablk_helper cryptd sata_sil24 fjes mvsas xhci_pci libsas xhci_hcd ehci_pci ehci_hcd thermal usbcore fan r8169 mii scsi_transport_sas [last unloaded: asix]  
CPU: 2 PID: 3783 Comm: btrfs-transacti Tainted: G     U          4.9.36-amd64-preempt-sysrq-20170406 #1  
Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013  
 ffffb7eb67affc98 ffffffffae39b00b ffffb7eb67affce8 0000000000000000  
 ffffb7eb67affcd8 ffffffffae066769 00000b9767affd58 ffff974f736da960  
 ffff9756319df000 00000000ffffffef ffff975302da7a50 ffffffffffffffff  
Call Trace:  
 [<ffffffffae39b00b>] dump_stack+0x61/0x7d  
 [<ffffffffae066769>] __warn+0xc2/0xdd  
 [<ffffffffae0667de>] warn_slowpath_fmt+0x5a/0x76  
 [<ffffffffae28dd5f>] btrfs_run_delayed_refs+0xbd/0x1be  
 [<ffffffffae29ed64>] commit_cowonly_roots+0x10d/0x2b2  
 [<ffffffffae2fb5ed>] ? btrfs_qgroup_account_extents+0x131/0x181  
 [<ffffffffae28de48>] ? btrfs_run_delayed_refs+0x1a6/0x1be  
 [<ffffffffae2a131a>] btrfs_commit_transaction+0x46b/0x8fb  
 [<ffffffffae29c560>] transaction_kthread+0xf5/0x1a1  
 [<ffffffffae29c46b>] ? btrfs_cleanup_transaction+0x436/0x436  
 [<ffffffffae081e94>] kthread+0xd1/0xd9  
 [<ffffffffae081dc3>] ? init_completion+0x24/0x24  
 [<ffffffffae003add>] ? do_fast_syscall_32+0xb7/0xfe  
 [<ffffffffae6ed4b5>] ret_from_fork+0x25/0x30  
---[ end trace 4c5fcb9daa07c11a ]---  
BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2967: errno=-17 Object already exists  
BTRFS info (device dm-2): forced readonly  
BTRFS warning (device dm-2): Skipping commit of aborted transaction.  
BTRFS: error (device dm-2) in cleanup_transaction:1850: errno=-17 Object already exists  
BTRFS error (device dm-2): pending csums is 131072  

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
  2017-08-29 14:30       ` Josef Bacik
@ 2017-08-29 14:39         ` Marc MERLIN
  2017-08-29 14:43           ` Josef Bacik
  2017-08-29 18:22           ` Josef Bacik
  0 siblings, 2 replies; 47+ messages in thread
From: Marc MERLIN @ 2017-08-29 14:39 UTC (permalink / raw)
  To: Josef Bacik
  Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
	David Sterba

On Tue, Aug 29, 2017 at 02:30:19PM +0000, Josef Bacik wrote:
> Sorry Marc, I’ll wire up a bcc script to try and catch when this
> happens.  In order for it to work it’ll need to read the extent tree in
> before you mount the fs, is that something you’ll be able to swing or is
> this your root fs?  Also is it the only btrfs fs on the system?  Thanks,

Hi Josef, thanks for your reply.

Thankfully it's not the root FS.
There are 3 btrfs filesystems on that system.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
  2017-08-29 14:39         ` Marc MERLIN
@ 2017-08-29 14:43           ` Josef Bacik
  2017-08-29 18:22           ` Josef Bacik
  1 sibling, 0 replies; 47+ messages in thread
From: Josef Bacik @ 2017-08-29 14:43 UTC (permalink / raw)
  To: Marc MERLIN
  Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
	David Sterba

Alright I’ll figure out a way to differentiate between the fs’s, but being able to scan the fs before it’s mounted was the hardest part so that’s perfect.  I’ll get something written up and tested today to make sure it won’t spit out false positives and send it to you this afternoon or tomorrow.  Thanks,

Josef

On 8/29/17, 10:40 AM, "Marc MERLIN" <marc@merlins.org> wrote:

On Tue, Aug 29, 2017 at 02:30:19PM +0000, Josef Bacik wrote:
> Sorry Marc, I’ll wire up a bcc script to try and catch when this
> happens.  In order for it to work it’ll need to read the extent tree in
> before you mount the fs, is that something you’ll be able to swing or is
> this your root fs?  Also is it the only btrfs fs on the system?  Thanks,

Hi Josef, thanks for your reply.

Thankfully it's not the root FS.
There are 3 btrfs filesystems on that system.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
  2017-08-29 14:39         ` Marc MERLIN
  2017-08-29 14:43           ` Josef Bacik
@ 2017-08-29 18:22           ` Josef Bacik
  2017-08-30  3:40             ` Marc MERLIN
  1 sibling, 1 reply; 47+ messages in thread
From: Josef Bacik @ 2017-08-29 18:22 UTC (permalink / raw)
  To: Marc MERLIN
  Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
	David Sterba

How much metadata do you have on this fs?  I was going to hold everything in bpf hash trees, but I’m worried we’ll hit collisions and then the tracing will be useless.  If it’s too big I’ll have to dump everything to userspace and let python take care of keeping everything in memory, so if you have a lot of metadata hopefully you have lots of memory too ;).  Thanks,

Josef

On 8/29/17, 10:40 AM, "Marc MERLIN" <marc@merlins.org> wrote:

On Tue, Aug 29, 2017 at 02:30:19PM +0000, Josef Bacik wrote:
> Sorry Marc, I’ll wire up a bcc script to try and catch when this
> happens.  In order for it to work it’ll need to read the extent tree in
> before you mount the fs, is that something you’ll be able to swing or is
> this your root fs?  Also is it the only btrfs fs on the system?  Thanks,

Hi Josef, thanks for your reply.

Thankfully it's not the root FS.
There are 3 btrfs filesystems on that system.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
  2017-08-29 18:22           ` Josef Bacik
@ 2017-08-30  3:40             ` Marc MERLIN
  2017-08-31 14:52               ` Josef Bacik
  0 siblings, 1 reply; 47+ messages in thread
From: Marc MERLIN @ 2017-08-30  3:40 UTC (permalink / raw)
  To: Josef Bacik
  Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
	David Sterba

On Tue, Aug 29, 2017 at 06:22:38PM +0000, Josef Bacik wrote:
> How much metadata do you have on this fs?  I was going to hold everything in bpf hash trees, but I’m worried we’ll hit collisions and then the tracing will be useless.  If it’s too big I’ll have to dump everything to userspace and let python take care of keeping everything in memory, so if you have a lot of metadata hopefully you have lots of memory too ;).  Thanks,

gargamel:~# btrfs fi df /mnt/btrfs_pool1
Data, single: total=10.60TiB, used=10.54TiB
System, DUP: total=32.00MiB, used=1.19MiB
Metadata, DUP: total=58.00GiB, used=12.69GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
  2017-08-30  3:40             ` Marc MERLIN
@ 2017-08-31 14:52               ` Josef Bacik
  2017-08-31 17:36                 ` Marc MERLIN
  0 siblings, 1 reply; 47+ messages in thread
From: Josef Bacik @ 2017-08-31 14:52 UTC (permalink / raw)
  To: Marc MERLIN
  Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
	David Sterba

Hello,

Sorry I really thought I could accomplish this with BPF, but ref tracking is just too complicated to work properly with BPF.  I forward ported my ref verification patch to the latest kernel, you can find it in the btrfs-readdir branch of my btrfs-next tree here

git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git

Check that tree out (git checkout btrfs-readdir), build with CONFIG_BTRFS_FS_REF_VERIFY=y, mount the problematic fs with -o ref_verify, and grab the full output when it blows up; we should be able to work out what is happening from there.  Thanks,

Josef

On 8/29/17, 11:41 PM, "Marc MERLIN" <marc@merlins.org> wrote:

On Tue, Aug 29, 2017 at 06:22:38PM +0000, Josef Bacik wrote:
> How much metadata do you have on this fs?  I was going to hold everything in bpf hash trees, but I’m worried we’ll hit collisions and then the tracing will be useless.  If it’s too big I’ll have to dump everything to userspace and let python take care of keeping everything in memory, so if you have a lot of metadata hopefully you have lots of memory too ;).  Thanks,

gargamel:~# btrfs fi df /mnt/btrfs_pool1
Data, single: total=10.60TiB, used=10.54TiB
System, DUP: total=32.00MiB, used=1.19MiB
Metadata, DUP: total=58.00GiB, used=12.69GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
  2017-08-31 14:52               ` Josef Bacik
@ 2017-08-31 17:36                 ` Marc MERLIN
  2017-08-31 17:48                   ` Josef Bacik
  0 siblings, 1 reply; 47+ messages in thread
From: Marc MERLIN @ 2017-08-31 17:36 UTC (permalink / raw)
  To: Josef Bacik
  Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
	David Sterba

On Thu, Aug 31, 2017 at 02:52:56PM +0000, Josef Bacik wrote:
> Hello,
> 
> Sorry I really thought I could accomplish this with BPF, but ref tracking is just too complicated to work properly with BPF.  I forward ported my ref verification patch to the latest kernel, you can find it in the btrfs-readdir branch of my btrfs-next tree here
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git

Thanks.

Now, I have to ask: how safe is this kernel btrfs-wise? I'm ok if it
crashes, but much less so if it damages my filesystem.
I spent over a week recovering from the last corruption that happened when I
moved to 4.11 (and retreated back to 4.9).

From other reports you've seen, has 4.11/4.12 been stable enough for others,
and is 4.13-rc (which your branch is based on, correct?) safe enough in your
opinion?
(and yes, just asking for your opinion, I totally understand that you can't
predict all bugs, and you can't give me a 100% assurance)

I do have a backup, but it indeed takes days to recover, and over a week if
the kernel also damages the other FS on that system, which is smaller, but
has maybe 100x the amount of files.

For now, the problem in the subject line, happens rarely-ish (2-3 weeks?)
although if I remove sleeps in my snapshot creation and rotation, it may
start happening more often again.

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
  2017-08-31 17:36                 ` Marc MERLIN
@ 2017-08-31 17:48                   ` Josef Bacik
  2017-09-01 20:43                     ` Marc MERLIN
  0 siblings, 1 reply; 47+ messages in thread
From: Josef Bacik @ 2017-08-31 17:48 UTC (permalink / raw)
  To: Marc MERLIN
  Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
	David Sterba

We are using 4.11 in production at fb with backports from recent (a month ago?) stuff.  I’m relatively certain nothing bad will happen, and this branch has the most recent fsync() corruption fix (which exists in your kernel so it’s not new).  That said, if you are uncomfortable I can rebase this patch onto whatever base you want and push out a branch, it’s your choice.  Keep in mind this is going to hold a lot of shit in memory, so I hope you have enough, and I’d definitely remove the sleeps from your script; there’s no telling if this is a race condition or not and the overhead of the ref-verify stuff may cause it to be less likely to happen.  Thanks,

Josef 

On 8/31/17, 1:36 PM, "Marc MERLIN" <marc@merlins.org> wrote:

On Thu, Aug 31, 2017 at 02:52:56PM +0000, Josef Bacik wrote:
> Hello,
> 
> Sorry I really thought I could accomplish this with BPF, but ref tracking is just too complicated to work properly with BPF.  I forward ported my ref verification patch to the latest kernel, you can find it in the btrfs-readdir branch of my btrfs-next tree here
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git

Thanks.

Now, I have to ask: how safe is this kernel btrfs-wise? I'm ok if it
crashes, but much less so if it damages my filesystem.
I spent over a week recovering from the last corruption that happened when I
moved to 4.11 (and retreated back to 4.9).

From other reports you've seen, has 4.11/4.12 been stable enough for others,
and is 4.13-rc (which your branch is based on, correct?) safe enough in your
opinion?
(and yes, just asking for your opinion, I totally understand that you can't
predict all bugs, and you can't give me a 100% assurance)

I do have a backup, but it indeed takes days to recover, and over a week if
the kernel also damages the other FS on that system, which is smaller, but
has maybe 100x the amount of files.

For now, the problem in the subject line, happens rarely-ish (2-3 weeks?)
although if I remove sleeps in my snapshot creation and rotation, it may
start happening more often again.

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
  2017-08-31 17:48                   ` Josef Bacik
@ 2017-09-01 20:43                     ` Marc MERLIN
  2017-09-01 23:01                       ` Josef Bacik
  0 siblings, 1 reply; 47+ messages in thread
From: Marc MERLIN @ 2017-09-01 20:43 UTC (permalink / raw)
  To: Josef Bacik
  Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
	David Sterba

On Thu, Aug 31, 2017 at 05:48:23PM +0000, Josef Bacik wrote:
> We are using 4.11 in production at fb with backports from recent (a month ago?) stuff.  I’m relatively certain nothing bad will happen, and this branch has the most recent fsync() corruption fix (which exists in your kernel so it’s not new).  That said, if you are uncomfortable I can rebase this patch onto whatever base you want and push out a branch, it’s your choice.  Keep in mind this is going to hold a lot of shit in memory, so I hope you have enough, and I’d definitely remove the sleeps from your script; there’s no telling if this is a race condition or not and the overhead of the ref-verify stuff may cause it to be less likely to happen.  Thanks,

Thanks for the warning. I have 32GB of RAM in the server, and I probably use
8. Most of the rest is so that I can do btrfs check --repair without the
machine dying :-/

I am concerned that I have a lot more metadata than I have memory:
gargamel:~# btrfs fi df /mnt/btrfs_pool1
Data, single: total=10.66TiB, used=10.60TiB
System, DUP: total=32.00MiB, used=1.20MiB
Metadata, DUP: total=58.00GiB, used=12.76GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
gargamel:~# btrfs fi df /mnt/btrfs_pool2
Data, single: total=5.07TiB, used=4.78TiB
System, DUP: total=8.00MiB, used=640.00KiB
Metadata, DUP: total=70.50GiB, used=66.58GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

That's 13GB + 67GB.
Is it going to fall over if I only have 32GB of RAM?

If I stop mounting /mnt/btrfs_pool2 for a while, will 32GB of RAM
cover the 13GB of metadata from /mnt/btrfs_pool1 ?
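
The comparison above is plain arithmetic on the used= values of the
Metadata lines. A minimal Python sketch of it follows; the parse_fi_df
helper and its regular expression are illustrative names (not part of
btrfs-progs), and on-disk metadata size is only a rough stand-in for
whatever ref_verify actually keeps in memory:

```python
import re

GIB = 1024**3
# Multipliers for the binary units that `btrfs fi df` prints.
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": GIB, "TiB": 1024**4}

# Matches lines such as "Metadata, DUP: total=58.00GiB, used=12.76GiB".
LINE_RE = re.compile(
    r"^(?P<kind>\w+), (?P<profile>\w+): "
    r"total=(?P<total>[\d.]+)(?P<tunit>(?:[KMGT]i)?B), "
    r"used=(?P<used>[\d.]+)(?P<uunit>(?:[KMGT]i)?B)$"
)


def parse_fi_df(text):
    """Return {block group type: used bytes} for `btrfs fi df` output."""
    used = {}
    for line in text.strip().splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            used[m.group("kind")] = float(m.group("used")) * UNITS[m.group("uunit")]
    return used


# Output pasted from the message above.
POOL1 = """Data, single: total=10.66TiB, used=10.60TiB
System, DUP: total=32.00MiB, used=1.20MiB
Metadata, DUP: total=58.00GiB, used=12.76GiB
GlobalReserve, single: total=512.00MiB, used=0.00B"""

POOL2 = """Data, single: total=5.07TiB, used=4.78TiB
System, DUP: total=8.00MiB, used=640.00KiB
Metadata, DUP: total=70.50GiB, used=66.58GiB
GlobalReserve, single: total=512.00MiB, used=0.00B"""

RAM_GIB = 32  # installed RAM, as stated in the message

meta1 = parse_fi_df(POOL1)["Metadata"] / GIB
meta2 = parse_fi_df(POOL2)["Metadata"] / GIB

print(f"pool1 metadata: {meta1:.2f} GiB (fits in RAM: {meta1 < RAM_GIB})")
print(f"pool1 + pool2:  {meta1 + meta2:.2f} GiB (fits in RAM: {meta1 + meta2 < RAM_GIB})")
```

With only /mnt/btrfs_pool1 mounted the metadata is well under 32GB; with
both pools it is not.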

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
  2017-09-01 20:43                     ` Marc MERLIN
@ 2017-09-01 23:01                       ` Josef Bacik
  2017-09-02 16:09                         ` Marc MERLIN
  0 siblings, 1 reply; 47+ messages in thread
From: Josef Bacik @ 2017-09-01 23:01 UTC (permalink / raw)
  To: Marc MERLIN
  Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
	David Sterba

You'll be fine, it's only happening on the one fs right?  That's 13gib of metadata with checksums and all that shit, it'll probably look like 8 or 9gib of ram worst case.  I'd mount with -o ref_verify and check the slab amount in /proc/meminfo to get an idea of real usage.  Once the mount is finished that'll be about as much metadata you will use, of course it'll grow as metadata usage grows but it should be nominal.  Thanks,

Josef

Sent from my iPhone
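
The slab check suggested above can be scripted. A minimal sketch,
assuming the standard Slab, SReclaimable and SUnreclaim fields of
/proc/meminfo (reported in kB); slab_usage_kb is an illustrative helper
name, and running it before and after the mount gives a rough idea of
how much the ref_verify tracking adds:

```python
import os


def slab_usage_kb(meminfo_text):
    """Extract the slab counters (in kB) from the text of /proc/meminfo."""
    wanted = ("Slab", "SReclaimable", "SUnreclaim")
    values = {}
    for line in meminfo_text.splitlines():
        # Lines look like "Slab:            123456 kB".
        key, _, rest = line.partition(":")
        if key in wanted and rest.split():
            values[key] = int(rest.split()[0])
    return values


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        usage = slab_usage_kb(f.read())
    for name, kb in usage.items():
        print(f"{name}: {kb / 1024**2:.2f} GiB")
```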

> On Sep 1, 2017, at 4:43 PM, Marc MERLIN <marc@merlins.org> wrote:
> 
>> On Thu, Aug 31, 2017 at 05:48:23PM +0000, Josef Bacik wrote:
>> We are using 4.11 in production at fb with backports from recent (a month ago?) stuff.  I’m relatively certain nothing bad will happen, and this branch has the most recent fsync() corruption fix (which exists in your kernel so it’s not new).  That said, if you are uncomfortable I can rebase this patch onto whatever base you want and push out a branch, it’s your choice.  Keep in mind this is going to hold a lot of shit in memory, so I hope you have enough, and I’d definitely remove the sleeps from your script; there’s no telling if this is a race condition or not and the overhead of the ref-verify stuff may cause it to be less likely to happen.  Thanks,
> 
> Thanks for the warning. I have 32GB of RAM in the server, and I probably use
> 8. Most of the rest is so that I can do btrfs check --repair without the
> machine dying :-/
> 
> I am concerned that I have a lot more metadata than I have memory:
> gargamel:~# btrfs fi df /mnt/btrfs_pool1
> Data, single: total=10.66TiB, used=10.60TiB
> System, DUP: total=32.00MiB, used=1.20MiB
> Metadata, DUP: total=58.00GiB, used=12.76GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
> gargamel:~# btrfs fi df /mnt/btrfs_pool2
> Data, single: total=5.07TiB, used=4.78TiB
> System, DUP: total=8.00MiB, used=640.00KiB
> Metadata, DUP: total=70.50GiB, used=66.58GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
> 
> That's 13GB + 67GB.
> Is it going to fall over if I only have 32GB of RAM?
> 
> If I stop mounting /mnt/btrfs_pool2 for a while, will 32GB of RAM
> cover the 13GB of metadata from /mnt/btrfs_pool1 ?
> 
> Thanks,
> Marc
> -- 
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
>                                      .... what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
  2017-09-01 23:01                       ` Josef Bacik
@ 2017-09-02 16:09                         ` Marc MERLIN
  2017-09-02 16:52                           ` Josef Bacik
  0 siblings, 1 reply; 47+ messages in thread
From: Marc MERLIN @ 2017-09-02 16:09 UTC (permalink / raw)
  To: Josef Bacik
  Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
	David Sterba

On Fri, Sep 01, 2017 at 11:01:30PM +0000, Josef Bacik wrote:
> You'll be fine, it's only happening on the one fs right?  That's 13gib of metadata with checksums and all that shit, it'll probably look like 8 or 9gib of ram worst case.  I'd mount with -o ref_verify and check the slab amount in /proc/meminfo to get an idea of real usage.  Once the mount is finished that'll be about as much metadata you will use, of course it'll grow as metadata usage grows but it should be nominal.  Thanks,

Looks like I don't have enough RAM :(
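
The Mem-Info section of the OOM report in this message shows it
numerically: it lists slab_unreclaimable:8033273 and 8313052 pages of
RAM. A quick conversion, assuming the usual 4 KiB x86-64 page size (the
report does not state the page size):

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, the x86-64 default

# Values copied from the Mem-Info section of the OOM report.
slab_unreclaimable_pages = 8033273
total_ram_pages = 8313052


def pages_to_gib(pages, page_size=PAGE_SIZE):
    """Convert a page count to GiB."""
    return pages * page_size / 1024**3


slab_gib = pages_to_gib(slab_unreclaimable_pages)
ram_gib = pages_to_gib(total_ram_pages)
share = slab_unreclaimable_pages / total_ram_pages

print(f"unreclaimable slab: {slab_gib:.2f} GiB of {ram_gib:.2f} GiB RAM ({share:.1%})")
# prints: unreclaimable slab: 30.64 GiB of 31.71 GiB RAM (96.6%)
```

Nearly all memory sits in unreclaimable slab, which would be consistent
with the ref_verify bookkeeping, although the figures quoted here do not
name the slab caches involved.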

[   80.964838] BTRFS info (device dm-2): bdev /dev/mapper/dshelf1 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
[ 1382.968986] bcache_writebac invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null),  order=0, oom_score_adj=0
[ 1383.003255] bcache_writebac cpuset=/ mems_allowed=0
[ 1383.018947] CPU: 6 PID: 2359 Comm: bcache_writebac Tainted: G     U		4.13.0-rc5-amd64-preempt-sysrq-20170406+ #1
[ 1383.052448] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
[ 1383.080911] Call Trace:
[ 1383.089336]	dump_stack+0x61/0x7d
[ 1383.100132]	dump_header+0x97/0x239
[ 1383.111354]	? _raw_spin_unlock_irqrestore+0x14/0x24
[ 1383.127322]	oom_kill_process+0x86/0x379
[ 1383.140208]	out_of_memory+0x3b8/0x416
[ 1383.152581]	__alloc_pages_slowpath+0x890/0xa55
[ 1383.166960]	? _raw_spin_unlock_irq+0x11/0x21
[ 1383.180806]	__alloc_pages_nodemask+0x141/0x1f5
[ 1383.195144]	alloc_pages_current+0x8d/0x96
[ 1383.208310]	bio_alloc_pages+0x29/0x6a
[ 1383.220472]	bch_writeback_thread+0x53b/0x6ff [bcache]
[ 1383.236942]	? write_dirty+0x90/0x90 [bcache]
[ 1383.250734]	kthread+0xfb/0x100
[ 1383.261230]	? init_completion+0x24/0x24
[ 1383.273988]	? do_fast_syscall_32+0xb7/0xfe
[ 1383.287265]	ret_from_fork+0x25/0x30
[ 1383.298733] Mem-Info:
[ 1383.306446] active_anon:1 inactive_anon:3 isolated_anon:0
[ 1383.306446]	active_file:190 inactive_file:180 isolated_file:0
[ 1383.306446]	unevictable:0 dirty:0 writeback:1 unstable:0
[ 1383.306446]	slab_reclaimable:3436 slab_unreclaimable:8033273
[ 1383.306446]	mapped:1 shmem:2 pagetables:74 bounce:0
[ 1383.306446]	free:53127 free_pcp:0 free_cma:3741
[ 1383.406332] Node 0 active_anon:0kB inactive_anon:16kB active_file:896kB inactive_file:824kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:4kB dirty:0kB writeback:0kB shmem:8kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
[ 1383.486392] Node 0 DMA free:15880kB min:32kB low:44kB high:56kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15964kB managed:15880kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 1383.565818] lowmem_reserve[]: 0 3201 31832 31832 31832
[ 1383.581956] Node 0 DMA32 free:121256kB min:6788kB low:10064kB high:13340kB active_anon:0kB inactive_anon:0kB active_file:44kB inactive_file:52kB unevictable:0kB writepending:0kB present:3362068kB managed:3296500kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 1383.665831] lowmem_reserve[]: 0 0 28631 28631 28631
[ 1383.681212] Node 0 Normal free:75372kB min:60760kB low:90092kB high:119424kB active_anon:0kB inactive_anon:16kB active_file:788kB inactive_file:836kB unevictable:0kB writepending:0kB present:29874176kB managed:29337252kB mlocked:0kB kernel_stack:8048kB pagetables:296kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:14964kB
[ 1383.769793] lowmem_reserve[]: 0 0 0 0 0
[ 1383.782429] Node 0 DMA: 0*4kB 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15880kB
[ 1383.823171] Node 0 DMA32: 4*4kB (UM) 21*8kB (UME) 9*16kB (UME) 5*32kB (ME) 5*64kB (UME) 5*128kB (UME) 6*256kB (UME) 5*512kB (UME) 5*1024kB (ME) 4*2048kB (UME) 25*4096kB (M) = 121256kB
[ 1383.874564] Node 0 Normal: 773*4kB (UMEC) 494*8kB (ME) 373*16kB (UMEC) 284*32kB (MEC) 177*64kB (UMEC) 108*128kB (UME) 36*256kB (UME) 9*512kB (UMEC) 0*1024kB 1*2048kB (C) 3*4096kB (C) = 75412kB
[ 1383.927787] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 1383.954002] 467 total pagecache pages
[ 1383.965889] 3 pages in swap cache
[ 1383.976715] Swap cache stats: add 1253, delete 1250, find 21/36
[ 1383.995325] Free swap  = 15610620kB
[ 1384.006675] Total swap = 15616764kB
[ 1384.018005] 8313052 pages RAM
[ 1384.027730] 0 pages HighMem/MovableOnly
[ 1384.040076] 150644 pages reserved
[ 1384.050845] 4096 pages cma reserved
[ 1384.062127] 0 pages hwpoisoned
[ 1384.072133] [ pid ]	 uid  tgid total_vm	 rss nr_ptes nr_pmds swapents oom_score_adj name
[ 1384.098531] [  983]	   0   983	936	   0	   6	   2	   32		  0 init
[ 1384.124971] [  984]	   0   984	941	   1	   5	   2	   98		  0 rc
[ 1384.150843] [ 1103]	   0  1103	920	   1	   5	   2	  188	      -1000 udevd
[ 1384.177534] [ 1311]	   0  1311	925	   1	   5	   2	   67	      -1000 net.agent
[ 1384.205278] [ 1352]	   0  1352	925	   1	   5	   2	   66	      -1000 net.agent
[ 1384.233017] [ 1703]	   0  1703	926	   1	   5	   2	   68	      -1000 net.agent
[ 1384.260731] [ 1935]	   0  1935	587	   0	   5	   2	   31		  0 bootlogd
[ 1384.288190] [ 2469]	   0  2469	993	   0	   5	   2	  262	      -1000 udevd
[ 1384.314846] [ 2470]	   0  2470	993	   0	   5	   2	  261	      -1000 udevd
[ 1384.341494] [ 3049]	   0  3049     1538	   1	   6	   2	  177		  0 S13mountall.sh
[ 1384.370576] [ 3125]	   0  3125     1718	   0	   7	   2	  128		  0 mount
[ 1384.397360] [15456]	   0 15456	124	   0	   3	   2	   10	      -1000 sleep
[ 1384.424026] [15457]	   0 15457	124	   0	   3	   2	   12	      -1000 sleep
[ 1384.450650] [15458]	   0 15458	124	   1	   3	   2	   10	      -1000 sleep
[ 1384.477317] Out of memory: Kill process 3049 (S13mountall.sh) score 0 or sacrifice child
[ 1384.502384] Killed process 3125 (mount) total-vm:6872kB, anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[ 1384.535964] oom_reaper: reaped process 3125 (mount), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[ 1384.573082] bcache_writebac invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null),  order=0, oom_score_adj=0
[ 1384.607340] bcache_writebac cpuset=/ mems_allowed=0
[ 1384.623102] CPU: 0 PID: 2359 Comm: bcache_writebac Tainted: G     U		4.13.0-rc5-amd64-preempt-sysrq-20170406+ #1
[ 1384.656825] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
[ 1384.685361] Call Trace:
[ 1384.693823]	dump_stack+0x61/0x7d
[ 1384.704866]	dump_header+0x97/0x239
[ 1384.716086]	? _raw_spin_unlock_irqrestore+0x14/0x24
[ 1384.731697]	oom_kill_process+0x86/0x379
[ 1384.744201]	out_of_memory+0x3b8/0x416
[ 1384.756259]	__alloc_pages_slowpath+0x890/0xa55
[ 1384.770536]	? _raw_spin_unlock_irq+0x11/0x21
[ 1384.784302]	__alloc_pages_nodemask+0x141/0x1f5
[ 1384.798539]	alloc_pages_current+0x8d/0x96
[ 1384.811465]	bio_alloc_pages+0x29/0x6a
[ 1384.823334]	bch_writeback_thread+0x53b/0x6ff [bcache]
[ 1384.839334]	? write_dirty+0x90/0x90 [bcache]
[ 1384.852984]	kthread+0xfb/0x100
[ 1384.862970]	? init_completion+0x24/0x24
[ 1384.875285]	? do_fast_syscall_32+0xb7/0xfe
[ 1384.888368]	ret_from_fork+0x25/0x30
[ 1384.899696] Mem-Info:
[ 1384.907064] active_anon:0 inactive_anon:2 isolated_anon:0
[ 1384.907064]	active_file:189 inactive_file:273 isolated_file:0
[ 1384.907064]	unevictable:0 dirty:0 writeback:0 unstable:0
[ 1384.907064]	slab_reclaimable:3414 slab_unreclaimable:8053934
[ 1384.907064]	mapped:1 shmem:2 pagetables:74 bounce:0
[ 1384.907064]	free:32075 free_pcp:25 free_cma:3741
[ 1384.922833] kworker/6:1H: page allocation failure: order:0, mode:0x1284020(GFP_ATOMIC|__GFP_COMP|__GFP_NOTRACK), nodemask=(null)
[ 1384.922836] kworker/6:1H cpuset=/ mems_allowed=0
[ 1384.922840] CPU: 6 PID: 400 Comm: kworker/6:1H Tainted: G	 U	    4.13.0-rc5-amd64-preempt-sysrq-20170406+ #1
[ 1384.922841] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
[ 1384.922844] Workqueue: kblockd blk_mq_run_work_fn
[ 1384.922845] Call Trace:
[ 1384.922849]	dump_stack+0x61/0x7d
[ 1384.922851]	warn_alloc+0xfc/0x18c
[ 1384.922854]	__alloc_pages_slowpath+0x9ca/0xa55
[ 1384.922856]	? __alloc_pages_slowpath+0x9ca/0xa55
[ 1384.922858]	__alloc_pages_nodemask+0x141/0x1f5
[ 1384.922862]	cache_grow_begin+0xa4/0x294
[ 1384.922863]	fallback_alloc+0x154/0x196
[ 1384.922865]	? cache_grow_begin+0xa4/0x294
[ 1384.922867]	____cache_alloc_node+0xdd/0xe9
[ 1384.922869]	kmem_cache_alloc+0x98/0x143
[ 1384.922873]	sas_alloc_task+0x1d/0x32 [libsas]
[ 1384.922876]	sas_ata_qc_issue+0x71/0x21c [libsas]
[ 1384.922878]	ata_qc_issue+0x1fc/0x24c
[ 1384.922880]	? ata_scsi_write_same_xlat+0x2d1/0x2d1
[ 1384.922882]	__ata_scsi_queuecmd+0x18f/0x1eb
[ 1384.922883]	ata_sas_queuecmd+0x31/0x4d
[ 1384.922886]	sas_queuecommand+0x83/0x1cf [libsas]
[ 1384.922889]	? blk_add_timer+0xcb/0x10f
[ 1384.922892]	scsi_dispatch_cmd+0x141/0x210
[ 1384.922893]	scsi_queue_rq+0x1c7/0x28f
[ 1384.922895]	blk_mq_dispatch_rq_list+0x1a6/0x2cf
[ 1384.922896]	? find_next_bit+0xb/0xd
[ 1384.922899]	blk_mq_sched_dispatch_requests+0x14e/0x1e7
[ 1384.922900]	? __switch_to+0x288/0x44b
[ 1384.922911]	__blk_mq_run_hw_queue+0x4c/0x7f
[ 1384.922912]	blk_mq_run_work_fn+0x2c/0x2e
[ 1384.922913]	process_one_work+0x179/0x2a5
[ 1384.922915]	? rescuer_thread+0x273/0x273
[ 1384.922915]	worker_thread+0x1a8/0x25b
[ 1384.922917]	? rescuer_thread+0x273/0x273
[ 1384.922917]	kthread+0xfb/0x100
[ 1384.922918]	? init_completion+0x24/0x24
[ 1384.922919]	? do_fast_syscall_32+0xb7/0xfe
[ 1384.922920]	ret_from_fork+0x25/0x30
[ 1384.922922] Mem-Info:
[ 1384.922924] active_anon:0 inactive_anon:2 isolated_anon:0
[ 1384.922924]	active_file:199 inactive_file:263 isolated_file:0
[ 1384.922924]	unevictable:0 dirty:0 writeback:0 unstable:0
[ 1384.922924]	slab_reclaimable:3414 slab_unreclaimable:8055394
[ 1384.922924]	mapped:1 shmem:2 pagetables:74 bounce:0
[ 1384.922924]	free:30587 free_pcp:18 free_cma:3741
[ 1384.922926] Node 0 active_anon:0kB inactive_anon:8kB active_file:796kB inactive_file:1052kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:4kB dirty:0kB writeback:0kB shmem:8kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[ 1384.922926] Node 0 DMA free:15880kB min:32kB low:44kB high:56kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15964kB managed:15880kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 1384.922928] lowmem_reserve[]: 0 3201 31832 31832 31832
[ 1384.922930] Node 0 DMA32 free:91392kB min:6788kB low:10064kB high:13340kB active_anon:0kB inactive_anon:0kB active_file:56kB inactive_file:56kB unevictable:0kB writepending:0kB present:3362068kB managed:3296500kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:72kB local_pcp:72kB free_cma:0kB
[ 1384.922932] lowmem_reserve[]: 0 0 28631 28631 28631
[ 1384.922933] Node 0 Normal free:15076kB min:60760kB low:90092kB high:119424kB active_anon:0kB inactive_anon:8kB active_file:740kB inactive_file:996kB unevictable:0kB writepending:0kB present:29874176kB managed:29337252kB mlocked:0kB kernel_stack:8048kB pagetables:296kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:14964kB
[ 1384.922935] lowmem_reserve[]: 0 0 0 0 0
[ 1384.922936] Node 0 DMA: 0*4kB 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15880kB
[ 1384.922941] Node 0 DMA32: 2*4kB (UM) 3*8kB (ME) 4*16kB (UME) 5*32kB (ME) 4*64kB (ME) 4*128kB (ME) 5*256kB (ME) 4*512kB (ME) 5*1024kB (ME) 4*2048kB (UME) 18*4096kB (M) = 91392kB
[ 1384.922946] Node 0 Normal: 1*4kB (C) 0*8kB 1*16kB (C) 1*32kB (C) 1*64kB (C) 0*128kB 0*256kB 1*512kB (C) 0*1024kB 1*2048kB (C) 3*4096kB (C) = 14964kB
[ 1384.922951] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 1384.922951] 464 total pagecache pages
[ 1384.922953] 0 pages in swap cache
[ 1384.922954] Swap cache stats: add 1253, delete 1253, find 21/36
[ 1384.922954] Free swap  = 15611132kB
[ 1384.922954] Total swap = 15616764kB
[ 1384.922955] 8313052 pages RAM
[ 1384.922955] 0 pages HighMem/MovableOnly
[ 1384.922955] 150644 pages reserved
[ 1384.922956] 4096 pages cma reserved
[ 1384.922956] 0 pages hwpoisoned
[ 1385.007958] ata17.00: exception Emask 0x0 SAct 0x10000 SErr 0x0 action 0x0
[ 1385.007961] ata17.00: failed command: READ FPDMA QUEUED
[ 1385.007965] ata17.00: cmd 60/20:80:90:81:6f/00:00:35:01:00/40 tag 16 ncq dma 16384 in
[ 1385.007965]		res 40/00:78:10:2c:8d/00:00:f1:00:00/40 Emask 0x40 (internal error)
[ 1385.007966] ata17.00: status: { DRDY }
[ 1385.008982] ata17.00: Security Log not supported
[ 1385.010102] ata17.00: Security Log not supported
[ 1385.010104] ata17.00: configured for UDMA/133
[ 1385.010110] ata17: EH complete
[ 1385.010162] scsi_eh_10: page allocation failure: order:0, mode:0x1284020(GFP_ATOMIC|__GFP_COMP|__GFP_NOTRACK), nodemask=(null)
[ 1385.010164] scsi_eh_10 cpuset=/ mems_allowed=0
[ 1385.010175] CPU: 6 PID: 409 Comm: scsi_eh_10 Tainted: G     U	  4.13.0-rc5-amd64-preempt-sysrq-20170406+ #1
[ 1385.010175] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
[ 1385.010175] Call Trace:
[ 1385.010178]	dump_stack+0x61/0x7d
[ 1385.010179]	warn_alloc+0xfc/0x18c
[ 1385.010181]	__alloc_pages_slowpath+0x9ca/0xa55
[ 1385.010182]	? __alloc_pages_slowpath+0x9ca/0xa55
[ 1385.010184]	__alloc_pages_nodemask+0x141/0x1f5
[ 1385.010186]	cache_grow_begin+0xa4/0x294
[ 1385.010187]	fallback_alloc+0x154/0x196
[ 1385.010188]	? cache_grow_begin+0xa4/0x294
[ 1385.010189]	____cache_alloc_node+0xdd/0xe9
[ 1385.010191]	kmem_cache_alloc+0x98/0x143
[ 1385.010193]	sas_alloc_task+0x1d/0x32 [libsas]
[ 1385.010195]	sas_ata_qc_issue+0x71/0x21c [libsas]
[ 1385.010196]	ata_qc_issue+0x1fc/0x24c
[ 1385.010198]	? ata_scsi_write_same_xlat+0x2d1/0x2d1
[ 1385.010198]	__ata_scsi_queuecmd+0x18f/0x1eb
[ 1385.010200]	ata_sas_queuecmd+0x31/0x4d
[ 1385.010202]	sas_queuecommand+0x83/0x1cf [libsas]
[ 1385.010203]	? blk_add_timer+0xcb/0x10f
[ 1385.010205]	scsi_dispatch_cmd+0x141/0x210
[ 1385.010205]	scsi_queue_rq+0x1c7/0x28f
[ 1385.010207]	blk_mq_dispatch_rq_list+0x1a6/0x2cf
[ 1385.010208]	blk_mq_sched_dispatch_requests+0x129/0x1e7
[ 1385.010209]	__blk_mq_run_hw_queue+0x4c/0x7f
[ 1385.010210]	__blk_mq_delay_run_hw_queue+0x5c/0xa2
[ 1385.010211]	blk_mq_run_hw_queue+0x14/0x16
[ 1385.010212]	blk_mq_run_hw_queues+0x2e/0x5e
[ 1385.010212]	scsi_run_queue+0x236/0x2c1
[ 1385.010214]	scsi_run_host_queues+0x1f/0x37
[ 1385.010215]	scsi_error_handler+0x467/0x523
[ 1385.010216]	? __schedule+0x4f5/0x5c5
[ 1385.010217]	? scsi_eh_get_sense+0x1a9/0x1a9
[ 1385.010218]	kthread+0xfb/0x100
[ 1385.010219]	? init_completion+0x24/0x24
[ 1385.010220]	ret_from_fork+0x25/0x30
[ 1385.010260] ata17.00: exception Emask 0x0 SAct 0x20000 SErr 0x0 action 0x0
[ 1385.010263] ata17.00: failed command: READ FPDMA QUEUED
[ 1385.010266] ata17.00: cmd 60/20:88:90:81:6f/00:00:35:01:00/40 tag 17 ncq dma 16384 in
[ 1385.010266]		res 50/00:01:30:00:00/00:00:00:00:00/a0 Emask 0x40 (internal error)
[ 1385.010267] ata17.00: status: { DRDY }
[ 1385.011259] ata17.00: Security Log not supported
[ 1385.012380] ata17.00: Security Log not supported
[ 1385.012382] ata17.00: configured for UDMA/133
[ 1385.012385] ata17: EH complete
[ 1385.335912] mount: page allocation failure: order:0, mode:0x1604040(GFP_NOFS|__GFP_COMP|__GFP_NOTRACK), nodemask=(null)
[ 1385.335916] mount cpuset=/ mems_allowed=0
[ 1385.335920] CPU: 7 PID: 3125 Comm: mount Tainted: G	   U	      4.13.0-rc5-amd64-preempt-sysrq-20170406+ #1
[ 1385.335920] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
[ 1385.335921] Call Trace:
[ 1385.335927]	dump_stack+0x61/0x7d
[ 1385.335930]	warn_alloc+0xfc/0x18c
[ 1385.335933]	? call_timer_fn+0x140/0x140
[ 1385.335935]	__alloc_pages_slowpath+0x9ca/0xa55
[ 1385.335939]	__alloc_pages_nodemask+0x141/0x1f5
[ 1385.335943]	cache_grow_begin+0xa4/0x294
[ 1385.335945]	fallback_alloc+0x154/0x196
[ 1385.335946]	? cache_grow_begin+0xa4/0x294
[ 1385.335948]	____cache_alloc_node+0xdd/0xe9
[ 1385.335950]	kmem_cache_alloc_trace+0xa0/0xfc
[ 1385.335953]	add_tree_block+0x6a/0x1a1
[ 1385.335955]	build_ref_tree_for_root+0x1aa/0x3c8
[ 1385.335956]	btrfs_build_ref_tree+0x142/0x179
[ 1385.335958]	open_ctree+0x19af/0x1ffe
[ 1385.335961]	? _raw_spin_unlock_bh+0x1a/0x1c
[ 1385.335964]	btrfs_mount+0xa0e/0xb86
[ 1385.335965]	? btrfs_mount+0xa0e/0xb86
[ 1385.335967]	? find_next_bit+0xb/0xd
[ 1385.335970]	mount_fs+0x67/0x111
[ 1385.335973]	vfs_kern_mount+0x6b/0xd5
[ 1385.335974]	btrfs_mount+0x1de/0xb86
[ 1385.335975]	? find_next_bit+0xb/0xd
[ 1385.335978]	mount_fs+0x67/0x111
[ 1385.335979]	vfs_kern_mount+0x6b/0xd5
[ 1385.335981]	do_mount+0x6e9/0x987
[ 1385.335984]	compat_SyS_mount+0x185/0x1ae
[ 1385.335986]	do_fast_syscall_32+0xb7/0xfe
[ 1385.335988]	entry_SYSENTER_compat+0x4c/0x5b
[ 1385.335990] RIP: 0023:0xf7f69c29
[ 1385.335991] RSP: 002b:00000000ffa6fed0 EFLAGS: 00000297 ORIG_RAX: 0000000000000015
[ 1385.335992] RAX: ffffffffffffffda RBX: 0000000009877050 RCX: 00000000098771e8
[ 1385.335993] RDX: 0000000009877370 RSI: 00000000c0ed0400 RDI: 00000000098bd548
[ 1385.335993] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 1385.335994] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 1385.335994] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 1387.789938] Node 0 active_anon:588kB inactive_anon:300kB active_file:3988kB inactive_file:1428kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:2184kB dirty:0kB writeback:0kB shmem:8kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
[ 1387.871500] Node 0 DMA free:0kB min:32kB low:44kB high:56kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15964kB managed:15880kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 1387.949345] lowmem_reserve[]: 0 3201 31832 31832 31832
[ 1387.965376] Node 0 DMA32 free:621628kB min:6788kB low:10064kB high:13340kB active_anon:0kB inactive_anon:0kB active_file:84kB inactive_file:28kB unevictable:0kB writepending:0kB present:3362068kB managed:3296500kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:640kB local_pcp:0kB free_cma:0kB
[ 1388.049300] lowmem_reserve[]: 0 0 28631 28631 28631
[ 1388.064560] Node 0 Normal free:4812428kB min:60760kB low:90092kB high:119424kB active_anon:588kB inactive_anon:300kB active_file:3904kB inactive_file:1400kB unevictable:0kB writepending:0kB present:29874176kB managed:29337252kB mlocked:0kB kernel_stack:8080kB pagetables:320kB bounce:0kB free_pcp:4124kB local_pcp:420kB free_cma:11288kB
[ 1388.155296] lowmem_reserve[]: 0 0 0 0 0
[ 1388.167479] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[ 1388.198947] Node 0 DMA32: 4430*4kB (U) 3580*8kB (U) 1883*16kB (U) 1009*32kB (U) 17*64kB (U) 16*128kB (U) 12*256kB (U) 10*512kB (U) 17*1024kB (U) 18*2048kB (U) 126*4096kB (U) = 690472kB
[ 1388.249622] Node 0 Normal: 71828*4kB (UC) 54033*8kB (UC) 34313*16kB (UC) 21097*32kB (UC) 10342*64kB (U) 2801*128kB (UC) 201*256kB (UC) 96*512kB (UC) 68*1024kB (U) 48*2048kB (UC) 457*4096kB (UC) = 5104520kB
[ 1388.305855] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 1388.331978] 1467 total pagecache pages
[ 1388.344026] 111 pages in swap cache
[ 1388.355282] Swap cache stats: add 1465, delete 1354, find 364/553
[ 1388.374360] Free swap  = 15611132kB
[ 1388.385607] Total swap = 15616764kB
[ 1388.396863] 8313052 pages RAM
[ 1388.406556] 0 pages HighMem/MovableOnly
[ 1388.418861] 150644 pages reserved
[ 1388.429595] 4096 pages cma reserved
[ 1388.440853] 0 pages hwpoisoned
[ 1388.450807] [ pid ]	 uid  tgid total_vm	 rss nr_ptes nr_pmds swapents oom_score_adj name
[ 1388.477200] [  983]	   0   983	936	   0	   6	   2	   32		  0 init
[ 1388.503586] [  984]	   0   984	941	   1	   5	   2	   98		  0 rc
[ 1388.529456] [ 1103]	   0  1103	920	   1	   5	   2	  188	      -1000 udevd
[ 1388.556123] [ 1311]	   0  1311	925	 443	   5	   2	   24	      -1000 net.agent
[ 1388.583800] [ 1352]	   0  1352	925	 441	   5	   2	   26	      -1000 net.agent
[ 1388.611490] [ 1703]	   0  1703	926	 442	   5	   2	   26	      -1000 net.agent
[ 1388.639176] [ 1935]	   0  1935	587	   0	   5	   2	   31		  0 bootlogd
[ 1388.666611] [ 2469]	   0  2469	993	   0	   5	   2	  262	      -1000 udevd
[ 1388.693254] [ 2470]	   0  2470	993	   0	   5	   2	  261	      -1000 udevd
[ 1388.719913] [ 3049]	   0  3049     1538	   1	   6	   2	  177		  0 S13mountall.sh
[ 1388.748886] [ 3125]	   0  3125     1718	   0	   7	   2	    0		  0 mount
[ 1388.775570] [15483]	   0 15483	558	 141	   5	   2	    0	      -1000 sleep
[ 1388.802207] [15484]	   0 15484	558	 146	   4	   2	    0	      -1000 sleep
[ 1388.828828] [15485]	   0 15485	558	 145	   5	   2	    0	      -1000 sleep
[ 1388.855456] Out of memory: Kill process 3049 (S13mountall.sh) score 0 or sacrifice child
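
For scale, the Mem-Info counters in the log above are page counts. A rough conversion, assuming 4 KiB pages (the usual x86-64 size; the log does not state it), shows that unreclaimable slab accounts for nearly all of the machine's RAM at the first OOM. The constants are copied from the first Mem-Info dump; the function name is illustrative only.

```python
PAGE_SIZE = 4096  # bytes per page; an assumption, not stated in the log


def pages_to_gib(pages, page_size=PAGE_SIZE):
    """Convert a page count to GiB."""
    return pages * page_size / 2**30


# Values from the first OOM report in the log above.
slab_unreclaimable = 8033273  # "slab_unreclaimable:8033273"
total_ram_pages = 8313052     # "8313052 pages RAM"
reserved_pages = 150644       # "150644 pages reserved"

usable_pages = total_ram_pages - reserved_pages
share = slab_unreclaimable / usable_pages

print(f"unreclaimable slab: {pages_to_gib(slab_unreclaimable):.1f} GiB")
print(f"total RAM:          {pages_to_gib(total_ram_pages):.1f} GiB")
print(f"share of usable:    {share:.1%}")
```

The failing allocation in the mount trace (add_tree_block via kmem_cache_alloc_trace) is consistent with the ref_verify tree being what fills that slab, though the log alone does not break slab usage down by cache.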

Hopefully unrelated (but maybe not): once the boot continues, the kernel
panics with:
[ 1523.299228] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffffbd1d4c2c
[ 1523.299228]
[ 1523.334262] CPU: 2 PID: 19932 Comm: avahi-daemon Tainted: G	   U	      4.13.0-rc5-amd64-preempt-sysrq-20170406+ #1
[ 1523.367142] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
[ 1523.395339] Call Trace:
[ 1523.403515]	dump_stack+0x61/0x7d
[ 1523.414266]	panic+0xe7/0x235
[ 1523.423982]	? compat_core_sys_select+0x25b/0x26d
[ 1523.438878]	__stack_chk_fail+0x19/0x19
[ 1523.451168]	compat_core_sys_select+0x25b/0x26d
[ 1523.465552]	? compat_SyS_select+0xe/0x10
[ 1523.478358]	? do_fast_syscall_32+0xb7/0xfe
[ 1523.491698]	? entry_SYSENTER_compat+0x4c/0x5b
[ 1523.505858] Kernel Offset: 0x3c000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 1523.538981] Rebooting in 20 seconds..

I did add stack-protector in 4.13, and it seems to be catching an unrelated bug.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
  2017-09-02 16:09                         ` Marc MERLIN
@ 2017-09-02 16:52                           ` Josef Bacik
       [not found]                             ` <CAHKv19A=OVgCpQpDL2454T+f8QgLm9iynA8xZ4w4Kg8JjYS=UA@mail.gmail.com>
  2017-09-02 23:53                             ` Marc MERLIN
  0 siblings, 2 replies; 47+ messages in thread
From: Josef Bacik @ 2017-09-02 16:52 UTC (permalink / raw)
  To: Marc MERLIN
  Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
	David Sterba

Oops, ok. I've updated my tree so we no longer save the stack trace of the initial scan, which we don't need anyway.  That should save a decent amount of memory in your case.  It was an in-place update, so you'll need to blow away your local branch and pull the new one to get the new code.  Thanks,

Josef

> On Sep 2, 2017, at 12:10 PM, Marc MERLIN <marc@merlins.org> wrote:
> 
>> On Fri, Sep 01, 2017 at 11:01:30PM +0000, Josef Bacik wrote:
>> You'll be fine, it's only happening on the one fs right?  That's 13gib of metadata with checksums and all that shit, it'll probably look like 8 or 9gib of ram worst case.  I'd mount with -o ref_verify and check the slab amount in /proc/meminfo to get an idea of real usage.  Once the mount is finished that'll be about as much metadata you will use, of course it'll grow as metadata usage grows but it should be nominal.  Thanks,
> 
> Looks like I don't have enough RAM :(
> 
> [   80.964838] BTRFS info (device dm-2): bdev /dev/mapper/dshelf1 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
> [ 1382.968986] bcache_writebac invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null),  order=0, oom_score_adj=0
> [ 1383.003255] bcache_writebac cpuset=/ mems_allowed=0
> [ 1383.018947] CPU: 6 PID: 2359 Comm: bcache_writebac Tainted: G     U        4.13.0-rc5-amd64-preempt-sysrq-20170406+ #1
> [ 1383.052448] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> [ 1383.080911] Call Trace:
> [ 1383.089336]    dump_stack+0x61/0x7d
> [ 1383.100132]    dump_header+0x97/0x239
> [ 1383.111354]    ? _raw_spin_unlock_irqrestore+0x14/0x24
> [ 1383.127322]    oom_kill_process+0x86/0x379
> [ 1383.140208]    out_of_memory+0x3b8/0x416
> [ 1383.152581]    __alloc_pages_slowpath+0x890/0xa55
> [ 1383.166960]    ? _raw_spin_unlock_irq+0x11/0x21
> [ 1383.180806]    __alloc_pages_nodemask+0x141/0x1f5
> [ 1383.195144]    alloc_pages_current+0x8d/0x96
> [ 1383.208310]    bio_alloc_pages+0x29/0x6a
> [ 1383.220472]    bch_writeback_thread+0x53b/0x6ff [bcache]
> [ 1383.236942]    ? write_dirty+0x90/0x90 [bcache]
> [ 1383.250734]    kthread+0xfb/0x100
> [ 1383.261230]    ? init_completion+0x24/0x24
> [ 1383.273988]    ? do_fast_syscall_32+0xb7/0xfe
> [ 1383.287265]    ret_from_fork+0x25/0x30
> [ 1383.298733] Mem-Info:
> [ 1383.306446] active_anon:1 inactive_anon:3 isolated_anon:0
> [ 1383.306446]    active_file:190 inactive_file:180 isolated_file:0
> [ 1383.306446]    unevictable:0 dirty:0 writeback:1 unstable:0
> [ 1383.306446]    slab_reclaimable:3436 slab_unreclaimable:8033273
> [ 1383.306446]    mapped:1 shmem:2 pagetables:74 bounce:0
> [ 1383.306446]    free:53127 free_pcp:0 free_cma:3741
> [ 1383.406332] Node 0 active_anon:0kB inactive_anon:16kB active_file:896kB inactive_file:824kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:4kB dirty:0kB writeback:0kB shmem:8kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
> [ 1383.486392] Node 0 DMA free:15880kB min:32kB low:44kB high:56kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15964kB managed:15880kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> [ 1383.565818] lowmem_reserve[]: 0 3201 31832 31832 31832
> [ 1383.581956] Node 0 DMA32 free:121256kB min:6788kB low:10064kB high:13340kB active_anon:0kB inactive_anon:0kB active_file:44kB inactive_file:52kB unevictable:0kB writepending:0kB present:3362068kB managed:3296500kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> [ 1383.665831] lowmem_reserve[]: 0 0 28631 28631 28631
> [ 1383.681212] Node 0 Normal free:75372kB min:60760kB low:90092kB high:119424kB active_anon:0kB inactive_anon:16kB active_file:788kB inactive_file:836kB unevictable:0kB writepending:0kB present:29874176kB managed:29337252kB mlocked:0kB kernel_stack:8048kB pagetables:296kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:14964kB
> [ 1383.769793] lowmem_reserve[]: 0 0 0 0 0
> [ 1383.782429] Node 0 DMA: 0*4kB 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15880kB
> [ 1383.823171] Node 0 DMA32: 4*4kB (UM) 21*8kB (UME) 9*16kB (UME) 5*32kB (ME) 5*64kB (UME) 5*128kB (UME) 6*256kB (UME) 5*512kB (UME) 5*1024kB (ME) 4*2048kB (UME) 25*4096kB (M) = 121256kB
> [ 1383.874564] Node 0 Normal: 773*4kB (UMEC) 494*8kB (ME) 373*16kB (UMEC) 284*32kB (MEC) 177*64kB (UMEC) 108*128kB (UME) 36*256kB (UME) 9*512kB (UMEC) 0*1024kB 1*2048kB (C) 3*4096kB (C) = 75412kB
> [ 1383.927787] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> [ 1383.954002] 467 total pagecache pages
> [ 1383.965889] 3 pages in swap cache
> [ 1383.976715] Swap cache stats: add 1253, delete 1250, find 21/36
> [ 1383.995325] Free swap  = 15610620kB
> [ 1384.006675] Total swap = 15616764kB
> [ 1384.018005] 8313052 pages RAM
> [ 1384.027730] 0 pages HighMem/MovableOnly
> [ 1384.040076] 150644 pages reserved
> [ 1384.050845] 4096 pages cma reserved
> [ 1384.062127] 0 pages hwpoisoned
> [ 1384.072133] [ pid ]     uid  tgid total_vm     rss nr_ptes nr_pmds swapents oom_score_adj name
> [ 1384.098531] [  983]       0   983    936       0       6       2       32          0 init
> [ 1384.124971] [  984]       0   984    941       1       5       2       98          0 rc
> [ 1384.150843] [ 1103]       0  1103    920       1       5       2      188          -1000 udevd
> [ 1384.177534] [ 1311]       0  1311    925       1       5       2       67          -1000 net.agent
> [ 1384.205278] [ 1352]       0  1352    925       1       5       2       66          -1000 net.agent
> [ 1384.233017] [ 1703]       0  1703    926       1       5       2       68          -1000 net.agent
> [ 1384.260731] [ 1935]       0  1935    587       0       5       2       31          0 bootlogd
> [ 1384.288190] [ 2469]       0  2469    993       0       5       2      262          -1000 udevd
> [ 1384.314846] [ 2470]       0  2470    993       0       5       2      261          -1000 udevd
> [ 1384.341494] [ 3049]       0  3049     1538       1       6       2      177          0 S13mountall.sh
> [ 1384.370576] [ 3125]       0  3125     1718       0       7       2      128          0 mount
> [ 1384.397360] [15456]       0 15456    124       0       3       2       10          -1000 sleep
> [ 1384.424026] [15457]       0 15457    124       0       3       2       12          -1000 sleep
> [ 1384.450650] [15458]       0 15458    124       1       3       2       10          -1000 sleep
> [ 1384.477317] Out of memory: Kill process 3049 (S13mountall.sh) score 0 or sacrifice child
> [ 1384.502384] Killed process 3125 (mount) total-vm:6872kB, anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> [ 1384.535964] oom_reaper: reaped process 3125 (mount), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> [ 1384.573082] bcache_writebac invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null),  order=0, oom_score_adj=0
> [ 1384.607340] bcache_writebac cpuset=/ mems_allowed=0
> [ 1384.623102] CPU: 0 PID: 2359 Comm: bcache_writebac Tainted: G     U        4.13.0-rc5-amd64-preempt-sysrq-20170406+ #1
> [ 1384.656825] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> [ 1384.685361] Call Trace:
> [ 1384.693823]    dump_stack+0x61/0x7d
> [ 1384.704866]    dump_header+0x97/0x239
> [ 1384.716086]    ? _raw_spin_unlock_irqrestore+0x14/0x24
> [ 1384.731697]    oom_kill_process+0x86/0x379
> [ 1384.744201]    out_of_memory+0x3b8/0x416
> [ 1384.756259]    __alloc_pages_slowpath+0x890/0xa55
> [ 1384.770536]    ? _raw_spin_unlock_irq+0x11/0x21
> [ 1384.784302]    __alloc_pages_nodemask+0x141/0x1f5
> [ 1384.798539]    alloc_pages_current+0x8d/0x96
> [ 1384.811465]    bio_alloc_pages+0x29/0x6a
> [ 1384.823334]    bch_writeback_thread+0x53b/0x6ff [bcache]
> [ 1384.839334]    ? write_dirty+0x90/0x90 [bcache]
> [ 1384.852984]    kthread+0xfb/0x100
> [ 1384.862970]    ? init_completion+0x24/0x24
> [ 1384.875285]    ? do_fast_syscall_32+0xb7/0xfe
> [ 1384.888368]    ret_from_fork+0x25/0x30
> [ 1384.899696] Mem-Info:
> [ 1384.907064] active_anon:0 inactive_anon:2 isolated_anon:0
> [ 1384.907064]    active_file:189 inactive_file:273 isolated_file:0
> [ 1384.907064]    unevictable:0 dirty:0 writeback:0 unstable:0
> [ 1384.907064]    slab_reclaimable:3414 slab_unreclaimable:8053934
> [ 1384.907064]    mapped:1 shmem:2 pagetables:74 bounce:0
> [ 1384.907064]    free:32075 free_pcp:25 free_cma:3741
> [ 1384.922833] kworker/6:1H: page allocation failure: order:0, mode:0x1284020(GFP_ATOMIC|__GFP_COMP|__GFP_NOTRACK), nodemask=(null)
> [ 1384.922836] kworker/6:1H cpuset=/ mems_allowed=0
> [ 1384.922840] CPU: 6 PID: 400 Comm: kworker/6:1H Tainted: G     U        4.13.0-rc5-amd64-preempt-sysrq-20170406+ #1
> [ 1384.922841] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> [ 1384.922844] Workqueue: kblockd blk_mq_run_work_fn
> [ 1384.922845] Call Trace:
> [ 1384.922849]    dump_stack+0x61/0x7d
> [ 1384.922851]    warn_alloc+0xfc/0x18c
> [ 1384.922854]    __alloc_pages_slowpath+0x9ca/0xa55
> [ 1384.922856]    ? __alloc_pages_slowpath+0x9ca/0xa55
> [ 1384.922858]    __alloc_pages_nodemask+0x141/0x1f5
> [ 1384.922862]    cache_grow_begin+0xa4/0x294
> [ 1384.922863]    fallback_alloc+0x154/0x196
> [ 1384.922865]    ? cache_grow_begin+0xa4/0x294
> [ 1384.922867]    ____cache_alloc_node+0xdd/0xe9
> [ 1384.922869]    kmem_cache_alloc+0x98/0x143
> [ 1384.922873]    sas_alloc_task+0x1d/0x32 [libsas]
> [ 1384.922876]    sas_ata_qc_issue+0x71/0x21c [libsas]
> [ 1384.922878]    ata_qc_issue+0x1fc/0x24c
> [ 1384.922880]    ? ata_scsi_write_same_xlat+0x2d1/0x2d1
> [ 1384.922882]    __ata_scsi_queuecmd+0x18f/0x1eb
> [ 1384.922883]    ata_sas_queuecmd+0x31/0x4d
> [ 1384.922886]    sas_queuecommand+0x83/0x1cf [libsas]
> [ 1384.922889]    ? blk_add_timer+0xcb/0x10f
> [ 1384.922892]    scsi_dispatch_cmd+0x141/0x210
> [ 1384.922893]    scsi_queue_rq+0x1c7/0x28f
> [ 1384.922895]    blk_mq_dispatch_rq_list+0x1a6/0x2cf
> [ 1384.922896]    ? find_next_bit+0xb/0xd
> [ 1384.922899]    blk_mq_sched_dispatch_requests+0x14e/0x1e7
> [ 1384.922900]    ? __switch_to+0x288/0x44b
> [ 1384.922911]    __blk_mq_run_hw_queue+0x4c/0x7f
> [ 1384.922912]    blk_mq_run_work_fn+0x2c/0x2e
> [ 1384.922913]    process_one_work+0x179/0x2a5
> [ 1384.922915]    ? rescuer_thread+0x273/0x273
> [ 1384.922915]    worker_thread+0x1a8/0x25b
> [ 1384.922917]    ? rescuer_thread+0x273/0x273
> [ 1384.922917]    kthread+0xfb/0x100
> [ 1384.922918]    ? init_completion+0x24/0x24
> [ 1384.922919]    ? do_fast_syscall_32+0xb7/0xfe
> [ 1384.922920]    ret_from_fork+0x25/0x30
> [ 1384.922922] Mem-Info:
> [ 1384.922924] active_anon:0 inactive_anon:2 isolated_anon:0
> [ 1384.922924]    active_file:199 inactive_file:263 isolated_file:0
> [ 1384.922924]    unevictable:0 dirty:0 writeback:0 unstable:0
> [ 1384.922924]    slab_reclaimable:3414 slab_unreclaimable:8055394
> [ 1384.922924]    mapped:1 shmem:2 pagetables:74 bounce:0
> [ 1384.922924]    free:30587 free_pcp:18 free_cma:3741
> [ 1384.922926] Node 0 active_anon:0kB inactive_anon:8kB active_file:796kB inactive_file:1052kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:4kB dirty:0kB writeback:0kB shmem:8kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
> [ 1384.922926] Node 0 DMA free:15880kB min:32kB low:44kB high:56kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15964kB managed:15880kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> [ 1384.922928] lowmem_reserve[]: 0 3201 31832 31832 31832
> [ 1384.922930] Node 0 DMA32 free:91392kB min:6788kB low:10064kB high:13340kB active_anon:0kB inactive_anon:0kB active_file:56kB inactive_file:56kB unevictable:0kB writepending:0kB present:3362068kB managed:3296500kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:72kB local_pcp:72kB free_cma:0kB
> [ 1384.922932] lowmem_reserve[]: 0 0 28631 28631 28631
> [ 1384.922933] Node 0 Normal free:15076kB min:60760kB low:90092kB high:119424kB active_anon:0kB inactive_anon:8kB active_file:740kB inactive_file:996kB unevictable:0kB writepending:0kB present:29874176kB managed:29337252kB mlocked:0kB kernel_stack:8048kB pagetables:296kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:14964kB
> [ 1384.922935] lowmem_reserve[]: 0 0 0 0 0
> [ 1384.922936] Node 0 DMA: 0*4kB 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15880kB
> [ 1384.922941] Node 0 DMA32: 2*4kB (UM) 3*8kB (ME) 4*16kB (UME) 5*32kB (ME) 4*64kB (ME) 4*128kB (ME) 5*256kB (ME) 4*512kB (ME) 5*1024kB (ME) 4*2048kB (UME) 18*4096kB (M) = 91392kB
> [ 1384.922946] Node 0 Normal: 1*4kB (C) 0*8kB 1*16kB (C) 1*32kB (C) 1*64kB (C) 0*128kB 0*256kB 1*512kB (C) 0*1024kB 1*2048kB (C) 3*4096kB (C) = 14964kB
> [ 1384.922951] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> [ 1384.922951] 464 total pagecache pages
> [ 1384.922953] 0 pages in swap cache
> [ 1384.922954] Swap cache stats: add 1253, delete 1253, find 21/36
> [ 1384.922954] Free swap  = 15611132kB
> [ 1384.922954] Total swap = 15616764kB
> [ 1384.922955] 8313052 pages RAM
> [ 1384.922955] 0 pages HighMem/MovableOnly
> [ 1384.922955] 150644 pages reserved
> [ 1384.922956] 4096 pages cma reserved
> [ 1384.922956] 0 pages hwpoisoned
> [ 1385.007958] ata17.00: exception Emask 0x0 SAct 0x10000 SErr 0x0 action 0x0
> [ 1385.007961] ata17.00: failed command: READ FPDMA QUEUED
> [ 1385.007965] ata17.00: cmd 60/20:80:90:81:6f/00:00:35:01:00/40 tag 16 ncq dma 16384 in
> [ 1385.007965]        res 40/00:78:10:2c:8d/00:00:f1:00:00/40 Emask 0x40 (internal error)
> [ 1385.007966] ata17.00: status: { DRDY }
> [ 1385.008982] ata17.00: Security Log not supported
> [ 1385.010102] ata17.00: Security Log not supported
> [ 1385.010104] ata17.00: configured for UDMA/133
> [ 1385.010110] ata17: EH complete
> [ 1385.010162] scsi_eh_10: page allocation failure: order:0, mode:0x1284020(GFP_ATOMIC|__GFP_COMP|__GFP_NOTRACK), nodemask=(null)
> [ 1385.010164] scsi_eh_10 cpuset=/ mems_allowed=0
> [ 1385.010175] CPU: 6 PID: 409 Comm: scsi_eh_10 Tainted: G     U      4.13.0-rc5-amd64-preempt-sysrq-20170406+ #1
> [ 1385.010175] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> [ 1385.010175] Call Trace:
> [ 1385.010178]    dump_stack+0x61/0x7d
> [ 1385.010179]    warn_alloc+0xfc/0x18c
> [ 1385.010181]    __alloc_pages_slowpath+0x9ca/0xa55
> [ 1385.010182]    ? __alloc_pages_slowpath+0x9ca/0xa55
> [ 1385.010184]    __alloc_pages_nodemask+0x141/0x1f5
> [ 1385.010186]    cache_grow_begin+0xa4/0x294
> [ 1385.010187]    fallback_alloc+0x154/0x196
> [ 1385.010188]    ? cache_grow_begin+0xa4/0x294
> [ 1385.010189]    ____cache_alloc_node+0xdd/0xe9
> [ 1385.010191]    kmem_cache_alloc+0x98/0x143
> [ 1385.010193]    sas_alloc_task+0x1d/0x32 [libsas]
> [ 1385.010195]    sas_ata_qc_issue+0x71/0x21c [libsas]
> [ 1385.010196]    ata_qc_issue+0x1fc/0x24c
> [ 1385.010198]    ? ata_scsi_write_same_xlat+0x2d1/0x2d1
> [ 1385.010198]    __ata_scsi_queuecmd+0x18f/0x1eb
> [ 1385.010200]    ata_sas_queuecmd+0x31/0x4d
> [ 1385.010202]    sas_queuecommand+0x83/0x1cf [libsas]
> [ 1385.010203]    ? blk_add_timer+0xcb/0x10f
> [ 1385.010205]    scsi_dispatch_cmd+0x141/0x210
> [ 1385.010205]    scsi_queue_rq+0x1c7/0x28f
> [ 1385.010207]    blk_mq_dispatch_rq_list+0x1a6/0x2cf
> [ 1385.010208]    blk_mq_sched_dispatch_requests+0x129/0x1e7
> [ 1385.010209]    __blk_mq_run_hw_queue+0x4c/0x7f
> [ 1385.010210]    __blk_mq_delay_run_hw_queue+0x5c/0xa2
> [ 1385.010211]    blk_mq_run_hw_queue+0x14/0x16
> [ 1385.010212]    blk_mq_run_hw_queues+0x2e/0x5e
> [ 1385.010212]    scsi_run_queue+0x236/0x2c1
> [ 1385.010214]    scsi_run_host_queues+0x1f/0x37
> [ 1385.010215]    scsi_error_handler+0x467/0x523
> [ 1385.010216]    ? __schedule+0x4f5/0x5c5
> [ 1385.010217]    ? scsi_eh_get_sense+0x1a9/0x1a9
> [ 1385.010218]    kthread+0xfb/0x100
> [ 1385.010219]    ? init_completion+0x24/0x24
> [ 1385.010220]    ret_from_fork+0x25/0x30
> [ 1385.010260] ata17.00: exception Emask 0x0 SAct 0x20000 SErr 0x0 action 0x0
> [ 1385.010263] ata17.00: failed command: READ FPDMA QUEUED
> [ 1385.010266] ata17.00: cmd 60/20:88:90:81:6f/00:00:35:01:00/40 tag 17 ncq dma 16384 in
> [ 1385.010266]        res 50/00:01:30:00:00/00:00:00:00:00/a0 Emask 0x40 (internal error)
> [ 1385.010267] ata17.00: status: { DRDY }
> [ 1385.011259] ata17.00: Security Log not supported
> [ 1385.012380] ata17.00: Security Log not supported
> [ 1385.012382] ata17.00: configured for UDMA/133
> [ 1385.012385] ata17: EH complete
> [ 1385.335912] mount: page allocation failure: order:0, mode:0x1604040(GFP_NOFS|__GFP_COMP|__GFP_NOTRACK), nodemask=(null)
> [ 1385.335916] mount cpuset=/ mems_allowed=0
> [ 1385.335920] CPU: 7 PID: 3125 Comm: mount Tainted: G       U          4.13.0-rc5-amd64-preempt-sysrq-20170406+ #1
> [ 1385.335920] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> [ 1385.335921] Call Trace:
> [ 1385.335927]    dump_stack+0x61/0x7d
> [ 1385.335930]    warn_alloc+0xfc/0x18c
> [ 1385.335933]    ? call_timer_fn+0x140/0x140
> [ 1385.335935]    __alloc_pages_slowpath+0x9ca/0xa55
> [ 1385.335939]    __alloc_pages_nodemask+0x141/0x1f5
> [ 1385.335943]    cache_grow_begin+0xa4/0x294
> [ 1385.335945]    fallback_alloc+0x154/0x196
> [ 1385.335946]    ? cache_grow_begin+0xa4/0x294
> [ 1385.335948]    ____cache_alloc_node+0xdd/0xe9
> [ 1385.335950]    kmem_cache_alloc_trace+0xa0/0xfc
> [ 1385.335953]    add_tree_block+0x6a/0x1a1
> [ 1385.335955]    build_ref_tree_for_root+0x1aa/0x3c8
> [ 1385.335956]    btrfs_build_ref_tree+0x142/0x179
> [ 1385.335958]    open_ctree+0x19af/0x1ffe
> [ 1385.335961]    ? _raw_spin_unlock_bh+0x1a/0x1c
> [ 1385.335964]    btrfs_mount+0xa0e/0xb86
> [ 1385.335965]    ? btrfs_mount+0xa0e/0xb86
> [ 1385.335967]    ? find_next_bit+0xb/0xd
> [ 1385.335970]    mount_fs+0x67/0x111
> [ 1385.335973]    vfs_kern_mount+0x6b/0xd5
> [ 1385.335974]    btrfs_mount+0x1de/0xb86
> [ 1385.335975]    ? find_next_bit+0xb/0xd
> [ 1385.335978]    mount_fs+0x67/0x111
> [ 1385.335979]    vfs_kern_mount+0x6b/0xd5
> [ 1385.335981]    do_mount+0x6e9/0x987
> [ 1385.335984]    compat_SyS_mount+0x185/0x1ae
> [ 1385.335986]    do_fast_syscall_32+0xb7/0xfe
> [ 1385.335988]    entry_SYSENTER_compat+0x4c/0x5b
> [ 1385.335990] RIP: 0023:0xf7f69c29
> [ 1385.335991] RSP: 002b:00000000ffa6fed0 EFLAGS: 00000297 ORIG_RAX: 0000000000000015
> [ 1385.335992] RAX: ffffffffffffffda RBX: 0000000009877050 RCX: 00000000098771e8
> [ 1385.335993] RDX: 0000000009877370 RSI: 00000000c0ed0400 RDI: 00000000098bd548
> [ 1385.335993] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> [ 1385.335994] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [ 1385.335994] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [ 1387.789938] Node 0 active_anon:588kB inactive_anon:300kB active_file:3988kB inactive_file:1428kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:2184kB dirty:0kB writeback:0kB shmem:8kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
> [ 1387.871500] Node 0 DMA free:0kB min:32kB low:44kB high:56kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15964kB managed:15880kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> [ 1387.949345] lowmem_reserve[]: 0 3201 31832 31832 31832
> [ 1387.965376] Node 0 DMA32 free:621628kB min:6788kB low:10064kB high:13340kB active_anon:0kB inactive_anon:0kB active_file:84kB inactive_file:28kB unevictable:0kB writepending:0kB present:3362068kB managed:3296500kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:640kB local_pcp:0kB free_cma:0kB
> [ 1388.049300] lowmem_reserve[]: 0 0 28631 28631 28631
> [ 1388.064560] Node 0 Normal free:4812428kB min:60760kB low:90092kB high:119424kB active_anon:588kB inactive_anon:300kB active_file:3904kB inactive_file:1400kB unevictable:0kB writepending:0kB present:29874176kB managed:29337252kB mlocked:0kB kernel_stack:8080kB pagetables:320kB bounce:0kB free_pcp:4124kB local_pcp:420kB free_cma:11288kB
> [ 1388.155296] lowmem_reserve[]: 0 0 0 0 0
> [ 1388.167479] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
> [ 1388.198947] Node 0 DMA32: 4430*4kB (U) 3580*8kB (U) 1883*16kB (U) 1009*32kB (U) 17*64kB (U) 16*128kB (U) 12*256kB (U) 10*512kB (U) 17*1024kB (U) 18*2048kB (U) 126*4096kB (U) = 690472kB
> [ 1388.249622] Node 0 Normal: 71828*4kB (UC) 54033*8kB (UC) 34313*16kB (UC) 21097*32kB (UC) 10342*64kB (U) 2801*128kB (UC) 201*256kB (UC) 96*512kB (UC) 68*1024kB (U) 48*2048kB (UC) 457*4096kB (UC) = 5104520kB
> [ 1388.305855] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> [ 1388.331978] 1467 total pagecache pages
> [ 1388.344026] 111 pages in swap cache
> [ 1388.355282] Swap cache stats: add 1465, delete 1354, find 364/553
> [ 1388.374360] Free swap  = 15611132kB
> [ 1388.385607] Total swap = 15616764kB
> [ 1388.396863] 8313052 pages RAM
> [ 1388.406556] 0 pages HighMem/MovableOnly
> [ 1388.418861] 150644 pages reserved
> [ 1388.429595] 4096 pages cma reserved
> [ 1388.440853] 0 pages hwpoisoned
> [ 1388.450807] [ pid ]     uid  tgid total_vm     rss nr_ptes nr_pmds swapents oom_score_adj name
> [ 1388.477200] [  983]       0   983    936       0       6       2       32          0 init
> [ 1388.503586] [  984]       0   984    941       1       5       2       98          0 rc
> [ 1388.529456] [ 1103]       0  1103    920       1       5       2      188          -1000 udevd
> [ 1388.556123] [ 1311]       0  1311    925     443       5       2       24          -1000 net.agent
> [ 1388.583800] [ 1352]       0  1352    925     441       5       2       26          -1000 net.agent
> [ 1388.611490] [ 1703]       0  1703    926     442       5       2       26          -1000 net.agent
> [ 1388.639176] [ 1935]       0  1935    587       0       5       2       31          0 bootlogd
> [ 1388.666611] [ 2469]       0  2469    993       0       5       2      262          -1000 udevd
> [ 1388.693254] [ 2470]       0  2470    993       0       5       2      261          -1000 udevd
> [ 1388.719913] [ 3049]       0  3049     1538       1       6       2      177          0 S13mountall.sh
> [ 1388.748886] [ 3125]       0  3125     1718       0       7       2        0          0 mount
> [ 1388.775570] [15483]       0 15483    558     141       5       2        0          -1000 sleep
> [ 1388.802207] [15484]       0 15484    558     146       4       2        0          -1000 sleep
> [ 1388.828828] [15485]       0 15485    558     145       5       2        0          -1000 sleep
> [ 1388.855456] Out of memory: Kill process 3049 (S13mountall.sh) score 0 or sacrifice child
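For scale, the Mem-Info counters in the dump above are page counts, so they can be converted to GiB by hand. A minimal sketch, assuming the usual 4 KiB x86-64 base page (the log does not state the page size directly, and the helper name is only illustrative):

```python
PAGE_SIZE = 4096  # bytes; assumption: 4 KiB base page (x86-64 default)


def pages_to_gib(pages, page_size=PAGE_SIZE):
    """Convert a page count from a Mem-Info dump to GiB."""
    return pages * page_size / 2**30


# Values copied from the Mem-Info dump at [ 1384.922924] and the
# "pages RAM" / "pages reserved" lines that follow it.
slab_unreclaimable = 8055394
pages_ram = 8313052
pages_reserved = 150644

slab_gib = pages_to_gib(slab_unreclaimable)
ram_gib = pages_to_gib(pages_ram)
managed = pages_ram - pages_reserved      # pages not reserved at boot
share = slab_unreclaimable / managed      # fraction held by unreclaimable slab

print(f"unreclaimable slab: {slab_gib:.2f} GiB")
print(f"total RAM:          {ram_gib:.2f} GiB")
print(f"slab share of non-reserved RAM: {share:.1%}")
```

Under that assumption, unreclaimable slab accounts for about 30.7 GiB of about 31.7 GiB of RAM, which is consistent with the OOM kills happening while mount is inside btrfs_build_ref_tree.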
> 
> And hopefully totally unrelated (but maybe not), after the boot continues, it
> crashes with:
> [ 1523.299228] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffffbd1d4c2c
> [ 1523.299228]
> [ 1523.334262] CPU: 2 PID: 19932 Comm: avahi-daemon Tainted: G       U          4.13.0-rc5-amd64-preempt-sysrq-20170406+ #1
> [ 1523.367142] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> [ 1523.395339] Call Trace:
> [ 1523.403515]    dump_stack+0x61/0x7d
> [ 1523.414266]    panic+0xe7/0x235
> [ 1523.423982]    ? compat_core_sys_select+0x25b/0x26d
> [ 1523.438878]    __stack_chk_fail+0x19/0x19
> [ 1523.451168]    compat_core_sys_select+0x25b/0x26d
> [ 1523.465552]    ? compat_SyS_select+0xe/0x10
> [ 1523.478358]    ? do_fast_syscall_32+0xb7/0xfe
> [ 1523.491698]    ? entry_SYSENTER_compat+0x4c/0x5b
> [ 1523.505858] Kernel Offset: 0x3c000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [ 1523.538981] Rebooting in 20 seconds..
> 
> I did add stack-protector in 4.13, and it seems to be finding an unrelated bug.
> 
> Marc
> -- 
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
>                                      .... what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/                          | PGP 1024R/763BE901

* Fwd: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
       [not found]                             ` <CAHKv19A=OVgCpQpDL2454T+f8QgLm9iynA8xZ4w4Kg8JjYS=UA@mail.gmail.com>
@ 2017-09-02 18:55                               ` George Joseph
  0 siblings, 0 replies; 47+ messages in thread
From: George Joseph @ 2017-09-02 18:55 UTC (permalink / raw)
  To: linux-btrfs

I've just had this happen for the 3rd time in 4 days.  I wasn't
subscribed to the list so I couldn't reply to the existing thread, but
here it is: http://www.spinics.net/lists/linux-btrfs/msg68662.html

I can do some limited testing.  It's my main dev machine, though.


On Sat, Sep 2, 2017 at 10:52 AM, Josef Bacik <jbacik@fb.com> wrote:
>
> Oops, ok I've updated my tree so we don't save the stack trace of the initial scan, which we don't need anyway.  That should save a decent amount of memory in your case.  It was an in place update so you'll need to blow away your local branch and pull the new one to get the new code.  Thanks,
>
> Josef
>
> Sent from my iPhone
>
> > On Sep 2, 2017, at 12:10 PM, Marc MERLIN <marc@merlins.org> wrote:
> >
> >> On Fri, Sep 01, 2017 at 11:01:30PM +0000, Josef Bacik wrote:
> >> You'll be fine, it's only happening on the one fs right?  That's 13gib of metadata with checksums and all that shit, it'll probably look like 8 or 9gib of ram worst case.  I'd mount with -o ref_verify and check the slab amount in /proc/meminfo to get an idea of real usage.  Once the mount is finished that'll be about as much metadata you will use, of course it'll grow as metadata usage grows but it should be nominal.  Thanks,
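Checking the slab amount in /proc/meminfo, as suggested above, can be scripted. A minimal sketch, assuming the standard Slab, SReclaimable and SUnreclaim fields of /proc/meminfo (reported in kB); the function names are illustrative and not part of any btrfs tooling:

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Values are in kB for fields that carry a unit.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def slab_gib(fields):
    """Return the slab-related fields converted from kB to GiB."""
    wanted = ("Slab", "SReclaimable", "SUnreclaim")
    return {k: fields[k] / 2**20 for k in wanted if k in fields}


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    # Run once before and once after the mount to compare slab usage.
    with open("/proc/meminfo") as f:
        for key, gib in slab_gib(parse_meminfo(f.read())).items():
            print(f"{key:>13}: {gib:.2f} GiB")
```

Running it before the mount and again once the mount has finished gives a rough idea of how much slab the reference tree occupies; SUnreclaim is the figure that corresponds to slab_unreclaimable in the OOM dumps.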
> >
> > Looks like I don't have enough RAM :(
> >
> > [   80.964838] BTRFS info (device dm-2): bdev /dev/mapper/dshelf1 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
> > [ 1382.968986] bcache_writebac invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null),  order=0, oom_score_adj=0
> > [ 1383.003255] bcache_writebac cpuset=/ mems_allowed=0
> > [ 1383.018947] CPU: 6 PID: 2359 Comm: bcache_writebac Tainted: G     U        4.13.0-rc5-amd64-preempt-sysrq-20170406+ #1
> > [ 1383.052448] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> > [ 1383.080911] Call Trace:
> > [ 1383.089336]    dump_stack+0x61/0x7d
> > [ 1383.100132]    dump_header+0x97/0x239
> > [ 1383.111354]    ? _raw_spin_unlock_irqrestore+0x14/0x24
> > [ 1383.127322]    oom_kill_process+0x86/0x379
> > [ 1383.140208]    out_of_memory+0x3b8/0x416
> > [ 1383.152581]    __alloc_pages_slowpath+0x890/0xa55
> > [ 1383.166960]    ? _raw_spin_unlock_irq+0x11/0x21
> > [ 1383.180806]    __alloc_pages_nodemask+0x141/0x1f5
> > [ 1383.195144]    alloc_pages_current+0x8d/0x96
> > [ 1383.208310]    bio_alloc_pages+0x29/0x6a
> > [ 1383.220472]    bch_writeback_thread+0x53b/0x6ff [bcache]
> > [ 1383.236942]    ? write_dirty+0x90/0x90 [bcache]
> > [ 1383.250734]    kthread+0xfb/0x100
> > [ 1383.261230]    ? init_completion+0x24/0x24
> > [ 1383.273988]    ? do_fast_syscall_32+0xb7/0xfe
> > [ 1383.287265]    ret_from_fork+0x25/0x30
> > [ 1383.298733] Mem-Info:
> > [ 1383.306446] active_anon:1 inactive_anon:3 isolated_anon:0
> > [ 1383.306446]    active_file:190 inactive_file:180 isolated_file:0
> > [ 1383.306446]    unevictable:0 dirty:0 writeback:1 unstable:0
> > [ 1383.306446]    slab_reclaimable:3436 slab_unreclaimable:8033273
> > [ 1383.306446]    mapped:1 shmem:2 pagetables:74 bounce:0
> > [ 1383.306446]    free:53127 free_pcp:0 free_cma:3741
> > [ 1383.406332] Node 0 active_anon:0kB inactive_anon:16kB active_file:896kB inactive_file:824kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:4kB dirty:0kB writeback:0kB shmem:8kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
> > [ 1383.486392] Node 0 DMA free:15880kB min:32kB low:44kB high:56kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15964kB managed:15880kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> > [ 1383.565818] lowmem_reserve[]: 0 3201 31832 31832 31832
> > [ 1383.581956] Node 0 DMA32 free:121256kB min:6788kB low:10064kB high:13340kB active_anon:0kB inactive_anon:0kB active_file:44kB inactive_file:52kB unevictable:0kB writepending:0kB present:3362068kB managed:3296500kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> > [ 1383.665831] lowmem_reserve[]: 0 0 28631 28631 28631
> > [ 1383.681212] Node 0 Normal free:75372kB min:60760kB low:90092kB high:119424kB active_anon:0kB inactive_anon:16kB active_file:788kB inactive_file:836kB unevictable:0kB writepending:0kB present:29874176kB managed:29337252kB mlocked:0kB kernel_stack:8048kB pagetables:296kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:14964kB
> > [ 1383.769793] lowmem_reserve[]: 0 0 0 0 0
> > [ 1383.782429] Node 0 DMA: 0*4kB 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15880kB
> > [ 1383.823171] Node 0 DMA32: 4*4kB (UM) 21*8kB (UME) 9*16kB (UME) 5*32kB (ME) 5*64kB (UME) 5*128kB (UME) 6*256kB (UME) 5*512kB (UME) 5*1024kB (ME) 4*2048kB (UME) 25*4096kB (M) = 121256kB
> > [ 1383.874564] Node 0 Normal: 773*4kB (UMEC) 494*8kB (ME) 373*16kB (UMEC) 284*32kB (MEC) 177*64kB (UMEC) 108*128kB (UME) 36*256kB (UME) 9*512kB (UMEC) 0*1024kB 1*2048kB (C) 3*4096kB (C) = 75412kB
> > [ 1383.927787] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> > [ 1383.954002] 467 total pagecache pages
> > [ 1383.965889] 3 pages in swap cache
> > [ 1383.976715] Swap cache stats: add 1253, delete 1250, find 21/36
> > [ 1383.995325] Free swap  = 15610620kB
> > [ 1384.006675] Total swap = 15616764kB
> > [ 1384.018005] 8313052 pages RAM
> > [ 1384.027730] 0 pages HighMem/MovableOnly
> > [ 1384.040076] 150644 pages reserved
> > [ 1384.050845] 4096 pages cma reserved
> > [ 1384.062127] 0 pages hwpoisoned
> > [ 1384.072133] [ pid ]     uid  tgid total_vm     rss nr_ptes nr_pmds swapents oom_score_adj name
> > [ 1384.098531] [  983]       0   983    936       0       6       2       32          0 init
> > [ 1384.124971] [  984]       0   984    941       1       5       2       98          0 rc
> > [ 1384.150843] [ 1103]       0  1103    920       1       5       2      188          -1000 udevd
> > [ 1384.177534] [ 1311]       0  1311    925       1       5       2       67          -1000 net.agent
> > [ 1384.205278] [ 1352]       0  1352    925       1       5       2       66          -1000 net.agent
> > [ 1384.233017] [ 1703]       0  1703    926       1       5       2       68          -1000 net.agent
> > [ 1384.260731] [ 1935]       0  1935    587       0       5       2       31          0 bootlogd
> > [ 1384.288190] [ 2469]       0  2469    993       0       5       2      262          -1000 udevd
> > [ 1384.314846] [ 2470]       0  2470    993       0       5       2      261          -1000 udevd
> > [ 1384.341494] [ 3049]       0  3049     1538       1       6       2      177          0 S13mountall.sh
> > [ 1384.370576] [ 3125]       0  3125     1718       0       7       2      128          0 mount
> > [ 1384.397360] [15456]       0 15456    124       0       3       2       10          -1000 sleep
> > [ 1384.424026] [15457]       0 15457    124       0       3       2       12          -1000 sleep
> > [ 1384.450650] [15458]       0 15458    124       1       3       2       10          -1000 sleep
> > [ 1384.477317] Out of memory: Kill process 3049 (S13mountall.sh) score 0 or sacrifice child
> > [ 1384.502384] Killed process 3125 (mount) total-vm:6872kB, anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> > [ 1384.535964] oom_reaper: reaped process 3125 (mount), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> > [ 1384.573082] bcache_writebac invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null),  order=0, oom_score_adj=0
> > [ 1384.607340] bcache_writebac cpuset=/ mems_allowed=0
> > [ 1384.623102] CPU: 0 PID: 2359 Comm: bcache_writebac Tainted: G     U        4.13.0-rc5-amd64-preempt-sysrq-20170406+ #1
> > [ 1384.656825] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> > [ 1384.685361] Call Trace:
> > [ 1384.693823]    dump_stack+0x61/0x7d
> > [ 1384.704866]    dump_header+0x97/0x239
> > [ 1384.716086]    ? _raw_spin_unlock_irqrestore+0x14/0x24
> > [ 1384.731697]    oom_kill_process+0x86/0x379
> > [ 1384.744201]    out_of_memory+0x3b8/0x416
> > [ 1384.756259]    __alloc_pages_slowpath+0x890/0xa55
> > [ 1384.770536]    ? _raw_spin_unlock_irq+0x11/0x21
> > [ 1384.784302]    __alloc_pages_nodemask+0x141/0x1f5
> > [ 1384.798539]    alloc_pages_current+0x8d/0x96
> > [ 1384.811465]    bio_alloc_pages+0x29/0x6a
> > [ 1384.823334]    bch_writeback_thread+0x53b/0x6ff [bcache]
> > [ 1384.839334]    ? write_dirty+0x90/0x90 [bcache]
> > [ 1384.852984]    kthread+0xfb/0x100
> > [ 1384.862970]    ? init_completion+0x24/0x24
> > [ 1384.875285]    ? do_fast_syscall_32+0xb7/0xfe
> > [ 1384.888368]    ret_from_fork+0x25/0x30
> > [ 1384.899696] Mem-Info:
> > [ 1384.907064] active_anon:0 inactive_anon:2 isolated_anon:0
> > [ 1384.907064]    active_file:189 inactive_file:273 isolated_file:0
> > [ 1384.907064]    unevictable:0 dirty:0 writeback:0 unstable:0
> > [ 1384.907064]    slab_reclaimable:3414 slab_unreclaimable:8053934
> > [ 1384.907064]    mapped:1 shmem:2 pagetables:74 bounce:0
> > [ 1384.907064]    free:32075 free_pcp:25 free_cma:3741
> > [ 1384.922833] kworker/6:1H: page allocation failure: order:0, mode:0x1284020(GFP_ATOMIC|__GFP_COMP|__GFP_NOTRACK), nodemask=(null)
> > [ 1384.922836] kworker/6:1H cpuset=/ mems_allowed=0
> > [ 1384.922840] CPU: 6 PID: 400 Comm: kworker/6:1H Tainted: G     U        4.13.0-rc5-amd64-preempt-sysrq-20170406+ #1
> > [ 1384.922841] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> > [ 1384.922844] Workqueue: kblockd blk_mq_run_work_fn
> > [ 1384.922845] Call Trace:
> > [ 1384.922849]    dump_stack+0x61/0x7d
> > [ 1384.922851]    warn_alloc+0xfc/0x18c
> > [ 1384.922854]    __alloc_pages_slowpath+0x9ca/0xa55
> > [ 1384.922856]    ? __alloc_pages_slowpath+0x9ca/0xa55
> > [ 1384.922858]    __alloc_pages_nodemask+0x141/0x1f5
> > [ 1384.922862]    cache_grow_begin+0xa4/0x294
> > [ 1384.922863]    fallback_alloc+0x154/0x196
> > [ 1384.922865]    ? cache_grow_begin+0xa4/0x294
> > [ 1384.922867]    ____cache_alloc_node+0xdd/0xe9
> > [ 1384.922869]    kmem_cache_alloc+0x98/0x143
> > [ 1384.922873]    sas_alloc_task+0x1d/0x32 [libsas]
> > [ 1384.922876]    sas_ata_qc_issue+0x71/0x21c [libsas]
> > [ 1384.922878]    ata_qc_issue+0x1fc/0x24c
> > [ 1384.922880]    ? ata_scsi_write_same_xlat+0x2d1/0x2d1
> > [ 1384.922882]    __ata_scsi_queuecmd+0x18f/0x1eb
> > [ 1384.922883]    ata_sas_queuecmd+0x31/0x4d
> > [ 1384.922886]    sas_queuecommand+0x83/0x1cf [libsas]
> > [ 1384.922889]    ? blk_add_timer+0xcb/0x10f
> > [ 1384.922892]    scsi_dispatch_cmd+0x141/0x210
> > [ 1384.922893]    scsi_queue_rq+0x1c7/0x28f
> > [ 1384.922895]    blk_mq_dispatch_rq_list+0x1a6/0x2cf
> > [ 1384.922896]    ? find_next_bit+0xb/0xd
> > [ 1384.922899]    blk_mq_sched_dispatch_requests+0x14e/0x1e7
> > [ 1384.922900]    ? __switch_to+0x288/0x44b
> > [ 1384.922911]    __blk_mq_run_hw_queue+0x4c/0x7f
> > [ 1384.922912]    blk_mq_run_work_fn+0x2c/0x2e
> > [ 1384.922913]    process_one_work+0x179/0x2a5
> > [ 1384.922915]    ? rescuer_thread+0x273/0x273
> > [ 1384.922915]    worker_thread+0x1a8/0x25b
> > [ 1384.922917]    ? rescuer_thread+0x273/0x273
> > [ 1384.922917]    kthread+0xfb/0x100
> > [ 1384.922918]    ? init_completion+0x24/0x24
> > [ 1384.922919]    ? do_fast_syscall_32+0xb7/0xfe
> > [ 1384.922920]    ret_from_fork+0x25/0x30
> > [ 1384.922922] Mem-Info:
> > [ 1384.922924] active_anon:0 inactive_anon:2 isolated_anon:0
> > [ 1384.922924]    active_file:199 inactive_file:263 isolated_file:0
> > [ 1384.922924]    unevictable:0 dirty:0 writeback:0 unstable:0
> > [ 1384.922924]    slab_reclaimable:3414 slab_unreclaimable:8055394
> > [ 1384.922924]    mapped:1 shmem:2 pagetables:74 bounce:0
> > [ 1384.922924]    free:30587 free_pcp:18 free_cma:3741
> > [ 1384.922926] Node 0 active_anon:0kB inactive_anon:8kB active_file:796kB inactive_file:1052kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:4kB dirty:0kB writeback:0kB shmem:8kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
> > [ 1384.922926] Node 0 DMA free:15880kB min:32kB low:44kB high:56kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15964kB managed:15880kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> > [ 1384.922928] lowmem_reserve[]: 0 3201 31832 31832 31832
> > [ 1384.922930] Node 0 DMA32 free:91392kB min:6788kB low:10064kB high:13340kB active_anon:0kB inactive_anon:0kB active_file:56kB inactive_file:56kB unevictable:0kB writepending:0kB present:3362068kB managed:3296500kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:72kB local_pcp:72kB free_cma:0kB
> > [ 1384.922932] lowmem_reserve[]: 0 0 28631 28631 28631
> > [ 1384.922933] Node 0 Normal free:15076kB min:60760kB low:90092kB high:119424kB active_anon:0kB inactive_anon:8kB active_file:740kB inactive_file:996kB unevictable:0kB writepending:0kB present:29874176kB managed:29337252kB mlocked:0kB kernel_stack:8048kB pagetables:296kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:14964kB
> > [ 1384.922935] lowmem_reserve[]: 0 0 0 0 0
> > [ 1384.922936] Node 0 DMA: 0*4kB 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15880kB
> > [ 1384.922941] Node 0 DMA32: 2*4kB (UM) 3*8kB (ME) 4*16kB (UME) 5*32kB (ME) 4*64kB (ME) 4*128kB (ME) 5*256kB (ME) 4*512kB (ME) 5*1024kB (ME) 4*2048kB (UME) 18*4096kB (M) = 91392kB
> > [ 1384.922946] Node 0 Normal: 1*4kB (C) 0*8kB 1*16kB (C) 1*32kB (C) 1*64kB (C) 0*128kB 0*256kB 1*512kB (C) 0*1024kB 1*2048kB (C) 3*4096kB (C) = 14964kB
> > [ 1384.922951] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> > [ 1384.922951] 464 total pagecache pages
> > [ 1384.922953] 0 pages in swap cache
> > [ 1384.922954] Swap cache stats: add 1253, delete 1253, find 21/36
> > [ 1384.922954] Free swap  = 15611132kB
> > [ 1384.922954] Total swap = 15616764kB
> > [ 1384.922955] 8313052 pages RAM
> > [ 1384.922955] 0 pages HighMem/MovableOnly
> > [ 1384.922955] 150644 pages reserved
> > [ 1384.922956] 4096 pages cma reserved
> > [ 1384.922956] 0 pages hwpoisoned
> > [ 1385.007958] ata17.00: exception Emask 0x0 SAct 0x10000 SErr 0x0 action 0x0
> > [ 1385.007961] ata17.00: failed command: READ FPDMA QUEUED
> > [ 1385.007965] ata17.00: cmd 60/20:80:90:81:6f/00:00:35:01:00/40 tag 16 ncq dma 16384 in
> > [ 1385.007965]        res 40/00:78:10:2c:8d/00:00:f1:00:00/40 Emask 0x40 (internal error)
> > [ 1385.007966] ata17.00: status: { DRDY }
> > [ 1385.008982] ata17.00: Security Log not supported
> > [ 1385.010102] ata17.00: Security Log not supported
> > [ 1385.010104] ata17.00: configured for UDMA/133
> > [ 1385.010110] ata17: EH complete
> > [ 1385.010162] scsi_eh_10: page allocation failure: order:0, mode:0x1284020(GFP_ATOMIC|__GFP_COMP|__GFP_NOTRACK), nodemask=(null)
> > [ 1385.010164] scsi_eh_10 cpuset=/ mems_allowed=0
> > [ 1385.010175] CPU: 6 PID: 409 Comm: scsi_eh_10 Tainted: G     U      4.13.0-rc5-amd64-preempt-sysrq-20170406+ #1
> > [ 1385.010175] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> > [ 1385.010175] Call Trace:
> > [ 1385.010178]    dump_stack+0x61/0x7d
> > [ 1385.010179]    warn_alloc+0xfc/0x18c
> > [ 1385.010181]    __alloc_pages_slowpath+0x9ca/0xa55
> > [ 1385.010182]    ? __alloc_pages_slowpath+0x9ca/0xa55
> > [ 1385.010184]    __alloc_pages_nodemask+0x141/0x1f5
> > [ 1385.010186]    cache_grow_begin+0xa4/0x294
> > [ 1385.010187]    fallback_alloc+0x154/0x196
> > [ 1385.010188]    ? cache_grow_begin+0xa4/0x294
> > [ 1385.010189]    ____cache_alloc_node+0xdd/0xe9
> > [ 1385.010191]    kmem_cache_alloc+0x98/0x143
> > [ 1385.010193]    sas_alloc_task+0x1d/0x32 [libsas]
> > [ 1385.010195]    sas_ata_qc_issue+0x71/0x21c [libsas]
> > [ 1385.010196]    ata_qc_issue+0x1fc/0x24c
> > [ 1385.010198]    ? ata_scsi_write_same_xlat+0x2d1/0x2d1
> > [ 1385.010198]    __ata_scsi_queuecmd+0x18f/0x1eb
> > [ 1385.010200]    ata_sas_queuecmd+0x31/0x4d
> > [ 1385.010202]    sas_queuecommand+0x83/0x1cf [libsas]
> > [ 1385.010203]    ? blk_add_timer+0xcb/0x10f
> > [ 1385.010205]    scsi_dispatch_cmd+0x141/0x210
> > [ 1385.010205]    scsi_queue_rq+0x1c7/0x28f
> > [ 1385.010207]    blk_mq_dispatch_rq_list+0x1a6/0x2cf
> > [ 1385.010208]    blk_mq_sched_dispatch_requests+0x129/0x1e7
> > [ 1385.010209]    __blk_mq_run_hw_queue+0x4c/0x7f
> > [ 1385.010210]    __blk_mq_delay_run_hw_queue+0x5c/0xa2
> > [ 1385.010211]    blk_mq_run_hw_queue+0x14/0x16
> > [ 1385.010212]    blk_mq_run_hw_queues+0x2e/0x5e
> > [ 1385.010212]    scsi_run_queue+0x236/0x2c1
> > [ 1385.010214]    scsi_run_host_queues+0x1f/0x37
> > [ 1385.010215]    scsi_error_handler+0x467/0x523
> > [ 1385.010216]    ? __schedule+0x4f5/0x5c5
> > [ 1385.010217]    ? scsi_eh_get_sense+0x1a9/0x1a9
> > [ 1385.010218]    kthread+0xfb/0x100
> > [ 1385.010219]    ? init_completion+0x24/0x24
> > [ 1385.010220]    ret_from_fork+0x25/0x30
> > [ 1385.010260] ata17.00: exception Emask 0x0 SAct 0x20000 SErr 0x0 action 0x0
> > [ 1385.010263] ata17.00: failed command: READ FPDMA QUEUED
> > [ 1385.010266] ata17.00: cmd 60/20:88:90:81:6f/00:00:35:01:00/40 tag 17 ncq dma 16384 in
> > [ 1385.010266]        res 50/00:01:30:00:00/00:00:00:00:00/a0 Emask 0x40 (internal error)
> > [ 1385.010267] ata17.00: status: { DRDY }
> > [ 1385.011259] ata17.00: Security Log not supported
> > [ 1385.012380] ata17.00: Security Log not supported
> > [ 1385.012382] ata17.00: configured for UDMA/133
> > [ 1385.012385] ata17: EH complete
> > [ 1385.335912] mount: page allocation failure: order:0, mode:0x1604040(GFP_NOFS|__GFP_COMP|__GFP_NOTRACK), nodemask=(null)
> > [ 1385.335916] mount cpuset=/ mems_allowed=0
> > [ 1385.335920] CPU: 7 PID: 3125 Comm: mount Tainted: G       U          4.13.0-rc5-amd64-preempt-sysrq-20170406+ #1
> > [ 1385.335920] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> > [ 1385.335921] Call Trace:
> > [ 1385.335927]    dump_stack+0x61/0x7d
> > [ 1385.335930]    warn_alloc+0xfc/0x18c
> > [ 1385.335933]    ? call_timer_fn+0x140/0x140
> > [ 1385.335935]    __alloc_pages_slowpath+0x9ca/0xa55
> > [ 1385.335939]    __alloc_pages_nodemask+0x141/0x1f5
> > [ 1385.335943]    cache_grow_begin+0xa4/0x294
> > [ 1385.335945]    fallback_alloc+0x154/0x196
> > [ 1385.335946]    ? cache_grow_begin+0xa4/0x294
> > [ 1385.335948]    ____cache_alloc_node+0xdd/0xe9
> > [ 1385.335950]    kmem_cache_alloc_trace+0xa0/0xfc
> > [ 1385.335953]    add_tree_block+0x6a/0x1a1
> > [ 1385.335955]    build_ref_tree_for_root+0x1aa/0x3c8
> > [ 1385.335956]    btrfs_build_ref_tree+0x142/0x179
> > [ 1385.335958]    open_ctree+0x19af/0x1ffe
> > [ 1385.335961]    ? _raw_spin_unlock_bh+0x1a/0x1c
> > [ 1385.335964]    btrfs_mount+0xa0e/0xb86
> > [ 1385.335965]    ? btrfs_mount+0xa0e/0xb86
> > [ 1385.335967]    ? find_next_bit+0xb/0xd
> > [ 1385.335970]    mount_fs+0x67/0x111
> > [ 1385.335973]    vfs_kern_mount+0x6b/0xd5
> > [ 1385.335974]    btrfs_mount+0x1de/0xb86
> > [ 1385.335975]    ? find_next_bit+0xb/0xd
> > [ 1385.335978]    mount_fs+0x67/0x111
> > [ 1385.335979]    vfs_kern_mount+0x6b/0xd5
> > [ 1385.335981]    do_mount+0x6e9/0x987
> > [ 1385.335984]    compat_SyS_mount+0x185/0x1ae
> > [ 1385.335986]    do_fast_syscall_32+0xb7/0xfe
> > [ 1385.335988]    entry_SYSENTER_compat+0x4c/0x5b
> > [ 1385.335990] RIP: 0023:0xf7f69c29
> > [ 1385.335991] RSP: 002b:00000000ffa6fed0 EFLAGS: 00000297 ORIG_RAX: 0000000000000015
> > [ 1385.335992] RAX: ffffffffffffffda RBX: 0000000009877050 RCX: 00000000098771e8
> > [ 1385.335993] RDX: 0000000009877370 RSI: 00000000c0ed0400 RDI: 00000000098bd548
> > [ 1385.335993] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> > [ 1385.335994] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> > [ 1385.335994] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> > [ 1387.789938] Node 0 active_anon:588kB inactive_anon:300kB active_file:3988kB inactive_file:1428kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:2184kB dirty:0kB writeback:0kB shmem:8kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
> > [ 1387.871500] Node 0 DMA free:0kB min:32kB low:44kB high:56kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15964kB managed:15880kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> > [ 1387.949345] lowmem_reserve[]: 0 3201 31832 31832 31832
> > [ 1387.965376] Node 0 DMA32 free:621628kB min:6788kB low:10064kB high:13340kB active_anon:0kB inactive_anon:0kB active_file:84kB inactive_file:28kB unevictable:0kB writepending:0kB present:3362068kB managed:3296500kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:640kB local_pcp:0kB free_cma:0kB
> > [ 1388.049300] lowmem_reserve[]: 0 0 28631 28631 28631
> > [ 1388.064560] Node 0 Normal free:4812428kB min:60760kB low:90092kB high:119424kB active_anon:588kB inactive_anon:300kB active_file:3904kB inactive_file:1400kB unevictable:0kB writepending:0kB present:29874176kB managed:29337252kB mlocked:0kB kernel_stack:8080kB pagetables:320kB bounce:0kB free_pcp:4124kB local_pcp:420kB free_cma:11288kB
> > [ 1388.155296] lowmem_reserve[]: 0 0 0 0 0
> > [ 1388.167479] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
> > [ 1388.198947] Node 0 DMA32: 4430*4kB (U) 3580*8kB (U) 1883*16kB (U) 1009*32kB (U) 17*64kB (U) 16*128kB (U) 12*256kB (U) 10*512kB (U) 17*1024kB (U) 18*2048kB (U) 126*4096kB (U) = 690472kB
> > [ 1388.249622] Node 0 Normal: 71828*4kB (UC) 54033*8kB (UC) 34313*16kB (UC) 21097*32kB (UC) 10342*64kB (U) 2801*128kB (UC) 201*256kB (UC) 96*512kB (UC) 68*1024kB (U) 48*2048kB (UC) 457*4096kB (UC) = 5104520kB
> > [ 1388.305855] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> > [ 1388.331978] 1467 total pagecache pages
> > [ 1388.344026] 111 pages in swap cache
> > [ 1388.355282] Swap cache stats: add 1465, delete 1354, find 364/553
> > [ 1388.374360] Free swap  = 15611132kB
> > [ 1388.385607] Total swap = 15616764kB
> > [ 1388.396863] 8313052 pages RAM
> > [ 1388.406556] 0 pages HighMem/MovableOnly
> > [ 1388.418861] 150644 pages reserved
> > [ 1388.429595] 4096 pages cma reserved
> > [ 1388.440853] 0 pages hwpoisoned
> > [ 1388.450807] [ pid ]     uid  tgid total_vm     rss nr_ptes nr_pmds swapents oom_score_adj name
> > [ 1388.477200] [  983]       0   983    936       0       6       2       32          0 init
> > [ 1388.503586] [  984]       0   984    941       1       5       2       98          0 rc
> > [ 1388.529456] [ 1103]       0  1103    920       1       5       2      188          -1000 udevd
> > [ 1388.556123] [ 1311]       0  1311    925     443       5       2       24          -1000 net.agent
> > [ 1388.583800] [ 1352]       0  1352    925     441       5       2       26          -1000 net.agent
> > [ 1388.611490] [ 1703]       0  1703    926     442       5       2       26          -1000 net.agent
> > [ 1388.639176] [ 1935]       0  1935    587       0       5       2       31          0 bootlogd
> > [ 1388.666611] [ 2469]       0  2469    993       0       5       2      262          -1000 udevd
> > [ 1388.693254] [ 2470]       0  2470    993       0       5       2      261          -1000 udevd
> > [ 1388.719913] [ 3049]       0  3049     1538       1       6       2      177          0 S13mountall.sh
> > [ 1388.748886] [ 3125]       0  3125     1718       0       7       2        0          0 mount
> > [ 1388.775570] [15483]       0 15483    558     141       5       2        0          -1000 sleep
> > [ 1388.802207] [15484]       0 15484    558     146       4       2        0          -1000 sleep
> > [ 1388.828828] [15485]       0 15485    558     145       5       2        0          -1000 sleep
> > [ 1388.855456] Out of memory: Kill process 3049 (S13mountall.sh) score 0 or sacrifice child
> >
> > And hopefully totally unrelated (but maybe not), after the boot continues, it
> > crashes with:
> > [ 1523.299228] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffffbd1d4c2c
> > [ 1523.299228]
> > [ 1523.334262] CPU: 2 PID: 19932 Comm: avahi-daemon Tainted: G       U          4.13.0-rc5-amd64-preempt-sysrq-20170406+ #1
> > [ 1523.367142] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> > [ 1523.395339] Call Trace:
> > [ 1523.403515]    dump_stack+0x61/0x7d
> > [ 1523.414266]    panic+0xe7/0x235
> > [ 1523.423982]    ? compat_core_sys_select+0x25b/0x26d
> > [ 1523.438878]    __stack_chk_fail+0x19/0x19
> > [ 1523.451168]    compat_core_sys_select+0x25b/0x26d
> > [ 1523.465552]    ? compat_SyS_select+0xe/0x10
> > [ 1523.478358]    ? do_fast_syscall_32+0xb7/0xfe
> > [ 1523.491698]    ? entry_SYSENTER_compat+0x4c/0x5b
> > [ 1523.505858] Kernel Offset: 0x3c000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > [ 1523.538981] Rebooting in 20 seconds..
> >
> > I did add stack-protector in 4.13, and it seems to be finding an unrelated bug.
> >
> > Marc
> > --
> > "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> > Microsoft is to operating systems ....
> >                                      .... what McDonalds is to gourmet cooking
> > Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
  2017-09-02 16:52                           ` Josef Bacik
       [not found]                             ` <CAHKv19A=OVgCpQpDL2454T+f8QgLm9iynA8xZ4w4Kg8JjYS=UA@mail.gmail.com>
@ 2017-09-02 23:53                             ` Marc MERLIN
  2017-09-03  0:30                               ` Josef Bacik
  1 sibling, 1 reply; 47+ messages in thread
From: Marc MERLIN @ 2017-09-02 23:53 UTC (permalink / raw)
  To: Josef Bacik
  Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
	David Sterba

On Sat, Sep 02, 2017 at 04:52:20PM +0000, Josef Bacik wrote:
> Oops, ok I've updated my tree so we don't save the stack trace of the initial scan, which we don't need anyway.  That should save a decent amount of memory in your case.  It was an in place update so you'll need to blow away your local branch and pull the new one to get the new code.  Thanks,

Still did not work, unfortunately (on top of extra unrelated bugs in
4.13-rc5, like I was afraid).

Mounting the partition still sucks up all the memory:

[  358.719722] bcache_writebac invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null),  order=0, oom_score_adj=0
[  358.753716] bcache_writebac cpuset=/ mems_allowed=0
[  358.769071] CPU: 3 PID: 2339 Comm: bcache_writebac Tainted: G     U		4.13.0-rc5-amd64-stkreg-sysrq-20170902+ #2
[  358.802040] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
[  358.830082] Call Trace:
[  358.838108]	dump_stack+0x61/0x7d
[  358.848728]	dump_header+0x97/0x239
[  358.859846]	? _raw_spin_unlock_irqrestore+0x14/0x24
[  358.875398]	oom_kill_process+0x86/0x379
[  358.887838]	out_of_memory+0x3a6/0x3ef
[  358.899730]	__alloc_pages_slowpath+0x86e/0xa1f
[  358.913977]	? native_sched_clock+0x1a/0x37
[  358.927197]	__alloc_pages_nodemask+0x134/0x1d4
[  358.941432]	alloc_pages_current+0x8d/0x96
[  358.954343]	bio_alloc_pages+0x29/0x6a
[  358.966194]	bch_writeback_thread+0x51c/0x6d4 [bcache]
[  358.982206]	? write_dirty+0x90/0x90 [bcache]
[  358.995878]	kthread+0xfb/0x100
[  359.005899]	? init_completion+0x24/0x24
[  359.018242]	? do_fast_syscall_32+0xb7/0xfe
[  359.031360]	ret_from_fork+0x25/0x30
[  359.042723] Mem-Info:
[  359.050529] active_anon:0 inactive_anon:2 isolated_anon:0
[  359.050529]	active_file:306 inactive_file:163 isolated_file:0
[  359.050529]	unevictable:0 dirty:0 writeback:0 unstable:0
[  359.050529]	slab_reclaimable:3430 slab_unreclaimable:8034083
[  359.050529]	mapped:1 shmem:2 pagetables:80 bounce:0
[  359.050529]	free:51932 free_pcp:46 free_cma:3741
[  359.149971] Node 0 active_anon:0kB inactive_anon:8kB active_file:1128kB inactive_file:892kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:4kB dirty:0kB writeback:0kB shmem:8kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
[  359.229593] Node 0 DMA free:15880kB min:32kB low:44kB high:56kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15964kB managed:15880kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[  359.308570] lowmem_reserve[]: 0 3201 31832 31832 31832
[  359.324706] Node 0 DMA32 free:121124kB min:6788kB low:10064kB high:13340kB active_anon:0kB inactive_anon:0kB active_file:100kB inactive_file:0kB unevictable:0kB writepending:0kB present:3362068kB managed:3296500kB mlocked:0kB kernel_stack:16kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[  359.408498] lowmem_reserve[]: 0 0 28631 28631 28631
[  359.423773] Node 0 Normal free:70792kB min:60760kB low:90092kB high:119424kB active_anon:0kB inactive_anon:8kB active_file:780kB inactive_file:808kB unevictable:0kB writepending:0kB present:29874176kB managed:29337512kB mlocked:0kB kernel_stack:4608kB pagetables:320kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:14964kB
[  359.511284] lowmem_reserve[]: 0 0 0 0 0
[  359.523514] Node 0 DMA: 0*4kB 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15880kB
[  359.564260] Node 0 DMA32: 3*4kB (UME) 3*8kB (ME) 4*16kB (UME) 6*32kB (UME) 4*64kB (ME) 4*128kB (ME) 5*256kB (ME) 4*512kB (ME) 6*1024kB (UME) 4*2048kB (UME) 25*4096kB (M) = 121124kB
[  359.614116] Node 0 Normal: 559*4kB (UMEC) 272*8kB (ME) 163*16kB (UMEC) 93*32kB (UMEC) 65*64kB (MEC) 71*128kB (UME) 37*256kB (ME) 18*512kB (UMC) 8*1024kB (M) 4*2048kB (MC) 3*4096kB (C) = 70604kB
[  359.667377] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[  359.693519] 456 total pagecache pages
[  359.705331] 0 pages in swap cache
[  359.716151] Swap cache stats: add 1184, delete 1184, find 4/8
[  359.734213] Free swap  = 15610620kB
[  359.745499] Total swap = 15616764kB
[  359.756879] 8313052 pages RAM
[  359.766596] 0 pages HighMem/MovableOnly
[  359.778927] 150579 pages reserved
[  359.789686] 4096 pages cma reserved
[  359.801052] 0 pages hwpoisoned
[  359.811026] [ pid ]	 uid  tgid total_vm	 rss nr_ptes nr_pmds swapents oom_score_adj name
[  359.837419] [  967]	   0   967	936	   0	   6	   2	   32		  0 init
[  359.863819] [  968]	   0   968	941	   1	   5	   2	   98		  0 rc
[  359.889683] [ 1087]	   0  1087	942	   1	   5	   2	  212	      -1000 udevd
[  359.916457] [ 1294]	   0  1294	917	   1	   5	   2	   60	      -1000 net.agent
[  359.944236] [ 1340]	   0  1340	917	   1	   5	   2	   59	      -1000 net.agent
[  359.971915] [ 1750]	   0  1750	918	   1	   5	   2	   59	      -1000 net.agent
[  359.999603] [ 1915]	   0  1915	587	   0	   5	   2	   31		  0 bootlogd
[  360.027033] [ 2442]	   0  2442	942	   0	   5	   2	  211	      -1000 udevd
[  360.053685] [ 2443]	   0  2443	942	   0	   5	   2	  211	      -1000 udevd
[  360.080467] [ 3023]	   0  3023     1538	   1	   6	   2	  177		  0 S13mountall.sh
[  360.109446] [ 3078]	   0  3078     1719	   1	   7	   2	  129		  0 mount
[  360.136111] [ 5722]	   0  5722	558	   0	   5	   2	   16	      -1000 sleep
[  360.162742] [ 5723]	   0  5723	558	   0	   5	   2	   17	      -1000 sleep
[  360.189358] [ 5724]	   0  5724	558	   0	   5	   2	   17	      -1000 sleep
[  360.215977] Out of memory: Kill process 3023 (S13mountall.sh) score 0 or sacrifice child
[  360.241102] Killed process 3078 (mount) total-vm:6876kB, anon-rss:0kB, file-rss:4kB, shmem-rss:0kB
[  360.276193] oom_reaper: reaped process 3078 (mount), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[  360.308435] bcache_writebac invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null),  order=0, oom_score_adj=0
[  360.342339] bcache_writebac cpuset=/ mems_allowed=0
[  360.357757] CPU: 1 PID: 2339 Comm: bcache_writebac Tainted: G     U		4.13.0-rc5-amd64-stkreg-sysrq-20170902+ #2
[  360.390847] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
[  360.419010] Call Trace:
[  360.427123]	dump_stack+0x61/0x7d
[  360.437815]	dump_header+0x97/0x239
[  360.449000]	? _raw_spin_unlock_irqrestore+0x14/0x24
[  360.464705]	oom_kill_process+0x86/0x379
[  360.477180]	out_of_memory+0x3a6/0x3ef
[  360.489120]	__alloc_pages_slowpath+0x86e/0xa1f
[  360.503360]	? native_sched_clock+0x1a/0x37
[  360.516712]	__alloc_pages_nodemask+0x134/0x1d4
[  360.530950]	alloc_pages_current+0x8d/0x96
[  360.543852]	bio_alloc_pages+0x29/0x6a
[  360.555696]	bch_writeback_thread+0x51c/0x6d4 [bcache]
[  360.571694]	? write_dirty+0x90/0x90 [bcache]
[  360.585320]	kthread+0xfb/0x100
[  360.595281]	? init_completion+0x24/0x24
[  360.607571]	? do_fast_syscall_32+0xb7/0xfe
[  360.620725]	ret_from_fork+0x25/0x30
[  360.632000] Mem-Info:
[  360.639511] active_anon:0 inactive_anon:2 isolated_anon:0
[  360.639511]	active_file:237 inactive_file:181 isolated_file:0
[  360.639511]	unevictable:0 dirty:0 writeback:0 unstable:0
[  360.639511]	slab_reclaimable:3428 slab_unreclaimable:8054968
[  360.639511]	mapped:1 shmem:2 pagetables:80 bounce:0
[  360.639511]	free:31221 free_pcp:20 free_cma:3741
[  360.738644] Node 0 active_anon:0kB inactive_anon:8kB active_file:1016kB inactive_file:980kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:4kB dirty:0kB writeback:0kB shmem:8kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[  360.818057] Node 0 DMA free:15880kB min:32kB low:44kB high:56kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15964kB managed:15880kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[  360.897260] lowmem_reserve[]: 0 3201 31832 31832 31832
[  360.913324] Node 0 DMA32 free:4760kB min:6788kB low:10064kB high:13340kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:324kB unevictable:0kB writepending:0kB present:3362068kB managed:3296500kB mlocked:0kB kernel_stack:16kB pagetables:0kB bounce:0kB free_pcp:28kB local_pcp:0kB free_cma:0kB
[  360.968034] mount: page allocation failure: order:0, mode:0x1604040(GFP_NOFS|__GFP_COMP|__GFP_NOTRACK), nodemask=(null)
[  360.968038] mount cpuset=/ mems_allowed=0
[  360.968042] CPU: 0 PID: 3078 Comm: mount Tainted: G	   U	      4.13.0-rc5-amd64-stkreg-sysrq-20170902+ #2
[  360.968043] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
[  360.968044] Call Trace:
[  360.968050]	dump_stack+0x61/0x7d
[  360.968052]	warn_alloc+0xe4/0x15d
[  360.968056]	? call_timer_fn+0x140/0x140
[  360.968058]	__alloc_pages_slowpath+0x9a8/0xa1f
[  360.968068]	? __rmqueue+0x285/0x297
[  360.968071]	__alloc_pages_nodemask+0x134/0x1d4
[  360.968075]	cache_grow_begin+0x95/0x26f
[  360.968077]	fallback_alloc+0x154/0x196
[  360.968079]	____cache_alloc_node+0xdd/0xe9
[  360.968081]	kmem_cache_alloc_trace+0xa0/0xfc
[  360.968084]	add_tree_block+0x6a/0x1ac
[  360.968086]	build_ref_tree_for_root+0x19b/0x3a5
[  360.968088]	btrfs_build_ref_tree+0x133/0x156
[  360.968090]	open_ctree+0x1997/0x1fd2
[  360.968093]	btrfs_mount+0x9d5/0xb2d
[  360.968094]	? btrfs_mount+0x9d5/0xb2d
[  360.968096]	? find_next_bit+0xb/0xd
[  360.968099]	mount_fs+0x67/0x111
[  360.968101]	vfs_kern_mount+0x6b/0xd5
[  360.968102]	btrfs_mount+0x1c3/0xb2d
[  360.968103]	? find_next_bit+0xb/0xd
[  360.968106]	mount_fs+0x67/0x111
[  360.968107]	vfs_kern_mount+0x6b/0xd5
[  360.968109]	do_mount+0x6da/0x964
[  360.968111]	? slab_post_alloc_hook.isra.46+0xe/0x1d
[  360.968113]	compat_SyS_mount+0x185/0x1ae
[  360.968116]	do_fast_syscall_32+0xb7/0xfe
[  360.968118]	entry_SYSENTER_compat+0x4c/0x5b
[  360.968119] RIP: 0023:0xf7f21c29
[  360.968120] RSP: 002b:00000000ffd733e0 EFLAGS: 00000297 ORIG_RAX: 0000000000000015
[  360.968122] RAX: ffffffffffffffda RBX: 0000000008ff32c8 RCX: 0000000008ff3118
[  360.968122] RDX: 0000000008ff3010 RSI: 00000000c0ed0400 RDI: 0000000008ff6500
[  360.968123] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[  360.968124] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  360.968124] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  360.968126] Mem-Info:
[  360.968130] active_anon:0 inactive_anon:2 isolated_anon:0
[  360.968130]	active_file:333 inactive_file:144 isolated_file:0
[  360.968130]	unevictable:0 dirty:0 writeback:0 unstable:0
[  360.968130]	slab_reclaimable:3428 slab_unreclaimable:8082489
[  360.968130]	mapped:1 shmem:2 pagetables:80 bounce:0
[  360.968130]	free:3717 free_pcp:9 free_cma:3741
[  360.968132] Node 0 active_anon:0kB inactive_anon:8kB active_file:1332kB inactive_file:576kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:4kB dirty:0kB writeback:0kB shmem:8kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
[  360.968133] Node 0 DMA free:0kB min:32kB low:44kB high:56kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15964kB managed:15880kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[  360.968136] lowmem_reserve[]: 0 3201 31832 31832 31832
[  360.968138] Node 0 DMA32 free:296kB min:6788kB low:10064kB high:13340kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:292kB unevictable:0kB writepending:0kB present:3362068kB managed:3296500kB mlocked:0kB kernel_stack:16kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[  360.968141] lowmem_reserve[]: 0 0 28631 28631 28631
[  360.968143] Node 0 Normal free:14572kB min:60760kB low:90092kB high:119424kB active_anon:0kB inactive_anon:8kB active_file:920kB inactive_file:504kB unevictable:0kB writepending:0kB present:29874176kB managed:29337512kB mlocked:0kB kernel_stack:4608kB pagetables:320kB bounce:0kB free_pcp:36kB local_pcp:0kB free_cma:14964kB
[  360.968146] lowmem_reserve[]: 0 0 0 0 0
[  360.968147] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[  360.968152] Node 0 DMA32: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[  360.968157] Node 0 Normal: 1*4kB (C) 0*8kB 1*16kB (C) 1*32kB (C) 1*64kB (C) 0*128kB 0*256kB 1*512kB (C) 0*1024kB 1*2048kB (C) 3*4096kB (C) = 14964kB
[  360.968164] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[  360.968165] 495 total pagecache pages
[  360.968167] 0 pages in swap cache
[  360.968168] Swap cache stats: add 1184, delete 1184, find 4/8
[  360.968169] Free swap  = 15611132kB
[  360.968169] Total swap = 15616764kB
[  360.968170] 8313052 pages RAM
[  360.968170] 0 pages HighMem/MovableOnly
[  360.968170] 150579 pages reserved
[  360.968171] 4096 pages cma reserved
[  360.968171] 0 pages hwpoisoned
[  362.324699] lowmem_reserve[]: 0 0 28631 28631 28631
[  362.340147] Node 0 Normal free:2422448kB min:60760kB low:90092kB high:119424kB active_anon:636kB inactive_anon:196kB active_file:3788kB inactive_file:1424kB unevictable:0kB writepending:0kB present:29874176kB managed:29337512kB mlocked:0kB kernel_stack:4608kB pagetables:324kB bounce:0kB free_pcp:4100kB local_pcp:48kB free_cma:11468kB
[  362.431162] lowmem_reserve[]: 0 0 0 0 0
[  362.443496] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[  362.475115] Node 0 DMA32: 4469*4kB (U) 3639*8kB (U) 3116*16kB (U) 1233*32kB (U) 64*64kB (U) 9*128kB (U) 4*256kB (U) 6*512kB (U) 3*1024kB (U) 6*2048kB (U) 45*4096kB (U) = 345324kB
[  362.524575] Node 0 Normal: 53138*4kB (UMC) 40392*8kB (UMC) 29104*16kB (UM) 16364*32kB (U) 4513*64kB (UC) 658*128kB (UC) 25*256kB (U) 18*512kB (U) 20*1024kB (UC) 19*2048kB (UC) 193*4096kB (UC) = 2763592kB
[  362.580622] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[  362.606863] 1429 total pagecache pages
[  362.619062] 96 pages in swap cache
[  362.630199] Swap cache stats: add 1396, delete 1300, find 237/380
[  362.649433] Free swap  = 15611132kB
[  362.660833] Total swap = 15616764kB
[  362.672274] 8313052 pages RAM
[  362.682133] 0 pages HighMem/MovableOnly
[  362.694909] 150579 pages reserved
[  362.705741] 4096 pages cma reserved
[  362.717084] 0 pages hwpoisoned
[  362.727099] [ pid ]	 uid  tgid total_vm	 rss nr_ptes nr_pmds swapents oom_score_adj name
[  362.753547] [  967]	   0   967	936	   0	   6	   2	   32		  0 init
[  362.779963] [  968]	   0   968	941	   1	   5	   2	   98		  0 rc
[  362.805846] [ 1087]	   0  1087	942	   1	   5	   2	  212	      -1000 udevd
[  362.832511] [ 1294]	   0  1294	917	 421	   5	   2	   24	      -1000 net.agent
[  362.860203] [ 1340]	   0  1340	917	 396	   5	   2	   28	      -1000 net.agent
[  362.887851] [ 1750]	   0  1750	918	 419	   5	   2	   25	      -1000 net.agent
[  362.915478] [ 1915]	   0  1915	587	   0	   5	   2	   31		  0 bootlogd
[  362.942839] [ 2442]	   0  2442	942	   0	   5	   2	  211	      -1000 udevd
[  362.969390] [ 2443]	   0  2443	942	   0	   5	   2	  211	      -1000 udevd
[  362.995922] [ 3023]	   0  3023     1538	   1	   6	   2	  177		  0 S13mountall.sh
[  363.024797] [ 3078]	   0  3078     1719	   0	   7	   2	    0		  0 mount
[  363.051309] [ 5743]	   0  5743	558	 147	   5	   2	    0	      -1000 sleep
[  363.077802] [ 5744]	   0  5744	558	 148	   5	   2	    0	      -1000 sleep
[  363.104305] [ 5745]	   0  5745	558	 143	   5	   2	    0	      -1000 sleep
[  363.130752] Out of memory: Kill process 3023 (S13mountall.sh) score 0 or sacrifice child
[  363.155660] Killed process 3023 (S13mountall.sh) total-vm:6152kB, anon-rss:0kB, file-rss:4kB, shmem-rss:0kB
/etc/init.d/rc: line 120:  3023 Killed			$debug "$script" $action
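
For scale, the Mem-Info figures above can be converted from pages to GiB.
This is a minimal Python sketch, not part of the original report: it assumes
4 KiB pages (the x86-64 default), and the helper names slab_share and
pages_to_gib are made up for illustration.

```python
import re

PAGE_KIB = 4  # assumption: 4 KiB pages, the x86-64 default


def pages_to_gib(pages):
    """Convert a page count to GiB, assuming PAGE_KIB-sized pages."""
    return pages * PAGE_KIB / (1024 * 1024)


def slab_share(log_text):
    """Return (unreclaimable slab pages, total RAM pages, slab/RAM fraction)
    parsed from the text of a kernel OOM report."""
    slab = re.search(r"slab_unreclaimable:(\d+)", log_text)
    ram = re.search(r"(\d+) pages RAM", log_text)
    if slab is None or ram is None:
        raise ValueError("no slab_unreclaimable or 'pages RAM' line found")
    slab_pages = int(slab.group(1))
    ram_pages = int(ram.group(1))
    return slab_pages, ram_pages, slab_pages / ram_pages


# Two lines copied from the first OOM report in this message.
SAMPLE = (
    "[  359.050529]\tslab_reclaimable:3430 slab_unreclaimable:8034083\n"
    "[  359.756879] 8313052 pages RAM\n"
)

slab_pages, ram_pages, fraction = slab_share(SAMPLE)
print(f"unreclaimable slab: {pages_to_gib(slab_pages):.1f} GiB "
      f"of {pages_to_gib(ram_pages):.1f} GiB RAM ({fraction:.1%})")
```

Under that page-size assumption, unreclaimable slab accounts for about
30.6 GiB of roughly 31.7 GiB of RAM at the first OOM, which fits the
add_tree_block allocations in the mount trace being the consumer.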

-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
  2017-09-02 23:53                             ` Marc MERLIN
@ 2017-09-03  0:30                               ` Josef Bacik
  2017-09-03  1:01                                 ` Marc MERLIN
  0 siblings, 1 reply; 47+ messages in thread
From: Josef Bacik @ 2017-09-03  0:30 UTC (permalink / raw)
  To: Marc MERLIN
  Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
	David Sterba

My bad, I forgot that I don't dynamically allocate the stack trace space, so my patch did nothing. I blame the children for distracting me.  I've dropped allocating the action altogether for the on-disk stuff, which should dramatically reduce the memory usage.  You can just do a git pull, since I made a new commit.  You are mounting with -o ref_verify on only the one fs, right?  Give this a try, and if it still doesn't work we can try a stripped-down version that doesn't build the initial tree and just hope that the problem is in allocating a new block and not in modifying the refs for an existing block.  Thanks,
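
One way to double-check that only one filesystem has ref_verify enabled is
to scan /proc/mounts. This is a rough Python sketch under two assumptions:
that the patched kernel lists ref_verify among the mount options it reports,
and that the device and mount-point names in the sample are hypothetical
stand-ins. The function name is made up for illustration.

```python
def btrfs_mounts_with_option(mounts_text, option="ref_verify"):
    """Return the mount points of btrfs filesystems whose option list
    contains `option`, given text in /proc/mounts format."""
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts columns: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[2] != "btrfs":
            continue
        if option in fields[3].split(","):
            hits.append(fields[1])
    return hits


# Hypothetical sample; on a live system, pass open("/proc/mounts").read().
SAMPLE = (
    "/dev/mapper/pool1 /mnt/pool1 btrfs rw,noatime,ref_verify 0 0\n"
    "/dev/mapper/pool2 /mnt/pool2 btrfs rw,noatime 0 0\n"
    "/dev/sda1 / ext4 rw,relatime 0 0\n"
)

print(btrfs_mounts_with_option(SAMPLE))
```

If the returned list has more than one entry, more than one filesystem is
building the reference tree at mount time.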

Josef

> On Sep 2, 2017, at 7:54 PM, Marc MERLIN <marc@merlins.org> wrote:
> 
>> On Sat, Sep 02, 2017 at 04:52:20PM +0000, Josef Bacik wrote:
>> Oops, ok I've updated my tree so we don't save the stack trace of the initial scan, which we don't need anyway.  That should save a decent amount of memory in your case.  It was an in place update so you'll need to blow away your local branch and pull the new one to get the new code.  Thanks,
> 
> Still did not work, unfortunately (on top of extra unrelated bugs in
> 4.13-rc5, like I was afraid).
> 
> Mounting the partition still sucks up all the memory:
> 
> [  358.719722] bcache_writebac invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null),  order=0, oom_score_adj=0
> [  358.753716] bcache_writebac cpuset=/ mems_allowed=0
> [  358.769071] CPU: 3 PID: 2339 Comm: bcache_writebac Tainted: G     U        4.13.0-rc5-amd64-stkreg-sysrq-20170902+ #2
> [  358.802040] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> [  358.830082] Call Trace:
> [  358.838108]    dump_stack+0x61/0x7d
> [  358.848728]    dump_header+0x97/0x239
> [  358.859846]    ? _raw_spin_unlock_irqrestore+0x14/0x24
> [  358.875398]    oom_kill_process+0x86/0x379
> [  358.887838]    out_of_memory+0x3a6/0x3ef
> [  358.899730]    __alloc_pages_slowpath+0x86e/0xa1f
> [  358.913977]    ? native_sched_clock+0x1a/0x37
> [  358.927197]    __alloc_pages_nodemask+0x134/0x1d4
> [  358.941432]    alloc_pages_current+0x8d/0x96
> [  358.954343]    bio_alloc_pages+0x29/0x6a
> [  358.966194]    bch_writeback_thread+0x51c/0x6d4 [bcache]
> [  358.982206]    ? write_dirty+0x90/0x90 [bcache]
> [  358.995878]    kthread+0xfb/0x100
> [  359.005899]    ? init_completion+0x24/0x24
> [  359.018242]    ? do_fast_syscall_32+0xb7/0xfe
> [  359.031360]    ret_from_fork+0x25/0x30
> [  359.042723] Mem-Info:
> [  359.050529] active_anon:0 inactive_anon:2 isolated_anon:0
> [  359.050529]    active_file:306 inactive_file:163 isolated_file:0
> [  359.050529]    unevictable:0 dirty:0 writeback:0 unstable:0
> [  359.050529]    slab_reclaimable:3430 slab_unreclaimable:8034083
> [  359.050529]    mapped:1 shmem:2 pagetables:80 bounce:0
> [  359.050529]    free:51932 free_pcp:46 free_cma:3741
> [  359.149971] Node 0 active_anon:0kB inactive_anon:8kB active_file:1128kB inactive_file:892kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:4kB dirty:0kB writeback:0kB shmem:8kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
> [  359.229593] Node 0 DMA free:15880kB min:32kB low:44kB high:56kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15964kB managed:15880kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> [  359.308570] lowmem_reserve[]: 0 3201 31832 31832 31832
> [  359.324706] Node 0 DMA32 free:121124kB min:6788kB low:10064kB high:13340kB active_anon:0kB inactive_anon:0kB active_file:100kB inactive_file:0kB unevictable:0kB writepending:0kB present:3362068kB managed:3296500kB mlocked:0kB kernel_stack:16kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> [  359.408498] lowmem_reserve[]: 0 0 28631 28631 28631
> [  359.423773] Node 0 Normal free:70792kB min:60760kB low:90092kB high:119424kB active_anon:0kB inactive_anon:8kB active_file:780kB inactive_file:808kB unevictable:0kB writepending:0kB present:29874176kB managed:29337512kB mlocked:0kB kernel_stack:4608kB pagetables:320kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:14964kB
> [  359.511284] lowmem_reserve[]: 0 0 0 0 0
> [  359.523514] Node 0 DMA: 0*4kB 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15880kB
> [  359.564260] Node 0 DMA32: 3*4kB (UME) 3*8kB (ME) 4*16kB (UME) 6*32kB (UME) 4*64kB (ME) 4*128kB (ME) 5*256kB (ME) 4*512kB (ME) 6*1024kB (UME) 4*2048kB (UME) 25*4096kB (M) = 121124kB
> [  359.614116] Node 0 Normal: 559*4kB (UMEC) 272*8kB (ME) 163*16kB (UMEC) 93*32kB (UMEC) 65*64kB (MEC) 71*128kB (UME) 37*256kB (ME) 18*512kB (UMC) 8*1024kB (M) 4*2048kB (MC) 3*4096kB (C) = 70604kB
> [  359.667377] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> [  359.693519] 456 total pagecache pages
> [  359.705331] 0 pages in swap cache
> [  359.716151] Swap cache stats: add 1184, delete 1184, find 4/8
> [  359.734213] Free swap  = 15610620kB
> [  359.745499] Total swap = 15616764kB
> [  359.756879] 8313052 pages RAM
> [  359.766596] 0 pages HighMem/MovableOnly
> [  359.778927] 150579 pages reserved
> [  359.789686] 4096 pages cma reserved
> [  359.801052] 0 pages hwpoisoned
> [  359.811026] [ pid ]     uid  tgid total_vm     rss nr_ptes nr_pmds swapents oom_score_adj name
> [  359.837419] [  967]       0   967    936       0       6       2       32          0 init
> [  359.863819] [  968]       0   968    941       1       5       2       98          0 rc
> [  359.889683] [ 1087]       0  1087    942       1       5       2      212          -1000 udevd
> [  359.916457] [ 1294]       0  1294    917       1       5       2       60          -1000 net.agent
> [  359.944236] [ 1340]       0  1340    917       1       5       2       59          -1000 net.agent
> [  359.971915] [ 1750]       0  1750    918       1       5       2       59          -1000 net.agent
> [  359.999603] [ 1915]       0  1915    587       0       5       2       31          0 bootlogd
> [  360.027033] [ 2442]       0  2442    942       0       5       2      211          -1000 udevd
> [  360.053685] [ 2443]       0  2443    942       0       5       2      211          -1000 udevd
> [  360.080467] [ 3023]       0  3023     1538       1       6       2      177          0 S13mountall.sh
> [  360.109446] [ 3078]       0  3078     1719       1       7       2      129          0 mount
> [  360.136111] [ 5722]       0  5722    558       0       5       2       16          -1000 sleep
> [  360.162742] [ 5723]       0  5723    558       0       5       2       17          -1000 sleep
> [  360.189358] [ 5724]       0  5724    558       0       5       2       17          -1000 sleep
> [  360.215977] Out of memory: Kill process 3023 (S13mountall.sh) score 0 or sacrifice child
> [  360.241102] Killed process 3078 (mount) total-vm:6876kB, anon-rss:0kB, file-rss:4kB, shmem-rss:0kB
> [  360.276193] oom_reaper: reaped process 3078 (mount), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> [  360.308435] bcache_writebac invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null),  order=0, oom_score_adj=0
> [  360.342339] bcache_writebac cpuset=/ mems_allowed=0
> [  360.357757] CPU: 1 PID: 2339 Comm: bcache_writebac Tainted: G     U        4.13.0-rc5-amd64-stkreg-sysrq-20170902+ #2
> [  360.390847] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> [  360.419010] Call Trace:
> [  360.427123]    dump_stack+0x61/0x7d
> [  360.437815]    dump_header+0x97/0x239
> [  360.449000]    ? _raw_spin_unlock_irqrestore+0x14/0x24
> [  360.464705]    oom_kill_process+0x86/0x379
> [  360.477180]    out_of_memory+0x3a6/0x3ef
> [  360.489120]    __alloc_pages_slowpath+0x86e/0xa1f
> [  360.503360]    ? native_sched_clock+0x1a/0x37
> [  360.516712]    __alloc_pages_nodemask+0x134/0x1d4
> [  360.530950]    alloc_pages_current+0x8d/0x96
> [  360.543852]    bio_alloc_pages+0x29/0x6a
> [  360.555696]    bch_writeback_thread+0x51c/0x6d4 [bcache]
> [  360.571694]    ? write_dirty+0x90/0x90 [bcache]
> [  360.585320]    kthread+0xfb/0x100
> [  360.595281]    ? init_completion+0x24/0x24
> [  360.607571]    ? do_fast_syscall_32+0xb7/0xfe
> [  360.620725]    ret_from_fork+0x25/0x30
> [  360.632000] Mem-Info:
> [  360.639511] active_anon:0 inactive_anon:2 isolated_anon:0
> [  360.639511]    active_file:237 inactive_file:181 isolated_file:0
> [  360.639511]    unevictable:0 dirty:0 writeback:0 unstable:0
> [  360.639511]    slab_reclaimable:3428 slab_unreclaimable:8054968
> [  360.639511]    mapped:1 shmem:2 pagetables:80 bounce:0
> [  360.639511]    free:31221 free_pcp:20 free_cma:3741
> [  360.738644] Node 0 active_anon:0kB inactive_anon:8kB active_file:1016kB inactive_file:980kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:4kB dirty:0kB writeback:0kB shmem:8kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
> [  360.818057] Node 0 DMA free:15880kB min:32kB low:44kB high:56kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15964kB managed:15880kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> [  360.897260] lowmem_reserve[]: 0 3201 31832 31832 31832
> [  360.913324] Node 0 DMA32 free:4760kB min:6788kB low:10064kB high:13340kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:324kB unevictable:0kB writepending:0kB present:3362068kB managed:3296500kB mlocked:0kB kernel_stack:16kB pagetables:0kB bounce:0kB free_pcp:28kB local_pcp:0kB free_cma:0kB
> [  360.968034] mount: page allocation failure: order:0, mode:0x1604040(GFP_NOFS|__GFP_COMP|__GFP_NOTRACK), nodemask=(null)
> [  360.968038] mount cpuset=/ mems_allowed=0
> [  360.968042] CPU: 0 PID: 3078 Comm: mount Tainted: G       U          4.13.0-rc5-amd64-stkreg-sysrq-20170902+ #2
> [  360.968043] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> [  360.968044] Call Trace:
> [  360.968050]    dump_stack+0x61/0x7d
> [  360.968052]    warn_alloc+0xe4/0x15d
> [  360.968056]    ? call_timer_fn+0x140/0x140
> [  360.968058]    __alloc_pages_slowpath+0x9a8/0xa1f
> [  360.968068]    ? __rmqueue+0x285/0x297
> [  360.968071]    __alloc_pages_nodemask+0x134/0x1d4
> [  360.968075]    cache_grow_begin+0x95/0x26f
> [  360.968077]    fallback_alloc+0x154/0x196
> [  360.968079]    ____cache_alloc_node+0xdd/0xe9
> [  360.968081]    kmem_cache_alloc_trace+0xa0/0xfc
> [  360.968084]    add_tree_block+0x6a/0x1ac
> [  360.968086]    build_ref_tree_for_root+0x19b/0x3a5
> [  360.968088]    btrfs_build_ref_tree+0x133/0x156
> [  360.968090]    open_ctree+0x1997/0x1fd2
> [  360.968093]    btrfs_mount+0x9d5/0xb2d
> [  360.968094]    ? btrfs_mount+0x9d5/0xb2d
> [  360.968096]    ? find_next_bit+0xb/0xd
> [  360.968099]    mount_fs+0x67/0x111
> [  360.968101]    vfs_kern_mount+0x6b/0xd5
> [  360.968102]    btrfs_mount+0x1c3/0xb2d
> [  360.968103]    ? find_next_bit+0xb/0xd
> [  360.968106]    mount_fs+0x67/0x111
> [  360.968107]    vfs_kern_mount+0x6b/0xd5
> [  360.968109]    do_mount+0x6da/0x964
> [  360.968111]    ? slab_post_alloc_hook.isra.46+0xe/0x1d
> [  360.968113]    compat_SyS_mount+0x185/0x1ae
> [  360.968116]    do_fast_syscall_32+0xb7/0xfe
> [  360.968118]    entry_SYSENTER_compat+0x4c/0x5b
> [  360.968119] RIP: 0023:0xf7f21c29
> [  360.968120] RSP: 002b:00000000ffd733e0 EFLAGS: 00000297 ORIG_RAX: 0000000000000015
> [  360.968122] RAX: ffffffffffffffda RBX: 0000000008ff32c8 RCX: 0000000008ff3118
> [  360.968122] RDX: 0000000008ff3010 RSI: 00000000c0ed0400 RDI: 0000000008ff6500
> [  360.968123] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> [  360.968124] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [  360.968124] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [  360.968126] Mem-Info:
> [  360.968130] active_anon:0 inactive_anon:2 isolated_anon:0
> [  360.968130]    active_file:333 inactive_file:144 isolated_file:0
> [  360.968130]    unevictable:0 dirty:0 writeback:0 unstable:0
> [  360.968130]    slab_reclaimable:3428 slab_unreclaimable:8082489
> [  360.968130]    mapped:1 shmem:2 pagetables:80 bounce:0
> [  360.968130]    free:3717 free_pcp:9 free_cma:3741
> [  360.968132] Node 0 active_anon:0kB inactive_anon:8kB active_file:1332kB inactive_file:576kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:4kB dirty:0kB writeback:0kB shmem:8kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
> [  360.968133] Node 0 DMA free:0kB min:32kB low:44kB high:56kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15964kB managed:15880kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> [  360.968136] lowmem_reserve[]: 0 3201 31832 31832 31832
> [  360.968138] Node 0 DMA32 free:296kB min:6788kB low:10064kB high:13340kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:292kB unevictable:0kB writepending:0kB present:3362068kB managed:3296500kB mlocked:0kB kernel_stack:16kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> [  360.968141] lowmem_reserve[]: 0 0 28631 28631 28631
> [  360.968143] Node 0 Normal free:14572kB min:60760kB low:90092kB high:119424kB active_anon:0kB inactive_anon:8kB active_file:920kB inactive_file:504kB unevictable:0kB writepending:0kB present:29874176kB managed:29337512kB mlocked:0kB kernel_stack:4608kB pagetables:320kB bounce:0kB free_pcp:36kB local_pcp:0kB free_cma:14964kB
> [  360.968146] lowmem_reserve[]: 0 0 0 0 0
> [  360.968147] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
> [  360.968152] Node 0 DMA32: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
> [  360.968157] Node 0 Normal: 1*4kB (C) 0*8kB 1*16kB (C) 1*32kB (C) 1*64kB (C) 0*128kB 0*256kB 1*512kB (C) 0*1024kB 1*2048kB (C) 3*4096kB (C) = 14964kB
> [  360.968164] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> [  360.968165] 495 total pagecache pages
> [  360.968167] 0 pages in swap cache
> [  360.968168] Swap cache stats: add 1184, delete 1184, find 4/8
> [  360.968169] Free swap  = 15611132kB
> [  360.968169] Total swap = 15616764kB
> [  360.968170] 8313052 pages RAM
> [  360.968170] 0 pages HighMem/MovableOnly
> [  360.968170] 150579 pages reserved
> [  360.968171] 4096 pages cma reserved
> [  360.968171] 0 pages hwpoisoned
> [  362.324699] lowmem_reserve[]: 0 0 28631 28631 28631
> [  362.340147] Node 0 Normal free:2422448kB min:60760kB low:90092kB high:119424kB active_anon:636kB inactive_anon:196kB active_file:3788kB inactive_file:1424kB unevictable:0kB writepending:0kB present:29874176kB managed:29337512kB mlocked:0kB kernel_stack:4608kB pagetables:324kB bounce:0kB free_pcp:4100kB local_pcp:48kB free_cma:11468kB
> [  362.431162] lowmem_reserve[]: 0 0 0 0 0
> [  362.443496] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
> [  362.475115] Node 0 DMA32: 4469*4kB (U) 3639*8kB (U) 3116*16kB (U) 1233*32kB (U) 64*64kB (U) 9*128kB (U) 4*256kB (U) 6*512kB (U) 3*1024kB (U) 6*2048kB (U) 45*4096kB (U) = 345324kB
> [  362.524575] Node 0 Normal: 53138*4kB (UMC) 40392*8kB (UMC) 29104*16kB (UM) 16364*32kB (U) 4513*64kB (UC) 658*128kB (UC) 25*256kB (U) 18*512kB (U) 20*1024kB (UC) 19*2048kB (UC) 193*4096kB (UC) = 2763592kB
> [  362.580622] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> [  362.606863] 1429 total pagecache pages
> [  362.619062] 96 pages in swap cache
> [  362.630199] Swap cache stats: add 1396, delete 1300, find 237/380
> [  362.649433] Free swap  = 15611132kB
> [  362.660833] Total swap = 15616764kB
> [  362.672274] 8313052 pages RAM
> [  362.682133] 0 pages HighMem/MovableOnly
> [  362.694909] 150579 pages reserved
> [  362.705741] 4096 pages cma reserved
> [  362.717084] 0 pages hwpoisoned
> [  362.727099] [ pid ]     uid  tgid total_vm     rss nr_ptes nr_pmds swapents oom_score_adj name
> [  362.753547] [  967]       0   967    936       0       6       2       32          0 init
> [  362.779963] [  968]       0   968    941       1       5       2       98          0 rc
> [  362.805846] [ 1087]       0  1087    942       1       5       2      212          -1000 udevd
> [  362.832511] [ 1294]       0  1294    917     421       5       2       24          -1000 net.agent
> [  362.860203] [ 1340]       0  1340    917     396       5       2       28          -1000 net.agent
> [  362.887851] [ 1750]       0  1750    918     419       5       2       25          -1000 net.agent
> [  362.915478] [ 1915]       0  1915    587       0       5       2       31          0 bootlogd
> [  362.942839] [ 2442]       0  2442    942       0       5       2      211          -1000 udevd
> [  362.969390] [ 2443]       0  2443    942       0       5       2      211          -1000 udevd
> [  362.995922] [ 3023]       0  3023     1538       1       6       2      177          0 S13mountall.sh
> [  363.024797] [ 3078]       0  3078     1719       0       7       2        0          0 mount
> [  363.051309] [ 5743]       0  5743    558     147       5       2        0          -1000 sleep
> [  363.077802] [ 5744]       0  5744    558     148       5       2        0          -1000 sleep
> [  363.104305] [ 5745]       0  5745    558     143       5       2        0          -1000 sleep
> [  363.130752] Out of memory: Kill process 3023 (S13mountall.sh) score 0 or sacrifice child
> [  363.155660] Killed process 3023 (S13mountall.sh) total-vm:6152kB, anon-rss:0kB, file-rss:4kB, shmem-rss:0kB
> /etc/init.d/rc: line 120:  3023 Killed            $debug "$script" $action
> 
> -- 
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
>                                      .... what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
  2017-09-03  0:30                               ` Josef Bacik
@ 2017-09-03  1:01                                 ` Marc MERLIN
  2017-09-03  3:26                                   ` Josef Bacik
  0 siblings, 1 reply; 47+ messages in thread
From: Marc MERLIN @ 2017-09-03  1:01 UTC (permalink / raw)
  To: Josef Bacik
  Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
	David Sterba

On Sun, Sep 03, 2017 at 12:30:07AM +0000, Josef Bacik wrote:
> My bad, I forgot I don't dynamically allocate the stack trace space so my patch did nothing, I blame the children for distracting me.  I've dropped allocating the action altogether for the on disk stuff, that should dramatically reduce the memory usage.  You can just do a git pull since I made a new commit.  You are mounting with -o ref_verify on only the one fs right?  Give this a try and if it still doesn't work we can try a stripped down version that doesn't build the initial tree and just hope that the problem exists in allocating a new block and not modifying the refs for an existing block.  Thanks,

Good news, this time it booted without crashing on OOM.

I'll now get to see how it runs and hopefully it won't crash due to
other problems in 4.13
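
For scale, the OOM report quoted earlier in this thread listed
slab_unreclaimable:8054968 against "8313052 pages RAM". A rough sketch of
that arithmetic follows; the 4 KiB page size is an assumption taken from the
"4kB" entries in the same log, and the helper name is made up for
illustration.

```python
PAGE_KIB = 4  # assumption: 4 KiB pages, as the "*4kB" buddy-list entries suggest


def pages_to_gib(pages: int, page_kib: int = PAGE_KIB) -> float:
    """Convert a page count from a Mem-Info dump into GiB."""
    return pages * page_kib / (1024 * 1024)


slab_unreclaimable = 8054968  # "slab_unreclaimable:8054968" in the OOM report
total_ram_pages = 8313052     # "8313052 pages RAM" in the same report

slab_gib = pages_to_gib(slab_unreclaimable)
ram_gib = pages_to_gib(total_ram_pages)
share = slab_unreclaimable / total_ram_pages

print(f"unreclaimable slab: {slab_gib:.1f} GiB of {ram_gib:.1f} GiB RAM ({share:.0%})")
```

Under that assumption nearly all of RAM was held in unreclaimable slab when
mount was killed, which fits the memory cost of building the initial ref tree.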

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
  2017-09-03  1:01                                 ` Marc MERLIN
@ 2017-09-03  3:26                                   ` Josef Bacik
  2017-09-03 14:31                                     ` Marc MERLIN
  0 siblings, 1 reply; 47+ messages in thread
From: Josef Bacik @ 2017-09-03  3:26 UTC (permalink / raw)
  To: Marc MERLIN
  Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
	David Sterba

I was looking through the code for other ways to cut down memory usage when I noticed we only catch improper re-allocations, not adding another ref for metadata which is what I suspect your problem is.  I added another patch and pushed it out, sorry for the churn.

Josef

Sent from my iPhone

> On Sep 2, 2017, at 9:01 PM, Marc MERLIN <marc@merlins.org> wrote:
> 
>> On Sun, Sep 03, 2017 at 12:30:07AM +0000, Josef Bacik wrote:
>> My bad, I forgot I don't dynamically allocate the stack trace space so my patch did nothing, I blame the children for distracting me.  I've dropped allocating the action altogether for the on disk stuff, that should dramatically reduce the memory usage.  You can just do a git pull since I made a new commit.  You are mounting with -o ref_verify on only the one fs right?  Give this a try and if it still doesn't work we can try a stripped down version that doesn't build the initial tree and just hope that the problem exists in allocating a new block and not modifying the refs for an existing block.  Thanks,
> 
> Good news, this time it booted without crashing on OOM.
> 
> I'll now get to see how it runs and hopefully it won't crash due to
> other problems in 4.13
> 
> Marc
> -- 
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
>                                      .... what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
  2017-09-03  3:26                                   ` Josef Bacik
@ 2017-09-03 14:31                                     ` Marc MERLIN
  2017-09-03 14:38                                       ` Josef Bacik
  0 siblings, 1 reply; 47+ messages in thread
From: Marc MERLIN @ 2017-09-03 14:31 UTC (permalink / raw)
  To: Josef Bacik
  Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
	David Sterba

On Sun, Sep 03, 2017 at 03:26:34AM +0000, Josef Bacik wrote:
> I was looking through the code for other ways to cut down memory usage when I noticed we only catch improper re-allocations, not adding another ref for metadata which is what I suspect your problem is.  I added another patch and pushed it out, sorry for the churn.

Installed.

For now, I've seen this once, but otherwise no issues:
Dropping a ref for a root that doesn't have a ref on the block
Dumping block entry [26538725376 4096], num_refs 2, metadata 0, from disk 1
  Ref root 0, parent 29818880, owner 23608, offset 0, num_refs 18446744073709551615
  Ref root 0, parent 202129408, owner 23608, offset 0, num_refs 1
  Ref root 418, parent 0, owner 23608, offset 0, num_refs 1
  Root entry 418, num_refs 1
  Root entry 69809, num_refs 0
  Ref action 1, root 418, ref_root 0, parent 202129408, owner 23608, offset 0, num_refs 1
  No stacktrace support
  Ref action 2, root 69809, ref_root 0, parent 29818880, owner 23608, offset 0, num_refs 18446744073709551615
  No stacktrace support


I'm assuming this was done by your patch?
Should I worry about 'No stacktrace support' ?
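
A side note on reading the dump: num_refs 18446744073709551615 is 2^64 - 1,
which is how -1 prints when it is stored in an unsigned 64-bit field. The
sketch below only shows that arithmetic; whether the verifier keeps a signed
delta there or an underflowed counter is not stated in this thread, so that
part is an assumption.

```python
U64_MASK = (1 << 64) - 1


def as_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as a two's-complement signed one."""
    value &= U64_MASK
    return value - (1 << 64) if value >= (1 << 63) else value


logged = 18446744073709551615  # num_refs as printed in the dump

print(logged == U64_MASK)    # True: the largest unsigned 64-bit value
print(as_signed64(logged))   # -1
```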

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
  2017-09-03 14:31                                     ` Marc MERLIN
@ 2017-09-03 14:38                                       ` Josef Bacik
  2017-09-03 14:42                                         ` Marc MERLIN
  0 siblings, 1 reply; 47+ messages in thread
From: Josef Bacik @ 2017-09-03 14:38 UTC (permalink / raw)
  To: Marc MERLIN
  Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
	David Sterba

Oh yeah you need CONFIG_STACKTRACE turned on, otherwise this is going to be difficult ;).  Thanks,

Josef

Sent from my iPhone

> On Sep 3, 2017, at 10:31 AM, Marc MERLIN <marc@merlins.org> wrote:
> 
>> On Sun, Sep 03, 2017 at 03:26:34AM +0000, Josef Bacik wrote:
>> I was looking through the code for other ways to cut down memory usage when I noticed we only catch improper re-allocations, not adding another ref for metadata which is what I suspect your problem is.  I added another patch and pushed it out, sorry for the churn.
> 
> Installed.
> 
> For now, I've seen this once, but otherwise no issues:
> Dropping a ref for a root that doesn't have a ref on the block
> Dumping block entry [26538725376 4096], num_refs 2, metadata 0, from disk 1
>  Ref root 0, parent 29818880, owner 23608, offset 0, num_refs 18446744073709551615
>  Ref root 0, parent 202129408, owner 23608, offset 0, num_refs 1
>  Ref root 418, parent 0, owner 23608, offset 0, num_refs 1
>  Root entry 418, num_refs 1
>  Root entry 69809, num_refs 0
>  Ref action 1, root 418, ref_root 0, parent 202129408, owner 23608, offset 0, num_refs 1
>  No stacktrace support
>  Ref action 2, root 69809, ref_root 0, parent 29818880, owner 23608, offset 0, num_refs 18446744073709551615
>  No stacktrace support
> 
> 
> I'm assuming this was done by your patch?
> Should I worry about 'No stacktrace support' ?
> 
> Marc
> -- 
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
>                                      .... what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
  2017-09-03 14:38                                       ` Josef Bacik
@ 2017-09-03 14:42                                         ` Marc MERLIN
  2017-09-03 14:55                                           ` Josef Bacik
  2017-09-03 17:33                                           ` Josef Bacik
  0 siblings, 2 replies; 47+ messages in thread
From: Marc MERLIN @ 2017-09-03 14:42 UTC (permalink / raw)
  To: Josef Bacik
  Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
	David Sterba

On Sun, Sep 03, 2017 at 02:38:57PM +0000, Josef Bacik wrote:
> Oh yeah you need CONFIG_STACKTRACE turned on, otherwise this is going to be difficult ;).  Thanks,
 
Right, except that I thought I did:

saruman:/usr/src/linux-btrfs/btrfs-next# grep STACKTRACE .config
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_RELIABLE_STACKTRACE=y
CONFIG_STACKTRACE=y
CONFIG_USER_STACKTRACE_SUPPORT=y
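
The same check can be scripted so that similarly named options (such as
CONFIG_STACKTRACE_SUPPORT) are not mistaken for the one that matters. This is
only a sketch: config_enabled is a hypothetical helper, and the sample text is
the grep output above.

```python
def config_enabled(config_text: str, option: str) -> bool:
    """Return True if `option` is set to y in kernel .config text (exact name match)."""
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and "# CONFIG_FOO is not set" comments
        name, _, value = line.partition("=")
        if name == option:
            return value == "y"
    return False


sample = """\
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_RELIABLE_STACKTRACE=y
CONFIG_STACKTRACE=y
CONFIG_USER_STACKTRACE_SUPPORT=y
"""

print(config_enabled(sample, "CONFIG_STACKTRACE"))
```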

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
  2017-09-03 14:42                                         ` Marc MERLIN
@ 2017-09-03 14:55                                           ` Josef Bacik
  2017-09-03 17:33                                           ` Josef Bacik
  1 sibling, 0 replies; 47+ messages in thread
From: Josef Bacik @ 2017-09-03 14:55 UTC (permalink / raw)
  To: Marc MERLIN
  Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
	David Sterba

Jesus Christ I misspelled it, I'll fix it up when I get home.  Thanks,

Josef

Sent from my iPhone

> On Sep 3, 2017, at 10:42 AM, Marc MERLIN <marc@merlins.org> wrote:
> 
>> On Sun, Sep 03, 2017 at 02:38:57PM +0000, Josef Bacik wrote:
>> Oh yeah you need CONFIG_STACKTRACE turned on, otherwise this is going to be difficult ;).  Thanks,
> 
> Right, except that I thought I did:
> 
> saruman:/usr/src/linux-btrfs/btrfs-next# grep STACKTRACE .config
> CONFIG_STACKTRACE_SUPPORT=y
> CONFIG_HAVE_RELIABLE_STACKTRACE=y
> CONFIG_STACKTRACE=y
> CONFIG_USER_STACKTRACE_SUPPORT=y
> 
> Marc
> -- 
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
>                                      .... what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
  2017-09-03 14:42                                         ` Marc MERLIN
  2017-09-03 14:55                                           ` Josef Bacik
@ 2017-09-03 17:33                                           ` Josef Bacik
  2017-09-03 20:20                                             ` Marc MERLIN
  1 sibling, 1 reply; 47+ messages in thread
From: Josef Bacik @ 2017-09-03 17:33 UTC (permalink / raw)
  To: Marc MERLIN
  Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
	David Sterba

Alright pushed, sorry about that.

Josef

Sent from my iPhone

> On Sep 3, 2017, at 10:42 AM, Marc MERLIN <marc@merlins.org> wrote:
> 
>> On Sun, Sep 03, 2017 at 02:38:57PM +0000, Josef Bacik wrote:
>> Oh yeah you need CONFIG_STACKTRACE turned on, otherwise this is going to be difficult ;).  Thanks,
> 
> Right, except that I thought I did:
> 
> saruman:/usr/src/linux-btrfs/btrfs-next# grep STACKTRACE .config
> CONFIG_STACKTRACE_SUPPORT=y
> CONFIG_HAVE_RELIABLE_STACKTRACE=y
> CONFIG_STACKTRACE=y
> CONFIG_USER_STACKTRACE_SUPPORT=y
> 
> Marc
> -- 
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
>                                      .... what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
  2017-09-03 17:33                                           ` Josef Bacik
@ 2017-09-03 20:20                                             ` Marc MERLIN
  2017-09-04  0:55                                               ` Josef Bacik
  2017-09-05 18:19                                               ` Josef Bacik
  0 siblings, 2 replies; 47+ messages in thread
From: Marc MERLIN @ 2017-09-03 20:20 UTC (permalink / raw)
  To: Josef Bacik
  Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
	David Sterba

On Sun, Sep 03, 2017 at 05:33:33PM +0000, Josef Bacik wrote:
> Alright pushed, sorry about that.
 
I'm reasonably sure I'm running the new code, but still got this:
[ 2104.336513] Dropping a ref for a root that doesn't have a ref on the block
[ 2104.358226] Dumping block entry [115253923840 155648], num_refs 1, metadata 0, from disk 1
[ 2104.384037]   Ref root 0, parent 3414272884736, owner 262813, offset 0, num_refs 18446744073709551615
[ 2104.412766]   Ref root 418, parent 0, owner 262813, offset 0, num_refs 1
[ 2104.433888]   Root entry 418, num_refs 1
[ 2104.446648]   Root entry 69869, num_refs 0
[ 2104.459904]   Ref action 2, root 69869, ref_root 0, parent 3414272884736, owner 262813, offset 0, num_refs 18446744073709551615
[ 2104.496244]   No Stacktrace

Now, in the background I had a monthly md check of the underlying device
(mdadm raid 5) running, and it reported the mismatches listed below. Obviously
that's not good, and I'm assuming that md raid5 does not keep a checksum on
blocks, so it can't know which drive has the corrupted data.
Does that sound right?

Now, the good news is that btrfs on top does have checksums, so running a scrub should
hopefully find those corrupted blocks if they happen to be in use by the filesystem
(maybe they are free).
But as a reminder, this whole thread started with my FS maybe not being in a good state, but both
check --repair and scrub returning clean. Maybe I'll use the opportunity to re-run a check --repair
and a scrub after that to see what state things are in.

md6: mismatch sector in range 3581539536-3581539544
md6: mismatch sector in range 3581539544-3581539552
md6: mismatch sector in range 3581539552-3581539560
md6: mismatch sector in range 3581539560-3581539568  
md6: mismatch sector in range 3581543792-3581543800
md6: mismatch sector in range 3581543800-3581543808
md6: mismatch sector in range 3581543808-3581543816
md6: mismatch sector in range 3581543816-3581543824
md6: mismatch sector in range 3581544112-3581544120
md6: mismatch sector in range 3581544120-3581544128

As for your patch, no idea why it's not giving me a stacktrace, sorry :-/

Git log of my tree does show:
commit aa162d2908bd7452805ea812b7550232b0b6ed53
Author: Josef Bacik <jbacik@fb.com>
Date:   Sun Sep 3 13:32:17 2017 -0400

    Btrfs: use be->metadata just in case
    
    I suspect we're not getting the owner in some cases, so we want to just
    use the known value.
    
    Signed-off-by: Josef Bacik <jbacik@fb.com>

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
  2017-09-03 20:20                                             ` Marc MERLIN
@ 2017-09-04  0:55                                               ` Josef Bacik
  2017-09-05 18:19                                               ` Josef Bacik
  1 sibling, 0 replies; 47+ messages in thread
From: Josef Bacik @ 2017-09-04  0:55 UTC (permalink / raw)
  To: Marc MERLIN
  Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
	David Sterba

Ok this output looked fishy and so I went and tested it on my box again.  It looks like I wasn't testing modifying a snapshot with an existing fs so I never saw these errors, but I see them as well.  I definitely fucked the building of the initial ref tree.  It's too late tonight for me to rework it and have it working for you, but I should be able to get it into shape in the morning.  I'll let you know when I have something useful to test, sorry about the mess,

Josef

Sent from my iPhone

> On Sep 3, 2017, at 4:21 PM, Marc MERLIN <marc@merlins.org> wrote:
> 
>> On Sun, Sep 03, 2017 at 05:33:33PM +0000, Josef Bacik wrote:
>> Alright pushed, sorry about that.
> 
> I'm reasonably sure I'm running the new code, but still got this:
> [ 2104.336513] Dropping a ref for a root that doesn't have a ref on the block
> [ 2104.358226] Dumping block entry [115253923840 155648], num_refs 1, metadata 0, from disk 1
> [ 2104.384037]   Ref root 0, parent 3414272884736, owner 262813, offset 0, num_refs 18446744073709551615
> [ 2104.412766]   Ref root 418, parent 0, owner 262813, offset 0, num_refs 1
> [ 2104.433888]   Root entry 418, num_refs 1
> [ 2104.446648]   Root entry 69869, num_refs 0
> [ 2104.459904]   Ref action 2, root 69869, ref_root 0, parent 3414272884736, owner 262813, offset 0, num_refs 18446744073709551615
> [ 2104.496244]   No Stacktrace
> 
> Now, in the background I had a monthly md check of the underlying device
> (mdadm raid 5), and got some of those. Obviously that's not good, and 
> I'm assuming that md raid5 may not have a checksum on blocks, so it won't know
> which drive has the corrupted data.
> Does that sound right?
> 
> Now, the good news is that btrfs on top does have checksums, so running a scrub should
> hopefully find those corrupted blocks if they happen to be in use by the filesystem
> (maybe they are free).
> But as a reminder, this whole thread started with my FS maybe not being in a good state, but both
> check --repair and scrub returning clean. Maybe I'll use the opportunity to re-run a check --repair
> and a scrub after that to see what state things are in.
> 
> md6: mismatch sector in range 3581539536-3581539544
> md6: mismatch sector in range 3581539544-3581539552
> md6: mismatch sector in range 3581539552-3581539560
> md6: mismatch sector in range 3581539560-3581539568  
> md6: mismatch sector in range 3581543792-3581543800
> md6: mismatch sector in range 3581543800-3581543808
> md6: mismatch sector in range 3581543808-3581543816
> md6: mismatch sector in range 3581543816-3581543824
> md6: mismatch sector in range 3581544112-3581544120
> md6: mismatch sector in range 3581544120-3581544128
> 
> As for your patch, no idea why it's not giving me a stacktrace, sorry :-/
> 
> Git log of my tree does show:
> commit aa162d2908bd7452805ea812b7550232b0b6ed53
> Author: Josef Bacik <jbacik@fb.com>
> Date:   Sun Sep 3 13:32:17 2017 -0400
> 
>    Btrfs: use be->metadata just in case
> 
>    I suspect we're not getting the owner in some cases, so we want to just
>    use the known value.
> 
>    Signed-off-by: Josef Bacik <jbacik@fb.com>
> 
> Marc
> -- 
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
>                                      .... what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
  2017-09-03 20:20                                             ` Marc MERLIN
  2017-09-04  0:55                                               ` Josef Bacik
@ 2017-09-05 18:19                                               ` Josef Bacik
  2017-09-09 18:39                                                 ` Marc MERLIN
  1 sibling, 1 reply; 47+ messages in thread
From: Josef Bacik @ 2017-09-05 18:19 UTC (permalink / raw)
  To: Marc MERLIN
  Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
	David Sterba

Alright I just reworked the build tree ref stuff and tested it to make sure it wasn’t going to give false positives again.  Apparently I had only ever used this with very basic existing fs’es and nothing super complicated, so it was just broken for anything complex.  I’ve pushed it to my tree, you can just pull and build and try again.  This time the stack traces will even work!  Thanks,

Josef

On 9/3/17, 4:21 PM, "Marc MERLIN" <marc@merlins.org> wrote:

On Sun, Sep 03, 2017 at 05:33:33PM +0000, Josef Bacik wrote:
> Alright pushed, sorry about that.
 
I'm reasonably sure I'm running the new code, but still got this:
[ 2104.336513] Dropping a ref for a root that doesn't have a ref on the block
[ 2104.358226] Dumping block entry [115253923840 155648], num_refs 1, metadata 0, from disk 1
[ 2104.384037]   Ref root 0, parent 3414272884736, owner 262813, offset 0, num_refs 18446744073709551615
[ 2104.412766]   Ref root 418, parent 0, owner 262813, offset 0, num_refs 1
[ 2104.433888]   Root entry 418, num_refs 1
[ 2104.446648]   Root entry 69869, num_refs 0
[ 2104.459904]   Ref action 2, root 69869, ref_root 0, parent 3414272884736, owner 262813, offset 0, num_refs 18446744073709551615
[ 2104.496244]   No Stacktrace
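
A note on the num_refs value in the dump above: 18446744073709551615 is 2^64 - 1, which is what -1 looks like when it is printed as an unsigned 64-bit number. That would be consistent with a single dropped reference being recorded in an unsigned counter, although the log alone does not prove how the field is stored. A minimal Python sketch of the conversion (the helper names are illustrative, not from the kernel):

```python
U64 = 1 << 64  # number of distinct values in an unsigned 64-bit field


def as_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as signed two's complement."""
    value %= U64
    return value - U64 if value >= U64 // 2 else value


def as_unsigned64(value: int) -> int:
    """Show how a (possibly negative) integer looks in an unsigned 64-bit field."""
    return value % U64


logged = 18446744073709551615  # num_refs as printed in the dump
print(as_signed64(logged))
print(as_unsigned64(-1) == logged)
```

Read this way, the "Ref action 2" line describes a change of -1 against root 69869, which has num_refs 0 in the "Root entry" lines.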

Now, in the background I had a monthly md check of the underlying device
(mdadm raid 5), and got some of those. Obviously that's not good, and 
I'm assuming that md raid5 may not have a checksum on blocks, so it won't know
which drive has the corrupted data.
Does that sound right?

Now, the good news is that btrfs on top does have checksums, so running a scrub should
hopefully find those corrupted blocks if they happen to be in use by the filesystem
(maybe they are free).
But as a reminder, this whole thread started with my FS maybe not being in a good state, but both
check --repair and scrub returning clean. Maybe I'll use the opportunity to re-run a check --repair
and a scrub after that to see what state things are in.
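
The difference between the two layers can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how md or btrfs are implemented: the stripe is three tiny data blocks plus one XOR parity block, and zlib.crc32 stands in for the filesystem checksum (btrfs defaults to crc32c, which the standard library does not provide). Parity alone shows that the stripe is inconsistent, while a per-block checksum shows which block changed.

```python
import zlib
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Toy stripe: three data blocks plus one parity block (RAID5-like layout).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Per-block checksums, kept separately the way a checksumming filesystem would.
checksums = [zlib.crc32(block) for block in data]

# Silently corrupt one byte of the second block.
stripe = list(data)
stripe[1] = b"BXBB"

# Parity check: detects the mismatch but cannot say which member is wrong.
parity_consistent = xor_blocks(stripe) == parity

# Checksum check: pinpoints the damaged block.
bad_blocks = [i for i, block in enumerate(stripe)
              if zlib.crc32(block) != checksums[i]]

print(parity_consistent, bad_blocks)
```

The parity comparison only answers "does the stripe add up?", which matches the md "mismatch" messages below; the checksum comparison answers "which block is wrong?", but only for blocks the filesystem actually has checksums for.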

md6: mismatch sector in range 3581539536-3581539544
md6: mismatch sector in range 3581539544-3581539552
md6: mismatch sector in range 3581539552-3581539560
md6: mismatch sector in range 3581539560-3581539568  
md6: mismatch sector in range 3581543792-3581543800
md6: mismatch sector in range 3581543800-3581543808
md6: mismatch sector in range 3581543808-3581543816
md6: mismatch sector in range 3581543816-3581543824
md6: mismatch sector in range 3581544112-3581544120
md6: mismatch sector in range 3581544120-3581544128
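
The ten ranges above are contiguous in places, so they collapse into a few larger runs. A small Python sketch that merges them, assuming each "A-B" range is half-open (B is the start of the next range, as the consecutive lines suggest) and assuming 512-byte sectors; both are assumptions, not something the log states:

```python
import re

LOG = """\
md6: mismatch sector in range 3581539536-3581539544
md6: mismatch sector in range 3581539544-3581539552
md6: mismatch sector in range 3581539552-3581539560
md6: mismatch sector in range 3581539560-3581539568
md6: mismatch sector in range 3581543792-3581543800
md6: mismatch sector in range 3581543800-3581543808
md6: mismatch sector in range 3581543808-3581543816
md6: mismatch sector in range 3581543816-3581543824
md6: mismatch sector in range 3581544112-3581544120
md6: mismatch sector in range 3581544120-3581544128
"""

SECTOR_BYTES = 512  # assumption: md sector numbers are in 512-byte units


def parse_ranges(log):
    """Extract (start, end) sector pairs from md mismatch lines."""
    pattern = r"mismatch sector in range (\d+)-(\d+)"
    return [(int(a), int(b)) for a, b in re.findall(pattern, log)]


def merge_ranges(ranges):
    """Merge ranges that touch or overlap into maximal runs."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


runs = merge_ranges(parse_ranges(LOG))
for start, end in runs:
    print(start, end, (end - start) * SECTOR_BYTES, "bytes")
```

Under those assumptions the log describes three separate runs rather than ten unrelated spots.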

As for your patch, no idea why it's not giving me a stacktrace, sorry :-/

Git log of my tree does show:
commit aa162d2908bd7452805ea812b7550232b0b6ed53
Author: Josef Bacik <jbacik@fb.com>
Date:   Sun Sep 3 13:32:17 2017 -0400

    Btrfs: use be->metadata just in case
    
    I suspect we're not getting the owner in some cases, so we want to just
    use the known value.
    
    Signed-off-by: Josef Bacik <jbacik@fb.com>

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
  2017-09-05 18:19                                               ` Josef Bacik
@ 2017-09-09 18:39                                                 ` Marc MERLIN
  2017-09-09 22:56                                                   ` Josef Bacik
  0 siblings, 1 reply; 47+ messages in thread
From: Marc MERLIN @ 2017-09-09 18:39 UTC (permalink / raw)
  To: Josef Bacik
  Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
	David Sterba

On Tue, Sep 05, 2017 at 06:19:25PM +0000, Josef Bacik wrote:
> Alright I just reworked the build tree ref stuff and tested it to make sure it wasn’t going to give false positives again.  Apparently I had only ever used this with very basic existing fs’es and nothing super complicated, so it was just broken for anything complex.  I’ve pushed it to my tree, you can just pull and build and try again.  This time the stack traces will even work!  Thanks,
 
Ok, so I found out that I just need to copy a bunch of data to the
filesystem to trigger the bug.

There you go:
[318400.507972] re-allocated a block that still has references to it!
[318400.527517] Dumping block entry [13282417065984 16384], num_refs 2, metadata 1, from disk 1
[318400.553751]   Ref root 2, parent 0, owner 0, offset 0, num_refs 1
[318400.573208]   Root entry 2, num_refs 1
[318400.585614]   Root entry 7, num_refs 0
[318400.598028]   Ref action 3, root 7, ref_root 7, parent 0, owner 1, offset 0, num_refs 1
[318400.623774]    btrfs_alloc_tree_block+0x33e/0x3e1
[318400.639083]    __btrfs_cow_block+0xf3/0x420
[318400.652817]    btrfs_cow_block+0xcf/0x145
[318400.666024]    btrfs_search_slot+0x269/0x6de
[318400.680041]    btrfs_del_csums+0xac/0x2f9
[318400.693245]    __btrfs_free_extent+0x88b/0xa0b
[318400.707718]    __btrfs_run_delayed_refs+0xb4e/0xd20
[318400.723491]    btrfs_run_delayed_refs+0x77/0x1a1
[318400.738993]    btrfs_write_dirty_block_groups+0xf5/0x2c1
[318400.755994]    commit_cowonly_roots+0x1da/0x273
[318400.770673]    btrfs_commit_transaction+0x3dd/0x761
[318400.786397]    transaction_kthread+0xe2/0x178
[318400.800515]    kthread+0xfb/0x100
[318400.811487]    ret_from_fork+0x25/0x30
[318400.823748]    0xffffffffffffffff
[318400.957574] ------------[ cut here ]------------
[318400.972498] WARNING: CPU: 2 PID: 3242 at fs/btrfs/extent-tree.c:3015 btrfs_run_delayed_refs+0xa2/0x1a1
[318401.001382] Modules linked in: veth ip6table_filter ip6_tables ebtable_nat ebtables ppdev lp xt_addrtype br_netfilter bridge stp llc tun autofs4 softdog binfmt_misc ftdi_sio nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ipt_REJECT nf_reject_ipv4 xt_conntrack xt_mark xt_nat xt_tcpudp nf_log_ipv4 nf_log_common xt_LOG iptable_mangle iptable_filter lm85 hwmon_vid pl2303 dm_snapshot dm_bufio iptable_nat ip_tables nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_conntrack_ftp ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_nat nf_conntrack x_tables sg st snd_pcm_oss snd_mixer_oss bcache kvm_intel kvm irqbypass snd_hda_codec_realtek snd_hda_codec_generic snd_cmipci snd_hda_intel snd_mpu401_uart snd_hda_codec snd_opl3_lib eeepc_wmi snd_hda_core tpm_infineon snd_rawmidi asix asus_wmi rc_ati_x10 tpm_tis
[318401.218357]  snd_seq_device sparse_keymap snd_hwdep tpm_tis_core ati_remote usbnet parport_pc snd_pcm rfkill pcspkr i915 hwmon tpm parport rc_core libphy mei_me snd_timer lpc_ich wmi_bmof battery usbserial evdev wmi input_leds i2c_i801 snd soundcore e1000e ptp pps_core fuse raid456 multipath mmc_block mmc_core lrw ablk_helper dm_crypt dm_mod async_raid6_recov async_pq async_xor async_memcpy async_tx crc32c_intel blowfish_x86_64 blowfish_common pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd xhci_pci ehci_pci xhci_hcd ehci_hcd mvsas libsas r8169 sata_sil24 usbcore mii scsi_transport_sas thermal fan [last unloaded: ftdi_sio]
[318401.392440] CPU: 2 PID: 3242 Comm: btrfs-transacti Tainted: G     U          4.13.0-rc5-amd64-stkreg-sysrq-20170902d+ #6
[318401.426262] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
[318401.454894] task: ffff948ef791e200 task.stack: ffffb18a091ec000
[318401.473918] RIP: 0010:btrfs_run_delayed_refs+0xa2/0x1a1
[318401.490849] RSP: 0018:ffffb18a091efd08 EFLAGS: 00010296
[318401.507751] RAX: 0000000000000026 RBX: ffff9488208be618 RCX: 0000000000000000
[318401.530384] RDX: ffff948f1e295e01 RSI: ffff948f1e28dd58 RDI: ffff948f1e28dd58
[318401.553548] RBP: ffffb18a091efd50 R08: 0003dc12ea8bcc57 R09: ffff948f1f50b868
[318401.576127] R10: ffff948b1f1cc460 R11: ffffffffaef37285 R12: 00000000ffffffef
[318401.598717] R13: ffffffffffffffff R14: ffff948edb7efd48 R15: ffff948cdbdeb000
[318401.621327] FS:  0000000000000000(0000) GS:ffff948f1e280000(0000) knlGS:0000000000000000
[318401.646737] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[318401.665149] CR2: 00000000f7f05001 CR3: 000000061f587000 CR4: 00000000001406e0
[318401.687684] Call Trace:
[318401.696148]  btrfs_write_dirty_block_groups+0xf5/0x2c1
[318401.712745]  ? btrfs_run_delayed_refs+0x127/0x1a1
[318401.727981]  commit_cowonly_roots+0x1da/0x273
[318401.742183]  btrfs_commit_transaction+0x3dd/0x761
[318401.757447]  transaction_kthread+0xe2/0x178
[318401.771158]  ? btrfs_cleanup_transaction+0x3c2/0x3c2
[318401.787169]  kthread+0xfb/0x100
[318401.797769]  ? init_completion+0x24/0x24
[318401.810718]  ret_from_fork+0x25/0x30
[318401.822588] Code: 85 c0 41 89 c4 79 60 48 8b 43 60 f0 0f ba a8 d8 16 00 00 02 72 35 41 83 fc fb 74 13 44 89 e6 48 c7 c7 27 3f af ae e8 81 5d e1 ff <0f> ff eb 1c f6 05 2a da ab 00 04 74 13 48 8b 7b 60 44 89 e2 48 
[318401.881182] ---[ end trace 47464f1fcc4796c5 ]---
[318401.896818] BTRFS: error (device dm-2) in btrfs_run_delayed_refs:3015: errno=-17 Object already exists
[318401.925978] BTRFS info (device dm-2): forced readonly
[318401.950682] BTRFS warning (device dm-2): Skipping commit of aborted transaction.
[318401.974102] BTRFS: error (device dm-2) in cleanup_transaction:1873: errno=-17 Object already exists
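
For reading the dump above: R12 holds 00000000ffffffef, and its low 32 bits are -17 when read as a signed integer, which matches the errno=-17 in the BTRFS error lines. That R12 carries the return value at this point is an inference from the numbers matching, not something the log states. A short Python sketch of the decoding (the helper name is illustrative):

```python
import errno


def to_signed32(value: int) -> int:
    """Reinterpret the low 32 bits of a register value as a signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


r12 = 0x00000000FFFFFFEF  # R12 as printed in the register dump
code = to_signed32(r12)

# errno.errorcode maps positive errno numbers to their symbolic names.
print(code, errno.errorcode[-code])
```

On Linux errno 17 is EEXIST; the kernel's btrfs error path prints its own text for it ("Object already exists").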

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
  2017-09-09 18:39                                                 ` Marc MERLIN
@ 2017-09-09 22:56                                                   ` Josef Bacik
  2017-09-10  2:36                                                     ` Marc MERLIN
  0 siblings, 1 reply; 47+ messages in thread
From: Josef Bacik @ 2017-09-09 22:56 UTC (permalink / raw)
  To: Marc MERLIN
  Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
	David Sterba

Well that's odd, a block allocated on disk is in the free space cache.  Can I see the full output of the fsck?  I want to make sure it's actually getting to the part where it checks the free space cache.  If it does then I'll have to think of how to catch this kind of bug, because you've got a weird one.  Thanks,

Josef

Sent from my iPhone

> On Sep 9, 2017, at 2:39 PM, Marc MERLIN <marc@merlins.org> wrote:
> 
>> On Tue, Sep 05, 2017 at 06:19:25PM +0000, Josef Bacik wrote:
>> Alright I just reworked the build tree ref stuff and tested it to make sure it wasn’t going to give false positives again.  Apparently I had only ever used this with very basic existing fs’es and nothing super complicated, so it was just broken for anything complex.  I’ve pushed it to my tree, you can just pull and build and try again.  This time the stack traces will even work!  Thanks,
> 
> Ok, so I found out that I just need to copy a bunch of data to the
> filesystem to trigger the bug.
> 
> There you go:
> [318400.507972] re-allocated a block that still has references to it!
> [318400.527517] Dumping block entry [13282417065984 16384], num_refs 2, metadata 1, from disk 1
> [318400.553751]   Ref root 2, parent 0, owner 0, offset 0, num_refs 1
> [318400.573208]   Root entry 2, num_refs 1
> [318400.585614]   Root entry 7, num_refs 0
> [318400.598028]   Ref action 3, root 7, ref_root 7, parent 0, owner 1, offset 0, num_refs 1
> [318400.623774]    btrfs_alloc_tree_block+0x33e/0x3e1
> [318400.639083]    __btrfs_cow_block+0xf3/0x420
> [318400.652817]    btrfs_cow_block+0xcf/0x145
> [318400.666024]    btrfs_search_slot+0x269/0x6de
> [318400.680041]    btrfs_del_csums+0xac/0x2f9
> [318400.693245]    __btrfs_free_extent+0x88b/0xa0b
> [318400.707718]    __btrfs_run_delayed_refs+0xb4e/0xd20
> [318400.723491]    btrfs_run_delayed_refs+0x77/0x1a1
> [318400.738993]    btrfs_write_dirty_block_groups+0xf5/0x2c1
> [318400.755994]    commit_cowonly_roots+0x1da/0x273
> [318400.770673]    btrfs_commit_transaction+0x3dd/0x761
> [318400.786397]    transaction_kthread+0xe2/0x178
> [318400.800515]    kthread+0xfb/0x100
> [318400.811487]    ret_from_fork+0x25/0x30
> [318400.823748]    0xffffffffffffffff
> [318400.957574] ------------[ cut here ]------------
> [318400.972498] WARNING: CPU: 2 PID: 3242 at fs/btrfs/extent-tree.c:3015 btrfs_run_delayed_refs+0xa2/0x1a1
> [318401.001382] Modules linked in: veth ip6table_filter ip6_tables ebtable_nat ebtables ppdev lp xt_addrtype br_netfilter bridge stp llc tun autofs4 softdog binfmt_misc ftdi_sio nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ipt_REJECT nf_reject_ipv4 xt_conntrack xt_mark xt_nat xt_tcpudp nf_log_ipv4 nf_log_common xt_LOG iptable_mangle iptable_filter lm85 hwmon_vid pl2303 dm_snapshot dm_bufio iptable_nat ip_tables nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_conntrack_ftp ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_nat nf_conntrack x_tables sg st snd_pcm_oss snd_mixer_oss bcache kvm_intel kvm irqbypass snd_hda_codec_realtek snd_hda_codec_generic snd_cmipci snd_hda_intel snd_mpu401_uart snd_hda_codec snd_opl3_lib eeepc_wmi snd_hda_core tpm_infineon snd_rawmidi asix asus_wmi rc_ati_x10 tpm_tis
> [318401.218357]  snd_seq_device sparse_keymap snd_hwdep tpm_tis_core ati_remote usbnet parport_pc snd_pcm rfkill pcspkr i915 hwmon tpm parport rc_core libphy mei_me snd_timer lpc_ich wmi_bmof battery usbserial evdev wmi input_leds i2c_i801 snd soundcore e1000e ptp pps_core fuse raid456 multipath mmc_block mmc_core lrw ablk_helper dm_crypt dm_mod async_raid6_recov async_pq async_xor async_memcpy async_tx crc32c_intel blowfish_x86_64 blowfish_common pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd xhci_pci ehci_pci xhci_hcd ehci_hcd mvsas libsas r8169 sata_sil24 usbcore mii scsi_transport_sas thermal fan [last unloaded: ftdi_sio]
> [318401.392440] CPU: 2 PID: 3242 Comm: btrfs-transacti Tainted: G     U          4.13.0-rc5-amd64-stkreg-sysrq-20170902d+ #6
> [318401.426262] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> [318401.454894] task: ffff948ef791e200 task.stack: ffffb18a091ec000
> [318401.473918] RIP: 0010:btrfs_run_delayed_refs+0xa2/0x1a1
> [318401.490849] RSP: 0018:ffffb18a091efd08 EFLAGS: 00010296
> [318401.507751] RAX: 0000000000000026 RBX: ffff9488208be618 RCX: 0000000000000000
> [318401.530384] RDX: ffff948f1e295e01 RSI: ffff948f1e28dd58 RDI: ffff948f1e28dd58
> [318401.553548] RBP: ffffb18a091efd50 R08: 0003dc12ea8bcc57 R09: ffff948f1f50b868
> [318401.576127] R10: ffff948b1f1cc460 R11: ffffffffaef37285 R12: 00000000ffffffef
> [318401.598717] R13: ffffffffffffffff R14: ffff948edb7efd48 R15: ffff948cdbdeb000
> [318401.621327] FS:  0000000000000000(0000) GS:ffff948f1e280000(0000) knlGS:0000000000000000
> [318401.646737] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [318401.665149] CR2: 00000000f7f05001 CR3: 000000061f587000 CR4: 00000000001406e0
> [318401.687684] Call Trace:
> [318401.696148]  btrfs_write_dirty_block_groups+0xf5/0x2c1
> [318401.712745]  ? btrfs_run_delayed_refs+0x127/0x1a1
> [318401.727981]  commit_cowonly_roots+0x1da/0x273
> [318401.742183]  btrfs_commit_transaction+0x3dd/0x761
> [318401.757447]  transaction_kthread+0xe2/0x178
> [318401.771158]  ? btrfs_cleanup_transaction+0x3c2/0x3c2
> [318401.787169]  kthread+0xfb/0x100
> [318401.797769]  ? init_completion+0x24/0x24
> [318401.810718]  ret_from_fork+0x25/0x30
> [318401.822588] Code: 85 c0 41 89 c4 79 60 48 8b 43 60 f0 0f ba a8 d8 16 00 00 02 72 35 41 83 fc fb 74 13 44 89 e6 48 c7 c7 27 3f af ae e8 81 5d e1 ff <0f> ff eb 1c f6 05 2a da ab 00 04 74 13 48 8b 7b 60 44 89 e2 48 
> [318401.881182] ---[ end trace 47464f1fcc4796c5 ]---
> [318401.896818] BTRFS: error (device dm-2) in btrfs_run_delayed_refs:3015: errno=-17 Object already exists
> [318401.925978] BTRFS info (device dm-2): forced readonly
> [318401.950682] BTRFS warning (device dm-2): Skipping commit of aborted transaction.
> [318401.974102] BTRFS: error (device dm-2) in cleanup_transaction:1873: errno=-17 Object already exists
> 
> Marc
> -- 
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
>                                      .... what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
  2017-09-09 22:56                                                   ` Josef Bacik
@ 2017-09-10  2:36                                                     ` Marc MERLIN
  2017-09-10  3:12                                                       ` Josef Bacik
  0 siblings, 1 reply; 47+ messages in thread
From: Marc MERLIN @ 2017-09-10  2:36 UTC (permalink / raw)
  To: Josef Bacik
  Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
	David Sterba

On Sat, Sep 09, 2017 at 10:56:14PM +0000, Josef Bacik wrote:
> Well that's odd, a block allocated on disk is in the free space cache.  Can I see the full output of the fsck?  I want to make sure it's actually getting to the part where it checks the free space cache.  If it does then I'll have to think of how to catch this kind of bug, because you've got a weird one.  Thanks,
 
Well, btrfs check was clean before that, but now it returned this:
gargamel:~# time btrfs check /dev/mapper/dshelf1  
Checking filesystem on /dev/mapper/dshelf1  
UUID: 36f5079e-ca6c-4855-8639-ccb82695c18d  
checking extents  
checking free space cache  
Wanted bytes 16384, found 196608 for off 13282417049600  
Wanted bytes 536870912, found 196608 for off 13282417049600  
cache appears valid but isn't 13282417049600  
There is no free space entry for 13849889603584-13849889652736  
There is no free space entry for 13849889603584-13850426474496  
cache appears valid but isn't 13849889603584  
Wanted bytes 5832704, found 81920 for off 13870290698240  
Wanted bytes 536870912, found 81920 for off 13870290698240  
cache appears valid but isn't 13870290698240  
block group 13928272756736 has wrong amount of free space  
failed to load free space cache for block group 13928272756736  
Duplicate entries in free space cache  
failed to load free space cache for block group 13962095624192  
block group 14003434684416 has wrong amount of free space  
failed to load free space cache for block group 14003434684416  
block group 14470042615808 has wrong amount of free space  
failed to load free space cache for block group 14470042615808  
block group 14610702794752 has wrong amount of free space  
failed to load free space cache for block group 14610702794752  
block group 14612313407488 has wrong amount of free space  
failed to load free space cache for block group 14612313407488  
block group 14624661438464 has wrong amount of free space  
failed to load free space cache for block group 14624661438464  
block group 14648820629504 has wrong amount of free space  
failed to load free space cache for block group 14648820629504  
Wanted offset 14657410793472, found 14657410760704  
Wanted offset 14657410793472, found 14657410760704  
cache appears valid but isn't 14657410564096  
block group 15886844952576 has wrong amount of free space  
failed to load free space cache for block group 15886844952576  
There is no free space entry for 15905635434496-15905636499456  
There is no free space entry for 15905635434496-15906172305408  
cache appears valid but isn't 15905635434496  
block group 16542901207040 has wrong amount of free space  
failed to load free space cache for block group 16542901207040  
block group 16581019041792 has wrong amount of free space  
failed to load free space cache for block group 16581019041792  
block group 16616989392896 has wrong amount of free space  
failed to load free space cache for block group 16616989392896  
block group 16676582064128 has wrong amount of free space  
failed to load free space cache for block group 16676582064128  
block group 16697520029696 has wrong amount of free space  
failed to load free space cache for block group 16697520029696  
block group 16848380755968 has wrong amount of free space  
failed to load free space cache for block group 16848380755968  
ERROR: errors found in free space cache  
found 11732749766656 bytes used, error(s) found  
total csum bytes: 11441478452  
total tree bytes: 13793296384  
total fs tree bytes: 727580672  
total extent tree bytes: 483426304  
btree space waste bytes: 1194373662  
file data blocks allocated: 12133646495744  
 referenced 12155707805696  
  
real    100m12.252s  
user    0m33.771s  
sys     1m11.220s 
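
With output this long it helps to pull out just the affected block groups. A rough Python sketch, assuming the message formats shown above stay as they are; only a six-line excerpt is embedded here as sample input, and the pattern and function names are illustrative:

```python
import re

# Excerpt of the btrfs check output above, used as sample input.
SAMPLE = """\
cache appears valid but isn't 13282417049600
block group 13928272756736 has wrong amount of free space
failed to load free space cache for block group 13928272756736
Duplicate entries in free space cache
failed to load free space cache for block group 13962095624192
cache appears valid but isn't 14657410564096
"""

PATTERNS = {
    "failed_to_load": re.compile(
        r"^failed to load free space cache for block group (\d+)"),
    "invalid_cache": re.compile(r"^cache appears valid but isn't (\d+)"),
}


def summarize(output):
    """Collect block group offsets per message type, in order of appearance."""
    found = {name: [] for name in PATTERNS}
    for line in output.splitlines():
        line = line.strip()  # the real output has trailing spaces
        for name, pattern in PATTERNS.items():
            match = pattern.match(line)
            if match:
                found[name].append(int(match.group(1)))
    return found


summary = summarize(SAMPLE)
for name, groups in summary.items():
    print(name, len(groups), groups)
```

Feeding it the full output instead of the excerpt gives the complete list of block groups whose cache was rejected.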

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
  2017-09-10  2:36                                                     ` Marc MERLIN
@ 2017-09-10  3:12                                                       ` Josef Bacik
  2017-09-10 13:14                                                         ` Marc MERLIN
  0 siblings, 1 reply; 47+ messages in thread
From: Josef Bacik @ 2017-09-10  3:12 UTC (permalink / raw)
  To: Marc MERLIN
  Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
	David Sterba

Ok mount -o clear_cache, umount and run fsck again just to make sure.  Then if it comes out clean mount with ref_verify again and wait for it to blow up again.  Thanks,
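
The sequence above, written out as a dry-run plan in Python. It only prints the commands and runs nothing. The device path is the one from the btrfs check output earlier in the thread; the mount point /mnt/dshelf1 is a placeholder, and the ref_verify mount option is assumed to be available only on a kernel built with the ref-verify patches.

```python
import shlex


def recheck_plan(device, mountpoint):
    """Return the commands for the clear-cache / re-check cycle, in order."""
    return [
        # 1. Mount once with clear_cache so the free space cache is rebuilt.
        ["mount", "-o", "clear_cache", device, mountpoint],
        # 2. Unmount so the check runs against an unmounted filesystem.
        ["umount", mountpoint],
        # 3. Re-run the check (read-only by default) to confirm it is clean.
        ["btrfs", "check", device],
        # 4. Only if step 3 is clean: mount with ref_verify and wait.
        ["mount", "-o", "ref_verify", device, mountpoint],
    ]


plan = recheck_plan("/dev/mapper/dshelf1", "/mnt/dshelf1")  # mount point is hypothetical
for command in plan:
    print(shlex.join(command))
```

Step 4 is conditional on step 3 coming back clean, so the plan is meant to be read and run by hand rather than executed blindly.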

Josef

Sent from my iPhone

> On Sep 9, 2017, at 10:37 PM, Marc MERLIN <marc@merlins.org> wrote:
> 
>> On Sat, Sep 09, 2017 at 10:56:14PM +0000, Josef Bacik wrote:
>> Well that's odd, a block allocated on disk is in the free space cache.  Can I see the full output of the fsck?  I want to make sure it's actually getting to the part where it checks the free space cache.  If it does then I'll have to think of how to catch this kind of bug, because you've got a weird one.  Thanks,
> 
> Well, btrfs check was clean before that, but now it returned this:
> gargamel:~# time btrfs check /dev/mapper/dshelf1  
> Checking filesystem on /dev/mapper/dshelf1  
> UUID: 36f5079e-ca6c-4855-8639-ccb82695c18d  
> checking extents  
> checking free space cache  
> Wanted bytes 16384, found 196608 for off 13282417049600  
> Wanted bytes 536870912, found 196608 for off 13282417049600  
> cache appears valid but isn't 13282417049600  
> There is no free space entry for 13849889603584-13849889652736  
> There is no free space entry for 13849889603584-13850426474496  
> cache appears valid but isn't 13849889603584  
> Wanted bytes 5832704, found 81920 for off 13870290698240  
> Wanted bytes 536870912, found 81920 for off 13870290698240  
> cache appears valid but isn't 13870290698240  
> block group 13928272756736 has wrong amount of free space  
> failed to load free space cache for block group 13928272756736  
> Duplicate entries in free space cache  
> failed to load free space cache for block group 13962095624192  
> block group 14003434684416 has wrong amount of free space  
> failed to load free space cache for block group 14003434684416  
> block group 14470042615808 has wrong amount of free space  
> failed to load free space cache for block group 14470042615808  
> block group 14610702794752 has wrong amount of free space  
> failed to load free space cache for block group 14610702794752  
> block group 14612313407488 has wrong amount of free space  
> failed to load free space cache for block group 14612313407488  
> block group 14624661438464 has wrong amount of free space  
> failed to load free space cache for block group 14624661438464  
> block group 14648820629504 has wrong amount of free space  
> failed to load free space cache for block group 14648820629504  
> Wanted offset 14657410793472, found 14657410760704  
> Wanted offset 14657410793472, found 14657410760704  
> cache appears valid but isn't 14657410564096  
> block group 15886844952576 has wrong amount of free space  
> failed to load free space cache for block group 15886844952576  
> There is no free space entry for 15905635434496-15905636499456  
> There is no free space entry for 15905635434496-15906172305408  
> cache appears valid but isn't 15905635434496  
> block group 16542901207040 has wrong amount of free space  
> failed to load free space cache for block group 16542901207040  
> block group 16581019041792 has wrong amount of free space  
> failed to load free space cache for block group 16581019041792  
> block group 16616989392896 has wrong amount of free space  
> failed to load free space cache for block group 16616989392896  
> block group 16676582064128 has wrong amount of free space  
> failed to load free space cache for block group 16676582064128  
> block group 16697520029696 has wrong amount of free space  
> failed to load free space cache for block group 16697520029696  
> block group 16848380755968 has wrong amount of free space  
> failed to load free space cache for block group 16848380755968  
> ERROR: errors found in free space cache  
> found 11732749766656 bytes used, error(s) found  
> total csum bytes: 11441478452  
> total tree bytes: 13793296384  
> total fs tree bytes: 727580672  
> total extent tree bytes: 483426304  
> btree space waste bytes: 1194373662  
> file data blocks allocated: 12133646495744  
> referenced 12155707805696  
> 
> real    100m12.252s  
> user    0m33.771s  
> sys     1m11.220s 
> 
> Marc
> -- 
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
>                                      .... what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
  2017-09-10  3:12                                                       ` Josef Bacik
@ 2017-09-10 13:14                                                         ` Marc MERLIN
  2017-09-10 13:16                                                           ` Josef Bacik
  0 siblings, 1 reply; 47+ messages in thread
From: Marc MERLIN @ 2017-09-10 13:14 UTC (permalink / raw)
  To: Josef Bacik
  Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
	David Sterba

On Sun, Sep 10, 2017 at 03:12:16AM +0000, Josef Bacik wrote:
> Ok mount -o clear_cache, umount and run fsck again just to make sure.  Then if it comes out clean mount with ref_verify again and wait for it to blow up again.  Thanks,
 
Ok, just did the 2nd fsck, came back clean after mount -o clear_cache

I'll re-trigger the exact same bug and repeat the whole cycle then.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
  2017-09-10 13:14                                                         ` Marc MERLIN
@ 2017-09-10 13:16                                                           ` Josef Bacik
  2017-09-11  0:22                                                             ` Marc MERLIN
  0 siblings, 1 reply; 47+ messages in thread
From: Josef Bacik @ 2017-09-10 13:16 UTC (permalink / raw)
  To: Marc MERLIN
  Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
	David Sterba

Great, if the free space cache is fucked again after the next go around then I need to expand the verifier to watch entries being added to the cache as well.  Thanks,

Josef

Sent from my iPhone

> On Sep 10, 2017, at 9:14 AM, Marc MERLIN <marc@merlins.org> wrote:
> 
>> On Sun, Sep 10, 2017 at 03:12:16AM +0000, Josef Bacik wrote:
>> Ok mount -o clear_cache, umount and run fsck again just to make sure.  Then if it comes out clean mount with ref_verify again and wait for it to blow up again.  Thanks,
> 
> Ok, just did the 2nd fsck, came back clean after mount -o clear_cache
> 
> I'll re-trigger the exact same bug and repeat the whole cycle then.
> 
> Marc
> -- 
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
>                                      .... what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
  2017-09-10 13:16                                                           ` Josef Bacik
@ 2017-09-11  0:22                                                             ` Marc MERLIN
  2017-09-27 18:01                                                               ` Marc MERLIN
  0 siblings, 1 reply; 47+ messages in thread
From: Marc MERLIN @ 2017-09-11  0:22 UTC (permalink / raw)
  To: Josef Bacik
  Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
	David Sterba

On Sun, Sep 10, 2017 at 01:16:26PM +0000, Josef Bacik wrote:
> Great, if the free space cache is fucked again after the next go
> around then I need to expand the verifier to watch entries being added
> to the cache as well.  Thanks,

Well, I copied about 1TB of data, and nothing happened.
So it seems clearing it and fsck may have fixed this fault I had been
carrying for quite a while.
If so, yeah!

I'm not sure if this needs a kernel fix to not get triggered and if
btrfs check should also be improved to catch this, but hopefully you
know what makes sense there.

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
  2017-09-11  0:22                                                             ` Marc MERLIN
@ 2017-09-27 18:01                                                               ` Marc MERLIN
  0 siblings, 0 replies; 47+ messages in thread
From: Marc MERLIN @ 2017-09-27 18:01 UTC (permalink / raw)
  To: Josef Bacik
  Cc: linux-btrfs, Chris Murphy, Chris Mason, bo.li.liu, fdmanana,
	David Sterba

On Sun, Sep 10, 2017 at 05:22:14PM -0700, Marc MERLIN wrote:
> On Sun, Sep 10, 2017 at 01:16:26PM +0000, Josef Bacik wrote:
> > Great, if the free space cache is fucked again after the next go
> > around then I need to expand the verifier to watch entries being added
> > to the cache as well.  Thanks,
> 
> Well, I copied about 1TB of data, and nothing happened.
> So it seems clearing it and fsck may have fixed this fault I had been
> carrying for quite a while.
> If so, yeah!
> 
> I'm not sure if this needs a kernel fix to not get triggered and if
> btrfs check should also be improved to catch this, but hopefully you
> know what makes sense there.

Just to report back, it's now been another 2 weeks, and no problem.
It seems that forcing the cache clear was actually the fix. Not sure if
the kernel should have found/detected/auto-fixed the problem or if btrfs
check should have.

Either way, thanks for your help.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 47+ messages in thread
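
The recovery cycle described in the messages above (mount with clear_cache,
umount, run fsck, then mount with ref_verify and wait) can be sketched as a
small script. This is only an illustration, not something posted in the
thread: the device and mount point are hypothetical placeholders, the `run`
and `clear_cache_cycle` helpers are made up for this sketch, and ref_verify
is assumed to be available only on a kernel built with the ref-verify
debugging support discussed here. It defaults to a dry run that prints the
commands instead of executing them.

```shell
#!/bin/sh
# Hypothetical placeholders: substitute the real device and mount point.
DEV="${DEV:-/dev/mapper/example}"
MNT="${MNT:-/mnt/example}"
# Dry run by default: commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# One pass of the cycle; stops at the first failing step.
clear_cache_cycle() {
    # Mount once with clear_cache so the free space cache gets rebuilt.
    run mount -o clear_cache "$DEV" "$MNT" &&
    run umount "$MNT" &&
    # Offline check on the unmounted device; expected to come back clean.
    run btrfs check "$DEV" &&
    # Assumption: the kernel supports the ref_verify mount option.
    run mount -o ref_verify "$DEV" "$MNT"
}

clear_cache_cycle
```

Setting DRY_RUN=0 would run the commands for real (as root); the `&&` chain
means the ref_verify mount is only attempted if btrfs check exits
successfully.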

end of thread, other threads:[~2017-09-27 18:02 UTC | newest]

Thread overview: 47+ messages
2017-07-11  6:21 BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists Marc MERLIN
2017-07-11 16:00 ` Chris Murphy
2017-07-11 16:48   ` Marc MERLIN
2017-07-11 22:43     ` Chris Murphy
2017-07-11 23:04       ` Marc MERLIN
2017-07-13  1:10     ` Marc MERLIN
2017-07-13 18:17       ` Chris Murphy
2017-07-15  0:48         ` Marc MERLIN
2017-07-15  1:22 ` BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012) Marc MERLIN
2017-07-15 23:12   ` Marc MERLIN
2017-07-16 14:01     ` Giuseppe Della Bianca
2017-07-16 16:06       ` Marc MERLIN
2017-07-17 11:05         ` gius db
2017-08-29  3:16     ` Marc MERLIN
2017-08-29 14:30       ` Josef Bacik
2017-08-29 14:39         ` Marc MERLIN
2017-08-29 14:43           ` Josef Bacik
2017-08-29 18:22           ` Josef Bacik
2017-08-30  3:40             ` Marc MERLIN
2017-08-31 14:52               ` Josef Bacik
2017-08-31 17:36                 ` Marc MERLIN
2017-08-31 17:48                   ` Josef Bacik
2017-09-01 20:43                     ` Marc MERLIN
2017-09-01 23:01                       ` Josef Bacik
2017-09-02 16:09                         ` Marc MERLIN
2017-09-02 16:52                           ` Josef Bacik
     [not found]                             ` <CAHKv19A=OVgCpQpDL2454T+f8QgLm9iynA8xZ4w4Kg8JjYS=UA@mail.gmail.com>
2017-09-02 18:55                               ` Fwd: " George Joseph
2017-09-02 23:53                             ` Marc MERLIN
2017-09-03  0:30                               ` Josef Bacik
2017-09-03  1:01                                 ` Marc MERLIN
2017-09-03  3:26                                   ` Josef Bacik
2017-09-03 14:31                                     ` Marc MERLIN
2017-09-03 14:38                                       ` Josef Bacik
2017-09-03 14:42                                         ` Marc MERLIN
2017-09-03 14:55                                           ` Josef Bacik
2017-09-03 17:33                                           ` Josef Bacik
2017-09-03 20:20                                             ` Marc MERLIN
2017-09-04  0:55                                               ` Josef Bacik
2017-09-05 18:19                                               ` Josef Bacik
2017-09-09 18:39                                                 ` Marc MERLIN
2017-09-09 22:56                                                   ` Josef Bacik
2017-09-10  2:36                                                     ` Marc MERLIN
2017-09-10  3:12                                                       ` Josef Bacik
2017-09-10 13:14                                                         ` Marc MERLIN
2017-09-10 13:16                                                           ` Josef Bacik
2017-09-11  0:22                                                             ` Marc MERLIN
2017-09-27 18:01                                                               ` Marc MERLIN
