linux-btrfs.vger.kernel.org archive mirror
* btrfs panic problem
@ 2018-09-18  0:28 sunny.s.zhang
  2018-09-18  0:36 ` sunny.s.zhang
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: sunny.s.zhang @ 2018-09-18  0:28 UTC (permalink / raw)
  To: linux-btrfs

Hi All,

My OS (kernel 4.1.12) panicked in kmem_cache_alloc, which was called by
btrfs_get_or_create_delayed_node.

I found that the freelist of the slub cache is corrupted.

crash> struct kmem_cache_cpu ffff887e7d7a24b0

struct kmem_cache_cpu {
   freelist = 0x2026,   <<< this value is the inode number of one inode
   tid = 29567861,
   page = 0xffffea0132168d00,
   partial = 0x0
}

Also, I found two different btrfs inodes pointing to the same delayed_node.
That means the same slub object has been handed out twice.

I think this object was freed twice, so its freelist next pointer ends up
pointing to itself, and the allocator then returns the same object twice.

When that object is used again, it corrupts the freelist.
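
To illustrate why a double free produces exactly this state, here is a small
user-space sketch of a SLUB-style freelist, where the free pointer lives
inside the object itself (illustration only, not kernel code; all names are
made up for the example):

#include <stdio.h>

struct obj { struct obj *next; };       /* free pointer stored inside the object */

static struct obj *freelist;

static void fake_free(struct obj *o)
{
        o->next = freelist;             /* on a second free of o, this sets o->next = o */
        freelist = o;
}

static struct obj *fake_alloc(void)
{
        struct obj *o = freelist;
        freelist = o->next;             /* the stored value becomes the new list head */
        return o;
}

int main(void)
{
        struct obj x;

        fake_free(&x);                  /* freelist -> x */
        fake_free(&x);                  /* double free: x.next = &x, a self loop */

        struct obj *a = fake_alloc();
        struct obj *b = fake_alloc();   /* the same object is returned twice */
        printf("a=%p b=%p\n", (void *)a, (void *)b);

        /* Once one of the two users stores ordinary data (for example an
         * inode number like 0x2026) into the object, the allocator later
         * treats that value as the next freelist pointer and dereferences
         * it, which is how kmem_cache_alloc faults here. */
        return 0;
}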

The following interleaving could make the delayed node be freed twice, but I
have not yet found the exact process that triggers it.

Process A (btrfs_evict_inode)                   Process B

call btrfs_remove_delayed_node                  call btrfs_get_delayed_node

                                                node = ACCESS_ONCE(btrfs_inode->delayed_node);

BTRFS_I(inode)->delayed_node = NULL;
btrfs_release_delayed_node(delayed_node);

                                                if (node) {
                                                        atomic_inc(&node->refs);
                                                        return node;
                                                }

                                                ......

                                                btrfs_release_delayed_node(delayed_node);


1313 void btrfs_remove_delayed_node(struct inode *inode)
1314 {
1315         struct btrfs_delayed_node *delayed_node;
1316
1317         delayed_node = ACCESS_ONCE(BTRFS_I(inode)->delayed_node);
1318         if (!delayed_node)
1319                 return;
1320
1321         BTRFS_I(inode)->delayed_node = NULL;
1322         btrfs_release_delayed_node(delayed_node);
1323 }


   87 static struct btrfs_delayed_node *btrfs_get_delayed_node(struct 
inode *inode)
   88 {
   89         struct btrfs_inode *btrfs_inode = BTRFS_I(inode);
   90         struct btrfs_root *root = btrfs_inode->root;
   91         u64 ino = btrfs_ino(inode);
   92         struct btrfs_delayed_node *node;
   93
   94         node = ACCESS_ONCE(btrfs_inode->delayed_node);
   95         if (node) {
   96                 atomic_inc(&node->refs);
   97                 return node;
   98         }


Thanks,

Sunny


PS:

------------------------------------------------

panic informations

PID: 73638  TASK: ffff887deb586200  CPU: 38  COMMAND: "dockerd"
  #0 [ffff88130404f940] machine_kexec at ffffffff8105ec10
  #1 [ffff88130404f9b0] crash_kexec at ffffffff811145b8
  #2 [ffff88130404fa80] oops_end at ffffffff8101a868
  #3 [ffff88130404fab0] no_context at ffffffff8106ea91
  #4 [ffff88130404fb00] __bad_area_nosemaphore at ffffffff8106ec8d
  #5 [ffff88130404fb50] bad_area_nosemaphore at ffffffff8106eda3
  #6 [ffff88130404fb60] __do_page_fault at ffffffff8106f328
  #7 [ffff88130404fbd0] do_page_fault at ffffffff8106f637
  #8 [ffff88130404fc10] page_fault at ffffffff816f6308
     [exception RIP: kmem_cache_alloc+121]
     RIP: ffffffff811ef019  RSP: ffff88130404fcc8  RFLAGS: 00010286
     RAX: 0000000000000000  RBX: 0000000000000000  RCX: 0000000001c32b76
     RDX: 0000000001c32b75  RSI: 0000000000000000  RDI: 00000000000224b0
     RBP: ffff88130404fd08   R8: ffff887e7d7a24b0   R9: 0000000000000000
     R10: ffff8802668b6618  R11: 0000000000000002  R12: ffff887e3e230a00
     R13: 0000000000002026  R14: ffff887e3e230a00  R15: ffffffffa01abf49
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
  #9 [ffff88130404fd10] btrfs_get_or_create_delayed_node at 
ffffffffa01abf49 [btrfs]
#10 [ffff88130404fd60] btrfs_delayed_update_inode at ffffffffa01aea12 
[btrfs]
#11 [ffff88130404fdb0] btrfs_update_inode at ffffffffa015b199 [btrfs]
#12 [ffff88130404fdf0] btrfs_dirty_inode at ffffffffa015cd11 [btrfs]
#13 [ffff88130404fe20] btrfs_update_time at ffffffffa015fa25 [btrfs]
#14 [ffff88130404fe50] touch_atime at ffffffff812286d3
#15 [ffff88130404fe90] iterate_dir at ffffffff81221929
#16 [ffff88130404fee0] sys_getdents64 at ffffffff81221a19
#17 [ffff88130404ff50] system_call_fastpath at ffffffff816f2594
     RIP: 00000000006b68e4  RSP: 000000c866259080  RFLAGS: 00000246
     RAX: ffffffffffffffda  RBX: 000000c828dbbe00  RCX: 00000000006b68e4
     RDX: 0000000000001000  RSI: 000000c83da14000  RDI: 0000000000000011
     RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000000
     R10: 0000000000000000  R11: 0000000000000246  R12: 00000000000000c7
     R13: 0000000002174e74  R14: 0000000005555555  R15: 0000000000000038
     ORIG_RAX: 00000000000000d9  CS: 0033  SS: 002b


We also found list_add double-add warnings, involving both n_list and p_list:

[8642921.110568] ------------[ cut here ]------------
[8642921.167929] WARNING: CPU: 38 PID: 73638 at lib/list_debug.c:33 
__list_add+0xbe/0xd0()
[8642921.263780] list_add corruption. prev->next should be next 
(ffff887e40fa5368), but was ff:ff884c85a36288. (prev=ffff884c85a36288).
[8642921.405490] Modules linked in: ipt_MASQUERADE 
nf_nat_masquerade_ipv4 xt_conntrack iptable_filter arc4 ecb ppp_mppe 
ppp_async crc_ccitt ppp_generic slhc nfsv3 nfs_acl rpcsec_gss_krb5 
auth_rpcgss nfsv4 nfs fscache lockd sunrpc grace veth xt_nat xt_addrtype 
br_netfilter bridge tcp_diag inet_diag oracleacfs(POE) oracleadvm(POE) 
oracleoks(POE) oracleasm autofs4 dm_queue_length cpufreq_powersave 
be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i iw_cxgb4 cxgb4 cxgb3i 
libcxgbi iw_cxgb3 cxgb3 ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core 
ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 8021q mrp 
garp stp llc bonding iptable_nat nf_nat_ipv4 nf_nat ipt_REJECT 
nf_reject_ipv4 xt_pkttype nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables 
ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state
[8642922.259780]  nf_conntrack ip6table_filter ip6_tables ipv6 
dm_multipath sg iTCO_wdt iTCO_vendor_support ipmi_devintf ipmi_si 
ipmi_msghandler intel_powerclamp coretemp kvm_intel kvm pcspkr ixgbe dca 
ptp pps_core vxlan udp_tunnel ip6_udp_tunnel mdio sb_edac edac_core 
i2c_i801 i2c_core lpc_ich mfd_core shpchp ext4 jbd2 mbcache2 btrfs 
raid6_pq xor sd_mod megaraid_sas ghash_clmulni_intel crc32_pclmul 
crc32c_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper 
aes_x86_64 ahci libahci mxm_wmi wmi dm_mirror dm_region_hash dm_log 
dm_mod [last unloaded: xt_conntrack]
[8642922.850285] CPU: 38 PID: 73638 Comm: dockerd Tainted: P           
OE   4.1.12-112.14.13.el6uek.x86_64 #2
[8642922.965881] Hardware name: Oracle Corporation ORACLE SERVER 
X6-2/ASM,MOTHERBOARD,1U, BIOS 38040100 06/06/2016
[8642923.086670]  0000000000000000 ffff88130404fc28 ffffffff816ec91e 
ffff88130404fc78
[8642923.177730]  0000000000000021 ffff88130404fc68 ffffffff81088925 
00ff88130404fc98
[8642923.268793]  ffff884c85a36288 ffff887e40fa5368 ffff884c85a36288 
0000000000000000
[8642923.359851] Call Trace:
[8642923.391194]  [<ffffffff816ec91e>] dump_stack+0x63/0x85
[8642923.454780]  [<ffffffff81088925>] warn_slowpath_common+0x95/0xe0
[8642923.528769]  [<ffffffff81088a26>] warn_slowpath_fmt+0x46/0x50
[8642923.599635]  [<ffffffff81343f3e>] __list_add+0xbe/0xd0
[8642923.663333]  [<ffffffffa01abb92>] 
btrfs_queue_delayed_node+0xb2/0xe0 [btrfs]
[8642923.749898]  [<ffffffffa01abc66>] 
__btrfs_release_delayed_node+0xa6/0x160 [btrfs]
[8642923.841661]  [<ffffffffa01aea5e>] 
btrfs_delayed_update_inode+0x7e/0x100 [btrfs]
[8642923.931336]  [<ffffffffa015b199>] btrfs_update_inode+0xa9/0x130 [btrfs]
[8642924.012686]  [<ffffffffa015cd11>] btrfs_dirty_inode+0x71/0xe0 [btrfs]
[8642924.091954]  [<ffffffffa015fa25>] btrfs_update_time+0x65/0xc0 [btrfs]
[8642924.171145]  [<ffffffff812286d3>] touch_atime+0x123/0x170
[8642924.237849]  [<ffffffff81221929>] iterate_dir+0xf9/0x140
[8642924.303512]  [<ffffffff81221a19>] SyS_getdents64+0xa9/0x100
[8642924.372296]  [<ffffffff81221720>] ? fillonedir+0xe0/0xe0
[8642924.437963]  [<ffffffff816f2594>] system_call_fastpath+0x12/0xce
[8642924.512201] ---[ end trace 6782707b09f9f873 ]---
[8642924.569677] ------------[ cut here ]------------
[8642924.627055] WARNING: CPU: 38 PID: 73638 at lib/list_debug.c:36 
__list_add+0x9c/0xd0()
[8642924.722925] list_add double add: new=ffff884c85a36288, 
prev=ffff884c85a36288, next=ffff887e40fa5368.
[8642924.834373] Modules linked in: ipt_MASQUERADE 
nf_nat_masquerade_ipv4 xt_conntrack iptable_filter arc4 ecb ppp_mppe 
ppp_async crc_ccitt ppp_generic slhc nfsv3 nfs_acl rpcsec_gss_krb5 
auth_rpcgss nfsv4 nfs fscache lockd sunrpc grace veth xt_nat xt_addrtype 
br_netfilter bridge tcp_diag inet_diag oracleacfs(POE) oracleadvm(POE) 
oracleoks(POE) oracleasm autofs4 dm_queue_length cpufreq_powersave 
be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i iw_cxgb4 cxgb4 cxgb3i 
libcxgbi iw_cxgb3 cxgb3 ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core 
ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 8021q mrp 
garp stp llc bonding iptable_nat nf_nat_ipv4 nf_nat ipt_REJECT 
nf_reject_ipv4 xt_pkttype nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables 
ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state
[8642925.688498]  nf_conntrack ip6table_filter ip6_tables ipv6 
dm_multipath sg iTCO_wdt iTCO_vendor_support ipmi_devintf ipmi_si 
ipmi_msghandler intel_powerclamp coretemp kvm_intel kvm pcspkr ixgbe dca 
ptp pps_core vxlan udp_tunnel ip6_udp_tunnel mdio sb_edac edac_core 
i2c_i801 i2c_core lpc_ich mfd_core shpchp ext4 jbd2 mbcache2 btrfs 
raid6_pq xor sd_mod megaraid_sas ghash_clmulni_intel crc32_pclmul 
crc32c_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper 
aes_x86_64 ahci libahci mxm_wmi wmi dm_mirror dm_region_hash dm_log 
dm_mod [last unloaded: xt_conntrack]
[8642926.279029] CPU: 38 PID: 73638 Comm: dockerd Tainted: P        W  
OE   4.1.12-112.14.13.el6uek.x86_64 #2
[8642926.394626] Hardware name: Oracle Corporation ORACLE SERVER 
X6-2/ASM,MOTHERBOARD,1U, BIOS 38040100 06/06/2016
[8642926.515421]  0000000000000000 ffff88130404fc28 ffffffff816ec91e 
ffff88130404fc78
[8642926.606485]  0000000000000024 ffff88130404fc68 ffffffff81088925 
00ff88130404fc98
[8642926.697544]  ffff884c85a36288 ffff887e40fa5368 ffff884c85a36288 
0000000000000000
[8642926.788605] Call Trace:
[8642926.819945]  [<ffffffff816ec91e>] dump_stack+0x63/0x85
[8642926.883535]  [<ffffffff81088925>] warn_slowpath_common+0x95/0xe0
[8642926.957529]  [<ffffffff81088a26>] warn_slowpath_fmt+0x46/0x50
[8642927.028400]  [<ffffffff81343f1c>] __list_add+0x9c/0xd0
[8642927.092116]  [<ffffffffa01abb92>] 
btrfs_queue_delayed_node+0xb2/0xe0 [btrfs]
[8642927.178669]  [<ffffffffa01abc66>] 
__btrfs_release_delayed_node+0xa6/0x160 [btrfs]
[8642927.270440]  [<ffffffffa01aea5e>] 
btrfs_delayed_update_inode+0x7e/0x100 [btrfs]
[8642927.360112]  [<ffffffffa015b199>] btrfs_update_inode+0xa9/0x130 [btrfs]
[8642927.441447]  [<ffffffffa015cd11>] btrfs_dirty_inode+0x71/0xe0 [btrfs]
[8642927.520716]  [<ffffffffa015fa25>] btrfs_update_time+0x65/0xc0 [btrfs]
[8642927.599910]  [<ffffffff812286d3>] touch_atime+0x123/0x170
[8642927.666615]  [<ffffffff81221929>] iterate_dir+0xf9/0x140
[8642927.732284]  [<ffffffff81221a19>] SyS_getdents64+0xa9/0x100
[8642927.801072]  [<ffffffff81221720>] ? fillonedir+0xe0/0xe0
[8642927.866741]  [<ffffffff816f2594>] system_call_fastpath+0x12/0xce
[8642927.940843] ---[ end trace 6782707b09f9f874 ]---
[8642927.998378] ------------[ cut here ]------------


* Re: btrfs panic problem
  2018-09-18  0:28 btrfs panic problem sunny.s.zhang
@ 2018-09-18  0:36 ` sunny.s.zhang
  2018-09-18  5:05 ` Duncan
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: sunny.s.zhang @ 2018-09-18  0:36 UTC (permalink / raw)
  To: linux-btrfs

Sorry, correcting some formatting errors in the diagram:

Process A (btrfs_evict_inode)                   Process B

call btrfs_remove_delayed_node                  call btrfs_get_delayed_node

                                                node = ACCESS_ONCE(btrfs_inode->delayed_node);

BTRFS_I(inode)->delayed_node = NULL;
btrfs_release_delayed_node(delayed_node);

                                                if (node) {
                                                        atomic_inc(&node->refs);
                                                        return node;
                                                }

                                                ......

                                                btrfs_release_delayed_node(delayed_node);
On 2018-09-18 08:28, sunny.s.zhang wrote:
> Hi All,
>
> My OS(4.1.12) panic in kmem_cache_alloc, which is called by 
> btrfs_get_or_create_delayed_node.
>
> I found that the freelist of the slub is wrong.
>
> crash> struct kmem_cache_cpu ffff887e7d7a24b0
>
> struct kmem_cache_cpu {
>   freelist = 0x2026,   <<< the value is id of one inode
>   tid = 29567861,
>   page = 0xffffea0132168d00,
>   partial = 0x0
> }
>
> And, I found there are two different btrfs inodes pointing 
> delayed_node. It means that the same slub is used twice.
>
> I think this slub is freed twice, and then the next pointer of this 
> slub point itself. So we get the same slub twice.
>
> When use this slub again, that break the freelist.
>
> Folloing code will make the delayed node being freed twice. But I 
> don't found what is the process.
>
> Process A (btrfs_evict_inode) Process B
>
> call btrfs_remove_delayed_node call  btrfs_get_delayed_node
>
> node = ACCESS_ONCE(btrfs_inode->delayed_node);
>
> BTRFS_I(inode)->delayed_node = NULL;
> btrfs_release_delayed_node(delayed_node);
>
> if (node) {
> atomic_inc(&node->refs);
> return node;
> }
>
> ......
>
> btrfs_release_delayed_node(delayed_node);
>
>
> 1313 void btrfs_remove_delayed_node(struct inode *inode)
> 1314 {
> 1315         struct btrfs_delayed_node *delayed_node;
> 1316
> 1317         delayed_node = ACCESS_ONCE(BTRFS_I(inode)->delayed_node);
> 1318         if (!delayed_node)
> 1319                 return;
> 1320
> 1321         BTRFS_I(inode)->delayed_node = NULL;
> 1322         btrfs_release_delayed_node(delayed_node);
> 1323 }
>
>
>   87 static struct btrfs_delayed_node *btrfs_get_delayed_node(struct 
> inode *inode)
>   88 {
>   89         struct btrfs_inode *btrfs_inode = BTRFS_I(inode);
>   90         struct btrfs_root *root = btrfs_inode->root;
>   91         u64 ino = btrfs_ino(inode);
>   92         struct btrfs_delayed_node *node;
>   93
>   94         node = ACCESS_ONCE(btrfs_inode->delayed_node);
>   95         if (node) {
>   96                 atomic_inc(&node->refs);
>   97                 return node;
>   98         }
>
>
> Thanks,
>
> Sunny
>
>
> PS:
>
> ------------------------------------------------
>
> panic informations
>
> PID: 73638  TASK: ffff887deb586200  CPU: 38  COMMAND: "dockerd"
>  #0 [ffff88130404f940] machine_kexec at ffffffff8105ec10
>  #1 [ffff88130404f9b0] crash_kexec at ffffffff811145b8
>  #2 [ffff88130404fa80] oops_end at ffffffff8101a868
>  #3 [ffff88130404fab0] no_context at ffffffff8106ea91
>  #4 [ffff88130404fb00] __bad_area_nosemaphore at ffffffff8106ec8d
>  #5 [ffff88130404fb50] bad_area_nosemaphore at ffffffff8106eda3
>  #6 [ffff88130404fb60] __do_page_fault at ffffffff8106f328
>  #7 [ffff88130404fbd0] do_page_fault at ffffffff8106f637
>  #8 [ffff88130404fc10] page_fault at ffffffff816f6308
>     [exception RIP: kmem_cache_alloc+121]
>     RIP: ffffffff811ef019  RSP: ffff88130404fcc8  RFLAGS: 00010286
>     RAX: 0000000000000000  RBX: 0000000000000000  RCX: 0000000001c32b76
>     RDX: 0000000001c32b75  RSI: 0000000000000000  RDI: 00000000000224b0
>     RBP: ffff88130404fd08   R8: ffff887e7d7a24b0   R9: 0000000000000000
>     R10: ffff8802668b6618  R11: 0000000000000002  R12: ffff887e3e230a00
>     R13: 0000000000002026  R14: ffff887e3e230a00  R15: ffffffffa01abf49
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>  #9 [ffff88130404fd10] btrfs_get_or_create_delayed_node at 
> ffffffffa01abf49 [btrfs]
> #10 [ffff88130404fd60] btrfs_delayed_update_inode at ffffffffa01aea12 
> [btrfs]
> #11 [ffff88130404fdb0] btrfs_update_inode at ffffffffa015b199 [btrfs]
> #12 [ffff88130404fdf0] btrfs_dirty_inode at ffffffffa015cd11 [btrfs]
> #13 [ffff88130404fe20] btrfs_update_time at ffffffffa015fa25 [btrfs]
> #14 [ffff88130404fe50] touch_atime at ffffffff812286d3
> #15 [ffff88130404fe90] iterate_dir at ffffffff81221929
> #16 [ffff88130404fee0] sys_getdents64 at ffffffff81221a19
> #17 [ffff88130404ff50] system_call_fastpath at ffffffff816f2594
>     RIP: 00000000006b68e4  RSP: 000000c866259080  RFLAGS: 00000246
>     RAX: ffffffffffffffda  RBX: 000000c828dbbe00  RCX: 00000000006b68e4
>     RDX: 0000000000001000  RSI: 000000c83da14000  RDI: 0000000000000011
>     RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000000
>     R10: 0000000000000000  R11: 0000000000000246  R12: 00000000000000c7
>     R13: 0000000002174e74  R14: 0000000005555555  R15: 0000000000000038
>     ORIG_RAX: 00000000000000d9  CS: 0033  SS: 002b
>
>
> We also find the list double add informations, including n_list and 
> p_list:
>
> [8642921.110568] ------------[ cut here ]------------
> [8642921.167929] WARNING: CPU: 38 PID: 73638 at lib/list_debug.c:33 
> __list_add+0xbe/0xd0()
> [8642921.263780] list_add corruption. prev->next should be next 
> (ffff887e40fa5368), but was ff:ff884c85a36288. (prev=ffff884c85a36288).
> [8642921.405490] Modules linked in: ipt_MASQUERADE 
> nf_nat_masquerade_ipv4 xt_conntrack iptable_filter arc4 ecb ppp_mppe 
> ppp_async crc_ccitt ppp_generic slhc nfsv3 nfs_acl rpcsec_gss_krb5 
> auth_rpcgss nfsv4 nfs fscache lockd sunrpc grace veth xt_nat 
> xt_addrtype br_netfilter bridge tcp_diag inet_diag oracleacfs(POE) 
> oracleadvm(POE) oracleoks(POE) oracleasm autofs4 dm_queue_length 
> cpufreq_powersave be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i 
> iw_cxgb4 cxgb4 cxgb3i libcxgbi iw_cxgb3 cxgb3 ib_iser rdma_cm ib_cm 
> iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi 
> scsi_transport_iscsi 8021q mrp garp stp llc bonding iptable_nat 
> nf_nat_ipv4 nf_nat ipt_REJECT nf_reject_ipv4 xt_pkttype 
> nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables ip6t_REJECT nf_reject_ipv6 
> nf_conntrack_ipv6 nf_defrag_ipv6 xt_state
> [8642922.259780]  nf_conntrack ip6table_filter ip6_tables ipv6 
> dm_multipath sg iTCO_wdt iTCO_vendor_support ipmi_devintf ipmi_si 
> ipmi_msghandler intel_powerclamp coretemp kvm_intel kvm pcspkr ixgbe 
> dca ptp pps_core vxlan udp_tunnel ip6_udp_tunnel mdio sb_edac 
> edac_core i2c_i801 i2c_core lpc_ich mfd_core shpchp ext4 jbd2 mbcache2 
> btrfs raid6_pq xor sd_mod megaraid_sas ghash_clmulni_intel 
> crc32_pclmul crc32c_intel aesni_intel ablk_helper cryptd lrw gf128mul 
> glue_helper aes_x86_64 ahci libahci mxm_wmi wmi dm_mirror 
> dm_region_hash dm_log dm_mod [last unloaded: xt_conntrack]
> [8642922.850285] CPU: 38 PID: 73638 Comm: dockerd Tainted: P           
> OE   4.1.12-112.14.13.el6uek.x86_64 #2
> [8642922.965881] Hardware name: Oracle Corporation ORACLE SERVER 
> X6-2/ASM,MOTHERBOARD,1U, BIOS 38040100 06/06/2016
> [8642923.086670]  0000000000000000 ffff88130404fc28 ffffffff816ec91e 
> ffff88130404fc78
> [8642923.177730]  0000000000000021 ffff88130404fc68 ffffffff81088925 
> 00ff88130404fc98
> [8642923.268793]  ffff884c85a36288 ffff887e40fa5368 ffff884c85a36288 
> 0000000000000000
> [8642923.359851] Call Trace:
> [8642923.391194]  [<ffffffff816ec91e>] dump_stack+0x63/0x85
> [8642923.454780]  [<ffffffff81088925>] warn_slowpath_common+0x95/0xe0
> [8642923.528769]  [<ffffffff81088a26>] warn_slowpath_fmt+0x46/0x50
> [8642923.599635]  [<ffffffff81343f3e>] __list_add+0xbe/0xd0
> [8642923.663333]  [<ffffffffa01abb92>] 
> btrfs_queue_delayed_node+0xb2/0xe0 [btrfs]
> [8642923.749898]  [<ffffffffa01abc66>] 
> __btrfs_release_delayed_node+0xa6/0x160 [btrfs]
> [8642923.841661]  [<ffffffffa01aea5e>] 
> btrfs_delayed_update_inode+0x7e/0x100 [btrfs]
> [8642923.931336]  [<ffffffffa015b199>] btrfs_update_inode+0xa9/0x130 
> [btrfs]
> [8642924.012686]  [<ffffffffa015cd11>] btrfs_dirty_inode+0x71/0xe0 
> [btrfs]
> [8642924.091954]  [<ffffffffa015fa25>] btrfs_update_time+0x65/0xc0 
> [btrfs]
> [8642924.171145]  [<ffffffff812286d3>] touch_atime+0x123/0x170
> [8642924.237849]  [<ffffffff81221929>] iterate_dir+0xf9/0x140
> [8642924.303512]  [<ffffffff81221a19>] SyS_getdents64+0xa9/0x100
> [8642924.372296]  [<ffffffff81221720>] ? fillonedir+0xe0/0xe0
> [8642924.437963]  [<ffffffff816f2594>] system_call_fastpath+0x12/0xce
> [8642924.512201] ---[ end trace 6782707b09f9f873 ]---
> [8642924.569677] ------------[ cut here ]------------
> [8642924.627055] WARNING: CPU: 38 PID: 73638 at lib/list_debug.c:36 
> __list_add+0x9c/0xd0()
> [8642924.722925] list_add double add: new=ffff884c85a36288, 
> prev=ffff884c85a36288, next=ffff887e40fa5368.
> [8642924.834373] Modules linked in: ipt_MASQUERADE 
> nf_nat_masquerade_ipv4 xt_conntrack iptable_filter arc4 ecb ppp_mppe 
> ppp_async crc_ccitt ppp_generic slhc nfsv3 nfs_acl rpcsec_gss_krb5 
> auth_rpcgss nfsv4 nfs fscache lockd sunrpc grace veth xt_nat 
> xt_addrtype br_netfilter bridge tcp_diag inet_diag oracleacfs(POE) 
> oracleadvm(POE) oracleoks(POE) oracleasm autofs4 dm_queue_length 
> cpufreq_powersave be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i 
> iw_cxgb4 cxgb4 cxgb3i libcxgbi iw_cxgb3 cxgb3 ib_iser rdma_cm ib_cm 
> iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi 
> scsi_transport_iscsi 8021q mrp garp stp llc bonding iptable_nat 
> nf_nat_ipv4 nf_nat ipt_REJECT nf_reject_ipv4 xt_pkttype 
> nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables ip6t_REJECT nf_reject_ipv6 
> nf_conntrack_ipv6 nf_defrag_ipv6 xt_state
> [8642925.688498]  nf_conntrack ip6table_filter ip6_tables ipv6 
> dm_multipath sg iTCO_wdt iTCO_vendor_support ipmi_devintf ipmi_si 
> ipmi_msghandler intel_powerclamp coretemp kvm_intel kvm pcspkr ixgbe 
> dca ptp pps_core vxlan udp_tunnel ip6_udp_tunnel mdio sb_edac 
> edac_core i2c_i801 i2c_core lpc_ich mfd_core shpchp ext4 jbd2 mbcache2 
> btrfs raid6_pq xor sd_mod megaraid_sas ghash_clmulni_intel 
> crc32_pclmul crc32c_intel aesni_intel ablk_helper cryptd lrw gf128mul 
> glue_helper aes_x86_64 ahci libahci mxm_wmi wmi dm_mirror 
> dm_region_hash dm_log dm_mod [last unloaded: xt_conntrack]
> [8642926.279029] CPU: 38 PID: 73638 Comm: dockerd Tainted: P        W  
> OE   4.1.12-112.14.13.el6uek.x86_64 #2
> [8642926.394626] Hardware name: Oracle Corporation ORACLE SERVER 
> X6-2/ASM,MOTHERBOARD,1U, BIOS 38040100 06/06/2016
> [8642926.515421]  0000000000000000 ffff88130404fc28 ffffffff816ec91e 
> ffff88130404fc78
> [8642926.606485]  0000000000000024 ffff88130404fc68 ffffffff81088925 
> 00ff88130404fc98
> [8642926.697544]  ffff884c85a36288 ffff887e40fa5368 ffff884c85a36288 
> 0000000000000000
> [8642926.788605] Call Trace:
> [8642926.819945]  [<ffffffff816ec91e>] dump_stack+0x63/0x85
> [8642926.883535]  [<ffffffff81088925>] warn_slowpath_common+0x95/0xe0
> [8642926.957529]  [<ffffffff81088a26>] warn_slowpath_fmt+0x46/0x50
> [8642927.028400]  [<ffffffff81343f1c>] __list_add+0x9c/0xd0
> [8642927.092116]  [<ffffffffa01abb92>] 
> btrfs_queue_delayed_node+0xb2/0xe0 [btrfs]
> [8642927.178669]  [<ffffffffa01abc66>] 
> __btrfs_release_delayed_node+0xa6/0x160 [btrfs]
> [8642927.270440]  [<ffffffffa01aea5e>] 
> btrfs_delayed_update_inode+0x7e/0x100 [btrfs]
> [8642927.360112]  [<ffffffffa015b199>] btrfs_update_inode+0xa9/0x130 
> [btrfs]
> [8642927.441447]  [<ffffffffa015cd11>] btrfs_dirty_inode+0x71/0xe0 
> [btrfs]
> [8642927.520716]  [<ffffffffa015fa25>] btrfs_update_time+0x65/0xc0 
> [btrfs]
> [8642927.599910]  [<ffffffff812286d3>] touch_atime+0x123/0x170
> [8642927.666615]  [<ffffffff81221929>] iterate_dir+0xf9/0x140
> [8642927.732284]  [<ffffffff81221a19>] SyS_getdents64+0xa9/0x100
> [8642927.801072]  [<ffffffff81221720>] ? fillonedir+0xe0/0xe0
> [8642927.866741]  [<ffffffff816f2594>] system_call_fastpath+0x12/0xce
> [8642927.940843] ---[ end trace 6782707b09f9f874 ]---
> [8642927.998378] ------------[ cut here ]------------
>
>
>
>


* Re: btrfs panic problem
  2018-09-18  0:28 btrfs panic problem sunny.s.zhang
  2018-09-18  0:36 ` sunny.s.zhang
@ 2018-09-18  5:05 ` Duncan
  2018-09-18  9:13   ` sunny.s.zhang
  2018-09-18 23:53   ` sunny.s.zhang
  2018-09-19  0:05 ` Qu Wenruo
  2018-09-19 18:36 ` Liu Bo
  3 siblings, 2 replies; 14+ messages in thread
From: Duncan @ 2018-09-18  5:05 UTC (permalink / raw)
  To: linux-btrfs

sunny.s.zhang posted on Tue, 18 Sep 2018 08:28:14 +0800 as excerpted:

> My OS(4.1.12) panic in kmem_cache_alloc, which is called by
> btrfs_get_or_create_delayed_node.
> 
> I found that the freelist of the slub is wrong.

[Not a dev, just a btrfs list regular and user, myself.  But here's a 
general btrfs list recommendations reply...]

You appear to mean kernel 4.1.12 -- confirmed by the version reported in 
the posted dump:  4.1.12-112.14.13.el6uek.x86_64

OK, so from the perspective of this forward-development-focused list, 
kernel 4.1 is pretty ancient history, but you do have a number of options.

First let's consider the general situation.  Most people choose an 
enterprise distro for supported stability, and that's certainly a valid 
thing to want.  However, btrfs, while now reaching early maturity for the 
basics (single device in single or dup mode, and multi-device in single/
raid0/1/10 modes, note that raid56 mode is newer and less mature), 
remains under quite heavy development, and keeping reasonably current is 
recommended for that reason.

So you chose an enterprise distro presumably to lock in supported 
stability for several years, but you chose a filesystem, btrfs, that's 
still under heavy development, with reasonably current kernels and 
userspace recommended as tending to have the known bugs fixed.  There's a 
bit of a conflict there, and the /general/ recommendation would thus be 
to consider whether one or the other of those choices are inappropriate 
for your use-case, because it's really quite likely that if you really 
want the stability of an enterprise distro and kernel, that btrfs isn't 
as stable a filesystem as you're likely to want to match with it.  
Alternatively, if you want something newer to match the still under heavy 
development btrfs, you very likely want a distro that's not focused on 
years-old stability just for the sake of it.  One or the other is likely 
to be a poor match for your needs, and choosing something else that's a 
better match is likely to be a much better experience for you.

But perhaps you do have reason to want to run the newer and not quite to 
traditional enterprise-distro level stability btrfs, on an otherwise 
older and very stable enterprise distro.  That's fine, provided you know 
what you're getting yourself into, and are prepared to deal with it.

In that case, for best support from the list, we'd recommend running one 
of the latest two kernels in either the current or mainline LTS tracks. 

For the current track, with 4.18 being the latest kernel, that'd be 4.18 or 
4.17, as available on kernel.org (tho 4.17 is already EOL, no further 
releases, at 4.17.19).

For mainline-LTS track, 4.14 and 4.9 are the latest two LTS series 
kernels, tho IIRC 4.19 is scheduled to be this year's LTS (or was it 4.18 
and it's just not out of normal stable range yet so not yet marked LTS?), 
so it'll be coming up soon and 4.9 will then be dropping to third LTS 
series and thus out of our best recommended range.  4.4 was the previous 
LTS and while still in LTS support, is outside the two newest LTS series 
that this list recommends.

And of course 4.1 is older than 4.4, so as I said, in btrfs development 
terms, it's quite ancient indeed... quite out of practical support range 
here, tho of course we'll still try, but in many cases the first question 
when any problem's reported is going to be whether it's reproducible on 
something closer to current.

But... you ARE on an enterprise kernel, likely on an enterprise distro, 
and very possibly actually paying /them/ for support.  So you're not 
without options if you prefer to stay with your supported enterprise 
kernel.  If you're paying them for support, you might as well use it, and 
of course of the very many fixes since 4.1, they know what they've 
backported and what they haven't, so they're far better placed to provide 
that support in any case.

Or, given what you posted, you appear to be reasonably able to do at 
least limited kernel-dev-level analysis yourself.  Given that, you're 
already reasonably well placed to simply decide to stick with what you 
have and take the support you can get, diving into things yourself if 
necessary.


So those are your kernel options.  What about userspace btrfs-progs?

Generally speaking, while the filesystem's running, it's the kernel code 
doing most of the work.  If you have old userspace, it simply means you 
can't take advantage of some of the newer features as the old userspace 
doesn't know how to call for them.

But the situation changes as soon as you have problems and can't mount, 
because it's userspace code that runs to try to fix that sort of problem, 
or failing that, it's userspace code that btrfs restore runs to try to 
grab what files can be grabbed off of the unmountable filesystem.

So for routine operation, it's no big deal if userspace is a bit old, at 
least as long as it's new enough to have all the newer command formats, 
etc, that you need, and for comparing against others when posted.  But 
once things go bad on you, you really want the newest btrfs-progs in 
order to give you the best chance at either fixing things, or worst-
case, at least retrieving the files off the dead filesystem.  So using 
the older distro btrfs-progs for routine running should be fine, but 
unless your backups are complete and frequent enough that if something 
goes wrong it's easiest to simply blow the bad version away with a fresh 
mkfs and start over, you'll probably want at least a reasonably current 
btrfs-progs on your rescue media at least.  Since the userspace version 
numbers are synced to the kernel cycle, a good rule of thumb is to keep your 
btrfs-progs version at least as new as the oldest recommended LTS kernel 
version, as well, so you'd want at least btrfs-progs 4.9 on your rescue 
media, for now, and 4.14, coming up, since when the new kernel goes LTS 
that'll displace 4.9 and 4.14 will then be the second-back LTS.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


* Re: btrfs panic problem
  2018-09-18  5:05 ` Duncan
@ 2018-09-18  9:13   ` sunny.s.zhang
  2018-09-18 23:53   ` sunny.s.zhang
  1 sibling, 0 replies; 14+ messages in thread
From: sunny.s.zhang @ 2018-09-18  9:13 UTC (permalink / raw)
  To: Duncan, linux-btrfs

Add Junxiao


On 2018-09-18 13:05, Duncan wrote:
> sunny.s.zhang posted on Tue, 18 Sep 2018 08:28:14 +0800 as excerpted:
>
>> My OS(4.1.12) panic in kmem_cache_alloc, which is called by
>> btrfs_get_or_create_delayed_node.
>>
>> I found that the freelist of the slub is wrong.
> [Not a dev, just a btrfs list regular and user, myself.  But here's a
> general btrfs list recommendations reply...]
>
> You appear to mean kernel 4.1.12 -- confirmed by the version reported in
> the posted dump:  4.1.12-112.14.13.el6uek.x86_64
>
> OK, so from the perspective of this forward-development-focused list,
> kernel 4.1 is pretty ancient history, but you do have a number of options.
>
> First let's consider the general situation.  Most people choose an
> enterprise distro for supported stability, and that's certainly a valid
> thing to want.  However, btrfs, while now reaching early maturity for the
> basics (single device in single or dup mode, and multi-device in single/
> raid0/1/10 modes, note that raid56 mode is newer and less mature),
> remains under quite heavy development, and keeping reasonably current is
> recommended for that reason.
>
> So you you chose an enterprise distro presumably to lock in supported
> stability for several years, but you chose a filesystem, btrfs, that's
> still under heavy development, with reasonably current kernels and
> userspace recommended as tending to have the known bugs fixed.  There's a
> bit of a conflict there, and the /general/ recommendation would thus be
> to consider whether one or the other of those choices are inappropriate
> for your use-case, because it's really quite likely that if you really
> want the stability of an enterprise distro and kernel, that btrfs isn't
> as stable a filesystem as you're likely to want to match with it.
> Alternatively, if you want something newer to match the still under heavy
> development btrfs, you very likely want a distro that's not focused on
> years-old stability just for the sake of it.  One or the other is likely
> to be a poor match for your needs, and choosing something else that's a
> better match is likely to be a much better experience for you.
>
> But perhaps you do have reason to want to run the newer and not quite to
> traditional enterprise-distro level stability btrfs, on an otherwise
> older and very stable enterprise distro.  That's fine, provided you know
> what you're getting yourself into, and are prepared to deal with it.
>
> In that case, for best support from the list, we'd recommend running one
> of the latest two kernels in either the current or mainline LTS tracks.
>
> For current track, With 4.18 being the latest kernel, that'd be 4.18 or
> 4.17, as available on kernel.org (tho 4.17 is already EOL, no further
> releases, at 4.17.19).
>
> For mainline-LTS track, 4.14 and 4.9 are the latest two LTS series
> kernels, tho IIRC 4.19 is scheduled to be this year's LTS (or was it 4.18
> and it's just not out of normal stable range yet so not yet marked LTS?),
> so it'll be coming up soon and 4.9 will then be dropping to third LTS
> series and thus out of our best recommended range.  4.4 was the previous
> LTS and while still in LTS support, is outside the two newest LTS series
> that this list recommends.
>
> And of course 4.1 is older than 4.4, so as I said, in btrfs development
> terms, it's quite ancient indeed... quite out of practical support range
> here, tho of course we'll still try, but in many cases the first question
> when any problem's reported is going to be whether it's reproducible on
> something closer to current.
>
> But... you ARE on an enterprise kernel, likely on an enterprise distro,
> and very possibly actually paying /them/ for support.  So you're not
> without options if you prefer to stay with your supported enterprise
> kernel.  If you're paying them for support, you might as well use it, and
> of course of the very many fixes since 4.1, they know what they've
> backported and what they haven't, so they're far better placed to provide
> that support in any case.
>
> Or, given what you posted, you appear to be reasonably able to do at
> least limited kernel-dev-level analysis yourself.  Given that, you're
> already reasonably well placed to simply decide to stick with what you
> have and take the support you can get, diving into things yourself if
> necessary.
>
>
> So those are your kernel options.  What about userspace btrfs-progs?
>
> Generally speaking, while the filesystem's running, it's the kernel code
> doing most of the work.  If you have old userspace, it simply means you
> can't take advantage of some of the newer features as the old userspace
> doesn't know how to call for them.
>
> But the situation changes as soon as you have problems and can't mount,
> because it's userspace code that runs to try to fix that sort of problem,
> or failing that, it's userspace code that btrfs restore runs to try to
> grab what files can be grabbed off of the unmountable filesystem.
>
> So for routine operation, it's no big deal if userspace is a bit old, at
> least as long as it's new enough to have all the newer command formats,
> etc, that you need, and for comparing against others when posted.  But
> once things go bad on you, you really want the newest btrfs-progs in
> ordered to give you the best chance at either fixing things, or worst-
> case, at least retrieving the files off the dead filesystem.  So using
> the older distro btrfs-progs for routine running should be fine, but
> unless your backups are complete and frequent enough that if something
> goes wrong it's easiest to simply blow the bad version away with a fresh
> mkfs and start over, you'll probably want at least a reasonably current
> btrfs-progs on your rescue media at least.  Since the userspace version
> numbers are synced to the kernel cycle, a good rule of thumb is keep your
> btrfs-progs version to at least that of the oldest recommended LTS kernel
> version, as well, so you'd want at least btrfs-progs 4.9 on your rescue
> media, for now, and 4.14, coming up, since when the new kernel goes LTS
> that'll displace 4.9 and 4.14 will then be the second-back LTS.
>


* Re: btrfs panic problem
  2018-09-18  5:05 ` Duncan
  2018-09-18  9:13   ` sunny.s.zhang
@ 2018-09-18 23:53   ` sunny.s.zhang
  2018-09-19 16:12     ` Nikolay Borisov
  1 sibling, 1 reply; 14+ messages in thread
From: sunny.s.zhang @ 2018-09-18 23:53 UTC (permalink / raw)
  To: Duncan, linux-btrfs

Hi Duncan,

Thank you for your advice. I understand what you mean.  But I have
reviewed the latest btrfs code, and I think the issue still exists.

At line 71, if btrfs_get_delayed_node runs past this line and the CPU then
switches to another process, that process can run past line 1282 and
release the delayed node at the end.

When we switch back to btrfs_get_delayed_node, it finds that the node is
not NULL and uses it as normal. That means we are using freed memory.

At some point, this memory will be freed again.

The latest code is below.

1278 void btrfs_remove_delayed_node(struct btrfs_inode *inode)
1279 {
1280         struct btrfs_delayed_node *delayed_node;
1281
1282         delayed_node = READ_ONCE(inode->delayed_node);
1283         if (!delayed_node)
1284                 return;
1285
1286         inode->delayed_node = NULL;
1287         btrfs_release_delayed_node(delayed_node);
1288 }


   64 static struct btrfs_delayed_node *btrfs_get_delayed_node(
   65                 struct btrfs_inode *btrfs_inode)
   66 {
   67         struct btrfs_root *root = btrfs_inode->root;
   68         u64 ino = btrfs_ino(btrfs_inode);
   69         struct btrfs_delayed_node *node;
   70
   71         node = READ_ONCE(btrfs_inode->delayed_node);
   72         if (node) {
   73                 refcount_inc(&node->refs);
   74                 return node;
   75         }
   76
   77         spin_lock(&root->inode_lock);
   78         node = radix_tree_lookup(&root->delayed_nodes_tree, ino);
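
To spell out the window described above against the lines just quoted, the
suspected interleaving would look like this (an illustration of the
scenario, not a captured trace):

btrfs_get_delayed_node()                        btrfs_remove_delayed_node()

line 71: node = READ_ONCE(...); /* non-NULL */
                                                line 1282: delayed_node = READ_ONCE(...);
                                                line 1286: inode->delayed_node = NULL;
                                                line 1287: btrfs_release_delayed_node();
                                                           /* last reference dropped, node freed */
line 73: refcount_inc(&node->refs);  /* increments a refcount inside freed memory */
line 74: return node;                /* the caller keeps using the freed node */
......
btrfs_release_delayed_node(node);    /* the same object is freed a second time */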


On 2018-09-18 13:05, Duncan wrote:
> sunny.s.zhang posted on Tue, 18 Sep 2018 08:28:14 +0800 as excerpted:
>
>> My OS(4.1.12) panic in kmem_cache_alloc, which is called by
>> btrfs_get_or_create_delayed_node.
>>
>> I found that the freelist of the slub is wrong.
> [Not a dev, just a btrfs list regular and user, myself.  But here's a
> general btrfs list recommendations reply...]
>
> You appear to mean kernel 4.1.12 -- confirmed by the version reported in
> the posted dump:  4.1.12-112.14.13.el6uek.x86_64
>
> OK, so from the perspective of this forward-development-focused list,
> kernel 4.1 is pretty ancient history, but you do have a number of options.
>
> First let's consider the general situation.  Most people choose an
> enterprise distro for supported stability, and that's certainly a valid
> thing to want.  However, btrfs, while now reaching early maturity for the
> basics (single device in single or dup mode, and multi-device in single/
> raid0/1/10 modes, note that raid56 mode is newer and less mature),
> remains under quite heavy development, and keeping reasonably current is
> recommended for that reason.
>
> So you you chose an enterprise distro presumably to lock in supported
> stability for several years, but you chose a filesystem, btrfs, that's
> still under heavy development, with reasonably current kernels and
> userspace recommended as tending to have the known bugs fixed.  There's a
> bit of a conflict there, and the /general/ recommendation would thus be
> to consider whether one or the other of those choices are inappropriate
> for your use-case, because it's really quite likely that if you really
> want the stability of an enterprise distro and kernel, that btrfs isn't
> as stable a filesystem as you're likely to want to match with it.
> Alternatively, if you want something newer to match the still under heavy
> development btrfs, you very likely want a distro that's not focused on
> years-old stability just for the sake of it.  One or the other is likely
> to be a poor match for your needs, and choosing something else that's a
> better match is likely to be a much better experience for you.
>
> But perhaps you do have reason to want to run the newer and not quite to
> traditional enterprise-distro level stability btrfs, on an otherwise
> older and very stable enterprise distro.  That's fine, provided you know
> what you're getting yourself into, and are prepared to deal with it.
>
> In that case, for best support from the list, we'd recommend running one
> of the latest two kernels in either the current or mainline LTS tracks.
>
> For current track, With 4.18 being the latest kernel, that'd be 4.18 or
> 4.17, as available on kernel.org (tho 4.17 is already EOL, no further
> releases, at 4.17.19).
>
> For mainline-LTS track, 4.14 and 4.9 are the latest two LTS series
> kernels, tho IIRC 4.19 is scheduled to be this year's LTS (or was it 4.18
> and it's just not out of normal stable range yet so not yet marked LTS?),
> so it'll be coming up soon and 4.9 will then be dropping to third LTS
> series and thus out of our best recommended range.  4.4 was the previous
> LTS and while still in LTS support, is outside the two newest LTS series
> that this list recommends.
>
> And of course 4.1 is older than 4.4, so as I said, in btrfs development
> terms, it's quite ancient indeed... quite out of practical support range
> here, tho of course we'll still try, but in many cases the first question
> when any problem's reported is going to be whether it's reproducible on
> something closer to current.
>
> But... you ARE on an enterprise kernel, likely on an enterprise distro,
> and very possibly actually paying /them/ for support.  So you're not
> without options if you prefer to stay with your supported enterprise
> kernel.  If you're paying them for support, you might as well use it, and
> of course of the very many fixes since 4.1, they know what they've
> backported and what they haven't, so they're far better placed to provide
> that support in any case.
>
> Or, given what you posted, you appear to be reasonably able to do at
> least limited kernel-dev-level analysis yourself.  Given that, you're
> already reasonably well placed to simply decide to stick with what you
> have and take the support you can get, diving into things yourself if
> necessary.
>
>
> So those are your kernel options.  What about userspace btrfs-progs?
>
> Generally speaking, while the filesystem's running, it's the kernel code
> doing most of the work.  If you have old userspace, it simply means you
> can't take advantage of some of the newer features as the old userspace
> doesn't know how to call for them.
>
> But the situation changes as soon as you have problems and can't mount,
> because it's userspace code that runs to try to fix that sort of problem,
> or failing that, it's userspace code that btrfs restore runs to try to
> grab what files can be grabbed off of the unmountable filesystem.
>
> So for routine operation, it's no big deal if userspace is a bit old, at
> least as long as it's new enough to have all the newer command formats,
> etc, that you need, and for comparing against others when posted.  But
> once things go bad on you, you really want the newest btrfs-progs in
> ordered to give you the best chance at either fixing things, or worst-
> case, at least retrieving the files off the dead filesystem.  So using
> the older distro btrfs-progs for routine running should be fine, but
> unless your backups are complete and frequent enough that if something
> goes wrong it's easiest to simply blow the bad version away with a fresh
> mkfs and start over, you'll probably want at least a reasonably current
> btrfs-progs on your rescue media at least.  Since the userspace version
> numbers are synced to the kernel cycle, a good rule of thumb is keep your
> btrfs-progs version to at least that of the oldest recommended LTS kernel
> version, as well, so you'd want at least btrfs-progs 4.9 on your rescue
> media, for now, and 4.14, coming up, since when the new kernel goes LTS
> that'll displace 4.9 and 4.14 will then be the second-back LTS.
>


* Re: btrfs panic problem
  2018-09-18  0:28 btrfs panic problem sunny.s.zhang
  2018-09-18  0:36 ` sunny.s.zhang
  2018-09-18  5:05 ` Duncan
@ 2018-09-19  0:05 ` Qu Wenruo
  2018-09-19  0:35   ` sunny.s.zhang
  2018-09-19 18:36 ` Liu Bo
  3 siblings, 1 reply; 14+ messages in thread
From: Qu Wenruo @ 2018-09-19  0:05 UTC (permalink / raw)
  To: sunny.s.zhang, linux-btrfs





On 2018/9/18 8:28 AM, sunny.s.zhang wrote:
> Hi All,
> 
> My OS(4.1.12) panic in kmem_cache_alloc, which is called by
> btrfs_get_or_create_delayed_node.

Any reproducer?

Anyway we need a reproducer as a testcase.

The code looks

> 
> I found that the freelist of the slub is wrong.
> 
> crash> struct kmem_cache_cpu ffff887e7d7a24b0
> 
> struct kmem_cache_cpu {
>   freelist = 0x2026,   <<< the value is id of one inode
>   tid = 29567861,
>   page = 0xffffea0132168d00,
>   partial = 0x0
> }
> 
> And, I found there are two different btrfs inodes pointing delayed_node.
> It means that the same slub is used twice.
> 
> I think this slub is freed twice, and then the next pointer of this slub
> point itself. So we get the same slub twice.
> 
> When use this slub again, that break the freelist.
> 
> Folloing code will make the delayed node being freed twice. But I don't
> found what is the process.
> 
> Process A (btrfs_evict_inode) Process B
> 
> call btrfs_remove_delayed_node call  btrfs_get_delayed_node
> 
> node = ACCESS_ONCE(btrfs_inode->delayed_node);
> 
> BTRFS_I(inode)->delayed_node = NULL;
> btrfs_release_delayed_node(delayed_node);
> 
> if (node) {
> atomic_inc(&node->refs);
> return node;
> }
> 
> ......
> 
> btrfs_release_delayed_node(delayed_node);
> 
> 
> 1313 void btrfs_remove_delayed_node(struct inode *inode)
> 1314 {
> 1315         struct btrfs_delayed_node *delayed_node;
> 1316
> 1317         delayed_node = ACCESS_ONCE(BTRFS_I(inode)->delayed_node);
> 1318         if (!delayed_node)
> 1319                 return;
> 1320
> 1321         BTRFS_I(inode)->delayed_node = NULL;
> 1322         btrfs_release_delayed_node(delayed_node);
> 1323 }
> 
> 
>   87 static struct btrfs_delayed_node *btrfs_get_delayed_node(struct
> inode *inode)
>   88 {
>   89         struct btrfs_inode *btrfs_inode = BTRFS_I(inode);
>   90         struct btrfs_root *root = btrfs_inode->root;
>   91         u64 ino = btrfs_ino(inode);
>   92         struct btrfs_delayed_node *node;
>   93
>   94         node = ACCESS_ONCE(btrfs_inode->delayed_node);
>   95         if (node) {
>   96                 atomic_inc(&node->refs);
>   97                 return node;
>   98         }
> 

The analysis looks valid.
Can be fixed by adding a spinlock.

Just wondering why we didn't hit it.
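
For illustration, a rough sketch of what such a spinlock-based change might
look like, reusing the existing root->inode_lock and the function shapes
quoted earlier in the thread (untested and only meant to show the idea, not
an actual patch):

static struct btrfs_delayed_node *btrfs_get_delayed_node(
                struct btrfs_inode *btrfs_inode)
{
        struct btrfs_root *root = btrfs_inode->root;
        struct btrfs_delayed_node *node;

        spin_lock(&root->inode_lock);
        node = btrfs_inode->delayed_node;
        if (node)
                refcount_inc(&node->refs);   /* ref taken before removal can clear the field */
        spin_unlock(&root->inode_lock);
        if (node)
                return node;
        ...

void btrfs_remove_delayed_node(struct btrfs_inode *inode)
{
        struct btrfs_root *root = inode->root;
        struct btrfs_delayed_node *delayed_node;

        spin_lock(&root->inode_lock);
        delayed_node = inode->delayed_node;
        inode->delayed_node = NULL;
        spin_unlock(&root->inode_lock);

        if (delayed_node)
                btrfs_release_delayed_node(delayed_node);
}

With the pointer read and the refcount_inc done under the same lock that
btrfs_remove_delayed_node takes to clear the pointer, a lookup can no longer
take a reference on a node whose last reference is already being dropped.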

Thanks,
Qu

> 
> Thanks,
> 
> Sunny
> 
> 
> PS:
> 
> ------------------------------------------------
> 
> panic informations
> 
> PID: 73638  TASK: ffff887deb586200  CPU: 38  COMMAND: "dockerd"
>  #0 [ffff88130404f940] machine_kexec at ffffffff8105ec10
>  #1 [ffff88130404f9b0] crash_kexec at ffffffff811145b8
>  #2 [ffff88130404fa80] oops_end at ffffffff8101a868
>  #3 [ffff88130404fab0] no_context at ffffffff8106ea91
>  #4 [ffff88130404fb00] __bad_area_nosemaphore at ffffffff8106ec8d
>  #5 [ffff88130404fb50] bad_area_nosemaphore at ffffffff8106eda3
>  #6 [ffff88130404fb60] __do_page_fault at ffffffff8106f328
>  #7 [ffff88130404fbd0] do_page_fault at ffffffff8106f637
>  #8 [ffff88130404fc10] page_fault at ffffffff816f6308
>     [exception RIP: kmem_cache_alloc+121]
>     RIP: ffffffff811ef019  RSP: ffff88130404fcc8  RFLAGS: 00010286
>     RAX: 0000000000000000  RBX: 0000000000000000  RCX: 0000000001c32b76
>     RDX: 0000000001c32b75  RSI: 0000000000000000  RDI: 00000000000224b0
>     RBP: ffff88130404fd08   R8: ffff887e7d7a24b0   R9: 0000000000000000
>     R10: ffff8802668b6618  R11: 0000000000000002  R12: ffff887e3e230a00
>     R13: 0000000000002026  R14: ffff887e3e230a00  R15: ffffffffa01abf49
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>  #9 [ffff88130404fd10] btrfs_get_or_create_delayed_node at
> ffffffffa01abf49 [btrfs]
> #10 [ffff88130404fd60] btrfs_delayed_update_inode at ffffffffa01aea12
> [btrfs]
> #11 [ffff88130404fdb0] btrfs_update_inode at ffffffffa015b199 [btrfs]
> #12 [ffff88130404fdf0] btrfs_dirty_inode at ffffffffa015cd11 [btrfs]
> #13 [ffff88130404fe20] btrfs_update_time at ffffffffa015fa25 [btrfs]
> #14 [ffff88130404fe50] touch_atime at ffffffff812286d3
> #15 [ffff88130404fe90] iterate_dir at ffffffff81221929
> #16 [ffff88130404fee0] sys_getdents64 at ffffffff81221a19
> #17 [ffff88130404ff50] system_call_fastpath at ffffffff816f2594
>     RIP: 00000000006b68e4  RSP: 000000c866259080  RFLAGS: 00000246
>     RAX: ffffffffffffffda  RBX: 000000c828dbbe00  RCX: 00000000006b68e4
>     RDX: 0000000000001000  RSI: 000000c83da14000  RDI: 0000000000000011
>     RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000000
>     R10: 0000000000000000  R11: 0000000000000246  R12: 00000000000000c7
>     R13: 0000000002174e74  R14: 0000000005555555  R15: 0000000000000038
>     ORIG_RAX: 00000000000000d9  CS: 0033  SS: 002b
> 
> 
> We also find the list double add informations, including n_list and p_list:
> 
> [8642921.110568] ------------[ cut here ]------------
> [8642921.167929] WARNING: CPU: 38 PID: 73638 at lib/list_debug.c:33
> __list_add+0xbe/0xd0()
> [8642921.263780] list_add corruption. prev->next should be next
> (ffff887e40fa5368), but was ff:ff884c85a36288. (prev=ffff884c85a36288).
> [8642921.405490] Modules linked in: ipt_MASQUERADE
> nf_nat_masquerade_ipv4 xt_conntrack iptable_filter arc4 ecb ppp_mppe
> ppp_async crc_ccitt ppp_generic slhc nfsv3 nfs_acl rpcsec_gss_krb5
> auth_rpcgss nfsv4 nfs fscache lockd sunrpc grace veth xt_nat xt_addrtype
> br_netfilter bridge tcp_diag inet_diag oracleacfs(POE) oracleadvm(POE)
> oracleoks(POE) oracleasm autofs4 dm_queue_length cpufreq_powersave
> be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i iw_cxgb4 cxgb4 cxgb3i
> libcxgbi iw_cxgb3 cxgb3 ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core
> ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 8021q mrp
> garp stp llc bonding iptable_nat nf_nat_ipv4 nf_nat ipt_REJECT
> nf_reject_ipv4 xt_pkttype nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables
> ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state
> [8642922.259780]  nf_conntrack ip6table_filter ip6_tables ipv6
> dm_multipath sg iTCO_wdt iTCO_vendor_support ipmi_devintf ipmi_si
> ipmi_msghandler intel_powerclamp coretemp kvm_intel kvm pcspkr ixgbe dca
> ptp pps_core vxlan udp_tunnel ip6_udp_tunnel mdio sb_edac edac_core
> i2c_i801 i2c_core lpc_ich mfd_core shpchp ext4 jbd2 mbcache2 btrfs
> raid6_pq xor sd_mod megaraid_sas ghash_clmulni_intel crc32_pclmul
> crc32c_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper
> aes_x86_64 ahci libahci mxm_wmi wmi dm_mirror dm_region_hash dm_log
> dm_mod [last unloaded: xt_conntrack]
> [8642922.850285] CPU: 38 PID: 73638 Comm: dockerd Tainted: P          
> OE   4.1.12-112.14.13.el6uek.x86_64 #2
> [8642922.965881] Hardware name: Oracle Corporation ORACLE SERVER
> X6-2/ASM,MOTHERBOARD,1U, BIOS 38040100 06/06/2016
> [8642923.086670]  0000000000000000 ffff88130404fc28 ffffffff816ec91e
> ffff88130404fc78
> [8642923.177730]  0000000000000021 ffff88130404fc68 ffffffff81088925
> 00ff88130404fc98
> [8642923.268793]  ffff884c85a36288 ffff887e40fa5368 ffff884c85a36288
> 0000000000000000
> [8642923.359851] Call Trace:
> [8642923.391194]  [<ffffffff816ec91e>] dump_stack+0x63/0x85
> [8642923.454780]  [<ffffffff81088925>] warn_slowpath_common+0x95/0xe0
> [8642923.528769]  [<ffffffff81088a26>] warn_slowpath_fmt+0x46/0x50
> [8642923.599635]  [<ffffffff81343f3e>] __list_add+0xbe/0xd0
> [8642923.663333]  [<ffffffffa01abb92>]
> btrfs_queue_delayed_node+0xb2/0xe0 [btrfs]
> [8642923.749898]  [<ffffffffa01abc66>]
> __btrfs_release_delayed_node+0xa6/0x160 [btrfs]
> [8642923.841661]  [<ffffffffa01aea5e>]
> btrfs_delayed_update_inode+0x7e/0x100 [btrfs]
> [8642923.931336]  [<ffffffffa015b199>] btrfs_update_inode+0xa9/0x130
> [btrfs]
> [8642924.012686]  [<ffffffffa015cd11>] btrfs_dirty_inode+0x71/0xe0 [btrfs]
> [8642924.091954]  [<ffffffffa015fa25>] btrfs_update_time+0x65/0xc0 [btrfs]
> [8642924.171145]  [<ffffffff812286d3>] touch_atime+0x123/0x170
> [8642924.237849]  [<ffffffff81221929>] iterate_dir+0xf9/0x140
> [8642924.303512]  [<ffffffff81221a19>] SyS_getdents64+0xa9/0x100
> [8642924.372296]  [<ffffffff81221720>] ? fillonedir+0xe0/0xe0
> [8642924.437963]  [<ffffffff816f2594>] system_call_fastpath+0x12/0xce
> [8642924.512201] ---[ end trace 6782707b09f9f873 ]---
> [8642924.569677] ------------[ cut here ]------------
> [8642924.627055] WARNING: CPU: 38 PID: 73638 at lib/list_debug.c:36
> __list_add+0x9c/0xd0()
> [8642924.722925] list_add double add: new=ffff884c85a36288,
> prev=ffff884c85a36288, next=ffff887e40fa5368.
> [8642924.834373] Modules linked in: ipt_MASQUERADE
> nf_nat_masquerade_ipv4 xt_conntrack iptable_filter arc4 ecb ppp_mppe
> ppp_async crc_ccitt ppp_generic slhc nfsv3 nfs_acl rpcsec_gss_krb5
> auth_rpcgss nfsv4 nfs fscache lockd sunrpc grace veth xt_nat xt_addrtype
> br_netfilter bridge tcp_diag inet_diag oracleacfs(POE) oracleadvm(POE)
> oracleoks(POE) oracleasm autofs4 dm_queue_length cpufreq_powersave
> be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i iw_cxgb4 cxgb4 cxgb3i
> libcxgbi iw_cxgb3 cxgb3 ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core
> ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 8021q mrp
> garp stp llc bonding iptable_nat nf_nat_ipv4 nf_nat ipt_REJECT
> nf_reject_ipv4 xt_pkttype nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables
> ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state
> [8642925.688498]  nf_conntrack ip6table_filter ip6_tables ipv6
> dm_multipath sg iTCO_wdt iTCO_vendor_support ipmi_devintf ipmi_si
> ipmi_msghandler intel_powerclamp coretemp kvm_intel kvm pcspkr ixgbe dca
> ptp pps_core vxlan udp_tunnel ip6_udp_tunnel mdio sb_edac edac_core
> i2c_i801 i2c_core lpc_ich mfd_core shpchp ext4 jbd2 mbcache2 btrfs
> raid6_pq xor sd_mod megaraid_sas ghash_clmulni_intel crc32_pclmul
> crc32c_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper
> aes_x86_64 ahci libahci mxm_wmi wmi dm_mirror dm_region_hash dm_log
> dm_mod [last unloaded: xt_conntrack]
> [8642926.279029] CPU: 38 PID: 73638 Comm: dockerd Tainted: P        W 
> OE   4.1.12-112.14.13.el6uek.x86_64 #2
> [8642926.394626] Hardware name: Oracle Corporation ORACLE SERVER
> X6-2/ASM,MOTHERBOARD,1U, BIOS 38040100 06/06/2016
> [8642926.515421]  0000000000000000 ffff88130404fc28 ffffffff816ec91e
> ffff88130404fc78
> [8642926.606485]  0000000000000024 ffff88130404fc68 ffffffff81088925
> 00ff88130404fc98
> [8642926.697544]  ffff884c85a36288 ffff887e40fa5368 ffff884c85a36288
> 0000000000000000
> [8642926.788605] Call Trace:
> [8642926.819945]  [<ffffffff816ec91e>] dump_stack+0x63/0x85
> [8642926.883535]  [<ffffffff81088925>] warn_slowpath_common+0x95/0xe0
> [8642926.957529]  [<ffffffff81088a26>] warn_slowpath_fmt+0x46/0x50
> [8642927.028400]  [<ffffffff81343f1c>] __list_add+0x9c/0xd0
> [8642927.092116]  [<ffffffffa01abb92>]
> btrfs_queue_delayed_node+0xb2/0xe0 [btrfs]
> [8642927.178669]  [<ffffffffa01abc66>]
> __btrfs_release_delayed_node+0xa6/0x160 [btrfs]
> [8642927.270440]  [<ffffffffa01aea5e>]
> btrfs_delayed_update_inode+0x7e/0x100 [btrfs]
> [8642927.360112]  [<ffffffffa015b199>] btrfs_update_inode+0xa9/0x130
> [btrfs]
> [8642927.441447]  [<ffffffffa015cd11>] btrfs_dirty_inode+0x71/0xe0 [btrfs]
> [8642927.520716]  [<ffffffffa015fa25>] btrfs_update_time+0x65/0xc0 [btrfs]
> [8642927.599910]  [<ffffffff812286d3>] touch_atime+0x123/0x170
> [8642927.666615]  [<ffffffff81221929>] iterate_dir+0xf9/0x140
> [8642927.732284]  [<ffffffff81221a19>] SyS_getdents64+0xa9/0x100
> [8642927.801072]  [<ffffffff81221720>] ? fillonedir+0xe0/0xe0
> [8642927.866741]  [<ffffffff816f2594>] system_call_fastpath+0x12/0xce
> [8642927.940843] ---[ end trace 6782707b09f9f874 ]---
> [8642927.998378] ------------[ cut here ]------------
> 
> 
> 
> 



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: btrfs panic problem
  2018-09-19  0:05 ` Qu Wenruo
@ 2018-09-19  0:35   ` sunny.s.zhang
  2018-09-19  2:27     ` Qu Wenruo
  0 siblings, 1 reply; 14+ messages in thread
From: sunny.s.zhang @ 2018-09-19  0:35 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: Junxiao Bi


On 2018-09-19 08:05, Qu Wenruo wrote:
>
> On 2018/9/18 8:28 AM, sunny.s.zhang wrote:
>> Hi All,
>>
>> My OS(4.1.12) panic in kmem_cache_alloc, which is called by
>> btrfs_get_or_create_delayed_node.
> Any reproducer?
>
> Anyway we need a reproducer as a testcase.

I have tried, but I could not reproduce it yet.

Any advice on how to reproduce it?

>
> The code looks
>
>> I found that the freelist of the slub is wrong.
>>
>> crash> struct kmem_cache_cpu ffff887e7d7a24b0
>>
>> struct kmem_cache_cpu {
>>    freelist = 0x2026,   <<< the value is id of one inode
>>    tid = 29567861,
>>    page = 0xffffea0132168d00,
>>    partial = 0x0
>> }
>>
>> And, I found there are two different btrfs inodes pointing delayed_node.
>> It means that the same slub is used twice.
>>
>> I think this slub is freed twice, and then the next pointer of this slub
>> point itself. So we get the same slub twice.
>>
>> When use this slub again, that break the freelist.
>>
>> Folloing code will make the delayed node being freed twice. But I don't
>> found what is the process.
>>
>> Process A (btrfs_evict_inode) Process B
>>
>> call btrfs_remove_delayed_node call  btrfs_get_delayed_node
>>
>> node = ACCESS_ONCE(btrfs_inode->delayed_node);
>>
>> BTRFS_I(inode)->delayed_node = NULL;
>> btrfs_release_delayed_node(delayed_node);
>>
>> if (node) {
>> atomic_inc(&node->refs);
>> return node;
>> }
>>
>> ......
>>
>> btrfs_release_delayed_node(delayed_node);
>>
>>
>> 1313 void btrfs_remove_delayed_node(struct inode *inode)
>> 1314 {
>> 1315         struct btrfs_delayed_node *delayed_node;
>> 1316
>> 1317         delayed_node = ACCESS_ONCE(BTRFS_I(inode)->delayed_node);
>> 1318         if (!delayed_node)
>> 1319                 return;
>> 1320
>> 1321         BTRFS_I(inode)->delayed_node = NULL;
>> 1322         btrfs_release_delayed_node(delayed_node);
>> 1323 }
>>
>>
>>    87 static struct btrfs_delayed_node *btrfs_get_delayed_node(struct
>> inode *inode)
>>    88 {
>>    89         struct btrfs_inode *btrfs_inode = BTRFS_I(inode);
>>    90         struct btrfs_root *root = btrfs_inode->root;
>>    91         u64 ino = btrfs_ino(inode);
>>    92         struct btrfs_delayed_node *node;
>>    93
>>    94         node = ACCESS_ONCE(btrfs_inode->delayed_node);
>>    95         if (node) {
>>    96                 atomic_inc(&node->refs);
>>    97                 return node;
>>    98         }
>>
> The analyse looks valid.
> Can be fixed by adding a spinlock.
>
> Just wondering why we didn't hit it.

It just appeared once in our production environment.

Thanks,
Sunny
>
> Thanks,
> Qu
>
>> Thanks,
>>
>> Sunny
>>
>>
>> PS:
>>
>> ------------------------------------------------
>>
>> panic informations
>>
>> PID: 73638  TASK: ffff887deb586200  CPU: 38  COMMAND: "dockerd"
>>   #0 [ffff88130404f940] machine_kexec at ffffffff8105ec10
>>   #1 [ffff88130404f9b0] crash_kexec at ffffffff811145b8
>>   #2 [ffff88130404fa80] oops_end at ffffffff8101a868
>>   #3 [ffff88130404fab0] no_context at ffffffff8106ea91
>>   #4 [ffff88130404fb00] __bad_area_nosemaphore at ffffffff8106ec8d
>>   #5 [ffff88130404fb50] bad_area_nosemaphore at ffffffff8106eda3
>>   #6 [ffff88130404fb60] __do_page_fault at ffffffff8106f328
>>   #7 [ffff88130404fbd0] do_page_fault at ffffffff8106f637
>>   #8 [ffff88130404fc10] page_fault at ffffffff816f6308
>>      [exception RIP: kmem_cache_alloc+121]
>>      RIP: ffffffff811ef019  RSP: ffff88130404fcc8  RFLAGS: 00010286
>>      RAX: 0000000000000000  RBX: 0000000000000000  RCX: 0000000001c32b76
>>      RDX: 0000000001c32b75  RSI: 0000000000000000  RDI: 00000000000224b0
>>      RBP: ffff88130404fd08   R8: ffff887e7d7a24b0   R9: 0000000000000000
>>      R10: ffff8802668b6618  R11: 0000000000000002  R12: ffff887e3e230a00
>>      R13: 0000000000002026  R14: ffff887e3e230a00  R15: ffffffffa01abf49
>>      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>>   #9 [ffff88130404fd10] btrfs_get_or_create_delayed_node at
>> ffffffffa01abf49 [btrfs]
>> #10 [ffff88130404fd60] btrfs_delayed_update_inode at ffffffffa01aea12
>> [btrfs]
>> #11 [ffff88130404fdb0] btrfs_update_inode at ffffffffa015b199 [btrfs]
>> #12 [ffff88130404fdf0] btrfs_dirty_inode at ffffffffa015cd11 [btrfs]
>> #13 [ffff88130404fe20] btrfs_update_time at ffffffffa015fa25 [btrfs]
>> #14 [ffff88130404fe50] touch_atime at ffffffff812286d3
>> #15 [ffff88130404fe90] iterate_dir at ffffffff81221929
>> #16 [ffff88130404fee0] sys_getdents64 at ffffffff81221a19
>> #17 [ffff88130404ff50] system_call_fastpath at ffffffff816f2594
>>      RIP: 00000000006b68e4  RSP: 000000c866259080  RFLAGS: 00000246
>>      RAX: ffffffffffffffda  RBX: 000000c828dbbe00  RCX: 00000000006b68e4
>>      RDX: 0000000000001000  RSI: 000000c83da14000  RDI: 0000000000000011
>>      RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000000
>>      R10: 0000000000000000  R11: 0000000000000246  R12: 00000000000000c7
>>      R13: 0000000002174e74  R14: 0000000005555555  R15: 0000000000000038
>>      ORIG_RAX: 00000000000000d9  CS: 0033  SS: 002b
>>
>>
>> We also find the list double add informations, including n_list and p_list:
>>
>> [8642921.110568] ------------[ cut here ]------------
>> [8642921.167929] WARNING: CPU: 38 PID: 73638 at lib/list_debug.c:33
>> __list_add+0xbe/0xd0()
>> [8642921.263780] list_add corruption. prev->next should be next
>> (ffff887e40fa5368), but was ff:ff884c85a36288. (prev=ffff884c85a36288).
>> [8642921.405490] Modules linked in: ipt_MASQUERADE
>> nf_nat_masquerade_ipv4 xt_conntrack iptable_filter arc4 ecb ppp_mppe
>> ppp_async crc_ccitt ppp_generic slhc nfsv3 nfs_acl rpcsec_gss_krb5
>> auth_rpcgss nfsv4 nfs fscache lockd sunrpc grace veth xt_nat xt_addrtype
>> br_netfilter bridge tcp_diag inet_diag oracleacfs(POE) oracleadvm(POE)
>> oracleoks(POE) oracleasm autofs4 dm_queue_length cpufreq_powersave
>> be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i iw_cxgb4 cxgb4 cxgb3i
>> libcxgbi iw_cxgb3 cxgb3 ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core
>> ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 8021q mrp
>> garp stp llc bonding iptable_nat nf_nat_ipv4 nf_nat ipt_REJECT
>> nf_reject_ipv4 xt_pkttype nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables
>> ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state
>> [8642922.259780]  nf_conntrack ip6table_filter ip6_tables ipv6
>> dm_multipath sg iTCO_wdt iTCO_vendor_support ipmi_devintf ipmi_si
>> ipmi_msghandler intel_powerclamp coretemp kvm_intel kvm pcspkr ixgbe dca
>> ptp pps_core vxlan udp_tunnel ip6_udp_tunnel mdio sb_edac edac_core
>> i2c_i801 i2c_core lpc_ich mfd_core shpchp ext4 jbd2 mbcache2 btrfs
>> raid6_pq xor sd_mod megaraid_sas ghash_clmulni_intel crc32_pclmul
>> crc32c_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper
>> aes_x86_64 ahci libahci mxm_wmi wmi dm_mirror dm_region_hash dm_log
>> dm_mod [last unloaded: xt_conntrack]
>> [8642922.850285] CPU: 38 PID: 73638 Comm: dockerd Tainted: P
>> OE   4.1.12-112.14.13.el6uek.x86_64 #2
>> [8642922.965881] Hardware name: Oracle Corporation ORACLE SERVER
>> X6-2/ASM,MOTHERBOARD,1U, BIOS 38040100 06/06/2016
>> [8642923.086670]  0000000000000000 ffff88130404fc28 ffffffff816ec91e
>> ffff88130404fc78
>> [8642923.177730]  0000000000000021 ffff88130404fc68 ffffffff81088925
>> 00ff88130404fc98
>> [8642923.268793]  ffff884c85a36288 ffff887e40fa5368 ffff884c85a36288
>> 0000000000000000
>> [8642923.359851] Call Trace:
>> [8642923.391194]  [<ffffffff816ec91e>] dump_stack+0x63/0x85
>> [8642923.454780]  [<ffffffff81088925>] warn_slowpath_common+0x95/0xe0
>> [8642923.528769]  [<ffffffff81088a26>] warn_slowpath_fmt+0x46/0x50
>> [8642923.599635]  [<ffffffff81343f3e>] __list_add+0xbe/0xd0
>> [8642923.663333]  [<ffffffffa01abb92>]
>> btrfs_queue_delayed_node+0xb2/0xe0 [btrfs]
>> [8642923.749898]  [<ffffffffa01abc66>]
>> __btrfs_release_delayed_node+0xa6/0x160 [btrfs]
>> [8642923.841661]  [<ffffffffa01aea5e>]
>> btrfs_delayed_update_inode+0x7e/0x100 [btrfs]
>> [8642923.931336]  [<ffffffffa015b199>] btrfs_update_inode+0xa9/0x130
>> [btrfs]
>> [8642924.012686]  [<ffffffffa015cd11>] btrfs_dirty_inode+0x71/0xe0 [btrfs]
>> [8642924.091954]  [<ffffffffa015fa25>] btrfs_update_time+0x65/0xc0 [btrfs]
>> [8642924.171145]  [<ffffffff812286d3>] touch_atime+0x123/0x170
>> [8642924.237849]  [<ffffffff81221929>] iterate_dir+0xf9/0x140
>> [8642924.303512]  [<ffffffff81221a19>] SyS_getdents64+0xa9/0x100
>> [8642924.372296]  [<ffffffff81221720>] ? fillonedir+0xe0/0xe0
>> [8642924.437963]  [<ffffffff816f2594>] system_call_fastpath+0x12/0xce
>> [8642924.512201] ---[ end trace 6782707b09f9f873 ]---
>> [8642924.569677] ------------[ cut here ]------------
>> [8642924.627055] WARNING: CPU: 38 PID: 73638 at lib/list_debug.c:36
>> __list_add+0x9c/0xd0()
>> [8642924.722925] list_add double add: new=ffff884c85a36288,
>> prev=ffff884c85a36288, next=ffff887e40fa5368.
>> [8642924.834373] Modules linked in: ipt_MASQUERADE
>> nf_nat_masquerade_ipv4 xt_conntrack iptable_filter arc4 ecb ppp_mppe
>> ppp_async crc_ccitt ppp_generic slhc nfsv3 nfs_acl rpcsec_gss_krb5
>> auth_rpcgss nfsv4 nfs fscache lockd sunrpc grace veth xt_nat xt_addrtype
>> br_netfilter bridge tcp_diag inet_diag oracleacfs(POE) oracleadvm(POE)
>> oracleoks(POE) oracleasm autofs4 dm_queue_length cpufreq_powersave
>> be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i iw_cxgb4 cxgb4 cxgb3i
>> libcxgbi iw_cxgb3 cxgb3 ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core
>> ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 8021q mrp
>> garp stp llc bonding iptable_nat nf_nat_ipv4 nf_nat ipt_REJECT
>> nf_reject_ipv4 xt_pkttype nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables
>> ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state
>> [8642925.688498]  nf_conntrack ip6table_filter ip6_tables ipv6
>> dm_multipath sg iTCO_wdt iTCO_vendor_support ipmi_devintf ipmi_si
>> ipmi_msghandler intel_powerclamp coretemp kvm_intel kvm pcspkr ixgbe dca
>> ptp pps_core vxlan udp_tunnel ip6_udp_tunnel mdio sb_edac edac_core
>> i2c_i801 i2c_core lpc_ich mfd_core shpchp ext4 jbd2 mbcache2 btrfs
>> raid6_pq xor sd_mod megaraid_sas ghash_clmulni_intel crc32_pclmul
>> crc32c_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper
>> aes_x86_64 ahci libahci mxm_wmi wmi dm_mirror dm_region_hash dm_log
>> dm_mod [last unloaded: xt_conntrack]
>> [8642926.279029] CPU: 38 PID: 73638 Comm: dockerd Tainted: P        W
>> OE   4.1.12-112.14.13.el6uek.x86_64 #2
>> [8642926.394626] Hardware name: Oracle Corporation ORACLE SERVER
>> X6-2/ASM,MOTHERBOARD,1U, BIOS 38040100 06/06/2016
>> [8642926.515421]  0000000000000000 ffff88130404fc28 ffffffff816ec91e
>> ffff88130404fc78
>> [8642926.606485]  0000000000000024 ffff88130404fc68 ffffffff81088925
>> 00ff88130404fc98
>> [8642926.697544]  ffff884c85a36288 ffff887e40fa5368 ffff884c85a36288
>> 0000000000000000
>> [8642926.788605] Call Trace:
>> [8642926.819945]  [<ffffffff816ec91e>] dump_stack+0x63/0x85
>> [8642926.883535]  [<ffffffff81088925>] warn_slowpath_common+0x95/0xe0
>> [8642926.957529]  [<ffffffff81088a26>] warn_slowpath_fmt+0x46/0x50
>> [8642927.028400]  [<ffffffff81343f1c>] __list_add+0x9c/0xd0
>> [8642927.092116]  [<ffffffffa01abb92>]
>> btrfs_queue_delayed_node+0xb2/0xe0 [btrfs]
>> [8642927.178669]  [<ffffffffa01abc66>]
>> __btrfs_release_delayed_node+0xa6/0x160 [btrfs]
>> [8642927.270440]  [<ffffffffa01aea5e>]
>> btrfs_delayed_update_inode+0x7e/0x100 [btrfs]
>> [8642927.360112]  [<ffffffffa015b199>] btrfs_update_inode+0xa9/0x130
>> [btrfs]
>> [8642927.441447]  [<ffffffffa015cd11>] btrfs_dirty_inode+0x71/0xe0 [btrfs]
>> [8642927.520716]  [<ffffffffa015fa25>] btrfs_update_time+0x65/0xc0 [btrfs]
>> [8642927.599910]  [<ffffffff812286d3>] touch_atime+0x123/0x170
>> [8642927.666615]  [<ffffffff81221929>] iterate_dir+0xf9/0x140
>> [8642927.732284]  [<ffffffff81221a19>] SyS_getdents64+0xa9/0x100
>> [8642927.801072]  [<ffffffff81221720>] ? fillonedir+0xe0/0xe0
>> [8642927.866741]  [<ffffffff816f2594>] system_call_fastpath+0x12/0xce
>> [8642927.940843] ---[ end trace 6782707b09f9f874 ]---
>> [8642927.998378] ------------[ cut here ]------------
>>
>>
>>
>>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: btrfs panic problem
  2018-09-19  0:35   ` sunny.s.zhang
@ 2018-09-19  2:27     ` Qu Wenruo
  0 siblings, 0 replies; 14+ messages in thread
From: Qu Wenruo @ 2018-09-19  2:27 UTC (permalink / raw)
  To: sunny.s.zhang, linux-btrfs; +Cc: Junxiao Bi


[-- Attachment #1.1: Type: text/plain, Size: 13284 bytes --]



On 2018/9/19 8:35 AM, sunny.s.zhang wrote:
> 
> On 2018-09-19 08:05, Qu Wenruo wrote:
>>
>> On 2018/9/18 8:28 AM, sunny.s.zhang wrote:
>>> Hi All,
>>>
>>> My OS(4.1.12) panic in kmem_cache_alloc, which is called by
>>> btrfs_get_or_create_delayed_node.
>> Any reproducer?
>>
>> Anyway we need a reproducer as a testcase.
> 
> I have had a try, but could not  reproduce yet.

Since it's just one hit in a production environment, I'm afraid we need to
inject some sleep or delay into this code and then try bombing it with
fsstress.

Beyond that I have no good idea on how to reproduce it.
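
A purely illustrative sketch of such an injected delay, against the 4.1-era
fast path quoted earlier in this thread (the msleep() call and its length are
made up just to widen the race window, and <linux/delay.h> would be needed if
delayed-inode.c does not already pull it in):

static struct btrfs_delayed_node *btrfs_get_delayed_node(struct inode *inode)
{
        struct btrfs_inode *btrfs_inode = BTRFS_I(inode);
        struct btrfs_root *root = btrfs_inode->root;
        u64 ino = btrfs_ino(inode);
        struct btrfs_delayed_node *node;

        node = ACCESS_ONCE(btrfs_inode->delayed_node);
        if (node) {
                /*
                 * Injected delay (illustration only): give
                 * btrfs_remove_delayed_node() on another CPU time to clear
                 * the pointer and drop the last reference before we take
                 * ours.
                 */
                msleep(100);
                atomic_inc(&node->refs);
                return node;
        }
        /* slow path (radix tree lookup under root->inode_lock) unchanged */
        ...
}

With something like that in place, running fsstress on the mount while inodes
are being evicted should hit the window far more often than a production
workload would.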

Thanks,
Qu

> 
> Any advice to reproduce it?
> 
>>
>> The code looks
>>
>>> I found that the freelist of the slub is wrong.
>>>
>>> crash> struct kmem_cache_cpu ffff887e7d7a24b0
>>>
>>> struct kmem_cache_cpu {
>>>    freelist = 0x2026,   <<< the value is id of one inode
>>>    tid = 29567861,
>>>    page = 0xffffea0132168d00,
>>>    partial = 0x0
>>> }
>>>
>>> And, I found there are two different btrfs inodes pointing delayed_node.
>>> It means that the same slub is used twice.
>>>
>>> I think this slub is freed twice, and then the next pointer of this slub
>>> point itself. So we get the same slub twice.
>>>
>>> When use this slub again, that break the freelist.
>>>
>>> Folloing code will make the delayed node being freed twice. But I don't
>>> found what is the process.
>>>
>>> Process A (btrfs_evict_inode) Process B
>>>
>>> call btrfs_remove_delayed_node call  btrfs_get_delayed_node
>>>
>>> node = ACCESS_ONCE(btrfs_inode->delayed_node);
>>>
>>> BTRFS_I(inode)->delayed_node = NULL;
>>> btrfs_release_delayed_node(delayed_node);
>>>
>>> if (node) {
>>> atomic_inc(&node->refs);
>>> return node;
>>> }
>>>
>>> ......
>>>
>>> btrfs_release_delayed_node(delayed_node);
>>>
>>>
>>> 1313 void btrfs_remove_delayed_node(struct inode *inode)
>>> 1314 {
>>> 1315         struct btrfs_delayed_node *delayed_node;
>>> 1316
>>> 1317         delayed_node = ACCESS_ONCE(BTRFS_I(inode)->delayed_node);
>>> 1318         if (!delayed_node)
>>> 1319                 return;
>>> 1320
>>> 1321         BTRFS_I(inode)->delayed_node = NULL;
>>> 1322         btrfs_release_delayed_node(delayed_node);
>>> 1323 }
>>>
>>>
>>>    87 static struct btrfs_delayed_node *btrfs_get_delayed_node(struct
>>> inode *inode)
>>>    88 {
>>>    89         struct btrfs_inode *btrfs_inode = BTRFS_I(inode);
>>>    90         struct btrfs_root *root = btrfs_inode->root;
>>>    91         u64 ino = btrfs_ino(inode);
>>>    92         struct btrfs_delayed_node *node;
>>>    93
>>>    94         node = ACCESS_ONCE(btrfs_inode->delayed_node);
>>>    95         if (node) {
>>>    96                 atomic_inc(&node->refs);
>>>    97                 return node;
>>>    98         }
>>>
>> The analyse looks valid.
>> Can be fixed by adding a spinlock.
>>
>> Just wondering why we didn't hit it.
> 
> It just appeared once in our production environment.
> 
> Thanks,
> Sunny
>>
>> Thanks,
>> Qu
>>
>>> Thanks,
>>>
>>> Sunny
>>>
>>>
>>> PS:
>>>
>>> ------------------------------------------------
>>>
>>> panic informations
>>>
>>> PID: 73638  TASK: ffff887deb586200  CPU: 38  COMMAND: "dockerd"
>>>   #0 [ffff88130404f940] machine_kexec at ffffffff8105ec10
>>>   #1 [ffff88130404f9b0] crash_kexec at ffffffff811145b8
>>>   #2 [ffff88130404fa80] oops_end at ffffffff8101a868
>>>   #3 [ffff88130404fab0] no_context at ffffffff8106ea91
>>>   #4 [ffff88130404fb00] __bad_area_nosemaphore at ffffffff8106ec8d
>>>   #5 [ffff88130404fb50] bad_area_nosemaphore at ffffffff8106eda3
>>>   #6 [ffff88130404fb60] __do_page_fault at ffffffff8106f328
>>>   #7 [ffff88130404fbd0] do_page_fault at ffffffff8106f637
>>>   #8 [ffff88130404fc10] page_fault at ffffffff816f6308
>>>      [exception RIP: kmem_cache_alloc+121]
>>>      RIP: ffffffff811ef019  RSP: ffff88130404fcc8  RFLAGS: 00010286
>>>      RAX: 0000000000000000  RBX: 0000000000000000  RCX: 0000000001c32b76
>>>      RDX: 0000000001c32b75  RSI: 0000000000000000  RDI: 00000000000224b0
>>>      RBP: ffff88130404fd08   R8: ffff887e7d7a24b0   R9: 0000000000000000
>>>      R10: ffff8802668b6618  R11: 0000000000000002  R12: ffff887e3e230a00
>>>      R13: 0000000000002026  R14: ffff887e3e230a00  R15: ffffffffa01abf49
>>>      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>>>   #9 [ffff88130404fd10] btrfs_get_or_create_delayed_node at
>>> ffffffffa01abf49 [btrfs]
>>> #10 [ffff88130404fd60] btrfs_delayed_update_inode at ffffffffa01aea12
>>> [btrfs]
>>> #11 [ffff88130404fdb0] btrfs_update_inode at ffffffffa015b199 [btrfs]
>>> #12 [ffff88130404fdf0] btrfs_dirty_inode at ffffffffa015cd11 [btrfs]
>>> #13 [ffff88130404fe20] btrfs_update_time at ffffffffa015fa25 [btrfs]
>>> #14 [ffff88130404fe50] touch_atime at ffffffff812286d3
>>> #15 [ffff88130404fe90] iterate_dir at ffffffff81221929
>>> #16 [ffff88130404fee0] sys_getdents64 at ffffffff81221a19
>>> #17 [ffff88130404ff50] system_call_fastpath at ffffffff816f2594
>>>      RIP: 00000000006b68e4  RSP: 000000c866259080  RFLAGS: 00000246
>>>      RAX: ffffffffffffffda  RBX: 000000c828dbbe00  RCX: 00000000006b68e4
>>>      RDX: 0000000000001000  RSI: 000000c83da14000  RDI: 0000000000000011
>>>      RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000000
>>>      R10: 0000000000000000  R11: 0000000000000246  R12: 00000000000000c7
>>>      R13: 0000000002174e74  R14: 0000000005555555  R15: 0000000000000038
>>>      ORIG_RAX: 00000000000000d9  CS: 0033  SS: 002b
>>>
>>>
>>> We also find the list double add informations, including n_list and
>>> p_list:
>>>
>>> [8642921.110568] ------------[ cut here ]------------
>>> [8642921.167929] WARNING: CPU: 38 PID: 73638 at lib/list_debug.c:33
>>> __list_add+0xbe/0xd0()
>>> [8642921.263780] list_add corruption. prev->next should be next
>>> (ffff887e40fa5368), but was ff:ff884c85a36288. (prev=ffff884c85a36288).
>>> [8642921.405490] Modules linked in: ipt_MASQUERADE
>>> nf_nat_masquerade_ipv4 xt_conntrack iptable_filter arc4 ecb ppp_mppe
>>> ppp_async crc_ccitt ppp_generic slhc nfsv3 nfs_acl rpcsec_gss_krb5
>>> auth_rpcgss nfsv4 nfs fscache lockd sunrpc grace veth xt_nat xt_addrtype
>>> br_netfilter bridge tcp_diag inet_diag oracleacfs(POE) oracleadvm(POE)
>>> oracleoks(POE) oracleasm autofs4 dm_queue_length cpufreq_powersave
>>> be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i iw_cxgb4 cxgb4 cxgb3i
>>> libcxgbi iw_cxgb3 cxgb3 ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core
>>> ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 8021q mrp
>>> garp stp llc bonding iptable_nat nf_nat_ipv4 nf_nat ipt_REJECT
>>> nf_reject_ipv4 xt_pkttype nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables
>>> ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state
>>> [8642922.259780]  nf_conntrack ip6table_filter ip6_tables ipv6
>>> dm_multipath sg iTCO_wdt iTCO_vendor_support ipmi_devintf ipmi_si
>>> ipmi_msghandler intel_powerclamp coretemp kvm_intel kvm pcspkr ixgbe dca
>>> ptp pps_core vxlan udp_tunnel ip6_udp_tunnel mdio sb_edac edac_core
>>> i2c_i801 i2c_core lpc_ich mfd_core shpchp ext4 jbd2 mbcache2 btrfs
>>> raid6_pq xor sd_mod megaraid_sas ghash_clmulni_intel crc32_pclmul
>>> crc32c_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper
>>> aes_x86_64 ahci libahci mxm_wmi wmi dm_mirror dm_region_hash dm_log
>>> dm_mod [last unloaded: xt_conntrack]
>>> [8642922.850285] CPU: 38 PID: 73638 Comm: dockerd Tainted: P
>>> OE   4.1.12-112.14.13.el6uek.x86_64 #2
>>> [8642922.965881] Hardware name: Oracle Corporation ORACLE SERVER
>>> X6-2/ASM,MOTHERBOARD,1U, BIOS 38040100 06/06/2016
>>> [8642923.086670]  0000000000000000 ffff88130404fc28 ffffffff816ec91e
>>> ffff88130404fc78
>>> [8642923.177730]  0000000000000021 ffff88130404fc68 ffffffff81088925
>>> 00ff88130404fc98
>>> [8642923.268793]  ffff884c85a36288 ffff887e40fa5368 ffff884c85a36288
>>> 0000000000000000
>>> [8642923.359851] Call Trace:
>>> [8642923.391194]  [<ffffffff816ec91e>] dump_stack+0x63/0x85
>>> [8642923.454780]  [<ffffffff81088925>] warn_slowpath_common+0x95/0xe0
>>> [8642923.528769]  [<ffffffff81088a26>] warn_slowpath_fmt+0x46/0x50
>>> [8642923.599635]  [<ffffffff81343f3e>] __list_add+0xbe/0xd0
>>> [8642923.663333]  [<ffffffffa01abb92>]
>>> btrfs_queue_delayed_node+0xb2/0xe0 [btrfs]
>>> [8642923.749898]  [<ffffffffa01abc66>]
>>> __btrfs_release_delayed_node+0xa6/0x160 [btrfs]
>>> [8642923.841661]  [<ffffffffa01aea5e>]
>>> btrfs_delayed_update_inode+0x7e/0x100 [btrfs]
>>> [8642923.931336]  [<ffffffffa015b199>] btrfs_update_inode+0xa9/0x130
>>> [btrfs]
>>> [8642924.012686]  [<ffffffffa015cd11>] btrfs_dirty_inode+0x71/0xe0
>>> [btrfs]
>>> [8642924.091954]  [<ffffffffa015fa25>] btrfs_update_time+0x65/0xc0
>>> [btrfs]
>>> [8642924.171145]  [<ffffffff812286d3>] touch_atime+0x123/0x170
>>> [8642924.237849]  [<ffffffff81221929>] iterate_dir+0xf9/0x140
>>> [8642924.303512]  [<ffffffff81221a19>] SyS_getdents64+0xa9/0x100
>>> [8642924.372296]  [<ffffffff81221720>] ? fillonedir+0xe0/0xe0
>>> [8642924.437963]  [<ffffffff816f2594>] system_call_fastpath+0x12/0xce
>>> [8642924.512201] ---[ end trace 6782707b09f9f873 ]---
>>> [8642924.569677] ------------[ cut here ]------------
>>> [8642924.627055] WARNING: CPU: 38 PID: 73638 at lib/list_debug.c:36
>>> __list_add+0x9c/0xd0()
>>> [8642924.722925] list_add double add: new=ffff884c85a36288,
>>> prev=ffff884c85a36288, next=ffff887e40fa5368.
>>> [8642924.834373] Modules linked in: ipt_MASQUERADE
>>> nf_nat_masquerade_ipv4 xt_conntrack iptable_filter arc4 ecb ppp_mppe
>>> ppp_async crc_ccitt ppp_generic slhc nfsv3 nfs_acl rpcsec_gss_krb5
>>> auth_rpcgss nfsv4 nfs fscache lockd sunrpc grace veth xt_nat xt_addrtype
>>> br_netfilter bridge tcp_diag inet_diag oracleacfs(POE) oracleadvm(POE)
>>> oracleoks(POE) oracleasm autofs4 dm_queue_length cpufreq_powersave
>>> be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i iw_cxgb4 cxgb4 cxgb3i
>>> libcxgbi iw_cxgb3 cxgb3 ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core
>>> ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 8021q mrp
>>> garp stp llc bonding iptable_nat nf_nat_ipv4 nf_nat ipt_REJECT
>>> nf_reject_ipv4 xt_pkttype nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables
>>> ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state
>>> [8642925.688498]  nf_conntrack ip6table_filter ip6_tables ipv6
>>> dm_multipath sg iTCO_wdt iTCO_vendor_support ipmi_devintf ipmi_si
>>> ipmi_msghandler intel_powerclamp coretemp kvm_intel kvm pcspkr ixgbe dca
>>> ptp pps_core vxlan udp_tunnel ip6_udp_tunnel mdio sb_edac edac_core
>>> i2c_i801 i2c_core lpc_ich mfd_core shpchp ext4 jbd2 mbcache2 btrfs
>>> raid6_pq xor sd_mod megaraid_sas ghash_clmulni_intel crc32_pclmul
>>> crc32c_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper
>>> aes_x86_64 ahci libahci mxm_wmi wmi dm_mirror dm_region_hash dm_log
>>> dm_mod [last unloaded: xt_conntrack]
>>> [8642926.279029] CPU: 38 PID: 73638 Comm: dockerd Tainted: P        W
>>> OE   4.1.12-112.14.13.el6uek.x86_64 #2
>>> [8642926.394626] Hardware name: Oracle Corporation ORACLE SERVER
>>> X6-2/ASM,MOTHERBOARD,1U, BIOS 38040100 06/06/2016
>>> [8642926.515421]  0000000000000000 ffff88130404fc28 ffffffff816ec91e
>>> ffff88130404fc78
>>> [8642926.606485]  0000000000000024 ffff88130404fc68 ffffffff81088925
>>> 00ff88130404fc98
>>> [8642926.697544]  ffff884c85a36288 ffff887e40fa5368 ffff884c85a36288
>>> 0000000000000000
>>> [8642926.788605] Call Trace:
>>> [8642926.819945]  [<ffffffff816ec91e>] dump_stack+0x63/0x85
>>> [8642926.883535]  [<ffffffff81088925>] warn_slowpath_common+0x95/0xe0
>>> [8642926.957529]  [<ffffffff81088a26>] warn_slowpath_fmt+0x46/0x50
>>> [8642927.028400]  [<ffffffff81343f1c>] __list_add+0x9c/0xd0
>>> [8642927.092116]  [<ffffffffa01abb92>]
>>> btrfs_queue_delayed_node+0xb2/0xe0 [btrfs]
>>> [8642927.178669]  [<ffffffffa01abc66>]
>>> __btrfs_release_delayed_node+0xa6/0x160 [btrfs]
>>> [8642927.270440]  [<ffffffffa01aea5e>]
>>> btrfs_delayed_update_inode+0x7e/0x100 [btrfs]
>>> [8642927.360112]  [<ffffffffa015b199>] btrfs_update_inode+0xa9/0x130
>>> [btrfs]
>>> [8642927.441447]  [<ffffffffa015cd11>] btrfs_dirty_inode+0x71/0xe0
>>> [btrfs]
>>> [8642927.520716]  [<ffffffffa015fa25>] btrfs_update_time+0x65/0xc0
>>> [btrfs]
>>> [8642927.599910]  [<ffffffff812286d3>] touch_atime+0x123/0x170
>>> [8642927.666615]  [<ffffffff81221929>] iterate_dir+0xf9/0x140
>>> [8642927.732284]  [<ffffffff81221a19>] SyS_getdents64+0xa9/0x100
>>> [8642927.801072]  [<ffffffff81221720>] ? fillonedir+0xe0/0xe0
>>> [8642927.866741]  [<ffffffff816f2594>] system_call_fastpath+0x12/0xce
>>> [8642927.940843] ---[ end trace 6782707b09f9f874 ]---
>>> [8642927.998378] ------------[ cut here ]------------
>>>
>>>
>>>
>>>
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: btrfs panic problem
  2018-09-18 23:53   ` sunny.s.zhang
@ 2018-09-19 16:12     ` Nikolay Borisov
  2018-09-25  8:29       ` sunny.s.zhang
  0 siblings, 1 reply; 14+ messages in thread
From: Nikolay Borisov @ 2018-09-19 16:12 UTC (permalink / raw)
  To: sunny.s.zhang, Duncan, linux-btrfs



On 19.09.2018 02:53, sunny.s.zhang wrote:
> Hi Duncan,
> 
> Thank you for your advice. I understand what you mean.  But I have
> reviewed the latest btrfs code, and I think the issue still exists.
>
> At line 71, if btrfs_get_delayed_node() has just executed that line and we
> then switch to another process, that process can run past line 1282 and
> release the delayed node at the end.
>
> When we switch back to btrfs_get_delayed_node(), it finds that the node is
> not NULL and uses it as normal. That means we are using freed memory.
>
> At some point, this memory will be freed again.
> 
> latest code as below.
> 
> 1278 void btrfs_remove_delayed_node(struct btrfs_inode *inode)
> 1279 {
> 1280         struct btrfs_delayed_node *delayed_node;
> 1281
> 1282         delayed_node = READ_ONCE(inode->delayed_node);
> 1283         if (!delayed_node)
> 1284                 return;
> 1285
> 1286         inode->delayed_node = NULL;
> 1287         btrfs_release_delayed_node(delayed_node);
> 1288 }
> 
> 
>   64 static struct btrfs_delayed_node *btrfs_get_delayed_node(
>   65                 struct btrfs_inode *btrfs_inode)
>   66 {
>   67         struct btrfs_root *root = btrfs_inode->root;
>   68         u64 ino = btrfs_ino(btrfs_inode);
>   69         struct btrfs_delayed_node *node;
>   70
>   71         node = READ_ONCE(btrfs_inode->delayed_node);
>   72         if (node) {
>   73                 refcount_inc(&node->refs);
>   74                 return node;
>   75         }
>   76
>   77         spin_lock(&root->inode_lock);
>   78         node = radix_tree_lookup(&root->delayed_nodes_tree, ino);
> 
> 

Your analysis is correct; however, it's missing one crucial point -
btrfs_remove_delayed_node is called only from btrfs_evict_inode, and
inodes are evicted only when all other references have been dropped. Check
the code in evict_inodes() - inodes are added to the dispose list only when
their i_count is 0, at which point there should be no remaining references
to the inode. This invalidates your analysis...
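
For reference, here is a condensed paraphrase of the check I mean in
fs/inode.c:evict_inodes() (not the verbatim kernel source):

        list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
                if (atomic_read(&inode->i_count))
                        continue;       /* still referenced -> not evicted */

                /*
                 * Only inodes whose i_count has dropped to zero are queued
                 * for disposal, so by the time btrfs_evict_inode() calls
                 * btrfs_remove_delayed_node() nothing should still be able
                 * to call btrfs_get_delayed_node() on that inode.
                 */
                list_add(&inode->i_lru, &dispose);
        }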

> On 2018-09-18 13:05, Duncan wrote:
>> sunny.s.zhang posted on Tue, 18 Sep 2018 08:28:14 +0800 as excerpted:
>>
>>> My OS(4.1.12) panic in kmem_cache_alloc, which is called by
>>> btrfs_get_or_create_delayed_node.
>>>
>>> I found that the freelist of the slub is wrong.
>> [Not a dev, just a btrfs list regular and user, myself.  But here's a
>> general btrfs list recommendations reply...]
>>
>> You appear to mean kernel 4.1.12 -- confirmed by the version reported in
>> the posted dump:  4.1.12-112.14.13.el6uek.x86_64
>>
>> OK, so from the perspective of this forward-development-focused list,
>> kernel 4.1 is pretty ancient history, but you do have a number of
>> options.
>>
>> First let's consider the general situation.  Most people choose an
>> enterprise distro for supported stability, and that's certainly a valid
>> thing to want.  However, btrfs, while now reaching early maturity for the
>> basics (single device in single or dup mode, and multi-device in single/
>> raid0/1/10 modes, note that raid56 mode is newer and less mature),
>> remains under quite heavy development, and keeping reasonably current is
>> recommended for that reason.
>>
>> So you chose an enterprise distro presumably to lock in supported
>> stability for several years, but you chose a filesystem, btrfs, that's
>> still under heavy development, with reasonably current kernels and
>> userspace recommended as tending to have the known bugs fixed.  There's a
>> bit of a conflict there, and the /general/ recommendation would thus be
>> to consider whether one or the other of those choices are inappropriate
>> for your use-case, because it's really quite likely that if you really
>> want the stability of an enterprise distro and kernel, that btrfs isn't
>> as stable a filesystem as you're likely to want to match with it.
>> Alternatively, if you want something newer to match the still under heavy
>> development btrfs, you very likely want a distro that's not focused on
>> years-old stability just for the sake of it.  One or the other is likely
>> to be a poor match for your needs, and choosing something else that's a
>> better match is likely to be a much better experience for you.
>>
>> But perhaps you do have reason to want to run the newer and not quite to
>> traditional enterprise-distro level stability btrfs, on an otherwise
>> older and very stable enterprise distro.  That's fine, provided you know
>> what you're getting yourself into, and are prepared to deal with it.
>>
>> In that case, for best support from the list, we'd recommend running one
>> of the latest two kernels in either the current or mainline LTS tracks.
>>
>> For current track, With 4.18 being the latest kernel, that'd be 4.18 or
>> 4.17, as available on kernel.org (tho 4.17 is already EOL, no further
>> releases, at 4.17.19).
>>
>> For mainline-LTS track, 4.14 and 4.9 are the latest two LTS series
>> kernels, tho IIRC 4.19 is scheduled to be this year's LTS (or was it 4.18
>> and it's just not out of normal stable range yet so not yet marked LTS?),
>> so it'll be coming up soon and 4.9 will then be dropping to third LTS
>> series and thus out of our best recommended range.  4.4 was the previous
>> LTS and while still in LTS support, is outside the two newest LTS series
>> that this list recommends.
>>
>> And of course 4.1 is older than 4.4, so as I said, in btrfs development
>> terms, it's quite ancient indeed... quite out of practical support range
>> here, tho of course we'll still try, but in many cases the first question
>> when any problem's reported is going to be whether it's reproducible on
>> something closer to current.
>>
>> But... you ARE on an enterprise kernel, likely on an enterprise distro,
>> and very possibly actually paying /them/ for support.  So you're not
>> without options if you prefer to stay with your supported enterprise
>> kernel.  If you're paying them for support, you might as well use it, and
>> of course of the very many fixes since 4.1, they know what they've
>> backported and what they haven't, so they're far better placed to provide
>> that support in any case.
>>
>> Or, given what you posted, you appear to be reasonably able to do at
>> least limited kernel-dev-level analysis yourself.  Given that, you're
>> already reasonably well placed to simply decide to stick with what you
>> have and take the support you can get, diving into things yourself if
>> necessary.
>>
>>
>> So those are your kernel options.  What about userspace btrfs-progs?
>>
>> Generally speaking, while the filesystem's running, it's the kernel code
>> doing most of the work.  If you have old userspace, it simply means you
>> can't take advantage of some of the newer features as the old userspace
>> doesn't know how to call for them.
>>
>> But the situation changes as soon as you have problems and can't mount,
>> because it's userspace code that runs to try to fix that sort of problem,
>> or failing that, it's userspace code that btrfs restore runs to try to
>> grab what files can be grabbed off of the unmountable filesystem.
>>
>> So for routine operation, it's no big deal if userspace is a bit old, at
>> least as long as it's new enough to have all the newer command formats,
>> etc, that you need, and for comparing against others when posted.  But
>> once things go bad on you, you really want the newest btrfs-progs in
>> ordered to give you the best chance at either fixing things, or worst-
>> case, at least retrieving the files off the dead filesystem.  So using
>> the older distro btrfs-progs for routine running should be fine, but
>> unless your backups are complete and frequent enough that if something
>> goes wrong it's easiest to simply blow the bad version away with a fresh
>> mkfs and start over, you'll probably want at least a reasonably current
>> btrfs-progs on your rescue media at least.  Since the userspace version
>> numbers are synced to the kernel cycle, a good rule of thumb is keep your
>> btrfs-progs version to at least that of the oldest recommended LTS kernel
>> version, as well, so you'd want at least btrfs-progs 4.9 on your rescue
>> media, for now, and 4.14, coming up, since when the new kernel goes LTS
>> that'll displace 4.9 and 4.14 will then be the second-back LTS.
>>
> 
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: btrfs panic problem
  2018-09-18  0:28 btrfs panic problem sunny.s.zhang
                   ` (2 preceding siblings ...)
  2018-09-19  0:05 ` Qu Wenruo
@ 2018-09-19 18:36 ` Liu Bo
  2018-09-25  8:20   ` sunny.s.zhang
  3 siblings, 1 reply; 14+ messages in thread
From: Liu Bo @ 2018-09-19 18:36 UTC (permalink / raw)
  To: sunny.s.zhang; +Cc: linux-btrfs

On Mon, Sep 17, 2018 at 5:28 PM, sunny.s.zhang <sunny.s.zhang@oracle.com> wrote:
> Hi All,
>
> My OS(4.1.12) panic in kmem_cache_alloc, which is called by
> btrfs_get_or_create_delayed_node.
>
> I found that the freelist of the slub is wrong.
>
> crash> struct kmem_cache_cpu ffff887e7d7a24b0
>
> struct kmem_cache_cpu {
>   freelist = 0x2026,   <<< the value is id of one inode
>   tid = 29567861,
>   page = 0xffffea0132168d00,
>   partial = 0x0
> }
>
> And, I found there are two different btrfs inodes pointing delayed_node. It
> means that the same slub is used twice.
>
> I think this slub is freed twice, and then the next pointer of this slub
> point itself. So we get the same slub twice.
>
> When use this slub again, that break the freelist.
>
> Folloing code will make the delayed node being freed twice. But I don't
> found what is the process.
>
> Process A (btrfs_evict_inode) Process B
>
> call btrfs_remove_delayed_node call  btrfs_get_delayed_node
>
> node = ACCESS_ONCE(btrfs_inode->delayed_node);
>
> BTRFS_I(inode)->delayed_node = NULL;
> btrfs_release_delayed_node(delayed_node);
>
> if (node) {
> atomic_inc(&node->refs);
> return node;
> }
>
> ......
>
> btrfs_release_delayed_node(delayed_node);
>

Looking at the race, it seems the following commit has addressed it.

btrfs: fix refcount_t usage when deleting btrfs_delayed_nodes
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ec35e48b286959991cdbb886f1bdeda4575c80b4
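
Roughly, the idea behind that commit (paraphrased here, not the literal
patch) is to take the reference on a node found through the radix tree only
while holding root->inode_lock, and to use refcount_inc_not_zero() so that a
node whose last reference has already been dropped is never handed out again:

        spin_lock(&root->inode_lock);
        node = radix_tree_lookup(&root->delayed_nodes_tree, ino);
        if (node && !refcount_inc_not_zero(&node->refs)) {
                /* the node is already on its way to being freed */
                node = NULL;
        }
        spin_unlock(&root->inode_lock);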

thanks,
liubo


>
> 1313 void btrfs_remove_delayed_node(struct inode *inode)
> 1314 {
> 1315         struct btrfs_delayed_node *delayed_node;
> 1316
> 1317         delayed_node = ACCESS_ONCE(BTRFS_I(inode)->delayed_node);
> 1318         if (!delayed_node)
> 1319                 return;
> 1320
> 1321         BTRFS_I(inode)->delayed_node = NULL;
> 1322         btrfs_release_delayed_node(delayed_node);
> 1323 }
>
>
>   87 static struct btrfs_delayed_node *btrfs_get_delayed_node(struct inode
> *inode)
>   88 {
>   89         struct btrfs_inode *btrfs_inode = BTRFS_I(inode);
>   90         struct btrfs_root *root = btrfs_inode->root;
>   91         u64 ino = btrfs_ino(inode);
>   92         struct btrfs_delayed_node *node;
>   93
>   94         node = ACCESS_ONCE(btrfs_inode->delayed_node);
>   95         if (node) {
>   96                 atomic_inc(&node->refs);
>   97                 return node;
>   98         }
>
>
> Thanks,
>
> Sunny
>
>
> PS:
>
> ------------------------------------------------
>
> panic informations
>
> PID: 73638  TASK: ffff887deb586200  CPU: 38  COMMAND: "dockerd"
>  #0 [ffff88130404f940] machine_kexec at ffffffff8105ec10
>  #1 [ffff88130404f9b0] crash_kexec at ffffffff811145b8
>  #2 [ffff88130404fa80] oops_end at ffffffff8101a868
>  #3 [ffff88130404fab0] no_context at ffffffff8106ea91
>  #4 [ffff88130404fb00] __bad_area_nosemaphore at ffffffff8106ec8d
>  #5 [ffff88130404fb50] bad_area_nosemaphore at ffffffff8106eda3
>  #6 [ffff88130404fb60] __do_page_fault at ffffffff8106f328
>  #7 [ffff88130404fbd0] do_page_fault at ffffffff8106f637
>  #8 [ffff88130404fc10] page_fault at ffffffff816f6308
>     [exception RIP: kmem_cache_alloc+121]
>     RIP: ffffffff811ef019  RSP: ffff88130404fcc8  RFLAGS: 00010286
>     RAX: 0000000000000000  RBX: 0000000000000000  RCX: 0000000001c32b76
>     RDX: 0000000001c32b75  RSI: 0000000000000000  RDI: 00000000000224b0
>     RBP: ffff88130404fd08   R8: ffff887e7d7a24b0   R9: 0000000000000000
>     R10: ffff8802668b6618  R11: 0000000000000002  R12: ffff887e3e230a00
>     R13: 0000000000002026  R14: ffff887e3e230a00  R15: ffffffffa01abf49
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>  #9 [ffff88130404fd10] btrfs_get_or_create_delayed_node at ffffffffa01abf49
> [btrfs]
> #10 [ffff88130404fd60] btrfs_delayed_update_inode at ffffffffa01aea12
> [btrfs]
> #11 [ffff88130404fdb0] btrfs_update_inode at ffffffffa015b199 [btrfs]
> #12 [ffff88130404fdf0] btrfs_dirty_inode at ffffffffa015cd11 [btrfs]
> #13 [ffff88130404fe20] btrfs_update_time at ffffffffa015fa25 [btrfs]
> #14 [ffff88130404fe50] touch_atime at ffffffff812286d3
> #15 [ffff88130404fe90] iterate_dir at ffffffff81221929
> #16 [ffff88130404fee0] sys_getdents64 at ffffffff81221a19
> #17 [ffff88130404ff50] system_call_fastpath at ffffffff816f2594
>     RIP: 00000000006b68e4  RSP: 000000c866259080  RFLAGS: 00000246
>     RAX: ffffffffffffffda  RBX: 000000c828dbbe00  RCX: 00000000006b68e4
>     RDX: 0000000000001000  RSI: 000000c83da14000  RDI: 0000000000000011
>     RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000000
>     R10: 0000000000000000  R11: 0000000000000246  R12: 00000000000000c7
>     R13: 0000000002174e74  R14: 0000000005555555  R15: 0000000000000038
>     ORIG_RAX: 00000000000000d9  CS: 0033  SS: 002b
>
>
> We also find the list double add informations, including n_list and p_list:
>
> [8642921.110568] ------------[ cut here ]------------
> [8642921.167929] WARNING: CPU: 38 PID: 73638 at lib/list_debug.c:33
> __list_add+0xbe/0xd0()
> [8642921.263780] list_add corruption. prev->next should be next
> (ffff887e40fa5368), but was ff:ff884c85a36288. (prev=ffff884c85a36288).
> [8642921.405490] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4
> xt_conntrack iptable_filter arc4 ecb ppp_mppe ppp_async crc_ccitt
> ppp_generic slhc nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs fscache
> lockd sunrpc grace veth xt_nat xt_addrtype br_netfilter bridge tcp_diag
> inet_diag oracleacfs(POE) oracleadvm(POE) oracleoks(POE) oracleasm autofs4
> dm_queue_length cpufreq_powersave be2iscsi iscsi_boot_sysfs bnx2i cnic uio
> cxgb4i iw_cxgb4 cxgb4 cxgb3i libcxgbi iw_cxgb3 cxgb3 ib_iser rdma_cm ib_cm
> iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi
> scsi_transport_iscsi 8021q mrp garp stp llc bonding iptable_nat nf_nat_ipv4
> nf_nat ipt_REJECT nf_reject_ipv4 xt_pkttype nf_conntrack_ipv4 nf_defrag_ipv4
> ip_tables ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6
> xt_state
> [8642922.259780]  nf_conntrack ip6table_filter ip6_tables ipv6 dm_multipath
> sg iTCO_wdt iTCO_vendor_support ipmi_devintf ipmi_si ipmi_msghandler
> intel_powerclamp coretemp kvm_intel kvm pcspkr ixgbe dca ptp pps_core vxlan
> udp_tunnel ip6_udp_tunnel mdio sb_edac edac_core i2c_i801 i2c_core lpc_ich
> mfd_core shpchp ext4 jbd2 mbcache2 btrfs raid6_pq xor sd_mod megaraid_sas
> ghash_clmulni_intel crc32_pclmul crc32c_intel aesni_intel ablk_helper cryptd
> lrw gf128mul glue_helper aes_x86_64 ahci libahci mxm_wmi wmi dm_mirror
> dm_region_hash dm_log dm_mod [last unloaded: xt_conntrack]
> [8642922.850285] CPU: 38 PID: 73638 Comm: dockerd Tainted: P           OE
> 4.1.12-112.14.13.el6uek.x86_64 #2
> [8642922.965881] Hardware name: Oracle Corporation ORACLE SERVER
> X6-2/ASM,MOTHERBOARD,1U, BIOS 38040100 06/06/2016
> [8642923.086670]  0000000000000000 ffff88130404fc28 ffffffff816ec91e
> ffff88130404fc78
> [8642923.177730]  0000000000000021 ffff88130404fc68 ffffffff81088925
> 00ff88130404fc98
> [8642923.268793]  ffff884c85a36288 ffff887e40fa5368 ffff884c85a36288
> 0000000000000000
> [8642923.359851] Call Trace:
> [8642923.391194]  [<ffffffff816ec91e>] dump_stack+0x63/0x85
> [8642923.454780]  [<ffffffff81088925>] warn_slowpath_common+0x95/0xe0
> [8642923.528769]  [<ffffffff81088a26>] warn_slowpath_fmt+0x46/0x50
> [8642923.599635]  [<ffffffff81343f3e>] __list_add+0xbe/0xd0
> [8642923.663333]  [<ffffffffa01abb92>] btrfs_queue_delayed_node+0xb2/0xe0
> [btrfs]
> [8642923.749898]  [<ffffffffa01abc66>]
> __btrfs_release_delayed_node+0xa6/0x160 [btrfs]
> [8642923.841661]  [<ffffffffa01aea5e>] btrfs_delayed_update_inode+0x7e/0x100
> [btrfs]
> [8642923.931336]  [<ffffffffa015b199>] btrfs_update_inode+0xa9/0x130 [btrfs]
> [8642924.012686]  [<ffffffffa015cd11>] btrfs_dirty_inode+0x71/0xe0 [btrfs]
> [8642924.091954]  [<ffffffffa015fa25>] btrfs_update_time+0x65/0xc0 [btrfs]
> [8642924.171145]  [<ffffffff812286d3>] touch_atime+0x123/0x170
> [8642924.237849]  [<ffffffff81221929>] iterate_dir+0xf9/0x140
> [8642924.303512]  [<ffffffff81221a19>] SyS_getdents64+0xa9/0x100
> [8642924.372296]  [<ffffffff81221720>] ? fillonedir+0xe0/0xe0
> [8642924.437963]  [<ffffffff816f2594>] system_call_fastpath+0x12/0xce
> [8642924.512201] ---[ end trace 6782707b09f9f873 ]---
> [8642924.569677] ------------[ cut here ]------------
> [8642924.627055] WARNING: CPU: 38 PID: 73638 at lib/list_debug.c:36
> __list_add+0x9c/0xd0()
> [8642924.722925] list_add double add: new=ffff884c85a36288,
> prev=ffff884c85a36288, next=ffff887e40fa5368.
> [8642924.834373] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4
> xt_conntrack iptable_filter arc4 ecb ppp_mppe ppp_async crc_ccitt
> ppp_generic slhc nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs fscache
> lockd sunrpc grace veth xt_nat xt_addrtype br_netfilter bridge tcp_diag
> inet_diag oracleacfs(POE) oracleadvm(POE) oracleoks(POE) oracleasm autofs4
> dm_queue_length cpufreq_powersave be2iscsi iscsi_boot_sysfs bnx2i cnic uio
> cxgb4i iw_cxgb4 cxgb4 cxgb3i libcxgbi iw_cxgb3 cxgb3 ib_iser rdma_cm ib_cm
> iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi
> scsi_transport_iscsi 8021q mrp garp stp llc bonding iptable_nat nf_nat_ipv4
> nf_nat ipt_REJECT nf_reject_ipv4 xt_pkttype nf_conntrack_ipv4 nf_defrag_ipv4
> ip_tables ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6
> xt_state
> [8642925.688498]  nf_conntrack ip6table_filter ip6_tables ipv6 dm_multipath
> sg iTCO_wdt iTCO_vendor_support ipmi_devintf ipmi_si ipmi_msghandler
> intel_powerclamp coretemp kvm_intel kvm pcspkr ixgbe dca ptp pps_core vxlan
> udp_tunnel ip6_udp_tunnel mdio sb_edac edac_core i2c_i801 i2c_core lpc_ich
> mfd_core shpchp ext4 jbd2 mbcache2 btrfs raid6_pq xor sd_mod megaraid_sas
> ghash_clmulni_intel crc32_pclmul crc32c_intel aesni_intel ablk_helper cryptd
> lrw gf128mul glue_helper aes_x86_64 ahci libahci mxm_wmi wmi dm_mirror
> dm_region_hash dm_log dm_mod [last unloaded: xt_conntrack]
> [8642926.279029] CPU: 38 PID: 73638 Comm: dockerd Tainted: P        W  OE
> 4.1.12-112.14.13.el6uek.x86_64 #2
> [8642926.394626] Hardware name: Oracle Corporation ORACLE SERVER
> X6-2/ASM,MOTHERBOARD,1U, BIOS 38040100 06/06/2016
> [8642926.515421]  0000000000000000 ffff88130404fc28 ffffffff816ec91e
> ffff88130404fc78
> [8642926.606485]  0000000000000024 ffff88130404fc68 ffffffff81088925
> 00ff88130404fc98
> [8642926.697544]  ffff884c85a36288 ffff887e40fa5368 ffff884c85a36288
> 0000000000000000
> [8642926.788605] Call Trace:
> [8642926.819945]  [<ffffffff816ec91e>] dump_stack+0x63/0x85
> [8642926.883535]  [<ffffffff81088925>] warn_slowpath_common+0x95/0xe0
> [8642926.957529]  [<ffffffff81088a26>] warn_slowpath_fmt+0x46/0x50
> [8642927.028400]  [<ffffffff81343f1c>] __list_add+0x9c/0xd0
> [8642927.092116]  [<ffffffffa01abb92>] btrfs_queue_delayed_node+0xb2/0xe0
> [btrfs]
> [8642927.178669]  [<ffffffffa01abc66>]
> __btrfs_release_delayed_node+0xa6/0x160 [btrfs]
> [8642927.270440]  [<ffffffffa01aea5e>] btrfs_delayed_update_inode+0x7e/0x100
> [btrfs]
> [8642927.360112]  [<ffffffffa015b199>] btrfs_update_inode+0xa9/0x130 [btrfs]
> [8642927.441447]  [<ffffffffa015cd11>] btrfs_dirty_inode+0x71/0xe0 [btrfs]
> [8642927.520716]  [<ffffffffa015fa25>] btrfs_update_time+0x65/0xc0 [btrfs]
> [8642927.599910]  [<ffffffff812286d3>] touch_atime+0x123/0x170
> [8642927.666615]  [<ffffffff81221929>] iterate_dir+0xf9/0x140
> [8642927.732284]  [<ffffffff81221a19>] SyS_getdents64+0xa9/0x100
> [8642927.801072]  [<ffffffff81221720>] ? fillonedir+0xe0/0xe0
> [8642927.866741]  [<ffffffff816f2594>] system_call_fastpath+0x12/0xce
> [8642927.940843] ---[ end trace 6782707b09f9f874 ]---
> [8642927.998378] ------------[ cut here ]------------
>
>
>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: btrfs panic problem
  2018-09-19 18:36 ` Liu Bo
@ 2018-09-25  8:20   ` sunny.s.zhang
  2018-09-25  8:31     ` Nikolay Borisov
  0 siblings, 1 reply; 14+ messages in thread
From: sunny.s.zhang @ 2018-09-25  8:20 UTC (permalink / raw)
  To: Liu Bo; +Cc: linux-btrfs


On 2018-09-20 02:36, Liu Bo wrote:
> On Mon, Sep 17, 2018 at 5:28 PM, sunny.s.zhang <sunny.s.zhang@oracle.com> wrote:
>> Hi All,
>>
>> My OS(4.1.12) panic in kmem_cache_alloc, which is called by
>> btrfs_get_or_create_delayed_node.
>>
>> I found that the freelist of the slub is wrong.
>>
>> crash> struct kmem_cache_cpu ffff887e7d7a24b0
>>
>> struct kmem_cache_cpu {
>>    freelist = 0x2026,   <<< the value is id of one inode
>>    tid = 29567861,
>>    page = 0xffffea0132168d00,
>>    partial = 0x0
>> }
>>
>> And, I found there are two different btrfs inodes pointing delayed_node. It
>> means that the same slub is used twice.
>>
>> I think this slub is freed twice, and then the next pointer of this slub
>> point itself. So we get the same slub twice.
>>
>> When use this slub again, that break the freelist.
>>
>> Folloing code will make the delayed node being freed twice. But I don't
>> found what is the process.
>>
>> Process A (btrfs_evict_inode) Process B
>>
>> call btrfs_remove_delayed_node call  btrfs_get_delayed_node
>>
>> node = ACCESS_ONCE(btrfs_inode->delayed_node);
>>
>> BTRFS_I(inode)->delayed_node = NULL;
>> btrfs_release_delayed_node(delayed_node);
>>
>> if (node) {
>> atomic_inc(&node->refs);
>> return node;
>> }
>>
>> ......
>>
>> btrfs_release_delayed_node(delayed_node);
>>
> Looking at the race, it seems the following commit has addressed it.
>
> btrfs: fix refcount_t usage when deleting btrfs_delayed_nodes
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ec35e48b286959991cdbb886f1bdeda4575c80b4
>
> thanks,
> liubo

I don't think so.
That patch fixes the race on the radix_tree_lookup() path. I don't think
it can fix my problem, where the race occurs after
ACCESS_ONCE(btrfs_inode->delayed_node).
If ACCESS_ONCE(btrfs_inode->delayed_node) returns the node, then
btrfs_get_delayed_node() returns immediately and never reaches that path.
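
To spell out the interleaving I mean on the unprotected fast path (a sketch
of the suspected ordering, not new evidence from the vmcore):

        task B: btrfs_get_delayed_node()
                node = ACCESS_ONCE(btrfs_inode->delayed_node);  /* non-NULL */
        task A: btrfs_remove_delayed_node()
                BTRFS_I(inode)->delayed_node = NULL;
                btrfs_release_delayed_node(node);   /* last ref dropped, node freed */
        task B: atomic_inc(&node->refs);            /* use-after-free */
                ...
                btrfs_release_delayed_node(node);   /* node freed a second time */

Task B returns from the fast path without ever reaching the radix-tree lookup
that the commit above protects, so that fix does not close this window.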

Thanks,
Sunny

>
>> 1313 void btrfs_remove_delayed_node(struct inode *inode)
>> 1314 {
>> 1315         struct btrfs_delayed_node *delayed_node;
>> 1316
>> 1317         delayed_node = ACCESS_ONCE(BTRFS_I(inode)->delayed_node);
>> 1318         if (!delayed_node)
>> 1319                 return;
>> 1320
>> 1321         BTRFS_I(inode)->delayed_node = NULL;
>> 1322         btrfs_release_delayed_node(delayed_node);
>> 1323 }
>>
>>
>>    87 static struct btrfs_delayed_node *btrfs_get_delayed_node(struct inode
>> *inode)
>>    88 {
>>    89         struct btrfs_inode *btrfs_inode = BTRFS_I(inode);
>>    90         struct btrfs_root *root = btrfs_inode->root;
>>    91         u64 ino = btrfs_ino(inode);
>>    92         struct btrfs_delayed_node *node;
>>    93
>>    94         node = ACCESS_ONCE(btrfs_inode->delayed_node);
>>    95         if (node) {
>>    96                 atomic_inc(&node->refs);
>>    97                 return node;
>>    98         }
>>
>>
>> Thanks,
>>
>> Sunny
>>
>>
>> PS:
>>
>> ------------------------------------------------
>>
>> panic informations
>>
>> PID: 73638  TASK: ffff887deb586200  CPU: 38  COMMAND: "dockerd"
>>   #0 [ffff88130404f940] machine_kexec at ffffffff8105ec10
>>   #1 [ffff88130404f9b0] crash_kexec at ffffffff811145b8
>>   #2 [ffff88130404fa80] oops_end at ffffffff8101a868
>>   #3 [ffff88130404fab0] no_context at ffffffff8106ea91
>>   #4 [ffff88130404fb00] __bad_area_nosemaphore at ffffffff8106ec8d
>>   #5 [ffff88130404fb50] bad_area_nosemaphore at ffffffff8106eda3
>>   #6 [ffff88130404fb60] __do_page_fault at ffffffff8106f328
>>   #7 [ffff88130404fbd0] do_page_fault at ffffffff8106f637
>>   #8 [ffff88130404fc10] page_fault at ffffffff816f6308
>>      [exception RIP: kmem_cache_alloc+121]
>>      RIP: ffffffff811ef019  RSP: ffff88130404fcc8  RFLAGS: 00010286
>>      RAX: 0000000000000000  RBX: 0000000000000000  RCX: 0000000001c32b76
>>      RDX: 0000000001c32b75  RSI: 0000000000000000  RDI: 00000000000224b0
>>      RBP: ffff88130404fd08   R8: ffff887e7d7a24b0   R9: 0000000000000000
>>      R10: ffff8802668b6618  R11: 0000000000000002  R12: ffff887e3e230a00
>>      R13: 0000000000002026  R14: ffff887e3e230a00  R15: ffffffffa01abf49
>>      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>>   #9 [ffff88130404fd10] btrfs_get_or_create_delayed_node at ffffffffa01abf49
>> [btrfs]
>> #10 [ffff88130404fd60] btrfs_delayed_update_inode at ffffffffa01aea12
>> [btrfs]
>> #11 [ffff88130404fdb0] btrfs_update_inode at ffffffffa015b199 [btrfs]
>> #12 [ffff88130404fdf0] btrfs_dirty_inode at ffffffffa015cd11 [btrfs]
>> #13 [ffff88130404fe20] btrfs_update_time at ffffffffa015fa25 [btrfs]
>> #14 [ffff88130404fe50] touch_atime at ffffffff812286d3
>> #15 [ffff88130404fe90] iterate_dir at ffffffff81221929
>> #16 [ffff88130404fee0] sys_getdents64 at ffffffff81221a19
>> #17 [ffff88130404ff50] system_call_fastpath at ffffffff816f2594
>>      RIP: 00000000006b68e4  RSP: 000000c866259080  RFLAGS: 00000246
>>      RAX: ffffffffffffffda  RBX: 000000c828dbbe00  RCX: 00000000006b68e4
>>      RDX: 0000000000001000  RSI: 000000c83da14000  RDI: 0000000000000011
>>      RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000000
>>      R10: 0000000000000000  R11: 0000000000000246  R12: 00000000000000c7
>>      R13: 0000000002174e74  R14: 0000000005555555  R15: 0000000000000038
>>      ORIG_RAX: 00000000000000d9  CS: 0033  SS: 002b
>>
>>
>> We also find the list double add informations, including n_list and p_list:
>>
>> [8642921.110568] ------------[ cut here ]------------
>> [8642921.167929] WARNING: CPU: 38 PID: 73638 at lib/list_debug.c:33
>> __list_add+0xbe/0xd0()
>> [8642921.263780] list_add corruption. prev->next should be next
>> (ffff887e40fa5368), but was ff:ff884c85a36288. (prev=ffff884c85a36288).
>> [8642921.405490] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4
>> xt_conntrack iptable_filter arc4 ecb ppp_mppe ppp_async crc_ccitt
>> ppp_generic slhc nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs fscache
>> lockd sunrpc grace veth xt_nat xt_addrtype br_netfilter bridge tcp_diag
>> inet_diag oracleacfs(POE) oracleadvm(POE) oracleoks(POE) oracleasm autofs4
>> dm_queue_length cpufreq_powersave be2iscsi iscsi_boot_sysfs bnx2i cnic uio
>> cxgb4i iw_cxgb4 cxgb4 cxgb3i libcxgbi iw_cxgb3 cxgb3 ib_iser rdma_cm ib_cm
>> iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi
>> scsi_transport_iscsi 8021q mrp garp stp llc bonding iptable_nat nf_nat_ipv4
>> nf_nat ipt_REJECT nf_reject_ipv4 xt_pkttype nf_conntrack_ipv4 nf_defrag_ipv4
>> ip_tables ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6
>> xt_state
>> [8642922.259780]  nf_conntrack ip6table_filter ip6_tables ipv6 dm_multipath
>> sg iTCO_wdt iTCO_vendor_support ipmi_devintf ipmi_si ipmi_msghandler
>> intel_powerclamp coretemp kvm_intel kvm pcspkr ixgbe dca ptp pps_core vxlan
>> udp_tunnel ip6_udp_tunnel mdio sb_edac edac_core i2c_i801 i2c_core lpc_ich
>> mfd_core shpchp ext4 jbd2 mbcache2 btrfs raid6_pq xor sd_mod megaraid_sas
>> ghash_clmulni_intel crc32_pclmul crc32c_intel aesni_intel ablk_helper cryptd
>> lrw gf128mul glue_helper aes_x86_64 ahci libahci mxm_wmi wmi dm_mirror
>> dm_region_hash dm_log dm_mod [last unloaded: xt_conntrack]
>> [8642922.850285] CPU: 38 PID: 73638 Comm: dockerd Tainted: P           OE
>> 4.1.12-112.14.13.el6uek.x86_64 #2
>> [8642922.965881] Hardware name: Oracle Corporation ORACLE SERVER
>> X6-2/ASM,MOTHERBOARD,1U, BIOS 38040100 06/06/2016
>> [8642923.086670]  0000000000000000 ffff88130404fc28 ffffffff816ec91e
>> ffff88130404fc78
>> [8642923.177730]  0000000000000021 ffff88130404fc68 ffffffff81088925
>> 00ff88130404fc98
>> [8642923.268793]  ffff884c85a36288 ffff887e40fa5368 ffff884c85a36288
>> 0000000000000000
>> [8642923.359851] Call Trace:
>> [8642923.391194]  [<ffffffff816ec91e>] dump_stack+0x63/0x85
>> [8642923.454780]  [<ffffffff81088925>] warn_slowpath_common+0x95/0xe0
>> [8642923.528769]  [<ffffffff81088a26>] warn_slowpath_fmt+0x46/0x50
>> [8642923.599635]  [<ffffffff81343f3e>] __list_add+0xbe/0xd0
>> [8642923.663333]  [<ffffffffa01abb92>] btrfs_queue_delayed_node+0xb2/0xe0
>> [btrfs]
>> [8642923.749898]  [<ffffffffa01abc66>]
>> __btrfs_release_delayed_node+0xa6/0x160 [btrfs]
>> [8642923.841661]  [<ffffffffa01aea5e>] btrfs_delayed_update_inode+0x7e/0x100
>> [btrfs]
>> [8642923.931336]  [<ffffffffa015b199>] btrfs_update_inode+0xa9/0x130 [btrfs]
>> [8642924.012686]  [<ffffffffa015cd11>] btrfs_dirty_inode+0x71/0xe0 [btrfs]
>> [8642924.091954]  [<ffffffffa015fa25>] btrfs_update_time+0x65/0xc0 [btrfs]
>> [8642924.171145]  [<ffffffff812286d3>] touch_atime+0x123/0x170
>> [8642924.237849]  [<ffffffff81221929>] iterate_dir+0xf9/0x140
>> [8642924.303512]  [<ffffffff81221a19>] SyS_getdents64+0xa9/0x100
>> [8642924.372296]  [<ffffffff81221720>] ? fillonedir+0xe0/0xe0
>> [8642924.437963]  [<ffffffff816f2594>] system_call_fastpath+0x12/0xce
>> [8642924.512201] ---[ end trace 6782707b09f9f873 ]---
>> [8642924.569677] ------------[ cut here ]------------
>> [8642924.627055] WARNING: CPU: 38 PID: 73638 at lib/list_debug.c:36
>> __list_add+0x9c/0xd0()
>> [8642924.722925] list_add double add: new=ffff884c85a36288,
>> prev=ffff884c85a36288, next=ffff887e40fa5368.
>> [8642924.834373] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4
>> xt_conntrack iptable_filter arc4 ecb ppp_mppe ppp_async crc_ccitt
>> ppp_generic slhc nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs fscache
>> lockd sunrpc grace veth xt_nat xt_addrtype br_netfilter bridge tcp_diag
>> inet_diag oracleacfs(POE) oracleadvm(POE) oracleoks(POE) oracleasm autofs4
>> dm_queue_length cpufreq_powersave be2iscsi iscsi_boot_sysfs bnx2i cnic uio
>> cxgb4i iw_cxgb4 cxgb4 cxgb3i libcxgbi iw_cxgb3 cxgb3 ib_iser rdma_cm ib_cm
>> iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi
>> scsi_transport_iscsi 8021q mrp garp stp llc bonding iptable_nat nf_nat_ipv4
>> nf_nat ipt_REJECT nf_reject_ipv4 xt_pkttype nf_conntrack_ipv4 nf_defrag_ipv4
>> ip_tables ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6
>> xt_state
>> [8642925.688498]  nf_conntrack ip6table_filter ip6_tables ipv6 dm_multipath
>> sg iTCO_wdt iTCO_vendor_support ipmi_devintf ipmi_si ipmi_msghandler
>> intel_powerclamp coretemp kvm_intel kvm pcspkr ixgbe dca ptp pps_core vxlan
>> udp_tunnel ip6_udp_tunnel mdio sb_edac edac_core i2c_i801 i2c_core lpc_ich
>> mfd_core shpchp ext4 jbd2 mbcache2 btrfs raid6_pq xor sd_mod megaraid_sas
>> ghash_clmulni_intel crc32_pclmul crc32c_intel aesni_intel ablk_helper cryptd
>> lrw gf128mul glue_helper aes_x86_64 ahci libahci mxm_wmi wmi dm_mirror
>> dm_region_hash dm_log dm_mod [last unloaded: xt_conntrack]
>> [8642926.279029] CPU: 38 PID: 73638 Comm: dockerd Tainted: P        W  OE
>> 4.1.12-112.14.13.el6uek.x86_64 #2
>> [8642926.394626] Hardware name: Oracle Corporation ORACLE SERVER
>> X6-2/ASM,MOTHERBOARD,1U, BIOS 38040100 06/06/2016
>> [8642926.515421]  0000000000000000 ffff88130404fc28 ffffffff816ec91e
>> ffff88130404fc78
>> [8642926.606485]  0000000000000024 ffff88130404fc68 ffffffff81088925
>> 00ff88130404fc98
>> [8642926.697544]  ffff884c85a36288 ffff887e40fa5368 ffff884c85a36288
>> 0000000000000000
>> [8642926.788605] Call Trace:
>> [8642926.819945]  [<ffffffff816ec91e>] dump_stack+0x63/0x85
>> [8642926.883535]  [<ffffffff81088925>] warn_slowpath_common+0x95/0xe0
>> [8642926.957529]  [<ffffffff81088a26>] warn_slowpath_fmt+0x46/0x50
>> [8642927.028400]  [<ffffffff81343f1c>] __list_add+0x9c/0xd0
>> [8642927.092116]  [<ffffffffa01abb92>] btrfs_queue_delayed_node+0xb2/0xe0
>> [btrfs]
>> [8642927.178669]  [<ffffffffa01abc66>]
>> __btrfs_release_delayed_node+0xa6/0x160 [btrfs]
>> [8642927.270440]  [<ffffffffa01aea5e>] btrfs_delayed_update_inode+0x7e/0x100
>> [btrfs]
>> [8642927.360112]  [<ffffffffa015b199>] btrfs_update_inode+0xa9/0x130 [btrfs]
>> [8642927.441447]  [<ffffffffa015cd11>] btrfs_dirty_inode+0x71/0xe0 [btrfs]
>> [8642927.520716]  [<ffffffffa015fa25>] btrfs_update_time+0x65/0xc0 [btrfs]
>> [8642927.599910]  [<ffffffff812286d3>] touch_atime+0x123/0x170
>> [8642927.666615]  [<ffffffff81221929>] iterate_dir+0xf9/0x140
>> [8642927.732284]  [<ffffffff81221a19>] SyS_getdents64+0xa9/0x100
>> [8642927.801072]  [<ffffffff81221720>] ? fillonedir+0xe0/0xe0
>> [8642927.866741]  [<ffffffff816f2594>] system_call_fastpath+0x12/0xce
>> [8642927.940843] ---[ end trace 6782707b09f9f874 ]---
>> [8642927.998378] ------------[ cut here ]------------
>>
>>
>>
>>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: btrfs panic problem
  2018-09-19 16:12     ` Nikolay Borisov
@ 2018-09-25  8:29       ` sunny.s.zhang
  0 siblings, 0 replies; 14+ messages in thread
From: sunny.s.zhang @ 2018-09-25  8:29 UTC (permalink / raw)
  To: Nikolay Borisov, Duncan, linux-btrfs


On 2018/09/20 00:12, Nikolay Borisov wrote:
> On 19.09.2018 02:53, sunny.s.zhang wrote:
>> Hi Duncan,
>>
>> Thank you for your advice. I understand what you mean. But I have
>> reviewed the latest btrfs code, and I think the issue still exists.
>>
>> At line 71, if btrfs_get_delayed_node has run past this line and we
>> then switch to another process, that process can run past line 1282
>> and release the delayed node at the end.
>>
>> When we switch back to btrfs_get_delayed_node, it finds that the node
>> is not NULL and uses it as normal. That means we are using freed memory.
>>
>> At some point, this memory will be freed again.
>>
>> The latest code is below.
>>
>> 1278 void btrfs_remove_delayed_node(struct btrfs_inode *inode)
>> 1279 {
>> 1280         struct btrfs_delayed_node *delayed_node;
>> 1281
>> 1282         delayed_node = READ_ONCE(inode->delayed_node);
>> 1283         if (!delayed_node)
>> 1284                 return;
>> 1285
>> 1286         inode->delayed_node = NULL;
>> 1287         btrfs_release_delayed_node(delayed_node);
>> 1288 }
>>
>>
>>    64 static struct btrfs_delayed_node *btrfs_get_delayed_node(
>>    65                 struct btrfs_inode *btrfs_inode)
>>    66 {
>>    67         struct btrfs_root *root = btrfs_inode->root;
>>    68         u64 ino = btrfs_ino(btrfs_inode);
>>    69         struct btrfs_delayed_node *node;
>>    70
>>    71         node = READ_ONCE(btrfs_inode->delayed_node);
>>    72         if (node) {
>>    73                 refcount_inc(&node->refs);
>>    74                 return node;
>>    75         }
>>    76
>>    77         spin_lock(&root->inode_lock);
>>    78         node = radix_tree_lookup(&root->delayed_nodes_tree, ino);
>>
>>
> Your analysis is correct, however it's missing one crucial point -
> btrfs_remove_delayed_node is called only from btrfs_evict_inode. And
> inodes are evicted when all other references have been dropped. Check
> the code in evict_inodes() - inodes are added to the dispose list when
> their i_count is 0 at which point there should be no references in this
> inode. This invalidates your analysis...
Thanks.
Yes, I know this, and I know that another process cannot use this inode
if the inode is in the I_FREEING state.
But Chris has fixed a similar bug that was found in production, which
means this can occur under some conditions:

btrfs: fix refcount_t usage when deleting btrfs_delayed_nodes
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ec35e48b286959991cdbb886f1bdeda4575c80b4
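
To show which path I mean: as far as I can tell, that commit only hardens
the radix-tree lookup taken under root->inode_lock, while the lockless
fast path at line 71 above is unchanged. Purely as a sketch of the kind
of extra change that would be needed if my window is real (this is not
an existing patch, and refcount_inc_not_zero() on memory that may already
have been freed is of course not safe by itself), the fast path would
have to stop resurrecting a node whose last reference is gone:

        node = READ_ONCE(btrfs_inode->delayed_node);
        if (node) {
                /*
                 * Hypothetical: only take a reference if at least one
                 * is still held; otherwise fall back to the locked
                 * radix-tree lookup below.
                 */
                if (refcount_inc_not_zero(&node->refs))
                        return node;
        }

        spin_lock(&root->inode_lock);
        node = radix_tree_lookup(&root->delayed_nodes_tree, ino);
        /* ... rest of btrfs_get_delayed_node() unchanged ... */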

>> On 2018/09/18 13:05, Duncan wrote:
>>> sunny.s.zhang posted on Tue, 18 Sep 2018 08:28:14 +0800 as excerpted:
>>>
>>>> My OS(4.1.12) panic in kmem_cache_alloc, which is called by
>>>> btrfs_get_or_create_delayed_node.
>>>>
>>>> I found that the freelist of the slub is wrong.
>>> [Not a dev, just a btrfs list regular and user, myself.  But here's a
>>> general btrfs list recommendations reply...]
>>>
>>> You appear to mean kernel 4.1.12 -- confirmed by the version reported in
>>> the posted dump:  4.1.12-112.14.13.el6uek.x86_64
>>>
>>> OK, so from the perspective of this forward-development-focused list,
>>> kernel 4.1 is pretty ancient history, but you do have a number of
>>> options.
>>>
>>> First let's consider the general situation.  Most people choose an
>>> enterprise distro for supported stability, and that's certainly a valid
>>> thing to want.  However, btrfs, while now reaching early maturity for the
>>> basics (single device in single or dup mode, and multi-device in single/
>>> raid0/1/10 modes, note that raid56 mode is newer and less mature),
>>> remains under quite heavy development, and keeping reasonably current is
>>> recommended for that reason.
>>>
>>> So you chose an enterprise distro presumably to lock in supported
>>> stability for several years, but you chose a filesystem, btrfs, that's
>>> still under heavy development, with reasonably current kernels and
>>> userspace recommended as tending to have the known bugs fixed.  There's a
>>> bit of a conflict there, and the /general/ recommendation would thus be
>>> to consider whether one or the other of those choices are inappropriate
>>> for your use-case, because it's really quite likely that if you really
>>> want the stability of an enterprise distro and kernel, that btrfs isn't
>>> as stable a filesystem as you're likely to want to match with it.
>>> Alternatively, if you want something newer to match the still under heavy
>>> development btrfs, you very likely want a distro that's not focused on
>>> years-old stability just for the sake of it.  One or the other is likely
>>> to be a poor match for your needs, and choosing something else that's a
>>> better match is likely to be a much better experience for you.
>>>
>>> But perhaps you do have reason to want to run the newer and not quite to
>>> traditional enterprise-distro level stability btrfs, on an otherwise
>>> older and very stable enterprise distro.  That's fine, provided you know
>>> what you're getting yourself into, and are prepared to deal with it.
>>>
>>> In that case, for best support from the list, we'd recommend running one
>>> of the latest two kernels in either the current or mainline LTS tracks.
>>>
>>> For the current track, with 4.18 being the latest kernel, that'd be 4.18 or
>>> 4.17, as available on kernel.org (tho 4.17 is already EOL, no further
>>> releases, at 4.17.19).
>>>
>>> For mainline-LTS track, 4.14 and 4.9 are the latest two LTS series
>>> kernels, tho IIRC 4.19 is scheduled to be this year's LTS (or was it 4.18
>>> and it's just not out of normal stable range yet so not yet marked LTS?),
>>> so it'll be coming up soon and 4.9 will then be dropping to third LTS
>>> series and thus out of our best recommended range.  4.4 was the previous
>>> LTS and while still in LTS support, is outside the two newest LTS series
>>> that this list recommends.
>>>
>>> And of course 4.1 is older than 4.4, so as I said, in btrfs development
>>> terms, it's quite ancient indeed... quite out of practical support range
>>> here, tho of course we'll still try, but in many cases the first question
>>> when any problem's reported is going to be whether it's reproducible on
>>> something closer to current.
>>>
>>> But... you ARE on an enterprise kernel, likely on an enterprise distro,
>>> and very possibly actually paying /them/ for support.  So you're not
>>> without options if you prefer to stay with your supported enterprise
>>> kernel.  If you're paying them for support, you might as well use it, and
>>> of course of the very many fixes since 4.1, they know what they've
>>> backported and what they haven't, so they're far better placed to provide
>>> that support in any case.
>>>
>>> Or, given what you posted, you appear to be reasonably able to do at
>>> least limited kernel-dev-level analysis yourself.  Given that, you're
>>> already reasonably well placed to simply decide to stick with what you
>>> have and take the support you can get, diving into things yourself if
>>> necessary.
>>>
>>>
>>> So those are your kernel options.  What about userspace btrfs-progs?
>>>
>>> Generally speaking, while the filesystem's running, it's the kernel code
>>> doing most of the work.  If you have old userspace, it simply means you
>>> can't take advantage of some of the newer features as the old userspace
>>> doesn't know how to call for them.
>>>
>>> But the situation changes as soon as you have problems and can't mount,
>>> because it's userspace code that runs to try to fix that sort of problem,
>>> or failing that, it's userspace code that btrfs restore runs to try to
>>> grab what files can be grabbed off of the unmountable filesystem.
>>>
>>> So for routine operation, it's no big deal if userspace is a bit old, at
>>> least as long as it's new enough to have all the newer command formats,
>>> etc, that you need, and for comparing against others when posted.  But
>>> once things go bad on you, you really want the newest btrfs-progs in
>>> ordered to give you the best chance at either fixing things, or worst-
>>> case, at least retrieving the files off the dead filesystem.  So using
>>> the older distro btrfs-progs for routine running should be fine, but
>>> unless your backups are complete and frequent enough that if something
>>> goes wrong it's easiest to simply blow the bad version away with a fresh
>>> mkfs and start over, you'll probably want at least a reasonably current
>>> btrfs-progs on your rescue media at least.  Since the userspace version
>>> numbers are synced to the kernel cycle, a good rule of thumb is keep your
>>> btrfs-progs version to at least that of the oldest recommended LTS kernel
>>> version, as well, so you'd want at least btrfs-progs 4.9 on your rescue
>>> media, for now, and 4.14, coming up, since when the new kernel goes LTS
>>> that'll displace 4.9 and 4.14 will then be the second-back LTS.
>>>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: btrfs panic problem
  2018-09-25  8:20   ` sunny.s.zhang
@ 2018-09-25  8:31     ` Nikolay Borisov
  2018-09-25  8:42       ` sunny.s.zhang
  0 siblings, 1 reply; 14+ messages in thread
From: Nikolay Borisov @ 2018-09-25  8:31 UTC (permalink / raw)
  To: sunny.s.zhang, Liu Bo; +Cc: linux-btrfs



On 25.09.2018 11:20, sunny.s.zhang wrote:
> 
> On 2018/09/20 02:36, Liu Bo wrote:
>> On Mon, Sep 17, 2018 at 5:28 PM, sunny.s.zhang
>> <sunny.s.zhang@oracle.com> wrote:
>>> Hi All,
>>>
>>> My OS(4.1.12) panic in kmem_cache_alloc, which is called by
>>> btrfs_get_or_create_delayed_node.
>>>
>>> I found that the freelist of the slub is wrong.
>>>
>>> crash> struct kmem_cache_cpu ffff887e7d7a24b0
>>>
>>> struct kmem_cache_cpu {
>>>    freelist = 0x2026,   <<< the value is id of one inode
>>>    tid = 29567861,
>>>    page = 0xffffea0132168d00,
>>>    partial = 0x0
>>> }
>>>
>>> And, I found there are two different btrfs inodes pointing
>>> delayed_node. It
>>> means that the same slub is used twice.
>>>
>>> I think this slub is freed twice, and then the next pointer of this slub
>>> point itself. So we get the same slub twice.
>>>
>>> When use this slub again, that break the freelist.
>>>
>>> Folloing code will make the delayed node being freed twice. But I don't
>>> found what is the process.
>>>
>>> Process A (btrfs_evict_inode) Process B
>>>
>>> call btrfs_remove_delayed_node call  btrfs_get_delayed_node
>>>
>>> node = ACCESS_ONCE(btrfs_inode->delayed_node);
>>>
>>> BTRFS_I(inode)->delayed_node = NULL;
>>> btrfs_release_delayed_node(delayed_node);
>>>
>>> if (node) {
>>> atomic_inc(&node->refs);
>>> return node;
>>> }
>>>
>>> ......
>>>
>>> btrfs_release_delayed_node(delayed_node);
>>>
>> By looking at the race, it seems the following commit has addressed it.
>>
>> btrfs: fix refcount_t usage when deleting btrfs_delayed_nodes
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ec35e48b286959991cdbb886f1bdeda4575c80b4
>>
>>
>> thanks,
>> liubo
> 
> I don't think so.
> This patch resolves the problem in the radix_tree_lookup path. I don't
> think it can resolve my problem, where the race occurs after
> ACCESS_ONCE(btrfs_inode->delayed_node).
> Because if ACCESS_ONCE(btrfs_inode->delayed_node) returns the node, then
> btrfs_get_delayed_node returns immediately and does not continue.

Can you reproduce the problem on an upstream kernel with added delays?
The original report is from some RHEL-based distro (presumably oracle
unbreakable linux) so there is no indication currently that this is a
genuine problem in upstream kernels.
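
For example (just a sketch of the kind of debug-only instrumentation I
mean, not a real patch), artificially widening the suspected window in
btrfs_get_delayed_node() should make the race trivial to hit if it is
real:

        /* debug only; needs #include <linux/delay.h> for mdelay() */
        node = READ_ONCE(btrfs_inode->delayed_node);
        if (node) {
                /*
                 * Give a concurrent eviction time to drop the last
                 * reference and free the node before we take ours.
                 */
                mdelay(100);
                refcount_inc(&node->refs);
                return node;
        }

If the double free really can happen once the inode is under eviction,
a workload like the one in your stack traces (getdents64 with atime
updates) plus forced inode eviction should then trigger it quickly.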

> 
> Thanks,
> Sunny
> 
>>
>>> 1313 void btrfs_remove_delayed_node(struct inode *inode)
>>> 1314 {
>>> 1315         struct btrfs_delayed_node *delayed_node;
>>> 1316
>>> 1317         delayed_node = ACCESS_ONCE(BTRFS_I(inode)->delayed_node);
>>> 1318         if (!delayed_node)
>>> 1319                 return;
>>> 1320
>>> 1321         BTRFS_I(inode)->delayed_node = NULL;
>>> 1322         btrfs_release_delayed_node(delayed_node);
>>> 1323 }
>>>
>>>
>>>    87 static struct btrfs_delayed_node *btrfs_get_delayed_node(struct
>>> inode
>>> *inode)
>>>    88 {
>>>    89         struct btrfs_inode *btrfs_inode = BTRFS_I(inode);
>>>    90         struct btrfs_root *root = btrfs_inode->root;
>>>    91         u64 ino = btrfs_ino(inode);
>>>    92         struct btrfs_delayed_node *node;
>>>    93
>>>    94         node = ACCESS_ONCE(btrfs_inode->delayed_node);
>>>    95         if (node) {
>>>    96                 atomic_inc(&node->refs);
>>>    97                 return node;
>>>    98         }
>>>
>>>
>>> Thanks,
>>>
>>> Sunny
>>>
>>>
>>> PS:
>>>
>>> ------------------------------------------------
>>>
>>> panic informations
>>>
>>> PID: 73638  TASK: ffff887deb586200  CPU: 38  COMMAND: "dockerd"
>>>   #0 [ffff88130404f940] machine_kexec at ffffffff8105ec10
>>>   #1 [ffff88130404f9b0] crash_kexec at ffffffff811145b8
>>>   #2 [ffff88130404fa80] oops_end at ffffffff8101a868
>>>   #3 [ffff88130404fab0] no_context at ffffffff8106ea91
>>>   #4 [ffff88130404fb00] __bad_area_nosemaphore at ffffffff8106ec8d
>>>   #5 [ffff88130404fb50] bad_area_nosemaphore at ffffffff8106eda3
>>>   #6 [ffff88130404fb60] __do_page_fault at ffffffff8106f328
>>>   #7 [ffff88130404fbd0] do_page_fault at ffffffff8106f637
>>>   #8 [ffff88130404fc10] page_fault at ffffffff816f6308
>>>      [exception RIP: kmem_cache_alloc+121]
>>>      RIP: ffffffff811ef019  RSP: ffff88130404fcc8  RFLAGS: 00010286
>>>      RAX: 0000000000000000  RBX: 0000000000000000  RCX: 0000000001c32b76
>>>      RDX: 0000000001c32b75  RSI: 0000000000000000  RDI: 00000000000224b0
>>>      RBP: ffff88130404fd08   R8: ffff887e7d7a24b0   R9: 0000000000000000
>>>      R10: ffff8802668b6618  R11: 0000000000000002  R12: ffff887e3e230a00
>>>      R13: 0000000000002026  R14: ffff887e3e230a00  R15: ffffffffa01abf49
>>>      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>>>   #9 [ffff88130404fd10] btrfs_get_or_create_delayed_node at
>>> ffffffffa01abf49
>>> [btrfs]
>>> #10 [ffff88130404fd60] btrfs_delayed_update_inode at ffffffffa01aea12
>>> [btrfs]
>>> #11 [ffff88130404fdb0] btrfs_update_inode at ffffffffa015b199 [btrfs]
>>> #12 [ffff88130404fdf0] btrfs_dirty_inode at ffffffffa015cd11 [btrfs]
>>> #13 [ffff88130404fe20] btrfs_update_time at ffffffffa015fa25 [btrfs]
>>> #14 [ffff88130404fe50] touch_atime at ffffffff812286d3
>>> #15 [ffff88130404fe90] iterate_dir at ffffffff81221929
>>> #16 [ffff88130404fee0] sys_getdents64 at ffffffff81221a19
>>> #17 [ffff88130404ff50] system_call_fastpath at ffffffff816f2594
>>>      RIP: 00000000006b68e4  RSP: 000000c866259080  RFLAGS: 00000246
>>>      RAX: ffffffffffffffda  RBX: 000000c828dbbe00  RCX: 00000000006b68e4
>>>      RDX: 0000000000001000  RSI: 000000c83da14000  RDI: 0000000000000011
>>>      RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000000
>>>      R10: 0000000000000000  R11: 0000000000000246  R12: 00000000000000c7
>>>      R13: 0000000002174e74  R14: 0000000005555555  R15: 0000000000000038
>>>      ORIG_RAX: 00000000000000d9  CS: 0033  SS: 002b
>>>
>>>
>>> We also find the list double add informations, including n_list and
>>> p_list:
>>>
>>> [8642921.110568] ------------[ cut here ]------------
>>> [8642921.167929] WARNING: CPU: 38 PID: 73638 at lib/list_debug.c:33
>>> __list_add+0xbe/0xd0()
>>> [8642921.263780] list_add corruption. prev->next should be next
>>> (ffff887e40fa5368), but was ffff884c85a36288. (prev=ffff884c85a36288).
>>> [8642921.405490] Modules linked in: ipt_MASQUERADE
>>> nf_nat_masquerade_ipv4
>>> xt_conntrack iptable_filter arc4 ecb ppp_mppe ppp_async crc_ccitt
>>> ppp_generic slhc nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs
>>> fscache
>>> lockd sunrpc grace veth xt_nat xt_addrtype br_netfilter bridge tcp_diag
>>> inet_diag oracleacfs(POE) oracleadvm(POE) oracleoks(POE) oracleasm
>>> autofs4
>>> dm_queue_length cpufreq_powersave be2iscsi iscsi_boot_sysfs bnx2i
>>> cnic uio
>>> cxgb4i iw_cxgb4 cxgb4 cxgb3i libcxgbi iw_cxgb3 cxgb3 ib_iser rdma_cm
>>> ib_cm
>>> iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi
>>> scsi_transport_iscsi 8021q mrp garp stp llc bonding iptable_nat
>>> nf_nat_ipv4
>>> nf_nat ipt_REJECT nf_reject_ipv4 xt_pkttype nf_conntrack_ipv4
>>> nf_defrag_ipv4
>>> ip_tables ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6
>>> xt_state
>>> [8642922.259780]  nf_conntrack ip6table_filter ip6_tables ipv6
>>> dm_multipath
>>> sg iTCO_wdt iTCO_vendor_support ipmi_devintf ipmi_si ipmi_msghandler
>>> intel_powerclamp coretemp kvm_intel kvm pcspkr ixgbe dca ptp pps_core
>>> vxlan
>>> udp_tunnel ip6_udp_tunnel mdio sb_edac edac_core i2c_i801 i2c_core
>>> lpc_ich
>>> mfd_core shpchp ext4 jbd2 mbcache2 btrfs raid6_pq xor sd_mod
>>> megaraid_sas
>>> ghash_clmulni_intel crc32_pclmul crc32c_intel aesni_intel ablk_helper
>>> cryptd
>>> lrw gf128mul glue_helper aes_x86_64 ahci libahci mxm_wmi wmi dm_mirror
>>> dm_region_hash dm_log dm_mod [last unloaded: xt_conntrack]
>>> [8642922.850285] CPU: 38 PID: 73638 Comm: dockerd Tainted:
>>> P           OE
>>> 4.1.12-112.14.13.el6uek.x86_64 #2
>>> [8642922.965881] Hardware name: Oracle Corporation ORACLE SERVER
>>> X6-2/ASM,MOTHERBOARD,1U, BIOS 38040100 06/06/2016
>>> [8642923.086670]  0000000000000000 ffff88130404fc28 ffffffff816ec91e
>>> ffff88130404fc78
>>> [8642923.177730]  0000000000000021 ffff88130404fc68 ffffffff81088925
>>> 00ff88130404fc98
>>> [8642923.268793]  ffff884c85a36288 ffff887e40fa5368 ffff884c85a36288
>>> 0000000000000000
>>> [8642923.359851] Call Trace:
>>> [8642923.391194]  [<ffffffff816ec91e>] dump_stack+0x63/0x85
>>> [8642923.454780]  [<ffffffff81088925>] warn_slowpath_common+0x95/0xe0
>>> [8642923.528769]  [<ffffffff81088a26>] warn_slowpath_fmt+0x46/0x50
>>> [8642923.599635]  [<ffffffff81343f3e>] __list_add+0xbe/0xd0
>>> [8642923.663333]  [<ffffffffa01abb92>]
>>> btrfs_queue_delayed_node+0xb2/0xe0
>>> [btrfs]
>>> [8642923.749898]  [<ffffffffa01abc66>]
>>> __btrfs_release_delayed_node+0xa6/0x160 [btrfs]
>>> [8642923.841661]  [<ffffffffa01aea5e>]
>>> btrfs_delayed_update_inode+0x7e/0x100
>>> [btrfs]
>>> [8642923.931336]  [<ffffffffa015b199>] btrfs_update_inode+0xa9/0x130
>>> [btrfs]
>>> [8642924.012686]  [<ffffffffa015cd11>] btrfs_dirty_inode+0x71/0xe0
>>> [btrfs]
>>> [8642924.091954]  [<ffffffffa015fa25>] btrfs_update_time+0x65/0xc0
>>> [btrfs]
>>> [8642924.171145]  [<ffffffff812286d3>] touch_atime+0x123/0x170
>>> [8642924.237849]  [<ffffffff81221929>] iterate_dir+0xf9/0x140
>>> [8642924.303512]  [<ffffffff81221a19>] SyS_getdents64+0xa9/0x100
>>> [8642924.372296]  [<ffffffff81221720>] ? fillonedir+0xe0/0xe0
>>> [8642924.437963]  [<ffffffff816f2594>] system_call_fastpath+0x12/0xce
>>> [8642924.512201] ---[ end trace 6782707b09f9f873 ]---
>>> [8642924.569677] ------------[ cut here ]------------
>>> [8642924.627055] WARNING: CPU: 38 PID: 73638 at lib/list_debug.c:36
>>> __list_add+0x9c/0xd0()
>>> [8642924.722925] list_add double add: new=ffff884c85a36288,
>>> prev=ffff884c85a36288, next=ffff887e40fa5368.
>>> [8642924.834373] Modules linked in: ipt_MASQUERADE
>>> nf_nat_masquerade_ipv4
>>> xt_conntrack iptable_filter arc4 ecb ppp_mppe ppp_async crc_ccitt
>>> ppp_generic slhc nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs
>>> fscache
>>> lockd sunrpc grace veth xt_nat xt_addrtype br_netfilter bridge tcp_diag
>>> inet_diag oracleacfs(POE) oracleadvm(POE) oracleoks(POE) oracleasm
>>> autofs4
>>> dm_queue_length cpufreq_powersave be2iscsi iscsi_boot_sysfs bnx2i
>>> cnic uio
>>> cxgb4i iw_cxgb4 cxgb4 cxgb3i libcxgbi iw_cxgb3 cxgb3 ib_iser rdma_cm
>>> ib_cm
>>> iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi
>>> scsi_transport_iscsi 8021q mrp garp stp llc bonding iptable_nat
>>> nf_nat_ipv4
>>> nf_nat ipt_REJECT nf_reject_ipv4 xt_pkttype nf_conntrack_ipv4
>>> nf_defrag_ipv4
>>> ip_tables ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6
>>> xt_state
>>> [8642925.688498]  nf_conntrack ip6table_filter ip6_tables ipv6
>>> dm_multipath
>>> sg iTCO_wdt iTCO_vendor_support ipmi_devintf ipmi_si ipmi_msghandler
>>> intel_powerclamp coretemp kvm_intel kvm pcspkr ixgbe dca ptp pps_core
>>> vxlan
>>> udp_tunnel ip6_udp_tunnel mdio sb_edac edac_core i2c_i801 i2c_core
>>> lpc_ich
>>> mfd_core shpchp ext4 jbd2 mbcache2 btrfs raid6_pq xor sd_mod
>>> megaraid_sas
>>> ghash_clmulni_intel crc32_pclmul crc32c_intel aesni_intel ablk_helper
>>> cryptd
>>> lrw gf128mul glue_helper aes_x86_64 ahci libahci mxm_wmi wmi dm_mirror
>>> dm_region_hash dm_log dm_mod [last unloaded: xt_conntrack]
>>> [8642926.279029] CPU: 38 PID: 73638 Comm: dockerd Tainted: P       
>>> W  OE
>>> 4.1.12-112.14.13.el6uek.x86_64 #2
>>> [8642926.394626] Hardware name: Oracle Corporation ORACLE SERVER
>>> X6-2/ASM,MOTHERBOARD,1U, BIOS 38040100 06/06/2016
>>> [8642926.515421]  0000000000000000 ffff88130404fc28 ffffffff816ec91e
>>> ffff88130404fc78
>>> [8642926.606485]  0000000000000024 ffff88130404fc68 ffffffff81088925
>>> 00ff88130404fc98
>>> [8642926.697544]  ffff884c85a36288 ffff887e40fa5368 ffff884c85a36288
>>> 0000000000000000
>>> [8642926.788605] Call Trace:
>>> [8642926.819945]  [<ffffffff816ec91e>] dump_stack+0x63/0x85
>>> [8642926.883535]  [<ffffffff81088925>] warn_slowpath_common+0x95/0xe0
>>> [8642926.957529]  [<ffffffff81088a26>] warn_slowpath_fmt+0x46/0x50
>>> [8642927.028400]  [<ffffffff81343f1c>] __list_add+0x9c/0xd0
>>> [8642927.092116]  [<ffffffffa01abb92>]
>>> btrfs_queue_delayed_node+0xb2/0xe0
>>> [btrfs]
>>> [8642927.178669]  [<ffffffffa01abc66>]
>>> __btrfs_release_delayed_node+0xa6/0x160 [btrfs]
>>> [8642927.270440]  [<ffffffffa01aea5e>]
>>> btrfs_delayed_update_inode+0x7e/0x100
>>> [btrfs]
>>> [8642927.360112]  [<ffffffffa015b199>] btrfs_update_inode+0xa9/0x130
>>> [btrfs]
>>> [8642927.441447]  [<ffffffffa015cd11>] btrfs_dirty_inode+0x71/0xe0
>>> [btrfs]
>>> [8642927.520716]  [<ffffffffa015fa25>] btrfs_update_time+0x65/0xc0
>>> [btrfs]
>>> [8642927.599910]  [<ffffffff812286d3>] touch_atime+0x123/0x170
>>> [8642927.666615]  [<ffffffff81221929>] iterate_dir+0xf9/0x140
>>> [8642927.732284]  [<ffffffff81221a19>] SyS_getdents64+0xa9/0x100
>>> [8642927.801072]  [<ffffffff81221720>] ? fillonedir+0xe0/0xe0
>>> [8642927.866741]  [<ffffffff816f2594>] system_call_fastpath+0x12/0xce
>>> [8642927.940843] ---[ end trace 6782707b09f9f874 ]---
>>> [8642927.998378] ------------[ cut here ]------------
>>>
>>>
>>>
>>>
> 
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: btrfs panic problem
  2018-09-25  8:31     ` Nikolay Borisov
@ 2018-09-25  8:42       ` sunny.s.zhang
  0 siblings, 0 replies; 14+ messages in thread
From: sunny.s.zhang @ 2018-09-25  8:42 UTC (permalink / raw)
  To: Nikolay Borisov, Liu Bo; +Cc: linux-btrfs



On 2018/09/25 16:31, Nikolay Borisov wrote:
>
> On 25.09.2018 11:20, sunny.s.zhang wrote:
>> On 2018/09/20 02:36, Liu Bo wrote:
>>> On Mon, Sep 17, 2018 at 5:28 PM, sunny.s.zhang
>>> <sunny.s.zhang@oracle.com> wrote:
>>>> Hi All,
>>>>
>>>> My OS(4.1.12) panic in kmem_cache_alloc, which is called by
>>>> btrfs_get_or_create_delayed_node.
>>>>
>>>> I found that the freelist of the slub is wrong.
>>>>
>>>> crash> struct kmem_cache_cpu ffff887e7d7a24b0
>>>>
>>>> struct kmem_cache_cpu {
>>>>     freelist = 0x2026,   <<< the value is id of one inode
>>>>     tid = 29567861,
>>>>     page = 0xffffea0132168d00,
>>>>     partial = 0x0
>>>> }
>>>>
>>>> And, I found there are two different btrfs inodes pointing
>>>> delayed_node. It
>>>> means that the same slub is used twice.
>>>>
>>>> I think this slub is freed twice, and then the next pointer of this slub
>>>> point itself. So we get the same slub twice.
>>>>
>>>> When use this slub again, that break the freelist.
>>>>
>>>> Folloing code will make the delayed node being freed twice. But I don't
>>>> found what is the process.
>>>>
>>>> Process A (btrfs_evict_inode) Process B
>>>>
>>>> call btrfs_remove_delayed_node call  btrfs_get_delayed_node
>>>>
>>>> node = ACCESS_ONCE(btrfs_inode->delayed_node);
>>>>
>>>> BTRFS_I(inode)->delayed_node = NULL;
>>>> btrfs_release_delayed_node(delayed_node);
>>>>
>>>> if (node) {
>>>> atomic_inc(&node->refs);
>>>> return node;
>>>> }
>>>>
>>>> ......
>>>>
>>>> btrfs_release_delayed_node(delayed_node);
>>>>
>>> By looking at the race, it seems the following commit has addressed it.
>>>
>>> btrfs: fix refcount_t usage when deleting btrfs_delayed_nodes
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ec35e48b286959991cdbb886f1bdeda4575c80b4
>>>
>>>
>>> thanks,
>>> liubo
>> I don't think so.
>> This patch resolves the problem in the radix_tree_lookup path. I don't
>> think it can resolve my problem, where the race occurs after
>> ACCESS_ONCE(btrfs_inode->delayed_node).
>> Because if ACCESS_ONCE(btrfs_inode->delayed_node) returns the node, then
>> btrfs_get_delayed_node returns immediately and does not continue.
> Can you reproduce the problem on an upstream kernel with added delays?
> The original report is from some RHEL-based distro (presumably oracle
> unbreakable linux) so there is no indication currently that this is a
> genuine problem in upstream kernels.
Not yet. I will try to reproduce it later.
But I don't have any clue about this race yet.
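
What I plan to force with added delays is the interleaving from my first
mail, sketched below against the upstream naming (this is only my
suspicion; the objection that evict_inodes() only disposes of inodes
whose i_count is 0 is exactly the part I cannot explain yet):

/*
 *   Task A: btrfs_evict_inode() path       Task B: any other lookup
 *   --------------------------------       ------------------------
 *                                          B1: node = READ_ONCE(inode->delayed_node);
 *   A1: delayed_node = READ_ONCE(inode->delayed_node);
 *   A2: inode->delayed_node = NULL;
 *   A3: btrfs_release_delayed_node(delayed_node);
 *       (drops the last reference, node is freed)
 *                                          B2: refcount_inc(&node->refs);
 *                                              (touches freed memory)
 *                                          B3: ... btrfs_release_delayed_node(node);
 *                                              (frees the same node a second time)
 */
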
Thanks,
Sunny

>
>> Thanks,
>> Sunny
>>
>>>> 1313 void btrfs_remove_delayed_node(struct inode *inode)
>>>> 1314 {
>>>> 1315         struct btrfs_delayed_node *delayed_node;
>>>> 1316
>>>> 1317         delayed_node = ACCESS_ONCE(BTRFS_I(inode)->delayed_node);
>>>> 1318         if (!delayed_node)
>>>> 1319                 return;
>>>> 1320
>>>> 1321         BTRFS_I(inode)->delayed_node = NULL;
>>>> 1322         btrfs_release_delayed_node(delayed_node);
>>>> 1323 }
>>>>
>>>>
>>>>     87 static struct btrfs_delayed_node *btrfs_get_delayed_node(struct
>>>> inode
>>>> *inode)
>>>>     88 {
>>>>     89         struct btrfs_inode *btrfs_inode = BTRFS_I(inode);
>>>>     90         struct btrfs_root *root = btrfs_inode->root;
>>>>     91         u64 ino = btrfs_ino(inode);
>>>>     92         struct btrfs_delayed_node *node;
>>>>     93
>>>>     94         node = ACCESS_ONCE(btrfs_inode->delayed_node);
>>>>     95         if (node) {
>>>>     96                 atomic_inc(&node->refs);
>>>>     97                 return node;
>>>>     98         }
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Sunny
>>>>
>>>>
>>>> PS:
>>>>
>>>> ------------------------------------------------
>>>>
>>>> panic informations
>>>>
>>>> PID: 73638  TASK: ffff887deb586200  CPU: 38  COMMAND: "dockerd"
>>>>    #0 [ffff88130404f940] machine_kexec at ffffffff8105ec10
>>>>    #1 [ffff88130404f9b0] crash_kexec at ffffffff811145b8
>>>>    #2 [ffff88130404fa80] oops_end at ffffffff8101a868
>>>>    #3 [ffff88130404fab0] no_context at ffffffff8106ea91
>>>>    #4 [ffff88130404fb00] __bad_area_nosemaphore at ffffffff8106ec8d
>>>>    #5 [ffff88130404fb50] bad_area_nosemaphore at ffffffff8106eda3
>>>>    #6 [ffff88130404fb60] __do_page_fault at ffffffff8106f328
>>>>    #7 [ffff88130404fbd0] do_page_fault at ffffffff8106f637
>>>>    #8 [ffff88130404fc10] page_fault at ffffffff816f6308
>>>>       [exception RIP: kmem_cache_alloc+121]
>>>>       RIP: ffffffff811ef019  RSP: ffff88130404fcc8  RFLAGS: 00010286
>>>>       RAX: 0000000000000000  RBX: 0000000000000000  RCX: 0000000001c32b76
>>>>       RDX: 0000000001c32b75  RSI: 0000000000000000  RDI: 00000000000224b0
>>>>       RBP: ffff88130404fd08   R8: ffff887e7d7a24b0   R9: 0000000000000000
>>>>       R10: ffff8802668b6618  R11: 0000000000000002  R12: ffff887e3e230a00
>>>>       R13: 0000000000002026  R14: ffff887e3e230a00  R15: ffffffffa01abf49
>>>>       ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>>>>    #9 [ffff88130404fd10] btrfs_get_or_create_delayed_node at
>>>> ffffffffa01abf49
>>>> [btrfs]
>>>> #10 [ffff88130404fd60] btrfs_delayed_update_inode at ffffffffa01aea12
>>>> [btrfs]
>>>> #11 [ffff88130404fdb0] btrfs_update_inode at ffffffffa015b199 [btrfs]
>>>> #12 [ffff88130404fdf0] btrfs_dirty_inode at ffffffffa015cd11 [btrfs]
>>>> #13 [ffff88130404fe20] btrfs_update_time at ffffffffa015fa25 [btrfs]
>>>> #14 [ffff88130404fe50] touch_atime at ffffffff812286d3
>>>> #15 [ffff88130404fe90] iterate_dir at ffffffff81221929
>>>> #16 [ffff88130404fee0] sys_getdents64 at ffffffff81221a19
>>>> #17 [ffff88130404ff50] system_call_fastpath at ffffffff816f2594
>>>>       RIP: 00000000006b68e4  RSP: 000000c866259080  RFLAGS: 00000246
>>>>       RAX: ffffffffffffffda  RBX: 000000c828dbbe00  RCX: 00000000006b68e4
>>>>       RDX: 0000000000001000  RSI: 000000c83da14000  RDI: 0000000000000011
>>>>       RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000000
>>>>       R10: 0000000000000000  R11: 0000000000000246  R12: 00000000000000c7
>>>>       R13: 0000000002174e74  R14: 0000000005555555  R15: 0000000000000038
>>>>       ORIG_RAX: 00000000000000d9  CS: 0033  SS: 002b
>>>>
>>>>
>>>> We also find the list double add informations, including n_list and
>>>> p_list:
>>>>
>>>> [8642921.110568] ------------[ cut here ]------------
>>>> [8642921.167929] WARNING: CPU: 38 PID: 73638 at lib/list_debug.c:33
>>>> __list_add+0xbe/0xd0()
>>>> [8642921.263780] list_add corruption. prev->next should be next
>>>> (ffff887e40fa5368), but was ffff884c85a36288. (prev=ffff884c85a36288).
>>>> [8642921.405490] Modules linked in: ipt_MASQUERADE
>>>> nf_nat_masquerade_ipv4
>>>> xt_conntrack iptable_filter arc4 ecb ppp_mppe ppp_async crc_ccitt
>>>> ppp_generic slhc nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs
>>>> fscache
>>>> lockd sunrpc grace veth xt_nat xt_addrtype br_netfilter bridge tcp_diag
>>>> inet_diag oracleacfs(POE) oracleadvm(POE) oracleoks(POE) oracleasm
>>>> autofs4
>>>> dm_queue_length cpufreq_powersave be2iscsi iscsi_boot_sysfs bnx2i
>>>> cnic uio
>>>> cxgb4i iw_cxgb4 cxgb4 cxgb3i libcxgbi iw_cxgb3 cxgb3 ib_iser rdma_cm
>>>> ib_cm
>>>> iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi
>>>> scsi_transport_iscsi 8021q mrp garp stp llc bonding iptable_nat
>>>> nf_nat_ipv4
>>>> nf_nat ipt_REJECT nf_reject_ipv4 xt_pkttype nf_conntrack_ipv4
>>>> nf_defrag_ipv4
>>>> ip_tables ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6
>>>> xt_state
>>>> [8642922.259780]  nf_conntrack ip6table_filter ip6_tables ipv6
>>>> dm_multipath
>>>> sg iTCO_wdt iTCO_vendor_support ipmi_devintf ipmi_si ipmi_msghandler
>>>> intel_powerclamp coretemp kvm_intel kvm pcspkr ixgbe dca ptp pps_core
>>>> vxlan
>>>> udp_tunnel ip6_udp_tunnel mdio sb_edac edac_core i2c_i801 i2c_core
>>>> lpc_ich
>>>> mfd_core shpchp ext4 jbd2 mbcache2 btrfs raid6_pq xor sd_mod
>>>> megaraid_sas
>>>> ghash_clmulni_intel crc32_pclmul crc32c_intel aesni_intel ablk_helper
>>>> cryptd
>>>> lrw gf128mul glue_helper aes_x86_64 ahci libahci mxm_wmi wmi dm_mirror
>>>> dm_region_hash dm_log dm_mod [last unloaded: xt_conntrack]
>>>> [8642922.850285] CPU: 38 PID: 73638 Comm: dockerd Tainted:
>>>> P           OE
>>>> 4.1.12-112.14.13.el6uek.x86_64 #2
>>>> [8642922.965881] Hardware name: Oracle Corporation ORACLE SERVER
>>>> X6-2/ASM,MOTHERBOARD,1U, BIOS 38040100 06/06/2016
>>>> [8642923.086670]  0000000000000000 ffff88130404fc28 ffffffff816ec91e
>>>> ffff88130404fc78
>>>> [8642923.177730]  0000000000000021 ffff88130404fc68 ffffffff81088925
>>>> 00ff88130404fc98
>>>> [8642923.268793]  ffff884c85a36288 ffff887e40fa5368 ffff884c85a36288
>>>> 0000000000000000
>>>> [8642923.359851] Call Trace:
>>>> [8642923.391194]  [<ffffffff816ec91e>] dump_stack+0x63/0x85
>>>> [8642923.454780]  [<ffffffff81088925>] warn_slowpath_common+0x95/0xe0
>>>> [8642923.528769]  [<ffffffff81088a26>] warn_slowpath_fmt+0x46/0x50
>>>> [8642923.599635]  [<ffffffff81343f3e>] __list_add+0xbe/0xd0
>>>> [8642923.663333]  [<ffffffffa01abb92>]
>>>> btrfs_queue_delayed_node+0xb2/0xe0
>>>> [btrfs]
>>>> [8642923.749898]  [<ffffffffa01abc66>]
>>>> __btrfs_release_delayed_node+0xa6/0x160 [btrfs]
>>>> [8642923.841661]  [<ffffffffa01aea5e>]
>>>> btrfs_delayed_update_inode+0x7e/0x100
>>>> [btrfs]
>>>> [8642923.931336]  [<ffffffffa015b199>] btrfs_update_inode+0xa9/0x130
>>>> [btrfs]
>>>> [8642924.012686]  [<ffffffffa015cd11>] btrfs_dirty_inode+0x71/0xe0
>>>> [btrfs]
>>>> [8642924.091954]  [<ffffffffa015fa25>] btrfs_update_time+0x65/0xc0
>>>> [btrfs]
>>>> [8642924.171145]  [<ffffffff812286d3>] touch_atime+0x123/0x170
>>>> [8642924.237849]  [<ffffffff81221929>] iterate_dir+0xf9/0x140
>>>> [8642924.303512]  [<ffffffff81221a19>] SyS_getdents64+0xa9/0x100
>>>> [8642924.372296]  [<ffffffff81221720>] ? fillonedir+0xe0/0xe0
>>>> [8642924.437963]  [<ffffffff816f2594>] system_call_fastpath+0x12/0xce
>>>> [8642924.512201] ---[ end trace 6782707b09f9f873 ]---
>>>> [8642924.569677] ------------[ cut here ]------------
>>>> [8642924.627055] WARNING: CPU: 38 PID: 73638 at lib/list_debug.c:36
>>>> __list_add+0x9c/0xd0()
>>>> [8642924.722925] list_add double add: new=ffff884c85a36288,
>>>> prev=ffff884c85a36288, next=ffff887e40fa5368.
>>>> [8642924.834373] Modules linked in: ipt_MASQUERADE
>>>> nf_nat_masquerade_ipv4
>>>> xt_conntrack iptable_filter arc4 ecb ppp_mppe ppp_async crc_ccitt
>>>> ppp_generic slhc nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs
>>>> fscache
>>>> lockd sunrpc grace veth xt_nat xt_addrtype br_netfilter bridge tcp_diag
>>>> inet_diag oracleacfs(POE) oracleadvm(POE) oracleoks(POE) oracleasm
>>>> autofs4
>>>> dm_queue_length cpufreq_powersave be2iscsi iscsi_boot_sysfs bnx2i
>>>> cnic uio
>>>> cxgb4i iw_cxgb4 cxgb4 cxgb3i libcxgbi iw_cxgb3 cxgb3 ib_iser rdma_cm
>>>> ib_cm
>>>> iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi
>>>> scsi_transport_iscsi 8021q mrp garp stp llc bonding iptable_nat
>>>> nf_nat_ipv4
>>>> nf_nat ipt_REJECT nf_reject_ipv4 xt_pkttype nf_conntrack_ipv4
>>>> nf_defrag_ipv4
>>>> ip_tables ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6
>>>> xt_state
>>>> [8642925.688498]  nf_conntrack ip6table_filter ip6_tables ipv6
>>>> dm_multipath
>>>> sg iTCO_wdt iTCO_vendor_support ipmi_devintf ipmi_si ipmi_msghandler
>>>> intel_powerclamp coretemp kvm_intel kvm pcspkr ixgbe dca ptp pps_core
>>>> vxlan
>>>> udp_tunnel ip6_udp_tunnel mdio sb_edac edac_core i2c_i801 i2c_core
>>>> lpc_ich
>>>> mfd_core shpchp ext4 jbd2 mbcache2 btrfs raid6_pq xor sd_mod
>>>> megaraid_sas
>>>> ghash_clmulni_intel crc32_pclmul crc32c_intel aesni_intel ablk_helper
>>>> cryptd
>>>> lrw gf128mul glue_helper aes_x86_64 ahci libahci mxm_wmi wmi dm_mirror
>>>> dm_region_hash dm_log dm_mod [last unloaded: xt_conntrack]
>>>> [8642926.279029] CPU: 38 PID: 73638 Comm: dockerd Tainted: P
>>>> W  OE
>>>> 4.1.12-112.14.13.el6uek.x86_64 #2
>>>> [8642926.394626] Hardware name: Oracle Corporation ORACLE SERVER
>>>> X6-2/ASM,MOTHERBOARD,1U, BIOS 38040100 06/06/2016
>>>> [8642926.515421]  0000000000000000 ffff88130404fc28 ffffffff816ec91e
>>>> ffff88130404fc78
>>>> [8642926.606485]  0000000000000024 ffff88130404fc68 ffffffff81088925
>>>> 00ff88130404fc98
>>>> [8642926.697544]  ffff884c85a36288 ffff887e40fa5368 ffff884c85a36288
>>>> 0000000000000000
>>>> [8642926.788605] Call Trace:
>>>> [8642926.819945]  [<ffffffff816ec91e>] dump_stack+0x63/0x85
>>>> [8642926.883535]  [<ffffffff81088925>] warn_slowpath_common+0x95/0xe0
>>>> [8642926.957529]  [<ffffffff81088a26>] warn_slowpath_fmt+0x46/0x50
>>>> [8642927.028400]  [<ffffffff81343f1c>] __list_add+0x9c/0xd0
>>>> [8642927.092116]  [<ffffffffa01abb92>]
>>>> btrfs_queue_delayed_node+0xb2/0xe0
>>>> [btrfs]
>>>> [8642927.178669]  [<ffffffffa01abc66>]
>>>> __btrfs_release_delayed_node+0xa6/0x160 [btrfs]
>>>> [8642927.270440]  [<ffffffffa01aea5e>]
>>>> btrfs_delayed_update_inode+0x7e/0x100
>>>> [btrfs]
>>>> [8642927.360112]  [<ffffffffa015b199>] btrfs_update_inode+0xa9/0x130
>>>> [btrfs]
>>>> [8642927.441447]  [<ffffffffa015cd11>] btrfs_dirty_inode+0x71/0xe0
>>>> [btrfs]
>>>> [8642927.520716]  [<ffffffffa015fa25>] btrfs_update_time+0x65/0xc0
>>>> [btrfs]
>>>> [8642927.599910]  [<ffffffff812286d3>] touch_atime+0x123/0x170
>>>> [8642927.666615]  [<ffffffff81221929>] iterate_dir+0xf9/0x140
>>>> [8642927.732284]  [<ffffffff81221a19>] SyS_getdents64+0xa9/0x100
>>>> [8642927.801072]  [<ffffffff81221720>] ? fillonedir+0xe0/0xe0
>>>> [8642927.866741]  [<ffffffff816f2594>] system_call_fastpath+0x12/0xce
>>>> [8642927.940843] ---[ end trace 6782707b09f9f874 ]---
>>>> [8642927.998378] ------------[ cut here ]------------
>>>>
>>>>
>>>>
>>>>
>>

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2018-09-25 14:49 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-18  0:28 btrfs panic problem sunny.s.zhang
2018-09-18  0:36 ` sunny.s.zhang
2018-09-18  5:05 ` Duncan
2018-09-18  9:13   ` sunny.s.zhang
2018-09-18 23:53   ` sunny.s.zhang
2018-09-19 16:12     ` Nikolay Borisov
2018-09-25  8:29       ` sunny.s.zhang
2018-09-19  0:05 ` Qu Wenruo
2018-09-19  0:35   ` sunny.s.zhang
2018-09-19  2:27     ` Qu Wenruo
2018-09-19 18:36 ` Liu Bo
2018-09-25  8:20   ` sunny.s.zhang
2018-09-25  8:31     ` Nikolay Borisov
2018-09-25  8:42       ` sunny.s.zhang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).