* massive memory leak in 3.1[3-5] with nfs4+kerberos
@ 2014-10-11  3:36 Carlos Carvalho
  2014-10-13 13:58 ` J. Bruce Fields
  0 siblings, 1 reply; 9+ messages in thread
From: Carlos Carvalho @ 2014-10-11  3:36 UTC (permalink / raw)
  To: linux-nfs

We're observing a big memory leak in 3.1[3-5]. We went up to 3.15.8 and then
back to 3.14 because it's LTS. Today we're running 3.14.21. The problem has
existed for several months but has recently become a show-stopper.

Here are the values of SUnreclaim from /proc/meminfo, sampled every 4 hours
(units are kB):

87192
297044
765320
2325160
3306056
4412808
4799292
5085392
4999936
5521648
6628496
7785460
8518084
8988404
9141220
9533224
10053484
10954000
11716700
12369516
12847412
13318872
13846196
14339476
14815600
15293564
15798024
17092772
19240084
21679888
22399060
22943812
23407004
24049804
26210880
28034980
29059812  <== almost 30GB!
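The thread doesn't say how this table was collected; a minimal sampler along
these lines would produce it (the script and helper names, log path, and cron
schedule are our assumptions, not from the thread):

```shell
#!/bin/sh
# Hypothetical sampler for the SUnreclaim table above. Install as e.g.
# /usr/local/bin/sunreclaim-sample and run it from cron every 4 hours:
#   0 */4 * * * /usr/local/bin/sunreclaim-sample >> /var/log/sunreclaim.log
sunreclaim_kb() {
    # /proc/meminfo lines look like "SUnreclaim:    87192 kB";
    # print just the value in kB.
    awk '/^SUnreclaim:/ {print $2}' "${1:-/proc/meminfo}"
}

sunreclaim_kb "$@"
```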

After a few days the machine has lost so much memory that it panics or becomes
very slow due to lack of cache, and we have to reboot it. It's a busy file
server for home directories.

We have several other busy servers (including identical hardware) but the
memory leak happens only on this machine. What's different about it is that
it's the only place where we use:
- nfs4 with authentication and encryption by kerberos
- raid10

All the others do only nfs3 or no nfs, and raid6. That's why we suspect an
nfs4 problem.

What about these patches: http://permalink.gmane.org/gmane.linux.nfs/62012
Bruce said they were accepted, but they're not in 3.14. Were they rejected or
forgotten? Could they be related to this memory leak?


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: massive memory leak in 3.1[3-5] with nfs4+kerberos
  2014-10-11  3:36 massive memory leak in 3.1[3-5] with nfs4+kerberos Carlos Carvalho
@ 2014-10-13 13:58 ` J. Bruce Fields
  2014-10-13 23:50   ` Carlos Carvalho
  0 siblings, 1 reply; 9+ messages in thread
From: J. Bruce Fields @ 2014-10-13 13:58 UTC (permalink / raw)
  To: Carlos Carvalho; +Cc: linux-nfs

On Sat, Oct 11, 2014 at 12:36:27AM -0300, Carlos Carvalho wrote:
> We're observing a big memory leak in 3.1[3-5]. We've gone until 3.15.8 and back
> to 3.14 because of LTS. Today we're running 3.14.21. The problem has existed
> for several months but recently has become a show-stopper.

Is there an older version that you know was OK?

> Here are the values of SUnreclaim: from /proc/meminfo, sampled at every 4h
> (units are kB):
> 
> 87192
> 297044
> 765320
> 2325160
> 3306056
> 4412808
> 4799292
> 5085392
> 4999936
> 5521648
> 6628496
> 7785460
> 8518084
> 8988404
> 9141220
> 9533224
> 10053484
> 10954000
> 11716700
> 12369516
> 12847412
> 13318872
> 13846196
> 14339476
> 14815600
> 15293564
> 15798024
> 17092772
> 19240084
> 21679888
> 22399060
> 22943812
> 23407004
> 24049804
> 26210880
> 28034980
> 29059812  <== almost 30GB!

Can you figure out from /proc/slabinfo which slab is the problem?

> After a few days the machine has lost so much memory that it panics or becomes
> very slow due to lack of cache and we have to reboot it. It's a busy file
> server of home directories.
> 
> We have several other busy servers (including identical hardware) but the
> memory leak happens only in this machine. What is different with it is that
> it's the only place where we use:
> - nfs4 with authentication and encryption by kerberos
> - raid10
> 
> All others do only nfs3 or no nfs, and raid6. That's why we suspect it's a nfs4
> problem.

It would also be interesting to know whether the problem is with nfs4 or
krb5.  But I don't know if you have an easy way to test that.  (E.g.
temporarily downgrade to nfs3 while keeping krb5 and see if that
matters?)

Do you know if any of your clients are using NFSv4.1?

What filesystem are you exporting, with what options?

> What about these patches: http://permalink.gmane.org/gmane.linux.nfs/62012
> Bruce said they were accepted but they're not in 3.14. Were they rejected or
> forgotten? Could they have any relation to this memory leak?

Those are in 3.15.

There'd be no harm in trying them, but on a quick skim I don't think
they're likely to explain your symptoms.

--b.


* Re: massive memory leak in 3.1[3-5] with nfs4+kerberos
  2014-10-13 13:58 ` J. Bruce Fields
@ 2014-10-13 23:50   ` Carlos Carvalho
  2014-10-14 20:42     ` J. Bruce Fields
  0 siblings, 1 reply; 9+ messages in thread
From: Carlos Carvalho @ 2014-10-13 23:50 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

J. Bruce Fields (bfields@fieldses.org) wrote on Mon, Oct 13, 2014 at 10:58:40AM BRT:
> On Sat, Oct 11, 2014 at 12:36:27AM -0300, Carlos Carvalho wrote:
> > We're observing a big memory leak in 3.1[3-5]. We've gone until 3.15.8 and back
> > to 3.14 because of LTS. Today we're running 3.14.21. The problem has existed
> > for several months but recently has become a show-stopper.
> 
> Is there an older version that you know was OK?

Perhaps something as old as 3.8, but I'm not sure it was still OK. We jumped
from 3.8 to 3.13, and 3.13 certainly leaks.

> > Here are the values of SUnreclaim: from /proc/meminfo, sampled at every 4h
> > (units are kB):
[TRIMMED]
> > 28034980
> > 29059812  <== almost 30GB!
> 
> Can you figure out from /proc/slabinfo which slab is the problem?

I don't understand it precisely, but here's what slabtop says:

urquell# slabtop -o -sc
 Active / Total Objects (% used)    : 62466073 / 63672277 (98.1%)
 Active / Total Slabs (% used)      : 1903762 / 1903762 (100.0%)
 Active / Total Caches (% used)     : 122 / 140 (87.1%)
 Active / Total Size (% used)       : 31965140.38K / 32265879.49K (99.1%)
 Minimum / Average / Maximum Object : 0.08K / 0.51K / 8.07K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME                   
13782656 13745696  99%    1.00K 430708       32  13782656K ext4_inode_cache
3374235 3374199  99%    2.07K 224949       15   7198368K kmalloc-2048
6750996 6750937  99%    0.57K 241107       28   3857712K kmalloc-512
9438787 9121010  96%    0.26K 304477       31   2435816K dentry
11455656 11444417  99%    0.17K 249036       46   1992288K buffer_head
161378 161367  99%    4.07K  23054        7    737728K kmalloc-4096
6924380 6924248  99%    0.09K 150530       46    602120K kmalloc-16
6754644 6754432  99%    0.08K 132444       51    529776K kmalloc-8
662286 648372  97%    0.62K  12986       51    415552K radix_tree_node
161730 161709  99%    1.69K   8985       18    287520K TCP
1987470 1840303  92%    0.13K  66249       30    264996K kmalloc-64
203366 162439  79%    0.69K   4421       46    141472K sock_inode_cache
626112 285190  45%    0.11K  17392       36     69568K ext4_extent_status
177123 177040  99%    0.31K   3473       51     55568K skbuff_head_cache
357272 357220  99%    0.12K  10508       34     42032K jbd2_inode
136422 136375  99%    0.20K   3498       39     27984K ext4_groupinfo_4k

Note that ext4* appears to be the villain, but it isn't, because

echo 3 > /proc/sys/vm/drop_caches

gets rid of it. The problem seems to be the kmalloc-* caches, particularly
kmalloc-2048, which never gets smaller.
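That check can be scripted; here is a minimal sketch (the helper names are
ours, not from the thread, and the drop_caches write needs root):

```shell
#!/bin/sh
# Sketch of the reclaimability check above: reclaimable slabs such as
# ext4_inode_cache and dentry shrink after drop_caches, while leaked
# kmalloc-2048 memory does not. Helper names are ours.
sunreclaim_kb() {
    awk '/^SUnreclaim:/ {print $2}' "${1:-/proc/meminfo}"
}

drop_and_compare() {
    before=$(sunreclaim_kb)
    sync
    echo 3 > /proc/sys/vm/drop_caches   # needs root; note the space after 3
    after=$(sunreclaim_kb)
    echo "SUnreclaim: ${before} kB -> ${after} kB"
}
```

If SUnreclaim barely moves while slabtop shows the ext4 caches gone, the
remaining growth is in unreclaimable kmalloc-* objects.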

Maybe this gives you a clue:

urquell# ./slabinfo -r kmalloc-2048

Slabcache: kmalloc-2048          Aliases:  0 Order :  3 Objects: 3379188

Sizes (bytes)     Slabs              Debug                Memory
------------------------------------------------------------------------
Object :    2048  Total  :  225282   Sanity Checks : On   Total: 7382040576
SlabObj:    2120  Full   :  225277   Redzoning     : On   Used : 6920577024
SlabSiz:   32768  Partial:       5   Poisoning     : On   Loss : 461463552
Loss   :      72  CpuSlab:       0   Tracking      : On   Lalig: 243301536
Align  :       8  Objects:      15   Tracing       : Off  Lpadd: 218072976

kmalloc-2048 has no kmem_cache operations

kmalloc-2048: Kernel object allocation
-----------------------------------------------------------------------
      1 mcheck_cpu_init+0x2dd/0x4b0 age=71315213 pid=0 cpus=0 nodes=0
      4 mempool_create_node+0x67/0x130 age=71310870/71310884/71310896 pid=845 cpus=8,10 nodes=1
      1 pcpu_extend_area_map+0x36/0xf0 age=71314878 pid=1 cpus=10 nodes=1
      1 __vmalloc_node_range+0xb1/0x250 age=71314877 pid=1 cpus=10 nodes=1
      7 alloc_fdmem+0x17/0x30 age=19467/40818077/71308204 pid=5541-16287 cpus=0-1,4,6-7 nodes=0
      2 __register_sysctl_table+0x52/0x540 age=71315213/71315213/71315213 pid=0 cpus=0 nodes=0
      1 register_leaf_sysctl_tables+0x7f/0x1f0 age=71315213 pid=0 cpus=0 nodes=0
      1 ext4_kvmalloc+0x1e/0x60 age=71310060 pid=2778 cpus=0 nodes=0
      2 ext4_kvzalloc+0x1e/0x70 age=71310041/71310197/71310354 pid=2778 cpus=4,9 nodes=0-1
      4 ext4_fill_super+0x9e/0x2d50 age=71310064/71310406/71310853 pid=987-2778 cpus=0,9-10 nodes=0-1
      4 journal_init_common+0x12/0x150 age=71310051/71310386/71310852 pid=987-2778 cpus=0,8-9 nodes=0-1
     50 create_client.isra.79+0x6d/0x450 age=56450/25384654/71293503 pid=5372-5435 cpus=0-17,19-21,23 nodes=0-1
   1550 nfsd4_create_session+0x24a/0x810 age=56450/25266733/71293502 pid=5372-5436 cpus=0-11,13-16,19-20,27 nodes=0-1
    136 pci_alloc_dev+0x22/0x60 age=71314711/71314783/71314907 pid=1 cpus=0 nodes=0-1
      1 acpi_ev_create_gpe_block+0x110/0x323 age=71315026 pid=1 cpus=0 nodes=0
      2 tty_write+0x1f9/0x290 age=11025/35647516/71284008 pid=5965-16318 cpus=5-6 nodes=0
      2 kobj_map_init+0x20/0xa0 age=71315078/71315145/71315213 pid=0-1 cpus=0 nodes=0
      7 scsi_host_alloc+0x30/0x410 age=71312864/71313119/71314655 pid=4 cpus=0 nodes=0
     52 scsi_alloc_sdev+0x52/0x290 age=71311100/71311851/71314235 pid=6-289 cpus=0-1,3,8 nodes=0-1
     50 _scsih_slave_alloc+0x36/0x1f0 age=71311100/71311758/71312386 pid=7-289 cpus=0,8 nodes=0-1
      1 ata_attach_transport+0x12/0x550 age=71314887 pid=1 cpus=10 nodes=1
      2 usb_create_shared_hcd+0x3d/0x1b0 age=71312498/71312500/71312502 pid=4 cpus=0 nodes=0
      2 input_alloc_absinfo+0x1f/0x50 age=71312135/71312157/71312180 pid=175 cpus=0 nodes=0
     18 sk_prot_alloc.isra.51+0xab/0x180 age=71307028/71307092/71307162 pid=5268 cpus=0-2 nodes=0
     12 reqsk_queue_alloc+0x5b/0xf0 age=71308217/71308264/71308326 pid=5281-5531 cpus=0,7 nodes=0
      9 __alloc_skb+0x82/0x2a0 age=11/62981/85980 pid=16286-28415 cpus=0,2,12,20 nodes=0-1
     19 alloc_netdev_mqs+0x5d/0x380 age=71309462/71309950/71314887 pid=1-4109 cpus=1,9-10 nodes=0-1
     48 neigh_sysctl_register+0x39/0x270 age=71309462/71310133/71314878 pid=1-4109 cpus=1-3,5-6,9-10,18 nodes=0-1
      2 neigh_hash_alloc+0x9d/0xb0 age=71122283/71208056/71293830 pid=4732-5411 cpus=0,12 nodes=0-1
      4 __rtnl_register+0x9e/0xd0 age=71312234/71314317/71315079 pid=1 cpus=0,2,10 nodes=0-1
      2 nf_ct_l4proto_register+0x96/0x140 age=71312233/71312234/71312235 pid=1 cpus=2 nodes=0
     25 __devinet_sysctl_register+0x39/0xf0 age=71309462/71310428/71314878 pid=1-4109 cpus=1,3,5-6,9-10,18 nodes=0-1
3377159 xprt_alloc+0x1e/0x190 age=0/27663979/71308304 pid=6-32599 cpus=0-31 nodes=0-1
      6 cache_create_net+0x32/0x80 age=71314835/71314848/71314876 pid=1 cpus=8,10 nodes=1
      1 uncore_types_init+0x33/0x1a5 age=71314842 pid=1 cpus=31 nodes=1
      2 netdev_create_hash+0x12/0x30 age=71314887/71314887/71314887 pid=1 cpus=10 nodes=1

kmalloc-2048: Kernel object freeing
------------------------------------------------------------------------
1236114 <not-available> age=4366207748 pid=0 cpus=0 nodes=0-1
      1 pcpu_extend_area_map+0x79/0xf0 age=71312606 pid=281 cpus=0 nodes=0
  13387 do_readv_writev+0x10f/0x2b0 age=11019/32837523/71273914 pid=5372-5436 cpus=0-31 nodes=0-1
    181 __free_fdtable+0xd/0x20 age=64631/41719568/71310089 pid=632-32741 cpus=0-26,28-29,31 nodes=0-1
  25068 free_session+0x113/0x140 age=65100/41914089/71277241 pid=6-32564 cpus=0-9,11-23,25-31 nodes=0-1
   1008 destroy_client+0x34b/0x3f0 age=88408/41955492/71233598 pid=6-32564 cpus=0-9,13-23,25-26,28-31 nodes=0-1
   5568 nfsd4_create_session+0x567/0x810 age=56743/41522683/71293808 pid=5372-5436 cpus=0-28,30 nodes=0-1
      5 acpi_pci_irq_find_prt_entry+0x253/0x26d age=71312722/71313708/71315113 pid=1-4 cpus=0,10 nodes=0-1
      1 free_tty_struct+0x23/0x40 age=68221685 pid=933 cpus=3 nodes=0
     14 flush_to_ldisc+0x86/0x170 age=2758222/47649639/71281276 pid=254-12135 cpus=0,2,9-10,12,15-17,25 nodes=0-1
2094711 __kfree_skb+0x9/0xa0 age=48/27035171/71311197 pid=0-32767 cpus=0-31 nodes=0-1
      1 pskb_expand_head+0x150/0x220 age=71315138 pid=1 cpus=10 nodes=1
      6 xt_free_table_info+0x4e/0x130 age=71308862/71308863/71308864 pid=5003-5005 cpus=25,29-30 nodes=0
      3 inetdev_event+0x36d/0x4e0 age=71310679/71310692/71310703 pid=1160-1199 cpus=5-6,18 nodes=0
      5 inetdev_event+0x37a/0x4e0 age=71310679/71310715/71310749 pid=1160-1199 cpus=3,5,18-19 nodes=0
   3155 unix_stream_sendmsg+0x39c/0x3b0 age=31207/33532093/71308550 pid=398-32756 cpus=0-31 nodes=0-1
      4 addrconf_notify+0x1b1/0x770 age=71310703/71310724/71310749 pid=1160-1199 cpus=3,5,18-19 nodes=0

NUMA nodes           :    0    1
--------------------------------
All slabs              118.1K 107.1K
Partial slabs             1    4

Note the big xprt_alloc count. (The slabinfo tool is in the kernel tree under
tools/vm.) Another way to see it:

urquell# sort -n /sys/kernel/slab/kmalloc-2048/alloc_calls | tail -n 2
   1519 nfsd4_create_session+0x24a/0x810 age=189221/25894524/71426273 pid=5372-5436 cpus=0-11,13-16,19-20 nodes=0-1
3380755 xprt_alloc+0x1e/0x190 age=5/27767270/71441075 pid=6-32599 cpus=0-31 nodes=0-1

Yet another puzzling thing for us is that the total numbers of allocs and
frees are nearly equal:

urquell# awk '{summ += $1} END {print summ}' /sys/kernel/slab/kmalloc-2048/alloc_calls
3385122
urquell# awk '{summ += $1} END {print summ}' /sys/kernel/slab/kmalloc-2048/free_calls
3385273
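The sort | tail pipeline above generalizes to a tiny helper (ours, not a
kernel tool) that pulls the dominant call site out of an alloc_calls or
free_calls dump:

```shell
#!/bin/sh
# Print "<count> <call-site>" for the biggest entry in a SLUB debug
# alloc_calls/free_calls file, whose lines look like:
#   3380755 xprt_alloc+0x1e/0x190 age=5/27767270/71441075 pid=6-32599 ...
top_site() {
    sort -n "$1" | tail -n 1 | awk '{print $1, $2}'
}
```

Usage would be `top_site /sys/kernel/slab/kmalloc-2048/alloc_calls`, which on
this server reports the xprt_alloc line.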

> It would also be interesting to know whether the problem is with nfs4 or
> krb5.  But I don't know if you have an easy way to test that.  (E.g.
> temporarily downgrade to nfs3 while keeping krb5 and see if that
> matters?)

That'd be quite hard to do...

> Do you know if any of your clients are using NFSv4.1?

All of them. The clients are a few general login servers and about a hundred
terminals. All of them are diskless and mount their root via nfs3 without
kerberos. The login servers mount the user home dirs with nfs4.1 WITHOUT
kerberos. The terminals run Ubuntu and mount with nfs4.1 AND kerberos. Here is
their /proc/version:

Linux version 3.14.14-kernel (root@urquell) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #15 SMP Thu Jul 31 11:15:52 BRT 2014

and here their mount:

urquell.home:/home on /home type nfs4 (rw,vers=4.1,addr=10.17.110.3,clientaddr=10.17.110.11)

> What filesystem are you exporting, with what options?

ext4. For the terminals:
/exports        10.254.0.0/16(sec=krb5p:none,ro,async,fsid=0,crossmnt,subtree_check,no_root_squash)
/exports/home   10.254.0.0/16(sec=krb5p,rw,async,no_subtree_check,root_squash)

For the servers:
/exports/home   server.home(rw,async,root_squash,no_subtree_check)

> > What about these patches: http://permalink.gmane.org/gmane.linux.nfs/62012
> > Bruce said they were accepted but they're not in 3.14. Were they rejected or
> > forgotten? Could they have any relation to this memory leak?
> 
> Those are in 3.15.
> 
> There'd be no harm in trying them, but on a quick skim I don't think
> they're likely to explain your symptoms.

Yes, we tried up to 3.15.8, to no avail.


* Re: massive memory leak in 3.1[3-5] with nfs4+kerberos
  2014-10-13 23:50   ` Carlos Carvalho
@ 2014-10-14 20:42     ` J. Bruce Fields
  2014-10-28 14:14       ` Carlos Carvalho
  0 siblings, 1 reply; 9+ messages in thread
From: J. Bruce Fields @ 2014-10-14 20:42 UTC (permalink / raw)
  To: Carlos Carvalho; +Cc: linux-nfs

On Mon, Oct 13, 2014 at 08:50:27PM -0300, Carlos Carvalho wrote:
> J. Bruce Fields (bfields@fieldses.org) wrote on Mon, Oct 13, 2014 at 10:58:40AM BRT:
> > On Sat, Oct 11, 2014 at 12:36:27AM -0300, Carlos Carvalho wrote:
> > > We're observing a big memory leak in 3.1[3-5]. We've gone until 3.15.8 and back
> > > to 3.14 because of LTS. Today we're running 3.14.21. The problem has existed
> > > for several months but recently has become a show-stopper.
> > 
> > Is there an older version that you know was OK?
> 
> Perhaps something as old as 3.8 but I'm not sure if it still worked. We jumped
> from 3.8 to 3.13 and this one certainly leaks.
...
>   OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME                   
...
> 3374235 3374199  99%    2.07K 224949       15   7198368K kmalloc-2048

> urquell# ./slabinfo -r kmalloc-2048
...
> kmalloc-2048: Kernel object allocation
> -----------------------------------------------------------------------
...
> 3377159 xprt_alloc+0x1e/0x190 age=0/27663979/71308304 pid=6-32599 cpus=0-31 nodes=0-1
...
> Note the big xprt_alloc. slabinfo is found in the kernel tree at tools/vm.
> Another way to see it:
> 
> urquell# sort -n /sys/kernel/slab/kmalloc-2048/alloc_calls | tail -n 2
>    1519 nfsd4_create_session+0x24a/0x810 age=189221/25894524/71426273 pid=5372-5436 cpus=0-11,13-16,19-20 nodes=0-1
> 3380755 xprt_alloc+0x1e/0x190 age=5/27767270/71441075 pid=6-32599 cpus=0-31 nodes=0-1

Agreed that the xprt_alloc is suspicious, though I don't really
understand these statistics.

Since you have 4.1 clients, maybe this would be explained by a leak in
the backchannel code.

> Yet another puzzling thing for us is that the number of allocs and frees is
> nearly equal:
> 
> urquell# awk '{summ += $1} END {print summ}' /sys/kernel/slab/kmalloc-2048/alloc_calls
> 3385122
> urquell# awk '{summ += $1} END {print summ}' /sys/kernel/slab/kmalloc-2048/free_calls
> 3385273

I can't tell what these numbers actually mean.  (E.g. is it really
tracking every single alloc and free since the kernel booted, or does
this just represent recent behavior?)

> 
> > It would also be interesting to know whether the problem is with nfs4 or
> > krb5.  But I don't know if you have an easy way to test that.  (E.g.
> > temporarily downgrade to nfs3 while keeping krb5 and see if that
> > matters?)
> 
> That'd be quite hard to do...
> 
> > Do you know if any of your clients are using NFSv4.1?
> 
> All of them. Clients are a few general login servers and about a hundred
> terminals. All of them are diskless and mount their root via nfs3 without
> kerberos. The login servers mount the user home dirs with nfs4.1 WITHOUT
> kerberos. The terminals run ubuntu and mount with nfs4.1 AND kerberos. Here is
> their /proc/version:

OK, thanks.  I'm not seeing it yet.

I don't see any relevant-looking fix in recent git history, though I might
have overlooked something, especially in the recent rewrite of the NFSv4
state code.

It could certainly still be worth testing 3.17 if possible.

--b.


* Re: massive memory leak in 3.1[3-5] with nfs4+kerberos
  2014-10-14 20:42     ` J. Bruce Fields
@ 2014-10-28 14:14       ` Carlos Carvalho
  2014-10-28 14:24         ` J. Bruce Fields
  0 siblings, 1 reply; 9+ messages in thread
From: Carlos Carvalho @ 2014-10-28 14:14 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

J. Bruce Fields (bfields@fieldses.org) wrote on Tue, Oct 14, 2014 at 05:42:45PM BRT:
> On Mon, Oct 13, 2014 at 08:50:27PM -0300, Carlos Carvalho wrote:
> > J. Bruce Fields (bfields@fieldses.org) wrote on Mon, Oct 13, 2014 at 10:58:40AM BRT:
> > Note the big xprt_alloc. slabinfo is found in the kernel tree at tools/vm.
> > Another way to see it:
> > 
> > urquell# sort -n /sys/kernel/slab/kmalloc-2048/alloc_calls | tail -n 2
> >    1519 nfsd4_create_session+0x24a/0x810 age=189221/25894524/71426273 pid=5372-5436 cpus=0-11,13-16,19-20 nodes=0-1
> > 3380755 xprt_alloc+0x1e/0x190 age=5/27767270/71441075 pid=6-32599 cpus=0-31 nodes=0-1
> 
> Agreed that the xprt_alloc is suspicious, though I don't really
> understand these statistics.
> 
> Since you have 4.1 clients, maybe this would be explained by a leak in
> the backchannel code.

We've set clients to use 4.0 and it only made the problem worse; the growth in
unreclaimable memory was faster.

> It could certainly still be worth testing 3.17 if possible.

We tested it, and it SEEMS the problem doesn't appear in 3.17.1; the SUnreclaim
value oscillates up and down as usual, instead of increasing monotonically.
However, it didn't run long enough for us to get conclusive numbers, because
after about 5-6 h the machine fills the screen with "NMI watchdog CPU #... is
locked for more than 22s". It spits these messages for many cores at once and
becomes unresponsive; we have to reboot it from the console with Alt+SysRq.

Do these two new pieces of info give a clue?


* Re: massive memory leak in 3.1[3-5] with nfs4+kerberos
  2014-10-28 14:14       ` Carlos Carvalho
@ 2014-10-28 14:24         ` J. Bruce Fields
  2014-10-28 19:12           ` Carlos Carvalho
  0 siblings, 1 reply; 9+ messages in thread
From: J. Bruce Fields @ 2014-10-28 14:24 UTC (permalink / raw)
  To: Carlos Carvalho; +Cc: linux-nfs

On Tue, Oct 28, 2014 at 12:14:28PM -0200, Carlos Carvalho wrote:
> J. Bruce Fields (bfields@fieldses.org) wrote on Tue, Oct 14, 2014 at 05:42:45PM BRT:
> > On Mon, Oct 13, 2014 at 08:50:27PM -0300, Carlos Carvalho wrote:
> > > J. Bruce Fields (bfields@fieldses.org) wrote on Mon, Oct 13, 2014 at 10:58:40AM BRT:
> > > Note the big xprt_alloc. slabinfo is found in the kernel tree at tools/vm.
> > > Another way to see it:
> > > 
> > > urquell# sort -n /sys/kernel/slab/kmalloc-2048/alloc_calls | tail -n 2
> > >    1519 nfsd4_create_session+0x24a/0x810 age=189221/25894524/71426273 pid=5372-5436 cpus=0-11,13-16,19-20 nodes=0-1
> > > 3380755 xprt_alloc+0x1e/0x190 age=5/27767270/71441075 pid=6-32599 cpus=0-31 nodes=0-1
> > 
> > Agreed that the xprt_alloc is suspicious, though I don't really
> > understand these statistics.
> > 
> > Since you have 4.1 clients, maybe this would be explained by a leak in
> > the backchannel code.
> 
> We've set clients to use 4.0 and it only made the problem worse; the growth in
> unreclaimable memory was faster.
> 
> > It could certainly still be worth testing 3.17 if possible.
> 
> We tested it and it SEEMS the problem doesn't appear in 3.17.1; the SUnreclaim
> value oscillates up and down as usual, instead of increasing monotonically.
> However it didn't last long enough for us to get conclusive numbers because
> after about 5-6h the machine fills the screen with "NMI watchdog CPU #... is
> locked for more than 22s".

Are there backtraces with those messages?

--b.

> It spits these messages for many cores at once, and
> becomes unresponsive; we have to reboot it from the console with alt+sysreq.
> 
> Do these 2 new pieces of info give a clue?


* Re: massive memory leak in 3.1[3-5] with nfs4+kerberos
  2014-10-28 14:24         ` J. Bruce Fields
@ 2014-10-28 19:12           ` Carlos Carvalho
  2014-10-28 19:29             ` J. Bruce Fields
  0 siblings, 1 reply; 9+ messages in thread
From: Carlos Carvalho @ 2014-10-28 19:12 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

J. Bruce Fields (bfields@fieldses.org) wrote on Tue, Oct 28, 2014 at 12:24:50PM BRST:
> On Tue, Oct 28, 2014 at 12:14:28PM -0200, Carlos Carvalho wrote:
> > J. Bruce Fields (bfields@fieldses.org) wrote on Tue, Oct 14, 2014 at 05:42:45PM BRT:
> > > On Mon, Oct 13, 2014 at 08:50:27PM -0300, Carlos Carvalho wrote:
> > > > J. Bruce Fields (bfields@fieldses.org) wrote on Mon, Oct 13, 2014 at 10:58:40AM BRT:
> > > > Note the big xprt_alloc. slabinfo is found in the kernel tree at tools/vm.
> > > > Another way to see it:
> > > > 
> > > > urquell# sort -n /sys/kernel/slab/kmalloc-2048/alloc_calls | tail -n 2
> > > >    1519 nfsd4_create_session+0x24a/0x810 age=189221/25894524/71426273 pid=5372-5436 cpus=0-11,13-16,19-20 nodes=0-1
> > > > 3380755 xprt_alloc+0x1e/0x190 age=5/27767270/71441075 pid=6-32599 cpus=0-31 nodes=0-1
> > > 
> > > Agreed that the xprt_alloc is suspicious, though I don't really
> > > understand these statistics.
> > > 
> > > Since you have 4.1 clients, maybe this would be explained by a leak in
> > > the backchannel code.
> > 
> > We've set clients to use 4.0 and it only made the problem worse; the growth in
> > unreclaimable memory was faster.
> > 
> > > It could certainly still be worth testing 3.17 if possible.
> > 
> > We tested it and it SEEMS the problem doesn't appear in 3.17.1; the SUnreclaim
> > value oscillates up and down as usual, instead of increasing monotonically.
> > However it didn't last long enough for us to get conclusive numbers because
> > after about 5-6h the machine fills the screen with "NMI watchdog CPU #... is
> > locked for more than 22s".
> 
> Are there backtraces with those messages?

First one, nfsd:

Oct 22 18:37:12 urquell kernel: NMI watchdog: BUG: soft lockup - CPU#23 stuck for 23s! [nfsd:12603]
Oct 22 18:37:12 urquell kernel: Modules linked in:
Oct 22 18:37:12 urquell kernel: CPU: 23 PID: 12603 Comm: nfsd Not tainted 3.17.1-urquell-slabdebug #2
Oct 22 18:37:12 urquell kernel: Hardware name: SGI.COM SGI MIS Server/S2600JF, BIOS SE5C600.86B.01.03.0002.062020121504 06/20/2012
Oct 22 18:37:12 urquell kernel: task: ffff880fbb321000 ti: ffff880fbaf58000 task.ti: ffff880fbaf58000
Oct 22 18:37:12 urquell kernel: RIP: 0010:[<ffffffff82511685>]  [<ffffffff82511685>] _raw_spin_unlock_irqrestore+0x5/0x10
Oct 22 18:37:12 urquell kernel: RSP: 0018:ffff880fbaf5bde0  EFLAGS: 00000296
Oct 22 18:37:12 urquell kernel: RAX: ffffffff82841930 RBX: ffffffff820963db RCX: 0000000000000000
Oct 22 18:37:12 urquell kernel: RDX: ffffffff82841948 RSI: 0000000000000296 RDI: ffffffff82841940
Oct 22 18:37:12 urquell kernel: RBP: ffff880fd1ac52d0 R08: 0000000000000000 R09: 0000000000000000
Oct 22 18:37:12 urquell kernel: R10: ffff880fd1ac5b28 R11: ffff880fff804780 R12: 00000000ffffffff
Oct 22 18:37:12 urquell kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Oct 22 18:37:12 urquell kernel: FS:  0000000000000000(0000) GS:ffff880fffde0000(0000) knlGS:0000000000000000
Oct 22 18:37:12 urquell kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 22 18:37:12 urquell kernel: CR2: 00007f22e0b01400 CR3: 000000000280e000 CR4: 00000000000407e0
Oct 22 18:37:12 urquell kernel: Stack:
Oct 22 18:37:12 urquell kernel: ffffffff821dd282 ffff880fbaf5bde8 ffff880fbaf5bde8 ffff881ff785c6d8
Oct 22 18:37:12 urquell kernel: ffff880fd1ac52d0 ffff881ff785c6d8 0000000000000020 ffffffff8285dac0
Oct 22 18:37:12 urquell kernel: ffffffff821e3550 ffff880fbaf5be28 ffff880fbaf5be28 ffff881ff785c6d8
Oct 22 18:37:12 urquell kernel: Call Trace:
Oct 22 18:37:12 urquell kernel: [<ffffffff821dd282>] ? __destroy_client+0xd2/0x130
Oct 22 18:37:12 urquell kernel: [<ffffffff821e3550>] ? nfs4_state_shutdown_net+0x150/0x1e0
Oct 22 18:37:12 urquell kernel: [<ffffffff821c1152>] ? nfsd_shutdown_net+0x32/0x60
Oct 22 18:37:12 urquell kernel: [<ffffffff821c11da>] ? nfsd_last_thread+0x5a/0x90
Oct 22 18:37:12 urquell kernel: [<ffffffff821c163b>] ? nfsd_destroy+0x4b/0x70
Oct 22 18:37:12 urquell kernel: [<ffffffff821c17a4>] ? nfsd+0x144/0x170
Oct 22 18:37:12 urquell kernel: [<ffffffff821c1660>] ? nfsd_destroy+0x70/0x70
Oct 22 18:37:12 urquell kernel: [<ffffffff8207f38c>] ? kthread+0xbc/0xe0
Oct 22 18:37:12 urquell kernel: [<ffffffff8207f2d0>] ? __kthread_parkme+0x70/0x70
Oct 22 18:37:12 urquell kernel: [<ffffffff82511dac>] ? ret_from_fork+0x7c/0xb0
Oct 22 18:37:12 urquell kernel: [<ffffffff8207f2d0>] ? __kthread_parkme+0x70/0x70

Next one, shortly after, another task and core:

Oct 22 18:37:40 urquell kernel: NMI watchdog: BUG: soft lockup - CPU#9 stuck for 23s! [migration/9:54]
Oct 22 18:37:40 urquell kernel: Modules linked in:
Oct 22 18:37:40 urquell kernel: CPU: 9 PID: 54 Comm: migration/9 Tainted: G             L 3.17.1-urquell-slabdebug #2
Oct 22 18:37:40 urquell kernel: Hardware name: SGI.COM SGI MIS Server/S2600JF, BIOS SE5C600.86B.01.03.0002.062020121504 06/20/2012
Oct 22 18:37:40 urquell kernel: task: ffff881ff932a000 ti: ffff881ff888c000 task.ti: ffff881ff888c000
Oct 22 18:37:40 urquell kernel: RIP: 0010:[<ffffffff820c51ec>]  [<ffffffff820c51ec>] multi_cpu_stop+0x4c/0xb0
Oct 22 18:37:40 urquell kernel: RSP: 0000:ffff881ff888fd98  EFLAGS: 00000293
Oct 22 18:37:40 urquell kernel: RAX: 0000000000000000 RBX: ffff881ff932a000 RCX: ffff88203fc2d5a8
Oct 22 18:37:40 urquell kernel: RDX: 0000000000000001 RSI: 0000000000000246 RDI: ffff881ff412bbb0
Oct 22 18:37:40 urquell kernel: RBP: ffff881ff412bbb0 R08: ffff881ff888c000 R09: 0000000000000001
Oct 22 18:37:40 urquell kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff881ff6e18d68
Oct 22 18:37:40 urquell kernel: R13: 0000000000000046 R14: ffffffff8209197b R15: ffff88203fc31200
Oct 22 18:37:40 urquell kernel: FS:  0000000000000000(0000) GS:ffff88203fc20000(0000) knlGS:0000000000000000
Oct 22 18:37:40 urquell kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 22 18:37:40 urquell kernel: CR2: 00007fd3488e5640 CR3: 000000000280e000 CR4: 00000000000407e0
Oct 22 18:37:40 urquell kernel: Stack:
Oct 22 18:37:40 urquell kernel: 0000000000000000 ffff88203fc2d5a0 ffff881ff412bbd8 ffff88203fc2d5a8
Oct 22 18:37:40 urquell kernel: ffff881ff412bbb0 ffffffff820c4e50 ffff880fffc11200 ffffffff82084351
Oct 22 18:37:40 urquell kernel: 0000000000000001 ffff881ff92b8000 0000000000000000 0000000000000000
Oct 22 18:37:40 urquell kernel: Call Trace:
Oct 22 18:37:40 urquell kernel: [<ffffffff820c4e50>] ? cpu_stopper_thread+0x70/0x120
Oct 22 18:37:40 urquell kernel: [<ffffffff82084351>] ? ttwu_do_wakeup+0x11/0x80
Oct 22 18:37:40 urquell kernel: [<ffffffff8208254c>] ? smpboot_thread_fn+0xfc/0x190
Oct 22 18:37:40 urquell kernel: [<ffffffff82082450>] ? SyS_setgroups+0x130/0x130
Oct 22 18:37:40 urquell kernel: [<ffffffff8207f38c>] ? kthread+0xbc/0xe0
Oct 22 18:37:40 urquell kernel: [<ffffffff8207f2d0>] ? __kthread_parkme+0x70/0x70
Oct 22 18:37:40 urquell kernel: [<ffffffff82511dac>] ? ret_from_fork+0x7c/0xb0
Oct 22 18:37:40 urquell kernel: [<ffffffff8207f2d0>] ? __kthread_parkme+0x70/0x70

These alternate in a series:

Oct 22 18:37:40 urquell kernel: NMI watchdog: BUG: soft lockup - CPU#23 stuck for 22s! [nfsd:12603]
[REMOVED]
Oct 22 18:37:40 urquell kernel: [<ffffffff821e3550>] ? nfs4_state_shutdown_net+0x150/0x1e0
Oct 22 18:37:40 urquell kernel: [<ffffffff821c1152>] ? nfsd_shutdown_net+0x32/0x60
Oct 22 18:37:40 urquell kernel: [<ffffffff821c11da>] ? nfsd_last_thread+0x5a/0x90
Oct 22 18:37:40 urquell kernel: [<ffffffff821c163b>] ? nfsd_destroy+0x4b/0x70
Oct 22 18:37:40 urquell kernel: [<ffffffff821c17a4>] ? nfsd+0x144/0x170
Oct 22 18:37:40 urquell kernel: [<ffffffff821c1660>] ? nfsd_destroy+0x70/0x70
Oct 22 18:37:40 urquell kernel: [<ffffffff8207f38c>] ? kthread+0xbc/0xe0
Oct 22 18:37:40 urquell kernel: [<ffffffff8207f2d0>] ? __kthread_parkme+0x70/0x70
Oct 22 18:37:40 urquell kernel: [<ffffffff82511dac>] ? ret_from_fork+0x7c/0xb0
Oct 22 18:37:40 urquell kernel: [<ffffffff8207f2d0>] ? __kthread_parkme+0x70/0x70

Oct 22 18:38:08 urquell kernel: NMI watchdog: BUG: soft lockup - CPU#9 stuck for 23s! [migration/9:54]
[REMOVED]
Oct 22 18:38:08 urquell kernel: [<ffffffff820c4e50>] ? cpu_stopper_thread+0x70/0x120
Oct 22 18:38:08 urquell kernel: [<ffffffff82084351>] ? ttwu_do_wakeup+0x11/0x80
Oct 22 18:38:08 urquell kernel: [<ffffffff8208254c>] ? smpboot_thread_fn+0xfc/0x190
Oct 22 18:38:08 urquell kernel: [<ffffffff82082450>] ? SyS_setgroups+0x130/0x130
Oct 22 18:38:08 urquell kernel: [<ffffffff8207f38c>] ? kthread+0xbc/0xe0
Oct 22 18:38:08 urquell kernel: [<ffffffff8207f2d0>] ? __kthread_parkme+0x70/0x70
Oct 22 18:38:08 urquell kernel: [<ffffffff82511dac>] ? ret_from_fork+0x7c/0xb0
Oct 22 18:38:08 urquell kernel: [<ffffffff8207f2d0>] ? __kthread_parkme+0x70/0x70

Etc. There's some variation in the call trace:

Oct 22 18:38:44 urquell kernel: NMI watchdog: BUG: soft lockup - CPU#23 stuck for 23s! [nfsd:12603]
[REMOVED]
Oct 22 18:38:44 urquell kernel: [<ffffffff82096093>] ? __wake_up+0x43/0x70
Oct 22 18:38:44 urquell kernel: [<ffffffff821dcf49>] ? nfs4_put_stid+0x29/0x100
Oct 22 18:38:44 urquell kernel: [<ffffffff821dd282>] ? __destroy_client+0xd2/0x130
Oct 22 18:38:44 urquell kernel: [<ffffffff821e3550>] ? nfs4_state_shutdown_net+0x150/0x1e0
Oct 22 18:38:44 urquell kernel: [<ffffffff821c1152>] ? nfsd_shutdown_net+0x32/0x60
Oct 22 18:38:44 urquell kernel: [<ffffffff821c11da>] ? nfsd_last_thread+0x5a/0x90
Oct 22 18:38:44 urquell kernel: [<ffffffff821c163b>] ? nfsd_destroy+0x4b/0x70
Oct 22 18:38:44 urquell kernel: [<ffffffff821c17a4>] ? nfsd+0x144/0x170
Oct 22 18:38:44 urquell kernel: [<ffffffff821c1660>] ? nfsd_destroy+0x70/0x70
Oct 22 18:38:44 urquell kernel: [<ffffffff8207f38c>] ? kthread+0xbc/0xe0
Oct 22 18:38:44 urquell kernel: [<ffffffff8207f2d0>] ? __kthread_parkme+0x70/0x70
Oct 22 18:38:44 urquell kernel: [<ffffffff82511dac>] ? ret_from_fork+0x7c/0xb0
Oct 22 18:38:44 urquell kernel: [<ffffffff8207f2d0>] ? __kthread_parkme+0x70/0x70


Then we rebooted. The next NMI appeared the next day:

Oct 23 11:43:24 urquell kernel: NMI watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [kworker/u64:3:6414]
Oct 23 11:43:24 urquell kernel: Modules linked in:
Oct 23 11:43:24 urquell kernel: CPU: 5 PID: 6414 Comm: kworker/u64:3 Not tainted 3.17.1-urquell-slabdebug #2
Oct 23 11:43:24 urquell kernel: Hardware name: SGI.COM SGI MIS Server/S2600JF, BIOS SE5C600.86B.01.03.0002.062020121504 06/20/2012
Oct 23 11:43:24 urquell kernel: Workqueue: nfsd4 laundromat_main
Oct 23 11:43:24 urquell kernel: task: ffff881ff0d49800 ti: ffff8813f5a04000 task.ti: ffff8813f5a04000
Oct 23 11:43:24 urquell kernel: RIP: 0010:[<ffffffff85245cc8>]  [<ffffffff85245cc8>] _atomic_dec_and_lock+0x18/0x60
Oct 23 11:43:24 urquell kernel: RSP: 0018:ffff8813f5a07d08  EFLAGS: 00000246
Oct 23 11:43:24 urquell kernel: RAX: 0000000000000000 RBX: ffffffff85096290 RCX: 00000000ffffffff
Oct 23 11:43:24 urquell kernel: RDX: 0000000000000000 RSI: ffffffff850963db RDI: ffff8813f5a07cf8
Oct 23 11:43:24 urquell kernel: RBP: ffffffff85096093 R08: 0000000000000000 R09: 0000000000000001
Oct 23 11:43:24 urquell kernel: R10: 0000000000000001 R11: 0140000000000000 R12: 0000000000000000
Oct 23 11:43:24 urquell kernel: R13: ffff880ff5c752d0 R14: ffffffff851bdaa0 R15: 0000000000000206
Oct 23 11:43:24 urquell kernel: FS:  0000000000000000(0000) GS:ffff880fffca0000(0000) knlGS:0000000000000000
Oct 23 11:43:24 urquell kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 23 11:43:24 urquell kernel: CR2: ffffffffff600400 CR3: 000000000580e000 CR4: 00000000000407e0
Oct 23 11:43:24 urquell kernel: Stack:
Oct 23 11:43:24 urquell kernel: ffff881fee03dba0 ffffffff85096093 ffff8813f5a07cf8 ffffffff851dcf49
Oct 23 11:43:24 urquell kernel: ffff881fee039950 ffff881fee0398d8 ffff8813f5a07d48 ffffffff851dd282
Oct 23 11:43:24 urquell kernel: ffff8813f5a07d48 ffff8813f5a07d48 ffff8813f5a07db8 ffff881ff7fbbd98
Oct 23 11:43:24 urquell kernel: Call Trace:
Oct 23 11:43:24 urquell kernel: [<ffffffff85096093>] ? __wake_up+0x43/0x70
Oct 23 11:43:24 urquell kernel: [<ffffffff851dcf49>] ? nfs4_put_stid+0x29/0x100
Oct 23 11:43:24 urquell kernel: [<ffffffff851dd282>] ? __destroy_client+0xd2/0x130
Oct 23 11:43:24 urquell kernel: [<ffffffff851dd449>] ? laundromat_main+0x149/0x3f0
Oct 23 11:43:24 urquell kernel: [<ffffffff8507a6a4>] ? process_one_work+0x134/0x340
Oct 23 11:43:24 urquell kernel: [<ffffffff8507adb4>] ? worker_thread+0x114/0x460
Oct 23 11:43:24 urquell kernel: [<ffffffff85095b5c>] ? __wake_up_common+0x4c/0x80
Oct 23 11:43:24 urquell kernel: [<ffffffff8507aca0>] ? init_pwq+0x160/0x160
Oct 23 11:43:24 urquell kernel: [<ffffffff8507f38c>] ? kthread+0xbc/0xe0
Oct 23 11:43:24 urquell kernel: [<ffffffff8507f2d0>] ? __kthread_parkme+0x70/0x70
Oct 23 11:43:24 urquell kernel: [<ffffffff85511dac>] ? ret_from_fork+0x7c/0xb0
Oct 23 11:43:24 urquell kernel: [<ffffffff8507f2d0>] ? __kthread_parkme+0x70/0x70

Oct 23 11:43:52 urquell kernel: NMI watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [kworker/u64:3:6414]
Oct 23 11:43:52 urquell kernel: Modules linked in:
Oct 23 11:43:52 urquell kernel: CPU: 5 PID: 6414 Comm: kworker/u64:3 Tainted: G             L 3.17.1-urquell-slabdebug #2
Oct 23 11:43:52 urquell kernel: Hardware name: SGI.COM SGI MIS Server/S2600JF, BIOS SE5C600.86B.01.03.0002.062020121504 06/20/2012
Oct 23 11:43:52 urquell kernel: Workqueue: nfsd4 laundromat_main
Oct 23 11:43:52 urquell kernel: task: ffff881ff0d49800 ti: ffff8813f5a04000 task.ti: ffff8813f5a04000
Oct 23 11:43:52 urquell kernel: RIP: 0010:[<ffffffff85511685>]  [<ffffffff85511685>] _raw_spin_unlock_irqrestore+0x5/0x10
Oct 23 11:43:52 urquell kernel: RSP: 0018:ffff8813f5a07d40  EFLAGS: 00000296
Oct 23 11:43:52 urquell kernel: RAX: ffffffff85841930 RBX: ffffffff850963db RCX: 0000000000000000
Oct 23 11:43:52 urquell kernel: RDX: ffffffff85841948 RSI: 0000000000000296 RDI: ffffffff85841940
Oct 23 11:43:52 urquell kernel: RBP: ffff881fee0398d8 R08: 0000000000000000 R09: 0000000000000001
Oct 23 11:43:52 urquell kernel: R10: 0000000000000001 R11: 0140000000000000 R12: ffff8813f5a07d48
Oct 23 11:43:52 urquell kernel: R13: ffff8813f5a07d48 R14: 0000000000000000 R15: 0000000000000001
Oct 23 11:43:52 urquell kernel: FS:  0000000000000000(0000) GS:ffff880fffca0000(0000) knlGS:0000000000000000
Oct 23 11:43:52 urquell kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 23 11:43:52 urquell kernel: CR2: ffffffffff600400 CR3: 000000000580e000 CR4: 00000000000407e0
Oct 23 11:43:52 urquell kernel: Stack:
Oct 23 11:43:52 urquell kernel: ffffffff851dd282 ffff8813f5a07d48 ffff8813f5a07d48 ffff8813f5a07db8
Oct 23 11:43:52 urquell kernel: ffff881ff7fbbd98 ffff8813f5a07db8 ffff881ff7fbbeb0 000000005449058d
Oct 23 11:43:52 urquell kernel: ffffffff851dd449 0000000000000000 0000000000000000 0000000000000000
Oct 23 11:43:52 urquell kernel: Call Trace:
Oct 23 11:43:52 urquell kernel: [<ffffffff851dd282>] ? __destroy_client+0xd2/0x130
Oct 23 11:43:52 urquell kernel: [<ffffffff851dd449>] ? laundromat_main+0x149/0x3f0
Oct 23 11:43:52 urquell kernel: [<ffffffff8507a6a4>] ? process_one_work+0x134/0x340
Oct 23 11:43:52 urquell kernel: [<ffffffff8507adb4>] ? worker_thread+0x114/0x460
Oct 23 11:43:52 urquell kernel: [<ffffffff85095b5c>] ? __wake_up_common+0x4c/0x80
Oct 23 11:43:52 urquell kernel: [<ffffffff8507aca0>] ? init_pwq+0x160/0x160
Oct 23 11:43:52 urquell kernel: [<ffffffff8507f38c>] ? kthread+0xbc/0xe0
Oct 23 11:43:52 urquell kernel: [<ffffffff8507f2d0>] ? __kthread_parkme+0x70/0x70
Oct 23 11:43:52 urquell kernel: [<ffffffff85511dac>] ? ret_from_fork+0x7c/0xb0
Oct 23 11:43:52 urquell kernel: [<ffffffff8507f2d0>] ? __kthread_parkme+0x70/0x70

And it repeats, always the same kworker on the same core.

We then reverted to 3.14.21, since an uptime of ~5h under load is not viable
on this mission-critical server.
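For reference, the periodic SUnreclaim sampling used throughout this thread
amounts to something like the sketch below (assuming a Linux /proc; the
function name, interval, and sample count are illustrative, not from the
original setup):

```shell
#!/bin/sh
# Minimal sketch: sample SUnreclaim (kB) from /proc/meminfo.
# Usage: sample_sunreclaim <interval_seconds> <count>
sample_sunreclaim() {
    i=0
    while [ "$i" -lt "$2" ]; do
        # SUnreclaim is the unreclaimable part of Slab in /proc/meminfo
        awk '/^SUnreclaim:/ {print $2}' /proc/meminfo
        i=$((i + 1))
        if [ "$i" -lt "$2" ]; then
            sleep "$1"
        fi
    done
}
# e.g. one sample every 4h for a week: sample_sunreclaim 14400 42
```

A monotonically increasing series from this, as in the numbers at the top of
the thread, is what distinguishes a leak from normal slab cache growth.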



* Re: massive memory leak in 3.1[3-5] with nfs4+kerberos
  2014-10-28 19:12           ` Carlos Carvalho
@ 2014-10-28 19:29             ` J. Bruce Fields
  2014-10-28 19:37               ` Carlos Carvalho
  0 siblings, 1 reply; 9+ messages in thread
From: J. Bruce Fields @ 2014-10-28 19:29 UTC (permalink / raw)
  To: Carlos Carvalho; +Cc: linux-nfs

On Tue, Oct 28, 2014 at 05:12:29PM -0200, Carlos Carvalho wrote:
> J. Bruce Fields (bfields@fieldses.org) wrote on Tue, Oct 28, 2014 at 12:24:50PM BRST:
> > On Tue, Oct 28, 2014 at 12:14:28PM -0200, Carlos Carvalho wrote:
> > > J. Bruce Fields (bfields@fieldses.org) wrote on Tue, Oct 14, 2014 at 05:42:45PM BRT:
> > > > On Mon, Oct 13, 2014 at 08:50:27PM -0300, Carlos Carvalho wrote:
> > > > > J. Bruce Fields (bfields@fieldses.org) wrote on Mon, Oct 13, 2014 at 10:58:40AM BRT:
> > > > > Note the big xprt_alloc. slabinfo is found in the kernel tree at tools/vm.
> > > > > Another way to see it:
> > > > > 
> > > > > urquell# sort -n /sys/kernel/slab/kmalloc-2048/alloc_calls | tail -n 2
> > > > >    1519 nfsd4_create_session+0x24a/0x810 age=189221/25894524/71426273 pid=5372-5436 cpus=0-11,13-16,19-20 nodes=0-1
> > > > > 3380755 xprt_alloc+0x1e/0x190 age=5/27767270/71441075 pid=6-32599 cpus=0-31 nodes=0-1
> > > > 
> > > > Agreed that the xprt_alloc is suspicious, though I don't really
> > > > understand these statistics.
> > > > 
> > > > Since you have 4.1 clients, maybe this would be explained by a leak in
> > > > the backchannel code.
> > > 
> > > We've set clients to use 4.0 and it only made the problem worse; the growth in
> > > unreclaimable memory was faster.
> > > 
> > > > It could certainly still be worth testing 3.17 if possible.
> > > 
> > > We tested it and it SEEMS the problem doesn't appear in 3.17.1; the SUnreclaim
> > > value oscillates up and down as usual, instead of increasing monotonically.
> > > However it didn't last long enough for us to get conclusive numbers because
> > > after about 5-6h the machine fills the screen with "NMI watchdog CPU #... is
> > > locked for more than 22s".
> > 
> > Are the backtraces with those messages?
> 
> First one, nfsd:
> 
> Oct 22 18:37:12 urquell kernel: NMI watchdog: BUG: soft lockup - CPU#23 stuck for 23s! [nfsd:12603]
> Oct 22 18:37:12 urquell kernel: Modules linked in:
> Oct 22 18:37:12 urquell kernel: CPU: 23 PID: 12603 Comm: nfsd Not tainted 3.17.1-urquell-slabdebug #2
> Oct 22 18:37:12 urquell kernel: Hardware name: SGI.COM SGI MIS Server/S2600JF, BIOS SE5C600.86B.01.03.0002.062020121504 06/20/2012
> Oct 22 18:37:12 urquell kernel: task: ffff880fbb321000 ti: ffff880fbaf58000 task.ti: ffff880fbaf58000
> Oct 22 18:37:12 urquell kernel: RIP: 0010:[<ffffffff82511685>]  [<ffffffff82511685>] _raw_spin_unlock_irqrestore+0x5/0x10
> Oct 22 18:37:12 urquell kernel: RSP: 0018:ffff880fbaf5bde0  EFLAGS: 00000296
> Oct 22 18:37:12 urquell kernel: RAX: ffffffff82841930 RBX: ffffffff820963db RCX: 0000000000000000
> Oct 22 18:37:12 urquell kernel: RDX: ffffffff82841948 RSI: 0000000000000296 RDI: ffffffff82841940
> Oct 22 18:37:12 urquell kernel: RBP: ffff880fd1ac52d0 R08: 0000000000000000 R09: 0000000000000000
> Oct 22 18:37:12 urquell kernel: R10: ffff880fd1ac5b28 R11: ffff880fff804780 R12: 00000000ffffffff
> Oct 22 18:37:12 urquell kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> Oct 22 18:37:12 urquell kernel: FS:  0000000000000000(0000) GS:ffff880fffde0000(0000) knlGS:0000000000000000
> Oct 22 18:37:12 urquell kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Oct 22 18:37:12 urquell kernel: CR2: 00007f22e0b01400 CR3: 000000000280e000 CR4: 00000000000407e0
> Oct 22 18:37:12 urquell kernel: Stack:
> Oct 22 18:37:12 urquell kernel: ffffffff821dd282 ffff880fbaf5bde8 ffff880fbaf5bde8 ffff881ff785c6d8
> Oct 22 18:37:12 urquell kernel: ffff880fd1ac52d0 ffff881ff785c6d8 0000000000000020 ffffffff8285dac0
> Oct 22 18:37:12 urquell kernel: ffffffff821e3550 ffff880fbaf5be28 ffff880fbaf5be28 ffff881ff785c6d8
> Oct 22 18:37:12 urquell kernel: Call Trace:
> Oct 22 18:37:12 urquell kernel: [<ffffffff821dd282>] ? __destroy_client+0xd2/0x130
> Oct 22 18:37:12 urquell kernel: [<ffffffff821e3550>] ? nfs4_state_shutdown_net+0x150/0x1e0
> Oct 22 18:37:12 urquell kernel: [<ffffffff821c1152>] ? nfsd_shutdown_net+0x32/0x60
> Oct 22 18:37:12 urquell kernel: [<ffffffff821c11da>] ? nfsd_last_thread+0x5a/0x90
> Oct 22 18:37:12 urquell kernel: [<ffffffff821c163b>] ? nfsd_destroy+0x4b/0x70
> Oct 22 18:37:12 urquell kernel: [<ffffffff821c17a4>] ? nfsd+0x144/0x170
> Oct 22 18:37:12 urquell kernel: [<ffffffff821c1660>] ? nfsd_destroy+0x70/0x70
> Oct 22 18:37:12 urquell kernel: [<ffffffff8207f38c>] ? kthread+0xbc/0xe0
> Oct 22 18:37:12 urquell kernel: [<ffffffff8207f2d0>] ? __kthread_parkme+0x70/0x70
> Oct 22 18:37:12 urquell kernel: [<ffffffff82511dac>] ? ret_from_fork+0x7c/0xb0
> Oct 22 18:37:12 urquell kernel: [<ffffffff8207f2d0>] ? __kthread_parkme+0x70/0x70

That means the server was shutting down.  Is this some kind of cluster
failover or were you shutting it down by hand?

...
> Then reboot. Next NMI appeared next day:
> 
> Oct 23 11:43:24 urquell kernel: NMI watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [kworker/u64:3:6414]
> Oct 23 11:43:24 urquell kernel: Modules linked in:
> Oct 23 11:43:24 urquell kernel: CPU: 5 PID: 6414 Comm: kworker/u64:3 Not tainted 3.17.1-urquell-slabdebug #2
> Oct 23 11:43:24 urquell kernel: Hardware name: SGI.COM SGI MIS Server/S2600JF, BIOS SE5C600.86B.01.03.0002.062020121504 06/20/2012
> Oct 23 11:43:24 urquell kernel: Workqueue: nfsd4 laundromat_main
> Oct 23 11:43:24 urquell kernel: task: ffff881ff0d49800 ti: ffff8813f5a04000 task.ti: ffff8813f5a04000
> Oct 23 11:43:24 urquell kernel: RIP: 0010:[<ffffffff85245cc8>]  [<ffffffff85245cc8>] _atomic_dec_and_lock+0x18/0x60
> Oct 23 11:43:24 urquell kernel: RSP: 0018:ffff8813f5a07d08  EFLAGS: 00000246
> Oct 23 11:43:24 urquell kernel: RAX: 0000000000000000 RBX: ffffffff85096290 RCX: 00000000ffffffff
> Oct 23 11:43:24 urquell kernel: RDX: 0000000000000000 RSI: ffffffff850963db RDI: ffff8813f5a07cf8
> Oct 23 11:43:24 urquell kernel: RBP: ffffffff85096093 R08: 0000000000000000 R09: 0000000000000001
> Oct 23 11:43:24 urquell kernel: R10: 0000000000000001 R11: 0140000000000000 R12: 0000000000000000
> Oct 23 11:43:24 urquell kernel: R13: ffff880ff5c752d0 R14: ffffffff851bdaa0 R15: 0000000000000206
> Oct 23 11:43:24 urquell kernel: FS:  0000000000000000(0000) GS:ffff880fffca0000(0000) knlGS:0000000000000000
> Oct 23 11:43:24 urquell kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Oct 23 11:43:24 urquell kernel: CR2: ffffffffff600400 CR3: 000000000580e000 CR4: 00000000000407e0
> Oct 23 11:43:24 urquell kernel: Stack:
> Oct 23 11:43:24 urquell kernel: ffff881fee03dba0 ffffffff85096093 ffff8813f5a07cf8 ffffffff851dcf49
> Oct 23 11:43:24 urquell kernel: ffff881fee039950 ffff881fee0398d8 ffff8813f5a07d48 ffffffff851dd282
> Oct 23 11:43:24 urquell kernel: ffff8813f5a07d48 ffff8813f5a07d48 ffff8813f5a07db8 ffff881ff7fbbd98
> Oct 23 11:43:24 urquell kernel: Call Trace:
> Oct 23 11:43:24 urquell kernel: [<ffffffff85096093>] ? __wake_up+0x43/0x70
> Oct 23 11:43:24 urquell kernel: [<ffffffff851dcf49>] ? nfs4_put_stid+0x29/0x100
> Oct 23 11:43:24 urquell kernel: [<ffffffff851dd282>] ? __destroy_client+0xd2/0x130
> Oct 23 11:43:24 urquell kernel: [<ffffffff851dd449>] ? laundromat_main+0x149/0x3f0
> Oct 23 11:43:24 urquell kernel: [<ffffffff8507a6a4>] ? process_one_work+0x134/0x340
> Oct 23 11:43:24 urquell kernel: [<ffffffff8507adb4>] ? worker_thread+0x114/0x460
> Oct 23 11:43:24 urquell kernel: [<ffffffff85095b5c>] ? __wake_up_common+0x4c/0x80
> Oct 23 11:43:24 urquell kernel: [<ffffffff8507aca0>] ? init_pwq+0x160/0x160
> Oct 23 11:43:24 urquell kernel: [<ffffffff8507f38c>] ? kthread+0xbc/0xe0
> Oct 23 11:43:24 urquell kernel: [<ffffffff8507f2d0>] ? __kthread_parkme+0x70/0x70
> Oct 23 11:43:24 urquell kernel: [<ffffffff85511dac>] ? ret_from_fork+0x7c/0xb0
> Oct 23 11:43:24 urquell kernel: [<ffffffff8507f2d0>] ? __kthread_parkme+0x70/0x70

Hm.  So maybe we forgot to drop cl_lock somewhere?

--b.

> 
> Oct 23 11:43:52 urquell kernel: NMI watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [kworker/u64:3:6414]
> Oct 23 11:43:52 urquell kernel: Modules linked in:
> Oct 23 11:43:52 urquell kernel: CPU: 5 PID: 6414 Comm: kworker/u64:3 Tainted: G             L 3.17.1-urquell-slabdebug #2
> Oct 23 11:43:52 urquell kernel: Hardware name: SGI.COM SGI MIS Server/S2600JF, BIOS SE5C600.86B.01.03.0002.062020121504 06/20/2012
> Oct 23 11:43:52 urquell kernel: Workqueue: nfsd4 laundromat_main
> Oct 23 11:43:52 urquell kernel: task: ffff881ff0d49800 ti: ffff8813f5a04000 task.ti: ffff8813f5a04000
> Oct 23 11:43:52 urquell kernel: RIP: 0010:[<ffffffff85511685>]  [<ffffffff85511685>] _raw_spin_unlock_irqrestore+0x5/0x10
> Oct 23 11:43:52 urquell kernel: RSP: 0018:ffff8813f5a07d40  EFLAGS: 00000296
> Oct 23 11:43:52 urquell kernel: RAX: ffffffff85841930 RBX: ffffffff850963db RCX: 0000000000000000
> Oct 23 11:43:52 urquell kernel: RDX: ffffffff85841948 RSI: 0000000000000296 RDI: ffffffff85841940
> Oct 23 11:43:52 urquell kernel: RBP: ffff881fee0398d8 R08: 0000000000000000 R09: 0000000000000001
> Oct 23 11:43:52 urquell kernel: R10: 0000000000000001 R11: 0140000000000000 R12: ffff8813f5a07d48
> Oct 23 11:43:52 urquell kernel: R13: ffff8813f5a07d48 R14: 0000000000000000 R15: 0000000000000001
> Oct 23 11:43:52 urquell kernel: FS:  0000000000000000(0000) GS:ffff880fffca0000(0000) knlGS:0000000000000000
> Oct 23 11:43:52 urquell kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Oct 23 11:43:52 urquell kernel: CR2: ffffffffff600400 CR3: 000000000580e000 CR4: 00000000000407e0
> Oct 23 11:43:52 urquell kernel: Stack:
> Oct 23 11:43:52 urquell kernel: ffffffff851dd282 ffff8813f5a07d48 ffff8813f5a07d48 ffff8813f5a07db8
> Oct 23 11:43:52 urquell kernel: ffff881ff7fbbd98 ffff8813f5a07db8 ffff881ff7fbbeb0 000000005449058d
> Oct 23 11:43:52 urquell kernel: ffffffff851dd449 0000000000000000 0000000000000000 0000000000000000
> Oct 23 11:43:52 urquell kernel: Call Trace:
> Oct 23 11:43:52 urquell kernel: [<ffffffff851dd282>] ? __destroy_client+0xd2/0x130
> Oct 23 11:43:52 urquell kernel: [<ffffffff851dd449>] ? laundromat_main+0x149/0x3f0
> Oct 23 11:43:52 urquell kernel: [<ffffffff8507a6a4>] ? process_one_work+0x134/0x340
> Oct 23 11:43:52 urquell kernel: [<ffffffff8507adb4>] ? worker_thread+0x114/0x460
> Oct 23 11:43:52 urquell kernel: [<ffffffff85095b5c>] ? __wake_up_common+0x4c/0x80
> Oct 23 11:43:52 urquell kernel: [<ffffffff8507aca0>] ? init_pwq+0x160/0x160
> Oct 23 11:43:52 urquell kernel: [<ffffffff8507f38c>] ? kthread+0xbc/0xe0
> Oct 23 11:43:52 urquell kernel: [<ffffffff8507f2d0>] ? __kthread_parkme+0x70/0x70
> Oct 23 11:43:52 urquell kernel: [<ffffffff85511dac>] ? ret_from_fork+0x7c/0xb0
> Oct 23 11:43:52 urquell kernel: [<ffffffff8507f2d0>] ? __kthread_parkme+0x70/0x70
> 
> And it repeats, always kworker at the same core.
> 
> We then reverted to 3.14.21, since an uptime of ~5h (under load) is not viable
> in this mission critical server.
> 
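For anyone reproducing this: the alloc_calls data quoted above comes from
SLUB's per-call-site tracking, which is only available on a CONFIG_SLUB_DEBUG
kernel booted with slub_debug=U (presumably what the "-slabdebug" build in
these traces enables). A rough sketch of the summarizing step, with a
hypothetical helper name:

```shell
#!/bin/sh
# Sketch: show the top allocation call sites for a SLUB cache, as in
# the kmalloc-2048 listing quoted earlier in the thread.
# Usage: top_alloc_calls <cache-name> [how-many]
top_alloc_calls() {
    f="/sys/kernel/slab/$1/alloc_calls"
    if [ -r "$f" ]; then
        # each line starts with an allocation count, so a reverse
        # numeric sort puts the biggest call sites first
        sort -rn "$f" | head -n "${2:-5}"
    else
        echo "no alloc_calls for $1 (slub_debug=U not enabled?)" >&2
        return 1
    fi
}
# e.g.: top_alloc_calls kmalloc-2048 2
```

A single call site (like xprt_alloc above) dominating the count and growing
over time is the usual signature of a slab leak.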


* Re: massive memory leak in 3.1[3-5] with nfs4+kerberos
  2014-10-28 19:29             ` J. Bruce Fields
@ 2014-10-28 19:37               ` Carlos Carvalho
  0 siblings, 0 replies; 9+ messages in thread
From: Carlos Carvalho @ 2014-10-28 19:37 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

J. Bruce Fields (bfields@fieldses.org) wrote on Tue, Oct 28, 2014 at 05:29:40PM BRST:
> On Tue, Oct 28, 2014 at 05:12:29PM -0200, Carlos Carvalho wrote:
> > J. Bruce Fields (bfields@fieldses.org) wrote on Tue, Oct 28, 2014 at 12:24:50PM BRST:
> > > On Tue, Oct 28, 2014 at 12:14:28PM -0200, Carlos Carvalho wrote:
> > > > J. Bruce Fields (bfields@fieldses.org) wrote on Tue, Oct 14, 2014 at 05:42:45PM BRT:
> > > > > On Mon, Oct 13, 2014 at 08:50:27PM -0300, Carlos Carvalho wrote:
> > > > > > J. Bruce Fields (bfields@fieldses.org) wrote on Mon, Oct 13, 2014 at 10:58:40AM BRT:
> > > > > > Note the big xprt_alloc. slabinfo is found in the kernel tree at tools/vm.
> > > > > > Another way to see it:
> > > > > > 
> > > > > > urquell# sort -n /sys/kernel/slab/kmalloc-2048/alloc_calls | tail -n 2
> > > > > >    1519 nfsd4_create_session+0x24a/0x810 age=189221/25894524/71426273 pid=5372-5436 cpus=0-11,13-16,19-20 nodes=0-1
> > > > > > 3380755 xprt_alloc+0x1e/0x190 age=5/27767270/71441075 pid=6-32599 cpus=0-31 nodes=0-1
> > > > > 
> > > > > Agreed that the xprt_alloc is suspicious, though I don't really
> > > > > understand these statistics.
> > > > > 
> > > > > Since you have 4.1 clients, maybe this would be explained by a leak in
> > > > > the backchannel code.
> > > > 
> > > > We've set clients to use 4.0 and it only made the problem worse; the growth in
> > > > unreclaimable memory was faster.
> > > > 
> > > > > It could certainly still be worth testing 3.17 if possible.
> > > > 
> > > > We tested it and it SEEMS the problem doesn't appear in 3.17.1; the SUnreclaim
> > > > value oscillates up and down as usual, instead of increasing monotonically.
> > > > However it didn't last long enough for us to get conclusive numbers because
> > > > after about 5-6h the machine fills the screen with "NMI watchdog CPU #... is
> > > > locked for more than 22s".
> > > 
> > > Are the backtraces with those messages?
> > 
> > First one, nfsd:
> > 
> > Oct 22 18:37:12 urquell kernel: NMI watchdog: BUG: soft lockup - CPU#23 stuck for 23s! [nfsd:12603]
> > Oct 22 18:37:12 urquell kernel: Modules linked in:
> > Oct 22 18:37:12 urquell kernel: CPU: 23 PID: 12603 Comm: nfsd Not tainted 3.17.1-urquell-slabdebug #2
> > Oct 22 18:37:12 urquell kernel: Hardware name: SGI.COM SGI MIS Server/S2600JF, BIOS SE5C600.86B.01.03.0002.062020121504 06/20/2012
> > Oct 22 18:37:12 urquell kernel: task: ffff880fbb321000 ti: ffff880fbaf58000 task.ti: ffff880fbaf58000
> > Oct 22 18:37:12 urquell kernel: RIP: 0010:[<ffffffff82511685>]  [<ffffffff82511685>] _raw_spin_unlock_irqrestore+0x5/0x10
> > Oct 22 18:37:12 urquell kernel: RSP: 0018:ffff880fbaf5bde0  EFLAGS: 00000296
> > Oct 22 18:37:12 urquell kernel: RAX: ffffffff82841930 RBX: ffffffff820963db RCX: 0000000000000000
> > Oct 22 18:37:12 urquell kernel: RDX: ffffffff82841948 RSI: 0000000000000296 RDI: ffffffff82841940
> > Oct 22 18:37:12 urquell kernel: RBP: ffff880fd1ac52d0 R08: 0000000000000000 R09: 0000000000000000
> > Oct 22 18:37:12 urquell kernel: R10: ffff880fd1ac5b28 R11: ffff880fff804780 R12: 00000000ffffffff
> > Oct 22 18:37:12 urquell kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> > Oct 22 18:37:12 urquell kernel: FS:  0000000000000000(0000) GS:ffff880fffde0000(0000) knlGS:0000000000000000
> > Oct 22 18:37:12 urquell kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > Oct 22 18:37:12 urquell kernel: CR2: 00007f22e0b01400 CR3: 000000000280e000 CR4: 00000000000407e0
> > Oct 22 18:37:12 urquell kernel: Stack:
> > Oct 22 18:37:12 urquell kernel: ffffffff821dd282 ffff880fbaf5bde8 ffff880fbaf5bde8 ffff881ff785c6d8
> > Oct 22 18:37:12 urquell kernel: ffff880fd1ac52d0 ffff881ff785c6d8 0000000000000020 ffffffff8285dac0
> > Oct 22 18:37:12 urquell kernel: ffffffff821e3550 ffff880fbaf5be28 ffff880fbaf5be28 ffff881ff785c6d8
> > Oct 22 18:37:12 urquell kernel: Call Trace:
> > Oct 22 18:37:12 urquell kernel: [<ffffffff821dd282>] ? __destroy_client+0xd2/0x130
> > Oct 22 18:37:12 urquell kernel: [<ffffffff821e3550>] ? nfs4_state_shutdown_net+0x150/0x1e0
> > Oct 22 18:37:12 urquell kernel: [<ffffffff821c1152>] ? nfsd_shutdown_net+0x32/0x60
> > Oct 22 18:37:12 urquell kernel: [<ffffffff821c11da>] ? nfsd_last_thread+0x5a/0x90
> > Oct 22 18:37:12 urquell kernel: [<ffffffff821c163b>] ? nfsd_destroy+0x4b/0x70
> > Oct 22 18:37:12 urquell kernel: [<ffffffff821c17a4>] ? nfsd+0x144/0x170
> > Oct 22 18:37:12 urquell kernel: [<ffffffff821c1660>] ? nfsd_destroy+0x70/0x70
> > Oct 22 18:37:12 urquell kernel: [<ffffffff8207f38c>] ? kthread+0xbc/0xe0
> > Oct 22 18:37:12 urquell kernel: [<ffffffff8207f2d0>] ? __kthread_parkme+0x70/0x70
> > Oct 22 18:37:12 urquell kernel: [<ffffffff82511dac>] ? ret_from_fork+0x7c/0xb0
> > Oct 22 18:37:12 urquell kernel: [<ffffffff8207f2d0>] ? __kthread_parkme+0x70/0x70
> 
> That means the server was shutting down.

Not at this moment. Reboot happened about 10min later; the other traces
happened before I rebooted.

> Is this some kind of cluster failover or were you shutting it down by hand?

By hand, and there's no cluster.


Thread overview: 9+ messages
2014-10-11  3:36 massive memory leak in 3.1[3-5] with nfs4+kerberos Carlos Carvalho
2014-10-13 13:58 ` J. Bruce Fields
2014-10-13 23:50   ` Carlos Carvalho
2014-10-14 20:42     ` J. Bruce Fields
2014-10-28 14:14       ` Carlos Carvalho
2014-10-28 14:24         ` J. Bruce Fields
2014-10-28 19:12           ` Carlos Carvalho
2014-10-28 19:29             ` J. Bruce Fields
2014-10-28 19:37               ` Carlos Carvalho
