linux-kernel.vger.kernel.org archive mirror
* nfsd: memory leak when client does many file operations
@ 2024-03-24 19:57 Jan Schunk
  2024-03-24 20:14 ` [External] : " Chuck Lever III
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Schunk @ 2024-03-24 19:57 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Jeff Layton, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	linux-nfs, linux-kernel

Issue found on: v6.5.13, v6.6.13, v6.6.14, v6.6.20 and v6.8.1
Not found on: v6.4, v6.1.82 and below
Architectures: amd64 and arm(hf)

Steps to reproduce:
- Create a VM with 1GB RAM
- Install Debian 12
- Install linux-image-6.6.13+bpo-amd64-unsigned and nfs-kernel-server
- Export some folder
On the client:
- Mount the share
- Run a script that produces heavy usage on the share (like unpacking large tar archives that contain many small files into a git repository and committing them; a setup sketch follows below)
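
For reference, a minimal setup matching these steps could look like the
following sketch (export path, hostname and mount point are placeholders,
not taken from this report):

  # server: /etc/exports entry (placeholder path and options)
  /srv/export  *(rw,sync,no_subtree_check)

  # server: apply the export
  exportfs -ra

  # client: mount the share
  mount -t nfs server:/srv/export /mnt/nfs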

On my setup it takes 20-40 hours until the memory is full and the OOM killer gets invoked by nfsd to kill other processes. The memory stays full and the system reboots:

[121969.590000] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,task=dbus-daemon,pid=454,uid=101
[121969.600000] Out of memory: Killed process 454 (dbus-daemon) total-vm:6196kB, anon-rss:128kB, file-rss:1408kB, shmem-rss:0kB, UID:101 pgtables:12kB oom_score_adj:-900
[121971.700000] oom_reaper: reaped process 454 (dbus-daemon), now anon-rss:0kB, file-rss:64kB, shmem-rss:0kB
[121971.920000] nfsd invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
[121971.930000] CPU: 1 PID: 537 Comm: nfsd Not tainted 6.8.1+nas5xx #nas5xx
[121971.930000] Hardware name: Freescale LS1024A
[121971.940000]  unwind_backtrace from show_stack+0xb/0xc
[121971.940000]  show_stack from dump_stack_lvl+0x2b/0x34
[121971.950000]  dump_stack_lvl from dump_header+0x35/0x212
[121971.950000]  dump_header from out_of_memory+0x317/0x34c
[121971.960000]  out_of_memory from __alloc_pages+0x8e7/0xbb0
[121971.970000]  __alloc_pages from __alloc_pages_bulk+0x26d/0x3d8
[121971.970000]  __alloc_pages_bulk from svc_recv+0x9d/0x7d4
[121971.980000]  svc_recv from nfsd+0x7d/0xd4
[121971.980000]  nfsd from kthread+0xb9/0xcc
[121971.990000]  kthread from ret_from_fork+0x11/0x1c
[121971.990000] Exception stack(0xc2cadfb0 to 0xc2cadff8)
[121971.990000] dfa0:                                     00000000 00000000 00000000 00000000
[121972.000000] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[121972.010000] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
[121972.020000] Mem-Info:
[121972.020000] active_anon:101 inactive_anon:127 isolated_anon:29
[121972.020000]  active_file:1200 inactive_file:1204 isolated_file:98
[121972.020000]  unevictable:394 dirty:296 writeback:17
[121972.020000]  slab_reclaimable:13680 slab_unreclaimable:4350
[121972.020000]  mapped:637 shmem:4 pagetables:414
[121972.020000]  sec_pagetables:0 bounce:0
[121972.020000]  kernel_misc_reclaimable:0
[121972.020000]  free:7279 free_pcp:184 free_cma:1094
[121972.060000] Node 0 active_anon:404kB inactive_anon:508kB active_file:4736kB inactive_file:4884kB unevictable:1576kB isolated(anon):116kB isolated(file):388kB mapped:2548kB dirty:1184kB writeback:68kB shmem:16kB writeback_tmp:0kB kernel_stack:1088kB pagetables:1656kB sec_pagetables:0kB all_unreclaimable? no
[121972.090000] Normal free:29116kB boost:18432kB min:26624kB low:28672kB high:30720kB reserved_highatomic:0KB active_anon:404kB inactive_anon:712kB active_file:4788kB inactive_file:4752kB unevictable:1576kB writepending:1252kB present:1048576kB managed:1011988kB mlocked:1576kB bounce:0kB free_pcp:736kB local_pcp:236kB free_cma:4376kB
[121972.120000] lowmem_reserve[]: 0 0
[121972.120000] Normal: 2137*4kB (UEC) 1173*8kB (UEC) 529*16kB (UEC) 19*32kB (UC) 7*64kB (C) 5*128kB (C) 2*256kB (C) 1*512kB (C) 0*1024kB 0*2048kB 0*4096kB = 29116kB
[121972.140000] 2991 total pagecache pages
[121972.140000] 166 pages in swap cache
[121972.140000] Free swap  = 93424kB
[121972.150000] Total swap = 102396kB
[121972.150000] 262144 pages RAM
[121972.150000] 0 pages HighMem/MovableOnly
[121972.160000] 9147 pages reserved
[121972.160000] 4096 pages cma reserved
[121972.160000] Unreclaimable slab info:
[121972.170000] Name                      Used          Total
[121972.170000] bio-88                    64KB         64KB
[121972.180000] TCPv6                     61KB         61KB
[121972.180000] bio-76                    16KB         16KB
[121972.190000] bio-188                   11KB         11KB
[121972.190000] nfs_read_data             22KB         22KB
[121972.200000] kioctx                    15KB         15KB
[121972.200000] posix_timers_cache          7KB          7KB
[121972.210000] UDP                       63KB         63KB
[121972.220000] tw_sock_TCP                3KB          3KB
[121972.220000] request_sock_TCP           3KB          3KB
[121972.230000] TCP                       62KB         62KB
[121972.230000] bio-168                    7KB          7KB
[121972.240000] ep_head                    8KB          8KB
[121972.240000] request_queue             15KB         15KB
[121972.250000] bio-124                   18KB         40KB
[121972.250000] biovec-max               264KB        264KB
[121972.260000] biovec-128                63KB         63KB
[121972.260000] biovec-64                157KB        157KB
[121972.270000] skbuff_small_head         94KB         94KB
[121972.270000] skbuff_fclone_cache         55KB         63KB
[121972.280000] skbuff_head_cache         59KB         59KB
[121972.280000] fsnotify_mark_connector         16KB         28KB
[121972.290000] sigqueue                  19KB         31KB
[121972.300000] shmem_inode_cache       1622KB       1662KB
[121972.300000] kernfs_iattrs_cache         15KB         15KB
[121972.310000] kernfs_node_cache       2107KB       2138KB
[121972.310000] filp                     259KB        315KB
[121972.320000] net_namespace             30KB         30KB
[121972.320000] uts_namespace             15KB         15KB
[121972.330000] vma_lock                 143KB        179KB
[121972.330000] vm_area_struct           459KB        553KB
[121972.340000] sighand_cache            191KB        220KB
[121972.340000] task_struct              378KB        446KB
[121972.350000] anon_vma_chain           753KB        804KB
[121972.360000] anon_vma                 170KB        207KB
[121972.360000] trace_event_file          83KB         83KB
[121972.370000] mm_struct                157KB        173KB
[121972.370000] vmap_area                217KB        354KB
[121972.380000] kmalloc-8k               224KB        224KB
[121972.380000] kmalloc-4k               860KB        992KB
[121972.390000] kmalloc-2k               352KB        352KB
[121972.390000] kmalloc-1k               563KB        576KB
[121972.400000] kmalloc-512              936KB        936KB
[121972.400000] kmalloc-256              196KB        240KB
[121972.410000] kmalloc-192              160KB        169KB
[121972.410000] kmalloc-128              546KB        764KB
[121972.420000] kmalloc-64              1213KB       1288KB
[121972.420000] kmem_cache_node           12KB         12KB
[121972.430000] kmem_cache                16KB         16KB
[121972.440000] Tasks state (memory values in pages):
[121972.440000] [  pid  ]   uid  tgid total_vm      rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
[121972.450000] [    209]     0   209     5140      320        0      320         0    16384      480         -1000 systemd-udevd
[121972.460000] [    230]   998   230     2887       55       32       23         0    18432        0             0 systemd-network
[121972.470000] [    420]     0   420      596        0        0        0         0     6144       22             0 mdadm
[121972.490000] [    421]   102   421     1393       56       32       24         0    10240        0             0 rpcbind
[121972.500000] [    429]   996   429     3695       17        0       17         0    20480        0             0 systemd-resolve
[121972.510000] [    433]     0   433      494       51        0       51         0     8192        0             0 rpc.idmapd
[121972.520000] [    434]     0   434      743       92       33       59         0     8192        7             0 nfsdcld
[121972.530000] [    451]     0   451      390        0        0        0         0     6144        0             0 acpid
[121972.540000] [    453]   105   453     1380       50       32       18         0    10240       18             0 avahi-daemon
[121972.550000] [    454]   101   454     1549       16        0       16         0    12288       32          -900 dbus-daemon
[121972.560000] [    466]     0   466     3771       60        0       60         0    14336        0             0 irqbalance
[121972.570000] [    475]     0   475     6269       32       32        0         0    18432        0             0 rsyslogd
[121972.590000] [    487]   105   487     1347       68       38       30         0    10240        0             0 avahi-daemon
[121972.600000] [    492]     0   492     1765        0        0        0         0    12288        0             0 cron
[121972.610000] [    493]     0   493     2593        0        0        0         0    16384        0             0 wpa_supplicant
[121972.620000] [    494]     0   494      607        0        0        0         0     8192       32             0 atd
[121972.630000] [    506]     0   506     1065       25        0       25         0    10240        0             0 rpc.mountd
[121972.640000] [    514]   103   514      809       25        0       25         0     8192        0             0 rpc.statd
[121972.650000] [    522]     0   522      999       31        0       31         0    10240        0             0 agetty
[121972.660000] [    524]     0   524     1540       28        0       28         0    12288        0             0 agetty
[121972.670000] [    525]     0   525     9098       56       32       24         0    34816        0             0 unattended-upgr
[121972.690000] [    526]     0   526     2621      320        0      320         0    14336      192         -1000 sshd
[121972.700000] [    539]     0   539      849       32       32        0         0     8192        0             0 in.tftpd
[121972.710000] [    544]   113   544     4361        6        6        0         0    16384       25             0 chronyd
[121972.720000] [    546]     0   546    16816       62       32       30         0    45056        0             0 winbindd
[121972.730000] [    552]     0   552    16905       59       32       27         0    45056        3             0 winbindd
[121972.740000] [    559]     0   559    17849       94       32       30        32    49152        4             0 smbd
[121972.750000] [    572]     0   572    17409       40       16       24         0    43008       11             0 smbd-notifyd
[121972.760000] [    573]     0   573    17412       16       16        0         0    43008       24             0 cleanupd
[121972.770000] [    584]     0   584     3036       20        0       20         0    16384        4             0 sshd
[121972.780000] [    589]     0   589    16816       32        2       30         0    40960       21             0 winbindd
[121972.790000] [    590]     0   590    27009       47       23       24         0    65536       21             0 smbd
[121972.810000] [    597]   501   597     3344       91       32       59         0    20480        0           100 systemd
[121972.820000] [    653]   501   653     3036        0        0        0         0    16384       33             0 sshd
[121972.830000] [    656]   501   656     1938       93       32       61         0    12288        9             0 bash
[121972.840000] [    704]     0   704      395      352       64      288         0     6144        0         -1000 watchdog
[121972.850000] [    738]   501   738     2834       12        0       12         0    16384        6             0 top
[121972.860000] [   4750]     0  4750     4218       44       26       18         0    18432       11             0 proftpd
[121972.870000] [   4768]     0  4768      401       31        0       31         0     6144        0             0 apt.systemd.dai
[121972.880000] [   4772]     0  4772      401       31        0       31         0     6144        0             0 apt.systemd.dai
[121972.890000] [   4778]     0  4778    13556       54        0       54         0    59392       26             0 apt-get
[121972.900000] Out of memory and no killable processes...
[121972.910000] Kernel panic - not syncing: System is deadlocked on memory
[121972.920000] CPU: 1 PID: 537 Comm: nfsd Not tainted 6.8.1+nas5xx #nas5xx
[121972.920000] Hardware name: Freescale LS1024A
[121972.930000]  unwind_backtrace from show_stack+0xb/0xc
[121972.930000]  show_stack from dump_stack_lvl+0x2b/0x34
[121972.940000]  dump_stack_lvl from panic+0xbf/0x264
[121972.940000]  panic from out_of_memory+0x33f/0x34c
[121972.950000]  out_of_memory from __alloc_pages+0x8e7/0xbb0
[121972.950000]  __alloc_pages from __alloc_pages_bulk+0x26d/0x3d8
[121972.960000]  __alloc_pages_bulk from svc_recv+0x9d/0x7d4
[121972.960000]  svc_recv from nfsd+0x7d/0xd4
[121972.970000]  nfsd from kthread+0xb9/0xcc
[121972.970000]  kthread from ret_from_fork+0x11/0x1c
[121972.980000] Exception stack(0xc2cadfb0 to 0xc2cadff8)
[121972.980000] dfa0:                                     00000000 00000000 00000000 00000000
[121972.990000] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[121973.000000] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
[121973.010000] CPU0: stopping
[121973.010000] CPU: 0 PID: 540 Comm: nfsd Not tainted 6.8.1+nas5xx #nas5xx
[121973.010000] Hardware name: Freescale LS1024A
[121973.010000]  unwind_backtrace from show_stack+0xb/0xc
[121973.010000]  show_stack from dump_stack_lvl+0x2b/0x34
[121973.010000]  dump_stack_lvl from do_handle_IPI+0x151/0x178
[121973.010000]  do_handle_IPI from ipi_handler+0x13/0x18
[121973.010000]  ipi_handler from handle_percpu_devid_irq+0x55/0x144
[121973.010000]  handle_percpu_devid_irq from generic_handle_domain_irq+0x17/0x20
[121973.010000]  generic_handle_domain_irq from gic_handle_irq+0x5f/0x70
[121973.010000]  gic_handle_irq from generic_handle_arch_irq+0x27/0x34
[121973.010000]  generic_handle_arch_irq from call_with_stack+0xd/0x10
[121973.010000] Rebooting in 90 seconds..


* Re: [External] : nfsd: memory leak when client does many file operations
  2024-03-24 19:57 nfsd: memory leak when client does many file operations Jan Schunk
@ 2024-03-24 20:14 ` Chuck Lever III
  2024-03-24 20:48   ` Aw: " Jan Schunk
  0 siblings, 1 reply; 24+ messages in thread
From: Chuck Lever III @ 2024-03-24 20:14 UTC (permalink / raw)
  To: Jan Schunk
  Cc: Jeff Layton, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	Linux NFS Mailing List, linux-kernel



> On Mar 24, 2024, at 3:57 PM, Jan Schunk <scpcom@gmx.de> wrote:
> 
> Issue found on: v6.5.13, v6.6.13, v6.6.14, v6.6.20 and v6.8.1
> Not found on: v6.4, v6.1.82 and below
> Architectures: amd64 and arm(hf)
> 
> Steps to reproduce:
> - Create a VM with 1GB RAM
> - Install Debian 12
> - Install linux-image-6.6.13+bpo-amd64-unsigned and nfs-kernel-server
> - Export some folder
> On the client:
> - Mount the share
> - Run a script that produces heavy usage on the share (like unpacking large tar archives that contain many small files into a git repository and committing them)

Hi Jan, thanks for the report.

The "produce heavy usage" instruction here is pretty vague.
I run CI testing with kmemleak enabled, and have not seen
any leaks on recent kernels when running the git regression
tests, which are similar to this kind of workload.

Can you try to narrow the reproducer for us, even just a
little? What client action exactly is triggering the memory
leak? Is there any other workload on your NFS server that
might be consuming memory?
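
If your server kernel has CONFIG_DEBUG_KMEMLEAK enabled, a manual kmemleak
scan while the workload runs would also help narrow this down; a quick
sketch (requires debugfs):

  mount -t debugfs none /sys/kernel/debug 2>/dev/null   # if not already mounted
  echo scan > /sys/kernel/debug/kmemleak                # trigger an immediate scan
  cat /sys/kernel/debug/kmemleak                        # dump suspected leaks, if any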


> On my setup it takes 20-40 hours until the memory is full and the OOM killer gets invoked by nfsd to kill other processes. The memory stays full and the system reboots:
> 
> [... full oom-killer log snipped; identical to the original report above ...]

--
Chuck Lever




* Aw: Re: [External] : nfsd: memory leak when client does many file operations
  2024-03-24 20:14 ` [External] : " Chuck Lever III
@ 2024-03-24 20:48   ` Jan Schunk
  2024-03-24 21:10     ` Chuck Lever III
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Schunk @ 2024-03-24 20:48 UTC (permalink / raw)
  To: Chuck Lever III
  Cc: Jeff Layton, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	Linux NFS Mailing List, linux-kernel

The "heavy usage" is a simple script runinng on the client and does the following:
1. Create a empty git repository on the share
2. Unpacking a tar.gz archive (Qnap GPL source code)
3. Remove some folders/files
4. Use diff to compare it with an older version
5. commit them to the git
6. Repeat at step 2 with next archive
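
A rough sketch of such a loop, with placeholder paths and archive names
(not taken from the actual script):

  cd /mnt/nfs/repo                  # git repository on the NFS share
  for a in /archives/GPL_*.tar.gz; do
      tar -xzf "$a"                              # step 2: many small files
      rm -rf some/unneeded/dir                   # step 3 (placeholder path)
      diff -r . /archives/previous-tree > diff.log 2>&1   # step 4
      git add -A && git commit -q -m "import $(basename "$a")"   # step 5
  done                                           # step 6: next archive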

On my armhf NAS the other memory-consuming workload is an SMB server.
On the test VM the other memory-consuming workload is a GNOME desktop.

But it does not make much difference if I stop other services; it just takes a bit longer until the same issue happens.
The size of the swap space also does not make a difference.
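
To track where the memory goes while the script runs, slab usage can be
logged periodically; a simple sketch:

  # log slab usage from /proc/meminfo every 10 minutes
  while true; do
      date
      grep -E '^(Slab|SReclaimable|SUnreclaim)' /proc/meminfo
      sleep 600
  done >> slab-growth.log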

> [... quoted reply from Chuck Lever III (Sun, 24 Mar 2024, 21:14) and full oom-killer log snipped ...]


* Re: [External] : nfsd: memory leak when client does many file operations
  2024-03-24 20:48   ` Aw: " Jan Schunk
@ 2024-03-24 21:10     ` Chuck Lever III
  2024-03-24 21:39       ` Aw: " Jan Schunk
  0 siblings, 1 reply; 24+ messages in thread
From: Chuck Lever III @ 2024-03-24 21:10 UTC (permalink / raw)
  To: Jan Schunk
  Cc: Jeff Layton, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	Linux NFS Mailing List, linux-kernel


> On Mar 24, 2024, at 4:48 PM, Jan Schunk <scpcom@gmx.de> wrote:
> 
> The "heavy usage" is a simple script runinng on the client and does the following:
> 1. Create a empty git repository on the share
> 2. Unpacking a tar.gz archive (Qnap GPL source code)
> 3. Remove some folders/files
> 4. Use diff to compare it with an older version
> 5. commit them to the git
> 6. Repeat at step 2 with next archive
> 
> On my armhf NAS the other memory-consuming workload is an SMB server.

I'm not sure any of us has a Freescale system to try this ...


> On the test VM the other memory-consuming workload is a GNOME desktop.

... and so I'm hoping this VM is an x86_64 system.


> But it does not make much difference if I stop other services; it just takes a bit longer until the same issue happens.
> The size of the swap space also does not make a difference.

What is the nfsd thread count on the server? 'pgrep -c nfsd'

What version of NFS does your client mount with?

What is the speed of the network between your client and server?

What is the type of the exported file system?

Do you use NFS with Kerberos?
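
Most of these can be answered with a few standard commands; for example
(the export path is a placeholder):

  # on the server
  pgrep -c nfsd                    # nfsd thread count
  cat /proc/fs/nfsd/threads        # configured server thread count
  findmnt -no FSTYPE /srv/export   # type of the exported file system

  # on the client
  nfsstat -m                       # NFS version and mount options in use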


>> [... quoted earlier messages and full oom-killer log snipped ...]
>>> [121972.720000] [    546]     0   546    16816       62       32       30         0    45056        0             0 winbindd
>>> [121972.730000] [    552]     0   552    16905       59       32       27         0    45056        3             0 winbindd
>>> [121972.740000] [    559]     0   559    17849       94       32       30        32    49152        4             0 smbd
>>> [121972.750000] [    572]     0   572    17409       40       16       24         0    43008       11             0 smbd-notifyd
>>> [121972.760000] [    573]     0   573    17412       16       16        0         0    43008       24             0 cleanupd
>>> [121972.770000] [    584]     0   584     3036       20        0       20         0    16384        4             0 sshd
>>> [121972.780000] [    589]     0   589    16816       32        2       30         0    40960       21             0 winbindd
>>> [121972.790000] [    590]     0   590    27009       47       23       24         0    65536       21             0 smbd
>>> [121972.810000] [    597]   501   597     3344       91       32       59         0    20480        0           100 systemd
>>> [121972.820000] [    653]   501   653     3036        0        0        0         0    16384       33             0 sshd
>>> [121972.830000] [    656]   501   656     1938       93       32       61         0    12288        9             0 bash
>>> [121972.840000] [    704]     0   704      395      352       64      288         0     6144        0         -1000 watchdog
>>> [121972.850000] [    738]   501   738     2834       12        0       12         0    16384        6             0 top
>>> [121972.860000] [   4750]     0  4750     4218       44       26       18         0    18432       11             0 proftpd
>>> [121972.870000] [   4768]     0  4768      401       31        0       31         0     6144        0             0 apt.systemd.dai
>>> [121972.880000] [   4772]     0  4772      401       31        0       31         0     6144        0             0 apt.systemd.dai
>>> [121972.890000] [   4778]     0  4778    13556       54        0       54         0    59392       26             0 apt-get
>>> [121972.900000] Out of memory and no killable processes...
>>> [121972.910000] Kernel panic - not syncing: System is deadlocked on memory
>>> [121972.920000] CPU: 1 PID: 537 Comm: nfsd Not tainted 6.8.1+nas5xx #nas5xx
>>> [121972.920000] Hardware name: Freescale LS1024A
>>> [121972.930000]  unwind_backtrace from show_stack+0xb/0xc
>>> [121972.930000]  show_stack from dump_stack_lvl+0x2b/0x34
>>> [121972.940000]  dump_stack_lvl from panic+0xbf/0x264
>>> [121972.940000]  panic from out_of_memory+0x33f/0x34c
>>> [121972.950000]  out_of_memory from __alloc_pages+0x8e7/0xbb0
>>> [121972.950000]  __alloc_pages from __alloc_pages_bulk+0x26d/0x3d8
>>> [121972.960000]  __alloc_pages_bulk from svc_recv+0x9d/0x7d4
>>> [121972.960000]  svc_recv from nfsd+0x7d/0xd4
>>> [121972.970000]  nfsd from kthread+0xb9/0xcc
>>> [121972.970000]  kthread from ret_from_fork+0x11/0x1c
>>> [121972.980000] Exception stack(0xc2cadfb0 to 0xc2cadff8)
>>> [121972.980000] dfa0:                                     00000000 00000000 00000000 00000000
>>> [121972.990000] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>>> [121973.000000] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
>>> [121973.010000] CPU0: stopping
>>> [121973.010000] CPU: 0 PID: 540 Comm: nfsd Not tainted 6.8.1+nas5xx #nas5xx
>>> [121973.010000] Hardware name: Freescale LS1024A
>>> [121973.010000]  unwind_backtrace from show_stack+0xb/0xc
>>> [121973.010000]  show_stack from dump_stack_lvl+0x2b/0x34
>>> [121973.010000]  dump_stack_lvl from do_handle_IPI+0x151/0x178
>>> [121973.010000]  do_handle_IPI from ipi_handler+0x13/0x18
>>> [121973.010000]  ipi_handler from handle_percpu_devid_irq+0x55/0x144
>>> [121973.010000]  handle_percpu_devid_irq from generic_handle_domain_irq+0x17/0x20
>>> [121973.010000]  generic_handle_domain_irq from gic_handle_irq+0x5f/0x70
>>> [121973.010000]  gic_handle_irq from generic_handle_arch_irq+0x27/0x34
>>> [121973.010000]  generic_handle_arch_irq from call_with_stack+0xd/0x10
>>> [121973.010000] Rebooting in 90 seconds..
>> 
>> --
>> Chuck Lever
>> 
>> 

--
Chuck Lever



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Aw: Re: [External] : nfsd: memory leak when client does many file operations
  2024-03-24 21:10     ` Chuck Lever III
@ 2024-03-24 21:39       ` Jan Schunk
  2024-03-24 22:13         ` Chuck Lever III
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Schunk @ 2024-03-24 21:39 UTC (permalink / raw)
  To: Chuck Lever III
  Cc: Jeff Layton, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	Linux NFS Mailing List, linux-kernel

Yes, the VM is x86_64.

"pgrep -c nfsd" says: 9

I use NFS version 3.

All network ports are connected with 1GBit/s.

The exported file system is ext4.

I do not use any authentication.

The mount options in /etc/fstab are:
rw,noatime,nfsvers=3,proto=tcp,hard,nointr,timeo=600,rsize=32768,wsize=32768,noauto

The line in /etc/exports:
/export/data3 192.168.0.0/16(fsid=<uuid>,rw,no_root_squash,async,no_subtree_check)
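
For reference, the corresponding mount command on the client looks roughly
like this (a sketch; the server name and local mount point are placeholders,
the options are the ones from fstab above):

  mount -t nfs -o rw,noatime,nfsvers=3,proto=tcp,hard,nointr,timeo=600,rsize=32768,wsize=32768 \
      server:/export/data3 /mnt/data3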


> Sent: Sunday, 24.03.2024 at 22:10
> From: "Chuck Lever III" <chuck.lever@oracle.com>
> To: "Jan Schunk" <scpcom@gmx.de>
> Cc: "Jeff Layton" <jlayton@kernel.org>, "Neil Brown" <neilb@suse.de>, "Olga Kornievskaia" <kolga@netapp.com>, "Dai Ngo" <dai.ngo@oracle.com>, "Tom Talpey" <tom@talpey.com>, "Linux NFS Mailing List" <linux-nfs@vger.kernel.org>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
> Subject: Re: [External] : nfsd: memory leak when client does many file operations
> 
> 
> > On Mar 24, 2024, at 4:48 PM, Jan Schunk <scpcom@gmx.de> wrote:
> > 
> > The "heavy usage" is a simple script runinng on the client and does the following:
> > 1. Create a empty git repository on the share
> > 2. Unpacking a tar.gz archive (Qnap GPL source code)
> > 3. Remove some folders/files
> > 4. Use diff to compare it with an older version
> > 5. commit them to the git
> > 6. Repeat at step 2 with next archive
> > 
> > On my armhf NAS the other memory consuming workload is an SMB server.
> 
> I'm not sure any of us has a Freescale system to try this ...
> 
> 
> > On the test VM the other memory consuming workload is a GNOME desktop.
> 
> ... and so I'm hoping this VM is an x86_64 system.
> 
> 
> > But it does not make much difference if I stop other services; it just takes a bit longer until the same issue happens.
> > The size of swap also does not make a difference.
> 
> What is the nfsd thread count on the server? 'pgrep -c nfsd'
> 
> What version of NFS does your client mount with?
> 
> What is the speed of the network between your client and server?
> 
> What is the type of the exported file system?
> 
> Do you use NFS with Kerberos?
> 
> 
> >> Sent: Sunday, 24.03.2024 at 21:14
> >> From: "Chuck Lever III" <chuck.lever@oracle.com>
> >> To: "Jan Schunk" <scpcom@gmx.de>
> >> Cc: "Jeff Layton" <jlayton@kernel.org>, "Neil Brown" <neilb@suse.de>, "Olga Kornievskaia" <kolga@netapp.com>, "Dai Ngo" <dai.ngo@oracle.com>, "Tom Talpey" <tom@talpey.com>, "Linux NFS Mailing List" <linux-nfs@vger.kernel.org>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
> >> Subject: Re: [External] : nfsd: memory leak when client does many file operations
> >> 
> >> 
> >> 
> >>> On Mar 24, 2024, at 3:57 PM, Jan Schunk <scpcom@gmx.de> wrote:
> >>> 
> >>> Issue found on: v6.5.13 v6.6.13, v6.6.14, v6.6.20 and v6.8.1
> >>> Not found on: v6.4, v6.1.82 and below
> >>> Architectures: amd64 and arm(hf)
> >>> 
> >>> Steps to reproduce:
> >>> - Create a VM with 1GB RAM
> >>> - Install Debian 12
> >>> - Install linux-image-6.6.13+bpo-amd64-unsigned and nfs-kernel-server
> >>> - Export some folder
> >>> On the client:
> >>> - Mount the share
> >>> - Run a script that produces heavy usage on the share (like unpacking large tar archives that contain many small files into a git repository and committing them)
> >> 
> >> Hi Jan, thanks for the report.
> >> 
> >> The "produce heavy usage" instruction here is pretty vague.
> >> I run CI testing with kmemleak enabled, and have not seen
> >> any leaks on recent kernels when running the git regression
> >> tests, which are similar to this kind of workload.
> >> 
> >> Can you try to narrow the reproducer for us, even just a
> >> little? What client action exactly is triggering the memory
> >> leak? Is there any other workload on your NFS server that
> >> might be consuming memory?
> >> 
> >> 
> >>> On my setup it takes 20-40 hours until the memory is full and the oom-killer gets invoked by nfsd to kill other processes. The memory stays full and the system reboots:
> >>> 
> >>> [... full oom-killer log and panic trace snipped; see the original report ...]
> >> 
> >> --
> >> Chuck Lever
> >> 
> >> 
> 
> --
> Chuck Lever
> 
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [External] : nfsd: memory leak when client does many file operations
  2024-03-24 21:39       ` Aw: " Jan Schunk
@ 2024-03-24 22:13         ` Chuck Lever III
  2024-03-24 22:54           ` Aw: " Jan Schunk
  2024-03-25 19:55           ` Jan Schunk
  0 siblings, 2 replies; 24+ messages in thread
From: Chuck Lever III @ 2024-03-24 22:13 UTC (permalink / raw)
  To: Jan Schunk
  Cc: Jeff Layton, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	Linux NFS Mailing List, linux-kernel



> On Mar 24, 2024, at 5:39 PM, Jan Schunk <scpcom@gmx.de> wrote:
> 
> Yes, the VM is x86_64.
> 
> "pgrep -c nfsd" says: 9
> 
> I use NFS version 3.
> 
> All network ports are connected with 1GBit/s.
> 
> The exported file system is ext4.
> 
> I do not use any authentication.
> 
> The mount options in /etc/fstab are:
> rw,noatime,nfsvers=3,proto=tcp,hard,nointr,timeo=600,rsize=32768,wsize=32768,noauto
> 
> The line in /etc/exports:
> /export/data3 192.168.0.0/16(fsid=<uuid>,rw,no_root_squash,async,no_subtree_check)

Is it possible to reproduce this issue without the "noatime"
mount option and without the "async" export option?
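
That is, something like this (a sketch based on your quoted config; "sync"
is only the exportfs default written out explicitly):

  /etc/fstab options:
    rw,nfsvers=3,proto=tcp,hard,nointr,timeo=600,rsize=32768,wsize=32768,noauto

  /etc/exports line:
    /export/data3 192.168.0.0/16(fsid=<uuid>,rw,no_root_squash,sync,no_subtree_check)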


>> Sent: Sunday, 24.03.2024 at 22:10
>> From: "Chuck Lever III" <chuck.lever@oracle.com>
>> To: "Jan Schunk" <scpcom@gmx.de>
>> Cc: "Jeff Layton" <jlayton@kernel.org>, "Neil Brown" <neilb@suse.de>, "Olga Kornievskaia" <kolga@netapp.com>, "Dai Ngo" <dai.ngo@oracle.com>, "Tom Talpey" <tom@talpey.com>, "Linux NFS Mailing List" <linux-nfs@vger.kernel.org>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
>> Subject: Re: [External] : nfsd: memory leak when client does many file operations
>> 
>> 
>>> On Mar 24, 2024, at 4:48 PM, Jan Schunk <scpcom@gmx.de> wrote:
>>> 
>>> The "heavy usage" is a simple script runinng on the client and does the following:
>>> 1. Create a empty git repository on the share
>>> 2. Unpacking a tar.gz archive (Qnap GPL source code)
>>> 3. Remove some folders/files
>>> 4. Use diff to compare it with an older version
>>> 5. commit them to the git
>>> 6. Repeat at step 2 with next archive
>>> 
>>> On my armhf NAS the other memory consuming workload is an SMB server.
>> 
>> I'm not sure any of us has a Freescale system to try this ...
>> 
>> 
>>> On the test VM the other memory consuming workload is a GNOME desktop.
>> 
>> ... and so I'm hoping this VM is an x86_64 system.
>> 
>> 
>>> But it does not make much difference if I stop other services; it just takes a bit longer until the same issue happens.
>>> The size of swap also does not make a difference.
>> 
>> What is the nfsd thread count on the server? 'pgrep -c nfsd'
>> 
>> What version of NFS does your client mount with?
>> 
>> What is the speed of the network between your client and server?
>> 
>> What is the type of the exported file system?
>> 
>> Do you use NFS with Kerberos?
>> 
>> 
>>>> Sent: Sunday, 24.03.2024 at 21:14
>>>> From: "Chuck Lever III" <chuck.lever@oracle.com>
>>>> To: "Jan Schunk" <scpcom@gmx.de>
>>>> Cc: "Jeff Layton" <jlayton@kernel.org>, "Neil Brown" <neilb@suse.de>, "Olga Kornievskaia" <kolga@netapp.com>, "Dai Ngo" <dai.ngo@oracle.com>, "Tom Talpey" <tom@talpey.com>, "Linux NFS Mailing List" <linux-nfs@vger.kernel.org>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
>>>> Subject: Re: [External] : nfsd: memory leak when client does many file operations
>>>> 
>>>> 
>>>> 
>>>>> On Mar 24, 2024, at 3:57 PM, Jan Schunk <scpcom@gmx.de> wrote:
>>>>> 
>>>>> Issue found on: v6.5.13 v6.6.13, v6.6.14, v6.6.20 and v6.8.1
>>>>> Not found on: v6.4, v6.1.82 and below
>>>>> Architectures: amd64 and arm(hf)
>>>>> 
>>>>> Steps to reproduce:
>>>>> - Create a VM with 1GB RAM
>>>>> - Install Debian 12
>>>>> - Install linux-image-6.6.13+bpo-amd64-unsigned and nfs-kernel-server
>>>>> - Export some folder
>>>>> On the client:
>>>>> - Mount the share
>>>>> - Run a script that produces heavy usage on the share (like unpacking large tar archives that contain many small files into a git repository and committing them)
>>>> 
>>>> Hi Jan, thanks for the report.
>>>> 
>>>> The "produce heavy usage" instruction here is pretty vague.
>>>> I run CI testing with kmemleak enabled, and have not seen
>>>> any leaks on recent kernels when running the git regression
>>>> tests, which are similar to this kind of workload.
>>>> 
>>>> Can you try to narrow the reproducer for us, even just a
>>>> little? What client action exactly is triggering the memory
>>>> leak? Is there any other workload on your NFS server that
>>>> might be consuming memory?
>>>> 
>>>> 
>>>>> On my setup it takes 20-40 hours until the memory is full and the oom-killer gets invoked by nfsd to kill other processes. The memory stays full and the system reboots:
>>>>> 
>>>>> [... full oom-killer log and panic trace snipped; see the original report ...]
>>>> 
>>>> --
>>>> Chuck Lever
>>>> 
>>>> 
>> 
>> --
>> Chuck Lever
>> 
>> 

--
Chuck Lever



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Aw: Re: [External] : nfsd: memory leak when client does many file operations
  2024-03-24 22:13         ` Chuck Lever III
@ 2024-03-24 22:54           ` Jan Schunk
  2024-03-25 19:55           ` Jan Schunk
  1 sibling, 0 replies; 24+ messages in thread
From: Jan Schunk @ 2024-03-24 22:54 UTC (permalink / raw)
  To: Chuck Lever III
  Cc: Jeff Layton, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	Linux NFS Mailing List, linux-kernel

I had not tried that before, but I have now removed async and noatime and started a test run on the VM.
The result will take some time, as written below.
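
While it runs, I plan to watch kernel memory on the server, roughly like
this (slabtop and /proc/meminfo are standard; the kmemleak lines only apply
if the kernel is built with CONFIG_DEBUG_KMEMLEAK and debugfs is mounted):

  # track unreclaimable slab growth over time
  watch -n 60 'grep -E "^(Slab|SReclaimable|SUnreclaim)" /proc/meminfo'

  # show the largest slab caches, sorted by cache size
  slabtop -o -s c | head -n 20

  # optionally trigger a kmemleak scan and dump suspected leaks
  echo scan > /sys/kernel/debug/kmemleak
  cat /sys/kernel/debug/kmemleak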

> Sent: Sunday, 24.03.2024 at 23:13
> From: "Chuck Lever III" <chuck.lever@oracle.com>
> To: "Jan Schunk" <scpcom@gmx.de>
> Cc: "Jeff Layton" <jlayton@kernel.org>, "Neil Brown" <neilb@suse.de>, "Olga Kornievskaia" <kolga@netapp.com>, "Dai Ngo" <dai.ngo@oracle.com>, "Tom Talpey" <tom@talpey.com>, "Linux NFS Mailing List" <linux-nfs@vger.kernel.org>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
> Subject: Re: [External] : nfsd: memory leak when client does many file operations
> 
> 
> 
> > On Mar 24, 2024, at 5:39 PM, Jan Schunk <scpcom@gmx.de> wrote:
> > 
> > Yes, the VM is x86_64.
> > 
> > "pgrep -c nfsd" says: 9
> > 
> > I use NFS version 3.
> > 
> > All network ports are connected with 1GBit/s.
> > 
> > The exported file system is ext4.
> > 
> > I do not use any authentication.
> > 
> > The mount options in /etc/fstab are:
> > rw,noatime,nfsvers=3,proto=tcp,hard,nointr,timeo=600,rsize=32768,wsize=32768,noauto
> > 
> > The line in /etc/exports:
> > /export/data3 192.168.0.0/16(fsid=<uuid>,rw,no_root_squash,async,no_subtree_check)
> 
> Is it possible to reproduce this issue without the "noatime"
> mount option and without the "async" export option?
> 
> 
> >> Sent: Sunday, 24.03.2024 at 22:10
> >> From: "Chuck Lever III" <chuck.lever@oracle.com>
> >> To: "Jan Schunk" <scpcom@gmx.de>
> >> Cc: "Jeff Layton" <jlayton@kernel.org>, "Neil Brown" <neilb@suse.de>, "Olga Kornievskaia" <kolga@netapp.com>, "Dai Ngo" <dai.ngo@oracle.com>, "Tom Talpey" <tom@talpey.com>, "Linux NFS Mailing List" <linux-nfs@vger.kernel.org>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
> >> Subject: Re: [External] : nfsd: memory leak when client does many file operations
> >> 
> >> 
> >>> On Mar 24, 2024, at 4:48 PM, Jan Schunk <scpcom@gmx.de> wrote:
> >>> 
> >>> The "heavy usage" is a simple script runinng on the client and does the following:
> >>> 1. Create a empty git repository on the share
> >>> 2. Unpacking a tar.gz archive (Qnap GPL source code)
> >>> 3. Remove some folders/files
> >>> 4. Use diff to compare it with an older version
> >>> 5. commit them to the git
> >>> 6. Repeat at step 2 with next archive
> >>> 
> >>> On my armhf NAS the other memory consuming workload is an SMB server.
> >> 
> >> I'm not sure any of us has a Freescale system to try this ...
> >> 
> >> 
> >>> On the test VM the other memory consuming workload is a GNOME desktop.
> >> 
> >> ... and so I'm hoping this VM is an x86_64 system.
> >> 
> >> 
> >>> But it does not make much difference if I stop other services; it just takes a bit longer until the same issue happens.
> >>> The size of swap also does not make a difference.
> >> 
> >> What is the nfsd thread count on the server? 'pgrep -c nfsd'
> >> 
> >> What version of NFS does your client mount with?
> >> 
> >> What is the speed of the network between your client and server?
> >> 
> >> What is the type of the exported file system?
> >> 
> >> Do you use NFS with Kerberos?
> >> 
> >> 
> >>>> Sent: Sunday, 24.03.2024 at 21:14
> >>>> From: "Chuck Lever III" <chuck.lever@oracle.com>
> >>>> To: "Jan Schunk" <scpcom@gmx.de>
> >>>> Cc: "Jeff Layton" <jlayton@kernel.org>, "Neil Brown" <neilb@suse.de>, "Olga Kornievskaia" <kolga@netapp.com>, "Dai Ngo" <dai.ngo@oracle.com>, "Tom Talpey" <tom@talpey.com>, "Linux NFS Mailing List" <linux-nfs@vger.kernel.org>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
> >>>> Subject: Re: [External] : nfsd: memory leak when client does many file operations
> >>>> 
> >>>> 
> >>>> 
> >>>>> On Mar 24, 2024, at 3:57 PM, Jan Schunk <scpcom@gmx.de> wrote:
> >>>>> 
> >>>>> Issue found on: v6.5.13 v6.6.13, v6.6.14, v6.6.20 and v6.8.1
> >>>>> Not found on: v6.4, v6.1.82 and below
> >>>>> Architectures: amd64 and arm(hf)
> >>>>> 
> >>>>> Steps to reproduce:
> >>>>> - Create a VM with 1GB RAM
> >>>>> - Install Debian 12
> >>>>> - Install linux-image-6.6.13+bpo-amd64-unsigned and nfs-kernel-server
> >>>>> - Export some folder
> >>>>> On the client:
> >>>>> - Mount the share
> >>>>> - Run a script that produces heavy usage on the share (like unpacking large tar archives that contain many small files into a git repository and committing them)
> >>>> 
> >>>> Hi Jan, thanks for the report.
> >>>> 
> >>>> The "produce heavy usage" instruction here is pretty vague.
> >>>> I run CI testing with kmemleak enabled, and have not seen
> >>>> any leaks on recent kernels when running the git regression
> >>>> tests, which are similar to this kind of workload.
> >>>> 
> >>>> Can you try to narrow the reproducer for us, even just a
> >>>> little? What client action exactly is triggering the memory
> >>>> leak? Is there any other workload on your NFS server that
> >>>> might be consuming memory?
> >>>> 
> >>>> 
> >>>>> On my setup it takes 20-40 hours until the memory is full and the oom-killer gets invoked by nfsd to kill other processes. The memory stays full and the system reboots:
> >>>>> 
> >>>>> [... full oom-killer log and panic trace snipped; see the original report ...]
> >>>>> [121972.590000] [    487]   105   487     1347       68       38       30         0    10240        0             0 avahi-daemon
> >>>>> [121972.600000] [    492]     0   492     1765        0        0        0         0    12288        0             0 cron
> >>>>> [121972.610000] [    493]     0   493     2593        0        0        0         0    16384        0             0 wpa_supplicant
> >>>>> [121972.620000] [    494]     0   494      607        0        0        0         0     8192       32             0 atd
> >>>>> [121972.630000] [    506]     0   506     1065       25        0       25         0    10240        0             0 rpc.mountd
> >>>>> [121972.640000] [    514]   103   514      809       25        0       25         0     8192        0             0 rpc.statd
> >>>>> [121972.650000] [    522]     0   522      999       31        0       31         0    10240        0             0 agetty
> >>>>> [121972.660000] [    524]     0   524     1540       28        0       28         0    12288        0             0 agetty
> >>>>> [121972.670000] [    525]     0   525     9098       56       32       24         0    34816        0             0 unattended-upgr
> >>>>> [121972.690000] [    526]     0   526     2621      320        0      320         0    14336      192         -1000 sshd
> >>>>> [121972.700000] [    539]     0   539      849       32       32        0         0     8192        0             0 in.tftpd
> >>>>> [121972.710000] [    544]   113   544     4361        6        6        0         0    16384       25             0 chronyd
> >>>>> [121972.720000] [    546]     0   546    16816       62       32       30         0    45056        0             0 winbindd
> >>>>> [121972.730000] [    552]     0   552    16905       59       32       27         0    45056        3             0 winbindd
> >>>>> [121972.740000] [    559]     0   559    17849       94       32       30        32    49152        4             0 smbd
> >>>>> [121972.750000] [    572]     0   572    17409       40       16       24         0    43008       11             0 smbd-notifyd
> >>>>> [121972.760000] [    573]     0   573    17412       16       16        0         0    43008       24             0 cleanupd
> >>>>> [121972.770000] [    584]     0   584     3036       20        0       20         0    16384        4             0 sshd
> >>>>> [121972.780000] [    589]     0   589    16816       32        2       30         0    40960       21             0 winbindd
> >>>>> [121972.790000] [    590]     0   590    27009       47       23       24         0    65536       21             0 smbd
> >>>>> [121972.810000] [    597]   501   597     3344       91       32       59         0    20480        0           100 systemd
> >>>>> [121972.820000] [    653]   501   653     3036        0        0        0         0    16384       33             0 sshd
> >>>>> [121972.830000] [    656]   501   656     1938       93       32       61         0    12288        9             0 bash
> >>>>> [121972.840000] [    704]     0   704      395      352       64      288         0     6144        0         -1000 watchdog
> >>>>> [121972.850000] [    738]   501   738     2834       12        0       12         0    16384        6             0 top
> >>>>> [121972.860000] [   4750]     0  4750     4218       44       26       18         0    18432       11             0 proftpd
> >>>>> [121972.870000] [   4768]     0  4768      401       31        0       31         0     6144        0             0 apt.systemd.dai
> >>>>> [121972.880000] [   4772]     0  4772      401       31        0       31         0     6144        0             0 apt.systemd.dai
> >>>>> [121972.890000] [   4778]     0  4778    13556       54        0       54         0    59392       26             0 apt-get
> >>>>> [121972.900000] Out of memory and no killable processes...
> >>>>> [121972.910000] Kernel panic - not syncing: System is deadlocked on memory
> >>>>> [121972.920000] CPU: 1 PID: 537 Comm: nfsd Not tainted 6.8.1+nas5xx #nas5xx
> >>>>> [121972.920000] Hardware name: Freescale LS1024A
> >>>>> [121972.930000]  unwind_backtrace from show_stack+0xb/0xc
> >>>>> [121972.930000]  show_stack from dump_stack_lvl+0x2b/0x34
> >>>>> [121972.940000]  dump_stack_lvl from panic+0xbf/0x264
> >>>>> [121972.940000]  panic from out_of_memory+0x33f/0x34c
> >>>>> [121972.950000]  out_of_memory from __alloc_pages+0x8e7/0xbb0
> >>>>> [121972.950000]  __alloc_pages from __alloc_pages_bulk+0x26d/0x3d8
> >>>>> [121972.960000]  __alloc_pages_bulk from svc_recv+0x9d/0x7d4
> >>>>> [121972.960000]  svc_recv from nfsd+0x7d/0xd4
> >>>>> [121972.970000]  nfsd from kthread+0xb9/0xcc
> >>>>> [121972.970000]  kthread from ret_from_fork+0x11/0x1c
> >>>>> [121972.980000] Exception stack(0xc2cadfb0 to 0xc2cadff8)
> >>>>> [121972.980000] dfa0:                                     00000000 00000000 00000000 00000000
> >>>>> [121972.990000] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> >>>>> [121973.000000] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
> >>>>> [121973.010000] CPU0: stopping
> >>>>> [121973.010000] CPU: 0 PID: 540 Comm: nfsd Not tainted 6.8.1+nas5xx #nas5xx
> >>>>> [121973.010000] Hardware name: Freescale LS1024A
> >>>>> [121973.010000]  unwind_backtrace from show_stack+0xb/0xc
> >>>>> [121973.010000]  show_stack from dump_stack_lvl+0x2b/0x34
> >>>>> [121973.010000]  dump_stack_lvl from do_handle_IPI+0x151/0x178
> >>>>> [121973.010000]  do_handle_IPI from ipi_handler+0x13/0x18
> >>>>> [121973.010000]  ipi_handler from handle_percpu_devid_irq+0x55/0x144
> >>>>> [121973.010000]  handle_percpu_devid_irq from generic_handle_domain_irq+0x17/0x20
> >>>>> [121973.010000]  generic_handle_domain_irq from gic_handle_irq+0x5f/0x70
> >>>>> [121973.010000]  gic_handle_irq from generic_handle_arch_irq+0x27/0x34
> >>>>> [121973.010000]  generic_handle_arch_irq from call_with_stack+0xd/0x10
> >>>>> [121973.010000] Rebooting in 90 seconds..
> >>>> 
> >>>> --
> >>>> Chuck Lever
> >>>> 
> >>>> 
> >> 
> >> --
> >> Chuck Lever
> >> 
> >> 
> 
> --
> Chuck Lever
> 
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Aw: Re: [External] : nfsd: memory leak when client does many file operations
  2024-03-24 22:13         ` Chuck Lever III
  2024-03-24 22:54           ` Aw: " Jan Schunk
@ 2024-03-25 19:55           ` Jan Schunk
  2024-03-25 20:11             ` Chuck Lever III
  1 sibling, 1 reply; 24+ messages in thread
From: Jan Schunk @ 2024-03-25 19:55 UTC (permalink / raw)
  To: Chuck Lever III
  Cc: Jeff Layton, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	Linux NFS Mailing List, linux-kernel

The VM has now been running for 20 hours with 512MB RAM, no desktop, without the "noatime" mount option, and without the "async" export option.

Currently there is no issue, but the memory usage is still constantly growing. It may just take longer before something happens.

top - 00:49:49 up 3 min,  1 user,  load average: 0,21, 0,19, 0,09
Tasks: 111 total,   1 running, 110 sleeping,   0 stopped,   0 zombie
%CPU(s):  0,2 us,  0,3 sy,  0,0 ni, 99,5 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st 
MiB Spch:    467,0 total,    302,3 free,     89,3 used,     88,1 buff/cache     
MiB Swap:    975,0 total,    975,0 free,      0,0 used.    377,7 avail Spch

top - 15:05:39 up 14:19,  1 user,  load average: 1,87, 1,72, 1,65
Tasks: 104 total,   1 running, 103 sleeping,   0 stopped,   0 zombie
%CPU(s):  0,2 us,  4,9 sy,  0,0 ni, 53,3 id, 39,0 wa,  0,0 hi,  2,6 si,  0,0 st 
MiB Spch:    467,0 total,     21,2 free,    147,1 used,    310,9 buff/cache     
MiB Swap:    975,0 total,    952,9 free,     22,1 used.    319,9 avail Spch

top - 20:48:16 up 20:01,  1 user,  load average: 5,02, 2,72, 2,08
Tasks: 104 total,   5 running,  99 sleeping,   0 stopped,   0 zombie
%CPU(s):  0,2 us, 46,4 sy,  0,0 ni, 11,9 id,  2,3 wa,  0,0 hi, 39,2 si,  0,0 st 
MiB Spch:    467,0 total,     16,9 free,    190,8 used,    271,6 buff/cache     
MiB Swap:    975,0 total,    952,9 free,     22,1 used.    276,2 avail Spch

> Sent: Sunday, 24.03.2024 at 23:13
> From: "Chuck Lever III" <chuck.lever@oracle.com>
> To: "Jan Schunk" <scpcom@gmx.de>
> Cc: "Jeff Layton" <jlayton@kernel.org>, "Neil Brown" <neilb@suse.de>, "Olga Kornievskaia" <kolga@netapp.com>, "Dai Ngo" <dai.ngo@oracle.com>, "Tom Talpey" <tom@talpey.com>, "Linux NFS Mailing List" <linux-nfs@vger.kernel.org>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
> Subject: Re: [External] : nfsd: memory leak when client does many file operations
> 
> 
> 
> > On Mar 24, 2024, at 5:39 PM, Jan Schunk <scpcom@gmx.de> wrote:
> > 
> > Yes, the VM is x86_64.
> > 
> > "pgrep -c nfsd" says: 9
> > 
> > I use NFS version 3.
> > 
> > All network ports are connected with 1GBit/s.
> > 
> > The exported file system is ext4.
> > 
> > I do not use any authentication.
> > 
> > The mount options in /etc/fstab are:
> > rw,noatime,nfsvers=3,proto=tcp,hard,nointr,timeo=600,rsize=32768,wsize=32768,noauto
> > 
> > The line in /etc/exports:
> > /export/data3 192.168.0.0/16(fsid=<uuid>,rw,no_root_squash,async,no_subtree_check)
> 
> Is it possible to reproduce this issue without the "noatime"
> mount option and without the "async" export option?
> 
> 
> >> Sent: Sunday, 24.03.2024 at 22:10
> >> From: "Chuck Lever III" <chuck.lever@oracle.com>
> >> To: "Jan Schunk" <scpcom@gmx.de>
> >> Cc: "Jeff Layton" <jlayton@kernel.org>, "Neil Brown" <neilb@suse.de>, "Olga Kornievskaia" <kolga@netapp.com>, "Dai Ngo" <dai.ngo@oracle.com>, "Tom Talpey" <tom@talpey.com>, "Linux NFS Mailing List" <linux-nfs@vger.kernel.org>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
> >> Subject: Re: [External] : nfsd: memory leak when client does many file operations
> >> 
> >> 
> >>> On Mar 24, 2024, at 4:48 PM, Jan Schunk <scpcom@gmx.de> wrote:
> >>> 
> >>> The "heavy usage" is a simple script running on the client that does the following:
> >>> 1. Create an empty git repository on the share
> >>> 2. Unpack a tar.gz archive (Qnap GPL source code)
> >>> 3. Remove some folders/files
> >>> 4. Use diff to compare it with an older version
> >>> 5. Commit the files to the git repository
> >>> 6. Repeat from step 2 with the next archive
> >>> 
> >>> On my armhf NAS the other memory consuming workload is an SMB server.
> >> 
> >> I'm not sure any of us has a Freescale system to try this ...
> >> 
> >> 
> >>> On the test VM the other memory consuming workload is a GNOME desktop.
> >> 
> >> ... and so I'm hoping this VM is an x86_64 system.
> >> 
> >> 
> >>> But it does not make much difference if I stop other services; it just takes a bit longer until the same issue happens.
> >>> The size of swap also does not make a difference.
> >> 
> >> What is the nfsd thread count on the server? 'pgrep -c nfsd'
> >> 
> >> What version of NFS does your client mount with?
> >> 
> >> What is the speed of the network between your client and server?
> >> 
> >> What is the type of the exported file system?
> >> 
> >> Do you use NFS with Kerberos?
> >> 
> >> 
> >>>> Sent: Sunday, 24.03.2024 at 21:14
> >>>> From: "Chuck Lever III" <chuck.lever@oracle.com>
> >>>> To: "Jan Schunk" <scpcom@gmx.de>
> >>>> Cc: "Jeff Layton" <jlayton@kernel.org>, "Neil Brown" <neilb@suse.de>, "Olga Kornievskaia" <kolga@netapp.com>, "Dai Ngo" <dai.ngo@oracle.com>, "Tom Talpey" <tom@talpey.com>, "Linux NFS Mailing List" <linux-nfs@vger.kernel.org>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
> >>>> Subject: Re: [External] : nfsd: memory leak when client does many file operations
> >>>> 
> >>>> 
> >>>> 
> >>>>> On Mar 24, 2024, at 3:57 PM, Jan Schunk <scpcom@gmx.de> wrote:
> >>>>> 
> >>>>> Issue found on: v6.5.13, v6.6.13, v6.6.14, v6.6.20 and v6.8.1
> >>>>> Not found on: v6.4, v6.1.82 and below
> >>>>> Architectures: amd64 and arm(hf)
> >>>>> 
> >>>>> Steps to reproduce:
> >>>>> - Create a VM with 1GB RAM
> >>>>> - Install Debian 12
> >>>>> - Install linux-image-6.6.13+bpo-amd64-unsigned and nfs-kernel-server
> >>>>> - Export some folder
> >>>>> On the client:
> >>>>> - Mount the share
> >>>>> - Run a script that produces heavy usage on the share (such as unpacking large tar archives that contain many small files into a git repository and committing them)
> >>>> 
> >>>> Hi Jan, thanks for the report.
> >>>> 
> >>>> The "produce heavy usage" instruction here is pretty vague.
> >>>> I run CI testing with kmemleak enabled, and have not seen
> >>>> any leaks on recent kernels when running the git regression
> >>>> tests, which are similar to this kind of workload.
> >>>> 
> >>>> Can you try to narrow the reproducer for us, even just a
> >>>> little? What client action exactly is triggering the memory
> >>>> leak? Is there any other workload on your NFS server that
> >>>> might be consuming memory?
> >>>> 
> >>>> 
> >>>>> On my setup it takes 20-40 hours until the memory is full and the OOM killer gets invoked by nfsd to kill other processes. The memory stays full and the system reboots:
> >>>>> 
> >>>>> [121969.590000] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,task=dbus-daemon,pid=454,uid=101
> >>>>> [121969.600000] Out of memory: Killed process 454 (dbus-daemon) total-vm:6196kB, anon-rss:128kB, file-rss:1408kB, shmem-rss:0kB, UID:101 pgtables:12kB oom_score_adj:-900
> >>>>> [121971.700000] oom_reaper: reaped process 454 (dbus-daemon), now anon-rss:0kB, file-rss:64kB, shmem-rss:0kB
> >>>>> [121971.920000] nfsd invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
> >>>>> [121971.930000] CPU: 1 PID: 537 Comm: nfsd Not tainted 6.8.1+nas5xx #nas5xx
> >>>>> [121971.930000] Hardware name: Freescale LS1024A
> >>>>> [121971.940000]  unwind_backtrace from show_stack+0xb/0xc
> >>>>> [121971.940000]  show_stack from dump_stack_lvl+0x2b/0x34
> >>>>> [121971.950000]  dump_stack_lvl from dump_header+0x35/0x212
> >>>>> [121971.950000]  dump_header from out_of_memory+0x317/0x34c
> >>>>> [121971.960000]  out_of_memory from __alloc_pages+0x8e7/0xbb0
> >>>>> [121971.970000]  __alloc_pages from __alloc_pages_bulk+0x26d/0x3d8
> >>>>> [121971.970000]  __alloc_pages_bulk from svc_recv+0x9d/0x7d4
> >>>>> [121971.980000]  svc_recv from nfsd+0x7d/0xd4
> >>>>> [121971.980000]  nfsd from kthread+0xb9/0xcc
> >>>>> [121971.990000]  kthread from ret_from_fork+0x11/0x1c
> >>>>> [121971.990000] Exception stack(0xc2cadfb0 to 0xc2cadff8)
> >>>>> [121971.990000] dfa0:                                     00000000 00000000 00000000 00000000
> >>>>> [121972.000000] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> >>>>> [121972.010000] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
> >>>>> [121972.020000] Mem-Info:
> >>>>> [121972.020000] active_anon:101 inactive_anon:127 isolated_anon:29
> >>>>> [121972.020000]  active_file:1200 inactive_file:1204 isolated_file:98
> >>>>> [121972.020000]  unevictable:394 dirty:296 writeback:17
> >>>>> [121972.020000]  slab_reclaimable:13680 slab_unreclaimable:4350
> >>>>> [121972.020000]  mapped:637 shmem:4 pagetables:414
> >>>>> [121972.020000]  sec_pagetables:0 bounce:0
> >>>>> [121972.020000]  kernel_misc_reclaimable:0
> >>>>> [121972.020000]  free:7279 free_pcp:184 free_cma:1094
> >>>>> [121972.060000] Node 0 active_anon:404kB inactive_anon:508kB active_file:4736kB inactive_file:4884kB unevictable:1576kB isolated(anon):116kB isolated(file):388kB mapped:2548kB dirty:1184kB writeback:68kB shmem:16kB writeback_tmp:0kB kernel_stack:1088kB pagetables:1656kB sec_pagetables:0kB all_unreclaimable? no
> >>>>> [121972.090000] Normal free:29116kB boost:18432kB min:26624kB low:28672kB high:30720kB reserved_highatomic:0KB active_anon:404kB inactive_anon:712kB active_file:4788kB inactive_file:4752kB unevictable:1576kB writepending:1252kB present:1048576kB managed:1011988kB mlocked:1576kB bounce:0kB free_pcp:736kB local_pcp:236kB free_cma:4376kB
> >>>>> [121972.120000] lowmem_reserve[]: 0 0
> >>>>> [121972.120000] Normal: 2137*4kB (UEC) 1173*8kB (UEC) 529*16kB (UEC) 19*32kB (UC) 7*64kB (C) 5*128kB (C) 2*256kB (C) 1*512kB (C) 0*1024kB 0*2048kB 0*4096kB = 29116kB
> >>>>> [121972.140000] 2991 total pagecache pages
> >>>>> [121972.140000] 166 pages in swap cache
> >>>>> [121972.140000] Free swap  = 93424kB
> >>>>> [121972.150000] Total swap = 102396kB
> >>>>> [121972.150000] 262144 pages RAM
> >>>>> [121972.150000] 0 pages HighMem/MovableOnly
> >>>>> [121972.160000] 9147 pages reserved
> >>>>> [121972.160000] 4096 pages cma reserved
> >>>>> [121972.160000] Unreclaimable slab info:
> >>>>> [121972.170000] Name                      Used          Total
> >>>>> [121972.170000] bio-88                    64KB         64KB
> >>>>> [121972.180000] TCPv6                     61KB         61KB
> >>>>> [121972.180000] bio-76                    16KB         16KB
> >>>>> [121972.190000] bio-188                   11KB         11KB
> >>>>> [121972.190000] nfs_read_data             22KB         22KB
> >>>>> [121972.200000] kioctx                    15KB         15KB
> >>>>> [121972.200000] posix_timers_cache          7KB          7KB
> >>>>> [121972.210000] UDP                       63KB         63KB
> >>>>> [121972.220000] tw_sock_TCP                3KB          3KB
> >>>>> [121972.220000] request_sock_TCP           3KB          3KB
> >>>>> [121972.230000] TCP                       62KB         62KB
> >>>>> [121972.230000] bio-168                    7KB          7KB
> >>>>> [121972.240000] ep_head                    8KB          8KB
> >>>>> [121972.240000] request_queue             15KB         15KB
> >>>>> [121972.250000] bio-124                   18KB         40KB
> >>>>> [121972.250000] biovec-max               264KB        264KB
> >>>>> [121972.260000] biovec-128                63KB         63KB
> >>>>> [121972.260000] biovec-64                157KB        157KB
> >>>>> [121972.270000] skbuff_small_head         94KB         94KB
> >>>>> [121972.270000] skbuff_fclone_cache         55KB         63KB
> >>>>> [121972.280000] skbuff_head_cache         59KB         59KB
> >>>>> [121972.280000] fsnotify_mark_connector         16KB         28KB
> >>>>> [121972.290000] sigqueue                  19KB         31KB
> >>>>> [121972.300000] shmem_inode_cache       1622KB       1662KB
> >>>>> [121972.300000] kernfs_iattrs_cache         15KB         15KB
> >>>>> [121972.310000] kernfs_node_cache       2107KB       2138KB
> >>>>> [121972.310000] filp                     259KB        315KB
> >>>>> [121972.320000] net_namespace             30KB         30KB
> >>>>> [121972.320000] uts_namespace             15KB         15KB
> >>>>> [121972.330000] vma_lock                 143KB        179KB
> >>>>> [121972.330000] vm_area_struct           459KB        553KB
> >>>>> [121972.340000] sighand_cache            191KB        220KB
> >>>>> [121972.340000] task_struct              378KB        446KB
> >>>>> [121972.350000] anon_vma_chain           753KB        804KB
> >>>>> [121972.360000] anon_vma                 170KB        207KB
> >>>>> [121972.360000] trace_event_file          83KB         83KB
> >>>>> [121972.370000] mm_struct                157KB        173KB
> >>>>> [121972.370000] vmap_area                217KB        354KB
> >>>>> [121972.380000] kmalloc-8k               224KB        224KB
> >>>>> [121972.380000] kmalloc-4k               860KB        992KB
> >>>>> [121972.390000] kmalloc-2k               352KB        352KB
> >>>>> [121972.390000] kmalloc-1k               563KB        576KB
> >>>>> [121972.400000] kmalloc-512              936KB        936KB
> >>>>> [121972.400000] kmalloc-256              196KB        240KB
> >>>>> [121972.410000] kmalloc-192              160KB        169KB
> >>>>> [121972.410000] kmalloc-128              546KB        764KB
> >>>>> [121972.420000] kmalloc-64              1213KB       1288KB
> >>>>> [121972.420000] kmem_cache_node           12KB         12KB
> >>>>> [121972.430000] kmem_cache                16KB         16KB
> >>>>> [121972.440000] Tasks state (memory values in pages):
> >>>>> [121972.440000] [  pid  ]   uid  tgid total_vm      rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
> >>>>> [121972.450000] [    209]     0   209     5140      320        0      320         0    16384      480         -1000 systemd-udevd
> >>>>> [121972.460000] [    230]   998   230     2887       55       32       23         0    18432        0             0 systemd-network
> >>>>> [121972.470000] [    420]     0   420      596        0        0        0         0     6144       22             0 mdadm
> >>>>> [121972.490000] [    421]   102   421     1393       56       32       24         0    10240        0             0 rpcbind
> >>>>> [121972.500000] [    429]   996   429     3695       17        0       17         0    20480        0             0 systemd-resolve
> >>>>> [121972.510000] [    433]     0   433      494       51        0       51         0     8192        0             0 rpc.idmapd
> >>>>> [121972.520000] [    434]     0   434      743       92       33       59         0     8192        7             0 nfsdcld
> >>>>> [121972.530000] [    451]     0   451      390        0        0        0         0     6144        0             0 acpid
> >>>>> [121972.540000] [    453]   105   453     1380       50       32       18         0    10240       18             0 avahi-daemon
> >>>>> [121972.550000] [    454]   101   454     1549       16        0       16         0    12288       32          -900 dbus-daemon
> >>>>> [121972.560000] [    466]     0   466     3771       60        0       60         0    14336        0             0 irqbalance
> >>>>> [121972.570000] [    475]     0   475     6269       32       32        0         0    18432        0             0 rsyslogd
> >>>>> [121972.590000] [    487]   105   487     1347       68       38       30         0    10240        0             0 avahi-daemon
> >>>>> [121972.600000] [    492]     0   492     1765        0        0        0         0    12288        0             0 cron
> >>>>> [121972.610000] [    493]     0   493     2593        0        0        0         0    16384        0             0 wpa_supplicant
> >>>>> [121972.620000] [    494]     0   494      607        0        0        0         0     8192       32             0 atd
> >>>>> [121972.630000] [    506]     0   506     1065       25        0       25         0    10240        0             0 rpc.mountd
> >>>>> [121972.640000] [    514]   103   514      809       25        0       25         0     8192        0             0 rpc.statd
> >>>>> [121972.650000] [    522]     0   522      999       31        0       31         0    10240        0             0 agetty
> >>>>> [121972.660000] [    524]     0   524     1540       28        0       28         0    12288        0             0 agetty
> >>>>> [121972.670000] [    525]     0   525     9098       56       32       24         0    34816        0             0 unattended-upgr
> >>>>> [121972.690000] [    526]     0   526     2621      320        0      320         0    14336      192         -1000 sshd
> >>>>> [121972.700000] [    539]     0   539      849       32       32        0         0     8192        0             0 in.tftpd
> >>>>> [121972.710000] [    544]   113   544     4361        6        6        0         0    16384       25             0 chronyd
> >>>>> [121972.720000] [    546]     0   546    16816       62       32       30         0    45056        0             0 winbindd
> >>>>> [121972.730000] [    552]     0   552    16905       59       32       27         0    45056        3             0 winbindd
> >>>>> [121972.740000] [    559]     0   559    17849       94       32       30        32    49152        4             0 smbd
> >>>>> [121972.750000] [    572]     0   572    17409       40       16       24         0    43008       11             0 smbd-notifyd
> >>>>> [121972.760000] [    573]     0   573    17412       16       16        0         0    43008       24             0 cleanupd
> >>>>> [121972.770000] [    584]     0   584     3036       20        0       20         0    16384        4             0 sshd
> >>>>> [121972.780000] [    589]     0   589    16816       32        2       30         0    40960       21             0 winbindd
> >>>>> [121972.790000] [    590]     0   590    27009       47       23       24         0    65536       21             0 smbd
> >>>>> [121972.810000] [    597]   501   597     3344       91       32       59         0    20480        0           100 systemd
> >>>>> [121972.820000] [    653]   501   653     3036        0        0        0         0    16384       33             0 sshd
> >>>>> [121972.830000] [    656]   501   656     1938       93       32       61         0    12288        9             0 bash
> >>>>> [121972.840000] [    704]     0   704      395      352       64      288         0     6144        0         -1000 watchdog
> >>>>> [121972.850000] [    738]   501   738     2834       12        0       12         0    16384        6             0 top
> >>>>> [121972.860000] [   4750]     0  4750     4218       44       26       18         0    18432       11             0 proftpd
> >>>>> [121972.870000] [   4768]     0  4768      401       31        0       31         0     6144        0             0 apt.systemd.dai
> >>>>> [121972.880000] [   4772]     0  4772      401       31        0       31         0     6144        0             0 apt.systemd.dai
> >>>>> [121972.890000] [   4778]     0  4778    13556       54        0       54         0    59392       26             0 apt-get
> >>>>> [121972.900000] Out of memory and no killable processes...
> >>>>> [121972.910000] Kernel panic - not syncing: System is deadlocked on memory
> >>>>> [121972.920000] CPU: 1 PID: 537 Comm: nfsd Not tainted 6.8.1+nas5xx #nas5xx
> >>>>> [121972.920000] Hardware name: Freescale LS1024A
> >>>>> [121972.930000]  unwind_backtrace from show_stack+0xb/0xc
> >>>>> [121972.930000]  show_stack from dump_stack_lvl+0x2b/0x34
> >>>>> [121972.940000]  dump_stack_lvl from panic+0xbf/0x264
> >>>>> [121972.940000]  panic from out_of_memory+0x33f/0x34c
> >>>>> [121972.950000]  out_of_memory from __alloc_pages+0x8e7/0xbb0
> >>>>> [121972.950000]  __alloc_pages from __alloc_pages_bulk+0x26d/0x3d8
> >>>>> [121972.960000]  __alloc_pages_bulk from svc_recv+0x9d/0x7d4
> >>>>> [121972.960000]  svc_recv from nfsd+0x7d/0xd4
> >>>>> [121972.970000]  nfsd from kthread+0xb9/0xcc
> >>>>> [121972.970000]  kthread from ret_from_fork+0x11/0x1c
> >>>>> [121972.980000] Exception stack(0xc2cadfb0 to 0xc2cadff8)
> >>>>> [121972.980000] dfa0:                                     00000000 00000000 00000000 00000000
> >>>>> [121972.990000] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> >>>>> [121973.000000] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
> >>>>> [121973.010000] CPU0: stopping
> >>>>> [121973.010000] CPU: 0 PID: 540 Comm: nfsd Not tainted 6.8.1+nas5xx #nas5xx
> >>>>> [121973.010000] Hardware name: Freescale LS1024A
> >>>>> [121973.010000]  unwind_backtrace from show_stack+0xb/0xc
> >>>>> [121973.010000]  show_stack from dump_stack_lvl+0x2b/0x34
> >>>>> [121973.010000]  dump_stack_lvl from do_handle_IPI+0x151/0x178
> >>>>> [121973.010000]  do_handle_IPI from ipi_handler+0x13/0x18
> >>>>> [121973.010000]  ipi_handler from handle_percpu_devid_irq+0x55/0x144
> >>>>> [121973.010000]  handle_percpu_devid_irq from generic_handle_domain_irq+0x17/0x20
> >>>>> [121973.010000]  generic_handle_domain_irq from gic_handle_irq+0x5f/0x70
> >>>>> [121973.010000]  gic_handle_irq from generic_handle_arch_irq+0x27/0x34
> >>>>> [121973.010000]  generic_handle_arch_irq from call_with_stack+0xd/0x10
> >>>>> [121973.010000] Rebooting in 90 seconds..
> >>>> 
> >>>> --
> >>>> Chuck Lever
> >>>> 
> >>>> 
> >> 
> >> --
> >> Chuck Lever
> >> 
> >> 
> 
> --
> Chuck Lever
> 
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [External] : nfsd: memory leak when client does many file operations
  2024-03-25 19:55           ` Jan Schunk
@ 2024-03-25 20:11             ` Chuck Lever III
  2024-03-25 20:26               ` Aw: " Jan Schunk
  2024-03-26 11:15               ` Benjamin Coddington
  0 siblings, 2 replies; 24+ messages in thread
From: Chuck Lever III @ 2024-03-25 20:11 UTC (permalink / raw)
  To: Jan Schunk
  Cc: Jeff Layton, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	Linux NFS Mailing List, linux-kernel



> On Mar 25, 2024, at 3:55 PM, Jan Schunk <scpcom@gmx.de> wrote:
> 
> The VM has now been running for 20 hours with 512MB RAM, no desktop, without the "noatime" mount option, and without the "async" export option.
> 
> Currently there is no issue, but the memory usage is still constantly growing. It may just take longer before something happens.
> 
> top - 00:49:49 up 3 min,  1 user,  load average: 0,21, 0,19, 0,09
> Tasks: 111 total,   1 running, 110 sleeping,   0 stopped,   0 zombie
> %CPU(s):  0,2 us,  0,3 sy,  0,0 ni, 99,5 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st 
> MiB Spch:    467,0 total,    302,3 free,     89,3 used,     88,1 buff/cache     
> MiB Swap:    975,0 total,    975,0 free,      0,0 used.    377,7 avail Spch
> 
> top - 15:05:39 up 14:19,  1 user,  load average: 1,87, 1,72, 1,65
> Tasks: 104 total,   1 running, 103 sleeping,   0 stopped,   0 zombie
> %CPU(s):  0,2 us,  4,9 sy,  0,0 ni, 53,3 id, 39,0 wa,  0,0 hi,  2,6 si,  0,0 st 
> MiB Spch:    467,0 total,     21,2 free,    147,1 used,    310,9 buff/cache     
> MiB Swap:    975,0 total,    952,9 free,     22,1 used.    319,9 avail Spch
> 
> top - 20:48:16 up 20:01,  1 user,  load average: 5,02, 2,72, 2,08
> Tasks: 104 total,   5 running,  99 sleeping,   0 stopped,   0 zombie
> %CPU(s):  0,2 us, 46,4 sy,  0,0 ni, 11,9 id,  2,3 wa,  0,0 hi, 39,2 si,  0,0 st 
> MiB Spch:    467,0 total,     16,9 free,    190,8 used,    271,6 buff/cache     
> MiB Swap:    975,0 total,    952,9 free,     22,1 used.    276,2 avail Spch

I don't see anything in your original memory dump that
might account for this. But I'm at a loss because I'm
a kernel developer, not a support guy -- I don't have
any tools or expertise that can troubleshoot a system
without rebuilding a kernel with instrumentation. My
first instinct is to tell you to bisect between v6.3
and v6.4, or at least enable kmemleak, but I'm guessing
you don't build your own kernels.

My only recourse at this point would be to try to
reproduce it myself, but unfortunately I've just
upgraded my whole lab to Fedora 39, and there's a grub
bug that prevents booting any custom-built kernel
on my hardware.

So I'm stuck until I can nail that down. Anyone else
care to help out?


--
Chuck Lever



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Aw: Re: [External] : nfsd: memory leak when client does many file operations
  2024-03-25 20:11             ` Chuck Lever III
@ 2024-03-25 20:26               ` Jan Schunk
  2024-03-25 20:36                 ` Chuck Lever III
  2024-03-26 11:15               ` Benjamin Coddington
  1 sibling, 1 reply; 24+ messages in thread
From: Jan Schunk @ 2024-03-25 20:26 UTC (permalink / raw)
  To: Chuck Lever III
  Cc: Jeff Layton, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	Linux NFS Mailing List, linux-kernel

I am building my own kernels, but I have never tried kmemleak. Is it just a Kconfig option?
What do you mean by "bisect between v6.3 and v6.4"?
Everything up to and including v6.4 is OK; the problem starts at v6.5.

I also looked at some of the code already, but there are huge changes to mm that happened in v6.5 and v6.6, so it is hard for me to compare it with older versions to find the commit or commits that may cause the issue.

Btw. thanks for guiding me so far.

> Sent: Monday, 25.03.2024 at 21:11
> From: "Chuck Lever III" <chuck.lever@oracle.com>
> To: "Jan Schunk" <scpcom@gmx.de>
> Cc: "Jeff Layton" <jlayton@kernel.org>, "Neil Brown" <neilb@suse.de>, "Olga Kornievskaia" <kolga@netapp.com>, "Dai Ngo" <dai.ngo@oracle.com>, "Tom Talpey" <tom@talpey.com>, "Linux NFS Mailing List" <linux-nfs@vger.kernel.org>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
> Subject: Re: [External] : nfsd: memory leak when client does many file operations
> 
> 
> 
> > On Mar 25, 2024, at 3:55 PM, Jan Schunk <scpcom@gmx.de> wrote:
> > 
> > The VM has now been running for 20 hours with 512MB RAM, no desktop, without the "noatime" mount option, and without the "async" export option.
> > 
> > Currently there is no issue, but the memory usage is still constantly growing. It may just take longer before something happens.
> > 
> > top - 00:49:49 up 3 min,  1 user,  load average: 0,21, 0,19, 0,09
> > Tasks: 111 total,   1 running, 110 sleeping,   0 stopped,   0 zombie
> > %CPU(s):  0,2 us,  0,3 sy,  0,0 ni, 99,5 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st 
> > MiB Spch:    467,0 total,    302,3 free,     89,3 used,     88,1 buff/cache     
> > MiB Swap:    975,0 total,    975,0 free,      0,0 used.    377,7 avail Spch
> > 
> > top - 15:05:39 up 14:19,  1 user,  load average: 1,87, 1,72, 1,65
> > Tasks: 104 total,   1 running, 103 sleeping,   0 stopped,   0 zombie
> > %CPU(s):  0,2 us,  4,9 sy,  0,0 ni, 53,3 id, 39,0 wa,  0,0 hi,  2,6 si,  0,0 st 
> > MiB Spch:    467,0 total,     21,2 free,    147,1 used,    310,9 buff/cache     
> > MiB Swap:    975,0 total,    952,9 free,     22,1 used.    319,9 avail Spch
> > 
> > top - 20:48:16 up 20:01,  1 user,  load average: 5,02, 2,72, 2,08
> > Tasks: 104 total,   5 running,  99 sleeping,   0 stopped,   0 zombie
> > %CPU(s):  0,2 us, 46,4 sy,  0,0 ni, 11,9 id,  2,3 wa,  0,0 hi, 39,2 si,  0,0 st 
> > MiB Spch:    467,0 total,     16,9 free,    190,8 used,    271,6 buff/cache     
> > MiB Swap:    975,0 total,    952,9 free,     22,1 used.    276,2 avail Spch
> 
> I don't see anything in your original memory dump that
> might account for this. But I'm at a loss because I'm
> a kernel developer, not a support guy -- I don't have
> any tools or expertise that can troubleshoot a system
> without rebuilding a kernel with instrumentation. My
> first instinct is to tell you to bisect between v6.3
> and v6.4, or at least enable kmemleak, but I'm guessing
> you don't build your own kernels.
> 
> My only recourse at this point would be to try to
> reproduce it myself, but unfortunately I've just
> upgraded my whole lab to Fedora 39, and there's a grub
> bug that prevents booting any custom-built kernel
> on my hardware.
> 
> So I'm stuck until I can nail that down. Anyone else
> care to help out?
> 
> 
> --
> Chuck Lever
> 
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [External] : nfsd: memory leak when client does many file operations
  2024-03-25 20:26               ` Aw: " Jan Schunk
@ 2024-03-25 20:36                 ` Chuck Lever III
  2024-03-26 16:50                   ` Aw: " Jan Schunk
  2024-03-28 22:03                   ` Jan Schunk
  0 siblings, 2 replies; 24+ messages in thread
From: Chuck Lever III @ 2024-03-25 20:36 UTC (permalink / raw)
  To: Jan Schunk
  Cc: Jeff Layton, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	Linux NFS Mailing List, linux-kernel



> On Mar 25, 2024, at 4:26 PM, Jan Schunk <scpcom@gmx.de> wrote:
> 
> I am building my own kernels, but I have never tried kmemleak. Is it just a Kconfig option?

  Location:
    -> Kernel hacking
      -> Memory Debugging
(1)     -> Kernel memory leak detector (DEBUG_KMEMLEAK [=n])
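
Once you have booted a kernel built with that option, kmemleak reports
suspected leaks through debugfs. A minimal way to poke at it (as root,
and assuming debugfs is mounted at the usual /sys/kernel/debug):

# trigger an immediate scan, then dump whatever suspects it recorded
echo scan > /sys/kernel/debug/kmemleak
cat /sys/kernel/debug/kmemleak
# clear the current suspects so that only new leaks show up next time
echo clear > /sys/kernel/debug/kmemleak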


> What do you mean by "bisect between v6.3 and v6.4"?

After you "git clone" the kernel source:

$ git bisect start v6.4 v6.3

Build the kernel and test. If the test fails:

$ cd <your kernel source tree>; git bisect bad

If the test succeeds:

$ cd <your kernel source tree>; git bisect good

Rebuild and try again until it lands on the first broken commit.


> Everything up to and including v6.4 is OK; the problem starts at v6.5.

I misremembered. Use "$ git bisect start v6.5 v6.4" then.
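
Putting the loop together, a whole session looks roughly like this
(the build, install, and reboot steps depend on your setup):

$ git bisect start v6.5 v6.4
  ... build and boot the candidate kernel, run the reproducer ...
$ git bisect bad     # if the memory usage grew
$ git bisect good    # if it stayed flat
  ... repeat until git names the first bad commit ...
$ git bisect reset   # return to your original branch when done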


> I also looked at some of the code already, but there are huge changes to mm that happened in v6.5 and v6.6, so it is hard for me to compare it with older versions to find the commit or commits that may cause the issue.

Bisection is a mechanical test-based process. You don't need
to look at code until you've reached the first bad commit.

--
Chuck Lever



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [External] : nfsd: memory leak when client does many file operations
  2024-03-25 20:11             ` Chuck Lever III
  2024-03-25 20:26               ` Aw: " Jan Schunk
@ 2024-03-26 11:15               ` Benjamin Coddington
  2024-03-26 17:04                 ` Aw: " Jan Schunk
  1 sibling, 1 reply; 24+ messages in thread
From: Benjamin Coddington @ 2024-03-26 11:15 UTC (permalink / raw)
  To: Chuck Lever III
  Cc: Jan Schunk, Jeff Layton, Neil Brown, Olga Kornievskaia, Dai Ngo,
	Tom Talpey, Linux NFS Mailing List, linux-kernel

On 25 Mar 2024, at 16:11, Chuck Lever III wrote:

>> On Mar 25, 2024, at 3:55 PM, Jan Schunk <scpcom@gmx.de> wrote:
>>
>> The VM has now been running for 20 hours with 512MB RAM, no desktop, without the "noatime" mount option, and without the "async" export option.
>>
>> Currently there is no issue, but the memory usage is still constantly growing. It may just take longer before something happens.
>>
>> top - 00:49:49 up 3 min,  1 user,  load average: 0,21, 0,19, 0,09
>> Tasks: 111 total,   1 running, 110 sleeping,   0 stopped,   0 zombie
>> %CPU(s):  0,2 us,  0,3 sy,  0,0 ni, 99,5 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
>> MiB Spch:    467,0 total,    302,3 free,     89,3 used,     88,1 buff/cache
>> MiB Swap:    975,0 total,    975,0 free,      0,0 used.    377,7 avail Spch
>>
>> top - 15:05:39 up 14:19,  1 user,  load average: 1,87, 1,72, 1,65
>> Tasks: 104 total,   1 running, 103 sleeping,   0 stopped,   0 zombie
>> %CPU(s):  0,2 us,  4,9 sy,  0,0 ni, 53,3 id, 39,0 wa,  0,0 hi,  2,6 si,  0,0 st
>> MiB Spch:    467,0 total,     21,2 free,    147,1 used,    310,9 buff/cache
>> MiB Swap:    975,0 total,    952,9 free,     22,1 used.    319,9 avail Spch
>>
>> top - 20:48:16 up 20:01,  1 user,  load average: 5,02, 2,72, 2,08
>> Tasks: 104 total,   5 running,  99 sleeping,   0 stopped,   0 zombie
>> %CPU(s):  0,2 us, 46,4 sy,  0,0 ni, 11,9 id,  2,3 wa,  0,0 hi, 39,2 si,  0,0 st
>> MiB Spch:    467,0 total,     16,9 free,    190,8 used,    271,6 buff/cache
>> MiB Swap:    975,0 total,    952,9 free,     22,1 used.    276,2 avail Spch
>
> I don't see anything in your original memory dump that
> might account for this. But I'm at a loss because I'm
> a kernel developer, not a support guy -- I don't have
> any tools or expertise that can troubleshoot a system
> without rebuilding a kernel with instrumentation. My
> first instinct is to tell you to bisect between v6.3
> and v6.4, or at least enable kmemleak, but I'm guessing
> you don't build your own kernels.
>
> My only recourse at this point would be to try to
> reproduce it myself, but unfortunately I've just
> upgraded my whole lab to Fedora 39, and there's a grub
> bug that prevents booting any custom-built kernel
> on my hardware.
>
> So I'm stuck until I can nail that down. Anyone else
> care to help out?

Sure - I can throw some stuff..

Can we dig into which memory slabs might be growing?  Something like:

watch -d "cat /proc/slabinfo | grep nfsd"

.. for a bit might show what is growing.
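
If it only grows slowly, it may be easier to log it for diffing later
rather than watching, e.g. with a crude loop like:

while sleep 60; do date; grep nfsd /proc/slabinfo; done >> nfsd-slab.log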

Then use a systemtap script like the one below to trace the allocations - use:

stap -v --all-modules kmem_alloc.stp <slab_name>

Ben


8<---------------------------- save as kmem_alloc.stp ----------------------------

# This script displays the number of given slab allocations and the backtraces leading up to it.

global slab = @1
global stats, stacks
probe kernel.function("kmem_cache_alloc") {
        if (kernel_string($s->name) == slab) {
                stats[execname()] <<< 1
                stacks[execname(),kernel_string($s->name),backtrace()] <<< 1
        }
}
# Exit after 10 seconds
# probe timer.ms(10000) { exit () }
probe end {
        printf("Number of %s slab allocations by process\n", slab)
        foreach ([exec] in stats) {
                printf("%s:\t%d\n",exec,@count(stats[exec]))
        }
        printf("\nBacktrace of processes when allocating\n")
        foreach ([proc,cache,bt] in stacks) {
                printf("Exec: %s Name: %s  Count: %d\n",proc,cache,@count(stacks[proc,cache,bt]))
                print_stack(bt)
                printf("\n-------------------------------------------------------\n\n")
        }
}


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Aw: Re: [External] : nfsd: memory leak when client does many file operations
  2024-03-25 20:36                 ` Chuck Lever III
@ 2024-03-26 16:50                   ` Jan Schunk
  2024-03-28 22:03                   ` Jan Schunk
  1 sibling, 0 replies; 24+ messages in thread
From: Jan Schunk @ 2024-03-26 16:50 UTC (permalink / raw)
  To: Chuck Lever III
  Cc: Jeff Layton, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	Linux NFS Mailing List, linux-kernel

Thanks, I will do some tests with DEBUG_KMEMLEAK enabled and git bisect now.

> Sent: Monday, 25.03.2024 at 21:36
> From: "Chuck Lever III" <chuck.lever@oracle.com>
> To: "Jan Schunk" <scpcom@gmx.de>
> Cc: "Jeff Layton" <jlayton@kernel.org>, "Neil Brown" <neilb@suse.de>, "Olga Kornievskaia" <kolga@netapp.com>, "Dai Ngo" <dai.ngo@oracle.com>, "Tom Talpey" <tom@talpey.com>, "Linux NFS Mailing List" <linux-nfs@vger.kernel.org>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
> Subject: Re: [External] : nfsd: memory leak when client does many file operations
> 
> 
> 
> > On Mar 25, 2024, at 4:26 PM, Jan Schunk <scpcom@gmx.de> wrote:
> > 
> > I am building my own kernels, but I have never tried kmemleak. Is it just a Kconfig option?
> 
>   Location:
>     -> Kernel hacking
>       -> Memory Debugging
> (1)     -> Kernel memory leak detector (DEBUG_KMEMLEAK [=n])
> 
> 
> > What do you mean by "bisect between v6.3 and v6.4"?
> 
> After you "git clone" the kernel source:
> 
> $ git bisect start v6.4 v6.3
> 
> Build the kernel and test. If the test fails:
> 
> $ cd <your kernel source tree>; git bisect bad
> 
> If the test succeeds:
> 
> $ cd <your kernel source tree>; git bisect good
> 
> Rebuild and try again until it lands on the first broken commit.
> 
> 
> > Everything up to and including v6.4 is OK; the problem starts at v6.5.
> 
> I misremembered. Use "$ git bisect start v6.5 v6.4" then.
> 
> 
> > I also looked at some of the code already, but there are huge changes to mm that happened in v6.5 and v6.6, so it is hard for me to compare it with older versions to find the commit or commits that may cause the issue.
> 
> Bisection is a mechanical test-based process. You don't need
> to look at code until you've reached the first bad commit.
> 
> --
> Chuck Lever
> 
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Aw: Re: [External] : nfsd: memory leak when client does many file operations
  2024-03-26 11:15               ` Benjamin Coddington
@ 2024-03-26 17:04                 ` Jan Schunk
  2024-03-26 17:13                   ` Benjamin Coddington
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Schunk @ 2024-03-26 17:04 UTC (permalink / raw)
  To: Benjamin Coddington
  Cc: Chuck Lever III, Jeff Layton, Neil Brown, Olga Kornievskaia,
	Dai Ngo, Tom Talpey, Linux NFS Mailing List, linux-kernel

Before doing this on my own build, I tried it with the unmodified linux-image-6.6.13+bpo-amd64 kernel from Debian 12.
I installed systemtap, linux-headers-6.6.13+bpo-amd64, and linux-image-6.6.13+bpo-amd64-dbg, and tried to run stap:

user@deb:~$ sudo stap -v --all-modules kmem_alloc.stp nfsd_file
WARNING: Kernel function symbol table missing [man warning::symbols]
Pass 1: parsed user script and 484 library scripts using 110120virt/96896res/7168shr/89800data kb, in 1360usr/1080sys/4963real ms.
WARNING: cannot find module kernel debuginfo: No DWARF information found [man warning::debuginfo]
semantic error: resolution failed in DWARF builder

semantic error: while resolving probe point: identifier 'kernel' at kmem_alloc.stp:5:7
        source: probe kernel.function("kmem_cache_alloc") {
                      ^

semantic error: no match

Pass 2: analyzed script: 1 probe, 5 functions, 1 embed, 3 globals using 112132virt/100352res/8704shr/91792data kb, in 30usr/30sys/167real ms.
Pass 2: analysis failed.  [man error::pass2]
Tip: /usr/share/doc/systemtap/README.Debian should help you get started.
user@deb:~$ 

user@deb:~$ grep -E 'CONFIG_DEBUG_INFO|CONFIG_KPROBES|CONFIG_DEBUG_FS|CONFIG_RELAY' /boot/config-6.6.13+bpo-amd64 
CONFIG_RELAY=y
CONFIG_KPROBES=y
CONFIG_KPROBES_ON_FTRACE=y
CONFIG_DEBUG_INFO=y
# CONFIG_DEBUG_INFO_NONE is not set
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
# CONFIG_DEBUG_INFO_DWARF4 is not set
# CONFIG_DEBUG_INFO_DWARF5 is not set
# CONFIG_DEBUG_INFO_REDUCED is not set
CONFIG_DEBUG_INFO_COMPRESSED_NONE=y
# CONFIG_DEBUG_INFO_COMPRESSED_ZLIB is not set
# CONFIG_DEBUG_INFO_SPLIT is not set
CONFIG_DEBUG_INFO_BTF=y
CONFIG_DEBUG_INFO_BTF_MODULES=y
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_FS_ALLOW_ALL=y
# CONFIG_DEBUG_FS_DISALLOW_MOUNT is not set
# CONFIG_DEBUG_FS_ALLOW_NONE is not set
user@deb:~$ 

Do I need to enable other options?


> Sent: Tuesday, 26.03.2024 at 12:15
> From: "Benjamin Coddington" <bcodding@redhat.com>
> To: "Chuck Lever III" <chuck.lever@oracle.com>
> Cc: "Jan Schunk" <scpcom@gmx.de>, "Jeff Layton" <jlayton@kernel.org>, "Neil Brown" <neilb@suse.de>, "Olga Kornievskaia" <kolga@netapp.com>, "Dai Ngo" <dai.ngo@oracle.com>, "Tom Talpey" <tom@talpey.com>, "Linux NFS Mailing List" <linux-nfs@vger.kernel.org>, linux-kernel@vger.kernel.org
> Subject: Re: [External] : nfsd: memory leak when client does many file operations
> 
> On 25 Mar 2024, at 16:11, Chuck Lever III wrote:
> 
> >> On Mar 25, 2024, at 3:55 PM, Jan Schunk <scpcom@gmx.de> wrote:
> >>
> >> The VM has now been running for 20 hours with 512MB RAM, no desktop, without the "noatime" mount option, and without the "async" export option.
> >>
> >> Currently there is no issue, but the memory usage is still constantly growing. It may just take longer before something happens.
> >>
> >> top - 00:49:49 up 3 min,  1 user,  load average: 0,21, 0,19, 0,09
> >> Tasks: 111 total,   1 running, 110 sleeping,   0 stopped,   0 zombie
> >> %CPU(s):  0,2 us,  0,3 sy,  0,0 ni, 99,5 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
> >> MiB Spch:    467,0 total,    302,3 free,     89,3 used,     88,1 buff/cache
> >> MiB Swap:    975,0 total,    975,0 free,      0,0 used.    377,7 avail Spch
> >>
> >> top - 15:05:39 up 14:19,  1 user,  load average: 1,87, 1,72, 1,65
> >> Tasks: 104 total,   1 running, 103 sleeping,   0 stopped,   0 zombie
> >> %CPU(s):  0,2 us,  4,9 sy,  0,0 ni, 53,3 id, 39,0 wa,  0,0 hi,  2,6 si,  0,0 st
> >> MiB Spch:    467,0 total,     21,2 free,    147,1 used,    310,9 buff/cache
> >> MiB Swap:    975,0 total,    952,9 free,     22,1 used.    319,9 avail Spch
> >>
> >> top - 20:48:16 up 20:01,  1 user,  load average: 5,02, 2,72, 2,08
> >> Tasks: 104 total,   5 running,  99 sleeping,   0 stopped,   0 zombie
> >> %CPU(s):  0,2 us, 46,4 sy,  0,0 ni, 11,9 id,  2,3 wa,  0,0 hi, 39,2 si,  0,0 st
> >> MiB Spch:    467,0 total,     16,9 free,    190,8 used,    271,6 buff/cache
> >> MiB Swap:    975,0 total,    952,9 free,     22,1 used.    276,2 avail Spch
> >
> > I don't see anything in your original memory dump that
> > might account for this. But I'm at a loss because I'm
> > a kernel developer, not a support guy -- I don't have
> > any tools or expertise that can troubleshoot a system
> > without rebuilding a kernel with instrumentation. My
> > first instinct is to tell you to bisect between v6.3
> > and v6.4, or at least enable kmemleak, but I'm guessing
> > you don't build your own kernels.
> >
> > My only recourse at this point would be to try to
> > reproduce it myself, but unfortunately I've just
> > upgraded my whole lab to Fedora 39, and there's a grub
> > bug that prevents booting any custom-built kernel
> > on my hardware.
> >
> > So I'm stuck until I can nail that down. Anyone else
> > care to help out?
> 
> Sure - I can throw some stuff..
> 
> Can we dig into which memory slabs might be growing?  Something like:
> 
> watch -d "cat /proc/slabinfo | grep nfsd"
> 
> .. for a bit might show what is growing.
> 
> Then use a systemtap script like the one below to trace the allocations - use:
> 
> stap -v --all-modules kmem_alloc.stp <slab_name>
> 
> Ben
> 
> 
> 8<---------------------------- save as kmem_alloc.stp ----------------------------
> 
> # This script displays the number of given slab allocations and the backtraces leading up to it.
> 
> global slab = @1
> global stats, stacks
> probe kernel.function("kmem_cache_alloc") {
>         if (kernel_string($s->name) == slab) {
>                 stats[execname()] <<< 1
>                 stacks[execname(),kernel_string($s->name),backtrace()] <<< 1
>         }
> }
> # Exit after 10 seconds
> # probe timer.ms(10000) { exit () }
> probe end {
>         printf("Number of %s slab allocations by process\n", slab)
>         foreach ([exec] in stats) {
>                 printf("%s:\t%d\n",exec,@count(stats[exec]))
>         }
>         printf("\nBacktrace of processes when allocating\n")
>         foreach ([proc,cache,bt] in stacks) {
>                 printf("Exec: %s Name: %s  Count: %d\n",proc,cache,@count(stacks[proc,cache,bt]))
>                 print_stack(bt)
>                 printf("\n-------------------------------------------------------\n\n")
>         }
> }
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [External] : nfsd: memory leak when client does many file operations
  2024-03-26 17:04                 ` Aw: " Jan Schunk
@ 2024-03-26 17:13                   ` Benjamin Coddington
  2024-03-26 17:15                     ` Benjamin Coddington
  0 siblings, 1 reply; 24+ messages in thread
From: Benjamin Coddington @ 2024-03-26 17:13 UTC (permalink / raw)
  To: Jan Schunk
  Cc: Chuck Lever III, Jeff Layton, Neil Brown, Olga Kornievskaia,
	Dai Ngo, Tom Talpey, Linux NFS Mailing List, linux-kernel

On 26 Mar 2024, at 13:04, Jan Schunk wrote:

> Before doing this on my own build, I tried it with the unmodified linux-image-6.6.13+bpo-amd64 kernel from Debian 12.
> I installed systemtap, linux-headers-6.6.13+bpo-amd64, and linux-image-6.6.13+bpo-amd64-dbg, and tried to run stap:
>
> user@deb:~$ sudo stap -v --all-modules kmem_alloc.stp nfsd_file
> WARNING: Kernel function symbol table missing [man warning::symbols]
> Pass 1: parsed user script and 484 library scripts using 110120virt/96896res/7168shr/89800data kb, in 1360usr/1080sys/4963real ms.
> WARNING: cannot find module kernel debuginfo: No DWARF information found [man warning::debuginfo]
> semantic error: resolution failed in DWARF builder
>
> semantic error: while resolving probe point: identifier 'kernel' at kmem_alloc.stp:5:7
>         source: probe kernel.function("kmem_cache_alloc") {
>                       ^
>
> semantic error: no match
>
> Pass 2: analyzed script: 1 probe, 5 functions, 1 embed, 3 globals using 112132virt/100352res/8704shr/91792data kb, in 30usr/30sys/167real ms.
> Pass 2: analysis failed.  [man error::pass2]
> Tip: /usr/share/doc/systemtap/README.Debian should help you get started.
> user@deb:~$
>
> user@deb:~$ grep -E 'CONFIG_DEBUG_INFO|CONFIG_KPROBES|CONFIG_DEBUG_FS|CONFIG_RELAY' /boot/config-6.6.13+bpo-amd64
> CONFIG_RELAY=y
> CONFIG_KPROBES=y
> CONFIG_KPROBES_ON_FTRACE=y
> CONFIG_DEBUG_INFO=y
> # CONFIG_DEBUG_INFO_NONE is not set
> CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
> # CONFIG_DEBUG_INFO_DWARF4 is not set
> # CONFIG_DEBUG_INFO_DWARF5 is not set
> # CONFIG_DEBUG_INFO_REDUCED is not set
> CONFIG_DEBUG_INFO_COMPRESSED_NONE=y
> # CONFIG_DEBUG_INFO_COMPRESSED_ZLIB is not set
> # CONFIG_DEBUG_INFO_SPLIT is not set
> CONFIG_DEBUG_INFO_BTF=y
> CONFIG_DEBUG_INFO_BTF_MODULES=y
> CONFIG_DEBUG_FS=y
> CONFIG_DEBUG_FS_ALLOW_ALL=y
> # CONFIG_DEBUG_FS_DISALLOW_MOUNT is not set
> # CONFIG_DEBUG_FS_ALLOW_NONE is not set
> user@deb:~$
>
> Do I need to enable other options?

You should just need DEBUG_INFO.. maybe stap can't find it?  You can try to add: -r /path/to/the/kernel/build

.. but usually I use this option for a cross-compile.  Usually I don't have to muck around without the debuginfo packages either.  If I don't have them then I'm annotating the kernel directly.
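
For example, something like this might find it for a packaged kernel
(the path is a guess; it has to point at the matching build tree):

stap -r /lib/modules/$(uname -r)/build -v --all-modules kmem_alloc.stp nfsd_file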

Maybe just a view of what's happening in /proc/slabinfo would be enough..

Ben


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [External] : nfsd: memory leak when client does many file operations
  2024-03-26 17:13                   ` Benjamin Coddington
@ 2024-03-26 17:15                     ` Benjamin Coddington
  2024-03-26 19:05                       ` Aw: " Jan Schunk
  0 siblings, 1 reply; 24+ messages in thread
From: Benjamin Coddington @ 2024-03-26 17:15 UTC (permalink / raw)
  To: Jan Schunk
  Cc: Chuck Lever III, Jeff Layton, Neil Brown, Olga Kornievskaia,
	Dai Ngo, Tom Talpey, Linux NFS Mailing List, linux-kernel

On 26 Mar 2024, at 13:13, Benjamin Coddington wrote:

> On 26 Mar 2024, at 13:04, Jan Schunk wrote:
>
>> Before I start doing this on my own build I tried it with unmodified linux-image-6.6.13+bpo-amd64 from Debian 12.
>> I installed systemtap, linux-headers-6.6.13+bpo-amd64 and linux-image-6.6.13+bpo-amd64-dbg and tried to run stap:
>>
>> [stap output and kernel config snipped]
>>
>> Do I need to enable other options?
>
> You should just need DEBUG_INFO.. maybe stap can't find it?  You can try to add: -r /path/to/the/kernel/build

oh, never mind - you're using a packaged kernel.  I'm not familiar with the packaging requirements for systemtap on Debian.

Ben



* Aw: Re: [External] : nfsd: memory leak when client does many file operations
  2024-03-26 17:15                     ` Benjamin Coddington
@ 2024-03-26 19:05                       ` Jan Schunk
  0 siblings, 0 replies; 24+ messages in thread
From: Jan Schunk @ 2024-03-26 19:05 UTC (permalink / raw)
  To: Benjamin Coddington
  Cc: Chuck Lever III, Jeff Layton, Neil Brown, Olga Kornievskaia,
	Dai Ngo, Tom Talpey, Linux NFS Mailing List, linux-kernel

Thanks, yes this was a packaged kernel, I will try it with my own build later.

On an earlier test run I saved slabinfo to a file from time to time. On kernel 6.6.x I can see the nfsd_file <active_objs> and <num_objs> counts growing from 72 to 324 within 14 hours. But I cannot compare this with older kernels, since nfsd_file does not appear in their slabinfo at all.

top - 00:49:49 up 3 min,  1 user,  load average: 0,21, 0,19, 0,09
Tasks: 111 total,   1 running, 110 sleeping,   0 stopped,   0 zombie
%CPU(s):  0,2 us,  0,3 sy,  0,0 ni, 99,5 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
MiB Mem :    467,0 total,    302,3 free,     89,3 used,     88,1 buff/cache
MiB Swap:    975,0 total,    975,0 free,      0,0 used.    377,7 avail Mem

slabinfo
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
nfsd_file             72     72    112   36    1 : tunables    0    0    0 : slabdata      2      2      0

top - 15:05:39 up 14:19,  1 user,  load average: 1,87, 1,72, 1,65
Tasks: 104 total,   1 running, 103 sleeping,   0 stopped,   0 zombie
%CPU(s):  0,2 us,  4,9 sy,  0,0 ni, 53,3 id, 39,0 wa,  0,0 hi,  2,6 si,  0,0 st
MiB Mem :    467,0 total,     21,2 free,    147,1 used,    310,9 buff/cache
MiB Swap:    975,0 total,    952,9 free,     22,1 used.    319,9 avail Mem

slabinfo
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
nfsd_file            324    324    112   36    1 : tunables    0    0    0 : slabdata      9      9      0
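
Back-of-the-envelope: 324 objects x 112 bytes is only about 35 KiB, so
this cache by itself cannot explain the memory that goes missing. A
quick sketch to compute that directly (field positions as in the
slabinfo header above, needs root):

  awk '/^nfsd_file/ { printf "%d objs x %d B = %.1f KiB\n", $2, $4, $2*$4/1024 }' /proc/slabinfo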


> Sent: Tuesday, 26.03.2024 at 18:15
> From: "Benjamin Coddington" <bcodding@redhat.com>
> To: "Jan Schunk" <scpcom@gmx.de>
> Cc: "Chuck Lever III" <chuck.lever@oracle.com>, "Jeff Layton" <jlayton@kernel.org>, "Neil Brown" <neilb@suse.de>, "Olga Kornievskaia" <kolga@netapp.com>, "Dai Ngo" <dai.ngo@oracle.com>, "Tom Talpey" <tom@talpey.com>, "Linux NFS Mailing List" <linux-nfs@vger.kernel.org>, linux-kernel@vger.kernel.org
> Subject: Re: [External] : nfsd: memory leak when client does many file operations
>
> On 26 Mar 2024, at 13:13, Benjamin Coddington wrote:
>
> > On 26 Mar 2024, at 13:04, Jan Schunk wrote:
> >
> >> [original message with stap output and kernel config snipped]
> >> Do I need to enable other options?
> >
> > You should just need DEBUG_INFO.. maybe stap can't find it?  You can try to add: -r /path/to/the/kernel/build
>
> oh, never mind - you're using a packaged kernel.  I'm not familiar with the packaging requirements for systemtap on Debian.
>
> Ben
>


* Aw: Re: [External] : nfsd: memory leak when client does many file operations
  2024-03-25 20:36                 ` Chuck Lever III
  2024-03-26 16:50                   ` Aw: " Jan Schunk
@ 2024-03-28 22:03                   ` Jan Schunk
  2024-03-29  0:25                     ` Chuck Lever III
  1 sibling, 1 reply; 24+ messages in thread
From: Jan Schunk @ 2024-03-28 22:03 UTC (permalink / raw)
  To: Chuck Lever III
  Cc: Jeff Layton, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	Linux NFS Mailing List, linux-kernel

Inside the VM I was not able to reproduce the issue on v6.5.x, so I am concentrating on v6.6.x.

Current status:

$ git bisect start v6.6 v6.5
Bisecting: 7882 revisions left to test after this (roughly 13 steps)
[a1c19328a160c80251868dbd80066dce23d07995] Merge tag 'soc-arm-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

--
$ git bisect good
Bisecting: 3935 revisions left to test after this (roughly 12 steps)
[e4f1b8202fb59c56a3de7642d50326923670513f] Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

--
$ git bisect bad
Bisecting: 2014 revisions left to test after this (roughly 11 steps)
[e0152e7481c6c63764d6ea8ee41af5cf9dfac5e9] Merge tag 'riscv-for-linus-6.6-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

--
$ git bisect bad
Bisecting: 975 revisions left to test after this (roughly 10 steps)
[4a3b1007eeb26b2bb7ae4d734cc8577463325165] Merge tag 'pinctrl-v6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

--
$ git bisect good
Bisecting: 476 revisions left to test after this (roughly 9 steps)
[4debf77169ee459c46ec70e13dc503bc25efd7d2] Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd

--
$ git bisect good
Bisecting: 237 revisions left to test after this (roughly 8 steps)
[e7e9423db459423d3dcb367217553ad9ededadc9] Merge tag 'v6.6-vfs.super.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

> Sent: Monday, 25.03.2024 at 21:36
> From: "Chuck Lever III" <chuck.lever@oracle.com>
> To: "Jan Schunk" <scpcom@gmx.de>
> Cc: "Jeff Layton" <jlayton@kernel.org>, "Neil Brown" <neilb@suse.de>, "Olga Kornievskaia" <kolga@netapp.com>, "Dai Ngo" <dai.ngo@oracle.com>, "Tom Talpey" <tom@talpey.com>, "Linux NFS Mailing List" <linux-nfs@vger.kernel.org>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
> Subject: Re: [External] : nfsd: memory leak when client does many file operations
> 
> 
> 
> > On Mar 25, 2024, at 4:26 PM, Jan Schunk <scpcom@gmx.de> wrote:
> > 
> > I am building my own kernels, but I never tried kmemleak, is this just a Kconfig option?
> 
>   Location:
>     -> Kernel hacking
>       -> Memory Debugging
> (1)     -> Kernel memory leak detector (DEBUG_KMEMLEAK [=n])
> 
> 
> > What do you mean by "bisect between v6.3 and v6.4"?
> 
> After you "git clone" the kernel source:
> 
> $ git bisect start v6.4 v6.3
> 
> Build the kernel and test. If the test fails:
> 
> $ cd <your kernel source tree>; git bisect bad
> 
> If the test succeeds:
> 
> $ cd <your kernel source tree>; git bisect good
> 
> Rebuild and try again until it lands on the first broken commit.
> 
> 
> > Everything including v6.4 is OK, the problem starts at v6.5.
> 
> I misremembered. Use "$ git bisect start v6.5 v6.4" then.
> 
> 
> > I also looked at some code already, but there are huge changes to mm in v6.5 and v6.6, so it is hard for me to compare with older versions and find the commit or commits that may cause the issue.
> 
> Bisection is a mechanical test-based process. You don't need
> to look at code until you've reached the first bad commit.
> 
> --
> Chuck Lever
> 
>


* Re: [External] : nfsd: memory leak when client does many file operations
  2024-03-28 22:03                   ` Jan Schunk
@ 2024-03-29  0:25                     ` Chuck Lever III
  2024-03-30 15:26                       ` Aw: " Jan Schunk
  0 siblings, 1 reply; 24+ messages in thread
From: Chuck Lever III @ 2024-03-29  0:25 UTC (permalink / raw)
  To: Jan Schunk, Benjamin Coddington
  Cc: Jeff Layton, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	Linux NFS Mailing List, linux-kernel



> On Mar 28, 2024, at 6:03 PM, Jan Schunk <scpcom@gmx.de> wrote:
> 
> Inside the VM I was not able to reproduce the issue on v6.5.x, so I am concentrating on v6.6.x.
> 
> Current status:
> 
> [bisect status snipped]

Good, keep going.

I've tried replicating the free memory loss here, using the
git regression suite on my nfsd-fixes branch. Taking a
meminfo sample between each of four test runs, the only
clear downward trend I see is:

free:3019839 < start
free:2858438 < after first run
free:2836058 < after second run
free:2822077 < after third run
free:2797143 < after fourth run

All other metrics seem to vary arbitrarily.

The only slightly suspicious slab I see is buffer_head.
/sys/kernel/debug/kmemleak has a single entry in it, not
related to NFSD.
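
(For anyone following along: a kmemleak scan can be forced by hand,
e.g.

  echo scan > /sys/kernel/debug/kmemleak
  cat /sys/kernel/debug/kmemleak

assuming CONFIG_DEBUG_KMEMLEAK is set and debugfs is mounted.)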

At this point I'm kind of suspecting that the issue will
not be related to NFSD or SUNRPC or any particular slab
cache, but will be orphaned whole pages. Your bisect
still seems like the best shot at localizing the
misbehavior.
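
One cheap way to tell those cases apart (just a sketch):

  grep -E 'MemAvailable|^Slab' /proc/meminfo

sampled between test runs; if MemAvailable keeps falling while the
Slab total stays flat, the loss is in whole pages rather than in any
slab cache.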


--
Chuck Lever




* Aw: Re: [External] : nfsd: memory leak when client does many file operations
  2024-03-29  0:25                     ` Chuck Lever III
@ 2024-03-30 15:26                       ` Jan Schunk
  2024-03-30 16:26                         ` Chuck Lever
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Schunk @ 2024-03-30 15:26 UTC (permalink / raw)
  To: Chuck Lever III
  Cc: Benjamin Coddington, Jeff Layton, Neil Brown, Olga Kornievskaia,
	Dai Ngo, Tom Talpey, Linux NFS Mailing List, linux-kernel

Full test result:

$ git bisect start v6.6 v6.5
Bisecting: 7882 revisions left to test after this (roughly 13 steps)
[a1c19328a160c80251868dbd80066dce23d07995] Merge tag 'soc-arm-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
--
$ git bisect good
Bisecting: 3935 revisions left to test after this (roughly 12 steps)
[e4f1b8202fb59c56a3de7642d50326923670513f] Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
--
$ git bisect bad
Bisecting: 2014 revisions left to test after this (roughly 11 steps)
[e0152e7481c6c63764d6ea8ee41af5cf9dfac5e9] Merge tag 'riscv-for-linus-6.6-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
--
$ git bisect bad
Bisecting: 975 revisions left to test after this (roughly 10 steps)
[4a3b1007eeb26b2bb7ae4d734cc8577463325165] Merge tag 'pinctrl-v6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
--
$ git bisect good
Bisecting: 476 revisions left to test after this (roughly 9 steps)
[4debf77169ee459c46ec70e13dc503bc25efd7d2] Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd
--
$ git bisect good
Bisecting: 237 revisions left to test after this (roughly 8 steps)
[e7e9423db459423d3dcb367217553ad9ededadc9] Merge tag 'v6.6-vfs.super.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
--
$ git bisect good
Bisecting: 141 revisions left to test after this (roughly 7 steps)
[8ae5d298ef2005da5454fc1680f983e85d3e1622] Merge tag '6.6-rc-ksmbd-fixes-part1' of git://git.samba.org/ksmbd
--
$ git bisect good
Bisecting: 61 revisions left to test after this (roughly 6 steps)
[99d99825fc075fd24b60cc9cf0fb1e20b9c16b0f] Merge tag 'nfs-for-6.6-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
--
$ git bisect bad
Bisecting: 39 revisions left to test after this (roughly 5 steps)
[7b719e2bf342a59e88b2b6215b98ca4cf824bc58] SUNRPC: change svc_recv() to return void.
--
$ git bisect bad
Bisecting: 19 revisions left to test after this (roughly 4 steps)
[e7421ce71437ec8e4d69cc6bdf35b6853adc5050] NFSD: Rename struct svc_cacherep
--
$ git bisect good
Bisecting: 9 revisions left to test after this (roughly 3 steps)
[baabf59c24145612e4a975f459a5024389f13f5d] SUNRPC: Convert svc_udp_sendto() to use the per-socket bio_vec array
--
$ git bisect bad
Bisecting: 4 revisions left to test after this (roughly 2 steps)
[be2be5f7f4436442d8f6bffbb97a6f438df2896b] lockd: nlm_blocked list race fixes
--
$ git bisect good
Bisecting: 2 revisions left to test after this (roughly 1 step)
[d424797032c6e24b44037e6c7a2d32fd958300f0] nfsd: inherit required unset default acls from effective set
--
$ git bisect good
Bisecting: 0 revisions left to test after this (roughly 1 step)
[e18e157bb5c8c1cd8a9ba25acfdcf4f3035836f4] SUNRPC: Send RPC message on TCP with a single sock_sendmsg() call
--
$ git bisect bad
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[2eb2b93581813b74c7174961126f6ec38eadb5a7] SUNRPC: Convert svc_tcp_sendmsg to use bio_vecs directly
--
$ git bisect good
e18e157bb5c8c1cd8a9ba25acfdcf4f3035836f4 is the first bad commit
commit e18e157bb5c8c1cd8a9ba25acfdcf4f3035836f4
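
The offending change can be inspected with, e.g.:

  git show --stat e18e157bb5c8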

I found the memory loss in /proc/meminfo only in MemAvailable.
 MemTotal:         346948 kB
On a bad test run it looks like this:
-MemAvailable:     210820 kB
+MemAvailable:      26608 kB
On a good test run it looks like this:
-MemAvailable:     215872 kB
+MemAvailable:     221128 kB
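
The numbers above are just a diff of /proc/meminfo taken around a
test run, roughly:

  grep -E 'MemTotal|MemAvailable' /proc/meminfo > meminfo.before
  # ... run the unpack/commit workload on the client ...
  grep -E 'MemTotal|MemAvailable' /proc/meminfo > meminfo.after
  diff -u meminfo.before meminfo.after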


> Sent: Friday, 29.03.2024 at 01:25
> From: "Chuck Lever III" <chuck.lever@oracle.com>
> To: "Jan Schunk" <scpcom@gmx.de>, "Benjamin Coddington" <bcodding@redhat.com>
> Cc: "Jeff Layton" <jlayton@kernel.org>, "Neil Brown" <neilb@suse.de>, "Olga Kornievskaia" <kolga@netapp.com>, "Dai Ngo" <dai.ngo@oracle.com>, "Tom Talpey" <tom@talpey.com>, "Linux NFS Mailing List" <linux-nfs@vger.kernel.org>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
> Subject: Re: [External] : nfsd: memory leak when client does many file operations
> 
> 
> 
> > On Mar 28, 2024, at 6:03 PM, Jan Schunk <scpcom@gmx.de> wrote:
> > 
> > Inside the VM I was not able to reproduce the issue on v6.5.x, so I am concentrating on v6.6.x.
> > 
> > Current status:
> > 
> > [bisect status snipped]
> 
> Good, keep going.
> 
> I've tried replicating the free memory loss here, using the
> git regression suite on my nfsd-fixes branch. Taking a
> meminfo sample between each of four test runs, the only
> clear downward trend I see is:
> 
> free:3019839 < start
> free:2858438 < after first run
> free:2836058 < after second run
> free:2822077 < after third run
> free:2797143 < after fourth run
> 
> All other metrics seem to vary arbitrarily.
> 
> The only slightly suspicious slab I see is buffer_head.
> /sys/kernel/debug/kmemleak has a single entry in it, not
> related to NFSD.
> 
> At this point I'm kind of suspecting that the issue will
> not be related to NFSD or SUNRPC or any particular slab
> cache, but will be orphaned whole pages. Your bisect
> still seems like the best shot at localizing the
> misbehavior.
> 
> 
> --
> Chuck Lever
> 
>


* Re: Re: [External] : nfsd: memory leak when client does many file operations
  2024-03-30 15:26                       ` Aw: " Jan Schunk
@ 2024-03-30 16:26                         ` Chuck Lever
  2024-04-01 14:08                           ` Chuck Lever III
  0 siblings, 1 reply; 24+ messages in thread
From: Chuck Lever @ 2024-03-30 16:26 UTC (permalink / raw)
  To: Jan Schunk
  Cc: Benjamin Coddington, Jeff Layton, Neil Brown, Olga Kornievskaia,
	Dai Ngo, Tom Talpey, Linux NFS Mailing List, linux-kernel,
	David Howells

On Sat, Mar 30, 2024 at 04:26:09PM +0100, Jan Schunk wrote:
> Full test result:
> 
> [bisect log snipped]
> e18e157bb5c8c1cd8a9ba25acfdcf4f3035836f4 is the first bad commit
> commit e18e157bb5c8c1cd8a9ba25acfdcf4f3035836f4

This is a plausible bisect result for this behavior, so nice work.

David (cc'd), can you have a brief look at this? What did we miss?
I'm guessing it's a page reference count issue that might occur
only when the XDR head and tail buffers are in the same page. Or
it might occur if two entries in the XDR page array point to the
same page...?

/me stabs in the darkness


> I found the memory loss in /proc/meminfo only in MemAvailable.
>  MemTotal:         346948 kB
> On a bad test run it looks like this:
> -MemAvailable:     210820 kB
> +MemAvailable:      26608 kB
> On a good test run it looks like this:
> -MemAvailable:     215872 kB
> +MemAvailable:     221128 kB


* Re: [External] : nfsd: memory leak when client does many file operations
  2024-03-30 16:26                         ` Chuck Lever
@ 2024-04-01 14:08                           ` Chuck Lever III
  2024-04-01 17:35                             ` Aw: " Jan Schunk
  0 siblings, 1 reply; 24+ messages in thread
From: Chuck Lever III @ 2024-04-01 14:08 UTC (permalink / raw)
  To: Jan Schunk
  Cc: Benjamin Coddington, Jeff Layton, Neil Brown, Olga Kornievskaia,
	Dai Ngo, Tom Talpey, Linux NFS Mailing List, linux-kernel,
	David Howells, Linux regressions mailing list



> On Mar 30, 2024, at 12:26 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
> 
> On Sat, Mar 30, 2024 at 04:26:09PM +0100, Jan Schunk wrote:
>> Full test result:
>> 
>> [bisect log snipped]
>> e18e157bb5c8c1cd8a9ba25acfdcf4f3035836f4 is the first bad commit
>> commit e18e157bb5c8c1cd8a9ba25acfdcf4f3035836f4
> 
> This is a plausible bisect result for this behavior, so nice work.
> 
> David (cc'd), can you have a brief look at this? What did we miss?
> I'm guessing it's a page reference count issue that might occur
> only when the XDR head and tail buffers are in the same page. Or
> it might occur if two entries in the XDR page array point to the
> same page...?
> 
> /me stabs in the darkness
> 
> 
>> [MemAvailable numbers snipped]

Jan, may I ask one more favor? Since this might take a little
time to run down, can you open a bug report on
bugzilla.kernel.org, and copy in the symptoms and the
bisect results? It will get assigned to Trond, and he can
pass it to me.

The problem looks like it's in how we're using a page_frag_cache
to handle the record marker buffers, but I'm not sure what the
proper solution is yet.

#regzbot ^introduced: e18e157bb5c8

--
Chuck Lever




* Aw: Re: [External] : nfsd: memory leak when client does many file operations
  2024-04-01 14:08                           ` Chuck Lever III
@ 2024-04-01 17:35                             ` Jan Schunk
  2024-04-01 17:51                               ` Chuck Lever III
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Schunk @ 2024-04-01 17:35 UTC (permalink / raw)
  To: Chuck Lever III
  Cc: Benjamin Coddington, Jeff Layton, Neil Brown, Olga Kornievskaia,
	Dai Ngo, Tom Talpey, Linux NFS Mailing List, linux-kernel,
	David Howells, Linux regressions mailing list

Hi,
the bug report is now here:
https://bugzilla.kernel.org/show_bug.cgi?id=218671

PS: I can also confirm that with the latest v6.6.22 and only e18e157bb5c8 reverted, nfsd works without any issue.
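
That is, roughly:

  git checkout v6.6.22
  git revert e18e157bb5c8
  # rebuild, install and boot the kernel, then rerun the workload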

> Sent: Monday, 01.04.2024 at 16:08
> From: "Chuck Lever III" <chuck.lever@oracle.com>
> To: "Jan Schunk" <scpcom@gmx.de>
> Cc: "Benjamin Coddington" <bcodding@redhat.com>, "Jeff Layton" <jlayton@kernel.org>, "Neil Brown" <neilb@suse.de>, "Olga Kornievskaia" <kolga@netapp.com>, "Dai Ngo" <dai.ngo@oracle.com>, "Tom Talpey" <tom@talpey.com>, "Linux NFS Mailing List" <linux-nfs@vger.kernel.org>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>, "David Howells" <dhowells@redhat.com>, "Linux regressions mailing list" <regressions@lists.linux.dev>
> Subject: Re: [External] : nfsd: memory leak when client does many file operations
> 
> 
> 
> > On Mar 30, 2024, at 12:26 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
> > 
> > On Sat, Mar 30, 2024 at 04:26:09PM +0100, Jan Schunk wrote:
> >> Full test result:
> >> 
> >> [bisect log snipped]
> >> e18e157bb5c8c1cd8a9ba25acfdcf4f3035836f4 is the first bad commit
> >> commit e18e157bb5c8c1cd8a9ba25acfdcf4f3035836f4
> > 
> > This is a plausible bisect result for this behavior, so nice work.
> > 
> > David (cc'd), can you have a brief look at this? What did we miss?
> > I'm guessing it's a page reference count issue that might occur
> > only when the XDR head and tail buffers are in the same page. Or
> > it might occur if two entries in the XDR page array point to the
> > same page...?
> > 
> > /me stabs in the darkness
> > 
> > 
> >> [MemAvailable numbers snipped]
> 
> Jan, may I ask one more favor? Since this might take a little
> time to run down, can you open a bug report on
> bugzilla.kernel.org, and copy in the symptoms and the
> bisect results? It will get assigned to Trond, and he can
> pass it to me.
> 
> The problem looks like it's in how we're using a page_frag_cache
> to handle the record marker buffers, but I'm not sure what the
> proper solution is yet.
> 
> #regzbot ^introduced: e18e157bb5c8
> 
> --
> Chuck Lever
> 
>


* Re: [External] : nfsd: memory leak when client does many file operations
  2024-04-01 17:35                             ` Aw: " Jan Schunk
@ 2024-04-01 17:51                               ` Chuck Lever III
  0 siblings, 0 replies; 24+ messages in thread
From: Chuck Lever III @ 2024-04-01 17:51 UTC (permalink / raw)
  To: Jan Schunk
  Cc: Benjamin Coddington, Jeff Layton, Neil Brown, Olga Kornievskaia,
	Dai Ngo, Tom Talpey, Linux NFS Mailing List, linux-kernel,
	David Howells, Linux regressions mailing list



> On Apr 1, 2024, at 1:35 PM, Jan Schunk <scpcom@gmx.de> wrote:
> 
> the bug report is now here:
> https://bugzilla.kernel.org/show_bug.cgi?id=218671

Thanks, I'll try to keep that bug updated.

--
Chuck Lever




Thread overview: 24+ messages
2024-03-24 19:57 nfsd: memory leak when client does many file operations Jan Schunk
2024-03-24 20:14 ` [External] : " Chuck Lever III
2024-03-24 20:48   ` Aw: " Jan Schunk
2024-03-24 21:10     ` Chuck Lever III
2024-03-24 21:39       ` Aw: " Jan Schunk
2024-03-24 22:13         ` Chuck Lever III
2024-03-24 22:54           ` Aw: " Jan Schunk
2024-03-25 19:55           ` Jan Schunk
2024-03-25 20:11             ` Chuck Lever III
2024-03-25 20:26               ` Aw: " Jan Schunk
2024-03-25 20:36                 ` Chuck Lever III
2024-03-26 16:50                   ` Aw: " Jan Schunk
2024-03-28 22:03                   ` Jan Schunk
2024-03-29  0:25                     ` Chuck Lever III
2024-03-30 15:26                       ` Aw: " Jan Schunk
2024-03-30 16:26                         ` Chuck Lever
2024-04-01 14:08                           ` Chuck Lever III
2024-04-01 17:35                             ` Aw: " Jan Schunk
2024-04-01 17:51                               ` Chuck Lever III
2024-03-26 11:15               ` Benjamin Coddington
2024-03-26 17:04                 ` Aw: " Jan Schunk
2024-03-26 17:13                   ` Benjamin Coddington
2024-03-26 17:15                     ` Benjamin Coddington
2024-03-26 19:05                       ` Aw: " Jan Schunk
