* [BUG REPORT] missing memory counter introduced by xfs
From: Lin Feng @ 2016-09-07 10:36 UTC (permalink / raw)
  To: xfs; +Cc: dchinner

Hi all nice xfs folks,

I'm new to xfs and I have run into the same issue as described in the
following thread:
http://oss.sgi.com/archives/xfs/2014-04/msg00058.html

On my box (a cephfs OSD on xfs, kernel 2.6.32-358) I summed up every memory
counter I could find, but nearly 26GB of memory seems to have gone missing;
it comes back after I echo 2 > /proc/sys/vm/drop_caches, so it seems this
memory can be reclaimed by slab.
However, the kernel on this box uses swap instead of reclaiming slab until
I run echo 2 > /proc/sys/vm/drop_caches.

The memory usage below was captured by my shell script; the slabtop and
meminfo output is pasted at the end of this mail.
-----
before echo 1 > /proc/sys/vm/drop_caches
Analysis: all processes rss + buffer+cached + slabs + free, total Rss: 39863308 K
              total       used       free     shared    buffers     cached
Mem:      65963504   58230212    7733292          0      31284    6711912
-/+ buffers/cache:   51487016   14476488
Swap:      8388600          0    8388600

after echo 1 > /proc/sys/vm/drop_caches
Analysis: all processes rss + buffer+cached + slabs + free, total Rss:
39781110 K

free info:
              total       used       free     shared    buffers     cached
Mem:      65963504   51666124   14297380          0       3376      55704
-/+ buffers/cache:   51607044   14356460
Swap:      8388600          0    8388600

after echo 2 > /proc/sys/vm/drop_caches
Analysis: all processes rss + buffer+cached + slabs + free, total Rss:
65259244 K
              total       used       free     shared    buffers     cached
Mem:      65963504   17194480   48769024          0       7948      53216
-/+ buffers/cache:   17133316   48830188
Swap:      8388600          0    8388600
-----
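
For reference, the script is essentially doing this (a simplified sketch,
not the exact script I run):

# sum all process RSS plus buffers, cached, slab and free memory (kB)
rss=$(ps -eo rss= | awk '{ s += $1 } END { print s }')
awk -v rss="$rss" '{ v[$1] = $2 } END { print rss + v["Buffers:"] + v["Cached:"] + v["Slab:"] + v["MemFree:"], "K total" }' /proc/meminfo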


And here is what Dave said in his reply on that thread:
(start of Dave's quoted mail)
-----
On Thu, Apr 10, 2014 at 07:40:44PM -0700, daiguochao wrote:
 > Dear Stan, I can't send email to you.So I leave a message here.I hope not to
 > bother you.
 > Thank you for your kind assistance.
 >
 > In accordance with your suggestion, we executed "echo 3 >
 > /proc/sysm/drop_caches" for trying to release vfs dentries and inodes.
 > Really,
 > our lost memory came back. But we learned that the memory of vfs dentries
 > and inodes is distributed from slab. Please check our system "Slab:  509708
 > kB" from /proc/meminfo, and it seems only be took up 500MB and xfs_buf take
 > up 450MB among.

That's where your memory is - in metadata buffers. The xfs_buf slab
entries are just the handles - the metadata pages in the buffers
usually take much more space and it's not accounted to the slab
cache nor the page cache.

Can you post the output of /proc/slabinfo, and what is the output of
xfs_info on the filesystem in question? Also, a description of your
workload that is resulting in large amounts of cached metadata
buffers but no inodes or dentries would be helpful.
-----
(end of Dave's quoted mail)

After some research, it seems that since this patch (commit 0e6e847ffe37)
xfs_buf_allocate_memory() uses alloc_page() instead of the original
find_or_create_page(). PS: the mainline kernel still uses alloc_page().

So, if my speculation is right, my question is whether there is a way to
find out how much memory the xfs_buf_t->b_pages pages are using, or whether
xfs already exports such a counter to user space.
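
For example, would something like this be a sane way to estimate it from
the xfs_buf slab object count (just a sketch; it assumes 4KiB pages and one
or two pages per buffer)?

# rough estimate of buffer page memory from the slab object count
awk '$1 == "xfs_buf" { printf "%d xfs_buf objects -> roughly %.1f to %.1f GiB of buffer pages\n", $2, $2*4096/2^30, $2*8192/2^30 }' /proc/slabinfo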


thanks in advance.
linfeng

-----
commit 0e6e847ffe37436e331c132639f9f872febce82e
Author: Dave Chinner <dchinner@redhat.com>
Date:   Sat Mar 26 09:16:45 2011 +1100

     xfs: stop using the page cache to back the buffer cache

     Now that the buffer cache has it's own LRU, we do not need to use
     the page cache to provide persistent caching and reclaim
     infrastructure. Convert the buffer cache to use alloc_pages()
     instead of the page cache. This will remove all the overhead of page
     cache management from setup and teardown of the buffers, as well as
     needing to mark pages accessed as we find buffers in the buffer
     cache.
...
-             retry:
-               page = find_or_create_page(mapping, first + i, gfp_mask);
+retry:
+               page = alloc_page(gfp_mask);
-----


slabtop info:
  Active / Total Objects (% used)    : 27396369 / 27446160 (99.8%)
  Active / Total Slabs (% used)      : 2371663 / 2371729 (100.0%)
  Active / Total Caches (% used)     : 112 / 197 (56.9%)
  Active / Total Size (% used)       : 9186047.17K / 9202410.61K (99.8%)
  Minimum / Average / Maximum Object : 0.02K / 0.33K / 4096.00K

   OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
4383036 4383014  99%    1.00K 1095759        4   4383036K xfs_inode
5394610 5394544  99%    0.38K 539461       10   2157844K xfs_buf
6448560 6448451  99%    0.19K 322428       20   1289712K dentry
1083285 1062902  98%    0.55K 154755        7    619020K radix_tree_node
3015600 3015546  99%    0.12K 100520       30    402080K size-128
4379806 4379430  99%    0.06K  74234       59    296936K xfs_ifork
687640 687144  99%    0.19K  34382       20    137528K size-192
1833130 1833089  99%    0.06K  31070       59    124280K size-64
   1060   1059  99%   16.00K   1060        1     16960K size-16384
    196    196 100%   32.12K    196        1     12544K kmem_cache
   4332   4316  99%    2.59K   1444        3     11552K task_struct
  15900  15731  98%    0.62K   2650        6     10600K proc_inode_cache
   8136   7730  95%    1.00K   2034        4      8136K size-1024
   9930   9930 100%    0.58K   1655        6      6620K inode_cache
  20700  14438  69%    0.19K   1035       20      4140K filp
   3704   3691  99%    1.00K    926        4      3704K ext4_inode_cache
  17005  15631  91%    0.20K    895       19      3580K vm_area_struct
  18090  18043  99%    0.14K    670       27      2680K sysfs_dir_cache
   1266   1254  99%    1.94K    633        2      2532K TCP
    885    885 100%    2.06K    295        3      2360K sighand_cache

/proc/meminfo
MemTotal:       65963504 kB
MemFree:        14296540 kB
Buffers:            3380 kB
Cached:            55700 kB
SwapCached:            0 kB
Active:         15717512 kB
Inactive:         306828 kB
Active(anon):   15699604 kB
Inactive(anon):   268724 kB
Active(file):      17908 kB
Inactive(file):    38104 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       8388600 kB
SwapFree:        8388600 kB
Dirty:                72 kB
Writeback:             0 kB
AnonPages:      15966248 kB
Mapped:            33668 kB
Shmem:              3056 kB
Slab:            9521800 kB
SReclaimable:    6314860 kB
SUnreclaim:      3206940 kB
KernelStack:       32680 kB
PageTables:        51504 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    61159400 kB
Committed_AS:   29734944 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      389896 kB
VmallocChunk:   34324818076 kB
HardwareCorrupted:     0 kB
AnonHugePages:    407552 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:        5504 kB
DirectMap2M:     2082816 kB
DirectMap1G:    65011712 kB



* Re: [BUG REPORT] missing memory counter introduced by xfs
From: Dave Chinner @ 2016-09-07 21:22 UTC (permalink / raw)
  To: Lin Feng; +Cc: dchinner, xfs

On Wed, Sep 07, 2016 at 06:36:19PM +0800, Lin Feng wrote:
> Hi all nice xfs folks,
> 
> I'm a rookie and really fresh new in xfs and currently I ran into an
> issue same as the following link described:
> http://oss.sgi.com/archives/xfs/2014-04/msg00058.html
> 
> In my box(running cephfs osd using xfs kernel 2.6.32-358) and I sum
> all possible memory counter can be find but it seems that nearlly
> 26GB memory has gone and they are back after I echo 2 >
> /proc/sys/vm/drop_caches, so seems these memory can be reclaimed by
> slab.

It isn't "reclaimed by slab". The XFS metadata buffer cache is
reclaimed by a memory shrinker, which are for reclaiming objects
from caches that aren't the page cache. "echo 2 >
/proc/sys/vm/drop_caches" runs the memory shrinkers rather than page
cache reclaim. Many slab caches are backed by memory shrinkers,
which is why it is thought that "2" is "slab reclaim"....
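
For reference, the three values map roughly to this (paraphrasing
Documentation/sysctl/vm.txt):

# free page cache only
echo 1 > /proc/sys/vm/drop_caches
# run the shrinkers: dentries, inodes and other shrinker-backed caches,
# which is where the XFS buffer cache gets reclaimed from
echo 2 > /proc/sys/vm/drop_caches
# do both
echo 3 > /proc/sys/vm/drop_caches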

> And according to what David said replying in the list:
..
> That's where your memory is - in metadata buffers. The xfs_buf slab
> entries are just the handles - the metadata pages in the buffers
> usually take much more space and it's not accounted to the slab
> cache nor the page cache.

That's exactly the case.

>  Minimum / Average / Maximum Object : 0.02K / 0.33K / 4096.00K
> 
>   OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
> 4383036 4383014  99%    1.00K 1095759        4   4383036K xfs_inode
> 5394610 5394544  99%    0.38K 539461       10   2157844K xfs_buf

So, you have *5.4 million* active metadata buffers. Each buffer will
hold  1 or 2 4k pages on your kernel, so simple math says 4M * 4k +
1.4M * 8k = 26G. There's no missing counter here....
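
A quick back-of-the-envelope check of that figure (the 4M/1.4M split is
only an estimate from the object counts above, not exact accounting):

echo $(( (4000000 * 4096 + 1400000 * 8192) / 1048576 )) MiB   # prints 26562, i.e. ~26G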

Obviously your workload is doing something extremely metadata
intensive to have a cache footprint like this - you have more cached
buffers than inodes, dentries, etc. That in itself is very unusual -
can you describe what is stored on that filesystem and how large the
attributes being stored in each inode are?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [BUG REPORT] missing memory counter introduced by xfs
From: Lin Feng @ 2016-09-08 10:07 UTC (permalink / raw)
  To: Dave Chinner; +Cc: dchinner, xfs

Hi Dave,

Thank you for your fast reply; please see below.

On 09/08/2016 05:22 AM, Dave Chinner wrote:
> On Wed, Sep 07, 2016 at 06:36:19PM +0800, Lin Feng wrote:
>> Hi all nice xfs folks,
>>
>> I'm a rookie and really fresh new in xfs and currently I ran into an
>> issue same as the following link described:
>> http://oss.sgi.com/archives/xfs/2014-04/msg00058.html
>>
>> In my box(running cephfs osd using xfs kernel 2.6.32-358) and I sum
>> all possible memory counter can be find but it seems that nearlly
>> 26GB memory has gone and they are back after I echo 2 >
>> /proc/sys/vm/drop_caches, so seems these memory can be reclaimed by
>> slab.
>
> It isn't "reclaimed by slab". The XFS metadata buffer cache is
> reclaimed by a memory shrinker, which are for reclaiming objects
> from caches that aren't the page cache. "echo 2 >
> /proc/sys/vm/drop_caches" runs the memory shrinkers rather than page
> cache reclaim. Many slab caches are backed by memory shrinkers,
> which is why it is thought that "2" is "slab reclaim"....
>
>> And according to what David said replying in the list:
> ..
>> That's where your memory is - in metadata buffers. The xfs_buf slab
>> entries are just the handles - the metadata pages in the buffers
>> usually take much more space and it's not accounted to the slab
>> cache nor the page cache.
>
> That's exactly the case.
>
>>   Minimum / Average / Maximum Object : 0.02K / 0.33K / 4096.00K
>>
>>    OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
>> 4383036 4383014  99%    1.00K 1095759        4   4383036K xfs_inode
>> 5394610 5394544  99%    0.38K 539461       10   2157844K xfs_buf
>
> So, you have *5.4 million* active metadata buffers. Each buffer will
> hold  1 or 2 4k pages on your kernel, so simple math says 4M * 4k +
> 1.4M * 8k = 26G. There's no missing counter here....

Do xattrs contribute to such metadata buffers, or is there something else?
I consulted my teammate, who told me that in our case the small files
(there are a lot of them, see below) always carry xattrs.

Another question is whether we need to export such a counter, or whether we
have to redo this computation every time to figure out whether we are
leaking memory.
More importantly, this memory seems to have a low priority for the memory
reclaim mechanism; is that because most of the slab objects are active?
 >>    OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
 >> 4383036 4383014  99%    1.00K 1095759        4   4383036K xfs_inode
 >> 5394610 5394544  99%    0.38K 539461       10   2157844K xfs_buf

In fact xfs eats a lot of my RAM, and I would never have known where it
went without diving into the xfs source; at least I'm only the second
extreme user ;-)

>
> Obviously your workload is doing something extremely metadata
> intensive to have a cache footprint like this - you have more cached
> buffers than inodes, dentries, etc. That in itself is very unusual -
> can you describe what is stored on that filesystem and how large the
> attributes being stored in each inode are?

The filesystem-user behavior here is that the ceph-osd daemon intensively
pulls/synchronizes/updates files from other OSDs when the server comes up.
In our case the cephfs OSD stores a lot of small pictures in the
filesystem. I did some simple analysis: there are nearly 3,000,000 files on
each disk, and there are 10 such disks.
[root@wzdx49 osd.670]# find current -type f -size -512k | wc -l
2668769
[root@wzdx49 ~]# find /data/osd/osd.67 -type f | wc -l
2682891
[root@wzdx49 ~]# find /data/osd/osd.67 -type d | wc -l
109760

thanks,
linfeng


* Re: [BUG REPORT] missing memory counter introduced by xfs
From: Dave Chinner @ 2016-09-08 20:44 UTC (permalink / raw)
  To: Lin Feng; +Cc: dchinner, xfs

On Thu, Sep 08, 2016 at 06:07:45PM +0800, Lin Feng wrote:
> Hi Dave,
> 
> Thank you for your fast reply, look beblow please.
> 
> On 09/08/2016 05:22 AM, Dave Chinner wrote:
> >On Wed, Sep 07, 2016 at 06:36:19PM +0800, Lin Feng wrote:
> >>Hi all nice xfs folks,
> >>
> >>I'm a rookie and really fresh new in xfs and currently I ran into an
> >>issue same as the following link described:
> >>http://oss.sgi.com/archives/xfs/2014-04/msg00058.html
> >>
> >>In my box(running cephfs osd using xfs kernel 2.6.32-358) and I sum
> >>all possible memory counter can be find but it seems that nearlly
> >>26GB memory has gone and they are back after I echo 2 >
> >>/proc/sys/vm/drop_caches, so seems these memory can be reclaimed by
> >>slab.
> >
> >It isn't "reclaimed by slab". The XFS metadata buffer cache is
> >reclaimed by a memory shrinker, which are for reclaiming objects
> >from caches that aren't the page cache. "echo 2 >
> >/proc/sys/vm/drop_caches" runs the memory shrinkers rather than page
> >cache reclaim. Many slab caches are backed by memory shrinkers,
> >which is why it is thought that "2" is "slab reclaim"....
> >
> >>And according to what David said replying in the list:
> >..
> >>That's where your memory is - in metadata buffers. The xfs_buf slab
> >>entries are just the handles - the metadata pages in the buffers
> >>usually take much more space and it's not accounted to the slab
> >>cache nor the page cache.
> >
> >That's exactly the case.
> >
> >>  Minimum / Average / Maximum Object : 0.02K / 0.33K / 4096.00K
> >>
> >>   OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
> >>4383036 4383014  99%    1.00K 1095759        4   4383036K xfs_inode
> >>5394610 5394544  99%    0.38K 539461       10   2157844K xfs_buf
> >
> >So, you have *5.4 million* active metadata buffers. Each buffer will
> >hold  1 or 2 4k pages on your kernel, so simple math says 4M * 4k +
> >1.4M * 8k = 26G. There's no missing counter here....
> 
> Does xattr contribute to such metadata buffers or there is something else?

xattrs are metadata, so if they don't fit inline in the inode
(typical for ceph because it uses xattrs larger than 256 bytes) then
they are held in external blocks, which are cached in the buffer
cache.
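
You can see how much xattr data a given file carries with something like
this (the object file path here is just a placeholder):

# dump all xattrs on one file and count the bytes; anything that does not
# fit in the inode's inline literal area ends up in an external attr
# block, which is what gets cached in the buffer cache
getfattr -d -m - -e hex /data/osd/osd.67/current/<some object file> | wc -c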

> After consulting to my teammate, who told me that in our case small files
> (there are a looot, look below) always use xattr.

Which means that if you have 4.4M cached inodes, you probably have
~4.4M xattr metadata buffers in cache for those inodes, too.

> Another thing is do we need to export such thing or we have to make
> the computation every time to figure out if we leak memory.
> And more important is that seems these memory has a low priority to
> be reclaimed by memory reclaim mechanism, does it due to most of the
> slab objects are active?

"active" slab objects simply mean they are allocated. It does not
mean they are cached or imply anything else about the object's life
cycle.

> >>    OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
> >> 4383036 4383014  99%    1.00K 1095759        4   4383036K xfs_inode
> >> 5394610 5394544  99%    0.38K 539461       10   2157844K xfs_buf
> 
> In fact xfs eats a lot of my ram and I will never know where it goes
> without diving into xfs source, at least I'm the second extreme user
> ;-)
> 
> >
> >Obviously your workload is doing something extremely metadata
> >intensive to have a cache footprint like this - you have more cached
> >buffers than inodes, dentries, etc. That in itself is very unusual -
> >can you describe what is stored on that filesystem and how large the
> >attributes being stored in each inode are?
> 
> The fs-user behavior is that ceph-osd daemon will intensively
> pull/synchronize/update files from other osd when the server is up.
> In our case cephfs osd stores a lot of small pictures in the
> filesystem, and I do some simple analysis, there are nearly
> 3,000,000 files on each disk and there are 10 such disk.
> [root@wzdx49 osd.670]# find current -type f -size -512k | wc -l
> 2668769
> [root@wzdx49 ~]# find /data/osd/osd.67 -type f | wc -l
> 2682891
> [root@wzdx49 ~]# find /data/osd/osd.67 -type d | wc -l
> 109760

Yup, that's a pretty good indication that you have a high metadata
to data ratio in each filesystem, and that ceph is accessing the
metadata more intensively than the data. The fact that the metadata
buffer count roughly matches the cached inode count tells me that
the memory reclaim code is being fairly balanced about what it
reclaims under memory pressure - I think the problem here is more
that you didn't know where the memory was being used than anything
else....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [BUG REPORT] missing memory counter introduced by xfs
From: Lin Feng @ 2016-09-09  6:32 UTC (permalink / raw)
  To: Dave Chinner; +Cc: dchinner, xfs

Hi Dave,

One final concept about XFS is still not clear to me; please see below.

On 09/09/2016 04:44 AM, Dave Chinner wrote:
> On Thu, Sep 08, 2016 at 06:07:45PM +0800, Lin Feng wrote:
>> Hi Dave,
>>
>> Thank you for your fast reply, look beblow please.
>>
>> On 09/08/2016 05:22 AM, Dave Chinner wrote:
>>> On Wed, Sep 07, 2016 at 06:36:19PM +0800, Lin Feng wrote:
>>>> Hi all nice xfs folks,
>>>>
>>>> I'm a rookie and really fresh new in xfs and currently I ran into an
>>>> issue same as the following link described:
>>>> http://oss.sgi.com/archives/xfs/2014-04/msg00058.html
>>>>
>>>> In my box(running cephfs osd using xfs kernel 2.6.32-358) and I sum
>>>> all possible memory counter can be find but it seems that nearlly
>>>> 26GB memory has gone and they are back after I echo 2 >
>>>> /proc/sys/vm/drop_caches, so seems these memory can be reclaimed by
>>>> slab.
>>>
>>> It isn't "reclaimed by slab". The XFS metadata buffer cache is
>>> reclaimed by a memory shrinker, which are for reclaiming objects
>>> from caches that aren't the page cache. "echo 2 >
>>> /proc/sys/vm/drop_caches" runs the memory shrinkers rather than page
>>> cache reclaim. Many slab caches are backed by memory shrinkers,
>>> which is why it is thought that "2" is "slab reclaim"....
>>>
>>>> And according to what David said replying in the list:
>>> ..
>>>> That's where your memory is - in metadata buffers. The xfs_buf slab
>>>> entries are just the handles - the metadata pages in the buffers
>>>> usually take much more space and it's not accounted to the slab
>>>> cache nor the page cache.
>>>
>>> That's exactly the case.
>>>
>>>>   Minimum / Average / Maximum Object : 0.02K / 0.33K / 4096.00K
>>>>
>>>>    OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
>>>> 4383036 4383014  99%    1.00K 1095759        4   4383036K xfs_inode
>>>> 5394610 5394544  99%    0.38K 539461       10   2157844K xfs_buf
>>>
>>> So, you have *5.4 million* active metadata buffers. Each buffer will
>>> hold  1 or 2 4k pages on your kernel, so simple math says 4M * 4k +
>>> 1.4M * 8k = 26G. There's no missing counter here....
>>
>> Does xattr contribute to such metadata buffers or there is something else?
>
> xattrs are metadata, so if they don't fit in line in the inode
> (typical for ceph because it uses xattrs larger than 256 bytes) then
> they are held in external blocks which are cached in the buffer
> cache.
>

So the 'buffer cache' you mean here is the pages handled by the xfs_buf
struct, used to hold the xattrs when the inode's inline data space
overflows, not the 'buff/cache' seen via the free command; those pages
won't show up in free's cache field, right?

>> After consulting to my teammate, who told me that in our case small files
>> (there are a looot, look below) always use xattr.
>
> Which means that if you have 4.4M cached inodes, you probably have
> ~4.4M xattr metadata buffers in cache for those inodes, too.
>
>> Another thing is do we need to export such thing or we have to make
>> the computation every time to figure out if we leak memory.
>> And more important is that seems these memory has a low priority to
>> be reclaimed by memory reclaim mechanism, does it due to most of the
>> slab objects are active?
>
> "active" slab objects simply mean they are allocated. It does not
> mean they are cached or imply anything else about the object's life
> cycle.

Sorry, I misunderstood what 'active' means for slab objects; thanks for your explanation.
>
>>>>     OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
>>>> 4383036 4383014  99%    1.00K 1095759        4   4383036K xfs_inode
>>>> 5394610 5394544  99%    0.38K 539461       10   2157844K xfs_buf
>>
>> In fact xfs eats a lot of my ram and I will never know where it goes
>> without diving into xfs source, at least I'm the second extreme user
>> ;-)
>>
>>>
>>> Obviously your workload is doing something extremely metadata
>>> intensive to have a cache footprint like this - you have more cached
>>> buffers than inodes, dentries, etc. That in itself is very unusual -
>>> can you describe what is stored on that filesystem and how large the
>>> attributes being stored in each inode are?
>>
>> The fs-user behavior is that ceph-osd daemon will intensively
>> pull/synchronize/update files from other osd when the server is up.
>> In our case cephfs osd stores a lot of small pictures in the
>> filesystem, and I do some simple analysis, there are nearly
>> 3,000,000 files on each disk and there are 10 such disk.
>> [root@wzdx49 osd.670]# find current -type f -size -512k | wc -l
>> 2668769
>> [root@wzdx49 ~]# find /data/osd/osd.67 -type f | wc -l
>> 2682891
>> [root@wzdx49 ~]# find /data/osd/osd.67 -type d | wc -l
>> 109760
>
> Yup, that's a pretty good indication that you have a high metadata
> to data ratio in each filesystem, and that ceph is accessing the
> metadata more intensively than the data. The fact that the metadata
> buffer count roughly matches the cached inode count tells me that
> the memory reclaim code is being fairly balanced about what it
> reclaims under memory pressure - I think the problem here is more
> that you didn't know where the memory was being used than anything
> else....

Yes, that's exactly why I sent this mail.
Again, thanks for your detailed explanation.

Best regards,
linfeng


* Re: [BUG REPORT] missing memory counter introduced by xfs
From: Dave Chinner @ 2016-09-09 23:13 UTC (permalink / raw)
  To: Lin Feng; +Cc: xfs

On Fri, Sep 09, 2016 at 02:32:18PM +0800, Lin Feng wrote:
> Hi Dave,
> 
> A final not-clear concept about XFS, look beblow please.
> 
> On 09/09/2016 04:44 AM, Dave Chinner wrote:
> >On Thu, Sep 08, 2016 at 06:07:45PM +0800, Lin Feng wrote:
....
> >>>So, you have *5.4 million* active metadata buffers. Each buffer will
> >>>hold  1 or 2 4k pages on your kernel, so simple math says 4M * 4k +
> >>>1.4M * 8k = 26G. There's no missing counter here....
> >>
> >>Does xattr contribute to such metadata buffers or there is something else?
> >
> >xattrs are metadata, so if they don't fit in line in the inode
> >(typical for ceph because it uses xattrs larger than 256 bytes) then
> >they are held in external blocks which are cached in the buffer
> >cache.
> >
> 
> So the 'buffer cache' here you mean is the pages handled by xfs_buf
> struct,

Yes.

> used to hold the xattrs if the inode inline data space
> overflows,

And all other cached metadata that is accessed via struct xfs_buf.

> not the 'beffer/cache' seen via free command, they won't
> reflect in cache field by free command, right?

Correct. From the "free" man page:

	buffers
	     Memory used by kernel buffers (Buffers in /proc/meminfo)

	cache
	     Memory used by the page cache and slabs (Cached and
	     Slab in /proc/meminfo)

	buff/cache
	     Sum of buffers and cache


So, "Buffers" is the amount of cached block device pages - this is
always zero for XFS filesystems as we don't use the block device
page cache at all (IIRC, that's where ext4 caches its metadata).
"cache" is obvious, but it does not include memory attached to slab
objects. Hence it will account for struct xfs_buf memory usage, but
not the pages attached to each xfs_buf....
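
If you want to see that number from the counters you already have, a rough
check looks like this (just a sketch; it subtracts the major meminfo
consumers, so vmalloc and other direct page allocations also land in the
remainder):

awk '{ v[$1] = $2 } END { print v["MemTotal:"] - v["MemFree:"] - v["Buffers:"] \
      - v["Cached:"] - v["Slab:"] - v["AnonPages:"] - v["PageTables:"] \
      - v["KernelStack:"], "kB not accounted anywhere" }' /proc/meminfo
# on the meminfo you posted this comes out around 26 million kB, i.e.
# roughly the ~26GB of buffer pages estimated above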

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

