* How to sample memory usage cheaply? @ 2017-03-30 20:04 Benjamin King
From: Benjamin King @ 2017-03-30 20:04 UTC
To: linux-perf-users

Hi,

I'd like to get a big picture of where a memory-hogging process uses
physical memory. I'm interested in call graphs, but in terms of Brendan's
treatise (http://www.brendangregg.com/FlameGraphs/memoryflamegraphs.html),
I'd love to analyze page faults a bit before continuing with the more
expensive tracing of malloc and friends.

My problem is that we mmap some read-only files more than once to save
memory, but each individual mapping creates page faults. This makes sense,
but how can I measure physical memory properly, then? Parsing the Pss rows
in /proc/<pid>/smaps does work, but seems a bit clumsy. Is there a better
way (e.g. with call stacks) to measure physical memory growth for a
process?

cheers,
  Benjamin

PS: Here is some measurement with a leaky toy program. It uses a 1GB
zero-filled file and drops file system caches prior to the measurement to
encourage major page faults for the first mapping only. It does not work at
all:
-----
$ gcc -O0 mmap_faults.c
$ fallocate -z -l $((1<<30)) 1gb_of_garbage.dat
$ sudo sysctl -w vm.drop_caches=3
vm.drop_caches = 3
$ perf stat -eminor-faults,major-faults ./a.out

 Performance counter stats for './a.out':

           327,726      minor-faults
                 1      major-faults

$ cat mmap_faults.c
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#define numMaps 20
#define length (1u<<30)
#define path "1gb_of_garbage.dat"

int main()
{
  int sum = 0;
  for ( int j = 0; j < numMaps; ++j )
  {
    const char *result =
      (const char*)mmap( NULL, length, PROT_READ, MAP_PRIVATE,
                         open( path, O_RDONLY ), 0 );

    for ( int i = 0; i < length; i += 4096 )
      sum += result[ i ];
  }
  return sum;
}
-----
Shouldn't I see ~5 million page faults (20GB/4K)?
Shouldn't I see more major page faults?
Same thing when the garbage file is filled from /dev/urandom.
Even weirder when MAP_POPULATE'ing the file.
* Re: How to sample memory usage cheaply?
From: Milian Wolff @ 2017-03-31 12:06 UTC
To: Benjamin King; +Cc: linux-perf-users

On Thursday, 30 March 2017 22:04:04 CEST Benjamin King wrote:
> Hi,
>
> I'd like to get a big picture of where a memory-hogging process uses
> physical memory. I'm interested in call graphs, but in terms of Brendan's
> treatise (http://www.brendangregg.com/FlameGraphs/memoryflamegraphs.html),
> I'd love to analyze page faults a bit before continuing with the more
> expensive tracing of malloc and friends.

I suggest you try out my heaptrack tool. While "expensive", it does work
quite well even for applications that allocate a lot of heap memory. It's
super easy to use, and if it works you don't need to venture into low-level
profiling like you are describing below.

https://www.kdab.com/heaptrack-v1-0-0-release/

If you are actually using low-level stuff directly, i.e. have a custom
memory pool implementation, heaptrack also offers an API similar to
Valgrind's such that you can annotate your custom heap allocations.

> My problem is that we mmap some read-only files more than once to save
> memory, but each individual mapping creates page faults.
>
> This makes sense, but how can I measure physical memory properly, then?
> Parsing the Pss rows in /proc/<pid>/smaps does work, but seems a bit
> clumsy. Is there a better way (e.g. with call stacks) to measure physical
> memory growth for a process?

From what I understand, wouldn't you get this by tracing sbrk + mmap with
call stacks? Do note that file-backed mmaps can be shared across processes,
so you'll have to take that into account as done for Pss.
But in my experience, when you want to improve a single application's
memory consumption, it usually boils down to non-shared heap memory anyway,
i.e. sbrk and anon mmaps. And if you look at the call stacks for these
syscalls, they usually point at the obvious places (i.e. mempools), but you
won't see what is _actually_ using the memory of these pools. Heaptrack or
massif is much better in that regard.

Hope that helps, happy profiling!

> PS:
> Here is some measurement with a leaky toy program. It uses a 1GB
> zero-filled file and drops file system caches prior to the measurement to
> encourage major page faults for the first mapping only. It does not work
> at all:
> -----
> $ gcc -O0 mmap_faults.c
> $ fallocate -z -l $((1<<30)) 1gb_of_garbage.dat
> $ sudo sysctl -w vm.drop_caches=3
> vm.drop_caches = 3
> $ perf stat -eminor-faults,major-faults ./a.out
>
>  Performance counter stats for './a.out':
>
>            327,726      minor-faults
>                  1      major-faults
>
> $ cat mmap_faults.c
> #include <fcntl.h>
> #include <sys/mman.h>
> #include <unistd.h>
>
> #define numMaps 20
> #define length (1u<<30)
> #define path "1gb_of_garbage.dat"
>
> int main()
> {
>   int sum = 0;
>   for ( int j = 0; j < numMaps; ++j )
>   {
>     const char *result =
>       (const char*)mmap( NULL, length, PROT_READ, MAP_PRIVATE,
>                          open( path, O_RDONLY ), 0 );
>
>     for ( int i = 0; i < length; i += 4096 )
>       sum += result[ i ];
>   }
>   return sum;
> }
> -----
> Shouldn't I see ~5 million page faults (20GB/4K)?
> Shouldn't I see more major page faults?
> Same thing when the garbage file is filled from /dev/urandom.
> Even weirder when MAP_POPULATE'ing the file.

--
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts
* Re: How to sample memory usage cheaply?
From: Benjamin King @ 2017-04-01 7:41 UTC
To: Milian Wolff; +Cc: linux-perf-users

Hi!

On Fri, Mar 31, 2017 at 02:06:28PM +0200, Milian Wolff wrote:
>> I'd love to analyze page faults a bit before continuing with the more
>> expensive tracing of malloc and friends.
>
> I suggest you try out my heaptrack tool.

I did! Also the heap profiler from google perftools and a homegrown
approach using uprobes on malloc, free, mmap and munmap. My bcc-fu is too
weak to do me any good right now. The workload that I am working with takes
a few hours to generate, however, so more overhead means a lot of latency
before I can observe the effect of changes that I make.

I guess the issue I am facing is a bit more fundamental, since I don't have
a good way to trace/sample instances where my process is using a page of
physical memory *for the first time*. Bonus points to detect when the use
count on behalf of my process drops to zero. I'd like to tell these events
apart from instances where I am using the same physical page from more than
one virtual page. It feels like there should be a way to do so, but I don't
know how.

Also, the 'perf stat' output that I sent does not make much sense to me
right now. For example, when I add MAP_POPULATE to the flags for mmap, I
only see ~40 minor page faults, which I do not understand at all.

>> This makes sense, but how can I measure physical memory properly, then?
>> Parsing the Pss rows in /proc/<pid>/smaps does work, but seems a bit
>> clumsy. Is there a better way (e.g. with call stacks) to measure physical
>> memory growth for a process?
>
> From what I understand, wouldn't you get this by tracing sbrk + mmap with
> call stacks?

No, not quite, since different threads in my process mmap the same file.
This counts them twice in terms of virtual memory, but I'm interested in
the load on physical memory.

> Do note that file-backed mmaps can be shared across processes, so you'll
> have to take that into account as done for Pss. But in my experience,
> when you want to improve a single application's memory consumption, it
> usually boils down to non-shared heap memory anyway, i.e. sbrk and anon
> mmaps.

True, and that's what I'm after, eventually, but the multiple mappings skew
the picture a bit right now. To make progress, I first need to figure out
how to properly measure physical memory usage.

> But if you look at the call stacks for these syscalls, they usually point
> at the obvious places (i.e. mempools), but you won't see what is
> _actually_ using the memory of these pools. Heaptrack or massif is much
> better in that regard.
>
> Hope that helps, happy profiling!

Thanks,
  Benjamin

>> PS:
>> Here is some measurement with a leaky toy program. It uses a 1GB
>> zero-filled file and drops file system caches prior to the measurement
>> to encourage major page faults for the first mapping only. It does not
>> work at all:
>> -----
>> $ gcc -O0 mmap_faults.c
>> $ fallocate -z -l $((1<<30)) 1gb_of_garbage.dat
>> $ sudo sysctl -w vm.drop_caches=3
>> vm.drop_caches = 3
>> $ perf stat -eminor-faults,major-faults ./a.out
>>
>>  Performance counter stats for './a.out':
>>
>>            327,726      minor-faults
>>                  1      major-faults
>>
>> $ cat mmap_faults.c
>> #include <fcntl.h>
>> #include <sys/mman.h>
>> #include <unistd.h>
>>
>> #define numMaps 20
>> #define length (1u<<30)
>> #define path "1gb_of_garbage.dat"
>>
>> int main()
>> {
>>   int sum = 0;
>>   for ( int j = 0; j < numMaps; ++j )
>>   {
>>     const char *result =
>>       (const char*)mmap( NULL, length, PROT_READ, MAP_PRIVATE,
>>                          open( path, O_RDONLY ), 0 );
>>
>>     for ( int i = 0; i < length; i += 4096 )
>>       sum += result[ i ];
>>   }
>>   return sum;
>> }
>> -----
>> Shouldn't I see ~5 million page faults (20GB/4K)?
>> Shouldn't I see more major page faults?
>> Same thing when the garbage file is filled from /dev/urandom.
>> Even weirder when MAP_POPULATE'ing the file.
* Re: How to sample memory usage cheaply?
From: Vince Weaver @ 2017-04-01 13:54 UTC
To: Benjamin King; +Cc: Milian Wolff, linux-perf-users

On Sat, 1 Apr 2017, Benjamin King wrote:

> Also, the 'perf stat' output that I sent does not make much sense to me
> right now. For example, when I add MAP_POPULATE to the flags for mmap, I
> only see ~40 minor page faults, which I do not understand at all.

Have you tried accessing your file in random order? I think the kernel is
likely doing some sort of readahead/prefetching. I think you can change
that behavior with the madvise() syscall.

Vince
* Re: How to sample memory usage cheaply?
From: Benjamin King @ 2017-04-01 16:27 UTC
To: Vince Weaver; +Cc: Milian Wolff, linux-perf-users

Hi Vince,

>> Also, the 'perf stat' output that I sent does not make much sense to me
>> right now. For example, when I add MAP_POPULATE to the flags for mmap, I
>> only see ~40 minor page faults, which I do not understand at all.
>
> Have you tried accessing your file in random order? I think the kernel is
> likely doing some sort of readahead/prefetching. I think you can change
> that behavior with the madvise() syscall.

No, I did not try that, just reading in every 4096th byte. I did assume
that any way of populating the page table of my process would show up as a
page fault. I guess I have to read a bit more on that...

Cheers,
  Benjamin
* Re: How to sample memory usage cheaply?
From: Benjamin King @ 2017-04-03 19:09 UTC
To: linux-perf-users

On Thu, Mar 30, 2017 at 10:04:04PM +0200, Benjamin King wrote:

Hi,

I learned a bit more about observing memory with perf. If this is not the
right place to discuss this any more, just tell me to shut up :-)

Wrapping this up a bit:

> I'd like to get a big picture of where a memory-hogging process uses
> physical memory. I'm interested in call graphs, [...] I'd love to
> analyze page faults

I have learned that the first use of a physical page is called "page
allocation", which can be traced via the event kmem:mm_page_alloc. This is
the physical analogue of, and different from, the "page faults" that happen
in virtual memory.

Mapping a file with MAP_POPULATE after dropping filesystem caches
(sysctl -w vm.drop_caches=3) will show the right number in
kmem:mm_page_alloc, namely size/4K (4K is the page size on my system). If I
do the same again without dropping caches in between, mm_page_alloc does
not show the same number, but rather the number of pages it takes to hold
the page table entries.

This is nice and fairly complete, but I still hope to find a way to observe
when a page from the filesystem cache is referenced for the first time by
my process. This would allow me to do without the cache dropping.

Page faults from virtual memory are more opaque to me. They only seem to be
counted when the system did not prepare the process via prefetching. For
example, MAP_POPULATE'd mappings will not count towards page faults,
neither minor nor major ones.
To control some of the prefetching, there is a debugfs knob called
/sys/kernel/debug/fault_around_bytes, but reducing this to the minimum on
my machine does not produce a page fault number that I could easily
explain, at least not in the MAP_POPULATE case. It might work better when
actually reading data from the mapped file.

Anticipating page faults and preventing them proactively is a nice service
from the OS, but I would be delighted if there was a way to trace this as
well, similar to how mm_page_alloc will count each and every physical
allocation. This would make page faults more useful as a poor man's memory
tracker.

Cheers,
  Benjamin