From: Wu Fengguang <fengguang.wu@intel.com> To: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Andi Kleen <andi@firstfloor.org>, Andrew Morton <akpm@linux-foundation.org>, LKML <linux-kernel@vger.kernel.org>, "linux-mm@kvack.org" <linux-mm@kvack.org> Subject: [RFC][PATCH] proc: export more page flags in /proc/kpageflags (take 3) Date: Thu, 23 Apr 2009 10:26:25 +0800 [thread overview] Message-ID: <20090423022625.GA8822@localhost> (raw) In-Reply-To: <20090416111108.AC55.A69D9226@jp.fujitsu.com> Andi and KOSAKI: can we hopefully reach harmony of opinions on this version? Export 9 page flags in /proc/kpageflags, and 8 more for kernel developers. 1) for kernel hackers (on CONFIG_DEBUG_KERNEL) - all available page flags are exported, and - exported as is 2) for admins and end users - only the more `well known' flags are exported: 11. KPF_MMAP (pseudo flag) memory mapped page 12. KPF_ANON (pseudo flag) memory mapped page (anonymous) 13. KPF_SWAPCACHE page is in swap cache 14. KPF_SWAPBACKED page is swap/RAM backed 15. KPF_COMPOUND_HEAD (*) 16. KPF_COMPOUND_TAIL (*) 17. KPF_UNEVICTABLE page is in the unevictable LRU list 18. KPF_POISON hardware detected corruption 19. KPF_NOPAGE (pseudo flag) no page frame at the address (*) For compound pages, exporting _both_ head/tail info enables users to tell where a compound page starts/ends, and its order. - limit flags to their typical usage scenario, as indicated by KOSAKI: - LRU pages: only export relevant flags - PG_lru - PG_unevictable - PG_active - PG_referenced - page_mapped() - PageAnon() - PG_swapcache - PG_swapbacked - PG_reclaim - no-IO pages: mask out irrelevant flags - PG_dirty - PG_uptodate - PG_writeback - SLAB pages: mask out overloaded flags: - PG_error - PG_active - PG_private - PG_reclaim: filter out the overloaded PG_readahead Note that compound page flags are exported faithfully to end user. This risks exposing internal implementation details of the SLUB allocator, however hiding it risks larger impacts: - admins may wonder where all the compound pages gone - the use of compound pages in SLUB might have some real world relevance, so that end users want to be aware of this behavior - admins may be confused on inconsistent number of head/tail segments This is because SLUB only marks PG_slab on the compound head page. If we mask out PG_head|PG_tail for PG_slab pages, we are actually only masking out PG_head flags. Therefore the PG_tail segments will outnumber PG_head ones, which puzzled me for some time.. Here are the admin/linus views of all page flags on a newly booted nfs-root system: # ./page-types # for admin flags page-count MB symbolic-flags long-symbolic-flags 0x000000000000 491449 1919 ____________________________ 0x000000008000 15 0 _______________H____________ compound_head 0x000000010000 4280 16 ________________T___________ compound_tail 0x000000000008 17 0 ___U________________________ uptodate 0x000000008010 1 0 ____D__________H____________ dirty,compound_head 0x000000010010 4 0 ____D___________T___________ dirty,compound_tail 0x000000000020 1 0 _____l______________________ lru 0x000000000028 2678 10 ___U_l______________________ uptodate,lru 0x00000000002c 5244 20 __RU_l______________________ referenced,uptodate,lru 0x000000004060 1 0 _____lA_______b_____________ lru,active,swapbacked 0x000000004064 13 0 __R__lA_______b_____________ referenced,lru,active,swapbacked 0x000000000068 236 0 ___U_lA_____________________ uptodate,lru,active 0x00000000006c 927 3 __RU_lA_____________________ referenced,uptodate,lru,active 0x000000008080 968 3 _______S_______H____________ slab,compound_head 0x000000000080 1539 6 _______S____________________ slab 0x000000000400 516 2 __________B_________________ buddy 0x000000000828 1142 4 ___U_l_____M________________ uptodate,lru,mmap 0x00000000082c 280 1 __RU_l_____M________________ referenced,uptodate,lru,mmap 0x000000004860 2 0 _____lA____M__b_____________ lru,active,mmap,swapbacked 0x000000000868 366 1 ___U_lA____M________________ uptodate,lru,active,mmap 0x00000000086c 623 2 __RU_lA____M________________ referenced,uptodate,lru,active,mmap 0x000000005868 3639 14 ___U_lA____Ma_b_____________ uptodate,lru,active,mmap,anonymous,swapbacked 0x00000000586c 27 0 __RU_lA____Ma_b_____________ referenced,uptodate,lru,active,mmap,anonymous,swapbacked total 513968 2007 # ./page-types # for linus, when CONFIG_DEBUG_KERNEL is turned on flags page-count MB symbolic-flags long-symbolic-flags 0x000000000000 471731 1842 ____________________________ 0x000100000000 19258 75 ____________________r_______ reserved 0x000000008000 15 0 _______________H____________ compound_head 0x000000010000 4270 16 ________________T___________ compound_tail 0x000000000008 3 0 ___U________________________ uptodate 0x000000008014 1 0 __R_D__________H____________ referenced,dirty,compound_head 0x000000010014 4 0 __R_D___________T___________ referenced,dirty,compound_tail 0x000000000020 1 0 _____l______________________ lru 0x000000000028 2626 10 ___U_l______________________ uptodate,lru 0x00000000002c 5244 20 __RU_l______________________ referenced,uptodate,lru 0x000000000068 238 0 ___U_lA_____________________ uptodate,lru,active 0x00000000006c 925 3 __RU_lA_____________________ referenced,uptodate,lru,active 0x000000004078 1 0 ___UDlA_______b_____________ uptodate,dirty,lru,active,swapbacked 0x00000000407c 13 0 __RUDlA_______b_____________ referenced,uptodate,dirty,lru,active,swapbacked 0x000000000228 49 0 ___U_l___I__________________ uptodate,lru,reclaim 0x000000000400 523 2 __________B_________________ buddy 0x000000000804 1 0 __R________M________________ referenced,mmap 0x00000000080c 1 0 __RU_______M________________ referenced,uptodate,mmap 0x000000000828 1142 4 ___U_l_____M________________ uptodate,lru,mmap 0x00000000082c 280 1 __RU_l_____M________________ referenced,uptodate,lru,mmap 0x000000000868 366 1 ___U_lA____M________________ uptodate,lru,active,mmap 0x00000000086c 622 2 __RU_lA____M________________ referenced,uptodate,lru,active,mmap 0x000000004878 2 0 ___UDlA____M__b_____________ uptodate,dirty,lru,active,mmap,swapbacked 0x000000008880 907 3 _______S___M___H____________ slab,mmap,compound_head 0x000000000880 1488 5 _______S___M________________ slab,mmap 0x0000000088c0 59 0 ______AS___M___H____________ active,slab,mmap,compound_head 0x0000000008c0 49 0 ______AS___M________________ active,slab,mmap 0x000000001000 465 1 ____________a_______________ anonymous 0x000000005008 8 0 ___U________a_b_____________ uptodate,anonymous,swapbacked 0x000000005808 4 0 ___U_______Ma_b_____________ uptodate,mmap,anonymous,swapbacked 0x00000000580c 1 0 __RU_______Ma_b_____________ referenced,uptodate,mmap,anonymous,swapbacked 0x000000005868 3645 14 ___U_lA____Ma_b_____________ uptodate,lru,active,mmap,anonymous,swapbacked 0x00000000586c 26 0 __RU_lA____Ma_b_____________ referenced,uptodate,lru,active,mmap,anonymous,swapbacked total 513968 2007 Kudos to KOSAKI and Andi for the extensive recommendations! Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> --- Documentation/vm/pagemap.txt | 65 ++++++++++ fs/proc/page.c | 197 +++++++++++++++++++++++++++------ 2 files changed, 227 insertions(+), 35 deletions(-) --- mm.orig/fs/proc/page.c +++ mm/fs/proc/page.c @@ -6,6 +6,7 @@ #include <linux/mmzone.h> #include <linux/proc_fs.h> #include <linux/seq_file.h> +#include <linux/backing-dev.h> #include <asm/uaccess.h> #include "internal.h" @@ -68,19 +69,167 @@ static const struct file_operations proc /* These macros are used to decouple internal flags from exported ones */ -#define KPF_LOCKED 0 -#define KPF_ERROR 1 -#define KPF_REFERENCED 2 -#define KPF_UPTODATE 3 -#define KPF_DIRTY 4 -#define KPF_LRU 5 -#define KPF_ACTIVE 6 -#define KPF_SLAB 7 -#define KPF_WRITEBACK 8 -#define KPF_RECLAIM 9 -#define KPF_BUDDY 10 +#define KPF_LOCKED 0 +#define KPF_ERROR 1 +#define KPF_REFERENCED 2 +#define KPF_UPTODATE 3 +#define KPF_DIRTY 4 +#define KPF_LRU 5 +#define KPF_ACTIVE 6 +#define KPF_SLAB 7 +#define KPF_WRITEBACK 8 +#define KPF_RECLAIM 9 +#define KPF_BUDDY 10 + +/* new additions in 2.6.31 */ +#define KPF_MMAP 11 +#define KPF_ANON 12 +#define KPF_SWAPCACHE 13 +#define KPF_SWAPBACKED 14 +#define KPF_COMPOUND_HEAD 15 +#define KPF_COMPOUND_TAIL 16 +#define KPF_UNEVICTABLE 17 +#define KPF_POISON 18 +#define KPF_NOPAGE 19 + +/* kernel hacking assistances */ +#define KPF_RESERVED 32 +#define KPF_MLOCKED 33 +#define KPF_MAPPEDTODISK 34 +#define KPF_PRIVATE 35 +#define KPF_PRIVATE2 36 +#define KPF_OWNER_PRIVATE 37 +#define KPF_ARCH 38 +#define KPF_UNCACHED 39 + +/* + * Kernel flags are exported faithfully to Linus and his fellow hackers. + * Otherwise some details are masked to avoid confusing the end user: + * - some kernel flags are completely invisible + * - some kernel flags are conditionally invisible on their odd usages + */ +#ifdef CONFIG_DEBUG_KERNEL +static inline int genuine_linus(void) { return 1; } +#else +static inline int genuine_linus(void) { return 0; } +#endif + +#define kpf_copy_bit(uflags, kflags, visible, ubit, kbit) \ + do { \ + if (visible || genuine_linus()) \ + uflags |= ((kflags >> kbit) & 1) << ubit; \ + } while (0); + +/* a helper function _not_ intended for more general uses */ +static inline int page_cap_writeback_dirty(struct page *page) +{ + struct address_space *mapping = NULL; + + if (!PageSlab(page)) + mapping = page_mapping(page); + + return !mapping || mapping_cap_writeback_dirty(mapping); +} -#define kpf_copy_bit(flags, dstpos, srcpos) (((flags >> srcpos) & 1) << dstpos) +static u64 get_uflags(struct page *page) +{ + u64 k; + u64 u; + int io; + int lru; + int slab; + + /* + * pseudo flag: KPF_NOPAGE + * it differentiates a memory hole from a page with no flags + */ + if (!page) + return 1 << KPF_NOPAGE; + + k = page->flags; + u = 0; + + io = page_cap_writeback_dirty(page); + lru = k & (1 << PG_lru); + slab = k & (1 << PG_slab); + + /* + * pseudo flags for the well known (anonymous) memory mapped pages + */ + if (lru || genuine_linus()) { + if (page_mapped(page)) + u |= 1 << KPF_MMAP; + if (PageAnon(page)) + u |= 1 << KPF_ANON; + } + + /* + * compound pages: export both head/tail info + * they together define a compound page's start/end pos and order + */ + if (PageHead(page)) + u |= 1 << KPF_COMPOUND_HEAD; + if (PageTail(page)) + u |= 1 << KPF_COMPOUND_TAIL; + + kpf_copy_bit(u, k, 1, KPF_LOCKED, PG_locked); + + kpf_copy_bit(u, k, 1, KPF_SLAB, PG_slab); + kpf_copy_bit(u, k, 1, KPF_BUDDY, PG_buddy); + + kpf_copy_bit(u, k, io, KPF_ERROR, PG_error); + kpf_copy_bit(u, k, io, KPF_DIRTY, PG_dirty); + kpf_copy_bit(u, k, io, KPF_UPTODATE, PG_uptodate); + kpf_copy_bit(u, k, io, KPF_WRITEBACK, PG_writeback); + + kpf_copy_bit(u, k, 1, KPF_LRU, PG_lru); + kpf_copy_bit(u, k, lru, KPF_REFERENCED, PG_referenced); + kpf_copy_bit(u, k, lru, KPF_ACTIVE, PG_active); + kpf_copy_bit(u, k, lru, KPF_RECLAIM, PG_reclaim); + + kpf_copy_bit(u, k, lru, KPF_SWAPCACHE, PG_swapcache); + kpf_copy_bit(u, k, lru, KPF_SWAPBACKED, PG_swapbacked); + +#ifdef CONFIG_MEMORY_FAILURE + kpf_copy_bit(u, k, 1, KPF_POISON, PG_poison); +#endif + +#ifdef CONFIG_UNEVICTABLE_LRU + kpf_copy_bit(u, k, lru, KPF_UNEVICTABLE, PG_unevictable); + kpf_copy_bit(u, k, 0, KPF_MLOCKED, PG_mlocked); +#endif + + kpf_copy_bit(u, k, 0, KPF_RESERVED, PG_reserved); + kpf_copy_bit(u, k, 0, KPF_MAPPEDTODISK, PG_mappedtodisk); + kpf_copy_bit(u, k, 0, KPF_PRIVATE, PG_private); + kpf_copy_bit(u, k, 0, KPF_PRIVATE2, PG_private_2); + kpf_copy_bit(u, k, 0, KPF_OWNER_PRIVATE, PG_owner_priv_1); + kpf_copy_bit(u, k, 0, KPF_ARCH, PG_arch_1); + +#ifdef CONFIG_IA64_UNCACHED_ALLOCATOR + kpf_copy_bit(u, k, 0, KPF_UNCACHED, PG_uncached); +#endif + + if (!genuine_linus()) { + /* + * SLAB/SLOB/SLUB overload some page flags which may confuse end user + */ + if (slab) { + u &= ~ ((1 << KPF_ACTIVE) | + (1 << KPF_ERROR) | + (1 << KPF_MMAP)); + } + /* + * PG_reclaim could be overloaded as PG_readahead, + * and we only want to export the first one. + */ + if ((u & ((1 << KPF_RECLAIM) | (1 << KPF_WRITEBACK))) == + (1 << KPF_RECLAIM)) + u &= ~ (1 << KPF_RECLAIM); + } + + return u; +}; static ssize_t kpageflags_read(struct file *file, char __user *buf, size_t count, loff_t *ppos) @@ -90,7 +239,6 @@ static ssize_t kpageflags_read(struct fi unsigned long src = *ppos; unsigned long pfn; ssize_t ret = 0; - u64 kflags, uflags; pfn = src / KPMSIZE; count = min_t(unsigned long, count, (max_pfn * KPMSIZE) - src); @@ -98,32 +246,17 @@ static ssize_t kpageflags_read(struct fi return -EINVAL; while (count > 0) { - ppage = NULL; if (pfn_valid(pfn)) ppage = pfn_to_page(pfn); - pfn++; - if (!ppage) - kflags = 0; else - kflags = ppage->flags; - - uflags = kpf_copy_bit(kflags, KPF_LOCKED, PG_locked) | - kpf_copy_bit(kflags, KPF_ERROR, PG_error) | - kpf_copy_bit(kflags, KPF_REFERENCED, PG_referenced) | - kpf_copy_bit(kflags, KPF_UPTODATE, PG_uptodate) | - kpf_copy_bit(kflags, KPF_DIRTY, PG_dirty) | - kpf_copy_bit(kflags, KPF_LRU, PG_lru) | - kpf_copy_bit(kflags, KPF_ACTIVE, PG_active) | - kpf_copy_bit(kflags, KPF_SLAB, PG_slab) | - kpf_copy_bit(kflags, KPF_WRITEBACK, PG_writeback) | - kpf_copy_bit(kflags, KPF_RECLAIM, PG_reclaim) | - kpf_copy_bit(kflags, KPF_BUDDY, PG_buddy); + ppage = NULL; - if (put_user(uflags, out++)) { + if (put_user(get_uflags(ppage), out)) { ret = -EFAULT; break; } - + out++; + pfn++; count -= KPMSIZE; } --- mm.orig/Documentation/vm/pagemap.txt +++ mm/Documentation/vm/pagemap.txt @@ -12,9 +12,9 @@ There are three components to pagemap: value for each virtual page, containing the following data (from fs/proc/task_mmu.c, above pagemap_read): - * Bits 0-55 page frame number (PFN) if present + * Bits 0-54 page frame number (PFN) if present * Bits 0-4 swap type if swapped - * Bits 5-55 swap offset if swapped + * Bits 5-54 swap offset if swapped * Bits 55-60 page shift (page size = 1<<page shift) * Bit 61 reserved for future use * Bit 62 page swapped @@ -36,7 +36,7 @@ There are three components to pagemap: * /proc/kpageflags. This file contains a 64-bit set of flags for each page, indexed by PFN. - The flags are (from fs/proc/proc_misc, above kpageflags_read): + The flags are (from fs/proc/page.c, above kpageflags_read): 0. LOCKED 1. ERROR @@ -49,6 +49,65 @@ There are three components to pagemap: 8. WRITEBACK 9. RECLAIM 10. BUDDY + 11. MMAP + 12. ANON + 13. SWAPCACHE + 14. SWAPBACKED + 15. COMPOUND_HEAD + 16. COMPOUND_TAIL + 17. UNEVICTABLE + 18. POISON + 19. NOPAGE + +Short descriptions to the page flags: + + 0. LOCKED + page is being locked for exclusive access, eg. by undergoing read/write IO + + 7. SLAB + page is managed by the SLAB/SLOB/SLUB/SLQB kernel memory allocator + +10. BUDDY + a free memory block managed by the buddy system allocator + The buddy system organizes free memory in blocks of various orders. + An order N block has 2^N physically contiguous pages, with the BUDDY flag + set for and _only_ for the first page. + +15. COMPOUND_HEAD +16. COMPOUND_TAIL + A compound page with order N consists of 2^N physically contiguous pages. + A compound page with order 2 takes the form of "HTTT", where H donates its + head page and T donates its tail page(s). The major consumers of compound + pages are hugeTLB pages (Documentation/vm/hugetlbpage.txt), the SLUB etc. + memory allocators and various device drivers. + +18. POISON + hardware has detected memory corruption on this page + +19. NOPAGE + no page frame exists at the requested address + + [IO related page flags] + 1. ERROR IO error occurred + 3. UPTODATE page has up-to-date data + ie. for file backed page: (in-memory data revision >= on-disk one) + 4. DIRTY page has been written to, hence contains new data + ie. for file backed page: (in-memory data revision > on-disk one) + 8. WRITEBACK page is being synced to disk + + [LRU related page flags] + 5. LRU page is in one of the LRU lists + 6. ACTIVE page is in the active LRU list +17. UNEVICTABLE page is in the unevictable (non-)LRU list + It is somehow pinned and not a candidate for LRU page reclaims, + eg. ramfs pages, shmctl(SHM_LOCK) and mlock() memory segments + 2. REFERENCED page has been referenced since last LRU list enqueue/requeue + 9. RECLAIM page will be reclaimed soon after its pageout IO completed +11. MMAP a memory mapped page +12. ANON a memory mapped page who is not a file page +13. SWAPCACHE page is mapped to swap space, ie. has an associated swap entry +14. SWAPBACKED page is backed by swap/RAM + Using pagemap to do something useful:
WARNING: multiple messages have this Message-ID (diff)
From: Wu Fengguang <fengguang.wu@intel.com> To: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Andi Kleen <andi@firstfloor.org>, Andrew Morton <akpm@linux-foundation.org>, LKML <linux-kernel@vger.kernel.org>, "linux-mm@kvack.org" <linux-mm@kvack.org> Subject: [RFC][PATCH] proc: export more page flags in /proc/kpageflags (take 3) Date: Thu, 23 Apr 2009 10:26:25 +0800 [thread overview] Message-ID: <20090423022625.GA8822@localhost> (raw) In-Reply-To: <20090416111108.AC55.A69D9226@jp.fujitsu.com> Andi and KOSAKI: can we hopefully reach harmony of opinions on this version? Export 9 page flags in /proc/kpageflags, and 8 more for kernel developers. 1) for kernel hackers (on CONFIG_DEBUG_KERNEL) - all available page flags are exported, and - exported as is 2) for admins and end users - only the more `well known' flags are exported: 11. KPF_MMAP (pseudo flag) memory mapped page 12. KPF_ANON (pseudo flag) memory mapped page (anonymous) 13. KPF_SWAPCACHE page is in swap cache 14. KPF_SWAPBACKED page is swap/RAM backed 15. KPF_COMPOUND_HEAD (*) 16. KPF_COMPOUND_TAIL (*) 17. KPF_UNEVICTABLE page is in the unevictable LRU list 18. KPF_POISON hardware detected corruption 19. KPF_NOPAGE (pseudo flag) no page frame at the address (*) For compound pages, exporting _both_ head/tail info enables users to tell where a compound page starts/ends, and its order. - limit flags to their typical usage scenario, as indicated by KOSAKI: - LRU pages: only export relevant flags - PG_lru - PG_unevictable - PG_active - PG_referenced - page_mapped() - PageAnon() - PG_swapcache - PG_swapbacked - PG_reclaim - no-IO pages: mask out irrelevant flags - PG_dirty - PG_uptodate - PG_writeback - SLAB pages: mask out overloaded flags: - PG_error - PG_active - PG_private - PG_reclaim: filter out the overloaded PG_readahead Note that compound page flags are exported faithfully to end user. This risks exposing internal implementation details of the SLUB allocator, however hiding it risks larger impacts: - admins may wonder where all the compound pages gone - the use of compound pages in SLUB might have some real world relevance, so that end users want to be aware of this behavior - admins may be confused on inconsistent number of head/tail segments This is because SLUB only marks PG_slab on the compound head page. If we mask out PG_head|PG_tail for PG_slab pages, we are actually only masking out PG_head flags. Therefore the PG_tail segments will outnumber PG_head ones, which puzzled me for some time.. Here are the admin/linus views of all page flags on a newly booted nfs-root system: # ./page-types # for admin flags page-count MB symbolic-flags long-symbolic-flags 0x000000000000 491449 1919 ____________________________ 0x000000008000 15 0 _______________H____________ compound_head 0x000000010000 4280 16 ________________T___________ compound_tail 0x000000000008 17 0 ___U________________________ uptodate 0x000000008010 1 0 ____D__________H____________ dirty,compound_head 0x000000010010 4 0 ____D___________T___________ dirty,compound_tail 0x000000000020 1 0 _____l______________________ lru 0x000000000028 2678 10 ___U_l______________________ uptodate,lru 0x00000000002c 5244 20 __RU_l______________________ referenced,uptodate,lru 0x000000004060 1 0 _____lA_______b_____________ lru,active,swapbacked 0x000000004064 13 0 __R__lA_______b_____________ referenced,lru,active,swapbacked 0x000000000068 236 0 ___U_lA_____________________ uptodate,lru,active 0x00000000006c 927 3 __RU_lA_____________________ referenced,uptodate,lru,active 0x000000008080 968 3 _______S_______H____________ slab,compound_head 0x000000000080 1539 6 _______S____________________ slab 0x000000000400 516 2 __________B_________________ buddy 0x000000000828 1142 4 ___U_l_____M________________ uptodate,lru,mmap 0x00000000082c 280 1 __RU_l_____M________________ referenced,uptodate,lru,mmap 0x000000004860 2 0 _____lA____M__b_____________ lru,active,mmap,swapbacked 0x000000000868 366 1 ___U_lA____M________________ uptodate,lru,active,mmap 0x00000000086c 623 2 __RU_lA____M________________ referenced,uptodate,lru,active,mmap 0x000000005868 3639 14 ___U_lA____Ma_b_____________ uptodate,lru,active,mmap,anonymous,swapbacked 0x00000000586c 27 0 __RU_lA____Ma_b_____________ referenced,uptodate,lru,active,mmap,anonymous,swapbacked total 513968 2007 # ./page-types # for linus, when CONFIG_DEBUG_KERNEL is turned on flags page-count MB symbolic-flags long-symbolic-flags 0x000000000000 471731 1842 ____________________________ 0x000100000000 19258 75 ____________________r_______ reserved 0x000000008000 15 0 _______________H____________ compound_head 0x000000010000 4270 16 ________________T___________ compound_tail 0x000000000008 3 0 ___U________________________ uptodate 0x000000008014 1 0 __R_D__________H____________ referenced,dirty,compound_head 0x000000010014 4 0 __R_D___________T___________ referenced,dirty,compound_tail 0x000000000020 1 0 _____l______________________ lru 0x000000000028 2626 10 ___U_l______________________ uptodate,lru 0x00000000002c 5244 20 __RU_l______________________ referenced,uptodate,lru 0x000000000068 238 0 ___U_lA_____________________ uptodate,lru,active 0x00000000006c 925 3 __RU_lA_____________________ referenced,uptodate,lru,active 0x000000004078 1 0 ___UDlA_______b_____________ uptodate,dirty,lru,active,swapbacked 0x00000000407c 13 0 __RUDlA_______b_____________ referenced,uptodate,dirty,lru,active,swapbacked 0x000000000228 49 0 ___U_l___I__________________ uptodate,lru,reclaim 0x000000000400 523 2 __________B_________________ buddy 0x000000000804 1 0 __R________M________________ referenced,mmap 0x00000000080c 1 0 __RU_______M________________ referenced,uptodate,mmap 0x000000000828 1142 4 ___U_l_____M________________ uptodate,lru,mmap 0x00000000082c 280 1 __RU_l_____M________________ referenced,uptodate,lru,mmap 0x000000000868 366 1 ___U_lA____M________________ uptodate,lru,active,mmap 0x00000000086c 622 2 __RU_lA____M________________ referenced,uptodate,lru,active,mmap 0x000000004878 2 0 ___UDlA____M__b_____________ uptodate,dirty,lru,active,mmap,swapbacked 0x000000008880 907 3 _______S___M___H____________ slab,mmap,compound_head 0x000000000880 1488 5 _______S___M________________ slab,mmap 0x0000000088c0 59 0 ______AS___M___H____________ active,slab,mmap,compound_head 0x0000000008c0 49 0 ______AS___M________________ active,slab,mmap 0x000000001000 465 1 ____________a_______________ anonymous 0x000000005008 8 0 ___U________a_b_____________ uptodate,anonymous,swapbacked 0x000000005808 4 0 ___U_______Ma_b_____________ uptodate,mmap,anonymous,swapbacked 0x00000000580c 1 0 __RU_______Ma_b_____________ referenced,uptodate,mmap,anonymous,swapbacked 0x000000005868 3645 14 ___U_lA____Ma_b_____________ uptodate,lru,active,mmap,anonymous,swapbacked 0x00000000586c 26 0 __RU_lA____Ma_b_____________ referenced,uptodate,lru,active,mmap,anonymous,swapbacked total 513968 2007 Kudos to KOSAKI and Andi for the extensive recommendations! Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> --- Documentation/vm/pagemap.txt | 65 ++++++++++ fs/proc/page.c | 197 +++++++++++++++++++++++++++------ 2 files changed, 227 insertions(+), 35 deletions(-) --- mm.orig/fs/proc/page.c +++ mm/fs/proc/page.c @@ -6,6 +6,7 @@ #include <linux/mmzone.h> #include <linux/proc_fs.h> #include <linux/seq_file.h> +#include <linux/backing-dev.h> #include <asm/uaccess.h> #include "internal.h" @@ -68,19 +69,167 @@ static const struct file_operations proc /* These macros are used to decouple internal flags from exported ones */ -#define KPF_LOCKED 0 -#define KPF_ERROR 1 -#define KPF_REFERENCED 2 -#define KPF_UPTODATE 3 -#define KPF_DIRTY 4 -#define KPF_LRU 5 -#define KPF_ACTIVE 6 -#define KPF_SLAB 7 -#define KPF_WRITEBACK 8 -#define KPF_RECLAIM 9 -#define KPF_BUDDY 10 +#define KPF_LOCKED 0 +#define KPF_ERROR 1 +#define KPF_REFERENCED 2 +#define KPF_UPTODATE 3 +#define KPF_DIRTY 4 +#define KPF_LRU 5 +#define KPF_ACTIVE 6 +#define KPF_SLAB 7 +#define KPF_WRITEBACK 8 +#define KPF_RECLAIM 9 +#define KPF_BUDDY 10 + +/* new additions in 2.6.31 */ +#define KPF_MMAP 11 +#define KPF_ANON 12 +#define KPF_SWAPCACHE 13 +#define KPF_SWAPBACKED 14 +#define KPF_COMPOUND_HEAD 15 +#define KPF_COMPOUND_TAIL 16 +#define KPF_UNEVICTABLE 17 +#define KPF_POISON 18 +#define KPF_NOPAGE 19 + +/* kernel hacking assistances */ +#define KPF_RESERVED 32 +#define KPF_MLOCKED 33 +#define KPF_MAPPEDTODISK 34 +#define KPF_PRIVATE 35 +#define KPF_PRIVATE2 36 +#define KPF_OWNER_PRIVATE 37 +#define KPF_ARCH 38 +#define KPF_UNCACHED 39 + +/* + * Kernel flags are exported faithfully to Linus and his fellow hackers. + * Otherwise some details are masked to avoid confusing the end user: + * - some kernel flags are completely invisible + * - some kernel flags are conditionally invisible on their odd usages + */ +#ifdef CONFIG_DEBUG_KERNEL +static inline int genuine_linus(void) { return 1; } +#else +static inline int genuine_linus(void) { return 0; } +#endif + +#define kpf_copy_bit(uflags, kflags, visible, ubit, kbit) \ + do { \ + if (visible || genuine_linus()) \ + uflags |= ((kflags >> kbit) & 1) << ubit; \ + } while (0); + +/* a helper function _not_ intended for more general uses */ +static inline int page_cap_writeback_dirty(struct page *page) +{ + struct address_space *mapping = NULL; + + if (!PageSlab(page)) + mapping = page_mapping(page); + + return !mapping || mapping_cap_writeback_dirty(mapping); +} -#define kpf_copy_bit(flags, dstpos, srcpos) (((flags >> srcpos) & 1) << dstpos) +static u64 get_uflags(struct page *page) +{ + u64 k; + u64 u; + int io; + int lru; + int slab; + + /* + * pseudo flag: KPF_NOPAGE + * it differentiates a memory hole from a page with no flags + */ + if (!page) + return 1 << KPF_NOPAGE; + + k = page->flags; + u = 0; + + io = page_cap_writeback_dirty(page); + lru = k & (1 << PG_lru); + slab = k & (1 << PG_slab); + + /* + * pseudo flags for the well known (anonymous) memory mapped pages + */ + if (lru || genuine_linus()) { + if (page_mapped(page)) + u |= 1 << KPF_MMAP; + if (PageAnon(page)) + u |= 1 << KPF_ANON; + } + + /* + * compound pages: export both head/tail info + * they together define a compound page's start/end pos and order + */ + if (PageHead(page)) + u |= 1 << KPF_COMPOUND_HEAD; + if (PageTail(page)) + u |= 1 << KPF_COMPOUND_TAIL; + + kpf_copy_bit(u, k, 1, KPF_LOCKED, PG_locked); + + kpf_copy_bit(u, k, 1, KPF_SLAB, PG_slab); + kpf_copy_bit(u, k, 1, KPF_BUDDY, PG_buddy); + + kpf_copy_bit(u, k, io, KPF_ERROR, PG_error); + kpf_copy_bit(u, k, io, KPF_DIRTY, PG_dirty); + kpf_copy_bit(u, k, io, KPF_UPTODATE, PG_uptodate); + kpf_copy_bit(u, k, io, KPF_WRITEBACK, PG_writeback); + + kpf_copy_bit(u, k, 1, KPF_LRU, PG_lru); + kpf_copy_bit(u, k, lru, KPF_REFERENCED, PG_referenced); + kpf_copy_bit(u, k, lru, KPF_ACTIVE, PG_active); + kpf_copy_bit(u, k, lru, KPF_RECLAIM, PG_reclaim); + + kpf_copy_bit(u, k, lru, KPF_SWAPCACHE, PG_swapcache); + kpf_copy_bit(u, k, lru, KPF_SWAPBACKED, PG_swapbacked); + +#ifdef CONFIG_MEMORY_FAILURE + kpf_copy_bit(u, k, 1, KPF_POISON, PG_poison); +#endif + +#ifdef CONFIG_UNEVICTABLE_LRU + kpf_copy_bit(u, k, lru, KPF_UNEVICTABLE, PG_unevictable); + kpf_copy_bit(u, k, 0, KPF_MLOCKED, PG_mlocked); +#endif + + kpf_copy_bit(u, k, 0, KPF_RESERVED, PG_reserved); + kpf_copy_bit(u, k, 0, KPF_MAPPEDTODISK, PG_mappedtodisk); + kpf_copy_bit(u, k, 0, KPF_PRIVATE, PG_private); + kpf_copy_bit(u, k, 0, KPF_PRIVATE2, PG_private_2); + kpf_copy_bit(u, k, 0, KPF_OWNER_PRIVATE, PG_owner_priv_1); + kpf_copy_bit(u, k, 0, KPF_ARCH, PG_arch_1); + +#ifdef CONFIG_IA64_UNCACHED_ALLOCATOR + kpf_copy_bit(u, k, 0, KPF_UNCACHED, PG_uncached); +#endif + + if (!genuine_linus()) { + /* + * SLAB/SLOB/SLUB overload some page flags which may confuse end user + */ + if (slab) { + u &= ~ ((1 << KPF_ACTIVE) | + (1 << KPF_ERROR) | + (1 << KPF_MMAP)); + } + /* + * PG_reclaim could be overloaded as PG_readahead, + * and we only want to export the first one. + */ + if ((u & ((1 << KPF_RECLAIM) | (1 << KPF_WRITEBACK))) == + (1 << KPF_RECLAIM)) + u &= ~ (1 << KPF_RECLAIM); + } + + return u; +}; static ssize_t kpageflags_read(struct file *file, char __user *buf, size_t count, loff_t *ppos) @@ -90,7 +239,6 @@ static ssize_t kpageflags_read(struct fi unsigned long src = *ppos; unsigned long pfn; ssize_t ret = 0; - u64 kflags, uflags; pfn = src / KPMSIZE; count = min_t(unsigned long, count, (max_pfn * KPMSIZE) - src); @@ -98,32 +246,17 @@ static ssize_t kpageflags_read(struct fi return -EINVAL; while (count > 0) { - ppage = NULL; if (pfn_valid(pfn)) ppage = pfn_to_page(pfn); - pfn++; - if (!ppage) - kflags = 0; else - kflags = ppage->flags; - - uflags = kpf_copy_bit(kflags, KPF_LOCKED, PG_locked) | - kpf_copy_bit(kflags, KPF_ERROR, PG_error) | - kpf_copy_bit(kflags, KPF_REFERENCED, PG_referenced) | - kpf_copy_bit(kflags, KPF_UPTODATE, PG_uptodate) | - kpf_copy_bit(kflags, KPF_DIRTY, PG_dirty) | - kpf_copy_bit(kflags, KPF_LRU, PG_lru) | - kpf_copy_bit(kflags, KPF_ACTIVE, PG_active) | - kpf_copy_bit(kflags, KPF_SLAB, PG_slab) | - kpf_copy_bit(kflags, KPF_WRITEBACK, PG_writeback) | - kpf_copy_bit(kflags, KPF_RECLAIM, PG_reclaim) | - kpf_copy_bit(kflags, KPF_BUDDY, PG_buddy); + ppage = NULL; - if (put_user(uflags, out++)) { + if (put_user(get_uflags(ppage), out)) { ret = -EFAULT; break; } - + out++; + pfn++; count -= KPMSIZE; } --- mm.orig/Documentation/vm/pagemap.txt +++ mm/Documentation/vm/pagemap.txt @@ -12,9 +12,9 @@ There are three components to pagemap: value for each virtual page, containing the following data (from fs/proc/task_mmu.c, above pagemap_read): - * Bits 0-55 page frame number (PFN) if present + * Bits 0-54 page frame number (PFN) if present * Bits 0-4 swap type if swapped - * Bits 5-55 swap offset if swapped + * Bits 5-54 swap offset if swapped * Bits 55-60 page shift (page size = 1<<page shift) * Bit 61 reserved for future use * Bit 62 page swapped @@ -36,7 +36,7 @@ There are three components to pagemap: * /proc/kpageflags. This file contains a 64-bit set of flags for each page, indexed by PFN. - The flags are (from fs/proc/proc_misc, above kpageflags_read): + The flags are (from fs/proc/page.c, above kpageflags_read): 0. LOCKED 1. ERROR @@ -49,6 +49,65 @@ There are three components to pagemap: 8. WRITEBACK 9. RECLAIM 10. BUDDY + 11. MMAP + 12. ANON + 13. SWAPCACHE + 14. SWAPBACKED + 15. COMPOUND_HEAD + 16. COMPOUND_TAIL + 17. UNEVICTABLE + 18. POISON + 19. NOPAGE + +Short descriptions to the page flags: + + 0. LOCKED + page is being locked for exclusive access, eg. by undergoing read/write IO + + 7. SLAB + page is managed by the SLAB/SLOB/SLUB/SLQB kernel memory allocator + +10. BUDDY + a free memory block managed by the buddy system allocator + The buddy system organizes free memory in blocks of various orders. + An order N block has 2^N physically contiguous pages, with the BUDDY flag + set for and _only_ for the first page. + +15. COMPOUND_HEAD +16. COMPOUND_TAIL + A compound page with order N consists of 2^N physically contiguous pages. + A compound page with order 2 takes the form of "HTTT", where H donates its + head page and T donates its tail page(s). The major consumers of compound + pages are hugeTLB pages (Documentation/vm/hugetlbpage.txt), the SLUB etc. + memory allocators and various device drivers. + +18. POISON + hardware has detected memory corruption on this page + +19. NOPAGE + no page frame exists at the requested address + + [IO related page flags] + 1. ERROR IO error occurred + 3. UPTODATE page has up-to-date data + ie. for file backed page: (in-memory data revision >= on-disk one) + 4. DIRTY page has been written to, hence contains new data + ie. for file backed page: (in-memory data revision > on-disk one) + 8. WRITEBACK page is being synced to disk + + [LRU related page flags] + 5. LRU page is in one of the LRU lists + 6. ACTIVE page is in the active LRU list +17. UNEVICTABLE page is in the unevictable (non-)LRU list + It is somehow pinned and not a candidate for LRU page reclaims, + eg. ramfs pages, shmctl(SHM_LOCK) and mlock() memory segments + 2. REFERENCED page has been referenced since last LRU list enqueue/requeue + 9. RECLAIM page will be reclaimed soon after its pageout IO completed +11. MMAP a memory mapped page +12. ANON a memory mapped page who is not a file page +13. SWAPCACHE page is mapped to swap space, ie. has an associated swap entry +14. SWAPBACKED page is backed by swap/RAM + Using pagemap to do something useful: -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2009-04-23 2:27 UTC|newest] Thread overview: 44+ messages / expand[flat|nested] mbox.gz Atom feed top 2009-04-14 4:22 [RFC][PATCH] proc: export more page flags in /proc/kpageflags Wu Fengguang 2009-04-14 4:22 ` Wu Fengguang 2009-04-14 4:36 ` Wu Fengguang 2009-04-14 4:37 ` KOSAKI Motohiro 2009-04-14 4:37 ` KOSAKI Motohiro 2009-04-14 6:41 ` Wu Fengguang 2009-04-14 6:41 ` Wu Fengguang 2009-04-14 6:54 ` KOSAKI Motohiro 2009-04-14 6:54 ` KOSAKI Motohiro 2009-04-14 7:11 ` Andi Kleen 2009-04-14 7:11 ` Andi Kleen 2009-04-14 7:17 ` KOSAKI Motohiro 2009-04-14 7:17 ` KOSAKI Motohiro 2009-04-15 13:18 ` Wu Fengguang 2009-04-15 13:18 ` Wu Fengguang 2009-04-15 13:57 ` Andi Kleen 2009-04-15 13:57 ` Andi Kleen 2009-04-16 2:41 ` Wu Fengguang 2009-04-16 2:41 ` Wu Fengguang 2009-04-16 3:54 ` Andi Kleen 2009-04-16 3:54 ` Andi Kleen 2009-04-16 4:43 ` Wu Fengguang 2009-04-16 4:43 ` Wu Fengguang 2009-04-16 2:26 ` KOSAKI Motohiro 2009-04-16 2:26 ` KOSAKI Motohiro 2009-04-16 3:49 ` Wu Fengguang 2009-04-16 3:49 ` Wu Fengguang 2009-04-16 6:30 ` Wu Fengguang 2009-04-16 6:30 ` Wu Fengguang 2009-04-23 2:26 ` Wu Fengguang [this message] 2009-04-23 2:26 ` [RFC][PATCH] proc: export more page flags in /proc/kpageflags (take 3) Wu Fengguang 2009-04-23 7:48 ` Andi Kleen 2009-04-23 7:48 ` Andi Kleen 2009-04-23 8:10 ` Wu Fengguang 2009-04-23 8:10 ` Wu Fengguang 2009-04-23 8:54 ` Andi Kleen 2009-04-23 8:54 ` Andi Kleen 2009-04-23 11:21 ` Wu Fengguang 2009-04-23 11:21 ` Wu Fengguang 2009-04-25 1:59 ` Wu Fengguang 2009-04-14 7:22 ` [RFC][PATCH] proc: export more page flags in /proc/kpageflags Wu Fengguang 2009-04-14 7:22 ` Wu Fengguang 2009-04-14 7:42 ` KOSAKI Motohiro 2009-04-14 7:42 ` KOSAKI Motohiro
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20090423022625.GA8822@localhost \ --to=fengguang.wu@intel.com \ --cc=akpm@linux-foundation.org \ --cc=andi@firstfloor.org \ --cc=kosaki.motohiro@jp.fujitsu.com \ --cc=linux-kernel@vger.kernel.org \ --cc=linux-mm@kvack.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.