All of lore.kernel.org
 help / color / mirror / Atom feed
From: Wu Fengguang <fengguang.wu@intel.com>
To: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Andi Kleen <andi@firstfloor.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: [RFC][PATCH] proc: export more page flags in /proc/kpageflags (take 3)
Date: Thu, 23 Apr 2009 10:26:25 +0800	[thread overview]
Message-ID: <20090423022625.GA8822@localhost> (raw)
In-Reply-To: <20090416111108.AC55.A69D9226@jp.fujitsu.com>

Andi and KOSAKI: can we hopefully reach harmony of opinions on this version?

Export 9 page flags in /proc/kpageflags, and 8 more for kernel developers.

1) for kernel hackers (on CONFIG_DEBUG_KERNEL)
   - all available page flags are exported, and
   - exported as is
2) for admins and end users
   - only the more `well known' flags are exported:
	11. KPF_MMAP		(pseudo flag) memory mapped page
	12. KPF_ANON		(pseudo flag) memory mapped page (anonymous)
	13. KPF_SWAPCACHE	page is in swap cache
	14. KPF_SWAPBACKED	page is swap/RAM backed
	15. KPF_COMPOUND_HEAD	(*)
	16. KPF_COMPOUND_TAIL	(*)
	17. KPF_UNEVICTABLE	page is in the unevictable LRU list
	18. KPF_POISON		hardware detected corruption
	19. KPF_NOPAGE		(pseudo flag) no page frame at the address

	(*) For compound pages, exporting _both_ head/tail info enables
	    users to tell where a compound page starts/ends, and its order.

   - limit flags to their typical usage scenario, as indicated by KOSAKI:
	- LRU pages: only export relevant flags
		- PG_lru
		- PG_unevictable
		- PG_active
		- PG_referenced
		- page_mapped()
		- PageAnon()
		- PG_swapcache
		- PG_swapbacked
		- PG_reclaim
	- no-IO pages: mask out irrelevant flags
		- PG_dirty
		- PG_uptodate
		- PG_writeback
	- SLAB pages: mask out overloaded flags:
		- PG_error
		- PG_active
		- PG_private
	- PG_reclaim: filter out the overloaded PG_readahead

Note that compound page flags are exported faithfully to end user.  This risks
exposing internal implementation details of the SLUB allocator, however hiding
it risks larger impacts:
	- admins may wonder where all the compound pages gone - the use of
	  compound pages in SLUB might have some real world relevance, so that
	  end users want to be aware of this behavior
	- admins may be confused on inconsistent number of head/tail segments
	  This is because SLUB only marks PG_slab on the compound head page.
	  If we mask out PG_head|PG_tail for PG_slab pages, we are actually
	  only masking out PG_head flags. Therefore the PG_tail segments will
	  outnumber PG_head ones, which puzzled me for some time..

Here are the admin/linus views of all page flags on a newly booted nfs-root system:

# ./page-types # for admin
         flags  page-count       MB  symbolic-flags                     long-symbolic-flags
0x000000000000      491449     1919  ____________________________
0x000000008000          15        0  _______________H____________       compound_head
0x000000010000        4280       16  ________________T___________       compound_tail
0x000000000008          17        0  ___U________________________       uptodate
0x000000008010           1        0  ____D__________H____________       dirty,compound_head
0x000000010010           4        0  ____D___________T___________       dirty,compound_tail
0x000000000020           1        0  _____l______________________       lru
0x000000000028        2678       10  ___U_l______________________       uptodate,lru
0x00000000002c        5244       20  __RU_l______________________       referenced,uptodate,lru
0x000000004060           1        0  _____lA_______b_____________       lru,active,swapbacked
0x000000004064          13        0  __R__lA_______b_____________       referenced,lru,active,swapbacked
0x000000000068         236        0  ___U_lA_____________________       uptodate,lru,active
0x00000000006c         927        3  __RU_lA_____________________       referenced,uptodate,lru,active
0x000000008080         968        3  _______S_______H____________       slab,compound_head
0x000000000080        1539        6  _______S____________________       slab
0x000000000400         516        2  __________B_________________       buddy
0x000000000828        1142        4  ___U_l_____M________________       uptodate,lru,mmap
0x00000000082c         280        1  __RU_l_____M________________       referenced,uptodate,lru,mmap
0x000000004860           2        0  _____lA____M__b_____________       lru,active,mmap,swapbacked
0x000000000868         366        1  ___U_lA____M________________       uptodate,lru,active,mmap
0x00000000086c         623        2  __RU_lA____M________________       referenced,uptodate,lru,active,mmap
0x000000005868        3639       14  ___U_lA____Ma_b_____________       uptodate,lru,active,mmap,anonymous,swapbacked
0x00000000586c          27        0  __RU_lA____Ma_b_____________       referenced,uptodate,lru,active,mmap,anonymous,swapbacked
         total      513968     2007

# ./page-types # for linus, when CONFIG_DEBUG_KERNEL is turned on
         flags  page-count       MB  symbolic-flags                     long-symbolic-flags
0x000000000000      471731     1842  ____________________________
0x000100000000       19258       75  ____________________r_______       reserved
0x000000008000          15        0  _______________H____________       compound_head
0x000000010000        4270       16  ________________T___________       compound_tail
0x000000000008           3        0  ___U________________________       uptodate
0x000000008014           1        0  __R_D__________H____________       referenced,dirty,compound_head
0x000000010014           4        0  __R_D___________T___________       referenced,dirty,compound_tail
0x000000000020           1        0  _____l______________________       lru
0x000000000028        2626       10  ___U_l______________________       uptodate,lru
0x00000000002c        5244       20  __RU_l______________________       referenced,uptodate,lru
0x000000000068         238        0  ___U_lA_____________________       uptodate,lru,active
0x00000000006c         925        3  __RU_lA_____________________       referenced,uptodate,lru,active
0x000000004078           1        0  ___UDlA_______b_____________       uptodate,dirty,lru,active,swapbacked
0x00000000407c          13        0  __RUDlA_______b_____________       referenced,uptodate,dirty,lru,active,swapbacked
0x000000000228          49        0  ___U_l___I__________________       uptodate,lru,reclaim
0x000000000400         523        2  __________B_________________       buddy
0x000000000804           1        0  __R________M________________       referenced,mmap
0x00000000080c           1        0  __RU_______M________________       referenced,uptodate,mmap
0x000000000828        1142        4  ___U_l_____M________________       uptodate,lru,mmap
0x00000000082c         280        1  __RU_l_____M________________       referenced,uptodate,lru,mmap
0x000000000868         366        1  ___U_lA____M________________       uptodate,lru,active,mmap
0x00000000086c         622        2  __RU_lA____M________________       referenced,uptodate,lru,active,mmap
0x000000004878           2        0  ___UDlA____M__b_____________       uptodate,dirty,lru,active,mmap,swapbacked
0x000000008880         907        3  _______S___M___H____________       slab,mmap,compound_head
0x000000000880        1488        5  _______S___M________________       slab,mmap
0x0000000088c0          59        0  ______AS___M___H____________       active,slab,mmap,compound_head
0x0000000008c0          49        0  ______AS___M________________       active,slab,mmap
0x000000001000         465        1  ____________a_______________       anonymous
0x000000005008           8        0  ___U________a_b_____________       uptodate,anonymous,swapbacked
0x000000005808           4        0  ___U_______Ma_b_____________       uptodate,mmap,anonymous,swapbacked
0x00000000580c           1        0  __RU_______Ma_b_____________       referenced,uptodate,mmap,anonymous,swapbacked
0x000000005868        3645       14  ___U_lA____Ma_b_____________       uptodate,lru,active,mmap,anonymous,swapbacked
0x00000000586c          26        0  __RU_lA____Ma_b_____________       referenced,uptodate,lru,active,mmap,anonymous,swapbacked
         total      513968     2007

Kudos to KOSAKI and Andi for the extensive recommendations!

Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 Documentation/vm/pagemap.txt |   65 ++++++++++
 fs/proc/page.c               |  197 +++++++++++++++++++++++++++------
 2 files changed, 227 insertions(+), 35 deletions(-)

--- mm.orig/fs/proc/page.c
+++ mm/fs/proc/page.c
@@ -6,6 +6,7 @@
 #include <linux/mmzone.h>
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
+#include <linux/backing-dev.h>
 #include <asm/uaccess.h>
 #include "internal.h"
 
@@ -68,19 +69,167 @@ static const struct file_operations proc
 
 /* These macros are used to decouple internal flags from exported ones */
 
-#define KPF_LOCKED     0
-#define KPF_ERROR      1
-#define KPF_REFERENCED 2
-#define KPF_UPTODATE   3
-#define KPF_DIRTY      4
-#define KPF_LRU        5
-#define KPF_ACTIVE     6
-#define KPF_SLAB       7
-#define KPF_WRITEBACK  8
-#define KPF_RECLAIM    9
-#define KPF_BUDDY     10
+#define KPF_LOCKED		0
+#define KPF_ERROR		1
+#define KPF_REFERENCED		2
+#define KPF_UPTODATE		3
+#define KPF_DIRTY		4
+#define KPF_LRU			5
+#define KPF_ACTIVE		6
+#define KPF_SLAB		7
+#define KPF_WRITEBACK		8
+#define KPF_RECLAIM		9
+#define KPF_BUDDY		10
+
+/* new additions in 2.6.31 */
+#define KPF_MMAP		11
+#define KPF_ANON		12
+#define KPF_SWAPCACHE		13
+#define KPF_SWAPBACKED		14
+#define KPF_COMPOUND_HEAD	15
+#define KPF_COMPOUND_TAIL	16
+#define KPF_UNEVICTABLE		17
+#define KPF_POISON		18
+#define KPF_NOPAGE		19
+
+/* kernel hacking assistances */
+#define KPF_RESERVED		32
+#define KPF_MLOCKED		33
+#define KPF_MAPPEDTODISK	34
+#define KPF_PRIVATE		35
+#define KPF_PRIVATE2		36
+#define KPF_OWNER_PRIVATE	37
+#define KPF_ARCH		38
+#define KPF_UNCACHED		39
+
+/*
+ * Kernel flags are exported faithfully to Linus and his fellow hackers.
+ * Otherwise some details are masked to avoid confusing the end user:
+ * - some kernel flags are completely invisible
+ * - some kernel flags are conditionally invisible on their odd usages
+ */
+#ifdef CONFIG_DEBUG_KERNEL
+static inline int genuine_linus(void) { return 1; }
+#else
+static inline int genuine_linus(void) { return 0; }
+#endif
+
+#define kpf_copy_bit(uflags, kflags, visible, ubit, kbit)		\
+	do {								\
+		if (visible || genuine_linus())				\
+			uflags |= ((kflags >> kbit) & 1) << ubit;	\
+	} while (0);
+
+/* a helper function _not_ intended for more general uses */
+static inline int page_cap_writeback_dirty(struct page *page)
+{
+	struct address_space *mapping = NULL;
+
+	if (!PageSlab(page))
+		mapping = page_mapping(page);
+
+	return !mapping || mapping_cap_writeback_dirty(mapping);
+}
 
-#define kpf_copy_bit(flags, dstpos, srcpos) (((flags >> srcpos) & 1) << dstpos)
+static u64 get_uflags(struct page *page)
+{
+	u64 k;
+	u64 u;
+	int io;
+	int lru;
+	int slab;
+
+	/*
+	 * pseudo flag: KPF_NOPAGE
+	 * it differentiates a memory hole from a page with no flags
+	 */
+	if (!page)
+		return 1 << KPF_NOPAGE;
+
+	k = page->flags;
+	u = 0;
+
+	io   = page_cap_writeback_dirty(page);
+	lru  = k & (1 << PG_lru);
+	slab = k & (1 << PG_slab);
+
+	/*
+	 * pseudo flags for the well known (anonymous) memory mapped pages
+	 */
+	if (lru || genuine_linus()) {
+		if (page_mapped(page))
+			u |= 1 << KPF_MMAP;
+		if (PageAnon(page))
+			u |= 1 << KPF_ANON;
+	}
+
+	/*
+	 * compound pages: export both head/tail info
+	 * they together define a compound page's start/end pos and order
+	 */
+	if (PageHead(page))
+		u |= 1 << KPF_COMPOUND_HEAD;
+	if (PageTail(page))
+		u |= 1 << KPF_COMPOUND_TAIL;
+
+	kpf_copy_bit(u, k, 1,	  KPF_LOCKED,		PG_locked);
+
+	kpf_copy_bit(u, k, 1,     KPF_SLAB,		PG_slab);
+	kpf_copy_bit(u, k, 1,     KPF_BUDDY,		PG_buddy);
+
+	kpf_copy_bit(u, k, io,    KPF_ERROR,		PG_error);
+	kpf_copy_bit(u, k, io,    KPF_DIRTY,		PG_dirty);
+	kpf_copy_bit(u, k, io,    KPF_UPTODATE,		PG_uptodate);
+	kpf_copy_bit(u, k, io,    KPF_WRITEBACK,	PG_writeback);
+
+	kpf_copy_bit(u, k, 1,     KPF_LRU,		PG_lru);
+	kpf_copy_bit(u, k, lru,	  KPF_REFERENCED,	PG_referenced);
+	kpf_copy_bit(u, k, lru,   KPF_ACTIVE,		PG_active);
+	kpf_copy_bit(u, k, lru,   KPF_RECLAIM,		PG_reclaim);
+
+	kpf_copy_bit(u, k, lru,   KPF_SWAPCACHE,	PG_swapcache);
+	kpf_copy_bit(u, k, lru,   KPF_SWAPBACKED,	PG_swapbacked);
+
+#ifdef CONFIG_MEMORY_FAILURE
+	kpf_copy_bit(u, k, 1,     KPF_POISON,		PG_poison);
+#endif
+
+#ifdef CONFIG_UNEVICTABLE_LRU
+	kpf_copy_bit(u, k, lru,   KPF_UNEVICTABLE,	PG_unevictable);
+	kpf_copy_bit(u, k, 0,     KPF_MLOCKED,		PG_mlocked);
+#endif
+
+	kpf_copy_bit(u, k, 0,     KPF_RESERVED,		PG_reserved);
+	kpf_copy_bit(u, k, 0,     KPF_MAPPEDTODISK,	PG_mappedtodisk);
+	kpf_copy_bit(u, k, 0,     KPF_PRIVATE,		PG_private);
+	kpf_copy_bit(u, k, 0,     KPF_PRIVATE2,		PG_private_2);
+	kpf_copy_bit(u, k, 0,     KPF_OWNER_PRIVATE,	PG_owner_priv_1);
+	kpf_copy_bit(u, k, 0,     KPF_ARCH,		PG_arch_1);
+
+#ifdef CONFIG_IA64_UNCACHED_ALLOCATOR
+	kpf_copy_bit(u, k, 0,     KPF_UNCACHED,		PG_uncached);
+#endif
+
+	if (!genuine_linus()) {
+		/*
+		 * SLAB/SLOB/SLUB overload some page flags which may confuse end user
+		 */
+		if (slab) {
+			u &= ~ ((1 << KPF_ACTIVE)	|
+				(1 << KPF_ERROR)	|
+				(1 << KPF_MMAP));
+		}
+		/*
+		 * PG_reclaim could be overloaded as PG_readahead,
+		 * and we only want to export the first one.
+		 */
+		if ((u & ((1 << KPF_RECLAIM) | (1 << KPF_WRITEBACK))) ==
+			  (1 << KPF_RECLAIM))
+			u &= ~ (1 << KPF_RECLAIM);
+	}
+
+	return u;
+};
 
 static ssize_t kpageflags_read(struct file *file, char __user *buf,
 			     size_t count, loff_t *ppos)
@@ -90,7 +239,6 @@ static ssize_t kpageflags_read(struct fi
 	unsigned long src = *ppos;
 	unsigned long pfn;
 	ssize_t ret = 0;
-	u64 kflags, uflags;
 
 	pfn = src / KPMSIZE;
 	count = min_t(unsigned long, count, (max_pfn * KPMSIZE) - src);
@@ -98,32 +246,17 @@ static ssize_t kpageflags_read(struct fi
 		return -EINVAL;
 
 	while (count > 0) {
-		ppage = NULL;
 		if (pfn_valid(pfn))
 			ppage = pfn_to_page(pfn);
-		pfn++;
-		if (!ppage)
-			kflags = 0;
 		else
-			kflags = ppage->flags;
-
-		uflags = kpf_copy_bit(kflags, KPF_LOCKED, PG_locked) |
-			kpf_copy_bit(kflags, KPF_ERROR, PG_error) |
-			kpf_copy_bit(kflags, KPF_REFERENCED, PG_referenced) |
-			kpf_copy_bit(kflags, KPF_UPTODATE, PG_uptodate) |
-			kpf_copy_bit(kflags, KPF_DIRTY, PG_dirty) |
-			kpf_copy_bit(kflags, KPF_LRU, PG_lru) |
-			kpf_copy_bit(kflags, KPF_ACTIVE, PG_active) |
-			kpf_copy_bit(kflags, KPF_SLAB, PG_slab) |
-			kpf_copy_bit(kflags, KPF_WRITEBACK, PG_writeback) |
-			kpf_copy_bit(kflags, KPF_RECLAIM, PG_reclaim) |
-			kpf_copy_bit(kflags, KPF_BUDDY, PG_buddy);
+			ppage = NULL;
 
-		if (put_user(uflags, out++)) {
+		if (put_user(get_uflags(ppage), out)) {
 			ret = -EFAULT;
 			break;
 		}
-
+		out++;
+		pfn++;
 		count -= KPMSIZE;
 	}
 
--- mm.orig/Documentation/vm/pagemap.txt
+++ mm/Documentation/vm/pagemap.txt
@@ -12,9 +12,9 @@ There are three components to pagemap:
    value for each virtual page, containing the following data (from
    fs/proc/task_mmu.c, above pagemap_read):
 
-    * Bits 0-55  page frame number (PFN) if present
+    * Bits 0-54  page frame number (PFN) if present
     * Bits 0-4   swap type if swapped
-    * Bits 5-55  swap offset if swapped
+    * Bits 5-54  swap offset if swapped
     * Bits 55-60 page shift (page size = 1<<page shift)
     * Bit  61    reserved for future use
     * Bit  62    page swapped
@@ -36,7 +36,7 @@ There are three components to pagemap:
  * /proc/kpageflags.  This file contains a 64-bit set of flags for each
    page, indexed by PFN.
 
-   The flags are (from fs/proc/proc_misc, above kpageflags_read):
+   The flags are (from fs/proc/page.c, above kpageflags_read):
 
      0. LOCKED
      1. ERROR
@@ -49,6 +49,65 @@ There are three components to pagemap:
      8. WRITEBACK
      9. RECLAIM
     10. BUDDY
+    11. MMAP
+    12. ANON
+    13. SWAPCACHE
+    14. SWAPBACKED
+    15. COMPOUND_HEAD
+    16. COMPOUND_TAIL
+    17. UNEVICTABLE
+    18. POISON
+    19. NOPAGE
+
+Short descriptions to the page flags:
+
+ 0. LOCKED
+    page is being locked for exclusive access, eg. by undergoing read/write IO
+
+ 7. SLAB
+    page is managed by the SLAB/SLOB/SLUB/SLQB kernel memory allocator
+
+10. BUDDY
+    a free memory block managed by the buddy system allocator
+    The buddy system organizes free memory in blocks of various orders.
+    An order N block has 2^N physically contiguous pages, with the BUDDY flag
+    set for and _only_ for the first page.
+
+15. COMPOUND_HEAD
+16. COMPOUND_TAIL
+    A compound page with order N consists of 2^N physically contiguous pages.
+    A compound page with order 2 takes the form of "HTTT", where H donates its
+    head page and T donates its tail page(s).  The major consumers of compound
+    pages are hugeTLB pages (Documentation/vm/hugetlbpage.txt), the SLUB etc.
+    memory allocators and various device drivers.
+
+18. POISON
+    hardware has detected memory corruption on this page
+
+19. NOPAGE
+    no page frame exists at the requested address
+
+    [IO related page flags]
+ 1. ERROR     IO error occurred
+ 3. UPTODATE  page has up-to-date data
+              ie. for file backed page: (in-memory data revision >= on-disk one)
+ 4. DIRTY     page has been written to, hence contains new data
+              ie. for file backed page: (in-memory data revision >  on-disk one)
+ 8. WRITEBACK page is being synced to disk
+
+    [LRU related page flags]
+ 5. LRU         page is in one of the LRU lists
+ 6. ACTIVE      page is in the active LRU list
+17. UNEVICTABLE page is in the unevictable (non-)LRU list
+                It is somehow pinned and not a candidate for LRU page reclaims,
+		eg. ramfs pages, shmctl(SHM_LOCK) and mlock() memory segments
+ 2. REFERENCED  page has been referenced since last LRU list enqueue/requeue
+ 9. RECLAIM     page will be reclaimed soon after its pageout IO completed
+11. MMAP        a memory mapped page
+12. ANON        a memory mapped page who is not a file page
+13. SWAPCACHE   page is mapped to swap space, ie. has an associated swap entry
+14. SWAPBACKED  page is backed by swap/RAM
+
 
 Using pagemap to do something useful:
 

WARNING: multiple messages have this Message-ID (diff)
From: Wu Fengguang <fengguang.wu@intel.com>
To: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Andi Kleen <andi@firstfloor.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: [RFC][PATCH] proc: export more page flags in /proc/kpageflags (take 3)
Date: Thu, 23 Apr 2009 10:26:25 +0800	[thread overview]
Message-ID: <20090423022625.GA8822@localhost> (raw)
In-Reply-To: <20090416111108.AC55.A69D9226@jp.fujitsu.com>

Andi and KOSAKI: can we hopefully reach harmony of opinions on this version?

Export 9 page flags in /proc/kpageflags, and 8 more for kernel developers.

1) for kernel hackers (on CONFIG_DEBUG_KERNEL)
   - all available page flags are exported, and
   - exported as is
2) for admins and end users
   - only the more `well known' flags are exported:
	11. KPF_MMAP		(pseudo flag) memory mapped page
	12. KPF_ANON		(pseudo flag) memory mapped page (anonymous)
	13. KPF_SWAPCACHE	page is in swap cache
	14. KPF_SWAPBACKED	page is swap/RAM backed
	15. KPF_COMPOUND_HEAD	(*)
	16. KPF_COMPOUND_TAIL	(*)
	17. KPF_UNEVICTABLE	page is in the unevictable LRU list
	18. KPF_POISON		hardware detected corruption
	19. KPF_NOPAGE		(pseudo flag) no page frame at the address

	(*) For compound pages, exporting _both_ head/tail info enables
	    users to tell where a compound page starts/ends, and its order.

   - limit flags to their typical usage scenario, as indicated by KOSAKI:
	- LRU pages: only export relevant flags
		- PG_lru
		- PG_unevictable
		- PG_active
		- PG_referenced
		- page_mapped()
		- PageAnon()
		- PG_swapcache
		- PG_swapbacked
		- PG_reclaim
	- no-IO pages: mask out irrelevant flags
		- PG_dirty
		- PG_uptodate
		- PG_writeback
	- SLAB pages: mask out overloaded flags:
		- PG_error
		- PG_active
		- PG_private
	- PG_reclaim: filter out the overloaded PG_readahead

Note that compound page flags are exported faithfully to end user.  This risks
exposing internal implementation details of the SLUB allocator, however hiding
it risks larger impacts:
	- admins may wonder where all the compound pages gone - the use of
	  compound pages in SLUB might have some real world relevance, so that
	  end users want to be aware of this behavior
	- admins may be confused on inconsistent number of head/tail segments
	  This is because SLUB only marks PG_slab on the compound head page.
	  If we mask out PG_head|PG_tail for PG_slab pages, we are actually
	  only masking out PG_head flags. Therefore the PG_tail segments will
	  outnumber PG_head ones, which puzzled me for some time..

Here are the admin/linus views of all page flags on a newly booted nfs-root system:

# ./page-types # for admin
         flags  page-count       MB  symbolic-flags                     long-symbolic-flags
0x000000000000      491449     1919  ____________________________
0x000000008000          15        0  _______________H____________       compound_head
0x000000010000        4280       16  ________________T___________       compound_tail
0x000000000008          17        0  ___U________________________       uptodate
0x000000008010           1        0  ____D__________H____________       dirty,compound_head
0x000000010010           4        0  ____D___________T___________       dirty,compound_tail
0x000000000020           1        0  _____l______________________       lru
0x000000000028        2678       10  ___U_l______________________       uptodate,lru
0x00000000002c        5244       20  __RU_l______________________       referenced,uptodate,lru
0x000000004060           1        0  _____lA_______b_____________       lru,active,swapbacked
0x000000004064          13        0  __R__lA_______b_____________       referenced,lru,active,swapbacked
0x000000000068         236        0  ___U_lA_____________________       uptodate,lru,active
0x00000000006c         927        3  __RU_lA_____________________       referenced,uptodate,lru,active
0x000000008080         968        3  _______S_______H____________       slab,compound_head
0x000000000080        1539        6  _______S____________________       slab
0x000000000400         516        2  __________B_________________       buddy
0x000000000828        1142        4  ___U_l_____M________________       uptodate,lru,mmap
0x00000000082c         280        1  __RU_l_____M________________       referenced,uptodate,lru,mmap
0x000000004860           2        0  _____lA____M__b_____________       lru,active,mmap,swapbacked
0x000000000868         366        1  ___U_lA____M________________       uptodate,lru,active,mmap
0x00000000086c         623        2  __RU_lA____M________________       referenced,uptodate,lru,active,mmap
0x000000005868        3639       14  ___U_lA____Ma_b_____________       uptodate,lru,active,mmap,anonymous,swapbacked
0x00000000586c          27        0  __RU_lA____Ma_b_____________       referenced,uptodate,lru,active,mmap,anonymous,swapbacked
         total      513968     2007

# ./page-types # for linus, when CONFIG_DEBUG_KERNEL is turned on
         flags  page-count       MB  symbolic-flags                     long-symbolic-flags
0x000000000000      471731     1842  ____________________________
0x000100000000       19258       75  ____________________r_______       reserved
0x000000008000          15        0  _______________H____________       compound_head
0x000000010000        4270       16  ________________T___________       compound_tail
0x000000000008           3        0  ___U________________________       uptodate
0x000000008014           1        0  __R_D__________H____________       referenced,dirty,compound_head
0x000000010014           4        0  __R_D___________T___________       referenced,dirty,compound_tail
0x000000000020           1        0  _____l______________________       lru
0x000000000028        2626       10  ___U_l______________________       uptodate,lru
0x00000000002c        5244       20  __RU_l______________________       referenced,uptodate,lru
0x000000000068         238        0  ___U_lA_____________________       uptodate,lru,active
0x00000000006c         925        3  __RU_lA_____________________       referenced,uptodate,lru,active
0x000000004078           1        0  ___UDlA_______b_____________       uptodate,dirty,lru,active,swapbacked
0x00000000407c          13        0  __RUDlA_______b_____________       referenced,uptodate,dirty,lru,active,swapbacked
0x000000000228          49        0  ___U_l___I__________________       uptodate,lru,reclaim
0x000000000400         523        2  __________B_________________       buddy
0x000000000804           1        0  __R________M________________       referenced,mmap
0x00000000080c           1        0  __RU_______M________________       referenced,uptodate,mmap
0x000000000828        1142        4  ___U_l_____M________________       uptodate,lru,mmap
0x00000000082c         280        1  __RU_l_____M________________       referenced,uptodate,lru,mmap
0x000000000868         366        1  ___U_lA____M________________       uptodate,lru,active,mmap
0x00000000086c         622        2  __RU_lA____M________________       referenced,uptodate,lru,active,mmap
0x000000004878           2        0  ___UDlA____M__b_____________       uptodate,dirty,lru,active,mmap,swapbacked
0x000000008880         907        3  _______S___M___H____________       slab,mmap,compound_head
0x000000000880        1488        5  _______S___M________________       slab,mmap
0x0000000088c0          59        0  ______AS___M___H____________       active,slab,mmap,compound_head
0x0000000008c0          49        0  ______AS___M________________       active,slab,mmap
0x000000001000         465        1  ____________a_______________       anonymous
0x000000005008           8        0  ___U________a_b_____________       uptodate,anonymous,swapbacked
0x000000005808           4        0  ___U_______Ma_b_____________       uptodate,mmap,anonymous,swapbacked
0x00000000580c           1        0  __RU_______Ma_b_____________       referenced,uptodate,mmap,anonymous,swapbacked
0x000000005868        3645       14  ___U_lA____Ma_b_____________       uptodate,lru,active,mmap,anonymous,swapbacked
0x00000000586c          26        0  __RU_lA____Ma_b_____________       referenced,uptodate,lru,active,mmap,anonymous,swapbacked
         total      513968     2007

Kudos to KOSAKI and Andi for the extensive recommendations!

Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 Documentation/vm/pagemap.txt |   65 ++++++++++
 fs/proc/page.c               |  197 +++++++++++++++++++++++++++------
 2 files changed, 227 insertions(+), 35 deletions(-)

--- mm.orig/fs/proc/page.c
+++ mm/fs/proc/page.c
@@ -6,6 +6,7 @@
 #include <linux/mmzone.h>
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
+#include <linux/backing-dev.h>
 #include <asm/uaccess.h>
 #include "internal.h"
 
@@ -68,19 +69,167 @@ static const struct file_operations proc
 
 /* These macros are used to decouple internal flags from exported ones */
 
-#define KPF_LOCKED     0
-#define KPF_ERROR      1
-#define KPF_REFERENCED 2
-#define KPF_UPTODATE   3
-#define KPF_DIRTY      4
-#define KPF_LRU        5
-#define KPF_ACTIVE     6
-#define KPF_SLAB       7
-#define KPF_WRITEBACK  8
-#define KPF_RECLAIM    9
-#define KPF_BUDDY     10
+#define KPF_LOCKED		0
+#define KPF_ERROR		1
+#define KPF_REFERENCED		2
+#define KPF_UPTODATE		3
+#define KPF_DIRTY		4
+#define KPF_LRU			5
+#define KPF_ACTIVE		6
+#define KPF_SLAB		7
+#define KPF_WRITEBACK		8
+#define KPF_RECLAIM		9
+#define KPF_BUDDY		10
+
+/* new additions in 2.6.31 */
+#define KPF_MMAP		11
+#define KPF_ANON		12
+#define KPF_SWAPCACHE		13
+#define KPF_SWAPBACKED		14
+#define KPF_COMPOUND_HEAD	15
+#define KPF_COMPOUND_TAIL	16
+#define KPF_UNEVICTABLE		17
+#define KPF_POISON		18
+#define KPF_NOPAGE		19
+
+/* kernel hacking assistances */
+#define KPF_RESERVED		32
+#define KPF_MLOCKED		33
+#define KPF_MAPPEDTODISK	34
+#define KPF_PRIVATE		35
+#define KPF_PRIVATE2		36
+#define KPF_OWNER_PRIVATE	37
+#define KPF_ARCH		38
+#define KPF_UNCACHED		39
+
+/*
+ * Kernel flags are exported faithfully to Linus and his fellow hackers.
+ * Otherwise some details are masked to avoid confusing the end user:
+ * - some kernel flags are completely invisible
+ * - some kernel flags are conditionally invisible on their odd usages
+ */
+#ifdef CONFIG_DEBUG_KERNEL
+static inline int genuine_linus(void) { return 1; }
+#else
+static inline int genuine_linus(void) { return 0; }
+#endif
+
+#define kpf_copy_bit(uflags, kflags, visible, ubit, kbit)		\
+	do {								\
+		if (visible || genuine_linus())				\
+			uflags |= ((kflags >> kbit) & 1) << ubit;	\
+	} while (0);
+
+/* a helper function _not_ intended for more general uses */
+static inline int page_cap_writeback_dirty(struct page *page)
+{
+	struct address_space *mapping = NULL;
+
+	if (!PageSlab(page))
+		mapping = page_mapping(page);
+
+	return !mapping || mapping_cap_writeback_dirty(mapping);
+}
 
-#define kpf_copy_bit(flags, dstpos, srcpos) (((flags >> srcpos) & 1) << dstpos)
+static u64 get_uflags(struct page *page)
+{
+	u64 k;
+	u64 u;
+	int io;
+	int lru;
+	int slab;
+
+	/*
+	 * pseudo flag: KPF_NOPAGE
+	 * it differentiates a memory hole from a page with no flags
+	 */
+	if (!page)
+		return 1 << KPF_NOPAGE;
+
+	k = page->flags;
+	u = 0;
+
+	io   = page_cap_writeback_dirty(page);
+	lru  = k & (1 << PG_lru);
+	slab = k & (1 << PG_slab);
+
+	/*
+	 * pseudo flags for the well known (anonymous) memory mapped pages
+	 */
+	if (lru || genuine_linus()) {
+		if (page_mapped(page))
+			u |= 1 << KPF_MMAP;
+		if (PageAnon(page))
+			u |= 1 << KPF_ANON;
+	}
+
+	/*
+	 * compound pages: export both head/tail info
+	 * they together define a compound page's start/end pos and order
+	 */
+	if (PageHead(page))
+		u |= 1 << KPF_COMPOUND_HEAD;
+	if (PageTail(page))
+		u |= 1 << KPF_COMPOUND_TAIL;
+
+	kpf_copy_bit(u, k, 1,	  KPF_LOCKED,		PG_locked);
+
+	kpf_copy_bit(u, k, 1,     KPF_SLAB,		PG_slab);
+	kpf_copy_bit(u, k, 1,     KPF_BUDDY,		PG_buddy);
+
+	kpf_copy_bit(u, k, io,    KPF_ERROR,		PG_error);
+	kpf_copy_bit(u, k, io,    KPF_DIRTY,		PG_dirty);
+	kpf_copy_bit(u, k, io,    KPF_UPTODATE,		PG_uptodate);
+	kpf_copy_bit(u, k, io,    KPF_WRITEBACK,	PG_writeback);
+
+	kpf_copy_bit(u, k, 1,     KPF_LRU,		PG_lru);
+	kpf_copy_bit(u, k, lru,	  KPF_REFERENCED,	PG_referenced);
+	kpf_copy_bit(u, k, lru,   KPF_ACTIVE,		PG_active);
+	kpf_copy_bit(u, k, lru,   KPF_RECLAIM,		PG_reclaim);
+
+	kpf_copy_bit(u, k, lru,   KPF_SWAPCACHE,	PG_swapcache);
+	kpf_copy_bit(u, k, lru,   KPF_SWAPBACKED,	PG_swapbacked);
+
+#ifdef CONFIG_MEMORY_FAILURE
+	kpf_copy_bit(u, k, 1,     KPF_POISON,		PG_poison);
+#endif
+
+#ifdef CONFIG_UNEVICTABLE_LRU
+	kpf_copy_bit(u, k, lru,   KPF_UNEVICTABLE,	PG_unevictable);
+	kpf_copy_bit(u, k, 0,     KPF_MLOCKED,		PG_mlocked);
+#endif
+
+	kpf_copy_bit(u, k, 0,     KPF_RESERVED,		PG_reserved);
+	kpf_copy_bit(u, k, 0,     KPF_MAPPEDTODISK,	PG_mappedtodisk);
+	kpf_copy_bit(u, k, 0,     KPF_PRIVATE,		PG_private);
+	kpf_copy_bit(u, k, 0,     KPF_PRIVATE2,		PG_private_2);
+	kpf_copy_bit(u, k, 0,     KPF_OWNER_PRIVATE,	PG_owner_priv_1);
+	kpf_copy_bit(u, k, 0,     KPF_ARCH,		PG_arch_1);
+
+#ifdef CONFIG_IA64_UNCACHED_ALLOCATOR
+	kpf_copy_bit(u, k, 0,     KPF_UNCACHED,		PG_uncached);
+#endif
+
+	if (!genuine_linus()) {
+		/*
+		 * SLAB/SLOB/SLUB overload some page flags which may confuse end user
+		 */
+		if (slab) {
+			u &= ~ ((1 << KPF_ACTIVE)	|
+				(1 << KPF_ERROR)	|
+				(1 << KPF_MMAP));
+		}
+		/*
+		 * PG_reclaim could be overloaded as PG_readahead,
+		 * and we only want to export the first one.
+		 */
+		if ((u & ((1 << KPF_RECLAIM) | (1 << KPF_WRITEBACK))) ==
+			  (1 << KPF_RECLAIM))
+			u &= ~ (1 << KPF_RECLAIM);
+	}
+
+	return u;
+};
 
 static ssize_t kpageflags_read(struct file *file, char __user *buf,
 			     size_t count, loff_t *ppos)
@@ -90,7 +239,6 @@ static ssize_t kpageflags_read(struct fi
 	unsigned long src = *ppos;
 	unsigned long pfn;
 	ssize_t ret = 0;
-	u64 kflags, uflags;
 
 	pfn = src / KPMSIZE;
 	count = min_t(unsigned long, count, (max_pfn * KPMSIZE) - src);
@@ -98,32 +246,17 @@ static ssize_t kpageflags_read(struct fi
 		return -EINVAL;
 
 	while (count > 0) {
-		ppage = NULL;
 		if (pfn_valid(pfn))
 			ppage = pfn_to_page(pfn);
-		pfn++;
-		if (!ppage)
-			kflags = 0;
 		else
-			kflags = ppage->flags;
-
-		uflags = kpf_copy_bit(kflags, KPF_LOCKED, PG_locked) |
-			kpf_copy_bit(kflags, KPF_ERROR, PG_error) |
-			kpf_copy_bit(kflags, KPF_REFERENCED, PG_referenced) |
-			kpf_copy_bit(kflags, KPF_UPTODATE, PG_uptodate) |
-			kpf_copy_bit(kflags, KPF_DIRTY, PG_dirty) |
-			kpf_copy_bit(kflags, KPF_LRU, PG_lru) |
-			kpf_copy_bit(kflags, KPF_ACTIVE, PG_active) |
-			kpf_copy_bit(kflags, KPF_SLAB, PG_slab) |
-			kpf_copy_bit(kflags, KPF_WRITEBACK, PG_writeback) |
-			kpf_copy_bit(kflags, KPF_RECLAIM, PG_reclaim) |
-			kpf_copy_bit(kflags, KPF_BUDDY, PG_buddy);
+			ppage = NULL;
 
-		if (put_user(uflags, out++)) {
+		if (put_user(get_uflags(ppage), out)) {
 			ret = -EFAULT;
 			break;
 		}
-
+		out++;
+		pfn++;
 		count -= KPMSIZE;
 	}
 
--- mm.orig/Documentation/vm/pagemap.txt
+++ mm/Documentation/vm/pagemap.txt
@@ -12,9 +12,9 @@ There are three components to pagemap:
    value for each virtual page, containing the following data (from
    fs/proc/task_mmu.c, above pagemap_read):
 
-    * Bits 0-55  page frame number (PFN) if present
+    * Bits 0-54  page frame number (PFN) if present
     * Bits 0-4   swap type if swapped
-    * Bits 5-55  swap offset if swapped
+    * Bits 5-54  swap offset if swapped
     * Bits 55-60 page shift (page size = 1<<page shift)
     * Bit  61    reserved for future use
     * Bit  62    page swapped
@@ -36,7 +36,7 @@ There are three components to pagemap:
  * /proc/kpageflags.  This file contains a 64-bit set of flags for each
    page, indexed by PFN.
 
-   The flags are (from fs/proc/proc_misc, above kpageflags_read):
+   The flags are (from fs/proc/page.c, above kpageflags_read):
 
      0. LOCKED
      1. ERROR
@@ -49,6 +49,65 @@ There are three components to pagemap:
      8. WRITEBACK
      9. RECLAIM
     10. BUDDY
+    11. MMAP
+    12. ANON
+    13. SWAPCACHE
+    14. SWAPBACKED
+    15. COMPOUND_HEAD
+    16. COMPOUND_TAIL
+    17. UNEVICTABLE
+    18. POISON
+    19. NOPAGE
+
+Short descriptions to the page flags:
+
+ 0. LOCKED
+    page is being locked for exclusive access, eg. by undergoing read/write IO
+
+ 7. SLAB
+    page is managed by the SLAB/SLOB/SLUB/SLQB kernel memory allocator
+
+10. BUDDY
+    a free memory block managed by the buddy system allocator
+    The buddy system organizes free memory in blocks of various orders.
+    An order N block has 2^N physically contiguous pages, with the BUDDY flag
+    set for and _only_ for the first page.
+
+15. COMPOUND_HEAD
+16. COMPOUND_TAIL
+    A compound page with order N consists of 2^N physically contiguous pages.
+    A compound page with order 2 takes the form of "HTTT", where H donates its
+    head page and T donates its tail page(s).  The major consumers of compound
+    pages are hugeTLB pages (Documentation/vm/hugetlbpage.txt), the SLUB etc.
+    memory allocators and various device drivers.
+
+18. POISON
+    hardware has detected memory corruption on this page
+
+19. NOPAGE
+    no page frame exists at the requested address
+
+    [IO related page flags]
+ 1. ERROR     IO error occurred
+ 3. UPTODATE  page has up-to-date data
+              ie. for file backed page: (in-memory data revision >= on-disk one)
+ 4. DIRTY     page has been written to, hence contains new data
+              ie. for file backed page: (in-memory data revision >  on-disk one)
+ 8. WRITEBACK page is being synced to disk
+
+    [LRU related page flags]
+ 5. LRU         page is in one of the LRU lists
+ 6. ACTIVE      page is in the active LRU list
+17. UNEVICTABLE page is in the unevictable (non-)LRU list
+                It is somehow pinned and not a candidate for LRU page reclaims,
+		eg. ramfs pages, shmctl(SHM_LOCK) and mlock() memory segments
+ 2. REFERENCED  page has been referenced since last LRU list enqueue/requeue
+ 9. RECLAIM     page will be reclaimed soon after its pageout IO completed
+11. MMAP        a memory mapped page
+12. ANON        a memory mapped page who is not a file page
+13. SWAPCACHE   page is mapped to swap space, ie. has an associated swap entry
+14. SWAPBACKED  page is backed by swap/RAM
+
 
 Using pagemap to do something useful:
 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2009-04-23  2:27 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-04-14  4:22 [RFC][PATCH] proc: export more page flags in /proc/kpageflags Wu Fengguang
2009-04-14  4:22 ` Wu Fengguang
2009-04-14  4:36 ` Wu Fengguang
2009-04-14  4:37 ` KOSAKI Motohiro
2009-04-14  4:37   ` KOSAKI Motohiro
2009-04-14  6:41   ` Wu Fengguang
2009-04-14  6:41     ` Wu Fengguang
2009-04-14  6:54     ` KOSAKI Motohiro
2009-04-14  6:54       ` KOSAKI Motohiro
2009-04-14  7:11       ` Andi Kleen
2009-04-14  7:11         ` Andi Kleen
2009-04-14  7:17         ` KOSAKI Motohiro
2009-04-14  7:17           ` KOSAKI Motohiro
2009-04-15 13:18         ` Wu Fengguang
2009-04-15 13:18           ` Wu Fengguang
2009-04-15 13:57           ` Andi Kleen
2009-04-15 13:57             ` Andi Kleen
2009-04-16  2:41             ` Wu Fengguang
2009-04-16  2:41               ` Wu Fengguang
2009-04-16  3:54               ` Andi Kleen
2009-04-16  3:54                 ` Andi Kleen
2009-04-16  4:43                 ` Wu Fengguang
2009-04-16  4:43                   ` Wu Fengguang
2009-04-16  2:26           ` KOSAKI Motohiro
2009-04-16  2:26             ` KOSAKI Motohiro
2009-04-16  3:49             ` Wu Fengguang
2009-04-16  3:49               ` Wu Fengguang
2009-04-16  6:30               ` Wu Fengguang
2009-04-16  6:30                 ` Wu Fengguang
2009-04-23  2:26             ` Wu Fengguang [this message]
2009-04-23  2:26               ` [RFC][PATCH] proc: export more page flags in /proc/kpageflags (take 3) Wu Fengguang
2009-04-23  7:48               ` Andi Kleen
2009-04-23  7:48                 ` Andi Kleen
2009-04-23  8:10                 ` Wu Fengguang
2009-04-23  8:10                   ` Wu Fengguang
2009-04-23  8:54                   ` Andi Kleen
2009-04-23  8:54                     ` Andi Kleen
2009-04-23 11:21                     ` Wu Fengguang
2009-04-23 11:21                       ` Wu Fengguang
2009-04-25  1:59               ` Wu Fengguang
2009-04-14  7:22       ` [RFC][PATCH] proc: export more page flags in /proc/kpageflags Wu Fengguang
2009-04-14  7:22         ` Wu Fengguang
2009-04-14  7:42         ` KOSAKI Motohiro
2009-04-14  7:42           ` KOSAKI Motohiro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090423022625.GA8822@localhost \
    --to=fengguang.wu@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=andi@firstfloor.org \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.