* [rfc 0/3] Cleaning up soft-dirty bit usage
From: Cyrill Gorcunov @ 2014-04-03 18:48 UTC (permalink / raw)
  To: linux-kernel; +Cc: gorcunov, linux-mm

Hi! I've been trying to clean up soft-dirty bit usage. I can't clean up
the "ridiculous macros in pgtable-2level.h" completely, because I would
need to define the _PAGE_FILE, _PAGE_PROTNONE and _PAGE_NUMA bits in a
sequential manner, like

#define _PAGE_BIT_FILE		(_PAGE_BIT_PRESENT + 1)	/* _PAGE_BIT_RW */
#define _PAGE_BIT_NUMA		(_PAGE_BIT_PRESENT + 2)	/* _PAGE_BIT_USER */
#define _PAGE_BIT_PROTNONE	(_PAGE_BIT_PRESENT + 3)	/* _PAGE_BIT_PWT */

which can't be done right now because the NUMA code needs to preserve
the original pte bits, for example in __split_huge_page_map, unless I'm
missing something obvious.

Also, if we ever redefine the bits above, we will need to update the
PAT code, which relies on _PAGE_GLOBAL together with _PAGE_PRESENT to
make pte_present return true or false.
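
For context, pte_present() on x86 at this point is roughly the following
(paraphrased from memory, so treat it as a sketch rather than a verbatim
quote of the tree):

static inline int pte_present(pte_t a)
{
	/*
	 * _PAGE_PROTNONE aliases _PAGE_GLOBAL, so a pte with the present
	 * bit clear but _PAGE_GLOBAL set still reports "present" here.
	 */
	return pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_NUMA);
}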

Another weird thing I found is the following sequence:

   mprotect_fixup
    change_protection (passes @prot_numa = 0 which finally ends up in)
      ...
      change_pte_range(..., prot_numa)

			if (!prot_numa) {
				...
			} else {
				... this seems to be a dead code branch ...
			}

    Is this intentional, i.e. is the @prot_numa argument supposed to be
    passed as prot_numa = 1 one day, or is it a leftover from old times?

Note that I've not yet tested the series; I'm building it now and hope to
finish testing in a couple of hours.

Linus, by saying "define the bits we use when PAGE_PRESENT==0 separately
and explicitly", did you mean a complete rework of the bits, rather than
simply grouping them in one place in a header?

	Cyrill

* [rfc 1/3] mm: pgtable -- Drop unneeded preprocessor ifdef
From: Cyrill Gorcunov @ 2014-04-03 18:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: gorcunov, linux-mm, Linus Torvalds, Mel Gorman, Peter Anvin,
	Ingo Molnar, Steven Noonan, Rik van Riel, David Vrabel,
	Andrew Morton, Peter Zijlstra, Pavel Emelyanov

[-- Attachment #1: pgbits-drop-if --]
[-- Type: text/plain, Size: 2738 bytes --]

_PAGE_BIT_FILE (bit 6) is always less than
_PAGE_BIT_PROTNONE (bit 9) so drop redundant #ifdef.
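
As an aside, the ordering assumption could be documented with a build-time
check along these lines (illustrative sketch only, not part of the patch;
the ASSUMED_* constants simply restate the bit numbers quoted above):

/* Standalone C11 sketch of the invariant this simplification relies on. */
#define ASSUMED_PAGE_BIT_FILE		6
#define ASSUMED_PAGE_BIT_PROTNONE	9

_Static_assert(ASSUMED_PAGE_BIT_FILE < ASSUMED_PAGE_BIT_PROTNONE,
	       "swap/file pte encoding assumes _PAGE_BIT_FILE below _PAGE_BIT_PROTNONE");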

CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Mel Gorman <mgorman@suse.de>
CC: Peter Anvin <hpa@zytor.com>
CC: Ingo Molnar <mingo@kernel.org>
CC: Steven Noonan <steven@uplinklabs.net>
CC: Rik van Riel <riel@redhat.com>
CC: David Vrabel <david.vrabel@citrix.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
---
 arch/x86/include/asm/pgtable-2level.h |   10 ----------
 arch/x86/include/asm/pgtable_64.h     |    5 -----
 2 files changed, 15 deletions(-)

Index: linux-2.6.git/arch/x86/include/asm/pgtable-2level.h
===================================================================
--- linux-2.6.git.orig/arch/x86/include/asm/pgtable-2level.h
+++ linux-2.6.git/arch/x86/include/asm/pgtable-2level.h
@@ -115,13 +115,8 @@ static __always_inline pte_t pgoff_to_pt
  */
 #define PTE_FILE_MAX_BITS	29
 #define PTE_FILE_SHIFT1		(_PAGE_BIT_PRESENT + 1)
-#if _PAGE_BIT_FILE < _PAGE_BIT_PROTNONE
 #define PTE_FILE_SHIFT2		(_PAGE_BIT_FILE + 1)
 #define PTE_FILE_SHIFT3		(_PAGE_BIT_PROTNONE + 1)
-#else
-#define PTE_FILE_SHIFT2		(_PAGE_BIT_PROTNONE + 1)
-#define PTE_FILE_SHIFT3		(_PAGE_BIT_FILE + 1)
-#endif
 #define PTE_FILE_BITS1		(PTE_FILE_SHIFT2 - PTE_FILE_SHIFT1 - 1)
 #define PTE_FILE_BITS2		(PTE_FILE_SHIFT3 - PTE_FILE_SHIFT2 - 1)
 
@@ -153,13 +148,8 @@ static __always_inline pte_t pgoff_to_pt
 #endif /* CONFIG_MEM_SOFT_DIRTY */
 
 /* Encode and de-code a swap entry */
-#if _PAGE_BIT_FILE < _PAGE_BIT_PROTNONE
 #define SWP_TYPE_BITS (_PAGE_BIT_FILE - _PAGE_BIT_PRESENT - 1)
 #define SWP_OFFSET_SHIFT (_PAGE_BIT_PROTNONE + 1)
-#else
-#define SWP_TYPE_BITS (_PAGE_BIT_PROTNONE - _PAGE_BIT_PRESENT - 1)
-#define SWP_OFFSET_SHIFT (_PAGE_BIT_FILE + 1)
-#endif
 
 #define MAX_SWAPFILES_CHECK() BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > SWP_TYPE_BITS)
 
Index: linux-2.6.git/arch/x86/include/asm/pgtable_64.h
===================================================================
--- linux-2.6.git.orig/arch/x86/include/asm/pgtable_64.h
+++ linux-2.6.git/arch/x86/include/asm/pgtable_64.h
@@ -143,13 +143,8 @@ static inline int pgd_large(pgd_t pgd) {
 #define pte_unmap(pte) ((void)(pte))/* NOP */
 
 /* Encode and de-code a swap entry */
-#if _PAGE_BIT_FILE < _PAGE_BIT_PROTNONE
 #define SWP_TYPE_BITS (_PAGE_BIT_FILE - _PAGE_BIT_PRESENT - 1)
 #define SWP_OFFSET_SHIFT (_PAGE_BIT_PROTNONE + 1)
-#else
-#define SWP_TYPE_BITS (_PAGE_BIT_PROTNONE - _PAGE_BIT_PRESENT - 1)
-#define SWP_OFFSET_SHIFT (_PAGE_BIT_FILE + 1)
-#endif
 
 #define MAX_SWAPFILES_CHECK() BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > SWP_TYPE_BITS)
 


* [rfc 2/3] mm: pgtable -- Require X86_64 for soft-dirty tracker
From: Cyrill Gorcunov @ 2014-04-03 18:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: gorcunov, linux-mm, Linus Torvalds, Mel Gorman, Peter Anvin,
	Ingo Molnar, Steven Noonan, Rik van Riel, David Vrabel,
	Andrew Morton, Peter Zijlstra, Pavel Emelyanov

[-- Attachment #1: pgbits-drop-softdirty-non-x86-64 --]
[-- Type: text/plain, Size: 4354 bytes --]

Tracking the soft-dirty status on 2-level page tables requires very
ugly macros, and considering how old the machines that can only run
without PAE mode are, let's drop the soft-dirty tracker from them for
the sake of code simplicity (note that I can't drop all of the macros
from the 2-level code yet, since _PAGE_BIT_PROTNONE and _PAGE_BIT_FILE
are still used even without the tracker).

Linus proposed to rip out soft-dirty support on x86-32 completely (even
with PAE), and since we are not planning to support native x86-32 mode
in CRIU, let's do that.

(The soft-dirty tracker is a relatively new feature used mostly by
 CRIU, so I don't expect such an API change to cause problems for
 userspace.)
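
For reference, the user-visible API being restricted here is the one CRIU
consumes. Below is a minimal userspace sketch of it; it assumes a kernel
built with CONFIG_MEM_SOFT_DIRTY, that bit 55 of a /proc/<pid>/pagemap
entry reports soft-dirty, and that writing "4" to /proc/<pid>/clear_refs
re-arms the tracker (error handling is intentionally minimal):

#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define PAGEMAP_SOFT_DIRTY	(1ULL << 55)	/* soft-dirty bit of a pagemap entry */

/* Return 1 if the page at @vaddr in the current task is soft-dirty,
 * 0 if it is clean, -1 on error. */
static int page_soft_dirty(unsigned long vaddr)
{
	uint64_t entry = 0;
	long psize = sysconf(_SC_PAGESIZE);
	FILE *f = fopen("/proc/self/pagemap", "rb");
	int ret = -1;

	if (!f)
		return -1;
	/* pagemap holds one 8-byte entry per virtual page */
	if (psize > 0 &&
	    !fseek(f, (long)(vaddr / psize * sizeof(entry)), SEEK_SET) &&
	    fread(&entry, sizeof(entry), 1, f) == 1)
		ret = !!(entry & PAGEMAP_SOFT_DIRTY);
	fclose(f);
	return ret;
}

int main(void)
{
	static char page[4096];
	FILE *cr = fopen("/proc/self/clear_refs", "w");

	if (cr) {
		fputs("4", cr);		/* clear soft-dirty bits, re-arm tracking */
		fclose(cr);
	}
	page[0] = 1;			/* dirty one page */
	printf("soft-dirty: %d\n", page_soft_dirty((unsigned long)page));
	return 0;
}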

CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Mel Gorman <mgorman@suse.de>
CC: Peter Anvin <hpa@zytor.com>
CC: Ingo Molnar <mingo@kernel.org>
CC: Steven Noonan <steven@uplinklabs.net>
CC: Rik van Riel <riel@redhat.com>
CC: David Vrabel <david.vrabel@citrix.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
---
 arch/x86/Kconfig                      |    2 -
 arch/x86/include/asm/pgtable-2level.h |   49 ----------------------------------
 2 files changed, 1 insertion(+), 50 deletions(-)

Index: linux-2.6.git/arch/x86/Kconfig
===================================================================
--- linux-2.6.git.orig/arch/x86/Kconfig
+++ linux-2.6.git/arch/x86/Kconfig
@@ -104,7 +104,7 @@ config X86
 	select HAVE_ARCH_SECCOMP_FILTER
 	select BUILDTIME_EXTABLE_SORT
 	select GENERIC_CMOS_UPDATE
-	select HAVE_ARCH_SOFT_DIRTY
+	select HAVE_ARCH_SOFT_DIRTY if X86_64
 	select CLOCKSOURCE_WATCHDOG
 	select GENERIC_CLOCKEVENTS
 	select ARCH_CLOCKSOURCE_DATA if X86_64
Index: linux-2.6.git/arch/x86/include/asm/pgtable-2level.h
===================================================================
--- linux-2.6.git.orig/arch/x86/include/asm/pgtable-2level.h
+++ linux-2.6.git/arch/x86/include/asm/pgtable-2level.h
@@ -62,53 +62,6 @@ static inline unsigned long pte_bitop(un
 	return ((value >> rightshift) & mask) << leftshift;
 }
 
-#ifdef CONFIG_MEM_SOFT_DIRTY
-
-/*
- * Bits _PAGE_BIT_PRESENT, _PAGE_BIT_FILE, _PAGE_BIT_SOFT_DIRTY and
- * _PAGE_BIT_PROTNONE are taken, split up the 28 bits of offset
- * into this range.
- */
-#define PTE_FILE_MAX_BITS	28
-#define PTE_FILE_SHIFT1		(_PAGE_BIT_PRESENT + 1)
-#define PTE_FILE_SHIFT2		(_PAGE_BIT_FILE + 1)
-#define PTE_FILE_SHIFT3		(_PAGE_BIT_PROTNONE + 1)
-#define PTE_FILE_SHIFT4		(_PAGE_BIT_SOFT_DIRTY + 1)
-#define PTE_FILE_BITS1		(PTE_FILE_SHIFT2 - PTE_FILE_SHIFT1 - 1)
-#define PTE_FILE_BITS2		(PTE_FILE_SHIFT3 - PTE_FILE_SHIFT2 - 1)
-#define PTE_FILE_BITS3		(PTE_FILE_SHIFT4 - PTE_FILE_SHIFT3 - 1)
-
-#define PTE_FILE_MASK1		((1U << PTE_FILE_BITS1) - 1)
-#define PTE_FILE_MASK2		((1U << PTE_FILE_BITS2) - 1)
-#define PTE_FILE_MASK3		((1U << PTE_FILE_BITS3) - 1)
-
-#define PTE_FILE_LSHIFT2	(PTE_FILE_BITS1)
-#define PTE_FILE_LSHIFT3	(PTE_FILE_BITS1 + PTE_FILE_BITS2)
-#define PTE_FILE_LSHIFT4	(PTE_FILE_BITS1 + PTE_FILE_BITS2 + PTE_FILE_BITS3)
-
-static __always_inline pgoff_t pte_to_pgoff(pte_t pte)
-{
-	return (pgoff_t)
-		(pte_bitop(pte.pte_low, PTE_FILE_SHIFT1, PTE_FILE_MASK1,  0)		    +
-		 pte_bitop(pte.pte_low, PTE_FILE_SHIFT2, PTE_FILE_MASK2,  PTE_FILE_LSHIFT2) +
-		 pte_bitop(pte.pte_low, PTE_FILE_SHIFT3, PTE_FILE_MASK3,  PTE_FILE_LSHIFT3) +
-		 pte_bitop(pte.pte_low, PTE_FILE_SHIFT4,           -1UL,  PTE_FILE_LSHIFT4));
-}
-
-static __always_inline pte_t pgoff_to_pte(pgoff_t off)
-{
-	return (pte_t){
-		.pte_low =
-			pte_bitop(off,                0, PTE_FILE_MASK1,  PTE_FILE_SHIFT1) +
-			pte_bitop(off, PTE_FILE_LSHIFT2, PTE_FILE_MASK2,  PTE_FILE_SHIFT2) +
-			pte_bitop(off, PTE_FILE_LSHIFT3, PTE_FILE_MASK3,  PTE_FILE_SHIFT3) +
-			pte_bitop(off, PTE_FILE_LSHIFT4,           -1UL,  PTE_FILE_SHIFT4) +
-			_PAGE_FILE,
-	};
-}
-
-#else /* CONFIG_MEM_SOFT_DIRTY */
-
 /*
  * Bits _PAGE_BIT_PRESENT, _PAGE_BIT_FILE and _PAGE_BIT_PROTNONE are taken,
  * split up the 29 bits of offset into this range.
@@ -145,8 +98,6 @@ static __always_inline pte_t pgoff_to_pt
 	};
 }
 
-#endif /* CONFIG_MEM_SOFT_DIRTY */
-
 /* Encode and de-code a swap entry */
 #define SWP_TYPE_BITS (_PAGE_BIT_FILE - _PAGE_BIT_PRESENT - 1)
 #define SWP_OFFSET_SHIFT (_PAGE_BIT_PROTNONE + 1)


* [rfc 3/3] mm: pgtable -- Use _PAGE_SOFT_DIRTY for swap entries
From: Cyrill Gorcunov @ 2014-04-03 18:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: gorcunov, linux-mm, Linus Torvalds, Mel Gorman, Peter Anvin,
	Ingo Molnar, Steven Noonan, Rik van Riel, David Vrabel,
	Andrew Morton, Peter Zijlstra, Pavel Emelyanov

[-- Attachment #1: pgbits-drop-pse-for-dirty-swap --]
[-- Type: text/plain, Size: 3258 bytes --]

Since soft-dirty is now supported on x86-64 only, we can release the
_PAGE_PSE bit that was used to track soft-dirty swap entries and reuse
the already existing _PAGE_SOFT_DIRTY bit instead.

Thus the same pte bit is used for all soft-dirty needs.
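
To illustrate what SWP_TYPE_BITS and SWP_OFFSET_SHIFT mean for the
encoding, here is a standalone sketch of the packing arithmetic (these
are not the kernel's actual macros; the shift and width values are
assumptions picked for the illustration, e.g. an offset shift of 12
assuming _PAGE_BIT_SOFT_DIRTY is bit 11):

#include <stdint.h>
#include <stdio.h>

#define SWP_TYPE_SHIFT		1	/* _PAGE_BIT_PRESENT + 1, assumed  */
#define SWP_TYPE_BITS		5	/* assumed type-field width        */
#define SWP_OFFSET_SHIFT	12	/* e.g. _PAGE_BIT_SOFT_DIRTY + 1   */

static uint64_t swp_entry(unsigned int type, uint64_t offset)
{
	return ((uint64_t)type << SWP_TYPE_SHIFT) | (offset << SWP_OFFSET_SHIFT);
}

static unsigned int swp_type(uint64_t entry)
{
	return (entry >> SWP_TYPE_SHIFT) & ((1u << SWP_TYPE_BITS) - 1);
}

static uint64_t swp_offset(uint64_t entry)
{
	return entry >> SWP_OFFSET_SHIFT;
}

int main(void)
{
	uint64_t e = swp_entry(3, 0x12345);

	/* A higher offset shift leaves fewer usable offset bits in the
	 * pte, which is where limits like the one in the comment added
	 * by this patch come from. */
	printf("type=%u offset=%#jx\n", swp_type(e), (uintmax_t)swp_offset(e));
	return 0;
}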

CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Mel Gorman <mgorman@suse.de>
CC: Peter Anvin <hpa@zytor.com>
CC: Ingo Molnar <mingo@kernel.org>
CC: Steven Noonan <steven@uplinklabs.net>
CC: Rik van Riel <riel@redhat.com>
CC: David Vrabel <david.vrabel@citrix.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
---
 arch/x86/include/asm/pgtable_64.h    |   12 ++++++++++--
 arch/x86/include/asm/pgtable_types.h |   19 ++++---------------
 2 files changed, 14 insertions(+), 17 deletions(-)

Index: linux-2.6.git/arch/x86/include/asm/pgtable_64.h
===================================================================
--- linux-2.6.git.orig/arch/x86/include/asm/pgtable_64.h
+++ linux-2.6.git/arch/x86/include/asm/pgtable_64.h
@@ -142,9 +142,17 @@ static inline int pgd_large(pgd_t pgd) {
 #define pte_offset_map(dir, address) pte_offset_kernel((dir), (address))
 #define pte_unmap(pte) ((void)(pte))/* NOP */
 
-/* Encode and de-code a swap entry */
+/*
+ * Encode and de-code a swap entry. When the soft-dirty memory tracker is
+ * enabled we need to borrow the _PAGE_BIT_SOFT_DIRTY bit for our own needs,
+ * which limits the max size of a swap partition to about 1TB.
+ */
 #define SWP_TYPE_BITS (_PAGE_BIT_FILE - _PAGE_BIT_PRESENT - 1)
-#define SWP_OFFSET_SHIFT (_PAGE_BIT_PROTNONE + 1)
+#ifdef CONFIG_MEM_SOFT_DIRTY
+# define SWP_OFFSET_SHIFT (_PAGE_BIT_SOFT_DIRTY + 1)
+#else
+# define SWP_OFFSET_SHIFT (_PAGE_BIT_PROTNONE + 1)
+#endif
 
 #define MAX_SWAPFILES_CHECK() BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > SWP_TYPE_BITS)
 
Index: linux-2.6.git/arch/x86/include/asm/pgtable_types.h
===================================================================
--- linux-2.6.git.orig/arch/x86/include/asm/pgtable_types.h
+++ linux-2.6.git/arch/x86/include/asm/pgtable_types.h
@@ -59,29 +59,18 @@
  * The same hidden bit is used by kmemcheck, but since kmemcheck
  * works on kernel pages while soft-dirty engine on user space,
  * they do not conflict with each other.
+ *
+ * Because soft-dirty is limited to x86-64 only we can reuse this
+ * bit to track swap entries as well.
  */
 
 #define _PAGE_BIT_SOFT_DIRTY	_PAGE_BIT_HIDDEN
 
 #ifdef CONFIG_MEM_SOFT_DIRTY
 #define _PAGE_SOFT_DIRTY	(_AT(pteval_t, 1) << _PAGE_BIT_SOFT_DIRTY)
+#define _PAGE_SWP_SOFT_DIRTY	_PAGE_SOFT_DIRTY
 #else
 #define _PAGE_SOFT_DIRTY	(_AT(pteval_t, 0))
-#endif
-
-/*
- * Tracking soft dirty bit when a page goes to a swap is tricky.
- * We need a bit which can be stored in pte _and_ not conflict
- * with swap entry format. On x86 bits 6 and 7 are *not* involved
- * into swap entry computation, but bit 6 is used for nonlinear
- * file mapping, so we borrow bit 7 for soft dirty tracking.
- *
- * Please note that this bit must be treated as swap dirty page
- * mark if and only if the PTE has present bit clear!
- */
-#ifdef CONFIG_MEM_SOFT_DIRTY
-#define _PAGE_SWP_SOFT_DIRTY	_PAGE_PSE
-#else
 #define _PAGE_SWP_SOFT_DIRTY	(_AT(pteval_t, 0))
 #endif
 


* Re: [rfc 0/3] Cleaning up soft-dirty bit usage
From: Kirill A. Shutemov @ 2014-04-07 13:07 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: linux-kernel, linux-mm

On Thu, Apr 03, 2014 at 10:48:44PM +0400, Cyrill Gorcunov wrote:
> Hi! I've been trying to clean up soft-dirty bit usage. I can't cleanup
> "ridiculous macros in pgtable-2level.h" completely because I need to
> define _PAGE_FILE,_PAGE_PROTNONE,_PAGE_NUMA bits in sequence manner
> like
> 
> #define _PAGE_BIT_FILE		(_PAGE_BIT_PRESENT + 1)	/* _PAGE_BIT_RW */
> #define _PAGE_BIT_NUMA		(_PAGE_BIT_PRESENT + 2)	/* _PAGE_BIT_USER */
> #define _PAGE_BIT_PROTNONE	(_PAGE_BIT_PRESENT + 3)	/* _PAGE_BIT_PWT */
> 
> which can't be done right now because numa code needs to save original
> pte bits for example in __split_huge_page_map, if I'm not missing something
> obvious.

Sorry, I didn't get this. How does __split_huge_page_map() depend on the
pte bit order?

> 
> Also if we ever redefine the bits above we will need to update PAT code
> which uses _PAGE_GLOBAL + _PAGE_PRESENT to make pte_present return true
> or false.
> 
> Another weird thing I found is the following sequence:
> 
>    mprotect_fixup
>     change_protection (passes @prot_numa = 0 which finally ends up in)
>       ...
>       change_pte_range(..., prot_numa)
> 
> 			if (!prot_numa) {
> 				...
> 			} else {
> 				... this seems to be dead code branch ...
> 			}
> 
>     is it intentional, and @prot_numa argument is supposed to be passed
>     with prot_numa = 1 one day, or it's leftover from old times?

I see one more user of change_protection() -- change_prot_numa(), which
has .prot_numa == 1.
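
(For reference, change_prot_numa() in mm/mempolicy.c is roughly the
following; paraphrased from memory, so the exact code in the tree may
differ slightly:)

unsigned long change_prot_numa(struct vm_area_struct *vma,
			unsigned long addr, unsigned long end)
{
	int nr_updated;

	/* the last argument is prot_numa == 1 */
	nr_updated = change_protection(vma, addr, end, vma->vm_page_prot, 0, 1);
	if (nr_updated)
		count_vm_numa_events(NUMA_PTE_UPDATES, nr_updated);

	return nr_updated;
}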

-- 
 Kirill A. Shutemov


* Re: [rfc 0/3] Cleaning up soft-dirty bit usage
From: Cyrill Gorcunov @ 2014-04-07 13:24 UTC (permalink / raw)
  To: Kirill A. Shutemov; +Cc: linux-kernel, linux-mm

On Mon, Apr 07, 2014 at 04:07:01PM +0300, Kirill A. Shutemov wrote:
> On Thu, Apr 03, 2014 at 10:48:44PM +0400, Cyrill Gorcunov wrote:
> > Hi! I've been trying to clean up soft-dirty bit usage. I can't cleanup
> > "ridiculous macros in pgtable-2level.h" completely because I need to
> > define _PAGE_FILE,_PAGE_PROTNONE,_PAGE_NUMA bits in sequence manner
> > like
> > 
> > #define _PAGE_BIT_FILE		(_PAGE_BIT_PRESENT + 1)	/* _PAGE_BIT_RW */
> > #define _PAGE_BIT_NUMA		(_PAGE_BIT_PRESENT + 2)	/* _PAGE_BIT_USER */
> > #define _PAGE_BIT_PROTNONE	(_PAGE_BIT_PRESENT + 3)	/* _PAGE_BIT_PWT */
> > 
> > which can't be done right now because numa code needs to save original
> > pte bits for example in __split_huge_page_map, if I'm not missing something
> > obvious.
> 
> Sorry, I didn't get this. How __split_huge_page_map() does depend on pte
> bits order?

__split_huge_page_map
  ...
  for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
    ...
    here we modify the pte bits
    entry = pte_mknuma(entry); --> clears _PAGE_PRESENT and sets _PAGE_NUMA

    the other pte bits must remain valid and meaningful; for example we
    might have set _PAGE_RW here
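
For reference, pte_mknuma() at this point is roughly the following
(paraphrased from memory, details in the tree may differ):

static inline pte_t pte_mknuma(pte_t pte)
{
	pteval_t val = pte_val(pte);

	VM_BUG_ON(!(val & _PAGE_PRESENT));

	/* drop the present bit but keep every other pte bit intact */
	val &= ~_PAGE_PRESENT;
	val |= _PAGE_NUMA;

	return __pte(val);
}

so the remaining pte bits (protection bits, soft-dirty and so on) have
to survive that transformation unchanged.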

> >     is it intentional, and @prot_numa argument is supposed to be passed
> >     with prot_numa = 1 one day, or it's leftover from old times?
> 
> I see one more user of change_protection() -- change_prot_numa(), which
> has .prot_numa == 1.

Yeah, thanks, managed to miss this.
