All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] x86: Allow 1GB pages to be SPECIAL similar to 2MB
@ 2014-09-24 17:19 James Custer
  2014-09-24 22:04 ` David Rientjes
  0 siblings, 1 reply; 6+ messages in thread
From: James Custer @ 2014-09-24 17:19 UTC (permalink / raw)
  To: x86, linux-kernel, tglx, mingo, hpa; +Cc: James Custer


Signed-off-by: James Custer <jcuster@sgi.com>
---
 arch/x86/mm/gup.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/x86/mm/gup.c b/arch/x86/mm/gup.c
index dd74e46..12ca9cf 100644
--- a/arch/x86/mm/gup.c
+++ b/arch/x86/mm/gup.c
@@ -121,7 +121,6 @@ static noinline int gup_huge_pmd(pmd_t pmd, unsigned long addr,
 		mask |= _PAGE_RW;
 	if ((pte_flags(pte) & mask) != mask)
 		return 0;
-	/* hugepages are never "special" */
 	VM_BUG_ON(pte_flags(pte) & _PAGE_SPECIAL);
 	VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
 
@@ -191,7 +190,6 @@ static noinline int gup_huge_pud(pud_t pud, unsigned long addr,
 		mask |= _PAGE_RW;
 	if ((pte_flags(pte) & mask) != mask)
 		return 0;
-	/* hugepages are never "special" */
 	VM_BUG_ON(pte_flags(pte) & _PAGE_SPECIAL);
 	VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
 
@@ -223,7 +221,7 @@ static int gup_pud_range(pgd_t pgd, unsigned long addr, unsigned long end,
 		pud_t pud = *pudp;
 
 		next = pud_addr_end(addr, end);
-		if (pud_none(pud))
+		if (pud_none(pud) || (pud_val(pud) & _PAGE_SPECIAL))
 			return 0;
 		if (unlikely(pud_large(pud))) {
 			if (!gup_huge_pud(pud, addr, next, write, pages, nr))
-- 
1.8.2.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86: Allow 1GB pages to be SPECIAL similar to 2MB
  2014-09-24 17:19 [PATCH] x86: Allow 1GB pages to be SPECIAL similar to 2MB James Custer
@ 2014-09-24 22:04 ` David Rientjes
  2014-09-25 10:39   ` Thomas Gleixner
  0 siblings, 1 reply; 6+ messages in thread
From: David Rientjes @ 2014-09-24 22:04 UTC (permalink / raw)
  To: James Custer; +Cc: x86, linux-kernel, tglx, mingo, hpa

On Wed, 24 Sep 2014, James Custer wrote:

> 
> Signed-off-by: James Custer <jcuster@sgi.com>

With no changelog to describe the motivation for this change or to specify 
why it's required, I doubt you're going to get your patches merged by the 
x86 maintainers.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86: Allow 1GB pages to be SPECIAL similar to 2MB
  2014-09-24 22:04 ` David Rientjes
@ 2014-09-25 10:39   ` Thomas Gleixner
  0 siblings, 0 replies; 6+ messages in thread
From: Thomas Gleixner @ 2014-09-25 10:39 UTC (permalink / raw)
  To: David Rientjes; +Cc: James Custer, x86, linux-kernel, mingo, hpa

On Wed, 24 Sep 2014, David Rientjes wrote:

> On Wed, 24 Sep 2014, James Custer wrote:
> 
> > 
> > Signed-off-by: James Custer <jcuster@sgi.com>
> 
> With no changelog to describe the motivation for this change or to specify 
> why it's required, I doubt you're going to get your patches merged by the 
> x86 maintainers.

Indeed. Was about to rant about this ...

	tglx
 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86: Allow 1GB pages to be SPECIAL similar to 2MB
  2014-11-25 21:23 ` Andrew Morton
@ 2014-11-26  6:17   ` Kirill A. Shutemov
  0 siblings, 0 replies; 6+ messages in thread
From: Kirill A. Shutemov @ 2014-11-26  6:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: James Custer, x86, linux-kernel, tglx, mingo, hpa, David Rientjes

On Tue, Nov 25, 2014 at 01:23:04PM -0800, Andrew Morton wrote:
> On Thu, 25 Sep 2014 09:40:24 -0500 James Custer <jcuster@sgi.com> wrote:
> 
> > Superpages allocated by SGI's superpages module can be backed by 1GB pages,
> > but direct i/o cannot be used. The superpages module uses _PAGE_BIT_SPECIAL
> > to disable direct i/o because some code depends on the memory being backed
> > by page structures. But, because superpages have no backing page structures
> > this causes a panic.
> > 
> > This is the way direct i/o on 1GB pages fails:
> > 
> >                BUG: unable to handle kernel paging request at ffffea0038000000
> > [60463.203795] IP: [<ffffffff8103c93a>] gup_huge_pud+0x9a/0xe0
> > [60463.210058] PGD 83ffd3067 PUD 83ffd2067 PMD 0
> > [60463.215052] Oops: 0000 [#1] SMP
> > 
> > Stack traceback for pid 77136
> > 0xffff8867a88ae300    77136    74825  1   56   R  0xffff8867a88ae970 *readdirectsp
> >  [<ffffffff8103c93a>] gup_huge_pud+0x9a/0xe0
> >  [<ffffffff8103cc33>] gup_pud_range+0x173/0x1b0
> >  [<ffffffff8103cd57>] get_user_pages_fast+0xe7/0x1b0
> >  [<ffffffff8118eac3>] dio_get_page+0x83/0x150
> >  [<ffffffff8118f641>] do_direct_IO+0x81/0x420
> >  [<ffffffff8118fb89>] direct_io_worker+0x1a9/0x340
> >  [<ffffffffa00c5de8>] ext3_direct_IO+0xe8/0x2c0 [ext3]
> >  [<ffffffff810fa527>] generic_file_aio_read+0x237/0x260
> >  [<ffffffff81159878>] do_sync_read+0xc8/0x110
> >  [<ffffffff8115a027>] vfs_read+0xc7/0x130
> >  [<ffffffff8115a193>] sys_read+0x53/0xa0
> >  [<ffffffff81466192>] system_call_fastpath+0x16/0x1b
> > 
> > gup_huge_pud() is trying to find the page structure, and with superpages there
> > is none.
> > 
> > With direct i/o on 2MB pages:
> > 
> > static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
> >                 int write, struct page **pages, int *nr)
> > {
> >  ...
> >                 if (pmd_none(pmd) || pmd_trans_splitting(pmd))
> >                         return 0;
> > 
> > and pmd_trans_splitting() is testing _PAGE_SPLITTING, which is an alias
> > for _PAGE_SPECIAL which we set on the 2MB or 1GB pages mapped in by superpages.
> > 
> > But gup_pud_range() has no such check:
> > 
> > static int gup_pud_range(pgd_t pgd, unsigned long addr, unsigned long end,
> >                         int write, struct page **pages, int *nr)
> > {
> >   ...
> >                 if (pud_none(pud))
> >                         return 0;
> > 
> > Therefore direct i/o on 1GB pages attempts to get a page structure and panics.
> > 
> > ...
> >
> > @@ -223,7 +221,7 @@ static int gup_pud_range(pgd_t pgd, unsigned long addr, unsigned long end,
> >  		pud_t pud = *pudp;
> >  
> >  		next = pud_addr_end(addr, end);
> > -		if (pud_none(pud))
> > +		if (pud_none(pud) || (pud_val(pud) & _PAGE_SPECIAL))
> >  			return 0;
> >  		if (unlikely(pud_large(pud))) {
> >  			if (!gup_huge_pud(pud, addr, next, write, pages, nr))
> 
> If I'm understanding it correctly, this patch is only needed by SGI's
> superpages module, yes?
> 
> That being said, it looks like a reasonable precaution and we could
> easily carry it.

Previously we used PSE + SOFTW1 in pmd_t for pmd_trans_splitting().
I don't think it's good idea to reserve a bit in page table entries for
use-case kernel by itself doesn't support. Especially, that it's a bit in
present entry.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86: Allow 1GB pages to be SPECIAL similar to 2MB
  2014-09-25 14:40 James Custer
@ 2014-11-25 21:23 ` Andrew Morton
  2014-11-26  6:17   ` Kirill A. Shutemov
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2014-11-25 21:23 UTC (permalink / raw)
  To: James Custer; +Cc: x86, linux-kernel, tglx, mingo, hpa, David Rientjes

On Thu, 25 Sep 2014 09:40:24 -0500 James Custer <jcuster@sgi.com> wrote:

> Superpages allocated by SGI's superpages module can be backed by 1GB pages,
> but direct i/o cannot be used. The superpages module uses _PAGE_BIT_SPECIAL
> to disable direct i/o because some code depends on the memory being backed
> by page structures. But, because superpages have no backing page structures
> this causes a panic.
> 
> This is the way direct i/o on 1GB pages fails:
> 
>                BUG: unable to handle kernel paging request at ffffea0038000000
> [60463.203795] IP: [<ffffffff8103c93a>] gup_huge_pud+0x9a/0xe0
> [60463.210058] PGD 83ffd3067 PUD 83ffd2067 PMD 0
> [60463.215052] Oops: 0000 [#1] SMP
> 
> Stack traceback for pid 77136
> 0xffff8867a88ae300    77136    74825  1   56   R  0xffff8867a88ae970 *readdirectsp
>  [<ffffffff8103c93a>] gup_huge_pud+0x9a/0xe0
>  [<ffffffff8103cc33>] gup_pud_range+0x173/0x1b0
>  [<ffffffff8103cd57>] get_user_pages_fast+0xe7/0x1b0
>  [<ffffffff8118eac3>] dio_get_page+0x83/0x150
>  [<ffffffff8118f641>] do_direct_IO+0x81/0x420
>  [<ffffffff8118fb89>] direct_io_worker+0x1a9/0x340
>  [<ffffffffa00c5de8>] ext3_direct_IO+0xe8/0x2c0 [ext3]
>  [<ffffffff810fa527>] generic_file_aio_read+0x237/0x260
>  [<ffffffff81159878>] do_sync_read+0xc8/0x110
>  [<ffffffff8115a027>] vfs_read+0xc7/0x130
>  [<ffffffff8115a193>] sys_read+0x53/0xa0
>  [<ffffffff81466192>] system_call_fastpath+0x16/0x1b
> 
> gup_huge_pud() is trying to find the page structure, and with superpages there
> is none.
> 
> With direct i/o on 2MB pages:
> 
> static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
>                 int write, struct page **pages, int *nr)
> {
>  ...
>                 if (pmd_none(pmd) || pmd_trans_splitting(pmd))
>                         return 0;
> 
> and pmd_trans_splitting() is testing _PAGE_SPLITTING, which is an alias
> for _PAGE_SPECIAL which we set on the 2MB or 1GB pages mapped in by superpages.
> 
> But gup_pud_range() has no such check:
> 
> static int gup_pud_range(pgd_t pgd, unsigned long addr, unsigned long end,
>                         int write, struct page **pages, int *nr)
> {
>   ...
>                 if (pud_none(pud))
>                         return 0;
> 
> Therefore direct i/o on 1GB pages attempts to get a page structure and panics.
> 
> ...
>
> @@ -223,7 +221,7 @@ static int gup_pud_range(pgd_t pgd, unsigned long addr, unsigned long end,
>  		pud_t pud = *pudp;
>  
>  		next = pud_addr_end(addr, end);
> -		if (pud_none(pud))
> +		if (pud_none(pud) || (pud_val(pud) & _PAGE_SPECIAL))
>  			return 0;
>  		if (unlikely(pud_large(pud))) {
>  			if (!gup_huge_pud(pud, addr, next, write, pages, nr))

If I'm understanding it correctly, this patch is only needed by SGI's
superpages module, yes?

That being said, it looks like a reasonable precaution and we could
easily carry it.

But please, this check needs a very good code comment explaining to
readers why it is here and what it is doing.  Probably that comment
should also mention that the mainline kernel doesn't strictly need the
check.

Also, open-coding the flag test looks rather inconsistent.  Should we
have some (documented, please) helper macro to perform this test?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH] x86: Allow 1GB pages to be SPECIAL similar to 2MB
@ 2014-09-25 14:40 James Custer
  2014-11-25 21:23 ` Andrew Morton
  0 siblings, 1 reply; 6+ messages in thread
From: James Custer @ 2014-09-25 14:40 UTC (permalink / raw)
  To: x86, linux-kernel, tglx, mingo, hpa; +Cc: James Custer

Superpages allocated by SGI's superpages module can be backed by 1GB pages,
but direct i/o cannot be used. The superpages module uses _PAGE_BIT_SPECIAL
to disable direct i/o because some code depends on the memory being backed
by page structures. But, because superpages have no backing page structures
this causes a panic.

This is the way direct i/o on 1GB pages fails:

               BUG: unable to handle kernel paging request at ffffea0038000000
[60463.203795] IP: [<ffffffff8103c93a>] gup_huge_pud+0x9a/0xe0
[60463.210058] PGD 83ffd3067 PUD 83ffd2067 PMD 0
[60463.215052] Oops: 0000 [#1] SMP

Stack traceback for pid 77136
0xffff8867a88ae300    77136    74825  1   56   R  0xffff8867a88ae970 *readdirectsp
 [<ffffffff8103c93a>] gup_huge_pud+0x9a/0xe0
 [<ffffffff8103cc33>] gup_pud_range+0x173/0x1b0
 [<ffffffff8103cd57>] get_user_pages_fast+0xe7/0x1b0
 [<ffffffff8118eac3>] dio_get_page+0x83/0x150
 [<ffffffff8118f641>] do_direct_IO+0x81/0x420
 [<ffffffff8118fb89>] direct_io_worker+0x1a9/0x340
 [<ffffffffa00c5de8>] ext3_direct_IO+0xe8/0x2c0 [ext3]
 [<ffffffff810fa527>] generic_file_aio_read+0x237/0x260
 [<ffffffff81159878>] do_sync_read+0xc8/0x110
 [<ffffffff8115a027>] vfs_read+0xc7/0x130
 [<ffffffff8115a193>] sys_read+0x53/0xa0
 [<ffffffff81466192>] system_call_fastpath+0x16/0x1b

gup_huge_pud() is trying to find the page structure, and with superpages there
is none.

With direct i/o on 2MB pages:

static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
                int write, struct page **pages, int *nr)
{
 ...
                if (pmd_none(pmd) || pmd_trans_splitting(pmd))
                        return 0;

and pmd_trans_splitting() is testing _PAGE_SPLITTING, which is an alias
for _PAGE_SPECIAL which we set on the 2MB or 1GB pages mapped in by superpages.

But gup_pud_range() has no such check:

static int gup_pud_range(pgd_t pgd, unsigned long addr, unsigned long end,
                        int write, struct page **pages, int *nr)
{
  ...
                if (pud_none(pud))
                        return 0;

Therefore direct i/o on 1GB pages attempts to get a page structure and panics.

Signed-off-by: James Custer <jcuster@sgi.com>
---
 arch/x86/mm/gup.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/x86/mm/gup.c b/arch/x86/mm/gup.c
index dd74e46..12ca9cf 100644
--- a/arch/x86/mm/gup.c
+++ b/arch/x86/mm/gup.c
@@ -121,7 +121,6 @@ static noinline int gup_huge_pmd(pmd_t pmd, unsigned long addr,
 		mask |= _PAGE_RW;
 	if ((pte_flags(pte) & mask) != mask)
 		return 0;
-	/* hugepages are never "special" */
 	VM_BUG_ON(pte_flags(pte) & _PAGE_SPECIAL);
 	VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
 
@@ -191,7 +190,6 @@ static noinline int gup_huge_pud(pud_t pud, unsigned long addr,
 		mask |= _PAGE_RW;
 	if ((pte_flags(pte) & mask) != mask)
 		return 0;
-	/* hugepages are never "special" */
 	VM_BUG_ON(pte_flags(pte) & _PAGE_SPECIAL);
 	VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
 
@@ -223,7 +221,7 @@ static int gup_pud_range(pgd_t pgd, unsigned long addr, unsigned long end,
 		pud_t pud = *pudp;
 
 		next = pud_addr_end(addr, end);
-		if (pud_none(pud))
+		if (pud_none(pud) || (pud_val(pud) & _PAGE_SPECIAL))
 			return 0;
 		if (unlikely(pud_large(pud))) {
 			if (!gup_huge_pud(pud, addr, next, write, pages, nr))
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2014-11-26  6:17 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-24 17:19 [PATCH] x86: Allow 1GB pages to be SPECIAL similar to 2MB James Custer
2014-09-24 22:04 ` David Rientjes
2014-09-25 10:39   ` Thomas Gleixner
2014-09-25 14:40 James Custer
2014-11-25 21:23 ` Andrew Morton
2014-11-26  6:17   ` Kirill A. Shutemov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.