* [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
@ 2016-02-11 18:22 ` Gerald Schaefer
  0 siblings, 0 replies; 153+ messages in thread
From: Gerald Schaefer @ 2016-02-11 18:22 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: linux-mm, linux-kernel, Aneesh Kumar K.V, Andrew Morton,
	Linus Torvalds, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev, Catalin Marinas, Will Deacon,
	linux-arm-kernel, Martin Schwidefsky, Heiko Carstens, linux-s390,
	Sebastian Ott

Hi,

Sebastian Ott reported random kernel crashes beginning with v4.5-rc1 and
he also bisected this to commit 61f5d698 "mm: re-enable THP". Further
review of the THP rework patches, which cannot be bisected, revealed
commit fecffad "s390, thp: remove infrastructure for handling splitting PMDs"
(and also similar commits for other archs).

This commit removes the THP splitting bit and also the architecture
implementation of pmdp_splitting_flush(), which took care of the IPI for
fast_gup serialization. The commit message says

    pmdp_splitting_flush() is not needed too: on splitting PMD we will do
    pmdp_clear_flush() + set_pte_at().  pmdp_clear_flush() will do IPI as
    needed for fast_gup

The assumption that a TLB flush will also produce an IPI is wrong on s390,
and maybe also on other architectures, and I thought that this was actually
the main reason for having an arch-specific pmdp_splitting_flush().

At least PowerPC and ARM also had an individual implementation of
pmdp_splitting_flush() that used kick_all_cpus_sync() instead of a TLB
flush to send the IPI, and those were also removed. Putting the arch
maintainers and mailing lists on cc to verify.

On s390 this will break the IPI serialization against fast_gup, which
would certainly explain the random kernel crashes. Please revert or fix
the pmdp_splitting_flush() removal.

Regards,
Gerald
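
The serialization argument above can be sketched as a small userspace
model. This is only an illustration under stated assumptions, not kernel
code: struct model_cpu and all model_*() / walk_*() helpers are
hypothetical names invented for this sketch. It assumes the lockless
walker runs with local interrupts disabled, so a synchronous IPI (as sent
by kick_all_cpus_sync()) cannot complete while a walk is in progress,
whereas a TLB flush that sends no IPI gives the splitting side nothing to
wait on.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one CPU; not a kernel structure. */
struct model_cpu {
	bool irqs_disabled;	/* true while a lockless walk is running */
	bool ipi_pending;	/* IPI posted but not yet acknowledged */
};

/* Walker side: fast_gup-style code runs with local interrupts off. */
static void walk_begin(struct model_cpu *cpu)
{
	cpu->irqs_disabled = true;
}

/* Re-enabling interrupts lets any pending IPI be acknowledged. */
static void walk_end(struct model_cpu *cpu)
{
	cpu->irqs_disabled = false;
	cpu->ipi_pending = false;
}

/*
 * Splitter side: post an IPI to every CPU. A CPU with interrupts
 * enabled acknowledges at once; one inside a walk keeps it pending.
 */
static void model_send_ipi_all(struct model_cpu *cpus, size_t n)
{
	for (size_t i = 0; i < n; i++)
		cpus[i].ipi_pending = cpus[i].irqs_disabled;
}

/* A synchronous kick may return only once no IPI is still pending. */
static bool model_kick_completed(const struct model_cpu *cpus, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (cpus[i].ipi_pending)
			return false;
	return true;
}

/*
 * A flush that sends no IPI completes regardless of running walkers,
 * so it provides no ordering against them in this model.
 */
static bool model_flush_without_ipi_completed(const struct model_cpu *cpus,
					       size_t n)
{
	(void)cpus;
	(void)n;
	return true;
}
```

In this model the splitting side can safely touch the old page table
state only after model_kick_completed() returns true; the flush-only
variant returns true even while a walker is still active.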

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-11 18:22 ` Gerald Schaefer
  (?)
@ 2016-02-11 19:09   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2016-02-11 19:09 UTC (permalink / raw)
  To: Gerald Schaefer
  Cc: Kirill A. Shutemov, linux-mm, linux-kernel, Aneesh Kumar K.V,
	Andrew Morton, Linus Torvalds, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	Catalin Marinas, Will Deacon, linux-arm-kernel,
	Martin Schwidefsky, Heiko Carstens, linux-s390, Sebastian Ott

On Thu, Feb 11, 2016 at 07:22:23PM +0100, Gerald Schaefer wrote:
> Hi,
> 
> Sebastian Ott reported random kernel crashes beginning with v4.5-rc1 and
> he also bisected this to commit 61f5d698 "mm: re-enable THP". Further
> review of the THP rework patches, which cannot be bisected, revealed
> commit fecffad "s390, thp: remove infrastructure for handling splitting PMDs"
> (and also similar commits for other archs).
> 
> This commit removes the THP splitting bit and also the architecture
> implementation of pmdp_splitting_flush(), which took care of the IPI for
> fast_gup serialization. The commit message says
> 
>     pmdp_splitting_flush() is not needed too: on splitting PMD we will do
>     pmdp_clear_flush() + set_pte_at().  pmdp_clear_flush() will do IPI as
>     needed for fast_gup
> 
> The assumption that a TLB flush will also produce an IPI is wrong on s390,
> and maybe also on other architectures, and I thought that this was actually
> the main reason for having an arch-specific pmdp_splitting_flush().
> 
> At least PowerPC and ARM also had an individual implementation of
> pmdp_splitting_flush() that used kick_all_cpus_sync() instead of a TLB
> flush to send the IPI, and those were also removed. Putting the arch
> maintainers and mailing lists on cc to verify.
> 
> On s390 this will break the IPI serialization against fast_gup, which
> would certainly explain the random kernel crashes, please revert or fix
> the pmdp_splitting_flush() removal.

Sorry for that.

I believe the problem was already addressed for PowerPC:

http://lkml.kernel.org/g/454980831-16631-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com

I think kick_all_cpus_sync() in arch-specific pmdp_invalidate() would do
the trick, right?

If yes, I'll prepare a patch tomorrow (some sleep required).

-- 
 Kirill A. Shutemov

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-11 19:09   ` Kirill A. Shutemov
  (?)
@ 2016-02-11 19:12     ` Kirill A. Shutemov
  -1 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2016-02-11 19:12 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Gerald Schaefer, linux-mm, linux-kernel, Aneesh Kumar K.V,
	Andrew Morton, Linus Torvalds, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	Catalin Marinas, Will Deacon, linux-arm-kernel,
	Martin Schwidefsky, Heiko Carstens, linux-s390, Sebastian Ott

On Thu, Feb 11, 2016 at 09:09:42PM +0200, Kirill A. Shutemov wrote:
> On Thu, Feb 11, 2016 at 07:22:23PM +0100, Gerald Schaefer wrote:
> > Hi,
> > 
> > Sebastian Ott reported random kernel crashes beginning with v4.5-rc1 and
> > he also bisected this to commit 61f5d698 "mm: re-enable THP". Further
> > review of the THP rework patches, which cannot be bisected, revealed
> > commit fecffad "s390, thp: remove infrastructure for handling splitting PMDs"
> > (and also similar commits for other archs).
> > 
> > This commit removes the THP splitting bit and also the architecture
> > implementation of pmdp_splitting_flush(), which took care of the IPI for
> > fast_gup serialization. The commit message says
> > 
> >     pmdp_splitting_flush() is not needed too: on splitting PMD we will do
> >     pmdp_clear_flush() + set_pte_at().  pmdp_clear_flush() will do IPI as
> >     needed for fast_gup
> > 
> > The assumption that a TLB flush will also produce an IPI is wrong on s390,
> > and maybe also on other architectures, and I thought that this was actually
> > the main reason for having an arch-specific pmdp_splitting_flush().
> > 
> > At least PowerPC and ARM also had an individual implementation of
> > pmdp_splitting_flush() that used kick_all_cpus_sync() instead of a TLB
> > flush to send the IPI, and those were also removed. Putting the arch
> > maintainers and mailing lists on cc to verify.
> > 
> > On s390 this will break the IPI serialization against fast_gup, which
> > would certainly explain the random kernel crashes, please revert or fix
> > the pmdp_splitting_flush() removal.
> 
> Sorry for that.
> 
> I believe, the problem was already addressed for PowerPC:
> 
> http://lkml.kernel.org/g/454980831-16631-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com

Correct link is

http://lkml.kernel.org/g/1454980831-16631-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com

-- 
 Kirill A. Shutemov

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-11 19:09   ` Kirill A. Shutemov
  (?)
@ 2016-02-11 19:57     ` Gerald Schaefer
  -1 siblings, 0 replies; 153+ messages in thread
From: Gerald Schaefer @ 2016-02-11 19:57 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kirill A. Shutemov, linux-mm, linux-kernel, Aneesh Kumar K.V,
	Andrew Morton, Linus Torvalds, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	Catalin Marinas, Will Deacon, linux-arm-kernel,
	Martin Schwidefsky, Heiko Carstens, linux-s390, Sebastian Ott

On Thu, 11 Feb 2016 21:09:42 +0200
"Kirill A. Shutemov" <kirill@shutemov.name> wrote:

> On Thu, Feb 11, 2016 at 07:22:23PM +0100, Gerald Schaefer wrote:
> > Hi,
> > 
> > Sebastian Ott reported random kernel crashes beginning with v4.5-rc1 and
> > he also bisected this to commit 61f5d698 "mm: re-enable THP". Further
> > review of the THP rework patches, which cannot be bisected, revealed
> > commit fecffad "s390, thp: remove infrastructure for handling splitting PMDs"
> > (and also similar commits for other archs).
> > 
> > This commit removes the THP splitting bit and also the architecture
> > implementation of pmdp_splitting_flush(), which took care of the IPI for
> > fast_gup serialization. The commit message says
> > 
> >     pmdp_splitting_flush() is not needed too: on splitting PMD we will do
> >     pmdp_clear_flush() + set_pte_at().  pmdp_clear_flush() will do IPI as
> >     needed for fast_gup
> > 
> > The assumption that a TLB flush will also produce an IPI is wrong on s390,
> > and maybe also on other architectures, and I thought that this was actually
> > the main reason for having an arch-specific pmdp_splitting_flush().
> > 
> > At least PowerPC and ARM also had an individual implementation of
> > pmdp_splitting_flush() that used kick_all_cpus_sync() instead of a TLB
> > flush to send the IPI, and those were also removed. Putting the arch
> > maintainers and mailing lists on cc to verify.
> > 
> > On s390 this will break the IPI serialization against fast_gup, which
> > would certainly explain the random kernel crashes, please revert or fix
> > the pmdp_splitting_flush() removal.
> 
> Sorry for that.
> 
> I believe, the problem was already addressed for PowerPC:
> 
> http://lkml.kernel.org/g/454980831-16631-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com
> 
> I think kick_all_cpus_sync() in arch-specific pmdp_invalidate() would do
> the trick, right?

Hmm, not sure about that. After pmdp_invalidate(), a pmd_none() check in
fast_gup will still return false, because the pmd is not empty (at least
on s390). So I don't immediately see how it would help fast_gup break
out to the slow path in case of THP splitting.

> 
> If yes, I'll prepare patch tomorrow (some sleep required).
> 

We'll check if adding kick_all_cpus_sync() to pmdp_invalidate() helps.
It would also be good if Martin had a look at this; he'll return on
Monday.

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-11 19:57     ` Gerald Schaefer
  (?)
@ 2016-02-12  4:04       ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 153+ messages in thread
From: Aneesh Kumar K.V @ 2016-02-12  4:04 UTC (permalink / raw)
  To: Gerald Schaefer, Kirill A. Shutemov
  Cc: Kirill A. Shutemov, linux-mm, linux-kernel, Andrew Morton,
	Linus Torvalds, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev, Catalin Marinas, Will Deacon,
	linux-arm-kernel, Martin Schwidefsky, Heiko Carstens, linux-s390,
	Sebastian Ott

Gerald Schaefer <gerald.schaefer@de.ibm.com> writes:

> On Thu, 11 Feb 2016 21:09:42 +0200
> "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
>
>> On Thu, Feb 11, 2016 at 07:22:23PM +0100, Gerald Schaefer wrote:
>> > Hi,
>> > 
>> > Sebastian Ott reported random kernel crashes beginning with v4.5-rc1 and
>> > he also bisected this to commit 61f5d698 "mm: re-enable THP". Further
>> > review of the THP rework patches, which cannot be bisected, revealed
>> > commit fecffad "s390, thp: remove infrastructure for handling splitting PMDs"
>> > (and also similar commits for other archs).
>> > 
>> > This commit removes the THP splitting bit and also the architecture
>> > implementation of pmdp_splitting_flush(), which took care of the IPI for
>> > fast_gup serialization. The commit message says
>> > 
>> >     pmdp_splitting_flush() is not needed too: on splitting PMD we will do
>> >     pmdp_clear_flush() + set_pte_at().  pmdp_clear_flush() will do IPI as
>> >     needed for fast_gup
>> > 
>> > The assumption that a TLB flush will also produce an IPI is wrong on s390,
>> > and maybe also on other architectures, and I thought that this was actually
>> > the main reason for having an arch-specific pmdp_splitting_flush().
>> > 
>> > At least PowerPC and ARM also had an individual implementation of
>> > pmdp_splitting_flush() that used kick_all_cpus_sync() instead of a TLB
>> > flush to send the IPI, and those were also removed. Putting the arch
>> > maintainers and mailing lists on cc to verify.
>> > 
>> > On s390 this will break the IPI serialization against fast_gup, which
>> > would certainly explain the random kernel crashes, please revert or fix
>> > the pmdp_splitting_flush() removal.
>> 
>> Sorry for that.
>> 
>> I believe, the problem was already addressed for PowerPC:
>> 
>> http://lkml.kernel.org/g/454980831-16631-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com
>> 
>> I think kick_all_cpus_sync() in arch-specific pmdp_invalidate() would do
>> the trick, right?
>
> Hmm, not sure about that. After pmdp_invalidate(), a pmd_none() check in
> fast_gup will still return false, because the pmd is not empty (at least
> on s390).

Why can't we do this? I did this for ppc64.

 void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 		     pmd_t *pmdp)
 {
-	pmd_hugepage_update(vma->vm_mm, address, pmdp, _PAGE_PRESENT, 0);
+	pmd_hugepage_update(vma->vm_mm, address, pmdp, ~0UL, 0);

> So I don't see spontaneously how it will help fast_gup to break
> out to the slow path in case of THP splitting.
>
>> 
>> If yes, I'll prepare patch tomorrow (some sleep required).
>> 
>
> We'll check if adding kick_all_cpus_sync() to pmdp_invalidate() helps.
> It would also be good if Martin has a look at this, he'll return on
> Monday.

-aneesh

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-11 19:57     ` Gerald Schaefer
  (?)
@ 2016-02-12 10:01       ` Will Deacon
  -1 siblings, 0 replies; 153+ messages in thread
From: Will Deacon @ 2016-02-12 10:01 UTC (permalink / raw)
  To: Gerald Schaefer
  Cc: Kirill A. Shutemov, Kirill A. Shutemov, linux-mm, linux-kernel,
	Aneesh Kumar K.V, Andrew Morton, Linus Torvalds,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	linuxppc-dev, Catalin Marinas, linux-arm-kernel,
	Martin Schwidefsky, Heiko Carstens, linux-s390, Sebastian Ott

On Thu, Feb 11, 2016 at 08:57:02PM +0100, Gerald Schaefer wrote:
> On Thu, 11 Feb 2016 21:09:42 +0200
> "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> > On Thu, Feb 11, 2016 at 07:22:23PM +0100, Gerald Schaefer wrote:
> > > Sebastian Ott reported random kernel crashes beginning with v4.5-rc1 and
> > > he also bisected this to commit 61f5d698 "mm: re-enable THP". Further
> > > review of the THP rework patches, which cannot be bisected, revealed
> > > commit fecffad "s390, thp: remove infrastructure for handling splitting PMDs"
> > > (and also similar commits for other archs).
> > > 
> > > This commit removes the THP splitting bit and also the architecture
> > > implementation of pmdp_splitting_flush(), which took care of the IPI for
> > > fast_gup serialization. The commit message says
> > > 
> > >     pmdp_splitting_flush() is not needed too: on splitting PMD we will do
> > >     pmdp_clear_flush() + set_pte_at().  pmdp_clear_flush() will do IPI as
> > >     needed for fast_gup
> > > 
> > > The assumption that a TLB flush will also produce an IPI is wrong on s390,
> > > and maybe also on other architectures, and I thought that this was actually
> > > the main reason for having an arch-specific pmdp_splitting_flush().
> > > 
> > > At least PowerPC and ARM also had an individual implementation of
> > > pmdp_splitting_flush() that used kick_all_cpus_sync() instead of a TLB
> > > flush to send the IPI, and those were also removed. Putting the arch
> > > maintainers and mailing lists on cc to verify.
> > > 
> > > On s390 this will break the IPI serialization against fast_gup, which
> > > would certainly explain the random kernel crashes, please revert or fix
> > > the pmdp_splitting_flush() removal.
> > 
> > Sorry for that.
> > 
> > I believe, the problem was already addressed for PowerPC:
> > 
> > http://lkml.kernel.org/g/454980831-16631-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com
> > 
> > I think kick_all_cpus_sync() in arch-specific pmdp_invalidate() would do
> > the trick, right?
> 
> Hmm, not sure about that. After pmdp_invalidate(), a pmd_none() check in
> fast_gup will still return false, because the pmd is not empty (at least
> on s390). So I don't see spontaneously how it will help fast_gup to break
> out to the slow path in case of THP splitting.
> 
> > 
> > If yes, I'll prepare patch tomorrow (some sleep required).
> > 
> 
> We'll check if adding kick_all_cpus_sync() to pmdp_invalidate() helps.
> It would also be good if Martin has a look at this, he'll return on
> Monday.

Do you have a reliable way to trigger the "random kernel crashes"? We've not
seen anything reported on arm64, but I don't see why we wouldn't be affected
by the same bug and it would be good to confirm and validate a fix.

Cheers,

Will

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-12 10:01       ` Will Deacon
  (?)
@ 2016-02-12 10:12         ` Sebastian Ott
  -1 siblings, 0 replies; 153+ messages in thread
From: Sebastian Ott @ 2016-02-12 10:12 UTC (permalink / raw)
  To: Will Deacon
  Cc: Gerald Schaefer, Kirill A. Shutemov, Kirill A. Shutemov,
	linux-mm, linux-kernel, Aneesh Kumar K.V, Andrew Morton,
	Linus Torvalds, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev, Catalin Marinas, linux-arm-kernel,
	Martin Schwidefsky, Heiko Carstens, linux-s390

On Fri, 12 Feb 2016, Will Deacon wrote:
> On Thu, Feb 11, 2016 at 08:57:02PM +0100, Gerald Schaefer wrote:
> > On Thu, 11 Feb 2016 21:09:42 +0200
> > "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> > > On Thu, Feb 11, 2016 at 07:22:23PM +0100, Gerald Schaefer wrote:
> > > > Sebastian Ott reported random kernel crashes beginning with v4.5-rc1 and
> > > > he also bisected this to commit 61f5d698 "mm: re-enable THP". Further
> > > > review of the THP rework patches, which cannot be bisected, revealed
> > > > commit fecffad "s390, thp: remove infrastructure for handling splitting PMDs"
> > > > (and also similar commits for other archs).
> > > > 
> > > > This commit removes the THP splitting bit and also the architecture
> > > > implementation of pmdp_splitting_flush(), which took care of the IPI for
> > > > fast_gup serialization. The commit message says
> > > > 
> > > >     pmdp_splitting_flush() is not needed too: on splitting PMD we will do
> > > >     pmdp_clear_flush() + set_pte_at().  pmdp_clear_flush() will do IPI as
> > > >     needed for fast_gup
> > > > 
> > > > The assumption that a TLB flush will also produce an IPI is wrong on s390,
> > > > and maybe also on other architectures, and I thought that this was actually
> > > > the main reason for having an arch-specific pmdp_splitting_flush().
> > > > 
> > > > At least PowerPC and ARM also had an individual implementation of
> > > > pmdp_splitting_flush() that used kick_all_cpus_sync() instead of a TLB
> > > > flush to send the IPI, and those were also removed. Putting the arch
> > > > maintainers and mailing lists on cc to verify.
> > > > 
> > > > On s390 this will break the IPI serialization against fast_gup, which
> > > > would certainly explain the random kernel crashes, please revert or fix
> > > > the pmdp_splitting_flush() removal.
> > > 
> > > Sorry for that.
> > > 
> > > I believe, the problem was already addressed for PowerPC:
> > > 
> > > http://lkml.kernel.org/g/454980831-16631-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com
> > > 
> > > I think kick_all_cpus_sync() in arch-specific pmdp_invalidate() would do
> > > the trick, right?
> > 
> > Hmm, not sure about that. After pmdp_invalidate(), a pmd_none() check in
> > fast_gup will still return false, because the pmd is not empty (at least
> > on s390). So I don't see spontaneously how it will help fast_gup to break
> > out to the slow path in case of THP splitting.
> > 
> > > 
> > > If yes, I'll prepare patch tomorrow (some sleep required).
> > > 
> > 
> > We'll check if adding kick_all_cpus_sync() to pmdp_invalidate() helps.
> > It would also be good if Martin has a look at this, he'll return on
> > Monday.
> 
> Do you have a reliable way to trigger the "random kernel crashes"? We've not
> seen anything reported on arm64, but I don't see why we wouldn't be affected
> by the same bug and it would be good to confirm and validate a fix.

My testcase was compiling the kernel. Most of the time my test system
didn't survive a single compile run. During bisect I did at least 20
compile runs to flag a commit as good.

Sebastian

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-12  4:04       ` Aneesh Kumar K.V
  (?)
@ 2016-02-12 11:59         ` Gerald Schaefer
  -1 siblings, 0 replies; 153+ messages in thread
From: Gerald Schaefer @ 2016-02-12 11:59 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Kirill A. Shutemov, Kirill A. Shutemov, linux-mm, linux-kernel,
	Andrew Morton, Linus Torvalds, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	Catalin Marinas, Will Deacon, linux-arm-kernel,
	Martin Schwidefsky, Heiko Carstens, linux-s390, Sebastian Ott

On Fri, 12 Feb 2016 09:34:33 +0530
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:

> Gerald Schaefer <gerald.schaefer@de.ibm.com> writes:
> 
> > On Thu, 11 Feb 2016 21:09:42 +0200
> > "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> >
> >> On Thu, Feb 11, 2016 at 07:22:23PM +0100, Gerald Schaefer wrote:
> >> > Hi,
> >> > 
> >> > Sebastian Ott reported random kernel crashes beginning with v4.5-rc1 and
> >> > he also bisected this to commit 61f5d698 "mm: re-enable THP". Further
> >> > review of the THP rework patches, which cannot be bisected, revealed
> >> > commit fecffad "s390, thp: remove infrastructure for handling splitting PMDs"
> >> > (and also similar commits for other archs).
> >> > 
> >> > This commit removes the THP splitting bit and also the architecture
> >> > implementation of pmdp_splitting_flush(), which took care of the IPI for
> >> > fast_gup serialization. The commit message says
> >> > 
> >> >     pmdp_splitting_flush() is not needed too: on splitting PMD we will do
> >> >     pmdp_clear_flush() + set_pte_at().  pmdp_clear_flush() will do IPI as
> >> >     needed for fast_gup
> >> > 
> >> > The assumption that a TLB flush will also produce an IPI is wrong on s390,
> >> > and maybe also on other architectures, and I thought that this was actually
> >> > the main reason for having an arch-specific pmdp_splitting_flush().
> >> > 
> >> > At least PowerPC and ARM also had an individual implementation of
> >> > pmdp_splitting_flush() that used kick_all_cpus_sync() instead of a TLB
> >> > flush to send the IPI, and those were also removed. Putting the arch
> >> > maintainers and mailing lists on cc to verify.
> >> > 
> >> > On s390 this will break the IPI serialization against fast_gup, which
> >> > would certainly explain the random kernel crashes, please revert or fix
> >> > the pmdp_splitting_flush() removal.
> >> 
> >> Sorry for that.
> >> 
> >> I believe, the problem was already addressed for PowerPC:
> >> 
> >> http://lkml.kernel.org/g/454980831-16631-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com
> >> 
> >> I think kick_all_cpus_sync() in arch-specific pmdp_invalidate() would do
> >> the trick, right?
> >
> > Hmm, not sure about that. After pmdp_invalidate(), a pmd_none() check in
> > fast_gup will still return false, because the pmd is not empty (at least
> > on s390).
> 
> Why can't we do this? I did this for ppc64.
> 
>  void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
>  		     pmd_t *pmdp)
>  {
> -	pmd_hugepage_update(vma->vm_mm, address, pmdp, _PAGE_PRESENT, 0);
> +	pmd_hugepage_update(vma->vm_mm, address, pmdp, ~0UL, 0);
> 

Wouldn't that semantically change what pmdp_invalidate() was supposed to
do? The comment before the call says "the pmd_trans_huge and
pmd_trans_splitting must remain set at all times on the pmd". So, after
removing pmd_trans_splitting, it seems to be necessary to at least keep
pmd_trans_huge set.

In your case, the pmd would be completely cleared. That may help fast_gup
detect it with pmd_none(), but I'm not sure whether it would open up other
problems, e.g. with concurrent page faults. I must also admit that my
overview of THP has gotten a little rusty.
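
To make the concern concrete, here is a minimal user-space sketch (not
kernel code) of the two masks used in the diff above. It assumes the fourth
argument of pmd_hugepage_update() is a mask of bits to clear and the fifth a
mask of bits to set. The bit values PAGE_PRESENT and PAGE_THP_HUGE and the
helpers pmd_clear_bits(), model_pmd_none(), model_pmd_trans_huge() and
make_huge_pmd() are made up for illustration and do not match any real
architecture's layout.

```c
#include <stdbool.h>

/* Hypothetical bit layout, for illustration only. */
#define PAGE_PRESENT  0x1UL
#define PAGE_THP_HUGE 0x2UL
#define PFN_SHIFT     12

typedef unsigned long pmd_model_t;

/* Model of an update that clears the bits in 'clr' and sets those in 'set'. */
static pmd_model_t pmd_clear_bits(pmd_model_t pmd, unsigned long clr,
                                  unsigned long set)
{
    return (pmd & ~clr) | set;
}

/* In this model, "none" means the entry is completely empty. */
static bool model_pmd_none(pmd_model_t pmd)
{
    return pmd == 0;
}

/* The huge marker is still visible as long as its bit survives. */
static bool model_pmd_trans_huge(pmd_model_t pmd)
{
    return (pmd & PAGE_THP_HUGE) != 0;
}

/* A present huge pmd pointing at an arbitrary page frame number. */
static pmd_model_t make_huge_pmd(unsigned long pfn)
{
    return (pfn << PFN_SHIFT) | PAGE_THP_HUGE | PAGE_PRESENT;
}
```

With these definitions, clearing only PAGE_PRESENT leaves an entry that is
not "none" and still looks huge, while clearing with ~0UL yields an empty
entry that no longer carries the huge marker. That is the semantic change
being asked about.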

> > So I don't see spontaneously how it will help fast_gup to break
> > out to the slow path in case of THP splitting.
> >
> >> 
> >> If yes, I'll prepare patch tomorrow (some sleep required).
> >> 
> >
> > We'll check if adding kick_all_cpus_sync() to pmdp_invalidate() helps.
> > It would also be good if Martin has a look at this, he'll return on
> > Monday.
> 
> -aneesh
> 

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-11 19:12     ` Kirill A. Shutemov
  (?)
@ 2016-02-12 12:21       ` Sebastian Ott
  -1 siblings, 0 replies; 153+ messages in thread
From: Sebastian Ott @ 2016-02-12 12:21 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kirill A. Shutemov, Gerald Schaefer, linux-mm, linux-kernel,
	Aneesh Kumar K.V, Andrew Morton, Linus Torvalds,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	linuxppc-dev, Catalin Marinas, Will Deacon, linux-arm-kernel,
	Martin Schwidefsky, Heiko Carstens, linux-s390

On Thu, 11 Feb 2016, Kirill A. Shutemov wrote:
> On Thu, Feb 11, 2016 at 09:09:42PM +0200, Kirill A. Shutemov wrote:
> > On Thu, Feb 11, 2016 at 07:22:23PM +0100, Gerald Schaefer wrote:
> > > Hi,
> > > 
> > > Sebastian Ott reported random kernel crashes beginning with v4.5-rc1 and
> > > he also bisected this to commit 61f5d698 "mm: re-enable THP". Further
> > > review of the THP rework patches, which cannot be bisected, revealed
> > > commit fecffad "s390, thp: remove infrastructure for handling splitting PMDs"
> > > (and also similar commits for other archs).
> > > 
> > > This commit removes the THP splitting bit and also the architecture
> > > implementation of pmdp_splitting_flush(), which took care of the IPI for
> > > fast_gup serialization. The commit message says
> > > 
> > >     pmdp_splitting_flush() is not needed too: on splitting PMD we will do
> > >     pmdp_clear_flush() + set_pte_at().  pmdp_clear_flush() will do IPI as
> > >     needed for fast_gup
> > > 
> > > The assumption that a TLB flush will also produce an IPI is wrong on s390,
> > > and maybe also on other architectures, and I thought that this was actually
> > > the main reason for having an arch-specific pmdp_splitting_flush().
> > > 
> > > At least PowerPC and ARM also had an individual implementation of
> > > pmdp_splitting_flush() that used kick_all_cpus_sync() instead of a TLB
> > > flush to send the IPI, and those were also removed. Putting the arch
> > > maintainers and mailing lists on cc to verify.
> > > 
> > > On s390 this will break the IPI serialization against fast_gup, which
> > > would certainly explain the random kernel crashes, please revert or fix
> > > the pmdp_splitting_flush() removal.
> > 
> > Sorry for that.
> > 
> > I believe, the problem was already addressed for PowerPC:
> > 
> > http://lkml.kernel.org/g/454980831-16631-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com
> 
> Correct link is
> 
> http://lkml.kernel.org/g/1454980831-16631-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com
> 

Based on your suggestion, Gerald provided the following patch, but sadly it
didn't fix the problem.

Sebastian


---
 arch/s390/include/asm/pgtable.h |    2 ++
 1 file changed, 2 insertions(+)

--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1587,6 +1587,8 @@ static inline void pmdp_invalidate(struc
 				   unsigned long address, pmd_t *pmdp)
 {
 	pmdp_flush_direct(vma->vm_mm, address, pmdp);
+	/* Serialize against fast_gup with IPI */
+	kick_all_cpus_sync();
 }

 #define __HAVE_ARCH_PMDP_SET_WRPROTECT
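
For readers unfamiliar with the mechanism the comment in the patch refers
to: fast_gup walks the page tables with interrupts disabled, so a
synchronous IPI to all CPUs cannot complete while such a walk is still in
flight. Below is a minimal single-threaded user-space model of that idea.
It is not kernel code, and every name in it (cpu_model, model_walk_begin(),
model_walk_end(), model_send_ipis(), model_kick_done()) is hypothetical. It
only illustrates the intended serialization; the patch itself did not fix
the crashes.

```c
#include <stdbool.h>

#define MODEL_NR_CPUS 4

struct cpu_model {
    bool irqs_disabled; /* set while the CPU runs a lockless walk */
    bool ipi_pending;   /* an IPI was sent but not yet handled */
};

static struct cpu_model cpus[MODEL_NR_CPUS];

/* A walker disables interrupts for the duration of the lockless walk. */
static void model_walk_begin(int cpu)
{
    cpus[cpu].irqs_disabled = true;
}

/* Re-enabling interrupts lets any pending IPI be handled. */
static void model_walk_end(int cpu)
{
    cpus[cpu].irqs_disabled = false;
    cpus[cpu].ipi_pending = false;
}

/*
 * Send an IPI to every CPU except the sender. CPUs with interrupts
 * enabled handle it immediately; the others keep it pending.
 */
static void model_send_ipis(int sender)
{
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
        if (cpu == sender)
            continue;
        cpus[cpu].ipi_pending = cpus[cpu].irqs_disabled;
    }
}

/* The synchronous kick may return only once no IPI is outstanding. */
static bool model_kick_done(void)
{
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        if (cpus[cpu].ipi_pending)
            return false;
    return true;
}
```

In this model, if CPU 2 is inside a walk when CPU 0 sends the IPIs,
model_kick_done() stays false until CPU 2 calls model_walk_end(). The
sender therefore cannot proceed past the kick while a walker might still be
looking at the old pmd.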

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-11 19:57     ` Gerald Schaefer
  (?)
@ 2016-02-12 15:41       ` Kirill A. Shutemov
  -1 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2016-02-12 15:41 UTC (permalink / raw)
  To: Gerald Schaefer
  Cc: Kirill A. Shutemov, linux-mm, linux-kernel, Aneesh Kumar K.V,
	Andrew Morton, Linus Torvalds, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	Catalin Marinas, Will Deacon, linux-arm-kernel,
	Martin Schwidefsky, Heiko Carstens, linux-s390, Sebastian Ott

On Thu, Feb 11, 2016 at 08:57:02PM +0100, Gerald Schaefer wrote:
> On Thu, 11 Feb 2016 21:09:42 +0200
> "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> 
> > On Thu, Feb 11, 2016 at 07:22:23PM +0100, Gerald Schaefer wrote:
> > > Hi,
> > > 
> > > Sebastian Ott reported random kernel crashes beginning with v4.5-rc1 and
> > > he also bisected this to commit 61f5d698 "mm: re-enable THP". Further
> > > review of the THP rework patches, which cannot be bisected, revealed
> > > commit fecffad "s390, thp: remove infrastructure for handling splitting PMDs"
> > > (and also similar commits for other archs).
> > > 
> > > This commit removes the THP splitting bit and also the architecture
> > > implementation of pmdp_splitting_flush(), which took care of the IPI for
> > > fast_gup serialization. The commit message says
> > > 
> > >     pmdp_splitting_flush() is not needed too: on splitting PMD we will do
> > >     pmdp_clear_flush() + set_pte_at().  pmdp_clear_flush() will do IPI as
> > >     needed for fast_gup
> > > 
> > > The assumption that a TLB flush will also produce an IPI is wrong on s390,
> > > and maybe also on other architectures, and I thought that this was actually
> > > the main reason for having an arch-specific pmdp_splitting_flush().
> > > 
> > > At least PowerPC and ARM also had an individual implementation of
> > > pmdp_splitting_flush() that used kick_all_cpus_sync() instead of a TLB
> > > flush to send the IPI, and those were also removed. Putting the arch
> > > maintainers and mailing lists on cc to verify.
> > > 
> > > On s390 this will break the IPI serialization against fast_gup, which
> > > would certainly explain the random kernel crashes, please revert or fix
> > > the pmdp_splitting_flush() removal.
> > 
> > Sorry for that.
> > 
> > I believe, the problem was already addressed for PowerPC:
> > 
> > http://lkml.kernel.org/g/454980831-16631-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com
> > 
> > I think kick_all_cpus_sync() in arch-specific pmdp_invalidate() would do
> > the trick, right?
> 
> Hmm, not sure about that. After pmdp_invalidate(), a pmd_none() check in
> fast_gup will still return false, because the pmd is not empty (at least
> on s390). So I don't see spontaneously how it will help fast_gup to break
> out to the slow path in case of THP splitting.

What does pmdp_flush_direct() do in pmdp_invalidate()? It's hard for me to
unwrap :-/ Does it make the pmd !pmd_present()?

I'm also confused that pmd_none() is equal to !pmd_present() on s390. Hm?

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-12 10:12         ` Sebastian Ott
  (?)
@ 2016-02-12 15:52           ` Will Deacon
  -1 siblings, 0 replies; 153+ messages in thread
From: Will Deacon @ 2016-02-12 15:52 UTC (permalink / raw)
  To: Sebastian Ott
  Cc: linux-arm-kernel, linux-s390, Catalin Marinas, Gerald Schaefer,
	Michael Ellerman, linuxppc-dev, Heiko Carstens, linux-kernel,
	linux-mm, Paul Mackerras, Aneesh Kumar K.V,
	Benjamin Herrenschmidt, Martin Schwidefsky, Kirill A. Shutemov,
	Andrew Morton, Linus Torvalds, Kirill A. Shutemov

On Fri, Feb 12, 2016 at 11:12:34AM +0100, Sebastian Ott wrote:
> On Fri, 12 Feb 2016, Will Deacon wrote:
> > On Thu, Feb 11, 2016 at 08:57:02PM +0100, Gerald Schaefer wrote:
> > > On Thu, 11 Feb 2016 21:09:42 +0200
> > > "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> > > > On Thu, Feb 11, 2016 at 07:22:23PM +0100, Gerald Schaefer wrote:
> > > > > Sebastian Ott reported random kernel crashes beginning with v4.5-rc1 and
> > > > > he also bisected this to commit 61f5d698 "mm: re-enable THP". Further
> > > > > review of the THP rework patches, which cannot be bisected, revealed
> > > > > commit fecffad "s390, thp: remove infrastructure for handling splitting PMDs"
> > > > > (and also similar commits for other archs).

[...]

> > Do you have a reliable way to trigger the "random kernel crashes"? We've not
> > seen anything reported on arm64, but I don't see why we wouldn't be affected
> > by the same bug and it would be good to confirm and validate a fix.
> 
> My testcase was compiling the kernel. Most of the time my test system
> didn't survive a single compile run. During bisect I did at least 20
> compile runs to flag a commit as good.

I've been building kernels all day with -rc3 on my arm64 box and haven't
seen any problems yet... :/

I'll leave it going over the weekend.

Will

^ permalink raw reply	[flat|nested] 153+ messages in thread

* [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
@ 2016-02-12 15:52           ` Will Deacon
  0 siblings, 0 replies; 153+ messages in thread
From: Will Deacon @ 2016-02-12 15:52 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Feb 12, 2016 at 11:12:34AM +0100, Sebastian Ott wrote:
> On Fri, 12 Feb 2016, Will Deacon wrote:
> > On Thu, Feb 11, 2016 at 08:57:02PM +0100, Gerald Schaefer wrote:
> > > On Thu, 11 Feb 2016 21:09:42 +0200
> > > "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> > > > On Thu, Feb 11, 2016 at 07:22:23PM +0100, Gerald Schaefer wrote:
> > > > > Sebastian Ott reported random kernel crashes beginning with v4.5-rc1 and
> > > > > he also bisected this to commit 61f5d698 "mm: re-enable THP". Further
> > > > > review of the THP rework patches, which cannot be bisected, revealed
> > > > > commit fecffad "s390, thp: remove infrastructure for handling splitting PMDs"
> > > > > (and also similar commits for other archs).

[...]

> > Do you have a reliable way to trigger the "random kernel crashes"? We've not
> > seen anything reported on arm64, but I don't see why we wouldn't be affected
> > by the same bug and it would be good to confirm and validate a fix.
> 
> My testcase was compiling the kernel. Most of the time my test system
> didn't survive a single compile run. During bisect I did at least 20
> compile runs to flag a commit as good.

I've been building kernels all day with -rc3 on my arm64 box and haven't
seen any problems yet.. :/.

I'll leave it going over the weekend.

Will

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-12 15:41       ` Kirill A. Shutemov
  (?)
@ 2016-02-12 15:57         ` Christian Borntraeger
  -1 siblings, 0 replies; 153+ messages in thread
From: Christian Borntraeger @ 2016-02-12 15:57 UTC (permalink / raw)
  To: Kirill A. Shutemov, Gerald Schaefer
  Cc: Kirill A. Shutemov, linux-mm, linux-kernel, Aneesh Kumar K.V,
	Andrew Morton, Linus Torvalds, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	Catalin Marinas, Will Deacon, linux-arm-kernel,
	Martin Schwidefsky, Heiko Carstens, linux-s390, Sebastian Ott

On 02/12/2016 04:41 PM, Kirill A. Shutemov wrote:
> On Thu, Feb 11, 2016 at 08:57:02PM +0100, Gerald Schaefer wrote:
>> On Thu, 11 Feb 2016 21:09:42 +0200
>> "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
>>
>>> On Thu, Feb 11, 2016 at 07:22:23PM +0100, Gerald Schaefer wrote:
>>>> Hi,
>>>>
>>>> Sebastian Ott reported random kernel crashes beginning with v4.5-rc1 and
>>>> he also bisected this to commit 61f5d698 "mm: re-enable THP". Further
>>>> review of the THP rework patches, which cannot be bisected, revealed
>>>> commit fecffad "s390, thp: remove infrastructure for handling splitting PMDs"
>>>> (and also similar commits for other archs).
>>>>
>>>> This commit removes the THP splitting bit and also the architecture
>>>> implementation of pmdp_splitting_flush(), which took care of the IPI for
>>>> fast_gup serialization. The commit message says
>>>>
>>>>     pmdp_splitting_flush() is not needed too: on splitting PMD we will do
>>>>     pmdp_clear_flush() + set_pte_at().  pmdp_clear_flush() will do IPI as
>>>>     needed for fast_gup
>>>>
>>>> The assumption that a TLB flush will also produce an IPI is wrong on s390,
>>>> and maybe also on other architectures, and I thought that this was actually
>>>> the main reason for having an arch-specific pmdp_splitting_flush().
>>>>
>>>> At least PowerPC and ARM also had an individual implementation of
>>>> pmdp_splitting_flush() that used kick_all_cpus_sync() instead of a TLB
>>>> flush to send the IPI, and those were also removed. Putting the arch
>>>> maintainers and mailing lists on cc to verify.
>>>>
>>>> On s390 this will break the IPI serialization against fast_gup, which
>>>> would certainly explain the random kernel crashes, please revert or fix
>>>> the pmdp_splitting_flush() removal.
>>>
>>> Sorry for that.
>>>
>>> I believe, the problem was already addressed for PowerPC:
>>>
>>> http://lkml.kernel.org/g/454980831-16631-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com
>>>
>>> I think kick_all_cpus_sync() in arch-specific pmdp_invalidate() would do
>>> the trick, right?
>>
>> Hmm, not sure about that. After pmdp_invalidate(), a pmd_none() check in
>> fast_gup will still return false, because the pmd is not empty (at least
>> on s390). So I don't see spontaneously how it will help fast_gup to break
>> out to the slow path in case of THP splitting.
> 
> What pmdp_flush_direct() does in pmdp_invalidate()? It's hard to unwrap for me :-/
> Does it make the pmd !pmd_present()?

It uses the idte instruction, which in an atomic fashion flushes the associated
TLB entry and changes the value of the pmd entry to invalid. This comes from the
HW requirement to not change a PTE/PMD that might still be in use, other than
with special instructions that do the TLB handling and the invalidation together.

(It also does some other magic to the attach_count, which might hold off
finish_arch_post_lock_switch while some flushing is happening, but this should
be unrelated here)


> I'm also confused by pmd_none() is equal to !pmd_present() on s390. Hm?

Don't know, Gerald or Martin?

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-12 11:59         ` Gerald Schaefer
  (?)
@ 2016-02-12 16:17           ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 153+ messages in thread
From: Aneesh Kumar K.V @ 2016-02-12 16:17 UTC (permalink / raw)
  To: Gerald Schaefer
  Cc: Kirill A. Shutemov, Kirill A. Shutemov, linux-mm, linux-kernel,
	Andrew Morton, Linus Torvalds, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	Catalin Marinas, Will Deacon, linux-arm-kernel,
	Martin Schwidefsky, Heiko Carstens, linux-s390, Sebastian Ott

Gerald Schaefer <gerald.schaefer@de.ibm.com> writes:

> On Fri, 12 Feb 2016 09:34:33 +0530
> "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
>
>> Gerald Schaefer <gerald.schaefer@de.ibm.com> writes:
>> 
>> > On Thu, 11 Feb 2016 21:09:42 +0200
>> > "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
>> >
>> >> On Thu, Feb 11, 2016 at 07:22:23PM +0100, Gerald Schaefer wrote:
>> >> > Hi,
>> >> > 
>> >> > Sebastian Ott reported random kernel crashes beginning with v4.5-rc1 and
>> >> > he also bisected this to commit 61f5d698 "mm: re-enable THP". Further
>> >> > review of the THP rework patches, which cannot be bisected, revealed
>> >> > commit fecffad "s390, thp: remove infrastructure for handling splitting PMDs"
>> >> > (and also similar commits for other archs).
>> >> > 
>> >> > This commit removes the THP splitting bit and also the architecture
>> >> > implementation of pmdp_splitting_flush(), which took care of the IPI for
>> >> > fast_gup serialization. The commit message says
>> >> > 
>> >> >     pmdp_splitting_flush() is not needed too: on splitting PMD we will do
>> >> >     pmdp_clear_flush() + set_pte_at().  pmdp_clear_flush() will do IPI as
>> >> >     needed for fast_gup
>> >> > 
>> >> > The assumption that a TLB flush will also produce an IPI is wrong on s390,
>> >> > and maybe also on other architectures, and I thought that this was actually
>> >> > the main reason for having an arch-specific pmdp_splitting_flush().
>> >> > 
>> >> > At least PowerPC and ARM also had an individual implementation of
>> >> > pmdp_splitting_flush() that used kick_all_cpus_sync() instead of a TLB
>> >> > flush to send the IPI, and those were also removed. Putting the arch
>> >> > maintainers and mailing lists on cc to verify.
>> >> > 
>> >> > On s390 this will break the IPI serialization against fast_gup, which
>> >> > would certainly explain the random kernel crashes, please revert or fix
>> >> > the pmdp_splitting_flush() removal.
>> >> 
>> >> Sorry for that.
>> >> 
>> >> I believe, the problem was already addressed for PowerPC:
>> >> 
>> >> http://lkml.kernel.org/g/454980831-16631-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com
>> >> 
>> >> I think kick_all_cpus_sync() in arch-specific pmdp_invalidate() would do
>> >> the trick, right?
>> >
>> > Hmm, not sure about that. After pmdp_invalidate(), a pmd_none() check in
>> > fast_gup will still return false, because the pmd is not empty (at least
>> > on s390).
>> 
>> Why can't we do this ? I did this for ppc64.
>> 
>>  void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
>>  		     pmd_t *pmdp)
>>  {
>> -	pmd_hugepage_update(vma->vm_mm, address, pmdp, _PAGE_PRESENT, 0);
>> +	pmd_hugepage_update(vma->vm_mm, address, pmdp, ~0UL, 0);
>> 
>
> Wouldn't that semantically change what pmdp_invalidate() was supposed to
> do? The comment before the call says "the pmd_trans_huge and
> pmd_trans_splitting must remain set at all times on the pmd". So, after
> removing pmd_trans_splitting, it seems to be necessary to at least keep
> pmd_trans_huge set.
>
> In your case, the pmd would be completely cleared, which may help to find
> it in fast_gup with pmd_none(), but I'm not sure if this would open up
> other problems, e.g. with concurrent page faults. But I must also admit that
> my THP overview got a little rusty.

Thinking about this more, I guess I should not be doing this, because
it brings in the exit_mmap race that I outlined in the patch, even
though the window is now small.

I guess we should fix this in the gup path by checking for whatever
trick we are using to mark the pmd as splitting. For ppc64 we clear
_PAGE_USER. We are OK as long as autonuma is enabled, because the
pmd_protnone() check tests against _PAGE_USER. But that may not be
sufficient.
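
To illustrate the idea of checking for the splitting marker in the gup path,
here is a rough, self-contained sketch. It is not the ppc64 code: the
sketch_* names and the bit positions are made up for illustration, and the
only thing taken from the discussion is that "splitting" is marked by
clearing the user bit on a huge pmd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout, for illustration only (not the real ppc64 one). */
typedef uint64_t sketch_pmd_t;

#define SKETCH_PAGE_USER (UINT64_C(1) << 0)  /* user-accessible mapping */
#define SKETCH_PMD_HUGE  (UINT64_C(1) << 1)  /* pmd maps a huge page */

/* Mark a huge pmd as "splitting" by clearing the user bit. */
static sketch_pmd_t sketch_mark_splitting(sketch_pmd_t pmd)
{
	return pmd & ~SKETCH_PAGE_USER;
}

/* A huge pmd without the user bit is treated as being split. */
static bool sketch_pmd_splitting(sketch_pmd_t pmd)
{
	return (pmd & SKETCH_PMD_HUGE) && !(pmd & SKETCH_PAGE_USER);
}

/*
 * Returns true if a lockless fast path may use this pmd, false if the
 * caller has to fall back to the slow path.
 */
static bool sketch_gup_fast_ok(sketch_pmd_t pmd)
{
	if (pmd == 0)                   /* empty entry: nothing to pin */
		return false;
	if (sketch_pmd_splitting(pmd))  /* split in progress: take slow path */
		return false;
	return true;
}

/* Self-check of the behavior sketched above. */
static void sketch_gup_selfcheck(void)
{
	sketch_pmd_t huge = SKETCH_PMD_HUGE | SKETCH_PAGE_USER;

	assert(sketch_gup_fast_ok(huge));
	assert(!sketch_gup_fast_ok(sketch_mark_splitting(huge)));
	assert(!sketch_gup_fast_ok(0));
}
```

The point of the sketch is only that the fast path tests the marker itself
and does not rely on pmd_none() becoming true while the split is in progress.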

-aneesh

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-12 15:57         ` Christian Borntraeger
  (?)
@ 2016-02-12 17:16           ` Gerald Schaefer
  -1 siblings, 0 replies; 153+ messages in thread
From: Gerald Schaefer @ 2016-02-12 17:16 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Kirill A. Shutemov, Kirill A. Shutemov, linux-mm, linux-kernel,
	Aneesh Kumar K.V, Andrew Morton, Linus Torvalds,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	linuxppc-dev, Catalin Marinas, Will Deacon, linux-arm-kernel,
	Martin Schwidefsky, Heiko Carstens, linux-s390, Sebastian Ott

On Fri, 12 Feb 2016 16:57:27 +0100
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> On 02/12/2016 04:41 PM, Kirill A. Shutemov wrote:
> > On Thu, Feb 11, 2016 at 08:57:02PM +0100, Gerald Schaefer wrote:
> >> On Thu, 11 Feb 2016 21:09:42 +0200
> >> "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> >>
> >>> On Thu, Feb 11, 2016 at 07:22:23PM +0100, Gerald Schaefer wrote:
> >>>> Hi,
> >>>>
> >>>> Sebastian Ott reported random kernel crashes beginning with v4.5-rc1 and
> >>>> he also bisected this to commit 61f5d698 "mm: re-enable THP". Further
> >>>> review of the THP rework patches, which cannot be bisected, revealed
> >>>> commit fecffad "s390, thp: remove infrastructure for handling splitting PMDs"
> >>>> (and also similar commits for other archs).
> >>>>
> >>>> This commit removes the THP splitting bit and also the architecture
> >>>> implementation of pmdp_splitting_flush(), which took care of the IPI for
> >>>> fast_gup serialization. The commit message says
> >>>>
> >>>>     pmdp_splitting_flush() is not needed too: on splitting PMD we will do
> >>>>     pmdp_clear_flush() + set_pte_at().  pmdp_clear_flush() will do IPI as
> >>>>     needed for fast_gup
> >>>>
> >>>> The assumption that a TLB flush will also produce an IPI is wrong on s390,
> >>>> and maybe also on other architectures, and I thought that this was actually
> >>>> the main reason for having an arch-specific pmdp_splitting_flush().
> >>>>
> >>>> At least PowerPC and ARM also had an individual implementation of
> >>>> pmdp_splitting_flush() that used kick_all_cpus_sync() instead of a TLB
> >>>> flush to send the IPI, and those were also removed. Putting the arch
> >>>> maintainers and mailing lists on cc to verify.
> >>>>
> >>>> On s390 this will break the IPI serialization against fast_gup, which
> >>>> would certainly explain the random kernel crashes, please revert or fix
> >>>> the pmdp_splitting_flush() removal.
> >>>
> >>> Sorry for that.
> >>>
> >>> I believe, the problem was already addressed for PowerPC:
> >>>
> >>> http://lkml.kernel.org/g/454980831-16631-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com
> >>>
> >>> I think kick_all_cpus_sync() in arch-specific pmdp_invalidate() would do
> >>> the trick, right?
> >>
> >> Hmm, not sure about that. After pmdp_invalidate(), a pmd_none() check in
> >> fast_gup will still return false, because the pmd is not empty (at least
> >> on s390). So I don't see spontaneously how it will help fast_gup to break
> >> out to the slow path in case of THP splitting.
> > 
> > What pmdp_flush_direct() does in pmdp_invalidate()? It's hard to unwrap for me :-/
> > Does it make the pmd !pmd_present()?
> 
> It uses the idte instruction, which in an atomic fashion flushes the associated
> TLB entry and changes the value of the pmd entry to invalid. This comes from the
> HW requirement to not  change a PTE/PMD that might be still in use, other than 
> with special instructions that does the tlb handling and the invalidation together.

Correct, and it does _not_ make the pmd !pmd_present(), that would only be the
case after a _clear_flush(). It only marks the pmd as invalid and flushes,
so that it cannot generate a new TLB entry before the following pmd_populate(),
but it keeps its other content. This is to fulfill the requirements outlined in
the comment in mm/huge_memory.c before the call to pmdp_invalidate(). And
independent from that comment, we would need such an _invalidate() or
_clear_flush() on s390 before the pmd_populate() because of the HW details
that Christian described.

Reading the comment again, I now notice that it also says "mark the current
pmd notpresent", which we cannot do w/o losing the huge and (formerly) splitting
bits, but that also shouldn't be needed to provide the "single TLB guarantee"
that the comment requires. So, a pmd_present() check on s390 in this state
would still return true. I'm not sure yet whether this is a problem and need to
think about it more: this behavior was already present before the THP rework,
but maybe it was OK before and is not OK now.

At least for fast_gup this should not be a problem though.

> (It also does some other magic to the attach_count, which might hold off
> finish_arch_post_lock_switch while some flushing is happening, but this should
> be unrelated here)
> 
> 
> > I'm also confused that pmd_none() is equal to !pmd_present() on s390. Hm?
> 
> Don't know, Gerald or Martin?

The implementation frequently changes depending on how many new bits Martin
needs to squeeze out :-)
We don't have a _PAGE_PRESENT bit for pmds, so pmd_present() just checks if the
entry is not empty. pmd_none() of course does the opposite, it checks if it is
empty.
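
To make the distinction above concrete, here is a minimal userspace sketch of the three pmd states discussed in this mail: a mapped huge pmd, the same pmd after an _invalidate(), and the pmd after a _clear_flush(). All names and bit values (model_pmd_t, MODEL_ENTRY_INVALID and so on) are made up for illustration and are not the s390 definitions; the only property taken from the discussion is that "present" means "not empty" and that invalidating keeps the other content of the entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: names and bit values are hypothetical, not the s390 ones. */
typedef struct { uint64_t val; } model_pmd_t;

#define MODEL_ENTRY_INVALID 0x20ULL              /* "do not use for translation" */
#define MODEL_ENTRY_EMPTY   MODEL_ENTRY_INVALID  /* empty: nothing but the invalid bit */
#define MODEL_ENTRY_LARGE   0x400ULL             /* marks a huge mapping */
#define MODEL_ORIGIN_MASK   (~0xfffffULL)        /* 1 MB aligned origin */

/* No separate present bit: "present" simply means "not empty". */
static bool model_pmd_none(model_pmd_t pmd)
{
	return pmd.val == MODEL_ENTRY_EMPTY;
}

static bool model_pmd_present(model_pmd_t pmd)
{
	return pmd.val != MODEL_ENTRY_EMPTY;
}

/* Build a valid huge pmd pointing at the given origin. */
static model_pmd_t model_mk_huge_pmd(uint64_t origin)
{
	model_pmd_t pmd;

	pmd.val = (origin & MODEL_ORIGIN_MASK) | MODEL_ENTRY_LARGE;
	return pmd;
}

/* _invalidate(): set the invalid bit, keep all other content. */
static model_pmd_t model_pmdp_invalidate(model_pmd_t pmd)
{
	pmd.val |= MODEL_ENTRY_INVALID;
	return pmd;
}

/* _clear_flush(): replace the entry with the empty value. */
static model_pmd_t model_pmdp_clear(model_pmd_t pmd)
{
	pmd.val = MODEL_ENTRY_EMPTY;
	return pmd;
}
```

In this model an invalidated huge pmd still has model_pmd_present() == true and model_pmd_none() == false; only the cleared entry reads as none.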


^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-12 17:16           ` Gerald Schaefer
  (?)
@ 2016-02-12 23:15             ` Kirill A. Shutemov
  -1 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2016-02-12 23:15 UTC (permalink / raw)
  To: Gerald Schaefer, Andrea Arcangeli
  Cc: Christian Borntraeger, Kirill A. Shutemov, linux-mm,
	linux-kernel, Aneesh Kumar K.V, Andrew Morton, Linus Torvalds,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	linuxppc-dev, Catalin Marinas, Will Deacon, linux-arm-kernel,
	Martin Schwidefsky, Heiko Carstens, linux-s390, Sebastian Ott

On Fri, Feb 12, 2016 at 06:16:40PM +0100, Gerald Schaefer wrote:
> On Fri, 12 Feb 2016 16:57:27 +0100
> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> 
> > On 02/12/2016 04:41 PM, Kirill A. Shutemov wrote:
> > > On Thu, Feb 11, 2016 at 08:57:02PM +0100, Gerald Schaefer wrote:
> > >> On Thu, 11 Feb 2016 21:09:42 +0200
> > >> "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> > >>
> > >>> On Thu, Feb 11, 2016 at 07:22:23PM +0100, Gerald Schaefer wrote:
> > >>>> Hi,
> > >>>>
> > >>>> Sebastian Ott reported random kernel crashes beginning with v4.5-rc1 and
> > >>>> he also bisected this to commit 61f5d698 "mm: re-enable THP". Further
> > >>>> review of the THP rework patches, which cannot be bisected, revealed
> > >>>> commit fecffad "s390, thp: remove infrastructure for handling splitting PMDs"
> > >>>> (and also similar commits for other archs).
> > >>>>
> > >>>> This commit removes the THP splitting bit and also the architecture
> > >>>> implementation of pmdp_splitting_flush(), which took care of the IPI for
> > >>>> fast_gup serialization. The commit message says
> > >>>>
> > >>>>     pmdp_splitting_flush() is not needed too: on splitting PMD we will do
> > >>>>     pmdp_clear_flush() + set_pte_at().  pmdp_clear_flush() will do IPI as
> > >>>>     needed for fast_gup
> > >>>>
> > >>>> The assumption that a TLB flush will also produce an IPI is wrong on s390,
> > >>>> and maybe also on other architectures, and I thought that this was actually
> > >>>> the main reason for having an arch-specific pmdp_splitting_flush().
> > >>>>
> > >>>> At least PowerPC and ARM also had an individual implementation of
> > >>>> pmdp_splitting_flush() that used kick_all_cpus_sync() instead of a TLB
> > >>>> flush to send the IPI, and those were also removed. Putting the arch
> > >>>> maintainers and mailing lists on cc to verify.
> > >>>>
> > >>>> On s390 this will break the IPI serialization against fast_gup, which
> > >>>> would certainly explain the random kernel crashes, please revert or fix
> > >>>> the pmdp_splitting_flush() removal.
> > >>>
> > >>> Sorry for that.
> > >>>
> > >>> I believe, the problem was already addressed for PowerPC:
> > >>>
> > >>> http://lkml.kernel.org/g/454980831-16631-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com
> > >>>
> > >>> I think kick_all_cpus_sync() in arch-specific pmdp_invalidate() would do
> > >>> the trick, right?
> > >>
> > >> Hmm, not sure about that. After pmdp_invalidate(), a pmd_none() check in
> > >> fast_gup will still return false, because the pmd is not empty (at least
> > >> on s390). So I don't see spontaneously how it will help fast_gup to break
> > >> out to the slow path in case of THP splitting.
> > > 
> > > > What does pmdp_flush_direct() do in pmdp_invalidate()? It's hard for me to unwrap :-/
> > > > Does it make the pmd !pmd_present()?
> > 
> > > It uses the idte instruction, which atomically flushes the associated
> > > TLB entry and changes the value of the pmd entry to invalid. This comes from the
> > > HW requirement to not change a PTE/PMD that might still be in use, other than
> > > with special instructions that do the TLB handling and the invalidation together.
> 
> Correct, and it does _not_ make the pmd !pmd_present(), that would only be the
> case after a _clear_flush(). It only marks the pmd as invalid and flushes,
> so that it cannot generate a new TLB entry before the following pmd_populate(),
> but it keeps its other content. This is to fulfill the requirements outlined in
> the comment in mm/huge_memory.c before the call to pmdp_invalidate(). And
> independent from that comment, we would need such an _invalidate() or
> _clear_flush() on s390 before the pmd_populate() because of the HW details
> that Christian described.
> 
> Reading the comment again, I do now notice that it also says "mark the current
> pmd notpresent", which we cannot do w/o losing the huge and (formerly) splitting
> bits, but it also shouldn't be needed to provide the "single TLB guarantee" that
> is required by the comment. So, a pmd_present() check on s390 in this state
> would still return true. I am not sure yet whether this is a problem and need
> to think about it more. This behavior was already present before the THP
> rework, but maybe it was OK before and is not OK now.
> 
> At least for fast_gup this should not be a problem though.

I'm trying to wrap my head around the issue and I don't think missing
serialization with gup_fast is the cause -- we just don't need it
anymore.

Previously, __split_huge_page_splitting() required serialization against
gup_fast to make sure nobody could obtain a new reference to the page after
__split_huge_page_splitting() returned. This was a way to stabilize page
references before starting to distribute them from the head page to the tail
pages.

With the new refcounting, we don't care about this. Splitting a PMD is now
decoupled from splitting the underlying compound page. It's okay to get new
pins after split_huge_pmd(). To stabilize page references during
split_huge_page() we rely on setting up migration entries once all
pmds are split into page table entries.
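
As a rough illustration of the decoupling described above, here is a toy userspace model. Every name in it (struct model_thp, model_split_huge_pmd() and so on) is hypothetical and not a kernel interface. It also assumes that a page split which still finds extra pins after freezing the mappings simply backs off and reports "busy"; that detail is an assumption of the sketch, not something stated in this mail.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model; field and function names are hypothetical, not kernel APIs. */
struct model_thp {
	int  pins;        /* extra references, e.g. taken by gup_fast */
	bool pmd_mapped;  /* still mapped by a huge pmd? */
	bool frozen;      /* mappings replaced by migration entries */
	bool split;       /* compound page split into small pages */
};

/* Splitting the pmd only rewrites the mapping; it never looks at pins. */
static void model_split_huge_pmd(struct model_thp *p)
{
	p->pmd_mapped = false;
}

static void model_pin(struct model_thp *p)
{
	p->pins++;
}

static void model_unpin(struct model_thp *p)
{
	p->pins--;
}

/*
 * Splitting the page: split the pmd first, then freeze the mappings
 * (stand-in for migration entries), and only then look at the pins.
 * Returns 0 on success, -1 if extra pins remain (assumed behaviour).
 */
static int model_split_huge_page(struct model_thp *p)
{
	model_split_huge_pmd(p);
	p->frozen = true;
	if (p->pins > 0) {
		p->frozen = false;	/* undo the freeze, caller may retry */
		return -1;
	}
	p->split = true;
	p->frozen = false;
	return 0;
}
```

In this model a pin taken after model_split_huge_pmd() is harmless: it only makes a later model_split_huge_page() back off until the pin is dropped.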

The theory that serialization against gup_fast is not the root cause of the
crashes is consistent with there being no crashes on arm64. The problem is
somewhere else.
 
> > (It also does some other magic to the attach_count, which might hold off
> > finish_arch_post_lock_switch while some flushing is happening, but this should
> > be unrelated here)
> > 
> > 
> > > I'm also confused that pmd_none() is equal to !pmd_present() on s390. Hm?
> > 
> > Don't know, Gerald or Martin?
> 
> The implementation frequently changes depending on how many new bits Martin
> needs to squeeze out :-)

One bit was freed up by the commit you've pointed to as a cause.
I wonder if it's possible that I screwed something up while removing it? I
don't see it, but who knows.

Could you check if revert of fecffad25458 helps?

And could you share what the crashes look like? I haven't seen backtraces yet.

> We don't have a _PAGE_PRESENT bit for pmds, so pmd_present() just checks if the
> entry is not empty. pmd_none() of course does the opposite, it checks if it is
> empty.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 153+ messages in thread

* [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
@ 2016-02-12 23:15             ` Kirill A. Shutemov
  0 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2016-02-12 23:15 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Feb 12, 2016 at 06:16:40PM +0100, Gerald Schaefer wrote:
> On Fri, 12 Feb 2016 16:57:27 +0100
> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> 
> > On 02/12/2016 04:41 PM, Kirill A. Shutemov wrote:
> > > On Thu, Feb 11, 2016 at 08:57:02PM +0100, Gerald Schaefer wrote:
> > >> On Thu, 11 Feb 2016 21:09:42 +0200
> > >> "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> > >>
> > >>> On Thu, Feb 11, 2016 at 07:22:23PM +0100, Gerald Schaefer wrote:
> > >>>> Hi,
> > >>>>
> > >>>> Sebastian Ott reported random kernel crashes beginning with v4.5-rc1 and
> > >>>> he also bisected this to commit 61f5d698 "mm: re-enable THP". Further
> > >>>> review of the THP rework patches, which cannot be bisected, revealed
> > >>>> commit fecffad "s390, thp: remove infrastructure for handling splitting PMDs"
> > >>>> (and also similar commits for other archs).
> > >>>>
> > >>>> This commit removes the THP splitting bit and also the architecture
> > >>>> implementation of pmdp_splitting_flush(), which took care of the IPI for
> > >>>> fast_gup serialization. The commit message says
> > >>>>
> > >>>>     pmdp_splitting_flush() is not needed too: on splitting PMD we will do
> > >>>>     pmdp_clear_flush() + set_pte_at().  pmdp_clear_flush() will do IPI as
> > >>>>     needed for fast_gup
> > >>>>
> > >>>> The assumption that a TLB flush will also produce an IPI is wrong on s390,
> > >>>> and maybe also on other architectures, and I thought that this was actually
> > >>>> the main reason for having an arch-specific pmdp_splitting_flush().
> > >>>>
> > >>>> At least PowerPC and ARM also had an individual implementation of
> > >>>> pmdp_splitting_flush() that used kick_all_cpus_sync() instead of a TLB
> > >>>> flush to send the IPI, and those were also removed. Putting the arch
> > >>>> maintainers and mailing lists on cc to verify.
> > >>>>
> > >>>> On s390 this will break the IPI serialization against fast_gup, which
> > >>>> would certainly explain the random kernel crashes, please revert or fix
> > >>>> the pmdp_splitting_flush() removal.
> > >>>
> > >>> Sorry for that.
> > >>>
> > >>> I believe, the problem was already addressed for PowerPC:
> > >>>
> > >>> http://lkml.kernel.org/g/454980831-16631-1-git-send-email-aneesh.kumar at linux.vnet.ibm.com
> > >>>
> > >>> I think kick_all_cpus_sync() in arch-specific pmdp_invalidate() would do
> > >>> the trick, right?
> > >>
> > >> Hmm, not sure about that. After pmdp_invalidate(), a pmd_none() check in
> > >> fast_gup will still return false, because the pmd is not empty (at least
> > >> on s390). So I don't see spontaneously how it will help fast_gup to break
> > >> out to the slow path in case of THP splitting.
> > > 
> > > What pmdp_flush_direct() does in pmdp_invalidate()? It's hard to unwrap for me :-/
> > > Does it make the pmd !pmd_present()?
> > 
> > It uses the idte instruction, which in an atomic fashion flushes the associated
> > TLB entry and changes the value of the pmd entry to invalid. This comes from the
> > HW requirement to not  change a PTE/PMD that might be still in use, other than 
> > with special instructions that does the tlb handling and the invalidation together.
> 
> Correct, and it does _not_ make the pmd !pmd_present(), that would only be the
> case after a _clear_flush(). It only marks the pmd as invalid and flushes,
> so that it cannot generate a new TLB entry before the following pmd_populate(),
> but it keeps its other content. This is to fulfill the requirements outlined in
> the comment in mm/huge_memory.c before the call to pmdp_invalidate(). And
> independent from that comment, we would need such an _invalidate() or
> _clear_flush() on s390 before the pmd_populate() because of the HW details
> that Christian described.
> 
> Reading the comment again, I do now notice that it also says "mark the current
> pmd notpresent", which we cannot do w/o losing the huge and (formerly) splitting
> bits, but it also shouldn't be needed to provide the "single TLB guarantee" that
> is required from the comment. So, a pmd_present() check on s390 in this state
> would still return true. Not sure yet if this is a problem, need more thinking,
> this behavior was already present before the THP rework but maybe it was OK
> before and is not OK now.
> 
> At least for fast_gup this should not be a problem though.

I'm trying to wrap my head around the issue and I don't think missing
serialization with gup_fast is the cause -- we just don't need it
anymore.

Previously, __split_huge_page_splitting() required serialization against
gup_fast to make sure nobody can obtain new reference to the page after
__split_huge_page_splitting() returns. This was a way to stabilize page
references before starting to distribute them from head page to tail
pages.

With new refcounting, we don't care about this. Splitting PMD is now
decoupled from splitting underlying compound page. It's okay to get new
pins after split_huge_pmd(). To stabilize page references during
split_huge_page() we rely on setting up migration entries once all
pmds are split into page table entries.

The theory that serialization against gup_fast is not a root cause of the
crashes is consistent no crashes on arm64. Problem is somewhere else.
 
> > (It also does some some other magic to the attach_count, which might hold off
> > finish_arch_post_lock_switch while some flushing is happening, but this should
> > be unrelated here)
> > 
> > 
> > > I'm also confused by pmd_none() is equal to !pmd_present() on s390. Hm?
> > 
> > Don't know, Gerald or Martin?
> 
> The implementation frequently changes depending on how many new bits Martin
> needs to squeeze out :-)

One bit was freed up by the commit you've pointed to as a cause.
I wonder if it's possible that something got screwed up while removing it?
I don't see it, but who knows.

Could you check if revert of fecffad25458 helps?

And could you share what the crashes look like? I haven't seen backtraces yet.

> We don't have a _PAGE_PRESENT bit for pmds, so pmd_present() just checks if the
> entry is not empty. pmd_none() of course does the opposite, it checks if it is
> empty.
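
To make the quoted point concrete, here is a toy userspace model of "no
separate present bit". It is only a sketch: the toy_* names and the empty
marker value are assumptions made up for illustration, not the s390 kernel
definitions. It shows that any non-empty entry counts as present, so the
two checks are exact complements.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical marker for an empty entry; value chosen for illustration only. */
#define TOY_PMD_EMPTY 0x20UL

/* Toy stand-in for a pmd entry; not the kernel's pmd_t. */
typedef struct { unsigned long val; } toy_pmd_t;

/* "Present" means only "not empty": there is no dedicated present bit. */
static bool toy_pmd_present(toy_pmd_t pmd)
{
	return pmd.val != TOY_PMD_EMPTY;
}

/* Exact complement of toy_pmd_present(). */
static bool toy_pmd_none(toy_pmd_t pmd)
{
	return pmd.val == TOY_PMD_EMPTY;
}
```

In this model an entry that is in the middle of being split, but not yet
cleared, still reports present, because its value is not the empty marker.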

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-12 23:15             ` Kirill A. Shutemov
  (?)
@ 2016-02-13 11:58               ` Sebastian Ott
  -1 siblings, 0 replies; 153+ messages in thread
From: Sebastian Ott @ 2016-02-13 11:58 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Gerald Schaefer, Andrea Arcangeli, Christian Borntraeger,
	Kirill A. Shutemov, linux-mm, linux-kernel, Aneesh Kumar K.V,
	Andrew Morton, Linus Torvalds, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	Catalin Marinas, Will Deacon, linux-arm-kernel,
	Martin Schwidefsky, Heiko Carstens, linux-s390

[-- Attachment #1: Type: text/plain, Size: 14184 bytes --]


On Sat, 13 Feb 2016, Kirill A. Shutemov wrote:
> Could you check if revert of fecffad25458 helps?

I reverted fecffad25458 on top of 721675fcf277cf - it oopsed with:

[ 1851.721062] Unable to handle kernel pointer dereference in virtual kernel address space
[ 1851.721075] failing address: 0000000000000000 TEID: 0000000000000483
[ 1851.721078] Fault in home space mode while using kernel ASCE.
[ 1851.721085] AS:0000000000d5c007 R3:00000000ffff0007 S:00000000ffffa800 P:000000000000003d
[ 1851.721128] Oops: 0004 ilc:3 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 1851.721135] Modules linked in: bridge stp llc btrfs mlx4_ib mlx4_en ib_sa ib_mad vxlan xor ip6_udp_tunnel ib_core udp_tunnel ptp pps_core ib_addr ghash_s390 raid6_pq prng ecb aes_s390 mlx4_core des_s390 des_generic genwqe_card sha512_s390 sha256_s390 sha1_s390 sha_common crc_itu_t dm_mod scm_block vhost_net tun vhost eadm_sch macvtap macvlan kvm autofs4
[ 1851.721183] CPU: 7 PID: 256422 Comm: bash Not tainted 4.5.0-rc3-00058-g07923d7-dirty #178
[ 1851.721186] task: 000000007fbfd290 ti: 000000008c604000 task.ti: 000000008c604000
[ 1851.721189] Krnl PSW : 0704d00180000000 000000000045d3b8 (__rb_erase_color+0x280/0x308)
[ 1851.721200]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3
               Krnl GPRS: 0000000000000001 0000000000000020 0000000000000000 00000000bd07eff1
[ 1851.721205]            000000000027ca10 0000000000000000 0000000083e45898 0000000077b61198
[ 1851.721207]            000000007ce1a490 00000000bd07eff0 000000007ce1a548 000000000027ca10
[ 1851.721210]            00000000bd07c350 00000000bd07eff0 000000008c607aa8 000000008c607a68
[ 1851.721221] Krnl Code: 000000000045d3aa: e3c0d0080024       stg     %%r12,8(%%r13)
                          000000000045d3b0: b9040039           lgr     %%r3,%%r9
                         #000000000045d3b4: a53b0001           oill    %%r3,1
                         >000000000045d3b8: e33010000024       stg     %%r3,0(%%r1)
                          000000000045d3be: ec28000e007c       cgij    %%r2,0,8,45d3da
                          000000000045d3c4: e34020000004       lg      %%r4,0(%%r2)
                          000000000045d3ca: b904001c           lgr     %%r1,%%r12
                          000000000045d3ce: ec143f3f0056       rosbg   %%r1,%%r4,63,63,0
[ 1851.721269] Call Trace:
[ 1851.721273] ([<0000000083e45898>] 0x83e45898)
[ 1851.721279]  [<000000000029342a>] unlink_anon_vmas+0x9a/0x1d8
[ 1851.721282]  [<0000000000283f34>] free_pgtables+0xcc/0x148
[ 1851.721285]  [<000000000028c376>] exit_mmap+0xd6/0x300
[ 1851.721289]  [<0000000000134db8>] mmput+0x90/0x118
[ 1851.721294]  [<00000000002d76bc>] flush_old_exec+0x5d4/0x700
[ 1851.721298]  [<00000000003369f4>] load_elf_binary+0x2f4/0x13e8
[ 1851.721301]  [<00000000002d6e4a>] search_binary_handler+0x9a/0x1f8
[ 1851.721304]  [<00000000002d8970>] do_execveat_common.isra.32+0x668/0x9a0
[ 1851.721307]  [<00000000002d8cec>] do_execve+0x44/0x58
[ 1851.721310]  [<00000000002d8f92>] SyS_execve+0x3a/0x48
[ 1851.721315]  [<00000000006fb096>] system_call+0xd6/0x258
[ 1851.721317]  [<000003ff997436d6>] 0x3ff997436d6
[ 1851.721319] INFO: lockdep is turned off.
[ 1851.721321] Last Breaking-Event-Address:
[ 1851.721323]  [<000000000045d31a>] __rb_erase_color+0x1e2/0x308
[ 1851.721327]
[ 1851.721329] ---[ end trace 0d80041ac00cfae2 ]---


> 
> And could you share what the crashes look like? I haven't seen backtraces yet.
> 

Sure. I didn't because they really looked random to me. Most of the time
they hit in rcu or list debugging, but I thought those were just the messenger
observing a corruption first. Anyhow, here is an older one that might look
interesting:

[   59.851421] list_del corruption. next->prev should be 000000006e1eb000, but was 0000000000000400
[   59.851469] ------------[ cut here ]------------
[   59.851472] WARNING: at lib/list_debug.c:71
[   59.851475] Modules linked in: bridge stp llc btrfs xor mlx4_en vxlan ip6_udp_tunnel udp_tunnel mlx4_ib ptp pps_core ib_sa ib_mad ib_core ib_addr ghash_s390 prng raid6_pq ecb aes_s390 des_s390 des_generic sha512_s390 sha256_s390 sha1_s390 mlx4_core sha_common genwqe_card scm_block crc_itu_t vhost_net tun vhost dm_mod macvtap eadm_sch macvlan kvm autofs4
[   59.851532] CPU: 0 PID: 5400 Comm: git Not tainted 4.4.0-07794-ga4eff16-dirty #77
[   59.851535] task: 00000000d2310000 ti: 00000000d6610000 task.ti: 00000000d6610000
[   59.851539] Krnl PSW : 0704c00180000000 0000000000487434 (__list_del_entry+0xa4/0xe0)
[   59.851548]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
               Krnl GPRS: 0000000001a7a1cf 00000000d2310000 0000000000000054 0000000000000001
[   59.851554]            0000000000487430 0000000000000000 0000000000000000 00000000774e6900
[   59.851557]            000003ff53000000 000000006d4017a0 000003ff52f00000 000003ff52f00000
[   59.851560]            000003d101780000 000000006e1eb000 0000000000487430 00000000d6613b00
[   59.851571] Krnl Code: 0000000000487424: c02000219e3a	larl	%%r2,8bb098
                          000000000048742a: c0e5ffee05db	brasl	%%r14,247fe0
                         #0000000000487430: a7f40001		brc	15,487432
                         >0000000000487434: a7f40017		brc	15,487462
                          0000000000487438: a7390200		lghi	%%r3,512
                          000000000048743c: ec13ffd28064	cgrj	%%r1,%%r3,8,4873e0
                          0000000000487442: e32010000020	cg	%%r2,0(%%r1)
                          0000000000487448: a774ffda		brc	7,4873fc
[   59.851615] Call Trace:
[   59.851618] ([<0000000000487430>] __list_del_entry+0xa0/0xe0)
[   59.851621]  [<0000000000487498>] list_del+0x28/0x40
[   59.851627]  [<00000000001259ec>] pgtable_trans_huge_withdraw+0x74/0x90
[   59.851632]  [<00000000002bf234>] __split_huge_pmd_locked+0x3ec/0xa10
[   59.851635]  [<00000000002c4310>] __split_huge_pmd+0x118/0x218
[   59.851639]  [<00000000002810e8>] unmap_single_vma+0x2d8/0xb40
[   59.851643]  [<0000000000282d66>] zap_page_range+0x116/0x318
[   59.851646]  [<000000000029b834>] SyS_madvise+0x23c/0x5e8
[   59.851652]  [<00000000006f9f56>] system_call+0xd6/0x258
[   59.851656]  [<000003ff9bbfd282>] 0x3ff9bbfd282
[   59.851658] 2 locks held by git/5400:
[   59.851660]  #0:  (&mm->mmap_sem){++++++}, at: [<000000000029bb5a>] SyS_madvise+0x562/0x5e8
[   59.851670]  #1:  (&(ptlock_ptr(page))->rlock){+.+...}, at: [<00000000002c4268>] __split_huge_pmd+0x70/0x218
[   59.851679] Last Breaking-Event-Address:
[   59.851682]  [<0000000000487430>] __list_del_entry+0xa0/0xe0
[   59.851686] ---[ end trace 7bce9a4f571985b6 ]---
[   59.875754] list_del corruption. prev->next should be 000000006e1eb820, but was           (null)
[   59.875768] ------------[ cut here ]------------
[   59.875771] WARNING: at lib/list_debug.c:68
[   59.875774] Modules linked in: bridge stp llc btrfs xor mlx4_en vxlan ip6_udp_tunnel udp_tunnel mlx4_ib ptp pps_core ib_sa ib_mad ib_core ib_addr ghash_s390 prng raid6_pq ecb aes_s390 des_s390 des_generic sha512_s390 sha256_s390 sha1_s390 mlx4_core sha_common genwqe_card scm_block crc_itu_t vhost_net tun vhost dm_mod macvtap eadm_sch macvlan kvm autofs4
[   59.875820] CPU: 2 PID: 5402 Comm: git Tainted: G        W       4.4.0-07794-ga4eff16-dirty #77
[   59.875823] task: 00000000d2312948 ti: 00000000cfecc000 task.ti: 00000000cfecc000
[   59.875826] Krnl PSW : 0704c00180000000 0000000000487416 (__list_del_entry+0x86/0xe0)
[   59.875832]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
               Krnl GPRS: 0000000001a7a1cf 00000000d2312948 0000000000000054 0000000000000001
[   59.875838]            0000000000487412 0000000000000000 0000000000000000 00000000774e6900
[   59.875841]            000003ff52000000 000000006d403b10 000003ff51f00000 000003ff51f00000
[   59.875843]            000003d10177c000 000000006e1eb820 0000000000487412 00000000cfecfb00
[   59.875851] Krnl Code: 0000000000487406: c02000219e2c	larl	%%r2,8bb05e
                          000000000048740c: c0e5ffee05ea	brasl	%%r14,247fe0
                         #0000000000487412: a7f40001		brc	15,487414
                         >0000000000487416: a7f40026		brc	15,487462
                          000000000048741a: b9040032		lgr	%%r3,%%r2
                          000000000048741e: e34040080004	lg	%%r4,8(%%r4)
                          0000000000487424: c02000219e3a	larl	%%r2,8bb098
                          000000000048742a: c0e5ffee05db	brasl	%%r14,247fe0
[   59.875874] Call Trace:
[   59.875876] ([<0000000000487412>] __list_del_entry+0x82/0xe0)
[   59.875879]  [<0000000000487498>] list_del+0x28/0x40
[   59.875882]  [<00000000001259ec>] pgtable_trans_huge_withdraw+0x74/0x90
[   59.875885]  [<00000000002bf234>] __split_huge_pmd_locked+0x3ec/0xa10
[   59.875888]  [<00000000002c4310>] __split_huge_pmd+0x118/0x218
[   59.875891]  [<00000000002810e8>] unmap_single_vma+0x2d8/0xb40
[   59.875894]  [<0000000000282d66>] zap_page_range+0x116/0x318
[   59.875896]  [<000000000029b834>] SyS_madvise+0x23c/0x5e8
[   59.875899]  [<00000000006f9f56>] system_call+0xd6/0x258
[   59.875902]  [<000003ff9bbfd282>] 0x3ff9bbfd282
[   59.875904] 2 locks held by git/5402:
[   59.875906]  #0:  (&mm->mmap_sem){++++++}, at: [<000000000029bb5a>] SyS_madvise+0x562/0x5e8
[   59.875914]  #1:  (&(ptlock_ptr(page))->rlock){+.+...}, at: [<00000000002c4268>] __split_huge_pmd+0x70/0x218
[   59.875922] Last Breaking-Event-Address:
[   59.875925]  [<0000000000487412>] __list_del_entry+0x82/0xe0
[   59.875927] ---[ end trace 7bce9a4f571985b7 ]---
[   59.875935] ------------[ cut here ]------------
[   59.875937] kernel BUG at mm/huge_memory.c:2884!
[   59.875979] illegal operation: 0001 ilc:1 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[   59.875986] Modules linked in: bridge stp llc btrfs xor mlx4_en vxlan ip6_udp_tunnel udp_tunnel mlx4_ib ptp pps_core ib_sa ib_mad ib_core ib_addr ghash_s390 prng raid6_pq ecb aes_s390 des_s390 des_generic sha512_s390 sha256_s390 sha1_s390 mlx4_core sha_common genwqe_card scm_block crc_itu_t vhost_net tun vhost dm_mod macvtap eadm_sch macvlan kvm autofs4
[   59.876033] CPU: 2 PID: 5402 Comm: git Tainted: G        W       4.4.0-07794-ga4eff16-dirty #77
[   59.876036] task: 00000000d2312948 ti: 00000000cfecc000 task.ti: 00000000cfecc000
[   59.876039] Krnl PSW : 0704d00180000000 00000000002bf3aa (__split_huge_pmd_locked+0x562/0xa10)
[   59.876045]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3
               Krnl GPRS: 0000000001a7a1cf 000003d10177c000 0000000000044068 000000005df00215
[   59.876051]            0000000000000001 0000000000000001 0000000000000000 00000000774e6900
[   59.876054]            000003ff52000000 000000006d403b10 000000006e1eb800 000003ff51f00000
[   59.876058]            000003d10177c000 0000000000715190 00000000002bf234 00000000cfecfb58
[   59.876068] Krnl Code: 00000000002bf39c: d507d010a000	clc	16(8,%%r13),0(%%r10)
                          00000000002bf3a2: a7840004		brc	8,2bf3aa
                         #00000000002bf3a6: a7f40001		brc	15,2bf3a8
                         >00000000002bf3aa: 91407440		tm	1088(%%r7),64
                          00000000002bf3ae: a7840208		brc	8,2bf7be
                          00000000002bf3b2: a7f401e9		brc	15,2bf784
                          00000000002bf3b6: 9104a006		tm	6(%%r10),4
                          00000000002bf3ba: a7740004		brc	7,2bf3c2
[   59.876089] Call Trace:
[   59.876092] ([<00000000002bf234>] __split_huge_pmd_locked+0x3ec/0xa10)
[   59.876095]  [<00000000002c4310>] __split_huge_pmd+0x118/0x218
[   59.876099]  [<00000000002810e8>] unmap_single_vma+0x2d8/0xb40
[   59.876102]  [<0000000000282d66>] zap_page_range+0x116/0x318
[   59.876105]  [<000000000029b834>] SyS_madvise+0x23c/0x5e8
[   59.876108]  [<00000000006f9f56>] system_call+0xd6/0x258
[   59.876111]  [<000003ff9bbfd282>] 0x3ff9bbfd282
[   59.876113] INFO: lockdep is turned off.
[   59.876115] Last Breaking-Event-Address:
[   59.876118]  [<00000000002bf3a6>] __split_huge_pmd_locked+0x55e/0xa10
[   59.876122]  
[   59.876124] ---[ end trace 7bce9a4f571985b8 ]---
[   59.876128] BUG: sleeping function called from invalid context at include/linux/sched.h:2791
[   59.876130] in_atomic(): 1, irqs_disabled(): 0, pid: 5402, name: git
[   59.876132] INFO: lockdep is turned off.
[   59.876134] Preemption disabled at:[<00000000002c4268>] __split_huge_pmd+0x70/0x218
[   59.876138] 
[   59.876141] CPU: 2 PID: 5402 Comm: git Tainted: G      D W       4.4.0-07794-ga4eff16-dirty #77
[   59.876144]        00000000cfecf610 00000000cfecf6a0 0000000000000002 0000000000000000 
                      00000000cfecf740 00000000cfecf6b8 00000000cfecf6b8 0000000000113402 
                      0000000000000000 000000000089ab4e 00000000008b0a84 0704d0010000000b 
                      00000000cfecf700 00000000cfecf6a0 0000000000000000 0000000000000000 
                      0000000000000000 0000000000113402 00000000cfecf6a0 00000000cfecf700 
[   59.876176] Call Trace:
[   59.876182] ([<000000000011330e>] show_trace+0x126/0x148)
[   59.876185]  [<00000000001133b8>] show_stack+0x88/0xe8
[   59.876189]  [<000000000045549a>] dump_stack+0x7a/0xd8
[   59.876193]  [<00000000001666c6>] ___might_sleep+0x236/0x248
[   59.876198]  [<000000000014a314>] exit_signals+0x3c/0x158
[   59.876202]  [<000000000013a4e0>] do_exit+0x140/0xd18
[   59.876206]  [<00000000001137c4>] die+0x164/0x170
[   59.876209]  [<0000000000100ac6>] do_report_trap+0x14e/0x160
[   59.876211]  [<0000000000100c94>] illegal_op+0x134/0x148
[   59.876214]  [<00000000006fa26c>] pgm_check_handler+0x15c/0x1b4
[   59.876217]  [<00000000002bf3aa>] __split_huge_pmd_locked+0x562/0xa10
[   59.876221] ([<00000000002bf234>] __split_huge_pmd_locked+0x3ec/0xa10)
[   59.876223]  [<00000000002c4310>] __split_huge_pmd+0x118/0x218
[   59.876226]  [<00000000002810e8>] unmap_single_vma+0x2d8/0xb40
[   59.876229]  [<0000000000282d66>] zap_page_range+0x116/0x318
[   59.876232]  [<000000000029b834>] SyS_madvise+0x23c/0x5e8
[   59.876235]  [<00000000006f9f56>] system_call+0xd6/0x258
[   59.876238]  [<000003ff9bbfd282>] 0x3ff9bbfd282
[   59.876240] INFO: lockdep is turned off.
[   59.876243] note: git[5402] exited with preempt_count 1
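
For anyone reading the two list_del warnings above: they report that a
neighbour of the entry being removed no longer points back at it. A minimal
userspace sketch of that kind of check follows. The toy_* names are
hypothetical and this is not the code in lib/list_debug.c; it only models
the idea of refusing to unlink when next->prev or prev->next does not match
the entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy doubly linked circular list node, modelled on the usual head/entry layout. */
struct toy_list_head {
	struct toy_list_head *next, *prev;
};

/* Make an empty list: the head points at itself in both directions. */
static void toy_list_init(struct toy_list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry directly after head. */
static void toy_list_add(struct toy_list_head *entry, struct toy_list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/*
 * Unlink entry only if both neighbours still point back at it.
 * Returns false and leaves the list untouched when either back-pointer
 * is wrong, which is the situation the warnings describe.
 */
static bool toy_list_del_checked(struct toy_list_head *entry)
{
	struct toy_list_head *prev = entry->prev;
	struct toy_list_head *next = entry->next;

	if (prev->next != entry)
		return false;	/* "prev->next should be entry, but was ..." */
	if (next->prev != entry)
		return false;	/* "next->prev should be entry, but was ..." */

	next->prev = prev;
	prev->next = next;
	return true;
}
```

In this model a check that fails means some other writer modified a
neighbouring node, so the caller that trips the check is not necessarily
the one that caused the corruption.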

^ permalink raw reply	[flat|nested] 153+ messages in thread

* [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
@ 2016-02-13 11:58               ` Sebastian Ott
  0 siblings, 0 replies; 153+ messages in thread
From: Sebastian Ott @ 2016-02-13 11:58 UTC (permalink / raw)
  To: linux-arm-kernel


On Sat, 13 Feb 2016, Kirill A. Shutemov wrote:
> Could you check if revert of fecffad25458 helps?

I reverted fecffad25458 on top of 721675fcf277cf - it oopsed with:

[ 1851.721062] Unable to handle kernel pointer dereference in virtual kernel address space
[ 1851.721075] failing address: 0000000000000000 TEID: 0000000000000483
[ 1851.721078] Fault in home space mode while using kernel ASCE.
[ 1851.721085] AS:0000000000d5c007 R3:00000000ffff0007 S:00000000ffffa800 P:000000000000003d
[ 1851.721128] Oops: 0004 ilc:3 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 1851.721135] Modules linked in: bridge stp llc btrfs mlx4_ib mlx4_en ib_sa ib_mad vxlan xor ip6_udp_tunnel ib_core udp_tunnel ptp pps_core ib_addr ghash_s390 raid6_pq prng ecb aes_s390 mlx4_core des_s390 des_generic genwqe_card sha512_s390 sha256_s390 sha1_s390 sha_common crc_itu_t dm_mod scm_block vhost_net tun vhost eadm_sch macvtap macvlan kvm autofs4
[ 1851.721183] CPU: 7 PID: 256422 Comm: bash Not tainted 4.5.0-rc3-00058-g07923d7-dirty #178
[ 1851.721186] task: 000000007fbfd290 ti: 000000008c604000 task.ti: 000000008c604000
[ 1851.721189] Krnl PSW : 0704d00180000000 000000000045d3b8 (__rb_erase_color+0x280/0x308)
[ 1851.721200]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3
               Krnl GPRS: 0000000000000001 0000000000000020 0000000000000000 00000000bd07eff1
[ 1851.721205]            000000000027ca10 0000000000000000 0000000083e45898 0000000077b61198
[ 1851.721207]            000000007ce1a490 00000000bd07eff0 000000007ce1a548 000000000027ca10
[ 1851.721210]            00000000bd07c350 00000000bd07eff0 000000008c607aa8 000000008c607a68
[ 1851.721221] Krnl Code: 000000000045d3aa: e3c0d0080024       stg     %%r12,8(%%r13)
                          000000000045d3b0: b9040039           lgr     %%r3,%%r9
                         #000000000045d3b4: a53b0001           oill    %%r3,1
                         >000000000045d3b8: e33010000024       stg     %%r3,0(%%r1)
                          000000000045d3be: ec28000e007c       cgij    %%r2,0,8,45d3da
                          000000000045d3c4: e34020000004       lg      %%r4,0(%%r2)
                          000000000045d3ca: b904001c           lgr     %%r1,%%r12
                          000000000045d3ce: ec143f3f0056       rosbg   %%r1,%%r4,63,63,0
[ 1851.721269] Call Trace:
[ 1851.721273] ([<0000000083e45898>] 0x83e45898)
[ 1851.721279]  [<000000000029342a>] unlink_anon_vmas+0x9a/0x1d8
[ 1851.721282]  [<0000000000283f34>] free_pgtables+0xcc/0x148
[ 1851.721285]  [<000000000028c376>] exit_mmap+0xd6/0x300
[ 1851.721289]  [<0000000000134db8>] mmput+0x90/0x118
[ 1851.721294]  [<00000000002d76bc>] flush_old_exec+0x5d4/0x700
[ 1851.721298]  [<00000000003369f4>] load_elf_binary+0x2f4/0x13e8
[ 1851.721301]  [<00000000002d6e4a>] search_binary_handler+0x9a/0x1f8
[ 1851.721304]  [<00000000002d8970>] do_execveat_common.isra.32+0x668/0x9a0
[ 1851.721307]  [<00000000002d8cec>] do_execve+0x44/0x58
[ 1851.721310]  [<00000000002d8f92>] SyS_execve+0x3a/0x48
[ 1851.721315]  [<00000000006fb096>] system_call+0xd6/0x258
[ 1851.721317]  [<000003ff997436d6>] 0x3ff997436d6
[ 1851.721319] INFO: lockdep is turned off.
[ 1851.721321] Last Breaking-Event-Address:
[ 1851.721323]  [<000000000045d31a>] __rb_erase_color+0x1e2/0x308
[ 1851.721327]
[ 1851.721329] ---[ end trace 0d80041ac00cfae2 ]---


> 
> And could you share what the crashes look like? I haven't seen backtraces yet.
> 

Sure. I didn't because they really looked random to me. Most of the time
they were in rcu or list debugging, but I thought these were just the
messenger observing a corruption first. Anyhow, here is an older one that
might look interesting:

[   59.851421] list_del corruption. next->prev should be 000000006e1eb000, but was 0000000000000400
[   59.851469] ------------[ cut here ]------------
[   59.851472] WARNING: at lib/list_debug.c:71
[   59.851475] Modules linked in: bridge stp llc btrfs xor mlx4_en vxlan ip6_udp_tunnel udp_tunnel mlx4_ib ptp pps_core ib_sa ib_mad ib_core ib_addr ghash_s390 prng raid6_pq ecb aes_s390 des_s390 des_generic sha512_s390 sha256_s390 sha1_s390 mlx4_core sha_common genwqe_card scm_block crc_itu_t vhost_net tun vhost dm_mod macvtap eadm_sch macvlan kvm autofs4
[   59.851532] CPU: 0 PID: 5400 Comm: git Not tainted 4.4.0-07794-ga4eff16-dirty #77
[   59.851535] task: 00000000d2310000 ti: 00000000d6610000 task.ti: 00000000d6610000
[   59.851539] Krnl PSW : 0704c00180000000 0000000000487434 (__list_del_entry+0xa4/0xe0)
[   59.851548]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
               Krnl GPRS: 0000000001a7a1cf 00000000d2310000 0000000000000054 0000000000000001
[   59.851554]            0000000000487430 0000000000000000 0000000000000000 00000000774e6900
[   59.851557]            000003ff53000000 000000006d4017a0 000003ff52f00000 000003ff52f00000
[   59.851560]            000003d101780000 000000006e1eb000 0000000000487430 00000000d6613b00
[   59.851571] Krnl Code: 0000000000487424: c02000219e3a	larl	%%r2,8bb098
                          000000000048742a: c0e5ffee05db	brasl	%%r14,247fe0
                         #0000000000487430: a7f40001		brc	15,487432
                         >0000000000487434: a7f40017		brc	15,487462
                          0000000000487438: a7390200		lghi	%%r3,512
                          000000000048743c: ec13ffd28064	cgrj	%%r1,%%r3,8,4873e0
                          0000000000487442: e32010000020	cg	%%r2,0(%%r1)
                          0000000000487448: a774ffda		brc	7,4873fc
[   59.851615] Call Trace:
[   59.851618] ([<0000000000487430>] __list_del_entry+0xa0/0xe0)
[   59.851621]  [<0000000000487498>] list_del+0x28/0x40
[   59.851627]  [<00000000001259ec>] pgtable_trans_huge_withdraw+0x74/0x90
[   59.851632]  [<00000000002bf234>] __split_huge_pmd_locked+0x3ec/0xa10
[   59.851635]  [<00000000002c4310>] __split_huge_pmd+0x118/0x218
[   59.851639]  [<00000000002810e8>] unmap_single_vma+0x2d8/0xb40
[   59.851643]  [<0000000000282d66>] zap_page_range+0x116/0x318
[   59.851646]  [<000000000029b834>] SyS_madvise+0x23c/0x5e8
[   59.851652]  [<00000000006f9f56>] system_call+0xd6/0x258
[   59.851656]  [<000003ff9bbfd282>] 0x3ff9bbfd282
[   59.851658] 2 locks held by git/5400:
[   59.851660]  #0:  (&mm->mmap_sem){++++++}, at: [<000000000029bb5a>] SyS_madvise+0x562/0x5e8
[   59.851670]  #1:  (&(ptlock_ptr(page))->rlock){+.+...}, at: [<00000000002c4268>] __split_huge_pmd+0x70/0x218
[   59.851679] Last Breaking-Event-Address:
[   59.851682]  [<0000000000487430>] __list_del_entry+0xa0/0xe0
[   59.851686] ---[ end trace 7bce9a4f571985b6 ]---
[   59.875754] list_del corruption. prev->next should be 000000006e1eb820, but was           (null)
[   59.875768] ------------[ cut here ]------------
[   59.875771] WARNING: at lib/list_debug.c:68
[   59.875774] Modules linked in: bridge stp llc btrfs xor mlx4_en vxlan ip6_udp_tunnel udp_tunnel mlx4_ib ptp pps_core ib_sa ib_mad ib_core ib_addr ghash_s390 prng raid6_pq ecb aes_s390 des_s390 des_generic sha512_s390 sha256_s390 sha1_s390 mlx4_core sha_common genwqe_card scm_block crc_itu_t vhost_net tun vhost dm_mod macvtap eadm_sch macvlan kvm autofs4
[   59.875820] CPU: 2 PID: 5402 Comm: git Tainted: G        W       4.4.0-07794-ga4eff16-dirty #77
[   59.875823] task: 00000000d2312948 ti: 00000000cfecc000 task.ti: 00000000cfecc000
[   59.875826] Krnl PSW : 0704c00180000000 0000000000487416 (__list_del_entry+0x86/0xe0)
[   59.875832]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
               Krnl GPRS: 0000000001a7a1cf 00000000d2312948 0000000000000054 0000000000000001
[   59.875838]            0000000000487412 0000000000000000 0000000000000000 00000000774e6900
[   59.875841]            000003ff52000000 000000006d403b10 000003ff51f00000 000003ff51f00000
[   59.875843]            000003d10177c000 000000006e1eb820 0000000000487412 00000000cfecfb00
[   59.875851] Krnl Code: 0000000000487406: c02000219e2c	larl	%%r2,8bb05e
                          000000000048740c: c0e5ffee05ea	brasl	%%r14,247fe0
                         #0000000000487412: a7f40001		brc	15,487414
                         >0000000000487416: a7f40026		brc	15,487462
                          000000000048741a: b9040032		lgr	%%r3,%%r2
                          000000000048741e: e34040080004	lg	%%r4,8(%%r4)
                          0000000000487424: c02000219e3a	larl	%%r2,8bb098
                          000000000048742a: c0e5ffee05db	brasl	%%r14,247fe0
[   59.875874] Call Trace:
[   59.875876] ([<0000000000487412>] __list_del_entry+0x82/0xe0)
[   59.875879]  [<0000000000487498>] list_del+0x28/0x40
[   59.875882]  [<00000000001259ec>] pgtable_trans_huge_withdraw+0x74/0x90
[   59.875885]  [<00000000002bf234>] __split_huge_pmd_locked+0x3ec/0xa10
[   59.875888]  [<00000000002c4310>] __split_huge_pmd+0x118/0x218
[   59.875891]  [<00000000002810e8>] unmap_single_vma+0x2d8/0xb40
[   59.875894]  [<0000000000282d66>] zap_page_range+0x116/0x318
[   59.875896]  [<000000000029b834>] SyS_madvise+0x23c/0x5e8
[   59.875899]  [<00000000006f9f56>] system_call+0xd6/0x258
[   59.875902]  [<000003ff9bbfd282>] 0x3ff9bbfd282
[   59.875904] 2 locks held by git/5402:
[   59.875906]  #0:  (&mm->mmap_sem){++++++}, at: [<000000000029bb5a>] SyS_madvise+0x562/0x5e8
[   59.875914]  #1:  (&(ptlock_ptr(page))->rlock){+.+...}, at: [<00000000002c4268>] __split_huge_pmd+0x70/0x218
[   59.875922] Last Breaking-Event-Address:
[   59.875925]  [<0000000000487412>] __list_del_entry+0x82/0xe0
[   59.875927] ---[ end trace 7bce9a4f571985b7 ]---
[   59.875935] ------------[ cut here ]------------
[   59.875937] kernel BUG at mm/huge_memory.c:2884!
[   59.875979] illegal operation: 0001 ilc:1 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[   59.875986] Modules linked in: bridge stp llc btrfs xor mlx4_en vxlan ip6_udp_tunnel udp_tunnel mlx4_ib ptp pps_core ib_sa ib_mad ib_core ib_addr ghash_s390 prng raid6_pq ecb aes_s390 des_s390 des_generic sha512_s390 sha256_s390 sha1_s390 mlx4_core sha_common genwqe_card scm_block crc_itu_t vhost_net tun vhost dm_mod macvtap eadm_sch macvlan kvm autofs4
[   59.876033] CPU: 2 PID: 5402 Comm: git Tainted: G        W       4.4.0-07794-ga4eff16-dirty #77
[   59.876036] task: 00000000d2312948 ti: 00000000cfecc000 task.ti: 00000000cfecc000
[   59.876039] Krnl PSW : 0704d00180000000 00000000002bf3aa (__split_huge_pmd_locked+0x562/0xa10)
[   59.876045]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3
               Krnl GPRS: 0000000001a7a1cf 000003d10177c000 0000000000044068 000000005df00215
[   59.876051]            0000000000000001 0000000000000001 0000000000000000 00000000774e6900
[   59.876054]            000003ff52000000 000000006d403b10 000000006e1eb800 000003ff51f00000
[   59.876058]            000003d10177c000 0000000000715190 00000000002bf234 00000000cfecfb58
[   59.876068] Krnl Code: 00000000002bf39c: d507d010a000	clc	16(8,%%r13),0(%%r10)
                          00000000002bf3a2: a7840004		brc	8,2bf3aa
                         #00000000002bf3a6: a7f40001		brc	15,2bf3a8
                         >00000000002bf3aa: 91407440		tm	1088(%%r7),64
                          00000000002bf3ae: a7840208		brc	8,2bf7be
                          00000000002bf3b2: a7f401e9		brc	15,2bf784
                          00000000002bf3b6: 9104a006		tm	6(%%r10),4
                          00000000002bf3ba: a7740004		brc	7,2bf3c2
[   59.876089] Call Trace:
[   59.876092] ([<00000000002bf234>] __split_huge_pmd_locked+0x3ec/0xa10)
[   59.876095]  [<00000000002c4310>] __split_huge_pmd+0x118/0x218
[   59.876099]  [<00000000002810e8>] unmap_single_vma+0x2d8/0xb40
[   59.876102]  [<0000000000282d66>] zap_page_range+0x116/0x318
[   59.876105]  [<000000000029b834>] SyS_madvise+0x23c/0x5e8
[   59.876108]  [<00000000006f9f56>] system_call+0xd6/0x258
[   59.876111]  [<000003ff9bbfd282>] 0x3ff9bbfd282
[   59.876113] INFO: lockdep is turned off.
[   59.876115] Last Breaking-Event-Address:
[   59.876118]  [<00000000002bf3a6>] __split_huge_pmd_locked+0x55e/0xa10
[   59.876122]  
[   59.876124] ---[ end trace 7bce9a4f571985b8 ]---
[   59.876128] BUG: sleeping function called from invalid context at include/linux/sched.h:2791
[   59.876130] in_atomic(): 1, irqs_disabled(): 0, pid: 5402, name: git
[   59.876132] INFO: lockdep is turned off.
[   59.876134] Preemption disabled at:[<00000000002c4268>] __split_huge_pmd+0x70/0x218
[   59.876138] 
[   59.876141] CPU: 2 PID: 5402 Comm: git Tainted: G      D W       4.4.0-07794-ga4eff16-dirty #77
[   59.876144]        00000000cfecf610 00000000cfecf6a0 0000000000000002 0000000000000000 
                      00000000cfecf740 00000000cfecf6b8 00000000cfecf6b8 0000000000113402 
                      0000000000000000 000000000089ab4e 00000000008b0a84 0704d0010000000b 
                      00000000cfecf700 00000000cfecf6a0 0000000000000000 0000000000000000 
                      0000000000000000 0000000000113402 00000000cfecf6a0 00000000cfecf700 
[   59.876176] Call Trace:
[   59.876182] ([<000000000011330e>] show_trace+0x126/0x148)
[   59.876185]  [<00000000001133b8>] show_stack+0x88/0xe8
[   59.876189]  [<000000000045549a>] dump_stack+0x7a/0xd8
[   59.876193]  [<00000000001666c6>] ___might_sleep+0x236/0x248
[   59.876198]  [<000000000014a314>] exit_signals+0x3c/0x158
[   59.876202]  [<000000000013a4e0>] do_exit+0x140/0xd18
[   59.876206]  [<00000000001137c4>] die+0x164/0x170
[   59.876209]  [<0000000000100ac6>] do_report_trap+0x14e/0x160
[   59.876211]  [<0000000000100c94>] illegal_op+0x134/0x148
[   59.876214]  [<00000000006fa26c>] pgm_check_handler+0x15c/0x1b4
[   59.876217]  [<00000000002bf3aa>] __split_huge_pmd_locked+0x562/0xa10
[   59.876221] ([<00000000002bf234>] __split_huge_pmd_locked+0x3ec/0xa10)
[   59.876223]  [<00000000002c4310>] __split_huge_pmd+0x118/0x218
[   59.876226]  [<00000000002810e8>] unmap_single_vma+0x2d8/0xb40
[   59.876229]  [<0000000000282d66>] zap_page_range+0x116/0x318
[   59.876232]  [<000000000029b834>] SyS_madvise+0x23c/0x5e8
[   59.876235]  [<00000000006f9f56>] system_call+0xd6/0x258
[   59.876238]  [<000003ff9bbfd282>] 0x3ff9bbfd282
[   59.876240] INFO: lockdep is turned off.
[   59.876243] note: git[5402] exited with preempt_count 1

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-13 11:58               ` Sebastian Ott
  (?)
@ 2016-02-15 11:31                 ` Kirill A. Shutemov
  -1 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2016-02-15 11:31 UTC (permalink / raw)
  To: Sebastian Ott
  Cc: Gerald Schaefer, Andrea Arcangeli, Christian Borntraeger,
	Kirill A. Shutemov, linux-mm, linux-kernel, Aneesh Kumar K.V,
	Andrew Morton, Linus Torvalds, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	Catalin Marinas, Will Deacon, linux-arm-kernel,
	Martin Schwidefsky, Heiko Carstens, linux-s390

On Sat, Feb 13, 2016 at 12:58:31PM +0100, Sebastian Ott wrote:
> 
> On Sat, 13 Feb 2016, Kirill A. Shutemov wrote:
> > Could you check if revert of fecffad25458 helps?
> 
> I reverted fecffad25458 on top of 721675fcf277cf - it oopsed with:
> 
> [ 1851.721062] Unable to handle kernel pointer dereference in virtual kernel address space
> [ 1851.721075] failing address: 0000000000000000 TEID: 0000000000000483
> [ 1851.721078] Fault in home space mode while using kernel ASCE.
> [ 1851.721085] AS:0000000000d5c007 R3:00000000ffff0007 S:00000000ffffa800 P:000000000000003d
> [ 1851.721128] Oops: 0004 ilc:3 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> [ 1851.721135] Modules linked in: bridge stp llc btrfs mlx4_ib mlx4_en ib_sa ib_mad vxlan xor ip6_udp_tunnel ib_core udp_tunnel ptp pps_core ib_addr ghash_s390raid6_pq prng ecb aes_s390 mlx4_core des_s390 des_generic genwqe_card sha512_s390 sha256_s390 sha1_s390 sha_common crc_itu_t dm_mod scm_block vhost_net tun vhost eadm_sch macvtap macvlan kvm autofs4
> [ 1851.721183] CPU: 7 PID: 256422 Comm: bash Not tainted 4.5.0-rc3-00058-g07923d7-dirty #178
> [ 1851.721186] task: 000000007fbfd290 ti: 000000008c604000 task.ti: 000000008c604000
> [ 1851.721189] Krnl PSW : 0704d00180000000 000000000045d3b8 (__rb_erase_color+0x280/0x308)
> [ 1851.721200]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3
>                Krnl GPRS: 0000000000000001 0000000000000020 0000000000000000 00000000bd07eff1
> [ 1851.721205]            000000000027ca10 0000000000000000 0000000083e45898 0000000077b61198
> [ 1851.721207]            000000007ce1a490 00000000bd07eff0 000000007ce1a548 000000000027ca10
> [ 1851.721210]            00000000bd07c350 00000000bd07eff0 000000008c607aa8 000000008c607a68
> [ 1851.721221] Krnl Code: 000000000045d3aa: e3c0d0080024       stg     %%r12,8(%%r13)
>                           000000000045d3b0: b9040039           lgr     %%r3,%%r9
>                          #000000000045d3b4: a53b0001           oill    %%r3,1
>                          >000000000045d3b8: e33010000024       stg     %%r3,0(%%r1)
>                           000000000045d3be: ec28000e007c       cgij    %%r2,0,8,45d3da
>                           000000000045d3c4: e34020000004       lg      %%r4,0(%%r2)
>                           000000000045d3ca: b904001c           lgr     %%r1,%%r12
>                           000000000045d3ce: ec143f3f0056       rosbg   %%r1,%%r4,63,63,0
> [ 1851.721269] Call Trace:
> [ 1851.721273] ([<0000000083e45898>] 0x83e45898)
> [ 1851.721279]  [<000000000029342a>] unlink_anon_vmas+0x9a/0x1d8
> [ 1851.721282]  [<0000000000283f34>] free_pgtables+0xcc/0x148
> [ 1851.721285]  [<000000000028c376>] exit_mmap+0xd6/0x300
> [ 1851.721289]  [<0000000000134db8>] mmput+0x90/0x118
> [ 1851.721294]  [<00000000002d76bc>] flush_old_exec+0x5d4/0x700
> [ 1851.721298]  [<00000000003369f4>] load_elf_binary+0x2f4/0x13e8
> [ 1851.721301]  [<00000000002d6e4a>] search_binary_handler+0x9a/0x1f8
> [ 1851.721304]  [<00000000002d8970>] do_execveat_common.isra.32+0x668/0x9a0
> [ 1851.721307]  [<00000000002d8cec>] do_execve+0x44/0x58
> [ 1851.721310]  [<00000000002d8f92>] SyS_execve+0x3a/0x48
> [ 1851.721315]  [<00000000006fb096>] system_call+0xd6/0x258
> [ 1851.721317]  [<000003ff997436d6>] 0x3ff997436d6
> [ 1851.721319] INFO: lockdep is turned off.
> [ 1851.721321] Last Breaking-Event-Address:
> [ 1851.721323]  [<000000000045d31a>] __rb_erase_color+0x1e2/0x308
> [ 1851.721327]
> [ 1851.721329] ---[ end trace 0d80041ac00cfae2 ]---
> 
> 
> > 
> > And could you share what the crashes look like? I haven't seen backtraces yet.
> > 
> 
> Sure. I didn't because they really looked random to me. Most of the time
> they were in rcu or list debugging, but I thought these were just the
> messenger observing a corruption first. Anyhow, here is an older one that
> might look interesting:
> 
> [   59.851421] list_del corruption. next->prev should be 000000006e1eb000, but was 0000000000000400

This is kinda interesting: 0x400 is TAIL_MAPPING. Hm.

Could you check if you see the problem on commit 1c290f642101 and its
immediate parent?

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-15 11:31                 ` Kirill A. Shutemov
  (?)
@ 2016-02-15 16:38                   ` Sebastian Ott
  -1 siblings, 0 replies; 153+ messages in thread
From: Sebastian Ott @ 2016-02-15 16:38 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Gerald Schaefer, Andrea Arcangeli, Christian Borntraeger,
	Kirill A. Shutemov, linux-mm, linux-kernel, Aneesh Kumar K.V,
	Andrew Morton, Linus Torvalds, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	Catalin Marinas, Will Deacon, linux-arm-kernel,
	Martin Schwidefsky, Heiko Carstens, linux-s390

On Mon, 15 Feb 2016, Kirill A. Shutemov wrote:
> > [   59.851421] list_del corruption. next->prev should be 000000006e1eb000, but was 0000000000000400
> 
> This is kinda interesting: 0x400 is TAIL_MAPPING. Hm.
> 
> Could you check if you see the problem on commit 1c290f642101 and its
> immediate parent?

Both 1c290f642101 and 1c290f642101^ survived 20 compile runs each.

Sebastian

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-12 23:15             ` Kirill A. Shutemov
  (?)
@ 2016-02-15 16:41               ` Gerald Schaefer
  -1 siblings, 0 replies; 153+ messages in thread
From: Gerald Schaefer @ 2016-02-15 16:41 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrea Arcangeli, Christian Borntraeger, Kirill A. Shutemov,
	linux-mm, linux-kernel, Aneesh Kumar K.V, Andrew Morton,
	Linus Torvalds, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev, Catalin Marinas, Will Deacon,
	linux-arm-kernel, Martin Schwidefsky, Heiko Carstens, linux-s390,
	Sebastian Ott

On Sat, 13 Feb 2016 01:15:10 +0200
"Kirill A. Shutemov" <kirill@shutemov.name> wrote:

> 
> I'm trying to wrap my head around the issue and I don't think missing
> serialization with gup_fast is the cause -- we just don't need it
> anymore.
> 
> Previously, __split_huge_page_splitting() required serialization against
> gup_fast to make sure nobody can obtain new reference to the page after
> __split_huge_page_splitting() returns. This was a way to stabilize page
> references before starting to distribute them from head page to tail
> pages.
> 
> With new refcounting, we don't care about this. Splitting PMD is now
> decoupled from splitting underlying compound page. It's okay to get new
> pins after split_huge_pmd(). To stabilize page references during
> split_huge_page() we rely on setting up migration entries once all
> pmds are split into page table entries.
> 
> The theory that serialization against gup_fast is not a root cause of the
> crashes is consistent with no crashes on arm64. Problem is somewhere else.

Hmm, ok, I just relied on the commit message of commit fecffad25458, which
talks about "pmdp_clear_flush() will do IPI as needed for fast_gup", as well
as the comments in mm/gup.c, which also still talk about IPIs and THP
splitting.

If IPI serialization with fast_gup is not needed anymore for THP splitting,
please fix at least the comments in mm/gup.c.

> 
> > > (It also does some other magic to the attach_count, which might hold off
> > > finish_arch_post_lock_switch while some flushing is happening, but this should
> > > be unrelated here)
> > > 
> > > 
> > > > I'm also confused that pmd_none() is equal to !pmd_present() on s390. Hm?
> > > 
> > > Don't know, Gerald or Martin?
> > 
> > The implementation frequently changes depending on how many new bits Martin
> > needs to squeeze out :-)
> 
> One bit was freed up by the commit you've pointed to as a cause.
> I wonder if it's possible that we screwed something up while removing it? I
> don't see it, but who knows.
> 
> Could you check if revert of fecffad25458 helps?

I tried reverting fecffad25458, plus re-adding a call to pmdp_splitting_flush()
in __split_huge_pmd_locked(), and I could still reproduce the crashes, so I
guess it really isn't related to fast_gup vs. THP splitting.

> 
> And could you share what the crashes look like? I haven't seen backtraces yet.
> 
> > We don't have a _PAGE_PRESENT bit for pmds, so pmd_present() just checks if the
> > entry is not empty. pmd_none() of course does the opposite, it checks if it is
> > empty.
> 
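
The pmd_none()/pmd_present() relationship described in the quoted text can be sketched as follows. This is a minimal illustration, not the s390 source: the SEGMENT_ENTRY_INVALID name and its value 0x20 are assumptions made for the example, and pmd_t is a stand-in type.

```c
#include <assert.h>

/* Assumed encoding of an empty segment-table entry; not taken from the thread. */
#define SEGMENT_ENTRY_INVALID 0x20UL

/* Stand-in for the kernel's pmd_t. */
typedef struct { unsigned long pmd; } pmd_t;

/* With no dedicated present bit, "none" means "equal to the empty value"... */
static inline int pmd_none(pmd_t pmd)
{
	return pmd.pmd == SEGMENT_ENTRY_INVALID;
}

/* ...and "present" is simply the opposite check. */
static inline int pmd_present(pmd_t pmd)
{
	return pmd.pmd != SEGMENT_ENTRY_INVALID;
}
```

Under these assumptions the two predicates are exact complements, which is why pmd_none() == !pmd_present() holds for every possible entry value.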

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-15 11:31                 ` Kirill A. Shutemov
  (?)
  (?)
@ 2016-02-15 18:37                   ` Gerald Schaefer
  -1 siblings, 0 replies; 153+ messages in thread
From: Gerald Schaefer @ 2016-02-15 18:37 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Sebastian Ott, Andrea Arcangeli, Christian Borntraeger,
	Kirill A. Shutemov, linux-mm, linux-kernel, Aneesh Kumar K.V,
	Andrew Morton, Linus Torvalds, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	Catalin Marinas, Will Deacon, linux-arm-kernel,
	Martin Schwidefsky, Heiko Carstens, linux-s390

On Mon, 15 Feb 2016 13:31:59 +0200
"Kirill A. Shutemov" <kirill@shutemov.name> wrote:

> On Sat, Feb 13, 2016 at 12:58:31PM +0100, Sebastian Ott wrote:
> > 
> > On Sat, 13 Feb 2016, Kirill A. Shutemov wrote:
> > > Could you check if revert of fecffad25458 helps?
> > 
> > I reverted fecffad25458 on top of 721675fcf277cf - it oopsed with:
> > 
> > [ 1851.721062] Unable to handle kernel pointer dereference in virtual kernel address space
> > [ 1851.721075] failing address: 0000000000000000 TEID: 0000000000000483
> > [ 1851.721078] Fault in home space mode while using kernel ASCE.
> > [ 1851.721085] AS:0000000000d5c007 R3:00000000ffff0007 S:00000000ffffa800 P:000000000000003d
> > [ 1851.721128] Oops: 0004 ilc:3 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> > [ 1851.721135] Modules linked in: bridge stp llc btrfs mlx4_ib mlx4_en ib_sa ib_mad vxlan xor ip6_udp_tunnel ib_core udp_tunnel ptp pps_core ib_addr ghash_s390raid6_pq prng ecb aes_s390 mlx4_core des_s390 des_generic genwqe_card sha512_s390 sha256_s390 sha1_s390 sha_common crc_itu_t dm_mod scm_block vhost_net tun vhost eadm_sch macvtap macvlan kvm autofs4
> > [ 1851.721183] CPU: 7 PID: 256422 Comm: bash Not tainted 4.5.0-rc3-00058-g07923d7-dirty #178
> > [ 1851.721186] task: 000000007fbfd290 ti: 000000008c604000 task.ti: 000000008c604000
> > [ 1851.721189] Krnl PSW : 0704d00180000000 000000000045d3b8 (__rb_erase_color+0x280/0x308)
> > [ 1851.721200]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3
> >                Krnl GPRS: 0000000000000001 0000000000000020 0000000000000000 00000000bd07eff1
> > [ 1851.721205]            000000000027ca10 0000000000000000 0000000083e45898 0000000077b61198
> > [ 1851.721207]            000000007ce1a490 00000000bd07eff0 000000007ce1a548 000000000027ca10
> > [ 1851.721210]            00000000bd07c350 00000000bd07eff0 000000008c607aa8 000000008c607a68
> > [ 1851.721221] Krnl Code: 000000000045d3aa: e3c0d0080024       stg     %%r12,8(%%r13)
> >                           000000000045d3b0: b9040039           lgr     %%r3,%%r9
> >                          #000000000045d3b4: a53b0001           oill    %%r3,1
> >                          >000000000045d3b8: e33010000024       stg     %%r3,0(%%r1)
> >                           000000000045d3be: ec28000e007c       cgij    %%r2,0,8,45d3da
> >                           000000000045d3c4: e34020000004       lg      %%r4,0(%%r2)
> >                           000000000045d3ca: b904001c           lgr     %%r1,%%r12
> >                           000000000045d3ce: ec143f3f0056       rosbg   %%r1,%%r4,63,63,0
> > [ 1851.721269] Call Trace:
> > [ 1851.721273] ([<0000000083e45898>] 0x83e45898)
> > [ 1851.721279]  [<000000000029342a>] unlink_anon_vmas+0x9a/0x1d8
> > [ 1851.721282]  [<0000000000283f34>] free_pgtables+0xcc/0x148
> > [ 1851.721285]  [<000000000028c376>] exit_mmap+0xd6/0x300
> > [ 1851.721289]  [<0000000000134db8>] mmput+0x90/0x118
> > [ 1851.721294]  [<00000000002d76bc>] flush_old_exec+0x5d4/0x700
> > [ 1851.721298]  [<00000000003369f4>] load_elf_binary+0x2f4/0x13e8
> > [ 1851.721301]  [<00000000002d6e4a>] search_binary_handler+0x9a/0x1f8
> > [ 1851.721304]  [<00000000002d8970>] do_execveat_common.isra.32+0x668/0x9a0
> > [ 1851.721307]  [<00000000002d8cec>] do_execve+0x44/0x58
> > [ 1851.721310]  [<00000000002d8f92>] SyS_execve+0x3a/0x48
> > [ 1851.721315]  [<00000000006fb096>] system_call+0xd6/0x258
> > [ 1851.721317]  [<000003ff997436d6>] 0x3ff997436d6
> > [ 1851.721319] INFO: lockdep is turned off.
> > [ 1851.721321] Last Breaking-Event-Address:
> > [ 1851.721323]  [<000000000045d31a>] __rb_erase_color+0x1e2/0x308
> > [ 1851.721327]
> > [ 1851.721329] ---[ end trace 0d80041ac00cfae2 ]---
> > 
> > 
> > > 
> > > And could you share what the crashes look like? I haven't seen backtraces yet.
> > > 
> > 
> > Sure. I didn't because they really looked random to me. Most of the time
> > in rcu or list debugging but I thought these have just been the messenger
> > observing a corruption first. Anyhow, here is an older one that might look
> > interesting:
> > 
> > [   59.851421] list_del corruption. next->prev should be 000000006e1eb000, but was 0000000000000400
> 
> This is kinda interesting: 0x400 is TAIL_MAPPING.. Hm..
> 
> Could you check if you see the problem on commit 1c290f642101 and its
> immediate parent?
> 

How should the page->mapping poison end up as next->prev in the list of
pre-allocated THP splitting page tables? Also, commit 1c290f642101
is before the THP rework, at least the non-bisectable part, so we should
expect not to see the problem there.

0x400 is also the value of an empty pte on s390, and the thp_deposit/withdraw
listheads are placed inside the pre-allocated pagetables instead of page->lru,
because we have 2K pagetables on s390 and cannot use struct page == pgtable_t.

So, for example, two concurrent withdraws could produce such a list
corruption, because the first withdraw will overwrite the listhead at the
beginning of the pagetable with 2 empty ptes.
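
To illustrate the scenario above, here is a minimal user-space sketch of a deposit/withdraw list whose list head lives inside the page table itself. It is a simplified model under stated assumptions, not the s390 implementation: the names union pgtable, deposit(), withdraw() and stale_next_after_withdraw() are hypothetical, locking is omitted on purpose, and 64-bit pointers are assumed so that the list head covers exactly the first two entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_INVALID 0x400ULL  /* empty pte value mentioned in the mail */
#define PTRS_PER_PTE 256       /* 2K table of 8-byte entries */

struct list_head { struct list_head *next, *prev; };

/* The list head overlays the first two pte slots (assumes 64-bit pointers). */
union pgtable {
	uint64_t pte[PTRS_PER_PTE];
	struct list_head lh;
};

/* Queue a pre-allocated table; *head always points at the latest deposit. */
static void deposit(union pgtable **head, union pgtable *t)
{
	if (*head == NULL) {
		t->lh.next = t->lh.prev = &t->lh;    /* like INIT_LIST_HEAD() */
	} else {
		struct list_head *h = &(*head)->lh;  /* like list_add(&t->lh, h) */
		t->lh.next = h->next;
		t->lh.prev = h;
		h->next->prev = &t->lh;
		h->next = &t->lh;
	}
	*head = t;
}

/* Take the table at *head, unlink it, then reset its first two ptes. */
static union pgtable *withdraw(union pgtable **head)
{
	union pgtable *t = *head;
	struct list_head *lh = &t->lh;

	if (lh->next == lh) {
		*head = NULL;
	} else {
		*head = (union pgtable *)lh->next;
		lh->next->prev = lh->prev;           /* like list_del(lh) */
		lh->prev->next = lh->next;
	}
	t->pte[0] = PAGE_INVALID;                    /* overwrites lh.next */
	t->pte[1] = PAGE_INVALID;                    /* overwrites lh.prev */
	return t;
}

/*
 * Model two unserialized withdraws: the second one sampled the head before
 * the first one finished, so it still holds a pointer to the same table.
 */
static uintptr_t stale_next_after_withdraw(void)
{
	static union pgtable a, b;
	union pgtable *head = NULL;
	union pgtable *stale;

	deposit(&head, &a);
	deposit(&head, &b);

	stale = head;      /* second withdrawer's stale view of the head */
	withdraw(&head);   /* first withdrawer completes */

	return (uintptr_t)stale->lh.next;
}
```

In this model the stale list pointer reads back as 0x400, the same value that shows up as next->prev in the list_del corruption report. The sketch only shows that the value is plausible for such a race; it does not establish that this is what happened.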

Has anything changed regarding the general THP deposit/withdraw logic?

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-15 18:37                   ` Gerald Schaefer
  (?)
@ 2016-02-15 21:35                     ` Kirill A. Shutemov
  -1 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2016-02-15 21:35 UTC (permalink / raw)
  To: Gerald Schaefer
  Cc: Sebastian Ott, Andrea Arcangeli, Christian Borntraeger,
	Kirill A. Shutemov, linux-mm, linux-kernel, Aneesh Kumar K.V,
	Andrew Morton, Linus Torvalds, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	Catalin Marinas, Will Deacon, linux-arm-kernel,
	Martin Schwidefsky, Heiko Carstens, linux-s390

On Mon, Feb 15, 2016 at 07:37:02PM +0100, Gerald Schaefer wrote:
> On Mon, 15 Feb 2016 13:31:59 +0200
> "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> 
> > On Sat, Feb 13, 2016 at 12:58:31PM +0100, Sebastian Ott wrote:
> > > 
> > > On Sat, 13 Feb 2016, Kirill A. Shutemov wrote:
> > > > Could you check if revert of fecffad25458 helps?
> > > 
> > > I reverted fecffad25458 on top of 721675fcf277cf - it oopsed with:
> > > 
> > > [ 1851.721062] Unable to handle kernel pointer dereference in virtual kernel address space
> > > [ 1851.721075] failing address: 0000000000000000 TEID: 0000000000000483
> > > [ 1851.721078] Fault in home space mode while using kernel ASCE.
> > > [ 1851.721085] AS:0000000000d5c007 R3:00000000ffff0007 S:00000000ffffa800 P:000000000000003d
> > > [ 1851.721128] Oops: 0004 ilc:3 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> > > [ 1851.721135] Modules linked in: bridge stp llc btrfs mlx4_ib mlx4_en ib_sa ib_mad vxlan xor ip6_udp_tunnel ib_core udp_tunnel ptp pps_core ib_addr ghash_s390raid6_pq prng ecb aes_s390 mlx4_core des_s390 des_generic genwqe_card sha512_s390 sha256_s390 sha1_s390 sha_common crc_itu_t dm_mod scm_block vhost_net tun vhost eadm_sch macvtap macvlan kvm autofs4
> > > [ 1851.721183] CPU: 7 PID: 256422 Comm: bash Not tainted 4.5.0-rc3-00058-g07923d7-dirty #178
> > > [ 1851.721186] task: 000000007fbfd290 ti: 000000008c604000 task.ti: 000000008c604000
> > > [ 1851.721189] Krnl PSW : 0704d00180000000 000000000045d3b8 (__rb_erase_color+0x280/0x308)
> > > [ 1851.721200]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3
> > >                Krnl GPRS: 0000000000000001 0000000000000020 0000000000000000 00000000bd07eff1
> > > [ 1851.721205]            000000000027ca10 0000000000000000 0000000083e45898 0000000077b61198
> > > [ 1851.721207]            000000007ce1a490 00000000bd07eff0 000000007ce1a548 000000000027ca10
> > > [ 1851.721210]            00000000bd07c350 00000000bd07eff0 000000008c607aa8 000000008c607a68
> > > [ 1851.721221] Krnl Code: 000000000045d3aa: e3c0d0080024       stg     %%r12,8(%%r13)
> > >                           000000000045d3b0: b9040039           lgr     %%r3,%%r9
> > >                          #000000000045d3b4: a53b0001           oill    %%r3,1
> > >                          >000000000045d3b8: e33010000024       stg     %%r3,0(%%r1)
> > >                           000000000045d3be: ec28000e007c       cgij    %%r2,0,8,45d3da
> > >                           000000000045d3c4: e34020000004       lg      %%r4,0(%%r2)
> > >                           000000000045d3ca: b904001c           lgr     %%r1,%%r12
> > >                           000000000045d3ce: ec143f3f0056       rosbg   %%r1,%%r4,63,63,0
> > > [ 1851.721269] Call Trace:
> > > [ 1851.721273] ([<0000000083e45898>] 0x83e45898)
> > > [ 1851.721279]  [<000000000029342a>] unlink_anon_vmas+0x9a/0x1d8
> > > [ 1851.721282]  [<0000000000283f34>] free_pgtables+0xcc/0x148
> > > [ 1851.721285]  [<000000000028c376>] exit_mmap+0xd6/0x300
> > > [ 1851.721289]  [<0000000000134db8>] mmput+0x90/0x118
> > > [ 1851.721294]  [<00000000002d76bc>] flush_old_exec+0x5d4/0x700
> > > [ 1851.721298]  [<00000000003369f4>] load_elf_binary+0x2f4/0x13e8
> > > [ 1851.721301]  [<00000000002d6e4a>] search_binary_handler+0x9a/0x1f8
> > > [ 1851.721304]  [<00000000002d8970>] do_execveat_common.isra.32+0x668/0x9a0
> > > [ 1851.721307]  [<00000000002d8cec>] do_execve+0x44/0x58
> > > [ 1851.721310]  [<00000000002d8f92>] SyS_execve+0x3a/0x48
> > > [ 1851.721315]  [<00000000006fb096>] system_call+0xd6/0x258
> > > [ 1851.721317]  [<000003ff997436d6>] 0x3ff997436d6
> > > [ 1851.721319] INFO: lockdep is turned off.
> > > [ 1851.721321] Last Breaking-Event-Address:
> > > [ 1851.721323]  [<000000000045d31a>] __rb_erase_color+0x1e2/0x308
> > > [ 1851.721327]
> > > [ 1851.721329] ---[ end trace 0d80041ac00cfae2 ]---
> > > 
> > > 
> > > > 
> > > > And could you share how crashes looks like? I haven't seen backtraces yet.
> > > > 
> > > 
> > > Sure. I didn't because they really looked random to me. Most of the time
> > > in rcu or list debugging but I thought these have just been the messenger
> > > observing a corruption first. Anyhow, here is an older one that might look
> > > interesting:
> > > 
> > > [   59.851421] list_del corruption. next->prev should be 000000006e1eb000, but was 0000000000000400
> > 
> > This kinda interesting: 0x400 is TAIL_MAPPING.. Hm..
> > 
> > Could you check if you see the problem on commit 1c290f642101 and its
> > immediate parent?
> > 
> 
> How should the page->mapping poison end up as next->prev in the list of
> pre-allocated THP splitting page tables?

Maybe the pgtable was cast to struct page or something. I don't know.

> Also, commit 1c290f642101 is before the THP rework, at least the
> non-bisectable part, so we should expect not to see the problem there.

Just to make sure: commit 122afea9626a is fine, commit 61f5d698cc97
crashes. Correct?

> 0x400 is also the value of an empty pte on s390, and the thp_deposit/withdraw
> listheads are placed inside the pre-allocated pagetables instead of page->lru,
> because we have 2K pagetables on s390 and cannot use struct page == pgtable_t.

0x400 from an empty pte makes more sense than TAIL_MAPPING. But I guess it's
worth changing TAIL_MAPPING to some other value to make sure.

> So, for example, two concurrent withdraws could produce such a list
> corruption, because the first withdraw will overwrite the listhead at the
> beginning of the pagetable with 2 empty ptes.
> 
> Has anything changed regarding the general THP deposit/withdraw logic?

I don't see any changes in this area.

To eliminate one more variable, I would propose disabling the split pmd lock
for testing to check whether it makes a difference.

Is there any chance that I'll be able to trigger the bug using QEMU?
Does anybody have a QEMU image I can use?

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-15 21:35                     ` Kirill A. Shutemov
  (?)
@ 2016-02-16  9:54                       ` Sebastian Ott
  -1 siblings, 0 replies; 153+ messages in thread
From: Sebastian Ott @ 2016-02-16  9:54 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Gerald Schaefer, Andrea Arcangeli, Christian Borntraeger,
	Kirill A. Shutemov, linux-mm, linux-kernel, Aneesh Kumar K.V,
	Andrew Morton, Linus Torvalds, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	Catalin Marinas, Will Deacon, linux-arm-kernel,
	Martin Schwidefsky, Heiko Carstens, linux-s390


On Mon, 15 Feb 2016, Kirill A. Shutemov wrote:
> Just to make sure: commit 122afea9626a is fine, commit 61f5d698cc97
> crashes. Correct?

Correct.

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-15 21:35                     ` Kirill A. Shutemov
  (?)
  (?)
@ 2016-02-16 16:24                       ` Gerald Schaefer
  -1 siblings, 0 replies; 153+ messages in thread
From: Gerald Schaefer @ 2016-02-16 16:24 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Sebastian Ott, Andrea Arcangeli, Christian Borntraeger,
	Kirill A. Shutemov, linux-mm, linux-kernel, Aneesh Kumar K.V,
	Andrew Morton, Linus Torvalds, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	Catalin Marinas, Will Deacon, linux-arm-kernel,
	Martin Schwidefsky, Heiko Carstens, linux-s390

On Mon, 15 Feb 2016 23:35:26 +0200
"Kirill A. Shutemov" <kirill@shutemov.name> wrote:

> On Mon, Feb 15, 2016 at 07:37:02PM +0100, Gerald Schaefer wrote:
> > On Mon, 15 Feb 2016 13:31:59 +0200
> > "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> > 
> > > On Sat, Feb 13, 2016 at 12:58:31PM +0100, Sebastian Ott wrote:
> > > > 
> > > > On Sat, 13 Feb 2016, Kirill A. Shutemov wrote:
> > > > > Could you check if revert of fecffad25458 helps?
> > > > 
> > > > I reverted fecffad25458 on top of 721675fcf277cf - it oopsed with:
> > > > 
> > > > [ 1851.721062] Unable to handle kernel pointer dereference in virtual kernel address space
> > > > [ 1851.721075] failing address: 0000000000000000 TEID: 0000000000000483
> > > > [ 1851.721078] Fault in home space mode while using kernel ASCE.
> > > > [ 1851.721085] AS:0000000000d5c007 R3:00000000ffff0007 S:00000000ffffa800 P:000000000000003d
> > > > [ 1851.721128] Oops: 0004 ilc:3 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> > > > [ 1851.721135] Modules linked in: bridge stp llc btrfs mlx4_ib mlx4_en ib_sa ib_mad vxlan xor ip6_udp_tunnel ib_core udp_tunnel ptp pps_core ib_addr ghash_s390raid6_pq prng ecb aes_s390 mlx4_core des_s390 des_generic genwqe_card sha512_s390 sha256_s390 sha1_s390 sha_common crc_itu_t dm_mod scm_block vhost_net tun vhost eadm_sch macvtap macvlan kvm autofs4
> > > > [ 1851.721183] CPU: 7 PID: 256422 Comm: bash Not tainted 4.5.0-rc3-00058-g07923d7-dirty #178
> > > > [ 1851.721186] task: 000000007fbfd290 ti: 000000008c604000 task.ti: 000000008c604000
> > > > [ 1851.721189] Krnl PSW : 0704d00180000000 000000000045d3b8 (__rb_erase_color+0x280/0x308)
> > > > [ 1851.721200]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3
> > > >                Krnl GPRS: 0000000000000001 0000000000000020 0000000000000000 00000000bd07eff1
> > > > [ 1851.721205]            000000000027ca10 0000000000000000 0000000083e45898 0000000077b61198
> > > > [ 1851.721207]            000000007ce1a490 00000000bd07eff0 000000007ce1a548 000000000027ca10
> > > > [ 1851.721210]            00000000bd07c350 00000000bd07eff0 000000008c607aa8 000000008c607a68
> > > > [ 1851.721221] Krnl Code: 000000000045d3aa: e3c0d0080024       stg     %%r12,8(%%r13)
> > > >                           000000000045d3b0: b9040039           lgr     %%r3,%%r9
> > > >                          #000000000045d3b4: a53b0001           oill    %%r3,1
> > > >                          >000000000045d3b8: e33010000024       stg     %%r3,0(%%r1)
> > > >                           000000000045d3be: ec28000e007c       cgij    %%r2,0,8,45d3da
> > > >                           000000000045d3c4: e34020000004       lg      %%r4,0(%%r2)
> > > >                           000000000045d3ca: b904001c           lgr     %%r1,%%r12
> > > >                           000000000045d3ce: ec143f3f0056       rosbg   %%r1,%%r4,63,63,0
> > > > [ 1851.721269] Call Trace:
> > > > [ 1851.721273] ([<0000000083e45898>] 0x83e45898)
> > > > [ 1851.721279]  [<000000000029342a>] unlink_anon_vmas+0x9a/0x1d8
> > > > [ 1851.721282]  [<0000000000283f34>] free_pgtables+0xcc/0x148
> > > > [ 1851.721285]  [<000000000028c376>] exit_mmap+0xd6/0x300
> > > > [ 1851.721289]  [<0000000000134db8>] mmput+0x90/0x118
> > > > [ 1851.721294]  [<00000000002d76bc>] flush_old_exec+0x5d4/0x700
> > > > [ 1851.721298]  [<00000000003369f4>] load_elf_binary+0x2f4/0x13e8
> > > > [ 1851.721301]  [<00000000002d6e4a>] search_binary_handler+0x9a/0x1f8
> > > > [ 1851.721304]  [<00000000002d8970>] do_execveat_common.isra.32+0x668/0x9a0
> > > > [ 1851.721307]  [<00000000002d8cec>] do_execve+0x44/0x58
> > > > [ 1851.721310]  [<00000000002d8f92>] SyS_execve+0x3a/0x48
> > > > [ 1851.721315]  [<00000000006fb096>] system_call+0xd6/0x258
> > > > [ 1851.721317]  [<000003ff997436d6>] 0x3ff997436d6
> > > > [ 1851.721319] INFO: lockdep is turned off.
> > > > [ 1851.721321] Last Breaking-Event-Address:
> > > > [ 1851.721323]  [<000000000045d31a>] __rb_erase_color+0x1e2/0x308
> > > > [ 1851.721327]
> > > > [ 1851.721329] ---[ end trace 0d80041ac00cfae2 ]---
> > > > 
> > > > 
> > > > > 
> > > > > And could you share how crashes looks like? I haven't seen backtraces yet.
> > > > > 
> > > > 
> > > > Sure. I didn't because they really looked random to me. Most of the time
> > > > in rcu or list debugging but I thought these have just been the messenger
> > > > observing a corruption first. Anyhow, here is an older one that might look
> > > > interesting:
> > > > 
> > > > [   59.851421] list_del corruption. next->prev should be 000000006e1eb000, but was 0000000000000400
> > > 
> > > This kinda interesting: 0x400 is TAIL_MAPPING.. Hm..
> > > 
> > > Could you check if you see the problem on commit 1c290f642101 and its
> > > immediate parent?
> > > 
> > 
> > How should the page->mapping poison end up as next->prev in the list of
> > pre-allocated THP splitting page tables?
> 
> Maybe the pgtable was cast to struct page or something. I don't know.
> 
> > Also, commit 1c290f642101 is before the THP rework, at least the
> > non-bisectable part, so we should expect not to see the problem there.
> 
> Just to make sure: commit 122afea9626a is fine, commit 61f5d698cc97
> crashes. Correct?
> 
> > 0x400 is also the value of an empty pte on s390, and the thp_deposit/withdraw
> > listheads are placed inside the pre-allocated pagetables instead of page->lru,
> > because we have 2K pagetables on s390 and cannot use struct page == pgtable_t.
> 
> 0x400 from an empty pte makes more sense than TAIL_MAPPING. But I guess it's
> worth changing TAIL_MAPPING to some other value to make sure.

Right, but we cannot trigger this list corruption symptom reliably. In fact,
I didn't hit it at all during the last runs. Previous crash logs also showed
list corruptions with values other than 0x400, which may hint at concurrent
pagetable freeing and re-use, given that our THP splitting pagetable listhead
is located inside the pre-allocated pagetables.

> 
> > So, for example, two concurrent withdraws could produce such a list
> > corruption, because the first withdraw will overwrite the listhead at the
> > beginning of the pagetable with 2 empty ptes.
> > 
> > Has anything changed regarding the general THP deposit/withdraw logic?
> 
> I don't see any changes in this area.
> 
> To eliminate one more variable, I would propose disabling the split pmd lock
> for testing to check whether it makes a difference.

Disabling ARCH_ENABLE_SPLIT_PMD_PTLOCK didn't make any difference, other
than maybe a little reduction in "randomness" of the crashes, but that
may be pure coincidence. Out of about 10 runs, I always ended up with either
ODEBUG "WARNING: at lib/debugobjects.c:263" and subsequent "kernel BUG at
mm/slub.c:3629", or "bad swap file / page map" with subsequent "kernel BUG
at kernel/cred.c:142", see below for the full traces.

> 
> Is there any chance that I'll be able to trigger the bug using QEMU?
> Does anybody have a QEMU image I can use?
> 

I have no image, but trying to reproduce this under virtualization may
help to trigger this also on other architectures. After ruling out IPI
vs. fast_gup I do not really see why this should be arch-specific, and
it wouldn't be the first time that we hit subtle races first on s390, due
to our virtualized environment (my test case is make -j20 with 10 CPUs and
4GB of memory, no swap).


Here are the full traces from the runs w/o split pmd lock:

1)

[ 2584.391880] cc1 (71885) used greatest stack depth: 10496 bytes left
[ 2951.268250] ld (147667) used greatest stack depth: 10472 bytes left
[ 2972.530753] swap_free: Bad swap file entry 1000000000000000
[ 2972.530763] BUG: Bad page map in process cc1  pte:00000420 pmd:6cfd3000
[ 2972.530766] addr:0000000080d00000 vm_flags:00000875 anon_vma:          (null) mapping:000000005dc6ac70 index:d00
[ 2972.530776] file:cc1 fault:ext4_filemap_fault mmap:ext4_file_mmap readpage:ext4_readpage
[ 2972.530781] CPU: 6 PID: 152043 Comm: cc1 Not tainted 4.5.0-rc4-00014-g1926e54-dirty #70
[ 2972.530784]        0000000071947a60 0000000071947af0 0000000000000002 0000000000000000 
                      0000000071947b90 0000000071947b08 0000000071947b08 0000000000113d38 
                      0000000000000000 0000000000b70df4 0000000000b4f348 000000000000000b 
                      0000000071947b50 0000000071947af0 0000000000000000 0000000000000000 
                      07000000c3763ae8 0000000000113d38 0000000071947af0 0000000071947b50 
[ 2972.530811] Call Trace:
[ 2972.530818] ([<0000000000113c3c>] show_trace+0x12c/0x150)
[ 2972.530821]  [<0000000000113cee>] show_stack+0x8e/0xf0
[ 2972.530826]  [<000000000068b8ec>] dump_stack+0x9c/0xe0
[ 2972.530830]  [<00000000002bbeda>] print_bad_pte+0x222/0x238
[ 2972.530833]  [<00000000002beb92>] zap_pte_range+0x442/0x790
[ 2972.530835]  [<00000000002bf2c6>] unmap_single_vma+0x3e6/0x400
[ 2972.530837]  [<00000000002c0f46>] unmap_vmas+0x8e/0xc8
[ 2972.530840]  [<00000000002c9a56>] exit_mmap+0xc6/0x300
[ 2972.530844]  [<0000000000138b10>] mmput+0xa0/0x128
[ 2972.530847]  [<000000000013fcb4>] do_exit+0x42c/0xd60
[ 2972.530849]  [<00000000001406f0>] do_group_exit+0x98/0xe0
[ 2972.530851]  [<0000000000140768>] __wake_up_parent+0x0/0x28
[ 2972.530855]  [<0000000000910f2e>] system_call+0xd6/0x270
[ 2972.530883]  [<000003ff89b43698>] 0x3ff89b43698
[ 2972.530886] 1 lock held by cc1/152043:
[ 2972.530887]  #0:  (&(ptlock_ptr(page))->rlock){+.+.-.}, at: [<00000000002be7f6>] zap_pte_range+0xa6/0x790
[ 2972.530897] Disabling lock debugging due to kernel taint
[ 2972.533069] BUG: Bad rss-counter state mm:00000000719d0e00 idx:2 val:-1
[ 5899.109157] ------------[ cut here ]------------
[ 5899.109166] kernel BUG at kernel/cred.c:142!
[ 5899.109211] illegal operation: 0001 ilc:1 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 5899.109217] Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc mlx4_ib ib_sa ib_mad mlx4_en ib_core vxlan udp_tunnel ptp ib_addr pps_core ghash_s390 prng ecb aes_s390 des_s390 des_generic sha512_s390 sha256_s390 sha1_s390 sha_common mlx4_core eadm_sch nfsd vhost_net tun vhost macvtap auth_rpcgss macvlan kvm oid_registry nfs_acl lockd grace sunrpc dm_multipath dm_mod autofs4
[ 5899.109279] CPU: 1 PID: 12 Comm: ksoftirqd/1 Tainted: G    B           4.5.0-rc4-00014-g1926e54-dirty #70
[ 5899.109283] task: 00000000d09e2a48 ti: 00000000d09f4000 task.ti: 00000000d09f4000
[ 5899.109286] Krnl PSW : 0704c00180000000 00000000001651aa (__put_cred+0x22/0x68)
[ 5899.109296]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
               Krnl GPRS: 0000000000000002 0000000000000020 000000007431f000 00000000c38e3400
[ 5899.109301]            000000000032aaf8 0000000000000002 0000000000000000 000000000000000a
[ 5899.109304]            0000000000000000 000000000032aac0 0000000000000008 00000000749ad000
[ 5899.109306]            00000000c38e3400 000000007431f000 000000000032ab2e 00000000d09f7bf0
[ 5899.109316] Krnl Code: 000000000016519c: 58102004            l       %%r1,4(%%r2)
                          00000000001651a0: ec180005007e        cij     %%r1,0,8,1651aa
                         #00000000001651a6: a7f40001            brc     15,1651a8
                         >00000000001651aa: e3e020080024        stg     %%r14,8(%%r2)
                          00000000001651b0: c01944656144        iilf    %%r1,1147494724
                          00000000001651b6: 50102010            st      %%r1,16(%%r2)
                          00000000001651ba: e31003100004        lg      %%r1,784
                          00000000001651c0: e32018300020        cg      %%r2,2096(%%r1)
[ 5899.109371] Call Trace:
[ 5899.109376] ([<000000000032aaf8>] file_free_rcu+0x38/0x88)
[ 5899.109381]  [<00000000001c5ddc>] rcu_process_callbacks+0x5fc/0x9f0
[ 5899.109385]  [<0000000000141794>] __do_softirq+0x25c/0x570
[ 5899.109387]  [<0000000000141ae6>] run_ksoftirqd+0x3e/0xa0
[ 5899.109391]  [<0000000000167bee>] smpboot_thread_fn+0x30e/0x360
[ 5899.109394]  [<0000000000162f4a>] kthread+0x112/0x128
[ 5899.109398]  [<00000000009110fa>] kernel_thread_starter+0x6/0xc
[ 5899.109401]  [<00000000009110f4>] kernel_thread_starter+0x0/0xc
[ 5899.109403] INFO: lockdep is turned off.
[ 5899.109405] Last Breaking-Event-Address:
[ 5899.109407]  [<00000000001651a6>] __put_cred+0x1e/0x68
[ 5899.109411]  
[ 5899.109414] Kernel panic - not syncing: Fatal exception in interrupt


2)

[ 7790.934295] ODEBUG: active_state not available (active state 0) object type: rcu_head hint:           (null)
[ 7790.934356] ------------[ cut here ]------------
[ 7790.934359] WARNING: at lib/debugobjects.c:263
[ 7790.934361] Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack mlx4_ib ib_sa ipt_REJECT mlx4_en ib_mad nf_reject_ipv4 ib_core vxlan udp_tunnel ptp xt_tcpudp ib_addr pps_core iptable_filter ip_tables x_tables bridge stp llc ghash_s390 prng ecb aes_s390 des_s390 des_generic sha512_s390 sha256_s390 mlx4_core sha1_s390 sha_common eadm_sch vhost_net nfsd tun vhost macvtap macvlan auth_rpcgss kvm oid_registry nfs_acl lockd grace sunrpc dm_multipath dm_mod autofs4
[ 7790.934417] CPU: 8 PID: 40 Comm: ksoftirqd/8 Not tainted 4.5.0-rc4-00014-g1926e54-dirty #149
[ 7790.934420] task: 00000000e2955490 ti: 00000000e2958000 task.ti: 00000000e2958000
[ 7790.934422] Krnl PSW : 0404c00180000000 000000000071c340 (debug_print_object+0xb0/0xd0)
[ 7790.934431]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
               Krnl GPRS: 0000000001e6e3c7 00000000e2955490 0000000000000060 00000000e2958000
[ 7790.934435]            000000000071c33c 0000000000000000 0000000000b975e8 0000000001f2b008
[ 7790.934437]            07000000001d7e24 0000000000000000 0000000001f2b010 0000000000bea6b8
[ 7790.934440]            0000000000e241f8 00000000e295bc38 000000000071c33c 00000000e295bb38
[ 7790.934449] Krnl Code: 000000000071c330: c41f00bf6a14        strl    %%r1,1f09758
                          000000000071c336: c0e5ffdbd64d        brasl   %%r14,296fd0
                         #000000000071c33c: a7f40001            brc     15,71c33e
                         >000000000071c340: c41d0036e746        lrl     %%r1,df91cc
                          000000000071c346: e340f0e80004        lg      %%r4,232(%%r15)
                          000000000071c34c: a71a0001            ahi     %%r1,1
                          000000000071c350: eb6ff0a80004        lmg     %%r6,%%r15,168(%%r15)
                          000000000071c356: c41f0036e73b        strl    %%r1,df91cc
[ 7790.934493] Call Trace:
[ 7790.934495] ([<000000000071c33c>] debug_print_object+0xac/0xd0)
[ 7790.934498]  [<000000000071d704>] debug_object_active_state+0x164/0x178
[ 7790.934504]  [<00000000001d7da4>] rcu_process_callbacks+0x57c/0xa00
[ 7790.934508]  [<00000000001487ec>] __do_softirq+0x26c/0x580
[ 7790.934510]  [<0000000000148b50>] run_ksoftirqd+0x50/0xb0
[ 7790.934515]  [<0000000000172b28>] smpboot_thread_fn+0x320/0x378
[ 7790.934517]  [<000000000016d21c>] kthread+0x124/0x138
[ 7790.934521]  [<00000000009a1d72>] kernel_thread_starter+0x6/0xc
[ 7790.934524]  [<00000000009a1d6c>] kernel_thread_starter+0x0/0xc
[ 7790.934526] 1 lock held by ksoftirqd/8/40:
[ 7790.934528]  #0:  (&obj_hash[i].lock){-.-.-.}, at: [<000000000071d64c>] debug_object_active_state+0xac/0x178
[ 7790.934535] Last Breaking-Event-Address:
[ 7790.934537]  [<000000000071c33c>] debug_print_object+0xac/0xd0
[ 7790.934539] ---[ end trace b583bfd967a78637 ]---
[ 7790.934543] ODEBUG: deactivate not available (active state 0) object type: rcu_head hint:           (null)
[ 7790.934551] ------------[ cut here ]------------
[ 7790.934553] WARNING: at lib/debugobjects.c:263
[ 7790.934555] Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack mlx4_ib ib_sa ipt_REJECT mlx4_en ib_mad nf_reject_ipv4 ib_core vxlan udp_tunnel ptp xt_tcpudp ib_addr pps_core iptable_filter ip_tables x_tables bridge stp llc ghash_s390 prng ecb aes_s390 des_s390 des_generic sha512_s390 sha256_s390 mlx4_core sha1_s390 sha_common eadm_sch vhost_net nfsd tun vhost macvtap macvlan auth_rpcgss kvm oid_registry nfs_acl lockd grace sunrpc dm_multipath dm_mod autofs4
[ 7790.934599] CPU: 8 PID: 40 Comm: ksoftirqd/8 Tainted: G        W       4.5.0-rc4-00014-g1926e54-dirty #149
[ 7790.934601] task: 00000000e2955490 ti: 00000000e2958000 task.ti: 00000000e2958000
[ 7790.934603] Krnl PSW : 0404c00180000000 000000000071c340 (debug_print_object+0xb0/0xd0)
[ 7790.934608]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
               Krnl GPRS: 0000000001e6e3c7 00000000e2955490 000000000000005e 00000000e2958000
[ 7790.934612]            000000000071c33c 0000000000000000 0000000000b975e8 000000000000000a
[ 7790.934614]            0000000004bcd020 0700000001f2b010 0000000001f2b010 0000000000ba5d0a
[ 7790.934617]            0000000000e241f8 00000000e295bc48 000000000071c33c 00000000e295bb48
[ 7790.934622] Krnl Code: 000000000071c330: c41f00bf6a14        strl    %%r1,1f09758
                          000000000071c336: c0e5ffdbd64d        brasl   %%r14,296fd0
                         #000000000071c33c: a7f40001            brc     15,71c33e
                         >000000000071c340: c41d0036e746        lrl     %%r1,df91cc
                          000000000071c346: e340f0e80004        lg      %%r4,232(%%r15)
                          000000000071c34c: a71a0001            ahi     %%r1,1
                          000000000071c350: eb6ff0a80004        lmg     %%r6,%%r15,168(%%r15)
                          000000000071c356: c41f0036e73b        strl    %%r1,df91cc
[ 7790.934639] Call Trace:
[ 7790.934641] ([<000000000071c33c>] debug_print_object+0xac/0xd0)
[ 7790.934644]  [<000000000071d0a8>] debug_object_deactivate+0x170/0x188
[ 7790.934646]  [<00000000001d7db6>] rcu_process_callbacks+0x58e/0xa00
[ 7790.934648]  [<00000000001487ec>] __do_softirq+0x26c/0x580
[ 7790.934651]  [<0000000000148b50>] run_ksoftirqd+0x50/0xb0
[ 7790.934653]  [<0000000000172b28>] smpboot_thread_fn+0x320/0x378
[ 7790.934655]  [<000000000016d21c>] kthread+0x124/0x138
[ 7790.934657]  [<00000000009a1d72>] kernel_thread_starter+0x6/0xc
[ 7790.934659]  [<00000000009a1d6c>] kernel_thread_starter+0x0/0xc
[ 7790.934661] 1 lock held by ksoftirqd/8/40:
[ 7790.934663]  #0:  (&obj_hash[i].lock){-.-.-.}, at: [<000000000071cfdc>] debug_object_deactivate+0xa4/0x188
[ 7790.934669] Last Breaking-Event-Address:
[ 7790.934671]  [<000000000071c33c>] debug_print_object+0xac/0xd0
[ 7790.934673] ---[ end trace b583bfd967a78638 ]---
[ 7790.934680] ------------[ cut here ]------------
[ 7790.934682] kernel BUG at mm/slub.c:3629!
[ 7790.934707] illegal operation: 0001 ilc:1 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 7790.934715] Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack mlx4_ib ib_sa ipt_REJECT mlx4_en ib_mad nf_reject_ipv4 ib_core vxlan udp_tunnel ptp xt_tcpudp ib_addr pps_core iptable_filter ip_tables x_tables bridge stp llc ghash_s390 prng ecb aes_s390 des_s390 des_generic sha512_s390 sha256_s390 mlx4_core sha1_s390 sha_common eadm_sch vhost_net nfsd tun vhost macvtap macvlan auth_rpcgss kvm oid_registry nfs_acl lockd grace sunrpc dm_multipath dm_mod autofs4
[ 7790.934789] CPU: 8 PID: 40 Comm: ksoftirqd/8 Tainted: G        W       4.5.0-rc4-00014-g1926e54-dirty #149
[ 7790.934791] task: 00000000e2955490 ti: 00000000e2958000 task.ti: 00000000e2958000
[ 7790.934794] Krnl PSW : 0704c00180000000 000000000032295a (kfree+0x3f2/0x428)
[ 7790.934801]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
               Krnl GPRS: 0000000000000000 0000000000000100 0000000000000100 0000000000e24260
[ 7790.934806]            00000000001d1c82 0000000000000000 0000000000000000 000000000000000a
[ 7790.934809]            0000000000000001 00000000001d7e0a 0000000000000006 000003d10012f340
[ 7790.934812]            0000000004bcd000 0000000000f0433c 000000000032267a 00000000e295bbb0
[ 7790.934818] Krnl Code: 000000000032294c: c0e50033bef6        brasl   %%r14,99a738
                          0000000000322952: a7f4feba            brc     15,3226c6
                         #0000000000322956: a7f40001            brc     15,322958
                         >000000000032295a: e310b0060090        llgc    %%r1,6(%%r11)
                          0000000000322960: a7110040            tmll    %%r1,64
                          0000000000322964: a774fee9            brc     7,322736
                          0000000000322968: a7f4feeb            brc     15,32273e
                          000000000032296c: c0e50033be32        brasl   %%r14,99a5d0
[ 7790.934838] Call Trace:
[ 7790.934841] ([<000000000032267a>] kfree+0x112/0x428)
[ 7790.934844]  [<00000000001d7e0a>] rcu_process_callbacks+0x5e2/0xa00
[ 7790.934847]  [<00000000001487ec>] __do_softirq+0x26c/0x580
[ 7790.934850]  [<0000000000148b50>] run_ksoftirqd+0x50/0xb0
[ 7790.934854]  [<0000000000172b28>] smpboot_thread_fn+0x320/0x378
[ 7790.934856]  [<000000000016d21c>] kthread+0x124/0x138
[ 7790.934859]  [<00000000009a1d72>] kernel_thread_starter+0x6/0xc
[ 7790.934862]  [<00000000009a1d6c>] kernel_thread_starter+0x0/0xc
[ 7790.934864] INFO: lockdep is turned off.
[ 7790.934866] Last Breaking-Event-Address:
[ 7790.934869]  [<0000000000322956>] kfree+0x3ee/0x428
[ 7790.934873]  
[ 7790.934876] Kernel panic - not syncing: Fatal exception in interrupt

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
@ 2016-02-16 16:24                       ` Gerald Schaefer
  0 siblings, 0 replies; 153+ messages in thread
From: Gerald Schaefer @ 2016-02-16 16:24 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Sebastian Ott, Andrea Arcangeli, Christian Borntraeger,
	Kirill A. Shutemov, linux-mm, linux-kernel, Aneesh Kumar K.V,
	Andrew Morton, Linus Torvalds, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	Catalin Marinas, Will Deacon, linux-arm-kernel,
	Martin Schwidefsky, Heiko Carstens, linux-s390

On Mon, 15 Feb 2016 23:35:26 +0200
"Kirill A. Shutemov" <kirill@shutemov.name> wrote:

> On Mon, Feb 15, 2016 at 07:37:02PM +0100, Gerald Schaefer wrote:
> > On Mon, 15 Feb 2016 13:31:59 +0200
> > "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> > 
> > > On Sat, Feb 13, 2016 at 12:58:31PM +0100, Sebastian Ott wrote:
> > > > 
> > > > On Sat, 13 Feb 2016, Kirill A. Shutemov wrote:
> > > > > Could you check if revert of fecffad25458 helps?
> > > > 
> > > > I reverted fecffad25458 on top of 721675fcf277cf - it oopsed with:
> > > > 
> > > > [ 1851.721062] Unable to handle kernel pointer dereference in virtual kernel address space
> > > > [ 1851.721075] failing address: 0000000000000000 TEID: 0000000000000483
> > > > [ 1851.721078] Fault in home space mode while using kernel ASCE.
> > > > [ 1851.721085] AS:0000000000d5c007 R3:00000000ffff0007 S:00000000ffffa800 P:000000000000003d
> > > > [ 1851.721128] Oops: 0004 ilc:3 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> > > > [ 1851.721135] Modules linked in: bridge stp llc btrfs mlx4_ib mlx4_en ib_sa ib_mad vxlan xor ip6_udp_tunnel ib_core udp_tunnel ptp pps_core ib_addr ghash_s390 raid6_pq prng ecb aes_s390 mlx4_core des_s390 des_generic genwqe_card sha512_s390 sha256_s390 sha1_s390 sha_common crc_itu_t dm_mod scm_block vhost_net tun vhost eadm_sch macvtap macvlan kvm autofs4
> > > > [ 1851.721183] CPU: 7 PID: 256422 Comm: bash Not tainted 4.5.0-rc3-00058-g07923d7-dirty #178
> > > > [ 1851.721186] task: 000000007fbfd290 ti: 000000008c604000 task.ti: 000000008c604000
> > > > [ 1851.721189] Krnl PSW : 0704d00180000000 000000000045d3b8 (__rb_erase_color+0x280/0x308)
> > > > [ 1851.721200]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3
> > > >                Krnl GPRS: 0000000000000001 0000000000000020 0000000000000000 00000000bd07eff1
> > > > [ 1851.721205]            000000000027ca10 0000000000000000 0000000083e45898 0000000077b61198
> > > > [ 1851.721207]            000000007ce1a490 00000000bd07eff0 000000007ce1a548 000000000027ca10
> > > > [ 1851.721210]            00000000bd07c350 00000000bd07eff0 000000008c607aa8 000000008c607a68
> > > > [ 1851.721221] Krnl Code: 000000000045d3aa: e3c0d0080024       stg     %%r12,8(%%r13)
> > > >                           000000000045d3b0: b9040039           lgr     %%r3,%%r9
> > > >                          #000000000045d3b4: a53b0001           oill    %%r3,1
> > > >                          >000000000045d3b8: e33010000024       stg     %%r3,0(%%r1)
> > > >                           000000000045d3be: ec28000e007c       cgij    %%r2,0,8,45d3da
> > > >                           000000000045d3c4: e34020000004       lg      %%r4,0(%%r2)
> > > >                           000000000045d3ca: b904001c           lgr     %%r1,%%r12
> > > >                           000000000045d3ce: ec143f3f0056       rosbg   %%r1,%%r4,63,63,0
> > > > [ 1851.721269] Call Trace:
> > > > [ 1851.721273] ([<0000000083e45898>] 0x83e45898)
> > > > [ 1851.721279]  [<000000000029342a>] unlink_anon_vmas+0x9a/0x1d8
> > > > [ 1851.721282]  [<0000000000283f34>] free_pgtables+0xcc/0x148
> > > > [ 1851.721285]  [<000000000028c376>] exit_mmap+0xd6/0x300
> > > > [ 1851.721289]  [<0000000000134db8>] mmput+0x90/0x118
> > > > [ 1851.721294]  [<00000000002d76bc>] flush_old_exec+0x5d4/0x700
> > > > [ 1851.721298]  [<00000000003369f4>] load_elf_binary+0x2f4/0x13e8
> > > > [ 1851.721301]  [<00000000002d6e4a>] search_binary_handler+0x9a/0x1f8
> > > > [ 1851.721304]  [<00000000002d8970>] do_execveat_common.isra.32+0x668/0x9a0
> > > > [ 1851.721307]  [<00000000002d8cec>] do_execve+0x44/0x58
> > > > [ 1851.721310]  [<00000000002d8f92>] SyS_execve+0x3a/0x48
> > > > [ 1851.721315]  [<00000000006fb096>] system_call+0xd6/0x258
> > > > [ 1851.721317]  [<000003ff997436d6>] 0x3ff997436d6
> > > > [ 1851.721319] INFO: lockdep is turned off.
> > > > [ 1851.721321] Last Breaking-Event-Address:
> > > > [ 1851.721323]  [<000000000045d31a>] __rb_erase_color+0x1e2/0x308
> > > > [ 1851.721327]
> > > > [ 1851.721329] ---[ end trace 0d80041ac00cfae2 ]---
> > > > 
> > > > 
> > > > > 
> > > > > And could you share what the crashes look like? I haven't seen backtraces yet.
> > > > > 
> > > > 
> > > > Sure. I didn't because they really looked random to me. Most of the time
> > > > they were in rcu or list debugging, but I thought these were just the
> > > > messenger observing a corruption first. Anyhow, here is an older one that
> > > > might look interesting:
> > > > 
> > > > [   59.851421] list_del corruption. next->prev should be 000000006e1eb000, but was 0000000000000400
> > > 
> > > This is kinda interesting: 0x400 is TAIL_MAPPING. Hm...
> > > 
> > > Could you check if you see the problem on commit 1c290f642101 and its
> > > immediate parent?
> > > 
> > 
> > How should the page->mapping poison end up as next->prev in the list of
> > pre-allocated THP splitting page tables?
> 
> Maybe the pgtable was cast to struct page or something. I don't know.
> 
> > Also, commit 1c290f642101 is before the THP rework, at least the
> > non-bisectable part, so we should expect not to see the problem there.
> 
> Just to make sure: commit 122afea9626a is fine, commit 61f5d698cc97
> crashes. Correct?
> 
> > 0x400 is also the value of an empty pte on s390, and the thp_deposit/withdraw
> > listheads are placed inside the pre-allocated pagetables instead of page->lru,
> > because we have 2K pagetables on s390 and cannot use struct page == pgtable_t.
> 
> 0x400 from empty pte makes more sense than TAIL_MAPPING. But I guess it is
> worth changing TAIL_MAPPING to some other value to make sure.

Right, but we cannot trigger this list corruption symptom reliably. In fact,
I didn't hit it at all during the last runs, and previous crash logs also
showed list corruptions with values other than 0x400. That may hint at
concurrent pagetable freeing and re-use, given that our THP splitting pagetable
listhead is located inside the pre-allocated pagetables.
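
For illustration only, here is a minimal user-space sketch of how an empty pte
value can end up being read as next->prev when the listhead lives inside the
pagetable itself. It is not the s390 implementation: the union layout, the
256-entry table size and all names (EMPTY_PTE, list_node, pgtable,
reset_listhead_ptes, simulate_stale_link) are assumptions made up for this
example, and the "race" is simulated sequentially by resetting one table while
another still links to it.

```c
#include <assert.h>
#include <stdint.h>

#define PTRS_PER_PTE 256        /* assumption: 2K table of 8-byte entries */
#define EMPTY_PTE    0x400UL    /* empty pte value mentioned in this thread */

/* Hypothetical stand-in for the kernel's struct list_head. */
struct list_node {
	struct list_node *next, *prev;
};

/* While deposited, the first two entries of the table hold the list node. */
union pgtable {
	uintptr_t pte[PTRS_PER_PTE];
	struct list_node lh;
};

/* The consistency check that list debugging performs before unlinking. */
static int next_prev_ok(const struct list_node *entry)
{
	return entry->next->prev == entry;
}

/* Re-initialise the two entries that were used as the list node. */
static void reset_listhead_ptes(union pgtable *pt)
{
	pt->pte[0] = EMPTY_PTE;
	pt->pte[1] = EMPTY_PTE;
}

/* Returns the raw value found in next->prev after the simulated corruption. */
static uintptr_t simulate_stale_link(void)
{
	static union pgtable a, b;

	/* Two deposited pagetables linked in a circular list: a <-> b. */
	a.lh.next = &b.lh;
	a.lh.prev = &b.lh;
	b.lh.next = &a.lh;
	b.lh.prev = &a.lh;
	assert(next_prev_ok(&a.lh));

	/* Another path resets b's entries while a still points at b. */
	reset_listhead_ptes(&b);
	assert(!next_prev_ok(&a.lh));

	return (uintptr_t)a.lh.next->prev;
}
```

Calling simulate_stale_link() in this sketch yields the empty pte value where
a pointer back to the first table would be expected, which has the same shape
as the "next->prev should be ..., but was 0000000000000400" report. Values
other than 0x400 would fit a table that was freed and re-used for something
else before the stale link was followed.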

> 
> > So, for example, two concurrent withdraws could produce such a list
> > corruption, because the first withdraw will overwrite the listhead at the
> > beginning of the pagetable with 2 empty ptes.
> > 
> > Has anything changed regarding the general THP deposit/withdraw logic?
> 
> I don't see any changes in this area.
> 
> To eliminate one more variable, I would propose disabling the split pmd lock
> for testing and checking whether it makes a difference.

Disabling ARCH_ENABLE_SPLIT_PMD_PTLOCK didn't make any difference, other
than maybe a little reduction in "randomness" of the crashes, but that
may be pure coincidence. Out of about 10 runs, I always ended up with either
ODEBUG "WARNING: at lib/debugobjects.c:263" and subsequent "kernel BUG at
mm/slub.c:3629", or "bad swap file / page map" with subsequent "kernel BUG
at kernel/cred.c:142", see below for the full traces.

> 
> Is there any chance that I'll be able to trigger the bug using QEMU?
> Does anybody have a QEMU image I can use?
> 

I have no image, but trying to reproduce this under virtualization may
help to trigger this also on other architectures. After ruling out IPI
vs. fast_gup I do not really see why this should be arch-specific, and
it wouldn't be the first time that we hit subtle races first on s390, due
to our virtualized environment (my test case is make -j20 with 10 CPUs and
4GB of memory, no swap).


Here are the full traces from the runs w/o split pmd lock:

1)

[ 2584.391880] cc1 (71885) used greatest stack depth: 10496 bytes left
[ 2951.268250] ld (147667) used greatest stack depth: 10472 bytes left
[ 2972.530753] swap_free: Bad swap file entry 1000000000000000
[ 2972.530763] BUG: Bad page map in process cc1  pte:00000420 pmd:6cfd3000
[ 2972.530766] addr:0000000080d00000 vm_flags:00000875 anon_vma:          (null) mapping:000000005dc6ac70 index:d00
[ 2972.530776] file:cc1 fault:ext4_filemap_fault mmap:ext4_file_mmap readpage:ext4_readpage
[ 2972.530781] CPU: 6 PID: 152043 Comm: cc1 Not tainted 4.5.0-rc4-00014-g1926e54-dirty #70
[ 2972.530784]        0000000071947a60 0000000071947af0 0000000000000002 0000000000000000 
                      0000000071947b90 0000000071947b08 0000000071947b08 0000000000113d38 
                      0000000000000000 0000000000b70df4 0000000000b4f348 000000000000000b 
                      0000000071947b50 0000000071947af0 0000000000000000 0000000000000000 
                      07000000c3763ae8 0000000000113d38 0000000071947af0 0000000071947b50 
[ 2972.530811] Call Trace:
[ 2972.530818] ([<0000000000113c3c>] show_trace+0x12c/0x150)
[ 2972.530821]  [<0000000000113cee>] show_stack+0x8e/0xf0
[ 2972.530826]  [<000000000068b8ec>] dump_stack+0x9c/0xe0
[ 2972.530830]  [<00000000002bbeda>] print_bad_pte+0x222/0x238
[ 2972.530833]  [<00000000002beb92>] zap_pte_range+0x442/0x790
[ 2972.530835]  [<00000000002bf2c6>] unmap_single_vma+0x3e6/0x400
[ 2972.530837]  [<00000000002c0f46>] unmap_vmas+0x8e/0xc8
[ 2972.530840]  [<00000000002c9a56>] exit_mmap+0xc6/0x300
[ 2972.530844]  [<0000000000138b10>] mmput+0xa0/0x128
[ 2972.530847]  [<000000000013fcb4>] do_exit+0x42c/0xd60
[ 2972.530849]  [<00000000001406f0>] do_group_exit+0x98/0xe0
[ 2972.530851]  [<0000000000140768>] __wake_up_parent+0x0/0x28
[ 2972.530855]  [<0000000000910f2e>] system_call+0xd6/0x270
[ 2972.530883]  [<000003ff89b43698>] 0x3ff89b43698
[ 2972.530886] 1 lock held by cc1/152043:
[ 2972.530887]  #0:  (&(ptlock_ptr(page))->rlock){+.+.-.}, at: [<00000000002be7f6>] zap_pte_range+0xa6/0x790
[ 2972.530897] Disabling lock debugging due to kernel taint
[ 2972.533069] BUG: Bad rss-counter state mm:00000000719d0e00 idx:2 val:-1
[ 5899.109157] ------------[ cut here ]------------
[ 5899.109166] kernel BUG at kernel/cred.c:142!
[ 5899.109211] illegal operation: 0001 ilc:1 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 5899.109217] Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc mlx4_ib ib_sa ib_mad mlx4_en ib_core vxlan udp_tunnel ptp ib_addr pps_core ghash_s390 prng ecb aes_s390 des_s390 des_generic sha512_s390 sha256_s390 sha1_s390 sha_common mlx4_core eadm_sch nfsd vhost_net tun vhost macvtap auth_rpcgss macvlan kvm oid_registry nfs_acl lockd grace sunrpc dm_multipath dm_mod autofs4
[ 5899.109279] CPU: 1 PID: 12 Comm: ksoftirqd/1 Tainted: G    B           4.5.0-rc4-00014-g1926e54-dirty #70
[ 5899.109283] task: 00000000d09e2a48 ti: 00000000d09f4000 task.ti: 00000000d09f4000
[ 5899.109286] Krnl PSW : 0704c00180000000 00000000001651aa (__put_cred+0x22/0x68)
[ 5899.109296]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
               Krnl GPRS: 0000000000000002 0000000000000020 000000007431f000 00000000c38e3400
[ 5899.109301]            000000000032aaf8 0000000000000002 0000000000000000 000000000000000a
[ 5899.109304]            0000000000000000 000000000032aac0 0000000000000008 00000000749ad000
[ 5899.109306]            00000000c38e3400 000000007431f000 000000000032ab2e 00000000d09f7bf0
[ 5899.109316] Krnl Code: 000000000016519c: 58102004            l       %%r1,4(%%r2)
                          00000000001651a0: ec180005007e        cij     %%r1,0,8,1651aa
                         #00000000001651a6: a7f40001            brc     15,1651a8
                         >00000000001651aa: e3e020080024        stg     %%r14,8(%%r2)
                          00000000001651b0: c01944656144        iilf    %%r1,1147494724
                          00000000001651b6: 50102010            st      %%r1,16(%%r2)
                          00000000001651ba: e31003100004        lg      %%r1,784
                          00000000001651c0: e32018300020        cg      %%r2,2096(%%r1)
[ 5899.109371] Call Trace:
[ 5899.109376] ([<000000000032aaf8>] file_free_rcu+0x38/0x88)
[ 5899.109381]  [<00000000001c5ddc>] rcu_process_callbacks+0x5fc/0x9f0
[ 5899.109385]  [<0000000000141794>] __do_softirq+0x25c/0x570
[ 5899.109387]  [<0000000000141ae6>] run_ksoftirqd+0x3e/0xa0
[ 5899.109391]  [<0000000000167bee>] smpboot_thread_fn+0x30e/0x360
[ 5899.109394]  [<0000000000162f4a>] kthread+0x112/0x128
[ 5899.109398]  [<00000000009110fa>] kernel_thread_starter+0x6/0xc
[ 5899.109401]  [<00000000009110f4>] kernel_thread_starter+0x0/0xc
[ 5899.109403] INFO: lockdep is turned off.
[ 5899.109405] Last Breaking-Event-Address:
[ 5899.109407]  [<00000000001651a6>] __put_cred+0x1e/0x68
[ 5899.109411]  
[ 5899.109414] Kernel panic - not syncing: Fatal exception in interrupt


2)

[ 7790.934295] ODEBUG: active_state not available (active state 0) object type: rcu_head hint:           (null)
[ 7790.934356] ------------[ cut here ]------------
[ 7790.934359] WARNING: at lib/debugobjects.c:263
[ 7790.934361] Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack mlx4_ib ib_sa ipt_REJECT mlx4_en ib_mad nf_reject_ipv4 ib_core vxlan udp_tunnel ptp xt_tcpudp ib_addr pps_core iptable_filter ip_tables x_tables bridge stp llc ghash_s390 prng ecb aes_s390 des_s390 des_generic sha512_s390 sha256_s390 mlx4_core sha1_s390 sha_common eadm_sch vhost_net nfsd tun vhost macvtap macvlan auth_rpcgss kvm oid_registry nfs_acl lockd grace sunrpc dm_multipath dm_mod autofs4
[ 7790.934417] CPU: 8 PID: 40 Comm: ksoftirqd/8 Not tainted 4.5.0-rc4-00014-g1926e54-dirty #149
[ 7790.934420] task: 00000000e2955490 ti: 00000000e2958000 task.ti: 00000000e2958000
[ 7790.934422] Krnl PSW : 0404c00180000000 000000000071c340 (debug_print_object+0xb0/0xd0)
[ 7790.934431]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
               Krnl GPRS: 0000000001e6e3c7 00000000e2955490 0000000000000060 00000000e2958000
[ 7790.934435]            000000000071c33c 0000000000000000 0000000000b975e8 0000000001f2b008
[ 7790.934437]            07000000001d7e24 0000000000000000 0000000001f2b010 0000000000bea6b8
[ 7790.934440]            0000000000e241f8 00000000e295bc38 000000000071c33c 00000000e295bb38
[ 7790.934449] Krnl Code: 000000000071c330: c41f00bf6a14        strl    %%r1,1f09758
                          000000000071c336: c0e5ffdbd64d        brasl   %%r14,296fd0
                         #000000000071c33c: a7f40001            brc     15,71c33e
                         >000000000071c340: c41d0036e746        lrl     %%r1,df91cc
                          000000000071c346: e340f0e80004        lg      %%r4,232(%%r15)
                          000000000071c34c: a71a0001            ahi     %%r1,1
                          000000000071c350: eb6ff0a80004        lmg     %%r6,%%r15,168(%%r15)
                          000000000071c356: c41f0036e73b        strl    %%r1,df91cc
[ 7790.934493] Call Trace:
[ 7790.934495] ([<000000000071c33c>] debug_print_object+0xac/0xd0)
[ 7790.934498]  [<000000000071d704>] debug_object_active_state+0x164/0x178
[ 7790.934504]  [<00000000001d7da4>] rcu_process_callbacks+0x57c/0xa00
[ 7790.934508]  [<00000000001487ec>] __do_softirq+0x26c/0x580
[ 7790.934510]  [<0000000000148b50>] run_ksoftirqd+0x50/0xb0
[ 7790.934515]  [<0000000000172b28>] smpboot_thread_fn+0x320/0x378
[ 7790.934517]  [<000000000016d21c>] kthread+0x124/0x138
[ 7790.934521]  [<00000000009a1d72>] kernel_thread_starter+0x6/0xc
[ 7790.934524]  [<00000000009a1d6c>] kernel_thread_starter+0x0/0xc
[ 7790.934526] 1 lock held by ksoftirqd/8/40:
[ 7790.934528]  #0:  (&obj_hash[i].lock){-.-.-.}, at: [<000000000071d64c>] debug_object_active_state+0xac/0x178
[ 7790.934535] Last Breaking-Event-Address:
[ 7790.934537]  [<000000000071c33c>] debug_print_object+0xac/0xd0
[ 7790.934539] ---[ end trace b583bfd967a78637 ]---
[ 7790.934543] ODEBUG: deactivate not available (active state 0) object type: rcu_head hint:           (null)
[ 7790.934551] ------------[ cut here ]------------
[ 7790.934553] WARNING: at lib/debugobjects.c:263
[ 7790.934555] Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack mlx4_ib ib_sa ipt_REJECT mlx4_en ib_mad nf_reject_ipv4 ib_core vxlan udp_tunnel ptp xt_tcpudp ib_addr pps_core iptable_filter ip_tables x_tables bridge stp llc ghash_s390 prng ecb aes_s390 des_s390 des_generic sha512_s390 sha256_s390 mlx4_core sha1_s390 sha_common eadm_sch vhost_net nfsd tun vhost macvtap macvlan auth_rpcgss kvm oid_registry nfs_acl lockd grace sunrpc dm_multipath dm_mod autofs4
[ 7790.934599] CPU: 8 PID: 40 Comm: ksoftirqd/8 Tainted: G        W       4.5.0-rc4-00014-g1926e54-dirty #149
[ 7790.934601] task: 00000000e2955490 ti: 00000000e2958000 task.ti: 00000000e2958000
[ 7790.934603] Krnl PSW : 0404c00180000000 000000000071c340 (debug_print_object+0xb0/0xd0)
[ 7790.934608]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
               Krnl GPRS: 0000000001e6e3c7 00000000e2955490 000000000000005e 00000000e2958000
[ 7790.934612]            000000000071c33c 0000000000000000 0000000000b975e8 000000000000000a
[ 7790.934614]            0000000004bcd020 0700000001f2b010 0000000001f2b010 0000000000ba5d0a
[ 7790.934617]            0000000000e241f8 00000000e295bc48 000000000071c33c 00000000e295bb48
[ 7790.934622] Krnl Code: 000000000071c330: c41f00bf6a14        strl    %%r1,1f09758
                          000000000071c336: c0e5ffdbd64d        brasl   %%r14,296fd0
                         #000000000071c33c: a7f40001            brc     15,71c33e
                         >000000000071c340: c41d0036e746        lrl     %%r1,df91cc
                          000000000071c346: e340f0e80004        lg      %%r4,232(%%r15)
                          000000000071c34c: a71a0001            ahi     %%r1,1
                          000000000071c350: eb6ff0a80004        lmg     %%r6,%%r15,168(%%r15)
                          000000000071c356: c41f0036e73b        strl    %%r1,df91cc
[ 7790.934639] Call Trace:
[ 7790.934641] ([<000000000071c33c>] debug_print_object+0xac/0xd0)
[ 7790.934644]  [<000000000071d0a8>] debug_object_deactivate+0x170/0x188
[ 7790.934646]  [<00000000001d7db6>] rcu_process_callbacks+0x58e/0xa00
[ 7790.934648]  [<00000000001487ec>] __do_softirq+0x26c/0x580
[ 7790.934651]  [<0000000000148b50>] run_ksoftirqd+0x50/0xb0
[ 7790.934653]  [<0000000000172b28>] smpboot_thread_fn+0x320/0x378
[ 7790.934655]  [<000000000016d21c>] kthread+0x124/0x138
[ 7790.934657]  [<00000000009a1d72>] kernel_thread_starter+0x6/0xc
[ 7790.934659]  [<00000000009a1d6c>] kernel_thread_starter+0x0/0xc
[ 7790.934661] 1 lock held by ksoftirqd/8/40:
[ 7790.934663]  #0:  (&obj_hash[i].lock){-.-.-.}, at: [<000000000071cfdc>] debug_object_deactivate+0xa4/0x188
[ 7790.934669] Last Breaking-Event-Address:
[ 7790.934671]  [<000000000071c33c>] debug_print_object+0xac/0xd0
[ 7790.934673] ---[ end trace b583bfd967a78638 ]---
[ 7790.934680] ------------[ cut here ]------------
[ 7790.934682] kernel BUG at mm/slub.c:3629!
[ 7790.934707] illegal operation: 0001 ilc:1 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 7790.934715] Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack mlx4_ib ib_sa ipt_REJECT mlx4_en ib_mad nf_reject_ipv4 ib_core vxlan udp_tunnel ptp xt_tcpudp ib_addr pps_core iptable_filter ip_tables x_tables bridge stp llc ghash_s390 prng ecb aes_s390 des_s390 des_generic sha512_s390 sha256_s390 mlx4_core sha1_s390 sha_common eadm_sch vhost_net nfsd tun vhost macvtap macvlan auth_rpcgss kvm oid_registry nfs_acl lockd grace sunrpc dm_multipath dm_mod autofs4
[ 7790.934789] CPU: 8 PID: 40 Comm: ksoftirqd/8 Tainted: G        W       4.5.0-rc4-00014-g1926e54-dirty #149
[ 7790.934791] task: 00000000e2955490 ti: 00000000e2958000 task.ti: 00000000e2958000
[ 7790.934794] Krnl PSW : 0704c00180000000 000000000032295a (kfree+0x3f2/0x428)
[ 7790.934801]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
               Krnl GPRS: 0000000000000000 0000000000000100 0000000000000100 0000000000e24260
[ 7790.934806]            00000000001d1c82 0000000000000000 0000000000000000 000000000000000a
[ 7790.934809]            0000000000000001 00000000001d7e0a 0000000000000006 000003d10012f340
[ 7790.934812]            0000000004bcd000 0000000000f0433c 000000000032267a 00000000e295bbb0
[ 7790.934818] Krnl Code: 000000000032294c: c0e50033bef6        brasl   %%r14,99a738
                          0000000000322952: a7f4feba            brc     15,3226c6
                         #0000000000322956: a7f40001            brc     15,322958
                         >000000000032295a: e310b0060090        llgc    %%r1,6(%%r11)
                          0000000000322960: a7110040            tmll    %%r1,64
                          0000000000322964: a774fee9            brc     7,322736
                          0000000000322968: a7f4feeb            brc     15,32273e
                          000000000032296c: c0e50033be32        brasl   %%r14,99a5d0
[ 7790.934838] Call Trace:
[ 7790.934841] ([<000000000032267a>] kfree+0x112/0x428)
[ 7790.934844]  [<00000000001d7e0a>] rcu_process_callbacks+0x5e2/0xa00
[ 7790.934847]  [<00000000001487ec>] __do_softirq+0x26c/0x580
[ 7790.934850]  [<0000000000148b50>] run_ksoftirqd+0x50/0xb0
[ 7790.934854]  [<0000000000172b28>] smpboot_thread_fn+0x320/0x378
[ 7790.934856]  [<000000000016d21c>] kthread+0x124/0x138
[ 7790.934859]  [<00000000009a1d72>] kernel_thread_starter+0x6/0xc
[ 7790.934862]  [<00000000009a1d6c>] kernel_thread_starter+0x0/0xc
[ 7790.934864] INFO: lockdep is turned off.
[ 7790.934866] Last Breaking-Event-Address:
[ 7790.934869]  [<0000000000322956>] kfree+0x3ee/0x428
[ 7790.934873]  
[ 7790.934876] Kernel panic - not syncing: Fatal exception in interrupt



* [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
@ 2016-02-16 16:24                       ` Gerald Schaefer
  0 siblings, 0 replies; 153+ messages in thread
From: Gerald Schaefer @ 2016-02-16 16:24 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 15 Feb 2016 23:35:26 +0200
"Kirill A. Shutemov" <kirill@shutemov.name> wrote:

> On Mon, Feb 15, 2016 at 07:37:02PM +0100, Gerald Schaefer wrote:
> > On Mon, 15 Feb 2016 13:31:59 +0200
> > "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> > 
> > > On Sat, Feb 13, 2016 at 12:58:31PM +0100, Sebastian Ott wrote:
> > > > 
> > > > On Sat, 13 Feb 2016, Kirill A. Shutemov wrote:
> > > > > Could you check if revert of fecffad25458 helps?
> > > > 
> > > > I reverted fecffad25458 on top of 721675fcf277cf - it oopsed with:
> > > > 
> > > > [ 1851.721062] Unable to handle kernel pointer dereference in virtual kernel address space
> > > > [ 1851.721075] failing address: 0000000000000000 TEID: 0000000000000483
> > > > [ 1851.721078] Fault in home space mode while using kernel ASCE.
> > > > [ 1851.721085] AS:0000000000d5c007 R3:00000000ffff0007 S:00000000ffffa800 P:000000000000003d
> > > > [ 1851.721128] Oops: 0004 ilc:3 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> > > > [ 1851.721135] Modules linked in: bridge stp llc btrfs mlx4_ib mlx4_en ib_sa ib_mad vxlan xor ip6_udp_tunnel ib_core udp_tunnel ptp pps_core ib_addr ghash_s390raid6_pq prng ecb aes_s390 mlx4_core des_s390 des_generic genwqe_card sha512_s390 sha256_s390 sha1_s390 sha_common crc_itu_t dm_mod scm_block vhost_net tun vhost eadm_sch macvtap macvlan kvm autofs4
> > > > [ 1851.721183] CPU: 7 PID: 256422 Comm: bash Not tainted 4.5.0-rc3-00058-g07923d7-dirty #178
> > > > [ 1851.721186] task: 000000007fbfd290 ti: 000000008c604000 task.ti: 000000008c604000
> > > > [ 1851.721189] Krnl PSW : 0704d00180000000 000000000045d3b8 (__rb_erase_color+0x280/0x308)
> > > > [ 1851.721200]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3
> > > >                Krnl GPRS: 0000000000000001 0000000000000020 0000000000000000 00000000bd07eff1
> > > > [ 1851.721205]            000000000027ca10 0000000000000000 0000000083e45898 0000000077b61198
> > > > [ 1851.721207]            000000007ce1a490 00000000bd07eff0 000000007ce1a548 000000000027ca10
> > > > [ 1851.721210]            00000000bd07c350 00000000bd07eff0 000000008c607aa8 000000008c607a68
> > > > [ 1851.721221] Krnl Code: 000000000045d3aa: e3c0d0080024       stg     %%r12,8(%%r13)
> > > >                           000000000045d3b0: b9040039           lgr     %%r3,%%r9
> > > >                          #000000000045d3b4: a53b0001           oill    %%r3,1
> > > >                          >000000000045d3b8: e33010000024       stg     %%r3,0(%%r1)
> > > >                           000000000045d3be: ec28000e007c       cgij    %%r2,0,8,45d3da
> > > >                           000000000045d3c4: e34020000004       lg      %%r4,0(%%r2)
> > > >                           000000000045d3ca: b904001c           lgr     %%r1,%%r12
> > > >                           000000000045d3ce: ec143f3f0056       rosbg   %%r1,%%r4,63,63,0
> > > > [ 1851.721269] Call Trace:
> > > > [ 1851.721273] ([<0000000083e45898>] 0x83e45898)
> > > > [ 1851.721279]  [<000000000029342a>] unlink_anon_vmas+0x9a/0x1d8
> > > > [ 1851.721282]  [<0000000000283f34>] free_pgtables+0xcc/0x148
> > > > [ 1851.721285]  [<000000000028c376>] exit_mmap+0xd6/0x300
> > > > [ 1851.721289]  [<0000000000134db8>] mmput+0x90/0x118
> > > > [ 1851.721294]  [<00000000002d76bc>] flush_old_exec+0x5d4/0x700
> > > > [ 1851.721298]  [<00000000003369f4>] load_elf_binary+0x2f4/0x13e8
> > > > [ 1851.721301]  [<00000000002d6e4a>] search_binary_handler+0x9a/0x1f8
> > > > [ 1851.721304]  [<00000000002d8970>] do_execveat_common.isra.32+0x668/0x9a0
> > > > [ 1851.721307]  [<00000000002d8cec>] do_execve+0x44/0x58
> > > > [ 1851.721310]  [<00000000002d8f92>] SyS_execve+0x3a/0x48
> > > > [ 1851.721315]  [<00000000006fb096>] system_call+0xd6/0x258
> > > > [ 1851.721317]  [<000003ff997436d6>] 0x3ff997436d6
> > > > [ 1851.721319] INFO: lockdep is turned off.
> > > > [ 1851.721321] Last Breaking-Event-Address:
> > > > [ 1851.721323]  [<000000000045d31a>] __rb_erase_color+0x1e2/0x308
> > > > [ 1851.721327]
> > > > [ 1851.721329] ---[ end trace 0d80041ac00cfae2 ]---
> > > > 
> > > > 
> > > > > 
> > > > > And could you share what the crashes look like? I haven't seen backtraces yet.
> > > > > 
> > > > 
> > > > Sure. I didn't because they really looked random to me. Most of the time
> > > > in rcu or list debugging but I thought these have just been the messenger
> > > > observing a corruption first. Anyhow, here is an older one that might look
> > > > interesting:
> > > > 
> > > > [   59.851421] list_del corruption. next->prev should be 000000006e1eb000, but was 0000000000000400
> > > 
> > > This is kinda interesting: 0x400 is TAIL_MAPPING. Hm...
> > > 
> > > Could you check if you see the problem on commit 1c290f642101 and its
> > > immediate parent?
> > > 
> > 
> > How should the page->mapping poison end up as next->prev in the list of
> > pre-allocated THP splitting page tables?
> 
> Maybe the pgtable was cast to struct page or something. I don't know.
> 
> > Also, commit 1c290f642101 is before the THP rework, at least the
> > non-bisectable part, so we should expect not to see the problem there.
> 
> Just to make sure: commit 122afea9626a is fine, commit 61f5d698cc97
> crashes. Correct?
> 
> > 0x400 is also the value of an empty pte on s390, and the thp_deposit/withdraw
> > listheads are placed inside the pre-allocated pagetables instead of page->lru,
> > because we have 2K pagetables on s390 and cannot use struct page == pgtable_t.
> 
> 0x400 from an empty pte makes more sense than TAIL_MAPPING. But I guess it is
> worth changing TAIL_MAPPING to some other value to make sure.

Right, but we cannot trigger this list corruption symptom reliably. In fact,
I didn't hit it at all during the last runs, and previous crash logs also
showed list corruptions with values other than 0x400. That may hint towards
concurrent pagetable freeing and re-use, given that our THP splitting pagetable
listhead is located inside the pre-allocated pagetables.
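
The withdraw scenario discussed above can be modelled in user space. The
following is a minimal sketch, not the s390 kernel code: the model_* names,
the two-table list, and the skipped unlink (standing in for an update lost in
a race) are assumptions made for illustration. Only the 2K pagetable size, the
listhead placement at the start of the pagetable, and the 0x400 empty pte
value are taken from this thread.

```c
#include <assert.h>
#include <stdint.h>

#define PTRS_PER_PTE 256       /* 2K page table: 256 entries of 8 bytes */
#define PTE_EMPTY    0x400ULL  /* empty pte value quoted in this thread */

/* Hypothetical model of a list head stored in the first two pte slots. */
struct model_list_head {
	uint64_t next;
	uint64_t prev;
};

/* The same memory is either 256 ptes or, while deposited, a list head. */
union model_pgtable {
	uint64_t pte[PTRS_PER_PTE];
	struct model_list_head lh;
};

static uint64_t addr_of(union model_pgtable *pt)
{
	return (uint64_t)(uintptr_t)pt;
}

static union model_pgtable *table_at(uint64_t addr)
{
	return (union model_pgtable *)(uintptr_t)addr;
}

/* Link two deposited page tables into a circular list: a <-> b. */
static void model_deposit_pair(union model_pgtable *a, union model_pgtable *b)
{
	assert(sizeof(union model_pgtable) == 2048);
	a->lh.next = a->lh.prev = addr_of(b);
	b->lh.next = b->lh.prev = addr_of(a);
}

/* Withdraw step: overwrite the list head with two empty ptes. */
static void model_withdraw_reset(union model_pgtable *pt)
{
	pt->pte[0] = PTE_EMPTY;
	pt->pte[1] = PTE_EMPTY;
}

/* The value a list_del()-style sanity check reads as entry->next->prev. */
static uint64_t model_next_prev(union model_pgtable *entry)
{
	return table_at(entry->lh.next)->lh.prev;
}

/*
 * Reset 'a' without unlinking it from 'b' (assumed lost update), then
 * report what a later list_del-style check on 'b' would observe.
 */
uint64_t model_stale_next_prev(void)
{
	static union model_pgtable a, b;

	model_deposit_pair(&a, &b);
	model_withdraw_reset(&a);
	return model_next_prev(&b);
}
```

In this model, model_stale_next_prev() returns 0x400, the same value reported
as next->prev in the list_del corruption message quoted above. Values other
than 0x400 would be consistent with the withdrawn pagetable having been
re-used and populated in the meantime.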

> 
> > So, for example, two concurrent withdraws could produce such a list
> > corruption, because the first withdraw will overwrite the listhead at the
> > beginning of the pagetable with 2 empty ptes.
> > 
> > Has anything changed regarding the general THP deposit/withdraw logic?
> 
> I don't see any changes in this area.
> 
> To eliminate one more variable, I would propose to disable split pmd lock
> for testing and check if it makes a difference.

Disabling ARCH_ENABLE_SPLIT_PMD_PTLOCK didn't make any difference, other
than maybe a little reduction in "randomness" of the crashes, but that
may be pure coincidence. Out of about 10 runs, I always ended up with either
ODEBUG "WARNING: at lib/debugobjects.c:263" and subsequent "kernel BUG at
mm/slub.c:3629", or "bad swap file / page map" with subsequent "kernel BUG
at kernel/cred.c:142", see below for the full traces.

> 
> Is there any chance that I'll be able to trigger the bug using QEMU?
> Does anybody have a QEMU image I can use?
> 

I have no image, but trying to reproduce this under virtualization may
help to trigger this also on other architectures. After ruling out IPI
vs. fast_gup I do not really see why this should be arch-specific, and
it wouldn't be the first time that we hit subtle races first on s390, due
to our virtualized environment (my test case is make -j20 with 10 CPUs and
4GB of memory, no swap).


Here are the full traces from the runs w/o split pmd lock:

1)

[ 2584.391880] cc1 (71885) used greatest stack depth: 10496 bytes left
[ 2951.268250] ld (147667) used greatest stack depth: 10472 bytes left
[ 2972.530753] swap_free: Bad swap file entry 1000000000000000
[ 2972.530763] BUG: Bad page map in process cc1  pte:00000420 pmd:6cfd3000
[ 2972.530766] addr:0000000080d00000 vm_flags:00000875 anon_vma:          (null) mapping:000000005dc6ac70 index:d00
[ 2972.530776] file:cc1 fault:ext4_filemap_fault mmap:ext4_file_mmap readpage:ext4_readpage
[ 2972.530781] CPU: 6 PID: 152043 Comm: cc1 Not tainted 4.5.0-rc4-00014-g1926e54-dirty #70
[ 2972.530784]        0000000071947a60 0000000071947af0 0000000000000002 0000000000000000 
                      0000000071947b90 0000000071947b08 0000000071947b08 0000000000113d38 
                      0000000000000000 0000000000b70df4 0000000000b4f348 000000000000000b 
                      0000000071947b50 0000000071947af0 0000000000000000 0000000000000000 
                      07000000c3763ae8 0000000000113d38 0000000071947af0 0000000071947b50 
[ 2972.530811] Call Trace:
[ 2972.530818] ([<0000000000113c3c>] show_trace+0x12c/0x150)
[ 2972.530821]  [<0000000000113cee>] show_stack+0x8e/0xf0
[ 2972.530826]  [<000000000068b8ec>] dump_stack+0x9c/0xe0
[ 2972.530830]  [<00000000002bbeda>] print_bad_pte+0x222/0x238
[ 2972.530833]  [<00000000002beb92>] zap_pte_range+0x442/0x790
[ 2972.530835]  [<00000000002bf2c6>] unmap_single_vma+0x3e6/0x400
[ 2972.530837]  [<00000000002c0f46>] unmap_vmas+0x8e/0xc8
[ 2972.530840]  [<00000000002c9a56>] exit_mmap+0xc6/0x300
[ 2972.530844]  [<0000000000138b10>] mmput+0xa0/0x128
[ 2972.530847]  [<000000000013fcb4>] do_exit+0x42c/0xd60
[ 2972.530849]  [<00000000001406f0>] do_group_exit+0x98/0xe0
[ 2972.530851]  [<0000000000140768>] __wake_up_parent+0x0/0x28
[ 2972.530855]  [<0000000000910f2e>] system_call+0xd6/0x270
[ 2972.530883]  [<000003ff89b43698>] 0x3ff89b43698
[ 2972.530886] 1 lock held by cc1/152043:
[ 2972.530887]  #0:  (&(ptlock_ptr(page))->rlock){+.+.-.}, at: [<00000000002be7f6>] zap_pte_range+0xa6/0x790
[ 2972.530897] Disabling lock debugging due to kernel taint
[ 2972.533069] BUG: Bad rss-counter state mm:00000000719d0e00 idx:2 val:-1
[ 5899.109157] ------------[ cut here ]------------
[ 5899.109166] kernel BUG at kernel/cred.c:142!
[ 5899.109211] illegal operation: 0001 ilc:1 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 5899.109217] Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc mlx4_ib ib_sa ib_mad mlx4_en ib_core vxlan udp_tunnel ptp ib_addr pps_core ghash_s390 prng ecb aes_s390 des_s390 des_generic sha512_s390 sha256_s390 sha1_s390 sha_common mlx4_core eadm_sch nfsd vhost_net tun vhost macvtap auth_rpcgss macvlan kvm oid_registry nfs_acl lockd grace sunrpc dm_multipath dm_mod autofs4
[ 5899.109279] CPU: 1 PID: 12 Comm: ksoftirqd/1 Tainted: G    B           4.5.0-rc4-00014-g1926e54-dirty #70
[ 5899.109283] task: 00000000d09e2a48 ti: 00000000d09f4000 task.ti: 00000000d09f4000
[ 5899.109286] Krnl PSW : 0704c00180000000 00000000001651aa (__put_cred+0x22/0x68)
[ 5899.109296]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
               Krnl GPRS: 0000000000000002 0000000000000020 000000007431f000 00000000c38e3400
[ 5899.109301]            000000000032aaf8 0000000000000002 0000000000000000 000000000000000a
[ 5899.109304]            0000000000000000 000000000032aac0 0000000000000008 00000000749ad000
[ 5899.109306]            00000000c38e3400 000000007431f000 000000000032ab2e 00000000d09f7bf0
[ 5899.109316] Krnl Code: 000000000016519c: 58102004            l       %%r1,4(%%r2)
                          00000000001651a0: ec180005007e        cij     %%r1,0,8,1651aa
                         #00000000001651a6: a7f40001            brc     15,1651a8
                         >00000000001651aa: e3e020080024        stg     %%r14,8(%%r2)
                          00000000001651b0: c01944656144        iilf    %%r1,1147494724
                          00000000001651b6: 50102010            st      %%r1,16(%%r2)
                          00000000001651ba: e31003100004        lg      %%r1,784
                          00000000001651c0: e32018300020        cg      %%r2,2096(%%r1)
[ 5899.109371] Call Trace:
[ 5899.109376] ([<000000000032aaf8>] file_free_rcu+0x38/0x88)
[ 5899.109381]  [<00000000001c5ddc>] rcu_process_callbacks+0x5fc/0x9f0
[ 5899.109385]  [<0000000000141794>] __do_softirq+0x25c/0x570
[ 5899.109387]  [<0000000000141ae6>] run_ksoftirqd+0x3e/0xa0
[ 5899.109391]  [<0000000000167bee>] smpboot_thread_fn+0x30e/0x360
[ 5899.109394]  [<0000000000162f4a>] kthread+0x112/0x128
[ 5899.109398]  [<00000000009110fa>] kernel_thread_starter+0x6/0xc
[ 5899.109401]  [<00000000009110f4>] kernel_thread_starter+0x0/0xc
[ 5899.109403] INFO: lockdep is turned off.
[ 5899.109405] Last Breaking-Event-Address:
[ 5899.109407]  [<00000000001651a6>] __put_cred+0x1e/0x68
[ 5899.109411]  
[ 5899.109414] Kernel panic - not syncing: Fatal exception in interrupt


2)

[ 7790.934295] ODEBUG: active_state not available (active state 0) object type: rcu_head hint:           (null)
[ 7790.934356] ------------[ cut here ]------------
[ 7790.934359] WARNING: at lib/debugobjects.c:263
[ 7790.934361] Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack mlx4_ib ib_sa ipt_REJECT mlx4_en ib_mad nf_reject_ipv4 ib_core vxlan udp_tunnel ptp xt_tcpudp ib_addr pps_core iptable_filter ip_tables x_tables bridge stp llc ghash_s390 prng ecb aes_s390 des_s390 des_generic sha512_s390 sha256_s390 mlx4_core sha1_s390 sha_common eadm_sch vhost_net nfsd tun vhost macvtap macvlan auth_rpcgss kvm oid_registry nfs_acl lockd grace sunrpc dm_multipath dm_mod autofs4
[ 7790.934417] CPU: 8 PID: 40 Comm: ksoftirqd/8 Not tainted 4.5.0-rc4-00014-g1926e54-dirty #149
[ 7790.934420] task: 00000000e2955490 ti: 00000000e2958000 task.ti: 00000000e2958000
[ 7790.934422] Krnl PSW : 0404c00180000000 000000000071c340 (debug_print_object+0xb0/0xd0)
[ 7790.934431]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
               Krnl GPRS: 0000000001e6e3c7 00000000e2955490 0000000000000060 00000000e2958000
[ 7790.934435]            000000000071c33c 0000000000000000 0000000000b975e8 0000000001f2b008
[ 7790.934437]            07000000001d7e24 0000000000000000 0000000001f2b010 0000000000bea6b8
[ 7790.934440]            0000000000e241f8 00000000e295bc38 000000000071c33c 00000000e295bb38
[ 7790.934449] Krnl Code: 000000000071c330: c41f00bf6a14        strl    %%r1,1f09758
                          000000000071c336: c0e5ffdbd64d        brasl   %%r14,296fd0
                         #000000000071c33c: a7f40001            brc     15,71c33e
                         >000000000071c340: c41d0036e746        lrl     %%r1,df91cc
                          000000000071c346: e340f0e80004        lg      %%r4,232(%%r15)
                          000000000071c34c: a71a0001            ahi     %%r1,1
                          000000000071c350: eb6ff0a80004        lmg     %%r6,%%r15,168(%%r15)
                          000000000071c356: c41f0036e73b        strl    %%r1,df91cc
[ 7790.934493] Call Trace:
[ 7790.934495] ([<000000000071c33c>] debug_print_object+0xac/0xd0)
[ 7790.934498]  [<000000000071d704>] debug_object_active_state+0x164/0x178
[ 7790.934504]  [<00000000001d7da4>] rcu_process_callbacks+0x57c/0xa00
[ 7790.934508]  [<00000000001487ec>] __do_softirq+0x26c/0x580
[ 7790.934510]  [<0000000000148b50>] run_ksoftirqd+0x50/0xb0
[ 7790.934515]  [<0000000000172b28>] smpboot_thread_fn+0x320/0x378
[ 7790.934517]  [<000000000016d21c>] kthread+0x124/0x138
[ 7790.934521]  [<00000000009a1d72>] kernel_thread_starter+0x6/0xc
[ 7790.934524]  [<00000000009a1d6c>] kernel_thread_starter+0x0/0xc
[ 7790.934526] 1 lock held by ksoftirqd/8/40:
[ 7790.934528]  #0:  (&obj_hash[i].lock){-.-.-.}, at: [<000000000071d64c>] debug_object_active_state+0xac/0x178
[ 7790.934535] Last Breaking-Event-Address:
[ 7790.934537]  [<000000000071c33c>] debug_print_object+0xac/0xd0
[ 7790.934539] ---[ end trace b583bfd967a78637 ]---
[ 7790.934543] ODEBUG: deactivate not available (active state 0) object type: rcu_head hint:           (null)
[ 7790.934551] ------------[ cut here ]------------
[ 7790.934553] WARNING: at lib/debugobjects.c:263
[ 7790.934555] Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack mlx4_ib ib_sa ipt_REJECT mlx4_en ib_mad nf_reject_ipv4 ib_core vxlan udp_tunnel ptp xt_tcpudp ib_addr pps_core iptable_filter ip_tables x_tables bridge stp llc ghash_s390 prng ecb aes_s390 des_s390 des_generic sha512_s390 sha256_s390 mlx4_core sha1_s390 sha_common eadm_sch vhost_net nfsd tun vhost macvtap macvlan auth_rpcgss kvm oid_registry nfs_acl lockd grace sunrpc dm_multipath dm_mod autofs4
[ 7790.934599] CPU: 8 PID: 40 Comm: ksoftirqd/8 Tainted: G        W       4.5.0-rc4-00014-g1926e54-dirty #149
[ 7790.934601] task: 00000000e2955490 ti: 00000000e2958000 task.ti: 00000000e2958000
[ 7790.934603] Krnl PSW : 0404c00180000000 000000000071c340 (debug_print_object+0xb0/0xd0)
[ 7790.934608]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
               Krnl GPRS: 0000000001e6e3c7 00000000e2955490 000000000000005e 00000000e2958000
[ 7790.934612]            000000000071c33c 0000000000000000 0000000000b975e8 000000000000000a
[ 7790.934614]            0000000004bcd020 0700000001f2b010 0000000001f2b010 0000000000ba5d0a
[ 7790.934617]            0000000000e241f8 00000000e295bc48 000000000071c33c 00000000e295bb48
[ 7790.934622] Krnl Code: 000000000071c330: c41f00bf6a14        strl    %%r1,1f09758
                          000000000071c336: c0e5ffdbd64d        brasl   %%r14,296fd0
                         #000000000071c33c: a7f40001            brc     15,71c33e
                         >000000000071c340: c41d0036e746        lrl     %%r1,df91cc
                          000000000071c346: e340f0e80004        lg      %%r4,232(%%r15)
                          000000000071c34c: a71a0001            ahi     %%r1,1
                          000000000071c350: eb6ff0a80004        lmg     %%r6,%%r15,168(%%r15)
                          000000000071c356: c41f0036e73b        strl    %%r1,df91cc
[ 7790.934639] Call Trace:
[ 7790.934641] ([<000000000071c33c>] debug_print_object+0xac/0xd0)
[ 7790.934644]  [<000000000071d0a8>] debug_object_deactivate+0x170/0x188
[ 7790.934646]  [<00000000001d7db6>] rcu_process_callbacks+0x58e/0xa00
[ 7790.934648]  [<00000000001487ec>] __do_softirq+0x26c/0x580
[ 7790.934651]  [<0000000000148b50>] run_ksoftirqd+0x50/0xb0
[ 7790.934653]  [<0000000000172b28>] smpboot_thread_fn+0x320/0x378
[ 7790.934655]  [<000000000016d21c>] kthread+0x124/0x138
[ 7790.934657]  [<00000000009a1d72>] kernel_thread_starter+0x6/0xc
[ 7790.934659]  [<00000000009a1d6c>] kernel_thread_starter+0x0/0xc
[ 7790.934661] 1 lock held by ksoftirqd/8/40:
[ 7790.934663]  #0:  (&obj_hash[i].lock){-.-.-.}, at: [<000000000071cfdc>] debug_object_deactivate+0xa4/0x188
[ 7790.934669] Last Breaking-Event-Address:
[ 7790.934671]  [<000000000071c33c>] debug_print_object+0xac/0xd0
[ 7790.934673] ---[ end trace b583bfd967a78638 ]---
[ 7790.934680] ------------[ cut here ]------------
[ 7790.934682] kernel BUG at mm/slub.c:3629!
[ 7790.934707] illegal operation: 0001 ilc:1 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 7790.934715] Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack mlx4_ib ib_sa ipt_REJECT mlx4_en ib_mad nf_reject_ipv4 ib_core vxlan udp_tunnel ptp xt_tcpudp ib_addr pps_core iptable_filter ip_tables x_tables bridge stp llc ghash_s390 prng ecb aes_s390 des_s390 des_generic sha512_s390 sha256_s390 mlx4_core sha1_s390 sha_common eadm_sch vhost_net nfsd tun vhost macvtap macvlan auth_rpcgss kvm oid_registry nfs_acl lockd grace sunrpc dm_multipath dm_mod autofs4
[ 7790.934789] CPU: 8 PID: 40 Comm: ksoftirqd/8 Tainted: G        W       4.5.0-rc4-00014-g1926e54-dirty #149
[ 7790.934791] task: 00000000e2955490 ti: 00000000e2958000 task.ti: 00000000e2958000
[ 7790.934794] Krnl PSW : 0704c00180000000 000000000032295a (kfree+0x3f2/0x428)
[ 7790.934801]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
               Krnl GPRS: 0000000000000000 0000000000000100 0000000000000100 0000000000e24260
[ 7790.934806]            00000000001d1c82 0000000000000000 0000000000000000 000000000000000a
[ 7790.934809]            0000000000000001 00000000001d7e0a 0000000000000006 000003d10012f340
[ 7790.934812]            0000000004bcd000 0000000000f0433c 000000000032267a 00000000e295bbb0
[ 7790.934818] Krnl Code: 000000000032294c: c0e50033bef6        brasl   %%r14,99a738
                          0000000000322952: a7f4feba            brc     15,3226c6
                         #0000000000322956: a7f40001            brc     15,322958
                         >000000000032295a: e310b0060090        llgc    %%r1,6(%%r11)
                          0000000000322960: a7110040            tmll    %%r1,64
                          0000000000322964: a774fee9            brc     7,322736
                          0000000000322968: a7f4feeb            brc     15,32273e
                          000000000032296c: c0e50033be32        brasl   %%r14,99a5d0
[ 7790.934838] Call Trace:
[ 7790.934841] ([<000000000032267a>] kfree+0x112/0x428)
[ 7790.934844]  [<00000000001d7e0a>] rcu_process_callbacks+0x5e2/0xa00
[ 7790.934847]  [<00000000001487ec>] __do_softirq+0x26c/0x580
[ 7790.934850]  [<0000000000148b50>] run_ksoftirqd+0x50/0xb0
[ 7790.934854]  [<0000000000172b28>] smpboot_thread_fn+0x320/0x378
[ 7790.934856]  [<000000000016d21c>] kthread+0x124/0x138
[ 7790.934859]  [<00000000009a1d72>] kernel_thread_starter+0x6/0xc
[ 7790.934862]  [<00000000009a1d6c>] kernel_thread_starter+0x0/0xc
[ 7790.934864] INFO: lockdep is turned off.
[ 7790.934866] Last Breaking-Event-Address:
[ 7790.934869]  [<0000000000322956>] kfree+0x3ee/0x428
[ 7790.934873]  
[ 7790.934876] Kernel panic - not syncing: Fatal exception in interrupt

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-15 21:35                     ` Kirill A. Shutemov
  (?)
@ 2016-02-16 18:46                       ` Christian Borntraeger
  -1 siblings, 0 replies; 153+ messages in thread
From: Christian Borntraeger @ 2016-02-16 18:46 UTC (permalink / raw)
  To: Kirill A. Shutemov, Gerald Schaefer
  Cc: Sebastian Ott, Andrea Arcangeli, Kirill A. Shutemov, linux-mm,
	linux-kernel, Aneesh Kumar K.V, Andrew Morton, Linus Torvalds,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	linuxppc-dev, Catalin Marinas, Will Deacon, linux-arm-kernel,
	Martin Schwidefsky, Heiko Carstens, linux-s390

On 02/15/2016 10:35 PM, Kirill A. Shutemov wrote:
> 
> Is there any chance that I'll be able to trigger the bug using QEMU?
> Does anybody have a QEMU image I can use?

QEMU/TCG on s390 provides neither SMP nor large pages (only QEMU/KVM does),
so this will probably not help you here.

Christian

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-16 16:24                       ` Gerald Schaefer
  (?)
@ 2016-02-17 15:04                         ` Kirill A. Shutemov
  -1 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2016-02-17 15:04 UTC (permalink / raw)
  To: Gerald Schaefer
  Cc: Sebastian Ott, Andrea Arcangeli, Christian Borntraeger,
	Kirill A. Shutemov, linux-mm, linux-kernel, Aneesh Kumar K.V,
	Andrew Morton, Linus Torvalds, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	Catalin Marinas, Will Deacon, linux-arm-kernel,
	Martin Schwidefsky, Heiko Carstens, linux-s390

On Tue, Feb 16, 2016 at 05:24:44PM +0100, Gerald Schaefer wrote:
> On Mon, 15 Feb 2016 23:35:26 +0200
> "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> 
> > Is there any chance that I'll be able to trigger the bug using QEMU?
> > Does anybody have a QEMU image I can use?
> > 
> 
> I have no image, but trying to reproduce this under virtualization may
> help to trigger this also on other architectures. After ruling out IPI
> vs. fast_gup I do not really see why this should be arch-specific, and
> it wouldn't be the first time that we hit subtle races first on s390, due
> to our virtualized environment (my test case is make -j20 with 10 CPUs and
> 4GB of memory, no swap).

Could you post your kernel config?

It would also be nice to check whether disabling split_huge_page() makes
any difference:

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a75081ca31cf..26d2b7b21021 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3364,6 +3364,8 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 	bool mlocked;
 	unsigned long flags;
 
+	return -EBUSY;
+
 	VM_BUG_ON_PAGE(is_huge_zero_page(page), page);
 	VM_BUG_ON_PAGE(!PageAnon(page), page);
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
-- 
 Kirill A. Shutemov

^ permalink raw reply related	[flat|nested] 153+ messages in thread

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-17 15:04                         ` Kirill A. Shutemov
  (?)
@ 2016-02-17 19:04                           ` Sebastian Ott
  -1 siblings, 0 replies; 153+ messages in thread
From: Sebastian Ott @ 2016-02-17 19:04 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Gerald Schaefer, Andrea Arcangeli, Christian Borntraeger,
	Kirill A. Shutemov, linux-mm, linux-kernel, Aneesh Kumar K.V,
	Andrew Morton, Linus Torvalds, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	Catalin Marinas, Will Deacon, linux-arm-kernel,
	Martin Schwidefsky, Heiko Carstens, linux-s390

[-- Attachment #1: Type: text/plain, Size: 10359 bytes --]

Hi,

On Wed, 17 Feb 2016, Kirill A. Shutemov wrote:
> On Tue, Feb 16, 2016 at 05:24:44PM +0100, Gerald Schaefer wrote:
> > On Mon, 15 Feb 2016 23:35:26 +0200
> > "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> > 
> > > Is there any chance that I'll be able to trigger the bug using QEMU?
> > > Does anybody have a QEMU image I can use?
> > > 
> > 
> > I have no image, but trying to reproduce this under virtualization may
> > help to trigger this also on other architectures. After ruling out IPI
> > vs. fast_gup I do not really see why this should be arch-specific, and
> > it wouldn't be the first time that we hit subtle races first on s390, due
> > to our virtualized environment (my test case is make -j20 with 10 CPUs and
> > 4GB of memory, no swap).
> 
> Could you post your kernel config?

Attached.

> It would also be nice to check whether disabling split_huge_page() makes
> any difference:
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index a75081ca31cf..26d2b7b21021 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3364,6 +3364,8 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
>  	bool mlocked;
>  	unsigned long flags;
> 
> +	return -EBUSY;
> +
>  	VM_BUG_ON_PAGE(is_huge_zero_page(page), page);
>  	VM_BUG_ON_PAGE(!PageAnon(page), page);
>  	VM_BUG_ON_PAGE(!PageLocked(page), page);
> -- 

65c23c6 + this patch also oopsed:

[ 1707.903808] ODEBUG: active_state not available (active state 0) object type: rcu_head hint:           (null)
[ 1707.903852] ------------[ cut here ]------------
[ 1707.903854] WARNING: at lib/debugobjects.c:263
[ 1707.903856] Modules linked in: bridge stp llc btrfs mlx4_ib mlx4_en ib_sa vxlan ib_mad ip6_udp_tunnel ib_core udp_tunnel ptp pps_core ib_addr xor raid6_pq ghash_s390 mlx4_core prng ecb aes_s390 des_s390 des_generic sha512_s390 dm_mod sha256_s390 genwqe_card sha1_s390 sha_common crc_itu_t scm_block eadm_sch vhost_net tun vhost macvtap macvlan kvm autofs4
[ 1707.903892] CPU: 4 PID: 25215 Comm: git Not tainted 4.5.0-rc4-00037-g65c23c6-dirty #273
[ 1707.903894] task: 0000000006a60000 ti: 0000000063b04000 task.ti: 0000000063b04000
[ 1707.903896] Krnl PSW : 0404c00180000000 0000000000486ce0 (debug_print_object+0xb0/0xd0)
[ 1707.903905]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
               Krnl GPRS: 0000000001a361c7 0000000006a60000 0000000000000060 0000000000000101
[ 1707.903908]            0000000000486cdc 0000000000000000 000000000088cbdc 0000000001b53848
[ 1707.903910]            0700000000000001 0000000000000000 0000000001b53850 00000000008bb820
[ 1707.903912]            0000000000a8d710 00000000dcdd3d38 0000000000486cdc 00000000dcdd3c38
[ 1707.903920] Krnl Code: 0000000000486cd0: c0200021a496        larl    %%r2,8bb5fc
                          0000000000486cd6: c0e5ffee03a1        brasl   %%r14,247418
                         #0000000000486cdc: a7f40001            brc     15,486cde
                         >0000000000486ce0: c41d002f488e        lrl     %%r1,a6fdfc
                          0000000000486ce6: e340f0e80004        lg      %%r4,232(%%r15)
                          0000000000486cec: a71a0001            ahi     %%r1,1
                          0000000000486cf0: eb6ff0a80004        lmg     %%r6,%%r15,168(%%r15)
                          0000000000486cf6: c41f002f4883        strl    %%r1,a6fdfc
[ 1707.903960] Call Trace:
[ 1707.903962] ([<0000000000486cdc>] debug_print_object+0xac/0xd0)
[ 1707.903964]  [<0000000000488094>] debug_object_active_state+0x164/0x178
[ 1707.903969]  [<00000000001b991c>] rcu_process_callbacks+0x564/0x9e8
[ 1707.903973]  [<000000000013d3ee>] __do_softirq+0x256/0x568
[ 1707.903975]  [<000000000013da3a>] irq_exit+0x7a/0xd8
[ 1707.903979]  [<000000000010c87e>] do_IRQ+0x86/0xc0
[ 1707.903984]  [<00000000006fa3f2>] ext_int_handler+0x11e/0x124
[ 1707.903987]  [<0000000000199bfe>] lock_release+0x5ce/0x670
[ 1707.903989] ([<0000000000199be0>] lock_release+0x5b0/0x670)
[ 1707.903993]  [<00000000002dffa2>] getname_flags+0x82/0x218
[ 1707.903994]  [<00000000002e04e8>] user_path_at_empty+0x40/0x68
[ 1707.903998]  [<00000000002d44a4>] vfs_fstatat+0x6c/0xc8
[ 1707.903999]  [<00000000002d4894>] SyS_newlstat+0x2c/0x48
[ 1707.904002]  [<00000000006f9cce>] system_call+0xd6/0x258
[ 1707.904003]  [<000003ffb45f1124>] 0x3ffb45f1124
[ 1707.904005] 1 lock held by git/25215:
[ 1707.904006]  #0:  (&obj_hash[i].lock){-.-.-.}, at: [<0000000000487fdc>] debug_object_active_state+0xac/0x178
[ 1707.904012] Last Breaking-Event-Address:
[ 1707.904014]  [<0000000000486cdc>] debug_print_object+0xac/0xd0
[ 1707.904016] ---[ end trace 8ce68dc422e8321c ]---
[ 1707.904018] ODEBUG: deactivate not available (active state 0) object type: rcu_head hint:           (null)
[ 1707.904026] ------------[ cut here ]------------
[ 1707.904027] WARNING: at lib/debugobjects.c:263
[ 1707.904028] Modules linked in: bridge stp llc btrfs mlx4_ib mlx4_en ib_sa vxlan ib_mad ip6_udp_tunnel ib_core udp_tunnel ptp pps_core ib_addr xor raid6_pq ghash_s390 mlx4_core prng ecb aes_s390 des_s390 des_generic sha512_s390 dm_mod sha256_s390 genwqe_card sha1_s390 sha_common crc_itu_t scm_block eadm_sch vhost_net tun vhost macvtap macvlan kvm autofs4
[ 1707.904055] CPU: 4 PID: 25215 Comm: git Tainted: G        W       4.5.0-rc4-00037-g65c23c6-dirty #273
[ 1707.904057] task: 0000000006a60000 ti: 0000000063b04000 task.ti: 0000000063b04000
[ 1707.904058] Krnl PSW : 0404c00180000000 0000000000486ce0 (debug_print_object+0xb0/0xd0)
[ 1707.904062]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
               Krnl GPRS: 0000000001a361c7 0000000006a60000 000000000000005e 0000000000000101
[ 1707.904066]            0000000000486cdc 0000000000000000 000000000088cbdc 000000000000000a
[ 1707.904068]            0000000091cdb020 07000000dcdd3c68 0000000001b53850 00000000008979ea
[ 1707.904069]            0000000000a8d710 00000000dcdd3d48 0000000000486cdc 00000000dcdd3c48
[ 1707.904074] Krnl Code: 0000000000486cd0: c0200021a496        larl    %%r2,8bb5fc
                          0000000000486cd6: c0e5ffee03a1        brasl   %%r14,247418
                         #0000000000486cdc: a7f40001            brc     15,486cde
                         >0000000000486ce0: c41d002f488e        lrl     %%r1,a6fdfc
                          0000000000486ce6: e340f0e80004        lg      %%r4,232(%%r15)
                          0000000000486cec: a71a0001            ahi     %%r1,1
                          0000000000486cf0: eb6ff0a80004        lmg     %%r6,%%r15,168(%%r15)
                          0000000000486cf6: c41f002f4883        strl    %%r1,a6fdfc
[ 1707.904088] Call Trace:
[ 1707.904090] ([<0000000000486cdc>] debug_print_object+0xac/0xd0)
[ 1707.904092]  [<0000000000487a38>] debug_object_deactivate+0x170/0x188
[ 1707.904094]  [<00000000001b992e>] rcu_process_callbacks+0x576/0x9e8
[ 1707.904096]  [<000000000013d3ee>] __do_softirq+0x256/0x568
[ 1707.904098]  [<000000000013da3a>] irq_exit+0x7a/0xd8
[ 1707.904100]  [<000000000010c87e>] do_IRQ+0x86/0xc0
[ 1707.904102]  [<00000000006fa3f2>] ext_int_handler+0x11e/0x124
[ 1707.904104]  [<0000000000199bfe>] lock_release+0x5ce/0x670
[ 1707.904106] ([<0000000000199be0>] lock_release+0x5b0/0x670)
[ 1707.904108]  [<00000000002dffa2>] getname_flags+0x82/0x218
[ 1707.904109]  [<00000000002e04e8>] user_path_at_empty+0x40/0x68
[ 1707.904111]  [<00000000002d44a4>] vfs_fstatat+0x6c/0xc8
[ 1707.904113]  [<00000000002d4894>] SyS_newlstat+0x2c/0x48
[ 1707.904115]  [<00000000006f9cce>] system_call+0xd6/0x258
[ 1707.904117]  [<000003ffb45f1124>] 0x3ffb45f1124
[ 1707.904118] 1 lock held by git/25215:
[ 1707.904119]  #0:  (&obj_hash[i].lock){-.-.-.}, at: [<000000000048796c>] debug_object_deactivate+0xa4/0x188
[ 1707.904124] Last Breaking-Event-Address:
[ 1707.904126]  [<0000000000486cdc>] debug_print_object+0xac/0xd0
[ 1707.904128] ---[ end trace 8ce68dc422e8321d ]---
[ 1707.904150] ------------[ cut here ]------------
[ 1707.904152] Kernel BUG at 0000000008cf8002 [verbose debug info unavailable]
[ 1707.904197] illegal operation: 0001 ilc:1 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 1707.904203] Modules linked in: bridge stp llc btrfs mlx4_ib mlx4_en ib_sa vxlan ib_mad ip6_udp_tunnel ib_core udp_tunnel ptp pps_core ib_addr xor raid6_pq ghash_s390 mlx4_core prng ecb aes_s390 des_s390 des_generic sha512_s390 dm_mod sha256_s390 genwqe_card sha1_s390 sha_common crc_itu_t scm_block eadm_sch vhost_net tun vhost macvtap macvlan kvm autofs4
[ 1707.904240] CPU: 4 PID: 25215 Comm: git Tainted: G        W       4.5.0-rc4-00037-g65c23c6-dirty #273
[ 1707.904242] task: 0000000006a60000 ti: 0000000063b04000 task.ti: 0000000063b04000
[ 1707.904244] Krnl PSW : 0704d00180000000 0000000008cf8002 (0x8cf8002)
[ 1707.904248]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3
               Krnl GPRS: 0000000000000000 0000000008cf8000 0000000091cdb020 0000000091cdb020
[ 1707.904252]            00000000001b9964 0000000000000000 0000000000000000 000000000000000a
[ 1707.904254]            0000000000000000 0000000008cf8000 0000000000000004 00000000034d6802
[ 1707.904256]            00000000dec0f600 00000000007063d8 00000000001b99ae 00000000dcdd3d18
[ 1707.904263] Krnl Code: 0000000008cf7ff6: 5a5a5a5a            a       %%r5,2650(%%r10,%%r5)
                          0000000008cf7ffa: 5a5a5a5a            a       %%r5,2650(%%r10,%%r5)
                         #0000000008cf7ffe: 5a5a0000            a       %%r5,0(%%r10,%%r0)
                         >0000000008cf8002: 0000                unknown
                          0000000008cf8004: 0000                unknown
                          0000000008cf8006: 0020                unknown
                          0000000008cf8008: 0000                unknown
                          0000000008cf800a: 0000                unknown
[ 1707.904277] Call Trace:
[ 1707.904279] ([<00000000001b9964>] rcu_process_callbacks+0x5ac/0x9e8)
[ 1707.904282]  [<000000000013d3ee>] __do_softirq+0x256/0x568
[ 1707.904284]  [<000000000013da3a>] irq_exit+0x7a/0xd8
[ 1707.904286]  [<000000000010c87e>] do_IRQ+0x86/0xc0
[ 1707.904289]  [<00000000006fa3f2>] ext_int_handler+0x11e/0x124
[ 1707.904291]  [<0000000000199bfe>] lock_release+0x5ce/0x670
[ 1707.904293] ([<0000000000199be0>] lock_release+0x5b0/0x670)
[ 1707.904295]  [<00000000002dffa2>] getname_flags+0x82/0x218
[ 1707.904297]  [<00000000002e04e8>] user_path_at_empty+0x40/0x68
[ 1707.904299]  [<00000000002d44a4>] vfs_fstatat+0x6c/0xc8
[ 1707.904301]  [<00000000002d4894>] SyS_newlstat+0x2c/0x48
[ 1707.904303]  [<00000000006f9cce>] system_call+0xd6/0x258
[ 1707.904305]  [<000003ffb45f1124>] 0x3ffb45f1124
[ 1707.904307] INFO: lockdep is turned off.
[ 1707.904308] Last Breaking-Event-Address:
[ 1707.904310]  [<00000000001b99ac>] rcu_process_callbacks+0x5f4/0x9e8
[ 1707.904314]
[ 1707.904315] Kernel panic - not syncing: Fatal exception in interrupt

[-- Attachment #2: Type: text/plain, Size: 51707 bytes --]

#
# Automatically generated file; DO NOT EDIT.
# Linux/s390 4.5.0-rc3 Kernel Configuration
#
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_ARCH_DMA_ADDR_T_64BIT=y
CONFIG_GENERIC_LOCKBREAK=y
CONFIG_PGSTE=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_KEXEC=y
CONFIG_AUDIT_ARCH=y
CONFIG_NO_IOPORT_MAP=y
# CONFIG_PCI_QUIRKS is not set
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_S390=y
CONFIG_SCHED_OMIT_FRAME_POINTER=y
CONFIG_PGTABLE_LEVELS=4
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_CROSS_MEMORY_ATTACH=y
CONFIG_FHANDLE=y
CONFIG_USELIB=y
CONFIG_AUDIT=y
CONFIG_HAVE_ARCH_AUDITSYSCALL=y
CONFIG_AUDITSYSCALL=y
CONFIG_AUDIT_WATCH=y
CONFIG_AUDIT_TREE=y

#
# IRQ subsystem
#
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y

#
# CPU/Task time and stats accounting
#
CONFIG_VIRT_CPU_ACCOUNTING=y
CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y

#
# RCU Subsystem
#
CONFIG_PREEMPT_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TASKS_RCU=y
CONFIG_RCU_STALL_COMMON=y
# CONFIG_TREE_RCU_TRACE is not set
# CONFIG_RCU_EXPEDITE_BOOT is not set
CONFIG_BUILD_BIN2C=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=17
CONFIG_LOG_CPU_MAX_BUF_SHIFT=12
CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
CONFIG_CGROUPS=y
# CONFIG_MEMCG is not set
CONFIG_BLK_CGROUP=y
# CONFIG_DEBUG_BLK_CGROUP is not set
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_CFS_BANDWIDTH=y
CONFIG_RT_GROUP_SCHED=y
# CONFIG_CGROUP_PIDS is not set
CONFIG_CGROUP_FREEZER=y
# CONFIG_CGROUP_HUGETLB is not set
CONFIG_CPUSETS=y
CONFIG_PROC_PID_CPUSET=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_CGROUP_PERF=y
# CONFIG_CGROUP_DEBUG is not set
# CONFIG_CHECKPOINT_RESTORE is not set
CONFIG_NAMESPACES=y
CONFIG_UTS_NS=y
CONFIG_IPC_NS=y
# CONFIG_USER_NS is not set
CONFIG_PID_NS=y
CONFIG_NET_NS=y
CONFIG_SCHED_AUTOGROUP=y
# CONFIG_SYSFS_DEPRECATED is not set
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_RD_GZIP=y
CONFIG_RD_BZIP2=y
CONFIG_RD_LZMA=y
CONFIG_RD_XZ=y
CONFIG_RD_LZO=y
CONFIG_RD_LZ4=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
CONFIG_ANON_INODES=y
CONFIG_SYSCTL_EXCEPTION_TRACE=y
CONFIG_BPF=y
# CONFIG_EXPERT is not set
CONFIG_MULTIUSER=y
# CONFIG_SGETMASK_SYSCALL is not set
CONFIG_SYSFS_SYSCALL=y
# CONFIG_SYSCTL_SYSCALL is not set
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_HAVE_FUTEX_CMPXCHG=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
# CONFIG_BPF_SYSCALL is not set
CONFIG_SHMEM=y
CONFIG_AIO=y
CONFIG_ADVISE_SYSCALLS=y
# CONFIG_USERFAULTFD is not set
CONFIG_MEMBARRIER=y
# CONFIG_EMBEDDED is not set
CONFIG_HAVE_PERF_EVENTS=y

#
# Kernel Performance Events And Counters
#
CONFIG_PERF_EVENTS=y
# CONFIG_DEBUG_PERF_USE_VMALLOC is not set
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLUB_DEBUG=y
# CONFIG_COMPAT_BRK is not set
# CONFIG_SLAB is not set
CONFIG_SLUB=y
CONFIG_SLUB_CPU_PARTIAL=y
# CONFIG_SYSTEM_DATA_VERIFICATION is not set
CONFIG_PROFILING=y
CONFIG_TRACEPOINTS=y
CONFIG_KEXEC_CORE=y
CONFIG_OPROFILE=m
CONFIG_HAVE_OPROFILE=y
CONFIG_KPROBES=y
CONFIG_JUMP_LABEL=y
# CONFIG_STATIC_KEYS_SELFTEST is not set
# CONFIG_UPROBES is not set
CONFIG_HAVE_64BIT_ALIGNED_ACCESS=y
CONFIG_ARCH_USE_BUILTIN_BSWAP=y
CONFIG_KRETPROBES=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_GENERIC_SMP_IDLE_THREAD=y
CONFIG_HAVE_REGS_AND_STACK_ACCESS_API=y
CONFIG_HAVE_DMA_API_DEBUG=y
CONFIG_HAVE_ARCH_JUMP_LABEL=y
CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG=y
CONFIG_HAVE_ALIGNED_STRUCT_PAGE=y
CONFIG_HAVE_CMPXCHG_LOCAL=y
CONFIG_HAVE_CMPXCHG_DOUBLE=y
CONFIG_ARCH_WANT_IPC_PARSE_VERSION=y
CONFIG_ARCH_WANT_COMPAT_IPC_PARSE_VERSION=y
CONFIG_ARCH_WANT_OLD_COMPAT_IPC=y
CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
CONFIG_SECCOMP_FILTER=y
# CONFIG_CC_STACKPROTECTOR is not set
CONFIG_HAVE_VIRT_CPU_ACCOUNTING=y
CONFIG_HAVE_VIRT_CPU_ACCOUNTING_GEN=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_HAVE_ARCH_SOFT_DIRTY=y
CONFIG_HAVE_MOD_ARCH_SPECIFIC=y
CONFIG_MODULES_USE_ELF_RELA=y
CONFIG_ARCH_HAS_ELF_RANDOMIZE=y
CONFIG_CLONE_BACKWARDS2=y
CONFIG_OLD_SIGSUSPEND3=y
CONFIG_OLD_SIGACTION=y
CONFIG_COMPAT_OLD_SIGACTION=y

#
# GCOV-based kernel profiling
#
# CONFIG_GCOV_KERNEL is not set
CONFIG_ARCH_HAS_GCOV_PROFILE_ALL=y
# CONFIG_HAVE_GENERIC_DMA_COHERENT is not set
CONFIG_SLABINFO=y
CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
CONFIG_MODULE_FORCE_LOAD=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
CONFIG_MODVERSIONS=y
CONFIG_MODULE_SRCVERSION_ALL=y
# CONFIG_MODULE_SIG is not set
CONFIG_MODULE_COMPRESS=y
# CONFIG_MODULE_COMPRESS_GZIP is not set
CONFIG_MODULE_COMPRESS_XZ=y
CONFIG_MODULES_TREE_LOOKUP=y
CONFIG_BLOCK=y
CONFIG_BLK_DEV_BSG=y
CONFIG_BLK_DEV_BSGLIB=y
CONFIG_BLK_DEV_INTEGRITY=y
CONFIG_BLK_DEV_THROTTLING=y
# CONFIG_BLK_CMDLINE_PARSER is not set

#
# Partition Types
#
CONFIG_PARTITION_ADVANCED=y
# CONFIG_ACORN_PARTITION is not set
# CONFIG_AIX_PARTITION is not set
# CONFIG_OSF_PARTITION is not set
# CONFIG_AMIGA_PARTITION is not set
# CONFIG_ATARI_PARTITION is not set
CONFIG_IBM_PARTITION=y
# CONFIG_MAC_PARTITION is not set
CONFIG_MSDOS_PARTITION=y
CONFIG_BSD_DISKLABEL=y
CONFIG_MINIX_SUBPARTITION=y
CONFIG_SOLARIS_X86_PARTITION=y
CONFIG_UNIXWARE_DISKLABEL=y
# CONFIG_LDM_PARTITION is not set
# CONFIG_SGI_PARTITION is not set
# CONFIG_ULTRIX_PARTITION is not set
# CONFIG_SUN_PARTITION is not set
# CONFIG_KARMA_PARTITION is not set
CONFIG_EFI_PARTITION=y
# CONFIG_SYSV68_PARTITION is not set
# CONFIG_CMDLINE_PARTITION is not set
CONFIG_BLOCK_COMPAT=y

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
CONFIG_CFQ_GROUP_IOSCHED=y
CONFIG_DEFAULT_DEADLINE=y
# CONFIG_DEFAULT_CFQ is not set
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="deadline"
CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_ASN1=m
CONFIG_ARCH_INLINE_SPIN_TRYLOCK=y
CONFIG_ARCH_INLINE_SPIN_TRYLOCK_BH=y
CONFIG_ARCH_INLINE_SPIN_LOCK=y
CONFIG_ARCH_INLINE_SPIN_LOCK_BH=y
CONFIG_ARCH_INLINE_SPIN_LOCK_IRQ=y
CONFIG_ARCH_INLINE_SPIN_LOCK_IRQSAVE=y
CONFIG_ARCH_INLINE_SPIN_UNLOCK=y
CONFIG_ARCH_INLINE_SPIN_UNLOCK_BH=y
CONFIG_ARCH_INLINE_SPIN_UNLOCK_IRQ=y
CONFIG_ARCH_INLINE_SPIN_UNLOCK_IRQRESTORE=y
CONFIG_ARCH_INLINE_READ_TRYLOCK=y
CONFIG_ARCH_INLINE_READ_LOCK=y
CONFIG_ARCH_INLINE_READ_LOCK_BH=y
CONFIG_ARCH_INLINE_READ_LOCK_IRQ=y
CONFIG_ARCH_INLINE_READ_LOCK_IRQSAVE=y
CONFIG_ARCH_INLINE_READ_UNLOCK=y
CONFIG_ARCH_INLINE_READ_UNLOCK_BH=y
CONFIG_ARCH_INLINE_READ_UNLOCK_IRQ=y
CONFIG_ARCH_INLINE_READ_UNLOCK_IRQRESTORE=y
CONFIG_ARCH_INLINE_WRITE_TRYLOCK=y
CONFIG_ARCH_INLINE_WRITE_LOCK=y
CONFIG_ARCH_INLINE_WRITE_LOCK_BH=y
CONFIG_ARCH_INLINE_WRITE_LOCK_IRQ=y
CONFIG_ARCH_INLINE_WRITE_LOCK_IRQSAVE=y
CONFIG_ARCH_INLINE_WRITE_UNLOCK=y
CONFIG_ARCH_INLINE_WRITE_UNLOCK_BH=y
CONFIG_ARCH_INLINE_WRITE_UNLOCK_IRQ=y
CONFIG_ARCH_INLINE_WRITE_UNLOCK_IRQRESTORE=y
CONFIG_UNINLINE_SPIN_UNLOCK=y
CONFIG_ARCH_SUPPORTS_ATOMIC_RMW=y
CONFIG_RWSEM_SPIN_ON_OWNER=y
CONFIG_LOCK_SPIN_ON_OWNER=y
CONFIG_FREEZER=y
CONFIG_HAVE_LIVEPATCH=y

#
# Processor type and features
#
CONFIG_HAVE_MARCH_Z900_FEATURES=y
CONFIG_HAVE_MARCH_Z990_FEATURES=y
CONFIG_HAVE_MARCH_Z9_109_FEATURES=y
CONFIG_HAVE_MARCH_Z10_FEATURES=y
CONFIG_HAVE_MARCH_Z196_FEATURES=y
# CONFIG_HAVE_MARCH_ZEC12_FEATURES is not set
# CONFIG_HAVE_MARCH_Z13_FEATURES is not set
# CONFIG_MARCH_Z900 is not set
# CONFIG_MARCH_Z990 is not set
# CONFIG_MARCH_Z9_109 is not set
# CONFIG_MARCH_Z10 is not set
CONFIG_MARCH_Z196=y
# CONFIG_MARCH_ZEC12 is not set
# CONFIG_MARCH_Z13 is not set
# CONFIG_MARCH_Z900_TUNE is not set
# CONFIG_MARCH_Z990_TUNE is not set
# CONFIG_MARCH_Z9_109_TUNE is not set
# CONFIG_MARCH_Z10_TUNE is not set
# CONFIG_MARCH_Z196_TUNE is not set
CONFIG_MARCH_ZEC12_TUNE=y
# CONFIG_MARCH_Z13_TUNE is not set
# CONFIG_TUNE_DEFAULT is not set
# CONFIG_TUNE_Z900 is not set
# CONFIG_TUNE_Z990 is not set
# CONFIG_TUNE_Z9_109 is not set
# CONFIG_TUNE_Z10 is not set
# CONFIG_TUNE_Z196 is not set
CONFIG_TUNE_ZEC12=y
# CONFIG_TUNE_Z13 is not set
CONFIG_64BIT=y
CONFIG_COMPAT=y
CONFIG_SYSVIPC_COMPAT=y
CONFIG_KEYS_COMPAT=y
CONFIG_SMP=y
CONFIG_NR_CPUS=256
CONFIG_HOTPLUG_CPU=y
# CONFIG_NODES_SPAN_OTHER_NODES is not set
# CONFIG_NUMA is not set
CONFIG_SCHED_SMT=y
CONFIG_SCHED_MC=y
CONFIG_SCHED_BOOK=y
CONFIG_SCHED_TOPOLOGY=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y
CONFIG_HZ_100=y
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=100
CONFIG_SCHED_HRTICK=y

#
# Memory setup
#
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SPARSEMEM_DEFAULT=y
CONFIG_ARCH_SELECT_MEMORY_MODEL=y
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE=y
CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK=y
CONFIG_FORCE_MAX_ZONEORDER=9
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_SPARSEMEM_MANUAL=y
CONFIG_SPARSEMEM=y
CONFIG_HAVE_MEMORY_PRESENT=y
CONFIG_SPARSEMEM_EXTREME=y
CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
CONFIG_SPARSEMEM_VMEMMAP=y
CONFIG_HAVE_MEMBLOCK=y
CONFIG_HAVE_MEMBLOCK_NODE_MAP=y
CONFIG_HAVE_MEMBLOCK_PHYS_MAP=y
CONFIG_NO_BOOTMEM=y
CONFIG_MEMORY_ISOLATION=y
# CONFIG_HAVE_BOOTMEM_INFO_NODE is not set
CONFIG_MEMORY_HOTPLUG=y
CONFIG_MEMORY_HOTPLUG_SPARSE=y
CONFIG_MEMORY_HOTREMOVE=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_MEMORY_BALLOON=y
CONFIG_BALLOON_COMPACTION=y
CONFIG_COMPACTION=y
CONFIG_MIGRATION=y
CONFIG_PHYS_ADDR_T_64BIT=y
CONFIG_ZONE_DMA_FLAG=1
CONFIG_BOUNCE=y
CONFIG_VIRT_TO_BUS=y
CONFIG_MMU_NOTIFIER=y
CONFIG_KSM=y
CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
# CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set
# CONFIG_CLEANCACHE is not set
# CONFIG_FRONTSWAP is not set
# CONFIG_CMA is not set
# CONFIG_ZPOOL is not set
# CONFIG_ZBUD is not set
# CONFIG_ZSMALLOC is not set
# CONFIG_IDLE_PAGE_TRACKING is not set
CONFIG_PACK_STACK=y
CONFIG_CHECK_STACK=y
CONFIG_STACK_GUARD=256
# CONFIG_WARN_DYNAMIC_STACK is not set

#
# I/O subsystem
#
CONFIG_QDIO=y
CONFIG_PCI=y
CONFIG_PCI_NR_FUNCTIONS=64
CONFIG_PCI_NR_MSI=256
CONFIG_PCI_BUS_ADDR_T_64BIT=y
CONFIG_PCI_MSI=y
CONFIG_PCI_DEBUG=y
# CONFIG_PCI_REALLOC_ENABLE_AUTO is not set
# CONFIG_PCI_STUB is not set
CONFIG_PCI_ATS=y
CONFIG_PCI_IOV=y
# CONFIG_PCI_PRI is not set
# CONFIG_PCI_PASID is not set

#
# PCI host controller drivers
#
# CONFIG_PCIEPORTBUS is not set
CONFIG_HOTPLUG_PCI=y
# CONFIG_HOTPLUG_PCI_CPCI is not set
# CONFIG_HOTPLUG_PCI_SHPC is not set
CONFIG_HOTPLUG_PCI_S390=y
CONFIG_PCI_DOMAINS=y
CONFIG_HAS_IOMEM=y
CONFIG_IOMMU_HELPER=y
CONFIG_HAS_DMA=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_CHSC_SCH=y
CONFIG_SCM_BUS=y
CONFIG_EADM_SCH=m

#
# Dump support
#
CONFIG_CRASH_DUMP=y

#
# Executable file formats / Emulations
#
CONFIG_BINFMT_ELF=y
CONFIG_COMPAT_BINFMT_ELF=y
# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
CONFIG_BINFMT_SCRIPT=y
# CONFIG_HAVE_AOUT is not set
CONFIG_BINFMT_MISC=m
CONFIG_COREDUMP=y
CONFIG_SECCOMP=y

#
# Power Management
#
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_HIBERNATE_CALLBACKS=y
CONFIG_HIBERNATION=y
CONFIG_ARCH_SAVE_PAGE_KEYS=y
CONFIG_PM_STD_PARTITION=""
CONFIG_PM_SLEEP=y
CONFIG_PM_SLEEP_SMP=y
# CONFIG_PM_AUTOSLEEP is not set
# CONFIG_PM_WAKELOCKS is not set
CONFIG_PM=y
# CONFIG_PM_DEBUG is not set
# CONFIG_WQ_POWER_EFFICIENT_DEFAULT is not set
CONFIG_NET=y
CONFIG_NET_INGRESS=y
CONFIG_NET_EGRESS=y

#
# Networking options
#
CONFIG_PACKET=y
CONFIG_PACKET_DIAG=m
CONFIG_UNIX=y
CONFIG_UNIX_DIAG=m
CONFIG_XFRM=y
CONFIG_XFRM_ALGO=m
CONFIG_XFRM_USER=m
# CONFIG_XFRM_SUB_POLICY is not set
# CONFIG_XFRM_MIGRATE is not set
# CONFIG_XFRM_STATISTICS is not set
CONFIG_XFRM_IPCOMP=m
CONFIG_NET_KEY=m
# CONFIG_NET_KEY_MIGRATE is not set
CONFIG_IUCV=y
CONFIG_AFIUCV=m
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_ADVANCED_ROUTER=y
# CONFIG_IP_FIB_TRIE_STATS is not set
CONFIG_IP_MULTIPLE_TABLES=y
CONFIG_IP_ROUTE_MULTIPATH=y
CONFIG_IP_ROUTE_VERBOSE=y
CONFIG_IP_ROUTE_CLASSID=y
# CONFIG_IP_PNP is not set
CONFIG_NET_IPIP=m
CONFIG_NET_IPGRE_DEMUX=m
CONFIG_NET_IP_TUNNEL=m
CONFIG_NET_IPGRE=m
CONFIG_NET_IPGRE_BROADCAST=y
CONFIG_IP_MROUTE=y
CONFIG_IP_MROUTE_MULTIPLE_TABLES=y
CONFIG_IP_PIMSM_V1=y
CONFIG_IP_PIMSM_V2=y
CONFIG_SYN_COOKIES=y
CONFIG_NET_IPVTI=m
CONFIG_NET_UDP_TUNNEL=m
# CONFIG_NET_FOU is not set
# CONFIG_NET_FOU_IP_TUNNELS is not set
CONFIG_INET_AH=m
CONFIG_INET_ESP=m
CONFIG_INET_IPCOMP=m
CONFIG_INET_XFRM_TUNNEL=m
CONFIG_INET_TUNNEL=m
CONFIG_INET_XFRM_MODE_TRANSPORT=m
CONFIG_INET_XFRM_MODE_TUNNEL=m
CONFIG_INET_XFRM_MODE_BEET=m
CONFIG_INET_LRO=y
CONFIG_INET_DIAG=m
CONFIG_INET_TCP_DIAG=m
CONFIG_INET_UDP_DIAG=m
# CONFIG_INET_DIAG_DESTROY is not set
CONFIG_TCP_CONG_ADVANCED=y
CONFIG_TCP_CONG_BIC=m
CONFIG_TCP_CONG_CUBIC=y
CONFIG_TCP_CONG_WESTWOOD=m
CONFIG_TCP_CONG_HTCP=m
CONFIG_TCP_CONG_HSTCP=m
CONFIG_TCP_CONG_HYBLA=m
CONFIG_TCP_CONG_VEGAS=m
CONFIG_TCP_CONG_SCALABLE=m
CONFIG_TCP_CONG_LP=m
CONFIG_TCP_CONG_VENO=m
CONFIG_TCP_CONG_YEAH=m
CONFIG_TCP_CONG_ILLINOIS=m
# CONFIG_TCP_CONG_DCTCP is not set
# CONFIG_TCP_CONG_CDG is not set
CONFIG_DEFAULT_CUBIC=y
# CONFIG_DEFAULT_RENO is not set
CONFIG_DEFAULT_TCP_CONG="cubic"
# CONFIG_TCP_MD5SIG is not set
CONFIG_IPV6=y
CONFIG_IPV6_ROUTER_PREF=y
# CONFIG_IPV6_ROUTE_INFO is not set
# CONFIG_IPV6_OPTIMISTIC_DAD is not set
CONFIG_INET6_AH=m
CONFIG_INET6_ESP=m
CONFIG_INET6_IPCOMP=m
CONFIG_IPV6_MIP6=m
CONFIG_INET6_XFRM_TUNNEL=m
CONFIG_INET6_TUNNEL=m
CONFIG_INET6_XFRM_MODE_TRANSPORT=m
CONFIG_INET6_XFRM_MODE_TUNNEL=m
CONFIG_INET6_XFRM_MODE_BEET=m
CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION=m
CONFIG_IPV6_VTI=m
CONFIG_IPV6_SIT=m
# CONFIG_IPV6_SIT_6RD is not set
CONFIG_IPV6_NDISC_NODETYPE=y
CONFIG_IPV6_TUNNEL=m
CONFIG_IPV6_GRE=m
CONFIG_IPV6_MULTIPLE_TABLES=y
CONFIG_IPV6_SUBTREES=y
# CONFIG_IPV6_MROUTE is not set
# CONFIG_NETLABEL is not set
CONFIG_NETWORK_SECMARK=y
CONFIG_NET_PTP_CLASSIFY=y
# CONFIG_NETWORK_PHY_TIMESTAMPING is not set
# CONFIG_NETFILTER is not set
# CONFIG_IP_DCCP is not set
CONFIG_IP_SCTP=m
CONFIG_NET_SCTPPROBE=m
# CONFIG_SCTP_DBG_OBJCNT is not set
CONFIG_SCTP_DEFAULT_COOKIE_HMAC_MD5=y
# CONFIG_SCTP_DEFAULT_COOKIE_HMAC_SHA1 is not set
# CONFIG_SCTP_DEFAULT_COOKIE_HMAC_NONE is not set
CONFIG_SCTP_COOKIE_HMAC_MD5=y
# CONFIG_SCTP_COOKIE_HMAC_SHA1 is not set
CONFIG_RDS=m
CONFIG_RDS_RDMA=m
CONFIG_RDS_TCP=m
CONFIG_RDS_DEBUG=y
# CONFIG_TIPC is not set
# CONFIG_ATM is not set
CONFIG_L2TP=m
CONFIG_L2TP_DEBUGFS=m
CONFIG_L2TP_V3=y
CONFIG_L2TP_IP=m
CONFIG_L2TP_ETH=m
CONFIG_STP=m
CONFIG_GARP=m
CONFIG_BRIDGE=m
CONFIG_BRIDGE_IGMP_SNOOPING=y
# CONFIG_BRIDGE_VLAN_FILTERING is not set
CONFIG_VLAN_8021Q=m
CONFIG_VLAN_8021Q_GVRP=y
# CONFIG_VLAN_8021Q_MVRP is not set
# CONFIG_DECNET is not set
CONFIG_LLC=m
# CONFIG_LLC2 is not set
# CONFIG_IPX is not set
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_PHONET is not set
# CONFIG_6LOWPAN is not set
# CONFIG_IEEE802154 is not set
CONFIG_NET_SCHED=y

#
# Queueing/Scheduling
#
CONFIG_NET_SCH_CBQ=m
CONFIG_NET_SCH_HTB=m
CONFIG_NET_SCH_HFSC=m
CONFIG_NET_SCH_PRIO=m
CONFIG_NET_SCH_MULTIQ=m
CONFIG_NET_SCH_RED=m
CONFIG_NET_SCH_SFB=m
CONFIG_NET_SCH_SFQ=m
CONFIG_NET_SCH_TEQL=m
CONFIG_NET_SCH_TBF=m
CONFIG_NET_SCH_GRED=m
CONFIG_NET_SCH_DSMARK=m
CONFIG_NET_SCH_NETEM=m
CONFIG_NET_SCH_DRR=m
CONFIG_NET_SCH_MQPRIO=m
CONFIG_NET_SCH_CHOKE=m
CONFIG_NET_SCH_QFQ=m
CONFIG_NET_SCH_CODEL=m
CONFIG_NET_SCH_FQ_CODEL=m
# CONFIG_NET_SCH_FQ is not set
# CONFIG_NET_SCH_HHF is not set
# CONFIG_NET_SCH_PIE is not set
CONFIG_NET_SCH_INGRESS=m
CONFIG_NET_SCH_PLUG=m

#
# Classification
#
CONFIG_NET_CLS=y
CONFIG_NET_CLS_BASIC=m
CONFIG_NET_CLS_TCINDEX=m
CONFIG_NET_CLS_ROUTE4=m
CONFIG_NET_CLS_FW=m
CONFIG_NET_CLS_U32=m
CONFIG_CLS_U32_PERF=y
CONFIG_CLS_U32_MARK=y
CONFIG_NET_CLS_RSVP=m
CONFIG_NET_CLS_RSVP6=m
CONFIG_NET_CLS_FLOW=m
CONFIG_NET_CLS_CGROUP=y
CONFIG_NET_CLS_BPF=m
# CONFIG_NET_CLS_FLOWER is not set
# CONFIG_NET_EMATCH is not set
CONFIG_NET_CLS_ACT=y
CONFIG_NET_ACT_POLICE=m
CONFIG_NET_ACT_GACT=m
CONFIG_GACT_PROB=y
CONFIG_NET_ACT_MIRRED=m
CONFIG_NET_ACT_NAT=m
CONFIG_NET_ACT_PEDIT=m
CONFIG_NET_ACT_SIMP=m
CONFIG_NET_ACT_SKBEDIT=m
CONFIG_NET_ACT_CSUM=m
# CONFIG_NET_ACT_VLAN is not set
# CONFIG_NET_ACT_BPF is not set
# CONFIG_NET_CLS_IND is not set
CONFIG_NET_SCH_FIFO=y
# CONFIG_DCB is not set
CONFIG_DNS_RESOLVER=y
# CONFIG_BATMAN_ADV is not set
# CONFIG_OPENVSWITCH is not set
# CONFIG_VSOCKETS is not set
# CONFIG_NETLINK_MMAP is not set
# CONFIG_NETLINK_DIAG is not set
# CONFIG_MPLS is not set
# CONFIG_HSR is not set
# CONFIG_NET_SWITCHDEV is not set
# CONFIG_NET_L3_MASTER_DEV is not set
CONFIG_RPS=y
CONFIG_RFS_ACCEL=y
CONFIG_XPS=y
CONFIG_SOCK_CGROUP_DATA=y
# CONFIG_CGROUP_NET_PRIO is not set
CONFIG_CGROUP_NET_CLASSID=y
CONFIG_NET_RX_BUSY_POLL=y
CONFIG_BQL=y
CONFIG_BPF_JIT=y
CONFIG_NET_FLOW_LIMIT=y

#
# Network testing
#
CONFIG_NET_PKTGEN=m
CONFIG_NET_TCPPROBE=m
# CONFIG_NET_DROP_MONITOR is not set
# CONFIG_CAN is not set
# CONFIG_AF_RXRPC is not set
CONFIG_FIB_RULES=y
# CONFIG_WIMAX is not set
# CONFIG_RFKILL is not set
# CONFIG_NET_9P is not set
# CONFIG_CAIF is not set
# CONFIG_CEPH_LIB is not set
# CONFIG_NFC is not set
# CONFIG_LWTUNNEL is not set
CONFIG_HAVE_BPF_JIT=y
# CONFIG_PCMCIA is not set
CONFIG_CCW=y

#
# Device Drivers
#

#
# Generic Driver Options
#
CONFIG_UEVENT_HELPER=y
CONFIG_UEVENT_HELPER_PATH=""
CONFIG_DEVTMPFS=y
# CONFIG_DEVTMPFS_MOUNT is not set
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_FW_LOADER=y
CONFIG_FIRMWARE_IN_KERNEL=y
CONFIG_EXTRA_FIRMWARE=""
# CONFIG_FW_LOADER_USER_HELPER_FALLBACK is not set
CONFIG_ALLOW_DEV_COREDUMP=y
# CONFIG_DEBUG_DRIVER is not set
# CONFIG_DEBUG_DEVRES is not set
CONFIG_SYS_HYPERVISOR=y
# CONFIG_GENERIC_CPU_DEVICES is not set
CONFIG_GENERIC_CPU_AUTOPROBE=y
# CONFIG_DMA_SHARED_BUFFER is not set

#
# Bus devices
#
CONFIG_CONNECTOR=y
CONFIG_PROC_EVENTS=y
# CONFIG_MTD is not set
# CONFIG_OF is not set
# CONFIG_PARPORT is not set
CONFIG_BLK_DEV=y
# CONFIG_BLK_DEV_NULL_BLK is not set
# CONFIG_BLK_DEV_PCIESSD_MTIP32XX is not set
# CONFIG_BLK_CPQ_CISS_DA is not set
# CONFIG_BLK_DEV_DAC960 is not set
# CONFIG_BLK_DEV_UMEM is not set
# CONFIG_BLK_DEV_COW_COMMON is not set
CONFIG_BLK_DEV_LOOP=m
CONFIG_BLK_DEV_LOOP_MIN_COUNT=8
CONFIG_BLK_DEV_CRYPTOLOOP=m
# CONFIG_BLK_DEV_DRBD is not set
# CONFIG_BLK_DEV_NBD is not set
# CONFIG_BLK_DEV_SKD is not set
CONFIG_BLK_DEV_OSD=m
# CONFIG_BLK_DEV_SX8 is not set
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_BLK_DEV_RAM_SIZE=32768
# CONFIG_CDROM_PKTCDVD is not set
CONFIG_ATA_OVER_ETH=m

#
# S/390 block device drivers
#
CONFIG_BLK_DEV_XPRAM=m
CONFIG_DCSSBLK=m
CONFIG_DASD=y
CONFIG_DASD_PROFILE=y
CONFIG_DASD_ECKD=y
CONFIG_DASD_FBA=y
CONFIG_DASD_DIAG=y
CONFIG_DASD_EER=y
CONFIG_SCM_BLOCK=m
CONFIG_SCM_BLOCK_CLUSTER_WRITE=y
CONFIG_VIRTIO_BLK=y
# CONFIG_BLK_DEV_RBD is not set
# CONFIG_BLK_DEV_RSXX is not set
# CONFIG_BLK_DEV_NVME is not set

#
# Misc devices
#
# CONFIG_SENSORS_LIS3LV02D is not set
# CONFIG_DUMMY_IRQ is not set
# CONFIG_PHANTOM is not set
# CONFIG_SGI_IOC4 is not set
# CONFIG_TIFM_CORE is not set
# CONFIG_ENCLOSURE_SERVICES is not set
# CONFIG_HP_ILO is not set
# CONFIG_SRAM is not set
# CONFIG_C2PORT is not set

#
# EEPROM support
#
# CONFIG_EEPROM_93CX6 is not set
# CONFIG_CB710_CORE is not set

#
# Texas Instruments shared transport line discipline
#

#
# Altera FPGA firmware download module
#

#
# Intel MIC Bus Driver
#

#
# SCIF Bus Driver
#

#
# Intel MIC Host Driver
#

#
# Intel MIC Card Driver
#

#
# SCIF Driver
#

#
# Intel MIC Coprocessor State Management (COSM) Drivers
#
CONFIG_GENWQE=m
CONFIG_GENWQE_PLATFORM_ERROR_RECOVERY=0
# CONFIG_ECHO is not set
# CONFIG_CXL_BASE is not set
# CONFIG_CXL_KERNEL_API is not set
# CONFIG_CXL_EEH is not set

#
# SCSI device support
#
CONFIG_SCSI_MOD=y
CONFIG_RAID_ATTRS=m
CONFIG_SCSI=y
CONFIG_SCSI_DMA=y
CONFIG_SCSI_NETLINK=y
CONFIG_SCSI_MQ_DEFAULT=y
CONFIG_SCSI_PROC_FS=y

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=y
CONFIG_CHR_DEV_ST=m
CONFIG_CHR_DEV_OSST=m
CONFIG_BLK_DEV_SR=m
# CONFIG_BLK_DEV_SR_VENDOR is not set
CONFIG_CHR_DEV_SG=y
CONFIG_CHR_DEV_SCH=m
CONFIG_SCSI_CONSTANTS=y
CONFIG_SCSI_LOGGING=y
# CONFIG_SCSI_SCAN_ASYNC is not set

#
# SCSI Transports
#
CONFIG_SCSI_SPI_ATTRS=m
CONFIG_SCSI_FC_ATTRS=y
CONFIG_SCSI_ISCSI_ATTRS=m
CONFIG_SCSI_SAS_ATTRS=m
CONFIG_SCSI_SAS_LIBSAS=m
CONFIG_SCSI_SAS_HOST_SMP=y
CONFIG_SCSI_SRP_ATTRS=m
CONFIG_SCSI_LOWLEVEL=y
CONFIG_ISCSI_TCP=m
# CONFIG_ISCSI_BOOT_SYSFS is not set
# CONFIG_SCSI_CXGB3_ISCSI is not set
# CONFIG_SCSI_CXGB4_ISCSI is not set
# CONFIG_SCSI_BNX2_ISCSI is not set
# CONFIG_SCSI_BNX2X_FCOE is not set
# CONFIG_BE2ISCSI is not set
# CONFIG_BLK_DEV_3W_XXXX_RAID is not set
# CONFIG_SCSI_HPSA is not set
# CONFIG_SCSI_3W_9XXX is not set
# CONFIG_SCSI_3W_SAS is not set
# CONFIG_SCSI_ACARD is not set
# CONFIG_SCSI_AACRAID is not set
# CONFIG_SCSI_AIC7XXX is not set
# CONFIG_SCSI_AIC79XX is not set
# CONFIG_SCSI_AIC94XX is not set
# CONFIG_SCSI_MVSAS is not set
# CONFIG_SCSI_MVUMI is not set
# CONFIG_SCSI_DPT_I2O is not set
# CONFIG_SCSI_ADVANSYS is not set
# CONFIG_SCSI_ARCMSR is not set
# CONFIG_SCSI_ESAS2R is not set
# CONFIG_MEGARAID_NEWGEN is not set
# CONFIG_MEGARAID_LEGACY is not set
# CONFIG_MEGARAID_SAS is not set
# CONFIG_SCSI_MPT3SAS is not set
# CONFIG_SCSI_MPT2SAS is not set
# CONFIG_SCSI_UFSHCD is not set
# CONFIG_SCSI_HPTIOP is not set
CONFIG_LIBFC=m
CONFIG_LIBFCOE=m
# CONFIG_FCOE is not set
# CONFIG_SCSI_SNIC is not set
# CONFIG_SCSI_DMX3191D is not set
# CONFIG_SCSI_FUTURE_DOMAIN is not set
# CONFIG_SCSI_IPS is not set
# CONFIG_SCSI_INITIO is not set
# CONFIG_SCSI_INIA100 is not set
# CONFIG_SCSI_STEX is not set
# CONFIG_SCSI_SYM53C8XX_2 is not set
# CONFIG_SCSI_QLOGIC_1280 is not set
# CONFIG_SCSI_QLA_FC is not set
# CONFIG_SCSI_QLA_ISCSI is not set
# CONFIG_SCSI_LPFC is not set
# CONFIG_SCSI_DC395x is not set
# CONFIG_SCSI_AM53C974 is not set
# CONFIG_SCSI_WD719X is not set
CONFIG_SCSI_DEBUG=m
CONFIG_ZFCP=y
# CONFIG_SCSI_PMCRAID is not set
# CONFIG_SCSI_PM8001 is not set
# CONFIG_SCSI_BFA_FC is not set
CONFIG_SCSI_VIRTIO=m
# CONFIG_SCSI_CHELSIO_FCOE is not set
# CONFIG_SCSI_DH is not set
CONFIG_SCSI_OSD_INITIATOR=m
CONFIG_SCSI_OSD_ULD=m
CONFIG_SCSI_OSD_DPRINT_SENSE=1
# CONFIG_SCSI_OSD_DEBUG is not set
CONFIG_MD=y
# CONFIG_BLK_DEV_MD is not set
# CONFIG_BCACHE is not set
CONFIG_BLK_DEV_DM_BUILTIN=y
CONFIG_BLK_DEV_DM=m
# CONFIG_DM_MQ_DEFAULT is not set
# CONFIG_DM_DEBUG is not set
# CONFIG_DM_CRYPT is not set
# CONFIG_DM_SNAPSHOT is not set
# CONFIG_DM_THIN_PROVISIONING is not set
# CONFIG_DM_CACHE is not set
# CONFIG_DM_ERA is not set
CONFIG_DM_MIRROR=m
# CONFIG_DM_LOG_USERSPACE is not set
# CONFIG_DM_RAID is not set
# CONFIG_DM_ZERO is not set
# CONFIG_DM_MULTIPATH is not set
# CONFIG_DM_DELAY is not set
# CONFIG_DM_UEVENT is not set
# CONFIG_DM_FLAKEY is not set
# CONFIG_DM_VERITY is not set
# CONFIG_DM_SWITCH is not set
# CONFIG_DM_LOG_WRITES is not set
# CONFIG_TARGET_CORE is not set
# CONFIG_FUSION is not set

#
# IEEE 1394 (FireWire) support
#
# CONFIG_FIREWIRE is not set
# CONFIG_FIREWIRE_NOSY is not set
CONFIG_NETDEVICES=y
CONFIG_NET_CORE=y
CONFIG_BONDING=m
CONFIG_DUMMY=m
CONFIG_EQUALIZER=m
# CONFIG_NET_FC is not set
CONFIG_IFB=m
# CONFIG_NET_TEAM is not set
CONFIG_MACVLAN=m
CONFIG_MACVTAP=m
# CONFIG_IPVLAN is not set
CONFIG_VXLAN=m
# CONFIG_GENEVE is not set
# CONFIG_NETCONSOLE is not set
# CONFIG_NETPOLL is not set
# CONFIG_NET_POLL_CONTROLLER is not set
CONFIG_TUN=m
# CONFIG_TUN_VNET_CROSS_LE is not set
CONFIG_VETH=m
CONFIG_VIRTIO_NET=m
CONFIG_NLMON=m
# CONFIG_ARCNET is not set

#
# CAIF transport drivers
#
CONFIG_VHOST_NET=m
CONFIG_VHOST_RING=m
CONFIG_VHOST=m
# CONFIG_VHOST_CROSS_ENDIAN_LEGACY is not set
CONFIG_ETHERNET=y
# CONFIG_NET_VENDOR_3COM is not set
# CONFIG_NET_VENDOR_ADAPTEC is not set
# CONFIG_NET_VENDOR_AGERE is not set
# CONFIG_NET_VENDOR_ALTEON is not set
# CONFIG_ALTERA_TSE is not set
# CONFIG_NET_VENDOR_AMD is not set
# CONFIG_NET_VENDOR_ARC is not set
# CONFIG_NET_VENDOR_ATHEROS is not set
# CONFIG_NET_VENDOR_AURORA is not set
# CONFIG_NET_CADENCE is not set
# CONFIG_NET_VENDOR_BROADCOM is not set
# CONFIG_NET_VENDOR_BROCADE is not set
# CONFIG_NET_VENDOR_CAVIUM is not set
# CONFIG_NET_VENDOR_CHELSIO is not set
# CONFIG_NET_VENDOR_CISCO is not set
# CONFIG_DNET is not set
# CONFIG_NET_VENDOR_DEC is not set
# CONFIG_NET_VENDOR_DLINK is not set
CONFIG_NET_VENDOR_EMULEX=y
# CONFIG_BE2NET is not set
# CONFIG_NET_VENDOR_EZCHIP is not set
# CONFIG_NET_VENDOR_EXAR is not set
# CONFIG_NET_VENDOR_HP is not set
# CONFIG_NET_VENDOR_INTEL is not set
# CONFIG_JME is not set
# CONFIG_NET_VENDOR_MARVELL is not set
CONFIG_NET_VENDOR_MELLANOX=y
CONFIG_MLX4_EN=m
CONFIG_MLX4_EN_VXLAN=y
CONFIG_MLX4_CORE=m
CONFIG_MLX4_DEBUG=y
CONFIG_MLX5_CORE=m
CONFIG_MLX5_CORE_EN=y
CONFIG_MLXSW_CORE=m
CONFIG_MLXSW_PCI=m
# CONFIG_NET_VENDOR_MICREL is not set
# CONFIG_NET_VENDOR_MYRI is not set
# CONFIG_FEALNX is not set
# CONFIG_NET_VENDOR_NATSEMI is not set
CONFIG_NET_VENDOR_NETRONOME=y
# CONFIG_NFP_NETVF is not set
# CONFIG_NET_VENDOR_NVIDIA is not set
# CONFIG_NET_VENDOR_OKI is not set
# CONFIG_ETHOC is not set
# CONFIG_NET_PACKET_ENGINE is not set
# CONFIG_NET_VENDOR_QLOGIC is not set
# CONFIG_NET_VENDOR_QUALCOMM is not set
# CONFIG_NET_VENDOR_REALTEK is not set
# CONFIG_NET_VENDOR_RENESAS is not set
# CONFIG_NET_VENDOR_RDC is not set
# CONFIG_NET_VENDOR_ROCKER is not set
# CONFIG_NET_VENDOR_SAMSUNG is not set
# CONFIG_NET_VENDOR_SEEQ is not set
# CONFIG_NET_VENDOR_SILAN is not set
# CONFIG_NET_VENDOR_SIS is not set
# CONFIG_SFC is not set
# CONFIG_NET_VENDOR_SMSC is not set
# CONFIG_NET_VENDOR_STMICRO is not set
# CONFIG_NET_VENDOR_SUN is not set
# CONFIG_NET_VENDOR_SYNOPSYS is not set
# CONFIG_NET_VENDOR_TEHUTI is not set
# CONFIG_NET_VENDOR_TI is not set
# CONFIG_NET_VENDOR_VIA is not set
# CONFIG_NET_VENDOR_WIZNET is not set
# CONFIG_FDDI is not set
# CONFIG_HIPPI is not set
# CONFIG_PHYLIB is not set
# CONFIG_PPP is not set
# CONFIG_SLIP is not set

#
# S/390 network device drivers
#
CONFIG_LCS=m
CONFIG_CTCM=m
CONFIG_NETIUCV=m
CONFIG_SMSGIUCV=m
CONFIG_SMSGIUCV_EVENT=m
CONFIG_QETH=y
CONFIG_QETH_L2=y
CONFIG_QETH_L3=y
CONFIG_QETH_IPV6=y
CONFIG_CCWGROUP=y

#
# Host-side USB support is needed for USB Network Adapter support
#

#
# Enable WiMAX (Networking options) to see the WiMAX drivers
#
# CONFIG_WAN is not set
# CONFIG_VMXNET3 is not set
# CONFIG_NVM is not set

#
# Input device support
#
CONFIG_INPUT=y
# CONFIG_INPUT_FF_MEMLESS is not set
# CONFIG_INPUT_POLLDEV is not set
# CONFIG_INPUT_SPARSEKMAP is not set
# CONFIG_INPUT_MATRIXKMAP is not set

#
# Userland interfaces
#
CONFIG_INPUT_MOUSEDEV=y
# CONFIG_INPUT_MOUSEDEV_PSAUX is not set
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
# CONFIG_INPUT_JOYDEV is not set
# CONFIG_INPUT_EVDEV is not set
# CONFIG_INPUT_EVBUG is not set

#
# Input Device Drivers
#
# CONFIG_INPUT_KEYBOARD is not set
# CONFIG_INPUT_MOUSE is not set
# CONFIG_INPUT_JOYSTICK is not set
# CONFIG_INPUT_TABLET is not set
# CONFIG_INPUT_TOUCHSCREEN is not set
# CONFIG_INPUT_MISC is not set

#
# Hardware I/O ports
#
# CONFIG_SERIO is not set
# CONFIG_GAMEPORT is not set

#
# Character devices
#
CONFIG_TTY=y
CONFIG_UNIX98_PTYS=y
CONFIG_DEVPTS_MULTIPLE_INSTANCES=y
CONFIG_LEGACY_PTYS=y
CONFIG_LEGACY_PTY_COUNT=0
# CONFIG_SERIAL_NONSTANDARD is not set
# CONFIG_NOZOMI is not set
# CONFIG_N_GSM is not set
# CONFIG_TRACE_SINK is not set
CONFIG_DEVMEM=y
CONFIG_DEVKMEM=y

#
# Serial drivers
#
# CONFIG_SERIAL_8250 is not set

#
# Non-8250 serial port support
#
# CONFIG_SERIAL_UARTLITE is not set
# CONFIG_SERIAL_JSM is not set
# CONFIG_SERIAL_SCCNXP is not set
# CONFIG_SERIAL_ALTERA_JTAGUART is not set
# CONFIG_SERIAL_ALTERA_UART is not set
# CONFIG_SERIAL_ARC is not set
# CONFIG_SERIAL_RP2 is not set
# CONFIG_SERIAL_FSL_LPUART is not set
CONFIG_HVC_DRIVER=y
CONFIG_HVC_IUCV=y
CONFIG_VIRTIO_CONSOLE=y
# CONFIG_IPMI_HANDLER is not set
CONFIG_HW_RANDOM=m
# CONFIG_HW_RANDOM_TIMERIOMEM is not set
CONFIG_HW_RANDOM_VIRTIO=m
CONFIG_HW_RANDOM_TPM=m
# CONFIG_R3964 is not set
# CONFIG_APPLICOM is not set
CONFIG_RAW_DRIVER=m
CONFIG_MAX_RAW_DEVS=256
CONFIG_HANGCHECK_TIMER=m
CONFIG_TCG_TPM=y
CONFIG_DEVPORT=y

#
# S/390 character device drivers
#
CONFIG_TN3270=y
CONFIG_TN3270_TTY=y
CONFIG_TN3270_FS=y
CONFIG_TN3270_CONSOLE=y
CONFIG_TN3215=y
CONFIG_TN3215_CONSOLE=y
CONFIG_CCW_CONSOLE=y
CONFIG_SCLP_TTY=y
CONFIG_SCLP_CONSOLE=y
CONFIG_SCLP_VT220_TTY=y
CONFIG_SCLP_VT220_CONSOLE=y
CONFIG_SCLP_ASYNC=m
CONFIG_SCLP_ASYNC_ID="000000000"
CONFIG_HMC_DRV=m
# CONFIG_SCLP_OFB is not set
CONFIG_S390_TAPE=m

#
# S/390 tape hardware support
#
CONFIG_S390_TAPE_34XX=m
CONFIG_S390_TAPE_3590=m
CONFIG_VMLOGRDR=m
CONFIG_VMCP=y
CONFIG_MONREADER=m
CONFIG_MONWRITER=m
CONFIG_S390_VMUR=m
# CONFIG_XILLYBUS is not set

#
# I2C support
#
# CONFIG_I2C is not set
# CONFIG_SPI is not set
# CONFIG_SPMI is not set
# CONFIG_HSI is not set

#
# PPS support
#
CONFIG_PPS=m
# CONFIG_PPS_DEBUG is not set

#
# PPS clients support
#
# CONFIG_PPS_CLIENT_KTIMER is not set
# CONFIG_PPS_CLIENT_LDISC is not set
# CONFIG_PPS_CLIENT_GPIO is not set

#
# PPS generators support
#

#
# PTP clock support
#
CONFIG_PTP_1588_CLOCK=m

#
# Enable PHYLIB and NETWORK_PHY_TIMESTAMPING to see the additional clocks.
#
# CONFIG_W1 is not set
# CONFIG_POWER_SUPPLY is not set
# CONFIG_POWER_RESET is not set
# CONFIG_POWER_AVS is not set
# CONFIG_HWMON is not set
# CONFIG_THERMAL is not set
CONFIG_WATCHDOG=y
CONFIG_WATCHDOG_CORE=y
CONFIG_WATCHDOG_NOWAYOUT=y
# CONFIG_WATCHDOG_SYSFS is not set

#
# Watchdog Device Drivers
#
# CONFIG_SOFT_WATCHDOG is not set
# CONFIG_XILINX_WATCHDOG is not set
# CONFIG_CADENCE_WATCHDOG is not set
# CONFIG_DW_WATCHDOG is not set
# CONFIG_MAX63XX_WATCHDOG is not set
# CONFIG_ALIM7101_WDT is not set
# CONFIG_I6300ESB_WDT is not set
# CONFIG_BCM7038_WDT is not set
CONFIG_DIAG288_WATCHDOG=m

#
# PCI-based Watchdog Cards
#
# CONFIG_PCIPCWATCHDOG is not set
# CONFIG_WDTPCI is not set
CONFIG_SSB_POSSIBLE=y

#
# Sonics Silicon Backplane
#
# CONFIG_SSB is not set
CONFIG_BCMA_POSSIBLE=y

#
# Broadcom specific AMBA
#
# CONFIG_BCMA is not set

#
# Multifunction device drivers
#
# CONFIG_MFD_CORE is not set
# CONFIG_HTC_PASIC3 is not set
# CONFIG_LPC_ICH is not set
# CONFIG_LPC_SCH is not set
# CONFIG_MFD_JANZ_CMODIO is not set
# CONFIG_MFD_KEMPLD is not set
# CONFIG_MFD_MT6397 is not set
# CONFIG_MFD_RDC321X is not set
# CONFIG_MFD_RTSX_PCI is not set
# CONFIG_MFD_SM501 is not set
# CONFIG_ABX500_CORE is not set
# CONFIG_MFD_SYSCON is not set
# CONFIG_MFD_TI_AM335X_TSCADC is not set
# CONFIG_MFD_TMIO is not set
# CONFIG_MFD_VX855 is not set
# CONFIG_REGULATOR is not set
# CONFIG_MEDIA_SUPPORT is not set

#
# Graphics support
#
# CONFIG_DRM is not set

#
# Frame buffer Devices
#
# CONFIG_FB is not set
# CONFIG_BACKLIGHT_LCD_SUPPORT is not set
# CONFIG_VGASTATE is not set
# CONFIG_SOUND is not set

#
# HID support
#
# CONFIG_HID is not set
CONFIG_USB_OHCI_LITTLE_ENDIAN=y
# CONFIG_USB_SUPPORT is not set
# CONFIG_UWB is not set
# CONFIG_MMC is not set
# CONFIG_MEMSTICK is not set
# CONFIG_NEW_LEDS is not set
# CONFIG_ACCESSIBILITY is not set
CONFIG_INFINIBAND=m
# CONFIG_INFINIBAND_USER_MAD is not set
CONFIG_INFINIBAND_USER_ACCESS=m
CONFIG_INFINIBAND_USER_MEM=y
CONFIG_INFINIBAND_ON_DEMAND_PAGING=y
CONFIG_INFINIBAND_ADDR_TRANS=y
CONFIG_INFINIBAND_ADDR_TRANS_CONFIGFS=y
# CONFIG_INFINIBAND_MTHCA is not set
# CONFIG_INFINIBAND_QIB is not set
CONFIG_MLX4_INFINIBAND=m
# CONFIG_MLX5_INFINIBAND is not set
# CONFIG_INFINIBAND_NES is not set
# CONFIG_INFINIBAND_OCRDMA is not set
# CONFIG_INFINIBAND_IPOIB is not set
# CONFIG_INFINIBAND_SRP is not set
# CONFIG_INFINIBAND_ISER is not set
# CONFIG_DMADEVICES is not set
# CONFIG_AUXDISPLAY is not set
# CONFIG_UIO is not set
CONFIG_VFIO_IOMMU_TYPE1=m
CONFIG_VFIO_VIRQFD=m
CONFIG_VFIO=m
# CONFIG_VFIO_NOIOMMU is not set
CONFIG_VFIO_PCI=m
CONFIG_IRQ_BYPASS_MANAGER=m
# CONFIG_VIRT_DRIVERS is not set
CONFIG_VIRTIO=y

#
# Virtio drivers
#
CONFIG_VIRTIO_PCI=m
CONFIG_VIRTIO_PCI_LEGACY=y
CONFIG_VIRTIO_BALLOON=m
# CONFIG_VIRTIO_INPUT is not set
# CONFIG_VIRTIO_MMIO is not set

#
# Microsoft Hyper-V guest support
#
# CONFIG_STAGING is not set

#
# Hardware Spinlock drivers
#

#
# Clock Source drivers
#
# CONFIG_ATMEL_PIT is not set
# CONFIG_SH_TIMER_CMT is not set
# CONFIG_SH_TIMER_MTU2 is not set
# CONFIG_SH_TIMER_TMU is not set
# CONFIG_EM_TIMER_STI is not set
# CONFIG_MAILBOX is not set
CONFIG_IOMMU_API=y
CONFIG_IOMMU_SUPPORT=y

#
# Generic IOMMU Pagetable Support
#
CONFIG_S390_IOMMU=y

#
# Remoteproc drivers
#
# CONFIG_STE_MODEM_RPROC is not set

#
# Rpmsg drivers
#

#
# SOC (System On Chip) specific Drivers
#
# CONFIG_SUNXI_SRAM is not set
# CONFIG_SOC_TI is not set
# CONFIG_PM_DEVFREQ is not set
# CONFIG_EXTCON is not set
# CONFIG_MEMORY is not set
# CONFIG_IIO is not set
# CONFIG_NTB is not set
# CONFIG_VME_BUS is not set
# CONFIG_PWM is not set
CONFIG_ARM_GIC_MAX_NR=1
# CONFIG_TS4800_IRQ is not set
# CONFIG_IPACK_BUS is not set
# CONFIG_RESET_CONTROLLER is not set
# CONFIG_FMC is not set

#
# PHY Subsystem
#
# CONFIG_GENERIC_PHY is not set
# CONFIG_PHY_PXA_28NM_HSIC is not set
# CONFIG_PHY_PXA_28NM_USB2 is not set
# CONFIG_BCM_KONA_USB2_PHY is not set
# CONFIG_PHY_HI6220_USB is not set
# CONFIG_POWERCAP is not set
# CONFIG_MCB is not set

#
# Performance monitor support
#
# CONFIG_RAS is not set
# CONFIG_THUNDERBOLT is not set

#
# Android
#
# CONFIG_ANDROID is not set
# CONFIG_LIBNVDIMM is not set
# CONFIG_NVMEM is not set
# CONFIG_STM is not set
# CONFIG_STM_DUMMY is not set
# CONFIG_STM_SOURCE_CONSOLE is not set
# CONFIG_INTEL_TH is not set

#
# FPGA Configuration Support
#
# CONFIG_FPGA is not set

#
# File systems
#
# CONFIG_EXT2_FS is not set
# CONFIG_EXT3_FS is not set
CONFIG_EXT4_FS=y
CONFIG_EXT4_USE_FOR_EXT2=y
CONFIG_EXT4_FS_POSIX_ACL=y
CONFIG_EXT4_FS_SECURITY=y
# CONFIG_EXT4_ENCRYPTION is not set
# CONFIG_EXT4_DEBUG is not set
CONFIG_JBD2=y
CONFIG_JBD2_DEBUG=y
CONFIG_FS_MBCACHE=y
# CONFIG_REISERFS_FS is not set
# CONFIG_JFS_FS is not set
# CONFIG_XFS_FS is not set
# CONFIG_GFS2_FS is not set
# CONFIG_OCFS2_FS is not set
CONFIG_BTRFS_FS=m
CONFIG_BTRFS_FS_POSIX_ACL=y
# CONFIG_BTRFS_FS_CHECK_INTEGRITY is not set
# CONFIG_BTRFS_FS_RUN_SANITY_TESTS is not set
# CONFIG_BTRFS_DEBUG is not set
# CONFIG_BTRFS_ASSERT is not set
# CONFIG_NILFS2_FS is not set
# CONFIG_F2FS_FS is not set
# CONFIG_FS_DAX is not set
CONFIG_FS_POSIX_ACL=y
CONFIG_EXPORTFS=y
CONFIG_FILE_LOCKING=y
# CONFIG_MANDATORY_FILE_LOCKING is not set
CONFIG_FSNOTIFY=y
CONFIG_DNOTIFY=y
CONFIG_INOTIFY_USER=y
CONFIG_FANOTIFY=y
# CONFIG_FANOTIFY_ACCESS_PERMISSIONS is not set
CONFIG_QUOTA=y
CONFIG_QUOTA_NETLINK_INTERFACE=y
CONFIG_PRINT_QUOTA_WARNING=y
# CONFIG_QUOTA_DEBUG is not set
CONFIG_QUOTA_TREE=m
CONFIG_QFMT_V1=m
CONFIG_QFMT_V2=m
CONFIG_QUOTACTL=y
CONFIG_AUTOFS4_FS=m
# CONFIG_FUSE_FS is not set
# CONFIG_OVERLAY_FS is not set

#
# Caches
#
CONFIG_FSCACHE=m
# CONFIG_FSCACHE_STATS is not set
# CONFIG_FSCACHE_HISTOGRAM is not set
# CONFIG_FSCACHE_DEBUG is not set
# CONFIG_FSCACHE_OBJECT_LIST is not set
CONFIG_CACHEFILES=m
# CONFIG_CACHEFILES_DEBUG is not set
# CONFIG_CACHEFILES_HISTOGRAM is not set

#
# CD-ROM/DVD Filesystems
#
# CONFIG_ISO9660_FS is not set
# CONFIG_UDF_FS is not set

#
# DOS/FAT/NT Filesystems
#
# CONFIG_MSDOS_FS is not set
# CONFIG_VFAT_FS is not set
# CONFIG_NTFS_FS is not set

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_PROC_VMCORE=y
CONFIG_PROC_SYSCTL=y
CONFIG_PROC_PAGE_MONITOR=y
# CONFIG_PROC_CHILDREN is not set
CONFIG_KERNFS=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y
CONFIG_TMPFS_XATTR=y
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y
CONFIG_CONFIGFS_FS=m
CONFIG_MISC_FILESYSTEMS=y
# CONFIG_ADFS_FS is not set
# CONFIG_AFFS_FS is not set
# CONFIG_ECRYPT_FS is not set
# CONFIG_HFS_FS is not set
# CONFIG_HFSPLUS_FS is not set
# CONFIG_BEFS_FS is not set
# CONFIG_BFS_FS is not set
# CONFIG_EFS_FS is not set
# CONFIG_LOGFS is not set
# CONFIG_CRAMFS is not set
# CONFIG_SQUASHFS is not set
# CONFIG_VXFS_FS is not set
# CONFIG_MINIX_FS is not set
# CONFIG_MINIX_FS_NATIVE_ENDIAN is not set
# CONFIG_OMFS_FS is not set
# CONFIG_HPFS_FS is not set
# CONFIG_QNX4FS_FS is not set
# CONFIG_QNX6FS_FS is not set
# CONFIG_ROMFS_FS is not set
# CONFIG_PSTORE is not set
# CONFIG_SYSV_FS is not set
# CONFIG_UFS_FS is not set
# CONFIG_EXOFS_FS is not set
# CONFIG_NETWORK_FILESYSTEMS is not set
# CONFIG_NLS is not set
CONFIG_DLM=m
# CONFIG_DLM_DEBUG is not set

#
# Kernel hacking
#
CONFIG_TRACE_IRQFLAGS_SUPPORT=y

#
# printk and dmesg options
#
CONFIG_PRINTK_TIME=y
CONFIG_MESSAGE_LOGLEVEL_DEFAULT=4
CONFIG_DYNAMIC_DEBUG=y

#
# Compile-time checks and compiler options
#
CONFIG_DEBUG_INFO=y
# CONFIG_DEBUG_INFO_REDUCED is not set
# CONFIG_DEBUG_INFO_SPLIT is not set
# CONFIG_DEBUG_INFO_DWARF4 is not set
# CONFIG_GDB_SCRIPTS is not set
CONFIG_ENABLE_WARN_DEPRECATED=y
# CONFIG_ENABLE_MUST_CHECK is not set
CONFIG_FRAME_WARN=1024
# CONFIG_STRIP_ASM_SYMS is not set
CONFIG_READABLE_ASM=y
CONFIG_UNUSED_SYMBOLS=y
# CONFIG_PAGE_OWNER is not set
CONFIG_DEBUG_FS=y
# CONFIG_HEADERS_CHECK is not set
# CONFIG_DEBUG_SECTION_MISMATCH is not set
# CONFIG_SECTION_MISMATCH_WARN_ONLY is not set
# CONFIG_DEBUG_FORCE_WEAK_PER_CPU is not set
CONFIG_MAGIC_SYSRQ=y
CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE=0x1
CONFIG_DEBUG_KERNEL=y

#
# Memory Debugging
#
CONFIG_PAGE_EXTENSION=y
CONFIG_DEBUG_PAGEALLOC=y
CONFIG_DEBUG_OBJECTS=y
CONFIG_DEBUG_OBJECTS_SELFTEST=y
CONFIG_DEBUG_OBJECTS_FREE=y
CONFIG_DEBUG_OBJECTS_TIMERS=y
CONFIG_DEBUG_OBJECTS_WORK=y
CONFIG_DEBUG_OBJECTS_RCU_HEAD=y
CONFIG_DEBUG_OBJECTS_PERCPU_COUNTER=y
CONFIG_DEBUG_OBJECTS_ENABLE_DEFAULT=1
CONFIG_SLUB_DEBUG_ON=y
CONFIG_SLUB_STATS=y
CONFIG_HAVE_DEBUG_KMEMLEAK=y
CONFIG_DEBUG_KMEMLEAK=y
CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE=400
# CONFIG_DEBUG_KMEMLEAK_TEST is not set
# CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF is not set
CONFIG_DEBUG_STACK_USAGE=y
CONFIG_DEBUG_VM=y
# CONFIG_DEBUG_VM_VMACACHE is not set
CONFIG_DEBUG_VM_RB=y
# CONFIG_DEBUG_VM_PGFLAGS is not set
CONFIG_DEBUG_MEMORY_INIT=y
CONFIG_MEMORY_NOTIFIER_ERROR_INJECT=m
CONFIG_DEBUG_PER_CPU_MAPS=y
CONFIG_DEBUG_SHIRQ=y

#
# Debug Lockups and Hangs
#
CONFIG_DETECT_HUNG_TASK=y
CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=120
# CONFIG_BOOTPARAM_HUNG_TASK_PANIC is not set
CONFIG_BOOTPARAM_HUNG_TASK_PANIC_VALUE=0
# CONFIG_WQ_WATCHDOG is not set
# CONFIG_PANIC_ON_OOPS is not set
CONFIG_PANIC_ON_OOPS_VALUE=0
CONFIG_PANIC_TIMEOUT=0
CONFIG_SCHED_DEBUG=y
CONFIG_SCHED_INFO=y
CONFIG_SCHEDSTATS=y
# CONFIG_SCHED_STACK_END_CHECK is not set
# CONFIG_DEBUG_TIMEKEEPING is not set
CONFIG_TIMER_STATS=y
CONFIG_DEBUG_PREEMPT=y

#
# Lock Debugging (spinlocks, mutexes, etc...)
#
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
CONFIG_LOCKDEP=y
CONFIG_LOCK_STAT=y
CONFIG_DEBUG_LOCKDEP=y
CONFIG_DEBUG_ATOMIC_SLEEP=y
CONFIG_DEBUG_LOCKING_API_SELFTESTS=y
# CONFIG_LOCK_TORTURE_TEST is not set
CONFIG_TRACE_IRQFLAGS=y
CONFIG_STACKTRACE=y
# CONFIG_DEBUG_KOBJECT is not set
# CONFIG_DEBUG_KOBJECT_RELEASE is not set
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_DEBUG_LIST=y
# CONFIG_DEBUG_PI_LIST is not set
CONFIG_DEBUG_SG=y
CONFIG_DEBUG_NOTIFIERS=y
CONFIG_DEBUG_CREDENTIALS=y

#
# RCU Debugging
#
CONFIG_PROVE_RCU=y
# CONFIG_PROVE_RCU_REPEATEDLY is not set
# CONFIG_SPARSE_RCU_POINTER is not set
CONFIG_TORTURE_TEST=m
CONFIG_RCU_TORTURE_TEST=m
# CONFIG_RCU_TORTURE_TEST_SLOW_PREINIT is not set
# CONFIG_RCU_TORTURE_TEST_SLOW_INIT is not set
# CONFIG_RCU_TORTURE_TEST_SLOW_CLEANUP is not set
CONFIG_RCU_CPU_STALL_TIMEOUT=300
# CONFIG_RCU_TRACE is not set
# CONFIG_RCU_EQS_DEBUG is not set
# CONFIG_DEBUG_WQ_FORCE_RR_CPU is not set
# CONFIG_DEBUG_BLOCK_EXT_DEVT is not set
CONFIG_NOTIFIER_ERROR_INJECTION=m
CONFIG_CPU_NOTIFIER_ERROR_INJECT=m
CONFIG_PM_NOTIFIER_ERROR_INJECT=m
# CONFIG_NETDEV_NOTIFIER_ERROR_INJECT is not set
CONFIG_FAULT_INJECTION=y
CONFIG_FAILSLAB=y
CONFIG_FAIL_PAGE_ALLOC=y
CONFIG_FAIL_MAKE_REQUEST=y
CONFIG_FAIL_IO_TIMEOUT=y
# CONFIG_FAIL_FUTEX is not set
CONFIG_FAULT_INJECTION_DEBUG_FS=y
CONFIG_FAULT_INJECTION_STACKTRACE_FILTER=y
CONFIG_LATENCYTOP=y
CONFIG_ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS=y
CONFIG_DEBUG_STRICT_USER_COPY_CHECKS=y
CONFIG_NOP_TRACER=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
CONFIG_TRACE_CLOCK=y
CONFIG_RING_BUFFER=y
CONFIG_EVENT_TRACING=y
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_RING_BUFFER_ALLOW_SWAP=y
CONFIG_TRACING=y
CONFIG_GENERIC_TRACER=y
CONFIG_TRACING_SUPPORT=y
CONFIG_FTRACE=y
# CONFIG_FUNCTION_TRACER is not set
# CONFIG_IRQSOFF_TRACER is not set
# CONFIG_PREEMPT_TRACER is not set
# CONFIG_SCHED_TRACER is not set
# CONFIG_FTRACE_SYSCALLS is not set
# CONFIG_TRACER_SNAPSHOT is not set
CONFIG_BRANCH_PROFILE_NONE=y
# CONFIG_PROFILE_ANNOTATED_BRANCHES is not set
# CONFIG_PROFILE_ALL_BRANCHES is not set
# CONFIG_STACK_TRACER is not set
CONFIG_BLK_DEV_IO_TRACE=y
# CONFIG_KPROBE_EVENT is not set
# CONFIG_UPROBE_EVENT is not set
# CONFIG_PROBE_EVENTS is not set
# CONFIG_FTRACE_STARTUP_TEST is not set
# CONFIG_TRACEPOINT_BENCHMARK is not set
# CONFIG_RING_BUFFER_BENCHMARK is not set
# CONFIG_RING_BUFFER_STARTUP_TEST is not set
# CONFIG_TRACE_ENUM_MAP_FILE is not set

#
# Runtime Testing
#
CONFIG_LKDTM=m
CONFIG_TEST_LIST_SORT=y
CONFIG_KPROBES_SANITY_TEST=y
# CONFIG_BACKTRACE_SELF_TEST is not set
CONFIG_RBTREE_TEST=y
CONFIG_INTERVAL_TREE_TEST=m
CONFIG_PERCPU_TEST=m
CONFIG_ATOMIC64_SELFTEST=y
# CONFIG_TEST_HEXDUMP is not set
# CONFIG_TEST_STRING_HELPERS is not set
# CONFIG_TEST_KSTRTOX is not set
# CONFIG_TEST_PRINTF is not set
# CONFIG_TEST_RHASHTABLE is not set
CONFIG_DMA_API_DEBUG=y
# CONFIG_TEST_LKM is not set
# CONFIG_TEST_USER_COPY is not set
# CONFIG_TEST_BPF is not set
# CONFIG_TEST_FIRMWARE is not set
# CONFIG_TEST_UDELAY is not set
# CONFIG_MEMTEST is not set
# CONFIG_TEST_STATIC_KEYS is not set
# CONFIG_SAMPLES is not set
# CONFIG_UBSAN is not set
CONFIG_ARCH_HAS_DEVMEM_IS_ALLOWED=y
# CONFIG_STRICT_DEVMEM is not set
CONFIG_S390_PTDUMP=y
CONFIG_DEBUG_SET_MODULE_RONX=y

#
# Security options
#
CONFIG_KEYS=y
# CONFIG_PERSISTENT_KEYRINGS is not set
# CONFIG_BIG_KEYS is not set
# CONFIG_TRUSTED_KEYS is not set
CONFIG_ENCRYPTED_KEYS=m
# CONFIG_SECURITY_DMESG_RESTRICT is not set
CONFIG_SECURITY=y
CONFIG_SECURITYFS=y
CONFIG_SECURITY_NETWORK=y
# CONFIG_SECURITY_NETWORK_XFRM is not set
# CONFIG_SECURITY_PATH is not set
# CONFIG_SECURITY_SELINUX is not set
# CONFIG_SECURITY_SMACK is not set
# CONFIG_SECURITY_TOMOYO is not set
# CONFIG_SECURITY_APPARMOR is not set
# CONFIG_SECURITY_YAMA is not set
CONFIG_INTEGRITY=y
# CONFIG_INTEGRITY_SIGNATURE is not set
CONFIG_INTEGRITY_AUDIT=y
CONFIG_IMA=y
CONFIG_IMA_MEASURE_PCR_IDX=10
# CONFIG_IMA_TEMPLATE is not set
CONFIG_IMA_NG_TEMPLATE=y
# CONFIG_IMA_SIG_TEMPLATE is not set
CONFIG_IMA_DEFAULT_TEMPLATE="ima-ng"
CONFIG_IMA_DEFAULT_HASH_SHA1=y
# CONFIG_IMA_DEFAULT_HASH_SHA256 is not set
# CONFIG_IMA_DEFAULT_HASH_SHA512 is not set
# CONFIG_IMA_DEFAULT_HASH_WP512 is not set
CONFIG_IMA_DEFAULT_HASH="sha1"
# CONFIG_IMA_WRITE_POLICY is not set
# CONFIG_IMA_READ_POLICY is not set
CONFIG_IMA_APPRAISE=y
# CONFIG_EVM is not set
CONFIG_DEFAULT_SECURITY_DAC=y
CONFIG_DEFAULT_SECURITY=""
CONFIG_XOR_BLOCKS=m
CONFIG_CRYPTO=y

#
# Crypto core or helper
#
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_ALGAPI2=y
CONFIG_CRYPTO_AEAD=m
CONFIG_CRYPTO_AEAD2=y
CONFIG_CRYPTO_BLKCIPHER=m
CONFIG_CRYPTO_BLKCIPHER2=y
CONFIG_CRYPTO_HASH=y
CONFIG_CRYPTO_HASH2=y
CONFIG_CRYPTO_RNG=m
CONFIG_CRYPTO_RNG2=y
CONFIG_CRYPTO_RNG_DEFAULT=m
CONFIG_CRYPTO_PCOMP=y
CONFIG_CRYPTO_PCOMP2=y
CONFIG_CRYPTO_AKCIPHER2=y
# CONFIG_CRYPTO_RSA is not set
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_MANAGER2=y
CONFIG_CRYPTO_USER=m
# CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set
CONFIG_CRYPTO_GF128MUL=m
CONFIG_CRYPTO_NULL=m
CONFIG_CRYPTO_NULL2=y
# CONFIG_CRYPTO_PCRYPT is not set
CONFIG_CRYPTO_WORKQUEUE=y
CONFIG_CRYPTO_CRYPTD=m
# CONFIG_CRYPTO_MCRYPTD is not set
CONFIG_CRYPTO_AUTHENC=m
CONFIG_CRYPTO_TEST=m

#
# Authenticated Encryption with Associated Data
#
CONFIG_CRYPTO_CCM=m
CONFIG_CRYPTO_GCM=m
# CONFIG_CRYPTO_CHACHA20POLY1305 is not set
CONFIG_CRYPTO_SEQIV=m
CONFIG_CRYPTO_ECHAINIV=m

#
# Block modes
#
CONFIG_CRYPTO_CBC=m
CONFIG_CRYPTO_CTR=m
CONFIG_CRYPTO_CTS=m
CONFIG_CRYPTO_ECB=m
CONFIG_CRYPTO_LRW=m
CONFIG_CRYPTO_PCBC=m
CONFIG_CRYPTO_XTS=m
# CONFIG_CRYPTO_KEYWRAP is not set

#
# Hash modes
#
CONFIG_CRYPTO_CMAC=m
CONFIG_CRYPTO_HMAC=y
CONFIG_CRYPTO_XCBC=m
CONFIG_CRYPTO_VMAC=m

#
# Digest
#
CONFIG_CRYPTO_CRC32C=y
CONFIG_CRYPTO_CRC32=m
CONFIG_CRYPTO_CRCT10DIF=y
CONFIG_CRYPTO_GHASH=m
# CONFIG_CRYPTO_POLY1305 is not set
CONFIG_CRYPTO_MD4=m
CONFIG_CRYPTO_MD5=y
CONFIG_CRYPTO_MICHAEL_MIC=m
CONFIG_CRYPTO_RMD128=m
CONFIG_CRYPTO_RMD160=m
CONFIG_CRYPTO_RMD256=m
CONFIG_CRYPTO_RMD320=m
CONFIG_CRYPTO_SHA1=y
CONFIG_CRYPTO_SHA256=y
CONFIG_CRYPTO_SHA512=m
CONFIG_CRYPTO_TGR192=m
CONFIG_CRYPTO_WP512=m

#
# Ciphers
#
CONFIG_CRYPTO_AES=y
CONFIG_CRYPTO_ANUBIS=m
CONFIG_CRYPTO_ARC4=m
CONFIG_CRYPTO_BLOWFISH=m
CONFIG_CRYPTO_BLOWFISH_COMMON=m
CONFIG_CRYPTO_CAMELLIA=m
CONFIG_CRYPTO_CAST_COMMON=m
CONFIG_CRYPTO_CAST5=m
CONFIG_CRYPTO_CAST6=m
CONFIG_CRYPTO_DES=m
CONFIG_CRYPTO_FCRYPT=m
CONFIG_CRYPTO_KHAZAD=m
CONFIG_CRYPTO_SALSA20=m
# CONFIG_CRYPTO_CHACHA20 is not set
CONFIG_CRYPTO_SEED=m
CONFIG_CRYPTO_SERPENT=m
CONFIG_CRYPTO_TEA=m
CONFIG_CRYPTO_TWOFISH=m
CONFIG_CRYPTO_TWOFISH_COMMON=m

#
# Compression
#
CONFIG_CRYPTO_DEFLATE=m
CONFIG_CRYPTO_ZLIB=y
CONFIG_CRYPTO_LZO=m
# CONFIG_CRYPTO_842 is not set
CONFIG_CRYPTO_LZ4=m
CONFIG_CRYPTO_LZ4HC=m

#
# Random Number Generation
#
CONFIG_CRYPTO_ANSI_CPRNG=m
CONFIG_CRYPTO_DRBG_MENU=m
CONFIG_CRYPTO_DRBG_HMAC=y
# CONFIG_CRYPTO_DRBG_HASH is not set
# CONFIG_CRYPTO_DRBG_CTR is not set
CONFIG_CRYPTO_DRBG=m
CONFIG_CRYPTO_JITTERENTROPY=m
CONFIG_CRYPTO_USER_API=m
CONFIG_CRYPTO_USER_API_HASH=m
CONFIG_CRYPTO_USER_API_SKCIPHER=m
# CONFIG_CRYPTO_USER_API_RNG is not set
# CONFIG_CRYPTO_USER_API_AEAD is not set
CONFIG_CRYPTO_HASH_INFO=y
CONFIG_CRYPTO_HW=y
CONFIG_ZCRYPT=m
CONFIG_CRYPTO_SHA1_S390=m
CONFIG_CRYPTO_SHA256_S390=m
CONFIG_CRYPTO_SHA512_S390=m
CONFIG_CRYPTO_DES_S390=m
CONFIG_CRYPTO_AES_S390=m
CONFIG_S390_PRNG=m
CONFIG_CRYPTO_GHASH_S390=m
CONFIG_ASYMMETRIC_KEY_TYPE=m
CONFIG_ASYMMETRIC_PUBLIC_KEY_SUBTYPE=m
CONFIG_PUBLIC_KEY_ALGO_RSA=m
CONFIG_X509_CERTIFICATE_PARSER=m
# CONFIG_PKCS7_MESSAGE_PARSER is not set

#
# Certificates for signature checking
#
# CONFIG_SYSTEM_TRUSTED_KEYRING is not set
CONFIG_BINARY_PRINTF=y

#
# Library routines
#
CONFIG_RAID6_PQ=m
CONFIG_BITREVERSE=y
# CONFIG_HAVE_ARCH_BITREVERSE is not set
CONFIG_GENERIC_NET_UTILS=y
CONFIG_GENERIC_FIND_FIRST_BIT=y
CONFIG_GENERIC_IO=y
CONFIG_ARCH_USE_CMPXCHG_LOCKREF=y
CONFIG_CRC_CCITT=m
CONFIG_CRC16=y
CONFIG_CRC_T10DIF=y
CONFIG_CRC_ITU_T=m
CONFIG_CRC32=y
# CONFIG_CRC32_SELFTEST is not set
CONFIG_CRC32_SLICEBY8=y
# CONFIG_CRC32_SLICEBY4 is not set
# CONFIG_CRC32_SARWATE is not set
# CONFIG_CRC32_BIT is not set
CONFIG_CRC7=m
CONFIG_LIBCRC32C=m
CONFIG_CRC8=m
# CONFIG_AUDIT_ARCH_COMPAT_GENERIC is not set
# CONFIG_RANDOM32_SELFTEST is not set
CONFIG_ZLIB_INFLATE=y
CONFIG_ZLIB_DEFLATE=y
CONFIG_LZO_COMPRESS=y
CONFIG_LZO_DECOMPRESS=y
CONFIG_LZ4_COMPRESS=m
CONFIG_LZ4HC_COMPRESS=m
CONFIG_LZ4_DECOMPRESS=y
CONFIG_XZ_DEC=y
CONFIG_XZ_DEC_X86=y
CONFIG_XZ_DEC_POWERPC=y
CONFIG_XZ_DEC_IA64=y
CONFIG_XZ_DEC_ARM=y
CONFIG_XZ_DEC_ARMTHUMB=y
CONFIG_XZ_DEC_SPARC=y
CONFIG_XZ_DEC_BCJ=y
# CONFIG_XZ_DEC_TEST is not set
CONFIG_DECOMPRESS_GZIP=y
CONFIG_DECOMPRESS_BZIP2=y
CONFIG_DECOMPRESS_LZMA=y
CONFIG_DECOMPRESS_XZ=y
CONFIG_DECOMPRESS_LZO=y
CONFIG_DECOMPRESS_LZ4=y
CONFIG_INTERVAL_TREE=y
CONFIG_ASSOCIATIVE_ARRAY=y
# CONFIG_CPUMASK_OFFSTACK is not set
CONFIG_CPU_RMAP=y
CONFIG_DQL=y
CONFIG_NLATTR=y
CONFIG_ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE=y
CONFIG_CLZ_TAB=y
CONFIG_CORDIC=m
# CONFIG_DDR is not set
CONFIG_IRQ_POLL=y
CONFIG_MPILIB=m
CONFIG_OID_REGISTRY=m
# CONFIG_SG_SPLIT is not set
CONFIG_ARCH_HAS_SG_CHAIN=y

#
# Virtualization
#
CONFIG_PFAULT=y
CONFIG_CMM=m
CONFIG_CMM_IUCV=y
CONFIG_APPLDATA_BASE=y
CONFIG_APPLDATA_MEM=m
CONFIG_APPLDATA_OS=m
CONFIG_APPLDATA_NET_SUM=m
CONFIG_S390_HYPFS_FS=y
CONFIG_HAVE_KVM=y
CONFIG_HAVE_KVM_IRQCHIP=y
CONFIG_HAVE_KVM_IRQFD=y
CONFIG_HAVE_KVM_IRQ_ROUTING=y
CONFIG_HAVE_KVM_EVENTFD=y
CONFIG_KVM_ASYNC_PF=y
CONFIG_KVM_ASYNC_PF_SYNC=y
CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT=y
CONFIG_KVM_VFIO=y
CONFIG_VIRTUALIZATION=y
CONFIG_KVM=m
CONFIG_KVM_S390_UCONTROL=y
CONFIG_S390_GUEST=y


* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
@ 2016-02-17 19:04                           ` Sebastian Ott
From: Sebastian Ott @ 2016-02-17 19:04 UTC
  To: Kirill A. Shutemov
  Cc: Gerald Schaefer, Andrea Arcangeli, Christian Borntraeger,
	Kirill A. Shutemov, linux-mm, linux-kernel, Aneesh Kumar K.V,
	Andrew Morton, Linus Torvalds, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	Catalin Marinas, Will Deacon, linux-arm-kernel,
	Martin Schwidefsky, Heiko Carstens, linux-s390

[-- Attachment #1: Type: text/plain, Size: 10359 bytes --]

Hi,

On Wed, 17 Feb 2016, Kirill A. Shutemov wrote:
> On Tue, Feb 16, 2016 at 05:24:44PM +0100, Gerald Schaefer wrote:
> > On Mon, 15 Feb 2016 23:35:26 +0200
> > "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> > 
> > > Is there any chance that I'll be able to trigger the bug using QEMU?
> > > Does anybody have a QEMU image I can use?
> > > 
> > 
> > I have no image, but trying to reproduce this under virtualization may
> > help to trigger this also on other architectures. After ruling out IPI
> > vs. fast_gup I do not really see why this should be arch-specific, and
> > it wouldn't be the first time that we hit subtle races first on s390, due
> > to our virtualized environment (my test case is make -j20 with 10 CPUs and
> > 4GB of memory, no swap).
> 
> Could you post your kernel config?

Attached.

> It would be nice also to check if disabling split_huge_page() would make
> any difference:
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index a75081ca31cf..26d2b7b21021 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3364,6 +3364,8 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
>  	bool mlocked;
>  	unsigned long flags;
> 
> +	return -EBUSY;
> +
>  	VM_BUG_ON_PAGE(is_huge_zero_page(page), page);
>  	VM_BUG_ON_PAGE(!PageAnon(page), page);
>  	VM_BUG_ON_PAGE(!PageLocked(page), page);
> -- 

65c23c6 + this patch also oopsed:

[ 1707.903808] ODEBUG: active_state not available (active state 0) object type: rcu_head hint:           (null)
[ 1707.903852] ------------[ cut here ]------------
[ 1707.903854] WARNING: at lib/debugobjects.c:263
[ 1707.903856] Modules linked in: bridge stp llc btrfs mlx4_ib mlx4_en ib_sa vxlan ib_mad ip6_udp_tunnel ib_core udp_tunnel ptp pps_core ib_addr xor raid6_pq ghash_s390 mlx4_core prng ecb aes_s390 des_s390 des_generic sha512_s390 dm_mod sha256_s390 genwqe_card sha1_s390 sha_common crc_itu_t scm_block eadm_sch vhost_net tun vhost macvtap macvlan kvm autofs4
[ 1707.903892] CPU: 4 PID: 25215 Comm: git Not tainted 4.5.0-rc4-00037-g65c23c6-dirty #273
[ 1707.903894] task: 0000000006a60000 ti: 0000000063b04000 task.ti: 0000000063b04000
[ 1707.903896] Krnl PSW : 0404c00180000000 0000000000486ce0 (debug_print_object+0xb0/0xd0)
[ 1707.903905]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
Krnl GPRS: 0000000001a361c7 0000000006a60000 0000000000000060 0000000000000101
[ 1707.903908]            0000000000486cdc 0000000000000000 000000000088cbdc 0000000001b53848
[ 1707.903910]            0700000000000001 0000000000000000 0000000001b53850 00000000008bb820
[ 1707.903912]            0000000000a8d710 00000000dcdd3d38 0000000000486cdc 00000000dcdd3c38
[ 1707.903920] Krnl Code: 0000000000486cd0: c0200021a496        larl    %r2,8bb5fc
0000000000486cd6: c0e5ffee03a1       brasl   %r14,247418
#0000000000486cdc: a7f40001           brc     15,486cde
>0000000000486ce0: c41d002f488e       lrl     %r1,a6fdfc
0000000000486ce6: e340f0e80004       lg      %r4,232(%r15)
0000000000486cec: a71a0001           ahi     %r1,1
0000000000486cf0: eb6ff0a80004       lmg     %r6,%r15,168(%r15)
0000000000486cf6: c41f002f4883       strl    %r1,a6fdfc
[ 1707.903960] Call Trace:
[ 1707.903962] ([<0000000000486cdc>] debug_print_object+0xac/0xd0)
[ 1707.903964]  [<0000000000488094>] debug_object_active_state+0x164/0x178
[ 1707.903969]  [<00000000001b991c>] rcu_process_callbacks+0x564/0x9e8
[ 1707.903973]  [<000000000013d3ee>] __do_softirq+0x256/0x568
[ 1707.903975]  [<000000000013da3a>] irq_exit+0x7a/0xd8
[ 1707.903979]  [<000000000010c87e>] do_IRQ+0x86/0xc0
[ 1707.903984]  [<00000000006fa3f2>] ext_int_handler+0x11e/0x124
[ 1707.903987]  [<0000000000199bfe>] lock_release+0x5ce/0x670
[ 1707.903989] ([<0000000000199be0>] lock_release+0x5b0/0x670)
[ 1707.903993]  [<00000000002dffa2>] getname_flags+0x82/0x218
[ 1707.903994]  [<00000000002e04e8>] user_path_at_empty+0x40/0x68
[ 1707.903998]  [<00000000002d44a4>] vfs_fstatat+0x6c/0xc8
[ 1707.903999]  [<00000000002d4894>] SyS_newlstat+0x2c/0x48
[ 1707.904002]  [<00000000006f9cce>] system_call+0xd6/0x258
[ 1707.904003]  [<000003ffb45f1124>] 0x3ffb45f1124
[ 1707.904005] 1 lock held by git/25215:
[ 1707.904006]  #0:  (&obj_hash[i].lock){-.-.-.}, at: [<0000000000487fdc>] debug_object_active_state+0xac/0x178
[ 1707.904012] Last Breaking-Event-Address:
[ 1707.904014]  [<0000000000486cdc>] debug_print_object+0xac/0xd0
[ 1707.904016] ---[ end trace 8ce68dc422e8321c ]---
[ 1707.904018] ODEBUG: deactivate not available (active state 0) object type: rcu_head hint:           (null)
[ 1707.904026] ------------[ cut here ]------------
[ 1707.904027] WARNING: at lib/debugobjects.c:263
[ 1707.904028] Modules linked in: bridge stp llc btrfs mlx4_ib mlx4_en ib_sa vxlan ib_mad ip6_udp_tunnel ib_core udp_tunnel ptp pps_core ib_addr xor raid6_pq ghash_s390 mlx4_core prng ecb aes_s390 des_s390 des_generic sha512_s390 dm_mod sha256_s390 genwqe_card sha1_s390 sha_common crc_itu_t scm_block eadm_sch vhost_net tun vhost macvtap macvlan kvm autofs4
[ 1707.904055] CPU: 4 PID: 25215 Comm: git Tainted: G        W       4.5.0-rc4-00037-g65c23c6-dirty #273
[ 1707.904057] task: 0000000006a60000 ti: 0000000063b04000 task.ti: 0000000063b04000
[ 1707.904058] Krnl PSW : 0404c00180000000 0000000000486ce0 (debug_print_object+0xb0/0xd0)
[ 1707.904062]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
Krnl GPRS: 0000000001a361c7 0000000006a60000 000000000000005e 0000000000000101
[ 1707.904066]            0000000000486cdc 0000000000000000 000000000088cbdc 000000000000000a
[ 1707.904068]            0000000091cdb020 07000000dcdd3c68 0000000001b53850 00000000008979ea
[ 1707.904069]            0000000000a8d710 00000000dcdd3d48 0000000000486cdc 00000000dcdd3c48
[ 1707.904074] Krnl Code: 0000000000486cd0: c0200021a496        larl    %r2,8bb5fc
0000000000486cd6: c0e5ffee03a1       brasl   %r14,247418
#0000000000486cdc: a7f40001           brc     15,486cde
>0000000000486ce0: c41d002f488e       lrl     %r1,a6fdfc
0000000000486ce6: e340f0e80004       lg      %r4,232(%r15)
0000000000486cec: a71a0001           ahi     %r1,1
0000000000486cf0: eb6ff0a80004       lmg     %r6,%r15,168(%r15)
0000000000486cf6: c41f002f4883       strl    %r1,a6fdfc
[ 1707.904088] Call Trace:
[ 1707.904090] ([<0000000000486cdc>] debug_print_object+0xac/0xd0)
[ 1707.904092]  [<0000000000487a38>] debug_object_deactivate+0x170/0x188
[ 1707.904094]  [<00000000001b992e>] rcu_process_callbacks+0x576/0x9e8
[ 1707.904096]  [<000000000013d3ee>] __do_softirq+0x256/0x568
[ 1707.904098]  [<000000000013da3a>] irq_exit+0x7a/0xd8
[ 1707.904100]  [<000000000010c87e>] do_IRQ+0x86/0xc0
[ 1707.904102]  [<00000000006fa3f2>] ext_int_handler+0x11e/0x124
[ 1707.904104]  [<0000000000199bfe>] lock_release+0x5ce/0x670
[ 1707.904106] ([<0000000000199be0>] lock_release+0x5b0/0x670)
[ 1707.904108]  [<00000000002dffa2>] getname_flags+0x82/0x218
[ 1707.904109]  [<00000000002e04e8>] user_path_at_empty+0x40/0x68
[ 1707.904111]  [<00000000002d44a4>] vfs_fstatat+0x6c/0xc8
[ 1707.904113]  [<00000000002d4894>] SyS_newlstat+0x2c/0x48
[ 1707.904115]  [<00000000006f9cce>] system_call+0xd6/0x258
[ 1707.904117]  [<000003ffb45f1124>] 0x3ffb45f1124
[ 1707.904118] 1 lock held by git/25215:
[ 1707.904119]  #0:  (&obj_hash[i].lock){-.-.-.}, at: [<000000000048796c>] debug_object_deactivate+0xa4/0x188
[ 1707.904124] Last Breaking-Event-Address:
[ 1707.904126]  [<0000000000486cdc>] debug_print_object+0xac/0xd0
[ 1707.904128] ---[ end trace 8ce68dc422e8321d ]---
[ 1707.904150] ------------[ cut here ]------------
[ 1707.904152] Kernel BUG at 0000000008cf8002 [verbose debug info unavailable]
[ 1707.904197] illegal operation: 0001 ilc:1 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 1707.904203] Modules linked in: bridge stp llc btrfs mlx4_ib mlx4_en ib_sa vxlan ib_mad ip6_udp_tunnel ib_core udp_tunnel ptp pps_core ib_addr xor raid6_pq ghash_s390 mlx4_core prng ecb aes_s390 des_s390 des_generic sha512_s390 dm_mod sha256_s390 genwqe_card sha1_s390 sha_common crc_itu_t scm_block eadm_sch vhost_net tun vhost macvtap macvlan kvm autofs4
[ 1707.904240] CPU: 4 PID: 25215 Comm: git Tainted: G        W       4.5.0-rc4-00037-g65c23c6-dirty #273
[ 1707.904242] task: 0000000006a60000 ti: 0000000063b04000 task.ti: 0000000063b04000
[ 1707.904244] Krnl PSW : 0704d00180000000 0000000008cf8002 (0x8cf8002)
[ 1707.904248]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3
Krnl GPRS: 0000000000000000 0000000008cf8000 0000000091cdb020 0000000091cdb020
[ 1707.904252]            00000000001b9964 0000000000000000 0000000000000000 000000000000000a
[ 1707.904254]            0000000000000000 0000000008cf8000 0000000000000004 00000000034d6802
[ 1707.904256]            00000000dec0f600 00000000007063d8 00000000001b99ae 00000000dcdd3d18
[ 1707.904263] Krnl Code: 0000000008cf7ff6: 5a5a5a5a            a       %r5,2650(%r10,%r5)
0000000008cf7ffa: 5a5a5a5a           a       %r5,2650(%r10,%r5)
#0000000008cf7ffe: 5a5a0000           a       %r5,0(%r10,%r0)
>0000000008cf8002: 0000               unknown
0000000008cf8004: 0000               unknown
0000000008cf8006: 0020               unknown
0000000008cf8008: 0000               unknown
0000000008cf800a: 0000               unknown
[ 1707.904277] Call Trace:
[ 1707.904279] ([<00000000001b9964>] rcu_process_callbacks+0x5ac/0x9e8)
[ 1707.904282]  [<000000000013d3ee>] __do_softirq+0x256/0x568
[ 1707.904284]  [<000000000013da3a>] irq_exit+0x7a/0xd8
[ 1707.904286]  [<000000000010c87e>] do_IRQ+0x86/0xc0
[ 1707.904289]  [<00000000006fa3f2>] ext_int_handler+0x11e/0x124
[ 1707.904291]  [<0000000000199bfe>] lock_release+0x5ce/0x670
[ 1707.904293] ([<0000000000199be0>] lock_release+0x5b0/0x670)
[ 1707.904295]  [<00000000002dffa2>] getname_flags+0x82/0x218
[ 1707.904297]  [<00000000002e04e8>] user_path_at_empty+0x40/0x68
[ 1707.904299]  [<00000000002d44a4>] vfs_fstatat+0x6c/0xc8
[ 1707.904301]  [<00000000002d4894>] SyS_newlstat+0x2c/0x48
[ 1707.904303]  [<00000000006f9cce>] system_call+0xd6/0x258
[ 1707.904305]  [<000003ffb45f1124>] 0x3ffb45f1124
[ 1707.904307] INFO: lockdep is turned off.
[ 1707.904308] Last Breaking-Event-Address:
[ 1707.904310]  [<00000000001b99ac>] rcu_process_callbacks+0x5f4/0x9e8
[ 1707.904314]
[ 1707.904315] Kernel panic - not syncing: Fatal exception in interrupt

[-- Attachment #2: Type: text/plain, Size: 51707 bytes --]

#
# Automatically generated file; DO NOT EDIT.
# Linux/s390 4.5.0-rc3 Kernel Configuration
#
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_ARCH_DMA_ADDR_T_64BIT=y
CONFIG_GENERIC_LOCKBREAK=y
CONFIG_PGSTE=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_KEXEC=y
CONFIG_AUDIT_ARCH=y
CONFIG_NO_IOPORT_MAP=y
# CONFIG_PCI_QUIRKS is not set
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_S390=y
CONFIG_SCHED_OMIT_FRAME_POINTER=y
CONFIG_PGTABLE_LEVELS=4
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_CROSS_MEMORY_ATTACH=y
CONFIG_FHANDLE=y
CONFIG_USELIB=y
CONFIG_AUDIT=y
CONFIG_HAVE_ARCH_AUDITSYSCALL=y
CONFIG_AUDITSYSCALL=y
CONFIG_AUDIT_WATCH=y
CONFIG_AUDIT_TREE=y

#
# IRQ subsystem
#
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y

#
# CPU/Task time and stats accounting
#
CONFIG_VIRT_CPU_ACCOUNTING=y
CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y

#
# RCU Subsystem
#
CONFIG_PREEMPT_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TASKS_RCU=y
CONFIG_RCU_STALL_COMMON=y
# CONFIG_TREE_RCU_TRACE is not set
# CONFIG_RCU_EXPEDITE_BOOT is not set
CONFIG_BUILD_BIN2C=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=17
CONFIG_LOG_CPU_MAX_BUF_SHIFT=12
CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
CONFIG_CGROUPS=y
# CONFIG_MEMCG is not set
CONFIG_BLK_CGROUP=y
# CONFIG_DEBUG_BLK_CGROUP is not set
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_CFS_BANDWIDTH=y
CONFIG_RT_GROUP_SCHED=y
# CONFIG_CGROUP_PIDS is not set
CONFIG_CGROUP_FREEZER=y
# CONFIG_CGROUP_HUGETLB is not set
CONFIG_CPUSETS=y
CONFIG_PROC_PID_CPUSET=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_CGROUP_PERF=y
# CONFIG_CGROUP_DEBUG is not set
# CONFIG_CHECKPOINT_RESTORE is not set
CONFIG_NAMESPACES=y
CONFIG_UTS_NS=y
CONFIG_IPC_NS=y
# CONFIG_USER_NS is not set
CONFIG_PID_NS=y
CONFIG_NET_NS=y
CONFIG_SCHED_AUTOGROUP=y
# CONFIG_SYSFS_DEPRECATED is not set
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_RD_GZIP=y
CONFIG_RD_BZIP2=y
CONFIG_RD_LZMA=y
CONFIG_RD_XZ=y
CONFIG_RD_LZO=y
CONFIG_RD_LZ4=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
CONFIG_ANON_INODES=y
CONFIG_SYSCTL_EXCEPTION_TRACE=y
CONFIG_BPF=y
# CONFIG_EXPERT is not set
CONFIG_MULTIUSER=y
# CONFIG_SGETMASK_SYSCALL is not set
CONFIG_SYSFS_SYSCALL=y
# CONFIG_SYSCTL_SYSCALL is not set
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_HAVE_FUTEX_CMPXCHG=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
# CONFIG_BPF_SYSCALL is not set
CONFIG_SHMEM=y
CONFIG_AIO=y
CONFIG_ADVISE_SYSCALLS=y
# CONFIG_USERFAULTFD is not set
CONFIG_MEMBARRIER=y
# CONFIG_EMBEDDED is not set
CONFIG_HAVE_PERF_EVENTS=y

#
# Kernel Performance Events And Counters
#
CONFIG_PERF_EVENTS=y
# CONFIG_DEBUG_PERF_USE_VMALLOC is not set
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLUB_DEBUG=y
# CONFIG_COMPAT_BRK is not set
# CONFIG_SLAB is not set
CONFIG_SLUB=y
CONFIG_SLUB_CPU_PARTIAL=y
# CONFIG_SYSTEM_DATA_VERIFICATION is not set
CONFIG_PROFILING=y
CONFIG_TRACEPOINTS=y
CONFIG_KEXEC_CORE=y
CONFIG_OPROFILE=m
CONFIG_HAVE_OPROFILE=y
CONFIG_KPROBES=y
CONFIG_JUMP_LABEL=y
# CONFIG_STATIC_KEYS_SELFTEST is not set
# CONFIG_UPROBES is not set
CONFIG_HAVE_64BIT_ALIGNED_ACCESS=y
CONFIG_ARCH_USE_BUILTIN_BSWAP=y
CONFIG_KRETPROBES=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_GENERIC_SMP_IDLE_THREAD=y
CONFIG_HAVE_REGS_AND_STACK_ACCESS_API=y
CONFIG_HAVE_DMA_API_DEBUG=y
CONFIG_HAVE_ARCH_JUMP_LABEL=y
CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG=y
CONFIG_HAVE_ALIGNED_STRUCT_PAGE=y
CONFIG_HAVE_CMPXCHG_LOCAL=y
CONFIG_HAVE_CMPXCHG_DOUBLE=y
CONFIG_ARCH_WANT_IPC_PARSE_VERSION=y
CONFIG_ARCH_WANT_COMPAT_IPC_PARSE_VERSION=y
CONFIG_ARCH_WANT_OLD_COMPAT_IPC=y
CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
CONFIG_SECCOMP_FILTER=y
# CONFIG_CC_STACKPROTECTOR is not set
CONFIG_HAVE_VIRT_CPU_ACCOUNTING=y
CONFIG_HAVE_VIRT_CPU_ACCOUNTING_GEN=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_HAVE_ARCH_SOFT_DIRTY=y
CONFIG_HAVE_MOD_ARCH_SPECIFIC=y
CONFIG_MODULES_USE_ELF_RELA=y
CONFIG_ARCH_HAS_ELF_RANDOMIZE=y
CONFIG_CLONE_BACKWARDS2=y
CONFIG_OLD_SIGSUSPEND3=y
CONFIG_OLD_SIGACTION=y
CONFIG_COMPAT_OLD_SIGACTION=y

#
# GCOV-based kernel profiling
#
# CONFIG_GCOV_KERNEL is not set
CONFIG_ARCH_HAS_GCOV_PROFILE_ALL=y
# CONFIG_HAVE_GENERIC_DMA_COHERENT is not set
CONFIG_SLABINFO=y
CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
CONFIG_MODULE_FORCE_LOAD=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
CONFIG_MODVERSIONS=y
CONFIG_MODULE_SRCVERSION_ALL=y
# CONFIG_MODULE_SIG is not set
CONFIG_MODULE_COMPRESS=y
# CONFIG_MODULE_COMPRESS_GZIP is not set
CONFIG_MODULE_COMPRESS_XZ=y
CONFIG_MODULES_TREE_LOOKUP=y
CONFIG_BLOCK=y
CONFIG_BLK_DEV_BSG=y
CONFIG_BLK_DEV_BSGLIB=y
CONFIG_BLK_DEV_INTEGRITY=y
CONFIG_BLK_DEV_THROTTLING=y
# CONFIG_BLK_CMDLINE_PARSER is not set

#
# Partition Types
#
CONFIG_PARTITION_ADVANCED=y
# CONFIG_ACORN_PARTITION is not set
# CONFIG_AIX_PARTITION is not set
# CONFIG_OSF_PARTITION is not set
# CONFIG_AMIGA_PARTITION is not set
# CONFIG_ATARI_PARTITION is not set
CONFIG_IBM_PARTITION=y
# CONFIG_MAC_PARTITION is not set
CONFIG_MSDOS_PARTITION=y
CONFIG_BSD_DISKLABEL=y
CONFIG_MINIX_SUBPARTITION=y
CONFIG_SOLARIS_X86_PARTITION=y
CONFIG_UNIXWARE_DISKLABEL=y
# CONFIG_LDM_PARTITION is not set
# CONFIG_SGI_PARTITION is not set
# CONFIG_ULTRIX_PARTITION is not set
# CONFIG_SUN_PARTITION is not set
# CONFIG_KARMA_PARTITION is not set
CONFIG_EFI_PARTITION=y
# CONFIG_SYSV68_PARTITION is not set
# CONFIG_CMDLINE_PARTITION is not set
CONFIG_BLOCK_COMPAT=y

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
CONFIG_CFQ_GROUP_IOSCHED=y
CONFIG_DEFAULT_DEADLINE=y
# CONFIG_DEFAULT_CFQ is not set
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="deadline"
CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_ASN1=m
CONFIG_ARCH_INLINE_SPIN_TRYLOCK=y
CONFIG_ARCH_INLINE_SPIN_TRYLOCK_BH=y
CONFIG_ARCH_INLINE_SPIN_LOCK=y
CONFIG_ARCH_INLINE_SPIN_LOCK_BH=y
CONFIG_ARCH_INLINE_SPIN_LOCK_IRQ=y
CONFIG_ARCH_INLINE_SPIN_LOCK_IRQSAVE=y
CONFIG_ARCH_INLINE_SPIN_UNLOCK=y
CONFIG_ARCH_INLINE_SPIN_UNLOCK_BH=y
CONFIG_ARCH_INLINE_SPIN_UNLOCK_IRQ=y
CONFIG_ARCH_INLINE_SPIN_UNLOCK_IRQRESTORE=y
CONFIG_ARCH_INLINE_READ_TRYLOCK=y
CONFIG_ARCH_INLINE_READ_LOCK=y
CONFIG_ARCH_INLINE_READ_LOCK_BH=y
CONFIG_ARCH_INLINE_READ_LOCK_IRQ=y
CONFIG_ARCH_INLINE_READ_LOCK_IRQSAVE=y
CONFIG_ARCH_INLINE_READ_UNLOCK=y
CONFIG_ARCH_INLINE_READ_UNLOCK_BH=y
CONFIG_ARCH_INLINE_READ_UNLOCK_IRQ=y
CONFIG_ARCH_INLINE_READ_UNLOCK_IRQRESTORE=y
CONFIG_ARCH_INLINE_WRITE_TRYLOCK=y
CONFIG_ARCH_INLINE_WRITE_LOCK=y
CONFIG_ARCH_INLINE_WRITE_LOCK_BH=y
CONFIG_ARCH_INLINE_WRITE_LOCK_IRQ=y
CONFIG_ARCH_INLINE_WRITE_LOCK_IRQSAVE=y
CONFIG_ARCH_INLINE_WRITE_UNLOCK=y
CONFIG_ARCH_INLINE_WRITE_UNLOCK_BH=y
CONFIG_ARCH_INLINE_WRITE_UNLOCK_IRQ=y
CONFIG_ARCH_INLINE_WRITE_UNLOCK_IRQRESTORE=y
CONFIG_UNINLINE_SPIN_UNLOCK=y
CONFIG_ARCH_SUPPORTS_ATOMIC_RMW=y
CONFIG_RWSEM_SPIN_ON_OWNER=y
CONFIG_LOCK_SPIN_ON_OWNER=y
CONFIG_FREEZER=y
CONFIG_HAVE_LIVEPATCH=y

#
# Processor type and features
#
CONFIG_HAVE_MARCH_Z900_FEATURES=y
CONFIG_HAVE_MARCH_Z990_FEATURES=y
CONFIG_HAVE_MARCH_Z9_109_FEATURES=y
CONFIG_HAVE_MARCH_Z10_FEATURES=y
CONFIG_HAVE_MARCH_Z196_FEATURES=y
# CONFIG_HAVE_MARCH_ZEC12_FEATURES is not set
# CONFIG_HAVE_MARCH_Z13_FEATURES is not set
# CONFIG_MARCH_Z900 is not set
# CONFIG_MARCH_Z990 is not set
# CONFIG_MARCH_Z9_109 is not set
# CONFIG_MARCH_Z10 is not set
CONFIG_MARCH_Z196=y
# CONFIG_MARCH_ZEC12 is not set
# CONFIG_MARCH_Z13 is not set
# CONFIG_MARCH_Z900_TUNE is not set
# CONFIG_MARCH_Z990_TUNE is not set
# CONFIG_MARCH_Z9_109_TUNE is not set
# CONFIG_MARCH_Z10_TUNE is not set
# CONFIG_MARCH_Z196_TUNE is not set
CONFIG_MARCH_ZEC12_TUNE=y
# CONFIG_MARCH_Z13_TUNE is not set
# CONFIG_TUNE_DEFAULT is not set
# CONFIG_TUNE_Z900 is not set
# CONFIG_TUNE_Z990 is not set
# CONFIG_TUNE_Z9_109 is not set
# CONFIG_TUNE_Z10 is not set
# CONFIG_TUNE_Z196 is not set
CONFIG_TUNE_ZEC12=y
# CONFIG_TUNE_Z13 is not set
CONFIG_64BIT=y
CONFIG_COMPAT=y
CONFIG_SYSVIPC_COMPAT=y
CONFIG_KEYS_COMPAT=y
CONFIG_SMP=y
CONFIG_NR_CPUS=256
CONFIG_HOTPLUG_CPU=y
# CONFIG_NODES_SPAN_OTHER_NODES is not set
# CONFIG_NUMA is not set
CONFIG_SCHED_SMT=y
CONFIG_SCHED_MC=y
CONFIG_SCHED_BOOK=y
CONFIG_SCHED_TOPOLOGY=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y
CONFIG_HZ_100=y
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=100
CONFIG_SCHED_HRTICK=y

#
# Memory setup
#
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SPARSEMEM_DEFAULT=y
CONFIG_ARCH_SELECT_MEMORY_MODEL=y
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE=y
CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK=y
CONFIG_FORCE_MAX_ZONEORDER=9
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_SPARSEMEM_MANUAL=y
CONFIG_SPARSEMEM=y
CONFIG_HAVE_MEMORY_PRESENT=y
CONFIG_SPARSEMEM_EXTREME=y
CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
CONFIG_SPARSEMEM_VMEMMAP=y
CONFIG_HAVE_MEMBLOCK=y
CONFIG_HAVE_MEMBLOCK_NODE_MAP=y
CONFIG_HAVE_MEMBLOCK_PHYS_MAP=y
CONFIG_NO_BOOTMEM=y
CONFIG_MEMORY_ISOLATION=y
# CONFIG_HAVE_BOOTMEM_INFO_NODE is not set
CONFIG_MEMORY_HOTPLUG=y
CONFIG_MEMORY_HOTPLUG_SPARSE=y
CONFIG_MEMORY_HOTREMOVE=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_MEMORY_BALLOON=y
CONFIG_BALLOON_COMPACTION=y
CONFIG_COMPACTION=y
CONFIG_MIGRATION=y
CONFIG_PHYS_ADDR_T_64BIT=y
CONFIG_ZONE_DMA_FLAG=1
CONFIG_BOUNCE=y
CONFIG_VIRT_TO_BUS=y
CONFIG_MMU_NOTIFIER=y
CONFIG_KSM=y
CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
# CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set
# CONFIG_CLEANCACHE is not set
# CONFIG_FRONTSWAP is not set
# CONFIG_CMA is not set
# CONFIG_ZPOOL is not set
# CONFIG_ZBUD is not set
# CONFIG_ZSMALLOC is not set
# CONFIG_IDLE_PAGE_TRACKING is not set
CONFIG_PACK_STACK=y
CONFIG_CHECK_STACK=y
CONFIG_STACK_GUARD=256
# CONFIG_WARN_DYNAMIC_STACK is not set

#
# I/O subsystem
#
CONFIG_QDIO=y
CONFIG_PCI=y
CONFIG_PCI_NR_FUNCTIONS=64
CONFIG_PCI_NR_MSI=256
CONFIG_PCI_BUS_ADDR_T_64BIT=y
CONFIG_PCI_MSI=y
CONFIG_PCI_DEBUG=y
# CONFIG_PCI_REALLOC_ENABLE_AUTO is not set
# CONFIG_PCI_STUB is not set
CONFIG_PCI_ATS=y
CONFIG_PCI_IOV=y
# CONFIG_PCI_PRI is not set
# CONFIG_PCI_PASID is not set

#
# PCI host controller drivers
#
# CONFIG_PCIEPORTBUS is not set
CONFIG_HOTPLUG_PCI=y
# CONFIG_HOTPLUG_PCI_CPCI is not set
# CONFIG_HOTPLUG_PCI_SHPC is not set
CONFIG_HOTPLUG_PCI_S390=y
CONFIG_PCI_DOMAINS=y
CONFIG_HAS_IOMEM=y
CONFIG_IOMMU_HELPER=y
CONFIG_HAS_DMA=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_CHSC_SCH=y
CONFIG_SCM_BUS=y
CONFIG_EADM_SCH=m

#
# Dump support
#
CONFIG_CRASH_DUMP=y

#
# Executable file formats / Emulations
#
CONFIG_BINFMT_ELF=y
CONFIG_COMPAT_BINFMT_ELF=y
# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
CONFIG_BINFMT_SCRIPT=y
# CONFIG_HAVE_AOUT is not set
CONFIG_BINFMT_MISC=m
CONFIG_COREDUMP=y
CONFIG_SECCOMP=y

#
# Power Management
#
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_HIBERNATE_CALLBACKS=y
CONFIG_HIBERNATION=y
CONFIG_ARCH_SAVE_PAGE_KEYS=y
CONFIG_PM_STD_PARTITION=""
CONFIG_PM_SLEEP=y
CONFIG_PM_SLEEP_SMP=y
# CONFIG_PM_AUTOSLEEP is not set
# CONFIG_PM_WAKELOCKS is not set
CONFIG_PM=y
# CONFIG_PM_DEBUG is not set
# CONFIG_WQ_POWER_EFFICIENT_DEFAULT is not set
CONFIG_NET=y
CONFIG_NET_INGRESS=y
CONFIG_NET_EGRESS=y

#
# Networking options
#
CONFIG_PACKET=y
CONFIG_PACKET_DIAG=m
CONFIG_UNIX=y
CONFIG_UNIX_DIAG=m
CONFIG_XFRM=y
CONFIG_XFRM_ALGO=m
CONFIG_XFRM_USER=m
# CONFIG_XFRM_SUB_POLICY is not set
# CONFIG_XFRM_MIGRATE is not set
# CONFIG_XFRM_STATISTICS is not set
CONFIG_XFRM_IPCOMP=m
CONFIG_NET_KEY=m
# CONFIG_NET_KEY_MIGRATE is not set
CONFIG_IUCV=y
CONFIG_AFIUCV=m
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_ADVANCED_ROUTER=y
# CONFIG_IP_FIB_TRIE_STATS is not set
CONFIG_IP_MULTIPLE_TABLES=y
CONFIG_IP_ROUTE_MULTIPATH=y
CONFIG_IP_ROUTE_VERBOSE=y
CONFIG_IP_ROUTE_CLASSID=y
# CONFIG_IP_PNP is not set
CONFIG_NET_IPIP=m
CONFIG_NET_IPGRE_DEMUX=m
CONFIG_NET_IP_TUNNEL=m
CONFIG_NET_IPGRE=m
CONFIG_NET_IPGRE_BROADCAST=y
CONFIG_IP_MROUTE=y
CONFIG_IP_MROUTE_MULTIPLE_TABLES=y
CONFIG_IP_PIMSM_V1=y
CONFIG_IP_PIMSM_V2=y
CONFIG_SYN_COOKIES=y
CONFIG_NET_IPVTI=m
CONFIG_NET_UDP_TUNNEL=m
# CONFIG_NET_FOU is not set
# CONFIG_NET_FOU_IP_TUNNELS is not set
CONFIG_INET_AH=m
CONFIG_INET_ESP=m
CONFIG_INET_IPCOMP=m
CONFIG_INET_XFRM_TUNNEL=m
CONFIG_INET_TUNNEL=m
CONFIG_INET_XFRM_MODE_TRANSPORT=m
CONFIG_INET_XFRM_MODE_TUNNEL=m
CONFIG_INET_XFRM_MODE_BEET=m
CONFIG_INET_LRO=y
CONFIG_INET_DIAG=m
CONFIG_INET_TCP_DIAG=m
CONFIG_INET_UDP_DIAG=m
# CONFIG_INET_DIAG_DESTROY is not set
CONFIG_TCP_CONG_ADVANCED=y
CONFIG_TCP_CONG_BIC=m
CONFIG_TCP_CONG_CUBIC=y
CONFIG_TCP_CONG_WESTWOOD=m
CONFIG_TCP_CONG_HTCP=m
CONFIG_TCP_CONG_HSTCP=m
CONFIG_TCP_CONG_HYBLA=m
CONFIG_TCP_CONG_VEGAS=m
CONFIG_TCP_CONG_SCALABLE=m
CONFIG_TCP_CONG_LP=m
CONFIG_TCP_CONG_VENO=m
CONFIG_TCP_CONG_YEAH=m
CONFIG_TCP_CONG_ILLINOIS=m
# CONFIG_TCP_CONG_DCTCP is not set
# CONFIG_TCP_CONG_CDG is not set
CONFIG_DEFAULT_CUBIC=y
# CONFIG_DEFAULT_RENO is not set
CONFIG_DEFAULT_TCP_CONG="cubic"
# CONFIG_TCP_MD5SIG is not set
CONFIG_IPV6=y
CONFIG_IPV6_ROUTER_PREF=y
# CONFIG_IPV6_ROUTE_INFO is not set
# CONFIG_IPV6_OPTIMISTIC_DAD is not set
CONFIG_INET6_AH=m
CONFIG_INET6_ESP=m
CONFIG_INET6_IPCOMP=m
CONFIG_IPV6_MIP6=m
CONFIG_INET6_XFRM_TUNNEL=m
CONFIG_INET6_TUNNEL=m
CONFIG_INET6_XFRM_MODE_TRANSPORT=m
CONFIG_INET6_XFRM_MODE_TUNNEL=m
CONFIG_INET6_XFRM_MODE_BEET=m
CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION=m
CONFIG_IPV6_VTI=m
CONFIG_IPV6_SIT=m
# CONFIG_IPV6_SIT_6RD is not set
CONFIG_IPV6_NDISC_NODETYPE=y
CONFIG_IPV6_TUNNEL=m
CONFIG_IPV6_GRE=m
CONFIG_IPV6_MULTIPLE_TABLES=y
CONFIG_IPV6_SUBTREES=y
# CONFIG_IPV6_MROUTE is not set
# CONFIG_NETLABEL is not set
CONFIG_NETWORK_SECMARK=y
CONFIG_NET_PTP_CLASSIFY=y
# CONFIG_NETWORK_PHY_TIMESTAMPING is not set
# CONFIG_NETFILTER is not set
# CONFIG_IP_DCCP is not set
CONFIG_IP_SCTP=m
CONFIG_NET_SCTPPROBE=m
# CONFIG_SCTP_DBG_OBJCNT is not set
CONFIG_SCTP_DEFAULT_COOKIE_HMAC_MD5=y
# CONFIG_SCTP_DEFAULT_COOKIE_HMAC_SHA1 is not set
# CONFIG_SCTP_DEFAULT_COOKIE_HMAC_NONE is not set
CONFIG_SCTP_COOKIE_HMAC_MD5=y
# CONFIG_SCTP_COOKIE_HMAC_SHA1 is not set
CONFIG_RDS=m
CONFIG_RDS_RDMA=m
CONFIG_RDS_TCP=m
CONFIG_RDS_DEBUG=y
# CONFIG_TIPC is not set
# CONFIG_ATM is not set
CONFIG_L2TP=m
CONFIG_L2TP_DEBUGFS=m
CONFIG_L2TP_V3=y
CONFIG_L2TP_IP=m
CONFIG_L2TP_ETH=m
CONFIG_STP=m
CONFIG_GARP=m
CONFIG_BRIDGE=m
CONFIG_BRIDGE_IGMP_SNOOPING=y
# CONFIG_BRIDGE_VLAN_FILTERING is not set
CONFIG_VLAN_8021Q=m
CONFIG_VLAN_8021Q_GVRP=y
# CONFIG_VLAN_8021Q_MVRP is not set
# CONFIG_DECNET is not set
CONFIG_LLC=m
# CONFIG_LLC2 is not set
# CONFIG_IPX is not set
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_PHONET is not set
# CONFIG_6LOWPAN is not set
# CONFIG_IEEE802154 is not set
CONFIG_NET_SCHED=y

#
# Queueing/Scheduling
#
CONFIG_NET_SCH_CBQ=m
CONFIG_NET_SCH_HTB=m
CONFIG_NET_SCH_HFSC=m
CONFIG_NET_SCH_PRIO=m
CONFIG_NET_SCH_MULTIQ=m
CONFIG_NET_SCH_RED=m
CONFIG_NET_SCH_SFB=m
CONFIG_NET_SCH_SFQ=m
CONFIG_NET_SCH_TEQL=m
CONFIG_NET_SCH_TBF=m
CONFIG_NET_SCH_GRED=m
CONFIG_NET_SCH_DSMARK=m
CONFIG_NET_SCH_NETEM=m
CONFIG_NET_SCH_DRR=m
CONFIG_NET_SCH_MQPRIO=m
CONFIG_NET_SCH_CHOKE=m
CONFIG_NET_SCH_QFQ=m
CONFIG_NET_SCH_CODEL=m
CONFIG_NET_SCH_FQ_CODEL=m
# CONFIG_NET_SCH_FQ is not set
# CONFIG_NET_SCH_HHF is not set
# CONFIG_NET_SCH_PIE is not set
CONFIG_NET_SCH_INGRESS=m
CONFIG_NET_SCH_PLUG=m

#
# Classification
#
CONFIG_NET_CLS=y
CONFIG_NET_CLS_BASIC=m
CONFIG_NET_CLS_TCINDEX=m
CONFIG_NET_CLS_ROUTE4=m
CONFIG_NET_CLS_FW=m
CONFIG_NET_CLS_U32=m
CONFIG_CLS_U32_PERF=y
CONFIG_CLS_U32_MARK=y
CONFIG_NET_CLS_RSVP=m
CONFIG_NET_CLS_RSVP6=m
CONFIG_NET_CLS_FLOW=m
CONFIG_NET_CLS_CGROUP=y
CONFIG_NET_CLS_BPF=m
# CONFIG_NET_CLS_FLOWER is not set
# CONFIG_NET_EMATCH is not set
CONFIG_NET_CLS_ACT=y
CONFIG_NET_ACT_POLICE=m
CONFIG_NET_ACT_GACT=m
CONFIG_GACT_PROB=y
CONFIG_NET_ACT_MIRRED=m
CONFIG_NET_ACT_NAT=m
CONFIG_NET_ACT_PEDIT=m
CONFIG_NET_ACT_SIMP=m
CONFIG_NET_ACT_SKBEDIT=m
CONFIG_NET_ACT_CSUM=m
# CONFIG_NET_ACT_VLAN is not set
# CONFIG_NET_ACT_BPF is not set
# CONFIG_NET_CLS_IND is not set
CONFIG_NET_SCH_FIFO=y
# CONFIG_DCB is not set
CONFIG_DNS_RESOLVER=y
# CONFIG_BATMAN_ADV is not set
# CONFIG_OPENVSWITCH is not set
# CONFIG_VSOCKETS is not set
# CONFIG_NETLINK_MMAP is not set
# CONFIG_NETLINK_DIAG is not set
# CONFIG_MPLS is not set
# CONFIG_HSR is not set
# CONFIG_NET_SWITCHDEV is not set
# CONFIG_NET_L3_MASTER_DEV is not set
CONFIG_RPS=y
CONFIG_RFS_ACCEL=y
CONFIG_XPS=y
CONFIG_SOCK_CGROUP_DATA=y
# CONFIG_CGROUP_NET_PRIO is not set
CONFIG_CGROUP_NET_CLASSID=y
CONFIG_NET_RX_BUSY_POLL=y
CONFIG_BQL=y
CONFIG_BPF_JIT=y
CONFIG_NET_FLOW_LIMIT=y

#
# Network testing
#
CONFIG_NET_PKTGEN=m
CONFIG_NET_TCPPROBE=m
# CONFIG_NET_DROP_MONITOR is not set
# CONFIG_CAN is not set
# CONFIG_AF_RXRPC is not set
CONFIG_FIB_RULES=y
# CONFIG_WIMAX is not set
# CONFIG_RFKILL is not set
# CONFIG_NET_9P is not set
# CONFIG_CAIF is not set
# CONFIG_CEPH_LIB is not set
# CONFIG_NFC is not set
# CONFIG_LWTUNNEL is not set
CONFIG_HAVE_BPF_JIT=y
# CONFIG_PCMCIA is not set
CONFIG_CCW=y

#
# Device Drivers
#

#
# Generic Driver Options
#
CONFIG_UEVENT_HELPER=y
CONFIG_UEVENT_HELPER_PATH=""
CONFIG_DEVTMPFS=y
# CONFIG_DEVTMPFS_MOUNT is not set
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_FW_LOADER=y
CONFIG_FIRMWARE_IN_KERNEL=y
CONFIG_EXTRA_FIRMWARE=""
# CONFIG_FW_LOADER_USER_HELPER_FALLBACK is not set
CONFIG_ALLOW_DEV_COREDUMP=y
# CONFIG_DEBUG_DRIVER is not set
# CONFIG_DEBUG_DEVRES is not set
CONFIG_SYS_HYPERVISOR=y
# CONFIG_GENERIC_CPU_DEVICES is not set
CONFIG_GENERIC_CPU_AUTOPROBE=y
# CONFIG_DMA_SHARED_BUFFER is not set

#
# Bus devices
#
CONFIG_CONNECTOR=y
CONFIG_PROC_EVENTS=y
# CONFIG_MTD is not set
# CONFIG_OF is not set
# CONFIG_PARPORT is not set
CONFIG_BLK_DEV=y
# CONFIG_BLK_DEV_NULL_BLK is not set
# CONFIG_BLK_DEV_PCIESSD_MTIP32XX is not set
# CONFIG_BLK_CPQ_CISS_DA is not set
# CONFIG_BLK_DEV_DAC960 is not set
# CONFIG_BLK_DEV_UMEM is not set
# CONFIG_BLK_DEV_COW_COMMON is not set
CONFIG_BLK_DEV_LOOP=m
CONFIG_BLK_DEV_LOOP_MIN_COUNT=8
CONFIG_BLK_DEV_CRYPTOLOOP=m
# CONFIG_BLK_DEV_DRBD is not set
# CONFIG_BLK_DEV_NBD is not set
# CONFIG_BLK_DEV_SKD is not set
CONFIG_BLK_DEV_OSD=m
# CONFIG_BLK_DEV_SX8 is not set
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_BLK_DEV_RAM_SIZE=32768
# CONFIG_CDROM_PKTCDVD is not set
CONFIG_ATA_OVER_ETH=m

#
# S/390 block device drivers
#
CONFIG_BLK_DEV_XPRAM=m
CONFIG_DCSSBLK=m
CONFIG_DASD=y
CONFIG_DASD_PROFILE=y
CONFIG_DASD_ECKD=y
CONFIG_DASD_FBA=y
CONFIG_DASD_DIAG=y
CONFIG_DASD_EER=y
CONFIG_SCM_BLOCK=m
CONFIG_SCM_BLOCK_CLUSTER_WRITE=y
CONFIG_VIRTIO_BLK=y
# CONFIG_BLK_DEV_RBD is not set
# CONFIG_BLK_DEV_RSXX is not set
# CONFIG_BLK_DEV_NVME is not set

#
# Misc devices
#
# CONFIG_SENSORS_LIS3LV02D is not set
# CONFIG_DUMMY_IRQ is not set
# CONFIG_PHANTOM is not set
# CONFIG_SGI_IOC4 is not set
# CONFIG_TIFM_CORE is not set
# CONFIG_ENCLOSURE_SERVICES is not set
# CONFIG_HP_ILO is not set
# CONFIG_SRAM is not set
# CONFIG_C2PORT is not set

#
# EEPROM support
#
# CONFIG_EEPROM_93CX6 is not set
# CONFIG_CB710_CORE is not set

#
# Texas Instruments shared transport line discipline
#

#
# Altera FPGA firmware download module
#

#
# Intel MIC Bus Driver
#

#
# SCIF Bus Driver
#

#
# Intel MIC Host Driver
#

#
# Intel MIC Card Driver
#

#
# SCIF Driver
#

#
# Intel MIC Coprocessor State Management (COSM) Drivers
#
CONFIG_GENWQE=m
CONFIG_GENWQE_PLATFORM_ERROR_RECOVERY=0
# CONFIG_ECHO is not set
# CONFIG_CXL_BASE is not set
# CONFIG_CXL_KERNEL_API is not set
# CONFIG_CXL_EEH is not set

#
# SCSI device support
#
CONFIG_SCSI_MOD=y
CONFIG_RAID_ATTRS=m
CONFIG_SCSI=y
CONFIG_SCSI_DMA=y
CONFIG_SCSI_NETLINK=y
CONFIG_SCSI_MQ_DEFAULT=y
CONFIG_SCSI_PROC_FS=y

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=y
CONFIG_CHR_DEV_ST=m
CONFIG_CHR_DEV_OSST=m
CONFIG_BLK_DEV_SR=m
# CONFIG_BLK_DEV_SR_VENDOR is not set
CONFIG_CHR_DEV_SG=y
CONFIG_CHR_DEV_SCH=m
CONFIG_SCSI_CONSTANTS=y
CONFIG_SCSI_LOGGING=y
# CONFIG_SCSI_SCAN_ASYNC is not set

#
# SCSI Transports
#
CONFIG_SCSI_SPI_ATTRS=m
CONFIG_SCSI_FC_ATTRS=y
CONFIG_SCSI_ISCSI_ATTRS=m
CONFIG_SCSI_SAS_ATTRS=m
CONFIG_SCSI_SAS_LIBSAS=m
CONFIG_SCSI_SAS_HOST_SMP=y
CONFIG_SCSI_SRP_ATTRS=m
CONFIG_SCSI_LOWLEVEL=y
CONFIG_ISCSI_TCP=m
# CONFIG_ISCSI_BOOT_SYSFS is not set
# CONFIG_SCSI_CXGB3_ISCSI is not set
# CONFIG_SCSI_CXGB4_ISCSI is not set
# CONFIG_SCSI_BNX2_ISCSI is not set
# CONFIG_SCSI_BNX2X_FCOE is not set
# CONFIG_BE2ISCSI is not set
# CONFIG_BLK_DEV_3W_XXXX_RAID is not set
# CONFIG_SCSI_HPSA is not set
# CONFIG_SCSI_3W_9XXX is not set
# CONFIG_SCSI_3W_SAS is not set
# CONFIG_SCSI_ACARD is not set
# CONFIG_SCSI_AACRAID is not set
# CONFIG_SCSI_AIC7XXX is not set
# CONFIG_SCSI_AIC79XX is not set
# CONFIG_SCSI_AIC94XX is not set
# CONFIG_SCSI_MVSAS is not set
# CONFIG_SCSI_MVUMI is not set
# CONFIG_SCSI_DPT_I2O is not set
# CONFIG_SCSI_ADVANSYS is not set
# CONFIG_SCSI_ARCMSR is not set
# CONFIG_SCSI_ESAS2R is not set
# CONFIG_MEGARAID_NEWGEN is not set
# CONFIG_MEGARAID_LEGACY is not set
# CONFIG_MEGARAID_SAS is not set
# CONFIG_SCSI_MPT3SAS is not set
# CONFIG_SCSI_MPT2SAS is not set
# CONFIG_SCSI_UFSHCD is not set
# CONFIG_SCSI_HPTIOP is not set
CONFIG_LIBFC=m
CONFIG_LIBFCOE=m
# CONFIG_FCOE is not set
# CONFIG_SCSI_SNIC is not set
# CONFIG_SCSI_DMX3191D is not set
# CONFIG_SCSI_FUTURE_DOMAIN is not set
# CONFIG_SCSI_IPS is not set
# CONFIG_SCSI_INITIO is not set
# CONFIG_SCSI_INIA100 is not set
# CONFIG_SCSI_STEX is not set
# CONFIG_SCSI_SYM53C8XX_2 is not set
# CONFIG_SCSI_QLOGIC_1280 is not set
# CONFIG_SCSI_QLA_FC is not set
# CONFIG_SCSI_QLA_ISCSI is not set
# CONFIG_SCSI_LPFC is not set
# CONFIG_SCSI_DC395x is not set
# CONFIG_SCSI_AM53C974 is not set
# CONFIG_SCSI_WD719X is not set
CONFIG_SCSI_DEBUG=m
CONFIG_ZFCP=y
# CONFIG_SCSI_PMCRAID is not set
# CONFIG_SCSI_PM8001 is not set
# CONFIG_SCSI_BFA_FC is not set
CONFIG_SCSI_VIRTIO=m
# CONFIG_SCSI_CHELSIO_FCOE is not set
# CONFIG_SCSI_DH is not set
CONFIG_SCSI_OSD_INITIATOR=m
CONFIG_SCSI_OSD_ULD=m
CONFIG_SCSI_OSD_DPRINT_SENSE=1
# CONFIG_SCSI_OSD_DEBUG is not set
CONFIG_MD=y
# CONFIG_BLK_DEV_MD is not set
# CONFIG_BCACHE is not set
CONFIG_BLK_DEV_DM_BUILTIN=y
CONFIG_BLK_DEV_DM=m
# CONFIG_DM_MQ_DEFAULT is not set
# CONFIG_DM_DEBUG is not set
# CONFIG_DM_CRYPT is not set
# CONFIG_DM_SNAPSHOT is not set
# CONFIG_DM_THIN_PROVISIONING is not set
# CONFIG_DM_CACHE is not set
# CONFIG_DM_ERA is not set
CONFIG_DM_MIRROR=m
# CONFIG_DM_LOG_USERSPACE is not set
# CONFIG_DM_RAID is not set
# CONFIG_DM_ZERO is not set
# CONFIG_DM_MULTIPATH is not set
# CONFIG_DM_DELAY is not set
# CONFIG_DM_UEVENT is not set
# CONFIG_DM_FLAKEY is not set
# CONFIG_DM_VERITY is not set
# CONFIG_DM_SWITCH is not set
# CONFIG_DM_LOG_WRITES is not set
# CONFIG_TARGET_CORE is not set
# CONFIG_FUSION is not set

#
# IEEE 1394 (FireWire) support
#
# CONFIG_FIREWIRE is not set
# CONFIG_FIREWIRE_NOSY is not set
CONFIG_NETDEVICES=y
CONFIG_NET_CORE=y
CONFIG_BONDING=m
CONFIG_DUMMY=m
CONFIG_EQUALIZER=m
# CONFIG_NET_FC is not set
CONFIG_IFB=m
# CONFIG_NET_TEAM is not set
CONFIG_MACVLAN=m
CONFIG_MACVTAP=m
# CONFIG_IPVLAN is not set
CONFIG_VXLAN=m
# CONFIG_GENEVE is not set
# CONFIG_NETCONSOLE is not set
# CONFIG_NETPOLL is not set
# CONFIG_NET_POLL_CONTROLLER is not set
CONFIG_TUN=m
# CONFIG_TUN_VNET_CROSS_LE is not set
CONFIG_VETH=m
CONFIG_VIRTIO_NET=m
CONFIG_NLMON=m
# CONFIG_ARCNET is not set

#
# CAIF transport drivers
#
CONFIG_VHOST_NET=m
CONFIG_VHOST_RING=m
CONFIG_VHOST=m
# CONFIG_VHOST_CROSS_ENDIAN_LEGACY is not set
CONFIG_ETHERNET=y
# CONFIG_NET_VENDOR_3COM is not set
# CONFIG_NET_VENDOR_ADAPTEC is not set
# CONFIG_NET_VENDOR_AGERE is not set
# CONFIG_NET_VENDOR_ALTEON is not set
# CONFIG_ALTERA_TSE is not set
# CONFIG_NET_VENDOR_AMD is not set
# CONFIG_NET_VENDOR_ARC is not set
# CONFIG_NET_VENDOR_ATHEROS is not set
# CONFIG_NET_VENDOR_AURORA is not set
# CONFIG_NET_CADENCE is not set
# CONFIG_NET_VENDOR_BROADCOM is not set
# CONFIG_NET_VENDOR_BROCADE is not set
# CONFIG_NET_VENDOR_CAVIUM is not set
# CONFIG_NET_VENDOR_CHELSIO is not set
# CONFIG_NET_VENDOR_CISCO is not set
# CONFIG_DNET is not set
# CONFIG_NET_VENDOR_DEC is not set
# CONFIG_NET_VENDOR_DLINK is not set
CONFIG_NET_VENDOR_EMULEX=y
# CONFIG_BE2NET is not set
# CONFIG_NET_VENDOR_EZCHIP is not set
# CONFIG_NET_VENDOR_EXAR is not set
# CONFIG_NET_VENDOR_HP is not set
# CONFIG_NET_VENDOR_INTEL is not set
# CONFIG_JME is not set
# CONFIG_NET_VENDOR_MARVELL is not set
CONFIG_NET_VENDOR_MELLANOX=y
CONFIG_MLX4_EN=m
CONFIG_MLX4_EN_VXLAN=y
CONFIG_MLX4_CORE=m
CONFIG_MLX4_DEBUG=y
CONFIG_MLX5_CORE=m
CONFIG_MLX5_CORE_EN=y
CONFIG_MLXSW_CORE=m
CONFIG_MLXSW_PCI=m
# CONFIG_NET_VENDOR_MICREL is not set
# CONFIG_NET_VENDOR_MYRI is not set
# CONFIG_FEALNX is not set
# CONFIG_NET_VENDOR_NATSEMI is not set
CONFIG_NET_VENDOR_NETRONOME=y
# CONFIG_NFP_NETVF is not set
# CONFIG_NET_VENDOR_NVIDIA is not set
# CONFIG_NET_VENDOR_OKI is not set
# CONFIG_ETHOC is not set
# CONFIG_NET_PACKET_ENGINE is not set
# CONFIG_NET_VENDOR_QLOGIC is not set
# CONFIG_NET_VENDOR_QUALCOMM is not set
# CONFIG_NET_VENDOR_REALTEK is not set
# CONFIG_NET_VENDOR_RENESAS is not set
# CONFIG_NET_VENDOR_RDC is not set
# CONFIG_NET_VENDOR_ROCKER is not set
# CONFIG_NET_VENDOR_SAMSUNG is not set
# CONFIG_NET_VENDOR_SEEQ is not set
# CONFIG_NET_VENDOR_SILAN is not set
# CONFIG_NET_VENDOR_SIS is not set
# CONFIG_SFC is not set
# CONFIG_NET_VENDOR_SMSC is not set
# CONFIG_NET_VENDOR_STMICRO is not set
# CONFIG_NET_VENDOR_SUN is not set
# CONFIG_NET_VENDOR_SYNOPSYS is not set
# CONFIG_NET_VENDOR_TEHUTI is not set
# CONFIG_NET_VENDOR_TI is not set
# CONFIG_NET_VENDOR_VIA is not set
# CONFIG_NET_VENDOR_WIZNET is not set
# CONFIG_FDDI is not set
# CONFIG_HIPPI is not set
# CONFIG_PHYLIB is not set
# CONFIG_PPP is not set
# CONFIG_SLIP is not set

#
# S/390 network device drivers
#
CONFIG_LCS=m
CONFIG_CTCM=m
CONFIG_NETIUCV=m
CONFIG_SMSGIUCV=m
CONFIG_SMSGIUCV_EVENT=m
CONFIG_QETH=y
CONFIG_QETH_L2=y
CONFIG_QETH_L3=y
CONFIG_QETH_IPV6=y
CONFIG_CCWGROUP=y

#
# Host-side USB support is needed for USB Network Adapter support
#

#
# Enable WiMAX (Networking options) to see the WiMAX drivers
#
# CONFIG_WAN is not set
# CONFIG_VMXNET3 is not set
# CONFIG_NVM is not set

#
# Input device support
#
CONFIG_INPUT=y
# CONFIG_INPUT_FF_MEMLESS is not set
# CONFIG_INPUT_POLLDEV is not set
# CONFIG_INPUT_SPARSEKMAP is not set
# CONFIG_INPUT_MATRIXKMAP is not set

#
# Userland interfaces
#
CONFIG_INPUT_MOUSEDEV=y
# CONFIG_INPUT_MOUSEDEV_PSAUX is not set
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
# CONFIG_INPUT_JOYDEV is not set
# CONFIG_INPUT_EVDEV is not set
# CONFIG_INPUT_EVBUG is not set

#
# Input Device Drivers
#
# CONFIG_INPUT_KEYBOARD is not set
# CONFIG_INPUT_MOUSE is not set
# CONFIG_INPUT_JOYSTICK is not set
# CONFIG_INPUT_TABLET is not set
# CONFIG_INPUT_TOUCHSCREEN is not set
# CONFIG_INPUT_MISC is not set

#
# Hardware I/O ports
#
# CONFIG_SERIO is not set
# CONFIG_GAMEPORT is not set

#
# Character devices
#
CONFIG_TTY=y
CONFIG_UNIX98_PTYS=y
CONFIG_DEVPTS_MULTIPLE_INSTANCES=y
CONFIG_LEGACY_PTYS=y
CONFIG_LEGACY_PTY_COUNT=0
# CONFIG_SERIAL_NONSTANDARD is not set
# CONFIG_NOZOMI is not set
# CONFIG_N_GSM is not set
# CONFIG_TRACE_SINK is not set
CONFIG_DEVMEM=y
CONFIG_DEVKMEM=y

#
# Serial drivers
#
# CONFIG_SERIAL_8250 is not set

#
# Non-8250 serial port support
#
# CONFIG_SERIAL_UARTLITE is not set
# CONFIG_SERIAL_JSM is not set
# CONFIG_SERIAL_SCCNXP is not set
# CONFIG_SERIAL_ALTERA_JTAGUART is not set
# CONFIG_SERIAL_ALTERA_UART is not set
# CONFIG_SERIAL_ARC is not set
# CONFIG_SERIAL_RP2 is not set
# CONFIG_SERIAL_FSL_LPUART is not set
CONFIG_HVC_DRIVER=y
CONFIG_HVC_IUCV=y
CONFIG_VIRTIO_CONSOLE=y
# CONFIG_IPMI_HANDLER is not set
CONFIG_HW_RANDOM=m
# CONFIG_HW_RANDOM_TIMERIOMEM is not set
CONFIG_HW_RANDOM_VIRTIO=m
CONFIG_HW_RANDOM_TPM=m
# CONFIG_R3964 is not set
# CONFIG_APPLICOM is not set
CONFIG_RAW_DRIVER=m
CONFIG_MAX_RAW_DEVS=256
CONFIG_HANGCHECK_TIMER=m
CONFIG_TCG_TPM=y
CONFIG_DEVPORT=y

#
# S/390 character device drivers
#
CONFIG_TN3270=y
CONFIG_TN3270_TTY=y
CONFIG_TN3270_FS=y
CONFIG_TN3270_CONSOLE=y
CONFIG_TN3215=y
CONFIG_TN3215_CONSOLE=y
CONFIG_CCW_CONSOLE=y
CONFIG_SCLP_TTY=y
CONFIG_SCLP_CONSOLE=y
CONFIG_SCLP_VT220_TTY=y
CONFIG_SCLP_VT220_CONSOLE=y
CONFIG_SCLP_ASYNC=m
CONFIG_SCLP_ASYNC_ID="000000000"
CONFIG_HMC_DRV=m
# CONFIG_SCLP_OFB is not set
CONFIG_S390_TAPE=m

#
# S/390 tape hardware support
#
CONFIG_S390_TAPE_34XX=m
CONFIG_S390_TAPE_3590=m
CONFIG_VMLOGRDR=m
CONFIG_VMCP=y
CONFIG_MONREADER=m
CONFIG_MONWRITER=m
CONFIG_S390_VMUR=m
# CONFIG_XILLYBUS is not set

#
# I2C support
#
# CONFIG_I2C is not set
# CONFIG_SPI is not set
# CONFIG_SPMI is not set
# CONFIG_HSI is not set

#
# PPS support
#
CONFIG_PPS=m
# CONFIG_PPS_DEBUG is not set

#
# PPS clients support
#
# CONFIG_PPS_CLIENT_KTIMER is not set
# CONFIG_PPS_CLIENT_LDISC is not set
# CONFIG_PPS_CLIENT_GPIO is not set

#
# PPS generators support
#

#
# PTP clock support
#
CONFIG_PTP_1588_CLOCK=m

#
# Enable PHYLIB and NETWORK_PHY_TIMESTAMPING to see the additional clocks.
#
# CONFIG_W1 is not set
# CONFIG_POWER_SUPPLY is not set
# CONFIG_POWER_RESET is not set
# CONFIG_POWER_AVS is not set
# CONFIG_HWMON is not set
# CONFIG_THERMAL is not set
CONFIG_WATCHDOG=y
CONFIG_WATCHDOG_CORE=y
CONFIG_WATCHDOG_NOWAYOUT=y
# CONFIG_WATCHDOG_SYSFS is not set

#
# Watchdog Device Drivers
#
# CONFIG_SOFT_WATCHDOG is not set
# CONFIG_XILINX_WATCHDOG is not set
# CONFIG_CADENCE_WATCHDOG is not set
# CONFIG_DW_WATCHDOG is not set
# CONFIG_MAX63XX_WATCHDOG is not set
# CONFIG_ALIM7101_WDT is not set
# CONFIG_I6300ESB_WDT is not set
# CONFIG_BCM7038_WDT is not set
CONFIG_DIAG288_WATCHDOG=m

#
# PCI-based Watchdog Cards
#
# CONFIG_PCIPCWATCHDOG is not set
# CONFIG_WDTPCI is not set
CONFIG_SSB_POSSIBLE=y

#
# Sonics Silicon Backplane
#
# CONFIG_SSB is not set
CONFIG_BCMA_POSSIBLE=y

#
# Broadcom specific AMBA
#
# CONFIG_BCMA is not set

#
# Multifunction device drivers
#
# CONFIG_MFD_CORE is not set
# CONFIG_HTC_PASIC3 is not set
# CONFIG_LPC_ICH is not set
# CONFIG_LPC_SCH is not set
# CONFIG_MFD_JANZ_CMODIO is not set
# CONFIG_MFD_KEMPLD is not set
# CONFIG_MFD_MT6397 is not set
# CONFIG_MFD_RDC321X is not set
# CONFIG_MFD_RTSX_PCI is not set
# CONFIG_MFD_SM501 is not set
# CONFIG_ABX500_CORE is not set
# CONFIG_MFD_SYSCON is not set
# CONFIG_MFD_TI_AM335X_TSCADC is not set
# CONFIG_MFD_TMIO is not set
# CONFIG_MFD_VX855 is not set
# CONFIG_REGULATOR is not set
# CONFIG_MEDIA_SUPPORT is not set

#
# Graphics support
#
# CONFIG_DRM is not set

#
# Frame buffer Devices
#
# CONFIG_FB is not set
# CONFIG_BACKLIGHT_LCD_SUPPORT is not set
# CONFIG_VGASTATE is not set
# CONFIG_SOUND is not set

#
# HID support
#
# CONFIG_HID is not set
CONFIG_USB_OHCI_LITTLE_ENDIAN=y
# CONFIG_USB_SUPPORT is not set
# CONFIG_UWB is not set
# CONFIG_MMC is not set
# CONFIG_MEMSTICK is not set
# CONFIG_NEW_LEDS is not set
# CONFIG_ACCESSIBILITY is not set
CONFIG_INFINIBAND=m
# CONFIG_INFINIBAND_USER_MAD is not set
CONFIG_INFINIBAND_USER_ACCESS=m
CONFIG_INFINIBAND_USER_MEM=y
CONFIG_INFINIBAND_ON_DEMAND_PAGING=y
CONFIG_INFINIBAND_ADDR_TRANS=y
CONFIG_INFINIBAND_ADDR_TRANS_CONFIGFS=y
# CONFIG_INFINIBAND_MTHCA is not set
# CONFIG_INFINIBAND_QIB is not set
CONFIG_MLX4_INFINIBAND=m
# CONFIG_MLX5_INFINIBAND is not set
# CONFIG_INFINIBAND_NES is not set
# CONFIG_INFINIBAND_OCRDMA is not set
# CONFIG_INFINIBAND_IPOIB is not set
# CONFIG_INFINIBAND_SRP is not set
# CONFIG_INFINIBAND_ISER is not set
# CONFIG_DMADEVICES is not set
# CONFIG_AUXDISPLAY is not set
# CONFIG_UIO is not set
CONFIG_VFIO_IOMMU_TYPE1=m
CONFIG_VFIO_VIRQFD=m
CONFIG_VFIO=m
# CONFIG_VFIO_NOIOMMU is not set
CONFIG_VFIO_PCI=m
CONFIG_IRQ_BYPASS_MANAGER=m
# CONFIG_VIRT_DRIVERS is not set
CONFIG_VIRTIO=y

#
# Virtio drivers
#
CONFIG_VIRTIO_PCI=m
CONFIG_VIRTIO_PCI_LEGACY=y
CONFIG_VIRTIO_BALLOON=m
# CONFIG_VIRTIO_INPUT is not set
# CONFIG_VIRTIO_MMIO is not set

#
# Microsoft Hyper-V guest support
#
# CONFIG_STAGING is not set

#
# Hardware Spinlock drivers
#

#
# Clock Source drivers
#
# CONFIG_ATMEL_PIT is not set
# CONFIG_SH_TIMER_CMT is not set
# CONFIG_SH_TIMER_MTU2 is not set
# CONFIG_SH_TIMER_TMU is not set
# CONFIG_EM_TIMER_STI is not set
# CONFIG_MAILBOX is not set
CONFIG_IOMMU_API=y
CONFIG_IOMMU_SUPPORT=y

#
# Generic IOMMU Pagetable Support
#
CONFIG_S390_IOMMU=y

#
# Remoteproc drivers
#
# CONFIG_STE_MODEM_RPROC is not set

#
# Rpmsg drivers
#

#
# SOC (System On Chip) specific Drivers
#
# CONFIG_SUNXI_SRAM is not set
# CONFIG_SOC_TI is not set
# CONFIG_PM_DEVFREQ is not set
# CONFIG_EXTCON is not set
# CONFIG_MEMORY is not set
# CONFIG_IIO is not set
# CONFIG_NTB is not set
# CONFIG_VME_BUS is not set
# CONFIG_PWM is not set
CONFIG_ARM_GIC_MAX_NR=1
# CONFIG_TS4800_IRQ is not set
# CONFIG_IPACK_BUS is not set
# CONFIG_RESET_CONTROLLER is not set
# CONFIG_FMC is not set

#
# PHY Subsystem
#
# CONFIG_GENERIC_PHY is not set
# CONFIG_PHY_PXA_28NM_HSIC is not set
# CONFIG_PHY_PXA_28NM_USB2 is not set
# CONFIG_BCM_KONA_USB2_PHY is not set
# CONFIG_PHY_HI6220_USB is not set
# CONFIG_POWERCAP is not set
# CONFIG_MCB is not set

#
# Performance monitor support
#
# CONFIG_RAS is not set
# CONFIG_THUNDERBOLT is not set

#
# Android
#
# CONFIG_ANDROID is not set
# CONFIG_LIBNVDIMM is not set
# CONFIG_NVMEM is not set
# CONFIG_STM is not set
# CONFIG_STM_DUMMY is not set
# CONFIG_STM_SOURCE_CONSOLE is not set
# CONFIG_INTEL_TH is not set

#
# FPGA Configuration Support
#
# CONFIG_FPGA is not set

#
# File systems
#
# CONFIG_EXT2_FS is not set
# CONFIG_EXT3_FS is not set
CONFIG_EXT4_FS=y
CONFIG_EXT4_USE_FOR_EXT2=y
CONFIG_EXT4_FS_POSIX_ACL=y
CONFIG_EXT4_FS_SECURITY=y
# CONFIG_EXT4_ENCRYPTION is not set
# CONFIG_EXT4_DEBUG is not set
CONFIG_JBD2=y
CONFIG_JBD2_DEBUG=y
CONFIG_FS_MBCACHE=y
# CONFIG_REISERFS_FS is not set
# CONFIG_JFS_FS is not set
# CONFIG_XFS_FS is not set
# CONFIG_GFS2_FS is not set
# CONFIG_OCFS2_FS is not set
CONFIG_BTRFS_FS=m
CONFIG_BTRFS_FS_POSIX_ACL=y
# CONFIG_BTRFS_FS_CHECK_INTEGRITY is not set
# CONFIG_BTRFS_FS_RUN_SANITY_TESTS is not set
# CONFIG_BTRFS_DEBUG is not set
# CONFIG_BTRFS_ASSERT is not set
# CONFIG_NILFS2_FS is not set
# CONFIG_F2FS_FS is not set
# CONFIG_FS_DAX is not set
CONFIG_FS_POSIX_ACL=y
CONFIG_EXPORTFS=y
CONFIG_FILE_LOCKING=y
# CONFIG_MANDATORY_FILE_LOCKING is not set
CONFIG_FSNOTIFY=y
CONFIG_DNOTIFY=y
CONFIG_INOTIFY_USER=y
CONFIG_FANOTIFY=y
# CONFIG_FANOTIFY_ACCESS_PERMISSIONS is not set
CONFIG_QUOTA=y
CONFIG_QUOTA_NETLINK_INTERFACE=y
CONFIG_PRINT_QUOTA_WARNING=y
# CONFIG_QUOTA_DEBUG is not set
CONFIG_QUOTA_TREE=m
CONFIG_QFMT_V1=m
CONFIG_QFMT_V2=m
CONFIG_QUOTACTL=y
CONFIG_AUTOFS4_FS=m
# CONFIG_FUSE_FS is not set
# CONFIG_OVERLAY_FS is not set

#
# Caches
#
CONFIG_FSCACHE=m
# CONFIG_FSCACHE_STATS is not set
# CONFIG_FSCACHE_HISTOGRAM is not set
# CONFIG_FSCACHE_DEBUG is not set
# CONFIG_FSCACHE_OBJECT_LIST is not set
CONFIG_CACHEFILES=m
# CONFIG_CACHEFILES_DEBUG is not set
# CONFIG_CACHEFILES_HISTOGRAM is not set

#
# CD-ROM/DVD Filesystems
#
# CONFIG_ISO9660_FS is not set
# CONFIG_UDF_FS is not set

#
# DOS/FAT/NT Filesystems
#
# CONFIG_MSDOS_FS is not set
# CONFIG_VFAT_FS is not set
# CONFIG_NTFS_FS is not set

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_PROC_VMCORE=y
CONFIG_PROC_SYSCTL=y
CONFIG_PROC_PAGE_MONITOR=y
# CONFIG_PROC_CHILDREN is not set
CONFIG_KERNFS=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y
CONFIG_TMPFS_XATTR=y
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y
CONFIG_CONFIGFS_FS=m
CONFIG_MISC_FILESYSTEMS=y
# CONFIG_ADFS_FS is not set
# CONFIG_AFFS_FS is not set
# CONFIG_ECRYPT_FS is not set
# CONFIG_HFS_FS is not set
# CONFIG_HFSPLUS_FS is not set
# CONFIG_BEFS_FS is not set
# CONFIG_BFS_FS is not set
# CONFIG_EFS_FS is not set
# CONFIG_LOGFS is not set
# CONFIG_CRAMFS is not set
# CONFIG_SQUASHFS is not set
# CONFIG_VXFS_FS is not set
# CONFIG_MINIX_FS is not set
# CONFIG_MINIX_FS_NATIVE_ENDIAN is not set
# CONFIG_OMFS_FS is not set
# CONFIG_HPFS_FS is not set
# CONFIG_QNX4FS_FS is not set
# CONFIG_QNX6FS_FS is not set
# CONFIG_ROMFS_FS is not set
# CONFIG_PSTORE is not set
# CONFIG_SYSV_FS is not set
# CONFIG_UFS_FS is not set
# CONFIG_EXOFS_FS is not set
# CONFIG_NETWORK_FILESYSTEMS is not set
# CONFIG_NLS is not set
CONFIG_DLM=m
# CONFIG_DLM_DEBUG is not set

#
# Kernel hacking
#
CONFIG_TRACE_IRQFLAGS_SUPPORT=y

#
# printk and dmesg options
#
CONFIG_PRINTK_TIME=y
CONFIG_MESSAGE_LOGLEVEL_DEFAULT=4
CONFIG_DYNAMIC_DEBUG=y

#
# Compile-time checks and compiler options
#
CONFIG_DEBUG_INFO=y
# CONFIG_DEBUG_INFO_REDUCED is not set
# CONFIG_DEBUG_INFO_SPLIT is not set
# CONFIG_DEBUG_INFO_DWARF4 is not set
# CONFIG_GDB_SCRIPTS is not set
CONFIG_ENABLE_WARN_DEPRECATED=y
# CONFIG_ENABLE_MUST_CHECK is not set
CONFIG_FRAME_WARN=1024
# CONFIG_STRIP_ASM_SYMS is not set
CONFIG_READABLE_ASM=y
CONFIG_UNUSED_SYMBOLS=y
# CONFIG_PAGE_OWNER is not set
CONFIG_DEBUG_FS=y
# CONFIG_HEADERS_CHECK is not set
# CONFIG_DEBUG_SECTION_MISMATCH is not set
# CONFIG_SECTION_MISMATCH_WARN_ONLY is not set
# CONFIG_DEBUG_FORCE_WEAK_PER_CPU is not set
CONFIG_MAGIC_SYSRQ=y
CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE=0x1
CONFIG_DEBUG_KERNEL=y

#
# Memory Debugging
#
CONFIG_PAGE_EXTENSION=y
CONFIG_DEBUG_PAGEALLOC=y
CONFIG_DEBUG_OBJECTS=y
CONFIG_DEBUG_OBJECTS_SELFTEST=y
CONFIG_DEBUG_OBJECTS_FREE=y
CONFIG_DEBUG_OBJECTS_TIMERS=y
CONFIG_DEBUG_OBJECTS_WORK=y
CONFIG_DEBUG_OBJECTS_RCU_HEAD=y
CONFIG_DEBUG_OBJECTS_PERCPU_COUNTER=y
CONFIG_DEBUG_OBJECTS_ENABLE_DEFAULT=1
CONFIG_SLUB_DEBUG_ON=y
CONFIG_SLUB_STATS=y
CONFIG_HAVE_DEBUG_KMEMLEAK=y
CONFIG_DEBUG_KMEMLEAK=y
CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE=400
# CONFIG_DEBUG_KMEMLEAK_TEST is not set
# CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF is not set
CONFIG_DEBUG_STACK_USAGE=y
CONFIG_DEBUG_VM=y
# CONFIG_DEBUG_VM_VMACACHE is not set
CONFIG_DEBUG_VM_RB=y
# CONFIG_DEBUG_VM_PGFLAGS is not set
CONFIG_DEBUG_MEMORY_INIT=y
CONFIG_MEMORY_NOTIFIER_ERROR_INJECT=m
CONFIG_DEBUG_PER_CPU_MAPS=y
CONFIG_DEBUG_SHIRQ=y

#
# Debug Lockups and Hangs
#
CONFIG_DETECT_HUNG_TASK=y
CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=120
# CONFIG_BOOTPARAM_HUNG_TASK_PANIC is not set
CONFIG_BOOTPARAM_HUNG_TASK_PANIC_VALUE=0
# CONFIG_WQ_WATCHDOG is not set
# CONFIG_PANIC_ON_OOPS is not set
CONFIG_PANIC_ON_OOPS_VALUE=0
CONFIG_PANIC_TIMEOUT=0
CONFIG_SCHED_DEBUG=y
CONFIG_SCHED_INFO=y
CONFIG_SCHEDSTATS=y
# CONFIG_SCHED_STACK_END_CHECK is not set
# CONFIG_DEBUG_TIMEKEEPING is not set
CONFIG_TIMER_STATS=y
CONFIG_DEBUG_PREEMPT=y

#
# Lock Debugging (spinlocks, mutexes, etc...)
#
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
CONFIG_LOCKDEP=y
CONFIG_LOCK_STAT=y
CONFIG_DEBUG_LOCKDEP=y
CONFIG_DEBUG_ATOMIC_SLEEP=y
CONFIG_DEBUG_LOCKING_API_SELFTESTS=y
# CONFIG_LOCK_TORTURE_TEST is not set
CONFIG_TRACE_IRQFLAGS=y
CONFIG_STACKTRACE=y
# CONFIG_DEBUG_KOBJECT is not set
# CONFIG_DEBUG_KOBJECT_RELEASE is not set
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_DEBUG_LIST=y
# CONFIG_DEBUG_PI_LIST is not set
CONFIG_DEBUG_SG=y
CONFIG_DEBUG_NOTIFIERS=y
CONFIG_DEBUG_CREDENTIALS=y

#
# RCU Debugging
#
CONFIG_PROVE_RCU=y
# CONFIG_PROVE_RCU_REPEATEDLY is not set
# CONFIG_SPARSE_RCU_POINTER is not set
CONFIG_TORTURE_TEST=m
CONFIG_RCU_TORTURE_TEST=m
# CONFIG_RCU_TORTURE_TEST_SLOW_PREINIT is not set
# CONFIG_RCU_TORTURE_TEST_SLOW_INIT is not set
# CONFIG_RCU_TORTURE_TEST_SLOW_CLEANUP is not set
CONFIG_RCU_CPU_STALL_TIMEOUT=300
# CONFIG_RCU_TRACE is not set
# CONFIG_RCU_EQS_DEBUG is not set
# CONFIG_DEBUG_WQ_FORCE_RR_CPU is not set
# CONFIG_DEBUG_BLOCK_EXT_DEVT is not set
CONFIG_NOTIFIER_ERROR_INJECTION=m
CONFIG_CPU_NOTIFIER_ERROR_INJECT=m
CONFIG_PM_NOTIFIER_ERROR_INJECT=m
# CONFIG_NETDEV_NOTIFIER_ERROR_INJECT is not set
CONFIG_FAULT_INJECTION=y
CONFIG_FAILSLAB=y
CONFIG_FAIL_PAGE_ALLOC=y
CONFIG_FAIL_MAKE_REQUEST=y
CONFIG_FAIL_IO_TIMEOUT=y
# CONFIG_FAIL_FUTEX is not set
CONFIG_FAULT_INJECTION_DEBUG_FS=y
CONFIG_FAULT_INJECTION_STACKTRACE_FILTER=y
CONFIG_LATENCYTOP=y
CONFIG_ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS=y
CONFIG_DEBUG_STRICT_USER_COPY_CHECKS=y
CONFIG_NOP_TRACER=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
CONFIG_TRACE_CLOCK=y
CONFIG_RING_BUFFER=y
CONFIG_EVENT_TRACING=y
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_RING_BUFFER_ALLOW_SWAP=y
CONFIG_TRACING=y
CONFIG_GENERIC_TRACER=y
CONFIG_TRACING_SUPPORT=y
CONFIG_FTRACE=y
# CONFIG_FUNCTION_TRACER is not set
# CONFIG_IRQSOFF_TRACER is not set
# CONFIG_PREEMPT_TRACER is not set
# CONFIG_SCHED_TRACER is not set
# CONFIG_FTRACE_SYSCALLS is not set
# CONFIG_TRACER_SNAPSHOT is not set
CONFIG_BRANCH_PROFILE_NONE=y
# CONFIG_PROFILE_ANNOTATED_BRANCHES is not set
# CONFIG_PROFILE_ALL_BRANCHES is not set
# CONFIG_STACK_TRACER is not set
CONFIG_BLK_DEV_IO_TRACE=y
# CONFIG_KPROBE_EVENT is not set
# CONFIG_UPROBE_EVENT is not set
# CONFIG_PROBE_EVENTS is not set
# CONFIG_FTRACE_STARTUP_TEST is not set
# CONFIG_TRACEPOINT_BENCHMARK is not set
# CONFIG_RING_BUFFER_BENCHMARK is not set
# CONFIG_RING_BUFFER_STARTUP_TEST is not set
# CONFIG_TRACE_ENUM_MAP_FILE is not set

#
# Runtime Testing
#
CONFIG_LKDTM=m
CONFIG_TEST_LIST_SORT=y
CONFIG_KPROBES_SANITY_TEST=y
# CONFIG_BACKTRACE_SELF_TEST is not set
CONFIG_RBTREE_TEST=y
CONFIG_INTERVAL_TREE_TEST=m
CONFIG_PERCPU_TEST=m
CONFIG_ATOMIC64_SELFTEST=y
# CONFIG_TEST_HEXDUMP is not set
# CONFIG_TEST_STRING_HELPERS is not set
# CONFIG_TEST_KSTRTOX is not set
# CONFIG_TEST_PRINTF is not set
# CONFIG_TEST_RHASHTABLE is not set
CONFIG_DMA_API_DEBUG=y
# CONFIG_TEST_LKM is not set
# CONFIG_TEST_USER_COPY is not set
# CONFIG_TEST_BPF is not set
# CONFIG_TEST_FIRMWARE is not set
# CONFIG_TEST_UDELAY is not set
# CONFIG_MEMTEST is not set
# CONFIG_TEST_STATIC_KEYS is not set
# CONFIG_SAMPLES is not set
# CONFIG_UBSAN is not set
CONFIG_ARCH_HAS_DEVMEM_IS_ALLOWED=y
# CONFIG_STRICT_DEVMEM is not set
CONFIG_S390_PTDUMP=y
CONFIG_DEBUG_SET_MODULE_RONX=y

#
# Security options
#
CONFIG_KEYS=y
# CONFIG_PERSISTENT_KEYRINGS is not set
# CONFIG_BIG_KEYS is not set
# CONFIG_TRUSTED_KEYS is not set
CONFIG_ENCRYPTED_KEYS=m
# CONFIG_SECURITY_DMESG_RESTRICT is not set
CONFIG_SECURITY=y
CONFIG_SECURITYFS=y
CONFIG_SECURITY_NETWORK=y
# CONFIG_SECURITY_NETWORK_XFRM is not set
# CONFIG_SECURITY_PATH is not set
# CONFIG_SECURITY_SELINUX is not set
# CONFIG_SECURITY_SMACK is not set
# CONFIG_SECURITY_TOMOYO is not set
# CONFIG_SECURITY_APPARMOR is not set
# CONFIG_SECURITY_YAMA is not set
CONFIG_INTEGRITY=y
# CONFIG_INTEGRITY_SIGNATURE is not set
CONFIG_INTEGRITY_AUDIT=y
CONFIG_IMA=y
CONFIG_IMA_MEASURE_PCR_IDX=10
# CONFIG_IMA_TEMPLATE is not set
CONFIG_IMA_NG_TEMPLATE=y
# CONFIG_IMA_SIG_TEMPLATE is not set
CONFIG_IMA_DEFAULT_TEMPLATE="ima-ng"
CONFIG_IMA_DEFAULT_HASH_SHA1=y
# CONFIG_IMA_DEFAULT_HASH_SHA256 is not set
# CONFIG_IMA_DEFAULT_HASH_SHA512 is not set
# CONFIG_IMA_DEFAULT_HASH_WP512 is not set
CONFIG_IMA_DEFAULT_HASH="sha1"
# CONFIG_IMA_WRITE_POLICY is not set
# CONFIG_IMA_READ_POLICY is not set
CONFIG_IMA_APPRAISE=y
# CONFIG_EVM is not set
CONFIG_DEFAULT_SECURITY_DAC=y
CONFIG_DEFAULT_SECURITY=""
CONFIG_XOR_BLOCKS=m
CONFIG_CRYPTO=y

#
# Crypto core or helper
#
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_ALGAPI2=y
CONFIG_CRYPTO_AEAD=m
CONFIG_CRYPTO_AEAD2=y
CONFIG_CRYPTO_BLKCIPHER=m
CONFIG_CRYPTO_BLKCIPHER2=y
CONFIG_CRYPTO_HASH=y
CONFIG_CRYPTO_HASH2=y
CONFIG_CRYPTO_RNG=m
CONFIG_CRYPTO_RNG2=y
CONFIG_CRYPTO_RNG_DEFAULT=m
CONFIG_CRYPTO_PCOMP=y
CONFIG_CRYPTO_PCOMP2=y
CONFIG_CRYPTO_AKCIPHER2=y
# CONFIG_CRYPTO_RSA is not set
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_MANAGER2=y
CONFIG_CRYPTO_USER=m
# CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set
CONFIG_CRYPTO_GF128MUL=m
CONFIG_CRYPTO_NULL=m
CONFIG_CRYPTO_NULL2=y
# CONFIG_CRYPTO_PCRYPT is not set
CONFIG_CRYPTO_WORKQUEUE=y
CONFIG_CRYPTO_CRYPTD=m
# CONFIG_CRYPTO_MCRYPTD is not set
CONFIG_CRYPTO_AUTHENC=m
CONFIG_CRYPTO_TEST=m

#
# Authenticated Encryption with Associated Data
#
CONFIG_CRYPTO_CCM=m
CONFIG_CRYPTO_GCM=m
# CONFIG_CRYPTO_CHACHA20POLY1305 is not set
CONFIG_CRYPTO_SEQIV=m
CONFIG_CRYPTO_ECHAINIV=m

#
# Block modes
#
CONFIG_CRYPTO_CBC=m
CONFIG_CRYPTO_CTR=m
CONFIG_CRYPTO_CTS=m
CONFIG_CRYPTO_ECB=m
CONFIG_CRYPTO_LRW=m
CONFIG_CRYPTO_PCBC=m
CONFIG_CRYPTO_XTS=m
# CONFIG_CRYPTO_KEYWRAP is not set

#
# Hash modes
#
CONFIG_CRYPTO_CMAC=m
CONFIG_CRYPTO_HMAC=y
CONFIG_CRYPTO_XCBC=m
CONFIG_CRYPTO_VMAC=m

#
# Digest
#
CONFIG_CRYPTO_CRC32C=y
CONFIG_CRYPTO_CRC32=m
CONFIG_CRYPTO_CRCT10DIF=y
CONFIG_CRYPTO_GHASH=m
# CONFIG_CRYPTO_POLY1305 is not set
CONFIG_CRYPTO_MD4=m
CONFIG_CRYPTO_MD5=y
CONFIG_CRYPTO_MICHAEL_MIC=m
CONFIG_CRYPTO_RMD128=m
CONFIG_CRYPTO_RMD160=m
CONFIG_CRYPTO_RMD256=m
CONFIG_CRYPTO_RMD320=m
CONFIG_CRYPTO_SHA1=y
CONFIG_CRYPTO_SHA256=y
CONFIG_CRYPTO_SHA512=m
CONFIG_CRYPTO_TGR192=m
CONFIG_CRYPTO_WP512=m

#
# Ciphers
#
CONFIG_CRYPTO_AES=y
CONFIG_CRYPTO_ANUBIS=m
CONFIG_CRYPTO_ARC4=m
CONFIG_CRYPTO_BLOWFISH=m
CONFIG_CRYPTO_BLOWFISH_COMMON=m
CONFIG_CRYPTO_CAMELLIA=m
CONFIG_CRYPTO_CAST_COMMON=m
CONFIG_CRYPTO_CAST5=m
CONFIG_CRYPTO_CAST6=m
CONFIG_CRYPTO_DES=m
CONFIG_CRYPTO_FCRYPT=m
CONFIG_CRYPTO_KHAZAD=m
CONFIG_CRYPTO_SALSA20=m
# CONFIG_CRYPTO_CHACHA20 is not set
CONFIG_CRYPTO_SEED=m
CONFIG_CRYPTO_SERPENT=m
CONFIG_CRYPTO_TEA=m
CONFIG_CRYPTO_TWOFISH=m
CONFIG_CRYPTO_TWOFISH_COMMON=m

#
# Compression
#
CONFIG_CRYPTO_DEFLATE=m
CONFIG_CRYPTO_ZLIB=y
CONFIG_CRYPTO_LZO=m
# CONFIG_CRYPTO_842 is not set
CONFIG_CRYPTO_LZ4=m
CONFIG_CRYPTO_LZ4HC=m

#
# Random Number Generation
#
CONFIG_CRYPTO_ANSI_CPRNG=m
CONFIG_CRYPTO_DRBG_MENU=m
CONFIG_CRYPTO_DRBG_HMAC=y
# CONFIG_CRYPTO_DRBG_HASH is not set
# CONFIG_CRYPTO_DRBG_CTR is not set
CONFIG_CRYPTO_DRBG=m
CONFIG_CRYPTO_JITTERENTROPY=m
CONFIG_CRYPTO_USER_API=m
CONFIG_CRYPTO_USER_API_HASH=m
CONFIG_CRYPTO_USER_API_SKCIPHER=m
# CONFIG_CRYPTO_USER_API_RNG is not set
# CONFIG_CRYPTO_USER_API_AEAD is not set
CONFIG_CRYPTO_HASH_INFO=y
CONFIG_CRYPTO_HW=y
CONFIG_ZCRYPT=m
CONFIG_CRYPTO_SHA1_S390=m
CONFIG_CRYPTO_SHA256_S390=m
CONFIG_CRYPTO_SHA512_S390=m
CONFIG_CRYPTO_DES_S390=m
CONFIG_CRYPTO_AES_S390=m
CONFIG_S390_PRNG=m
CONFIG_CRYPTO_GHASH_S390=m
CONFIG_ASYMMETRIC_KEY_TYPE=m
CONFIG_ASYMMETRIC_PUBLIC_KEY_SUBTYPE=m
CONFIG_PUBLIC_KEY_ALGO_RSA=m
CONFIG_X509_CERTIFICATE_PARSER=m
# CONFIG_PKCS7_MESSAGE_PARSER is not set

#
# Certificates for signature checking
#
# CONFIG_SYSTEM_TRUSTED_KEYRING is not set
CONFIG_BINARY_PRINTF=y

#
# Library routines
#
CONFIG_RAID6_PQ=m
CONFIG_BITREVERSE=y
# CONFIG_HAVE_ARCH_BITREVERSE is not set
CONFIG_GENERIC_NET_UTILS=y
CONFIG_GENERIC_FIND_FIRST_BIT=y
CONFIG_GENERIC_IO=y
CONFIG_ARCH_USE_CMPXCHG_LOCKREF=y
CONFIG_CRC_CCITT=m
CONFIG_CRC16=y
CONFIG_CRC_T10DIF=y
CONFIG_CRC_ITU_T=m
CONFIG_CRC32=y
# CONFIG_CRC32_SELFTEST is not set
CONFIG_CRC32_SLICEBY8=y
# CONFIG_CRC32_SLICEBY4 is not set
# CONFIG_CRC32_SARWATE is not set
# CONFIG_CRC32_BIT is not set
CONFIG_CRC7=m
CONFIG_LIBCRC32C=m
CONFIG_CRC8=m
# CONFIG_AUDIT_ARCH_COMPAT_GENERIC is not set
# CONFIG_RANDOM32_SELFTEST is not set
CONFIG_ZLIB_INFLATE=y
CONFIG_ZLIB_DEFLATE=y
CONFIG_LZO_COMPRESS=y
CONFIG_LZO_DECOMPRESS=y
CONFIG_LZ4_COMPRESS=m
CONFIG_LZ4HC_COMPRESS=m
CONFIG_LZ4_DECOMPRESS=y
CONFIG_XZ_DEC=y
CONFIG_XZ_DEC_X86=y
CONFIG_XZ_DEC_POWERPC=y
CONFIG_XZ_DEC_IA64=y
CONFIG_XZ_DEC_ARM=y
CONFIG_XZ_DEC_ARMTHUMB=y
CONFIG_XZ_DEC_SPARC=y
CONFIG_XZ_DEC_BCJ=y
# CONFIG_XZ_DEC_TEST is not set
CONFIG_DECOMPRESS_GZIP=y
CONFIG_DECOMPRESS_BZIP2=y
CONFIG_DECOMPRESS_LZMA=y
CONFIG_DECOMPRESS_XZ=y
CONFIG_DECOMPRESS_LZO=y
CONFIG_DECOMPRESS_LZ4=y
CONFIG_INTERVAL_TREE=y
CONFIG_ASSOCIATIVE_ARRAY=y
# CONFIG_CPUMASK_OFFSTACK is not set
CONFIG_CPU_RMAP=y
CONFIG_DQL=y
CONFIG_NLATTR=y
CONFIG_ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE=y
CONFIG_CLZ_TAB=y
CONFIG_CORDIC=m
# CONFIG_DDR is not set
CONFIG_IRQ_POLL=y
CONFIG_MPILIB=m
CONFIG_OID_REGISTRY=m
# CONFIG_SG_SPLIT is not set
CONFIG_ARCH_HAS_SG_CHAIN=y

#
# Virtualization
#
CONFIG_PFAULT=y
CONFIG_CMM=m
CONFIG_CMM_IUCV=y
CONFIG_APPLDATA_BASE=y
CONFIG_APPLDATA_MEM=m
CONFIG_APPLDATA_OS=m
CONFIG_APPLDATA_NET_SUM=m
CONFIG_S390_HYPFS_FS=y
CONFIG_HAVE_KVM=y
CONFIG_HAVE_KVM_IRQCHIP=y
CONFIG_HAVE_KVM_IRQFD=y
CONFIG_HAVE_KVM_IRQ_ROUTING=y
CONFIG_HAVE_KVM_EVENTFD=y
CONFIG_KVM_ASYNC_PF=y
CONFIG_KVM_ASYNC_PF_SYNC=y
CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT=y
CONFIG_KVM_VFIO=y
CONFIG_VIRTUALIZATION=y
CONFIG_KVM=m
CONFIG_KVM_S390_UCONTROL=y
CONFIG_S390_GUEST=y


* [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
@ 2016-02-17 19:04                           ` Sebastian Ott
  0 siblings, 0 replies; 153+ messages in thread
From: Sebastian Ott @ 2016-02-17 19:04 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On Wed, 17 Feb 2016, Kirill A. Shutemov wrote:
> On Tue, Feb 16, 2016 at 05:24:44PM +0100, Gerald Schaefer wrote:
> > On Mon, 15 Feb 2016 23:35:26 +0200
> > "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> > 
> > > Is there any chance that I'll be able to trigger the bug using QEMU?
> > > Does anybody have a QEMU image I can use?
> > > 
> > 
> > I have no image, but trying to reproduce this under virtualization may
> > help to trigger this also on other architectures. After ruling out IPI
> > vs. fast_gup I do not really see why this should be arch-specific, and
> > it wouldn't be the first time that we hit subtle races first on s390, due
> > to our virtualized environment (my test case is make -j20 with 10 CPUs and
> > 4GB of memory, no swap).
> 
> Could you post your kernel config?

Attached.

> It would be nice also to check if disabling split_huge_page() would make
> any difference:
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index a75081ca31cf..26d2b7b21021 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3364,6 +3364,8 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
>  	bool mlocked;
>  	unsigned long flags;
> 
> +	return -EBUSY;
> +
>  	VM_BUG_ON_PAGE(is_huge_zero_page(page), page);
>  	VM_BUG_ON_PAGE(!PageAnon(page), page);
>  	VM_BUG_ON_PAGE(!PageLocked(page), page);
> -- 

65c23c6 + this patch also oopsed:

[ 1707.903808] ODEBUG: active_state not available (active state 0) object type: rcu_head hint:           (null)
[ 1707.903852] ------------[ cut here ]------------
[ 1707.903854] WARNING: at lib/debugobjects.c:263
[ 1707.903856] Modules linked in: bridge stp llc btrfs mlx4_ib mlx4_en ib_sa vxlan ib_mad ip6_udp_tunnel ib_core udp_tunnel ptp pps_core ib_addr xor raid6_pq ghash_s390 mlx4_core prng ecb aes_s390 des_s390 des_generic sha512_s390 dm_mod sha256_s390 genwqe_card sha1_s390 sha_common crc_itu_t scm_block eadm_sch vhost_net tun vhost macvtap macvlan kvm autofs4
[ 1707.903892] CPU: 4 PID: 25215 Comm: git Not tainted 4.5.0-rc4-00037-g65c23c6-dirty #273
[ 1707.903894] task: 0000000006a60000 ti: 0000000063b04000 task.ti: 0000000063b04000
[ 1707.903896] Krnl PSW : 0404c00180000000 0000000000486ce0 (debug_print_object+0xb0/0xd0)
[ 1707.903905]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
Krnl GPRS: 0000000001a361c7 0000000006a60000 0000000000000060 0000000000000101
[ 1707.903908]            0000000000486cdc 0000000000000000 000000000088cbdc 0000000001b53848
[ 1707.903910]            0700000000000001 0000000000000000 0000000001b53850 00000000008bb820
[ 1707.903912]            0000000000a8d710 00000000dcdd3d38 0000000000486cdc 00000000dcdd3c38
[ 1707.903920] Krnl Code: 0000000000486cd0: c0200021a496        larl    %r2,8bb5fc
           0000000000486cd6: c0e5ffee03a1       brasl   %r14,247418
          #0000000000486cdc: a7f40001           brc     15,486cde
          >0000000000486ce0: c41d002f488e       lrl     %r1,a6fdfc
           0000000000486ce6: e340f0e80004       lg      %r4,232(%r15)
           0000000000486cec: a71a0001           ahi     %r1,1
           0000000000486cf0: eb6ff0a80004       lmg     %r6,%r15,168(%r15)
           0000000000486cf6: c41f002f4883       strl    %r1,a6fdfc
[ 1707.903960] Call Trace:
[ 1707.903962] ([<0000000000486cdc>] debug_print_object+0xac/0xd0)
[ 1707.903964]  [<0000000000488094>] debug_object_active_state+0x164/0x178
[ 1707.903969]  [<00000000001b991c>] rcu_process_callbacks+0x564/0x9e8
[ 1707.903973]  [<000000000013d3ee>] __do_softirq+0x256/0x568
[ 1707.903975]  [<000000000013da3a>] irq_exit+0x7a/0xd8
[ 1707.903979]  [<000000000010c87e>] do_IRQ+0x86/0xc0
[ 1707.903984]  [<00000000006fa3f2>] ext_int_handler+0x11e/0x124
[ 1707.903987]  [<0000000000199bfe>] lock_release+0x5ce/0x670
[ 1707.903989] ([<0000000000199be0>] lock_release+0x5b0/0x670)
[ 1707.903993]  [<00000000002dffa2>] getname_flags+0x82/0x218
[ 1707.903994]  [<00000000002e04e8>] user_path_at_empty+0x40/0x68
[ 1707.903998]  [<00000000002d44a4>] vfs_fstatat+0x6c/0xc8
[ 1707.903999]  [<00000000002d4894>] SyS_newlstat+0x2c/0x48
[ 1707.904002]  [<00000000006f9cce>] system_call+0xd6/0x258
[ 1707.904003]  [<000003ffb45f1124>] 0x3ffb45f1124
[ 1707.904005] 1 lock held by git/25215:
[ 1707.904006]  #0:  (&obj_hash[i].lock){-.-.-.}, at: [<0000000000487fdc>] debug_object_active_state+0xac/0x178
[ 1707.904012] Last Breaking-Event-Address:
[ 1707.904014]  [<0000000000486cdc>] debug_print_object+0xac/0xd0
[ 1707.904016] ---[ end trace 8ce68dc422e8321c ]---
[ 1707.904018] ODEBUG: deactivate not available (active state 0) object type: rcu_head hint:           (null)
[ 1707.904026] ------------[ cut here ]------------
[ 1707.904027] WARNING: at lib/debugobjects.c:263
[ 1707.904028] Modules linked in: bridge stp llc btrfs mlx4_ib mlx4_en ib_sa vxlan ib_mad ip6_udp_tunnel ib_core udp_tunnel ptp pps_core ib_addr xor raid6_pq ghash_s390 mlx4_core prng ecb aes_s390 des_s390 des_generic sha512_s390 dm_mod sha256_s390 genwqe_card sha1_s390 sha_common crc_itu_t scm_block eadm_sch vhost_net tun vhost macvtap macvlan kvm autofs4
[ 1707.904055] CPU: 4 PID: 25215 Comm: git Tainted: G        W       4.5.0-rc4-00037-g65c23c6-dirty #273
[ 1707.904057] task: 0000000006a60000 ti: 0000000063b04000 task.ti: 0000000063b04000
[ 1707.904058] Krnl PSW : 0404c00180000000 0000000000486ce0 (debug_print_object+0xb0/0xd0)
[ 1707.904062]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
Krnl GPRS: 0000000001a361c7 0000000006a60000 000000000000005e 0000000000000101
[ 1707.904066]            0000000000486cdc 0000000000000000 000000000088cbdc 000000000000000a
[ 1707.904068]            0000000091cdb020 07000000dcdd3c68 0000000001b53850 00000000008979ea
[ 1707.904069]            0000000000a8d710 00000000dcdd3d48 0000000000486cdc 00000000dcdd3c48
[ 1707.904074] Krnl Code: 0000000000486cd0: c0200021a496        larl    %r2,8bb5fc
           0000000000486cd6: c0e5ffee03a1       brasl   %r14,247418
          #0000000000486cdc: a7f40001           brc     15,486cde
          >0000000000486ce0: c41d002f488e       lrl     %r1,a6fdfc
           0000000000486ce6: e340f0e80004       lg      %r4,232(%r15)
           0000000000486cec: a71a0001           ahi     %r1,1
           0000000000486cf0: eb6ff0a80004       lmg     %r6,%r15,168(%r15)
           0000000000486cf6: c41f002f4883       strl    %r1,a6fdfc
[ 1707.904088] Call Trace:
[ 1707.904090] ([<0000000000486cdc>] debug_print_object+0xac/0xd0)
[ 1707.904092]  [<0000000000487a38>] debug_object_deactivate+0x170/0x188
[ 1707.904094]  [<00000000001b992e>] rcu_process_callbacks+0x576/0x9e8
[ 1707.904096]  [<000000000013d3ee>] __do_softirq+0x256/0x568
[ 1707.904098]  [<000000000013da3a>] irq_exit+0x7a/0xd8
[ 1707.904100]  [<000000000010c87e>] do_IRQ+0x86/0xc0
[ 1707.904102]  [<00000000006fa3f2>] ext_int_handler+0x11e/0x124
[ 1707.904104]  [<0000000000199bfe>] lock_release+0x5ce/0x670
[ 1707.904106] ([<0000000000199be0>] lock_release+0x5b0/0x670)
[ 1707.904108]  [<00000000002dffa2>] getname_flags+0x82/0x218
[ 1707.904109]  [<00000000002e04e8>] user_path_at_empty+0x40/0x68
[ 1707.904111]  [<00000000002d44a4>] vfs_fstatat+0x6c/0xc8
[ 1707.904113]  [<00000000002d4894>] SyS_newlstat+0x2c/0x48
[ 1707.904115]  [<00000000006f9cce>] system_call+0xd6/0x258
[ 1707.904117]  [<000003ffb45f1124>] 0x3ffb45f1124
[ 1707.904118] 1 lock held by git/25215:
[ 1707.904119]  #0:  (&obj_hash[i].lock){-.-.-.}, at: [<000000000048796c>] debug_object_deactivate+0xa4/0x188
[ 1707.904124] Last Breaking-Event-Address:
[ 1707.904126]  [<0000000000486cdc>] debug_print_object+0xac/0xd0
[ 1707.904128] ---[ end trace 8ce68dc422e8321d ]---
[ 1707.904150] ------------[ cut here ]------------
[ 1707.904152] Kernel BUG at 0000000008cf8002 [verbose debug info unavailable]
[ 1707.904197] illegal operation: 0001 ilc:1 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 1707.904203] Modules linked in: bridge stp llc btrfs mlx4_ib mlx4_en ib_sa vxlan ib_mad ip6_udp_tunnel ib_core udp_tunnel ptp pps_core ib_addr xor raid6_pq ghash_s390 mlx4_core prng ecb aes_s390 des_s390 des_generic sha512_s390 dm_mod sha256_s390 genwqe_card sha1_s390 sha_common crc_itu_t scm_block eadm_sch vhost_net tun vhost macvtap macvlan kvm autofs4
[ 1707.904240] CPU: 4 PID: 25215 Comm: git Tainted: G        W       4.5.0-rc4-00037-g65c23c6-dirty #273
[ 1707.904242] task: 0000000006a60000 ti: 0000000063b04000 task.ti: 0000000063b04000
[ 1707.904244] Krnl PSW : 0704d00180000000 0000000008cf8002 (0x8cf8002)
[ 1707.904248]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3
Krnl GPRS: 0000000000000000 0000000008cf8000 0000000091cdb020 0000000091cdb020
[ 1707.904252]            00000000001b9964 0000000000000000 0000000000000000 000000000000000a
[ 1707.904254]            0000000000000000 0000000008cf8000 0000000000000004 00000000034d6802
[ 1707.904256]            00000000dec0f600 00000000007063d8 00000000001b99ae 00000000dcdd3d18
[ 1707.904263] Krnl Code: 0000000008cf7ff6: 5a5a5a5a            a       %r5,2650(%r10,%r5)
           0000000008cf7ffa: 5a5a5a5a           a       %r5,2650(%r10,%r5)
          #0000000008cf7ffe: 5a5a0000           a       %r5,0(%r10,%r0)
          >0000000008cf8002: 0000               unknown
           0000000008cf8004: 0000               unknown
           0000000008cf8006: 0020               unknown
           0000000008cf8008: 0000               unknown
           0000000008cf800a: 0000               unknown
[ 1707.904277] Call Trace:
[ 1707.904279] ([<00000000001b9964>] rcu_process_callbacks+0x5ac/0x9e8)
[ 1707.904282]  [<000000000013d3ee>] __do_softirq+0x256/0x568
[ 1707.904284]  [<000000000013da3a>] irq_exit+0x7a/0xd8
[ 1707.904286]  [<000000000010c87e>] do_IRQ+0x86/0xc0
[ 1707.904289]  [<00000000006fa3f2>] ext_int_handler+0x11e/0x124
[ 1707.904291]  [<0000000000199bfe>] lock_release+0x5ce/0x670
[ 1707.904293] ([<0000000000199be0>] lock_release+0x5b0/0x670)
[ 1707.904295]  [<00000000002dffa2>] getname_flags+0x82/0x218
[ 1707.904297]  [<00000000002e04e8>] user_path_at_empty+0x40/0x68
[ 1707.904299]  [<00000000002d44a4>] vfs_fstatat+0x6c/0xc8
[ 1707.904301]  [<00000000002d4894>] SyS_newlstat+0x2c/0x48
[ 1707.904303]  [<00000000006f9cce>] system_call+0xd6/0x258
[ 1707.904305]  [<000003ffb45f1124>] 0x3ffb45f1124
[ 1707.904307] INFO: lockdep is turned off.
[ 1707.904308] Last Breaking-Event-Address:
[ 1707.904310]  [<00000000001b99ac>] rcu_process_callbacks+0x5f4/0x9e8
[ 1707.904314]
[ 1707.904315] Kernel panic - not syncing: Fatal exception in interrupt
-------------- next part --------------
#
# Automatically generated file; DO NOT EDIT.
# Linux/s390 4.5.0-rc3 Kernel Configuration
#
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_ARCH_DMA_ADDR_T_64BIT=y
CONFIG_GENERIC_LOCKBREAK=y
CONFIG_PGSTE=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_KEXEC=y
CONFIG_AUDIT_ARCH=y
CONFIG_NO_IOPORT_MAP=y
# CONFIG_PCI_QUIRKS is not set
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_S390=y
CONFIG_SCHED_OMIT_FRAME_POINTER=y
CONFIG_PGTABLE_LEVELS=4
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_CROSS_MEMORY_ATTACH=y
CONFIG_FHANDLE=y
CONFIG_USELIB=y
CONFIG_AUDIT=y
CONFIG_HAVE_ARCH_AUDITSYSCALL=y
CONFIG_AUDITSYSCALL=y
CONFIG_AUDIT_WATCH=y
CONFIG_AUDIT_TREE=y

#
# IRQ subsystem
#
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y

#
# CPU/Task time and stats accounting
#
CONFIG_VIRT_CPU_ACCOUNTING=y
CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y

#
# RCU Subsystem
#
CONFIG_PREEMPT_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TASKS_RCU=y
CONFIG_RCU_STALL_COMMON=y
# CONFIG_TREE_RCU_TRACE is not set
# CONFIG_RCU_EXPEDITE_BOOT is not set
CONFIG_BUILD_BIN2C=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=17
CONFIG_LOG_CPU_MAX_BUF_SHIFT=12
CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
CONFIG_CGROUPS=y
# CONFIG_MEMCG is not set
CONFIG_BLK_CGROUP=y
# CONFIG_DEBUG_BLK_CGROUP is not set
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_CFS_BANDWIDTH=y
CONFIG_RT_GROUP_SCHED=y
# CONFIG_CGROUP_PIDS is not set
CONFIG_CGROUP_FREEZER=y
# CONFIG_CGROUP_HUGETLB is not set
CONFIG_CPUSETS=y
CONFIG_PROC_PID_CPUSET=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_CGROUP_PERF=y
# CONFIG_CGROUP_DEBUG is not set
# CONFIG_CHECKPOINT_RESTORE is not set
CONFIG_NAMESPACES=y
CONFIG_UTS_NS=y
CONFIG_IPC_NS=y
# CONFIG_USER_NS is not set
CONFIG_PID_NS=y
CONFIG_NET_NS=y
CONFIG_SCHED_AUTOGROUP=y
# CONFIG_SYSFS_DEPRECATED is not set
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_RD_GZIP=y
CONFIG_RD_BZIP2=y
CONFIG_RD_LZMA=y
CONFIG_RD_XZ=y
CONFIG_RD_LZO=y
CONFIG_RD_LZ4=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
CONFIG_ANON_INODES=y
CONFIG_SYSCTL_EXCEPTION_TRACE=y
CONFIG_BPF=y
# CONFIG_EXPERT is not set
CONFIG_MULTIUSER=y
# CONFIG_SGETMASK_SYSCALL is not set
CONFIG_SYSFS_SYSCALL=y
# CONFIG_SYSCTL_SYSCALL is not set
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_HAVE_FUTEX_CMPXCHG=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
# CONFIG_BPF_SYSCALL is not set
CONFIG_SHMEM=y
CONFIG_AIO=y
CONFIG_ADVISE_SYSCALLS=y
# CONFIG_USERFAULTFD is not set
CONFIG_MEMBARRIER=y
# CONFIG_EMBEDDED is not set
CONFIG_HAVE_PERF_EVENTS=y

#
# Kernel Performance Events And Counters
#
CONFIG_PERF_EVENTS=y
# CONFIG_DEBUG_PERF_USE_VMALLOC is not set
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLUB_DEBUG=y
# CONFIG_COMPAT_BRK is not set
# CONFIG_SLAB is not set
CONFIG_SLUB=y
CONFIG_SLUB_CPU_PARTIAL=y
# CONFIG_SYSTEM_DATA_VERIFICATION is not set
CONFIG_PROFILING=y
CONFIG_TRACEPOINTS=y
CONFIG_KEXEC_CORE=y
CONFIG_OPROFILE=m
CONFIG_HAVE_OPROFILE=y
CONFIG_KPROBES=y
CONFIG_JUMP_LABEL=y
# CONFIG_STATIC_KEYS_SELFTEST is not set
# CONFIG_UPROBES is not set
CONFIG_HAVE_64BIT_ALIGNED_ACCESS=y
CONFIG_ARCH_USE_BUILTIN_BSWAP=y
CONFIG_KRETPROBES=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_GENERIC_SMP_IDLE_THREAD=y
CONFIG_HAVE_REGS_AND_STACK_ACCESS_API=y
CONFIG_HAVE_DMA_API_DEBUG=y
CONFIG_HAVE_ARCH_JUMP_LABEL=y
CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG=y
CONFIG_HAVE_ALIGNED_STRUCT_PAGE=y
CONFIG_HAVE_CMPXCHG_LOCAL=y
CONFIG_HAVE_CMPXCHG_DOUBLE=y
CONFIG_ARCH_WANT_IPC_PARSE_VERSION=y
CONFIG_ARCH_WANT_COMPAT_IPC_PARSE_VERSION=y
CONFIG_ARCH_WANT_OLD_COMPAT_IPC=y
CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
CONFIG_SECCOMP_FILTER=y
# CONFIG_CC_STACKPROTECTOR is not set
CONFIG_HAVE_VIRT_CPU_ACCOUNTING=y
CONFIG_HAVE_VIRT_CPU_ACCOUNTING_GEN=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_HAVE_ARCH_SOFT_DIRTY=y
CONFIG_HAVE_MOD_ARCH_SPECIFIC=y
CONFIG_MODULES_USE_ELF_RELA=y
CONFIG_ARCH_HAS_ELF_RANDOMIZE=y
CONFIG_CLONE_BACKWARDS2=y
CONFIG_OLD_SIGSUSPEND3=y
CONFIG_OLD_SIGACTION=y
CONFIG_COMPAT_OLD_SIGACTION=y

#
# GCOV-based kernel profiling
#
# CONFIG_GCOV_KERNEL is not set
CONFIG_ARCH_HAS_GCOV_PROFILE_ALL=y
# CONFIG_HAVE_GENERIC_DMA_COHERENT is not set
CONFIG_SLABINFO=y
CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
CONFIG_MODULE_FORCE_LOAD=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
CONFIG_MODVERSIONS=y
CONFIG_MODULE_SRCVERSION_ALL=y
# CONFIG_MODULE_SIG is not set
CONFIG_MODULE_COMPRESS=y
# CONFIG_MODULE_COMPRESS_GZIP is not set
CONFIG_MODULE_COMPRESS_XZ=y
CONFIG_MODULES_TREE_LOOKUP=y
CONFIG_BLOCK=y
CONFIG_BLK_DEV_BSG=y
CONFIG_BLK_DEV_BSGLIB=y
CONFIG_BLK_DEV_INTEGRITY=y
CONFIG_BLK_DEV_THROTTLING=y
# CONFIG_BLK_CMDLINE_PARSER is not set

#
# Partition Types
#
CONFIG_PARTITION_ADVANCED=y
# CONFIG_ACORN_PARTITION is not set
# CONFIG_AIX_PARTITION is not set
# CONFIG_OSF_PARTITION is not set
# CONFIG_AMIGA_PARTITION is not set
# CONFIG_ATARI_PARTITION is not set
CONFIG_IBM_PARTITION=y
# CONFIG_MAC_PARTITION is not set
CONFIG_MSDOS_PARTITION=y
CONFIG_BSD_DISKLABEL=y
CONFIG_MINIX_SUBPARTITION=y
CONFIG_SOLARIS_X86_PARTITION=y
CONFIG_UNIXWARE_DISKLABEL=y
# CONFIG_LDM_PARTITION is not set
# CONFIG_SGI_PARTITION is not set
# CONFIG_ULTRIX_PARTITION is not set
# CONFIG_SUN_PARTITION is not set
# CONFIG_KARMA_PARTITION is not set
CONFIG_EFI_PARTITION=y
# CONFIG_SYSV68_PARTITION is not set
# CONFIG_CMDLINE_PARTITION is not set
CONFIG_BLOCK_COMPAT=y

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
CONFIG_CFQ_GROUP_IOSCHED=y
CONFIG_DEFAULT_DEADLINE=y
# CONFIG_DEFAULT_CFQ is not set
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="deadline"
CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_ASN1=m
CONFIG_ARCH_INLINE_SPIN_TRYLOCK=y
CONFIG_ARCH_INLINE_SPIN_TRYLOCK_BH=y
CONFIG_ARCH_INLINE_SPIN_LOCK=y
CONFIG_ARCH_INLINE_SPIN_LOCK_BH=y
CONFIG_ARCH_INLINE_SPIN_LOCK_IRQ=y
CONFIG_ARCH_INLINE_SPIN_LOCK_IRQSAVE=y
CONFIG_ARCH_INLINE_SPIN_UNLOCK=y
CONFIG_ARCH_INLINE_SPIN_UNLOCK_BH=y
CONFIG_ARCH_INLINE_SPIN_UNLOCK_IRQ=y
CONFIG_ARCH_INLINE_SPIN_UNLOCK_IRQRESTORE=y
CONFIG_ARCH_INLINE_READ_TRYLOCK=y
CONFIG_ARCH_INLINE_READ_LOCK=y
CONFIG_ARCH_INLINE_READ_LOCK_BH=y
CONFIG_ARCH_INLINE_READ_LOCK_IRQ=y
CONFIG_ARCH_INLINE_READ_LOCK_IRQSAVE=y
CONFIG_ARCH_INLINE_READ_UNLOCK=y
CONFIG_ARCH_INLINE_READ_UNLOCK_BH=y
CONFIG_ARCH_INLINE_READ_UNLOCK_IRQ=y
CONFIG_ARCH_INLINE_READ_UNLOCK_IRQRESTORE=y
CONFIG_ARCH_INLINE_WRITE_TRYLOCK=y
CONFIG_ARCH_INLINE_WRITE_LOCK=y
CONFIG_ARCH_INLINE_WRITE_LOCK_BH=y
CONFIG_ARCH_INLINE_WRITE_LOCK_IRQ=y
CONFIG_ARCH_INLINE_WRITE_LOCK_IRQSAVE=y
CONFIG_ARCH_INLINE_WRITE_UNLOCK=y
CONFIG_ARCH_INLINE_WRITE_UNLOCK_BH=y
CONFIG_ARCH_INLINE_WRITE_UNLOCK_IRQ=y
CONFIG_ARCH_INLINE_WRITE_UNLOCK_IRQRESTORE=y
CONFIG_UNINLINE_SPIN_UNLOCK=y
CONFIG_ARCH_SUPPORTS_ATOMIC_RMW=y
CONFIG_RWSEM_SPIN_ON_OWNER=y
CONFIG_LOCK_SPIN_ON_OWNER=y
CONFIG_FREEZER=y
CONFIG_HAVE_LIVEPATCH=y

#
# Processor type and features
#
CONFIG_HAVE_MARCH_Z900_FEATURES=y
CONFIG_HAVE_MARCH_Z990_FEATURES=y
CONFIG_HAVE_MARCH_Z9_109_FEATURES=y
CONFIG_HAVE_MARCH_Z10_FEATURES=y
CONFIG_HAVE_MARCH_Z196_FEATURES=y
# CONFIG_HAVE_MARCH_ZEC12_FEATURES is not set
# CONFIG_HAVE_MARCH_Z13_FEATURES is not set
# CONFIG_MARCH_Z900 is not set
# CONFIG_MARCH_Z990 is not set
# CONFIG_MARCH_Z9_109 is not set
# CONFIG_MARCH_Z10 is not set
CONFIG_MARCH_Z196=y
# CONFIG_MARCH_ZEC12 is not set
# CONFIG_MARCH_Z13 is not set
# CONFIG_MARCH_Z900_TUNE is not set
# CONFIG_MARCH_Z990_TUNE is not set
# CONFIG_MARCH_Z9_109_TUNE is not set
# CONFIG_MARCH_Z10_TUNE is not set
# CONFIG_MARCH_Z196_TUNE is not set
CONFIG_MARCH_ZEC12_TUNE=y
# CONFIG_MARCH_Z13_TUNE is not set
# CONFIG_TUNE_DEFAULT is not set
# CONFIG_TUNE_Z900 is not set
# CONFIG_TUNE_Z990 is not set
# CONFIG_TUNE_Z9_109 is not set
# CONFIG_TUNE_Z10 is not set
# CONFIG_TUNE_Z196 is not set
CONFIG_TUNE_ZEC12=y
# CONFIG_TUNE_Z13 is not set
CONFIG_64BIT=y
CONFIG_COMPAT=y
CONFIG_SYSVIPC_COMPAT=y
CONFIG_KEYS_COMPAT=y
CONFIG_SMP=y
CONFIG_NR_CPUS=256
CONFIG_HOTPLUG_CPU=y
# CONFIG_NODES_SPAN_OTHER_NODES is not set
# CONFIG_NUMA is not set
CONFIG_SCHED_SMT=y
CONFIG_SCHED_MC=y
CONFIG_SCHED_BOOK=y
CONFIG_SCHED_TOPOLOGY=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y
CONFIG_HZ_100=y
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=100
CONFIG_SCHED_HRTICK=y

#
# Memory setup
#
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SPARSEMEM_DEFAULT=y
CONFIG_ARCH_SELECT_MEMORY_MODEL=y
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE=y
CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK=y
CONFIG_FORCE_MAX_ZONEORDER=9
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_SPARSEMEM_MANUAL=y
CONFIG_SPARSEMEM=y
CONFIG_HAVE_MEMORY_PRESENT=y
CONFIG_SPARSEMEM_EXTREME=y
CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
CONFIG_SPARSEMEM_VMEMMAP=y
CONFIG_HAVE_MEMBLOCK=y
CONFIG_HAVE_MEMBLOCK_NODE_MAP=y
CONFIG_HAVE_MEMBLOCK_PHYS_MAP=y
CONFIG_NO_BOOTMEM=y
CONFIG_MEMORY_ISOLATION=y
# CONFIG_HAVE_BOOTMEM_INFO_NODE is not set
CONFIG_MEMORY_HOTPLUG=y
CONFIG_MEMORY_HOTPLUG_SPARSE=y
CONFIG_MEMORY_HOTREMOVE=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_MEMORY_BALLOON=y
CONFIG_BALLOON_COMPACTION=y
CONFIG_COMPACTION=y
CONFIG_MIGRATION=y
CONFIG_PHYS_ADDR_T_64BIT=y
CONFIG_ZONE_DMA_FLAG=1
CONFIG_BOUNCE=y
CONFIG_VIRT_TO_BUS=y
CONFIG_MMU_NOTIFIER=y
CONFIG_KSM=y
CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
# CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set
# CONFIG_CLEANCACHE is not set
# CONFIG_FRONTSWAP is not set
# CONFIG_CMA is not set
# CONFIG_ZPOOL is not set
# CONFIG_ZBUD is not set
# CONFIG_ZSMALLOC is not set
# CONFIG_IDLE_PAGE_TRACKING is not set
CONFIG_PACK_STACK=y
CONFIG_CHECK_STACK=y
CONFIG_STACK_GUARD=256
# CONFIG_WARN_DYNAMIC_STACK is not set

#
# I/O subsystem
#
CONFIG_QDIO=y
CONFIG_PCI=y
CONFIG_PCI_NR_FUNCTIONS=64
CONFIG_PCI_NR_MSI=256
CONFIG_PCI_BUS_ADDR_T_64BIT=y
CONFIG_PCI_MSI=y
CONFIG_PCI_DEBUG=y
# CONFIG_PCI_REALLOC_ENABLE_AUTO is not set
# CONFIG_PCI_STUB is not set
CONFIG_PCI_ATS=y
CONFIG_PCI_IOV=y
# CONFIG_PCI_PRI is not set
# CONFIG_PCI_PASID is not set

#
# PCI host controller drivers
#
# CONFIG_PCIEPORTBUS is not set
CONFIG_HOTPLUG_PCI=y
# CONFIG_HOTPLUG_PCI_CPCI is not set
# CONFIG_HOTPLUG_PCI_SHPC is not set
CONFIG_HOTPLUG_PCI_S390=y
CONFIG_PCI_DOMAINS=y
CONFIG_HAS_IOMEM=y
CONFIG_IOMMU_HELPER=y
CONFIG_HAS_DMA=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_CHSC_SCH=y
CONFIG_SCM_BUS=y
CONFIG_EADM_SCH=m

#
# Dump support
#
CONFIG_CRASH_DUMP=y

#
# Executable file formats / Emulations
#
CONFIG_BINFMT_ELF=y
CONFIG_COMPAT_BINFMT_ELF=y
# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
CONFIG_BINFMT_SCRIPT=y
# CONFIG_HAVE_AOUT is not set
CONFIG_BINFMT_MISC=m
CONFIG_COREDUMP=y
CONFIG_SECCOMP=y

#
# Power Management
#
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_HIBERNATE_CALLBACKS=y
CONFIG_HIBERNATION=y
CONFIG_ARCH_SAVE_PAGE_KEYS=y
CONFIG_PM_STD_PARTITION=""
CONFIG_PM_SLEEP=y
CONFIG_PM_SLEEP_SMP=y
# CONFIG_PM_AUTOSLEEP is not set
# CONFIG_PM_WAKELOCKS is not set
CONFIG_PM=y
# CONFIG_PM_DEBUG is not set
# CONFIG_WQ_POWER_EFFICIENT_DEFAULT is not set
CONFIG_NET=y
CONFIG_NET_INGRESS=y
CONFIG_NET_EGRESS=y

#
# Networking options
#
CONFIG_PACKET=y
CONFIG_PACKET_DIAG=m
CONFIG_UNIX=y
CONFIG_UNIX_DIAG=m
CONFIG_XFRM=y
CONFIG_XFRM_ALGO=m
CONFIG_XFRM_USER=m
# CONFIG_XFRM_SUB_POLICY is not set
# CONFIG_XFRM_MIGRATE is not set
# CONFIG_XFRM_STATISTICS is not set
CONFIG_XFRM_IPCOMP=m
CONFIG_NET_KEY=m
# CONFIG_NET_KEY_MIGRATE is not set
CONFIG_IUCV=y
CONFIG_AFIUCV=m
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_ADVANCED_ROUTER=y
# CONFIG_IP_FIB_TRIE_STATS is not set
CONFIG_IP_MULTIPLE_TABLES=y
CONFIG_IP_ROUTE_MULTIPATH=y
CONFIG_IP_ROUTE_VERBOSE=y
CONFIG_IP_ROUTE_CLASSID=y
# CONFIG_IP_PNP is not set
CONFIG_NET_IPIP=m
CONFIG_NET_IPGRE_DEMUX=m
CONFIG_NET_IP_TUNNEL=m
CONFIG_NET_IPGRE=m
CONFIG_NET_IPGRE_BROADCAST=y
CONFIG_IP_MROUTE=y
CONFIG_IP_MROUTE_MULTIPLE_TABLES=y
CONFIG_IP_PIMSM_V1=y
CONFIG_IP_PIMSM_V2=y
CONFIG_SYN_COOKIES=y
CONFIG_NET_IPVTI=m
CONFIG_NET_UDP_TUNNEL=m
# CONFIG_NET_FOU is not set
# CONFIG_NET_FOU_IP_TUNNELS is not set
CONFIG_INET_AH=m
CONFIG_INET_ESP=m
CONFIG_INET_IPCOMP=m
CONFIG_INET_XFRM_TUNNEL=m
CONFIG_INET_TUNNEL=m
CONFIG_INET_XFRM_MODE_TRANSPORT=m
CONFIG_INET_XFRM_MODE_TUNNEL=m
CONFIG_INET_XFRM_MODE_BEET=m
CONFIG_INET_LRO=y
CONFIG_INET_DIAG=m
CONFIG_INET_TCP_DIAG=m
CONFIG_INET_UDP_DIAG=m
# CONFIG_INET_DIAG_DESTROY is not set
CONFIG_TCP_CONG_ADVANCED=y
CONFIG_TCP_CONG_BIC=m
CONFIG_TCP_CONG_CUBIC=y
CONFIG_TCP_CONG_WESTWOOD=m
CONFIG_TCP_CONG_HTCP=m
CONFIG_TCP_CONG_HSTCP=m
CONFIG_TCP_CONG_HYBLA=m
CONFIG_TCP_CONG_VEGAS=m
CONFIG_TCP_CONG_SCALABLE=m
CONFIG_TCP_CONG_LP=m
CONFIG_TCP_CONG_VENO=m
CONFIG_TCP_CONG_YEAH=m
CONFIG_TCP_CONG_ILLINOIS=m
# CONFIG_TCP_CONG_DCTCP is not set
# CONFIG_TCP_CONG_CDG is not set
CONFIG_DEFAULT_CUBIC=y
# CONFIG_DEFAULT_RENO is not set
CONFIG_DEFAULT_TCP_CONG="cubic"
# CONFIG_TCP_MD5SIG is not set
CONFIG_IPV6=y
CONFIG_IPV6_ROUTER_PREF=y
# CONFIG_IPV6_ROUTE_INFO is not set
# CONFIG_IPV6_OPTIMISTIC_DAD is not set
CONFIG_INET6_AH=m
CONFIG_INET6_ESP=m
CONFIG_INET6_IPCOMP=m
CONFIG_IPV6_MIP6=m
CONFIG_INET6_XFRM_TUNNEL=m
CONFIG_INET6_TUNNEL=m
CONFIG_INET6_XFRM_MODE_TRANSPORT=m
CONFIG_INET6_XFRM_MODE_TUNNEL=m
CONFIG_INET6_XFRM_MODE_BEET=m
CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION=m
CONFIG_IPV6_VTI=m
CONFIG_IPV6_SIT=m
# CONFIG_IPV6_SIT_6RD is not set
CONFIG_IPV6_NDISC_NODETYPE=y
CONFIG_IPV6_TUNNEL=m
CONFIG_IPV6_GRE=m
CONFIG_IPV6_MULTIPLE_TABLES=y
CONFIG_IPV6_SUBTREES=y
# CONFIG_IPV6_MROUTE is not set
# CONFIG_NETLABEL is not set
CONFIG_NETWORK_SECMARK=y
CONFIG_NET_PTP_CLASSIFY=y
# CONFIG_NETWORK_PHY_TIMESTAMPING is not set
# CONFIG_NETFILTER is not set
# CONFIG_IP_DCCP is not set
CONFIG_IP_SCTP=m
CONFIG_NET_SCTPPROBE=m
# CONFIG_SCTP_DBG_OBJCNT is not set
CONFIG_SCTP_DEFAULT_COOKIE_HMAC_MD5=y
# CONFIG_SCTP_DEFAULT_COOKIE_HMAC_SHA1 is not set
# CONFIG_SCTP_DEFAULT_COOKIE_HMAC_NONE is not set
CONFIG_SCTP_COOKIE_HMAC_MD5=y
# CONFIG_SCTP_COOKIE_HMAC_SHA1 is not set
CONFIG_RDS=m
CONFIG_RDS_RDMA=m
CONFIG_RDS_TCP=m
CONFIG_RDS_DEBUG=y
# CONFIG_TIPC is not set
# CONFIG_ATM is not set
CONFIG_L2TP=m
CONFIG_L2TP_DEBUGFS=m
CONFIG_L2TP_V3=y
CONFIG_L2TP_IP=m
CONFIG_L2TP_ETH=m
CONFIG_STP=m
CONFIG_GARP=m
CONFIG_BRIDGE=m
CONFIG_BRIDGE_IGMP_SNOOPING=y
# CONFIG_BRIDGE_VLAN_FILTERING is not set
CONFIG_VLAN_8021Q=m
CONFIG_VLAN_8021Q_GVRP=y
# CONFIG_VLAN_8021Q_MVRP is not set
# CONFIG_DECNET is not set
CONFIG_LLC=m
# CONFIG_LLC2 is not set
# CONFIG_IPX is not set
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_PHONET is not set
# CONFIG_6LOWPAN is not set
# CONFIG_IEEE802154 is not set
CONFIG_NET_SCHED=y

#
# Queueing/Scheduling
#
CONFIG_NET_SCH_CBQ=m
CONFIG_NET_SCH_HTB=m
CONFIG_NET_SCH_HFSC=m
CONFIG_NET_SCH_PRIO=m
CONFIG_NET_SCH_MULTIQ=m
CONFIG_NET_SCH_RED=m
CONFIG_NET_SCH_SFB=m
CONFIG_NET_SCH_SFQ=m
CONFIG_NET_SCH_TEQL=m
CONFIG_NET_SCH_TBF=m
CONFIG_NET_SCH_GRED=m
CONFIG_NET_SCH_DSMARK=m
CONFIG_NET_SCH_NETEM=m
CONFIG_NET_SCH_DRR=m
CONFIG_NET_SCH_MQPRIO=m
CONFIG_NET_SCH_CHOKE=m
CONFIG_NET_SCH_QFQ=m
CONFIG_NET_SCH_CODEL=m
CONFIG_NET_SCH_FQ_CODEL=m
# CONFIG_NET_SCH_FQ is not set
# CONFIG_NET_SCH_HHF is not set
# CONFIG_NET_SCH_PIE is not set
CONFIG_NET_SCH_INGRESS=m
CONFIG_NET_SCH_PLUG=m

#
# Classification
#
CONFIG_NET_CLS=y
CONFIG_NET_CLS_BASIC=m
CONFIG_NET_CLS_TCINDEX=m
CONFIG_NET_CLS_ROUTE4=m
CONFIG_NET_CLS_FW=m
CONFIG_NET_CLS_U32=m
CONFIG_CLS_U32_PERF=y
CONFIG_CLS_U32_MARK=y
CONFIG_NET_CLS_RSVP=m
CONFIG_NET_CLS_RSVP6=m
CONFIG_NET_CLS_FLOW=m
CONFIG_NET_CLS_CGROUP=y
CONFIG_NET_CLS_BPF=m
# CONFIG_NET_CLS_FLOWER is not set
# CONFIG_NET_EMATCH is not set
CONFIG_NET_CLS_ACT=y
CONFIG_NET_ACT_POLICE=m
CONFIG_NET_ACT_GACT=m
CONFIG_GACT_PROB=y
CONFIG_NET_ACT_MIRRED=m
CONFIG_NET_ACT_NAT=m
CONFIG_NET_ACT_PEDIT=m
CONFIG_NET_ACT_SIMP=m
CONFIG_NET_ACT_SKBEDIT=m
CONFIG_NET_ACT_CSUM=m
# CONFIG_NET_ACT_VLAN is not set
# CONFIG_NET_ACT_BPF is not set
# CONFIG_NET_CLS_IND is not set
CONFIG_NET_SCH_FIFO=y
# CONFIG_DCB is not set
CONFIG_DNS_RESOLVER=y
# CONFIG_BATMAN_ADV is not set
# CONFIG_OPENVSWITCH is not set
# CONFIG_VSOCKETS is not set
# CONFIG_NETLINK_MMAP is not set
# CONFIG_NETLINK_DIAG is not set
# CONFIG_MPLS is not set
# CONFIG_HSR is not set
# CONFIG_NET_SWITCHDEV is not set
# CONFIG_NET_L3_MASTER_DEV is not set
CONFIG_RPS=y
CONFIG_RFS_ACCEL=y
CONFIG_XPS=y
CONFIG_SOCK_CGROUP_DATA=y
# CONFIG_CGROUP_NET_PRIO is not set
CONFIG_CGROUP_NET_CLASSID=y
CONFIG_NET_RX_BUSY_POLL=y
CONFIG_BQL=y
CONFIG_BPF_JIT=y
CONFIG_NET_FLOW_LIMIT=y

#
# Network testing
#
CONFIG_NET_PKTGEN=m
CONFIG_NET_TCPPROBE=m
# CONFIG_NET_DROP_MONITOR is not set
# CONFIG_CAN is not set
# CONFIG_AF_RXRPC is not set
CONFIG_FIB_RULES=y
# CONFIG_WIMAX is not set
# CONFIG_RFKILL is not set
# CONFIG_NET_9P is not set
# CONFIG_CAIF is not set
# CONFIG_CEPH_LIB is not set
# CONFIG_NFC is not set
# CONFIG_LWTUNNEL is not set
CONFIG_HAVE_BPF_JIT=y
# CONFIG_PCMCIA is not set
CONFIG_CCW=y

#
# Device Drivers
#

#
# Generic Driver Options
#
CONFIG_UEVENT_HELPER=y
CONFIG_UEVENT_HELPER_PATH=""
CONFIG_DEVTMPFS=y
# CONFIG_DEVTMPFS_MOUNT is not set
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_FW_LOADER=y
CONFIG_FIRMWARE_IN_KERNEL=y
CONFIG_EXTRA_FIRMWARE=""
# CONFIG_FW_LOADER_USER_HELPER_FALLBACK is not set
CONFIG_ALLOW_DEV_COREDUMP=y
# CONFIG_DEBUG_DRIVER is not set
# CONFIG_DEBUG_DEVRES is not set
CONFIG_SYS_HYPERVISOR=y
# CONFIG_GENERIC_CPU_DEVICES is not set
CONFIG_GENERIC_CPU_AUTOPROBE=y
# CONFIG_DMA_SHARED_BUFFER is not set

#
# Bus devices
#
CONFIG_CONNECTOR=y
CONFIG_PROC_EVENTS=y
# CONFIG_MTD is not set
# CONFIG_OF is not set
# CONFIG_PARPORT is not set
CONFIG_BLK_DEV=y
# CONFIG_BLK_DEV_NULL_BLK is not set
# CONFIG_BLK_DEV_PCIESSD_MTIP32XX is not set
# CONFIG_BLK_CPQ_CISS_DA is not set
# CONFIG_BLK_DEV_DAC960 is not set
# CONFIG_BLK_DEV_UMEM is not set
# CONFIG_BLK_DEV_COW_COMMON is not set
CONFIG_BLK_DEV_LOOP=m
CONFIG_BLK_DEV_LOOP_MIN_COUNT=8
CONFIG_BLK_DEV_CRYPTOLOOP=m
# CONFIG_BLK_DEV_DRBD is not set
# CONFIG_BLK_DEV_NBD is not set
# CONFIG_BLK_DEV_SKD is not set
CONFIG_BLK_DEV_OSD=m
# CONFIG_BLK_DEV_SX8 is not set
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_BLK_DEV_RAM_SIZE=32768
# CONFIG_CDROM_PKTCDVD is not set
CONFIG_ATA_OVER_ETH=m

#
# S/390 block device drivers
#
CONFIG_BLK_DEV_XPRAM=m
CONFIG_DCSSBLK=m
CONFIG_DASD=y
CONFIG_DASD_PROFILE=y
CONFIG_DASD_ECKD=y
CONFIG_DASD_FBA=y
CONFIG_DASD_DIAG=y
CONFIG_DASD_EER=y
CONFIG_SCM_BLOCK=m
CONFIG_SCM_BLOCK_CLUSTER_WRITE=y
CONFIG_VIRTIO_BLK=y
# CONFIG_BLK_DEV_RBD is not set
# CONFIG_BLK_DEV_RSXX is not set
# CONFIG_BLK_DEV_NVME is not set

#
# Misc devices
#
# CONFIG_SENSORS_LIS3LV02D is not set
# CONFIG_DUMMY_IRQ is not set
# CONFIG_PHANTOM is not set
# CONFIG_SGI_IOC4 is not set
# CONFIG_TIFM_CORE is not set
# CONFIG_ENCLOSURE_SERVICES is not set
# CONFIG_HP_ILO is not set
# CONFIG_SRAM is not set
# CONFIG_C2PORT is not set

#
# EEPROM support
#
# CONFIG_EEPROM_93CX6 is not set
# CONFIG_CB710_CORE is not set

#
# Texas Instruments shared transport line discipline
#

#
# Altera FPGA firmware download module
#

#
# Intel MIC Bus Driver
#

#
# SCIF Bus Driver
#

#
# Intel MIC Host Driver
#

#
# Intel MIC Card Driver
#

#
# SCIF Driver
#

#
# Intel MIC Coprocessor State Management (COSM) Drivers
#
CONFIG_GENWQE=m
CONFIG_GENWQE_PLATFORM_ERROR_RECOVERY=0
# CONFIG_ECHO is not set
# CONFIG_CXL_BASE is not set
# CONFIG_CXL_KERNEL_API is not set
# CONFIG_CXL_EEH is not set

#
# SCSI device support
#
CONFIG_SCSI_MOD=y
CONFIG_RAID_ATTRS=m
CONFIG_SCSI=y
CONFIG_SCSI_DMA=y
CONFIG_SCSI_NETLINK=y
CONFIG_SCSI_MQ_DEFAULT=y
CONFIG_SCSI_PROC_FS=y

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=y
CONFIG_CHR_DEV_ST=m
CONFIG_CHR_DEV_OSST=m
CONFIG_BLK_DEV_SR=m
# CONFIG_BLK_DEV_SR_VENDOR is not set
CONFIG_CHR_DEV_SG=y
CONFIG_CHR_DEV_SCH=m
CONFIG_SCSI_CONSTANTS=y
CONFIG_SCSI_LOGGING=y
# CONFIG_SCSI_SCAN_ASYNC is not set

#
# SCSI Transports
#
CONFIG_SCSI_SPI_ATTRS=m
CONFIG_SCSI_FC_ATTRS=y
CONFIG_SCSI_ISCSI_ATTRS=m
CONFIG_SCSI_SAS_ATTRS=m
CONFIG_SCSI_SAS_LIBSAS=m
CONFIG_SCSI_SAS_HOST_SMP=y
CONFIG_SCSI_SRP_ATTRS=m
CONFIG_SCSI_LOWLEVEL=y
CONFIG_ISCSI_TCP=m
# CONFIG_ISCSI_BOOT_SYSFS is not set
# CONFIG_SCSI_CXGB3_ISCSI is not set
# CONFIG_SCSI_CXGB4_ISCSI is not set
# CONFIG_SCSI_BNX2_ISCSI is not set
# CONFIG_SCSI_BNX2X_FCOE is not set
# CONFIG_BE2ISCSI is not set
# CONFIG_BLK_DEV_3W_XXXX_RAID is not set
# CONFIG_SCSI_HPSA is not set
# CONFIG_SCSI_3W_9XXX is not set
# CONFIG_SCSI_3W_SAS is not set
# CONFIG_SCSI_ACARD is not set
# CONFIG_SCSI_AACRAID is not set
# CONFIG_SCSI_AIC7XXX is not set
# CONFIG_SCSI_AIC79XX is not set
# CONFIG_SCSI_AIC94XX is not set
# CONFIG_SCSI_MVSAS is not set
# CONFIG_SCSI_MVUMI is not set
# CONFIG_SCSI_DPT_I2O is not set
# CONFIG_SCSI_ADVANSYS is not set
# CONFIG_SCSI_ARCMSR is not set
# CONFIG_SCSI_ESAS2R is not set
# CONFIG_MEGARAID_NEWGEN is not set
# CONFIG_MEGARAID_LEGACY is not set
# CONFIG_MEGARAID_SAS is not set
# CONFIG_SCSI_MPT3SAS is not set
# CONFIG_SCSI_MPT2SAS is not set
# CONFIG_SCSI_UFSHCD is not set
# CONFIG_SCSI_HPTIOP is not set
CONFIG_LIBFC=m
CONFIG_LIBFCOE=m
# CONFIG_FCOE is not set
# CONFIG_SCSI_SNIC is not set
# CONFIG_SCSI_DMX3191D is not set
# CONFIG_SCSI_FUTURE_DOMAIN is not set
# CONFIG_SCSI_IPS is not set
# CONFIG_SCSI_INITIO is not set
# CONFIG_SCSI_INIA100 is not set
# CONFIG_SCSI_STEX is not set
# CONFIG_SCSI_SYM53C8XX_2 is not set
# CONFIG_SCSI_QLOGIC_1280 is not set
# CONFIG_SCSI_QLA_FC is not set
# CONFIG_SCSI_QLA_ISCSI is not set
# CONFIG_SCSI_LPFC is not set
# CONFIG_SCSI_DC395x is not set
# CONFIG_SCSI_AM53C974 is not set
# CONFIG_SCSI_WD719X is not set
CONFIG_SCSI_DEBUG=m
CONFIG_ZFCP=y
# CONFIG_SCSI_PMCRAID is not set
# CONFIG_SCSI_PM8001 is not set
# CONFIG_SCSI_BFA_FC is not set
CONFIG_SCSI_VIRTIO=m
# CONFIG_SCSI_CHELSIO_FCOE is not set
# CONFIG_SCSI_DH is not set
CONFIG_SCSI_OSD_INITIATOR=m
CONFIG_SCSI_OSD_ULD=m
CONFIG_SCSI_OSD_DPRINT_SENSE=1
# CONFIG_SCSI_OSD_DEBUG is not set
CONFIG_MD=y
# CONFIG_BLK_DEV_MD is not set
# CONFIG_BCACHE is not set
CONFIG_BLK_DEV_DM_BUILTIN=y
CONFIG_BLK_DEV_DM=m
# CONFIG_DM_MQ_DEFAULT is not set
# CONFIG_DM_DEBUG is not set
# CONFIG_DM_CRYPT is not set
# CONFIG_DM_SNAPSHOT is not set
# CONFIG_DM_THIN_PROVISIONING is not set
# CONFIG_DM_CACHE is not set
# CONFIG_DM_ERA is not set
CONFIG_DM_MIRROR=m
# CONFIG_DM_LOG_USERSPACE is not set
# CONFIG_DM_RAID is not set
# CONFIG_DM_ZERO is not set
# CONFIG_DM_MULTIPATH is not set
# CONFIG_DM_DELAY is not set
# CONFIG_DM_UEVENT is not set
# CONFIG_DM_FLAKEY is not set
# CONFIG_DM_VERITY is not set
# CONFIG_DM_SWITCH is not set
# CONFIG_DM_LOG_WRITES is not set
# CONFIG_TARGET_CORE is not set
# CONFIG_FUSION is not set

#
# IEEE 1394 (FireWire) support
#
# CONFIG_FIREWIRE is not set
# CONFIG_FIREWIRE_NOSY is not set
CONFIG_NETDEVICES=y
CONFIG_NET_CORE=y
CONFIG_BONDING=m
CONFIG_DUMMY=m
CONFIG_EQUALIZER=m
# CONFIG_NET_FC is not set
CONFIG_IFB=m
# CONFIG_NET_TEAM is not set
CONFIG_MACVLAN=m
CONFIG_MACVTAP=m
# CONFIG_IPVLAN is not set
CONFIG_VXLAN=m
# CONFIG_GENEVE is not set
# CONFIG_NETCONSOLE is not set
# CONFIG_NETPOLL is not set
# CONFIG_NET_POLL_CONTROLLER is not set
CONFIG_TUN=m
# CONFIG_TUN_VNET_CROSS_LE is not set
CONFIG_VETH=m
CONFIG_VIRTIO_NET=m
CONFIG_NLMON=m
# CONFIG_ARCNET is not set

#
# CAIF transport drivers
#
CONFIG_VHOST_NET=m
CONFIG_VHOST_RING=m
CONFIG_VHOST=m
# CONFIG_VHOST_CROSS_ENDIAN_LEGACY is not set
CONFIG_ETHERNET=y
# CONFIG_NET_VENDOR_3COM is not set
# CONFIG_NET_VENDOR_ADAPTEC is not set
# CONFIG_NET_VENDOR_AGERE is not set
# CONFIG_NET_VENDOR_ALTEON is not set
# CONFIG_ALTERA_TSE is not set
# CONFIG_NET_VENDOR_AMD is not set
# CONFIG_NET_VENDOR_ARC is not set
# CONFIG_NET_VENDOR_ATHEROS is not set
# CONFIG_NET_VENDOR_AURORA is not set
# CONFIG_NET_CADENCE is not set
# CONFIG_NET_VENDOR_BROADCOM is not set
# CONFIG_NET_VENDOR_BROCADE is not set
# CONFIG_NET_VENDOR_CAVIUM is not set
# CONFIG_NET_VENDOR_CHELSIO is not set
# CONFIG_NET_VENDOR_CISCO is not set
# CONFIG_DNET is not set
# CONFIG_NET_VENDOR_DEC is not set
# CONFIG_NET_VENDOR_DLINK is not set
CONFIG_NET_VENDOR_EMULEX=y
# CONFIG_BE2NET is not set
# CONFIG_NET_VENDOR_EZCHIP is not set
# CONFIG_NET_VENDOR_EXAR is not set
# CONFIG_NET_VENDOR_HP is not set
# CONFIG_NET_VENDOR_INTEL is not set
# CONFIG_JME is not set
# CONFIG_NET_VENDOR_MARVELL is not set
CONFIG_NET_VENDOR_MELLANOX=y
CONFIG_MLX4_EN=m
CONFIG_MLX4_EN_VXLAN=y
CONFIG_MLX4_CORE=m
CONFIG_MLX4_DEBUG=y
CONFIG_MLX5_CORE=m
CONFIG_MLX5_CORE_EN=y
CONFIG_MLXSW_CORE=m
CONFIG_MLXSW_PCI=m
# CONFIG_NET_VENDOR_MICREL is not set
# CONFIG_NET_VENDOR_MYRI is not set
# CONFIG_FEALNX is not set
# CONFIG_NET_VENDOR_NATSEMI is not set
CONFIG_NET_VENDOR_NETRONOME=y
# CONFIG_NFP_NETVF is not set
# CONFIG_NET_VENDOR_NVIDIA is not set
# CONFIG_NET_VENDOR_OKI is not set
# CONFIG_ETHOC is not set
# CONFIG_NET_PACKET_ENGINE is not set
# CONFIG_NET_VENDOR_QLOGIC is not set
# CONFIG_NET_VENDOR_QUALCOMM is not set
# CONFIG_NET_VENDOR_REALTEK is not set
# CONFIG_NET_VENDOR_RENESAS is not set
# CONFIG_NET_VENDOR_RDC is not set
# CONFIG_NET_VENDOR_ROCKER is not set
# CONFIG_NET_VENDOR_SAMSUNG is not set
# CONFIG_NET_VENDOR_SEEQ is not set
# CONFIG_NET_VENDOR_SILAN is not set
# CONFIG_NET_VENDOR_SIS is not set
# CONFIG_SFC is not set
# CONFIG_NET_VENDOR_SMSC is not set
# CONFIG_NET_VENDOR_STMICRO is not set
# CONFIG_NET_VENDOR_SUN is not set
# CONFIG_NET_VENDOR_SYNOPSYS is not set
# CONFIG_NET_VENDOR_TEHUTI is not set
# CONFIG_NET_VENDOR_TI is not set
# CONFIG_NET_VENDOR_VIA is not set
# CONFIG_NET_VENDOR_WIZNET is not set
# CONFIG_FDDI is not set
# CONFIG_HIPPI is not set
# CONFIG_PHYLIB is not set
# CONFIG_PPP is not set
# CONFIG_SLIP is not set

#
# S/390 network device drivers
#
CONFIG_LCS=m
CONFIG_CTCM=m
CONFIG_NETIUCV=m
CONFIG_SMSGIUCV=m
CONFIG_SMSGIUCV_EVENT=m
CONFIG_QETH=y
CONFIG_QETH_L2=y
CONFIG_QETH_L3=y
CONFIG_QETH_IPV6=y
CONFIG_CCWGROUP=y

#
# Host-side USB support is needed for USB Network Adapter support
#

#
# Enable WiMAX (Networking options) to see the WiMAX drivers
#
# CONFIG_WAN is not set
# CONFIG_VMXNET3 is not set
# CONFIG_NVM is not set

#
# Input device support
#
CONFIG_INPUT=y
# CONFIG_INPUT_FF_MEMLESS is not set
# CONFIG_INPUT_POLLDEV is not set
# CONFIG_INPUT_SPARSEKMAP is not set
# CONFIG_INPUT_MATRIXKMAP is not set

#
# Userland interfaces
#
CONFIG_INPUT_MOUSEDEV=y
# CONFIG_INPUT_MOUSEDEV_PSAUX is not set
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
# CONFIG_INPUT_JOYDEV is not set
# CONFIG_INPUT_EVDEV is not set
# CONFIG_INPUT_EVBUG is not set

#
# Input Device Drivers
#
# CONFIG_INPUT_KEYBOARD is not set
# CONFIG_INPUT_MOUSE is not set
# CONFIG_INPUT_JOYSTICK is not set
# CONFIG_INPUT_TABLET is not set
# CONFIG_INPUT_TOUCHSCREEN is not set
# CONFIG_INPUT_MISC is not set

#
# Hardware I/O ports
#
# CONFIG_SERIO is not set
# CONFIG_GAMEPORT is not set

#
# Character devices
#
CONFIG_TTY=y
CONFIG_UNIX98_PTYS=y
CONFIG_DEVPTS_MULTIPLE_INSTANCES=y
CONFIG_LEGACY_PTYS=y
CONFIG_LEGACY_PTY_COUNT=0
# CONFIG_SERIAL_NONSTANDARD is not set
# CONFIG_NOZOMI is not set
# CONFIG_N_GSM is not set
# CONFIG_TRACE_SINK is not set
CONFIG_DEVMEM=y
CONFIG_DEVKMEM=y

#
# Serial drivers
#
# CONFIG_SERIAL_8250 is not set

#
# Non-8250 serial port support
#
# CONFIG_SERIAL_UARTLITE is not set
# CONFIG_SERIAL_JSM is not set
# CONFIG_SERIAL_SCCNXP is not set
# CONFIG_SERIAL_ALTERA_JTAGUART is not set
# CONFIG_SERIAL_ALTERA_UART is not set
# CONFIG_SERIAL_ARC is not set
# CONFIG_SERIAL_RP2 is not set
# CONFIG_SERIAL_FSL_LPUART is not set
CONFIG_HVC_DRIVER=y
CONFIG_HVC_IUCV=y
CONFIG_VIRTIO_CONSOLE=y
# CONFIG_IPMI_HANDLER is not set
CONFIG_HW_RANDOM=m
# CONFIG_HW_RANDOM_TIMERIOMEM is not set
CONFIG_HW_RANDOM_VIRTIO=m
CONFIG_HW_RANDOM_TPM=m
# CONFIG_R3964 is not set
# CONFIG_APPLICOM is not set
CONFIG_RAW_DRIVER=m
CONFIG_MAX_RAW_DEVS=256
CONFIG_HANGCHECK_TIMER=m
CONFIG_TCG_TPM=y
CONFIG_DEVPORT=y

#
# S/390 character device drivers
#
CONFIG_TN3270=y
CONFIG_TN3270_TTY=y
CONFIG_TN3270_FS=y
CONFIG_TN3270_CONSOLE=y
CONFIG_TN3215=y
CONFIG_TN3215_CONSOLE=y
CONFIG_CCW_CONSOLE=y
CONFIG_SCLP_TTY=y
CONFIG_SCLP_CONSOLE=y
CONFIG_SCLP_VT220_TTY=y
CONFIG_SCLP_VT220_CONSOLE=y
CONFIG_SCLP_ASYNC=m
CONFIG_SCLP_ASYNC_ID="000000000"
CONFIG_HMC_DRV=m
# CONFIG_SCLP_OFB is not set
CONFIG_S390_TAPE=m

#
# S/390 tape hardware support
#
CONFIG_S390_TAPE_34XX=m
CONFIG_S390_TAPE_3590=m
CONFIG_VMLOGRDR=m
CONFIG_VMCP=y
CONFIG_MONREADER=m
CONFIG_MONWRITER=m
CONFIG_S390_VMUR=m
# CONFIG_XILLYBUS is not set

#
# I2C support
#
# CONFIG_I2C is not set
# CONFIG_SPI is not set
# CONFIG_SPMI is not set
# CONFIG_HSI is not set

#
# PPS support
#
CONFIG_PPS=m
# CONFIG_PPS_DEBUG is not set

#
# PPS clients support
#
# CONFIG_PPS_CLIENT_KTIMER is not set
# CONFIG_PPS_CLIENT_LDISC is not set
# CONFIG_PPS_CLIENT_GPIO is not set

#
# PPS generators support
#

#
# PTP clock support
#
CONFIG_PTP_1588_CLOCK=m

#
# Enable PHYLIB and NETWORK_PHY_TIMESTAMPING to see the additional clocks.
#
# CONFIG_W1 is not set
# CONFIG_POWER_SUPPLY is not set
# CONFIG_POWER_RESET is not set
# CONFIG_POWER_AVS is not set
# CONFIG_HWMON is not set
# CONFIG_THERMAL is not set
CONFIG_WATCHDOG=y
CONFIG_WATCHDOG_CORE=y
CONFIG_WATCHDOG_NOWAYOUT=y
# CONFIG_WATCHDOG_SYSFS is not set

#
# Watchdog Device Drivers
#
# CONFIG_SOFT_WATCHDOG is not set
# CONFIG_XILINX_WATCHDOG is not set
# CONFIG_CADENCE_WATCHDOG is not set
# CONFIG_DW_WATCHDOG is not set
# CONFIG_MAX63XX_WATCHDOG is not set
# CONFIG_ALIM7101_WDT is not set
# CONFIG_I6300ESB_WDT is not set
# CONFIG_BCM7038_WDT is not set
CONFIG_DIAG288_WATCHDOG=m

#
# PCI-based Watchdog Cards
#
# CONFIG_PCIPCWATCHDOG is not set
# CONFIG_WDTPCI is not set
CONFIG_SSB_POSSIBLE=y

#
# Sonics Silicon Backplane
#
# CONFIG_SSB is not set
CONFIG_BCMA_POSSIBLE=y

#
# Broadcom specific AMBA
#
# CONFIG_BCMA is not set

#
# Multifunction device drivers
#
# CONFIG_MFD_CORE is not set
# CONFIG_HTC_PASIC3 is not set
# CONFIG_LPC_ICH is not set
# CONFIG_LPC_SCH is not set
# CONFIG_MFD_JANZ_CMODIO is not set
# CONFIG_MFD_KEMPLD is not set
# CONFIG_MFD_MT6397 is not set
# CONFIG_MFD_RDC321X is not set
# CONFIG_MFD_RTSX_PCI is not set
# CONFIG_MFD_SM501 is not set
# CONFIG_ABX500_CORE is not set
# CONFIG_MFD_SYSCON is not set
# CONFIG_MFD_TI_AM335X_TSCADC is not set
# CONFIG_MFD_TMIO is not set
# CONFIG_MFD_VX855 is not set
# CONFIG_REGULATOR is not set
# CONFIG_MEDIA_SUPPORT is not set

#
# Graphics support
#
# CONFIG_DRM is not set

#
# Frame buffer Devices
#
# CONFIG_FB is not set
# CONFIG_BACKLIGHT_LCD_SUPPORT is not set
# CONFIG_VGASTATE is not set
# CONFIG_SOUND is not set

#
# HID support
#
# CONFIG_HID is not set
CONFIG_USB_OHCI_LITTLE_ENDIAN=y
# CONFIG_USB_SUPPORT is not set
# CONFIG_UWB is not set
# CONFIG_MMC is not set
# CONFIG_MEMSTICK is not set
# CONFIG_NEW_LEDS is not set
# CONFIG_ACCESSIBILITY is not set
CONFIG_INFINIBAND=m
# CONFIG_INFINIBAND_USER_MAD is not set
CONFIG_INFINIBAND_USER_ACCESS=m
CONFIG_INFINIBAND_USER_MEM=y
CONFIG_INFINIBAND_ON_DEMAND_PAGING=y
CONFIG_INFINIBAND_ADDR_TRANS=y
CONFIG_INFINIBAND_ADDR_TRANS_CONFIGFS=y
# CONFIG_INFINIBAND_MTHCA is not set
# CONFIG_INFINIBAND_QIB is not set
CONFIG_MLX4_INFINIBAND=m
# CONFIG_MLX5_INFINIBAND is not set
# CONFIG_INFINIBAND_NES is not set
# CONFIG_INFINIBAND_OCRDMA is not set
# CONFIG_INFINIBAND_IPOIB is not set
# CONFIG_INFINIBAND_SRP is not set
# CONFIG_INFINIBAND_ISER is not set
# CONFIG_DMADEVICES is not set
# CONFIG_AUXDISPLAY is not set
# CONFIG_UIO is not set
CONFIG_VFIO_IOMMU_TYPE1=m
CONFIG_VFIO_VIRQFD=m
CONFIG_VFIO=m
# CONFIG_VFIO_NOIOMMU is not set
CONFIG_VFIO_PCI=m
CONFIG_IRQ_BYPASS_MANAGER=m
# CONFIG_VIRT_DRIVERS is not set
CONFIG_VIRTIO=y

#
# Virtio drivers
#
CONFIG_VIRTIO_PCI=m
CONFIG_VIRTIO_PCI_LEGACY=y
CONFIG_VIRTIO_BALLOON=m
# CONFIG_VIRTIO_INPUT is not set
# CONFIG_VIRTIO_MMIO is not set

#
# Microsoft Hyper-V guest support
#
# CONFIG_STAGING is not set

#
# Hardware Spinlock drivers
#

#
# Clock Source drivers
#
# CONFIG_ATMEL_PIT is not set
# CONFIG_SH_TIMER_CMT is not set
# CONFIG_SH_TIMER_MTU2 is not set
# CONFIG_SH_TIMER_TMU is not set
# CONFIG_EM_TIMER_STI is not set
# CONFIG_MAILBOX is not set
CONFIG_IOMMU_API=y
CONFIG_IOMMU_SUPPORT=y

#
# Generic IOMMU Pagetable Support
#
CONFIG_S390_IOMMU=y

#
# Remoteproc drivers
#
# CONFIG_STE_MODEM_RPROC is not set

#
# Rpmsg drivers
#

#
# SOC (System On Chip) specific Drivers
#
# CONFIG_SUNXI_SRAM is not set
# CONFIG_SOC_TI is not set
# CONFIG_PM_DEVFREQ is not set
# CONFIG_EXTCON is not set
# CONFIG_MEMORY is not set
# CONFIG_IIO is not set
# CONFIG_NTB is not set
# CONFIG_VME_BUS is not set
# CONFIG_PWM is not set
CONFIG_ARM_GIC_MAX_NR=1
# CONFIG_TS4800_IRQ is not set
# CONFIG_IPACK_BUS is not set
# CONFIG_RESET_CONTROLLER is not set
# CONFIG_FMC is not set

#
# PHY Subsystem
#
# CONFIG_GENERIC_PHY is not set
# CONFIG_PHY_PXA_28NM_HSIC is not set
# CONFIG_PHY_PXA_28NM_USB2 is not set
# CONFIG_BCM_KONA_USB2_PHY is not set
# CONFIG_PHY_HI6220_USB is not set
# CONFIG_POWERCAP is not set
# CONFIG_MCB is not set

#
# Performance monitor support
#
# CONFIG_RAS is not set
# CONFIG_THUNDERBOLT is not set

#
# Android
#
# CONFIG_ANDROID is not set
# CONFIG_LIBNVDIMM is not set
# CONFIG_NVMEM is not set
# CONFIG_STM is not set
# CONFIG_STM_DUMMY is not set
# CONFIG_STM_SOURCE_CONSOLE is not set
# CONFIG_INTEL_TH is not set

#
# FPGA Configuration Support
#
# CONFIG_FPGA is not set

#
# File systems
#
# CONFIG_EXT2_FS is not set
# CONFIG_EXT3_FS is not set
CONFIG_EXT4_FS=y
CONFIG_EXT4_USE_FOR_EXT2=y
CONFIG_EXT4_FS_POSIX_ACL=y
CONFIG_EXT4_FS_SECURITY=y
# CONFIG_EXT4_ENCRYPTION is not set
# CONFIG_EXT4_DEBUG is not set
CONFIG_JBD2=y
CONFIG_JBD2_DEBUG=y
CONFIG_FS_MBCACHE=y
# CONFIG_REISERFS_FS is not set
# CONFIG_JFS_FS is not set
# CONFIG_XFS_FS is not set
# CONFIG_GFS2_FS is not set
# CONFIG_OCFS2_FS is not set
CONFIG_BTRFS_FS=m
CONFIG_BTRFS_FS_POSIX_ACL=y
# CONFIG_BTRFS_FS_CHECK_INTEGRITY is not set
# CONFIG_BTRFS_FS_RUN_SANITY_TESTS is not set
# CONFIG_BTRFS_DEBUG is not set
# CONFIG_BTRFS_ASSERT is not set
# CONFIG_NILFS2_FS is not set
# CONFIG_F2FS_FS is not set
# CONFIG_FS_DAX is not set
CONFIG_FS_POSIX_ACL=y
CONFIG_EXPORTFS=y
CONFIG_FILE_LOCKING=y
# CONFIG_MANDATORY_FILE_LOCKING is not set
CONFIG_FSNOTIFY=y
CONFIG_DNOTIFY=y
CONFIG_INOTIFY_USER=y
CONFIG_FANOTIFY=y
# CONFIG_FANOTIFY_ACCESS_PERMISSIONS is not set
CONFIG_QUOTA=y
CONFIG_QUOTA_NETLINK_INTERFACE=y
CONFIG_PRINT_QUOTA_WARNING=y
# CONFIG_QUOTA_DEBUG is not set
CONFIG_QUOTA_TREE=m
CONFIG_QFMT_V1=m
CONFIG_QFMT_V2=m
CONFIG_QUOTACTL=y
CONFIG_AUTOFS4_FS=m
# CONFIG_FUSE_FS is not set
# CONFIG_OVERLAY_FS is not set

#
# Caches
#
CONFIG_FSCACHE=m
# CONFIG_FSCACHE_STATS is not set
# CONFIG_FSCACHE_HISTOGRAM is not set
# CONFIG_FSCACHE_DEBUG is not set
# CONFIG_FSCACHE_OBJECT_LIST is not set
CONFIG_CACHEFILES=m
# CONFIG_CACHEFILES_DEBUG is not set
# CONFIG_CACHEFILES_HISTOGRAM is not set

#
# CD-ROM/DVD Filesystems
#
# CONFIG_ISO9660_FS is not set
# CONFIG_UDF_FS is not set

#
# DOS/FAT/NT Filesystems
#
# CONFIG_MSDOS_FS is not set
# CONFIG_VFAT_FS is not set
# CONFIG_NTFS_FS is not set

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_PROC_VMCORE=y
CONFIG_PROC_SYSCTL=y
CONFIG_PROC_PAGE_MONITOR=y
# CONFIG_PROC_CHILDREN is not set
CONFIG_KERNFS=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y
CONFIG_TMPFS_XATTR=y
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y
CONFIG_CONFIGFS_FS=m
CONFIG_MISC_FILESYSTEMS=y
# CONFIG_ADFS_FS is not set
# CONFIG_AFFS_FS is not set
# CONFIG_ECRYPT_FS is not set
# CONFIG_HFS_FS is not set
# CONFIG_HFSPLUS_FS is not set
# CONFIG_BEFS_FS is not set
# CONFIG_BFS_FS is not set
# CONFIG_EFS_FS is not set
# CONFIG_LOGFS is not set
# CONFIG_CRAMFS is not set
# CONFIG_SQUASHFS is not set
# CONFIG_VXFS_FS is not set
# CONFIG_MINIX_FS is not set
# CONFIG_MINIX_FS_NATIVE_ENDIAN is not set
# CONFIG_OMFS_FS is not set
# CONFIG_HPFS_FS is not set
# CONFIG_QNX4FS_FS is not set
# CONFIG_QNX6FS_FS is not set
# CONFIG_ROMFS_FS is not set
# CONFIG_PSTORE is not set
# CONFIG_SYSV_FS is not set
# CONFIG_UFS_FS is not set
# CONFIG_EXOFS_FS is not set
# CONFIG_NETWORK_FILESYSTEMS is not set
# CONFIG_NLS is not set
CONFIG_DLM=m
# CONFIG_DLM_DEBUG is not set

#
# Kernel hacking
#
CONFIG_TRACE_IRQFLAGS_SUPPORT=y

#
# printk and dmesg options
#
CONFIG_PRINTK_TIME=y
CONFIG_MESSAGE_LOGLEVEL_DEFAULT=4
CONFIG_DYNAMIC_DEBUG=y

#
# Compile-time checks and compiler options
#
CONFIG_DEBUG_INFO=y
# CONFIG_DEBUG_INFO_REDUCED is not set
# CONFIG_DEBUG_INFO_SPLIT is not set
# CONFIG_DEBUG_INFO_DWARF4 is not set
# CONFIG_GDB_SCRIPTS is not set
CONFIG_ENABLE_WARN_DEPRECATED=y
# CONFIG_ENABLE_MUST_CHECK is not set
CONFIG_FRAME_WARN=1024
# CONFIG_STRIP_ASM_SYMS is not set
CONFIG_READABLE_ASM=y
CONFIG_UNUSED_SYMBOLS=y
# CONFIG_PAGE_OWNER is not set
CONFIG_DEBUG_FS=y
# CONFIG_HEADERS_CHECK is not set
# CONFIG_DEBUG_SECTION_MISMATCH is not set
# CONFIG_SECTION_MISMATCH_WARN_ONLY is not set
# CONFIG_DEBUG_FORCE_WEAK_PER_CPU is not set
CONFIG_MAGIC_SYSRQ=y
CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE=0x1
CONFIG_DEBUG_KERNEL=y

#
# Memory Debugging
#
CONFIG_PAGE_EXTENSION=y
CONFIG_DEBUG_PAGEALLOC=y
CONFIG_DEBUG_OBJECTS=y
CONFIG_DEBUG_OBJECTS_SELFTEST=y
CONFIG_DEBUG_OBJECTS_FREE=y
CONFIG_DEBUG_OBJECTS_TIMERS=y
CONFIG_DEBUG_OBJECTS_WORK=y
CONFIG_DEBUG_OBJECTS_RCU_HEAD=y
CONFIG_DEBUG_OBJECTS_PERCPU_COUNTER=y
CONFIG_DEBUG_OBJECTS_ENABLE_DEFAULT=1
CONFIG_SLUB_DEBUG_ON=y
CONFIG_SLUB_STATS=y
CONFIG_HAVE_DEBUG_KMEMLEAK=y
CONFIG_DEBUG_KMEMLEAK=y
CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE=400
# CONFIG_DEBUG_KMEMLEAK_TEST is not set
# CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF is not set
CONFIG_DEBUG_STACK_USAGE=y
CONFIG_DEBUG_VM=y
# CONFIG_DEBUG_VM_VMACACHE is not set
CONFIG_DEBUG_VM_RB=y
# CONFIG_DEBUG_VM_PGFLAGS is not set
CONFIG_DEBUG_MEMORY_INIT=y
CONFIG_MEMORY_NOTIFIER_ERROR_INJECT=m
CONFIG_DEBUG_PER_CPU_MAPS=y
CONFIG_DEBUG_SHIRQ=y

#
# Debug Lockups and Hangs
#
CONFIG_DETECT_HUNG_TASK=y
CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=120
# CONFIG_BOOTPARAM_HUNG_TASK_PANIC is not set
CONFIG_BOOTPARAM_HUNG_TASK_PANIC_VALUE=0
# CONFIG_WQ_WATCHDOG is not set
# CONFIG_PANIC_ON_OOPS is not set
CONFIG_PANIC_ON_OOPS_VALUE=0
CONFIG_PANIC_TIMEOUT=0
CONFIG_SCHED_DEBUG=y
CONFIG_SCHED_INFO=y
CONFIG_SCHEDSTATS=y
# CONFIG_SCHED_STACK_END_CHECK is not set
# CONFIG_DEBUG_TIMEKEEPING is not set
CONFIG_TIMER_STATS=y
CONFIG_DEBUG_PREEMPT=y

#
# Lock Debugging (spinlocks, mutexes, etc...)
#
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
CONFIG_LOCKDEP=y
CONFIG_LOCK_STAT=y
CONFIG_DEBUG_LOCKDEP=y
CONFIG_DEBUG_ATOMIC_SLEEP=y
CONFIG_DEBUG_LOCKING_API_SELFTESTS=y
# CONFIG_LOCK_TORTURE_TEST is not set
CONFIG_TRACE_IRQFLAGS=y
CONFIG_STACKTRACE=y
# CONFIG_DEBUG_KOBJECT is not set
# CONFIG_DEBUG_KOBJECT_RELEASE is not set
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_DEBUG_LIST=y
# CONFIG_DEBUG_PI_LIST is not set
CONFIG_DEBUG_SG=y
CONFIG_DEBUG_NOTIFIERS=y
CONFIG_DEBUG_CREDENTIALS=y

#
# RCU Debugging
#
CONFIG_PROVE_RCU=y
# CONFIG_PROVE_RCU_REPEATEDLY is not set
# CONFIG_SPARSE_RCU_POINTER is not set
CONFIG_TORTURE_TEST=m
CONFIG_RCU_TORTURE_TEST=m
# CONFIG_RCU_TORTURE_TEST_SLOW_PREINIT is not set
# CONFIG_RCU_TORTURE_TEST_SLOW_INIT is not set
# CONFIG_RCU_TORTURE_TEST_SLOW_CLEANUP is not set
CONFIG_RCU_CPU_STALL_TIMEOUT=300
# CONFIG_RCU_TRACE is not set
# CONFIG_RCU_EQS_DEBUG is not set
# CONFIG_DEBUG_WQ_FORCE_RR_CPU is not set
# CONFIG_DEBUG_BLOCK_EXT_DEVT is not set
CONFIG_NOTIFIER_ERROR_INJECTION=m
CONFIG_CPU_NOTIFIER_ERROR_INJECT=m
CONFIG_PM_NOTIFIER_ERROR_INJECT=m
# CONFIG_NETDEV_NOTIFIER_ERROR_INJECT is not set
CONFIG_FAULT_INJECTION=y
CONFIG_FAILSLAB=y
CONFIG_FAIL_PAGE_ALLOC=y
CONFIG_FAIL_MAKE_REQUEST=y
CONFIG_FAIL_IO_TIMEOUT=y
# CONFIG_FAIL_FUTEX is not set
CONFIG_FAULT_INJECTION_DEBUG_FS=y
CONFIG_FAULT_INJECTION_STACKTRACE_FILTER=y
CONFIG_LATENCYTOP=y
CONFIG_ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS=y
CONFIG_DEBUG_STRICT_USER_COPY_CHECKS=y
CONFIG_NOP_TRACER=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
CONFIG_TRACE_CLOCK=y
CONFIG_RING_BUFFER=y
CONFIG_EVENT_TRACING=y
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_RING_BUFFER_ALLOW_SWAP=y
CONFIG_TRACING=y
CONFIG_GENERIC_TRACER=y
CONFIG_TRACING_SUPPORT=y
CONFIG_FTRACE=y
# CONFIG_FUNCTION_TRACER is not set
# CONFIG_IRQSOFF_TRACER is not set
# CONFIG_PREEMPT_TRACER is not set
# CONFIG_SCHED_TRACER is not set
# CONFIG_FTRACE_SYSCALLS is not set
# CONFIG_TRACER_SNAPSHOT is not set
CONFIG_BRANCH_PROFILE_NONE=y
# CONFIG_PROFILE_ANNOTATED_BRANCHES is not set
# CONFIG_PROFILE_ALL_BRANCHES is not set
# CONFIG_STACK_TRACER is not set
CONFIG_BLK_DEV_IO_TRACE=y
# CONFIG_KPROBE_EVENT is not set
# CONFIG_UPROBE_EVENT is not set
# CONFIG_PROBE_EVENTS is not set
# CONFIG_FTRACE_STARTUP_TEST is not set
# CONFIG_TRACEPOINT_BENCHMARK is not set
# CONFIG_RING_BUFFER_BENCHMARK is not set
# CONFIG_RING_BUFFER_STARTUP_TEST is not set
# CONFIG_TRACE_ENUM_MAP_FILE is not set

#
# Runtime Testing
#
CONFIG_LKDTM=m
CONFIG_TEST_LIST_SORT=y
CONFIG_KPROBES_SANITY_TEST=y
# CONFIG_BACKTRACE_SELF_TEST is not set
CONFIG_RBTREE_TEST=y
CONFIG_INTERVAL_TREE_TEST=m
CONFIG_PERCPU_TEST=m
CONFIG_ATOMIC64_SELFTEST=y
# CONFIG_TEST_HEXDUMP is not set
# CONFIG_TEST_STRING_HELPERS is not set
# CONFIG_TEST_KSTRTOX is not set
# CONFIG_TEST_PRINTF is not set
# CONFIG_TEST_RHASHTABLE is not set
CONFIG_DMA_API_DEBUG=y
# CONFIG_TEST_LKM is not set
# CONFIG_TEST_USER_COPY is not set
# CONFIG_TEST_BPF is not set
# CONFIG_TEST_FIRMWARE is not set
# CONFIG_TEST_UDELAY is not set
# CONFIG_MEMTEST is not set
# CONFIG_TEST_STATIC_KEYS is not set
# CONFIG_SAMPLES is not set
# CONFIG_UBSAN is not set
CONFIG_ARCH_HAS_DEVMEM_IS_ALLOWED=y
# CONFIG_STRICT_DEVMEM is not set
CONFIG_S390_PTDUMP=y
CONFIG_DEBUG_SET_MODULE_RONX=y

#
# Security options
#
CONFIG_KEYS=y
# CONFIG_PERSISTENT_KEYRINGS is not set
# CONFIG_BIG_KEYS is not set
# CONFIG_TRUSTED_KEYS is not set
CONFIG_ENCRYPTED_KEYS=m
# CONFIG_SECURITY_DMESG_RESTRICT is not set
CONFIG_SECURITY=y
CONFIG_SECURITYFS=y
CONFIG_SECURITY_NETWORK=y
# CONFIG_SECURITY_NETWORK_XFRM is not set
# CONFIG_SECURITY_PATH is not set
# CONFIG_SECURITY_SELINUX is not set
# CONFIG_SECURITY_SMACK is not set
# CONFIG_SECURITY_TOMOYO is not set
# CONFIG_SECURITY_APPARMOR is not set
# CONFIG_SECURITY_YAMA is not set
CONFIG_INTEGRITY=y
# CONFIG_INTEGRITY_SIGNATURE is not set
CONFIG_INTEGRITY_AUDIT=y
CONFIG_IMA=y
CONFIG_IMA_MEASURE_PCR_IDX=10
# CONFIG_IMA_TEMPLATE is not set
CONFIG_IMA_NG_TEMPLATE=y
# CONFIG_IMA_SIG_TEMPLATE is not set
CONFIG_IMA_DEFAULT_TEMPLATE="ima-ng"
CONFIG_IMA_DEFAULT_HASH_SHA1=y
# CONFIG_IMA_DEFAULT_HASH_SHA256 is not set
# CONFIG_IMA_DEFAULT_HASH_SHA512 is not set
# CONFIG_IMA_DEFAULT_HASH_WP512 is not set
CONFIG_IMA_DEFAULT_HASH="sha1"
# CONFIG_IMA_WRITE_POLICY is not set
# CONFIG_IMA_READ_POLICY is not set
CONFIG_IMA_APPRAISE=y
# CONFIG_EVM is not set
CONFIG_DEFAULT_SECURITY_DAC=y
CONFIG_DEFAULT_SECURITY=""
CONFIG_XOR_BLOCKS=m
CONFIG_CRYPTO=y

#
# Crypto core or helper
#
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_ALGAPI2=y
CONFIG_CRYPTO_AEAD=m
CONFIG_CRYPTO_AEAD2=y
CONFIG_CRYPTO_BLKCIPHER=m
CONFIG_CRYPTO_BLKCIPHER2=y
CONFIG_CRYPTO_HASH=y
CONFIG_CRYPTO_HASH2=y
CONFIG_CRYPTO_RNG=m
CONFIG_CRYPTO_RNG2=y
CONFIG_CRYPTO_RNG_DEFAULT=m
CONFIG_CRYPTO_PCOMP=y
CONFIG_CRYPTO_PCOMP2=y
CONFIG_CRYPTO_AKCIPHER2=y
# CONFIG_CRYPTO_RSA is not set
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_MANAGER2=y
CONFIG_CRYPTO_USER=m
# CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set
CONFIG_CRYPTO_GF128MUL=m
CONFIG_CRYPTO_NULL=m
CONFIG_CRYPTO_NULL2=y
# CONFIG_CRYPTO_PCRYPT is not set
CONFIG_CRYPTO_WORKQUEUE=y
CONFIG_CRYPTO_CRYPTD=m
# CONFIG_CRYPTO_MCRYPTD is not set
CONFIG_CRYPTO_AUTHENC=m
CONFIG_CRYPTO_TEST=m

#
# Authenticated Encryption with Associated Data
#
CONFIG_CRYPTO_CCM=m
CONFIG_CRYPTO_GCM=m
# CONFIG_CRYPTO_CHACHA20POLY1305 is not set
CONFIG_CRYPTO_SEQIV=m
CONFIG_CRYPTO_ECHAINIV=m

#
# Block modes
#
CONFIG_CRYPTO_CBC=m
CONFIG_CRYPTO_CTR=m
CONFIG_CRYPTO_CTS=m
CONFIG_CRYPTO_ECB=m
CONFIG_CRYPTO_LRW=m
CONFIG_CRYPTO_PCBC=m
CONFIG_CRYPTO_XTS=m
# CONFIG_CRYPTO_KEYWRAP is not set

#
# Hash modes
#
CONFIG_CRYPTO_CMAC=m
CONFIG_CRYPTO_HMAC=y
CONFIG_CRYPTO_XCBC=m
CONFIG_CRYPTO_VMAC=m

#
# Digest
#
CONFIG_CRYPTO_CRC32C=y
CONFIG_CRYPTO_CRC32=m
CONFIG_CRYPTO_CRCT10DIF=y
CONFIG_CRYPTO_GHASH=m
# CONFIG_CRYPTO_POLY1305 is not set
CONFIG_CRYPTO_MD4=m
CONFIG_CRYPTO_MD5=y
CONFIG_CRYPTO_MICHAEL_MIC=m
CONFIG_CRYPTO_RMD128=m
CONFIG_CRYPTO_RMD160=m
CONFIG_CRYPTO_RMD256=m
CONFIG_CRYPTO_RMD320=m
CONFIG_CRYPTO_SHA1=y
CONFIG_CRYPTO_SHA256=y
CONFIG_CRYPTO_SHA512=m
CONFIG_CRYPTO_TGR192=m
CONFIG_CRYPTO_WP512=m

#
# Ciphers
#
CONFIG_CRYPTO_AES=y
CONFIG_CRYPTO_ANUBIS=m
CONFIG_CRYPTO_ARC4=m
CONFIG_CRYPTO_BLOWFISH=m
CONFIG_CRYPTO_BLOWFISH_COMMON=m
CONFIG_CRYPTO_CAMELLIA=m
CONFIG_CRYPTO_CAST_COMMON=m
CONFIG_CRYPTO_CAST5=m
CONFIG_CRYPTO_CAST6=m
CONFIG_CRYPTO_DES=m
CONFIG_CRYPTO_FCRYPT=m
CONFIG_CRYPTO_KHAZAD=m
CONFIG_CRYPTO_SALSA20=m
# CONFIG_CRYPTO_CHACHA20 is not set
CONFIG_CRYPTO_SEED=m
CONFIG_CRYPTO_SERPENT=m
CONFIG_CRYPTO_TEA=m
CONFIG_CRYPTO_TWOFISH=m
CONFIG_CRYPTO_TWOFISH_COMMON=m

#
# Compression
#
CONFIG_CRYPTO_DEFLATE=m
CONFIG_CRYPTO_ZLIB=y
CONFIG_CRYPTO_LZO=m
# CONFIG_CRYPTO_842 is not set
CONFIG_CRYPTO_LZ4=m
CONFIG_CRYPTO_LZ4HC=m

#
# Random Number Generation
#
CONFIG_CRYPTO_ANSI_CPRNG=m
CONFIG_CRYPTO_DRBG_MENU=m
CONFIG_CRYPTO_DRBG_HMAC=y
# CONFIG_CRYPTO_DRBG_HASH is not set
# CONFIG_CRYPTO_DRBG_CTR is not set
CONFIG_CRYPTO_DRBG=m
CONFIG_CRYPTO_JITTERENTROPY=m
CONFIG_CRYPTO_USER_API=m
CONFIG_CRYPTO_USER_API_HASH=m
CONFIG_CRYPTO_USER_API_SKCIPHER=m
# CONFIG_CRYPTO_USER_API_RNG is not set
# CONFIG_CRYPTO_USER_API_AEAD is not set
CONFIG_CRYPTO_HASH_INFO=y
CONFIG_CRYPTO_HW=y
CONFIG_ZCRYPT=m
CONFIG_CRYPTO_SHA1_S390=m
CONFIG_CRYPTO_SHA256_S390=m
CONFIG_CRYPTO_SHA512_S390=m
CONFIG_CRYPTO_DES_S390=m
CONFIG_CRYPTO_AES_S390=m
CONFIG_S390_PRNG=m
CONFIG_CRYPTO_GHASH_S390=m
CONFIG_ASYMMETRIC_KEY_TYPE=m
CONFIG_ASYMMETRIC_PUBLIC_KEY_SUBTYPE=m
CONFIG_PUBLIC_KEY_ALGO_RSA=m
CONFIG_X509_CERTIFICATE_PARSER=m
# CONFIG_PKCS7_MESSAGE_PARSER is not set

#
# Certificates for signature checking
#
# CONFIG_SYSTEM_TRUSTED_KEYRING is not set
CONFIG_BINARY_PRINTF=y

#
# Library routines
#
CONFIG_RAID6_PQ=m
CONFIG_BITREVERSE=y
# CONFIG_HAVE_ARCH_BITREVERSE is not set
CONFIG_GENERIC_NET_UTILS=y
CONFIG_GENERIC_FIND_FIRST_BIT=y
CONFIG_GENERIC_IO=y
CONFIG_ARCH_USE_CMPXCHG_LOCKREF=y
CONFIG_CRC_CCITT=m
CONFIG_CRC16=y
CONFIG_CRC_T10DIF=y
CONFIG_CRC_ITU_T=m
CONFIG_CRC32=y
# CONFIG_CRC32_SELFTEST is not set
CONFIG_CRC32_SLICEBY8=y
# CONFIG_CRC32_SLICEBY4 is not set
# CONFIG_CRC32_SARWATE is not set
# CONFIG_CRC32_BIT is not set
CONFIG_CRC7=m
CONFIG_LIBCRC32C=m
CONFIG_CRC8=m
# CONFIG_AUDIT_ARCH_COMPAT_GENERIC is not set
# CONFIG_RANDOM32_SELFTEST is not set
CONFIG_ZLIB_INFLATE=y
CONFIG_ZLIB_DEFLATE=y
CONFIG_LZO_COMPRESS=y
CONFIG_LZO_DECOMPRESS=y
CONFIG_LZ4_COMPRESS=m
CONFIG_LZ4HC_COMPRESS=m
CONFIG_LZ4_DECOMPRESS=y
CONFIG_XZ_DEC=y
CONFIG_XZ_DEC_X86=y
CONFIG_XZ_DEC_POWERPC=y
CONFIG_XZ_DEC_IA64=y
CONFIG_XZ_DEC_ARM=y
CONFIG_XZ_DEC_ARMTHUMB=y
CONFIG_XZ_DEC_SPARC=y
CONFIG_XZ_DEC_BCJ=y
# CONFIG_XZ_DEC_TEST is not set
CONFIG_DECOMPRESS_GZIP=y
CONFIG_DECOMPRESS_BZIP2=y
CONFIG_DECOMPRESS_LZMA=y
CONFIG_DECOMPRESS_XZ=y
CONFIG_DECOMPRESS_LZO=y
CONFIG_DECOMPRESS_LZ4=y
CONFIG_INTERVAL_TREE=y
CONFIG_ASSOCIATIVE_ARRAY=y
# CONFIG_CPUMASK_OFFSTACK is not set
CONFIG_CPU_RMAP=y
CONFIG_DQL=y
CONFIG_NLATTR=y
CONFIG_ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE=y
CONFIG_CLZ_TAB=y
CONFIG_CORDIC=m
# CONFIG_DDR is not set
CONFIG_IRQ_POLL=y
CONFIG_MPILIB=m
CONFIG_OID_REGISTRY=m
# CONFIG_SG_SPLIT is not set
CONFIG_ARCH_HAS_SG_CHAIN=y

#
# Virtualization
#
CONFIG_PFAULT=y
CONFIG_CMM=m
CONFIG_CMM_IUCV=y
CONFIG_APPLDATA_BASE=y
CONFIG_APPLDATA_MEM=m
CONFIG_APPLDATA_OS=m
CONFIG_APPLDATA_NET_SUM=m
CONFIG_S390_HYPFS_FS=y
CONFIG_HAVE_KVM=y
CONFIG_HAVE_KVM_IRQCHIP=y
CONFIG_HAVE_KVM_IRQFD=y
CONFIG_HAVE_KVM_IRQ_ROUTING=y
CONFIG_HAVE_KVM_EVENTFD=y
CONFIG_KVM_ASYNC_PF=y
CONFIG_KVM_ASYNC_PF_SYNC=y
CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT=y
CONFIG_KVM_VFIO=y
CONFIG_VIRTUALIZATION=y
CONFIG_KVM=m
CONFIG_KVM_S390_UCONTROL=y
CONFIG_S390_GUEST=y

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-13 11:58               ` Sebastian Ott
  (?)
@ 2016-02-17 19:13                 ` Gerald Schaefer
  -1 siblings, 0 replies; 153+ messages in thread
From: Gerald Schaefer @ 2016-02-17 19:13 UTC (permalink / raw)
  To: Sebastian Ott
  Cc: Kirill A. Shutemov, Andrea Arcangeli, Christian Borntraeger,
	Kirill A. Shutemov, linux-mm, linux-kernel, Aneesh Kumar K.V,
	Andrew Morton, Linus Torvalds, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	Catalin Marinas, Will Deacon, linux-arm-kernel,
	Martin Schwidefsky, Heiko Carstens, linux-s390

On Sat, 13 Feb 2016 12:58:31 +0100 (CET)
Sebastian Ott <sebott@linux.vnet.ibm.com> wrote:

> [   59.875935] ------------[ cut here ]------------
> [   59.875937] kernel BUG at mm/huge_memory.c:2884!
> [   59.875979] illegal operation: 0001 ilc:1 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> [   59.875986] Modules linked in: bridge stp llc btrfs xor mlx4_en vxlan ip6_udp_tunnel udp_tunnel mlx4_ib ptp pps_core ib_sa ib_mad ib_core ib_addr ghash_s390 prng raid6_pq ecb aes_s390 des_s390 des_generic sha512_s390 sha256_s390 sha1_s390 mlx4_core sha_common genwqe_card scm_block crc_itu_t vhost_net tun vhost dm_mod macvtap eadm_sch macvlan kvm autofs4
> [   59.876033] CPU: 2 PID: 5402 Comm: git Tainted: G        W       4.4.0-07794-ga4eff16-dirty #77
> [   59.876036] task: 00000000d2312948 ti: 00000000cfecc000 task.ti: 00000000cfecc000
> [   59.876039] Krnl PSW : 0704d00180000000 00000000002bf3aa (__split_huge_pmd_locked+0x562/0xa10)
> [   59.876045]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3
>                Krnl GPRS: 0000000001a7a1cf 000003d10177c000 0000000000044068 000000005df00215
> [   59.876051]            0000000000000001 0000000000000001 0000000000000000 00000000774e6900
> [   59.876054]            000003ff52000000 000000006d403b10 000000006e1eb800 000003ff51f00000
> [   59.876058]            000003d10177c000 0000000000715190 00000000002bf234 00000000cfecfb58
> [   59.876068] Krnl Code: 00000000002bf39c: d507d010a000	clc	16(8,%%r13),0(%%r10)
>                           00000000002bf3a2: a7840004		brc	8,2bf3aa
>                          #00000000002bf3a6: a7f40001		brc	15,2bf3a8
>                          >00000000002bf3aa: 91407440		tm	1088(%%r7),64
>                           00000000002bf3ae: a7840208		brc	8,2bf7be
>                           00000000002bf3b2: a7f401e9		brc	15,2bf784
>                           00000000002bf3b6: 9104a006		tm	6(%%r10),4
>                           00000000002bf3ba: a7740004		brc	7,2bf3c2
> [   59.876089] Call Trace:
> [   59.876092] ([<00000000002bf234>] __split_huge_pmd_locked+0x3ec/0xa10)
> [   59.876095]  [<00000000002c4310>] __split_huge_pmd+0x118/0x218
> [   59.876099]  [<00000000002810e8>] unmap_single_vma+0x2d8/0xb40
> [   59.876102]  [<0000000000282d66>] zap_page_range+0x116/0x318
> [   59.876105]  [<000000000029b834>] SyS_madvise+0x23c/0x5e8
> [   59.876108]  [<00000000006f9f56>] system_call+0xd6/0x258
> [   59.876111]  [<000003ff9bbfd282>] 0x3ff9bbfd282
> [   59.876113] INFO: lockdep is turned off.
> [   59.876115] Last Breaking-Event-Address:
> [   59.876118]  [<00000000002bf3a6>] __split_huge_pmd_locked+0x55e/0xa10

The BUG at mm/huge_memory.c:2884 is interesting: it is the BUG_ON(!pte_none(*pte))
check in __split_huge_pmd_locked(). Obviously we expect the pre-allocated
pagetables to be empty, but in collapse_huge_page() we deposit the original
pagetable instead of allocating a new (empty) one. This saves an allocation,
which is good, but doesn't that mean that if such a collapsed hugepage is
ever split, we will always run into the BUG_ON(!pte_none(*pte)), or one
of the two other VM_BUG_ONs in mm/huge_memory.c that check the same thing?

This behavior is not new; it was the same before the THP rework, so I do not
assume that it is related to the current problems, except perhaps for
this specific crash. I never saw the BUG at mm/huge_memory.c:2884 myself,
and the other crashes probably cannot be explained by it. Maybe I am
missing something, but I do not see how collapse_huge_page() and the
(non-empty) pgtable deposit there can be compatible with the
BUG_ON(!pte_none(*pte)) checks. Any thoughts?
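
For orientation, the quoted call trace enters the split through SyS_madvise() and zap_page_range(). The following userspace sketch shows one way such a path might be exercised: it zaps a single base page inside a region that may be backed by a transparent huge page. It is not the reproducer used in this thread. The function name partial_dontneed_demo(), the 2 MiB ASSUMED_HPAGE_SIZE, and the choice of MADV_DONTNEED are assumptions made for illustration.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumption: 2 MiB huge pages. The real size is architecture-specific. */
#define ASSUMED_HPAGE_SIZE (2UL << 20)

/*
 * Hypothetical helper: populate a huge-page-aligned anonymous region,
 * then drop only its first base page. If the region is THP-backed, the
 * kernel has to split the huge PMD to do that. Returns 0 on success.
 */
int partial_dontneed_demo(void)
{
	size_t len = 2 * ASSUMED_HPAGE_SIZE;
	long page = sysconf(_SC_PAGESIZE);
	unsigned char *raw, *huge;
	int rc, ok;

	raw = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (raw == MAP_FAILED || page <= 0)
		return -1;

	/* Round up to the next huge-page boundary inside the mapping. */
	huge = (unsigned char *)(((uintptr_t)raw + ASSUMED_HPAGE_SIZE - 1) &
				 ~(uintptr_t)(ASSUMED_HPAGE_SIZE - 1));
	assert(((uintptr_t)huge % ASSUMED_HPAGE_SIZE) == 0);

#ifdef MADV_HUGEPAGE
	/* Best effort: fails harmlessly if THP is not available. */
	(void)madvise(huge, ASSUMED_HPAGE_SIZE, MADV_HUGEPAGE);
#endif
	/* Touch the whole range so it is actually populated. */
	memset(huge, 0xaa, ASSUMED_HPAGE_SIZE);

	/* Zap one base page only; the rest of the range must stay intact. */
	rc = madvise(huge, (size_t)page, MADV_DONTNEED);
	ok = (rc == 0 && huge[0] == 0 && huge[page] == 0xaa);

	munmap(raw, len);
	return ok ? 0 : -1;
}
```

Whether the region really ends up THP-backed depends on the kernel configuration and the sysfs THP settings, so a clean run of this sketch says nothing about whether a split happened.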

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-17 19:13                 ` Gerald Schaefer
  (?)
@ 2016-02-17 23:58                   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2016-02-17 23:58 UTC (permalink / raw)
  To: Gerald Schaefer
  Cc: Sebastian Ott, Andrea Arcangeli, Christian Borntraeger,
	Kirill A. Shutemov, linux-mm, linux-kernel, Aneesh Kumar K.V,
	Andrew Morton, Linus Torvalds, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	Catalin Marinas, Will Deacon, linux-arm-kernel,
	Martin Schwidefsky, Heiko Carstens, linux-s390

On Wed, Feb 17, 2016 at 08:13:40PM +0100, Gerald Schaefer wrote:
> On Sat, 13 Feb 2016 12:58:31 +0100 (CET)
> Sebastian Ott <sebott@linux.vnet.ibm.com> wrote:
> 
> > [   59.875935] ------------[ cut here ]------------
> > [   59.875937] kernel BUG at mm/huge_memory.c:2884!
> > [   59.875979] illegal operation: 0001 ilc:1 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> > [   59.875986] Modules linked in: bridge stp llc btrfs xor mlx4_en vxlan ip6_udp_tunnel udp_tunnel mlx4_ib ptp pps_core ib_sa ib_mad ib_core ib_addr ghash_s390 prng raid6_pq ecb aes_s390 des_s390 des_generic sha512_s390 sha256_s390 sha1_s390 mlx4_core sha_common genwqe_card scm_block crc_itu_t vhost_net tun vhost dm_mod macvtap eadm_sch macvlan kvm autofs4
> > [   59.876033] CPU: 2 PID: 5402 Comm: git Tainted: G        W       4.4.0-07794-ga4eff16-dirty #77
> > [   59.876036] task: 00000000d2312948 ti: 00000000cfecc000 task.ti: 00000000cfecc000
> > [   59.876039] Krnl PSW : 0704d00180000000 00000000002bf3aa (__split_huge_pmd_locked+0x562/0xa10)
> > [   59.876045]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3
> >                Krnl GPRS: 0000000001a7a1cf 000003d10177c000 0000000000044068 000000005df00215
> > [   59.876051]            0000000000000001 0000000000000001 0000000000000000 00000000774e6900
> > [   59.876054]            000003ff52000000 000000006d403b10 000000006e1eb800 000003ff51f00000
> > [   59.876058]            000003d10177c000 0000000000715190 00000000002bf234 00000000cfecfb58
> > [   59.876068] Krnl Code: 00000000002bf39c: d507d010a000	clc	16(8,%%r13),0(%%r10)
> >                           00000000002bf3a2: a7840004		brc	8,2bf3aa
> >                          #00000000002bf3a6: a7f40001		brc	15,2bf3a8
> >                          >00000000002bf3aa: 91407440		tm	1088(%%r7),64
> >                           00000000002bf3ae: a7840208		brc	8,2bf7be
> >                           00000000002bf3b2: a7f401e9		brc	15,2bf784
> >                           00000000002bf3b6: 9104a006		tm	6(%%r10),4
> >                           00000000002bf3ba: a7740004		brc	7,2bf3c2
> > [   59.876089] Call Trace:
> > [   59.876092] ([<00000000002bf234>] __split_huge_pmd_locked+0x3ec/0xa10)
> > [   59.876095]  [<00000000002c4310>] __split_huge_pmd+0x118/0x218
> > [   59.876099]  [<00000000002810e8>] unmap_single_vma+0x2d8/0xb40
> > [   59.876102]  [<0000000000282d66>] zap_page_range+0x116/0x318
> > [   59.876105]  [<000000000029b834>] SyS_madvise+0x23c/0x5e8
> > [   59.876108]  [<00000000006f9f56>] system_call+0xd6/0x258
> > [   59.876111]  [<000003ff9bbfd282>] 0x3ff9bbfd282
> > [   59.876113] INFO: lockdep is turned off.
> > [   59.876115] Last Breaking-Event-Address:
> > [   59.876118]  [<00000000002bf3a6>] __split_huge_pmd_locked+0x55e/0xa10
> 
> The BUG at mm/huge_memory.c:2884 is interesting, it's the BUG_ON(!pte_none(*pte))
> check in __split_huge_pmd_locked(). Obviously we expect the pre-allocated
> pagetables to be empty, but in collapse_huge_page() we deposit the original
> pagetable instead of allocating a new (empty) one. This saves an allocation,
> which is good, but doesn't that mean that if such a collapsed hugepage will
> ever be split, we will always run into the BUG_ON(!pte_none(*pte)), or one
> of the two other VM_BUG_ONs in mm/huge_memory.c that check the same?
> 
> This behavior is not new, it was the same before the THP rework, so I do not
> assume that it is related to the current problems, maybe with the exception
> of this specific crash. I never saw the BUG at mm/huge_memory.c:2884 myself,
> and the other crashes probably cannot be explained with this. Maybe I am
> also missing something, but I do not see how collapse_huge_page() and the
> (non-empty) pgtable deposit there can work out with the BUG_ON(!pte_none(*pte))
> checks. Any thoughts?

I don't think there's a problem: ptes in the pgtable are cleared with
pte_clear() in __collapse_huge_page_copy().
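
The invariant described above can be illustrated with a small userspace model. This is only a toy sketch, not kernel code: all toy_* names, the 256-entry table size, and the use of plain integers as ptes are assumptions made for illustration. It shows that a page table whose entries are cleared one by one during the copy is empty by the time it is deposited, so a later "all ptes are none" check would pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins; the size and types are illustrative assumptions. */
#define TOY_PTRS_PER_PTE 256

typedef uint64_t toy_pte_t;

struct toy_pgtable {
	toy_pte_t pte[TOY_PTRS_PER_PTE];
};

static bool toy_pte_none(toy_pte_t pte)
{
	return pte == 0;
}

static void toy_pte_clear(toy_pte_t *ptep)
{
	*ptep = 0;
}

/*
 * Models the copy step of a collapse: each entry is copied out (standing
 * in for copying the small page into the huge page) and then cleared.
 * The same table is returned as the one that would be deposited.
 */
struct toy_pgtable *toy_collapse_copy(struct toy_pgtable *table,
				      toy_pte_t *huge_contents)
{
	assert(table != NULL && huge_contents != NULL);

	for (size_t i = 0; i < TOY_PTRS_PER_PTE; i++) {
		huge_contents[i] = table->pte[i];
		toy_pte_clear(&table->pte[i]);
	}
	return table;
}

/*
 * Models the check done at split time: counts entries in the withdrawn
 * table that are not empty. A non-zero result corresponds to the case
 * where the kernel's BUG_ON(!pte_none(*pte)) would fire.
 */
size_t toy_count_stale_ptes(const struct toy_pgtable *deposited)
{
	size_t stale = 0;

	for (size_t i = 0; i < TOY_PTRS_PER_PTE; i++)
		if (!toy_pte_none(deposited->pte[i]))
			stale++;
	return stale;
}
```

In this model, a table that is deposited without going through toy_collapse_copy() reports all of its populated entries as stale, which is the situation the question was worried about. A table that did go through the copy step reports none.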

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
@ 2016-02-17 23:58                   ` Kirill A. Shutemov
  0 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2016-02-17 23:58 UTC (permalink / raw)
  To: Gerald Schaefer
  Cc: Sebastian Ott, Andrea Arcangeli, Christian Borntraeger,
	Kirill A. Shutemov, linux-mm, linux-kernel, Aneesh Kumar K.V,
	Andrew Morton, Linus Torvalds, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	Catalin Marinas, Will Deacon, linux-arm-kernel,
	Martin Schwidefsky, Heiko Carstens, linux-s390

On Wed, Feb 17, 2016 at 08:13:40PM +0100, Gerald Schaefer wrote:
> On Sat, 13 Feb 2016 12:58:31 +0100 (CET)
> Sebastian Ott <sebott@linux.vnet.ibm.com> wrote:
> 
> > [   59.875935] ------------[ cut here ]------------
> > [   59.875937] kernel BUG at mm/huge_memory.c:2884!
> > [   59.875979] illegal operation: 0001 ilc:1 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> > [   59.875986] Modules linked in: bridge stp llc btrfs xor mlx4_en vxlan ip6_udp_tunnel udp_tunnel mlx4_ib ptp pps_core ib_sa ib_mad ib_core ib_addr ghash_s390 prng raid6_pq ecb aes_s390 des_s390 des_generic sha512_s390 sha256_s390 sha1_s390 mlx4_core sha_common genwqe_card scm_block crc_itu_t vhost_net tun vhost dm_mod macvtap eadm_sch macvlan kvm autofs4
> > [   59.876033] CPU: 2 PID: 5402 Comm: git Tainted: G        W       4.4.0-07794-ga4eff16-dirty #77
> > [   59.876036] task: 00000000d2312948 ti: 00000000cfecc000 task.ti: 00000000cfecc000
> > [   59.876039] Krnl PSW : 0704d00180000000 00000000002bf3aa (__split_huge_pmd_locked+0x562/0xa10)
> > [   59.876045]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3
> >                Krnl GPRS: 0000000001a7a1cf 000003d10177c000 0000000000044068 000000005df00215
> > [   59.876051]            0000000000000001 0000000000000001 0000000000000000 00000000774e6900
> > [   59.876054]            000003ff52000000 000000006d403b10 000000006e1eb800 000003ff51f00000
> > [   59.876058]            000003d10177c000 0000000000715190 00000000002bf234 00000000cfecfb58
> > [   59.876068] Krnl Code: 00000000002bf39c: d507d010a000	clc	16(8,%%r13),0(%%r10)
> >                           00000000002bf3a2: a7840004		brc	8,2bf3aa
> >                          #00000000002bf3a6: a7f40001		brc	15,2bf3a8
> >                          >00000000002bf3aa: 91407440		tm	1088(%%r7),64
> >                           00000000002bf3ae: a7840208		brc	8,2bf7be
> >                           00000000002bf3b2: a7f401e9		brc	15,2bf784
> >                           00000000002bf3b6: 9104a006		tm	6(%%r10),4
> >                           00000000002bf3ba: a7740004		brc	7,2bf3c2
> > [   59.876089] Call Trace:
> > [   59.876092] ([<00000000002bf234>] __split_huge_pmd_locked+0x3ec/0xa10)
> > [   59.876095]  [<00000000002c4310>] __split_huge_pmd+0x118/0x218
> > [   59.876099]  [<00000000002810e8>] unmap_single_vma+0x2d8/0xb40
> > [   59.876102]  [<0000000000282d66>] zap_page_range+0x116/0x318
> > [   59.876105]  [<000000000029b834>] SyS_madvise+0x23c/0x5e8
> > [   59.876108]  [<00000000006f9f56>] system_call+0xd6/0x258
> > [   59.876111]  [<000003ff9bbfd282>] 0x3ff9bbfd282
> > [   59.876113] INFO: lockdep is turned off.
> > [   59.876115] Last Breaking-Event-Address:
> > [   59.876118]  [<00000000002bf3a6>] __split_huge_pmd_locked+0x55e/0xa10
> 
> The BUG at mm/huge_memory.c:2884 is interesting, it's the BUG_ON(!pte_none(*pte))
> check in __split_huge_pmd_locked(). Obviously we expect the pre-allocated
> pagetables to be empty, but in collapse_huge_page() we deposit the original
> pagetable instead of allocating a new (empty) one. This saves an allocation,
> which is good, but doesn't that mean that if such a collapsed hugepage will
> ever be split, we will always run into the BUG_ON(!pte_none(*pte)), or one
> of the two other VM_BUG_ONs in mm/huge_memory.c that check the same?
> 
> This behavior is not new, it was the same before the THP rework, so I do not
> assume that it is related to the current problems, maybe with the exception
> of this specific crash. I never saw the BUG at mm/huge_memory.c:2884 myself,
> and the other crashes probably cannot be explained with this. Maybe I am
> also missing something, but I do not see how collapse_huge_page() and the
> (non-empty) pgtable deposit there can work out with the BUG_ON(!pte_none(*pte))
> checks. Any thoughts?

I don't think there's a problem: ptes in the pgtable are cleared with
pte_clear() in __collapse_huge_page_copy().
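
To make the invariant concrete, here is a minimal userspace sketch. It is
not kernel code: model_pte_t, model_collapse_copy() and model_count_stale()
are made-up names, the sizes are arbitrary, and a plain assignment stands in
for pte_clear(). It only shows that a table whose entries are cleared during
the copy is empty again by the time it would be deposited and later checked
on split.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative sizes only; they do not match any real architecture. */
#define MODEL_PTRS_PER_PTE 8
#define MODEL_PAGE_SIZE    16

/* Hypothetical stand-in for pte_t: 0 means "none", anything else is mapped. */
typedef unsigned long model_pte_t;

static int model_pte_none(model_pte_t pte)
{
	return pte == 0;
}

/*
 * Models the copy loop during collapse: each mapped small page is copied
 * into the huge page, and its PTE is cleared right afterwards (the
 * assignment stands in for pte_clear()).
 */
static void model_collapse_copy(model_pte_t *table,
				unsigned char src[][MODEL_PAGE_SIZE],
				unsigned char *huge)
{
	assert(table && src && huge);

	for (size_t i = 0; i < MODEL_PTRS_PER_PTE; i++) {
		if (model_pte_none(table[i]))
			continue;
		memcpy(huge + i * MODEL_PAGE_SIZE, src[i], MODEL_PAGE_SIZE);
		table[i] = 0;
	}
}

/* Models the split-time check: number of entries that are not none. */
static size_t model_count_stale(const model_pte_t *table)
{
	size_t stale = 0;

	for (size_t i = 0; i < MODEL_PTRS_PER_PTE; i++)
		if (!model_pte_none(table[i]))
			stale++;
	return stale;
}
```

In this model, model_count_stale() returning non-zero after
model_collapse_copy() would correspond to the BUG_ON(!pte_none(*pte))
firing, so a non-empty table at split time has to come from somewhere else.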

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-17 23:58                   ` Kirill A. Shutemov
  (?)
@ 2016-02-18 15:00                     ` Gerald Schaefer
  -1 siblings, 0 replies; 153+ messages in thread
From: Gerald Schaefer @ 2016-02-18 15:00 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Sebastian Ott, Andrea Arcangeli, Christian Borntraeger,
	Kirill A. Shutemov, linux-mm, linux-kernel, Aneesh Kumar K.V,
	Andrew Morton, Linus Torvalds, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	Catalin Marinas, Will Deacon, linux-arm-kernel,
	Martin Schwidefsky, Heiko Carstens, linux-s390

On Thu, 18 Feb 2016 01:58:08 +0200
"Kirill A. Shutemov" <kirill@shutemov.name> wrote:

> On Wed, Feb 17, 2016 at 08:13:40PM +0100, Gerald Schaefer wrote:
> > On Sat, 13 Feb 2016 12:58:31 +0100 (CET)
> > Sebastian Ott <sebott@linux.vnet.ibm.com> wrote:
> > 
> > > [   59.875935] ------------[ cut here ]------------
> > > [   59.875937] kernel BUG at mm/huge_memory.c:2884!
> > > [   59.875979] illegal operation: 0001 ilc:1 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> > > [   59.875986] Modules linked in: bridge stp llc btrfs xor mlx4_en vxlan ip6_udp_tunnel udp_tunnel mlx4_ib ptp pps_core ib_sa ib_mad ib_core ib_addr ghash_s390 prng raid6_pq ecb aes_s390 des_s390 des_generic sha512_s390 sha256_s390 sha1_s390 mlx4_core sha_common genwqe_card scm_block crc_itu_t vhost_net tun vhost dm_mod macvtap eadm_sch macvlan kvm autofs4
> > > [   59.876033] CPU: 2 PID: 5402 Comm: git Tainted: G        W       4.4.0-07794-ga4eff16-dirty #77
> > > [   59.876036] task: 00000000d2312948 ti: 00000000cfecc000 task.ti: 00000000cfecc000
> > > [   59.876039] Krnl PSW : 0704d00180000000 00000000002bf3aa (__split_huge_pmd_locked+0x562/0xa10)
> > > [   59.876045]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3
> > >                Krnl GPRS: 0000000001a7a1cf 000003d10177c000 0000000000044068 000000005df00215
> > > [   59.876051]            0000000000000001 0000000000000001 0000000000000000 00000000774e6900
> > > [   59.876054]            000003ff52000000 000000006d403b10 000000006e1eb800 000003ff51f00000
> > > [   59.876058]            000003d10177c000 0000000000715190 00000000002bf234 00000000cfecfb58
> > > [   59.876068] Krnl Code: 00000000002bf39c: d507d010a000	clc	16(8,%%r13),0(%%r10)
> > >                           00000000002bf3a2: a7840004		brc	8,2bf3aa
> > >                          #00000000002bf3a6: a7f40001		brc	15,2bf3a8
> > >                          >00000000002bf3aa: 91407440		tm	1088(%%r7),64
> > >                           00000000002bf3ae: a7840208		brc	8,2bf7be
> > >                           00000000002bf3b2: a7f401e9		brc	15,2bf784
> > >                           00000000002bf3b6: 9104a006		tm	6(%%r10),4
> > >                           00000000002bf3ba: a7740004		brc	7,2bf3c2
> > > [   59.876089] Call Trace:
> > > [   59.876092] ([<00000000002bf234>] __split_huge_pmd_locked+0x3ec/0xa10)
> > > [   59.876095]  [<00000000002c4310>] __split_huge_pmd+0x118/0x218
> > > [   59.876099]  [<00000000002810e8>] unmap_single_vma+0x2d8/0xb40
> > > [   59.876102]  [<0000000000282d66>] zap_page_range+0x116/0x318
> > > [   59.876105]  [<000000000029b834>] SyS_madvise+0x23c/0x5e8
> > > [   59.876108]  [<00000000006f9f56>] system_call+0xd6/0x258
> > > [   59.876111]  [<000003ff9bbfd282>] 0x3ff9bbfd282
> > > [   59.876113] INFO: lockdep is turned off.
> > > [   59.876115] Last Breaking-Event-Address:
> > > [   59.876118]  [<00000000002bf3a6>] __split_huge_pmd_locked+0x55e/0xa10
> > 
> > The BUG at mm/huge_memory.c:2884 is interesting, it's the BUG_ON(!pte_none(*pte))
> > check in __split_huge_pmd_locked(). Obviously we expect the pre-allocated
> > pagetables to be empty, but in collapse_huge_page() we deposit the original
> > pagetable instead of allocating a new (empty) one. This saves an allocation,
> > which is good, but doesn't that mean that if such a collapsed hugepage will
> > ever be split, we will always run into the BUG_ON(!pte_none(*pte)), or one
> > of the two other VM_BUG_ONs in mm/huge_memory.c that check the same?
> > 
> > This behavior is not new, it was the same before the THP rework, so I do not
> > assume that it is related to the current problems, maybe with the exception
> > of this specific crash. I never saw the BUG at mm/huge_memory.c:2884 myself,
> > and the other crashes probably cannot be explained with this. Maybe I am
> > also missing something, but I do not see how collapse_huge_page() and the
> > (non-empty) pgtable deposit there can work out with the BUG_ON(!pte_none(*pte))
> > checks. Any thoughts?
> 
> I don't think there's a problem: ptes in the pgtable are cleared with
> pte_clear() in __collapse_huge_page_copy().
> 

Ah OK, I didn't see that. Still, the BUG_ON() tells us that something went
wrong with the pre-allocated pagetable, or at least with the deposit/withdraw
list, or both. Given that on s390 we keep the listheads for the deposit/withdraw
list inside the pre-allocated pgtables, instead of the struct pages, it may
also explain why we don't see the problems on x86.
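
For readers not familiar with that layout, here is a simplified userspace
model of keeping the list linkage inside the deposited table itself. It is a
sketch under assumptions, not the s390 code: all model_* names are
hypothetical, the table is tiny, and it assumes (as I understand the s390
implementation) that withdraw resets the entries the linkage overlapped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PTRS_PER_PTE 8
#define MODEL_PTE_NONE     ((model_pte_t)0)

/* Pointer-sized entries, so the linkage covers exactly two of them. */
typedef uintptr_t model_pte_t;

/* Minimal doubly linked node, same shape as the kernel's struct list_head. */
struct model_list_head {
	struct model_list_head *next, *prev;
};

/*
 * A page table whose first bytes double as list linkage while the table
 * is deposited. This overlay is the point of the model.
 */
union model_pgtable {
	model_pte_t pte[MODEL_PTRS_PER_PTE];
	struct model_list_head lh;
};

/* Per-PMD slot holding the most recently deposited table (NULL if none). */
struct model_pmd {
	union model_pgtable *huge_pte;
};

static void model_deposit(struct model_pmd *pmd, union model_pgtable *pgtable)
{
	struct model_list_head *lh = &pgtable->lh;

	if (!pmd->huge_pte) {
		lh->next = lh->prev = lh;	/* first table: ring of one */
	} else {
		struct model_list_head *head = &pmd->huge_pte->lh;

		lh->next = head->next;		/* link in after the old head */
		lh->prev = head;
		head->next->prev = lh;
		head->next = lh;
	}
	pmd->huge_pte = pgtable;
}

static union model_pgtable *model_withdraw(struct model_pmd *pmd)
{
	union model_pgtable *pgtable = pmd->huge_pte;
	struct model_list_head *lh;

	assert(pgtable);
	lh = &pgtable->lh;
	if (lh->next == lh) {
		pmd->huge_pte = NULL;		/* it was the only table */
	} else {
		/* lh sits at offset 0, so the node address is the table address */
		pmd->huge_pte = (union model_pgtable *)lh->next;
		lh->prev->next = lh->next;
		lh->next->prev = lh->prev;
	}
	/* The linkage overwrote the first two entries; reset them to "none". */
	pgtable->pte[0] = MODEL_PTE_NONE;
	pgtable->pte[1] = MODEL_PTE_NONE;
	return pgtable;
}
```

The model shows why this layout is more fragile: while a table is deposited,
its first two entries hold pointers rather than empty PTEs, so any path that
uses a deposited table without going through withdraw, or that corrupts the
list, would see non-empty entries.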

We already have the list corruption warning in exit_mmap -> zap_huge_pmd ->
withdraw, and from time to time I also hit the BUG_ON(page->pmd_huge_pte)
in exit_mmap -> free_pgtables -> free_pmd_range, which also indicates some
issues with the deposit/withdraw list, see below:

[ 2489.384069] page:000003d101aa6f00 count:1 mapcount:0 mapping:          (null) index:0x0
[ 2489.384075] flags: 0x0()
[ 2489.384078] page dumped because: VM_BUG_ON_PAGE(page->pmd_huge_pte)
[ 2489.384086] ------------[ cut here ]------------
[ 2489.384088] kernel BUG at include/linux/mm.h:1700!
[ 2489.384131] illegal operation: 0001 ilc:1 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 2489.384137] Modules linked in: bridge stp llc mlx4_ib ib_sa ib_mad mlx4_en ib_core vxlan udp_tunnel ptp pps_core ib_addr ghash_s390 prng ecb mlx4_core aes_s390 des_s390 des_generic sha512_s390 sha256_s390 sha1_s390 sha_common eadm_sch dm_mod vhost_net tun vhost macvtap macvlan kvm autofs4
[ 2489.384173] CPU: 5 PID: 173619 Comm: cc1 Tainted: G    B   W       4.5.0-rc3-00083-gc05235d #10
[ 2489.384176] task: 00000000c54d0000 ti: 0000000060504000 task.ti: 0000000060504000
[ 2489.384179] Krnl PSW : 0704c00180000000 0000000000283cf4 (free_pgd_range+0x334/0x460)
[ 2489.384184]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
               Krnl GPRS: 0000000001a161c7 0000000000000000 0000000000000037 0000000000000000
[ 2489.384189]            0000000000283cf0 0000000000000000 000003ff7d980000 0000000060507e18
[ 2489.384192]            000003ff00000000 0000000075e43ff0 000003ff7d97ffff 000003ff7d980000
[ 2489.384195]            000000006a9bc000 00000000006cc390 0000000000283cf0 0000000060507c68
[ 2489.384201] Krnl Code: 0000000000283ce4: c030002e14dd        larl    %%r3,84669e
                          0000000000283cea: c0e5ffffd217        brasl   %%r14,27e118
                         #0000000000283cf0: a7f40001            brc     15,283cf2
                         >0000000000283cf4: c0e5fffffe5a        brasl   %%r14,2839a8
                          0000000000283cfa: b9040027            lgr     %%r2,%%r7
                          0000000000283cfe: b904003c            lgr     %%r3,%%r12
                          0000000000283d02: c0e5fff509e3        brasl   %%r14,1250c8
                          0000000000283d08: e31070000004        lg      %%r1,0(%%r7)
[ 2489.384221] Call Trace:
[ 2489.384224] ([<0000000000283cf0>] free_pgd_range+0x330/0x460)
[ 2489.384227]  [<0000000000283f38>] free_pgtables+0x118/0x148
[ 2489.384230]  [<000000000028c32e>] exit_mmap+0xd6/0x300
[ 2489.384233]  [<0000000000134d70>] mmput+0x90/0x118
[ 2489.384235]  [<000000000013a55c>] do_exit+0x41c/0xd18
[ 2489.384238]  [<000000000013c3c2>] do_group_exit+0x92/0xd8
[ 2489.384241]  [<000000000013c432>] SyS_exit_group+0x2a/0x30
[ 2489.384244]  [<00000000006b1a36>] system_call+0xd6/0x258
[ 2489.384246]  [<000003ff7d343698>] 0x3ff7d343698
[ 2489.384248] INFO: lockdep is turned off.
[ 2489.384251] Last Breaking-Event-Address:
[ 2489.384253]  [<0000000000283cf0>] free_pgd_range+0x330/0x460
[ 2489.384256]  
[ 2489.384258] Kernel panic - not syncing: Fatal exception: panic_on_oops

I'll try to add a BUG_ON(pmd_huge(*pmd)) to free_pte_range() and see if that
catches anything, and I'll also check if debug_cow = 1 or use_zero_page = 0
makes any difference.

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-18 15:00                     ` Gerald Schaefer
  (?)
@ 2016-02-18 17:06                       ` Kirill A. Shutemov
  -1 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2016-02-18 17:06 UTC (permalink / raw)
  To: Gerald Schaefer
  Cc: Sebastian Ott, Andrea Arcangeli, Christian Borntraeger,
	Kirill A. Shutemov, linux-mm, linux-kernel, Aneesh Kumar K.V,
	Andrew Morton, Linus Torvalds, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	Catalin Marinas, Will Deacon, linux-arm-kernel,
	Martin Schwidefsky, Heiko Carstens, linux-s390

On Thu, Feb 18, 2016 at 04:00:37PM +0100, Gerald Schaefer wrote:
> On Thu, 18 Feb 2016 01:58:08 +0200
> "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> 
> > On Wed, Feb 17, 2016 at 08:13:40PM +0100, Gerald Schaefer wrote:
> > > On Sat, 13 Feb 2016 12:58:31 +0100 (CET)
> > > Sebastian Ott <sebott@linux.vnet.ibm.com> wrote:
> > > 
> > > > [   59.875935] ------------[ cut here ]------------
> > > > [   59.875937] kernel BUG at mm/huge_memory.c:2884!
> > > > [   59.875979] illegal operation: 0001 ilc:1 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> > > > [   59.875986] Modules linked in: bridge stp llc btrfs xor mlx4_en vxlan ip6_udp_tunnel udp_tunnel mlx4_ib ptp pps_core ib_sa ib_mad ib_core ib_addr ghash_s390 prng raid6_pq ecb aes_s390 des_s390 des_generic sha512_s390 sha256_s390 sha1_s390 mlx4_core sha_common genwqe_card scm_block crc_itu_t vhost_net tun vhost dm_mod macvtap eadm_sch macvlan kvm autofs4
> > > > [   59.876033] CPU: 2 PID: 5402 Comm: git Tainted: G        W       4.4.0-07794-ga4eff16-dirty #77
> > > > [   59.876036] task: 00000000d2312948 ti: 00000000cfecc000 task.ti: 00000000cfecc000
> > > > [   59.876039] Krnl PSW : 0704d00180000000 00000000002bf3aa (__split_huge_pmd_locked+0x562/0xa10)
> > > > [   59.876045]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3
> > > >                Krnl GPRS: 0000000001a7a1cf 000003d10177c000 0000000000044068 000000005df00215
> > > > [   59.876051]            0000000000000001 0000000000000001 0000000000000000 00000000774e6900
> > > > [   59.876054]            000003ff52000000 000000006d403b10 000000006e1eb800 000003ff51f00000
> > > > [   59.876058]            000003d10177c000 0000000000715190 00000000002bf234 00000000cfecfb58
> > > > [   59.876068] Krnl Code: 00000000002bf39c: d507d010a000	clc	16(8,%%r13),0(%%r10)
> > > >                           00000000002bf3a2: a7840004		brc	8,2bf3aa
> > > >                          #00000000002bf3a6: a7f40001		brc	15,2bf3a8
> > > >                          >00000000002bf3aa: 91407440		tm	1088(%%r7),64
> > > >                           00000000002bf3ae: a7840208		brc	8,2bf7be
> > > >                           00000000002bf3b2: a7f401e9		brc	15,2bf784
> > > >                           00000000002bf3b6: 9104a006		tm	6(%%r10),4
> > > >                           00000000002bf3ba: a7740004		brc	7,2bf3c2
> > > > [   59.876089] Call Trace:
> > > > [   59.876092] ([<00000000002bf234>] __split_huge_pmd_locked+0x3ec/0xa10)
> > > > [   59.876095]  [<00000000002c4310>] __split_huge_pmd+0x118/0x218
> > > > [   59.876099]  [<00000000002810e8>] unmap_single_vma+0x2d8/0xb40
> > > > [   59.876102]  [<0000000000282d66>] zap_page_range+0x116/0x318
> > > > [   59.876105]  [<000000000029b834>] SyS_madvise+0x23c/0x5e8
> > > > [   59.876108]  [<00000000006f9f56>] system_call+0xd6/0x258
> > > > [   59.876111]  [<000003ff9bbfd282>] 0x3ff9bbfd282
> > > > [   59.876113] INFO: lockdep is turned off.
> > > > [   59.876115] Last Breaking-Event-Address:
> > > > [   59.876118]  [<00000000002bf3a6>] __split_huge_pmd_locked+0x55e/0xa10
> > > 
> > > The BUG at mm/huge_memory.c:2884 is interesting, it's the BUG_ON(!pte_none(*pte))
> > > check in __split_huge_pmd_locked(). Obviously we expect the pre-allocated
> > > pagetables to be empty, but in collapse_huge_page() we deposit the original
> > > pagetable instead of allocating a new (empty) one. This saves an allocation,
> > > which is good, but doesn't that mean that if such a collapsed hugepage will
> > > ever be split, we will always run into the BUG_ON(!pte_none(*pte)), or one
> > > of the two other VM_BUG_ONs in mm/huge_memory.c that check the same?
> > > 
> > > This behavior is not new, it was the same before the THP rework, so I do not
> > > assume that it is related to the current problems, maybe with the exception
> > > of this specific crash. I never saw the BUG at mm/huge_memory.c:2884 myself,
> > > and the other crashes probably cannot be explained with this. Maybe I am
> > > also missing something, but I do not see how collapse_huge_page() and the
> > > (non-empty) pgtable deposit there can work out with the BUG_ON(!pte_none(*pte))
> > > checks. Any thoughts?
> > 
> > I don't think there's a problem: ptes in the pgtable are cleared with
> > pte_clear() in __collapse_huge_page_copy().
> > 
> 
> Ah OK, I didn't see that. Still the BUG_ON() tells us that something went
> wrong with the pre-allocated pagetable, or at least with the deposit/withdraw
> list, or both. Given that on s390 we keep the listheads for the deposit/withdraw
> list inside the pre-allocated pgtables, instead of the struct pages, it may
> also explain why we don't see the problems on x86.
> 
> We already have the list corruption warning in exit_mmap -> zap_huge_pmd ->
> withdraw, and from time to time I also hit the BUG_ON(page->pmd_huge_pte)
> in exit_mmap -> free_pgtables -> free_pmd_range, which also indicates some
> issues with the deposit/withdraw list, see below:
> 
> [ 2489.384069] page:000003d101aa6f00 count:1 mapcount:0 mapping:          (null) index:0x0
> [ 2489.384075] flags: 0x0()
> [ 2489.384078] page dumped because: VM_BUG_ON_PAGE(page->pmd_huge_pte)
> [ 2489.384086] ------------[ cut here ]------------
> [ 2489.384088] kernel BUG at include/linux/mm.h:1700!
> [ 2489.384131] illegal operation: 0001 ilc:1 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> [ 2489.384137] Modules linked in: bridge stp llc mlx4_ib ib_sa ib_mad mlx4_en ib_core vxlan udp_tunnel ptp pps_core ib_addr ghash_s390 prng ecb mlx4_core aes_s390 des_s390 des_generic sha512_s390 sha256_s390 sha1_s390 sha_common eadm_sch dm_mod vhost_net tun vhost macvtap macvlan kvm autofs4
> [ 2489.384173] CPU: 5 PID: 173619 Comm: cc1 Tainted: G    B   W       4.5.0-rc3-00083-gc05235d #10
> [ 2489.384176] task: 00000000c54d0000 ti: 0000000060504000 task.ti: 0000000060504000
> [ 2489.384179] Krnl PSW : 0704c00180000000 0000000000283cf4 (free_pgd_range+0x334/0x460)
> [ 2489.384184]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
>                Krnl GPRS: 0000000001a161c7 0000000000000000 0000000000000037 0000000000000000
> [ 2489.384189]            0000000000283cf0 0000000000000000 000003ff7d980000 0000000060507e18
> [ 2489.384192]            000003ff00000000 0000000075e43ff0 000003ff7d97ffff 000003ff7d980000
> [ 2489.384195]            000000006a9bc000 00000000006cc390 0000000000283cf0 0000000060507c68
> [ 2489.384201] Krnl Code: 0000000000283ce4: c030002e14dd        larl    %%r3,84669e
>                           0000000000283cea: c0e5ffffd217        brasl   %%r14,27e118
>                          #0000000000283cf0: a7f40001            brc     15,283cf2
>                          >0000000000283cf4: c0e5fffffe5a        brasl   %%r14,2839a8
>                           0000000000283cfa: b9040027            lgr     %%r2,%%r7
>                           0000000000283cfe: b904003c            lgr     %%r3,%%r12
>                           0000000000283d02: c0e5fff509e3        brasl   %%r14,1250c8
>                           0000000000283d08: e31070000004        lg      %%r1,0(%%r7)
> [ 2489.384221] Call Trace:
> [ 2489.384224] ([<0000000000283cf0>] free_pgd_range+0x330/0x460)
> [ 2489.384227]  [<0000000000283f38>] free_pgtables+0x118/0x148
> [ 2489.384230]  [<000000000028c32e>] exit_mmap+0xd6/0x300
> [ 2489.384233]  [<0000000000134d70>] mmput+0x90/0x118
> [ 2489.384235]  [<000000000013a55c>] do_exit+0x41c/0xd18
> [ 2489.384238]  [<000000000013c3c2>] do_group_exit+0x92/0xd8
> [ 2489.384241]  [<000000000013c432>] SyS_exit_group+0x2a/0x30
> [ 2489.384244]  [<00000000006b1a36>] system_call+0xd6/0x258
> [ 2489.384246]  [<000003ff7d343698>] 0x3ff7d343698
> [ 2489.384248] INFO: lockdep is turned off.
> [ 2489.384251] Last Breaking-Event-Address:
> [ 2489.384253]  [<0000000000283cf0>] free_pgd_range+0x330/0x460
> [ 2489.384256]  
> [ 2489.384258] Kernel panic - not syncing: Fatal exception: panic_on_oops
> 
> I'll try to add a BUG_ON(pmd_huge(*pmd)) to free_pte_range() and see if that
> catches anything, and I'll also check if debug_cow = 1 or use_zero_page = 0
> makes any difference.

It's worth minimizing the kernel config on which you can see the bug. Things like
CONFIG_DEBUG_PAGEALLOC used to interfere with THP before.

You can also disable khugepaged, just in case.

One more thing: try adding smp_wmb() in pgtable_trans_huge_withdraw() just
before the return to make sure all CPUs see _PAGE_INVALID.
I don't think it would make a difference. Again, just in case.
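
To make the barrier idea concrete, here is a rough userspace model, not
kernel code. It assumes that, as described above for s390, the list head
lives in the first two slots of a deposited page table and that withdraw
overwrites those slots with an invalid marker. The names toy_pgtable,
toy_withdraw, TOY_PAGE_INVALID and TOY_PTRS_PER_PTE are made up for
illustration, the constant values are assumptions, and a C11 release fence
stands in for smp_wmb().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Assumed values, for illustration only. */
#define TOY_PAGE_INVALID 0x400UL
#define TOY_PTRS_PER_PTE 256

/* Toy page table: the first two slots hold list pointers while deposited. */
struct toy_pgtable {
	unsigned long pte[TOY_PTRS_PER_PTE];
};

/*
 * Toy withdraw: overwrite the two slots that held the list pointers with
 * the invalid marker, then issue a release fence just before returning.
 * The fence is a userspace stand-in for the suggested smp_wmb().
 */
static struct toy_pgtable *toy_withdraw(struct toy_pgtable *deposited)
{
	deposited->pte[0] = TOY_PAGE_INVALID;
	deposited->pte[1] = TOY_PAGE_INVALID;
	atomic_thread_fence(memory_order_release);
	return deposited;
}
```

The fence only orders the two stores before whatever later store publishes
the table; a reader still needs a matching barrier on its side for the
ordering to mean anything.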

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-18 17:06                       ` Kirill A. Shutemov
  (?)
@ 2016-02-19 14:15                         ` Sebastian Ott
  -1 siblings, 0 replies; 153+ messages in thread
From: Sebastian Ott @ 2016-02-19 14:15 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Gerald Schaefer, Andrea Arcangeli, Christian Borntraeger,
	Kirill A. Shutemov, linux-mm, linux-kernel, Aneesh Kumar K.V,
	Andrew Morton, Linus Torvalds, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	Catalin Marinas, Will Deacon, linux-arm-kernel,
	Martin Schwidefsky, Heiko Carstens, linux-s390


On Thu, 18 Feb 2016, Kirill A. Shutemov wrote:
> It's worth minimizing the kernel config on which you can see the bug. Things like
> CONFIG_DEBUG_PAGEALLOC used to interfere with THP before.

I disabled all debugging options (using
arch/s390/configs/performance_defconfig) - we still crashed.

Sebastian

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-12 17:16           ` Gerald Schaefer
  (?)
@ 2016-02-23 10:32             ` Kirill A. Shutemov
  -1 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2016-02-23 10:32 UTC (permalink / raw)
  To: Gerald Schaefer
  Cc: Christian Borntraeger, Kirill A. Shutemov, linux-mm,
	linux-kernel, Aneesh Kumar K.V, Andrew Morton, Linus Torvalds,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	linuxppc-dev, Catalin Marinas, Will Deacon, linux-arm-kernel,
	Martin Schwidefsky, Heiko Carstens, linux-s390, Sebastian Ott

On Fri, Feb 12, 2016 at 06:16:40PM +0100, Gerald Schaefer wrote:
> On Fri, 12 Feb 2016 16:57:27 +0100
> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> 
> > > I'm also confused by pmd_none() is equal to !pmd_present() on s390. Hm?
> > 
> > Don't know, Gerald or Martin?
> 
> The implementation frequently changes depending on how many new bits Martin
> needs to squeeze out :-)
> We don't have a _PAGE_PRESENT bit for pmds, so pmd_present() just checks if the
> entry is not empty. pmd_none() of course does the opposite, it checks if it is
> empty.

I still worry about pmd_present(). It looks wrong to me. I wonder if the
patch below makes a difference.

The theory is that the splitting bit effectively masked a bogus pmd_present():
we had pmd_trans_splitting() in all code paths and that prevented mm from
touching the pmd. Once pmd_trans_splitting() was gone, mm proceeds with the
pmd where it shouldn't, and here's the boom.

I'm not sure that the patch is correct wrt young/old pmds and I have no
way to test it...

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 64ead8091248..2eeb17ab68ac 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -490,7 +490,7 @@ static inline int pud_bad(pud_t pud)
 
 static inline int pmd_present(pmd_t pmd)
 {
-	return pmd_val(pmd) != _SEGMENT_ENTRY_INVALID;
+	return !(pmd_val(pmd) & _SEGMENT_ENTRY_INVALID);
 }
 
 static inline int pmd_none(pmd_t pmd)
-- 
 Kirill A. Shutemov
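
The two predicates in the patch above differ only for entries that carry
the invalid bit together with other bits. A minimal user-space sketch of
that difference follows. The bit values and the names present_eq() and
present_mask() are assumptions made up for illustration; they are not the
real s390 segment-table encoding.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout, for illustration only (not the real s390 values). */
#define ENTRY_INVALID 0x20ULL        /* hardware "invalid" bit */
#define ENTRY_EMPTY   ENTRY_INVALID  /* empty entry: only the invalid bit is set */
#define ENTRY_ORIGIN  0x100000ULL    /* stand-in for a page frame address */

/* Old predicate: present unless the entry is exactly the empty value. */
static bool present_eq(uint64_t pmd)
{
	return pmd != ENTRY_EMPTY;
}

/* Proposed predicate: present only if the invalid bit is clear. */
static bool present_mask(uint64_t pmd)
{
	return !(pmd & ENTRY_INVALID);
}
```

Under these assumptions both predicates agree for an empty entry and for a
plain valid entry. They disagree for ENTRY_ORIGIN | ENTRY_INVALID, which
present_eq() reports as present and present_mask() reports as not present.
Any pmd state that is encoded with the invalid bit set would change meaning
under the new test.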

^ permalink raw reply related	[flat|nested] 153+ messages in thread

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-23 10:32             ` Kirill A. Shutemov
  (?)
  (?)
@ 2016-02-23 17:46               ` Linus Torvalds
  -1 siblings, 0 replies; 153+ messages in thread
From: Linus Torvalds @ 2016-02-23 17:46 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Gerald Schaefer, Christian Borntraeger, Kirill A. Shutemov,
	linux-mm, Linux Kernel Mailing List, Aneesh Kumar K.V,
	Andrew Morton, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, ppc-dev, Catalin Marinas, Will Deacon,
	linux-arm-kernel, Martin Schwidefsky, Heiko Carstens, linux-s390,
	Sebastian Ott

On Tue, Feb 23, 2016 at 2:32 AM, Kirill A. Shutemov
<kirill@shutemov.name> wrote:
>
> I still worry about pmd_present(). It looks wrong to me. I wonder if the
> patch below makes a difference.

Let's hope that's it, but in the meantime I do want to start the
discussion about what to do if it isn't. We're at rc5, and 4.5 is just
a few weeks away, and so far this issue hasn't gone anywhere.

So the *good* scenario is that your pmd_present() patch fixes it, and
we can all take a relieved breath.

But if not, what then? It looks like we have two options:

 (a) do a (hopefully minimal) revert.

     I say "hopefully minimal", but I suspect the revert is going to
have to undo pretty much all of the core THP changes. I'd hate to see
that, because I really liked the cleanups.

 (b) mark THP as "depends on !S390" in the 4.5 release

The (b) option is obviously much simpler, but it's a regression. I
really don't like it, even if it generally shouldn't be the kind of
regression that is actually user-noticeable (apart from performance).
I also hate the fact that while the problem only seems to happen on
s390, we don't even understand it, so maybe it's a more generic issue
that for some reason just ends up being *much* more noticeable on one
odd architecture that happens to be a bit different.

I'm inclined to think of (b) as just a "give us more time to figure it
out" thing, but I'm also worried that it will then make people not
pursue this issue.

How big is a revert patch that makes THP work on s390 again? Can we do
a revert that keeps the infrastructure intact and makes it easy to
revisit the THP cleanups later? Or is the revert inevitably going to
be all the core patches in that series?

                   Linus

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-23 10:32             ` Kirill A. Shutemov
  (?)
@ 2016-02-23 18:19               ` Gerald Schaefer
  -1 siblings, 0 replies; 153+ messages in thread
From: Gerald Schaefer @ 2016-02-23 18:19 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Christian Borntraeger, Kirill A. Shutemov, linux-mm,
	linux-kernel, Aneesh Kumar K.V, Andrew Morton, Linus Torvalds,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	linuxppc-dev, Catalin Marinas, Will Deacon, linux-arm-kernel,
	Martin Schwidefsky, Heiko Carstens, linux-s390, Sebastian Ott

On Tue, 23 Feb 2016 13:32:21 +0300
"Kirill A. Shutemov" <kirill@shutemov.name> wrote:

> On Fri, Feb 12, 2016 at 06:16:40PM +0100, Gerald Schaefer wrote:
> > On Fri, 12 Feb 2016 16:57:27 +0100
> > Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> > 
> > > > I'm also confused that pmd_none() is equal to !pmd_present() on s390. Hm?
> > > 
> > > Don't know, Gerald or Martin?
> > 
> > The implementation frequently changes depending on how many new bits Martin
> > needs to squeeze out :-)
> > We don't have a _PAGE_PRESENT bit for pmds, so pmd_present() just checks if the
> > entry is not empty. pmd_none() of course does the opposite, it checks if it is
> > empty.
> 
> I still worry about pmd_present(). It looks wrong to me. I wonder if the
> patch below makes a difference.
> 
> The theory is that the splitting bit effectively masked a bogus pmd_present():
> we had pmd_trans_splitting() in all code paths and that prevented mm from
> touching the pmd. Once pmd_trans_splitting() has gone, mm proceeds with the
> pmd where it shouldn't, and here's a boom.

Well, I don't think pmd_present() == true is bogus for a trans_huge pmd under
splitting, after all there is a page behind the pmd. Also, if it was
bogus, and it would need to be false, why should it be marked !pmd_present()
only at the pmdp_invalidate() step before the pmd_populate()? It clearly
is pmd_present() before that, on all architectures, and if there was any
problem/race with that, setting it to !pmd_present() at this stage would
only (marginally) reduce the race window.

BTW, PowerPC and Sparc seem to do the same thing in pmdp_invalidate(),
i.e. they do not set pmd_present() == false, only mark it so that it would
not generate a new TLB entry, just like on s390. After all, the function
is called pmdp_invalidate(), and I think the comment in mm/huge_memory.c
before that call is just a little ambiguous in its wording. When it says
"mark the pmd notpresent" it probably means "mark it so that it will not
generate a new TLB entry", which is also what the comment is really about:
prevent huge and small entries in the TLB for the same page at the same
time.

FWIW, and since the ARM arch-list is already on cc, I think there is
an issue with pmdp_invalidate() on ARM, since it also seems to clear
the trans_huge (and formerly trans_splitting) bit, which actually makes
the pmd !pmd_present(), but it violates the other requirement from the
comment:
"the pmd_trans_huge and pmd_trans_splitting must remain set at all times
on the pmd until the split is complete for this pmd"

> 
> I'm not sure that the patch is correct wrt young/old pmds and I have no
> way to test it...
> 
> diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
> index 64ead8091248..2eeb17ab68ac 100644
> --- a/arch/s390/include/asm/pgtable.h
> +++ b/arch/s390/include/asm/pgtable.h
> @@ -490,7 +490,7 @@ static inline int pud_bad(pud_t pud)
> 
>  static inline int pmd_present(pmd_t pmd)
>  {
> -	return pmd_val(pmd) != _SEGMENT_ENTRY_INVALID;
> +	return !(pmd_val(pmd) & _SEGMENT_ENTRY_INVALID);
>  }
> 
>  static inline int pmd_none(pmd_t pmd)

No, that would not work well with young rw and ro pmds. We do now
have an extra free bit in the pmd on s390, after the removal of the
splitting bit, so we could try to implement pmd_present() with that
sw bit, but that would also require several not-so-trivial changes
to the other code in arch/s390/include/asm/pgtable.h.

I'll check with Martin, maybe it is actually trivial, then we can
do a quick test to rule that one out.

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-23 18:19               ` Gerald Schaefer
  (?)
@ 2016-02-23 18:47                 ` Will Deacon
  -1 siblings, 0 replies; 153+ messages in thread
From: Will Deacon @ 2016-02-23 18:47 UTC (permalink / raw)
  To: Gerald Schaefer
  Cc: Kirill A. Shutemov, Christian Borntraeger, Kirill A. Shutemov,
	linux-mm, linux-kernel, Aneesh Kumar K.V, Andrew Morton,
	Linus Torvalds, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev, Catalin Marinas, linux-arm-kernel,
	Martin Schwidefsky, Heiko Carstens, linux-s390, Sebastian Ott,
	steve.capper

[adding Steve, since he worked on THP for 32-bit ARM]

On Tue, Feb 23, 2016 at 07:19:07PM +0100, Gerald Schaefer wrote:
> On Tue, 23 Feb 2016 13:32:21 +0300
> "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> > The theory is that the splitting bit effectively masked a bogus pmd_present():
> > we had pmd_trans_splitting() in all code paths and that prevented mm from
> > touching the pmd. Once pmd_trans_splitting() has gone, mm proceeds with the
> > pmd where it shouldn't, and here's a boom.
> 
> Well, I don't think pmd_present() == true is bogus for a trans_huge pmd under
> splitting, after all there is a page behind the pmd. Also, if it was
> bogus, and it would need to be false, why should it be marked !pmd_present()
> only at the pmdp_invalidate() step before the pmd_populate()? It clearly
> is pmd_present() before that, on all architectures, and if there was any
> problem/race with that, setting it to !pmd_present() at this stage would
> only (marginally) reduce the race window.
> 
> BTW, PowerPC and Sparc seem to do the same thing in pmdp_invalidate(),
> i.e. they do not set pmd_present() == false, only mark it so that it would
> not generate a new TLB entry, just like on s390. After all, the function
> is called pmdp_invalidate(), and I think the comment in mm/huge_memory.c
> before that call is just a little ambiguous in its wording. When it says
> "mark the pmd notpresent" it probably means "mark it so that it will not
> generate a new TLB entry", which is also what the comment is really about:
> prevent huge and small entries in the TLB for the same page at the same
> time.
> 
> FWIW, and since the ARM arch-list is already on cc, I think there is
> an issue with pmdp_invalidate() on ARM, since it also seems to clear
> the trans_huge (and formerly trans_splitting) bit, which actually makes
> the pmd !pmd_present(), but it violates the other requirement from the
> comment:
> "the pmd_trans_huge and pmd_trans_splitting must remain set at all times
> on the pmd until the split is complete for this pmd"

I've only been testing this for arm64 (where I have yet to see a problem),
but we use the generic pmdp_invalidate implementation from
mm/pgtable-generic.c there. On arm64, pmd_trans_huge will return true
after pmd_mknotpresent. On arm, it does look to be buggy, since it nukes
the entire entry... Steve?

Will
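
The difference described above, between a pmd_mknotpresent() that keeps
the huge mapping recognizable and one that nukes the whole entry, can be
sketched as a small model. Everything here is an assumption for
illustration: the bit layout is simplified and hypothetical, and
model_trans_huge(), mknotpresent_keep() and mknotpresent_zero() are
made-up names, not the real arm64 or arm helpers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified entry layout; not the real ARM/arm64 descriptors. */
#define PMD_VALID     0x1ULL  /* hardware uses the entry only if this is set */
#define PMD_TABLE_BIT 0x2ULL  /* set: points to a pte table; clear: huge mapping */
#define PMD_TYPE_MASK (PMD_VALID | PMD_TABLE_BIT)

/* A non-empty entry that is not a table pointer counts as a huge pmd. */
static bool model_trans_huge(uint64_t pmd)
{
	return pmd != 0 && !(pmd & PMD_TABLE_BIT);
}

/* Variant A: clear only the type bits; the rest of the entry survives. */
static uint64_t mknotpresent_keep(uint64_t pmd)
{
	return pmd & ~PMD_TYPE_MASK;
}

/* Variant B: zero the whole entry, losing the huge-mapping information. */
static uint64_t mknotpresent_zero(uint64_t pmd)
{
	(void)pmd;
	return 0;
}
```

In this model, a huge entry such as 0x200000 | PMD_VALID stays
model_trans_huge() after variant A but not after variant B. Variant B is
the case that would conflict with the requirement that pmd_trans_huge stay
set until the split is complete.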

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-23 18:19               ` Gerald Schaefer
  (?)
@ 2016-02-23 19:33                 ` Kirill A. Shutemov
  -1 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2016-02-23 19:33 UTC (permalink / raw)
  To: Gerald Schaefer
  Cc: Christian Borntraeger, Kirill A. Shutemov, linux-mm,
	linux-kernel, Aneesh Kumar K.V, Andrew Morton, Linus Torvalds,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	linuxppc-dev, Catalin Marinas, Will Deacon, linux-arm-kernel,
	Martin Schwidefsky, Heiko Carstens, linux-s390, Sebastian Ott

On Tue, Feb 23, 2016 at 07:19:07PM +0100, Gerald Schaefer wrote:
> I'll check with Martin, maybe it is actually trivial, then we can
> do a quick test to rule that one out.

Oh. I found a bug in __split_huge_pmd_locked(). Although, not sure if it's
_the_ bug.

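pmdp_invalidate() is called for the wrong address :-/
The PTE loop advances haddr by PAGE_SIZE on every iteration, so by the time
pmdp_invalidate() runs, haddr points HPAGE_PMD_SIZE past the start of the
huge page.
I guess that can be destructive on the architecture, right?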

Could you check this?

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1c317b85ea7d..4246bc70e55a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2865,7 +2865,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 	pgtable = pgtable_trans_huge_withdraw(mm, pmd);
 	pmd_populate(mm, &_pmd, pgtable);
 
-	for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
+	for (i = 0; i < HPAGE_PMD_NR; i++) {
 		pte_t entry, *pte;
 		/*
 		 * Note that NUMA hinting access restrictions are not
@@ -2886,9 +2886,9 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 		}
 		if (dirty)
 			SetPageDirty(page + i);
-		pte = pte_offset_map(&_pmd, haddr);
+		pte = pte_offset_map(&_pmd, haddr + i * PAGE_SIZE);
 		BUG_ON(!pte_none(*pte));
-		set_pte_at(mm, haddr, pte, entry);
+		set_pte_at(mm, haddr + i * PAGE_SIZE, pte, entry);
 		atomic_inc(&page[i]._mapcount);
 		pte_unmap(pte);
 	}
@@ -2938,7 +2938,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 	pmd_populate(mm, pmd, pgtable);
 
 	if (freeze) {
-		for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
+		for (i = 0; i < HPAGE_PMD_NR; i++) {
 			page_remove_rmap(page + i, false);
 			put_page(page + i);
 		}
-- 
 Kirill A. Shutemov
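
For illustration only, here is a minimal userspace sketch of the address
drift that the patch above removes. It is not the kernel code: the MODEL_*
constants (4 KiB pages, 512 PTEs per PMD) and both helper names are
assumptions made up for this example.

```c
#include <assert.h>

/* Assumed for illustration: 4 KiB base pages, 512 PTEs per PMD. */
#define MODEL_PAGE_SIZE      4096UL
#define MODEL_HPAGE_PMD_NR   512UL
#define MODEL_HPAGE_PMD_SIZE (MODEL_PAGE_SIZE * MODEL_HPAGE_PMD_NR)

/*
 * Old loop shape: haddr is advanced in the loop header, so the value left
 * behind for any later user (e.g. pmdp_invalidate()) has drifted.
 */
static unsigned long haddr_after_old_loop(unsigned long haddr)
{
	unsigned long i, last_pte_addr = 0;

	for (i = 0; i < MODEL_HPAGE_PMD_NR; i++, haddr += MODEL_PAGE_SIZE)
		last_pte_addr = haddr;	/* stands in for the per-PTE work */

	(void)last_pte_addr;
	return haddr;
}

/* Patched loop shape: the per-PTE address is derived, haddr is untouched. */
static unsigned long haddr_after_new_loop(unsigned long haddr)
{
	unsigned long i, last_pte_addr = 0;

	for (i = 0; i < MODEL_HPAGE_PMD_NR; i++)
		last_pte_addr = haddr + i * MODEL_PAGE_SIZE;

	(void)last_pte_addr;
	return haddr;
}
```

With a huge page starting at 0x200000, the old shape hands back 0x400000
(one full huge page further on), while the patched shape hands back the
original 0x200000.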

^ permalink raw reply related	[flat|nested] 153+ messages in thread

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-23 19:33                 ` Kirill A. Shutemov
  (?)
@ 2016-02-23 20:22                   ` Will Deacon
  -1 siblings, 0 replies; 153+ messages in thread
From: Will Deacon @ 2016-02-23 20:22 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Gerald Schaefer, Christian Borntraeger, Kirill A. Shutemov,
	linux-mm, linux-kernel, Aneesh Kumar K.V, Andrew Morton,
	Linus Torvalds, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev, Catalin Marinas, linux-arm-kernel,
	Martin Schwidefsky, Heiko Carstens, linux-s390, Sebastian Ott

On Tue, Feb 23, 2016 at 10:33:45PM +0300, Kirill A. Shutemov wrote:
> On Tue, Feb 23, 2016 at 07:19:07PM +0100, Gerald Schaefer wrote:
> > I'll check with Martin, maybe it is actually trivial, then we can
> > do a quick test to rule that one out.
> 
> Oh. I found a bug in __split_huge_pmd_locked(). Although, not sure if it's
> _the_ bug.
> 
> pmdp_invalidate() is called for the wrong address :-/
> I guess that can be destructive on the architecture, right?

FWIW, arm64 ignores the address parameter for set_pmd_at, so this would
only result in the TLBI nuking the wrong entries, which is going to be
tricky to observe in practice given that we install a table entry
immediately afterwards that maps the same pages. If s390 does more here
(I see some magic asm using the address), that could be the answer...
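
As a rough sketch of why the flush would miss: assuming the invalidate
flushes the half-open range [address, address + HPAGE_PMD_SIZE), an address
that is off by one huge page covers the neighbouring range and none of the
pages being split. The MODEL_* constants, struct model_range and both
helpers are hypothetical names for this example, not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed for illustration: 4 KiB base pages, 512 PTEs per PMD. */
#define MODEL_PAGE_SIZE      4096UL
#define MODEL_HPAGE_PMD_NR   512UL
#define MODEL_HPAGE_PMD_SIZE (MODEL_PAGE_SIZE * MODEL_HPAGE_PMD_NR)

/* Half-open virtual address range [start, end). */
struct model_range {
	unsigned long start;
	unsigned long end;
};

/* Range a huge-page sized flush starting at 'address' would cover. */
static struct model_range model_flush_range(unsigned long address)
{
	struct model_range r = { address, address + MODEL_HPAGE_PMD_SIZE };

	return r;
}

/* True if the two half-open ranges share at least one address. */
static bool model_ranges_overlap(struct model_range a, struct model_range b)
{
	return a.start < b.end && b.start < a.end;
}
```

For a huge page at 0x200000, a flush issued for 0x400000 does not overlap
the mapped range at all, whereas a flush issued for 0x200000 covers it
exactly.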

Will

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-23 18:19               ` Gerald Schaefer
  (?)
@ 2016-02-24  8:22                 ` Martin Schwidefsky
  -1 siblings, 0 replies; 153+ messages in thread
From: Martin Schwidefsky @ 2016-02-24  8:22 UTC (permalink / raw)
  To: Gerald Schaefer
  Cc: Kirill A. Shutemov, Christian Borntraeger, Kirill A. Shutemov,
	linux-mm, linux-kernel, Aneesh Kumar K.V, Andrew Morton,
	Linus Torvalds, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev, Catalin Marinas, Will Deacon,
	linux-arm-kernel, Heiko Carstens, linux-s390, Sebastian Ott

On Tue, 23 Feb 2016 19:19:07 +0100
Gerald Schaefer <gerald.schaefer@de.ibm.com> wrote:

> On Tue, 23 Feb 2016 13:32:21 +0300
> "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> 
> > On Fri, Feb 12, 2016 at 06:16:40PM +0100, Gerald Schaefer wrote:
> > > On Fri, 12 Feb 2016 16:57:27 +0100
> > > Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> > > 
> > > > > I'm also confused that pmd_none() is equal to !pmd_present() on s390. Hm?
> > > > 
> > > > Don't know, Gerald or Martin?
> > > 
> > > The implementation frequently changes depending on how many new bits Martin
> > > needs to squeeze out :-)
> > > We don't have a _PAGE_PRESENT bit for pmds, so pmd_present() just checks if the
> > > entry is not empty. pmd_none() of course does the opposite, it checks if it is
> > > empty.
> > 
> > I still worry about pmd_present(). It looks wrong to me. I wonder if the
> > patch below makes a difference.
> > 
> > The theory is that the splitting bit effectively masked a bogus pmd_present():
> > we had pmd_trans_splitting() in all code paths and that prevented mm from
> > touching the pmd. Once pmd_trans_splitting() was gone, mm proceeds with the
> > pmd where it shouldn't, and here's the boom.
> 
> Well, I don't think pmd_present() == true is bogus for a trans_huge pmd under
> splitting, after all there is a page behind the pmd. Also, if it was
> bogus, and it would need to be false, why should it be marked !pmd_present()
> only at the pmdp_invalidate() step before the pmd_populate()? It clearly
> is pmd_present() before that, on all architectures, and if there was any
> problem/race with that, setting it to !pmd_present() at this stage would
> only (marginally) reduce the race window.
> 
> BTW, PowerPC and Sparc seem to do the same thing in pmdp_invalidate(),
> i.e. they do not set pmd_present() == false, only mark it so that it would
> not generate a new TLB entry, just like on s390. After all, the function
> is called pmdp_invalidate(), and I think the comment in mm/huge_memory.c
> before that call is just a little ambiguous in its wording. When it says
> "mark the pmd notpresent" it probably means "mark it so that it will not
> generate a new TLB entry", which is also what the comment is really about:
> prevent huge and small entries in the TLB for the same page at the same
> time.

If I am not mistaken this is true for x86 as well. The generic implementation
for pmdp_invalidate sets a new pmd that has been modified with
pmd_mknotpresent. For x86 this function removes the _PAGE_PRESENT and
_PAGE_PROTNONE bits from the entry. The _PAGE_PSE bit stays set and that
makes pmd_present return true.
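
A small userspace model of that x86 behaviour, for illustration. The flag
values mirror the x86 bit positions as I understand them (present = bit 0,
PSE = bit 7, PROTNONE = bit 8), but treat them and all model_* names as
assumptions of this sketch rather than the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed flag layout, modelled on x86; illustration only. */
#define MODEL_PAGE_PRESENT  (1ULL << 0)
#define MODEL_PAGE_PSE      (1ULL << 7)
#define MODEL_PAGE_PROTNONE (1ULL << 8)

typedef uint64_t model_pmd_t;

/* Clear PRESENT and PROTNONE, leave everything else (including PSE). */
static model_pmd_t model_pmd_mknotpresent(model_pmd_t pmd)
{
	return pmd & ~(MODEL_PAGE_PRESENT | MODEL_PAGE_PROTNONE);
}

/* Present if any of PRESENT, PROTNONE or PSE is set. */
static int model_pmd_present(model_pmd_t pmd)
{
	return (pmd & (MODEL_PAGE_PRESENT | MODEL_PAGE_PROTNONE |
		       MODEL_PAGE_PSE)) != 0;
}

/* Huge if the PSE bit is set. */
static int model_pmd_trans_huge(model_pmd_t pmd)
{
	return (pmd & MODEL_PAGE_PSE) != 0;
}
```

Starting from a huge entry with PRESENT and PSE set, the entry returned by
model_pmd_mknotpresent() no longer has PRESENT, yet model_pmd_present() and
model_pmd_trans_huge() both still return true because PSE survives.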

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-23 19:33                 ` Kirill A. Shutemov
  (?)
@ 2016-02-24  8:39                   ` Martin Schwidefsky
  -1 siblings, 0 replies; 153+ messages in thread
From: Martin Schwidefsky @ 2016-02-24  8:39 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Gerald Schaefer, Christian Borntraeger, Kirill A. Shutemov,
	linux-mm, linux-kernel, Aneesh Kumar K.V, Andrew Morton,
	Linus Torvalds, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev, Catalin Marinas, Will Deacon,
	linux-arm-kernel, Heiko Carstens, linux-s390, Sebastian Ott

On Tue, 23 Feb 2016 22:33:45 +0300
"Kirill A. Shutemov" <kirill@shutemov.name> wrote:

> On Tue, Feb 23, 2016 at 07:19:07PM +0100, Gerald Schaefer wrote:
> > I'll check with Martin, maybe it is actually trivial, then we can
> > do a quick test to rule that one out.
> 
> Oh. I found a bug in __split_huge_pmd_locked(). Although, not sure if it's
> _the_ bug.
> 
> pmdp_invalidate() is called for the wrong address :-/
> I guess that can be destructive on the architecture, right?
> 
> Could you check this?
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 1c317b85ea7d..4246bc70e55a 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2865,7 +2865,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>  	pgtable = pgtable_trans_huge_withdraw(mm, pmd);
>  	pmd_populate(mm, &_pmd, pgtable);
> 
> -	for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
> +	for (i = 0; i < HPAGE_PMD_NR; i++) {
>  		pte_t entry, *pte;
>  		/*
>  		 * Note that NUMA hinting access restrictions are not
> @@ -2886,9 +2886,9 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>  		}
>  		if (dirty)
>  			SetPageDirty(page + i);
> -		pte = pte_offset_map(&_pmd, haddr);
> +		pte = pte_offset_map(&_pmd, haddr + i * PAGE_SIZE);
>  		BUG_ON(!pte_none(*pte));
> -		set_pte_at(mm, haddr, pte, entry);
> +		set_pte_at(mm, haddr + i * PAGE_SIZE, pte, entry);
>  		atomic_inc(&page[i]._mapcount);
>  		pte_unmap(pte);
>  	}
> @@ -2938,7 +2938,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>  	pmd_populate(mm, pmd, pgtable);
> 
>  	if (freeze) {
> -		for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
> +		for (i = 0; i < HPAGE_PMD_NR; i++) {
>  			page_remove_rmap(page + i, false);
>  			put_page(page + i);
>  		}

Test is running and it looks good so far. For the final assessment I defer
to Gerald and Sebastian.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-23 20:22                   ` Will Deacon
  (?)
@ 2016-02-24 10:16                     ` Christian Borntraeger
  -1 siblings, 0 replies; 153+ messages in thread
From: Christian Borntraeger @ 2016-02-24 10:16 UTC (permalink / raw)
  To: Will Deacon, Kirill A. Shutemov
  Cc: Gerald Schaefer, Kirill A. Shutemov, linux-mm, linux-kernel,
	Aneesh Kumar K.V, Andrew Morton, Linus Torvalds,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	linuxppc-dev, Catalin Marinas, linux-arm-kernel,
	Martin Schwidefsky, Heiko Carstens, linux-s390, Sebastian Ott

On 02/23/2016 09:22 PM, Will Deacon wrote:
> On Tue, Feb 23, 2016 at 10:33:45PM +0300, Kirill A. Shutemov wrote:
>> On Tue, Feb 23, 2016 at 07:19:07PM +0100, Gerald Schaefer wrote:
>>> I'll check with Martin, maybe it is actually trivial, then we can
>>> do a quick test it to rule that one out.
>>
>> Oh. I found a bug in __split_huge_pmd_locked(). Although, not sure if it's
>> _the_ bug.
>>
>> pmdp_invalidate() is called for the wrong address :-/
>> I guess that can be destructive on the architecture, right?
> 
> FWIW, arm64 ignores the address parameter for set_pmd_at, so this would
> only result in the TLBI nuking the wrong entries, which is going to be
> tricky to observe in practice given that we install a table entry
> immediately afterwards that maps the same pages. If s390 does more here
> (I see some magic asm using the address), that could be the answer...

This patch does not change the address for set_pmd_at(); it changes the
address passed to pmdp_invalidate() here (by keeping haddr at the start of
the pmd):

--->    pmdp_invalidate(vma, haddr, pmd);
        pmd_populate(mm, pmd, pgtable);

Without that fix we would clearly have stale TLB entries, no?

^ permalink raw reply	[flat|nested] 153+ messages in thread
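
The drift described above can be sketched in user space. This is only an
illustration, not kernel code: the EX_* constants (4 KiB pages, 512 PTEs
per PMD) and the helper names old_loop()/new_loop() are assumptions made
for the example. Each helper reports the address used for the last PTE
and returns the value haddr holds once the loop is done, which is the
value a later pmdp_invalidate(vma, haddr, pmd) call would receive.

```c
/* Illustrative values only: 4 KiB pages, 512 PTEs per PMD. */
#define EX_PAGE_SIZE     4096UL
#define EX_HPAGE_PMD_NR  512UL

/*
 * Pre-patch loop shape: haddr itself is advanced once per PTE.
 * *last_pte_addr receives the address used for the final PTE; the
 * return value is what haddr holds after the loop has finished.
 */
static unsigned long old_loop(unsigned long haddr, unsigned long *last_pte_addr)
{
	unsigned long i;

	for (i = 0; i < EX_HPAGE_PMD_NR; i++, haddr += EX_PAGE_SIZE)
		*last_pte_addr = haddr;
	return haddr;
}

/* Patched loop shape: the per-PTE address is derived, haddr stays fixed. */
static unsigned long new_loop(unsigned long haddr, unsigned long *last_pte_addr)
{
	unsigned long i;

	for (i = 0; i < EX_HPAGE_PMD_NR; i++)
		*last_pte_addr = haddr + i * EX_PAGE_SIZE;
	return haddr;
}
```

With haddr = 0x200000, both helpers touch the same PTE addresses (the last
one is 0x3ff000), but old_loop() returns 0x400000, one full huge page past
the start, while new_loop() still returns 0x200000.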

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-24 10:16                     ` Christian Borntraeger
  (?)
@ 2016-02-24 10:41                       ` Will Deacon
  -1 siblings, 0 replies; 153+ messages in thread
From: Will Deacon @ 2016-02-24 10:41 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Kirill A. Shutemov, Gerald Schaefer, Kirill A. Shutemov,
	linux-mm, linux-kernel, Aneesh Kumar K.V, Andrew Morton,
	Linus Torvalds, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev, Catalin Marinas, linux-arm-kernel,
	Martin Schwidefsky, Heiko Carstens, linux-s390, Sebastian Ott

On Wed, Feb 24, 2016 at 11:16:34AM +0100, Christian Borntraeger wrote:
> On 02/23/2016 09:22 PM, Will Deacon wrote:
> > On Tue, Feb 23, 2016 at 10:33:45PM +0300, Kirill A. Shutemov wrote:
> >> On Tue, Feb 23, 2016 at 07:19:07PM +0100, Gerald Schaefer wrote:
> >>> I'll check with Martin, maybe it is actually trivial, then we can
> >>> do a quick test it to rule that one out.
> >>
> >> Oh. I found a bug in __split_huge_pmd_locked(). Although, not sure if it's
> >> _the_ bug.
> >>
> >> pmdp_invalidate() is called for the wrong address :-/
> >> I guess that can be destructive on the architecture, right?
> > 
> > FWIW, arm64 ignores the address parameter for set_pmd_at, so this would
> > only result in the TLBI nuking the wrong entries, which is going to be
> > tricky to observe in practice given that we install a table entry
> > immediately afterwards that maps the same pages. If s390 does more here
> > (I see some magic asm using the address), that could be the answer...
> 
> This patch does not change the address for set_pmd_at, it does that for the 
> pmdp_invalidate here (by keeping haddr at the start of the pmd)
> 
> --->    pmdp_invalidate(vma, haddr, pmd);
>         pmd_populate(mm, pmd, pgtable);

On arm64, pmdp_invalidate looks like:

void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
		     pmd_t *pmdp)
{
	pmd_t entry = *pmdp;
	set_pmd_at(vma->vm_mm, address, pmdp, pmd_mknotpresent(entry));
	flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
}

so that's the set_pmd_at call I was referring to.

On s390, that address ends up in __pmdp_idte[_local], but I don't know
what .insn rrf,0xb98e0000,%2,%3,0,{0,1} does ;)

> Without that fix we would clearly have stale tlb entries, no?

Yes, but AFAIU the sequence on arm64 is:

1.  trans huge mapping (block mapping in arm64 speak)
2.  faulting entry (pmd_mknotpresent)
3.  tlb invalidation
4.  table entry mapping the same pages as (1).

so if the microarchitecture we're on can tolerate a mixture of block
mappings and page mappings mapping the same VA to the same PA, then the
lack of TLB maintenance would go unnoticed. There are certainly systems
where that could cause an issue, but I believe the one I've been testing
on would be ok.

Will

^ permalink raw reply	[flat|nested] 153+ messages in thread
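
Why a flush over the wrong range can go unnoticed, or not, can be shown
with a toy model. This is a sketch under stated assumptions, not how any
real TLB or the arm64 code works: struct tlb_entry, toy_flush_range() and
stale_after_flush() are hypothetical names, and EX_HPAGE_PMD_SIZE assumes
a 2 MiB huge page. The model caches one block translation for haddr and
then flushes one huge page worth of addresses starting at flush_addr.

```c
#include <stdbool.h>
#include <stddef.h>

#define EX_HPAGE_PMD_SIZE (2UL << 20) /* illustrative 2 MiB huge page */

/* Toy TLB entry: one cached translation covering [va, va + size). */
struct tlb_entry {
	unsigned long va;
	unsigned long size;
	bool valid;
};

/* Drop every cached entry that overlaps [start, end). */
static void toy_flush_range(struct tlb_entry *tlb, size_t n,
			    unsigned long start, unsigned long end)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (tlb[i].valid && tlb[i].va < end &&
		    tlb[i].va + tlb[i].size > start)
			tlb[i].valid = false;
}

/*
 * Returns true if the block translation cached for haddr is still in
 * the toy TLB after flushing EX_HPAGE_PMD_SIZE bytes from flush_addr.
 */
static bool stale_after_flush(unsigned long haddr, unsigned long flush_addr)
{
	struct tlb_entry tlb[1] = {
		{ .va = haddr, .size = EX_HPAGE_PMD_SIZE, .valid = true },
	};

	toy_flush_range(tlb, 1, flush_addr, flush_addr + EX_HPAGE_PMD_SIZE);
	return tlb[0].valid;
}
```

In this model, flushing from haddr removes the block entry, whereas
flushing from haddr + EX_HPAGE_PMD_SIZE (the drifted address) leaves it
cached. Whether such a leftover entry is harmful then depends on whether
the hardware tolerates it alongside the new page mappings, which is the
microarchitectural question raised above.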

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-24 10:41                       ` Will Deacon
  (?)
@ 2016-02-24 10:51                         ` Christian Borntraeger
  -1 siblings, 0 replies; 153+ messages in thread
From: Christian Borntraeger @ 2016-02-24 10:51 UTC (permalink / raw)
  To: Will Deacon
  Cc: Kirill A. Shutemov, Gerald Schaefer, Kirill A. Shutemov,
	linux-mm, linux-kernel, Aneesh Kumar K.V, Andrew Morton,
	Linus Torvalds, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev, Catalin Marinas, linux-arm-kernel,
	Martin Schwidefsky, Heiko Carstens, linux-s390, Sebastian Ott

On 02/24/2016 11:41 AM, Will Deacon wrote:
> On Wed, Feb 24, 2016 at 11:16:34AM +0100, Christian Borntraeger wrote:
>> On 02/23/2016 09:22 PM, Will Deacon wrote:
>>> On Tue, Feb 23, 2016 at 10:33:45PM +0300, Kirill A. Shutemov wrote:
>>>> On Tue, Feb 23, 2016 at 07:19:07PM +0100, Gerald Schaefer wrote:
>>>>> I'll check with Martin, maybe it is actually trivial, then we can
>>>>> do a quick test it to rule that one out.
>>>>
>>>> Oh. I found a bug in __split_huge_pmd_locked(). Although, not sure if it's
>>>> _the_ bug.
>>>>
>>>> pmdp_invalidate() is called for the wrong address :-/
>>>> I guess that can be destructive on the architecture, right?
>>>
>>> FWIW, arm64 ignores the address parameter for set_pmd_at, so this would
>>> only result in the TLBI nuking the wrong entries, which is going to be
>>> tricky to observe in practice given that we install a table entry
>>> immediately afterwards that maps the same pages. If s390 does more here
>>> (I see some magic asm using the address), that could be the answer...
>>
>> This patch does not change the address for set_pmd_at, it does that for the 
>> pmdp_invalidate here (by keeping haddr at the start of the pmd)
>>
>> --->    pmdp_invalidate(vma, haddr, pmd);
>>         pmd_populate(mm, pmd, pgtable);
> 
> On arm64, pmdp_invalidate looks like:
> 
> void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
> 		     pmd_t *pmdp)
> {
> 	pmd_t entry = *pmdp;
> 	set_pmd_at(vma->vm_mm, address, pmdp, pmd_mknotpresent(entry));
> 	flush_pmd_tlb_range(vma, address, address + hpage_pmd_size);
> }
> 
> so that's the set_pmd_at call I was referring to.
> 
> On s390, that address ends up in __pmdp_idte[_local], but I don't know
> what .insn rrf,0xb98e0000,%2,%3,0,{0,1} do ;)

It invalidates the pmd entry and clears the TLB entries for it.

> 
>> Without that fix we would clearly have stale tlb entries, no?
> 
> Yes, but AFAIU the sequence on arm64 is:
> 
> 1.  trans huge mapping (block mapping in arm64 speak)
> 2.  faulting entry (pmd_mknotpresent)
> 3.  tlb invalidation
> 4.  table entry mapping the same pages as (1).
> 
> so if the microarchitecture we're on can tolerate a mixture of block
> mappings and page mappings mapping the same VA to the same PA, then the
> lack of TLB maintenance would go unnoticed. There are certainly systems
> where that could cause an issue, but I believe the one I've been testing
> on would be ok.

So in essence you say it does not matter that you flush the wrong range in 
flush_pmd_tlb_range as long as it will be flushed later on when the pages
really go away. Yes, then it really might be ok for arm64.

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-24 10:51                         ` Christian Borntraeger
  (?)
@ 2016-02-24 11:02                           ` Will Deacon
  -1 siblings, 0 replies; 153+ messages in thread
From: Will Deacon @ 2016-02-24 11:02 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Kirill A. Shutemov, Gerald Schaefer, Kirill A. Shutemov,
	linux-mm, linux-kernel, Aneesh Kumar K.V, Andrew Morton,
	Linus Torvalds, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev, Catalin Marinas, linux-arm-kernel,
	Martin Schwidefsky, Heiko Carstens, linux-s390, Sebastian Ott

On Wed, Feb 24, 2016 at 11:51:47AM +0100, Christian Borntraeger wrote:
> On 02/24/2016 11:41 AM, Will Deacon wrote:
> > On Wed, Feb 24, 2016 at 11:16:34AM +0100, Christian Borntraeger wrote:
> >> Without that fix we would clearly have stale tlb entries, no?
> > 
> > Yes, but AFAIU the sequence on arm64 is:
> > 
> > 1.  trans huge mapping (block mapping in arm64 speak)
> > 2.  faulting entry (pmd_mknotpresent)
> > 3.  tlb invalidation
> > 4.  table entry mapping the same pages as (1).
> > 
> > so if the microarchitecture we're on can tolerate a mixture of block
> > mappings and page mappings mapping the same VA to the same PA, then the
> > lack of TLB maintenance would go unnoticed. There are certainly systems
> > where that could cause an issue, but I believe the one I've been testing
> > on would be ok.
> 
> So in essence you say it does not matter that you flush the wrong range in 
> flush_pmd_tlb_range as long as it will be flushed later on when the pages
> really go away. Yes, then it really might be ok for arm64.

Indeed, although that's a property of the microarchitecture I'm using
rather than an architectural guarantee so the code should certainly be
fixed!

Will

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-24  8:39                   ` Martin Schwidefsky
  (?)
@ 2016-02-24 12:11                     ` Sebastian Ott
  -1 siblings, 0 replies; 153+ messages in thread
From: Sebastian Ott @ 2016-02-24 12:11 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: Kirill A. Shutemov, Gerald Schaefer, Christian Borntraeger,
	Kirill A. Shutemov, linux-mm, linux-kernel, Aneesh Kumar K.V,
	Andrew Morton, Linus Torvalds, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	Catalin Marinas, Will Deacon, linux-arm-kernel, Heiko Carstens,
	linux-s390

On Wed, 24 Feb 2016, Martin Schwidefsky wrote:
> On Tue, 23 Feb 2016 22:33:45 +0300
> "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> 
> > On Tue, Feb 23, 2016 at 07:19:07PM +0100, Gerald Schaefer wrote:
> > > I'll check with Martin, maybe it is actually trivial, then we can
> > > do a quick test it to rule that one out.
> > 
> > Oh. I found a bug in __split_huge_pmd_locked(). Although, not sure if it's
> > _the_ bug.
> > 
> > pmdp_invalidate() is called for the wrong address :-/
> > I guess that can be destructive on the architecture, right?
> > 
> > Could you check this?
> > 
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 1c317b85ea7d..4246bc70e55a 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -2865,7 +2865,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> >  	pgtable = pgtable_trans_huge_withdraw(mm, pmd);
> >  	pmd_populate(mm, &_pmd, pgtable);
> > 
> > -	for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
> > +	for (i = 0; i < HPAGE_PMD_NR; i++) {
> >  		pte_t entry, *pte;
> >  		/*
> >  		 * Note that NUMA hinting access restrictions are not
> > @@ -2886,9 +2886,9 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> >  		}
> >  		if (dirty)
> >  			SetPageDirty(page + i);
> > -		pte = pte_offset_map(&_pmd, haddr);
> > +		pte = pte_offset_map(&_pmd, haddr + i * PAGE_SIZE);
> >  		BUG_ON(!pte_none(*pte));
> > -		set_pte_at(mm, haddr, pte, entry);
> > +		set_pte_at(mm, haddr + i * PAGE_SIZE, pte, entry);
> >  		atomic_inc(&page[i]._mapcount);
> >  		pte_unmap(pte);
> >  	}
> > @@ -2938,7 +2938,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> >  	pmd_populate(mm, pmd, pgtable);
> > 
> >  	if (freeze) {
> > -		for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
> > +		for (i = 0; i < HPAGE_PMD_NR; i++) {
> >  			page_remove_rmap(page + i, false);
> >  			put_page(page + i);
> >  		}
> 
> Test is running and it looks good so far. For the final assessment I defer
> to Gerald and Sebastian.
> 

Yes, that one worked. My test system has been running make -j10 && make clean
in a loop for 4 hours now. Thanks!

Sebastian
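
The effect of the loop change in the patch quoted above can be sketched in
plain userspace C. This is only an illustration, not the kernel code: the
SKETCH_* constants and both function names are made up here, and the value
256 for the number of small pages per huge page is an assumption (1 MB huge
page with 4 KB pages). The old loop shape advances haddr itself, so the
address handed to a later pmdp_invalidate() call has already moved past the
huge page; the patched shape derives the per-page address and leaves haddr
alone.

```c
#include <assert.h>

/* Illustrative values only; the real ones are per-architecture. */
#define SKETCH_PAGE_SIZE	4096UL
#define SKETCH_HPAGE_PMD_NR	256UL	/* assumed: 1 MB huge page / 4 KB pages */
#define SKETCH_HPAGE_PMD_SIZE	(SKETCH_HPAGE_PMD_NR * SKETCH_PAGE_SIZE)

/* Old loop shape: haddr is incremented once per small page. */
unsigned long haddr_after_old_loop(unsigned long haddr)
{
	unsigned long i;

	/* haddr is expected to be huge-page aligned on entry. */
	assert(haddr % SKETCH_HPAGE_PMD_SIZE == 0);

	for (i = 0; i < SKETCH_HPAGE_PMD_NR; i++, haddr += SKETCH_PAGE_SIZE)
		; /* per-page work would use haddr here */

	return haddr;	/* now points at the start of the NEXT huge page */
}

/* Patched loop shape: the per-page address is computed from haddr and i. */
unsigned long haddr_after_new_loop(unsigned long haddr)
{
	unsigned long i, addr;

	assert(haddr % SKETCH_HPAGE_PMD_SIZE == 0);

	for (i = 0; i < SKETCH_HPAGE_PMD_NR; i++) {
		addr = haddr + i * SKETCH_PAGE_SIZE;
		(void)addr; /* per-page work would use addr here */
	}

	return haddr;	/* unchanged, still the start of this huge page */
}
```

Calling both helpers with the same aligned start address shows the
difference: the old shape returns the start address plus one huge page size,
the patched shape returns the start address itself.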

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-23 19:33                 ` Kirill A. Shutemov
  (?)
@ 2016-02-24 16:44                   ` Gerald Schaefer
  -1 siblings, 0 replies; 153+ messages in thread
From: Gerald Schaefer @ 2016-02-24 16:44 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Christian Borntraeger, Kirill A. Shutemov, linux-mm,
	linux-kernel, Aneesh Kumar K.V, Andrew Morton, Linus Torvalds,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	linuxppc-dev, Catalin Marinas, Will Deacon, linux-arm-kernel,
	Martin Schwidefsky, Heiko Carstens, linux-s390, Sebastian Ott

On Tue, 23 Feb 2016 22:33:45 +0300
"Kirill A. Shutemov" <kirill@shutemov.name> wrote:

> On Tue, Feb 23, 2016 at 07:19:07PM +0100, Gerald Schaefer wrote:
> > I'll check with Martin, maybe it is actually trivial, then we can
> > do a quick test to rule that one out.
> 
> Oh. I found a bug in __split_huge_pmd_locked(). Although, not sure if it's
> _the_ bug.
> 
> pmdp_invalidate() is called for the wrong address :-/
> I guess that can be destructive on the architecture, right?

Thanks, that's it! We can no longer reproduce the crashes and calling
pmdp_invalidate() with a wrong address also perfectly explains the
memory corruption that I found in several dumps: 0x020 was ORed into
pte entries, which didn't make sense, and caused the list corruption
for example. 0x020 is the invalid bit for pmd entries on s390 and
thus can be explained by this bug when a pte table lies before a pmd
table in memory.

> 
> Could you check this?
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 1c317b85ea7d..4246bc70e55a 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2865,7 +2865,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>  	pgtable = pgtable_trans_huge_withdraw(mm, pmd);
>  	pmd_populate(mm, &_pmd, pgtable);
> 
> -	for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
> +	for (i = 0; i < HPAGE_PMD_NR; i++) {
>  		pte_t entry, *pte;
>  		/*
>  		 * Note that NUMA hinting access restrictions are not
> @@ -2886,9 +2886,9 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>  		}
>  		if (dirty)
>  			SetPageDirty(page + i);
> -		pte = pte_offset_map(&_pmd, haddr);
> +		pte = pte_offset_map(&_pmd, haddr + i * PAGE_SIZE);
>  		BUG_ON(!pte_none(*pte));
> -		set_pte_at(mm, haddr, pte, entry);
> +		set_pte_at(mm, haddr + i * PAGE_SIZE, pte, entry);
>  		atomic_inc(&page[i]._mapcount);
>  		pte_unmap(pte);
>  	}
> @@ -2938,7 +2938,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>  	pmd_populate(mm, pmd, pgtable);
> 
>  	if (freeze) {
> -		for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
> +		for (i = 0; i < HPAGE_PMD_NR; i++) {
>  			page_remove_rmap(page + i, false);
>  			put_page(page + i);
>  		}

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-24 10:51                         ` Christian Borntraeger
  (?)
@ 2016-02-24 17:22                           ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 153+ messages in thread
From: Aneesh Kumar K.V @ 2016-02-24 17:22 UTC (permalink / raw)
  To: Christian Borntraeger, Will Deacon
  Cc: Kirill A. Shutemov, Gerald Schaefer, Kirill A. Shutemov,
	linux-mm, linux-kernel, Andrew Morton, Linus Torvalds,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	linuxppc-dev, Catalin Marinas, linux-arm-kernel,
	Martin Schwidefsky, Heiko Carstens, linux-s390, Sebastian Ott

Christian Borntraeger <borntraeger@de.ibm.com> writes:

> On 02/24/2016 11:41 AM, Will Deacon wrote:
>> On Wed, Feb 24, 2016 at 11:16:34AM +0100, Christian Borntraeger wrote:
>>> On 02/23/2016 09:22 PM, Will Deacon wrote:
>>>> On Tue, Feb 23, 2016 at 10:33:45PM +0300, Kirill A. Shutemov wrote:
>>>>> On Tue, Feb 23, 2016 at 07:19:07PM +0100, Gerald Schaefer wrote:
>>>>>> I'll check with Martin, maybe it is actually trivial, then we can
>>>>>> do a quick test to rule that one out.
>>>>>
>>>>> Oh. I found a bug in __split_huge_pmd_locked(). Although, not sure if it's
>>>>> _the_ bug.
>>>>>
>>>>> pmdp_invalidate() is called for the wrong address :-/
>>>>> I guess that can be destructive on the architecture, right?
>>>>
>>>> FWIW, arm64 ignores the address parameter for set_pmd_at, so this would
>>>> only result in the TLBI nuking the wrong entries, which is going to be
>>>> tricky to observe in practice given that we install a table entry
>>>> immediately afterwards that maps the same pages. If s390 does more here
>>>> (I see some magic asm using the address), that could be the answer...
>>>
>>> This patch does not change the address for set_pmd_at, it does that for the 
>>> pmdp_invalidate here (by keeping haddr at the start of the pmd)
>>>
>>> --->    pmdp_invalidate(vma, haddr, pmd);
>>>         pmd_populate(mm, pmd, pgtable);
>> 
>> On arm64, pmdp_invalidate looks like:
>> 
>> void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
>> 		     pmd_t *pmdp)
>> {
>> 	pmd_t entry = *pmdp;
>> 	set_pmd_at(vma->vm_mm, address, pmdp, pmd_mknotpresent(entry));
>> 	flush_pmd_tlb_range(vma, address, address + hpage_pmd_size);
>> }
>> 
>> so that's the set_pmd_at call I was referring to.
>> 
>> On s390, that address ends up in __pmdp_idte[_local], but I don't know
>> what .insn rrf,0xb98e0000,%2,%3,0,{0,1} does ;)
>
> It does invalidation of the pmd entry and tlb clearing for this entry.
>
>> 
>>> Without that fix we would clearly have stale tlb entries, no?
>> 
>> Yes, but AFAIU the sequence on arm64 is:
>> 
>> 1.  trans huge mapping (block mapping in arm64 speak)
>> 2.  faulting entry (pmd_mknotpresent)
>> 3.  tlb invalidation
>> 4.  table entry mapping the same pages as (1).
>> 
>> so if the microarchitecture we're on can tolerate a mixture of block
>> mappings and page mappings mapping the same VA to the same PA, then the
>> lack of TLB maintenance would go unnoticed. There are certainly systems
>> where that could cause an issue, but I believe the one I've been testing
>> on would be ok.
>
> So in essence you say it does not matter that you flush the wrong range in 
> flush_pmd_tlb_range as long as it will be flushed later on when the pages
> really go away. Yes, then it really might be ok for arm64.

This is more or less the same for ppc64 too. On ppc64 the actual flush
happens in pmdp_huge_split_prepare(), and pmdp_invalidate() is mostly a
no-op w.r.t. THP split in our case.

-aneesh
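
The "flush the wrong range" point discussed above can be illustrated with a
small userspace C sketch. Everything here is hypothetical: the struct, the
helper names, and the 2 MB huge page size are assumptions made for the
example, not kernel definitions. The sketch only models the address
arithmetic of a flush over [address, address + huge page size) and checks
whether that range overlaps the huge mapping being split.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_HPAGE_PMD_SIZE	0x200000UL	/* assumed: 2 MB huge page */

/* Half-open virtual address range [start, end). */
struct sketch_range {
	unsigned long start;
	unsigned long end;
};

/* Range that a flush of one huge page starting at 'address' would cover. */
struct sketch_range sketch_flush_range(unsigned long address)
{
	/* The callers in this sketch always pass huge-page aligned addresses. */
	assert(address % SKETCH_HPAGE_PMD_SIZE == 0);

	return (struct sketch_range){ address, address + SKETCH_HPAGE_PMD_SIZE };
}

/* True if the two half-open ranges share at least one address. */
bool sketch_ranges_overlap(struct sketch_range a, struct sketch_range b)
{
	return a.start < b.end && b.start < a.end;
}
```

With the huge mapping at some aligned haddr, a flush computed from haddr
overlaps the mapping, while a flush computed from haddr plus one huge page
size (the drifted address) does not touch it at all. Whether that stale
range is ever noticed then depends on the architecture and
microarchitecture, as described in the quoted discussion.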

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-23 18:47                 ` Will Deacon
  (?)
  (?)
@ 2016-02-25 15:49                   ` Steve Capper
  -1 siblings, 0 replies; 153+ messages in thread
From: Steve Capper @ 2016-02-25 15:49 UTC (permalink / raw)
  To: Will Deacon
  Cc: Gerald Schaefer, Kirill A. Shutemov, Christian Borntraeger,
	Kirill A. Shutemov, linux-mm, linux-kernel, Aneesh Kumar K.V,
	Andrew Morton, Linus Torvalds, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	Catalin Marinas, linux-arm-kernel, Martin Schwidefsky,
	Heiko Carstens, linux-s390, Sebastian Ott, Steve Capper

On 23 February 2016 at 18:47, Will Deacon <will.deacon@arm.com> wrote:
> [adding Steve, since he worked on THP for 32-bit ARM]

Apologies for my late reply...

>
> On Tue, Feb 23, 2016 at 07:19:07PM +0100, Gerald Schaefer wrote:
>> On Tue, 23 Feb 2016 13:32:21 +0300
>> "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
>> > The theory is that the splitting bit effectively masked bogus pmd_present():
>> > we had pmd_trans_splitting() in all code paths and that prevented mm from
>> > touching the pmd. Once pmd_trans_splitting() has gone, mm proceeds with the
>> > pmd where it shouldn't and here's a boom.
>>
>> Well, I don't think pmd_present() == true is bogus for a trans_huge pmd under
>> splitting, after all there is a page behind the pmd. Also, if it was
>> bogus, and it would need to be false, why should it be marked !pmd_present()
>> only at the pmdp_invalidate() step before the pmd_populate()? It clearly
>> is pmd_present() before that, on all architectures, and if there was any
>> problem/race with that, setting it to !pmd_present() at this stage would
>> only (marginally) reduce the race window.
>>
>> BTW, PowerPC and Sparc seem to do the same thing in pmdp_invalidate(),
>> i.e. they do not set pmd_present() == false, only mark it so that it would
>> not generate a new TLB entry, just like on s390. After all, the function
>> is called pmdp_invalidate(), and I think the comment in mm/huge_memory.c
>> before that call is just a little ambiguous in its wording. When it says
>> "mark the pmd notpresent" it probably means "mark it so that it will not
>> generate a new TLB entry", which is also what the comment is really about:
>> prevent huge and small entries in the TLB for the same page at the same
>> time.
>>
>> FWIW, and since the ARM arch-list is already on cc, I think there is
>> an issue with pmdp_invalidate() on ARM, since it also seems to clear
>> the trans_huge (and formerly trans_splitting) bit, which actually makes
>> the pmd !pmd_present(), but it violates the other requirement from the
>> comment:
>> "the pmd_trans_huge and pmd_trans_splitting must remain set at all times
>> on the pmd until the split is complete for this pmd"
>
> I've only been testing this for arm64 (where I'm yet to see a problem),
> but we use the generic pmdp_invalidate implementation from
> mm/pgtable-generic.c there. On arm64, pmd_trans_huge will return true
> after pmd_mknotpresent. On arm, it does look to be buggy, since it nukes
> the entire entry... Steve?

pmd_mknotpresent on arm looks inconsistent with the other
architectures and can be changed.

Having had a look at the usage, I can't see it causing an immediate
problem (that needs to be addressed by an emergency patch).
We don't have a notion of splitting pmds (so there is no splitting
information to lose), and the only usage I could see of
pmd_mknotpresent was:

pmdp_invalidate(vma, haddr, pmd);
pmd_populate(mm, pmd, pgtable);

In mm/huge_memory.c, around line 3588.

So we invalidate the entry (which puts down a faulting entry from
pmd_mknotpresent and invalidates tlb), then immediately put down a
table entry with pmd_populate.

I have run a 32-bit ARM test kernel and exacerbated THP splits (that's
what took me time), and I didn't notice any problems with 4.5-rc5.

Cheers,
-- 
Steve
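
The difference between the two pmd_mknotpresent behaviours discussed in this
mail can be sketched with a toy model in userspace C. The bit layout below
is entirely hypothetical and does not match the real arm or arm64 page table
encodings; the sketch_* names are invented for the example. One variant
clears only the valid bit, so a "huge" marker survives until the table entry
is installed; the other clears the whole entry, so the marker is lost for
the short window before pmd_populate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t sketch_pmd_t;

/* Hypothetical bits, NOT the real arm/arm64 encodings. */
#define SKETCH_PMD_VALID	(1ULL << 0)
#define SKETCH_PMD_HUGE		(1ULL << 1)
#define SKETCH_PMD_FLAGS	(SKETCH_PMD_VALID | SKETCH_PMD_HUGE)

/* Build a valid huge entry for a physical address with clear flag bits. */
sketch_pmd_t sketch_pmd_mkhuge(uint64_t pa)
{
	assert((pa & SKETCH_PMD_FLAGS) == 0);
	return pa | SKETCH_PMD_FLAGS;
}

bool sketch_pmd_valid(sketch_pmd_t pmd)
{
	return (pmd & SKETCH_PMD_VALID) != 0;
}

bool sketch_pmd_trans_huge(sketch_pmd_t pmd)
{
	return (pmd & SKETCH_PMD_HUGE) != 0;
}

/* Variant A: make the entry fault but keep every other bit. */
sketch_pmd_t sketch_mknotpresent_keep(sketch_pmd_t pmd)
{
	return pmd & ~SKETCH_PMD_VALID;
}

/* Variant B: wipe the whole entry. */
sketch_pmd_t sketch_mknotpresent_nuke(sketch_pmd_t pmd)
{
	(void)pmd;
	return 0;
}
```

In this model, variant A leaves sketch_pmd_trans_huge() true on a faulting
entry, while variant B makes it false. In both cases the entry no longer
counts as valid, which is the property the invalidate-then-populate sequence
above relies on.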

>
> Will

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-25 15:49                   ` Steve Capper
  (?)
  (?)
@ 2016-02-25 16:01                     ` Kirill A. Shutemov
  -1 siblings, 0 replies; 153+ messages in thread
From: Kirill A. Shutemov @ 2016-02-25 16:01 UTC (permalink / raw)
  To: Steve Capper
  Cc: Will Deacon, Gerald Schaefer, Christian Borntraeger,
	Kirill A. Shutemov, linux-mm, linux-kernel, Aneesh Kumar K.V,
	Andrew Morton, Linus Torvalds, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	Catalin Marinas, linux-arm-kernel, Martin Schwidefsky,
	Heiko Carstens, linux-s390, Sebastian Ott, Steve Capper

On Thu, Feb 25, 2016 at 03:49:33PM +0000, Steve Capper wrote:
> On 23 February 2016 at 18:47, Will Deacon <will.deacon@arm.com> wrote:
> > [adding Steve, since he worked on THP for 32-bit ARM]
> 
> Apologies for my late reply...
> 
> >
> > On Tue, Feb 23, 2016 at 07:19:07PM +0100, Gerald Schaefer wrote:
> >> On Tue, 23 Feb 2016 13:32:21 +0300
> >> "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> >> > The theory is that the splitting bit effectively masked bogus pmd_present():
> >> > we had pmd_trans_splitting() in all code paths and that prevented mm from
> >> > touching the pmd. Once pmd_trans_splitting() has gone, mm proceeds with the
> >> > pmd where it shouldn't and here's a boom.
> >>
> >> Well, I don't think pmd_present() == true is bogus for a trans_huge pmd under
> >> splitting, after all there is a page behind the pmd. Also, if it was
> >> bogus, and it would need to be false, why should it be marked !pmd_present()
> >> only at the pmdp_invalidate() step before the pmd_populate()? It clearly
> >> is pmd_present() before that, on all architectures, and if there was any
> >> problem/race with that, setting it to !pmd_present() at this stage would
> >> only (marginally) reduce the race window.
> >>
> >> BTW, PowerPC and Sparc seem to do the same thing in pmdp_invalidate(),
> >> i.e. they do not set pmd_present() == false, only mark it so that it would
> >> not generate a new TLB entry, just like on s390. After all, the function
> >> is called pmdp_invalidate(), and I think the comment in mm/huge_memory.c
> >> before that call is just a little ambiguous in its wording. When it says
> >> "mark the pmd notpresent" it probably means "mark it so that it will not
> >> generate a new TLB entry", which is also what the comment is really about:
> >> prevent huge and small entries in the TLB for the same page at the same
> >> time.
> >>
> >> FWIW, and since the ARM arch-list is already on cc, I think there is
> >> an issue with pmdp_invalidate() on ARM, since it also seems to clear
> >> the trans_huge (and formerly trans_splitting) bit, which actually makes
> >> the pmd !pmd_present(), but it violates the other requirement from the
> >> comment:
> >> "the pmd_trans_huge and pmd_trans_splitting must remain set at all times
> >> on the pmd until the split is complete for this pmd"
> >
> > I've only been testing this for arm64 (where I'm yet to see a problem),
> > but we use the generic pmdp_invalidate implementation from
> > mm/pgtable-generic.c there. On arm64, pmd_trans_huge will return true
> > after pmd_mknotpresent. On arm, it does look to be buggy, since it nukes
> > the entire entry... Steve?
> 
> pmd_mknotpresent on arm looks inconsistent with the other
> architectures and can be changed.
> 
> Having had a look at the usage, I can't see it causing an immediate
> problem (that needs to be addressed by an emergency patch).
> We don't have a notion of splitting pmds (so there is no splitting
> information to lose), and the only usage I could see of
> pmd_mknotpresent was:
> 
> pmdp_invalidate(vma, haddr, pmd);
> pmd_populate(mm, pmd, pgtable);
> 
> In mm/huge_memory.c, around line 3588.
> 
> So we invalidate the entry (which puts down a faulting entry from
> pmd_mknotpresent and invalidates tlb), then immediately put down a
> table entry with pmd_populate.
> 
> I have run a 32-bit ARM test kernel and exacerbated THP splits (that's
> what took me time), and I didn't notice any problems with 4.5-rc5.

If I read the code correctly, your pmd_mknotpresent() makes the pmd
pmd_none(), right? If yes, it's a problem.

It introduces the race I've described here:

https://marc.info/?l=linux-mm&m=144723658100512&w=4

Basically, if zap_pmd_range() sees pmd_none() between
pmd_mknotpresent() and pmd_populate(), we're screwed.

The race window is small, but it's there.
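
To make the window concrete, here is a small userspace sketch (not kernel
code; the toy_* names and bit layout are hypothetical). It compares two
flavours of "mark not present": one that wipes the whole entry, as the
32-bit arm version was described above, and one that only drops a valid
bit and keeps the huge marker, which is roughly how the arm64 behaviour
was described. It only models what a concurrent reader would observe
between the two steps, not what zap_pmd_range() then does with it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy userspace model, NOT kernel code. Names and bits are hypothetical. */
typedef uint64_t toy_pmd_t;

#define TOY_PMD_VALID (1ULL << 0) /* hardware may use the entry */
#define TOY_PMD_HUGE  (1ULL << 1) /* entry maps a huge page */

static bool toy_pmd_none(toy_pmd_t pmd)
{
	return pmd == 0;
}

static bool toy_pmd_trans_huge(toy_pmd_t pmd)
{
	return (pmd & TOY_PMD_HUGE) != 0;
}

/* Variant 1: wipe the entire entry. */
static toy_pmd_t toy_mknotpresent_wipe(toy_pmd_t pmd)
{
	(void)pmd;
	return 0;
}

/* Variant 2: clear only the valid bit, keep the huge marker. */
static toy_pmd_t toy_mknotpresent_keep_huge(toy_pmd_t pmd)
{
	return pmd & ~TOY_PMD_VALID;
}

/* Does the entry look empty to a reader after the invalidate step,
 * i.e. before any table entry has been populated? */
static bool toy_window_looks_empty(toy_pmd_t (*mknotpresent)(toy_pmd_t))
{
	assert(mknotpresent);
	return toy_pmd_none(mknotpresent(TOY_PMD_VALID | TOY_PMD_HUGE));
}

/* Does the entry still look like a huge pmd in that same window? */
static bool toy_window_still_huge(toy_pmd_t (*mknotpresent)(toy_pmd_t))
{
	assert(mknotpresent);
	return toy_pmd_trans_huge(mknotpresent(TOY_PMD_VALID | TOY_PMD_HUGE));
}
```

With the wiping variant the window looks like pmd_none(); with the
keep-huge variant the reader still sees a huge pmd and never an empty one.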

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
  2016-02-25 16:01                     ` Kirill A. Shutemov
  (?)
  (?)
@ 2016-02-25 16:08                       ` Steve Capper
  -1 siblings, 0 replies; 153+ messages in thread
From: Steve Capper @ 2016-02-25 16:08 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Will Deacon, Gerald Schaefer, Christian Borntraeger,
	Kirill A. Shutemov, linux-mm, linux-kernel, Aneesh Kumar K.V,
	Andrew Morton, Linus Torvalds, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	Catalin Marinas, linux-arm-kernel, Martin Schwidefsky,
	Heiko Carstens, linux-s390, Sebastian Ott, Steve Capper

On 25 February 2016 at 16:01, Kirill A. Shutemov <kirill@shutemov.name> wrote:
> On Thu, Feb 25, 2016 at 03:49:33PM +0000, Steve Capper wrote:
>> On 23 February 2016 at 18:47, Will Deacon <will.deacon@arm.com> wrote:
>> > [adding Steve, since he worked on THP for 32-bit ARM]
>>
>> Apologies for my late reply...
>>
>> >
>> > On Tue, Feb 23, 2016 at 07:19:07PM +0100, Gerald Schaefer wrote:
>> >> On Tue, 23 Feb 2016 13:32:21 +0300
>> >> "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
>> >> > The theory is that the splitting bit effectively masked bogus pmd_present():
>> >> > we had pmd_trans_splitting() in all code paths and that prevented mm from
>> >> > touching the pmd. Once pmd_trans_splitting() has gone, mm proceeds with the
>> >> > pmd where it shouldn't and here's a boom.
>> >>
>> >> Well, I don't think pmd_present() == true is bogus for a trans_huge pmd under
>> >> splitting, after all there is a page behind the pmd. Also, if it was
>> >> bogus, and it would need to be false, why should it be marked !pmd_present()
>> >> only at the pmdp_invalidate() step before the pmd_populate()? It clearly
>> >> is pmd_present() before that, on all architectures, and if there was any
>> >> problem/race with that, setting it to !pmd_present() at this stage would
>> >> only (marginally) reduce the race window.
>> >>
>> >> BTW, PowerPC and Sparc seem to do the same thing in pmdp_invalidate(),
>> >> i.e. they do not set pmd_present() == false, only mark it so that it would
>> >> not generate a new TLB entry, just like on s390. After all, the function
>> >> is called pmdp_invalidate(), and I think the comment in mm/huge_memory.c
>> >> before that call is just a little ambiguous in its wording. When it says
>> >> "mark the pmd notpresent" it probably means "mark it so that it will not
>> >> generate a new TLB entry", which is also what the comment is really about:
>> >> prevent huge and small entries in the TLB for the same page at the same
>> >> time.
>> >>
>> >> FWIW, and since the ARM arch-list is already on cc, I think there is
>> >> an issue with pmdp_invalidate() on ARM, since it also seems to clear
>> >> the trans_huge (and formerly trans_splitting) bit, which actually makes
>> >> the pmd !pmd_present(), but it violates the other requirement from the
>> >> comment:
>> >> "the pmd_trans_huge and pmd_trans_splitting must remain set at all times
>> >> on the pmd until the split is complete for this pmd"
>> >
>> > I've only been testing this for arm64 (where I'm yet to see a problem),
>> > but we use the generic pmdp_invalidate implementation from
>> > mm/pgtable-generic.c there. On arm64, pmd_trans_huge will return true
>> > after pmd_mknotpresent. On arm, it does look to be buggy, since it nukes
>> > the entire entry... Steve?
>>
>> pmd_mknotpresent on arm looks inconsistent with the other
>> architectures and can be changed.
>>
>> Having had a look at the usage, I can't see it causing an immediate
>> problem (that needs to be addressed by an emergency patch).
>> We don't have a notion of splitting pmds (so there is no splitting
>> information to lose), and the only usage I could see of
>> pmd_mknotpresent was:
>>
>> pmdp_invalidate(vma, haddr, pmd);
>> pmd_populate(mm, pmd, pgtable);
>>
>> In mm/huge_memory.c, around line 3588.
>>
>> So we invalidate the entry (which puts down a faulting entry from
>> pmd_mknotpresent and invalidates tlb), then immediately put down a
>> table entry with pmd_populate.
>>
>> I have run a 32-bit ARM test kernel and exacerbated THP splits (that's
>> what took me time), and I didn't notice any problems with 4.5-rc5.
>
> If I read the code correctly, your pmd_mknotpresent() makes the pmd
> pmd_none(), right? If yes, it's a problem.
>
> It introduces the race I've described here:
>
> https://marc.info/?l=linux-mm&m=144723658100512&w=4
>
> Basically, if zap_pmd_range() would see pmd_none() between
> pmd_mknotpresent() and pmd_populate(), we're screwed.
>
> The race window is small, but it's there.

Ahhhh, okay, thank you Kirill.
I agree, I'll get a patch out.

Cheers,
--
Steve

^ permalink raw reply	[flat|nested] 153+ messages in thread

end of thread, other threads:[~2016-02-25 16:08 UTC | newest]

Thread overview: 153+ messages
2016-02-11 18:22 [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM) Gerald Schaefer
2016-02-11 19:09 ` Kirill A. Shutemov
2016-02-11 19:12   ` Kirill A. Shutemov
2016-02-12 12:21     ` Sebastian Ott
2016-02-11 19:57   ` Gerald Schaefer
2016-02-12  4:04     ` Aneesh Kumar K.V
2016-02-12 11:59       ` Gerald Schaefer
2016-02-12 16:17         ` Aneesh Kumar K.V
2016-02-12 10:01     ` Will Deacon
2016-02-12 10:12       ` Sebastian Ott
2016-02-12 15:52         ` Will Deacon
2016-02-12 15:41     ` Kirill A. Shutemov
2016-02-12 15:57       ` Christian Borntraeger
2016-02-12 17:16         ` Gerald Schaefer
2016-02-12 23:15           ` Kirill A. Shutemov
2016-02-13 11:58             ` Sebastian Ott
2016-02-15 11:31               ` Kirill A. Shutemov
2016-02-15 16:38                 ` Sebastian Ott
2016-02-15 18:37                 ` Gerald Schaefer
2016-02-15 21:35                   ` Kirill A. Shutemov
2016-02-16  9:54                     ` Sebastian Ott
2016-02-16 16:24                     ` Gerald Schaefer
2016-02-17 15:04                       ` Kirill A. Shutemov
2016-02-17 19:04                         ` Sebastian Ott
2016-02-16 18:46                     ` Christian Borntraeger
2016-02-17 19:13               ` Gerald Schaefer
2016-02-17 23:58                 ` Kirill A. Shutemov
2016-02-18 15:00                   ` Gerald Schaefer
2016-02-18 17:06                     ` Kirill A. Shutemov
2016-02-19 14:15                       ` Sebastian Ott
2016-02-15 16:41             ` Gerald Schaefer
2016-02-23 10:32           ` Kirill A. Shutemov
2016-02-23 17:46             ` Linus Torvalds
2016-02-23 18:19             ` Gerald Schaefer
2016-02-23 18:47               ` Will Deacon
2016-02-25 15:49                 ` Steve Capper
2016-02-25 16:01                   ` Kirill A. Shutemov
2016-02-25 16:08                     ` Steve Capper
2016-02-23 19:33               ` Kirill A. Shutemov
2016-02-23 20:22                 ` Will Deacon
2016-02-24 10:16                   ` Christian Borntraeger
2016-02-24 10:41                     ` Will Deacon
2016-02-24 10:51                       ` Christian Borntraeger
2016-02-24 11:02                         ` Will Deacon
2016-02-24 17:22                         ` Aneesh Kumar K.V
2016-02-24  8:39                 ` Martin Schwidefsky
2016-02-24 12:11                   ` Sebastian Ott
2016-02-24 16:44                 ` Gerald Schaefer
2016-02-24  8:22               ` Martin Schwidefsky
