Reproducible git-fsck/SHA1 failures since 3.7.x on a Dell E6430 / i5-3340M

* Reproducible git-fsck/SHA1 failures since 3.7.x on a Dell E6430 / i5-3340M
@ 2013-08-09 14:58 Ben Tebulin
  2013-08-12  8:04 ` Reproducible data corruption since 3.7.x on i5-3340M machines Ben Tebulin
  2013-08-14 16:36 ` [Bug] Reproducible data corruption on i5-3340M: Please revert 53a59fc67! Ben Tebulin
  0 siblings, 2 replies; 45+ messages in thread
From: Ben Tebulin @ 2013-08-09 14:58 UTC (permalink / raw)
  To: linux-kernel

Hello Kernel list!

On two machines, for one very specific repository, the SHA1 implementation of
git-fsck and git-show fails in 9 out of 10 runs for a specific 39MB blob.
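
A sketch for readers who want to re-check a blob outside of git: git names a
blob by the SHA1 of a short header ("blob", the size in bytes, a NUL byte)
followed by the content. The Python below recomputes that ID and hashes the
same bytes several times; on a healthy machine every run should give the same
digest. The function names are made up for this illustration, and the sketch
does not try to reproduce the failure itself.

```python
import hashlib


def git_blob_sha1(data: bytes) -> str:
    """Return the object ID git assigns to a blob with this content."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def distinct_digests(data: bytes, runs: int = 10) -> set:
    """Hash the same bytes several times and collect the distinct results."""
    return {git_blob_sha1(data) for _ in range(runs)}


if __name__ == "__main__":
    # The empty blob has the same well-known ID in every git repository.
    print(git_blob_sha1(b""))
    # More than one entry here would mean the same bytes hashed differently.
    print(len(distinct_digests(b"example content\n")))
```

The value returned for a file's contents can be compared with what
`git hash-object <file>` prints for the same file.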

This only occurs on vanilla Linux kernels 3.7.10, 3.8.0 (Ubuntu),
3.9.11, 3.10.5 _but not on_ 3.6.11 and 3.5.7

For details please refer to the thread starting at:
 http://article.gmane.org/gmane.comp.version-control.git/231872

Never had any other hardware/stability issues _at all_ with these
machines. Only one repo out of 112 is affected. It's a git-svn clone and
even recreated copies out of svn do trigger the same failure.

The Git mailing list ran out of ideas, and to me this looks like some very
rare kernel issue. I'm not a kernel hacker. And to add to the challenge:
I can't share the repo easily, either.

Any hints on how to tackle this issue?

Thanks!
- Ben

I'm not subscribed. Can you please CC me on replies?

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Reproducible data corruption since 3.7.x on i5-3340M machines
  2013-08-09 14:58 Reproducible git-fsck/SHA1 failures since 3.7.x on a Dell E6430 / i5-3340M Ben Tebulin
@ 2013-08-12  8:04 ` Ben Tebulin
  2013-08-14 16:36 ` [Bug] Reproducible data corruption on i5-3340M: Please revert 53a59fc67! Ben Tebulin
  1 sibling, 0 replies; 45+ messages in thread
From: Ben Tebulin @ 2013-08-12  8:04 UTC (permalink / raw)
  To: Ben Tebulin; +Cc: linux-kernel

Shameless self-bump: Since kernel 3.7.x I can reproduce sporadic data
corruption using git on 2 different machines (same model, though).

Can anybody give me a hint as to who could help track down/fix this regression?

For more information please refer to my first post on Friday.

Sorry to bother you again, but I have no clue where else to ask.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Bug] Reproducible data corruption on i5-3340M: Please revert 53a59fc67!
  2013-08-09 14:58 Reproducible git-fsck/SHA1 failures since 3.7.x on a Dell E6430 / i5-3340M Ben Tebulin
  2013-08-12  8:04 ` Reproducible data corruption since 3.7.x on i5-3340M machines Ben Tebulin
@ 2013-08-14 16:36 ` Ben Tebulin
  2013-08-14 17:40     ` Michal Hocko
  1 sibling, 1 reply; 45+ messages in thread
From: Ben Tebulin @ 2013-08-14 16:36 UTC (permalink / raw)
  To: mhocko, mgorman, hannes, bsingharora, kamezawa.hiroyu; +Cc: linux-mm

Hello Michal, Johannes, Balbir, Kamezawa and Mailing lists!

Since v3.7.2, on two independent machines, a very specific Git repository
fails in 9/10 cases on git-fsck due to SHA1/memory failures. This
only occurs on a very specific repository and can be reproduced reliably
on two independent laptops. The Git mailing list ran out of ideas, and to me
this looks like some very exotic kernel issue.

After a _very long session of rebooting and bisecting_ the Linux kernel
(fortunately I had a SSD and ccache!) I was able to pinpoint the cause
to the following patch:

*"mm: limit mmu_gather batching to fix soft lockups on !CONFIG_PREEMPT"*
  787f7301074ccd07a3e82236ca41eefd245f4e07 linux stable    [1]
  53a59fc67f97374758e63a9c785891ec62324c81 upstream commit [2]
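
The bisection described above has one wrinkle: the failure shows up in roughly
9 of 10 runs, so a single passing run does not prove a kernel is good. A
minimal sketch of the idea in Python follows, assuming a hypothetical
`fails_once(version)` callback that tests one kernel once. It is not how
`git bisect` is implemented, only the binary search behind it, with each
candidate tested several times.

```python
def looks_bad(version, fails_once, attempts=5):
    """Call a version bad if any attempt fails; good only if all attempts pass."""
    return any(fails_once(version) for _ in range(attempts))


def first_bad(versions, fails_once, attempts=5):
    """Binary search for the first bad entry.

    Assumes versions[0] is known good and versions[-1] is known bad.
    """
    good, bad = 0, len(versions) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if looks_bad(versions[mid], fails_once, attempts):
            bad = mid
        else:
            good = mid
    return versions[bad]


if __name__ == "__main__":
    # Kernel versions from the report; the "test" here is a stand-in that
    # simply marks everything from 3.7.10 onward as failing.
    kernels = ["3.5.7", "3.6.11", "3.7.10", "3.8.0", "3.9.11", "3.10.5"]
    failing = {"3.7.10", "3.8.0", "3.9.11", "3.10.5"}
    print(first_bad(kernels, lambda v: v in failing))
```

If runs are independent and a bad kernel fails 90% of the time, it passes five
runs in a row with probability 0.1^5, about 1 in 100,000, which is why
repeating the test per step keeps the search from going astray.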

More details are available in my previous discussion on the Git mailing list:

   http://thread.gmane.org/gmane.comp.version-control.git/231872

Never had any hardware/stability issues _at all_ with these machines. 
Only one repo out of 112 is affected. It's a git-svn clone and even 
recreated copies out of svn do trigger the same failure.

I was able to bisect this error to this very specific commit. 
Furthermore: Reverting this commit in 3.9.11 still solves the error. 

I assume this is a regression in the Linux kernel (not Git) and would
kindly ask you to revert the aforementioned commits.

Thanks!
- Ben

I'm not subscribed - please CC me.

[1] https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=787f7301074ccd07a3e82236ca41eefd245f4e07
[2] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=53a59fc67f97374758e63a9c785891ec62324c81

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Bug] Reproducible data corruption on i5-3340M: Please revert 53a59fc67!
  2013-08-14 16:36 ` [Bug] Reproducible data corruption on i5-3340M: Please revert 53a59fc67! Ben Tebulin
@ 2013-08-14 17:40     ` Michal Hocko
  0 siblings, 0 replies; 45+ messages in thread
From: Michal Hocko @ 2013-08-14 17:40 UTC (permalink / raw)
  To: Ben Tebulin
  Cc: mgorman, hannes, bsingharora, kamezawa.hiroyu, linux-mm,
	Rik van Riel, Andrew Morton, Linus Torvalds, LKML

[Let's CC some more people]

On Wed 14-08-13 18:36:53, Ben Tebulin wrote:
> Hello Michal, Johannes, Balbir, Kamezawa and Mailing lists!

Hi,

> Since v3.7.2, on two independent machines, a very specific Git repository
> fails in 9/10 cases on git-fsck due to SHA1/memory failures. This
> only occurs on a very specific repository and can be reproduced reliably
> on two independent laptops. The Git mailing list ran out of ideas, and to me
> this looks like some very exotic kernel issue.
> 
> After a _very long session of rebooting and bisecting_ the Linux kernel
> (fortunately I had a SSD and ccache!) I was able to pinpoint the cause
> to the following patch:
> 
> *"mm: limit mmu_gather batching to fix soft lockups on !CONFIG_PREEMPT"*
>   787f7301074ccd07a3e82236ca41eefd245f4e07 linux stable    [1]
>   53a59fc67f97374758e63a9c785891ec62324c81 upstream commit [2]

Thanks for bisecting this!

I will look into this but I find it really strange. The patch only
limits the number of batched pages to be freed. This might happen even
without the patch, albeit less likely, when a new batch cannot be
allocated.
That being said, I do not see anything obviously wrong with the patch
itself. Maybe we are not flushing those pages properly in some corner
case which doesn't trigger normally. I will have to look at it but I
really think this just exhibits a subtle bug in batch pages freeing.

I have no objection to reverting the patch for now until we find out what
is really going on.

> More details are available in my previous discussion on the Git mailing list:
> 
>    http://thread.gmane.org/gmane.comp.version-control.git/231872
> 
> Never had any hardware/stability issues _at all_ with these machines. 
> Only one repo out of 112 is affected. It's a git-svn clone and even 
> recreated copies out of svn do trigger the same failure.
> 
> I was able to bisect this error to this very specific commit. 
> Furthermore: Reverting this commit in 3.9.11 still solves the error. 
> 
> I assume this is a regression in the Linux kernel (not Git) and would
> kindly ask you to revert the aforementioned commits.
> 
> Thanks!
> - Ben
> 
> 
> I'm not subscribed - please CC me.
> 
> [1] https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=787f7301074ccd07a3e82236ca41eefd245f4e07
> [2] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=53a59fc67f97374758e63a9c785891ec62324c81
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Bug] Reproducible data corruption on i5-3340M: Please revert 53a59fc67!
  2013-08-14 17:40     ` Michal Hocko
@ 2013-08-14 17:58       ` Michal Hocko
  -1 siblings, 0 replies; 45+ messages in thread
From: Michal Hocko @ 2013-08-14 17:58 UTC (permalink / raw)
  To: Ben Tebulin
  Cc: mgorman, hannes, bsingharora, kamezawa.hiroyu, linux-mm,
	Rik van Riel, Andrew Morton, Linus Torvalds, LKML,
	Peter Zijlstra

[Forgot to add Peter]

On Wed 14-08-13 19:40:39, Michal Hocko wrote:
> [Let's CC some more people]
> 
> On Wed 14-08-13 18:36:53, Ben Tebulin wrote:
> > Hello Michal, Johannes, Balbir, Kamezawa and Mailing lists!
> 
> Hi,
> 
> > Since v3.7.2, on two independent machines, a very specific Git repository
> > fails in 9/10 cases on git-fsck due to SHA1/memory failures. This
> > only occurs on a very specific repository and can be reproduced reliably
> > on two independent laptops. The Git mailing list ran out of ideas, and to me
> > this looks like some very exotic kernel issue.
> > 
> > After a _very long session of rebooting and bisecting_ the Linux kernel
> > (fortunately I had a SSD and ccache!) I was able to pinpoint the cause
> > to the following patch:
> > 
> > *"mm: limit mmu_gather batching to fix soft lockups on !CONFIG_PREEMPT"*
> >   787f7301074ccd07a3e82236ca41eefd245f4e07 linux stable    [1]
> >   53a59fc67f97374758e63a9c785891ec62324c81 upstream commit [2]
> 
> Thanks for bisecting this!
> 
> I will look into this but I find it really strange. The patch only
> limits the number of batched pages to be freed. This might happen even
> without the patch, albeit less likely, when a new batch cannot be
> allocated.
> That being said, I do not see anything obviously wrong with the patch
> itself. Maybe we are not flushing those pages properly in some corner
> case which doesn't trigger normally. I will have to look at it but I
> really think this just exhibits a subtle bug in batch pages freeing.
> 
> I have no objection to reverting the patch for now until we find out what
> is really going on.
> 
> > More details are available in my previous discussion on the Git mailing list:
> > 
> >    http://thread.gmane.org/gmane.comp.version-control.git/231872
> > 
> > Never had any hardware/stability issues _at all_ with these machines. 
> > Only one repo out of 112 is affected. It's a git-svn clone and even 
> > recreated copies out of svn do trigger the same failure.
> > 
> > I was able to bisect this error to this very specific commit. 
> > Furthermore: Reverting this commit in 3.9.11 still solves the error. 
> > 
> > I assume this is a regression in the Linux kernel (not Git) and would
> > kindly ask you to revert the aforementioned commits.
> > 
> > Thanks!
> > - Ben
> > 
> > 
> > I'm not subscribed - please CC me.
> > 
> > [1] https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=787f7301074ccd07a3e82236ca41eefd245f4e07
> > [2] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=53a59fc67f97374758e63a9c785891ec62324c81
> > 
> 
> -- 
> Michal Hocko
> SUSE Labs
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Bug] Reproducible data corruption on i5-3340M: Please revert 53a59fc67!
  2013-08-14 17:40     ` Michal Hocko
@ 2013-08-14 18:03       ` Linus Torvalds
  -1 siblings, 0 replies; 45+ messages in thread
From: Linus Torvalds @ 2013-08-14 18:03 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Ben Tebulin, Mel Gorman, Johannes Weiner, Balbir Singh,
	KAMEZAWA Hiroyuki, linux-mm, Rik van Riel, Andrew Morton, LKML,
	Peter Zijlstra

On Wed, Aug 14, 2013 at 10:40 AM, Michal Hocko <mhocko@suse.cz> wrote:
>>
>> After a _very long session of rebooting and bisecting_ the Linux kernel
>> (fortunately I had a SSD and ccache!) I was able to pinpoint the cause
>> to the following patch:
>>
>> *"mm: limit mmu_gather batching to fix soft lockups on !CONFIG_PREEMPT"*
>>   787f7301074ccd07a3e82236ca41eefd245f4e07 linux stable    [1]
>>   53a59fc67f97374758e63a9c785891ec62324c81 upstream commit [2]
>
> Thanks for bisecting this!
>
> I will look into this but I find it really strange.

We had a TLB invalidation bug in the case when we ran out of page
slots (and limiting the mmu_gather batching basically forced an early
case of that).

It was fixed in commit e6c495a96ce02574e765d5140039a64c8d4e8c9e ("mm:
fix the TLB range flushed when __tlb_remove_page() runs out of
slots"), and that doesn't seem to have been marked for stable
(probably because the commit message makes everybody reading it think
it's limited to ARC).

Ben, can you try back-porting that commit from mainline and see if
that fixes things?

                 Linus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Bug] Reproducible data corruption on i5-3340M: Please revert 53a59fc67!
  2013-08-14 18:03       ` Linus Torvalds
@ 2013-08-14 18:28         ` Michal Hocko
  -1 siblings, 0 replies; 45+ messages in thread
From: Michal Hocko @ 2013-08-14 18:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ben Tebulin, Mel Gorman, Johannes Weiner, Balbir Singh,
	KAMEZAWA Hiroyuki, linux-mm, Rik van Riel, Andrew Morton, LKML,
	Peter Zijlstra

On Wed 14-08-13 11:03:32, Linus Torvalds wrote:
> On Wed, Aug 14, 2013 at 10:40 AM, Michal Hocko <mhocko@suse.cz> wrote:
> >>
> >> After a _very long session of rebooting and bisecting_ the Linux kernel
> >> (fortunately I had a SSD and ccache!) I was able to pinpoint the cause
> >> to the following patch:
> >>
> >> *"mm: limit mmu_gather batching to fix soft lockups on !CONFIG_PREEMPT"*
> >>   787f7301074ccd07a3e82236ca41eefd245f4e07 linux stable    [1]
> >>   53a59fc67f97374758e63a9c785891ec62324c81 upstream commit [2]
> >
> > Thanks for bisecting this!
> >
> > I will look into this but I find it really strange.
> 
> We had a TLB invalidation bug in the case when we ran out of page
> slots (and limiting the mmu_gather batching basically forced an early
> case of that).
> 
> It was fixed in commit e6c495a96ce02574e765d5140039a64c8d4e8c9e ("mm:
> fix the TLB range flushed when __tlb_remove_page() runs out of
> slots"),

OK, that would suggest the issue was introduced by 597e1c35
(mm/mmu_gather: enable tlb flush range in generic mmu_gather) in 3.6,
which is earlier than 3.7, where Ben started seeing the issue. Still, this
definitely smells like a bug that would be amplified by the bisected patch.

Thanks for pointing this out, Linus!

> and that doesn't seem to have been marked for stable
> (probably because the commit message makes everybody reading it think
> it's limited to ARC).
> 
> Ben, can you try back-porting that commit from mainline and see if
> that fixes things?
> 
>                  Linus
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Bug] Reproducible data corruption on i5-3340M: Please revert 53a59fc67!
  2013-08-14 18:28         ` Michal Hocko
@ 2013-08-14 18:35           ` Linus Torvalds
  -1 siblings, 0 replies; 45+ messages in thread
From: Linus Torvalds @ 2013-08-14 18:35 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Ben Tebulin, Mel Gorman, Johannes Weiner, Balbir Singh,
	KAMEZAWA Hiroyuki, linux-mm, Rik van Riel, Andrew Morton, LKML,
	Peter Zijlstra

On Wed, Aug 14, 2013 at 11:28 AM, Michal Hocko <mhocko@suse.cz> wrote:
>
> OK, that would suggest the issue was introduced by 597e1c35
> (mm/mmu_gather: enable tlb flush range in generic mmu_gather) in 3.6,
> which is earlier than 3.7, where Ben started seeing the issue. Still, this
> definitely smells like a bug that would be amplified by the bisected patch.

Yes, the bug was originally introduced in 597e1c35, but in practice it
never happened, because the force_flush case would not ever really
trigger unless __get_free_pages(GFP_NOWAIT) returned NULL.

Which is *very* rare.

So the commit that Ben bisected things down to wasn't the one that
really introduced the bug, but it was the one that made
tlb_next_batch() much more likely to return failure, which in turn
made it much easier to *expose* the bug.
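
The reasoning above can be illustrated with a toy model. The Python below is
not kernel code: the function name, the batch size and the batch limit are all
invented for this sketch. It only counts how often a gather of pages runs out
of slots and has to flush early, first without a cap on the number of batches,
then with a cap, then with every allocation failing.

```python
def count_forced_flushes(pages, batch_size, max_batches=None, alloc_ok=lambda: True):
    """Count how often a gather runs out of slots and must flush early.

    Toy model only: names and numbers are illustrative, not the kernel's.
    """
    forced = 0
    batches = 1   # the gather starts with one batch
    used = 0      # slots used in the current batch
    for _ in range(pages):
        used += 1
        if used < batch_size:
            continue
        # Current batch is full: try to chain another one.
        under_cap = max_batches is None or batches < max_batches
        if under_cap and alloc_ok():
            batches += 1
        else:
            forced += 1   # out of slots: flush early and start over
            batches = 1
        used = 0
    return forced


if __name__ == "__main__":
    # No cap and allocations always succeed: the early-flush path is never hit.
    print(count_forced_flushes(1000, 10))
    # A cap on the number of batches makes the same path fire regularly.
    print(count_forced_flushes(1000, 10, max_batches=4))
    # Without a cap, the path is only reached when allocation fails.
    print(count_forced_flushes(1000, 10, alloc_ok=lambda: False))
```

In this model the early-flush path is dead code until either allocation fails
or a cap is introduced, which matches the argument that a cap exposes a
latent bug in that path rather than creating one.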

NOTE! I still absolutely want Ben to actually test that fix (ie
backport commit e6c495a96ce0 to his tree), because without testing
this is all just theoretical, and there might be other things hiding
here. But it makes sense to me, and I think this already-known bug
explains the symptoms.

                    Linus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Bug] Reproducible data corruption on i5-3340M: Please revert 53a59fc67!
  2013-08-14 18:35           ` Linus Torvalds
@ 2013-08-15  9:25             ` Ben Tebulin
  -1 siblings, 0 replies; 45+ messages in thread
From: Ben Tebulin @ 2013-08-15  9:25 UTC (permalink / raw)
  To: Linus Torvalds, Michal Hocko
  Cc: Mel Gorman, Johannes Weiner, Balbir Singh, KAMEZAWA Hiroyuki,
	linux-mm, Rik van Riel, Andrew Morton, LKML, Peter Zijlstra

Am 14.08.2013 20:35, schrieb Linus Torvalds:
> Yes, the bug was originally introduced in 597e1c35, but in practice it
> never happened, [...]
> 
> NOTE! I still absolutely want Ben to actually test that fix (ie
> backport commit e6c495a96ce0 to his tree), because without testing
> this is all just theoretical, and there might be other things hiding
> here.[..]

I just cherry-picked e6c495a96ce0 into 3.9.11 and 3.7.10.
Unfortunately this does _not resolve_ my issue (too good to be true) :-(

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Bug] Reproducible data corruption on i5-3340M: Please revert 53a59fc67!
  2013-08-15  9:25             ` Ben Tebulin
@ 2013-08-15 12:02               ` Linus Torvalds
  -1 siblings, 0 replies; 45+ messages in thread
From: Linus Torvalds @ 2013-08-15 12:02 UTC (permalink / raw)
  To: Ben Tebulin
  Cc: Michal Hocko, Mel Gorman, Johannes Weiner, Balbir Singh,
	KAMEZAWA Hiroyuki, linux-mm, Rik van Riel, Andrew Morton, LKML,
	Peter Zijlstra

On Thu, Aug 15, 2013 at 2:25 AM, Ben Tebulin <tebulin@googlemail.com> wrote:
>
> I just cherry-picked e6c495a96ce0 into 3.9.11 and 3.7.10.
> Unfortunately this does _not resolve_ my issue (too good to be true) :-(

Ho humm. I've found at least one other bug, but that one only affects
hugepages. Do you perhaps have transparent hugepages enabled? But even
then it looks quite unlikely.

I'll think about this some more. I'm not happy with how that
particular whole TLB flushing hack was done, but I need to sleep on
this.

              Linus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Bug] Reproducible data corruption on i5-3340M: Please revert 53a59fc67!
  2013-08-15 12:02               ` Linus Torvalds
@ 2013-08-15 12:37                 ` Ben Tebulin
  -1 siblings, 0 replies; 45+ messages in thread
From: Ben Tebulin @ 2013-08-15 12:37 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Michal Hocko, Mel Gorman, Johannes Weiner, Balbir Singh,
	KAMEZAWA Hiroyuki, linux-mm, Rik van Riel, Andrew Morton, LKML,
	Peter Zijlstra

On 15.08.2013 14:02, Linus Torvalds wrote:
>> I just cherry-picked e6c495a96ce0 into 3.9.11 and 3.7.10.
>> Unfortunately this does _not resolve_ my issue (too good to be true) :-(
> Ho humm. I've found at least one other bug, but that one only affects
> hugepages. Do you perhaps have transparent hugepages enabled? 

I was using the Ubuntu mainline Kernel config:

   ben@n179 ~/p/linux.git> cat .config | grep TRANSPARENT_HUG
   CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
   CONFIG_TRANSPARENT_HUGEPAGE=y
   # CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS is not set
   CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y

> I'll think about this some more. I'm not happy with how that
> particular whole TLB flushing hack was done, but I need to sleep on
> this.

Thanks!

Being an end user having only a very limited understanding of the
internals behind this issue, I really appreciate any support I receive
from people who do. :-)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Bug] Reproducible data corruption on i5-3340M: Please revert 53a59fc67!
  2013-08-15 12:02               ` Linus Torvalds
@ 2013-08-15 13:40                 ` Michal Hocko
  -1 siblings, 0 replies; 45+ messages in thread
From: Michal Hocko @ 2013-08-15 13:40 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ben Tebulin, Mel Gorman, Johannes Weiner, Balbir Singh,
	KAMEZAWA Hiroyuki, linux-mm, Rik van Riel, Andrew Morton, LKML,
	Peter Zijlstra

On Thu 15-08-13 05:02:31, Linus Torvalds wrote:
> On Thu, Aug 15, 2013 at 2:25 AM, Ben Tebulin <tebulin@googlemail.com> wrote:
> >
> > I just cherry-picked e6c495a96ce0 into 3.9.11 and 3.7.10.
> > Unfortunately this does _not resolve_ my issue (too good to be true) :-(
> 
> Ho humm. I've found at least one other bug, but that one only affects
> hugepages. Do you perhaps have transparent hugepages enabled? But even
> then it looks quite unlikely.

__unmap_hugepage_range is hugetlb not THP if you had that one in mind.
And yes, it doesn't set the range which sounds buggy.

> I'll think about this some more. I'm not happy with how that
> particular whole TLB flushing hack was done, but I need to sleep on
> this.

I am looking into it as well, but there are high prio things which
preempt me a lot :/

Thanks for looking into it.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Bug] Reproducible data corruption on i5-3340M: Please revert 53a59fc67!
  2013-08-15 13:40                 ` Michal Hocko
@ 2013-08-15 14:46                   ` Michal Hocko
  -1 siblings, 0 replies; 45+ messages in thread
From: Michal Hocko @ 2013-08-15 14:46 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ben Tebulin, Mel Gorman, Johannes Weiner, Balbir Singh,
	KAMEZAWA Hiroyuki, linux-mm, Rik van Riel, Andrew Morton, LKML,
	Peter Zijlstra

On Thu 15-08-13 15:40:31, Michal Hocko wrote:
> On Thu 15-08-13 05:02:31, Linus Torvalds wrote:
> > On Thu, Aug 15, 2013 at 2:25 AM, Ben Tebulin <tebulin@googlemail.com> wrote:
> > >
> > > I just cherry-picked e6c495a96ce0 into 3.9.11 and 3.7.10.
> > > Unfortunately this does _not resolve_ my issue (too good to be true) :-(
> > 
> > Ho humm. I've found at least one other bug, but that one only affects
> > hugepages. Do you perhaps have transparent hugepages enabled? But even
> > then it looks quite unlikely.
> 
> __unmap_hugepage_range is hugetlb not THP if you had that one in mind.
> And yes, it doesn't set the range which sounds buggy.

Or, did you mean tlb_remove_page called from zap_huge_pmd? That one
should be safe as tlb_remove_pmd_tlb_entry sets need_flush and that
means that the full range is flushed.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Bug] Reproducible data corruption on i5-3340M: Please revert 53a59fc67!
  2013-08-15 14:46                   ` Michal Hocko
@ 2013-08-15 14:53                     ` Michal Hocko
  -1 siblings, 0 replies; 45+ messages in thread
From: Michal Hocko @ 2013-08-15 14:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ben Tebulin, Mel Gorman, Johannes Weiner, Balbir Singh,
	KAMEZAWA Hiroyuki, linux-mm, Rik van Riel, Andrew Morton, LKML,
	Peter Zijlstra

On Thu 15-08-13 16:46:00, Michal Hocko wrote:
> On Thu 15-08-13 15:40:31, Michal Hocko wrote:
> > On Thu 15-08-13 05:02:31, Linus Torvalds wrote:
> > > On Thu, Aug 15, 2013 at 2:25 AM, Ben Tebulin <tebulin@googlemail.com> wrote:
> > > >
> > > > I just cherry-picked e6c495a96ce0 into 3.9.11 and 3.7.10.
> > > > Unfortunately this does _not resolve_ my issue (too good to be true) :-(
> > > 
> > > Ho humm. I've found at least one other bug, but that one only affects
> > > hugepages. Do you perhaps have transparent hugepages enabled? But even
> > > then it looks quite unlikely.
> > 
> > __unmap_hugepage_range is hugetlb not THP if you had that one in mind.
> > And yes, it doesn't set the range which sounds buggy.
> 
> Or, did you mean tlb_remove_page called from zap_huge_pmd? That one
> should be safe as tlb_remove_pmd_tlb_entry sets need_flush and that
> means that the full range is flushed.

Dohh... But we need need_flush_all and that is not set here. So this
really looks buggy.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Bug] Reproducible data corruption on i5-3340M: Please revert 53a59fc67!
  2013-08-15 14:53                     ` Michal Hocko
@ 2013-08-15 15:14                       ` Michal Hocko
  -1 siblings, 0 replies; 45+ messages in thread
From: Michal Hocko @ 2013-08-15 15:14 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ben Tebulin, Mel Gorman, Johannes Weiner, Balbir Singh,
	KAMEZAWA Hiroyuki, linux-mm, Rik van Riel, Andrew Morton, LKML,
	Peter Zijlstra

On Thu 15-08-13 16:53:32, Michal Hocko wrote:
> On Thu 15-08-13 16:46:00, Michal Hocko wrote:
> > On Thu 15-08-13 15:40:31, Michal Hocko wrote:
> > > On Thu 15-08-13 05:02:31, Linus Torvalds wrote:
> > > > On Thu, Aug 15, 2013 at 2:25 AM, Ben Tebulin <tebulin@googlemail.com> wrote:
> > > > >
> > > > > I just cherry-picked e6c495a96ce0 into 3.9.11 and 3.7.10.
> > > > > Unfortunately this does _not resolve_ my issue (too good to be true) :-(
> > > > 
> > > > Ho humm. I've found at least one other bug, but that one only affects
> > > > hugepages. Do you perhaps have transparent hugepages enabled? But even
> > > > then it looks quite unlikely.
> > > 
> > > __unmap_hugepage_range is hugetlb not THP if you had that one in mind.
> > > And yes, it doesn't set the range which sounds buggy.
> > 
> > Or, did you mean tlb_remove_page called from zap_huge_pmd? That one
> > should be safe as tlb_remove_pmd_tlb_entry sets need_flush and that
> > means that the full range is flushed.
> 
> Dohh... But we need need_flush_all and that is not set here. So this
> really looks buggy.

This is a really dumb attempt to fix this but maybe it is worth trying
to confirm we are really seeing this problem. It still flushes too much
potentially but I am not sure how to find out the proper start...
Will think about it more.
---
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a92012a..a16f452 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1381,7 +1381,11 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 			VM_BUG_ON(!PageHead(page));
 			tlb->mm->nr_ptes--;
 			spin_unlock(&tlb->mm->page_table_lock);
-			tlb_remove_page(tlb, page);
+			if (!__tlb_remove_page(tlb, page)) {
+				tlb->start = 0;
+				tlb->end = addr + HPAGE_SIZE;
+				tlb_flush_mmu(tlb);
+			}
 		}
 		pte_free(tlb->mm, pgtable);
 		ret = 1;
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [Bug] Reproducible data corruption on i5-3340M: Please revert 53a59fc67!
  2013-08-15 12:02               ` Linus Torvalds
                                 ` (2 preceding siblings ...)
  (?)
@ 2013-08-15 18:00               ` Linus Torvalds
  2013-08-15 18:29                   ` Bjørn Mork
  2013-08-15 23:05                   ` Ben Tebulin
  -1 siblings, 2 replies; 45+ messages in thread
From: Linus Torvalds @ 2013-08-15 18:00 UTC (permalink / raw)
  To: Ben Tebulin
  Cc: Michal Hocko, Mel Gorman, Johannes Weiner, Balbir Singh,
	KAMEZAWA Hiroyuki, linux-mm, Rik van Riel, Andrew Morton, LKML,
	Peter Zijlstra, linux-arch

[-- Attachment #1: Type: text/plain, Size: 3967 bytes --]

On Thu, Aug 15, 2013 at 5:02 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Aug 15, 2013 at 2:25 AM, Ben Tebulin <tebulin@googlemail.com> wrote:
>>
>> I just cherry-picked e6c495a96ce0 into 3.9.11 and 3.7.10.
>> Unfortunately this does _not resolve_ my issue (too good to be true) :-(
>
> Ho humm. I've found at least one other bug, but that one only affects
> hugepages. Do you perhaps have transparent hugepages enabled? But even
> then it looks quite unlikely.
>
> I'll think about this some more. I'm not happy with how that
> particular whole TLB flushing hack was done, but I need to sleep on
> this.

Ok, so I've slept on it, and here's my current thinking.

The bugs in the TLB handling were all about missing or confused
updates to the TLB range, and the thing is, they were missing or
confused because you had to do really confusing things, and remember
to set the range properly.

And that is because the interface is horrible.

This patch tries to fix the interface instead of trying to patch up
the individual places that *should* set the range some particular way.
Sadly, that means that I had to change the calling convention for
"tlb_gather_mmu()", so the patch is larger than I'd like. But it's all
very straightforward:

 (1) Instead of passing "fullmm" to tlb_gather_mmu(), pass the
     start/end address.

     A range of 0 to ~0ul implies "fullmm", and we calculate that with
     "!(start | (end+1))"

 (2) Because access to start/end now becomes an internal API, the
     patch makes *all* TLB gather implementations do this.

     So I added start/end fields to the mmu_gather structure as
     necessary.

     Note that some architectures already had "start_range/end_range"
     values, and I left those alone (because the new start/end might
     work a bit differently), but it's very possible that those could
     be removed, and they'd just use the "generic" start/end values.
     I'm cc'ing the arch list to see what the reaction to this all is.

 (3) I removed all the other games with start/end, because now
     start/end is _always_ valid.

     Notably, if any caller of "tlb_flush_mmu()" forgets to update the
     start/end fields (like I think the hugetlb case did), it is no
     longer a bug. The start/end will have been set up by the
     initialization of TLB gather, so we're all good.

 (4) The ONE exception to (3) is the zap_pte_range() case in
     mm/memory.c, which used to do all the special start/end games,
     and now instead just updates start/end to be the "chunk" it just
     worked on before flushing the TLB, and the "rest of the area"
     afterwards.

Even that special (4) case is simpler now, imho, exactly because
start/end is a valid range at all points (it used to be that it wasn't
a valid range the first time, since it wasn't set up initially). So
now that code in case (4) makes more sense, but more importantly, now
it should be just an optimization - we *could* have dropped all the
start/end updates, but then we'd just ask the TLB to be flushed for
the whole original range every time.

Anyway. I've booted this, and I'm writing this email with a kernel
running this, BUT:

 - I have not compile-tested anything but x86-64, so the non-generic
   TLB gather changes are all just done blindly. They are very
   straightforward, but still..

 - I have no idea whether this will fix the problem Ben sees, but I
   feel happier about the code, because now any place that forgets to
   set up start/end will work just fine, because they are always
   valid. Ben, please test. I'm worried that the problem you see is
   something even more fundamentally wrong with the whole "oops, must
   flush in the middle" logic, but I'm _hoping_ this fixes it.

 - This patch is against current git, so to apply you need to have
   that commit e6c495a96ce0 cherry-picked to older kernels first. But
   other than that I don't think this code has changed, so it should
   apply cleanly.

Comments? Especially s390, ARM, ia64, sh and um that I edited blindly...

                          Linus

[-- Attachment #2: patch.diff --]
[-- Type: application/octet-stream, Size: 10899 bytes --]

 arch/arm/include/asm/tlb.h   |  7 +++++--
 arch/arm64/include/asm/tlb.h |  7 +++++--
 arch/ia64/include/asm/tlb.h  |  9 ++++++---
 arch/s390/include/asm/tlb.h  |  8 ++++++--
 arch/sh/include/asm/tlb.h    |  6 ++++--
 arch/um/include/asm/tlb.h    |  6 ++++--
 fs/exec.c                    |  4 ++--
 include/asm-generic/tlb.h    |  2 +-
 mm/hugetlb.c                 |  2 +-
 mm/memory.c                  | 36 +++++++++++++++++++++---------------
 mm/mmap.c                    |  4 ++--
 11 files changed, 57 insertions(+), 34 deletions(-)

diff --git a/arch/arm/include/asm/tlb.h b/arch/arm/include/asm/tlb.h
index 46e7cfb3e721..0baf7f0d9394 100644
--- a/arch/arm/include/asm/tlb.h
+++ b/arch/arm/include/asm/tlb.h
@@ -43,6 +43,7 @@ struct mmu_gather {
 	struct mm_struct	*mm;
 	unsigned int		fullmm;
 	struct vm_area_struct	*vma;
+	unsigned long		start, end;
 	unsigned long		range_start;
 	unsigned long		range_end;
 	unsigned int		nr;
@@ -107,10 +108,12 @@ static inline void tlb_flush_mmu(struct mmu_gather *tlb)
 }
 
 static inline void
-tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned int fullmm)
+tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned long start, unsigned long end)
 {
 	tlb->mm = mm;
-	tlb->fullmm = fullmm;
+	tlb->fullmm = !(start | (end+1));
+	tlb->start = start;
+	tlb->end = end;
 	tlb->vma = NULL;
 	tlb->max = ARRAY_SIZE(tlb->local);
 	tlb->pages = tlb->local;
diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index 46b3beb4b773..e3c4ef1441b6 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -35,6 +35,7 @@ struct mmu_gather {
 	struct mm_struct	*mm;
 	unsigned int		fullmm;
 	struct vm_area_struct	*vma;
+	unsigned long		start, end;
 	unsigned long		range_start;
 	unsigned long		range_end;
 	unsigned int		nr;
@@ -97,10 +98,12 @@ static inline void tlb_flush_mmu(struct mmu_gather *tlb)
 }
 
 static inline void
-tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned int fullmm)
+tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned long start, unsigned long end)
 {
 	tlb->mm = mm;
-	tlb->fullmm = fullmm;
+	tlb->fullmm = !(start | (end+1));
+	tlb->start = start;
+	tlb->end = end;
 	tlb->vma = NULL;
 	tlb->max = ARRAY_SIZE(tlb->local);
 	tlb->pages = tlb->local;
diff --git a/arch/ia64/include/asm/tlb.h b/arch/ia64/include/asm/tlb.h
index ef3a9de01954..bc5efc7c3f3f 100644
--- a/arch/ia64/include/asm/tlb.h
+++ b/arch/ia64/include/asm/tlb.h
@@ -22,7 +22,7 @@
  * unmapping a portion of the virtual address space, these hooks are called according to
  * the following template:
  *
- *	tlb <- tlb_gather_mmu(mm, full_mm_flush);	// start unmap for address space MM
+ *	tlb <- tlb_gather_mmu(mm, start, end);		// start unmap for address space MM
  *	{
  *	  for each vma that needs a shootdown do {
  *	    tlb_start_vma(tlb, vma);
@@ -58,6 +58,7 @@ struct mmu_gather {
 	unsigned int		max;
 	unsigned char		fullmm;		/* non-zero means full mm flush */
 	unsigned char		need_flush;	/* really unmapped some PTEs? */
+	unsigned long		start, end;
 	unsigned long		start_addr;
 	unsigned long		end_addr;
 	struct page		**pages;
@@ -155,13 +156,15 @@ static inline void __tlb_alloc_page(struct mmu_gather *tlb)
 
 
 static inline void
-tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned int full_mm_flush)
+tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned long start, unsigned long end)
 {
 	tlb->mm = mm;
 	tlb->max = ARRAY_SIZE(tlb->local);
 	tlb->pages = tlb->local;
 	tlb->nr = 0;
-	tlb->fullmm = full_mm_flush;
+	tlb->fullmm = !(start | (end+1));
+	tlb->start = start;
+	tlb->end = end;
 	tlb->start_addr = ~0UL;
 }
 
diff --git a/arch/s390/include/asm/tlb.h b/arch/s390/include/asm/tlb.h
index b75d7d686684..23a64d25f2b1 100644
--- a/arch/s390/include/asm/tlb.h
+++ b/arch/s390/include/asm/tlb.h
@@ -32,6 +32,7 @@ struct mmu_gather {
 	struct mm_struct *mm;
 	struct mmu_table_batch *batch;
 	unsigned int fullmm;
+	unsigned long start, end;
 };
 
 struct mmu_table_batch {
@@ -48,10 +49,13 @@ extern void tlb_remove_table(struct mmu_gather *tlb, void *table);
 
 static inline void tlb_gather_mmu(struct mmu_gather *tlb,
 				  struct mm_struct *mm,
-				  unsigned int full_mm_flush)
+				  unsigned long start,
+				  unsigned long end)
 {
 	tlb->mm = mm;
-	tlb->fullmm = full_mm_flush;
+	tlb->start = start;
+	tlb->end = end;
+	tlb->fullmm = !(start | (end+1));
 	tlb->batch = NULL;
 	if (tlb->fullmm)
 		__tlb_flush_mm(mm);
diff --git a/arch/sh/include/asm/tlb.h b/arch/sh/include/asm/tlb.h
index e61d43d9f689..47745b255721 100644
--- a/arch/sh/include/asm/tlb.h
+++ b/arch/sh/include/asm/tlb.h
@@ -36,10 +36,12 @@ static inline void init_tlb_gather(struct mmu_gather *tlb)
 }
 
 static inline void
-tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned int full_mm_flush)
+tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned long start, unsigned long end)
 {
 	tlb->mm = mm;
-	tlb->fullmm = full_mm_flush;
+	tlb->start = start;
+	tlb->end = end;
+	tlb->fullmm = !(start | (end+1));
 
 	init_tlb_gather(tlb);
 }
diff --git a/arch/um/include/asm/tlb.h b/arch/um/include/asm/tlb.h
index 4febacd1a8a1..29b0301c18aa 100644
--- a/arch/um/include/asm/tlb.h
+++ b/arch/um/include/asm/tlb.h
@@ -45,10 +45,12 @@ static inline void init_tlb_gather(struct mmu_gather *tlb)
 }
 
 static inline void
-tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned int full_mm_flush)
+tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned long start, unsigned long end)
 {
 	tlb->mm = mm;
-	tlb->fullmm = full_mm_flush;
+	tlb->start = start;
+	tlb->end = end;
+	tlb->fullmm = !(start | (end+1));
 
 	init_tlb_gather(tlb);
 }
diff --git a/fs/exec.c b/fs/exec.c
index 9c73def87642..fd774c7cb483 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -608,7 +608,7 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
 		return -ENOMEM;
 
 	lru_add_drain();
-	tlb_gather_mmu(&tlb, mm, 0);
+	tlb_gather_mmu(&tlb, mm, old_start, old_end);
 	if (new_end > old_start) {
 		/*
 		 * when the old and new regions overlap clear from new_end.
@@ -625,7 +625,7 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
 		free_pgd_range(&tlb, old_start, old_end, new_end,
 			vma->vm_next ? vma->vm_next->vm_start : USER_PGTABLES_CEILING);
 	}
-	tlb_finish_mmu(&tlb, new_end, old_end);
+	tlb_finish_mmu(&tlb, old_start, old_end);
 
 	/*
 	 * Shrink the vma to just the new range.  Always succeeds.
diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index 13821c339a41..5672d7ea1fa0 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -112,7 +112,7 @@ struct mmu_gather {
 
 #define HAVE_GENERIC_MMU_GATHER
 
-void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, bool fullmm);
+void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned long start, unsigned long end);
 void tlb_flush_mmu(struct mmu_gather *tlb);
 void tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start,
 							unsigned long end);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 83aff0a4d093..b60f33080a28 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2490,7 +2490,7 @@ void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
 
 	mm = vma->vm_mm;
 
-	tlb_gather_mmu(&tlb, mm, 0);
+	tlb_gather_mmu(&tlb, mm, start, end);
 	__unmap_hugepage_range(&tlb, vma, start, end, ref_page);
 	tlb_finish_mmu(&tlb, start, end);
 }
diff --git a/mm/memory.c b/mm/memory.c
index 40268410732a..af84bc0ec17c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -209,14 +209,15 @@ static int tlb_next_batch(struct mmu_gather *tlb)
  *	tear-down from @mm. The @fullmm argument is used when @mm is without
  *	users and we're going to destroy the full address space (exit/execve).
  */
-void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, bool fullmm)
+void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned long start, unsigned long end)
 {
 	tlb->mm = mm;
 
-	tlb->fullmm     = fullmm;
+	/* Is it from 0 to ~0? */
+	tlb->fullmm     = !(start | (end+1));
 	tlb->need_flush_all = 0;
-	tlb->start	= -1UL;
-	tlb->end	= 0;
+	tlb->start	= start;
+	tlb->end	= end;
 	tlb->need_flush = 0;
 	tlb->local.next = NULL;
 	tlb->local.nr   = 0;
@@ -256,8 +257,6 @@ void tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long e
 {
 	struct mmu_gather_batch *batch, *next;
 
-	tlb->start = start;
-	tlb->end   = end;
 	tlb_flush_mmu(tlb);
 
 	/* keep the page table cache within bounds */
@@ -1099,7 +1098,6 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 	spinlock_t *ptl;
 	pte_t *start_pte;
 	pte_t *pte;
-	unsigned long range_start = addr;
 
 again:
 	init_rss_vec(rss);
@@ -1205,17 +1203,25 @@ again:
 	 * and page-free while holding it.
 	 */
 	if (force_flush) {
+		unsigned long old_end;
+
 		force_flush = 0;
 
-#ifdef HAVE_GENERIC_MMU_GATHER
-		tlb->start = range_start;
+		/*
+		 * Flush the TLB just for the previous segment,
+		 * then update the range to be the remaining
+		 * TLB range.
+		 */
+		old_end = tlb->end;
 		tlb->end = addr;
-#endif
+
 		tlb_flush_mmu(tlb);
-		if (addr != end) {
-			range_start = addr;
+
+		tlb->start = addr;
+		tlb->end = old_end;
+
+		if (addr != end)
 			goto again;
-		}
 	}
 
 	return addr;
@@ -1400,7 +1406,7 @@ void zap_page_range(struct vm_area_struct *vma, unsigned long start,
 	unsigned long end = start + size;
 
 	lru_add_drain();
-	tlb_gather_mmu(&tlb, mm, 0);
+	tlb_gather_mmu(&tlb, mm, start, end);
 	update_hiwater_rss(mm);
 	mmu_notifier_invalidate_range_start(mm, start, end);
 	for ( ; vma && vma->vm_start < end; vma = vma->vm_next)
@@ -1426,7 +1432,7 @@ static void zap_page_range_single(struct vm_area_struct *vma, unsigned long addr
 	unsigned long end = address + size;
 
 	lru_add_drain();
-	tlb_gather_mmu(&tlb, mm, 0);
+	tlb_gather_mmu(&tlb, mm, address, end);
 	update_hiwater_rss(mm);
 	mmu_notifier_invalidate_range_start(mm, address, end);
 	unmap_single_vma(&tlb, vma, address, end, details);
diff --git a/mm/mmap.c b/mm/mmap.c
index 1edbaa3136c3..f9c97d10b873 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2336,7 +2336,7 @@ static void unmap_region(struct mm_struct *mm,
 	struct mmu_gather tlb;
 
 	lru_add_drain();
-	tlb_gather_mmu(&tlb, mm, 0);
+	tlb_gather_mmu(&tlb, mm, start, end);
 	update_hiwater_rss(mm);
 	unmap_vmas(&tlb, vma, start, end);
 	free_pgtables(&tlb, vma, prev ? prev->vm_end : FIRST_USER_ADDRESS,
@@ -2709,7 +2709,7 @@ void exit_mmap(struct mm_struct *mm)
 
 	lru_add_drain();
 	flush_cache_mm(mm);
-	tlb_gather_mmu(&tlb, mm, 1);
+	tlb_gather_mmu(&tlb, mm, 0, -1);
 	/* update_hiwater_rss(mm) here? but nobody should be looking */
 	/* Use -1 here to ensure all VMAs in the mm are unmapped */
 	unmap_vmas(&tlb, vma, 0, -1);
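
The fullmm test in the patch above, !(start | (end+1)), relies on unsigned wraparound. A minimal user-space sketch of why it is true only for the range 0..~0UL follows; range_is_fullmm() and fullmm_selftest() are hypothetical names used for illustration and are not kernel functions.

```c
#include <assert.h>

/*
 * Hypothetical helper, not a kernel function: it only mirrors the
 * expression the patch assigns to tlb->fullmm.
 */
static int range_is_fullmm(unsigned long start, unsigned long end)
{
	/*
	 * end + 1 wraps around to 0 only when end == ~0UL, so the OR is
	 * zero (and the negation true) only for the range [0, ~0UL].
	 */
	return !(start | (end + 1));
}

/* Small self-check covering the call patterns seen in the patch. */
static void fullmm_selftest(void)
{
	assert(range_is_fullmm(0, -1UL));		/* exit_mmap(): 0 .. -1 */
	assert(!range_is_fullmm(0x1000, 0x2000));	/* ordinary sub-range */
	assert(!range_is_fullmm(0, 0x2000));		/* starts at 0, ends early */
	assert(!range_is_fullmm(0x1000, -1UL));		/* ends at ~0UL, starts late */
}
```

Calling fullmm_selftest() from a small main() is enough to try it out. The point of the encoding is that callers pass only a range, and the "whole address space" case falls out of the range itself instead of needing a separate flag.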

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [Bug] Reproducible data corruption on i5-3340M: Please revert 53a59fc67!
  2013-08-15 18:00               ` Linus Torvalds
  2013-08-15 18:29                   ` Bjørn Mork
@ 2013-08-15 18:29                   ` Bjørn Mork
  1 sibling, 0 replies; 45+ messages in thread
From: Bjørn Mork @ 2013-08-15 18:29 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ben Tebulin, Michal Hocko, Mel Gorman, Johannes Weiner,
	Balbir Singh, KAMEZAWA Hiroyuki, linux-mm, Rik van Riel,
	Andrew Morton, LKML, Peter Zijlstra, linux-arch

Linus Torvalds <torvalds@linux-foundation.org> writes:

> Comments? Especially s390, ARM, ia64, sh and um that I edited blindly...

I can see that :-)  You have a couple of "unsigned logn"s here.


Bjørn

> --- a/arch/arm64/include/asm/tlb.h
> +++ b/arch/arm64/include/asm/tlb.h
> @@ -35,6 +35,7 @@ struct mmu_gather {
>  	struct mm_struct	*mm;
>  	unsigned int		fullmm;
>  	struct vm_area_struct	*vma;
> +	unsigned long		start, end;
>  	unsigned long		range_start;
>  	unsigned long		range_end;
>  	unsigned int		nr;
> @@ -97,10 +98,12 @@ static inline void tlb_flush_mmu(struct mmu_gather *tlb)
>  }
>  
>  static inline void
> -tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned int fullmm)
> +tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned long start, unsigned logn end)

[..]

> diff --git a/arch/sh/include/asm/tlb.h b/arch/sh/include/asm/tlb.h
> index e61d43d9f689..47745b255721 100644
> --- a/arch/sh/include/asm/tlb.h
> +++ b/arch/sh/include/asm/tlb.h
> @@ -36,10 +36,12 @@ static inline void init_tlb_gather(struct mmu_gather *tlb)
>  }
>  
>  static inline void
> -tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned int full_mm_flush)
> +tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned long start, unsigned logn end)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Bug] Reproducible data corruption on i5-3340M: Please revert 53a59fc67!
  2013-08-15 18:29                   ` Bjørn Mork
@ 2013-08-15 18:42                     ` Linus Torvalds
  -1 siblings, 0 replies; 45+ messages in thread
From: Linus Torvalds @ 2013-08-15 18:42 UTC (permalink / raw)
  To: Bjørn Mork
  Cc: Ben Tebulin, Michal Hocko, Mel Gorman, Johannes Weiner,
	Balbir Singh, KAMEZAWA Hiroyuki, linux-mm, Rik van Riel,
	Andrew Morton, LKML, Peter Zijlstra, linux-arch

On Thu, Aug 15, 2013 at 11:29 AM, Bjørn Mork <bjorn@mork.no> wrote:
> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
>> Comments? Especially s390, ARM, ia64, sh and um that I edited blindly...
>
> I can see that :-)  You have a couple of "unsigned logn"s here.

Just checking that you guys are awake.

Good job. You passed.

                 Linus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Bug] Reproducible data corruption on i5-3340M: Please continue your great work! :-)
  2013-08-15 18:00               ` Linus Torvalds
@ 2013-08-15 23:05                   ` Ben Tebulin
  2013-08-15 23:05                   ` Ben Tebulin
  1 sibling, 0 replies; 45+ messages in thread
From: Ben Tebulin @ 2013-08-15 23:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ben Tebulin, Michal Hocko, Mel Gorman, Johannes Weiner,
	Balbir Singh, KAMEZAWA Hiroyuki, linux-mm, Rik van Riel,
	Andrew Morton, LKML, Peter Zijlstra, linux-arch

Am 15.08.2013 20:00, schrieb Linus Torvalds:
> Ok, so I've slept on it, and here's my current thinking.
> [...]  

Many thoughts which, as a user, I am unable to follow  ;-)

> This patch tries to fix the interface instead of trying to patch up
> the individual places that *should* set the range some particular way
> [...]
> This patch is against current git, so to apply you need to have
> that commit e6c495a96ce0 cherry-picked to older kernels first.

I took a shot based on 3.9.11 + e6c495a96ce0. The reason I don't simply
use the current git master is that, for some reason, my
linux-image-*.deb packages have grown to 750MB and larger since 3.10.y,
and I have no clue at all why or what to do about it.

The patch failed to apply. Due to my outstanding incompetence I resorted
to applying it onto master, cherry-picking that back and trying to
resolve the remaining conflicts correctly.

>  - I have no idea whether this will fix the problem Ben sees, but I
> feel happier about the code, because now any place that forgets to set
> up start/end will work just fine, because they are always valid. 

Simpler code? Resilient API? Happy people? Great!

> Ben, please test. I'm worried that the problem you see is something 
> even more fundamentally wrong with the whole "oops, must flush in the
> middle" logic, but I'm _hoping_ this fixes it.

It's gone.

Really!

I git-fsck'ed successfully around 30 times in a row.
And even all the other things still seem to work ;-)

Honestly I have to confess that I'm deeply impressed how this finally
worked out: I just threw a particular, innocent-looking commit hash and
nothing more into the round. And while still being unsure if this might
be a plain user space issue, only 24h later I received an 11kb-sized
kernel patch (with blatant typos in it !1! *g* ) apparently solving my
issue.

/me happy now, too! :)

- Ben

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Bug] Reproducible data corruption on i5-3340M: Please continue your great work! :-)
  2013-08-15 23:05                   ` Ben Tebulin
  (?)
@ 2013-08-16  0:33                   ` Linus Torvalds
  2013-08-16  6:22                     ` Stephen Rothwell
                                       ` (3 more replies)
  -1 siblings, 4 replies; 45+ messages in thread
From: Linus Torvalds @ 2013-08-16  0:33 UTC (permalink / raw)
  To: Ben Tebulin
  Cc: Michal Hocko, Mel Gorman, Johannes Weiner, Balbir Singh,
	KAMEZAWA Hiroyuki, linux-mm, Rik van Riel, Andrew Morton, LKML,
	Peter Zijlstra, linux-arch

[-- Attachment #1: Type: text/plain, Size: 1595 bytes --]

On Thu, Aug 15, 2013 at 4:05 PM, Ben Tebulin <tebulin@googlemail.com> wrote:
>
>> Ben, please test. I'm worried that the problem you see is something
>> even more fundamentally wrong with the whole "oops, must flush in the
>> middle" logic, but I'm _hoping_ this fixes it.
>
> It's gone.
>
> Really!
>
> I git-fsck'ed successfully around 30 times in a row.
> And even all the other things still seem to work ;-)

Goodie. I think I'm just going to commit it (with the speling fixes
for other architectures) asap. It's bigger than I'd like, but it's a
lot simpler than the alternatives of trying to figure out exactly
which call chain got things wrong with the previous confusing model.

Thanks for bisecting and testing.

> Honestly I have to confess that I'm deeply impressed how this finally
> worked out: I just threw a particular, innocent-looking commit hash and
> nothing more into the round.

Being able to bisect the exact commit that introduced the bad behavior
is a *very* powerful debugging aid, and in fact the smaller and more
innocent-looking the bisected commit is, the easier it generally is to
then say "ok, it must be related to this one particular issue". So the
bisection really pinpointed the area. After that it was just a matter
of reading the source code and seeing what looked suspicious.

I'll probably delay committing it until tomorrow, in the hope that
somebody using one of the other architectures will at least ack that
it compiles. I'm re-attaching the patch (with the two "logn" -> "long"
fixes) just to encourage that. Hint hint, everybody..

               Linus

[-- Attachment #2: patch.diff --]
[-- Type: application/octet-stream, Size: 10899 bytes --]

 arch/arm/include/asm/tlb.h   |  7 +++++--
 arch/arm64/include/asm/tlb.h |  7 +++++--
 arch/ia64/include/asm/tlb.h  |  9 ++++++---
 arch/s390/include/asm/tlb.h  |  8 ++++++--
 arch/sh/include/asm/tlb.h    |  6 ++++--
 arch/um/include/asm/tlb.h    |  6 ++++--
 fs/exec.c                    |  4 ++--
 include/asm-generic/tlb.h    |  2 +-
 mm/hugetlb.c                 |  2 +-
 mm/memory.c                  | 36 +++++++++++++++++++++---------------
 mm/mmap.c                    |  4 ++--
 11 files changed, 57 insertions(+), 34 deletions(-)

diff --git a/arch/arm/include/asm/tlb.h b/arch/arm/include/asm/tlb.h
index 46e7cfb3e721..0baf7f0d9394 100644
--- a/arch/arm/include/asm/tlb.h
+++ b/arch/arm/include/asm/tlb.h
@@ -43,6 +43,7 @@ struct mmu_gather {
 	struct mm_struct	*mm;
 	unsigned int		fullmm;
 	struct vm_area_struct	*vma;
+	unsigned long		start, end;
 	unsigned long		range_start;
 	unsigned long		range_end;
 	unsigned int		nr;
@@ -107,10 +108,12 @@ static inline void tlb_flush_mmu(struct mmu_gather *tlb)
 }
 
 static inline void
-tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned int fullmm)
+tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned long start, unsigned long end)
 {
 	tlb->mm = mm;
-	tlb->fullmm = fullmm;
+	tlb->fullmm = !(start | (end+1));
+	tlb->start = start;
+	tlb->end = end;
 	tlb->vma = NULL;
 	tlb->max = ARRAY_SIZE(tlb->local);
 	tlb->pages = tlb->local;
diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index 46b3beb4b773..717031a762c2 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -35,6 +35,7 @@ struct mmu_gather {
 	struct mm_struct	*mm;
 	unsigned int		fullmm;
 	struct vm_area_struct	*vma;
+	unsigned long		start, end;
 	unsigned long		range_start;
 	unsigned long		range_end;
 	unsigned int		nr;
@@ -97,10 +98,12 @@ static inline void tlb_flush_mmu(struct mmu_gather *tlb)
 }
 
 static inline void
-tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned int fullmm)
+tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned long start, unsigned long end)
 {
 	tlb->mm = mm;
-	tlb->fullmm = fullmm;
+	tlb->fullmm = !(start | (end+1));
+	tlb->start = start;
+	tlb->end = end;
 	tlb->vma = NULL;
 	tlb->max = ARRAY_SIZE(tlb->local);
 	tlb->pages = tlb->local;
diff --git a/arch/ia64/include/asm/tlb.h b/arch/ia64/include/asm/tlb.h
index ef3a9de01954..bc5efc7c3f3f 100644
--- a/arch/ia64/include/asm/tlb.h
+++ b/arch/ia64/include/asm/tlb.h
@@ -22,7 +22,7 @@
  * unmapping a portion of the virtual address space, these hooks are called according to
  * the following template:
  *
- *	tlb <- tlb_gather_mmu(mm, full_mm_flush);	// start unmap for address space MM
+ *	tlb <- tlb_gather_mmu(mm, start, end);		// start unmap for address space MM
  *	{
  *	  for each vma that needs a shootdown do {
  *	    tlb_start_vma(tlb, vma);
@@ -58,6 +58,7 @@ struct mmu_gather {
 	unsigned int		max;
 	unsigned char		fullmm;		/* non-zero means full mm flush */
 	unsigned char		need_flush;	/* really unmapped some PTEs? */
+	unsigned long		start, end;
 	unsigned long		start_addr;
 	unsigned long		end_addr;
 	struct page		**pages;
@@ -155,13 +156,15 @@ static inline void __tlb_alloc_page(struct mmu_gather *tlb)
 
 
 static inline void
-tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned int full_mm_flush)
+tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned long start, unsigned long end)
 {
 	tlb->mm = mm;
 	tlb->max = ARRAY_SIZE(tlb->local);
 	tlb->pages = tlb->local;
 	tlb->nr = 0;
-	tlb->fullmm = full_mm_flush;
+	tlb->fullmm = !(start | (end+1));
+	tlb->start = start;
+	tlb->end = end;
 	tlb->start_addr = ~0UL;
 }
 
diff --git a/arch/s390/include/asm/tlb.h b/arch/s390/include/asm/tlb.h
index b75d7d686684..23a64d25f2b1 100644
--- a/arch/s390/include/asm/tlb.h
+++ b/arch/s390/include/asm/tlb.h
@@ -32,6 +32,7 @@ struct mmu_gather {
 	struct mm_struct *mm;
 	struct mmu_table_batch *batch;
 	unsigned int fullmm;
+	unsigned long start, end;
 };
 
 struct mmu_table_batch {
@@ -48,10 +49,13 @@ extern void tlb_remove_table(struct mmu_gather *tlb, void *table);
 
 static inline void tlb_gather_mmu(struct mmu_gather *tlb,
 				  struct mm_struct *mm,
-				  unsigned int full_mm_flush)
+				  unsigned long start,
+				  unsigned long end)
 {
 	tlb->mm = mm;
-	tlb->fullmm = full_mm_flush;
+	tlb->start = start;
+	tlb->end = end;
+	tlb->fullmm = !(start | (end+1));
 	tlb->batch = NULL;
 	if (tlb->fullmm)
 		__tlb_flush_mm(mm);
diff --git a/arch/sh/include/asm/tlb.h b/arch/sh/include/asm/tlb.h
index e61d43d9f689..362192ed12fe 100644
--- a/arch/sh/include/asm/tlb.h
+++ b/arch/sh/include/asm/tlb.h
@@ -36,10 +36,12 @@ static inline void init_tlb_gather(struct mmu_gather *tlb)
 }
 
 static inline void
-tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned int full_mm_flush)
+tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned long start, unsigned long end)
 {
 	tlb->mm = mm;
-	tlb->fullmm = full_mm_flush;
+	tlb->start = start;
+	tlb->end = end;
+	tlb->fullmm = !(start | (end+1));
 
 	init_tlb_gather(tlb);
 }
diff --git a/arch/um/include/asm/tlb.h b/arch/um/include/asm/tlb.h
index 4febacd1a8a1..29b0301c18aa 100644
--- a/arch/um/include/asm/tlb.h
+++ b/arch/um/include/asm/tlb.h
@@ -45,10 +45,12 @@ static inline void init_tlb_gather(struct mmu_gather *tlb)
 }
 
 static inline void
-tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned int full_mm_flush)
+tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned long start, unsigned long end)
 {
 	tlb->mm = mm;
-	tlb->fullmm = full_mm_flush;
+	tlb->start = start;
+	tlb->end = end;
+	tlb->fullmm = !(start | (end+1));
 
 	init_tlb_gather(tlb);
 }
diff --git a/fs/exec.c b/fs/exec.c
index 9c73def87642..fd774c7cb483 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -608,7 +608,7 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
 		return -ENOMEM;
 
 	lru_add_drain();
-	tlb_gather_mmu(&tlb, mm, 0);
+	tlb_gather_mmu(&tlb, mm, old_start, old_end);
 	if (new_end > old_start) {
 		/*
 		 * when the old and new regions overlap clear from new_end.
@@ -625,7 +625,7 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
 		free_pgd_range(&tlb, old_start, old_end, new_end,
 			vma->vm_next ? vma->vm_next->vm_start : USER_PGTABLES_CEILING);
 	}
-	tlb_finish_mmu(&tlb, new_end, old_end);
+	tlb_finish_mmu(&tlb, old_start, old_end);
 
 	/*
 	 * Shrink the vma to just the new range.  Always succeeds.
diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index 13821c339a41..5672d7ea1fa0 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -112,7 +112,7 @@ struct mmu_gather {
 
 #define HAVE_GENERIC_MMU_GATHER
 
-void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, bool fullmm);
+void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned long start, unsigned long end);
 void tlb_flush_mmu(struct mmu_gather *tlb);
 void tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start,
 							unsigned long end);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 83aff0a4d093..b60f33080a28 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2490,7 +2490,7 @@ void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
 
 	mm = vma->vm_mm;
 
-	tlb_gather_mmu(&tlb, mm, 0);
+	tlb_gather_mmu(&tlb, mm, start, end);
 	__unmap_hugepage_range(&tlb, vma, start, end, ref_page);
 	tlb_finish_mmu(&tlb, start, end);
 }
diff --git a/mm/memory.c b/mm/memory.c
index 40268410732a..af84bc0ec17c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -209,14 +209,15 @@ static int tlb_next_batch(struct mmu_gather *tlb)
  *	tear-down from @mm. The @fullmm argument is used when @mm is without
  *	users and we're going to destroy the full address space (exit/execve).
  */
-void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, bool fullmm)
+void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned long start, unsigned long end)
 {
 	tlb->mm = mm;
 
-	tlb->fullmm     = fullmm;
+	/* Is it from 0 to ~0? */
+	tlb->fullmm     = !(start | (end+1));
 	tlb->need_flush_all = 0;
-	tlb->start	= -1UL;
-	tlb->end	= 0;
+	tlb->start	= start;
+	tlb->end	= end;
 	tlb->need_flush = 0;
 	tlb->local.next = NULL;
 	tlb->local.nr   = 0;
@@ -256,8 +257,6 @@ void tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long e
 {
 	struct mmu_gather_batch *batch, *next;
 
-	tlb->start = start;
-	tlb->end   = end;
 	tlb_flush_mmu(tlb);
 
 	/* keep the page table cache within bounds */
@@ -1099,7 +1098,6 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 	spinlock_t *ptl;
 	pte_t *start_pte;
 	pte_t *pte;
-	unsigned long range_start = addr;
 
 again:
 	init_rss_vec(rss);
@@ -1205,17 +1203,25 @@ again:
 	 * and page-free while holding it.
 	 */
 	if (force_flush) {
+		unsigned long old_end;
+
 		force_flush = 0;
 
-#ifdef HAVE_GENERIC_MMU_GATHER
-		tlb->start = range_start;
+		/*
+		 * Flush the TLB just for the previous segment,
+		 * then update the range to be the remaining
+		 * TLB range.
+		 */
+		old_end = tlb->end;
 		tlb->end = addr;
-#endif
+
 		tlb_flush_mmu(tlb);
-		if (addr != end) {
-			range_start = addr;
+
+		tlb->start = addr;
+		tlb->end = old_end;
+
+		if (addr != end)
 			goto again;
-		}
 	}
 
 	return addr;
@@ -1400,7 +1406,7 @@ void zap_page_range(struct vm_area_struct *vma, unsigned long start,
 	unsigned long end = start + size;
 
 	lru_add_drain();
-	tlb_gather_mmu(&tlb, mm, 0);
+	tlb_gather_mmu(&tlb, mm, start, end);
 	update_hiwater_rss(mm);
 	mmu_notifier_invalidate_range_start(mm, start, end);
 	for ( ; vma && vma->vm_start < end; vma = vma->vm_next)
@@ -1426,7 +1432,7 @@ static void zap_page_range_single(struct vm_area_struct *vma, unsigned long addr
 	unsigned long end = address + size;
 
 	lru_add_drain();
-	tlb_gather_mmu(&tlb, mm, 0);
+	tlb_gather_mmu(&tlb, mm, address, end);
 	update_hiwater_rss(mm);
 	mmu_notifier_invalidate_range_start(mm, address, end);
 	unmap_single_vma(&tlb, vma, address, end, details);
diff --git a/mm/mmap.c b/mm/mmap.c
index 1edbaa3136c3..f9c97d10b873 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2336,7 +2336,7 @@ static void unmap_region(struct mm_struct *mm,
 	struct mmu_gather tlb;
 
 	lru_add_drain();
-	tlb_gather_mmu(&tlb, mm, 0);
+	tlb_gather_mmu(&tlb, mm, start, end);
 	update_hiwater_rss(mm);
 	unmap_vmas(&tlb, vma, start, end);
 	free_pgtables(&tlb, vma, prev ? prev->vm_end : FIRST_USER_ADDRESS,
@@ -2709,7 +2709,7 @@ void exit_mmap(struct mm_struct *mm)
 
 	lru_add_drain();
 	flush_cache_mm(mm);
-	tlb_gather_mmu(&tlb, mm, 1);
+	tlb_gather_mmu(&tlb, mm, 0, -1);
 	/* update_hiwater_rss(mm) here? but nobody should be looking */
 	/* Use -1 here to ensure all VMAs in the mm are unmapped */
 	unmap_vmas(&tlb, vma, 0, -1);
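
The "flush in the middle" hunk in zap_pte_range() above can be read as splitting one range into a part that is flushed now and a part that remains to be handled. A minimal user-space sketch of that bookkeeping follows. The names struct gather_range, struct flush_record, record_flush() and flush_up_to() are hypothetical stand-ins for struct mmu_gather and tlb_flush_mmu(); none of them exist in the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the start/end fields of struct mmu_gather. */
struct gather_range {
	unsigned long start;
	unsigned long end;
};

/* Record of the ranges "flushed" so far; stands in for a real TLB flush. */
struct flush_record {
	struct gather_range ranges[8];
	size_t nr;
};

/* Remember one flushed range, silently dropping it if the record is full. */
static void record_flush(struct flush_record *rec, const struct gather_range *r)
{
	if (rec->nr < sizeof(rec->ranges) / sizeof(rec->ranges[0]))
		rec->ranges[rec->nr++] = *r;
}

/*
 * Flush [start, addr), then make the gather describe what is left,
 * [addr, old_end), mirroring the old_end save/restore in the patch.
 */
static void flush_up_to(struct gather_range *r, struct flush_record *rec,
			unsigned long addr)
{
	unsigned long old_end = r->end;

	r->end = addr;
	record_flush(rec, r);

	r->start = addr;
	r->end = old_end;
}
```

With this shape, each forced flush covers exactly the segment already processed, and the range left in the gather still ends at the original end, so a later final flush does not depend on the caller re-supplying the bounds.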

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [Bug] Reproducible data corruption on i5-3340M: Please continue your great work! :-)
  2013-08-16  0:33                   ` Linus Torvalds
@ 2013-08-16  6:22                     ` Stephen Rothwell
  2013-08-16  7:55                       ` richard -rw- weinberger
                                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 45+ messages in thread
From: Stephen Rothwell @ 2013-08-16  6:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ben Tebulin, Michal Hocko, Mel Gorman, Johannes Weiner,
	Balbir Singh, KAMEZAWA Hiroyuki, linux-mm, Rik van Riel,
	Andrew Morton, LKML, Peter Zijlstra, linux-arch

[-- Attachment #1: Type: text/plain, Size: 669 bytes --]

Hi Linus,

On Thu, 15 Aug 2013 17:33:28 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> I'll probably delay committing it until tomorrow, in the hope that
> somebody using one of the other architectures will at least ack that
> it compiles. I'm re-attaching the patch (with the two "logn" -> "long"
> fixes) just to encourage that. Hint hint, everybody..

I built all the (major) PowerPC defconfigs, allnoconfig and allmodconfig
and they built as well as they did before this patch (i.e. some failed
for other reasons).  I have not done any boot testing on PowerPC. 

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Bug] Reproducible data corruption on i5-3340M: Please continue your great work! :-)
  2013-08-16  0:33                   ` Linus Torvalds
@ 2013-08-16  7:55                       ` richard -rw- weinberger
  2013-08-16  7:55                       ` richard -rw- weinberger
                                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 45+ messages in thread
From: richard -rw- weinberger @ 2013-08-16  7:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ben Tebulin, Michal Hocko, Mel Gorman, Johannes Weiner,
	Balbir Singh, KAMEZAWA Hiroyuki, linux-mm, Rik van Riel,
	Andrew Morton, LKML, Peter Zijlstra, linux-arch

On Fri, Aug 16, 2013 at 2:33 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> I'll probably delay committing it until tomorrow, in the hope that
> somebody using one of the other architectures will at least ack that
> it compiles. I'm re-attaching the patch (with the two "logn" -> "long"
> fixes) just to encourage that. Hint hint, everybody..

/me tested arch/um, so far everything looks good. :-)

-- 
Thanks,
//richard

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Bug] Reproducible data corruption on i5-3340M: Please continue your great work! :-)
  2013-08-16  0:33                   ` Linus Torvalds
@ 2013-08-16 11:00                       ` Michal Hocko
  2013-08-16  7:55                       ` richard -rw- weinberger
                                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 45+ messages in thread
From: Michal Hocko @ 2013-08-16 11:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ben Tebulin, Mel Gorman, Johannes Weiner, Balbir Singh,
	KAMEZAWA Hiroyuki, linux-mm, Rik van Riel, Andrew Morton, LKML,
	Peter Zijlstra, linux-arch

On Thu 15-08-13 17:33:28, Linus Torvalds wrote:
> On Thu, Aug 15, 2013 at 4:05 PM, Ben Tebulin <tebulin@googlemail.com> wrote:
> >
> >> Ben, please test. I'm worried that the problem you see is something
> >> even more fundamentally wrong with the whole "oops, must flush in the
> >> middle" logic, but I'm _hoping_ this fixes it.
> >
> > It's gone.
> >
> > Really!
> >
> > I git-fsck'ed successfully around 30 times in a row.
> > And even all the other things still seem to work ;-)
> 
> Goodie. I think I'm just going to commit it (with the speling fixes
> for other architectures) asap. It's bigger than I'd like, but it's a
> lot simpler than the alternatives of trying to figure out exactly
> which call chain got things wrong with the previous confusing model.

I was thinking about teaching __tlb_remove_page to update the range
automatically from the given address.
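
As an illustration only (not taken from the thread and not actual kernel code), the idea of widening the flush range as each page is queued could be sketched as below. The names gather_sketch, gather_init, gather_remove_page, gather_flush and SKETCH_PAGE_SIZE are hypothetical stand-ins; the real struct mmu_gather and __tlb_remove_page have different fields and signatures, and a 4 KiB page size is assumed.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096UL  /* assumed page size, for illustration */

/* Hypothetical, heavily simplified stand-in for the kernel's mmu_gather. */
struct gather_sketch {
	unsigned long start;  /* lowest address queued since the last flush */
	unsigned long end;    /* one past the highest address queued */
	unsigned int nr;      /* pages queued since the last flush */
};

/* Reset to an empty range: start above any address, end below any. */
static void gather_init(struct gather_sketch *tlb)
{
	tlb->start = ~0UL;
	tlb->end = 0;
	tlb->nr = 0;
}

/*
 * Queue one page and widen [start, end) to cover it, so the caller
 * never has to maintain the range by hand.
 */
static void gather_remove_page(struct gather_sketch *tlb, unsigned long addr)
{
	assert((addr & (SKETCH_PAGE_SIZE - 1)) == 0);  /* page aligned */

	if (addr < tlb->start)
		tlb->start = addr;
	if (addr + SKETCH_PAGE_SIZE > tlb->end)
		tlb->end = addr + SKETCH_PAGE_SIZE;
	tlb->nr++;
}

/*
 * "Flush" the gathered range: return how many bytes it spans (0 if
 * nothing was queued) and reset, so a flush in the middle of a long
 * unmap starts the next batch with a fresh, empty range.
 */
static unsigned long gather_flush(struct gather_sketch *tlb)
{
	unsigned long span = tlb->nr ? tlb->end - tlb->start : 0;

	gather_init(tlb);
	return span;
}
```

With this shape, queuing pages at 0x3000 and then 0x1000 leaves the range as [0x1000, 0x4000), and a flush resets it to empty. The sketch ignores address wrap-around at the top of the address space and says nothing about how the applied patch tracks the range.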

But your patch looks good to me as well.

Feel free to add
Reviewed-by: Michal Hocko <mhocko@suse.cz>

Thanks!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Bug] Reproducible data corruption on i5-3340M: Please continue your great work! :-)
  2013-08-16 11:00                       ` Michal Hocko
@ 2013-08-16 11:28                         ` Peter Zijlstra
  -1 siblings, 0 replies; 45+ messages in thread
From: Peter Zijlstra @ 2013-08-16 11:28 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Linus Torvalds, Ben Tebulin, Mel Gorman, Johannes Weiner,
	Balbir Singh, KAMEZAWA Hiroyuki, linux-mm, Rik van Riel,
	Andrew Morton, LKML, linux-arch

On Fri, Aug 16, 2013 at 01:00:31PM +0200, Michal Hocko wrote:

> I was thinking about teaching __tlb_remove_page to update the range
> automatically from the given address.

The mmu_gather unification stuff I had did it differently still:

  http://permalink.gmane.org/gmane.linux.kernel.mm/81287

That said, I do like Linus' approach. The only thing I haven't
considered is if it does the right thing for tile,mips-r4k which have
'special' rules for VM_HUGETLB. Although I don't think it changes those
archs enough to break anything.

I should find some time to finally finish that series :/

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Bug] Reproducible data corruption on i5-3340M: Please continue your great work! :-)
  2013-08-16  0:33                   ` Linus Torvalds
@ 2013-08-16 23:40                       ` Tony Luck
  2013-08-16  7:55                       ` richard -rw- weinberger
                                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 45+ messages in thread
From: Tony Luck @ 2013-08-16 23:40 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ben Tebulin, Michal Hocko, Mel Gorman, Johannes Weiner,
	Balbir Singh, KAMEZAWA Hiroyuki, linux-mm, Rik van Riel,
	Andrew Morton, LKML, Peter Zijlstra, linux-arch

On Thu, Aug 15, 2013 at 5:33 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> I'll probably delay committing it until tomorrow, in the hope that
> somebody using one of the other architectures will at least ack that
> it compiles. I'm re-attaching the patch (with the two "logn" -> "long"
> fixes) just to encourage that. Hint hint, everybody..

I see I'm too late to supply an Ack for the commit, because it is already in.
But just for completeness sake - all my ia64 configs build OK, and the couple
that get boot tested still appear to be working too.

-Tony

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2013-08-17  0:09 UTC | newest]

Thread overview: 45+ messages
2013-08-09 14:58 Reproducible git-fsck/SHA1 failures since 3.7.x on a Dell E6430 / i5-3340M Ben Tebulin
2013-08-12  8:04 ` Reproducible data corruption since 3.7.x on i5-3340M machines Ben Tebulin
2013-08-14 16:36 ` [Bug] Reproducible data corruption on i5-3340M: Please revert 53a59fc67! Ben Tebulin
2013-08-14 17:40   ` Michal Hocko
2013-08-14 17:40     ` Michal Hocko
2013-08-14 17:58     ` Michal Hocko
2013-08-14 17:58       ` Michal Hocko
2013-08-14 18:03     ` Linus Torvalds
2013-08-14 18:03       ` Linus Torvalds
2013-08-14 18:28       ` Michal Hocko
2013-08-14 18:28         ` Michal Hocko
2013-08-14 18:35         ` Linus Torvalds
2013-08-14 18:35           ` Linus Torvalds
2013-08-15  9:25           ` Ben Tebulin
2013-08-15  9:25             ` Ben Tebulin
2013-08-15 12:02             ` Linus Torvalds
2013-08-15 12:02               ` Linus Torvalds
2013-08-15 12:37               ` Ben Tebulin
2013-08-15 12:37                 ` Ben Tebulin
2013-08-15 13:40               ` Michal Hocko
2013-08-15 13:40                 ` Michal Hocko
2013-08-15 14:46                 ` Michal Hocko
2013-08-15 14:46                   ` Michal Hocko
2013-08-15 14:53                   ` Michal Hocko
2013-08-15 14:53                     ` Michal Hocko
2013-08-15 15:14                     ` Michal Hocko
2013-08-15 15:14                       ` Michal Hocko
2013-08-15 18:00               ` Linus Torvalds
2013-08-15 18:29                 ` Bjørn Mork
2013-08-15 18:29                   ` Bjørn Mork
2013-08-15 18:29                   ` Bjørn Mork
2013-08-15 18:42                   ` Linus Torvalds
2013-08-15 18:42                     ` Linus Torvalds
2013-08-15 23:05                 ` [Bug] Reproducible data corruption on i5-3340M: Please continue your great work! :-) Ben Tebulin
2013-08-15 23:05                   ` Ben Tebulin
2013-08-16  0:33                   ` Linus Torvalds
2013-08-16  6:22                     ` Stephen Rothwell
2013-08-16  7:55                     ` richard -rw- weinberger
2013-08-16  7:55                       ` richard -rw- weinberger
2013-08-16 11:00                     ` Michal Hocko
2013-08-16 11:00                       ` Michal Hocko
2013-08-16 11:28                       ` Peter Zijlstra
2013-08-16 11:28                         ` Peter Zijlstra
2013-08-16 23:40                     ` Tony Luck
2013-08-16 23:40                       ` Tony Luck
