* [3.13 regression] kswapd0 and ksoftirqd/0 CPU hogs
@ 2014-06-06 11:38 Mikael Pettersson
  2014-06-06 11:43 ` Geert Uytterhoeven
  2014-06-11  8:45 ` Andreas Schwab
  0 siblings, 2 replies; 29+ messages in thread
From: Mikael Pettersson @ 2014-06-06 11:38 UTC (permalink / raw)
  To: linux-m68k

Since updating my ARAnym VMs from 3.12.16 to 3.13.11 I see kswapd0
and ksoftirqd/0 consume inordinate amounts of CPU.  kswapd0 often
rises to about 30-50% CPU even though RAM shouldn't be anywhere near
depleted (768MB, a gcc bootstrap running in a screen session).
kswapd0 tends to stay this way until I drop caches, but doing that
doesn't always fix it.  I also sometimes see ksoftirqd/0 consume
5-30% CPU.

Reverting to 3.12.16 completely eliminates these problems.

I haven't tested 3.14 or 3.15-rc yet.

/Mikael


* Re: [3.13 regression] kswapd0 and ksoftirqd/0 CPU hogs
  2014-06-06 11:38 [3.13 regression] kswapd0 and ksoftirqd/0 CPU hogs Mikael Pettersson
@ 2014-06-06 11:43 ` Geert Uytterhoeven
  2014-06-06 13:11   ` Mikael Pettersson
  2014-06-11  8:45 ` Andreas Schwab
  1 sibling, 1 reply; 29+ messages in thread
From: Geert Uytterhoeven @ 2014-06-06 11:43 UTC (permalink / raw)
  To: Mikael Pettersson; +Cc: Linux/m68k

Hi Mikael,

On Fri, Jun 6, 2014 at 1:38 PM, Mikael Pettersson <mikpelinux@gmail.com> wrote:
> Since updating my ARAnym VMs from 3.12.16 to 3.13.11 I see kswapd0
> and ksoftirqd/0 consume inordinate amounts of CPU.  kswapd0 often
> rises to about 30-50% CPU even though RAM shouldn't be anywhere near
> depleted (768MB, a gcc bootstrap running in a screen session).
> kswapd0 tends to stay this way until I drop caches, but doing that
> doesn't always fix it.  I also sometimes see ksoftirqd/0 consume
> 5-30% CPU.
>
> Reverting to 3.12.16 completely eliminates these problems.

Any chance to bisect it?

> I haven't tested 3.14 or 3.15-rc yet.

Would be good to know, though, as it may have been fixed.
However, as v3.13.11 is the most recent stable version of v3.13, this may be
an elsewhere unknown and thus unfixed issue.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


* Re: [3.13 regression] kswapd0 and ksoftirqd/0 CPU hogs
  2014-06-06 11:43 ` Geert Uytterhoeven
@ 2014-06-06 13:11   ` Mikael Pettersson
  2014-06-07 13:22     ` Mikael Pettersson
  0 siblings, 1 reply; 29+ messages in thread
From: Mikael Pettersson @ 2014-06-06 13:11 UTC (permalink / raw)
  To: Geert Uytterhoeven; +Cc: Mikael Pettersson, Linux/m68k

Geert Uytterhoeven writes:
 > Hi Mikael,
 > 
 > On Fri, Jun 6, 2014 at 1:38 PM, Mikael Pettersson <mikpelinux@gmail.com> wrote:
 > > Since updating my ARAnym VMs from 3.12.16 to 3.13.11 I see kswapd0
 > > and ksoftirqd/0 consume inordinate amounts of CPU.  kswapd0 often
 > > rises to about 30-50% CPU even though RAM shouldn't be anywhere near
 > > depleted (768MB, a gcc bootstrap running in a screen session).
 > > kswapd0 tends to stay this way until I drop caches, but doing that
 > > doesn't always fix it.  I also sometimes see ksoftirqd/0 consume
 > > 5-30% CPU.
 > >
 > > Reverting to 3.12.16 completely eliminates these problems.
 > 
 > Any chance to bisect it?
 > 
 > > I haven't tested 3.14 or 3.15-rc yet.
 > 
 > Would be good to know, though, as it may have been fixed.
 > However, as v3.13.11 is the most recent stable version of v3.13, this may be
 > an elsewhere unknown and thus unfixed issue.

I've just started a gcc-4.8 bootstrap on 3.14.5, and then I'll try 3.15
if it isn't fixed in 3.14.  If 3.15 too is broken, I'll do the bisect,
but it will be a slow process since it takes anywhere from a few hours
to a couple of days for the bug to appear.

BTW, the presence of some unknown bug causing kswapd0 to hog the CPU has
been mentioned on the Debian m68k list earlier this year.

/Mikael


* Re: [3.13 regression] kswapd0 and ksoftirqd/0 CPU hogs
  2014-06-06 13:11   ` Mikael Pettersson
@ 2014-06-07 13:22     ` Mikael Pettersson
  2014-06-11  8:20       ` Mikael Pettersson
  0 siblings, 1 reply; 29+ messages in thread
From: Mikael Pettersson @ 2014-06-07 13:22 UTC (permalink / raw)
  To: Mikael Pettersson; +Cc: Geert Uytterhoeven, Linux/m68k

Mikael Pettersson writes:
 > Geert Uytterhoeven writes:
 >  > Hi Mikael,
 >  > 
 >  > On Fri, Jun 6, 2014 at 1:38 PM, Mikael Pettersson <mikpelinux@gmail.com> wrote:
 >  > > Since updating my ARAnym VMs from 3.12.16 to 3.13.11 I see kswapd0
 >  > > and ksoftirqd/0 consume inordinate amounts of CPU.  kswapd0 often
 >  > > rises to about 30-50% CPU even though RAM shouldn't be anywhere near
 >  > > depleted (768MB, a gcc bootstrap running in a screen session).
 >  > > kswapd0 tends to stay this way until I drop caches, but doing that
 >  > > doesn't always fix it.  I also sometimes see ksoftirqd/0 consume
 >  > > 5-30% CPU.
 >  > >
 >  > > Reverting to 3.12.16 completely eliminates these problems.
 >  > 
 >  > Any chance to bisect it?
 >  > 
 >  > > I haven't tested 3.14 or 3.15-rc yet.
 >  > 
 >  > Would be good to know, though, as it may have been fixed.
 >  > However, as v3.13.11 is the most recent stable version of v3.13, this may be
 >  > an elsewhere unknown and thus unfixed issue.
 > 
 > I've just started a gcc-4.8 bootstrap on 3.14.5, and then I'll try 3.15
 > if it isn't fixed in 3.14.

3.14.5 is also affected by the bug.  Dropping caches fixed kswapd0, but
instead ksoftirqd/0 jumped to 20-40%, and it refuses to calm down.


* Re: [3.13 regression] kswapd0 and ksoftirqd/0 CPU hogs
  2014-06-07 13:22     ` Mikael Pettersson
@ 2014-06-11  8:20       ` Mikael Pettersson
  0 siblings, 0 replies; 29+ messages in thread
From: Mikael Pettersson @ 2014-06-11  8:20 UTC (permalink / raw)
  To: Mikael Pettersson; +Cc: Geert Uytterhoeven, Linux/m68k

Mikael Pettersson writes:
 > Mikael Pettersson writes:
 >  > Geert Uytterhoeven writes:
 >  >  > Hi Mikael,
 >  >  > 
 >  >  > On Fri, Jun 6, 2014 at 1:38 PM, Mikael Pettersson <mikpelinux@gmail.com> wrote:
 >  >  > > Since updating my ARAnym VMs from 3.12.16 to 3.13.11 I see kswapd0
 >  >  > > and ksoftirqd/0 consume inordinate amounts of CPU.  kswapd0 often
 >  >  > > rises to about 30-50% CPU even though RAM shouldn't be anywhere near
 >  >  > > depleted (768MB, a gcc bootstrap running in a screen session).
 >  >  > > kswapd0 tends to stay this way until I drop caches, but doing that
 >  >  > > doesn't always fix it.  I also sometimes see ksoftirqd/0 consume
 >  >  > > 5-30% CPU.
 >  >  > >
 >  >  > > Reverting to 3.12.16 completely eliminates these problems.
 >  >  > 
 >  >  > Any chance to bisect it?
 >  >  > 
 >  >  > > I haven't tested 3.14 or 3.15-rc yet.
 >  >  > 
 >  >  > Would be good to know, though, as it may have been fixed.
 >  >  > However, as v3.13.11 is the most recent stable version of v3.13, this may be
 >  >  > an elsewhere unknown and thus unfixed issue.
 >  > 
 >  > I've just started a gcc-4.8 bootstrap on 3.14.5, and then I'll try 3.15
 >  > if it isn't fixed in 3.14.
 > 
 > 3.14.5 is also affected by the bug.  Dropping caches fixed kswapd0, but
 > instead ksoftirqd/0 jumped to 20-40%, and it refuses to calm down.

3.15-rc8 is also affected.  It took a while before triggering, but I just
got a kswapd0 CPU hog there too.  Dropping page cache helped this time.

Another observation is that ksoftirqd/0 seems to consistently consume much
more CPU in 3.14/3.15 than in 3.12.  Usually not hugely so at specific time
points, but the accumulated CPU for it on a system that's been up a couple
of days with constant load is looking a bit scary.
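
The accumulated CPU time mentioned above could be quantified by reading fields 14 and 15 (utime and stime, in clock ticks) of /proc/<pid>/stat. The following is only a sketch of that idea, not something used in this thread; the helper names cpu_seconds and thread_cpu_seconds are made up for illustration.

```python
import os


def cpu_seconds(stat_line: str, ticks_per_second: int) -> float:
    """Return utime + stime in seconds from one /proc/<pid>/stat line."""
    # Field 2 (the command name) is wrapped in parentheses and may contain
    # spaces, so only split what follows the last ')'.
    fields = stat_line[stat_line.rindex(")") + 1:].split()
    # fields[0] is field 3 (state); utime and stime are fields 14 and 15.
    utime, stime = int(fields[11]), int(fields[12])
    return (utime + stime) / ticks_per_second


def thread_cpu_seconds(name: str) -> dict:
    """Map pid -> accumulated CPU seconds for every task named `name`."""
    ticks = os.sysconf("SC_CLK_TCK")
    result = {}
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/stat") as f:
                line = f.read()
        except OSError:
            continue  # the task exited while /proc was being scanned
        comm = line[line.index("(") + 1:line.rindex(")")]
        if comm == name:
            result[int(pid)] = cpu_seconds(line, ticks)
    return result
```

Calling thread_cpu_seconds("ksoftirqd/0") on two kernels after the same uptime and load would give comparable numbers, assuming the thread keeps that name on both.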


* Re: [3.13 regression] kswapd0 and ksoftirqd/0 CPU hogs
  2014-06-06 11:38 [3.13 regression] kswapd0 and ksoftirqd/0 CPU hogs Mikael Pettersson
  2014-06-06 11:43 ` Geert Uytterhoeven
@ 2014-06-11  8:45 ` Andreas Schwab
  2014-07-01 11:43   ` Mikael Pettersson
  1 sibling, 1 reply; 29+ messages in thread
From: Andreas Schwab @ 2014-06-11  8:45 UTC (permalink / raw)
  To: Mikael Pettersson; +Cc: linux-m68k

Mikael Pettersson <mikpelinux@gmail.com> writes:

> Reverting to 3.12.16 completely eliminates these problems.

Even 3.11 has the kswapd0 cpu hog problem.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


* Re: [3.13 regression] kswapd0 and ksoftirqd/0 CPU hogs
  2014-06-11  8:45 ` Andreas Schwab
@ 2014-07-01 11:43   ` Mikael Pettersson
  2015-03-31  1:16     ` Michael Schmitz
  0 siblings, 1 reply; 29+ messages in thread
From: Mikael Pettersson @ 2014-07-01 11:43 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Mikael Pettersson, linux-m68k

Andreas Schwab writes:
 > Mikael Pettersson <mikpelinux@gmail.com> writes:
 > 
 > > Reverting to 3.12.16 completely eliminates these problems.
 > 
 > Even 3.11 has the kswapd0 cpu hog problem.

Hmm, I just got the kswapd0 CPU hog on 3.12.16 too (while compiling
java code during a gcc package rebuild).

So kernels >= 3.11 have the kswapd0 CPU hog bug, and kernels >= 3.13
also have the ksoftirqd/0 CPU hog bug.

What's the last known-good kernel? 3.10?


* Re: [3.13 regression] kswapd0 and ksoftirqd/0 CPU hogs
  2014-07-01 11:43   ` Mikael Pettersson
@ 2015-03-31  1:16     ` Michael Schmitz
  2015-03-31 13:19       ` Mikael Pettersson
  2015-04-01 16:11       ` Andreas Schwab
  0 siblings, 2 replies; 29+ messages in thread
From: Michael Schmitz @ 2015-03-31  1:16 UTC (permalink / raw)
  To: Mikael Pettersson; +Cc: Andreas Schwab, Linux/m68k

Hi,

has anyone found a solution to this one?

3.18-rc5 has kswapd0 hogging the CPU - haven't seen ksoftirqd/0 yet.
Unpacking a large tarball tends to trigger this for me.

Cheers,

  Michael


On Tue, Jul 1, 2014 at 11:43 PM, Mikael Pettersson <mikpelinux@gmail.com> wrote:
> Andreas Schwab writes:
>  > Mikael Pettersson <mikpelinux@gmail.com> writes:
>  >
>  > > Reverting to 3.12.16 completely eliminates these problems.
>  >
>  > Even 3.11 has the kswapd0 cpu hog problem.
>
> Hmm, I just got the kswapd0 CPU hog on 3.12.16 too (while compiling
> java code during a gcc package rebuild).
>
> So kernels >= 3.11 have the kswapd0 CPU hog bug, and kernels >= 3.13
> also have the ksoftirqd/0 CPU hog bug.
>
> What's the last known-good kernel? 3.10?
> --
> To unsubscribe from this list: send the line "unsubscribe linux-m68k" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [3.13 regression] kswapd0 and ksoftirqd/0 CPU hogs
  2015-03-31  1:16     ` Michael Schmitz
@ 2015-03-31 13:19       ` Mikael Pettersson
  2015-04-01  3:08         ` Michael Schmitz
  2016-02-21 17:06         ` Mikael Pettersson
  2015-04-01 16:11       ` Andreas Schwab
  1 sibling, 2 replies; 29+ messages in thread
From: Mikael Pettersson @ 2015-03-31 13:19 UTC (permalink / raw)
  To: Michael Schmitz; +Cc: Mikael Pettersson, Andreas Schwab, Linux/m68k

Michael Schmitz writes:
 > Hi,
 > 
 > has anyone found a solution to this one?
 > 
 > 3.18-rc5 has kswapd0 hogging the CPU - haven't seen ksoftirqd0 yet.
 > Unpacking a large tarball tends to trigger this for me.

Alas, no.  I went back to the 3.10.xx kernels and they work Ok for me
(they tend to hang during shutdown, but I can live with that).

I should do a git bisect...

/Mikael

 > 
 > Cheers,
 > 
 >   Michael
 > 
 > 
 > On Tue, Jul 1, 2014 at 11:43 PM, Mikael Pettersson <mikpelinux@gmail.com> wrote:
 > > Andreas Schwab writes:
 > >  > Mikael Pettersson <mikpelinux@gmail.com> writes:
 > >  >
 > >  > > Reverting to 3.12.16 completely eliminates these problems.
 > >  >
 > >  > Even 3.11 has the kswapd0 cpu hog problem.
 > >
 > > Hmm, I just got the kswapd0 CPU hog on 3.12.16 too (while compiling
 > > java code during a gcc package rebuild).
 > >
 > > So kernels >= 3.11 have the kswapd0 CPU hog bug, and kernels >= 3.13
 > > also have the ksoftirqd/0 CPU hog bug.
 > >
 > > What's the last known-good kernel? 3.10?

-- 


* Re: [3.13 regression] kswapd0 and ksoftirqd/0 CPU hogs
  2015-03-31 13:19       ` Mikael Pettersson
@ 2015-04-01  3:08         ` Michael Schmitz
  2015-04-01  4:45           ` Finn Thain
  2016-02-21 17:06         ` Mikael Pettersson
  1 sibling, 1 reply; 29+ messages in thread
From: Michael Schmitz @ 2015-04-01  3:08 UTC (permalink / raw)
  To: Mikael Pettersson; +Cc: Andreas Schwab, Linux/m68k

Hi Mikael,
>  > has anyone found a solution to this one?
>  > 
>  > 3.18-rc5 has kswapd0 hogging the CPU - haven't seen ksoftirqd0 yet.
>  > Unpacking a large tarball tends to trigger this for me.
>
> Alas, no.  I went back to the 3.10.xx kernels and they work Ok for me
> (they tend to hang during shutdown, but I can live with that).
>   

I've followed the vm stats while running gunzip -c on a large file. I 
get an 'invalid compressed data' error at the very end of the gunzip 
run, and the file md5sum does not match what I get when uncompressing 
that file on another system with no error.

As long as that file will fit into available memory, all that happens is 
kswapd0 and/or kswapd1 running forever after gunzip has finished. kswapd 
running full tilt appears to coincide with the number of free pages 
hitting the min_free_kbytes limit. The number of dirty pages never grows 
very large (hovers around 1000, less than 1% of RAM size) and remains 
below the nr_dirty_threshold (10800) and nr_dirty_background_threshold 
(5400) limits at all time. (Both limits progressively shrink over time - 
is that normal?).
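
The counters followed above are all exposed in /proc/vmstat, and the limit in /proc/sys/vm/min_free_kbytes. As a rough sketch of how such a watch could be scripted (not the method actually used here), the code below parses a vmstat dump and compares free memory against min_free_kbytes. The function names, the 4096-byte page size default and the direct comparison against min_free_kbytes are assumptions; the kernel derives per-zone watermarks from that value rather than comparing against it directly.

```python
WATCHED = (
    "nr_free_pages",
    "nr_dirty",
    "nr_dirty_threshold",
    "nr_dirty_background_threshold",
)


def parse_vmstat(text: str) -> dict:
    """Parse 'name value' lines as found in /proc/vmstat."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def snapshot(vmstat_text: str, min_free_kbytes: int, page_size: int = 4096) -> dict:
    """Summarise the watched counters (page_size is an assumed value)."""
    counters = parse_vmstat(vmstat_text)
    snap = {name: counters[name] for name in WATCHED}
    snap["free_kb"] = counters["nr_free_pages"] * page_size // 1024
    # Simplification: treat min_free_kbytes as one global limit.
    snap["at_min_free"] = snap["free_kb"] <= min_free_kbytes
    snap["over_dirty_threshold"] = (
        counters["nr_dirty"] > counters["nr_dirty_threshold"]
    )
    return snap
```

Feeding it the contents of /proc/vmstat once per second while gunzip runs would show whether the kswapd spin lines up with at_min_free becoming true.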

If I force the VM to only use part of the RAM (by setting 
min_free_kbytes to say 10% of RAM size), the system becomes unresponsive 
as soon as the limit is reached. The swap tasks start to eat substantial 
amounts of CPU once the number of free pages approaches that limit,
nr_dirty drops at that time, and nr_dirty_threshold as well as
nr_dirty_background_threshold begin to rise again - above the initial
values, in fact.
> I should do a git bisect...
>   

Would be nice to be able to force this a lot quicker. I'll try with 
smaller files to uncompress, and larger min_free limit.

Cheers,

    Michael

> /Mikael
>
>  > 
>  > Cheers,
>  > 
>  >   Michael
>  > 
>  > 
>  > On Tue, Jul 1, 2014 at 11:43 PM, Mikael Pettersson <mikpelinux@gmail.com> wrote:
>  > > Andreas Schwab writes:
>  > >  > Mikael Pettersson <mikpelinux@gmail.com> writes:
>  > >  >
>  > >  > > Reverting to 3.12.16 completely eliminates these problems.
>  > >  >
>  > >  > Even 3.11 has the kswapd0 cpu hog problem.
>  > >
>  > > Hmm, I just got the kswapd0 CPU hog on 3.12.16 too (while compiling
>  > > java code during a gcc package rebuild).
>  > >
>  > > So kernels >= 3.11 have the kswapd0 CPU hog bug, and kernels >= 3.13
>  > > also have the ksoftirqd/0 CPU hog bug.
>  > >
>  > > What's the last known-good kernel? 3.10?
>
>   


* Re: [3.13 regression] kswapd0 and ksoftirqd/0 CPU hogs
  2015-04-01  3:08         ` Michael Schmitz
@ 2015-04-01  4:45           ` Finn Thain
  2015-04-01  5:21             ` Michael Schmitz
  2015-04-06 21:25             ` Michael Schmitz
  0 siblings, 2 replies; 29+ messages in thread
From: Finn Thain @ 2015-04-01  4:45 UTC (permalink / raw)
  To: Michael Schmitz; +Cc: Mikael Pettersson, Andreas Schwab, Linux/m68k


On Wed, 1 Apr 2015, Michael Schmitz wrote:

> Hi Mikael,
> > > has anyone found a solution to this one?
> > > 
> > > 3.18-rc5 has kswapd0 hogging the CPU - haven't seen ksoftirqd0 yet. 
> > > Unpacking a large tarball tends to trigger this for me.
> >
> > Alas, no.  I went back to the 3.10.xx kernels and they work Ok for me 
> > (they tend to hang during shutdown, but I can live with that).
> >   
> 
> I've followed the vm stats while running gunzip -c on a large file. I 
> get an 'invalid compressed data' error at the very end of the gunzip 
> run, and the file md5sum does not match what I get when uncompressing 
> that file on another system with no error

Was that an aranym virtual machine or a physical one? If physical, can the 
error be reproduced using a virtual one (given same RAM size, kernel etc)?

-- 


* Re: [3.13 regression] kswapd0 and ksoftirqd/0 CPU hogs
  2015-04-01  4:45           ` Finn Thain
@ 2015-04-01  5:21             ` Michael Schmitz
  2015-04-06 21:25             ` Michael Schmitz
  1 sibling, 0 replies; 29+ messages in thread
From: Michael Schmitz @ 2015-04-01  5:21 UTC (permalink / raw)
  To: Finn Thain; +Cc: Mikael Pettersson, Andreas Schwab, Linux/m68k

Hi Finn,

>> I've followed the vm stats while running gunzip -c on a large file. I 
>> get an 'invalid compressed data' error at the very end of the gunzip 
>> run, and the file md5sum does not match what I get when uncompressing 
>> that file on another system with no error
>>     
>
> Was that an aranym virtual machine or a physical one? If physical, can the 
> error be reproduced using a virtual one (given same RAM size, kernel etc)?
>
>   

Physical - and I haven't tried to reproduce on a VM yet. Cloning the 
system for a VM won't be trivial but I can give it a shot.

    Michael


* Re: [3.13 regression] kswapd0 and ksoftirqd/0 CPU hogs
  2015-03-31  1:16     ` Michael Schmitz
  2015-03-31 13:19       ` Mikael Pettersson
@ 2015-04-01 16:11       ` Andreas Schwab
  1 sibling, 0 replies; 29+ messages in thread
From: Andreas Schwab @ 2015-04-01 16:11 UTC (permalink / raw)
  To: Michael Schmitz; +Cc: Mikael Pettersson, Linux/m68k

Michael Schmitz <schmitzmic@gmail.com> writes:

> has anyone found a solution to this one?
>
> 3.18-rc5 has kswapd0 hogging the CPU - haven't seen ksoftirqd0 yet.
> Unpacking a large tarball tends to trigger this for me.

The only workaround I have is to drop caches regularly
(/proc/sys/vm/drop_caches) when MemFree is getting low.  All my OBS
build workers are stable running 3.11.6 with this workaround.
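
A minimal sketch of what such a workaround might look like follows. It is not the script used on the OBS workers; the function names and the threshold are hypothetical, and writing "1" (page cache only) rather than "2" or "3" is an assumption. Writing to /proc/sys/vm/drop_caches requires root.

```python
import os


def mem_free_kb(meminfo_text: str) -> int:
    """Extract the MemFree value (in kB) from /proc/meminfo contents."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemFree:"):
            return int(line.split()[1])
    raise ValueError("no MemFree line found")


def drop_page_cache(path: str = "/proc/sys/vm/drop_caches") -> None:
    """Flush dirty data, then ask the kernel to drop clean page cache."""
    os.sync()  # drop_caches only discards clean pages, so sync first
    with open(path, "w") as f:
        f.write("1\n")  # assumption: 1 = page cache only


def check_once(threshold_kb: int, meminfo_path: str = "/proc/meminfo") -> bool:
    """Drop caches if MemFree is below threshold_kb; report whether we did."""
    with open(meminfo_path) as f:
        free = mem_free_kb(f.read())
    if free < threshold_kb:
        drop_page_cache()
        return True
    return False
```

Running check_once() from a periodic job, with a threshold picked by experiment, would approximate the behaviour described above.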

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


* Re: [3.13 regression] kswapd0 and ksoftirqd/0 CPU hogs
  2015-04-01  4:45           ` Finn Thain
  2015-04-01  5:21             ` Michael Schmitz
@ 2015-04-06 21:25             ` Michael Schmitz
  2015-04-07  0:06               ` Finn Thain
  1 sibling, 1 reply; 29+ messages in thread
From: Michael Schmitz @ 2015-04-06 21:25 UTC (permalink / raw)
  To: Finn Thain; +Cc: Mikael Pettersson, Andreas Schwab, Linux/m68k

Hi Finn,

> On Wed, 1 Apr 2015, Michael Schmitz wrote:
>
>   
>> I've followed the vm stats while running gunzip -c on a large file. I 
>> get an 'invalid compressed data' error at the very end of the gunzip 
>> run, and the file md5sum does not match what I get when uncompressing 
>> that file on another system with no error
>>     
>
> Was that an aranym virtual machine or a physical one? If physical, can the 
> error be reproduced using a virtual one (given same RAM size, kernel etc)?
>   

The gunzip error cannot be reproduced on any of my ARAnyM VMs. Might be 
a RAM error or other hardware related problem. Same kernel but 
configured slightly differently (added 030 and 040 support, plus ARAnyM 
support).

The general behaviour (gunzip eats up all free RAM, then kswapd spins 
doing nothing very apparent, with no dirty pages to be flushed and 
cached pages never released) remains the same. Though I've seen the 
gunzip complete without kicking off kswapd on occasion (had set 
dirty_background_ratio and dirty_ratio half the default for that).
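
For reference, halving the two writeback ratios could be sketched as below. The helper names are made up, and the values 10 and 20 in the usage note are the usual mainline defaults, assumed here rather than confirmed for this particular system.

```python
def read_ratio(name: str) -> int:
    """Read an integer sysctl such as dirty_ratio from /proc/sys/vm."""
    with open(f"/proc/sys/vm/{name}") as f:
        return int(f.read().strip())


def halved_ratios(dirty_background_ratio: int, dirty_ratio: int) -> dict:
    """Return sysctl settings with both ratios halved, but never below 1."""
    return {
        "vm.dirty_background_ratio": max(1, dirty_background_ratio // 2),
        "vm.dirty_ratio": max(1, dirty_ratio // 2),
    }


def as_sysctl_lines(settings: dict) -> str:
    """Format settings in the 'key = value' form used by sysctl.conf."""
    return "\n".join(f"{key} = {value}" for key, value in settings.items())
```

With the assumed defaults, halved_ratios(10, 20) yields 5 and 10; the current values can be obtained with read_ratio("dirty_background_ratio") and read_ratio("dirty_ratio").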

And yes, dropping cached pages as Andreas suggested, does free up 
significant (i.e. most of all) RAM and shuts up kswapd.

    Michael


* Re: [3.13 regression] kswapd0 and ksoftirqd/0 CPU hogs
  2015-04-06 21:25             ` Michael Schmitz
@ 2015-04-07  0:06               ` Finn Thain
  2015-04-07  5:38                 ` Michael Schmitz
  0 siblings, 1 reply; 29+ messages in thread
From: Finn Thain @ 2015-04-07  0:06 UTC (permalink / raw)
  To: Michael Schmitz; +Cc: Mikael Pettersson, Andreas Schwab, Linux/m68k


On Tue, 7 Apr 2015, Michael Schmitz wrote:

> Hi Finn,
> 
> > On Wed, 1 Apr 2015, Michael Schmitz wrote:
> >
> >   
> > > I've followed the vm stats while running gunzip -c on a large file. 
> > > I get an 'invalid compressed data' error at the very end of the 
> > > gunzip run, and the file md5sum does not match what I get when 
> > > uncompressing that file on another system with no error
> > >     
> >
> > Was that an aranym virtual machine or a physical one? If physical, can 
> > the error be reproduced using a virtual one (given same RAM size, 
> > kernel etc)?
> >   
> 
> The gunzip error cannot be reproduced on any of my ARAnyM VMs. Might be 
> a RAM error or other hardware related problem. Same kernel but 
> configured slightly different (added 030 and 040 support, plus ARAnyM 
> support).

The configuration differences could be cancelled by booting your new 
aranym kernel on the physical hardware, and reproducing the fault that 
way. BTW, is this the same physical machine that has DMA issues, which we 
discussed off-list in the past?

> 
> The general behaviour (gunzip eats up all free RAM, then kswapd spins 
> doing nothing very apparent, with no dirty pages to be flushed and 
> cached pages never released) remains the same. Though I've seen the 
> gunzip complete without kicking off kswapd on occasion (had set 
> dirty_background_ratio and dirty_ratio half the default for that).
> 
> And yes, dropping cached pages as Andreas suggested, does free up 
> significant (i.e. most of all) RAM and shuts up kswapd.

This bug has been reported to Red Hat in the past (on x86_64). They closed 
the bugzilla entry in 2011, but the bug was still being reported by Fedora 
users in 2014. https://bugzilla.redhat.com/show_bug.cgi?id=712019

-- 


* Re: [3.13 regression] kswapd0 and ksoftirqd/0 CPU hogs
  2015-04-07  0:06               ` Finn Thain
@ 2015-04-07  5:38                 ` Michael Schmitz
  0 siblings, 0 replies; 29+ messages in thread
From: Michael Schmitz @ 2015-04-07  5:38 UTC (permalink / raw)
  To: Finn Thain; +Cc: Mikael Pettersson, Andreas Schwab, Linux/m68k

Hi Finn,
> On Tue, 7 Apr 2015, Michael Schmitz wrote:
>
>   
>> The gunzip error cannot be reproduced on any of my ARAnyM VMs. Might be 
>> a RAM error or other hardware related problem. Same kernel but 
>> configured slightly different (added 030 and 040 support, plus ARAnyM 
>> support).
>>     
>
> The configuration differences could be cancelled by booting your new 
> aranym kernel on the physical hardware, and reproducing the fault that 
>   

Sure, and I'll make certain to do that once I've finished the current 
task (updating stuff in a current unstable chroot).

> way. BTW, is this the same physical machine that has DMA issues, which we 
> discussed off-list in the past?
>   

The very same. Runs fairly stable otherwise though - I would have 
expected filesystem corruption or other more drastic errors if the RAM 
was faulty.

>   
>> The general behaviour (gunzip eats up all free RAM, then kswapd spins 
>> doing nothing very apparent, with no dirty pages to be flushed and 
>> cached pages never released) remains the same. Though I've seen the 
>> gunzip complete without kicking off kswapd on occasion (had set 
>> dirty_background_ratio and dirty_ratio half the default for that).
>>
>> And yes, dropping cached pages as Andreas suggested, does free up 
>> significant (i.e. most of all) RAM and shuts up kswapd.
>>     
>
> This bug has been reported to Red Hat in the past (on x86_64). They closed 
> the bugzilla entry in 2011, but the bug was still being reported by Fedora 
> users in 2014. https://bugzilla.redhat.com/show_bug.cgi?id=712019
>   

Thanks for pointing that out!

The patches attached to this report made it into Linus' git tree at that 
time so I presume we are seeing something closely related. The 
discussion on LKML makes my head spin - not a chance to debug this in a 
meaningful way, I suppose.

    Michael


* Re: [3.13 regression] kswapd0 and ksoftirqd/0 CPU hogs
  2015-03-31 13:19       ` Mikael Pettersson
  2015-04-01  3:08         ` Michael Schmitz
@ 2016-02-21 17:06         ` Mikael Pettersson
  2016-02-21 19:31           ` Michael Schmitz
                             ` (2 more replies)
  1 sibling, 3 replies; 29+ messages in thread
From: Mikael Pettersson @ 2016-02-21 17:06 UTC (permalink / raw)
  To: Mikael Pettersson; +Cc: Michael Schmitz, Andreas Schwab, Linux/m68k

Mikael Pettersson writes:
 > Michael Schmitz writes:
 >  > Hi,
 >  > 
 >  > has anyone found a solution to this one?
 >  > 
 >  > 3.18-rc5 has kswapd0 hogging the CPU - haven't seen ksoftirqd0 yet.
 >  > Unpacking a large tarball tends to trigger this for me.
 > 
 > Alas, no.  I went back to the 3.10.xx kernels and they work Ok for me
 > (they tend to hang during shutdown, but I can live with that).
 > 
 > I should do a git bisect...

I've done two git bisects on this.  The first one was inconclusive
(pointed to a harmless commit), but the second one ended up with:

# first bad commit: [ac4de9543aca59f2b763746647577302fbedd57e] Merge branch 'akpm' (patches from Andrew Morton)

That's a big pile of VM changes, so I think it could be the culprit.

Are people still seeing the kswapd bug with current kernels?  I've
stayed with the 3.10 kernel series on my aranym VMs, except for the
one VM that did the bisection tests.

/Mikael


* Re: [3.13 regression] kswapd0 and ksoftirqd/0 CPU hogs
  2016-02-21 17:06         ` Mikael Pettersson
@ 2016-02-21 19:31           ` Michael Schmitz
  2016-02-22 10:01           ` Geert Uytterhoeven
  2016-05-31  4:52           ` Finn Thain
  2 siblings, 0 replies; 29+ messages in thread
From: Michael Schmitz @ 2016-02-21 19:31 UTC (permalink / raw)
  To: Mikael Pettersson; +Cc: Andreas Schwab, Linux/m68k

Hi Mikael,

I've since changed too much around my system to be certain, but would
suspect the bug to still be there. I'll see whether changing my config
to use SLUB retriggers the bug when uncompressing a huge file (but I'm
running with only 14 MB now, which makes traversing the entire VM
metadata a lot faster).

Cheers,

  Michael


On Mon, Feb 22, 2016 at 6:06 AM, Mikael Pettersson <mikpelinux@gmail.com> wrote:
> Mikael Pettersson writes:
>  > Michael Schmitz writes:
>  >  > Hi,
>  >  >
>  >  > has anyone found a solution to this one?
>  >  >
>  >  > 3.18-rc5 has kswapd0 hogging the CPU - haven't seen ksoftirqd0 yet.
>  >  > Unpacking a large tarball tends to trigger this for me.
>  >
>  > Alas, no.  I went back to the 3.10.xx kernels and they work Ok for me
>  > (they tend to hang during shutdown, but I can live with that).
>  >
>  > I should do a git bisect...
>
> I've done two git bisects on this.  The first one was inconclusive
> (pointed to a harmless commit), but the second one ended up with:
>
> # first bad commit: [ac4de9543aca59f2b763746647577302fbedd57e] Merge branch 'akpm' (patches from Andrew Morton)
>
> That's a big pile of VM changes, so I think it could be the culprit.
>
> Are people still seeing the kswapd bug with current kernels?  I've
> stayed with the 3.10 kernel series on my aranym VMs, except for the
> one VM that did the bisection tests.
>
> /Mikael


* Re: [3.13 regression] kswapd0 and ksoftirqd/0 CPU hogs
  2016-02-21 17:06         ` Mikael Pettersson
  2016-02-21 19:31           ` Michael Schmitz
@ 2016-02-22 10:01           ` Geert Uytterhoeven
  2016-03-06  7:21             ` Mikael Pettersson
  2016-05-31  4:52           ` Finn Thain
  2 siblings, 1 reply; 29+ messages in thread
From: Geert Uytterhoeven @ 2016-02-22 10:01 UTC (permalink / raw)
  To: Mikael Pettersson; +Cc: Michael Schmitz, Andreas Schwab, Linux/m68k

Hi Mikael,

On Sun, Feb 21, 2016 at 6:06 PM, Mikael Pettersson <mikpelinux@gmail.com> wrote:
> Mikael Pettersson writes:
>  > Michael Schmitz writes:
>  >  > has anyone found a solution to this one?
>  >  >
>  >  > 3.18-rc5 has kswapd0 hogging the CPU - haven't seen ksoftirqd0 yet.
>  >  > Unpacking a large tarball tends to trigger this for me.
>  >
>  > Alas, no.  I went back to the 3.10.xx kernels and they work Ok for me
>  > (they tend to hang during shutdown, but I can live with that).
>  >
>  > I should do a git bisect...
>
> I've done two git bisects on this.  The first one was inconclusive
> (pointed to a harmless commit), but the second one ended up with:

Thanks a lot for doing this!

> # first bad commit: [ac4de9543aca59f2b763746647577302fbedd57e] Merge branch 'akpm' (patches from Andrew Morton)
>
> That's a big pile of VM changes, so I think it could be the culprit.

So git bisect pointed to the merge commit itself, not to any of the commits in
the akpm branch?

I redid that merge myself, and the result is the same as ac4de9543aca5.
There could still be a semantic merge conflict that cannot be detected by
git, though.

Could you try cherry-picking the 36 commits from the akpm branch and
bisecting that?
I.e.
    git checkout 26935fb06ee88f11
    git cherry-pick 26935fb06ee88f11..de32a8177f64bc62
    git bisect start
    git bisect bad
    git bisect good 26935fb06ee88f11

Thanks again!

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


* Re: [3.13 regression] kswapd0 and ksoftirqd/0 CPU hogs
  2016-02-22 10:01           ` Geert Uytterhoeven
@ 2016-03-06  7:21             ` Mikael Pettersson
  2016-03-06  8:54               ` Geert Uytterhoeven
  0 siblings, 1 reply; 29+ messages in thread
From: Mikael Pettersson @ 2016-03-06  7:21 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Mikael Pettersson, Michael Schmitz, Andreas Schwab, Linux/m68k

Geert Uytterhoeven writes:
 > Hi Mikael,
 > 
 > On Sun, Feb 21, 2016 at 6:06 PM, Mikael Pettersson <mikpelinux@gmail.com> wrote:
 > > Mikael Pettersson writes:
 > >  > Michael Schmitz writes:
 > >  >  > has anyone found a solution to this one?
 > >  >  >
 > >  >  > 3.18-rc5 has kswapd0 hogging the CPU - haven't seen ksoftirqd0 yet.
 > >  >  > Unpacking a large tarball tends to trigger this for me.
 > >  >
 > >  > Alas, no.  I went back to the 3.10.xx kernels and they work Ok for me
 > >  > (they tend to hang during shutdown, but I can live with that).
 > >  >
 > >  > I should do a git bisect...
 > >
 > > I've done two git bisects on this.  The first one was inconclusive
 > > (pointed to a harmless commit), but the second one ended up with:
 > 
 > Thanks a lot for doing this!
 > 
 > > # first bad commit: [ac4de9543aca59f2b763746647577302fbedd57e] Merge branch 'akpm' (patches from Andrew Morton)
 > >
 > > That's a big pile of VM changes, so I think it could be the culprit.
 > 
 > So git bisect pointed to the merge commit itself, not to any of the commits in
 > the akpm branch?
 > 
 > I redid that merge myself, and the result is the same as ac4de9543aca5.
 > There could still be a semantical merge conflict that cannot be detected by
 > git, though.
 > 
 > Could you try cherry-picking the 36 commits from the akpm branch and
 > bisecting that?
 > I.e.
 >     git checkout 26935fb06ee88f11
 >     git cherry-pick 26935fb06ee88f11..de32a8177f64bc62
 >     git bisect start
 >     git bisect bad
 >     git bisect good 26935fb06ee88f11

I ran these exact commands and restarted my bisection + test loop.

However, git told me it had some 50000+ commits to go through in 16 steps,
so it looks like it selected a much larger range than those 36 commits.

/Mikael


* Re: [3.13 regression] kswapd0 and ksoftirqd/0 CPU hogs
  2016-03-06  7:21             ` Mikael Pettersson
@ 2016-03-06  8:54               ` Geert Uytterhoeven
  2016-03-06  9:20                 ` Mikael Pettersson
  0 siblings, 1 reply; 29+ messages in thread
From: Geert Uytterhoeven @ 2016-03-06  8:54 UTC (permalink / raw)
  To: Mikael Pettersson; +Cc: Michael Schmitz, Andreas Schwab, Linux/m68k

Hi Mikael,

On Sun, Mar 6, 2016 at 8:21 AM, Mikael Pettersson <mikpelinux@gmail.com> wrote:
> Geert Uytterhoeven writes:
>  > On Sun, Feb 21, 2016 at 6:06 PM, Mikael Pettersson <mikpelinux@gmail.com> wrote:
>  > > # first bad commit: [ac4de9543aca59f2b763746647577302fbedd57e] Merge branch 'akpm' (patches from Andrew Morton)
>  > >
>  > > That's a big pile of VM changes, so I think it could be the culprit.
>  >
>  > So git bisect pointed to the merge commit itself, not to any of the commits in
>  > the akpm branch?
>  >
>  > I redid that merge myself, and the result is the same as ac4de9543aca5.
>  > There could still be a semantical merge conflict that cannot be detected by
>  > git, though.
>  >
>  > Could you try cherry-picking the 36 commits from the akpm branch and
>  > bisecting that?
>  > I.e.
>  >     git checkout 26935fb06ee88f11
>  >     git cherry-pick 26935fb06ee88f11..de32a8177f64bc62
>  >     git bisect start
>  >     git bisect bad
>  >     git bisect good 26935fb06ee88f11
>
> I ran these exact commands and restarted my bisection + test loop.
>
> However, git told me it had some 50000+ commits to go through in 16 steps,
> so it looks like it selected a much larger range than those 36 commits.

Are you sure you did exactly that?

$ git checkout 26935fb06ee8
[...]
$ git cherry-pick 26935fb06ee88f11..de32a8177f64bc62
[...]
$ git bisect start
$ git bisect bad
$ git bisect good 26935fb06ee88f11
Bisecting: 17 revisions left to test after this (roughly 4 steps)
[8969e7b3b3302ea668d300d0fa593108003b908b] mm: memcg: do not trap
chargers with full callstack on OOM
$

Gr{oetje,eeting}s,

                        Geert



* Re: [3.13 regression] kswapd0 and ksoftirqd/0 CPU hogs
  2016-03-06  8:54               ` Geert Uytterhoeven
@ 2016-03-06  9:20                 ` Mikael Pettersson
  2016-04-13 18:57                   ` Mikael Pettersson
  0 siblings, 1 reply; 29+ messages in thread
From: Mikael Pettersson @ 2016-03-06  9:20 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Mikael Pettersson, Michael Schmitz, Andreas Schwab, Linux/m68k

Geert Uytterhoeven writes:
 > Hi Mikael,
 > 
 > On Sun, Mar 6, 2016 at 8:21 AM, Mikael Pettersson <mikpelinux@gmail.com> wrote:
 > > Geert Uytterhoeven writes:
 > >  > On Sun, Feb 21, 2016 at 6:06 PM, Mikael Pettersson <mikpelinux@gmail.com> wrote:
 > >  > > # first bad commit: [ac4de9543aca59f2b763746647577302fbedd57e] Merge branch 'akpm' (patches from Andrew Morton)
 > >  > >
 > >  > > That's a big pile of VM changes, so I think it could be the culprit.
 > >  >
 > >  > So git bisect pointed to the merge commit itself, not to any of the commits in
 > >  > the akpm branch?
 > >  >
 > >  > I redid that merge myself, and the result is the same as ac4de9543aca5.
 > >  > There could still be a semantical merge conflict that cannot be detected by
 > >  > git, though.
 > >  >
 > >  > Could you try cherry-picking the 36 commits from the akpm branch and
 > >  > bisecting that?
 > >  > I.e.
 > >  >     git checkout 26935fb06ee88f11
 > >  >     git cherry-pick 26935fb06ee88f11..de32a8177f64bc62
 > >  >     git bisect start
 > >  >     git bisect bad
 > >  >     git bisect good 26935fb06ee88f11
 > >
 > > I ran these exact commands and restarted my bisection + test loop.
 > >
 > > However, git told me it had some 50000+ commits to go through in 16 steps,
 > > so it looks like it selected a much larger range than those 36 commits.
 > 
 > Are you sure you did exactly that?
 > 
 > $ git checkout 26935fb06ee8
 > [...]
 > $ git cherry-pick 26935fb06ee88f11..de32a8177f64bc62
 > [...]
 > $ git bisect start
 > $ git bisect bad
 > $ git bisect good 26935fb06ee88f11
 > Bisecting: 17 revisions left to test after this (roughly 4 steps)
 > [8969e7b3b3302ea668d300d0fa593108003b908b] mm: memcg: do not trap
 > chargers with full callstack on OOM
 > $

Yes, I copy-pasted those commands exactly.  However, git got confused
because I didn't 'git bisect reset' first.  Now it has the correct
range to bisect :-)
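
What probably happened can be sketched on a scratch repo: while an earlier session is still open, `git bisect start` first checks out the branch or commit that session started from, so a following `git bisect bad` marks that HEAD rather than the freshly cherry-picked one. This is a reading of git's behaviour, not something confirmed in the thread; the repo, file and commit names below are invented.

```shell
#!/bin/sh
# Scratch-repo illustration of stale bisect state; all names are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Bisect Demo"
git config user.email "demo@example.invalid"
for i in 1 2 3 4 5 6; do
    echo "$i" > file
    git add file
    git commit -q -m "commit $i"
done
tip=$(git rev-parse HEAD)
old=$(git rev-parse HEAD~4)

# An earlier, unfinished bisect session started from the branch tip.
git bisect start >/dev/null
git bisect bad >/dev/null

# Later: check out an older commit and start again WITHOUT a reset.
# The new "bisect start" jumps back to where the old session began.
git checkout -q "$old"
git bisect start >/dev/null 2>&1
after_stale_start=$(git rev-parse HEAD)

# Same steps, but with the reset first: HEAD stays where it was put.
git bisect reset >/dev/null 2>&1
git checkout -q "$old"
git bisect start >/dev/null 2>&1
after_clean_start=$(git rev-parse HEAD)
git bisect reset >/dev/null 2>&1

echo "without reset: HEAD=$after_stale_start"
echo "with reset:    HEAD=$after_clean_start"
```

Here `after_stale_start` ends up at the branch tip, while `after_clean_start` stays at the older commit that was checked out.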

/Mikael


* Re: [3.13 regression] kswapd0 and ksoftirqd/0 CPU hogs
  2016-03-06  9:20                 ` Mikael Pettersson
@ 2016-04-13 18:57                   ` Mikael Pettersson
  0 siblings, 0 replies; 29+ messages in thread
From: Mikael Pettersson @ 2016-04-13 18:57 UTC (permalink / raw)
  To: Mikael Pettersson
  Cc: Geert Uytterhoeven, Michael Schmitz, Andreas Schwab, Linux/m68k

Mikael Pettersson writes:
 > Geert Uytterhoeven writes:
 >  > Hi Mikael,
 >  > 
 >  > On Sun, Mar 6, 2016 at 8:21 AM, Mikael Pettersson <mikpelinux@gmail.com> wrote:
 >  > > Geert Uytterhoeven writes:
 >  > >  > On Sun, Feb 21, 2016 at 6:06 PM, Mikael Pettersson <mikpelinux@gmail.com> wrote:
 >  > >  > > # first bad commit: [ac4de9543aca59f2b763746647577302fbedd57e] Merge branch 'akpm' (patches from Andrew Morton)
 >  > >  > >
 >  > >  > > That's a big pile of VM changes, so I think it could be the culprit.
 >  > >  >
 >  > >  > So git bisect pointed to the merge commit itself, not to any of the commits in
 >  > >  > the akpm branch?
 >  > >  >
 >  > >  > I redid that merge myself, and the result is the same as ac4de9543aca5.
 >  > >  > There could still be a semantical merge conflict that cannot be detected by
 >  > >  > git, though.
 >  > >  >
 >  > >  > Could you try cherry-picking the 36 commits from the akpm branch and
 >  > >  > bisecting that?
 >  > >  > I.e.
 >  > >  >     git checkout 26935fb06ee88f11
 >  > >  >     git cherry-pick 26935fb06ee88f11..de32a8177f64bc62
 >  > >  >     git bisect start
 >  > >  >     git bisect bad
 >  > >  >     git bisect good 26935fb06ee88f11
 >  > >
 >  > > I ran these exact commands and restarted my bisection + test loop.
 >  > >
 >  > > However, git told me it had some 50000+ commits to go through in 16 steps,
 >  > > so it looks like it selected a much larger range than those 36 commits.
 >  > 
 >  > Are you sure you did exactly that?
 >  > 
 >  > $ git checkout 26935fb06ee8
 >  > [...]
 >  > $ git cherry-pick 26935fb06ee88f11..de32a8177f64bc62
 >  > [...]
 >  > $ git bisect start
 >  > $ git bisect bad
 >  > $ git bisect good 26935fb06ee88f11
 >  > Bisecting: 17 revisions left to test after this (roughly 4 steps)
 >  > [8969e7b3b3302ea668d300d0fa593108003b908b] mm: memcg: do not trap
 >  > chargers with full callstack on OOM
 >  > $
 > 
 > Yes, I copy-pasted those commands exactly.  However, git got confused
 > because I didn't 'git bisect reset' first.  Now it has the correct
 > range to bisect :-)

First bisection run completed:

5ee28828d1b5f8d036dc5793be5321dd6f26d344 is the first bad commit
commit 5ee28828d1b5f8d036dc5793be5321dd6f26d344
Author: Michal Hocko <mhocko@suse.cz>
Date:   Thu Sep 12 15:13:26 2013 -0700

    memcg: enhance memcg iterator to support predicates

    The caller of the iterator might know that some nodes or even subtrees
    should be skipped but there is no way to tell iterators about that so the
    only choice left is to let iterators to visit each node and do the
    selection outside of the iterating code.  This, however, doesn't scale
    well with hierarchies with many groups where only few groups are
    interesting.

    This patch adds mem_cgroup_iter_cond variant of the iterator with a
    callback which gets called for every visited node.  There are three
    possible ways how the callback can influence the walk.  Either the node is
    visited, it is skipped but the tree walk continues down the tree or the
    whole subtree of the current group is skipped.

    [hughd@google.com: fix memcg-less page reclaim]
    Signed-off-by: Michal Hocko <mhocko@suse.cz>
    Cc: Balbir Singh <bsingharora@gmail.com>
    Cc: Glauber Costa <glommer@openvz.org>
    Cc: Greg Thelen <gthelen@google.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
    Cc: Michel Lespinasse <walken@google.com>
    Cc: Tejun Heo <tj@kernel.org>
    Cc: Ying Han <yinghan@google.com>
    Signed-off-by: Hugh Dickins <hughd@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

:040000 040000 46799adf458e8a2db998ef099dae907af03e1595 29638de8593a98add32f56fb9aea16cd8227f6a4 M      include
:040000 040000 27e4202d2f2a785e35454ef93e75919997219130 8c5973443d35e65eb88a0155b2c78bf90c650a21 M      mm

I'll reset and re-bisect with the known bad points pre-seeded.
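
One way to carry verdicts from an earlier run into a fresh bisection is `git bisect log` together with `git bisect replay`. A minimal sketch on a scratch repo, assuming that is roughly what "pre-seeded" means here; the repo, the commit subjects and the saved log file are hypothetical.

```shell
#!/bin/sh
# Scratch-repo sketch: save bisect verdicts, reset, and replay them.
# Repo contents and the seed file are made up for illustration.
set -e
repo=$(mktemp -d)
seed=$(mktemp)
cd "$repo"
git init -q .
git config user.name "Bisect Demo"
git config user.email "demo@example.invalid"
i=1
while [ "$i" -le 16 ]; do
    echo "$i" > file
    git add file
    git commit -q -m "commit $i"
    i=$((i + 1))
done
first=$(git rev-list --max-parents=0 HEAD)

# First session: both endpoints plus one tested midpoint.
git bisect start >/dev/null
git bisect bad HEAD >/dev/null
git bisect good "$first" >/dev/null
git bisect bad >/dev/null          # pretend the first midpoint tested bad
before=$(git rev-parse HEAD)

# Save the verdicts so far, then throw the session away.
git bisect log > "$seed"
git bisect reset >/dev/null 2>&1

# New session seeded from the saved verdicts; git resumes at the same commit.
git bisect replay "$seed" >/dev/null
after=$(git rev-parse HEAD)
git bisect reset >/dev/null 2>&1
```

The saved log is plain text, so it should also be possible to append further `git bisect bad <commit>` lines for commits already known to be bad before replaying it.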

/Mikael


* Re: [3.13 regression] kswapd0 and ksoftirqd/0 CPU hogs
  2016-02-21 17:06         ` Mikael Pettersson
  2016-02-21 19:31           ` Michael Schmitz
  2016-02-22 10:01           ` Geert Uytterhoeven
@ 2016-05-31  4:52           ` Finn Thain
  2016-05-31 10:06             ` Mikael Pettersson
  2 siblings, 1 reply; 29+ messages in thread
From: Finn Thain @ 2016-05-31  4:52 UTC (permalink / raw)
  To: Mikael Pettersson; +Cc: Michael Schmitz, Andreas Schwab, Linux/m68k


On Sun, 21 Feb 2016, Mikael Pettersson wrote:

> I've done two git bisects on this.  The first one was inconclusive 
> (pointed to a harmless commit), but the second one ended up with:
> 
> # first bad commit: [ac4de9543aca59f2b763746647577302fbedd57e] Merge branch 'akpm' (patches from Andrew Morton)
> 
> That's a big pile of VM changes, so I think it could be the culprit.

I think this issue may date back to v2.6.38 or earlier.

The redhat.com bug report was closed in 2012 but Fedora users were still 
seeing the problem after it was supposedly fixed.
  https://bugzilla.redhat.com/show_bug.cgi?id=712019

That page also has a link to the bug report for Ubuntu:
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/484045

BTW, I came across this recently: "Rik van Riel pointed out that [the 
kswapd thread] tends to be slow for [the purpose of compaction], and it 
can get stuck in a shrinker somewhere waiting for a lock."
  http://lwn.net/Articles/684611/

Perhaps a stack trace would help to ascertain whether this is the same 
known bug or not (?)
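
A minimal sketch of how such a trace could be captured, assuming root access, a `pgrep` binary, and a kernel that exposes `/proc/<pid>/stack` (CONFIG_STACKTRACE). Whether that file is available on a given m68k build is not confirmed anywhere in this thread, and the helper name `kthread_stack` is made up.

```shell
# Print the in-kernel stack of a thread selected by exact name.
# Assumptions: run as root, pgrep is installed, /proc/<pid>/stack exists.
kthread_stack() {
    name=$1
    pid=$(pgrep -x "$name" 2>/dev/null | head -n 1)
    if [ -z "$pid" ]; then
        echo "no process named $name" >&2
        return 1
    fi
    if [ ! -r "/proc/$pid/stack" ]; then
        echo "/proc/$pid/stack is not readable (not root, or no CONFIG_STACKTRACE?)" >&2
        return 2
    fi
    cat "/proc/$pid/stack"
}
```

Calling `kthread_stack kswapd0` a few times while the thread is busy would show whether it keeps sitting in the same place. Where the proc file is missing, `echo t > /proc/sysrq-trigger` (with magic SysRq enabled) dumps all task stacks to the kernel log instead.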

-- 


* Re: [3.13 regression] kswapd0 and ksoftirqd/0 CPU hogs
  2016-05-31  4:52           ` Finn Thain
@ 2016-05-31 10:06             ` Mikael Pettersson
  2016-05-31 10:21               ` Geert Uytterhoeven
  0 siblings, 1 reply; 29+ messages in thread
From: Mikael Pettersson @ 2016-05-31 10:06 UTC (permalink / raw)
  To: Finn Thain; +Cc: Mikael Pettersson, Michael Schmitz, Andreas Schwab, Linux/m68k

Finn Thain writes:
 > 
 > On Sun, 21 Feb 2016, Mikael Pettersson wrote:
 > 
 > > I've done two git bisects on this.  The first one was inconclusive 
 > > (pointed to a harmless commit), but the second one ended up with:
 > > 
 > > # first bad commit: [ac4de9543aca59f2b763746647577302fbedd57e] Merge branch 'akpm' (patches from Andrew Morton)
 > > 
 > > That's a big pile of VM changes, so I think it could be the culprit.
 > 
 > I think this issue may date back to v2.6.38 or earlier.
 > 
 > The redhat.com bug report was closed in 2012 but Fedora users were still 
 > seeing the problem after it was supposedly fixed.
 >   https://bugzilla.redhat.com/show_bug.cgi?id=712019
 > 
 > That page also has a link to the bug report for Ubuntu:
 >   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/484045
 > 
 > BTW, I came across this recently: "Rik van Riel pointed out that [the 
 > kswapd thread] tends to be slow for [the purpose of compaction], and it 
 > can get stuck in a shrinker somewhere waiting for a lock."
 >   http://lwn.net/Articles/684611/
 > 
 > Perhaps a stack trace would help to ascertain whether this is the same 
 > known bug or not (?)
 > 
 > -- 

FWIW, my latest round(s) of bisects identified the following:

fdbadebec27cc92358ed4f593e8763cf10b82687 is the first bad commit
commit fdbadebec27cc92358ed4f593e8763cf10b82687
Author: Li Zefan <lizefan@huawei.com>
Date:   Thu Sep 12 15:13:19 2013 -0700

    memcg: remove redundant code in mem_cgroup_force_empty_write()

    vfs guarantees the cgroup won't be destroyed, so it's redundant to get a
    css reference.

    Signed-off-by: Li Zefan <lizefan@huawei.com>
    Acked-by: Michal Hocko <mhocko@suse.cz>
    Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Tejun Heo <tj@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

:040000 040000 1f6b5b056995067c7c60e6f87e9cd1f181e8fbeb ea29d63e70ce2320e144fac7b157a146d41360bf M      mm

This appears to be the first commit in the merge (git bisect refuses to
bisect before it), so either it is the culprit or the problem predates the merge.


* Re: [3.13 regression] kswapd0 and ksoftirqd/0 CPU hogs
  2016-05-31 10:06             ` Mikael Pettersson
@ 2016-05-31 10:21               ` Geert Uytterhoeven
  2016-05-31 10:39                 ` John Paul Adrian Glaubitz
  2016-06-01  6:36                 ` Mikael Pettersson
  0 siblings, 2 replies; 29+ messages in thread
From: Geert Uytterhoeven @ 2016-05-31 10:21 UTC (permalink / raw)
  To: Mikael Pettersson; +Cc: Finn Thain, Michael Schmitz, Andreas Schwab, Linux/m68k

Hi Mikael,

On Tue, May 31, 2016 at 12:06 PM, Mikael Pettersson
<mikpelinux@gmail.com> wrote:
> Finn Thain writes:
>  > On Sun, 21 Feb 2016, Mikael Pettersson wrote:
>  > > I've done two git bisects on this.  The first one was inconclusive
>  > > (pointed to a harmless commit), but the second one ended up with:
>  > >
>  > > # first bad commit: [ac4de9543aca59f2b763746647577302fbedd57e] Merge branch 'akpm' (patches from Andrew Morton)
>  > >
>  > > That's a big pile of VM changes, so I think it could be the culprit.
>  >
>  > I think this issue may date back to v2.6.38 or earlier.
>  >
>  > The redhat.com bug report was closed in 2012 but Fedora users were still
>  > seeing the problem after it was supposedly fixed.
>  >   https://bugzilla.redhat.com/show_bug.cgi?id=712019
>  >
>  > That page also has a link to the bug report for Ubuntu:
>  >   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/484045
>  >
>  > BTW, I came across this recently: "Rik van Riel pointed out that [the
>  > kswapd thread] tends to be slow for [the purpose of compaction], and it
>  > can get stuck in a shrinker somewhere waiting for a lock."
>  >   http://lwn.net/Articles/684611/
>  >
>  > Perhaps a stack trace would help to ascertain whether this is the same
>  > known bug or not (?)
>  >
>  > --
>
> FWIW, my latest round(s) of bisects identified the following:
>
> fdbadebec27cc92358ed4f593e8763cf10b82687 is the first bad commit
> commit fdbadebec27cc92358ed4f593e8763cf10b82687
> Author: Li Zefan <lizefan@huawei.com>
> Date:   Thu Sep 12 15:13:19 2013 -0700
>
>     memcg: remove redundant code in mem_cgroup_force_empty_write()
>
>     vfs guarantees the cgroup won't be destroyed, so it's redundant to get a
>     css reference.
>
>     Signed-off-by: Li Zefan <lizefan@huawei.com>
>     Acked-by: Michal Hocko <mhocko@suse.cz>
>     Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>     Cc: Johannes Weiner <hannes@cmpxchg.org>
>     Cc: Tejun Heo <tj@kernel.org>
>     Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>     Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
>
> :040000 040000 1f6b5b056995067c7c60e6f87e9cd1f181e8fbeb ea29d63e70ce2320e144fac7b157a146d41360bf M      mm
>
> This appears to be the first commit in the merge (git bisect refuses to
> bisect before it), so either it is the culprit or the problem predates the merge.

That's upstream commit c33bd8354f3a3bb26a98d2b6bf600b7b35657328?

Well done! Looks indeed like a suspect for the behavior you're seeing.
I suppose you will follow up with the mm people?

Thx!

Gr{oetje,eeting}s,

                        Geert



* Re: [3.13 regression] kswapd0 and ksoftirqd/0 CPU hogs
  2016-05-31 10:21               ` Geert Uytterhoeven
@ 2016-05-31 10:39                 ` John Paul Adrian Glaubitz
  2016-05-31 10:41                   ` Geert Uytterhoeven
  2016-06-01  6:36                 ` Mikael Pettersson
  1 sibling, 1 reply; 29+ messages in thread
From: John Paul Adrian Glaubitz @ 2016-05-31 10:39 UTC (permalink / raw)
  To: Geert Uytterhoeven, Mikael Pettersson
  Cc: Finn Thain, Michael Schmitz, Andreas Schwab, Linux/m68k

On 05/31/2016 12:21 PM, Geert Uytterhoeven wrote:
> That's upstream commit c33bd8354f3a3bb26a98d2b6bf600b7b35657328?
> 
> Well done! Looks indeed like a suspect for the behavior you're seeing.
> I suppose you will follow up with the mm people?

Yay, great news. I'm currently building a native GHC package for Debian/m68k
on Aranym and [kswapd] is taking around half of the CPU load. The suggestion
with drop_caches doesn't help all the time, unfortunately, so I'm really
looking forward to getting this fixed :).

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913


* Re: [3.13 regression] kswapd0 and ksoftirqd/0 CPU hogs
  2016-05-31 10:39                 ` John Paul Adrian Glaubitz
@ 2016-05-31 10:41                   ` Geert Uytterhoeven
  0 siblings, 0 replies; 29+ messages in thread
From: Geert Uytterhoeven @ 2016-05-31 10:41 UTC (permalink / raw)
  To: John Paul Adrian Glaubitz
  Cc: Mikael Pettersson, Finn Thain, Michael Schmitz, Andreas Schwab,
	Linux/m68k

Hi Adrian,

On Tue, May 31, 2016 at 12:39 PM, John Paul Adrian Glaubitz
<glaubitz@physik.fu-berlin.de> wrote:
> On 05/31/2016 12:21 PM, Geert Uytterhoeven wrote:
>> That's upstream commit c33bd8354f3a3bb26a98d2b6bf600b7b35657328?
>>
>> Well done! Looks indeed like a suspect for the behavior you're seeing.
>> I suppose you will follow up with the mm people?
>
> Yay, great news. I'm currently building a native GHC package for Debian/m68k
> on Aranym and [kswapd] is taking around half of the CPU load. The suggestion
> with drop_caches doesn't help all the time, unfortunately, so I'm really
> looking forward to getting this fixed :).

I expect the bad commit can just be reverted, as the surrounding code hasn't
changed?

Gr{oetje,eeting}s,

                        Geert



* Re: [3.13 regression] kswapd0 and ksoftirqd/0 CPU hogs
  2016-05-31 10:21               ` Geert Uytterhoeven
  2016-05-31 10:39                 ` John Paul Adrian Glaubitz
@ 2016-06-01  6:36                 ` Mikael Pettersson
  1 sibling, 0 replies; 29+ messages in thread
From: Mikael Pettersson @ 2016-06-01  6:36 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Mikael Pettersson, Finn Thain, Michael Schmitz, Andreas Schwab,
	Linux/m68k

Geert Uytterhoeven writes:
 > Hi Mikael,
 > 
 > On Tue, May 31, 2016 at 12:06 PM, Mikael Pettersson
 > <mikpelinux@gmail.com> wrote:
 > > Finn Thain writes:
 > >  > On Sun, 21 Feb 2016, Mikael Pettersson wrote:
 > >  > > I've done two git bisects on this.  The first one was inconclusive
 > >  > > (pointed to a harmless commit), but the second one ended up with:
 > >  > >
 > >  > > # first bad commit: [ac4de9543aca59f2b763746647577302fbedd57e] Merge branch 'akpm' (patches from Andrew Morton)
 > >  > >
 > >  > > That's a big pile of VM changes, so I think it could be the culprit.
 > >  >
 > >  > I think this issue may date back to v2.6.38 or earlier.
 > >  >
 > >  > The redhat.com bug report was closed in 2012 but Fedora users were still
 > >  > seeing the problem after it was supposedly fixed.
 > >  >   https://bugzilla.redhat.com/show_bug.cgi?id=712019
 > >  >
 > >  > That page also has a link to the bug report for Ubuntu:
 > >  >   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/484045
 > >  >
 > >  > BTW, I came across this recently: "Rik van Riel pointed out that [the
 > >  > kswapd thread] tends to be slow for [the purpose of compaction], and it
 > >  > can get stuck in a shrinker somewhere waiting for a lock."
 > >  >   http://lwn.net/Articles/684611/
 > >  >
 > >  > Perhaps a stack trace would help to ascertain whether this is the same
 > >  > known bug or not (?)
 > >  >
 > >  > --
 > >
 > > FWIW, my latest round(s) of bisects identified the following:
 > >
 > > fdbadebec27cc92358ed4f593e8763cf10b82687 is the first bad commit
 > > commit fdbadebec27cc92358ed4f593e8763cf10b82687
 > > Author: Li Zefan <lizefan@huawei.com>
 > > Date:   Thu Sep 12 15:13:19 2013 -0700
 > >
 > >     memcg: remove redundant code in mem_cgroup_force_empty_write()
 > >
 > >     vfs guarantees the cgroup won't be destroyed, so it's redundant to get a
 > >     css reference.
 > >
 > >     Signed-off-by: Li Zefan <lizefan@huawei.com>
 > >     Acked-by: Michal Hocko <mhocko@suse.cz>
 > >     Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
 > >     Cc: Johannes Weiner <hannes@cmpxchg.org>
 > >     Cc: Tejun Heo <tj@kernel.org>
 > >     Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
 > >     Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
 > >
 > > :040000 040000 1f6b5b056995067c7c60e6f87e9cd1f181e8fbeb ea29d63e70ce2320e144fac7b157a146d41360bf M      mm
 > >
 > > This appears to be the first commit in the merge (git bisect refuses to
 > > bisect before it), so either it is the culprit or the problem predates the merge.
 > 
 > That's upstream commit c33bd8354f3a3bb26a98d2b6bf600b7b35657328?
 > 
 > Well done! Looks indeed like a suspect for the behavior you're seeing.
 > I suppose you will follow up with the mm people?

Alas, this was a false find.  This commit is definitely in the bad range,
but reverting it from 3.12.60 doesn't eliminate the kswapd bug.

I'll have to bisect mainline again.



Thread overview: 29+ messages
2014-06-06 11:38 [3.13 regression] kswapd0 and ksoftirqd/0 CPU hogs Mikael Pettersson
2014-06-06 11:43 ` Geert Uytterhoeven
2014-06-06 13:11   ` Mikael Pettersson
2014-06-07 13:22     ` Mikael Pettersson
2014-06-11  8:20       ` Mikael Pettersson
2014-06-11  8:45 ` Andreas Schwab
2014-07-01 11:43   ` Mikael Pettersson
2015-03-31  1:16     ` Michael Schmitz
2015-03-31 13:19       ` Mikael Pettersson
2015-04-01  3:08         ` Michael Schmitz
2015-04-01  4:45           ` Finn Thain
2015-04-01  5:21             ` Michael Schmitz
2015-04-06 21:25             ` Michael Schmitz
2015-04-07  0:06               ` Finn Thain
2015-04-07  5:38                 ` Michael Schmitz
2016-02-21 17:06         ` Mikael Pettersson
2016-02-21 19:31           ` Michael Schmitz
2016-02-22 10:01           ` Geert Uytterhoeven
2016-03-06  7:21             ` Mikael Pettersson
2016-03-06  8:54               ` Geert Uytterhoeven
2016-03-06  9:20                 ` Mikael Pettersson
2016-04-13 18:57                   ` Mikael Pettersson
2016-05-31  4:52           ` Finn Thain
2016-05-31 10:06             ` Mikael Pettersson
2016-05-31 10:21               ` Geert Uytterhoeven
2016-05-31 10:39                 ` John Paul Adrian Glaubitz
2016-05-31 10:41                   ` Geert Uytterhoeven
2016-06-01  6:36                 ` Mikael Pettersson
2015-04-01 16:11       ` Andreas Schwab
