linux-kernel.vger.kernel.org archive mirror
* mm: LTP/memcg testcase regression induced by 8cd7c588decf..66ce520bb7c2 series
@ 2021-11-21 10:57 Mike Galbraith
  2021-11-21 13:51 ` Thorsten Leemhuis
  2021-11-23  9:13 ` Mel Gorman
  0 siblings, 2 replies; 11+ messages in thread
From: Mike Galbraith @ 2021-11-21 10:57 UTC (permalink / raw)
  To: lkml; +Cc: Mel Gorman

Greetings,

FYI, something in this series causes LTP controllers::memcg_regression
testcase to hang forever.  Verified via brute force revert of the lot.

After letting box moan for 4.5 hours, I poked ^C repeatedly, but runltp
didn't exit/recover gracefully, and ps hung, so I nuked the box.  All
memcg_test_1 instances were stuck in reclaim_throttle().
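
For anyone who hasn't stared at the new throttling code yet: as I read
the 5.16-rc mm/vmscan.c, the thing the stacks below are sleeping in
boils down to roughly this (a simplified sketch from my reading of the
source, not verbatim kernel code; the NOPROGRESS case turns out to be
the one that matters later in this thread):

void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason)
{
        wait_queue_head_t *wqh = &pgdat->reclaim_wait[reason];
        long timeout;
        DEFINE_WAIT(wait);

        switch (reason) {
        case VMSCAN_THROTTLE_NOPROGRESS:
                timeout = HZ/2;         /* up to half a second per pass */
                break;
        case VMSCAN_THROTTLE_ISOLATED:
                timeout = HZ/50;
                break;
        default:                        /* writeback etc, elided here */
                timeout = HZ/10;
                break;
        }

        /* TASK_UNINTERRUPTIBLE is why everything below shows state UN */
        prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE);
        schedule_timeout(timeout);
        finish_wait(wqh, &wait);
}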

crash> ps | grep UN
  31458      1   5  ffff88e9c8ded100  UN   0.0    7620   1156  memcg_test_1
  31459      1   6  ffff88e9c8de8000  UN   0.0    7620   1156  memcg_test_1
  31460      1   4  ffff88e9cbdd0000  UN   0.0    7620   1156  memcg_test_1
  31461      1   2  ffff88ea41502880  UN   0.0    7620   1156  memcg_test_1
  31462      1   4  ffff88ea12750000  UN   0.0    7620   1156  memcg_test_1
  31463      1   7  ffff88ea12752880  UN   0.0    7620   1156  memcg_test_1
  31464      1   3  ffff88ea01eb8000  UN   0.0    7620   1156  memcg_test_1
  31465      1   6  ffff88e9c2bca880  UN   0.0    7620   1156  memcg_test_1
  31466      1   7  ffff88e9c2bc8000  UN   0.0    7620   1156  memcg_test_1
  31467      1   2  ffff88ea41e82880  UN   0.0    7620   1156  memcg_test_1
  31476   2834   2  ffff88ea1174d100  UN   0.0   38868   3744  ps
crash> bt -sx ffff88e9c8ded100
PID: 31458  TASK: ffff88e9c8ded100  CPU: 5   COMMAND: "memcg_test_1"
 #0 [ffffb36648bc79a0] __schedule+0x2c3 at ffffffffb72636e3
 #1 [ffffb36648bc7a60] schedule+0x3a at ffffffffb72644ba
 #2 [ffffb36648bc7a70] schedule_timeout+0x1f3 at ffffffffb7268d03
 #3 [ffffb36648bc7ae8] reclaim_throttle+0xd0 at ffffffffb6a69ab0
 #4 [ffffb36648bc7b50] do_try_to_free_pages+0x160 at ffffffffb6a6bab0
 #5 [ffffb36648bc7ba8] try_to_free_mem_cgroup_pages+0xf4 at ffffffffb6a6d264
 #6 [ffffb36648bc7c40] try_charge_memcg+0x19f at ffffffffb6b03b6f
 #7 [ffffb36648bc7ce0] obj_cgroup_charge_pages+0x27 at ffffffffb6b043a7
 #8 [ffffb36648bc7d08] obj_cgroup_charge+0x6f at ffffffffb6b0605f
 #9 [ffffb36648bc7d28] kmem_cache_alloc+0x86 at ffffffffb6ae7686
#10 [ffffb36648bc7d80] vm_area_alloc+0x1a at ffffffffb689aa5a
#11 [ffffb36648bc7d90] mmap_region+0x32f at ffffffffb6a9ee1f
#12 [ffffb36648bc7df8] do_mmap+0x392 at ffffffffb6a9f492
#13 [ffffb36648bc7e50] vm_mmap_pgoff+0xd5 at ffffffffb6a75125
#14 [ffffb36648bc7ed8] do_syscall_64+0x58 at ffffffffb7256138
#15 [ffffb36648bc7f28] exc_page_fault+0x67 at ffffffffb7259c97
#16 [ffffb36648bc7f50] entry_SYSCALL_64_after_hwframe+0x44 at ffffffffb740007c
    RIP: 00007f7b95ad9743  RSP: 00007ffe12089b08  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 0000000000000000  RCX: 00007f7b95ad9743
    RDX: 0000000000000003  RSI: 0000000000001000  RDI: 0000000000000000
    RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000002022  R11: 0000000000000246  R12: 0000000000000003
    R13: 0000000000001000  R14: 0000000000002022  R15: 0000000000000000
    ORIG_RAX: 0000000000000009  CS: 0033  SS: 002b
crash> bt -sx ffff88ea1174d100
PID: 31476  TASK: ffff88ea1174d100  CPU: 2   COMMAND: "ps"
 #0 [ffffb3664883fba8] __schedule+0x2c3 at ffffffffb72636e3
 #1 [ffffb3664883fc68] schedule+0x3a at ffffffffb72644ba
 #2 [ffffb3664883fc78] rwsem_down_read_slowpath+0x1bf at ffffffffb7266c8f
 #3 [ffffb3664883fd08] down_read_killable+0x5c at ffffffffb7266f3c
 #4 [ffffb3664883fd18] down_read_killable+0x5c at ffffffffb7266f3c
 #5 [ffffb3664883fd28] __access_remote_vm+0x4a at ffffffffb6a9795a
 #6 [ffffb3664883fd98] proc_pid_cmdline_read+0x16c at ffffffffb6bc3acc
 #7 [ffffb3664883fe00] vfs_read+0x9a at ffffffffb6b1d49a
 #8 [ffffb3664883fe30] ksys_read+0xa1 at ffffffffb6b1d891
 #9 [ffffb3664883fe70] do_syscall_64+0x58 at ffffffffb7256138
#10 [ffffb3664883fec0] do_sys_openat2+0x1cd at ffffffffb6b1913d
#11 [ffffb3664883ff00] do_sys_open+0x57 at ffffffffb6b1a7e7
#12 [ffffb3664883ff28] do_syscall_64+0x67 at ffffffffb7256147
#13 [ffffb3664883ff50] entry_SYSCALL_64_after_hwframe+0x44 at ffffffffb740007c
    RIP: 00007f192fb5cb5e  RSP: 00007ffc611e0ee8  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 00007f19303de010  RCX: 00007f192fb5cb5e
    RDX: 0000000000020000  RSI: 00007f19303de010  RDI: 0000000000000006
    RBP: 0000000000020000   R8: 0000000000000007   R9: 00000000ffffffff
    R10: 0000000000000000  R11: 0000000000000246  R12: 00007f19303de010
    R13: 0000000000000000  R14: 0000000000000006  R15: 0000000000000000
    ORIG_RAX: 0000000000000000  CS: 0033  SS: 002b



* Re: mm: LTP/memcg testcase regression induced by 8cd7c588decf..66ce520bb7c2 series
  2021-11-21 10:57 mm: LTP/memcg testcase regression induced by 8cd7c588decf..66ce520bb7c2 series Mike Galbraith
@ 2021-11-21 13:51 ` Thorsten Leemhuis
  2021-11-22  6:46   ` Mike Galbraith
  2021-11-23  9:13 ` Mel Gorman
  1 sibling, 1 reply; 11+ messages in thread
From: Thorsten Leemhuis @ 2021-11-21 13:51 UTC (permalink / raw)
  To: Mike Galbraith, lkml, regressions; +Cc: Mel Gorman


Hi, this is your Linux kernel regression tracker speaking.

CCing the regression mailing list, which should be in the loop for all
regressions, as explained here:
https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html

On 21.11.21 11:57, Mike Galbraith wrote:
> Greetings,
> 
> FYI, something in this series causes LTP controllers::memcg_regression
> testcase to hang forever.  Verified via brute force revert of the lot.
> 
> After letting box moan for 4.5 hours, I poked ^C repeatedly, but runltp
> didn't exit/recover gracefully, and ps hung, so I nuked the box.  All
> memcg_test_1 instances were stuck in reclaim_throttle().
> [...]

TWIMC: To be sure this issue doesn't fall through the cracks unnoticed,
I'm adding it to regzbot, my Linux kernel regression tracking bot:

#regzbot ^introduced 8cd7c588decf..66ce520bb7c2
#regzbot ignore-activity

Ciao, Thorsten, your Linux kernel regression tracker.

P.S.: If you want to know more about regzbot, check out its
web-interface, the getting started guide, and/or the reference documentation:

https://linux-regtracking.leemhuis.info/regzbot/
https://gitlab.com/knurd42/regzbot/-/blob/main/docs/getting_started.md
https://gitlab.com/knurd42/regzbot/-/blob/main/docs/reference.md

The last two documents will explain how you can interact with regzbot
yourself if you want to.

Hint for the reporter: when reporting a regression, it's in your
interest to tell #regzbot about it in the report, as that will ensure
the regression gets on the radar of regzbot and the regression tracker,
who will make sure the report won't fall through the cracks unnoticed.

Hint for developers: you normally don't need to care about regzbot;
just fix the issue as you normally would. Remember to include a 'Link:'
tag to the report in the commit message, as explained in
Documentation/process/submitting-patches.rst
That aspect was recently made more explicit in commit 1f57bd42b77c:
https://git.kernel.org/linus/1f57bd42b77c


P.P.S.: As a Linux kernel regression tracker I get a lot of reports on
my desk and can only look briefly into most of them, so unfortunately I
will sometimes get things wrong or miss something important. I hope
that's not the case here; if you think it is, don't hesitate to tell me
about it in a public reply. That's in everyone's interest: if what I
wrote above is misleading, it might send someone reading this down the
wrong rabbit hole, which none of us wants.

BTW, I have no personal interest in this issue, which is tracked using
regzbot, my Linux kernel regression tracking bot
(https://linux-regtracking.leemhuis.info/regzbot/). I'm only posting
this mail to get things rolling again and hence don't need to be CCed
on all further activity wrt this regression.


* Re: mm: LTP/memcg testcase regression induced by 8cd7c588decf..66ce520bb7c2 series
  2021-11-21 13:51 ` Thorsten Leemhuis
@ 2021-11-22  6:46   ` Mike Galbraith
  2021-11-22  7:41     ` Mike Galbraith
  0 siblings, 1 reply; 11+ messages in thread
From: Mike Galbraith @ 2021-11-22  6:46 UTC (permalink / raw)
  To: Thorsten Leemhuis, lkml, regressions; +Cc: Mel Gorman

On Sun, 2021-11-21 at 14:51 +0100, Thorsten Leemhuis wrote:
>
> Hi, this is your Linux kernel regression tracker speaking.

This isn't necessarily a kernel regression, though the fact that
bystander tasks like ps can get snagged for ages as the testcase runs
at an amazingly glacial pace doesn't look particularly wonderful.  (The
ps backtrace in my first mail shows why: reading /proc/<pid>/cmdline
takes the target's mmap_lock, which memcg_test_1 is presumably holding
across its throttled charge attempt in mmap_region().)

homer:/root # time ps lax
...
real    11m33.258s
user    0m0.004s
sys     0m0.008s
homer:/root #

The other memcg tests complete fine, though they take more time to do
so.  Running all tests except memcg_regression (and memcg_stress,
because it's a 30-minute swap storm from hell as delivered), runtime
increased from 0m37s to 4m43s.

The problematic testcase is root intentionally shooting itself in the
foot, then checking for battle damage that occurred in days of yore, so
the fact that it now hurts in a different way isn't necessarily grounds
to gripe at the kernel.. even with the rather ugly side effects.. maybe
<shrug>.

#---------------------------------------------------------------------------
# Bug:    The bug was, while forking mass processes, trigger memcgroup OOM,
#         then NULL pointer dereference may be hit.
# Kernel: 2.6.25-rcX
# Links:  http://lkml.org/lkml/2008/4/14/38
# Fix:    commit e115f2d89253490fb2dbf304b627f8d908df26f1
#---------------------------------------------------------------------------
test_1()
{
        mkdir memcg/0/
        echo 0 > memcg/0/memory.limit_in_bytes

        ./memcg_test_1

        rmdir memcg/0/

        check_kernel_bug
        if [ $? -eq 1 ]; then
                tst_resm TPASS "no kernel bug was found"
        fi
}
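
With the limit set to 0 like that, every charge the test's forked
processes make is over the limit, so each allocation ends up doing
roughly the following -- a grossly simplified sketch pieced together
from the backtraces in my first mail, not verbatim mm/memcontrol.c:

static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
                            unsigned int nr_pages)
{
        int nr_retries = MAX_RECLAIM_RETRIES;
        struct page_counter *counter;

retry:
        if (page_counter_try_charge(&memcg->memory, nr_pages, &counter))
                return 0;       /* under the limit: charged, done */

        /*
         * Over the limit: reclaim from this memcg -- which with a zero
         * limit can never make enough progress -- and with this series
         * every no-progress pass now also eats a reclaim_throttle()
         * sleep before we land back here.
         */
        if (try_to_free_mem_cgroup_pages(memcg, nr_pages, gfp_mask, true))
                goto retry;

        if (nr_retries--)
                goto retry;

        /* only after all that do we get to the memcg OOM kill */
        mem_cgroup_oom(memcg, gfp_mask, get_order(nr_pages * PAGE_SIZE));
        return -ENOMEM;
}

Pre-series those no-progress passes spun through quickly to the OOM
kill; now each one sleeps first, hence the glacial pace.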


I'm gonna just comment the little bugger out as obsolete and walk away.

	-Mike


* Re: mm: LTP/memcg testcase regression induced by 8cd7c588decf..66ce520bb7c2 series
  2021-11-22  6:46   ` Mike Galbraith
@ 2021-11-22  7:41     ` Mike Galbraith
  0 siblings, 0 replies; 11+ messages in thread
From: Mike Galbraith @ 2021-11-22  7:41 UTC (permalink / raw)
  To: Thorsten Leemhuis, lkml, regressions; +Cc: Mel Gorman

On Mon, 2021-11-22 at 07:46 +0100, Mike Galbraith wrote:
> 
> #---------------------------------------------------------------------------
> # Bug:    The bug was, while forking mass processes, trigger memcgroup OOM,
> #         then NULL pointer dereference may be hit.
> # Kernel: 2.6.25-rcX
> # Links:  http://lkml.org/lkml/2008/4/14/38
> # Fix:    commit e115f2d89253490fb2dbf304b627f8d908df26f1
> #---------------------------------------------------------------------------
> test_1()
> {       
>         mkdir memcg/0/
>         echo 0 > memcg/0/memory.limit_in_bytes
>         
>         ./memcg_test_1
>         
>         rmdir memcg/0/
>         
>         check_kernel_bug 
>         if [ $? -eq 1 ]; then
>                 tst_resm TPASS "no kernel bug was found"
>         fi
> }
> 
> 
> I'm gonna just comment the little bugger out as obsolete and walk
> away.

test_2() and test_3() then pass, but test_4() hangs, and it looks like
it should not, so...

	-Mike


* Re: mm: LTP/memcg testcase regression induced by 8cd7c588decf..66ce520bb7c2 series
  2021-11-21 10:57 mm: LTP/memcg testcase regression induced by 8cd7c588decf..66ce520bb7c2 series Mike Galbraith
  2021-11-21 13:51 ` Thorsten Leemhuis
@ 2021-11-23  9:13 ` Mel Gorman
  2021-11-23 11:06   ` Mike Galbraith
  2021-11-23 11:18   ` Mike Galbraith
  1 sibling, 2 replies; 11+ messages in thread
From: Mel Gorman @ 2021-11-23  9:13 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: lkml

On Sun, Nov 21, 2021 at 11:57:20AM +0100, Mike Galbraith wrote:
> Greetings,
> 
> FYI, something in this series causes LTP controllers::memcg_regression
> testcase to hang forever.  Verified via brute force revert of the lot.
> 
> After letting box moan for 4.5 hours, I poked ^C repeatedly, but runltp
> didn't exit/recover gracefully, and ps hung, so I nuked the box.  All
> memcg_test_1 instances were stuck in reclaim_throttle().
> 

I'll see can I reproduce this, but do you know offhand what the test is
doing and what the expected outcome is? A possibility is that this is a
test that is driving the machine near OOM (or at least memcg OOM) and
getting throttled instead of getting killed.

-- 
Mel Gorman
SUSE Labs


* Re: mm: LTP/memcg testcase regression induced by 8cd7c588decf..66ce520bb7c2 series
  2021-11-23  9:13 ` Mel Gorman
@ 2021-11-23 11:06   ` Mike Galbraith
  2021-11-23 11:18   ` Mike Galbraith
  1 sibling, 0 replies; 11+ messages in thread
From: Mike Galbraith @ 2021-11-23 11:06 UTC (permalink / raw)
  To: Mel Gorman; +Cc: lkml

On Tue, 2021-11-23 at 09:13 +0000, Mel Gorman wrote:
> On Sun, Nov 21, 2021 at 11:57:20AM +0100, Mike Galbraith wrote:
> > Greetings,
> >
> > FYI, something in this series causes LTP controllers::memcg_regression
> > testcase to hang forever.  Verified via brute force revert of the lot.
> >
> > After letting box moan for 4.5 hours, I poked ^C repeatedly, but runltp
> > didn't exit/recover gracefully, and ps hung, so I nuked the box.  All
> > memcg_test_1 instances were stuck in reclaim_throttle().
> >
>
> I'll see can I reproduce this but do you know offhand what the test is
> doing and what the expected outcome is? A possibility is that this is a
> test that is driving the machine near OOM (or at least memcg OOM) and
> getting throttled instead of getting killed.

Here's the hanging test 4.

testcases/bin/memcg_regression_test.sh:
test_4()
{
        ./memcg_test_4.sh

        check_kernel_bug
        if [ $? -eq 1 ]; then
                tst_resm TPASS "no kernel bug was found"
        fi

        # test_4.sh might be killed by oom, so do clean up here
        killall -9 memcg_test_4 2> /dev/null
        killall -9 memcg_test_4.sh 2> /dev/null

        # if test_4.sh gets killed, it won't clean cgroup it created
        rmdir memcg/0 2> /dev/null

        swapon -a
}

testcases/bin/memcg_test_4.sh:
# attach current task to memcg/0/
mkdir memcg/0
echo $$ > memcg/0/tasks

./memcg_test_4 &
pid=$!
sleep 1

# let $pid allocate 100M memory
/bin/kill -SIGUSR1 $pid
sleep 1

# shrink memory, and then 80M will be swapped
echo 40M > memcg/0/memory.limit_in_bytes

# turn off swap, and swapoff will be killed
swapoff -a
sleep 1
echo $pid > memcg/tasks 2> /dev/null
echo $$ > memcg/tasks 2> /dev/null

# now remove the cgroup
rmdir memcg/0
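
The memcg_test_4 binary itself, going by the script's comments, amounts
to something like this (a hypothetical from-memory sketch, not the
actual LTP source):

#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define MEM_SIZE        (100 * 1024 * 1024)     /* "allocate 100M memory" */

static volatile sig_atomic_t do_alloc;

static void sigusr1(int signo)
{
        do_alloc = 1;
}

int main(void)
{
        char *mem;

        signal(SIGUSR1, sigusr1);
        while (!do_alloc)
                pause();                /* wait for the script's kill -SIGUSR1 */

        mem = malloc(MEM_SIZE);
        memset(mem, 'a', MEM_SIZE);     /* fault it in so it's charged to memcg/0 */

        for (;;)
                pause();                /* sit on the memory until killed */
}

After the limit is shrunk to 40M and swap is turned off, swapoff has to
pull that memory back into a group that's way over its limit, which is
what's expected to drive the memcg OOM kill of swapoff.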


* Re: mm: LTP/memcg testcase regression induced by 8cd7c588decf..66ce520bb7c2 series
  2021-11-23  9:13 ` Mel Gorman
  2021-11-23 11:06   ` Mike Galbraith
@ 2021-11-23 11:18   ` Mike Galbraith
  2021-11-23 13:12     ` Mike Galbraith
  2021-11-24 14:56     ` Mel Gorman
  1 sibling, 2 replies; 11+ messages in thread
From: Mike Galbraith @ 2021-11-23 11:18 UTC (permalink / raw)
  To: Mel Gorman; +Cc: lkml

On Tue, 2021-11-23 at 09:13 +0000, Mel Gorman wrote:
>
> I'll see can I reproduce this...

You likely already know this, but just in case, just plunk the below
into $LTPROOT/runtest/foo, and $LTPROOT/runltp -f foo.

#DESCRIPTION:Resource Management testing
memcg_regression        memcg_regression_test.sh

	-Mike


* Re: mm: LTP/memcg testcase regression induced by 8cd7c588decf..66ce520bb7c2 series
  2021-11-23 11:18   ` Mike Galbraith
@ 2021-11-23 13:12     ` Mike Galbraith
  2021-11-23 14:28       ` Mike Galbraith
  2021-11-24 14:56     ` Mel Gorman
  1 sibling, 1 reply; 11+ messages in thread
From: Mike Galbraith @ 2021-11-23 13:12 UTC (permalink / raw)
  To: Mel Gorman; +Cc: lkml

On Tue, 2021-11-23 at 12:18 +0100, Mike Galbraith wrote:
> On Tue, 2021-11-23 at 09:13 +0000, Mel Gorman wrote:
> >
> > I'll see can I reproduce this...
>
> You likely already know this, but just in case, just plunk the below
> into $LTPROOT/runtest/foo, and $LTPROOT/runltp -f foo.
>
> #DESCRIPTION:Resource Management testing
> memcg_regression        memcg_regression_test.sh

Editing memcg_regression_test.sh to only run test_4() works fine, and
an earlier kernel does oom-kill swapoff as it expects.

[  319.265016] swapoff invoked oom-killer: gfp_mask=0x1100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0

New kernel says it oom-killed, but it's in the penalty box.

[  272.325727] swapoff invoked oom-killer: gfp_mask=0x1100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0

crash> ps | grep memcg
   4105   4104   4  ffff953bd1a88000  IN   0.0   15176   3596  memcg_regressio
   4115   4105   0  ffff953beffe2880  IN   0.0   15172   3580  memcg_test_4.sh
   4117   4115   7  ffff953c0b818000  IN   0.2  110020  38404  memcg_test_4
crash> ps | grep swapoff
   4138   4115   1  ffff953bf2f70000  UN   0.0   19032      0  swapoff
crash> bt -sx 4138
PID: 4138   TASK: ffff953bf2f70000  CPU: 5   COMMAND: "swapoff"
 #0 [ffffad1a02957770] __schedule+0x2c3 at ffffffff932636e3
 #1 [ffffad1a02957830] schedule+0x3a at ffffffff932644ba
 #2 [ffffad1a02957840] schedule_timeout+0x1f3 at ffffffff93268d03
 #3 [ffffad1a029578b8] reclaim_throttle+0xd0 at ffffffff92a69ab0
 #4 [ffffad1a02957920] do_try_to_free_pages+0x160 at ffffffff92a6bab0
 #5 [ffffad1a02957978] try_to_free_mem_cgroup_pages+0xf4 at ffffffff92a6d264
 #6 [ffffad1a02957a10] try_charge_memcg+0x19f at ffffffff92b03b6f
 #7 [ffffad1a02957ab0] charge_memcg+0x33 at ffffffff92b04313
 #8 [ffffad1a02957ad0] mem_cgroup_swapin_charge_page+0x58 at ffffffff92b06a08
 #9 [ffffad1a02957b00] __read_swap_cache_async+0x206 at ffffffff92ac2006
#10 [ffffad1a02957b68] swap_cluster_readahead+0x165 at ffffffff92ac2295
#11 [ffffad1a02957bf0] swapin_readahead+0x84 at ffffffff92ac24d4
#12 [ffffad1a02957c78] unuse_pte_range+0x53c at ffffffff92ac5cfc
#13 [ffffad1a02957d68] try_to_unuse+0x50e at ffffffff92ac664e
#14 [ffffad1a02957e38] __do_sys_swapoff+0x1e4 at ffffffff92ac6ba4
#15 [ffffad1a02957e90] do_syscall_64+0x58 at ffffffff93256138
#16 [ffffad1a02957ea8] exit_to_user_mode_prepare+0xf8 at ffffffff9293f138
#17 [ffffad1a02957ec0] syscall_exit_to_user_mode+0x18 at ffffffff9325a358
#18 [ffffad1a02957ed0] do_syscall_64+0x67 at ffffffff93256147
#19 [ffffad1a02957ef0] syscall_exit_to_user_mode+0x18 at ffffffff9325a358
#20 [ffffad1a02957f28] exc_page_fault+0x67 at ffffffff93259c97
#21 [ffffad1a02957f50] entry_SYSCALL_64_after_hwframe+0x44 at ffffffff9340007c
    RIP: 00007f6da7cd9997  RSP: 00007fffbca10388  RFLAGS: 00000202
    RAX: ffffffffffffffda  RBX: 0000000000000001  RCX: 00007f6da7cd9997
    RDX: 0000000000000001  RSI: 0000000000000001  RDI: 000055bc98cb6d10
    RBP: 000055bc98cb6d10   R8: 00007fffbca103f0   R9: 0000000000000003
    R10: 000055bc98cb15af  R11: 0000000000000202  R12: 0000000000000001
    R13: 0000000000000000  R14: 00007fffbca103f0  R15: 0000000000000000
    ORIG_RAX: 00000000000000a8  CS: 0033  SS: 002b
crash>

Enabling sched events for it...

homer:..debug/tracing # tail -25 trace
         swapoff-4138    [001] d....  1641.929650: sched_switch: prev_comm=swapoff prev_pid=4138 prev_prio=120 prev_state=D ==> next_comm=swapper/1 next_pid=0 next_prio=120
          <idle>-0       [001] d.s..  1642.441625: sched_waking: comm=swapoff pid=4138 prio=120 target_cpu=001
          <idle>-0       [001] dNs..  1642.441631: sched_wakeup: comm=swapoff pid=4138 prio=120 target_cpu=001
          <idle>-0       [001] d....  1642.441638: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=swapoff next_pid=4138 next_prio=120
         swapoff-4138    [001] d....  1642.441642: sched_waking: comm=kworker/1:0 pid=5005 prio=120 target_cpu=001
         swapoff-4138    [001] d....  1642.441643: sched_stat_runtime: comm=swapoff pid=4138 runtime=11453 [ns] vruntime=45946584931 [ns]
         swapoff-4138    [001] d....  1642.441645: sched_wakeup: comm=kworker/1:0 pid=5005 prio=120 target_cpu=001
         swapoff-4138    [001] d....  1642.441652: sched_stat_runtime: comm=swapoff pid=4138 runtime=8758 [ns] vruntime=45946593689 [ns]
         swapoff-4138    [001] d....  1642.441653: sched_switch: prev_comm=swapoff prev_pid=4138 prev_prio=120 prev_state=D ==> next_comm=kworker/1:0 next_pid=5005 next_prio=120
          <idle>-0       [001] d.s..  1642.953622: sched_waking: comm=swapoff pid=4138 prio=120 target_cpu=001
          <idle>-0       [001] dNs..  1642.953627: sched_wakeup: comm=swapoff pid=4138 prio=120 target_cpu=001
          <idle>-0       [001] d....  1642.953634: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=swapoff next_pid=4138 next_prio=120
         swapoff-4138    [001] d....  1642.953638: sched_waking: comm=kworker/1:0 pid=5005 prio=120 target_cpu=001
         swapoff-4138    [001] d....  1642.953639: sched_stat_runtime: comm=swapoff pid=4138 runtime=10995 [ns] vruntime=45946604684 [ns]
         swapoff-4138    [001] d....  1642.953641: sched_wakeup: comm=kworker/1:0 pid=5005 prio=120 target_cpu=001
         swapoff-4138    [001] d....  1642.953648: sched_stat_runtime: comm=swapoff pid=4138 runtime=8633 [ns] vruntime=45946613317 [ns]
         swapoff-4138    [001] d....  1642.953649: sched_switch: prev_comm=swapoff prev_pid=4138 prev_prio=120 prev_state=D ==> next_comm=kworker/1:0 next_pid=5005 next_prio=120
          <idle>-0       [001] d.s..  1643.465642: sched_waking: comm=swapoff pid=4138 prio=120 target_cpu=001
          <idle>-0       [001] dNs..  1643.465654: sched_wakeup: comm=swapoff pid=4138 prio=120 target_cpu=001
          <idle>-0       [001] d....  1643.465674: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=swapoff next_pid=4138 next_prio=120
         swapoff-4138    [001] d....  1643.465683: sched_waking: comm=kworker/1:0 pid=5005 prio=120 target_cpu=001
         swapoff-4138    [001] d....  1643.465686: sched_stat_runtime: comm=swapoff pid=4138 runtime=28731 [ns] vruntime=45946642048 [ns]
         swapoff-4138    [001] d....  1643.465690: sched_wakeup: comm=kworker/1:0 pid=5005 prio=120 target_cpu=001
         swapoff-4138    [001] d....  1643.465708: sched_stat_runtime: comm=swapoff pid=4138 runtime=21966 [ns] vruntime=45946664014 [ns]
         swapoff-4138    [001] d....  1643.465711: sched_switch: prev_comm=swapoff prev_pid=4138 prev_prio=120 prev_state=D ==> next_comm=kworker/1:0 next_pid=5005 next_prio=120
homer:..debug/tracing #

...I bet if I were reeeeeally patient, I might see the test pass (note
the ~512ms between wakeups above, matching the HZ/2 NOPROGRESS throttle
timeout).

LOL, and as I write this, it just did :)

	-Mike


* Re: mm: LTP/memcg testcase regression induced by 8cd7c588decf..66ce520bb7c2 series
  2021-11-23 13:12     ` Mike Galbraith
@ 2021-11-23 14:28       ` Mike Galbraith
  0 siblings, 0 replies; 11+ messages in thread
From: Mike Galbraith @ 2021-11-23 14:28 UTC (permalink / raw)
  To: Mel Gorman; +Cc: lkml

On Tue, 2021-11-23 at 14:12 +0100, Mike Galbraith wrote:
>
> ...I bet if I were reeeeeally patient, I might see the test pass.
>
> LOL, and as I write this, it just did :)

real    30m27.524s
user    0m0.206s
sys     0m0.162s


Hohum, this one looks like there is a genuine buglet in there
somewhere, as oom-kill _and_ penalty box surely weren't both intended.

	-Mike


* Re: mm: LTP/memcg testcase regression induced by 8cd7c588decf..66ce520bb7c2 series
  2021-11-23 11:18   ` Mike Galbraith
  2021-11-23 13:12     ` Mike Galbraith
@ 2021-11-24 14:56     ` Mel Gorman
  2021-11-24 16:21       ` Mike Galbraith
  1 sibling, 1 reply; 11+ messages in thread
From: Mel Gorman @ 2021-11-24 14:56 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: lkml

On Tue, Nov 23, 2021 at 12:18:01PM +0100, Mike Galbraith wrote:
> On Tue, 2021-11-23 at 09:13 +0000, Mel Gorman wrote:
> >
> > I'll see can I reproduce this...
> 
> You likely already know this, but just in case, just plunk the below
> into $LTPROOT/runtest/foo, and $LTPROOT/runltp -f foo.
> 
> #DESCRIPTION:Resource Management testing
> memcg_regression        memcg_regression_test.sh
> 

Thanks. Can you try the following patch, please?

The test will still take longer to reach OOM and complete, as it's
still stalling, but not as severely.

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 07db03883062..d9166e94eb95 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1057,7 +1057,17 @@ void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason)
 
 		break;
 	case VMSCAN_THROTTLE_NOPROGRESS:
-		timeout = HZ/2;
+		timeout = 1;
+
+		/*
+		 * If kswapd is disabled, reschedule if necessary but do not
+		 * throttle as the system is likely near OOM.
+		 */
+		if (pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES) {
+			cond_resched();
+			return;
+		}
+
 		break;
 	case VMSCAN_THROTTLE_ISOLATED:
 		timeout = HZ/50;
@@ -3395,7 +3405,7 @@ static void consider_reclaim_throttle(pg_data_t *pgdat, struct scan_control *sc)
 		return;
 
 	/* Throttle if making no progress at high prioities. */
-	if (sc->priority < DEF_PRIORITY - 2)
+	if (sc->priority < DEF_PRIORITY - 2 && !sc->nr_reclaimed)
 		reclaim_throttle(pgdat, VMSCAN_THROTTLE_NOPROGRESS);
 }
 
@@ -3415,6 +3425,7 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 	unsigned long nr_soft_scanned;
 	gfp_t orig_mask;
 	pg_data_t *last_pgdat = NULL;
+	pg_data_t *first_pgdat = NULL;
 
 	/*
 	 * If the number of buffer_heads in the machine exceeds the maximum
@@ -3478,14 +3489,18 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 			/* need some check for avoid more shrink_zone() */
 		}
 
+		if (!first_pgdat)
+			first_pgdat = zone->zone_pgdat;
+
 		/* See comment about same check for global reclaim above */
 		if (zone->zone_pgdat == last_pgdat)
 			continue;
 		last_pgdat = zone->zone_pgdat;
 		shrink_node(zone->zone_pgdat, sc);
-		consider_reclaim_throttle(zone->zone_pgdat, sc);
 	}
 
+	consider_reclaim_throttle(first_pgdat, sc);
+
 	/*
 	 * Restore to original mask to avoid the impact on the caller if we
 	 * promoted it to __GFP_HIGHMEM.


* Re: mm: LTP/memcg testcase regression induced by 8cd7c588decf..66ce520bb7c2 series
  2021-11-24 14:56     ` Mel Gorman
@ 2021-11-24 16:21       ` Mike Galbraith
  0 siblings, 0 replies; 11+ messages in thread
From: Mike Galbraith @ 2021-11-24 16:21 UTC (permalink / raw)
  To: Mel Gorman; +Cc: lkml

On Wed, 2021-11-24 at 14:56 +0000, Mel Gorman wrote:
> On Tue, Nov 23, 2021 at 12:18:01PM +0100, Mike Galbraith wrote:
> > On Tue, 2021-11-23 at 09:13 +0000, Mel Gorman wrote:
> > > 
> > > I'll see can I reproduce this...
> > 
> > You likely already know this, but just in case, just plunk the below
> > into $LTPROOT/runtest/foo, and $LTPROOT/runltp -f foo.
> > 
> > #DESCRIPTION:Resource Management testing
> > memcg_regression        memcg_regression_test.sh
> > 
> 
> Thanks. Can you try the following patch please?
> 
> The test will still take longer to reach OOM and complete as it's stalling
> but not as severely.

Yeah, way better.  test_1() is still to be avoided, but everything
else seems fine, including memcg_stress (twiddled to not be the swap
storm from hell).

> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 07db03883062..d9166e94eb95 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1057,7 +1057,17 @@ void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason)
>  
>                 break;
>         case VMSCAN_THROTTLE_NOPROGRESS:
> -               timeout = HZ/2;
> +               timeout = 1;
> +
> +               /*
> +                * If kswapd is disabled, reschedule if necessary but do not
> +                * throttle as the system is likely near OOM.
> +                */
> +               if (pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES) {
> +                       cond_resched();
> +                       return;
> +               }
> +
>                 break;
>         case VMSCAN_THROTTLE_ISOLATED:
>                 timeout = HZ/50;
> @@ -3395,7 +3405,7 @@ static void consider_reclaim_throttle(pg_data_t *pgdat, struct scan_control *sc)
>                 return;
>  
>         /* Throttle if making no progress at high prioities. */
> -       if (sc->priority < DEF_PRIORITY - 2)
> +       if (sc->priority < DEF_PRIORITY - 2 && !sc->nr_reclaimed)
>                 reclaim_throttle(pgdat, VMSCAN_THROTTLE_NOPROGRESS);
>  }
>  
> @@ -3415,6 +3425,7 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
>         unsigned long nr_soft_scanned;
>         gfp_t orig_mask;
>         pg_data_t *last_pgdat = NULL;
> +       pg_data_t *first_pgdat = NULL;
>  
>         /*
>          * If the number of buffer_heads in the machine exceeds the maximum
> @@ -3478,14 +3489,18 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
>                         /* need some check for avoid more shrink_zone() */
>                 }
>  
> +               if (!first_pgdat)
> +                       first_pgdat = zone->zone_pgdat;
> +
>                 /* See comment about same check for global reclaim above */
>                 if (zone->zone_pgdat == last_pgdat)
>                         continue;
>                 last_pgdat = zone->zone_pgdat;
>                 shrink_node(zone->zone_pgdat, sc);
> -               consider_reclaim_throttle(zone->zone_pgdat, sc);
>         }
>  
> +       consider_reclaim_throttle(first_pgdat, sc);
> +
>         /*
>          * Restore to original mask to avoid the impact on the caller if we
>          * promoted it to __GFP_HIGHMEM.


