From: Michal Hocko <mhocko@kernel.org>
To: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	l.roehrs@profihost.ag, cgroups@vger.kernel.org,
	Johannes Weiner <hannes@cmpxchg.org>,
	Vlastimil Babka <vbabka@suse.cz>
Subject: Re: lot of MemAvailable but falling cache and raising PSI
Date: Tue, 10 Sep 2019 11:02:41 +0200	[thread overview]
Message-ID: <20190910090241.GM2063@dhcp22.suse.cz> (raw)
In-Reply-To: <132e1fd0-c392-c158-8f3a-20e340e542f0@profihost.ag>

On Tue 10-09-19 10:38:25, Stefan Priebe - Profihost AG wrote:
> > On 10.09.19 at 10:29, Michal Hocko wrote:
> > On Tue 10-09-19 07:56:36, Stefan Priebe - Profihost AG wrote:
> >>
> >> On 09.09.19 at 14:56, Stefan Priebe - Profihost AG wrote:
> >>> On 09.09.19 at 14:49, Michal Hocko wrote:
> >>>> On Mon 09-09-19 14:37:52, Stefan Priebe - Profihost AG wrote:
> >>>>>
> >>>>> On 09.09.19 at 14:28, Michal Hocko wrote:
> >>>>>> On Mon 09-09-19 14:10:02, Stefan Priebe - Profihost AG wrote:
> >>>>>>>
> >>>>>>> On 09.09.19 at 14:08, Michal Hocko wrote:
> >>>>>>>> On Mon 09-09-19 13:01:36, Michal Hocko wrote:
> >>>>>>>>> and that matches moments when we reclaimed memory. There seems to be a
> >>>>>>>>> steady THP allocations flow so maybe this is a source of the direct
> >>>>>>>>> reclaim?
> >>>>>>>>
> >>>>>>>> I was thinking about this some more and THP being a source of reclaim
> >>>>>>>> sounds quite unlikely. At least in a default configuration because we
> >>>>>>>> shouldn't do anything expensive in the #PF path. But there might be a
> >>>>>>>> different source of high order (!costly) allocations. Could you check
> >>>>>>>> how many allocation requests like that you have on your system?
> >>>>>>>>
> >>>>>>>> mount -t debugfs none /debug
> >>>>>>>> echo "order > 0" > /debug/tracing/events/kmem/mm_page_alloc/filter
> >>>>>>>> echo 1 > /debug/tracing/events/kmem/mm_page_alloc/enable
> >>>>>>>> cat /debug/tracing/trace_pipe > $file
> >>>>>>
> >>>>>> echo 1 > /debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_begin/enable
> >>>>>> echo 1 > /debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_end/enable
> >>>>>>  
> >>>>>> might tell us something as well but it might turn out that it just still
> >>>>>> doesn't give us the full picture and we might need
> >>>>>> echo stacktrace > /debug/tracing/trace_options
> >>>>>>
> >>>>>> It will generate much more output though.
> >>>>>>
> >>>>>>> Just now or when PSI rises?
> >>>>>>
> >>>>>> When the excessive reclaim is happening ideally.
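As a side note, if that moment is hard to catch manually, the capture could be
started only once pressure actually builds up. A rough sketch (it assumes the
PSI interface in /proc/pressure/memory is available on your kernel and uses an
arbitrary 5% avg10 threshold):

# tracing events are assumed to be already enabled as above
while :; do
        psi=$(awk '/^some/ {sub("avg10=","",$2); print int($2)}' /proc/pressure/memory)
        if [ "$psi" -ge 5 ]; then
                timeout 120 cat /sys/kernel/debug/tracing/trace_pipe > /trace
                break
        fi
        sleep 5
done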
> >>>>>
> >>>>> This one is from a server with 28G memfree but memory pressure is still
> >>>>> jumping between 0 and 10%.
> >>>>>
> >>>>> I did:
> >>>>> echo "order > 0" >
> >>>>> /sys/kernel/debug/tracing/events/kmem/mm_page_alloc/filter
> >>>>>
> >>>>> echo 1 > /sys/kernel/debug/tracing/events/kmem/mm_page_alloc/enable
> >>>>>
> >>>>> echo 1 >
> >>>>> /sys/kernel/debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_begin/enable
> >>>>>
> >>>>> echo 1 >
> >>>>> /sys/kernel/debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_end/enable
> >>>>>
> >>>>> timeout 120 cat /sys/kernel/debug/tracing/trace_pipe > /trace
> >>>>>
> >>>>> File attached.
> >>>>
> >>>> There is no reclaim captured in this trace dump.
> >>>> $ zcat trace1.gz | sed 's@.*\(order=[0-9]\).*\(gfp_flags=.*\)@\1 \2@' | sort | uniq -c
> >>>>     777 order=1 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
> >>>>     663 order=1 gfp_flags=__GFP_IO|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
> >>>>     153 order=1 gfp_flags=__GFP_IO|__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
> >>>>     911 order=1 gfp_flags=GFP_KERNEL_ACCOUNT|__GFP_ZERO
> >>>>    4872 order=1 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_ACCOUNT
> >>>>      62 order=1 gfp_flags=GFP_NOWAIT|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
> >>>>      14 order=2 gfp_flags=GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP
> >>>>      11 order=2 gfp_flags=GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_RECLAIMABLE
> >>>>    1263 order=2 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
> >>>>      45 order=2 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_RECLAIMABLE
> >>>>       1 order=2 gfp_flags=GFP_KERNEL|__GFP_COMP|__GFP_ZERO
> >>>>    7853 order=2 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_ACCOUNT
> >>>>      73 order=3 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
> >>>>     729 order=3 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_RECLAIMABLE
> >>>>     528 order=3 gfp_flags=__GFP_IO|__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
> >>>>    1203 order=3 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_ACCOUNT
> >>>>    5295 order=3 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP
> >>>>       1 order=3 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
> >>>>     132 order=3 gfp_flags=GFP_NOWAIT|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
> >>>>      13 order=5 gfp_flags=GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_ZERO
> >>>>       1 order=6 gfp_flags=GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_ZERO
> >>>>    1232 order=9 gfp_flags=GFP_TRANSHUGE
> >>>>     108 order=9 gfp_flags=GFP_TRANSHUGE|__GFP_THISNODE
> >>>>     362 order=9 gfp_flags=GFP_TRANSHUGE_LIGHT|__GFP_THISNODE
> >>>>
> >>>> Nothing really stands out, because except for the THP ones none of the
> >>>> others are even going to be using the movable zone.
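(A quick way to double-check that, if needed, is to count only the requests
that are eligible for the movable zone. This just greps for the composite flag
names as the tracepoint prints them, so it is only a rough filter:)

$ zcat trace1.gz | grep mm_page_alloc | grep -cE 'GFP_TRANSHUGE|GFP_HIGHUSER_MOVABLE|__GFP_MOVABLE'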
> >>> It might be that this is not an ideal example; it was just the fastest I
> >>> could find. Maybe we really need one with much higher pressure.
> >>
> >> Here is another trace log where a system has 30GB of free memory but is
> >> under constant pressure and does not build up any file cache because of
> >> memory pressure.
> > 
> > So the reclaim is clearly induced by THP allocations
> > $ zgrep vmscan trace2.gz | grep gfp_flags | sed 's@.*\(gfp_flags=.*\) .*@\1@' | sort | uniq -c
> >    1580 gfp_flags=GFP_TRANSHUGE
> >      15 gfp_flags=GFP_TRANSHUGE|__GFP_THISNODE
> > 
> > $ zgrep vmscan trace2.gz | grep nr_reclaimed | sed 's@nr_reclaimed=@@' |  awk '{nr+=$6+0}END{print nr}'
> > 1541726
> > 
> > 6GB of memory reclaimed in 1776s. That is a lot! But the THP allocation
> > rate is really high as well
> > $ zgrep "page_alloc.*GFP_TRANSHUGE" trace2.gz | wc -l
> > 15340
> > 
> > this is 30GB worth of THPs (some of them might get released of course).
> > Also only 10% of requests end up reclaiming.
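(For reference, the back-of-the-envelope arithmetic behind those estimates,
assuming 4 KiB base pages and 2 MiB THPs:
  1541726 reclaimed base pages * 4 KiB ~= 5.9 GiB (the ~6GB above)
  15340 order-9 allocations * 2 MiB ~= 30 GiB
  (1580 + 15) direct reclaim events / 15340 THP allocations ~= 10%)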
> > 
> > One additional interesting point
> > $ zgrep vmscan trace2.gz | grep nr_reclaimed | sed 's@.*nr_reclaimed=\([0-9]*\)@\1@' | calc_min_max.awk
> > min: 1.00 max: 2792.00 avg: 965.99 std: 331.12 nr: 1596
> > 
> > Even though the std is high there are quite some outliers when a lot of
> > memory is reclaimed.
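(calc_min_max.awk is a local helper that is not shown in this thread; a
minimal stand-in producing the same style of summary could be something like
the awk below - the exact script used above may differ.)

awk '{ v = $1 + 0
       if (nr == 0 || v < min) min = v
       if (v > max) max = v
       sum += v; sumsq += v * v; nr++ }
     END { avg = sum / nr
           printf "min: %.2f max: %.2f avg: %.2f std: %.2f nr: %d\n",
                  min, max, avg, sqrt(sumsq / nr - avg * avg), nr }'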
> > 
> > Which kernel version is this? And again, what is the THP configuration?
> 
> This is 4.19.66. Regarding THP, you mean this:

Do you see the same behavior with 5.3?

> /sys/kernel/mm/transparent_hugepage/defrag:always defer [defer+madvise]
> madvise never
> 
> /sys/kernel/mm/transparent_hugepage/enabled:[always] madvise never
> 
> /sys/kernel/mm/transparent_hugepage/hpage_pmd_size:2097152
> 
> /sys/kernel/mm/transparent_hugepage/shmem_enabled:always within_size
> advise [never] deny force
> 
> /sys/kernel/mm/transparent_hugepage/use_zero_page:1
> 
> /sys/kernel/mm/transparent_hugepage/enabled was madvise until yesterday,
> when I tried switching to defer+madvise - which didn't help.

Many of the processes hitting the reclaim are php5; for the others I cannot
say because their cmd is not reflected in the trace. I suspect those are
using madvise. I haven't really seen kcompactd interfering much. That would
suggest using defer.
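
If you want to give defer a try, it can be flipped at runtime and takes effect
for subsequent THP allocations (it does not persist across reboots unless set
again from an init script or similar):

echo defer > /sys/kernel/mm/transparent_hugepage/defrag
cat /sys/kernel/mm/transparent_hugepage/defrag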

-- 
Michal Hocko
SUSE Labs

