From: Suren Baghdasaryan <email@example.com>
To: Johannes Weiner <firstname.lastname@example.org>
Cc: email@example.com, firstname.lastname@example.org, email@example.com,
    firstname.lastname@example.org, Ingo Molnar <email@example.com>,
    Peter Zijlstra <firstname.lastname@example.org>,
    Andrew Morton <email@example.com>, Tejun Heo <firstname.lastname@example.org>,
    Balbir Singh <email@example.com>, Mike Galbraith <firstname.lastname@example.org>,
    Oliver Yang <email@example.com>, Shakeel Butt <firstname.lastname@example.org>,
    xxx xxx <email@example.com>, Taras Kondratiuk <firstname.lastname@example.org>,
    Daniel Walker <email@example.com>, Vinayak Menon <firstname.lastname@example.org>,
    Ruslan Ruslichenko <email@example.com>, firstname.lastname@example.org
Subject: Re: [PATCH 0/7] psi: pressure stall information for CPU, memory, and IO
Date: Wed, 30 May 2018 16:32:52 -0700
Message-ID: <CAJuCfpGXSyu3SOky6jMhKjix=bbaPccg05VcepbvuJiv+bQgzw@mail.gmail.com>
In-Reply-To: <20180529181616.GB28689@cmpxchg.org>

On Tue, May 29, 2018 at 11:16 AM, Johannes Weiner <email@example.com> wrote:
> Hi Suren,
>
> On Fri, May 25, 2018 at 05:29:30PM -0700, Suren Baghdasaryan wrote:
>> Hi Johannes,
>> I tried your previous memdelay patches before this new set was posted
>> and the results were promising for predicting when an Android system is
>> close to OOM. I'm definitely going to try this one after I backport it
>> to 4.9.
>
> I'm happy to hear that!
>
>> Would it make sense to split CONFIG_PSI into CONFIG_PSI_CPU,
>> CONFIG_PSI_MEM and CONFIG_PSI_IO, since one might need only a specific
>> subset of this feature?
>
> Yes, that should be doable. I'll split them out in the next version.
>
>> > The total= value gives the absolute stall time in microseconds. This
>> > allows detecting latency spikes that might be too short to sway the
>> > running averages. It also allows custom time averaging in case the
>> > 10s/1m/5m windows aren't adequate for the use case (or are too coarse
>> > with future hardware).
>>
>> Any reasons these specific windows were chosen (empirical
>> data/historical reasons)? I'm worried that with the smallest window
>> being 10s the signal might be too inert to detect fast memory pressure
>> buildup before an OOM kill happens. I'll have to experiment with that
>> first; however, if you have some insights into this already, please
>> share them.
>
> They were chosen empirically. We started out with the loadavg window
> sizes, but had to reduce them for exactly the reason you mention -
> they're way too coarse to detect acute pressure buildup.
>
> 10s has been working well for us. We could make it smaller, but there
> is some worry that we don't have enough samples then and the average
> becomes too erratic - whereas monitoring total= directly would allow
> you to detect acute spikes and handle this erraticness explicitly.

Unfortunately, the total= field is now updated only at 2-second
intervals, which might be too late to react to mounting memory pressure.
With the previous memdelay patchset, md->aggregate, which is reported as
"total", was calculated directly inside memdelay_task_change, so it was
always up-to-date. Now group->some and group->full are updated from
inside psi_clock with up to a 2-second delay. This prevents us from
detecting these acute pressure spikes immediately. I understand why you
moved these calculations out of the hot path, but maybe we could keep
updating "total" inside psi_group_update? This would allow for custom
averaging and eliminate this delay in detecting spikes in the pressure
signal.

More conceptually, I would love to have a way to monitor the averages at
a slow rate and, when they rise and cross some threshold, to increase
the monitoring rate and react quickly in case they shoot up. The current
2-second delay poses a problem for doing that.

> Let me know how it works out in your tests.
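To make the custom-averaging idea concrete, here is a rough userspace sketch (the class and field names are made up for illustration, and it assumes fresh total= samples in microseconds, e.g. parsed from /proc/pressure/memory): it computes the stall ratio over a caller-chosen window much shorter than the built-in 10s one.

```python
from collections import deque

class CustomPressureAvg:
    """Sliding-window stall ratio computed from raw total= samples.

    total_us is the cumulative stall time in microseconds, as exposed
    by the total= field; window_s is the averaging window in seconds
    (e.g. 1s, much shorter than the built-in 10s window).
    """
    def __init__(self, window_s=1.0):
        self.window_s = window_s
        self.samples = deque()  # (timestamp_s, total_us) pairs

    def update(self, now_s, total_us):
        self.samples.append((now_s, total_us))
        # Evict samples that have fallen out of the window.
        while self.samples and now_s - self.samples[0][0] > self.window_s:
            self.samples.popleft()

    def stall_ratio(self):
        """Fraction of the window spent stalled (0.0 - 1.0)."""
        if len(self.samples) < 2:
            return 0.0  # not enough data yet
        t0, s0 = self.samples[0]
        t1, s1 = self.samples[-1]
        elapsed_us = (t1 - t0) * 1e6
        return (s1 - s0) / elapsed_us if elapsed_us > 0 else 0.0
```

For example, feeding samples total=0 at t=0s, total=250000 at t=0.5s, and total=500000 at t=1.0s yields a stall ratio of 0.5 over that 1-second window - a spike the 10s average would barely register.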
I've done the backporting to 4.9 and I'm running the tests, but the 2sec
delay is problematic for getting a detailed look at the signal and
evaluating its usefulness. I'm thinking about workarounds, if only for
data collection, but I don't want to deviate too much from your
baseline. I would love to hear from you whether a good compromise can be
reached here.

>
> Thanks for your feedback.
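For reference, the two-rate monitoring scheme I have in mind would look roughly like this (intervals, threshold, and function names are made-up illustration values, and it assumes a stall-ratio reading that is actually up-to-date, which is exactly what the 2-second aggregation delay currently prevents):

```python
import time

SLOW_INTERVAL_S = 2.0   # relaxed polling while pressure is low
FAST_INTERVAL_S = 0.1   # tight polling once pressure starts building
THRESHOLD = 0.1         # stall ratio that triggers the fast rate

def next_poll_interval(stall_ratio):
    """Escalate to fast polling once the signal crosses the threshold."""
    return FAST_INTERVAL_S if stall_ratio >= THRESHOLD else SLOW_INTERVAL_S

def monitor(read_stall_ratio, react_to_pressure, iterations=10):
    """Poll slowly in the common case, quickly under rising pressure.

    read_stall_ratio and react_to_pressure are caller-supplied hooks;
    in real use, read_stall_ratio would parse the pressure files and
    react_to_pressure would, e.g., start reclaiming or killing tasks.
    """
    for _ in range(iterations):
        ratio = read_stall_ratio()
        if ratio >= THRESHOLD:
            react_to_pressure(ratio)
        time.sleep(next_poll_interval(ratio))
```

The point is that the fast path only helps if the underlying signal is refreshed at least as often as FAST_INTERVAL_S, hence the interest in keeping "total" updated outside the 2-second aggregation clock.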