* [OSSTEST PATCH 00/21] Abandon jobs which are unreasonably delaying their flight
@ 2019-04-18 16:31 ` Ian Jackson
From: Ian Jackson @ 2019-04-18 16:31 UTC
  To: xen-devel
  Cc: Lars Kurth, Julien Grall, Stefano Stabellini, Ian Jackson, committers

Sometimes we find ourselves seriously lacking the capacity to run
particular job(s).  The result can be that the whole system stands
mostly idle while a small proportion of the resources runs flat out
with a giant queue.

In this series we arrange for osstest to be able to spot this
happening, and automatically rebalance load by giving up earlier on the
jobs which are overly-contended.

There are some tuning parameters, of course.  To summarise, I have
chosen here to treat jobs as starved if (for example):
  We have completed 90% of the flight, and the remaining 10%
  is projected to take 5x as long as the first 90%.
(The "90%" is by number of jobs.)  See the patch
  starvation: Infrastructure for jobs which are delaying their flights
for details of the heuristic and its parameters.
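
To make the example above concrete, here is a minimal sketch in Python
of the kind of check described.  It is an illustration only, not the
code in the series (osstest itself is not written in Python).  The
names FlightProgress, looks_starved, min_done_fraction and
max_slowdown are hypothetical; the defaults are just the "90%" and
"5x" from the example, and the real heuristic and its parameters are
in the patch named above.

```python
from dataclasses import dataclass


@dataclass
class FlightProgress:
    """Hypothetical snapshot of a flight; not an osstest data structure."""
    total_jobs: int             # number of jobs in the flight
    finished_jobs: int          # jobs which have already completed
    elapsed: float              # seconds taken by the completed part
    projected_remaining: float  # estimated seconds for the rest


def looks_starved(p: FlightProgress,
                  min_done_fraction: float = 0.9,
                  max_slowdown: float = 5.0) -> bool:
    """Return True if the remaining jobs look unreasonably delayed.

    Assumed rule, taken from the cover letter's example only: at least
    min_done_fraction of the jobs (by count) are done, and the rest is
    projected to take at least max_slowdown times as long as the part
    already completed.
    """
    if p.total_jobs <= 0 or p.elapsed <= 0:
        return False  # nothing sensible to compare against
    done_fraction = p.finished_jobs / p.total_jobs
    if done_fraction < min_done_fraction:
        return False  # too early in the flight to judge
    return p.projected_remaining >= max_slowdown * p.elapsed


# Example: 95 of 100 jobs done in 1 hour, the rest projected to need 6 hours.
print(looks_starved(FlightProgress(100, 95, 3600.0, 6 * 3600.0)))
```

Both thresholds are parameters in this sketch because, as noted above,
they are tuning values rather than fixed constants.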

When situations like this persist it will still be good to manually
balance the load by adjusting the job mix in submitted flights.  This
is because the starvation will not necessarily drop the same job in
subsequent flights on the same "branch", so starvation will impair the
regression detection.

Ian Jackson (21):
  ts-hosts-allocate-Executive: with -U, just append to the same logfile
  selecthost: Honour new $none_ok optional parameter
  ts-logs-capture: Do not try to capture logs of hosts not allocated
  alloc_resources: Support special abandonment values
  starvation: Teach sg-report-flight about starved step state
  starvation: Teach archaeologists about starved job state
  starvation: Teach ms-flights-summary about job state starved
  starvation: Teach sg-execute-flight about job state starved
  step handling: Preserve step states set by ts-* scripts
  TestSupport: Make "broken" print the actual job state
  JobDB::Executive: step_*: fix log messages to talk about "steps"
  starvation: Permit step_finish to set the state `starved'
  TestSupport: Make "broken" set the step state too
  tcl/JobDB-Executive: Do not squash "starved" status
  starvation: Propagate starved job status into dependent jobs
  ts-host-allocate-Executive: Break out $now and add a newline
  starvation: Use "starved" for hostalloc_maxwait_max
  starvation: Infrastructure for jobs which are delaying their flights
  starvation: Abandon jobs which are unreasonably delaying their flight
  hostalloc_maxwait_max: Use starvation most_optimistic
  starvation: Better logging/debugging output

 Osstest/Executive.pm         |  95 ++++++++++++++++++++++++++---
 Osstest/JobDB/Executive.pm   |   8 ++-
 Osstest/TestSupport.pm       |  24 ++++++--
 mg-hostalloc-starvation-demo |  53 ++++++++++++++++
 ms-flights-summary           |   9 +--
 sg-execute-flight            |   2 +-
 sg-report-flight             |  17 +++++-
 tcl/JobDB-Executive.tcl      |   6 +-
 ts-hosts-allocate-Executive  | 142 ++++++++++++++++++++++++++++++++++++++++---
 ts-logs-capture              |   7 ++-
 10 files changed, 328 insertions(+), 35 deletions(-)
 create mode 100755 mg-hostalloc-starvation-demo

-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Thread overview:
2019-04-18 16:31 [OSSTEST PATCH 00/21] Abandon jobs which are unreasonably delaying their flight Ian Jackson
2019-04-18 16:31 ` [OSSTEST PATCH 01/21] ts-hosts-allocate-Executive: with -U, just append to the same logfile Ian Jackson
2019-04-18 16:31 ` [OSSTEST PATCH 02/21] selecthost: Honour new $none_ok optional parameter Ian Jackson
2019-04-18 16:31 ` [OSSTEST PATCH 03/21] ts-logs-capture: Do not try to capture logs of hosts not allocated Ian Jackson
2019-04-18 16:31 ` [OSSTEST PATCH 04/21] alloc_resources: Support special abandonment values Ian Jackson
2019-04-18 16:31 ` [OSSTEST PATCH 05/21] starvation: Teach sg-report-flight about starved step state Ian Jackson
2019-04-18 16:31 ` [OSSTEST PATCH 06/21] starvation: Teach archaeologists about starved job state Ian Jackson
2019-04-18 16:31 ` [OSSTEST PATCH 07/21] starvation: Teach ms-flights-summary about job state starved Ian Jackson
2019-04-18 16:31 ` [OSSTEST PATCH 08/21] starvation: Teach sg-execute-flight about job state starved Ian Jackson
2019-04-18 16:31 ` [OSSTEST PATCH 09/21] step handling: Preserve step states set by ts-* scripts Ian Jackson
2019-04-18 16:31 ` [OSSTEST PATCH 10/21] TestSupport: Make "broken" print the actual job state Ian Jackson
2019-04-18 16:31 ` [OSSTEST PATCH 11/21] JobDB::Executive: step_*: fix log messages to talk about "steps" Ian Jackson
2019-04-18 16:31 ` [OSSTEST PATCH 12/21] starvation: Permit step_finish to set the state `starved' Ian Jackson
2019-04-18 16:31 ` [OSSTEST PATCH 13/21] TestSupport: Make "broken" set the step state too Ian Jackson
2019-04-18 16:31 ` [OSSTEST PATCH 14/21] tcl/JobDB-Executive: Do not squash "starved" status Ian Jackson
2019-04-18 16:31 ` [OSSTEST PATCH 15/21] starvation: Propagate starved job status into dependent jobs Ian Jackson
2019-04-18 16:31 ` [OSSTEST PATCH 16/21] ts-host-allocate-Executive: Break out $now and add a newline Ian Jackson
2019-04-18 16:31 ` [OSSTEST PATCH 17/21] starvation: Use "starved" for hostalloc_maxwait_max Ian Jackson
2019-04-18 16:31 ` [OSSTEST PATCH 18/21] starvation: Infrastructure for jobs which are delaying their flights Ian Jackson
2019-04-18 16:31 ` [OSSTEST PATCH 19/21] starvation: Abandon jobs which are unreasonably delaying their flight Ian Jackson
2019-04-18 16:31 ` [OSSTEST PATCH 20/21] hostalloc_maxwait_max: Use starvation most_optimistic Ian Jackson
2019-04-18 16:31 ` [OSSTEST PATCH 21/21] starvation: Better logging/debugging output Ian Jackson
2019-04-26 21:16 ` [OSSTEST PATCH 00/21] Abandon jobs which are unreasonably delaying their flight Julien Grall
2019-04-29 14:46   ` Ian Jackson
2019-04-30 14:26     ` Julien Grall
