Re: Hourly, daily, weekly monitoring

From: "Guillaume Tucker" <guillaume.tucker@gmail.com>
To: Dan Rue <dan.rue@linaro.org>
Cc: kernelci@groups.io, broonie@kernel.org,
	Linus Walleij <linus.walleij@linaro.org>
Subject: Re: Hourly, daily, weekly monitoring
Date: Wed, 6 Mar 2019 09:16:18 +0000
Message-ID: <CAH1_8nAYcGBO61hsbe1-8tJpcBE5=QZ6XmLY81h7jhoVu9av4g@mail.gmail.com>
In-Reply-To: <20190305220439.bzmsdtpd5gvrr3t6@xps.therub.org>

On Tue, Mar 5, 2019 at 10:04 PM Dan Rue <dan.rue@linaro.org> wrote:

> On Tue, Mar 05, 2019 at 12:14:12PM +0000, Mark Brown wrote:
> > On Mon, Mar 04, 2019 at 01:20:25PM +0000, Guillaume Tucker wrote:
> >
> > > I'm not entirely sure how much flexibility Jenkins can offer in
> > > that respect, but at least having 3 versions of the monitor job
> > > that runs every hour, every day and every week should cover all
> > > the cases.  If possible, we may be able to implement something
> > > that dynamically schedules the next check for each branch.
> >
> > One big concern I have with this is latency.  One of the common cases
> > where people don't need builds all the time is when they're mainly
> > looking for checks before they submit pull requests.  For those if you
> > might have to wait almost a day before we even queue the build it might
> > be a bit of an issue.  If it was just a "skip this tree if we built it
> > in the past X time" check that wouldn't be such an issue, it's just if
> > the daily check runs at some fixed time each day or whatever that it
> > might add huge extra latency.
> >
> > Another idea I just thought of but I'm not sure is practical would be to
> > only check some trees if the Jenkins queue is less than some number of
> > builds - that way if we're busy we won't add extra load, but it feels
> > like it's more trouble than it's worth to implement fairly.
>

These are different use cases: some people have said that they
wanted their branches checked every morning (linusw) or every
Monday (media).  For real-time feedback, we need to do something
quite different indeed.  With the ability to whitelist some
defconfigs and arches, we could have quick builds for some
maintainers' trees to optimise the turnaround time / test
coverage ratio.

And yes, a mechanism to skip builds on arbitrary criteria would
seem to be quite hard to design with fair rules.  And at the end
of the day, if we don't have enough build power to cope with the
load, even the best rules would just be moving the problem
somewhere else.

What I think we should have is some kind of OOM system, to track
that if the queue keeps increasing from one day to the next then
some builds need to be killed, because there really isn't any
choice in that situation.  We're still running at about 75% of
our capacity and builds can be cancelled manually if anything
goes very wrong, so it's not a practical concern right now.
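As a rough illustration of the OOM-style check, assuming the queue
length is sampled once a day, the trigger condition could be as
simple as the sketch below.  The function name and the three-day
window are hypothetical choices, not part of any existing KernelCI
or Jenkins tooling.

```python
from typing import Sequence


def queue_keeps_growing(daily_queue_lengths: Sequence[int], days: int = 3) -> bool:
    """Return True if the build queue grew on each of the last `days` days.

    `daily_queue_lengths` is a hypothetical list of queue sizes sampled once
    per day, oldest first.  A sustained increase means the system cannot
    keep up and some queued builds would need to be cancelled.
    """
    if len(daily_queue_lengths) < days + 1:
        # Not enough history to decide yet.
        return False
    recent = daily_queue_lengths[-(days + 1):]
    # Every day must be strictly larger than the day before it.
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))
```

A single bad day does not trigger it; only a sustained increase
over the whole window does.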

> I wonder what happens when something doesn't fit in an
> hourly/daily/weekly box. It could also cause a daily/weekly bottleneck
> if they're all scheduled at the same time.
>

Everything kind of fits in an hourly box, because that's a small
enough interval to start building things shortly after a new
revision was pushed.  Having longer periods is useful for people
who don't want intermediate versions to go through the test
system, or don't want to set up a branch just for KernelCI.

> Perhaps each tree gets a cooling off period defined in e.g. seconds, and
> it could be defaulted to current behavior of 1 hour. If a tree is
> triggered but its cooling off period hasn't passed, the trigger is
> either ignored or deferred. This would also let us increase the
> frequency of the build trigger to something like every 5 minutes.
>

If we want that kind of speed then we should be using git hooks
to trigger builds directly rather than polling continuously.
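For comparison with the polling approach, the per-tree cooling-off
rule quoted above could be sketched roughly as follows, assuming
the time of the last build is known for each tree.  The names
`should_trigger` and `cooloff` and the one-hour default are
hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional

# Default matching the current behaviour of checking once an hour.
DEFAULT_COOLOFF = timedelta(hours=1)


def should_trigger(tree: str,
                   now: datetime,
                   last_build: Dict[str, datetime],
                   cooloff: Dict[str, timedelta]) -> bool:
    """Return True if `tree` may be built now.

    A trigger arriving before the tree's cooling-off period has elapsed is
    simply ignored here; a deferring variant would re-queue it instead.
    """
    previous: Optional[datetime] = last_build.get(tree)
    if previous is None:
        # Never built before: build straight away.
        return True
    period = cooloff.get(tree, DEFAULT_COOLOFF)
    return now - previous >= period
```

With this rule the trigger itself could run every few minutes, and
each tree's own period would decide whether anything gets queued.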

> This way, load is evened out a bit (no spikes when 'weekly' or even
> 'hourly' runs), kernelci is more realtime (to Mark's point), and
> configuration is granular and per-tree (we can still offer standard
> cool-off periods, of course).
>

The issue here is that it would be doing the opposite of what
some people want.  For example, when a branch gets updated
several times during a day, some developers don't want the first
ones to be tested but rather let the system wait until the
evening before doing a build so they get the results the next
morning (see Linus' comment on the GPIO thread).

Also, periodic checks don't have to all be at the same time, they
could be evened out within their polling period.
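One way to spread the checks, sketched here as an assumption rather
than anything KernelCI does today, is to derive a stable offset
within the polling period from each tree's name, so that "daily"
trees are checked at different times of the day.

```python
import hashlib
from datetime import timedelta


def poll_offset(tree: str, period: timedelta) -> timedelta:
    """Return a stable offset within `period` at which to check `tree`.

    Hashing the tree name spreads many trees across the polling period
    instead of starting them all at once.  hashlib is used rather than the
    built-in hash() because str hashes are randomised per process, which
    would move the offset every time the scheduler restarts.
    """
    digest = hashlib.sha256(tree.encode("utf-8")).digest()
    seconds = int(period.total_seconds())
    return timedelta(seconds=int.from_bytes(digest[:8], "big") % seconds)
```

The same tree always lands on the same slot, so maintainers still
get a predictable time for their results.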

> The only state that has to be tracked is the time of the last build per
> tree, though I have no idea about the ease of implementation.
>

The implementation will depend on a lot of things, e.g. the build
automation tool (Jenkins or other) and the backend / storage
server where previous builds are stored.  At the moment we're
keeping a file containing the commit SHA from the last time a
branch was sampled; it wouldn't be hard to add date and time
information to that, for example (or to use the file's
last-modified date).
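Assuming one small file per branch holding the last sampled commit
SHA (the path layout and function names below are hypothetical),
the time of the last check can come from the file's modification
time, so nothing else needs to be stored.  This version implements
Mark's "skip this tree if we built it in the past X time" rule.

```python
import os
import time
from typing import Optional, Tuple


def last_sample(path: str) -> Optional[Tuple[str, float]]:
    """Return (commit_sha, unix_time) of the last sample, or None if absent."""
    try:
        with open(path, encoding="utf-8") as f:
            sha = f.read().strip()
    except FileNotFoundError:
        return None
    # The file's modification time doubles as the time of the last check.
    return sha, os.path.getmtime(path)


def record_sample(path: str, sha: str) -> None:
    """Store the commit SHA; writing the file also updates its mtime."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(sha + "\n")


def needs_build(path: str, current_sha: str, min_interval: float) -> bool:
    """Build if the branch moved and at least `min_interval` seconds passed."""
    previous = last_sample(path)
    if previous is None:
        # Branch never sampled before.
        return True
    sha, mtime = previous
    return sha != current_sha and time.time() - mtime >= min_interval
```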

The first thing to do imo is to have the ability to specify how
often a tree should be checked in the YAML config file.  We could
reuse cron's syntax for that, or have simple but rough values
like "daily" to let the implementation decide when to actually
do the checks.
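A minimal sketch of the "simple but rough values" option, assuming
a hypothetical `poll` key on each tree entry in the YAML config
(shown in the comment; the config is represented as an already
parsed dict).  Cron syntax is not handled here.

```python
from datetime import timedelta
from typing import Optional

# Hypothetical keywords a tree entry in the YAML config could use, e.g.
#   trees:
#     gpio:
#       poll: daily
# The implementation is then free to choose when within the period to check.
POLL_PERIODS = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}


def poll_period(tree_config: dict) -> Optional[timedelta]:
    """Return the polling period for a tree, or None if it is not polled.

    None means no interval was set, e.g. because builds for that tree are
    triggered by a git hook instead of polling.
    """
    value = tree_config.get("poll")
    if value is None:
        return None
    try:
        return POLL_PERIODS[value]
    except KeyError:
        raise ValueError(f"unknown polling period: {value!r}") from None
```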

As mentioned before, in some cases we could be using a hook to
trigger a build directly from the git server every time a commit
is pushed (like a real CI system).  We should also have a way to
define that in the YAML config, probably by not setting any
polling interval.
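For the hook case, git runs a `post-receive` hook on the server
after a push and passes one "<old-sha> <new-sha> <ref-name>" line
per updated ref on stdin.  The sketch below only parses that input;
`trigger_build` is a hypothetical placeholder for whatever the
build automation provides, not an existing KernelCI interface.

```python
from typing import Iterable, List, Tuple

HEADS = "refs/heads/"


def pushed_refs(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse post-receive input into (branch, new_sha) pairs worth building.

    Deleted refs (new SHA made only of zeros) and non-branch refs such as
    tags are skipped, since there is nothing to build for them.
    """
    refs = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue
        _old, new, ref = parts
        if ref.startswith(HEADS) and set(new) != {"0"}:
            refs.append((ref[len(HEADS):], new))
    return refs


def trigger_build(branch: str, sha: str) -> None:
    # Placeholder: a real hook would ask the build automation
    # (e.g. Jenkins) to start a build of this revision.
    print(f"would build {branch} at {sha}")
```

Installed as `hooks/post-receive`, the script would end with a loop
over `pushed_refs(sys.stdin)` calling `trigger_build` for each entry.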

Guillaume
