From: Pavel Machek <pavel@ucw.cz>
To: Jiri Kosina <jkosina@suse.cz>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Linux-pm mailing list <linux-pm@lists.osdl.org>,
	kernel list <linux-kernel@vger.kernel.org>,
	Chris Wilson <chris@chris-wilson.co.uk>,
	intel-gfx <intel-gfx@lists.freedesktop.org>
Subject: Bisecting the heisenbugs (was Re: 3.15-rc: regression in suspend)
Date: Tue, 10 Jun 2014 13:50:45 +0200	[thread overview]
Message-ID: <20140610115045.GA6019@amd.pavel.ucw.cz> (raw)
In-Reply-To: <alpine.LNX.2.00.1406091302560.20782@pobox.suse.cz>

[-- Attachment #1: Type: text/plain, Size: 2709 bytes --]

On Mon 2014-06-09 13:03:31, Jiri Kosina wrote:
> On Mon, 9 Jun 2014, Pavel Machek wrote:
> 
> > > > Strange. It seems 3.15 with the patch reverted only boots in 30% or so
> > > > cases... And I've seen resume failure, too, so maybe I was just lucky
> > > > that it worked for a while.
> > > 
> > > git bisect really likes 25f397a429dfa43f22c278d0119a60 - you're about
> > > the 5th report or so that claims this is the culprit but it's
> > > something else. The above code is definitely not used in i915 so bogus
> > > bisect result.
> > 
> > Note I did not do the bisect, I only attempted revert and test.
> > 
> > And did three boots of successful s2ram.. only to find out that it
> > does not really fix s2ram, I was just lucky :-(.
> > 
> > Unfortunately, this means my s2ram problem will be tricky/impossible
> > to bisect :-(.
> 
> Welcome to the situation I have been in for past several months.

I attempted to do some analysis. It should be possible to bisect even
when tests are not reliable, but it will take time, and it will be
almost necessary to have the bisection automated.

How long does a single test run take for you at 50% test reliability
(i.e. a run that catches the bug about half the time)?

It seems to be one minute here.

A trivial strategy is to repeat each test until the verdict is 99%
reliable. That should make the whole bisection roughly 2x more
expensive (the simulation output below bears this out); a sketch of
the arithmetic follows.
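
If a broken kernel still passes a single run with probability p, you
need k consecutive passes with p^k <= 0.01 before a "good" verdict is
99% trustworthy. A minimal sketch of that arithmetic, using the 50%
reliability from the attached script (variable names other than
p_false_success are just for illustration):

    import math

    p_false_success = 0.5   # a broken kernel still passes half the time
    confidence = 0.01       # accept <= 1% chance of a wrong "good" verdict

    runs = int(math.ceil(math.log(confidence) / math.log(p_false_success)))
    print runs, "passing runs needed per 'good' verdict"   # -> 7

This matches the `while p_bad > .01' loop in the attached script.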

Other strategies are possible -- such as selecting bisect points
closer to the "bad" end, or a trickier "let's compute probabilities
for each point" approach -- and they work well for some parameter
settings; a tiny sketch of the first idea follows, and both are
implemented in the attached script. There is probably an even better
strategy out there... if you have an idea, you can try it below.
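
"Closer to the bad end" just means picking the next test point
asymmetrically; a minimal sketch (the 30/70 weights are the ones the
Trisector class in the attached script uses, the function name is only
for illustration):

    def pick_point(good, bad):
        # Bias towards the known-bad end: a failure there is conclusive,
        # while a pass has to be repeated before it can be trusted.
        return (good * 6 + bad * 14) / 20

    print pick_point(0, 1023)   # -> 716, instead of the usual midpoint 511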

A Monte Carlo simulation is attached.

Bisector on reliable bug
-----
1024 versions bug with probability of  0  false success, monte carlo
of  30000  tries
Assume compilation takes  6 minutes and test takes 1 minutes
Average cost  71.0522 minutes
Average tests  9.99793333333
Bisector
-----
1024 versions bug with probability of  0.5  false success, monte carlo
of  30000  tries
Assume compilation takes  6 minutes and test takes 1 minutes
Average cost  143.393933333 minutes
Average tests  44.5374666667
Trisector
-----
1024 versions bug with probability of  0.5  false success, monte carlo
of  30000  tries
Assume compilation takes  6 minutes and test takes 1 minutes
Average cost  160.554 minutes
Average tests  39.9552666667
Strange
-----
1024 versions bug with probability of  0.5  false success, monte carlo
of  3000  tries
Assume compilation takes  6 minutes and test takes 1 minutes
Average cost  246.658 minutes
Average tests  38.412


									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: tricky_bisect.py --]
[-- Type: text/x-python, Size: 4321 bytes --]

#!/usr/bin/python
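#
# Monte Carlo comparison of bisection strategies when the bug only
# reproduces unreliably (Python 2, needs numpy).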

import random
import numpy

class Devil:
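    # Models the flaky test environment: it knows which version first
    # broke, charges compile/test time for every run, and lets a broken
    # version falsely pass with probability p_false_success.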
    def init(m):
        m.versions = 1024
        # Costs in minutes
        m.cost_compile = 6
        m.cost_test = 1
        # Penalty for wrongly identifying a commit.
        m.cost_failure = 1000
        # 0. == nicely behaved bug which always triggers
        m.p_false_success = .5
        m.verbose = 2

    def init_run(m):
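        # Place the first broken version at random and reset the
        # per-run counters.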
        m.broken = random.randint(0, m.versions-1)
        m.tests = 0
        m.last_ver = -1
        m.cost = 0

    def test_failed(m, ver):
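        # Run one test of version 'ver'; return 1 if it visibly failed,
        # 0 if it passed.  Versions at or after the broken one still pass
        # with probability p_false_success (a false success); earlier
        # versions always pass.  Compile cost is charged only when the
        # tested version changes.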
        m.tests += 1
        if ver != m.last_ver:
            m.cost += m.cost_compile
        m.cost += m.cost_test
        m.last_ver = ver
        if m.verbose > 1:
            print "   testing version ", ver, "(tests %d, cost %d)" % (m.tests, m.cost),
        if ver >= m.broken:
            if m.verbose > 1:
                print "(bad)",
            if random.random() > m.p_false_success:
                if m.verbose > 1:
                    print "FAIL"
                return 1
        if m.verbose > 1:
            print "pass"
        return 0

    def evaluate(m):
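        # Run one complete bisection and add a large penalty to the cost
        # if the strategy identified the wrong commit.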
        ver = m.run()
        if ver == m.broken:
            if m.verbose:
                print "success"
        else:
            if m.verbose:
                print "FAILURE"
            m.cost += m.cost_failure

class Bisector(Devil):
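    # Plain bisection, except that a midpoint is only declared good once
    # the chance that a broken version passed every run drops below 1%.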
    def init_run(m):
        Devil.init_run(m)
        m.good = 0
        m.bad = m.versions-1

    def run(m):
        while m.good+1 < m.bad:
            ver = (m.good + m.bad) / 2
            p_bad = 1
            failed = 0
            while p_bad > .01:
                if m.test_failed(ver):
                    m.bad = ver
                    failed = 1
                    break
                p_bad *= m.p_false_success
            if not failed:
                m.good = ver
        return m.bad

class Trisector(Devil):
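    # Like Bisector, but picks the test point 70% of the way towards the
    # "bad" end (a failing test is conclusive, a passing one is not).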
    def init_run(m):
        Devil.init_run(m)
        m.good = 0
        m.bad = m.versions-1

    def run(m):
        while m.good+1 < m.bad:
            ver = (m.good*6 + m.bad*14) / 20
            p_bad = 1
            failed = 0
            while p_bad > .01:
                if m.test_failed(ver):
                    m.bad = ver
                    failed = 1
                    break
                p_bad *= m.p_false_success
            if not failed:
                m.good = ver
        return m.bad


class Strange(Devil):
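    # Keeps a per-version estimate of the probability that the version is
    # broken, updates it after every test, and picks test points between a
    # probably-good version and the known-bad end.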
    def init_run(m):
        Devil.init_run(m)
        m.good = 0
        m.bad = m.versions-1
        m.prob_bad = numpy.zeros([m.versions], float)
        m.prob_bad[:m.versions] = .9

    def ask_for(m, ver):
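        # Test 'ver' once and update the estimates: on a failure everything
        # from 'ver' up is known bad; on a pass the probability that
        # anything up to 'ver' is bad shrinks by p_false_success.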
        if m.test_failed(ver):
            m.bad = ver
            m.prob_bad[:ver+1] /= m.prob_bad[ver]
            m.prob_bad[ver:] = 1
            return

        m.prob_bad[:ver+1] *= m.p_false_success
        m.good = ver

    def last_good(m, prob):
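        # Last version before 'bad' whose estimated probability of being
        # broken is at most 'prob' (0 if there is none).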
        g = 0
        for i in range(m.bad):
            if m.prob_bad[i] <= prob:
                g = i
        return g

    def run(m):
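        # Test midpoints between a probably-good version and the known-bad
        # end until the version just before 'bad' is almost certainly good.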

        while m.last_good(.01)+1 < m.bad:
            if m.verbose > 1:
                print m.prob_bad
            m.good = m.last_good(.5)
            ver = (m.good*10 + m.bad*10) / 20
            m.ask_for(ver)
            m.good = m.last_good(.1)
            ver = (m.good*10 + m.bad*10) / 20
            m.ask_for(ver)


        return m.bad

def monte_carlo(bis, tries = 30000):
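    # Repeat the whole bisection 'tries' times with a randomly placed bug
    # and report the average cost (in minutes) and number of test runs.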
    total_cost = 0.
    total_tests = 0.
    for i in range(tries):
        bis.init_run()
        if tries > 500:
            bis.verbose = 0
        bis.evaluate()
        total_cost += bis.cost
        total_tests += bis.tests

    print "-----"
    print bis.versions, "versions bug with probability of ", bis.p_false_success, " false success, monte carlo of ", tries, " tries"
    print "Assume compilation takes ", bis.cost_compile, "minutes and test takes", bis.cost_test, "minutes"
    print "Average cost ", total_cost / tries, "minutes"
    print "Average tests ", total_tests / tries

print "Bisector on reliable bug"
bis = Bisector()
bis.init()
bis.p_false_success = 0
monte_carlo(bis)

print "Bisector"
bis = Bisector()
bis.init()
monte_carlo(bis)

print "Trisector"
bis = Trisector()
bis.init()
monte_carlo(bis)

print "Strange"
bis = Strange()
bis.init()
monte_carlo(bis, 3000)





Thread overview: 33+ messages
2014-05-13 16:09 3.15-rc: regression in suspend Pavel Machek
2014-05-13 16:19 ` Bjørn Mork
2014-05-13 16:50   ` Pavel Machek
2014-05-13 16:37 ` Pavel Machek
2014-05-13 21:41   ` Pavel Machek
2014-05-14 12:39     ` Jiri Kosina
2014-05-14 15:57       ` Pavel Machek
2014-05-14 16:10         ` Jiri Kosina
2014-05-14 20:31           ` Pavel Machek
2014-05-14 18:20         ` Pavel Machek
2014-05-15 15:29           ` Jiri Kosina
2014-05-15 15:31             ` Daniel Vetter
2014-06-07 12:05               ` Pavel Machek
2014-06-07 12:06               ` Pavel Machek
2014-06-07 23:11                 ` Pavel Machek
2014-06-09  9:25                   ` Daniel Vetter
2014-06-09 10:23                     ` Pavel Machek
2014-06-09 11:03                       ` Jiri Kosina
2014-06-10 11:50                         ` Pavel Machek [this message]
2014-06-21 20:29                         ` 3.16, i915: less colors in X? Pavel Machek
2014-06-21 20:35                           ` Pavel Machek
2014-06-21 21:06                           ` Chris Wilson
2014-06-22 14:26                             ` Pavel Machek
2014-06-22 15:11                               ` regression: 3.16, i915: less colors in X?, caused by 773875bfb6737982903c42d1ee88cf60af80089c Pavel Machek
2014-06-21 21:16                           ` 3.16, i915: less colors in X? Martin Steigerwald
2014-06-25 22:35                         ` 3.15-rc: regression in suspend Pavel Machek
2014-06-27 13:37                           ` Jiri Kosina
2014-07-07  8:39                             ` Daniel Vetter
2014-07-11 19:26                               ` Pavel Machek
2014-07-11 23:33                                 ` Jiri Kosina
2014-08-07 12:47                                 ` Jiri Kosina
2014-08-07 12:54                                   ` Jiri Kosina
2014-08-07 14:36                                   ` [Intel-gfx] " Daniel Vetter
