Re: Generalised bisection

From: Ealdwulf Wuffinga <ealdwulf@googlemail.com>
To: Steven Tweed <orthochronous@gmail.com>
Cc: Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	John Tapsell <johnflux@gmail.com>,
	Christian Couder <chriscool@tuxfamily.org>,
	Git List <git@vger.kernel.org>
Subject: Re: Generalised bisection
Date: Sun, 15 Mar 2009 19:16:16 +0000
Message-ID: <efe2b6d70903151216q4a8881e5t797cf5d3bebc5697@mail.gmail.com>
In-Reply-To: <d9c1caea0903130819u770686b1w867f074ffef8fabf@mail.gmail.com>

On Fri, Mar 13, 2009 at 3:19 PM, Steven Tweed <orthochronous@gmail.com> wrote:

> Underflow when using probabilities and lack of precision (rather than
> overflow) when using negated logarithms are well-known problems in the
> kind of probabilistic object tracking, inference in graphical networks
> and object identification processes I work with (in computer vision).
> There may well be other areas of Bayesian decision theory where this
> doesn't happen, and indeed a _very_ quick scan through your document
> suggests that you're adding to tallying information on each timestep
> and recalculating the entire model from those tallies, which is one of
> the few cases where you can't really do rescaling. I'll try and have a
> more detailed read over the weekend.

That is useful information, thanks.

It is not obvious how to perform this algorithm incrementally, because
of the need to marginalise out the fault rate. As I understand it,
marginalisation has to be done after you have incorporated all your
information into the model, which means we can't use the usual
Bayesian updating.

> On Fri, Mar 13, 2009 at 12:49 PM, Ealdwulf Wuffinga
> <ealdwulf@googlemail.com> wrote:
>> One issue in BBChop, which should be easy to fix, is that I use a
>> dumb way of calculating Beta functions. These are ratios of
>> factorials, so the subexpressions get stupidly big very quickly. But
>> I don't think that is the only problem.
>
> Yes, "Numerical Recipes" seems to suggest that computing with
> log-factorials and exponentiating works reasonably, although I've
> never tried it and NR does occasionally get things completely wrong...

I have implemented this, and it does indeed allow the program to work
in more cases without underflow, using ordinary floating point.
However, I think underflow can still occur in plausible use cases.
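
For readers following along, here is a minimal sketch of the
log-factorial approach discussed above, using only the Python standard
library. The function names log_beta and beta_float are hypothetical
and are not taken from BBChop; the sketch assumes positive arguments.
It shows that the logarithm of Beta stays finite even when the value
itself is too small for an ordinary float.

```python
import math

def log_beta(x, y):
    """Natural log of Beta(x, y), computed from log-gamma values.

    Hypothetical helper (not BBChop's code); assumes x > 0 and y > 0.
    Working with logs avoids forming the huge factorials directly.
    """
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)

def beta_float(x, y):
    """Beta(x, y) as an ordinary float, by exponentiating log_beta.

    For large arguments the true value is smaller than the smallest
    representable float, and math.exp silently returns 0.0.
    """
    return math.exp(log_beta(x, y))

# Small arguments: Beta(2, 3) = 1! * 2! / 4! = 1/12.
small = beta_float(2, 3)

# Large arguments: the log is a finite negative number,
# but exponentiating it underflows to zero.
log_large = log_beta(600, 600)
large = beta_float(600, 600)
```

So the log-domain trick removes the overflow in the intermediate
factorials, but any step that needs the Beta value itself as a float
can still underflow.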

The problem is still the Beta function. In BBChop it is always passed
D and T, where D is the sum of the number of detecting observations in
some of the revisions, and T is the same for nondetecting
observations. Beta(x,y) underflows a Python float if both x and y are
> ~550, and also in other cases when one is smaller and the other
larger. BBChop never looks again at a revision if the bug has been
observed there, but if there are a large number of revisions, it might
look at enough of them to cause a problem.

Obviously no-one is going to manually do hundreds of observations, but
I want BBChop to work in the case where someone runs it on a machine
in the corner for a few days, or even weeks, to track down a bug which
occurs too infrequently to bisect manually.

Which means I'm still stuck with mpmath, or some equivalent.
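
As an illustration of what "some equivalent" could look like, here is
a rough sketch using exact rational arithmetic from the Python
standard library instead of mpmath. This is an assumption-laden
example, not what BBChop does: it only handles positive integer
arguments (where Beta is a ratio of factorials), and the name
exact_beta is hypothetical.

```python
from fractions import Fraction
from math import factorial

def exact_beta(x, y):
    """Exact Beta(x, y) for positive integers, as a Fraction.

    Hypothetical helper (not BBChop's code). Uses the identity
    Beta(x, y) = (x-1)! * (y-1)! / (x+y-1)!, which holds only for
    positive integer x and y. Python integers are arbitrary
    precision, so the result never underflows.
    """
    if x < 1 or y < 1:
        raise ValueError("exact_beta assumes positive integer arguments")
    return Fraction(factorial(x - 1) * factorial(y - 1),
                    factorial(x + y - 1))

# The value is tiny but remains exactly representable and non-zero.
b = exact_beta(600, 600)

# Ratios of such values are well-scaled and convert safely to float:
# Beta(x+1, y) / Beta(x, y) = x / (x + y).
ratio = float(exact_beta(601, 600) / b)
```

The trade-off is speed: exact fractions with numerators and
denominators of thousands of digits are much slower than floats, so
this is only a sketch of the idea, not a recommendation.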

Ealdwulf