
From: Meketon, Marc
Subject: Re: [Help-glpk] Interior point failure
Date: Thu, 20 Dec 2012 08:18:08 -0600
In general, it is rare that solving the dual of a problem takes less time, so I would not recommend an approach that tests whether the dual is more numerically stable. The L1 regression problem is a rare exception in which directly solving the dual makes sense.

The chief issue with the original formulation of the L1 regression is that there are 2 dense columns in the "A" matrix. The computational burden of interior point methods is the Cholesky factorization of a matrix that looks like (in terms of density) AA^T = A times the transpose of A. This factorization occurs once per iteration. Even a single dense column in A creates a completely dense AA^T, which is a lot of computation and can lead to numerical difficulties. For 1000 points in the L1 regression, taking the Cholesky factorization of a dense 1000 x 1000 matrix is a huge computation (something like 2 billion floating point operations). The dual formulation of this particular problem results in AA^T being a small 2 x 2 matrix, so there is a reduction in computation of several orders of magnitude.

The real issue is how to handle dense columns. Some interior point implementations separate out dense columns by splitting the column, and this seems to generally work well. Experiments with the Sherman-Morrison formula (Wikipedia calls it the Woodbury matrix identity), which can algebraically separate out dense columns, generally don't work very well numerically in practice, although in your case it would have worked magic and would have been better than the column splitting.

Note that the GLPK documentation does point out the issue of dense columns and numerical instability. It says:

“The routine glp_interior implements an easy version of the primal-dual interior-point method based on Mehrotra's technique. Note that currently the GLPK interior-point solver does not include many important features, in particular:

- it is not able to process dense columns. Thus, if the constraint matrix of the LP problem has dense columns, the solving process may be inefficient;

- it has no features against numerical instability. For some LP problems premature termination may happen if the matrix ADA^T becomes singular or ill-conditioned;

- it is not able to identify the optimal basis, which corresponds to the interior-point solution found.”

----- Original Message -----

Solving the dual as Marc suggested works much better. There were no failures even w/ 10,000 points, whereas the original formulation consistently fails w/ 1000 points.
This may be hopelessly naive of me, but would it make sense to generate the dual and restart if an IPM run fails? Or, alternatively, is there a criterion that could be applied at startup to determine when solving the dual would be more appropriate?

BTW, I'm using the scripts for this as an example of Unix batch execution for the wikibook. Unfortunately, a system hang at my end resulted in losing an hour's work just before I was going to save it. I've started redoing it but still have to rewrite the explanation and add the model that Marc contributed.

Have Fun!

Reg
_______________________________________________
Help-glpk mailing list
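The "solve the dual directly" approach discussed in the thread can be sketched as follows. This is an illustration only, using SciPy's LP solver rather than GLPK (assumptions: scipy >= 1.7 for the HiGHS backend, and the small outlier dataset is made up for the example):

```python
# Fit y ~ a + b*x in the L1 sense by solving the dual LP directly:
#     maximize y^T u   s.t.  sum(u_i) = 0,  sum(x_i*u_i) = 0,  -1 <= u_i <= 1
# Its constraint matrix has only 2 rows, so A*A^T is a tiny 2 x 2 matrix.
import numpy as np
from scipy.optimize import linprog

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 100.0])   # y = 1 + 2x, plus one gross outlier

E = np.vstack([np.ones_like(x), x])    # the two (dense) dual equality rows

# linprog minimizes, so negate y to maximize y^T u.
res = linprog(c=-y, A_eq=E, b_eq=np.zeros(2), bounds=(-1.0, 1.0),
              method="highs")
assert res.status == 0

# By strong duality, the dual optimum equals the minimal sum of absolute
# residuals of the primal fit.
print("optimal L1 objective:", -res.fun)

# The multipliers on the two equality constraints recover the fit (a, b),
# up to the solver's sign convention for equality marginals.
print("equality marginals:", res.eqlin.marginals)
```

For this dataset the best L1 line passes through the three collinear points, so the optimal objective is the outlier's residual, 93. The L1 fit is unmoved by the outlier, which is exactly why this formulation is worth the LP machinery.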