To: Andrew Morton
cc: Jeff Garzik, torvalds@osdl.org, clameter@sgi.com, hugh@veritas.com,
    benh@kernel.crashing.org, nickpiggin@yahoo.com.au, linux-mm@kvack.org,
    linux-ia64@vger.kernel.org, linux-kernel@vger.kernel.org
Reply-To: Gerrit Huizenga
From: Gerrit Huizenga
Subject: Re: page fault scalability patch V12 [0/7]: Overview and performance tests
In-reply-to: Your message of Wed, 01 Dec 2004 23:02:17 PST.
    <20041201230217.1d2071a8.akpm@osdl.org>
Date: Thu, 02 Dec 2004 08:24:04 -0800
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 01 Dec 2004 23:02:17 PST, Andrew Morton wrote:
> Jeff Garzik wrote:
> >
> > Andrew Morton wrote:
> > > We need to be achieving higher-quality major releases than we did in
> > > 2.6.8 and 2.6.9.  Really the only tool we have to ensure this is longer
> > > stabilisation periods.
> >
> > I'm still hoping that distros (like my employer) and orgs like OSDL will
> > step up, and hook 2.6.x BK snapshots into daily test harnesses.
>
> I believe that both IBM and OSDL are doing this, or are getting geared up
> to do this.  With both Linus bk and -mm.
>
> However I have my doubts about how useful it will end up being.  These test
> suites don't seem to pick up many regressions.  I've challenged Gerrit to
> go back through a release cycle's bugfixes and work out how many of those
> bugs would have been detected by the test suite.
>
> My suspicion is that the answer will be "a very small proportion", and that
> really is the bottom line.

Yeah, sort of what Martin said.  LTP, for instance, doesn't find much of
what is in our internal bugzilla or the bugme database.  Automated testing
also tends not to cover the range of desktop peripherals and drivers,
which make up a large share of the code but get very little test coverage.
Our stress testing is extensive and was finding three-year-old problems
when we first ran it, but those kinds of tests are expensive to run
(machines, people, data analysis), so we typically run them against
distros rather than mainline, to help validate distro quality.

However, that said, the LTP stuff is still *necessary* - it would catch
quite a number of regressions if we were to regress.  The good news is
that most changes today haven't been leading to regressions.  That could
change at any time, and one of the keys is to make sure that when we do
find a regression we get a test into LTP so that that particular
regression never happens again.

I haven't looked at the code coverage numbers for LTP in a while, but it
actually gets high line coverage of the core kernel.  I don't remember
whether it was over 80%, but usually 85-88% is the point of diminishing
returns for a regression suite.

I think a more important proactive step here is to understand what
regressions we *do* have and whether or not we can construct a test that
will catch that regression (or better, a class of regressions) in the
future.  And maybe we need some kind of filter person or group for lkml
who can see what the key regressions are (e.g. akpm, if you know of a set
of regressions that you are working on, maybe periodically send those to
the LTP mailing list) so that we could focus on creating tests for those
regressions.
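
To make that concrete, here is roughly the shape such a test takes.  This
is a minimal, self-contained sketch in plain C rather than against LTP's
real harness (an actual submission would use LTP's reporting conventions
instead of printf and exit codes).  It faults in anonymous memory, checks
that freshly faulted pages read back as zeros, then dirties each page -
the kind of narrow check that pins a page-fault regression down once
we've seen it:

/*
 * Sketch of a regression-style test (not an actual LTP test case):
 * exercise the anonymous zero-fill and write fault paths.
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
        size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
        size_t len = 64 * pagesz;
        unsigned char *p;
        size_t i;

        p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {
                perror("mmap");
                return 2;       /* setup broke, not a test failure */
        }

        /* Every freshly faulted anonymous page must read back as zeros. */
        for (i = 0; i < len; i++) {
                if (p[i] != 0) {
                        fprintf(stderr, "FAIL: byte %zu not zero-filled\n", i);
                        return 1;
                }
        }

        /* Dirty each page and make sure the write faults stick. */
        for (i = 0; i < len; i += pagesz)
                p[i] = 0xaa;
        for (i = 0; i < len; i += pagesz) {
                if (p[i] != 0xaa) {
                        fprintf(stderr, "FAIL: write to page %zu lost\n",
                                i / pagesz);
                        return 1;
                }
        }

        munmap(p, len);
        printf("PASS: anonymous fault paths behave\n");
        return 0;
}

The point isn't this particular check - it's that every regression we
chase down should leave a small, cheap test like this behind it.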

We are also working to set up large ISV applications in a couple of
spots - one effort inside IBM and a similar one underway at OSDL.  Those
ISV applications will exercise a class of real-world usage models and
also check for regressions.  I don't know if it is possible to set up a
better testing environment for the wild, whacky and weird things that
people do, but, yes, bless them.  ;-)

> We simply get far better coverage testing by releasing code, because of all
> the wild, whacky and weird things which people do with their computers.
> Bless them.
>
> > Something like John Cherry's reports to lkml on warnings and errors
> > would be darned useful.  His reports are IMO an ideal model: show
> > day-to-day _changes_ in test results.  Don't just dump a huge list of
> > testsuite results, results which are often clogged with expected
> > failures and testsuite bug noise.
>
> Yes, we need humans between the tests and the developers.  Someone who has
> good experience with the tests and who can say "hey, something changed
> when I do X".  If nothing changed, we don't hear anything.
>
> It's a developer role, not a testing role.  All testing is, really.

Yep.  However, smart developers keep writing scripts to automate the rote
and mundane tasks that they hate doing.  Towards that end, there was a
recent effort at Bull on the NPTL work which serves as a very interesting
model:

http://nptl.bullopensource.org/Tests/results/run-browse.php

Basically, you can compare the results of any test run with any other and
get a summary of the differences.  That gives a quick status check and
helps you focus on the right issues when tracking down defects.

gerrit
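
P.S. For anyone who wants to approximate that kind of comparison locally,
the core of it is just a per-test diff of status between two runs.  Here
is a hypothetical sketch - it assumes flat result files with one
"testname PASS|FAIL" line per test, which is not Bull's format or code,
just an illustration:

/*
 * diffruns.c - summarize what changed between two test runs.
 * Usage: ./diffruns old-run.txt new-run.txt
 * Each input file is assumed to hold one "testname STATUS" line per test.
 */
#include <stdio.h>
#include <string.h>

#define MAX_TESTS 4096

struct result {
        char name[128];
        char status[16];
        int seen_in_new;
};

static struct result old_run[MAX_TESTS];
static int nold;

static int load_old(const char *path)
{
        FILE *f = fopen(path, "r");

        if (!f)
                return -1;
        while (nold < MAX_TESTS &&
               fscanf(f, "%127s %15s", old_run[nold].name,
                      old_run[nold].status) == 2)
                nold++;
        fclose(f);
        return 0;
}

static struct result *find_old(const char *name)
{
        int i;

        for (i = 0; i < nold; i++)
                if (strcmp(old_run[i].name, name) == 0)
                        return &old_run[i];
        return NULL;
}

int main(int argc, char **argv)
{
        char name[128], status[16];
        struct result *prev;
        FILE *f;
        int i;

        if (argc != 3 || load_old(argv[1]) < 0) {
                fprintf(stderr, "usage: %s old-run.txt new-run.txt\n", argv[0]);
                return 2;
        }

        f = fopen(argv[2], "r");
        if (!f) {
                perror(argv[2]);
                return 2;
        }

        /* Report only what changed between the two runs. */
        while (fscanf(f, "%127s %15s", name, status) == 2) {
                prev = find_old(name);
                if (!prev) {
                        printf("%s: new test, %s\n", name, status);
                        continue;
                }
                prev->seen_in_new = 1;
                if (strcmp(prev->status, status) != 0)
                        printf("%s: %s -> %s\n", name, prev->status, status);
        }
        fclose(f);

        for (i = 0; i < nold; i++)
                if (!old_run[i].seen_in_new)
                        printf("%s: missing from new run (was %s)\n",
                               old_run[i].name, old_run[i].status);
        return 0;
}

Run it with yesterday's and today's result files and the only output is
what changed, which is the same signal John Cherry's warning reports give.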