linux-kernel.vger.kernel.org archive mirror
* Re: Are we going too fast?
@ 2001-08-13 21:44 PinkFreud
  2001-08-14  0:04 ` PinkFreud
  0 siblings, 1 reply; 51+ messages in thread
From: PinkFreud @ 2001-08-13 21:44 UTC (permalink / raw)
  To: linux-kernel; +Cc: Alan Cox

> > Unfortunately, that's all the info I have.  Console switching was still
> > working, so I tried enabling logging to a console - no output.  System just
> > hangs.  Any suggestions on what I might try to get more information for you?
> 
> Don't suppose you know where I can get a qnx file system to play with ?

Same place I got it.  http://get.qnx.com/

> > this thread that perhaps some old HOWTOs on hardware need to be maintained
> > again - I think I agree with that.
> 
> VIA has some chipset bugs, Matrox G400 cards seem to abuse the PCI spec for 
> benchmarketing dirties.
> 
> (All chipsets have bugs in truth, it's just how they appear and if they
> affect users. As of 2.4.8 the VIA ones should be in the users not affected
> camp)

I'll give 2.4.8 a try on the SMP box, and let you know the outcome.

> Alan


	Mike Edwards

Brainbench certified Master Linux Administrator
http://www.brainbench.com/transcript.jsp?pid=158188
-----------------------------------
Unsolicited advertisments to this address are not welcome.


* Re: Are we going too fast?
@ 2001-08-16 21:42 PinkFreud
  0 siblings, 0 replies; 51+ messages in thread
From: PinkFreud @ 2001-08-16 21:42 UTC (permalink / raw)
  To: linux-kernel; +Cc: Dr. Kelsey Hudson, Alan Cox

> On Mon, 13 Aug 2001, PinkFreud wrote:
> 
> > This even happens after the BIOS flash - the first few times I switched
> > consoles, it actually survived.  After that, it locked up again.
> 
> Could you perchance be running a framebuffer console?

Not at the moment.  I have tried in the past, curious to see if it resolved
the problem - it didn't.

Again, this only seems to happen in the 2.4 kernels (I just read that 2.4.9
has been released, but haven't tried it yet).  2.2.19 is stable when doing this.

Alan, do you have any more thoughts on why this might be happening?

Thanks.


	Mike Edwards

Brainbench certified Master Linux Administrator
http://www.brainbench.com/transcript.jsp?pid=158188
-----------------------------------
Unsolicited advertisments to this address are not welcome.


* Re: Are we going too fast?
@ 2001-08-15 20:13 Roy Murphy
  0 siblings, 0 replies; 51+ messages in thread
From: Roy Murphy @ 2001-08-15 20:13 UTC (permalink / raw)
  To: linux-kernel

'Twas brillig when Mike Edwards scrobe:
>I think that's a bit unfair. Rather, I suspect people see the
>word 'stable', and assume, for some unknown reason, that the kernel is
>stable. *AHEM*

Whatever truth value 2.4 has for the variable stable, it can not be stored in
a boolean type.

'Stable' means that the direction of development is intended to reduce the number
of bugs, not to add new features, unless they can reasonably be shown not to
introduce major bugs.  That the 2.5 tree has not been opened indicates the
recognition that additional concentrated work on 2.4 is needed.

>Seriously, though - even distributions are including 2.4 kernels now. 
>RedHat, Mandrake, Slackware ... Should the latest versions of these 
>distributions be considered unstable as well? 

Even older releases of distributions are maintained.  Should we ever get to
kernel 2.2.38, the distribution maintainers should be releasing bugfix patches
for older distributions with the latest 2.2 kernel.

>Perhaps it needs to be made clear to people that these kernels still 
>aren't all they could be. 

No kernel is perfect.  The judgement was that it was ready to switch from adding
features to increasing stability.  Thus it has ever been since my first kernel
~= 0.95.
-- 
Roy Murphy      \ CSpice -- A mailing list for Clergy Spouses
murphy@panix.com \  http://www.panix.com/~murphy/CSpice.html

* Re: Are we going too fast?
@ 2001-08-14 20:20 Per Jessen
  0 siblings, 0 replies; 51+ messages in thread
From: Per Jessen @ 2001-08-14 20:20 UTC (permalink / raw)
  To: David Ford; +Cc: linux-kernel

On Tue, 14 Aug 2001 15:54:33 -0400, David Ford wrote:

>Per Jessen wrote:
>
>>>On Mon, 13 Aug 2001 14:11:32 +0100 (BST), Alan Cox wrote:
>>>
>>>If you want maximum stability you want to be running 2.2 or even 2.0. Newer
>>>less tested code is always less stable. 2.4 won't be as stable as 2.2 for a
>>>year yet.
>>>
>>
>>Couldn't have put that any better. On mission-critical systems, this is
>>exactly what people do. Personally, my experience is from the big-iron
>>world of S390 -  if you're a bleeding-edge organisation, you'll be out
>>there applying the latest PTFs, you'll be running the latest OS/390 etc. 
>>If you're conservative, you're at least 2, maybe 3 releases (in today's
>>OS390 this means about 18-24 months) behind. If you're ultra-conservative,
>>you'll wait for the point where you can no longer avoid an upgrade.
>>
>
>Unfortunately, this methodology also introduces another important 
>factor.  You are the most likely target for exploits and 
>vulnerabilities.  As is ever so strongly evidenced by the great numbers 
>of people being exploited because the version of software they have is 
>outdated.
>
>It's a gross measure of risks; where does the risk come from, how can it 
>affect you, and what can you do about it.

Completely agree. This is also why most big-iron shops employ a couple
of people to do change-management, AKA risk-management. And of course
they'll try to evaluate the risk in moving to the latest OS390 and/or Linux  
versus possible external threats. In the OS390 world, external threats
are limited, and being conservative often pays off very well. In the
Linux world, lots of systems have tight internet connections, and being
alert and up to date will pay off. It all depends - there is no cure-all.


/Per

PS: is it customary to copy posters on a posting to lkml ? I don't
mind, but just to avoid flames.

regards,
Per Jessen, Zurich
http://www.enidan.com - home of the J1 serial console.

Windows 2001: "I'm sorry Dave ...  I'm afraid I can't do that."



* Re: Are we going too fast?
@ 2001-08-14 20:07 Per Jessen
  0 siblings, 0 replies; 51+ messages in thread
From: Per Jessen @ 2001-08-14 20:07 UTC (permalink / raw)
  To: linux-kernel, PinkFreud; +Cc: Alan Cox

On Tue, 14 Aug 2001 12:32:31 -0400 (EDT), PinkFreud wrote:

>> > The latest stable version of the Linux kernel is: 2.4.8 2001-08-11 04:13 
>> > UTC Changelog 
><snip>
>> 
>> Kernel.org certainly should list the 2.2 status (hey I maintain it I'm
>> allowed to be biased). It's unfortunate in many ways that people are still so
>> programmed to the "latest version" obsession of the proprietary world
>> sometimes. For most people 2.4 is the right choice but for absolute stability
>> why change 8)
>
>I think that's a bit unfair.  Rather, I suspect people see the word 'stable',
>and assume, for some unknown reason, that the kernel is stable.  *AHEM*
>
>Seriously, though - even distributions are including 2.4 kernels now.  RedHat,
>Mandrake, Slackware ... Should the latest versions of these distributions be
>considered unstable as well?

SuSE started shipping 7.1 with a 2.4.0 kernel (optional). I think I installed
it on a development workstation just about the time when 2.4.2 was released.

For what we do (www.enidan.com), I tend to be more conservative, so we were
using 2.0.36 for quite some time, until we decided to move entirely to 2.2.12.
Our 16CPU cluster is up at 2.4.8 - trying to break things :-) -  but for things 
that people depend on, it's 2.2.19. Some workstations are at 2.4.x - depends.

/Per


regards,
Per Jessen, Zurich
http://www.enidan.com - home of the J1 serial console.

Windows 2001: "I'm sorry Dave ...  I'm afraid I can't do that."



* Re: Are we going too fast?
@ 2001-08-14 19:47 Per Jessen
  0 siblings, 0 replies; 51+ messages in thread
From: Per Jessen @ 2001-08-14 19:47 UTC (permalink / raw)
  To: Helge Hafting, linux-kernel, PinkFreud

On Tue, 14 Aug 2001 09:57:29 +0200, Helge Hafting wrote:

>PinkFreud wrote:
>[...]
>
>> > Matter of opinion. I would say that Linux-2.4 has been way long to come
>> > and wasn't quite ready for stable status. There are numerous other O/Ses
>> 
>> That's what I've been attempting to say, as well.  It seems to have been
>> released too quickly - minimal testing, too many bugs.
>
>The testing isn't minimal - it is merely ongoing.  Users don't
>pay for the kernel, so they are part of the testing team.
>
>If you use anything but a distribution kernel, keep previous
>kernels around when you upgrade.  If the new one fails, report
>it here and go back to the previous one.  The only way to get wide
>testing is when enough people do this.

Very true, although I get the feeling that the 2.2. series was far more
'stable' than the current 2.4 series. Just a feeling, but .... 
What you're saying seems to apply more to a 2.<odd> kernel series, IMHO ?

I haven't done this myself, but perhaps we ought to look at the frequency
of new 2.4 releases compared to new 2.2 releases. Shouldn't their frequency
be roughly equal ? ie. the speed with which we're seeing new 2.4 releases 
should be - roughly - that of which we saw new 2.2 kernels emerging ?

comments ?



regards,
Per Jessen, Zurich
http://www.enidan.com - home of the J1 serial console.

Windows 2001: "I'm sorry Dave ...  I'm afraid I can't do that."



* Re: Are we going too fast?
@ 2001-08-14 16:32 PinkFreud
  0 siblings, 0 replies; 51+ messages in thread
From: PinkFreud @ 2001-08-14 16:32 UTC (permalink / raw)
  To: linux-kernel; +Cc: Alan Cox

> > The latest stable version of the Linux kernel is: 2.4.8 2001-08-11 04:13 
> > UTC Changelog 
<snip>
> 
> Kernel.org certainly should list the 2.2 status (hey I maintain it I'm
> allowed to be biased). It's unfortunate in many ways that people are still so
> programmed to the "latest version" obsession of the proprietary world
> sometimes. For most people 2.4 is the right choice but for absolute stability
> why change 8)

I think that's a bit unfair.  Rather, I suspect people see the word 'stable',
and assume, for some unknown reason, that the kernel is stable.  *AHEM*

Seriously, though - even distributions are including 2.4 kernels now.  RedHat,
Mandrake, Slackware ... Should the latest versions of these distributions be
considered unstable as well?

Perhaps it needs to be made clear to people that these kernels still aren't
all they could be.


	Mike Edwards

Brainbench certified Master Linux Administrator
http://www.brainbench.com/transcript.jsp?pid=158188
-----------------------------------
Unsolicited advertisments to this address are not welcome.


* Re: Are we going too fast?
@ 2001-08-14 16:25 PinkFreud
  0 siblings, 0 replies; 51+ messages in thread
From: PinkFreud @ 2001-08-14 16:25 UTC (permalink / raw)
  To: linux-kernel; +Cc: Francois Romieu

> PinkFreud <pf-kernel@mirkwood.net> :
> [...]
> > The unthinkable has happened - it locked up again.  Same problem.  No
> > keyboard, no mouse, no display, no network.  It was as far gone as
> > possible.
> 
> Is the nmi_oopser (Documentation/nmi_watchdog.txt) inefficient here ?

From Documentation/nmi_watchdog.txt:
NOTE: currently the NMI-oopser is enabled unconditionally on x86 SMP
boxes.

I'm not specifically enabling it in LILO, but according to the docs, it's
enabled already.  Unfortunately, the lockup happens when switching between
virtual consoles, so even if something WERE printed to the screen, I'm unlikely
to see it.

Side note: The lockup does *NOT* occur on 2.2.19 with SMP.

> Ueimor


	Mike Edwards

Brainbench certified Master Linux Administrator
http://www.brainbench.com/transcript.jsp?pid=158188
-----------------------------------
Unsolicited advertisments to this address are not welcome.


* Re: Are we going too fast?
@ 2001-08-13 21:36 PinkFreud
  2001-08-14  7:57 ` Helge Hafting
  0 siblings, 1 reply; 51+ messages in thread
From: PinkFreud @ 2001-08-13 21:36 UTC (permalink / raw)
  To: linux-kernel; +Cc: Gérard Roudier

> On Mon, 13 Aug 2001, PinkFreud wrote:
> 
> > Have a contact address for LSILOGIC?  I'll be happy to CC them in any
> > future bug reports.  It may also be useful to place this address in
> > comments at the top of the ncr53c8xx driver as well.
> 
> It is in fact Pamela Delaney, a LSILOGIC employee, who added support for
> the 53C1010 in the sym53c8xx driver version 1.6. This allowed me time for
> porting version 1.5 (without C1010 support) to FreeBSD (sym driver) and to
> add my own variant of C1010 support to sym. Pamela didn't seem to want to
> add her name and address in the source.  Anyway, you may want to have a
> look at the LSILOGIC web and ftp site for the Linux support. I would be
> surprised if you cannot find Pamela's email address.

I'll look.  Thanks.

> > I think I've proven a number of things to be broken in the 2.4.x series -
> > but they don't seem to be getting fixed.  My point was, perhaps more
> > effort should be put into fixing these bugs, rather than adding new
> > features to a supposedly stable series.
> 
> Matter of opinion. I would say that Linux-2.4 has been way long to come
> and wasn't quite ready for stable status. There are numerous other O/Ses

That's what I've been attempting to say, as well.  It seems to have been
released too quickly - minimal testing, too many bugs.

> that have had to suffer such a problem in their long life, especially
> commercial ones. Nothing that only applies to Linux here, in my
> experience.

I think Linux is something of a unique case here, though.  Linus wanted to get
2.4.x out quickly - and now there are more bugs to deal with than ever.  For
this level of (in?)stability, I'd still expect to see this as a development
kernel.  Please, don't get me wrong - I *do* realize the earlier in a kernel
series we are, the more problems will appear.  I just happen to think that
there are far too many for a series labeled 'stable'.  Either a third series
should be created for the interim, or perhaps the kernels need to be in
'devel' for a bit longer?

Just my 2 cents.

> > If Linux is trying to prove itself usable for the business world, how is
> > that going to help?  I'm not implying that I'm a business in any way,
> > shape, or form - but given that I think the majority of us want to see
> > Linux in the server rooms, and even on the desktop, what does this mean
> > for those users?
> 
> Btw, we are using some Linux machines at the company I work for. They
> do not seem to run 2.4 kernels for the moment. As I am the only guy that
> also uses FreeBSD, I do not want to risk FreeBSD 5 for real work for the
> same reason. :)

5.0 is current, 4.3 is release.  As I understand it, 'current' is the
equivalent of Linux's 'devel' and 'stable' the equivalent of Linux's 'stable'.

If that's the case, your refusal to use a 'current' release on a production
machine would be like refusing to use 2.3.x or 2.5.x on a production machine -
a very sound decision.  But what of 2.4.x?  It's called stable, but yet has a
ways to go.

> OTOH, we have software that explodes Solaris 8 in a millisecond but that
> works reliably on previous Solaris releases, but Solaris 8 is not that
> young an OS release as we know. Just an example that applies to a
> commercial Unix O/S...

True.  But is that due to a bug in the particular software, or the OS?  :P

> > I know plenty of Windows users who are quite upset at the lack of
> > stability.  They either don't know/understand that there are alternatives,
> > or feel it's too hard to switch to an alternative.
> 
> A Windows machine is generally some melting pot of [an O/S + broken
> hardware + broken drivers + broken applications + viruses] driven by
> unaware users. It is a miracle for such a thing to work enough for real
> work to be possible. Personally, I have no problems with Windows. It runs
> games just fine and since I do not use it for anything else, it just fits
> my needs. :-)

There are plenty of unaware users using Linux nowadays (RedHat, Mandrake, ...).
What does this mean for them?  Distributions are now including 2.4.x kernels.
What happens when their systems blow up, as the 3 I've used here have?

> > I definitely believe this (the random panic) to be a bug in your
> > ncr53c8xx driver.  ksymoops seemed to believe it to be the case, and
> > NetBSD seems to be working fine, which means it's not faulty hardware.
> 
> I have retrieved your bug report (emailed on 28 July 2001). I was on
> vacation on that date, and until yesterday. I cannot read thousands of emails
> in a couple of hours, sorry.

My apologies.  I understand you're busy.  I just got a bit frustrated when I
found that all three systems I've tried 2.4.x kernels on blew up.

> The problem is due to a NULL pointer being read from the driver DONE
> queue. This queue uses 0xfffffff as a tag for empty entries and valid
> addresses for entries pointing to completed CCBs. Since this driver has
> actually been stable for years (only sym53c8xx was under development) it is
> likely that the driver data structures are being corrupted from some other
> place rather than by a driver bug, in my opinion. If this also happens on
> a 2.2.x (x>=18) kernel release, it will be another story, obviously.

I haven't tried the later 2.2.x kernels on that machine.  Since I do plan on
using that system in some sort of production capacity, and since it's currently
running NetBSD without a problem, I don't think I'm going to get the chance to
run Linux on it any time in the near future.  I do, as mentioned earlier, have
an Alpha with the same controller, which currently operates just fine with
2.2.14.  I will be more than happy to install the latest 2.2.x kernel on it
when the NetBSD system replaces what it does, and see if it blows up.
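
For readers following the DONE-queue explanation quoted above, here is a
minimal sketch in C of the convention it describes. It is not taken from the
ncr53c8xx source: the names classify_slot, done_queue_scan, DONE_EMPTY,
DONE_SLOTS and struct ccb are hypothetical, and the empty-slot tag is copied
exactly as written in the message. It only illustrates why a NULL entry points
to corruption: NULL is neither the empty tag nor a valid CCB address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Empty-slot tag, copied as written in the quoted message (assumption:
 * the real driver may use a different constant or width). */
#define DONE_EMPTY ((uintptr_t)0xfffffff)
#define DONE_SLOTS 8 /* hypothetical queue length */

/* Hypothetical stand-in for a SCSI command control block. */
struct ccb {
    int tag;
};

enum slot_state { SLOT_EMPTY, SLOT_COMPLETED, SLOT_CORRUPT };

/* Classify one DONE-queue entry under the convention described above. */
static enum slot_state classify_slot(uintptr_t entry)
{
    if (entry == DONE_EMPTY)
        return SLOT_EMPTY;      /* nothing completed in this slot */
    if (entry == 0)
        return SLOT_CORRUPT;    /* NULL: neither the tag nor a real CCB */
    return SLOT_COMPLETED;      /* treated as the address of a finished CCB */
}

/* Walk the queue, count completed entries and mark each consumed slot
 * empty again. Returns -1 at the first corrupt (NULL) entry instead of
 * dereferencing it. */
static int done_queue_scan(uintptr_t queue[], size_t slots)
{
    int completed = 0;

    assert(queue != NULL);
    for (size_t i = 0; i < slots; i++) {
        switch (classify_slot(queue[i])) {
        case SLOT_EMPTY:
            break;
        case SLOT_CORRUPT:
            return -1;
        case SLOT_COMPLETED:
            completed++;
            queue[i] = DONE_EMPTY;
            break;
        }
    }
    return completed;
}
```

A caller that gets -1 back could log the event and reset the queue rather than
follow the pointer. Whether the real driver checks for this case is not stated
anywhere in the thread.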

> Regards,
>   Gérard.


	Mike Edwards

Brainbench certified Master Linux Administrator
http://www.brainbench.com/transcript.jsp?pid=158188
-----------------------------------
Unsolicited advertisments to this address are not welcome.


* Re: Are we going too fast?
@ 2001-08-13 21:07 PinkFreud
  2001-08-13 21:20 ` Alan Cox
  2001-08-14  2:24 ` David Ford
  0 siblings, 2 replies; 51+ messages in thread
From: PinkFreud @ 2001-08-13 21:07 UTC (permalink / raw)
  To: linux-kernel; +Cc: Alan Cox

> > of them have suffered from one malady or another - from the dual PIII with
> > the VIA chipset and Matrox G400 card, which locks up nicely when I switch
> 
> Welcome to wacky hardware. To get a G400 stable on x86 you need at least
> 
> XFree86 4.1 if you are running hardware 3D (and DRM 4.1)

I run 4.1.0 on that system.  DRM, I don't believe, is currently enabled,
though I'd like it to be.

> 2.4.8 or higher with the VIA fixes

Oooooh.  So .8 *does* have fixes for VIA... I think I'll give that a try now.

> Preferably a very recent BIOS update for the VIA box

Hmm.  I'll also check VIA to see if they have any updates for this system.
Thanks for the suggestion.

> Of those only the XFree hardware 3d stuff is software bug related.

I'm not currently using 3D - yet the system insists on locking up when I
switch from X to a text console and back.  Again, this only occurs with an
SMP kernel (this is an SMP system).  This does NOT occur with a uniprocessor
kernel.

> > emergency sync) when attempting to use 'ls' on a mounted QNX filesystem
> > (ls comes up fine, then system crashes - nothing sent to syslog, no errors
> > on screen, nothing!) - and this latest is with 2.4.8!
> 
> The qnxfs code is experimental - so I can believe it might fail in 2.4. I'd
> be very interested in info on that one.

Unfortunately, that's all the info I have.  Console switching was still
working, so I tried enabling logging to a console - no output.  System just
hangs.  Any suggestions on what I might try to get more information for you?

> > Should development continue on the latest and supposedly greatest
> > drivers?  Or should the existing bugs be fixed first?  I've got at least
> > three up there that need taking care of, and I'm sure others on this list
> > have found more.  3 separate crashes on 3 separate installs on 3 separate
> > boxes - that's 100% failure rate.  If I get 100% failure on my installs,
> > what are others seeing?
> 
> Near enough 0%. But then I try and avoid buying broken chipsets.

I wasn't aware that either VIA or Matrox was broken.  I've seen someone else mention in
this thread that perhaps some old HOWTOs on hardware need to be maintained
again - I think I agree with that.

> > I like Linux.  I'd like to stick with it.  But if it's going to
> > continually crash, I'm going to jump ship - and I'll start recommending to
> 
> If you want maximum stability you want to be running 2.2 or even 2.0. Newer
> less tested code is always less stable. 2.4 won't be as stable as 2.2 for a
> year yet.

Perhaps the series name should be changed from 'stable' to something else -
'release'?

> Alan


	Mike Edwards

Brainbench certified Master Linux Administrator
http://www.brainbench.com/transcript.jsp?pid=158188
-----------------------------------
Unsolicited advertisments to this address are not welcome.



* Re: Are we going too fast?
@ 2001-08-13 18:53 Petr Vandrovec
  0 siblings, 0 replies; 51+ messages in thread
From: Petr Vandrovec @ 2001-08-13 18:53 UTC (permalink / raw)
  To: Francois Romieu; +Cc: linux-kernel, pf-kernel

On 13 Aug 01 at 10:55, Francois Romieu wrote:
> 
> Try and send specific bug-reports to the maintainers. 
> l-k archives may give you some light on issues with VIA chipsets.
> 
> I'm not convinced that gaining stability on a VIA + G400 + X + smp 
> combo is an easy task anyway.

VIA (694X) (Gigabyte 6VXD7), G450, XF4.0/XF4.1, SMP (2xPIII/833) works
fine, provided that
(1) you do not use the matrox module from Matrox,
(2) there is no PCI activity targeting the G400 while X initializes the
    hardware (during start or console switch), and
(3) you do not use DRI (highly unrecommended, as it touches the G400 hardware
    even when X is not in the foreground).

If it is too limiting for you, look for another chipset. With i440BX
you'll get at least 2x faster PCI->AGP transfers than with VIA: i440BX
can handle 60MBps (32bpp full PAL) without any problems, while 694x has 
problems with 30MBps (16bpp full PAL) (IDE disk accesses are visible
as dropouts on picture).

There is nothing the Linux kernel can do for the stability of such a box.
                                            Best regards,
                                                Petr Vandrovec
                                                vandrove@vc.cvut.cz
                                                

* Re: Are we going too fast?
@ 2001-08-13 18:46 Per Jessen
  2001-08-14 13:58 ` Andrew Scott
  2001-08-14 19:54 ` David Ford
  0 siblings, 2 replies; 51+ messages in thread
From: Per Jessen @ 2001-08-13 18:46 UTC (permalink / raw)
  To: linux-kernel

>On Mon, 13 Aug 2001 14:11:32 +0100 (BST), Alan Cox wrote:
>
>If you want maximum stability you want to be running 2.2 or even 2.0. Newer
>less tested code is always less stable. 2.4 won't be as stable as 2.2 for a
>year yet.

Couldn't have put that any better. On mission-critical systems, this is
exactly what people do. Personally, my experience is from the big-iron
world of S390 -  if you're a bleeding-edge organisation, you'll be out
there applying the latest PTFs, you'll be running the latest OS/390 etc. 
If you're conservative, you're at least 2, maybe 3 releases (in today's
OS390 this means about 18-24 months) behind. If you're ultra-conservative,
you'll wait for the point where you can no longer avoid an upgrade.


regards,
Per Jessen


regards,
Per Jessen, Zurich
http://www.enidan.com - home of the J1 serial console.

Windows 2001: "I'm sorry Dave ...  I'm afraid I can't do that."



* Re: Are we going too fast?
@ 2001-08-13 17:53 PinkFreud
  2001-08-13 20:27 ` Gérard Roudier
  0 siblings, 1 reply; 51+ messages in thread
From: PinkFreud @ 2001-08-13 17:53 UTC (permalink / raw)
  To: linux-kernel; +Cc: Gérard Roudier

> On Mon, 13 Aug 2001, PinkFreud wrote:
>
> > Please CC me in any replies, I am not subscribed to this list.

This still holds true - I'm not subscribed to this list right now.

> >
> > Please forgive me if I seem incoherent.  It's after 3:30 AM here.
>
> So, you will be forgiven, otherwise ... :-)

Thanks.  :)

> You may want to elaborate on the ncr53c8xx problems (I maintain this
> driver). More generally, you must not ignore the thousands of bugs in the
> hardware you are using, but software developers don't have access to all
> errata descriptions since hardware vendors do not like to make this
> information freely available.

I have elaborated.  See below.

>
> About ncr53c8xx problem reports, I cannot reply to all of them. You may
> also send them to LSILOGIC support. They also want Linux to work with the
> ncr/sym/lsi/53c8xx PCI-SCSI controllers, even with old NCR ones. Some
> other vendors seem to just ignore old hardware. For example NVIDIA, which
> killed (bought?) 3DFX, does not seem interested in maintaining drivers for
> the 3DFX graphic chips.

Have a contact address for LSILOGIC?  I'll be happy to CC them in any
future bug reports.  It may also be useful to place this address in
comments at the top of the ncr53c8xx driver as well.

> I have used Linux since some 0.99.x (it was the Yggdrasil distribution). My
> experience has been that 1.2.13, 2.0.27 and 2.2.13 worked reliably enough
> for me.

I've used all three of those kernels, and I tend to agree - except for the
nasty security hole in 2.2.13 (but that happens with any OS - look at
Windows!).

> 'Stable' does not mean reliable for any workload. It means that we stop
> developing (which implies changing large portions of code or modifying
> interfaces) and only focus on fixing the software within its current design
> (which implies only changing what is proven to be broken).  This applies to all

I think I've proven a number of things to be broken in the 2.4.x series -
but they don't seem to be getting fixed.  My point was, perhaps more
effort should be put into fixing these bugs, rather than adding new
features to a supposedly stable series.

> software, not only to Linux. As a result, early stable releases still
> have numerous bugs that may prevent numerous systems from working
> reliably. It is up to the user to check releases and switch to the one that
> fits his expectations.

If Linux is trying to prove itself usable for the business world, how is
that going to help?  I'm not implying that I'm a business in any way,
shape, or form - but given that I think the majority of us want to see
Linux in the server rooms, and even on the desktop, what does this mean
for those users?

> > This brings me to the subject of this rant: are we going too fast?  New
> > drivers are still showing up in each successive kernel, and yet no one
> > seems to be able to fix the old bugs that already exist.  Are we looking
> > to have the reliability of Windows?  It's starting to seem so - each
> > successive kernel series just seems to crash more and more often.  When
> > will we reach the point where Windows, on the average, will have greater
> > uptime than Linux systems?  Perhaps it's time to slow down, and do some
> > debugging.
>
> The reliability of Windows seems to be just fine for most users since it is
> the O/S most of them want to use.:-)

I know plenty of Windows users who are quite upset at the lack of
stability.  They either don't know/understand that there are alternatives,
or feel it's too hard to switch to an alternative.

> > Should development continue on the latest and supposedly greatest
> > drivers?  Or should the existing bugs be fixed first?  I've got at least
> > three up there that need taking care of, and I'm sure others on this list
> > have found more.  3 separate crashes on 3 separate installs on 3 separate
> > boxes - that's 100% failure rate.  If I get 100% failure on my installs,
> > what are others seeing?
>
> Hopefully you aren't a typical computer user or you just have bad luck
> with computers. :-)

Certainly the case with the former, and sadly, you're not the first to
suggest the latter.  :)

> All software developers and maintainers want their software to work and
> thus bugs to be fixed. It is just sometimes hard to know what is
> actually broken. My experience is that no more than 10% of bug reports about
> a piece of software are due to a bug in the software that is pointed out by the
> report. And for these less than 10% relevant reports, maintainers must
> find what is broken... not simple as you can imagine...

I definitely believe this (the random panic) to be a bug in your
ncr53c8xx driver.  ksymoops seemed to believe it to be the case, and
NetBSD seems to be working fine, which means it's not faulty hardware.

> Btw, I use SYM-2 driver under Linux, FreeBSD and NetBSD 1.5. I have no
> problem with it. If you plan to use Ultra-160 LSI53C1010 chips, the NetBSD
> SIOP driver may be sub-optimal and, btw, it does not seem to know about
> C1010 chip errata.

I'll keep that in mind.  However, the box in question is an older system,
so I doubt it'll ever see one of those.

By the way, the driver seems to work with 2.2.14 on an Alpha.  On this
system, though, 2.4.x just manages to blow up.

> You do not seem to have given FreeBSD a try. Were there some strong
> reasons for that ?

Actually, I was considering both Free- and NetBSD.  I just chose NetBSD.

> > sauron@rivendell:~$ uptime
> >  3:17AM  up 12 days, 15:20, 2 users, load averages: 1.48, 0.66, 0.31
> > sauron@rivendell:~$ uname -a
> > NetBSD rivendell 1.5.1 NetBSD 1.5.1 (RIVENDELL) #0: Tue Jul 31 22:58:54
> > EDT 2001     root@rivendell:/usr/src/sys/arch/i386/compile/RIVENDELL i386
> > sauron@rivendell:~$ dmesg | grep -i sym
> > siop0 at pci0 dev 6 function 0: Symbios Logic 53c810 (fast scsi)
> >
> > (The controller is old - it was made by NCR before it became Symbios Logic
> > - hence, why I was using the NCR driver for it, rather than the Symbios
> > driver, in Linux.)
> >
> > Working on 13 days uptime.  That's well over twice the uptime for Linux on
> > that box.  That's what happens when the kernel has bugs.
>
> You seem so sure it is the ncr53c8xx driver that breaks your Linux ...
> If it was so broken, maybe I would have heard about it. :-)

You should have heard about it.  The last two messages were sent to the
address you have listed in your driver.


Date: Fri, 20 Jul 2001 13:26:12 -0400 (EDT)
From: PinkFreud <pf-kernel@mirkwood.net>
To: linux-kernel@vger.kernel.org
Subject: two seperate 2.4.x problems...
Message-ID: <Pine.LNX.4.20.0107201305350.5411-100000@eriador.mirkwood.net>

                                                                          
Date: Mon, 23 Jul 2001 14:11:35 -0400 (EDT)
From: PinkFreud <pf-kernel@mirkwood.net>
To: linux-kernel@vger.kernel.org
cc: Gerard Roudier <groudier@club-internet.fr>
Subject: 2.4.6 NCR53C8XX bug?  (was: 2.4.x problems (this is *not* a
    distribution
 related question!))
Message-ID: <Pine.LNX.4.20.0107231347060.5411-100000@eriador.mirkwood.net>


Date: Sat, 28 Jul 2001 22:20:20 -0400 (EDT)
From: PinkFreud <pf-kernel@mirkwood.net>
To: linux-kernel@vger.kernel.org
cc: Gerard Roudier <groudier@club-internet.fr>
Subject: 2.4.7 oops + panic in ncr53c8xx (ncr_wakeup_done)
Message-ID: <Pine.LNX.4.20.0107282207180.316-100000@eriador.mirkwood.net>


Would you like me to re-send the ksymoops output?

> > Take this rant for what you will.  Personally, I switched from Windows to
> > Linux 5 years ago for the stability.  If I need to switch OSs again to
> > continue to have stability, I will.  Somehow, I suspect, if kernel
> > development continues down this path, many others will wind up switching
> > to other OSs as well.
>
> If NetBSD fits your need, then let me encourage you to use it.

It does for the moment.  I hoped Linux would fit my need, though.

> > I like Linux.  I'd like to stick with it.  But if it's going to
> > continually crash, I'm going to jump ship - and I'll start recommending to
> > others that they do the same.
>
> That's an unwise recommendation, in my opinion.
> For example, my children are happy using Windows 98 and I do not want to
> recommend anything else to them.

I recommend using what I feel is usable (which includes stability) - which
is why I never recommend using Windows.  But that's just me.


	Mike Edwards

Brainbench certified Master Linux Administrator
http://www.brainbench.com/transcript.jsp?pid=158188
-----------------------------------
Unsolicited advertisements to this address are not welcome.




^ permalink raw reply	[flat|nested] 51+ messages in thread
* Are we going too fast?
@ 2001-08-13  7:43 PinkFreud
  2001-08-13  8:52 ` Brian
                   ` (7 more replies)
  0 siblings, 8 replies; 51+ messages in thread
From: PinkFreud @ 2001-08-13  7:43 UTC (permalink / raw)
  To: linux-kernel

Please CC me in any replies, I am not subscribed to this list.

Please forgive me if I seem incoherent.  It's after 3:30 AM here.


I have installed various 2.4.x kernels on 3 separate systems here.  *ALL*
of them have suffered from one malady or another - from the dual PIII with
the VIA chipset and Matrox G400 card, which locks up nicely when I switch
from X to a text console and back to X (but NOT under a uniprocessor
kernel!), to the system with the NCR 53c810 SCSI board, which suffered
random kernel panics anywhere from 2 hours to 5 days after booting, due to
the ncr53c8xx driver, to YET ANOTHER system which has shown a penchant for
crashing (read: no response on console, can use magic sysrq, but fails to
emergency sync) when attempting to use 'ls' on a mounted QNX filesystem
(ls comes up fine, then system crashes - nothing sent to syslog, no errors
on screen, nothing!) - and this latest is with 2.4.8!

I've used Linux for over 5 years now.  In all the time I've used it, I
have never seen this much instability in a single kernel
series - though I've noticed each successive 'stable' series having
more bugs than the last (2.2.x crashed once a week with SMP 
until 2.2.10!).  Furthermore, I have had a HELL of a time trying
to get responses to the first two problems (this is the first report for
the third).  It used to be that I could ask a question on this list, and
receive responses.  Not anymore.  I can't seem to get the time of day from
anyone on this list now.

This brings me to the subject of this rant: are we going too fast?  New
drivers are still showing up in each successive kernel, and yet no one
seems to be able to fix the old bugs that already exist.  Are we looking
to have the reliability of Windows?  It's starting to seem so - each
successive kernel series just seems to crash more and more often.  When
will we reach the point where Windows, on the average, will have greater
uptime than Linux systems?  Perhaps it's time to slow down, and do some
debugging.

This is supposed to be a 'stable' kernel series?  I see nothing stable
about it.

Should development continue on the latest and supposedly greatest
drivers?  Or should the existing bugs be fixed first?  I've got at least
three up there that need taking care of, and I'm sure others on this list
have found more.  3 separate crashes on 3 separate installs on 3 separate
boxes - that's 100% failure rate.  If I get 100% failure on my installs,
what are others seeing?

To those of you who would tell me to fix them myself: I am an
administrator.  I know Perl.  I am not all that familiar with C, nor with
kernel programming.  They're not my bugs, but I would fix them if I were
able to.  I'd hope the authors of said bugs would be willing to fix them -
but given the track record I've seen for the first two problems, I'm not
holding my breath for the third to be fixed any time soon.

I don't know about the rest of you, but I'm going to give up soon and
switch to NetBSD.  I've already done it on the system with the NCR 53c810
board - and it's proven to be far more stable than 2.4.x kernels have ever
managed to be on it.  What does that say?

sauron@rivendell:~$ uptime
 3:17AM  up 12 days, 15:20, 2 users, load averages: 1.48, 0.66, 0.31
sauron@rivendell:~$ uname -a
NetBSD rivendell 1.5.1 NetBSD 1.5.1 (RIVENDELL) #0: Tue Jul 31 22:58:54
EDT 2001     root@rivendell:/usr/src/sys/arch/i386/compile/RIVENDELL i386
sauron@rivendell:~$ dmesg | grep -i sym
siop0 at pci0 dev 6 function 0: Symbios Logic 53c810 (fast scsi)

(The controller is old - it was made by NCR before it became Symbios Logic
- hence, why I was using the NCR driver for it, rather than the Symbios
driver, in Linux.)

Working on 13 days uptime.  That's well over twice the uptime for Linux on
that box.  That's what happens when the kernel has bugs.

Take this rant for what you will.  Personally, I switched from Windows to
Linux 5 years ago for the stability.  If I need to switch OSs again to
continue to have stability, I will.  Somehow, I suspect, if kernel
development continues down this path, many others will wind up switching
to other OSs as well.

I like Linux.  I'd like to stick with it.  But if it's going to
continually crash, I'm going to jump ship - and I'll start recommending to
others that they do the same.


	Mike Edwards

Brainbench certified Master Linux Administrator
http://www.brainbench.com/transcript.jsp?pid=158188
-----------------------------------
Unsolicited advertisements to this address are not welcome.


^ permalink raw reply	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2001-08-16 21:42 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
2001-08-13 21:44 Are we going too fast? PinkFreud
2001-08-14  0:04 ` PinkFreud
2001-08-14  7:24   ` Francois Romieu
2001-08-15 23:24   ` Dr. Kelsey Hudson
  -- strict thread matches above, loose matches on Subject: below --
2001-08-16 21:42 PinkFreud
2001-08-15 20:13 Roy Murphy
2001-08-14 20:20 Per Jessen
2001-08-14 20:07 Per Jessen
2001-08-14 19:47 Per Jessen
2001-08-14 16:32 PinkFreud
2001-08-14 16:25 PinkFreud
2001-08-13 21:36 PinkFreud
2001-08-14  7:57 ` Helge Hafting
2001-08-13 21:07 PinkFreud
2001-08-13 21:20 ` Alan Cox
2001-08-13 21:41   ` Rogério Brito
2001-08-14  0:56   ` Ben Ford
2001-08-14  7:34   ` Peter Wächtler
2001-08-14  2:24 ` David Ford
2001-08-14  4:19   ` Nicholas Knight
2001-08-14 12:49     ` Alan Cox
2001-08-14 22:27       ` Paul G. Allen
     [not found] <no.id>
2001-08-13 20:24 ` Alan Cox
2001-08-13 21:06   ` Anthony Barbachan
2001-08-14 20:47 ` Alan Cox
2001-08-15  0:07   ` PinkFreud
     [not found] <fa.l9dq0tv.7gqnhh@ifi.uio.no>
     [not found] ` <fa.g70as7v.1722ipv@ifi.uio.no>
2001-08-13 19:14   ` John Weber
2001-08-13 18:53 Petr Vandrovec
2001-08-13 18:46 Per Jessen
2001-08-14 13:58 ` Andrew Scott
2001-08-14 19:54 ` David Ford
2001-08-13 17:53 PinkFreud
2001-08-13 20:27 ` Gérard Roudier
2001-08-13  7:43 PinkFreud
2001-08-13  8:52 ` Brian
2001-08-13  8:55 ` Francois Romieu
2001-08-14  4:21   ` Pete Toscano
2001-08-14 12:48     ` Alan Cox
2001-08-14 22:30       ` Paul G. Allen
2001-08-13 10:03 ` Gérard Roudier
2001-08-13 10:29   ` Justin Guyett
2001-08-13 12:56     ` Andrzej Krzysztofowicz
2001-08-13 16:54     ` Gérard Roudier
2001-08-13 10:09 ` Chris Wilson
2001-08-13 11:09   ` szonyi calin
2001-08-13 13:11 ` Alan Cox
2001-08-14 18:51   ` Anders Larsen
2001-08-14 20:29     ` Anders Larsen
2001-08-13 13:46 ` hugang
2001-08-13 13:55 ` Anton Altaparmakov
2001-08-13 17:16 ` Stephen Satchell
