* 2.4.16 freezed up with eepro100 module
From: Sven Heinicke @ 2001-11-29 15:25 UTC
To: linux-kernel

The 2.4.16 kernel finally makes my clients happy with memory
management. The system that froze up is a Dell of some sort or other
with two 1GHz Pentium IIIs and 4G of memory. But now I seem to be
having ethernet problems. With an eepro100 card:

  Bus 0, device 4, function 0:
    Ethernet controller: Intel Corp. 82557 [Ethernet Pro 100] (rev 8).
      IRQ 16.
      Master Capable. Latency=32. Min Gnt=8.Max Lat=56.
      Non-prefetchable 32 bit memory at 0xfeb02000 [0xfeb02fff].
      I/O at 0xfcc0 [0xfcff].
      Non-prefetchable 32 bit memory at 0xfe900000 [0xfe9fffff].

loaded as a module and being used heavily, the system froze with
nothing on the console when I saw it. Normal log messages until:

Nov 28 22:03:31 ps1 kernel: eth0: can't fill rx buffer (force 0)!
Nov 28 22:05:03 ps1 kernel: 0001.
Nov 28 22:05:03 ps1 kernel: eth0: can't fill rx buffer (force 1)!
Nov 28 22:05:04 ps1 kernel: eth0: can't fill rx buffer (force 0)!
Nov 28 22:05:05 ps1 kernel: eth0: can't fill rx buffer (force 0)!
Nov 28 22:05:06 ps1 kernel: eth0: can't fill rx buffer (force 1)!
Nov 28 22:05:06 ps1 kernel: eth0: can't fill rx buffer (force 0)!
Nov 28 22:05:07 ps1 kernel: eth0: can't fill rx buffer (force 1)!
Nov 28 22:05:08 ps1 kernel: eth0: can't fill rx buffer (force 1)!
Nov 28 22:05:09 ps1 kernel: eth0: can't fill rx buffer (force 0)!
Nov 28 22:05:17 ps1 last message repeated 10 times
Nov 28 22:05:18 ps1 kernel: KERNEL: assertion (flags&MSG_PEEK) failed at tcp.c(1463):tcp_recvmsg
Nov 28 22:07:48 ps1 kernel: eth0: card reports no resources.
Nov 28 22:08:19 ps1 last message repeated 19 times
Nov 28 22:09:20 ps1 last message repeated 56 times
...
Nov 29 03:57:34 ps1 last message repeated 5 times
Nov 29 03:58:36 ps1 last message repeated 4 times
Nov 29 03:59:41 ps1 last message repeated 5 times
Nov 29 04:00:44 ps1 last message repeated 4 times
Nov 29 04:01:47 ps1 last message repeated 6 times
Nov 29 09:54:13 ps1 syslogd 1.4-0: restart.

Then I hit the reset key before 10am.

I'm going to start digging through the code (I guess it will be more of
a learning experience for me rather than actually being able to help
code). So any suggestions will be helpful.

---
Sven Heinicke <sven@research.nj.nec.com>
Princeton, NJ
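For anyone comparing logs like the one above, here is a minimal Python sketch that tallies how often each message occurred, folding syslogd's "last message repeated N times" lines back into the message they refer to. The function name tally_messages and the regular expressions are illustrative assumptions based only on the line format quoted in this thread; other syslog formats would need different patterns.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt in this thread:
# "Nov 28 22:05:03 ps1 kernel: eth0: can't fill rx buffer (force 1)!"
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s[\d:]{8})\s(\S+)\s(.*)$")
REPEAT_RE = re.compile(r"^last message repeated (\d+) times$")


def tally_messages(lines):
    """Count messages, expanding 'last message repeated N times' lines."""
    counts = Counter()
    previous = None  # most recent real message, target of repeat lines
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip elisions ("...") and anything unparseable
        message = match.group(3)
        repeat = REPEAT_RE.match(message)
        if repeat:
            if previous is not None:
                counts[previous] += int(repeat.group(1))
            continue
        counts[message] += 1
        previous = message
    return counts


SAMPLE = [
    "Nov 28 22:05:09 ps1 kernel: eth0: can't fill rx buffer (force 0)!",
    "Nov 28 22:05:17 ps1 last message repeated 10 times",
    "Nov 28 22:07:48 ps1 kernel: eth0: card reports no resources.",
    "Nov 28 22:08:19 ps1 last message repeated 19 times",
    "Nov 28 22:09:20 ps1 last message repeated 56 times",
]

if __name__ == "__main__":
    for message, count in tally_messages(SAMPLE).most_common():
        print(f"{count:6d}  {message}")
```

Feeding it the full log file (for example, open("/var/log/messages"), a path that varies by distribution) would show at a glance whether the "can't fill rx buffer" phase or the "card reports no resources" phase dominated.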
* Re: 2.4.16 freezed up with eepro100 module
From: Nathan Poznick @ 2001-11-29 15:51 UTC
To: Sven Heinicke; +Cc: linux-kernel

Thus spake Sven Heinicke:
>
> The 2.4.16 kernel finally makes my clients happy with memory
> management. The system that froze up is a Dell of some sort or other
> with two 1GHz Pentium IIIs and 4G of memory. But now I seem to be
> having ethernet problems. With an eepro100 card:

I've encountered the same problem, with the same hardware setup (I
believe it's a Dell 2400, or something like that), on 2.4.14+xfs. For
me it didn't lock up the entire machine, however; it only seemed to
kill the network. I was able to reboot the machine cleanly once I got
to the console. (See my message from yesterday with the subject 'failed
assertion in tcp.c'.) I, too, am open to suggestions :-)

--
Nathan Poznick <poznick@conwaycorp.net>
PGP Key: http://drunkmonkey.org/pgpkey.txt

Curiosity has its own reason for existing. -- Albert Einstein
* Re: 2.4.16 freezed up with eepro100 module
From: J Sloan @ 2001-11-30 4:49 UTC
To: Nathan Poznick; +Cc: Sven Heinicke, linux-kernel

Nathan Poznick wrote:

> Thus spake Sven Heinicke:
> >
> > The 2.4.16 kernel finally makes my clients happy with memory
> > management. The system that froze up is a Dell of some sort or other
> > with two 1GHz Pentium IIIs and 4G of memory. But now I seem to be
> > having ethernet problems. With an eepro100 card:
>
> I've encountered the same problem, with the same hardware setup (I
> believe it's a Dell 2400, or something like that), on 2.4.14+xfs. For
> me it didn't lock up the entire machine, however; it only seemed to
> kill the network. I was able to reboot the machine cleanly once I got
> to the console. (See my message from yesterday with the subject 'failed
> assertion in tcp.c'.) I, too, am open to suggestions :-)

Similar experience here: the network connectivity would go away, but
the machine was still alive.

Using the e100 driver instead seemed to solve the problem on the Dell
servers here.

But I didn't have to reboot. I just stopped networking, unloaded the
eepro100 driver, loaded the e100 driver and started networking.

cu

jjs
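The four steps described above (stop networking, unload eepro100, load e100, start networking) can be sketched as a small Python helper. This is only an illustration under stated assumptions: rmmod and modprobe are the standard module tools, but the init script path /etc/init.d/network is a guess that differs between distributions, and the function names are hypothetical. By default it only prints the commands rather than running them.

```python
import shlex
import subprocess


def driver_swap_plan(old="eepro100", new="e100",
                     net_stop=("/etc/init.d/network", "stop"),
                     net_start=("/etc/init.d/network", "start")):
    """Return the ordered commands for swapping NIC driver modules.

    net_stop / net_start are assumptions; substitute whatever your
    distribution uses to bring networking down and up.
    """
    return [
        list(net_stop),      # 1. stop networking so the module is unused
        ["rmmod", old],      # 2. unload the old driver
        ["modprobe", new],   # 3. load the replacement driver
        list(net_start),     # 4. bring networking back up
    ]


def run_plan(plan, dry_run=True):
    """Print each command; execute it as well unless dry_run is set."""
    shown = []
    for command in plan:
        text = shlex.join(command)
        print(text)
        shown.append(text)
        if not dry_run:
            # Requires root; stops at the first failing command.
            subprocess.run(command, check=True)
    return shown


if __name__ == "__main__":
    run_plan(driver_swap_plan())
```

Keeping the plan as data and defaulting to a dry run makes it easy to review the exact sequence before touching a production machine.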
* Re: 2.4.16 freezed up with eepro100 module
From: Anuradha Ratnaweera @ 2001-11-30 5:45 UTC
To: J Sloan; +Cc: Nathan Poznick, Sven Heinicke, linux-kernel

On Thu, Nov 29, 2001 at 08:49:48PM -0800, J Sloan wrote:
> Nathan Poznick wrote:
>
> > Thus spake Sven Heinicke:
> > >
> > > The 2.4.16 kernel finally makes my clients happy with memory
> > > management. The system that froze up is a Dell of some sort or other
> > > with two 1GHz Pentium IIIs and 4G of memory. But now I seem to be
> > > having ethernet problems. With an eepro100 card:
> >
> > I've encountered the same problem, with the same hardware setup (I
> > believe it's a Dell 2400, or something like that), on 2.4.14+xfs. For
> >
> > [...]
>
> Using the e100 driver instead seemed to solve the
> problem on the dell servers here.

Has anybody got the same issue with non-Dell machines?

I am running 2.4.16 on a Compaq ProLiant ML 370 without problems (the
machine has been up for just 2+ days with the new kernel, though).
Traffic is not very high.

The driver is built into the kernel.

/proc/pci shows

  Bus 0, device 2, function 0:
    Ethernet controller: Intel Corp. 82557 [Ethernet Pro 100] (rev 8).
      IRQ 5.
      Master Capable. Latency=64. Min Gnt=8.Max Lat=56.
      Non-prefetchable 32 bit memory at 0xc4fff000 [0xc4ffffff].
      I/O at 0x2400 [0x243f].
      Non-prefetchable 32 bit memory at 0xc4e00000 [0xc4efffff].
  Bus 0, device 5, function 0:
    Ethernet controller: Intel Corp. 82557 [Ethernet Pro 100] (#2) (rev 8).
      IRQ 10.
      Master Capable. Latency=64. Min Gnt=8.Max Lat=56.
      Non-prefetchable 32 bit memory at 0xc4dfd000 [0xc4dfdfff].
      I/O at 0x2c00 [0x2c3f].
      Non-prefetchable 32 bit memory at 0xc4c00000 [0xc4cfffff].

Regards,

Anuradha

--

Debian GNU/Linux (kernel 2.4.16)

First Law of Bicycling:
	No matter which way you ride, it's uphill and against the wind.
* Re: 2.4.16 freezed up with eepro100 module
From: David Rees @ 2001-11-30 5:57 UTC
To: linux-kernel

On Fri, Nov 30, 2001 at 11:45:06AM +0600, Anuradha Ratnaweera wrote:
>
> Has anybody got the same issue with non Dell machines?
>
> I am running 2.4.16 on a Compaq proliant ML 370 without problems (machine has
> been up for 2+ days with the new kernels, though). Trafic is not very high.

I don't have any non-Dell machines with the eepro100, but I did put one
of our Dells on 2.2.16 35 hours ago with the eepro100 driver. I don't
know the exact model, but it's an older dual 500MHz PIII machine.
Traffic is light, with only approximately 100MB transferred over the
network so far.

Is there a workload that can reproduce the hang? If so, I might be able
to do a bit of testing...

I've also got a couple of Dell 2400s, but those are still running
2.4.9. Unfortunately those are production machines, so I don't want to
mess with them right now.

-Dave
* SBP2 Support for multiple LUNs - Changers ??
From: Ramaraj Pandian @ 2001-11-30 6:07 UTC
To: linux-kernel

I would like to use a FireWire DVD jukebox in Linux with the latest
kernel. The current SBP2 driver supports only one LUN. The DVD jukebox
has three LUNs (two for the drives and one for the device). It finds
only one DVD-ROM drive out of the two drives and the DVD jukebox.

How do I make use of the other LUNs through the SBP2 module? How can I
make it work? I work on Windows device drivers and am learning Linux
now. Your help will be greatly appreciated.

Thanks
Ramaraj
* Re: 2.4.16 freezed up with eepro100 module
From: Nathan Poznick @ 2001-11-30 14:23 UTC
To: linux-kernel

(forgot to cc lkml on my reply)

> Has anybody got the same issue with non Dell machines?

All I have to test with are Dell machines, so I haven't been able to
try.

> I am running 2.4.16 on a Compaq proliant ML 370 without problems (machine has
> been up for 2+ days with the new kernels, though). Trafic is not very high.

The trigger seems to be a combination of high network load and high
system load. The times it's happened to me, it's been while running an
app that has a couple of hundred threads, uses about a gig and a half
of memory, and does pretty heavy disk and network I/O. I'm still trying
to find a job that can reproduce it reliably (or even semi-reliably),
and when I can, I'm going to try switching over to the e100 driver, as
some people have suggested, to see if that stops it from happening.

--
Nathan <poznick@conwaycorp.net>
PGP Key: http://drunkmonkey.org/pgpkey.txt

"Competitiveness: the 8th deadly sin." --Phantom
* Re: 2.4.16 freezed up with eepro100 module
From: Sven Heinicke @ 2001-11-30 16:04 UTC
To: Anuradha Ratnaweera; +Cc: linux-kernel

I have eepro100s on other systems and never had a problem. They have
never been made to work as hard as the Dells, though. I am trying the
same Dell with a 3C996-T 1000Bt card using the driver from 3COM (we
plan on moving that system to a 1000Bt network but the switch hasn't
arrived yet), and it is running at 100Bt with the same software. If you
don't hear from me, assume it survived. It has been up a day so far; it
took the Dell about 3 days of heavy use to crash before.

	Sven

> Has anybody got the same issue with non Dell machines?
>
> I am running 2.4.16 on a Compaq proliant ML 370 without problems (machine has
> been up for 2+ days with the new kernels, though). Trafic is not very high.
>
> The driver is built into the kernel.
>
> /proc/pci shows
>
> Bus 0, device 2, function 0:
> Ethernet controller: Intel Corp. 82557 [Ethernet Pro 100] (rev 8).
> IRQ 5.
> Master Capable. Latency=64. Min Gnt=8.Max Lat=56.
> Non-prefetchable 32 bit memory at 0xc4fff000 [0xc4ffffff].
> I/O at 0x2400 [0x243f].
> Non-prefetchable 32 bit memory at 0xc4e00000 [0xc4efffff].
> Bus 0, device 5, function 0:
> Ethernet controller: Intel Corp. 82557 [Ethernet Pro 100] (#2) (rev 8).
> IRQ 10.
> Master Capable. Latency=64. Min Gnt=8.Max Lat=56.
> Non-prefetchable 32 bit memory at 0xc4dfd000 [0xc4dfdfff].
> I/O at 0x2c00 [0x2c3f].
> Non-prefetchable 32 bit memory at 0xc4c00000 [0xc4cfffff].
>
> Regards,
>
> Anuradha
>
> --
>
> Debian GNU/Linux (kernel 2.4.16)
>
> First Law of Bicycling:
> No matter which way you ride, it's uphill and against the wind.
* Re: 2.4.16 freezed up with eepro100 module
From: Nathan Poznick @ 2001-11-30 22:31 UTC
To: linux-kernel

Thus spake Sven Heinicke:
>
> I have eepro100's on other systems and never had a problem. They
> never have been made to work as hard as the DELLs though. I am
> trying the same DELL with a 3C996-T 1000Bt card using the driver from
> 3COM (we plan on moving that system to a 1000Bt system but the switch
> hasn't arrived yet) and it is running at 100Bt with the same
> software. If you don't hear from me assume it survived. Been up a
> day so far, took the DELL like 3 days of heavy use to crash before.

Ok, I finally had a chance to work on this, and here's what I know:

1) I found a workload under which I was able to reliably make the
network on the machine die (a few hundred of the "eth0: card reports
no resources." errors showed up, which continued until I took down the
network and removed the module). Unfortunately, the workload was an
in-house app, so all I can describe are the conditions associated
with it: 2 processes with a total of about 600 threads, 1.5GB of
memory, about 500 network connections, and a lot of disk and network
I/O.

2) I switched from the eepro100 module to Intel's e100 module, and I
was unable to reproduce the problem, even under a heavier load than
before. I haven't seen so much as a peep about eth0 problems in the
logs since I switched over.

So I'll be sticking with the e100 driver, since it appears to have
solved my problem (at least for now).

--
Nathan Poznick <poznick@conwaycorp.net>
PGP Key: http://drunkmonkey.org/pgpkey.txt

"This is wild, I swear..." -Tom Servo (as Hercules). #410
* Re: 2.4.16 freezed up with eepro100 module
From: Mike Fedyk @ 2001-12-01 0:17 UTC
To: Nathan Poznick; +Cc: linux-kernel, Jeff Garzik, Andrey V. Savochkin

Added Jeff & Andrey to the cc list because, according to the comments
at the top of eepro100.c, they were the last two to modify the driver.

On Fri, Nov 30, 2001 at 04:31:31PM -0600, Nathan Poznick wrote:
> Thus spake Sven Heinicke:
> >
> > I have eepro100's on other systems and never had a problem. They
> > never have been made to work as hard as the DELLs though. I am
> > trying the same DELL with a 3C996-T 1000Bt card using the driver from
> > 3COM (we plan on moving that system to a 1000Bt system but the switch
> > hasn't arrived yet) and it is running at 100Bt with the same
> > software. If you don't hear from me assume it survived. Been up a
> > day so far, took the DELL like 3 days of heavy use to crash before.
>
> Ok, I finally had a chance to work on this, and here's what I know:
>
> 1) I found a workload under which I was able to reliably make the
> network on the machine die (a few hundred of the "eth0: card reports
> no resources." errors showed up which continued until I took down the
> network and removed the module). Unfortunately, the workload was with
> an in-house app, so all I can describe are the conditions associated
> with it: 2 processes with a total of about 600 threads, 1.5gb of
> memory, about 500 network connections, and a lot of disk and network
> I/O.
>

You can run the test against eepro100 with tcpdump redirected to a log
file, and post that on the web somewhere. That would probably be
helpful. Some sort of profiling would help as well.

Jeff, Andrey, can you comment?
* Re: 2.4.16 freezed up with eepro100 module
From: Andrey Savochkin @ 2001-12-01 10:17 UTC
To: Nathan Poznick, Mike Fedyk; +Cc: linux-kernel, Jeff Garzik

Hi,

On Fri, Nov 30, 2001 at 04:17:17PM -0800, Mike Fedyk wrote:
>
> On Fri, Nov 30, 2001 at 04:31:31PM -0600, Nathan Poznick wrote:
> > Thus spake Sven Heinicke:
> > >
> > > I have eepro100's on other systems and never had a problem. They
> > > never have been made to work as hard as the DELLs though. I am
> > > trying the same DELL with a 3C996-T 1000Bt card using the driver from
> > > 3COM (we plan on moving that system to a 1000Bt system but the switch
> > > hasn't arrived yet) and it is running at 100Bt with the same
> > > software. If you don't hear from me assume it survived. Been up a
> > > day so far, took the DELL like 3 days of heavy use to crash before.
> >
> > Ok, I finally had a chance to work on this, and here's what I know:
> >
> > 1) I found a workload under which I was able to reliably make the
> > network on the machine die (a few hundred of the "eth0: card reports
> > no resources." errors showed up which continued until I took down the
> > network and removed the module). Unfortunately, the workload was with
> > an in-house app, so all I can describe are the conditions associated
> > with it: 2 processes with a total of about 600 threads, 1.5gb of
> > memory, about 500 network connections, and a lot of disk and network
> > I/O.

Do you see "can't fill rx buffer" messages?
If so, then your load is too big, and memory management is incapable of
freeing memory in time.
Right now the kernel doesn't allow increasing the atomic allocation
reservation (which is a serious misfeature), so you need to hack the
kernel and change the reservation there.

If the network doesn't come alive when you remove the load, that is a
second problem: a bug in the driver. I've seen such reports, but they
aren't frequent. On my computer, the driver resumes operation well.
Why the driver can't do so for some people needs deep investigation.

> >
> You can run the test against eepro100 with tcpdump redirected to a log file,
> and post that on the web somewhere. That would probably be helpful.

tcpdumps won't help.

	Andrey
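To make the two failure stages above easier to picture, here is a toy Python model of a receive ring whose buffers come from an allocator that can fail. It is not the eepro100 driver's code and makes no claim about its internals; the class ToyRxRing, the helper limited_allocator, and the buffer size are all hypothetical, and only the two message strings are borrowed from the logs in this thread. The idea it illustrates: when allocation fails the ring cannot be refilled, and once the ring is empty an arriving packet finds no buffer at all.

```python
def limited_allocator(budget):
    """Return an allocator that succeeds `budget` times, then fails.

    Stands in for atomic allocations under memory pressure (assumption).
    """
    remaining = [budget]

    def allocate():
        if remaining[0] == 0:
            return None
        remaining[0] -= 1
        return bytearray(1536)  # illustrative buffer size

    return allocate


class ToyRxRing:
    """Toy receive ring: each received packet consumes one buffer."""

    def __init__(self, size, allocator):
        self.size = size
        self.allocator = allocator
        self.free_buffers = 0
        self.log = []
        self.refill()

    def refill(self):
        """Top the ring up; record a message if allocation fails."""
        while self.free_buffers < self.size:
            if self.allocator() is None:
                self.log.append("can't fill rx buffer")
                return False
            self.free_buffers += 1
        return True

    def receive(self):
        """Accept one packet; return False if no buffer is available."""
        if self.free_buffers == 0:
            self.log.append("card reports no resources")
            self.refill()
            return False
        self.free_buffers -= 1
        self.refill()
        return True


if __name__ == "__main__":
    ring = ToyRxRing(size=2, allocator=limited_allocator(2))
    print([ring.receive() for _ in range(3)])
    print(ring.log)
```

In this model the ring recovers as soon as the allocator starts succeeding again; the second problem described above is the case where the real driver does not recover even after the load is removed.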
* Re: 2.4.16 freezed up with eepro100 module
From: Nathan Poznick @ 2001-12-03 15:37 UTC
To: Andrey Savochkin; +Cc: linux-kernel

Thus spake Andrey Savochkin:
> Do you see "can't fill rx buffer" messages?
> If so, then your load is too big, and memory management is incapable of
> freeing memory in time.
> Right now the kernel doesn't allow to increase atomic allocation
> reservation (which is a serious misfeature), so you need to hack and
> change the reservation in the kernel.

Yes, I saw a combination of the "can't fill rx buffer" messages and
"card reports no resources" messages, and after a while it went to
just a whole bunch (few hundred) of the "card reports no resources"
messages, which continued to scroll across the console at the rate of
one every second or so until I took down networking and removed the
eepro100 module.

> If the network doesn't come alive when you remove the load, it's a second
> problem, a bug in the driver. I've seen such reports, but they aren't
> frequent. On my computer, the driver resumes operations well.
> Why the driver can't do it for some people needs deep investigations.

After I removed the load, I gave it about 10 minutes or so to see if
it would pick back up, but it didn't.

--
Nathan Poznick <poznick@conwaycorp.net>
PGP Key: http://drunkmonkey.org/pgpkey.txt

"I think everyone ought to come in and have a hot cup of cocoa and come
inside and be nice and snuggly." -Crow (as Dr. Herly). #201
* Re: 2.4.16 freezed up with eepro100 module
From: Sven Heinicke @ 2001-11-29 16:14 UTC
To: Nathan Poznick; +Cc: linux-kernel

Nathan Poznick writes:
 > Thus spake Sven Heinicke:
 > >
 > > The 2.4.16 kernel finally makes my clients happy with memory
 > > management. The system that froze up is a Dell of some sort or other
 > > with two 1GHz Pentium IIIs and 4G of memory. But now I seem to be
 > > having ethernet problems. With an eepro100 card:
 >
 > I've encountered the same problem, with the same hardware setup (I
 > believe it's a Dell 2400, or something like that), on 2.4.14+xfs. For
 > me it didn't lock up the entire machine however, it only seemed to
 > kill the network - I was able to reboot the machine cleanly once I got
 > to the console. (message from yesterday with the subject 'failed
 > assertion in tcp.c') I too, am open to suggestions :-)
 >

I suspect that I would have been able to reboot it if I had been at
work in the middle of the night. I am unable to try older kernels, as
until 2.4.16 I had memory issues. The process that was putting so much
load on eth0 ran for about 3 days before the freeze.

	Sven