* Re: Adaptec vs Symbios performance
From: Stephan von Krawczynski @ 2001-11-04  3:50 UTC
To: Justin T. Gibbs; +Cc: linux-kernel, groudier

> > Hello Justin, hello Gerard
> >
> > I am currently looking for reasons for the bad behaviour of the aic7xxx
> > driver in a shared interrupt setup, and its generally unfriendly behaviour
> > in a multi-tasking environment.
>
> Can you be more specific?

Yes, of course :-)

What I am seeing over here is that aic7xxx is _significantly_ slower than
symbios _in the exact same context_. I refused to use the "new" driver as
long as possible because, right from the first test, I had the "feeling"
that it hurts overall machine performance in some way; the box seems _slow_
and less responsive than it was with the old aic driver. When I directly
compared it with symbios (LSI Logic hardware sold by Tekram) I additionally
found that it seems to hurt the interrupt performance of a network card
sharing its interrupt with the aic, which again does not happen with
symbios.

I have seen such behaviour before: in nearly every driver I formerly wrote
for shared-interrupt systems I had to add code that _prevents_ lockout of
other interrupt users due to indefinitely walking through the driver's own
code in high-load situations. But, of course, you _know_ this. Nobody
writes a driver like the new aic7xxx _and_ doesn't know :-) My guess is
that this knowledge made you enter the comment I ripped from your code
about using a bottom half handler instead of dealing with the workload in a
hardware interrupt.

Again, I have by no means read your code completely or the like. I simply
tried to find the hardware interrupt routine and see whether it does
significant eli (EverLasting Interrupt ;-) stuff - and I found your
comment. Can you re-comment from today's point of view?

> > This is nice. I cannot read the complete code around it (it is derived
> > from aic7xxx_linux.c) but if I understand the naming and comments
> > correctly, some workload is done inside the hardware interrupt (which it
> > shouldn't be), which would very much match my tests showing bad overall
> > performance behaviour. Obviously this code is old (read the comment)
> > and needs reworking.
> > Comments?
>
> I won't comment on whether deferring this work until outside of
> an interrupt context would help your "problem" until I understand
> what you are complaining about. 8-)

In a nutshell:
a) long-lasting interrupt workloads prevent normal process activity
   (creating latency and sticky behaviour)
b) long-lasting interrupt workloads play badly on other interrupt users
   (e.g. on the same shared interrupt)

I can see _both_ comparing aic with symbios.

Regards,
Stephan

^ permalink raw reply	[flat|nested] 14+ messages in thread
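The lockout-prevention pattern Stephan describes - bounding the work done per
interrupt invocation and deferring the remainder so that other users of a
shared interrupt line are not starved - can be sketched in userspace Python.
This is an editor's illustration only, not code from either driver; the
budget value and all names are invented:

```python
from collections import deque

MAX_EVENTS_PER_IRQ = 8  # illustrative per-invocation work budget


def isr(pending, deferred):
    """Service at most MAX_EVENTS_PER_IRQ events per 'interrupt'.

    Anything beyond the budget is pushed onto a deferred queue (the
    bottom-half analogue) so the handler returns quickly instead of
    walking its own work list indefinitely under high load.
    """
    handled = 0
    while pending and handled < MAX_EVENTS_PER_IRQ:
        pending.popleft()      # acknowledge/complete one event
        handled += 1
    deferred.extend(pending)   # remainder is processed outside IRQ context
    pending.clear()
    return handled
```

With 20 events queued, a single invocation handles 8 and defers 12, so a
device sharing the line gets a chance to be serviced between batches.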
* Re: Adaptec vs Symbios performance
From: Justin T. Gibbs @ 2001-11-04  5:47 UTC
To: Stephan von Krawczynski; +Cc: linux-kernel, groudier

> Can you re-comment from today's point of view?

I believe that if you were to set the tag depth in the new aic7xxx driver
to a level similar to either the symbios or the old aic7xxx driver, the
problem you describe would go away. The driver will only perform internal
queuing if a device cannot handle the original queue depth exported to the
SCSI mid-layer. Since the mid-layer provides no mechanism for proper,
dynamic throttling, queuing at the driver level will always be required
when the driver determines that a target cannot accept additional commands.
The default used by the older driver, 8, seems to work for most drives, so
no internal queuing is required. If you are really concerned about
interrupt latency, this will also be a win, as you will reduce your
transaction throughput and thus the frequency of interrupts seen by the
controller.

> > I won't comment on whether deferring this work until outside of
> > an interrupt context would help your "problem" until I understand
> > what you are complaining about. 8-)
>
> In a nutshell:
> a) long-lasting interrupt workloads prevent normal process activity
> (creating latency and sticky behaviour)

Deferring the work to outside of interrupt context will not, in general,
allow non-kernel processes to run any sooner. Only interrupt handlers that
don't block on the io-request lock (may it die a horrible death) would be
allowed to pre-empt this activity. Even in this case, there will be times,
albeit much shorter, that this interrupt will be blocked by the
per-controller spin-lock used to protect driver data structures and access
to the card's registers.

If your processes are really feeling sluggish, you are probably doing *a
lot* of I/O. The only thing that might help is some interrupt coalescing
algorithm in the aic7xxx driver's firmware. Since these chips do not have
an easily utilized timer facility, any such algorithm would be tricky to
implement. I've thought about it, but not enough to implement it yet.

> b) long-lasting interrupt workloads play badly on other interrupt users
> (e.g. on the same shared interrupt)

Sure. As the comment suggests, the driver should use a bottom half handler
or whatever new deferral mechanism is currently the rage in Linux. When I
first ported the driver, it was targeted to be a module, suitable for a
driver diskette, to replace the old driver. Things have changed since then,
and this area should be revisited. Internal queuing was not required in the
original FreeBSD driver and this is something the mid-layer should do on a
driver's behalf, but I've already ranted enough about that.

> I can see _both_ comparing aic with symbios.

I'm not sure that you would see much of a difference if you set the symbios
driver to use 253 commands per device. I haven't looked at the sym driver
for some time, but last I remember it does not use a bottom half handler
and handles queue throttling internally. It may perform less work at
interrupt time than the aic7xxx driver if locally queued I/O is compiled
into a format suitable for controller consumption rather than queuing the
ScsiCmnd structure provided by the mid-layer. The aic7xxx driver has to
convert a ScsiCmnd into a controller data structure to service an internal
queue, and this can take a bit of time.

It would be interesting to know if there is a disparity in the TPS numbers
and tag depths in your comparisons. Higher tag depth usually means higher
TPS, which may also mean less interactive response from the system. All
things being equal, I would expect the sym and aic7xxx drivers to perform
about the same.

--
Justin
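The internal queuing Justin describes - the driver queues commands itself
only when a device cannot accept the full depth that was exported to the
mid-layer - can be modelled with a small userspace Python sketch. This is
an editor's illustration of the mechanism, not aic7xxx code; the class name
and numbers are invented:

```python
class DriverQueueSim:
    """Toy model of driver-level queue throttling.

    The mid-layer may hand the driver more commands than the device
    will accept tags for; the overflow waits in an internal driver
    queue and is re-issued as device tags free up.
    """

    def __init__(self, device_depth):
        self.device_depth = device_depth  # tags the device will accept
        self.active = 0                   # tags currently in the device
        self.internal = []                # driver-level overflow queue

    def queue_command(self, cmd):
        if self.active < self.device_depth:
            self.active += 1              # device accepts the tag directly
        else:
            self.internal.append(cmd)     # throttled: queue internally

    def complete_command(self):
        self.active -= 1
        if self.internal:                 # refill the freed tag slot
            self.queue_command(self.internal.pop(0))
```

With `device_depth=8` and the mid-layer issuing 8 or fewer commands at a
time, the internal queue stays empty - matching Justin's point that a
default depth of 8 avoids internal queuing for most drives.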
* Re: Adaptec vs Symbios performance
From: Gérard Roudier @ 2001-11-04  5:23 UTC
To: Justin T. Gibbs; +Cc: Stephan von Krawczynski, linux-kernel

On Sat, 3 Nov 2001, Justin T. Gibbs wrote:

[...]

> > I can see _both_ comparing aic with symbios.
>
> I'm not sure that you would see much of a difference if you set the
> symbios driver to use 253 commands per device. I haven't looked at

This is discouraged. :) Better, IMO, to compare behaviours with realistic
queue depths. As you know, more than 64 for hard disks does not make sense
(yet). Personally, I use 64 under FreeBSD and 16 under Linux. Guess why?
:-)

> the sym driver for some time, but last I remember it does not use
> a bottom half handler and handles queue throttling internally. It

There is no BH in the driver. The stock sym53c8xx even uses scsi_obsolete,
which requires more load in interrupt context for command completion.
SYM-2, which comes back from FreeBSD, uses the EH threaded stuff that just
queues to a BH on completion. Stephan may want to give SYM-2 a try, IMO.

> may perform less work at interrupt time than the aic7xxx driver if
> locally queued I/O is compiled into a format suitable for controller
> consumption rather than queuing the ScsiCmnd structure provided by
> the mid-layer. The aic7xxx driver has to convert a ScsiCmnd into a
> controller data structure to service an internal queue, and this can
> take a bit of time.

The sym* drivers also use an internal data structure to handle I/Os. The
SCSI script does not know about any O/S-specific data structure.

> It would be interesting to know if there is a disparity in the TPS
> numbers and tag depths in your comparisons. Higher tag depth usually
> means higher TPS, which may also mean less interactive response from the
> system. All things being equal, I would expect the sym and aic7xxx
> drivers to perform about the same.

Agreed.

Gérard.
* Re: Adaptec vs Symbios performance
From: Stephan von Krawczynski @ 2001-11-04 14:17 UTC
To: Justin T. Gibbs; +Cc: linux-kernel, groudier

On Sat, 03 Nov 2001 22:47:39 -0700 "Justin T. Gibbs" <gibbs@scsiguy.com>
wrote:

> > Can you re-comment from today's point of view?
>
> I believe that if you were to set the tag depth in the new aic7xxx
> driver to a level similar to either the symbios or the old aic7xxx
> driver, the problem you describe would go away.

Nope. I know the stuff :-) I already took tcq down to 8 (as in the old
driver) back at the time I compared the old and new driver. Indeed I found
out that everything is a lot worse using tcq 256 (which doesn't work anyway
and gets down to 128 in real life using my IBM harddrive). After using
depth 8 the comparison to symbios is just as described. Though I must admit
that the symbios driver takes tcq down from 8 to 4 according to its boot-up
message. Do you think it will make a noticeable difference if I hardcode
the depth to 4 in the aic7xxx driver?

> The driver will only perform internal queuing if a device cannot handle
> the original queue depth exported to the SCSI mid-layer. [...] If you
> are really concerned about interrupt latency, this will also be a win,
> as you will reduce your transaction throughput and thus the frequency of
> interrupts seen by the controller.

Hm, this is not really true in my experience. Since a harddrive operates in
a completely different time-frame than pure software, it may well be that
building up internal data not directly inside the hardware interrupt, but
in a somewhat higher layer, is no noticeable performance loss, _if_ it is
done right. "Right" here obviously means there must not be a synchronous
linkage between this higher layer and the hardware interrupt, in the sense
that the higher layer has to wait on the hardware interrupt's completion.
But this is all pretty "down to earth" stuff you know anyway.

> > > I won't comment on whether deferring this work until outside of
> > > an interrupt context would help your "problem" until I understand
> > > what you are complaining about. 8-)
> >
> > In a nutshell:
> > a) long-lasting interrupt workloads prevent normal process activity
> > (creating latency and sticky behaviour)
>
> Deferring the work to outside of interrupt context will not, in
> general, allow non-kernel processes to run any sooner.

Kernel processes would be completely sufficient. If you hit allocation
routines (e.g.) the whole system enters hiccup state :-).

> Only interrupt handlers that don't block on the io-request lock (may it
> die a horrible death) would be allowed to pre-empt this activity. Even
> in this case, there will be times, albeit much shorter, that this
> interrupt will be blocked by the per-controller spin-lock used to
> protect driver data structures and access to the card's registers.

Well, this is a natural thing. You always have to protect exclusively used
things like controller registers, but doubtlessly things turn out the
better the less exclusiveness you have (what can be more exclusive than a
hardware interrupt?).

> If your processes are really feeling sluggish, you are probably doing
> *a lot* of I/O.

Yes, of course. I wouldn't have complained in the first place _not_ knowing
that symbios does it better.

> The only thing that might help is some interrupt coalescing algorithm in
> the aic7xxx driver's firmware. Since these chips do not have an easily
> utilized timer facility, any such algorithm would be tricky to
> implement. I've thought about it, but not enough to implement it yet.

I cannot comment on that; I don't know what Gerard really does here.

> > b) long-lasting interrupt workloads play badly on other interrupt users
> > (e.g. on the same shared interrupt)
>
> Sure. As the comment suggests, the driver should use a bottom half
> handler or whatever new deferral mechanism is currently the rage
> in Linux.

Do you think this is complex in implementation?

> [...]
> > I can see _both_ comparing aic with symbios.
>
> I'm not sure that you would see much of a difference if you set the
> symbios driver to use 253 commands per device.

As stated earlier, I took both drivers to comparable values (8).

> It would be interesting to know if there is a disparity in the TPS
> numbers and tag depths in your comparisons. Higher tag depth usually
> means higher TPS, which may also mean less interactive response from the
> system. All things being equal, I would expect the sym and aic7xxx
> drivers to perform about the same.

I can confirm that. 253 is a bad joke in terms of interactive
responsiveness during high load. Probably the configured standard value
should be taken down considerably. 253 feels like old IDE. Yes, I know this
comment hurt you badly ;-)

In my eyes the changes required in your driver are _not_ that big. The gain
would be noticeable. I don't say it's a bad driver, really not; I would
only suggest some refinement. I know _you_ can do a bit better, prove me
right ;-)

Regards,
Stephan
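For readers unfamiliar with the interrupt coalescing Justin mentions, here
is a rough userspace Python model: instead of one interrupt per command
completion, fire one per batch, bounded by a count threshold and a timeout.
This is an editor's sketch with invented knobs; the hard part Justin points
out - these chips lack an easily used timer - is exactly what the timeout
path here assumes away:

```python
class CoalesceSim:
    """Fire one 'interrupt' per batch of completions.

    threshold: completions to accumulate before firing.
    timeout:   time units to wait before firing a partial batch,
               so latency stays bounded (this is the part that needs
               a hardware timer in a real implementation).
    """

    def __init__(self, threshold, timeout):
        self.threshold = threshold
        self.timeout = timeout
        self.pending = 0      # completions awaiting an interrupt
        self.elapsed = 0      # time since first pending completion
        self.interrupts = 0   # interrupts delivered so far

    def on_completion(self):
        self.pending += 1
        if self.pending >= self.threshold:
            self._fire()

    def tick(self, dt):
        if self.pending:
            self.elapsed += dt
            if self.elapsed >= self.timeout:
                self._fire()  # flush a partial batch

    def _fire(self):
        self.interrupts += 1
        self.pending = 0
        self.elapsed = 0
```

Ten completions with a threshold of 4 cost only two interrupts, and the two
left-over completions are flushed by the timeout rather than waiting
forever.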
* Re: Adaptec vs Symbios performance
From: Justin T. Gibbs @ 2001-11-04 18:10 UTC
To: Stephan von Krawczynski; +Cc: linux-kernel, groudier

> On Sat, 03 Nov 2001 22:47:39 -0700 "Justin T. Gibbs" <gibbs@scsiguy.com>
> wrote:
>
> > > Can you re-comment from today's point of view?
> >
> > I believe that if you were to set the tag depth in the new aic7xxx
> > driver to a level similar to either the symbios or the old aic7xxx
> > driver, the problem you describe would go away.
>
> Nope. I know the stuff :-) I already took tcq down to 8 (as in the old
> driver) back at the time I compared the old and new driver.

Then you will have to find some other reason for the difference in
performance. Internal queuing is not a factor with any reasonably modern
drive when the depth is set at 8.

> Indeed I found out that everything is a lot worse using tcq 256 (which
> doesn't work anyway and gets down to 128 in real life using my IBM
> harddrive).

The driver cannot know if you are using an external RAID controller, an IBM
drive, or a Quantum Fireball. It is my belief that in a true multi-tasking
workload, giving the device as much work to chew on as it can handle is
always best. Your sequential bandwidth may be a bit less, but sequential
I/O is not that interesting in my opinion.

> After using depth 8 the comparison to symbios is just as described.
> Though I must admit that the symbios driver takes tcq down from 8 to 4
> according to its boot-up message. Do you think it will make a noticeable
> difference if I hardcode the depth to 4 in the aic7xxx driver?

As mentioned above, I would not expect any difference.

> > The driver will only perform internal queuing if a device cannot
> > handle the original queue depth exported to the SCSI mid-layer. [...]
>
> Hm, this is not really true in my experience. Since a harddrive operates
> in a completely different time-frame than pure software, it may well be
> that building up internal data not directly inside the hardware
> interrupt, but in a somewhat higher layer, is no noticeable performance
> loss, _if_ it is done right. [...]

I don't understand how your comments relate to mine. In a perfect world,
and with a "real" SCSI layer in Linux, the driver would never have any
queued data above and beyond what it can directly send to the device. Since
Linux lets you set the queue depth only at startup, before you can
dynamically determine a useful value, the driver has little choice. To say
it more directly, internal queuing is not something I wanted in the design
- in fact it makes it more complicated and less efficient.

> > Deferring the work to outside of interrupt context will not, in
> > general, allow non-kernel processes to run any sooner.
>
> Kernel processes would be completely sufficient. If you hit allocation
> routines (e.g.) the whole system enters hiccup state :-).

But even those kernel processes will not run unless they have a higher
priority than the bottom half handler. I can't stress this enough...
interactive performance will not change if this is done, because kernel
tasks take priority over user tasks.

> > If your processes are really feeling sluggish, you are probably doing
> > *a lot* of I/O.
>
> Yes, of course. I wouldn't have complained in the first place _not_
> knowing that symbios does it better.

I wish you could be a bit more quantitative in your analysis. It seems
clear to me that the area you're pointing to is not the cause of your
complaint. Without a quantitative analysis, I can't help you figure this
out.

> > Sure. As the comment suggests, the driver should use a bottom half
> > handler or whatever new deferral mechanism is currently the rage
> > in Linux.
>
> Do you think this is complex in implementation?

No, but doing anything like this requires some research to find a solution
that works for all kernel versions the driver supports. I hope I don't need
three different implementations to make this work. Regardless, this change
will not make any difference in your problem.

> > It would be interesting to know if there is a disparity in the TPS
> > numbers and tag depths in your comparisons. Higher tag depth usually
> > means higher TPS, which may also mean less interactive response from
> > the system. All things being equal, I would expect the sym and aic7xxx
> > drivers to perform about the same.
>
> I can confirm that. 253 is a bad joke in terms of interactive
> responsiveness during high load.

It's there for throughput, not interactive performance. I'm sure if you
were doing things like news expirations, you'd appreciate the higher number
(up to the 128 tags your drives support).

> Probably the configured standard value should be taken down considerably.
> 253 feels like old IDE.
> Yes, I know this comment hurt you badly ;-)

Not really. To each their own. You can tune your system however you see
fit.

> In my eyes the changes required in your driver are _not_ that big. The
> gain would be noticeable. I don't say it's a bad driver, really not; I
> would only suggest some refinement. I know _you_ can do a bit better,
> prove me right ;-)

Show me where the real problem is, and I'll fix it. I'll add the bottom
half handler too eventually, but I don't see it as a pressing item. I'm
much more interested in why you are seeing the behavior you are and exactly
what, quantitatively, that behavior is.

--
Justin
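The top-half/bottom-half split that both sides agree would be the cleaner
design can be sketched as follows. This is a userspace Python illustration
of the pattern only; the names are invented and nothing here is taken from
the aic7xxx source:

```python
from collections import deque

completion_fifo = deque()  # filled by the 'top half' (hardware interrupt)
finished = []              # filled by the 'bottom half' (deferred context)


def top_half(hw_completions):
    """Hardware interrupt: minimal work - just record what completed."""
    completion_fifo.extend(hw_completions)


def bottom_half():
    """Deferred context: the expensive per-command bookkeeping (e.g. the
    ScsiCmnd conversion and mid-layer callbacks discussed above) runs
    here, where it no longer blocks other interrupt users."""
    while completion_fifo:
        finished.append(process(completion_fifo.popleft()))


def process(tag):
    # Stand-in for real completion processing.
    return ("done", tag)
```

The top half returns almost immediately; the bottom half drains the FIFO
later, which is exactly the deferral the in-driver comment recommends.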
* Re: Adaptec vs Symbios performance
From: Stephan von Krawczynski @ 2001-11-04 18:35 UTC
To: Justin T. Gibbs; +Cc: linux-kernel, groudier

On Sun, 04 Nov 2001 11:10:26 -0700 "Justin T. Gibbs" <gibbs@scsiguy.com>
wrote:

> Show me where the real problem is, and I'll fix it. I'll add the bottom
> half handler too eventually, but I don't see it as a pressing item. I'm
> much more interested in why you are seeing the behavior you are and
> exactly what, quantitatively, that behavior is.

Hm, what more specific can I tell you than this? Take my box with

Host: scsi1 Channel: 00 Id: 03 Lun: 00
  Vendor: TEAC     Model: CD-ROM CD-532S   Rev: 1.0A
  Type:   CD-ROM                           ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 08 Lun: 00
  Vendor: IBM      Model: DDYS-T36950N     Rev: S96H
  Type:   Direct-Access                    ANSI SCSI revision: 03

and an aic7xxx driver. Start xcdroast and read a cd image. You get
something between 2968,4 and 3168,2 kB/s throughput measured from xcdroast.

Now redo this with a Tekram controller (which is sym53c1010) and you get a
throughput of 3611,1 to 3620,2 kB/s. No special stuff or background
processes or anything else involved. I wonder how much simpler a test could
be. Give me values to compare from _your_ setup.

If you redo this test with nfs-load (copy files from some client to your
test-box acting as nfs-server) you will end up at 1926 - 2631 kB/s
throughput with aic, but 3395 - 3605 kB/s with symbios.

If you need more on that picture, then redo the last test and start _some_
application in the background during the test (like mozilla). Time how long
it takes until the application is up and running. If you are really unlucky
you have your mail-client open during the test and let it fetch mail via
pop3 into an MH folder (lots of small files). You have a high chance that
your mail-client is unusable until xcdroast is finished with cd reading -
but not with symbios.

??

Regards,
Stephan
* Re: Adaptec vs Symbios performance
From: Gérard Roudier @ 2001-11-04 16:31 UTC
To: Stephan von Krawczynski; +Cc: Justin T. Gibbs, linux-kernel, groudier

Hi Stephan,

The difference in performance for your CD (slow device) between aic7xxx and
sym53c8xx using equi-capable HBAs (notably Ultra-160) cannot be believed
for a single second to be due to a design flaw in the aic7xxx driver.
Instead of trying to prove Justin wrong about his driver, you should look
into your system configuration and/or provide Justin with accurate
information and/or do different testing in order to get some clue about the
real cause. You may have triggered a software/hardware bug somewhere, but I
am convinced that it cannot be a driver design bug.

In order to help Justin work on your problem, you should for example
report:

- The device configuration you set up in the controller EEPROM/NVRAM.
- The kernel boot-up messages.
- Your kernel configuration.
- Etc...

You might for example have unintentionally configured some devices in the
HBA set-up for disconnection not to be granted. Such a configuration
MISTAKE is likely to kill SCSI performance a LOT.

Gérard.

PS: If you are interested in Justin's ability to design software for SCSI,
then you may want to have a look into all the FreeBSD IO-related stuff
owned by Justin.
* Re: Adaptec vs Symbios performance
From: Justin T. Gibbs @ 2001-11-04 19:13 UTC
To: Stephan von Krawczynski; +Cc: linux-kernel, groudier

> Hm, what more specific can I tell you than this?

Well, the numbers paint a different picture than your previous comments.
You never mentioned a performance disparity, only a loss in interactive
performance.

> Take my box with
>
> Host: scsi1 Channel: 00 Id: 03 Lun: 00
>   Vendor: TEAC     Model: CD-ROM CD-532S   Rev: 1.0A
>   Type:   CD-ROM                           ANSI SCSI revision: 02
> Host: scsi0 Channel: 00 Id: 08 Lun: 00
>   Vendor: IBM      Model: DDYS-T36950N     Rev: S96H
>   Type:   Direct-Access                    ANSI SCSI revision: 03
>
> and an aic7xxx driver.

A full dmesg would be better. Right now I have no idea what kind of aic7xxx
controller you are using, the speed and type of CPU, the chipset in the
machine, etc. etc. In general, I'd rather see the raw data than a version
edited down based on the conclusions you've already drawn or on what you
feel is important.

> Start xcdroast and read a cd image. You get something between 2968,4 and
> 3168,2 kB/s throughput measured from xcdroast.
>
> Now redo this with a Tekram controller (which is sym53c1010) and you get
> a throughput of 3611,1 to 3620,2 kB/s.

Were both tests performed from cold boot, to a new file in the same
directory, with similar amounts of that filesystem in use?

> No special stuff or background processes or anything else involved. I
> wonder how much simpler a test could be.

It doesn't matter how simple it is if you've never mentioned it before.
Your tone is somewhat indignant. Do you not understand why this data is
important to understanding and correcting the problem?

> Give me values to compare from _your_ setup.

Send me a c1010. 8-)

> If you redo this test with nfs-load (copy files from some client to your
> test-box acting as nfs-server) you will end up at 1926 - 2631 kB/s
> throughput with aic, but 3395 - 3605 kB/s with symbios.

What is the interrupt load during these tests? Have you verified that
disconnection is enabled for all devices on the aic7xxx controller?

> If you need more on that picture, then redo the last test and start
> _some_ application in the background during the test (like mozilla).
> Time how long it takes until the application is up and running.

Since you are experiencing the problem, can't you time it? There is little
guarantee that I will be able to reproduce the exact scenario you are
describing. As I mentioned before, I don't have a c1010, so I cannot
perform the comparison you feel is so telling.

This does not look like an interrupt latency problem.

--
Justin
* Re: Adaptec vs Symbios performance
From: Stephan von Krawczynski @ 2001-11-04 19:56 UTC
To: Justin T. Gibbs; +Cc: linux-kernel, groudier

On Sun, 04 Nov 2001 12:13:20 -0700 "Justin T. Gibbs" <gibbs@scsiguy.com>
wrote:

> > Hm, what more specific can I tell you than this?
>
> Well, the numbers paint a different picture than your previous
> comments. You never mentioned a performance disparity, only a
> loss in interactive performance.

See:

Date: Wed, 31 Oct 2001 16:45:39 +0100
From: Stephan von Krawczynski <skraw@ithnet.com>
To: linux-kernel <linux-kernel@vger.kernel.org>
Subject: The good, the bad & the ugly (or VM, block devices, and SCSI :-)
Message-Id: <20011031164539.29c04ee0.skraw@ithnet.com>

> A full dmesg would be better. Right now I have no idea what kind
> of aic7xxx controller you are using,

Adaptec A29160 (see above mail). Notable is that I have a 32-bit PCI bus,
not 64-bit. This is an Asus CUV4X-D board.

> the speed and type of CPU,

2 x PIII 1GHz

> the chipset in the machine,

00:00.0 Host bridge: VIA Technologies, Inc. VT82C693A/694x [Apollo PRO133x] (rev c4)
00:01.0 PCI bridge: VIA Technologies, Inc. VT82C598/694x [Apollo MVP3/Pro133x AGP]
00:04.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 40)
00:04.1 IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 06)
00:04.2 USB Controller: VIA Technologies, Inc. UHCI USB (rev 16)
00:04.3 USB Controller: VIA Technologies, Inc. UHCI USB (rev 16)
00:04.4 Host bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 40)
00:09.0 PCI bridge: Digital Equipment Corporation DECchip 21152 (rev 03)
00:0a.0 Network controller: Elsa AG QuickStep 1000 (rev 01)
00:0b.0 SCSI storage controller: Symbios Logic Inc. (formerly NCR) 53c1010 Ultra3 SCSI Adapter (rev 01)
00:0b.1 SCSI storage controller: Symbios Logic Inc. (formerly NCR) 53c1010 Ultra3 SCSI Adapter (rev 01)
00:0d.0 Multimedia audio controller: Creative Labs SB Live! EMU10000 (rev 07)
00:0d.1 Input device controller: Creative Labs SB Live! (rev 07)
01:00.0 VGA compatible controller: nVidia Corporation NV11 (rev b2)
02:04.0 Ethernet controller: Digital Equipment Corporation DECchip 21142/43 (rev 41)
02:05.0 Ethernet controller: Digital Equipment Corporation DECchip 21142/43 (rev 41)
02:06.0 Ethernet controller: Digital Equipment Corporation DECchip 21142/43 (rev 41)
02:07.0 Ethernet controller: Digital Equipment Corporation DECchip 21142/43 (rev 41)

> Were both tests performed from cold boot

I rechecked that several times; it made no difference.

> to a new file in the same
> directory with similar amounts of that filesystem in use?

Yes. There is no difference whether the file is a) new or b) overwritten.
Anyway, both test cases use the same filesystems; I really exchanged only
the controllers, everything else is completely the same. I just did another
test run with symbios, _after_ heavy nfs and I/O action on the box and with
about 145 MB in swap currently. Result: 3620,1 kB/s. _Very_ stable
appearance from symbios.

> > No special stuff or background processes or anything else involved. I
> > wonder how much simpler a test could be.
>
> It doesn't matter how simple it is if you've never mentioned it before.

Sorry, but there was nothing left out on my side. s.a.

> Your tone is somewhat indignant. Do you not understand why this
> data is important to understanding and correcting the problem?

Sorry for that, it is unintentional. Though my written English may look
nice, keep in mind I am not a native English speaker, so some things may
come over a bit rougher than intended.

> > Give me values to compare from _your_ setup.
>
> Send me a c1010. 8-)

Sorry, misunderstanding. What I meant was: how fast can you read data from
your cd-rom attached to some adaptec controller?

> > If you redo this test with nfs-load (copy files from some client to
> > your test-box acting as nfs-server) you will end up at 1926 - 2631
> > kB/s throughput with aic, but 3395 - 3605 kB/s with symbios.
>
> What is the interrupt load during these tests?

How can I present you an exact figure on this?

> Have you verified that
> disconnection is enabled for all devices on the aic7xxx controller?

Yes.

> This does not look like an interrupt latency problem.

Based on which thoughts?

Regards,
Stephan
* Re: Adaptec vs Symbios performance
From: Justin T. Gibbs @ 2001-11-04 20:43 UTC (permalink / raw)
To: Stephan von Krawczynski; +Cc: linux-kernel, groudier

>See:
>
>Date: Wed, 31 Oct 2001 16:45:39 +0100
>From: Stephan von Krawczynski <skraw@ithnet.com>
>To: linux-kernel <linux-kernel@vger.kernel.org>
>Subject: The good, the bad & the ugly (or VM, block devices, and SCSI :-)
>Message-Id: <20011031164539.29c04ee0.skraw@ithnet.com>

<Sigh> I don't read all of the LK list and the mail was not cc'd to me,
so I did not see this thread.

>> A full dmesg would be better. Right now I have no idea what kind
>> of aic7xxx controller you are using,
>
>Adaptec A29160 (see above mail). Remarkable is that I have a 32-bit PCI
>bus, no 64-bit. This is an Asus CUV4X-D board.

*Please stop editing things*. I need the actual boot messages from the
detection of the aic7xxx card. It would also be nice to see the output of
/proc/scsi/aic7xxx/<card #>

>> the speed and type of CPU,
>
>2 x PIII 1GHz

Dmesg please.

>Sorry, misunderstanding. What I meant was: how fast can you read data
>from your cd-rom attached to some adaptec controller?

I'll run some tests tomorrow at work. I'm sure the results will be
dependent on the cdrom in question but they may show something.

>> >If you redo this test with nfs-load (copy files from some client to your
>> >test-box acting as nfs-server) you will end up at 1926 - 2631 kB/s
>> >throughput with aic, but 3395 - 3605 kB/s with symbios.
>>
>> What is the interrupt load during these tests?
>
>How can I present you an exact figure on this?

Isn't there a systat or vmstat equivalent under Linux that gives you
interrupt rates? I'll poke around tomorrow when I'm in front of a Linux
box.

>> Have you verified that
>> disconnection is enabled for all devices on the aic7xxx controller?
>
>yes.

The driver may not be seeing the same things as SCSI-Select for some
strange reason. Again, just email me a full dmesg after a successful boot
along with the /proc/scsi/aic7xxx/ output.

>> This does not look like an interrupt latency problem.
>
>Based on which thoughts?

It really looks like a bug in the driver's round-robin code, or perhaps a
difference in how many transactions we allow to be queued in the untagged
case. Can you re-run your tests with the output directed to /dev/null for
cdrom reads and also perform some benchmarks against your disk? The
benchmarks should operate on only one device at a time, with as little
I/O to any other device during the test as possible.

--
Justin
* Re: Adaptec vs Symbios performance
From: Matthias Andree @ 2001-11-05 12:18 UTC (permalink / raw)
To: linux-kernel

On Sun, 04 Nov 2001, Justin T. Gibbs wrote:

> Isn't there a systat or vmstat equivalent under Linux that gives you
> interrupt rates? I'll poke around tomorrow when I'm in front of a Linux
> box.

vmstat is usually available; systat/iostat and the like are not
ubiquitous, however.

--
Matthias Andree
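Neither mail names a concrete tool for the interrupt figure Justin asked for. On Linux the raw counters live in /proc/interrupts; sampling them twice and dividing by the elapsed time gives a rate. A minimal sketch, under these assumptions: the 2.4-era uniprocessor layout "IRQ: count handler-name" (on SMP kernels there is one count column per CPU and this only reads the first), and "aic7xxx" as an example handler name:

```c
#include <stdio.h>
#include <string.h>

/* Sum the count column of every /proc/interrupts line whose handler
 * name contains `name`.  Returns -1 if the file cannot be opened. */
long irq_count(const char *path, const char *name)
{
    FILE *f = fopen(path, "r");
    char line[256];
    long total;

    if (f == NULL)
        return -1;
    total = 0;
    while (fgets(line, sizeof(line), f) != NULL) {
        long irq, count;

        /* Lines look like " 10:    12345   aic7xxx"; non-matching
         * lines (NMI, ERR, ...) fail the sscanf and are skipped. */
        if (strstr(line, name) != NULL &&
            sscanf(line, " %ld: %ld", &irq, &count) == 2)
            total += count;
    }
    fclose(f);
    return total;
}
```

To get a rate: call irq_count("/proc/interrupts", "aic7xxx") once, run the I/O test for N seconds, call it again, and divide the difference by N.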
* Re: Adaptec vs Symbios performance
From: Stephan von Krawczynski @ 2001-11-04 19:02 UTC (permalink / raw)
To: Justin T. Gibbs; +Cc: linux-kernel, groudier

On Sun, 04 Nov 2001 11:10:26 -0700
"Justin T. Gibbs" <gibbs@scsiguy.com> wrote:

> >Nope.
> >I know the stuff :-) I already took tcq down to 8 (as in the old driver)
> >back at the time I compared the old and new drivers.
>
> Then you will have to find some other reason for the difference in
> performance. Internal queuing is not a factor with any reasonably
> modern drive when the depth is set at 8.

Hm, obviously we could start right from the beginning and ask people with
aic controllers and symbios controllers for some comparison figures.
Hopefully some people are interested. Here we go:

Hello out there :-)

We need your help. If you own a scsi-controller from adaptec or one with
an ncr/symbios chipset, can you please do the following: reboot your box,
start xcdroast and read in a data cd. Tell us: the brand of your cdrom,
how much RAM you have, your processor type, and the throughput as
measured by xcdroast. It would be nice if you tried several times. We are
not really interested in the hard figures, but want to extract some
"global" tendency.

Thank you for your cooperation,
Stephan

PS: my values are (I obviously have both controllers):

Adaptec: Drive TEAC-CD-532S (30x), 1 GB RAM, 2 x PIII 1GHz
test 1: 2998,9 kB/s
test 2: 2968,4 kB/s
test 3: 3168,2 kB/s

Tekram (symbios): Drive TEAC-CD-532S (30x), 1 GB RAM, 2 x PIII 1GHz
test 1: 3619,3 kB/s
test 2: 3611,1 kB/s
test 3: 3620,2 kB/s
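xcdroast's figure is one way to measure; a raw timed sequential read gives a comparable kB/s number without the application layer on top. A rough sketch of such a measurement — the device path is only an example, and for meaningful numbers it should run on an otherwise idle box:

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Read `kbytes` kB sequentially from `path` in 64k chunks and return
 * the observed throughput in kB/s, or -1.0 if the device cannot be
 * opened.  Stops early at EOF and reports what was actually read. */
double read_rate_kbs(const char *path, long kbytes)
{
    static char buf[64 * 1024];
    struct timeval t0, t1;
    long left = kbytes * 1024L;
    int fd = open(path, O_RDONLY);
    double secs;

    if (fd < 0)
        return -1.0;
    gettimeofday(&t0, NULL);
    while (left > 0) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n <= 0)
            break;              /* EOF or read error */
        left -= n;
    }
    gettimeofday(&t1, NULL);
    close(fd);

    secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    if (secs <= 0.0)
        secs = 1e-6;            /* avoid division by zero on tiny reads */
    return (kbytes * 1024L - left) / 1024.0 / secs;
}
```

For example, read_rate_kbs("/dev/scd0", 100 * 1024) would time a 100 MB read from a cdrom device (the path is a common name, not guaranteed on every setup).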
* Re: Google's mm problem - not reproduced on 2.4.13
From: Ben Smith @ 2001-11-02 22:42 UTC (permalink / raw)
To: linux-kernel

> Anyway, I posted a suggested patch that should fix the behaviour, but it
> doesn't fix the fundamental problem with locking the wrong kinds of
> pages (ie you're definitely on your own if you happen to lock down most
> of the low 1GB of an intel machine).

I've tried the patch you sent and it doesn't help. I applied the patch to
2.4.13-pre7 and it hung the machine in the same way (ctrl-alt-del didn't
work). The last few lines of vmstat before the machine hung look like
this:

 0 1 0      0 133444   5132 3367312   0   0 31196     0 1121  2123   0   6  94
 0 1 0      0  63036   5216 3435920   0   0 34338    14 1219  2272   0   5  95
 2 0 1      0   6156   1828 3494904   0   0 31268     0 1130  2198   0  23  77
 1 0 1      0   3596    864 3498488   0   0  2720    16 1640  1068   0  88  12

> It would be interesting to hear whether that is equally true in the new
> VM that doesn't necessarily page stuff out unless it can show that the
> memory pressure is actually from VM mappings.
>
> How big is your mlock area during real load? Still the "max the kernel
> will allow"? Or is that just a benchmark/test kind of thing?

I haven't had a chance to try my real app yet, but my test application is
a good simulation of what the real program does, minus any of the
accessing of the data that it maps. Since it's the only application
running, and for performance reasons we'd need all of our data in memory,
we map the "max the kernel will allow".

As another note, I've re-written my test application to use madvise
instead of mlock, on a suggestion from Andrea. It also doesn't work. For
2.4.13, after running for a while, my test app hangs, using one CPU, and
kswapd consumes the other CPU. I was eventually able to kill my test app.

I've also re-written my test app to use anonymous mmap, followed by a
mlock and read()'s. This actually does work without problems, but doesn't
really do what we want for other reasons.

 - Ben

Ben Smith
Google, Inc.
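The anonymous-mmap-plus-mlock variant Ben describes can be sketched in a few lines. This is an illustrative stand-in, not Google's test application; the function name is invented, and a failed mlock (e.g. RLIMIT_MEMLOCK too low, or insufficient privilege) is deliberately treated as non-fatal so the mapping stays usable either way:

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Map `len` bytes of anonymous memory and try to pin it with mlock().
 * Returns the mapping, or NULL if mmap itself fails; *pinned tells the
 * caller whether the pages are actually locked in RAM. */
void *map_and_pin(size_t len, int *pinned)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return NULL;
    *pinned = (mlock(p, len) == 0);
    if (!*pinned)
        perror("mlock");    /* not fatal: memory is usable, just unpinned */
    return p;
}
```

After this, data can be pulled into the mapping with ordinary read() calls, which is the combination Ben reports as working.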
* Adaptec vs Symbios performance
From: Stephan von Krawczynski @ 2001-11-03 22:53 UTC (permalink / raw)
To: linux-kernel; +Cc: groudier

Hello Justin, hello Gerard,

I am currently looking for reasons for the bad behaviour of the aic7xxx
driver in a shared-interrupt setup, and its generally unfriendly
behaviour in a multi-tasking environment. Here is what I found in the
code:

    /*
     * SCSI controller interrupt handler.
     */
    void
    ahc_linux_isr(int irq, void *dev_id, struct pt_regs *regs)
    {
        struct ahc_softc *ahc;
        struct ahc_cmd *acmd;
        u_long flags;

        ahc = (struct ahc_softc *)dev_id;
        ahc_lock(ahc, &flags);
        ahc_intr(ahc);
        /*
         * It would be nice to run the device queues from a
         * bottom half handler, but as there is no way to
         * dynamically register one, we'll have to postpone
         * that until we get integrated into the kernel.
         */
        ahc_linux_run_device_queues(ahc);
        acmd = TAILQ_FIRST(&ahc->platform_data->completeq);
        TAILQ_INIT(&ahc->platform_data->completeq);
        ahc_unlock(ahc, &flags);
        if (acmd != NULL)
            ahc_linux_run_complete_queue(ahc, acmd);
    }

This is nice. I have not read the complete code around it (it is derived
from aic7xxx_linux.c), but if I understand the naming and comments
correctly, some workload is done inside the hardware interrupt (which it
shouldn't be), which would very much match my tests showing bad overall
performance behaviour. Obviously this code is old (read the comment) and
needs reworking. Comments?

Regards,
Stephan
* Re: Adaptec vs Symbios performance
From: arjan @ 2001-11-03 23:01 UTC (permalink / raw)
To: Stephan von Krawczynski; +Cc: linux-kernel

In article <200111032253.XAA20342@webserver.ithnet.com> you wrote:

> Hello Justin, hello Gerard
>
> I am currently looking for reasons for the bad behaviour of the aic7xxx
> driver in a shared-interrupt setup, and its generally unfriendly
> behaviour in a multi-tasking environment.
> Here is what I found in the code:

>  * It would be nice to run the device queues from a
>  * bottom half handler, but as there is no way to
>  * dynamically register one, we'll have to postpone
>  * that until we get integrated into the kernel.
>  */

Sounds like a good tasklet candidate......
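The split a tasklet would give can be illustrated outside the kernel. This is a hedged sketch of the pattern only, not the aic7xxx code: the names and the singly-linked completion list are invented for the example. The point is that the interrupt-side function does O(1) work under the (elided) lock, while the expensive per-command processing runs later with interrupts re-enabled, so other users of a shared IRQ line are not starved:

```c
#include <stddef.h>

struct cmd {
    struct cmd *next;
    int         done;
};

/* Completions queued by the "hardware interrupt" side. */
static struct cmd *completeq;
static int processed;

/* Top half: under the interrupt lock (elided here), only detach the
 * completion list -- constant time, regardless of how many commands
 * completed. */
struct cmd *isr_detach_completions(void)
{
    struct cmd *batch = completeq;
    completeq = NULL;
    return batch;
}

/* Bottom half (a tasklet, in 2.4 terms): walk the detached list with
 * interrupts enabled; the per-command completion work happens here. */
void run_complete_queue(struct cmd *batch)
{
    while (batch != NULL) {
        batch->done = 1;        /* stand-in for real completion work */
        processed++;
        batch = batch->next;
    }
}
```

In the real driver the ISR would schedule the tasklet with tasklet_schedule() after detaching the list; the sketch simply calls the second function directly.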
end of thread, other threads: [~2001-11-05 12:19 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --

[not found] <200111032318.fA3NIQY62745@aslan.scsiguy.com>
2001-11-04  3:50 ` Adaptec vs Symbios performance Stephan von Krawczynski
2001-11-04  5:47   ` Justin T. Gibbs
2001-11-04  5:23     ` Gérard Roudier
2001-11-04 14:17   ` Stephan von Krawczynski
2001-11-04 18:10     ` Justin T. Gibbs
2001-11-04 18:35       ` Stephan von Krawczynski
2001-11-04 16:31         ` Gérard Roudier
2001-11-04 19:13         ` Justin T. Gibbs
2001-11-04 19:56           ` Stephan von Krawczynski
2001-11-04 20:43             ` Justin T. Gibbs
2001-11-05 12:18               ` Matthias Andree
2001-11-04 19:02     ` Stephan von Krawczynski
2001-11-02 22:42 Google's mm problem - not reproduced on 2.4.13 Ben Smith
2001-11-03 22:53 ` Adaptec vs Symbios performance Stephan von Krawczynski
2001-11-03 23:01   ` arjan