* SMP spin-locks @ 2001-06-14 17:26 Richard B. Johnson 2001-06-14 17:32 ` David S. Miller ` (2 more replies) 0 siblings, 3 replies; 14+ messages in thread From: Richard B. Johnson @ 2001-06-14 17:26 UTC (permalink / raw) To: Linux kernel I __finally__ got back on "the list". They finally fixed the company firewall! During my absence, I had the chance to look at some SMP code because of a performance problem (a few microseconds out of spec on a 130 MHz embedded system), and I have a question about the current spin-locks. Spin-locks now transfer control to the .text.lock segment. This is a separate segment that can be at an offset that is far away from the currently executing code. That may cause the cache to be reloaded. Further, each spin-lock invocation generates separate code within that segment. Question 1: Why? Question 2: What is the purpose of the code sequence, "repz nop", generated by the spin-lock code? Is this a processor BUG work-around? `as` doesn't "like" this sequence, and Intel doesn't seem to document it. Cheers, Dick Johnson Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips). "Memory is like gasoline. You use it up when you are running. Of course you get it all back when you reboot..."; Actual explanation obtained from the Micro$oft help desk. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: SMP spin-locks 2001-06-14 17:26 SMP spin-locks Richard B. Johnson @ 2001-06-14 17:32 ` David S. Miller 2001-06-14 17:35 ` Kurt Garloff 2001-06-14 20:42 ` Roger Larsson 2 siblings, 0 replies; 14+ messages in thread From: David S. Miller @ 2001-06-14 17:32 UTC (permalink / raw) To: root; +Cc: Linux kernel Richard B. Johnson writes: > Spin-locks now transfer control to the .text.lock segment. > This is a separate segment that can be at an offset that > is far away from the currently executing code. That may > cause the cache to be reloaded. Further, each spin-lock > invocation generates separate code within that segment. > > Question 1: Why? Because this increases the code density of the common case, getting the lock immediately. > Question 2: What is the purpose of the code sequence, "repz nop" > generated by the spin-lock code? Is this a processor BUG work-around? > `as` doesn't "like" this sequence and, Intel doesn't seem to > document it. It is a hint to the processor that we are executing a spinlock loop (it does something wrt. keeping the cacheline of the lock in shared state when possible). I believe it is documented in the Pentium 4 manuals, previous chips ignore this sequence and treat it as a pure nop from what I understand. Later, David S. Miller davem@redhat.com ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: SMP spin-locks 2001-06-14 17:26 SMP spin-locks Richard B. Johnson 2001-06-14 17:32 ` David S. Miller @ 2001-06-14 17:35 ` Kurt Garloff 2001-06-15 6:51 ` Doug Ledford 2001-06-14 20:42 ` Roger Larsson 2 siblings, 1 reply; 14+ messages in thread From: Kurt Garloff @ 2001-06-14 17:35 UTC (permalink / raw) To: linux-kernel [-- Attachment #1: Type: text/plain, Size: 404 bytes --] On Thu, Jun 14, 2001 at 01:26:05PM -0400, Richard B. Johnson wrote: > Question 2: What is the purpose of the code sequence, "repz nop" Puts iP4 into low power mode. Regards, -- Kurt Garloff <garloff@suse.de> Eindhoven, NL GPG key: See mail header, key servers Linux kernel development SuSE GmbH, Nuernberg, FRG SCSI, Security [-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: SMP spin-locks 2001-06-14 17:35 ` Kurt Garloff @ 2001-06-15 6:51 ` Doug Ledford 0 siblings, 0 replies; 14+ messages in thread From: Doug Ledford @ 2001-06-15 6:51 UTC (permalink / raw) To: Kurt Garloff; +Cc: linux-kernel Kurt Garloff wrote: > > On Thu, Jun 14, 2001 at 01:26:05PM -0400, Richard B. Johnson wrote: > > Question 2: What is the purpose of the code sequence, "repz nop" > > Puts iP4 into low power mode. Umm, slightly more accurate would be to say that it makes the P4 processor wait before resuming the loop, to give the lock a chance to have been released. It turns the code from a constant busy loop into a check / wait a small amount of time / check again loop. This in turn keeps your processor from constantly checking the lock itself, which is supposed to have benefits in terms of inter-processor bus pressure. -- Doug Ledford <dledford@redhat.com> http://people.redhat.com/dledford Please check my web site for aic7xxx updates/answers before e-mailing me about problems ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: SMP spin-locks 2001-06-14 17:26 SMP spin-locks Richard B. Johnson 2001-06-14 17:32 ` David S. Miller 2001-06-14 17:35 ` Kurt Garloff @ 2001-06-14 20:42 ` Roger Larsson 2001-06-14 21:05 ` Richard B. Johnson 2 siblings, 1 reply; 14+ messages in thread From: Roger Larsson @ 2001-06-14 20:42 UTC (permalink / raw) To: Richard B. Johnson, Linux kernel Hi, Wait a minute... Spinlocks on an embedded system? Is it _really_ SMP? What kind of performance problem do you have? My guess, since you are mentioning spin locks, is that you are having a latency problem - an RT process does not execute/start quickly enough? If that is the case, you should look at Andrew Morton's low-latency patches. http://www.uow.edu.au/~andrewm/linux/schedlat.html /RogerL On Thursday 14 June 2001 19:26, Richard B. Johnson wrote: > I __finally__ got back on "the list". They finally fixed the > company firewall! > > During my absence, I had the chance to look at some SMP code > because of a performance problem (a few microseconds out of > spec on a 130 MHz embedded system) and I have a question about > the current spin-locks. > > Spin-locks now transfer control to the .text.lock segment. > This is a separate segment that can be at an offset that > is far away from the currently executing code. That may > cause the cache to be reloaded. Further, each spin-lock > invocation generates separate code within that segment. > > Question 1: Why? > > Question 2: What is the purpose of the code sequence, "repz nop" > generated by the spin-lock code? Is this a processor BUG work-around? > `as` doesn't "like" this sequence and, Intel doesn't seem to > document it. > > > Cheers, > Dick Johnson > > Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips). > > "Memory is like gasoline. You use it up when you are running. Of > course you get it all back when you reboot..."; Actual explanation > obtained from the Micro$oft help desk. 
> > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- Roger Larsson Skellefteå Sweden ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: SMP spin-locks 2001-06-14 20:42 ` Roger Larsson @ 2001-06-14 21:05 ` Richard B. Johnson 2001-06-14 21:30 ` Roger Larsson ` (2 more replies) 0 siblings, 3 replies; 14+ messages in thread From: Richard B. Johnson @ 2001-06-14 21:05 UTC (permalink / raw) To: Roger Larsson; +Cc: Linux kernel On Thu, 14 Jun 2001, Roger Larsson wrote: > Hi, > > Wait a minute... > > Spinlocks on a embedded system? Is it _really_ SMP? > The embedded system is not SMP. However, there is a definite advantage to using an unmodified kernel that may or may not have been compiled for SMP. Of course spin-locks are used to prevent interrupts from screwing up buffer pointers, etc. > What kind of performance problem do you have? The problem is that a data acquisition board across the PCI bus gives a data transfer rate of 10 to 11 megabytes per second with a UP kernel, and the transfer drops to 5-6 megabytes per second with an SMP kernel. The ISR is really simple and copies data, that's all. The 'read()' routine uses a spinlock when it modifies pointers. I started to look into where all the CPU clocks were going. The SMP spinlock code is where it's going. There is often contention for the lock because interrupts normally occur at 50 to 60 kHz. When there is contention, a very long........jump occurs into the .text.lock segment. I think this is flushing queues. Cheers, Dick Johnson Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips). "Memory is like gasoline. You use it up when you are running. Of course you get it all back when you reboot..."; Actual explanation obtained from the Micro$oft help desk. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: SMP spin-locks 2001-06-14 21:05 ` Richard B. Johnson @ 2001-06-14 21:30 ` Roger Larsson 2001-06-15 3:21 ` Richard B. Johnson 2001-06-15 12:10 ` Ingo Oeser 2001-06-15 15:52 ` Pavel Machek 2 siblings, 1 reply; 14+ messages in thread From: Roger Larsson @ 2001-06-14 21:30 UTC (permalink / raw) To: root; +Cc: Linux kernel On Thursday 14 June 2001 23:05, you wrote: > On Thu, 14 Jun 2001, Roger Larsson wrote: > > Hi, > > > > Wait a minute... > > > > Spinlocks on a embedded system? Is it _really_ SMP? > > The embedded system is not SMP. However, there is definite > advantage to using an unmodified kernel that may/may-not > have been compiled for SMP. Of course spin-locks are used > to prevent interrupts from screwing up buffer pointers, etc. > Not really - it prevents another processor from entering the same code segment (spin_lock_irqsave prevents both another processor and local interrupts). An interrupt on UP cannot wait on a spin lock - it would never be released, since no code other than the spinning interrupt would be able to execute. > > What kind of performance problem do you have? > > The problem is that a data acquisition board across the PCI bus > gives a data transfer rate of 10 to 11 megabytes per second > with a UP kernel, and the transfer drops to 5-6 megabytes per > second with a SMP kernel. The ISR is really simple and copies > data, that's all. > > The 'read()' routine uses a spinlock when it modifies pointers. > > I started to look into where all the CPU clocks were going. The > SMP spinlock code is where it's going. There is often contention > for the lock because interrupts normally occur at 50 to 60 kHz. > An SMP-compiled kernel, but running on UP hardware - right? Then this _should not_ happen! See linux/Documentation/spinlocks.txt. Is it your spinlocks that are causing this, or? > When there is contention, a very long........jump occurs into > the test.lock segment. I think this is flushing queues. 
It does not matter if there is contention - let it take time. Waiting is what spinlocking is about anyway... /RogerL -- Roger Larsson Skellefteå Sweden ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: SMP spin-locks 2001-06-14 21:30 ` Roger Larsson @ 2001-06-15 3:21 ` Richard B. Johnson 2001-06-15 2:33 ` David Lang 2001-06-15 10:35 ` David Schwartz 0 siblings, 2 replies; 14+ messages in thread From: Richard B. Johnson @ 2001-06-15 3:21 UTC (permalink / raw) To: Roger Larsson; +Cc: Linux kernel On Thu, 14 Jun 2001, Roger Larsson wrote: > On Thursday 14 June 2001 23:05, you wrote: > > On Thu, 14 Jun 2001, Roger Larsson wrote: > > > Hi, > > > > > > Wait a minute... > > > > > > Spinlocks on a embedded system? Is it _really_ SMP? > > > > The embedded system is not SMP. However, there is definite > > advantage to using an unmodified kernel that may/may-not > > have been compiled for SMP. Of course spin-locks are used > > to prevent interrupts from screwing up buffer pointers, etc. > > > > Not really - it prevents another processor entering the same code > segment (spin_lock_irqsave prevents both another processor and > local interrupts). > > An interrupt on UP can not wait on a spin lock - it will never be released > since no other code than the interrupt spinning will be able to execute) An interrupt on a UP system will never spin, nor will code on another CPU, because there isn't another CPU. A spin-lock, compiled for UP, is:

	pushf
	popl some_register	; currently EBX
	cli			; Clear the interrupts on the only CPU you have

	do_some_code_that_must_not_be_interrupted();

	pushl same_register_as_above
	popf			; Restore interrupts if they were enabled

For SMP it is:

	pushf
	popl some_register
	cli			; Clear interrupts
	modify_a_memory_variable
x:	see_if_it_is_what_you_expect
	if_not_loop_to x

	do_some_code_that_must_not_be_interrupted();

	modify_the_memory_variable_back
	pushl same_register_as_above
	popf

Since `cli` will only stop interrupts on the CPU that actually fetches the instruction, another CPU can enter the code unless it is forced to spin until the lock is released. 
If this code is executed on a UP machine, the memory variable will always become exactly as expected, so it will never spin. Therefore SMP code should be perfectly safe on a UP machine; in fact it must be perfectly safe, or it's broken. The current spinlock code does work perfectly on a UP machine. However, the large difference in performance shows that something is far from optimal in the coding. Spinlocks are machine dependent. A simple increment of a byte memory variable, spinning if it's not 1, will do fine. Decrementing this variable will release the lock. A `lock` prefix is not necessary because all Intel byte operations are atomic anyway. This assumes that the lock was initialized to 0. It doesn't have to be. It could be initialized to 0xaa (anything) and spin if it's not 0xab (or anything + 1). > > SMP compiled kernel, but running on UP hardware - right? > Then this _should not_ happen! > > see linux/Documentation/spinlocks.txt > This, in fact, will happen. Machines booted from the network should have SMP code so an SMP machine can use all its CPUs. This same code, booted from the network, should have no measurable performance penalty on UP machines. Also, when you develop drivers on a workstation, test them on a workstation, then upload everything to an embedded system, you had better be executing the same code, kernel, drivers, et al., or you are in a world of hurt. Many embedded systems don't have any 'standard I/O', so you can't prove that the code meets its specs (exception handling, etc.) on the target. You have to test that logic elsewhere. This workstation has two CPUs. All drivers are modules. It uses initrd to install the ones for my SCSI disks, network, etc. 
Script started on Thu Jun 14 23:13:10 2001
lsmod
Module                  Size  Used by
ramdisk                 4448   0
loop                    8212   0 (autoclean)
ipx                    19248   0 (unused)
3c59x                  25020   1 (autoclean)
nls_cp437               4408   4 (autoclean)
BusLogic               38320   6
sd_mod                 10932   6
scsi_mod               59460   2 [BusLogic sd_mod]
# exit
exit
Script done on Thu Jun 14 23:13:45 2001

The same kernel, uploaded to an embedded system, also uses initrd to load the machine-specific drivers. In this way, only the drivers that are actually used are loaded. The kernel remains small. There is a slight performance penalty for using modules, but no other.

# telnet platinum
Trying 10.106.100.166...
Connected to platinum.analogic.com.
Escape character is '^]'.

Enter "help" for commands

PLATINUM> sho modules
pcilynx                13468   1
raw1394                 7984   1
ieee1394               22984   0 [pcilynx raw1394]
rtc_drvr                2372   0
vxibus                 10660   6
gpib_drvr              19200   2
ramdisk                 4428   0
pcnet32se              15640   1
PLATINUM> exit
Exit
Connection closed by foreign host.
# exit
exit

Cheers, Dick Johnson Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips). "Memory is like gasoline. You use it up when you are running. Of course you get it all back when you reboot..."; Actual explanation obtained from the Micro$oft help desk. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: SMP spin-locks 2001-06-15 3:21 ` Richard B. Johnson @ 2001-06-15 2:33 ` David Lang 2001-06-15 10:35 ` David Schwartz 1 sibling, 0 replies; 14+ messages in thread From: David Lang @ 2001-06-15 2:33 UTC (permalink / raw) To: Richard B. Johnson; +Cc: Roger Larsson, Linux kernel I thought that when you compiled a kernel as UP it replaced the spin-lock macros with versions that are blank. As a result a UP kernel spends no time doing spinlocks at all. that's why a SMP kernel on a UP box is slightly slower, there is more code to be executed David Lang On Thu, 14 Jun 2001, Richard B. Johnson wrote: > Date: Thu, 14 Jun 2001 23:21:35 -0400 (EDT) > From: Richard B. Johnson <root@chaos.analogic.com> > To: Roger Larsson <roger.larsson@norran.net> > Cc: Linux kernel <linux-kernel@vger.kernel.org> > Subject: Re: SMP spin-locks > > On Thu, 14 Jun 2001, Roger Larsson wrote: > > > On Thursday 14 June 2001 23:05, you wrote: > > > On Thu, 14 Jun 2001, Roger Larsson wrote: > > > > Hi, > > > > > > > > Wait a minute... > > > > > > > > Spinlocks on a embedded system? Is it _really_ SMP? > > > > > > The embedded system is not SMP. However, there is definite > > > advantage to using an unmodified kernel that may/may-not > > > have been compiled for SMP. Of course spin-locks are used > > > to prevent interrupts from screwing up buffer pointers, etc. > > > > > > > Not really - it prevents another processor entering the same code > > segment (spin_lock_irqsave prevents both another processor and > > local interrupts). > > > > An interrupt on UP can not wait on a spin lock - it will never be released > > since no other code than the interrupt spinning will be able to execute) > > An interrupt on a UP system will never spin, nor will the IP from > another CPU because there isn't another CPU. 
A spin-lock, compiled > for UP is: > > pushf > popl some_register, currently EBX > cli ; Clear the interrupts on the only CPU you have > > do_some_code_that_must_not_be_interrupted(); > > pushl same_register_as_above > popf ; Restore interrupts if they were enabled > > > For SMP is: > > pushf > popl some_register > cli ; Clear interrupts > modify_a_memory_variable > x: see_if_it_is_what_you_expect > if_not_loop_to x > > do_some_code_that_must_not_be_interrupted(); > > modify_the_memory_variable_back > pushl same_register_as_above > popf > > > Since `cli` will only stop interrupts on the CPU that actually > fetches the instruction, another CPU can enter the code unless > it is forced to spin until the lock is released. > > If this code is executed on a UP machine, the memory variable > will always become exactly as expected so it will never spin. > Therefore SMP code should be perfectly safe on a UP machine, > in fact must be perfectly safe, or it's broken. > > The current spinlock code does work perfectly on a UP machine. > However, the large difference in performance shows that something > is quite less than optimum in the coding. > > Spinlocks are machine dependent. A simple increment of a byte > memory variable, spinning if it's not 1 will do fine. Decrementing > this variable will release the lock. A `lock` prefix is not necessary > because all Intel byte operations are atomic anyway. This assumes > that the lock was initialized to 0. It doesn't have to be. It > could be initialized to 0xaa (anything) and spin if it's not > 0xab (or anything + 1). > > > > > > SMP compiled kernel, but running on UP hardware - right? > > Then this _should not_ happen! > > > > see linux/Documentation/spinlocks.txt > > > > This, in fact, will happen. Machines booted from the network should > have SMP code so a SMP machine can use all its CPUs. This same > code, booted from the network, should have no measurable performance > penalty in UP machines. 
> > Also, when you develop drivers on a workstation, test them on > a workstation, then upload everything to an embedded system, you > had better be executing the same code, kernel, drivers, et all, > or you are in a world of hurt. Many embedded systems don't have > any 'standard I/O' so you can't prove that it meets its specs > (exception handling, etc) on the target. You have to test that > logic elsewhere. > > This workstation has two CPUs. All drivers are modules. It uses > initrd to install the ones for my SCSI disks, network, etc. > > Script started on Thu Jun 14 23:13:10 2001 > lsmod > Module Size Used by > ramdisk 4448 0 > loop 8212 0 (autoclean) > ipx 19248 0 (unused) > 3c59x 25020 1 (autoclean) > nls_cp437 4408 4 (autoclean) > BusLogic 38320 6 > sd_mod 10932 6 > scsi_mod 59460 2 [BusLogic sd_mod] > # exit > exit > > Script done on Thu Jun 14 23:13:45 2001 > > The same kernel, uploaded to an embedded system, also uses > initrd to load the machine-specific drivers. In this way, only > the drivers that are actually used, are loaded. The kernel remains > small. There is a slight performance penality for using modules, > but none other. > > # telnet platinum > Trying 10.106.100.166... > Connected to platinum.analogic.com. > Escape character is '^]'. > > Enter "help" for commands > > PLATINUM> sho modules > > pcilynx 13468 1 > raw1394 7984 1 > ieee1394 22984 0 [pcilynx raw1394] > rtc_drvr 2372 0 > vxibus 10660 6 > gpib_drvr 19200 2 > ramdisk 4428 0 > pcnet32se 15640 1 > > PLATINUM> exit > Exit > > Connection closed by foreign host. > # exit > exit > > > Cheers, > Dick Johnson > > Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips). > > "Memory is like gasoline. You use it up when you are running. Of > course you get it all back when you reboot..."; Actual explanation > obtained from the Micro$oft help desk. 
> > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: SMP spin-locks 2001-06-15 3:21 ` Richard B. Johnson 2001-06-15 2:33 ` David Lang @ 2001-06-15 10:35 ` David Schwartz 2001-06-15 13:26 ` Richard B. Johnson 1 sibling, 1 reply; 14+ messages in thread From: David Schwartz @ 2001-06-15 10:35 UTC (permalink / raw) To: root, Roger Larsson; +Cc: Linux kernel > Spinlocks are machine dependent. A simple increment of a byte > memory variable, spinning if it's not 1 will do fine. Decrementing > this variable will release the lock. A `lock` prefix is not necessary ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > because all Intel byte operations are atomic anyway. This assumes ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > that the lock was initialized to 0. It doesn't have to be. It > could be initialized to 0xaa (anything) and spin if it's not > 0xab (or anything + 1). If this is true, atomicity isn't enough to do it. Atomicity means that there's a single instruction (and so it can't be interrupted mid-modify). Atomicity (at least as the term is normally used) doesn't prevent the cache-coherency logic from ping-ponging the memory location between two processor's caches during the atomic operation. DS ^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: SMP spin-locks 2001-06-15 10:35 ` David Schwartz @ 2001-06-15 13:26 ` Richard B. Johnson 0 siblings, 0 replies; 14+ messages in thread From: Richard B. Johnson @ 2001-06-15 13:26 UTC (permalink / raw) To: David Schwartz; +Cc: Roger Larsson, Linux kernel On Fri, 15 Jun 2001, David Schwartz wrote: > > > Spinlocks are machine dependent. A simple increment of a byte > > memory variable, spinning if it's not 1 will do fine. Decrementing > > this variable will release the lock. A `lock` prefix is not necessary > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > because all Intel byte operations are atomic anyway. This assumes > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > that the lock was initialized to 0. It doesn't have to be. It > > could be initialized to 0xaa (anything) and spin if it's not > > 0xab (or anything + 1). > > If this is true, atomicity isn't enough to do it. Atomicity means that > there's a single instruction (and so it can't be interrupted mid-modify). > Atomicity (at least as the term is normally used) doesn't prevent the > cache-coherency logic from ping-ponging the memory location between two > processor's caches during the atomic operation. > > DS Try it. You'll like it. There are no simultaneous accesses from different CPUs to any address space of any kind on an Intel-based SMP machine. That is a fact. This is because there is only one group of decodes for this address space. This applies to both memory and I/O. Basically, the bus, even though it may be broken into several units of different speeds, operates as a unit. So, only one operation can be occurring at any instant. Now, suppose you have a DSP that accesses its own memory. It's on a different board than the main CPU. You provide a mechanism whereby your CPU can share a portion (or all) of this memory. For this, you "dual-port" the memory, or you access it via a PCI bus. Anyway, the DSP's memory now appears in your address space. 
When you access this memory at a time that the DSP could be writing to it, you need a `lock` prefix. Also, the hardware needs to handle the #LOCK signal properly or you may get some funny values from the DSP. As shown in the '486 manual, if you perform a read/modify/write operation you may need a lock prefix. Unlike CPUs that can only perform load and store operations upon memory, the ix86 can perform many operations directly. Amongst many of these wonderful instructions is the ability to increment or decrement a byte anywhere in memory. The CPU does not perform a read/modify/write operation in the general sense when it does this. Instead, the data is read, modified, and written in a single bus cycle. There is no way that another CPU can access the bus in between these operations. Memory access instructions that complete in a single bus cycle (this is not a single CPU clock) would never need a lock prefix. The lock prefix executes in only a single CPU clock. The idea is not to get rid of this. The idea is to get rid of the awful spin_lock_irqsave()/spin_unlock_irqrestore() code that has grown like some virus, and replace it with simple working code that does not use a separate segment for the spinning, etc. Also, the cache of all CPUs "knows" when a write within its cache-line has occurred, so the next CPU will correctly see the result of the previous operation. Cheers, Dick Johnson Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips). "Memory is like gasoline. You use it up when you are running. Of course you get it all back when you reboot..."; Actual explanation obtained from the Micro$oft help desk. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: SMP spin-locks 2001-06-14 21:05 ` Richard B. Johnson 2001-06-14 21:30 ` Roger Larsson @ 2001-06-15 12:10 ` Ingo Oeser 2001-06-15 12:49 ` Richard B. Johnson 2001-06-15 15:52 ` Pavel Machek 2 siblings, 1 reply; 14+ messages in thread From: Ingo Oeser @ 2001-06-15 12:10 UTC (permalink / raw) To: Richard B. Johnson; +Cc: Roger Larsson, Linux kernel On Thu, Jun 14, 2001 at 05:05:07PM -0400, Richard B. Johnson wrote: > The problem is that a data acquisition board across the PCI bus > gives a data transfer rate of 10 to 11 megabytes per second > with a UP kernel, and the transfer drops to 5-6 megabytes per > second with a SMP kernel. The ISR is really simple and copies > data, that's all. > > The 'read()' routine uses a spinlock when it modifies pointers. > > I started to look into where all the CPU clocks were going. The > SMP spinlock code is where it's going. There is often contention > for the lock because interrupts normally occur at 50 to 60 kHz. Then you need another (better?) queueing mechanism. Use multiple queues and an _overflowable_ sequence number as a global variable between the queues. N queues (N := no. of CPUs + 1), each with its own spin_lock. Optionally: one reader packet reassembly priority queue (APQ), ordered by sequence number (implicitly or explicitly), if this shouldn't be done in user space.

In the writer ISR:

Foreach queue in RR order (start with the remembered one):
- Try to lock it with spin_trylock (totally inline!)
  + Failed
    * if we failed to find a free queue for x "rounds", disable the device (we have no reader) and notify user space somehow
    * increment "rounds"
    * next queue
  + Succeeded
    * Increment sequence number
    * Put data record into queue
    (* remember this queue as last queue used)
    (* mark queue "not empty")
    * do other IRQ work...

In the reader routine:

Foreach queue in RR order (start with the remembered one):
- No-data counter above threshold -> EAGAIN [1]
- Try to lock it with spin_trylock (totally inline!)
  + Failed -> next queue
  + Succeeded
    * if queue empty, unlock and try next one
    (* remember this queue as last queue used)
    * Get one data record from queue (in queue order!)
    * Move data record into APQ
    * Unlock queue
    * Deliver as much data from the APQ as the user wants and is available
- if all queues are empty or locked -> increment "no data round" counter

Notes: The "last queue used" variable is static, but local to the routine. It is there to decrease the number of iterations and to distribute the data to all queues more equally. Statistics about lock contention per queue, per round and per try would be nice here, to estimate the number of queues needed. The APQ can get quite large if the sequences are badly distributed and some queues tend to be always locked when the reader wants to read from them. The above can be solved by 2^N "one entry queues" (aka slots) and sequence numbers mapping to these slots. If you need many slots (more than 256, I would say) then this is again unacceptable, because of the iteration cost in the ISR. What do you think? After some polishing this should decrease lock contention noticeably. Regards Ingo Oeser [1] Blocking will be harder to implement here, since we need to notify the reader routine that we have data available, which involves some latency you cannot afford. Maybe this could be done via schedule_task(), if needed. -- Use ReiserFS to get a faster fsck and Ext2 to fsck slowly and gently. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: SMP spin-locks 2001-06-15 12:10 ` Ingo Oeser @ 2001-06-15 12:49 ` Richard B. Johnson 0 siblings, 0 replies; 14+ messages in thread From: Richard B. Johnson @ 2001-06-15 12:49 UTC (permalink / raw) To: Ingo Oeser; +Cc: Roger Larsson, Linux kernel On Fri, 15 Jun 2001, Ingo Oeser wrote: > On Thu, Jun 14, 2001 at 05:05:07PM -0400, Richard B. Johnson wrote: > > The problem is that a data acquisition board across the PCI bus > > gives a data transfer rate of 10 to 11 megabytes per second > > with a UP kernel, and the transfer drops to 5-6 megabytes per > > second with a SMP kernel. The ISR is really simple and copies > > data, that's all. > > > > The 'read()' routine uses a spinlock when it modifies pointers. > > > > I started to look into where all the CPU clocks were going. The > > SMP spinlock code is where it's going. There is often contention > > for the lock because interrupts normally occur at 50 to 60 kHz. > > Then you need another (better?) queueing mechanism. > > Use multiple queues and a _overflowable_ sequence number as > global variable between the queues. > > N Queues (N := no. of CPUs + 1), which have a spin_lock for each > queue. > > optionally: One reader packet reassembly priority queue (APQ) ordered by > sequence number (implicitly or explicitly), if this shouldn't > be done in user space. > > In the writer ISR: > > Foreach Queue in RR order (start with remebered one): > - Try to lock it with spin_trylock (totally inline!) > + Failed > * if we failed to find a free queue for x "rounds", disable > device (we have no reader) and notify user space somehow > * increment "rounds" > * next queue > + Succeed > * Increment sequence number > * Put data record into queue > (* remember this queue as last queue used) > (* mark queue "not empty") > * do other IRQ work... > > In the reader routine: > Foreach Queue in RR order (start with remebered one): > - No data counter above threshold -> EAGAIN [1] > - Try to lock it with spin_trylock (totally inline!) 
> + Failed -> next queue > + Succeed > * if queue empty, unlock and try next one > (* remember this queue as last queue used) > * Get one data record from queue (in queue order!) > * Move data record into APQ > * Unlock queue > * Deliver as much data from the APQ, as the user wants and > is available > - if all queues empty or locked -> increment "no data round" > counter > > > Notes: > The "last queue used" variable is static, but local to routine. > It is there to decrease the number of iterations and distribute > the data to all queues as more equally. > > > Statistics about lock contention per queue, per round and per > try would be nice here to estimate the number of queues > needed. > > The APQ can be quite large, if the sequences are bad > distributed and some queues tend to be always locked, if the > reader wants to read from this queue. > > The above can be solved by 2^N "One entry queues" (aka slots) > and sequence numbers mapping to this slots. If you need many > slots (more then 256, I would say) then this is again > inaccaptable, because of the iteration cost in the ISR. > > What do you think? After some polishing this should decrease lock > contention noticibly. > > > Regards > > Ingo Oeser > > [1] Blocking will be harder to implement here, since we need to > notify the reader routine, that we have data available, which > involves some latency you cannot afford. Maybe this could be > done via schedule_task(), if needed. > -- > Use ReiserFS to get a faster fsck and Ext2 to fsck slowly and gently. > For further discussion I will take this off-list. However, you are correct. The very simple ISR that I have (I did preallocate buffers) leaves a great deal of room for improvement. However, the logic that you mention has execution overhead as well. Cheers, Dick Johnson Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips). "Memory is like gasoline. You use it up when you are running. 
Of course you get it all back when you reboot..."; Actual explanation obtained from the Micro$oft help desk. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: SMP spin-locks 2001-06-14 21:05 ` Richard B. Johnson 2001-06-14 21:30 ` Roger Larsson 2001-06-15 12:10 ` Ingo Oeser @ 2001-06-15 15:52 ` Pavel Machek 2 siblings, 0 replies; 14+ messages in thread From: Pavel Machek @ 2001-06-15 15:52 UTC (permalink / raw) To: Richard B. Johnson; +Cc: Roger Larsson, Linux kernel Hi! > The 'read()' routine uses a spinlock when it modifies pointers. > > I started to look into where all the CPU clocks were going. The > SMP spinlock code is where it's going. There is often contention > for the lock because interrupts normally occur at 50 to 60 kHz. > > When there is contention, a very long........jump occurs into > the test.lock segment. I think this is flushing queues. On UP, there's *never* contention on the lock, because irqsave lock disables interrupts. Right? Something else must be slowing you. Pavel PS: But that's bad. Performance should not come down twice -- this will bite you even on real SMP. -- Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt, details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html. ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2001-06-16 10:11 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2001-06-14 17:26 SMP spin-locks Richard B. Johnson 2001-06-14 17:32 ` David S. Miller 2001-06-14 17:35 ` Kurt Garloff 2001-06-15 6:51 ` Doug Ledford 2001-06-14 20:42 ` Roger Larsson 2001-06-14 21:05 ` Richard B. Johnson 2001-06-14 21:30 ` Roger Larsson 2001-06-15 3:21 ` Richard B. Johnson 2001-06-15 2:33 ` David Lang 2001-06-15 10:35 ` David Schwartz 2001-06-15 13:26 ` Richard B. Johnson 2001-06-15 12:10 ` Ingo Oeser 2001-06-15 12:49 ` Richard B. Johnson 2001-06-15 15:52 ` Pavel Machek
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).