* Why is scsi_request_fn called every 4 milliseconds?
From: BingJiun Luo @ 2011-01-27 14:04 UTC
To: linux-ide, linux-scsi

Hello,

I want to measure SATA AHCI host controller read performance. I open
/dev/sda and use the user space function
read(int fildes, void *buf, size_t nbyte) to read 2048 times,
64 KBytes each time, 128 MBytes in total.

I measured the time starting from one step before the write to the CI
register inside ahci_qc_issue() until ahci_port_intr() is called in
interrupt context. It takes about 1 millisecond to complete one
256 KByte READ DMA EXT command, and about 15 microseconds are spent on
the call to scsi_done().

However, why is scsi_request_fn called only about 4 milliseconds later
to pass the next I/O request to the hardware? It takes less time if the
READ DMA command has fewer sectors.

My questions are:

1. Is it the time needed by the upper layers (block layer or virtual
   file system layer) to prepare one 256 KB READ DMA EXT command? Or is
   it the time to copy data from kernel space memory to user space
   memory after the data is read back from the hard drive, which delays
   passing the next command to SCSI? I know some architectures do not
   have good enough performance for memcpy or similar operations.

2. If I do not mount /dev/sda on any file system, what is the first
   kernel function called after read() from user space? Is it located
   in the VFS, or does it go directly to the block layer? I ask because
   I want to keep track of the time spent in the layers above SCSI.

3. When scsi_done() is called, which function processes the completed
   command and passes the data to user space? I think there must be
   somewhere in the code that copies this data from a kernel space
   memory address to a user space memory address.

Thank you very much in advance.

Regards,
BingJiun
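A minimal sketch of the user space side of the measurement described above, assuming a plain loop of fixed-size reads timed with a monotonic clock. The function name `time_reads` and its parameters are hypothetical and are not taken from the original test program; the defaults mirror the 2048 reads of 64 KBytes mentioned in the message.

```python
import os
import time


def time_reads(path, chunk_size=64 * 1024, count=2048):
    """Read up to `count` chunks of `chunk_size` bytes from `path`.

    Returns (total_bytes, latencies), where latencies holds the wall-clock
    duration in seconds of each read() that returned data.
    """
    latencies = []
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        for _ in range(count):
            start = time.perf_counter()
            data = os.read(fd, chunk_size)
            elapsed = time.perf_counter() - start
            if not data:  # end of file or device reached
                break
            latencies.append(elapsed)
            total += len(data)
    finally:
        os.close(fd)
    return total, latencies
```

Calling `time_reads("/dev/sda")` would normally need root privileges. The per-read latencies only show what user space sees, which includes page cache hits, so they are not directly comparable with timestamps taken inside the driver.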
* Re: Why is scsi_request_fn called every 4 milliseconds?
From: James Bottomley @ 2011-01-27 14:43 UTC
To: BingJiun Luo; +Cc: linux-ide, linux-scsi

On Thu, 2011-01-27 at 22:04 +0800, BingJiun Luo wrote:
> I want to measure SATA AHCI host controller read performance. I open
> /dev/sda and use the user space function
> read(int fildes, void *buf, size_t nbyte) to read 2048 times,
> 64 KBytes each time, 128 MBytes in total.
>
> I measured the time starting from one step before the write to the CI
> register inside ahci_qc_issue() until ahci_port_intr() is called in
> interrupt context. It takes about 1 millisecond to complete one
> 256 KByte READ DMA EXT command, and about 15 microseconds are spent on
> the call to scsi_done().
>
> However, why is scsi_request_fn called only about 4 milliseconds later
> to pass the next I/O request to the hardware? It takes less time if the
> READ DMA command has fewer sectors.

I'm not sure I parse the question, but I think you're asking why we
chain the next issue from the softirq in SCSI? That's because most SCSI
devices are tagged and the bus is the bottleneck, so after processing
the completion, we need to get the next command out ASAP to keep the bus
utilised to capacity.

> My questions are:
> 1. Is it the time needed by the upper layers (block layer or virtual
> file system layer) to prepare one 256 KB READ DMA EXT command? Or is it
> the time to copy data from kernel space memory to user space memory
> after the data is read back from the hard drive, which delays passing
> the next command to SCSI?

Everything in SCSI is done with zero copy (as in we DMA straight to the
pagecache page, which is then attached to userspace).

> I know some architectures do not have good enough performance for
> memcpy or similar operations.
>
> 2. If I do not mount /dev/sda on any file system, what is the first
> kernel function called after read() from user space? Is it located in
> the VFS, or does it go directly to the block layer?

I think you need to trace this for yourself ... it's complex because
read doesn't go to the device, it goes via the page cache, which is also
how the VFS operates. If the pages are all current in the cache, a
read() doesn't have to trouble the disk.

> I ask because I want to keep track of the time spent in the layers
> above SCSI.
>
> 3. When scsi_done() is called, which function processes the completed
> command and passes the data to user space? I think there must be
> somewhere in the code that copies this data from a kernel space memory
> address to a user space memory address.

scsi_done doesn't do anything about completion, it triggers the block
softirq to schedule a completion for us when all interrupts are
processed.

James
* Re: Why is scsi_request_fn called every 4 milliseconds?
From: Douglas Gilbert @ 2011-01-27 17:43 UTC
To: James Bottomley; +Cc: BingJiun Luo, linux-ide, linux-scsi

On 11-01-27 09:43 AM, James Bottomley wrote:
> On Thu, 2011-01-27 at 22:04 +0800, BingJiun Luo wrote:
>> My questions are:
>> 1. Is it the time needed by the upper layers (block layer or virtual
>> file system layer) to prepare one 256 KB READ DMA EXT command? Or is it
>> the time to copy data from kernel space memory to user space memory
>> after the data is read back from the hard drive, which delays passing
>> the next command to SCSI?
>
> Everything in SCSI is done with zero copy (as in we DMA straight to the
> pagecache page, which is then attached to userspace).

Just to add some numbers to that point, on this CPU:
    Intel(R) Core(TM) i5 CPU M 540 @ 2.53GHz
[a Lenovo X201 laptop] with a dummy logical unit
(pseudo disk) set up with this invocation:
    $ modprobe scsi_debug delay=0 virtual_gb=2468
with lk 2.6.37 I measure the following.

    $ ddpt if=/dev/bsg/7:0:0:0 bs=512 count=1m bpt=1
    Output file not specified so no copy, just reading input
    1048576+0 records in
    0+0 records out
    time to read data: 4.815756 secs at 111.48 MB/sec

That is issuing over 1 million SCSI READ commands from a
user space program (and reading the data returned) in less
than 5 seconds. So the SCSI READ command overhead is better
(i.e. less) than 5 microseconds per command.

Increase the "blocks per transfer" (bpt) to 512 to see
the data throughput (plus fetch 10m blocks) and this
is the result:

    $ ddpt if=/dev/bsg/7:0:0:0 bs=512 count=10m bpt=512
    Output file not specified so no copy, just reading input
    10485760+0 records in
    0+0 records out
    time to read data: 1.896136 secs at 2831.39 MB/sec

The latter figure is around 800 MB/sec using the Ubuntu
10.10 stock kernel (lk 2.6.35-24-generic) on the same
machine. Something increased data throughput considerably
between lk 2.6.35 and 2.6.37. OTOH it may be a
difference in my .config settings.

So the latency per command added by the kernel and the
SCSI subsystem (apart from the low level driver and the
transport) is measured in microseconds rather than
milliseconds.

Doug Gilbert

PS Another throughput datapoint, using the block
subsystem (rather than a pass-through):

    $ ddpt if=/dev/sdb bs=512 count=10m bpt=512
    Output file not specified so no copy, just reading input
    10485760+0 records in
    0+0 records out
    time to read data: 4.807517 secs at 1116.73 MB/sec
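The arithmetic behind the ddpt figures in the message above can be reproduced with a short sketch. The helper names are hypothetical; the block counts, block size and elapsed times are the ones reported by ddpt, and "MB" is taken to mean 10^6 bytes, which is the interpretation that matches the reported rates.

```python
def per_command_us(commands, seconds):
    """Average wall-clock time per command, in microseconds."""
    return seconds / commands * 1e6


def throughput_mb_s(blocks, block_size, seconds):
    """Throughput in MB/s, with 1 MB taken as 10**6 bytes."""
    return blocks * block_size / seconds / 1e6


# bpt=1: one 512-byte block per SCSI READ, so commands == blocks.
small_cmd_us = per_command_us(1048576, 4.815756)

# bpt=512: 512 blocks (256 KiB) per SCSI READ.
large_cmds = 10485760 // 512
large_cmd_us = per_command_us(large_cmds, 1.896136)

print(f"bpt=1:   {small_cmd_us:.2f} us per command")
print(f"bpt=512: {large_cmd_us:.2f} us per command over {large_cmds} commands")
print(f"bpt=512: {throughput_mb_s(10485760, 512, 1.896136):.2f} MB/s")
```

With one block per command the average comes out a little under 5 microseconds, consistent with the bound stated in the message. These averages include the time to move the data on a pseudo disk with no real transport, so they are an upper bound on the software overhead for that machine rather than a figure for other hardware.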
* Re: Why is scsi_request_fn called every 4 milliseconds?
From: BingJiun Luo @ 2011-01-28 2:22 UTC
To: dgilbert; +Cc: James Bottomley, linux-ide, linux-scsi

On Fri, Jan 28, 2011 at 1:43 AM, Douglas Gilbert <dgilbert@interlog.com> wrote:
> Just to add some numbers to that point, on this CPU:
>     Intel(R) Core(TM) i5 CPU M 540 @ 2.53GHz
> [a Lenovo X201 laptop] with a dummy logical unit
> (pseudo disk) set up with this invocation:
>     $ modprobe scsi_debug delay=0 virtual_gb=2468
> with lk 2.6.37 I measure the following.
>
>     $ ddpt if=/dev/bsg/7:0:0:0 bs=512 count=1m bpt=1
>     Output file not specified so no copy, just reading input
>     1048576+0 records in
>     0+0 records out
>     time to read data: 4.815756 secs at 111.48 MB/sec
>
> That is issuing over 1 million SCSI READ commands from a
> user space program (and reading the data returned) in less
> than 5 seconds. So the SCSI READ command overhead is better
> (i.e. less) than 5 microseconds per command.

Doesn't it depend on how many sectors are read per command?
If 512 sectors are read at a time, it takes about 900 microseconds.

> Increase the "blocks per transfer" (bpt) to 512 to see
> the data throughput (plus fetch 10m blocks) and this
> is the result:
>
>     $ ddpt if=/dev/bsg/7:0:0:0 bs=512 count=10m bpt=512
>     Output file not specified so no copy, just reading input
>     10485760+0 records in
>     0+0 records out
>     time to read data: 1.896136 secs at 2831.39 MB/sec
>
> The latter figure is around 800 MB/sec using the Ubuntu
> 10.10 stock kernel (lk 2.6.35-24-generic) on the same
> machine. Something increased data throughput considerably
> between lk 2.6.35 and 2.6.37. OTOH it may be a
> difference in my .config settings.
>
> So the latency per command added by the kernel and the
> SCSI subsystem (apart from the low level driver and the
> transport) is measured in microseconds rather than
> milliseconds.

I am not running on a PC but on an embedded system with a 512 MHz CPU
and a 133 MHz AHB bus. I think that is the difference.
I can only read about 112 MBytes in 3 seconds, measured using hdparm.
The kernel version is 2.6.28.
* Re: Why is scsi_request_fn called every 4 milliseconds?
From: BingJiun Luo @ 2011-01-28 2:42 UTC
To: James Bottomley; +Cc: linux-ide, linux-scsi

On Thu, Jan 27, 2011 at 10:43 PM, James Bottomley
<James.Bottomley@suse.de> wrote:
> On Thu, 2011-01-27 at 22:04 +0800, BingJiun Luo wrote:
>> However, why is scsi_request_fn called only about 4 milliseconds later
>> to pass the next I/O request to the hardware? It takes less time if the
>> READ DMA command has fewer sectors.
>
> I'm not sure I parse the question, but I think you're asking why we
> chain the next issue from the softirq in SCSI? That's because most SCSI
> devices are tagged and the bus is the bottleneck, so after processing
> the completion, we need to get the next command out ASAP to keep the bus
> utilised to capacity.

I observed that each time scsi_request_fn is called, scsi_dispatch_cmd
is called only once and then returns. It means that only one I/O request
is available to be processed by the host controller.

After about 4 milliseconds have passed, scsi_request_fn is called again.
Why does it take so long, when the previous command already completed in
only about 1 millisecond, including the call to scsi_done()? The host
controller is idle for about 3 milliseconds, with nothing to do.

>> My questions are:
>> 1. Is it the time needed by the upper layers (block layer or virtual
>> file system layer) to prepare one 256 KB READ DMA EXT command? Or is it
>> the time to copy data from kernel space memory to user space memory
>> after the data is read back from the hard drive, which delays passing
>> the next command to SCSI?
>
> Everything in SCSI is done with zero copy (as in we DMA straight to the
> pagecache page, which is then attached to userspace).

Yes, I know it is zero copy at SCSI, but I am not sure about the upper
layers (VFS or anything else).

It is unlikely to be zero copy between kernel space and user space
memory buffers, right? Because whether the data is read back from disk
or is already available in the page cache, it is located in kernel space
memory, and this data has to be copied to a user space address. All of
this work is not done in the SCSI layer but somewhere higher than SCSI;
I just don't know where.

>> 2. If I do not mount /dev/sda on any file system, what is the first
>> kernel function called after read() from user space? Is it located in
>> the VFS, or does it go directly to the block layer?
>
> I think you need to trace this for yourself ... it's complex because
> read doesn't go to the device, it goes via the page cache, which is also
> how the VFS operates. If the pages are all current in the cache, a
> read() doesn't have to trouble the disk.

I am pretty sure almost all READ DMA commands go to the disk, because
I captured them with a Catalyst analyzer. So, if all requests must go to
disk, does it mean the data is not available in the page cache?
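To put the reported gap in perspective, here is a rough calculation using only the numbers given in this thread: one 256 KByte command serviced in about 1 millisecond, with a new command issued about every 4 milliseconds. The helper names are hypothetical, 256 KBytes is assumed to mean 262144 bytes, and the model assumes strictly one command in flight at a time, as observed above.

```python
def utilisation(service_ms, period_ms):
    """Fraction of each issue period in which the controller is busy."""
    return service_ms / period_ms


def effective_mb_s(bytes_per_command, period_ms):
    """Throughput in MB/s (10**6 bytes) when one command is issued per period."""
    return bytes_per_command / (period_ms / 1000.0) / 1e6


COMMAND_BYTES = 256 * 1024  # assumed size of one READ DMA EXT command

busy = utilisation(1.0, 4.0)
observed = effective_mb_s(COMMAND_BYTES, 4.0)      # one command every 4 ms
back_to_back = effective_mb_s(COMMAND_BYTES, 1.0)  # no idle time at all

print(f"controller busy {busy:.0%} of the time")
print(f"about {observed:.1f} MB/s observed vs {back_to_back:.1f} MB/s back to back")
```

Under these assumptions the controller is busy for only a quarter of each period, so the time spent above the driver, rather than the command service time, would dominate the achievable throughput.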
* Re: Why is scsi_request_fn called every 4 milliseconds?
From: James Bottomley @ 2011-01-28 7:37 UTC
To: BingJiun Luo; +Cc: linux-ide, linux-scsi

On Fri, 2011-01-28 at 10:42 +0800, BingJiun Luo wrote:
> I observed that each time scsi_request_fn is called, scsi_dispatch_cmd
> is called only once and then returns. It means that only one I/O request
> is available to be processed by the host controller.

Either you're untagged, or you don't have enough I/O then.

> After about 4 milliseconds have passed, scsi_request_fn is called again.
> Why does it take so long, when the previous command already completed in
> only about 1 millisecond, including the call to scsi_done()? The host
> controller is idle for about 3 milliseconds, with nothing to do.

No idea ... it's either something to do with the setup on the
architecture or it's simply that the I/O load isn't generating multiple
commands. On an x86 it's microseconds to reissue from block softirq.

> > Everything in SCSI is done with zero copy (as in we DMA straight to the
> > pagecache page, which is then attached to userspace).
>
> Yes, I know it is zero copy at SCSI, but I am not sure about the upper
> layers (VFS or anything else).
>
> It is unlikely to be zero copy between kernel space and user space
> memory buffers, right? Because whether the data is read back from disk
> or is already available in the page cache, it is located in kernel space
> memory,

It depends. Glibc can play clever tricks where it services read() via
mmapped buffers. That's zero copy.

> and this data has to be copied to a user space address. All of this
> work is not done in the SCSI layer but somewhere higher than SCSI; I
> just don't know where.

No ... the page can simply be placed into an empty userspace mapping ...
that's what we mostly try to do.

> > I think you need to trace this for yourself ... it's complex because
> > read doesn't go to the device, it goes via the page cache, which is also
> > how the VFS operates. If the pages are all current in the cache, a
> > read() doesn't have to trouble the disk.
>
> I am pretty sure almost all READ DMA commands go to the disk, because
> I captured them with a Catalyst analyzer. So, if all requests must go to
> disk, does it mean the data is not available in the page cache?

Yes ... page cache is checked first before fetching from storage.

James
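The difference between filling a caller-supplied buffer and mapping pages into the process can be illustrated from user space. This is only a sketch under stated assumptions: it works on a regular, non-empty file, the function names are hypothetical, and it says nothing about which path a particular kernel or C library takes internally for read().

```python
import mmap
import os


def read_into_buffer(path, length):
    """read(): the kernel fills a buffer owned by the calling process."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.read(fd, length)
    finally:
        os.close(fd)


def first_byte_of_each_page(path):
    """mmap(): touch one byte per page through a read-only file mapping."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # Indexing the mapping accesses the mapped pages directly; each
            # first touch may fault the page in from the page cache or disk.
            return [mapped[offset]
                    for offset in range(0, len(mapped), mmap.PAGESIZE)]
```

With the mapping, a page that is not yet resident is brought in on first access, so the cost moves from the read() call to the page fault. That matters when deciding where to place timestamps.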