Il giorno mer, 27/11/2019 alle 21.08 +0800, Ming Lei ha scritto: > On Wed, Nov 27, 2019 at 10:39:40AM +0100, Andrea Vai wrote: > > Il giorno mer, 27/11/2019 alle 10.05 +0800, Ming Lei ha scritto: > > > > > > > > > It can be workaround via the following change: > > > > > > /lib/modules/5.4.0+/build/include/generated/autoconf.h: > > > > > > //#define CONFIG_CC_HAS_ASM_INLINE 1 > > > > Thanks, it worked, trace attached. Produced by: start the trace > script > > (with the pendrive already plugged), wait some seconds, run the > test > > (1 trial, 1 GB), wait for the test to finish, stop the trace. > > > > The copy took 2659 seconds, roughly as already seen before. > > Thanks for collecting the log. > > From the log, some of write IOs are out-of-order, such as, the 1st > one > is 378880. > > 16.41240 2 266 266 kworker/2:1H block_rq_issue b'W' > 370656 240 > 16.41961 3 485 485 kworker/3:1H block_rq_issue b'W' > 378880 240 > 16.73729 2 266 266 kworker/2:1H block_rq_issue b'W' > 370896 240 > 17.71161 2 266 266 kworker/2:1H block_rq_issue b'W' > 379120 240 > 18.02344 2 266 266 kworker/2:1H block_rq_issue b'W' > 371136 240 > 18.94314 3 485 485 kworker/3:1H block_rq_issue b'W' > 379360 240 > 19.25624 2 266 266 kworker/2:1H block_rq_issue b'W' > 371376 240 > > IO latency is increased a lot since the 1st out-of-order request(usb > storage HBA is single queue depth, one request can be issued only > if > the previous issued request is completed). > > The reason is that there are two kind of tasks which inserts rq to > device. > One is the 'cp' process, the other is kworker/u8:*. The out-of- > order > happens during the two task's interleaving. > > Under such situation, I believe that the old legacy IO path may not > guarantee the order too. In blk_queue_bio(), after get_request() > allocates one request, the queue lock is released. And request is > actually inserted & issued from blk_flush_plug_list() under the > branch of 'if (plug)'. If requests are from two tasks, then request > is inserted/issued from two plug list, and no order can be > guaranteed. > > In my test, except for several requests from the beginning, all > other > requests are inserted via the kworker thread(guess it is writeback > wq), > that is why I can't observe the issue in my test. > > As Schmid suggested, you may run the same test on old kernel with > legacy io path, and see if the performance is still good. > > Also, could you share the following info about your machine? So that > I can build my VM guest in this setting for reproducing your > situation > (requests are inserted from two types of threads). > > - lscpu attached, > - free -h total used free shared buff/cache available Mem: 23Gi 4,2Gi 11Gi 448Mi 7,0Gi 18Gi Swap: 3,7Gi 0B 3,7Gi > - lsblk -d $USB_DISK NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT sdg 8:96 1 28,8G 0 disk > - exact commands for mount the disk, and running the copy operation I attached the whole script to this thread, I attach it again to this message and copy the relevant lines here: mount UUID=$uuid /mnt/pendrive 2>&1 |tee -a $logfile SECONDS=0 cp $testfile /mnt/pendrive 2>&1 |tee -a $logfile umount /mnt/pendrive 2>&1 |tee -a $logfile Meanwhile, I am going on with the further tests as suggested Thanks, Andrea