On Wed, 27/11/2019 at 21:08 +0800, Ming Lei wrote:
> On Wed, Nov 27, 2019 at 10:39:40AM +0100, Andrea Vai wrote:
> > On Wed, 27/11/2019 at 10:05 +0800, Ming Lei wrote:
> > > 
> > > 
> > > It can be worked around via the following change:
> > > 
> > > /lib/modules/5.4.0+/build/include/generated/autoconf.h:
> > > 
> > > //#define CONFIG_CC_HAS_ASM_INLINE 1
> > 
> > Thanks, it worked, trace attached. Produced by: start the trace script
> > (with the pendrive already plugged), wait some seconds, run the test
> > (1 trial, 1 GB), wait for the test to finish, stop the trace.
> > 
> > The copy took 2659 seconds, roughly as already seen before.
> 
> Thanks for collecting the log.
> 
> From the log, some of the write IOs are out of order; the first such one
> is 378880.
> 
> 16.41240 2   266     266     kworker/2:1H    block_rq_issue   b'W' 370656 240
> 16.41961 3   485     485     kworker/3:1H    block_rq_issue   b'W' 378880 240
> 16.73729 2   266     266     kworker/2:1H    block_rq_issue   b'W' 370896 240
> 17.71161 2   266     266     kworker/2:1H    block_rq_issue   b'W' 379120 240
> 18.02344 2   266     266     kworker/2:1H    block_rq_issue   b'W' 371136 240
> 18.94314 3   485     485     kworker/3:1H    block_rq_issue   b'W' 379360 240
> 19.25624 2   266     266     kworker/2:1H    block_rq_issue   b'W' 371376 240
> 
> IO latency increases a lot from the 1st out-of-order request onward (the
> usb storage HBA has a queue depth of one: a request can be issued only
> after the previously issued request has completed).
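
For anyone who wants to check the whole attached trace rather than this excerpt, here is a minimal sketch of how the non-contiguous writes could be listed. It assumes each event sits on a single line (wrapped lines re-joined first) and that the last two columns are the start sector and the length in sectors, which is only read off the excerpt above (370656 + 240 = 370896). The names LINE_RE, SAMPLE and find_non_contiguous are made up for this illustration and are not part of the trace script.

```python
import re

# Assumed field layout, read off the excerpt: time, cpu, pid, tid, comm,
# event name, rwbs flags, start sector, length in sectors.
LINE_RE = re.compile(
    r"(?P<time>\d+\.\d+)\s+\d+\s+\d+\s+\d+\s+(?P<comm>\S+)\s+"
    r"block_rq_issue\s+b'(?P<rwbs>\w+)'\s+(?P<sector>\d+)\s+(?P<nr>\d+)\s*$"
)

# First three events of the excerpt, one event per line.
SAMPLE = """\
16.41240 2   266     266     kworker/2:1H    block_rq_issue   b'W' 370656 240
16.41961 3   485     485     kworker/3:1H    block_rq_issue   b'W' 378880 240
16.73729 2   266     266     kworker/2:1H    block_rq_issue   b'W' 370896 240
"""


def find_non_contiguous(lines):
    """Return (time, comm, sector) for each write that does not start
    where the previously issued write ended."""
    flagged = []
    expected = None  # sector at which the next sequential write would start
    for line in lines:
        m = LINE_RE.search(line)
        if not m or "W" not in m.group("rwbs"):
            continue  # skip non-matching lines and non-write requests
        sector = int(m.group("sector"))
        if expected is not None and sector != expected:
            flagged.append((float(m.group("time")), m.group("comm"), sector))
        expected = sector + int(m.group("nr"))
    return flagged


if __name__ == "__main__":
    for time, comm, sector in find_non_contiguous(SAMPLE.splitlines()):
        print(f"{time:.5f} {comm} starts at {sector}")
```

This flags forward jumps as well as backward ones, so every request of the two interleaved streams after the first is reported; for the real log, pass the lines of the trace file instead of SAMPLE.
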
> 
> The reason is that there are two kinds of tasks which insert rq to the
> device. One is the 'cp' process, the other is kworker/u8:*.  The
> out-of-order requests happen while the two tasks interleave.
> 
> In such a situation, I believe that the old legacy IO path may not
> guarantee the order either. In blk_queue_bio(), after get_request()
> allocates one request, the queue lock is released, and the request is
> actually inserted & issued from blk_flush_plug_list() under the
> branch of 'if (plug)'. If requests come from two tasks, then they are
> inserted/issued from two plug lists, and no order can be guaranteed.
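
To make the two-plug-list argument concrete, here is a toy model, not kernel code: each task collects its own sequential requests in a private list and flushes them in batches, and the flushes of the two tasks alternate. The strict alternation, the batch size of one and the helper names plug_batches and interleave_flushes are assumptions made only for this illustration; the start sectors and the 240-sector length are taken from the trace excerpt.

```python
from itertools import zip_longest


def plug_batches(start_sector, nr_requests, nr_sectors, batch):
    """Split one task's sequential requests into per-flush batches."""
    sectors = [start_sector + i * nr_sectors for i in range(nr_requests)]
    return [sectors[i:i + batch] for i in range(0, len(sectors), batch)]


def interleave_flushes(task_a, task_b):
    """Alternate whole-batch flushes from two tasks into one issue order.

    Real scheduling is not this regular; alternation is the simplest
    interleaving that shows the effect.
    """
    order = []
    for a, b in zip_longest(task_a, task_b, fillvalue=[]):
        order.extend(a)
        order.extend(b)
    return order


if __name__ == "__main__":
    task_a = plug_batches(370656, 3, 240, batch=1)
    task_b = plug_batches(378880, 3, 240, batch=1)
    print(interleave_flushes(task_a, task_b))
```

Each task's own requests stay in ascending order, but the combined order jumps back and forth between the two regions, which is the pattern visible in the trace.
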
> 
> In my test, except for several requests at the beginning, all other
> requests are inserted via the kworker thread (I guess it is the
> writeback wq), which is why I can't observe the issue in my test.
> 
> As Schmid suggested, you may run the same test on an old kernel with
> the legacy IO path, and see if the performance is still good.
> 
> Also, could you share the following info about your machine? So that
> I can build my VM guest in this setting for reproducing your situation
> (requests are inserted from two types of threads).
> 
> - lscpu
Attached.

> - free -h
              total        used        free      shared  buff/cache   available
Mem:           23Gi       4,2Gi        11Gi       448Mi       7,0Gi        18Gi
Swap:         3,7Gi          0B       3,7Gi

> - lsblk -d $USB_DISK

NAME MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sdg    8:96   1 28,8G  0 disk 


> - exact commands for mounting the disk and running the copy operation

I already attached the whole script to this thread; I am attaching it
again to this message and copying the relevant lines here:

  mount UUID=$uuid /mnt/pendrive 2>&1 |tee -a $logfile
  SECONDS=0
  cp $testfile /mnt/pendrive 2>&1 |tee -a $logfile
  umount /mnt/pendrive 2>&1 |tee -a $logfile

Meanwhile, I am going ahead with the further tests as suggested.

Thanks,
Andrea