Linux-XFS Archive on lore.kernel.org
 help / color / Atom feed
From: Ming Lei <ming.lei@redhat.com>
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-xfs@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Jeff Moyer <jmoyer@redhat.com>,
	Dave Chinner <dchinner@redhat.com>,
	Eric Sandeen <sandeen@redhat.com>, Christoph Hellwig <hch@lst.de>,
	Jens Axboe <axboe@kernel.dk>, Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>, Tejun Heo <tj@kernel.org>
Subject: single aio thread is migrated crazily by scheduler
Date: Thu, 14 Nov 2019 19:31:53 +0800
Message-ID: <20191114113153.GB4213@ming.t460p> (raw)

Hi Guys,

It is found that single AIO thread is migrated crazely by scheduler, and
the migrate period can be < 10ms. Follows the test a):

	- run single job fio[1] for 30 seconds:
	./xfs_complete 512
	
	- observe fio io thread migration via bcc trace[2], and the migration
	times can reach 5k ~ 10K in above test. In this test, CPU utilization
	is 30~40% on the CPU running fio IO thread.
	
	- after applying the debug patch[3] to queue XFS completion work on
	other CPU(not current CPU), the above crazy fio IO thread migration
	can't be observed.

And the similar result can be observed in the following test b) too:

	- set sched parameters:
		sysctl kernel.sched_min_granularity_ns=10000000
		sysctl kernel.sched_wakeup_granularity_ns=15000000
	
		which is usually done by 'tuned-adm profile network-throughput'
	
	- run single job fio aio[1] for 30 seconds:
	  ./xfs_complete 4k 
	
	- observe fio io thread migration[2], and similar crazy migration
	can be observed too. In this test, CPU utilization is close to 100%
	on the CPU for running fio IO thread
	
	- the debug patch[3] still makes a big difference on this test wrt.
	fio IO thread migration.

For test b), I thought that load balance may be triggered when
single fio IO thread takes the CPU by ~100%, meantime XFS's queue_work()
schedules WQ worker thread on the current CPU, since all other CPUs
are idle. When the fio IO thread is migrated to new CPU, the same steps
can be repeated again.

But for test a), I have no idea why fio IO thread is still migrated so
frequently since the CPU isn't saturated at all.

IMO, it is normal for user to saturate aio thread, since this way may
save context switch.

Guys, any idea on the crazy aio thread migration?

BTW, the tests are run on latest linus tree(5.4-rc7) in KVM guest, and the
fio test is created for simulating one real performance report which is
proved to be caused by frequent aio submission thread migration.


[1] xfs_complete: one fio script for running single job overwrite aio on XFS
#!/bin/bash

BS=$1
NJOBS=1
QD=128
DIR=/mnt/xfs
BATCH=1
VERIFY="sha3-512"

sysctl kernel.sched_wakeup_granularity_ns
sysctl kernel.sched_min_granularity_ns

rmmod scsi_debug;modprobe scsi_debug dev_size_mb=6144 ndelay=41000 dix=1 dif=2
DEV=`ls -d /sys/bus/pseudo/drivers/scsi_debug/adapter*/host*/target*/*/block/* | head -1 | xargs basename`
DEV="/dev/"$DEV

mkfs.xfs -f $DEV
[ ! -d $DIR ] && mkdir -p $DIR
mount $DEV $DIR

fio --readwrite=randwrite --filesize=5g \
    --overwrite=1 \
    --filename=$DIR/fiofile \
    --runtime=30s --time_based \
    --ioengine=libaio --direct=1 --bs=4k --iodepth=$QD \
    --iodepth_batch_submit=$BATCH \
    --iodepth_batch_complete_min=$BATCH \
    --numjobs=$NJOBS \
    --verify=$VERIFY \
    --name=/hana/fsperf/foo

umount $DEV
rmmod scsi_debug


[2] observe fio migration via bcc trace:
/usr/share/bcc/tools/trace -C -t  't:sched:sched_migrate_task "%s/%d cpu %d->%d", args->comm,args->pid,args->orig_cpu,args->dest_cpu' | grep fio 

[3] test patch for queuing xfs completetion on other CPU

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 1fc28c2da279..bdc007a57706 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -158,9 +158,14 @@ static void iomap_dio_bio_end_io(struct bio *bio)
 			blk_wake_io_task(waiter);
 		} else if (dio->flags & IOMAP_DIO_WRITE) {
 			struct inode *inode = file_inode(dio->iocb->ki_filp);
+			unsigned cpu = cpumask_next(smp_processor_id(),
+					cpu_online_mask);
+
+			if (cpu >= nr_cpu_ids)
+				cpu = 0;
 
 			INIT_WORK(&dio->aio.work, iomap_dio_complete_work);
-			queue_work(inode->i_sb->s_dio_done_wq, &dio->aio.work);
+			queue_work_on(cpu, inode->i_sb->s_dio_done_wq, &dio->aio.work);
 		} else {
 			iomap_dio_complete_work(&dio->aio.work);
 		}

Thanks,
Ming


             reply index

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-14 11:31 Ming Lei [this message]
2019-11-14 13:14 ` Peter Zijlstra
2019-11-15  0:09   ` Ming Lei
2019-11-15 14:16     ` Ming Lei
2019-11-14 23:54 ` Dave Chinner
2019-11-15  1:08   ` Ming Lei
2019-11-15  4:56     ` Dave Chinner
2019-11-15  7:08       ` Ming Lei
2019-11-15 23:40         ` Dave Chinner
2019-11-16  6:31           ` Ming Lei
2019-11-18  9:21           ` Peter Zijlstra
2019-11-18 14:54             ` Vincent Guittot
2019-11-18 20:40             ` Dave Chinner
2019-11-20 19:16               ` Peter Zijlstra
2019-11-20 22:03                 ` Phil Auld
2019-11-21  4:12                   ` Ming Lei
2019-11-21 14:12                     ` Phil Auld
2019-11-21 15:02                       ` Boaz Harrosh
2019-11-21 16:19                         ` Jens Axboe
2019-12-09 16:58                           ` Srikar Dronamraju
2019-11-21 22:10                       ` Dave Chinner
2019-11-21 13:29                   ` Peter Zijlstra
2019-11-21 14:21                     ` Phil Auld
2019-12-09 16:51                     ` Srikar Dronamraju
2019-12-09 23:17                       ` Dave Chinner
2019-12-10  3:27                         ` Srikar Dronamraju
2019-12-10  5:43                         ` [PATCH v2] sched/core: Preempt current task in favour of bound kthread Srikar Dronamraju
2019-12-10  9:26                           ` Peter Zijlstra
2019-12-10  9:33                             ` Peter Zijlstra
2019-12-10 10:18                               ` Srikar Dronamraju
2019-12-10 10:16                             ` Srikar Dronamraju
2019-12-10  9:43                           ` Vincent Guittot
2019-12-10 10:11                             ` Srikar Dronamraju
2019-12-10 11:02                               ` Vincent Guittot
2019-12-10 17:23                           ` [PATCH v3] " Srikar Dronamraju
2019-12-11 17:38                             ` [PATCH v4] " Srikar Dronamraju
2019-12-11 22:46                               ` Dave Chinner
2019-12-12 10:10                                 ` Peter Zijlstra
2019-12-12 10:14                                   ` Peter Zijlstra
2019-12-12 10:23                                     ` Peter Zijlstra
2019-12-12 11:20                                       ` Vincent Guittot
2019-12-12 13:12                                         ` Peter Zijlstra
2019-12-12 15:07                                   ` Srikar Dronamraju
2019-12-12 15:15                                     ` Peter Zijlstra
2019-12-13  5:32                                   ` Srikar Dronamraju
2019-11-18 16:26           ` single aio thread is migrated crazily by scheduler Srikar Dronamraju
2019-11-18 21:18             ` Dave Chinner
2019-11-19  8:54               ` Ming Lei
     [not found]         ` <20191128094003.752-1-hdanton@sina.com>
2019-11-28  9:53           ` Vincent Guittot
2019-12-02  2:46             ` Ming Lei
2019-12-02  4:02               ` Dave Chinner
2019-12-02  4:22                 ` Ming Lei
2019-12-02 13:45                 ` Vincent Guittot
2019-12-02 21:22                   ` Phil Auld
2019-12-03  9:45                     ` Vincent Guittot
2019-12-04 13:50                       ` Ming Lei
2019-12-02 23:53                   ` Dave Chinner
2019-12-03  0:18                     ` Ming Lei
2019-12-03 13:34                     ` Vincent Guittot
2019-12-02  7:39               ` Juri Lelli
2019-12-02  3:08           ` Dave Chinner
     [not found]           ` <20191202090158.15016-1-hdanton@sina.com>
2019-12-02 23:06             ` Dave Chinner
     [not found]             ` <20191203131514.5176-1-hdanton@sina.com>
2019-12-03 22:29               ` Dave Chinner
     [not found]               ` <20191204102903.896-1-hdanton@sina.com>
2019-12-04 22:59                 ` Dave Chinner

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191114113153.GB4213@ming.t460p \
    --to=ming.lei@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=dchinner@redhat.com \
    --cc=hch@lst.de \
    --cc=jmoyer@redhat.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=sandeen@redhat.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-XFS Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-xfs/0 linux-xfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-xfs linux-xfs/ https://lore.kernel.org/linux-xfs \
		linux-xfs@vger.kernel.org
	public-inbox-index linux-xfs

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-xfs


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git