From: Mel Gorman <mgorman@suse.de>
To: Alexander Fomichev <fomichev.ru@gmail.com>
Cc: linux-kernel@vger.kernel.org, dmaengine@vger.kernel.org,
	linux@yadro.com, Peter Zijlstra <peterz@infradead.org>
Subject: Re: [RFC] Scheduler: DMA Engine regression because of sched/fair changes
Date: Wed, 12 Jan 2022 17:05:12 +0000
Message-ID: <20220112170512.GO3301@suse.de>
In-Reply-To: <20220112152609.gg2boujeh5vv5cns@yadro.com>

On Wed, Jan 12, 2022 at 06:26:09PM +0300, Alexander Fomichev wrote:
> CC: Mel Gorman <mgorman@suse.de>
> CC: linux@yadro.com
> 
> Hi all,
> 
> There's a huge regression found, which affects Intel Xeon's DMA Engine
> performance between v4.14 LTS and modern kernels. In certain
> circumstances the speed in dmatest is more than 6 times lower.
> 
> 	- Hardware -
> I did testing on 2 systems:
> 1) Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz (Supermicro X11DAi-N)
> 2) Intel(R) Xeon(R) Bronze 3204 CPU @ 1.90GHz (YADRO Vegman S220)
> 
> 	- Measurement -
> The dmatest result speed decreases with almost any test settings.
> Although the most significant impact is revealed with 64K transfers. The
> following parameters were used:
> 
> modprobe dmatest iterations=1000 timeout=2000 test_buf_size=0x100000 transfer_size=0x10000 norandom=1
> echo "dma0chan0" > /sys/module/dmatest/parameters/channel
> echo 1 > /sys/module/dmatest/parameters/run
> 
> Every test case was performed at least 3 times. All detailed results are
> below.
> 
> 	- Analysis -
> Bisecting revealed 2 different bad commits for those 2 systems, but both
> change the same function/condition in the same file.
> For the system (1) the bad commit is:
> [7332dec055f2457c386032f7e9b2991eb05c2a0a] sched/fair: Only immediately migrate tasks due to interrupts if prev and target CPUs share cache
> For the system (2) the bad commit is:
> [806486c377e33ab662de6d47902e9e2a32b79368] sched/fair: Do not migrate if the prev_cpu is idle
> 
> 	- Additional check -
> Attempting to revert the changes above, a dirty patch for the (current)
> kernel v5.16.0-rc5 was tested too:
> 

The consequence of the patch is that interrupts are allowed to migrate
tasks away from potentially cache-hot data, incurring L1 misses if the
two CPUs share LLC or remote memory accesses if migrating cross-node. The
secondary concern is that excessive migration from interrupts that
round-robin CPUs will mean that the CPU does not increase frequency. At a
minimum, the RFC patch introduces regressions of its own. The comments
cover the two scenarios of interest:

+        * If this_cpu is idle, it implies the wakeup is from interrupt
+        * context. Only allow the move if cache is shared. Otherwise an
+        * interrupt intensive workload could force all tasks onto one
+        * node depending on the IO topology or IRQ affinity settings.

(This one causes remote memory accesses and potentially overutilisation
of a subset of nodes)

+        * If the prev_cpu is idle and cache affine then avoid a migration.
+        * There is no guarantee that the cache hot data from an interrupt
+        * is more important than cache hot data on the prev_cpu and from
+        * a cpufreq perspective, it's better to have higher utilisation
+        * on one CPU.

(This one incurs L1/L2 misses due to a migration even though LLC may be
shared)
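
As a rough illustration of the two comments above, here is a small
user-space model of the decision they describe. It is only a sketch and
not the kernel code: struct cpu_model, pick_wake_cpu() and NO_PREFERENCE
are made-up names, and the other checks the scheduler makes at wakeup
time are left out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for per-CPU scheduler state; not a kernel type. */
struct cpu_model {
	bool idle;	/* CPU currently has nothing to run */
	int llc_id;	/* CPUs with the same llc_id share a last-level cache */
};

/* Returned when this check alone expresses no preference. */
#define NO_PREFERENCE (-1)

/*
 * this_cpu is the CPU handling the wakeup (e.g. the interrupt),
 * prev_cpu is the CPU the woken task last ran on.
 */
int pick_wake_cpu(const struct cpu_model *cpus, int this_cpu, int prev_cpu)
{
	bool share_cache = cpus[this_cpu].llc_id == cpus[prev_cpu].llc_id;

	/*
	 * An idle this_cpu implies a wakeup from interrupt context.
	 * Only consider pulling the task if cache is shared, and even
	 * then stay on prev_cpu if it is idle as well.
	 */
	if (cpus[this_cpu].idle && share_cache)
		return cpus[prev_cpu].idle ? prev_cpu : this_cpu;

	/* Different LLC or busy this_cpu: no pull from this check. */
	return NO_PREFERENCE;
}
```

In this model, an interrupt landing on an idle CPU in a different LLC
domain never pulls the task across, which is the behaviour the RFC patch
would undo.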

The tests don't say, but which CPUs do the dmatest interrupts get
delivered to? dmatest appears to be an exception in that the *only* hot
data of concern is also related to the interrupt, as the DMA operation is
validated.

However, given that the point of a DMA engine is to transfer data without
the host CPU being involved and the interrupt is delivered on completion,
how realistic is it that the DMA data is immediately accessed on completion
by normal workloads that happen to use the DMA engine? What impact does
it have on the test if noverify or polling is used?

-- 
Mel Gorman
SUSE Labs
