All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Huang, Ying" <ying.huang@intel.com>
To: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>,
	 Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	 Vincent Guittot <vincent.guittot@linaro.org>,
	 Dietmar Eggemann <dietmar.eggemann@arm.com>,
	 Steven Rostedt <rostedt@goodmis.org>,
	 Ben Segall <bsegall@google.com>,  Mel Gorman <mgorman@suse.de>,
	 Daniel Bristot de Oliveira <bristot@redhat.com>,
	 Valentin Schneider <vschneid@redhat.com>,
	 linux-kernel@vger.kernel.org,
	 Luca Abeni <luca.abeni@santannapisa.it>,
	 Tommaso Cucinotta <tommaso.cucinotta@santannapisa.it>,
	 Thomas Gleixner <tglx@linutronix.de>,
	 Joel Fernandes <joel@joelfernandes.org>,
	 Vineeth Pillai <vineeth@bitbyteword.org>,
	 Shuah Khan <skhan@linuxfoundation.org>,
	Phil Auld <pauld@redhat.com>,  Aaron Lu <aaron.lu@intel.com>,
	 Kairui Song <kasong@tencent.com>,
	 Guo Ziliang <guo.ziliang@zte.com.cn>
Subject: Re: [PATCH v5 0/7] SCHED_DEADLINE server infrastructure
Date: Tue, 20 Feb 2024 11:28:41 +0800	[thread overview]
Message-ID: <878r3freza.fsf@yhuang6-desk2.ccr.corp.intel.com> (raw)
In-Reply-To: <23b87c48-c4b8-4b85-822a-33cffaf6f779@kernel.org> (Daniel Bristot de Oliveira's message of "Mon, 19 Feb 2024 11:23:18 +0100")

Daniel Bristot de Oliveira <bristot@kernel.org> writes:

> Hi
>
> On 2/19/24 08:33, Huang, Ying wrote:
>> Hi, Daniel,
>> 
>> Thanks a lot for your great patchset!
>> 
>> We have a similar starvation issue in mm subsystem too.  Details are in
>> the patch description of the below commit.  In short, task A is busy
>> looping on some event, while task B will signal the event after some
>> work.  If the priority of task A is higher than that of task B, task B
>> may be starved.
>
> ok...
>
>> 
>> IIUC, if task A is RT task while task B is fair task, then your patchset
>> will solve the issue.
>
> This patch set will not solve the issue. It will mitigate the effect of the
> problem. Still the system will perform very poorly...

I don't think that it's common (or even reasonable) for real-time tasks
to use swap.  So, IMHO, performance isn't very important here.  But, we
need to avoid live-lock anyway.  I think that your patchset solves the
live-lock issue.

>> If both task A and task B is RT tasks, is there
>> some way to solve the issue?
>
> I would say reworking the swap algorithm, as it is not meant to be used when
> real-time tasks are in place.
>
> As an exercise, let's say that we add a server per priority on FIFO, with a default
> 50ms/1s runtime period. Your "real-time" workload would suffer a 950ms latency,
> busy loop in vain.

If the target is only the live-lock avoidance, is it possible to run
lower priority runnable tasks for a short while if we run long enough in
the busy loop?

> Then one would say, let's lower the parameters, so the granularity of
> the server would provide lower latencies. The same problem would still
> exist, as it exists with sched fair....
>
> So, the solution is not on schedule. Busy loop waiting is not right when you
> have RT tasks. That is why PREEMPT_RT reworks the locking pattern to remove
> spin_locks that do busy waiting. spin_locks do not have this problem you
> show because they disable preemption... but disabling preemption is not
> the best solution either.
>
> So, a first try of duct tape would using (local_?) locks like in
> preempt rt to make things sleepable...
>
> AFAICS, this was already discussed in the previous link, right?
>
>> 
>> Now, we use a ugly schedule_timeout_uninterruptible(1) in the loop to
>> resolve the issue, is there something better?
>
> I am not a swap/mm expert.. my guesses would be all on sleepable locking.
> But I know there are many smart people on the mm side with better guesses...
>
> It is just that the DL server or any type of starvation avoidance does not
> seem to be a solution for your problem.

Yes.  To improve the performance, we need something else.

--
Best Regards,
Huang, Ying

> -- Daniel
>
>
>> Best Regards,
>> Huang, Ying
>> 
>> --------------------------8<---------------------------------------
>> commit 029c4628b2eb2ca969e9bf979b05dc18d8d5575e
>> Author: Guo Ziliang <guo.ziliang@zte.com.cn>
>> Date:   Wed Mar 16 16:15:03 2022 -0700
>> 
>>     mm: swap: get rid of livelock in swapin readahead
>>     
>>     In our testing, a livelock task was found.  Through sysrq printing, same
>>     stack was found every time, as follows:
>>     
>>       __swap_duplicate+0x58/0x1a0
>>       swapcache_prepare+0x24/0x30
>>       __read_swap_cache_async+0xac/0x220
>>       read_swap_cache_async+0x58/0xa0
>>       swapin_readahead+0x24c/0x628
>>       do_swap_page+0x374/0x8a0
>>       __handle_mm_fault+0x598/0xd60
>>       handle_mm_fault+0x114/0x200
>>       do_page_fault+0x148/0x4d0
>>       do_translation_fault+0xb0/0xd4
>>       do_mem_abort+0x50/0xb0
>>     
>>     The reason for the livelock is that swapcache_prepare() always returns
>>     EEXIST, indicating that SWAP_HAS_CACHE has not been cleared, so that it
>>     cannot jump out of the loop.  We suspect that the task that clears the
>>     SWAP_HAS_CACHE flag never gets a chance to run.  We try to lower the
>>     priority of the task stuck in a livelock so that the task that clears
>>     the SWAP_HAS_CACHE flag will run.  The results show that the system
>>     returns to normal after the priority is lowered.
>>     
>>     In our testing, multiple real-time tasks are bound to the same core, and
>>     the task in the livelock is the highest priority task of the core, so
>>     the livelocked task cannot be preempted.
>>     
>>     Although cond_resched() is used by __read_swap_cache_async, it is an
>>     empty function in the preemptive system and cannot achieve the purpose
>>     of releasing the CPU.  A high-priority task cannot release the CPU
>>     unless preempted by a higher-priority task.  But when this task is
>>     already the highest priority task on this core, other tasks will not be
>>     able to be scheduled.  So we think we should replace cond_resched() with
>>     schedule_timeout_uninterruptible(1), schedule_timeout_interruptible will
>>     call set_current_state first to set the task state, so the task will be
>>     removed from the running queue, so as to achieve the purpose of giving
>>     up the CPU and prevent it from running in kernel mode for too long.
>>     
>>     (akpm: ugly hack becomes uglier.  But it fixes the issue in a
>>     backportable-to-stable fashion while we hopefully work on something
>>     better)
>>     
>>     Link: https://lkml.kernel.org/r/20220221111749.1928222-1-cgel.zte@gmail.com
>>     Signed-off-by: Guo Ziliang <guo.ziliang@zte.com.cn>
>>     Reported-by: Zeal Robot <zealci@zte.com.cn>
>>     Reviewed-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
>>     Reviewed-by: Jiang Xuexin <jiang.xuexin@zte.com.cn>
>>     Reviewed-by: Yang Yang <yang.yang29@zte.com.cn>
>>     Acked-by: Hugh Dickins <hughd@google.com>
>>     Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
>>     Cc: Michal Hocko <mhocko@kernel.org>
>>     Cc: Minchan Kim <minchan@kernel.org>
>>     Cc: Johannes Weiner <hannes@cmpxchg.org>
>>     Cc: Roger Quadros <rogerq@kernel.org>
>>     Cc: Ziliang Guo <guo.ziliang@zte.com.cn>
>>     Cc: <stable@vger.kernel.org>
>>     Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>>     Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
>> 
>> diff --git a/mm/swap_state.c b/mm/swap_state.c
>> index 8d4104242100..ee67164531c0 100644
>> --- a/mm/swap_state.c
>> +++ b/mm/swap_state.c
>> @@ -478,7 +478,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
>>  		 * __read_swap_cache_async(), which has set SWAP_HAS_CACHE
>>  		 * in swap_map, but not yet added its page to swap cache.
>>  		 */
>> -		cond_resched();
>> +		schedule_timeout_uninterruptible(1);
>>  	}
>>  
>>  	/*

  reply	other threads:[~2024-02-20  3:30 UTC|newest]

Thread overview: 76+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-04 10:59 [PATCH v5 0/7] SCHED_DEADLINE server infrastructure Daniel Bristot de Oliveira
2023-11-04 10:59 ` [PATCH v5 1/7] sched: Unify runtime accounting across classes Daniel Bristot de Oliveira
2023-11-15  9:04   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2023-11-04 10:59 ` [PATCH v5 2/7] sched/deadline: Collect sched_dl_entity initialization Daniel Bristot de Oliveira
2023-11-15  9:04   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2023-11-04 10:59 ` [PATCH v5 3/7] sched/deadline: Move bandwidth accounting into {en,de}queue_dl_entity Daniel Bristot de Oliveira
2023-11-15  9:04   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2023-11-04 10:59 ` [PATCH v5 4/7] sched/deadline: Introduce deadline servers Daniel Bristot de Oliveira
2023-11-15  9:04   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2023-11-04 10:59 ` [PATCH v5 5/7] sched/fair: Add trivial fair server Daniel Bristot de Oliveira
2023-11-06 14:24   ` Peter Zijlstra
2023-11-06 14:26     ` Daniel Bristot de Oliveira
2023-11-04 10:59 ` [PATCH v5 6/7] sched/deadline: Deferrable dl server Daniel Bristot de Oliveira
2023-11-06 14:55   ` Peter Zijlstra
2023-11-06 17:05     ` Daniel Bristot de Oliveira
2023-11-06 19:32   ` Joel Fernandes
2023-11-06 21:32     ` Joel Fernandes
2023-11-06 21:37       ` Joel Fernandes
2023-11-07 11:58         ` Daniel Bristot de Oliveira
2023-11-08  2:42           ` Joel Fernandes
2023-11-07 16:47         ` Steven Rostedt
2023-11-07 17:35           ` Steven Rostedt
2023-11-07 17:46             ` Steven Rostedt
2023-11-07 17:54             ` Steven Rostedt
2023-11-07 19:32               ` Steven Rostedt
2023-11-07 20:07                 ` Steven Rostedt
2023-11-07 17:37           ` Daniel Bristot de Oliveira
2023-11-07 18:50             ` Daniel Bristot de Oliveira
2023-11-08  3:20               ` Joel Fernandes
2023-11-08  8:01                 ` Daniel Bristot de Oliveira
2023-11-08 18:25                   ` Joel Fernandes
2023-11-08 12:44               ` Peter Zijlstra
2023-11-08 12:50                 ` Peter Zijlstra
2023-11-08 14:52                   ` Daniel Bristot de Oliveira
2023-11-08 13:46                 ` Daniel Bristot de Oliveira
2023-11-08 13:58                 ` Daniel Bristot de Oliveira
2023-11-08 15:14                 ` Juri Lelli
2023-11-08 16:57                   ` Peter Zijlstra
2023-11-08  2:37           ` Joel Fernandes
2023-11-07  7:30     ` Daniel Bristot de Oliveira
2023-11-07 16:37   ` Steven Rostedt
2023-11-13 15:05   ` kernel test robot
2024-03-20  0:03   ` Joel Fernandes
2024-03-20 19:24     ` Daniel Bristot de Oliveira
2024-03-21 16:15       ` Joel Fernandes
2024-03-23 14:37         ` Joel Fernandes
2024-04-05 14:35         ` Daniel Bristot de Oliveira
2024-04-08 17:11           ` Steven Rostedt
2023-11-04 10:59 ` [PATCH v5 7/7] sched/fair: Fair server interface Daniel Bristot de Oliveira
2023-11-04 15:18   ` kernel test robot
2023-11-05  0:55   ` kernel test robot
2023-11-06 15:40   ` Peter Zijlstra
2023-11-06 16:29     ` Daniel Bristot de Oliveira
2023-11-07  8:16       ` Peter Zijlstra
2023-11-07 14:06         ` Daniel Bristot de Oliveira
2023-11-07 14:44       ` Peter Zijlstra
2023-11-07 12:38   ` Peter Zijlstra
2023-11-07 13:24     ` Daniel Bristot de Oliveira
2024-01-19  1:49   ` Joel Fernandes
2024-01-19  1:55   ` Joel Fernandes
2024-01-22 14:14     ` Daniel Bristot de Oliveira
2024-01-23 15:39       ` Joel Fernandes
2024-01-23 15:44       ` Joel Fernandes
2024-02-13  2:13   ` Joel Fernandes
2024-02-13  2:21     ` Joel Fernandes
2024-02-14 14:23     ` Daniel Bristot de Oliveira
2024-02-15 13:57       ` Joel Fernandes
2024-02-15 17:27         ` Daniel Bristot de Oliveira
2024-02-15 17:41           ` Joel Fernandes
2024-04-04 17:43             ` Daniel Bristot de Oliveira
2023-12-08 21:47 ` [PATCH v5 0/7] SCHED_DEADLINE server infrastructure Joel Fernandes
2024-02-19  7:33 ` Huang, Ying
2024-02-19 10:23   ` Daniel Bristot de Oliveira
2024-02-20  3:28     ` Huang, Ying [this message]
2024-02-20  8:31       ` Daniel Bristot de Oliveira
2024-02-20  8:41         ` Huang, Ying

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=878r3freza.fsf@yhuang6-desk2.ccr.corp.intel.com \
    --to=ying.huang@intel.com \
    --cc=aaron.lu@intel.com \
    --cc=bristot@kernel.org \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=guo.ziliang@zte.com.cn \
    --cc=joel@joelfernandes.org \
    --cc=juri.lelli@redhat.com \
    --cc=kasong@tencent.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luca.abeni@santannapisa.it \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=pauld@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=skhan@linuxfoundation.org \
    --cc=tglx@linutronix.de \
    --cc=tommaso.cucinotta@santannapisa.it \
    --cc=vincent.guittot@linaro.org \
    --cc=vineeth@bitbyteword.org \
    --cc=vschneid@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.