* [RFC PATCH v7 00/23] Core scheduling v7
@ 2020-08-28 19:51 Julien Desfossez
From: Julien Desfossez @ 2020-08-28 19:51 UTC (permalink / raw)
  To: Peter Zijlstra, Vineeth Pillai, Joel Fernandes, Tim Chen,
	Aaron Lu, Aubrey Li, Dhaval Giani, Chris Hyser,
	Nishanth Aravamudan
  Cc: Julien Desfossez, mingo, tglx, pjt, torvalds, linux-kernel,
	fweisbec, keescook, kerrnel, Phil Auld, Valentin Schneider,
	Mel Gorman, Pawan Gupta, Paolo Bonzini, joel, vineeth, Chen Yu,
	Christian Brauner, Agata Gruza, Antonio Gomez Iglesias, graf,
	konrad.wilk, dfaggioli, rostedt, derkling, benbjiang

Seventh iteration of the Core-Scheduling feature.

(Note that this iteration is a repost of the combined v6 and v6+ series,
and does not include the new interface discussed at LPC).

Core scheduling is a feature that allows only trusted tasks to run
concurrently on CPUs sharing compute resources (e.g. hyperthreads on a
core). The goal is to mitigate core-level side-channel attacks without
disabling SMT entirely (which has a significant performance impact in
some situations). Core scheduling (as of v7) mitigates user-space to
user-space attacks, as well as user-to-kernel attacks when one of the
siblings enters the kernel via an interrupt or system call.

By default, the feature doesn't change any of the current scheduler
behavior. The user decides which tasks can run simultaneously on the
same core (for now by having them in the same tagged cgroup). When a tag
is enabled in a cgroup and a task from that cgroup is running on a
hardware thread, the scheduler ensures that only idle or trusted tasks
run on the other sibling(s). Besides security concerns, this feature can
also be beneficial for RT and performance applications where we want to
control how tasks make use of SMT dynamically.
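
As a concrete example, tagging goes through the cpu cgroup controller.
A usage sketch (the cpu.tag knob comes from the cgroup tagging patch in
this series; the path and cgroup name are illustrative):

  # mkdir /sys/fs/cgroup/cpu/trusted
  # echo 1 > /sys/fs/cgroup/cpu/trusted/cpu.tag
  # echo $$ > /sys/fs/cgroup/cpu/trusted/tasks

Tasks in the tagged cgroup then share a cookie and are allowed to run
concurrently on siblings of the same core.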

This iteration focuses on protecting the kernel when it runs alongside
untrusted user-mode tasks on a core. This was a major gap in the
side-channel mitigation provided by the core scheduling feature. In v6,
we had a minimal version where the kernel was protected from untrusted
tasks when a sibling entered the kernel via irq/softirq; system calls
were not protected, and the implementation had corner cases. With this
iteration, user-mode tasks running on the siblings of a core are paused
until the last sibling exits the kernel, which guarantees that kernel
code never runs concurrently with untrusted user-mode tasks. This
protects both syscalls and IRQs against user mode and guests. It is
also based on Thomas's x86/entry branch, which moves generic entry code
out of arch/.
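
At a high level, the protection works as sketched below. This is an
illustrative simplification, not the actual patch code (the counter and
helpers here are hypothetical; the real implementation lives in the
generic entry code):

	void sched_core_unsafe_enter(void)	/* on kernel entry */
	{
		/* count siblings currently in the kernel (hypothetical field) */
		atomic_inc(&this_core()->kernel_nest);	/* hypothetical helper */
		/* kick siblings so untrusted user tasks spin in a wait loop */
		ipi_siblings_to_wait();			/* hypothetical helper */
	}

	void sched_core_unsafe_exit(void)	/* before returning to user */
	{
		/* the last sibling to leave lets the others resume user mode */
		if (atomic_dec_and_test(&this_core()->kernel_nest))
			wake_waiting_siblings();	/* hypothetical helper */
	}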

This iteration also fixes the hotplug crashes and hangs present in v6.
In v6, the previous hotplug fixes had been removed because they were
getting complicated. This iteration addresses the root cause of the
hotplug issues and fixes them.

In terms of performance, active VM-based workloads are still less
impacted by core scheduling than by disabling SMT. Corner cases remain,
and since we are getting closer to merging, we are actively looking for
feedback and reproducers of those cases so we can address them.

Performance tests:
sysbench was used to test the performance of the patch series. We used
an 8-cpu/4-core VM and ran 2 sysbench tests in parallel. Each sysbench
test runs 4 tasks:
sysbench --test=cpu --cpu-max-prime=100000 --num-threads=4 run

We compared the performance results for the combinations below. The
metric is 'events per second':

1. Coresched disabled
    sysbench-1/sysbench-2 => 175.7/175.6
2. Coresched enabled, both sysbench tagged
    sysbench-1/sysbench-2 => 168.8/165.6
3. Coresched enabled, sysbench-1 tagged and sysbench-2 untagged
    sysbench-1/sysbench-2 => 96.4/176.9
4. smt off
    sysbench-1/sysbench-2 => 97.9/98.8

When both sysbench instances are tagged, there is a performance drop of
~4%. In the tagged/untagged case, the tagged instance suffers because
it is always stalled when the sibling enters the kernel, but this is no
worse than turning SMT off.

A modified rcutorture was also used to heavily stress the kernel to
make sure there are no crashes or instability.

Larger-scale performance tests were performed with the host running the
core scheduling kernel and each VM in its own tag.

With 3 12-vcpu VMs running linpack (CPU-intensive) on a NUMA node with
36 hardware threads (1:1 vcpus to hardware threads, which becomes 2:1
when SMT is disabled), the performance impact of core scheduling is
-9.52%, whereas it is -25.41% with nosmt.

With 2 12-vcpu database-server VMs and 192 1-vcpu semi-idle VMs running
on 2 NUMA nodes of 36 hardware threads each (a 3:1 ratio), the
performance impact is -47% with core scheduling and -66% with nosmt.

More details on those tests can be found at the end of this presentation:
https://www.linuxplumbersconf.org/event/7/contributions/648/attachments/555/981/CoreScheduling_LPC2020.pdf

v7 is rebased on 5.9-rc2 (d3ac97072ec5):
https://github.com/digitalocean/linux-coresched/tree/coresched/v7-v5.9-rc

TODO
----
(Mostly the same as in v6)
- MAJOR: Core wide vruntime comparison re-work
  https://lwn.net/ml/linux-kernel/20200506143506.GH5298@hirez.programming.kicks-ass.net/
  https://lwn.net/ml/linux-kernel/20200514130248.GD2940@hirez.programming.kicks-ass.net/
- MAJOR: Decide on the interfaces/API for exposing the feature to userland.
  - LPC 2020 reached consensus on a simple task-level interface as a
    starting point
  - Investigate ideas for overcoming unified-hierarchy limitations with
    the cgroup v2 controller
- MAJOR: Load balancing/Migration fixes for core scheduling.
  As of v6, load balancing is partially coresched-aware, but has some
  issues w.r.t. process/taskgroup weights:
  https://lwn.net/ml/linux-kernel/20200225034438.GA617271@ziqianlu-desktop.localdomain/
- Investigate the source of the overhead even when no tasks are tagged:
  https://lkml.org/lkml/2019/10/29/242
- Core scheduling test framework: kselftests, torture tests etc

Changes in v7
-------------
- Kernel protection from untrusted usermode tasks
  - Joel, Vineeth
- Fix for hotplug crashes and hangs
  - Joel, Vineeth

Changes in v6
-------------
- Documentation
  - Joel
- Pause siblings on entering nmi/irq/softirq
  - Joel, Vineeth
- Fix for RCU crash
  - Joel
- Fix for a crash in pick_next_task
  - Yu Chen, Vineeth
- Minor re-write of core-wide vruntime comparison
  - Aaron Lu
- Cleanup: Address Review comments
- Cleanup: Remove hotplug support (for now)
- Build fixes: 32 bit, SMT=n, AUTOGROUP=n etc
  - Joel, Vineeth

Changes in v5
-------------
- Fixes for cgroup/process tagging in corner cases like cgroup
  destruction, tasks moving across cgroups, etc.
  - Tim Chen
- Coresched aware task migrations
  - Aubrey Li
- Other minor stability fixes.

Changes in v4
-------------
- Implement a core wide min_vruntime for vruntime comparison of tasks
  across cpus in a core.
  - Aaron Lu
- Fixes a typo bug in setting the forced_idle cpu.
  - Aaron Lu

Changes in v3
-------------
- Fixes the issue of a sibling picking up an incompatible task
  - Aaron Lu
  - Vineeth Pillai
  - Julien Desfossez
- Fixes the issue of starving threads due to forced idle
  - Peter Zijlstra
- Fixes the refcounting issue when deleting a cgroup with tag
  - Julien Desfossez
- Fixes a crash during cpu offline/online with coresched enabled
  - Vineeth Pillai
- Fixes a comparison logic issue in sched_core_find
  - Aaron Lu

Changes in v2
-------------
- Fixes for a couple of NULL pointer dereference crashes
  - Subhra Mazumdar
  - Tim Chen
- Improves priority comparison logic for processes on different cpus
  - Peter Zijlstra
  - Aaron Lu
- Fixes a hard lockup in rq locking
  - Vineeth Pillai
  - Julien Desfossez
- Fixes a performance issue seen on IO heavy workloads
  - Vineeth Pillai
  - Julien Desfossez
- Fix for 32bit build
  - Aubrey Li

--

Aaron Lu (2):
  sched/fair: wrapper for cfs_rq->min_vruntime
  sched/fair: core wide cfs task priority comparison

Aubrey Li (1):
  sched: migration changes for core scheduling

Joel Fernandes (Google) (6):
  irq_work: Add support to detect if work is pending
  entry/idle: Add a common function for activities during idle entry/exit
  arch/x86: Add a new TIF flag for untrusted tasks
  kernel/entry: Add support for core-wide protection of kernel-mode
  entry/idle: Enter and exit kernel protection during idle entry and
    exit
  Documentation: Add documentation on core scheduling

Peter Zijlstra (9):
  sched: Wrap rq::lock access
  sched: Introduce sched_class::pick_task()
  sched: Core-wide rq->lock
  sched/fair: Add a few assertions
  sched: Basic tracking of matching tasks
  sched: Add core wide task selection and scheduling.
  sched: Trivial forced-newidle balancer
  sched: cgroup tagging interface for core scheduling
  sched: Debug bits...

Vineeth Pillai (5):
  bitops: Introduce find_next_or_bit
  cpumask: Introduce a new iterator for_each_cpu_wrap_or
  sched/fair: Fix forced idle sibling starvation corner case
  entry/kvm: Protect the kernel when entering from guest
  sched/coresched: config option for kernel protection

 .../admin-guide/hw-vuln/core-scheduling.rst   |  253 ++++
 Documentation/admin-guide/hw-vuln/index.rst   |    1 +
 .../admin-guide/kernel-parameters.txt         |    9 +
 arch/x86/include/asm/thread_info.h            |    2 +
 arch/x86/kvm/x86.c                            |    3 +
 include/asm-generic/bitops/find.h             |   16 +
 include/linux/cpumask.h                       |   42 +
 include/linux/entry-common.h                  |   22 +
 include/linux/entry-kvm.h                     |   12 +
 include/linux/irq_work.h                      |    1 +
 include/linux/sched.h                         |   21 +-
 kernel/Kconfig.preempt                        |   19 +
 kernel/entry/common.c                         |   90 +-
 kernel/entry/kvm.c                            |   12 +
 kernel/irq_work.c                             |   11 +
 kernel/sched/core.c                           | 1310 +++++++++++++++--
 kernel/sched/cpuacct.c                        |   12 +-
 kernel/sched/deadline.c                       |   34 +-
 kernel/sched/debug.c                          |    4 +-
 kernel/sched/fair.c                           |  400 +++--
 kernel/sched/idle.c                           |   30 +-
 kernel/sched/pelt.h                           |    2 +-
 kernel/sched/rt.c                             |   22 +-
 kernel/sched/sched.h                          |  253 +++-
 kernel/sched/stop_task.c                      |   13 +-
 kernel/sched/topology.c                       |    4 +-
 lib/cpumask.c                                 |   53 +
 lib/find_bit.c                                |   58 +-
 28 files changed, 2381 insertions(+), 328 deletions(-)
 create mode 100644 Documentation/admin-guide/hw-vuln/core-scheduling.rst

--
2.17.1



* [RFC PATCH v7 01/23] sched: Wrap rq::lock access
From: Julien Desfossez @ 2020-08-28 19:51 UTC (permalink / raw)
  To: Peter Zijlstra, Vineeth Pillai, Joel Fernandes, Tim Chen,
	Aaron Lu, Aubrey Li, Dhaval Giani, Chris Hyser,
	Nishanth Aravamudan
  Cc: mingo, tglx, pjt, torvalds, linux-kernel, fweisbec, keescook,
	kerrnel, Phil Auld, Valentin Schneider, Mel Gorman, Pawan Gupta,
	Paolo Bonzini, joel, vineeth, Chen Yu, Christian Brauner,
	Agata Gruza, Antonio Gomez Iglesias, graf, konrad.wilk,
	dfaggioli, rostedt, derkling, benbjiang, Vineeth Remanan Pillai,
	Julien Desfossez

From: Peter Zijlstra <peterz@infradead.org>

In preparation for playing games with rq->lock, abstract the thing
using an accessor.
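
For context, this is what the accessor buys us: once a later patch in
this series introduces the core-wide lock, rq_lockp() can transparently
redirect every rq->lock user to the core's lock (taken from patch 03):

	static inline raw_spinlock_t *rq_lockp(struct rq *rq)
	{
		if (sched_core_enabled(rq))
			return &rq->core->__lock;	/* core-wide lock */

		return &rq->__lock;			/* per-rq lock */
	}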

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineeth Remanan Pillai <vpillai@digitalocean.com>
Signed-off-by: Julien Desfossez <jdesfossez@digitalocean.com>
---
 kernel/sched/core.c     |  46 +++++++++---------
 kernel/sched/cpuacct.c  |  12 ++---
 kernel/sched/deadline.c |  18 +++----
 kernel/sched/debug.c    |   4 +-
 kernel/sched/fair.c     |  38 +++++++--------
 kernel/sched/idle.c     |   4 +-
 kernel/sched/pelt.h     |   2 +-
 kernel/sched/rt.c       |   8 +--
 kernel/sched/sched.h    | 105 +++++++++++++++++++++-------------------
 kernel/sched/topology.c |   4 +-
 10 files changed, 122 insertions(+), 119 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8471a0f7eb32..b85d5e56d5fe 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -185,12 +185,12 @@ struct rq *__task_rq_lock(struct task_struct *p, struct rq_flags *rf)
 
 	for (;;) {
 		rq = task_rq(p);
-		raw_spin_lock(&rq->lock);
+		raw_spin_lock(rq_lockp(rq));
 		if (likely(rq == task_rq(p) && !task_on_rq_migrating(p))) {
 			rq_pin_lock(rq, rf);
 			return rq;
 		}
-		raw_spin_unlock(&rq->lock);
+		raw_spin_unlock(rq_lockp(rq));
 
 		while (unlikely(task_on_rq_migrating(p)))
 			cpu_relax();
@@ -209,7 +209,7 @@ struct rq *task_rq_lock(struct task_struct *p, struct rq_flags *rf)
 	for (;;) {
 		raw_spin_lock_irqsave(&p->pi_lock, rf->flags);
 		rq = task_rq(p);
-		raw_spin_lock(&rq->lock);
+		raw_spin_lock(rq_lockp(rq));
 		/*
 		 *	move_queued_task()		task_rq_lock()
 		 *
@@ -231,7 +231,7 @@ struct rq *task_rq_lock(struct task_struct *p, struct rq_flags *rf)
 			rq_pin_lock(rq, rf);
 			return rq;
 		}
-		raw_spin_unlock(&rq->lock);
+		raw_spin_unlock(rq_lockp(rq));
 		raw_spin_unlock_irqrestore(&p->pi_lock, rf->flags);
 
 		while (unlikely(task_on_rq_migrating(p)))
@@ -301,7 +301,7 @@ void update_rq_clock(struct rq *rq)
 {
 	s64 delta;
 
-	lockdep_assert_held(&rq->lock);
+	lockdep_assert_held(rq_lockp(rq));
 
 	if (rq->clock_update_flags & RQCF_ACT_SKIP)
 		return;
@@ -610,7 +610,7 @@ void resched_curr(struct rq *rq)
 	struct task_struct *curr = rq->curr;
 	int cpu;
 
-	lockdep_assert_held(&rq->lock);
+	lockdep_assert_held(rq_lockp(rq));
 
 	if (test_tsk_need_resched(curr))
 		return;
@@ -634,10 +634,10 @@ void resched_cpu(int cpu)
 	struct rq *rq = cpu_rq(cpu);
 	unsigned long flags;
 
-	raw_spin_lock_irqsave(&rq->lock, flags);
+	raw_spin_lock_irqsave(rq_lockp(rq), flags);
 	if (cpu_online(cpu) || cpu == smp_processor_id())
 		resched_curr(rq);
-	raw_spin_unlock_irqrestore(&rq->lock, flags);
+	raw_spin_unlock_irqrestore(rq_lockp(rq), flags);
 }
 
 #ifdef CONFIG_SMP
@@ -1141,7 +1141,7 @@ static inline void uclamp_rq_inc_id(struct rq *rq, struct task_struct *p,
 	struct uclamp_se *uc_se = &p->uclamp[clamp_id];
 	struct uclamp_bucket *bucket;
 
-	lockdep_assert_held(&rq->lock);
+	lockdep_assert_held(rq_lockp(rq));
 
 	/* Update task effective clamp */
 	p->uclamp[clamp_id] = uclamp_eff_get(p, clamp_id);
@@ -1181,7 +1181,7 @@ static inline void uclamp_rq_dec_id(struct rq *rq, struct task_struct *p,
 	unsigned int bkt_clamp;
 	unsigned int rq_clamp;
 
-	lockdep_assert_held(&rq->lock);
+	lockdep_assert_held(rq_lockp(rq));
 
 	/*
 	 * If sched_uclamp_used was enabled after task @p was enqueued,
@@ -1737,7 +1737,7 @@ static inline bool is_cpu_allowed(struct task_struct *p, int cpu)
 static struct rq *move_queued_task(struct rq *rq, struct rq_flags *rf,
 				   struct task_struct *p, int new_cpu)
 {
-	lockdep_assert_held(&rq->lock);
+	lockdep_assert_held(rq_lockp(rq));
 
 	deactivate_task(rq, p, DEQUEUE_NOCLOCK);
 	set_task_cpu(p, new_cpu);
@@ -1849,7 +1849,7 @@ void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
 		 * Because __kthread_bind() calls this on blocked tasks without
 		 * holding rq->lock.
 		 */
-		lockdep_assert_held(&rq->lock);
+		lockdep_assert_held(rq_lockp(rq));
 		dequeue_task(rq, p, DEQUEUE_SAVE | DEQUEUE_NOCLOCK);
 	}
 	if (running)
@@ -1986,7 +1986,7 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
 	 * task_rq_lock().
 	 */
 	WARN_ON_ONCE(debug_locks && !(lockdep_is_held(&p->pi_lock) ||
-				      lockdep_is_held(&task_rq(p)->lock)));
+				      lockdep_is_held(rq_lockp(task_rq(p)))));
 #endif
 	/*
 	 * Clearly, migrating tasks to offline CPUs is a fairly daft thing.
@@ -2497,7 +2497,7 @@ ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags,
 {
 	int en_flags = ENQUEUE_WAKEUP | ENQUEUE_NOCLOCK;
 
-	lockdep_assert_held(&rq->lock);
+	lockdep_assert_held(rq_lockp(rq));
 
 	if (p->sched_contributes_to_load)
 		rq->nr_uninterruptible--;
@@ -3499,10 +3499,10 @@ prepare_lock_switch(struct rq *rq, struct task_struct *next, struct rq_flags *rf
 	 * do an early lockdep release here:
 	 */
 	rq_unpin_lock(rq, rf);
-	spin_release(&rq->lock.dep_map, _THIS_IP_);
+	spin_release(&rq_lockp(rq)->dep_map, _THIS_IP_);
 #ifdef CONFIG_DEBUG_SPINLOCK
 	/* this is a valid case when another task releases the spinlock */
-	rq->lock.owner = next;
+	rq_lockp(rq)->owner = next;
 #endif
 }
 
@@ -3513,8 +3513,8 @@ static inline void finish_lock_switch(struct rq *rq)
 	 * fix up the runqueue lock - which gets 'carried over' from
 	 * prev into current:
 	 */
-	spin_acquire(&rq->lock.dep_map, 0, 0, _THIS_IP_);
-	raw_spin_unlock_irq(&rq->lock);
+	spin_acquire(&rq_lockp(rq)->dep_map, 0, 0, _THIS_IP_);
+	raw_spin_unlock_irq(rq_lockp(rq));
 }
 
 /*
@@ -3664,7 +3664,7 @@ static void __balance_callback(struct rq *rq)
 	void (*func)(struct rq *rq);
 	unsigned long flags;
 
-	raw_spin_lock_irqsave(&rq->lock, flags);
+	raw_spin_lock_irqsave(rq_lockp(rq), flags);
 	head = rq->balance_callback;
 	rq->balance_callback = NULL;
 	while (head) {
@@ -3675,7 +3675,7 @@ static void __balance_callback(struct rq *rq)
 
 		func(rq);
 	}
-	raw_spin_unlock_irqrestore(&rq->lock, flags);
+	raw_spin_unlock_irqrestore(rq_lockp(rq), flags);
 }
 
 static inline void balance_callback(struct rq *rq)
@@ -6522,7 +6522,7 @@ void init_idle(struct task_struct *idle, int cpu)
 	__sched_fork(0, idle);
 
 	raw_spin_lock_irqsave(&idle->pi_lock, flags);
-	raw_spin_lock(&rq->lock);
+	raw_spin_lock(rq_lockp(rq));
 
 	idle->state = TASK_RUNNING;
 	idle->se.exec_start = sched_clock();
@@ -6560,7 +6560,7 @@ void init_idle(struct task_struct *idle, int cpu)
 #ifdef CONFIG_SMP
 	idle->on_cpu = 1;
 #endif
-	raw_spin_unlock(&rq->lock);
+	raw_spin_unlock(rq_lockp(rq));
 	raw_spin_unlock_irqrestore(&idle->pi_lock, flags);
 
 	/* Set the preempt count _outside_ the spinlocks! */
@@ -7132,7 +7132,7 @@ void __init sched_init(void)
 		struct rq *rq;
 
 		rq = cpu_rq(i);
-		raw_spin_lock_init(&rq->lock);
+		raw_spin_lock_init(&rq->__lock);
 		rq->nr_running = 0;
 		rq->calc_load_active = 0;
 		rq->calc_load_update = jiffies + LOAD_FREQ;
diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
index 941c28cf9738..38c1a68e91f0 100644
--- a/kernel/sched/cpuacct.c
+++ b/kernel/sched/cpuacct.c
@@ -112,7 +112,7 @@ static u64 cpuacct_cpuusage_read(struct cpuacct *ca, int cpu,
 	/*
 	 * Take rq->lock to make 64-bit read safe on 32-bit platforms.
 	 */
-	raw_spin_lock_irq(&cpu_rq(cpu)->lock);
+	raw_spin_lock_irq(rq_lockp(cpu_rq(cpu)));
 #endif
 
 	if (index == CPUACCT_STAT_NSTATS) {
@@ -126,7 +126,7 @@ static u64 cpuacct_cpuusage_read(struct cpuacct *ca, int cpu,
 	}
 
 #ifndef CONFIG_64BIT
-	raw_spin_unlock_irq(&cpu_rq(cpu)->lock);
+	raw_spin_unlock_irq(rq_lockp(cpu_rq(cpu)));
 #endif
 
 	return data;
@@ -141,14 +141,14 @@ static void cpuacct_cpuusage_write(struct cpuacct *ca, int cpu, u64 val)
 	/*
 	 * Take rq->lock to make 64-bit write safe on 32-bit platforms.
 	 */
-	raw_spin_lock_irq(&cpu_rq(cpu)->lock);
+	raw_spin_lock_irq(rq_lockp(cpu_rq(cpu)));
 #endif
 
 	for (i = 0; i < CPUACCT_STAT_NSTATS; i++)
 		cpuusage->usages[i] = val;
 
 #ifndef CONFIG_64BIT
-	raw_spin_unlock_irq(&cpu_rq(cpu)->lock);
+	raw_spin_unlock_irq(rq_lockp(cpu_rq(cpu)));
 #endif
 }
 
@@ -253,13 +253,13 @@ static int cpuacct_all_seq_show(struct seq_file *m, void *V)
 			 * Take rq->lock to make 64-bit read safe on 32-bit
 			 * platforms.
 			 */
-			raw_spin_lock_irq(&cpu_rq(cpu)->lock);
+			raw_spin_lock_irq(rq_lockp(cpu_rq(cpu)));
 #endif
 
 			seq_printf(m, " %llu", cpuusage->usages[index]);
 
 #ifndef CONFIG_64BIT
-			raw_spin_unlock_irq(&cpu_rq(cpu)->lock);
+			raw_spin_unlock_irq(rq_lockp(cpu_rq(cpu)));
 #endif
 		}
 		seq_puts(m, "\n");
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 3862a28cd05d..c588a06057dc 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -119,7 +119,7 @@ void __add_running_bw(u64 dl_bw, struct dl_rq *dl_rq)
 {
 	u64 old = dl_rq->running_bw;
 
-	lockdep_assert_held(&(rq_of_dl_rq(dl_rq))->lock);
+	lockdep_assert_held(rq_lockp(rq_of_dl_rq(dl_rq)));
 	dl_rq->running_bw += dl_bw;
 	SCHED_WARN_ON(dl_rq->running_bw < old); /* overflow */
 	SCHED_WARN_ON(dl_rq->running_bw > dl_rq->this_bw);
@@ -132,7 +132,7 @@ void __sub_running_bw(u64 dl_bw, struct dl_rq *dl_rq)
 {
 	u64 old = dl_rq->running_bw;
 
-	lockdep_assert_held(&(rq_of_dl_rq(dl_rq))->lock);
+	lockdep_assert_held(rq_lockp(rq_of_dl_rq(dl_rq)));
 	dl_rq->running_bw -= dl_bw;
 	SCHED_WARN_ON(dl_rq->running_bw > old); /* underflow */
 	if (dl_rq->running_bw > old)
@@ -146,7 +146,7 @@ void __add_rq_bw(u64 dl_bw, struct dl_rq *dl_rq)
 {
 	u64 old = dl_rq->this_bw;
 
-	lockdep_assert_held(&(rq_of_dl_rq(dl_rq))->lock);
+	lockdep_assert_held(rq_lockp(rq_of_dl_rq(dl_rq)));
 	dl_rq->this_bw += dl_bw;
 	SCHED_WARN_ON(dl_rq->this_bw < old); /* overflow */
 }
@@ -156,7 +156,7 @@ void __sub_rq_bw(u64 dl_bw, struct dl_rq *dl_rq)
 {
 	u64 old = dl_rq->this_bw;
 
-	lockdep_assert_held(&(rq_of_dl_rq(dl_rq))->lock);
+	lockdep_assert_held(rq_lockp(rq_of_dl_rq(dl_rq)));
 	dl_rq->this_bw -= dl_bw;
 	SCHED_WARN_ON(dl_rq->this_bw > old); /* underflow */
 	if (dl_rq->this_bw > old)
@@ -966,7 +966,7 @@ static int start_dl_timer(struct task_struct *p)
 	ktime_t now, act;
 	s64 delta;
 
-	lockdep_assert_held(&rq->lock);
+	lockdep_assert_held(rq_lockp(rq));
 
 	/*
 	 * We want the timer to fire at the deadline, but considering
@@ -1076,9 +1076,9 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
 		 * If the runqueue is no longer available, migrate the
 		 * task elsewhere. This necessarily changes rq.
 		 */
-		lockdep_unpin_lock(&rq->lock, rf.cookie);
+		lockdep_unpin_lock(rq_lockp(rq), rf.cookie);
 		rq = dl_task_offline_migration(rq, p);
-		rf.cookie = lockdep_pin_lock(&rq->lock);
+		rf.cookie = lockdep_pin_lock(rq_lockp(rq));
 		update_rq_clock(rq);
 
 		/*
@@ -1703,7 +1703,7 @@ static void migrate_task_rq_dl(struct task_struct *p, int new_cpu __maybe_unused
 	 * from try_to_wake_up(). Hence, p->pi_lock is locked, but
 	 * rq->lock is not... So, lock it
 	 */
-	raw_spin_lock(&rq->lock);
+	raw_spin_lock(rq_lockp(rq));
 	if (p->dl.dl_non_contending) {
 		sub_running_bw(&p->dl, &rq->dl);
 		p->dl.dl_non_contending = 0;
@@ -1718,7 +1718,7 @@ static void migrate_task_rq_dl(struct task_struct *p, int new_cpu __maybe_unused
 			put_task_struct(p);
 	}
 	sub_rq_bw(&p->dl, &rq->dl);
-	raw_spin_unlock(&rq->lock);
+	raw_spin_unlock(rq_lockp(rq));
 }
 
 static void check_preempt_equal_dl(struct rq *rq, struct task_struct *p)
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 36c54265bb2b..6e4b801fb80e 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -497,7 +497,7 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
 	SEQ_printf(m, "  .%-30s: %Ld.%06ld\n", "exec_clock",
 			SPLIT_NS(cfs_rq->exec_clock));
 
-	raw_spin_lock_irqsave(&rq->lock, flags);
+	raw_spin_lock_irqsave(rq_lockp(rq), flags);
 	if (rb_first_cached(&cfs_rq->tasks_timeline))
 		MIN_vruntime = (__pick_first_entity(cfs_rq))->vruntime;
 	last = __pick_last_entity(cfs_rq);
@@ -505,7 +505,7 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
 		max_vruntime = last->vruntime;
 	min_vruntime = cfs_rq->min_vruntime;
 	rq0_min_vruntime = cpu_rq(0)->cfs.min_vruntime;
-	raw_spin_unlock_irqrestore(&rq->lock, flags);
+	raw_spin_unlock_irqrestore(rq_lockp(rq), flags);
 	SEQ_printf(m, "  .%-30s: %Ld.%06ld\n", "MIN_vruntime",
 			SPLIT_NS(MIN_vruntime));
 	SEQ_printf(m, "  .%-30s: %Ld.%06ld\n", "min_vruntime",
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1a68a0536add..9be6cf92cc3d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1101,7 +1101,7 @@ struct numa_group {
 static struct numa_group *deref_task_numa_group(struct task_struct *p)
 {
 	return rcu_dereference_check(p->numa_group, p == current ||
-		(lockdep_is_held(&task_rq(p)->lock) && !READ_ONCE(p->on_cpu)));
+		(lockdep_is_held(rq_lockp(task_rq(p))) && !READ_ONCE(p->on_cpu)));
 }
 
 static struct numa_group *deref_curr_numa_group(struct task_struct *p)
@@ -5287,7 +5287,7 @@ static void __maybe_unused update_runtime_enabled(struct rq *rq)
 {
 	struct task_group *tg;
 
-	lockdep_assert_held(&rq->lock);
+	lockdep_assert_held(rq_lockp(rq));
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(tg, &task_groups, list) {
@@ -5306,7 +5306,7 @@ static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
 {
 	struct task_group *tg;
 
-	lockdep_assert_held(&rq->lock);
+	lockdep_assert_held(rq_lockp(rq));
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(tg, &task_groups, list) {
@@ -6766,7 +6766,7 @@ static void migrate_task_rq_fair(struct task_struct *p, int new_cpu)
 		 * In case of TASK_ON_RQ_MIGRATING we in fact hold the 'old'
 		 * rq->lock and can modify state directly.
 		 */
-		lockdep_assert_held(&task_rq(p)->lock);
+		lockdep_assert_held(rq_lockp(task_rq(p)));
 		detach_entity_cfs_rq(&p->se);
 
 	} else {
@@ -7394,7 +7394,7 @@ static int task_hot(struct task_struct *p, struct lb_env *env)
 {
 	s64 delta;
 
-	lockdep_assert_held(&env->src_rq->lock);
+	lockdep_assert_held(rq_lockp(env->src_rq));
 
 	if (p->sched_class != &fair_sched_class)
 		return 0;
@@ -7488,7 +7488,7 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 {
 	int tsk_cache_hot;
 
-	lockdep_assert_held(&env->src_rq->lock);
+	lockdep_assert_held(rq_lockp(env->src_rq));
 
 	/*
 	 * We do not migrate tasks that are:
@@ -7566,7 +7566,7 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
  */
 static void detach_task(struct task_struct *p, struct lb_env *env)
 {
-	lockdep_assert_held(&env->src_rq->lock);
+	lockdep_assert_held(rq_lockp(env->src_rq));
 
 	deactivate_task(env->src_rq, p, DEQUEUE_NOCLOCK);
 	set_task_cpu(p, env->dst_cpu);
@@ -7582,7 +7582,7 @@ static struct task_struct *detach_one_task(struct lb_env *env)
 {
 	struct task_struct *p;
 
-	lockdep_assert_held(&env->src_rq->lock);
+	lockdep_assert_held(rq_lockp(env->src_rq));
 
 	list_for_each_entry_reverse(p,
 			&env->src_rq->cfs_tasks, se.group_node) {
@@ -7618,7 +7618,7 @@ static int detach_tasks(struct lb_env *env)
 	struct task_struct *p;
 	int detached = 0;
 
-	lockdep_assert_held(&env->src_rq->lock);
+	lockdep_assert_held(rq_lockp(env->src_rq));
 
 	if (env->imbalance <= 0)
 		return 0;
@@ -7740,7 +7740,7 @@ static int detach_tasks(struct lb_env *env)
  */
 static void attach_task(struct rq *rq, struct task_struct *p)
 {
-	lockdep_assert_held(&rq->lock);
+	lockdep_assert_held(rq_lockp(rq));
 
 	BUG_ON(task_rq(p) != rq);
 	activate_task(rq, p, ENQUEUE_NOCLOCK);
@@ -9672,7 +9672,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 		if (need_active_balance(&env)) {
 			unsigned long flags;
 
-			raw_spin_lock_irqsave(&busiest->lock, flags);
+			raw_spin_lock_irqsave(rq_lockp(busiest), flags);
 
 			/*
 			 * Don't kick the active_load_balance_cpu_stop,
@@ -9680,7 +9680,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 			 * moved to this_cpu:
 			 */
 			if (!cpumask_test_cpu(this_cpu, busiest->curr->cpus_ptr)) {
-				raw_spin_unlock_irqrestore(&busiest->lock,
+				raw_spin_unlock_irqrestore(rq_lockp(busiest),
 							    flags);
 				env.flags |= LBF_ALL_PINNED;
 				goto out_one_pinned;
@@ -9696,7 +9696,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 				busiest->push_cpu = this_cpu;
 				active_balance = 1;
 			}
-			raw_spin_unlock_irqrestore(&busiest->lock, flags);
+			raw_spin_unlock_irqrestore(rq_lockp(busiest), flags);
 
 			if (active_balance) {
 				stop_one_cpu_nowait(cpu_of(busiest),
@@ -10439,7 +10439,7 @@ static void nohz_newidle_balance(struct rq *this_rq)
 	    time_before(jiffies, READ_ONCE(nohz.next_blocked)))
 		return;
 
-	raw_spin_unlock(&this_rq->lock);
+	raw_spin_unlock(rq_lockp(this_rq));
 	/*
 	 * This CPU is going to be idle and blocked load of idle CPUs
 	 * need to be updated. Run the ilb locally as it is a good
@@ -10448,7 +10448,7 @@ static void nohz_newidle_balance(struct rq *this_rq)
 	 */
 	if (!_nohz_idle_balance(this_rq, NOHZ_STATS_KICK, CPU_NEWLY_IDLE))
 		kick_ilb(NOHZ_STATS_KICK);
-	raw_spin_lock(&this_rq->lock);
+	raw_spin_lock(rq_lockp(this_rq));
 }
 
 #else /* !CONFIG_NO_HZ_COMMON */
@@ -10514,7 +10514,7 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
 		goto out;
 	}
 
-	raw_spin_unlock(&this_rq->lock);
+	raw_spin_unlock(rq_lockp(this_rq));
 
 	update_blocked_averages(this_cpu);
 	rcu_read_lock();
@@ -10552,7 +10552,7 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
 	}
 	rcu_read_unlock();
 
-	raw_spin_lock(&this_rq->lock);
+	raw_spin_lock(rq_lockp(this_rq));
 
 	if (curr_cost > this_rq->max_idle_balance_cost)
 		this_rq->max_idle_balance_cost = curr_cost;
@@ -11028,9 +11028,9 @@ void unregister_fair_sched_group(struct task_group *tg)
 
 		rq = cpu_rq(cpu);
 
-		raw_spin_lock_irqsave(&rq->lock, flags);
+		raw_spin_lock_irqsave(rq_lockp(rq), flags);
 		list_del_leaf_cfs_rq(tg->cfs_rq[cpu]);
-		raw_spin_unlock_irqrestore(&rq->lock, flags);
+		raw_spin_unlock_irqrestore(rq_lockp(rq), flags);
 	}
 }
 
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 6bf34986f45c..5c33beb1a525 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -424,10 +424,10 @@ struct task_struct *pick_next_task_idle(struct rq *rq)
 static void
 dequeue_task_idle(struct rq *rq, struct task_struct *p, int flags)
 {
-	raw_spin_unlock_irq(&rq->lock);
+	raw_spin_unlock_irq(rq_lockp(rq));
 	printk(KERN_ERR "bad: scheduling from the idle thread!\n");
 	dump_stack();
-	raw_spin_lock_irq(&rq->lock);
+	raw_spin_lock_irq(rq_lockp(rq));
 }
 
 /*
diff --git a/kernel/sched/pelt.h b/kernel/sched/pelt.h
index 795e43e02afc..e850bd71a8ce 100644
--- a/kernel/sched/pelt.h
+++ b/kernel/sched/pelt.h
@@ -141,7 +141,7 @@ static inline void update_idle_rq_clock_pelt(struct rq *rq)
 
 static inline u64 rq_clock_pelt(struct rq *rq)
 {
-	lockdep_assert_held(&rq->lock);
+	lockdep_assert_held(rq_lockp(rq));
 	assert_clock_updated(rq);
 
 	return rq->clock_pelt - rq->lost_idle_time;
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index f215eea6a966..e57fca05b660 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -887,7 +887,7 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
 		if (skip)
 			continue;
 
-		raw_spin_lock(&rq->lock);
+		raw_spin_lock(rq_lockp(rq));
 		update_rq_clock(rq);
 
 		if (rt_rq->rt_time) {
@@ -925,7 +925,7 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
 
 		if (enqueue)
 			sched_rt_rq_enqueue(rt_rq);
-		raw_spin_unlock(&rq->lock);
+		raw_spin_unlock(rq_lockp(rq));
 	}
 
 	if (!throttled && (!rt_bandwidth_enabled() || rt_b->rt_runtime == RUNTIME_INF))
@@ -2094,9 +2094,9 @@ void rto_push_irq_work_func(struct irq_work *work)
 	 * When it gets updated, a check is made if a push is possible.
 	 */
 	if (has_pushable_tasks(rq)) {
-		raw_spin_lock(&rq->lock);
+		raw_spin_lock(rq_lockp(rq));
 		push_rt_tasks(rq);
-		raw_spin_unlock(&rq->lock);
+		raw_spin_unlock(rq_lockp(rq));
 	}
 
 	raw_spin_lock(&rd->rto_lock);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 28709f6b0975..9145be83d80b 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -894,7 +894,7 @@ DECLARE_STATIC_KEY_FALSE(sched_uclamp_used);
  */
 struct rq {
 	/* runqueue lock: */
-	raw_spinlock_t		lock;
+	raw_spinlock_t		__lock;
 
 	/*
 	 * nr_running and cpu_load should be in the same cacheline because
@@ -1075,6 +1075,10 @@ static inline int cpu_of(struct rq *rq)
 #endif
 }
 
+static inline raw_spinlock_t *rq_lockp(struct rq *rq)
+{
+	return &rq->__lock;
+}
 
 #ifdef CONFIG_SCHED_SMT
 extern void __update_idle_core(struct rq *rq);
@@ -1142,7 +1146,7 @@ static inline void assert_clock_updated(struct rq *rq)
 
 static inline u64 rq_clock(struct rq *rq)
 {
-	lockdep_assert_held(&rq->lock);
+	lockdep_assert_held(rq_lockp(rq));
 	assert_clock_updated(rq);
 
 	return rq->clock;
@@ -1150,7 +1154,7 @@ static inline u64 rq_clock(struct rq *rq)
 
 static inline u64 rq_clock_task(struct rq *rq)
 {
-	lockdep_assert_held(&rq->lock);
+	lockdep_assert_held(rq_lockp(rq));
 	assert_clock_updated(rq);
 
 	return rq->clock_task;
@@ -1176,7 +1180,7 @@ static inline u64 rq_clock_thermal(struct rq *rq)
 
 static inline void rq_clock_skip_update(struct rq *rq)
 {
-	lockdep_assert_held(&rq->lock);
+	lockdep_assert_held(rq_lockp(rq));
 	rq->clock_update_flags |= RQCF_REQ_SKIP;
 }
 
@@ -1186,7 +1190,7 @@ static inline void rq_clock_skip_update(struct rq *rq)
  */
 static inline void rq_clock_cancel_skipupdate(struct rq *rq)
 {
-	lockdep_assert_held(&rq->lock);
+	lockdep_assert_held(rq_lockp(rq));
 	rq->clock_update_flags &= ~RQCF_REQ_SKIP;
 }
 
@@ -1215,7 +1219,7 @@ struct rq_flags {
  */
 static inline void rq_pin_lock(struct rq *rq, struct rq_flags *rf)
 {
-	rf->cookie = lockdep_pin_lock(&rq->lock);
+	rf->cookie = lockdep_pin_lock(rq_lockp(rq));
 
 #ifdef CONFIG_SCHED_DEBUG
 	rq->clock_update_flags &= (RQCF_REQ_SKIP|RQCF_ACT_SKIP);
@@ -1230,12 +1234,12 @@ static inline void rq_unpin_lock(struct rq *rq, struct rq_flags *rf)
 		rf->clock_update_flags = RQCF_UPDATED;
 #endif
 
-	lockdep_unpin_lock(&rq->lock, rf->cookie);
+	lockdep_unpin_lock(rq_lockp(rq), rf->cookie);
 }
 
 static inline void rq_repin_lock(struct rq *rq, struct rq_flags *rf)
 {
-	lockdep_repin_lock(&rq->lock, rf->cookie);
+	lockdep_repin_lock(rq_lockp(rq), rf->cookie);
 
 #ifdef CONFIG_SCHED_DEBUG
 	/*
@@ -1256,7 +1260,7 @@ static inline void __task_rq_unlock(struct rq *rq, struct rq_flags *rf)
 	__releases(rq->lock)
 {
 	rq_unpin_lock(rq, rf);
-	raw_spin_unlock(&rq->lock);
+	raw_spin_unlock(rq_lockp(rq));
 }
 
 static inline void
@@ -1265,7 +1269,7 @@ task_rq_unlock(struct rq *rq, struct task_struct *p, struct rq_flags *rf)
 	__releases(p->pi_lock)
 {
 	rq_unpin_lock(rq, rf);
-	raw_spin_unlock(&rq->lock);
+	raw_spin_unlock(rq_lockp(rq));
 	raw_spin_unlock_irqrestore(&p->pi_lock, rf->flags);
 }
 
@@ -1273,7 +1277,7 @@ static inline void
 rq_lock_irqsave(struct rq *rq, struct rq_flags *rf)
 	__acquires(rq->lock)
 {
-	raw_spin_lock_irqsave(&rq->lock, rf->flags);
+	raw_spin_lock_irqsave(rq_lockp(rq), rf->flags);
 	rq_pin_lock(rq, rf);
 }
 
@@ -1281,7 +1285,7 @@ static inline void
 rq_lock_irq(struct rq *rq, struct rq_flags *rf)
 	__acquires(rq->lock)
 {
-	raw_spin_lock_irq(&rq->lock);
+	raw_spin_lock_irq(rq_lockp(rq));
 	rq_pin_lock(rq, rf);
 }
 
@@ -1289,7 +1293,7 @@ static inline void
 rq_lock(struct rq *rq, struct rq_flags *rf)
 	__acquires(rq->lock)
 {
-	raw_spin_lock(&rq->lock);
+	raw_spin_lock(rq_lockp(rq));
 	rq_pin_lock(rq, rf);
 }
 
@@ -1297,7 +1301,7 @@ static inline void
 rq_relock(struct rq *rq, struct rq_flags *rf)
 	__acquires(rq->lock)
 {
-	raw_spin_lock(&rq->lock);
+	raw_spin_lock(rq_lockp(rq));
 	rq_repin_lock(rq, rf);
 }
 
@@ -1306,7 +1310,7 @@ rq_unlock_irqrestore(struct rq *rq, struct rq_flags *rf)
 	__releases(rq->lock)
 {
 	rq_unpin_lock(rq, rf);
-	raw_spin_unlock_irqrestore(&rq->lock, rf->flags);
+	raw_spin_unlock_irqrestore(rq_lockp(rq), rf->flags);
 }
 
 static inline void
@@ -1314,7 +1318,7 @@ rq_unlock_irq(struct rq *rq, struct rq_flags *rf)
 	__releases(rq->lock)
 {
 	rq_unpin_lock(rq, rf);
-	raw_spin_unlock_irq(&rq->lock);
+	raw_spin_unlock_irq(rq_lockp(rq));
 }
 
 static inline void
@@ -1322,7 +1326,7 @@ rq_unlock(struct rq *rq, struct rq_flags *rf)
 	__releases(rq->lock)
 {
 	rq_unpin_lock(rq, rf);
-	raw_spin_unlock(&rq->lock);
+	raw_spin_unlock(rq_lockp(rq));
 }
 
 static inline struct rq *
@@ -1387,7 +1391,7 @@ queue_balance_callback(struct rq *rq,
 		       struct callback_head *head,
 		       void (*func)(struct rq *rq))
 {
-	lockdep_assert_held(&rq->lock);
+	lockdep_assert_held(rq_lockp(rq));
 
 	if (unlikely(head->next))
 		return;
@@ -2084,7 +2088,7 @@ static inline int _double_lock_balance(struct rq *this_rq, struct rq *busiest)
 	__acquires(busiest->lock)
 	__acquires(this_rq->lock)
 {
-	raw_spin_unlock(&this_rq->lock);
+	raw_spin_unlock(rq_lockp(this_rq));
 	double_rq_lock(this_rq, busiest);
 
 	return 1;
@@ -2103,20 +2107,22 @@ static inline int _double_lock_balance(struct rq *this_rq, struct rq *busiest)
 	__acquires(busiest->lock)
 	__acquires(this_rq->lock)
 {
-	int ret = 0;
-
-	if (unlikely(!raw_spin_trylock(&busiest->lock))) {
-		if (busiest < this_rq) {
-			raw_spin_unlock(&this_rq->lock);
-			raw_spin_lock(&busiest->lock);
-			raw_spin_lock_nested(&this_rq->lock,
-					      SINGLE_DEPTH_NESTING);
-			ret = 1;
-		} else
-			raw_spin_lock_nested(&busiest->lock,
-					      SINGLE_DEPTH_NESTING);
+	if (rq_lockp(this_rq) == rq_lockp(busiest))
+		return 0;
+
+	if (likely(raw_spin_trylock(rq_lockp(busiest))))
+		return 0;
+
+	if (rq_lockp(busiest) >= rq_lockp(this_rq)) {
+		raw_spin_lock_nested(rq_lockp(busiest), SINGLE_DEPTH_NESTING);
+		return 0;
 	}
-	return ret;
+
+	raw_spin_unlock(rq_lockp(this_rq));
+	raw_spin_lock(rq_lockp(busiest));
+	raw_spin_lock_nested(rq_lockp(this_rq), SINGLE_DEPTH_NESTING);
+
+	return 1;
 }
 
 #endif /* CONFIG_PREEMPTION */
@@ -2126,11 +2132,7 @@ static inline int _double_lock_balance(struct rq *this_rq, struct rq *busiest)
  */
 static inline int double_lock_balance(struct rq *this_rq, struct rq *busiest)
 {
-	if (unlikely(!irqs_disabled())) {
-		/* printk() doesn't work well under rq->lock */
-		raw_spin_unlock(&this_rq->lock);
-		BUG_ON(1);
-	}
+	lockdep_assert_irqs_disabled();
 
 	return _double_lock_balance(this_rq, busiest);
 }
@@ -2138,8 +2140,9 @@ static inline int double_lock_balance(struct rq *this_rq, struct rq *busiest)
 static inline void double_unlock_balance(struct rq *this_rq, struct rq *busiest)
 	__releases(busiest->lock)
 {
-	raw_spin_unlock(&busiest->lock);
-	lock_set_subclass(&this_rq->lock.dep_map, 0, _RET_IP_);
+	if (rq_lockp(this_rq) != rq_lockp(busiest))
+		raw_spin_unlock(rq_lockp(busiest));
+	lock_set_subclass(&rq_lockp(this_rq)->dep_map, 0, _RET_IP_);
 }
 
 static inline void double_lock(spinlock_t *l1, spinlock_t *l2)
@@ -2180,16 +2183,16 @@ static inline void double_rq_lock(struct rq *rq1, struct rq *rq2)
 	__acquires(rq2->lock)
 {
 	BUG_ON(!irqs_disabled());
-	if (rq1 == rq2) {
-		raw_spin_lock(&rq1->lock);
+	if (rq_lockp(rq1) == rq_lockp(rq2)) {
+		raw_spin_lock(rq_lockp(rq1));
 		__acquire(rq2->lock);	/* Fake it out ;) */
 	} else {
-		if (rq1 < rq2) {
-			raw_spin_lock(&rq1->lock);
-			raw_spin_lock_nested(&rq2->lock, SINGLE_DEPTH_NESTING);
+		if (rq_lockp(rq1) < rq_lockp(rq2)) {
+			raw_spin_lock(rq_lockp(rq1));
+			raw_spin_lock_nested(rq_lockp(rq2), SINGLE_DEPTH_NESTING);
 		} else {
-			raw_spin_lock(&rq2->lock);
-			raw_spin_lock_nested(&rq1->lock, SINGLE_DEPTH_NESTING);
+			raw_spin_lock(rq_lockp(rq2));
+			raw_spin_lock_nested(rq_lockp(rq1), SINGLE_DEPTH_NESTING);
 		}
 	}
 }
@@ -2204,9 +2207,9 @@ static inline void double_rq_unlock(struct rq *rq1, struct rq *rq2)
 	__releases(rq1->lock)
 	__releases(rq2->lock)
 {
-	raw_spin_unlock(&rq1->lock);
-	if (rq1 != rq2)
-		raw_spin_unlock(&rq2->lock);
+	raw_spin_unlock(rq_lockp(rq1));
+	if (rq_lockp(rq1) != rq_lockp(rq2))
+		raw_spin_unlock(rq_lockp(rq2));
 	else
 		__release(rq2->lock);
 }
@@ -2229,7 +2232,7 @@ static inline void double_rq_lock(struct rq *rq1, struct rq *rq2)
 {
 	BUG_ON(!irqs_disabled());
 	BUG_ON(rq1 != rq2);
-	raw_spin_lock(&rq1->lock);
+	raw_spin_lock(rq_lockp(rq1));
 	__acquire(rq2->lock);	/* Fake it out ;) */
 }
 
@@ -2244,7 +2247,7 @@ static inline void double_rq_unlock(struct rq *rq1, struct rq *rq2)
 	__releases(rq2->lock)
 {
 	BUG_ON(rq1 != rq2);
-	raw_spin_unlock(&rq1->lock);
+	raw_spin_unlock(rq_lockp(rq1));
 	__release(rq2->lock);
 }
 
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 007b0a6b0152..d6d727420952 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -440,7 +440,7 @@ void rq_attach_root(struct rq *rq, struct root_domain *rd)
 	struct root_domain *old_rd = NULL;
 	unsigned long flags;
 
-	raw_spin_lock_irqsave(&rq->lock, flags);
+	raw_spin_lock_irqsave(rq_lockp(rq), flags);
 
 	if (rq->rd) {
 		old_rd = rq->rd;
@@ -466,7 +466,7 @@ void rq_attach_root(struct rq *rq, struct root_domain *rd)
 	if (cpumask_test_cpu(rq->cpu, cpu_active_mask))
 		set_rq_online(rq);
 
-	raw_spin_unlock_irqrestore(&rq->lock, flags);
+	raw_spin_unlock_irqrestore(rq_lockp(rq), flags);
 
 	if (old_rd)
 		call_rcu(&old_rd->rcu, free_rootdomain);
-- 
2.17.1



* [RFC PATCH v7 02/23] sched: Introduce sched_class::pick_task()
From: Julien Desfossez @ 2020-08-28 19:51 UTC (permalink / raw)
  To: Peter Zijlstra, Vineeth Pillai, Joel Fernandes, Tim Chen,
	Aaron Lu, Aubrey Li, Dhaval Giani, Chris Hyser,
	Nishanth Aravamudan
  Cc: mingo, tglx, pjt, torvalds, linux-kernel, fweisbec, keescook,
	kerrnel, Phil Auld, Valentin Schneider, Mel Gorman, Pawan Gupta,
	Paolo Bonzini, joel, vineeth, Chen Yu, Christian Brauner,
	Agata Gruza, Antonio Gomez Iglesias, graf, konrad.wilk,
	dfaggioli, rostedt, derkling, benbjiang, Vineeth Remanan Pillai,
	Julien Desfossez

From: Peter Zijlstra <peterz@infradead.org>

Because sched_class::pick_next_task() also implies
sched_class::set_next_task() (and possibly put_prev_task() and
newidle_balance()), it is not state-invariant. This makes it unsuitable
for remote task selection.
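
The resulting split is the same for every class; taking the rt class
from the diff below as the example, pick_task_rt() computes the choice
without side effects (so a remote CPU can evaluate it) and
pick_next_task_rt() commits it locally:

	static struct task_struct *pick_next_task_rt(struct rq *rq)
	{
		struct task_struct *p = pick_task_rt(rq);	/* state-invariant */

		if (p)
			set_next_task_rt(rq, p, true);	/* commits state, local only */

		return p;
	}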

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineeth Remanan Pillai <vpillai@digitalocean.com>
Signed-off-by: Julien Desfossez <jdesfossez@digitalocean.com>
---
 kernel/sched/deadline.c  | 16 ++++++++++++++--
 kernel/sched/fair.c      | 36 +++++++++++++++++++++++++++++++++---
 kernel/sched/idle.c      |  8 ++++++++
 kernel/sched/rt.c        | 14 ++++++++++++--
 kernel/sched/sched.h     |  3 +++
 kernel/sched/stop_task.c | 13 +++++++++++--
 6 files changed, 81 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index c588a06057dc..6389aff32558 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1824,7 +1824,7 @@ static struct sched_dl_entity *pick_next_dl_entity(struct rq *rq,
 	return rb_entry(left, struct sched_dl_entity, rb_node);
 }
 
-static struct task_struct *pick_next_task_dl(struct rq *rq)
+static struct task_struct *pick_task_dl(struct rq *rq)
 {
 	struct sched_dl_entity *dl_se;
 	struct dl_rq *dl_rq = &rq->dl;
@@ -1836,7 +1836,18 @@ static struct task_struct *pick_next_task_dl(struct rq *rq)
 	dl_se = pick_next_dl_entity(rq, dl_rq);
 	BUG_ON(!dl_se);
 	p = dl_task_of(dl_se);
-	set_next_task_dl(rq, p, true);
+
+	return p;
+}
+
+static struct task_struct *pick_next_task_dl(struct rq *rq)
+{
+	struct task_struct *p;
+
+	p = pick_task_dl(rq);
+	if (p)
+		set_next_task_dl(rq, p, true);
+
 	return p;
 }
 
@@ -2493,6 +2504,7 @@ const struct sched_class dl_sched_class
 
 #ifdef CONFIG_SMP
 	.balance		= balance_dl,
+	.pick_task		= pick_task_dl,
 	.select_task_rq		= select_task_rq_dl,
 	.migrate_task_rq	= migrate_task_rq_dl,
 	.set_cpus_allowed       = set_cpus_allowed_dl,
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9be6cf92cc3d..1a1bf726264a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4446,7 +4446,7 @@ pick_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *curr)
 	 * Avoid running the skip buddy, if running something else can
 	 * be done without getting too unfair.
 	 */
-	if (cfs_rq->skip == se) {
+	if (cfs_rq->skip && cfs_rq->skip == se) {
 		struct sched_entity *second;
 
 		if (se == curr) {
@@ -4464,13 +4464,13 @@ pick_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *curr)
 	/*
 	 * Prefer last buddy, try to return the CPU to a preempted task.
 	 */
-	if (cfs_rq->last && wakeup_preempt_entity(cfs_rq->last, left) < 1)
+	if (left && cfs_rq->last && wakeup_preempt_entity(cfs_rq->last, left) < 1)
 		se = cfs_rq->last;
 
 	/*
 	 * Someone really wants this to run. If it's not unfair, run it.
 	 */
-	if (cfs_rq->next && wakeup_preempt_entity(cfs_rq->next, left) < 1)
+	if (left && cfs_rq->next && wakeup_preempt_entity(cfs_rq->next, left) < 1)
 		se = cfs_rq->next;
 
 	clear_buddies(cfs_rq, se);
@@ -6970,6 +6970,35 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_
 		set_last_buddy(se);
 }
 
+#ifdef CONFIG_SMP
+static struct task_struct *pick_task_fair(struct rq *rq)
+{
+	struct cfs_rq *cfs_rq = &rq->cfs;
+	struct sched_entity *se;
+
+	if (!cfs_rq->nr_running)
+		return NULL;
+
+	do {
+		struct sched_entity *curr = cfs_rq->curr;
+
+		se = pick_next_entity(cfs_rq, NULL);
+
+		if (curr) {
+			if (se && curr->on_rq)
+				update_curr(cfs_rq);
+
+			if (!se || entity_before(curr, se))
+				se = curr;
+		}
+
+		cfs_rq = group_cfs_rq(se);
+	} while (cfs_rq);
+
+	return task_of(se);
+}
+#endif
+
 struct task_struct *
 pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 {
@@ -11152,6 +11181,7 @@ const struct sched_class fair_sched_class
 
 #ifdef CONFIG_SMP
 	.balance		= balance_fair,
+	.pick_task		= pick_task_fair,
 	.select_task_rq		= select_task_rq_fair,
 	.migrate_task_rq	= migrate_task_rq_fair,
 
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 5c33beb1a525..8c41b023dd14 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -408,6 +408,13 @@ static void set_next_task_idle(struct rq *rq, struct task_struct *next, bool fir
 	schedstat_inc(rq->sched_goidle);
 }
 
+#ifdef CONFIG_SMP
+static struct task_struct *pick_task_idle(struct rq *rq)
+{
+	return rq->idle;
+}
+#endif
+
 struct task_struct *pick_next_task_idle(struct rq *rq)
 {
 	struct task_struct *next = rq->idle;
@@ -475,6 +482,7 @@ const struct sched_class idle_sched_class
 
 #ifdef CONFIG_SMP
 	.balance		= balance_idle,
+	.pick_task		= pick_task_idle,
 	.select_task_rq		= select_task_rq_idle,
 	.set_cpus_allowed	= set_cpus_allowed_common,
 #endif
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index e57fca05b660..a5851c775270 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1624,7 +1624,7 @@ static struct task_struct *_pick_next_task_rt(struct rq *rq)
 	return rt_task_of(rt_se);
 }
 
-static struct task_struct *pick_next_task_rt(struct rq *rq)
+static struct task_struct *pick_task_rt(struct rq *rq)
 {
 	struct task_struct *p;
 
@@ -1632,7 +1632,16 @@ static struct task_struct *pick_next_task_rt(struct rq *rq)
 		return NULL;
 
 	p = _pick_next_task_rt(rq);
-	set_next_task_rt(rq, p, true);
+
+	return p;
+}
+
+static struct task_struct *pick_next_task_rt(struct rq *rq)
+{
+	struct task_struct *p = pick_task_rt(rq);
+	if (p)
+		set_next_task_rt(rq, p, true);
+
 	return p;
 }
 
@@ -2443,6 +2452,7 @@ const struct sched_class rt_sched_class
 
 #ifdef CONFIG_SMP
 	.balance		= balance_rt,
+	.pick_task		= pick_task_rt,
 	.select_task_rq		= select_task_rq_rt,
 	.set_cpus_allowed       = set_cpus_allowed_common,
 	.rq_online              = rq_online_rt,
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 9145be83d80b..293d031480d8 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1793,6 +1793,9 @@ struct sched_class {
 
 #ifdef CONFIG_SMP
 	int (*balance)(struct rq *rq, struct task_struct *prev, struct rq_flags *rf);
+
+	struct task_struct * (*pick_task)(struct rq *rq);
+
 	int  (*select_task_rq)(struct task_struct *p, int task_cpu, int sd_flag, int flags);
 	void (*migrate_task_rq)(struct task_struct *p, int new_cpu);
 
diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c
index 394bc8126a1e..8f92915dd95e 100644
--- a/kernel/sched/stop_task.c
+++ b/kernel/sched/stop_task.c
@@ -34,15 +34,23 @@ static void set_next_task_stop(struct rq *rq, struct task_struct *stop, bool fir
 	stop->se.exec_start = rq_clock_task(rq);
 }
 
-static struct task_struct *pick_next_task_stop(struct rq *rq)
+static struct task_struct *pick_task_stop(struct rq *rq)
 {
 	if (!sched_stop_runnable(rq))
 		return NULL;
 
-	set_next_task_stop(rq, rq->stop, true);
 	return rq->stop;
 }
 
+static struct task_struct *pick_next_task_stop(struct rq *rq)
+{
+	struct task_struct *p = pick_task_stop(rq);
+	if (p)
+		set_next_task_stop(rq, p, true);
+
+	return p;
+}
+
 static void
 enqueue_task_stop(struct rq *rq, struct task_struct *p, int flags)
 {
@@ -124,6 +132,7 @@ const struct sched_class stop_sched_class
 
 #ifdef CONFIG_SMP
 	.balance		= balance_stop,
+	.pick_task		= pick_task_stop,
 	.select_task_rq		= select_task_rq_stop,
 	.set_cpus_allowed	= set_cpus_allowed_common,
 #endif
-- 
2.17.1



* [RFC PATCH v7 03/23] sched: Core-wide rq->lock
From: Julien Desfossez @ 2020-08-28 19:51 UTC (permalink / raw)
  To: Peter Zijlstra, Vineeth Pillai, Joel Fernandes, Tim Chen,
	Aaron Lu, Aubrey Li, Dhaval Giani, Chris Hyser,
	Nishanth Aravamudan
  Cc: mingo, tglx, pjt, torvalds, linux-kernel, fweisbec, keescook,
	kerrnel, Phil Auld, Valentin Schneider, Mel Gorman, Pawan Gupta,
	Paolo Bonzini, joel, vineeth, Chen Yu, Christian Brauner,
	Agata Gruza, Antonio Gomez Iglesias, graf, konrad.wilk,
	dfaggioli, rostedt, derkling, benbjiang, Julien Desfossez,
	Vineeth Remanan Pillai

From: Peter Zijlstra <peterz@infradead.org>

Introduce the basic infrastructure for a core-wide rq->lock.
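
Enabling and disabling are reference-counted; a usage sketch (the
caller below is hypothetical -- the real users arrive with the cgroup
tagging patch later in the series):

	/* Hypothetical user of the core-scheduling infrastructure. */
	static void my_feature_enable(void)
	{
		sched_core_get();	/* first user: static key + stop_machine */
	}

	static void my_feature_disable(void)
	{
		sched_core_put();	/* last user: revert to per-rq locking */
	}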

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Julien Desfossez <jdesfossez@digitalocean.com>
Signed-off-by: Vineeth Remanan Pillai <vpillai@digitalocean.com>
---
 kernel/Kconfig.preempt |  6 +++
 kernel/sched/core.c    | 95 ++++++++++++++++++++++++++++++++++++++++++
 kernel/sched/sched.h   | 31 ++++++++++++++
 3 files changed, 132 insertions(+)

diff --git a/kernel/Kconfig.preempt b/kernel/Kconfig.preempt
index bf82259cff96..4488fbf4d3a8 100644
--- a/kernel/Kconfig.preempt
+++ b/kernel/Kconfig.preempt
@@ -80,3 +80,9 @@ config PREEMPT_COUNT
 config PREEMPTION
        bool
        select PREEMPT_COUNT
+
+config SCHED_CORE
+	bool "Core Scheduling for SMT"
+	default y
+	depends on SCHED_SMT
+
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b85d5e56d5fe..e2642c5dbd61 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -73,6 +73,70 @@ unsigned int sysctl_sched_rt_period = 1000000;
 
 __read_mostly int scheduler_running;
 
+#ifdef CONFIG_SCHED_CORE
+
+DEFINE_STATIC_KEY_FALSE(__sched_core_enabled);
+
+/*
+ * The static-key + stop-machine variable are needed such that:
+ *
+ *	spin_lock(rq_lockp(rq));
+ *	...
+ *	spin_unlock(rq_lockp(rq));
+ *
+ * ends up locking and unlocking the _same_ lock, and all CPUs
+ * always agree on what rq has what lock.
+ *
+ * XXX entirely possible to selectively enable cores, don't bother for now.
+ */
+static int __sched_core_stopper(void *data)
+{
+	bool enabled = !!(unsigned long)data;
+	int cpu;
+
+	for_each_possible_cpu(cpu)
+		cpu_rq(cpu)->core_enabled = enabled;
+
+	return 0;
+}
+
+static DEFINE_MUTEX(sched_core_mutex);
+static int sched_core_count;
+
+static void __sched_core_enable(void)
+{
+	// XXX verify there are no cookie tasks (yet)
+
+	static_branch_enable(&__sched_core_enabled);
+	stop_machine(__sched_core_stopper, (void *)true, NULL);
+}
+
+static void __sched_core_disable(void)
+{
+	// XXX verify there are no cookie tasks (left)
+
+	stop_machine(__sched_core_stopper, (void *)false, NULL);
+	static_branch_disable(&__sched_core_enabled);
+}
+
+void sched_core_get(void)
+{
+	mutex_lock(&sched_core_mutex);
+	if (!sched_core_count++)
+		__sched_core_enable();
+	mutex_unlock(&sched_core_mutex);
+}
+
+void sched_core_put(void)
+{
+	mutex_lock(&sched_core_mutex);
+	if (!--sched_core_count)
+		__sched_core_disable();
+	mutex_unlock(&sched_core_mutex);
+}
+
+#endif /* CONFIG_SCHED_CORE */
+
 /*
  * part of the period that we allow rt tasks to run in us.
  * default: 0.95s
@@ -6964,6 +7028,32 @@ static void sched_rq_cpu_starting(unsigned int cpu)
 
 int sched_cpu_starting(unsigned int cpu)
 {
+#ifdef CONFIG_SCHED_CORE
+	const struct cpumask *smt_mask = cpu_smt_mask(cpu);
+	struct rq *rq, *core_rq = NULL;
+	int i;
+
+	core_rq = cpu_rq(cpu)->core;
+
+	if (!core_rq) {
+		for_each_cpu(i, smt_mask) {
+			rq = cpu_rq(i);
+			if (rq->core && rq->core == rq)
+				core_rq = rq;
+		}
+
+		if (!core_rq)
+			core_rq = cpu_rq(cpu);
+
+		for_each_cpu(i, smt_mask) {
+			rq = cpu_rq(i);
+
+			WARN_ON_ONCE(rq->core && rq->core != core_rq);
+			rq->core = core_rq;
+		}
+	}
+#endif /* CONFIG_SCHED_CORE */
+
 	sched_rq_cpu_starting(cpu);
 	sched_tick_start(cpu);
 	return 0;
@@ -7194,6 +7284,11 @@ void __init sched_init(void)
 #endif /* CONFIG_SMP */
 		hrtick_rq_init(rq);
 		atomic_set(&rq->nr_iowait, 0);
+
+#ifdef CONFIG_SCHED_CORE
+		rq->core = NULL;
+		rq->core_enabled = 0;
+#endif
 	}
 
 	set_load_weight(&init_task, false);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 293d031480d8..6ab8adff169b 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1048,6 +1048,12 @@ struct rq {
 	/* Must be inspected within a rcu lock section */
 	struct cpuidle_state	*idle_state;
 #endif
+
+#ifdef CONFIG_SCHED_CORE
+	/* per rq */
+	struct rq		*core;
+	unsigned int		core_enabled;
+#endif
 };
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
@@ -1075,11 +1081,36 @@ static inline int cpu_of(struct rq *rq)
 #endif
 }
 
+#ifdef CONFIG_SCHED_CORE
+DECLARE_STATIC_KEY_FALSE(__sched_core_enabled);
+
+static inline bool sched_core_enabled(struct rq *rq)
+{
+	return static_branch_unlikely(&__sched_core_enabled) && rq->core_enabled;
+}
+
+static inline raw_spinlock_t *rq_lockp(struct rq *rq)
+{
+	if (sched_core_enabled(rq))
+		return &rq->core->__lock;
+
+	return &rq->__lock;
+}
+
+#else /* !CONFIG_SCHED_CORE */
+
+static inline bool sched_core_enabled(struct rq *rq)
+{
+	return false;
+}
+
 static inline raw_spinlock_t *rq_lockp(struct rq *rq)
 {
 	return &rq->__lock;
 }
 
+#endif /* CONFIG_SCHED_CORE */
+
 #ifdef CONFIG_SCHED_SMT
 extern void __update_idle_core(struct rq *rq);
 
-- 
2.17.1



* [RFC PATCH v7 04/23] sched/fair: Add a few assertions
From: Julien Desfossez @ 2020-08-28 19:51 UTC (permalink / raw)
  To: Peter Zijlstra, Vineeth Pillai, Joel Fernandes, Tim Chen,
	Aaron Lu, Aubrey Li, Dhaval Giani, Chris Hyser,
	Nishanth Aravamudan
  Cc: mingo, tglx, pjt, torvalds, linux-kernel, fweisbec, keescook,
	kerrnel, Phil Auld, Valentin Schneider, Mel Gorman, Pawan Gupta,
	Paolo Bonzini, joel, vineeth, Chen Yu, Christian Brauner,
	Agata Gruza, Antonio Gomez Iglesias, graf, konrad.wilk,
	dfaggioli, rostedt, derkling, benbjiang

From: Peter Zijlstra <peterz@infradead.org>

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sched/fair.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1a1bf726264a..af8c40191a19 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6223,6 +6223,11 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	}
 
 symmetric:
+	/*
+	 * per-cpu select_idle_mask usage
+	 */
+	lockdep_assert_irqs_disabled();
+
 	if (available_idle_cpu(target) || sched_idle_cpu(target))
 		return target;
 
@@ -6664,8 +6669,6 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
  * certain conditions an idle sibling CPU if the domain has SD_WAKE_AFFINE set.
  *
  * Returns the target CPU number.
- *
- * preempt must be disabled.
  */
 static int
 select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_flags)
@@ -6676,6 +6679,11 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
 	int want_affine = 0;
 	int sync = (wake_flags & WF_SYNC) && !(current->flags & PF_EXITING);
 
+	/*
+	 * required for stable ->cpus_allowed
+	 */
+	lockdep_assert_held(&p->pi_lock);
+
 	if (sd_flag & SD_BALANCE_WAKE) {
 		record_wakee(p);
 
-- 
2.17.1



* [RFC PATCH v7 05/23] sched: Basic tracking of matching tasks
  2020-08-28 19:51 [RFC PATCH v7 00/23] Core scheduling v7 Julien Desfossez
                   ` (3 preceding siblings ...)
  2020-08-28 19:51 ` [RFC PATCH v7 04/23] sched/fair: Add a few assertions Julien Desfossez
@ 2020-08-28 19:51 ` Julien Desfossez
  2020-08-28 19:51 ` [RFC PATCH v7 06/23] bitops: Introduce find_next_or_bit Julien Desfossez
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 68+ messages in thread
From: Julien Desfossez @ 2020-08-28 19:51 UTC (permalink / raw)
  To: Peter Zijlstra, Vineeth Pillai, Joel Fernandes, Tim Chen,
	Aaron Lu, Aubrey Li, Dhaval Giani, Chris Hyser,
	Nishanth Aravamudan
  Cc: mingo, tglx, pjt, torvalds, linux-kernel, fweisbec, keescook,
	kerrnel, Phil Auld, Valentin Schneider, Mel Gorman, Pawan Gupta,
	Paolo Bonzini, joel, vineeth, Chen Yu, Christian Brauner,
	Agata Gruza, Antonio Gomez Iglesias, graf, konrad.wilk,
	dfaggioli, rostedt, derkling, benbjiang, Vineeth Remanan Pillai,
	Julien Desfossez

From: Peter Zijlstra <peterz@infradead.org>

Introduce task_struct::core_cookie as an opaque identifier for core
scheduling. When enabled, core scheduling will only allow matching
tasks to be on the core; the idle task matches everything.

When task_struct::core_cookie is set (and core scheduling is enabled),
these tasks are indexed in a second RB-tree, first on cookie value and
then on scheduling function, such that matching task selection always
finds the most eligible match.

NOTE: *shudder* at the overhead...

NOTE: *sigh*, a 3rd copy of the scheduling function; the alternative
is per-class tracking of cookies, and that just duplicates a lot of
stuff for no reason (the 2nd copy lives in the rt-mutex PI code).
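
As an aside, a minimal standalone sketch of that two-level ordering
(hypothetical demo_task type, not the kernel code): the primary sort
key is the cookie, and the secondary key flips priority so the
highest-priority task within a cookie group sorts leftmost.

	struct demo_task {
		unsigned long	cookie;
		int		prio;	/* kernel prio: lower value = higher prio */
	};

	static bool demo_core_less(struct demo_task *a, struct demo_task *b)
	{
		if (a->cookie != b->cookie)
			return a->cookie < b->cookie;	/* group by cookie first */

		return a->prio < b->prio;	/* then high prio leftmost */
	}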

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineeth Remanan Pillai <vpillai@digitalocean.com>
Signed-off-by: Julien Desfossez <jdesfossez@digitalocean.com>
---
 include/linux/sched.h |   8 ++-
 kernel/sched/core.c   | 146 ++++++++++++++++++++++++++++++++++++++++++
 kernel/sched/fair.c   |  46 -------------
 kernel/sched/sched.h  |  55 ++++++++++++++++
 4 files changed, 208 insertions(+), 47 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 93ecd930efd3..5fe9878502cb 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -682,10 +682,16 @@ struct task_struct {
 	const struct sched_class	*sched_class;
 	struct sched_entity		se;
 	struct sched_rt_entity		rt;
+	struct sched_dl_entity		dl;
+
+#ifdef CONFIG_SCHED_CORE
+	struct rb_node			core_node;
+	unsigned long			core_cookie;
+#endif
+
 #ifdef CONFIG_CGROUP_SCHED
 	struct task_group		*sched_task_group;
 #endif
-	struct sched_dl_entity		dl;
 
 #ifdef CONFIG_UCLAMP_TASK
 	/*
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e2642c5dbd61..eea18956a9ef 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -77,6 +77,141 @@ __read_mostly int scheduler_running;
 
 DEFINE_STATIC_KEY_FALSE(__sched_core_enabled);
 
+/* kernel prio, less is more */
+static inline int __task_prio(struct task_struct *p)
+{
+	if (p->sched_class == &stop_sched_class) /* trumps deadline */
+		return -2;
+
+	if (rt_prio(p->prio)) /* includes deadline */
+		return p->prio; /* [-1, 99] */
+
+	if (p->sched_class == &idle_sched_class)
+		return MAX_RT_PRIO + NICE_WIDTH; /* 140 */
+
+	return MAX_RT_PRIO + MAX_NICE; /* 120, squash fair */
+}
+
+/*
+ * l(a,b)
+ * le(a,b) := !l(b,a)
+ * g(a,b)  := l(b,a)
+ * ge(a,b) := !l(a,b)
+ */
+
+/* real prio, less is less */
+static inline bool prio_less(struct task_struct *a, struct task_struct *b)
+{
+
+	int pa = __task_prio(a), pb = __task_prio(b);
+
+	if (-pa < -pb)
+		return true;
+
+	if (-pb < -pa)
+		return false;
+
+	if (pa == -1) /* dl_prio() doesn't work because of stop_class above */
+		return !dl_time_before(a->dl.deadline, b->dl.deadline);
+
+	if (pa == MAX_RT_PRIO + MAX_NICE)  { /* fair */
+		u64 vruntime = b->se.vruntime;
+
+		/*
+		 * Normalize the vruntime if tasks are in different cpus.
+		 */
+		if (task_cpu(a) != task_cpu(b)) {
+			vruntime -= task_cfs_rq(b)->min_vruntime;
+			vruntime += task_cfs_rq(a)->min_vruntime;
+		}
+
+		return !((s64)(a->se.vruntime - vruntime) <= 0);
+	}
+
+	return false;
+}
+
+static inline bool __sched_core_less(struct task_struct *a, struct task_struct *b)
+{
+	if (a->core_cookie < b->core_cookie)
+		return true;
+
+	if (a->core_cookie > b->core_cookie)
+		return false;
+
+	/* flip prio, so high prio is leftmost */
+	if (prio_less(b, a))
+		return true;
+
+	return false;
+}
+
+static void sched_core_enqueue(struct rq *rq, struct task_struct *p)
+{
+	struct rb_node *parent, **node;
+	struct task_struct *node_task;
+
+	rq->core->core_task_seq++;
+
+	if (!p->core_cookie)
+		return;
+
+	node = &rq->core_tree.rb_node;
+	parent = *node;
+
+	while (*node) {
+		node_task = container_of(*node, struct task_struct, core_node);
+		parent = *node;
+
+		if (__sched_core_less(p, node_task))
+			node = &parent->rb_left;
+		else
+			node = &parent->rb_right;
+	}
+
+	rb_link_node(&p->core_node, parent, node);
+	rb_insert_color(&p->core_node, &rq->core_tree);
+}
+
+static void sched_core_dequeue(struct rq *rq, struct task_struct *p)
+{
+	rq->core->core_task_seq++;
+
+	if (!p->core_cookie)
+		return;
+
+	rb_erase(&p->core_node, &rq->core_tree);
+}
+
+/*
+ * Find left-most (aka, highest priority) task matching @cookie.
+ */
+static struct task_struct *sched_core_find(struct rq *rq, unsigned long cookie)
+{
+	struct rb_node *node = rq->core_tree.rb_node;
+	struct task_struct *node_task, *match;
+
+	/*
+	 * The idle task always matches any cookie!
+	 */
+	match = idle_sched_class.pick_task(rq);
+
+	while (node) {
+		node_task = container_of(node, struct task_struct, core_node);
+
+		if (cookie < node_task->core_cookie) {
+			node = node->rb_left;
+		} else if (cookie > node_task->core_cookie) {
+			node = node->rb_right;
+		} else {
+			match = node_task;
+			node = node->rb_left;
+		}
+	}
+
+	return match;
+}
+
 /*
  * The static-key + stop-machine variable are needed such that:
  *
@@ -135,6 +270,11 @@ void sched_core_put(void)
 	mutex_unlock(&sched_core_mutex);
 }
 
+#else /* !CONFIG_SCHED_CORE */
+
+static inline void sched_core_enqueue(struct rq *rq, struct task_struct *p) { }
+static inline void sched_core_dequeue(struct rq *rq, struct task_struct *p) { }
+
 #endif /* CONFIG_SCHED_CORE */
 
 /*
@@ -1628,6 +1768,9 @@ static inline void init_uclamp(void) { }
 
 static inline void enqueue_task(struct rq *rq, struct task_struct *p, int flags)
 {
+	if (sched_core_enabled(rq))
+		sched_core_enqueue(rq, p);
+
 	if (!(flags & ENQUEUE_NOCLOCK))
 		update_rq_clock(rq);
 
@@ -1642,6 +1785,9 @@ static inline void enqueue_task(struct rq *rq, struct task_struct *p, int flags)
 
 static inline void dequeue_task(struct rq *rq, struct task_struct *p, int flags)
 {
+	if (sched_core_enabled(rq))
+		sched_core_dequeue(rq, p);
+
 	if (!(flags & DEQUEUE_NOCLOCK))
 		update_rq_clock(rq);
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index af8c40191a19..285002a2f641 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -258,33 +258,11 @@ const struct sched_class fair_sched_class;
  */
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
-static inline struct task_struct *task_of(struct sched_entity *se)
-{
-	SCHED_WARN_ON(!entity_is_task(se));
-	return container_of(se, struct task_struct, se);
-}
 
 /* Walk up scheduling entities hierarchy */
 #define for_each_sched_entity(se) \
 		for (; se; se = se->parent)
 
-static inline struct cfs_rq *task_cfs_rq(struct task_struct *p)
-{
-	return p->se.cfs_rq;
-}
-
-/* runqueue on which this entity is (to be) queued */
-static inline struct cfs_rq *cfs_rq_of(struct sched_entity *se)
-{
-	return se->cfs_rq;
-}
-
-/* runqueue "owned" by this group */
-static inline struct cfs_rq *group_cfs_rq(struct sched_entity *grp)
-{
-	return grp->my_q;
-}
-
 static inline void cfs_rq_tg_path(struct cfs_rq *cfs_rq, char *path, int len)
 {
 	if (!path)
@@ -445,33 +423,9 @@ find_matching_se(struct sched_entity **se, struct sched_entity **pse)
 
 #else	/* !CONFIG_FAIR_GROUP_SCHED */
 
-static inline struct task_struct *task_of(struct sched_entity *se)
-{
-	return container_of(se, struct task_struct, se);
-}
-
 #define for_each_sched_entity(se) \
 		for (; se; se = NULL)
 
-static inline struct cfs_rq *task_cfs_rq(struct task_struct *p)
-{
-	return &task_rq(p)->cfs;
-}
-
-static inline struct cfs_rq *cfs_rq_of(struct sched_entity *se)
-{
-	struct task_struct *p = task_of(se);
-	struct rq *rq = task_rq(p);
-
-	return &rq->cfs;
-}
-
-/* runqueue "owned" by this group */
-static inline struct cfs_rq *group_cfs_rq(struct sched_entity *grp)
-{
-	return NULL;
-}
-
 static inline void cfs_rq_tg_path(struct cfs_rq *cfs_rq, char *path, int len)
 {
 	if (path)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 6ab8adff169b..92e0b8679eef 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1053,6 +1053,10 @@ struct rq {
 	/* per rq */
 	struct rq		*core;
 	unsigned int		core_enabled;
+	struct rb_root		core_tree;
+
+	/* shared state */
+	unsigned int		core_task_seq;
 #endif
 };
 
@@ -1132,6 +1136,57 @@ DECLARE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
 #define cpu_curr(cpu)		(cpu_rq(cpu)->curr)
 #define raw_rq()		raw_cpu_ptr(&runqueues)
 
+#ifdef CONFIG_FAIR_GROUP_SCHED
+static inline struct task_struct *task_of(struct sched_entity *se)
+{
+	SCHED_WARN_ON(!entity_is_task(se));
+	return container_of(se, struct task_struct, se);
+}
+
+static inline struct cfs_rq *task_cfs_rq(struct task_struct *p)
+{
+	return p->se.cfs_rq;
+}
+
+/* runqueue on which this entity is (to be) queued */
+static inline struct cfs_rq *cfs_rq_of(struct sched_entity *se)
+{
+	return se->cfs_rq;
+}
+
+/* runqueue "owned" by this group */
+static inline struct cfs_rq *group_cfs_rq(struct sched_entity *grp)
+{
+	return grp->my_q;
+}
+
+#else
+
+static inline struct task_struct *task_of(struct sched_entity *se)
+{
+	return container_of(se, struct task_struct, se);
+}
+
+static inline struct cfs_rq *task_cfs_rq(struct task_struct *p)
+{
+	return &task_rq(p)->cfs;
+}
+
+static inline struct cfs_rq *cfs_rq_of(struct sched_entity *se)
+{
+	struct task_struct *p = task_of(se);
+	struct rq *rq = task_rq(p);
+
+	return &rq->cfs;
+}
+
+/* runqueue "owned" by this group */
+static inline struct cfs_rq *group_cfs_rq(struct sched_entity *grp)
+{
+	return NULL;
+}
+#endif
+
 extern void update_rq_clock(struct rq *rq);
 
 static inline u64 __rq_clock_broken(struct rq *rq)
-- 
2.17.1



* [RFC PATCH v7 06/23] bitops: Introduce find_next_or_bit
  2020-08-28 19:51 [RFC PATCH v7 00/23] Core scheduling v7 Julien Desfossez
                   ` (4 preceding siblings ...)
  2020-08-28 19:51 ` [RFC PATCH v7 05/23] sched: Basic tracking of matching tasks Julien Desfossez
@ 2020-08-28 19:51 ` Julien Desfossez
  2020-09-03  5:13   ` Randy Dunlap
  2020-08-28 19:51 ` [RFC PATCH v7 07/23] cpumask: Introduce a new iterator for_each_cpu_wrap_or Julien Desfossez
                   ` (16 subsequent siblings)
  22 siblings, 1 reply; 68+ messages in thread
From: Julien Desfossez @ 2020-08-28 19:51 UTC (permalink / raw)
  To: Peter Zijlstra, Vineeth Pillai, Joel Fernandes, Tim Chen,
	Aaron Lu, Aubrey Li, Dhaval Giani, Chris Hyser,
	Nishanth Aravamudan
  Cc: mingo, tglx, pjt, torvalds, linux-kernel, fweisbec, keescook,
	kerrnel, Phil Auld, Valentin Schneider, Mel Gorman, Pawan Gupta,
	Paolo Bonzini, joel, vineeth, Chen Yu, Christian Brauner,
	Agata Gruza, Antonio Gomez Iglesias, graf, konrad.wilk,
	dfaggioli, rostedt, derkling, benbjiang

From: Vineeth Pillai <viremana@linux.microsoft.com>

Hotplug fixes to core-scheduling require a new bitops API.

Introduce a new API, find_next_or_bit(), which returns the bit
number of the next bit set in the OR of the two given bitmasks.
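
A usage sketch with hypothetical bitmaps, using the signature introduced
below:

	DECLARE_BITMAP(a, 64);
	DECLARE_BITMAP(b, 64);
	unsigned long bit;

	/* visit every bit set in (a | b) without a scratch bitmap */
	for (bit = find_next_or_bit(a, b, 64, 0);
	     bit < 64;
	     bit = find_next_or_bit(a, b, 64, bit + 1))
		pr_info("bit %lu is set in a|b\n", bit);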

Signed-off-by: Vineeth Pillai <viremana@linux.microsoft.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 include/asm-generic/bitops/find.h | 16 +++++++++
 lib/find_bit.c                    | 58 +++++++++++++++++++++++++------
 2 files changed, 63 insertions(+), 11 deletions(-)

diff --git a/include/asm-generic/bitops/find.h b/include/asm-generic/bitops/find.h
index 9fdf21302fdf..0b476ca0d665 100644
--- a/include/asm-generic/bitops/find.h
+++ b/include/asm-generic/bitops/find.h
@@ -32,6 +32,22 @@ extern unsigned long find_next_and_bit(const unsigned long *addr1,
 		unsigned long offset);
 #endif
 
+#ifndef find_next_or_bit
+/**
+ * find_next_or_bit - find the next set bit in either memory region
+ * @addr1: The first address to base the search on
+ * @addr2: The second address to base the search on
+ * @offset: The bitnumber to start searching at
+ * @size: The bitmap size in bits
+ *
+ * Returns the bit number of the next set bit.
+ * If no bits are set, returns @size.
+ */
+extern unsigned long find_next_or_bit(const unsigned long *addr1,
+		const unsigned long *addr2, unsigned long size,
+		unsigned long offset);
+#endif
+
 #ifndef find_next_zero_bit
 /**
  * find_next_zero_bit - find the next cleared bit in a memory region
diff --git a/lib/find_bit.c b/lib/find_bit.c
index 49f875f1baf7..e36257bb0701 100644
--- a/lib/find_bit.c
+++ b/lib/find_bit.c
@@ -19,7 +19,16 @@
 
 #if !defined(find_next_bit) || !defined(find_next_zero_bit) ||			\
 	!defined(find_next_bit_le) || !defined(find_next_zero_bit_le) ||	\
-	!defined(find_next_and_bit)
+	!defined(find_next_and_bit) || !defined(find_next_or_bit)
+
+/*
+ * find_next_bit bitwise operation types
+ */
+enum fnb_bwops_type {
+	FNB_AND = 0,
+	FNB_OR  = 1
+};
+
 /*
  * This is a common helper function for find_next_bit, find_next_zero_bit, and
  * find_next_and_bit. The differences are:
@@ -29,7 +38,8 @@
  */
 static unsigned long _find_next_bit(const unsigned long *addr1,
 		const unsigned long *addr2, unsigned long nbits,
-		unsigned long start, unsigned long invert, unsigned long le)
+		unsigned long start, unsigned long invert,
+		enum fnb_bwops_type type, unsigned long le)
 {
 	unsigned long tmp, mask;
 
@@ -37,8 +47,16 @@ static unsigned long _find_next_bit(const unsigned long *addr1,
 		return nbits;
 
 	tmp = addr1[start / BITS_PER_LONG];
-	if (addr2)
-		tmp &= addr2[start / BITS_PER_LONG];
+	if (addr2) {
+		switch (type) {
+		case FNB_AND:
+			tmp &= addr2[start / BITS_PER_LONG];
+			break;
+		case FNB_OR:
+			tmp |= addr2[start / BITS_PER_LONG];
+			break;
+		}
+	}
 	tmp ^= invert;
 
 	/* Handle 1st word. */
@@ -56,8 +74,16 @@ static unsigned long _find_next_bit(const unsigned long *addr1,
 			return nbits;
 
 		tmp = addr1[start / BITS_PER_LONG];
-		if (addr2)
-			tmp &= addr2[start / BITS_PER_LONG];
+		if (addr2) {
+			switch (type) {
+			case FNB_AND:
+				tmp &= addr2[start / BITS_PER_LONG];
+				break;
+			case FNB_OR:
+				tmp |= addr2[start / BITS_PER_LONG];
+				break;
+			}
+		}
 		tmp ^= invert;
 	}
 
@@ -75,7 +101,7 @@ static unsigned long _find_next_bit(const unsigned long *addr1,
 unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
 			    unsigned long offset)
 {
-	return _find_next_bit(addr, NULL, size, offset, 0UL, 0);
+	return _find_next_bit(addr, NULL, size, offset, 0UL, FNB_AND, 0);
 }
 EXPORT_SYMBOL(find_next_bit);
 #endif
@@ -84,7 +110,7 @@ EXPORT_SYMBOL(find_next_bit);
 unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
 				 unsigned long offset)
 {
-	return _find_next_bit(addr, NULL, size, offset, ~0UL, 0);
+	return _find_next_bit(addr, NULL, size, offset, ~0UL, FNB_AND, 0);
 }
 EXPORT_SYMBOL(find_next_zero_bit);
 #endif
@@ -94,11 +120,21 @@ unsigned long find_next_and_bit(const unsigned long *addr1,
 		const unsigned long *addr2, unsigned long size,
 		unsigned long offset)
 {
-	return _find_next_bit(addr1, addr2, size, offset, 0UL, 0);
+	return _find_next_bit(addr1, addr2, size, offset, 0UL, FNB_AND, 0);
 }
 EXPORT_SYMBOL(find_next_and_bit);
 #endif
 
+#if !defined(find_next_or_bit)
+unsigned long find_next_or_bit(const unsigned long *addr1,
+		const unsigned long *addr2, unsigned long size,
+		unsigned long offset)
+{
+	return _find_next_bit(addr1, addr2, size, offset, 0UL, FNB_OR, 0);
+}
+EXPORT_SYMBOL(find_next_or_bit);
+#endif
+
 #ifndef find_first_bit
 /*
  * Find the first set bit in a memory region.
@@ -161,7 +197,7 @@ EXPORT_SYMBOL(find_last_bit);
 unsigned long find_next_zero_bit_le(const void *addr, unsigned
 		long size, unsigned long offset)
 {
-	return _find_next_bit(addr, NULL, size, offset, ~0UL, 1);
+	return _find_next_bit(addr, NULL, size, offset, ~0UL, FNB_AND, 1);
 }
 EXPORT_SYMBOL(find_next_zero_bit_le);
 #endif
@@ -170,7 +206,7 @@ EXPORT_SYMBOL(find_next_zero_bit_le);
 unsigned long find_next_bit_le(const void *addr, unsigned
 		long size, unsigned long offset)
 {
-	return _find_next_bit(addr, NULL, size, offset, 0UL, 1);
+	return _find_next_bit(addr, NULL, size, offset, 0UL, FNB_AND, 1);
 }
 EXPORT_SYMBOL(find_next_bit_le);
 #endif
-- 
2.17.1



* [RFC PATCH v7 07/23] cpumask: Introduce a new iterator for_each_cpu_wrap_or
  2020-08-28 19:51 [RFC PATCH v7 00/23] Core scheduling v7 Julien Desfossez
                   ` (5 preceding siblings ...)
  2020-08-28 19:51 ` [RFC PATCH v7 06/23] bitops: Introduce find_next_or_bit Julien Desfossez
@ 2020-08-28 19:51 ` Julien Desfossez
  2020-08-28 19:51 ` [RFC PATCH v7 08/23] sched: Add core wide task selection and scheduling Julien Desfossez
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 68+ messages in thread
From: Julien Desfossez @ 2020-08-28 19:51 UTC (permalink / raw)
  To: Peter Zijlstra, Vineeth Pillai, Joel Fernandes, Tim Chen,
	Aaron Lu, Aubrey Li, Dhaval Giani, Chris Hyser,
	Nishanth Aravamudan
  Cc: mingo, tglx, pjt, torvalds, linux-kernel, fweisbec, keescook,
	kerrnel, Phil Auld, Valentin Schneider, Mel Gorman, Pawan Gupta,
	Paolo Bonzini, joel, vineeth, Chen Yu, Christian Brauner,
	Agata Gruza, Antonio Gomez Iglesias, graf, konrad.wilk,
	dfaggioli, rostedt, derkling, benbjiang

From: Vineeth Pillai <viremana@linux.microsoft.com>

Hotplug fixes to core-scheduling require a new cpumask iterator
which iterates through all the cpus set in either of the given
cpumasks. This patch introduces it.
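
A usage sketch, mirroring how the scheduler patches later in the series
use the iterator (cpu is assumed to be the CPU doing the pick):

	const struct cpumask *smt_mask = cpu_smt_mask(cpu);
	int i;

	/*
	 * Walk (smt_mask | {cpu}) starting at cpu, so the local CPU is
	 * covered even if hotplug already dropped it from smt_mask.
	 */
	for_each_cpu_wrap_or(i, smt_mask, cpumask_of(cpu), cpu)
		pr_info("visiting cpu %d\n", i);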

Signed-off-by: Vineeth Pillai <viremana@linux.microsoft.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 include/linux/cpumask.h | 42 ++++++++++++++++++++++++++++++++
 lib/cpumask.c           | 53 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 95 insertions(+)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index f0d895d6ac39..27e7e019237b 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -207,6 +207,10 @@ static inline int cpumask_any_and_distribute(const struct cpumask *src1p,
 	for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask, (void)(start))
 #define for_each_cpu_and(cpu, mask1, mask2)	\
 	for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask1, (void)mask2)
+#define for_each_cpu_or(cpu, mask1, mask2)	\
+	for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask1, (void)mask2)
+#define for_each_cpu_wrap_or(cpu, mask1, mask2, start)	\
+	for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask1, (void)mask2, (void)(start))
 #else
 /**
  * cpumask_first - get the first cpu in a cpumask
@@ -248,6 +252,7 @@ static inline unsigned int cpumask_next_zero(int n, const struct cpumask *srcp)
 }
 
 int cpumask_next_and(int n, const struct cpumask *, const struct cpumask *);
+int cpumask_next_or(int n, const struct cpumask *, const struct cpumask *);
 int cpumask_any_but(const struct cpumask *mask, unsigned int cpu);
 unsigned int cpumask_local_spread(unsigned int i, int node);
 int cpumask_any_and_distribute(const struct cpumask *src1p,
@@ -278,6 +283,8 @@ int cpumask_any_and_distribute(const struct cpumask *src1p,
 		(cpu) < nr_cpu_ids;)
 
 extern int cpumask_next_wrap(int n, const struct cpumask *mask, int start, bool wrap);
+extern int cpumask_next_wrap_or(int n, const struct cpumask *mask1,
+				 const struct cpumask *mask2, int start, bool wrap);
 
 /**
  * for_each_cpu_wrap - iterate over every cpu in a mask, starting at a specified location
@@ -294,6 +301,22 @@ extern int cpumask_next_wrap(int n, const struct cpumask *mask, int start, bool
 	     (cpu) < nr_cpumask_bits;						\
 	     (cpu) = cpumask_next_wrap((cpu), (mask), (start), true))
 
+/**
+ * for_each_cpu_wrap_or - iterate over every cpu in (mask1 | mask2), starting at a specified location
+ * @cpu: the (optionally unsigned) integer iterator
+ * @mask1: the first cpumask pointer
+ * @mask2: the second cpumask pointer
+ * @start: the start location
+ *
+ * The implementation does not assume any bit in either mask is set (including @start).
+ *
+ * After the loop, cpu is >= nr_cpu_ids.
+ */
+#define for_each_cpu_wrap_or(cpu, mask1, mask2, start)					\
+	for ((cpu) = cpumask_next_wrap_or((start)-1, (mask1), (mask2), (start), false);	\
+	     (cpu) < nr_cpumask_bits;						\
+	     (cpu) = cpumask_next_wrap_or((cpu), (mask1), (mask2), (start), true))
+
 /**
  * for_each_cpu_and - iterate over every cpu in both masks
  * @cpu: the (optionally unsigned) integer iterator
@@ -312,6 +335,25 @@ extern int cpumask_next_wrap(int n, const struct cpumask *mask, int start, bool
 	for ((cpu) = -1;						\
 		(cpu) = cpumask_next_and((cpu), (mask1), (mask2)),	\
 		(cpu) < nr_cpu_ids;)
+
+/**
+ * for_each_cpu_or - iterate over every cpu in (mask1 | mask2)
+ * @cpu: the (optionally unsigned) integer iterator
+ * @mask1: the first cpumask pointer
+ * @mask2: the second cpumask pointer
+ *
+ * This saves a temporary CPU mask in many places.  It is equivalent to:
+ *	struct cpumask tmp;
+ *	cpumask_or(&tmp, &mask1, &mask2);
+ *	for_each_cpu(cpu, &tmp)
+ *		...
+ *
+ * After the loop, cpu is >= nr_cpu_ids.
+ */
+#define for_each_cpu_or(cpu, mask1, mask2)				\
+	for ((cpu) = -1;						\
+		(cpu) = cpumask_next_or((cpu), (mask1), (mask2)),	\
+		(cpu) < nr_cpu_ids;)
 #endif /* SMP */
 
 #define CPU_BITS_NONE						\
diff --git a/lib/cpumask.c b/lib/cpumask.c
index 85da6ab4fbb5..351c56de4632 100644
--- a/lib/cpumask.c
+++ b/lib/cpumask.c
@@ -43,6 +43,25 @@ int cpumask_next_and(int n, const struct cpumask *src1p,
 }
 EXPORT_SYMBOL(cpumask_next_and);
 
+/**
+ * cpumask_next_or - get the next cpu in *src1p | *src2p
+ * @n: the cpu prior to the place to search (ie. return will be > @n)
+ * @src1p: the first cpumask pointer
+ * @src2p: the second cpumask pointer
+ *
+ * Returns >= nr_cpu_ids if no further cpus are set in either mask.
+ */
+int cpumask_next_or(int n, const struct cpumask *src1p,
+		    const struct cpumask *src2p)
+{
+	/* -1 is a legal arg here. */
+	if (n != -1)
+		cpumask_check(n);
+	return find_next_or_bit(cpumask_bits(src1p), cpumask_bits(src2p),
+		nr_cpumask_bits, n + 1);
+}
+EXPORT_SYMBOL(cpumask_next_or);
+
 /**
  * cpumask_any_but - return a "random" in a cpumask, but not this one.
  * @mask: the cpumask to search
@@ -95,6 +114,40 @@ int cpumask_next_wrap(int n, const struct cpumask *mask, int start, bool wrap)
 }
 EXPORT_SYMBOL(cpumask_next_wrap);
 
+/**
+ * cpumask_next_wrap_or - helper to implement for_each_cpu_wrap_or
+ * @n: the cpu prior to the place to search
+ * @mask1: first cpumask pointer
+ * @mask2: second cpumask pointer
+ * @start: the start point of the iteration
+ * @wrap: assume @n crossing @start terminates the iteration
+ *
+ * Returns >= nr_cpu_ids on completion
+ *
+ * Note: the @wrap argument is required for the start condition when
+ * we cannot assume @start is set in @mask1 or @mask2.
+ */
+int cpumask_next_wrap_or(int n, const struct cpumask *mask1, const struct cpumask *mask2,
+			   int start, bool wrap)
+{
+	int next;
+
+again:
+	next = cpumask_next_or(n, mask1, mask2);
+
+	if (wrap && n < start && next >= start) {
+		return nr_cpumask_bits;
+
+	} else if (next >= nr_cpumask_bits) {
+		wrap = true;
+		n = -1;
+		goto again;
+	}
+
+	return next;
+}
+EXPORT_SYMBOL(cpumask_next_wrap_or);
+
 /* These are not inline because of header tangles. */
 #ifdef CONFIG_CPUMASK_OFFSTACK
 /**
-- 
2.17.1



* [RFC PATCH v7 08/23] sched: Add core wide task selection and scheduling.
  2020-08-28 19:51 [RFC PATCH v7 00/23] Core scheduling v7 Julien Desfossez
                   ` (6 preceding siblings ...)
  2020-08-28 19:51 ` [RFC PATCH v7 07/23] cpumask: Introduce a new iterator for_each_cpu_wrap_or Julien Desfossez
@ 2020-08-28 19:51 ` Julien Desfossez
  2020-08-28 20:51   ` Peter Zijlstra
                     ` (2 more replies)
  2020-08-28 19:51 ` [RFC PATCH v7 09/23] sched/fair: Fix forced idle sibling starvation corner case Julien Desfossez
                   ` (14 subsequent siblings)
  22 siblings, 3 replies; 68+ messages in thread
From: Julien Desfossez @ 2020-08-28 19:51 UTC (permalink / raw)
  To: Peter Zijlstra, Vineeth Pillai, Joel Fernandes, Tim Chen,
	Aaron Lu, Aubrey Li, Dhaval Giani, Chris Hyser,
	Nishanth Aravamudan
  Cc: mingo, tglx, pjt, torvalds, linux-kernel, fweisbec, keescook,
	kerrnel, Phil Auld, Valentin Schneider, Mel Gorman, Pawan Gupta,
	Paolo Bonzini, joel, vineeth, Chen Yu, Christian Brauner,
	Agata Gruza, Antonio Gomez Iglesias, graf, konrad.wilk,
	dfaggioli, rostedt, derkling, benbjiang, Julien Desfossez,
	Vineeth Remanan Pillai, Aaron Lu

From: Peter Zijlstra <peterz@infradead.org>

Instead of only selecting a local task, select a task for all SMT
siblings for every reschedule on the core (irrespective which logical
CPU does the reschedule).

During a CPU hotplug event, schedule() can be called with the hotplugged
CPU absent from the cpumask, so use for_each_cpu(_wrap)_or to include
the current cpu in the task-pick loop.

There are multiple loops in pick_next_task that iterate over the CPUs
in smt_mask. During a hotplug event, a sibling could be removed from
smt_mask while pick_next_task is running, so the mask cannot be trusted
across the different loops; this can confuse the logic. Add retry logic
that restarts the selection if smt_mask changes between the loops.
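
For orientation, a condensed sketch of the sequence-counter handshake
the core-wide pick relies on (the real code is in the diff below):

	/*
	 * core_task_seq  - bumped on every enqueue/dequeue (and every pick)
	 * core_pick_seq  - the task_seq a core-wide selection was made at
	 * core_sched_seq - the pick_seq this CPU last acted on
	 *
	 * If a sibling already picked for us and we have not yet acted on
	 * that pick, schedule it instead of re-running the selection.
	 */
	if (rq->core_pick_seq == rq->core->core_task_seq &&
	    rq->core_pick_seq != rq->core_sched_seq)
		next = rq->core_pick;	/* reuse the sibling's selection */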

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Julien Desfossez <jdesfossez@digitalocean.com>
Signed-off-by: Vineeth Remanan Pillai <vpillai@digitalocean.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Aaron Lu <aaron.lu@linux.alibaba.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
---
 kernel/sched/core.c  | 284 ++++++++++++++++++++++++++++++++++++++++++-
 kernel/sched/sched.h |   6 +-
 2 files changed, 288 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index eea18956a9ef..1f480b6025cd 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4537,7 +4537,7 @@ static void put_prev_task_balance(struct rq *rq, struct task_struct *prev,
  * Pick up the highest-prio task:
  */
 static inline struct task_struct *
-pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
+__pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 {
 	const struct sched_class *class;
 	struct task_struct *p;
@@ -4577,6 +4577,283 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 	BUG();
 }
 
+#ifdef CONFIG_SCHED_CORE
+
+static inline bool cookie_equals(struct task_struct *a, unsigned long cookie)
+{
+	return is_idle_task(a) || (a->core_cookie == cookie);
+}
+
+static inline bool cookie_match(struct task_struct *a, struct task_struct *b)
+{
+	if (is_idle_task(a) || is_idle_task(b))
+		return true;
+
+	return a->core_cookie == b->core_cookie;
+}
+
+// XXX fairness/fwd progress conditions
+/*
+ * Returns
+ * - NULL if there is no runnable task for this class.
+ * - the highest priority task for this runqueue if it matches
+ *   rq->core->core_cookie or its priority is greater than max.
+ * - Else returns idle_task.
+ */
+static struct task_struct *
+pick_task(struct rq *rq, const struct sched_class *class, struct task_struct *max)
+{
+	struct task_struct *class_pick, *cookie_pick;
+	unsigned long cookie = rq->core->core_cookie;
+
+	class_pick = class->pick_task(rq);
+	if (!class_pick)
+		return NULL;
+
+	if (!cookie) {
+		/*
+		 * If class_pick is tagged, return it only if it has
+		 * higher priority than max.
+		 */
+		if (max && class_pick->core_cookie &&
+		    prio_less(class_pick, max))
+			return idle_sched_class.pick_task(rq);
+
+		return class_pick;
+	}
+
+	/*
+	 * If class_pick is idle or matches cookie, return early.
+	 */
+	if (cookie_equals(class_pick, cookie))
+		return class_pick;
+
+	cookie_pick = sched_core_find(rq, cookie);
+
+	/*
+	 * If class > max && class > cookie, it is the highest priority task on
+	 * the core (so far) and it must be selected, otherwise we must go with
+	 * the cookie pick in order to satisfy the constraint.
+	 */
+	if (prio_less(cookie_pick, class_pick) &&
+	    (!max || prio_less(max, class_pick)))
+		return class_pick;
+
+	return cookie_pick;
+}
+
+static struct task_struct *
+pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
+{
+	struct task_struct *next, *max = NULL;
+	const struct sched_class *class;
+	const struct cpumask *smt_mask;
+	int i, j, cpu;
+	int smt_weight;
+	bool need_sync;
+
+	if (!sched_core_enabled(rq))
+		return __pick_next_task(rq, prev, rf);
+
+	/*
+	 * If there were no {en,de}queues since we picked (IOW, the task
+	 * pointers are all still valid), and we haven't scheduled the last
+	 * pick yet, do so now.
+	 */
+	if (rq->core_pick_seq == rq->core->core_task_seq &&
+	    rq->core_pick_seq != rq->core_sched_seq) {
+		WRITE_ONCE(rq->core_sched_seq, rq->core_pick_seq);
+
+		next = rq->core_pick;
+		if (next != prev) {
+			put_prev_task(rq, prev);
+			set_next_task(rq, next);
+		}
+		return next;
+	}
+
+	put_prev_task_balance(rq, prev, rf);
+
+	cpu = cpu_of(rq);
+	smt_mask = cpu_smt_mask(cpu);
+
+	/*
+	 * core->core_task_seq, rq->core_pick_seq, rq->core_sched_seq
+	 *
+	 * @task_seq guards the task state ({en,de}queues)
+	 * @pick_seq is the @task_seq we did a selection on
+	 * @sched_seq is the @pick_seq we scheduled
+	 *
+	 * However, preemptions can cause multiple picks on the same task set.
+	 * 'Fix' this by also increasing @task_seq for every pick.
+	 */
+	rq->core->core_task_seq++;
+	need_sync = !!rq->core->core_cookie;
+
+retry_select:
+	smt_weight = cpumask_weight(smt_mask);
+
+	/* reset state */
+	rq->core->core_cookie = 0UL;
+	for_each_cpu_or(i, smt_mask, cpumask_of(cpu)) {
+		struct rq *rq_i = cpu_rq(i);
+
+		rq_i->core_pick = NULL;
+
+		if (rq_i->core_forceidle) {
+			need_sync = true;
+			rq_i->core_forceidle = false;
+		}
+
+		if (i != cpu)
+			update_rq_clock(rq_i);
+	}
+
+	/*
+	 * Try and select tasks for each sibling in descending sched_class
+	 * order.
+	 */
+	for_each_class(class) {
+again:
+		for_each_cpu_wrap_or(i, smt_mask, cpumask_of(cpu), cpu) {
+			struct rq *rq_i = cpu_rq(i);
+			struct task_struct *p;
+
+			/*
+			 * During hotplug online, a sibling can be added to
+			 * the smt_mask while we are here. If so, we need to
+			 * restart the selection by resetting everything.
+			 */
+			if (unlikely(smt_weight != cpumask_weight(smt_mask)))
+				goto retry_select;
+
+			if (rq_i->core_pick)
+				continue;
+
+			/*
+			 * If this sibling doesn't yet have a suitable task to
+			 * run; ask for the most eligible task, given the
+			 * highest priority task already selected for this
+			 * core.
+			 */
+			p = pick_task(rq_i, class, max);
+			if (!p) {
+				/*
+				 * If there were no cookies, we don't need
+				 * to bother with the other siblings.
+				 */
+				if (i == cpu && !need_sync)
+					goto next_class;
+
+				continue;
+			}
+
+			/*
+			 * Optimize the 'normal' case where there aren't any
+			 * cookies and we don't need to sync up.
+			 */
+			if (i == cpu && !need_sync && !p->core_cookie) {
+				next = p;
+				goto done;
+			}
+
+			rq_i->core_pick = p;
+
+			/*
+			 * If this new candidate is of higher priority than the
+			 * previous; and they're incompatible; we need to wipe
+			 * the slate and start over. pick_task makes sure that
+			 * p's priority is more than max if it doesn't match
+			 * max's cookie.
+			 *
+			 * NOTE: this is a linear max-filter and is thus bounded
+			 * in execution time.
+			 */
+			if (!max || !cookie_match(max, p)) {
+				struct task_struct *old_max = max;
+
+				rq->core->core_cookie = p->core_cookie;
+				max = p;
+
+				if (old_max) {
+					for_each_cpu(j, smt_mask) {
+						if (j == i)
+							continue;
+
+						cpu_rq(j)->core_pick = NULL;
+					}
+					goto again;
+				} else {
+					/*
+					 * Once we select a task for a cpu, we
+					 * should not be doing an unconstrained
+					 * pick because it might starve a task
+					 * on a forced idle cpu.
+					 */
+					need_sync = true;
+				}
+
+			}
+		}
+next_class:;
+	}
+
+	next = rq->core_pick;
+
+	/* Something should have been selected for current CPU */
+	WARN_ON_ONCE(!next);
+
+	/*
+	 * Reschedule siblings
+	 *
+	 * NOTE: L1TF -- at this point we're no longer running the old task and
+	 * sending an IPI (below) ensures the sibling will no longer be running
+	 * their task. This ensures there is no inter-sibling overlap between
+	 * non-matching user state.
+	 */
+	for_each_cpu_or(i, smt_mask, cpumask_of(cpu)) {
+		struct rq *rq_i = cpu_rq(i);
+
+		WARN_ON_ONCE(smt_weight == cpumask_weight(smt_mask) && !rq->core_pick);
+
+		/*
+		 * During hotplug online, a sibling can be added to the smt_mask
+		 * while we are here. We might have missed picking a task for it;
+		 * ignore it for now, as a schedule on that sibling will sort it out.
+		 */
+		if (!rq_i->core_pick)
+			continue;
+
+		if (is_idle_task(rq_i->core_pick) && rq_i->nr_running)
+			rq_i->core_forceidle = true;
+
+		if (i == cpu)
+			continue;
+
+		if (rq_i->curr != rq_i->core_pick) {
+			WRITE_ONCE(rq_i->core_pick_seq, rq->core->core_task_seq);
+			resched_curr(rq_i);
+		}
+
+		/* Did we break L1TF mitigation requirements? */
+		WARN_ON_ONCE(!cookie_match(next, rq_i->core_pick));
+	}
+
+done:
+	set_next_task(rq, next);
+	return next;
+}
+
+#else /* !CONFIG_SCHED_CORE */
+
+static struct task_struct *
+pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
+{
+	return __pick_next_task(rq, prev, rf);
+}
+
+#endif /* CONFIG_SCHED_CORE */
+
 /*
  * __schedule() is the main scheduler function.
  *
@@ -7433,7 +7710,12 @@ void __init sched_init(void)
 
 #ifdef CONFIG_SCHED_CORE
 		rq->core = NULL;
+		rq->core_pick = NULL;
 		rq->core_enabled = 0;
+		rq->core_tree = RB_ROOT;
+		rq->core_forceidle = false;
+
+		rq->core_cookie = 0UL;
 #endif
 	}
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 92e0b8679eef..def442f2c690 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1052,11 +1052,16 @@ struct rq {
 #ifdef CONFIG_SCHED_CORE
 	/* per rq */
 	struct rq		*core;
+	struct task_struct	*core_pick;
+	unsigned int		core_pick_seq;
 	unsigned int		core_enabled;
+	unsigned int		core_sched_seq;
 	struct rb_root		core_tree;
+	unsigned char		core_forceidle;
 
 	/* shared state */
 	unsigned int		core_task_seq;
+	unsigned long		core_cookie;
 #endif
 };
 
@@ -1929,7 +1934,6 @@ static inline void put_prev_task(struct rq *rq, struct task_struct *prev)
 
 static inline void set_next_task(struct rq *rq, struct task_struct *next)
 {
-	WARN_ON_ONCE(rq->curr != next);
 	next->sched_class->set_next_task(rq, next, false);
 }
 
-- 
2.17.1



* [RFC PATCH v7 09/23] sched/fair: Fix forced idle sibling starvation corner case
  2020-08-28 19:51 [RFC PATCH v7 00/23] Core scheduling v7 Julien Desfossez
                   ` (7 preceding siblings ...)
  2020-08-28 19:51 ` [RFC PATCH v7 08/23] sched: Add core wide task selection and scheduling Julien Desfossez
@ 2020-08-28 19:51 ` Julien Desfossez
  2020-08-28 21:25   ` Peter Zijlstra
  2020-08-28 19:51 ` [RFC PATCH v7 10/23] sched/fair: wrapper for cfs_rq->min_vruntime Julien Desfossez
                   ` (13 subsequent siblings)
  22 siblings, 1 reply; 68+ messages in thread
From: Julien Desfossez @ 2020-08-28 19:51 UTC (permalink / raw)
  To: Peter Zijlstra, Vineeth Pillai, Joel Fernandes, Tim Chen,
	Aaron Lu, Aubrey Li, Dhaval Giani, Chris Hyser,
	Nishanth Aravamudan
  Cc: mingo, tglx, pjt, torvalds, linux-kernel, fweisbec, keescook,
	kerrnel, Phil Auld, Valentin Schneider, Mel Gorman, Pawan Gupta,
	Paolo Bonzini, joel, vineeth, Chen Yu, Christian Brauner,
	Agata Gruza, Antonio Gomez Iglesias, graf, konrad.wilk,
	dfaggioli, rostedt, derkling, benbjiang, Vineeth Remanan Pillai,
	Julien Desfossez

From: Vineeth Pillai <viremana@linux.microsoft.com>

If there is only one long-running local task and the sibling is
forced idle, the forced-idle sibling might not get a chance to run
until a schedule event happens on some cpu in the core.

So we check for this condition during a tick to see if a sibling
is starved and then give it a chance to schedule.
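
Sketched concretely with the helper names from the diff below (offline
and self-cpu checks omitted for brevity):

	struct sched_entity *se = &curr->se;
	int sibling_cpu;

	/* lone local task ran past its slice: kick any forced-idle sibling */
	if (rq->cfs.nr_running == 1 && __entity_slice_used(se)) {
		for_each_cpu(sibling_cpu, cpu_smt_mask(cpu_of(rq)))
			if (cpu_rq(sibling_cpu)->core_forceidle)
				resched_curr(cpu_rq(sibling_cpu));
	}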

Signed-off-by: Vineeth Remanan Pillai <vpillai@digitalocean.com>
Signed-off-by: Julien Desfossez <jdesfossez@digitalocean.com>
---
 kernel/sched/fair.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 285002a2f641..409edc736297 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10631,6 +10631,40 @@ static void rq_offline_fair(struct rq *rq)
 
 #endif /* CONFIG_SMP */
 
+#ifdef CONFIG_SCHED_CORE
+static inline bool
+__entity_slice_used(struct sched_entity *se)
+{
+	return (se->sum_exec_runtime - se->prev_sum_exec_runtime) >
+		sched_slice(cfs_rq_of(se), se);
+}
+
+/*
+ * If the runqueue has only one task which used up its slice and a sibling
+ * is forced idle, trigger a schedule to give the forced-idle task a chance.
+ */
+static void resched_forceidle_sibling(struct rq *rq, struct sched_entity *se)
+{
+	int cpu = cpu_of(rq), sibling_cpu;
+
+	if (rq->cfs.nr_running > 1 || !__entity_slice_used(se))
+		return;
+
+	for_each_cpu(sibling_cpu, cpu_smt_mask(cpu)) {
+		struct rq *sibling_rq;
+		if (sibling_cpu == cpu)
+			continue;
+		if (cpu_is_offline(sibling_cpu))
+			continue;
+
+		sibling_rq = cpu_rq(sibling_cpu);
+		if (sibling_rq->core_forceidle) {
+			resched_curr(sibling_rq);
+		}
+	}
+}
+#endif
+
 /*
  * scheduler tick hitting a task of our scheduling class.
  *
@@ -10654,6 +10688,11 @@ static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
 
 	update_misfit_status(curr, rq);
 	update_overutilized_status(task_rq(curr));
+
+#ifdef CONFIG_SCHED_CORE
+	if (sched_core_enabled(rq))
+		resched_forceidle_sibling(rq, &curr->se);
+#endif
 }
 
 /*
-- 
2.17.1



* [RFC PATCH v7 10/23] sched/fair: wrapper for cfs_rq->min_vruntime
  2020-08-28 19:51 [RFC PATCH v7 00/23] Core scheduling v7 Julien Desfossez
                   ` (8 preceding siblings ...)
  2020-08-28 19:51 ` [RFC PATCH v7 09/23] sched/fair: Fix forced idle sibling starvation corner case Julien Desfossez
@ 2020-08-28 19:51 ` Julien Desfossez
  2020-08-28 19:51 ` [RFC PATCH v7 11/23] sched/fair: core wide cfs task priority comparison Julien Desfossez
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 68+ messages in thread
From: Julien Desfossez @ 2020-08-28 19:51 UTC (permalink / raw)
  To: Peter Zijlstra, Vineeth Pillai, Joel Fernandes, Tim Chen,
	Aaron Lu, Aubrey Li, Dhaval Giani, Chris Hyser,
	Nishanth Aravamudan
  Cc: Aaron Lu, mingo, tglx, pjt, torvalds, linux-kernel, fweisbec,
	keescook, kerrnel, Phil Auld, Valentin Schneider, Mel Gorman,
	Pawan Gupta, Paolo Bonzini, joel, vineeth, Chen Yu,
	Christian Brauner, Agata Gruza, Antonio Gomez Iglesias, graf,
	konrad.wilk, dfaggioli, rostedt, derkling, benbjiang, Aaron Lu

From: Aaron Lu <aaron.lu@linux.alibaba.com>

Add a wrapper function cfs_rq_min_vruntime(cfs_rq) to
return cfs_rq->min_vruntime.

It will be used in the following patch; no functional change.

Signed-off-by: Aaron Lu <ziqian.lzq@antfin.com>
---
 kernel/sched/fair.c | 27 ++++++++++++++++-----------
 1 file changed, 16 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 409edc736297..298d2c521c1e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -460,6 +460,11 @@ find_matching_se(struct sched_entity **se, struct sched_entity **pse)
 
 #endif	/* CONFIG_FAIR_GROUP_SCHED */
 
+static inline u64 cfs_rq_min_vruntime(struct cfs_rq *cfs_rq)
+{
+	return cfs_rq->min_vruntime;
+}
+
 static __always_inline
 void account_cfs_rq_runtime(struct cfs_rq *cfs_rq, u64 delta_exec);
 
@@ -496,7 +501,7 @@ static void update_min_vruntime(struct cfs_rq *cfs_rq)
 	struct sched_entity *curr = cfs_rq->curr;
 	struct rb_node *leftmost = rb_first_cached(&cfs_rq->tasks_timeline);
 
-	u64 vruntime = cfs_rq->min_vruntime;
+	u64 vruntime = cfs_rq_min_vruntime(cfs_rq);
 
 	if (curr) {
 		if (curr->on_rq)
@@ -516,7 +521,7 @@ static void update_min_vruntime(struct cfs_rq *cfs_rq)
 	}
 
 	/* ensure we never gain time by being placed backwards. */
-	cfs_rq->min_vruntime = max_vruntime(cfs_rq->min_vruntime, vruntime);
+	cfs_rq->min_vruntime = max_vruntime(cfs_rq_min_vruntime(cfs_rq), vruntime);
 #ifndef CONFIG_64BIT
 	smp_wmb();
 	cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime;
@@ -4044,7 +4049,7 @@ static inline void update_misfit_status(struct task_struct *p, struct rq *rq) {}
 static void check_spread(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
 #ifdef CONFIG_SCHED_DEBUG
-	s64 d = se->vruntime - cfs_rq->min_vruntime;
+	s64 d = se->vruntime - cfs_rq_min_vruntime(cfs_rq);
 
 	if (d < 0)
 		d = -d;
@@ -4057,7 +4062,7 @@ static void check_spread(struct cfs_rq *cfs_rq, struct sched_entity *se)
 static void
 place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
 {
-	u64 vruntime = cfs_rq->min_vruntime;
+	u64 vruntime = cfs_rq_min_vruntime(cfs_rq);
 
 	/*
 	 * The 'current' period is already promised to the current tasks,
@@ -4151,7 +4156,7 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	 * update_curr().
 	 */
 	if (renorm && curr)
-		se->vruntime += cfs_rq->min_vruntime;
+		se->vruntime += cfs_rq_min_vruntime(cfs_rq);
 
 	update_curr(cfs_rq);
 
@@ -4162,7 +4167,7 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	 * fairness detriment of existing tasks.
 	 */
 	if (renorm && !curr)
-		se->vruntime += cfs_rq->min_vruntime;
+		se->vruntime += cfs_rq_min_vruntime(cfs_rq);
 
 	/*
 	 * When enqueuing a sched_entity, we must:
@@ -4281,7 +4286,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	 * can move min_vruntime forward still more.
 	 */
 	if (!(flags & DEQUEUE_SLEEP))
-		se->vruntime -= cfs_rq->min_vruntime;
+		se->vruntime -= cfs_rq_min_vruntime(cfs_rq);
 
 	/* return excess runtime on last dequeue */
 	return_cfs_rq_runtime(cfs_rq);
@@ -6717,7 +6722,7 @@ static void migrate_task_rq_fair(struct task_struct *p, int new_cpu)
 			min_vruntime = cfs_rq->min_vruntime;
 		} while (min_vruntime != min_vruntime_copy);
 #else
-		min_vruntime = cfs_rq->min_vruntime;
+		min_vruntime = cfs_rq_min_vruntime(cfs_rq);
 #endif
 
 		se->vruntime -= min_vruntime;
@@ -10727,7 +10732,7 @@ static void task_fork_fair(struct task_struct *p)
 		resched_curr(rq);
 	}
 
-	se->vruntime -= cfs_rq->min_vruntime;
+	se->vruntime -= cfs_rq_min_vruntime(cfs_rq);
 	rq_unlock(rq, &rf);
 }
 
@@ -10850,7 +10855,7 @@ static void detach_task_cfs_rq(struct task_struct *p)
 		 * cause 'unlimited' sleep bonus.
 		 */
 		place_entity(cfs_rq, se, 0);
-		se->vruntime -= cfs_rq->min_vruntime;
+		se->vruntime -= cfs_rq_min_vruntime(cfs_rq);
 	}
 
 	detach_entity_cfs_rq(se);
@@ -10864,7 +10869,7 @@ static void attach_task_cfs_rq(struct task_struct *p)
 	attach_entity_cfs_rq(se);
 
 	if (!vruntime_normalized(p))
-		se->vruntime += cfs_rq->min_vruntime;
+		se->vruntime += cfs_rq_min_vruntime(cfs_rq);
 }
 
 static void switched_from_fair(struct rq *rq, struct task_struct *p)
-- 
2.17.1



* [RFC PATCH v7 11/23] sched/fair: core wide cfs task priority comparison
  2020-08-28 19:51 [RFC PATCH v7 00/23] Core scheduling v7 Julien Desfossez
                   ` (9 preceding siblings ...)
  2020-08-28 19:51 ` [RFC PATCH v7 10/23] sched/fair: wrapper for cfs_rq->min_vruntime Julien Desfossez
@ 2020-08-28 19:51 ` Julien Desfossez
  2020-08-28 21:29   ` Peter Zijlstra
  2020-09-15 21:49   ` chris hyser
  2020-08-28 19:51 ` [RFC PATCH v7 12/23] sched: Trivial forced-newidle balancer Julien Desfossez
                   ` (11 subsequent siblings)
  22 siblings, 2 replies; 68+ messages in thread
From: Julien Desfossez @ 2020-08-28 19:51 UTC (permalink / raw)
  To: Peter Zijlstra, Vineeth Pillai, Joel Fernandes, Tim Chen,
	Aaron Lu, Aubrey Li, Dhaval Giani, Chris Hyser,
	Nishanth Aravamudan
  Cc: mingo, tglx, pjt, torvalds, linux-kernel, fweisbec, keescook,
	kerrnel, Phil Auld, Valentin Schneider, Mel Gorman, Pawan Gupta,
	Paolo Bonzini, joel, vineeth, Chen Yu, Christian Brauner,
	Agata Gruza, Antonio Gomez Iglesias, graf, konrad.wilk,
	dfaggioli, rostedt, derkling, benbjiang, Aaron Lu

From: Aaron Lu <aaron.lwe@gmail.com>

This patch provides a vruntime-based way to compare the priority of two
cfs tasks, be they on the same cpu or on different threads of the same core.

When the two tasks are on the same CPU, we just need to find a common
cfs_rq both sched_entities are on and then do the comparison.

When the two tasks are on different threads of the same core, each thread
will choose its next task to run the usual way, and then the root-level
sched entities to which the two tasks belong are used to decide which
task runs next core-wide.

An illustration for the cross CPU case:

   cpu0         cpu1
 /   |  \     /   |  \
se1 se2 se3  se4 se5 se6
    /  \            /   \
  se21 se22       se61  se62
  (A)                    /
                       se621
                        (B)

Assume CPU0 and CPU1 are smt siblings: cpu0 has picked task A to run
next and cpu1 has picked task B. To compare the priority of tasks A and
B, we compare the priority of se2 and se6; whichever has the smaller
vruntime wins.

To make this work, the root-level sched entities' vruntimes of the two
threads must be directly comparable. So one hyperthread's root cfs_rq
min_vruntime is chosen as the core-wide one, and all root-level sched
entities' vruntimes are normalized against it.

Sub cfs_rqs and their sched entities are not interesting for the cross-cpu
priority comparison, as they only participate in the usual cpu-local
scheduling decisions, so there is no need to normalize their vruntimes.
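
As a worked example for the cross-cpu case above (made-up numbers):
suppose the normalized root-level vruntimes are se2->vruntime = 1000
and se6->vruntime = 1200. The comparison computes

	delta = (s64)(sea->vruntime - seb->vruntime) = 1000 - 1200 = -200

and cfs_prio_less(A, B) returns delta > 0, which is false here, so A is
not the lower-priority task: A (via se2, the smaller vruntime) runs
next core-wide.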

Signed-off-by: Aaron Lu <ziqian.lzq@antfin.com>
---
 kernel/sched/core.c  |  23 +++----
 kernel/sched/fair.c  | 142 ++++++++++++++++++++++++++++++++++++++++++-
 kernel/sched/sched.h |   3 +
 3 files changed, 150 insertions(+), 18 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1f480b6025cd..707a56ef0fa3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -114,19 +114,8 @@ static inline bool prio_less(struct task_struct *a, struct task_struct *b)
 	if (pa == -1) /* dl_prio() doesn't work because of stop_class above */
 		return !dl_time_before(a->dl.deadline, b->dl.deadline);
 
-	if (pa == MAX_RT_PRIO + MAX_NICE)  { /* fair */
-		u64 vruntime = b->se.vruntime;
-
-		/*
-		 * Normalize the vruntime if tasks are in different cpus.
-		 */
-		if (task_cpu(a) != task_cpu(b)) {
-			vruntime -= task_cfs_rq(b)->min_vruntime;
-			vruntime += task_cfs_rq(a)->min_vruntime;
-		}
-
-		return !((s64)(a->se.vruntime - vruntime) <= 0);
-	}
+	if (pa == MAX_RT_PRIO + MAX_NICE) /* fair */
+		return cfs_prio_less(a, b);
 
 	return false;
 }
@@ -229,8 +218,12 @@ static int __sched_core_stopper(void *data)
 	bool enabled = !!(unsigned long)data;
 	int cpu;
 
-	for_each_possible_cpu(cpu)
-		cpu_rq(cpu)->core_enabled = enabled;
+	for_each_possible_cpu(cpu) {
+		struct rq *rq = cpu_rq(cpu);
+		rq->core_enabled = enabled;
+		if (cpu_online(cpu) && rq->core != rq)
+			sched_core_adjust_sibling_vruntime(cpu, enabled);
+	}
 
 	return 0;
 }
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 298d2c521c1e..e54a35612efa 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -460,11 +460,142 @@ find_matching_se(struct sched_entity **se, struct sched_entity **pse)
 
 #endif	/* CONFIG_FAIR_GROUP_SCHED */
 
+#ifdef CONFIG_SCHED_CORE
+static inline struct cfs_rq *root_cfs_rq(struct cfs_rq *cfs_rq)
+{
+	return &rq_of(cfs_rq)->cfs;
+}
+
+static inline bool is_root_cfs_rq(struct cfs_rq *cfs_rq)
+{
+	return cfs_rq == root_cfs_rq(cfs_rq);
+}
+
+static inline struct cfs_rq *core_cfs_rq(struct cfs_rq *cfs_rq)
+{
+	return &rq_of(cfs_rq)->core->cfs;
+}
+
+static inline u64 cfs_rq_min_vruntime(struct cfs_rq *cfs_rq)
+{
+	if (!sched_core_enabled(rq_of(cfs_rq)) || !is_root_cfs_rq(cfs_rq))
+		return cfs_rq->min_vruntime;
+
+	return core_cfs_rq(cfs_rq)->min_vruntime;
+}
+
+#ifndef CONFIG_64BIT
+static inline u64 cfs_rq_min_vruntime_copy(struct cfs_rq *cfs_rq)
+{
+	if (!sched_core_enabled(rq_of(cfs_rq)) || !is_root_cfs_rq(cfs_rq))
+		return cfs_rq->min_vruntime_copy;
+
+	return core_cfs_rq(cfs_rq)->min_vruntime_copy;
+}
+#endif /* CONFIG_64BIT */
+
+bool cfs_prio_less(struct task_struct *a, struct task_struct *b)
+{
+	struct sched_entity *sea = &a->se;
+	struct sched_entity *seb = &b->se;
+	bool samecpu = task_cpu(a) == task_cpu(b);
+	s64 delta;
+
+#ifdef CONFIG_FAIR_GROUP_SCHED
+	if (samecpu) {
+		/* vruntime is per cfs_rq */
+		while (!is_same_group(sea, seb)) {
+			int sea_depth = sea->depth;
+			int seb_depth = seb->depth;
+
+			if (sea_depth >= seb_depth)
+				sea = parent_entity(sea);
+			if (sea_depth <= seb_depth)
+				seb = parent_entity(seb);
+		}
+
+		delta = (s64)(sea->vruntime - seb->vruntime);
+		goto out;
+	}
+
+	/* crosscpu: compare root level se's vruntime to decide priority */
+	while (sea->parent)
+		sea = sea->parent;
+	while (seb->parent)
+		seb = seb->parent;
+#else
+	/*
+	 * Use the min_vruntime of the root cfs_rq if the tasks are
+	 * enqueued on different cpus.
+	 */
+	if (!samecpu) {
+		delta = (s64)(task_rq(a)->cfs.min_vruntime -
+			      task_rq(b)->cfs.min_vruntime);
+		goto out;
+	}
+#endif /* CONFIG_FAIR_GROUP_SCHED */
+
+	delta = (s64)(sea->vruntime - seb->vruntime);
+
+out:
+	return delta > 0;
+}
+
+/*
+ * This function takes care of adjusting the min_vruntime of siblings of
+ * a core during coresched enable/disable.
+ * This is called in stop machine context so no need to take the rq lock.
+ *
+ * Coresched enable case
+ *  Once core scheduling is enabled, the root-level sched entities of
+ *  both siblings use the core cfs_rq's min_vruntime as the common
+ *  min_vruntime, so it's necessary to normalize the vruntime of the
+ *  existing root-level sched entities in sibling_cfs_rq.
+ *
+ *  Updating sibling_cfs_rq's min_vruntime isn't necessary, as only the
+ *  core cfs_rq's min_vruntime is used during the entire run of core scheduling.
+ *
+ * Coresched disable case
+ *  During the entire run of core scheduling, sibling_cfs_rq's min_vruntime
+ *  is left unused and could lag far behind its still-queued sched entities.
+ *  Sync it to the up-to-date core-wide one to avoid problems.
+ */
+void sched_core_adjust_sibling_vruntime(int cpu, bool coresched_enabled)
+{
+	struct cfs_rq *cfs = &cpu_rq(cpu)->cfs;
+	struct cfs_rq *core_cfs = &cpu_rq(cpu)->core->cfs;
+	if (coresched_enabled) {
+		struct sched_entity *se, *next;
+		s64 delta = core_cfs->min_vruntime - cfs->min_vruntime;
+		rbtree_postorder_for_each_entry_safe(se, next,
+				&cfs->tasks_timeline.rb_root,
+				run_node) {
+			se->vruntime += delta;
+		}
+	} else {
+		cfs->min_vruntime = core_cfs->min_vruntime;
+#ifndef CONFIG_64BIT
+		smp_wmb();
+		cfs->min_vruntime_copy = core_cfs->min_vruntime;
+#endif
+	}
+}
+
+#else
 static inline u64 cfs_rq_min_vruntime(struct cfs_rq *cfs_rq)
 {
 	return cfs_rq->min_vruntime;
 }
 
+#ifndef CONFIG_64BIT
+static inline u64 cfs_rq_min_vruntime_copy(struct cfs_rq *cfs_rq)
+{
+	return cfs_rq->min_vruntime_copy;
+}
+#endif /* CONFIG_64BIT */
+
+#endif /* CONFIG_SCHED_CORE */
+
 static __always_inline
 void account_cfs_rq_runtime(struct cfs_rq *cfs_rq, u64 delta_exec);
 
@@ -520,8 +651,13 @@ static void update_min_vruntime(struct cfs_rq *cfs_rq)
 			vruntime = min_vruntime(vruntime, se->vruntime);
 	}
 
+#ifdef CONFIG_SCHED_CORE
+	if (sched_core_enabled(rq_of(cfs_rq)) && is_root_cfs_rq(cfs_rq))
+		cfs_rq = core_cfs_rq(cfs_rq);
+#endif
+
 	/* ensure we never gain time by being placed backwards. */
-	cfs_rq->min_vruntime = max_vruntime(cfs_rq_min_vruntime(cfs_rq), vruntime);
+	cfs_rq->min_vruntime = max_vruntime(cfs_rq->min_vruntime, vruntime);
 #ifndef CONFIG_64BIT
 	smp_wmb();
 	cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime;
@@ -6717,9 +6853,9 @@ static void migrate_task_rq_fair(struct task_struct *p, int new_cpu)
 		u64 min_vruntime_copy;
 
 		do {
-			min_vruntime_copy = cfs_rq->min_vruntime_copy;
+			min_vruntime_copy = cfs_rq_min_vruntime_copy(cfs_rq);
 			smp_rmb();
-			min_vruntime = cfs_rq->min_vruntime;
+			min_vruntime = cfs_rq_min_vruntime(cfs_rq);
 		} while (min_vruntime != min_vruntime_copy);
 #else
 		min_vruntime = cfs_rq_min_vruntime(cfs_rq);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index def442f2c690..7e9346de3f59 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1106,6 +1106,9 @@ static inline raw_spinlock_t *rq_lockp(struct rq *rq)
 	return &rq->__lock;
 }
 
+bool cfs_prio_less(struct task_struct *a, struct task_struct *b);
+void sched_core_adjust_sibling_vruntime(int cpu, bool coresched_enabled);
+
 #else /* !CONFIG_SCHED_CORE */
 
 static inline bool sched_core_enabled(struct rq *rq)
-- 
2.17.1



* [RFC PATCH v7 12/23] sched: Trivial forced-newidle balancer
  2020-08-28 19:51 [RFC PATCH v7 00/23] Core scheduling v7 Julien Desfossez
                   ` (10 preceding siblings ...)
  2020-08-28 19:51 ` [RFC PATCH v7 11/23] sched/fair: core wide cfs task priority comparison Julien Desfossez
@ 2020-08-28 19:51 ` Julien Desfossez
  2020-09-02  7:08   ` Pavan Kondeti
  2020-08-28 19:51 ` [RFC PATCH v7 13/23] sched: migration changes for core scheduling Julien Desfossez
                   ` (10 subsequent siblings)
  22 siblings, 1 reply; 68+ messages in thread
From: Julien Desfossez @ 2020-08-28 19:51 UTC (permalink / raw)
  To: Peter Zijlstra, Vineeth Pillai, Joel Fernandes, Tim Chen,
	Aaron Lu, Aubrey Li, Dhaval Giani, Chris Hyser,
	Nishanth Aravamudan
  Cc: mingo, tglx, pjt, torvalds, linux-kernel, fweisbec, keescook,
	kerrnel, Phil Auld, Valentin Schneider, Mel Gorman, Pawan Gupta,
	Paolo Bonzini, joel, vineeth, Chen Yu, Christian Brauner,
	Agata Gruza, Antonio Gomez Iglesias, graf, konrad.wilk,
	dfaggioli, rostedt, derkling, benbjiang

From: Peter Zijlstra <peterz@infradead.org>

When a sibling is forced idle to match the core cookie, search for
matching tasks to fill the core.

rcu_read_unlock() can incur an infrequent deadlock in
sched_core_balance(). Fix this by using the RCU-sched flavor instead.
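
In condensed form, the balancing walk below widens the search one sched
domain at a time from the forced-idle cpu and stops at the first
cookie-matched task it manages to steal:

	struct sched_domain *sd;

	/* called with preemption disabled and under rcu_read_lock() */
	for_each_domain(cpu, sd) {
		if (need_resched())
			break;

		if (steal_cookie_task(cpu, sd))	/* scan sd's span for a match */
			break;
	}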

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
---
 include/linux/sched.h |   1 +
 kernel/sched/core.c   | 130 +++++++++++++++++++++++++++++++++++++++++-
 kernel/sched/idle.c   |   1 +
 kernel/sched/sched.h  |   6 ++
 4 files changed, 137 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 5fe9878502cb..eee53fa7b8d4 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -687,6 +687,7 @@ struct task_struct {
 #ifdef CONFIG_SCHED_CORE
 	struct rb_node			core_node;
 	unsigned long			core_cookie;
+	unsigned int			core_occupation;
 #endif
 
 #ifdef CONFIG_CGROUP_SCHED
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 707a56ef0fa3..7e2f310ec0c0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -201,6 +201,21 @@ static struct task_struct *sched_core_find(struct rq *rq, unsigned long cookie)
 	return match;
 }
 
+static struct task_struct *sched_core_next(struct task_struct *p, unsigned long cookie)
+{
+	struct rb_node *node = &p->core_node;
+
+	node = rb_next(node);
+	if (!node)
+		return NULL;
+
+	p = container_of(node, struct task_struct, core_node);
+	if (p->core_cookie != cookie)
+		return NULL;
+
+	return p;
+}
+
 /*
  * The static-key + stop-machine variable are needed such that:
  *
@@ -4641,7 +4656,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 	struct task_struct *next, *max = NULL;
 	const struct sched_class *class;
 	const struct cpumask *smt_mask;
-	int i, j, cpu;
+	int i, j, cpu, occ = 0;
 	int smt_weight;
 	bool need_sync;
 
@@ -4750,6 +4765,9 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 				goto done;
 			}
 
+			if (!is_idle_task(p))
+				occ++;
+
 			rq_i->core_pick = p;
 
 			/*
@@ -4775,6 +4793,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 
 						cpu_rq(j)->core_pick = NULL;
 					}
+					occ = 1;
 					goto again;
 				} else {
 					/*
@@ -4820,6 +4839,8 @@ next_class:;
 		if (is_idle_task(rq_i->core_pick) && rq_i->nr_running)
 			rq_i->core_forceidle = true;
 
+		rq_i->core_pick->core_occupation = occ;
+
 		if (i == cpu)
 			continue;
 
@@ -4837,6 +4858,113 @@ next_class:;
 	return next;
 }
 
+static bool try_steal_cookie(int this, int that)
+{
+	struct rq *dst = cpu_rq(this), *src = cpu_rq(that);
+	struct task_struct *p;
+	unsigned long cookie;
+	bool success = false;
+
+	local_irq_disable();
+	double_rq_lock(dst, src);
+
+	cookie = dst->core->core_cookie;
+	if (!cookie)
+		goto unlock;
+
+	if (dst->curr != dst->idle)
+		goto unlock;
+
+	p = sched_core_find(src, cookie);
+	if (p == src->idle)
+		goto unlock;
+
+	do {
+		if (p == src->core_pick || p == src->curr)
+			goto next;
+
+		if (!cpumask_test_cpu(this, &p->cpus_mask))
+			goto next;
+
+		if (p->core_occupation > dst->idle->core_occupation)
+			goto next;
+
+		p->on_rq = TASK_ON_RQ_MIGRATING;
+		deactivate_task(src, p, 0);
+		set_task_cpu(p, this);
+		activate_task(dst, p, 0);
+		p->on_rq = TASK_ON_RQ_QUEUED;
+
+		resched_curr(dst);
+
+		success = true;
+		break;
+
+next:
+		p = sched_core_next(p, cookie);
+	} while (p);
+
+unlock:
+	double_rq_unlock(dst, src);
+	local_irq_enable();
+
+	return success;
+}
+
+static bool steal_cookie_task(int cpu, struct sched_domain *sd)
+{
+	int i;
+
+	for_each_cpu_wrap(i, sched_domain_span(sd), cpu) {
+		if (i == cpu)
+			continue;
+
+		if (need_resched())
+			break;
+
+		if (try_steal_cookie(cpu, i))
+			return true;
+	}
+
+	return false;
+}
+
+static void sched_core_balance(struct rq *rq)
+{
+	struct sched_domain *sd;
+	int cpu = cpu_of(rq);
+
+	preempt_disable();
+	rcu_read_lock();
+	raw_spin_unlock_irq(rq_lockp(rq));
+	for_each_domain(cpu, sd) {
+		if (need_resched())
+			break;
+
+		if (steal_cookie_task(cpu, sd))
+			break;
+	}
+	raw_spin_lock_irq(rq_lockp(rq));
+	rcu_read_unlock();
+	preempt_enable();
+}
+
+static DEFINE_PER_CPU(struct callback_head, core_balance_head);
+
+void queue_core_balance(struct rq *rq)
+{
+	if (!sched_core_enabled(rq))
+		return;
+
+	if (!rq->core->core_cookie)
+		return;
+
+	if (!rq->nr_running) /* not forced idle */
+		return;
+
+	queue_balance_callback(rq, &per_cpu(core_balance_head, rq->cpu), sched_core_balance);
+}
+
 #else /* !CONFIG_SCHED_CORE */
 
 static struct task_struct *
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 8c41b023dd14..9c5637d866fd 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -406,6 +406,7 @@ static void set_next_task_idle(struct rq *rq, struct task_struct *next, bool fir
 {
 	update_idle_core(rq);
 	schedstat_inc(rq->sched_goidle);
+	queue_core_balance(rq);
 }
 
 #ifdef CONFIG_SMP
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 7e9346de3f59..be6db45276e7 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1109,6 +1109,8 @@ static inline raw_spinlock_t *rq_lockp(struct rq *rq)
 bool cfs_prio_less(struct task_struct *a, struct task_struct *b);
 void sched_core_adjust_sibling_vruntime(int cpu, bool coresched_enabled);
 
+extern void queue_core_balance(struct rq *rq);
+
 #else /* !CONFIG_SCHED_CORE */
 
 static inline bool sched_core_enabled(struct rq *rq)
@@ -1121,6 +1123,10 @@ static inline raw_spinlock_t *rq_lockp(struct rq *rq)
 	return &rq->__lock;
 }
 
+static inline void queue_core_balance(struct rq *rq)
+{
+}
+
 #endif /* CONFIG_SCHED_CORE */
 
 #ifdef CONFIG_SCHED_SMT
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [RFC PATCH v7 13/23] sched: migration changes for core scheduling
  2020-08-28 19:51 [RFC PATCH v7 00/23] Core scheduling v7 Julien Desfossez
                   ` (11 preceding siblings ...)
  2020-08-28 19:51 ` [RFC PATCH v7 12/23] sched: Trivial forced-newidle balancer Julien Desfossez
@ 2020-08-28 19:51 ` Julien Desfossez
  2020-08-28 19:51 ` [RFC PATCH v7 14/23] irq_work: Add support to detect if work is pending Julien Desfossez
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 68+ messages in thread
From: Julien Desfossez @ 2020-08-28 19:51 UTC (permalink / raw)
  To: Peter Zijlstra, Vineeth Pillai, Joel Fernandes, Tim Chen,
	Aaron Lu, Aubrey Li, Dhaval Giani, Chris Hyser,
	Nishanth Aravamudan
  Cc: Aubrey Li, mingo, tglx, pjt, torvalds, linux-kernel, fweisbec,
	keescook, kerrnel, Phil Auld, Valentin Schneider, Mel Gorman,
	Pawan Gupta, Paolo Bonzini, joel, vineeth, Chen Yu,
	Christian Brauner, Agata Gruza, Antonio Gomez Iglesias, graf,
	konrad.wilk, dfaggioli, rostedt, derkling, benbjiang, Aubrey Li,
	Vineeth Remanan Pillai

From: Aubrey Li <aubrey.li@intel.com>

 - Don't migrate if there is a cookie mismatch
     Load balancing tries to move a task from the busiest CPU to the
     destination CPU. When core scheduling is enabled, if the task's
     cookie does not match the destination CPU's core cookie, the task
     is skipped by this CPU. This mitigates forced-idle time on the
     destination CPU.

 - Select a cookie-matched idle CPU
     In the fast path of task wakeup, select the first cookie-matched
     idle CPU instead of the first idle CPU.

 - Find the cookie-matched idlest CPU
     In the slow path of task wakeup, find the idlest CPU whose core
     cookie matches the task's cookie.

 - Don't migrate a task if its cookie does not match
     For NUMA load balancing, don't migrate a task to a CPU whose core
     cookie does not match the task's cookie. (A sketch of the common
     cookie-match rule follows this list.)
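
All four cases rely on a single predicate added to sched.h by this
patch. A rough paraphrase (idle_core() is a stand-in here for the
open-coded cpu_smt_mask() scan in the real helper):

        /* May task p be placed on rq without violating core scheduling? */
        static inline bool sched_core_cookie_match(struct rq *rq, struct task_struct *p)
        {
                if (!sched_core_enabled(rq))
                        return true;    /* core scheduling off: no constraint */

                if (idle_core(rq))      /* stand-in: is every SMT sibling idle? */
                        return true;    /* fully idle core: always acceptable */

                return rq->core->core_cookie == p->core_cookie;
        }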

Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Vineeth Remanan Pillai <vpillai@digitalocean.com>
---
 kernel/sched/fair.c  | 64 ++++++++++++++++++++++++++++++++++++++++----
 kernel/sched/sched.h | 29 ++++++++++++++++++++
 2 files changed, 88 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e54a35612efa..ae11a18644b6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2048,6 +2048,15 @@ static void task_numa_find_cpu(struct task_numa_env *env,
 		if (!cpumask_test_cpu(cpu, env->p->cpus_ptr))
 			continue;
 
+#ifdef CONFIG_SCHED_CORE
+		/*
+		 * Skip this cpu if source task's cookie does not match
+		 * with CPU's core cookie.
+		 */
+		if (!sched_core_cookie_match(cpu_rq(cpu), env->p))
+			continue;
+#endif
+
 		env->dst_cpu = cpu;
 		if (task_numa_compare(env, taskimp, groupimp, maymove))
 			break;
@@ -5983,11 +5992,17 @@ find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this
 
 	/* Traverse only the allowed CPUs */
 	for_each_cpu_and(i, sched_group_span(group), p->cpus_ptr) {
+		struct rq *rq = cpu_rq(i);
+
+#ifdef CONFIG_SCHED_CORE
+		if (!sched_core_cookie_match(rq, p))
+			continue;
+#endif
+
 		if (sched_idle_cpu(i))
 			return i;
 
 		if (available_idle_cpu(i)) {
-			struct rq *rq = cpu_rq(i);
 			struct cpuidle_state *idle = idle_get_state(rq);
 			if (idle && idle->exit_latency < min_exit_latency) {
 				/*
@@ -6244,8 +6259,18 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 	for_each_cpu_wrap(cpu, cpus, target) {
 		if (!--nr)
 			return -1;
-		if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
-			break;
+
+		if (available_idle_cpu(cpu) || sched_idle_cpu(cpu)) {
+#ifdef CONFIG_SCHED_CORE
+			/*
+			 * If Core Scheduling is enabled, select this cpu
+			 * only if the process cookie matches core cookie.
+			 */
+			if (sched_core_enabled(cpu_rq(cpu)) &&
+			    p->core_cookie == cpu_rq(cpu)->core->core_cookie)
+#endif
+				break;
+		}
 	}
 
 	time = cpu_clock(this) - time;
@@ -7626,8 +7651,9 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 	 * We do not migrate tasks that are:
 	 * 1) throttled_lb_pair, or
 	 * 2) cannot be migrated to this CPU due to cpus_ptr, or
-	 * 3) running (obviously), or
-	 * 4) are cache-hot on their current CPU.
+	 * 3) task's cookie does not match with this CPU's core cookie
+	 * 4) running (obviously), or
+	 * 5) are cache-hot on their current CPU.
 	 */
 	if (throttled_lb_pair(task_group(p), env->src_cpu, env->dst_cpu))
 		return 0;
@@ -7662,6 +7688,15 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 		return 0;
 	}
 
+#ifdef CONFIG_SCHED_CORE
+	/*
+	 * Don't migrate task if the task's cookie does not match
+	 * with the destination CPU's core cookie.
+	 */
+	if (!sched_core_cookie_match(cpu_rq(env->dst_cpu), p))
+		return 0;
+#endif
+
 	/* Record that we found atleast one task that could run on dst_cpu */
 	env->flags &= ~LBF_ALL_PINNED;
 
@@ -8886,6 +8921,25 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu)
 					p->cpus_ptr))
 			continue;
 
+#ifdef CONFIG_SCHED_CORE
+		if (sched_core_enabled(cpu_rq(this_cpu))) {
+			int i = 0;
+			bool cookie_match = false;
+
+			for_each_cpu(i, sched_group_span(group)) {
+				struct rq *rq = cpu_rq(i);
+
+				if (sched_core_cookie_match(rq, p)) {
+					cookie_match = true;
+					break;
+				}
+			}
+			/* Skip over this group if no cookie matched */
+			if (!cookie_match)
+				continue;
+		}
+#endif
+
 		local_group = cpumask_test_cpu(this_cpu,
 					       sched_group_span(group));
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index be6db45276e7..1c94b2862069 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1109,6 +1109,35 @@ static inline raw_spinlock_t *rq_lockp(struct rq *rq)
 bool cfs_prio_less(struct task_struct *a, struct task_struct *b);
 void sched_core_adjust_sibling_vruntime(int cpu, bool coresched_enabled);
 
+/*
+ * Helper to check if the CPU's core cookie matches the task's cookie
+ * when core scheduling is enabled.
+ * A special case is that the task's cookie always matches the CPU's core
+ * cookie if the CPU is in an idle core.
+ */
+static inline bool sched_core_cookie_match(struct rq *rq, struct task_struct *p)
+{
+	bool idle_core = true;
+	int cpu;
+
+	/* Ignore cookie match if core scheduler is not enabled on the CPU. */
+	if (!sched_core_enabled(rq))
+		return true;
+
+	for_each_cpu(cpu, cpu_smt_mask(cpu_of(rq))) {
+		if (!available_idle_cpu(cpu)) {
+			idle_core = false;
+			break;
+		}
+	}
+
+	/*
+	 * A CPU in an idle core is always the best choice for tasks with
+	 * cookies.
+	 */
+	return idle_core || rq->core->core_cookie == p->core_cookie;
+}
+
 extern void queue_core_balance(struct rq *rq);
 
 #else /* !CONFIG_SCHED_CORE */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [RFC PATCH v7 14/23] irq_work: Add support to detect if work is pending
  2020-08-28 19:51 [RFC PATCH v7 00/23] Core scheduling v7 Julien Desfossez
                   ` (12 preceding siblings ...)
  2020-08-28 19:51 ` [RFC PATCH v7 13/23] sched: migration changes for core scheduling Julien Desfossez
@ 2020-08-28 19:51 ` Julien Desfossez
  2020-08-28 19:51 ` [RFC PATCH v7 15/23] entry/idle: Add a common function for activities during idle entry/exit Julien Desfossez
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 68+ messages in thread
From: Julien Desfossez @ 2020-08-28 19:51 UTC (permalink / raw)
  To: Peter Zijlstra, Vineeth Pillai, Joel Fernandes, Tim Chen,
	Aaron Lu, Aubrey Li, Dhaval Giani, Chris Hyser,
	Nishanth Aravamudan
  Cc: Joel Fernandes (Google),
	mingo, tglx, pjt, torvalds, linux-kernel, fweisbec, keescook,
	kerrnel, Phil Auld, Valentin Schneider, Mel Gorman, Pawan Gupta,
	Paolo Bonzini, vineeth, Chen Yu, Christian Brauner, Agata Gruza,
	Antonio Gomez Iglesias, graf, konrad.wilk, dfaggioli, rostedt,
	derkling, benbjiang, paulmck

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

When an unsafe region is entered on an HT, an IPI needs to be sent to
siblings to ensure they enter the kernel.

Following are the reasons why we would like to use irq_work to implement
forcing of sibling into kernel mode:

1. The existing smp_call infrastructure cannot be used easily since we
   could end up waiting on the CSD lock if a previous smp_call was not
   yet serviced.

2. I'd like to use generic code, such that there is no need to add an
   arch-specific IPI.

3. IRQ work already has support to detect that previous work was not yet
   executed through the IRQ_WORK_PENDING bit.

4. We need to know that the destination of the IPI will not send more
   IPIs of its own merely because that IPI caused it to enter the
   unsafe region.

Support for 4. requires us to be able to detect that irq_work is
pending.

This commit therefore adds a way for irq_work users to know if a
previous per-HT irq_work is pending. If it is, we need not send new
IPIs.
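
A minimal sketch of the intended call site (kick_sibling() and
ht_kick_work are made up for illustration; irq_work_pending() and
irq_work_queue_on() are the real API, and init_irq_work() setup is
omitted):

        static DEFINE_PER_CPU(struct irq_work, ht_kick_work);

        static void kick_sibling(int cpu)
        {
                struct irq_work *work = &per_cpu(ht_kick_work, cpu);

                /* A still-pending work item means the sibling is already
                 * in the kernel, or on its way there: no new IPI needed. */
                if (irq_work_pending(work))
                        return;

                irq_work_queue_on(work, cpu);
        }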

Memory ordering:

I was trying to handle the MP-pattern below. Consider the flag to be the
pending bit. P0() is the IRQ work handler. P1() is the code calling
irq_work_pending(). P0() already implicitly adds a memory barrier as a
part of the atomic_fetch_andnot() before calling work->func(). For P1(),
this patch uses atomic_read_acquire(), since a plain atomic_read() in
irq_work_pending() would not be sufficient.

        P0()
        {
                WRITE_ONCE(buf, 1);
                WRITE_ONCE(flag, 1);
        }

        P1()
        {
                int r1;
                int r2 = 0;

                r1 = READ_ONCE(flag);
                if (r1)
                        r2 = READ_ONCE(buf);
        }

Note: This patch is included in the following:
https://lore.kernel.org/lkml/20200722153017.024407984@infradead.org/

This could be removed when the above patch gets merged.

Cc: paulmck@kernel.org
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 include/linux/irq_work.h |  1 +
 kernel/irq_work.c        | 11 +++++++++++
 2 files changed, 12 insertions(+)

diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
index 30823780c192..b26466f95d04 100644
--- a/include/linux/irq_work.h
+++ b/include/linux/irq_work.h
@@ -42,6 +42,7 @@ bool irq_work_queue_on(struct irq_work *work, int cpu);
 
 void irq_work_tick(void);
 void irq_work_sync(struct irq_work *work);
+bool irq_work_pending(struct irq_work *work);
 
 #ifdef CONFIG_IRQ_WORK
 #include <asm/irq_work.h>
diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index eca83965b631..2d206d511aa0 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -24,6 +24,17 @@
 static DEFINE_PER_CPU(struct llist_head, raised_list);
 static DEFINE_PER_CPU(struct llist_head, lazy_list);
 
+bool irq_work_pending(struct irq_work *work)
+{
+	/*
+	 * Provide ordering to callers who may read other stuff
+	 * after the atomic read (MP-pattern).
+	 */
+	bool ret = atomic_read_acquire(&work->flags) & IRQ_WORK_PENDING;
+
+	return ret;
+}
+
 /*
  * Claim the entry so that no one else will poke at it.
  */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [RFC PATCH v7 15/23] entry/idle: Add a common function for activities during idle entry/exit
  2020-08-28 19:51 [RFC PATCH v7 00/23] Core scheduling v7 Julien Desfossez
                   ` (13 preceding siblings ...)
  2020-08-28 19:51 ` [RFC PATCH v7 14/23] irq_work: Add support to detect if work is pending Julien Desfossez
@ 2020-08-28 19:51 ` Julien Desfossez
  2020-08-30  2:17   ` kernel test robot
  2020-08-28 19:51 ` [RFC PATCH v7 16/23] arch/x86: Add a new TIF flag for untrusted tasks Julien Desfossez
                   ` (7 subsequent siblings)
  22 siblings, 1 reply; 68+ messages in thread
From: Julien Desfossez @ 2020-08-28 19:51 UTC (permalink / raw)
  To: Peter Zijlstra, Vineeth Pillai, Joel Fernandes, Tim Chen,
	Aaron Lu, Aubrey Li, Dhaval Giani, Chris Hyser,
	Nishanth Aravamudan
  Cc: Joel Fernandes (Google),
	mingo, tglx, pjt, torvalds, linux-kernel, fweisbec, keescook,
	kerrnel, Phil Auld, Valentin Schneider, Mel Gorman, Pawan Gupta,
	Paolo Bonzini, vineeth, Chen Yu, Christian Brauner, Agata Gruza,
	Antonio Gomez Iglesias, graf, konrad.wilk, dfaggioli, rostedt,
	derkling, benbjiang

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

Currently only RCU hooks for idle entry/exit are called. In later
patches, kernel-entry protection functionality will be added.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 include/linux/entry-common.h | 16 ++++++++++++++++
 kernel/sched/idle.c          | 17 +++++++++--------
 2 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index efebbffcd5cc..2ea0e09b00d5 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -369,4 +369,20 @@ void irqentry_exit_cond_resched(void);
  */
 void noinstr irqentry_exit(struct pt_regs *regs, irqentry_state_t state);
 
+/**
+ * generic_idle_enter - Called during entry into idle for housekeeping.
+ */
+static inline void generic_idle_enter(void)
+{
+	rcu_idle_enter();
+}
+
+/**
+ * generic_idle_exit - Called when exiting idle for housekeeping.
+ */
+static inline void generic_idle_exit(void)
+{
+	rcu_idle_exit();
+}
+
 #endif
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 9c5637d866fd..269de55086c1 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -8,6 +8,7 @@
  */
 #include "sched.h"
 
+#include <linux/entry-common.h>
 #include <trace/events/power.h>
 
 /* Linker adds these: start and end of __cpuidle functions */
@@ -54,7 +55,7 @@ __setup("hlt", cpu_idle_nopoll_setup);
 
 static noinline int __cpuidle cpu_idle_poll(void)
 {
-	rcu_idle_enter();
+	generic_idle_enter();
 	trace_cpu_idle_rcuidle(0, smp_processor_id());
 	local_irq_enable();
 	stop_critical_timings();
@@ -64,7 +65,7 @@ static noinline int __cpuidle cpu_idle_poll(void)
 		cpu_relax();
 	start_critical_timings();
 	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
-	rcu_idle_exit();
+	generic_idle_exit();
 
 	return 1;
 }
@@ -158,7 +159,7 @@ static void cpuidle_idle_call(void)
 
 	if (cpuidle_not_available(drv, dev)) {
 		tick_nohz_idle_stop_tick();
-		rcu_idle_enter();
+		generic_idle_enter();
 
 		default_idle_call();
 		goto exit_idle;
@@ -178,13 +179,13 @@ static void cpuidle_idle_call(void)
 		u64 max_latency_ns;
 
 		if (idle_should_enter_s2idle()) {
-			rcu_idle_enter();
+			generic_idle_enter();
 
 			entered_state = call_cpuidle_s2idle(drv, dev);
 			if (entered_state > 0)
 				goto exit_idle;
 
-			rcu_idle_exit();
+			generic_idle_exit();
 
 			max_latency_ns = U64_MAX;
 		} else {
@@ -192,7 +193,7 @@ static void cpuidle_idle_call(void)
 		}
 
 		tick_nohz_idle_stop_tick();
-		rcu_idle_enter();
+		generic_idle_enter();
 
 		next_state = cpuidle_find_deepest_state(drv, dev, max_latency_ns);
 		call_cpuidle(drv, dev, next_state);
@@ -209,7 +210,7 @@ static void cpuidle_idle_call(void)
 		else
 			tick_nohz_idle_retain_tick();
 
-		rcu_idle_enter();
+		generic_idle_enter();
 
 		entered_state = call_cpuidle(drv, dev, next_state);
 		/*
@@ -227,7 +228,7 @@ static void cpuidle_idle_call(void)
 	if (WARN_ON_ONCE(irqs_disabled()))
 		local_irq_enable();
 
-	rcu_idle_exit();
+	generic_idle_exit();
 }
 
 /*
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [RFC PATCH v7 16/23] arch/x86: Add a new TIF flag for untrusted tasks
  2020-08-28 19:51 [RFC PATCH v7 00/23] Core scheduling v7 Julien Desfossez
                   ` (14 preceding siblings ...)
  2020-08-28 19:51 ` [RFC PATCH v7 15/23] entry/idle: Add a common function for activities during idle entry/exit Julien Desfossez
@ 2020-08-28 19:51 ` Julien Desfossez
  2020-08-28 19:51 ` [RFC PATCH v7 17/23] kernel/entry: Add support for core-wide protection of kernel-mode Julien Desfossez
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 68+ messages in thread
From: Julien Desfossez @ 2020-08-28 19:51 UTC (permalink / raw)
  To: Peter Zijlstra, Vineeth Pillai, Joel Fernandes, Tim Chen,
	Aaron Lu, Aubrey Li, Dhaval Giani, Chris Hyser,
	Nishanth Aravamudan
  Cc: Joel Fernandes (Google),
	mingo, tglx, pjt, torvalds, linux-kernel, fweisbec, keescook,
	kerrnel, Phil Auld, Valentin Schneider, Mel Gorman, Pawan Gupta,
	Paolo Bonzini, vineeth, Chen Yu, Christian Brauner, Agata Gruza,
	Antonio Gomez Iglesias, graf, konrad.wilk, dfaggioli, rostedt,
	derkling, benbjiang

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

Add a new TIF flag to indicate whether the kernel needs to be careful
and take additional steps to mitigate micro-architectural issues during
entry into user or guest mode.

This new flag will be used by the series to determine whether waiting is
needed during exit to user or guest mode.
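
The intended life cycle of the flag, condensed from the later patches
in this series:

        /* On kernel entry from a tagged (core-scheduled) task: */
        if (current->core_cookie)
                set_tsk_thread_flag(current, TIF_UNSAFE_RET);

        /* On return to user/guest, once the core is safe again (no
         * sibling left in the kernel): */
        clear_tsk_thread_flag(current, TIF_UNSAFE_RET);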

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 arch/x86/include/asm/thread_info.h | 2 ++
 kernel/sched/sched.h               | 6 ++++++
 2 files changed, 8 insertions(+)

diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 267701ae3d86..42e63969acb3 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -98,6 +98,7 @@ struct thread_info {
 #define TIF_IO_BITMAP		22	/* uses I/O bitmap */
 #define TIF_FORCED_TF		24	/* true if TF in eflags artificially */
 #define TIF_BLOCKSTEP		25	/* set when we want DEBUGCTLMSR_BTF */
+#define TIF_UNSAFE_RET   	26	/* On return to process/guest, perform safety checks. */
 #define TIF_LAZY_MMU_UPDATES	27	/* task is updating the mmu lazily */
 #define TIF_SYSCALL_TRACEPOINT	28	/* syscall tracepoint instrumentation */
 #define TIF_ADDR32		29	/* 32-bit address space on 64 bits */
@@ -127,6 +128,7 @@ struct thread_info {
 #define _TIF_IO_BITMAP		(1 << TIF_IO_BITMAP)
 #define _TIF_FORCED_TF		(1 << TIF_FORCED_TF)
 #define _TIF_BLOCKSTEP		(1 << TIF_BLOCKSTEP)
+#define _TIF_UNSAFE_RET 	(1 << TIF_UNSAFE_RET)
 #define _TIF_LAZY_MMU_UPDATES	(1 << TIF_LAZY_MMU_UPDATES)
 #define _TIF_SYSCALL_TRACEPOINT	(1 << TIF_SYSCALL_TRACEPOINT)
 #define _TIF_ADDR32		(1 << TIF_ADDR32)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 1c94b2862069..59827ae58019 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2760,3 +2760,9 @@ static inline bool is_per_cpu_kthread(struct task_struct *p)
 
 void swake_up_all_locked(struct swait_queue_head *q);
 void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait);
+
+#ifdef CONFIG_SCHED_CORE
+#ifndef TIF_UNSAFE_RET
+#define TIF_UNSAFE_RET (0)
+#endif
+#endif
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [RFC PATCH v7 17/23] kernel/entry: Add support for core-wide protection of kernel-mode
  2020-08-28 19:51 [RFC PATCH v7 00/23] Core scheduling v7 Julien Desfossez
                   ` (15 preceding siblings ...)
  2020-08-28 19:51 ` [RFC PATCH v7 16/23] arch/x86: Add a new TIF flag for untrusted tasks Julien Desfossez
@ 2020-08-28 19:51 ` Julien Desfossez
  2020-08-30  6:50   ` [kernel/entry] 872a0a3f0b: will-it-scale.per_thread_ops -18.7% regression kernel test robot
  2020-09-01 15:54   ` [RFC PATCH v7 17/23] kernel/entry: Add support for core-wide protection of kernel-mode Thomas Gleixner
  2020-08-28 19:51 ` [RFC PATCH v7 18/23] entry/idle: Enter and exit kernel protection during idle entry and exit Julien Desfossez
                   ` (5 subsequent siblings)
  22 siblings, 2 replies; 68+ messages in thread
From: Julien Desfossez @ 2020-08-28 19:51 UTC (permalink / raw)
  To: Peter Zijlstra, Vineeth Pillai, Joel Fernandes, Tim Chen,
	Aaron Lu, Aubrey Li, Dhaval Giani, Chris Hyser,
	Nishanth Aravamudan
  Cc: Joel Fernandes (Google),
	mingo, tglx, pjt, torvalds, linux-kernel, fweisbec, keescook,
	kerrnel, Phil Auld, Valentin Schneider, Mel Gorman, Pawan Gupta,
	Paolo Bonzini, vineeth, Chen Yu, Christian Brauner, Agata Gruza,
	Antonio Gomez Iglesias, graf, konrad.wilk, dfaggioli, rostedt,
	derkling, benbjiang, Julien Desfossez, Aubrey Li, Tim Chen,
	Paul E . McKenney

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

Core-scheduling prevents hyperthreads in usermode from attacking each
other, but it does not do anything about one of the hyperthreads
entering the kernel for any reason. This leaves the door open for MDS
and L1TF attacks with concurrent execution sequences between
hyperthreads.

This patch therefore adds support for protecting all syscall and IRQ
kernel mode entries. Care is taken to track the outermost usermode exit
and entry using per-cpu counters. In cases where one of the hyperthreads
enters the kernel, no additional IPIs are sent. Further, IPIs are avoided
when not needed - for example, idle and non-cookie HTs do not need to be
forced into kernel mode.
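
The resulting protocol, condensed (the locking, the irq_work IPI path
and the various early-out checks are in the code below):

        /* Kernel entry (syscall/IRQ) on any hyperthread: */
        sched_core_unsafe_enter();      /* bump the per-CPU and core-wide
                                         * nesting counts; IPI siblings
                                         * running tagged user tasks */

        /* ... kernel work; siblings are herded into the kernel too ... */

        /* Return to user/guest: */
        sched_core_unsafe_exit();       /* drop this CPU's contribution */
        sched_core_wait_till_safe(EXIT_TO_USER_MODE_WORK);
                                        /* spin with IRQs enabled until the
                                         * core-wide count hits zero or TIF
                                         * work shows up */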

More information about attacks:
For MDS, it is possible for syscalls, IRQ and softirq handlers to leak
data to either host or guest attackers. For L1TF, it is possible to leak
to guest attackers. There is no possible mitigation involving flushing
of buffers to avoid this, since the execution of attacker and victim
happens concurrently on 2 or more HTs.

Cc: Julien Desfossez <jdesfossez@digitalocean.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Aaron Lu <aaron.lwe@gmail.com>
Cc: Aubrey Li <aubrey.li@linux.intel.com>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Co-developed-by: Vineeth Pillai <viremana@linux.microsoft.com>
Signed-off-by: Vineeth Pillai <viremana@linux.microsoft.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 include/linux/sched.h |  12 +++
 kernel/entry/common.c |  90 +++++++++++-------
 kernel/sched/core.c   | 212 ++++++++++++++++++++++++++++++++++++++++++
 kernel/sched/sched.h  |   3 +
 4 files changed, 285 insertions(+), 32 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index eee53fa7b8d4..1e04ffe689cb 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2055,4 +2055,16 @@ int sched_trace_rq_nr_running(struct rq *rq);
 
 const struct cpumask *sched_trace_rd_span(struct root_domain *rd);
 
+#ifdef CONFIG_SCHED_CORE
+void sched_core_unsafe_enter(void);
+void sched_core_unsafe_exit(void);
+void sched_core_unsafe_exit_wait(unsigned long ti_check);
+void sched_core_wait_till_safe(unsigned long ti_check);
+#else
+#define sched_core_unsafe_enter(ignore) do { } while (0)
+#define sched_core_unsafe_exit(ignore) do { } while (0)
+#define sched_core_wait_till_safe(ignore) do { } while (0)
+#define sched_core_unsafe_exit_wait(ignore) do { } while (0)
+#endif
+
 #endif
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index fcae019158ca..86ab8b532399 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -28,6 +28,7 @@ static __always_inline void enter_from_user_mode(struct pt_regs *regs)
 
 	instrumentation_begin();
 	trace_hardirqs_off_finish();
+	sched_core_unsafe_enter();
 	instrumentation_end();
 }
 
@@ -112,59 +113,84 @@ static __always_inline void exit_to_user_mode(void)
 /* Workaround to allow gradual conversion of architecture code */
 void __weak arch_do_signal(struct pt_regs *regs) { }
 
-static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
-					    unsigned long ti_work)
+static inline bool exit_to_user_mode_work_pending(unsigned long ti_work)
 {
-	/*
-	 * Before returning to user space ensure that all pending work
-	 * items have been completed.
-	 */
-	while (ti_work & EXIT_TO_USER_MODE_WORK) {
+	return (ti_work & EXIT_TO_USER_MODE_WORK);
+}
 
-		local_irq_enable_exit_to_user(ti_work);
+static inline void exit_to_user_mode_work(struct pt_regs *regs,
+					  unsigned long ti_work)
+{
 
-		if (ti_work & _TIF_NEED_RESCHED)
-			schedule();
+	local_irq_enable_exit_to_user(ti_work);
 
-		if (ti_work & _TIF_UPROBE)
-			uprobe_notify_resume(regs);
+	if (ti_work & _TIF_NEED_RESCHED)
+		schedule();
 
-		if (ti_work & _TIF_PATCH_PENDING)
-			klp_update_patch_state(current);
+	if (ti_work & _TIF_UPROBE)
+		uprobe_notify_resume(regs);
 
-		if (ti_work & _TIF_SIGPENDING)
-			arch_do_signal(regs);
+	if (ti_work & _TIF_PATCH_PENDING)
+		klp_update_patch_state(current);
 
-		if (ti_work & _TIF_NOTIFY_RESUME) {
-			clear_thread_flag(TIF_NOTIFY_RESUME);
-			tracehook_notify_resume(regs);
-			rseq_handle_notify_resume(NULL, regs);
-		}
+	if (ti_work & _TIF_SIGPENDING)
+		arch_do_signal(regs);
+
+	if (ti_work & _TIF_NOTIFY_RESUME) {
+		clear_thread_flag(TIF_NOTIFY_RESUME);
+		tracehook_notify_resume(regs);
+		rseq_handle_notify_resume(NULL, regs);
+	}
+
+	/* Architecture specific TIF work */
+	arch_exit_to_user_mode_work(regs, ti_work);
+
+	local_irq_disable_exit_to_user();
+}
 
-		/* Architecture specific TIF work */
-		arch_exit_to_user_mode_work(regs, ti_work);
+static unsigned long exit_to_user_mode_loop(struct pt_regs *regs)
+{
+	unsigned long ti_work = READ_ONCE(current_thread_info()->flags);
+
+	/*
+	 * Before returning to user space ensure that all pending work
+	 * items have been completed.
+	 */
+	while (1) {
+		/* Both interrupts and preemption could be enabled here. */
+		if (exit_to_user_mode_work_pending(ti_work))
+			exit_to_user_mode_work(regs, ti_work);
+
+		/* Interrupts may be reenabled with preemption disabled. */
+		sched_core_unsafe_exit_wait(EXIT_TO_USER_MODE_WORK);
 
 		/*
-		 * Disable interrupts and reevaluate the work flags as they
-		 * might have changed while interrupts and preemption was
-		 * enabled above.
+		 * Reevaluate the work flags as they might have changed
+		 * while interrupts and preemption were enabled.
 		 */
-		local_irq_disable_exit_to_user();
 		ti_work = READ_ONCE(current_thread_info()->flags);
+
+		/*
+		 * We may be switching out to another task in kernel mode. That
+		 * process is responsible for exiting the "unsafe" kernel mode
+		 * when it returns to user or guest.
+		 */
+		if (exit_to_user_mode_work_pending(ti_work))
+			sched_core_unsafe_enter();
+		else
+			break;
 	}
 
-	/* Return the latest work state for arch_exit_to_user_mode() */
-	return ti_work;
+	return ti_work;
 }
 
 static void exit_to_user_mode_prepare(struct pt_regs *regs)
 {
-	unsigned long ti_work = READ_ONCE(current_thread_info()->flags);
+	unsigned long ti_work;
 
 	lockdep_assert_irqs_disabled();
 
-	if (unlikely(ti_work & EXIT_TO_USER_MODE_WORK))
-		ti_work = exit_to_user_mode_loop(regs, ti_work);
+	ti_work = exit_to_user_mode_loop(regs);
 
 	arch_exit_to_user_mode_prepare(regs, ti_work);
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7e2f310ec0c0..0dc9172be04d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4600,6 +4600,217 @@ static inline bool cookie_match(struct task_struct *a, struct task_struct *b)
 	return a->core_cookie == b->core_cookie;
 }
 
+/*
+ * Handler to attempt to enter kernel. It does nothing because the exit to
+ * usermode or guest mode will do the actual work (of waiting if needed).
+ */
+static void sched_core_irq_work(struct irq_work *work)
+{
+	return;
+}
+
+/*
+ * sched_core_wait_till_safe - Pause the caller's hyperthread until the core
+ * exits the core-wide unsafe state. Obviously the CPU calling this function
+ * should not be responsible for the core being in the core-wide unsafe
+ * state, otherwise it will deadlock.
+ *
+ * @ti_check: We spin here with IRQ enabled and preempt disabled. Break out of
+ *            the loop if TIF flags are set and notify caller about it.
+ *
+ * IRQs should be disabled.
+ */
+void sched_core_wait_till_safe(unsigned long ti_check)
+{
+	bool restart = false;
+	struct rq *rq;
+	int cpu;
+
+	/*
+	 * During exit to user mode, it is possible that preemption was briefly
+	 * enabled before this function is called. During this interval, an interface
+	 * could have removed the task's tag through stop_machine(). Thus, it is
+	 * normal for a task to appear tagged at unsafe_enter() but no longer
+	 * tagged by the time the corresponding unsafe_exit() runs. unsafe_exit()
+	 * still has to be called for cleanups.
+	 */
+	if (!test_tsk_thread_flag(current, TIF_UNSAFE_RET) || !current->core_cookie)
+		goto ret;
+
+	cpu = smp_processor_id();
+	rq = cpu_rq(cpu);
+
+	if (!sched_core_enabled(rq))
+		goto ret;
+
+	/* Downgrade to allow interrupts. */
+	preempt_disable();
+	local_irq_enable();
+
+	/*
+	 * Wait till the core of this HT is not in an unsafe state.
+	 *
+	 * Pair with smp_store_release() in sched_core_unsafe_exit().
+	 */
+	while (smp_load_acquire(&rq->core->core_unsafe_nest) > 0) {
+		cpu_relax();
+		if (READ_ONCE(current_thread_info()->flags) & ti_check) {
+			restart = true;
+			break;
+		}
+	}
+
+	/* Upgrade it back to the expectations of entry code. */
+	local_irq_disable();
+	preempt_enable();
+
+ret:
+	if (!restart)
+		clear_tsk_thread_flag(current, TIF_UNSAFE_RET);
+
+	return;
+}
+
+/*
+ * Enter the core-wide unsafe state. Siblings will be paused if they run
+ * 'untrusted' code, until sched_core_unsafe_exit() is called. Every attempt to
+ * avoid sending useless IPIs is made. Must be called only from hard IRQ
+ * context.
+ */
+void sched_core_unsafe_enter(void)
+{
+	const struct cpumask *smt_mask;
+	unsigned long flags;
+	struct rq *rq;
+	int i, cpu;
+
+	/* Ensure that on return to user/guest, we check whether to wait. */
+	if (current->core_cookie)
+		set_tsk_thread_flag(current, TIF_UNSAFE_RET);
+
+	local_irq_save(flags);
+	cpu = smp_processor_id();
+	rq = cpu_rq(cpu);
+	if (!sched_core_enabled(rq))
+		goto ret;
+
+	/* Count unsafe_enter() calls received without unsafe_exit() on this CPU. */
+	rq->core_this_unsafe_nest++;
+
+	/* Should not nest: enter() should only pair with exit(). */
+	if (WARN_ON_ONCE(rq->core_this_unsafe_nest != 1))
+		goto ret;
+
+	raw_spin_lock(rq_lockp(rq));
+	smt_mask = cpu_smt_mask(cpu);
+
+	/* Contribute this CPU's unsafe_enter() to core-wide unsafe_enter() count. */
+	WRITE_ONCE(rq->core->core_unsafe_nest, rq->core->core_unsafe_nest + 1);
+
+	if (WARN_ON_ONCE(rq->core->core_unsafe_nest == UINT_MAX))
+		goto unlock;
+
+	if (irq_work_pending(&rq->core_irq_work)) {
+		/*
+		 * Do nothing more since we are in an IPI sent from another
+		 * sibling to enforce safety. That sibling would have sent IPIs
+		 * to all of the HTs.
+		 */
+		goto unlock;
+	}
+
+	/*
+	 * If we are not the first ones on the core to enter core-wide unsafe
+	 * state, do nothing.
+	 */
+	if (rq->core->core_unsafe_nest > 1)
+		goto unlock;
+
+	/* Do nothing more if the core is not tagged. */
+	if (!rq->core->core_cookie)
+		goto unlock;
+
+	for_each_cpu(i, smt_mask) {
+		struct rq *srq = cpu_rq(i);
+
+		if (i == cpu || cpu_is_offline(i))
+			continue;
+
+		if (!srq->curr->mm || is_idle_task(srq->curr))
+			continue;
+
+		/* Skip if HT is not running a tagged task. */
+		if (!srq->curr->core_cookie && !srq->core_pick)
+			continue;
+
+		/*
+		 * Force sibling into the kernel by IPI. If work was already
+		 * pending, no new IPIs are sent. This is Ok since the receiver
+		 * would already be in the kernel, or on its way to it.
+		 */
+		irq_work_queue_on(&srq->core_irq_work, i);
+	}
+unlock:
+	raw_spin_unlock(rq_lockp(rq));
+ret:
+	local_irq_restore(flags);
+}
+
+/*
+ * Exit the core-wide unsafe state: drop this CPU's contribution to the
+ * core-wide nesting count that was added by sched_core_unsafe_enter().
+ */
+void sched_core_unsafe_exit(void)
+{
+	unsigned long flags;
+	unsigned int nest;
+	struct rq *rq;
+	int cpu;
+
+	local_irq_save(flags);
+	cpu = smp_processor_id();
+	rq = cpu_rq(cpu);
+
+	/* Do nothing if core-sched disabled. */
+	if (!sched_core_enabled(rq))
+		goto ret;
+
+	/*
+	 * Can happen when a process is forked and the first return to user
+	 * mode is a syscall exit. Either way, there's nothing to do.
+	 */
+	if (rq->core_this_unsafe_nest == 0)
+		goto ret;
+
+	rq->core_this_unsafe_nest--;
+
+	/* enter() should be paired with exit() only. */
+	if (WARN_ON_ONCE(rq->core_this_unsafe_nest != 0))
+		goto ret;
+
+	raw_spin_lock(rq_lockp(rq));
+	/*
+	 * Core-wide nesting counter can never be 0 because we are
+	 * still in it on this CPU.
+	 */
+	nest = rq->core->core_unsafe_nest;
+	WARN_ON_ONCE(!nest);
+
+	/* Pair with smp_load_acquire() in sched_core_wait_till_safe(). */
+	smp_store_release(&rq->core->core_unsafe_nest, nest - 1);
+	raw_spin_unlock(rq_lockp(rq));
+ret:
+	local_irq_restore(flags);
+}
+
+void sched_core_unsafe_exit_wait(unsigned long ti_check)
+{
+	sched_core_unsafe_exit();
+	sched_core_wait_till_safe(ti_check);
+}
+
 // XXX fairness/fwd progress conditions
 /*
  * Returns
@@ -7584,6 +7795,7 @@ int sched_cpu_starting(unsigned int cpu)
 			rq = cpu_rq(i);
 			if (rq->core && rq->core == rq)
 				core_rq = rq;
+			init_irq_work(&rq->core_irq_work, sched_core_irq_work);
 		}
 
 		if (!core_rq)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 59827ae58019..dbd8416ddaba 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1058,10 +1058,13 @@ struct rq {
 	unsigned int		core_sched_seq;
 	struct rb_root		core_tree;
 	unsigned char		core_forceidle;
+	struct irq_work		core_irq_work; /* To force HT into kernel */
+	unsigned int		core_this_unsafe_nest;
 
 	/* shared state */
 	unsigned int		core_task_seq;
 	unsigned long		core_cookie;
+	unsigned int		core_unsafe_nest;
 #endif
 };
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [RFC PATCH v7 18/23] entry/idle: Enter and exit kernel protection during idle entry and exit
  2020-08-28 19:51 [RFC PATCH v7 00/23] Core scheduling v7 Julien Desfossez
                   ` (16 preceding siblings ...)
  2020-08-28 19:51 ` [RFC PATCH v7 17/23] kernel/entry: Add support for core-wide protection of kernel-mode Julien Desfossez
@ 2020-08-28 19:51 ` Julien Desfossez
  2020-08-28 19:51 ` [RFC PATCH v7 19/23] entry/kvm: Protect the kernel when entering from guest Julien Desfossez
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 68+ messages in thread
From: Julien Desfossez @ 2020-08-28 19:51 UTC (permalink / raw)
  To: Peter Zijlstra, Vineeth Pillai, Joel Fernandes, Tim Chen,
	Aaron Lu, Aubrey Li, Dhaval Giani, Chris Hyser,
	Nishanth Aravamudan
  Cc: Joel Fernandes (Google),
	mingo, tglx, pjt, torvalds, linux-kernel, fweisbec, keescook,
	kerrnel, Phil Auld, Valentin Schneider, Mel Gorman, Pawan Gupta,
	Paolo Bonzini, vineeth, Chen Yu, Christian Brauner, Agata Gruza,
	Antonio Gomez Iglesias, graf, konrad.wilk, dfaggioli, rostedt,
	derkling, benbjiang

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

Make use of the generic_idle_{enter,exit} helper functions added in an
earlier patch to enter and exit kernel protection.

On exiting idle, protection will be re-enabled.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 include/linux/entry-common.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index 2ea0e09b00d5..c833f2fda542 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -374,6 +374,9 @@ void noinstr irqentry_exit(struct pt_regs *regs, irqentry_state_t state);
  */
 static inline void generic_idle_enter(void)
 {
+	/* Entering idle ends the protected kernel region. */
+	sched_core_unsafe_exit();
+
 	rcu_idle_enter();
 }
 
@@ -383,6 +386,9 @@ static inline void generic_idle_enter(void)
 static inline void generic_idle_exit(void)
 {
 	rcu_idle_exit();
+
+	/* Exiting idle (re)starts the protected kernel region. */
+	sched_core_unsafe_enter();
 }
 
 #endif
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [RFC PATCH v7 19/23] entry/kvm: Protect the kernel when entering from guest
  2020-08-28 19:51 [RFC PATCH v7 00/23] Core scheduling v7 Julien Desfossez
                   ` (17 preceding siblings ...)
  2020-08-28 19:51 ` [RFC PATCH v7 18/23] entry/idle: Enter and exit kernel protection during idle entry and exit Julien Desfossez
@ 2020-08-28 19:51 ` Julien Desfossez
  2020-08-28 19:51 ` [RFC PATCH v7 20/23] sched/coresched: config option for kernel protection Julien Desfossez
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 68+ messages in thread
From: Julien Desfossez @ 2020-08-28 19:51 UTC (permalink / raw)
  To: Peter Zijlstra, Vineeth Pillai, Joel Fernandes, Tim Chen,
	Aaron Lu, Aubrey Li, Dhaval Giani, Chris Hyser,
	Nishanth Aravamudan
  Cc: mingo, tglx, pjt, torvalds, linux-kernel, fweisbec, keescook,
	kerrnel, Phil Auld, Valentin Schneider, Mel Gorman, Pawan Gupta,
	Paolo Bonzini, joel, vineeth, Chen Yu, Christian Brauner,
	Agata Gruza, Antonio Gomez Iglesias, graf, konrad.wilk,
	dfaggioli, rostedt, derkling, benbjiang

From: Vineeth Pillai <viremana@linux.microsoft.com>

Similar to how user-to-kernel mode transitions are protected in earlier
patches, protect entry into the kernel from guest mode as well.
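
Condensed from the vcpu_enter_guest() changes below (hook names are the
ones added by this patch):

        kvm_exit_to_guest_mode(vcpu);    /* close the unsafe section and
                                          * wait until the whole core is
                                          * safe, then enter the guest */

        /* ... VM entry, guest runs, VM exit ... */

        kvm_enter_from_guest_mode(vcpu); /* back in the kernel: reopen
                                          * the unsafe section */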

Signed-off-by: Vineeth Pillai <viremana@linux.microsoft.com>
---
 arch/x86/kvm/x86.c        |  3 +++
 include/linux/entry-kvm.h | 12 ++++++++++++
 kernel/entry/kvm.c        | 12 ++++++++++++
 3 files changed, 27 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 33945283fe07..2c2f211a556a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8526,6 +8526,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	 */
 	smp_mb__after_srcu_read_unlock();
 
+	kvm_exit_to_guest_mode(vcpu);
+
 	/*
 	 * This handles the case where a posted interrupt was
 	 * notified with kvm_vcpu_kick.
@@ -8619,6 +8621,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		}
 	}
 
+	kvm_enter_from_guest_mode(vcpu);
 	local_irq_enable();
 	preempt_enable();
 
diff --git a/include/linux/entry-kvm.h b/include/linux/entry-kvm.h
index 0cef17afb41a..32aabb7f3e6d 100644
--- a/include/linux/entry-kvm.h
+++ b/include/linux/entry-kvm.h
@@ -77,4 +77,16 @@ static inline bool xfer_to_guest_mode_work_pending(void)
 }
 #endif /* CONFIG_KVM_XFER_TO_GUEST_WORK */
 
+/**
+ * kvm_enter_from_guest_mode - Hook called just after entering kernel from guest.
+ * @vcpu:   Pointer to the current VCPU data
+ */
+void kvm_enter_from_guest_mode(struct kvm_vcpu *vcpu);
+
+/**
+ * kvm_exit_to_guest_mode - Hook called just before entering guest from kernel.
+ * @vcpu:   Pointer to the current VCPU data
+ */
+void kvm_exit_to_guest_mode(struct kvm_vcpu *vcpu);
+
 #endif
diff --git a/kernel/entry/kvm.c b/kernel/entry/kvm.c
index eb1a8a4c867c..994af4241646 100644
--- a/kernel/entry/kvm.c
+++ b/kernel/entry/kvm.c
@@ -49,3 +49,15 @@ int xfer_to_guest_mode_handle_work(struct kvm_vcpu *vcpu)
 	return xfer_to_guest_mode_work(vcpu, ti_work);
 }
 EXPORT_SYMBOL_GPL(xfer_to_guest_mode_handle_work);
+
+void kvm_enter_from_guest_mode(struct kvm_vcpu *vcpu)
+{
+	sched_core_unsafe_enter();
+}
+EXPORT_SYMBOL_GPL(kvm_enter_from_guest_mode);
+
+void kvm_exit_to_guest_mode(struct kvm_vcpu *vcpu)
+{
+	sched_core_unsafe_exit_wait(XFER_TO_GUEST_MODE_WORK);
+}
+EXPORT_SYMBOL_GPL(kvm_exit_to_guest_mode);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [RFC PATCH v7 20/23] sched/coresched: config option for kernel protection
  2020-08-28 19:51 [RFC PATCH v7 00/23] Core scheduling v7 Julien Desfossez
                   ` (18 preceding siblings ...)
  2020-08-28 19:51 ` [RFC PATCH v7 19/23] entry/kvm: Protect the kernel when entering from guest Julien Desfossez
@ 2020-08-28 19:51 ` Julien Desfossez
  2020-08-28 19:51 ` [RFC PATCH v7 21/23] sched: cgroup tagging interface for core scheduling Julien Desfossez
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 68+ messages in thread
From: Julien Desfossez @ 2020-08-28 19:51 UTC (permalink / raw)
  To: Peter Zijlstra, Vineeth Pillai, Joel Fernandes, Tim Chen,
	Aaron Lu, Aubrey Li, Dhaval Giani, Chris Hyser,
	Nishanth Aravamudan
  Cc: mingo, tglx, pjt, torvalds, linux-kernel, fweisbec, keescook,
	kerrnel, Phil Auld, Valentin Schneider, Mel Gorman, Pawan Gupta,
	Paolo Bonzini, joel, vineeth, Chen Yu, Christian Brauner,
	Agata Gruza, Antonio Gomez Iglesias, graf, konrad.wilk,
	dfaggioli, rostedt, derkling, benbjiang

From: Vineeth Pillai <viremana@linux.microsoft.com>

There are use cases where the kernel protection is not needed: for
example, using core scheduling for non-security purposes such as
dynamically isolating a core for a particular process, or
testing/benchmarking the overhead of the kernel protection.

Add a compile-time and a boot-time option to disable the feature.
CONFIG_SCHED_CORE_KERNEL_PROTECTION enables this feature at compile
time and defaults to y when CONFIG_SCHED_CORE=y. The boot parameter
sched_core_kernel_protection= controls it at boot time: booting with
sched_core_kernel_protection=0 disables the feature.

Signed-off-by: Vineeth Pillai <viremana@linux.microsoft.com>
---
 .../admin-guide/kernel-parameters.txt         |  9 +++++
 include/linux/sched.h                         |  2 +-
 kernel/Kconfig.preempt                        | 13 +++++++
 kernel/sched/core.c                           | 39 ++++++++++++++++++-
 kernel/sched/sched.h                          |  2 +
 5 files changed, 63 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a1068742a6df..01e442388e4a 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4611,6 +4611,15 @@
 
 	sbni=		[NET] Granch SBNI12 leased line adapter
 
+	sched_core_kernel_protection=
+			[SCHED_CORE_KERNEL_PROTECTION] Pause SMT siblings of
+			a core running in user mode if at least one sibling
+			of the core is running in the kernel. This guarantees
+			that kernel data is not leaked to tasks that are not
+			trusted by the kernel.
+			This feature is only valid when core scheduling is
+			enabled (CONFIG_SCHED_CORE).
+
 	sched_debug	[KNL] Enables verbose scheduler debug messages.
 
 	schedstats=	[KNL,X86] Enable or disable scheduled statistics.
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 1e04ffe689cb..4d9ae6b4dcc9 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2055,7 +2055,7 @@ int sched_trace_rq_nr_running(struct rq *rq);
 
 const struct cpumask *sched_trace_rd_span(struct root_domain *rd);
 
-#ifdef CONFIG_SCHED_CORE
+#ifdef CONFIG_SCHED_CORE_KERNEL_PROTECTION
 void sched_core_unsafe_enter(void);
 void sched_core_unsafe_exit(void);
 void sched_core_unsafe_exit_wait(unsigned long ti_check);
diff --git a/kernel/Kconfig.preempt b/kernel/Kconfig.preempt
index 4488fbf4d3a8..52f86739f910 100644
--- a/kernel/Kconfig.preempt
+++ b/kernel/Kconfig.preempt
@@ -86,3 +86,16 @@ config SCHED_CORE
 	default y
 	depends on SCHED_SMT
 
+config SCHED_CORE_KERNEL_PROTECTION
+	bool "Kernel protection for core scheduling"
+	default y
+	depends on SCHED_CORE
+	help
+	  This option enables pausing all SMT siblings of a core running in
+	  user mode when at least one sibling of the core is in the kernel.
+	  This enforces security such that information from the kernel is not
+	  leaked to untrusted tasks running on siblings. This option is valid
+	  only if Core Scheduling (CONFIG_SCHED_CORE) is enabled.
+
+	  If in doubt, select 'Y' when CONFIG_SCHED_CORE=y.
+
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0dc9172be04d..34238fd67f31 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -75,6 +75,24 @@ __read_mostly int scheduler_running;
 
 #ifdef CONFIG_SCHED_CORE
 
+#ifdef CONFIG_SCHED_CORE_KERNEL_PROTECTION
+
+DEFINE_STATIC_KEY_TRUE(sched_core_kernel_protection);
+static int __init set_sched_core_kernel_protection(char *str)
+{
+	unsigned long val = 0;
+
+	if (!str)
+		return 0;
+
+	if (!kstrtoul(str, 0, &val) && !val)
+		static_branch_disable(&sched_core_kernel_protection);
+
+	return 1;
+}
+__setup("sched_core_kernel_protection=", set_sched_core_kernel_protection);
+#endif
+
 DEFINE_STATIC_KEY_FALSE(__sched_core_enabled);
 
 /* kernel prio, less is more */
@@ -4600,6 +4618,8 @@ static inline bool cookie_match(struct task_struct *a, struct task_struct *b)
 	return a->core_cookie == b->core_cookie;
 }
 
+#ifdef CONFIG_SCHED_CORE_KERNEL_PROTECTION
+
 /*
  * Handler to attempt to enter kernel. It does nothing because the exit to
  * usermode or guest mode will do the actual work (of waiting if needed).
@@ -4609,6 +4629,11 @@ static void sched_core_irq_work(struct irq_work *work)
 	return;
 }
 
+static inline void init_sched_core_irq_work(struct rq *rq)
+{
+	init_irq_work(&rq->core_irq_work, sched_core_irq_work);
+}
+
 /*
  * sched_core_wait_till_safe - Pause the caller's hyperthread until the core
  * exits the core-wide unsafe state. Obviously the CPU calling this function
@@ -4684,6 +4709,9 @@ void sched_core_unsafe_enter(void)
 	struct rq *rq;
 	int i, cpu;
 
+	if (!static_branch_likely(&sched_core_kernel_protection))
+		return;
+
 	/* Ensure that on return to user/guest, we check whether to wait. */
 	if (current->core_cookie)
 		set_tsk_thread_flag(current, TIF_UNSAFE_RET);
@@ -4769,6 +4797,9 @@ void sched_core_unsafe_exit(void)
 	struct rq *rq;
 	int cpu;
 
+	if (!static_branch_likely(&sched_core_kernel_protection))
+		return;
+
 	local_irq_save(flags);
 	cpu = smp_processor_id();
 	rq = cpu_rq(cpu);
@@ -4807,9 +4838,15 @@ void sched_core_unsafe_exit(void)
 
 void sched_core_unsafe_exit_wait(unsigned long ti_check)
 {
+	if (!static_branch_likely(&sched_core_kernel_protection))
+		return;
+
 	sched_core_unsafe_exit();
 	sched_core_wait_till_safe(ti_check);
 }
+#else
+static inline void init_sched_core_irq_work(struct rq *rq) {}
+#endif /* CONFIG_SCHED_CORE_KERNEL_PROTECTION */
 
 // XXX fairness/fwd progress conditions
 /*
@@ -7795,7 +7832,7 @@ int sched_cpu_starting(unsigned int cpu)
 			rq = cpu_rq(i);
 			if (rq->core && rq->core == rq)
 				core_rq = rq;
-			init_irq_work(&rq->core_irq_work, sched_core_irq_work);
+			init_sched_core_irq_work(rq);
 		}
 
 		if (!core_rq)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index dbd8416ddaba..676818bdb9df 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1058,8 +1058,10 @@ struct rq {
 	unsigned int		core_sched_seq;
 	struct rb_root		core_tree;
 	unsigned char		core_forceidle;
+#ifdef CONFIG_SCHED_CORE_KERNEL_PROTECTION
 	struct irq_work		core_irq_work; /* To force HT into kernel */
 	unsigned int		core_this_unsafe_nest;
+#endif
 
 	/* shared state */
 	unsigned int		core_task_seq;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [RFC PATCH v7 21/23] sched: cgroup tagging interface for core scheduling
  2020-08-28 19:51 [RFC PATCH v7 00/23] Core scheduling v7 Julien Desfossez
                   ` (19 preceding siblings ...)
  2020-08-28 19:51 ` [RFC PATCH v7 20/23] sched/coresched: config option for kernel protection Julien Desfossez
@ 2020-08-28 19:51 ` Julien Desfossez
  2020-08-28 19:51 ` [RFC PATCH v7 22/23] Documentation: Add documentation on " Julien Desfossez
  2020-08-28 19:51 ` [RFC PATCH v7 23/23] sched: Debug bits Julien Desfossez
  22 siblings, 0 replies; 68+ messages in thread
From: Julien Desfossez @ 2020-08-28 19:51 UTC (permalink / raw)
  To: Peter Zijlstra, Vineeth Pillai, Joel Fernandes, Tim Chen,
	Aaron Lu, Aubrey Li, Dhaval Giani, Chris Hyser,
	Nishanth Aravamudan
  Cc: mingo, tglx, pjt, torvalds, linux-kernel, fweisbec, keescook,
	kerrnel, Phil Auld, Valentin Schneider, Mel Gorman, Pawan Gupta,
	Paolo Bonzini, joel, vineeth, Chen Yu, Christian Brauner,
	Agata Gruza, Antonio Gomez Iglesias, graf, konrad.wilk,
	dfaggioli, rostedt, derkling, benbjiang, Julien Desfossez,
	Vineeth Remanan Pillai

From: Peter Zijlstra <peterz@infradead.org>

Marks all tasks in a cgroup as matching for core-scheduling.

A task will need to be moved into the core scheduler queue when the cgroup
it belongs to is tagged to run with core scheduling. Similarly, the task
will need to be moved out of the core scheduler queue when the cgroup
is untagged.

Also, after a task is forked, its presence in the core scheduler queue
will need to be updated according to its new cgroup's status.

Use the stop-machine mechanism to update all tasks in a cgroup, to prevent
a new task from sneaking into the cgroup and being missed by the update
while we iterate through all the tasks in the cgroup. A more complicated
scheme could probably avoid stop-machine. Such a scheme would also need
to resolve inconsistencies between a task's cgroup core-scheduling tag
and its residency in the core scheduler queue.

We opt for the simple stop-machine mechanism for now, as it avoids
such complications.

The core scheduler has extra overhead. Enable it only on cores with
more than one SMT hardware thread.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Julien Desfossez <jdesfossez@digitalocean.com>
Signed-off-by: Vineeth Remanan Pillai <vpillai@digitalocean.com>
---
 kernel/sched/core.c  | 183 +++++++++++++++++++++++++++++++++++++++++--
 kernel/sched/sched.h |   4 +
 2 files changed, 180 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 34238fd67f31..5f77e575bbac 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -153,6 +153,37 @@ static inline bool __sched_core_less(struct task_struct *a, struct task_struct *
 	return false;
 }
 
+static bool sched_core_empty(struct rq *rq)
+{
+	return RB_EMPTY_ROOT(&rq->core_tree);
+}
+
+static bool sched_core_enqueued(struct task_struct *task)
+{
+	return !RB_EMPTY_NODE(&task->core_node);
+}
+
+static struct task_struct *sched_core_first(struct rq *rq)
+{
+	struct task_struct *task;
+
+	task = container_of(rb_first(&rq->core_tree), struct task_struct, core_node);
+	return task;
+}
+
+static void sched_core_flush(int cpu)
+{
+	struct rq *rq = cpu_rq(cpu);
+	struct task_struct *task;
+
+	while (!sched_core_empty(rq)) {
+		task = sched_core_first(rq);
+		rb_erase(&task->core_node, &rq->core_tree);
+		RB_CLEAR_NODE(&task->core_node);
+	}
+	rq->core->core_task_seq++;
+}
+
 static void sched_core_enqueue(struct rq *rq, struct task_struct *p)
 {
 	struct rb_node *parent, **node;
@@ -184,10 +215,11 @@ static void sched_core_dequeue(struct rq *rq, struct task_struct *p)
 {
 	rq->core->core_task_seq++;
 
-	if (!p->core_cookie)
+	if (!sched_core_enqueued(p))
 		return;
 
 	rb_erase(&p->core_node, &rq->core_tree);
+	RB_CLEAR_NODE(&p->core_node);
 }
 
 /*
@@ -253,9 +285,23 @@ static int __sched_core_stopper(void *data)
 
 	for_each_possible_cpu(cpu) {
 		struct rq *rq = cpu_rq(cpu);
-		rq->core_enabled = enabled;
-		if (cpu_online(cpu) && rq->core != rq)
-			sched_core_adjust_sibling_vruntime(cpu, enabled);
+
+		WARN_ON_ONCE(enabled == rq->core_enabled);
+
+		if (!enabled || (enabled && cpumask_weight(cpu_smt_mask(cpu)) >= 2)) {
+			/*
+			 * All active and migrating tasks will have already
+			 * been removed from core queue when we clear the
+			 * cgroup tags. However, dying tasks could still be
+			 * left in core queue. Flush them here.
+			 */
+			if (!enabled)
+				sched_core_flush(cpu);
+
+			rq->core_enabled = enabled;
+			if (cpu_online(cpu) && rq->core != rq)
+				sched_core_adjust_sibling_vruntime(cpu, enabled);
+		}
 	}
 
 	return 0;
@@ -266,7 +312,11 @@ static int sched_core_count;
 
 static void __sched_core_enable(void)
 {
-	// XXX verify there are no cookie tasks (yet)
+	int cpu;
+
+	/* verify there are no cookie tasks (yet) */
+	for_each_online_cpu(cpu)
+		BUG_ON(!sched_core_empty(cpu_rq(cpu)));
 
 	static_branch_enable(&__sched_core_enabled);
 	stop_machine(__sched_core_stopper, (void *)true, NULL);
@@ -274,8 +324,6 @@ static void __sched_core_enable(void)
 
 static void __sched_core_disable(void)
 {
-	// XXX verify there are no cookie tasks (left)
-
 	stop_machine(__sched_core_stopper, (void *)false, NULL);
 	static_branch_disable(&__sched_core_enabled);
 }
@@ -300,6 +348,7 @@ void sched_core_put(void)
 
 static inline void sched_core_enqueue(struct rq *rq, struct task_struct *p) { }
 static inline void sched_core_dequeue(struct rq *rq, struct task_struct *p) { }
+static bool sched_core_enqueued(struct task_struct *task) { return false; }
 
 #endif /* CONFIG_SCHED_CORE */
 
@@ -3534,6 +3583,9 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
 #ifdef CONFIG_SMP
 	plist_node_init(&p->pushable_tasks, MAX_PRIO);
 	RB_CLEAR_NODE(&p->pushable_dl_tasks);
+#endif
+#ifdef CONFIG_SCHED_CORE
+	RB_CLEAR_NODE(&p->core_node);
 #endif
 	return 0;
 }
@@ -7431,6 +7483,9 @@ void init_idle(struct task_struct *idle, int cpu)
 #ifdef CONFIG_SMP
 	sprintf(idle->comm, "%s/%d", INIT_TASK_COMM, cpu);
 #endif
+#ifdef CONFIG_SCHED_CORE
+	RB_CLEAR_NODE(&idle->core_node);
+#endif
 }
 
 #ifdef CONFIG_SMP
@@ -8416,6 +8471,15 @@ static void sched_change_group(struct task_struct *tsk, int type)
 	tg = container_of(task_css_check(tsk, cpu_cgrp_id, true),
 			  struct task_group, css);
 	tg = autogroup_task_group(tsk, tg);
+
+#ifdef CONFIG_SCHED_CORE
+	if ((unsigned long)tsk->sched_task_group == tsk->core_cookie)
+		tsk->core_cookie = 0UL;
+
+	if (tg->tagged /* && !tsk->core_cookie ? */)
+		tsk->core_cookie = (unsigned long)tg;
+#endif
+
 	tsk->sched_task_group = tg;
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
@@ -8508,6 +8572,18 @@ static int cpu_cgroup_css_online(struct cgroup_subsys_state *css)
 	return 0;
 }
 
+static void cpu_cgroup_css_offline(struct cgroup_subsys_state *css)
+{
+#ifdef CONFIG_SCHED_CORE
+	struct task_group *tg = css_tg(css);
+
+	if (tg->tagged) {
+		sched_core_put();
+		tg->tagged = 0;
+	}
+#endif
+}
+
 static void cpu_cgroup_css_released(struct cgroup_subsys_state *css)
 {
 	struct task_group *tg = css_tg(css);
@@ -9073,6 +9149,82 @@ static u64 cpu_rt_period_read_uint(struct cgroup_subsys_state *css,
 }
 #endif /* CONFIG_RT_GROUP_SCHED */
 
+#ifdef CONFIG_SCHED_CORE
+static u64 cpu_core_tag_read_u64(struct cgroup_subsys_state *css, struct cftype *cft)
+{
+	struct task_group *tg = css_tg(css);
+
+	return !!tg->tagged;
+}
+
+struct write_core_tag {
+	struct cgroup_subsys_state *css;
+	int val;
+};
+
+static int __sched_write_tag(void *data)
+{
+	struct write_core_tag *tag = (struct write_core_tag *) data;
+	struct cgroup_subsys_state *css = tag->css;
+	int val = tag->val;
+	struct task_group *tg = css_tg(tag->css);
+	struct css_task_iter it;
+	struct task_struct *p;
+
+	tg->tagged = !!val;
+
+	css_task_iter_start(css, 0, &it);
+	/*
+	 * Note: css_task_iter_next will skip dying tasks.
+	 * There could still be dying tasks left in the core queue
+	 * when we set cgroup tag to 0 when the loop is done below.
+	 */
+	while ((p = css_task_iter_next(&it))) {
+		p->core_cookie = !!val ? (unsigned long)tg : 0UL;
+
+		if (sched_core_enqueued(p)) {
+			sched_core_dequeue(task_rq(p), p);
+			if (!p->core_cookie)
+				continue;
+		}
+
+		if (sched_core_enabled(task_rq(p)) &&
+		    p->core_cookie && task_on_rq_queued(p))
+			sched_core_enqueue(task_rq(p), p);
+
+	}
+	css_task_iter_end(&it);
+
+	return 0;
+}
+
+static int cpu_core_tag_write_u64(struct cgroup_subsys_state *css, struct cftype *cft, u64 val)
+{
+	struct task_group *tg = css_tg(css);
+	struct write_core_tag wtag;
+
+	if (val > 1)
+		return -ERANGE;
+
+	if (!static_branch_likely(&sched_smt_present))
+		return -EINVAL;
+
+	if (tg->tagged == !!val)
+		return 0;
+
+	if (!!val)
+		sched_core_get();
+
+	wtag.css = css;
+	wtag.val = val;
+	stop_machine(__sched_write_tag, (void *) &wtag, NULL);
+	if (!val)
+		sched_core_put();
+
+	return 0;
+}
+#endif
+
 static struct cftype cpu_legacy_files[] = {
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	{
@@ -9109,6 +9261,14 @@ static struct cftype cpu_legacy_files[] = {
 		.write_u64 = cpu_rt_period_write_uint,
 	},
 #endif
+#ifdef CONFIG_SCHED_CORE
+	{
+		.name = "tag",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.read_u64 = cpu_core_tag_read_u64,
+		.write_u64 = cpu_core_tag_write_u64,
+	},
+#endif
 #ifdef CONFIG_UCLAMP_TASK_GROUP
 	{
 		.name = "uclamp.min",
@@ -9282,6 +9442,14 @@ static struct cftype cpu_files[] = {
 		.write_s64 = cpu_weight_nice_write_s64,
 	},
 #endif
+#ifdef CONFIG_SCHED_CORE
+	{
+		.name = "tag",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.read_u64 = cpu_core_tag_read_u64,
+		.write_u64 = cpu_core_tag_write_u64,
+	},
+#endif
 #ifdef CONFIG_CFS_BANDWIDTH
 	{
 		.name = "max",
@@ -9310,6 +9478,7 @@ static struct cftype cpu_files[] = {
 struct cgroup_subsys cpu_cgrp_subsys = {
 	.css_alloc	= cpu_cgroup_css_alloc,
 	.css_online	= cpu_cgroup_css_online,
+	.css_offline	= cpu_cgroup_css_offline,
 	.css_released	= cpu_cgroup_css_released,
 	.css_free	= cpu_cgroup_css_free,
 	.css_extra_stat_show = cpu_extra_stat_show,
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 676818bdb9df..8ef7ab3061ed 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -384,6 +384,10 @@ struct cfs_bandwidth {
 struct task_group {
 	struct cgroup_subsys_state css;
 
+#ifdef CONFIG_SCHED_CORE
+	int			tagged;
+#endif
+
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	/* schedulable entities of this group on each CPU */
 	struct sched_entity	**se;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [RFC PATCH v7 22/23] Documentation: Add documentation on core scheduling
  2020-08-28 19:51 [RFC PATCH v7 00/23] Core scheduling v7 Julien Desfossez
                   ` (20 preceding siblings ...)
  2020-08-28 19:51 ` [RFC PATCH v7 21/23] sched: cgroup tagging interface for core scheduling Julien Desfossez
@ 2020-08-28 19:51 ` Julien Desfossez
  2020-08-28 19:51 ` [RFC PATCH v7 23/23] sched: Debug bits Julien Desfossez
  22 siblings, 0 replies; 68+ messages in thread
From: Julien Desfossez @ 2020-08-28 19:51 UTC (permalink / raw)
  To: Peter Zijlstra, Vineeth Pillai, Joel Fernandes, Tim Chen,
	Aaron Lu, Aubrey Li, Dhaval Giani, Chris Hyser,
	Nishanth Aravamudan
  Cc: Joel Fernandes (Google),
	mingo, tglx, pjt, torvalds, linux-kernel, fweisbec, keescook,
	kerrnel, Phil Auld, Valentin Schneider, Mel Gorman, Pawan Gupta,
	Paolo Bonzini, vineeth, Chen Yu, Christian Brauner, Agata Gruza,
	Antonio Gomez Iglesias, graf, konrad.wilk, dfaggioli, rostedt,
	derkling, benbjiang

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

Co-developed-by: Vineeth Pillai <viremana@linux.microsoft.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Vineeth Pillai <viremana@linux.microsoft.com>
---
 .../admin-guide/hw-vuln/core-scheduling.rst   | 253 ++++++++++++++++++
 Documentation/admin-guide/hw-vuln/index.rst   |   1 +
 2 files changed, 254 insertions(+)
 create mode 100644 Documentation/admin-guide/hw-vuln/core-scheduling.rst

diff --git a/Documentation/admin-guide/hw-vuln/core-scheduling.rst b/Documentation/admin-guide/hw-vuln/core-scheduling.rst
new file mode 100644
index 000000000000..7ebece93f1d3
--- /dev/null
+++ b/Documentation/admin-guide/hw-vuln/core-scheduling.rst
@@ -0,0 +1,253 @@
+Core Scheduling
+================
+MDS and L1TF mitigations do not protect from cross-HT attacks (attacker running
+on one HT with victim running on another). For proper mitigation of this,
+core scheduling support is available via the ``CONFIG_SCHED_CORE`` config option.
+Using this feature, userspace defines groups of tasks that trust each other.
+The core scheduler uses this information to make sure that tasks that do not
+trust each other will never run simultaneously on a core while still meeting
+the scheduler's requirements.
+
+Usage
+-----
+The current interface implementation is just for testing and uses CPU
+controller CGroups; this will change soon. A ``cpu.tag`` file has been added to
+the CPU controller CGroup. If the content of this file is 1, then all the
+CGroup's tasks trust each other and are allowed to run concurrently on a core's
+hyperthreads (also called siblings).
+
+As mentioned, the interface is for testing purposes and has drawbacks. Trusted
+tasks have to be grouped into a CPU CGroup, which is not always possible
+depending on the system's existing CGroup configuration, where trusted tasks
+could already be in different CPU CGroups. Also, this feature will have a hard
+dependency on CGroups, and systems with CGroups disabled would not be able to
+use core scheduling, so another API is needed in conjunction with
+CGroups. See `Future work`_ for other API proposals.
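+
+For example, assuming the legacy (v1) CPU controller hierarchy is mounted at
+``/sys/fs/cgroup/cpu`` (a minimal sketch; the mount point and group name are
+illustrative only)::
+
+  # Create a group of mutually trusting tasks and tag it.
+  mkdir /sys/fs/cgroup/cpu/trusted_group
+  echo 1 > /sys/fs/cgroup/cpu/trusted_group/cpu.tag
+
+  # Move each trusted task into the group.
+  echo $TASK_PID > /sys/fs/cgroup/cpu/trusted_group/tasks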
+
+Design/Implementation
+---------------------
+Each task that is tagged is assigned a cookie internally in the kernel. As
+mentioned in `Usage`_, tasks with the same cookie value are assumed to trust
+each other and share a core.
+
+The basic idea is that every schedule event tries to select tasks for all the
+siblings of a core such that all the selected tasks running on a core are
+trusted (same cookie) at any point in time. Kernel threads are assumed trusted.
+The idle task is considered special, in that it trusts everything.
+
+During a schedule() event on any sibling of a core, the highest priority task for
+that core is picked and assigned to the sibling calling schedule(), if that
+sibling has it enqueued. For the rest of the siblings in the core, the highest
+priority task with the same cookie is selected if there is one runnable in
+their individual run queues. If a task with the same cookie is not available,
+the idle task is selected. The idle task is globally trusted.
+
+Once a task has been selected for all the siblings in the core, an IPI is sent to
+siblings for whom a new task was selected. Siblings, on receiving the IPI, will
+switch to the new task immediately. If an idle task is selected for a sibling,
+then the sibling is considered to be in a "forced idle" state, i.e., it may
+have tasks on its own runqueue to run, however it will still have to run idle.
+More on this in the next section.
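+
+The following is a minimal, illustrative sketch of this per-sibling selection
+(pseudocode only; helper names such as ``highest_prio_matching_cookie()`` are
+hypothetical, and the real logic lives in ``pick_next_task()``)::
+
+  max = highest_prio_task_in_core(core);          /* core-wide max */
+  for_each_cpu(i, smt_mask) {
+          p = highest_prio_matching_cookie(i, max->core_cookie);
+          if (!p)
+                  p = idle_task(i);               /* "forced idle" */
+          cpu_rq(i)->core_pick = p;
+  }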
+
+Forced-idling of tasks
+----------------------
+The scheduler tries its best to find tasks that trust each other such that all
+tasks selected to be scheduled are of the highest priority in a core.  However,
+it is possible that some runqueues had tasks that were incompatible with the
+highest priority ones in the core. Favoring security over fairness, one or more
+siblings could be forced to select a lower priority task if the highest
+priority task is not trusted with respect to the core-wide highest priority
+task.  If a sibling does not have a trusted task to run, it will be forced idle
+by the scheduler (the idle thread is scheduled to run).
+
+When the highest priority task is selected to run, a reschedule-IPI is sent to
+the sibling to force it into idle. This results in 4 cases which need to be
+considered depending on whether a VM or a regular usermode process was running
+on either HT:
+
+::
+          HT1 (attack)            HT2 (victim)
+   
+   A      idle -> user space      user space -> idle
+   
+   B      idle -> user space      guest -> idle
+   
+   C      idle -> guest           user space -> idle
+   
+   D      idle -> guest           guest -> idle
+
+Note that for better performance, we do not wait for the destination CPU
+(victim) to enter idle mode. This is because the sending of the IPI would bring
+the destination CPU immediately into kernel mode from user space, or VMEXIT
+in the case of guests. At best, this would only leak some scheduler metadata,
+which may not be worth protecting. It is also possible that the IPI is received
+too late on some architectures, but this has not been observed in the case of
+x86.
+
+Kernel protection from untrusted tasks
+--------------------------------------
+The scheduler on its own cannot protect the kernel executing concurrently with
+an untrusted task in a core. This is because the scheduler is unaware of
+interrupts/syscalls at scheduling time. To mitigate this, we send an IPI to
+siblings on kernel entry. This forces the sibling to enter kernel mode, and it
+waits before returning to user mode until all siblings of the core have left
+kernel mode.  For good performance, we send an IPI only if it is detected that
+the core is running tasks that have been marked for core scheduling. If a
+sibling is running kernel threads or is idle, no IPI is sent.
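+
+Conceptually, the protection can be sketched as below (hand-written
+pseudocode, not the actual implementation; the helpers shown here are
+hypothetical)::
+
+  /* On kernel entry (syscall/IRQ) with an untrusted task on the core: */
+  send_core_sync_ipi(core);          /* pull siblings into the kernel */
+
+  /* On return to user mode: */
+  while (other_siblings_in_kernel(core))
+          cpu_relax();               /* wait until the core is kernel-free */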
+
+For easier testing, a temporary (not intended for mainline) patch is included
+in this series to make kernel protection configurable via a
+``CONFIG_SCHED_CORE_KERNEL_PROTECTION`` config option or a
+``sched_core_kernel_protection`` boot parameter.
+
+Other ideas for kernel protection are:
+
+1. Changing interrupt affinities to a trusted core which does not execute untrusted tasks
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+By changing the interrupt affinities to a designated safe-CPU which runs
+only trusted tasks, IRQ data can be protected. One issue is that this involves
+giving up a full CPU core of the system to run safe tasks. Another is that
+per-cpu interrupts such as the local timer interrupt cannot have their
+affinity changed. Also, sensitive timer callbacks such as the random entropy timer
+can run in softirq on return from these interrupts and expose sensitive
+data. In the future, that could be mitigated by forcing softirqs into threaded
+mode by utilizing a mechanism similar to ``PREEMPT_RT``.
+
+Yet another issue with this is that, for multiqueue devices with managed
+interrupts, the IRQ affinities cannot be changed. However, it could be
+possible to force a reduced number of queues, which would in turn allow
+shielding one or two CPUs from such interrupts and queue handling for the
+price of indirection.
+
+2. Running IRQs as threaded-IRQs
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+This would result in forcing IRQs into the scheduler which would then provide
+the process-context mitigation. However, not all interrupts can be threaded.
+Also this does nothing about syscall entries.
+
+3. Kernel Address Space Isolation
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+System calls could run in a much more restricted address space which is
+guaranteed not to leak any sensitive data. There are practical limitations in
+implementing this - the main concern being how to decide on an address space
+that is guaranteed to not have any sensitive data.
+
+4. Limited cookie-based protection
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+On a system call, change the cookie to the system trusted cookie and initiate a
+schedule event. This would be better than pausing all the siblings for the
+entire duration of the system call, but would still be a huge hit to
+performance.
+
+Trust model
+-----------
+Core scheduling understands trust relationships by assignment of a cookie to
+related tasks using the above mentioned interface.  When a system with core
+scheduling boots, all tasks are considered to trust each other. This is because
+the scheduler does not have information about trust relationships. That is, all
+tasks have a default cookie value of 0. This cookie value is also considered
+the system-wide cookie value and the IRQ-pausing mitigation is avoided if
+siblings are running these cookie-0 tasks.
+
+By default, all system processes on boot are considered trusted and userspace
+has to explicitly use the interfaces mentioned above to group sets of tasks.
+Tasks within the group trust each other, but not those outside. Tasks outside
+the group don't trust the tasks inside.
+
+Limitations
+-----------
+Core scheduling tries to guarantee that only trusted tasks run concurrently on a
+core. But there could be a small window of time during which untrusted tasks run
+concurrently, or the kernel could be running concurrently with a task not
+trusted by the kernel.
+
+1. IPI processing delays
+^^^^^^^^^^^^^^^^^^^^^^^^
+Core scheduling selects only trusted tasks to run together. An IPI is used to
+notify the siblings to switch to the new task. But there could be hardware
+delays in receiving the IPI on some architectures (on x86, this has not been
+observed). This may cause an attacker task to start running on a cpu before its
+siblings receive the IPI. Even though the cache is flushed on entry to user
+mode, victim tasks on siblings may populate data in the cache and
+microarchitectural buffers after the attacker starts to run, and this is a
+possibility for data leak.
+
+Open cross-HT issues that core scheduling does not solve
+--------------------------------------------------------
+1. For MDS
+^^^^^^^^^^
+Core scheduling cannot protect against MDS attacks between an HT running in
+user mode and another running in kernel mode. Even though both HTs run tasks
+which trust each other, kernel memory is still considered untrusted. Such
+attacks are possible for any combination of sibling CPU modes (host or guest mode).
+
+2. For L1TF
+^^^^^^^^^^^
+Core scheduling cannot protect against L1TF guest attackers exploiting a
+guest or host victim. This is because the guest attacker can craft invalid
+PTEs which are not inverted due to a vulnerable guest kernel. The only
+solution is to disable EPT.
+
+For both MDS and L1TF, if the guest vCPUs are configured to not trust each
+other (by tagging them separately), then the guest-to-guest attacks would go away.
+Or it could be a system admin policy which considers guest to guest attacks as
+a guest problem.
+
+Another approach to resolve these would be to make every untrusted task on the
+system not trust every other untrusted task. While this could reduce
+parallelism of the untrusted tasks, it would still solve the above issues while
+allowing system processes (trusted tasks) to share a core.
+
+Use cases
+---------
+The main use case for Core scheduling is mitigating the cross-HT vulnerabilities
+with SMT enabled. There are other use cases where this feature could be used:
+
+- Isolating tasks that need a whole core: Examples include realtime tasks, tasks
+  that use SIMD instructions, etc.
+- Gang scheduling: Requirements for a group of tasks that need to be scheduled
+  together could also be realized using core scheduling. One example is vCPUs of
+  a VM.
+
+Future work
+-----------
+1. API Proposals
+^^^^^^^^^^^^^^^^
+
+As mentioned in the `Usage`_ section, various API proposals are listed here:
+
+- ``prctl`` : We can pass in a tag, and all tasks with the same tag set by prctl
+  form a trusted group (see the hypothetical sketch after this list).
+
+- ``sched_setattr`` : Similar to prctl, but has the advantage that tasks could be
+  tagged by other tasks with appropriate permissions.
+
+- ``Auto Tagging`` : Related tasks are tagged automatically. Relation could be,
+  threads of the same process, tasks by a user, group or session etc.
+
+- Dedicated CGroup or procfs/sysfs interface for grouping trusted tasks. This could
+  be combined with prctl/sched_setattr as well.
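+
+For illustration only, a prctl-based interface might look roughly like the
+following (entirely hypothetical; ``PR_SET_CORE_SCHED_TAG`` does not exist in
+this series)::
+
+  #include <sys/prctl.h>
+
+  /* Tasks setting the same tag value would form one trusted group. */
+  prctl(PR_SET_CORE_SCHED_TAG, tag, 0, 0, 0);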
+
+2. Auto-tagging of KVM vCPU threads
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+To make configuration easier, it would be great if KVM auto-tags vCPU threads
+such that a given VM only trusts other vCPUs of the same VM. Or something more
+aggressive like assigning a vCPU thread a unique tag.
+
+3. Auto-tagging of processes by default
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Currently core scheduling does not prevent 'unconfigured' tasks from being
+co-scheduled on the same core. In other words, everything trusts everything
+else by default. If a user wants everything default untrusted, a CONFIG option
+could be added to assign every task a unique tag by default.
+
+4. Auto-tagging on fork
+^^^^^^^^^^^^^^^^^^^^^^^
+Currently, on fork a thread is added to the same trust-domain as the parent. For
+systems which want all tasks to have a unique tag, it could be desirable to assign
+a unique tag to a task so that the parent does not trust the child by default.
+
+5. Skipping per-HT mitigations if task is trusted
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+If core scheduling is enabled, by default all tasks trust each other as
+mentioned above. In such a scenario, it may be desirable to skip the same-HT
+mitigations on return to the trusted user-mode to improve performance.
diff --git a/Documentation/admin-guide/hw-vuln/index.rst b/Documentation/admin-guide/hw-vuln/index.rst
index ca4dbdd9016d..f12cda55538b 100644
--- a/Documentation/admin-guide/hw-vuln/index.rst
+++ b/Documentation/admin-guide/hw-vuln/index.rst
@@ -15,3 +15,4 @@ are configurable at compile, boot or run time.
    tsx_async_abort
    multihit.rst
    special-register-buffer-data-sampling.rst
+   core-scheduling.rst
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [RFC PATCH v7 23/23] sched: Debug bits...
  2020-08-28 19:51 [RFC PATCH v7 00/23] Core scheduling v7 Julien Desfossez
                   ` (21 preceding siblings ...)
  2020-08-28 19:51 ` [RFC PATCH v7 22/23] Documentation: Add documentation on " Julien Desfossez
@ 2020-08-28 19:51 ` Julien Desfossez
  22 siblings, 0 replies; 68+ messages in thread
From: Julien Desfossez @ 2020-08-28 19:51 UTC (permalink / raw)
  To: Peter Zijlstra, Vineeth Pillai, Joel Fernandes, Tim Chen,
	Aaron Lu, Aubrey Li, Dhaval Giani, Chris Hyser,
	Nishanth Aravamudan
  Cc: mingo, tglx, pjt, torvalds, linux-kernel, fweisbec, keescook,
	kerrnel, Phil Auld, Valentin Schneider, Mel Gorman, Pawan Gupta,
	Paolo Bonzini, joel, vineeth, Chen Yu, Christian Brauner,
	Agata Gruza, Antonio Gomez Iglesias, graf, konrad.wilk,
	dfaggioli, rostedt, derkling, benbjiang

From: Peter Zijlstra <peterz@infradead.org>

Not-Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sched/core.c | 40 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5f77e575bbac..def25fe5e0d4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -123,6 +123,10 @@ static inline bool prio_less(struct task_struct *a, struct task_struct *b)
 
 	int pa = __task_prio(a), pb = __task_prio(b);
 
+	trace_printk("(%s/%d;%d,%Lu,%Lu) ?< (%s/%d;%d,%Lu,%Lu)\n",
+		     a->comm, a->pid, pa, a->se.vruntime, a->dl.deadline,
+		     b->comm, b->pid, pb, b->se.vruntime, b->dl.deadline);
+
 	if (-pa < -pb)
 		return true;
 
@@ -320,12 +324,16 @@ static void __sched_core_enable(void)
 
 	static_branch_enable(&__sched_core_enabled);
 	stop_machine(__sched_core_stopper, (void *)true, NULL);
+
+	printk("core sched enabled\n");
 }
 
 static void __sched_core_disable(void)
 {
 	stop_machine(__sched_core_stopper, (void *)false, NULL);
 	static_branch_disable(&__sched_core_enabled);
+
+	printk("core sched disabled\n");
 }
 
 void sched_core_get(void)
@@ -4977,6 +4985,14 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 			put_prev_task(rq, prev);
 			set_next_task(rq, next);
 		}
+
+		trace_printk("pick pre selected (%u %u %u): %s/%d %lx\n",
+			     rq->core->core_task_seq,
+			     rq->core->core_pick_seq,
+			     rq->core_sched_seq,
+			     next->comm, next->pid,
+			     next->core_cookie);
+
 		return next;
 	}
 
@@ -5062,6 +5078,8 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 			 */
 			if (i == cpu && !need_sync && !p->core_cookie) {
 				next = p;
+				trace_printk("unconstrained pick: %s/%d %lx\n",
+					     next->comm, next->pid, next->core_cookie);
 				goto done;
 			}
 
@@ -5070,6 +5088,9 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 
 			rq_i->core_pick = p;
 
+			trace_printk("cpu(%d): selected: %s/%d %lx\n",
+				     i, p->comm, p->pid, p->core_cookie);
+
 			/*
 			 * If this new candidate is of higher priority than the
 			 * previous; and they're incompatible; we need to wipe
@@ -5086,6 +5107,8 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 				rq->core->core_cookie = p->core_cookie;
 				max = p;
 
+				trace_printk("max: %s/%d %lx\n", max->comm, max->pid, max->core_cookie);
+
 				if (old_max) {
 					for_each_cpu(j, smt_mask) {
 						if (j == i)
@@ -5114,6 +5137,7 @@ next_class:;
 
 	/* Something should have been selected for current CPU */
 	WARN_ON_ONCE(!next);
+	trace_printk("picked: %s/%d %lx\n", next->comm, next->pid, next->core_cookie);
 
 	/*
 	 * Reschedule siblings
@@ -5145,12 +5169,20 @@ next_class:;
 			continue;
 
 		if (rq_i->curr != rq_i->core_pick) {
+			trace_printk("IPI(%d)\n", i);
 			WRITE_ONCE(rq_i->core_pick_seq, rq->core->core_task_seq);
 			resched_curr(rq_i);
 		}
 
 		/* Did we break L1TF mitigation requirements? */
-		WARN_ON_ONCE(!cookie_match(next, rq_i->core_pick));
+		if (unlikely(!cookie_match(next, rq_i->core_pick))) {
+			trace_printk("[%d]: cookie mismatch. %s/%d/0x%lx/0x%lx\n",
+				     rq_i->cpu, rq_i->core_pick->comm,
+				     rq_i->core_pick->pid,
+				     rq_i->core_pick->core_cookie,
+				     rq_i->core->core_cookie);
+			WARN_ON_ONCE(1);
+		}
 	}
 
 done:
@@ -5189,6 +5221,10 @@ static bool try_steal_cookie(int this, int that)
 		if (p->core_occupation > dst->idle->core_occupation)
 			goto next;
 
+		trace_printk("core fill: %s/%d (%d->%d) %d %d %lx\n",
+			     p->comm, p->pid, that, this,
+			     p->core_occupation, dst->idle->core_occupation, cookie);
+
 		p->on_rq = TASK_ON_RQ_MIGRATING;
 		deactivate_task(src, p, 0);
 		set_task_cpu(p, this);
@@ -7900,6 +7936,8 @@ int sched_cpu_starting(unsigned int cpu)
 			rq->core = core_rq;
 		}
 	}
+
+	printk("core: %d -> %d\n", cpu, cpu_of(core_rq));
 #endif /* CONFIG_SCHED_CORE */
 
 	sched_rq_cpu_starting(cpu);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [RFC PATCH v7 08/23] sched: Add core wide task selection and scheduling.
  2020-08-28 19:51 ` [RFC PATCH v7 08/23] sched: Add core wide task selection and scheduling Julien Desfossez
@ 2020-08-28 20:51   ` Peter Zijlstra
  2020-08-28 22:02     ` Vineeth Pillai
  2020-08-28 20:55   ` Peter Zijlstra
  2020-09-15 20:08   ` Joel Fernandes
  2 siblings, 1 reply; 68+ messages in thread
From: Peter Zijlstra @ 2020-08-28 20:51 UTC (permalink / raw)
  To: Julien Desfossez
  Cc: Vineeth Pillai, Joel Fernandes, Tim Chen, Aaron Lu, Aubrey Li,
	Dhaval Giani, Chris Hyser, Nishanth Aravamudan, mingo, tglx, pjt,
	torvalds, linux-kernel, fweisbec, keescook, kerrnel, Phil Auld,
	Valentin Schneider, Mel Gorman, Pawan Gupta, Paolo Bonzini, joel,
	vineeth, Chen Yu, Christian Brauner, Agata Gruza,
	Antonio Gomez Iglesias, graf, konrad.wilk, dfaggioli, rostedt,
	derkling, benbjiang, Vineeth Remanan Pillai, Aaron Lu

On Fri, Aug 28, 2020 at 03:51:09PM -0400, Julien Desfossez wrote:
> +	smt_weight = cpumask_weight(smt_mask);

> +		for_each_cpu_wrap_or(i, smt_mask, cpumask_of(cpu), cpu) {
> +			struct rq *rq_i = cpu_rq(i);
> +			struct task_struct *p;
> +
> +			/*
> +			 * During hotplug online a sibling can be added in
> +			 * the smt_mask * while we are here. If so, we would
> +			 * need to restart selection by resetting all over.
> +			 */
> +			if (unlikely(smt_weight != cpumask_weight(smt_mask)))
> +				goto retry_select;

cpumask_weight() is fairly expensive, esp. for something that should
'never' happen.

What exactly is the race here?

We'll update the cpu_smt_mask() fairly early in secondary bringup, but
where does it become a problem?

The moment the new thread starts scheduling it'll block on the common
rq->lock and then it'll cycle task_seq and do a new pick.

So where do things go side-ways?

Can we please split out this hotplug 'fix' into a separate patch with a
coherent changelog.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC PATCH v7 08/23] sched: Add core wide task selection and scheduling.
  2020-08-28 19:51 ` [RFC PATCH v7 08/23] sched: Add core wide task selection and scheduling Julien Desfossez
  2020-08-28 20:51   ` Peter Zijlstra
@ 2020-08-28 20:55   ` Peter Zijlstra
  2020-08-28 22:15     ` Vineeth Pillai
  2020-09-15 20:08   ` Joel Fernandes
  2 siblings, 1 reply; 68+ messages in thread
From: Peter Zijlstra @ 2020-08-28 20:55 UTC (permalink / raw)
  To: Julien Desfossez
  Cc: Vineeth Pillai, Joel Fernandes, Tim Chen, Aaron Lu, Aubrey Li,
	Dhaval Giani, Chris Hyser, Nishanth Aravamudan, mingo, tglx, pjt,
	torvalds, linux-kernel, fweisbec, keescook, kerrnel, Phil Auld,
	Valentin Schneider, Mel Gorman, Pawan Gupta, Paolo Bonzini, joel,
	vineeth, Chen Yu, Christian Brauner, Agata Gruza,
	Antonio Gomez Iglesias, graf, konrad.wilk, dfaggioli, rostedt,
	derkling, benbjiang, Vineeth Remanan Pillai, Aaron Lu

On Fri, Aug 28, 2020 at 03:51:09PM -0400, Julien Desfossez wrote:
> +		if (is_idle_task(rq_i->core_pick) && rq_i->nr_running)
> +			rq_i->core_forceidle = true;

Did you mean: rq_i->core_pick == rq_i->idle ?

is_idle_task() will also match idle-injection threads, which I'm not
sure we want here.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC PATCH v7 09/23] sched/fair: Fix forced idle sibling starvation corner case
  2020-08-28 19:51 ` [RFC PATCH v7 09/23] sched/fair: Fix forced idle sibling starvation corner case Julien Desfossez
@ 2020-08-28 21:25   ` Peter Zijlstra
  2020-08-28 23:24     ` Vineeth Pillai
  0 siblings, 1 reply; 68+ messages in thread
From: Peter Zijlstra @ 2020-08-28 21:25 UTC (permalink / raw)
  To: Julien Desfossez
  Cc: Vineeth Pillai, Joel Fernandes, Tim Chen, Aaron Lu, Aubrey Li,
	Dhaval Giani, Chris Hyser, Nishanth Aravamudan, mingo, tglx, pjt,
	torvalds, linux-kernel, fweisbec, keescook, kerrnel, Phil Auld,
	Valentin Schneider, Mel Gorman, Pawan Gupta, Paolo Bonzini, joel,
	vineeth, Chen Yu, Christian Brauner, Agata Gruza,
	Antonio Gomez Iglesias, graf, konrad.wilk, dfaggioli, rostedt,
	derkling, benbjiang, Vineeth Remanan Pillai

On Fri, Aug 28, 2020 at 03:51:10PM -0400, Julien Desfossez wrote:
> +/*
> + * If runqueue has only one task which used up its slice and if the sibling
> + * is forced idle, then trigger schedule to give forced idle task a chance.
> + */
> +static void resched_forceidle_sibling(struct rq *rq, struct sched_entity *se)
> +{
> +	int cpu = cpu_of(rq), sibling_cpu;
> +
> +	if (rq->cfs.nr_running > 1 || !__entity_slice_used(se))
> +		return;
> +
> +	for_each_cpu(sibling_cpu, cpu_smt_mask(cpu)) {
> +		struct rq *sibling_rq;
> +		if (sibling_cpu == cpu)
> +			continue;
> +		if (cpu_is_offline(sibling_cpu))
> +			continue;
> +
> +		sibling_rq = cpu_rq(sibling_cpu);
> +		if (sibling_rq->core_forceidle) {
> +			resched_curr(sibling_rq);
> +		}
> +	}

The only purpose of this loop seems to be to find if we have a forceidle;
surely we can avoid that by storing this during the pick.

> +}

static void task_tick_core(struct rq *rq)
{
	if (sched_core_enabled(rq))
		resched_forceidle_sibling(rq, &rq->curr->se);
}

#else

static void task_tick_core(struct rq *rq) { }

> +#endif
> +
>  /*
>   * scheduler tick hitting a task of our scheduling class.
>   *
> @@ -10654,6 +10688,11 @@ static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
>  
>  	update_misfit_status(curr, rq);
>  	update_overutilized_status(task_rq(curr));
> +
> +#ifdef CONFIG_SCHED_CORE
> +	if (sched_core_enabled(rq))
> +		resched_forceidle_sibling(rq, &curr->se);
> +#endif

Then you can ditch the #ifdef here

>  }
>  
>  /*
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC PATCH v7 11/23] sched/fair: core wide cfs task priority comparison
  2020-08-28 19:51 ` [RFC PATCH v7 11/23] sched/fair: core wide cfs task priority comparison Julien Desfossez
@ 2020-08-28 21:29   ` Peter Zijlstra
  2020-09-17 14:15     ` Vineeth Pillai
  2020-09-23  1:46     ` Joel Fernandes
  2020-09-15 21:49   ` chris hyser
  1 sibling, 2 replies; 68+ messages in thread
From: Peter Zijlstra @ 2020-08-28 21:29 UTC (permalink / raw)
  To: Julien Desfossez
  Cc: Vineeth Pillai, Joel Fernandes, Tim Chen, Aaron Lu, Aubrey Li,
	Dhaval Giani, Chris Hyser, Nishanth Aravamudan, mingo, tglx, pjt,
	torvalds, linux-kernel, fweisbec, keescook, kerrnel, Phil Auld,
	Valentin Schneider, Mel Gorman, Pawan Gupta, Paolo Bonzini, joel,
	vineeth, Chen Yu, Christian Brauner, Agata Gruza,
	Antonio Gomez Iglesias, graf, konrad.wilk, dfaggioli, rostedt,
	derkling, benbjiang, Aaron Lu



This is still a horrible patch..

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC PATCH v7 08/23] sched: Add core wide task selection and scheduling.
  2020-08-28 20:51   ` Peter Zijlstra
@ 2020-08-28 22:02     ` Vineeth Pillai
  2020-08-28 22:23       ` Joel Fernandes
  2020-08-29  7:47       ` peterz
  0 siblings, 2 replies; 68+ messages in thread
From: Vineeth Pillai @ 2020-08-28 22:02 UTC (permalink / raw)
  To: Peter Zijlstra, Julien Desfossez
  Cc: Joel Fernandes, Tim Chen, Aaron Lu, Aubrey Li, Dhaval Giani,
	Chris Hyser, Nishanth Aravamudan, mingo, tglx, pjt, torvalds,
	linux-kernel, fweisbec, keescook, kerrnel, Phil Auld,
	Valentin Schneider, Mel Gorman, Pawan Gupta, Paolo Bonzini, joel,
	vineeth, Chen Yu, Christian Brauner, Agata Gruza,
	Antonio Gomez Iglesias, graf, konrad.wilk, dfaggioli, rostedt,
	derkling, benbjiang, Vineeth Remanan Pillai, Aaron Lu



On 8/28/20 4:51 PM, Peter Zijlstra wrote:
> cpumask_weigt() is fairly expensive, esp. for something that should
> 'never' happen.
>
> What exactly is the race here?
>
> We'll update the cpu_smt_mask() fairly early in secondary bringup, but
> where does it become a problem?
>
> The moment the new thread starts scheduling it'll block on the common
> rq->lock and then it'll cycle task_seq and do a new pick.
>
> So where do things go side-ways?
During hotplug stress tests, we have noticed that while a sibling is in
pick_next_task, another sibling can go offline or come online. What
we have observed is that smt_mask gets updated underneath us even though
we hold the lock. From reading the code, it looks like we don't hold the
rq lock when the mask is updated. This extra logic was to take care of that.

> Can we please split out this hotplug 'fix' into a separate patch with a
> coherent changelog.
Sorry about this. I had posted this as separate patches in v6 list,
but merged it for v7. Will split it and have details about the fix in
next iteration.

Thanks,
Vineeth

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC PATCH v7 08/23] sched: Add core wide task selection and scheduling.
  2020-08-28 20:55   ` Peter Zijlstra
@ 2020-08-28 22:15     ` Vineeth Pillai
  0 siblings, 0 replies; 68+ messages in thread
From: Vineeth Pillai @ 2020-08-28 22:15 UTC (permalink / raw)
  To: Peter Zijlstra, Julien Desfossez
  Cc: Joel Fernandes, Tim Chen, Aaron Lu, Aubrey Li, Dhaval Giani,
	Chris Hyser, Nishanth Aravamudan, mingo, tglx, pjt, torvalds,
	linux-kernel, fweisbec, keescook, kerrnel, Phil Auld,
	Valentin Schneider, Mel Gorman, Pawan Gupta, Paolo Bonzini, joel,
	vineeth, Chen Yu, Christian Brauner, Agata Gruza,
	Antonio Gomez Iglesias, graf, konrad.wilk, dfaggioli, rostedt,
	derkling, benbjiang, Vineeth Remanan Pillai, Aaron Lu



On 8/28/20 4:55 PM, Peter Zijlstra wrote:
> On Fri, Aug 28, 2020 at 03:51:09PM -0400, Julien Desfossez wrote:
>> +		if (is_idle_task(rq_i->core_pick) && rq_i->nr_running)
>> +			rq_i->core_forceidle = true;
> Did you mean: rq_i->core_pick == rq_i->idle ?
>
> is_idle_task() will also match idle-injection threads, which I'm not
> sure we want here.
Thanks for catching this. You are right, we should be checking for
rq_i->idle. There are a couple of other places where we use is_idle_task.
Will go through them and verify if we need to fix those as well.

Thanks,
Vineeth


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC PATCH v7 08/23] sched: Add core wide task selection and scheduling.
  2020-08-28 22:02     ` Vineeth Pillai
@ 2020-08-28 22:23       ` Joel Fernandes
  2020-08-29  7:47       ` peterz
  1 sibling, 0 replies; 68+ messages in thread
From: Joel Fernandes @ 2020-08-28 22:23 UTC (permalink / raw)
  To: Vineeth Pillai, Peter Zijlstra
  Cc: Julien Desfossez, Tim Chen, Aaron Lu, Aubrey Li, Dhaval Giani,
	Chris Hyser, Nishanth Aravamudan, mingo, tglx, pjt, torvalds,
	linux-kernel, fweisbec, keescook, kerrnel, Phil Auld,
	Valentin Schneider, Mel Gorman, Pawan Gupta, Paolo Bonzini,
	vineeth, Chen Yu, Christian Brauner, Agata Gruza,
	Antonio Gomez Iglesias, graf, konrad.wilk, dfaggioli, rostedt,
	derkling, benbjiang, Vineeth Remanan Pillai, Aaron Lu

On Fri, Aug 28, 2020 at 06:02:25PM -0400, Vineeth Pillai wrote:
[...] 
> > Can we please split out this hotplug 'fix' into a separate patch with a
> > coherent changelog.
> Sorry about this. I had posted this as separate patches in v6 list,
> but merged it for v7. Will split it and have details about the fix in
> next iteration.

Thanks Vineeth. Peter, also the "v6+" series (which was some addons on v6)
details the individual hotplug changes squashed into this patch:
https://lore.kernel.org/lkml/20200815031908.1015049-9-joel@joelfernandes.org/
https://lore.kernel.org/lkml/20200815031908.1015049-11-joel@joelfernandes.org/
https://lore.kernel.org/lkml/20200815031908.1015049-12-joel@joelfernandes.org/
https://lore.kernel.org/lkml/20200815031908.1015049-13-joel@joelfernandes.org/

Agreed we can split the patches for the next series, however for final
upstream merge, I suggest we fix hotplug issues in this patch itself so that
we don't break bisectability.

thanks,

 - Joel


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC PATCH v7 09/23] sched/fair: Fix forced idle sibling starvation corner case
  2020-08-28 21:25   ` Peter Zijlstra
@ 2020-08-28 23:24     ` Vineeth Pillai
  0 siblings, 0 replies; 68+ messages in thread
From: Vineeth Pillai @ 2020-08-28 23:24 UTC (permalink / raw)
  To: Peter Zijlstra, Julien Desfossez
  Cc: Joel Fernandes, Tim Chen, Aaron Lu, Aubrey Li, Dhaval Giani,
	Chris Hyser, Nishanth Aravamudan, mingo, tglx, pjt, torvalds,
	linux-kernel, fweisbec, keescook, kerrnel, Phil Auld,
	Valentin Schneider, Mel Gorman, Pawan Gupta, Paolo Bonzini, joel,
	vineeth, Chen Yu, Christian Brauner, Agata Gruza,
	Antonio Gomez Iglesias, graf, konrad.wilk, dfaggioli, rostedt,
	derkling, benbjiang



On 8/28/20 5:25 PM, Peter Zijlstra wrote:
> The only pupose of this loop seem to be to find if we have a forceidle;
> surely we can avoid that by storing this during the pick.
The idea was to kick each cpu that was forced idle. But now, thinking
about it, we just need to kick one, as it will pick for all the siblings.
Will optimize this as you suggested.

>
> static void task_tick_core(struct rq *rq)
> {
> 	if (sched_core_enabled(rq))
> 		resched_forceidle_sibling(rq, &rq->curr->se);
> }
>
> #else
>
> static void task_tick_core(struct rq *rq) { }
>
>> +#endif
>> +
>>   /*
>>    * scheduler tick hitting a task of our scheduling class.
>>    *
>> @@ -10654,6 +10688,11 @@ static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
>>   
>>   	update_misfit_status(curr, rq);
>>   	update_overutilized_status(task_rq(curr));
>> +
>> +#ifdef CONFIG_SCHED_CORE
>> +	if (sched_core_enabled(rq))
>> +		resched_forceidle_sibling(rq, &curr->se);
>> +#endif
> Then you can ditch the #ifdef here
Makes sense, will do.

Thanks,
Vineeth


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC PATCH v7 08/23] sched: Add core wide task selection and scheduling.
  2020-08-28 22:02     ` Vineeth Pillai
  2020-08-28 22:23       ` Joel Fernandes
@ 2020-08-29  7:47       ` peterz
  2020-08-31 13:01         ` Vineeth Pillai
                           ` (3 more replies)
  1 sibling, 4 replies; 68+ messages in thread
From: peterz @ 2020-08-29  7:47 UTC (permalink / raw)
  To: Vineeth Pillai
  Cc: Julien Desfossez, Joel Fernandes, Tim Chen, Aaron Lu, Aubrey Li,
	Dhaval Giani, Chris Hyser, Nishanth Aravamudan, mingo, tglx, pjt,
	torvalds, linux-kernel, fweisbec, keescook, kerrnel, Phil Auld,
	Valentin Schneider, Mel Gorman, Pawan Gupta, Paolo Bonzini, joel,
	vineeth, Chen Yu, Christian Brauner, Agata Gruza,
	Antonio Gomez Iglesias, graf, konrad.wilk, dfaggioli, rostedt,
	derkling, benbjiang, Vineeth Remanan Pillai, Aaron Lu

On Fri, Aug 28, 2020 at 06:02:25PM -0400, Vineeth Pillai wrote:
> On 8/28/20 4:51 PM, Peter Zijlstra wrote:

> > So where do things go side-ways?

> During hotplug stress test, we have noticed that while a sibling is in
> pick_next_task, another sibling can go offline or come online. What
> we have observed is smt_mask get updated underneath us even if
> we hold the lock. From reading the code, looks like we don't hold the
> rq lock when the mask is updated. This extra logic was to take care of that.

Sure, the mask is updated async, but _where_ is the actual problem with
that?

On Fri, Aug 28, 2020 at 06:23:55PM -0400, Joel Fernandes wrote:
> Thanks Vineeth. Peter, also the "v6+" series (which were some addons on v6)
> detail the individual hotplug changes squashed into this patch:
> https://lore.kernel.org/lkml/20200815031908.1015049-9-joel@joelfernandes.org/
> https://lore.kernel.org/lkml/20200815031908.1015049-11-joel@joelfernandes.org/

That one looks fishy, the pick is core wide, making that pick_seq per rq
just doesn't make sense.

> https://lore.kernel.org/lkml/20200815031908.1015049-12-joel@joelfernandes.org/

This one reads like tinkering; there is no description of the actual
problem, just some code that makes a symptom go away.

Sure, on hotplug the smt mask can change, but only for a CPU that isn't
actually scheduling, so who cares.

/me re-reads the hotplug code...

..ooOO is the problem that we clear the cpumasks on take_cpu_down()
instead of play_dead() ?! That should be fixable.

> https://lore.kernel.org/lkml/20200815031908.1015049-13-joel@joelfernandes.org/

This is the only one that makes some sense, it makes rq->core consistent
over hotplug.

> Agreed we can split the patches for the next series, however for final
> upstream merge, I suggest we fix hotplug issues in this patch itself so that
> we don't break bisectability.

Meh, who sodding cares about hotplug :-). Also you can 'fix' such things
by making sure you can't actually enable core-sched until after
everything is in place.



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC PATCH v7 15/23] entry/idle: Add a common function for activites during idle entry/exit
  2020-08-28 19:51 ` [RFC PATCH v7 15/23] entry/idle: Add a common function for activites during idle entry/exit Julien Desfossez
@ 2020-08-30  2:17   ` kernel test robot
  0 siblings, 0 replies; 68+ messages in thread
From: kernel test robot @ 2020-08-30  2:17 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 2097 bytes --]

Hi Julien,

[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on tip/sched/core]
[also build test ERROR on kvm/linux-next linus/master v5.9-rc2 next-20200828]
[cannot apply to tip/x86/core asm-generic/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Julien-Desfossez/Core-scheduling-v7/20200829-035707
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 4fc472f1214ef75e5450f207e23ff13af6eecad4
config: mips-ip28_defconfig (attached as .config)
compiler: mips64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=mips 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from kernel/sched/idle.c:11:
>> include/linux/entry-common.h:10:10: fatal error: asm/entry-common.h: No such file or directory
      10 | #include <asm/entry-common.h>
         |          ^~~~~~~~~~~~~~~~~~~~
   compilation terminated.

# https://github.com/0day-ci/linux/commit/5546ffd4d0856e8e6ee71a8a205a623e315f0574
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Julien-Desfossez/Core-scheduling-v7/20200829-035707
git checkout 5546ffd4d0856e8e6ee71a8a205a623e315f0574
vim +10 include/linux/entry-common.h

142781e108b13b2 Thomas Gleixner 2020-07-22   9  
142781e108b13b2 Thomas Gleixner 2020-07-22 @10  #include <asm/entry-common.h>
142781e108b13b2 Thomas Gleixner 2020-07-22  11  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 12298 bytes --]

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [kernel/entry] 872a0a3f0b: will-it-scale.per_thread_ops -18.7% regression
  2020-08-28 19:51 ` [RFC PATCH v7 17/23] kernel/entry: Add support for core-wide protection of kernel-mode Julien Desfossez
@ 2020-08-30  6:50   ` kernel test robot
  2020-09-01 15:54   ` [RFC PATCH v7 17/23] kernel/entry: Add support for core-wide protection of kernel-mode Thomas Gleixner
  1 sibling, 0 replies; 68+ messages in thread
From: kernel test robot @ 2020-08-30  6:50 UTC (permalink / raw)
  To: lkp

[-- Attachment #1: Type: text/plain, Size: 42683 bytes --]

Greeting,

FYI, we noticed a -18.7% regression of will-it-scale.per_thread_ops due to commit:


commit: 872a0a3f0b5ada9898fb60c124553f1c4fcb6f7a ("[RFC PATCH v7 17/23] kernel/entry: Add support for core-wide protection of kernel-mode")
url: https://github.com/0day-ci/linux/commits/Julien-Desfossez/Core-scheduling-v7/20200829-035707
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 4fc472f1214ef75e5450f207e23ff13af6eecad4

in testcase: will-it-scale
on test machine: 192 threads Cooper Lake with 128G memory
with following parameters:

	nr_task: 100%
	mode: thread
	test: futex4
	cpufreq_governor: performance
	ucode: 0x86000017

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale



If you fix the issue, kindly add following tag
Reported-by: kernel test robot <rong.a.chen@intel.com>


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
  gcc-9/performance/x86_64-rhel-8.3/thread/100%/debian-10.4-x86_64-20200603.cgz/lkp-cpx-4s1/futex4/will-it-scale/0x86000017

commit: 
  6dfa6dbc54 ("arch/x86: Add a new TIF flag for untrusted tasks")
  872a0a3f0b ("kernel/entry: Add support for core-wide protection of kernel-mode")

6dfa6dbc5446f375 872a0a3f0b5ada9898fb60c1245 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
   5531440           -18.7%    4497285        will-it-scale.per_thread_ops
 1.062e+09           -18.7%  8.635e+08        will-it-scale.workload
   1258589 ±  4%     -10.5%    1126152        meminfo.DirectMap4k
 3.961e+08 ±  2%     -19.6%  3.185e+08 ±  3%  cpuidle.C1E.time
    909108 ±  2%     -17.3%     752096 ±  3%  cpuidle.C1E.usage
     74.00            -9.5%      67.00        vmstat.cpu.sy
     24.00           +29.2%      31.00        vmstat.cpu.us
      0.00 ± 75%      +0.0        0.00 ± 29%  mpstat.cpu.all.iowait%
     73.09            -6.9       66.21        mpstat.cpu.all.sys%
     24.72            +6.8       31.49        mpstat.cpu.all.usr%
    211219 ±  5%     -10.0%     190085        numa-numastat.node0.local_node
     22711 ± 25%     +36.4%      30974        numa-numastat.node0.other_node
    195909 ±  8%     +10.3%     216119        numa-numastat.node1.local_node
     11995 ±  2%      +2.8%      12331        proc-vmstat.nr_mapped
     31991            +1.3%      32398        proc-vmstat.nr_slab_reclaimable
     86691            +0.8%      87392        proc-vmstat.nr_slab_unreclaimable
      2552 ±  2%     +12.0%       2857 ±  8%  slabinfo.PING.active_objs
      2552 ±  2%     +12.0%       2857 ±  8%  slabinfo.PING.num_objs
      2039 ± 10%     +18.2%       2410 ±  7%  slabinfo.khugepaged_mm_slot.active_objs
      2039 ± 10%     +18.2%       2410 ±  7%  slabinfo.khugepaged_mm_slot.num_objs
      2478 ±  3%      +4.2%       2583 ±  4%  slabinfo.kmalloc-rcl-96.active_objs
      2478 ±  3%      +4.2%       2583 ±  4%  slabinfo.kmalloc-rcl-96.num_objs
    969.00           +18.4%       1147 ±  6%  slabinfo.mnt_cache.active_objs
    969.00           +18.4%       1147 ±  6%  slabinfo.mnt_cache.num_objs
      8007 ±  3%      +6.9%       8555 ±  4%  slabinfo.shmem_inode_cache.active_objs
      8007 ±  3%      +6.9%       8555 ±  4%  slabinfo.shmem_inode_cache.num_objs
      4608            +7.8%       4969 ±  6%  slabinfo.sock_inode_cache.active_objs
      4608            +7.8%       4969 ±  6%  slabinfo.sock_inode_cache.num_objs
      5500 ± 17%     +33.5%       7342 ± 23%  sched_debug.cfs_rq:/.load.avg
     10111 ±120%    +142.9%      24560 ± 44%  sched_debug.cfs_rq:/.load.stddev
      7.88 ± 10%     +26.7%       9.98 ±  2%  sched_debug.cfs_rq:/.load_avg.avg
      0.71 ± 70%    +244.7%       2.44 ±  7%  sched_debug.cfs_rq:/.removed.load_avg.avg
      8.83 ± 70%    +122.8%      19.67 ±  4%  sched_debug.cfs_rq:/.removed.load_avg.stddev
      0.16 ±109%    +699.4%       1.27 ±  6%  sched_debug.cfs_rq:/.removed.runnable_avg.avg
     25.11 ±111%    +252.4%      88.50        sched_debug.cfs_rq:/.removed.runnable_avg.max
      1.99 ±110%    +415.6%      10.26 ±  3%  sched_debug.cfs_rq:/.removed.runnable_avg.stddev
      0.16 ±110%    +701.2%       1.27 ±  6%  sched_debug.cfs_rq:/.removed.util_avg.avg
     25.06 ±111%    +253.2%      88.50        sched_debug.cfs_rq:/.removed.util_avg.max
      1.99 ±111%    +416.7%      10.26 ±  3%  sched_debug.cfs_rq:/.removed.util_avg.stddev
   2248691 ± 15%     -19.3%    1814196        sched_debug.cpu.avg_idle.max
    213241 ± 16%     -28.6%     152292        sched_debug.cpu.avg_idle.stddev
     22983 ± 41%     -50.5%      11386 ±  4%  sched_debug.cpu.max_idle_balance_cost.stddev
      0.00 ± 35%    +125.0%       0.00 ± 33%  sched_debug.cpu.nr_uninterruptible.avg
    -50.67           -51.6%     -24.50        sched_debug.cpu.nr_uninterruptible.min
     51.35 ± 26%     +16.6%      59.87 ± 24%  sched_debug.cpu.sched_goidle.avg
    246.17 ±  2%      +8.5%     267.17 ±  2%  sched_debug.cpu.ttwu_count.min
    241.72 ±  2%      +7.9%     260.75 ±  3%  sched_debug.cpu.ttwu_local.min
      0.08 ±141%    +290.8%       0.30 ± 53%  sched_debug.rt_rq:/.rt_time.avg
     14.93 ±141%    +290.4%      58.31 ± 54%  sched_debug.rt_rq:/.rt_time.max
      1.07 ±141%    +290.4%       4.20 ± 54%  sched_debug.rt_rq:/.rt_time.stddev
      3636 ± 26%     -39.7%       2193 ±  2%  numa-vmstat.node0.nr_shmem
     11483 ± 24%     -36.0%       7349 ±  5%  numa-vmstat.node0.nr_slab_reclaimable
     30465 ±  2%     -20.2%      24303 ±  6%  numa-vmstat.node0.nr_slab_unreclaimable
    780189 ±  4%     -19.9%     625104 ±  3%  numa-vmstat.node0.numa_hit
    755951 ±  4%     -26.7%     554163 ± 11%  numa-vmstat.node0.numa_local
     24237 ± 26%    +192.7%      70941 ± 53%  numa-vmstat.node0.numa_other
    200.00 ± 74%     +99.5%     399.00 ± 32%  numa-vmstat.node1.nr_active_anon
      6428 ±  3%     +16.0%       7456 ±  4%  numa-vmstat.node1.nr_kernel_stack
      2228 ±  3%     +33.8%       2980 ± 25%  numa-vmstat.node1.nr_mapped
    211.33 ± 37%    +108.2%     440.00 ± 15%  numa-vmstat.node1.nr_page_table_pages
    264.00 ± 58%    +575.4%       1783 ± 63%  numa-vmstat.node1.nr_shmem
      6815 ± 17%     +38.4%       9435 ±  3%  numa-vmstat.node1.nr_slab_reclaimable
     18073           +32.1%      23867 ± 12%  numa-vmstat.node1.nr_slab_unreclaimable
    200.00 ± 74%     +99.5%     399.00 ± 32%  numa-vmstat.node1.nr_zone_active_anon
    541783 ±  5%     +21.8%     660042 ±  9%  numa-vmstat.node1.numa_hit
    428129 ±  7%     +38.0%     590982 ± 17%  numa-vmstat.node1.numa_local
    129.67 ±104%     +90.5%     247.00 ± 64%  numa-vmstat.node2.nr_active_anon
      2156 ±  3%     +23.5%       2662 ± 16%  numa-vmstat.node2.nr_mapped
    174.67 ± 67%    +705.8%       1407 ± 84%  numa-vmstat.node2.nr_shmem
      5482 ± 31%     +31.6%       7215 ± 31%  numa-vmstat.node2.nr_slab_reclaimable
     17274 ±  9%     +11.0%      19177 ±  2%  numa-vmstat.node2.nr_slab_unreclaimable
    129.67 ±104%     +90.5%     247.00 ± 64%  numa-vmstat.node2.nr_zone_active_anon
    473925            +7.3%     508618 ±  3%  numa-vmstat.node2.numa_hit
    357492 ±  2%     +11.3%     397776 ±  2%  numa-vmstat.node2.numa_local
      3951 ± 12%     -20.9%       3125        numa-vmstat.node3.nr_mapped
     45935 ± 24%     -36.0%      29397 ±  5%  numa-meminfo.node0.KReclaimable
     45935 ± 24%     -36.0%      29397 ±  5%  numa-meminfo.node0.SReclaimable
    121865 ±  2%     -20.2%      97217 ±  6%  numa-meminfo.node0.SUnreclaim
     14543 ± 26%     -39.7%       8774 ±  2%  numa-meminfo.node0.Shmem
    167801 ±  7%     -24.5%     126615 ±  5%  numa-meminfo.node0.Slab
    801.67 ± 74%    +110.3%       1686 ± 25%  numa-meminfo.node1.Active
    801.67 ± 74%     +99.1%       1596 ± 32%  numa-meminfo.node1.Active(anon)
     91102 ± 76%     -73.8%      23869        numa-meminfo.node1.AnonHugePages
     27262 ± 17%     +38.5%      37745 ±  3%  numa-meminfo.node1.KReclaimable
      6428 ±  3%     +16.0%       7454 ±  4%  numa-meminfo.node1.KernelStack
      8843 ±  2%     +34.5%      11896 ± 25%  numa-meminfo.node1.Mapped
    848.33 ± 37%    +107.6%       1761 ± 15%  numa-meminfo.node1.PageTables
     27262 ± 17%     +38.5%      37745 ±  3%  numa-meminfo.node1.SReclaimable
     72296           +32.1%      95472 ± 12%  numa-meminfo.node1.SUnreclaim
      1058 ± 58%    +574.1%       7132 ± 63%  numa-meminfo.node1.Shmem
     99559 ±  5%     +33.8%     133217 ± 10%  numa-meminfo.node1.Slab
    519.67 ±104%     +90.2%     988.50 ± 64%  numa-meminfo.node2.Active
    519.67 ±104%     +90.2%     988.50 ± 64%  numa-meminfo.node2.Active(anon)
     21930 ± 31%     +31.6%      28864 ± 31%  numa-meminfo.node2.KReclaimable
      8553 ±  3%     +24.2%      10624 ± 16%  numa-meminfo.node2.Mapped
    625333 ±  3%     +22.9%     768755 ±  5%  numa-meminfo.node2.MemUsed
     21930 ± 31%     +31.6%      28864 ± 31%  numa-meminfo.node2.SReclaimable
     69099 ±  9%     +11.0%      76713 ±  2%  numa-meminfo.node2.SUnreclaim
    699.00 ± 66%    +705.5%       5630 ± 84%  numa-meminfo.node2.Shmem
     91030 ± 14%     +16.0%     105577 ±  7%  numa-meminfo.node2.Slab
     15679 ± 14%     -21.5%      12306 ±  2%  numa-meminfo.node3.Mapped
    728.67 ± 23%     -40.2%     435.50 ± 12%  interrupts.31:PCI-MSI.524289-edge.eth0-TxRx-0
      1003 ±111%    +187.4%       2885 ± 84%  interrupts.33:PCI-MSI.524291-edge.eth0-TxRx-2
    320.33 ±  2%     +36.1%     436.00        interrupts.CPU0.RES:Rescheduling_interrupts
      6762 ± 35%     +51.0%      10208        interrupts.CPU1.NMI:Non-maskable_interrupts
      6762 ± 35%     +51.0%      10208        interrupts.CPU1.PMI:Performance_monitoring_interrupts
    389.33 ±  8%      +5.1%     409.00 ±  8%  interrupts.CPU1.RES:Rescheduling_interrupts
      6763 ± 35%     +13.2%       7653 ± 33%  interrupts.CPU10.NMI:Non-maskable_interrupts
      6763 ± 35%     +13.2%       7653 ± 33%  interrupts.CPU10.PMI:Performance_monitoring_interrupts
      1003 ±111%    +187.4%       2885 ± 84%  interrupts.CPU11.33:PCI-MSI.524291-edge.eth0-TxRx-2
      6763 ± 35%     +13.1%       7652 ± 33%  interrupts.CPU12.NMI:Non-maskable_interrupts
      6763 ± 35%     +13.1%       7652 ± 33%  interrupts.CPU12.PMI:Performance_monitoring_interrupts
     11897 ± 27%     +29.9%      15453 ± 12%  interrupts.CPU120.CAL:Function_call_interrupts
      6763 ± 35%     +13.2%       7654 ± 33%  interrupts.CPU13.NMI:Non-maskable_interrupts
      6763 ± 35%     +13.2%       7654 ± 33%  interrupts.CPU13.PMI:Performance_monitoring_interrupts
    298.67          +153.8%     758.00 ± 59%  interrupts.CPU135.RES:Rescheduling_interrupts
    300.33           +25.4%     376.50 ± 19%  interrupts.CPU137.RES:Rescheduling_interrupts
      6763 ± 35%     +13.1%       7652 ± 33%  interrupts.CPU14.NMI:Non-maskable_interrupts
      6763 ± 35%     +13.1%       7652 ± 33%  interrupts.CPU14.PMI:Performance_monitoring_interrupts
    299.33           +27.1%     380.50 ± 21%  interrupts.CPU142.RES:Rescheduling_interrupts
     11843 ±  3%     +22.1%      14457 ±  8%  interrupts.CPU168.CAL:Function_call_interrupts
    377.00 ± 10%     -12.5%     330.00        interrupts.CPU169.RES:Rescheduling_interrupts
      5326 ± 17%     -28.7%       3798        interrupts.CPU2.CAL:Function_call_interrupts
      6762 ± 35%     +13.2%       7652 ± 33%  interrupts.CPU2.NMI:Non-maskable_interrupts
      6762 ± 35%     +13.2%       7652 ± 33%  interrupts.CPU2.PMI:Performance_monitoring_interrupts
      6892 ± 35%     +13.3%       7812 ± 33%  interrupts.CPU25.NMI:Non-maskable_interrupts
      6892 ± 35%     +13.3%       7812 ± 33%  interrupts.CPU25.PMI:Performance_monitoring_interrupts
      3558 ±  2%     +25.2%       4456 ± 19%  interrupts.CPU28.CAL:Function_call_interrupts
    301.00            +9.3%     329.00 ±  7%  interrupts.CPU35.RES:Rescheduling_interrupts
    300.33           +26.5%     380.00 ± 19%  interrupts.CPU38.RES:Rescheduling_interrupts
    339.33 ±  6%     +20.5%     409.00 ± 10%  interrupts.CPU49.RES:Rescheduling_interrupts
      6763 ± 35%     +13.1%       7652 ± 33%  interrupts.CPU5.NMI:Non-maskable_interrupts
      6763 ± 35%     +13.1%       7652 ± 33%  interrupts.CPU5.PMI:Performance_monitoring_interrupts
      4316 ± 18%     +22.2%       5274 ± 22%  interrupts.CPU50.CAL:Function_call_interrupts
    416.33 ± 33%     +36.7%     569.00 ± 28%  interrupts.CPU50.RES:Rescheduling_interrupts
    299.67           +59.3%     477.50 ± 37%  interrupts.CPU56.RES:Rescheduling_interrupts
      6763 ± 35%     +13.1%       7649 ± 33%  interrupts.CPU6.NMI:Non-maskable_interrupts
      6763 ± 35%     +13.1%       7649 ± 33%  interrupts.CPU6.PMI:Performance_monitoring_interrupts
    304.00           +52.6%     464.00 ± 33%  interrupts.CPU6.RES:Rescheduling_interrupts
      3958 ±  2%     +33.1%       5269        interrupts.CPU73.CAL:Function_call_interrupts
    728.67 ± 23%     -40.2%     435.50 ± 12%  interrupts.CPU9.31:PCI-MSI.524289-edge.eth0-TxRx-0
    320.00 ±  6%      +4.4%     334.00 ±  7%  interrupts.CPU9.RES:Rescheduling_interrupts
      0.03           +82.3%       0.06 ± 43%  perf-stat.i.MPKI
 8.386e+10            -7.3%  7.776e+10        perf-stat.i.branch-instructions
      0.33            -0.0        0.30        perf-stat.i.branch-miss-rate%
 2.694e+08           -18.0%   2.21e+08        perf-stat.i.branch-misses
   1612308 ±  2%     +16.4%    1875933 ±  2%  perf-stat.i.cache-misses
  10937449 ±  2%    +133.2%   25508216 ± 57%  perf-stat.i.cache-references
      1.20            +7.6%       1.29        perf-stat.i.cpi
    261.33            +3.6%     270.66        perf-stat.i.cpu-migrations
    453315 ±  3%     -14.1%     389557 ±  3%  perf-stat.i.cycles-between-cache-misses
      0.00 ±  3%      +0.0        0.00 ±  9%  perf-stat.i.dTLB-load-miss-rate%
    370191 ±  4%     +15.1%     425932 ±  9%  perf-stat.i.dTLB-load-misses
 1.536e+11            -3.0%   1.49e+11        perf-stat.i.dTLB-loads
      0.00            +0.0        0.00        perf-stat.i.dTLB-store-miss-rate%
    221410           +27.8%     283043        perf-stat.i.dTLB-store-misses
 1.165e+11            -3.9%   1.12e+11        perf-stat.i.dTLB-stores
 2.923e+08           -14.6%  2.495e+08        perf-stat.i.iTLB-load-misses
    131126 ±  8%     -20.6%     104171        perf-stat.i.iTLB-loads
  5.76e+11            -6.8%   5.37e+11        perf-stat.i.instructions
      1979            +9.1%       2159        perf-stat.i.instructions-per-iTLB-miss
      0.84            -7.1%       0.78        perf-stat.i.ipc
      0.80            +9.6%       0.88        perf-stat.i.metric.K/sec
      1843            -4.3%       1764        perf-stat.i.metric.M/sec
    274607            +9.5%     300796        perf-stat.i.node-load-misses
     40900 ±  5%     +14.7%      46909        perf-stat.i.node-loads
    100234           +16.5%     116732        perf-stat.i.node-store-misses
      6472 ±  7%     +13.4%       7341 ±  3%  perf-stat.i.node-stores
      0.02          +146.0%       0.05 ± 57%  perf-stat.overall.MPKI
      0.32            -0.0        0.28        perf-stat.overall.branch-miss-rate%
      1.20            +7.7%       1.29        perf-stat.overall.cpi
    425551 ±  3%     -13.3%     369134 ±  2%  perf-stat.overall.cycles-between-cache-misses
      0.00            +0.0        0.00        perf-stat.overall.dTLB-store-miss-rate%
      1972            +9.1%       2151        perf-stat.overall.instructions-per-iTLB-miss
      0.84            -7.1%       0.78        perf-stat.overall.ipc
    163678           +14.6%     187533        perf-stat.overall.path-length
 8.358e+10            -7.3%   7.75e+10        perf-stat.ps.branch-instructions
 2.685e+08           -18.0%  2.203e+08        perf-stat.ps.branch-misses
   1615571 ±  3%     +15.7%    1869561 ±  2%  perf-stat.ps.cache-misses
    257.16            +3.2%     265.37        perf-stat.ps.cpu-migrations
 1.531e+11            -3.0%  1.485e+11        perf-stat.ps.dTLB-loads
    221911           +27.2%     282364        perf-stat.ps.dTLB-store-misses
 1.161e+11            -3.9%  1.116e+11        perf-stat.ps.dTLB-stores
 2.911e+08           -14.5%  2.488e+08        perf-stat.ps.iTLB-load-misses
    130387 ±  7%     -21.6%     102175        perf-stat.ps.iTLB-loads
 5.741e+11            -6.8%  5.352e+11        perf-stat.ps.instructions
    273698            +9.4%     299401        perf-stat.ps.node-load-misses
     99772           +16.4%     116123        perf-stat.ps.node-store-misses
      6700 ±  8%     +10.2%       7384 ±  3%  perf-stat.ps.node-stores
 1.738e+14            -6.8%  1.619e+14        perf-stat.total.instructions
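
(A note on the perf-stat.overall.path-length rows above: lkp appears to
derive path-length as total retired instructions divided by the number of
benchmark operations completed, i.e. instructions per will-it-scale
operation, so the +14.6% there is the per-operation cost of the added
entry/exit work. A quick consistency check, assuming that definition:

    operations ≈ total.instructions / path-length
               ≈ 1.738e+14 / 163678 ≈ 1.06e+09   (base kernel)
               ≈ 1.619e+14 / 187533 ≈ 8.63e+08   (patched kernel)

which lines up with the ~1e+09 and ~8e+08 levels in the
will-it-scale.workload plot further down.)
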
     52.03           -10.2       41.84        perf-profile.calltrace.cycles-pp.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
     48.58            -9.9       38.64        perf-profile.calltrace.cycles-pp.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
     46.07            -9.3       36.77        perf-profile.calltrace.cycles-pp.futex_wait.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe
     35.49            -7.1       28.39        perf-profile.calltrace.cycles-pp.futex_wait_setup.futex_wait.do_futex.__x64_sys_futex.do_syscall_64
     22.65            -4.5       18.11        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.syscall
     56.15            -3.4       52.77        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
      9.53            -1.8        7.75        perf-profile.calltrace.cycles-pp.get_futex_value_locked.futex_wait_setup.futex_wait.do_futex.__x64_sys_futex
      6.35            -1.4        4.96        perf-profile.calltrace.cycles-pp.hash_futex.futex_wait_setup.futex_wait.do_futex.__x64_sys_futex
      3.88            -1.0        2.87        perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.syscall
      3.85            -1.0        2.88        perf-profile.calltrace.cycles-pp.get_futex_key.futex_wait_setup.futex_wait.do_futex.__x64_sys_futex
      4.64            -0.8        3.81        perf-profile.calltrace.cycles-pp._raw_spin_lock.futex_wait_setup.futex_wait.do_futex.__x64_sys_futex
      0.87 ±  2%      -0.2        0.70        perf-profile.calltrace.cycles-pp.testcase
     98.76            +0.3       99.08        perf-profile.calltrace.cycles-pp.syscall
      0.00            +4.0        3.98        perf-profile.calltrace.cycles-pp.sched_core_wait_till_safe.exit_to_user_mode_prepare.syscall_exit_to_user_mode.entry_SYSCALL_64_after_hwframe.syscall
      0.00            +6.1        6.06        perf-profile.calltrace.cycles-pp.sched_core_unsafe_exit.sched_core_unsafe_exit_wait.exit_to_user_mode_prepare.syscall_exit_to_user_mode.entry_SYSCALL_64_after_hwframe
     68.35            +6.2       74.56        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.syscall
      0.00            +6.4        6.45        perf-profile.calltrace.cycles-pp.sched_core_unsafe_exit_wait.exit_to_user_mode_prepare.syscall_exit_to_user_mode.entry_SYSCALL_64_after_hwframe.syscall
      3.07            +6.8        9.87        perf-profile.calltrace.cycles-pp.syscall_enter_from_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
      0.00            +7.6        7.63        perf-profile.calltrace.cycles-pp.sched_core_unsafe_enter.syscall_enter_from_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
      7.99           +10.1       18.09        perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.entry_SYSCALL_64_after_hwframe.syscall
      3.04           +11.4       14.41        perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.entry_SYSCALL_64_after_hwframe.syscall
     52.25           -10.2       42.02        perf-profile.children.cycles-pp.__x64_sys_futex
     49.01           -10.1       38.94        perf-profile.children.cycles-pp.do_futex
     46.29            -9.4       36.89        perf-profile.children.cycles-pp.futex_wait
     36.20            -7.4       28.84        perf-profile.children.cycles-pp.futex_wait_setup
     56.21            -3.3       52.95        perf-profile.children.cycles-pp.do_syscall_64
     15.22            -3.0       12.23        perf-profile.children.cycles-pp.entry_SYSCALL_64
     11.22            -2.5        8.75        perf-profile.children.cycles-pp.syscall_return_via_sysret
      9.81            -1.8        7.97        perf-profile.children.cycles-pp.get_futex_value_locked
      6.39            -1.4        4.97        perf-profile.children.cycles-pp.hash_futex
      4.02            -1.0        3.00        perf-profile.children.cycles-pp.get_futex_key
      4.81            -0.8        3.97        perf-profile.children.cycles-pp._raw_spin_lock
      1.02            -0.2        0.81        perf-profile.children.cycles-pp.testcase
      0.05            +0.0        0.06        perf-profile.children.cycles-pp.native_irq_return_iret
      0.20 ±  2%      +0.0        0.22 ±  4%  perf-profile.children.cycles-pp.tick_sched_handle
      0.20 ±  2%      +0.0        0.22 ±  4%  perf-profile.children.cycles-pp.update_process_times
      0.11 ±  4%      +0.0        0.15 ±  6%  perf-profile.children.cycles-pp.clockevents_program_event
      0.46 ±  2%      +0.1        0.53 ±  4%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.34 ±  2%      +0.1        0.42 ±  3%  perf-profile.children.cycles-pp.tick_sched_timer
      0.59 ±  2%      +0.1        0.67 ±  3%  perf-profile.children.cycles-pp.asm_call_on_stack
      0.19 ±  2%      +0.1        0.29        perf-profile.children.cycles-pp.__x86_indirect_thunk_rax
      0.19 ±  2%      +0.1        0.29        perf-profile.children.cycles-pp.ktime_get_update_offsets_now
      0.22 ±  3%      +0.1        0.32 ±  3%  perf-profile.children.cycles-pp.ktime_get
     98.97            +0.2       99.16        perf-profile.children.cycles-pp.syscall
      0.83            +0.2        1.02 ±  3%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      0.87            +0.2        1.06 ±  3%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.78 ±  2%      +0.2        0.97 ±  3%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.78            +0.2        0.97 ±  3%  perf-profile.children.cycles-pp.hrtimer_interrupt
      0.00            +4.0        3.99        perf-profile.children.cycles-pp.sched_core_wait_till_safe
     68.66            +6.1       74.78        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.00            +6.2        6.16        perf-profile.children.cycles-pp.sched_core_unsafe_exit
      0.00            +6.7        6.67        perf-profile.children.cycles-pp.sched_core_unsafe_exit_wait
      3.08            +7.1       10.17        perf-profile.children.cycles-pp.syscall_enter_from_user_mode
      0.00            +7.8        7.81        perf-profile.children.cycles-pp.sched_core_unsafe_enter
      8.42           +10.0       18.38        perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      3.19           +11.5       14.68        perf-profile.children.cycles-pp.exit_to_user_mode_prepare
     11.09            -2.4        8.68        perf-profile.self.cycles-pp.syscall_return_via_sysret
     11.57            -2.4        9.18        perf-profile.self.cycles-pp.futex_wait_setup
     11.17            -2.3        8.90        perf-profile.self.cycles-pp.syscall
      9.79            -2.0        7.79        perf-profile.self.cycles-pp.futex_wait
      9.72            -1.8        7.89        perf-profile.self.cycles-pp.get_futex_value_locked
      5.41            -1.6        3.83        perf-profile.self.cycles-pp.syscall_exit_to_user_mode
      7.71            -1.4        6.29        perf-profile.self.cycles-pp.entry_SYSCALL_64
      5.95            -1.2        4.71        perf-profile.self.cycles-pp.hash_futex
      3.85            -1.0        2.86        perf-profile.self.cycles-pp.get_futex_key
      4.60            -0.8        3.77        perf-profile.self.cycles-pp._raw_spin_lock
      2.90            -0.7        2.16        perf-profile.self.cycles-pp.do_futex
      2.87            -0.6        2.23        perf-profile.self.cycles-pp.syscall_enter_from_user_mode
      4.10            -0.6        3.51        perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.99            -0.2        0.81        perf-profile.self.cycles-pp.testcase
      3.25            -0.2        3.08        perf-profile.self.cycles-pp.__x64_sys_futex
      0.72            -0.1        0.63        perf-profile.self.cycles-pp.do_syscall_64
      0.17 ±  2%      -0.0        0.14        perf-profile.self.cycles-pp.__x86_indirect_thunk_rax
      0.05            +0.0        0.06        perf-profile.self.cycles-pp.native_irq_return_iret
      0.22 ±  2%      +0.0        0.25 ±  4%  perf-profile.self.cycles-pp.syscall@plt
      0.19            +0.1        0.28 ±  3%  perf-profile.self.cycles-pp.ktime_get_update_offsets_now
      0.22 ±  2%      +0.1        0.31 ±  3%  perf-profile.self.cycles-pp.ktime_get
      0.00            +0.5        0.53        perf-profile.self.cycles-pp.sched_core_unsafe_exit_wait
      3.01            +1.1        4.09        perf-profile.self.cycles-pp.exit_to_user_mode_prepare
      0.00            +3.9        3.92        perf-profile.self.cycles-pp.sched_core_wait_till_safe
      0.00            +6.0        6.00        perf-profile.self.cycles-pp.sched_core_unsafe_exit
      0.00            +7.6        7.64        perf-profile.self.cycles-pp.sched_core_unsafe_enter
     26170 ±  2%     -25.9%      19385        softirqs.CPU0.RCU
     18173 ±  3%      +9.2%      19842 ±  6%  softirqs.CPU10.RCU
     19378 ±  2%     +10.5%      21418 ±  8%  softirqs.CPU101.RCU
     19218 ±  2%      +6.9%      20551 ±  3%  softirqs.CPU102.RCU
     19021 ±  2%      +5.7%      20109 ±  4%  softirqs.CPU104.RCU
     19278 ±  3%      +6.2%      20471 ±  2%  softirqs.CPU105.RCU
     19151            +6.1%      20321 ±  4%  softirqs.CPU109.RCU
     18257            +6.6%      19471 ±  5%  softirqs.CPU11.RCU
     19119            +7.4%      20544 ±  4%  softirqs.CPU111.RCU
     17726            +8.1%      19156 ±  5%  softirqs.CPU112.RCU
     17653            +8.2%      19093 ±  5%  softirqs.CPU113.RCU
     17259 ±  2%      +7.7%      18584 ±  6%  softirqs.CPU114.RCU
     17630            +6.8%      18837 ±  5%  softirqs.CPU116.RCU
     17578 ±  2%      +7.3%      18857 ±  6%  softirqs.CPU117.RCU
     17501            +9.2%      19118 ±  5%  softirqs.CPU118.RCU
     17596           +30.2%      22903 ± 12%  softirqs.CPU119.RCU
     18303 ±  2%      +6.1%      19424 ±  4%  softirqs.CPU12.RCU
     24147            +8.1%      26099 ±  3%  softirqs.CPU120.RCU
     20115           +11.3%      22392 ±  3%  softirqs.CPU121.RCU
     17521            +9.8%      19231 ±  3%  softirqs.CPU122.RCU
     17888 ±  7%      +6.3%      19020 ±  4%  softirqs.CPU123.RCU
     16504 ±  7%     +15.7%      19103 ±  4%  softirqs.CPU124.RCU
     17163 ±  3%     +13.0%      19386 ±  5%  softirqs.CPU125.RCU
     16992 ±  3%     +12.2%      19069 ±  5%  softirqs.CPU126.RCU
     17026 ±  2%     +13.7%      19367 ±  8%  softirqs.CPU127.RCU
     16762 ±  3%     +15.0%      19269 ±  5%  softirqs.CPU128.RCU
     16815 ±  5%     +12.0%      18840 ±  6%  softirqs.CPU129.RCU
     18232 ±  3%      +9.0%      19878 ±  5%  softirqs.CPU13.RCU
     16740 ±  2%     +12.4%      18820 ±  5%  softirqs.CPU130.RCU
     16407 ±  4%     +11.9%      18360 ±  5%  softirqs.CPU131.RCU
     16757 ±  3%     +13.9%      19092 ±  3%  softirqs.CPU132.RCU
     17504 ±  5%      +8.7%      19030 ±  4%  softirqs.CPU133.RCU
     16654 ±  3%     +17.2%      19521        softirqs.CPU134.RCU
     17320 ±  9%      +9.6%      18979 ±  3%  softirqs.CPU135.RCU
     16518 ±  2%     +13.5%      18745 ±  3%  softirqs.CPU136.RCU
     16438 ±  2%     +14.6%      18831 ±  3%  softirqs.CPU137.RCU
     16643 ±  2%     +15.4%      19203 ±  2%  softirqs.CPU138.RCU
     16528 ±  2%     +14.9%      18985 ±  5%  softirqs.CPU139.RCU
     17876 ±  4%     +11.9%      20007 ±  3%  softirqs.CPU14.RCU
     16636 ±  3%     +14.8%      19096 ±  5%  softirqs.CPU140.RCU
     16794 ±  2%     +11.7%      18762 ±  5%  softirqs.CPU141.RCU
     16490 ±  3%     +12.0%      18474 ±  6%  softirqs.CPU142.RCU
     16104           +18.0%      19006 ±  6%  softirqs.CPU143.RCU
     23784 ±  3%     +15.6%      27501 ±  2%  softirqs.CPU144.RCU
     17043 ±  3%     +14.3%      19482 ±  3%  softirqs.CPU146.RCU
     16984 ±  5%     +13.4%      19254 ±  4%  softirqs.CPU147.RCU
     16901 ±  3%     +13.1%      19113 ±  4%  softirqs.CPU148.RCU
     16964 ±  4%     +13.8%      19304 ±  5%  softirqs.CPU149.RCU
     17880 ±  5%     +14.9%      20548 ±  2%  softirqs.CPU15.RCU
     16643 ±  4%     +14.7%      19094 ±  6%  softirqs.CPU150.RCU
     16866 ±  4%     +21.1%      20426        softirqs.CPU151.RCU
     16924 ±  5%     +14.5%      19380 ±  4%  softirqs.CPU152.RCU
     16474 ± 12%     +15.7%      19061 ±  5%  softirqs.CPU153.RCU
     17015 ±  5%     +12.2%      19083 ±  4%  softirqs.CPU154.RCU
     16781 ±  4%     +15.5%      19384 ±  5%  softirqs.CPU155.RCU
     16848 ±  5%     +15.3%      19420 ±  4%  softirqs.CPU156.RCU
     16628 ±  4%     +20.4%      20014        softirqs.CPU157.RCU
     16774 ±  4%     +12.9%      18938 ±  5%  softirqs.CPU158.RCU
     16546 ±  3%     +16.7%      19302 ±  4%  softirqs.CPU159.RCU
     17216 ±  2%      +9.6%      18871 ±  4%  softirqs.CPU16.RCU
     17049 ±  5%     +13.5%      19345 ±  3%  softirqs.CPU160.RCU
     16708 ±  4%     +14.7%      19163 ±  3%  softirqs.CPU161.RCU
     17086 ±  5%     +14.3%      19534 ±  5%  softirqs.CPU162.RCU
     17790 ±  2%     +10.3%      19629 ±  4%  softirqs.CPU163.RCU
     17224 ±  5%     +11.8%      19257 ±  5%  softirqs.CPU164.RCU
     17062 ±  5%     +14.3%      19507 ±  4%  softirqs.CPU165.RCU
     16947 ±  6%     +13.2%      19191 ±  2%  softirqs.CPU166.RCU
     16987 ±  5%     +13.6%      19290 ±  3%  softirqs.CPU167.RCU
     22742 ±  2%     +11.0%      25246 ±  5%  softirqs.CPU168.RCU
     19207 ±  2%      +9.5%      21028 ±  4%  softirqs.CPU169.RCU
     17283            +8.9%      18828 ±  4%  softirqs.CPU17.RCU
     16582 ±  2%     +12.9%      18724 ±  4%  softirqs.CPU170.RCU
     16103 ±  3%     +12.1%      18050 ±  4%  softirqs.CPU171.RCU
     15893 ±  2%     +13.8%      18080 ±  5%  softirqs.CPU172.RCU
     16027 ±  3%     +13.9%      18253 ±  5%  softirqs.CPU173.RCU
     16149 ±  3%     +12.9%      18234 ±  5%  softirqs.CPU174.RCU
     16192 ±  3%     +12.2%      18166 ±  4%  softirqs.CPU175.RCU
     15735 ±  3%     +14.3%      17992 ±  4%  softirqs.CPU176.RCU
     15614 ±  4%     +17.4%      18333 ±  5%  softirqs.CPU177.RCU
     15958 ±  3%     +12.9%      18011 ±  7%  softirqs.CPU178.RCU
     15955 ±  2%     +14.9%      18334 ±  5%  softirqs.CPU179.RCU
     17549 ±  2%      +9.9%      19293 ±  6%  softirqs.CPU18.RCU
     15781 ±  2%     +14.0%      17989 ±  4%  softirqs.CPU180.RCU
     15734 ±  2%     +14.8%      18061 ±  4%  softirqs.CPU181.RCU
     15890 ±  4%     +13.2%      17991 ±  5%  softirqs.CPU182.RCU
     15543 ±  2%     +15.2%      17906 ±  4%  softirqs.CPU183.RCU
     15836 ±  2%     +14.6%      18146 ±  5%  softirqs.CPU184.RCU
     15842 ±  4%     +15.8%      18339 ±  6%  softirqs.CPU185.RCU
     15881 ±  3%     +12.9%      17926 ±  5%  softirqs.CPU186.RCU
     15808 ±  3%     +13.7%      17970 ±  3%  softirqs.CPU187.RCU
     15628 ±  3%     +13.7%      17763 ±  5%  softirqs.CPU188.RCU
     15489 ±  2%     +14.1%      17671 ±  6%  softirqs.CPU189.RCU
     17456            +8.0%      18850 ±  5%  softirqs.CPU19.RCU
     15590 ±  3%     +14.6%      17869 ±  6%  softirqs.CPU190.RCU
     15893 ±  3%     +14.4%      18180 ±  6%  softirqs.CPU191.RCU
     18091 ±  2%     +16.4%      21062        softirqs.CPU2.RCU
     17324           +10.1%      19071 ±  4%  softirqs.CPU20.RCU
     17347 ±  2%      +8.9%      18897 ±  5%  softirqs.CPU21.RCU
     17457           +12.4%      19630 ±  7%  softirqs.CPU22.RCU
     17333            +9.0%      18895 ±  5%  softirqs.CPU23.RCU
     17175 ±  2%     +12.8%      19367 ±  5%  softirqs.CPU24.RCU
     17927 ±  3%     +13.1%      20273 ±  3%  softirqs.CPU25.RCU
     17675           +11.4%      19683 ±  4%  softirqs.CPU26.RCU
     17189           +14.9%      19746 ±  6%  softirqs.CPU27.RCU
     16034 ± 10%     +43.7%      23033 ± 21%  softirqs.CPU28.RCU
     16986 ±  4%     +15.0%      19537 ±  6%  softirqs.CPU29.RCU
     16726 ± 11%     +18.5%      19816 ±  6%  softirqs.CPU3.RCU
     17017 ±  3%     +14.3%      19447 ±  4%  softirqs.CPU30.RCU
     17191 ±  2%     +11.2%      19111 ±  6%  softirqs.CPU31.RCU
     17214 ±  3%     +14.0%      19621 ±  4%  softirqs.CPU32.RCU
     17461 ±  5%      +8.4%      18935 ±  5%  softirqs.CPU33.RCU
     17190 ±  3%     +12.5%      19334 ±  5%  softirqs.CPU34.RCU
     17021 ±  4%     +11.5%      18985 ±  5%  softirqs.CPU35.RCU
     16965 ±  3%     +15.7%      19620 ±  4%  softirqs.CPU36.RCU
     17206 ±  2%     +12.9%      19418 ±  4%  softirqs.CPU37.RCU
     17117 ±  4%     +13.2%      19378 ±  4%  softirqs.CPU38.RCU
     17222 ±  4%     +14.3%      19679 ±  2%  softirqs.CPU39.RCU
     18458           +10.7%      20428 ±  2%  softirqs.CPU4.RCU
     17167 ±  3%     +13.0%      19401 ±  3%  softirqs.CPU40.RCU
     17065 ±  2%     +14.4%      19527 ±  3%  softirqs.CPU41.RCU
     16919 ±  2%     +14.6%      19388 ±  3%  softirqs.CPU42.RCU
     17000 ±  2%     +13.8%      19341 ±  5%  softirqs.CPU43.RCU
     16993 ±  3%     +15.4%      19610 ±  5%  softirqs.CPU44.RCU
     17191 ±  3%     +12.8%      19400 ±  3%  softirqs.CPU45.RCU
     16927 ±  2%     +14.1%      19317 ±  3%  softirqs.CPU47.RCU
     17902 ±  5%     +12.7%      20175 ±  3%  softirqs.CPU48.RCU
     18438 ±  5%     +14.2%      21048 ±  2%  softirqs.CPU49.RCU
     18220 ±  2%     +11.1%      20234 ±  3%  softirqs.CPU5.RCU
     17911 ±  3%     +12.9%      20224 ±  3%  softirqs.CPU50.RCU
     17391 ±  5%     +14.7%      19953 ±  4%  softirqs.CPU51.RCU
     17140 ±  4%     +14.3%      19598 ±  4%  softirqs.CPU52.RCU
     17551 ±  5%     +13.7%      19958 ±  5%  softirqs.CPU53.RCU
     17632 ±  5%     +14.0%      20109 ±  6%  softirqs.CPU54.RCU
     17564 ±  5%     +13.6%      19957 ±  6%  softirqs.CPU55.RCU
     17555 ±  5%     +18.8%      20849 ±  7%  softirqs.CPU56.RCU
     16210 ± 16%     +23.3%      19992 ±  6%  softirqs.CPU57.RCU
     17257 ±  5%     +13.2%      19530 ±  4%  softirqs.CPU58.RCU
     17128 ±  5%     +15.6%      19796 ±  4%  softirqs.CPU59.RCU
     17745 ±  3%     +11.6%      19805 ±  4%  softirqs.CPU6.RCU
     17605 ±  5%     +14.1%      20086 ±  4%  softirqs.CPU60.RCU
     17523 ±  5%     +13.4%      19876 ±  3%  softirqs.CPU61.RCU
     17516 ±  5%     +12.6%      19718 ±  5%  softirqs.CPU62.RCU
     17118 ±  4%     +15.8%      19827 ±  5%  softirqs.CPU63.RCU
     17859 ±  4%     +14.3%      20405 ±  4%  softirqs.CPU64.RCU
     17568 ±  4%     +15.0%      20196 ±  4%  softirqs.CPU65.RCU
     17541 ±  5%     +14.4%      20063 ±  4%  softirqs.CPU66.RCU
     17612 ±  4%     +14.4%      20148 ±  4%  softirqs.CPU67.RCU
     17751 ±  5%     +13.0%      20054 ±  4%  softirqs.CPU68.RCU
     17608 ±  4%     +14.1%      20093 ±  4%  softirqs.CPU69.RCU
     17606 ±  2%     +15.0%      20251 ±  8%  softirqs.CPU7.RCU
     17601 ±  6%     +13.4%      19953 ±  3%  softirqs.CPU70.RCU
     17991 ±  5%     +12.2%      20185 ±  4%  softirqs.CPU71.RCU
     16599 ±  3%     +14.4%      18986 ±  5%  softirqs.CPU72.RCU
     17608 ±  4%     +13.1%      19921 ±  4%  softirqs.CPU73.RCU
     16939 ±  2%     +13.2%      19177 ±  4%  softirqs.CPU74.RCU
     16680 ±  2%     +22.7%      20472 ± 12%  softirqs.CPU75.RCU
     16853           +12.2%      18907 ±  5%  softirqs.CPU76.RCU
     16687 ±  3%     +13.5%      18937 ±  5%  softirqs.CPU77.RCU
     17338 ±  8%      +8.1%      18747 ±  5%  softirqs.CPU78.RCU
     17418 ±  8%      +7.3%      18686 ±  4%  softirqs.CPU79.RCU
     17883 ±  3%     +13.1%      20217 ±  7%  softirqs.CPU8.RCU
     16480 ±  3%     +13.1%      18646 ±  4%  softirqs.CPU80.RCU
     16515 ±  4%     +15.0%      18992 ±  4%  softirqs.CPU81.RCU
     16657 ±  3%     +12.6%      18754 ±  6%  softirqs.CPU82.RCU
     16524 ±  2%     +14.9%      18992 ±  4%  softirqs.CPU83.RCU
     16399 ±  2%     +14.0%      18687 ±  4%  softirqs.CPU84.RCU
     16543           +13.5%      18784 ±  4%  softirqs.CPU85.RCU
     16613           +13.1%      18793 ±  4%  softirqs.CPU86.RCU
     16495           +13.8%      18764 ±  4%  softirqs.CPU87.RCU
     16575 ±  2%     +14.0%      18903 ±  6%  softirqs.CPU88.RCU
     16446 ±  2%     +15.5%      18988 ±  5%  softirqs.CPU89.RCU
     18121 ±  2%      +8.8%      19709 ±  5%  softirqs.CPU9.RCU
     16647 ±  3%     +12.4%      18714 ±  5%  softirqs.CPU90.RCU
     16591 ±  3%     +13.1%      18760 ±  4%  softirqs.CPU91.RCU
     16443 ±  3%     +13.2%      18611 ±  5%  softirqs.CPU92.RCU
     16583 ±  2%     +12.0%      18575 ±  5%  softirqs.CPU93.RCU
     16812 ±  6%     +10.7%      18613 ±  5%  softirqs.CPU94.RCU
     16578 ±  4%     +12.4%      18640 ±  5%  softirqs.CPU95.RCU
     19701 ±  2%      +6.3%      20943 ±  4%  softirqs.CPU98.RCU
   3352663 ±  2%     +12.0%    3753392 ±  4%  softirqs.RCU
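
All of the comparison rows above share one fixed layout: base value,
optional "± N%" stddev, a delta (relative "%" in most tables, absolute for
the perf-profile cycles-pp rows), the patched value, optional stddev, and
the metric name. A minimal sketch of how such rows could be picked apart
mechanically, e.g. to diff two reports (LINE_RE and parse_line below are
ad-hoc names for illustration, not helpers from lkp-tests):

import re

# Ad-hoc sketch (not part of lkp-tests): split one comparison row such as
#     91102 ± 76%     -73.8%      23869   numa-meminfo.node1.AnonHugePages
# into its columns.  The delta is a relative change ("%") in most tables,
# but an absolute difference in the perf-profile cycles-pp tables.
LINE_RE = re.compile(
    r"^\s*(?P<base>[\d.]+(?:e[+-]?\d+)?)"   # value on the base kernel
    r"(?:\s*±\s*(?P<base_dev>\d+)%)?"       # optional stddev
    r"\s+(?P<delta>[+-][\d.]+%?)"           # change between the two kernels
    r"\s+(?P<new>[\d.]+(?:e[+-]?\d+)?)"     # value on the patched kernel
    r"(?:\s*±\s*(?P<new_dev>\d+)%)?"        # optional stddev
    r"\s+(?P<metric>\S+)\s*$")              # metric name

def parse_line(line):
    """Return the columns of one comparison row, or None for other lines."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    row = m.groupdict()
    row["base"], row["new"] = float(row["base"]), float(row["new"])
    return row

row = parse_line("  3352663 ± 2%  +12.0%  3753392 ± 4%  softirqs.RCU")
# -> base=3352663.0, base_dev='2', delta='+12.0%',
#    new=3753392.0, new_dev='4', metric='softirqs.RCU'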


                                                                                
                            will-it-scale.per_thread_ops                        
                                                                                
  6e+06 +-------------------------------------------------------------------+   
        |.+..+.+.+..+.+.+..+   +..+.+.+..+.+..+.+.+..+.+.+                  |   
  5e+06 |-+                :   :                                            |   
        | O  O O O    O    :   :                     O        O             |   
        |               O  : O :  O O O  O O  O O O      O  O      O O O    |   
  4e+06 |-+                 : :                                             |   
        |                   : :                                             |   
  3e+06 |-+                 : :                                             |   
        |                   : :                                             |   
  2e+06 |-+                 : :                                             |   
        |                   : :                                             |   
        |                    :                                              |   
  1e+06 |-+                  :                                              |   
        |                    :                                              |   
      0 +-------------------------------------------------------------------+   
                                                                                
                                                                                                                                                                
                                will-it-scale.workload                          
                                                                                
  1.2e+09 +-----------------------------------------------------------------+   
          |                                                                 |   
    1e+09 |.+..+.+.+.+..+.+.+    +.+.+..+.+.+.+..+.+.+..+.+                 |   
          |                 :    :                                          |   
          | O  O O O    O O : O  : O O  O O O O  O O O    O O O    O O O    |   
    8e+08 |-+                :  :                                           |   
          |                  :  :                                           |   
    6e+08 |-+                :  :                                           |   
          |                  :  :                                           |   
    4e+08 |-+                : :                                            |   
          |                  : :                                            |   
          |                   ::                                            |   
    2e+08 |-+                 ::                                            |   
          |                   :                                             |   
        0 +-----------------------------------------------------------------+   
                                                                                
                                                                                
[*] bisect-good sample
[O] bisect-bad  sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Rong Chen


[-- Attachment #2: config-5.9.0-rc1-00138-g872a0a3f0b5ada --]
[-- Type: text/plain, Size: 170235 bytes --]

#
# Automatically generated file; DO NOT EDIT.
# Linux/x86_64 5.9.0-rc1 Kernel Configuration
#
CONFIG_CC_VERSION_TEXT="gcc-9 (Debian 9.3.0-15) 9.3.0"
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=90300
CONFIG_LD_VERSION=235000000
CONFIG_CLANG_VERSION=0
CONFIG_CC_CAN_LINK=y
CONFIG_CC_CAN_LINK_STATIC=y
CONFIG_CC_HAS_ASM_GOTO=y
CONFIG_CC_HAS_ASM_INLINE=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_TABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_BUILD_SALT=""
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_HAVE_KERNEL_ZSTD=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
# CONFIG_KERNEL_ZSTD is not set
CONFIG_DEFAULT_INIT=""
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
# CONFIG_WATCH_QUEUE is not set
CONFIG_CROSS_MEMORY_ATTACH=y
# CONFIG_USELIB is not set
CONFIG_AUDIT=y
CONFIG_HAVE_ARCH_AUDITSYSCALL=y
CONFIG_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_GENERIC_IRQ_MIGRATION=y
CONFIG_GENERIC_IRQ_INJECTION=y
CONFIG_HARDIRQS_SW_RESEND=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
CONFIG_IRQ_MSI_IOMMU=y
CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR=y
CONFIG_GENERIC_IRQ_RESERVATION_MODE=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
# CONFIG_GENERIC_IRQ_DEBUGFS is not set
# end of IRQ subsystem

CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_INIT=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_HAVE_POSIX_CPU_TIMERS_TASK_WORK=y
CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
# CONFIG_NO_HZ_IDLE is not set
CONFIG_NO_HZ_FULL=y
CONFIG_CONTEXT_TRACKING=y
# CONFIG_CONTEXT_TRACKING_FORCE is not set
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
# end of Timers subsystem

# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_COUNT=y
CONFIG_SCHED_CORE=y

#
# CPU/Task time and stats accounting
#
CONFIG_VIRT_CPU_ACCOUNTING=y
CONFIG_VIRT_CPU_ACCOUNTING_GEN=y
CONFIG_IRQ_TIME_ACCOUNTING=y
CONFIG_HAVE_SCHED_AVG_IRQ=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y
# CONFIG_PSI is not set
# end of CPU/Task time and stats accounting

CONFIG_CPU_ISOLATION=y

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TREE_SRCU=y
CONFIG_TASKS_RCU_GENERIC=y
CONFIG_TASKS_RCU=y
CONFIG_TASKS_RUDE_RCU=y
CONFIG_TASKS_TRACE_RCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
CONFIG_RCU_NOCB_CPU=y
# end of RCU Subsystem

CONFIG_BUILD_BIN2C=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
# CONFIG_IKHEADERS is not set
CONFIG_LOG_BUF_SHIFT=20
CONFIG_LOG_CPU_MAX_BUF_SHIFT=12
CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y

#
# Scheduler features
#
# CONFIG_UCLAMP_TASK is not set
# end of Scheduler features

CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH=y
CONFIG_CC_HAS_INT128=y
CONFIG_ARCH_SUPPORTS_INT128=y
CONFIG_NUMA_BALANCING=y
CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y
CONFIG_CGROUPS=y
CONFIG_PAGE_COUNTER=y
CONFIG_MEMCG=y
CONFIG_MEMCG_SWAP=y
CONFIG_MEMCG_KMEM=y
CONFIG_BLK_CGROUP=y
CONFIG_CGROUP_WRITEBACK=y
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_CFS_BANDWIDTH=y
CONFIG_RT_GROUP_SCHED=y
CONFIG_CGROUP_PIDS=y
CONFIG_CGROUP_RDMA=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_HUGETLB=y
CONFIG_CPUSETS=y
CONFIG_PROC_PID_CPUSET=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_CGROUP_PERF=y
CONFIG_CGROUP_BPF=y
# CONFIG_CGROUP_DEBUG is not set
CONFIG_SOCK_CGROUP_DATA=y
CONFIG_NAMESPACES=y
CONFIG_UTS_NS=y
CONFIG_TIME_NS=y
CONFIG_IPC_NS=y
CONFIG_USER_NS=y
CONFIG_PID_NS=y
CONFIG_NET_NS=y
# CONFIG_CHECKPOINT_RESTORE is not set
CONFIG_SCHED_AUTOGROUP=y
# CONFIG_SYSFS_DEPRECATED is not set
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_RD_GZIP=y
CONFIG_RD_BZIP2=y
CONFIG_RD_LZMA=y
CONFIG_RD_XZ=y
CONFIG_RD_LZO=y
CONFIG_RD_LZ4=y
CONFIG_RD_ZSTD=y
# CONFIG_BOOT_CONFIG is not set
CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
CONFIG_HAVE_UID16=y
CONFIG_SYSCTL_EXCEPTION_TRACE=y
CONFIG_HAVE_PCSPKR_PLATFORM=y
CONFIG_BPF=y
# CONFIG_EXPERT is not set
CONFIG_UID16=y
CONFIG_MULTIUSER=y
CONFIG_SGETMASK_SYSCALL=y
CONFIG_SYSFS_SYSCALL=y
CONFIG_FHANDLE=y
CONFIG_POSIX_TIMERS=y
CONFIG_PRINTK=y
CONFIG_PRINTK_NMI=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_PCSPKR_PLATFORM=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_FUTEX_PI=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_AIO=y
CONFIG_IO_URING=y
CONFIG_ADVISE_SYSCALLS=y
CONFIG_HAVE_ARCH_USERFAULTFD_WP=y
CONFIG_MEMBARRIER=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_KALLSYMS_ABSOLUTE_PERCPU=y
CONFIG_KALLSYMS_BASE_RELATIVE=y
# CONFIG_BPF_LSM is not set
CONFIG_BPF_SYSCALL=y
CONFIG_ARCH_WANT_DEFAULT_BPF_JIT=y
CONFIG_BPF_JIT_ALWAYS_ON=y
CONFIG_BPF_JIT_DEFAULT_ON=y
CONFIG_USERFAULTFD=y
CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE=y
CONFIG_RSEQ=y
# CONFIG_EMBEDDED is not set
CONFIG_HAVE_PERF_EVENTS=y

#
# Kernel Performance Events And Counters
#
CONFIG_PERF_EVENTS=y
# CONFIG_DEBUG_PERF_USE_VMALLOC is not set
# end of Kernel Performance Events And Counters

CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLUB_DEBUG=y
# CONFIG_COMPAT_BRK is not set
# CONFIG_SLAB is not set
CONFIG_SLUB=y
CONFIG_SLAB_MERGE_DEFAULT=y
CONFIG_SLAB_FREELIST_RANDOM=y
# CONFIG_SLAB_FREELIST_HARDENED is not set
CONFIG_SHUFFLE_PAGE_ALLOCATOR=y
CONFIG_SLUB_CPU_PARTIAL=y
CONFIG_SYSTEM_DATA_VERIFICATION=y
CONFIG_PROFILING=y
CONFIG_TRACEPOINTS=y
# end of General setup

CONFIG_64BIT=y
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf64-x86-64"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_MMU=y
CONFIG_ARCH_MMAP_RND_BITS_MIN=28
CONFIG_ARCH_MMAP_RND_BITS_MAX=32
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_ARCH_HAS_FILTER_PGPROT=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ZONE_DMA32=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_HAVE_INTEL_TXT=y
CONFIG_X86_64_SMP=y
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_DYNAMIC_PHYSICAL_MASK=y
CONFIG_PGTABLE_LEVELS=5
CONFIG_CC_HAS_SANE_STACKPROTECTOR=y

#
# Processor type and features
#
CONFIG_ZONE_DMA=y
CONFIG_SMP=y
CONFIG_X86_FEATURE_NAMES=y
CONFIG_X86_X2APIC=y
CONFIG_X86_MPPARSE=y
# CONFIG_GOLDFISH is not set
CONFIG_RETPOLINE=y
CONFIG_X86_CPU_RESCTRL=y
CONFIG_X86_EXTENDED_PLATFORM=y
# CONFIG_X86_NUMACHIP is not set
# CONFIG_X86_VSMP is not set
CONFIG_X86_UV=y
# CONFIG_X86_GOLDFISH is not set
# CONFIG_X86_INTEL_MID is not set
CONFIG_X86_INTEL_LPSS=y
CONFIG_X86_AMD_PLATFORM_DEVICE=y
CONFIG_IOSF_MBI=y
# CONFIG_IOSF_MBI_DEBUG is not set
CONFIG_X86_SUPPORTS_MEMORY_FAILURE=y
# CONFIG_SCHED_OMIT_FRAME_POINTER is not set
CONFIG_HYPERVISOR_GUEST=y
CONFIG_PARAVIRT=y
# CONFIG_PARAVIRT_DEBUG is not set
CONFIG_PARAVIRT_SPINLOCKS=y
CONFIG_X86_HV_CALLBACK_VECTOR=y
CONFIG_XEN=y
# CONFIG_XEN_PV is not set
CONFIG_XEN_PVHVM=y
CONFIG_XEN_PVHVM_SMP=y
CONFIG_XEN_SAVE_RESTORE=y
# CONFIG_XEN_DEBUG_FS is not set
# CONFIG_XEN_PVH is not set
CONFIG_KVM_GUEST=y
CONFIG_ARCH_CPUIDLE_HALTPOLL=y
# CONFIG_PVH is not set
CONFIG_PARAVIRT_TIME_ACCOUNTING=y
CONFIG_PARAVIRT_CLOCK=y
# CONFIG_JAILHOUSE_GUEST is not set
# CONFIG_ACRN_GUEST is not set
# CONFIG_MK8 is not set
# CONFIG_MPSC is not set
# CONFIG_MCORE2 is not set
# CONFIG_MATOM is not set
CONFIG_GENERIC_CPU=y
CONFIG_X86_INTERNODE_CACHE_SHIFT=6
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_TSC=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_CMOV=y
CONFIG_X86_MINIMUM_CPU_FAMILY=64
CONFIG_X86_DEBUGCTLMSR=y
CONFIG_IA32_FEAT_CTL=y
CONFIG_X86_VMX_FEATURE_NAMES=y
CONFIG_CPU_SUP_INTEL=y
CONFIG_CPU_SUP_AMD=y
CONFIG_CPU_SUP_HYGON=y
CONFIG_CPU_SUP_CENTAUR=y
CONFIG_CPU_SUP_ZHAOXIN=y
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
CONFIG_DMI=y
# CONFIG_GART_IOMMU is not set
CONFIG_MAXSMP=y
CONFIG_NR_CPUS_RANGE_BEGIN=8192
CONFIG_NR_CPUS_RANGE_END=8192
CONFIG_NR_CPUS_DEFAULT=8192
CONFIG_NR_CPUS=8192
CONFIG_SCHED_SMT=y
CONFIG_SCHED_MC=y
CONFIG_SCHED_MC_PRIO=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS=y
CONFIG_X86_MCE=y
CONFIG_X86_MCELOG_LEGACY=y
CONFIG_X86_MCE_INTEL=y
CONFIG_X86_MCE_AMD=y
CONFIG_X86_MCE_THRESHOLD=y
CONFIG_X86_MCE_INJECT=m
CONFIG_X86_THERMAL_VECTOR=y

#
# Performance monitoring
#
CONFIG_PERF_EVENTS_INTEL_UNCORE=m
CONFIG_PERF_EVENTS_INTEL_RAPL=m
CONFIG_PERF_EVENTS_INTEL_CSTATE=m
CONFIG_PERF_EVENTS_AMD_POWER=m
# end of Performance monitoring

CONFIG_X86_16BIT=y
CONFIG_X86_ESPFIX64=y
CONFIG_X86_VSYSCALL_EMULATION=y
CONFIG_X86_IOPL_IOPERM=y
CONFIG_I8K=m
CONFIG_MICROCODE=y
CONFIG_MICROCODE_INTEL=y
CONFIG_MICROCODE_AMD=y
CONFIG_MICROCODE_OLD_INTERFACE=y
CONFIG_X86_MSR=y
CONFIG_X86_CPUID=y
CONFIG_X86_5LEVEL=y
CONFIG_X86_DIRECT_GBPAGES=y
# CONFIG_X86_CPA_STATISTICS is not set
CONFIG_AMD_MEM_ENCRYPT=y
# CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT is not set
CONFIG_NUMA=y
CONFIG_AMD_NUMA=y
CONFIG_X86_64_ACPI_NUMA=y
CONFIG_NUMA_EMU=y
CONFIG_NODES_SHIFT=10
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SPARSEMEM_DEFAULT=y
CONFIG_ARCH_SELECT_MEMORY_MODEL=y
# CONFIG_ARCH_MEMORY_PROBE is not set
CONFIG_ARCH_PROC_KCORE_TEXT=y
CONFIG_ILLEGAL_POINTER_VALUE=0xdead000000000000
CONFIG_X86_PMEM_LEGACY_DEVICE=y
CONFIG_X86_PMEM_LEGACY=m
CONFIG_X86_CHECK_BIOS_CORRUPTION=y
# CONFIG_X86_BOOTPARAM_MEMORY_CORRUPTION_CHECK is not set
CONFIG_X86_RESERVE_LOW=64
CONFIG_MTRR=y
CONFIG_MTRR_SANITIZER=y
CONFIG_MTRR_SANITIZER_ENABLE_DEFAULT=1
CONFIG_MTRR_SANITIZER_SPARE_REG_NR_DEFAULT=1
CONFIG_X86_PAT=y
CONFIG_ARCH_USES_PG_UNCACHED=y
CONFIG_ARCH_RANDOM=y
CONFIG_X86_SMAP=y
CONFIG_X86_UMIP=y
CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS=y
CONFIG_X86_INTEL_TSX_MODE_OFF=y
# CONFIG_X86_INTEL_TSX_MODE_ON is not set
# CONFIG_X86_INTEL_TSX_MODE_AUTO is not set
CONFIG_EFI=y
CONFIG_EFI_STUB=y
CONFIG_EFI_MIXED=y
CONFIG_SECCOMP=y
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000
CONFIG_SCHED_HRTICK=y
CONFIG_KEXEC=y
CONFIG_KEXEC_FILE=y
CONFIG_ARCH_HAS_KEXEC_PURGATORY=y
# CONFIG_KEXEC_SIG is not set
CONFIG_CRASH_DUMP=y
CONFIG_KEXEC_JUMP=y
CONFIG_PHYSICAL_START=0x1000000
CONFIG_RELOCATABLE=y
CONFIG_RANDOMIZE_BASE=y
CONFIG_X86_NEED_RELOCS=y
CONFIG_PHYSICAL_ALIGN=0x200000
CONFIG_DYNAMIC_MEMORY_LAYOUT=y
CONFIG_RANDOMIZE_MEMORY=y
CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING=0xa
CONFIG_HOTPLUG_CPU=y
CONFIG_BOOTPARAM_HOTPLUG_CPU0=y
# CONFIG_DEBUG_HOTPLUG_CPU0 is not set
# CONFIG_COMPAT_VDSO is not set
CONFIG_LEGACY_VSYSCALL_EMULATE=y
# CONFIG_LEGACY_VSYSCALL_XONLY is not set
# CONFIG_LEGACY_VSYSCALL_NONE is not set
# CONFIG_CMDLINE_BOOL is not set
CONFIG_MODIFY_LDT_SYSCALL=y
CONFIG_HAVE_LIVEPATCH=y
CONFIG_LIVEPATCH=y
# end of Processor type and features

CONFIG_ARCH_HAS_ADD_PAGES=y
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE=y
CONFIG_USE_PERCPU_NUMA_NODE_ID=y
CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK=y
CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION=y
CONFIG_ARCH_ENABLE_THP_MIGRATION=y

#
# Power management and ACPI options
#
CONFIG_ARCH_HIBERNATION_HEADER=y
CONFIG_SUSPEND=y
CONFIG_SUSPEND_FREEZER=y
CONFIG_HIBERNATE_CALLBACKS=y
CONFIG_HIBERNATION=y
CONFIG_HIBERNATION_SNAPSHOT_DEV=y
CONFIG_PM_STD_PARTITION=""
CONFIG_PM_SLEEP=y
CONFIG_PM_SLEEP_SMP=y
# CONFIG_PM_AUTOSLEEP is not set
# CONFIG_PM_WAKELOCKS is not set
CONFIG_PM=y
CONFIG_PM_DEBUG=y
# CONFIG_PM_ADVANCED_DEBUG is not set
# CONFIG_PM_TEST_SUSPEND is not set
CONFIG_PM_SLEEP_DEBUG=y
# CONFIG_PM_TRACE_RTC is not set
CONFIG_PM_CLK=y
# CONFIG_WQ_POWER_EFFICIENT_DEFAULT is not set
# CONFIG_ENERGY_MODEL is not set
CONFIG_ARCH_SUPPORTS_ACPI=y
CONFIG_ACPI=y
CONFIG_ACPI_LEGACY_TABLES_LOOKUP=y
CONFIG_ARCH_MIGHT_HAVE_ACPI_PDC=y
CONFIG_ACPI_SYSTEM_POWER_STATES_SUPPORT=y
# CONFIG_ACPI_DEBUGGER is not set
CONFIG_ACPI_SPCR_TABLE=y
CONFIG_ACPI_LPIT=y
CONFIG_ACPI_SLEEP=y
CONFIG_ACPI_REV_OVERRIDE_POSSIBLE=y
CONFIG_ACPI_EC_DEBUGFS=m
CONFIG_ACPI_AC=y
CONFIG_ACPI_BATTERY=y
CONFIG_ACPI_BUTTON=y
CONFIG_ACPI_VIDEO=m
CONFIG_ACPI_FAN=y
CONFIG_ACPI_TAD=m
CONFIG_ACPI_DOCK=y
CONFIG_ACPI_CPU_FREQ_PSS=y
CONFIG_ACPI_PROCESSOR_CSTATE=y
CONFIG_ACPI_PROCESSOR_IDLE=y
CONFIG_ACPI_CPPC_LIB=y
CONFIG_ACPI_PROCESSOR=y
CONFIG_ACPI_IPMI=m
CONFIG_ACPI_HOTPLUG_CPU=y
CONFIG_ACPI_PROCESSOR_AGGREGATOR=m
CONFIG_ACPI_THERMAL=y
CONFIG_ARCH_HAS_ACPI_TABLE_UPGRADE=y
CONFIG_ACPI_TABLE_UPGRADE=y
# CONFIG_ACPI_DEBUG is not set
CONFIG_ACPI_PCI_SLOT=y
CONFIG_ACPI_CONTAINER=y
CONFIG_ACPI_HOTPLUG_MEMORY=y
CONFIG_ACPI_HOTPLUG_IOAPIC=y
CONFIG_ACPI_SBS=m
CONFIG_ACPI_HED=y
# CONFIG_ACPI_CUSTOM_METHOD is not set
CONFIG_ACPI_BGRT=y
CONFIG_ACPI_NFIT=m
# CONFIG_NFIT_SECURITY_DEBUG is not set
CONFIG_ACPI_NUMA=y
# CONFIG_ACPI_HMAT is not set
CONFIG_HAVE_ACPI_APEI=y
CONFIG_HAVE_ACPI_APEI_NMI=y
CONFIG_ACPI_APEI=y
CONFIG_ACPI_APEI_GHES=y
CONFIG_ACPI_APEI_PCIEAER=y
CONFIG_ACPI_APEI_MEMORY_FAILURE=y
CONFIG_ACPI_APEI_EINJ=m
CONFIG_ACPI_APEI_ERST_DEBUG=y
CONFIG_DPTF_POWER=m
CONFIG_ACPI_WATCHDOG=y
CONFIG_ACPI_EXTLOG=m
CONFIG_ACPI_ADXL=y
CONFIG_PMIC_OPREGION=y
# CONFIG_ACPI_CONFIGFS is not set
CONFIG_X86_PM_TIMER=y
CONFIG_SFI=y

#
# CPU Frequency scaling
#
CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_GOV_ATTR_SET=y
CONFIG_CPU_FREQ_GOV_COMMON=y
CONFIG_CPU_FREQ_STAT=y
CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_SCHEDUTIL is not set
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
CONFIG_CPU_FREQ_GOV_POWERSAVE=y
CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_CPU_FREQ_GOV_ONDEMAND=y
CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y
CONFIG_CPU_FREQ_GOV_SCHEDUTIL=y

#
# CPU frequency scaling drivers
#
CONFIG_X86_INTEL_PSTATE=y
# CONFIG_X86_PCC_CPUFREQ is not set
CONFIG_X86_ACPI_CPUFREQ=m
CONFIG_X86_ACPI_CPUFREQ_CPB=y
CONFIG_X86_POWERNOW_K8=m
CONFIG_X86_AMD_FREQ_SENSITIVITY=m
# CONFIG_X86_SPEEDSTEP_CENTRINO is not set
CONFIG_X86_P4_CLOCKMOD=m

#
# shared options
#
CONFIG_X86_SPEEDSTEP_LIB=m
# end of CPU Frequency scaling

#
# CPU Idle
#
CONFIG_CPU_IDLE=y
# CONFIG_CPU_IDLE_GOV_LADDER is not set
CONFIG_CPU_IDLE_GOV_MENU=y
# CONFIG_CPU_IDLE_GOV_TEO is not set
# CONFIG_CPU_IDLE_GOV_HALTPOLL is not set
CONFIG_HALTPOLL_CPUIDLE=y
# end of CPU Idle

CONFIG_INTEL_IDLE=y
# end of Power management and ACPI options

#
# Bus options (PCI etc.)
#
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
CONFIG_PCI_XEN=y
CONFIG_MMCONF_FAM10H=y
CONFIG_ISA_DMA_API=y
CONFIG_AMD_NB=y
# CONFIG_X86_SYSFB is not set
# end of Bus options (PCI etc.)

#
# Binary Emulations
#
CONFIG_IA32_EMULATION=y
# CONFIG_X86_X32 is not set
CONFIG_COMPAT_32=y
CONFIG_COMPAT=y
CONFIG_COMPAT_FOR_U64_ALIGNMENT=y
CONFIG_SYSVIPC_COMPAT=y
# end of Binary Emulations

#
# Firmware Drivers
#
CONFIG_EDD=m
# CONFIG_EDD_OFF is not set
CONFIG_FIRMWARE_MEMMAP=y
CONFIG_DMIID=y
CONFIG_DMI_SYSFS=y
CONFIG_DMI_SCAN_MACHINE_NON_EFI_FALLBACK=y
# CONFIG_ISCSI_IBFT is not set
CONFIG_FW_CFG_SYSFS=y
# CONFIG_FW_CFG_SYSFS_CMDLINE is not set
# CONFIG_GOOGLE_FIRMWARE is not set

#
# EFI (Extensible Firmware Interface) Support
#
CONFIG_EFI_VARS=y
CONFIG_EFI_ESRT=y
CONFIG_EFI_VARS_PSTORE=y
CONFIG_EFI_VARS_PSTORE_DEFAULT_DISABLE=y
CONFIG_EFI_RUNTIME_MAP=y
# CONFIG_EFI_FAKE_MEMMAP is not set
CONFIG_EFI_RUNTIME_WRAPPERS=y
CONFIG_EFI_GENERIC_STUB_INITRD_CMDLINE_LOADER=y
# CONFIG_EFI_BOOTLOADER_CONTROL is not set
# CONFIG_EFI_CAPSULE_LOADER is not set
# CONFIG_EFI_TEST is not set
CONFIG_APPLE_PROPERTIES=y
# CONFIG_RESET_ATTACK_MITIGATION is not set
# CONFIG_EFI_RCI2_TABLE is not set
# CONFIG_EFI_DISABLE_PCI_DMA is not set
# end of EFI (Extensible Firmware Interface) Support

CONFIG_UEFI_CPER=y
CONFIG_UEFI_CPER_X86=y
CONFIG_EFI_DEV_PATH_PARSER=y
CONFIG_EFI_EARLYCON=y
CONFIG_EFI_CUSTOM_SSDT_OVERLAYS=y

#
# Tegra firmware driver
#
# end of Tegra firmware driver
# end of Firmware Drivers

CONFIG_HAVE_KVM=y
CONFIG_HAVE_KVM_IRQCHIP=y
CONFIG_HAVE_KVM_IRQFD=y
CONFIG_HAVE_KVM_IRQ_ROUTING=y
CONFIG_HAVE_KVM_EVENTFD=y
CONFIG_KVM_MMIO=y
CONFIG_KVM_ASYNC_PF=y
CONFIG_HAVE_KVM_MSI=y
CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT=y
CONFIG_KVM_VFIO=y
CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT=y
CONFIG_KVM_COMPAT=y
CONFIG_HAVE_KVM_IRQ_BYPASS=y
CONFIG_HAVE_KVM_NO_POLL=y
CONFIG_KVM_XFER_TO_GUEST_WORK=y
CONFIG_VIRTUALIZATION=y
CONFIG_KVM=m
CONFIG_KVM_INTEL=m
CONFIG_KVM_AMD=m
CONFIG_KVM_AMD_SEV=y
CONFIG_KVM_MMU_AUDIT=y
CONFIG_AS_AVX512=y
CONFIG_AS_SHA1_NI=y
CONFIG_AS_SHA256_NI=y
CONFIG_AS_TPAUSE=y

#
# General architecture-dependent options
#
CONFIG_CRASH_CORE=y
CONFIG_KEXEC_CORE=y
CONFIG_HOTPLUG_SMT=y
CONFIG_GENERIC_ENTRY=y
CONFIG_OPROFILE=m
CONFIG_OPROFILE_EVENT_MULTIPLEX=y
CONFIG_HAVE_OPROFILE=y
CONFIG_OPROFILE_NMI_TIMER=y
CONFIG_KPROBES=y
CONFIG_JUMP_LABEL=y
# CONFIG_STATIC_KEYS_SELFTEST is not set
CONFIG_OPTPROBES=y
CONFIG_KPROBES_ON_FTRACE=y
CONFIG_UPROBES=y
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_ARCH_USE_BUILTIN_BSWAP=y
CONFIG_KRETPROBES=y
CONFIG_USER_RETURN_NOTIFIER=y
CONFIG_HAVE_IOREMAP_PROT=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_OPTPROBES=y
CONFIG_HAVE_KPROBES_ON_FTRACE=y
CONFIG_HAVE_FUNCTION_ERROR_INJECTION=y
CONFIG_HAVE_NMI=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_HAVE_DMA_CONTIGUOUS=y
CONFIG_GENERIC_SMP_IDLE_THREAD=y
CONFIG_ARCH_HAS_FORTIFY_SOURCE=y
CONFIG_ARCH_HAS_SET_MEMORY=y
CONFIG_ARCH_HAS_SET_DIRECT_MAP=y
CONFIG_HAVE_ARCH_THREAD_STRUCT_WHITELIST=y
CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT=y
CONFIG_HAVE_ASM_MODVERSIONS=y
CONFIG_HAVE_REGS_AND_STACK_ACCESS_API=y
CONFIG_HAVE_RSEQ=y
CONFIG_HAVE_FUNCTION_ARG_ACCESS_API=y
CONFIG_HAVE_HW_BREAKPOINT=y
CONFIG_HAVE_MIXED_BREAKPOINTS_REGS=y
CONFIG_HAVE_USER_RETURN_NOTIFIER=y
CONFIG_HAVE_PERF_EVENTS_NMI=y
CONFIG_HAVE_HARDLOCKUP_DETECTOR_PERF=y
CONFIG_HAVE_PERF_REGS=y
CONFIG_HAVE_PERF_USER_STACK_DUMP=y
CONFIG_HAVE_ARCH_JUMP_LABEL=y
CONFIG_HAVE_ARCH_JUMP_LABEL_RELATIVE=y
CONFIG_MMU_GATHER_TABLE_FREE=y
CONFIG_MMU_GATHER_RCU_TABLE_FREE=y
CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG=y
CONFIG_HAVE_ALIGNED_STRUCT_PAGE=y
CONFIG_HAVE_CMPXCHG_LOCAL=y
CONFIG_HAVE_CMPXCHG_DOUBLE=y
CONFIG_ARCH_WANT_COMPAT_IPC_PARSE_VERSION=y
CONFIG_ARCH_WANT_OLD_COMPAT_IPC=y
CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
CONFIG_SECCOMP_FILTER=y
CONFIG_HAVE_ARCH_STACKLEAK=y
CONFIG_HAVE_STACKPROTECTOR=y
CONFIG_STACKPROTECTOR=y
CONFIG_STACKPROTECTOR_STRONG=y
CONFIG_HAVE_ARCH_WITHIN_STACK_FRAMES=y
CONFIG_HAVE_CONTEXT_TRACKING=y
CONFIG_HAVE_VIRT_CPU_ACCOUNTING_GEN=y
CONFIG_HAVE_IRQ_TIME_ACCOUNTING=y
CONFIG_HAVE_MOVE_PMD=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD=y
CONFIG_HAVE_ARCH_HUGE_VMAP=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_HAVE_ARCH_SOFT_DIRTY=y
CONFIG_HAVE_MOD_ARCH_SPECIFIC=y
CONFIG_MODULES_USE_ELF_RELA=y
CONFIG_ARCH_HAS_ELF_RANDOMIZE=y
CONFIG_HAVE_ARCH_MMAP_RND_BITS=y
CONFIG_HAVE_EXIT_THREAD=y
CONFIG_ARCH_MMAP_RND_BITS=28
CONFIG_HAVE_ARCH_MMAP_RND_COMPAT_BITS=y
CONFIG_ARCH_MMAP_RND_COMPAT_BITS=8
CONFIG_HAVE_ARCH_COMPAT_MMAP_BASES=y
CONFIG_HAVE_STACK_VALIDATION=y
CONFIG_HAVE_RELIABLE_STACKTRACE=y
CONFIG_OLD_SIGSUSPEND3=y
CONFIG_COMPAT_OLD_SIGACTION=y
CONFIG_COMPAT_32BIT_TIME=y
CONFIG_HAVE_ARCH_VMAP_STACK=y
CONFIG_VMAP_STACK=y
CONFIG_ARCH_HAS_STRICT_KERNEL_RWX=y
CONFIG_STRICT_KERNEL_RWX=y
CONFIG_ARCH_HAS_STRICT_MODULE_RWX=y
CONFIG_STRICT_MODULE_RWX=y
CONFIG_HAVE_ARCH_PREL32_RELOCATIONS=y
CONFIG_ARCH_USE_MEMREMAP_PROT=y
# CONFIG_LOCK_EVENT_COUNTS is not set
CONFIG_ARCH_HAS_MEM_ENCRYPT=y

#
# GCOV-based kernel profiling
#
# CONFIG_GCOV_KERNEL is not set
CONFIG_ARCH_HAS_GCOV_PROFILE_ALL=y
# end of GCOV-based kernel profiling

CONFIG_HAVE_GCC_PLUGINS=y
# end of General architecture-dependent options

CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULE_SIG_FORMAT=y
CONFIG_MODULES=y
CONFIG_MODULE_FORCE_LOAD=y
CONFIG_MODULE_UNLOAD=y
# CONFIG_MODULE_FORCE_UNLOAD is not set
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_MODULE_SIG=y
# CONFIG_MODULE_SIG_FORCE is not set
CONFIG_MODULE_SIG_ALL=y
# CONFIG_MODULE_SIG_SHA1 is not set
# CONFIG_MODULE_SIG_SHA224 is not set
CONFIG_MODULE_SIG_SHA256=y
# CONFIG_MODULE_SIG_SHA384 is not set
# CONFIG_MODULE_SIG_SHA512 is not set
CONFIG_MODULE_SIG_HASH="sha256"
# CONFIG_MODULE_COMPRESS is not set
# CONFIG_MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS is not set
# CONFIG_UNUSED_SYMBOLS is not set
# CONFIG_TRIM_UNUSED_KSYMS is not set
CONFIG_MODULES_TREE_LOOKUP=y
CONFIG_BLOCK=y
CONFIG_BLK_SCSI_REQUEST=y
CONFIG_BLK_CGROUP_RWSTAT=y
CONFIG_BLK_DEV_BSG=y
CONFIG_BLK_DEV_BSGLIB=y
CONFIG_BLK_DEV_INTEGRITY=y
CONFIG_BLK_DEV_INTEGRITY_T10=m
CONFIG_BLK_DEV_ZONED=y
CONFIG_BLK_DEV_THROTTLING=y
# CONFIG_BLK_DEV_THROTTLING_LOW is not set
# CONFIG_BLK_CMDLINE_PARSER is not set
CONFIG_BLK_WBT=y
# CONFIG_BLK_CGROUP_IOLATENCY is not set
# CONFIG_BLK_CGROUP_IOCOST is not set
CONFIG_BLK_WBT_MQ=y
CONFIG_BLK_DEBUG_FS=y
CONFIG_BLK_DEBUG_FS_ZONED=y
# CONFIG_BLK_SED_OPAL is not set
# CONFIG_BLK_INLINE_ENCRYPTION is not set

#
# Partition Types
#
CONFIG_PARTITION_ADVANCED=y
# CONFIG_ACORN_PARTITION is not set
# CONFIG_AIX_PARTITION is not set
CONFIG_OSF_PARTITION=y
CONFIG_AMIGA_PARTITION=y
# CONFIG_ATARI_PARTITION is not set
CONFIG_MAC_PARTITION=y
CONFIG_MSDOS_PARTITION=y
CONFIG_BSD_DISKLABEL=y
CONFIG_MINIX_SUBPARTITION=y
CONFIG_SOLARIS_X86_PARTITION=y
CONFIG_UNIXWARE_DISKLABEL=y
# CONFIG_LDM_PARTITION is not set
CONFIG_SGI_PARTITION=y
# CONFIG_ULTRIX_PARTITION is not set
CONFIG_SUN_PARTITION=y
CONFIG_KARMA_PARTITION=y
CONFIG_EFI_PARTITION=y
# CONFIG_SYSV68_PARTITION is not set
# CONFIG_CMDLINE_PARTITION is not set
# end of Partition Types

CONFIG_BLOCK_COMPAT=y
CONFIG_BLK_MQ_PCI=y
CONFIG_BLK_MQ_VIRTIO=y
CONFIG_BLK_MQ_RDMA=y
CONFIG_BLK_PM=y

#
# IO Schedulers
#
CONFIG_MQ_IOSCHED_DEADLINE=y
CONFIG_MQ_IOSCHED_KYBER=y
CONFIG_IOSCHED_BFQ=y
CONFIG_BFQ_GROUP_IOSCHED=y
# CONFIG_BFQ_CGROUP_DEBUG is not set
# end of IO Schedulers

CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_PADATA=y
CONFIG_ASN1=y
CONFIG_INLINE_SPIN_UNLOCK_IRQ=y
CONFIG_INLINE_READ_UNLOCK=y
CONFIG_INLINE_READ_UNLOCK_IRQ=y
CONFIG_INLINE_WRITE_UNLOCK=y
CONFIG_INLINE_WRITE_UNLOCK_IRQ=y
CONFIG_ARCH_SUPPORTS_ATOMIC_RMW=y
CONFIG_MUTEX_SPIN_ON_OWNER=y
CONFIG_RWSEM_SPIN_ON_OWNER=y
CONFIG_LOCK_SPIN_ON_OWNER=y
CONFIG_ARCH_USE_QUEUED_SPINLOCKS=y
CONFIG_QUEUED_SPINLOCKS=y
CONFIG_ARCH_USE_QUEUED_RWLOCKS=y
CONFIG_QUEUED_RWLOCKS=y
CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE=y
CONFIG_ARCH_HAS_SYNC_CORE_BEFORE_USERMODE=y
CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y
CONFIG_FREEZER=y

#
# Executable file formats
#
CONFIG_BINFMT_ELF=y
CONFIG_COMPAT_BINFMT_ELF=y
CONFIG_ELFCORE=y
CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS=y
CONFIG_BINFMT_SCRIPT=y
CONFIG_BINFMT_MISC=m
CONFIG_COREDUMP=y
# end of Executable file formats

#
# Memory Management options
#
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_SPARSEMEM_MANUAL=y
CONFIG_SPARSEMEM=y
CONFIG_NEED_MULTIPLE_NODES=y
CONFIG_SPARSEMEM_EXTREME=y
CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
CONFIG_SPARSEMEM_VMEMMAP=y
CONFIG_HAVE_FAST_GUP=y
CONFIG_NUMA_KEEP_MEMINFO=y
CONFIG_MEMORY_ISOLATION=y
CONFIG_HAVE_BOOTMEM_INFO_NODE=y
CONFIG_MEMORY_HOTPLUG=y
CONFIG_MEMORY_HOTPLUG_SPARSE=y
# CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE is not set
CONFIG_MEMORY_HOTREMOVE=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_MEMORY_BALLOON=y
CONFIG_BALLOON_COMPACTION=y
CONFIG_COMPACTION=y
CONFIG_PAGE_REPORTING=y
CONFIG_MIGRATION=y
CONFIG_CONTIG_ALLOC=y
CONFIG_PHYS_ADDR_T_64BIT=y
CONFIG_BOUNCE=y
CONFIG_VIRT_TO_BUS=y
CONFIG_MMU_NOTIFIER=y
CONFIG_KSM=y
CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
CONFIG_ARCH_SUPPORTS_MEMORY_FAILURE=y
CONFIG_MEMORY_FAILURE=y
CONFIG_HWPOISON_INJECT=m
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
# CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set
CONFIG_ARCH_WANTS_THP_SWAP=y
CONFIG_THP_SWAP=y
CONFIG_CLEANCACHE=y
CONFIG_FRONTSWAP=y
CONFIG_CMA=y
# CONFIG_CMA_DEBUG is not set
# CONFIG_CMA_DEBUGFS is not set
CONFIG_CMA_AREAS=7
CONFIG_ZSWAP=y
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_DEFLATE is not set
CONFIG_ZSWAP_COMPRESSOR_DEFAULT_LZO=y
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_842 is not set
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_LZ4 is not set
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_LZ4HC is not set
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_ZSTD is not set
CONFIG_ZSWAP_COMPRESSOR_DEFAULT="lzo"
CONFIG_ZSWAP_ZPOOL_DEFAULT_ZBUD=y
# CONFIG_ZSWAP_ZPOOL_DEFAULT_Z3FOLD is not set
# CONFIG_ZSWAP_ZPOOL_DEFAULT_ZSMALLOC is not set
CONFIG_ZSWAP_ZPOOL_DEFAULT="zbud"
# CONFIG_ZSWAP_DEFAULT_ON is not set
CONFIG_ZPOOL=y
CONFIG_ZBUD=y
# CONFIG_Z3FOLD is not set
CONFIG_ZSMALLOC=y
# CONFIG_ZSMALLOC_PGTABLE_MAPPING is not set
CONFIG_ZSMALLOC_STAT=y
CONFIG_GENERIC_EARLY_IOREMAP=y
CONFIG_DEFERRED_STRUCT_PAGE_INIT=y
CONFIG_IDLE_PAGE_TRACKING=y
CONFIG_ARCH_HAS_PTE_DEVMAP=y
CONFIG_ZONE_DEVICE=y
CONFIG_DEV_PAGEMAP_OPS=y
CONFIG_DEVICE_PRIVATE=y
CONFIG_FRAME_VECTOR=y
CONFIG_ARCH_USES_HIGH_VMA_FLAGS=y
CONFIG_ARCH_HAS_PKEYS=y
# CONFIG_PERCPU_STATS is not set
# CONFIG_GUP_BENCHMARK is not set
# CONFIG_READ_ONLY_THP_FOR_FS is not set
CONFIG_ARCH_HAS_PTE_SPECIAL=y
CONFIG_MAPPING_DIRTY_HELPERS=y
# end of Memory Management options

CONFIG_NET=y
CONFIG_COMPAT_NETLINK_MESSAGES=y
CONFIG_NET_INGRESS=y
CONFIG_NET_EGRESS=y
CONFIG_SKB_EXTENSIONS=y

#
# Networking options
#
CONFIG_PACKET=y
CONFIG_PACKET_DIAG=m
CONFIG_UNIX=y
CONFIG_UNIX_SCM=y
CONFIG_UNIX_DIAG=m
CONFIG_TLS=m
CONFIG_TLS_DEVICE=y
# CONFIG_TLS_TOE is not set
CONFIG_XFRM=y
CONFIG_XFRM_OFFLOAD=y
CONFIG_XFRM_ALGO=y
CONFIG_XFRM_USER=y
# CONFIG_XFRM_INTERFACE is not set
CONFIG_XFRM_SUB_POLICY=y
CONFIG_XFRM_MIGRATE=y
CONFIG_XFRM_STATISTICS=y
CONFIG_XFRM_AH=m
CONFIG_XFRM_ESP=m
CONFIG_XFRM_IPCOMP=m
CONFIG_NET_KEY=m
CONFIG_NET_KEY_MIGRATE=y
# CONFIG_SMC is not set
CONFIG_XDP_SOCKETS=y
# CONFIG_XDP_SOCKETS_DIAG is not set
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_ADVANCED_ROUTER=y
CONFIG_IP_FIB_TRIE_STATS=y
CONFIG_IP_MULTIPLE_TABLES=y
CONFIG_IP_ROUTE_MULTIPATH=y
CONFIG_IP_ROUTE_VERBOSE=y
CONFIG_IP_ROUTE_CLASSID=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
# CONFIG_IP_PNP_BOOTP is not set
# CONFIG_IP_PNP_RARP is not set
CONFIG_NET_IPIP=m
CONFIG_NET_IPGRE_DEMUX=m
CONFIG_NET_IP_TUNNEL=m
CONFIG_NET_IPGRE=m
CONFIG_NET_IPGRE_BROADCAST=y
CONFIG_IP_MROUTE_COMMON=y
CONFIG_IP_MROUTE=y
CONFIG_IP_MROUTE_MULTIPLE_TABLES=y
CONFIG_IP_PIMSM_V1=y
CONFIG_IP_PIMSM_V2=y
CONFIG_SYN_COOKIES=y
CONFIG_NET_IPVTI=m
CONFIG_NET_UDP_TUNNEL=m
# CONFIG_NET_FOU is not set
# CONFIG_NET_FOU_IP_TUNNELS is not set
CONFIG_INET_AH=m
CONFIG_INET_ESP=m
CONFIG_INET_ESP_OFFLOAD=m
# CONFIG_INET_ESPINTCP is not set
CONFIG_INET_IPCOMP=m
CONFIG_INET_XFRM_TUNNEL=m
CONFIG_INET_TUNNEL=m
CONFIG_INET_DIAG=m
CONFIG_INET_TCP_DIAG=m
CONFIG_INET_UDP_DIAG=m
CONFIG_INET_RAW_DIAG=m
# CONFIG_INET_DIAG_DESTROY is not set
CONFIG_TCP_CONG_ADVANCED=y
CONFIG_TCP_CONG_BIC=m
CONFIG_TCP_CONG_CUBIC=y
CONFIG_TCP_CONG_WESTWOOD=m
CONFIG_TCP_CONG_HTCP=m
CONFIG_TCP_CONG_HSTCP=m
CONFIG_TCP_CONG_HYBLA=m
CONFIG_TCP_CONG_VEGAS=m
CONFIG_TCP_CONG_NV=m
CONFIG_TCP_CONG_SCALABLE=m
CONFIG_TCP_CONG_LP=m
CONFIG_TCP_CONG_VENO=m
CONFIG_TCP_CONG_YEAH=m
CONFIG_TCP_CONG_ILLINOIS=m
CONFIG_TCP_CONG_DCTCP=m
# CONFIG_TCP_CONG_CDG is not set
CONFIG_TCP_CONG_BBR=m
CONFIG_DEFAULT_CUBIC=y
# CONFIG_DEFAULT_RENO is not set
CONFIG_DEFAULT_TCP_CONG="cubic"
CONFIG_TCP_MD5SIG=y
CONFIG_IPV6=y
CONFIG_IPV6_ROUTER_PREF=y
CONFIG_IPV6_ROUTE_INFO=y
CONFIG_IPV6_OPTIMISTIC_DAD=y
CONFIG_INET6_AH=m
CONFIG_INET6_ESP=m
CONFIG_INET6_ESP_OFFLOAD=m
# CONFIG_INET6_ESPINTCP is not set
CONFIG_INET6_IPCOMP=m
CONFIG_IPV6_MIP6=m
# CONFIG_IPV6_ILA is not set
CONFIG_INET6_XFRM_TUNNEL=m
CONFIG_INET6_TUNNEL=m
CONFIG_IPV6_VTI=m
CONFIG_IPV6_SIT=m
CONFIG_IPV6_SIT_6RD=y
CONFIG_IPV6_NDISC_NODETYPE=y
CONFIG_IPV6_TUNNEL=m
CONFIG_IPV6_GRE=m
CONFIG_IPV6_MULTIPLE_TABLES=y
# CONFIG_IPV6_SUBTREES is not set
CONFIG_IPV6_MROUTE=y
CONFIG_IPV6_MROUTE_MULTIPLE_TABLES=y
CONFIG_IPV6_PIMSM_V2=y
# CONFIG_IPV6_SEG6_LWTUNNEL is not set
# CONFIG_IPV6_SEG6_HMAC is not set
# CONFIG_IPV6_RPL_LWTUNNEL is not set
CONFIG_NETLABEL=y
# CONFIG_MPTCP is not set
# CONFIG_MPTCP_KUNIT_TESTS is not set
CONFIG_NETWORK_SECMARK=y
CONFIG_NET_PTP_CLASSIFY=y
CONFIG_NETWORK_PHY_TIMESTAMPING=y
CONFIG_NETFILTER=y
CONFIG_NETFILTER_ADVANCED=y
CONFIG_BRIDGE_NETFILTER=m

#
# Core Netfilter Configuration
#
CONFIG_NETFILTER_INGRESS=y
CONFIG_NETFILTER_NETLINK=m
CONFIG_NETFILTER_FAMILY_BRIDGE=y
CONFIG_NETFILTER_FAMILY_ARP=y
# CONFIG_NETFILTER_NETLINK_ACCT is not set
CONFIG_NETFILTER_NETLINK_QUEUE=m
CONFIG_NETFILTER_NETLINK_LOG=m
CONFIG_NETFILTER_NETLINK_OSF=m
CONFIG_NF_CONNTRACK=m
CONFIG_NF_LOG_COMMON=m
CONFIG_NF_LOG_NETDEV=m
CONFIG_NETFILTER_CONNCOUNT=m
CONFIG_NF_CONNTRACK_MARK=y
CONFIG_NF_CONNTRACK_SECMARK=y
CONFIG_NF_CONNTRACK_ZONES=y
CONFIG_NF_CONNTRACK_PROCFS=y
CONFIG_NF_CONNTRACK_EVENTS=y
CONFIG_NF_CONNTRACK_TIMEOUT=y
CONFIG_NF_CONNTRACK_TIMESTAMP=y
CONFIG_NF_CONNTRACK_LABELS=y
CONFIG_NF_CT_PROTO_DCCP=y
CONFIG_NF_CT_PROTO_GRE=y
CONFIG_NF_CT_PROTO_SCTP=y
CONFIG_NF_CT_PROTO_UDPLITE=y
CONFIG_NF_CONNTRACK_AMANDA=m
CONFIG_NF_CONNTRACK_FTP=m
CONFIG_NF_CONNTRACK_H323=m
CONFIG_NF_CONNTRACK_IRC=m
CONFIG_NF_CONNTRACK_BROADCAST=m
CONFIG_NF_CONNTRACK_NETBIOS_NS=m
CONFIG_NF_CONNTRACK_SNMP=m
CONFIG_NF_CONNTRACK_PPTP=m
CONFIG_NF_CONNTRACK_SANE=m
CONFIG_NF_CONNTRACK_SIP=m
CONFIG_NF_CONNTRACK_TFTP=m
CONFIG_NF_CT_NETLINK=m
CONFIG_NF_CT_NETLINK_TIMEOUT=m
CONFIG_NF_CT_NETLINK_HELPER=m
CONFIG_NETFILTER_NETLINK_GLUE_CT=y
CONFIG_NF_NAT=m
CONFIG_NF_NAT_AMANDA=m
CONFIG_NF_NAT_FTP=m
CONFIG_NF_NAT_IRC=m
CONFIG_NF_NAT_SIP=m
CONFIG_NF_NAT_TFTP=m
CONFIG_NF_NAT_REDIRECT=y
CONFIG_NF_NAT_MASQUERADE=y
CONFIG_NETFILTER_SYNPROXY=m
CONFIG_NF_TABLES=m
CONFIG_NF_TABLES_INET=y
CONFIG_NF_TABLES_NETDEV=y
CONFIG_NFT_NUMGEN=m
CONFIG_NFT_CT=m
CONFIG_NFT_COUNTER=m
CONFIG_NFT_CONNLIMIT=m
CONFIG_NFT_LOG=m
CONFIG_NFT_LIMIT=m
CONFIG_NFT_MASQ=m
CONFIG_NFT_REDIR=m
CONFIG_NFT_NAT=m
# CONFIG_NFT_TUNNEL is not set
CONFIG_NFT_OBJREF=m
CONFIG_NFT_QUEUE=m
CONFIG_NFT_QUOTA=m
CONFIG_NFT_REJECT=m
CONFIG_NFT_REJECT_INET=m
CONFIG_NFT_COMPAT=m
CONFIG_NFT_HASH=m
CONFIG_NFT_FIB=m
CONFIG_NFT_FIB_INET=m
# CONFIG_NFT_XFRM is not set
CONFIG_NFT_SOCKET=m
# CONFIG_NFT_OSF is not set
# CONFIG_NFT_TPROXY is not set
# CONFIG_NFT_SYNPROXY is not set
CONFIG_NF_DUP_NETDEV=m
CONFIG_NFT_DUP_NETDEV=m
CONFIG_NFT_FWD_NETDEV=m
CONFIG_NFT_FIB_NETDEV=m
# CONFIG_NF_FLOW_TABLE is not set
CONFIG_NETFILTER_XTABLES=y

#
# Xtables combined modules
#
CONFIG_NETFILTER_XT_MARK=m
CONFIG_NETFILTER_XT_CONNMARK=m
CONFIG_NETFILTER_XT_SET=m

#
# Xtables targets
#
CONFIG_NETFILTER_XT_TARGET_AUDIT=m
CONFIG_NETFILTER_XT_TARGET_CHECKSUM=m
CONFIG_NETFILTER_XT_TARGET_CLASSIFY=m
CONFIG_NETFILTER_XT_TARGET_CONNMARK=m
CONFIG_NETFILTER_XT_TARGET_CONNSECMARK=m
CONFIG_NETFILTER_XT_TARGET_CT=m
CONFIG_NETFILTER_XT_TARGET_DSCP=m
CONFIG_NETFILTER_XT_TARGET_HL=m
CONFIG_NETFILTER_XT_TARGET_HMARK=m
CONFIG_NETFILTER_XT_TARGET_IDLETIMER=m
# CONFIG_NETFILTER_XT_TARGET_LED is not set
CONFIG_NETFILTER_XT_TARGET_LOG=m
CONFIG_NETFILTER_XT_TARGET_MARK=m
CONFIG_NETFILTER_XT_NAT=m
CONFIG_NETFILTER_XT_TARGET_NETMAP=m
CONFIG_NETFILTER_XT_TARGET_NFLOG=m
CONFIG_NETFILTER_XT_TARGET_NFQUEUE=m
CONFIG_NETFILTER_XT_TARGET_NOTRACK=m
CONFIG_NETFILTER_XT_TARGET_RATEEST=m
CONFIG_NETFILTER_XT_TARGET_REDIRECT=m
CONFIG_NETFILTER_XT_TARGET_MASQUERADE=m
CONFIG_NETFILTER_XT_TARGET_TEE=m
CONFIG_NETFILTER_XT_TARGET_TPROXY=m
CONFIG_NETFILTER_XT_TARGET_TRACE=m
CONFIG_NETFILTER_XT_TARGET_SECMARK=m
CONFIG_NETFILTER_XT_TARGET_TCPMSS=m
CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP=m

#
# Xtables matches
#
CONFIG_NETFILTER_XT_MATCH_ADDRTYPE=m
CONFIG_NETFILTER_XT_MATCH_BPF=m
CONFIG_NETFILTER_XT_MATCH_CGROUP=m
CONFIG_NETFILTER_XT_MATCH_CLUSTER=m
CONFIG_NETFILTER_XT_MATCH_COMMENT=m
CONFIG_NETFILTER_XT_MATCH_CONNBYTES=m
CONFIG_NETFILTER_XT_MATCH_CONNLABEL=m
CONFIG_NETFILTER_XT_MATCH_CONNLIMIT=m
CONFIG_NETFILTER_XT_MATCH_CONNMARK=m
CONFIG_NETFILTER_XT_MATCH_CONNTRACK=m
CONFIG_NETFILTER_XT_MATCH_CPU=m
CONFIG_NETFILTER_XT_MATCH_DCCP=m
CONFIG_NETFILTER_XT_MATCH_DEVGROUP=m
CONFIG_NETFILTER_XT_MATCH_DSCP=m
CONFIG_NETFILTER_XT_MATCH_ECN=m
CONFIG_NETFILTER_XT_MATCH_ESP=m
CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=m
CONFIG_NETFILTER_XT_MATCH_HELPER=m
CONFIG_NETFILTER_XT_MATCH_HL=m
# CONFIG_NETFILTER_XT_MATCH_IPCOMP is not set
CONFIG_NETFILTER_XT_MATCH_IPRANGE=m
CONFIG_NETFILTER_XT_MATCH_IPVS=m
# CONFIG_NETFILTER_XT_MATCH_L2TP is not set
CONFIG_NETFILTER_XT_MATCH_LENGTH=m
CONFIG_NETFILTER_XT_MATCH_LIMIT=m
CONFIG_NETFILTER_XT_MATCH_MAC=m
CONFIG_NETFILTER_XT_MATCH_MARK=m
CONFIG_NETFILTER_XT_MATCH_MULTIPORT=m
# CONFIG_NETFILTER_XT_MATCH_NFACCT is not set
CONFIG_NETFILTER_XT_MATCH_OSF=m
CONFIG_NETFILTER_XT_MATCH_OWNER=m
CONFIG_NETFILTER_XT_MATCH_POLICY=m
CONFIG_NETFILTER_XT_MATCH_PHYSDEV=m
CONFIG_NETFILTER_XT_MATCH_PKTTYPE=m
CONFIG_NETFILTER_XT_MATCH_QUOTA=m
CONFIG_NETFILTER_XT_MATCH_RATEEST=m
CONFIG_NETFILTER_XT_MATCH_REALM=m
CONFIG_NETFILTER_XT_MATCH_RECENT=m
CONFIG_NETFILTER_XT_MATCH_SCTP=m
CONFIG_NETFILTER_XT_MATCH_SOCKET=m
CONFIG_NETFILTER_XT_MATCH_STATE=m
CONFIG_NETFILTER_XT_MATCH_STATISTIC=m
CONFIG_NETFILTER_XT_MATCH_STRING=m
CONFIG_NETFILTER_XT_MATCH_TCPMSS=m
# CONFIG_NETFILTER_XT_MATCH_TIME is not set
# CONFIG_NETFILTER_XT_MATCH_U32 is not set
# end of Core Netfilter Configuration

CONFIG_IP_SET=m
CONFIG_IP_SET_MAX=256
CONFIG_IP_SET_BITMAP_IP=m
CONFIG_IP_SET_BITMAP_IPMAC=m
CONFIG_IP_SET_BITMAP_PORT=m
CONFIG_IP_SET_HASH_IP=m
CONFIG_IP_SET_HASH_IPMARK=m
CONFIG_IP_SET_HASH_IPPORT=m
CONFIG_IP_SET_HASH_IPPORTIP=m
CONFIG_IP_SET_HASH_IPPORTNET=m
CONFIG_IP_SET_HASH_IPMAC=m
CONFIG_IP_SET_HASH_MAC=m
CONFIG_IP_SET_HASH_NETPORTNET=m
CONFIG_IP_SET_HASH_NET=m
CONFIG_IP_SET_HASH_NETNET=m
CONFIG_IP_SET_HASH_NETPORT=m
CONFIG_IP_SET_HASH_NETIFACE=m
CONFIG_IP_SET_LIST_SET=m
CONFIG_IP_VS=m
CONFIG_IP_VS_IPV6=y
# CONFIG_IP_VS_DEBUG is not set
CONFIG_IP_VS_TAB_BITS=12

#
# IPVS transport protocol load balancing support
#
CONFIG_IP_VS_PROTO_TCP=y
CONFIG_IP_VS_PROTO_UDP=y
CONFIG_IP_VS_PROTO_AH_ESP=y
CONFIG_IP_VS_PROTO_ESP=y
CONFIG_IP_VS_PROTO_AH=y
CONFIG_IP_VS_PROTO_SCTP=y

#
# IPVS scheduler
#
CONFIG_IP_VS_RR=m
CONFIG_IP_VS_WRR=m
CONFIG_IP_VS_LC=m
CONFIG_IP_VS_WLC=m
CONFIG_IP_VS_FO=m
CONFIG_IP_VS_OVF=m
CONFIG_IP_VS_LBLC=m
CONFIG_IP_VS_LBLCR=m
CONFIG_IP_VS_DH=m
CONFIG_IP_VS_SH=m
# CONFIG_IP_VS_MH is not set
CONFIG_IP_VS_SED=m
CONFIG_IP_VS_NQ=m

#
# IPVS SH scheduler
#
CONFIG_IP_VS_SH_TAB_BITS=8

#
# IPVS MH scheduler
#
CONFIG_IP_VS_MH_TAB_INDEX=12

#
# IPVS application helper
#
CONFIG_IP_VS_FTP=m
CONFIG_IP_VS_NFCT=y
CONFIG_IP_VS_PE_SIP=m

#
# IP: Netfilter Configuration
#
CONFIG_NF_DEFRAG_IPV4=m
CONFIG_NF_SOCKET_IPV4=m
CONFIG_NF_TPROXY_IPV4=m
CONFIG_NF_TABLES_IPV4=y
CONFIG_NFT_REJECT_IPV4=m
CONFIG_NFT_DUP_IPV4=m
CONFIG_NFT_FIB_IPV4=m
CONFIG_NF_TABLES_ARP=y
CONFIG_NF_DUP_IPV4=m
CONFIG_NF_LOG_ARP=m
CONFIG_NF_LOG_IPV4=m
CONFIG_NF_REJECT_IPV4=m
CONFIG_NF_NAT_SNMP_BASIC=m
CONFIG_NF_NAT_PPTP=m
CONFIG_NF_NAT_H323=m
CONFIG_IP_NF_IPTABLES=m
CONFIG_IP_NF_MATCH_AH=m
CONFIG_IP_NF_MATCH_ECN=m
CONFIG_IP_NF_MATCH_RPFILTER=m
CONFIG_IP_NF_MATCH_TTL=m
CONFIG_IP_NF_FILTER=m
CONFIG_IP_NF_TARGET_REJECT=m
CONFIG_IP_NF_TARGET_SYNPROXY=m
CONFIG_IP_NF_NAT=m
CONFIG_IP_NF_TARGET_MASQUERADE=m
CONFIG_IP_NF_TARGET_NETMAP=m
CONFIG_IP_NF_TARGET_REDIRECT=m
CONFIG_IP_NF_MANGLE=m
# CONFIG_IP_NF_TARGET_CLUSTERIP is not set
CONFIG_IP_NF_TARGET_ECN=m
CONFIG_IP_NF_TARGET_TTL=m
CONFIG_IP_NF_RAW=m
CONFIG_IP_NF_SECURITY=m
CONFIG_IP_NF_ARPTABLES=m
CONFIG_IP_NF_ARPFILTER=m
CONFIG_IP_NF_ARP_MANGLE=m
# end of IP: Netfilter Configuration

#
# IPv6: Netfilter Configuration
#
CONFIG_NF_SOCKET_IPV6=m
CONFIG_NF_TPROXY_IPV6=m
CONFIG_NF_TABLES_IPV6=y
CONFIG_NFT_REJECT_IPV6=m
CONFIG_NFT_DUP_IPV6=m
CONFIG_NFT_FIB_IPV6=m
CONFIG_NF_DUP_IPV6=m
CONFIG_NF_REJECT_IPV6=m
CONFIG_NF_LOG_IPV6=m
CONFIG_IP6_NF_IPTABLES=m
CONFIG_IP6_NF_MATCH_AH=m
CONFIG_IP6_NF_MATCH_EUI64=m
CONFIG_IP6_NF_MATCH_FRAG=m
CONFIG_IP6_NF_MATCH_OPTS=m
CONFIG_IP6_NF_MATCH_HL=m
CONFIG_IP6_NF_MATCH_IPV6HEADER=m
CONFIG_IP6_NF_MATCH_MH=m
CONFIG_IP6_NF_MATCH_RPFILTER=m
CONFIG_IP6_NF_MATCH_RT=m
# CONFIG_IP6_NF_MATCH_SRH is not set
# CONFIG_IP6_NF_TARGET_HL is not set
CONFIG_IP6_NF_FILTER=m
CONFIG_IP6_NF_TARGET_REJECT=m
CONFIG_IP6_NF_TARGET_SYNPROXY=m
CONFIG_IP6_NF_MANGLE=m
CONFIG_IP6_NF_RAW=m
CONFIG_IP6_NF_SECURITY=m
CONFIG_IP6_NF_NAT=m
CONFIG_IP6_NF_TARGET_MASQUERADE=m
CONFIG_IP6_NF_TARGET_NPT=m
# end of IPv6: Netfilter Configuration

CONFIG_NF_DEFRAG_IPV6=m
CONFIG_NF_TABLES_BRIDGE=m
# CONFIG_NFT_BRIDGE_META is not set
CONFIG_NFT_BRIDGE_REJECT=m
CONFIG_NF_LOG_BRIDGE=m
# CONFIG_NF_CONNTRACK_BRIDGE is not set
CONFIG_BRIDGE_NF_EBTABLES=m
CONFIG_BRIDGE_EBT_BROUTE=m
CONFIG_BRIDGE_EBT_T_FILTER=m
CONFIG_BRIDGE_EBT_T_NAT=m
CONFIG_BRIDGE_EBT_802_3=m
CONFIG_BRIDGE_EBT_AMONG=m
CONFIG_BRIDGE_EBT_ARP=m
CONFIG_BRIDGE_EBT_IP=m
CONFIG_BRIDGE_EBT_IP6=m
CONFIG_BRIDGE_EBT_LIMIT=m
CONFIG_BRIDGE_EBT_MARK=m
CONFIG_BRIDGE_EBT_PKTTYPE=m
CONFIG_BRIDGE_EBT_STP=m
CONFIG_BRIDGE_EBT_VLAN=m
CONFIG_BRIDGE_EBT_ARPREPLY=m
CONFIG_BRIDGE_EBT_DNAT=m
CONFIG_BRIDGE_EBT_MARK_T=m
CONFIG_BRIDGE_EBT_REDIRECT=m
CONFIG_BRIDGE_EBT_SNAT=m
CONFIG_BRIDGE_EBT_LOG=m
CONFIG_BRIDGE_EBT_NFLOG=m
# CONFIG_BPFILTER is not set
# CONFIG_IP_DCCP is not set
CONFIG_IP_SCTP=m
# CONFIG_SCTP_DBG_OBJCNT is not set
# CONFIG_SCTP_DEFAULT_COOKIE_HMAC_MD5 is not set
CONFIG_SCTP_DEFAULT_COOKIE_HMAC_SHA1=y
# CONFIG_SCTP_DEFAULT_COOKIE_HMAC_NONE is not set
CONFIG_SCTP_COOKIE_HMAC_MD5=y
CONFIG_SCTP_COOKIE_HMAC_SHA1=y
CONFIG_INET_SCTP_DIAG=m
# CONFIG_RDS is not set
CONFIG_TIPC=m
# CONFIG_TIPC_MEDIA_IB is not set
CONFIG_TIPC_MEDIA_UDP=y
CONFIG_TIPC_CRYPTO=y
CONFIG_TIPC_DIAG=m
CONFIG_ATM=m
CONFIG_ATM_CLIP=m
# CONFIG_ATM_CLIP_NO_ICMP is not set
CONFIG_ATM_LANE=m
# CONFIG_ATM_MPOA is not set
CONFIG_ATM_BR2684=m
# CONFIG_ATM_BR2684_IPFILTER is not set
CONFIG_L2TP=m
CONFIG_L2TP_DEBUGFS=m
CONFIG_L2TP_V3=y
CONFIG_L2TP_IP=m
CONFIG_L2TP_ETH=m
CONFIG_STP=m
CONFIG_GARP=m
CONFIG_MRP=m
CONFIG_BRIDGE=m
CONFIG_BRIDGE_IGMP_SNOOPING=y
CONFIG_BRIDGE_VLAN_FILTERING=y
# CONFIG_BRIDGE_MRP is not set
CONFIG_HAVE_NET_DSA=y
# CONFIG_NET_DSA is not set
CONFIG_VLAN_8021Q=m
CONFIG_VLAN_8021Q_GVRP=y
CONFIG_VLAN_8021Q_MVRP=y
# CONFIG_DECNET is not set
CONFIG_LLC=m
# CONFIG_LLC2 is not set
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_PHONET is not set
CONFIG_6LOWPAN=m
# CONFIG_6LOWPAN_DEBUGFS is not set
# CONFIG_6LOWPAN_NHC is not set
CONFIG_IEEE802154=m
# CONFIG_IEEE802154_NL802154_EXPERIMENTAL is not set
CONFIG_IEEE802154_SOCKET=m
CONFIG_IEEE802154_6LOWPAN=m
CONFIG_MAC802154=m
CONFIG_NET_SCHED=y

#
# Queueing/Scheduling
#
CONFIG_NET_SCH_CBQ=m
CONFIG_NET_SCH_HTB=m
CONFIG_NET_SCH_HFSC=m
CONFIG_NET_SCH_ATM=m
CONFIG_NET_SCH_PRIO=m
CONFIG_NET_SCH_MULTIQ=m
CONFIG_NET_SCH_RED=m
CONFIG_NET_SCH_SFB=m
CONFIG_NET_SCH_SFQ=m
CONFIG_NET_SCH_TEQL=m
CONFIG_NET_SCH_TBF=m
# CONFIG_NET_SCH_CBS is not set
# CONFIG_NET_SCH_ETF is not set
# CONFIG_NET_SCH_TAPRIO is not set
CONFIG_NET_SCH_GRED=m
CONFIG_NET_SCH_DSMARK=m
CONFIG_NET_SCH_NETEM=m
CONFIG_NET_SCH_DRR=m
CONFIG_NET_SCH_MQPRIO=m
# CONFIG_NET_SCH_SKBPRIO is not set
CONFIG_NET_SCH_CHOKE=m
CONFIG_NET_SCH_QFQ=m
CONFIG_NET_SCH_CODEL=m
CONFIG_NET_SCH_FQ_CODEL=y
# CONFIG_NET_SCH_CAKE is not set
CONFIG_NET_SCH_FQ=m
CONFIG_NET_SCH_HHF=m
CONFIG_NET_SCH_PIE=m
# CONFIG_NET_SCH_FQ_PIE is not set
CONFIG_NET_SCH_INGRESS=m
CONFIG_NET_SCH_PLUG=m
# CONFIG_NET_SCH_ETS is not set
CONFIG_NET_SCH_DEFAULT=y
# CONFIG_DEFAULT_FQ is not set
# CONFIG_DEFAULT_CODEL is not set
CONFIG_DEFAULT_FQ_CODEL=y
# CONFIG_DEFAULT_SFQ is not set
# CONFIG_DEFAULT_PFIFO_FAST is not set
CONFIG_DEFAULT_NET_SCH="fq_codel"

#
# Classification
#
CONFIG_NET_CLS=y
CONFIG_NET_CLS_BASIC=m
CONFIG_NET_CLS_TCINDEX=m
CONFIG_NET_CLS_ROUTE4=m
CONFIG_NET_CLS_FW=m
CONFIG_NET_CLS_U32=m
CONFIG_CLS_U32_PERF=y
CONFIG_CLS_U32_MARK=y
CONFIG_NET_CLS_RSVP=m
CONFIG_NET_CLS_RSVP6=m
CONFIG_NET_CLS_FLOW=m
CONFIG_NET_CLS_CGROUP=y
CONFIG_NET_CLS_BPF=m
CONFIG_NET_CLS_FLOWER=m
CONFIG_NET_CLS_MATCHALL=m
CONFIG_NET_EMATCH=y
CONFIG_NET_EMATCH_STACK=32
CONFIG_NET_EMATCH_CMP=m
CONFIG_NET_EMATCH_NBYTE=m
CONFIG_NET_EMATCH_U32=m
CONFIG_NET_EMATCH_META=m
CONFIG_NET_EMATCH_TEXT=m
# CONFIG_NET_EMATCH_CANID is not set
CONFIG_NET_EMATCH_IPSET=m
# CONFIG_NET_EMATCH_IPT is not set
CONFIG_NET_CLS_ACT=y
CONFIG_NET_ACT_POLICE=m
CONFIG_NET_ACT_GACT=m
CONFIG_GACT_PROB=y
CONFIG_NET_ACT_MIRRED=m
CONFIG_NET_ACT_SAMPLE=m
# CONFIG_NET_ACT_IPT is not set
CONFIG_NET_ACT_NAT=m
CONFIG_NET_ACT_PEDIT=m
CONFIG_NET_ACT_SIMP=m
CONFIG_NET_ACT_SKBEDIT=m
CONFIG_NET_ACT_CSUM=m
# CONFIG_NET_ACT_MPLS is not set
CONFIG_NET_ACT_VLAN=m
CONFIG_NET_ACT_BPF=m
# CONFIG_NET_ACT_CONNMARK is not set
# CONFIG_NET_ACT_CTINFO is not set
CONFIG_NET_ACT_SKBMOD=m
# CONFIG_NET_ACT_IFE is not set
CONFIG_NET_ACT_TUNNEL_KEY=m
# CONFIG_NET_ACT_GATE is not set
# CONFIG_NET_TC_SKB_EXT is not set
CONFIG_NET_SCH_FIFO=y
CONFIG_DCB=y
CONFIG_DNS_RESOLVER=m
# CONFIG_BATMAN_ADV is not set
CONFIG_OPENVSWITCH=m
CONFIG_OPENVSWITCH_GRE=m
CONFIG_VSOCKETS=m
CONFIG_VSOCKETS_DIAG=m
CONFIG_VSOCKETS_LOOPBACK=m
CONFIG_VMWARE_VMCI_VSOCKETS=m
CONFIG_VIRTIO_VSOCKETS=m
CONFIG_VIRTIO_VSOCKETS_COMMON=m
CONFIG_HYPERV_VSOCKETS=m
CONFIG_NETLINK_DIAG=m
CONFIG_MPLS=y
CONFIG_NET_MPLS_GSO=y
CONFIG_MPLS_ROUTING=m
CONFIG_MPLS_IPTUNNEL=m
CONFIG_NET_NSH=y
# CONFIG_HSR is not set
CONFIG_NET_SWITCHDEV=y
CONFIG_NET_L3_MASTER_DEV=y
# CONFIG_QRTR is not set
# CONFIG_NET_NCSI is not set
CONFIG_RPS=y
CONFIG_RFS_ACCEL=y
CONFIG_XPS=y
CONFIG_CGROUP_NET_PRIO=y
CONFIG_CGROUP_NET_CLASSID=y
CONFIG_NET_RX_BUSY_POLL=y
CONFIG_BQL=y
CONFIG_BPF_JIT=y
CONFIG_BPF_STREAM_PARSER=y
CONFIG_NET_FLOW_LIMIT=y

#
# Network testing
#
CONFIG_NET_PKTGEN=m
CONFIG_NET_DROP_MONITOR=y
# end of Network testing
# end of Networking options

# CONFIG_HAMRADIO is not set
CONFIG_CAN=m
CONFIG_CAN_RAW=m
CONFIG_CAN_BCM=m
CONFIG_CAN_GW=m
# CONFIG_CAN_J1939 is not set

#
# CAN Device Drivers
#
CONFIG_CAN_VCAN=m
# CONFIG_CAN_VXCAN is not set
CONFIG_CAN_SLCAN=m
CONFIG_CAN_DEV=m
CONFIG_CAN_CALC_BITTIMING=y
# CONFIG_CAN_KVASER_PCIEFD is not set
CONFIG_CAN_C_CAN=m
CONFIG_CAN_C_CAN_PLATFORM=m
CONFIG_CAN_C_CAN_PCI=m
CONFIG_CAN_CC770=m
# CONFIG_CAN_CC770_ISA is not set
CONFIG_CAN_CC770_PLATFORM=m
# CONFIG_CAN_IFI_CANFD is not set
# CONFIG_CAN_M_CAN is not set
# CONFIG_CAN_PEAK_PCIEFD is not set
CONFIG_CAN_SJA1000=m
CONFIG_CAN_EMS_PCI=m
# CONFIG_CAN_F81601 is not set
CONFIG_CAN_KVASER_PCI=m
CONFIG_CAN_PEAK_PCI=m
CONFIG_CAN_PEAK_PCIEC=y
CONFIG_CAN_PLX_PCI=m
# CONFIG_CAN_SJA1000_ISA is not set
CONFIG_CAN_SJA1000_PLATFORM=m
CONFIG_CAN_SOFTING=m

#
# CAN SPI interfaces
#
# CONFIG_CAN_HI311X is not set
# CONFIG_CAN_MCP251X is not set
# end of CAN SPI interfaces

#
# CAN USB interfaces
#
# CONFIG_CAN_8DEV_USB is not set
# CONFIG_CAN_EMS_USB is not set
# CONFIG_CAN_ESD_USB2 is not set
# CONFIG_CAN_GS_USB is not set
# CONFIG_CAN_KVASER_USB is not set
# CONFIG_CAN_MCBA_USB is not set
# CONFIG_CAN_PEAK_USB is not set
# CONFIG_CAN_UCAN is not set
# end of CAN USB interfaces

# CONFIG_CAN_DEBUG_DEVICES is not set
# end of CAN Device Drivers

CONFIG_BT=m
CONFIG_BT_BREDR=y
CONFIG_BT_RFCOMM=m
CONFIG_BT_RFCOMM_TTY=y
CONFIG_BT_BNEP=m
CONFIG_BT_BNEP_MC_FILTER=y
CONFIG_BT_BNEP_PROTO_FILTER=y
CONFIG_BT_HIDP=m
CONFIG_BT_HS=y
CONFIG_BT_LE=y
# CONFIG_BT_6LOWPAN is not set
# CONFIG_BT_LEDS is not set
# CONFIG_BT_MSFTEXT is not set
CONFIG_BT_DEBUGFS=y
# CONFIG_BT_SELFTEST is not set

#
# Bluetooth device drivers
#
# CONFIG_BT_HCIBTUSB is not set
# CONFIG_BT_HCIBTSDIO is not set
CONFIG_BT_HCIUART=m
CONFIG_BT_HCIUART_H4=y
CONFIG_BT_HCIUART_BCSP=y
CONFIG_BT_HCIUART_ATH3K=y
# CONFIG_BT_HCIUART_INTEL is not set
# CONFIG_BT_HCIUART_AG6XX is not set
# CONFIG_BT_HCIBCM203X is not set
# CONFIG_BT_HCIBPA10X is not set
# CONFIG_BT_HCIBFUSB is not set
CONFIG_BT_HCIVHCI=m
CONFIG_BT_MRVL=m
# CONFIG_BT_MRVL_SDIO is not set
# CONFIG_BT_MTKSDIO is not set
# end of Bluetooth device drivers

# CONFIG_AF_RXRPC is not set
# CONFIG_AF_KCM is not set
CONFIG_STREAM_PARSER=y
CONFIG_FIB_RULES=y
CONFIG_WIRELESS=y
CONFIG_WEXT_CORE=y
CONFIG_WEXT_PROC=y
CONFIG_CFG80211=m
# CONFIG_NL80211_TESTMODE is not set
# CONFIG_CFG80211_DEVELOPER_WARNINGS is not set
CONFIG_CFG80211_REQUIRE_SIGNED_REGDB=y
CONFIG_CFG80211_USE_KERNEL_REGDB_KEYS=y
CONFIG_CFG80211_DEFAULT_PS=y
# CONFIG_CFG80211_DEBUGFS is not set
CONFIG_CFG80211_CRDA_SUPPORT=y
CONFIG_CFG80211_WEXT=y
CONFIG_MAC80211=m
CONFIG_MAC80211_HAS_RC=y
CONFIG_MAC80211_RC_MINSTREL=y
CONFIG_MAC80211_RC_DEFAULT_MINSTREL=y
CONFIG_MAC80211_RC_DEFAULT="minstrel_ht"
CONFIG_MAC80211_MESH=y
CONFIG_MAC80211_LEDS=y
CONFIG_MAC80211_DEBUGFS=y
# CONFIG_MAC80211_MESSAGE_TRACING is not set
# CONFIG_MAC80211_DEBUG_MENU is not set
CONFIG_MAC80211_STA_HASH_MAX_SIZE=0
# CONFIG_WIMAX is not set
CONFIG_RFKILL=m
CONFIG_RFKILL_LEDS=y
CONFIG_RFKILL_INPUT=y
# CONFIG_RFKILL_GPIO is not set
CONFIG_NET_9P=y
CONFIG_NET_9P_VIRTIO=y
# CONFIG_NET_9P_XEN is not set
# CONFIG_NET_9P_RDMA is not set
# CONFIG_NET_9P_DEBUG is not set
# CONFIG_CAIF is not set
CONFIG_CEPH_LIB=m
# CONFIG_CEPH_LIB_PRETTYDEBUG is not set
CONFIG_CEPH_LIB_USE_DNS_RESOLVER=y
# CONFIG_NFC is not set
CONFIG_PSAMPLE=m
# CONFIG_NET_IFE is not set
CONFIG_LWTUNNEL=y
CONFIG_LWTUNNEL_BPF=y
CONFIG_DST_CACHE=y
CONFIG_GRO_CELLS=y
CONFIG_SOCK_VALIDATE_XMIT=y
CONFIG_NET_SOCK_MSG=y
CONFIG_NET_DEVLINK=y
CONFIG_PAGE_POOL=y
CONFIG_FAILOVER=m
CONFIG_ETHTOOL_NETLINK=y
CONFIG_HAVE_EBPF_JIT=y

#
# Device Drivers
#
CONFIG_HAVE_EISA=y
# CONFIG_EISA is not set
CONFIG_HAVE_PCI=y
CONFIG_PCI=y
CONFIG_PCI_DOMAINS=y
CONFIG_PCIEPORTBUS=y
CONFIG_HOTPLUG_PCI_PCIE=y
CONFIG_PCIEAER=y
CONFIG_PCIEAER_INJECT=m
CONFIG_PCIE_ECRC=y
CONFIG_PCIEASPM=y
CONFIG_PCIEASPM_DEFAULT=y
# CONFIG_PCIEASPM_POWERSAVE is not set
# CONFIG_PCIEASPM_POWER_SUPERSAVE is not set
# CONFIG_PCIEASPM_PERFORMANCE is not set
CONFIG_PCIE_PME=y
CONFIG_PCIE_DPC=y
# CONFIG_PCIE_PTM is not set
# CONFIG_PCIE_BW is not set
# CONFIG_PCIE_EDR is not set
CONFIG_PCI_MSI=y
CONFIG_PCI_MSI_IRQ_DOMAIN=y
CONFIG_PCI_QUIRKS=y
# CONFIG_PCI_DEBUG is not set
# CONFIG_PCI_REALLOC_ENABLE_AUTO is not set
CONFIG_PCI_STUB=y
CONFIG_PCI_PF_STUB=m
# CONFIG_XEN_PCIDEV_FRONTEND is not set
CONFIG_PCI_ATS=y
CONFIG_PCI_LOCKLESS_CONFIG=y
CONFIG_PCI_IOV=y
CONFIG_PCI_PRI=y
CONFIG_PCI_PASID=y
# CONFIG_PCI_P2PDMA is not set
CONFIG_PCI_LABEL=y
CONFIG_PCI_HYPERV=m
CONFIG_HOTPLUG_PCI=y
CONFIG_HOTPLUG_PCI_ACPI=y
CONFIG_HOTPLUG_PCI_ACPI_IBM=m
# CONFIG_HOTPLUG_PCI_CPCI is not set
CONFIG_HOTPLUG_PCI_SHPC=y

#
# PCI controller drivers
#
CONFIG_VMD=y
CONFIG_PCI_HYPERV_INTERFACE=m

#
# DesignWare PCI Core Support
#
# CONFIG_PCIE_DW_PLAT_HOST is not set
# CONFIG_PCI_MESON is not set
# end of DesignWare PCI Core Support

#
# Mobiveil PCIe Core Support
#
# end of Mobiveil PCIe Core Support

#
# Cadence PCIe controllers support
#
# end of Cadence PCIe controllers support
# end of PCI controller drivers

#
# PCI Endpoint
#
# CONFIG_PCI_ENDPOINT is not set
# end of PCI Endpoint

#
# PCI switch controller drivers
#
# CONFIG_PCI_SW_SWITCHTEC is not set
# end of PCI switch controller drivers

# CONFIG_PCCARD is not set
# CONFIG_RAPIDIO is not set

#
# Generic Driver Options
#
# CONFIG_UEVENT_HELPER is not set
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y

#
# Firmware loader
#
CONFIG_FW_LOADER=y
CONFIG_FW_LOADER_PAGED_BUF=y
CONFIG_EXTRA_FIRMWARE=""
CONFIG_FW_LOADER_USER_HELPER=y
# CONFIG_FW_LOADER_USER_HELPER_FALLBACK is not set
# CONFIG_FW_LOADER_COMPRESS is not set
CONFIG_FW_CACHE=y
# end of Firmware loader

CONFIG_ALLOW_DEV_COREDUMP=y
# CONFIG_DEBUG_DRIVER is not set
# CONFIG_DEBUG_DEVRES is not set
# CONFIG_DEBUG_TEST_DRIVER_REMOVE is not set
# CONFIG_PM_QOS_KUNIT_TEST is not set
# CONFIG_TEST_ASYNC_DRIVER_PROBE is not set
CONFIG_KUNIT_DRIVER_PE_TEST=y
CONFIG_SYS_HYPERVISOR=y
CONFIG_GENERIC_CPU_AUTOPROBE=y
CONFIG_GENERIC_CPU_VULNERABILITIES=y
CONFIG_REGMAP=y
CONFIG_REGMAP_I2C=m
CONFIG_REGMAP_SPI=m
CONFIG_DMA_SHARED_BUFFER=y
# CONFIG_DMA_FENCE_TRACE is not set
# end of Generic Driver Options

#
# Bus devices
#
# CONFIG_MHI_BUS is not set
# end of Bus devices

CONFIG_CONNECTOR=y
CONFIG_PROC_EVENTS=y
# CONFIG_GNSS is not set
# CONFIG_MTD is not set
# CONFIG_OF is not set
CONFIG_ARCH_MIGHT_HAVE_PC_PARPORT=y
CONFIG_PARPORT=m
CONFIG_PARPORT_PC=m
CONFIG_PARPORT_SERIAL=m
# CONFIG_PARPORT_PC_FIFO is not set
# CONFIG_PARPORT_PC_SUPERIO is not set
# CONFIG_PARPORT_AX88796 is not set
CONFIG_PARPORT_1284=y
CONFIG_PNP=y
# CONFIG_PNP_DEBUG_MESSAGES is not set

#
# Protocols
#
CONFIG_PNPACPI=y
CONFIG_BLK_DEV=y
CONFIG_BLK_DEV_NULL_BLK=m
CONFIG_BLK_DEV_NULL_BLK_FAULT_INJECTION=y
# CONFIG_BLK_DEV_FD is not set
CONFIG_CDROM=m
# CONFIG_PARIDE is not set
# CONFIG_BLK_DEV_PCIESSD_MTIP32XX is not set
# CONFIG_ZRAM is not set
# CONFIG_BLK_DEV_UMEM is not set
CONFIG_BLK_DEV_LOOP=m
CONFIG_BLK_DEV_LOOP_MIN_COUNT=0
# CONFIG_BLK_DEV_CRYPTOLOOP is not set
# CONFIG_BLK_DEV_DRBD is not set
CONFIG_BLK_DEV_NBD=m
# CONFIG_BLK_DEV_SKD is not set
# CONFIG_BLK_DEV_SX8 is not set
CONFIG_BLK_DEV_RAM=m
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_BLK_DEV_RAM_SIZE=16384
CONFIG_CDROM_PKTCDVD=m
CONFIG_CDROM_PKTCDVD_BUFFERS=8
# CONFIG_CDROM_PKTCDVD_WCACHE is not set
# CONFIG_ATA_OVER_ETH is not set
CONFIG_XEN_BLKDEV_FRONTEND=m
CONFIG_VIRTIO_BLK=y
CONFIG_BLK_DEV_RBD=m
# CONFIG_BLK_DEV_RSXX is not set

#
# NVME Support
#
CONFIG_NVME_CORE=m
CONFIG_BLK_DEV_NVME=m
CONFIG_NVME_MULTIPATH=y
# CONFIG_NVME_HWMON is not set
CONFIG_NVME_FABRICS=m
# CONFIG_NVME_RDMA is not set
CONFIG_NVME_FC=m
# CONFIG_NVME_TCP is not set
CONFIG_NVME_TARGET=m
# CONFIG_NVME_TARGET_PASSTHRU is not set
CONFIG_NVME_TARGET_LOOP=m
# CONFIG_NVME_TARGET_RDMA is not set
CONFIG_NVME_TARGET_FC=m
CONFIG_NVME_TARGET_FCLOOP=m
# CONFIG_NVME_TARGET_TCP is not set
# end of NVME Support

#
# Misc devices
#
CONFIG_SENSORS_LIS3LV02D=m
# CONFIG_AD525X_DPOT is not set
# CONFIG_DUMMY_IRQ is not set
# CONFIG_IBM_ASM is not set
# CONFIG_PHANTOM is not set
CONFIG_TIFM_CORE=m
CONFIG_TIFM_7XX1=m
# CONFIG_ICS932S401 is not set
CONFIG_ENCLOSURE_SERVICES=m
CONFIG_SGI_XP=m
CONFIG_HP_ILO=m
CONFIG_SGI_GRU=m
# CONFIG_SGI_GRU_DEBUG is not set
CONFIG_APDS9802ALS=m
CONFIG_ISL29003=m
CONFIG_ISL29020=m
CONFIG_SENSORS_TSL2550=m
CONFIG_SENSORS_BH1770=m
CONFIG_SENSORS_APDS990X=m
# CONFIG_HMC6352 is not set
# CONFIG_DS1682 is not set
CONFIG_VMWARE_BALLOON=m
# CONFIG_LATTICE_ECP3_CONFIG is not set
# CONFIG_SRAM is not set
# CONFIG_PCI_ENDPOINT_TEST is not set
# CONFIG_XILINX_SDFEC is not set
CONFIG_MISC_RTSX=m
CONFIG_PVPANIC=y
# CONFIG_C2PORT is not set

#
# EEPROM support
#
# CONFIG_EEPROM_AT24 is not set
# CONFIG_EEPROM_AT25 is not set
CONFIG_EEPROM_LEGACY=m
CONFIG_EEPROM_MAX6875=m
CONFIG_EEPROM_93CX6=m
# CONFIG_EEPROM_93XX46 is not set
# CONFIG_EEPROM_IDT_89HPESX is not set
# CONFIG_EEPROM_EE1004 is not set
# end of EEPROM support

CONFIG_CB710_CORE=m
# CONFIG_CB710_DEBUG is not set
CONFIG_CB710_DEBUG_ASSUMPTIONS=y

#
# Texas Instruments shared transport line discipline
#
# CONFIG_TI_ST is not set
# end of Texas Instruments shared transport line discipline

CONFIG_SENSORS_LIS3_I2C=m
CONFIG_ALTERA_STAPL=m
CONFIG_INTEL_MEI=m
CONFIG_INTEL_MEI_ME=m
# CONFIG_INTEL_MEI_TXE is not set
# CONFIG_INTEL_MEI_HDCP is not set
CONFIG_VMWARE_VMCI=m

#
# Intel MIC & related support
#
# CONFIG_INTEL_MIC_BUS is not set
# CONFIG_SCIF_BUS is not set
# CONFIG_VOP_BUS is not set
# end of Intel MIC & related support

# CONFIG_GENWQE is not set
# CONFIG_ECHO is not set
# CONFIG_MISC_ALCOR_PCI is not set
CONFIG_MISC_RTSX_PCI=m
# CONFIG_MISC_RTSX_USB is not set
# CONFIG_HABANA_AI is not set
# CONFIG_UACCE is not set
# end of Misc devices

CONFIG_HAVE_IDE=y
# CONFIG_IDE is not set

#
# SCSI device support
#
CONFIG_SCSI_MOD=y
CONFIG_RAID_ATTRS=m
CONFIG_SCSI=y
CONFIG_SCSI_DMA=y
CONFIG_SCSI_NETLINK=y
CONFIG_SCSI_PROC_FS=y

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=m
CONFIG_CHR_DEV_ST=m
CONFIG_BLK_DEV_SR=m
CONFIG_CHR_DEV_SG=m
CONFIG_CHR_DEV_SCH=m
CONFIG_SCSI_ENCLOSURE=m
CONFIG_SCSI_CONSTANTS=y
CONFIG_SCSI_LOGGING=y
CONFIG_SCSI_SCAN_ASYNC=y

#
# SCSI Transports
#
CONFIG_SCSI_SPI_ATTRS=m
CONFIG_SCSI_FC_ATTRS=m
CONFIG_SCSI_ISCSI_ATTRS=m
CONFIG_SCSI_SAS_ATTRS=m
CONFIG_SCSI_SAS_LIBSAS=m
# CONFIG_SCSI_SAS_ATA is not set
CONFIG_SCSI_SAS_HOST_SMP=y
CONFIG_SCSI_SRP_ATTRS=m
# end of SCSI Transports

CONFIG_SCSI_LOWLEVEL=y
# CONFIG_ISCSI_TCP is not set
# CONFIG_ISCSI_BOOT_SYSFS is not set
# CONFIG_SCSI_CXGB3_ISCSI is not set
# CONFIG_SCSI_CXGB4_ISCSI is not set
# CONFIG_SCSI_BNX2_ISCSI is not set
# CONFIG_BE2ISCSI is not set
# CONFIG_BLK_DEV_3W_XXXX_RAID is not set
# CONFIG_SCSI_HPSA is not set
# CONFIG_SCSI_3W_9XXX is not set
# CONFIG_SCSI_3W_SAS is not set
# CONFIG_SCSI_ACARD is not set
# CONFIG_SCSI_AACRAID is not set
# CONFIG_SCSI_AIC7XXX is not set
# CONFIG_SCSI_AIC79XX is not set
# CONFIG_SCSI_AIC94XX is not set
# CONFIG_SCSI_MVSAS is not set
# CONFIG_SCSI_MVUMI is not set
# CONFIG_SCSI_DPT_I2O is not set
# CONFIG_SCSI_ADVANSYS is not set
# CONFIG_SCSI_ARCMSR is not set
# CONFIG_SCSI_ESAS2R is not set
# CONFIG_MEGARAID_NEWGEN is not set
# CONFIG_MEGARAID_LEGACY is not set
# CONFIG_MEGARAID_SAS is not set
CONFIG_SCSI_MPT3SAS=m
CONFIG_SCSI_MPT2SAS_MAX_SGE=128
CONFIG_SCSI_MPT3SAS_MAX_SGE=128
# CONFIG_SCSI_MPT2SAS is not set
# CONFIG_SCSI_SMARTPQI is not set
# CONFIG_SCSI_UFSHCD is not set
# CONFIG_SCSI_HPTIOP is not set
# CONFIG_SCSI_BUSLOGIC is not set
# CONFIG_SCSI_MYRB is not set
# CONFIG_SCSI_MYRS is not set
# CONFIG_VMWARE_PVSCSI is not set
# CONFIG_XEN_SCSI_FRONTEND is not set
CONFIG_HYPERV_STORAGE=m
# CONFIG_LIBFC is not set
# CONFIG_SCSI_SNIC is not set
# CONFIG_SCSI_DMX3191D is not set
# CONFIG_SCSI_FDOMAIN_PCI is not set
# CONFIG_SCSI_GDTH is not set
# CONFIG_SCSI_ISCI is not set
# CONFIG_SCSI_IPS is not set
# CONFIG_SCSI_INITIO is not set
# CONFIG_SCSI_INIA100 is not set
# CONFIG_SCSI_PPA is not set
# CONFIG_SCSI_IMM is not set
# CONFIG_SCSI_STEX is not set
# CONFIG_SCSI_SYM53C8XX_2 is not set
# CONFIG_SCSI_IPR is not set
# CONFIG_SCSI_QLOGIC_1280 is not set
# CONFIG_SCSI_QLA_FC is not set
# CONFIG_SCSI_QLA_ISCSI is not set
# CONFIG_SCSI_LPFC is not set
# CONFIG_SCSI_DC395x is not set
# CONFIG_SCSI_AM53C974 is not set
# CONFIG_SCSI_WD719X is not set
CONFIG_SCSI_DEBUG=m
# CONFIG_SCSI_PMCRAID is not set
# CONFIG_SCSI_PM8001 is not set
# CONFIG_SCSI_BFA_FC is not set
# CONFIG_SCSI_VIRTIO is not set
# CONFIG_SCSI_CHELSIO_FCOE is not set
CONFIG_SCSI_DH=y
CONFIG_SCSI_DH_RDAC=y
CONFIG_SCSI_DH_HP_SW=y
CONFIG_SCSI_DH_EMC=y
CONFIG_SCSI_DH_ALUA=y
# end of SCSI device support

CONFIG_ATA=m
CONFIG_SATA_HOST=y
CONFIG_PATA_TIMINGS=y
CONFIG_ATA_VERBOSE_ERROR=y
CONFIG_ATA_FORCE=y
CONFIG_ATA_ACPI=y
# CONFIG_SATA_ZPODD is not set
CONFIG_SATA_PMP=y

#
# Controllers with non-SFF native interface
#
CONFIG_SATA_AHCI=m
CONFIG_SATA_MOBILE_LPM_POLICY=0
CONFIG_SATA_AHCI_PLATFORM=m
# CONFIG_SATA_INIC162X is not set
# CONFIG_SATA_ACARD_AHCI is not set
# CONFIG_SATA_SIL24 is not set
CONFIG_ATA_SFF=y

#
# SFF controllers with custom DMA interface
#
# CONFIG_PDC_ADMA is not set
# CONFIG_SATA_QSTOR is not set
# CONFIG_SATA_SX4 is not set
CONFIG_ATA_BMDMA=y

#
# SATA SFF controllers with BMDMA
#
CONFIG_ATA_PIIX=m
# CONFIG_SATA_DWC is not set
# CONFIG_SATA_MV is not set
# CONFIG_SATA_NV is not set
# CONFIG_SATA_PROMISE is not set
# CONFIG_SATA_SIL is not set
# CONFIG_SATA_SIS is not set
# CONFIG_SATA_SVW is not set
# CONFIG_SATA_ULI is not set
# CONFIG_SATA_VIA is not set
# CONFIG_SATA_VITESSE is not set

#
# PATA SFF controllers with BMDMA
#
# CONFIG_PATA_ALI is not set
# CONFIG_PATA_AMD is not set
# CONFIG_PATA_ARTOP is not set
# CONFIG_PATA_ATIIXP is not set
# CONFIG_PATA_ATP867X is not set
# CONFIG_PATA_CMD64X is not set
# CONFIG_PATA_CYPRESS is not set
# CONFIG_PATA_EFAR is not set
# CONFIG_PATA_HPT366 is not set
# CONFIG_PATA_HPT37X is not set
# CONFIG_PATA_HPT3X2N is not set
# CONFIG_PATA_HPT3X3 is not set
# CONFIG_PATA_IT8213 is not set
# CONFIG_PATA_IT821X is not set
# CONFIG_PATA_JMICRON is not set
# CONFIG_PATA_MARVELL is not set
# CONFIG_PATA_NETCELL is not set
# CONFIG_PATA_NINJA32 is not set
# CONFIG_PATA_NS87415 is not set
# CONFIG_PATA_OLDPIIX is not set
# CONFIG_PATA_OPTIDMA is not set
# CONFIG_PATA_PDC2027X is not set
# CONFIG_PATA_PDC_OLD is not set
# CONFIG_PATA_RADISYS is not set
# CONFIG_PATA_RDC is not set
# CONFIG_PATA_SCH is not set
# CONFIG_PATA_SERVERWORKS is not set
# CONFIG_PATA_SIL680 is not set
# CONFIG_PATA_SIS is not set
# CONFIG_PATA_TOSHIBA is not set
# CONFIG_PATA_TRIFLEX is not set
# CONFIG_PATA_VIA is not set
# CONFIG_PATA_WINBOND is not set

#
# PIO-only SFF controllers
#
# CONFIG_PATA_CMD640_PCI is not set
# CONFIG_PATA_MPIIX is not set
# CONFIG_PATA_NS87410 is not set
# CONFIG_PATA_OPTI is not set
# CONFIG_PATA_RZ1000 is not set

#
# Generic fallback / legacy drivers
#
# CONFIG_PATA_ACPI is not set
CONFIG_ATA_GENERIC=m
# CONFIG_PATA_LEGACY is not set
CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
CONFIG_MD_AUTODETECT=y
CONFIG_MD_LINEAR=m
CONFIG_MD_RAID0=m
CONFIG_MD_RAID1=m
CONFIG_MD_RAID10=m
CONFIG_MD_RAID456=m
CONFIG_MD_MULTIPATH=m
CONFIG_MD_FAULTY=m
CONFIG_MD_CLUSTER=m
# CONFIG_BCACHE is not set
CONFIG_BLK_DEV_DM_BUILTIN=y
CONFIG_BLK_DEV_DM=m
CONFIG_DM_DEBUG=y
CONFIG_DM_BUFIO=m
# CONFIG_DM_DEBUG_BLOCK_MANAGER_LOCKING is not set
CONFIG_DM_BIO_PRISON=m
CONFIG_DM_PERSISTENT_DATA=m
# CONFIG_DM_UNSTRIPED is not set
CONFIG_DM_CRYPT=m
CONFIG_DM_SNAPSHOT=m
CONFIG_DM_THIN_PROVISIONING=m
CONFIG_DM_CACHE=m
CONFIG_DM_CACHE_SMQ=m
CONFIG_DM_WRITECACHE=m
# CONFIG_DM_EBS is not set
CONFIG_DM_ERA=m
# CONFIG_DM_CLONE is not set
CONFIG_DM_MIRROR=m
CONFIG_DM_LOG_USERSPACE=m
CONFIG_DM_RAID=m
CONFIG_DM_ZERO=m
CONFIG_DM_MULTIPATH=m
CONFIG_DM_MULTIPATH_QL=m
CONFIG_DM_MULTIPATH_ST=m
# CONFIG_DM_MULTIPATH_HST is not set
CONFIG_DM_DELAY=m
# CONFIG_DM_DUST is not set
CONFIG_DM_UEVENT=y
CONFIG_DM_FLAKEY=m
CONFIG_DM_VERITY=m
# CONFIG_DM_VERITY_VERIFY_ROOTHASH_SIG is not set
# CONFIG_DM_VERITY_FEC is not set
CONFIG_DM_SWITCH=m
CONFIG_DM_LOG_WRITES=m
CONFIG_DM_INTEGRITY=m
# CONFIG_DM_ZONED is not set
CONFIG_TARGET_CORE=m
CONFIG_TCM_IBLOCK=m
CONFIG_TCM_FILEIO=m
CONFIG_TCM_PSCSI=m
CONFIG_TCM_USER2=m
CONFIG_LOOPBACK_TARGET=m
CONFIG_ISCSI_TARGET=m
# CONFIG_SBP_TARGET is not set
# CONFIG_FUSION is not set

#
# IEEE 1394 (FireWire) support
#
CONFIG_FIREWIRE=m
CONFIG_FIREWIRE_OHCI=m
CONFIG_FIREWIRE_SBP2=m
CONFIG_FIREWIRE_NET=m
# CONFIG_FIREWIRE_NOSY is not set
# end of IEEE 1394 (FireWire) support

CONFIG_MACINTOSH_DRIVERS=y
CONFIG_MAC_EMUMOUSEBTN=y
CONFIG_NETDEVICES=y
CONFIG_MII=y
CONFIG_NET_CORE=y
# CONFIG_BONDING is not set
# CONFIG_DUMMY is not set
# CONFIG_WIREGUARD is not set
# CONFIG_EQUALIZER is not set
# CONFIG_NET_FC is not set
# CONFIG_IFB is not set
# CONFIG_NET_TEAM is not set
# CONFIG_MACVLAN is not set
# CONFIG_IPVLAN is not set
# CONFIG_VXLAN is not set
# CONFIG_GENEVE is not set
# CONFIG_BAREUDP is not set
# CONFIG_GTP is not set
# CONFIG_MACSEC is not set
CONFIG_NETCONSOLE=m
CONFIG_NETCONSOLE_DYNAMIC=y
CONFIG_NETPOLL=y
CONFIG_NET_POLL_CONTROLLER=y
# CONFIG_TUN is not set
# CONFIG_TUN_VNET_CROSS_LE is not set
CONFIG_VETH=m
CONFIG_VIRTIO_NET=m
# CONFIG_NLMON is not set
# CONFIG_NET_VRF is not set
# CONFIG_VSOCKMON is not set
# CONFIG_ARCNET is not set
CONFIG_ATM_DRIVERS=y
# CONFIG_ATM_DUMMY is not set
# CONFIG_ATM_TCP is not set
# CONFIG_ATM_LANAI is not set
# CONFIG_ATM_ENI is not set
# CONFIG_ATM_FIRESTREAM is not set
# CONFIG_ATM_ZATM is not set
# CONFIG_ATM_NICSTAR is not set
# CONFIG_ATM_IDT77252 is not set
# CONFIG_ATM_AMBASSADOR is not set
# CONFIG_ATM_HORIZON is not set
# CONFIG_ATM_IA is not set
# CONFIG_ATM_FORE200E is not set
# CONFIG_ATM_HE is not set
# CONFIG_ATM_SOLOS is not set

#
# Distributed Switch Architecture drivers
#
# end of Distributed Switch Architecture drivers

CONFIG_ETHERNET=y
CONFIG_MDIO=y
CONFIG_NET_VENDOR_3COM=y
# CONFIG_VORTEX is not set
# CONFIG_TYPHOON is not set
CONFIG_NET_VENDOR_ADAPTEC=y
# CONFIG_ADAPTEC_STARFIRE is not set
CONFIG_NET_VENDOR_AGERE=y
# CONFIG_ET131X is not set
CONFIG_NET_VENDOR_ALACRITECH=y
# CONFIG_SLICOSS is not set
CONFIG_NET_VENDOR_ALTEON=y
# CONFIG_ACENIC is not set
# CONFIG_ALTERA_TSE is not set
CONFIG_NET_VENDOR_AMAZON=y
# CONFIG_ENA_ETHERNET is not set
CONFIG_NET_VENDOR_AMD=y
# CONFIG_AMD8111_ETH is not set
# CONFIG_PCNET32 is not set
# CONFIG_AMD_XGBE is not set
CONFIG_NET_VENDOR_AQUANTIA=y
# CONFIG_AQTION is not set
CONFIG_NET_VENDOR_ARC=y
CONFIG_NET_VENDOR_ATHEROS=y
# CONFIG_ATL2 is not set
# CONFIG_ATL1 is not set
# CONFIG_ATL1E is not set
# CONFIG_ATL1C is not set
# CONFIG_ALX is not set
# CONFIG_NET_VENDOR_AURORA is not set
CONFIG_NET_VENDOR_BROADCOM=y
# CONFIG_B44 is not set
# CONFIG_BCMGENET is not set
# CONFIG_BNX2 is not set
# CONFIG_CNIC is not set
CONFIG_TIGON3=y
CONFIG_TIGON3_HWMON=y
# CONFIG_BNX2X is not set
# CONFIG_SYSTEMPORT is not set
# CONFIG_BNXT is not set
CONFIG_NET_VENDOR_BROCADE=y
# CONFIG_BNA is not set
CONFIG_NET_VENDOR_CADENCE=y
# CONFIG_MACB is not set
CONFIG_NET_VENDOR_CAVIUM=y
# CONFIG_THUNDER_NIC_PF is not set
# CONFIG_THUNDER_NIC_VF is not set
# CONFIG_THUNDER_NIC_BGX is not set
# CONFIG_THUNDER_NIC_RGX is not set
CONFIG_CAVIUM_PTP=y
# CONFIG_LIQUIDIO is not set
# CONFIG_LIQUIDIO_VF is not set
CONFIG_NET_VENDOR_CHELSIO=y
# CONFIG_CHELSIO_T1 is not set
# CONFIG_CHELSIO_T3 is not set
# CONFIG_CHELSIO_T4 is not set
# CONFIG_CHELSIO_T4VF is not set
CONFIG_NET_VENDOR_CISCO=y
# CONFIG_ENIC is not set
CONFIG_NET_VENDOR_CORTINA=y
# CONFIG_CX_ECAT is not set
# CONFIG_DNET is not set
CONFIG_NET_VENDOR_DEC=y
# CONFIG_NET_TULIP is not set
CONFIG_NET_VENDOR_DLINK=y
# CONFIG_DL2K is not set
# CONFIG_SUNDANCE is not set
CONFIG_NET_VENDOR_EMULEX=y
# CONFIG_BE2NET is not set
CONFIG_NET_VENDOR_EZCHIP=y
CONFIG_NET_VENDOR_GOOGLE=y
# CONFIG_GVE is not set
CONFIG_NET_VENDOR_HUAWEI=y
# CONFIG_HINIC is not set
CONFIG_NET_VENDOR_I825XX=y
CONFIG_NET_VENDOR_INTEL=y
# CONFIG_E100 is not set
CONFIG_E1000=y
CONFIG_E1000E=y
CONFIG_E1000E_HWTS=y
CONFIG_IGB=y
CONFIG_IGB_HWMON=y
# CONFIG_IGBVF is not set
# CONFIG_IXGB is not set
CONFIG_IXGBE=y
CONFIG_IXGBE_HWMON=y
# CONFIG_IXGBE_DCB is not set
CONFIG_IXGBE_IPSEC=y
# CONFIG_IXGBEVF is not set
CONFIG_I40E=y
# CONFIG_I40E_DCB is not set
# CONFIG_I40EVF is not set
# CONFIG_ICE is not set
# CONFIG_FM10K is not set
# CONFIG_IGC is not set
# CONFIG_JME is not set
CONFIG_NET_VENDOR_MARVELL=y
# CONFIG_MVMDIO is not set
CONFIG_SKGE=y
# CONFIG_SKGE_DEBUG is not set
# CONFIG_SKGE_GENESIS is not set
# CONFIG_SKY2 is not set
CONFIG_NET_VENDOR_MELLANOX=y
# CONFIG_MLX4_EN is not set
# CONFIG_MLX5_CORE is not set
# CONFIG_MLXSW_CORE is not set
# CONFIG_MLXFW is not set
CONFIG_NET_VENDOR_MICREL=y
# CONFIG_KS8842 is not set
# CONFIG_KS8851 is not set
# CONFIG_KS8851_MLL is not set
# CONFIG_KSZ884X_PCI is not set
CONFIG_NET_VENDOR_MICROCHIP=y
# CONFIG_ENC28J60 is not set
# CONFIG_ENCX24J600 is not set
# CONFIG_LAN743X is not set
CONFIG_NET_VENDOR_MICROSEMI=y
CONFIG_NET_VENDOR_MYRI=y
# CONFIG_MYRI10GE is not set
# CONFIG_FEALNX is not set
CONFIG_NET_VENDOR_NATSEMI=y
# CONFIG_NATSEMI is not set
# CONFIG_NS83820 is not set
CONFIG_NET_VENDOR_NETERION=y
# CONFIG_S2IO is not set
# CONFIG_VXGE is not set
CONFIG_NET_VENDOR_NETRONOME=y
# CONFIG_NFP is not set
CONFIG_NET_VENDOR_NI=y
# CONFIG_NI_XGE_MANAGEMENT_ENET is not set
CONFIG_NET_VENDOR_8390=y
# CONFIG_NE2K_PCI is not set
CONFIG_NET_VENDOR_NVIDIA=y
# CONFIG_FORCEDETH is not set
CONFIG_NET_VENDOR_OKI=y
# CONFIG_ETHOC is not set
CONFIG_NET_VENDOR_PACKET_ENGINES=y
# CONFIG_HAMACHI is not set
CONFIG_YELLOWFIN=m
CONFIG_NET_VENDOR_PENSANDO=y
# CONFIG_IONIC is not set
CONFIG_NET_VENDOR_QLOGIC=y
# CONFIG_QLA3XXX is not set
# CONFIG_QLCNIC is not set
# CONFIG_NETXEN_NIC is not set
# CONFIG_QED is not set
CONFIG_NET_VENDOR_QUALCOMM=y
# CONFIG_QCOM_EMAC is not set
# CONFIG_RMNET is not set
CONFIG_NET_VENDOR_RDC=y
# CONFIG_R6040 is not set
CONFIG_NET_VENDOR_REALTEK=y
# CONFIG_ATP is not set
CONFIG_8139CP=y
CONFIG_8139TOO=y
CONFIG_8139TOO_PIO=y
# CONFIG_8139TOO_TUNE_TWISTER is not set
# CONFIG_8139TOO_8129 is not set
# CONFIG_8139_OLD_RX_RESET is not set
CONFIG_R8169=y
CONFIG_NET_VENDOR_RENESAS=y
CONFIG_NET_VENDOR_ROCKER=y
# CONFIG_ROCKER is not set
CONFIG_NET_VENDOR_SAMSUNG=y
# CONFIG_SXGBE_ETH is not set
CONFIG_NET_VENDOR_SEEQ=y
CONFIG_NET_VENDOR_SOLARFLARE=y
# CONFIG_SFC is not set
# CONFIG_SFC_FALCON is not set
CONFIG_NET_VENDOR_SILAN=y
# CONFIG_SC92031 is not set
CONFIG_NET_VENDOR_SIS=y
# CONFIG_SIS900 is not set
# CONFIG_SIS190 is not set
CONFIG_NET_VENDOR_SMSC=y
# CONFIG_EPIC100 is not set
# CONFIG_SMSC911X is not set
# CONFIG_SMSC9420 is not set
CONFIG_NET_VENDOR_SOCIONEXT=y
CONFIG_NET_VENDOR_STMICRO=y
# CONFIG_STMMAC_ETH is not set
CONFIG_NET_VENDOR_SUN=y
# CONFIG_HAPPYMEAL is not set
# CONFIG_SUNGEM is not set
# CONFIG_CASSINI is not set
# CONFIG_NIU is not set
CONFIG_NET_VENDOR_SYNOPSYS=y
# CONFIG_DWC_XLGMAC is not set
CONFIG_NET_VENDOR_TEHUTI=y
# CONFIG_TEHUTI is not set
CONFIG_NET_VENDOR_TI=y
# CONFIG_TI_CPSW_PHY_SEL is not set
# CONFIG_TLAN is not set
CONFIG_NET_VENDOR_VIA=y
# CONFIG_VIA_RHINE is not set
# CONFIG_VIA_VELOCITY is not set
CONFIG_NET_VENDOR_WIZNET=y
# CONFIG_WIZNET_W5100 is not set
# CONFIG_WIZNET_W5300 is not set
CONFIG_NET_VENDOR_XILINX=y
# CONFIG_XILINX_AXI_EMAC is not set
# CONFIG_XILINX_LL_TEMAC is not set
# CONFIG_FDDI is not set
# CONFIG_HIPPI is not set
# CONFIG_NET_SB1000 is not set
CONFIG_MDIO_DEVICE=y
CONFIG_MDIO_BUS=y
CONFIG_MDIO_DEVRES=y
# CONFIG_MDIO_BCM_UNIMAC is not set
# CONFIG_MDIO_BITBANG is not set
# CONFIG_MDIO_MSCC_MIIM is not set
# CONFIG_MDIO_MVUSB is not set
# CONFIG_MDIO_THUNDER is not set
# CONFIG_MDIO_XPCS is not set
CONFIG_PHYLIB=y
# CONFIG_LED_TRIGGER_PHY is not set

#
# MII PHY device drivers
#
# CONFIG_ADIN_PHY is not set
# CONFIG_AMD_PHY is not set
# CONFIG_AQUANTIA_PHY is not set
# CONFIG_AX88796B_PHY is not set
# CONFIG_BCM7XXX_PHY is not set
# CONFIG_BCM87XX_PHY is not set
# CONFIG_BROADCOM_PHY is not set
# CONFIG_BCM54140_PHY is not set
# CONFIG_BCM84881_PHY is not set
# CONFIG_CICADA_PHY is not set
# CONFIG_CORTINA_PHY is not set
# CONFIG_DAVICOM_PHY is not set
# CONFIG_DP83822_PHY is not set
# CONFIG_DP83TC811_PHY is not set
# CONFIG_DP83848_PHY is not set
# CONFIG_DP83867_PHY is not set
# CONFIG_DP83869_PHY is not set
# CONFIG_FIXED_PHY is not set
# CONFIG_ICPLUS_PHY is not set
# CONFIG_INTEL_XWAY_PHY is not set
# CONFIG_LSI_ET1011C_PHY is not set
# CONFIG_LXT_PHY is not set
# CONFIG_MARVELL_PHY is not set
# CONFIG_MARVELL_10G_PHY is not set
# CONFIG_MICREL_PHY is not set
# CONFIG_MICROCHIP_PHY is not set
# CONFIG_MICROCHIP_T1_PHY is not set
# CONFIG_MICROSEMI_PHY is not set
# CONFIG_NATIONAL_PHY is not set
# CONFIG_NXP_TJA11XX_PHY is not set
# CONFIG_QSEMI_PHY is not set
CONFIG_REALTEK_PHY=y
# CONFIG_RENESAS_PHY is not set
# CONFIG_ROCKCHIP_PHY is not set
# CONFIG_SMSC_PHY is not set
# CONFIG_STE10XP is not set
# CONFIG_TERANETICS_PHY is not set
# CONFIG_VITESSE_PHY is not set
# CONFIG_XILINX_GMII2RGMII is not set
# CONFIG_MICREL_KS8995MA is not set
# CONFIG_PLIP is not set
# CONFIG_PPP is not set
# CONFIG_SLIP is not set
CONFIG_USB_NET_DRIVERS=y
CONFIG_USB_CATC=y
CONFIG_USB_KAWETH=y
CONFIG_USB_PEGASUS=y
CONFIG_USB_RTL8150=y
# CONFIG_USB_RTL8152 is not set
# CONFIG_USB_LAN78XX is not set
CONFIG_USB_USBNET=y
CONFIG_USB_NET_AX8817X=y
CONFIG_USB_NET_AX88179_178A=y
CONFIG_USB_NET_CDCETHER=y
CONFIG_USB_NET_CDC_EEM=y
CONFIG_USB_NET_CDC_NCM=y
# CONFIG_USB_NET_HUAWEI_CDC_NCM is not set
# CONFIG_USB_NET_CDC_MBIM is not set
CONFIG_USB_NET_DM9601=y
# CONFIG_USB_NET_SR9700 is not set
# CONFIG_USB_NET_SR9800 is not set
CONFIG_USB_NET_SMSC75XX=y
CONFIG_USB_NET_SMSC95XX=y
CONFIG_USB_NET_GL620A=y
CONFIG_USB_NET_NET1080=y
CONFIG_USB_NET_PLUSB=y
CONFIG_USB_NET_MCS7830=y
CONFIG_USB_NET_RNDIS_HOST=y
CONFIG_USB_NET_CDC_SUBSET_ENABLE=y
CONFIG_USB_NET_CDC_SUBSET=y
# CONFIG_USB_ALI_M5632 is not set
# CONFIG_USB_AN2720 is not set
CONFIG_USB_BELKIN=y
CONFIG_USB_ARMLINUX=y
# CONFIG_USB_EPSON2888 is not set
# CONFIG_USB_KC2190 is not set
CONFIG_USB_NET_ZAURUS=y
# CONFIG_USB_NET_CX82310_ETH is not set
# CONFIG_USB_NET_KALMIA is not set
# CONFIG_USB_NET_QMI_WWAN is not set
# CONFIG_USB_HSO is not set
CONFIG_USB_NET_INT51X1=y
CONFIG_USB_IPHETH=y
CONFIG_USB_SIERRA_NET=y
# CONFIG_USB_VL600 is not set
# CONFIG_USB_NET_CH9200 is not set
# CONFIG_USB_NET_AQC111 is not set
CONFIG_WLAN=y
CONFIG_WLAN_VENDOR_ADMTEK=y
# CONFIG_ADM8211 is not set
CONFIG_WLAN_VENDOR_ATH=y
# CONFIG_ATH_DEBUG is not set
# CONFIG_ATH5K is not set
# CONFIG_ATH5K_PCI is not set
# CONFIG_ATH9K is not set
# CONFIG_ATH9K_HTC is not set
# CONFIG_CARL9170 is not set
# CONFIG_ATH6KL is not set
# CONFIG_AR5523 is not set
# CONFIG_WIL6210 is not set
# CONFIG_ATH10K is not set
# CONFIG_WCN36XX is not set
CONFIG_WLAN_VENDOR_ATMEL=y
# CONFIG_ATMEL is not set
# CONFIG_AT76C50X_USB is not set
CONFIG_WLAN_VENDOR_BROADCOM=y
# CONFIG_B43 is not set
# CONFIG_B43LEGACY is not set
# CONFIG_BRCMSMAC is not set
# CONFIG_BRCMFMAC is not set
CONFIG_WLAN_VENDOR_CISCO=y
# CONFIG_AIRO is not set
CONFIG_WLAN_VENDOR_INTEL=y
# CONFIG_IPW2100 is not set
# CONFIG_IPW2200 is not set
# CONFIG_IWL4965 is not set
# CONFIG_IWL3945 is not set
# CONFIG_IWLWIFI is not set
CONFIG_WLAN_VENDOR_INTERSIL=y
# CONFIG_HOSTAP is not set
# CONFIG_HERMES is not set
# CONFIG_P54_COMMON is not set
# CONFIG_PRISM54 is not set
CONFIG_WLAN_VENDOR_MARVELL=y
# CONFIG_LIBERTAS is not set
# CONFIG_LIBERTAS_THINFIRM is not set
# CONFIG_MWIFIEX is not set
# CONFIG_MWL8K is not set
CONFIG_WLAN_VENDOR_MEDIATEK=y
# CONFIG_MT7601U is not set
# CONFIG_MT76x0U is not set
# CONFIG_MT76x0E is not set
# CONFIG_MT76x2E is not set
# CONFIG_MT76x2U is not set
# CONFIG_MT7603E is not set
# CONFIG_MT7615E is not set
# CONFIG_MT7663U is not set
# CONFIG_MT7663S is not set
# CONFIG_MT7915E is not set
CONFIG_WLAN_VENDOR_MICROCHIP=y
# CONFIG_WILC1000_SDIO is not set
# CONFIG_WILC1000_SPI is not set
CONFIG_WLAN_VENDOR_RALINK=y
# CONFIG_RT2X00 is not set
CONFIG_WLAN_VENDOR_REALTEK=y
# CONFIG_RTL8180 is not set
# CONFIG_RTL8187 is not set
CONFIG_RTL_CARDS=m
# CONFIG_RTL8192CE is not set
# CONFIG_RTL8192SE is not set
# CONFIG_RTL8192DE is not set
# CONFIG_RTL8723AE is not set
# CONFIG_RTL8723BE is not set
# CONFIG_RTL8188EE is not set
# CONFIG_RTL8192EE is not set
# CONFIG_RTL8821AE is not set
# CONFIG_RTL8192CU is not set
# CONFIG_RTL8XXXU is not set
# CONFIG_RTW88 is not set
CONFIG_WLAN_VENDOR_RSI=y
# CONFIG_RSI_91X is not set
CONFIG_WLAN_VENDOR_ST=y
# CONFIG_CW1200 is not set
CONFIG_WLAN_VENDOR_TI=y
# CONFIG_WL1251 is not set
# CONFIG_WL12XX is not set
# CONFIG_WL18XX is not set
# CONFIG_WLCORE is not set
CONFIG_WLAN_VENDOR_ZYDAS=y
# CONFIG_USB_ZD1201 is not set
# CONFIG_ZD1211RW is not set
CONFIG_WLAN_VENDOR_QUANTENNA=y
# CONFIG_QTNFMAC_PCIE is not set
CONFIG_MAC80211_HWSIM=m
# CONFIG_USB_NET_RNDIS_WLAN is not set
# CONFIG_VIRT_WIFI is not set

#
# Enable WiMAX (Networking options) to see the WiMAX drivers
#
# CONFIG_WAN is not set
CONFIG_IEEE802154_DRIVERS=m
# CONFIG_IEEE802154_FAKELB is not set
# CONFIG_IEEE802154_AT86RF230 is not set
# CONFIG_IEEE802154_MRF24J40 is not set
# CONFIG_IEEE802154_CC2520 is not set
# CONFIG_IEEE802154_ATUSB is not set
# CONFIG_IEEE802154_ADF7242 is not set
# CONFIG_IEEE802154_CA8210 is not set
# CONFIG_IEEE802154_MCR20A is not set
# CONFIG_IEEE802154_HWSIM is not set
CONFIG_XEN_NETDEV_FRONTEND=y
# CONFIG_VMXNET3 is not set
# CONFIG_FUJITSU_ES is not set
# CONFIG_HYPERV_NET is not set
CONFIG_NETDEVSIM=m
CONFIG_NET_FAILOVER=m
# CONFIG_ISDN is not set
CONFIG_NVM=y
# CONFIG_NVM_PBLK is not set

#
# Input device support
#
CONFIG_INPUT=y
CONFIG_INPUT_LEDS=y
CONFIG_INPUT_FF_MEMLESS=m
CONFIG_INPUT_POLLDEV=m
CONFIG_INPUT_SPARSEKMAP=m
# CONFIG_INPUT_MATRIXKMAP is not set

#
# Userland interfaces
#
CONFIG_INPUT_MOUSEDEV=y
# CONFIG_INPUT_MOUSEDEV_PSAUX is not set
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
CONFIG_INPUT_JOYDEV=m
CONFIG_INPUT_EVDEV=y
# CONFIG_INPUT_EVBUG is not set

#
# Input Device Drivers
#
CONFIG_INPUT_KEYBOARD=y
# CONFIG_KEYBOARD_ADP5588 is not set
# CONFIG_KEYBOARD_ADP5589 is not set
# CONFIG_KEYBOARD_APPLESPI is not set
CONFIG_KEYBOARD_ATKBD=y
# CONFIG_KEYBOARD_QT1050 is not set
# CONFIG_KEYBOARD_QT1070 is not set
# CONFIG_KEYBOARD_QT2160 is not set
# CONFIG_KEYBOARD_DLINK_DIR685 is not set
# CONFIG_KEYBOARD_LKKBD is not set
# CONFIG_KEYBOARD_GPIO is not set
# CONFIG_KEYBOARD_GPIO_POLLED is not set
# CONFIG_KEYBOARD_TCA6416 is not set
# CONFIG_KEYBOARD_TCA8418 is not set
# CONFIG_KEYBOARD_MATRIX is not set
# CONFIG_KEYBOARD_LM8323 is not set
# CONFIG_KEYBOARD_LM8333 is not set
# CONFIG_KEYBOARD_MAX7359 is not set
# CONFIG_KEYBOARD_MCS is not set
# CONFIG_KEYBOARD_MPR121 is not set
# CONFIG_KEYBOARD_NEWTON is not set
# CONFIG_KEYBOARD_OPENCORES is not set
# CONFIG_KEYBOARD_SAMSUNG is not set
# CONFIG_KEYBOARD_STOWAWAY is not set
# CONFIG_KEYBOARD_SUNKBD is not set
# CONFIG_KEYBOARD_TM2_TOUCHKEY is not set
# CONFIG_KEYBOARD_XTKBD is not set
CONFIG_INPUT_MOUSE=y
CONFIG_MOUSE_PS2=y
CONFIG_MOUSE_PS2_ALPS=y
CONFIG_MOUSE_PS2_BYD=y
CONFIG_MOUSE_PS2_LOGIPS2PP=y
CONFIG_MOUSE_PS2_SYNAPTICS=y
CONFIG_MOUSE_PS2_SYNAPTICS_SMBUS=y
CONFIG_MOUSE_PS2_CYPRESS=y
CONFIG_MOUSE_PS2_LIFEBOOK=y
CONFIG_MOUSE_PS2_TRACKPOINT=y
CONFIG_MOUSE_PS2_ELANTECH=y
CONFIG_MOUSE_PS2_ELANTECH_SMBUS=y
CONFIG_MOUSE_PS2_SENTELIC=y
# CONFIG_MOUSE_PS2_TOUCHKIT is not set
CONFIG_MOUSE_PS2_FOCALTECH=y
CONFIG_MOUSE_PS2_VMMOUSE=y
CONFIG_MOUSE_PS2_SMBUS=y
CONFIG_MOUSE_SERIAL=m
# CONFIG_MOUSE_APPLETOUCH is not set
# CONFIG_MOUSE_BCM5974 is not set
CONFIG_MOUSE_CYAPA=m
CONFIG_MOUSE_ELAN_I2C=m
CONFIG_MOUSE_ELAN_I2C_I2C=y
CONFIG_MOUSE_ELAN_I2C_SMBUS=y
CONFIG_MOUSE_VSXXXAA=m
# CONFIG_MOUSE_GPIO is not set
CONFIG_MOUSE_SYNAPTICS_I2C=m
# CONFIG_MOUSE_SYNAPTICS_USB is not set
# CONFIG_INPUT_JOYSTICK is not set
# CONFIG_INPUT_TABLET is not set
# CONFIG_INPUT_TOUCHSCREEN is not set
# CONFIG_INPUT_MISC is not set
CONFIG_RMI4_CORE=m
CONFIG_RMI4_I2C=m
CONFIG_RMI4_SPI=m
CONFIG_RMI4_SMB=m
CONFIG_RMI4_F03=y
CONFIG_RMI4_F03_SERIO=m
CONFIG_RMI4_2D_SENSOR=y
CONFIG_RMI4_F11=y
CONFIG_RMI4_F12=y
CONFIG_RMI4_F30=y
CONFIG_RMI4_F34=y
# CONFIG_RMI4_F54 is not set
CONFIG_RMI4_F55=y

#
# Hardware I/O ports
#
CONFIG_SERIO=y
CONFIG_ARCH_MIGHT_HAVE_PC_SERIO=y
CONFIG_SERIO_I8042=y
CONFIG_SERIO_SERPORT=y
# CONFIG_SERIO_CT82C710 is not set
# CONFIG_SERIO_PARKBD is not set
# CONFIG_SERIO_PCIPS2 is not set
CONFIG_SERIO_LIBPS2=y
CONFIG_SERIO_RAW=m
CONFIG_SERIO_ALTERA_PS2=m
# CONFIG_SERIO_PS2MULT is not set
CONFIG_SERIO_ARC_PS2=m
CONFIG_HYPERV_KEYBOARD=m
# CONFIG_SERIO_GPIO_PS2 is not set
# CONFIG_USERIO is not set
# CONFIG_GAMEPORT is not set
# end of Hardware I/O ports
# end of Input device support

#
# Character devices
#
CONFIG_TTY=y
CONFIG_VT=y
CONFIG_CONSOLE_TRANSLATIONS=y
CONFIG_VT_CONSOLE=y
CONFIG_VT_CONSOLE_SLEEP=y
CONFIG_HW_CONSOLE=y
CONFIG_VT_HW_CONSOLE_BINDING=y
CONFIG_UNIX98_PTYS=y
# CONFIG_LEGACY_PTYS is not set
CONFIG_LDISC_AUTOLOAD=y

#
# Serial drivers
#
CONFIG_SERIAL_EARLYCON=y
CONFIG_SERIAL_8250=y
# CONFIG_SERIAL_8250_DEPRECATED_OPTIONS is not set
CONFIG_SERIAL_8250_PNP=y
# CONFIG_SERIAL_8250_16550A_VARIANTS is not set
# CONFIG_SERIAL_8250_FINTEK is not set
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_SERIAL_8250_DMA=y
CONFIG_SERIAL_8250_PCI=y
CONFIG_SERIAL_8250_EXAR=y
CONFIG_SERIAL_8250_NR_UARTS=64
CONFIG_SERIAL_8250_RUNTIME_UARTS=4
CONFIG_SERIAL_8250_EXTENDED=y
CONFIG_SERIAL_8250_MANY_PORTS=y
CONFIG_SERIAL_8250_SHARE_IRQ=y
# CONFIG_SERIAL_8250_DETECT_IRQ is not set
CONFIG_SERIAL_8250_RSA=y
CONFIG_SERIAL_8250_DWLIB=y
CONFIG_SERIAL_8250_DW=y
# CONFIG_SERIAL_8250_RT288X is not set
CONFIG_SERIAL_8250_LPSS=y
CONFIG_SERIAL_8250_MID=y

#
# Non-8250 serial port support
#
# CONFIG_SERIAL_MAX3100 is not set
# CONFIG_SERIAL_MAX310X is not set
# CONFIG_SERIAL_UARTLITE is not set
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
CONFIG_SERIAL_JSM=m
# CONFIG_SERIAL_LANTIQ is not set
# CONFIG_SERIAL_SCCNXP is not set
# CONFIG_SERIAL_SC16IS7XX is not set
# CONFIG_SERIAL_ALTERA_JTAGUART is not set
# CONFIG_SERIAL_ALTERA_UART is not set
# CONFIG_SERIAL_IFX6X60 is not set
CONFIG_SERIAL_ARC=m
CONFIG_SERIAL_ARC_NR_PORTS=1
# CONFIG_SERIAL_RP2 is not set
# CONFIG_SERIAL_FSL_LPUART is not set
# CONFIG_SERIAL_FSL_LINFLEXUART is not set
# CONFIG_SERIAL_SPRD is not set
# end of Serial drivers

CONFIG_SERIAL_MCTRL_GPIO=y
CONFIG_SERIAL_NONSTANDARD=y
# CONFIG_ROCKETPORT is not set
CONFIG_CYCLADES=m
# CONFIG_CYZ_INTR is not set
# CONFIG_MOXA_INTELLIO is not set
# CONFIG_MOXA_SMARTIO is not set
CONFIG_SYNCLINK=m
CONFIG_SYNCLINKMP=m
CONFIG_SYNCLINK_GT=m
# CONFIG_ISI is not set
CONFIG_N_HDLC=m
CONFIG_N_GSM=m
CONFIG_NOZOMI=m
# CONFIG_NULL_TTY is not set
# CONFIG_TRACE_SINK is not set
CONFIG_HVC_DRIVER=y
CONFIG_HVC_IRQ=y
CONFIG_HVC_XEN=y
CONFIG_HVC_XEN_FRONTEND=y
# CONFIG_SERIAL_DEV_BUS is not set
CONFIG_PRINTER=m
# CONFIG_LP_CONSOLE is not set
CONFIG_PPDEV=m
CONFIG_VIRTIO_CONSOLE=y
CONFIG_IPMI_HANDLER=m
CONFIG_IPMI_DMI_DECODE=y
CONFIG_IPMI_PLAT_DATA=y
CONFIG_IPMI_PANIC_EVENT=y
CONFIG_IPMI_PANIC_STRING=y
CONFIG_IPMI_DEVICE_INTERFACE=m
CONFIG_IPMI_SI=m
CONFIG_IPMI_SSIF=m
CONFIG_IPMI_WATCHDOG=m
CONFIG_IPMI_POWEROFF=m
CONFIG_HW_RANDOM=y
CONFIG_HW_RANDOM_TIMERIOMEM=m
CONFIG_HW_RANDOM_INTEL=m
CONFIG_HW_RANDOM_AMD=m
# CONFIG_HW_RANDOM_BA431 is not set
CONFIG_HW_RANDOM_VIA=m
CONFIG_HW_RANDOM_VIRTIO=y
# CONFIG_APPLICOM is not set
# CONFIG_MWAVE is not set
CONFIG_DEVMEM=y
# CONFIG_DEVKMEM is not set
CONFIG_NVRAM=y
CONFIG_RAW_DRIVER=y
CONFIG_MAX_RAW_DEVS=8192
CONFIG_DEVPORT=y
CONFIG_HPET=y
CONFIG_HPET_MMAP=y
# CONFIG_HPET_MMAP_DEFAULT is not set
CONFIG_HANGCHECK_TIMER=m
CONFIG_UV_MMTIMER=m
CONFIG_TCG_TPM=y
CONFIG_HW_RANDOM_TPM=y
CONFIG_TCG_TIS_CORE=y
CONFIG_TCG_TIS=y
# CONFIG_TCG_TIS_SPI is not set
CONFIG_TCG_TIS_I2C_ATMEL=m
CONFIG_TCG_TIS_I2C_INFINEON=m
CONFIG_TCG_TIS_I2C_NUVOTON=m
CONFIG_TCG_NSC=m
CONFIG_TCG_ATMEL=m
CONFIG_TCG_INFINEON=m
# CONFIG_TCG_XEN is not set
CONFIG_TCG_CRB=y
# CONFIG_TCG_VTPM_PROXY is not set
CONFIG_TCG_TIS_ST33ZP24=m
CONFIG_TCG_TIS_ST33ZP24_I2C=m
# CONFIG_TCG_TIS_ST33ZP24_SPI is not set
CONFIG_TELCLOCK=m
# CONFIG_XILLYBUS is not set
# end of Character devices

# CONFIG_RANDOM_TRUST_CPU is not set
# CONFIG_RANDOM_TRUST_BOOTLOADER is not set

#
# I2C support
#
CONFIG_I2C=y
CONFIG_ACPI_I2C_OPREGION=y
CONFIG_I2C_BOARDINFO=y
CONFIG_I2C_COMPAT=y
CONFIG_I2C_CHARDEV=m
CONFIG_I2C_MUX=m

#
# Multiplexer I2C Chip support
#
# CONFIG_I2C_MUX_GPIO is not set
# CONFIG_I2C_MUX_LTC4306 is not set
# CONFIG_I2C_MUX_PCA9541 is not set
# CONFIG_I2C_MUX_PCA954x is not set
# CONFIG_I2C_MUX_REG is not set
CONFIG_I2C_MUX_MLXCPLD=m
# end of Multiplexer I2C Chip support

CONFIG_I2C_HELPER_AUTO=y
CONFIG_I2C_SMBUS=y
CONFIG_I2C_ALGOBIT=y
CONFIG_I2C_ALGOPCA=m

#
# I2C Hardware Bus support
#

#
# PC SMBus host controller drivers
#
# CONFIG_I2C_ALI1535 is not set
# CONFIG_I2C_ALI1563 is not set
# CONFIG_I2C_ALI15X3 is not set
CONFIG_I2C_AMD756=m
CONFIG_I2C_AMD756_S4882=m
CONFIG_I2C_AMD8111=m
# CONFIG_I2C_AMD_MP2 is not set
CONFIG_I2C_I801=y
CONFIG_I2C_ISCH=m
CONFIG_I2C_ISMT=m
CONFIG_I2C_PIIX4=m
CONFIG_I2C_NFORCE2=m
CONFIG_I2C_NFORCE2_S4985=m
# CONFIG_I2C_NVIDIA_GPU is not set
# CONFIG_I2C_SIS5595 is not set
# CONFIG_I2C_SIS630 is not set
CONFIG_I2C_SIS96X=m
CONFIG_I2C_VIA=m
CONFIG_I2C_VIAPRO=m

#
# ACPI drivers
#
CONFIG_I2C_SCMI=m

#
# I2C system bus drivers (mostly embedded / system-on-chip)
#
# CONFIG_I2C_CBUS_GPIO is not set
CONFIG_I2C_DESIGNWARE_CORE=m
# CONFIG_I2C_DESIGNWARE_SLAVE is not set
CONFIG_I2C_DESIGNWARE_PLATFORM=m
CONFIG_I2C_DESIGNWARE_BAYTRAIL=y
# CONFIG_I2C_DESIGNWARE_PCI is not set
# CONFIG_I2C_EMEV2 is not set
# CONFIG_I2C_GPIO is not set
# CONFIG_I2C_OCORES is not set
CONFIG_I2C_PCA_PLATFORM=m
CONFIG_I2C_SIMTEC=m
# CONFIG_I2C_XILINX is not set

#
# External I2C/SMBus adapter drivers
#
# CONFIG_I2C_DIOLAN_U2C is not set
CONFIG_I2C_PARPORT=m
# CONFIG_I2C_ROBOTFUZZ_OSIF is not set
# CONFIG_I2C_TAOS_EVM is not set
# CONFIG_I2C_TINY_USB is not set

#
# Other I2C/SMBus bus drivers
#
CONFIG_I2C_MLXCPLD=m
# end of I2C Hardware Bus support

CONFIG_I2C_STUB=m
# CONFIG_I2C_SLAVE is not set
# CONFIG_I2C_DEBUG_CORE is not set
# CONFIG_I2C_DEBUG_ALGO is not set
# CONFIG_I2C_DEBUG_BUS is not set
# end of I2C support

# CONFIG_I3C is not set
CONFIG_SPI=y
# CONFIG_SPI_DEBUG is not set
CONFIG_SPI_MASTER=y
# CONFIG_SPI_MEM is not set

#
# SPI Master Controller Drivers
#
# CONFIG_SPI_ALTERA is not set
# CONFIG_SPI_AXI_SPI_ENGINE is not set
# CONFIG_SPI_BITBANG is not set
# CONFIG_SPI_BUTTERFLY is not set
# CONFIG_SPI_CADENCE is not set
# CONFIG_SPI_DESIGNWARE is not set
# CONFIG_SPI_NXP_FLEXSPI is not set
# CONFIG_SPI_GPIO is not set
# CONFIG_SPI_LM70_LLP is not set
# CONFIG_SPI_LANTIQ_SSC is not set
# CONFIG_SPI_OC_TINY is not set
# CONFIG_SPI_PXA2XX is not set
# CONFIG_SPI_ROCKCHIP is not set
# CONFIG_SPI_SC18IS602 is not set
# CONFIG_SPI_SIFIVE is not set
# CONFIG_SPI_MXIC is not set
# CONFIG_SPI_XCOMM is not set
# CONFIG_SPI_XILINX is not set
# CONFIG_SPI_ZYNQMP_GQSPI is not set
# CONFIG_SPI_AMD is not set

#
# SPI Multiplexer support
#
# CONFIG_SPI_MUX is not set

#
# SPI Protocol Masters
#
# CONFIG_SPI_SPIDEV is not set
# CONFIG_SPI_LOOPBACK_TEST is not set
# CONFIG_SPI_TLE62X0 is not set
# CONFIG_SPI_SLAVE is not set
CONFIG_SPI_DYNAMIC=y
# CONFIG_SPMI is not set
# CONFIG_HSI is not set
CONFIG_PPS=y
# CONFIG_PPS_DEBUG is not set

#
# PPS clients support
#
# CONFIG_PPS_CLIENT_KTIMER is not set
CONFIG_PPS_CLIENT_LDISC=m
CONFIG_PPS_CLIENT_PARPORT=m
CONFIG_PPS_CLIENT_GPIO=m

#
# PPS generators support
#

#
# PTP clock support
#
CONFIG_PTP_1588_CLOCK=y
# CONFIG_DP83640_PHY is not set
# CONFIG_PTP_1588_CLOCK_INES is not set
CONFIG_PTP_1588_CLOCK_KVM=m
# CONFIG_PTP_1588_CLOCK_IDT82P33 is not set
# CONFIG_PTP_1588_CLOCK_IDTCM is not set
# CONFIG_PTP_1588_CLOCK_VMW is not set
# end of PTP clock support

CONFIG_PINCTRL=y
CONFIG_PINMUX=y
CONFIG_PINCONF=y
CONFIG_GENERIC_PINCONF=y
# CONFIG_DEBUG_PINCTRL is not set
CONFIG_PINCTRL_AMD=m
# CONFIG_PINCTRL_MCP23S08 is not set
# CONFIG_PINCTRL_SX150X is not set
CONFIG_PINCTRL_BAYTRAIL=y
# CONFIG_PINCTRL_CHERRYVIEW is not set
# CONFIG_PINCTRL_LYNXPOINT is not set
CONFIG_PINCTRL_INTEL=m
CONFIG_PINCTRL_BROXTON=m
CONFIG_PINCTRL_CANNONLAKE=m
CONFIG_PINCTRL_CEDARFORK=m
CONFIG_PINCTRL_DENVERTON=m
# CONFIG_PINCTRL_EMMITSBURG is not set
CONFIG_PINCTRL_GEMINILAKE=m
# CONFIG_PINCTRL_ICELAKE is not set
# CONFIG_PINCTRL_JASPERLAKE is not set
CONFIG_PINCTRL_LEWISBURG=m
CONFIG_PINCTRL_SUNRISEPOINT=m
# CONFIG_PINCTRL_TIGERLAKE is not set
CONFIG_GPIOLIB=y
CONFIG_GPIOLIB_FASTPATH_LIMIT=512
CONFIG_GPIO_ACPI=y
CONFIG_GPIOLIB_IRQCHIP=y
# CONFIG_DEBUG_GPIO is not set
CONFIG_GPIO_SYSFS=y
CONFIG_GPIO_GENERIC=m

#
# Memory mapped GPIO drivers
#
CONFIG_GPIO_AMDPT=m
# CONFIG_GPIO_DWAPB is not set
# CONFIG_GPIO_EXAR is not set
# CONFIG_GPIO_GENERIC_PLATFORM is not set
CONFIG_GPIO_ICH=m
# CONFIG_GPIO_MB86S7X is not set
# CONFIG_GPIO_VX855 is not set
# CONFIG_GPIO_XILINX is not set
# CONFIG_GPIO_AMD_FCH is not set
# end of Memory mapped GPIO drivers

#
# Port-mapped I/O GPIO drivers
#
# CONFIG_GPIO_F7188X is not set
# CONFIG_GPIO_IT87 is not set
# CONFIG_GPIO_SCH is not set
# CONFIG_GPIO_SCH311X is not set
# CONFIG_GPIO_WINBOND is not set
# CONFIG_GPIO_WS16C48 is not set
# end of Port-mapped I/O GPIO drivers

#
# I2C GPIO expanders
#
# CONFIG_GPIO_ADP5588 is not set
# CONFIG_GPIO_MAX7300 is not set
# CONFIG_GPIO_MAX732X is not set
# CONFIG_GPIO_PCA953X is not set
# CONFIG_GPIO_PCA9570 is not set
# CONFIG_GPIO_PCF857X is not set
# CONFIG_GPIO_TPIC2810 is not set
# end of I2C GPIO expanders

#
# MFD GPIO expanders
#
# end of MFD GPIO expanders

#
# PCI GPIO expanders
#
# CONFIG_GPIO_AMD8111 is not set
# CONFIG_GPIO_BT8XX is not set
# CONFIG_GPIO_ML_IOH is not set
# CONFIG_GPIO_PCI_IDIO_16 is not set
# CONFIG_GPIO_PCIE_IDIO_24 is not set
# CONFIG_GPIO_RDC321X is not set
# end of PCI GPIO expanders

#
# SPI GPIO expanders
#
# CONFIG_GPIO_MAX3191X is not set
# CONFIG_GPIO_MAX7301 is not set
# CONFIG_GPIO_MC33880 is not set
# CONFIG_GPIO_PISOSR is not set
# CONFIG_GPIO_XRA1403 is not set
# end of SPI GPIO expanders

#
# USB GPIO expanders
#
# end of USB GPIO expanders

# CONFIG_GPIO_AGGREGATOR is not set
# CONFIG_GPIO_MOCKUP is not set
# CONFIG_W1 is not set
# CONFIG_POWER_AVS is not set
CONFIG_POWER_RESET=y
# CONFIG_POWER_RESET_RESTART is not set
CONFIG_POWER_SUPPLY=y
# CONFIG_POWER_SUPPLY_DEBUG is not set
CONFIG_POWER_SUPPLY_HWMON=y
# CONFIG_PDA_POWER is not set
# CONFIG_TEST_POWER is not set
# CONFIG_CHARGER_ADP5061 is not set
# CONFIG_BATTERY_CW2015 is not set
# CONFIG_BATTERY_DS2780 is not set
# CONFIG_BATTERY_DS2781 is not set
# CONFIG_BATTERY_DS2782 is not set
# CONFIG_BATTERY_SBS is not set
# CONFIG_CHARGER_SBS is not set
# CONFIG_MANAGER_SBS is not set
# CONFIG_BATTERY_BQ27XXX is not set
# CONFIG_BATTERY_MAX17040 is not set
# CONFIG_BATTERY_MAX17042 is not set
# CONFIG_CHARGER_MAX8903 is not set
# CONFIG_CHARGER_LP8727 is not set
# CONFIG_CHARGER_GPIO is not set
# CONFIG_CHARGER_LT3651 is not set
# CONFIG_CHARGER_BQ2415X is not set
# CONFIG_CHARGER_BQ24257 is not set
# CONFIG_CHARGER_BQ24735 is not set
# CONFIG_CHARGER_BQ2515X is not set
# CONFIG_CHARGER_BQ25890 is not set
CONFIG_CHARGER_SMB347=m
# CONFIG_BATTERY_GAUGE_LTC2941 is not set
# CONFIG_CHARGER_RT9455 is not set
# CONFIG_CHARGER_BD99954 is not set
CONFIG_HWMON=y
CONFIG_HWMON_VID=m
# CONFIG_HWMON_DEBUG_CHIP is not set

#
# Native drivers
#
CONFIG_SENSORS_ABITUGURU=m
CONFIG_SENSORS_ABITUGURU3=m
# CONFIG_SENSORS_AD7314 is not set
CONFIG_SENSORS_AD7414=m
CONFIG_SENSORS_AD7418=m
CONFIG_SENSORS_ADM1021=m
CONFIG_SENSORS_ADM1025=m
CONFIG_SENSORS_ADM1026=m
CONFIG_SENSORS_ADM1029=m
CONFIG_SENSORS_ADM1031=m
# CONFIG_SENSORS_ADM1177 is not set
CONFIG_SENSORS_ADM9240=m
CONFIG_SENSORS_ADT7X10=m
# CONFIG_SENSORS_ADT7310 is not set
CONFIG_SENSORS_ADT7410=m
CONFIG_SENSORS_ADT7411=m
CONFIG_SENSORS_ADT7462=m
CONFIG_SENSORS_ADT7470=m
CONFIG_SENSORS_ADT7475=m
# CONFIG_SENSORS_AS370 is not set
CONFIG_SENSORS_ASC7621=m
# CONFIG_SENSORS_AXI_FAN_CONTROL is not set
CONFIG_SENSORS_K8TEMP=m
CONFIG_SENSORS_K10TEMP=m
CONFIG_SENSORS_FAM15H_POWER=m
# CONFIG_SENSORS_AMD_ENERGY is not set
CONFIG_SENSORS_APPLESMC=m
CONFIG_SENSORS_ASB100=m
# CONFIG_SENSORS_ASPEED is not set
CONFIG_SENSORS_ATXP1=m
# CONFIG_SENSORS_CORSAIR_CPRO is not set
# CONFIG_SENSORS_DRIVETEMP is not set
CONFIG_SENSORS_DS620=m
CONFIG_SENSORS_DS1621=m
CONFIG_SENSORS_DELL_SMM=m
CONFIG_SENSORS_I5K_AMB=m
CONFIG_SENSORS_F71805F=m
CONFIG_SENSORS_F71882FG=m
CONFIG_SENSORS_F75375S=m
CONFIG_SENSORS_FSCHMD=m
# CONFIG_SENSORS_FTSTEUTATES is not set
CONFIG_SENSORS_GL518SM=m
CONFIG_SENSORS_GL520SM=m
CONFIG_SENSORS_G760A=m
# CONFIG_SENSORS_G762 is not set
# CONFIG_SENSORS_HIH6130 is not set
CONFIG_SENSORS_IBMAEM=m
CONFIG_SENSORS_IBMPEX=m
CONFIG_SENSORS_I5500=m
CONFIG_SENSORS_CORETEMP=m
CONFIG_SENSORS_IT87=m
CONFIG_SENSORS_JC42=m
# CONFIG_SENSORS_POWR1220 is not set
CONFIG_SENSORS_LINEAGE=m
# CONFIG_SENSORS_LTC2945 is not set
# CONFIG_SENSORS_LTC2947_I2C is not set
# CONFIG_SENSORS_LTC2947_SPI is not set
# CONFIG_SENSORS_LTC2990 is not set
CONFIG_SENSORS_LTC4151=m
CONFIG_SENSORS_LTC4215=m
# CONFIG_SENSORS_LTC4222 is not set
CONFIG_SENSORS_LTC4245=m
# CONFIG_SENSORS_LTC4260 is not set
CONFIG_SENSORS_LTC4261=m
# CONFIG_SENSORS_MAX1111 is not set
CONFIG_SENSORS_MAX16065=m
CONFIG_SENSORS_MAX1619=m
CONFIG_SENSORS_MAX1668=m
CONFIG_SENSORS_MAX197=m
# CONFIG_SENSORS_MAX31722 is not set
# CONFIG_SENSORS_MAX31730 is not set
# CONFIG_SENSORS_MAX6621 is not set
CONFIG_SENSORS_MAX6639=m
CONFIG_SENSORS_MAX6642=m
CONFIG_SENSORS_MAX6650=m
CONFIG_SENSORS_MAX6697=m
# CONFIG_SENSORS_MAX31790 is not set
CONFIG_SENSORS_MCP3021=m
# CONFIG_SENSORS_MLXREG_FAN is not set
# CONFIG_SENSORS_TC654 is not set
# CONFIG_SENSORS_ADCXX is not set
CONFIG_SENSORS_LM63=m
# CONFIG_SENSORS_LM70 is not set
CONFIG_SENSORS_LM73=m
CONFIG_SENSORS_LM75=m
CONFIG_SENSORS_LM77=m
CONFIG_SENSORS_LM78=m
CONFIG_SENSORS_LM80=m
CONFIG_SENSORS_LM83=m
CONFIG_SENSORS_LM85=m
CONFIG_SENSORS_LM87=m
CONFIG_SENSORS_LM90=m
CONFIG_SENSORS_LM92=m
CONFIG_SENSORS_LM93=m
CONFIG_SENSORS_LM95234=m
CONFIG_SENSORS_LM95241=m
CONFIG_SENSORS_LM95245=m
CONFIG_SENSORS_PC87360=m
CONFIG_SENSORS_PC87427=m
CONFIG_SENSORS_NTC_THERMISTOR=m
# CONFIG_SENSORS_NCT6683 is not set
CONFIG_SENSORS_NCT6775=m
# CONFIG_SENSORS_NCT7802 is not set
# CONFIG_SENSORS_NCT7904 is not set
# CONFIG_SENSORS_NPCM7XX is not set
CONFIG_SENSORS_PCF8591=m
CONFIG_PMBUS=m
CONFIG_SENSORS_PMBUS=m
CONFIG_SENSORS_ADM1275=m
# CONFIG_SENSORS_BEL_PFE is not set
# CONFIG_SENSORS_IBM_CFFPS is not set
# CONFIG_SENSORS_INSPUR_IPSPS is not set
# CONFIG_SENSORS_IR35221 is not set
# CONFIG_SENSORS_IR38064 is not set
# CONFIG_SENSORS_IRPS5401 is not set
# CONFIG_SENSORS_ISL68137 is not set
CONFIG_SENSORS_LM25066=m
CONFIG_SENSORS_LTC2978=m
# CONFIG_SENSORS_LTC3815 is not set
CONFIG_SENSORS_MAX16064=m
# CONFIG_SENSORS_MAX16601 is not set
# CONFIG_SENSORS_MAX20730 is not set
# CONFIG_SENSORS_MAX20751 is not set
# CONFIG_SENSORS_MAX31785 is not set
CONFIG_SENSORS_MAX34440=m
CONFIG_SENSORS_MAX8688=m
# CONFIG_SENSORS_PXE1610 is not set
# CONFIG_SENSORS_TPS40422 is not set
# CONFIG_SENSORS_TPS53679 is not set
CONFIG_SENSORS_UCD9000=m
CONFIG_SENSORS_UCD9200=m
# CONFIG_SENSORS_XDPE122 is not set
CONFIG_SENSORS_ZL6100=m
CONFIG_SENSORS_SHT15=m
CONFIG_SENSORS_SHT21=m
# CONFIG_SENSORS_SHT3x is not set
# CONFIG_SENSORS_SHTC1 is not set
CONFIG_SENSORS_SIS5595=m
CONFIG_SENSORS_DME1737=m
CONFIG_SENSORS_EMC1403=m
# CONFIG_SENSORS_EMC2103 is not set
CONFIG_SENSORS_EMC6W201=m
CONFIG_SENSORS_SMSC47M1=m
CONFIG_SENSORS_SMSC47M192=m
CONFIG_SENSORS_SMSC47B397=m
CONFIG_SENSORS_SCH56XX_COMMON=m
CONFIG_SENSORS_SCH5627=m
CONFIG_SENSORS_SCH5636=m
# CONFIG_SENSORS_STTS751 is not set
# CONFIG_SENSORS_SMM665 is not set
# CONFIG_SENSORS_ADC128D818 is not set
CONFIG_SENSORS_ADS7828=m
# CONFIG_SENSORS_ADS7871 is not set
CONFIG_SENSORS_AMC6821=m
CONFIG_SENSORS_INA209=m
CONFIG_SENSORS_INA2XX=m
# CONFIG_SENSORS_INA3221 is not set
# CONFIG_SENSORS_TC74 is not set
CONFIG_SENSORS_THMC50=m
CONFIG_SENSORS_TMP102=m
# CONFIG_SENSORS_TMP103 is not set
# CONFIG_SENSORS_TMP108 is not set
CONFIG_SENSORS_TMP401=m
CONFIG_SENSORS_TMP421=m
# CONFIG_SENSORS_TMP513 is not set
CONFIG_SENSORS_VIA_CPUTEMP=m
CONFIG_SENSORS_VIA686A=m
CONFIG_SENSORS_VT1211=m
CONFIG_SENSORS_VT8231=m
# CONFIG_SENSORS_W83773G is not set
CONFIG_SENSORS_W83781D=m
CONFIG_SENSORS_W83791D=m
CONFIG_SENSORS_W83792D=m
CONFIG_SENSORS_W83793=m
CONFIG_SENSORS_W83795=m
# CONFIG_SENSORS_W83795_FANCTRL is not set
CONFIG_SENSORS_W83L785TS=m
CONFIG_SENSORS_W83L786NG=m
CONFIG_SENSORS_W83627HF=m
CONFIG_SENSORS_W83627EHF=m
# CONFIG_SENSORS_XGENE is not set

#
# ACPI drivers
#
CONFIG_SENSORS_ACPI_POWER=m
CONFIG_SENSORS_ATK0110=m
CONFIG_THERMAL=y
# CONFIG_THERMAL_NETLINK is not set
# CONFIG_THERMAL_STATISTICS is not set
CONFIG_THERMAL_EMERGENCY_POWEROFF_DELAY_MS=0
CONFIG_THERMAL_HWMON=y
CONFIG_THERMAL_WRITABLE_TRIPS=y
CONFIG_THERMAL_DEFAULT_GOV_STEP_WISE=y
# CONFIG_THERMAL_DEFAULT_GOV_FAIR_SHARE is not set
# CONFIG_THERMAL_DEFAULT_GOV_USER_SPACE is not set
CONFIG_THERMAL_GOV_FAIR_SHARE=y
CONFIG_THERMAL_GOV_STEP_WISE=y
CONFIG_THERMAL_GOV_BANG_BANG=y
CONFIG_THERMAL_GOV_USER_SPACE=y
# CONFIG_THERMAL_EMULATION is not set

#
# Intel thermal drivers
#
CONFIG_INTEL_POWERCLAMP=m
CONFIG_X86_PKG_TEMP_THERMAL=m
CONFIG_INTEL_SOC_DTS_IOSF_CORE=m
# CONFIG_INTEL_SOC_DTS_THERMAL is not set

#
# ACPI INT340X thermal drivers
#
CONFIG_INT340X_THERMAL=m
CONFIG_ACPI_THERMAL_REL=m
# CONFIG_INT3406_THERMAL is not set
CONFIG_PROC_THERMAL_MMIO_RAPL=y
# end of ACPI INT340X thermal drivers

CONFIG_INTEL_PCH_THERMAL=m
# end of Intel thermal drivers

CONFIG_WATCHDOG=y
CONFIG_WATCHDOG_CORE=y
# CONFIG_WATCHDOG_NOWAYOUT is not set
CONFIG_WATCHDOG_HANDLE_BOOT_ENABLED=y
CONFIG_WATCHDOG_OPEN_TIMEOUT=0
CONFIG_WATCHDOG_SYSFS=y

#
# Watchdog Pretimeout Governors
#
# CONFIG_WATCHDOG_PRETIMEOUT_GOV is not set

#
# Watchdog Device Drivers
#
CONFIG_SOFT_WATCHDOG=m
CONFIG_WDAT_WDT=m
# CONFIG_XILINX_WATCHDOG is not set
# CONFIG_ZIIRAVE_WATCHDOG is not set
# CONFIG_MLX_WDT is not set
# CONFIG_CADENCE_WATCHDOG is not set
# CONFIG_DW_WATCHDOG is not set
# CONFIG_MAX63XX_WATCHDOG is not set
# CONFIG_ACQUIRE_WDT is not set
# CONFIG_ADVANTECH_WDT is not set
CONFIG_ALIM1535_WDT=m
CONFIG_ALIM7101_WDT=m
# CONFIG_EBC_C384_WDT is not set
CONFIG_F71808E_WDT=m
CONFIG_SP5100_TCO=m
CONFIG_SBC_FITPC2_WATCHDOG=m
# CONFIG_EUROTECH_WDT is not set
CONFIG_IB700_WDT=m
CONFIG_IBMASR=m
# CONFIG_WAFER_WDT is not set
CONFIG_I6300ESB_WDT=y
CONFIG_IE6XX_WDT=m
CONFIG_ITCO_WDT=y
CONFIG_ITCO_VENDOR_SUPPORT=y
CONFIG_IT8712F_WDT=m
CONFIG_IT87_WDT=m
CONFIG_HP_WATCHDOG=m
CONFIG_HPWDT_NMI_DECODING=y
# CONFIG_SC1200_WDT is not set
# CONFIG_PC87413_WDT is not set
CONFIG_NV_TCO=m
# CONFIG_60XX_WDT is not set
# CONFIG_CPU5_WDT is not set
CONFIG_SMSC_SCH311X_WDT=m
# CONFIG_SMSC37B787_WDT is not set
# CONFIG_TQMX86_WDT is not set
CONFIG_VIA_WDT=m
CONFIG_W83627HF_WDT=m
CONFIG_W83877F_WDT=m
CONFIG_W83977F_WDT=m
CONFIG_MACHZ_WDT=m
# CONFIG_SBC_EPX_C3_WATCHDOG is not set
CONFIG_INTEL_MEI_WDT=m
# CONFIG_NI903X_WDT is not set
# CONFIG_NIC7018_WDT is not set
# CONFIG_MEN_A21_WDT is not set
CONFIG_XEN_WDT=m

#
# PCI-based Watchdog Cards
#
CONFIG_PCIPCWATCHDOG=m
CONFIG_WDTPCI=m

#
# USB-based Watchdog Cards
#
# CONFIG_USBPCWATCHDOG is not set
CONFIG_SSB_POSSIBLE=y
# CONFIG_SSB is not set
CONFIG_BCMA_POSSIBLE=y
CONFIG_BCMA=m
CONFIG_BCMA_HOST_PCI_POSSIBLE=y
CONFIG_BCMA_HOST_PCI=y
# CONFIG_BCMA_HOST_SOC is not set
CONFIG_BCMA_DRIVER_PCI=y
CONFIG_BCMA_DRIVER_GMAC_CMN=y
CONFIG_BCMA_DRIVER_GPIO=y
# CONFIG_BCMA_DEBUG is not set

#
# Multifunction device drivers
#
CONFIG_MFD_CORE=y
# CONFIG_MFD_AS3711 is not set
# CONFIG_PMIC_ADP5520 is not set
# CONFIG_MFD_AAT2870_CORE is not set
# CONFIG_MFD_BCM590XX is not set
# CONFIG_MFD_BD9571MWV is not set
# CONFIG_MFD_AXP20X_I2C is not set
# CONFIG_MFD_MADERA is not set
# CONFIG_PMIC_DA903X is not set
# CONFIG_MFD_DA9052_SPI is not set
# CONFIG_MFD_DA9052_I2C is not set
# CONFIG_MFD_DA9055 is not set
# CONFIG_MFD_DA9062 is not set
# CONFIG_MFD_DA9063 is not set
# CONFIG_MFD_DA9150 is not set
# CONFIG_MFD_DLN2 is not set
# CONFIG_MFD_MC13XXX_SPI is not set
# CONFIG_MFD_MC13XXX_I2C is not set
# CONFIG_MFD_MP2629 is not set
# CONFIG_HTC_PASIC3 is not set
# CONFIG_HTC_I2CPLD is not set
# CONFIG_MFD_INTEL_QUARK_I2C_GPIO is not set
CONFIG_LPC_ICH=y
CONFIG_LPC_SCH=m
# CONFIG_INTEL_SOC_PMIC_CHTDC_TI is not set
CONFIG_MFD_INTEL_LPSS=y
CONFIG_MFD_INTEL_LPSS_ACPI=y
CONFIG_MFD_INTEL_LPSS_PCI=y
# CONFIG_MFD_INTEL_PMC_BXT is not set
# CONFIG_MFD_IQS62X is not set
# CONFIG_MFD_JANZ_CMODIO is not set
# CONFIG_MFD_KEMPLD is not set
# CONFIG_MFD_88PM800 is not set
# CONFIG_MFD_88PM805 is not set
# CONFIG_MFD_88PM860X is not set
# CONFIG_MFD_MAX14577 is not set
# CONFIG_MFD_MAX77693 is not set
# CONFIG_MFD_MAX77843 is not set
# CONFIG_MFD_MAX8907 is not set
# CONFIG_MFD_MAX8925 is not set
# CONFIG_MFD_MAX8997 is not set
# CONFIG_MFD_MAX8998 is not set
# CONFIG_MFD_MT6360 is not set
# CONFIG_MFD_MT6397 is not set
# CONFIG_MFD_MENF21BMC is not set
# CONFIG_EZX_PCAP is not set
# CONFIG_MFD_VIPERBOARD is not set
# CONFIG_MFD_RETU is not set
# CONFIG_MFD_PCF50633 is not set
# CONFIG_MFD_RDC321X is not set
# CONFIG_MFD_RT5033 is not set
# CONFIG_MFD_RC5T583 is not set
# CONFIG_MFD_SEC_CORE is not set
# CONFIG_MFD_SI476X_CORE is not set
CONFIG_MFD_SM501=m
CONFIG_MFD_SM501_GPIO=y
# CONFIG_MFD_SKY81452 is not set
# CONFIG_ABX500_CORE is not set
# CONFIG_MFD_SYSCON is not set
# CONFIG_MFD_TI_AM335X_TSCADC is not set
# CONFIG_MFD_LP3943 is not set
# CONFIG_MFD_LP8788 is not set
# CONFIG_MFD_TI_LMU is not set
# CONFIG_MFD_PALMAS is not set
# CONFIG_TPS6105X is not set
# CONFIG_TPS65010 is not set
# CONFIG_TPS6507X is not set
# CONFIG_MFD_TPS65086 is not set
# CONFIG_MFD_TPS65090 is not set
# CONFIG_MFD_TI_LP873X is not set
# CONFIG_MFD_TPS6586X is not set
# CONFIG_MFD_TPS65910 is not set
# CONFIG_MFD_TPS65912_I2C is not set
# CONFIG_MFD_TPS65912_SPI is not set
# CONFIG_MFD_TPS80031 is not set
# CONFIG_TWL4030_CORE is not set
# CONFIG_TWL6040_CORE is not set
# CONFIG_MFD_WL1273_CORE is not set
# CONFIG_MFD_LM3533 is not set
# CONFIG_MFD_TQMX86 is not set
CONFIG_MFD_VX855=m
# CONFIG_MFD_ARIZONA_I2C is not set
# CONFIG_MFD_ARIZONA_SPI is not set
# CONFIG_MFD_WM8400 is not set
# CONFIG_MFD_WM831X_I2C is not set
# CONFIG_MFD_WM831X_SPI is not set
# CONFIG_MFD_WM8350_I2C is not set
# CONFIG_MFD_WM8994 is not set
# end of Multifunction device drivers

# CONFIG_REGULATOR is not set
CONFIG_RC_CORE=m
CONFIG_RC_MAP=m
CONFIG_LIRC=y
CONFIG_RC_DECODERS=y
CONFIG_IR_NEC_DECODER=m
CONFIG_IR_RC5_DECODER=m
CONFIG_IR_RC6_DECODER=m
CONFIG_IR_JVC_DECODER=m
CONFIG_IR_SONY_DECODER=m
CONFIG_IR_SANYO_DECODER=m
# CONFIG_IR_SHARP_DECODER is not set
CONFIG_IR_MCE_KBD_DECODER=m
# CONFIG_IR_XMP_DECODER is not set
CONFIG_IR_IMON_DECODER=m
# CONFIG_IR_RCMM_DECODER is not set
CONFIG_RC_DEVICES=y
# CONFIG_RC_ATI_REMOTE is not set
CONFIG_IR_ENE=m
# CONFIG_IR_IMON is not set
# CONFIG_IR_IMON_RAW is not set
# CONFIG_IR_MCEUSB is not set
CONFIG_IR_ITE_CIR=m
CONFIG_IR_FINTEK=m
CONFIG_IR_NUVOTON=m
# CONFIG_IR_REDRAT3 is not set
# CONFIG_IR_STREAMZAP is not set
CONFIG_IR_WINBOND_CIR=m
# CONFIG_IR_IGORPLUGUSB is not set
# CONFIG_IR_IGUANA is not set
# CONFIG_IR_TTUSBIR is not set
# CONFIG_RC_LOOPBACK is not set
CONFIG_IR_SERIAL=m
CONFIG_IR_SERIAL_TRANSMITTER=y
CONFIG_IR_SIR=m
# CONFIG_RC_XBOX_DVD is not set
# CONFIG_IR_TOY is not set
CONFIG_MEDIA_CEC_SUPPORT=y
# CONFIG_CEC_CH7322 is not set
# CONFIG_CEC_SECO is not set
# CONFIG_USB_PULSE8_CEC is not set
# CONFIG_USB_RAINSHADOW_CEC is not set
CONFIG_MEDIA_SUPPORT=m
# CONFIG_MEDIA_SUPPORT_FILTER is not set
# CONFIG_MEDIA_SUBDRV_AUTOSELECT is not set

#
# Media device types
#
CONFIG_MEDIA_CAMERA_SUPPORT=y
CONFIG_MEDIA_ANALOG_TV_SUPPORT=y
CONFIG_MEDIA_DIGITAL_TV_SUPPORT=y
CONFIG_MEDIA_RADIO_SUPPORT=y
CONFIG_MEDIA_SDR_SUPPORT=y
CONFIG_MEDIA_PLATFORM_SUPPORT=y
CONFIG_MEDIA_TEST_SUPPORT=y
# end of Media device types

#
# Media core support
#
CONFIG_VIDEO_DEV=m
CONFIG_MEDIA_CONTROLLER=y
CONFIG_DVB_CORE=m
# end of Media core support

#
# Video4Linux options
#
CONFIG_VIDEO_V4L2=m
CONFIG_VIDEO_V4L2_I2C=y
CONFIG_VIDEO_V4L2_SUBDEV_API=y
# CONFIG_VIDEO_ADV_DEBUG is not set
# CONFIG_VIDEO_FIXED_MINOR_RANGES is not set
# end of Video4Linux options

#
# Media controller options
#
# CONFIG_MEDIA_CONTROLLER_DVB is not set
# end of Media controller options

#
# Digital TV options
#
# CONFIG_DVB_MMAP is not set
CONFIG_DVB_NET=y
CONFIG_DVB_MAX_ADAPTERS=16
CONFIG_DVB_DYNAMIC_MINORS=y
# CONFIG_DVB_DEMUX_SECTION_LOSS_LOG is not set
# CONFIG_DVB_ULE_DEBUG is not set
# end of Digital TV options

#
# Media drivers
#
# CONFIG_MEDIA_USB_SUPPORT is not set
# CONFIG_MEDIA_PCI_SUPPORT is not set
CONFIG_RADIO_ADAPTERS=y
# CONFIG_RADIO_SI470X is not set
# CONFIG_RADIO_SI4713 is not set
# CONFIG_USB_MR800 is not set
# CONFIG_USB_DSBR is not set
# CONFIG_RADIO_MAXIRADIO is not set
# CONFIG_RADIO_SHARK is not set
# CONFIG_RADIO_SHARK2 is not set
# CONFIG_USB_KEENE is not set
# CONFIG_USB_RAREMONO is not set
# CONFIG_USB_MA901 is not set
# CONFIG_RADIO_TEA5764 is not set
# CONFIG_RADIO_SAA7706H is not set
# CONFIG_RADIO_TEF6862 is not set
# CONFIG_RADIO_WL1273 is not set
CONFIG_VIDEOBUF2_CORE=m
CONFIG_VIDEOBUF2_V4L2=m
CONFIG_VIDEOBUF2_MEMOPS=m
CONFIG_VIDEOBUF2_VMALLOC=m
# CONFIG_V4L_PLATFORM_DRIVERS is not set
# CONFIG_V4L_MEM2MEM_DRIVERS is not set
# CONFIG_DVB_PLATFORM_DRIVERS is not set
# CONFIG_SDR_PLATFORM_DRIVERS is not set

#
# MMC/SDIO DVB adapters
#
# CONFIG_SMS_SDIO_DRV is not set
# CONFIG_V4L_TEST_DRIVERS is not set

#
# FireWire (IEEE 1394) Adapters
#
# CONFIG_DVB_FIREDTV is not set
# end of Media drivers

#
# Media ancillary drivers
#
CONFIG_MEDIA_ATTACH=y
CONFIG_VIDEO_IR_I2C=m

#
# Audio decoders, processors and mixers
#
# CONFIG_VIDEO_TVAUDIO is not set
# CONFIG_VIDEO_TDA7432 is not set
# CONFIG_VIDEO_TDA9840 is not set
# CONFIG_VIDEO_TEA6415C is not set
# CONFIG_VIDEO_TEA6420 is not set
# CONFIG_VIDEO_MSP3400 is not set
# CONFIG_VIDEO_CS3308 is not set
# CONFIG_VIDEO_CS5345 is not set
# CONFIG_VIDEO_CS53L32A is not set
# CONFIG_VIDEO_TLV320AIC23B is not set
# CONFIG_VIDEO_UDA1342 is not set
# CONFIG_VIDEO_WM8775 is not set
# CONFIG_VIDEO_WM8739 is not set
# CONFIG_VIDEO_VP27SMPX is not set
# CONFIG_VIDEO_SONY_BTF_MPX is not set
# end of Audio decoders, processors and mixers

#
# RDS decoders
#
# CONFIG_VIDEO_SAA6588 is not set
# end of RDS decoders

#
# Video decoders
#
# CONFIG_VIDEO_ADV7180 is not set
# CONFIG_VIDEO_ADV7183 is not set
# CONFIG_VIDEO_ADV7604 is not set
# CONFIG_VIDEO_ADV7842 is not set
# CONFIG_VIDEO_BT819 is not set
# CONFIG_VIDEO_BT856 is not set
# CONFIG_VIDEO_BT866 is not set
# CONFIG_VIDEO_KS0127 is not set
# CONFIG_VIDEO_ML86V7667 is not set
# CONFIG_VIDEO_SAA7110 is not set
# CONFIG_VIDEO_SAA711X is not set
# CONFIG_VIDEO_TC358743 is not set
# CONFIG_VIDEO_TVP514X is not set
# CONFIG_VIDEO_TVP5150 is not set
# CONFIG_VIDEO_TVP7002 is not set
# CONFIG_VIDEO_TW2804 is not set
# CONFIG_VIDEO_TW9903 is not set
# CONFIG_VIDEO_TW9906 is not set
# CONFIG_VIDEO_TW9910 is not set
# CONFIG_VIDEO_VPX3220 is not set

#
# Video and audio decoders
#
# CONFIG_VIDEO_SAA717X is not set
# CONFIG_VIDEO_CX25840 is not set
# end of Video decoders

#
# Video encoders
#
# CONFIG_VIDEO_SAA7127 is not set
# CONFIG_VIDEO_SAA7185 is not set
# CONFIG_VIDEO_ADV7170 is not set
# CONFIG_VIDEO_ADV7175 is not set
# CONFIG_VIDEO_ADV7343 is not set
# CONFIG_VIDEO_ADV7393 is not set
# CONFIG_VIDEO_ADV7511 is not set
# CONFIG_VIDEO_AD9389B is not set
# CONFIG_VIDEO_AK881X is not set
# CONFIG_VIDEO_THS8200 is not set
# end of Video encoders

#
# Video improvement chips
#
# CONFIG_VIDEO_UPD64031A is not set
# CONFIG_VIDEO_UPD64083 is not set
# end of Video improvement chips

#
# Audio/Video compression chips
#
# CONFIG_VIDEO_SAA6752HS is not set
# end of Audio/Video compression chips

#
# SDR tuner chips
#
# CONFIG_SDR_MAX2175 is not set
# end of SDR tuner chips

#
# Miscellaneous helper chips
#
# CONFIG_VIDEO_THS7303 is not set
# CONFIG_VIDEO_M52790 is not set
# CONFIG_VIDEO_I2C is not set
# CONFIG_VIDEO_ST_MIPID02 is not set
# end of Miscellaneous helper chips

#
# Camera sensor devices
#
# CONFIG_VIDEO_HI556 is not set
# CONFIG_VIDEO_IMX219 is not set
# CONFIG_VIDEO_IMX258 is not set
# CONFIG_VIDEO_IMX274 is not set
# CONFIG_VIDEO_IMX290 is not set
# CONFIG_VIDEO_IMX319 is not set
# CONFIG_VIDEO_IMX355 is not set
# CONFIG_VIDEO_OV2640 is not set
# CONFIG_VIDEO_OV2659 is not set
# CONFIG_VIDEO_OV2680 is not set
# CONFIG_VIDEO_OV2685 is not set
# CONFIG_VIDEO_OV2740 is not set
# CONFIG_VIDEO_OV5647 is not set
# CONFIG_VIDEO_OV6650 is not set
# CONFIG_VIDEO_OV5670 is not set
# CONFIG_VIDEO_OV5675 is not set
# CONFIG_VIDEO_OV5695 is not set
# CONFIG_VIDEO_OV7251 is not set
# CONFIG_VIDEO_OV772X is not set
# CONFIG_VIDEO_OV7640 is not set
# CONFIG_VIDEO_OV7670 is not set
# CONFIG_VIDEO_OV7740 is not set
# CONFIG_VIDEO_OV8856 is not set
# CONFIG_VIDEO_OV9640 is not set
# CONFIG_VIDEO_OV9650 is not set
# CONFIG_VIDEO_OV13858 is not set
# CONFIG_VIDEO_VS6624 is not set
# CONFIG_VIDEO_MT9M001 is not set
# CONFIG_VIDEO_MT9M032 is not set
# CONFIG_VIDEO_MT9M111 is not set
# CONFIG_VIDEO_MT9P031 is not set
# CONFIG_VIDEO_MT9T001 is not set
# CONFIG_VIDEO_MT9T112 is not set
# CONFIG_VIDEO_MT9V011 is not set
# CONFIG_VIDEO_MT9V032 is not set
# CONFIG_VIDEO_MT9V111 is not set
# CONFIG_VIDEO_SR030PC30 is not set
# CONFIG_VIDEO_NOON010PC30 is not set
# CONFIG_VIDEO_M5MOLS is not set
# CONFIG_VIDEO_RDACM20 is not set
# CONFIG_VIDEO_RJ54N1 is not set
# CONFIG_VIDEO_S5K6AA is not set
# CONFIG_VIDEO_S5K6A3 is not set
# CONFIG_VIDEO_S5K4ECGX is not set
# CONFIG_VIDEO_S5K5BAF is not set
# CONFIG_VIDEO_SMIAPP is not set
# CONFIG_VIDEO_ET8EK8 is not set
# CONFIG_VIDEO_S5C73M3 is not set
# end of Camera sensor devices

#
# Lens drivers
#
# CONFIG_VIDEO_AD5820 is not set
# CONFIG_VIDEO_AK7375 is not set
# CONFIG_VIDEO_DW9714 is not set
# CONFIG_VIDEO_DW9768 is not set
# CONFIG_VIDEO_DW9807_VCM is not set
# end of Lens drivers

#
# Flash devices
#
# CONFIG_VIDEO_ADP1653 is not set
# CONFIG_VIDEO_LM3560 is not set
# CONFIG_VIDEO_LM3646 is not set
# end of Flash devices

#
# SPI helper chips
#
# CONFIG_VIDEO_GS1662 is not set
# end of SPI helper chips

#
# Media SPI Adapters
#
CONFIG_CXD2880_SPI_DRV=m
# end of Media SPI Adapters

CONFIG_MEDIA_TUNER=m

#
# Customize TV tuners
#
CONFIG_MEDIA_TUNER_SIMPLE=m
CONFIG_MEDIA_TUNER_TDA18250=m
CONFIG_MEDIA_TUNER_TDA8290=m
CONFIG_MEDIA_TUNER_TDA827X=m
CONFIG_MEDIA_TUNER_TDA18271=m
CONFIG_MEDIA_TUNER_TDA9887=m
CONFIG_MEDIA_TUNER_TEA5761=m
CONFIG_MEDIA_TUNER_TEA5767=m
CONFIG_MEDIA_TUNER_MSI001=m
CONFIG_MEDIA_TUNER_MT20XX=m
CONFIG_MEDIA_TUNER_MT2060=m
CONFIG_MEDIA_TUNER_MT2063=m
CONFIG_MEDIA_TUNER_MT2266=m
CONFIG_MEDIA_TUNER_MT2131=m
CONFIG_MEDIA_TUNER_QT1010=m
CONFIG_MEDIA_TUNER_XC2028=m
CONFIG_MEDIA_TUNER_XC5000=m
CONFIG_MEDIA_TUNER_XC4000=m
CONFIG_MEDIA_TUNER_MXL5005S=m
CONFIG_MEDIA_TUNER_MXL5007T=m
CONFIG_MEDIA_TUNER_MC44S803=m
CONFIG_MEDIA_TUNER_MAX2165=m
CONFIG_MEDIA_TUNER_TDA18218=m
CONFIG_MEDIA_TUNER_FC0011=m
CONFIG_MEDIA_TUNER_FC0012=m
CONFIG_MEDIA_TUNER_FC0013=m
CONFIG_MEDIA_TUNER_TDA18212=m
CONFIG_MEDIA_TUNER_E4000=m
CONFIG_MEDIA_TUNER_FC2580=m
CONFIG_MEDIA_TUNER_M88RS6000T=m
CONFIG_MEDIA_TUNER_TUA9001=m
CONFIG_MEDIA_TUNER_SI2157=m
CONFIG_MEDIA_TUNER_IT913X=m
CONFIG_MEDIA_TUNER_R820T=m
CONFIG_MEDIA_TUNER_MXL301RF=m
CONFIG_MEDIA_TUNER_QM1D1C0042=m
CONFIG_MEDIA_TUNER_QM1D1B0004=m
# end of Customize TV tuners

#
# Customise DVB Frontends
#

#
# Multistandard (satellite) frontends
#
CONFIG_DVB_STB0899=m
CONFIG_DVB_STB6100=m
CONFIG_DVB_STV090x=m
CONFIG_DVB_STV0910=m
CONFIG_DVB_STV6110x=m
CONFIG_DVB_STV6111=m
CONFIG_DVB_MXL5XX=m
CONFIG_DVB_M88DS3103=m

#
# Multistandard (cable + terrestrial) frontends
#
CONFIG_DVB_DRXK=m
CONFIG_DVB_TDA18271C2DD=m
CONFIG_DVB_SI2165=m
CONFIG_DVB_MN88472=m
CONFIG_DVB_MN88473=m

#
# DVB-S (satellite) frontends
#
CONFIG_DVB_CX24110=m
CONFIG_DVB_CX24123=m
CONFIG_DVB_MT312=m
CONFIG_DVB_ZL10036=m
CONFIG_DVB_ZL10039=m
CONFIG_DVB_S5H1420=m
CONFIG_DVB_STV0288=m
CONFIG_DVB_STB6000=m
CONFIG_DVB_STV0299=m
CONFIG_DVB_STV6110=m
CONFIG_DVB_STV0900=m
CONFIG_DVB_TDA8083=m
CONFIG_DVB_TDA10086=m
CONFIG_DVB_TDA8261=m
CONFIG_DVB_VES1X93=m
CONFIG_DVB_TUNER_ITD1000=m
CONFIG_DVB_TUNER_CX24113=m
CONFIG_DVB_TDA826X=m
CONFIG_DVB_TUA6100=m
CONFIG_DVB_CX24116=m
CONFIG_DVB_CX24117=m
CONFIG_DVB_CX24120=m
CONFIG_DVB_SI21XX=m
CONFIG_DVB_TS2020=m
CONFIG_DVB_DS3000=m
CONFIG_DVB_MB86A16=m
CONFIG_DVB_TDA10071=m

#
# DVB-T (terrestrial) frontends
#
CONFIG_DVB_SP8870=m
CONFIG_DVB_SP887X=m
CONFIG_DVB_CX22700=m
CONFIG_DVB_CX22702=m
CONFIG_DVB_S5H1432=m
CONFIG_DVB_DRXD=m
CONFIG_DVB_L64781=m
CONFIG_DVB_TDA1004X=m
CONFIG_DVB_NXT6000=m
CONFIG_DVB_MT352=m
CONFIG_DVB_ZL10353=m
CONFIG_DVB_DIB3000MB=m
CONFIG_DVB_DIB3000MC=m
CONFIG_DVB_DIB7000M=m
CONFIG_DVB_DIB7000P=m
CONFIG_DVB_DIB9000=m
CONFIG_DVB_TDA10048=m
CONFIG_DVB_AF9013=m
CONFIG_DVB_EC100=m
CONFIG_DVB_STV0367=m
CONFIG_DVB_CXD2820R=m
CONFIG_DVB_CXD2841ER=m
CONFIG_DVB_RTL2830=m
CONFIG_DVB_RTL2832=m
CONFIG_DVB_RTL2832_SDR=m
CONFIG_DVB_SI2168=m
CONFIG_DVB_ZD1301_DEMOD=m
CONFIG_DVB_CXD2880=m

#
# DVB-C (cable) frontends
#
CONFIG_DVB_VES1820=m
CONFIG_DVB_TDA10021=m
CONFIG_DVB_TDA10023=m
CONFIG_DVB_STV0297=m

#
# ATSC (North American/Korean Terrestrial/Cable DTV) frontends
#
CONFIG_DVB_NXT200X=m
CONFIG_DVB_OR51211=m
CONFIG_DVB_OR51132=m
CONFIG_DVB_BCM3510=m
CONFIG_DVB_LGDT330X=m
CONFIG_DVB_LGDT3305=m
CONFIG_DVB_LGDT3306A=m
CONFIG_DVB_LG2160=m
CONFIG_DVB_S5H1409=m
CONFIG_DVB_AU8522=m
CONFIG_DVB_AU8522_DTV=m
CONFIG_DVB_AU8522_V4L=m
CONFIG_DVB_S5H1411=m

#
# ISDB-T (terrestrial) frontends
#
CONFIG_DVB_S921=m
CONFIG_DVB_DIB8000=m
CONFIG_DVB_MB86A20S=m

#
# ISDB-S (satellite) & ISDB-T (terrestrial) frontends
#
CONFIG_DVB_TC90522=m
CONFIG_DVB_MN88443X=m

#
# Digital terrestrial only tuners/PLL
#
CONFIG_DVB_PLL=m
CONFIG_DVB_TUNER_DIB0070=m
CONFIG_DVB_TUNER_DIB0090=m

#
# SEC control devices for DVB-S
#
CONFIG_DVB_DRX39XYJ=m
CONFIG_DVB_LNBH25=m
CONFIG_DVB_LNBH29=m
CONFIG_DVB_LNBP21=m
CONFIG_DVB_LNBP22=m
CONFIG_DVB_ISL6405=m
CONFIG_DVB_ISL6421=m
CONFIG_DVB_ISL6423=m
CONFIG_DVB_A8293=m
CONFIG_DVB_LGS8GL5=m
CONFIG_DVB_LGS8GXX=m
CONFIG_DVB_ATBM8830=m
CONFIG_DVB_TDA665x=m
CONFIG_DVB_IX2505V=m
CONFIG_DVB_M88RS2000=m
CONFIG_DVB_AF9033=m
CONFIG_DVB_HORUS3A=m
CONFIG_DVB_ASCOT2E=m
CONFIG_DVB_HELENE=m

#
# Common Interface (EN50221) controller drivers
#
CONFIG_DVB_CXD2099=m
CONFIG_DVB_SP2=m
# end of Customise DVB Frontends

#
# Tools to develop new frontends
#
# CONFIG_DVB_DUMMY_FE is not set
# end of Media ancillary drivers

#
# Graphics support
#
# CONFIG_AGP is not set
CONFIG_INTEL_GTT=m
CONFIG_VGA_ARB=y
CONFIG_VGA_ARB_MAX_GPUS=64
CONFIG_VGA_SWITCHEROO=y
CONFIG_DRM=m
CONFIG_DRM_MIPI_DSI=y
CONFIG_DRM_DP_AUX_CHARDEV=y
# CONFIG_DRM_DEBUG_SELFTEST is not set
CONFIG_DRM_KMS_HELPER=m
CONFIG_DRM_KMS_FB_HELPER=y
CONFIG_DRM_FBDEV_EMULATION=y
CONFIG_DRM_FBDEV_OVERALLOC=100
CONFIG_DRM_LOAD_EDID_FIRMWARE=y
# CONFIG_DRM_DP_CEC is not set
CONFIG_DRM_TTM=m
CONFIG_DRM_TTM_DMA_PAGE_POOL=y
CONFIG_DRM_VRAM_HELPER=m
CONFIG_DRM_TTM_HELPER=m
CONFIG_DRM_GEM_SHMEM_HELPER=y

#
# I2C encoder or helper chips
#
CONFIG_DRM_I2C_CH7006=m
CONFIG_DRM_I2C_SIL164=m
# CONFIG_DRM_I2C_NXP_TDA998X is not set
# CONFIG_DRM_I2C_NXP_TDA9950 is not set
# end of I2C encoder or helper chips

#
# ARM devices
#
# end of ARM devices

# CONFIG_DRM_RADEON is not set
# CONFIG_DRM_AMDGPU is not set
# CONFIG_DRM_NOUVEAU is not set
CONFIG_DRM_I915=m
CONFIG_DRM_I915_FORCE_PROBE=""
CONFIG_DRM_I915_CAPTURE_ERROR=y
CONFIG_DRM_I915_COMPRESS_ERROR=y
CONFIG_DRM_I915_USERPTR=y
CONFIG_DRM_I915_GVT=y
CONFIG_DRM_I915_GVT_KVMGT=m
CONFIG_DRM_I915_FENCE_TIMEOUT=10000
CONFIG_DRM_I915_USERFAULT_AUTOSUSPEND=250
CONFIG_DRM_I915_HEARTBEAT_INTERVAL=2500
CONFIG_DRM_I915_PREEMPT_TIMEOUT=640
CONFIG_DRM_I915_MAX_REQUEST_BUSYWAIT=8000
CONFIG_DRM_I915_STOP_TIMEOUT=100
CONFIG_DRM_I915_TIMESLICE_DURATION=1
CONFIG_DRM_VGEM=m
# CONFIG_DRM_VKMS is not set
CONFIG_DRM_VMWGFX=m
CONFIG_DRM_VMWGFX_FBCON=y
CONFIG_DRM_GMA500=m
CONFIG_DRM_GMA600=y
CONFIG_DRM_GMA3600=y
# CONFIG_DRM_UDL is not set
CONFIG_DRM_AST=m
CONFIG_DRM_MGAG200=m
CONFIG_DRM_QXL=m
CONFIG_DRM_BOCHS=m
CONFIG_DRM_VIRTIO_GPU=m
CONFIG_DRM_PANEL=y

#
# Display Panels
#
# CONFIG_DRM_PANEL_RASPBERRYPI_TOUCHSCREEN is not set
# end of Display Panels

CONFIG_DRM_BRIDGE=y
CONFIG_DRM_PANEL_BRIDGE=y

#
# Display Interface Bridges
#
# CONFIG_DRM_ANALOGIX_ANX78XX is not set
# end of Display Interface Bridges

# CONFIG_DRM_ETNAVIV is not set
CONFIG_DRM_CIRRUS_QEMU=m
# CONFIG_DRM_GM12U320 is not set
# CONFIG_TINYDRM_HX8357D is not set
# CONFIG_TINYDRM_ILI9225 is not set
# CONFIG_TINYDRM_ILI9341 is not set
# CONFIG_TINYDRM_ILI9486 is not set
# CONFIG_TINYDRM_MI0283QT is not set
# CONFIG_TINYDRM_REPAPER is not set
# CONFIG_TINYDRM_ST7586 is not set
# CONFIG_TINYDRM_ST7735R is not set
# CONFIG_DRM_XEN is not set
# CONFIG_DRM_VBOXVIDEO is not set
# CONFIG_DRM_LEGACY is not set
CONFIG_DRM_PANEL_ORIENTATION_QUIRKS=y

#
# Frame buffer Devices
#
CONFIG_FB_CMDLINE=y
CONFIG_FB_NOTIFY=y
CONFIG_FB=y
# CONFIG_FIRMWARE_EDID is not set
CONFIG_FB_BOOT_VESA_SUPPORT=y
CONFIG_FB_CFB_FILLRECT=y
CONFIG_FB_CFB_COPYAREA=y
CONFIG_FB_CFB_IMAGEBLIT=y
CONFIG_FB_SYS_FILLRECT=m
CONFIG_FB_SYS_COPYAREA=m
CONFIG_FB_SYS_IMAGEBLIT=m
# CONFIG_FB_FOREIGN_ENDIAN is not set
CONFIG_FB_SYS_FOPS=m
CONFIG_FB_DEFERRED_IO=y
# CONFIG_FB_MODE_HELPERS is not set
CONFIG_FB_TILEBLITTING=y

#
# Frame buffer hardware drivers
#
# CONFIG_FB_CIRRUS is not set
# CONFIG_FB_PM2 is not set
# CONFIG_FB_CYBER2000 is not set
# CONFIG_FB_ARC is not set
# CONFIG_FB_ASILIANT is not set
# CONFIG_FB_IMSTT is not set
# CONFIG_FB_VGA16 is not set
# CONFIG_FB_UVESA is not set
CONFIG_FB_VESA=y
CONFIG_FB_EFI=y
# CONFIG_FB_N411 is not set
# CONFIG_FB_HGA is not set
# CONFIG_FB_OPENCORES is not set
# CONFIG_FB_S1D13XXX is not set
# CONFIG_FB_NVIDIA is not set
# CONFIG_FB_RIVA is not set
# CONFIG_FB_I740 is not set
# CONFIG_FB_LE80578 is not set
# CONFIG_FB_MATROX is not set
# CONFIG_FB_RADEON is not set
# CONFIG_FB_ATY128 is not set
# CONFIG_FB_ATY is not set
# CONFIG_FB_S3 is not set
# CONFIG_FB_SAVAGE is not set
# CONFIG_FB_SIS is not set
# CONFIG_FB_VIA is not set
# CONFIG_FB_NEOMAGIC is not set
# CONFIG_FB_KYRO is not set
# CONFIG_FB_3DFX is not set
# CONFIG_FB_VOODOO1 is not set
# CONFIG_FB_VT8623 is not set
# CONFIG_FB_TRIDENT is not set
# CONFIG_FB_ARK is not set
# CONFIG_FB_PM3 is not set
# CONFIG_FB_CARMINE is not set
# CONFIG_FB_SM501 is not set
# CONFIG_FB_SMSCUFX is not set
# CONFIG_FB_UDL is not set
# CONFIG_FB_IBM_GXT4500 is not set
# CONFIG_FB_VIRTUAL is not set
# CONFIG_XEN_FBDEV_FRONTEND is not set
# CONFIG_FB_METRONOME is not set
# CONFIG_FB_MB862XX is not set
CONFIG_FB_HYPERV=m
# CONFIG_FB_SIMPLE is not set
# CONFIG_FB_SM712 is not set
# end of Frame buffer Devices

#
# Backlight & LCD device support
#
CONFIG_LCD_CLASS_DEVICE=m
# CONFIG_LCD_L4F00242T03 is not set
# CONFIG_LCD_LMS283GF05 is not set
# CONFIG_LCD_LTV350QV is not set
# CONFIG_LCD_ILI922X is not set
# CONFIG_LCD_ILI9320 is not set
# CONFIG_LCD_TDO24M is not set
# CONFIG_LCD_VGG2432A4 is not set
CONFIG_LCD_PLATFORM=m
# CONFIG_LCD_AMS369FG06 is not set
# CONFIG_LCD_LMS501KF03 is not set
# CONFIG_LCD_HX8357 is not set
# CONFIG_LCD_OTM3225A is not set
CONFIG_BACKLIGHT_CLASS_DEVICE=y
# CONFIG_BACKLIGHT_PWM is not set
CONFIG_BACKLIGHT_APPLE=m
# CONFIG_BACKLIGHT_QCOM_WLED is not set
# CONFIG_BACKLIGHT_SAHARA is not set
# CONFIG_BACKLIGHT_ADP8860 is not set
# CONFIG_BACKLIGHT_ADP8870 is not set
# CONFIG_BACKLIGHT_LM3630A is not set
# CONFIG_BACKLIGHT_LM3639 is not set
CONFIG_BACKLIGHT_LP855X=m
# CONFIG_BACKLIGHT_GPIO is not set
# CONFIG_BACKLIGHT_LV5207LP is not set
# CONFIG_BACKLIGHT_BD6107 is not set
# CONFIG_BACKLIGHT_ARCXCNN is not set
# end of Backlight & LCD device support

CONFIG_HDMI=y

#
# Console display driver support
#
CONFIG_VGA_CONSOLE=y
CONFIG_VGACON_SOFT_SCROLLBACK=y
CONFIG_VGACON_SOFT_SCROLLBACK_SIZE=64
# CONFIG_VGACON_SOFT_SCROLLBACK_PERSISTENT_ENABLE_BY_DEFAULT is not set
CONFIG_DUMMY_CONSOLE=y
CONFIG_DUMMY_CONSOLE_COLUMNS=80
CONFIG_DUMMY_CONSOLE_ROWS=25
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY=y
CONFIG_FRAMEBUFFER_CONSOLE_ROTATION=y
# CONFIG_FRAMEBUFFER_CONSOLE_DEFERRED_TAKEOVER is not set
# end of Console display driver support

CONFIG_LOGO=y
# CONFIG_LOGO_LINUX_MONO is not set
# CONFIG_LOGO_LINUX_VGA16 is not set
CONFIG_LOGO_LINUX_CLUT224=y
# end of Graphics support

# CONFIG_SOUND is not set

#
# HID support
#
CONFIG_HID=y
CONFIG_HID_BATTERY_STRENGTH=y
CONFIG_HIDRAW=y
CONFIG_UHID=m
CONFIG_HID_GENERIC=y

#
# Special HID drivers
#
CONFIG_HID_A4TECH=m
# CONFIG_HID_ACCUTOUCH is not set
CONFIG_HID_ACRUX=m
# CONFIG_HID_ACRUX_FF is not set
CONFIG_HID_APPLE=m
# CONFIG_HID_APPLEIR is not set
CONFIG_HID_ASUS=m
CONFIG_HID_AUREAL=m
CONFIG_HID_BELKIN=m
# CONFIG_HID_BETOP_FF is not set
# CONFIG_HID_BIGBEN_FF is not set
CONFIG_HID_CHERRY=m
CONFIG_HID_CHICONY=m
# CONFIG_HID_CORSAIR is not set
# CONFIG_HID_COUGAR is not set
# CONFIG_HID_MACALLY is not set
CONFIG_HID_CMEDIA=m
# CONFIG_HID_CP2112 is not set
# CONFIG_HID_CREATIVE_SB0540 is not set
CONFIG_HID_CYPRESS=m
CONFIG_HID_DRAGONRISE=m
# CONFIG_DRAGONRISE_FF is not set
# CONFIG_HID_EMS_FF is not set
# CONFIG_HID_ELAN is not set
CONFIG_HID_ELECOM=m
# CONFIG_HID_ELO is not set
CONFIG_HID_EZKEY=m
CONFIG_HID_GEMBIRD=m
CONFIG_HID_GFRM=m
# CONFIG_HID_GLORIOUS is not set
# CONFIG_HID_HOLTEK is not set
# CONFIG_HID_GT683R is not set
CONFIG_HID_KEYTOUCH=m
CONFIG_HID_KYE=m
# CONFIG_HID_UCLOGIC is not set
CONFIG_HID_WALTOP=m
# CONFIG_HID_VIEWSONIC is not set
CONFIG_HID_GYRATION=m
CONFIG_HID_ICADE=m
CONFIG_HID_ITE=m
CONFIG_HID_JABRA=m
CONFIG_HID_TWINHAN=m
CONFIG_HID_KENSINGTON=m
CONFIG_HID_LCPOWER=m
CONFIG_HID_LED=m
CONFIG_HID_LENOVO=m
CONFIG_HID_LOGITECH=m
CONFIG_HID_LOGITECH_DJ=m
CONFIG_HID_LOGITECH_HIDPP=m
# CONFIG_LOGITECH_FF is not set
# CONFIG_LOGIRUMBLEPAD2_FF is not set
# CONFIG_LOGIG940_FF is not set
# CONFIG_LOGIWHEELS_FF is not set
CONFIG_HID_MAGICMOUSE=y
# CONFIG_HID_MALTRON is not set
# CONFIG_HID_MAYFLASH is not set
# CONFIG_HID_REDRAGON is not set
CONFIG_HID_MICROSOFT=m
CONFIG_HID_MONTEREY=m
CONFIG_HID_MULTITOUCH=m
CONFIG_HID_NTI=m
# CONFIG_HID_NTRIG is not set
CONFIG_HID_ORTEK=m
CONFIG_HID_PANTHERLORD=m
# CONFIG_PANTHERLORD_FF is not set
# CONFIG_HID_PENMOUNT is not set
CONFIG_HID_PETALYNX=m
CONFIG_HID_PICOLCD=m
CONFIG_HID_PICOLCD_FB=y
CONFIG_HID_PICOLCD_BACKLIGHT=y
CONFIG_HID_PICOLCD_LCD=y
CONFIG_HID_PICOLCD_LEDS=y
CONFIG_HID_PICOLCD_CIR=y
CONFIG_HID_PLANTRONICS=m
CONFIG_HID_PRIMAX=m
# CONFIG_HID_RETRODE is not set
# CONFIG_HID_ROCCAT is not set
CONFIG_HID_SAITEK=m
CONFIG_HID_SAMSUNG=m
# CONFIG_HID_SONY is not set
CONFIG_HID_SPEEDLINK=m
# CONFIG_HID_STEAM is not set
CONFIG_HID_STEELSERIES=m
CONFIG_HID_SUNPLUS=m
CONFIG_HID_RMI=m
CONFIG_HID_GREENASIA=m
# CONFIG_GREENASIA_FF is not set
CONFIG_HID_HYPERV_MOUSE=m
CONFIG_HID_SMARTJOYPLUS=m
# CONFIG_SMARTJOYPLUS_FF is not set
CONFIG_HID_TIVO=m
CONFIG_HID_TOPSEED=m
CONFIG_HID_THINGM=m
CONFIG_HID_THRUSTMASTER=m
# CONFIG_THRUSTMASTER_FF is not set
# CONFIG_HID_UDRAW_PS3 is not set
# CONFIG_HID_U2FZERO is not set
# CONFIG_HID_WACOM is not set
CONFIG_HID_WIIMOTE=m
CONFIG_HID_XINMO=m
CONFIG_HID_ZEROPLUS=m
# CONFIG_ZEROPLUS_FF is not set
CONFIG_HID_ZYDACRON=m
CONFIG_HID_SENSOR_HUB=y
CONFIG_HID_SENSOR_CUSTOM_SENSOR=m
CONFIG_HID_ALPS=m
# CONFIG_HID_MCP2221 is not set
# end of Special HID drivers

#
# USB HID support
#
CONFIG_USB_HID=y
# CONFIG_HID_PID is not set
# CONFIG_USB_HIDDEV is not set
# end of USB HID support

#
# I2C HID support
#
CONFIG_I2C_HID=m
# end of I2C HID support

#
# Intel ISH HID support
#
CONFIG_INTEL_ISH_HID=m
# CONFIG_INTEL_ISH_FIRMWARE_DOWNLOADER is not set
# end of Intel ISH HID support
# end of HID support

CONFIG_USB_OHCI_LITTLE_ENDIAN=y
CONFIG_USB_SUPPORT=y
CONFIG_USB_COMMON=y
# CONFIG_USB_LED_TRIG is not set
# CONFIG_USB_ULPI_BUS is not set
# CONFIG_USB_CONN_GPIO is not set
CONFIG_USB_ARCH_HAS_HCD=y
CONFIG_USB=y
CONFIG_USB_PCI=y
CONFIG_USB_ANNOUNCE_NEW_DEVICES=y

#
# Miscellaneous USB options
#
CONFIG_USB_DEFAULT_PERSIST=y
# CONFIG_USB_DYNAMIC_MINORS is not set
# CONFIG_USB_OTG is not set
# CONFIG_USB_OTG_PRODUCTLIST is not set
CONFIG_USB_LEDS_TRIGGER_USBPORT=y
CONFIG_USB_AUTOSUSPEND_DELAY=2
CONFIG_USB_MON=y

#
# USB Host Controller Drivers
#
# CONFIG_USB_C67X00_HCD is not set
CONFIG_USB_XHCI_HCD=y
# CONFIG_USB_XHCI_DBGCAP is not set
CONFIG_USB_XHCI_PCI=y
# CONFIG_USB_XHCI_PCI_RENESAS is not set
# CONFIG_USB_XHCI_PLATFORM is not set
CONFIG_USB_EHCI_HCD=y
CONFIG_USB_EHCI_ROOT_HUB_TT=y
CONFIG_USB_EHCI_TT_NEWSCHED=y
CONFIG_USB_EHCI_PCI=y
# CONFIG_USB_EHCI_FSL is not set
# CONFIG_USB_EHCI_HCD_PLATFORM is not set
# CONFIG_USB_OXU210HP_HCD is not set
# CONFIG_USB_ISP116X_HCD is not set
# CONFIG_USB_FOTG210_HCD is not set
# CONFIG_USB_MAX3421_HCD is not set
CONFIG_USB_OHCI_HCD=y
CONFIG_USB_OHCI_HCD_PCI=y
# CONFIG_USB_OHCI_HCD_PLATFORM is not set
CONFIG_USB_UHCI_HCD=y
# CONFIG_USB_SL811_HCD is not set
# CONFIG_USB_R8A66597_HCD is not set
# CONFIG_USB_HCD_BCMA is not set
# CONFIG_USB_HCD_TEST_MODE is not set

#
# USB Device Class drivers
#
# CONFIG_USB_ACM is not set
# CONFIG_USB_PRINTER is not set
# CONFIG_USB_WDM is not set
# CONFIG_USB_TMC is not set

#
# NOTE: USB_STORAGE depends on SCSI but BLK_DEV_SD may
# also be needed; see USB_STORAGE Help for more info
#
CONFIG_USB_STORAGE=m
# CONFIG_USB_STORAGE_DEBUG is not set
# CONFIG_USB_STORAGE_REALTEK is not set
# CONFIG_USB_STORAGE_DATAFAB is not set
# CONFIG_USB_STORAGE_FREECOM is not set
# CONFIG_USB_STORAGE_ISD200 is not set
# CONFIG_USB_STORAGE_USBAT is not set
# CONFIG_USB_STORAGE_SDDR09 is not set
# CONFIG_USB_STORAGE_SDDR55 is not set
# CONFIG_USB_STORAGE_JUMPSHOT is not set
# CONFIG_USB_STORAGE_ALAUDA is not set
# CONFIG_USB_STORAGE_ONETOUCH is not set
# CONFIG_USB_STORAGE_KARMA is not set
# CONFIG_USB_STORAGE_CYPRESS_ATACB is not set
# CONFIG_USB_STORAGE_ENE_UB6250 is not set
# CONFIG_USB_UAS is not set

#
# USB Imaging devices
#
# CONFIG_USB_MDC800 is not set
# CONFIG_USB_MICROTEK is not set
# CONFIG_USBIP_CORE is not set
# CONFIG_USB_CDNS3 is not set
# CONFIG_USB_MUSB_HDRC is not set
# CONFIG_USB_DWC3 is not set
# CONFIG_USB_DWC2 is not set
# CONFIG_USB_CHIPIDEA is not set
# CONFIG_USB_ISP1760 is not set

#
# USB port drivers
#
# CONFIG_USB_USS720 is not set
CONFIG_USB_SERIAL=m
CONFIG_USB_SERIAL_GENERIC=y
# CONFIG_USB_SERIAL_SIMPLE is not set
# CONFIG_USB_SERIAL_AIRCABLE is not set
# CONFIG_USB_SERIAL_ARK3116 is not set
# CONFIG_USB_SERIAL_BELKIN is not set
# CONFIG_USB_SERIAL_CH341 is not set
# CONFIG_USB_SERIAL_WHITEHEAT is not set
# CONFIG_USB_SERIAL_DIGI_ACCELEPORT is not set
# CONFIG_USB_SERIAL_CP210X is not set
# CONFIG_USB_SERIAL_CYPRESS_M8 is not set
# CONFIG_USB_SERIAL_EMPEG is not set
# CONFIG_USB_SERIAL_FTDI_SIO is not set
# CONFIG_USB_SERIAL_VISOR is not set
# CONFIG_USB_SERIAL_IPAQ is not set
# CONFIG_USB_SERIAL_IR is not set
# CONFIG_USB_SERIAL_EDGEPORT is not set
# CONFIG_USB_SERIAL_EDGEPORT_TI is not set
# CONFIG_USB_SERIAL_F81232 is not set
# CONFIG_USB_SERIAL_F8153X is not set
# CONFIG_USB_SERIAL_GARMIN is not set
# CONFIG_USB_SERIAL_IPW is not set
# CONFIG_USB_SERIAL_IUU is not set
# CONFIG_USB_SERIAL_KEYSPAN_PDA is not set
# CONFIG_USB_SERIAL_KEYSPAN is not set
# CONFIG_USB_SERIAL_KLSI is not set
# CONFIG_USB_SERIAL_KOBIL_SCT is not set
# CONFIG_USB_SERIAL_MCT_U232 is not set
# CONFIG_USB_SERIAL_METRO is not set
# CONFIG_USB_SERIAL_MOS7720 is not set
# CONFIG_USB_SERIAL_MOS7840 is not set
# CONFIG_USB_SERIAL_MXUPORT is not set
# CONFIG_USB_SERIAL_NAVMAN is not set
# CONFIG_USB_SERIAL_PL2303 is not set
# CONFIG_USB_SERIAL_OTI6858 is not set
# CONFIG_USB_SERIAL_QCAUX is not set
# CONFIG_USB_SERIAL_QUALCOMM is not set
# CONFIG_USB_SERIAL_SPCP8X5 is not set
# CONFIG_USB_SERIAL_SAFE is not set
# CONFIG_USB_SERIAL_SIERRAWIRELESS is not set
# CONFIG_USB_SERIAL_SYMBOL is not set
# CONFIG_USB_SERIAL_TI is not set
# CONFIG_USB_SERIAL_CYBERJACK is not set
# CONFIG_USB_SERIAL_XIRCOM is not set
# CONFIG_USB_SERIAL_OPTION is not set
# CONFIG_USB_SERIAL_OMNINET is not set
# CONFIG_USB_SERIAL_OPTICON is not set
# CONFIG_USB_SERIAL_XSENS_MT is not set
# CONFIG_USB_SERIAL_WISHBONE is not set
# CONFIG_USB_SERIAL_SSU100 is not set
# CONFIG_USB_SERIAL_QT2 is not set
# CONFIG_USB_SERIAL_UPD78F0730 is not set
CONFIG_USB_SERIAL_DEBUG=m

#
# USB Miscellaneous drivers
#
# CONFIG_USB_EMI62 is not set
# CONFIG_USB_EMI26 is not set
# CONFIG_USB_ADUTUX is not set
# CONFIG_USB_SEVSEG is not set
# CONFIG_USB_LEGOTOWER is not set
# CONFIG_USB_LCD is not set
# CONFIG_USB_CYPRESS_CY7C63 is not set
# CONFIG_USB_CYTHERM is not set
# CONFIG_USB_IDMOUSE is not set
# CONFIG_USB_FTDI_ELAN is not set
# CONFIG_USB_APPLEDISPLAY is not set
# CONFIG_APPLE_MFI_FASTCHARGE is not set
# CONFIG_USB_SISUSBVGA is not set
# CONFIG_USB_LD is not set
# CONFIG_USB_TRANCEVIBRATOR is not set
# CONFIG_USB_IOWARRIOR is not set
# CONFIG_USB_TEST is not set
# CONFIG_USB_EHSET_TEST_FIXTURE is not set
# CONFIG_USB_ISIGHTFW is not set
# CONFIG_USB_YUREX is not set
# CONFIG_USB_EZUSB_FX2 is not set
# CONFIG_USB_HUB_USB251XB is not set
# CONFIG_USB_HSIC_USB3503 is not set
# CONFIG_USB_HSIC_USB4604 is not set
# CONFIG_USB_LINK_LAYER_TEST is not set
# CONFIG_USB_CHAOSKEY is not set
# CONFIG_USB_ATM is not set

#
# USB Physical Layer drivers
#
# CONFIG_NOP_USB_XCEIV is not set
# CONFIG_USB_GPIO_VBUS is not set
# CONFIG_USB_ISP1301 is not set
# end of USB Physical Layer drivers

# CONFIG_USB_GADGET is not set
CONFIG_TYPEC=y
# CONFIG_TYPEC_TCPM is not set
CONFIG_TYPEC_UCSI=y
# CONFIG_UCSI_CCG is not set
CONFIG_UCSI_ACPI=y
# CONFIG_TYPEC_TPS6598X is not set

#
# USB Type-C Multiplexer/DeMultiplexer Switch support
#
# CONFIG_TYPEC_MUX_PI3USB30532 is not set
# end of USB Type-C Multiplexer/DeMultiplexer Switch support

#
# USB Type-C Alternate Mode drivers
#
# CONFIG_TYPEC_DP_ALTMODE is not set
# end of USB Type-C Alternate Mode drivers

# CONFIG_USB_ROLE_SWITCH is not set
CONFIG_MMC=m
CONFIG_MMC_BLOCK=m
CONFIG_MMC_BLOCK_MINORS=8
CONFIG_SDIO_UART=m
# CONFIG_MMC_TEST is not set

#
# MMC/SD/SDIO Host Controller Drivers
#
# CONFIG_MMC_DEBUG is not set
CONFIG_MMC_SDHCI=m
CONFIG_MMC_SDHCI_IO_ACCESSORS=y
CONFIG_MMC_SDHCI_PCI=m
CONFIG_MMC_RICOH_MMC=y
CONFIG_MMC_SDHCI_ACPI=m
CONFIG_MMC_SDHCI_PLTFM=m
# CONFIG_MMC_SDHCI_F_SDH30 is not set
# CONFIG_MMC_WBSD is not set
# CONFIG_MMC_TIFM_SD is not set
# CONFIG_MMC_SPI is not set
# CONFIG_MMC_CB710 is not set
# CONFIG_MMC_VIA_SDMMC is not set
# CONFIG_MMC_VUB300 is not set
# CONFIG_MMC_USHC is not set
# CONFIG_MMC_USDHI6ROL0 is not set
# CONFIG_MMC_REALTEK_PCI is not set
CONFIG_MMC_CQHCI=m
# CONFIG_MMC_HSQ is not set
# CONFIG_MMC_TOSHIBA_PCI is not set
# CONFIG_MMC_MTK is not set
# CONFIG_MMC_SDHCI_XENON is not set
# CONFIG_MEMSTICK is not set
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y
# CONFIG_LEDS_CLASS_FLASH is not set
# CONFIG_LEDS_CLASS_MULTICOLOR is not set
# CONFIG_LEDS_BRIGHTNESS_HW_CHANGED is not set

#
# LED drivers
#
# CONFIG_LEDS_APU is not set
CONFIG_LEDS_LM3530=m
# CONFIG_LEDS_LM3532 is not set
# CONFIG_LEDS_LM3642 is not set
# CONFIG_LEDS_PCA9532 is not set
# CONFIG_LEDS_GPIO is not set
CONFIG_LEDS_LP3944=m
# CONFIG_LEDS_LP3952 is not set
CONFIG_LEDS_CLEVO_MAIL=m
# CONFIG_LEDS_PCA955X is not set
# CONFIG_LEDS_PCA963X is not set
# CONFIG_LEDS_DAC124S085 is not set
# CONFIG_LEDS_PWM is not set
# CONFIG_LEDS_BD2802 is not set
CONFIG_LEDS_INTEL_SS4200=m
# CONFIG_LEDS_TCA6507 is not set
# CONFIG_LEDS_TLC591XX is not set
# CONFIG_LEDS_LM355x is not set

#
# LED driver for blink(1) USB RGB LED is under Special HID drivers (HID_THINGM)
#
CONFIG_LEDS_BLINKM=m
CONFIG_LEDS_MLXCPLD=m
# CONFIG_LEDS_MLXREG is not set
# CONFIG_LEDS_USER is not set
# CONFIG_LEDS_NIC78BX is not set
# CONFIG_LEDS_TI_LMU_COMMON is not set

#
# LED Triggers
#
CONFIG_LEDS_TRIGGERS=y
CONFIG_LEDS_TRIGGER_TIMER=m
CONFIG_LEDS_TRIGGER_ONESHOT=m
# CONFIG_LEDS_TRIGGER_DISK is not set
CONFIG_LEDS_TRIGGER_HEARTBEAT=m
CONFIG_LEDS_TRIGGER_BACKLIGHT=m
# CONFIG_LEDS_TRIGGER_CPU is not set
# CONFIG_LEDS_TRIGGER_ACTIVITY is not set
CONFIG_LEDS_TRIGGER_GPIO=m
CONFIG_LEDS_TRIGGER_DEFAULT_ON=m

#
# iptables trigger is under Netfilter config (LED target)
#
CONFIG_LEDS_TRIGGER_TRANSIENT=m
CONFIG_LEDS_TRIGGER_CAMERA=m
# CONFIG_LEDS_TRIGGER_PANIC is not set
# CONFIG_LEDS_TRIGGER_NETDEV is not set
# CONFIG_LEDS_TRIGGER_PATTERN is not set
CONFIG_LEDS_TRIGGER_AUDIO=m
# CONFIG_ACCESSIBILITY is not set
CONFIG_INFINIBAND=m
CONFIG_INFINIBAND_USER_MAD=m
CONFIG_INFINIBAND_USER_ACCESS=m
CONFIG_INFINIBAND_USER_MEM=y
CONFIG_INFINIBAND_ON_DEMAND_PAGING=y
CONFIG_INFINIBAND_ADDR_TRANS=y
CONFIG_INFINIBAND_ADDR_TRANS_CONFIGFS=y
# CONFIG_INFINIBAND_MTHCA is not set
# CONFIG_INFINIBAND_EFA is not set
# CONFIG_INFINIBAND_I40IW is not set
# CONFIG_MLX4_INFINIBAND is not set
# CONFIG_INFINIBAND_OCRDMA is not set
# CONFIG_INFINIBAND_USNIC is not set
# CONFIG_INFINIBAND_BNXT_RE is not set
# CONFIG_INFINIBAND_RDMAVT is not set
CONFIG_RDMA_RXE=m
CONFIG_RDMA_SIW=m
CONFIG_INFINIBAND_IPOIB=m
# CONFIG_INFINIBAND_IPOIB_CM is not set
CONFIG_INFINIBAND_IPOIB_DEBUG=y
# CONFIG_INFINIBAND_IPOIB_DEBUG_DATA is not set
CONFIG_INFINIBAND_SRP=m
CONFIG_INFINIBAND_SRPT=m
# CONFIG_INFINIBAND_ISER is not set
# CONFIG_INFINIBAND_ISERT is not set
# CONFIG_INFINIBAND_RTRS_CLIENT is not set
# CONFIG_INFINIBAND_RTRS_SERVER is not set
# CONFIG_INFINIBAND_OPA_VNIC is not set
CONFIG_EDAC_ATOMIC_SCRUB=y
CONFIG_EDAC_SUPPORT=y
CONFIG_EDAC=y
CONFIG_EDAC_LEGACY_SYSFS=y
# CONFIG_EDAC_DEBUG is not set
CONFIG_EDAC_DECODE_MCE=m
CONFIG_EDAC_GHES=y
CONFIG_EDAC_AMD64=m
# CONFIG_EDAC_AMD64_ERROR_INJECTION is not set
CONFIG_EDAC_E752X=m
CONFIG_EDAC_I82975X=m
CONFIG_EDAC_I3000=m
CONFIG_EDAC_I3200=m
CONFIG_EDAC_IE31200=m
CONFIG_EDAC_X38=m
CONFIG_EDAC_I5400=m
CONFIG_EDAC_I7CORE=m
CONFIG_EDAC_I5000=m
CONFIG_EDAC_I5100=m
CONFIG_EDAC_I7300=m
CONFIG_EDAC_SBRIDGE=m
CONFIG_EDAC_SKX=m
# CONFIG_EDAC_I10NM is not set
CONFIG_EDAC_PND2=m
CONFIG_RTC_LIB=y
CONFIG_RTC_MC146818_LIB=y
CONFIG_RTC_CLASS=y
CONFIG_RTC_HCTOSYS=y
CONFIG_RTC_HCTOSYS_DEVICE="rtc0"
# CONFIG_RTC_SYSTOHC is not set
# CONFIG_RTC_DEBUG is not set
CONFIG_RTC_NVMEM=y

#
# RTC interfaces
#
CONFIG_RTC_INTF_SYSFS=y
CONFIG_RTC_INTF_PROC=y
CONFIG_RTC_INTF_DEV=y
# CONFIG_RTC_INTF_DEV_UIE_EMUL is not set
# CONFIG_RTC_DRV_TEST is not set

#
# I2C RTC drivers
#
# CONFIG_RTC_DRV_ABB5ZES3 is not set
# CONFIG_RTC_DRV_ABEOZ9 is not set
# CONFIG_RTC_DRV_ABX80X is not set
CONFIG_RTC_DRV_DS1307=m
# CONFIG_RTC_DRV_DS1307_CENTURY is not set
CONFIG_RTC_DRV_DS1374=m
# CONFIG_RTC_DRV_DS1374_WDT is not set
CONFIG_RTC_DRV_DS1672=m
CONFIG_RTC_DRV_MAX6900=m
CONFIG_RTC_DRV_RS5C372=m
CONFIG_RTC_DRV_ISL1208=m
CONFIG_RTC_DRV_ISL12022=m
CONFIG_RTC_DRV_X1205=m
CONFIG_RTC_DRV_PCF8523=m
# CONFIG_RTC_DRV_PCF85063 is not set
# CONFIG_RTC_DRV_PCF85363 is not set
CONFIG_RTC_DRV_PCF8563=m
CONFIG_RTC_DRV_PCF8583=m
CONFIG_RTC_DRV_M41T80=m
CONFIG_RTC_DRV_M41T80_WDT=y
CONFIG_RTC_DRV_BQ32K=m
# CONFIG_RTC_DRV_S35390A is not set
CONFIG_RTC_DRV_FM3130=m
# CONFIG_RTC_DRV_RX8010 is not set
CONFIG_RTC_DRV_RX8581=m
CONFIG_RTC_DRV_RX8025=m
CONFIG_RTC_DRV_EM3027=m
# CONFIG_RTC_DRV_RV3028 is not set
# CONFIG_RTC_DRV_RV8803 is not set
# CONFIG_RTC_DRV_SD3078 is not set

#
# SPI RTC drivers
#
# CONFIG_RTC_DRV_M41T93 is not set
# CONFIG_RTC_DRV_M41T94 is not set
# CONFIG_RTC_DRV_DS1302 is not set
# CONFIG_RTC_DRV_DS1305 is not set
# CONFIG_RTC_DRV_DS1343 is not set
# CONFIG_RTC_DRV_DS1347 is not set
# CONFIG_RTC_DRV_DS1390 is not set
# CONFIG_RTC_DRV_MAX6916 is not set
# CONFIG_RTC_DRV_R9701 is not set
CONFIG_RTC_DRV_RX4581=m
# CONFIG_RTC_DRV_RX6110 is not set
# CONFIG_RTC_DRV_RS5C348 is not set
# CONFIG_RTC_DRV_MAX6902 is not set
# CONFIG_RTC_DRV_PCF2123 is not set
# CONFIG_RTC_DRV_MCP795 is not set
CONFIG_RTC_I2C_AND_SPI=y

#
# SPI and I2C RTC drivers
#
CONFIG_RTC_DRV_DS3232=m
CONFIG_RTC_DRV_DS3232_HWMON=y
# CONFIG_RTC_DRV_PCF2127 is not set
CONFIG_RTC_DRV_RV3029C2=m
# CONFIG_RTC_DRV_RV3029_HWMON is not set

#
# Platform RTC drivers
#
CONFIG_RTC_DRV_CMOS=y
CONFIG_RTC_DRV_DS1286=m
CONFIG_RTC_DRV_DS1511=m
CONFIG_RTC_DRV_DS1553=m
# CONFIG_RTC_DRV_DS1685_FAMILY is not set
CONFIG_RTC_DRV_DS1742=m
CONFIG_RTC_DRV_DS2404=m
CONFIG_RTC_DRV_STK17TA8=m
# CONFIG_RTC_DRV_M48T86 is not set
CONFIG_RTC_DRV_M48T35=m
CONFIG_RTC_DRV_M48T59=m
CONFIG_RTC_DRV_MSM6242=m
CONFIG_RTC_DRV_BQ4802=m
CONFIG_RTC_DRV_RP5C01=m
CONFIG_RTC_DRV_V3020=m

#
# on-CPU RTC drivers
#
# CONFIG_RTC_DRV_FTRTC010 is not set

#
# HID Sensor RTC drivers
#
CONFIG_DMADEVICES=y
# CONFIG_DMADEVICES_DEBUG is not set

#
# DMA Devices
#
CONFIG_DMA_ENGINE=y
CONFIG_DMA_VIRTUAL_CHANNELS=y
CONFIG_DMA_ACPI=y
# CONFIG_ALTERA_MSGDMA is not set
CONFIG_INTEL_IDMA64=m
# CONFIG_INTEL_IDXD is not set
CONFIG_INTEL_IOATDMA=m
# CONFIG_PLX_DMA is not set
# CONFIG_XILINX_ZYNQMP_DPDMA is not set
# CONFIG_QCOM_HIDMA_MGMT is not set
# CONFIG_QCOM_HIDMA is not set
CONFIG_DW_DMAC_CORE=y
CONFIG_DW_DMAC=m
CONFIG_DW_DMAC_PCI=y
# CONFIG_DW_EDMA is not set
# CONFIG_DW_EDMA_PCIE is not set
CONFIG_HSU_DMA=y
# CONFIG_SF_PDMA is not set

#
# DMA Clients
#
CONFIG_ASYNC_TX_DMA=y
CONFIG_DMATEST=m
CONFIG_DMA_ENGINE_RAID=y

#
# DMABUF options
#
CONFIG_SYNC_FILE=y
# CONFIG_SW_SYNC is not set
# CONFIG_UDMABUF is not set
# CONFIG_DMABUF_MOVE_NOTIFY is not set
# CONFIG_DMABUF_SELFTESTS is not set
# CONFIG_DMABUF_HEAPS is not set
# end of DMABUF options

CONFIG_DCA=m
# CONFIG_AUXDISPLAY is not set
# CONFIG_PANEL is not set
CONFIG_UIO=m
CONFIG_UIO_CIF=m
CONFIG_UIO_PDRV_GENIRQ=m
# CONFIG_UIO_DMEM_GENIRQ is not set
CONFIG_UIO_AEC=m
CONFIG_UIO_SERCOS3=m
CONFIG_UIO_PCI_GENERIC=m
# CONFIG_UIO_NETX is not set
# CONFIG_UIO_PRUSS is not set
# CONFIG_UIO_MF624 is not set
CONFIG_UIO_HV_GENERIC=m
CONFIG_VFIO_IOMMU_TYPE1=m
CONFIG_VFIO_VIRQFD=m
CONFIG_VFIO=m
CONFIG_VFIO_NOIOMMU=y
CONFIG_VFIO_PCI=m
# CONFIG_VFIO_PCI_VGA is not set
CONFIG_VFIO_PCI_MMAP=y
CONFIG_VFIO_PCI_INTX=y
# CONFIG_VFIO_PCI_IGD is not set
CONFIG_VFIO_MDEV=m
CONFIG_VFIO_MDEV_DEVICE=m
CONFIG_IRQ_BYPASS_MANAGER=m
# CONFIG_VIRT_DRIVERS is not set
CONFIG_VIRTIO=y
CONFIG_VIRTIO_MENU=y
CONFIG_VIRTIO_PCI=y
CONFIG_VIRTIO_PCI_LEGACY=y
# CONFIG_VIRTIO_PMEM is not set
CONFIG_VIRTIO_BALLOON=y
CONFIG_VIRTIO_MEM=m
CONFIG_VIRTIO_INPUT=m
# CONFIG_VIRTIO_MMIO is not set
# CONFIG_VDPA is not set
CONFIG_VHOST_IOTLB=m
CONFIG_VHOST=m
CONFIG_VHOST_MENU=y
CONFIG_VHOST_NET=m
# CONFIG_VHOST_SCSI is not set
CONFIG_VHOST_VSOCK=m
# CONFIG_VHOST_CROSS_ENDIAN_LEGACY is not set

#
# Microsoft Hyper-V guest support
#
CONFIG_HYPERV=m
CONFIG_HYPERV_TIMER=y
CONFIG_HYPERV_UTILS=m
CONFIG_HYPERV_BALLOON=m
# end of Microsoft Hyper-V guest support

#
# Xen driver support
#
# CONFIG_XEN_BALLOON is not set
CONFIG_XEN_DEV_EVTCHN=m
# CONFIG_XEN_BACKEND is not set
CONFIG_XENFS=m
CONFIG_XEN_COMPAT_XENFS=y
CONFIG_XEN_SYS_HYPERVISOR=y
CONFIG_XEN_XENBUS_FRONTEND=y
# CONFIG_XEN_GNTDEV is not set
# CONFIG_XEN_GRANT_DEV_ALLOC is not set
# CONFIG_XEN_GRANT_DMA_ALLOC is not set
CONFIG_SWIOTLB_XEN=y
# CONFIG_XEN_PVCALLS_FRONTEND is not set
CONFIG_XEN_PRIVCMD=m
CONFIG_XEN_EFI=y
CONFIG_XEN_AUTO_XLATE=y
CONFIG_XEN_ACPI=y
# end of Xen driver support

# CONFIG_GREYBUS is not set
# CONFIG_STAGING is not set
CONFIG_X86_PLATFORM_DEVICES=y
CONFIG_ACPI_WMI=m
CONFIG_WMI_BMOF=m
# CONFIG_ALIENWARE_WMI is not set
# CONFIG_HUAWEI_WMI is not set
# CONFIG_INTEL_WMI_SBL_FW_UPDATE is not set
CONFIG_INTEL_WMI_THUNDERBOLT=m
CONFIG_MXM_WMI=m
# CONFIG_PEAQ_WMI is not set
# CONFIG_XIAOMI_WMI is not set
CONFIG_ACERHDF=m
# CONFIG_ACER_WIRELESS is not set
CONFIG_ACER_WMI=m
CONFIG_APPLE_GMUX=m
CONFIG_ASUS_LAPTOP=m
# CONFIG_ASUS_WIRELESS is not set
CONFIG_ASUS_WMI=m
CONFIG_ASUS_NB_WMI=m
CONFIG_EEEPC_LAPTOP=m
CONFIG_EEEPC_WMI=m
CONFIG_DCDBAS=m
CONFIG_DELL_SMBIOS=m
CONFIG_DELL_SMBIOS_WMI=y
# CONFIG_DELL_SMBIOS_SMM is not set
CONFIG_DELL_LAPTOP=m
CONFIG_DELL_RBTN=m
CONFIG_DELL_RBU=m
CONFIG_DELL_SMO8800=m
CONFIG_DELL_WMI=m
CONFIG_DELL_WMI_DESCRIPTOR=m
CONFIG_DELL_WMI_AIO=m
CONFIG_DELL_WMI_LED=m
CONFIG_AMILO_RFKILL=m
CONFIG_FUJITSU_LAPTOP=m
CONFIG_FUJITSU_TABLET=m
# CONFIG_GPD_POCKET_FAN is not set
CONFIG_HP_ACCEL=m
CONFIG_HP_WIRELESS=m
CONFIG_HP_WMI=m
# CONFIG_IBM_RTL is not set
CONFIG_IDEAPAD_LAPTOP=m
CONFIG_SENSORS_HDAPS=m
CONFIG_THINKPAD_ACPI=m
# CONFIG_THINKPAD_ACPI_DEBUGFACILITIES is not set
# CONFIG_THINKPAD_ACPI_DEBUG is not set
# CONFIG_THINKPAD_ACPI_UNSAFE_LEDS is not set
CONFIG_THINKPAD_ACPI_VIDEO=y
CONFIG_THINKPAD_ACPI_HOTKEY_POLL=y
# CONFIG_INTEL_ATOMISP2_PM is not set
CONFIG_INTEL_HID_EVENT=m
# CONFIG_INTEL_INT0002_VGPIO is not set
# CONFIG_INTEL_MENLOW is not set
CONFIG_INTEL_OAKTRAIL=m
CONFIG_INTEL_VBTN=m
# CONFIG_SURFACE3_WMI is not set
# CONFIG_SURFACE_3_POWER_OPREGION is not set
# CONFIG_SURFACE_PRO3_BUTTON is not set
CONFIG_MSI_LAPTOP=m
CONFIG_MSI_WMI=m
# CONFIG_PCENGINES_APU2 is not set
CONFIG_SAMSUNG_LAPTOP=m
CONFIG_SAMSUNG_Q10=m
CONFIG_TOSHIBA_BT_RFKILL=m
# CONFIG_TOSHIBA_HAPS is not set
# CONFIG_TOSHIBA_WMI is not set
CONFIG_ACPI_CMPC=m
CONFIG_COMPAL_LAPTOP=m
# CONFIG_LG_LAPTOP is not set
CONFIG_PANASONIC_LAPTOP=m
CONFIG_SONY_LAPTOP=m
CONFIG_SONYPI_COMPAT=y
# CONFIG_SYSTEM76_ACPI is not set
CONFIG_TOPSTAR_LAPTOP=m
# CONFIG_I2C_MULTI_INSTANTIATE is not set
CONFIG_MLX_PLATFORM=m
CONFIG_INTEL_IPS=m
CONFIG_INTEL_RST=m
# CONFIG_INTEL_SMARTCONNECT is not set

#
# Intel Speed Select Technology interface support
#
# CONFIG_INTEL_SPEED_SELECT_INTERFACE is not set
# end of Intel Speed Select Technology interface support

CONFIG_INTEL_TURBO_MAX_3=y
# CONFIG_INTEL_UNCORE_FREQ_CONTROL is not set
CONFIG_INTEL_PMC_CORE=m
# CONFIG_INTEL_PUNIT_IPC is not set
# CONFIG_INTEL_SCU_PCI is not set
# CONFIG_INTEL_SCU_PLATFORM is not set
CONFIG_PMC_ATOM=y
# CONFIG_MFD_CROS_EC is not set
# CONFIG_CHROME_PLATFORMS is not set
CONFIG_MELLANOX_PLATFORM=y
CONFIG_MLXREG_HOTPLUG=m
# CONFIG_MLXREG_IO is not set
CONFIG_HAVE_CLK=y
CONFIG_CLKDEV_LOOKUP=y
CONFIG_HAVE_CLK_PREPARE=y
CONFIG_COMMON_CLK=y
# CONFIG_COMMON_CLK_MAX9485 is not set
# CONFIG_COMMON_CLK_SI5341 is not set
# CONFIG_COMMON_CLK_SI5351 is not set
# CONFIG_COMMON_CLK_SI544 is not set
# CONFIG_COMMON_CLK_CDCE706 is not set
# CONFIG_COMMON_CLK_CS2000_CP is not set
# CONFIG_COMMON_CLK_PWM is not set
CONFIG_HWSPINLOCK=y

#
# Clock Source drivers
#
CONFIG_CLKEVT_I8253=y
CONFIG_I8253_LOCK=y
CONFIG_CLKBLD_I8253=y
# end of Clock Source drivers

CONFIG_MAILBOX=y
CONFIG_PCC=y
# CONFIG_ALTERA_MBOX is not set
CONFIG_IOMMU_IOVA=y
CONFIG_IOASID=y
CONFIG_IOMMU_API=y
CONFIG_IOMMU_SUPPORT=y

#
# Generic IOMMU Pagetable Support
#
# end of Generic IOMMU Pagetable Support

# CONFIG_IOMMU_DEBUGFS is not set
# CONFIG_IOMMU_DEFAULT_PASSTHROUGH is not set
CONFIG_IOMMU_DMA=y
CONFIG_AMD_IOMMU=y
CONFIG_AMD_IOMMU_V2=m
CONFIG_DMAR_TABLE=y
CONFIG_INTEL_IOMMU=y
# CONFIG_INTEL_IOMMU_SVM is not set
# CONFIG_INTEL_IOMMU_DEFAULT_ON is not set
CONFIG_INTEL_IOMMU_FLOPPY_WA=y
# CONFIG_INTEL_IOMMU_SCALABLE_MODE_DEFAULT_ON is not set
CONFIG_IRQ_REMAP=y
CONFIG_HYPERV_IOMMU=y

#
# Remoteproc drivers
#
# CONFIG_REMOTEPROC is not set
# end of Remoteproc drivers

#
# Rpmsg drivers
#
# CONFIG_RPMSG_QCOM_GLINK_RPM is not set
# CONFIG_RPMSG_VIRTIO is not set
# end of Rpmsg drivers

# CONFIG_SOUNDWIRE is not set

#
# SOC (System On Chip) specific Drivers
#

#
# Amlogic SoC drivers
#
# end of Amlogic SoC drivers

#
# Aspeed SoC drivers
#
# end of Aspeed SoC drivers

#
# Broadcom SoC drivers
#
# end of Broadcom SoC drivers

#
# NXP/Freescale QorIQ SoC drivers
#
# end of NXP/Freescale QorIQ SoC drivers

#
# i.MX SoC drivers
#
# end of i.MX SoC drivers

#
# Qualcomm SoC drivers
#
# end of Qualcomm SoC drivers

# CONFIG_SOC_TI is not set

#
# Xilinx SoC drivers
#
# CONFIG_XILINX_VCU is not set
# end of Xilinx SoC drivers
# end of SOC (System On Chip) specific Drivers

# CONFIG_PM_DEVFREQ is not set
# CONFIG_EXTCON is not set
# CONFIG_MEMORY is not set
# CONFIG_IIO is not set
CONFIG_NTB=m
# CONFIG_NTB_MSI is not set
# CONFIG_NTB_AMD is not set
# CONFIG_NTB_IDT is not set
# CONFIG_NTB_INTEL is not set
# CONFIG_NTB_SWITCHTEC is not set
# CONFIG_NTB_PINGPONG is not set
# CONFIG_NTB_TOOL is not set
# CONFIG_NTB_PERF is not set
# CONFIG_NTB_TRANSPORT is not set
# CONFIG_VME_BUS is not set
CONFIG_PWM=y
CONFIG_PWM_SYSFS=y
# CONFIG_PWM_DEBUG is not set
CONFIG_PWM_LPSS=m
CONFIG_PWM_LPSS_PCI=m
CONFIG_PWM_LPSS_PLATFORM=m
# CONFIG_PWM_PCA9685 is not set

#
# IRQ chip support
#
# end of IRQ chip support

# CONFIG_IPACK_BUS is not set
# CONFIG_RESET_CONTROLLER is not set

#
# PHY Subsystem
#
# CONFIG_GENERIC_PHY is not set
# CONFIG_BCM_KONA_USB2_PHY is not set
# CONFIG_PHY_PXA_28NM_HSIC is not set
# CONFIG_PHY_PXA_28NM_USB2 is not set
# CONFIG_PHY_INTEL_EMMC is not set
# end of PHY Subsystem

CONFIG_POWERCAP=y
CONFIG_INTEL_RAPL_CORE=m
CONFIG_INTEL_RAPL=m
# CONFIG_IDLE_INJECT is not set
# CONFIG_MCB is not set

#
# Performance monitor support
#
# end of Performance monitor support

CONFIG_RAS=y
# CONFIG_RAS_CEC is not set
# CONFIG_USB4 is not set

#
# Android
#
# CONFIG_ANDROID is not set
# end of Android

CONFIG_LIBNVDIMM=m
CONFIG_BLK_DEV_PMEM=m
CONFIG_ND_BLK=m
CONFIG_ND_CLAIM=y
CONFIG_ND_BTT=m
CONFIG_BTT=y
CONFIG_ND_PFN=m
CONFIG_NVDIMM_PFN=y
CONFIG_NVDIMM_DAX=y
CONFIG_NVDIMM_KEYS=y
CONFIG_DAX_DRIVER=y
CONFIG_DAX=y
CONFIG_DEV_DAX=m
CONFIG_DEV_DAX_PMEM=m
CONFIG_DEV_DAX_KMEM=m
CONFIG_DEV_DAX_PMEM_COMPAT=m
CONFIG_NVMEM=y
CONFIG_NVMEM_SYSFS=y

#
# HW tracing support
#
CONFIG_STM=m
# CONFIG_STM_PROTO_BASIC is not set
# CONFIG_STM_PROTO_SYS_T is not set
CONFIG_STM_DUMMY=m
CONFIG_STM_SOURCE_CONSOLE=m
CONFIG_STM_SOURCE_HEARTBEAT=m
CONFIG_STM_SOURCE_FTRACE=m
CONFIG_INTEL_TH=m
CONFIG_INTEL_TH_PCI=m
CONFIG_INTEL_TH_ACPI=m
CONFIG_INTEL_TH_GTH=m
CONFIG_INTEL_TH_STH=m
CONFIG_INTEL_TH_MSU=m
CONFIG_INTEL_TH_PTI=m
# CONFIG_INTEL_TH_DEBUG is not set
# end of HW tracing support

# CONFIG_FPGA is not set
# CONFIG_TEE is not set
# CONFIG_UNISYS_VISORBUS is not set
# CONFIG_SIOX is not set
# CONFIG_SLIMBUS is not set
# CONFIG_INTERCONNECT is not set
# CONFIG_COUNTER is not set
# CONFIG_MOST is not set
# end of Device Drivers

#
# File systems
#
CONFIG_DCACHE_WORD_ACCESS=y
# CONFIG_VALIDATE_FS_PARSER is not set
CONFIG_FS_IOMAP=y
# CONFIG_EXT2_FS is not set
# CONFIG_EXT3_FS is not set
CONFIG_EXT4_FS=y
CONFIG_EXT4_USE_FOR_EXT2=y
CONFIG_EXT4_FS_POSIX_ACL=y
CONFIG_EXT4_FS_SECURITY=y
# CONFIG_EXT4_DEBUG is not set
CONFIG_EXT4_KUNIT_TESTS=m
CONFIG_JBD2=y
# CONFIG_JBD2_DEBUG is not set
CONFIG_FS_MBCACHE=y
# CONFIG_REISERFS_FS is not set
# CONFIG_JFS_FS is not set
CONFIG_XFS_FS=m
CONFIG_XFS_QUOTA=y
CONFIG_XFS_POSIX_ACL=y
CONFIG_XFS_RT=y
CONFIG_XFS_ONLINE_SCRUB=y
CONFIG_XFS_ONLINE_REPAIR=y
CONFIG_XFS_DEBUG=y
CONFIG_XFS_ASSERT_FATAL=y
CONFIG_GFS2_FS=m
CONFIG_GFS2_FS_LOCKING_DLM=y
CONFIG_OCFS2_FS=m
CONFIG_OCFS2_FS_O2CB=m
CONFIG_OCFS2_FS_USERSPACE_CLUSTER=m
CONFIG_OCFS2_FS_STATS=y
CONFIG_OCFS2_DEBUG_MASKLOG=y
# CONFIG_OCFS2_DEBUG_FS is not set
CONFIG_BTRFS_FS=m
CONFIG_BTRFS_FS_POSIX_ACL=y
# CONFIG_BTRFS_FS_CHECK_INTEGRITY is not set
# CONFIG_BTRFS_FS_RUN_SANITY_TESTS is not set
# CONFIG_BTRFS_DEBUG is not set
# CONFIG_BTRFS_ASSERT is not set
# CONFIG_BTRFS_FS_REF_VERIFY is not set
# CONFIG_NILFS2_FS is not set
CONFIG_F2FS_FS=m
CONFIG_F2FS_STAT_FS=y
CONFIG_F2FS_FS_XATTR=y
CONFIG_F2FS_FS_POSIX_ACL=y
CONFIG_F2FS_FS_SECURITY=y
# CONFIG_F2FS_CHECK_FS is not set
# CONFIG_F2FS_IO_TRACE is not set
# CONFIG_F2FS_FAULT_INJECTION is not set
# CONFIG_F2FS_FS_COMPRESSION is not set
# CONFIG_ZONEFS_FS is not set
CONFIG_FS_DAX=y
CONFIG_FS_DAX_PMD=y
CONFIG_FS_POSIX_ACL=y
CONFIG_EXPORTFS=y
CONFIG_EXPORTFS_BLOCK_OPS=y
CONFIG_FILE_LOCKING=y
CONFIG_MANDATORY_FILE_LOCKING=y
CONFIG_FS_ENCRYPTION=y
CONFIG_FS_ENCRYPTION_ALGS=y
# CONFIG_FS_VERITY is not set
CONFIG_FSNOTIFY=y
CONFIG_DNOTIFY=y
CONFIG_INOTIFY_USER=y
CONFIG_FANOTIFY=y
CONFIG_FANOTIFY_ACCESS_PERMISSIONS=y
CONFIG_QUOTA=y
CONFIG_QUOTA_NETLINK_INTERFACE=y
CONFIG_PRINT_QUOTA_WARNING=y
# CONFIG_QUOTA_DEBUG is not set
CONFIG_QUOTA_TREE=y
# CONFIG_QFMT_V1 is not set
CONFIG_QFMT_V2=y
CONFIG_QUOTACTL=y
CONFIG_QUOTACTL_COMPAT=y
CONFIG_AUTOFS4_FS=y
CONFIG_AUTOFS_FS=y
CONFIG_FUSE_FS=m
CONFIG_CUSE=m
# CONFIG_VIRTIO_FS is not set
CONFIG_OVERLAY_FS=m
# CONFIG_OVERLAY_FS_REDIRECT_DIR is not set
# CONFIG_OVERLAY_FS_REDIRECT_ALWAYS_FOLLOW is not set
# CONFIG_OVERLAY_FS_INDEX is not set
# CONFIG_OVERLAY_FS_XINO_AUTO is not set
# CONFIG_OVERLAY_FS_METACOPY is not set

#
# Caches
#
CONFIG_FSCACHE=m
CONFIG_FSCACHE_STATS=y
# CONFIG_FSCACHE_HISTOGRAM is not set
# CONFIG_FSCACHE_DEBUG is not set
# CONFIG_FSCACHE_OBJECT_LIST is not set
CONFIG_CACHEFILES=m
# CONFIG_CACHEFILES_DEBUG is not set
# CONFIG_CACHEFILES_HISTOGRAM is not set
# end of Caches

#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=m
CONFIG_JOLIET=y
CONFIG_ZISOFS=y
CONFIG_UDF_FS=m
# end of CD-ROM/DVD Filesystems

#
# DOS/FAT/EXFAT/NT Filesystems
#
CONFIG_FAT_FS=m
CONFIG_MSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_FAT_DEFAULT_CODEPAGE=437
CONFIG_FAT_DEFAULT_IOCHARSET="ascii"
# CONFIG_FAT_DEFAULT_UTF8 is not set
# CONFIG_EXFAT_FS is not set
# CONFIG_NTFS_FS is not set
# end of DOS/FAT/EXFAT/NT Filesystems

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_PROC_VMCORE=y
CONFIG_PROC_VMCORE_DEVICE_DUMP=y
CONFIG_PROC_SYSCTL=y
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_PROC_CHILDREN=y
CONFIG_PROC_PID_ARCH_STATUS=y
CONFIG_PROC_CPU_RESCTRL=y
CONFIG_KERNFS=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y
CONFIG_TMPFS_XATTR=y
# CONFIG_TMPFS_INODE64 is not set
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y
CONFIG_MEMFD_CREATE=y
CONFIG_ARCH_HAS_GIGANTIC_PAGE=y
CONFIG_CONFIGFS_FS=y
CONFIG_EFIVAR_FS=y
# end of Pseudo filesystems

CONFIG_MISC_FILESYSTEMS=y
# CONFIG_ORANGEFS_FS is not set
# CONFIG_ADFS_FS is not set
# CONFIG_AFFS_FS is not set
# CONFIG_ECRYPT_FS is not set
# CONFIG_HFS_FS is not set
# CONFIG_HFSPLUS_FS is not set
# CONFIG_BEFS_FS is not set
# CONFIG_BFS_FS is not set
# CONFIG_EFS_FS is not set
CONFIG_CRAMFS=m
CONFIG_CRAMFS_BLOCKDEV=y
CONFIG_SQUASHFS=m
# CONFIG_SQUASHFS_FILE_CACHE is not set
CONFIG_SQUASHFS_FILE_DIRECT=y
# CONFIG_SQUASHFS_DECOMP_SINGLE is not set
# CONFIG_SQUASHFS_DECOMP_MULTI is not set
CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU=y
CONFIG_SQUASHFS_XATTR=y
CONFIG_SQUASHFS_ZLIB=y
# CONFIG_SQUASHFS_LZ4 is not set
CONFIG_SQUASHFS_LZO=y
CONFIG_SQUASHFS_XZ=y
# CONFIG_SQUASHFS_ZSTD is not set
# CONFIG_SQUASHFS_4K_DEVBLK_SIZE is not set
# CONFIG_SQUASHFS_EMBEDDED is not set
CONFIG_SQUASHFS_FRAGMENT_CACHE_SIZE=3
# CONFIG_VXFS_FS is not set
CONFIG_MINIX_FS=m
# CONFIG_OMFS_FS is not set
# CONFIG_HPFS_FS is not set
# CONFIG_QNX4FS_FS is not set
# CONFIG_QNX6FS_FS is not set
# CONFIG_ROMFS_FS is not set
CONFIG_PSTORE=y
CONFIG_PSTORE_DEFLATE_COMPRESS=y
# CONFIG_PSTORE_LZO_COMPRESS is not set
# CONFIG_PSTORE_LZ4_COMPRESS is not set
# CONFIG_PSTORE_LZ4HC_COMPRESS is not set
# CONFIG_PSTORE_842_COMPRESS is not set
# CONFIG_PSTORE_ZSTD_COMPRESS is not set
CONFIG_PSTORE_COMPRESS=y
CONFIG_PSTORE_DEFLATE_COMPRESS_DEFAULT=y
CONFIG_PSTORE_COMPRESS_DEFAULT="deflate"
# CONFIG_PSTORE_CONSOLE is not set
# CONFIG_PSTORE_PMSG is not set
# CONFIG_PSTORE_FTRACE is not set
CONFIG_PSTORE_RAM=m
# CONFIG_PSTORE_BLK is not set
# CONFIG_SYSV_FS is not set
# CONFIG_UFS_FS is not set
# CONFIG_EROFS_FS is not set
CONFIG_NETWORK_FILESYSTEMS=y
CONFIG_NFS_FS=y
# CONFIG_NFS_V2 is not set
CONFIG_NFS_V3=y
CONFIG_NFS_V3_ACL=y
CONFIG_NFS_V4=m
# CONFIG_NFS_SWAP is not set
CONFIG_NFS_V4_1=y
CONFIG_NFS_V4_2=y
CONFIG_PNFS_FILE_LAYOUT=m
CONFIG_PNFS_BLOCK=m
CONFIG_PNFS_FLEXFILE_LAYOUT=m
CONFIG_NFS_V4_1_IMPLEMENTATION_ID_DOMAIN="kernel.org"
# CONFIG_NFS_V4_1_MIGRATION is not set
CONFIG_NFS_V4_SECURITY_LABEL=y
CONFIG_ROOT_NFS=y
# CONFIG_NFS_USE_LEGACY_DNS is not set
CONFIG_NFS_USE_KERNEL_DNS=y
CONFIG_NFS_DEBUG=y
CONFIG_NFS_DISABLE_UDP_SUPPORT=y
CONFIG_NFSD=m
CONFIG_NFSD_V2_ACL=y
CONFIG_NFSD_V3=y
CONFIG_NFSD_V3_ACL=y
CONFIG_NFSD_V4=y
CONFIG_NFSD_PNFS=y
# CONFIG_NFSD_BLOCKLAYOUT is not set
CONFIG_NFSD_SCSILAYOUT=y
# CONFIG_NFSD_FLEXFILELAYOUT is not set
# CONFIG_NFSD_V4_2_INTER_SSC is not set
CONFIG_NFSD_V4_SECURITY_LABEL=y
CONFIG_GRACE_PERIOD=y
CONFIG_LOCKD=y
CONFIG_LOCKD_V4=y
CONFIG_NFS_ACL_SUPPORT=y
CONFIG_NFS_COMMON=y
CONFIG_SUNRPC=y
CONFIG_SUNRPC_GSS=m
CONFIG_SUNRPC_BACKCHANNEL=y
CONFIG_RPCSEC_GSS_KRB5=m
# CONFIG_SUNRPC_DISABLE_INSECURE_ENCTYPES is not set
CONFIG_SUNRPC_DEBUG=y
CONFIG_SUNRPC_XPRT_RDMA=m
CONFIG_CEPH_FS=m
# CONFIG_CEPH_FSCACHE is not set
CONFIG_CEPH_FS_POSIX_ACL=y
# CONFIG_CEPH_FS_SECURITY_LABEL is not set
CONFIG_CIFS=m
# CONFIG_CIFS_STATS2 is not set
CONFIG_CIFS_ALLOW_INSECURE_LEGACY=y
CONFIG_CIFS_WEAK_PW_HASH=y
CONFIG_CIFS_UPCALL=y
CONFIG_CIFS_XATTR=y
CONFIG_CIFS_POSIX=y
CONFIG_CIFS_DEBUG=y
# CONFIG_CIFS_DEBUG2 is not set
# CONFIG_CIFS_DEBUG_DUMP_KEYS is not set
CONFIG_CIFS_DFS_UPCALL=y
# CONFIG_CIFS_SMB_DIRECT is not set
# CONFIG_CIFS_FSCACHE is not set
# CONFIG_CODA_FS is not set
# CONFIG_AFS_FS is not set
CONFIG_9P_FS=y
CONFIG_9P_FS_POSIX_ACL=y
# CONFIG_9P_FS_SECURITY is not set
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="utf8"
CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_CODEPAGE_737=m
CONFIG_NLS_CODEPAGE_775=m
CONFIG_NLS_CODEPAGE_850=m
CONFIG_NLS_CODEPAGE_852=m
CONFIG_NLS_CODEPAGE_855=m
CONFIG_NLS_CODEPAGE_857=m
CONFIG_NLS_CODEPAGE_860=m
CONFIG_NLS_CODEPAGE_861=m
CONFIG_NLS_CODEPAGE_862=m
CONFIG_NLS_CODEPAGE_863=m
CONFIG_NLS_CODEPAGE_864=m
CONFIG_NLS_CODEPAGE_865=m
CONFIG_NLS_CODEPAGE_866=m
CONFIG_NLS_CODEPAGE_869=m
CONFIG_NLS_CODEPAGE_936=m
CONFIG_NLS_CODEPAGE_950=m
CONFIG_NLS_CODEPAGE_932=m
CONFIG_NLS_CODEPAGE_949=m
CONFIG_NLS_CODEPAGE_874=m
CONFIG_NLS_ISO8859_8=m
CONFIG_NLS_CODEPAGE_1250=m
CONFIG_NLS_CODEPAGE_1251=m
CONFIG_NLS_ASCII=y
CONFIG_NLS_ISO8859_1=m
CONFIG_NLS_ISO8859_2=m
CONFIG_NLS_ISO8859_3=m
CONFIG_NLS_ISO8859_4=m
CONFIG_NLS_ISO8859_5=m
CONFIG_NLS_ISO8859_6=m
CONFIG_NLS_ISO8859_7=m
CONFIG_NLS_ISO8859_9=m
CONFIG_NLS_ISO8859_13=m
CONFIG_NLS_ISO8859_14=m
CONFIG_NLS_ISO8859_15=m
CONFIG_NLS_KOI8_R=m
CONFIG_NLS_KOI8_U=m
CONFIG_NLS_MAC_ROMAN=m
CONFIG_NLS_MAC_CELTIC=m
CONFIG_NLS_MAC_CENTEURO=m
CONFIG_NLS_MAC_CROATIAN=m
CONFIG_NLS_MAC_CYRILLIC=m
CONFIG_NLS_MAC_GAELIC=m
CONFIG_NLS_MAC_GREEK=m
CONFIG_NLS_MAC_ICELAND=m
CONFIG_NLS_MAC_INUIT=m
CONFIG_NLS_MAC_ROMANIAN=m
CONFIG_NLS_MAC_TURKISH=m
CONFIG_NLS_UTF8=m
CONFIG_DLM=m
CONFIG_DLM_DEBUG=y
# CONFIG_UNICODE is not set
CONFIG_IO_WQ=y
# end of File systems

#
# Security options
#
CONFIG_KEYS=y
# CONFIG_KEYS_REQUEST_CACHE is not set
CONFIG_PERSISTENT_KEYRINGS=y
CONFIG_TRUSTED_KEYS=y
CONFIG_ENCRYPTED_KEYS=y
# CONFIG_KEY_DH_OPERATIONS is not set
# CONFIG_SECURITY_DMESG_RESTRICT is not set
CONFIG_SECURITY=y
CONFIG_SECURITY_WRITABLE_HOOKS=y
CONFIG_SECURITYFS=y
CONFIG_SECURITY_NETWORK=y
CONFIG_PAGE_TABLE_ISOLATION=y
# CONFIG_SECURITY_INFINIBAND is not set
CONFIG_SECURITY_NETWORK_XFRM=y
CONFIG_SECURITY_PATH=y
CONFIG_INTEL_TXT=y
CONFIG_LSM_MMAP_MIN_ADDR=65535
CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR=y
CONFIG_HARDENED_USERCOPY=y
CONFIG_HARDENED_USERCOPY_FALLBACK=y
CONFIG_FORTIFY_SOURCE=y
# CONFIG_STATIC_USERMODEHELPER is not set
CONFIG_SECURITY_SELINUX=y
CONFIG_SECURITY_SELINUX_BOOTPARAM=y
CONFIG_SECURITY_SELINUX_DISABLE=y
CONFIG_SECURITY_SELINUX_DEVELOP=y
CONFIG_SECURITY_SELINUX_AVC_STATS=y
CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE=1
CONFIG_SECURITY_SELINUX_SIDTAB_HASH_BITS=9
CONFIG_SECURITY_SELINUX_SID2STR_CACHE_SIZE=256
# CONFIG_SECURITY_SMACK is not set
# CONFIG_SECURITY_TOMOYO is not set
CONFIG_SECURITY_APPARMOR=y
CONFIG_SECURITY_APPARMOR_HASH=y
CONFIG_SECURITY_APPARMOR_HASH_DEFAULT=y
# CONFIG_SECURITY_APPARMOR_DEBUG is not set
# CONFIG_SECURITY_APPARMOR_KUNIT_TEST is not set
# CONFIG_SECURITY_LOADPIN is not set
CONFIG_SECURITY_YAMA=y
# CONFIG_SECURITY_SAFESETID is not set
# CONFIG_SECURITY_LOCKDOWN_LSM is not set
CONFIG_INTEGRITY=y
CONFIG_INTEGRITY_SIGNATURE=y
CONFIG_INTEGRITY_ASYMMETRIC_KEYS=y
CONFIG_INTEGRITY_TRUSTED_KEYRING=y
# CONFIG_INTEGRITY_PLATFORM_KEYRING is not set
CONFIG_INTEGRITY_AUDIT=y
CONFIG_IMA=y
CONFIG_IMA_MEASURE_PCR_IDX=10
CONFIG_IMA_LSM_RULES=y
# CONFIG_IMA_TEMPLATE is not set
CONFIG_IMA_NG_TEMPLATE=y
# CONFIG_IMA_SIG_TEMPLATE is not set
CONFIG_IMA_DEFAULT_TEMPLATE="ima-ng"
CONFIG_IMA_DEFAULT_HASH_SHA1=y
# CONFIG_IMA_DEFAULT_HASH_SHA256 is not set
# CONFIG_IMA_DEFAULT_HASH_SHA512 is not set
CONFIG_IMA_DEFAULT_HASH="sha1"
# CONFIG_IMA_WRITE_POLICY is not set
# CONFIG_IMA_READ_POLICY is not set
CONFIG_IMA_APPRAISE=y
# CONFIG_IMA_ARCH_POLICY is not set
# CONFIG_IMA_APPRAISE_BUILD_POLICY is not set
CONFIG_IMA_APPRAISE_BOOTPARAM=y
# CONFIG_IMA_APPRAISE_MODSIG is not set
CONFIG_IMA_TRUSTED_KEYRING=y
# CONFIG_IMA_BLACKLIST_KEYRING is not set
# CONFIG_IMA_LOAD_X509 is not set
CONFIG_IMA_MEASURE_ASYMMETRIC_KEYS=y
CONFIG_IMA_QUEUE_EARLY_BOOT_KEYS=y
# CONFIG_IMA_SECURE_AND_OR_TRUSTED_BOOT is not set
CONFIG_EVM=y
CONFIG_EVM_ATTR_FSUUID=y
# CONFIG_EVM_ADD_XATTRS is not set
# CONFIG_EVM_LOAD_X509 is not set
CONFIG_DEFAULT_SECURITY_SELINUX=y
# CONFIG_DEFAULT_SECURITY_APPARMOR is not set
# CONFIG_DEFAULT_SECURITY_DAC is not set
CONFIG_LSM="lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor,bpf"

#
# Kernel hardening options
#

#
# Memory initialization
#
CONFIG_INIT_STACK_NONE=y
# CONFIG_INIT_ON_ALLOC_DEFAULT_ON is not set
# CONFIG_INIT_ON_FREE_DEFAULT_ON is not set
# end of Memory initialization
# end of Kernel hardening options
# end of Security options

CONFIG_XOR_BLOCKS=m
CONFIG_ASYNC_CORE=m
CONFIG_ASYNC_MEMCPY=m
CONFIG_ASYNC_XOR=m
CONFIG_ASYNC_PQ=m
CONFIG_ASYNC_RAID6_RECOV=m
CONFIG_CRYPTO=y

#
# Crypto core or helper
#
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_ALGAPI2=y
CONFIG_CRYPTO_AEAD=y
CONFIG_CRYPTO_AEAD2=y
CONFIG_CRYPTO_SKCIPHER=y
CONFIG_CRYPTO_SKCIPHER2=y
CONFIG_CRYPTO_HASH=y
CONFIG_CRYPTO_HASH2=y
CONFIG_CRYPTO_RNG=y
CONFIG_CRYPTO_RNG2=y
CONFIG_CRYPTO_RNG_DEFAULT=y
CONFIG_CRYPTO_AKCIPHER2=y
CONFIG_CRYPTO_AKCIPHER=y
CONFIG_CRYPTO_KPP2=y
CONFIG_CRYPTO_KPP=m
CONFIG_CRYPTO_ACOMP2=y
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_MANAGER2=y
CONFIG_CRYPTO_USER=m
CONFIG_CRYPTO_MANAGER_DISABLE_TESTS=y
CONFIG_CRYPTO_GF128MUL=y
CONFIG_CRYPTO_NULL=y
CONFIG_CRYPTO_NULL2=y
CONFIG_CRYPTO_PCRYPT=m
CONFIG_CRYPTO_CRYPTD=y
CONFIG_CRYPTO_AUTHENC=m
CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_SIMD=y
CONFIG_CRYPTO_GLUE_HELPER_X86=y

#
# Public-key cryptography
#
CONFIG_CRYPTO_RSA=y
CONFIG_CRYPTO_DH=m
CONFIG_CRYPTO_ECC=m
CONFIG_CRYPTO_ECDH=m
# CONFIG_CRYPTO_ECRDSA is not set
# CONFIG_CRYPTO_CURVE25519 is not set
# CONFIG_CRYPTO_CURVE25519_X86 is not set

#
# Authenticated Encryption with Associated Data
#
CONFIG_CRYPTO_CCM=m
CONFIG_CRYPTO_GCM=y
CONFIG_CRYPTO_CHACHA20POLY1305=m
# CONFIG_CRYPTO_AEGIS128 is not set
# CONFIG_CRYPTO_AEGIS128_AESNI_SSE2 is not set
CONFIG_CRYPTO_SEQIV=y
CONFIG_CRYPTO_ECHAINIV=m

#
# Block modes
#
CONFIG_CRYPTO_CBC=y
CONFIG_CRYPTO_CFB=y
CONFIG_CRYPTO_CTR=y
CONFIG_CRYPTO_CTS=y
CONFIG_CRYPTO_ECB=y
CONFIG_CRYPTO_LRW=m
# CONFIG_CRYPTO_OFB is not set
CONFIG_CRYPTO_PCBC=m
CONFIG_CRYPTO_XTS=y
# CONFIG_CRYPTO_KEYWRAP is not set
# CONFIG_CRYPTO_NHPOLY1305_SSE2 is not set
# CONFIG_CRYPTO_NHPOLY1305_AVX2 is not set
# CONFIG_CRYPTO_ADIANTUM is not set
CONFIG_CRYPTO_ESSIV=m

#
# Hash modes
#
CONFIG_CRYPTO_CMAC=m
CONFIG_CRYPTO_HMAC=y
CONFIG_CRYPTO_XCBC=m
CONFIG_CRYPTO_VMAC=m

#
# Digest
#
CONFIG_CRYPTO_CRC32C=y
CONFIG_CRYPTO_CRC32C_INTEL=m
CONFIG_CRYPTO_CRC32=m
CONFIG_CRYPTO_CRC32_PCLMUL=m
CONFIG_CRYPTO_XXHASH=m
CONFIG_CRYPTO_BLAKE2B=m
# CONFIG_CRYPTO_BLAKE2S is not set
# CONFIG_CRYPTO_BLAKE2S_X86 is not set
CONFIG_CRYPTO_CRCT10DIF=y
CONFIG_CRYPTO_CRCT10DIF_PCLMUL=m
CONFIG_CRYPTO_GHASH=y
CONFIG_CRYPTO_POLY1305=m
CONFIG_CRYPTO_POLY1305_X86_64=m
CONFIG_CRYPTO_MD4=m
CONFIG_CRYPTO_MD5=y
CONFIG_CRYPTO_MICHAEL_MIC=m
CONFIG_CRYPTO_RMD128=m
CONFIG_CRYPTO_RMD160=m
CONFIG_CRYPTO_RMD256=m
CONFIG_CRYPTO_RMD320=m
CONFIG_CRYPTO_SHA1=y
CONFIG_CRYPTO_SHA1_SSSE3=y
CONFIG_CRYPTO_SHA256_SSSE3=y
CONFIG_CRYPTO_SHA512_SSSE3=m
CONFIG_CRYPTO_SHA256=y
CONFIG_CRYPTO_SHA512=y
CONFIG_CRYPTO_SHA3=m
# CONFIG_CRYPTO_SM3 is not set
# CONFIG_CRYPTO_STREEBOG is not set
CONFIG_CRYPTO_TGR192=m
CONFIG_CRYPTO_WP512=m
CONFIG_CRYPTO_GHASH_CLMUL_NI_INTEL=m

#
# Ciphers
#
CONFIG_CRYPTO_AES=y
# CONFIG_CRYPTO_AES_TI is not set
CONFIG_CRYPTO_AES_NI_INTEL=y
CONFIG_CRYPTO_ANUBIS=m
CONFIG_CRYPTO_ARC4=m
CONFIG_CRYPTO_BLOWFISH=m
CONFIG_CRYPTO_BLOWFISH_COMMON=m
CONFIG_CRYPTO_BLOWFISH_X86_64=m
CONFIG_CRYPTO_CAMELLIA=m
CONFIG_CRYPTO_CAMELLIA_X86_64=m
CONFIG_CRYPTO_CAMELLIA_AESNI_AVX_X86_64=m
CONFIG_CRYPTO_CAMELLIA_AESNI_AVX2_X86_64=m
CONFIG_CRYPTO_CAST_COMMON=m
CONFIG_CRYPTO_CAST5=m
CONFIG_CRYPTO_CAST5_AVX_X86_64=m
CONFIG_CRYPTO_CAST6=m
CONFIG_CRYPTO_CAST6_AVX_X86_64=m
CONFIG_CRYPTO_DES=m
CONFIG_CRYPTO_DES3_EDE_X86_64=m
CONFIG_CRYPTO_FCRYPT=m
CONFIG_CRYPTO_KHAZAD=m
CONFIG_CRYPTO_SALSA20=m
CONFIG_CRYPTO_CHACHA20=m
CONFIG_CRYPTO_CHACHA20_X86_64=m
CONFIG_CRYPTO_SEED=m
CONFIG_CRYPTO_SERPENT=m
CONFIG_CRYPTO_SERPENT_SSE2_X86_64=m
CONFIG_CRYPTO_SERPENT_AVX_X86_64=m
CONFIG_CRYPTO_SERPENT_AVX2_X86_64=m
# CONFIG_CRYPTO_SM4 is not set
CONFIG_CRYPTO_TEA=m
CONFIG_CRYPTO_TWOFISH=m
CONFIG_CRYPTO_TWOFISH_COMMON=m
CONFIG_CRYPTO_TWOFISH_X86_64=m
CONFIG_CRYPTO_TWOFISH_X86_64_3WAY=m
CONFIG_CRYPTO_TWOFISH_AVX_X86_64=m

#
# Compression
#
CONFIG_CRYPTO_DEFLATE=y
CONFIG_CRYPTO_LZO=y
# CONFIG_CRYPTO_842 is not set
# CONFIG_CRYPTO_LZ4 is not set
# CONFIG_CRYPTO_LZ4HC is not set
# CONFIG_CRYPTO_ZSTD is not set

#
# Random Number Generation
#
CONFIG_CRYPTO_ANSI_CPRNG=m
CONFIG_CRYPTO_DRBG_MENU=y
CONFIG_CRYPTO_DRBG_HMAC=y
CONFIG_CRYPTO_DRBG_HASH=y
CONFIG_CRYPTO_DRBG_CTR=y
CONFIG_CRYPTO_DRBG=y
CONFIG_CRYPTO_JITTERENTROPY=y
CONFIG_CRYPTO_USER_API=y
CONFIG_CRYPTO_USER_API_HASH=y
CONFIG_CRYPTO_USER_API_SKCIPHER=y
CONFIG_CRYPTO_USER_API_RNG=y
CONFIG_CRYPTO_USER_API_AEAD=y
# CONFIG_CRYPTO_STATS is not set
CONFIG_CRYPTO_HASH_INFO=y

#
# Crypto library routines
#
CONFIG_CRYPTO_LIB_AES=y
CONFIG_CRYPTO_LIB_ARC4=m
# CONFIG_CRYPTO_LIB_BLAKE2S is not set
CONFIG_CRYPTO_ARCH_HAVE_LIB_CHACHA=m
CONFIG_CRYPTO_LIB_CHACHA_GENERIC=m
# CONFIG_CRYPTO_LIB_CHACHA is not set
# CONFIG_CRYPTO_LIB_CURVE25519 is not set
CONFIG_CRYPTO_LIB_DES=m
CONFIG_CRYPTO_LIB_POLY1305_RSIZE=11
CONFIG_CRYPTO_ARCH_HAVE_LIB_POLY1305=m
CONFIG_CRYPTO_LIB_POLY1305_GENERIC=m
# CONFIG_CRYPTO_LIB_POLY1305 is not set
# CONFIG_CRYPTO_LIB_CHACHA20POLY1305 is not set
CONFIG_CRYPTO_LIB_SHA256=y
CONFIG_CRYPTO_HW=y
CONFIG_CRYPTO_DEV_PADLOCK=m
CONFIG_CRYPTO_DEV_PADLOCK_AES=m
CONFIG_CRYPTO_DEV_PADLOCK_SHA=m
# CONFIG_CRYPTO_DEV_ATMEL_ECC is not set
# CONFIG_CRYPTO_DEV_ATMEL_SHA204A is not set
CONFIG_CRYPTO_DEV_CCP=y
CONFIG_CRYPTO_DEV_CCP_DD=y
CONFIG_CRYPTO_DEV_SP_CCP=y
CONFIG_CRYPTO_DEV_CCP_CRYPTO=m
CONFIG_CRYPTO_DEV_SP_PSP=y
# CONFIG_CRYPTO_DEV_CCP_DEBUGFS is not set
CONFIG_CRYPTO_DEV_QAT=m
CONFIG_CRYPTO_DEV_QAT_DH895xCC=m
CONFIG_CRYPTO_DEV_QAT_C3XXX=m
CONFIG_CRYPTO_DEV_QAT_C62X=m
CONFIG_CRYPTO_DEV_QAT_DH895xCCVF=m
CONFIG_CRYPTO_DEV_QAT_C3XXXVF=m
CONFIG_CRYPTO_DEV_QAT_C62XVF=m
CONFIG_CRYPTO_DEV_NITROX=m
CONFIG_CRYPTO_DEV_NITROX_CNN55XX=m
# CONFIG_CRYPTO_DEV_VIRTIO is not set
# CONFIG_CRYPTO_DEV_SAFEXCEL is not set
# CONFIG_CRYPTO_DEV_AMLOGIC_GXL is not set
CONFIG_ASYMMETRIC_KEY_TYPE=y
CONFIG_ASYMMETRIC_PUBLIC_KEY_SUBTYPE=y
# CONFIG_ASYMMETRIC_TPM_KEY_SUBTYPE is not set
CONFIG_X509_CERTIFICATE_PARSER=y
# CONFIG_PKCS8_PRIVATE_KEY_PARSER is not set
CONFIG_PKCS7_MESSAGE_PARSER=y
# CONFIG_PKCS7_TEST_KEY is not set
CONFIG_SIGNED_PE_FILE_VERIFICATION=y

#
# Certificates for signature checking
#
CONFIG_MODULE_SIG_KEY="certs/signing_key.pem"
CONFIG_SYSTEM_TRUSTED_KEYRING=y
CONFIG_SYSTEM_TRUSTED_KEYS=""
# CONFIG_SYSTEM_EXTRA_CERTIFICATE is not set
# CONFIG_SECONDARY_TRUSTED_KEYRING is not set
CONFIG_SYSTEM_BLACKLIST_KEYRING=y
CONFIG_SYSTEM_BLACKLIST_HASH_LIST=""
# end of Certificates for signature checking

CONFIG_BINARY_PRINTF=y

#
# Library routines
#
CONFIG_RAID6_PQ=m
CONFIG_RAID6_PQ_BENCHMARK=y
# CONFIG_PACKING is not set
CONFIG_BITREVERSE=y
CONFIG_GENERIC_STRNCPY_FROM_USER=y
CONFIG_GENERIC_STRNLEN_USER=y
CONFIG_GENERIC_NET_UTILS=y
CONFIG_GENERIC_FIND_FIRST_BIT=y
CONFIG_CORDIC=m
# CONFIG_PRIME_NUMBERS is not set
CONFIG_RATIONAL=y
CONFIG_GENERIC_PCI_IOMAP=y
CONFIG_GENERIC_IOMAP=y
CONFIG_ARCH_USE_CMPXCHG_LOCKREF=y
CONFIG_ARCH_HAS_FAST_MULTIPLIER=y
CONFIG_ARCH_USE_SYM_ANNOTATIONS=y
CONFIG_CRC_CCITT=y
CONFIG_CRC16=y
CONFIG_CRC_T10DIF=y
CONFIG_CRC_ITU_T=m
CONFIG_CRC32=y
# CONFIG_CRC32_SELFTEST is not set
CONFIG_CRC32_SLICEBY8=y
# CONFIG_CRC32_SLICEBY4 is not set
# CONFIG_CRC32_SARWATE is not set
# CONFIG_CRC32_BIT is not set
# CONFIG_CRC64 is not set
# CONFIG_CRC4 is not set
CONFIG_CRC7=m
CONFIG_LIBCRC32C=m
CONFIG_CRC8=m
CONFIG_XXHASH=y
# CONFIG_RANDOM32_SELFTEST is not set
CONFIG_ZLIB_INFLATE=y
CONFIG_ZLIB_DEFLATE=y
CONFIG_LZO_COMPRESS=y
CONFIG_LZO_DECOMPRESS=y
CONFIG_LZ4_DECOMPRESS=y
CONFIG_ZSTD_COMPRESS=m
CONFIG_ZSTD_DECOMPRESS=y
CONFIG_XZ_DEC=y
CONFIG_XZ_DEC_X86=y
CONFIG_XZ_DEC_POWERPC=y
CONFIG_XZ_DEC_IA64=y
CONFIG_XZ_DEC_ARM=y
CONFIG_XZ_DEC_ARMTHUMB=y
CONFIG_XZ_DEC_SPARC=y
CONFIG_XZ_DEC_BCJ=y
# CONFIG_XZ_DEC_TEST is not set
CONFIG_DECOMPRESS_GZIP=y
CONFIG_DECOMPRESS_BZIP2=y
CONFIG_DECOMPRESS_LZMA=y
CONFIG_DECOMPRESS_XZ=y
CONFIG_DECOMPRESS_LZO=y
CONFIG_DECOMPRESS_LZ4=y
CONFIG_DECOMPRESS_ZSTD=y
CONFIG_GENERIC_ALLOCATOR=y
CONFIG_REED_SOLOMON=m
CONFIG_REED_SOLOMON_ENC8=y
CONFIG_REED_SOLOMON_DEC8=y
CONFIG_TEXTSEARCH=y
CONFIG_TEXTSEARCH_KMP=m
CONFIG_TEXTSEARCH_BM=m
CONFIG_TEXTSEARCH_FSM=m
CONFIG_INTERVAL_TREE=y
CONFIG_XARRAY_MULTI=y
CONFIG_ASSOCIATIVE_ARRAY=y
CONFIG_HAS_IOMEM=y
CONFIG_HAS_IOPORT_MAP=y
CONFIG_HAS_DMA=y
CONFIG_DMA_OPS=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_ARCH_DMA_ADDR_T_64BIT=y
CONFIG_ARCH_HAS_FORCE_DMA_UNENCRYPTED=y
CONFIG_DMA_VIRT_OPS=y
CONFIG_SWIOTLB=y
CONFIG_DMA_COHERENT_POOL=y
CONFIG_DMA_CMA=y

#
# Default contiguous memory area size:
#
CONFIG_CMA_SIZE_MBYTES=200
CONFIG_CMA_SIZE_SEL_MBYTES=y
# CONFIG_CMA_SIZE_SEL_PERCENTAGE is not set
# CONFIG_CMA_SIZE_SEL_MIN is not set
# CONFIG_CMA_SIZE_SEL_MAX is not set
CONFIG_CMA_ALIGNMENT=8
# CONFIG_DMA_API_DEBUG is not set
CONFIG_SGL_ALLOC=y
CONFIG_CHECK_SIGNATURE=y
CONFIG_CPUMASK_OFFSTACK=y
CONFIG_CPU_RMAP=y
CONFIG_DQL=y
CONFIG_GLOB=y
# CONFIG_GLOB_SELFTEST is not set
CONFIG_NLATTR=y
CONFIG_CLZ_TAB=y
CONFIG_IRQ_POLL=y
CONFIG_MPILIB=y
CONFIG_SIGNATURE=y
CONFIG_DIMLIB=y
CONFIG_OID_REGISTRY=y
CONFIG_UCS2_STRING=y
CONFIG_HAVE_GENERIC_VDSO=y
CONFIG_GENERIC_GETTIMEOFDAY=y
CONFIG_GENERIC_VDSO_TIME_NS=y
CONFIG_FONT_SUPPORT=y
# CONFIG_FONTS is not set
CONFIG_FONT_8x8=y
CONFIG_FONT_8x16=y
CONFIG_SG_POOL=y
CONFIG_ARCH_HAS_PMEM_API=y
CONFIG_MEMREGION=y
CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE=y
CONFIG_ARCH_HAS_UACCESS_MCSAFE=y
CONFIG_ARCH_STACKWALK=y
CONFIG_SBITMAP=y
# CONFIG_STRING_SELFTEST is not set
# end of Library routines

#
# Kernel hacking
#

#
# printk and dmesg options
#
CONFIG_PRINTK_TIME=y
# CONFIG_PRINTK_CALLER is not set
CONFIG_CONSOLE_LOGLEVEL_DEFAULT=7
CONFIG_CONSOLE_LOGLEVEL_QUIET=4
CONFIG_MESSAGE_LOGLEVEL_DEFAULT=4
CONFIG_BOOT_PRINTK_DELAY=y
CONFIG_DYNAMIC_DEBUG=y
CONFIG_DYNAMIC_DEBUG_CORE=y
CONFIG_SYMBOLIC_ERRNAME=y
CONFIG_DEBUG_BUGVERBOSE=y
# end of printk and dmesg options

#
# Compile-time checks and compiler options
#
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_INFO_REDUCED=y
# CONFIG_DEBUG_INFO_COMPRESSED is not set
# CONFIG_DEBUG_INFO_SPLIT is not set
CONFIG_DEBUG_INFO_DWARF4=y
# CONFIG_GDB_SCRIPTS is not set
CONFIG_ENABLE_MUST_CHECK=y
CONFIG_FRAME_WARN=2048
CONFIG_STRIP_ASM_SYMS=y
# CONFIG_READABLE_ASM is not set
# CONFIG_HEADERS_INSTALL is not set
CONFIG_DEBUG_SECTION_MISMATCH=y
CONFIG_SECTION_MISMATCH_WARN_ONLY=y
CONFIG_STACK_VALIDATION=y
# CONFIG_DEBUG_FORCE_WEAK_PER_CPU is not set
# end of Compile-time checks and compiler options

#
# Generic Kernel Debugging Instruments
#
CONFIG_MAGIC_SYSRQ=y
CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE=0x1
CONFIG_MAGIC_SYSRQ_SERIAL=y
CONFIG_MAGIC_SYSRQ_SERIAL_SEQUENCE=""
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_FS_ALLOW_ALL=y
# CONFIG_DEBUG_FS_DISALLOW_MOUNT is not set
# CONFIG_DEBUG_FS_ALLOW_NONE is not set
CONFIG_HAVE_ARCH_KGDB=y
# CONFIG_KGDB is not set
CONFIG_ARCH_HAS_UBSAN_SANITIZE_ALL=y
# CONFIG_UBSAN is not set
# end of Generic Kernel Debugging Instruments

CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_MISC=y

#
# Memory Debugging
#
# CONFIG_PAGE_EXTENSION is not set
# CONFIG_DEBUG_PAGEALLOC is not set
# CONFIG_PAGE_OWNER is not set
# CONFIG_PAGE_POISONING is not set
# CONFIG_DEBUG_PAGE_REF is not set
# CONFIG_DEBUG_RODATA_TEST is not set
CONFIG_ARCH_HAS_DEBUG_WX=y
# CONFIG_DEBUG_WX is not set
CONFIG_GENERIC_PTDUMP=y
# CONFIG_PTDUMP_DEBUGFS is not set
# CONFIG_DEBUG_OBJECTS is not set
# CONFIG_SLUB_DEBUG_ON is not set
# CONFIG_SLUB_STATS is not set
CONFIG_HAVE_DEBUG_KMEMLEAK=y
# CONFIG_DEBUG_KMEMLEAK is not set
# CONFIG_DEBUG_STACK_USAGE is not set
# CONFIG_SCHED_STACK_END_CHECK is not set
CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE=y
# CONFIG_DEBUG_VM is not set
# CONFIG_DEBUG_VM_PGTABLE is not set
CONFIG_ARCH_HAS_DEBUG_VIRTUAL=y
# CONFIG_DEBUG_VIRTUAL is not set
CONFIG_DEBUG_MEMORY_INIT=y
# CONFIG_DEBUG_PER_CPU_MAPS is not set
CONFIG_HAVE_ARCH_KASAN=y
CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
CONFIG_CC_HAS_KASAN_GENERIC=y
CONFIG_CC_HAS_WORKING_NOSANITIZE_ADDRESS=y
# CONFIG_KASAN is not set
# end of Memory Debugging

CONFIG_DEBUG_SHIRQ=y

#
# Debug Oops, Lockups and Hangs
#
CONFIG_PANIC_ON_OOPS=y
CONFIG_PANIC_ON_OOPS_VALUE=1
CONFIG_PANIC_TIMEOUT=0
CONFIG_LOCKUP_DETECTOR=y
CONFIG_SOFTLOCKUP_DETECTOR=y
# CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC is not set
CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE=0
CONFIG_HARDLOCKUP_DETECTOR_PERF=y
CONFIG_HARDLOCKUP_CHECK_TIMESTAMP=y
CONFIG_HARDLOCKUP_DETECTOR=y
CONFIG_BOOTPARAM_HARDLOCKUP_PANIC=y
CONFIG_BOOTPARAM_HARDLOCKUP_PANIC_VALUE=1
# CONFIG_DETECT_HUNG_TASK is not set
# CONFIG_WQ_WATCHDOG is not set
# CONFIG_TEST_LOCKUP is not set
# end of Debug Oops, Lockups and Hangs

#
# Scheduler Debugging
#
CONFIG_SCHED_DEBUG=y
CONFIG_SCHED_INFO=y
CONFIG_SCHEDSTATS=y
# end of Scheduler Debugging

# CONFIG_DEBUG_TIMEKEEPING is not set

#
# Lock Debugging (spinlocks, mutexes, etc...)
#
CONFIG_LOCK_DEBUGGING_SUPPORT=y
# CONFIG_PROVE_LOCKING is not set
# CONFIG_LOCK_STAT is not set
# CONFIG_DEBUG_RT_MUTEXES is not set
# CONFIG_DEBUG_SPINLOCK is not set
# CONFIG_DEBUG_MUTEXES is not set
# CONFIG_DEBUG_WW_MUTEX_SLOWPATH is not set
# CONFIG_DEBUG_RWSEMS is not set
# CONFIG_DEBUG_LOCK_ALLOC is not set
CONFIG_DEBUG_ATOMIC_SLEEP=y
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
CONFIG_LOCK_TORTURE_TEST=m
# CONFIG_WW_MUTEX_SELFTEST is not set
# end of Lock Debugging (spinlocks, mutexes, etc...)

CONFIG_STACKTRACE=y
# CONFIG_WARN_ALL_UNSEEDED_RANDOM is not set
# CONFIG_DEBUG_KOBJECT is not set

#
# Debug kernel data structures
#
CONFIG_DEBUG_LIST=y
# CONFIG_DEBUG_PLIST is not set
# CONFIG_DEBUG_SG is not set
# CONFIG_DEBUG_NOTIFIERS is not set
CONFIG_BUG_ON_DATA_CORRUPTION=y
# end of Debug kernel data structures

# CONFIG_DEBUG_CREDENTIALS is not set

#
# RCU Debugging
#
CONFIG_TORTURE_TEST=m
CONFIG_RCU_PERF_TEST=m
CONFIG_RCU_TORTURE_TEST=m
# CONFIG_RCU_REF_SCALE_TEST is not set
CONFIG_RCU_CPU_STALL_TIMEOUT=60
# CONFIG_RCU_TRACE is not set
# CONFIG_RCU_EQS_DEBUG is not set
# end of RCU Debugging

# CONFIG_DEBUG_WQ_FORCE_RR_CPU is not set
# CONFIG_DEBUG_BLOCK_EXT_DEVT is not set
# CONFIG_CPU_HOTPLUG_STATE_CONTROL is not set
CONFIG_LATENCYTOP=y
CONFIG_USER_STACKTRACE_SUPPORT=y
CONFIG_NOP_TRACER=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
CONFIG_HAVE_FENTRY=y
CONFIG_HAVE_C_RECORDMCOUNT=y
CONFIG_TRACER_MAX_TRACE=y
CONFIG_TRACE_CLOCK=y
CONFIG_RING_BUFFER=y
CONFIG_EVENT_TRACING=y
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_RING_BUFFER_ALLOW_SWAP=y
CONFIG_TRACING=y
CONFIG_GENERIC_TRACER=y
CONFIG_TRACING_SUPPORT=y
CONFIG_FTRACE=y
# CONFIG_BOOTTIME_TRACING is not set
CONFIG_FUNCTION_TRACER=y
CONFIG_FUNCTION_GRAPH_TRACER=y
CONFIG_DYNAMIC_FTRACE=y
CONFIG_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS=y
CONFIG_FUNCTION_PROFILER=y
CONFIG_STACK_TRACER=y
# CONFIG_IRQSOFF_TRACER is not set
CONFIG_SCHED_TRACER=y
CONFIG_HWLAT_TRACER=y
# CONFIG_MMIOTRACE is not set
CONFIG_FTRACE_SYSCALLS=y
CONFIG_TRACER_SNAPSHOT=y
# CONFIG_TRACER_SNAPSHOT_PER_CPU_SWAP is not set
CONFIG_BRANCH_PROFILE_NONE=y
# CONFIG_PROFILE_ANNOTATED_BRANCHES is not set
CONFIG_BLK_DEV_IO_TRACE=y
CONFIG_KPROBE_EVENTS=y
# CONFIG_KPROBE_EVENTS_ON_NOTRACE is not set
CONFIG_UPROBE_EVENTS=y
CONFIG_BPF_EVENTS=y
CONFIG_DYNAMIC_EVENTS=y
CONFIG_PROBE_EVENTS=y
# CONFIG_BPF_KPROBE_OVERRIDE is not set
CONFIG_FTRACE_MCOUNT_RECORD=y
CONFIG_TRACING_MAP=y
CONFIG_SYNTH_EVENTS=y
CONFIG_HIST_TRIGGERS=y
# CONFIG_TRACE_EVENT_INJECT is not set
# CONFIG_TRACEPOINT_BENCHMARK is not set
CONFIG_RING_BUFFER_BENCHMARK=m
# CONFIG_TRACE_EVAL_MAP_FILE is not set
# CONFIG_FTRACE_STARTUP_TEST is not set
# CONFIG_RING_BUFFER_STARTUP_TEST is not set
# CONFIG_PREEMPTIRQ_DELAY_TEST is not set
# CONFIG_SYNTH_EVENT_GEN_TEST is not set
# CONFIG_KPROBE_EVENT_GEN_TEST is not set
# CONFIG_HIST_TRIGGERS_DEBUG is not set
CONFIG_PROVIDE_OHCI1394_DMA_INIT=y
# CONFIG_SAMPLES is not set
CONFIG_HAVE_ARCH_KCSAN=y
CONFIG_ARCH_HAS_DEVMEM_IS_ALLOWED=y
CONFIG_STRICT_DEVMEM=y
# CONFIG_IO_STRICT_DEVMEM is not set

#
# x86 Debugging
#
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_TRACE_IRQFLAGS_NMI_SUPPORT=y
CONFIG_EARLY_PRINTK_USB=y
CONFIG_X86_VERBOSE_BOOTUP=y
CONFIG_EARLY_PRINTK=y
CONFIG_EARLY_PRINTK_DBGP=y
CONFIG_EARLY_PRINTK_USB_XDBC=y
# CONFIG_EFI_PGT_DUMP is not set
# CONFIG_DEBUG_TLBFLUSH is not set
CONFIG_HAVE_MMIOTRACE_SUPPORT=y
CONFIG_X86_DECODER_SELFTEST=y
CONFIG_IO_DELAY_0X80=y
# CONFIG_IO_DELAY_0XED is not set
# CONFIG_IO_DELAY_UDELAY is not set
# CONFIG_IO_DELAY_NONE is not set
CONFIG_DEBUG_BOOT_PARAMS=y
# CONFIG_CPA_DEBUG is not set
# CONFIG_DEBUG_ENTRY is not set
# CONFIG_DEBUG_NMI_SELFTEST is not set
# CONFIG_X86_DEBUG_FPU is not set
# CONFIG_PUNIT_ATOM_DEBUG is not set
CONFIG_UNWINDER_ORC=y
# CONFIG_UNWINDER_FRAME_POINTER is not set
# end of x86 Debugging

#
# Kernel Testing and Coverage
#
CONFIG_KUNIT=y
# CONFIG_KUNIT_DEBUGFS is not set
CONFIG_KUNIT_TEST=m
CONFIG_KUNIT_EXAMPLE_TEST=m
# CONFIG_KUNIT_ALL_TESTS is not set
# CONFIG_NOTIFIER_ERROR_INJECTION is not set
CONFIG_FUNCTION_ERROR_INJECTION=y
CONFIG_FAULT_INJECTION=y
# CONFIG_FAILSLAB is not set
# CONFIG_FAIL_PAGE_ALLOC is not set
CONFIG_FAIL_MAKE_REQUEST=y
# CONFIG_FAIL_IO_TIMEOUT is not set
# CONFIG_FAIL_FUTEX is not set
CONFIG_FAULT_INJECTION_DEBUG_FS=y
# CONFIG_FAIL_FUNCTION is not set
# CONFIG_FAIL_MMC_REQUEST is not set
CONFIG_ARCH_HAS_KCOV=y
CONFIG_CC_HAS_SANCOV_TRACE_PC=y
# CONFIG_KCOV is not set
CONFIG_RUNTIME_TESTING_MENU=y
# CONFIG_LKDTM is not set
# CONFIG_TEST_LIST_SORT is not set
# CONFIG_TEST_MIN_HEAP is not set
# CONFIG_TEST_SORT is not set
# CONFIG_KPROBES_SANITY_TEST is not set
# CONFIG_BACKTRACE_SELF_TEST is not set
# CONFIG_RBTREE_TEST is not set
# CONFIG_REED_SOLOMON_TEST is not set
# CONFIG_INTERVAL_TREE_TEST is not set
# CONFIG_PERCPU_TEST is not set
CONFIG_ATOMIC64_SELFTEST=y
# CONFIG_ASYNC_RAID6_TEST is not set
# CONFIG_TEST_HEXDUMP is not set
# CONFIG_TEST_STRING_HELPERS is not set
# CONFIG_TEST_STRSCPY is not set
# CONFIG_TEST_KSTRTOX is not set
# CONFIG_TEST_PRINTF is not set
# CONFIG_TEST_BITMAP is not set
# CONFIG_TEST_BITFIELD is not set
# CONFIG_TEST_UUID is not set
# CONFIG_TEST_XARRAY is not set
# CONFIG_TEST_OVERFLOW is not set
# CONFIG_TEST_RHASHTABLE is not set
# CONFIG_TEST_HASH is not set
# CONFIG_TEST_IDA is not set
# CONFIG_TEST_LKM is not set
# CONFIG_TEST_BITOPS is not set
# CONFIG_TEST_VMALLOC is not set
# CONFIG_TEST_USER_COPY is not set
CONFIG_TEST_BPF=m
# CONFIG_TEST_BLACKHOLE_DEV is not set
# CONFIG_FIND_BIT_BENCHMARK is not set
# CONFIG_TEST_FIRMWARE is not set
# CONFIG_TEST_SYSCTL is not set
CONFIG_SYSCTL_KUNIT_TEST=m
CONFIG_LIST_KUNIT_TEST=m
# CONFIG_LINEAR_RANGES_TEST is not set
# CONFIG_BITS_TEST is not set
# CONFIG_TEST_UDELAY is not set
# CONFIG_TEST_STATIC_KEYS is not set
# CONFIG_TEST_KMOD is not set
# CONFIG_TEST_MEMCAT_P is not set
# CONFIG_TEST_LIVEPATCH is not set
# CONFIG_TEST_STACKINIT is not set
# CONFIG_TEST_MEMINIT is not set
# CONFIG_TEST_HMM is not set
# CONFIG_TEST_FPU is not set
# CONFIG_MEMTEST is not set
# CONFIG_HYPERV_TESTING is not set
# end of Kernel Testing and Coverage
# end of Kernel hacking

[-- Attachment #3: job-script.ksh --]
[-- Type: text/plain, Size: 7861 bytes --]

#!/bin/sh

export_top_env()
{
	export suite='will-it-scale'
	export testcase='will-it-scale'
	export category='benchmark'
	export nr_task=192
	export job_origin='/lkp-src/allot/cyclic:p1:linux-devel:devel-hourly/lkp-cpx-4s1/will-it-scale-100.yaml'
	export queue_cmdline_keys='branch
commit
queue_at_least_once'
	export queue='validate'
	export testbox='lkp-cpx-4s1'
	export tbox_group='lkp-cpx-4s1'
	export kconfig='x86_64-rhel-8.3'
	export submit_id='5f4b21ab1b0f52448cb12f17'
	export job_file='/lkp/jobs/scheduled/lkp-cpx-4s1/will-it-scale-performance-thread-100%-futex4-ucode=0x86000017-monitor=e907e467-debian-10.4-x86_64-20200603.cgz-872a0a3f0b5ada989-20200830-17548-1ueny52-2.yaml'
	export id='579317127fb6a920b7671fc70218c4a705f4b928'
	export queuer_version='/lkp-src'
	export model='Cooper Lake'
	export nr_node=4
	export nr_cpu=192
	export memory='128G'
	export nr_hdd_partitions=
	export nr_ssd_partitions=2
	export hdd_partitions=
	export ssd_partitions='/dev/disk/by-id/nvme-INTEL_SSDPE2KX040T7_PHLF741401H84P0IGN-part4
/dev/disk/by-id/nvme-INTEL_SSDPE2KX040T7_PHLF741401H84P0IGN-part5'
	export swap_partitions=
	export rootfs_partition='/dev/disk/by-id/nvme-INTEL_SSDPE2KX040T7_PHLF741401H84P0IGN-part3'
	export kernel_cmdline_hw='acpi_rsdp=0x693fd014'
	export brand=
	export commit='872a0a3f0b5ada9898fb60c124553f1c4fcb6f7a'
	export need_kconfig_hw='CONFIG_IGB=y
CONFIG_SATA_AHCI'
	export ucode='0x86000017'
	export enqueue_time='2020-08-30 11:49:00 +0800'
	export _id='5f4b21ab1b0f52448cb12f17'
	export _rt='/result/will-it-scale/performance-thread-100%-futex4-ucode=0x86000017-monitor=e907e467/lkp-cpx-4s1/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/872a0a3f0b5ada9898fb60c124553f1c4fcb6f7a'
	export user='lkp'
	export compiler='gcc-9'
	export head_commit='eea00c68d37cd465a780e250cf6626389f86ae32'
	export base_commit='d012a7190fc1fd72ed48911e77ca97ba4521bccd'
	export branch='linux-review/Julien-Desfossez/Core-scheduling-v7/20200829-035707'
	export rootfs='debian-10.4-x86_64-20200603.cgz'
	export monitor_sha='e907e467'
	export result_root='/result/will-it-scale/performance-thread-100%-futex4-ucode=0x86000017-monitor=e907e467/lkp-cpx-4s1/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/872a0a3f0b5ada9898fb60c124553f1c4fcb6f7a/2'
	export scheduler_version='/lkp/lkp/.src-20200828-133014'
	export LKP_SERVER='inn'
	export arch='x86_64'
	export max_uptime=1500
	export initrd='/osimage/debian/debian-10.4-x86_64-20200603.cgz'
	export bootloader_append='root=/dev/ram0
user=lkp
job=/lkp/jobs/scheduled/lkp-cpx-4s1/will-it-scale-performance-thread-100%-futex4-ucode=0x86000017-monitor=e907e467-debian-10.4-x86_64-20200603.cgz-872a0a3f0b5ada989-20200830-17548-1ueny52-2.yaml
ARCH=x86_64
kconfig=x86_64-rhel-8.3
branch=linux-review/Julien-Desfossez/Core-scheduling-v7/20200829-035707
commit=872a0a3f0b5ada9898fb60c124553f1c4fcb6f7a
BOOT_IMAGE=/pkg/linux/x86_64-rhel-8.3/gcc-9/872a0a3f0b5ada9898fb60c124553f1c4fcb6f7a/vmlinuz-5.9.0-rc1-00138-g872a0a3f0b5ada
acpi_rsdp=0x693fd014
max_uptime=1500
RESULT_ROOT=/result/will-it-scale/performance-thread-100%-futex4-ucode=0x86000017-monitor=e907e467/lkp-cpx-4s1/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/872a0a3f0b5ada9898fb60c124553f1c4fcb6f7a/2
LKP_SERVER=inn
nokaslr
selinux=0
debug
apic=debug
sysrq_always_enabled
rcupdate.rcu_cpu_stall_timeout=100
net.ifnames=0
printk.devkmsg=on
panic=-1
softlockup_panic=1
nmi_watchdog=panic
oops=panic
load_ramdisk=2
prompt_ramdisk=0
drbd.minor_count=8
systemd.log_level=err
ignore_loglevel
console=tty0
earlyprintk=ttyS0,115200
console=ttyS0,115200
vga=normal
rw'
	export modules_initrd='/pkg/linux/x86_64-rhel-8.3/gcc-9/872a0a3f0b5ada9898fb60c124553f1c4fcb6f7a/modules.cgz'
	export bm_initrd='/osimage/deps/debian-10.4-x86_64-20200603.cgz/run-ipconfig_20200608.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/lkp_20200709.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/rsync-rootfs_20200608.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/will-it-scale_20200619.cgz,/osimage/pkg/debian-10.4-x86_64-20200603.cgz/will-it-scale-x86_64-0f26364-1_20200619.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/mpstat_20200714.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/perf_20200723.cgz,/osimage/pkg/debian-10.4-x86_64-20200603.cgz/perf-x86_64-d15be546031c-1_20200723.cgz,/osimage/pkg/debian-10.4-x86_64-20200603.cgz/sar-x86_64-34c92ae-1_20200702.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/hw_20200715.cgz'
	export ucode_initrd='/osimage/ucode/intel-ucode-20200610.cgz'
	export lkp_initrd='/osimage/user/lkp/lkp-x86_64.cgz'
	export site='inn'
	export LKP_CGI_PORT=80
	export LKP_CIFS_PORT=139
	export last_kernel='5.9.0-rc2'
	export repeat_to=4
	export schedule_notify_address=
	export queue_at_least_once=1
	export kernel='/pkg/linux/x86_64-rhel-8.3/gcc-9/872a0a3f0b5ada9898fb60c124553f1c4fcb6f7a/vmlinuz-5.9.0-rc1-00138-g872a0a3f0b5ada'
	export dequeue_time='2020-08-30 11:49:56 +0800'
	export job_initrd='/lkp/jobs/scheduled/lkp-cpx-4s1/will-it-scale-performance-thread-100%-futex4-ucode=0x86000017-monitor=e907e467-debian-10.4-x86_64-20200603.cgz-872a0a3f0b5ada989-20200830-17548-1ueny52-2.cgz'

	[ -n "$LKP_SRC" ] ||
	export LKP_SRC=/lkp/${user:-lkp}/src
}

run_job()
{
	echo $$ > $TMP/run-job.pid

	. $LKP_SRC/lib/http.sh
	. $LKP_SRC/lib/job.sh
	. $LKP_SRC/lib/env.sh

	export_top_env

	run_setup $LKP_SRC/setup/cpufreq_governor 'performance'

	run_monitor $LKP_SRC/monitors/wrapper kmsg
	run_monitor $LKP_SRC/monitors/no-stdout/wrapper boot-time
	run_monitor $LKP_SRC/monitors/wrapper iostat
	run_monitor $LKP_SRC/monitors/wrapper heartbeat
	run_monitor $LKP_SRC/monitors/wrapper vmstat
	run_monitor $LKP_SRC/monitors/wrapper numa-numastat
	run_monitor $LKP_SRC/monitors/wrapper numa-vmstat
	run_monitor $LKP_SRC/monitors/wrapper numa-meminfo
	run_monitor $LKP_SRC/monitors/wrapper proc-vmstat
	run_monitor $LKP_SRC/monitors/wrapper proc-stat
	run_monitor $LKP_SRC/monitors/wrapper meminfo
	run_monitor $LKP_SRC/monitors/wrapper slabinfo
	run_monitor $LKP_SRC/monitors/wrapper interrupts
	run_monitor $LKP_SRC/monitors/wrapper lock_stat
	run_monitor $LKP_SRC/monitors/wrapper latency_stats
	run_monitor $LKP_SRC/monitors/wrapper softirqs
	run_monitor $LKP_SRC/monitors/one-shot/wrapper bdi_dev_mapping
	run_monitor $LKP_SRC/monitors/wrapper diskstats
	run_monitor $LKP_SRC/monitors/wrapper nfsstat
	run_monitor $LKP_SRC/monitors/wrapper cpuidle
	run_monitor $LKP_SRC/monitors/wrapper cpufreq-stats
	run_monitor $LKP_SRC/monitors/wrapper sched_debug
	run_monitor $LKP_SRC/monitors/wrapper perf-stat
	run_monitor $LKP_SRC/monitors/wrapper mpstat
	run_monitor $LKP_SRC/monitors/no-stdout/wrapper perf-profile
	run_monitor $LKP_SRC/monitors/wrapper oom-killer
	run_monitor $LKP_SRC/monitors/plain/watchdog

	run_test mode='thread' test='futex4' $LKP_SRC/tests/wrapper will-it-scale
}

extract_stats()
{
	export stats_part_begin=
	export stats_part_end=

	$LKP_SRC/stats/wrapper will-it-scale
	$LKP_SRC/stats/wrapper kmsg
	$LKP_SRC/stats/wrapper boot-time
	$LKP_SRC/stats/wrapper iostat
	$LKP_SRC/stats/wrapper vmstat
	$LKP_SRC/stats/wrapper numa-numastat
	$LKP_SRC/stats/wrapper numa-vmstat
	$LKP_SRC/stats/wrapper numa-meminfo
	$LKP_SRC/stats/wrapper proc-vmstat
	$LKP_SRC/stats/wrapper meminfo
	$LKP_SRC/stats/wrapper slabinfo
	$LKP_SRC/stats/wrapper interrupts
	$LKP_SRC/stats/wrapper lock_stat
	$LKP_SRC/stats/wrapper latency_stats
	$LKP_SRC/stats/wrapper softirqs
	$LKP_SRC/stats/wrapper diskstats
	$LKP_SRC/stats/wrapper nfsstat
	$LKP_SRC/stats/wrapper cpuidle
	$LKP_SRC/stats/wrapper sched_debug
	$LKP_SRC/stats/wrapper perf-stat
	$LKP_SRC/stats/wrapper mpstat
	$LKP_SRC/stats/wrapper perf-profile

	$LKP_SRC/stats/wrapper time will-it-scale.time
	$LKP_SRC/stats/wrapper dmesg
	$LKP_SRC/stats/wrapper kmsg
	$LKP_SRC/stats/wrapper last_state
	$LKP_SRC/stats/wrapper stderr
	$LKP_SRC/stats/wrapper time
}

"$@"

[-- Attachment #4: job.yaml --]
[-- Type: text/plain, Size: 5207 bytes --]

---

#! jobs/will-it-scale-100.yaml
suite: will-it-scale
testcase: will-it-scale
category: benchmark
nr_task: 100%
will-it-scale:
  mode: thread
  test: futex4
job_origin: "/lkp-src/allot/cyclic:p1:linux-devel:devel-hourly/lkp-cpx-4s1/will-it-scale-100.yaml"

#! queue options
queue_cmdline_keys:
- branch
- commit
- queue_at_least_once
queue: bisect
testbox: lkp-cpx-4s1
tbox_group: lkp-cpx-4s1
kconfig: x86_64-rhel-8.3
submit_id: 5f4af4841b0f5242da26e68c
job_file: "/lkp/jobs/scheduled/lkp-cpx-4s1/will-it-scale-performance-thread-100%-futex4-ucode=0x86000017-monitor=e907e467-debian-10.4-x86_64-20200603.cgz-872a0a3f0b5ada989-20200830-17114-q1fv6m-1.yaml"
id: a140282de920f034eacf63bd1fb9ca933bc0bec1
queuer_version: "/lkp-src"

#! hosts/lkp-cpx-4s1
model: Cooper Lake
nr_node: 4
nr_cpu: 192
memory: 128G
nr_hdd_partitions: 
nr_ssd_partitions: 2
hdd_partitions: 
ssd_partitions:
- "/dev/disk/by-id/nvme-INTEL_SSDPE2KX040T7_PHLF741401H84P0IGN-part4"
- "/dev/disk/by-id/nvme-INTEL_SSDPE2KX040T7_PHLF741401H84P0IGN-part5"
swap_partitions: 
rootfs_partition: "/dev/disk/by-id/nvme-INTEL_SSDPE2KX040T7_PHLF741401H84P0IGN-part3"
kernel_cmdline_hw: acpi_rsdp=0x693fd014
brand: 

#! include/category/benchmark
kmsg: 
boot-time: 
iostat: 
heartbeat: 
vmstat: 
numa-numastat: 
numa-vmstat: 
numa-meminfo: 
proc-vmstat: 
proc-stat: 
meminfo: 
slabinfo: 
interrupts: 
lock_stat: 
latency_stats: 
softirqs: 
bdi_dev_mapping: 
diskstats: 
nfsstat: 
cpuidle: 
cpufreq-stats: 
sched_debug: 
perf-stat: 
mpstat: 
perf-profile: 

#! include/category/ALL
cpufreq_governor: performance

#! include/queue/cyclic
commit: 872a0a3f0b5ada9898fb60c124553f1c4fcb6f7a

#! include/testbox/lkp-cpx-4s1
need_kconfig_hw:
- CONFIG_IGB=y
- CONFIG_SATA_AHCI
ucode: '0x86000017'
enqueue_time: 2020-08-30 08:36:20.453697721 +08:00
_id: 5f4af6691b0f5242da26e68d
_rt: "/result/will-it-scale/performance-thread-100%-futex4-ucode=0x86000017-monitor=e907e467/lkp-cpx-4s1/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/872a0a3f0b5ada9898fb60c124553f1c4fcb6f7a"

#! schedule options
user: lkp
compiler: gcc-9
head_commit: eea00c68d37cd465a780e250cf6626389f86ae32
base_commit: d012a7190fc1fd72ed48911e77ca97ba4521bccd
branch: linux-devel/devel-hourly-2020082909
rootfs: debian-10.4-x86_64-20200603.cgz
monitor_sha: e907e467
result_root: "/result/will-it-scale/performance-thread-100%-futex4-ucode=0x86000017-monitor=e907e467/lkp-cpx-4s1/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/872a0a3f0b5ada9898fb60c124553f1c4fcb6f7a/1"
scheduler_version: "/lkp/lkp/.src-20200828-133014"
LKP_SERVER: inn
arch: x86_64
max_uptime: 1500
initrd: "/osimage/debian/debian-10.4-x86_64-20200603.cgz"
bootloader_append:
- root=/dev/ram0
- user=lkp
- job=/lkp/jobs/scheduled/lkp-cpx-4s1/will-it-scale-performance-thread-100%-futex4-ucode=0x86000017-monitor=e907e467-debian-10.4-x86_64-20200603.cgz-872a0a3f0b5ada989-20200830-17114-q1fv6m-1.yaml
- ARCH=x86_64
- kconfig=x86_64-rhel-8.3
- branch=linux-devel/devel-hourly-2020082909
- commit=872a0a3f0b5ada9898fb60c124553f1c4fcb6f7a
- BOOT_IMAGE=/pkg/linux/x86_64-rhel-8.3/gcc-9/872a0a3f0b5ada9898fb60c124553f1c4fcb6f7a/vmlinuz-5.9.0-rc1-00138-g872a0a3f0b5ada
- acpi_rsdp=0x693fd014
- max_uptime=1500
- RESULT_ROOT=/result/will-it-scale/performance-thread-100%-futex4-ucode=0x86000017-monitor=e907e467/lkp-cpx-4s1/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/872a0a3f0b5ada9898fb60c124553f1c4fcb6f7a/1
- LKP_SERVER=inn
- nokaslr
- selinux=0
- debug
- apic=debug
- sysrq_always_enabled
- rcupdate.rcu_cpu_stall_timeout=100
- net.ifnames=0
- printk.devkmsg=on
- panic=-1
- softlockup_panic=1
- nmi_watchdog=panic
- oops=panic
- load_ramdisk=2
- prompt_ramdisk=0
- drbd.minor_count=8
- systemd.log_level=err
- ignore_loglevel
- console=tty0
- earlyprintk=ttyS0,115200
- console=ttyS0,115200
- vga=normal
- rw
modules_initrd: "/pkg/linux/x86_64-rhel-8.3/gcc-9/872a0a3f0b5ada9898fb60c124553f1c4fcb6f7a/modules.cgz"
bm_initrd: "/osimage/deps/debian-10.4-x86_64-20200603.cgz/run-ipconfig_20200608.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/lkp_20200709.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/rsync-rootfs_20200608.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/will-it-scale_20200619.cgz,/osimage/pkg/debian-10.4-x86_64-20200603.cgz/will-it-scale-x86_64-0f26364-1_20200619.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/mpstat_20200714.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/perf_20200723.cgz,/osimage/pkg/debian-10.4-x86_64-20200603.cgz/perf-x86_64-d15be546031c-1_20200723.cgz,/osimage/pkg/debian-10.4-x86_64-20200603.cgz/sar-x86_64-34c92ae-1_20200702.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/hw_20200715.cgz"
ucode_initrd: "/osimage/ucode/intel-ucode-20200610.cgz"
lkp_initrd: "/osimage/user/lkp/lkp-x86_64.cgz"
site: inn

#! /lkp/lkp/.src-20200828-133014/include/site/inn
LKP_CGI_PORT: 80
LKP_CIFS_PORT: 139
oom-killer: 
watchdog: 

#! runtime status
last_kernel: 5.9.0-rc2
repeat_to: 2
schedule_notify_address: 

#! user overrides
queue_at_least_once: 0
kernel: "/pkg/linux/x86_64-rhel-8.3/gcc-9/872a0a3f0b5ada9898fb60c124553f1c4fcb6f7a/vmlinuz-5.9.0-rc1-00138-g872a0a3f0b5ada"
dequeue_time: 2020-08-30 08:53:53.440665115 +08:00
job_state: wget_kernel_fail

[-- Attachment #5: reproduce.ksh --]
[-- Type: text/plain, Size: 337 bytes --]


for cpu_dir in /sys/devices/system/cpu/cpu[0-9]*
do
	online_file="$cpu_dir"/online
	[ -f "$online_file" ] && [ "$(cat "$online_file")" -eq 0 ] && continue

	file="$cpu_dir"/cpufreq/scaling_governor
	[ -f "$file" ] && echo "performance" > "$file"
done

 "/lkp/benchmarks/python3/bin/python3" "./runtest.py" "futex4" "295" "thread" "192"

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC PATCH v7 08/23] sched: Add core wide task selection and scheduling.
  2020-08-29  7:47       ` peterz
@ 2020-08-31 13:01         ` Vineeth Pillai
  2020-08-31 14:24         ` Joel Fernandes
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 68+ messages in thread
From: Vineeth Pillai @ 2020-08-31 13:01 UTC (permalink / raw)
  To: peterz
  Cc: Julien Desfossez, Joel Fernandes, Tim Chen, Aaron Lu, Aubrey Li,
	Dhaval Giani, Chris Hyser, Nishanth Aravamudan, mingo, tglx, pjt,
	torvalds, linux-kernel, fweisbec, keescook, kerrnel, Phil Auld,
	Valentin Schneider, Mel Gorman, Pawan Gupta, Paolo Bonzini, joel,
	vineeth, Chen Yu, Christian Brauner, Agata Gruza,
	Antonio Gomez Iglesias, graf, konrad.wilk, dfaggioli, rostedt,
	derkling, benbjiang, Vineeth Remanan Pillai, Aaron Lu



On 8/29/20 3:47 AM, peterz@infradead.org wrote:
>> During hotplug stress test, we have noticed that while a sibling is in
>> pick_next_task, another sibling can go offline or come online. What
>> we have observed is smt_mask get updated underneath us even if
>> we hold the lock. From reading the code, looks like we don't hold the
>> rq lock when the mask is updated. This extra logic was to take care of that.
> Sure, the mask is updated async, but _where_ is the actual problem with
> that?
One issue that we observed is that an inconsistent view of smt_mask
across the three loops in pick_next_task can lead to corner cases. The
first loop resets all coresched fields, the second loop picks a task
for each sibling, and then if a sibling came online only by the third
loop, we IPI it and it crashes on a NULL core_pick. Similarly, I think
we might have issues if a sibling goes offline in the last loop, or
goes offline/online during the pick loop.

It might be possible to fix the corner cases above with specific checks
on core_pick in the loops, but I was not sure whether there were more
of them and hence took this approach. I understand that cpumask_weight
is expensive and will try to address the corner cases using core_pick
instead.
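
To make that concrete, a minimal sketch of the kind of core_pick check
meant here (hypothetical code, not from the posted series; the loop
shape is assumed from the pick_next_task excerpts in this thread):

	for_each_cpu(i, smt_mask) {
		struct rq *rq_i = cpu_rq(i);

		/*
		 * A sibling that came online after the reset/pick loops
		 * ran has no selection recorded yet; skip the IPI rather
		 * than dereference a NULL core_pick.
		 */
		if (!rq_i->core_pick)
			continue;

		resched_curr(rq_i);
	}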

>
> On Fri, Aug 28, 2020 at 06:23:55PM -0400, Joel Fernandes wrote:
>> Thanks Vineeth. Peter, also the "v6+" series (which were some addons on v6)
>> detail the individual hotplug changes squashed into this patch:
>> https://lore.kernel.org/lkml/20200815031908.1015049-9-joel@joelfernandes.org/
>> https://lore.kernel.org/lkml/20200815031908.1015049-11-joel@joelfernandes.org/
> That one looks fishy, the pick is core wide, making that pick_seq per rq
> just doesn't make sense.
I think there are a couple of scenarios where a per-sibling pick_seq
will be useful. One is this hotplug case, where you need to pick only
for the online siblings while the offline siblings can come online
asynchronously. The second case we have seen is that we don't need to
mark the pick for siblings when we pick a task that is currently
running. Marking the pick core-wide would make the sibling take the
fast path and re-select the same task during a schedule event caused by
preemption or time-slice expiry.

>> https://lore.kernel.org/lkml/20200815031908.1015049-12-joel@joelfernandes.org/
> This one reads like tinkering, there is no description of the actual
> problem just some code that makes a symptom go away.
>
> Sure, on hotplug the smt mask can change, but only for a CPU that isn't
> actually scheduling, so who cares.
>
> /me re-reads the hotplug code...
>
> ..ooOO is the problem that we clear the cpumasks on take_cpu_down()
> instead of play_dead() ?! That should be fixable.
Yes, we get called into schedule() (for going idle) for an offline cpu
and it gets confused. Also, I think there could be problems while a cpu
comes online as well, as I mentioned above: we might IPI a sibling
which just came online with a NULL core_pick. But I think we can fix it
with specific checks on core_pick.

I shall look into the smt mask update side of the hotplug again and see
if corner cases could be better handled there.

Thanks,
Vineeth


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC PATCH v7 08/23] sched: Add core wide task selection and scheduling.
  2020-08-29  7:47       ` peterz
  2020-08-31 13:01         ` Vineeth Pillai
@ 2020-08-31 14:24         ` Joel Fernandes
  2020-09-01  3:38         ` Joel Fernandes
  2020-09-01  5:10         ` Joel Fernandes
  3 siblings, 0 replies; 68+ messages in thread
From: Joel Fernandes @ 2020-08-31 14:24 UTC (permalink / raw)
  To: peterz
  Cc: Vineeth Pillai, Julien Desfossez, Tim Chen, Aaron Lu, Aubrey Li,
	Dhaval Giani, Chris Hyser, Nishanth Aravamudan, mingo, tglx, pjt,
	torvalds, linux-kernel, fweisbec, keescook, kerrnel, Phil Auld,
	Valentin Schneider, Mel Gorman, Pawan Gupta, Paolo Bonzini,
	vineeth, Chen Yu, Christian Brauner, Agata Gruza,
	Antonio Gomez Iglesias, graf, konrad.wilk, dfaggioli, rostedt,
	derkling, benbjiang, Vineeth Remanan Pillai, Aaron Lu

Hi Peter,

On Sat, Aug 29, 2020 at 09:47:19AM +0200, peterz@infradead.org wrote:
> On Fri, Aug 28, 2020 at 06:02:25PM -0400, Vineeth Pillai wrote:
> > On 8/28/20 4:51 PM, Peter Zijlstra wrote:
> 
> > > So where do things go side-ways?
> 
> > During hotplug stress test, we have noticed that while a sibling is in
> > pick_next_task, another sibling can go offline or come online. What
> > we have observed is smt_mask get updated underneath us even if
> > we hold the lock. From reading the code, looks like we don't hold the
> > rq lock when the mask is updated. This extra logic was to take care of that.
> 
> Sure, the mask is updated async, but _where_ is the actual problem with
> that?
> 
> On Fri, Aug 28, 2020 at 06:23:55PM -0400, Joel Fernandes wrote:
> > Thanks Vineeth. Peter, also the "v6+" series (which were some addons on v6)
> > detail the individual hotplug changes squashed into this patch:
> > https://lore.kernel.org/lkml/20200815031908.1015049-9-joel@joelfernandes.org/
> > https://lore.kernel.org/lkml/20200815031908.1015049-11-joel@joelfernandes.org/
> 
> That one looks fishy, the pick is core wide, making that pick_seq per rq
> just doesn't make sense.

I think Vineeth was trying to handle the case where rq->core_pick happened to
be NULL for an offline CPU, and then schedule() is called when it came online
but its sched_seq != core-wide pick_seq. This situation arises because
a sibling did the selection for the offline CPU and ended up leaving its
rq->core_pick as NULL, since the then-offline CPU was missing from the
cpu_smt_mask, but it incremented the core-wide pick_seq anyway.

Due to this, the pick_next_task() can crash after entering this if() block:
+	if (rq->core_pick_seq == rq->core->core_task_seq &&
+	    rq->core_pick_seq != rq->core_sched_seq) {

How would you suggest fixing it? Maybe we can just assign rq->core_sched_seq
== rq->core_pick_seq for an offline CPU (or any CPU where rq->core_pick ==
NULL), so it does not end up using rq->core_pick and does a full core-wide
selection again when it comes online?

Or easier, check for rq->core_pick == NULL and skip this fast-path if() block
completely.
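
For reference, a minimal sketch of that easier option (an assumption
about how it might look, not a tested patch):

	/*
	 * Skip the fast path entirely when no pick was recorded for
	 * this rq, e.g. because it was offline during the core-wide
	 * selection; fall through to a full selection instead.
	 */
	if (rq->core_pick &&
	    rq->core_pick_seq == rq->core->core_task_seq &&
	    rq->core_pick_seq != rq->core_sched_seq) {
		/* ... use rq->core_pick as before ... */
	}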

> > https://lore.kernel.org/lkml/20200815031908.1015049-12-joel@joelfernandes.org/
> 
> This one reads like tinkering, there is no description of the actual
> problem just some code that makes a symptom go away.
> 
> Sure, on hotplug the smt mask can change, but only for a CPU that isn't
> actually scheduling, so who cares.
> 
> /me re-reads the hotplug code...
> 
> ..ooOO is the problem that we clear the cpumasks on take_cpu_down()
> instead of play_dead() ?! That should be fixable.

I think Vineeth explained this in his email: there is logic across the loops
in pick_next_task() that depends on cpu_smt_mask not changing. I am not
sure if play_dead() will fix it; the issue is seen in the code doing the
selection when cpu_smt_mask changes under it, possibly due to other CPUs
going offline.

For example, you have a splat and a possible NULL pointer dereference in the
loop below if rq_i->core_pick == NULL, because a sibling CPU came online but
no task was selected for it in the for loops prior to this one:

        /*
         * Reschedule siblings
         *
         * NOTE: L1TF -- at this point we're no longer running the old task and
         * sending an IPI (below) ensures the sibling will no longer be running
         * their task. This ensures there is no inter-sibling overlap between
         * non-matching user state.
         */
        for_each_cpu(i, smt_mask) {
                struct rq *rq_i = cpu_rq(i);

                WARN_ON_ONCE(!rq_i->core_pick);

                if (is_idle_task(rq_i->core_pick) && rq_i->nr_running)
                        rq_i->core_forceidle = true;

                rq_i->core_pick->core_occupation = occ;

Probably the code can be rearchitected to not depend on cpu_smt_mask
changing. What I did in my old tree was to make a copy of cpu_smt_mask at
the beginning of this function, and that makes all the problems go away. But
I was afraid of the overhead of that copying.

(btw, I would not complain one bit if this function was nuked and rewritten
to be simpler).

> > https://lore.kernel.org/lkml/20200815031908.1015049-13-joel@joelfernandes.org/
> 
> This is the only one that makes some sense, it makes rq->core consistent
> over hotplug.

Cool at least we got one thing right ;)

> > Agreed we can split the patches for the next series, however for final
> > upstream merge, I suggest we fix hotplug issues in this patch itself so that
> > we don't break bisectability.
> 
> Meh, who sodding cares about hotplug :-). Also you can 'fix' such things
> by making sure you can't actually enable core-sched until after
> everything is in place.

Fair enough :)

thanks,

 - Joel


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC PATCH v7 08/23] sched: Add core wide task selection and scheduling.
  2020-08-29  7:47       ` peterz
  2020-08-31 13:01         ` Vineeth Pillai
  2020-08-31 14:24         ` Joel Fernandes
@ 2020-09-01  3:38         ` Joel Fernandes
  2020-09-01  5:10         ` Joel Fernandes
  3 siblings, 0 replies; 68+ messages in thread
From: Joel Fernandes @ 2020-09-01  3:38 UTC (permalink / raw)
  To: peterz
  Cc: Vineeth Pillai, Julien Desfossez, Tim Chen, Aaron Lu, Aubrey Li,
	Dhaval Giani, Chris Hyser, Nishanth Aravamudan, mingo, tglx, pjt,
	torvalds, linux-kernel, fweisbec, keescook, kerrnel, Phil Auld,
	Valentin Schneider, Mel Gorman, Pawan Gupta, Paolo Bonzini,
	vineeth, Chen Yu, Christian Brauner, Agata Gruza,
	Antonio Gomez Iglesias, graf, konrad.wilk, dfaggioli, rostedt,
	derkling, benbjiang, Vineeth Remanan Pillai, Aaron Lu

On Sat, Aug 29, 2020 at 09:47:19AM +0200, peterz@infradead.org wrote:
> On Fri, Aug 28, 2020 at 06:02:25PM -0400, Vineeth Pillai wrote:
> > On 8/28/20 4:51 PM, Peter Zijlstra wrote:
> 
> > > So where do things go side-ways?
> 
> > During hotplug stress test, we have noticed that while a sibling is in
> > pick_next_task, another sibling can go offline or come online. What
> > we have observed is smt_mask get updated underneath us even if
> > we hold the lock. From reading the code, looks like we don't hold the
> > rq lock when the mask is updated. This extra logic was to take care of that.
> 
> Sure, the mask is updated async, but _where_ is the actual problem with
> that?
> 
> On Fri, Aug 28, 2020 at 06:23:55PM -0400, Joel Fernandes wrote:
> > Thanks Vineeth. Peter, also the "v6+" series (which were some addons on v6)
> > detail the individual hotplug changes squashed into this patch:
> > https://lore.kernel.org/lkml/20200815031908.1015049-9-joel@joelfernandes.org/
> > https://lore.kernel.org/lkml/20200815031908.1015049-11-joel@joelfernandes.org/
> 
> That one looks fishy, the pick is core wide, making that pick_seq per rq
> just doesn't make sense.
> 
> > https://lore.kernel.org/lkml/20200815031908.1015049-12-joel@joelfernandes.org/
> 
> This one reads like tinkering, there is no description of the actual
> problem just some code that makes a symptom go away.
> 
> Sure, on hotplug the smt mask can change, but only for a CPU that isn't
> actually scheduling, so who cares.
> 
> /me re-reads the hotplug code...
> 
> ..ooOO is the problem that we clear the cpumasks on take_cpu_down()
> instead of play_dead() ?! That should be fixable.

That is indeed the problem.

I was wondering: is there any harm in just selecting the idle task if the
CPU calling schedule() is missing from cpu_smt_mask? Does it need to do a
core-wide selection?

That would be best, and would avoid any unnecessary surgery on the already
complicated function. This is sort of what Tim was doing in v4 and v5.
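
A rough sketch of that idea, purely as an assumption about its shape
(the early-exit label and placement are hypothetical):

	/*
	 * If this CPU is not (or no longer) in its core's SMT mask,
	 * e.g. mid-hotplug, just run the idle task instead of doing
	 * a core-wide selection.
	 */
	if (!cpumask_test_cpu(cpu, cpu_smt_mask(cpu))) {
		next = rq->idle;
		goto done;	/* hypothetical early-exit label */
	}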

Also, what do we do if cpu_smt_mask changes while this function is running?
I tried something like the following and it solves the issues, but the
overhead probably really sucks. I was thinking of doing a variation of the
below that just stores the cpu_smt_mask's rq pointers in an array of size
SMTS_PER_CORE on the stack, instead of in a cpumask, but I am not sure that
will keep the code clean while still having similar storage overhead; a
rough sketch of that array idea follows after this paragraph.
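
Roughly, the array variant would look like this (SMTS_PER_CORE and the
exact loop shape are assumptions, not from any posted patch):

	/*
	 * Snapshot the siblings' rq pointers once, so the later
	 * reset/pick/IPI loops are immune to cpu_smt_mask changing
	 * underneath us.
	 */
	struct rq *core_rqs[SMTS_PER_CORE];
	int nr_siblings = 0;

	for_each_cpu(i, cpu_smt_mask(cpu))
		core_rqs[nr_siblings++] = cpu_rq(i);

	for (i = 0; i < nr_siblings; i++) {
		struct rq *rq_i = core_rqs[i];
		/* ... reset/pick/IPI using rq_i as before ... */
	}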

---8<-----------------------

From 5e905e7e620177075a9bcf78fb0dc89a136434bb Mon Sep 17 00:00:00 2001
From: Joel Fernandes <joelaf@google.com>
Date: Tue, 30 Jun 2020 19:39:45 -0400
Subject: [PATCH] Fix CPU hotplug causing crashes in task selection logic

Signed-off-by: Joel Fernandes <joelaf@google.com>
---
 kernel/sched/core.c | 34 ++++++++++++++++++++++++++++------
 1 file changed, 28 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0362102fa3d2..47a21013ba0d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4464,7 +4464,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 {
 	struct task_struct *next, *max = NULL;
 	const struct sched_class *class;
-	const struct cpumask *smt_mask;
+	struct cpumask select_mask;
 	int i, j, cpu, occ = 0;
 	bool need_sync;
 
@@ -4499,7 +4499,13 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 	finish_prev_task(rq, prev, rf);
 
 	cpu = cpu_of(rq);
-	smt_mask = cpu_smt_mask(cpu);
+	cpumask_copy(&select_mask, cpu_smt_mask(cpu));
+
+	/*
+	 * Always make sure current CPU is added to smt_mask so that below
+	 * selection logic runs on it.
+	 */
+	cpumask_set_cpu(cpu, &select_mask);
 
 	/*
 	 * core->core_task_seq, core->core_pick_seq, rq->core_sched_seq
@@ -4516,7 +4522,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 
 	/* reset state */
 	rq->core->core_cookie = 0UL;
-	for_each_cpu(i, smt_mask) {
+	for_each_cpu(i, &select_mask) {
 		struct rq *rq_i = cpu_rq(i);
 
 		rq_i->core_pick = NULL;
@@ -4536,7 +4542,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 	 */
 	for_each_class(class) {
 again:
-		for_each_cpu_wrap(i, smt_mask, cpu) {
+		for_each_cpu_wrap(i, &select_mask, cpu) {
 			struct rq *rq_i = cpu_rq(i);
 			struct task_struct *p;
 
@@ -4600,7 +4608,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 				trace_printk("max: %s/%d %lx\n", max->comm, max->pid, max->core_cookie);
 
 				if (old_max) {
-					for_each_cpu(j, smt_mask) {
+					for_each_cpu(j, &select_mask) {
 						if (j == i)
 							continue;
 
@@ -4625,6 +4633,10 @@ next_class:;
 
 	rq->core->core_pick_seq = rq->core->core_task_seq;
 	next = rq->core_pick;
+
+	/* Something should have been selected for the current CPU. */
+	WARN_ON_ONCE(!next);
+
 	rq->core_sched_seq = rq->core->core_pick_seq;
 	trace_printk("picked: %s/%d %lx\n", next->comm, next->pid, next->core_cookie);
 
@@ -4636,7 +4648,7 @@ next_class:;
 	 * their task. This ensures there is no inter-sibling overlap between
 	 * non-matching user state.
 	 */
-	for_each_cpu(i, smt_mask) {
+	for_each_cpu(i, &select_mask) {
 		struct rq *rq_i = cpu_rq(i);
 
 		WARN_ON_ONCE(!rq_i->core_pick);
-- 
2.28.0.402.g5ffc5be6b7-goog



* Re: [RFC PATCH v7 08/23] sched: Add core wide task selection and scheduling.
  2020-08-29  7:47       ` peterz
                           ` (2 preceding siblings ...)
  2020-09-01  3:38         ` Joel Fernandes
@ 2020-09-01  5:10         ` Joel Fernandes
  2020-09-01 12:34           ` Vineeth Pillai
  3 siblings, 1 reply; 68+ messages in thread
From: Joel Fernandes @ 2020-09-01  5:10 UTC (permalink / raw)
  To: peterz
  Cc: Vineeth Pillai, Julien Desfossez, Tim Chen, Aaron Lu, Aubrey Li,
	Dhaval Giani, Chris Hyser, Nishanth Aravamudan, mingo, tglx, pjt,
	torvalds, linux-kernel, fweisbec, keescook, kerrnel, Phil Auld,
	Valentin Schneider, Mel Gorman, Pawan Gupta, Paolo Bonzini,
	vineeth, Chen Yu, Christian Brauner, Agata Gruza,
	Antonio Gomez Iglesias, graf, konrad.wilk, dfaggioli, rostedt,
	derkling, benbjiang, Vineeth Remanan Pillai, Aaron Lu

On Sat, Aug 29, 2020 at 09:47:19AM +0200, peterz@infradead.org wrote:
> On Fri, Aug 28, 2020 at 06:02:25PM -0400, Vineeth Pillai wrote:
> > On 8/28/20 4:51 PM, Peter Zijlstra wrote:
> 
> > > So where do things go side-ways?
> 
> > During hotplug stress tests, we have noticed that while a sibling is in
> > pick_next_task, another sibling can go offline or come online. What
> > we have observed is that smt_mask gets updated underneath us even if
> > we hold the lock. From reading the code, it looks like we don't hold the
> > rq lock when the mask is updated. This extra logic was to take care of that.
> 
> Sure, the mask is updated async, but _where_ is the actual problem with
> that?

Hi Peter,

I tried again and came up with the simple patch below, which handles all the
issues and does not cause any more crashes. I added elaborate commit messages
and code comments listing all the issues. Hope it makes sense now. IMHO any
other solution seems unclear or adds overhead. The simple solution below Just
Works (Tm) and does not add overhead.

Let me know what you think, thanks.

---8<-----------------------

From 546c5b48f372111589117f51fd79ac1e9493c7e7 Mon Sep 17 00:00:00 2001
From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Date: Tue, 1 Sep 2020 00:56:36 -0400
Subject: [PATCH] sched/core: Hotplug fixes to pick_next_task()

The following 3 cases need to be handled to avoid crashes in pick_next_task()
when CPUs in a core are going offline or coming online.

1. The stopper task is switching into idle when it is brought down by CPU
hotplug. It is not in the cpu_smt_mask, so nothing needs to be selected for it.
Further, the current code ends up not selecting anything for it, not even idle,
which causes crashes in set_next_task(). Just do the __pick_next_task()
selection, which will select the idle task. There is no need to do a core-wide
selection, as the other siblings will handle it for themselves when they call
schedule().

2. The rq->core_pick for a sibling in a core can be NULL if no selection was
made for it because it was either offline or went offline during a sibling's
core-wide selection. In this case, do a core-wide selection and completely
ignore the checks:
        if (rq->core->core_pick_seq == rq->core->core_task_seq &&
	    rq->core->core_pick_seq != rq->core_sched_seq)

Otherwise, it would again end up crashing like #1.

3. The 'Rescheduling siblings' loop of pick_next_task() is quite fragile. It
calls various functions on rq->core_pick, which could very well be NULL, because
an online sibling might have gone offline before a task could be picked for it,
or it might be offline but later happen to come online, but it's too late and
nothing was picked for it. Just ignore the siblings for which nothing could be
picked. This avoids any crashes that may occur in this loop that assume
rq->core_pick is not NULL.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/sched/core.c | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 717122a3dca1..4966e9f14f39 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4610,13 +4610,24 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 	if (!sched_core_enabled(rq))
 		return __pick_next_task(rq, prev, rf);
 
+	cpu = cpu_of(rq);
+
+	/* Stopper task is switching into idle, no need for core-wide selection. */
+	if (cpu_is_offline(cpu))
+		return __pick_next_task(rq, prev, rf);
+
 	/*
 	 * If there were no {en,de}queues since we picked (IOW, the task
 	 * pointers are all still valid), and we haven't scheduled the last
 	 * pick yet, do so now.
+	 *
+	 * rq->core_pick can be NULL if no selection was made for a CPU because
+	 * it was either offline or went offline during a sibling's core-wide
+	 * selection. In this case, do a core-wide selection.
 	 */
 	if (rq->core->core_pick_seq == rq->core->core_task_seq &&
-	    rq->core->core_pick_seq != rq->core_sched_seq) {
+	    rq->core->core_pick_seq != rq->core_sched_seq &&
+	    !rq->core_pick) {
 		WRITE_ONCE(rq->core_sched_seq, rq->core->core_pick_seq);
 
 		next = rq->core_pick;
@@ -4629,7 +4640,6 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 
 	put_prev_task_balance(rq, prev, rf);
 
-	cpu = cpu_of(rq);
 	smt_mask = cpu_smt_mask(cpu);
 
 	/*
@@ -4761,7 +4771,15 @@ next_class:;
 	for_each_cpu(i, smt_mask) {
 		struct rq *rq_i = cpu_rq(i);
 
-		WARN_ON_ONCE(!rq_i->core_pick);
+		/*
+		 * An online sibling might have gone offline before a task
+		 * could be picked for it, or it might be offline but later
+		 * happen to come online, but its too late and nothing was
+		 * picked for it.  That's Ok - it will pick tasks for itself,
+		 * so ignore it.
+		 */
+		if (!rq_i->core_pick)
+			continue;
 
 		if (is_idle_task(rq_i->core_pick) && rq_i->nr_running)
 			rq_i->core_forceidle = true;
-- 
2.28.0.402.g5ffc5be6b7-goog



* Re: [RFC PATCH v7 08/23] sched: Add core wide task selection and scheduling.
  2020-09-01  5:10         ` Joel Fernandes
@ 2020-09-01 12:34           ` Vineeth Pillai
  2020-09-01 17:30             ` Joel Fernandes
  0 siblings, 1 reply; 68+ messages in thread
From: Vineeth Pillai @ 2020-09-01 12:34 UTC (permalink / raw)
  To: Joel Fernandes, peterz
  Cc: Julien Desfossez, Tim Chen, Aaron Lu, Aubrey Li, Dhaval Giani,
	Chris Hyser, Nishanth Aravamudan, mingo, tglx, pjt, torvalds,
	linux-kernel, fweisbec, keescook, kerrnel, Phil Auld,
	Valentin Schneider, Mel Gorman, Pawan Gupta, Paolo Bonzini,
	vineeth, Chen Yu, Christian Brauner, Agata Gruza,
	Antonio Gomez Iglesias, graf, konrad.wilk, dfaggioli, rostedt,
	derkling, benbjiang, Vineeth Remanan Pillai, Aaron Lu

Hi Joel,

On 9/1/20 1:10 AM, Joel Fernandes wrote:
> 3. The 'Rescheduling siblings' loop of pick_next_task() is quite fragile. It
> calls various functions on rq->core_pick, which could very well be NULL, because
> an online sibling might have gone offline before a task could be picked for it,
> or it might be offline but later happen to come online, but it's too late and
> nothing was picked for it. Just ignore the siblings for which nothing could be
> picked. This avoids any crashes that may occur in this loop that assume
> rq->core_pick is not NULL.
>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
I like this idea, it's much simpler :-)

> ---
>   kernel/sched/core.c | 24 +++++++++++++++++++++---
>   1 file changed, 21 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 717122a3dca1..4966e9f14f39 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4610,13 +4610,24 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
>   	if (!sched_core_enabled(rq))
>   		return __pick_next_task(rq, prev, rf);
>   
> +	cpu = cpu_of(rq);
> +
> > +	/* Stopper task is switching into idle, no need for core-wide selection. */
I think we can also come here when the hotplug thread is scheduled during
onlining but the mask is not yet updated. Probably we can note that in this
comment as well.

> +	if (cpu_is_offline(cpu))
> +		return __pick_next_task(rq, prev, rf);
> +
We would need to reset core_pick here, I think. Something like:
     if (cpu_is_offline(cpu)) {
         rq->core_pick = NULL;
         return __pick_next_task(rq, prev, rf);
     }

Without this we can end up in a crash like this:
1. The sibling of this cpu picks a task (rq_i->core_pick) and this cpu goes
    offline soon after.
2. Before this cpu comes back online, the sibling goes through another pick
    loop and, before its IPI loop, this cpu comes online and we get an IPI.
3. So when this cpu gets into schedule(), we have core_pick set and
    core_pick_seq != core_sched_seq. So we enter the fast path. But
    core_pick might no longer be in this runqueue.

So, to protect against this, we should reset core_pick, I think. I have seen
this crash occasionally.

>   	/*
>   	 * If there were no {en,de}queues since we picked (IOW, the task
>   	 * pointers are all still valid), and we haven't scheduled the last
>   	 * pick yet, do so now.
> +	 *
> +	 * rq->core_pick can be NULL if no selection was made for a CPU because
> +	 * it was either offline or went offline during a sibling's core-wide
> +	 * selection. In this case, do a core-wide selection.
>   	 */
>   	if (rq->core->core_pick_seq == rq->core->core_task_seq &&
> -	    rq->core->core_pick_seq != rq->core_sched_seq) {
> +	    rq->core->core_pick_seq != rq->core_sched_seq &&
> +	    !rq->core_pick) {
Should this check be reversed? I mean, we should enter the fast path if
rq->core_pick is set, right?


Another unrelated, but related note :-)
Besides this, I think we need to retain one more change from the previous
patch. We would need to make core_pick_seq per sibling instead of per
core. Having it per core might lead to unfairness. For example: when a cpu
sees that its sibling's core_pick is the one which is already running, it
will not send an IPI, but core_pick remains set and core->core_pick_seq is
incremented. Now if the sibling is preempted due to a high priority task
or its time slice expiring, it enters schedule(). But it goes to the fast path
and selects the running task, thereby starving the high priority task. Having
the core_pick_seq per sibling will avoid this. It might also help in some
hotplug corner cases as well.
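
Roughly, what I mean is something of this shape (sketch only; the field
placement and exact names are illustrative):

	/* In struct rq: a per-sibling pick sequence, bumped only when a
	 * pick is actually published for *this* runqueue: */
	unsigned long		core_pick_seq;

	/* Publishing side, in the core-wide selection loop: */
	rq_i->core_pick = p;
	rq_i->core_pick_seq++;

	/* Consuming side, the fast path in pick_next_task(); the rq
	 * copies core_pick_seq into core_sched_seq once the pick is
	 * actually scheduled: */
	if (rq->core_pick && rq->core_pick_seq != rq->core_sched_seq) {
		/* schedule rq->core_pick */
	}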

Thanks,
Vineeth



* Re: [RFC PATCH v7 17/23] kernel/entry: Add support for core-wide protection of kernel-mode
  2020-08-28 19:51 ` [RFC PATCH v7 17/23] kernel/entry: Add support for core-wide protection of kernel-mode Julien Desfossez
  2020-08-30  6:50   ` [kernel/entry] 872a0a3f0b: will-it-scale.per_thread_ops -18.7% regression kernel test robot
@ 2020-09-01 15:54   ` Thomas Gleixner
  2020-09-01 16:50     ` Joel Fernandes
  1 sibling, 1 reply; 68+ messages in thread
From: Thomas Gleixner @ 2020-09-01 15:54 UTC (permalink / raw)
  To: Julien Desfossez, Peter Zijlstra, Vineeth Pillai, Joel Fernandes,
	Tim Chen, Aaron Lu, Aubrey Li, Dhaval Giani, Chris Hyser,
	Nishanth Aravamudan
  Cc: Joel Fernandes (Google),
	mingo, pjt, torvalds, linux-kernel, fweisbec, keescook, kerrnel,
	Phil Auld, Valentin Schneider, Mel Gorman, Pawan Gupta,
	Paolo Bonzini, vineeth, Chen Yu, Christian Brauner, Agata Gruza,
	Antonio Gomez Iglesias, graf, konrad.wilk, dfaggioli, rostedt,
	derkling, benbjiang, Julien Desfossez, Aubrey Li, Tim Chen,
	Paul E . McKenney

On Fri, Aug 28 2020 at 15:51, Julien Desfossez wrote:
> @@ -112,59 +113,84 @@ static __always_inline void exit_to_user_mode(void)
>  /* Workaround to allow gradual conversion of architecture code */
>  void __weak arch_do_signal(struct pt_regs *regs) { }
>  
> -static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
> -					    unsigned long ti_work)

Can the rework of that code please be split out into a separate patch,
and that unsafe muck then added on top?

> +static inline bool exit_to_user_mode_work_pending(unsigned long ti_work)
>  {
> -	/*
> -	 * Before returning to user space ensure that all pending work
> -	 * items have been completed.
> -	 */
> -	while (ti_work & EXIT_TO_USER_MODE_WORK) {
> +	return (ti_work & EXIT_TO_USER_MODE_WORK);
> +}
>  
> -		local_irq_enable_exit_to_user(ti_work);
> +static inline void exit_to_user_mode_work(struct pt_regs *regs,
> +					  unsigned long ti_work)
> +{
>  
> -		if (ti_work & _TIF_NEED_RESCHED)
> -			schedule();
> +	local_irq_enable_exit_to_user(ti_work);
>  
> -		if (ti_work & _TIF_UPROBE)
> -			uprobe_notify_resume(regs);
> +	if (ti_work & _TIF_NEED_RESCHED)
> +		schedule();
>  
> -		if (ti_work & _TIF_PATCH_PENDING)
> -			klp_update_patch_state(current);
> +	if (ti_work & _TIF_UPROBE)
> +		uprobe_notify_resume(regs);
>  
> -		if (ti_work & _TIF_SIGPENDING)
> -			arch_do_signal(regs);
> +	if (ti_work & _TIF_PATCH_PENDING)
> +		klp_update_patch_state(current);
>  
> -		if (ti_work & _TIF_NOTIFY_RESUME) {
> -			clear_thread_flag(TIF_NOTIFY_RESUME);
> -			tracehook_notify_resume(regs);
> -			rseq_handle_notify_resume(NULL, regs);
> -		}
> +	if (ti_work & _TIF_SIGPENDING)
> +		arch_do_signal(regs);
> +
> +	if (ti_work & _TIF_NOTIFY_RESUME) {
> +		clear_thread_flag(TIF_NOTIFY_RESUME);
> +		tracehook_notify_resume(regs);
> +		rseq_handle_notify_resume(NULL, regs);
> +	}
> +
> +	/* Architecture specific TIF work */
> +	arch_exit_to_user_mode_work(regs, ti_work);
> +
> +	local_irq_disable_exit_to_user();
> +}
>  
> -		/* Architecture specific TIF work */
> -		arch_exit_to_user_mode_work(regs, ti_work);
> +static unsigned long exit_to_user_mode_loop(struct pt_regs *regs)
> +{
> +	unsigned long ti_work = READ_ONCE(current_thread_info()->flags);
> +
> +	/*
> +	 * Before returning to user space ensure that all pending work
> +	 * items have been completed.
> +	 */
> +	while (1) {
> +		/* Both interrupts and preemption could be enabled here. */

   What? Preemption is enabled here, but interrupts are disabled.

> +		if (exit_to_user_mode_work_pending(ti_work))
> +		    exit_to_user_mode_work(regs, ti_work);
> +
> +		/* Interrupts may be reenabled with preemption disabled. */
> +		sched_core_unsafe_exit_wait(EXIT_TO_USER_MODE_WORK);
> +
>  		/*
> -		 * Disable interrupts and reevaluate the work flags as they
> -		 * might have changed while interrupts and preemption was
> -		 * enabled above.
> +		 * Reevaluate the work flags as they might have changed
> +		 * while interrupts and preemption were enabled.

What enables preemption and interrupts? Can you pretty please write
comments which explain what's going on.

>  		 */
> -		local_irq_disable_exit_to_user();
>  		ti_work = READ_ONCE(current_thread_info()->flags);
> +
> +		/*
> +		 * We may be switching out to another task in kernel mode. That
> +		 * process is responsible for exiting the "unsafe" kernel mode
> +		 * when it returns to user or guest.
> +		 */
> +		if (exit_to_user_mode_work_pending(ti_work))
> +			sched_core_unsafe_enter();
> +		else
> +			break;
>  	}
>  
> -	/* Return the latest work state for arch_exit_to_user_mode() */
> -	return ti_work;
> +    return ti_work;
>  }
>  
>  static void exit_to_user_mode_prepare(struct pt_regs *regs)
>  {
> -	unsigned long ti_work = READ_ONCE(current_thread_info()->flags);
> +	unsigned long ti_work;
>  
>  	lockdep_assert_irqs_disabled();
>  
> -	if (unlikely(ti_work & EXIT_TO_USER_MODE_WORK))
> -		ti_work = exit_to_user_mode_loop(regs, ti_work);
> +	ti_work = exit_to_user_mode_loop(regs);

Why does the above loop have to be invoked unconditionally even when that
core scheduling muck is disabled? Just to make all return-to-user paths more
expensive unconditionally, right?

Thanks,

        tglx




* Re: [RFC PATCH v7 17/23] kernel/entry: Add support for core-wide protection of kernel-mode
  2020-09-01 15:54   ` [RFC PATCH v7 17/23] kernel/entry: Add support for core-wide protection of kernel-mode Thomas Gleixner
@ 2020-09-01 16:50     ` Joel Fernandes
  2020-09-01 20:02       ` Thomas Gleixner
  0 siblings, 1 reply; 68+ messages in thread
From: Joel Fernandes @ 2020-09-01 16:50 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Julien Desfossez, Peter Zijlstra, Vineeth Pillai, Tim Chen,
	Aaron Lu, Aubrey Li, Dhaval Giani, Chris Hyser,
	Nishanth Aravamudan, mingo, pjt, torvalds, linux-kernel,
	fweisbec, keescook, kerrnel, Phil Auld, Valentin Schneider,
	Mel Gorman, Pawan Gupta, Paolo Bonzini, vineeth, Chen Yu,
	Christian Brauner, Agata Gruza, Antonio Gomez Iglesias, graf,
	konrad.wilk, dfaggioli, rostedt, derkling, benbjiang, Aubrey Li,
	Tim Chen, Paul E . McKenney

On Tue, Sep 01, 2020 at 05:54:53PM +0200, Thomas Gleixner wrote:
> On Fri, Aug 28 2020 at 15:51, Julien Desfossez wrote:
> > @@ -112,59 +113,84 @@ static __always_inline void exit_to_user_mode(void)
> >  /* Workaround to allow gradual conversion of architecture code */
> >  void __weak arch_do_signal(struct pt_regs *regs) { }
> >  
> > -static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
> > -					    unsigned long ti_work)
> 
> Can the rework of that code please be split out into a separate patch,
> and that unsafe muck then added on top?

Yeah, good idea. Will do.

> > +static inline bool exit_to_user_mode_work_pending(unsigned long ti_work)
> >  {
> > -	/*
> > -	 * Before returning to user space ensure that all pending work
> > -	 * items have been completed.
> > -	 */
> > -	while (ti_work & EXIT_TO_USER_MODE_WORK) {
> > +	return (ti_work & EXIT_TO_USER_MODE_WORK);
> > +}
> >  
> > -		local_irq_enable_exit_to_user(ti_work);
> > +static inline void exit_to_user_mode_work(struct pt_regs *regs,
> > +					  unsigned long ti_work)
> > +{
> >  
> > -		if (ti_work & _TIF_NEED_RESCHED)
> > -			schedule();
> > +	local_irq_enable_exit_to_user(ti_work);
> >  
> > -		if (ti_work & _TIF_UPROBE)
> > -			uprobe_notify_resume(regs);
> > +	if (ti_work & _TIF_NEED_RESCHED)
> > +		schedule();
> >  
> > -		if (ti_work & _TIF_PATCH_PENDING)
> > -			klp_update_patch_state(current);
> > +	if (ti_work & _TIF_UPROBE)
> > +		uprobe_notify_resume(regs);
> >  
> > -		if (ti_work & _TIF_SIGPENDING)
> > -			arch_do_signal(regs);
> > +	if (ti_work & _TIF_PATCH_PENDING)
> > +		klp_update_patch_state(current);
> >  
> > -		if (ti_work & _TIF_NOTIFY_RESUME) {
> > -			clear_thread_flag(TIF_NOTIFY_RESUME);
> > -			tracehook_notify_resume(regs);
> > -			rseq_handle_notify_resume(NULL, regs);
> > -		}
> > +	if (ti_work & _TIF_SIGPENDING)
> > +		arch_do_signal(regs);
> > +
> > +	if (ti_work & _TIF_NOTIFY_RESUME) {
> > +		clear_thread_flag(TIF_NOTIFY_RESUME);
> > +		tracehook_notify_resume(regs);
> > +		rseq_handle_notify_resume(NULL, regs);
> > +	}
> > +
> > +	/* Architecture specific TIF work */
> > +	arch_exit_to_user_mode_work(regs, ti_work);
> > +
> > +	local_irq_disable_exit_to_user();
> > +}
> >  
> > -		/* Architecture specific TIF work */
> > -		arch_exit_to_user_mode_work(regs, ti_work);
> > +static unsigned long exit_to_user_mode_loop(struct pt_regs *regs)
> > +{
> > +	unsigned long ti_work = READ_ONCE(current_thread_info()->flags);
> > +
> > +	/*
> > +	 * Before returning to user space ensure that all pending work
> > +	 * items have been completed.
> > +	 */
> > +	while (1) {
> > +		/* Both interrupts and preemption could be enabled here. */
> 
>    What? Preemption is enabled here, but interrupts are disabled.

Sorry, I meant what happens within exit_to_user_mode_work(); that's
what the comment was for. I agree, I will do better with the comment next
time.

> > +		if (exit_to_user_mode_work_pending(ti_work))
> > +		    exit_to_user_mode_work(regs, ti_work);
> > +
> > +		/* Interrupts may be reenabled with preemption disabled. */
> > +		sched_core_unsafe_exit_wait(EXIT_TO_USER_MODE_WORK);
> > +
> >  		/*
> > -		 * Disable interrupts and reevaluate the work flags as they
> > -		 * might have changed while interrupts and preemption was
> > -		 * enabled above.
> > +		 * Reevaluate the work flags as they might have changed
> > +		 * while interrupts and preemption were enabled.
> 
> What enables preemption and interrupts? Can you pretty please write
> comments which explain what's going on.

Yes, sorry. So, sched_core_unsafe_exit_wait() will spin with IRQs enabled and
preemption disabled. I did it this way to get past the stop_machine() issues,
where you were unhappy with us spinning in an IRQ-disabled region.
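
To make that concrete, the waiting part has roughly this shape (simplified
sketch; sched_core_kernel_unsafe() is a made-up name for the wait condition,
not the exact function in the patch):

	preempt_disable();
	local_irq_enable();

	/*
	 * Spin until the core is "safe" again. IRQs stay enabled so
	 * stop_machine() and other IPIs can still get in; pending TIF
	 * work (e.g. NEED_RESCHED set by the stopper) breaks us out
	 * early so the caller reruns the exit-to-user work loop.
	 */
	while (sched_core_kernel_unsafe() &&
	       !(READ_ONCE(current_thread_info()->flags) &
		 EXIT_TO_USER_MODE_WORK))
		cpu_relax();

	local_irq_disable();
	preempt_enable();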

> >  		 */
> > -		local_irq_disable_exit_to_user();
> >  		ti_work = READ_ONCE(current_thread_info()->flags);
> > +
> > +		/*
> > +		 * We may be switching out to another task in kernel mode. That
> > +		 * process is responsible for exiting the "unsafe" kernel mode
> > +		 * when it returns to user or guest.
> > +		 */
> > +		if (exit_to_user_mode_work_pending(ti_work))
> > +			sched_core_unsafe_enter();
> > +		else
> > +			break;
> >  	}
> >  
> > -	/* Return the latest work state for arch_exit_to_user_mode() */
> > -	return ti_work;
> > +    return ti_work;
> >  }
> >  
> >  static void exit_to_user_mode_prepare(struct pt_regs *regs)
> >  {
> > -	unsigned long ti_work = READ_ONCE(current_thread_info()->flags);
> > +	unsigned long ti_work;
> >  
> >  	lockdep_assert_irqs_disabled();
> >  
> > -	if (unlikely(ti_work & EXIT_TO_USER_MODE_WORK))
> > -		ti_work = exit_to_user_mode_loop(regs, ti_work);
> > +	ti_work = exit_to_user_mode_loop(regs);
> 
> Why does the above loop have to be invoked unconditionally even when that
> core scheduling muck is disabled? Just to make all return-to-user paths more
> expensive unconditionally, right?

If you see the above loop, we are calling exit_to_user_mode_work()
conditionally by calling exit_to_user_mode_work_pending() which does the same
thing.

So we are still conditionally doing the usual exit_to_user_mode work; it's
just that now we have to unconditionally invoke the 'loop' anyway. The reason
for that is that the loop can switch into another thread, so we have to do
unsafe_exit() for the old thread and unsafe_enter() for the new one while
handling the TIF work properly. We could get migrated to another CPU in this
loop itself, AFAICS. So the unsafe_enter() / unsafe_exit() calls could also
happen on different CPUs.

I will rework it by splitting, and adding more elaborate comments, etc.
Thanks,

 - Joel




* Re: [RFC PATCH v7 08/23] sched: Add core wide task selection and scheduling.
  2020-09-01 12:34           ` Vineeth Pillai
@ 2020-09-01 17:30             ` Joel Fernandes
  2020-09-01 21:23               ` Vineeth Pillai
  0 siblings, 1 reply; 68+ messages in thread
From: Joel Fernandes @ 2020-09-01 17:30 UTC (permalink / raw)
  To: Vineeth Pillai
  Cc: peterz, Julien Desfossez, Tim Chen, Aaron Lu, Aubrey Li,
	Dhaval Giani, Chris Hyser, Nishanth Aravamudan, mingo, tglx, pjt,
	torvalds, linux-kernel, fweisbec, keescook, kerrnel, Phil Auld,
	Valentin Schneider, Mel Gorman, Pawan Gupta, Paolo Bonzini,
	vineeth, Chen Yu, Christian Brauner, Agata Gruza,
	Antonio Gomez Iglesias, graf, konrad.wilk, dfaggioli, rostedt,
	derkling, benbjiang, Vineeth Remanan Pillai, Aaron Lu

Hi Vineeth,

On Tue, Sep 01, 2020 at 08:34:23AM -0400, Vineeth Pillai wrote:
> Hi Joel,
> 
> On 9/1/20 1:10 AM, Joel Fernandes wrote:
> > 3. The 'Rescheduling siblings' loop of pick_next_task() is quite fragile. It
> > calls various functions on rq->core_pick, which could very well be NULL, because
> > an online sibling might have gone offline before a task could be picked for it,
> > or it might be offline but later happen to come online, but it's too late and
> > nothing was picked for it. Just ignore the siblings for which nothing could be
> > picked. This avoids any crashes that may occur in this loop that assume
> > rq->core_pick is not NULL.
> > 
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> I like this idea, it's much simpler :-)

Thanks.

> > ---
> >   kernel/sched/core.c | 24 +++++++++++++++++++++---
> >   1 file changed, 21 insertions(+), 3 deletions(-)
> > 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 717122a3dca1..4966e9f14f39 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -4610,13 +4610,24 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
> >   	if (!sched_core_enabled(rq))
> >   		return __pick_next_task(rq, prev, rf);
> > +	cpu = cpu_of(rq);
> > +
> > +	/* Stopper task is switching into idle, no need for core-wide selection. */
>
> I think we can also come here when the hotplug thread is scheduled during
> onlining but the mask is not yet updated. Probably we can note that in this
> comment as well.
> 

I don't see how that is possible. Because the cpuhp threads run during the
CPU onlining process, the boot thread for the CPU coming online would have
already updated the mask.

> > +	if (cpu_is_offline(cpu))
> > +		return __pick_next_task(rq, prev, rf);
> > +
> We would need to reset core_pick here, I think. Something like:
>     if (cpu_is_offline(cpu)) {
>         rq->core_pick = NULL;
>         return __pick_next_task(rq, prev, rf);
>     }
> 
> Without this we can end up in a crash like this:
> 1. The sibling of this cpu picks a task (rq_i->core_pick) and this cpu goes
>     offline soon after.
> 2. Before this cpu comes back online, the sibling goes through another pick
>     loop and, before its IPI loop, this cpu comes online and we get an IPI.
> 3. So when this cpu gets into schedule(), we have core_pick set and
>     core_pick_seq != core_sched_seq. So we enter the fast path. But
>     core_pick might no longer be in this runqueue.
>
> So, to protect against this, we should reset core_pick, I think. I have seen
> this crash occasionally.

Ok, done.

> >   	/*
> >   	 * If there were no {en,de}queues since we picked (IOW, the task
> >   	 * pointers are all still valid), and we haven't scheduled the last
> >   	 * pick yet, do so now.
> > +	 *
> > +	 * rq->core_pick can be NULL if no selection was made for a CPU because
> > +	 * it was either offline or went offline during a sibling's core-wide
> > +	 * selection. In this case, do a core-wide selection.
> >   	 */
> >   	if (rq->core->core_pick_seq == rq->core->core_task_seq &&
> > -	    rq->core->core_pick_seq != rq->core_sched_seq) {
> > +	    rq->core->core_pick_seq != rq->core_sched_seq &&
> > +	    !rq->core_pick) {
> Should this check be reversed? I mean, we should enter the fast path if
> rq->core_pick is set, right?

Done. Sorry my testing did not catch it, but it eventually caused a problem
after several hours of the stress test, so I would have caught it eventually.

> Another unrelated, but related note :-)
> Besides this, I think we need to retain one more change from the previous
> patch. We would need to make core_pick_seq per sibling instead of per
> core. Having it per core might lead to unfairness. For example: when a cpu
> sees that its sibling's core_pick is the one which is already running, it
> will not send an IPI, but core_pick remains set and core->core_pick_seq is
> incremented. Now if the sibling is preempted due to a high priority task

Then don't keep core_pick set. If you don't send it an IPI and
core_pick is already running, then just NULL it. I don't know why we add
more corner cases by making assumptions. We have enough open issues that
are not hotplug related. Here's my suggestion:

1. Keep the ideas consistent: forget about the exact code currently written
and just understand that pick_seq is for siblings knowing that something was
picked for the whole core. So if their pick_seq != sched_seq, then they have
to pick what was selected.

2. If core_pick should be NULL, then NULL it in some path. If you keep some
core_pick and you increment pick_seq, then you are automatically asking the
sibling to pick that task up the next time it enters schedule(). See if [1]
will work.

Note that we have added logic in this patch that does a full selection if
rq->core_pick == NULL.

> or its time slice expiring, it enters schedule(). But it goes to the fast path
> and selects the running task, thereby starving the high priority task. Having
> the core_pick_seq per sibling will avoid this. It might also help in some
> hotplug corner cases as well.

That can be a separate patch IMHO. It has nothing to do with
stability/crashing of concurrent and rather infrequent CPU hotplug
operations.

Also, Peter said pick_seq is for core-wide picking. If you want to add
another semantic, then maybe add another counter which has a separate
meaning and justify why you are adding it.

thanks,

 - Joel

[1]
---8<-----------------------

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7728ca7f6bb2..7a03b609e3b7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4793,6 +4793,8 @@ next_class:;
 
 		if (rq_i->curr != rq_i->core_pick)
 			resched_curr(rq_i);
+		else
+			rq_i->core_pick = NULL;
 
 		/* Did we break L1TF mitigation requirements? */
 		WARN_ON_ONCE(!cookie_match(next, rq_i->core_pick));


* Re: [RFC PATCH v7 17/23] kernel/entry: Add support for core-wide protection of kernel-mode
  2020-09-01 16:50     ` Joel Fernandes
@ 2020-09-01 20:02       ` Thomas Gleixner
  2020-09-02  1:29         ` Joel Fernandes
  0 siblings, 1 reply; 68+ messages in thread
From: Thomas Gleixner @ 2020-09-01 20:02 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Julien Desfossez, Peter Zijlstra, Vineeth Pillai, Tim Chen,
	Aaron Lu, Aubrey Li, Dhaval Giani, Chris Hyser,
	Nishanth Aravamudan, mingo, pjt, torvalds, linux-kernel,
	fweisbec, keescook, kerrnel, Phil Auld, Valentin Schneider,
	Mel Gorman, Pawan Gupta, Paolo Bonzini, vineeth, Chen Yu,
	Christian Brauner, Agata Gruza, Antonio Gomez Iglesias, graf,
	konrad.wilk, dfaggioli, rostedt, derkling, benbjiang, Aubrey Li,
	Tim Chen, Paul E . McKenney

Joel,

On Tue, Sep 01 2020 at 12:50, Joel Fernandes wrote:
> On Tue, Sep 01, 2020 at 05:54:53PM +0200, Thomas Gleixner wrote:
>> On Fri, Aug 28 2020 at 15:51, Julien Desfossez wrote:
>> >  		/*
>> > -		 * Disable interrupts and reevaluate the work flags as they
>> > -		 * might have changed while interrupts and preemption was
>> > -		 * enabled above.
>> > +		 * Reevaluate the work flags as they might have changed
>> > +		 * while interrupts and preemption were enabled.
>> 
>> What enables preemption and interrupts? Can you pretty please write
>> comments which explain what's going on.
>
> Yes, sorry. So, sched_core_unsafe_exit_wait() will spin with IRQs enabled and
> preemption disabled. I did it this way to get past the stop_machine() issues,
> where you were unhappy with us spinning in an IRQ-disabled region.

So the comment is even more nonsensical :)

>> > -	if (unlikely(ti_work & EXIT_TO_USER_MODE_WORK))
>> > -		ti_work = exit_to_user_mode_loop(regs, ti_work);
>> > +	ti_work = exit_to_user_mode_loop(regs);
>> 
>> Why has the above loop to be invoked unconditionally even when that core
>> scheduling muck is disabled? Just to make all return to user paths more
>> expensive unconditionally, right?
>
> If you see the above loop, we are calling exit_to_user_mode_work()
> conditionally by calling exit_to_user_mode_work_pending() which does the same
> thing.

It does the same thing technically, but the fast path, i.e. no work to
do, is no longer straightforward. Just look at the resulting ASM
before and after. It's awful.

> So we are still conditionally doing the usual exit_to_user_mode work; it's
> just that now we have to unconditionally invoke the 'loop' anyway.

No.

> The reason for that is that the loop can switch into another thread, so we
> have to do unsafe_exit() for the old thread and unsafe_enter() for
> the new one while handling the TIF work properly. We could get
> migrated to another CPU in this loop itself, AFAICS. So the
> unsafe_enter() / unsafe_exit() calls could also happen on different
> CPUs.

That's fine. It still does not justify making everything slower even if
that 'pretend that HT is secure' thing is disabled.

Something like the below should be sufficient to do what you want
while restricting the wreckage to the 'pretend ht is secure' case.

The generated code for the CONFIG_PRETEND_HT_SECURE=n case is the same
as without the patch. With CONFIG_PRETEND_HT_SECURE=y the impact is
exactly two NOP-ed out jumps if the muck is not enabled on the command
line, which should be the default behaviour.
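
For completeness, the key itself and its command line switch would be
something like this (made-up names matching the illustration below, untested):

	DEFINE_STATIC_KEY_FALSE(pretend_ht_secure_key);

	/* Off by default; only the command line switch enables it. */
	static int __init pretend_ht_secure_setup(char *str)
	{
		static_branch_enable(&pretend_ht_secure_key);
		return 1;
	}
	__setup("pretend_ht_secure", pretend_ht_secure_setup);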

Thanks,

        tglx

---
--- /dev/null
+++ b/include/linux/pretend_ht_secure.h
@@ -0,0 +1,21 @@
+#ifndef _LINUX_PRETEND_HT_SECURE_H
+#define _LINUX_PRETEND_HT_SECURE_H
+
+#ifdef CONFIG_PRETEND_HT_SECURE
+static inline void enter_from_user_ht_sucks(void)
+{
+	if (static_branch_unlikely(&pretend_ht_secure_key))
+		enter_from_user_pretend_ht_is_secure();
+}
+
+static inline void exit_to_user_ht_sucks(void)
+{
+	if (static_branch_unlikely(&pretend_ht_secure_key))
+		exit_to_user_pretend_ht_is_secure();
+}
+#else
+static inline void enter_from_user_ht_sucks(void) { }
+static inline void exit_to_user_ht_sucks(void) { }
+#endif
+
+#endif
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -17,6 +17,7 @@
  * 1) Tell lockdep that interrupts are disabled
  * 2) Invoke context tracking if enabled to reactivate RCU
  * 3) Trace interrupts off state
+ * 4) Pretend that HT is secure
  */
 static __always_inline void enter_from_user_mode(struct pt_regs *regs)
 {
@@ -28,6 +29,7 @@ static __always_inline void enter_from_u
 
 	instrumentation_begin();
 	trace_hardirqs_off_finish();
+	enter_from_user_ht_sucks();
 	instrumentation_end();
 }
 
@@ -111,6 +113,12 @@ static __always_inline void exit_to_user
 /* Workaround to allow gradual conversion of architecture code */
 void __weak arch_do_signal(struct pt_regs *regs) { }
 
+static inline unsigned long exit_to_user_get_work(void)
+{
+	exit_to_user_ht_sucks();
+	return READ_ONCE(current_thread_info()->flags);
+}
+
 static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
 					    unsigned long ti_work)
 {
@@ -149,7 +157,7 @@ static unsigned long exit_to_user_mode_l
 		 * enabled above.
 		 */
 		local_irq_disable_exit_to_user();
-		ti_work = READ_ONCE(current_thread_info()->flags);
+		ti_work = exit_to_user_get_work();
 	}
 
 	/* Return the latest work state for arch_exit_to_user_mode() */
@@ -158,7 +166,7 @@ static unsigned long exit_to_user_mode_l
 
 static void exit_to_user_mode_prepare(struct pt_regs *regs)
 {
-	unsigned long ti_work = READ_ONCE(current_thread_info()->flags);
+	unsigned long ti_work = exit_to_user_get_work();
 
 	lockdep_assert_irqs_disabled();
 


* Re: [RFC PATCH v7 08/23] sched: Add core wide task selection and scheduling.
  2020-09-01 17:30             ` Joel Fernandes
@ 2020-09-01 21:23               ` Vineeth Pillai
  2020-09-02  1:11                 ` Joel Fernandes
  0 siblings, 1 reply; 68+ messages in thread
From: Vineeth Pillai @ 2020-09-01 21:23 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: peterz, Julien Desfossez, Tim Chen, Aaron Lu, Aubrey Li,
	Dhaval Giani, Chris Hyser, Nishanth Aravamudan, mingo, tglx, pjt,
	torvalds, linux-kernel, fweisbec, keescook, kerrnel, Phil Auld,
	Valentin Schneider, Mel Gorman, Pawan Gupta, Paolo Bonzini,
	vineeth, Chen Yu, Christian Brauner, Agata Gruza,
	Antonio Gomez Iglesias, graf, konrad.wilk, dfaggioli, rostedt,
	derkling, benbjiang, Aaron Lu

Hi Joel,

On 9/1/20 1:30 PM, Joel Fernandes wrote:
>> I think we can also come here when the hotplug thread is scheduled during
>> onlining but the mask is not yet updated. Probably we can note that in this
>> comment as well.
>>
> I don't see how that is possible. Because the cpuhp threads run during the
> CPU onlining process, the boot thread for the CPU coming online would have
> already updated the mask.
Sorry my mistake. I got confused with the online state ordering.

>> Another unrelated, but related note :-)
>> Besides this, I think we need to retain one more change from the previous
>> patch. We would need to make core_pick_seq per sibling instead of per
>> core. Having it per core might lead to unfairness. For example: when a cpu
>> sees that its sibling's core_pick is the one which is already running, it
>> will not send an IPI, but core_pick remains set and core->core_pick_seq is
>> incremented. Now if the sibling is preempted due to a high priority task
> Then don't keep core_pick set. If you don't send it an IPI and
> core_pick is already running, then just NULL it. I don't know why we add
> more corner cases by making assumptions. We have enough open issues that
> are not hotplug related. Here's my suggestion:
>
> 1. Keep the ideas consistent: forget about the exact code currently written
> and just understand that pick_seq is for siblings knowing that something was
> picked for the whole core. So if their pick_seq != sched_seq, then they have
> to pick what was selected.
I was trying to keep the ideas consistent. The requirement of core_pick
was to let the scheduled cpu know that a pick has been made. And the
initial idea was to have the counter core wide. But I found this gap:
the pick is not always core wide, and assuming it to be core wide can
cause fairness issues. So I was proposing the idea of changing it from
core wide to per sibling. In other words, I was trying to make sure that
core_pick, along with task_seq and sched_seq, serves its purpose of letting
a sibling know that a new task pick has been made for it. I cannot think of
a reason why core_pick_seq should be core wide. I might be missing something.

> 2. If core_pick should be NULL, then NULL it in some path. If you keep some
> core_pick and you increment pick_seq, then you are automatically asking the
> sibling to pick that task up the next time it enters schedule(). See if [1]
> will work.
>
> Note that we have added logic in this patch that does a full selection if
> rq->core_pick == NULL.

I agree, setting rq->core_pick = NULL is another way to solve this issue,
but I still feel it's semantically incorrect to think that a pick is core
wide when it could actually be for only a subset of siblings in the core.
If there is a valid reason for having core_pick_seq be core wide, I
completely agree with the fix of resetting core_pick.

>> or its time slice expiring, it enters schedule(). But it goes to the fast path
>> and selects the running task, thereby starving the high priority task. Having
>> the core_pick_seq per sibling will avoid this. It might also help in some
>> hotplug corner cases as well.
> That can be a separate patch IMHO. It has nothing to do with
> stability/crashing of concurrent and rather infrequent CPU hotplug
> operations.
Agreed. Sorry for the confusion; my intention was not to have that logic in
this patch.

> Also, Peter said pick_seq is for core-wide picking. If you want to add
> another semantic, then maybe add another counter which has a separate
> meaning and justify why you are adding it.
I think just one counter is enough. Unless there is a need to keep the
counter to track the core-wide pick, I feel it is worth changing the design
to make the counter serve its purpose. Will think through this and send it
as a separate patch if needed.

Thanks,
Vineeth



* Re: [RFC PATCH v7 08/23] sched: Add core wide task selection and scheduling.
  2020-09-01 21:23               ` Vineeth Pillai
@ 2020-09-02  1:11                 ` Joel Fernandes
  0 siblings, 0 replies; 68+ messages in thread
From: Joel Fernandes @ 2020-09-02  1:11 UTC (permalink / raw)
  To: Vineeth Pillai
  Cc: Peter Zijlstra, Julien Desfossez, Tim Chen, Aaron Lu, Aubrey Li,
	Dhaval Giani, Chris Hyser, Nishanth Aravamudan, Ingo Molnar,
	Thomas Glexiner, Paul Turner, Linus Torvalds, LKML,
	Frederic Weisbecker, Kees Cook, Greg Kerr, Phil Auld,
	Valentin Schneider, Mel Gorman, Pawan Gupta, Paolo Bonzini,
	vineeth, Chen Yu, Christian Brauner, Agata Gruza,
	Antonio Gomez Iglesias, graf, konrad.wilk, Dario Faggioli,
	Steven Rostedt, Patrick Bellasi, benbjiang(蒋彪),
	Aaron Lu

On Tue, Sep 1, 2020 at 5:23 PM Vineeth Pillai
<viremana@linux.microsoft.com> wrote:
> > Also, Peter said pick_seq is for core-wide picking. If you want to add
> > another semantic, then maybe add another counter which has a separate
> > meaning and justify why you are adding it.
> I think just one counter is enough. Unless there is a need to keep the
> counter to track the core-wide pick, I feel it is worth changing the design
> to make the counter serve its purpose. Will think through this and send it
> as a separate patch if needed.

Since you agree that it suffices to set core_pick to NULL if you don't
IPI a sibling, I don't see the need for your split pick_seq thing.
But let me know if I missed something.

thanks,

 - Joel


* Re: [RFC PATCH v7 17/23] kernel/entry: Add support for core-wide protection of kernel-mode
  2020-09-01 20:02       ` Thomas Gleixner
@ 2020-09-02  1:29         ` Joel Fernandes
  2020-09-02  7:53           ` Thomas Gleixner
  0 siblings, 1 reply; 68+ messages in thread
From: Joel Fernandes @ 2020-09-02  1:29 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Julien Desfossez, Peter Zijlstra, Vineeth Pillai, Tim Chen,
	Aaron Lu, Aubrey Li, Dhaval Giani, Chris Hyser,
	Nishanth Aravamudan, mingo, pjt, torvalds, linux-kernel,
	fweisbec, keescook, kerrnel, Phil Auld, Valentin Schneider,
	Mel Gorman, Pawan Gupta, Paolo Bonzini, vineeth, Chen Yu,
	Christian Brauner, Agata Gruza, Antonio Gomez Iglesias, graf,
	konrad.wilk, dfaggioli, rostedt, derkling, benbjiang, Aubrey Li,
	Tim Chen, Paul E . McKenney

Hi Thomas,

On Tue, Sep 01, 2020 at 10:02:10PM +0200, Thomas Gleixner wrote:
[..] 
> > The reason for that is, the loop can switch into another thread, so we
> > have to do unsafe_exit() for the old thread, and unsafe_enter() for
> > the new one while handling the tif work properly. We could get
> > migrated to another CPU in this loop itself, AFAICS. So the
> > unsafe_enter() / unsafe_exit() calls could also happen on different
> > CPUs.
> 
> That's fine. It still does not justify to make everything slower even if
> that 'pretend that HT is secure' thing is disabled.
> 
> Something like the below should be sufficient to do what you want
> while restricting the wreckage to the 'pretend ht is secure' case.
> 
> The generated code for the CONFIG_PRETENT_HT_SECURE=n case is the same

When you say 'pretend', did you mean 'make'? The point of this patch is to
protect the kernel from the other hyperthread, thus making HT secure for
kernel contexts and not merely pretending.

> as without the patch. With CONFIG_PRETENT_HT_SECURE=y the impact is
> exactly two NOP-ed out jumps if the muck is not enabled on the command
> line which should be the default behaviour.

I see where you're coming from, I'll try to rework it to be less intrusive
when core-scheduling is disabled. Some more comments below:

> Thanks,
> 
>         tglx
> 
> ---
> --- /dev/null
> +++ b/include/linux/pretend_ht_secure.h
> @@ -0,0 +1,21 @@
> +#ifndef _LINUX_PRETEND_HT_SECURE_H
> +#define _LINUX_PRETEND_HT_SECURE_H
> +
> +#ifdef CONFIG_PRETEND_HT_SECURE
> +static inline void enter_from_user_ht_sucks(void)
> +{
> +	if (static_branch_unlikely(&pretend_ht_secure_key))
> +		enter_from_user_pretend_ht_is_secure();
> +}
> +
> +static inline void exit_to_user_ht_sucks(void)
> +{
> +	if (static_branch_unlikely(&pretend_ht_secure_key))
> +		exit_to_user_pretend_ht_is_secure();

We already have similar config and static keys for the core-scheduling
feature itself. Can we just make it depend on that?

Or, are you saying users may want 'core scheduling' enabled but may want to
leave out the kernel protection?

> +}
> +#else
> +static inline void enter_from_user_ht_sucks(void) { }
> +static inline void exit_to_user_ht_sucks(void) { }
> +#endif
> +
> +#endif
> --- a/kernel/entry/common.c
> +++ b/kernel/entry/common.c
> @@ -17,6 +17,7 @@
>   * 1) Tell lockdep that interrupts are disabled
>   * 2) Invoke context tracking if enabled to reactivate RCU
>   * 3) Trace interrupts off state
> + * 4) Pretend that HT is secure
>   */
>  static __always_inline void enter_from_user_mode(struct pt_regs *regs)
>  {
> @@ -28,6 +29,7 @@ static __always_inline void enter_from_u
>  
>  	instrumentation_begin();
>  	trace_hardirqs_off_finish();
> +	enter_from_user_ht_sucks();
>  	instrumentation_end();
>  }
>  
> @@ -111,6 +113,12 @@ static __always_inline void exit_to_user
>  /* Workaround to allow gradual conversion of architecture code */
>  void __weak arch_do_signal(struct pt_regs *regs) { }
>  
> +static inline unsigned long exit_to_user_get_work(void)
> +{
> +	exit_to_user_ht_sucks();

Ok, one issue with your patch is that it does not take care of the waiting logic.
sched_core_unsafe_exit_wait() needs to be called *after* all of the
exit_to_user_mode_work is processed. This is because
sched_core_unsafe_exit_wait() also checks for any new exit-to-usermode-work
that popped up while it is spinning and breaks out of its spin-till-safe loop
early. This is key to solving the stop-machine issue. If the stopper needs to
run, then the need-resched flag will be set and we break out of the spin and
redo the whole exit_to_user_mode_loop() as it should.

I agree with the need to make the ASM suck less if the feature is turned off
though, and I can try to cook something along those lines. Thanks for the idea!

thanks,

 - Joel


> +	return READ_ONCE(current_thread_info()->flags);
> +}
> +
>  static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
>  					    unsigned long ti_work)
>  {
> @@ -149,7 +157,7 @@ static unsigned long exit_to_user_mode_l
>  		 * enabled above.
>  		 */
>  		local_irq_disable_exit_to_user();
> -		ti_work = READ_ONCE(current_thread_info()->flags);
> +		ti_work = exit_to_user_get_work();
>  	}
>  
>  	/* Return the latest work state for arch_exit_to_user_mode() */
> @@ -158,7 +166,7 @@ static unsigned long exit_to_user_mode_l
>  
>  static void exit_to_user_mode_prepare(struct pt_regs *regs)
>  {
> -	unsigned long ti_work = READ_ONCE(current_thread_info()->flags);
> +	unsigned long ti_work = exit_to_user_get_work();
>  
>  	lockdep_assert_irqs_disabled();
>  


* Re: [RFC PATCH v7 12/23] sched: Trivial forced-newidle balancer
  2020-08-28 19:51 ` [RFC PATCH v7 12/23] sched: Trivial forced-newidle balancer Julien Desfossez
@ 2020-09-02  7:08   ` Pavan Kondeti
  0 siblings, 0 replies; 68+ messages in thread
From: Pavan Kondeti @ 2020-09-02  7:08 UTC (permalink / raw)
  To: Julien Desfossez
  Cc: Peter Zijlstra, Vineeth Pillai, Joel Fernandes, Tim Chen,
	Aaron Lu, Aubrey Li, Dhaval Giani, Chris Hyser,
	Nishanth Aravamudan, mingo, tglx, pjt, torvalds, linux-kernel,
	fweisbec, keescook, kerrnel, Phil Auld, Valentin Schneider,
	Mel Gorman, Pawan Gupta, Paolo Bonzini, joel, vineeth, Chen Yu,
	Christian Brauner, Agata Gruza, Antonio Gomez Iglesias, graf,
	konrad.wilk, dfaggioli, rostedt, derkling, benbjiang

On Fri, Aug 28, 2020 at 03:51:13PM -0400, Julien Desfossez wrote:
>  /*
>   * The static-key + stop-machine variable are needed such that:
>   *
> @@ -4641,7 +4656,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
>  	struct task_struct *next, *max = NULL;
>  	const struct sched_class *class;
>  	const struct cpumask *smt_mask;
> -	int i, j, cpu;
> +	int i, j, cpu, occ = 0;
>  	int smt_weight;
>  	bool need_sync;
>  
> @@ -4750,6 +4765,9 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
>  				goto done;
>  			}
>  
> +			if (!is_idle_task(p))
> +				occ++;
> +
>  			rq_i->core_pick = p;
>  
>  			/*
> @@ -4775,6 +4793,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
>  
>  						cpu_rq(j)->core_pick = NULL;
>  					}
> +					occ = 1;
>  					goto again;
>  				} else {
>  					/*
> @@ -4820,6 +4839,8 @@ next_class:;
>  		if (is_idle_task(rq_i->core_pick) && rq_i->nr_running)
>  			rq_i->core_forceidle = true;
>  
> +		rq_i->core_pick->core_occupation = occ;
> +
>  		if (i == cpu)
>  			continue;
>  
> @@ -4837,6 +4858,113 @@ next_class:;
>  	return next;
>  }
>  
> +static bool try_steal_cookie(int this, int that)
> +{
> +	struct rq *dst = cpu_rq(this), *src = cpu_rq(that);
> +	struct task_struct *p;
> +	unsigned long cookie;
> +	bool success = false;
> +
> +	local_irq_disable();
> +	double_rq_lock(dst, src);
> +
> +	cookie = dst->core->core_cookie;
> +	if (!cookie)
> +		goto unlock;
> +
> +	if (dst->curr != dst->idle)
> +		goto unlock;
> +
> +	p = sched_core_find(src, cookie);
> +	if (p == src->idle)
> +		goto unlock;
> +
> +	do {
> +		if (p == src->core_pick || p == src->curr)
> +			goto next;
> +
> +		if (!cpumask_test_cpu(this, &p->cpus_mask))
> +			goto next;
> +
> +		if (p->core_occupation > dst->idle->core_occupation)
> +			goto next;
> +
Can you please explain the rationale behind this check? If I understand
correctly, p->core_occupation is set in pick_next_task() to indicate the
number of matching-cookie tasks (excluding idle) picked on this core.
It is not reset anywhere.

> +		p->on_rq = TASK_ON_RQ_MIGRATING;
> +		deactivate_task(src, p, 0);
> +		set_task_cpu(p, this);
> +		activate_task(dst, p, 0);
> +		p->on_rq = TASK_ON_RQ_QUEUED;
> +
> +		resched_curr(dst);
> +
> +		success = true;
> +		break;
> +
> +next:
> +		p = sched_core_next(p, cookie);
> +	} while (p);
> +
> +unlock:
> +	double_rq_unlock(dst, src);
> +	local_irq_enable();
> +
> +	return success;
> +}

Thanks,
Pavan
-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.



* Re: [RFC PATCH v7 17/23] kernel/entry: Add support for core-wide protection of kernel-mode
  2020-09-02  1:29         ` Joel Fernandes
@ 2020-09-02  7:53           ` Thomas Gleixner
  2020-09-02 15:12             ` Joel Fernandes
  2020-09-02 16:57             ` Dario Faggioli
  0 siblings, 2 replies; 68+ messages in thread
From: Thomas Gleixner @ 2020-09-02  7:53 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Julien Desfossez, Peter Zijlstra, Vineeth Pillai, Tim Chen,
	Aaron Lu, Aubrey Li, Dhaval Giani, Chris Hyser,
	Nishanth Aravamudan, mingo, pjt, torvalds, linux-kernel,
	fweisbec, keescook, kerrnel, Phil Auld, Valentin Schneider,
	Mel Gorman, Pawan Gupta, Paolo Bonzini, vineeth, Chen Yu,
	Christian Brauner, Agata Gruza, Antonio Gomez Iglesias, graf,
	konrad.wilk, dfaggioli, rostedt, derkling, benbjiang, Aubrey Li,
	Tim Chen, Paul E . McKenney

Joel,

On Tue, Sep 01 2020 at 21:29, Joel Fernandes wrote:
> On Tue, Sep 01, 2020 at 10:02:10PM +0200, Thomas Gleixner wrote:
>> The generated code for the CONFIG_PRETEND_HT_SECURE=n case is the same
>
> When you say 'pretend', did you mean 'make'? The point of this patch is to
> protect the kernel from the other hyperthread, thus making HT secure for
> kernel contexts and not merely pretending.

I'm paranoid and don't trust HT at all. There is too much shared state.

>> --- /dev/null
>> +++ b/include/linux/pretend_ht_secure.h
>> @@ -0,0 +1,21 @@
>> +#ifndef _LINUX_PRETEND_HT_SECURE_H
>> +#define _LINUX_PRETEND_HT_SECURE_H
>> +
>> +#ifdef CONFIG_PRETEND_HT_SECURE
>> +static inline void enter_from_user_ht_sucks(void)
>> +{
>> +	if (static_branch_unlikely(&pretend_ht_secure_key))
>> +		enter_from_user_pretend_ht_is_secure();
>> +}
>> +
>> +static inline void exit_to_user_ht_sucks(void)
>> +{
>> +	if (static_branch_unlikely(&pretend_ht_secure_key))
>> +		exit_to_user_pretend_ht_is_secure();
>
> We already have similar config and static keys for the core-scheduling
> feature itself. Can we just make it depend on that?

Of course. This was just for illustration. :)

> Or, are you saying users may want 'core scheduling' enabled but may want to
> leave out the kernel protection?

Core scheduling per se, without all the protection muck, i.e. a relaxed
version which tries to gang-schedule threads of a process on a core if
feasible, has advantages for some workloads.

>> @@ -111,6 +113,12 @@ static __always_inline void exit_to_user
>>  /* Workaround to allow gradual conversion of architecture code */
>>  void __weak arch_do_signal(struct pt_regs *regs) { }
>>  
>> +static inline unsigned long exit_to_user_get_work(void)
>> +{
>> +	exit_to_user_ht_sucks();
>
> Ok, one issue with your patch is that it does not take care of the waiting logic.
> sched_core_unsafe_exit_wait() needs to be called *after* all of the
> exit_to_user_mode_work is processed. This is because
> sched_core_unsafe_exit_wait() also checks for any new exit-to-usermode-work
> that popped up while it is spinning and breaks out of its spin-till-safe loop
> early. This is key to solving the stop-machine issue. If the stopper needs to
> run, then the need-resched flag will be set and we break out of the spin and
> redo the whole exit_to_user_mode_loop() as it should.

And where is the problem?

syscall_entry()
  ...
    sys_foo()
      ....
      return 0;

  local_irq_disable();
  exit_to_user_mode_prepare()
    ti_work = exit_to_user_get_work()
       {
        if (ht_muck)
          syscall_exit_ht_muck() {
            ....
            while (wait) {
            	local_irq_enable();
                while (wait) cpu_relax();
                local_irq_disable();
            }
          }
        return READ_ONCE(current_thread_info()->flags);
       }

    if (unlikely(ti_work & WORK))
    	ti_work = exit_loop(ti_work)

        while (ti_work & MASK) {
          local_irq_enable();
          .....
          local_irq_disable();
          ti_work = exit_to_user_get_work()
            {
              See above
            }
       }

It covers both the 'no work' and the 'do work' exit path. If that's not
sufficient, then something is fundamentally wrong with your design.

Thanks,

        tglx


* Re: [RFC PATCH v7 17/23] kernel/entry: Add support for core-wide protection of kernel-mode
  2020-09-02  7:53           ` Thomas Gleixner
@ 2020-09-02 15:12             ` Joel Fernandes
  2020-09-02 16:57             ` Dario Faggioli
  1 sibling, 0 replies; 68+ messages in thread
From: Joel Fernandes @ 2020-09-02 15:12 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Julien Desfossez, Peter Zijlstra, Vineeth Pillai, Tim Chen,
	Aaron Lu, Aubrey Li, Dhaval Giani, Chris Hyser,
	Nishanth Aravamudan, mingo, pjt, torvalds, linux-kernel,
	fweisbec, keescook, kerrnel, Phil Auld, Valentin Schneider,
	Mel Gorman, Pawan Gupta, Paolo Bonzini, vineeth, Chen Yu,
	Christian Brauner, Agata Gruza, Antonio Gomez Iglesias, graf,
	konrad.wilk, dfaggioli, rostedt, derkling, benbjiang, Aubrey Li,
	Tim Chen, Paul E . McKenney

Hi Thomas,

On Wed, Sep 02, 2020 at 09:53:29AM +0200, Thomas Gleixner wrote:
[...]
> >> --- /dev/null
> >> +++ b/include/linux/pretend_ht_secure.h
> >> @@ -0,0 +1,21 @@
> >> +#ifndef _LINUX_PRETEND_HT_SECURE_H
> >> +#define _LINUX_PRETEND_HT_SECURE_H
> >> +
> >> +#ifdef CONFIG_PRETEND_HT_SECURE
> >> +static inline void enter_from_user_ht_sucks(void)
> >> +{
> >> +	if (static_branch_unlikely(&pretend_ht_secure_key))
> >> +		enter_from_user_pretend_ht_is_secure();
> >> +}
> >> +
> >> +static inline void exit_to_user_ht_sucks(void)
> >> +{
> >> +	if (static_branch_unlikely(&pretend_ht_secure_key))
> >> +		exit_to_user_pretend_ht_is_secure();
> >
> > We already have similar config and static keys for the core-scheduling
> > feature itself. Can we just make it depend on that?
> 
> Of course. This was just for illustration. :)

Got it. :)

> > Or, are you saying users may want 'core scheduling' enabled but may want to
> > leave out the kernel protection?
> 
> Core scheduling per se without all the protection muck, i.e. a relaxed
> version which tries to gang schedule threads of a process on a core if
> feasible, has advantages for some workloads.

Sure. So I will make it depend on the existing core-scheduling
config/static-key so the kernel protection is there when core scheduling is
enabled (so both userspace and, with this patch, the kernel are protected).
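
Roughly like this (just a sketch; assuming the series' __sched_core_enabled
static key and its sched_core_unsafe_enter() helper, exact names may change
in the next posting):

	static __always_inline void entry_kernel_protect(void)
	{
		if (static_branch_unlikely(&__sched_core_enabled))
			sched_core_unsafe_enter();
	}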

> 
> >> @@ -111,6 +113,12 @@ static __always_inline void exit_to_user
> >>  /* Workaround to allow gradual conversion of architecture code */
> >>  void __weak arch_do_signal(struct pt_regs *regs) { }
> >>  
> >> +static inline unsigned long exit_to_user_get_work(void)
> >> +{
> >> +	exit_to_user_ht_sucks();
> >
> > Ok, one issue with your patch is it does not take care of the waiting logic.
> > sched_core_unsafe_exit_wait() needs to be called *after* all of the
> > exit_to_user_mode_work is processed. This is because
> > sched_core_unsafe_exit_wait() also checks for any new exit-to-usermode-work
> > that popped up while it is spinning and breaks out of its spin-till-safe loop
> > early. This is key to solving the stop-machine issue. If the stopper needs to
> > run, then the need-resched flag will be set and we break out of the spin and
> > redo the whole exit_to_user_mode_loop() as it should.
> 
> And where is the problem?
> 
> syscall_entry()
>   ...
>     sys_foo()
>       ....
>       return 0;
> 
>   local_irq_disable();
>   exit_to_user_mode_prepare()
>     ti_work = exit_to_user_get_work()
>        {
>         if (ht_muck)
>           syscall_exit_ht_muck() {
>             ....
>             while (wait) {
>             	local_irq_enable();
>                 while (wait) cpu_relax();
>                 local_irq_disable();
>             }
>           }
>         return READ_ONCE(current_thread_info()->flags);
>        }
> 
>     if (unlikely(ti_work & WORK))
>     	ti_work = exit_loop(ti_work)
> 
>         while (ti_work & MASK) {
>           local_irq_enable();
>           .....
>           local_irq_disable();
>           ti_work = exit_to_user_get_work()
>             {
>               See above
>             }
>        }
> 
> It covers both the 'no work' and the 'do work' exit path. If that's not
> sufficient, then something is fundamentally wrong with your design.

Yes, you are right, I got confused from your previous patch. This works too
and is exactly as my design. I will do it this way then. Thank you, Thomas!

 - Joel


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC PATCH v7 17/23] kernel/entry: Add support for core-wide protection of kernel-mode
  2020-09-02  7:53           ` Thomas Gleixner
  2020-09-02 15:12             ` Joel Fernandes
@ 2020-09-02 16:57             ` Dario Faggioli
  2020-09-03  4:34               ` Joel Fernandes
  1 sibling, 1 reply; 68+ messages in thread
From: Dario Faggioli @ 2020-09-02 16:57 UTC (permalink / raw)
  To: Thomas Gleixner, Joel Fernandes
  Cc: Julien Desfossez, Peter Zijlstra, Vineeth Pillai, Tim Chen,
	Aaron Lu, Aubrey Li, Dhaval Giani, Chris Hyser,
	Nishanth Aravamudan, mingo, pjt, torvalds, linux-kernel,
	fweisbec, keescook, kerrnel, Phil Auld, Valentin Schneider,
	Mel Gorman, Pawan Gupta, Paolo Bonzini, vineeth, Chen Yu,
	Christian Brauner, Agata Gruza, Antonio Gomez Iglesias, graf,
	konrad.wilk, rostedt, derkling, benbjiang, Aubrey Li, Tim Chen,
	Paul E . McKenney


On Wed, 2020-09-02 at 09:53 +0200, Thomas Gleixner wrote:
> On Tue, Sep 01 2020 at 21:29, Joel Fernandes wrote:
> > On Tue, Sep 01, 2020 at 10:02:10PM +0200, Thomas Gleixner wrote:
> > > 
> > Or, are you saying users may want 'core scheduling' enabled but may
> > want to
> > leave out the kernel protection?
> 
> Core scheduling per se without all the protection muck, i.e. a
> relaxed
> version which tries to gang schedule threads of a process on a core
> if
> feasible, has advantages for some workloads.
> 
Indeed! For at least two reasons, IMO:

1) what Thomas is saying already. I.e., even on a CPU which has HT but
is not affected by any of the (known!) speculation issues, one may want
to use Core Scheduling _as_a_feature_. For instance, for avoiding
threads from different processes, or vCPUs from different VMs, sharing
cores (e.g., for better managing their behavior/performance, or for
improved fairness of billing/accounting). And in this case, this
mechanism for protecting the kernel from the userspace on the other
thread may not be necessary or interesting;

2) protection of the kernel from the other thread running in userspace
may be achieved in different ways. This is one, sure. ASI will probably
be another. Hence if/when we'll have both, this and ASI, it would be
cool to be able to configure the system in such a way that there is
only one active, to avoid paying the price of both! :-)

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC PATCH v7 17/23] kernel/entry: Add support for core-wide protection of kernel-mode
  2020-09-02 16:57             ` Dario Faggioli
@ 2020-09-03  4:34               ` Joel Fernandes
  2020-09-03 11:05                 ` Vineeth Pillai
                                   ` (2 more replies)
  0 siblings, 3 replies; 68+ messages in thread
From: Joel Fernandes @ 2020-09-03  4:34 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Thomas Gleixner, Julien Desfossez, Peter Zijlstra,
	Vineeth Pillai, Tim Chen, Aaron Lu, Aubrey Li, Dhaval Giani,
	Chris Hyser, Nishanth Aravamudan, Ingo Molnar, Paul Turner,
	Linus Torvalds, LKML, Frederic Weisbecker, Kees Cook, Greg Kerr,
	Phil Auld, Valentin Schneider, Mel Gorman, Pawan Gupta,
	Paolo Bonzini, vineeth, Chen Yu, Christian Brauner, Agata Gruza,
	Antonio Gomez Iglesias, graf, konrad.wilk, Steven Rostedt,
	Patrick Bellasi, benbjiang(蒋彪),
	Aubrey Li, Tim Chen, Paul E . McKenney

On Wed, Sep 2, 2020 at 12:57 PM Dario Faggioli <dfaggioli@suse.com> wrote:
>
> On Wed, 2020-09-02 at 09:53 +0200, Thomas Gleixner wrote:
> > On Tue, Sep 01 2020 at 21:29, Joel Fernandes wrote:
> > > On Tue, Sep 01, 2020 at 10:02:10PM +0200, Thomas Gleixner wrote:
> > > >
> > > Or, are you saying users may want 'core scheduling' enabled but may
> > > want to
> > > leave out the kernel protection?
> >
> > Core scheduling per se without all the protection muck, i.e. a
> > relaxed
> > version which tries to gang schedule threads of a process on a core
> > if
> > feasible, has advantages for some workloads.
> >
> Indeed! For at least two reasons, IMO:
>
> 1) what Thomas is saying already. I.e., even on a CPU which has HT but
> is not affected by any of the (known!) speculation issues, one may want
> to use Core Scheduling _as_a_feature_. For instance, for avoiding
> threads from different processes, or vCPUs from different VMs, sharing
> cores (e.g., for better managing their behavior/performance, or for
> improved fairness of billing/accounting). And in this case, this
> mechanism for protecting the kernel from the userspace on the other
> thread may not be necessary or interesting;

Agreed. So then I should really make this configurable and behind a
sysctl then. I'll do that.

> 2) protection of the kernel from the other thread running in userspace
> may be achieved in different ways. This is one, sure. ASI will probably
> be another. Hence if/when we'll have both, this and ASI, it would be
> cool to be able to configure the system in such a way that there is
> only one active, to avoid paying the price of both! :-)

Actually, no. Part of ASI will involve exactly what this patch does -
IPI-pausing siblings but ASI does so when they have no choice but to
switch away from the "limited kernel" mapping, into the full host
kernel mapping. I am not sure if they have yet implemented that part
but they do talk of it in [1] and in their pretty LPC slides.  It is
just that ASI tries to avoid that scenario of kicking all siblings out
of guest mode.  So, maybe this patch can be a stepping stone to ASI.
At least I got the entry hooks right, and the algorithm is efficient
IMO (useless IPIs are avoided).  ASI can then come in and avoid
sending IPIs even more by doing their limited-kernel-mapping things if
needed. So, it does not need to be this vs ASI, both may be needed.

Why do you feel that ASI on its own offers some magical protection
that eliminates the need for this patch?

 thanks,

 - Joel

[1]  The link https://lkml.org/lkml/2019/5/13/515 mentions "note that
kicking all sibling hyperthreads is not implemented in this serie"

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC PATCH v7 06/23] bitops: Introduce find_next_or_bit
  2020-08-28 19:51 ` [RFC PATCH v7 06/23] bitops: Introduce find_next_or_bit Julien Desfossez
@ 2020-09-03  5:13   ` Randy Dunlap
  0 siblings, 0 replies; 68+ messages in thread
From: Randy Dunlap @ 2020-09-03  5:13 UTC (permalink / raw)
  To: Julien Desfossez, Peter Zijlstra, Vineeth Pillai, Joel Fernandes,
	Tim Chen, Aaron Lu, Aubrey Li, Dhaval Giani, Chris Hyser,
	Nishanth Aravamudan
  Cc: mingo, tglx, pjt, torvalds, linux-kernel, fweisbec, keescook,
	kerrnel, Phil Auld, Valentin Schneider, Mel Gorman, Pawan Gupta,
	Paolo Bonzini, joel, vineeth, Chen Yu, Christian Brauner,
	Agata Gruza, Antonio Gomez Iglesias, graf, konrad.wilk,
	dfaggioli, rostedt, derkling, benbjiang

On 8/28/20 12:51 PM, Julien Desfossez wrote:
> +#ifndef find_next_or_bit
> +/**
> + * find_next_or_bit - find the next set bit in any memory regions
> + * @addr1: The first address to base the search on
> + * @addr2: The second address to base the search on
> + * @offset: The bitnumber to start searching at

preferably         bit number

> + * @size: The bitmap size in bits
> + *
> + * Returns the bit number for the next set bit

 * Return: the bit number for the next set bit

for kernel-doc syntax.


> + * If no bits are set, returns @size.
> + */
> +extern unsigned long find_next_or_bit(const unsigned long *addr1,
> +		const unsigned long *addr2, unsigned long size,
> +		unsigned long offset);
> +#endif
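
With both fixes folded in, the kernel-doc block would read:

/**
 * find_next_or_bit - find the next set bit in any memory regions
 * @addr1: The first address to base the search on
 * @addr2: The second address to base the search on
 * @offset: The bit number to start searching at
 * @size: The bitmap size in bits
 *
 * Return: the bit number for the next set bit
 * If no bits are set, returns @size.
 */

(and, purely for illustration, a naive bit-by-bit equivalent of what the
helper computes; the series' implementation presumably works a word at a
time like the other find_next_* helpers:)

	unsigned long find_next_or_bit(const unsigned long *addr1,
			const unsigned long *addr2, unsigned long size,
			unsigned long offset)
	{
		while (offset < size &&
		       !(test_bit(offset, addr1) || test_bit(offset, addr2)))
			offset++;
		return offset;
	}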


thanks.
-- 
~Randy


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC PATCH v7 17/23] kernel/entry: Add support for core-wide protection of kernel-mode
  2020-09-03  4:34               ` Joel Fernandes
@ 2020-09-03 11:05                 ` Vineeth Pillai
  2020-09-03 13:20                 ` Thomas Gleixner
  2020-09-03 13:43                 ` Dario Faggioli
  2 siblings, 0 replies; 68+ messages in thread
From: Vineeth Pillai @ 2020-09-03 11:05 UTC (permalink / raw)
  To: Joel Fernandes, Dario Faggioli
  Cc: Thomas Gleixner, Julien Desfossez, Peter Zijlstra, Tim Chen,
	Aaron Lu, Aubrey Li, Dhaval Giani, Chris Hyser,
	Nishanth Aravamudan, Ingo Molnar, Paul Turner, Linus Torvalds,
	LKML, Frederic Weisbecker, Kees Cook, Greg Kerr, Phil Auld,
	Valentin Schneider, Mel Gorman, Pawan Gupta, Paolo Bonzini,
	vineeth, Chen Yu, Christian Brauner, Agata Gruza,
	Antonio Gomez Iglesias, graf, konrad.wilk, Steven Rostedt,
	Patrick Bellasi, benbjiang(蒋彪),
	Aubrey Li, Tim Chen, Paul E . McKenney



On 9/3/20 12:34 AM, Joel Fernandes wrote:
>>
>> Indeed! For at least two reasons, IMO:
>>
>> 1) what Thomas is saying already. I.e., even on a CPU which has HT but
>> is not affected by any of the (known!) speculation issues, one may want
>> to use Core Scheduling _as_a_feature_. For instance, for avoiding
>> threads from different processes, or vCPUs from different VMs, sharing
>> cores (e.g., for better managing their behavior/performance, or for
>> improved fairness of billing/accounting). And in this case, this
>> mechanism for protecting the kernel from the userspace on the other
>> thread may not be necessary or interesting;
> Agreed. So then I should really make this configurable and behind a
> sysctl then. I'll do that.
We already have the patch to wrap this feature in a build time and
boot time option:
https://lwn.net/ml/linux-kernel/9cd9abad06ad8c3f35228afd07c74c7d9533c412.1598643276.git.jdesfossez@digitalocean.com/

I could not get to a safe way to make it a runtime tunable at the time
of posting this series, but I think it should also be possible.
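
(For illustration, the boot-time side of such a switch can be as small as
the sketch below; the option and variable names here are assumptions, not
necessarily what the patch above uses:)

	static bool sched_core_protect_kernel __ro_after_init = true;

	static int __init set_sched_core_protect_kernel(char *str)
	{
		return kstrtobool(str, &sched_core_protect_kernel);
	}
	early_param("sched_core_protect_kernel", set_sched_core_protect_kernel);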

Thanks,
Vineeth


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC PATCH v7 17/23] kernel/entry: Add support for core-wide protection of kernel-mode
  2020-09-03  4:34               ` Joel Fernandes
  2020-09-03 11:05                 ` Vineeth Pillai
@ 2020-09-03 13:20                 ` Thomas Gleixner
  2020-09-03 20:30                   ` Joel Fernandes
  2020-09-03 13:43                 ` Dario Faggioli
  2 siblings, 1 reply; 68+ messages in thread
From: Thomas Gleixner @ 2020-09-03 13:20 UTC (permalink / raw)
  To: Joel Fernandes, Dario Faggioli
  Cc: Julien Desfossez, Peter Zijlstra, Vineeth Pillai, Tim Chen,
	Aaron Lu, Aubrey Li, Dhaval Giani, Chris Hyser,
	Nishanth Aravamudan, Ingo Molnar, Paul Turner, Linus Torvalds,
	LKML, Frederic Weisbecker, Kees Cook, Greg Kerr, Phil Auld,
	Valentin Schneider, Mel Gorman, Pawan Gupta, Paolo Bonzini,
	vineeth, Chen Yu, Christian Brauner, Agata Gruza,
	Antonio Gomez Iglesias, graf, konrad.wilk, Steven Rostedt,
	Patrick Bellasi, benbjiang(蒋彪),
	Aubrey Li, Tim Chen, Paul E . McKenney

On Thu, Sep 03 2020 at 00:34, Joel Fernandes wrote:
> On Wed, Sep 2, 2020 at 12:57 PM Dario Faggioli <dfaggioli@suse.com> wrote:
>> 2) protection of the kernel from the other thread running in userspace
>> may be achieved in different ways. This is one, sure. ASI will probably
>> be another. Hence if/when we'll have both, this and ASI, it would be
>> cool to be able to configure the system in such a way that there is
>> only one active, to avoid paying the price of both! :-)
>
> Actually, no. Part of ASI will involve exactly what this patch does -
> IPI-pausing siblings but ASI does so when they have no choice but to
> switch away from the "limited kernel" mapping, into the full host
> kernel mapping. I am not sure if they have yet implemented that part
> but they do talk of it in [1] and in their pretty LPC slides.  It is
> just that ASI tries to avoid that scenario of kicking all siblings out
> of guest mode.  So, maybe this patch can be a stepping stone to ASI.
> At least I got the entry hooks right, and the algorithm is efficient
> IMO (useless IPIs are avoided).  ASI can then come in and avoid
> sending IPIs even more by doing their limited-kernel-mapping things if
> needed. So, it does not need to be this vs ASI, both may be needed.

Right. There are different parts which are separate:

1) Core scheduling as a best effort feature (performance for certain use
   cases)

2) Enforced core scheduling (utilizes #1 basics)

3) ASI

4) Kick sibling out of guest/host and wait mechanics

#1, #2, #3 can be used stand alone. #4 is a utility

Then you get combos:

A) #2 + #4:

   core wide protection, i.e. what this series tries to achieve.  #2
   triggers the kick at the low level VMEXIT or entry from user mode
   boundary. The wait happens at the same level

B) #3 + #4:

   ASI plus kicking the sibling/wait mechanics independent of what's
   scheduled. #3 triggers the kick at the ASI switch to full host
   mapping boundary and the wait is probably the same as in #A

C) #2 + #3 + #4:

   The full concert, but trigger/wait wise the same as #B

So we really want to make at least #4 an independent utility.

Thanks,

        tglx






^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC PATCH v7 17/23] kernel/entry: Add support for core-wide protection of kernel-mode
  2020-09-03  4:34               ` Joel Fernandes
  2020-09-03 11:05                 ` Vineeth Pillai
  2020-09-03 13:20                 ` Thomas Gleixner
@ 2020-09-03 13:43                 ` Dario Faggioli
  2020-09-03 20:25                   ` Joel Fernandes
  2 siblings, 1 reply; 68+ messages in thread
From: Dario Faggioli @ 2020-09-03 13:43 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Thomas Gleixner, Julien Desfossez, Peter Zijlstra,
	Vineeth Pillai, Tim Chen, Aaron Lu, Aubrey Li, Dhaval Giani,
	Chris Hyser, Nishanth Aravamudan, Ingo Molnar, Paul Turner,
	Linus Torvalds, LKML, Frederic Weisbecker, Kees Cook, Greg Kerr,
	Phil Auld, Valentin Schneider, Mel Gorman, Pawan Gupta,
	Paolo Bonzini, vineeth, Chen Yu, Christian Brauner, Agata Gruza,
	Antonio Gomez Iglesias, graf, konrad.wilk, Steven Rostedt,
	Patrick Bellasi, benbjiang(蒋彪),
	Aubrey Li, Tim Chen, Paul E . McKenney


On Thu, 2020-09-03 at 00:34 -0400, Joel Fernandes wrote:
> On Wed, Sep 2, 2020 at 12:57 PM Dario Faggioli <dfaggioli@suse.com>
> wrote:
> > 2) protection of the kernel from the other thread running in
> > userspace
> > may be achieved in different ways. This is one, sure. ASI will
> > probably
> > be another. Hence if/when we'll have both, this and ASI, it would
> > be
> > cool to be able to configure the system in such a way that there is
> > only one active, to avoid paying the price of both! :-)
> 
> Actually, no. Part of ASI will involve exactly what this patch does -
> IPI-pausing siblings but ASI does so when they have no choice but to
> switch away from the "limited kernel" mapping, into the full host
> kernel mapping. I am not sure if they have yet implemented that part
> but they do talk of it in [1] and in their pretty LPC slides.  It is
> just that ASI tries to avoid that scenario of kicking all siblings
> out
> of guest mode.  So, maybe this patch can be a stepping stone to ASI.
>
Ah, sure! I mean, it of course depends on how things get mixed
together, which we still don't know. :-)

I know that ASI wants to do this very same thing sometimes. I just
wanted to say that we need/want only one mechanism for doing that, in
place at any given time.

If that mechanism will be this one, and ASI then uses it (e.g., in a
scenario where for whatever reason one wants ASI with the kicking but
not Core Scheduling), then great, I agree, and everything makes
sense.

> Why do you feel that ASI on its own offers some magical protection
> that eliminates the need for this patch?
> 
I don't. I was just trying to think of different use cases, how they
mix and match, etc. But as you're suggesting (and as Thomas has shown
very clearly in another reply to this email), the thing that makes the
most sense is to have one mechanism for the kicking, used by anyone
that needs it. :-)

Thanks and Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC PATCH v7 17/23] kernel/entry: Add support for core-wide protection of kernel-mode
  2020-09-03 13:43                 ` Dario Faggioli
@ 2020-09-03 20:25                   ` Joel Fernandes
  0 siblings, 0 replies; 68+ messages in thread
From: Joel Fernandes @ 2020-09-03 20:25 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Thomas Gleixner, Julien Desfossez, Peter Zijlstra,
	Vineeth Pillai, Tim Chen, Aaron Lu, Aubrey Li, Dhaval Giani,
	Chris Hyser, Nishanth Aravamudan, Ingo Molnar, Paul Turner,
	Linus Torvalds, LKML, Frederic Weisbecker, Kees Cook, Greg Kerr,
	Phil Auld, Valentin Schneider, Mel Gorman, Pawan Gupta,
	Paolo Bonzini, vineeth, Chen Yu, Christian Brauner, Agata Gruza,
	Antonio Gomez Iglesias, graf, konrad.wilk, Steven Rostedt,
	Patrick Bellasi, benbjiang(蒋彪),
	Aubrey Li, Tim Chen, Paul E . McKenney

On Thu, Sep 3, 2020 at 9:43 AM Dario Faggioli <dfaggioli@suse.com> wrote:
>
> On Thu, 2020-09-03 at 00:34 -0400, Joel Fernandes wrote:
> > On Wed, Sep 2, 2020 at 12:57 PM Dario Faggioli <dfaggioli@suse.com>
> > wrote:
> > > 2) protection of the kernel from the other thread running in
> > > userspace
> > > may be achieved in different ways. This is one, sure. ASI will
> > > probably
> > > be another. Hence if/when we'll have both, this and ASI, it would
> > > be
> > > cool to be able to configure the system in such a way that there is
> > > only one active, to avoid paying the price of both! :-)
> >
> > Actually, no. Part of ASI will involve exactly what this patch does -
> > IPI-pausing siblings but ASI does so when they have no choice but to
> > switch away from the "limited kernel" mapping, into the full host
> > kernel mapping. I am not sure if they have yet implemented that part
> > but they do talk of it in [1] and in their pretty LPC slides.  It is
> > just that ASI tries to avoid that scenario of kicking all siblings
> > out
> > of guest mode.  So, maybe this patch can be a stepping stone to ASI.
> >
> Ah, sure! I mean, it of course depends on how things get mixed
> together, which we still don't know. :-)
>
> I know that ASI wants to do this very same thing sometimes. I just
> wanted to say that we need/want only one mechanism for doing that, in
> place at any given time.
>
> If that mechanism will be this one, and ASI then uses it (e.g., in a
> scenario where for whatever reason one wants ASI with the kicking but
> not Core Scheduling), then great, I agree, and everything makes
> sense.

Agreed, with some minor modifications, it should be possible.

> > Why do you feel that ASI on its own offers some magical protection
> > that eliminates the need for this patch?
> >
> I don't. I was just trying to think of different use cases, how they
> mix and match, etc. But as you're suggesting (and as Thomas has shown
> very clearly in another reply to this email), the thing that makes the
> most sense is to have one mechanism for the kicking, used by anyone
> that needs it. :-)

Makes sense. Thanks,

 - Joel

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC PATCH v7 17/23] kernel/entry: Add support for core-wide protection of kernel-mode
  2020-09-03 13:20                 ` Thomas Gleixner
@ 2020-09-03 20:30                   ` Joel Fernandes
  0 siblings, 0 replies; 68+ messages in thread
From: Joel Fernandes @ 2020-09-03 20:30 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Dario Faggioli, Julien Desfossez, Peter Zijlstra, Vineeth Pillai,
	Tim Chen, Aaron Lu, Aubrey Li, Dhaval Giani, Chris Hyser,
	Nishanth Aravamudan, Ingo Molnar, Paul Turner, Linus Torvalds,
	LKML, Frederic Weisbecker, Kees Cook, Greg Kerr, Phil Auld,
	Valentin Schneider, Mel Gorman, Pawan Gupta, Paolo Bonzini,
	vineeth, Chen Yu, Christian Brauner, Agata Gruza,
	Antonio Gomez Iglesias, graf, konrad.wilk, Steven Rostedt,
	Patrick Bellasi, benbjiang(蒋彪),
	Aubrey Li, Tim Chen, Paul E . McKenney

On Thu, Sep 3, 2020 at 9:20 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Thu, Sep 03 2020 at 00:34, Joel Fernandes wrote:
> > On Wed, Sep 2, 2020 at 12:57 PM Dario Faggioli <dfaggioli@suse.com> wrote:
> >> 2) protection of the kernel from the other thread running in userspace
> >> may be achieved in different ways. This is one, sure. ASI will probably
> >> be another. Hence if/when we'll have both, this and ASI, it would be
> >> cool to be able to configure the system in such a way that there is
> >> only one active, to avoid paying the price of both! :-)
> >
> > Actually, no. Part of ASI will involve exactly what this patch does -
> > IPI-pausing siblings but ASI does so when they have no choice but to
> > switch away from the "limited kernel" mapping, into the full host
> > kernel mapping. I am not sure if they have yet implemented that part
> > but they do talk of it in [1] and in their pretty LPC slides.  It is
> > just that ASI tries to avoid that scenario of kicking all siblings out
> > of guest mode.  So, maybe this patch can be a stepping stone to ASI.
> > At least I got the entry hooks right, and the algorithm is efficient
> > IMO (useless IPIs are avoided).  ASI can then come in and avoid
> > sending IPIs even more by doing their limited-kernel-mapping things if
> > needed. So, it does not need to be this vs ASI, both may be needed.
>
> Right. There are different parts which are separate:
>
> 1) Core scheduling as a best effort feature (performance for certain use
>    cases)
>
> 2) Enforced core scheduling (utilizes #1 basics)
>
> 3) ASI
>
> 4) Kick sibling out of guest/host and wait mechanics
>
> #1, #2, #3 can be used stand alone. #4 is a utility
>
> Then you get combos:
>
> A) #2 + #4:
>
>    core wide protection, i.e. what this series tries to achieve.  #2
>    triggers the kick at the low level VMEXIT or entry from user mode
>    boundary. The wait happens at the same level
>
> B) #3 + #4:
>
>    ASI plus kicking the sibling/wait mechanics independent of what's
>    scheduled. #3 triggers the kick at the ASI switch to full host
>    mapping boundary and the wait is probably the same as in #A
>
> C) #2 + #3 + #4:
>
>    The full concert, but trigger/wait wise the same as #B
>
> So we really want to make at least #4 an independent utility.

Agreed! Thanks for enlisting all the cases so well. I believe this
could be achieved by moving the calls to unsafe_enter() and
unsafe_exit() to when ASI decides it is time to enter the unsafe
kernel context.  I will keep it in mind when sending the next revision
as well.

thanks,

- Joel

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC PATCH v7 08/23] sched: Add core wide task selection and scheduling.
  2020-08-28 19:51 ` [RFC PATCH v7 08/23] sched: Add core wide task selection and scheduling Julien Desfossez
  2020-08-28 20:51   ` Peter Zijlstra
  2020-08-28 20:55   ` Peter Zijlstra
@ 2020-09-15 20:08   ` Joel Fernandes
  2 siblings, 0 replies; 68+ messages in thread
From: Joel Fernandes @ 2020-09-15 20:08 UTC (permalink / raw)
  To: Julien Desfossez
  Cc: Peter Zijlstra, Vineeth Pillai, Tim Chen, Aaron Lu, Aubrey Li,
	Dhaval Giani, Chris Hyser, Nishanth Aravamudan, mingo, tglx, pjt,
	torvalds, linux-kernel, fweisbec, keescook, kerrnel, Phil Auld,
	Valentin Schneider, Mel Gorman, Pawan Gupta, Paolo Bonzini,
	vineeth, Chen Yu, Christian Brauner, Agata Gruza,
	Antonio Gomez Iglesias, graf, konrad.wilk, dfaggioli, rostedt,
	derkling, benbjiang, Vineeth Remanan Pillai, Aaron Lu

On Fri, Aug 28, 2020 at 03:51:09PM -0400, Julien Desfossez wrote:
> From: Peter Zijlstra <peterz@infradead.org>
> 
> Instead of only selecting a local task, select a task for all SMT
> siblings for every reschedule on the core (irrespective of which logical
> CPU does the reschedule).
> 
> During a CPU hotplug event, schedule would be called with the hotplugged
> CPU not in the cpumask. So use for_each_cpu(_wrap)_or to include the
> current cpu in the task pick loop.
> 
> There are multiple loops in pick_next_task that iterate over CPUs in
> smt_mask. During a hotplug event, a sibling could be removed from the
> smt_mask while pick_next_task is running. So we cannot trust the mask
> across the different loops. This can confuse the logic. Add retry logic
> if smt_mask changes between the loops.
[...]
> +static struct task_struct *
> +pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
> +{
[...]
> +	/* reset state */
> +	rq->core->core_cookie = 0UL;
> +	for_each_cpu_or(i, smt_mask, cpumask_of(cpu)) {
> +		struct rq *rq_i = cpu_rq(i);
> +
> +		rq_i->core_pick = NULL;
> +
> +		if (rq_i->core_forceidle) {
> +			need_sync = true;
> +			rq_i->core_forceidle = false;
> +		}
> +
> +		if (i != cpu)
> +			update_rq_clock(rq_i);
> +	}
> +
> +	/*
> +	 * Try and select tasks for each sibling in decending sched_class
> +	 * order.
> +	 */
> +	for_each_class(class) {
> +again:
> +		for_each_cpu_wrap_or(i, smt_mask, cpumask_of(cpu), cpu) {
> +			struct rq *rq_i = cpu_rq(i);
> +			struct task_struct *p;
> +
> +			/*
> +			 * During hotplug online a sibling can be added in
> +			 * the smt_mask while we are here. If so, we would
> +			 * need to restart selection by resetting all over.
> +			 */
> +			if (unlikely(smt_weight != cpumask_weight(smt_mask)))
> +				goto retry_select;
> +
> +			if (rq_i->core_pick)
> +				continue;
> +
> +			/*
> +			 * If this sibling doesn't yet have a suitable task to
> +			 * run; ask for the most eligible task, given the
> +			 * highest priority task already selected for this
> +			 * core.
> +			 */
> +			p = pick_task(rq_i, class, max);
> +			if (!p) {
> +				/*
> +				 * If there weren't no cookies; we don't need
> +				 * to bother with the other siblings.
> +				 */
> +				if (i == cpu && !need_sync)
> +					goto next_class;

We found a problem with this code: it force idles RT tasks whenever one
sibling is running a tagged CFS task while the other is running an untagged
RT task.

Below diff should fix it, still testing it:

---8<-----------------------

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index cc90eb50d481..26d05043b640 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4880,10 +4880,15 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 			p = pick_task(rq_i, class, max);
 			if (!p) {
 				/*
-				 * If there weren't no cookies; we don't need
-				 * to bother with the other siblings.
+				 * If the rest of the core is not running a
+				 * tagged task, i.e.  need_sync == 0, and the
+				 * current CPU which called into the schedule()
+				 * loop does not have any tasks for this class,
+				 * skip selecting for other siblings since
+				 * there's no point. We don't skip for RT/DL
+				 * because that could make CFS force-idle RT.
 				 */
-				if (i == cpu && !need_sync)
+				if (i == cpu && !need_sync && class == &fair_sched_class)
 					goto next_class;
 
 				continue;

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [RFC PATCH v7 11/23] sched/fair: core wide cfs task priority comparison
  2020-08-28 19:51 ` [RFC PATCH v7 11/23] sched/fair: core wide cfs task priority comparison Julien Desfossez
  2020-08-28 21:29   ` Peter Zijlstra
@ 2020-09-15 21:49   ` chris hyser
       [not found]     ` <81b208ad-b9e6-bfbf-631e-02e9f75d73a2@linux.intel.com>
  1 sibling, 1 reply; 68+ messages in thread
From: chris hyser @ 2020-09-15 21:49 UTC (permalink / raw)
  To: Julien Desfossez, Peter Zijlstra, Vineeth Pillai, Joel Fernandes,
	Tim Chen, Aaron Lu, Aubrey Li, Dhaval Giani, Nishanth Aravamudan
  Cc: mingo, tglx, pjt, torvalds, linux-kernel, fweisbec, keescook,
	kerrnel, Phil Auld, Valentin Schneider, Mel Gorman, Pawan Gupta,
	Paolo Bonzini, joel, vineeth, Chen Yu, Christian Brauner,
	Agata Gruza, Antonio Gomez Iglesias, graf, konrad.wilk,
	dfaggioli, rostedt, derkling, benbjiang, Aaron Lu

On 8/28/20 3:51 PM, Julien Desfossez wrote:
> From: Aaron Lu <aaron.lwe@gmail.com>
> 
> This patch provides a vruntime-based way to compare two cfs tasks'
> priority, be it on the same cpu or different threads of the same core.
> 
> When the two tasks are on the same CPU, we just need to find a common
> cfs_rq both sched_entities are on and then do the comparison.
> 
> When the two tasks are on different threads of the same core, each thread
> will choose the next task to run the usual way and then the root level
> sched entities which the two tasks belong to will be used to decide
> which task runs next core wide.
> 
> An illustration for the cross CPU case:
> 
>     cpu0         cpu1
>   /   |  \     /   |  \
> se1 se2 se3  se4 se5 se6
>      /  \            /   \
>    se21 se22       se61  se62
>    (A)                    /
>                         se621
>                          (B)
> 
> Assume CPU0 and CPU1 are smt siblings and cpu0 has decided task A to
> run next and cpu1 has decided task B to run next. To compare priority
> of task A and B, we compare the priority of se2 and se6. Whichever has
> the smaller vruntime wins.
> 
> To make this work, the root level sched entities' vruntime of the two
> threads must be directly comparable. So one of the hyperthread's root
> cfs_rq's min_vruntime is chosen as the core wide one and all root level
> sched entities' vruntime is normalized against it.
> 
> All sub cfs_rqs and sched entities are not interesting in cross cpu
> priority comparison as they will only participate in the usual cpu local
> schedule decision so no need to normalize their vruntimes.
> 
> Signed-off-by: Aaron Lu <ziqian.lzq@antfin.com>
> ---
>   kernel/sched/core.c  |  23 +++----
>   kernel/sched/fair.c  | 142 ++++++++++++++++++++++++++++++++++++++++++-
>   kernel/sched/sched.h |   3 +
>   3 files changed, 150 insertions(+), 18 deletions(-)


While investigating reported 'uperf' performance regressions between core sched v5 and core sched v6/v7, this patch 
seems to be the first indicator of about a 40% perf loss when moving from v5 to v6 (and the accounting here is carried 
forward into this patch). Unfortunately, it is not the easiest thing to trace back, as the patchsets are not directly 
comparable in this case and, moving to v7, the base kernel revision changed from 5.6 to 5.9.

The regressions were duplicated with the following setup: on a 24 core VM, create a cgroup and, in it, fire off the 
uperf server and the client running 2 mins worth of 100 threads doing short TCP reads and writes. Do this with the 
cgroup both core sched tagged and not tagged (technically tearing everything down and rebuilding it in between). This 
is short and easy enough to do dozens of runs for statistical averaging.

Whatever the above version of this test might map to in real life, it presumably exacerbates the competition between 
softirq threads and the core sched tagged threads which was observed in the reports.

Here are the uperf results of the various patchsets. Note that disabling smt is better for these tests, and that 
presumably reflects the overall overhead of core scheduling, which went from bad to really bad. The primary focus in 
this email is to start to understand what happened within core sched itself.

patchset          smt=on/cs=off  smt=off    smt=on/cs=on
--------------------------------------------------------
v5-v5.6.y      :    1.78Gb/s     1.57Gb/s     1.07Gb/s
pre-v6-v5.6.y  :    1.75Gb/s     1.55Gb/s    822.16Mb/s
v6-5.7         :    1.87Gb/s     1.56Gb/s    561.6Mb/s
v6-5.7-hotplug :    1.75Gb/s     1.58Gb/s    438.21Mb/s
v7             :    1.80Gb/s     1.61Gb/s    440.44Mb/s

If you start by contrasting v5 and v6 on the same base 5.6 kernel, to rule out kernel-to-kernel version differences, 
bisecting v6 pointed to the v6 version of this patch:

"[RFC PATCH v7 11/23] sched/fair: core wide cfs task priority comparison"

although all that really seems to say is that the new means of vruntime accounting (still present in v7) caused 
performance in the patchset to drop, which is plausible; different numbers, different scheduler behavior. A rough 
attempt to verify this by backporting parts of the new accounting onto the v5 patchset showed that the initial switch 
from old to new accounting dropped perf to about 791Mb/s, and the rest of the changes (as shown in the v6 numbers, 
though not backported) only bring the v6 patchset back to 822.16Mb/s. That is not 100% proof, but seems very suspicious.

This change in vruntime accounting seems to account for about 40% of the total v5-to-v7 perf loss, though clearly lots 
of other changes have occurred in between. I am certainly not saying there is a bug here; it just seems time to bring 
in the original authors and start a general discussion.

-chrish

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC PATCH v7 11/23] sched/fair: core wide cfs task priority comparison
       [not found]     ` <81b208ad-b9e6-bfbf-631e-02e9f75d73a2@linux.intel.com>
@ 2020-09-16 14:24       ` chris hyser
  2020-09-16 20:53         ` chris hyser
  0 siblings, 1 reply; 68+ messages in thread
From: chris hyser @ 2020-09-16 14:24 UTC (permalink / raw)
  To: Li, Aubrey, Julien Desfossez, Peter Zijlstra, Vineeth Pillai,
	Joel Fernandes, Tim Chen, Aaron Lu, Aubrey Li, Dhaval Giani,
	Nishanth Aravamudan
  Cc: mingo, tglx, pjt, torvalds, linux-kernel, fweisbec, keescook,
	kerrnel, Phil Auld, Valentin Schneider, Mel Gorman, Pawan Gupta,
	Paolo Bonzini, joel, vineeth, Chen Yu, Christian Brauner,
	Agata Gruza, Antonio Gomez Iglesias, graf, konrad.wilk,
	dfaggioli, rostedt, derkling, benbjiang, Aaron Lu, Ning, Hongyu

On 9/16/20 8:57 AM, Li, Aubrey wrote:
>> Here are the uperf results of the various patchsets. Note that disabling smt is better for these tests, and that presumably reflects the overall overhead of core scheduling, which went from bad to really bad. The primary focus in this email is to start to understand what happened within core sched itself.
>>
>> patchset          smt=on/cs=off  smt=off    smt=on/cs=on
>> --------------------------------------------------------
>> v5-v5.6.y      :    1.78Gb/s     1.57Gb/s     1.07Gb/s
>> pre-v6-v5.6.y  :    1.75Gb/s     1.55Gb/s    822.16Mb/s
>> v6-5.7         :    1.87Gb/s     1.56Gb/s    561.6Mb/s
>> v6-5.7-hotplug :    1.75Gb/s     1.58Gb/s    438.21Mb/s
>> v7             :    1.80Gb/s     1.61Gb/s    440.44Mb/s
> 
> I haven't had a chance to play with v7, but I got something different.
> 
>    branch		smt=on/cs=on
> coresched/v5-v5.6.y	1.09Gb/s
> coresched/v6-v5.7.y	1.05Gb/s
> 
> I attached my kernel config in case you want to make a comparison, or you
> can send yours, I'll try to see if I can replicate your result.

I will give this config a try. One of the reports forwarded to me about the drop in uperf perf was an email from you I 
believe mentioning a 50% perf drop between v5 and v6?? I was actually setting out to duplicate your results. :-)

-chrish



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC PATCH v7 11/23] sched/fair: core wide cfs task priority comparison
  2020-09-16 14:24       ` chris hyser
@ 2020-09-16 20:53         ` chris hyser
  2020-09-17  1:09           ` Li, Aubrey
  0 siblings, 1 reply; 68+ messages in thread
From: chris hyser @ 2020-09-16 20:53 UTC (permalink / raw)
  To: Li, Aubrey, Julien Desfossez, Peter Zijlstra, Vineeth Pillai,
	Joel Fernandes, Tim Chen, Aaron Lu, Aubrey Li, Dhaval Giani,
	Nishanth Aravamudan
  Cc: mingo, tglx, pjt, torvalds, linux-kernel, fweisbec, keescook,
	kerrnel, Phil Auld, Valentin Schneider, Mel Gorman, Pawan Gupta,
	Paolo Bonzini, joel, vineeth, Chen Yu, Christian Brauner,
	Agata Gruza, Antonio Gomez Iglesias, graf, konrad.wilk,
	dfaggioli, rostedt, derkling, benbjiang, Aaron Lu, Ning, Hongyu

On 9/16/20 10:24 AM, chris hyser wrote:
> On 9/16/20 8:57 AM, Li, Aubrey wrote:
>>> Here are the uperf results of the various patchsets. Note that disabling smt is better for these tests, and that 
>>> presumably reflects the overall overhead of core scheduling, which went from bad to really bad. The primary focus in 
>>> this email is to start to understand what happened within core sched itself.
>>>
>>> patchset          smt=on/cs=off  smt=off    smt=on/cs=on
>>> --------------------------------------------------------
>>> v5-v5.6.y      :    1.78Gb/s     1.57Gb/s     1.07Gb/s
>>> pre-v6-v5.6.y  :    1.75Gb/s     1.55Gb/s    822.16Mb/s
>>> v6-5.7         :    1.87Gb/s     1.56Gb/s    561.6Mb/s
>>> v6-5.7-hotplug :    1.75Gb/s     1.58Gb/s    438.21Mb/s
>>> v7             :    1.80Gb/s     1.61Gb/s    440.44Mb/s
>>
>> I haven't had a chance to play with v7, but I got something different.
>>
>>    branch        smt=on/cs=on
>> coresched/v5-v5.6.y    1.09Gb/s
>> coresched/v6-v5.7.y    1.05Gb/s
>>
>> I attached my kernel config in case you want to make a comparison, or you
>> can send yours, I'll try to see if I can replicate your result.
> 
> I will give this config a try. One of the reports forwarded to me about the drop in uperf perf was an email from you I 
> believe mentioning a 50% perf drop between v5 and v6?? I was actually setting out to duplicate your results. :-)

The first thing I did was to verify I built and tested the right bits; presumably I did, as I get the same numbers. 
I'm still trying to tweak your config to get a root disk in my setup. Oh, one thing I missed in reading your first 
response: I had 24 cores/48 cpus. I think you had half that, though my guess is that should have actually made the 
numbers even worse. :-)

The following was forwarded to me originally sent on Aug 3, by you I believe:

> We found uperf(in cgroup) throughput drops by ~50% with corescheduling.
> 
> The problem is, uperf triggered a lot of softirq and offloaded softirq
> service to *ksoftirqd* thread.
> 
> - default, ksoftirqd thread can run with uperf on the same core, we saw
>   100% CPU utilization.
> - coresched enabled, ksoftirqd's core cookie is different from uperf, so
>   they can't run concurrently on the same core, we saw ~15% forced idle.
> 
> I guess this kind of performance drop can be replicated by other similar
> (a lot of softirq activities) workloads.
> 
> Currently core scheduler picks cookie-match tasks for all SMT siblings, does
> it make sense we add a policy to allow cookie-compatible task running together?
> For example, if a task is trusted(set by admin), it can work with kernel thread.
> The difference from corescheduling disabled is that we still have user to user
> isolation.
> 
> Thanks,
> -Aubrey

Would you please elaborate on what this test was? In trying to duplicate this, I just kept adding uperf threads to my 
setup until I started to see performance losses similar to what is reported above (and a second report about v7). Also, 
I wasn't looking for absolute numbers per se, just significant enough differences to try to track where the performance 
went.
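
(For illustration, Aubrey's policy idea above might look roughly like the
sketch below in the pick loop; this is not from the series, and
task_is_trusted() is a hypothetical helper for an admin-set trust flag:)

	/*
	 * Sketch: treat cookies as compatible when they match exactly, or
	 * when one side is a kernel thread and the other is admin-trusted.
	 */
	static inline bool cookies_compatible(struct task_struct *a,
					      struct task_struct *b)
	{
		if (a->core_cookie == b->core_cookie)
			return true;

		return ((a->flags & PF_KTHREAD) && task_is_trusted(b)) ||
		       ((b->flags & PF_KTHREAD) && task_is_trusted(a));
	}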

-chrish

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC PATCH v7 11/23] sched/fair: core wide cfs task priority comparison
  2020-09-16 20:53         ` chris hyser
@ 2020-09-17  1:09           ` Li, Aubrey
  0 siblings, 0 replies; 68+ messages in thread
From: Li, Aubrey @ 2020-09-17  1:09 UTC (permalink / raw)
  To: chris hyser, Julien Desfossez, Peter Zijlstra, Vineeth Pillai,
	Joel Fernandes, Tim Chen, Aaron Lu, Aubrey Li, Dhaval Giani,
	Nishanth Aravamudan
  Cc: mingo, tglx, pjt, torvalds, linux-kernel, fweisbec, keescook,
	kerrnel, Phil Auld, Valentin Schneider, Mel Gorman, Pawan Gupta,
	Paolo Bonzini, joel, vineeth, Chen Yu, Christian Brauner,
	Agata Gruza, Antonio Gomez Iglesias, graf, konrad.wilk,
	dfaggioli, rostedt, derkling, benbjiang, Aaron Lu, Ning, Hongyu

On 2020/9/17 4:53, chris hyser wrote:
> On 9/16/20 10:24 AM, chris hyser wrote:
>> On 9/16/20 8:57 AM, Li, Aubrey wrote:
>>>> Here are the uperf results of the various patchsets. Note that disabling smt is better for these tests, and that presumably reflects the overall overhead of core scheduling, which went from bad to really bad. The primary focus in this email is to start to understand what happened within core sched itself.
>>>>
>>>> patchset          smt=on/cs=off  smt=off    smt=on/cs=on
>>>> --------------------------------------------------------
>>>> v5-v5.6.y      :    1.78Gb/s     1.57Gb/s     1.07Gb/s
>>>> pre-v6-v5.6.y  :    1.75Gb/s     1.55Gb/s    822.16Mb/s
>>>> v6-5.7         :    1.87Gb/s     1.56Gb/s    561.6Mb/s
>>>> v6-5.7-hotplug :    1.75Gb/s     1.58Gb/s    438.21Mb/s
>>>> v7             :    1.80Gb/s     1.61Gb/s    440.44Mb/s
>>>
>>> I haven't had a chance to play with v7, but I got something different.
>>>
>>>    branch        smt=on/cs=on
>>> coresched/v5-v5.6.y    1.09Gb/s
>>> coresched/v6-v5.7.y    1.05Gb/s
>>>
>>> I attached my kernel config in case you want to make a comparison, or you
>>> can send yours, I'll try to see if I can replicate your result.
>>
>> I will give this config a try. One of the reports forwarded to me about the drop in uperf perf was an email from you I believe mentioning a 50% perf drop between v5 and v6?? I was actually setting out to duplicate your results. :-)
> 
> The first thing I did was to verify I built and tested the right bits; presumably I did, as I get the same numbers. I'm still trying to tweak your config to get a root disk in my setup. Oh, one thing I missed in reading your first response: I had 24 cores/48 cpus. I think you had half that, though my guess is that should have actually made the numbers even worse. :-)
> 
> The following was forwarded to me originally sent on Aug 3, by you I believe:
> 
>> We found uperf(in cgroup) throughput drops by ~50% with corescheduling.
>>
>> The problem is, uperf triggered a lot of softirq and offloaded softirq
>> service to *ksoftirqd* thread.
>>
>> - default, ksoftirqd thread can run with uperf on the same core, we saw
>>   100% CPU utilization.
>> - coresched enabled, ksoftirqd's core cookie is different from uperf, so
>>   they can't run concurrently on the same core, we saw ~15% forced idle.
>>
>> I guess this kind of performance drop can be replicated by other similar
>> (a lot of softirq activities) workloads.
>>
>> Currently core scheduler picks cookie-match tasks for all SMT siblings, does
>> it make sense we add a policy to allow cookie-compatible task running together?
>> For example, if a task is trusted(set by admin), it can work with kernel thread.
>> The difference from corescheduling disabled is that we still have user to user
>> isolation.
>>
>> Thanks,
>> -Aubrey
> 
> Would you please elaborate on what this test was? In trying to duplicate this, I just kept adding uperf threads to my setup until I started to see performance losses similar to what is reported above (and a second report about v7). Also, I wasn't looking for absolute numbers per se, just significant enough differences to try to track where the performance went.
> 

This test was smt-on/cs-on against smt-on/cs-off on the same corescheduling version;
we didn't find such a big regression between different versions as you encountered.

Thanks,
-Aubrey

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC PATCH v7 11/23] sched/fair: core wide cfs task priority comparison
  2020-08-28 21:29   ` Peter Zijlstra
@ 2020-09-17 14:15     ` Vineeth Pillai
  2020-09-17 20:39       ` Vineeth Pillai
  2020-09-23  1:46     ` Joel Fernandes
  1 sibling, 1 reply; 68+ messages in thread
From: Vineeth Pillai @ 2020-09-17 14:15 UTC (permalink / raw)
  To: Peter Zijlstra, Julien Desfossez
  Cc: Joel Fernandes, Tim Chen, Aaron Lu, Aubrey Li, Dhaval Giani,
	Chris Hyser, Nishanth Aravamudan, mingo, tglx, pjt, torvalds,
	linux-kernel, fweisbec, keescook, kerrnel, Phil Auld,
	Valentin Schneider, Mel Gorman, Pawan Gupta, Paolo Bonzini, joel,
	vineeth, Chen Yu, Christian Brauner, Agata Gruza,
	Antonio Gomez Iglesias, graf, konrad.wilk, dfaggioli, rostedt,
	derkling, benbjiang, Aaron Lu

Hi Peter,

On 8/28/20 5:29 PM, Peter Zijlstra wrote:
>
> This is still a horrible patch..

I was working on your idea of using the lag as a tool to compare vruntime
across cpus:
https://lwn.net/ml/linux-kernel/20200506143506.GH5298@hirez.programming.kicks-ass.net/
https://lwn.net/ml/linux-kernel/20200514130248.GD2940@hirez.programming.kicks-ass.net/

Basically, I record the clock time when a sibling is force idled. Before the
comparison, if the sibling is still force idled, I take the weighted delta of
the time the sibling was force idled and decrement it from the vruntime. The
lag is computed in pick_task_fair as we do the cross-cpu comparison only after
a task pick.

I think this can solve the SMTn issues since we are doing a one-sided
comparison.

I have done some initial testing and it looks good. I need to do some deeper
analysis and check the fairness cases. I thought I would share the prototype
and get feedback while doing further testing. Please have a look and let me
know your thoughts.

The git tree is here:
https://github.com/vineethrp/linux/tree/coresched/pre-v8-v5.9-rc-lagfix
It has the review comments addressed since v8 was posted.
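
(To make the arithmetic concrete: if a sibling sat force idled for, say, 2ms,
pick_task_fair() on that sibling computes lag = rq_clock_task(rq) -
core_fi_time ~= 2ms and accumulates it into cfs_rq->core_lag;
cfs_prio_less() then subtracts calc_delta_fair(core_lag, se) from that side's
normalized vruntime, so the previously force-idled side compares as higher
priority, getting back the weighted runtime it lost while idle.)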

Thanks,
Vineeth

---
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index cea9a63c2e7a..52b3e7b356e9 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -136,19 +136,8 @@ static inline bool prio_less(struct task_struct *a, struct task_struct *b)
         if (pa == -1) /* dl_prio() doesn't work because of stop_class above */
                 return !dl_time_before(a->dl.deadline, b->dl.deadline);

-       if (pa == MAX_RT_PRIO + MAX_NICE)  { /* fair */
-               u64 vruntime = b->se.vruntime;
-
-               /*
-                * Normalize the vruntime if tasks are in different cpus.
-                */
-               if (task_cpu(a) != task_cpu(b)) {
-                       vruntime -= task_cfs_rq(b)->min_vruntime;
-                       vruntime += task_cfs_rq(a)->min_vruntime;
-               }
-
-               return !((s64)(a->se.vruntime - vruntime) <= 0);
-       }
+       if (pa == MAX_RT_PRIO + MAX_NICE)       /* fair */
+               return cfs_prio_less(a, b);

         return false;
  }
@@ -235,6 +224,13 @@ static void sched_core_dequeue(struct rq *rq, struct task_struct *p)

         rb_erase(&p->core_node, &rq->core_tree);
         RB_CLEAR_NODE(&p->core_node);
+
+    /*
+     * Reset last force idle time if this rq will be
+     * empty after this dequeue.
+     */
+    if (rq->nr_running == 1)
+        rq->core_fi_time = 0;
  }

  /*
@@ -310,8 +306,10 @@ static int __sched_core_stopper(void *data)
                          * cgroup tags. However, dying tasks could still be
                          * left in core queue. Flush them here.
                          */
-                       if (!enabled)
+                       if (!enabled) {
                                 sched_core_flush(cpu);
+                               rq->core_fi_time = 0;
+            }

                         rq->core_enabled = enabled;
                 }
@@ -5181,9 +5179,17 @@ next_class:;
                 if (!rq_i->core_pick)
                         continue;

-               if (is_task_rq_idle(rq_i->core_pick) && rq_i->nr_running &&
-                   !rq_i->core->core_forceidle) {
-                       rq_i->core->core_forceidle = true;
+               if (is_task_rq_idle(rq_i->core_pick) && rq_i->nr_running) {
+
+                       if (!rq_i->core->core_forceidle)
+                               rq_i->core->core_forceidle = true;
+
+                       /*
+                        * XXX: Should we record force idled time a bit later on the
+                        * pick_next_task of this sibling when it forces itself idle?
+                        */
+                       rq_i->core_fi_time = rq_clock_task(rq_i);
+                       trace_printk("force idle(cpu=%d) TS=%llu\n", i, rq_i->core_fi_time);
                 }

                 rq_i->core_pick->core_occupation = occ;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index eadc62930336..7f62a50e23e1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -389,6 +389,12 @@ static inline struct sched_entity *parent_entity(struct sched_entity *se)
         return se->parent;
  }

+static inline bool
+is_same_tg(struct sched_entity *se, struct sched_entity *pse)
+{
+       return se->cfs_rq->tg == pse->cfs_rq->tg;
+}
+
  static void
  find_matching_se(struct sched_entity **se, struct sched_entity **pse)
  {
@@ -453,6 +459,12 @@ static inline struct sched_entity *parent_entity(struct sched_entity *se)
         return NULL;
  }

+static inline bool
+is_same_tg(struct sched_entity *se, struct sched_entity *pse)
+{
+       return true;
+}
+
  static inline void
  find_matching_se(struct sched_entity **se, struct sched_entity **pse)
  {
@@ -6962,13 +6974,26 @@ static struct task_struct *pick_task_fair(struct rq *rq)
  {
         struct cfs_rq *cfs_rq = &rq->cfs;
         struct sched_entity *se;
+       u64 lag = 0;

         if (!cfs_rq->nr_running)
                 return NULL;

+       /*
+        * Calculate the lag caused due to force idle on this
+        * sibling. Since we do cross-cpu comparison of vruntime
+        * only after a task pick, it should be safe to compute
+        * the lag here.
+        */
+       if (rq->core_fi_time) {
+               lag = rq_clock_task(rq) - rq->core_fi_time;
+               if ((s64)lag < 0)
+                       lag = 0;
+       }
         do {
                 struct sched_entity *curr = cfs_rq->curr;

+
                 se = pick_next_entity(cfs_rq, NULL);

                 if (curr) {
@@ -6977,11 +7002,15 @@ static struct task_struct *pick_task_fair(struct rq *rq)

                         if (!se || entity_before(curr, se))
                                 se = curr;
+
+                       cfs_rq->core_lag += lag;
                 }

                 cfs_rq = group_cfs_rq(se);
         } while (cfs_rq);

+       rq->core_fi_time = 0;
+
         return task_of(se);
  }
  #endif
@@ -10707,6 +10736,41 @@ static inline void task_tick_core(struct rq *rq, struct task_struct *curr)
             __entity_slice_used(&curr->se))
                 resched_curr(rq);
  }
+
+bool cfs_prio_less(struct task_struct *a, struct task_struct *b)
+{
+       struct sched_entity *se_a = &a->se, *se_b = &b->se;
+       struct cfs_rq *cfs_rq_a, *cfs_rq_b;
+       u64 vruntime_a, vruntime_b;
+
+#ifdef CONFIG_FAIR_GROUP_SCHED
+       while (!is_same_tg(se_a, se_b)) {
+               int se_a_depth = se_a->depth;
+               int se_b_depth = se_b->depth;
+
+               if (se_a_depth <= se_b_depth)
+                       se_b = parent_entity(se_b);
+               if (se_a_depth >= se_b_depth)
+                       se_a = parent_entity(se_a);
+       }
+#endif
+
+       cfs_rq_a = cfs_rq_of(se_a);
+       cfs_rq_b = cfs_rq_of(se_b);
+
+       vruntime_a = se_a->vruntime - cfs_rq_a->min_vruntime;
+       vruntime_b = se_b->vruntime - cfs_rq_b->min_vruntime;
+
+       trace_printk("(%s/%d;%Ld,%Lu) ?< (%s/%d;%Ld,%Lu)\n",
+                    a->comm, a->pid, vruntime_a, cfs_rq_a->core_lag,
+                    b->comm, b->pid, vruntime_b, cfs_rq_b->core_lag);
+       if (cfs_rq_a != cfs_rq_b) {
+               vruntime_a -= calc_delta_fair(cfs_rq_a->core_lag, &a->se);
+               vruntime_b -= calc_delta_fair(cfs_rq_b->core_lag, &b->se);
+       }
+
+       return !((s64)(vruntime_a - vruntime_b) <= 0);
+}
  #else
  static inline void task_tick_core(struct rq *rq, struct task_struct *curr) {}
  #endif
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 5374ace195b9..a1c15edf8dd5 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -607,6 +607,10 @@ struct cfs_rq {
         struct list_head        throttled_list;
  #endif /* CONFIG_CFS_BANDWIDTH */
  #endif /* CONFIG_FAIR_GROUP_SCHED */
+
+#ifdef CONFIG_SCHED_CORE
+       u64                     core_lag;
+#endif
  };

  static inline int rt_bandwidth_enabled(void)
@@ -1056,6 +1060,7 @@ struct rq {
  #ifdef CONFIG_SCHED_CORE
         /* per rq */
         struct rq               *core;
+       u64                     core_fi_time; /* Last forced idle TS */
         struct task_struct      *core_pick;
         unsigned int            core_enabled;
         unsigned int            core_sched_seq;
commit e97ec4d317ac01cabbcdcb1ac8a315a02b3e39f6
Author: Vineeth Pillai <viremana@linux.microsoft.com>
Date:   Wed Sep 16 11:35:37 2020 -0400

     sched/coresched: lag based cross cpu vruntime comparison

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index cea9a63c2e7a..52b3e7b356e9 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -136,19 +136,8 @@ static inline bool prio_less(struct task_struct *a, struct task_struct *b)
      if (pa == -1) /* dl_prio() doesn't work because of stop_class above */
          return !dl_time_before(a->dl.deadline, b->dl.deadline);

-    if (pa == MAX_RT_PRIO + MAX_NICE)  { /* fair */
-        u64 vruntime = b->se.vruntime;
-
-        /*
-         * Normalize the vruntime if tasks are in different cpus.
-         */
-        if (task_cpu(a) != task_cpu(b)) {
-            vruntime -= task_cfs_rq(b)->min_vruntime;
-            vruntime += task_cfs_rq(a)->min_vruntime;
-        }
-
-        return !((s64)(a->se.vruntime - vruntime) <= 0);
-    }
+    if (pa == MAX_RT_PRIO + MAX_NICE)    /* fair */
+        return cfs_prio_less(a, b);

      return false;
  }
@@ -235,6 +224,13 @@ static void sched_core_dequeue(struct rq *rq, 
struct task_struct *p)

      rb_erase(&p->core_node, &rq->core_tree);
      RB_CLEAR_NODE(&p->core_node);
+
+    /*
+     * Reset last force idle time if this rq will be
+     * empty after this dequeue.
+     */
+    if (rq->nr_running == 1)
+        rq->core_fi_time = 0;
  }

  /*
@@ -310,8 +306,10 @@ static int __sched_core_stopper(void *data)
               * cgroup tags. However, dying tasks could still be
               * left in core queue. Flush them here.
               */
-            if (!enabled)
+            if (!enabled) {
                  sched_core_flush(cpu);
+                rq->core_fi_time = 0;
+                /* XXX Reset cfs_rq->core_lag? */
+            }

              rq->core_enabled = enabled;
          }
@@ -5181,9 +5179,17 @@ next_class:;
          if (!rq_i->core_pick)
              continue;

-        if (is_task_rq_idle(rq_i->core_pick) && rq_i->nr_running &&
-            !rq_i->core->core_forceidle) {
-            rq_i->core->core_forceidle = true;
+        if (is_task_rq_idle(rq_i->core_pick) && rq_i->nr_running) {
+
+            if (!rq_i->core->core_forceidle)
+                rq_i->core->core_forceidle = true;
+
+            /*
+             * XXX: Should we record force idled time a bit later on the
+             * pick_next_task of this sibling when it forces itself idle?
+             */
+            rq_i->core_fi_time = rq_clock_task(rq_i);
+            trace_printk("force idle(cpu=%d) TS=%llu\n", i, 
rq_i->core_fi_time);
          }

          rq_i->core_pick->core_occupation = occ;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index eadc62930336..7f62a50e23e1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -389,6 +389,12 @@ static inline struct sched_entity 
*parent_entity(struct sched_entity *se)
      return se->parent;
  }

+static inline bool
+is_same_tg(struct sched_entity *se, struct sched_entity *pse)
+{
+    return se->cfs_rq->tg == pse->cfs_rq->tg;
+}
+
  static void
  find_matching_se(struct sched_entity **se, struct sched_entity **pse)
  {
@@ -453,6 +459,12 @@ static inline struct sched_entity 
*parent_entity(struct sched_entity *se)
      return NULL;
  }

+static inline bool
+is_same_tg(struct sched_entity *se, struct sched_entity *pse)
+{
+    return true;
+}
+
  static inline void
  find_matching_se(struct sched_entity **se, struct sched_entity **pse)
  {
@@ -6962,13 +6974,26 @@ static struct task_struct *pick_task_fair(struct 
rq *rq)
  {
      struct cfs_rq *cfs_rq = &rq->cfs;
      struct sched_entity *se;
+  u64 lag = 0;

      if (!cfs_rq->nr_running)
          return NULL;

+    /*
+     * Calculate the lag caused due to force idle on this
+     * sibling. Since we do cross-cpu comparison of vruntime
+     * only after a task pick, it should be safe to compute
+     * the lag here.
+     */
+    if (rq->core_fi_time) {
+        lag = rq_clock_task(rq) - rq->core_fi_time;
+        if ((s64)lag < 0)
+            lag = 0;
+    }
      do {
          struct sched_entity *curr = cfs_rq->curr;

+
          se = pick_next_entity(cfs_rq, NULL);

          if (curr) {
@@ -6977,11 +7002,15 @@ static struct task_struct *pick_task_fair(struct 
rq *rq)

              if (!se || entity_before(curr, se))
                  se = curr;
+
+            cfs_rq->core_lag += lag;
          }

          cfs_rq = group_cfs_rq(se);
      } while (cfs_rq);

+    rq->core_fi_time = 0;
+
      return task_of(se);
  }
  #endif
@@ -10707,6 +10736,41 @@ static inline void task_tick_core(struct rq 
*rq, struct task_struct *curr)
          __entity_slice_used(&curr->se))
          resched_curr(rq);
  }
+
+bool cfs_prio_less(struct task_struct *a, struct task_struct *b)
+{
+    struct sched_entity *se_a = &a->se, *se_b = &b->se;
+    struct cfs_rq *cfs_rq_a, *cfs_rq_b;
+    u64 vruntime_a, vruntime_b;
+
+#ifdef CONFIG_FAIR_GROUP_SCHED
+    while (!is_same_tg(se_a, se_b)) {
+        int se_a_depth = se_a->depth;
+        int se_b_depth = se_b->depth;
+
+        if (se_a_depth <= se_b_depth)
+            se_b = parent_entity(se_b);
+        if (se_a_depth >= se_b_depth)
+            se_a = parent_entity(se_a);
+    }
+#endif
+
+    cfs_rq_a = cfs_rq_of(se_a);
+    cfs_rq_b = cfs_rq_of(se_b);
+
+    vruntime_a = se_a->vruntime - cfs_rq_a->min_vruntime;
+    vruntime_b = se_b->vruntime - cfs_rq_b->min_vruntime;
+
+    trace_printk("(%s/%d;%Ld,%Lu) ?< (%s/%d;%Ld,%Lu)\n",
+             a->comm, a->pid, vruntime_a, cfs_rq_a->core_lag,
+             b->comm, b->pid, vruntime_b, cfs_rq_b->core_lag);
+    if (cfs_rq_a != cfs_rq_b) {
+        vruntime_a -= calc_delta_fair(cfs_rq_a->core_lag, &a->se);
+        vruntime_b -= calc_delta_fair(cfs_rq_b->core_lag, &b->se);
+    }
+
+    return !((s64)(vruntime_a - vruntime_b) <= 0);
+}
  #else
  static inline void task_tick_core(struct rq *rq, struct task_struct 
*curr) {}
  #endif
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 5374ace195b9..a1c15edf8dd5 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -607,6 +607,10 @@ struct cfs_rq {
      struct list_head    throttled_list;
  #endif /* CONFIG_CFS_BANDWIDTH */
  #endif /* CONFIG_FAIR_GROUP_SCHED */
+
+#ifdef CONFIG_SCHED_CORE
+    u64            core_lag;
+#endif
  };

  static inline int rt_bandwidth_enabled(void)
@@ -1056,6 +1060,7 @@ struct rq {
  #ifdef CONFIG_SCHED_CORE
      /* per rq */
      struct rq        *core;
+   u64            core_fi_time; /* Last forced idle TS */
      struct task_struct    *core_pick;
      unsigned int        core_enabled;
      unsigned int        core_sched_seq;


* Re: [RFC PATCH v7 11/23] sched/fair: core wide cfs task priority comparison
  2020-09-17 14:15     ` Vineeth Pillai
@ 2020-09-17 20:39       ` Vineeth Pillai
  0 siblings, 0 replies; 68+ messages in thread
From: Vineeth Pillai @ 2020-09-17 20:39 UTC (permalink / raw)
  To: Peter Zijlstra, Julien Desfossez
  Cc: Joel Fernandes, Tim Chen, Aaron Lu, Aubrey Li, Dhaval Giani,
	Chris Hyser, Nishanth Aravamudan, mingo, tglx, pjt, torvalds,
	linux-kernel, fweisbec, keescook, kerrnel, Phil Auld,
	Valentin Schneider, Mel Gorman, Pawan Gupta, Paolo Bonzini, joel,
	vineeth, Chen Yu, Christian Brauner, Agata Gruza,
	Antonio Gomez Iglesias, graf, konrad.wilk, dfaggioli, rostedt,
	derkling, benbjiang, Aaron Lu


> +
> +bool cfs_prio_less(struct task_struct *a, struct task_struct *b)
> +{
> +       struct sched_entity *se_a = &a->se, *se_b = &b->se;
> +       struct cfs_rq *cfs_rq_a, *cfs_rq_b;
> +       u64 vruntime_a, vruntime_b;
> +
> +#ifdef CONFIG_FAIR_GROUP_SCHED
> +       while (!is_same_tg(se_a, se_b)) {
> +               int se_a_depth = se_a->depth;
> +               int se_b_depth = se_b->depth;
> +
> +               if (se_a_depth <= se_b_depth)
> +                       se_b = parent_entity(se_b);
> +               if (se_a_depth >= se_b_depth)
> +                       se_a = parent_entity(se_a);
> +       }
> +#endif
> +
> +       cfs_rq_a = cfs_rq_of(se_a);
> +       cfs_rq_b = cfs_rq_of(se_b);
> +
> +       vruntime_a = se_a->vruntime - cfs_rq_a->min_vruntime;
> +       vruntime_b = se_b->vruntime - cfs_rq_b->min_vruntime;
> +
> +       trace_printk("(%s/%d;%Ld,%Lu) ?< (%s/%d;%Ld,%Lu)\n",
> +                    a->comm, a->pid, vruntime_a, cfs_rq_a->core_lag,
> +                    b->comm, b->pid, vruntime_b, cfs_rq_b->core_lag);
> +       if (cfs_rq_a != cfs_rq_b) {
> +       if (cfs_rq_a != cfs_rq_b) {
> +               vruntime_a -= calc_delta_fair(cfs_rq_a->core_lag, &a->se);
> +               vruntime_b -= calc_delta_fair(cfs_rq_b->core_lag, &b->se);

This should be:
+               vruntime_a -= calc_delta_fair(cfs_rq_a->core_lag, se_a);
+               vruntime_b -= calc_delta_fair(cfs_rq_b->core_lag, se_b);
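
The distinction matters because calc_delta_fair() scales the lag by the
load weight of the sched_entity it is handed, and after the tree walk
se_a/se_b may be group entities whose weights differ from the tasks' own.
A rough standalone model of that scaling (the real kernel uses fixed-point
__calc_delta(); the type and constant here are simplified stand-ins):

#include <stdint.h>

#define NICE_0_LOAD_SKETCH 1024ULL	/* nice-0 weight, scaled-down form */

struct se_weight_sketch {
	uint64_t weight;	/* se->load.weight in the kernel */
};

/* The lag is scaled inversely to the weight of the entity passed in, so
 * passing &a->se instead of the matched ancestor se_a discounts the
 * forced-idle lag by the wrong weight. */
static uint64_t calc_delta_fair_sketch(uint64_t delta,
				       const struct se_weight_sketch *se)
{
	if (se->weight != NICE_0_LOAD_SKETCH)
		delta = delta * NICE_0_LOAD_SKETCH / se->weight;
	return delta;
}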


Thanks,
Vineeth




* Re: [RFC PATCH v7 11/23] sched/fair: core wide cfs task priority comparison
  2020-08-28 21:29   ` Peter Zijlstra
  2020-09-17 14:15     ` Vineeth Pillai
@ 2020-09-23  1:46     ` Joel Fernandes
  2020-09-23  1:52       ` Joel Fernandes
  1 sibling, 1 reply; 68+ messages in thread
From: Joel Fernandes @ 2020-09-23  1:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Julien Desfossez, Vineeth Pillai, Tim Chen, Aaron Lu, Aubrey Li,
	Dhaval Giani, Chris Hyser, Nishanth Aravamudan, mingo, tglx, pjt,
	torvalds, linux-kernel, fweisbec, keescook, kerrnel, Phil Auld,
	Valentin Schneider, Mel Gorman, Pawan Gupta, Paolo Bonzini,
	vineeth, Chen Yu, Christian Brauner, Agata Gruza,
	Antonio Gomez Iglesias, graf, konrad.wilk, dfaggioli, rostedt,
	derkling, benbjiang, Aaron Lu

On Fri, Aug 28, 2020 at 11:29:27PM +0200, Peter Zijlstra wrote:
> 
> 
> This is still a horrible patch..

Hi Peter,
I wrote a new patch similar to this one and it fares much better in my tests;
it is based on Aaron's idea, but I do the sync only during force-idle and not
during enqueue. Also, I yanked the whole 'core wide min_vruntime' crap. There
is a regressing test which improves quite a bit with my patch (results below):

Aaron, Vineeth, Chris: any other thoughts? This patch is based on Google's
4.19 device kernel so will require some massaging to apply to mainline/v7
series. I will provide an updated patch later based on v7 series.

(Works only for SMT2; maybe we can generalize it more.)
--------8<-----------

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Subject: [PATCH] sched: Sync the min_vruntime of cores when the system enters
 force-idle

This patch provides a vruntime-based way to compare two cfs tasks' priorities,
whether they are on the same cpu or on different threads of the same core.

It is based on Aaron Lu's patch with some important differences. Namely,
the vruntime is sync'ed only when the CPU goes into force-idle. Also I removed
the notion of core-wide min_vruntime.

Also, I don't care how long a cpu in a core is force idled; I do my sync
whenever the force idle starts, essentially bringing both SMTs to a common time
base. After that point, selection can happen as usual.
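
Concretely, the sync only needs to advance whichever sibling's root cfs_rq
is behind so that both share a time base; a minimal standalone sketch of
that step (stand-in type; the patch below additionally walks the queued
entities to propagate the adjustment):

#include <stdint.h>

struct cfs_root_sketch {
	uint64_t min_vruntime;
};

/* Hypothetical, simplified form of the sync done when force-idle starts. */
static void sync_min_vruntime(struct cfs_root_sketch *a, struct cfs_root_sketch *b)
{
	if ((int64_t)(a->min_vruntime - b->min_vruntime) < 0)
		a->min_vruntime = b->min_vruntime;	/* bring a up to b */
	else
		b->min_vruntime = a->min_vruntime;	/* bring b up to a */
}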

When running an Android audio test, with patch the perf sched latency output:

-----------------------------------------------------------------------------------------------------------------
Task                  |   Runtime ms  | Switches | Average delay ms | Maximum delay ms | Maximum delay at       |
-----------------------------------------------------------------------------------------------------------------
FinalizerDaemon:(2)   |     23.969 ms |      969 | avg:    0.504 ms | max:  162.020 ms | max at:   1294.327339 s
HeapTaskDaemon:(3)    |   2421.287 ms |     4733 | avg:    0.131 ms | max:   96.229 ms | max at:   1302.343366 s
adbd:(3)              |      6.101 ms |       79 | avg:    1.105 ms | max:   84.923 ms | max at:   1294.431284 s

Without this patch and with Aubrey's initial patch (in v5 series), the max delay looks much better:

-----------------------------------------------------------------------------------------------------------------
Task                  |   Runtime ms  | Switches | Average delay ms | Maximum delay ms | Maximum delay at       |
-----------------------------------------------------------------------------------------------------------------
HeapTaskDaemon:(2)    |   2602.109 ms |     4025 | avg:    0.231 ms | max:   19.152 ms | max at:    522.903934 s
surfaceflinger:7478   |     18.994 ms |     1206 | avg:    0.189 ms | max:   17.375 ms | max at:    520.523061 s
ksoftirqd/3:30        |      0.093 ms |        5 | avg:    3.328 ms | max:   16.567 ms | max at:    522.903871 s

The comparison bits are the same as with Aaron's patch. When the two tasks
are on the same CPU, we just need to find a common cfs_rq both sched_entities
are on and then do the comparison.

When the two tasks are on different threads of the same core, the root
level sched_entities to which the two tasks belong will be used to do
the comparison.

An ugly illustration for the cross CPU case:

       cpu0         cpu1
     /   |  \     /   |  \
    se1 se2 se3  se4 se5 se6
        /  \            /   \
      se21 se22       se61  se62

Assume CPU0 and CPU1 are smt siblings and task A's se is se21 while
task B's se is se61. To compare the priority of tasks A and B, we compare
the priority of se2 and se6: whichever has the smaller vruntime wins.
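
In code, that comparison just climbs each task's parent chain to the
root-level group entity first; a minimal standalone sketch of the walk
(stand-in type, mirroring the cfs_prio_less() change in the diff below):

#include <stdbool.h>
#include <stdint.h>

struct se_sketch {
	uint64_t vruntime;
	struct se_sketch *parent;	/* NULL at the root level */
};

static bool root_vruntime_less(struct se_sketch *sea, struct se_sketch *seb)
{
	/* Climb to the root-level entities (se2 and se6 above). */
	while (sea->parent)
		sea = sea->parent;
	while (seb->parent)
		seb = seb->parent;

	/* Signed difference so vruntime wrap-around is handled. */
	return (int64_t)(sea->vruntime - seb->vruntime) < 0;
}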

To make this work, the root level se should have a common cfs_rq
min_vruntime, which I call the core cfs_rq min_vruntime.

When we adjust the min_vruntime of rq->core, we need to propagate
that down the tree so as not to cause starvation of existing tasks
based on their previous vruntime.

Inspired-by: Aaron Lu <ziqian.lzq@antfin.com>
Signed-off-by: Joel Fernandes <joel@joelfernandes.org>
Change-Id: Ida0083a2382a37f768030112ddf43bdf024a84b3
---
 kernel/sched/core.c  | 17 +++++++++++++-
 kernel/sched/fair.c  | 54 +++++++++++++++++++++++---------------------
 kernel/sched/sched.h |  2 ++
 3 files changed, 46 insertions(+), 27 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index aeaf72acf063..d6f25fdd78ee 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4068,6 +4068,7 @@ pick_task(struct rq *rq, const struct sched_class *class, struct task_struct *ma
 static struct task_struct *
 pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 {
+	struct cfs_rq *cfs_rq[2];
 	struct task_struct *next, *max = NULL;
 	const struct sched_class *class;
 	const struct cpumask *smt_mask;
@@ -4247,6 +4248,7 @@ next_class:;
 	 * their task. This ensures there is no inter-sibling overlap between
 	 * non-matching user state.
 	 */
+	need_sync = false;
 	for_each_cpu(i, smt_mask) {
 		struct rq *rq_i = cpu_rq(i);
 
@@ -4255,8 +4257,10 @@ next_class:;
 
 		WARN_ON_ONCE(!rq_i->core_pick);
 
-		if (is_idle_task(rq_i->core_pick) && rq_i->nr_running)
+		if (is_idle_task(rq_i->core_pick) && rq_i->nr_running) {
 			rq_i->core_forceidle = true;
+			need_sync = true;
+		}
 
 		rq_i->core_pick->core_occupation = occ;
 
@@ -4270,6 +4274,17 @@ next_class:;
 		WARN_ON_ONCE(!cookie_match(next, rq_i->core_pick));
 	}
 
+	/* Something got force idled, sync the vruntimes. Works only for SMT2. */
+	if (need_sync) {
+		j = 0;
+		for_each_cpu(i, smt_mask) {
+			struct rq *rq_i = cpu_rq(i);
+			cfs_rq[j++] = &rq_i->cfs;
+		}
+		
+		if (j == 2)
+			update_core_cfs_rq_min_vruntime(cfs_rq[0], cfs_rq[1]);
+	}
 done:
 	set_next_task(rq, next);
 	return next;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2ca3f6173c52..5bd220751d7a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -479,13 +479,6 @@ static inline struct cfs_rq *core_cfs_rq(struct cfs_rq *cfs_rq)
 
 static inline u64 cfs_rq_min_vruntime(struct cfs_rq *cfs_rq)
 {
-	if (!sched_core_enabled(rq_of(cfs_rq)))
-		return cfs_rq->min_vruntime;
-
-#ifdef CONFIG_SCHED_CORE
-	if (is_root_cfs_rq(cfs_rq))
-		return core_cfs_rq(cfs_rq)->min_vruntime;
-#endif
 	return cfs_rq->min_vruntime;
 }
 
@@ -497,40 +490,47 @@ static void coresched_adjust_vruntime(struct cfs_rq *cfs_rq, u64 delta)
 	if (!cfs_rq)
 		return;
 
-	cfs_rq->min_vruntime -= delta;
+	/*
+	 * vruntime only goes forward. This can cause sleepers to be boosted
+	 * but that's Ok for now.
+	 */
+	cfs_rq->min_vruntime += delta;
 	rbtree_postorder_for_each_entry_safe(se, next,
 			&cfs_rq->tasks_timeline.rb_root, run_node) {
 		if (se->vruntime > delta)
-			se->vruntime -= delta;
+			se->vruntime += delta;
 		if (se->my_q)
 			coresched_adjust_vruntime(se->my_q, delta);
 	}
 }
 
-static void update_core_cfs_rq_min_vruntime(struct cfs_rq *cfs_rq)
+void update_core_cfs_rq_min_vruntime(struct cfs_rq *cfs_rqa,
+				    struct cfs_rq *cfs_rqb)
 {
-	struct cfs_rq *cfs_rq_core;
-
-	if (!sched_core_enabled(rq_of(cfs_rq)))
-		return;
+	u64 delta;
 
-	if (!is_root_cfs_rq(cfs_rq))
+	if (!is_root_cfs_rq(cfs_rqa) || !is_root_cfs_rq(cfs_rqb))
 		return;
 
-	cfs_rq_core = core_cfs_rq(cfs_rq);
-	if (cfs_rq_core != cfs_rq &&
-	    cfs_rq->min_vruntime < cfs_rq_core->min_vruntime) {
-		u64 delta = cfs_rq_core->min_vruntime - cfs_rq->min_vruntime;
-		coresched_adjust_vruntime(cfs_rq_core, delta);
+	if (cfs_rqa->min_vruntime < cfs_rqb->min_vruntime) {
+		delta = cfs_rqb->min_vruntime - cfs_rqa->min_vruntime;
+		/* Bring a upto speed with b. */
+		coresched_adjust_vruntime(cfs_rqa, delta);
+	} else if (cfs_rqb->min_vruntime < cfs_rqa->min_vruntime) {
+		u64 delta = cfs_rqa->min_vruntime - cfs_rqb->min_vruntime;
+		/* Bring b upto speed with a. */
+		coresched_adjust_vruntime(cfs_rqb, delta);
 	}
 }
 #endif
 
 bool cfs_prio_less(struct task_struct *a, struct task_struct *b)
 {
+	struct cfs_rq *cfs_rqa = &cpu_rq(task_cpu(a))->cfs;
+	struct cfs_rq *cfs_rqb = &cpu_rq(task_cpu(b))->cfs;
+	bool samecpu = task_cpu(a) == task_cpu(b);
 	struct sched_entity *sea = &a->se;
 	struct sched_entity *seb = &b->se;
-	bool samecpu = task_cpu(a) == task_cpu(b);
 	struct task_struct *p;
 	s64 delta;
 
@@ -555,7 +555,13 @@ bool cfs_prio_less(struct task_struct *a, struct task_struct *b)
 		sea = sea->parent;
 	while (seb->parent)
 		seb = seb->parent;
-	delta = (s64)(sea->vruntime - seb->vruntime);
+
+	WARN_ON_ONCE(sea->vruntime < cfs_rqa->min_vruntime);
+	WARN_ON_ONCE(seb->vruntime < cfs_rqb->min_vruntime);
+
+	/* normalize vruntime WRT their rq's base */
+	delta = (s64)(sea->vruntime - seb->vruntime) +
+		(s64)(cfs_rqb->min_vruntime - cfs_rqa->min_vruntime);
 
 out:
 	p = delta > 0 ? b : a;
@@ -624,10 +630,6 @@ static void update_min_vruntime(struct cfs_rq *cfs_rq)
 	/* ensure we never gain time by being placed backwards. */
 	cfs_rq->min_vruntime = max_vruntime(cfs_rq_min_vruntime(cfs_rq), vruntime);
 
-#ifdef CONFIG_SCHED_CORE
-	update_core_cfs_rq_min_vruntime(cfs_rq);
-#endif
-
 #ifndef CONFIG_64BIT
 	smp_wmb();
 	cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index d09cfbd746e5..8559d0ee157f 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2549,6 +2549,8 @@ unsigned long scale_irq_capacity(unsigned long util, unsigned long irq, unsigned
 #endif
 
 bool cfs_prio_less(struct task_struct *a, struct task_struct *b);
+void update_core_cfs_rq_min_vruntime(struct cfs_rq *cfs_rqa,
+				    struct cfs_rq *cfs_rqb);
 
 #ifdef CONFIG_SMP
 extern struct static_key_false sched_energy_present;
-- 
2.28.0.681.g6f77f65b4e-goog



* Re: [RFC PATCH v7 11/23] sched/fair: core wide cfs task priority comparison
  2020-09-23  1:46     ` Joel Fernandes
@ 2020-09-23  1:52       ` Joel Fernandes
  2020-09-25 15:02         ` Joel Fernandes
  0 siblings, 1 reply; 68+ messages in thread
From: Joel Fernandes @ 2020-09-23  1:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Julien Desfossez, Vineeth Pillai, Tim Chen, Aaron Lu, Aubrey Li,
	Dhaval Giani, Chris Hyser, Nishanth Aravamudan, mingo, tglx, pjt,
	torvalds, linux-kernel, fweisbec, keescook, kerrnel, Phil Auld,
	Valentin Schneider, Mel Gorman, Pawan Gupta, Paolo Bonzini,
	vineeth, Chen Yu, Christian Brauner, Agata Gruza,
	Antonio Gomez Iglesias, graf, konrad.wilk, dfaggioli, rostedt,
	derkling, benbjiang, Aaron Lu

On Tue, Sep 22, 2020 at 09:46:22PM -0400, Joel Fernandes wrote:
> On Fri, Aug 28, 2020 at 11:29:27PM +0200, Peter Zijlstra wrote:
> > 
> > 
> > This is still a horrible patch..
> 
> Hi Peter,
> I wrote a new patch similar to this one and it fares much better in my tests;
> it is based on Aaron's idea, but I do the sync only during force-idle and not
> during enqueue. Also, I yanked the whole 'core wide min_vruntime' crap. There
> is a regressing test which improves quite a bit with my patch (results below):
> 
> Aaron, Vineeth, Chris: any other thoughts? This patch is based on Google's
> 4.19 device kernel so will require some massaging to apply to mainline/v7
> series. I will provide an updated patch later based on v7 series.
> 
> (Works only for SMT2; maybe we can generalize it more.)
> --------8<-----------
> 
> From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> Subject: [PATCH] sched: Sync the min_vruntime of cores when the system enters
>  force-idle
> 
> This patch provides a vruntime-based way to compare two cfs tasks' priorities,
> whether they are on the same cpu or on different threads of the same core.
> 
> It is based on Aaron Lu's patch with some important differences. Namely,
> the vruntime is sync'ed only when the CPU goes into force-idle. Also I removed
> the notion of core-wide min_vruntime.
> 
> Also, I don't care how long a cpu in a core is force idled; I do my sync
> whenever the force idle starts, essentially bringing both SMTs to a common time
> base. After that point, selection can happen as usual.
> 
> When running an Android audio test, with patch the perf sched latency output:
> 
> -----------------------------------------------------------------------------------------------------------------
> Task                  |   Runtime ms  | Switches | Average delay ms | Maximum delay ms | Maximum delay at       |
> -----------------------------------------------------------------------------------------------------------------
> FinalizerDaemon:(2)   |     23.969 ms |      969 | avg:    0.504 ms | max:  162.020 ms | max at:   1294.327339 s
> HeapTaskDaemon:(3)    |   2421.287 ms |     4733 | avg:    0.131 ms | max:   96.229 ms | max at:   1302.343366 s
> adbd:(3)              |      6.101 ms |       79 | avg:    1.105 ms | max:   84.923 ms | max at:   1294.431284 s
> 
> Without this patch and with Aubrey's initial patch (in v5 series), the max delay looks much better:
> 
> -----------------------------------------------------------------------------------------------------------------
> Task                  |   Runtime ms  | Switches | Average delay ms | Maximum delay ms | Maximum delay at       |
> -----------------------------------------------------------------------------------------------------------------
> HeapTaskDaemon:(2)    |   2602.109 ms |     4025 | avg:    0.231 ms | max:   19.152 ms | max at:    522.903934 s
> surfaceflinger:7478   |     18.994 ms |     1206 | avg:    0.189 ms | max:   17.375 ms | max at:    520.523061 s
> ksoftirqd/3:30        |      0.093 ms |        5 | avg:    3.328 ms | max:   16.567 ms | max at:    522.903871 s

I messed up the change log; just to clarify, the first result is without the
patch (bad) and the second result is with the patch (good).

thanks,

 - Joel



* Re: [RFC PATCH v7 11/23] sched/fair: core wide cfs task priority comparison
  2020-09-23  1:52       ` Joel Fernandes
@ 2020-09-25 15:02         ` Joel Fernandes
  0 siblings, 0 replies; 68+ messages in thread
From: Joel Fernandes @ 2020-09-25 15:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Julien Desfossez, Vineeth Pillai, Tim Chen, Aaron Lu, Aubrey Li,
	Dhaval Giani, Chris Hyser, Nishanth Aravamudan, mingo, tglx, pjt,
	linux-kernel, fweisbec, keescook, kerrnel, Phil Auld,
	Valentin Schneider, Mel Gorman, Pawan Gupta, Paolo Bonzini,
	vineeth, Chen Yu, Christian Brauner, Agata Gruza,
	Antonio Gomez Iglesias, graf, konrad.wilk, dfaggioli, rostedt,
	derkling, benbjiang, Aaron Lu

On Tue, Sep 22, 2020 at 09:52:43PM -0400, Joel Fernandes wrote:
> On Tue, Sep 22, 2020 at 09:46:22PM -0400, Joel Fernandes wrote:
> > On Fri, Aug 28, 2020 at 11:29:27PM +0200, Peter Zijlstra wrote:
> > > 
> > > 
> > > This is still a horrible patch..
> > 
> > [...]
> 
> I messed up the change log; just to clarify, the first result is without the
> patch (bad) and the second result is with the patch (good).

Here's another approach that might be worth considering; I was discussing it
with Vineeth: freeze the min_vruntime of the CPUs when the core enters
force-idle. I think this is similar to Peter's suggestion.

It is doing quite well in my real-world audio tests. This applies on top of
our ChromeOS 4.19 kernel tree [1] (which has the v5 series).

Any thoughts or review are most welcome, especially from Peter :)

[1] https://chromium.googlesource.com/chromiumos/third_party/kernel/+/refs/heads/chromeos-4.19
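
The mechanics are small: take a per-sibling snapshot of min_vruntime when
force-idle starts, then do later cross-sibling comparisons relative to the
snapshots. A standalone sketch of both halves (stand-in type; the patch
below stores the snapshot in cfs_rq as min_vruntime_fi):

#include <stdbool.h>
#include <stdint.h>

struct cfs_fi_sketch {
	uint64_t min_vruntime;
	uint64_t min_vruntime_fi;	/* frozen when force-idle begins */
};

/* Taken for each sibling when the core transitions into force-idle. */
static void snapshot_fi(struct cfs_fi_sketch *cfs)
{
	cfs->min_vruntime_fi = cfs->min_vruntime;
}

/* Cross-sibling compare: true when a's claim is weaker than b's,
 * mirroring the delta computed in cfs_prio_less() below. */
static bool fi_prio_less(uint64_t vrt_a, const struct cfs_fi_sketch *a,
			 uint64_t vrt_b, const struct cfs_fi_sketch *b)
{
	int64_t delta = (int64_t)(vrt_a - vrt_b) +
			(int64_t)(b->min_vruntime_fi - a->min_vruntime_fi);

	return delta > 0;
}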

---8<-----------------------

From: Joel Fernandes <joelaf@google.com>
Subject: [PATCH] Sync the min_vruntime of cores when the system enters force-idle

---
 kernel/sched/core.c  | 24 +++++++++++++++++-
 kernel/sched/fair.c  | 59 +++++++-------------------------------------
 kernel/sched/sched.h |  1 +
 3 files changed, 33 insertions(+), 51 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 715391c418d8..4ab680319a6b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4073,6 +4073,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 	const struct cpumask *smt_mask;
 	int i, j, cpu, occ = 0;
 	bool need_sync = false;
+	bool fi_before = false;
 
 	cpu = cpu_of(rq);
 	if (cpu_is_offline(cpu))
@@ -4138,6 +4139,16 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 			update_rq_clock(rq_i);
 	}
 
+	fi_before = need_sync;
+	if (!need_sync) {
+		for_each_cpu(i, smt_mask) {
+			struct rq *rq_i = cpu_rq(i);
+
+			/* Reset the snapshot if core is no longer in force-idle. */
+			rq_i->cfs.min_vruntime_fi = rq_i->cfs.min_vruntime;
+		}
+	}
+
 	/*
 	 * Try and select tasks for each sibling in decending sched_class
 	 * order.
@@ -4247,6 +4258,7 @@ next_class:;
 	 * their task. This ensures there is no inter-sibling overlap between
 	 * non-matching user state.
 	 */
+	need_sync = false;
 	for_each_cpu(i, smt_mask) {
 		struct rq *rq_i = cpu_rq(i);
 
@@ -4255,8 +4267,10 @@ next_class:;
 
 		WARN_ON_ONCE(!rq_i->core_pick);
 
-		if (is_idle_task(rq_i->core_pick) && rq_i->nr_running)
+		if (is_idle_task(rq_i->core_pick) && rq_i->nr_running) {
 			rq_i->core_forceidle = true;
+			need_sync = true;
+		}
 
 		rq_i->core_pick->core_occupation = occ;
 
@@ -4270,6 +4284,14 @@ next_class:;
 		WARN_ON_ONCE(!cookie_match(next, rq_i->core_pick));
 	}
 
+	if (!fi_before && need_sync) {
+		for_each_cpu(i, smt_mask) {
+			struct rq *rq_i = cpu_rq(i);
+
+			/* Snapshot if core is in force-idle. */
+			rq_i->cfs.min_vruntime_fi = rq_i->cfs.min_vruntime;
+		}
+	}
 done:
 	set_next_task(rq, next);
 	return next;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 23d032ab62d8..3d7c822bb5fb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -479,59 +479,17 @@ static inline struct cfs_rq *core_cfs_rq(struct cfs_rq *cfs_rq)
 
 static inline u64 cfs_rq_min_vruntime(struct cfs_rq *cfs_rq)
 {
-	if (!sched_core_enabled(rq_of(cfs_rq)))
-		return cfs_rq->min_vruntime;
-
-#ifdef CONFIG_SCHED_CORE
-	if (is_root_cfs_rq(cfs_rq))
-		return core_cfs_rq(cfs_rq)->min_vruntime;
-#endif
 	return cfs_rq->min_vruntime;
 }
 
-#ifdef CONFIG_SCHED_CORE
-static void coresched_adjust_vruntime(struct cfs_rq *cfs_rq, u64 delta)
-{
-	struct sched_entity *se, *next;
-
-	if (!cfs_rq)
-		return;
-
-	cfs_rq->min_vruntime -= delta;
-	rbtree_postorder_for_each_entry_safe(se, next,
-			&cfs_rq->tasks_timeline.rb_root, run_node) {
-		if (se->vruntime > delta)
-			se->vruntime -= delta;
-		if (se->my_q)
-			coresched_adjust_vruntime(se->my_q, delta);
-	}
-}
-
-static void update_core_cfs_rq_min_vruntime(struct cfs_rq *cfs_rq)
-{
-	struct cfs_rq *cfs_rq_core;
-
-	if (!sched_core_enabled(rq_of(cfs_rq)))
-		return;
-
-	if (!is_root_cfs_rq(cfs_rq))
-		return;
-
-	cfs_rq_core = core_cfs_rq(cfs_rq);
-	if (cfs_rq_core != cfs_rq &&
-	    cfs_rq->min_vruntime < cfs_rq_core->min_vruntime) {
-		u64 delta = cfs_rq_core->min_vruntime - cfs_rq->min_vruntime;
-		coresched_adjust_vruntime(cfs_rq_core, delta);
-	}
-}
-#endif
-
 #ifdef CONFIG_FAIR_GROUP_SCHED
 bool cfs_prio_less(struct task_struct *a, struct task_struct *b)
 {
+	bool samecpu = task_cpu(a) == task_cpu(b);
 	struct sched_entity *sea = &a->se;
 	struct sched_entity *seb = &b->se;
-	bool samecpu = task_cpu(a) == task_cpu(b);
+	struct cfs_rq *cfs_rqa;
+	struct cfs_rq *cfs_rqb;
 	s64 delta;
 
 	if (samecpu) {
@@ -555,8 +513,13 @@ bool cfs_prio_less(struct task_struct *a, struct task_struct *b)
 		sea = sea->parent;
 	while (seb->parent)
 		seb = seb->parent;
-	delta = (s64)(sea->vruntime - seb->vruntime);
 
+	cfs_rqa = sea->cfs_rq;
+	cfs_rqb = seb->cfs_rq;
+
+	/* normalize vruntime WRT their rq's base */
+	delta = (s64)(sea->vruntime - seb->vruntime) +
+		(s64)(cfs_rqb->min_vruntime_fi - cfs_rqa->min_vruntime_fi);
 out:
 	return delta > 0;
 }
@@ -620,10 +583,6 @@ static void update_min_vruntime(struct cfs_rq *cfs_rq)
 	/* ensure we never gain time by being placed backwards. */
 	cfs_rq->min_vruntime = max_vruntime(cfs_rq_min_vruntime(cfs_rq), vruntime);
 
-#ifdef CONFIG_SCHED_CORE
-	update_core_cfs_rq_min_vruntime(cfs_rq);
-#endif
-
 #ifndef CONFIG_64BIT
 	smp_wmb();
 	cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index d09cfbd746e5..45c8ce5c2333 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -499,6 +499,7 @@ struct cfs_rq {
 
 	u64			exec_clock;
 	u64			min_vruntime;
+	u64			min_vruntime_fi;
 #ifndef CONFIG_64BIT
 	u64			min_vruntime_copy;
 #endif
-- 
2.28.0.709.gb0816b6eb0-goog


Thread overview: 68+ messages
2020-08-28 19:51 [RFC PATCH v7 00/23] Core scheduling v7 Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 01/23] sched: Wrap rq::lock access Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 02/23] sched: Introduce sched_class::pick_task() Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 03/23] sched: Core-wide rq->lock Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 04/23] sched/fair: Add a few assertions Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 05/23] sched: Basic tracking of matching tasks Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 06/23] bitops: Introduce find_next_or_bit Julien Desfossez
2020-09-03  5:13   ` Randy Dunlap
2020-08-28 19:51 ` [RFC PATCH v7 07/23] cpumask: Introduce a new iterator for_each_cpu_wrap_or Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 08/23] sched: Add core wide task selection and scheduling Julien Desfossez
2020-08-28 20:51   ` Peter Zijlstra
2020-08-28 22:02     ` Vineeth Pillai
2020-08-28 22:23       ` Joel Fernandes
2020-08-29  7:47       ` peterz
2020-08-31 13:01         ` Vineeth Pillai
2020-08-31 14:24         ` Joel Fernandes
2020-09-01  3:38         ` Joel Fernandes
2020-09-01  5:10         ` Joel Fernandes
2020-09-01 12:34           ` Vineeth Pillai
2020-09-01 17:30             ` Joel Fernandes
2020-09-01 21:23               ` Vineeth Pillai
2020-09-02  1:11                 ` Joel Fernandes
2020-08-28 20:55   ` Peter Zijlstra
2020-08-28 22:15     ` Vineeth Pillai
2020-09-15 20:08   ` Joel Fernandes
2020-08-28 19:51 ` [RFC PATCH v7 09/23] sched/fair: Fix forced idle sibling starvation corner case Julien Desfossez
2020-08-28 21:25   ` Peter Zijlstra
2020-08-28 23:24     ` Vineeth Pillai
2020-08-28 19:51 ` [RFC PATCH v7 10/23] sched/fair: wrapper for cfs_rq->min_vruntime Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 11/23] sched/fair: core wide cfs task priority comparison Julien Desfossez
2020-08-28 21:29   ` Peter Zijlstra
2020-09-17 14:15     ` Vineeth Pillai
2020-09-17 20:39       ` Vineeth Pillai
2020-09-23  1:46     ` Joel Fernandes
2020-09-23  1:52       ` Joel Fernandes
2020-09-25 15:02         ` Joel Fernandes
2020-09-15 21:49   ` chris hyser
     [not found]     ` <81b208ad-b9e6-bfbf-631e-02e9f75d73a2@linux.intel.com>
2020-09-16 14:24       ` chris hyser
2020-09-16 20:53         ` chris hyser
2020-09-17  1:09           ` Li, Aubrey
2020-08-28 19:51 ` [RFC PATCH v7 12/23] sched: Trivial forced-newidle balancer Julien Desfossez
2020-09-02  7:08   ` Pavan Kondeti
2020-08-28 19:51 ` [RFC PATCH v7 13/23] sched: migration changes for core scheduling Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 14/23] irq_work: Add support to detect if work is pending Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 15/23] entry/idle: Add a common function for activites during idle entry/exit Julien Desfossez
2020-08-30  2:17   ` kernel test robot
2020-08-28 19:51 ` [RFC PATCH v7 16/23] arch/x86: Add a new TIF flag for untrusted tasks Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 17/23] kernel/entry: Add support for core-wide protection of kernel-mode Julien Desfossez
2020-08-30  6:50   ` [kernel/entry] 872a0a3f0b: will-it-scale.per_thread_ops -18.7% regression kernel test robot
2020-09-01 15:54   ` [RFC PATCH v7 17/23] kernel/entry: Add support for core-wide protection of kernel-mode Thomas Gleixner
2020-09-01 16:50     ` Joel Fernandes
2020-09-01 20:02       ` Thomas Gleixner
2020-09-02  1:29         ` Joel Fernandes
2020-09-02  7:53           ` Thomas Gleixner
2020-09-02 15:12             ` Joel Fernandes
2020-09-02 16:57             ` Dario Faggioli
2020-09-03  4:34               ` Joel Fernandes
2020-09-03 11:05                 ` Vineeth Pillai
2020-09-03 13:20                 ` Thomas Gleixner
2020-09-03 20:30                   ` Joel Fernandes
2020-09-03 13:43                 ` Dario Faggioli
2020-09-03 20:25                   ` Joel Fernandes
2020-08-28 19:51 ` [RFC PATCH v7 18/23] entry/idle: Enter and exit kernel protection during idle entry and exit Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 19/23] entry/kvm: Protect the kernel when entering from guest Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 20/23] sched/coresched: config option for kernel protection Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 21/23] sched: cgroup tagging interface for core scheduling Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 22/23] Documentation: Add documentation on " Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 23/23] sched: Debug bits Julien Desfossez
