* [PATCH v7 0/4] KVM: Dirty quota-based throttling
@ 2022-11-13 17:05 Shivam Kumar
  2022-11-13 17:05 ` [PATCH v7 1/4] KVM: Implement dirty quota-based throttling of vcpus Shivam Kumar
                   ` (3 more replies)
  0 siblings, 4 replies; 38+ messages in thread
From: Shivam Kumar @ 2022-11-13 17:05 UTC (permalink / raw)
  To: pbonzini, seanjc, maz, james.morse, borntraeger, david; +Cc: kvm, Shivam Kumar

This is v7 of the dirty quota series, with the following changes over v6:

1. Dropped support for the s390 arch.
2. Added an IOCTL to check whether the kernel supports dirty quota throttling.
3. Code refactoring and minor nits.

v1:
https://lore.kernel.org/kvm/20211114145721.209219-1-shivam.kumar1@nutanix.com/
v2: https://lore.kernel.org/kvm/Ydx2EW6U3fpJoJF0@google.com/T/
v3: https://lore.kernel.org/kvm/YkT1kzWidaRFdQQh@google.com/T/
v4:
https://lore.kernel.org/all/20220521202937.184189-1-shivam.kumar1@nutanix.com/
v5: https://lore.kernel.org/all/202209130532.2BJwW65L-lkp@intel.com/T/
v6:
https://lore.kernel.org/all/20220915101049.187325-1-shivam.kumar1@nutanix.com/

Thanks,
Shivam

Shivam Kumar (4):
  KVM: Implement dirty quota-based throttling of vcpus
  KVM: x86: Dirty quota-based throttling of vcpus
  KVM: arm64: Dirty quota-based throttling of vcpus
  KVM: selftests: Add selftests for dirty quota throttling

 Documentation/virt/kvm/api.rst                | 35 ++++++++++++
 arch/arm64/kvm/arm.c                          |  9 ++++
 arch/x86/kvm/Kconfig                          |  1 +
 arch/x86/kvm/mmu/spte.c                       |  4 +-
 arch/x86/kvm/vmx/vmx.c                        |  3 ++
 arch/x86/kvm/x86.c                            | 28 ++++++++++
 include/linux/kvm_host.h                      |  5 +-
 include/linux/kvm_types.h                     |  1 +
 include/uapi/linux/kvm.h                      | 13 +++++
 tools/include/uapi/linux/kvm.h                |  1 +
 tools/testing/selftests/kvm/dirty_log_test.c  | 33 +++++++++++-
 .../selftests/kvm/include/kvm_util_base.h     |  4 ++
 tools/testing/selftests/kvm/lib/kvm_util.c    | 53 +++++++++++++++++++
 virt/kvm/Kconfig                              |  4 ++
 virt/kvm/kvm_main.c                           | 25 +++++++--
 15 files changed, 211 insertions(+), 8 deletions(-)

-- 
2.22.3


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v7 1/4] KVM: Implement dirty quota-based throttling of vcpus
  2022-11-13 17:05 [PATCH v7 0/4] KVM: Dirty quota-based throttling Shivam Kumar
@ 2022-11-13 17:05 ` Shivam Kumar
  2022-11-14 23:29   ` Yunhong Jiang
                     ` (2 more replies)
  2022-11-13 17:05 ` [PATCH v7 2/4] KVM: x86: Dirty " Shivam Kumar
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 38+ messages in thread
From: Shivam Kumar @ 2022-11-13 17:05 UTC (permalink / raw)
  To: pbonzini, seanjc, maz, james.morse, borntraeger, david
  Cc: kvm, Shivam Kumar, Shaju Abraham, Manish Mishra, Anurag Madnawat

Define variables to track and throttle memory dirtying for every vcpu.

dirty_count:    Number of pages the vcpu has dirtied since its creation,
                while dirty logging is enabled.
dirty_quota:    Number of pages the vcpu is allowed to dirty. To dirty
                more, it needs to request more quota by exiting to
                userspace.

Implement the flow for throttling based on dirty quota.

i) Increment dirty_count for the vcpu whenever it dirties a page.
ii) Exit to userspace whenever the dirty quota is exhausted (i.e. dirty
count equals/exceeds dirty quota) to request more dirty quota.

Suggested-by: Shaju Abraham <shaju.abraham@nutanix.com>
Suggested-by: Manish Mishra <manish.mishra@nutanix.com>
Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
---
 Documentation/virt/kvm/api.rst | 35 ++++++++++++++++++++++++++++++++++
 arch/x86/kvm/Kconfig           |  1 +
 include/linux/kvm_host.h       |  5 ++++-
 include/linux/kvm_types.h      |  1 +
 include/uapi/linux/kvm.h       | 13 +++++++++++++
 tools/include/uapi/linux/kvm.h |  1 +
 virt/kvm/Kconfig               |  4 ++++
 virt/kvm/kvm_main.c            | 25 +++++++++++++++++++++---
 8 files changed, 81 insertions(+), 4 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index eee9f857a986..4568faa33c6d 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6513,6 +6513,26 @@ array field represents return values. The userspace should update the return
 values of SBI call before resuming the VCPU. For more details on RISC-V SBI
 spec refer, https://github.com/riscv/riscv-sbi-doc.
 
+::
+
+		/* KVM_EXIT_DIRTY_QUOTA_EXHAUSTED */
+		struct {
+			__u64 count;
+			__u64 quota;
+		} dirty_quota_exit;
+
+If the exit reason is KVM_EXIT_DIRTY_QUOTA_EXHAUSTED, it indicates that the VCPU
+has exhausted its dirty quota. The 'dirty_quota_exit' member of the kvm_run
+structure makes the following information available to userspace:
+    count: the current count of pages dirtied by the VCPU, which can be
+    skewed based on the size of the pages accessed by each vCPU.
+    quota: the observed dirty quota just before the exit to userspace.
+
+Userspace can design a strategy to allocate the overall scope of dirtying
+for the VM among the vcpus. Based on the strategy and the current state of dirty
+quota throttling, userspace can decide either to update (increase) the quota or
+to put the VCPU to sleep for some time.
+
 ::
 
     /* KVM_EXIT_NOTIFY */
@@ -6567,6 +6587,21 @@ values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
 
 ::
 
+	/*
+	 * Number of pages the vCPU is allowed to have dirtied over its entire
+	 * lifetime.  KVM_RUN exits with KVM_EXIT_DIRTY_QUOTA_EXHAUSTED if the quota
+	 * is reached/exceeded.
+	 */
+	__u64 dirty_quota;
+
+Please note that enforcing the quota is best effort, as the guest may dirty
+multiple pages before KVM can recheck the quota.  However, unless KVM is using
+a hardware-based dirty ring buffer, e.g. Intel's Page Modification Logging,
+KVM will detect quota exhaustion within a handful of dirtied pages.  If a
+hardware ring buffer is used, the overrun is bounded by the size of the buffer
+(512 entries for PML).
+
+::
   };
 
 
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 67be7f217e37..bdbd36321d52 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -48,6 +48,7 @@ config KVM
 	select KVM_VFIO
 	select SRCU
 	select INTERVAL_TREE
+	select HAVE_KVM_DIRTY_QUOTA
 	select HAVE_KVM_PM_NOTIFIER if PM
 	help
 	  Support hosting fully virtualized guest machines using hardware
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 18592bdf4c1b..0b9b5c251a04 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -151,11 +151,12 @@ static inline bool is_error_page(struct page *page)
 #define KVM_REQUEST_NO_ACTION      BIT(10)
 /*
  * Architecture-independent vcpu->requests bit members
- * Bits 3-7 are reserved for more arch-independent bits.
+ * Bits 5-7 are reserved for more arch-independent bits.
  */
 #define KVM_REQ_TLB_FLUSH         (0 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_VM_DEAD           (1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_UNBLOCK           2
+#define KVM_REQ_DIRTY_QUOTA_EXIT  4
 #define KVM_REQUEST_ARCH_BASE     8
 
 /*
@@ -379,6 +380,8 @@ struct kvm_vcpu {
 	 */
 	struct kvm_memory_slot *last_used_slot;
 	u64 last_used_slot_gen;
+
+	u64 dirty_quota;
 };
 
 /*
diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
index 3ca3db020e0e..263a588f3cd3 100644
--- a/include/linux/kvm_types.h
+++ b/include/linux/kvm_types.h
@@ -118,6 +118,7 @@ struct kvm_vcpu_stat_generic {
 	u64 halt_poll_fail_hist[HALT_POLL_HIST_COUNT];
 	u64 halt_wait_hist[HALT_POLL_HIST_COUNT];
 	u64 blocking;
+	u64 pages_dirtied;
 };
 
 #define KVM_STATS_NAME_SIZE	48
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 0d5d4419139a..5acb8991f872 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -272,6 +272,7 @@ struct kvm_xen_exit {
 #define KVM_EXIT_RISCV_SBI        35
 #define KVM_EXIT_RISCV_CSR        36
 #define KVM_EXIT_NOTIFY           37
+#define KVM_EXIT_DIRTY_QUOTA_EXHAUSTED 38
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -510,6 +511,11 @@ struct kvm_run {
 #define KVM_NOTIFY_CONTEXT_INVALID	(1 << 0)
 			__u32 flags;
 		} notify;
+		/* KVM_EXIT_DIRTY_QUOTA_EXHAUSTED */
+		struct {
+			__u64 count;
+			__u64 quota;
+		} dirty_quota_exit;
 		/* Fix the size of the union. */
 		char padding[256];
 	};
@@ -531,6 +537,12 @@ struct kvm_run {
 		struct kvm_sync_regs regs;
 		char padding[SYNC_REGS_SIZE_BYTES];
 	} s;
+	/*
+	 * Number of pages the vCPU is allowed to have dirtied over its entire
+	 * lifetime.  KVM_RUN exits with KVM_EXIT_DIRTY_QUOTA_EXHAUSTED if the
+	 * quota is reached/exceeded.
+	 */
+	__u64 dirty_quota;
 };
 
 /* for KVM_REGISTER_COALESCED_MMIO / KVM_UNREGISTER_COALESCED_MMIO */
@@ -1178,6 +1190,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_S390_ZPCI_OP 221
 #define KVM_CAP_S390_CPU_TOPOLOGY 222
 #define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223
+#define KVM_CAP_DIRTY_QUOTA 224
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h
index 0d5d4419139a..c8f811572670 100644
--- a/tools/include/uapi/linux/kvm.h
+++ b/tools/include/uapi/linux/kvm.h
@@ -1178,6 +1178,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_S390_ZPCI_OP 221
 #define KVM_CAP_S390_CPU_TOPOLOGY 222
 #define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223
+#define KVM_CAP_DIRTY_QUOTA 224
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 800f9470e36b..b6418a578c0a 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -19,6 +19,9 @@ config HAVE_KVM_IRQ_ROUTING
 config HAVE_KVM_DIRTY_RING
        bool
 
+config HAVE_KVM_DIRTY_QUOTA
+       bool
+
 # Only strongly ordered architectures can select this, as it doesn't
 # put any explicit constraint on userspace ordering. They can also
 # select the _ACQ_REL version.
@@ -86,3 +89,4 @@ config KVM_XFER_TO_GUEST_WORK
 
 config HAVE_KVM_PM_NOTIFIER
        bool
+
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 25d7872b29c1..7a54438b4d49 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3298,18 +3298,32 @@ int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len)
 }
 EXPORT_SYMBOL_GPL(kvm_clear_guest);
 
+static bool kvm_vcpu_is_dirty_quota_exhausted(struct kvm_vcpu *vcpu)
+{
+#ifdef CONFIG_HAVE_KVM_DIRTY_QUOTA
+	u64 dirty_quota = READ_ONCE(vcpu->run->dirty_quota);
+
+	return dirty_quota && (vcpu->stat.generic.pages_dirtied >= dirty_quota);
+#else
+	return false;
+#endif
+}
+
 void mark_page_dirty_in_slot(struct kvm *kvm,
 			     const struct kvm_memory_slot *memslot,
 		 	     gfn_t gfn)
 {
 	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
 
-#ifdef CONFIG_HAVE_KVM_DIRTY_RING
 	if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
 		return;
-#endif
 
-	if (memslot && kvm_slot_dirty_track_enabled(memslot)) {
+	if (!memslot)
+		return;
+
+	WARN_ON_ONCE(!vcpu->stat.generic.pages_dirtied++);
+
+	if (kvm_slot_dirty_track_enabled(memslot)) {
 		unsigned long rel_gfn = gfn - memslot->base_gfn;
 		u32 slot = (memslot->as_id << 16) | memslot->id;
 
@@ -3318,6 +3332,9 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
 					    slot, rel_gfn);
 		else
 			set_bit_le(rel_gfn, memslot->dirty_bitmap);
+
+		if (kvm_vcpu_is_dirty_quota_exhausted(vcpu))
+			kvm_make_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu);
 	}
 }
 EXPORT_SYMBOL_GPL(mark_page_dirty_in_slot);
@@ -4487,6 +4504,8 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 	case KVM_CAP_BINARY_STATS_FD:
 	case KVM_CAP_SYSTEM_EVENT_DATA:
 		return 1;
+	case KVM_CAP_DIRTY_QUOTA:
+		return !!IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_QUOTA);
 	default:
 		break;
 	}
-- 
2.22.3


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v7 2/4] KVM: x86: Dirty quota-based throttling of vcpus
  2022-11-13 17:05 [PATCH v7 0/4] KVM: Dirty quota-based throttling Shivam Kumar
  2022-11-13 17:05 ` [PATCH v7 1/4] KVM: Implement dirty quota-based throttling of vcpus Shivam Kumar
@ 2022-11-13 17:05 ` Shivam Kumar
  2022-11-15  0:16   ` Yunhong Jiang
  2022-11-13 17:05 ` [PATCH v7 3/4] KVM: arm64: " Shivam Kumar
  2022-11-13 17:05 ` [PATCH v7 4/4] KVM: selftests: Add selftests for dirty quota throttling Shivam Kumar
  3 siblings, 1 reply; 38+ messages in thread
From: Shivam Kumar @ 2022-11-13 17:05 UTC (permalink / raw)
  To: pbonzini, seanjc, maz, james.morse, borntraeger, david
  Cc: kvm, Shivam Kumar, Shaju Abraham, Manish Mishra, Anurag Madnawat

Exit to userspace whenever the dirty quota is exhausted (i.e. dirty count
equals/exceeds dirty quota) to request more dirty quota.

Suggested-by: Shaju Abraham <shaju.abraham@nutanix.com>
Suggested-by: Manish Mishra <manish.mishra@nutanix.com>
Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
---
 arch/x86/kvm/mmu/spte.c |  4 ++--
 arch/x86/kvm/vmx/vmx.c  |  3 +++
 arch/x86/kvm/x86.c      | 28 ++++++++++++++++++++++++++++
 3 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 2e08b2a45361..c0ed35abbf2d 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -228,9 +228,9 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 		  "spte = 0x%llx, level = %d, rsvd bits = 0x%llx", spte, level,
 		  get_rsvd_bits(&vcpu->arch.mmu->shadow_zero_check, spte, level));
 
-	if ((spte & PT_WRITABLE_MASK) && kvm_slot_dirty_track_enabled(slot)) {
+	if (spte & PT_WRITABLE_MASK) {
 		/* Enforced by kvm_mmu_hugepage_adjust. */
-		WARN_ON(level > PG_LEVEL_4K);
+		WARN_ON(level > PG_LEVEL_4K && kvm_slot_dirty_track_enabled(slot));
 		mark_page_dirty_in_slot(vcpu->kvm, slot, gfn);
 	}
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 63247c57c72c..cc130999eddf 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5745,6 +5745,9 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
 		 */
 		if (__xfer_to_guest_mode_work_pending())
 			return 1;
+
+		if (kvm_test_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu))
+			return 1;
 	}
 
 	return 1;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ecea83f0da49..1a960fbb51f4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10494,6 +10494,30 @@ void __kvm_request_immediate_exit(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(__kvm_request_immediate_exit);
 
+static inline bool kvm_check_dirty_quota_request(struct kvm_vcpu *vcpu)
+{
+#ifdef CONFIG_HAVE_KVM_DIRTY_QUOTA
+	struct kvm_run *run;
+
+	if (kvm_check_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu)) {
+		run = vcpu->run;
+		run->exit_reason = KVM_EXIT_DIRTY_QUOTA_EXHAUSTED;
+		run->dirty_quota_exit.count = vcpu->stat.generic.pages_dirtied;
+		run->dirty_quota_exit.quota = READ_ONCE(run->dirty_quota);
+
+		/*
+		 * Re-check the quota and exit if and only if the vCPU still
+		 * exceeds its quota.  If userspace increases (or disables
+		 * entirely) the quota, then no exit is required as KVM is
+		 * still honoring its ABI, e.g. userspace won't even be aware
+		 * that KVM temporarily detected an exhausted quota.
+		 */
+		return run->dirty_quota_exit.count >= run->dirty_quota_exit.quota;
+	}
+#endif
+	return false;
+}
+
 /*
  * Called within kvm->srcu read side.
  * Returns 1 to let vcpu_run() continue the guest execution loop without
@@ -10625,6 +10649,10 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 			r = 0;
 			goto out;
 		}
+		if (kvm_check_dirty_quota_request(vcpu)) {
+			r = 0;
+			goto out;
+		}
 
 		/*
 		 * KVM_REQ_HV_STIMER has to be processed after
-- 
2.22.3


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v7 3/4] KVM: arm64: Dirty quota-based throttling of vcpus
  2022-11-13 17:05 [PATCH v7 0/4] KVM: Dirty quota-based throttling Shivam Kumar
  2022-11-13 17:05 ` [PATCH v7 1/4] KVM: Implement dirty quota-based throttling of vcpus Shivam Kumar
  2022-11-13 17:05 ` [PATCH v7 2/4] KVM: x86: Dirty " Shivam Kumar
@ 2022-11-13 17:05 ` Shivam Kumar
  2022-11-15  0:27   ` Yunhong Jiang
  2022-11-17 20:44   ` Marc Zyngier
  2022-11-13 17:05 ` [PATCH v7 4/4] KVM: selftests: Add selftests for dirty quota throttling Shivam Kumar
  3 siblings, 2 replies; 38+ messages in thread
From: Shivam Kumar @ 2022-11-13 17:05 UTC (permalink / raw)
  To: pbonzini, seanjc, maz, james.morse, borntraeger, david
  Cc: kvm, Shivam Kumar, Shaju Abraham, Manish Mishra, Anurag Madnawat

Exit to userspace whenever the dirty quota is exhausted (i.e. dirty count
equals/exceeds dirty quota) to request more dirty quota.

Suggested-by: Shaju Abraham <shaju.abraham@nutanix.com>
Suggested-by: Manish Mishra <manish.mishra@nutanix.com>
Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
---
 arch/arm64/kvm/arm.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 94d33e296e10..850024982dd9 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -746,6 +746,15 @@ static int check_vcpu_requests(struct kvm_vcpu *vcpu)
 
 		if (kvm_check_request(KVM_REQ_SUSPEND, vcpu))
 			return kvm_vcpu_suspend(vcpu);
+
+		if (kvm_check_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu)) {
+			struct kvm_run *run = vcpu->run;
+
+			run->exit_reason = KVM_EXIT_DIRTY_QUOTA_EXHAUSTED;
+			run->dirty_quota_exit.count = vcpu->stat.generic.pages_dirtied;
+			run->dirty_quota_exit.quota = vcpu->dirty_quota;
+			return 0;
+		}
 	}
 
 	return 1;
-- 
2.22.3


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v7 4/4] KVM: selftests: Add selftests for dirty quota throttling
  2022-11-13 17:05 [PATCH v7 0/4] KVM: Dirty quota-based throttling Shivam Kumar
                   ` (2 preceding siblings ...)
  2022-11-13 17:05 ` [PATCH v7 3/4] KVM: arm64: " Shivam Kumar
@ 2022-11-13 17:05 ` Shivam Kumar
  3 siblings, 0 replies; 38+ messages in thread
From: Shivam Kumar @ 2022-11-13 17:05 UTC (permalink / raw)
  To: pbonzini, seanjc, maz, james.morse, borntraeger, david
  Cc: kvm, Shivam Kumar, Shaju Abraham, Manish Mishra, Anurag Madnawat

Add selftests for dirty quota throttling with an optional -q parameter
to configure the value by which the dirty quota is incremented after
each dirty quota exit. With very small intervals, a small quota
increment ensures that the dirty quota exit path is exercised. A zero
value disables dirty quota throttling, so that dirty logging can also
be tested without throttling.

Suggested-by: Shaju Abraham <shaju.abraham@nutanix.com>
Suggested-by: Manish Mishra <manish.mishra@nutanix.com>
Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
---
 tools/testing/selftests/kvm/dirty_log_test.c  | 33 +++++++++++-
 .../selftests/kvm/include/kvm_util_base.h     |  4 ++
 tools/testing/selftests/kvm/lib/kvm_util.c    | 53 +++++++++++++++++++
 3 files changed, 88 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
index b5234d6efbe1..a85ca6554d17 100644
--- a/tools/testing/selftests/kvm/dirty_log_test.c
+++ b/tools/testing/selftests/kvm/dirty_log_test.c
@@ -64,6 +64,8 @@
 
 #define SIG_IPI SIGUSR1
 
+#define TEST_DIRTY_QUOTA_INCREMENT		8
+
 /*
  * Guest/Host shared variables. Ensure addr_gva2hva() and/or
  * sync_global_to/from_guest() are used when accessing from
@@ -190,6 +192,7 @@ static enum log_mode_t host_log_mode_option = LOG_MODE_ALL;
 static enum log_mode_t host_log_mode;
 static pthread_t vcpu_thread;
 static uint32_t test_dirty_ring_count = TEST_DIRTY_RING_COUNT;
+static uint64_t test_dirty_quota_increment = TEST_DIRTY_QUOTA_INCREMENT;
 
 static void vcpu_kick(void)
 {
@@ -209,6 +212,13 @@ static void sem_wait_until(sem_t *sem)
 	while (ret == -1 && errno == EINTR);
 }
 
+static void set_dirty_quota(struct kvm_vm *vm, uint64_t dirty_quota)
+{
+	struct kvm_run *run = vcpu_state(vm, VCPU_ID);
+
+	vcpu_set_dirty_quota(run, dirty_quota);
+}
+
 static bool clear_log_supported(void)
 {
 	return kvm_has_cap(KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2);
@@ -256,7 +266,11 @@ static void default_after_vcpu_run(struct kvm_vcpu *vcpu, int ret, int err)
 	TEST_ASSERT(ret == 0 || (ret == -1 && err == EINTR),
 		    "vcpu run failed: errno=%d", err);
 
-	TEST_ASSERT(get_ucall(vcpu, NULL) == UCALL_SYNC,
+	if (test_dirty_quota_increment &&
+		run->exit_reason == KVM_EXIT_DIRTY_QUOTA_EXHAUSTED)
+		vcpu_handle_dirty_quota_exit(run, test_dirty_quota_increment);
+	else
+		TEST_ASSERT(get_ucall(vcpu, NULL) == UCALL_SYNC,
 		    "Invalid guest sync status: exit_reason=%s\n",
 		    exit_reason_str(run->exit_reason));
 
@@ -374,6 +388,9 @@ static void dirty_ring_after_vcpu_run(struct kvm_vcpu *vcpu, int ret, int err)
 	if (get_ucall(vcpu, NULL) == UCALL_SYNC) {
 		/* We should allow this to continue */
 		;
+	} else if (test_dirty_quota_increment &&
+		run->exit_reason == KVM_EXIT_DIRTY_QUOTA_EXHAUSTED) {
+		vcpu_handle_dirty_quota_exit(run, test_dirty_quota_increment);
 	} else if (run->exit_reason == KVM_EXIT_DIRTY_RING_FULL ||
 		   (ret == -1 && err == EINTR)) {
 		/* Update the flag first before pause */
@@ -764,6 +781,10 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 	sync_global_to_guest(vm, guest_test_virt_mem);
 	sync_global_to_guest(vm, guest_num_pages);
 
+	/* Initialise dirty quota */
+	if (test_dirty_quota_increment)
+		set_dirty_quota(vm, test_dirty_quota_increment);
+
 	/* Start the iterations */
 	iteration = 1;
 	sync_global_to_guest(vm, iteration);
@@ -805,6 +826,9 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 	/* Tell the vcpu thread to quit */
 	host_quit = true;
 	log_mode_before_vcpu_join();
+	/* Terminate dirty quota throttling */
+	if (test_dirty_quota_increment)
+		set_dirty_quota(vm, 0);
 	pthread_join(vcpu_thread, NULL);
 
 	pr_info("Total bits checked: dirty (%"PRIu64"), clear (%"PRIu64"), "
@@ -826,6 +850,8 @@ static void help(char *name)
 	printf(" -c: specify dirty ring size, in number of entries\n");
 	printf("     (only useful for dirty-ring test; default: %"PRIu32")\n",
 	       TEST_DIRTY_RING_COUNT);
+	printf(" -q: specify incremental dirty quota (default: %"PRIu32")\n",
+	       TEST_DIRTY_QUOTA_INCREMENT);
 	printf(" -i: specify iteration counts (default: %"PRIu64")\n",
 	       TEST_HOST_LOOP_N);
 	printf(" -I: specify interval in ms (default: %"PRIu64" ms)\n",
@@ -854,11 +880,14 @@ int main(int argc, char *argv[])
 
 	guest_modes_append_default();
 
-	while ((opt = getopt(argc, argv, "c:hi:I:p:m:M:")) != -1) {
+	while ((opt = getopt(argc, argv, "c:q:hi:I:p:m:M:")) != -1) {
 		switch (opt) {
 		case 'c':
 			test_dirty_ring_count = strtol(optarg, NULL, 10);
 			break;
+		case 'q':
+			test_dirty_quota_increment = strtol(optarg, NULL, 10);
+			break;
 		case 'i':
 			p.iterations = strtol(optarg, NULL, 10);
 			break;
diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
index e42a09cd24a0..d8eee61a9a7c 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -838,4 +838,8 @@ static inline int __vm_disable_nx_huge_pages(struct kvm_vm *vm)
 	return __vm_enable_cap(vm, KVM_CAP_VM_DISABLE_NX_HUGE_PAGES, 0);
 }
 
+void vcpu_set_dirty_quota(struct kvm_run *run, uint64_t dirty_quota);
+void vcpu_handle_dirty_quota_exit(struct kvm_run *run,
+			uint64_t test_dirty_quota_increment);
+
 #endif /* SELFTEST_KVM_UTIL_BASE_H */
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index f1cb1627161f..2a60c7bdc778 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -18,6 +18,7 @@
 #include <linux/kernel.h>
 
 #define KVM_UTIL_MIN_PFN	2
+#define PML_BUFFER_SIZE	512
 
 static int vcpu_mmap_sz(void);
 
@@ -1745,6 +1746,7 @@ static struct exit_reason {
 	{KVM_EXIT_X86_RDMSR, "RDMSR"},
 	{KVM_EXIT_X86_WRMSR, "WRMSR"},
 	{KVM_EXIT_XEN, "XEN"},
+	{KVM_EXIT_DIRTY_QUOTA_EXHAUSTED, "DIRTY_QUOTA_EXHAUSTED"},
 #ifdef KVM_EXIT_MEMORY_NOT_PRESENT
 	{KVM_EXIT_MEMORY_NOT_PRESENT, "MEMORY_NOT_PRESENT"},
 #endif
@@ -2021,3 +2023,54 @@ void __vm_get_stat(struct kvm_vm *vm, const char *stat_name, uint64_t *data,
 		break;
 	}
 }
+
+bool kvm_is_pml_enabled(void)
+{
+	return is_intel_cpu() && get_kvm_intel_param_bool("pml");
+}
+
+void vcpu_set_dirty_quota(struct kvm_run *run, uint64_t dirty_quota)
+{
+	run->dirty_quota = dirty_quota;
+
+	if (dirty_quota)
+		pr_info("Dirty quota throttling enabled with initial quota %lu\n",
+			dirty_quota);
+	else
+		pr_info("Dirty quota throttling disabled\n");
+}
+
+void vcpu_handle_dirty_quota_exit(struct kvm_run *run,
+			uint64_t test_dirty_quota_increment)
+{
+	uint64_t quota = run->dirty_quota_exit.quota;
+	uint64_t count = run->dirty_quota_exit.count;
+
+	/*
+	 * Allow some overrun: KVM may dirty multiple pages before exiting
+	 * to userspace, e.g. when emulating an instruction that performs
+	 * multiple memory accesses.
+	 */
+	uint64_t buffer = 0;
+
+	/*
+	 * When Intel's Page-Modification Logging (PML) is enabled, the CPU may
+	 * dirty up to 512 pages (number of entries in the PML buffer) without
+	 * exiting, thus KVM may effectively dirty that many pages before
+	 * enforcing the dirty quota.
+	 */
+#ifdef __x86_64__
+	if (kvm_is_pml_enabled())
+		buffer = PML_BUFFER_SIZE;
+#endif
+
+	TEST_ASSERT(count <= (quota + buffer),
+			"KVM dirtied too many pages: count=%lu, quota=%lu, buffer=%lu\n",
+			count, quota, buffer);
+
+	TEST_ASSERT(count >= quota,
+			"Dirty quota exit happened with quota yet to be exhausted: count=%lu, quota=%lu\n",
+			count, quota);
+
+	run->dirty_quota = count + test_dirty_quota_increment;
+}
-- 
2.22.3


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH v7 1/4] KVM: Implement dirty quota-based throttling of vcpus
  2022-11-13 17:05 ` [PATCH v7 1/4] KVM: Implement dirty quota-based throttling of vcpus Shivam Kumar
@ 2022-11-14 23:29   ` Yunhong Jiang
  2022-11-15  4:48     ` Shivam Kumar
  2022-11-17 19:26   ` Marc Zyngier
  2022-11-25 10:52   ` kernel test robot
  2 siblings, 1 reply; 38+ messages in thread
From: Yunhong Jiang @ 2022-11-14 23:29 UTC (permalink / raw)
  To: Shivam Kumar
  Cc: pbonzini, seanjc, maz, james.morse, borntraeger, david, kvm,
	Shaju Abraham, Manish Mishra, Anurag Madnawat

On Sun, Nov 13, 2022 at 05:05:06PM +0000, Shivam Kumar wrote:
> Define variables to track and throttle memory dirtying for every vcpu.
> 
> dirty_count:    Number of pages the vcpu has dirtied since its creation,
>                 while dirty logging is enabled.
> dirty_quota:    Number of pages the vcpu is allowed to dirty. To dirty
>                 more, it needs to request more quota by exiting to
>                 userspace.
> 
> Implement the flow for throttling based on dirty quota.
> 
> i) Increment dirty_count for the vcpu whenever it dirties a page.
> ii) Exit to userspace whenever the dirty quota is exhausted (i.e. dirty
> count equals/exceeds dirty quota) to request more dirty quota.
> 
> Suggested-by: Shaju Abraham <shaju.abraham@nutanix.com>
> Suggested-by: Manish Mishra <manish.mishra@nutanix.com>
> Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
> Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
> Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
> ---
>  Documentation/virt/kvm/api.rst | 35 ++++++++++++++++++++++++++++++++++
>  arch/x86/kvm/Kconfig           |  1 +
>  include/linux/kvm_host.h       |  5 ++++-
>  include/linux/kvm_types.h      |  1 +
>  include/uapi/linux/kvm.h       | 13 +++++++++++++
>  tools/include/uapi/linux/kvm.h |  1 +
>  virt/kvm/Kconfig               |  4 ++++
>  virt/kvm/kvm_main.c            | 25 +++++++++++++++++++++---
>  8 files changed, 81 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index eee9f857a986..4568faa33c6d 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6513,6 +6513,26 @@ array field represents return values. The userspace should update the return
>  values of SBI call before resuming the VCPU. For more details on RISC-V SBI
>  spec refer, https://github.com/riscv/riscv-sbi-doc.
>  
> +::
> +
> +		/* KVM_EXIT_DIRTY_QUOTA_EXHAUSTED */
> +		struct {
> +			__u64 count;
> +			__u64 quota;
> +		} dirty_quota_exit;
> +
> +If the exit reason is KVM_EXIT_DIRTY_QUOTA_EXHAUSTED, it indicates that the VCPU
> +has exhausted its dirty quota. The 'dirty_quota_exit' member of the kvm_run
> +structure makes the following information available to userspace:
> +    count: the current count of pages dirtied by the VCPU, which can be
> +    skewed based on the size of the pages accessed by each vCPU.
> +    quota: the observed dirty quota just before the exit to userspace.
> +
> +Userspace can design a strategy to allocate the overall scope of dirtying
> +for the VM among the vcpus. Based on the strategy and the current state of dirty
> +quota throttling, userspace can decide either to update (increase) the quota or
> +to put the VCPU to sleep for some time.
> +
>  ::
>  
>      /* KVM_EXIT_NOTIFY */
> @@ -6567,6 +6587,21 @@ values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
>  
>  ::
>  
> +	/*
> +	 * Number of pages the vCPU is allowed to have dirtied over its entire
> +	 * lifetime.  KVM_RUN exits with KVM_EXIT_DIRTY_QUOTA_EXHAUSTED if the quota
> +	 * is reached/exceeded.
> +	 */
> +	__u64 dirty_quota;
> +
> +Please note that enforcing the quota is best effort, as the guest may dirty
> +multiple pages before KVM can recheck the quota.  However, unless KVM is using
> +a hardware-based dirty ring buffer, e.g. Intel's Page Modification Logging,
> +KVM will detect quota exhaustion within a handful of dirtied pages.  If a
> +hardware ring buffer is used, the overrun is bounded by the size of the buffer
> +(512 entries for PML).
> +
> +::
>    };
>  
>  
> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> index 67be7f217e37..bdbd36321d52 100644
> --- a/arch/x86/kvm/Kconfig
> +++ b/arch/x86/kvm/Kconfig
> @@ -48,6 +48,7 @@ config KVM
>  	select KVM_VFIO
>  	select SRCU
>  	select INTERVAL_TREE
> +	select HAVE_KVM_DIRTY_QUOTA
>  	select HAVE_KVM_PM_NOTIFIER if PM
>  	help
>  	  Support hosting fully virtualized guest machines using hardware
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 18592bdf4c1b..0b9b5c251a04 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -151,11 +151,12 @@ static inline bool is_error_page(struct page *page)
>  #define KVM_REQUEST_NO_ACTION      BIT(10)
>  /*
>   * Architecture-independent vcpu->requests bit members
> - * Bits 3-7 are reserved for more arch-independent bits.
> + * Bits 5-7 are reserved for more arch-independent bits.
>   */
>  #define KVM_REQ_TLB_FLUSH         (0 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>  #define KVM_REQ_VM_DEAD           (1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>  #define KVM_REQ_UNBLOCK           2
> +#define KVM_REQ_DIRTY_QUOTA_EXIT  4
Sorry if I missed anything. Why is it 4 instead of 3?


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v7 2/4] KVM: x86: Dirty quota-based throttling of vcpus
  2022-11-13 17:05 ` [PATCH v7 2/4] KVM: x86: Dirty " Shivam Kumar
@ 2022-11-15  0:16   ` Yunhong Jiang
  2022-11-15  4:55     ` Shivam Kumar
  0 siblings, 1 reply; 38+ messages in thread
From: Yunhong Jiang @ 2022-11-15  0:16 UTC (permalink / raw)
  To: Shivam Kumar
  Cc: pbonzini, seanjc, maz, james.morse, borntraeger, david, kvm,
	Shaju Abraham, Manish Mishra, Anurag Madnawat

On Sun, Nov 13, 2022 at 05:05:08PM +0000, Shivam Kumar wrote:
> Exit to userspace whenever the dirty quota is exhausted (i.e. dirty count
> equals/exceeds dirty quota) to request more dirty quota.
> 
> Suggested-by: Shaju Abraham <shaju.abraham@nutanix.com>
> Suggested-by: Manish Mishra <manish.mishra@nutanix.com>
> Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
> Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
> Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
> ---
>  arch/x86/kvm/mmu/spte.c |  4 ++--
>  arch/x86/kvm/vmx/vmx.c  |  3 +++
>  arch/x86/kvm/x86.c      | 28 ++++++++++++++++++++++++++++
>  3 files changed, 33 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
> index 2e08b2a45361..c0ed35abbf2d 100644
> --- a/arch/x86/kvm/mmu/spte.c
> +++ b/arch/x86/kvm/mmu/spte.c
> @@ -228,9 +228,9 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
>  		  "spte = 0x%llx, level = %d, rsvd bits = 0x%llx", spte, level,
>  		  get_rsvd_bits(&vcpu->arch.mmu->shadow_zero_check, spte, level));
>  
> -	if ((spte & PT_WRITABLE_MASK) && kvm_slot_dirty_track_enabled(slot)) {
> +	if (spte & PT_WRITABLE_MASK) {
>  		/* Enforced by kvm_mmu_hugepage_adjust. */
> -		WARN_ON(level > PG_LEVEL_4K);
> +		WARN_ON(level > PG_LEVEL_4K && kvm_slot_dirty_track_enabled(slot));
>  		mark_page_dirty_in_slot(vcpu->kvm, slot, gfn);
>  	}
>  
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 63247c57c72c..cc130999eddf 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -5745,6 +5745,9 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
>  		 */
>  		if (__xfer_to_guest_mode_work_pending())
>  			return 1;
> +
> +		if (kvm_test_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu))
> +			return 1;
Any reason for this check? Is the dirty quota related to the invalid
guest state path? Sorry if I missed anything here.

>  	}
>  
>  	return 1;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index ecea83f0da49..1a960fbb51f4 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10494,6 +10494,30 @@ void __kvm_request_immediate_exit(struct kvm_vcpu *vcpu)
>  }
>  EXPORT_SYMBOL_GPL(__kvm_request_immediate_exit);
>  
> +static inline bool kvm_check_dirty_quota_request(struct kvm_vcpu *vcpu)
> +{
> +#ifdef CONFIG_HAVE_KVM_DIRTY_QUOTA
> +	struct kvm_run *run;
> +
> +	if (kvm_check_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu)) {
> +		run = vcpu->run;
> +		run->exit_reason = KVM_EXIT_DIRTY_QUOTA_EXHAUSTED;
> +		run->dirty_quota_exit.count = vcpu->stat.generic.pages_dirtied;
> +		run->dirty_quota_exit.quota = READ_ONCE(run->dirty_quota);
> +
> +		/*
> +		 * Re-check the quota and exit if and only if the vCPU still
> +		 * exceeds its quota.  If userspace increases (or disables
> +		 * entirely) the quota, then no exit is required as KVM is
> +		 * still honoring its ABI, e.g. userspace won't even be aware
> +		 * that KVM temporarily detected an exhausted quota.
> +		 */
> +		return run->dirty_quota_exit.count >= run->dirty_quota_exit.quota;
Would it be better to check before updating the vcpu->run?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v7 3/4] KVM: arm64: Dirty quota-based throttling of vcpus
  2022-11-13 17:05 ` [PATCH v7 3/4] KVM: arm64: " Shivam Kumar
@ 2022-11-15  0:27   ` Yunhong Jiang
  2022-11-15  5:10     ` Shivam Kumar
  2022-11-17 20:44   ` Marc Zyngier
  1 sibling, 1 reply; 38+ messages in thread
From: Yunhong Jiang @ 2022-11-15  0:27 UTC (permalink / raw)
  To: Shivam Kumar
  Cc: pbonzini, seanjc, maz, james.morse, borntraeger, david, kvm,
	Shaju Abraham, Manish Mishra, Anurag Madnawat

On Sun, Nov 13, 2022 at 05:05:10PM +0000, Shivam Kumar wrote:
> Exit to userspace whenever the dirty quota is exhausted (i.e. dirty count
> equals/exceeds dirty quota) to request more dirty quota.
> 
> Suggested-by: Shaju Abraham <shaju.abraham@nutanix.com>
> Suggested-by: Manish Mishra <manish.mishra@nutanix.com>
> Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
> Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
> Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
> ---
>  arch/arm64/kvm/arm.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 94d33e296e10..850024982dd9 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -746,6 +746,15 @@ static int check_vcpu_requests(struct kvm_vcpu *vcpu)
>  
>  		if (kvm_check_request(KVM_REQ_SUSPEND, vcpu))
>  			return kvm_vcpu_suspend(vcpu);
> +
> +		if (kvm_check_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu)) {
> +			struct kvm_run *run = vcpu->run;
> +
> +			run->exit_reason = KVM_EXIT_DIRTY_QUOTA_EXHAUSTED;
> +			run->dirty_quota_exit.count = vcpu->stat.generic.pages_dirtied;
> +			run->dirty_quota_exit.quota = vcpu->dirty_quota;
> +			return 0;
> +		}
There is a re-check on the x86 side, but not here. Any architecture-specific reason?
Sorry if I missed anything.

BTW, I am not sure if the x86/arm code can be combined into a common function. I
don't see any architecture-specific code here.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v7 1/4] KVM: Implement dirty quota-based throttling of vcpus
  2022-11-14 23:29   ` Yunhong Jiang
@ 2022-11-15  4:48     ` Shivam Kumar
  0 siblings, 0 replies; 38+ messages in thread
From: Shivam Kumar @ 2022-11-15  4:48 UTC (permalink / raw)
  To: Yunhong Jiang
  Cc: pbonzini, seanjc, maz, james.morse, borntraeger, david, kvm,
	Shaju Abraham, Manish Mishra, Anurag Madnawat



On 15/11/22 4:59 am, Yunhong Jiang wrote:
> On Sun, Nov 13, 2022 at 05:05:06PM +0000, Shivam Kumar wrote:
>> Define variables to track and throttle memory dirtying for every vcpu.
>>
>> dirty_count:    Number of pages the vcpu has dirtied since its creation,
>>                  while dirty logging is enabled.
>> dirty_quota:    Number of pages the vcpu is allowed to dirty. To dirty
>>                  more, it needs to request more quota by exiting to
>>                  userspace.
>>
>> Implement the flow for throttling based on dirty quota.
>>
>> i) Increment dirty_count for the vcpu whenever it dirties a page.
>> ii) Exit to userspace whenever the dirty quota is exhausted (i.e. dirty
>> count equals/exceeds dirty quota) to request more dirty quota.
>>
>> Suggested-by: Shaju Abraham <shaju.abraham@nutanix.com>
>> Suggested-by: Manish Mishra <manish.mishra@nutanix.com>
>> Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
>> Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
>> Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
>> ---
>>   Documentation/virt/kvm/api.rst | 35 ++++++++++++++++++++++++++++++++++
>>   arch/x86/kvm/Kconfig           |  1 +
>>   include/linux/kvm_host.h       |  5 ++++-
>>   include/linux/kvm_types.h      |  1 +
>>   include/uapi/linux/kvm.h       | 13 +++++++++++++
>>   tools/include/uapi/linux/kvm.h |  1 +
>>   virt/kvm/Kconfig               |  4 ++++
>>   virt/kvm/kvm_main.c            | 25 +++++++++++++++++++++---
>>   8 files changed, 81 insertions(+), 4 deletions(-)
>>
>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>> index eee9f857a986..4568faa33c6d 100644
>> --- a/Documentation/virt/kvm/api.rst
>> +++ b/Documentation/virt/kvm/api.rst
>> @@ -6513,6 +6513,26 @@ array field represents return values. The userspace should update the return
>>   values of SBI call before resuming the VCPU. For more details on RISC-V SBI
>>   spec refer, https://github.com/riscv/riscv-sbi-doc.
>>   
>> +::
>> +
>> +		/* KVM_EXIT_DIRTY_QUOTA_EXHAUSTED */
>> +		struct {
>> +			__u64 count;
>> +			__u64 quota;
>> +		} dirty_quota_exit;
>> +
>> +If exit reason is KVM_EXIT_DIRTY_QUOTA_EXHAUSTED, it indicates that the VCPU has
>> +exhausted its dirty quota. The 'dirty_quota_exit' member of kvm_run structure
>> +makes the following information available to the userspace:
>> +    count: the current count of pages dirtied by the VCPU, can be
>> +    skewed based on the size of the pages accessed by each vCPU.
>> +    quota: the observed dirty quota just before the exit to userspace.
>> +
>> +The userspace can design a strategy to allocate the overall scope of dirtying
>> +for the VM among the vcpus. Based on the strategy and the current state of dirty
>> +quota throttling, the userspace can make a decision to either update (increase)
>> +the quota or to put the VCPU to sleep for some time.
>> +
>>   ::
>>   
>>       /* KVM_EXIT_NOTIFY */
>> @@ -6567,6 +6587,21 @@ values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
>>   
>>   ::
>>   
>> +	/*
>> +	 * Number of pages the vCPU is allowed to have dirtied over its entire
>> +	 * lifetime.  KVM_RUN exits with KVM_EXIT_DIRTY_QUOTA_EXHAUSTED if the quota
>> +	 * is reached/exceeded.
>> +	 */
>> +	__u64 dirty_quota;
>> +
>> +Please note that enforcing the quota is best effort, as the guest may dirty
>> +multiple pages before KVM can recheck the quota.  However, unless KVM is using
>> +a hardware-based dirty ring buffer, e.g. Intel's Page Modification Logging,
>> +KVM will detect quota exhaustion within a handful of dirtied pages.  If a
>> +hardware ring buffer is used, the overrun is bounded by the size of the buffer
>> +(512 entries for PML).
>> +
>> +::
>>     };
>>   
>>   
>> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
>> index 67be7f217e37..bdbd36321d52 100644
>> --- a/arch/x86/kvm/Kconfig
>> +++ b/arch/x86/kvm/Kconfig
>> @@ -48,6 +48,7 @@ config KVM
>>   	select KVM_VFIO
>>   	select SRCU
>>   	select INTERVAL_TREE
>> +	select HAVE_KVM_DIRTY_QUOTA
>>   	select HAVE_KVM_PM_NOTIFIER if PM
>>   	help
>>   	  Support hosting fully virtualized guest machines using hardware
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index 18592bdf4c1b..0b9b5c251a04 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -151,11 +151,12 @@ static inline bool is_error_page(struct page *page)
>>   #define KVM_REQUEST_NO_ACTION      BIT(10)
>>   /*
>>    * Architecture-independent vcpu->requests bit members
>> - * Bits 3-7 are reserved for more arch-independent bits.
>> + * Bits 5-7 are reserved for more arch-independent bits.
>>    */
>>   #define KVM_REQ_TLB_FLUSH         (0 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>>   #define KVM_REQ_VM_DEAD           (1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>>   #define KVM_REQ_UNBLOCK           2
>> +#define KVM_REQ_DIRTY_QUOTA_EXIT  4
> Sorry if I missed anything. Why it's 4 instead of 3?
Bit 3 was already in use last time. Will update it. Thanks.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v7 2/4] KVM: x86: Dirty quota-based throttling of vcpus
  2022-11-15  0:16   ` Yunhong Jiang
@ 2022-11-15  4:55     ` Shivam Kumar
  2022-11-15  6:45       ` Yunhong Jiang
  0 siblings, 1 reply; 38+ messages in thread
From: Shivam Kumar @ 2022-11-15  4:55 UTC (permalink / raw)
  To: Yunhong Jiang
  Cc: pbonzini, seanjc, maz, james.morse, borntraeger, david, kvm,
	Shaju Abraham, Manish Mishra, Anurag Madnawat



On 15/11/22 5:46 am, Yunhong Jiang wrote:
> On Sun, Nov 13, 2022 at 05:05:08PM +0000, Shivam Kumar wrote:
>> Exit to userspace whenever the dirty quota is exhausted (i.e. dirty count
>> equals/exceeds dirty quota) to request more dirty quota.
>>
>> Suggested-by: Shaju Abraham <shaju.abraham@nutanix.com>
>> Suggested-by: Manish Mishra <manish.mishra@nutanix.com>
>> Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
>> Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
>> Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
>> ---
>>   arch/x86/kvm/mmu/spte.c |  4 ++--
>>   arch/x86/kvm/vmx/vmx.c  |  3 +++
>>   arch/x86/kvm/x86.c      | 28 ++++++++++++++++++++++++++++
>>   3 files changed, 33 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
>> index 2e08b2a45361..c0ed35abbf2d 100644
>> --- a/arch/x86/kvm/mmu/spte.c
>> +++ b/arch/x86/kvm/mmu/spte.c
>> @@ -228,9 +228,9 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
>>   		  "spte = 0x%llx, level = %d, rsvd bits = 0x%llx", spte, level,
>>   		  get_rsvd_bits(&vcpu->arch.mmu->shadow_zero_check, spte, level));
>>   
>> -	if ((spte & PT_WRITABLE_MASK) && kvm_slot_dirty_track_enabled(slot)) {
>> +	if (spte & PT_WRITABLE_MASK) {
>>   		/* Enforced by kvm_mmu_hugepage_adjust. */
>> -		WARN_ON(level > PG_LEVEL_4K);
>> +		WARN_ON(level > PG_LEVEL_4K && kvm_slot_dirty_track_enabled(slot));
>>   		mark_page_dirty_in_slot(vcpu->kvm, slot, gfn);
>>   	}
>>   
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index 63247c57c72c..cc130999eddf 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -5745,6 +5745,9 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
>>   		 */
>>   		if (__xfer_to_guest_mode_work_pending())
>>   			return 1;
>> +
>> +		if (kvm_test_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu))
>> +			return 1;
> Any reason for this check? Is this quota related to the invalid
> guest state? Sorry if I missed anything here.
Quoting Sean:
"And thinking more about silly edge cases, VMX's big emulation loop for
invalid guest state when unrestricted guest is disabled should probably
explicitly check the dirty quota.  Again, I doubt it matters to anyone's
use case, but it is treated as a full run loop for things like pending
signals, it'd be good to be consistent."

Please see v4 for details. Thanks.
> 
>>   	}
>>   
>>   	return 1;
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index ecea83f0da49..1a960fbb51f4 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -10494,6 +10494,30 @@ void __kvm_request_immediate_exit(struct kvm_vcpu *vcpu)
>>   }
>>   EXPORT_SYMBOL_GPL(__kvm_request_immediate_exit);
>>   
>> +static inline bool kvm_check_dirty_quota_request(struct kvm_vcpu *vcpu)
>> +{
>> +#ifdef CONFIG_HAVE_KVM_DIRTY_QUOTA
>> +	struct kvm_run *run;
>> +
>> +	if (kvm_check_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu)) {
>> +		run = vcpu->run;
>> +		run->exit_reason = KVM_EXIT_DIRTY_QUOTA_EXHAUSTED;
>> +		run->dirty_quota_exit.count = vcpu->stat.generic.pages_dirtied;
>> +		run->dirty_quota_exit.quota = READ_ONCE(run->dirty_quota);
>> +
>> +		/*
>> +		 * Re-check the quota and exit if and only if the vCPU still
>> +		 * exceeds its quota.  If userspace increases (or disables
>> +		 * entirely) the quota, then no exit is required as KVM is
>> +		 * still honoring its ABI, e.g. userspace won't even be aware
>> +		 * that KVM temporarily detected an exhausted quota.
>> +		 */
>> +		return run->dirty_quota_exit.count >= run->dirty_quota_exit.quota;
> Would it be better to check before updating the vcpu->run?
The reason for checking it at the last possible moment is to avoid spurious
exits to userspace as much as possible.
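To illustrate why the late check helps, here is a self-contained sketch with stand-in types (hypothetical simplified layout, not the kernel code — the real logic operates on vcpu->run and uses READ_ONCE(); the quota==0 "disabled" case is handled explicitly here, as the patch's comment describes):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the kvm_run fields involved; the real struct lives in
 * <linux/kvm.h> and is shared with userspace. */
struct run_state {
	uint64_t dirty_quota;	/* userspace may raise or clear this at any time */
	uint64_t exit_count;	/* stands in for dirty_quota_exit.count */
	uint64_t exit_quota;	/* stands in for dirty_quota_exit.quota */
};

/*
 * Mirrors the patch's ordering: snapshot the count and the (possibly
 * just-updated) quota into the exit fields first, then decide.  If
 * userspace raced in and raised or cleared the quota after the request
 * was made, this returns 0 and no exit to userspace happens at all.
 */
static int dirty_quota_exit_needed(struct run_state *run, uint64_t pages_dirtied)
{
	run->exit_count = pages_dirtied;
	run->exit_quota = run->dirty_quota;	/* READ_ONCE() in the kernel */

	return run->exit_quota && run->exit_count >= run->exit_quota;
}
```

With dirty_quota = 10 and pages_dirtied = 10 this reports an exit; raising dirty_quota to 100 (or clearing it to 0) before the check suppresses the exit entirely, which is the spurious-exit avoidance being discussed.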


Thanks and regards,
Shivam

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v7 3/4] KVM: arm64: Dirty quota-based throttling of vcpus
  2022-11-15  0:27   ` Yunhong Jiang
@ 2022-11-15  5:10     ` Shivam Kumar
  0 siblings, 0 replies; 38+ messages in thread
From: Shivam Kumar @ 2022-11-15  5:10 UTC (permalink / raw)
  To: Yunhong Jiang
  Cc: pbonzini, seanjc, maz, james.morse, borntraeger, david, kvm,
	Shaju Abraham, Manish Mishra, Anurag Madnawat



On 15/11/22 5:57 am, Yunhong Jiang wrote:
> On Sun, Nov 13, 2022 at 05:05:10PM +0000, Shivam Kumar wrote:
>> Exit to userspace whenever the dirty quota is exhausted (i.e. dirty count
>> equals/exceeds dirty quota) to request more dirty quota.
>>
>> Suggested-by: Shaju Abraham <shaju.abraham@nutanix.com>
>> Suggested-by: Manish Mishra <manish.mishra@nutanix.com>
>> Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
>> Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
>> Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
>> ---
>>   arch/arm64/kvm/arm.c | 9 +++++++++
>>   1 file changed, 9 insertions(+)
>>
>> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
>> index 94d33e296e10..850024982dd9 100644
>> --- a/arch/arm64/kvm/arm.c
>> +++ b/arch/arm64/kvm/arm.c
>> @@ -746,6 +746,15 @@ static int check_vcpu_requests(struct kvm_vcpu *vcpu)
>>   
>>   		if (kvm_check_request(KVM_REQ_SUSPEND, vcpu))
>>   			return kvm_vcpu_suspend(vcpu);
>> +
>> +		if (kvm_check_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu)) {
>> +			struct kvm_run *run = vcpu->run;
>> +
>> +			run->exit_reason = KVM_EXIT_DIRTY_QUOTA_EXHAUSTED;
>> +			run->dirty_quota_exit.count = vcpu->stat.generic.pages_dirtied;
>> +			run->dirty_quota_exit.quota = vcpu->dirty_quota;
>> +			return 0;
>> +		}
> There is a recheck on x86 side, but not here. Any architecture specific reason?
> Sorry if I missed anything.
> 
> BTW, not sure if the x86/arm code can be combined into a common function. I
> don't see any architecture specific code here.
Yes, I think we may use a common function for x86/arm.

With that, this would look like:

if (kvm_check_dirty_quota_request(vcpu)) {
	return 0;
}

Thanks,
Shivam

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v7 2/4] KVM: x86: Dirty quota-based throttling of vcpus
  2022-11-15  4:55     ` Shivam Kumar
@ 2022-11-15  6:45       ` Yunhong Jiang
  2022-11-18  8:51         ` Shivam Kumar
  0 siblings, 1 reply; 38+ messages in thread
From: Yunhong Jiang @ 2022-11-15  6:45 UTC (permalink / raw)
  To: Shivam Kumar
  Cc: pbonzini, seanjc, maz, james.morse, borntraeger, david, kvm,
	Shaju Abraham, Manish Mishra, Anurag Madnawat

On Tue, Nov 15, 2022 at 10:25:31AM +0530, Shivam Kumar wrote:
> 
> 
> On 15/11/22 5:46 am, Yunhong Jiang wrote:
> > On Sun, Nov 13, 2022 at 05:05:08PM +0000, Shivam Kumar wrote:
> > > Exit to userspace whenever the dirty quota is exhausted (i.e. dirty count
> > > equals/exceeds dirty quota) to request more dirty quota.
> > > 
> > > Suggested-by: Shaju Abraham <shaju.abraham@nutanix.com>
> > > Suggested-by: Manish Mishra <manish.mishra@nutanix.com>
> > > Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
> > > Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
> > > Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
> > > ---
> > >   arch/x86/kvm/mmu/spte.c |  4 ++--
> > >   arch/x86/kvm/vmx/vmx.c  |  3 +++
> > >   arch/x86/kvm/x86.c      | 28 ++++++++++++++++++++++++++++
> > >   3 files changed, 33 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
> > > index 2e08b2a45361..c0ed35abbf2d 100644
> > > --- a/arch/x86/kvm/mmu/spte.c
> > > +++ b/arch/x86/kvm/mmu/spte.c
> > > @@ -228,9 +228,9 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
> > >   		  "spte = 0x%llx, level = %d, rsvd bits = 0x%llx", spte, level,
> > >   		  get_rsvd_bits(&vcpu->arch.mmu->shadow_zero_check, spte, level));
> > > -	if ((spte & PT_WRITABLE_MASK) && kvm_slot_dirty_track_enabled(slot)) {
> > > +	if (spte & PT_WRITABLE_MASK) {
> > >   		/* Enforced by kvm_mmu_hugepage_adjust. */
> > > -		WARN_ON(level > PG_LEVEL_4K);
> > > +		WARN_ON(level > PG_LEVEL_4K && kvm_slot_dirty_track_enabled(slot));
> > >   		mark_page_dirty_in_slot(vcpu->kvm, slot, gfn);
> > >   	}
> > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > > index 63247c57c72c..cc130999eddf 100644
> > > --- a/arch/x86/kvm/vmx/vmx.c
> > > +++ b/arch/x86/kvm/vmx/vmx.c
> > > @@ -5745,6 +5745,9 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
> > >   		 */
> > >   		if (__xfer_to_guest_mode_work_pending())
> > >   			return 1;
> > > +
> > > +		if (kvm_test_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu))
> > > +			return 1;
> > Any reason for this check? Is this quota related to the invalid
> > guest state? Sorry if I missed anything here.
> Quoting Sean:
> "And thinking more about silly edge cases, VMX's big emulation loop for
> invalid guest state when unrestricted guest is disabled should probably
> explicitly check the dirty quota.  Again, I doubt it matters to anyone's
> use case, but it is treated as a full run loop for things like pending
> signals, it'd be good to be consistent."
> 
> Please see v4 for details. Thanks.
Thank you for the sharing.
> > 
> > >   	}
> > >   	return 1;
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index ecea83f0da49..1a960fbb51f4 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -10494,6 +10494,30 @@ void __kvm_request_immediate_exit(struct kvm_vcpu *vcpu)
> > >   }
> > >   EXPORT_SYMBOL_GPL(__kvm_request_immediate_exit);
> > > +static inline bool kvm_check_dirty_quota_request(struct kvm_vcpu *vcpu)
> > > +{
> > > +#ifdef CONFIG_HAVE_KVM_DIRTY_QUOTA
> > > +	struct kvm_run *run;
> > > +
> > > +	if (kvm_check_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu)) {
> > > +		run = vcpu->run;
> > > +		run->exit_reason = KVM_EXIT_DIRTY_QUOTA_EXHAUSTED;
> > > +		run->dirty_quota_exit.count = vcpu->stat.generic.pages_dirtied;
> > > +		run->dirty_quota_exit.quota = READ_ONCE(run->dirty_quota);
> > > +
> > > +		/*
> > > +		 * Re-check the quota and exit if and only if the vCPU still
> > > +		 * exceeds its quota.  If userspace increases (or disables
> > > +		 * entirely) the quota, then no exit is required as KVM is
> > > +		 * still honoring its ABI, e.g. userspace won't even be aware
> > > +		 * that KVM temporarily detected an exhausted quota.
> > > +		 */
> > > +		return run->dirty_quota_exit.count >= run->dirty_quota_exit.quota;
> > Would it be better to check before updating the vcpu->run?
> The reason for checking it at the last moment is to avoid invalid exits to
> userspace as much as possible.

So if userspace increases the quota, then the above vcpu->run change just
leaves some garbage information in vcpu->run, and the exit_reason is
misleading. Possibly that's ok, since this information will not be used anymore.

I am not sure how critical the time spent on the vcpu->run update is.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v7 1/4] KVM: Implement dirty quota-based throttling of vcpus
  2022-11-13 17:05 ` [PATCH v7 1/4] KVM: Implement dirty quota-based throttling of vcpus Shivam Kumar
  2022-11-14 23:29   ` Yunhong Jiang
@ 2022-11-17 19:26   ` Marc Zyngier
  2022-11-18  9:47     ` Shivam Kumar
  2022-11-25 10:52   ` kernel test robot
  2 siblings, 1 reply; 38+ messages in thread
From: Marc Zyngier @ 2022-11-17 19:26 UTC (permalink / raw)
  To: Shivam Kumar
  Cc: pbonzini, seanjc, james.morse, borntraeger, david, kvm,
	Shaju Abraham, Manish Mishra, Anurag Madnawat

On Sun, 13 Nov 2022 17:05:06 +0000,
Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
> 
> Define variables to track and throttle memory dirtying for every vcpu.
> 
> dirty_count:    Number of pages the vcpu has dirtied since its creation,
>                 while dirty logging is enabled.
> dirty_quota:    Number of pages the vcpu is allowed to dirty. To dirty
>                 more, it needs to request more quota by exiting to
>                 userspace.
> 
> Implement the flow for throttling based on dirty quota.
> 
> i) Increment dirty_count for the vcpu whenever it dirties a page.
> ii) Exit to userspace whenever the dirty quota is exhausted (i.e. dirty
> count equals/exceeds dirty quota) to request more dirty quota.
> 
> Suggested-by: Shaju Abraham <shaju.abraham@nutanix.com>
> Suggested-by: Manish Mishra <manish.mishra@nutanix.com>
> Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
> Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
> Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
> ---
>  Documentation/virt/kvm/api.rst | 35 ++++++++++++++++++++++++++++++++++
>  arch/x86/kvm/Kconfig           |  1 +
>  include/linux/kvm_host.h       |  5 ++++-
>  include/linux/kvm_types.h      |  1 +
>  include/uapi/linux/kvm.h       | 13 +++++++++++++
>  tools/include/uapi/linux/kvm.h |  1 +
>  virt/kvm/Kconfig               |  4 ++++
>  virt/kvm/kvm_main.c            | 25 +++++++++++++++++++++---
>  8 files changed, 81 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index eee9f857a986..4568faa33c6d 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6513,6 +6513,26 @@ array field represents return values. The userspace should update the return
>  values of SBI call before resuming the VCPU. For more details on RISC-V SBI
>  spec refer, https://github.com/riscv/riscv-sbi-doc.
>  
> +::
> +
> +		/* KVM_EXIT_DIRTY_QUOTA_EXHAUSTED */
> +		struct {
> +			__u64 count;
> +			__u64 quota;
> +		} dirty_quota_exit;
> +
> +If exit reason is KVM_EXIT_DIRTY_QUOTA_EXHAUSTED, it indicates that the VCPU has
> +exhausted its dirty quota. The 'dirty_quota_exit' member of kvm_run structure
> +makes the following information available to the userspace:
> +    count: the current count of pages dirtied by the VCPU, can be
> +    skewed based on the size of the pages accessed by each vCPU.

How can userspace make a decision on the amount of dirtying this
represent if this doesn't represent a number of base pages? Or are you
saying that this only counts the number of permission faults that have
dirtied pages?

> +    quota: the observed dirty quota just before the exit to
> userspace.

You are defining the quota in terms of quota. -ENOCLUE.

> +
> +The userspace can design a strategy to allocate the overall scope of dirtying
> +for the VM among the vcpus. Based on the strategy and the current state of dirty
> +quota throttling, the userspace can make a decision to either update (increase)
> +the quota or to put the VCPU to sleep for some time.

This looks like something out of 1984 (Newspeak anyone)? Can't you
just say that userspace is responsible for allocating the quota and
for managing the resulting throttling effect?
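For what it's worth, the userspace strategy being discussed can be made concrete with a small sketch. Everything below is hypothetical: the names and the equal-split policy are illustrative only, not from the patch or from any real VMM.

```c
#include <assert.h>
#include <stdint.h>

#define NR_VCPUS 4

/* Per-vCPU view of the lifetime count/quota pair that a
 * KVM_EXIT_DIRTY_QUOTA_EXHAUSTED exit reports back to userspace. */
struct vcpu_quota {
	uint64_t pages_dirtied;	/* from dirty_quota_exit.count */
	uint64_t dirty_quota;	/* written back into kvm_run by the VMM */
};

/*
 * Toy policy: split a per-iteration dirty budget equally among vCPUs
 * by raising each lifetime quota.  A real VMM could instead weight the
 * split by each vCPU's observed dirty rate.
 */
static void grant_budget(struct vcpu_quota *vcpus, int nr, uint64_t budget)
{
	for (int i = 0; i < nr; i++)
		vcpus[i].dirty_quota += budget / nr;
}

/* On a quota-exhausted exit: sleep the vCPU if its share is spent, or
 * resume it immediately if a new grant already raised the quota. */
static int must_throttle(const struct vcpu_quota *v)
{
	return v->dirty_quota && v->pages_dirtied >= v->dirty_quota;
}
```

With a budget of 400 pages over 4 vCPUs, a vCPU that has dirtied its 100-page share is throttled until the next iteration's grant raises its quota again.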

> +
>  ::
>  
>      /* KVM_EXIT_NOTIFY */
> @@ -6567,6 +6587,21 @@ values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
>  
>  ::
>  
> +	/*
> +	 * Number of pages the vCPU is allowed to have dirtied over its entire
> +	 * lifetime.  KVM_RUN exits with KVM_EXIT_DIRTY_QUOTA_EXHAUSTED if the quota
> +	 * is reached/exceeded.
> +	 */
> +	__u64 dirty_quota;

How are dirty_quota and dirty_quota_exit.quota related?

> +
> +Please note that enforcing the quota is best effort, as the guest may dirty
> +multiple pages before KVM can recheck the quota.  However, unless KVM is using
> +a hardware-based dirty ring buffer, e.g. Intel's Page Modification Logging,
> +KVM will detect quota exhaustion within a handful of dirtied pages.  If a
> +hardware ring buffer is used, the overrun is bounded by the size of the buffer
> +(512 entries for PML).
> +
> +::
>    };
>  
>  
> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> index 67be7f217e37..bdbd36321d52 100644
> --- a/arch/x86/kvm/Kconfig
> +++ b/arch/x86/kvm/Kconfig
> @@ -48,6 +48,7 @@ config KVM
>  	select KVM_VFIO
>  	select SRCU
>  	select INTERVAL_TREE
> +	select HAVE_KVM_DIRTY_QUOTA

Why isn't this part of the x86 patch?

>  	select HAVE_KVM_PM_NOTIFIER if PM
>  	help
>  	  Support hosting fully virtualized guest machines using hardware
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 18592bdf4c1b..0b9b5c251a04 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -151,11 +151,12 @@ static inline bool is_error_page(struct page *page)
>  #define KVM_REQUEST_NO_ACTION      BIT(10)
>  /*
>   * Architecture-independent vcpu->requests bit members
> - * Bits 3-7 are reserved for more arch-independent bits.
> + * Bits 5-7 are reserved for more arch-independent bits.
>   */
>  #define KVM_REQ_TLB_FLUSH         (0 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>  #define KVM_REQ_VM_DEAD           (1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>  #define KVM_REQ_UNBLOCK           2
> +#define KVM_REQ_DIRTY_QUOTA_EXIT  4

Where is 3? Why reserve two bits when only one is used?

>  #define KVM_REQUEST_ARCH_BASE     8
>  
>  /*
> @@ -379,6 +380,8 @@ struct kvm_vcpu {
>  	 */
>  	struct kvm_memory_slot *last_used_slot;
>  	u64 last_used_slot_gen;
> +
> +	u64 dirty_quota;
>  };
>  
>  /*
> diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
> index 3ca3db020e0e..263a588f3cd3 100644
> --- a/include/linux/kvm_types.h
> +++ b/include/linux/kvm_types.h
> @@ -118,6 +118,7 @@ struct kvm_vcpu_stat_generic {
>  	u64 halt_poll_fail_hist[HALT_POLL_HIST_COUNT];
>  	u64 halt_wait_hist[HALT_POLL_HIST_COUNT];
>  	u64 blocking;
> +	u64 pages_dirtied;
>  };
>  
>  #define KVM_STATS_NAME_SIZE	48
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 0d5d4419139a..5acb8991f872 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -272,6 +272,7 @@ struct kvm_xen_exit {
>  #define KVM_EXIT_RISCV_SBI        35
>  #define KVM_EXIT_RISCV_CSR        36
>  #define KVM_EXIT_NOTIFY           37
> +#define KVM_EXIT_DIRTY_QUOTA_EXHAUSTED 38
>  
>  /* For KVM_EXIT_INTERNAL_ERROR */
>  /* Emulate instruction failed. */
> @@ -510,6 +511,11 @@ struct kvm_run {
>  #define KVM_NOTIFY_CONTEXT_INVALID	(1 << 0)
>  			__u32 flags;
>  		} notify;
> +		/* KVM_EXIT_DIRTY_QUOTA_EXHAUSTED */
> +		struct {
> +			__u64 count;
> +			__u64 quota;
> +		} dirty_quota_exit;
>  		/* Fix the size of the union. */
>  		char padding[256];
>  	};
> @@ -531,6 +537,12 @@ struct kvm_run {
>  		struct kvm_sync_regs regs;
>  		char padding[SYNC_REGS_SIZE_BYTES];
>  	} s;
> +	/*
> +	 * Number of pages the vCPU is allowed to have dirtied over its entire
> +	 * lifetime.  KVM_RUN exits with KVM_EXIT_DIRTY_QUOTA_EXHAUSTED if the
> +	 * quota is reached/exceeded.
> +	 */
> +	__u64 dirty_quota;
>  };
>  
>  /* for KVM_REGISTER_COALESCED_MMIO / KVM_UNREGISTER_COALESCED_MMIO */
> @@ -1178,6 +1190,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_S390_ZPCI_OP 221
>  #define KVM_CAP_S390_CPU_TOPOLOGY 222
>  #define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223
> +#define KVM_CAP_DIRTY_QUOTA 224
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h
> index 0d5d4419139a..c8f811572670 100644
> --- a/tools/include/uapi/linux/kvm.h
> +++ b/tools/include/uapi/linux/kvm.h
> @@ -1178,6 +1178,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_S390_ZPCI_OP 221
>  #define KVM_CAP_S390_CPU_TOPOLOGY 222
>  #define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223
> +#define KVM_CAP_DIRTY_QUOTA 224
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index 800f9470e36b..b6418a578c0a 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -19,6 +19,9 @@ config HAVE_KVM_IRQ_ROUTING
>  config HAVE_KVM_DIRTY_RING
>         bool
>  
> +config HAVE_KVM_DIRTY_QUOTA
> +       bool
> +
>  # Only strongly ordered architectures can select this, as it doesn't
>  # put any explicit constraint on userspace ordering. They can also
>  # select the _ACQ_REL version.
> @@ -86,3 +89,4 @@ config KVM_XFER_TO_GUEST_WORK
>  
>  config HAVE_KVM_PM_NOTIFIER
>         bool
> +
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 25d7872b29c1..7a54438b4d49 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3298,18 +3298,32 @@ int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len)
>  }
>  EXPORT_SYMBOL_GPL(kvm_clear_guest);
>  
> +static bool kvm_vcpu_is_dirty_quota_exhausted(struct kvm_vcpu *vcpu)
> +{
> +#ifdef CONFIG_HAVE_KVM_DIRTY_QUOTA
> +	u64 dirty_quota = READ_ONCE(vcpu->run->dirty_quota);
> +
> +	return dirty_quota && (vcpu->stat.generic.pages_dirtied >= dirty_quota);
> +#else
> +	return false;
> +#endif

If you introduce additional #ifdefery here, why are the additional
fields in the vcpu structure unconditional?

> +}
> +
>  void mark_page_dirty_in_slot(struct kvm *kvm,
>  			     const struct kvm_memory_slot *memslot,
>  		 	     gfn_t gfn)
>  {
>  	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
>  
> -#ifdef CONFIG_HAVE_KVM_DIRTY_RING
>  	if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
>  		return;
> -#endif
>  
> -	if (memslot && kvm_slot_dirty_track_enabled(memslot)) {
> +	if (!memslot)
> +		return;
> +
> +	WARN_ON_ONCE(!vcpu->stat.generic.pages_dirtied++);
> +
> +	if (kvm_slot_dirty_track_enabled(memslot)) {
>  		unsigned long rel_gfn = gfn - memslot->base_gfn;
>  		u32 slot = (memslot->as_id << 16) | memslot->id;
>  
> @@ -3318,6 +3332,9 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
>  					    slot, rel_gfn);
>  		else
>  			set_bit_le(rel_gfn, memslot->dirty_bitmap);
> +
> +		if (kvm_vcpu_is_dirty_quota_exhausted(vcpu))
> +			kvm_make_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu);

This is broken in the light of new dirty-tracking code queued for
6.2. Specifically, you absolutely can end-up here *without* a vcpu on
arm64. You just have to snapshot the ITS state to observe the fireworks.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v7 3/4] KVM: arm64: Dirty quota-based throttling of vcpus
  2022-11-13 17:05 ` [PATCH v7 3/4] KVM: arm64: " Shivam Kumar
  2022-11-15  0:27   ` Yunhong Jiang
@ 2022-11-17 20:44   ` Marc Zyngier
  2022-11-18  8:56     ` Shivam Kumar
  1 sibling, 1 reply; 38+ messages in thread
From: Marc Zyngier @ 2022-11-17 20:44 UTC (permalink / raw)
  To: Shivam Kumar
  Cc: pbonzini, seanjc, james.morse, borntraeger, david, kvm,
	Shaju Abraham, Manish Mishra, Anurag Madnawat

On Sun, 13 Nov 2022 17:05:10 +0000,
Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
> 
> Exit to userspace whenever the dirty quota is exhausted (i.e. dirty count
> equals/exceeds dirty quota) to request more dirty quota.
> 
> Suggested-by: Shaju Abraham <shaju.abraham@nutanix.com>
> Suggested-by: Manish Mishra <manish.mishra@nutanix.com>
> Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
> Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
> Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
> ---
>  arch/arm64/kvm/arm.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 94d33e296e10..850024982dd9 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -746,6 +746,15 @@ static int check_vcpu_requests(struct kvm_vcpu *vcpu)
>  
>  		if (kvm_check_request(KVM_REQ_SUSPEND, vcpu))
>  			return kvm_vcpu_suspend(vcpu);
> +
> +		if (kvm_check_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu)) {
> +			struct kvm_run *run = vcpu->run;
> +
> +			run->exit_reason = KVM_EXIT_DIRTY_QUOTA_EXHAUSTED;
> +			run->dirty_quota_exit.count = vcpu->stat.generic.pages_dirtied;
> +			run->dirty_quota_exit.quota = vcpu->dirty_quota;
> +			return 0;
> +		}
>  	}
>  
>  	return 1;

As pointed out by others, this should be common code. This would
definitely avoid the difference in behaviour between architectures.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v7 2/4] KVM: x86: Dirty quota-based throttling of vcpus
  2022-11-15  6:45       ` Yunhong Jiang
@ 2022-11-18  8:51         ` Shivam Kumar
  0 siblings, 0 replies; 38+ messages in thread
From: Shivam Kumar @ 2022-11-18  8:51 UTC (permalink / raw)
  To: Yunhong Jiang
  Cc: pbonzini, seanjc, maz, james.morse, borntraeger, david, kvm,
	Shaju Abraham, Manish Mishra, Anurag Madnawat



On 15/11/22 12:15 pm, Yunhong Jiang wrote:
> On Tue, Nov 15, 2022 at 10:25:31AM +0530, Shivam Kumar wrote:
>>
>>
>> On 15/11/22 5:46 am, Yunhong Jiang wrote:
>>> On Sun, Nov 13, 2022 at 05:05:08PM +0000, Shivam Kumar wrote:
>>>> Exit to userspace whenever the dirty quota is exhausted (i.e. dirty count
>>>> equals/exceeds dirty quota) to request more dirty quota.
>>>>
>>>> Suggested-by: Shaju Abraham <shaju.abraham@nutanix.com>
>>>> Suggested-by: Manish Mishra <manish.mishra@nutanix.com>
>>>> Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
>>>> Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
>>>> Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
>>>> ---
>>>>    arch/x86/kvm/mmu/spte.c |  4 ++--
>>>>    arch/x86/kvm/vmx/vmx.c  |  3 +++
>>>>    arch/x86/kvm/x86.c      | 28 ++++++++++++++++++++++++++++
>>>>    3 files changed, 33 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
>>>> index 2e08b2a45361..c0ed35abbf2d 100644
>>>> --- a/arch/x86/kvm/mmu/spte.c
>>>> +++ b/arch/x86/kvm/mmu/spte.c
>>>> @@ -228,9 +228,9 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
>>>>    		  "spte = 0x%llx, level = %d, rsvd bits = 0x%llx", spte, level,
>>>>    		  get_rsvd_bits(&vcpu->arch.mmu->shadow_zero_check, spte, level));
>>>> -	if ((spte & PT_WRITABLE_MASK) && kvm_slot_dirty_track_enabled(slot)) {
>>>> +	if (spte & PT_WRITABLE_MASK) {
>>>>    		/* Enforced by kvm_mmu_hugepage_adjust. */
>>>> -		WARN_ON(level > PG_LEVEL_4K);
>>>> +		WARN_ON(level > PG_LEVEL_4K && kvm_slot_dirty_track_enabled(slot));
>>>>    		mark_page_dirty_in_slot(vcpu->kvm, slot, gfn);
>>>>    	}
>>>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>>>> index 63247c57c72c..cc130999eddf 100644
>>>> --- a/arch/x86/kvm/vmx/vmx.c
>>>> +++ b/arch/x86/kvm/vmx/vmx.c
>>>> @@ -5745,6 +5745,9 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
>>>>    		 */
>>>>    		if (__xfer_to_guest_mode_work_pending())
>>>>    			return 1;
>>>> +
>>>> +		if (kvm_test_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu))
>>>> +			return 1;
>>> Any reason for this check? Is this quota related to the invalid
>>> guest state? Sorry if I missed anything here.
>> Quoting Sean:
>> "And thinking more about silly edge cases, VMX's big emulation loop for
>> invalid
>> guest state when unrestricted guest is disabled should probably explicitly
>> check
>> the dirty quota.  Again, I doubt it matters to anyone's use case, but it is
>> treated
>> as a full run loop for things like pending signals, it'd be good to be
>> consistent."
>>
>> Please see v4 for details. Thanks.
> Thank you for the sharing.
>>>
>>>>    	}
>>>>    	return 1;
>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>> index ecea83f0da49..1a960fbb51f4 100644
>>>> --- a/arch/x86/kvm/x86.c
>>>> +++ b/arch/x86/kvm/x86.c
>>>> @@ -10494,6 +10494,30 @@ void __kvm_request_immediate_exit(struct kvm_vcpu *vcpu)
>>>>    }
>>>>    EXPORT_SYMBOL_GPL(__kvm_request_immediate_exit);
>>>> +static inline bool kvm_check_dirty_quota_request(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +#ifdef CONFIG_HAVE_KVM_DIRTY_QUOTA
>>>> +	struct kvm_run *run;
>>>> +
>>>> +	if (kvm_check_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu)) {
>>>> +		run = vcpu->run;
>>>> +		run->exit_reason = KVM_EXIT_DIRTY_QUOTA_EXHAUSTED;
>>>> +		run->dirty_quota_exit.count = vcpu->stat.generic.pages_dirtied;
>>>> +		run->dirty_quota_exit.quota = READ_ONCE(run->dirty_quota);
>>>> +
>>>> +		/*
>>>> +		 * Re-check the quota and exit if and only if the vCPU still
>>>> +		 * exceeds its quota.  If userspace increases (or disables
>>>> +		 * entirely) the quota, then no exit is required as KVM is
>>>> +		 * still honoring its ABI, e.g. userspace won't even be aware
>>>> +		 * that KVM temporarily detected an exhausted quota.
>>>> +		 */
>>>> +		return run->dirty_quota_exit.count >= run->dirty_quota_exit.quota;
>>> Would it be better to check before updating the vcpu->run?
>> The reason for checking it at the last moment is to avoid invalid exits to
>> userspace as much as possible.
> 
> So if the userspace increases the quota, then the above vcpu->run change just
> leaves some garbage information on vcpu->run and the exit_reason is
> misleading. Possibly it's ok since this information will not be used anymore.
> 
> Not sure how critical is the time spent on the vcpu->run update.
IMO the time spent in the update might not be very significant, and the
garbage value is harmless.

Thanks,
Shivam

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v7 3/4] KVM: arm64: Dirty quota-based throttling of vcpus
  2022-11-17 20:44   ` Marc Zyngier
@ 2022-11-18  8:56     ` Shivam Kumar
  0 siblings, 0 replies; 38+ messages in thread
From: Shivam Kumar @ 2022-11-18  8:56 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: pbonzini, seanjc, james.morse, borntraeger, david, kvm,
	Shaju Abraham, Manish Mishra, Anurag Madnawat



On 18/11/22 2:14 am, Marc Zyngier wrote:
> On Sun, 13 Nov 2022 17:05:10 +0000,
> Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
>>
>> Exit to userspace whenever the dirty quota is exhausted (i.e. dirty count
>> equals/exceeds dirty quota) to request more dirty quota.
>>
>> Suggested-by: Shaju Abraham <shaju.abraham@nutanix.com>
>> Suggested-by: Manish Mishra <manish.mishra@nutanix.com>
>> Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
>> Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
>> Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
>> ---
>>   arch/arm64/kvm/arm.c | 9 +++++++++
>>   1 file changed, 9 insertions(+)
>>
>> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
>> index 94d33e296e10..850024982dd9 100644
>> --- a/arch/arm64/kvm/arm.c
>> +++ b/arch/arm64/kvm/arm.c
>> @@ -746,6 +746,15 @@ static int check_vcpu_requests(struct kvm_vcpu *vcpu)
>>   
>>   		if (kvm_check_request(KVM_REQ_SUSPEND, vcpu))
>>   			return kvm_vcpu_suspend(vcpu);
>> +
>> +		if (kvm_check_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu)) {
>> +			struct kvm_run *run = vcpu->run;
>> +
>> +			run->exit_reason = KVM_EXIT_DIRTY_QUOTA_EXHAUSTED;
>> +			run->dirty_quota_exit.count = vcpu->stat.generic.pages_dirtied;
>> +			run->dirty_quota_exit.quota = vcpu->dirty_quota;
>> +			return 0;
>> +		}
>>   	}
>>   
>>   	return 1;
> 
> As pointed out by others, this should be common code. This would
> definitely avoid the difference in behaviour between architectures.
> 
> 	M.
> 
Ack.

Thanks,
Shivam

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v7 1/4] KVM: Implement dirty quota-based throttling of vcpus
  2022-11-17 19:26   ` Marc Zyngier
@ 2022-11-18  9:47     ` Shivam Kumar
  2022-11-22 17:46       ` Marc Zyngier
  0 siblings, 1 reply; 38+ messages in thread
From: Shivam Kumar @ 2022-11-18  9:47 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: pbonzini, seanjc, james.morse, borntraeger, david, kvm,
	Shaju Abraham, Manish Mishra, Anurag Madnawat



On 18/11/22 12:56 am, Marc Zyngier wrote:
> On Sun, 13 Nov 2022 17:05:06 +0000,
> Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
>>
>> +    count: the current count of pages dirtied by the VCPU, can be
>> +    skewed based on the size of the pages accessed by each vCPU.
> 
> How can userspace make a decision on the amount of dirtying this
> represent if this doesn't represent a number of base pages? Or are you
> saying that this only counts the number of permission faults that have
> dirtied pages?

Yes, this only counts the number of permission faults that have dirtied 
pages.

> 
>> +    quota: the observed dirty quota just before the exit to
>> userspace.
> 
> You are defining the quota in terms of quota. -ENOCLUE.

I am defining the "quota" member of the dirty_quota_exit struct in terms 
of "dirty quota" which is already defined in the commit message.

> 
>> +
>> +The userspace can design a strategy to allocate the overall scope of dirtying
>> +for the VM among the vcpus. Based on the strategy and the current state of dirty
>> +quota throttling, the userspace can make a decision to either update (increase)
>> +the quota or to put the VCPU to sleep for some time.
> 
> This looks like something out of 1984 (Newspeak anyone)? Can't you
> just say that userspace is responsible for allocating the quota and
> manage the resulting throttling effect?

We didn't intend to sound like the Party or the Big Brother. We started 
working on the linux and QEMU patches at the same time and got tempted 
into exposing the details of how we were using this feature in QEMU for 
throttling. I can get rid of the details if it helps.

>> +	/*
>> +	 * Number of pages the vCPU is allowed to have dirtied over its entire
>> +	 * lifetime.  KVM_RUN exits with KVM_EXIT_DIRTY_QUOTA_EXHAUSTED if the quota
>> +	 * is reached/exceeded.
>> +	 */
>> +	__u64 dirty_quota;
> 
> How are dirty_quota and dirty_quota_exit.quota related?
> 

dirty_quota_exit.quota is the dirty quota at the time of the exit. We 
are capturing it for userspace's reference because dirty quota can be 
updated anytime.

>> @@ -48,6 +48,7 @@ config KVM
>>   	select KVM_VFIO
>>   	select SRCU
>>   	select INTERVAL_TREE
>> +	select HAVE_KVM_DIRTY_QUOTA
> 
> Why isn't this part of the x86 patch?

Ack. Thanks.

>>    * Architecture-independent vcpu->requests bit members
>> - * Bits 3-7 are reserved for more arch-independent bits.
>> + * Bits 5-7 are reserved for more arch-independent bits.
>>    */
>>   #define KVM_REQ_TLB_FLUSH         (0 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>>   #define KVM_REQ_VM_DEAD           (1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>>   #define KVM_REQ_UNBLOCK           2
>> +#define KVM_REQ_DIRTY_QUOTA_EXIT  4
> 
> Where is 3? Why reserve two bits when only one is used?

Ack. 3 was in use when I was working on the patchset. Missed this in my 
last code walkthrough before sending the patchset. Thanks.

>>   
>> +static bool kvm_vcpu_is_dirty_quota_exhausted(struct kvm_vcpu *vcpu)
>> +{
>> +#ifdef CONFIG_HAVE_KVM_DIRTY_QUOTA
>> +	u64 dirty_quota = READ_ONCE(vcpu->run->dirty_quota);
>> +
>> +	return dirty_quota && (vcpu->stat.generic.pages_dirtied >= dirty_quota);
>> +#else
>> +	return false;
>> +#endif
> 
> If you introduce additional #ifdefery here, why are the additional
> fields in the vcpu structure unconditional?

pages_dirtied can be a useful information even if dirty quota throttling 
is not used. So, I kept it unconditional based on feedback.

CC: Sean

I can add #ifdefery in the vcpu run struct for dirty_quota.

>>   		else
>>   			set_bit_le(rel_gfn, memslot->dirty_bitmap);
>> +
>> +		if (kvm_vcpu_is_dirty_quota_exhausted(vcpu))
>> +			kvm_make_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu);
> 
> This is broken in the light of new dirty-tracking code queued for
> 6.2. Specifically, you absolutely can end-up here *without* a vcpu on
> arm64. You just have to snapshot the ITS state to observe the fireworks.

Could you please point me to the patchset which is in queue?


I am grateful for the suggestions and feedback.

Thanks,
Shivam

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v7 1/4] KVM: Implement dirty quota-based throttling of vcpus
  2022-11-18  9:47     ` Shivam Kumar
@ 2022-11-22 17:46       ` Marc Zyngier
  2022-12-06  6:22         ` Shivam Kumar
  0 siblings, 1 reply; 38+ messages in thread
From: Marc Zyngier @ 2022-11-22 17:46 UTC (permalink / raw)
  To: Shivam Kumar
  Cc: pbonzini, seanjc, james.morse, borntraeger, david, kvm,
	Shaju Abraham, Manish Mishra, Anurag Madnawat

On Fri, 18 Nov 2022 09:47:50 +0000,
Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
> 
> 
> 
> On 18/11/22 12:56 am, Marc Zyngier wrote:
> > On Sun, 13 Nov 2022 17:05:06 +0000,
> > Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
> >> 
> >> +    count: the current count of pages dirtied by the VCPU, can be
> >> +    skewed based on the size of the pages accessed by each vCPU.
> > 
> > How can userspace make a decision on the amount of dirtying this
> > represent if this doesn't represent a number of base pages? Or are you
> > saying that this only counts the number of permission faults that have
> > dirtied pages?
> 
> Yes, this only counts the number of permission faults that have
> dirtied pages.

So how can userspace consistently set a quota of dirtied memory? This
has to account for the size that has been faulted, because that's all
userspace can reason about. Remember that at least on arm64, we're
dealing with 3 different base page sizes, and many more large page
sizes.

> 
> > 
> >> +    quota: the observed dirty quota just before the exit to
> >> userspace.
> > 
> > You are defining the quota in terms of quota. -ENOCLUE.
> 
> I am defining the "quota" member of the dirty_quota_exit struct in
> terms of "dirty quota" which is already defined in the commit
> message.

Which nobody will see. This is supposed to be a self contained
documentation.

> 
> > 
> >> +
> >> +The userspace can design a strategy to allocate the overall scope of dirtying
> >> +for the VM among the vcpus. Based on the strategy and the current state of dirty
> >> +quota throttling, the userspace can make a decision to either update (increase)
> >> +the quota or to put the VCPU to sleep for some time.
> > 
> > This looks like something out of 1984 (Newspeak anyone)? Can't you
> > just say that userspace is responsible for allocating the quota and
> > manage the resulting throttling effect?
> 
> We didn't intend to sound like the Party or the Big Brother. We
> started working on the linux and QEMU patches at the same time and got
> tempted into exposing the details of how we were using this feature in
> QEMU for throttling. I can get rid of the details if it helps.

I think the details are meaningless, and this should stick to the API,
not the way the API could be used.

> 
> >> +	/*
> >> +	 * Number of pages the vCPU is allowed to have dirtied over its entire
> >> +	 * lifetime.  KVM_RUN exits with KVM_EXIT_DIRTY_QUOTA_EXHAUSTED if the quota
> >> +	 * is reached/exceeded.
> >> +	 */
> >> +	__u64 dirty_quota;
> > 
> > How are dirty_quota and dirty_quota_exit.quota related?
> > 
> 
> dirty_quota_exit.quota is the dirty quota at the time of the exit. We
> are capturing it for userspace's reference because dirty quota can be
> updated anytime.

Shouldn't that be described here?

> 
> >> @@ -48,6 +48,7 @@ config KVM
> >>   	select KVM_VFIO
> >>   	select SRCU
> >>   	select INTERVAL_TREE
> >> +	select HAVE_KVM_DIRTY_QUOTA
> > 
> > Why isn't this part of the x86 patch?
> 
> Ack. Thanks.
> 
> >>    * Architecture-independent vcpu->requests bit members
> >> - * Bits 3-7 are reserved for more arch-independent bits.
> >> + * Bits 5-7 are reserved for more arch-independent bits.
> >>    */
> >>   #define KVM_REQ_TLB_FLUSH         (0 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
> >>   #define KVM_REQ_VM_DEAD           (1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
> >>   #define KVM_REQ_UNBLOCK           2
> >> +#define KVM_REQ_DIRTY_QUOTA_EXIT  4
> > 
> > Where is 3? Why reserve two bits when only one is used?
> 
> Ack. 3 was in use when I was working on the patchset. Missed this in
> my last code walkthrough before sending the patchset. Thanks.
> 
> >>   +static bool kvm_vcpu_is_dirty_quota_exhausted(struct kvm_vcpu
> >> *vcpu)
> >> +{
> >> +#ifdef CONFIG_HAVE_KVM_DIRTY_QUOTA
> >> +	u64 dirty_quota = READ_ONCE(vcpu->run->dirty_quota);
> >> +
> >> +	return dirty_quota && (vcpu->stat.generic.pages_dirtied >= dirty_quota);
> >> +#else
> >> +	return false;
> >> +#endif
> > 
> > If you introduce additional #ifdefery here, why are the additional
> > fields in the vcpu structure unconditional?
> 
> pages_dirtied can be a useful information even if dirty quota
> throttling is not used. So, I kept it unconditional based on
> feedback.

Useful for whom? This creates an ABI for all architectures, and this
needs buy-in from everyone. Personally, I think it is a pretty useless
stat.

And while we're talking about pages_dirtied, I really dislike the
WARN_ON in mark_page_dirty_in_slot(). A counter has rolled over?
Shock, horror...

> 
> CC: Sean
> 
> I can add #ifdefery in the vcpu run struct for dirty_quota.
> 
> >>   		else
> >>   			set_bit_le(rel_gfn, memslot->dirty_bitmap);
> >> +
> >> +		if (kvm_vcpu_is_dirty_quota_exhausted(vcpu))
> >> +			kvm_make_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu);
> > 
> > This is broken in the light of new dirty-tracking code queued for
> > 6.2. Specifically, you absolutely can end-up here *without* a vcpu on
> > arm64. You just have to snapshot the ITS state to observe the fireworks.
> 
> Could you please point me to the patchset which is in queue?

The patches are in -next, and you can look at the branch here[1].
Please also refer to the discussion on the list, as a lot of what was
discussed there does apply here.

Thanks,

	M.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/dirty-ring

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v7 1/4] KVM: Implement dirty quota-based throttling of vcpus
  2022-11-13 17:05 ` [PATCH v7 1/4] KVM: Implement dirty quota-based throttling of vcpus Shivam Kumar
  2022-11-14 23:29   ` Yunhong Jiang
  2022-11-17 19:26   ` Marc Zyngier
@ 2022-11-25 10:52   ` kernel test robot
  2 siblings, 0 replies; 38+ messages in thread
From: kernel test robot @ 2022-11-25 10:52 UTC (permalink / raw)
  To: Shivam Kumar; +Cc: oe-kbuild-all

[-- Attachment #1: Type: text/plain, Size: 2240 bytes --]

Hi Shivam,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on kvm/queue]
[also build test WARNING on linus/master v6.1-rc6]
[cannot apply to kvmarm/next mst-vhost/linux-next kvm/linux-next next-20221125]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Shivam-Kumar/KVM-Dirty-quota-based-throttling/20221114-010905
base:   https://git.kernel.org/pub/scm/virt/kvm/kvm.git queue
patch link:    https://lore.kernel.org/r/20221113170507.208810-2-shivam.kumar1%40nutanix.com
patch subject: [PATCH v7 1/4] KVM: Implement dirty quota-based throttling of vcpus
reproduce:
        # https://github.com/intel-lab-lkp/linux/commit/7e908f178b0555c0d7d4b8f2b3c8bdbf95caf538
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Shivam-Kumar/KVM-Dirty-quota-based-throttling/20221114-010905
        git checkout 7e908f178b0555c0d7d4b8f2b3c8bdbf95caf538
        make menuconfig
        # enable CONFIG_COMPILE_TEST, CONFIG_WARN_MISSING_DOCUMENTS, CONFIG_WARN_ABI_ERRORS
        make htmldocs

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> Documentation/virt/kvm/api.rst:6527: WARNING: Unexpected indentation.

vim +6527 Documentation/virt/kvm/api.rst

  6517	
  6518			/* KVM_EXIT_DIRTY_QUOTA_EXHAUSTED */
  6519			struct {
  6520				__u64 count;
  6521				__u64 quota;
  6522			} dirty_quota_exit;
  6523	
  6524	If exit reason is KVM_EXIT_DIRTY_QUOTA_EXHAUSTED, it indicates that the VCPU has
  6525	exhausted its dirty quota. The 'dirty_quota_exit' member of kvm_run structure
  6526	makes the following information available to the userspace:
> 6527	    count: the current count of pages dirtied by the VCPU, can be
  6528	    skewed based on the size of the pages accessed by each vCPU.
  6529	    quota: the observed dirty quota just before the exit to userspace.
  6530	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

[-- Attachment #2: config --]
[-- Type: text/plain, Size: 38869 bytes --]

[kernel config attachment trimmed]
CONFIG_HAVE_CMPXCHG_LOCAL=y
CONFIG_HAVE_CMPXCHG_DOUBLE=y
CONFIG_HAVE_ARCH_SECCOMP=y
CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
# CONFIG_SECCOMP is not set
CONFIG_HAVE_ARCH_STACKLEAK=y
CONFIG_HAVE_STACKPROTECTOR=y
# CONFIG_STACKPROTECTOR is not set
CONFIG_ARCH_SUPPORTS_LTO_CLANG=y
CONFIG_ARCH_SUPPORTS_LTO_CLANG_THIN=y
CONFIG_LTO_NONE=y
CONFIG_ARCH_SUPPORTS_CFI_CLANG=y
CONFIG_HAVE_ARCH_WITHIN_STACK_FRAMES=y
CONFIG_HAVE_CONTEXT_TRACKING_USER=y
CONFIG_HAVE_CONTEXT_TRACKING_USER_OFFSTACK=y
CONFIG_HAVE_VIRT_CPU_ACCOUNTING_GEN=y
CONFIG_HAVE_IRQ_TIME_ACCOUNTING=y
CONFIG_HAVE_MOVE_PUD=y
CONFIG_HAVE_MOVE_PMD=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD=y
CONFIG_HAVE_ARCH_HUGE_VMAP=y
CONFIG_HAVE_ARCH_HUGE_VMALLOC=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_HAVE_ARCH_SOFT_DIRTY=y
CONFIG_HAVE_MOD_ARCH_SPECIFIC=y
CONFIG_MODULES_USE_ELF_RELA=y
CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK=y
CONFIG_HAVE_SOFTIRQ_ON_OWN_STACK=y
CONFIG_SOFTIRQ_ON_OWN_STACK=y
CONFIG_ARCH_HAS_ELF_RANDOMIZE=y
CONFIG_HAVE_ARCH_MMAP_RND_BITS=y
CONFIG_HAVE_EXIT_THREAD=y
CONFIG_ARCH_MMAP_RND_BITS=28
CONFIG_PAGE_SIZE_LESS_THAN_64KB=y
CONFIG_PAGE_SIZE_LESS_THAN_256KB=y
CONFIG_HAVE_OBJTOOL=y
CONFIG_HAVE_JUMP_LABEL_HACK=y
CONFIG_HAVE_NOINSTR_HACK=y
CONFIG_HAVE_NOINSTR_VALIDATION=y
CONFIG_HAVE_UACCESS_VALIDATION=y
CONFIG_HAVE_STACK_VALIDATION=y
CONFIG_HAVE_RELIABLE_STACKTRACE=y
# CONFIG_COMPAT_32BIT_TIME is not set
CONFIG_HAVE_ARCH_VMAP_STACK=y
# CONFIG_VMAP_STACK is not set
CONFIG_HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET=y
CONFIG_RANDOMIZE_KSTACK_OFFSET=y
# CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT is not set
CONFIG_ARCH_HAS_STRICT_KERNEL_RWX=y
CONFIG_STRICT_KERNEL_RWX=y
CONFIG_ARCH_HAS_STRICT_MODULE_RWX=y
CONFIG_HAVE_ARCH_PREL32_RELOCATIONS=y
CONFIG_ARCH_HAS_MEM_ENCRYPT=y
CONFIG_HAVE_STATIC_CALL=y
CONFIG_HAVE_STATIC_CALL_INLINE=y
CONFIG_HAVE_PREEMPT_DYNAMIC=y
CONFIG_HAVE_PREEMPT_DYNAMIC_CALL=y
CONFIG_ARCH_WANT_LD_ORPHAN_WARN=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_ARCH_SUPPORTS_PAGE_TABLE_CHECK=y
CONFIG_ARCH_HAS_ELFCORE_COMPAT=y
CONFIG_ARCH_HAS_PARANOID_L1D_FLUSH=y
CONFIG_DYNAMIC_SIGFRAME=y
CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG=y

#
# GCOV-based kernel profiling
#
CONFIG_ARCH_HAS_GCOV_PROFILE_ALL=y
# end of GCOV-based kernel profiling

CONFIG_HAVE_GCC_PLUGINS=y
# CONFIG_GCC_PLUGINS is not set
# end of General architecture-dependent options

CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
# CONFIG_MODULES is not set
CONFIG_BLOCK=y
# CONFIG_BLOCK_LEGACY_AUTOLOAD is not set
# CONFIG_BLK_DEV_BSGLIB is not set
# CONFIG_BLK_DEV_INTEGRITY is not set
# CONFIG_BLK_DEV_ZONED is not set
# CONFIG_BLK_WBT is not set
# CONFIG_BLK_SED_OPAL is not set
# CONFIG_BLK_INLINE_ENCRYPTION is not set

#
# Partition Types
#
# CONFIG_PARTITION_ADVANCED is not set
CONFIG_MSDOS_PARTITION=y
CONFIG_EFI_PARTITION=y
# end of Partition Types

#
# IO Schedulers
#
# CONFIG_MQ_IOSCHED_DEADLINE is not set
# CONFIG_MQ_IOSCHED_KYBER is not set
# CONFIG_IOSCHED_BFQ is not set
# end of IO Schedulers

CONFIG_INLINE_SPIN_UNLOCK_IRQ=y
CONFIG_INLINE_READ_UNLOCK=y
CONFIG_INLINE_READ_UNLOCK_IRQ=y
CONFIG_INLINE_WRITE_UNLOCK=y
CONFIG_INLINE_WRITE_UNLOCK_IRQ=y
CONFIG_ARCH_SUPPORTS_ATOMIC_RMW=y
CONFIG_ARCH_USE_QUEUED_SPINLOCKS=y
CONFIG_ARCH_USE_QUEUED_RWLOCKS=y
CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE=y
CONFIG_ARCH_HAS_SYNC_CORE_BEFORE_USERMODE=y
CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y

#
# Executable file formats
#
# CONFIG_BINFMT_ELF is not set
# CONFIG_BINFMT_SCRIPT is not set
# CONFIG_BINFMT_MISC is not set
CONFIG_COREDUMP=y
# end of Executable file formats

#
# Memory Management options
#
# CONFIG_SWAP is not set

#
# SLAB allocator options
#
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLAB_MERGE_DEFAULT is not set
# CONFIG_SLAB_FREELIST_RANDOM is not set
# CONFIG_SLAB_FREELIST_HARDENED is not set
# CONFIG_SLUB_STATS is not set
# end of SLAB allocator options

# CONFIG_SHUFFLE_PAGE_ALLOCATOR is not set
# CONFIG_COMPAT_BRK is not set
CONFIG_SPARSEMEM=y
CONFIG_SPARSEMEM_EXTREME=y
CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
# CONFIG_SPARSEMEM_VMEMMAP is not set
CONFIG_HAVE_FAST_GUP=y
CONFIG_EXCLUSIVE_SYSTEM_RAM=y
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
# CONFIG_MEMORY_HOTPLUG is not set
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK=y
# CONFIG_COMPACTION is not set
# CONFIG_PAGE_REPORTING is not set
CONFIG_PHYS_ADDR_T_64BIT=y
# CONFIG_KSM is not set
CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ARCH_WANTS_THP_SWAP=y
# CONFIG_TRANSPARENT_HUGEPAGE is not set
CONFIG_NEED_PER_CPU_KM=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
# CONFIG_CMA is not set
CONFIG_GENERIC_EARLY_IOREMAP=y
# CONFIG_IDLE_PAGE_TRACKING is not set
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_ARCH_HAS_CURRENT_STACK_POINTER=y
CONFIG_ARCH_HAS_PTE_DEVMAP=y
CONFIG_ZONE_DMA=y
CONFIG_ZONE_DMA32=y
CONFIG_VM_EVENT_COUNTERS=y
# CONFIG_PERCPU_STATS is not set

#
# GUP_TEST needs to have DEBUG_FS enabled
#
CONFIG_ARCH_HAS_PTE_SPECIAL=y
CONFIG_SECRETMEM=y
# CONFIG_ANON_VMA_NAME is not set
# CONFIG_USERFAULTFD is not set
# CONFIG_LRU_GEN is not set

#
# Data Access Monitoring
#
# CONFIG_DAMON is not set
# end of Data Access Monitoring
# end of Memory Management options

# CONFIG_NET is not set

#
# Device Drivers
#
CONFIG_HAVE_EISA=y
# CONFIG_EISA is not set
CONFIG_HAVE_PCI=y
# CONFIG_PCI is not set
# CONFIG_PCCARD is not set

#
# Generic Driver Options
#
# CONFIG_UEVENT_HELPER is not set
# CONFIG_DEVTMPFS is not set
# CONFIG_STANDALONE is not set
# CONFIG_PREVENT_FIRMWARE_BUILD is not set

#
# Firmware loader
#
CONFIG_FW_LOADER=y
CONFIG_EXTRA_FIRMWARE=""
# CONFIG_FW_LOADER_USER_HELPER is not set
# CONFIG_FW_LOADER_COMPRESS is not set
# CONFIG_FW_UPLOAD is not set
# end of Firmware loader

CONFIG_ALLOW_DEV_COREDUMP=y
CONFIG_GENERIC_CPU_AUTOPROBE=y
CONFIG_GENERIC_CPU_VULNERABILITIES=y
# end of Generic Driver Options

#
# Bus devices
#
# CONFIG_ARM_INTEGRATOR_LM is not set
# CONFIG_BT1_APB is not set
# CONFIG_BT1_AXI is not set
# CONFIG_HISILICON_LPC is not set
# CONFIG_INTEL_IXP4XX_EB is not set
# CONFIG_QCOM_EBI2 is not set
# CONFIG_MHI_BUS is not set
# CONFIG_MHI_BUS_EP is not set
# end of Bus devices

#
# Firmware Drivers
#

#
# ARM System Control and Management Interface Protocol
#
# CONFIG_ARM_SCMI_PROTOCOL is not set
# end of ARM System Control and Management Interface Protocol

# CONFIG_EDD is not set
CONFIG_FIRMWARE_MEMMAP=y
# CONFIG_DMIID is not set
# CONFIG_DMI_SYSFS is not set
CONFIG_DMI_SCAN_MACHINE_NON_EFI_FALLBACK=y
# CONFIG_FW_CFG_SYSFS is not set
# CONFIG_SYSFB_SIMPLEFB is not set
# CONFIG_BCM47XX_NVRAM is not set
# CONFIG_GOOGLE_FIRMWARE is not set

#
# Tegra firmware driver
#
# end of Tegra firmware driver
# end of Firmware Drivers

# CONFIG_GNSS is not set
# CONFIG_MTD is not set
# CONFIG_OF is not set
CONFIG_ARCH_MIGHT_HAVE_PC_PARPORT=y
# CONFIG_PARPORT is not set
# CONFIG_BLK_DEV is not set

#
# NVME Support
#
# CONFIG_NVME_FC is not set
# end of NVME Support

#
# Misc devices
#
# CONFIG_DUMMY_IRQ is not set
# CONFIG_ATMEL_SSC is not set
# CONFIG_ENCLOSURE_SERVICES is not set
# CONFIG_QCOM_COINCELL is not set
# CONFIG_SRAM is not set
# CONFIG_XILINX_SDFEC is not set
# CONFIG_C2PORT is not set

#
# EEPROM support
#
# CONFIG_EEPROM_93CX6 is not set
# end of EEPROM support

#
# Texas Instruments shared transport line discipline
#
# end of Texas Instruments shared transport line discipline

#
# Altera FPGA firmware download module (requires I2C)
#
# CONFIG_ECHO is not set
# CONFIG_PVPANIC is not set
# end of Misc devices

#
# SCSI device support
#
CONFIG_SCSI_MOD=y
# CONFIG_RAID_ATTRS is not set
# CONFIG_SCSI is not set
# end of SCSI device support

# CONFIG_ATA is not set
# CONFIG_MD is not set
# CONFIG_TARGET_CORE is not set

#
# IEEE 1394 (FireWire) support
#
# CONFIG_FIREWIRE is not set
# end of IEEE 1394 (FireWire) support

# CONFIG_MACINTOSH_DRIVERS is not set

#
# Input device support
#
CONFIG_INPUT=y
# CONFIG_INPUT_FF_MEMLESS is not set
# CONFIG_INPUT_SPARSEKMAP is not set
# CONFIG_INPUT_MATRIXKMAP is not set

#
# Userland interfaces
#
# CONFIG_INPUT_MOUSEDEV is not set
# CONFIG_INPUT_JOYDEV is not set
# CONFIG_INPUT_EVDEV is not set
# CONFIG_INPUT_EVBUG is not set

#
# Input Device Drivers
#
# CONFIG_INPUT_KEYBOARD is not set
# CONFIG_INPUT_MOUSE is not set
# CONFIG_INPUT_JOYSTICK is not set
# CONFIG_INPUT_TABLET is not set
# CONFIG_INPUT_TOUCHSCREEN is not set
# CONFIG_INPUT_MISC is not set
# CONFIG_RMI4_CORE is not set

#
# Hardware I/O ports
#
# CONFIG_SERIO is not set
CONFIG_ARCH_MIGHT_HAVE_PC_SERIO=y
# CONFIG_GAMEPORT is not set
# end of Hardware I/O ports
# end of Input device support

#
# Character devices
#
CONFIG_TTY=y
CONFIG_VT=y
CONFIG_CONSOLE_TRANSLATIONS=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
# CONFIG_VT_HW_CONSOLE_BINDING is not set
CONFIG_UNIX98_PTYS=y
# CONFIG_LEGACY_PTYS is not set
# CONFIG_LDISC_AUTOLOAD is not set

#
# Serial drivers
#
# CONFIG_SERIAL_8250 is not set

#
# Non-8250 serial port support
#
# CONFIG_SERIAL_AMBA_PL010 is not set
# CONFIG_SERIAL_MESON is not set
# CONFIG_SERIAL_CLPS711X is not set
# CONFIG_SERIAL_SAMSUNG is not set
# CONFIG_SERIAL_TEGRA is not set
# CONFIG_SERIAL_IMX is not set
# CONFIG_SERIAL_UARTLITE is not set
# CONFIG_SERIAL_SH_SCI is not set
# CONFIG_SERIAL_MSM is not set
# CONFIG_SERIAL_VT8500 is not set
# CONFIG_SERIAL_OMAP is not set
# CONFIG_SERIAL_LANTIQ is not set
# CONFIG_SERIAL_SCCNXP is not set
# CONFIG_SERIAL_TIMBERDALE is not set
# CONFIG_SERIAL_BCM63XX is not set
# CONFIG_SERIAL_ALTERA_JTAGUART is not set
# CONFIG_SERIAL_ALTERA_UART is not set
# CONFIG_SERIAL_MXS_AUART is not set
# CONFIG_SERIAL_MPS2_UART is not set
# CONFIG_SERIAL_ARC is not set
# CONFIG_SERIAL_FSL_LPUART is not set
# CONFIG_SERIAL_FSL_LINFLEXUART is not set
# CONFIG_SERIAL_ST_ASC is not set
# CONFIG_SERIAL_STM32 is not set
# CONFIG_SERIAL_OWL is not set
# CONFIG_SERIAL_RDA is not set
# CONFIG_SERIAL_LITEUART is not set
# CONFIG_SERIAL_SUNPLUS is not set
# end of Serial drivers

# CONFIG_SERIAL_NONSTANDARD is not set
# CONFIG_NULL_TTY is not set
# CONFIG_SERIAL_DEV_BUS is not set
# CONFIG_VIRTIO_CONSOLE is not set
# CONFIG_IPMI_HANDLER is not set
# CONFIG_ASPEED_KCS_IPMI_BMC is not set
# CONFIG_NPCM7XX_KCS_IPMI_BMC is not set
# CONFIG_HW_RANDOM is not set
# CONFIG_MWAVE is not set
# CONFIG_DEVMEM is not set
# CONFIG_NVRAM is not set
# CONFIG_HANGCHECK_TIMER is not set
# CONFIG_TCG_TPM is not set
# CONFIG_TELCLOCK is not set
# CONFIG_RANDOM_TRUST_CPU is not set
# CONFIG_RANDOM_TRUST_BOOTLOADER is not set
# end of Character devices

#
# I2C support
#
# CONFIG_I2C is not set
# end of I2C support

# CONFIG_I3C is not set
# CONFIG_SPI is not set
# CONFIG_SPMI is not set
# CONFIG_HSI is not set
# CONFIG_PPS is not set

#
# PTP clock support
#
CONFIG_PTP_1588_CLOCK_OPTIONAL=y

#
# Enable PHYLIB and NETWORK_PHY_TIMESTAMPING to see the additional clocks.
#
# end of PTP clock support

# CONFIG_PINCTRL is not set
# CONFIG_GPIOLIB is not set
# CONFIG_W1 is not set
# CONFIG_POWER_RESET is not set
# CONFIG_POWER_SUPPLY is not set
# CONFIG_HWMON is not set
# CONFIG_THERMAL is not set
# CONFIG_WATCHDOG is not set
CONFIG_SSB_POSSIBLE=y
# CONFIG_SSB is not set
CONFIG_BCMA_POSSIBLE=y
# CONFIG_BCMA is not set

#
# Multifunction device drivers
#
# CONFIG_MFD_SUN4I_GPADC is not set
# CONFIG_MFD_AT91_USART is not set
# CONFIG_MFD_MADERA is not set
# CONFIG_MFD_EXYNOS_LPASS is not set
# CONFIG_MFD_MXS_LRADC is not set
# CONFIG_MFD_MX25_TSADC is not set
# CONFIG_HTC_PASIC3 is not set
# CONFIG_MFD_KEMPLD is not set
# CONFIG_MFD_MT6397 is not set
# CONFIG_MFD_PM8XXX is not set
# CONFIG_MFD_SM501 is not set
# CONFIG_ABX500_CORE is not set
# CONFIG_MFD_SUN6I_PRCM is not set
# CONFIG_MFD_SYSCON is not set
# CONFIG_MFD_TI_AM335X_TSCADC is not set
# CONFIG_MFD_TQMX86 is not set
# CONFIG_MFD_STM32_LPTIMER is not set
# CONFIG_MFD_STM32_TIMERS is not set
# end of Multifunction device drivers

# CONFIG_REGULATOR is not set
# CONFIG_RC_CORE is not set

#
# CEC support
#
# CONFIG_MEDIA_CEC_SUPPORT is not set
# end of CEC support

# CONFIG_MEDIA_SUPPORT is not set

#
# Graphics support
#
# CONFIG_IMX_IPUV3_CORE is not set
# CONFIG_DRM is not set

#
# ARM devices
#
# end of ARM devices

#
# Frame buffer Devices
#
# CONFIG_FB is not set
# CONFIG_MMP_DISP is not set
# end of Frame buffer Devices

#
# Backlight & LCD device support
#
# CONFIG_LCD_CLASS_DEVICE is not set
# CONFIG_BACKLIGHT_CLASS_DEVICE is not set
# end of Backlight & LCD device support

#
# Console display driver support
#
CONFIG_VGA_CONSOLE=y
CONFIG_DUMMY_CONSOLE=y
CONFIG_DUMMY_CONSOLE_COLUMNS=80
CONFIG_DUMMY_CONSOLE_ROWS=25
# end of Console display driver support
# end of Graphics support

# CONFIG_SOUND is not set

#
# HID support
#
# CONFIG_HID is not set
# end of HID support

CONFIG_USB_OHCI_LITTLE_ENDIAN=y
# CONFIG_USB_SUPPORT is not set
# CONFIG_MMC is not set
# CONFIG_MEMSTICK is not set
# CONFIG_NEW_LEDS is not set
# CONFIG_ACCESSIBILITY is not set
CONFIG_EDAC_ATOMIC_SCRUB=y
CONFIG_EDAC_SUPPORT=y
CONFIG_RTC_LIB=y
CONFIG_RTC_MC146818_LIB=y
# CONFIG_RTC_CLASS is not set
# CONFIG_DMADEVICES is not set

#
# DMABUF options
#
# CONFIG_SYNC_FILE is not set
# CONFIG_DMABUF_HEAPS is not set
# end of DMABUF options

# CONFIG_AUXDISPLAY is not set
# CONFIG_UIO is not set
# CONFIG_VFIO is not set
# CONFIG_VIRT_DRIVERS is not set
# CONFIG_VIRTIO_MENU is not set
# CONFIG_VHOST_MENU is not set

#
# Microsoft Hyper-V guest support
#
# end of Microsoft Hyper-V guest support

# CONFIG_GREYBUS is not set
# CONFIG_COMEDI is not set
# CONFIG_STAGING is not set
# CONFIG_CHROME_PLATFORMS is not set
# CONFIG_MELLANOX_PLATFORM is not set
# CONFIG_OLPC_XO175 is not set
# CONFIG_SURFACE_PLATFORMS is not set
# CONFIG_X86_PLATFORM_DEVICES is not set
# CONFIG_COMMON_CLK is not set
# CONFIG_HWSPINLOCK is not set

#
# Clock Source drivers
#
CONFIG_CLKEVT_I8253=y
CONFIG_I8253_LOCK=y
CONFIG_CLKBLD_I8253=y
# CONFIG_BCM2835_TIMER is not set
# CONFIG_BCM_KONA_TIMER is not set
# CONFIG_DAVINCI_TIMER is not set
# CONFIG_DIGICOLOR_TIMER is not set
# CONFIG_OMAP_DM_TIMER is not set
# CONFIG_DW_APB_TIMER is not set
# CONFIG_FTTMR010_TIMER is not set
# CONFIG_IXP4XX_TIMER is not set
# CONFIG_MESON6_TIMER is not set
# CONFIG_OWL_TIMER is not set
# CONFIG_RDA_TIMER is not set
# CONFIG_SUN4I_TIMER is not set
# CONFIG_TEGRA_TIMER is not set
# CONFIG_VT8500_TIMER is not set
# CONFIG_NPCM7XX_TIMER is not set
# CONFIG_ASM9260_TIMER is not set
# CONFIG_CLKSRC_DBX500_PRCMU is not set
# CONFIG_CLPS711X_TIMER is not set
# CONFIG_MXS_TIMER is not set
# CONFIG_NSPIRE_TIMER is not set
# CONFIG_INTEGRATOR_AP_TIMER is not set
# CONFIG_CLKSRC_PISTACHIO is not set
# CONFIG_CLKSRC_STM32_LP is not set
# CONFIG_ARMV7M_SYSTICK is not set
# CONFIG_ATMEL_PIT is not set
# CONFIG_ATMEL_ST is not set
# CONFIG_CLKSRC_SAMSUNG_PWM is not set
# CONFIG_FSL_FTM_TIMER is not set
# CONFIG_OXNAS_RPS_TIMER is not set
# CONFIG_MTK_TIMER is not set
# CONFIG_SH_TIMER_CMT is not set
# CONFIG_SH_TIMER_MTU2 is not set
# CONFIG_RENESAS_OSTM is not set
# CONFIG_SH_TIMER_TMU is not set
# CONFIG_EM_TIMER_STI is not set
# CONFIG_CLKSRC_PXA is not set
# CONFIG_TIMER_IMX_SYS_CTR is not set
# CONFIG_CLKSRC_ST_LPC is not set
# CONFIG_GXP_TIMER is not set
# CONFIG_MSC313E_TIMER is not set
# CONFIG_MICROCHIP_PIT64B is not set
# end of Clock Source drivers

# CONFIG_MAILBOX is not set
# CONFIG_IOMMU_SUPPORT is not set

#
# Remoteproc drivers
#
# CONFIG_REMOTEPROC is not set
# end of Remoteproc drivers

#
# Rpmsg drivers
#
# CONFIG_RPMSG_VIRTIO is not set
# end of Rpmsg drivers

#
# SOC (System On Chip) specific Drivers
#

#
# Amlogic SoC drivers
#
# CONFIG_MESON_CANVAS is not set
# CONFIG_MESON_CLK_MEASURE is not set
# CONFIG_MESON_GX_SOCINFO is not set
# CONFIG_MESON_MX_SOCINFO is not set
# end of Amlogic SoC drivers

#
# Apple SoC drivers
#
# CONFIG_APPLE_SART is not set
# end of Apple SoC drivers

#
# ASPEED SoC drivers
#
# CONFIG_ASPEED_LPC_CTRL is not set
# CONFIG_ASPEED_LPC_SNOOP is not set
# CONFIG_ASPEED_UART_ROUTING is not set
# CONFIG_ASPEED_P2A_CTRL is not set
# CONFIG_ASPEED_SOCINFO is not set
# end of ASPEED SoC drivers

# CONFIG_AT91_SOC_ID is not set
# CONFIG_AT91_SOC_SFR is not set

#
# Broadcom SoC drivers
#
# CONFIG_SOC_BCM63XX is not set
# CONFIG_SOC_BRCMSTB is not set
# end of Broadcom SoC drivers

#
# NXP/Freescale QorIQ SoC drivers
#
# end of NXP/Freescale QorIQ SoC drivers

#
# fujitsu SoC drivers
#
# end of fujitsu SoC drivers

#
# i.MX SoC drivers
#
# CONFIG_SOC_IMX8M is not set
# CONFIG_SOC_IMX9 is not set
# end of i.MX SoC drivers

#
# IXP4xx SoC drivers
#
# CONFIG_IXP4XX_QMGR is not set
# CONFIG_IXP4XX_NPE is not set
# end of IXP4xx SoC drivers

#
# Enable LiteX SoC Builder specific drivers
#
# CONFIG_LITEX_SOC_CONTROLLER is not set
# end of Enable LiteX SoC Builder specific drivers

#
# MediaTek SoC drivers
#
# CONFIG_MTK_CMDQ is not set
# CONFIG_MTK_DEVAPC is not set
# CONFIG_MTK_INFRACFG is not set
# CONFIG_MTK_MMSYS is not set
# end of MediaTek SoC drivers

#
# Qualcomm SoC drivers
#
# CONFIG_QCOM_GENI_SE is not set
# CONFIG_QCOM_GSBI is not set
# CONFIG_QCOM_LLCC is not set
# CONFIG_QCOM_RPMH is not set
# CONFIG_QCOM_SPM is not set
# CONFIG_QCOM_ICC_BWMON is not set
# end of Qualcomm SoC drivers

# CONFIG_SOC_RENESAS is not set
# CONFIG_ROCKCHIP_GRF is not set
# CONFIG_SOC_SAMSUNG is not set
# CONFIG_SOC_TI is not set
# CONFIG_UX500_SOC_ID is not set

#
# Xilinx SoC drivers
#
# end of Xilinx SoC drivers
# end of SOC (System On Chip) specific Drivers

# CONFIG_PM_DEVFREQ is not set
# CONFIG_EXTCON is not set
# CONFIG_MEMORY is not set
# CONFIG_IIO is not set
# CONFIG_PWM is not set

#
# IRQ chip support
#
# CONFIG_AL_FIC is not set
# CONFIG_RENESAS_INTC_IRQPIN is not set
# CONFIG_RENESAS_IRQC is not set
# CONFIG_RENESAS_RZA1_IRQC is not set
# CONFIG_RENESAS_RZG2L_IRQC is not set
# CONFIG_SL28CPLD_INTC is not set
# CONFIG_TS4800_IRQ is not set
# CONFIG_INGENIC_TCU_IRQ is not set
# CONFIG_IRQ_UNIPHIER_AIDET is not set
# CONFIG_MESON_IRQ_GPIO is not set
# CONFIG_IMX_IRQSTEER is not set
# CONFIG_IMX_INTMUX is not set
# CONFIG_EXYNOS_IRQ_COMBINER is not set
# CONFIG_MST_IRQ is not set
# CONFIG_MCHP_EIC is not set
# CONFIG_SUNPLUS_SP7021_INTC is not set
# end of IRQ chip support

# CONFIG_IPACK_BUS is not set
# CONFIG_RESET_CONTROLLER is not set

#
# PHY Subsystem
#
# CONFIG_GENERIC_PHY is not set
# CONFIG_PHY_PISTACHIO_USB is not set
# CONFIG_PHY_CAN_TRANSCEIVER is not set

#
# PHY drivers for Broadcom platforms
#
# CONFIG_PHY_BCM63XX_USBH is not set
# CONFIG_BCM_KONA_USB2_PHY is not set
# end of PHY drivers for Broadcom platforms

# CONFIG_PHY_HI6220_USB is not set
# CONFIG_PHY_HI3660_USB is not set
# CONFIG_PHY_HI3670_USB is not set
# CONFIG_PHY_HI3670_PCIE is not set
# CONFIG_PHY_HISTB_COMBPHY is not set
# CONFIG_PHY_HISI_INNO_USB2 is not set
# CONFIG_PHY_PXA_28NM_HSIC is not set
# CONFIG_PHY_PXA_28NM_USB2 is not set
# CONFIG_PHY_PXA_USB is not set
# CONFIG_PHY_MMP3_USB is not set
# CONFIG_PHY_MMP3_HSIC is not set
# CONFIG_PHY_MT7621_PCI is not set
# CONFIG_PHY_RALINK_USB is not set
# CONFIG_PHY_RCAR_GEN3_USB3 is not set
# CONFIG_PHY_ROCKCHIP_DPHY_RX0 is not set
# CONFIG_PHY_ROCKCHIP_PCIE is not set
# CONFIG_PHY_ROCKCHIP_SNPS_PCIE3 is not set
# CONFIG_PHY_EXYNOS_MIPI_VIDEO is not set
# CONFIG_PHY_SAMSUNG_USB2 is not set
# CONFIG_PHY_ST_SPEAR1310_MIPHY is not set
# CONFIG_PHY_ST_SPEAR1340_MIPHY is not set
# CONFIG_PHY_TEGRA194_P2U is not set
# CONFIG_PHY_DA8XX_USB is not set
# CONFIG_OMAP_CONTROL_PHY is not set
# CONFIG_TI_PIPE3 is not set
# CONFIG_PHY_INTEL_KEEMBAY_EMMC is not set
# CONFIG_PHY_INTEL_KEEMBAY_USB is not set
# CONFIG_PHY_INTEL_LGM_EMMC is not set
# CONFIG_PHY_XILINX_ZYNQMP is not set
# end of PHY Subsystem

# CONFIG_POWERCAP is not set
# CONFIG_MCB is not set

#
# Performance monitor support
#
# CONFIG_ARM_CCN is not set
# CONFIG_ARM_CMN is not set
# CONFIG_FSL_IMX8_DDR_PMU is not set
# CONFIG_XGENE_PMU is not set
# CONFIG_ARM_DMC620_PMU is not set
# CONFIG_MARVELL_CN10K_TAD_PMU is not set
# CONFIG_ALIBABA_UNCORE_DRW_PMU is not set
# CONFIG_MARVELL_CN10K_DDR_PMU is not set
# end of Performance monitor support

# CONFIG_RAS is not set

#
# Android
#
# CONFIG_ANDROID_BINDER_IPC is not set
# end of Android

# CONFIG_DAX is not set
# CONFIG_NVMEM is not set

#
# HW tracing support
#
# CONFIG_STM is not set
# CONFIG_INTEL_TH is not set
# end of HW tracing support

# CONFIG_FPGA is not set
# CONFIG_TEE is not set
# CONFIG_SIOX is not set
# CONFIG_SLIMBUS is not set
# CONFIG_INTERCONNECT is not set
# CONFIG_COUNTER is not set
# CONFIG_PECI is not set
# CONFIG_HTE is not set
# end of Device Drivers

#
# File systems
#
CONFIG_DCACHE_WORD_ACCESS=y
# CONFIG_VALIDATE_FS_PARSER is not set
# CONFIG_EXT2_FS is not set
# CONFIG_EXT3_FS is not set
# CONFIG_EXT4_FS is not set
# CONFIG_REISERFS_FS is not set
# CONFIG_JFS_FS is not set
# CONFIG_XFS_FS is not set
# CONFIG_GFS2_FS is not set
# CONFIG_BTRFS_FS is not set
# CONFIG_NILFS2_FS is not set
# CONFIG_F2FS_FS is not set
CONFIG_EXPORTFS=y
# CONFIG_EXPORTFS_BLOCK_OPS is not set
CONFIG_FILE_LOCKING=y
# CONFIG_FS_ENCRYPTION is not set
# CONFIG_FS_VERITY is not set
# CONFIG_DNOTIFY is not set
# CONFIG_INOTIFY_USER is not set
# CONFIG_FANOTIFY is not set
# CONFIG_QUOTA is not set
# CONFIG_AUTOFS4_FS is not set
# CONFIG_AUTOFS_FS is not set
# CONFIG_FUSE_FS is not set
# CONFIG_OVERLAY_FS is not set

#
# Caches
#
# CONFIG_FSCACHE is not set
# end of Caches

#
# CD-ROM/DVD Filesystems
#
# CONFIG_ISO9660_FS is not set
# CONFIG_UDF_FS is not set
# end of CD-ROM/DVD Filesystems

#
# DOS/FAT/EXFAT/NT Filesystems
#
# CONFIG_MSDOS_FS is not set
# CONFIG_VFAT_FS is not set
# CONFIG_EXFAT_FS is not set
# CONFIG_NTFS_FS is not set
# CONFIG_NTFS3_FS is not set
# end of DOS/FAT/EXFAT/NT Filesystems

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
# CONFIG_PROC_KCORE is not set
CONFIG_PROC_SYSCTL=y
CONFIG_PROC_PAGE_MONITOR=y
# CONFIG_PROC_CHILDREN is not set
CONFIG_PROC_PID_ARCH_STATUS=y
CONFIG_KERNFS=y
CONFIG_SYSFS=y
# CONFIG_TMPFS is not set
# CONFIG_HUGETLBFS is not set
CONFIG_ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP=y
CONFIG_ARCH_HAS_GIGANTIC_PAGE=y
# CONFIG_CONFIGFS_FS is not set
# end of Pseudo filesystems

# CONFIG_MISC_FILESYSTEMS is not set
# CONFIG_NLS is not set
# CONFIG_UNICODE is not set
CONFIG_IO_WQ=y
# end of File systems

#
# Security options
#
# CONFIG_KEYS is not set
# CONFIG_SECURITY_DMESG_RESTRICT is not set
# CONFIG_SECURITY is not set
# CONFIG_SECURITYFS is not set
CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR=y
# CONFIG_HARDENED_USERCOPY is not set
# CONFIG_FORTIFY_SOURCE is not set
# CONFIG_STATIC_USERMODEHELPER is not set
CONFIG_DEFAULT_SECURITY_DAC=y
CONFIG_LSM="landlock,lockdown,yama,loadpin,safesetid,integrity,bpf"

#
# Kernel hardening options
#

#
# Memory initialization
#
CONFIG_INIT_STACK_NONE=y
# CONFIG_INIT_ON_ALLOC_DEFAULT_ON is not set
# CONFIG_INIT_ON_FREE_DEFAULT_ON is not set
CONFIG_CC_HAS_ZERO_CALL_USED_REGS=y
# CONFIG_ZERO_CALL_USED_REGS is not set
# end of Memory initialization

CONFIG_RANDSTRUCT_NONE=y
# end of Kernel hardening options
# end of Security options

# CONFIG_CRYPTO is not set

#
# Library routines
#
# CONFIG_PACKING is not set
CONFIG_BITREVERSE=y
CONFIG_GENERIC_STRNCPY_FROM_USER=y
CONFIG_GENERIC_STRNLEN_USER=y
# CONFIG_CORDIC is not set
# CONFIG_PRIME_NUMBERS is not set
CONFIG_GENERIC_PCI_IOMAP=y
CONFIG_GENERIC_IOMAP=y
CONFIG_ARCH_USE_CMPXCHG_LOCKREF=y
CONFIG_ARCH_HAS_FAST_MULTIPLIER=y
CONFIG_ARCH_USE_SYM_ANNOTATIONS=y

#
# Crypto library routines
#
CONFIG_CRYPTO_LIB_BLAKE2S_GENERIC=y
# CONFIG_CRYPTO_LIB_CHACHA is not set
# CONFIG_CRYPTO_LIB_CURVE25519 is not set
CONFIG_CRYPTO_LIB_POLY1305_RSIZE=11
# CONFIG_CRYPTO_LIB_POLY1305 is not set
# end of Crypto library routines

# CONFIG_CRC_CCITT is not set
# CONFIG_CRC16 is not set
# CONFIG_CRC_T10DIF is not set
# CONFIG_CRC64_ROCKSOFT is not set
# CONFIG_CRC_ITU_T is not set
CONFIG_CRC32=y
# CONFIG_CRC32_SELFTEST is not set
CONFIG_CRC32_SLICEBY8=y
# CONFIG_CRC32_SLICEBY4 is not set
# CONFIG_CRC32_SARWATE is not set
# CONFIG_CRC32_BIT is not set
# CONFIG_CRC64 is not set
# CONFIG_CRC4 is not set
# CONFIG_CRC7 is not set
# CONFIG_LIBCRC32C is not set
# CONFIG_CRC8 is not set
# CONFIG_RANDOM32_SELFTEST is not set
# CONFIG_XZ_DEC is not set
CONFIG_HAS_IOMEM=y
CONFIG_HAS_IOPORT_MAP=y
CONFIG_HAS_DMA=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_ARCH_DMA_ADDR_T_64BIT=y
CONFIG_SWIOTLB=y
# CONFIG_DMA_API_DEBUG is not set
# CONFIG_IRQ_POLL is not set
CONFIG_HAVE_GENERIC_VDSO=y
CONFIG_GENERIC_GETTIMEOFDAY=y
CONFIG_GENERIC_VDSO_TIME_NS=y
CONFIG_ARCH_HAS_PMEM_API=y
CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE=y
CONFIG_ARCH_HAS_COPY_MC=y
CONFIG_ARCH_STACKWALK=y
CONFIG_STACKDEPOT=y
CONFIG_SBITMAP=y
# CONFIG_PARMAN is not set
# CONFIG_OBJAGG is not set
# end of Library routines

#
# Kernel hacking
#

#
# printk and dmesg options
#
# CONFIG_PRINTK_TIME is not set
# CONFIG_PRINTK_CALLER is not set
# CONFIG_STACKTRACE_BUILD_ID is not set
CONFIG_CONSOLE_LOGLEVEL_DEFAULT=7
CONFIG_CONSOLE_LOGLEVEL_QUIET=4
CONFIG_MESSAGE_LOGLEVEL_DEFAULT=4
# CONFIG_DYNAMIC_DEBUG is not set
# CONFIG_DYNAMIC_DEBUG_CORE is not set
# CONFIG_SYMBOLIC_ERRNAME is not set
CONFIG_DEBUG_BUGVERBOSE=y
# end of printk and dmesg options

# CONFIG_DEBUG_KERNEL is not set

#
# Compile-time checks and compiler options
#
CONFIG_AS_HAS_NON_CONST_LEB128=y
CONFIG_FRAME_WARN=2048
# CONFIG_STRIP_ASM_SYMS is not set
# CONFIG_HEADERS_INSTALL is not set
CONFIG_DEBUG_SECTION_MISMATCH=y
CONFIG_SECTION_MISMATCH_WARN_ONLY=y
CONFIG_OBJTOOL=y
# end of Compile-time checks and compiler options

#
# Generic Kernel Debugging Instruments
#
# CONFIG_MAGIC_SYSRQ is not set
# CONFIG_DEBUG_FS is not set
CONFIG_HAVE_ARCH_KGDB=y
CONFIG_ARCH_HAS_UBSAN_SANITIZE_ALL=y
# CONFIG_UBSAN is not set
CONFIG_HAVE_ARCH_KCSAN=y
CONFIG_HAVE_KCSAN_COMPILER=y
# end of Generic Kernel Debugging Instruments

#
# Networking Debugging
#
# end of Networking Debugging

#
# Memory Debugging
#
# CONFIG_PAGE_EXTENSION is not set
CONFIG_SLUB_DEBUG=y
# CONFIG_SLUB_DEBUG_ON is not set
# CONFIG_PAGE_TABLE_CHECK is not set
# CONFIG_PAGE_POISONING is not set
# CONFIG_DEBUG_RODATA_TEST is not set
CONFIG_ARCH_HAS_DEBUG_WX=y
# CONFIG_DEBUG_WX is not set
CONFIG_GENERIC_PTDUMP=y
CONFIG_HAVE_DEBUG_KMEMLEAK=y
CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE=y
# CONFIG_DEBUG_VM_PGTABLE is not set
CONFIG_ARCH_HAS_DEBUG_VIRTUAL=y
CONFIG_DEBUG_MEMORY_INIT=y
CONFIG_ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP=y
CONFIG_HAVE_ARCH_KASAN=y
CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
CONFIG_CC_HAS_KASAN_GENERIC=y
CONFIG_CC_HAS_WORKING_NOSANITIZE_ADDRESS=y
# CONFIG_KASAN is not set
CONFIG_HAVE_ARCH_KFENCE=y
# CONFIG_KFENCE is not set
CONFIG_HAVE_ARCH_KMSAN=y
# end of Memory Debugging

#
# Debug Oops, Lockups and Hangs
#
# CONFIG_PANIC_ON_OOPS is not set
CONFIG_PANIC_ON_OOPS_VALUE=0
CONFIG_PANIC_TIMEOUT=0
CONFIG_HARDLOCKUP_CHECK_TIMESTAMP=y
# end of Debug Oops, Lockups and Hangs

#
# Scheduler Debugging
#
# end of Scheduler Debugging

# CONFIG_DEBUG_TIMEKEEPING is not set

#
# Lock Debugging (spinlocks, mutexes, etc...)
#
CONFIG_LOCK_DEBUGGING_SUPPORT=y
# CONFIG_WW_MUTEX_SELFTEST is not set
# end of Lock Debugging (spinlocks, mutexes, etc...)

# CONFIG_DEBUG_IRQFLAGS is not set
CONFIG_STACKTRACE=y
# CONFIG_WARN_ALL_UNSEEDED_RANDOM is not set

#
# Debug kernel data structures
#
# CONFIG_BUG_ON_DATA_CORRUPTION is not set
# end of Debug kernel data structures

#
# RCU Debugging
#
# end of RCU Debugging

CONFIG_USER_STACKTRACE_SUPPORT=y
CONFIG_HAVE_RETHOOK=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=y
CONFIG_HAVE_DYNAMIC_FTRACE_NO_PATCHABLE=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
CONFIG_HAVE_FENTRY=y
CONFIG_HAVE_OBJTOOL_MCOUNT=y
CONFIG_HAVE_C_RECORDMCOUNT=y
CONFIG_HAVE_BUILDTIME_MCOUNT_SORT=y
CONFIG_TRACING_SUPPORT=y
# CONFIG_FTRACE is not set
# CONFIG_SAMPLES is not set
CONFIG_HAVE_SAMPLE_FTRACE_DIRECT=y
CONFIG_HAVE_SAMPLE_FTRACE_DIRECT_MULTI=y
CONFIG_ARCH_HAS_DEVMEM_IS_ALLOWED=y

#
# x86 Debugging
#
# CONFIG_X86_VERBOSE_BOOTUP is not set
CONFIG_EARLY_PRINTK=y
CONFIG_HAVE_MMIOTRACE_SUPPORT=y
CONFIG_IO_DELAY_0X80=y
# CONFIG_IO_DELAY_0XED is not set
# CONFIG_IO_DELAY_UDELAY is not set
# CONFIG_IO_DELAY_NONE is not set
CONFIG_UNWINDER_ORC=y
# CONFIG_UNWINDER_FRAME_POINTER is not set
# end of x86 Debugging

#
# Kernel Testing and Coverage
#
# CONFIG_KUNIT is not set
CONFIG_ARCH_HAS_KCOV=y
CONFIG_CC_HAS_SANCOV_TRACE_PC=y
# CONFIG_KCOV is not set
# CONFIG_RUNTIME_TESTING_MENU is not set
CONFIG_ARCH_USE_MEMTEST=y
# CONFIG_MEMTEST is not set
# end of Kernel Testing and Coverage

#
# Rust hacking
#
# end of Rust hacking

CONFIG_WARN_MISSING_DOCUMENTS=y
CONFIG_WARN_ABI_ERRORS=y
# end of Kernel hacking

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v7 1/4] KVM: Implement dirty quota-based throttling of vcpus
  2022-11-22 17:46       ` Marc Zyngier
@ 2022-12-06  6:22         ` Shivam Kumar
  2022-12-07 16:44           ` Marc Zyngier
  0 siblings, 1 reply; 38+ messages in thread
From: Shivam Kumar @ 2022-12-06  6:22 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: pbonzini, seanjc, james.morse, borntraeger, david, kvm,
	Shaju Abraham, Manish Mishra, Anurag Madnawat



On 22/11/22 11:16 pm, Marc Zyngier wrote:
> On Fri, 18 Nov 2022 09:47:50 +0000,
> Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
>>
>>
>>
>> On 18/11/22 12:56 am, Marc Zyngier wrote:
>>> On Sun, 13 Nov 2022 17:05:06 +0000,
>>> Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
>>>>
>>>> +    count: the current count of pages dirtied by the VCPU, can be
>>>> +    skewed based on the size of the pages accessed by each vCPU.
>>>
>>> How can userspace make a decision on the amount of dirtying this
>>> represent if this doesn't represent a number of base pages? Or are you
>>> saying that this only counts the number of permission faults that have
>>> dirtied pages?
>>
>> Yes, this only counts the number of permission faults that have
>> dirtied pages.
> 
> So how can userspace consistently set a quota of dirtied memory? This
> has to account for the size that has been faulted, because that's all
> userspace can reason about. Remember that at least on arm64, we're
> dealing with 3 different base page sizes, and many more large page
> sizes.

I understand that this helps only when the majority of dirtying is 
happening at the same page size. In our use case (VM live migration), 
even a large page is broken into 4k pages on the first dirty access. If 
required in the future, we can add individual counters for different 
page sizes.

Thanks for pointing this out.

>>>> +    quota: the observed dirty quota just before the exit to
>>>> userspace.
>>>
>>> You are defining the quota in terms of quota. -ENOCLUE.
>>
>> I am defining the "quota" member of the dirty_quota_exit struct in
>> terms of "dirty quota" which is already defined in the commit
>> message.
> 
> Which nobody will see. This is supposed to be a self contained
> documentation.
Ack. Thanks.

>>>> +The userspace can design a strategy to allocate the overall scope of dirtying
>>>> +for the VM among the vcpus. Based on the strategy and the current state of dirty
>>>> +quota throttling, the userspace can make a decision to either update (increase)
>>>> +the quota or to put the VCPU to sleep for some time.
>>>
>>> This looks like something out of 1984 (Newspeak anyone)? Can't you
>>> just say that userspace is responsible for allocating the quota and
>>> manage the resulting throttling effect?
>>
>> We didn't intend to sound like the Party or the Big Brother. We
>> started working on the linux and QEMU patches at the same time and got
>> tempted into exposing the details of how we were using this feature in
>> QEMU for throttling. I can get rid of the details if it helps.
> 
> I think the details are meaningless, and this should stick to the API,
> not the way the API could be used.

Ack. Thanks.

>>>> +	/*
>>>> +	 * Number of pages the vCPU is allowed to have dirtied over its entire
>>>> +	 * lifetime.  KVM_RUN exits with KVM_EXIT_DIRTY_QUOTA_EXHAUSTED if the quota
>>>> +	 * is reached/exceeded.
>>>> +	 */
>>>> +	__u64 dirty_quota;
>>>
>>> How are dirty_quota and dirty_quota_exit.quota related?
>>>
>>
>> dirty_quota_exit.quota is the dirty quota at the time of the exit. We
>> are capturing it for userspace's reference because dirty quota can be
>> updated anytime.
> 
> Shouldn't that be described here?

Ack. Thanks.

>>>> +#endif
>>>
>>> If you introduce additional #ifdefery here, why are the additional
>>> fields in the vcpu structure unconditional?
>>
>> pages_dirtied can be a useful information even if dirty quota
>> throttling is not used. So, I kept it unconditional based on
>> feedback.
> 
> Useful for whom? This creates an ABI for all architectures, and this
> needs buy-in from everyone. Personally, I think it is a pretty useless
> stat.

When we started this patch series, it was a member of the kvm_run 
struct. I made this a stat based on the feedback I received from the 
reviews. If you think otherwise, I can move it back to where it was.

Thanks.

> And while we're talking about pages_dirtied, I really dislike the
> WARN_ON in mark_page_dirty_in_slot(). A counter has rolled over?
> Shock, horror...

Ack. I'll give it a thought but if you have any specific suggestion on 
how I can make it better, kindly let me know. Thanks.

>>
>> CC: Sean
>>
>> I can add #ifdefery in the vcpu run struct for dirty_quota.
>>
>>>>    		else
>>>>    			set_bit_le(rel_gfn, memslot->dirty_bitmap);
>>>> +
>>>> +		if (kvm_vcpu_is_dirty_quota_exhausted(vcpu))
>>>> +			kvm_make_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu);
>>>
>>> This is broken in the light of new dirty-tracking code queued for
>>> 6.2. Specifically, you absolutely can end-up here *without* a vcpu on
>>> arm64. You just have to snapshot the ITS state to observe the fireworks.
>>
>> Could you please point me to the patchset which is in queue?
> 
> The patches are in -next, and you can look at the branch here[1].
> Please also refer to the discussion on the list, as a lot of what was
> discussed there does apply here.
> 
> Thanks,
> 
> 	M.
> 
> [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_maz_arm-2Dplatforms.git_log_-3Fh-3Dkvm-2Darm64_dirty-2Dring&d=DwIBAg&c=s883GpUCOChKOHiocYtGcg&r=4hVFP4-J13xyn-OcN0apTCh8iKZRosf5OJTQePXBMB8&m=gyAAYSO2lIMhffCjshL9ZiA15_isV4kNauvn7aKEDy-kwpVrGNLlmO9AF6ilCsI1&s=x8gk31QIy9KHImR3z1xJOs9bSpKw1WYC_d1W-Vj5eTM&e=
> 

Thank you so much for the information. I went through the patches and I 
think an additional NULL check in the "if" condition will eliminate any 
possible issue:

if (vcpu && kvm_vcpu_is_dirty_quota_exhausted(vcpu))
	kvm_make_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu);

Happy to know your thoughts.


Thank you so much Marc for the review.


Thanks,
Shivam

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v7 1/4] KVM: Implement dirty quota-based throttling of vcpus
  2022-12-06  6:22         ` Shivam Kumar
@ 2022-12-07 16:44           ` Marc Zyngier
  2022-12-07 19:53             ` Sean Christopherson
  2022-12-08  7:20             ` Shivam Kumar
  0 siblings, 2 replies; 38+ messages in thread
From: Marc Zyngier @ 2022-12-07 16:44 UTC (permalink / raw)
  To: Shivam Kumar
  Cc: pbonzini, seanjc, james.morse, borntraeger, david, kvm,
	Shaju Abraham, Manish Mishra, Anurag Madnawat

On Tue, 06 Dec 2022 06:22:45 +0000,
Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
> 
> 
> 
> On 22/11/22 11:16 pm, Marc Zyngier wrote:
> > On Fri, 18 Nov 2022 09:47:50 +0000,
> > Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
> >> 
> >> 
> >> 
> >> On 18/11/22 12:56 am, Marc Zyngier wrote:
> >>> On Sun, 13 Nov 2022 17:05:06 +0000,
> >>> Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
> >>>> 
> >>>> +    count: the current count of pages dirtied by the VCPU, can be
> >>>> +    skewed based on the size of the pages accessed by each vCPU.
> >>> 
> >>> How can userspace make a decision on the amount of dirtying this
> >>> represent if this doesn't represent a number of base pages? Or are you
> >>> saying that this only counts the number of permission faults that have
> >>> dirtied pages?
> >> 
> >> Yes, this only counts the number of permission faults that have
> >> dirtied pages.
> > 
> > So how can userspace consistently set a quota of dirtied memory? This
> > has to account for the size that has been faulted, because that's all
> > userspace can reason about. Remember that at least on arm64, we're
> > dealing with 3 different base page sizes, and many more large page
> > sizes.
> 
> I understand that this helps only when the majority of dirtying is
> happening for the same page size. In our use case (VM live migration),
> even large page is broken into 4k pages at first dirty. If required in
> future, we can add individual counters for different page sizes.
> 
> Thanks for pointing this out.

Adding counters for different page sizes won't help. It will only make
the API more complex and harder to reason about. arm64 has 3 base page
sizes, up to 5 levels of translation, the ability to have block
mappings at most levels, plus nice features such as contiguous hints
that we treat as another block size. If you're lucky, that's about a
dozen different sizes. Good luck with that.

You really need to move away from your particular, 4kB centric, live
migration use case. This is about throttling a vcpu based on how much
memory it dirties. Not about the number of page faults it takes.

You need to define the granularity of the counter, and account for
each fault according to its mapping size. If an architecture has 16kB
as the base page size, a 32MB fault (the size of the smallest block
mapping) must bump the counter by 2048. That's the only way userspace
can figure out what is going on.
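Marc's accounting rule can be sketched as follows (the helper name is
illustrative, not code from the series): charge each fault with the
number of base pages its mapping covers, so the counter stays
meaningful across mapping sizes.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of size-aware dirty accounting, assuming the mapping size of
 * the faulting translation is known at the point where the counter is
 * bumped. A 32MB block fault on a 16kB base-page configuration then
 * charges 2048 base pages instead of 1.
 */
static uint64_t dirty_pages_for_fault(uint64_t mapping_size,
				      uint64_t base_page_size)
{
	return mapping_size / base_page_size;
}
```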

Without that, you may as well add a random number to the counter, it
won't be any worse.

[...]

> >>> If you introduce additional #ifdefery here, why are the additional
> >>> fields in the vcpu structure unconditional?
> >> 
> >> pages_dirtied can be a useful information even if dirty quota
> >> throttling is not used. So, I kept it unconditional based on
> >> feedback.
> > 
> > Useful for whom? This creates an ABI for all architectures, and this
> > needs buy-in from everyone. Personally, I think it is a pretty useless
> > stat.
> 
> When we started this patch series, it was a member of the kvm_run
> struct. I made this a stat based on the feedback I received from the
> reviews. If you think otherwise, I can move it back to where it was.

I'm certainly totally opposed to stats that don't have a clear use
case. People keep piling random stats that satisfy their pet usage,
and this only bloats the various structures for no overall benefit
other than "hey, it might be useful". This is death by a thousand cuts.

> > And while we're talking about pages_dirtied, I really dislike the
> > WARN_ON in mark_page_dirty_in_slot(). A counter has rolled over?
> > Shock, horror...
> 
> Ack. I'll give it a thought but if you have any specific suggestion on
> how I can make it better, kindly let me know. Thanks.

What is the effect of counter overflowing? Why is it important to
warn? What goes wrong? What could be changed to *avoid* this being an
issue?

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v7 1/4] KVM: Implement dirty quota-based throttling of vcpus
  2022-12-07 16:44           ` Marc Zyngier
@ 2022-12-07 19:53             ` Sean Christopherson
  2022-12-08  7:30               ` Shivam Kumar
  2022-12-08  7:20             ` Shivam Kumar
  1 sibling, 1 reply; 38+ messages in thread
From: Sean Christopherson @ 2022-12-07 19:53 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Shivam Kumar, pbonzini, james.morse, borntraeger, david, kvm,
	Shaju Abraham, Manish Mishra, Anurag Madnawat

On Wed, Dec 07, 2022, Marc Zyngier wrote:
> On Tue, 06 Dec 2022 06:22:45 +0000,
> Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
> You need to define the granularity of the counter, and account for
> each fault according to its mapping size. If an architecture has 16kB
> as the base page size, a 32MB fault (the size of the smallest block
> mapping) must bump the counter by 2048. That's the only way userspace
> can figure out what is going on.

I don't think that's true for the dirty logging case.  IIUC, when a memslot is
being dirty logged, KVM forces the memory to be mapped with PAGE_SIZE granularity,
and that base PAGE_SIZE is fixed and known to userspace.  I.e. accuracy is naturally
provided for this primary use case where accuracy really matters, and so this is
effectively a documentation issue and not a functional issue.
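Under that assumption (dirty logging forcing base-PAGE_SIZE mappings),
userspace can translate the fault count directly into dirtied memory; a
minimal sketch with illustrative names, not code from the series:

```c
#include <assert.h>
#include <stdint.h>

/*
 * With dirty logging active, every write fault dirties exactly one
 * base page, so fault_count * page_size is the amount of memory
 * dirtied since the counter was last observed.
 */
static uint64_t dirtied_bytes(uint64_t fault_count, uint64_t page_size)
{
	return fault_count * page_size;
}
```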

> Without that, you may as well add a random number to the counter, it
> won't be any worse.

The stat will be wildly inaccurate when dirty logging isn't enabled, but that doesn't
necessarily make the stat useless, e.g. it might be useful as a very rough gauge
of which vCPUs are likely to be writing memory.  I do agree though that the value
provided is questionable and/or highly speculative.

> [...]
> 
> > >>> If you introduce additional #ifdefery here, why are the additional
> > >>> fields in the vcpu structure unconditional?
> > >> 
> > >> pages_dirtied can be a useful information even if dirty quota
> > >> throttling is not used. So, I kept it unconditional based on
> > >> feedback.
> > > 
> > > Useful for whom? This creates an ABI for all architectures, and this
> > > needs buy-in from everyone. Personally, I think it is a pretty useless
> > > stat.
> > 
> > When we started this patch series, it was a member of the kvm_run
> > struct. I made this a stat based on the feedback I received from the
> > reviews. If you think otherwise, I can move it back to where it was.
> 
> I'm certainly totally opposed to stats that don't have a clear use
> case. People keep piling random stats that satisfy their pet usage,
> and this only bloats the various structures for no overall benefit
> other than "hey, it might be useful". This is death by a thousand cut.

I don't have a strong opinion on putting the counter into kvm_run as an "out"
field vs. making it a stat.  I originally suggested making it a stat because
KVM needs to capture the information somewhere, so why not make it a stat?  But
I am definitely much more cavalier when it comes to adding stats, so I've no
objection to dropping the stat side of things.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v7 1/4] KVM: Implement dirty quota-based throttling of vcpus
  2022-12-07 16:44           ` Marc Zyngier
  2022-12-07 19:53             ` Sean Christopherson
@ 2022-12-08  7:20             ` Shivam Kumar
  1 sibling, 0 replies; 38+ messages in thread
From: Shivam Kumar @ 2022-12-08  7:20 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: pbonzini, seanjc, james.morse, borntraeger, david, kvm,
	Shaju Abraham, Manish Mishra, Anurag Madnawat

> I'm certainly totally opposed to stats that don't have a clear use
> case. People keep piling random stats that satisfy their pet usage,
> and this only bloats the various structures for no overall benefit
> other than "hey, it might be useful". This is death by a thousand cut.
> 
>>> And while we're talking about pages_dirtied, I really dislike the
>>> WARN_ON in mark_page_dirty_in_slot(). A counter has rolled over?
>>> Shock, horror...
>>
>> Ack. I'll give it a thought but if you have any specific suggestion on
>> how I can make it better, kindly let me know. Thanks.
> 
> What is the effect of counter overflowing? Why is it important to
> warn? What goes wrong? What could be changed to *avoid* this being an
> issue?
> 
> 	M.
> 

When dirty quota is not enabled, counter overflow has no harm as such. 
If dirty logging is enabled with dirty quota, two cases may arise:

i) While setting the dirty quota to count + new quota, the dirty quota 
itself overflows. If the userspace doesn't adjust the count accordingly, 
the count will still be a large value, so the vcpu will exit to 
userspace again and again until the count also overflows at some point 
(this is inevitable because the count keeps getting incremented after 
each write).

ii) The dirty quota is very close to the maximum value of a 64-bit 
unsigned integer. The dirty count can overflow in this case and the vcpu 
might never exit to userspace (with exit reason - dirty quota 
exhausted), which means no throttling happens. One possible way to 
resolve this is by exiting to userspace as soon as the count equals the 
dirty quota, i.e. not waiting for the dirty count to exceed the dirty 
quota. Though, this is difficult to achieve due to Intel's PML.

In both these cases, nothing catastrophic happens; it's just that the 
userspace's expectations are not met. However, we can have clear 
instructions in the documentation on how the userspace can avoid these 
issues altogether by resetting the count and quota values whenever they 
exceed a safe level (for now, 512 less than the maximum value of a 
64-bit unsigned integer seems a safe threshold for the dirty quota). To 
allow the userspace to do this, we need to provide it some way to reset 
the count. I am not sure how we can achieve this if the dirty count 
(i.e. pages dirtied) is a KVM stat. But if we can make it a member of 
kvm_run, it is fairly simple to do. So IMO, yes, the warn is useless 
here.
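The reset strategy described above can be sketched from the userspace
side (SAFE_MARGIN and the helper names are illustrative assumptions,
not part of any proposed KVM ABI): rebase both values before the next
count + new_quota update can wrap.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative safety margin below UINT64_MAX, as suggested above. */
#define SAFE_MARGIN 512ULL

/*
 * True once the quota is close enough to UINT64_MAX that a further
 * count + new_quota update could overflow.
 */
static int needs_rebase(uint64_t quota)
{
	return quota > UINT64_MAX - SAFE_MARGIN;
}

/*
 * Drop the common base: the remaining headroom is preserved while
 * userspace resets the count to 0.
 */
static uint64_t rebased_quota(uint64_t count, uint64_t quota)
{
	return quota - count;
}
```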

Happy to know your thoughts on this. Really grateful for the help so far.


Thanks,
Shivam

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v7 1/4] KVM: Implement dirty quota-based throttling of vcpus
  2022-12-07 19:53             ` Sean Christopherson
@ 2022-12-08  7:30               ` Shivam Kumar
  2022-12-25 16:50                 ` Shivam Kumar
  0 siblings, 1 reply; 38+ messages in thread
From: Shivam Kumar @ 2022-12-08  7:30 UTC (permalink / raw)
  To: Sean Christopherson, Marc Zyngier
  Cc: pbonzini, james.morse, borntraeger, david, kvm, Shaju Abraham,
	Manish Mishra, Anurag Madnawat



On 08/12/22 1:23 am, Sean Christopherson wrote:
> On Wed, Dec 07, 2022, Marc Zyngier wrote:
>> On Tue, 06 Dec 2022 06:22:45 +0000,
>> Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
>> You need to define the granularity of the counter, and account for
>> each fault according to its mapping size. If an architecture has 16kB
>> as the base page size, a 32MB fault (the size of the smallest block
>> mapping) must bump the counter by 2048. That's the only way userspace
>> can figure out what is going on.
> 
> I don't think that's true for the dirty logging case.  IIUC, when a memslot is
> being dirty logged, KVM forces the memory to be mapped with PAGE_SIZE granularity,
> and that base PAGE_SIZE is fixed and known to userspace.  I.e. accuracy is naturally
> provided for this primary use case where accuracy really matters, and so this is
> effectively a documentation issue and not a functional issue.

So, does defining "count" as "the number of write permission faults" 
help in addressing the documentation issue? My understanding too is that 
for dirty logging, we will have uniform granularity.

Thanks.

> 
>> Without that, you may as well add a random number to the counter, it
>> won't be any worse.
> 
> The stat will be wildly inaccurate when dirty logging isn't enabled, but that doesn't
> necessarily make the stat useless, e.g. it might be useful as a very rough guage
> of which vCPUs are likely to be writing memory.  I do agree though that the value
> provided is questionable and/or highly speculative.
> 
>> [...]
>>
>>>>>> If you introduce additional #ifdefery here, why are the additional
>>>>>> fields in the vcpu structure unconditional?
>>>>>
>>>>> pages_dirtied can be a useful information even if dirty quota
>>>>> throttling is not used. So, I kept it unconditional based on
>>>>> feedback.
>>>>
>>>> Useful for whom? This creates an ABI for all architectures, and this
>>>> needs buy-in from everyone. Personally, I think it is a pretty useless
>>>> stat.
>>>
>>> When we started this patch series, it was a member of the kvm_run
>>> struct. I made this a stat based on the feedback I received from the
>>> reviews. If you think otherwise, I can move it back to where it was.
>>
>> I'm certainly totally opposed to stats that don't have a clear use
>> case. People keep piling random stats that satisfy their pet usage,
>> and this only bloats the various structures for no overall benefit
>> other than "hey, it might be useful". This is death by a thousand cut.
> 
> I don't have a strong opinion on putting the counter into kvm_run as an "out"
> fields vs. making it a state.  I originally suggested making it a stat because
> KVM needs to capture the information somewhere, so why not make it a stat?  But
> I am definitely much more cavalier when it comes to adding stats, so I've no
> objection to dropping the stat side of things.

I'll be skeptical about making it a stat if we plan to allow the 
userspace to reset it at will.


Thank you so much for the comments.

Thanks,
Shivam

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v7 1/4] KVM: Implement dirty quota-based throttling of vcpus
  2022-12-08  7:30               ` Shivam Kumar
@ 2022-12-25 16:50                 ` Shivam Kumar
  2022-12-26 10:07                   ` Marc Zyngier
  0 siblings, 1 reply; 38+ messages in thread
From: Shivam Kumar @ 2022-12-25 16:50 UTC (permalink / raw)
  To: Sean Christopherson, Marc Zyngier
  Cc: pbonzini, james.morse, borntraeger, david, kvm, Shaju Abraham,
	Manish Mishra, Anurag Madnawat



On 08/12/22 1:00 pm, Shivam Kumar wrote:
> 
> 
> On 08/12/22 1:23 am, Sean Christopherson wrote:
>> On Wed, Dec 07, 2022, Marc Zyngier wrote:
>>> On Tue, 06 Dec 2022 06:22:45 +0000,
>>> Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
>>> You need to define the granularity of the counter, and account for
>>> each fault according to its mapping size. If an architecture has 16kB
>>> as the base page size, a 32MB fault (the size of the smallest block
>>> mapping) must bump the counter by 2048. That's the only way userspace
>>> can figure out what is going on.
>>
>> I don't think that's true for the dirty logging case.  IIUC, when a 
>> memslot is
>> being dirty logged, KVM forces the memory to be mapped with PAGE_SIZE 
>> granularity,
>> and that base PAGE_SIZE is fixed and known to userspace.  I.e. 
>> accuracy is naturally
>> provided for this primary use case where accuracy really matters, and 
>> so this is
>> effectively a documentation issue and not a functional issue.
> 
> So, does defining "count" as "the number of write permission faults" 
> help in addressing the documentation issue? My understanding too is that 
> for dirty logging, we will have uniform granularity.
> 
> Thanks.
> 
>>
>>> Without that, you may as well add a random number to the counter, it
>>> won't be any worse.
>>
>> The stat will be wildly inaccurate when dirty logging isn't enabled, 
>> but that doesn't
>> necessarily make the stat useless, e.g. it might be useful as a very 
>> rough guage
>> of which vCPUs are likely to be writing memory.  I do agree though 
>> that the value
>> provided is questionable and/or highly speculative.
>>
>>> [...]
>>>
>>>>>>> If you introduce additional #ifdefery here, why are the additional
>>>>>>> fields in the vcpu structure unconditional?
>>>>>>
>>>>>> pages_dirtied can be a useful information even if dirty quota
>>>>>> throttling is not used. So, I kept it unconditional based on
>>>>>> feedback.
>>>>>
>>>>> Useful for whom? This creates an ABI for all architectures, and this
>>>>> needs buy-in from everyone. Personally, I think it is a pretty useless
>>>>> stat.
>>>>
>>>> When we started this patch series, it was a member of the kvm_run
>>>> struct. I made this a stat based on the feedback I received from the
>>>> reviews. If you think otherwise, I can move it back to where it was.
>>>
>>> I'm certainly totally opposed to stats that don't have a clear use
>>> case. People keep piling random stats that satisfy their pet usage,
>>> and this only bloats the various structures for no overall benefit
>>> other than "hey, it might be useful". This is death by a thousand cut.
>>
>> I don't have a strong opinion on putting the counter into kvm_run as 
>> an "out"
>> fields vs. making it a state.  I originally suggested making it a stat 
>> because
>> KVM needs to capture the information somewhere, so why not make it a 
>> stat?  But
>> I am definitely much more cavalier when it comes to adding stats, so 
>> I've no
>> objection to dropping the stat side of things.
> 
> I'll be skeptical about making it a stat if we plan to allow the 
> userspace to reset it at will.
> 
> 
> Thank you so much for the comments.
> 
> Thanks,
> Shivam

Hi Marc,
Hi Sean,

Please let me know if there's any further question or feedback.

Thanks,
Shivam

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v7 1/4] KVM: Implement dirty quota-based throttling of vcpus
  2022-12-25 16:50                 ` Shivam Kumar
@ 2022-12-26 10:07                   ` Marc Zyngier
  2023-01-07 17:24                     ` Shivam Kumar
  0 siblings, 1 reply; 38+ messages in thread
From: Marc Zyngier @ 2022-12-26 10:07 UTC (permalink / raw)
  To: Shivam Kumar
  Cc: Sean Christopherson, pbonzini, james.morse, borntraeger, david,
	kvm, Shaju Abraham, Manish Mishra, Anurag Madnawat

On Sun, 25 Dec 2022 16:50:04 +0000,
Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
> 
> 
> 
> On 08/12/22 1:00 pm, Shivam Kumar wrote:
> > 
> > 
> > On 08/12/22 1:23 am, Sean Christopherson wrote:
> >> On Wed, Dec 07, 2022, Marc Zyngier wrote:
> >>> On Tue, 06 Dec 2022 06:22:45 +0000,
> >>> Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
> >>> You need to define the granularity of the counter, and account for
> >>> each fault according to its mapping size. If an architecture has 16kB
> >>> as the base page size, a 32MB fault (the size of the smallest block
> >>> mapping) must bump the counter by 2048. That's the only way userspace
> >>> can figure out what is going on.
> >> 
> >> I don't think that's true for the dirty logging case.  IIUC, when a
> >> memslot is
> >> being dirty logged, KVM forces the memory to be mapped with
> >> PAGE_SIZE granularity,
> >> and that base PAGE_SIZE is fixed and known to userspace. 
> >> I.e. accuracy is naturally
> >> provided for this primary use case where accuracy really matters,
> >> and so this is
> >> effectively a documentation issue and not a functional issue.
> > 
> > So, does defining "count" as "the number of write permission faults"
> > help in addressing the documentation issue? My understanding too is
> > that for dirty logging, we will have uniform granularity.
> > 
> > Thanks.
> > 
> >> 
> >>> Without that, you may as well add a random number to the counter, it
> >>> won't be any worse.
> >> 
> >> The stat will be wildly inaccurate when dirty logging isn't
> >> enabled, but that doesn't
> >> necessarily make the stat useless, e.g. it might be useful as a
> >> very rough guage
> >> of which vCPUs are likely to be writing memory.  I do agree though
> >> that the value
> >> provided is questionable and/or highly speculative.
> >> 
> >>> [...]
> >>> 
> >>>>>>> If you introduce additional #ifdefery here, why are the additional
> >>>>>>> fields in the vcpu structure unconditional?
> >>>>>> 
> >>>>>> pages_dirtied can be a useful information even if dirty quota
> >>>>>> throttling is not used. So, I kept it unconditional based on
> >>>>>> feedback.
> >>>>> 
> >>>>> Useful for whom? This creates an ABI for all architectures, and this
> >>>>> needs buy-in from everyone. Personally, I think it is a pretty useless
> >>>>> stat.
> >>>> 
> >>>> When we started this patch series, it was a member of the kvm_run
> >>>> struct. I made this a stat based on the feedback I received from the
> >>>> reviews. If you think otherwise, I can move it back to where it was.
> >>> 
> >>> I'm certainly totally opposed to stats that don't have a clear use
> >>> case. People keep piling random stats that satisfy their pet usage,
> >>> and this only bloats the various structures for no overall benefit
> >>> other than "hey, it might be useful". This is death by a thousand cut.
> >> 
> >> I don't have a strong opinion on putting the counter into kvm_run
> >> as an "out"
> >> fields vs. making it a state.  I originally suggested making it a
> >> stat because
> >> KVM needs to capture the information somewhere, so why not make it
> >> a stat?  But
> >> I am definitely much more cavalier when it comes to adding stats,
> >> so I've no
> >> objection to dropping the stat side of things.
> > 
> > I'll be skeptical about making it a stat if we plan to allow the
> > userspace to reset it at will.
> > 
> > 
> > Thank you so much for the comments.
> > 
> > Thanks,
> > Shivam
> 
> Hi Marc,
> Hi Sean,
> 
> Please let me know if there's any further question or feedback.

My earlier comments still stand: the proposed API is not usable as a
general purpose memory-tracking API because it counts faults instead
of memory, making it inadequate except for the most trivial cases.
And I cannot believe you were serious when you mentioned that you were
happy to make that the API.

This requires some serious work, and this series is not yet near a
state where it could be merged.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v7 1/4] KVM: Implement dirty quota-based throttling of vcpus
  2022-12-26 10:07                   ` Marc Zyngier
@ 2023-01-07 17:24                     ` Shivam Kumar
  2023-01-07 21:44                       ` Marc Zyngier
  0 siblings, 1 reply; 38+ messages in thread
From: Shivam Kumar @ 2023-01-07 17:24 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Sean Christopherson, pbonzini, james.morse, borntraeger, david,
	kvm, Shaju Abraham, Manish Mishra, Anurag Madnawat



On 26/12/22 3:37 pm, Marc Zyngier wrote:
> On Sun, 25 Dec 2022 16:50:04 +0000,
> Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
>>
>>
>>
>> On 08/12/22 1:00 pm, Shivam Kumar wrote:
>>>
>>>
>>> On 08/12/22 1:23 am, Sean Christopherson wrote:
>>>> On Wed, Dec 07, 2022, Marc Zyngier wrote:
>>>>> On Tue, 06 Dec 2022 06:22:45 +0000,
>>>>> Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
>>>>> You need to define the granularity of the counter, and account for
>>>>> each fault according to its mapping size. If an architecture has 16kB
>>>>> as the base page size, a 32MB fault (the size of the smallest block
>>>>> mapping) must bump the counter by 2048. That's the only way userspace
>>>>> can figure out what is going on.
>>>>
>>>> I don't think that's true for the dirty logging case.  IIUC, when a
>>>> memslot is
>>>> being dirty logged, KVM forces the memory to be mapped with
>>>> PAGE_SIZE granularity,
>>>> and that base PAGE_SIZE is fixed and known to userspace.
>>>> I.e. accuracy is naturally
>>>> provided for this primary use case where accuracy really matters,
>>>> and so this is
>>>> effectively a documentation issue and not a functional issue.
>>>
>>> So, does defining "count" as "the number of write permission faults"
>>> help in addressing the documentation issue? My understanding too is
>>> that for dirty logging, we will have uniform granularity.
>>>
>>> Thanks.
>>>
>>>>
>>>>> Without that, you may as well add a random number to the counter, it
>>>>> won't be any worse.
>>>>
>>>> The stat will be wildly inaccurate when dirty logging isn't
>>>> enabled, but that doesn't
>>>> necessarily make the stat useless, e.g. it might be useful as a
>>>> very rough guage
>>>> of which vCPUs are likely to be writing memory.  I do agree though
>>>> that the value
>>>> provided is questionable and/or highly speculative.
>>>>
>>>>> [...]
>>>>>
>>>>>>>>> If you introduce additional #ifdefery here, why are the additional
>>>>>>>>> fields in the vcpu structure unconditional?
>>>>>>>>
>>>>>>>> pages_dirtied can be a useful information even if dirty quota
>>>>>>>> throttling is not used. So, I kept it unconditional based on
>>>>>>>> feedback.
>>>>>>>
>>>>>>> Useful for whom? This creates an ABI for all architectures, and this
>>>>>>> needs buy-in from everyone. Personally, I think it is a pretty useless
>>>>>>> stat.
>>>>>>
>>>>>> When we started this patch series, it was a member of the kvm_run
>>>>>> struct. I made this a stat based on the feedback I received from the
>>>>>> reviews. If you think otherwise, I can move it back to where it was.
>>>>>
>>>>> I'm certainly totally opposed to stats that don't have a clear use
>>>>> case. People keep piling random stats that satisfy their pet usage,
>>>>> and this only bloats the various structures for no overall benefit
>>>>> other than "hey, it might be useful". This is death by a thousand cut.
>>>>
>>>> I don't have a strong opinion on putting the counter into kvm_run
>>>> as an "out"
>>>> fields vs. making it a state.  I originally suggested making it a
>>>> stat because
>>>> KVM needs to capture the information somewhere, so why not make it
>>>> a stat?  But
>>>> I am definitely much more cavalier when it comes to adding stats,
>>>> so I've no
>>>> objection to dropping the stat side of things.
>>>
>>> I'll be skeptical about making it a stat if we plan to allow the
>>> userspace to reset it at will.
>>>
>>>
>>> Thank you so much for the comments.
>>>
>>> Thanks,
>>> Shivam
>>
>> Hi Marc,
>> Hi Sean,
>>
>> Please let me know if there's any further question or feedback.
> 
> My earlier comments still stand: the proposed API is not usable as a
> general purpose memory-tracking API because it counts faults instead
> of memory, making it inadequate except for the most trivial cases.
> And I cannot believe you were serious when you mentioned that you were
> happy to make that the API.
> 
> This requires some serious work, and this series is not yet near a
> state where it could be merged.
> 
> Thanks,
> 
> 	M.
> 

Hi Marc,

IIUC, the dirty ring interface is also count-based: the dirty_index 
variable is incremented in the mark_page_dirty_in_slot function. At 
least on x86, I am aware that dirty tracking has uniform granularity, 
since huge pages (2MB pages) are broken into 4K pages and the bitmap is 
kept at 4K granularity. Please let me know whether it is possible to 
have multiple page sizes during dirty logging on ARM, and if so, how 
the bitmap is handled with different page sizes there.

I agree that the notion of pages dirtied according to our pages_dirtied 
variable depends on how the bitmap is handled, but we expect userspace 
to use the same granularity at which the dirty bitmap is handled. I can 
capture this in the documentation.


CC: Peter Xu

Thanks,
Shivam

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v7 1/4] KVM: Implement dirty quota-based throttling of vcpus
  2023-01-07 17:24                     ` Shivam Kumar
@ 2023-01-07 21:44                       ` Marc Zyngier
  2023-01-14 13:07                         ` Shivam Kumar
  0 siblings, 1 reply; 38+ messages in thread
From: Marc Zyngier @ 2023-01-07 21:44 UTC (permalink / raw)
  To: Shivam Kumar
  Cc: Sean Christopherson, pbonzini, james.morse, borntraeger, david,
	kvm, Shaju Abraham, Manish Mishra, Anurag Madnawat

On Sat, 07 Jan 2023 17:24:24 +0000,
Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
> On 26/12/22 3:37 pm, Marc Zyngier wrote:
> > On Sun, 25 Dec 2022 16:50:04 +0000,
> > Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
> >> 
> >> Hi Marc,
> >> Hi Sean,
> >> 
> >> Please let me know if there's any further question or feedback.
> > 
> > My earlier comments still stand: the proposed API is not usable as a
> > general purpose memory-tracking API because it counts faults instead
> > of memory, making it inadequate except for the most trivial cases.
> > And I cannot believe you were serious when you mentioned that you were
> > happy to make that the API.
> > 
> > This requires some serious work, and this series is not yet near a
> > state where it could be merged.
> > 
> > Thanks,
> > 
> > 	M.
> > 
> 
> Hi Marc,
> 
> IIUC, in the dirty ring interface too, the dirty_index variable is
> incremented in the mark_page_dirty_in_slot function and it is also
> count-based. At least on x86, I am aware that for dirty tracking we
> have uniform granularity as huge pages (2MB pages) too are broken into
> 4K pages and bitmap is at 4K-granularity. Please let me know if it is
> possible to have multiple page sizes even during dirty logging on
> ARM. And if that is the case, I am wondering how we handle the bitmap
> with different page sizes on ARM.

Easy. It *is* page-size, by the very definition of the API which
explicitly says that a single bit represents one basic page. If you
were to only break 1GB mappings into 2MB blocks, you'd have to mark
512 pages dirty at once, no question asked.

Your API is different because at no point does it imply any relationship
with any page size. As it stands, it is a useless API. I understand
that you are only concerned with your particular use case, but that's
nowhere near good enough. And it has nothing to do with ARM. This is
equally broken on *any* architecture.

> I agree that the notion of pages dirtied according to our
> pages_dirtied variable depends on how we are handling the bitmap but
> we expect the userspace to use the same granularity at which the dirty
> bitmap is handled. I can capture this in documentation

But what does the bitmap have to do with any of this? This is not what
your API is about. You are supposed to count dirtied memory, and you
are counting page faults instead. No sane userspace can make any sense
of that. You keep coupling the two, but that's wrong. This thing has
to be useful on its own, not just for your particular, super narrow
use case. And that's a shame because the general idea of a dirty quota
is an interesting one.

If your sole intention is to capture in the documentation that the API
is broken, then all I can do is to NAK the whole thing. Until you turn
this page-fault quota into the dirty memory quota that you advertise,
I'll continue to say no to it.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v7 1/4] KVM: Implement dirty quota-based throttling of vcpus
  2023-01-07 21:44                       ` Marc Zyngier
@ 2023-01-14 13:07                         ` Shivam Kumar
  2023-01-15  9:56                           ` Marc Zyngier
  0 siblings, 1 reply; 38+ messages in thread
From: Shivam Kumar @ 2023-01-14 13:07 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Sean Christopherson, pbonzini, james.morse, borntraeger, david,
	kvm, Shaju Abraham, Manish Mishra, Anurag Madnawat



On 08/01/23 3:14 am, Marc Zyngier wrote:
> On Sat, 07 Jan 2023 17:24:24 +0000,
> Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
>> On 26/12/22 3:37 pm, Marc Zyngier wrote:
>>> On Sun, 25 Dec 2022 16:50:04 +0000,
>>> Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
>>>>
>>>> Hi Marc,
>>>> Hi Sean,
>>>>
>>>> Please let me know if there's any further question or feedback.
>>>
>>> My earlier comments still stand: the proposed API is not usable as a
>>> general purpose memory-tracking API because it counts faults instead
>>> of memory, making it inadequate except for the most trivial cases.
>>> And I cannot believe you were serious when you mentioned that you were
>>> happy to make that the API.
>>>
>>> This requires some serious work, and this series is not yet near a
>>> state where it could be merged.
>>>
>>> Thanks,
>>>
>>> 	M.
>>>
>>
>> Hi Marc,
>>
>> IIUC, in the dirty ring interface too, the dirty_index variable is
>> incremented in the mark_page_dirty_in_slot function and it is also
>> count-based. At least on x86, I am aware that for dirty tracking we
>> have uniform granularity as huge pages (2MB pages) too are broken into
>> 4K pages and bitmap is at 4K-granularity. Please let me know if it is
>> possible to have multiple page sizes even during dirty logging on
>> ARM. And if that is the case, I am wondering how we handle the bitmap
>> with different page sizes on ARM.
> 
> Easy. It *is* page-size, by the very definition of the API which
> explicitly says that a single bit represent one basic page. If you
> were to only break 1GB mappings into 2MB blocks, you'd have to mask
> 512 pages dirty at once, no question asked.
> 
> Your API is different because at no point it implies any relationship
> with any page size. As it stands, it is a useless API. I understand
> that you are only concerned with your particular use case, but that's
> nowhere good enough. And it has nothing to do with ARM. This is
> equally broken on *any* architecture.
> 
>> I agree that the notion of pages dirtied according to our
>> pages_dirtied variable depends on how we are handling the bitmap but
>> we expect the userspace to use the same granularity at which the dirty
>> bitmap is handled. I can capture this in documentation
> 
> But what does the bitmap have to do with any of this? This is not what
> your API is about. You are supposed to count dirtied memory, and you
> are counting page faults instead. No sane userspace can make any sense
> of that. You keep coupling the two, but that's wrong. This thing has
> to be useful on its own, not just for your particular, super narrow
> use case. And that's a shame because the general idea of a dirty quota
> is an interesting one.
> 
> If your sole intention is to capture in the documentation that the API
> is broken, then all I can do is to NAK the whole thing. Until you turn
> this page-fault quota into the dirty memory quota that you advertise,
> I'll continue to say no to it.
> 
> Thanks,
> 
> 	M.
> 

Thank you Marc for the suggestion. We can make dirty quota count dirtied 
memory rather than faults.

run->dirty_quota -= page_size;

We can raise a kvm request for exiting to userspace as soon as the dirty 
quota of the vcpu becomes zero or negative. Please let me know if this 
looks good to you.
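A minimal sketch of that bookkeeping (illustrative only: the struct and
helper names below are stand-ins modelling the proposed behaviour, not
the actual KVM ABI):

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the dirty-quota fields a vcpu's kvm_run area would carry. */
struct vcpu_run_model {
	long dirty_quota;	/* remaining quota, in bytes; signed on purpose */
	bool exit_requested;	/* models a KVM request to exit to userspace */
};

/*
 * Charge one dirtied mapping against the quota; request a userspace
 * exit as soon as the quota reaches zero or goes negative.
 */
static void charge_dirty(struct vcpu_run_model *run, unsigned long bytes)
{
	run->dirty_quota -= (long)bytes;
	if (run->dirty_quota <= 0)
		run->exit_requested = true;
}
```

Keeping the quota signed lets a single large mapping overshoot the quota
while still recording exactly how far it overshot.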

Thanks,
Shivam

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v7 1/4] KVM: Implement dirty quota-based throttling of vcpus
  2023-01-14 13:07                         ` Shivam Kumar
@ 2023-01-15  9:56                           ` Marc Zyngier
  2023-01-15 14:50                             ` Shivam Kumar
  2023-01-29 22:00                             ` Shivam Kumar
  0 siblings, 2 replies; 38+ messages in thread
From: Marc Zyngier @ 2023-01-15  9:56 UTC (permalink / raw)
  To: Shivam Kumar
  Cc: Sean Christopherson, pbonzini, james.morse, borntraeger, david,
	kvm, Shaju Abraham, Manish Mishra, Anurag Madnawat

On Sat, 14 Jan 2023 13:07:44 +0000,
Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
> 
> 
> 
> On 08/01/23 3:14 am, Marc Zyngier wrote:
> > On Sat, 07 Jan 2023 17:24:24 +0000,
> > Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
> >> On 26/12/22 3:37 pm, Marc Zyngier wrote:
> >>> On Sun, 25 Dec 2022 16:50:04 +0000,
> >>> Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
> >>>> 
> >>>> Hi Marc,
> >>>> Hi Sean,
> >>>> 
> >>>> Please let me know if there's any further question or feedback.
> >>> 
> >>> My earlier comments still stand: the proposed API is not usable as a
> >>> general purpose memory-tracking API because it counts faults instead
> >>> of memory, making it inadequate except for the most trivial cases.
> >>> And I cannot believe you were serious when you mentioned that you were
> >>> happy to make that the API.
> >>> 
> >>> This requires some serious work, and this series is not yet near a
> >>> state where it could be merged.
> >>> 
> >>> Thanks,
> >>> 
> >>> 	M.
> >>> 
> >> 
> >> Hi Marc,
> >> 
> >> IIUC, in the dirty ring interface too, the dirty_index variable is
> >> incremented in the mark_page_dirty_in_slot function and it is also
> >> count-based. At least on x86, I am aware that for dirty tracking we
> >> have uniform granularity as huge pages (2MB pages) too are broken into
> >> 4K pages and bitmap is at 4K-granularity. Please let me know if it is
> >> possible to have multiple page sizes even during dirty logging on
> >> ARM. And if that is the case, I am wondering how we handle the bitmap
> >> with different page sizes on ARM.
> > 
> > Easy. It *is* page-size, by the very definition of the API which
> > explicitly says that a single bit represent one basic page. If you
> > were to only break 1GB mappings into 2MB blocks, you'd have to mask
> > 512 pages dirty at once, no question asked.
> > 
> > Your API is different because at no point it implies any relationship
> > with any page size. As it stands, it is a useless API. I understand
> > that you are only concerned with your particular use case, but that's
> > nowhere good enough. And it has nothing to do with ARM. This is
> > equally broken on *any* architecture.
> > 
> >> I agree that the notion of pages dirtied according to our
> >> pages_dirtied variable depends on how we are handling the bitmap but
> >> we expect the userspace to use the same granularity at which the dirty
> >> bitmap is handled. I can capture this in documentation
> > 
> > But what does the bitmap have to do with any of this? This is not what
> > your API is about. You are supposed to count dirtied memory, and you
> > are counting page faults instead. No sane userspace can make any sense
> > of that. You keep coupling the two, but that's wrong. This thing has
> > to be useful on its own, not just for your particular, super narrow
> > use case. And that's a shame because the general idea of a dirty quota
> > is an interesting one.
> > 
> > If your sole intention is to capture in the documentation that the API
> > is broken, then all I can do is to NAK the whole thing. Until you turn
> > this page-fault quota into the dirty memory quota that you advertise,
> > I'll continue to say no to it.
> > 
> > Thanks,
> > 
> > 	M.
> > 
> 
> Thank you Marc for the suggestion. We can make dirty quota count
> dirtied memory rather than faults.
> 
> run->dirty_quota -= page_size;
>
> We can raise a kvm request for exiting to userspace as soon as the
> dirty quota of the vcpu becomes zero or negative. Please let me know
> if this looks good to you.

It really depends what "page_size" represents here. If you mean
"mapping size", then yes. If you really mean "page size", then no.

Assuming this is indeed "mapping size", then it all depends on how
this is integrated and how this is managed in a generic, cross
architecture way.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v7 1/4] KVM: Implement dirty quota-based throttling of vcpus
  2023-01-15  9:56                           ` Marc Zyngier
@ 2023-01-15 14:50                             ` Shivam Kumar
  2023-01-15 19:13                               ` Marc Zyngier
  2023-01-29 22:00                             ` Shivam Kumar
  1 sibling, 1 reply; 38+ messages in thread
From: Shivam Kumar @ 2023-01-15 14:50 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Sean Christopherson, pbonzini, james.morse, borntraeger, david,
	kvm, Shaju Abraham, Manish Mishra, Anurag Madnawat



On 15/01/23 3:26 pm, Marc Zyngier wrote:
> On Sat, 14 Jan 2023 13:07:44 +0000,
> Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
>>
>>
>>
>> On 08/01/23 3:14 am, Marc Zyngier wrote:
>>> On Sat, 07 Jan 2023 17:24:24 +0000,
>>> Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
>>>> On 26/12/22 3:37 pm, Marc Zyngier wrote:
>>>>> On Sun, 25 Dec 2022 16:50:04 +0000,
>>>>> Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
>>>>>>
>>>>>> Hi Marc,
>>>>>> Hi Sean,
>>>>>>
>>>>>> Please let me know if there's any further question or feedback.
>>>>>
>>>>> My earlier comments still stand: the proposed API is not usable as a
>>>>> general purpose memory-tracking API because it counts faults instead
>>>>> of memory, making it inadequate except for the most trivial cases.
>>>>> And I cannot believe you were serious when you mentioned that you were
>>>>> happy to make that the API.
>>>>>
>>>>> This requires some serious work, and this series is not yet near a
>>>>> state where it could be merged.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> 	M.
>>>>>
>>>>
>>>> Hi Marc,
>>>>
>>>> IIUC, in the dirty ring interface too, the dirty_index variable is
>>>> incremented in the mark_page_dirty_in_slot function and it is also
>>>> count-based. At least on x86, I am aware that for dirty tracking we
>>>> have uniform granularity as huge pages (2MB pages) too are broken into
>>>> 4K pages and bitmap is at 4K-granularity. Please let me know if it is
>>>> possible to have multiple page sizes even during dirty logging on
>>>> ARM. And if that is the case, I am wondering how we handle the bitmap
>>>> with different page sizes on ARM.
>>>
>>> Easy. It *is* page-size, by the very definition of the API which
>>> explicitly says that a single bit represent one basic page. If you
>>> were to only break 1GB mappings into 2MB blocks, you'd have to mask
>>> 512 pages dirty at once, no question asked.
>>>
>>> Your API is different because at no point it implies any relationship
>>> with any page size. As it stands, it is a useless API. I understand
>>> that you are only concerned with your particular use case, but that's
>>> nowhere good enough. And it has nothing to do with ARM. This is
>>> equally broken on *any* architecture.
>>>
>>>> I agree that the notion of pages dirtied according to our
>>>> pages_dirtied variable depends on how we are handling the bitmap but
>>>> we expect the userspace to use the same granularity at which the dirty
>>>> bitmap is handled. I can capture this in documentation
>>>
>>> But what does the bitmap have to do with any of this? This is not what
>>> your API is about. You are supposed to count dirtied memory, and you
>>> are counting page faults instead. No sane userspace can make any sense
>>> of that. You keep coupling the two, but that's wrong. This thing has
>>> to be useful on its own, not just for your particular, super narrow
>>> use case. And that's a shame because the general idea of a dirty quota
>>> is an interesting one.
>>>
>>> If your sole intention is to capture in the documentation that the API
>>> is broken, then all I can do is to NAK the whole thing. Until you turn
>>> this page-fault quota into the dirty memory quota that you advertise,
>>> I'll continue to say no to it.
>>>
>>> Thanks,
>>>
>>> 	M.
>>>
>>
>> Thank you Marc for the suggestion. We can make dirty quota count
>> dirtied memory rather than faults.
>>
>> run->dirty_quota -= page_size;
>>
>> We can raise a kvm request for exiting to userspace as soon as the
>> dirty quota of the vcpu becomes zero or negative. Please let me know
>> if this looks good to you.
> 
> It really depends what "page_size" represents here. If you mean
> "mapping size", then yes. If you really mean "page size", then no.
> 
> Assuming this is indeed "mapping size", then it all depends on how
> this is integrated and how this is managed in a generic, cross
> architecture way.
> 
> Thanks,
> 
> 	M.
> 

Yes, it is "mapping size". I can see that there's an "npages" variable 
in "kvm_memory_slot" which determines the number of bits needed to 
track dirtying for a given memory slot, and this variable is computed 
by right-shifting the memory size by PAGE_SHIFT. Each arch defines the 
macro PAGE_SHIFT, along with PAGE_SIZE as 1 left-shifted by PAGE_SHIFT. 
Does it make sense to use this macro?

Thanks, again, Marc.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v7 1/4] KVM: Implement dirty quota-based throttling of vcpus
  2023-01-15 14:50                             ` Shivam Kumar
@ 2023-01-15 19:13                               ` Marc Zyngier
  0 siblings, 0 replies; 38+ messages in thread
From: Marc Zyngier @ 2023-01-15 19:13 UTC (permalink / raw)
  To: Shivam Kumar
  Cc: Sean Christopherson, pbonzini, james.morse, borntraeger, david,
	kvm, Shaju Abraham, Manish Mishra, Anurag Madnawat

On Sun, 15 Jan 2023 14:50:55 +0000,
Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
> 
> >> Thank you Marc for the suggestion. We can make dirty quota count
> >> dirtied memory rather than faults.
> >> 
> >> run->dirty_quota -= page_size;
> >> 
> >> We can raise a kvm request for exiting to userspace as soon as the
> >> dirty quota of the vcpu becomes zero or negative. Please let me know
> >> if this looks good to you.
> > 
> > It really depends what "page_size" represents here. If you mean
> > "mapping size", then yes. If you really mean "page size", then no.
> > 
> > Assuming this is indeed "mapping size", then it all depends on how
> > this is integrated and how this is managed in a generic, cross
> > architecture way.
> > 
> > Thanks,
> > 
> > 	M.
> > 
> 
> Yes, it is "mapping size". I can see that there's a "npages" variable
> in "kvm_memory_slot" which determines the number of bits we need to
> track dirtying for a given memory slot. And this variable is computed
> by right shifting the memory size by PAGE_SHIFT. Each arch defines the
> macro PAGE_SHIFT, and another macro PAGE_SIZE as the left shift of 1
> by PAGE_SHIFT. Does it make sense to use this macro?

I don't think it makes any sense.

There is nothing in the memslot structure that you can make use of.
The information you need is the page table structure itself (the
level, precisely), which tells you how big the mapping is for this
particular part of the memslot.

This is dynamic information, not defined at memslot creation. Which is
why it can only be captured at fault time (with the exception of
HugeTLBFS backed memslots for which the mapping size is cast into
stone).
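To make the point concrete, here is a sketch of how the mapping size
falls out of the page-table level at which the fault was resolved. This
is illustrative only, assuming an arm64-style 4K translation granule
with 9 bits of index per level; the function name is hypothetical:

```c
#include <assert.h>

#define PAGE_SHIFT	12	/* 4K basic pages */

/*
 * Bytes covered by a mapping installed at a given page-table level:
 * level 3 -> 4K page, level 2 -> 2M block, level 1 -> 1G block.
 * Each level up multiplies the coverage by 512 (2^9).
 */
static unsigned long mapping_size(int level)
{
	return 1UL << (PAGE_SHIFT + 9 * (3 - level));
}
```

Since the level is only known when the fault is handled, the quota can
only be charged accurately at fault time, which is Marc's point above.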

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v7 1/4] KVM: Implement dirty quota-based throttling of vcpus
  2023-01-15  9:56                           ` Marc Zyngier
  2023-01-15 14:50                             ` Shivam Kumar
@ 2023-01-29 22:00                             ` Shivam Kumar
  2023-02-11  6:52                               ` Shivam Kumar
  2023-02-12 17:56                               ` Shivam Kumar
  1 sibling, 2 replies; 38+ messages in thread
From: Shivam Kumar @ 2023-01-29 22:00 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Sean Christopherson, pbonzini, james.morse, borntraeger, david,
	kvm, Shaju Abraham, Manish Mishra, Anurag Madnawat



On 15/01/23 3:26 pm, Marc Zyngier wrote:
> On Sat, 14 Jan 2023 13:07:44 +0000,
> Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
>>
>>
>>
>> On 08/01/23 3:14 am, Marc Zyngier wrote:
>>> On Sat, 07 Jan 2023 17:24:24 +0000,
>>> Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
>>>> On 26/12/22 3:37 pm, Marc Zyngier wrote:
>>>>> On Sun, 25 Dec 2022 16:50:04 +0000,
>>>>> Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
>>>>>>
>>>>>> Hi Marc,
>>>>>> Hi Sean,
>>>>>>
>>>>>> Please let me know if there's any further question or feedback.
>>>>>
>>>>> My earlier comments still stand: the proposed API is not usable as a
>>>>> general purpose memory-tracking API because it counts faults instead
>>>>> of memory, making it inadequate except for the most trivial cases.
>>>>> And I cannot believe you were serious when you mentioned that you were
>>>>> happy to make that the API.
>>>>>
>>>>> This requires some serious work, and this series is not yet near a
>>>>> state where it could be merged.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> 	M.
>>>>>
>>>>
>>>> Hi Marc,
>>>>
>>>> IIUC, in the dirty ring interface too, the dirty_index variable is
>>>> incremented in the mark_page_dirty_in_slot function and it is also
>>>> count-based. At least on x86, I am aware that for dirty tracking we
>>>> have uniform granularity as huge pages (2MB pages) too are broken into
>>>> 4K pages and bitmap is at 4K-granularity. Please let me know if it is
>>>> possible to have multiple page sizes even during dirty logging on
>>>> ARM. And if that is the case, I am wondering how we handle the bitmap
>>>> with different page sizes on ARM.
>>>
>>> Easy. It *is* page-size, by the very definition of the API which
>>> explicitly says that a single bit represent one basic page. If you
>>> were to only break 1GB mappings into 2MB blocks, you'd have to mask
>>> 512 pages dirty at once, no question asked.
>>>
>>> Your API is different because at no point it implies any relationship
>>> with any page size. As it stands, it is a useless API. I understand
>>> that you are only concerned with your particular use case, but that's
>>> nowhere good enough. And it has nothing to do with ARM. This is
>>> equally broken on *any* architecture.
>>>
>>>> I agree that the notion of pages dirtied according to our
>>>> pages_dirtied variable depends on how we are handling the bitmap but
>>>> we expect the userspace to use the same granularity at which the dirty
>>>> bitmap is handled. I can capture this in documentation
>>>
>>> But what does the bitmap have to do with any of this? This is not what
>>> your API is about. You are supposed to count dirtied memory, and you
>>> are counting page faults instead. No sane userspace can make any sense
>>> of that. You keep coupling the two, but that's wrong. This thing has
>>> to be useful on its own, not just for your particular, super narrow
>>> use case. And that's a shame because the general idea of a dirty quota
>>> is an interesting one.
>>>
>>> If your sole intention is to capture in the documentation that the API
>>> is broken, then all I can do is to NAK the whole thing. Until you turn
>>> this page-fault quota into the dirty memory quota that you advertise,
>>> I'll continue to say no to it.
>>>
>>> Thanks,
>>>
>>> 	M.
>>>
>>
>> Thank you Marc for the suggestion. We can make dirty quota count
>> dirtied memory rather than faults.
>>
>> run->dirty_quota -= page_size;
>>
>> We can raise a kvm request for exiting to userspace as soon as the
>> dirty quota of the vcpu becomes zero or negative. Please let me know
>> if this looks good to you.
> 
> It really depends what "page_size" represents here. If you mean
> "mapping size", then yes. If you really mean "page size", then no.
> 
> Assuming this is indeed "mapping size", then it all depends on how
> this is integrated and how this is managed in a generic, cross
> architecture way.
> 
> Thanks,
> 
> 	M.
> 

Hi Marc,

I'm proposing this new implementation to address the concern you raised 
regarding dirty quota being a non-generic feature with the previous 
implementation. This implementation decouples dirty quota from dirty 
logging for the ARM64 arch. We shall post a similar implementation for 
x86 if this looks good. With this new implementation, dirty quota can be 
enforced independent of dirty logging. The dirty quota is now in bytes 
and is decremented, at write-protect fault time, by the fault granule 
size. For
userspace, the interface is unchanged, i.e. the dirty quota can be set 
from userspace via an ioctl or by forcing the vcpu to exit to userspace; 
userspace can expect a KVM exit with exit reason 
KVM_EXIT_DIRTY_QUOTA_EXHAUSTED when the dirty quota is exhausted.

Please let me know if it looks good to you. Happy to hear any further
feedback and work on it. Also, I am curious about use case scenarios 
other than dirty tracking for dirty quota. Besides, I am not aware of 
any interface exposed to the userspace, other than the dirty 
tracking-related ioctls, to write-protect guest pages transiently 
(unlike mprotect, which will generate a SIGSEGV signal on write).

Thanks,
Shivam


---
 arch/arm64/kvm/mmu.c     |  1 +
 include/linux/kvm_host.h |  1 +
 virt/kvm/kvm_main.c      | 12 ++++++++++++
 3 files changed, 14 insertions(+)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 60ee3d9f01f8..edd88529d622 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1336,6 +1336,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	/* Mark the page dirty only if the fault is handled successfully */
 	if (writable && !ret) {
 		kvm_set_pfn_dirty(pfn);
+		update_dirty_quota(kvm, fault_granule);
 		mark_page_dirty_in_slot(kvm, memslot, gfn);
 	}

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 0b9b5c251a04..10fda457ac3d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1219,6 +1219,7 @@ struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn);
 bool kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn);
 bool kvm_vcpu_is_visible_gfn(struct kvm_vcpu *vcpu, gfn_t gfn);
 unsigned long kvm_host_page_size(struct kvm_vcpu *vcpu, gfn_t gfn);
+void update_dirty_quota(struct kvm *kvm, unsigned long dirty_granule_bytes);
 void mark_page_dirty_in_slot(struct kvm *kvm, const struct kvm_memory_slot *memslot, gfn_t gfn);
 void mark_page_dirty(struct kvm *kvm, gfn_t gfn);

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7a54438b4d49..377cc9d07e80 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3309,6 +3309,18 @@ static bool kvm_vcpu_is_dirty_quota_exhausted(struct kvm_vcpu *vcpu)
 #endif
 }
 
+void update_dirty_quota(struct kvm *kvm, unsigned long dirty_granule_bytes)
+{
+	if (kvm->dirty_quota_enabled) {
+		struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
+
+		if (!vcpu)
+			return;
+
+		vcpu->run->dirty_quota_bytes -= dirty_granule_bytes;
+		if (vcpu->run->dirty_quota_bytes <= 0)
+			kvm_make_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu);
+	}
+}
+
 void mark_page_dirty_in_slot(struct kvm *kvm,
 			     const struct kvm_memory_slot *memslot,
 			     gfn_t gfn)
-- 
2.22.3
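On the userspace side, the intended handling of the proposed exit is a
replenish-and-resume loop. The sketch below only models that policy;
the exit reason, field names, and the 16M-per-interval knob come from
this (unmerged) series and are not part of the mainline KVM ABI:

```c
#include <assert.h>
#include <stdbool.h>

#define DIRTY_QUOTA_PER_INTERVAL (16UL * 1024 * 1024)	/* policy knob, not ABI */

/* Model of the per-vcpu state this series would expose to userspace. */
struct vcpu_model {
	long dirty_quota_bytes;
	bool throttled;		/* vcpu parked until the next interval */
};

/*
 * What a VMM's run loop would do on a dirty-quota-exhausted exit:
 * park the vcpu until the interval timer refills its quota.
 */
static void handle_quota_exhausted(struct vcpu_model *v)
{
	v->throttled = true;
}

/*
 * Interval tick: refill the quota, carrying over any overrun (the
 * quota may have gone negative), and let the vcpu run again.
 */
static void replenish(struct vcpu_model *v)
{
	v->dirty_quota_bytes += DIRTY_QUOTA_PER_INTERVAL;
	v->throttled = false;
}
```

Carrying the negative balance forward means a vcpu that overshot its
quota in one interval starts the next interval with correspondingly
less headroom.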

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH v7 1/4] KVM: Implement dirty quota-based throttling of vcpus
  2023-01-29 22:00                             ` Shivam Kumar
@ 2023-02-11  6:52                               ` Shivam Kumar
  2023-02-12 17:09                                 ` Marc Zyngier
  2023-02-12 17:56                               ` Shivam Kumar
  1 sibling, 1 reply; 38+ messages in thread
From: Shivam Kumar @ 2023-02-11  6:52 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Sean Christopherson, pbonzini, james.morse, borntraeger, david,
	kvm, Shaju Abraham, Manish Mishra, Anurag Madnawat

> 
> Hi Marc,
> 
> I'm proposing this new implementation to address the concern you raised 
> regarding dirty quota being a non-generic feature with the previous 
> implementation. This implementation decouples dirty quota from dirty 
> logging for the ARM64 arch. We shall post a similar implementation for 
> x86 if this looks good. With this new implementation, dirty quota can be 
> enforced independent of dirty logging. Dirty quota is now in bytes and 

Hi Marc,

Thank you for your valuable feedback so far. Looking forward to your 
feedback on this new proposition.

Thanks,
Shivam

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v7 1/4] KVM: Implement dirty quota-based throttling of vcpus
  2023-02-11  6:52                               ` Shivam Kumar
@ 2023-02-12 17:09                                 ` Marc Zyngier
  2023-02-12 17:54                                   ` Shivam Kumar
  0 siblings, 1 reply; 38+ messages in thread
From: Marc Zyngier @ 2023-02-12 17:09 UTC (permalink / raw)
  To: Shivam Kumar
  Cc: Sean Christopherson, pbonzini, james.morse, borntraeger, david,
	kvm, Shaju Abraham, Manish Mishra, Anurag Madnawat

On Sat, 11 Feb 2023 06:52:02 +0000,
Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
> 
> > 
> > Hi Marc,
> > 
> > I'm proposing this new implementation to address the concern you
> > raised regarding dirty quota being a non-generic feature with the
> > previous implementation. This implementation decouples dirty quota
> > from dirty logging for the ARM64 arch. We shall post a similar
> > implementation for x86 if this looks good. With this new
> > implementation, dirty quota can be enforced independent of dirty
> > logging. Dirty quota is now in bytes and 
> 
> Hi Marc,
> 
> Thank you for your valuable feedback so far. Looking forward to your
> feedback on this new proposition.

I'm not sure what you are expecting from me here. I've explained in
great details what I wanted to see, repeatedly. This above says
nothing other than "we are going to do *something* that matches your
expectations".

My answer is, to quote someone else, "show me the code". Until then, I
don't have much to add.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v7 1/4] KVM: Implement dirty quota-based throttling of vcpus
  2023-02-12 17:09                                 ` Marc Zyngier
@ 2023-02-12 17:54                                   ` Shivam Kumar
  2023-02-12 18:02                                     ` Marc Zyngier
  0 siblings, 1 reply; 38+ messages in thread
From: Shivam Kumar @ 2023-02-12 17:54 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Sean Christopherson, pbonzini, james.morse, borntraeger, david,
	kvm, Shaju Abraham, Manish Mishra, Anurag Madnawat



On 12/02/23 10:39 pm, Marc Zyngier wrote:
> On Sat, 11 Feb 2023 06:52:02 +0000,
> Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
>>
>>>
>>> Hi Marc,
>>>
>>> I'm proposing this new implementation to address the concern you
>>> raised regarding dirty quota being a non-generic feature with the
>>> previous implementation. This implementation decouples dirty quota
>>> from dirty logging for the ARM64 arch. We shall post a similar
>>> implementation for x86 if this looks good. With this new
>>> implementation, dirty quota can be enforced independent of dirty
>>> logging. Dirty quota is now in bytes and
>>
>> Hi Marc,
>>
>> Thank you for your valuable feedback so far. Looking forward to your
>> feedback on this new proposition.
> 
> I'm not sure what you are expecting from me here. I've explained in
> great detail what I wanted to see, repeatedly. The above says
> nothing other than "we are going to do *something* that matches your
> expectations".
> 
> My answer is, to quote someone else, "show me the code". Until then, I
> don't have much to add.
> 
> Thanks,
> 
> 	M.
> 

Hi Marc,

I had posted some code in the previous comment. Let me tag you there.

Thanks,
Shivam


* Re: [PATCH v7 1/4] KVM: Implement dirty quota-based throttling of vcpus
  2023-01-29 22:00                             ` Shivam Kumar
  2023-02-11  6:52                               ` Shivam Kumar
@ 2023-02-12 17:56                               ` Shivam Kumar
  1 sibling, 0 replies; 38+ messages in thread
From: Shivam Kumar @ 2023-02-12 17:56 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Sean Christopherson, pbonzini, james.morse, borntraeger, david,
	kvm, Shaju Abraham, Manish Mishra, Anurag Madnawat



On 30/01/23 3:30 am, Shivam Kumar wrote:
> 
> 
> On 15/01/23 3:26 pm, Marc Zyngier wrote:
>> On Sat, 14 Jan 2023 13:07:44 +0000,
>> Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
>>>
>>>
>>>
>>> On 08/01/23 3:14 am, Marc Zyngier wrote:
>>>> On Sat, 07 Jan 2023 17:24:24 +0000,
>>>> Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
>>>>> On 26/12/22 3:37 pm, Marc Zyngier wrote:
>>>>>> On Sun, 25 Dec 2022 16:50:04 +0000,
>>>>>> Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
>>>>>>>
>>>>>>> Hi Marc,
>>>>>>> Hi Sean,
>>>>>>>
>>>>>>> Please let me know if there's any further question or feedback.
>>>>>>
>>>>>> My earlier comments still stand: the proposed API is not usable as a
>>>>>> general purpose memory-tracking API because it counts faults instead
>>>>>> of memory, making it inadequate except for the most trivial cases.
>>>>>> And I cannot believe you were serious when you mentioned that you 
>>>>>> were
>>>>>> happy to make that the API.
>>>>>>
>>>>>> This requires some serious work, and this series is not yet near a
>>>>>> state where it could be merged.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>>     M.
>>>>>>
>>>>>
>>>>> Hi Marc,
>>>>>
>>>>> IIUC, in the dirty ring interface too, the dirty_index variable is
>>>>> incremented in the mark_page_dirty_in_slot function and it is also
>>>>> count-based. At least on x86, I am aware that for dirty tracking we
>>>>> have uniform granularity as huge pages (2MB pages) too are broken into
>>>>> 4K pages and bitmap is at 4K-granularity. Please let me know if it is
>>>>> possible to have multiple page sizes even during dirty logging on
>>>>> ARM. And if that is the case, I am wondering how we handle the bitmap
>>>>> with different page sizes on ARM.
>>>>
>>>> Easy. It *is* page-size, by the very definition of the API which
>>>> explicitly says that a single bit represent one basic page. If you
>>>> were to only break 1GB mappings into 2MB blocks, you'd have to mask
>>>> 512 pages dirty at once, no question asked.
>>>>
>>>> Your API is different because at no point does it imply any relationship
>>>> with any page size. As it stands, it is a useless API. I understand
>>>> that you are only concerned with your particular use case, but that's
>>>> nowhere good enough. And it has nothing to do with ARM. This is
>>>> equally broken on *any* architecture.
>>>>
>>>>> I agree that the notion of pages dirtied according to our
>>>>> pages_dirtied variable depends on how we are handling the bitmap but
>>>>> we expect the userspace to use the same granularity at which the dirty
>>>>> bitmap is handled. I can capture this in documentation
>>>>
>>>> But what does the bitmap have to do with any of this? This is not what
>>>> your API is about. You are supposed to count dirtied memory, and you
>>>> are counting page faults instead. No sane userspace can make any sense
>>>> of that. You keep coupling the two, but that's wrong. This thing has
>>>> to be useful on its own, not just for your particular, super narrow
>>>> use case. And that's a shame because the general idea of a dirty quota
>>>> is an interesting one.
>>>>
>>>> If your sole intention is to capture in the documentation that the API
>>>> is broken, then all I can do is to NAK the whole thing. Until you turn
>>>> this page-fault quota into the dirty memory quota that you advertise,
>>>> I'll continue to say no to it.
>>>>
>>>> Thanks,
>>>>
>>>>     M.
>>>>
>>>
>>> Thank you Marc for the suggestion. We can make dirty quota count
>>> dirtied memory rather than faults.
>>>
>>> run->dirty_quota -= page_size;
>>>
>>> We can raise a kvm request for exiting to userspace as soon as the
>>> dirty quota of the vcpu becomes zero or negative. Please let me know
>>> if this looks good to you.
>>
>> It really depends what "page_size" represents here. If you mean
>> "mapping size", then yes. If you really mean "page size", then no.
>>
>> Assuming this is indeed "mapping size", then it all depends on how
>> this is integrated and how this is managed in a generic, cross
>> architecture way.
>>
>> Thanks,
>>
>>     M.
>>
> 
> Hi Marc,
> 
> I'm proposing this new implementation to address the concern you raised 
> regarding dirty quota being a non-generic feature with the previous 
> implementation. This implementation decouples dirty quota from dirty 
> logging for the ARM64 arch. We shall post a similar implementation for 
> x86 if this looks good. With this new implementation, dirty quota can be 
> enforced independent of dirty logging. Dirty quota is now in bytes and 
> is decreased at write-protect page fault by page fault granularity. For 
> userspace, the interface is unchanged, i.e. the dirty quota can be set 
> from userspace via an ioctl or by forcing the vcpu to exit to userspace; 
> userspace can expect a KVM exit with exit reason 
> KVM_EXIT_DIRTY_QUOTA_EXHAUSTED when the dirty quota is exhausted.
> 
> Please let me know if it looks good to you. Happy to hear any further
> feedback and work on it. Also, I am curious about use case scenarios 
> other than dirty tracking for dirty quota. Besides, I am not aware of 
> any interface exposed to the userspace, other than the dirty 
> tracking-related ioctls, to write-protect guest pages transiently 
> (unlike mprotect, which will generate a SIGSEGV signal on write).
> 
> Thanks,
> Shivam
> 
> 
> ---
>   arch/arm64/kvm/mmu.c     |  1 +
>   include/linux/kvm_host.h |  1 +
>   virt/kvm/kvm_main.c      | 12 ++++++++++++
>   3 files changed, 14 insertions(+)
> 
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 60ee3d9f01f8..edd88529d622 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1336,6 +1336,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	/* Mark the page dirty only if the fault is handled successfully */
>  	if (writable && !ret) {
>  		kvm_set_pfn_dirty(pfn);
> +		update_dirty_quota(kvm, fault_granule);
>  		mark_page_dirty_in_slot(kvm, memslot, gfn);
>  	}
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 0b9b5c251a04..10fda457ac3d 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1219,6 +1219,7 @@ struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn);
>  bool kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn);
>  bool kvm_vcpu_is_visible_gfn(struct kvm_vcpu *vcpu, gfn_t gfn);
>  unsigned long kvm_host_page_size(struct kvm_vcpu *vcpu, gfn_t gfn);
> +void update_dirty_quota(struct kvm *kvm, unsigned long dirty_granule_bytes);
>  void mark_page_dirty_in_slot(struct kvm *kvm, const struct kvm_memory_slot *memslot, gfn_t gfn);
>  void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 7a54438b4d49..377cc9d07e80 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3309,6 +3309,18 @@ static bool kvm_vcpu_is_dirty_quota_exhausted(struct kvm_vcpu *vcpu)
>  #endif
>  }
> 
> +void update_dirty_quota(struct kvm *kvm, unsigned long dirty_granule_bytes)
> +{
> +	if (kvm->dirty_quota_enabled) {
> +		struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
> +
> +		if (!vcpu)
> +			return;
> +
> +		vcpu->run->dirty_quota_bytes -= dirty_granule_bytes;
> +		if (vcpu->run->dirty_quota_bytes <= 0)
> +			kvm_make_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu);
> +	}
> +}
> +
>  void mark_page_dirty_in_slot(struct kvm *kvm,
>  			     const struct kvm_memory_slot *memslot,
>  			     gfn_t gfn)


Hi Marc,

Please review the above code.

Thanks,
Shivam


* Re: [PATCH v7 1/4] KVM: Implement dirty quota-based throttling of vcpus
  2023-02-12 17:54                                   ` Shivam Kumar
@ 2023-02-12 18:02                                     ` Marc Zyngier
  0 siblings, 0 replies; 38+ messages in thread
From: Marc Zyngier @ 2023-02-12 18:02 UTC (permalink / raw)
  To: Shivam Kumar
  Cc: Sean Christopherson, pbonzini, james.morse, borntraeger, david,
	kvm, Shaju Abraham, Manish Mishra, Anurag Madnawat

On Sun, 12 Feb 2023 17:54:30 +0000,
Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
> 
> 
> 
> On 12/02/23 10:39 pm, Marc Zyngier wrote:
> > On Sat, 11 Feb 2023 06:52:02 +0000,
> > Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
> >> 
> >>> 
> >>> Hi Marc,
> >>> 
> >>> I'm proposing this new implementation to address the concern you
> >>> raised regarding dirty quota being a non-generic feature with the
> >>> previous implementation. This implementation decouples dirty quota
> >>> from dirty logging for the ARM64 arch. We shall post a similar
> >>> implementation for x86 if this looks good. With this new
> >>> implementation, dirty quota can be enforced independent of dirty
> >>> logging. Dirty quota is now in bytes and
> >> 
> >> Hi Marc,
> >> 
> >> Thank you for your valuable feedback so far. Looking forward to your
> >> feedback on this new proposition.
> > 
> > I'm not sure what you are expecting from me here. I've explained in
> > great detail what I wanted to see, repeatedly. The above says
> > nothing other than "we are going to do *something* that matches your
> > expectations".
> > 
> > My answer is, to quote someone else, "show me the code". Until then, I
> > don't have much to add.
> > 
> > Thanks,
> > 
> > 	M.
> > 
> 
> Hi Marc,
> 
> I had posted some code in the previous comment. Let me tag you there.

You posted a tiny snippet that is completely out of context.

I took the time to fully review your series and provide extensive
comments. You could, similarly, take the time to post a complete
series for people to review the new proposal.

	M.

-- 
Without deviation from the norm, progress is not possible.


end of thread, other threads:[~2023-02-12 18:02 UTC | newest]

Thread overview: 38+ messages
2022-11-13 17:05 [PATCH v7 0/4] KVM: Dirty quota-based throttling Shivam Kumar
2022-11-13 17:05 ` [PATCH v7 1/4] KVM: Implement dirty quota-based throttling of vcpus Shivam Kumar
2022-11-14 23:29   ` Yunhong Jiang
2022-11-15  4:48     ` Shivam Kumar
2022-11-17 19:26   ` Marc Zyngier
2022-11-18  9:47     ` Shivam Kumar
2022-11-22 17:46       ` Marc Zyngier
2022-12-06  6:22         ` Shivam Kumar
2022-12-07 16:44           ` Marc Zyngier
2022-12-07 19:53             ` Sean Christopherson
2022-12-08  7:30               ` Shivam Kumar
2022-12-25 16:50                 ` Shivam Kumar
2022-12-26 10:07                   ` Marc Zyngier
2023-01-07 17:24                     ` Shivam Kumar
2023-01-07 21:44                       ` Marc Zyngier
2023-01-14 13:07                         ` Shivam Kumar
2023-01-15  9:56                           ` Marc Zyngier
2023-01-15 14:50                             ` Shivam Kumar
2023-01-15 19:13                               ` Marc Zyngier
2023-01-29 22:00                             ` Shivam Kumar
2023-02-11  6:52                               ` Shivam Kumar
2023-02-12 17:09                                 ` Marc Zyngier
2023-02-12 17:54                                   ` Shivam Kumar
2023-02-12 18:02                                     ` Marc Zyngier
2023-02-12 17:56                               ` Shivam Kumar
2022-12-08  7:20             ` Shivam Kumar
2022-11-25 10:52   ` kernel test robot
2022-11-13 17:05 ` [PATCH v7 2/4] KVM: x86: Dirty " Shivam Kumar
2022-11-15  0:16   ` Yunhong Jiang
2022-11-15  4:55     ` Shivam Kumar
2022-11-15  6:45       ` Yunhong Jiang
2022-11-18  8:51         ` Shivam Kumar
2022-11-13 17:05 ` [PATCH v7 3/4] KVM: arm64: " Shivam Kumar
2022-11-15  0:27   ` Yunhong Jiang
2022-11-15  5:10     ` Shivam Kumar
2022-11-17 20:44   ` Marc Zyngier
2022-11-18  8:56     ` Shivam Kumar
2022-11-13 17:05 ` [PATCH v7 4/4] KVM: selftests: Add selftests for dirty quota throttling Shivam Kumar
