All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: sched: add notifier for cross-cpu migrations
@ 2013-07-10 10:34 Peter Zijlstra
  2013-07-10 14:39 ` Ingo Molnar
  2013-07-11  1:21 ` [PATCH] remove sched " Marcelo Tosatti
  0 siblings, 2 replies; 11+ messages in thread
From: Peter Zijlstra @ 2013-07-10 10:34 UTC (permalink / raw)
  To: mtosatti; +Cc: mingo, gleb, glommer, tglx, jeremy, rostedt, linux-kernel

Hi,

I just stumbled over commit 582b336ec2 and had a massive WTF moment.

The Changelog -- empty!
The Implementation -- complete crap!

fail^2.

A git grep later I find its x86_64 paravirt only.. for this we add
unconditional crap to the scheduler?

At the _very_ least this should have been wrapped in a static_key and the
changelog should have given some clue as to the what and why of this code.

/me grumpy

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: sched: add notifier for cross-cpu migrations
  2013-07-10 10:34 sched: add notifier for cross-cpu migrations Peter Zijlstra
@ 2013-07-10 14:39 ` Ingo Molnar
  2013-07-10 14:52   ` Peter Zijlstra
  2013-07-11  1:21 ` [PATCH] remove sched " Marcelo Tosatti
  1 sibling, 1 reply; 11+ messages in thread
From: Ingo Molnar @ 2013-07-10 14:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mtosatti, mingo, gleb, glommer, tglx, jeremy, rostedt, linux-kernel


* Peter Zijlstra <peterz@infradead.org> wrote:

> Hi,
> 
> I just stumbled over commit 582b336ec2 and had a massive WTF moment.
> 
> The Changelog -- empty!
> The Implementation -- complete crap!
> 
> fail^2.
> 
> A git grep later I find its x86_64 paravirt only.. for this we add 
> unconditional crap to the scheduler?
> 
> At the _very_ least this should have been wrapped in a static_key and 
> the changelog should have given some clue as to the what and why of this 
> code.

My bad for letting it slip through...

Marcelo, mind implementing the suggestions from Peter?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: sched: add notifier for cross-cpu migrations
  2013-07-10 14:39 ` Ingo Molnar
@ 2013-07-10 14:52   ` Peter Zijlstra
  0 siblings, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2013-07-10 14:52 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: mtosatti, mingo, gleb, glommer, tglx, jeremy, rostedt, linux-kernel

On Wed, Jul 10, 2013 at 04:39:02PM +0200, Ingo Molnar wrote:
> 
> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > Hi,
> > 
> > I just stumbled over commit 582b336ec2 and had a massive WTF moment.
> > 
> > The Changelog -- empty!
> > The Implementation -- complete crap!
> > 
> > fail^2.
> > 
> > A git grep later I find its x86_64 paravirt only.. for this we add 
> > unconditional crap to the scheduler?
> > 
> > At the _very_ least this should have been wrapped in a static_key and 
> > the changelog should have given some clue as to the what and why of this 
> > code.
> 
> My bad for letting it slip through...
> 
> Marcelo, mind implementing the suggestions from Peter?

So ideally we'd kill the entire notifier, notifiers make it far too easy for
others to use -- and I don't want silent users of stuff like this.

Ideally I'd see a direct callback:

  pvclock_migration_callback(t, new_cpu);

That compiles to 'do { } while (0);' for kernels without CONFIG_PARAVIRT.  Then
for paravirt it would use a static_key to get a real callback only if there's a
pvclock user.

Something like:

#define pvclock_migration_callback(_t, _cpu)			\
do {								\
	if (static_key_false(&pvclock_key))			\
		__pvclock_migration_callback((_t), (_cpu));	\
} while (0)

NOTE: static_key_false() doesn't test false, it assumes the default is false
and makes the function call the out-of-line jump -- horridly confusing function
name.

Your pvclock muck would do:

 static_key_slow_inc() -- for every new user and,
 static_key_slow_dec() -- for every user gone.

Its a reference count scheme, so that when there's no users there is only a 5
byte nop.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH] remove sched notifier for cross-cpu migrations
  2013-07-10 10:34 sched: add notifier for cross-cpu migrations Peter Zijlstra
  2013-07-10 14:39 ` Ingo Molnar
@ 2013-07-11  1:21 ` Marcelo Tosatti
  2013-07-11  8:10   ` Gleb Natapov
                     ` (3 more replies)
  1 sibling, 4 replies; 11+ messages in thread
From: Marcelo Tosatti @ 2013-07-11  1:21 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, gleb, glommer, tglx, jeremy, rostedt, linux-kernel


Linux as a guest on KVM hypervisor, the only user of the pvclock      
vsyscall interface, does not require notification on task migration 
because:

1. cpu ID number maps 1:1 to per-CPU pvclock time info.
2. per-CPU pvclock time info is updated if the
   underlying CPU changes.
3. that version is increased whenever underlying CPU 
   changes.

Which is sufficient to guarantee nanoseconds counter
is calculated properly.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
index 109a9dd..be8269b 100644
--- a/arch/x86/include/asm/pvclock.h
+++ b/arch/x86/include/asm/pvclock.h
@@ -93,7 +93,6 @@ unsigned __pvclock_read_cycles(const struct pvclock_vcpu_time_info *src,
 
 struct pvclock_vsyscall_time_info {
 	struct pvclock_vcpu_time_info pvti;
-	u32 migrate_count;
 } __attribute__((__aligned__(SMP_CACHE_BYTES)));
 
 #define PVTI_SIZE sizeof(struct pvclock_vsyscall_time_info)
diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 2cb9470..a16bae3 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -128,46 +128,7 @@ void pvclock_read_wallclock(struct pvclock_wall_clock *wall_clock,
 	set_normalized_timespec(ts, now.tv_sec, now.tv_nsec);
 }
 
-static struct pvclock_vsyscall_time_info *pvclock_vdso_info;
-
-static struct pvclock_vsyscall_time_info *
-pvclock_get_vsyscall_user_time_info(int cpu)
-{
-	if (!pvclock_vdso_info) {
-		BUG();
-		return NULL;
-	}
-
-	return &pvclock_vdso_info[cpu];
-}
-
-struct pvclock_vcpu_time_info *pvclock_get_vsyscall_time_info(int cpu)
-{
-	return &pvclock_get_vsyscall_user_time_info(cpu)->pvti;
-}
-
 #ifdef CONFIG_X86_64
-static int pvclock_task_migrate(struct notifier_block *nb, unsigned long l,
-			        void *v)
-{
-	struct task_migration_notifier *mn = v;
-	struct pvclock_vsyscall_time_info *pvti;
-
-	pvti = pvclock_get_vsyscall_user_time_info(mn->from_cpu);
-
-	/* this is NULL when pvclock vsyscall is not initialized */
-	if (unlikely(pvti == NULL))
-		return NOTIFY_DONE;
-
-	pvti->migrate_count++;
-
-	return NOTIFY_DONE;
-}
-
-static struct notifier_block pvclock_migrate = {
-	.notifier_call = pvclock_task_migrate,
-};
-
 /*
  * Initialize the generic pvclock vsyscall state.  This will allocate
  * a/some page(s) for the per-vcpu pvclock information, set up a
@@ -181,17 +142,12 @@ int __init pvclock_init_vsyscall(struct pvclock_vsyscall_time_info *i,
 
 	WARN_ON (size != PVCLOCK_VSYSCALL_NR_PAGES*PAGE_SIZE);
 
-	pvclock_vdso_info = i;
-
 	for (idx = 0; idx <= (PVCLOCK_FIXMAP_END-PVCLOCK_FIXMAP_BEGIN); idx++) {
 		__set_fixmap(PVCLOCK_FIXMAP_BEGIN + idx,
 			     __pa(i) + (idx*PAGE_SIZE),
 			     PAGE_KERNEL_VVAR);
 	}
 
-
-	register_task_migration_notifier(&pvclock_migrate);
-
 	return 0;
 }
 #endif
diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/vdso/vclock_gettime.c
index c74436e..72074d5 100644
--- a/arch/x86/vdso/vclock_gettime.c
+++ b/arch/x86/vdso/vclock_gettime.c
@@ -85,15 +85,18 @@ static notrace cycle_t vread_pvclock(int *mode)
 	cycle_t ret;
 	u64 last;
 	u32 version;
-	u32 migrate_count;
 	u8 flags;
 	unsigned cpu, cpu1;
 
 
 	/*
-	 * When looping to get a consistent (time-info, tsc) pair, we
-	 * also need to deal with the possibility we can switch vcpus,
-	 * so make sure we always re-fetch time-info for the current vcpu.
+	 * Note: hypervisor must guarantee that:
+	 * 1. cpu ID number maps 1:1 to per-CPU pvclock time info.
+	 * 2. that per-CPU pvclock time info is updated if the
+	 *    underlying CPU changes.
+	 * 3. that version is increased whenever underlying CPU
+	 *    changes.
+	 *
 	 */
 	do {
 		cpu = __getcpu() & VGETCPU_CPU_MASK;
@@ -104,8 +107,6 @@ static notrace cycle_t vread_pvclock(int *mode)
 
 		pvti = get_pvti(cpu);
 
-		migrate_count = pvti->migrate_count;
-
 		version = __pvclock_read_cycles(&pvti->pvti, &ret, &flags);
 
 		/*
@@ -117,8 +118,7 @@ static notrace cycle_t vread_pvclock(int *mode)
 		cpu1 = __getcpu() & VGETCPU_CPU_MASK;
 	} while (unlikely(cpu != cpu1 ||
 			  (pvti->pvti.version & 1) ||
-			  pvti->pvti.version != version ||
-			  pvti->migrate_count != migrate_count));
+			  pvti->pvti.version != version));
 
 	if (unlikely(!(flags & PVCLOCK_TSC_STABLE_BIT)))
 		*mode = VCLOCK_NONE;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 178a8d9..217ce4b 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -107,14 +107,6 @@ extern unsigned long this_cpu_load(void);
 extern void calc_global_load(unsigned long ticks);
 extern void update_cpu_load_nohz(void);
 
-/* Notifier for when a task gets migrated to a new CPU */
-struct task_migration_notifier {
-	struct task_struct *task;
-	int from_cpu;
-	int to_cpu;
-};
-extern void register_task_migration_notifier(struct notifier_block *n);
-
 extern unsigned long get_parent_ip(unsigned long addr);
 
 extern void dump_cpu_task(int cpu);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e8b3350..122e499 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -974,13 +974,6 @@ void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
 		rq->skip_clock_update = 1;
 }
 
-static ATOMIC_NOTIFIER_HEAD(task_migration_notifier);
-
-void register_task_migration_notifier(struct notifier_block *n)
-{
-	atomic_notifier_chain_register(&task_migration_notifier, n);
-}
-
 #ifdef CONFIG_SMP
 void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
 {
@@ -1011,18 +1004,10 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
 	trace_sched_migrate_task(p, new_cpu);
 
 	if (task_cpu(p) != new_cpu) {
-		struct task_migration_notifier tmn;
-
 		if (p->sched_class->migrate_task_rq)
 			p->sched_class->migrate_task_rq(p, new_cpu);
 		p->se.nr_migrations++;
 		perf_sw_event(PERF_COUNT_SW_CPU_MIGRATIONS, 1, NULL, 0);
-
-		tmn.task = p;
-		tmn.from_cpu = task_cpu(p);
-		tmn.to_cpu = new_cpu;
-
-		atomic_notifier_call_chain(&task_migration_notifier, 0, &tmn);
 	}
 
 	__set_task_cpu(p, new_cpu);



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] remove sched notifier for cross-cpu migrations
  2013-07-11  1:21 ` [PATCH] remove sched " Marcelo Tosatti
@ 2013-07-11  8:10   ` Gleb Natapov
  2013-07-11 22:15     ` Marcelo Tosatti
  2013-07-11  9:52   ` Peter Zijlstra
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Gleb Natapov @ 2013-07-11  8:10 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Peter Zijlstra, mingo, glommer, tglx, jeremy, rostedt, linux-kernel

On Wed, Jul 10, 2013 at 10:21:57PM -0300, Marcelo Tosatti wrote:
> 
> Linux as a guest on KVM hypervisor, the only user of the pvclock      
> vsyscall interface, does not require notification on task migration 
> because:
> 
> 1. cpu ID number maps 1:1 to per-CPU pvclock time info.
> 2. per-CPU pvclock time info is updated if the
>    underlying CPU changes.
> 3. that version is increased whenever underlying CPU 
>    changes.
> 
Is this the case with KVM though? IIRC KVM does not updates kvmclock
on vcpu migration if host has synchronized TSC.

> Which is sufficient to guarantee nanoseconds counter
> is calculated properly.
> 
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> 
> diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
> index 109a9dd..be8269b 100644
> --- a/arch/x86/include/asm/pvclock.h
> +++ b/arch/x86/include/asm/pvclock.h
> @@ -93,7 +93,6 @@ unsigned __pvclock_read_cycles(const struct pvclock_vcpu_time_info *src,
>  
>  struct pvclock_vsyscall_time_info {
>  	struct pvclock_vcpu_time_info pvti;
> -	u32 migrate_count;
>  } __attribute__((__aligned__(SMP_CACHE_BYTES)));
>  
>  #define PVTI_SIZE sizeof(struct pvclock_vsyscall_time_info)
> diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
> index 2cb9470..a16bae3 100644
> --- a/arch/x86/kernel/pvclock.c
> +++ b/arch/x86/kernel/pvclock.c
> @@ -128,46 +128,7 @@ void pvclock_read_wallclock(struct pvclock_wall_clock *wall_clock,
>  	set_normalized_timespec(ts, now.tv_sec, now.tv_nsec);
>  }
>  
> -static struct pvclock_vsyscall_time_info *pvclock_vdso_info;
> -
> -static struct pvclock_vsyscall_time_info *
> -pvclock_get_vsyscall_user_time_info(int cpu)
> -{
> -	if (!pvclock_vdso_info) {
> -		BUG();
> -		return NULL;
> -	}
> -
> -	return &pvclock_vdso_info[cpu];
> -}
> -
> -struct pvclock_vcpu_time_info *pvclock_get_vsyscall_time_info(int cpu)
> -{
> -	return &pvclock_get_vsyscall_user_time_info(cpu)->pvti;
> -}
> -
>  #ifdef CONFIG_X86_64
> -static int pvclock_task_migrate(struct notifier_block *nb, unsigned long l,
> -			        void *v)
> -{
> -	struct task_migration_notifier *mn = v;
> -	struct pvclock_vsyscall_time_info *pvti;
> -
> -	pvti = pvclock_get_vsyscall_user_time_info(mn->from_cpu);
> -
> -	/* this is NULL when pvclock vsyscall is not initialized */
> -	if (unlikely(pvti == NULL))
> -		return NOTIFY_DONE;
> -
> -	pvti->migrate_count++;
> -
> -	return NOTIFY_DONE;
> -}
> -
> -static struct notifier_block pvclock_migrate = {
> -	.notifier_call = pvclock_task_migrate,
> -};
> -
>  /*
>   * Initialize the generic pvclock vsyscall state.  This will allocate
>   * a/some page(s) for the per-vcpu pvclock information, set up a
> @@ -181,17 +142,12 @@ int __init pvclock_init_vsyscall(struct pvclock_vsyscall_time_info *i,
>  
>  	WARN_ON (size != PVCLOCK_VSYSCALL_NR_PAGES*PAGE_SIZE);
>  
> -	pvclock_vdso_info = i;
> -
>  	for (idx = 0; idx <= (PVCLOCK_FIXMAP_END-PVCLOCK_FIXMAP_BEGIN); idx++) {
>  		__set_fixmap(PVCLOCK_FIXMAP_BEGIN + idx,
>  			     __pa(i) + (idx*PAGE_SIZE),
>  			     PAGE_KERNEL_VVAR);
>  	}
>  
> -
> -	register_task_migration_notifier(&pvclock_migrate);
> -
>  	return 0;
>  }
>  #endif
> diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/vdso/vclock_gettime.c
> index c74436e..72074d5 100644
> --- a/arch/x86/vdso/vclock_gettime.c
> +++ b/arch/x86/vdso/vclock_gettime.c
> @@ -85,15 +85,18 @@ static notrace cycle_t vread_pvclock(int *mode)
>  	cycle_t ret;
>  	u64 last;
>  	u32 version;
> -	u32 migrate_count;
>  	u8 flags;
>  	unsigned cpu, cpu1;
>  
>  
>  	/*
> -	 * When looping to get a consistent (time-info, tsc) pair, we
> -	 * also need to deal with the possibility we can switch vcpus,
> -	 * so make sure we always re-fetch time-info for the current vcpu.
> +	 * Note: hypervisor must guarantee that:
> +	 * 1. cpu ID number maps 1:1 to per-CPU pvclock time info.
> +	 * 2. that per-CPU pvclock time info is updated if the
> +	 *    underlying CPU changes.
> +	 * 3. that version is increased whenever underlying CPU
> +	 *    changes.
> +	 *
>  	 */
>  	do {
>  		cpu = __getcpu() & VGETCPU_CPU_MASK;
> @@ -104,8 +107,6 @@ static notrace cycle_t vread_pvclock(int *mode)
>  
>  		pvti = get_pvti(cpu);
>  
> -		migrate_count = pvti->migrate_count;
> -
>  		version = __pvclock_read_cycles(&pvti->pvti, &ret, &flags);
>  
>  		/*
> @@ -117,8 +118,7 @@ static notrace cycle_t vread_pvclock(int *mode)
>  		cpu1 = __getcpu() & VGETCPU_CPU_MASK;
>  	} while (unlikely(cpu != cpu1 ||
>  			  (pvti->pvti.version & 1) ||
> -			  pvti->pvti.version != version ||
> -			  pvti->migrate_count != migrate_count));
> +			  pvti->pvti.version != version));
>  
>  	if (unlikely(!(flags & PVCLOCK_TSC_STABLE_BIT)))
>  		*mode = VCLOCK_NONE;
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 178a8d9..217ce4b 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -107,14 +107,6 @@ extern unsigned long this_cpu_load(void);
>  extern void calc_global_load(unsigned long ticks);
>  extern void update_cpu_load_nohz(void);
>  
> -/* Notifier for when a task gets migrated to a new CPU */
> -struct task_migration_notifier {
> -	struct task_struct *task;
> -	int from_cpu;
> -	int to_cpu;
> -};
> -extern void register_task_migration_notifier(struct notifier_block *n);
> -
>  extern unsigned long get_parent_ip(unsigned long addr);
>  
>  extern void dump_cpu_task(int cpu);
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index e8b3350..122e499 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -974,13 +974,6 @@ void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
>  		rq->skip_clock_update = 1;
>  }
>  
> -static ATOMIC_NOTIFIER_HEAD(task_migration_notifier);
> -
> -void register_task_migration_notifier(struct notifier_block *n)
> -{
> -	atomic_notifier_chain_register(&task_migration_notifier, n);
> -}
> -
>  #ifdef CONFIG_SMP
>  void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
>  {
> @@ -1011,18 +1004,10 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
>  	trace_sched_migrate_task(p, new_cpu);
>  
>  	if (task_cpu(p) != new_cpu) {
> -		struct task_migration_notifier tmn;
> -
>  		if (p->sched_class->migrate_task_rq)
>  			p->sched_class->migrate_task_rq(p, new_cpu);
>  		p->se.nr_migrations++;
>  		perf_sw_event(PERF_COUNT_SW_CPU_MIGRATIONS, 1, NULL, 0);
> -
> -		tmn.task = p;
> -		tmn.from_cpu = task_cpu(p);
> -		tmn.to_cpu = new_cpu;
> -
> -		atomic_notifier_call_chain(&task_migration_notifier, 0, &tmn);
>  	}
>  
>  	__set_task_cpu(p, new_cpu);
> 

--
			Gleb.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] remove sched notifier for cross-cpu migrations
  2013-07-11  1:21 ` [PATCH] remove sched " Marcelo Tosatti
  2013-07-11  8:10   ` Gleb Natapov
@ 2013-07-11  9:52   ` Peter Zijlstra
  2013-07-14 11:16   ` Gleb Natapov
  2013-08-23 10:22   ` Peter Zijlstra
  3 siblings, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2013-07-11  9:52 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: mingo, gleb, glommer, tglx, jeremy, rostedt, linux-kernel

On Wed, Jul 10, 2013 at 10:21:57PM -0300, Marcelo Tosatti wrote:
> 
> Linux as a guest on KVM hypervisor, the only user of the pvclock      
> vsyscall interface, does not require notification on task migration 
> because:
> 
> 1. cpu ID number maps 1:1 to per-CPU pvclock time info.
> 2. per-CPU pvclock time info is updated if the
>    underlying CPU changes.
> 3. that version is increased whenever underlying CPU 
>    changes.
> 
> Which is sufficient to guarantee nanoseconds counter
> is calculated properly.
> 
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Awesome, thanks!

Acked-by: Peter Zijlstra <peterz@infradead.org>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] remove sched notifier for cross-cpu migrations
  2013-07-11  8:10   ` Gleb Natapov
@ 2013-07-11 22:15     ` Marcelo Tosatti
  0 siblings, 0 replies; 11+ messages in thread
From: Marcelo Tosatti @ 2013-07-11 22:15 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Peter Zijlstra, mingo, glommer, tglx, jeremy, rostedt, linux-kernel

On Thu, Jul 11, 2013 at 11:10:25AM +0300, Gleb Natapov wrote:
> On Wed, Jul 10, 2013 at 10:21:57PM -0300, Marcelo Tosatti wrote:
> > 
> > Linux as a guest on KVM hypervisor, the only user of the pvclock      
> > vsyscall interface, does not require notification on task migration 
> > because:
> > 
> > 1. cpu ID number maps 1:1 to per-CPU pvclock time info.
> > 2. per-CPU pvclock time info is updated if the
> >    underlying CPU changes.
> > 3. that version is increased whenever underlying CPU 
> >    changes.
> > 
> Is this the case with KVM though? IIRC KVM does not updates kvmclock
> on vcpu migration if host has synchronized TSC.

KVM does not update kvmclock on vcpu->pcpu migration if the 
master clock scheme is used, which depends on synchronized TSC.
In that case, it is safe to migrate a thread/vcpu to another cpu without
updating kvmclock area (because all vcpus contain the same value as
system timestamp and different tsc_timestamps, but all (tsc_timestamps)
usable as deltas on any given pcpu.

So the 3 items are necessary in the general case of hypervisor
maintaining distinct kvmclock vcpu areas (copy&pasted the comment from
the code comment).


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] remove sched notifier for cross-cpu migrations
  2013-07-11  1:21 ` [PATCH] remove sched " Marcelo Tosatti
  2013-07-11  8:10   ` Gleb Natapov
  2013-07-11  9:52   ` Peter Zijlstra
@ 2013-07-14 11:16   ` Gleb Natapov
  2013-08-23 10:22   ` Peter Zijlstra
  3 siblings, 0 replies; 11+ messages in thread
From: Gleb Natapov @ 2013-07-14 11:16 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Peter Zijlstra, mingo, glommer, tglx, jeremy, rostedt, linux-kernel

On Wed, Jul 10, 2013 at 10:21:57PM -0300, Marcelo Tosatti wrote:
> 
> Linux as a guest on KVM hypervisor, the only user of the pvclock      
> vsyscall interface, does not require notification on task migration 
> because:
> 
> 1. cpu ID number maps 1:1 to per-CPU pvclock time info.
> 2. per-CPU pvclock time info is updated if the
>    underlying CPU changes.
> 3. that version is increased whenever underlying CPU 
>    changes.
> 
> Which is sufficient to guarantee nanoseconds counter
> is calculated properly.
> 
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> 
Applied, thanks.

> diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
> index 109a9dd..be8269b 100644
> --- a/arch/x86/include/asm/pvclock.h
> +++ b/arch/x86/include/asm/pvclock.h
> @@ -93,7 +93,6 @@ unsigned __pvclock_read_cycles(const struct pvclock_vcpu_time_info *src,
>  
>  struct pvclock_vsyscall_time_info {
>  	struct pvclock_vcpu_time_info pvti;
> -	u32 migrate_count;
>  } __attribute__((__aligned__(SMP_CACHE_BYTES)));
>  
>  #define PVTI_SIZE sizeof(struct pvclock_vsyscall_time_info)
> diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
> index 2cb9470..a16bae3 100644
> --- a/arch/x86/kernel/pvclock.c
> +++ b/arch/x86/kernel/pvclock.c
> @@ -128,46 +128,7 @@ void pvclock_read_wallclock(struct pvclock_wall_clock *wall_clock,
>  	set_normalized_timespec(ts, now.tv_sec, now.tv_nsec);
>  }
>  
> -static struct pvclock_vsyscall_time_info *pvclock_vdso_info;
> -
> -static struct pvclock_vsyscall_time_info *
> -pvclock_get_vsyscall_user_time_info(int cpu)
> -{
> -	if (!pvclock_vdso_info) {
> -		BUG();
> -		return NULL;
> -	}
> -
> -	return &pvclock_vdso_info[cpu];
> -}
> -
> -struct pvclock_vcpu_time_info *pvclock_get_vsyscall_time_info(int cpu)
> -{
> -	return &pvclock_get_vsyscall_user_time_info(cpu)->pvti;
> -}
> -
>  #ifdef CONFIG_X86_64
> -static int pvclock_task_migrate(struct notifier_block *nb, unsigned long l,
> -			        void *v)
> -{
> -	struct task_migration_notifier *mn = v;
> -	struct pvclock_vsyscall_time_info *pvti;
> -
> -	pvti = pvclock_get_vsyscall_user_time_info(mn->from_cpu);
> -
> -	/* this is NULL when pvclock vsyscall is not initialized */
> -	if (unlikely(pvti == NULL))
> -		return NOTIFY_DONE;
> -
> -	pvti->migrate_count++;
> -
> -	return NOTIFY_DONE;
> -}
> -
> -static struct notifier_block pvclock_migrate = {
> -	.notifier_call = pvclock_task_migrate,
> -};
> -
>  /*
>   * Initialize the generic pvclock vsyscall state.  This will allocate
>   * a/some page(s) for the per-vcpu pvclock information, set up a
> @@ -181,17 +142,12 @@ int __init pvclock_init_vsyscall(struct pvclock_vsyscall_time_info *i,
>  
>  	WARN_ON (size != PVCLOCK_VSYSCALL_NR_PAGES*PAGE_SIZE);
>  
> -	pvclock_vdso_info = i;
> -
>  	for (idx = 0; idx <= (PVCLOCK_FIXMAP_END-PVCLOCK_FIXMAP_BEGIN); idx++) {
>  		__set_fixmap(PVCLOCK_FIXMAP_BEGIN + idx,
>  			     __pa(i) + (idx*PAGE_SIZE),
>  			     PAGE_KERNEL_VVAR);
>  	}
>  
> -
> -	register_task_migration_notifier(&pvclock_migrate);
> -
>  	return 0;
>  }
>  #endif
> diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/vdso/vclock_gettime.c
> index c74436e..72074d5 100644
> --- a/arch/x86/vdso/vclock_gettime.c
> +++ b/arch/x86/vdso/vclock_gettime.c
> @@ -85,15 +85,18 @@ static notrace cycle_t vread_pvclock(int *mode)
>  	cycle_t ret;
>  	u64 last;
>  	u32 version;
> -	u32 migrate_count;
>  	u8 flags;
>  	unsigned cpu, cpu1;
>  
>  
>  	/*
> -	 * When looping to get a consistent (time-info, tsc) pair, we
> -	 * also need to deal with the possibility we can switch vcpus,
> -	 * so make sure we always re-fetch time-info for the current vcpu.
> +	 * Note: hypervisor must guarantee that:
> +	 * 1. cpu ID number maps 1:1 to per-CPU pvclock time info.
> +	 * 2. that per-CPU pvclock time info is updated if the
> +	 *    underlying CPU changes.
> +	 * 3. that version is increased whenever underlying CPU
> +	 *    changes.
> +	 *
>  	 */
>  	do {
>  		cpu = __getcpu() & VGETCPU_CPU_MASK;
> @@ -104,8 +107,6 @@ static notrace cycle_t vread_pvclock(int *mode)
>  
>  		pvti = get_pvti(cpu);
>  
> -		migrate_count = pvti->migrate_count;
> -
>  		version = __pvclock_read_cycles(&pvti->pvti, &ret, &flags);
>  
>  		/*
> @@ -117,8 +118,7 @@ static notrace cycle_t vread_pvclock(int *mode)
>  		cpu1 = __getcpu() & VGETCPU_CPU_MASK;
>  	} while (unlikely(cpu != cpu1 ||
>  			  (pvti->pvti.version & 1) ||
> -			  pvti->pvti.version != version ||
> -			  pvti->migrate_count != migrate_count));
> +			  pvti->pvti.version != version));
>  
>  	if (unlikely(!(flags & PVCLOCK_TSC_STABLE_BIT)))
>  		*mode = VCLOCK_NONE;
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 178a8d9..217ce4b 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -107,14 +107,6 @@ extern unsigned long this_cpu_load(void);
>  extern void calc_global_load(unsigned long ticks);
>  extern void update_cpu_load_nohz(void);
>  
> -/* Notifier for when a task gets migrated to a new CPU */
> -struct task_migration_notifier {
> -	struct task_struct *task;
> -	int from_cpu;
> -	int to_cpu;
> -};
> -extern void register_task_migration_notifier(struct notifier_block *n);
> -
>  extern unsigned long get_parent_ip(unsigned long addr);
>  
>  extern void dump_cpu_task(int cpu);
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index e8b3350..122e499 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -974,13 +974,6 @@ void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
>  		rq->skip_clock_update = 1;
>  }
>  
> -static ATOMIC_NOTIFIER_HEAD(task_migration_notifier);
> -
> -void register_task_migration_notifier(struct notifier_block *n)
> -{
> -	atomic_notifier_chain_register(&task_migration_notifier, n);
> -}
> -
>  #ifdef CONFIG_SMP
>  void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
>  {
> @@ -1011,18 +1004,10 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
>  	trace_sched_migrate_task(p, new_cpu);
>  
>  	if (task_cpu(p) != new_cpu) {
> -		struct task_migration_notifier tmn;
> -
>  		if (p->sched_class->migrate_task_rq)
>  			p->sched_class->migrate_task_rq(p, new_cpu);
>  		p->se.nr_migrations++;
>  		perf_sw_event(PERF_COUNT_SW_CPU_MIGRATIONS, 1, NULL, 0);
> -
> -		tmn.task = p;
> -		tmn.from_cpu = task_cpu(p);
> -		tmn.to_cpu = new_cpu;
> -
> -		atomic_notifier_call_chain(&task_migration_notifier, 0, &tmn);
>  	}
>  
>  	__set_task_cpu(p, new_cpu);
> 

--
			Gleb.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] remove sched notifier for cross-cpu migrations
  2013-07-11  1:21 ` [PATCH] remove sched " Marcelo Tosatti
                     ` (2 preceding siblings ...)
  2013-07-14 11:16   ` Gleb Natapov
@ 2013-08-23 10:22   ` Peter Zijlstra
  2013-08-23 10:31     ` Gleb Natapov
  3 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2013-08-23 10:22 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: mingo, gleb, glommer, tglx, jeremy, rostedt, linux-kernel

On Wed, Jul 10, 2013 at 10:21:57PM -0300, Marcelo Tosatti wrote:
> 
> Linux as a guest on KVM hypervisor, the only user of the pvclock      
> vsyscall interface, does not require notification on task migration 
> because:
> 
> 1. cpu ID number maps 1:1 to per-CPU pvclock time info.
> 2. per-CPU pvclock time info is updated if the
>    underlying CPU changes.
> 3. that version is increased whenever underlying CPU 
>    changes.
> 
> Which is sufficient to guarantee nanoseconds counter
> is calculated properly.
> 
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

What happened to this patch? I don't see it in my current tree, nor do I
find it in the kvm tree.



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] remove sched notifier for cross-cpu migrations
  2013-08-23 10:22   ` Peter Zijlstra
@ 2013-08-23 10:31     ` Gleb Natapov
  2013-08-23 10:34       ` Peter Zijlstra
  0 siblings, 1 reply; 11+ messages in thread
From: Gleb Natapov @ 2013-08-23 10:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Marcelo Tosatti, mingo, glommer, tglx, jeremy, rostedt, linux-kernel

On Fri, Aug 23, 2013 at 12:22:30PM +0200, Peter Zijlstra wrote:
> On Wed, Jul 10, 2013 at 10:21:57PM -0300, Marcelo Tosatti wrote:
> > 
> > Linux as a guest on KVM hypervisor, the only user of the pvclock      
> > vsyscall interface, does not require notification on task migration 
> > because:
> > 
> > 1. cpu ID number maps 1:1 to per-CPU pvclock time info.
> > 2. per-CPU pvclock time info is updated if the
> >    underlying CPU changes.
> > 3. that version is increased whenever underlying CPU 
> >    changes.
> > 
> > Which is sufficient to guarantee nanoseconds counter
> > is calculated properly.
> > 
> > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> 
> What happened to this patch? I don't see it in my current tree, nor do I
> find it in the kvm tree.
> 
e04c5d76b0cfb66cadd900cf147526f2271884b8 in kvm.git next branch.

--
			Gleb.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] remove sched notifier for cross-cpu migrations
  2013-08-23 10:31     ` Gleb Natapov
@ 2013-08-23 10:34       ` Peter Zijlstra
  0 siblings, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2013-08-23 10:34 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Marcelo Tosatti, mingo, glommer, tglx, jeremy, rostedt, linux-kernel

On Fri, Aug 23, 2013 at 01:31:43PM +0300, Gleb Natapov wrote:
> On Fri, Aug 23, 2013 at 12:22:30PM +0200, Peter Zijlstra wrote:
> > On Wed, Jul 10, 2013 at 10:21:57PM -0300, Marcelo Tosatti wrote:
> > > 
> > > Linux as a guest on KVM hypervisor, the only user of the pvclock      
> > > vsyscall interface, does not require notification on task migration 
> > > because:
> > > 
> > > 1. cpu ID number maps 1:1 to per-CPU pvclock time info.
> > > 2. per-CPU pvclock time info is updated if the
> > >    underlying CPU changes.
> > > 3. that version is increased whenever underlying CPU 
> > >    changes.
> > > 
> > > Which is sufficient to guarantee nanoseconds counter
> > > is calculated properly.
> > > 
> > > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> > 
> > What happened to this patch? I don't see it in my current tree, nor do I
> > find it in the kvm tree.
> > 
> e04c5d76b0cfb66cadd900cf147526f2271884b8 in kvm.git next branch.

Awesome, thanks!

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2013-08-23 10:35 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-10 10:34 sched: add notifier for cross-cpu migrations Peter Zijlstra
2013-07-10 14:39 ` Ingo Molnar
2013-07-10 14:52   ` Peter Zijlstra
2013-07-11  1:21 ` [PATCH] remove sched " Marcelo Tosatti
2013-07-11  8:10   ` Gleb Natapov
2013-07-11 22:15     ` Marcelo Tosatti
2013-07-11  9:52   ` Peter Zijlstra
2013-07-14 11:16   ` Gleb Natapov
2013-08-23 10:22   ` Peter Zijlstra
2013-08-23 10:31     ` Gleb Natapov
2013-08-23 10:34       ` Peter Zijlstra

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.