From: Charles Wang <muming.wq@gmail.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org, "Ingo Molnar" <mingo@redhat.com>,
	"Tao Ma" <tm@tao.ma>, 含黛 <handai.szj@taobao.com>,
	"Doug Smythies" <dsmythies@telus.net>,
	"Thomas Gleixner" <tglx@linutronix.de>
Subject: Re: [PATCH] sched: Folding nohz load accounting more accurate
Date: Tue, 19 Jun 2012 14:24:47 +0800	[thread overview]
Message-ID: <4FE01B2F.9050805@gmail.com> (raw)
In-Reply-To: <1340035417.15222.95.camel@twins>

On Tuesday, June 19, 2012 12:03 AM, Peter Zijlstra wrote:

> Hi Charles,
> 
> I'm having difficulties understanding your exact meaning, I suspect its
> a language thing, so please excuse me for nit-picking through your
> email.
> 
> 
> On Fri, 2012-06-15 at 22:27 +0800, Charles Wang wrote:
> 
>> In our mind 
> 
> Are there more people involved?
> 
>> per-cpu sampling for cpu idle and non-idle is equal. But
>> actually may not.
> 
> Are you saying they should be equal but are in fact not so?



That's it. When a cpu enters idle, its tick is stopped, so no sample
can be taken on that idle cpu. The idle delta does get folded and
eventually accounted into the global calc_load_tasks[1], but
calc_load_update is not advanced on this cpu, so the sample is only
taken after idle exits[2]. The real load for that sampling point should
be the load at the moment this cpu went idle ([1]); the sample taken
after idle exits [2] is wrong. That is what I meant by "sampling time
line fix"; it wasn't clear before. @_@
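
To make the time line concrete, here is a rough sketch of the
tick-driven per-cpu fold (simplified, not the exact code in
kernel/sched/core.c, but the shape is the same if I read it right):

static void calc_load_account_active(struct rq *this_rq)
{
	long delta;

	/* only one sample per LOAD_FREQ window, taken on this cpu's own tick */
	if (time_before(jiffies, this_rq->calc_load_update))
		return;

	/* delta of nr_running + nr_uninterruptible since the last fold */
	delta = calc_load_fold_active(this_rq);
	if (delta)
		atomic_long_add(delta, &calc_load_tasks);

	this_rq->calc_load_update += LOAD_FREQ;
}

When the tick is stopped this is never called on the idle cpu, so the
first call after nohz exit samples the woken-up busy state instead of
the idle state at [1].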

> 
> I think we can all agree on this. Doug has illustrated this quite
> clearly.
> 
> The desire is for CONFIG_NOHZ=n,y to function identically, but its been
> clearly demonstrated this is currently not the case.
> 
>>  For non-idle cpu sampling, it's right the load when
>> sampling. 
> 
> Agreed, sampling of a busy cpu is identical.
> 
>> But for idle, cause of nohz, the sampling will be delayed to
>> nohz exit(less than 1 tick after nohz exit). 
> 
> I don't think the nohz exit code calls into the load sampling, but I
> think you're saying we'll get a sample tick after we leave nohz, right?

yes

> 
> This is only so if the busy period covers a tick, that is, if we wake
> and go back to idle before a tick happens we'll still not get sampled.
> 


> 
>   tick          tick
>     |----====-----|
>          ^   ^
>        wake  sleep
>

This is the key point. The sample on this cpu should count as idle
load, but because of idle the sampling is delayed to the wrong place.

> 
> Shows a nohz-exit busy period not sampled.
> 
>> Nohz exit is always caused
>> by processes woken up--non-idle model. It's not fair here, idle
>> calculated to non-idle.
>>
>>      time-expect-sampling
>>                    |    time-do-sampling
>>                    |         |
>>                    V         V
>> -|-------------------------|--
>> start_nohz              stop_nohz
> 
> I don't think the delay in sampling is the biggest problem, I think the
> problem is the direct interaction between a cpu going idle and another
> cpu taking a sample.



Maybe "delay" is not the exactly, this sampling is totally wrong.

"A cpu going idle and another cpu taking a sample" is the premise,
missing the right sampling time and taking another wrong sample after
idle exits is the real reason.

> 
> So the approach I took was to isolate the going idle before the sample
> window from going idle during (and after) the sampling window.
> 
> Therefore any going idle activity will not affect the sampled of other
> cpus. The only trick is the slight shift in index flip for read vs
> write.
> 
> 
>   0             5             10            15
>     +10           +10           +10           +10
>   |-|-----------|-|-----------|-|-----------|-|
> 
> r:001           110           001           110
> w:011           100           011           100
> 

Using the index flip is a wonderful plan. Simple and effective.
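
If I understand the flip correctly, the idea in miniature (a
stand-alone user-space model, not the patch itself; 'window_open'
stands for !time_before(jiffies, calc_load_update)):

#include <stdio.h>

static long idle_delta[2];	/* models calc_load_idle[2] */
static int load_idx;		/* models calc_load_idx     */

/*
 * A cpu going idle folds its delta; once the fold window has opened it
 * writes into the *next* slot, so this window only sees old idle load.
 */
static void enter_idle(long delta, int window_open)
{
	idle_delta[(load_idx + window_open) & 1] += delta;
}

/* The fold always drains the current slot. */
static long fold_idle(void)
{
	int r = load_idx & 1;
	long delta = idle_delta[r];

	idle_delta[r] = 0;
	return delta;
}

int main(void)
{
	enter_idle(3, 0);	/* went idle before the window opened */
	enter_idle(2, 1);	/* went idle after the window opened  */
	printf("this fold: %ld (expect 3)\n", fold_idle());
	load_idx++;		/* index flips after the fold         */
	printf("next fold: %ld (expect 2)\n", fold_idle());
	return 0;
}

So the new idle load only becomes visible one window later, exactly as
the r/w table above shows.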

> 
> Shows we'll read the old idle load, but write to the new idle load
> during the sample window. Thus including the old idle load in this
> sample fold, but leaving new activity for the next.


> 
> A cpu waking up and doing a sample is similar to the cpu being busy at
> the window start.
> 
> However since this window is 10 ticks long and any busy spanning a tick
> will make it appear 'busy' we can only accurately sample loads of up to
> HZ/(2*10) (2 for the sampling theorem). For a regular HZ=1000 kernel
> this ends up being 50 Hz.
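
(If I follow the arithmetic: with HZ=250 the same bound is
250/(2*10) = 12.5 Hz, and with HZ=100 only 5 Hz, so the limit gets
quite low on low-HZ configs.)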

> 
> Higher frequency workloads will appear to over-account.
> 
> Now the whole reason we have this window of 10 ticks is that we're
> trying to do a completely asynchronous load computation trying to avoid
> as much serialization as possible. So have each cpu account its local
> state and hope that 10 ticks is sufficient for all to have reported in,
> then process the fold.
> 
> The 10 tick window is directly related to the worst irq-off latency on
> your machine, if you keep IRQs disabled for a few ticks -- something
> that quite easily happens on large machines, even a busy cpu will be
> 'late' reporting its load. I think the current 10 tick came from an SGI
> machine with 4k cpus or so.
> 
> 
> Hmmm,.. idea.. OK, so we should also have a hook coming out of NOHZ
> state, when we come out of NOHZ state during the sample window, simply
> push the whole window fwd to the next time.

These two hooks can take over what my patch does. I tried this before,
but couldn't find the right place :(. So I went with the simpler
solution.

> 

> This finds another bug in the current code.. A cpu that has idled 'long'
> could be multiple LOAD_FREQ intervals behind and will take multiple
> samples in quick succession instead of 5s apart.
> 
> 
> Can someone please think through the below thing? its been compile
> tested only...
> 
> ---
>  kernel/sched/core.c      |  290 ++++++++++++++++++++++++++++++++++------------
>  kernel/sched/idle_task.c |    1 -
>  kernel/sched/sched.h     |    2 -
>  kernel/time/tick-sched.c |    2 +
>  4 files changed, 220 insertions(+), 75 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index d5594a4..3a49ee1 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2161,11 +2161,72 @@ unsigned long this_cpu_load(void)
>  }
>  
>  
> +/*
> + * global load-average calculations
> + *
> + * We take a distributed and async approach to calculating the global load-avg
> + * in order to minimize overhead.
> + *
> + * The global load average is an exponentially decaying average of nr_running +
> + * nr_uninterruptible.
> + *
> + * Once every LOAD_FREQ:
> + *
> + *   nr_active = 0;
> + *   for_each_possible_cpu(cpu)
> + *   	nr_active += cpu_of(cpu)->nr_running + cpu_of(cpu)->nr_uninterruptible;
> + *
> + *   avenrun[n] = avenrun[0] * exp_n + nr_active * (1 - exp_n)
> + *
> + * Due to a number of reasons the above turns in the mess below:
> + *
> + *  - for_each_possible_cpu() is prohibitively expensive on machines with
> + *    serious number of cpus, therefore we need to take a distributed approach
> + *    to calculating nr_active.
> + *
> + *        \Sum_i x_i(t) = \Sum_i x_i(t) - x_i(t_0) | x_i(t_0) := 0
> + *                      = \Sum_i { \Sum_j=1 x_i(t_j) - x_i(t_j-1) }
> + *
> + *    So assuming nr_active := 0 when we start out -- true per definition, we
> + *    can simply take per-cpu deltas and fold those into a global accumulate
> + *    to obtain the same result. See calc_load_fold_active().
> + *
> + *    Furthermore, in order to avoid synchronizing all per-cpu delta folding
> + *    across the machine, we assume 10 ticks is sufficient time for every
> + *    cpu to have completed this task.
> + *
> + *    This places an upper-bound on the IRQ-off latency of the machine. 
> + *
> + *  - cpu_rq()->nr_uninterruptible isn't accurately tracked per-cpu because
> + *    this would add another cross-cpu cacheline miss and atomic operation
> + *    to the wakeup path. Instead we increment on whatever cpu the task ran
> + *    when it went into uninterruptible state and decrement on whatever cpu
> + *    did the wakeup. This means that only the sum of nr_uninterruptible over
> + *    all cpus yields the correct result.
> + *
> + *  This covers the NO_HZ=n code, for extra head-aches, see the comment below.
> + */
> +
>  /* Variables and functions for calc_load */
>  static atomic_long_t calc_load_tasks;
>  static unsigned long calc_load_update;
>  unsigned long avenrun[3];
> -EXPORT_SYMBOL(avenrun);
> +EXPORT_SYMBOL(avenrun); /* should be removed */
> +
> +/**
> + * get_avenrun - get the load average array
> + * @loads:	pointer to dest load array
> + * @offset:	offset to add
> + * @shift:	shift count to shift the result left
> + *
> + * These values are estimates at best, so no need for locking.
> + */
> +void get_avenrun(unsigned long *loads, unsigned long offset, int shift)
> +{
> +	loads[0] = (avenrun[0] + offset) << shift;
> +	loads[1] = (avenrun[1] + offset) << shift;
> +	loads[2] = (avenrun[2] + offset) << shift;
> +}
>  
>  static long calc_load_fold_active(struct rq *this_rq)
>  {
> @@ -2182,6 +2243,9 @@ static long calc_load_fold_active(struct rq *this_rq)
>  	return delta;
>  }
>  
> +/*
> + * a1 = a0 * e + a * (1 - e)
> + */
>  static unsigned long
>  calc_load(unsigned long load, unsigned long exp, unsigned long active)
>  {
> @@ -2193,30 +2257,117 @@ calc_load(unsigned long load, unsigned long exp, unsigned long active)
>  
>  #ifdef CONFIG_NO_HZ
>  /*
> - * For NO_HZ we delay the active fold to the next LOAD_FREQ update.
> + * Handle NO_HZ for the global load-average.
> + *
> + * Since the above described distributed algorithm to compute the global
> + * load-average relies on per-cpu sampling from the tick, it is affected by
> + * NO_HZ.
> + *
> + * The basic idea is to fold the nr_active delta into a global idle load upon
> + * entering NO_HZ state such that we can include this as an 'extra' cpu delta
> + * when we read the global state.
> + *
> + * Obviously reality has to ruin such a delightfully simple scheme:
> + *
> + *  - When we go NO_HZ idle during the window, we can negate our sample
> + *    contribution, causing under-accounting.
> + *
> + *    We avoid this by keeping two idle-delta counters and flipping them
> + *    when the window starts, thus separating old and new NO_HZ load.
> + *
> + *    The only trick is the slight shift in index flip for read vs write.
> + *
> + *       0             5             10            15
> + *         +10           +10           +10           +10
> + *       |-|-----------|-|-----------|-|-----------|-|
> + *    r:001           110           001           110
> + *    w:011           100           011           100
> + *
> + *    This ensures we'll fold the old idle contribution in this window while
> + *    accumlating the new one.
> + *
> + *  - When we wake up from NO_HZ idle during the window, we push up our
> + *    contribution, since we effectively move our sample point to a known
> + *    busy state.
> + *
> + *    This is solved by pushing the window forward, and thus skipping the
> + *    sample, for this cpu (effectively using the idle-delta for this cpu which
> + *    was in effect at the time the window opened). This also solves the issue
> + *    of having to deal with a cpu having been in NOHZ idle for multiple
> + *    LOAD_FREQ intervals.
>   *
>   * When making the ILB scale, we should try to pull this in as well.
>   */
> -static atomic_long_t calc_load_tasks_idle;
> +static atomic_long_t calc_load_idle[2];
> +static int calc_load_idx;
>  
> -void calc_load_account_idle(struct rq *this_rq)
> +static inline int calc_load_write_idx(void)
>  {
> +	int idx = calc_load_idx;
> +
> +	/*
> +	 * See calc_global_nohz(), if we observe the new index, we also
> +	 * need to observe the new update time.
> +	 */
> +	smp_rmb();
> +
> +	/*
> +	 * If the folding window started, make sure we start writing in the
> +	 * next idle-load delta.
> +	 */
> +	if (!time_before(jiffies, calc_load_update))
> +		idx++;

Can we just take calc_load_update as the start of the window here?
Won't the tick happen at different times on different cpus?

> +
> +	return idx & 1;
> +}
> +
> +static inline int calc_load_read_idx(void)
> +{
> +	return calc_load_idx & 1;
> +}
> +
> +void calc_load_enter_idle(void)
> +{
> +	struct rq *this_rq = this_rq();
>  	long delta;
> +	int idx;
>  
> +	/*
> +	 * We're going into NOHZ mode, if there's any pending delta, fold it
> +	 * into the pending idle delta.
> +	 */
>  	delta = calc_load_fold_active(this_rq);
> -	if (delta)
> -		atomic_long_add(delta, &calc_load_tasks_idle);
> +	if (delta) {
> +		idx = calc_load_write_idx();
> +		atomic_long_add(delta, &calc_load_idle[idx]);
> +	}
>  }
>  
> -static long calc_load_fold_idle(void)
> +void calc_load_exit_idle(void)
>  {
> -	long delta = 0;
> +	struct rq *this_rq = this_rq();
>  
>  	/*
> -	 * Its got a race, we don't care...
> +	 * If we're still outside the sample window, we're done.
>  	 */
> -	if (atomic_long_read(&calc_load_tasks_idle))
> -		delta = atomic_long_xchg(&calc_load_tasks_idle, 0);
> +	if (time_before(jiffies, this_rq->calc_load_update))
> +		return;
> +
> +	/*
> +	 * We woke inside or after the sample window, this means another cpu
> +	 * likely already accounted us through the nohz accounting, so skip the
> +	 * entire deal and sync up for the next window.
> +	 */
> +	this_rq->calc_load_update = calc_load_update + LOAD_FREQ;
> +}
> +
> +static long calc_load_fold_idle(void)
> +{
> +	int idx = calc_load_read_idx();
> +	long delta = 0;
> +
> +	if (atomic_long_read(&calc_load_idle[idx]))
> +		delta = atomic_long_xchg(&calc_load_idle[idx], 0);
>  
>  	return delta;
>  }
> @@ -2302,66 +2453,39 @@ static void calc_global_nohz(void)
>  {
>  	long delta, active, n;
>  
> -	/*
> -	 * If we crossed a calc_load_update boundary, make sure to fold
> -	 * any pending idle changes, the respective CPUs might have
> -	 * missed the tick driven calc_load_account_active() update
> -	 * due to NO_HZ.
> -	 */
> -	delta = calc_load_fold_idle();
> -	if (delta)
> -		atomic_long_add(delta, &calc_load_tasks);
> -
> -	/*
> -	 * It could be the one fold was all it took, we done!
> -	 */
> -	if (time_before(jiffies, calc_load_update + 10))
> -		return;
> -
> -	/*
> -	 * Catch-up, fold however many we are behind still
> -	 */
> -	delta = jiffies - calc_load_update - 10;
> -	n = 1 + (delta / LOAD_FREQ);
> +	if (!time_before(jiffies, calc_load_update + 10)) {
> +		/*
> +		 * Catch-up, fold however many we are behind still
> +		 */
> +		delta = jiffies - calc_load_update - 10;
> +		n = 1 + (delta / LOAD_FREQ);
>  
> -	active = atomic_long_read(&calc_load_tasks);
> -	active = active > 0 ? active * FIXED_1 : 0;
> +		active = atomic_long_read(&calc_load_tasks);
> +		active = active > 0 ? active * FIXED_1 : 0;
>  
> -	avenrun[0] = calc_load_n(avenrun[0], EXP_1, active, n);
> -	avenrun[1] = calc_load_n(avenrun[1], EXP_5, active, n);
> -	avenrun[2] = calc_load_n(avenrun[2], EXP_15, active, n);
> +		avenrun[0] = calc_load_n(avenrun[0], EXP_1, active, n);
> +		avenrun[1] = calc_load_n(avenrun[1], EXP_5, active, n);
> +		avenrun[2] = calc_load_n(avenrun[2], EXP_15, active, n);
>  
> -	calc_load_update += n * LOAD_FREQ;
> -}
> -#else
> -void calc_load_account_idle(struct rq *this_rq)
> -{
> -}
> +		calc_load_update += n * LOAD_FREQ;
> +	}
>  
> -static inline long calc_load_fold_idle(void)
> -{
> -	return 0;
> +	/*
> +	 * Flip the idle index...
> +	 *
> +	 * Make sure we first write the new time then flip the index, so that
> +	 * calc_load_write_idx() will see the new time when it reads the new
> +	 * index, this avoids a double flip messing things up.
> +	 */
> +	smp_wmb();
> +	calc_load_idx++;
>  }
> +#else /* !CONFIG_NO_HZ */
>  
> -static void calc_global_nohz(void)
> -{
> -}
> -#endif
> +static inline long calc_load_fold_idle(void) { return 0; }
> +static inline void calc_global_nohz(void) { }
>  
> -/**
> - * get_avenrun - get the load average array
> - * @loads:	pointer to dest load array
> - * @offset:	offset to add
> - * @shift:	shift count to shift the result left
> - *
> - * These values are estimates at best, so no need for locking.
> - */
> -void get_avenrun(unsigned long *loads, unsigned long offset, int shift)
> -{
> -	loads[0] = (avenrun[0] + offset) << shift;
> -	loads[1] = (avenrun[1] + offset) << shift;
> -	loads[2] = (avenrun[2] + offset) << shift;
> -}
> +#endif /* CONFIG_NO_HZ */
>  
>  /*
>   * calc_load - update the avenrun load estimates 10 ticks after the
> @@ -2369,11 +2493,35 @@ void get_avenrun(unsigned long *loads, unsigned long offset, int shift)
>   */
>  void calc_global_load(unsigned long ticks)
>  {
> -	long active;
> +	long active, delta;
>  
>  	if (time_before(jiffies, calc_load_update + 10))
>  		return;
>  
> +	/*
> +	 * Fold the 'old' idle-delta to include all NO_HZ cpus.
> +	 *
> +	 *	cpu0	cpu1	cpu2	..
> +	 *
> +	 * >--- [sample A]
> +	 *
> +	 *			-> NOHZ
> +	 *		-> NOHZ
> +	 *	->NOHZ
> +	 *
> +	 * >--- [sample B]
> +	 *
> +	 * >--- [sample C]
> +	 *
> +	 *      NOHZ-> (here)
> +	 *
> +	 * Since all CPUs went into NOHZ state, all 'missed' samples (B, C)
> +	 * should include the folded idle-delta.
> +	 */
> +	delta += calc_load_fold_idle();
> +	if (delta)
> +		atomic_long_add(delta, &calc_load_tasks);
> +
>  	active = atomic_long_read(&calc_load_tasks);
>  	active = active > 0 ? active * FIXED_1 : 0;
>  
> @@ -2384,12 +2532,7 @@ void calc_global_load(unsigned long ticks)
>  	calc_load_update += LOAD_FREQ;
>  
>  	/*
> -	 * Account one period with whatever state we found before
> -	 * folding in the nohz state and ageing the entire idle period.
> -	 *
> -	 * This avoids loosing a sample when we go idle between 
> -	 * calc_load_account_active() (10 ticks ago) and now and thus
> -	 * under-accounting.
> +	 * In case we idled for multiple LOAD_FREQ intervals, catch up in bulk.
>  	 */
>  	calc_global_nohz();
>  }
> @@ -2406,7 +2549,6 @@ static void calc_load_account_active(struct rq *this_rq)
>  		return;
>  
>  	delta  = calc_load_fold_active(this_rq);
> -	delta += calc_load_fold_idle();
>  	if (delta)
>  		atomic_long_add(delta, &calc_load_tasks);
>  
> @@ -2414,6 +2556,10 @@ static void calc_load_account_active(struct rq *this_rq)
>  }
>  
>  /*
> + * End of global load-average stuff
> + */
> +
> +/*
>   * The exact cpuload at various idx values, calculated at every tick would be
>   * load = (2^idx - 1) / 2^idx * load + 1 / 2^idx * cur_load
>   *
> diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
> index b44d604..b6baf37 100644
> --- a/kernel/sched/idle_task.c
> +++ b/kernel/sched/idle_task.c
> @@ -25,7 +25,6 @@ static void check_preempt_curr_idle(struct rq *rq, struct task_struct *p, int fl
>  static struct task_struct *pick_next_task_idle(struct rq *rq)
>  {
>  	schedstat_inc(rq, sched_goidle);
> -	calc_load_account_idle(rq);
>  	return rq->idle;
>  }
>  
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 6d52cea..55844f2 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -942,8 +942,6 @@ static inline u64 sched_avg_period(void)
>  	return (u64)sysctl_sched_time_avg * NSEC_PER_MSEC / 2;
>  }
>  
> -void calc_load_account_idle(struct rq *this_rq);
> -
>  #ifdef CONFIG_SCHED_HRTICK
>  
>  /*
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 8699978..4a08472 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -406,6 +406,7 @@ static void tick_nohz_stop_sched_tick(struct tick_sched *ts)
>  		 */
>  		if (!ts->tick_stopped) {
>  			select_nohz_load_balancer(1);
> +			calc_load_enter_idle();
>  
>  			ts->idle_tick = hrtimer_get_expires(&ts->sched_timer);
>  			ts->tick_stopped = 1;
> @@ -597,6 +598,7 @@ void tick_nohz_idle_exit(void)
>  		account_idle_ticks(ticks);
>  #endif
>  
> +	calc_load_exit_idle();
>  	touch_softlockup_watchdog();
>  	/*
>  	 * Cancel the scheduled timer and restore the tick
> 
> 





