linux-kernel.vger.kernel.org archive mirror
From: "Li, Aubrey" <aubrey.li@linux.intel.com>
To: Dave Hansen <dave.hansen@intel.com>,
	Aubrey Li <aubrey.li@intel.com>,
	tglx@linutronix.de, mingo@redhat.com, peterz@infradead.org,
	hpa@zytor.com
Cc: ak@linux.intel.com, tim.c.chen@linux.intel.com,
	arjan@linux.intel.com, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH v2 1/2] x86/fpu: detect AVX task
Date: Tue, 13 Nov 2018 14:42:38 +0800
Message-ID: <9bcd88eb-9433-debf-b831-29e064e87522@linux.intel.com>
In-Reply-To: <b11a5599-fda4-8747-8878-e743b8a501e0@intel.com>

On 2018/11/12 23:46, Dave Hansen wrote:
> On 11/11/18 9:38 PM, Li, Aubrey wrote:
>
>>> Do we want this, or do we want something more time-based?
>>>
>> This counter is introduced here to solve the race between context switch and
>> VZEROUPPER. 3 context switches mean the same thread went on and off the CPU
>> 3 times. Due to scheduling latency, within 3 jiffies the AVX task might go
>> on and off the CPU just 1 time. So IMHO the context switch count is better here.
> 
> Imagine we have a HZ=1000 system where AVX_STATE_DECAY_COUNT=3.  That
> means that a task can be marked as a non-AVX-512-user after not using it
> for ~3 ms.  But, with HZ=250, that's ~12ms.

Looking at it from the other side: if we set a 4 ms decay, then with HZ=1000
that is 4 context switches, so we get 4 chances to keep the AVX state, that is,
we are able to filter out 4 init state resets. But with HZ=250 it is only 1
context switch, so we get only 1 chance to filter out an init state reset.
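
To make the arithmetic on both sides concrete, here is a minimal sketch. It
assumes the only context switches are the ones driven by the timer tick, one
per tick, which is a simplification. The helper names ticks_to_ms() and
ms_to_ticks() are illustrative only and are not part of the patch.

```c
#include <assert.h>

/*
 * Hypothetical helpers for illustration; not kernel API.
 * Assumes exactly one context switch per timer tick.
 */

/* Wall-clock span, in milliseconds, covered by `count` ticks at `hz`. */
static inline unsigned int ticks_to_ms(unsigned int count, unsigned int hz)
{
	assert(hz > 0);
	return count * 1000u / hz;
}

/* Number of whole ticks that fit into a decay window of `ms` milliseconds. */
static inline unsigned int ms_to_ticks(unsigned int ms, unsigned int hz)
{
	assert(hz > 0);
	return ms * hz / 1000u;
}
```

With a count of 3, ticks_to_ms(3, 1000) is 3 and ticks_to_ms(3, 250) is 12,
which is the spread in the quoted text. With a fixed 4 ms window,
ms_to_ticks(4, 1000) is 4 and ms_to_ticks(4, 250) is 1, which is the spread
described in the reply.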
> 
> Also, don't forget that we have context switches from the timer
> interrupt, but also from normal old operations that sleep.
> 
> Let's say our AVX-512 app was doing:
> 
> 	while (foo) {
> 		do_avx_512();
> 		read(pipe, buf, len);
> 		read(pipe, buf, len);
> 		read(pipe, buf, len);
> 	}
> 
> And all three pipe reads context-switched the task.  That loop could
> finish in way under 3HZ, but still end up in do_avx_512() each time with
> fpu...avx->state=0.

Yeah, we are making a prediction based on the historical pattern, so you can
always construct a pattern that defeats the prediction. But in practice, I
measured tensorflow with AVX512 enabled, linpack with AVX512, and a micro
benchmark, and the current decay of 3 context switches works well enough.
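
For readers following the thread, a count-based decay can be sketched in plain
C roughly as follows. This is a userspace model under stated assumptions, not
the code from the patch: struct avx_track, avx_track_update() and the in_use
argument are hypothetical names, and in_use stands for "the AVX-512 register
state was seen as in use, not yet reset to init state, at this context
switch". Only AVX_STATE_DECAY_COUNT=3 is taken from the discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define AVX_STATE_DECAY_COUNT 3	/* value discussed in this thread */

/* Hypothetical per-task tracking data; not the layout used by the patch. */
struct avx_track {
	bool state;			/* task is currently treated as an AVX-512 user */
	unsigned int decay_count;	/* clean context switches left before clearing */
};

/*
 * Model of the update done at each context switch.
 * Seeing the state in use marks the task and re-arms the counter.
 * Each switch without it consumes one count; the mark is dropped
 * when the counter reaches zero.
 */
static inline void avx_track_update(struct avx_track *t, bool in_use)
{
	assert(t != NULL);

	if (in_use) {
		t->state = true;
		t->decay_count = AVX_STATE_DECAY_COUNT;
	} else if (t->decay_count > 0 && --t->decay_count == 0) {
		t->state = false;
	}
}
```

In this model, one switch with in_use set followed by two switches without it
leaves the task marked, and a third clean switch clears the mark. That is the
behavior the pipe-read loop quoted above would trigger, independent of how
much wall-clock time has passed.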

> 
> BTW, I don't have a great solution for this.  I was just pointing out
> one of the pitfalls from using context switch counts so strictly.

I really don't think a time-based decay is better than a context switch count
in this case.

>>>> +/*
>>>>   * Highest level per task FPU state data structure that
>>>>   * contains the FPU register state plus various FPU
>>>>   * state fields:
>>>> @@ -303,6 +312,14 @@ struct fpu {
>>>>  	unsigned char			initialized;
>>>>  
>>>>  	/*
>>>> +	 * @avx_state:
>>>> +	 *
>>>> +	 * This data structure indicates whether this context
>>>> +	 * contains AVX states
>>>> +	 */
>>>
>>> Yeah, that's precisely what fpu->state.xsave.xfeatures does. :)
>> I see, will refine in the next version
> 
> One other thought about the new 'avx_state':
> 
> fxregs_state (which is a part of the XSAVE state) has some padding and
> 'sw_reserved' areas.  You *might* be able to steal some space there.
> Not that this is a huge space eater, but why waste the space if we don't
> have to?
> 

IMHO, I'd prefer not to add anything extra to a data structure that is tied to
a hardware table. Let me try to work out a new version and see if it addresses
your concerns.

Thanks,
-Aubrey


Thread overview: 11+ messages
2018-11-07 17:16 [RFC PATCH v2 1/2] x86/fpu: detect AVX task Aubrey Li
2018-11-07 17:16 ` [RFC PATCH v2 2/2] proc: add /proc/<pid>/thread_state Aubrey Li
2018-11-09 11:21 ` [RFC PATCH v2 1/2] x86/fpu: detect AVX task Thomas Gleixner
2018-11-12  1:40   ` Li, Aubrey
2018-11-13 10:25     ` David Laight
2018-11-13 13:06       ` Li, Aubrey
2018-11-13 14:56         ` David Laight
2018-11-12  2:32 ` Dave Hansen
2018-11-12  5:38   ` Li, Aubrey
2018-11-12 15:46     ` Dave Hansen
2018-11-13  6:42       ` Li, Aubrey [this message]
