From mboxrd@z Thu Jan 1 00:00:00 1970 From: Andrew Cooper Subject: Re: [PATCH v2 12/30] xen/x86: Generate deep dependencies of features Date: Mon, 15 Feb 2016 15:28:57 +0000 Message-ID: <56C1EEB9.5020206@citrix.com> References: <1454679743-18133-1-git-send-email-andrew.cooper3@citrix.com> <1454679743-18133-13-git-send-email-andrew.cooper3@citrix.com> <56C1E96502000078000D225B@prv-mh.provo.novell.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <56C1E96502000078000D225B@prv-mh.provo.novell.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Jan Beulich Cc: Xen-devel List-Id: xen-devel@lists.xenproject.org On 15/02/16 14:06, Jan Beulich wrote: >>>> On 05.02.16 at 14:42, wrote: >> @@ -20,12 +21,34 @@ uint32_t __read_mostly hvm_featureset[FSCAPINTS]; >> >> static void sanitise_featureset(uint32_t *fs) >> { >> + uint32_t disabled_features[FSCAPINTS]; >> unsigned int i; >> >> for ( i = 0; i < FSCAPINTS; ++i ) >> { >> /* Clamp to known mask. */ >> fs[i] &= known_features[i]; >> + >> + /* >> + * Identify which features with deep dependencies have been >> + * disabled. >> + */ >> + disabled_features[i] = ~fs[i] & deep_features[i]; >> + } >> + >> + for_each_set_bit(i, (void *)disabled_features, >> + sizeof(disabled_features) * 8) >> + { >> + const uint32_t *dfs = lookup_deep_deps(i); >> + unsigned int j; >> + >> + ASSERT(dfs); /* deep_features[] should guarentee this. */ >> + >> + for ( j = 0; j < FSCAPINTS; ++j ) >> + { >> + fs[j] &= ~dfs[j]; >> + disabled_features[j] &= ~dfs[j]; >> + } >> } > Am I getting the logic in the Python script right that it is indeed > unnecessary for this loop to be restarted even when a feature > at a higher numbered bit position results in a lower numbered > bit getting cleared? Correct - the python logic flattens the dependency tree so an individual lookup_deep_deps() will get you all features eventually influenced by feature i. It might be the deep features for i includes a feature numerically lower than i, but because dfs[] contains all features on the eventual chain, we don't need to start back from 0 again. I felt this was far more productive to code at build time, rather than at runtime. > >> @@ -153,6 +176,36 @@ void calculate_featuresets(void) >> calculate_hvm_featureset(); >> } >> >> +const uint32_t *lookup_deep_deps(uint32_t feature) > Do you really mean this rather internal function to be non-static? (You have found the answer already) > Even if there was a later use in this series, it would seem suspicious > to me; in fact I'd have expected for it and ... > >> +{ >> + static const struct { >> + uint32_t feature; >> + uint32_t fs[FSCAPINTS]; >> + } deep_deps[] = INIT_DEEP_DEPS; > ... this data to again be placed in .init.* sections. For this series, I think it can be .init. For the future cpuid changes, I am going to need to use it for validation during the set_cpuid hypercall, so will have to move to not being .init for that. > >> --- a/xen/tools/gen-cpuid.py >> +++ b/xen/tools/gen-cpuid.py >> @@ -138,6 +138,61 @@ def crunch_numbers(state): >> state.hvm_shadow = featureset_to_uint32s(state.raw_hvm_shadow, nr_entries) >> state.hvm_hap = featureset_to_uint32s(state.raw_hvm_hap, nr_entries) >> >> + deps = { >> + XSAVE: >> + (XSAVEOPT, XSAVEC, XGETBV1, XSAVES, AVX, MPX), >> + >> + AVX: >> + (FMA, FMA4, F16C, AVX2, XOP), > I continue to question whether namely XOP, but perhaps also the > others here except maybe AVX2, really is depending on AVX, and > not just on XSAVE. I am sure we have had this argument before. All VEX encoded SIMD instructions (including XOP which is listed in the same category by AMD) are specified to act on 256bit AVX state, and require AVX enabled in xcr0 to avoid #UD faults. This includes VEX instructions encoding %xmm registers, which explicitly zero the upper 128bits of the associated %ymm register. This is very clearly a dependency on AVX, even if it isn't written in one clear concise statement in the instruction manuals. > >> + PAE: >> + (LM, ), >> + >> + LM: >> + (CX16, LAHF_LM, PAGE1GB), >> + >> + XMM: >> + (LM, ), >> + >> + XMM2: >> + (LM, ), >> + >> + XMM3: >> + (LM, ), > I don't think so - SSE2 is a commonly implied prereq for long mode, > but not SSE3. Instead what I'm missing is a dependency of SSEn, > AES, SHA and maybe more on SSE (or maybe directly FXSR as per > above), and of SSE on FXSR. And there may be more, like MMX > really ought to be dependent on FPU or FXSR (not currently > expressable as it seems). I will double check all of these. ~Andrew