From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756237AbdGKXxQ (ORCPT ); Tue, 11 Jul 2017 19:53:16 -0400 Received: from mga02.intel.com ([134.134.136.20]:14595 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755886AbdGKXxO (ORCPT ); Tue, 11 Jul 2017 19:53:14 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.40,347,1496127600"; d="scan'208";a="125401366" Date: Tue, 11 Jul 2017 16:54:50 -0700 (PDT) From: Shivappa Vikas X-X-Sender: vikas@vshiva-Udesk To: Thomas Gleixner cc: Vikas Shivappa , x86@kernel.org, linux-kernel@vger.kernel.org, hpa@zytor.com, peterz@infradead.org, ravi.v.shankar@intel.com, vikas.shivappa@intel.com, tony.luck@intel.com, fenghua.yu@intel.com, andi.kleen@intel.com Subject: Re: [PATCH 08/21] x86/intel_rdt/cqm: Add RMID(Resource monitoring ID) management In-Reply-To: Message-ID: References: <1498503368-20173-1-git-send-email-vikas.shivappa@linux.intel.com> <1498503368-20173-9-git-send-email-vikas.shivappa@linux.intel.com> User-Agent: Alpine 2.10 (DEB 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 3 Jul 2017, Thomas Gleixner wrote: > On Sun, 2 Jul 2017, Thomas Gleixner wrote: >> Thinking a bit more about that limbo mechanics. >> >> In case that a RMID was never used on a particular package, the state check >> forces an IPI on all packages unconditionally. That's suboptimal at least. >> >> We know on which package a given RMID was used, so we could restrict the >> checks to exactly these packages, but I'm not sure it's worth the >> trouble. We might at least document that and explain why this is >> implemented in that way. > > Second thoughts on that. The allocation logic is: > >> + if (list_empty(&rmid_free_lru)) { >> + ret = try_freeing_limbo_rmid(); >> + if (list_empty(&rmid_free_lru)) >> + return ret ? -ENOSPC : -EBUSY; >> + } >> + >> + entry = list_first_entry(&rmid_free_lru, >> + struct rmid_entry, list); >> + list_del(&entry->list); >> + >> + return entry->rmid; > > That means, the free list is used as the primary source. One of my boxes > has 143 RMIDs. So it only takes 142 mkdir/rmdir invocations to move all > RMIDs to the limbo list. On the next mkdir invocation the allocation goes > into the limbo path and the SMP function call has to walk the list with 142 > entries on ALL online domains whether they used the RMID or not! Would it be better if we do this in the MBM 1s overflow timer delayed_work? That is not in the interupt context. So we do a periodic flush of the limbo list and then mkdir fails with -EBUSY if list_empty(&free_list) && !list_empty(&limbo_list). To improve that - We may also include the optimization Tony suggested to skip the checks for RMIDs which are already checked to be < threshold (however that needs a domain mask like I mention below but may be we can just check the list here). > > That's bad enough already and the number of RMIDs will not become smaller; > it doubled from HSW to BDW ... > > The HPC and RT folks will love you for that - NOT! > > So this needs to be solved differently. > > Let's have a look at the context switch path first. That's the most > sensitive part of it. > > if (static_branch_likely(&rdt_mon_enable_key)) { > if (current->rmid) > newstate.rmid = current->rmid; > } > > That's optimized for the !monitoring case. So we can really penalize the > per task monitoring case. > > if (static_branch_likely(&rdt_mon_enable_key)) { > if (unlikely(current->rmid)) { > newstate.rmid = current->rmid; > __set_bit(newstate.rmid, this_cpu_ptr(rmid_bitmap)); > } > } > > Now in rmid_free() we can collect that information: > > cpumask_clear(&tmpmask); > cpumask_clear(rmid_entry->mask); > > cpus_read_lock(); > for_each_online_cpu(cpu) { > if (test_and_clear_bit(rmid, per_cpu_ptr(cpu, rmid_bitmap))) > cpumask_set(cpu, tmpmask); > } > > for_each_domain(d, resource) { > cpu = cpumask_any_and(d->cpu_mask, tmpmask); > if (cpu < nr_cpu_ids) > cpumask_set(cpu, rmid_entry->mask); When this cpu goes offline - the rmid_entry->mask needs an update. Otherwise, the work function would return true for if (!cpumask_test_cpu(cpu, rme->mask)) since the work may have been moved to a different cpu. So we really need a package mask ? or really a per-domain mask and for that we dont know the max domain number(which is why we use a list..) > } > > list_add(&rmid_entry->list, &limbo_list); > > for_each_cpu(cpu, rmid_entry->mask) > schedule_delayed_work_on(cpu, rmid_work); > cpus_read_unlock(); > > The work function: > > boot resched = false; > > list_for_each_entry(rme, limbo_list,...) { > if (!cpumask_test_cpu(cpu, rme->mask)) > continue; > > if (!rmid_is_reusable(rme)) { > resched = true; > continue; > } > > cpumask_clear_cpu(cpu, rme->mask); > if (!cpumask_empty(rme->mask)) > continue; > > /* Ready for reuse */ > list_del(rme->list); > list_add(&rme->list, &free_list); > } > > The alloc function then becomes: > > if (list_empty(&free_list)) > return list_empty(&limbo_list) ? -ENOSPC : -EBUSY; > > The switch_to() covers the task rmids. The per cpu default rmids can be > marked at the point where they are installed on a CPU in the per cpu > rmid_bitmap. The free path is the same for per task and per cpu. > > Another thing which needs some thought it the CPU hotplug code. We need to > make sure that pending work which is scheduled on an outgoing CPU is moved > in the offline callback to a still online CPU of the same domain and not > moved to some random CPU by the workqueue hotplug code. > > There is another subtle issue. Assume a RMID is freed. The limbo stuff is > scheduled on all domains which have online CPUs. > > Now the last CPU of a domain goes offline before the threshold for clearing > the domain CPU bit in the rme->mask is reached. > > So we have two options here: > > 1) Clear the bit unconditionally when the last CPU of a domain goes > offline. > > 2) Arm a timer which clears the bit after a grace period > > #1 The RMID might become available for reuse right away because all other > domains have not used it or have cleared their bits already. > > If one of the CPUs of that domain comes online again and is associated > to that reused RMID again, then the counter content might still contain > leftovers from the previous usage. > > #2 Prevents #1 but has it's own issues vs. serialization and coordination > with CPU hotplug. > > I'd say we go for #1 as the simplest solution, document it and if really > the need arises revisit it later. > > Thanks, > > tglx >