From: Shivappa Vikas <vikas.shivappa@intel.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>,
	x86@kernel.org, linux-kernel@vger.kernel.org, hpa@zytor.com,
	peterz@infradead.org, ravi.v.shankar@intel.com,
	vikas.shivappa@intel.com, tony.luck@intel.com,
	fenghua.yu@intel.com, andi.kleen@intel.com
Subject: Re: [PATCH 08/21] x86/intel_rdt/cqm: Add RMID(Resource monitoring ID) management
Date: Tue, 11 Jul 2017 16:54:50 -0700 (PDT)	[thread overview]
Message-ID: <alpine.DEB.2.10.1707111645080.6175@vshiva-Udesk> (raw)
In-Reply-To: <alpine.DEB.2.20.1707030954330.2188@nanos>



On Mon, 3 Jul 2017, Thomas Gleixner wrote:

> On Sun, 2 Jul 2017, Thomas Gleixner wrote:
>> Thinking a bit more about that limbo mechanics.
>>
>> In case that a RMID was never used on a particular package, the state check
>> forces an IPI on all packages unconditionally. That's suboptimal at least.
>>
>> We know on which package a given RMID was used, so we could restrict the
>> checks to exactly these packages, but I'm not sure it's worth the
>> trouble. We might at least document that and explain why this is
>> implemented in that way.
>
> Second thoughts on that. The allocation logic is:
>
>> +       if (list_empty(&rmid_free_lru)) {
>> +               ret = try_freeing_limbo_rmid();
>> +               if (list_empty(&rmid_free_lru))
>> +                       return ret ? -ENOSPC : -EBUSY;
>> +       }
>> +
>> +       entry = list_first_entry(&rmid_free_lru,
>> +                                struct rmid_entry, list);
>> +       list_del(&entry->list);
>> +
>> +       return entry->rmid;
>
> That means, the free list is used as the primary source. One of my boxes
> has 143 RMIDs. So it only takes 142 mkdir/rmdir invocations to move all
> RMIDs to the limbo list. On the next mkdir invocation the allocation goes
> into the limbo path and the SMP function call has to walk the list with 142
> entries on ALL online domains whether they used the RMID or not!

Would it be better if we do this in the MBM 1s overflow timer delayed_work? That 
is not in interrupt context. So we do a periodic flush of the limbo list, and 
then mkdir fails with -EBUSY if list_empty(&free_list) && 
!list_empty(&limbo_list).

To improve on that, we could also include the optimization Tony suggested: skip 
the checks for RMIDs which have already been found to be below the threshold 
(however, that needs a domain mask like I mention below, but maybe we can just 
check the list here).

>
> That's bad enough already and the number of RMIDs will not become smaller;
> it doubled from HSW to BDW ...
>
> The HPC and RT folks will love you for that - NOT!
>
> So this needs to be solved differently.
>
> Let's have a look at the context switch path first. That's the most
> sensitive part of it.
>
> 	if (static_branch_likely(&rdt_mon_enable_key)) {
> 		if (current->rmid)
> 			newstate.rmid = current->rmid;
> 	}
>
> That's optimized for the !monitoring case. So we can really penalize the
> per task monitoring case.
>
> 	if (static_branch_likely(&rdt_mon_enable_key)) {
> 		if (unlikely(current->rmid)) {
> 			newstate.rmid = current->rmid;
> 			__set_bit(newstate.rmid, this_cpu_ptr(rmid_bitmap));
> 		}
> 	}
>
> Now in rmid_free() we can collect that information:
>
> 	cpumask_clear(&tmpmask);
> 	cpumask_clear(rmid_entry->mask);
>
> 	cpus_read_lock();
> 	for_each_online_cpu(cpu) {
> 		if (test_and_clear_bit(rmid, per_cpu_ptr(rmid_bitmap, cpu)))
> 			cpumask_set_cpu(cpu, &tmpmask);
> 	}
>
> 	for_each_domain(d, resource) {
> 		cpu = cpumask_any_and(&d->cpu_mask, &tmpmask);
> 		if (cpu < nr_cpu_ids)
> 			cpumask_set_cpu(cpu, rmid_entry->mask);

When this cpu goes offline, rmid_entry->mask needs an update. Otherwise the 
work function would skip the entry because

              if (!cpumask_test_cpu(cpu, rme->mask))

evaluates true, since the work may have been moved to a different cpu.

So do we really need a package mask? Or rather a per-domain mask? For that we 
don't know the maximum domain number (which is why we use a list..).

> 	}
>
> 	list_add(&rmid_entry->list, &limbo_list);
>
> 	for_each_cpu(cpu, rmid_entry->mask)
> 		schedule_delayed_work_on(cpu, rmid_work);
> 	cpus_read_unlock();
>
> The work function:
>
>    	bool resched = false;
>
>    	list_for_each_entry(rme, limbo_list,...) {
> 		if (!cpumask_test_cpu(cpu, rme->mask))
> 			continue;
>
> 		if (!rmid_is_reusable(rme)) {
> 			resched = true;
> 			continue;
> 		}
>
> 		cpumask_clear_cpu(cpu, rme->mask);
> 		if (!cpumask_empty(rme->mask))
> 			continue;
>
> 		/* Ready for reuse */
> 		list_del(&rme->list);
> 		list_add(&rme->list, &free_list);
> 	}
>
> The alloc function then becomes:
>
> 	if (list_empty(&free_list))
> 		return list_empty(&limbo_list) ? -ENOSPC : -EBUSY;
>
> The switch_to() covers the task rmids. The per cpu default rmids can be
> marked at the point where they are installed on a CPU in the per cpu
> rmid_bitmap. The free path is the same for per task and per cpu.
>
> Another thing which needs some thought is the CPU hotplug code. We need to
> make sure that pending work which is scheduled on an outgoing CPU is moved
> in the offline callback to a still online CPU of the same domain and not
> moved to some random CPU by the workqueue hotplug code.
>
> There is another subtle issue. Assume a RMID is freed. The limbo stuff is
> scheduled on all domains which have online CPUs.
>
> Now the last CPU of a domain goes offline before the threshold for clearing
> the domain CPU bit in the rme->mask is reached.
>
> So we have two options here:
>
>   1) Clear the bit unconditionally when the last CPU of a domain goes
>      offline.
>
>   2) Arm a timer which clears the bit after a grace period
>
> #1 The RMID might become available for reuse right away because all other
>   domains have not used it or have cleared their bits already.
>
>   If one of the CPUs of that domain comes online again and is associated
>   to that reused RMID again, then the counter content might still contain
>   leftovers from the previous usage.
>
> #2 Prevents #1 but has its own issues vs. serialization and coordination
>   with CPU hotplug.
>
> I'd say we go for #1 as the simplest solution, document it and if really
> the need arises revisit it later.
>
> Thanks,
>
> 	tglx
>
