linux-kernel.vger.kernel.org archive mirror
From: Marcelo Tosatti <mtosatti@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: "Yu, Fenghua" <fenghua.yu@intel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	H Peter Anvin <hpa@zytor.com>, Ingo Molnar <mingo@redhat.com>,
	linux-kernel <linux-kernel@vger.kernel.org>, x86 <x86@kernel.org>,
	Vikas Shivappa <vikas.shivappa@linux.intel.com>
Subject: Re: [PATCH V15 00/11] x86: Intel Cache Allocation Technology Support
Date: Fri, 16 Oct 2015 17:24:42 -0300
Message-ID: <20151016202439.GA27055@amt.cnet>
In-Reply-To: <20151016094452.GO3816@twins.programming.kicks-ass.net>

On Fri, Oct 16, 2015 at 11:44:52AM +0200, Peter Zijlstra wrote:
> On Thu, Oct 15, 2015 at 09:17:16PM -0300, Marcelo Tosatti wrote:
> > On Thu, Oct 15, 2015 at 01:37:02PM +0200, Peter Zijlstra wrote:
> > > On Tue, Oct 13, 2015 at 07:40:58PM -0300, Marcelo Tosatti wrote:
> > > > How can you fix the issue of sockets with different reserved cache
> > > > regions with hw in the cgroup interface?
> > > 
> > > No idea what you're referring to. But IOCTLs blow.
> > 
> > Tejun brought up syscalls. Syscalls seem too generic.
> > So ioctls were chosen instead.
> > 
> > It is necessary to perform the following operations:
> > 
> > 1) create cache reservation (params = size, type).
> 
> mkdir
> 
> > 2) delete cache reservation.
> 
> rmdir
> 
> > 3) attach cache reservation (params = cache reservation id, pid).
> > 4) detach cache reservation (params = cache reservation id, pid).
> 
> echo $pid > tasks
> 
> > Can it be done via cgroups? If so, works for me.
> 
> Trivially.

Fine.

Tejun brought up the problem of locking: how do you coordinate locking
between different users (in the mkdir / rmdir scenario above)?
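
To make the mapping above concrete, here is a minimal user-space sketch
of the four operations expressed as cgroup-style filesystem calls. It is
not taken from the patch set: the helper names (cr_create, cr_delete,
cr_attach, cr_detach) and the <root>/<name>/tasks layout are hypothetical,
and "detach" is assumed to mean moving the task back to the root group,
as in cgroup v1.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Build "<root>/<name>/<leaf>"; an empty name or leaf is skipped. */
static int cr_path(char *buf, size_t len, const char *root,
		   const char *name, const char *leaf)
{
	int n;

	assert(buf && root && name && leaf);
	n = snprintf(buf, len, "%s%s%s%s%s", root,
		     *name ? "/" : "", name,
		     *leaf ? "/" : "", leaf);
	return (n < 0 || (size_t)n >= len) ? -ENAMETOOLONG : 0;
}

/* 1) create cache reservation: mkdir <root>/<name> */
static int cr_create(const char *root, const char *name)
{
	char path[512];
	int ret = cr_path(path, sizeof(path), root, name, "");

	if (ret)
		return ret;
	return mkdir(path, 0755) ? -errno : 0;
}

/* 2) delete cache reservation: rmdir <root>/<name> */
static int cr_delete(const char *root, const char *name)
{
	char path[512];
	int ret = cr_path(path, sizeof(path), root, name, "");

	if (ret)
		return ret;
	return rmdir(path) ? -errno : 0;
}

/* Write a pid into <root>/<name>/tasks, like "echo $pid > tasks". */
static int cr_write_pid(const char *root, const char *name, pid_t pid)
{
	char path[512];
	FILE *f;
	int ret = cr_path(path, sizeof(path), root, name, "tasks");

	if (ret)
		return ret;
	f = fopen(path, "w");
	if (!f)
		return -errno;
	if (fprintf(f, "%ld\n", (long)pid) < 0)
		ret = -EIO;
	if (fclose(f) != 0 && !ret)
		ret = -errno;
	return ret;
}

/* 3) attach cache reservation to a task. */
static int cr_attach(const char *root, const char *name, pid_t pid)
{
	return cr_write_pid(root, name, pid);
}

/* 4) detach: assumed to mean "move the task back to the root group". */
static int cr_detach(const char *root, pid_t pid)
{
	return cr_write_pid(root, "", pid);
}
```

mkdir() fails with -EEXIST when two users race on the same name, so name
collisions are arbitrated by the filesystem. Nothing in this sketch
coordinates how much cache each user may take, which is the locking
question above.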

> 
> > A list of problems with the cgroup interface has been written,
> > in the thread... and we found another problem.
> 
> Which was endless and tiresome so I stopped reading.
> 
> > List of problems with cgroup interface:
> > 
> > 1) Global IPI on CBM <---> task change does not scale.
> > 
> > /*
> >  * cbm_update_all() - Update the cache bit mask for all packages.
> >  */
> > static inline void cbm_update_all(u32 closid)
> > {
> >        on_each_cpu_mask(&rdt_cpumask, cbm_cpu_update, (void *)closid, 1);
> > }
> 
> There is no way around that, the moment you view the CBM as a global
> resource; ie. a CBM is configured the same on all sockets; you need to
> do this for a task using that CBM might run on any CPU at any time.
> 
> This is not because of the cgroup interface at all. This is because you
> want CBMs to be the same machine wide.

You don't, for two reasons:

1) Item 6 below.
2) Item 7 below.

Please continue the discussion: scroll down, read items 6 and 7, and
reply inline (for example: "item 6 and machine-wide CBMs are not
incompatible because...").

> The only way to actually change that is to _be_ a cgroup and co-mount
> with cpusets and be incestuous and look at the cpusets state and
> discover disjoint groups.
> 
> > 2) Syscall interface specification is in kbytes, not
> > cache ways (which is what must be recorded by the OS
> > to allow migration of the OS between different
> > hardware systems).
> 
> Meh, that again is nothing fundamental. The cgroup interface could do
> bytes just the same.

Yes.

> > 3) Compilers are able to configure cache optimally for
> > given ranges of code inside applications, easily,
> > if desired.
> 
> Yeah, so? Every SKU has a different cache size, so once you're down to
> that level you're pretty hard set in your configuration and it really
> doesn't matter if you give bytes or ways, you _KNOW_ what your
> configuration will be.

That item has nothing to do with specifying cache in bytes or in ways.

> > 4) Problem-2: The decision to allocate cache is tied to application
> > initialization / destruction, and application initialization is
> > essentially random from the POV of the system (the events which trigger
> > the execution of the application are not visible from the system).
> > 
> > Think of a machine running two different servers: one database
> > with requests that are received with a Poisson distribution, average 30
> > requests per hour, and every request takes 1 minute.
> > 
> > One httpd server with nearly constant load.
> > 
> > Without cache reservations, database requests take 2 minutes.
> > That is not acceptable for the database clients.
> > But with cache reservation, database requests take 1 minute.
> > 
> > You want to maximize performance of httpd and database requests.
> > What do you do? You allow the database server to perform cache
> > reservation once a request comes in, and to undo the reservation
> > once the request is finished.
> 
> > It's impossible to perform this with a centralized interface.
> 
> Not so; just a wee bit more fragile than desired. But, this is a
> pre-existing problem with cgroups and needs to be solved, not using
> cgroups because of this is silly.
> 
> Every cgroup that can work on tasks suffers this and arguably a few
> more.
> 
> > 5) Modify scenario 2 above as follows: each database request
> > is handled by two newly created threads, and they share a certain
> > percentage of data cache, and a certain percentage of code cache.
> > 
> > So the dispatcher thread, on arrival of request, has to:
> > 
> >         - create data cache reservation = tcrid-A.
> >         - create code cache reservation = tcrid-B.
> >         - create thread-1.
> >         - assign tcrid-A and B to thread-1.
> >         - create thread-2.
> >         - assign tcrid-A and B to thread-2.
> > 
> > 6) Create reservations in such a way that the sum is larger than
> > total amount of cache, and CPU pinning (example from Karen Noel):
> > 
> > VM-1 on socket-1 with 80% of reservation.
> > VM-2 on socket-2 with 80% of reservation.
> > VM-1 pinned to socket-1.
> > VM-2 pinned to socket-2.
> > 
> > Cgroups interface attempts to set a cache mask globally. This is the
> > problem the "expand" proposal solves:
> > https://lkml.org/lkml/2015/7/29/682
> 
> That email is unparsable.

Look at item 6. If you create reservations in such a way that the sum
is larger than the total amount of cache, "cosid0", which is the
"unconstrained set of tasks" (i.e. the rest of the system), has 0 bytes
of L3 cache to reclaim from.
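
As a rough illustration of the arithmetic in item 6 (a sketch only: the
20-way L3, the struct, and the helper names are assumptions, not code
from the patch set), compare what is left for cosid0 when every
reservation is charged on every socket with what is left when a
reservation is charged only on the socket its tasks are pinned to:

```c
#include <assert.h>

#define L3_WAYS 20u	/* hypothetical: 20 cache ways per socket */

struct reservation {
	unsigned int ways;	/* reserved cache ways */
	unsigned int socket;	/* socket the owning tasks are pinned to */
};

/* Global CBMs: every reservation is charged on every socket. */
static unsigned int free_ways_global(const struct reservation *r,
				     unsigned int n)
{
	unsigned int i, used = 0;

	for (i = 0; i < n; i++)
		used += r[i].ways;
	return used >= L3_WAYS ? 0 : L3_WAYS - used;
}

/* Per-socket CBMs: a reservation is charged only where its tasks run. */
static unsigned int free_ways_on_socket(const struct reservation *r,
					unsigned int n, unsigned int socket)
{
	unsigned int i, used = 0;

	for (i = 0; i < n; i++)
		if (r[i].socket == socket)
			used += r[i].ways;
	return used >= L3_WAYS ? 0 : L3_WAYS - used;
}
```

With VM-1 and VM-2 each holding 16 of 20 ways (80%) on their own socket,
the global view leaves nothing for cosid0, while the per-socket view
leaves 4 ways on each socket.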

> But the only way to sanely do so is to closely
> intertwine oneself with cpusets; doing that with anything other than
> another cgroup controller is absolutely full-on insane.

void __intel_rdt_sched_in(void)
{
        struct task_struct *task = current;
        unsigned int cpu = smp_processor_id();
        unsigned int this_socket = topology_physical_package_id(cpu);
        unsigned int start, end;

        /*
         * The CBM bitmask for a particular task is enforced
         * on sched-in to a given processor, and only for the
         * range (cbm_start_bit,cbm_end_bit) which the
         * tcr_list (COSid) owns.
         * This way we allow COSid0 (global task pool) to use
         * reserved L3 cache on sockets where the tasks that
         * reserve the cache have not been scheduled.
         *
         * Since reading the MSRs is slow, it is necessary to
         * cache the MSR CBM map on each socket.
         *
         */

        if (test_bit(this_socket,
                     task->tcrlist->synced_to_socket) == 0) {

Makes sense?
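
The fragment above stops at the synced_to_socket test. A minimal
user-space model of the idea, under stated assumptions, could look like
the following. The struct machine, the 20-bit FULL_CBM, the two-socket
limit, and the sched_in() helper are all hypothetical; the real code
would write MSRs rather than an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_SOCKETS 2		/* hypothetical machine */
#define FULL_CBM   0xfffffu	/* hypothetical 20-bit CBM */

struct tcr_list {
	uint32_t cbm;			/* bits this reservation owns */
	uint32_t synced_to_socket;	/* bit n set: socket n is up to date */
};

struct machine {
	uint32_t cos0_cbm[NR_SOCKETS];	/* cached COSid0 mask per socket */
	unsigned int msr_writes;	/* stands in for slow MSR writes */
};

static void machine_init(struct machine *m)
{
	unsigned int s;

	for (s = 0; s < NR_SOCKETS; s++)
		m->cos0_cbm[s] = FULL_CBM;
	m->msr_writes = 0;
}

/*
 * Called when a task owning @tcr is scheduled in on @socket.
 * Returns true if the socket had to be synced (one simulated MSR write).
 */
static bool sched_in(struct machine *m, struct tcr_list *tcr,
		     unsigned int socket)
{
	assert(socket < NR_SOCKETS);

	if (tcr->synced_to_socket & (UINT32_C(1) << socket))
		return false;	/* already enforced here, nothing to do */

	/* Take the reserved range away from the global pool on this socket only. */
	m->cos0_cbm[socket] &= ~tcr->cbm;
	m->msr_writes++;
	tcr->synced_to_socket |= UINT32_C(1) << socket;
	return true;
}
```

Until a task owning the reservation actually runs on a socket, COSid0
keeps the full mask there, and repeated sched-ins on an already synced
socket cost only a bit test.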

> 
> > 7) Consider two sockets with different region of L3 cache
> > shared with HW:
> > 
> > CPUID.(EAX=10H, ECX=1):EBX[31:0] reports a bit mask. Each set bit
> > within the length of the CBM indicates the corresponding unit of the
> > L3 allocation may be used by other entities in the platform (e.g. an
> > integrated graphics engine or hardware units outside the processor
> > core and have direct access to L3). Each cleared bit within the
> > length of the CBM indicates the corresponding allocation unit can be
> > configured to implement a priority-based allocation scheme chosen by
> > an OS/VMM without interference with other hardware agents in the
> > system. Bits outside the length of the CBM are reserved.
> > 
> > You want the kernel to maintain different bitmasks in the CBM:
> > 
> >         socket1 [range-A]
> >         socket2 [range-B]
> > 
> > And the kernel will automatically switch from range A to range B
> > when the thread switches sockets.
> 
> This is firmly in the insane range of things.. not going to happen full
> stop.

Are you saying that hardware will guarantee that the reserved region is
the same for all sockets? I asked Vikas and he said this is not the case.

> If a thread can freely schedule between two CPUs its configuration on
> those two CPUs had better bloody be the same.

It's just the (start,end) of the CBM which changes, so in
__intel_rdt_sched_in you do:

                struct per_socket_data *psd = get_socket_data(this_socket);
                struct cache_layout *layout = psd->layout;

                start = task->tcrlist->psd[layout->id].cbm_start;
                end = task->tcrlist->psd[layout->id].cbm_end;
                sync_to_msr(task->tcrlist, start, end);

Please clarify what you mean.
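
For reference, a minimal sketch of how a per-socket (start,end) could be
derived from the shared bitmask that CPUID.(EAX=10H, ECX=1):EBX reports.
The names pick_range() and range_to_cbm() and the lowest-fit policy are
assumptions for illustration, not something the patch set implements.
The reservation is kept as one contiguous run of bits.

```c
#include <assert.h>
#include <stdint.h>

struct cbm_range {
	unsigned int start;	/* first bit owned, inclusive */
	unsigned int end;	/* last bit owned, inclusive */
};

/*
 * Find the lowest run of @ways contiguous bits that are clear in
 * @hw_shared (the per-socket mask of units shared with other hardware).
 * Returns 0 and fills @out on success, -1 if no such run exists.
 */
static int pick_range(uint32_t hw_shared, unsigned int cbm_len,
		      unsigned int ways, struct cbm_range *out)
{
	unsigned int bit, run = 0;

	if (ways == 0 || cbm_len > 32 || ways > cbm_len)
		return -1;

	for (bit = 0; bit < cbm_len; bit++) {
		if (hw_shared & (UINT32_C(1) << bit)) {
			run = 0;	/* shared unit breaks the run */
			continue;
		}
		if (++run == ways) {
			out->start = bit + 1 - ways;
			out->end = bit;
			return 0;
		}
	}
	return -1;
}

/* Turn an inclusive (start,end) pair into the CBM bits it covers. */
static uint32_t range_to_cbm(const struct cbm_range *r)
{
	unsigned int width = r->end - r->start + 1;
	uint32_t ones = width >= 32 ? UINT32_MAX
				    : (UINT32_C(1) << width) - 1;

	return ones << r->start;
}
```

With a 20-bit CBM, a socket that shares its top two bits and a socket
that shares its bottom two bits end up with different ranges for the
same 4-way reservation, which is the range-A / range-B case in item 7.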



