From: Marcelo Tosatti <mtosatti@redhat.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: "Yu, Fenghua" <fenghua.yu@intel.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	"x86@kernel.org" <x86@kernel.org>,
	Luiz Capitulino <lcapitulino@redhat.com>,
	"Shivappa, Vikas" <vikas.shivappa@intel.com>,
	Tejun Heo <tj@kernel.org>,
	"Shankar, Ravi V" <ravi.v.shankar@intel.com>,
	"Luck, Tony" <tony.luck@intel.com>
Subject: Re: [RFD] CAT user space interface revisited
Date: Mon, 4 Jan 2016 15:20:54 -0200
Message-ID: <20160104172054.GA21926@amt.cnet>
In-Reply-To: <alpine.DEB.2.11.1512312251430.28591@nanos>

On Thu, Dec 31, 2015 at 11:30:57PM +0100, Thomas Gleixner wrote:
> Marcelo,
> 
> On Thu, 31 Dec 2015, Marcelo Tosatti wrote:
> 
> First of all thanks for the explanation.
> 
> > There is one directory structure in this topic, CAT. That is the
> > directory structure which is exposed to userspace to control the 
> > CAT HW. 
> > 
> > With the current patchset posted by Intel ("Subject: [PATCH V16 00/11]
> > x86: Intel Cache Allocation Technology Support"), the directory
> > structure there (the files and directories exposed by that patchset)
> > (*1) does not allow one to configure different CBM masks on each socket
> > (that is, it forces the user to configure the same mask CBM on every
> > socket). This is a blocker for us, and it is one of the points in your
> > proposal.
> > 
> > There was a call between Red Hat and Intel where it was communicated
> > to Intel, and Intel agreed, that it was necessary to fix this (fix this
> > == allow different CBM masks on different sockets).
> > 
> > Now, that is one change to the current directory structure (*1).
> 
> I don't have an idea how that would look like. The current structure is a
> cgroups based hierarchy oriented approach, which does not allow simple things
> like
> 
> T1	00001111
> T2	00111100
> 
> at least not in a way which is natural to the problem at hand.

	cgroupA/

		cbm_mask            (if set, applies to all CPUs on all sockets)

		socket1/cbm_mask
		socket2/cbm_mask
		...
		socketN/cbm_mask    (if set, overrides the global cbm_mask)

Something along those lines.
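
For concreteness, a sketch of how that could look from the shell (the
socket1/ ... socketN/ directories and the per-socket cbm_mask files are
hypothetical, nothing like them exists in the current patchset):

# cd /sys/fs/cgroup/intel_rdt
# mkdir cgroupA
# echo 0x000000ff > cgroupA/cbm_mask             (global mask, all sockets)
# echo 0x0000000f > cgroupA/socket1/cbm_mask     (override for socket 1 only)
# cat cgroupA/socket2/cbm_mask
000000ff                                         (no override, falls back to the global mask)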

Do you see any problem with it?

> > (*1) modified to allow for different CBM masks on different sockets, 
> > lets say (*2), is what we have been waiting for Intel to post. 
> > It would handle our usecase, and all use-cases which the current
> > patchset from Intel already handles (Vikas posted emails mentioning
> > there are happy users of the current interface, feel free to ask 
> > him for more details).
> 
> I cannot imagine how that modification to the current interface would solve
> that. Not to talk about per CPU associations which are not related to tasks at
> all.

Not sure what you mean by per CPU associations.

If you fix a CBM mask on a given pCPU, say CPU1, and control which tasks
run on that pCPU, then you control the CBM mask for all tasks (say
tasklist-1) on that CPU, fine.

You can achieve the same by putting all tasks from tasklist-1 into a
cgroup.
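
A sketch of that, assuming the cpuset and intel_rdt cgroup controllers are
mounted at the usual v1 locations:

# mkdir /sys/fs/cgroup/cpuset/cpu1only
# echo 1 > /sys/fs/cgroup/cpuset/cpu1only/cpuset.cpus
# echo 0 > /sys/fs/cgroup/cpuset/cpu1only/cpuset.mems
# mkdir /sys/fs/cgroup/intel_rdt/tasklist-1
# echo 0x000000f0 > /sys/fs/cgroup/intel_rdt/tasklist-1/intel_rdt.l3_cbm
(tasklist-1.pids below is a made-up file containing the PIDs of tasklist-1)
# for pid in $(cat tasklist-1.pids); do echo $pid > /sys/fs/cgroup/cpuset/cpu1only/tasks; echo $pid > /sys/fs/cgroup/intel_rdt/tasklist-1/tasks; done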

> > What i have asked you, and you replied "to go Google read my previous
> > post" is this:
> > What are the advantages over you proposal (which is a completely
> > different directory structure, requiring a complete rewrite),
> > over (*2) ?
> > 
> > (what is my reason behind this: the reason is that if you, with
> > maintainer veto power, forces your proposal to be accepted, it will be
> > necessary to wait for another rewrite (a new set of problems, fully
> > think through your proposal, test it, ...) rather than simply modify an
> > already known, reviewed, already used directory structure.
> > 
> > And functionally, your proposal adds nothing to (*2) (other than, well,
> > being a different directory structure).
> 
> Sorry. I cannot see at all how a modification to the existing interface would
> cover all the sensible use cases I described in a coherent way. I really want
> to see a proper description of the interface before people start hacking on it
> in a frenzy. What you described is: "let's say (*2)" modification. That's
> pretty meager.
> 
> > If Fenghua or you post a patchset, say in 2 weeks, with your proposal,
> > i am fine with that. But i since i doubt that will be the case, i am 
> > pushing for the interface which requires the least amount of changes
> > (and therefore the least amount of time) to be integrated.
> > 
> > >From your email:
> > 
> > "It would even be sufficient for particular use cases to just associate
> > a piece of cache to a given CPU and do not bother with tasks at all.
> > 
> > We really need to make this as configurable as possible from userspace
> > without imposing random restrictions to it. I played around with it on
> > my new intel toy and the restriction to 16 COS ids (that's 8 with CDP
> > enabled) makes it really useless if we force the ids to have the same
> > meaning on all sockets and restrict it to per task partitioning."
> > 
> > Yes, thats the issue we hit, that is the modification that was agreed
> > with Intel, and thats what we are waiting for them to post.
> 
> How do you implement the above - especially that part:
> 
>  "It would even be sufficient for particular use cases to just associate a
>   piece of cache to a given CPU and do not bother with tasks at all."
> 
> as a "simple" modification to (*1) ?

As noted above.
>  
> > > I described a directory structure for that qos/cat stuff in my proposal and
> > > that's complete AFAICT.
> > 
> > Ok, lets make the job for the submitter easier. You are the maintainer,
> > so you decide.
> > 
> > Is it enough for you to have (*2) (which was agreed with Intel), or 
> > would you rather prefer to integrate the directory structure at 
> > "[RFD] CAT user space interface revisited" ?
> 
> The only thing I care about as a maintainer is, that we merge something which
> actually reflects the properties of the hardware and gives the admin the
> required flexibility to utilize it fully. I don't care at all if it's my
> proposal or something else which allows to do the same.
> 
> Let me copy the relevant bits from my proposal here once more and let me ask
> questions to the various points so you can tell me how that modification to
> (*1) is going to deal with that.
> 
> >> At top level:
> >>
> >>  xxxxxxx/cat/max_cosids <- Assume that all CPUs are the same
> >>  xxxxxxx/cat/max_maskbits <- Assume that all CPUs are the same

This can be exposed to userspace via a file.
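
Purely as an illustration of such a file (both the names and the values
below are hypothetical, nothing like them exists in (*1) or (*2)):

# cat /sys/fs/cgroup/intel_rdt/intel_rdt.max_cosids
16
# cat /sys/fs/cgroup/intel_rdt/intel_rdt.max_cbm_len
20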

> >>  xxxxxxx/cat/cdp_enable <- Depends on CDP availability
> 
>  Where is that information in (*2) and how is that related to (*1)? If you
>  think it's not required, please explain why.

Intel has come up with a scheme to implement CDP. I'll go read 
that and reply to this email afterwards.

> >> Per socket data:
> >>
> >>  xxxxxxx/cat/socket-0/
> >>  ...
> >>  xxxxxxx/cat/socket-N/l3_size
> >>  xxxxxxx/cat/socket-N/hwsharedbits
> 
>  Where is that information in (*2) and how is that related to (*1)? If you
>  think it's not required, please explain why.

l3_size: userspace can figure that out by itself (it is already exposed in
sysfs).

hwsharedbits: all userspace needs to know is which bits are shared with HW,
so it can decide whether or not to use that region of a given socket for a
given CBM mask.

So expose that to userspace, fine. That can be done in cgroups.
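
For example, the L3 size is already readable via sysfs (index3 is the L3
cache on this machine; the index number and the values shown are
illustrative):

# cat /sys/devices/system/cpu/cpu0/cache/index3/level
3
# cat /sys/devices/system/cpu/cpu0/cache/index3/size
20480K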

> >> Per socket mask data:
> >>
> >>  xxxxxxx/cat/socket-N/cos-id-0/
> >>  ...
> >>  xxxxxxx/cat/socket-N/cos-id-N/inuse
> >>                               /cat_mask
> >>                               /cdp_mask <- Data mask if CDP enabled
> 
>  Where is that information in (*2) and how is that related to (*1)? If you
>  think it's not required, please explain why.

Unsure - I will reply in the next email (but the per-socket information
seems independent of that).

> 
> >> Per cpu default cos id for the cpus on that socket:
> >> 
> >>  xxxxxxx/cat/socket-N/cpu-x/default_cosid
> >>  ...
> >>  xxxxxxx/cat/socket-N/cpu-N/default_cosid
> >>
> >> The above allows a simple cpu based partitioning. All tasks which do
> >> not have a cache partition assigned on a particular socket use the
> >> default one of the cpu they are running on.
> 
>  Where is that information in (*2) and how is that related to (*1)? If you
>  think it's not required, please explain why.

Not required because with current Intel patchset you'd do:


# mount | grep rdt
cgroup on /sys/fs/cgroup/intel_rdt type cgroup (rw,nosuid,nodev,noexec,relatime,intel_rdt)
# cd /sys/fs/cgroup/intel_rdt
# ls
cgroupALL              cgroup.clone_children  cgroup.procs       cgroup.sane_behavior
cgroupRSVD             intel_rdt.l3_cbm       notify_on_release  release_agent
tasks
# cat tasks
1042
1066
1067
1069
...
# cd cgroupALL/
# ps auxw | while read i; do echo $i ; done | cut -f 2 -d " " | grep -v PID | while read x ; do echo $x > tasks; done
-bash: echo: write error: No such process
-bash: echo: write error: No such process
-bash: echo: write error: No such process
-bash: echo: write error: No such process

# cat ../tasks | while read i; do echo $i > tasks; done
# cat ../tasks  | wc -l
0
(no tasks on root cgroup)

# cd ../cgroupRSVD
# cat tasks
# ps auxw | grep postfix
root      1942  0.0  0.0  91136  4860 ?        Ss   Nov25   0:00 /usr/libexec/postfix/master -w
postfix   1981  0.0  0.0  91308  6520 ?        S    Nov25   0:00 qmgr -l -t unix -u
postfix   4416  0.0  0.0  91240  6296 ?        S    17:05   0:00 pickup -l -t unix -u
root      4486  0.0  0.0 112652  2304 pts/0    S+   17:31   0:00 grep --color=auto postfix
# echo 4416 > tasks
# cat intel_rdt.l3_cbm
000ffff0
# cat ../cgroupALL/intel_rdt.l3_cbm
000000ff

Bits 0xf0 are shared between cgroupRSVD and cgroupALL. Let's change that:
# echo 0xf > ../cgroupALL/intel_rdt.l3_cbm
# cat ../cgroupALL/intel_rdt.l3_cbm
0000000f

Now they share none.
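
A quick check of the overlap with shell arithmetic (output shown for
illustration):

# printf '%#x\n' $(( 0x000ffff0 & 0x000000ff ))
0xf0
# printf '%#x\n' $(( 0x000ffff0 & 0x0000000f ))
0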

------

So I have created "cgroupALL" (what you call default_cosid) and "cgroupRSVD".

> 
> >> Now for the task(s) partitioning:
> >>
> >>  xxxxxxx/cat/partitions/
> >>
> >> Under that directory one can create partitions
> >> 
> >>  xxxxxxx/cat/partitions/p1/tasks
> >>                           /socket-0/cosid
> >>                           ...
> >>                           /socket-n/cosid
> >> 
> >> The default value for the per socket cosid is COSID_DEFAULT, which
> >> causes the task(s) to use the per cpu default id. 
> 
>  Where is that information in (*2) and how is that related to (*1)? If you
>  think it's not required, please explain why.
> 
> Yes. I ask the same question several times and I really want to see the
> directory/interface structure which solves all of the above before anyone
> starts to implement it. 

I don't see the problem: the sequence of commands above shows how to set up
a directory structure which is useful and does what the HW interface is
supposed to do.

> We already have a completely useless interface (*1)
> and there is no point to implement another one based on it (*2) just because
> it solves your particular issue and is the fastest way forward. User space
> interfaces are hard and we really do not need some half baked solution which
> we have to support forever.

Fine. Can you please tell me what I can't do with the current interface?
AFAICS everything can be done (except for the per-socket masks, i.e. the
missing (*2) support).

> 
> Let me enumerate the required points again:
> 
>     1) Information about the hardware properties

Fine. Intel should expose that information.

>     2) Integration of CAT and CDP 

Fine, Intel has come up with a directory structure for that; let me read
the patchset again and I'll reply to you.

>     3) Per socket cos-id partitioning

(*2), as listed at the beginning of this e-mail.

>     4) Per cpu default cos-id association

This already exists and, as shown in the command sequence above, works just
fine. Please explain what problem you are seeing.

>     5) Task association to cos-id

Not sure what that means. Please explain.

> 
> Can you please explain in a simple directory based scheme, like the one I gave
> you how all of these points are going to be solved with a modification to (*1)?
> 
> Thanks,
> 
> 	tglx

Thanks Thomas, this style of discussion is quite useful.

