From: Neil Horman <nhorman@tuxdriver.com>
To: "Tan, Jianfeng" <jianfeng.tan@intel.com>
Cc: dev@dpdk.org, yuanhan.liu@intel.com
Subject: Re: [RFC] eal: add cgroup-aware resource self discovery
Date: Tue, 26 Jan 2016 09:19:07 -0500
Message-ID: <20160126141907.GA20685@hmsreliant.think-freely.org>
In-Reply-To: <56A6D85A.6030400@intel.com>

On Tue, Jan 26, 2016 at 10:22:18AM +0800, Tan, Jianfeng wrote:
> 
> Hi Neil,
> 
> On 1/25/2016 9:46 PM, Neil Horman wrote:
> >On Mon, Jan 25, 2016 at 02:49:53AM +0800, Jianfeng Tan wrote:
> ...
> >>-- 
> >>2.1.4
> >>
> >>
> >
> >This doesn't make a whole lot of sense, for several reasons:
> >
> >1) Applications, as a general rule, shouldn't be interrogating the cgroups
> >interface at all.
> 
> The main reason to do this in DPDK is that DPDK obtains resource information
> from sysfs and proc, which are not well containerized so far. And DPDK
> pre-allocates resources instead of allocating them gradually on demand.
> 
Not disagreeing with this, just suggesting that:

1) Interrogating cgroups really isn't the best way to collect that information
2) Pre-allocating those resources isn't particularly wise without some mechanism
to reallocate them, as resource constraints can change (consider your cpuset
getting rewritten)
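
One possible shape for such a mechanism, sketched under the assumption that a change in the allowed CPU set is what should trigger redistribution: snapshot the affinity at init and compare it later. On Linux, rewriting a cpuset updates the affinity of the tasks in it, so sched_getaffinity() observes the change. affinity_changed() is a hypothetical helper, and when or how often to poll it is left open.

```c
#define _GNU_SOURCE              /* sched_getaffinity and CPU_EQUAL are GNU extensions */
#include <sched.h>
#include <stdbool.h>

/* Hypothetical helper: re-read the calling thread's affinity into *current
 * and report whether it differs from the snapshot taken at init time.
 * A failed query is reported as "no change" so callers keep their
 * existing layout rather than tearing it down on a transient error. */
static bool affinity_changed(const cpu_set_t *snapshot, cpu_set_t *current)
{
    if (sched_getaffinity(0, sizeof(*current), current) != 0)
        return false;
    return !CPU_EQUAL(current, snapshot);
}
```

A caller would keep the snapshot alongside whatever per-CPU state it pre-allocated and rebuild that state only when this returns true.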

> >
> >2) Cgroups aren't the only way in which a cpuset or memoryset can be restricted
> >(the isolcpus command line argument, or a taskset on a parent process for
> >instance, but there are several others).
> 
> Yes, I agree. To enable that, I'd like to design the new API for resource self
> discovery in a flexible way. A "type" parameter is used to select the
> discovery method. In addition, I'm considering adding a callback
> function pointer so that users can write their own resource discovery
> functions.
> 
Why?  You don't need an API for this, or if you really want one, it can be very
generic if you use POSIX APIs to gather the information.  What you have here is
going to be very Linux-specific, and will need reimplementing for BSD or other
operating systems.  To use the cpuset example, instead of reading and parsing
the mask files in the cgroup filesystem module to find your task and
corresponding mask, just call sched_setaffinity with an all f's mask, then call
sched_getaffinity.  The returned mask will be all the cpus your process is
allowed to execute on, taking into account every limiting filter the system you
are running on offers.
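
A minimal sketch of the set-then-get probe described above, assuming Linux with glibc. Note that sched_setaffinity(), sched_getaffinity() and the CPU_* macros are Linux/glibc extensions (hence _GNU_SOURCE) rather than POSIX interfaces. The function name probe_allowed_cpus() is hypothetical.

```c
#define _GNU_SOURCE              /* sched_*affinity and CPU_* are GNU extensions */
#include <sched.h>

/* Widen the calling thread's affinity to every CPU a cpu_set_t can name
 * (the "all f's" mask), then read back what the kernel actually granted.
 * The kernel intersects the request with the CPUs the task's cpuset allows,
 * so the result reflects cgroup cpuset limits without parsing cgroup files.
 * Returns the number of usable CPUs, or -1 if either call fails. */
static int probe_allowed_cpus(cpu_set_t *allowed)
{
    cpu_set_t all;

    CPU_ZERO(&all);
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        CPU_SET(cpu, &all);

    if (sched_setaffinity(0, sizeof(all), &all) != 0)
        return -1;
    if (sched_getaffinity(0, sizeof(*allowed), allowed) != 0)
        return -1;
    return CPU_COUNT(allowed);
}
```

One caveat worth checking: sched_setaffinity() replaces the thread's mask, so a narrower mask inherited from a parent's taskset is widened rather than preserved; only hard limits such as the cpuset are re-applied by the kernel. Calling sched_getaffinity() on its own returns the inherited mask if that is what should be honoured.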

There are similar OS-level POSIX APIs for most resources out there.  You really
don't need to dig through cgroups just to learn what some of those resources are.
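
As an illustration of the kind of OS-level interface meant here, a minimal sketch that collects a few per-process limits with sysconf() and getrlimit(). The struct and function names are hypothetical. _SC_PAGESIZE and RLIMIT_NOFILE are specified by POSIX; RLIMIT_MEMLOCK is not, but is available on Linux and the BSDs. These are per-process limits only, not a complete inventory of what a DPDK process would need.

```c
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical container for a few limits a packet-processing app may care about. */
struct proc_limits {
    long   page_size;        /* base page size, from sysconf(_SC_PAGESIZE) */
    rlim_t max_open_files;   /* soft RLIMIT_NOFILE (POSIX) */
    rlim_t max_locked_bytes; /* soft RLIMIT_MEMLOCK (not POSIX); may be RLIM_INFINITY */
};

/* Fill *out from standard OS interfaces; returns 0 on success, -1 on error. */
static int query_proc_limits(struct proc_limits *out)
{
    struct rlimit rl;

    out->page_size = sysconf(_SC_PAGESIZE);
    if (out->page_size <= 0)
        return -1;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    out->max_open_files = rl.rlim_cur;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -1;
    out->max_locked_bytes = rl.rlim_cur;

    return 0;
}
```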

> >
> >Instead of trying to figure out what cpuset is valid for your process by
> >interrogating the cgroups hierarchy, you should follow the prescribed
> >method of calling sched_getaffinity after calling sched_setaffinity.  That will
> >give you the canonical cpuset that you are executing on, taking all cpuset
> >filters into account (including cgroups and any other restrictions).  It's far
> >simpler as well, as it doesn't require a ton of file/string processing.
> 
> Yes, this way is much better for cpuset discovery. But is there such a
> syscall for hugepages?
> 
In what capacity?  Interrogating how many hugepages you have, or which node
they are affined to?  Capacity would require reading the requisite proc file, as
there's no POSIX API for this resource.  Node affinity can be implied by setting
the NUMA policy of the DPDK process and then writing to
/proc/sys/vm/nr_hugepages_mempolicy, as the kernel will attempt to distribute
hugepages evenly across the nodes allowed by the task's NUMA policy.

That said, I would advise that you strongly consider not exporting hugepages as
a resource, as:

a) Applications generally don't need to know that they are using hugepages, and
so they don't need to know where said hugepages live; they just allocate memory
via your allocation API and you give them something appropriate

b) Hugepages are a resource that is very specific to Linux, and to x86 Linux at
that.  Some OSes implement similar resources, but they may have very different
semantics.  And other arches may or may not implement various forms of compound
paging at all.  As the DPDK expands to support more OSes and arches, it would
be nice to ensure that the programming surfaces you expose have broader
support.

Neil

> Thanks,
> Jianfeng
> 
> >
> >Neil
> >
> 
> 

Thread overview: 63+ messages
2016-01-24 18:49 [RFC] eal: add cgroup-aware resource self discovery Jianfeng Tan
2016-01-25 13:46 ` Neil Horman
2016-01-26  2:22   ` Tan, Jianfeng
2016-01-26 14:19     ` Neil Horman [this message]
2016-01-27 12:02       ` Tan, Jianfeng
2016-01-27 17:30         ` Neil Horman
2016-01-29 11:22 ` [PATCH] eal: make resource initialization more robust Jianfeng Tan
2016-02-01 18:08   ` Neil Horman
2016-02-22  6:08   ` Tan, Jianfeng
2016-02-22 13:18     ` Neil Horman
2016-02-28 21:12   ` Thomas Monjalon
2016-02-29  1:50     ` Tan, Jianfeng
2016-03-04 10:05 ` [PATCH] eal: add option --avail-cores to detect lcores Jianfeng Tan
2016-03-08  8:54   ` Panu Matilainen
2016-03-08 17:38     ` Tan, Jianfeng
2016-03-09 13:05       ` Panu Matilainen
2016-03-09 13:53         ` Tan, Jianfeng
2016-03-09 14:01           ` Ananyev, Konstantin
2016-03-09 14:17             ` Tan, Jianfeng
2016-03-09 14:44               ` Ananyev, Konstantin
2016-03-09 14:55                 ` Tan, Jianfeng
2016-03-09 15:17                   ` Ananyev, Konstantin
2016-03-09 17:45                     ` Tan, Jianfeng
2016-03-09 19:33                       ` Ananyev, Konstantin
2016-03-10  1:36                         ` Tan, Jianfeng
2016-05-18 12:46         ` David Marchand
2016-05-19  2:25           ` Tan, Jianfeng
2016-06-30 13:43             ` Thomas Monjalon
2016-07-01  0:52               ` Tan, Jianfeng
2016-04-26 12:39   ` Tan, Jianfeng
2016-03-04 10:58 ` [PATCH] eal: make hugetlb initialization more robust Jianfeng Tan
2016-03-08  1:42   ` [PATCH v2] " Jianfeng Tan
2016-03-08  8:46     ` Tan, Jianfeng
2016-05-04 11:07     ` Sergio Gonzalez Monroy
2016-05-04 11:28       ` Tan, Jianfeng
2016-05-04 12:25     ` Sergio Gonzalez Monroy
2016-05-09 10:48   ` [PATCH v3] " Jianfeng Tan
2016-05-10  8:54     ` Sergio Gonzalez Monroy
2016-05-10  9:11       ` Tan, Jianfeng
2016-05-12  0:44   ` [PATCH v4] " Jianfeng Tan
2016-05-17 16:39     ` David Marchand
2016-05-18  7:56       ` Sergio Gonzalez Monroy
2016-05-18  9:34         ` David Marchand
2016-05-19  2:00       ` Tan, Jianfeng
2016-05-17 16:40     ` Thomas Monjalon
2016-05-18  8:06       ` Sergio Gonzalez Monroy
2016-05-18  9:38         ` David Marchand
2016-05-19  2:11         ` Tan, Jianfeng
2016-05-31  3:37 ` [PATCH v5] eal: fix allocating all free hugepages Jianfeng Tan
2016-06-06  2:49   ` Pei, Yulong
2016-06-08 11:27   ` Sergio Gonzalez Monroy
2016-06-30 13:34     ` Thomas Monjalon
2016-08-31  3:07 ` [PATCH v2] eal: restrict cores detection Jianfeng Tan
2016-08-31 15:30   ` Stephen Hemminger
2016-09-01  1:15     ` Tan, Jianfeng
2016-09-01  1:31 ` [PATCH v3] " Jianfeng Tan
2016-09-02 16:53   ` Bruce Richardson
2016-09-16 14:04     ` Thomas Monjalon
2016-09-16 14:02   ` Thomas Monjalon
2016-12-02 17:48   ` [PATCH v4] eal: restrict cores auto detection Jianfeng Tan
2016-12-08 18:19     ` Thomas Monjalon
2016-12-09 15:14       ` Bruce Richardson
2016-12-21 14:31         ` Thomas Monjalon
