linux-kernel.vger.kernel.org archive mirror
* [PATCH] coccinelle: add pycocci wrapper for multithreaded support
@ 2014-04-10 17:48 Luis R. Rodriguez
  2014-04-10 17:51 ` Johannes Berg
  0 siblings, 1 reply; 11+ messages in thread
From: Luis R. Rodriguez @ 2014-04-10 17:48 UTC (permalink / raw)
  To: cocci; +Cc: Luis R. Rodriguez, Johannes Berg, backports, linux-kernel

From: "Luis R. Rodriguez" <mcgrof@suse.com>

This is a wrapper for folks who work on git trees, specifically
the Linux kernel with its lots of files, and with arbitrary Cocci task
files. The assumption is that all you need is multithreaded support;
currently only a shell script is lying around, but that isn't easily
extensible, nor is it dynamic. This uses Python to drive Coccinelle's
mechanisms for multithreaded support but also enables all sorts of
defaults which you'd expect to be enabled when using Coccinelle for
Linux kernel development.

You just pass it a cocci file and a target dir; in git environments
you always want --in-place enabled. Experiments and profiling of random
cocci files with the Linux kernel show that using just the number of
CPUs doesn't scale well given that lots of buckets of files don't require
work; as such this uses 10 * number of CPUs for its number of threads.
For cocci files that define more general rules 3 * number of CPUs works
better, but for smaller cocci files 10 * number of CPUs performs best
right now. To experiment more with what's going on with the multithreading
one can run htop while kicking off a cocci task on the kernel; we want
to keep these CPUs busy as much as possible. You can override the number
of threads with pycocci's -j or --jobs option. The problem with jobless
threads can be seen here:

http://drvbp1.linux-foundation.org/~mcgrof/images/coccinelle-backports/cocci-jobless-processes.png

A healthy run would keep all the CPUs busy as in here:

http://drvbp1.linux-foundation.org/~mcgrof/images/coccinelle-backports/after-threaded-cocci.png
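The thread-count heuristic described above can be sketched as follows; this is a minimal illustration with a hypothetical function name, not pycocci's exact code:

```python
from multiprocessing import cpu_count

def pick_thread_count(jobs=None, general_rules=False):
    # -j/--jobs always wins; otherwise small cocci files get
    # 10 * CPUs, while long, general rule sets get 3 * CPUs.
    if jobs:
        return jobs
    return cpu_count() * (3 if general_rules else 10)
```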

This is heavily based on the multithreading implementation completed
on the Linux backports project; this just generalizes it and takes it
out of there in case others can make use of it -- I did as I wanted to
make upstream changes with Coccinelle. Note that a multithreading
implementation for Coccinelle itself is currently being discussed to
make CPU usage more efficient, so this is currently only a helper.

Since it's just a helper I toss it into the python directory but don't
install it. The hope is that we can evolve it there instead of carrying
this helper within backports.

Sample run:

mcgrof@garbanzo ~/linux-next (git::master)$ time ./pycocci
0001-netdev_ops.cocci ./

real    24m13.402s
user    72m27.072s
sys     22m38.812s

With this Coccinelle SmPL rule:

@@
struct net_device *dev;
struct net_device_ops ops;
@@
-dev->netdev_ops = &ops;
+netdev_attach_ops(dev, &ops);

Cc: Johannes Berg <johannes.berg@intel.com>
Cc: backports@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: cocci@systeme.lip6.fr
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 python/pycocci | 193 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 193 insertions(+)
 create mode 100755 python/pycocci

diff --git a/python/pycocci b/python/pycocci
new file mode 100755
index 0000000..4b3ef38
--- /dev/null
+++ b/python/pycocci
@@ -0,0 +1,193 @@
+#!/usr/bin/env python
+#
+# Copyright (c) 2014 Luis R. Rodriguez  <mcgrof@suse.com>
+# Copyright (c) 2013 Johannes Berg <johannes.berg@intel.com>
+#
+# This file is released under the GPLv2.
+#
+# Python wrapper for Coccinelle for multithreaded support,
+# designed to be used for working on a git tree, and with sensible
+# defaults, specifically for kernel developers.
+
+from multiprocessing import Process, cpu_count, Queue
+import argparse, subprocess, os, sys
+import tempfile, shutil
+
+# simple tempdir wrapper object for 'with' statement
+#
+# Usage:
+# with tempdir.tempdir() as tmpdir:
+#     os.chdir(tmpdir)
+#     do something
+class tempdir(object):
+    def __init__(self, suffix='', prefix='', dir=None, nodelete=False):
+        self.suffix = suffix
+        self.prefix = prefix
+        self.dir = dir
+        self.nodelete = nodelete
+
+    def __enter__(self):
+        self._name = tempfile.mkdtemp(suffix=self.suffix,
+                                      prefix=self.prefix,
+                                      dir=self.dir)
+        return self._name
+
+    def __exit__(self, type, value, traceback):
+        if self.nodelete:
+            print('not deleting directory %s!' % self._name)
+        else:
+            shutil.rmtree(self._name)
+
+class CoccinelleError(Exception):
+    pass
+class ExecutionError(CoccinelleError):
+    def __init__(self, cmd, errcode):
+        self.error_code = errcode
+        print('Failed command:')
+        print(' '.join(cmd))
+
+class ExecutionErrorThread(CoccinelleError):
+    def __init__(self, errcode, fn, cocci_file, threads, t, logwrite, print_name):
+        self.error_code = errcode
+        logwrite("Failed to apply changes from %s" % print_name)
+
+        logwrite("Specific log output from change that failed using %s" % print_name)
+        tf = open(fn, 'r')
+        for line in tf:
+            logwrite('> %s' % line)
+        tf.close()
+
+        logwrite("Full log using %s" % print_name)
+        for num in range(threads):
+            fn = os.path.join(t, '.tmp_spatch_worker.' + str(num))
+            if (not os.path.isfile(fn)):
+                continue
+            tf = open(fn, 'r')
+            for line in tf:
+                logwrite('> %s' % line)
+            tf.close()
+            os.unlink(fn)
+
+def spatch(cocci_file, outdir,
+           max_threads, thread_id, temp_dir, ret_q, extra_args=[]):
+    cmd = ['spatch', '--sp-file', cocci_file, '--in-place',
+            '--recursive-includes',
+            '--backup-suffix', '.cocci_backup', '--dir', outdir]
+
+    if (max_threads > 1):
+        cmd.extend(['-max', str(max_threads), '-index', str(thread_id)])
+
+    cmd.extend(extra_args)
+
+    fn = os.path.join(temp_dir, '.tmp_spatch_worker.' + str(thread_id))
+    outfile = open(fn, 'w')
+
+    sprocess = subprocess.Popen(cmd,
+                               stdout=outfile, stderr=subprocess.STDOUT,
+                               close_fds=True, universal_newlines=True)
+    sprocess.wait()
+    outfile.close()
+    ret_q.put((sprocess.returncode, fn))
+    if sprocess.returncode != 0:
+        raise ExecutionError(cmd, sprocess.returncode)
+
+def threaded_spatch(cocci_file, outdir, logwrite, num_jobs,
+                    print_name, extra_args=[]):
+    num_cpus = cpu_count()
+    # A lengthy comment is warranted here. As of spatch version 1.0.0-rc20
+    # Coccinelle will break out target files into buckets and a thread
+    # will work on each bucket. It turns out, after leaving htop running
+    # and reading profiling results, that CPUs are left idle once the
+    # threads which had little or no work to do finish early. This leaves
+    # CPUs jobless and hungry. Experiments with *really* long cocci files
+    # (all of the Linux backports cocci files in one file is an example)
+    # show that currently num_cpus * 3 provides reasonable completion
+    # time, while smaller rules can use more threads; we currently set
+    # the factor to 10. You are however more than welcome to experiment
+    # and override this. Note that it is currently being discussed how to
+    # best optimize things even further for Coccinelle.
+    #
+    # Images available of htop before multithreading:
+    # http://drvbp1.linux-foundation.org/~mcgrof/images/coccinelle-backports/before-threaded-cocci.png
+    # The jobless issue on threads if its just num_cpus after a period of time:
+    # http://drvbp1.linux-foundation.org/~mcgrof/images/coccinelle-backports/cocci-jobless-processes.png
+    # A happy healthy run should look like over most of the run:
+    # http://drvbp1.linux-foundation.org/~mcgrof/images/coccinelle-backports/after-threaded-cocci.png
+    if num_jobs:
+        threads = num_jobs
+    else:
+        threads = num_cpus * 10
+    jobs = list()
+    output = ""
+    ret_q = Queue()
+    with tempdir() as t:
+        for num in range(threads):
+            p = Process(target=spatch, args=(cocci_file, outdir,
+                                             threads, num, t, ret_q,
+                                             extra_args))
+            jobs.append(p)
+        for p in jobs:
+            p.start()
+
+        for num in range(threads):
+            ret, fn = ret_q.get()
+            if ret != 0:
+                raise ExecutionErrorThread(ret, fn, cocci_file, threads, t,
+                                           logwrite, print_name)
+        for job in jobs:
+            job.join()
+
+        for num in range(threads):
+            fn = os.path.join(t, '.tmp_spatch_worker.' + str(num))
+            tf = open(fn, 'r')
+            output = output + tf.read()
+            tf.close()
+            os.unlink(fn)
+        return output
+
+def logwrite(msg):
+    sys.stdout.write(msg)
+    sys.stdout.flush()
+
+def _main():
+    parser = argparse.ArgumentParser(description='Multithreaded Python wrapper for Coccinelle ' +
+                                     'with sensible defaults, targeted specifically ' +
+                                     'for git development environments')
+    parser.add_argument('cocci_file', metavar='<Coccinelle SmPL rules file>', type=str,
+                        help='This is the Coccinelle file you want to use')
+    parser.add_argument('target_dir', metavar='<target directory>', type=str,
+                        help='Target source directory to modify')
+    parser.add_argument('-p', '--profile-cocci', const=True, default=False, action="store_const",
+                        help='Enable profiling; this will pass --profile to Coccinelle.')
+    parser.add_argument('-j', '--jobs', metavar='<jobs>', type=int, default=0,
+                        help='Override the number of threads to use; by ' +
+                        'default 10 * the number of CPUs is used, as that ' +
+                        'keeps CPUs busiest for typical small cocci files.')
+    parser.add_argument('-v', '--verbose', const=True, default=False, action="store_const",
+                        help='Enable output from Coccinelle')
+    args = parser.parse_args()
+
+    if not os.path.isfile(args.cocci_file):
+        return -2
+
+    extra_spatch_args = []
+    if args.profile_cocci:
+        extra_spatch_args.append('--profile')
+    jobs = 0
+    if args.jobs:
+        jobs = args.jobs
+
+    output = threaded_spatch(args.cocci_file,
+                             args.target_dir,
+                             logwrite,
+                             jobs,
+                             os.path.basename(args.cocci_file),
+                             extra_args=extra_spatch_args)
+    if args.verbose:
+        logwrite(output)
+    return 0
+
+if __name__ == '__main__':
+    ret = _main()
+    if ret:
+        sys.exit(ret)
-- 
1.9.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] coccinelle: add pycocci wrapper for multithreaded support
  2014-04-10 17:48 [PATCH] coccinelle: add pycocci wrapper for multithreaded support Luis R. Rodriguez
@ 2014-04-10 17:51 ` Johannes Berg
  2014-04-10 17:57   ` Luis R. Rodriguez
  0 siblings, 1 reply; 11+ messages in thread
From: Johannes Berg @ 2014-04-10 17:51 UTC (permalink / raw)
  To: Luis R. Rodriguez; +Cc: cocci, Luis R. Rodriguez, backports, linux-kernel

On Thu, 2014-04-10 at 10:48 -0700, Luis R. Rodriguez wrote:

> You just pass it a cocci file and a target dir; in git environments
> you always want --in-place enabled. Experiments and profiling of random
> cocci files with the Linux kernel show that using just the number of
> CPUs doesn't scale well given that lots of buckets of files don't require
> work; as such this uses 10 * number of CPUs for its number of threads.
> For cocci files that define more general rules 3 * number of CPUs works
> better, but for smaller cocci files 10 * number of CPUs performs best
> right now. To experiment more with what's going on with the multithreading
> one can run htop while kicking off a cocci task on the kernel; we want
> to keep these CPUs busy as much as possible.

That's not really a good benchmark, you want to actually check how
quickly it finishes ... If you have some IO issues then just keeping the
CPUs busy trying to do IO won't help at all.
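The check suggested here (wall-clock completion time rather than CPU busyness) could be scripted roughly like this; a hypothetical helper, assuming Python 3.5+ for subprocess.run:

```python
import subprocess, time

def time_run(cmd):
    # Completion time is the real benchmark; busy CPUs may just be
    # spinning on IO rather than making progress on the rewrite.
    start = time.time()
    subprocess.run(cmd, check=True)
    return time.time() - start
```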


> Since it's just a helper I toss it into the python directory but don't
> install it. The hope is that we can evolve it there instead of carrying
> this helper within backports.

If there's a plan to make coccinelle itself multi-threaded, what's the
point?

johannes



* Re: [PATCH] coccinelle: add pycocci wrapper for multithreaded support
  2014-04-10 17:51 ` Johannes Berg
@ 2014-04-10 17:57   ` Luis R. Rodriguez
  2014-04-10 19:32     ` [Cocci] " Julia Lawall
  2014-04-11  5:55     ` SF Markus Elfring
  0 siblings, 2 replies; 11+ messages in thread
From: Luis R. Rodriguez @ 2014-04-10 17:57 UTC (permalink / raw)
  To: Johannes Berg; +Cc: Luis R. Rodriguez, cocci, backports, linux-kernel


On Thu, Apr 10, 2014 at 07:51:29PM +0200, Johannes Berg wrote:
> On Thu, 2014-04-10 at 10:48 -0700, Luis R. Rodriguez wrote:
> 
> > You just pass it a cocci file and a target dir; in git environments
> > you always want --in-place enabled. Experiments and profiling of random
> > cocci files with the Linux kernel show that using just the number of
> > CPUs doesn't scale well given that lots of buckets of files don't require
> > work; as such this uses 10 * number of CPUs for its number of threads.
> > For cocci files that define more general rules 3 * number of CPUs works
> > better, but for smaller cocci files 10 * number of CPUs performs best
> > right now. To experiment more with what's going on with the multithreading
> > one can run htop while kicking off a cocci task on the kernel; we want
> > to keep these CPUs busy as much as possible.
> 
> That's not really a good benchmark, you want to actually check how
> quickly it finishes ... If you have some IO issues then just keeping the
> CPUs busy trying to do IO won't help at all.

I checked the profile results; the reason the jobs finish early is that
some threads had no work or little work. That's why I increased the number
of threads, depending on the context (long or short cocci expected; in
backports at least, the long being all cocci files in one, the short being
the --test-cocci flag to gentree.py). This wrapper uses the short
assumption with 10 * num_cpus.

> > Since it's just a helper I toss it into the python directory but don't
> > install it. The hope is that we can evolve it there instead of carrying
> > this helper within backports.
> 
> If there's a plan to make coccinelle itself multi-threaded, what's the
> point?

To be clear, Coccinelle *has* a form of multithreaded support, but it requires
manual spawning of jobs with references to the max count and also to the number
of the thread that the new process you are spawning belongs to. There are plans
to consider reworking things to handle all this internally, but as I discussed
with Julia the changes required would mean some structural changes, and as such
we need to live with this for a bit longer. I need to use Coccinelle daily now,
so I figured I'd punt this out there in case others might make use of it.
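The manual mechanism described above, where each worker is handed the same max count plus its own index so spatch selects that worker's bucket of files, can be sketched like this (a hypothetical helper, not Coccinelle's or pycocci's code):

```python
def worker_cmds(cocci_file, target_dir, max_workers):
    # Build one spatch command line per worker; given identical -max and
    # a unique -index, spatch itself picks the bucket each worker owns.
    return [['spatch', '--sp-file', cocci_file, '--in-place',
             '--dir', target_dir,
             '-max', str(max_workers), '-index', str(index)]
            for index in range(max_workers)]
```

Each command would then be launched with subprocess.Popen and waited on, which is what the pycocci wrapper automates.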

  Luis



* Re: [Cocci] [PATCH] coccinelle: add pycocci wrapper for multithreaded support
  2014-04-10 17:57   ` Luis R. Rodriguez
@ 2014-04-10 19:32     ` Julia Lawall
  2014-08-28  3:46       ` Luis R. Rodriguez
  2014-04-11  5:55     ` SF Markus Elfring
  1 sibling, 1 reply; 11+ messages in thread
From: Julia Lawall @ 2014-04-10 19:32 UTC (permalink / raw)
  To: Luis R. Rodriguez; +Cc: Johannes Berg, linux-kernel, backports, cocci



On Thu, 10 Apr 2014, Luis R. Rodriguez wrote:

> On Thu, Apr 10, 2014 at 07:51:29PM +0200, Johannes Berg wrote:
> > On Thu, 2014-04-10 at 10:48 -0700, Luis R. Rodriguez wrote:
> > 
> > > You just pass it a cocci file and a target dir; in git environments
> > > you always want --in-place enabled. Experiments and profiling of random
> > > cocci files with the Linux kernel show that using just the number of
> > > CPUs doesn't scale well given that lots of buckets of files don't require
> > > work; as such this uses 10 * number of CPUs for its number of threads.
> > > For cocci files that define more general rules 3 * number of CPUs works
> > > better, but for smaller cocci files 10 * number of CPUs performs best
> > > right now. To experiment more with what's going on with the multithreading
> > > one can run htop while kicking off a cocci task on the kernel; we want
> > > to keep these CPUs busy as much as possible.
> > 
> > That's not really a good benchmark, you want to actually check how
> > quickly it finishes ... If you have some IO issues then just keeping the
> > CPUs busy trying to do IO won't help at all.
> 
> I checked the profile results; the reason the jobs finish early is that
> some threads had no work or little work. That's why I increased the number
> of threads, depending on the context (long or short cocci expected; in
> backports at least, the long being all cocci files in one, the short being
> the --test-cocci flag to gentree.py). This wrapper uses the short
> assumption with 10 * num_cpus.
> 
> > > Since it's just a helper I toss it into the python directory but don't
> > > install it. The hope is that we can evolve it there instead of carrying
> > > this helper within backports.
> > 
> > If there's a plan to make coccinelle itself multi-threaded, what's the
> > point?
> 
> To be clear, Coccinelle *has* a form of multithreaded support, but it requires
> manual spawning of jobs with references to the max count and also to the number
> of the thread that the new process you are spawning belongs to. There are plans
> to consider reworking things to handle all this internally, but as I discussed
> with Julia the changes required would mean some structural changes, and as such
> we need to live with this for a bit longer. I need to use Coccinelle daily now,
> so I figured I'd punt this out there in case others might make use of it.

I agree with Luis.  Multithreading inside Coccinelle is currently a 
priority task, but not a highest priority one.

julia


* Re: [PATCH] coccinelle: add pycocci wrapper for multithreaded support
  2014-04-10 17:57   ` Luis R. Rodriguez
  2014-04-10 19:32     ` [Cocci] " Julia Lawall
@ 2014-04-11  5:55     ` SF Markus Elfring
  2014-04-11  6:01       ` [Cocci] " Julia Lawall
  1 sibling, 1 reply; 11+ messages in thread
From: SF Markus Elfring @ 2014-04-11  5:55 UTC (permalink / raw)
  To: Luis R. Rodriguez, Johannes Berg
  Cc: Luis R. Rodriguez, cocci, backports, linux-kernel

> I checked the profile results; the reason the jobs finish early is that
> some threads had no work or little work.

Could you find out during the data processing which parts or files
result in a special application behaviour you would like to point out here?

Regards,
Markus


* Re: [Cocci] [PATCH] coccinelle: add pycocci wrapper for multithreaded support
  2014-04-11  5:55     ` SF Markus Elfring
@ 2014-04-11  6:01       ` Julia Lawall
  2014-04-11  6:15         ` SF Markus Elfring
  2014-04-11 19:00         ` Luis R. Rodriguez
  0 siblings, 2 replies; 11+ messages in thread
From: Julia Lawall @ 2014-04-11  6:01 UTC (permalink / raw)
  To: SF Markus Elfring
  Cc: Luis R. Rodriguez, Johannes Berg, linux-kernel, backports, cocci



On Fri, 11 Apr 2014, SF Markus Elfring wrote:

> > I checked the profile results; the reason the jobs finish early is that
> > some threads had no work or little work.
> 
> Could you find out during the data processing which parts or files
> result in a special application behaviour you would like to point out here?

I don't understand the question at all, but since the various files have 
different properties, it is hard to determine automatically in advance how 
much work Coccinelle will need to do on each one.

julia


* Re: [Cocci] [PATCH] coccinelle: add pycocci wrapper for multithreaded support
  2014-04-11  6:01       ` [Cocci] " Julia Lawall
@ 2014-04-11  6:15         ` SF Markus Elfring
  2014-04-11 19:00         ` Luis R. Rodriguez
  1 sibling, 0 replies; 11+ messages in thread
From: SF Markus Elfring @ 2014-04-11  6:15 UTC (permalink / raw)
  To: Julia Lawall
  Cc: Luis R. Rodriguez, Johannes Berg, linux-kernel, backports, cocci

>> Could you find out during the data processing which parts or files
>> result in a special application behaviour you would like to point out here?
> I don't understand the question at all, but since the various files have 
> different properties, it is hard to determine automatically in advance how 
> much work Coccinelle will need to do on each one.

It was reported that system utilisation did not fit some
expectations. I am curious whether any more details or patterns can be
determined for the observed situation.

Do the involved files also need another look?
- semantic patches
- source directories

Regards,
Markus


* Re: [Cocci] [PATCH] coccinelle: add pycocci wrapper for multithreaded support
  2014-04-11  6:01       ` [Cocci] " Julia Lawall
  2014-04-11  6:15         ` SF Markus Elfring
@ 2014-04-11 19:00         ` Luis R. Rodriguez
  1 sibling, 0 replies; 11+ messages in thread
From: Luis R. Rodriguez @ 2014-04-11 19:00 UTC (permalink / raw)
  To: Julia Lawall
  Cc: SF Markus Elfring, Johannes Berg, linux-kernel, backports, cocci


On Fri, Apr 11, 2014 at 08:01:04AM +0200, Julia Lawall wrote:
> 
> 
> On Fri, 11 Apr 2014, SF Markus Elfring wrote:
> 
> > > I checked the profile results; the reason the jobs finish early is that
> > > some threads had no work or little work.
> > 
> > Could you find out during the data processing which parts or files
> > result in a special application behaviour you would like to point out here?
> 
> I don't understand the question at all, but since the various files have 
> different properties, it is hard to determine automatically in advance how 
> much work Coccinelle will need to do on each one.

For the person who might work on enhancing multithreading support, I'd wonder
if there could be gains from first putting an effort into evaluating which
files have at least one rule hit, and then adding those files to an activity
file list to later be spread between the threads. As you note, though, it is
hard to determine this in advance given that each rule can express any change.

I think one small change which could help, and likely not incur a drastic
immediate change on the architecture, could be to not let each thread take
files / jobs files, but instead just take, say:

	work_task_n = (files / jobs) / 10

The list of files needing work could then be kept on a list protected
against the threads, and each thread would only die once all the files
have been worked on already. This would enable keeping only num_cpus
threads, as each CPU would indeed be busy all the time.
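A minimal sketch of that chunking idea (hypothetical helper names; in the full design, workers would pull these units from a shared multiprocessing.Queue until it is empty):

```python
def chunk_files(files, jobs, divisor=10):
    # work_task_n = (files / jobs) / divisor, with a floor of one file,
    # so there are many more work units than workers and none sit idle.
    size = max(1, len(files) // (jobs * divisor))
    return [files[i:i + size] for i in range(0, len(files), size)]
```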

BTW is the patch Acked-by Julia? Can we commit it :)

  Luis

[-- Attachment #2: Type: application/pgp-signature, Size: 835 bytes --]


* Re: [Cocci] [PATCH] coccinelle: add pycocci wrapper for multithreaded support
  2014-04-10 19:32     ` [Cocci] " Julia Lawall
@ 2014-08-28  3:46       ` Luis R. Rodriguez
  2014-08-28 18:15         ` Julia Lawall
  0 siblings, 1 reply; 11+ messages in thread
From: Luis R. Rodriguez @ 2014-08-28  3:46 UTC (permalink / raw)
  To: Julia Lawall; +Cc: Johannes Berg, linux-kernel, backports, cocci

On Thu, Apr 10, 2014 at 09:32:34PM +0200, Julia Lawall wrote:
> 
> 
> On Thu, 10 Apr 2014, Luis R. Rodriguez wrote:
> 
> > On Thu, Apr 10, 2014 at 07:51:29PM +0200, Johannes Berg wrote:
> > > On Thu, 2014-04-10 at 10:48 -0700, Luis R. Rodriguez wrote:
> > > 
> > > > You just pass it a cocci file and a target dir; in git environments
> > > > you always want --in-place enabled. Experiments and profiling of random
> > > > cocci files with the Linux kernel show that using just the number of
> > > > CPUs doesn't scale well given that lots of buckets of files don't require
> > > > work; as such this uses 10 * number of CPUs for its number of threads.
> > > > For cocci files that define more general rules 3 * number of CPUs works
> > > > better, but for smaller cocci files 10 * number of CPUs performs best
> > > > right now. To experiment more with what's going on with the multithreading
> > > > one can run htop while kicking off a cocci task on the kernel; we want
> > > > to keep these CPUs busy as much as possible.
> > > 
> > > That's not really a good benchmark, you want to actually check how
> > > quickly it finishes ... If you have some IO issues then just keeping the
> > > CPUs busy trying to do IO won't help at all.
> > 
> > I checked the profile results; the reason the jobs finish early is that
> > some threads had no work or little work. That's why I increased the number
> > of threads, depending on the context (long or short cocci expected; in
> > backports at least, the long being all cocci files in one, the short being
> > the --test-cocci flag to gentree.py). This wrapper uses the short
> > assumption with 10 * num_cpus.
> > 
> > > > Since it's just a helper I toss it into the python directory but don't
> > > > install it. The hope is that we can evolve it there instead of carrying
> > > > this helper within backports.
> > > 
> > > If there's a plan to make coccinelle itself multi-threaded, what's the
> > > point?
> > 
> > To be clear, Coccinelle *has* a form of multithreaded support, but it requires
> > manual spawning of jobs with references to the max count and also to the number
> > of the thread that the new process you are spawning belongs to. There are plans
> > to consider reworking things to handle all this internally, but as I discussed
> > with Julia the changes required would mean some structural changes, and as such
> > we need to live with this for a bit longer. I need to use Coccinelle daily now,
> > so I figured I'd punt this out there in case others might make use of it.
> 
> I agree with Luis.  Multithreading inside Coccinelle is currently a 
> priority task, but not a highest priority one.

Folks, anyone object to merging pycocci in the meantime? I keep using it outside
of backports and it does what I think most kernel developers expect. This would
be until we get proper parallelism support in place.

  Luis


* Re: [Cocci] [PATCH] coccinelle: add pycocci wrapper for multithreaded support
  2014-08-28  3:46       ` Luis R. Rodriguez
@ 2014-08-28 18:15         ` Julia Lawall
  2014-08-28 20:02           ` Luis R. Rodriguez
  0 siblings, 1 reply; 11+ messages in thread
From: Julia Lawall @ 2014-08-28 18:15 UTC (permalink / raw)
  To: Luis R. Rodriguez; +Cc: Johannes Berg, linux-kernel, backports, cocci



On Thu, 28 Aug 2014, Luis R. Rodriguez wrote:

> On Thu, Apr 10, 2014 at 09:32:34PM +0200, Julia Lawall wrote:
> > 
> > 
> > On Thu, 10 Apr 2014, Luis R. Rodriguez wrote:
> > 
> > > On Thu, Apr 10, 2014 at 07:51:29PM +0200, Johannes Berg wrote:
> > > > On Thu, 2014-04-10 at 10:48 -0700, Luis R. Rodriguez wrote:
> > > > 
> > > > > You just pass it a cocci file and a target dir; in git environments
> > > > > you always want --in-place enabled. Experiments and profiling of random
> > > > > cocci files with the Linux kernel show that using just the number of
> > > > > CPUs doesn't scale well given that lots of buckets of files don't require
> > > > > work; as such this uses 10 * number of CPUs for its number of threads.
> > > > > For cocci files that define more general rules 3 * number of CPUs works
> > > > > better, but for smaller cocci files 10 * number of CPUs performs best
> > > > > right now. To experiment more with what's going on with the multithreading
> > > > > one can run htop while kicking off a cocci task on the kernel; we want
> > > > > to keep these CPUs busy as much as possible.
> > > > 
> > > > That's not really a good benchmark, you want to actually check how
> > > > quickly it finishes ... If you have some IO issues then just keeping the
> > > > CPUs busy trying to do IO won't help at all.
> > > 
> > > I checked the profile results; the reason the jobs finish early is that
> > > some threads had no work or little work. That's why I increased the number
> > > of threads, depending on the context (long or short cocci expected; in
> > > backports at least, the long being all cocci files in one, the short being
> > > the --test-cocci flag to gentree.py). This wrapper uses the short
> > > assumption with 10 * num_cpus.
> > > 
> > > > > Since it's just a helper I toss it into the python directory but don't
> > > > > install it. The hope is that we can evolve it there instead of carrying
> > > > > this helper within backports.
> > > > 
> > > > If there's a plan to make coccinelle itself multi-threaded, what's the
> > > > point?
> > > 
> > > To be clear, Coccinelle *has* a form of multithreaded support, but it requires
> > > manual spawning of jobs with references to the max count and also to the number
> > > of the thread that the new process you are spawning belongs to. There are plans
> > > to consider reworking things to handle all this internally, but as I discussed
> > > with Julia the changes required would mean some structural changes, and as such
> > > we need to live with this for a bit longer. I need to use Coccinelle daily now,
> > > so I figured I'd punt this out there in case others might make use of it.
> > 
> > I agree with Luis.  Multithreading inside Coccinelle is currently a 
> > priority task, but not a highest priority one.
> 
> Folks, anyone object to merging pycocci in the meantime? I keep using it outside
> of backports and it does what I think most kernel developers expect. This would
> be until we get proper parallelism support in place.

Merge away...

julia


* Re: [Cocci] [PATCH] coccinelle: add pycocci wrapper for multithreaded support
  2014-08-28 18:15         ` Julia Lawall
@ 2014-08-28 20:02           ` Luis R. Rodriguez
  0 siblings, 0 replies; 11+ messages in thread
From: Luis R. Rodriguez @ 2014-08-28 20:02 UTC (permalink / raw)
  To: Julia Lawall; +Cc: Johannes Berg, linux-kernel, backports, cocci

On Thu, Aug 28, 2014 at 08:15:16PM +0200, Julia Lawall wrote:
> 
> 
> On Thu, 28 Aug 2014, Luis R. Rodriguez wrote:
> 
> > On Thu, Apr 10, 2014 at 09:32:34PM +0200, Julia Lawall wrote:
> > > 
> > > 
> > > On Thu, 10 Apr 2014, Luis R. Rodriguez wrote:
> > > 
> > > > On Thu, Apr 10, 2014 at 07:51:29PM +0200, Johannes Berg wrote:
> > > > > On Thu, 2014-04-10 at 10:48 -0700, Luis R. Rodriguez wrote:
> > > > > 
> > > > > > You just pass it a cocci file and a target dir; in git environments
> > > > > > you always want --in-place enabled. Experiments and profiling of random
> > > > > > cocci files with the Linux kernel show that using just the number of
> > > > > > CPUs doesn't scale well given that lots of buckets of files don't require
> > > > > > work; as such this uses 10 * number of CPUs for its number of threads.
> > > > > > For cocci files that define more general rules 3 * number of CPUs works
> > > > > > better, but for smaller cocci files 10 * number of CPUs performs best
> > > > > > right now. To experiment more with what's going on with the multithreading
> > > > > > one can run htop while kicking off a cocci task on the kernel; we want
> > > > > > to keep these CPUs busy as much as possible.
> > > > > 
> > > > > That's not really a good benchmark, you want to actually check how
> > > > > quickly it finishes ... If you have some IO issues then just keeping the
> > > > > CPUs busy trying to do IO won't help at all.
> > > > 
> > > > I checked the profile results; the reason the jobs finish early is that
> > > > some threads had no work or little work. That's why I increased the number
> > > > of threads, depending on the context (long or short cocci expected; in
> > > > backports at least, the long being all cocci files in one, the short being
> > > > the --test-cocci flag to gentree.py). This wrapper uses the short
> > > > assumption with 10 * num_cpus.
> > > > 
> > > > > > Since it's just a helper I toss it into the python directory but don't
> > > > > > install it. The hope is that we can evolve it there instead of carrying
> > > > > > this helper within backports.
> > > > > 
> > > > > If there's a plan to make coccinelle itself multi-threaded, what's the
> > > > > point?
> > > > 
> > > > To be clear, Coccinelle *has* a form of multithreaded support, but it requires
> > > > manual spawning of jobs with references to the max count and also to the number
> > > > of the thread that the new process you are spawning belongs to. There are plans
> > > > to consider reworking things to handle all this internally, but as I discussed
> > > > with Julia the changes required would mean some structural changes, and as such
> > > > we need to live with this for a bit longer. I need to use Coccinelle daily now,
> > > > so I figured I'd punt this out there in case others might make use of it.
> > > 
> > > I agree with Luis.  Multithreading inside Coccinelle is currently a 
> > > priority task, but not a highest priority one.
> > 
> > Folks, anyone object to merging pycocci in the meantime? I keep using it outside
> > of backports and it does what I think most kernel developers expect. This would
> > be until we get proper parallelism support in place.
> 
> Merge away...

Pushed.

  Luis


end of thread, other threads:[~2014-08-28 20:02 UTC | newest]

Thread overview: 11+ messages
2014-04-10 17:48 [PATCH] coccinelle: add pycocci wrapper for multithreaded support Luis R. Rodriguez
2014-04-10 17:51 ` Johannes Berg
2014-04-10 17:57   ` Luis R. Rodriguez
2014-04-10 19:32     ` [Cocci] " Julia Lawall
2014-08-28  3:46       ` Luis R. Rodriguez
2014-08-28 18:15         ` Julia Lawall
2014-08-28 20:02           ` Luis R. Rodriguez
2014-04-11  5:55     ` SF Markus Elfring
2014-04-11  6:01       ` [Cocci] " Julia Lawall
2014-04-11  6:15         ` SF Markus Elfring
2014-04-11 19:00         ` Luis R. Rodriguez
