* [PATCH] coccinelle: add pycocci wrapper for multithreaded support
From: Luis R. Rodriguez @ 2014-04-10 17:48 UTC (permalink / raw)
  To: cocci; +Cc: Luis R. Rodriguez, Johannes Berg, backports, linux-kernel

From: "Luis R. Rodriguez" <mcgrof@suse.com>

This is a wrapper for folks who work on git trees, specifically the
Linux kernel with its many files, and with assorted cocci files for
different tasks. The assumption is that all you need is multithreaded
support; currently only a shell script is lying around, but that isn't
easily extensible, nor is it dynamic. This uses Python to drive
Coccinelle's mechanisms for multithreaded support and also enables the
sorts of defaults you'd expect when using Coccinelle for Linux kernel
development.

You just pass it a cocci file and a target dir; in git environments
you always want --in-place enabled. Experiments profiling assorted
cocci files against the Linux kernel show that using just the number of
CPUs doesn't scale well, given that lots of buckets of files don't require
any work; as such this uses 10 * number of CPUs for its number of threads.
For cocci files that define more general rules 3 * number of CPUs works
better, but for smaller cocci files 10 * number of CPUs performs best right
now. To experiment with what's going on with the multithreading, run
htop while kicking off a cocci task on the kernel; we want to keep
the CPUs as busy as possible. You can override the number of
threads pycocci uses with -j or --jobs. The problem with jobless threads
can be seen here:

http://drvbp1.linux-foundation.org/~mcgrof/images/coccinelle-backports/cocci-jobless-processes.png

A healthy run would keep all the CPUs busy as in here:

http://drvbp1.linux-foundation.org/~mcgrof/images/coccinelle-backports/after-threaded-cocci.png

This is heavily based on the multithreading implementation completed
in the Linux backports project; this just generalizes it and pulls it
out of there in case others can make use of it -- I did this as I wanted
to make upstream changes with Coccinelle. Note that a multithreading
implementation within Coccinelle itself is currently being discussed to
make CPU usage more efficient, so for now this is only a helper.

Since it's just a helper I toss it into the python directory but don't
install it. The hope is that we can evolve it there instead of carrying
the helper within backports.
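The -max/-index splitting the wrapper relies on can be sketched roughly
as follows (an illustration of the idea only, not Coccinelle's actual
partitioning code; the helper name is made up):

```python
# Illustration of -max/-index style work splitting: worker i of N
# handles every N-th file, so N cooperating spatch processes cover
# the tree exactly once. Hypothetical helper, not Coccinelle code.
def bucket_files(files, max_workers, index):
    return [f for n, f in enumerate(files) if n % max_workers == index]

files = ['a.c', 'b.c', 'c.c', 'd.c', 'e.c']
buckets = [bucket_files(files, 2, i) for i in range(2)]
assert sorted(sum(buckets, [])) == sorted(files)  # no file lost or duplicated
```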

Sample run:

mcgrof@garbanzo ~/linux-next (git::master)$ time ./pycocci 0001-netdev_ops.cocci ./

real    24m13.402s
user    72m27.072s
sys     22m38.812s

With this Coccinelle SmPL rule:

@@
struct net_device *dev;
struct net_device_ops ops;
@@
-dev->netdev_ops = &ops;
+netdev_attach_ops(dev, &ops);

Cc: Johannes Berg <johannes.berg@intel.com>
Cc: backports@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: cocci@systeme.lip6.fr
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 python/pycocci | 193 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 193 insertions(+)
 create mode 100755 python/pycocci

diff --git a/python/pycocci b/python/pycocci
new file mode 100755
index 0000000..4b3ef38
--- /dev/null
+++ b/python/pycocci
@@ -0,0 +1,193 @@
+#!/usr/bin/env python
+#
+# Copyright (c) 2014 Luis R. Rodriguez  <mcgrof@suse.com>
+# Copyright (c) 2013 Johannes Berg <johannes.berg@intel.com>
+#
+# This file is released under the GPLv2.
+#
+# Python wrapper for Coccinelle for multithreaded support,
+# designed to be used for working on a git tree, and with sensible
+# defaults, specifically for kernel developers.
+
+from multiprocessing import Process, cpu_count, Queue
+import argparse, subprocess, os, sys
+import tempfile, shutil
+
+# simple tempdir wrapper object for 'with' statement
+#
+# Usage:
+# with tempdir.tempdir() as tmpdir:
+#     os.chdir(tmpdir)
+#     do something
+class tempdir(object):
+    def __init__(self, suffix='', prefix='', dir=None, nodelete=False):
+        self.suffix = suffix
+        self.prefix = prefix
+        self.dir = dir
+        self.nodelete = nodelete
+
+    def __enter__(self):
+        self._name = tempfile.mkdtemp(suffix=self.suffix,
+                                      prefix=self.prefix,
+                                      dir=self.dir)
+        return self._name
+
+    def __exit__(self, type, value, traceback):
+        if self.nodelete:
+            print('not deleting directory %s!' % self._name)
+        else:
+            shutil.rmtree(self._name)
+
+class CoccinelleError(Exception):
+    pass
+class ExecutionError(CoccinelleError):
+    def __init__(self, cmd, errcode):
+        self.error_code = errcode
+        print('Failed command:')
+        print(' '.join(cmd))
+
+class ExecutionErrorThread(CoccinelleError):
+    def __init__(self, errcode, fn, cocci_file, threads, t, logwrite, print_name):
+        self.error_code = errcode
+        logwrite("Failed to apply changes from %s" % print_name)
+
+        logwrite("Specific log output from change that failed using %s" % print_name)
+        tf = open(fn, 'r')
+        for line in tf.readlines():
+            logwrite('> %s' % line)
+        tf.close()
+
+        logwrite("Full log using %s" % print_name)
+        for num in range(threads):
+            fn = os.path.join(t, '.tmp_spatch_worker.' + str(num))
+            if (not os.path.isfile(fn)):
+                continue
+            tf = open(fn, 'r')
+            for line in tf.readlines():
+                logwrite('> %s' % line)
+            tf.close()
+            os.unlink(fn)
+
+def spatch(cocci_file, outdir,
+           max_threads, thread_id, temp_dir, ret_q, extra_args=[]):
+    cmd = ['spatch', '--sp-file', cocci_file, '--in-place',
+            '--recursive-includes',
+            '--backup-suffix', '.cocci_backup', '--dir', outdir]
+
+    if (max_threads > 1):
+        cmd.extend(['-max', str(max_threads), '-index', str(thread_id)])
+
+    cmd.extend(extra_args)
+
+    fn = os.path.join(temp_dir, '.tmp_spatch_worker.' + str(thread_id))
+    outfile = open(fn, 'w')
+
+    sprocess = subprocess.Popen(cmd,
+                               stdout=outfile, stderr=subprocess.STDOUT,
+                               close_fds=True, universal_newlines=True)
+    sprocess.wait()
+    outfile.close()
+    # Report back to the parent even on failure, so the collecting
+    # loop in threaded_spatch() does not block on ret_q forever.
+    ret_q.put((sprocess.returncode, fn))
+    if sprocess.returncode != 0:
+        raise ExecutionError(cmd, sprocess.returncode)
+
+def threaded_spatch(cocci_file, outdir, logwrite, num_jobs,
+                    print_name, extra_args=[]):
+    num_cpus = cpu_count()
+    # A lengthy comment is warranted here. As of spatch version 1.0.0-rc20,
+    # Coccinelle splits the target files into buckets and has a thread
+    # work on each bucket. Inspection with htop while profiling shows
+    # that CPUs are left idle once the threads whose buckets need no
+    # work finish quickly. This leaves CPUs jobless and hungry.
+    # Experiments with *really* long cocci files (all of the Linux
+    # backports cocci files in one file is an example) show that
+    # num_cpus * 3 provides reasonable completion time, while smaller
+    # rules can use more threads; we currently default to num_cpus * 10.
+    # You are, however, more than welcome to experiment and override
+    # this. Note that how best to optimize things even further is
+    # currently being discussed for Coccinelle itself.
+    #
+    # Images available of htop before multithreading:
+    # http://drvbp1.linux-foundation.org/~mcgrof/images/coccinelle-backports/before-threaded-cocci.png
+    # The jobless-thread issue when using just num_cpus, after a period of time:
+    # http://drvbp1.linux-foundation.org/~mcgrof/images/coccinelle-backports/cocci-jobless-processes.png
+    # A healthy run should look like this over most of the run:
+    # http://drvbp1.linux-foundation.org/~mcgrof/images/coccinelle-backports/after-threaded-cocci.png
+    if num_jobs:
+        threads = num_jobs
+    else:
+        threads = num_cpus * 10
+    jobs = list()
+    output = ""
+    ret_q = Queue()
+    with tempdir() as t:
+        for num in range(threads):
+            p = Process(target=spatch, args=(cocci_file, outdir,
+                                             threads, num, t, ret_q,
+                                             extra_args))
+            jobs.append(p)
+        for p in jobs:
+            p.start()
+
+        for num in range(threads):
+            ret, fn = ret_q.get()
+            if ret != 0:
+                raise ExecutionErrorThread(ret, fn, cocci_file, threads, t,
+                                           logwrite, print_name)
+        for job in jobs:
+            job.join()
+
+        for num in range(threads):
+            fn = os.path.join(t, '.tmp_spatch_worker.' + str(num))
+            tf = open(fn, 'r')
+            output = output + tf.read()
+            tf.close()
+            os.unlink(fn)
+        return output
+
+def logwrite(msg):
+    sys.stdout.write(msg)
+    sys.stdout.flush()
+
+def _main():
+    parser = argparse.ArgumentParser(description='Multithreaded Python wrapper for Coccinelle ' +
+                                     'with sensible defaults, targeted specifically ' +
+                                     'at git development environments')
+    parser.add_argument('cocci_file', metavar='<Coccinelle SmPL rules file>', type=str,
+                        help='This is the Coccinelle file you want to use')
+    parser.add_argument('target_dir', metavar='<target directory>', type=str,
+                        help='Target source directory to modify')
+    parser.add_argument('-p', '--profile-cocci', const=True, default=False, action="store_const",
+                        help='Enable profiling; this passes --profile to Coccinelle.')
+    parser.add_argument('-j', '--jobs', metavar='<jobs>', type=int, default=None,
+                        help='Number of threads to use, overriding the ' +
+                        'default of 10 * number of CPUs.')
+    parser.add_argument('-v', '--verbose', const=True, default=False, action="store_const",
+                        help='Enable output from Coccinelle')
+    args = parser.parse_args()
+
+    if not os.path.isfile(args.cocci_file):
+        return -2
+
+    extra_spatch_args = []
+    if args.profile_cocci:
+        extra_spatch_args.append('--profile')
+    jobs = 0
+    if args.jobs:
+        jobs = args.jobs
+
+    output = threaded_spatch(args.cocci_file,
+                             args.target_dir,
+                             logwrite,
+                             jobs,
+                             os.path.basename(args.cocci_file),
+                             extra_args=extra_spatch_args)
+    if args.verbose:
+        logwrite(output)
+    return 0
+
+if __name__ == '__main__':
+    ret = _main()
+    if ret:
+        sys.exit(ret)
-- 
1.9.0




* Re: [PATCH] coccinelle: add pycocci wrapper for multithreaded support
From: Johannes Berg @ 2014-04-10 17:51 UTC (permalink / raw)
  To: Luis R. Rodriguez; +Cc: cocci, Luis R. Rodriguez, backports, linux-kernel

On Thu, 2014-04-10 at 10:48 -0700, Luis R. Rodriguez wrote:

> You just pass it a cocci file, a target dir, and in git environments
> you always want --in-place enabled. Experiments and profiling random
> cocci files with the Linux kernel show that using just using number of
> CPUs doesn't scale well given that lots of buckets of files don't require
> work, as such this uses 10 * number of CPUs for its number of threads.
> For work that define more general ruler 3 * number of CPUs works better,
> but for smaller cocci files 3 * number of CPUs performs best right now.
> To experiment more with what's going on with the multithreading one can enable
> htop while kicking off a cocci task on the kernel, we want to keep
> these CPUs busy as much as possible. 

That's not really a good benchmark; you want to actually check how
quickly it finishes ... If you have IO issues, then just keeping the
CPUs busy trying to do IO won't help at all.
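The completion-time check suggested here can be done with a tiny
harness; a sketch (the command below is a trivial stand-in for an
actual pycocci run):

```python
import subprocess
import sys
import time

def wall_time(cmd):
    # Measure end-to-end wall-clock time of a command; this, not CPU
    # busyness in htop, is the number that ultimately matters.
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start

# Trivial stand-in command; in practice this would be a pycocci run.
elapsed = wall_time([sys.executable, '-c', 'pass'])
assert elapsed >= 0.0
```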


> Since its just a helper I toss it into the python directory but don't
> install it. Hope is that we can evolve it there instead of carrying this
> helper within backports.

If there's a plan to make coccinelle itself multi-threaded, what's the
point?

johannes




* Re: [PATCH] coccinelle: add pycocci wrapper for multithreaded support
From: Luis R. Rodriguez @ 2014-04-10 17:57 UTC (permalink / raw)
  To: Johannes Berg; +Cc: Luis R. Rodriguez, cocci, backports, linux-kernel


On Thu, Apr 10, 2014 at 07:51:29PM +0200, Johannes Berg wrote:
> On Thu, 2014-04-10 at 10:48 -0700, Luis R. Rodriguez wrote:
> 
> > You just pass it a cocci file, a target dir, and in git environments
> > you always want --in-place enabled. Experiments and profiling random
> > cocci files with the Linux kernel show that using just using number of
> > CPUs doesn't scale well given that lots of buckets of files don't require
> > work, as such this uses 10 * number of CPUs for its number of threads.
> > For work that define more general ruler 3 * number of CPUs works better,
> > but for smaller cocci files 3 * number of CPUs performs best right now.
> > To experiment more with what's going on with the multithreading one can enable
> > htop while kicking off a cocci task on the kernel, we want to keep
> > these CPUs busy as much as possible. 
> 
> That's not really a good benchmark, you want to actually check how
> quickly it finishes ... If you have some IO issues then just keeping the
> CPUs busy trying to do IO won't help at all.

I checked the profile results; the reason the jobs finish early is that
some threads had little or no work. That's why I increased the number of
threads depending on the context (long or short cocci file expected; in
backports at least, the long case being all cocci files in one, the short
case being the --test-cocci flag to gentree.py). This wrapper uses the
short assumption with 10 * num_cpus.

> > Since its just a helper I toss it into the python directory but don't
> > install it. Hope is that we can evolve it there instead of carrying this
> > helper within backports.
> 
> If there's a plan to make coccinelle itself multi-threaded, what's the
> point?

To be clear, Coccinelle *has* a form of multithreaded support, but it
requires manually spawning the jobs, telling each one the max job count
and the index of the thread that the new process represents. There are
plans to consider reworking things so all of this is handled internally,
but as I discussed with Julia the changes required would be structural,
and as such we need to live with this for a bit longer. I need to use
Coccinelle daily now, so I figured I'd punt this out there in case others
might make use of it.
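The manual spawning described here can be sketched as follows (a sketch
following the flag spelling used in the pycocci patch earlier in the
thread, not Coccinelle's own code):

```python
# Build one spatch command line per worker: each process is told the
# total worker count (-max) and its own slot (-index). Running these
# N commands in parallel covers the target tree once.
def worker_cmds(cocci_file, target_dir, n):
    return [['spatch', '--sp-file', cocci_file, '--in-place',
             '--dir', target_dir, '-max', str(n), '-index', str(i)]
            for i in range(n)]

cmds = worker_cmds('rule.cocci', './drivers', 4)
assert len(cmds) == 4
assert cmds[2][-4:] == ['-max', '4', '-index', '2']
```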

  Luis




* Re: [Cocci] [PATCH] coccinelle: add pycocci wrapper for multithreaded support
From: Julia Lawall @ 2014-04-10 19:32 UTC (permalink / raw)
  To: Luis R. Rodriguez; +Cc: Johannes Berg, linux-kernel, backports, cocci



On Thu, 10 Apr 2014, Luis R. Rodriguez wrote:

> On Thu, Apr 10, 2014 at 07:51:29PM +0200, Johannes Berg wrote:
> > On Thu, 2014-04-10 at 10:48 -0700, Luis R. Rodriguez wrote:
> > 
> > > You just pass it a cocci file and a target dir; in git environments
> > > you always want --in-place enabled. Experiments profiling random
> > > cocci files on the Linux kernel show that using just the number of
> > > CPUs doesn't scale well, given that lots of buckets of files don't
> > > require work; as such this uses 10 * the number of CPUs for its
> > > thread count. For work that defines more general rules, 3 * the
> > > number of CPUs works better, but for smaller cocci files 10 * the
> > > number of CPUs performs best right now. To see what the
> > > multithreading is doing, one can run htop while kicking off a cocci
> > > task on the kernel; we want to keep these CPUs as busy as possible.
> > 
> > That's not really a good benchmark, you want to actually check how
> > quickly it finishes ... If you have some IO issues then just keeping the
> > CPUs busy trying to do IO won't help at all.
> 
> I checked the profile results; the reason the jobs finished early is
> that some threads had no work or little work. That is why I increased
> the number of threads depending on the context (a long or a short cocci
> run expected; in backports at least, the long being all cocci files in
> one, the short being the --test-cocci flag to gentree.py). This wrapper
> uses the short assumption with 10 * num_cpus.
> 
> > > Since it's just a helper I toss it into the python directory but don't
> > > install it. The hope is that we can evolve it there instead of carrying
> > > this helper within backports.
> > 
> > If there's a plan to make coccinelle itself multi-threaded, what's the
> > point?
> 
> To be clear, Coccinelle *has* a form of multithreaded support, but it
> requires manually spawning the jobs, telling each new process the total
> job count and the index of the slot it is responsible for. There are
> plans to rework things to handle all of this internally, but as I
> discussed with Julia the changes would require some structural work, so
> we need to live with this for a bit longer. I need to use Coccinelle
> daily now, so figured I'd punt this out there in case others might make
> use of it.

I agree with Luis.  Multithreading inside Coccinelle is currently a 
priority task, but not the highest-priority one.

julia

* Re: [PATCH] coccinelle: add pycocci wrapper for multithreaded support
  2014-04-10 17:57     ` [Cocci] " Luis R. Rodriguez
@ 2014-04-11  5:55       ` SF Markus Elfring
  -1 siblings, 0 replies; 28+ messages in thread
From: SF Markus Elfring @ 2014-04-11  5:55 UTC (permalink / raw)
  To: Luis R. Rodriguez, Johannes Berg
  Cc: Luis R. Rodriguez, cocci, backports, linux-kernel

> I checked the profile results; the reason the jobs finished early is
> that some threads had no work or little work.

Could you find out during the data processing which parts or files
result in a special application behaviour you would like to point out here?

Regards,
Markus

* Re: [Cocci] [PATCH] coccinelle: add pycocci wrapper for multithreaded support
  2014-04-11  5:55       ` [Cocci] " SF Markus Elfring
@ 2014-04-11  6:01         ` Julia Lawall
  -1 siblings, 0 replies; 28+ messages in thread
From: Julia Lawall @ 2014-04-11  6:01 UTC (permalink / raw)
  To: SF Markus Elfring
  Cc: Luis R. Rodriguez, Johannes Berg, linux-kernel, backports, cocci



On Fri, 11 Apr 2014, SF Markus Elfring wrote:

> > I checked the profile results; the reason the jobs finished early is
> > that some threads had no work or little work.
> 
> Could you find out during the data processing which parts or files
> result in a special application behaviour you would like to point out here?

I don't understand the question at all, but since the various files have 
different properties, it is hard to determine automatically in advance how 
much work Coccinelle will need to do on each one.

julia

* Re: [Cocci] [PATCH] coccinelle: add pycocci wrapper for multithreaded support
  2014-04-11  6:01         ` Julia Lawall
@ 2014-04-11  6:15           ` SF Markus Elfring
  -1 siblings, 0 replies; 28+ messages in thread
From: SF Markus Elfring @ 2014-04-11  6:15 UTC (permalink / raw)
  To: Julia Lawall
  Cc: Luis R. Rodriguez, Johannes Berg, linux-kernel, backports, cocci

>> Could you find out during the data processing which parts or files
>> result in a special application behaviour you would like to point out here?
> I don't understand the question at all, but since the various files have 
> different properties, it is hard to determine automatically in advance how 
> much work Coccinelle will need to do on each one.

It was reported that system utilisation did not match expectations. I am
curious whether any more details or patterns can be determined for the
observed situation.

Do the involved files also need another look?
- semantic patches
- source directories

Regards,
Markus

* [Cocci] Software parallelisation?
  2014-04-10 19:32       ` Julia Lawall
  (?)
@ 2014-04-11  8:50       ` SF Markus Elfring
  2014-04-11 18:49         ` Jesper Louis Andersen
  -1 siblings, 1 reply; 28+ messages in thread
From: SF Markus Elfring @ 2014-04-11  8:50 UTC (permalink / raw)
  To: cocci

> Multithreading inside Coccinelle is currently a priority task,
> but not the highest-priority one.

Would you like to share any more ideas about corresponding software
improvements?

Regards,
Markus

* [Cocci] Software parallelisation?
  2014-04-11  8:50       ` [Cocci] Software parallelisation? SF Markus Elfring
@ 2014-04-11 18:49         ` Jesper Louis Andersen
  2014-04-11 19:40           ` SF Markus Elfring
  2014-04-11 20:04           ` Julia Lawall
  0 siblings, 2 replies; 28+ messages in thread
From: Jesper Louis Andersen @ 2014-04-11 18:49 UTC (permalink / raw)
  To: cocci

On Fri, Apr 11, 2014 at 10:50 AM, SF Markus Elfring <
elfring@users.sourceforge.net> wrote:

> Would you like to share any more ideas about corresponding software
> improvements?
>

The quick answer: it is currently somewhat hard. OCaml is not
multi-threaded because it does not have a garbage collector able to run
in multiple threads at the same time. This is being worked on seriously
now, but a stable solution is at least 2-3 years out on the horizon.

So making cocci faster on the inside requires some multi-processing
hackery. Certainly possible, but not as easy.


-- 
J.

* Re: [Cocci] [PATCH] coccinelle: add pycocci wrapper for multithreaded support
  2014-04-11  6:01         ` Julia Lawall
@ 2014-04-11 19:00           ` Luis R. Rodriguez
  -1 siblings, 0 replies; 28+ messages in thread
From: Luis R. Rodriguez @ 2014-04-11 19:00 UTC (permalink / raw)
  To: Julia Lawall
  Cc: SF Markus Elfring, Johannes Berg, linux-kernel, backports, cocci

On Fri, Apr 11, 2014 at 08:01:04AM +0200, Julia Lawall wrote:
> 
> 
> On Fri, 11 Apr 2014, SF Markus Elfring wrote:
> 
> > > I checked the profile results; the reason the jobs finished early
> > > is that some threads had no work or little work.
> > 
> > Could you find out during the data processing which parts or files
> > result in a special application behaviour you would like to point out here?
> 
> I don't understand the question at all, but since the various files have 
> different properties, it is hard to determine automatically in advance how 
> much work Coccinelle will need to do on each one.

For whoever might work on enhancing multithreading support, I wonder if
there could be gains from first putting in an effort to evaluate which
files have at least one rule hit, adding those files to an activity file
list to later be spread between the threads. As you note, though, it is
hard to determine this in advance, given that each rule can express any
change.

I think one small change which could help, and which likely would not
incur a drastic immediate change to the architecture, would be to not
hand each thread a precomputed files / jobs slice of the file list, but
instead have each one take, say:

	work_task_n = (files / jobs) / 10

The list of files needing work could then be kept on a list protected
against concurrent access by the threads, and each thread would only die
once all the files have been worked on. This would allow keeping only
number_cpu threads, as each CPU would indeed stay busy the whole time.
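
A minimal Python sketch of that shared work list, with a hypothetical
worker callback standing in for running spatch on one file:

```python
import multiprocessing
import queue
import threading

def run_jobs(files, worker, nthreads=None):
    """Shared-work-list scheme: every thread pulls one file at a
    time from a thread-safe queue and exits only once the queue is
    empty, so no thread idles while unprocessed files remain."""
    nthreads = nthreads or multiprocessing.cpu_count()
    work = queue.Queue()
    for f in files:
        work.put(f)

    def drain():
        while True:
            try:
                f = work.get_nowait()
            except queue.Empty:
                return  # all files claimed: this thread dies
            worker(f)

    threads = [threading.Thread(target=drain) for _ in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

With this shape the thread count can stay at the number of CPUs, since
threads never sit on a pre-assigned, already-finished bucket of files.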

BTW is the patch Acked-by Julia? Can we commit it :)

  Luis

* [Cocci] Software parallelisation?
  2014-04-11 18:49         ` Jesper Louis Andersen
@ 2014-04-11 19:40           ` SF Markus Elfring
  2014-04-11 20:04           ` Julia Lawall
  1 sibling, 0 replies; 28+ messages in thread
From: SF Markus Elfring @ 2014-04-11 19:40 UTC (permalink / raw)
  To: cocci

> OCaml is not multi-threaded because it does not have a garbage collector
> able to run in multiple threads at the same time. This is being worked on
> seriously now, but a stable solution is at least 2-3 years out on the
> horizon.

Could you point out any concrete discussions about the corresponding
software evolution?


> So making cocci faster on the inside requires some multi-processing
> hackery. Certainly possible, but not as easy.

I imagine that more fine-tuning could be performed on the basic data
structures. Which ideas do you have in mind for improvements around
multi-programming approaches here?

Regards,
Markus

* [Cocci] Software parallelisation?
  2014-04-11 18:49         ` Jesper Louis Andersen
  2014-04-11 19:40           ` SF Markus Elfring
@ 2014-04-11 20:04           ` Julia Lawall
  2014-04-11 20:42             ` [Cocci] Adjustments for Coccinelle's software build process? SF Markus Elfring
  2014-04-11 20:55             ` [Cocci] Software parallelisation? SF Markus Elfring
  1 sibling, 2 replies; 28+ messages in thread
From: Julia Lawall @ 2014-04-11 20:04 UTC (permalink / raw)
  To: cocci



On Fri, 11 Apr 2014, Jesper Louis Andersen wrote:

> 
> On Fri, Apr 11, 2014 at 10:50 AM, SF Markus Elfring
> <elfring@users.sourceforge.net> wrote:
>       Would you like to share any more ideas about corresponding
>       software
>       improvements?
> 
> 
> The quick answer: It is currently somewhat hard. Ocaml is not multi-threaded
> because it does not have a garbage collector able to execute in multiple
> threads at the same time. This is being worked on seriously now, but is at
> least 2-3 years out on the horizon for a stable solution.
> 
> So to make cocci faster on the inside requires some multi-processing
> hackery. Certainly possible, but not as easy.

Actually, there is a library from Roberto di Cosmo called Parmap that does 
a simple but reasonable job.  For Coccinelle there are two problems:

1. The Coccinelle build system is too complex for me to be able to figure 
out how to include Parmap.  This probably just requires a little time and 
patience to resolve.

2. Coccinelle provides initialize and finalize rules.  These are normally 
used to collect information, eg in a hash table or a global list.  If one 
treats eg a set of files in parallel, then one would have to specify how 
to join the information collected by the separate initialize and finalize 
instances, or somehow implement shared memory in C and use some form of 
synchronization.  This seems all too complicated for Coccinelle.

Point 2 means that parallelism is likely to be provided inside Coccinelle 
only for semantic patches that don't have initialize and finalize rules.  
If the semantic patch has initialize and finalize rules, the user would 
have to explicitly decide whether they want to run Coccinelle single 
threaded, or run it on separate files in parallel and combine the results 
in the end in some other way.
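
For intuition, Parmap's model is essentially a fork-based parallel map
over the files; in Python terms the shape, including the explicit merge
step that initialize/finalize-style state forces on the parent, looks
roughly like this (count_matches is a hypothetical stand-in for applying
one rule to one file):

```python
from multiprocessing import Pool

def count_matches(path):
    # Hypothetical stand-in for applying one rule to one file.
    # Any state built up here lives in a forked child, so it must
    # be returned to the parent rather than stored in a global.
    return (path, len(path))

def parallel_map(worker, files, ncores=2):
    """Parmap-style map: split files across ncores forked workers,
    apply worker to each, then merge the per-file results in the
    parent -- the join that initialize/finalize rules would
    otherwise need shared memory and synchronization to avoid."""
    with Pool(processes=ncores) as pool:
        results = pool.map(worker, files)
    return dict(results)  # explicit join of per-worker state
```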

Another alternative would be to introduce parallelism within the treatment 
of a single file, eg running a matching rule on each function in parallel.  
In preliminary experiments, this didn't seem cost effective.  There is a 
lot of context information that has to be transmitted to the child 
processes, and anyway the treatment of most functions is very fast, 
because there is nothing to do.

julia

* [Cocci] Adjustments for Coccinelle's software build process?
  2014-04-11 20:04           ` Julia Lawall
@ 2014-04-11 20:42             ` SF Markus Elfring
  2014-04-11 20:55             ` [Cocci] Software parallelisation? SF Markus Elfring
  1 sibling, 0 replies; 28+ messages in thread
From: SF Markus Elfring @ 2014-04-11 20:42 UTC (permalink / raw)
  To: cocci

> 1. The Coccinelle build system is too complex or me to be able to figure 
> out how to include Parmap.  This probably just requires a little time and 
> patience to resolve.

What do you think about collaborating with more software developers?

Would you like to trust and reuse other build expertise, for example with
"Autotools" (and automake)?

Regards,
Markus

* [Cocci] Software parallelisation?
  2014-04-11 20:04           ` Julia Lawall
  2014-04-11 20:42             ` [Cocci] Adjustments for Coccinelle's software build process? SF Markus Elfring
@ 2014-04-11 20:55             ` SF Markus Elfring
  1 sibling, 0 replies; 28+ messages in thread
From: SF Markus Elfring @ 2014-04-11 20:55 UTC (permalink / raw)
  To: cocci

> Another alternative would be to introduce parallelism within the treatment 
> of a single file, eg running a matching rule on each function in parallel.

Are you interested in reusing an evolving software library in the future?
http://stellar.cct.lsu.edu/publications/

Regards,
Markus

* Re: [Cocci] [PATCH] coccinelle: add pycocci wrapper for multithreaded support
  2014-04-10 19:32       ` Julia Lawall
@ 2014-08-28  3:46         ` Luis R. Rodriguez
  -1 siblings, 0 replies; 28+ messages in thread
From: Luis R. Rodriguez @ 2014-08-28  3:46 UTC (permalink / raw)
  To: Julia Lawall; +Cc: Johannes Berg, linux-kernel, backports, cocci

On Thu, Apr 10, 2014 at 09:32:34PM +0200, Julia Lawall wrote:
> 
> 
> On Thu, 10 Apr 2014, Luis R. Rodriguez wrote:
> 
> > On Thu, Apr 10, 2014 at 07:51:29PM +0200, Johannes Berg wrote:
> > > On Thu, 2014-04-10 at 10:48 -0700, Luis R. Rodriguez wrote:
> > > 
> > > > You just pass it a cocci file and a target dir; in git environments
> > > > you always want --in-place enabled. Experiments profiling random
> > > > cocci files on the Linux kernel show that using just the number of
> > > > CPUs doesn't scale well, given that lots of buckets of files don't
> > > > require work; as such this uses 10 * the number of CPUs for its
> > > > thread count. For work that defines more general rules, 3 * the
> > > > number of CPUs works better, but for smaller cocci files 10 * the
> > > > number of CPUs performs best right now. To see what the
> > > > multithreading is doing, one can run htop while kicking off a cocci
> > > > task on the kernel; we want to keep these CPUs as busy as possible.
> > > 
> > > That's not really a good benchmark, you want to actually check how
> > > quickly it finishes ... If you have some IO issues then just keeping the
> > > CPUs busy trying to do IO won't help at all.
> > 
> > I checked the profile results; the reason the jobs finished early is
> > that some threads had no work or little work. That is why I increased
> > the number of threads depending on the context (a long or a short cocci
> > run expected; in backports at least, the long being all cocci files in
> > one, the short being the --test-cocci flag to gentree.py). This wrapper
> > uses the short assumption with 10 * num_cpus.
> > 
> > > > Since it's just a helper I toss it into the python directory but
> > > > don't install it. The hope is that we can evolve it there instead
> > > > of carrying this helper within backports.
> > > 
> > > If there's a plan to make coccinelle itself multi-threaded, what's the
> > > point?
> > 
> > To be clear, Coccinelle *has* a form of multithreaded support, but it
> > requires manually spawning the jobs, telling each new process the total
> > job count and the index of the slot it is responsible for. There are
> > plans to rework things to handle all of this internally, but as I
> > discussed with Julia the changes would require some structural work, so
> > we need to live with this for a bit longer. I need to use Coccinelle
> > daily now, so figured I'd punt this out there in case others might make
> > use of it.
> 
> I agree with Luis.  Multithreading inside Coccinelle is currently a
> priority task, but not the highest-priority one.

Folks, anyone object to merging pycocci in the meantime? I keep using it
outside of backports and it does what I think most kernel developers
expect. It would tide us over until we get proper parallelism support in
place.

  Luis

* Re: [Cocci] [PATCH] coccinelle: add pycocci wrapper for multithreaded support
  2014-08-28  3:46         ` Luis R. Rodriguez
@ 2014-08-28 18:15           ` Julia Lawall
  -1 siblings, 0 replies; 28+ messages in thread
From: Julia Lawall @ 2014-08-28 18:15 UTC (permalink / raw)
  To: Luis R. Rodriguez; +Cc: Johannes Berg, linux-kernel, backports, cocci



On Thu, 28 Aug 2014, Luis R. Rodriguez wrote:

> On Thu, Apr 10, 2014 at 09:32:34PM +0200, Julia Lawall wrote:
> > 
> > 
> > On Thu, 10 Apr 2014, Luis R. Rodriguez wrote:
> > 
> > > On Thu, Apr 10, 2014 at 07:51:29PM +0200, Johannes Berg wrote:
> > > > On Thu, 2014-04-10 at 10:48 -0700, Luis R. Rodriguez wrote:
> > > > 
> > > > > You just pass it a cocci file and a target dir; in git
> > > > > environments you always want --in-place enabled. Experiments
> > > > > profiling random cocci files on the Linux kernel show that using
> > > > > just the number of CPUs doesn't scale well, given that lots of
> > > > > buckets of files don't require work; as such this uses 10 * the
> > > > > number of CPUs for its thread count. For work that defines more
> > > > > general rules, 3 * the number of CPUs works better, but for
> > > > > smaller cocci files 10 * the number of CPUs performs best right
> > > > > now. To see what the multithreading is doing, one can run htop
> > > > > while kicking off a cocci task on the kernel; we want to keep
> > > > > these CPUs as busy as possible.
> > > > 
> > > > That's not really a good benchmark, you want to actually check how
> > > > quickly it finishes ... If you have some IO issues then just keeping the
> > > > CPUs busy trying to do IO won't help at all.
> > > 
> > > I checked the profile results; the reason the jobs finished early is
> > > that some threads had no work or little work. That is why I increased
> > > the number of threads depending on the context (a long or a short
> > > cocci run expected; in backports at least, the long being all cocci
> > > files in one, the short being the --test-cocci flag to gentree.py).
> > > This wrapper uses the short assumption with 10 * num_cpus.
> > > 
> > > > > Since it's just a helper I toss it into the python directory but
> > > > > don't install it. The hope is that we can evolve it there instead
> > > > > of carrying this helper within backports.
> > > > 
> > > > If there's a plan to make coccinelle itself multi-threaded, what's the
> > > > point?
> > > 
> > > To be clear, Coccinelle *has* a form of multithreaded support, but it
> > > requires manually spawning the jobs, telling each new process the
> > > total job count and the index of the slot it is responsible for.
> > > There are plans to rework things to handle all of this internally,
> > > but as I discussed with Julia the changes would require some
> > > structural work, so we need to live with this for a bit longer. I
> > > need to use Coccinelle daily now, so figured I'd punt this out there
> > > in case others might make use of it.
> > 
> > I agree with Luis.  Multithreading inside Coccinelle is currently a
> > priority task, but not the highest-priority one.
> 
> Folks, anyone object to merging pycocci in the meantime? I keep using it
> outside of backports and it does what I think most kernel developers
> expect. It would tide us over until we get proper parallelism support in
> place.

Merge away...

julia

* Re: [Cocci] [PATCH] coccinelle: add pycocci wrapper for multithreaded support
  2014-08-28 18:15           ` Julia Lawall
@ 2014-08-28 20:02             ` Luis R. Rodriguez
  -1 siblings, 0 replies; 28+ messages in thread
From: Luis R. Rodriguez @ 2014-08-28 20:02 UTC (permalink / raw)
  To: Julia Lawall; +Cc: Johannes Berg, linux-kernel, backports, cocci

On Thu, Aug 28, 2014 at 08:15:16PM +0200, Julia Lawall wrote:
> 
> 
> On Thu, 28 Aug 2014, Luis R. Rodriguez wrote:
> 
> > On Thu, Apr 10, 2014 at 09:32:34PM +0200, Julia Lawall wrote:
> > > 
> > > 
> > > On Thu, 10 Apr 2014, Luis R. Rodriguez wrote:
> > > 
> > > > On Thu, Apr 10, 2014 at 07:51:29PM +0200, Johannes Berg wrote:
> > > > > On Thu, 2014-04-10 at 10:48 -0700, Luis R. Rodriguez wrote:
> > > > > 
> > > > > > You just pass it a cocci file and a target dir; in git environments
> > > > > > you always want --in-place enabled. Experiments and profiling of random
> > > > > > cocci files with the Linux kernel show that using just the number of
> > > > > > CPUs doesn't scale well, given that lots of buckets of files don't require
> > > > > > work; as such this uses 10 * the number of CPUs for its number of threads.
> > > > > > For cocci files that define more general rules, 3 * the number of CPUs works better,
> > > > > > but for smaller cocci files 10 * the number of CPUs performs best right now.
> > > > > > To experiment more with what's going on with the multithreading, one can run
> > > > > > htop while kicking off a cocci task on the kernel; we want to keep
> > > > > > these CPUs as busy as possible.
> > > > > 
> > > > > That's not really a good benchmark; you want to actually check how
> > > > > quickly it finishes... If you have IO issues, then just keeping the
> > > > > CPUs busy trying to do IO won't help at all.
> > > > 
> > > > I checked the profile results; the reason the jobs finished early is that
> > > > some threads had little or no work. Hence I increased the number of threads
> > > > depending on the context (long or short cocci expected; in backports,
> > > > at least, the long case being all cocci files in one, the short being the
> > > > --test-cocci flag to gentree.py). This wrapper uses the short assumption
> > > > with 10 * num_cpus.
> > > > 
> > > > > > Since it's just a helper, I tossed it into the python directory but don't
> > > > > > install it. The hope is that we can evolve it there instead of carrying
> > > > > > this helper within backports.
> > > > > 
> > > > > If there's a plan to make coccinelle itself multi-threaded, what's the
> > > > > point?
> > > > 
> > > > To be clear, Coccinelle *has* a form of multithreaded support, but it requires
> > > > manually spawning the jobs, each with a reference to the max job count and to
> > > > the thread number that the newly spawned process belongs to. There are plans
> > > > to consider reworking things to handle all this internally, but as I discussed
> > > > with Julia, the changes required would be structural, and as such we
> > > > need to live with this for a bit longer. I need to use Coccinelle daily now,
> > > > so I figured I'd punt this out there in case others might make use of it.
> > > 
> > > I agree with Luis.  Multithreading inside Coccinelle is currently a 
> > > priority task, but not the highest priority one.
> > 
> > Folks, does anyone object to merging pycocci in the meantime? I keep using it
> > outside of backports, and it does what I think most kernel developers expect.
> > This would be until we get proper parallelism support in place.
> 
> Merge away...

Pushed.

  Luis

^ permalink raw reply	[flat|nested] 28+ messages in thread
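[The thread-count heuristic from the commit message above — over-subscribe with 10 threads per CPU for short cocci files, roughly 3 per CPU for broader rule sets, overridable with -j/--jobs — can be sketched as follows. This is an illustrative reimplementation, not the actual pycocci code; the helper names are made up for the example.]

```python
import argparse
import multiprocessing

# Heuristic from the thread: with short cocci files many workers end
# up with little or no work, so over-subscribe by default.
SHORT_COCCI_FACTOR = 10   # short/simple cocci files
LONG_COCCI_FACTOR = 3     # broader, more general rule sets

def default_jobs(factor=SHORT_COCCI_FACTOR, cpus=None):
    # factor * number of CPUs, e.g. 10 threads/CPU on a 4-core box.
    cpus = cpus if cpus is not None else multiprocessing.cpu_count()
    return factor * cpus

def parse_args(argv):
    # Mirrors the interface described above: a cocci file, a target
    # dir, and -j/--jobs to override the computed thread count.
    parser = argparse.ArgumentParser(prog="pycocci")
    parser.add_argument("-j", "--jobs", type=int, default=default_jobs(),
                        help="number of spatch threads to spawn")
    parser.add_argument("cocci_file")
    parser.add_argument("target_dir")
    return parser.parse_args(argv)
```

[Whether the oversubscription pays off depends on how unevenly spatch's file buckets are loaded, which is exactly what watching htop during a run was probing.]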

end of thread, other threads:[~2014-08-28 20:02 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-04-10 17:48 [PATCH] coccinelle: add pycocci wrapper for multithreaded support Luis R. Rodriguez
2014-04-10 17:48 ` [Cocci] " Luis R. Rodriguez
2014-04-10 17:51 ` Johannes Berg
2014-04-10 17:51   ` [Cocci] " Johannes Berg
2014-04-10 17:57   ` Luis R. Rodriguez
2014-04-10 17:57     ` [Cocci] " Luis R. Rodriguez
2014-04-10 19:32     ` Julia Lawall
2014-04-10 19:32       ` Julia Lawall
2014-04-11  8:50       ` [Cocci] Software parallelisation? SF Markus Elfring
2014-04-11 18:49         ` Jesper Louis Andersen
2014-04-11 19:40           ` SF Markus Elfring
2014-04-11 20:04           ` Julia Lawall
2014-04-11 20:42             ` [Cocci] Adjustments for Coccinelle's software build process? SF Markus Elfring
2014-04-11 20:55             ` [Cocci] Software parallelisation? SF Markus Elfring
2014-08-28  3:46       ` [Cocci] [PATCH] coccinelle: add pycocci wrapper for multithreaded support Luis R. Rodriguez
2014-08-28  3:46         ` Luis R. Rodriguez
2014-08-28 18:15         ` Julia Lawall
2014-08-28 18:15           ` Julia Lawall
2014-08-28 20:02           ` Luis R. Rodriguez
2014-08-28 20:02             ` Luis R. Rodriguez
2014-04-11  5:55     ` SF Markus Elfring
2014-04-11  5:55       ` [Cocci] " SF Markus Elfring
2014-04-11  6:01       ` Julia Lawall
2014-04-11  6:01         ` Julia Lawall
2014-04-11  6:15         ` SF Markus Elfring
2014-04-11  6:15           ` SF Markus Elfring
2014-04-11 19:00         ` Luis R. Rodriguez
2014-04-11 19:00           ` Luis R. Rodriguez

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.