Subject: Let the fun start
From: Thomas Gleixner @ 2019-05-07 11:53 UTC
To: linux-spdx
Hi!
To get the work going, I set up git repositories with tools and results. See
the document below.
As a follow-up, I'm going to post the first few patches from step2 (GPL
boilerplate replacement) so you get an idea of what this looks like and we
can discuss how to proceed with review etc.
Thanks,
tglx
Machine assisted license cleanup
--------------------------------
1. Tools for reproduction:
1.1 scancode toolkit
A license scanner tool which can be run from the command line and
provides excellent parallelisation. While it is fast, it's recommended
to run it on a machine with tons of CPUs and tons of memory.
A run with 128 parallel scan threads takes about 15 minutes. Go
figure how long it will take on your laptop :)
https://github.com/nexB/scancode-toolkit
1.2 spdx helper scripts
A bunch of horrible Python scripts with even more horrible shell
glue.
git://git.kernel.org/pub/scm/utils/spdx/spdx-utils
gitweb URL:
https://git.kernel.org/pub/scm/utils/spdx/spdx-utils.git
The main workhorse is lcheck.py. I wrote it initially to gather
statistics and other information, but over time it evolved into a
Swiss army knife. lcheck.py --help gives you the gory details, no
manpage, sorry.
1.3 git
The git tools must be available.
A clean Linux tree must be cloned. Ensure that there are no
artifacts from editing, patch directories etc.
To reproduce the setup (in case you have a big enough machine or
lots of time for thumb twiddling):
- Install scancode and git. If you need help with scancode, talk
to Philippe.
- Clone the linux kernel
- Clone the spdx scripts
- cd into the spdx scripts directory
- invoke the runscript with:
./runall.sh path/to/linux/kernel
The path can be relative or absolute.
- Wait ....
- Check the results in the stepX directories
- Check the results in the kernel directory (each step creates a
branch).
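For those who prefer a script over typing, the steps above can be sketched as a small Python driver. This is a hypothetical helper, not part of spdx-utils: the mainline kernel clone URL, the directory layout under `workdir` and the dry-run default are assumptions. The only interface it relies on is the one described above, `runall.sh` taking the kernel path.

```python
from __future__ import annotations

import shlex
import subprocess
from pathlib import Path

SPDX_UTILS_URL = "git://git.kernel.org/pub/scm/utils/spdx/spdx-utils"
# Assumption: the mainline tree; any clean kernel clone should do.
LINUX_URL = "git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git"


def reproduction_commands(workdir: Path) -> list[tuple[Path, list[str]]]:
    """Return (cwd, argv) pairs mirroring the manual steps."""
    linux = workdir / "linux"
    utils = workdir / "spdx-utils"
    return [
        (workdir, ["git", "clone", LINUX_URL, str(linux)]),
        (workdir, ["git", "clone", SPDX_UTILS_URL, str(utils)]),
        # runall.sh is invoked from the scripts directory with the kernel path.
        (utils, ["./runall.sh", str(linux)]),
        # Each step creates a branch in the kernel tree; list them at the end.
        (linux, ["git", "branch", "--list"]),
    ]


def run(workdir: Path, dry_run: bool = True) -> list[str]:
    """Print the commands and, unless dry_run is set, execute them in order."""
    if not dry_run:
        workdir.mkdir(parents=True, exist_ok=True)
    lines = []
    for cwd, argv in reproduction_commands(workdir):
        line = f"(cd {shlex.quote(str(cwd))} && {shlex.join(argv)})"
        print(line)
        lines.append(line)
        if not dry_run:
            subprocess.run(argv, cwd=cwd, check=True)
    return lines
```

With `dry_run=False` it really runs everything, including the long scan, so the "Wait ...." step still applies.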
For your convenience:
Aside from the master branch, the spdx-utils repository contains a
branch linux-5.1. It contains:
- the scancode json files for each step
- the stats.txt file for each step
- the rules which are handled in each step
- the resulting patches
The resulting kernel tree is pushed to:
git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-spdx.git
Branches step1, step2, step3 contain the steps documented below.
gitweb URL:
https://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-spdx.git
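To see which result branches exist without cloning the whole tree, the remote can be asked with `git ls-remote --heads`. A minimal sketch, assuming every result branch follows the stepN naming above; `parse_heads` and `step_branches` are hypothetical helpers.

```python
from __future__ import annotations

import subprocess

RESULTS_URL = "git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-spdx.git"


def parse_heads(ls_remote_output: str) -> list[str]:
    """Extract branch names from `git ls-remote --heads` output.

    Each non-empty line is "<object id><TAB>refs/heads/<branch>".
    """
    prefix = "refs/heads/"
    branches = []
    for line in ls_remote_output.splitlines():
        if not line.strip():
            continue
        _oid, ref = line.split("\t", 1)
        if ref.startswith(prefix):
            branches.append(ref[len(prefix):])
    return branches


def step_branches(url: str = RESULTS_URL) -> list[str]:
    """Ask the remote for its heads (needs network) and keep the stepN ones."""
    result = subprocess.run(
        ["git", "ls-remote", "--heads", url],
        capture_output=True, text=True, check=True,
    )
    return sorted(b for b in parse_heads(result.stdout) if b.startswith("step"))
```

The results branch of the scripts repository can be fetched directly with `git clone --branch linux-5.1 git://git.kernel.org/pub/scm/utils/spdx/spdx-utils`.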
2. Approach
The Documentation directory is ignored for now. That needs some extra
care.
2.1 Files with no license
These files have not been touched during the first large sweep.
2.1.1 Build files
Make/Kconfig files without license information
2.1.2 Source files which have only MODULE_LICENSE("GPL") and/or
EXPORT_SYMBOL_GPL()
Now that the meaning of MODULE_LICENSE has been clarified, this can be tackled.
The scripts identify these files in the scanner result and add the
proper license identifier (GPL-2.0-only).
The scripts generate patches which can be applied with quilt or imported
into git with 'git quiltimport'.
SPDX count goes from 22574 to 25712 (44.9%)
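The selection rule of 2.1.2 can be illustrated with a sketch. It is not what lcheck.py does: it works on plain source text with regular expressions, takes the scanner's license detections as an already extracted list (the scancode JSON layout is not modelled), and only knows the comment styles the kernel's license rules prescribe for .c, .h and .S files. All function names are hypothetical.

```python
from __future__ import annotations

import re

ANY_MODULE_LICENSE = re.compile(r'\bMODULE_LICENSE\s*\(\s*"([^"]*)"')
EXPORT_SYMBOL_GPL = re.compile(r"\bEXPORT_SYMBOL_GPL\s*\(")
SPDX_TAG = "SPDX-License-Identifier:"


def needs_gpl2_only_tag(source: str, scanner_licenses: list[str]) -> bool:
    """True if the scanner saw no license and only GPL module markers exist."""
    if scanner_licenses or SPDX_TAG in source:
        return False
    module_licenses = set(ANY_MODULE_LICENSE.findall(source))
    # Anything other than MODULE_LICENSE("GPL"), e.g. "Dual BSD/GPL", is out of scope.
    if module_licenses - {"GPL"}:
        return False
    return bool(module_licenses or EXPORT_SYMBOL_GPL.search(source))


def spdx_line(path: str, expression: str) -> str:
    """First-line SPDX comment: // for C sources, /* */ for headers and assembly."""
    if path.endswith(".c"):
        return f"// {SPDX_TAG} {expression}\n"
    if path.endswith((".h", ".S")):
        return f"/* {SPDX_TAG} {expression} */\n"
    raise ValueError(f"{path}: comment style not covered by this sketch")


def add_gpl2_only_tag(path: str, source: str) -> str:
    """Prepend the identifier as the first line of the file."""
    return spdx_line(path, "GPL-2.0-only") + source
```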
2.2 Files with a single license: GPL-2.0-only or GPL-2.0-or-later
The scripts handle the following tasks:
- Find the affected files in the scanner output
- Generate a list of match rules which represent a unique pattern
This is achieved by normalizing the texts (removing formatting,
whitespace damage, uppercase/lowercase differences and punctuation damage).
- Add the appropriate license header and remove the boiler plate
text or the license reference.
- Create a patch series. Each patch contains only the modifications
for a single match rule. The rule (and any variants) is saved
in the changelog of each patch to ease review.
- Once a reference dataset (compliance data provided by Siemens) is
available the scripts will also check for conflicts with that
data set.
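The normalization that turns many textual variants into one match rule can be sketched like this. Assumptions: the matched license text per file has already been pulled out of the scanner output into a path to text mapping, and the comment-marker regex covers only C, C++ and shell style comments. `normalize` and `group_into_rules` are illustrative names, not the spdx-utils functions.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Comment decoration commonly wrapped around license boilerplate.
COMMENT_MARKERS = re.compile(r"/\*+|\*+/|^\s*\*+|^\s*//+|^\s*#+", re.MULTILINE)
NON_WORD = re.compile(r"[^\w\s]")
WHITESPACE = re.compile(r"\s+")


def normalize(text: str) -> str:
    """Reduce a license text to a canonical form for rule matching."""
    text = COMMENT_MARKERS.sub(" ", text)     # formatting
    text = text.lower()                       # uppercase/lowercase damage
    text = NON_WORD.sub(" ", text)            # punctuation damage
    return WHITESPACE.sub(" ", text).strip()  # whitespace damage


def group_into_rules(matches: dict[str, str]) -> dict[str, list[str]]:
    """Map each normalized text (one match rule) to the files containing it."""
    rules: dict[str, list[str]] = defaultdict(list)
    for path, text in sorted(matches.items()):
        rules[normalize(text)].append(path)
    return dict(rules)
```

Two boilerplate copies that differ only in comment style, line breaks and punctuation end up under the same key and therefore in the same patch.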
This results in 515 patches at the moment.
The scripts generate patches which can be applied with quilt or
imported into git with 'git quiltimport'.
SPDX count goes from 25712 to 46368 (80.7%)
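A sketch of writing such a per-rule series. The numbering scheme, the `---` separator between changelog and diff and the function name are assumptions; the fixed convention is quilt's `series` file, which lists the patch file names in application order.

```python
from __future__ import annotations

from pathlib import Path


def write_series(patch_dir: Path, patches: list[tuple[str, str, str]]) -> Path:
    """Write one patch file per match rule plus a quilt 'series' file.

    patches holds (name, changelog, diff) triples in application order; the
    changelog is where the rule text goes so reviewers can see what matched.
    """
    patch_dir.mkdir(parents=True, exist_ok=True)
    names = []
    for index, (name, changelog, diff) in enumerate(patches, start=1):
        filename = f"{index:04d}-{name}.patch"
        # Description first, then a '---' separator, then the diff itself.
        text = f"{changelog.rstrip()}\n\n---\n{diff}"
        (patch_dir / filename).write_text(text)
        names.append(filename)
    series = patch_dir / "series"
    # quilt applies patches in the order they are listed in 'series'.
    series.write_text("".join(f"{name}\n" for name in names))
    return series
```

The directory can then be applied with `quilt push -a` (with QUILT_PATCHES pointing at it) or imported with `git quiltimport --patches DIR`; the latter has an `--author` option for patches that carry no author information.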
2.3 Files with GPL-2.0-only/or-later and Linux-OpenIB
Basically the same as above, just with dual licensing.
SPDX count goes from 46368 to 46865 (81.9%)
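Replacing a matched boilerplate block by an SPDX line works the same way for the dual licensed files; only the license expression changes, e.g. "GPL-2.0-only OR Linux-OpenIB" or "GPL-2.0-or-later OR Linux-OpenIB". A minimal sketch, assuming the rule has already been reduced to the exact text to remove and the file is a C source (hence `//`); `replace_boilerplate` is a hypothetical helper.

```python
import re

SPDX_TAG = "SPDX-License-Identifier:"


def replace_boilerplate(source: str, boilerplate: str, expression: str) -> str:
    """Remove one exactly matched boilerplate block and add an SPDX first line.

    Assumes a C source file, hence the // comment style.
    """
    if source.count(boilerplate) != 1:
        raise ValueError("the rule must match exactly once")
    body = source.replace(boilerplate, "", 1)
    # Do not leave blank lines at the top where the boilerplate used to be.
    body = re.sub(r"\A\s*\n", "", body)
    return f"// {SPDX_TAG} {expression}\n{body}"
```

Refusing ambiguous matches keeps a rule from silently touching more text than the reviewed pattern.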
2.4 More fun later :)
I have quite a few more steps in preparation, but let's get the above
agreed on and reviewed first.