From: Thomas Huth <thuth@redhat.com>
To: Stefan Weil <sw@weilnetz.de>, qemu-devel@nongnu.org
Cc: Peter Maydell <peter.maydell@linaro.org>,
	Richard Henderson <richard.henderson@linaro.org>,
	"Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: [RFC PATCH] Add new build target 'check-spelling'
Date: Mon, 31 Oct 2022 11:50:00 +0100
Message-ID: <2d9d7cfb-013f-4920-7155-0c56198d88ad@redhat.com>
In-Reply-To: <1ed28d1b-4b80-055d-5fac-d4d87ac187d3@weilnetz.de>

On 31/10/2022 11.44, Stefan Weil wrote:
> On 31.10.22 at 08:52, Thomas Huth wrote:
> 
>> On 31/10/2022 08.43, Stefan Weil wrote:
>>> `make check-spelling` can now be used to get a list of spelling errors.
>>> It uses the latest version of codespell, a spell checker implemented in 
>>> Python.
>>>
>>> Signed-off-by: Stefan Weil <sw@weilnetz.de>
>>> ---
>>>
>>> This RFC can already be used for manual tests, but still reports false
>>> positives, mostly because some variable names are interpreted as words.
>>> These words can either be ignored in the check, or in some cases the code
>>> might be changed to use different variable names.
>>>
>>> The check currently only skips a few directories and files, so for example
>>> checked out submodules are also checked.
>>>
>>> The rule can be extended to allow user provided ignore and skip lists,
>>> for example by introducing Makefile variables CODESPELL_SKIP=userfile
>>> or CODESPELL_IGNORE=userfile. A limited check could be implemented by
>>> providing a base directory CODESPELL_START=basedirectory, for example
>>> CODESPELL_START=docs.
>>>
>>> Regards,
>>> Stefan
> [...]
>> I like the idea, but I think it's unlikely that we can make this work for 
>> the whole source tree any time soon. So maybe it makes more sense to 
>> start with a few directories first (e.g. docs/), and then the 
>> maintainers can opt in by cleaning up their directories and adding 
>> them to this target?
>>
>>  Thomas
> 
> 
> Even without implementing CODESPELL_START as described above, the script can 
> already be used and integrated into CI scripts.
> 
> It takes about 60 seconds to check the whole source tree including 
> submodules on my (slow) virtual machine.
> 
> The resulting output has about 20000 lines or 1272 KiB. It can be filtered 
> for relevant parts of the source tree or used for a summary.
> 
> Sample script: grep "^[.]" spellcheck.log | sed s/^..// | sed 's/\/.*//' | 
> sed s/:.*// | sort | uniq -c
> 
> This produces a summary for the top level hierarchy of files and directories:
> 
>        3 accel
>        1 audio
>        1 backends
>       77 block
>        7 block.c
>       20 bsd-user
>      386 capstone
>       12 chardev
>        1 configure
>        8 contrib
>        6 crypto
>       64 disas
>       32 docs
>       31 dtc
>        8 fpu
>        1 gdbstub
>        1 gdb-xml
>        1 .github
>      537 hw
>        7 inc
>      114 include
>        1 libdecnumber
>       33 linux-user
>        1 MAINTAINERS
>      150 meson
>        6 meson.build
>       16 migration
>        1 nbd
>        5 net
>       12 pc-bios
>        7 python
>        3 qapi
>        2 qemu
>        5 qemu-options.hx
>       22 qga
>    14175 roms
>       43 scripts
>        3 semihosting
>       18 slirp
>        2 softmmu
>       59 subprojects
>      504 target
>        6 tcg
>        3 test.rb
>      175 tests
>        6 tools
>       20 ui
>        8 util
> 
> It shows that "roms" contributes by far the most typos. Omitting it 
> reduces the required time to 22 seconds and the output to 2947 lines.

"roms" mostly consists of third-party submodules that we do not have direct 
control of. I think this should definitely be omitted.
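
A minimal sketch of that filtering, not taken from the patch: it assumes
codespell reports findings as "./path:line: typo ==> suggestion", and the
names summarize and skip are hypothetical. It mirrors the grep/sed pipeline
quoted above while leaving out "roms".

```python
from collections import Counter


def summarize(lines, skip=("roms",)):
    """Count codespell findings per top-level file or directory.

    Assumes each finding looks like './path/to/file:LINE: typo ==> fix'.
    Entries whose top-level name is listed in `skip` are left out.
    """
    counts = Counter()
    for line in lines:
        # Roughly the same filter as: grep "^[.]"
        if not line.startswith("./"):
            continue
        # Drop the leading "./" and everything from the first colon on.
        path = line[2:].split(":", 1)[0]
        # Keep only the first path component (top-level file or directory).
        top = path.split("/", 1)[0]
        if top in skip:
            continue
        counts[top] += 1
    return counts


# Made-up sample lines, only to show the call; real input would be the log.
sample = [
    "./roms/example/file.c:10: teh ==> the",
    "./hw/example.c:20: recieve ==> receive",
    "./block.c:30: occured ==> occurred",
]
for name, count in sorted(summarize(sample).items()):
    print(f"{count:8d} {name}")
```

For a real run, the lines of spellcheck.log would be passed in place of the
sample list, for example summarize(open("spellcheck.log")).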

> "capstone" (which has no entry in MAINTAINERS)

That's likely because it was a submodule that was removed a while ago. 
"rm -rf capstone" should solve that issue in your local build tree ;-)

(yes, that's another nuisance of submodules - the checked out files don't go 
away when the submodule gets removed)

  Thomas


