From: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
To: linux-fsdevel@vger.kernel.org
Cc: jra@google.com, tytso@mit.edu, olaf@sgi.com,
darrick.wong@oracle.com, kernel@lists.collabora.co.uk,
Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Subject: [PATCH 00/15] NLS refactor and UTF-8 normalization
Date: Wed, 9 May 2018 03:47:51 -0300
Message-ID: <20180509064800.28658-1-krisman@collabora.co.uk>
The goal of this patchset is to adapt the NLS subsystem to support full
UTF-8 normalization and casefold operations for specific versions of
Unicode. There are many use cases for this feature and, while my
specific interest is in case-insensitivity for local filesystems, I am
aware this might be on the wishlist of people working on SMB, NFS and
others.
The first part of the patchset refactors the NLS subsystem to allow
multiple tables of the same encoding (differentiated by the version) in
an inexpensive way. It also creates hooks for some higher-level
operations, like comparisons and normalization. A new interface is
exported to request a specific version of the charset.
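To make the idea concrete, here is a minimal userspace sketch of a
registry that holds several tables of the same encoding, keyed by
version, each with its own comparison hook. Every name in it
(struct charset_ops, struct charset_table, find_charset, the "demo"
charset and its versions) is made up for illustration and is not the
interface the patches export.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* All names below are hypothetical; they are not the API the patches add. */

/* Per-charset hooks for higher-level operations. */
struct charset_ops {
	int (*compare)(const char *a, const char *b);	/* 0 when equal */
};

/* One table per (encoding, version) pair. */
struct charset_table {
	const char *name;
	const char *version;
	const struct charset_ops *ops;
};

/* ASCII-only lowercase, standing in for a real casefold table. */
static char ascii_lower(char c)
{
	return (c >= 'A' && c <= 'Z') ? (char)(c - 'A' + 'a') : c;
}

static int compare_exact(const char *a, const char *b)
{
	return strcmp(a, b);
}

static int compare_nocase(const char *a, const char *b)
{
	while (*a && ascii_lower(*a) == ascii_lower(*b)) {
		a++;
		b++;
	}
	return (unsigned char)ascii_lower(*a) - (unsigned char)ascii_lower(*b);
}

static const struct charset_ops exact_ops = { .compare = compare_exact };
static const struct charset_ops nocase_ops = { .compare = compare_nocase };

/* Two versions of the same (made-up) encoding with different behavior. */
static const struct charset_table registry[] = {
	{ "demo", "1.0.0", &exact_ops },
	{ "demo", "2.0.0", &nocase_ops },
};

/*
 * Return the table matching name and version, or NULL if none exists.
 * A NULL version returns the first table registered under that name.
 */
static const struct charset_table *find_charset(const char *name,
						const char *version)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(registry) / sizeof(registry[0]); i++) {
		if (strcmp(registry[i].name, name) != 0)
			continue;
		if (version == NULL ||
		    strcmp(registry[i].version, version) == 0)
			return &registry[i];
	}
	return NULL;
}
```

A caller would request a version once, for instance
find_charset("demo", "2.0.0"), and then go through ops->compare, so the
same name can keep older behavior available for users that depend on it.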
The second part of the patchset introduces the utf8 normalization code
as a new NLS charset. It is implemented as a separate charset, called
utf8n, to preserve the current behavior of the nls_utf8 charset. The
normalization core is provided by the SGI patches from 2014, which I
refactored, adapted and updated to Unicode 10.0.0.
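As a rough illustration of why normalization matters for name
comparison, the userspace sketch below decomposes two precomposed
characters and folds ASCII case before comparing. It is a toy: the
two-entry table, toy_normalize_fold and toy_equal are invented for this
example, and the real code uses a generated trie covering the whole
Unicode database rather than a linear scan.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy decompositions as UTF-8 byte strings; hypothetical, two entries only. */
struct decomp {
	const char *composed;
	const char *decomposed;
};

static const struct decomp toy_table[] = {
	{ "\xC3\xA9", "e\xCC\x81" },	/* U+00E9 -> U+0065 U+0301 */
	{ "\xC3\x89", "E\xCC\x81" },	/* U+00C9 -> U+0045 U+0301 */
};

/*
 * Decompose src using toy_table and fold ASCII letters to lowercase.
 * Returns the output length, or -1 if dst is too small.
 */
static int toy_normalize_fold(const char *src, char *dst, size_t dstlen)
{
	size_t out = 0;

	assert(src != NULL && dst != NULL);
	while (*src) {
		const char *piece = src;
		size_t plen = 1, adv = 1, i;

		for (i = 0; i < sizeof(toy_table) / sizeof(toy_table[0]); i++) {
			size_t clen = strlen(toy_table[i].composed);

			if (strncmp(src, toy_table[i].composed, clen) == 0) {
				piece = toy_table[i].decomposed;
				plen = strlen(piece);
				adv = clen;
				break;
			}
		}
		if (out + plen >= dstlen)	/* keep room for the NUL */
			return -1;
		for (i = 0; i < plen; i++) {
			char c = piece[i];

			dst[out++] = (c >= 'A' && c <= 'Z') ?
				     (char)(c - 'A' + 'a') : c;
		}
		src += adv;
	}
	if (out >= dstlen)
		return -1;
	dst[out] = '\0';
	return (int)out;
}

/* Nonzero when a and b are equal after decomposition and casefold. */
static int toy_equal(const char *a, const char *b)
{
	char na[64], nb[64];

	if (toy_normalize_fold(a, na, sizeof(na)) < 0 ||
	    toy_normalize_fold(b, nb, sizeof(nb)) < 0)
		return 0;
	return strcmp(na, nb) == 0;
}
```

With this, the precomposed "caf\xC3\xA9" and the decomposed, uppercase
"CAFE\xCC\x81" differ under strcmp() but are reported equal by
toy_equal(), while plain "cafe" still differs from both.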
The last patch is a test module, which runs some sanity checks on the
normalization code when it is loaded.
As usual, the Unicode source files are not part of the patch because
they are too big and would bounce the emails. If you are interested in
fetching this with minimal effort, you can clone a branch from:
https://gitlab.collabora.com/krisman/linux.git -b nls-merge-int
Gabriel Krisman Bertazi (11):
nls: Wrap uni2char/char2uni callers
nls: Wrap charset field access
nls: Wrap charset hooks in ops structure
nls: Split default charset from NLS core
nls: Split struct nls_charset from struct nls_table
nls: Add support for multiple versions of an encoding
nls: Add new interface for string comparisons
nls: Let charsets define the behavior of tolower/toupper
nls: Add optional normalization and casefold hooks
nls: utf8norm: Integrate utf8norm code with NLS subsystem
nls: utf8norm: Introduce test module for utf8norm implementation
Olaf Weber (4):
nls: utf8norm: Add unicode character database files
scripts: add trie generator for UTF-8
nls: utf8norm: Introduce code for UTF-8 normalization
nls: utf8norm: reduce the size of utf8data[]
drivers/staging/ncpfs/ioctl.c | 13 +-
drivers/staging/ncpfs/ncplib_kernel.c | 8 +-
fs/befs/linuxvfs.c | 8 +-
fs/cifs/cifs_unicode.c | 15 +-
fs/cifs/cifsfs.c | 2 +-
fs/cifs/connect.c | 2 +-
fs/cifs/dir.c | 7 +-
fs/fat/dir.c | 13 +-
fs/fat/inode.c | 6 +-
fs/fat/namei_vfat.c | 6 +-
fs/hfs/super.c | 6 +-
fs/hfs/trans.c | 9 +-
fs/hfsplus/options.c | 2 +-
fs/hfsplus/unicode.c | 6 +-
fs/isofs/inode.c | 5 +-
fs/isofs/joliet.c | 3 +-
fs/jfs/jfs_unicode.c | 9 +-
fs/jfs/super.c | 3 +-
fs/nls/Kconfig | 13 +
fs/nls/Makefile | 20 +-
fs/nls/mac-celtic.c | 34 +-
fs/nls/mac-centeuro.c | 34 +-
fs/nls/mac-croatian.c | 34 +-
fs/nls/mac-cyrillic.c | 34 +-
fs/nls/mac-gaelic.c | 34 +-
fs/nls/mac-greek.c | 34 +-
fs/nls/mac-iceland.c | 34 +-
fs/nls/mac-inuit.c | 34 +-
fs/nls/mac-roman.c | 34 +-
fs/nls/mac-romanian.c | 34 +-
fs/nls/mac-turkish.c | 34 +-
fs/nls/nls_ascii.c | 34 +-
fs/nls/nls_base.c | 505 +---
fs/nls/nls_cp1250.c | 34 +-
fs/nls/nls_cp1251.c | 34 +-
fs/nls/nls_cp1255.c | 36 +-
fs/nls/nls_cp437.c | 34 +-
fs/nls/nls_cp737.c | 34 +-
fs/nls/nls_cp775.c | 34 +-
fs/nls/nls_cp850.c | 34 +-
fs/nls/nls_cp852.c | 34 +-
fs/nls/nls_cp855.c | 34 +-
fs/nls/nls_cp857.c | 34 +-
fs/nls/nls_cp860.c | 34 +-
fs/nls/nls_cp861.c | 34 +-
fs/nls/nls_cp862.c | 34 +-
fs/nls/nls_cp863.c | 34 +-
fs/nls/nls_cp864.c | 34 +-
fs/nls/nls_cp865.c | 34 +-
fs/nls/nls_cp866.c | 34 +-
fs/nls/nls_cp869.c | 34 +-
fs/nls/nls_cp874.c | 36 +-
fs/nls/nls_cp932.c | 36 +-
fs/nls/nls_cp936.c | 36 +-
fs/nls/nls_cp949.c | 36 +-
fs/nls/nls_cp950.c | 36 +-
fs/nls/nls_default.c | 488 ++++
fs/nls/nls_euc-jp.c | 29 +-
fs/nls/nls_iso8859-1.c | 34 +-
fs/nls/nls_iso8859-13.c | 34 +-
fs/nls/nls_iso8859-14.c | 34 +-
fs/nls/nls_iso8859-15.c | 34 +-
fs/nls/nls_iso8859-2.c | 34 +-
fs/nls/nls_iso8859-3.c | 34 +-
fs/nls/nls_iso8859-4.c | 34 +-
fs/nls/nls_iso8859-5.c | 34 +-
fs/nls/nls_iso8859-6.c | 34 +-
fs/nls/nls_iso8859-7.c | 34 +-
fs/nls/nls_iso8859-9.c | 34 +-
fs/nls/nls_koi8-r.c | 34 +-
fs/nls/nls_koi8-ru.c | 30 +-
fs/nls/nls_koi8-u.c | 34 +-
fs/nls/nls_utf8.c | 34 +-
fs/nls/nls_utf8n-core.c | 291 +++
fs/nls/nls_utf8n-norm.c | 797 ++++++
fs/nls/nls_utf8n-selftest.c | 307 +++
fs/nls/ucd/README | 33 +
fs/nls/utf8n.h | 117 +
fs/ntfs/inode.c | 2 +-
fs/ntfs/super.c | 6 +-
fs/ntfs/unistr.c | 13 +-
fs/udf/super.c | 3 +-
fs/udf/unicode.c | 4 +-
include/linux/nls.h | 126 +-
scripts/Makefile | 1 +
scripts/mkutf8data.c | 3464 +++++++++++++++++++++++++
86 files changed, 7138 insertions(+), 912 deletions(-)
create mode 100644 fs/nls/nls_default.c
create mode 100644 fs/nls/nls_utf8n-core.c
create mode 100644 fs/nls/nls_utf8n-norm.c
create mode 100644 fs/nls/nls_utf8n-selftest.c
create mode 100644 fs/nls/ucd/README
create mode 100644 fs/nls/utf8n.h
create mode 100644 scripts/mkutf8data.c
--
2.17.0