linux-fsdevel.vger.kernel.org archive mirror
From: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
To: viro@ZenIV.linux.org.uk
Cc: jra@google.com, tytso@mit.edu, olaf@sgi.com,
	darrick.wong@oracle.com, kernel@lists.collabora.co.uk,
	linux-fsdevel@vger.kernel.org, david@fromorbit.com, jack@suse.cz,
	linux-kernel@vger.kernel.org,
	Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Subject: [PATCH v2 10/15] nls: utf8norm: Add unicode character database files
Date: Mon, 21 May 2018 14:36:12 -0300
Message-ID: <20180521173617.31625-11-krisman@collabora.co.uk>
In-Reply-To: <20180521173617.31625-1-krisman@collabora.co.uk>

From: Olaf Weber <olaf@sgi.com>

Add files from the Unicode Character Database, version 10.0.0, to the
source.  A helper program that generates a trie used for normalization
from these files is part of a separate commit.

- Notes on the update from 8.0.0 to 10.0.0:

The structure of the UCD files and their special cases did not change
between versions 8.0.0 and 10.0.0.  Version 8.0.0 added the Cherokee
lowercase characters, which are an interesting case for case-folding.
The update is accompanied by new tests in the test_ucd module to catch
specific cases.  No changes to the mkutf8data script were required for
the update.

The actual files are not part of the commit submitted to the list
because they are too big and would bounce.  They can still be obtained
with the following script:

FILES="CaseFolding.txt DerivedAge.txt extracted/DerivedCombiningClass.txt
       DerivedCoreProperties.txt NormalizationCorrections.txt
       NormalizationTest.txt UnicodeData.txt"
VERSION=10.0.0
BASE=http://www.unicode.org/Public/${VERSION}/ucd

for i in ${FILES} ; do
  wget "${BASE}/${i}" -O "fs/nls/ucd/$(basename "${i}" .txt)-${VERSION}.txt"
done
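
Once the files are downloaded, they could be checked against the
md5sums listed in the README added by this patch.  The following is
only a sketch and not part of the patch: the function name verify_ucd
is made up for illustration, and it assumes GNU md5sum and awk are
available and that the README is already present in the target
directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of this patch): check the UCD files in a
# directory against the md5sums listed in that directory's README.
# Assumes GNU md5sum; the default directory matches the patch layout.
verify_ucd() {
    dir=${1:-fs/nls/ucd}
    (
        cd "$dir" || exit 1
        # Keep only lines of the form "<32 hex digits>  <file name>" and
        # strip the README indentation before handing them to md5sum.
        awk 'NF == 2 && length($1) == 32 && $1 ~ /^[0-9a-f]+$/ {
                 print $1 "  " $2
             }' README |
            md5sum -c -
    )
}
```

Running "verify_ucd fs/nls/ucd" from the top of the tree would then
print one line per file and return non-zero if any file does not match
or if the README contains no checksum lines.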

Signed-off-by: Olaf Weber <olaf@sgi.com>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
  [Move ucd directory to fs/nls/]
  [Update to ucd-10.0.0]
---
 fs/nls/ucd/README | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)
 create mode 100644 fs/nls/ucd/README

diff --git a/fs/nls/ucd/README b/fs/nls/ucd/README
new file mode 100644
index 000000000000..67f2075d1fca
--- /dev/null
+++ b/fs/nls/ucd/README
@@ -0,0 +1,33 @@
+The files in this directory are part of the Unicode Character Database
+for version 10.0.0 of the Unicode standard.
+
+The full set of files can be found here:
+
+  http://www.unicode.org/Public/10.0.0/ucd/
+
+The latest released version of the UCD can be found here:
+
+  http://www.unicode.org/Public/UCD/latest/
+
+The files in this directory are identical, except that they have been
+renamed with a suffix indicating the unicode version.
+
+Individual source links:
+
+  http://www.unicode.org/Public/10.0.0/ucd/CaseFolding.txt
+  http://www.unicode.org/Public/10.0.0/ucd/DerivedAge.txt
+  http://www.unicode.org/Public/10.0.0/ucd/extracted/DerivedCombiningClass.txt
+  http://www.unicode.org/Public/10.0.0/ucd/DerivedCoreProperties.txt
+  http://www.unicode.org/Public/10.0.0/ucd/NormalizationCorrections.txt
+  http://www.unicode.org/Public/10.0.0/ucd/NormalizationTest.txt
+  http://www.unicode.org/Public/10.0.0/ucd/UnicodeData.txt
+
+md5sums
+
+  7893b6e005c5a521319a0d12062ae122  CaseFolding-10.0.0.txt
+  a602e4b44de3350087e40f2eb2184898  DerivedAge-10.0.0.txt
+  5abdeb21af4edcc5d1e4c0b5802fc7a7  DerivedCombiningClass-10.0.0.txt
+  eda11c2c2e3c308d9d3b90e2b3282024  DerivedCoreProperties-10.0.0.txt
+  425ece5ffbecd0140d98c13ce05724aa  NormalizationCorrections-10.0.0.txt
+  7296fe7aa07d7d288e65d559af2ad49b  NormalizationTest-10.0.0.txt
+  2a52f30695dcc821f0f224650552beaf  UnicodeData-10.0.0.txt
-- 
2.17.0
