linux-fsdevel.vger.kernel.org archive mirror
From: Gabriel Krisman Bertazi <krisman@collabora.com>
To: tytso@mit.edu
Cc: linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
	sfrench@samba.org, darrick.wong@oracle.com,
	samba-technical@lists.samba.org, jlayton@kernel.org,
	bfields@fieldses.org, paulus@samba.org, Olaf Weber <olaf@sgi.com>,
	Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Subject: [PATCH RFC v5 01/11] unicode: Add unicode character database files
Date: Mon, 28 Jan 2019 16:32:13 -0500
Message-ID: <20190128213223.31512-2-krisman@collabora.com>
In-Reply-To: <20190128213223.31512-1-krisman@collabora.com>

From: Olaf Weber <olaf@sgi.com>

Add files from the Unicode Character Database, version 11.0, to the
source.  A helper program that generates a trie used for normalization
from these files is part of a separate commit.

- Notes on the update from 8.0.0 to 11.0.0:

The structure of the UCD files and the special cases did not change
between versions 8.0.0 and 11.0.0.  8.0.0 saw the addition of Cherokee
lowercase characters, which are an interesting case for case-folding.
The update is accompanied by new tests in the test_ucd module to catch
specific cases.  No changes to the mkutf8data script were required for
the update.

The actual files are not part of the commit submitted to the list
because they are too big and would bounce.  They can be fetched with
the following script:

FILES="CaseFolding.txt DerivedAge.txt extracted/DerivedCombiningClass.txt
       DerivedCoreProperties.txt NormalizationCorrections.txt
       NormalizationTest.txt UnicodeData.txt"
VERSION=11.0.0
BASE=http://www.unicode.org/Public/${VERSION}/ucd

for i in ${FILES} ; do
  wget "${BASE}/$i" -O fs/unicode/ucd/$(basename ${i} .txt)-${VERSION}.txt
done
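
The downloaded files could then be checked against the md5sums listed
in the README.  The sketch below is not part of this patch: verify_ucd
is a hypothetical helper, and it assumes that the checksum list has
been saved to a plain file (without the leading "+" of the diff) and
that md5sum from coreutils is available.

```shell
#!/bin/sh
# Hypothetical helper, not part of the patch.
# Usage: verify_ucd DIR SUMS
#   DIR   directory holding the renamed UCD files
#   SUMS  file with "<md5>  <filename>" lines, one per file
verify_ucd() {
    # Strip leading whitespace so an indented list (as in the README)
    # is accepted, then let md5sum resolve the names relative to DIR.
    # The exit status is that of md5sum: non-zero on any mismatch.
    sed 's/^[[:space:]]*//' "$2" | ( cd "$1" && md5sum -c - )
}
```

For example, "verify_ucd fs/unicode/ucd /tmp/ucd-11.0.0.md5" would exit
non-zero if any file differs from the list; the name
/tmp/ucd-11.0.0.md5 is only a placeholder.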

Signed-off-by: Olaf Weber <olaf@sgi.com>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
  [Move ucd directory to fs/unicode/]
  [Update to Unicode 11.0.0]
---
 fs/unicode/ucd/README | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)
 create mode 100644 fs/unicode/ucd/README

diff --git a/fs/unicode/ucd/README b/fs/unicode/ucd/README
new file mode 100644
index 000000000000..5f89017b35ee
--- /dev/null
+++ b/fs/unicode/ucd/README
@@ -0,0 +1,33 @@
+The files in this directory are part of the Unicode Character Database
+for version 11.0.0 of the Unicode standard.
+
+The full set of files can be found here:
+
+  http://www.unicode.org/Public/11.0.0/ucd/
+
+The latest released version of the UCD can be found here:
+
+  http://www.unicode.org/Public/UCD/latest/
+
+The files in this directory are identical, except that they have been
+renamed with a suffix indicating the unicode version.
+
+Individual source links:
+
+  http://www.unicode.org/Public/11.0.0/ucd/CaseFolding.txt
+  http://www.unicode.org/Public/11.0.0/ucd/DerivedAge.txt
+  http://www.unicode.org/Public/11.0.0/ucd/extracted/DerivedCombiningClass.txt
+  http://www.unicode.org/Public/11.0.0/ucd/DerivedCoreProperties.txt
+  http://www.unicode.org/Public/11.0.0/ucd/NormalizationCorrections.txt
+  http://www.unicode.org/Public/11.0.0/ucd/NormalizationTest.txt
+  http://www.unicode.org/Public/11.0.0/ucd/UnicodeData.txt
+
+md5sums
+
+  414436796cf097df55f798e1585448ee  CaseFolding-11.0.0.txt
+  6032a595fbb782694456491d86eecfac  DerivedAge-11.0.0.txt
+  3240997d671297ac754ab0d27577acf7  DerivedCombiningClass-11.0.0.txt
+  2a4fe257d9d8184518e036194d2248ec  DerivedCoreProperties-11.0.0.txt
+  4e7d383fa0dd3cd9d49d64e5b7b7c9e0  NormalizationCorrections-11.0.0.txt
+  c9500c5b8b88e584469f056023ecc3f2  NormalizationTest-11.0.0.txt
+  acc291106c3758d2025f8d7bd5518bee  UnicodeData-11.0.0.txt
-- 
2.20.1


