* [PATCH 1/2] update-unicode.sh: automatically download newer definition files
From: Beat Bolli @ 2016-12-02 21:26 UTC
To: git; +Cc: Beat Bolli

Checking just for the files' existence is not enough; we should also
download them if a newer version exists on the Unicode servers.

Signed-off-by: Beat Bolli <dev+git@drbeat.li>
---
 update_unicode.sh | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/update_unicode.sh b/update_unicode.sh
index 27af77c..3c84270 100755
--- a/update_unicode.sh
+++ b/update_unicode.sh
@@ -10,12 +10,8 @@ if ! test -d unicode; then
 	mkdir unicode
 fi &&
 ( cd unicode &&
-	if ! test -f UnicodeData.txt; then
-		wget http://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
-	fi &&
-	if ! test -f EastAsianWidth.txt; then
-		wget http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt
-	fi &&
+	wget -N http://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt \
+		http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt &&
 	if ! test -d uniset; then
 		git clone https://github.com/depp/uniset.git
 	fi &&
--
2.7.2

* [PATCH 2/2] unicode: update the tables to Unicode 9.0 2016-12-02 21:26 [PATCH 1/2] update-unicode.sh: automatically download newer definition files Beat Bolli @ 2016-12-02 21:26 ` Beat Bolli 2016-12-03 10:35 ` [PATCH 3/3] unicode_width.h: fix the double_width[] table Beat Bolli 2016-12-03 10:53 ` [PATCH v2 1/3] update-unicode.sh: automatically download newer definition files Beat Bolli 2 siblings, 0 replies; 18+ messages in thread From: Beat Bolli @ 2016-12-02 21:26 UTC (permalink / raw) To: git; +Cc: Beat Bolli A rerun of the previously fixed update-unicode.sh produces these new tables. Signed-off-by: Beat Bolli <dev+git@drbeat.li> --- unicode_width.h | 122 +++++++++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 111 insertions(+), 11 deletions(-) diff --git a/unicode_width.h b/unicode_width.h index 47cdd23..73b5fd6 100644 --- a/unicode_width.h +++ b/unicode_width.h @@ -25,7 +25,7 @@ static const struct interval zero_width[] = { { 0x0825, 0x0827 }, { 0x0829, 0x082D }, { 0x0859, 0x085B }, -{ 0x08E4, 0x0902 }, +{ 0x08D4, 0x0902 }, { 0x093A, 0x093A }, { 0x093C, 0x093C }, { 0x0941, 0x0948 }, @@ -120,6 +120,7 @@ static const struct interval zero_width[] = { { 0x17C9, 0x17D3 }, { 0x17DD, 0x17DD }, { 0x180B, 0x180E }, +{ 0x1885, 0x1886 }, { 0x18A9, 0x18A9 }, { 0x1920, 0x1922 }, { 0x1927, 0x1928 }, @@ -158,7 +159,7 @@ static const struct interval zero_width[] = { { 0x1CF4, 0x1CF4 }, { 0x1CF8, 0x1CF9 }, { 0x1DC0, 0x1DF5 }, -{ 0x1DFC, 0x1DFF }, +{ 0x1DFB, 0x1DFF }, { 0x200B, 0x200F }, { 0x202A, 0x202E }, { 0x2060, 0x2064 }, @@ -171,13 +172,13 @@ static const struct interval zero_width[] = { { 0x3099, 0x309A }, { 0xA66F, 0xA672 }, { 0xA674, 0xA67D }, -{ 0xA69F, 0xA69F }, +{ 0xA69E, 0xA69F }, { 0xA6F0, 0xA6F1 }, { 0xA802, 0xA802 }, { 0xA806, 0xA806 }, { 0xA80B, 0xA80B }, { 0xA825, 0xA826 }, -{ 0xA8C4, 0xA8C4 }, +{ 0xA8C4, 0xA8C5 }, { 0xA8E0, 0xA8F1 }, { 0xA926, 0xA92D }, { 0xA947, 0xA951 }, @@ -204,7 +205,7 @@ static const struct interval zero_width[] = { { 
0xABED, 0xABED }, { 0xFB1E, 0xFB1E }, { 0xFE00, 0xFE0F }, -{ 0xFE20, 0xFE2D }, +{ 0xFE20, 0xFE2F }, { 0xFEFF, 0xFEFF }, { 0xFFF9, 0xFFFB }, { 0x101FD, 0x101FD }, @@ -228,16 +229,21 @@ static const struct interval zero_width[] = { { 0x11173, 0x11173 }, { 0x11180, 0x11181 }, { 0x111B6, 0x111BE }, +{ 0x111CA, 0x111CC }, { 0x1122F, 0x11231 }, { 0x11234, 0x11234 }, { 0x11236, 0x11237 }, +{ 0x1123E, 0x1123E }, { 0x112DF, 0x112DF }, { 0x112E3, 0x112EA }, -{ 0x11301, 0x11301 }, +{ 0x11300, 0x11301 }, { 0x1133C, 0x1133C }, { 0x11340, 0x11340 }, { 0x11366, 0x1136C }, { 0x11370, 0x11374 }, +{ 0x11438, 0x1143F }, +{ 0x11442, 0x11444 }, +{ 0x11446, 0x11446 }, { 0x114B3, 0x114B8 }, { 0x114BA, 0x114BA }, { 0x114BF, 0x114C0 }, @@ -245,6 +251,7 @@ static const struct interval zero_width[] = { { 0x115B2, 0x115B5 }, { 0x115BC, 0x115BD }, { 0x115BF, 0x115C0 }, +{ 0x115DC, 0x115DD }, { 0x11633, 0x1163A }, { 0x1163D, 0x1163D }, { 0x1163F, 0x11640 }, @@ -252,6 +259,16 @@ static const struct interval zero_width[] = { { 0x116AD, 0x116AD }, { 0x116B0, 0x116B5 }, { 0x116B7, 0x116B7 }, +{ 0x1171D, 0x1171F }, +{ 0x11722, 0x11725 }, +{ 0x11727, 0x1172B }, +{ 0x11C30, 0x11C36 }, +{ 0x11C38, 0x11C3D }, +{ 0x11C3F, 0x11C3F }, +{ 0x11C92, 0x11CA7 }, +{ 0x11CAA, 0x11CB0 }, +{ 0x11CB2, 0x11CB3 }, +{ 0x11CB5, 0x11CB6 }, { 0x16AF0, 0x16AF4 }, { 0x16B30, 0x16B36 }, { 0x16F8F, 0x16F92 }, @@ -262,16 +279,28 @@ static const struct interval zero_width[] = { { 0x1D185, 0x1D18B }, { 0x1D1AA, 0x1D1AD }, { 0x1D242, 0x1D244 }, +{ 0x1DA00, 0x1DA36 }, +{ 0x1DA3B, 0x1DA6C }, +{ 0x1DA75, 0x1DA75 }, +{ 0x1DA84, 0x1DA84 }, +{ 0x1DA9B, 0x1DA9F }, +{ 0x1DAA1, 0x1DAAF }, +{ 0x1E000, 0x1E006 }, +{ 0x1E008, 0x1E018 }, +{ 0x1E01B, 0x1E021 }, +{ 0x1E023, 0x1E024 }, +{ 0x1E026, 0x1E02A }, { 0x1E8D0, 0x1E8D6 }, +{ 0x1E944, 0x1E94A }, { 0xE0001, 0xE0001 }, { 0xE0020, 0xE007F }, { 0xE0100, 0xE01EF } }; static const struct interval double_width[] = { -{ /* plane */ 0x0, 0x1C }, -{ /* plane */ 0x1C, 0x21 }, -{ /* plane */ 0x21, 
0x22 }, -{ /* plane */ 0x22, 0x23 }, +{ /* plane */ 0x0, 0x3D }, +{ /* plane */ 0x3D, 0x68 }, +{ /* plane */ 0x68, 0x69 }, +{ /* plane */ 0x69, 0x6A }, { /* plane */ 0x0, 0x0 }, { /* plane */ 0x0, 0x0 }, { /* plane */ 0x0, 0x0 }, @@ -286,7 +315,40 @@ static const struct interval double_width[] = { { /* plane */ 0x0, 0x0 }, { /* plane */ 0x0, 0x0 }, { 0x1100, 0x115F }, +{ 0x231A, 0x231B }, { 0x2329, 0x232A }, +{ 0x23E9, 0x23EC }, +{ 0x23F0, 0x23F0 }, +{ 0x23F3, 0x23F3 }, +{ 0x25FD, 0x25FE }, +{ 0x2614, 0x2615 }, +{ 0x2648, 0x2653 }, +{ 0x267F, 0x267F }, +{ 0x2693, 0x2693 }, +{ 0x26A1, 0x26A1 }, +{ 0x26AA, 0x26AB }, +{ 0x26BD, 0x26BE }, +{ 0x26C4, 0x26C5 }, +{ 0x26CE, 0x26CE }, +{ 0x26D4, 0x26D4 }, +{ 0x26EA, 0x26EA }, +{ 0x26F2, 0x26F3 }, +{ 0x26F5, 0x26F5 }, +{ 0x26FA, 0x26FA }, +{ 0x26FD, 0x26FD }, +{ 0x2705, 0x2705 }, +{ 0x270A, 0x270B }, +{ 0x2728, 0x2728 }, +{ 0x274C, 0x274C }, +{ 0x274E, 0x274E }, +{ 0x2753, 0x2755 }, +{ 0x2757, 0x2757 }, +{ 0x2795, 0x2797 }, +{ 0x27B0, 0x27B0 }, +{ 0x27BF, 0x27BF }, +{ 0x2B1B, 0x2B1C }, +{ 0x2B50, 0x2B50 }, +{ 0x2B55, 0x2B55 }, { 0x2E80, 0x2E99 }, { 0x2E9B, 0x2EF3 }, { 0x2F00, 0x2FD5 }, @@ -313,11 +375,49 @@ static const struct interval double_width[] = { { 0xFE68, 0xFE6B }, { 0xFF01, 0xFF60 }, { 0xFFE0, 0xFFE6 }, +{ 0x16FE0, 0x16FE0 }, +{ 0x17000, 0x187EC }, +{ 0x18800, 0x18AF2 }, { 0x1B000, 0x1B001 }, +{ 0x1F004, 0x1F004 }, +{ 0x1F0CF, 0x1F0CF }, +{ 0x1F18E, 0x1F18E }, +{ 0x1F191, 0x1F19A }, { 0x1F200, 0x1F202 }, -{ 0x1F210, 0x1F23A }, +{ 0x1F210, 0x1F23B }, { 0x1F240, 0x1F248 }, { 0x1F250, 0x1F251 }, +{ 0x1F300, 0x1F320 }, +{ 0x1F32D, 0x1F335 }, +{ 0x1F337, 0x1F37C }, +{ 0x1F37E, 0x1F393 }, +{ 0x1F3A0, 0x1F3CA }, +{ 0x1F3CF, 0x1F3D3 }, +{ 0x1F3E0, 0x1F3F0 }, +{ 0x1F3F4, 0x1F3F4 }, +{ 0x1F3F8, 0x1F43E }, +{ 0x1F440, 0x1F440 }, +{ 0x1F442, 0x1F4FC }, +{ 0x1F4FF, 0x1F53D }, +{ 0x1F54B, 0x1F54E }, +{ 0x1F550, 0x1F567 }, +{ 0x1F57A, 0x1F57A }, +{ 0x1F595, 0x1F596 }, +{ 0x1F5A4, 0x1F5A4 }, +{ 0x1F5FB, 0x1F64F }, +{ 0x1F680, 
0x1F6C5 }, +{ 0x1F6CC, 0x1F6CC }, +{ 0x1F6D0, 0x1F6D2 }, +{ 0x1F6EB, 0x1F6EC }, +{ 0x1F6F4, 0x1F6F6 }, +{ 0x1F910, 0x1F91E }, +{ 0x1F920, 0x1F927 }, +{ 0x1F930, 0x1F930 }, +{ 0x1F933, 0x1F93E }, +{ 0x1F940, 0x1F94B }, +{ 0x1F950, 0x1F95E }, +{ 0x1F980, 0x1F991 }, +{ 0x1F9C0, 0x1F9C0 }, { 0x20000, 0x2FFFD }, { 0x30000, 0x3FFFD } }; -- 2.7.2 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 3/3] unicode_width.h: fix the double_width[] table
From: Beat Bolli @ 2016-12-03 10:35 UTC
To: git; +Cc: Beat Bolli

The function bisearch() in utf8.c does a pure binary search in
double_width. It does not care about the 17 plane offsets which
unicode/uniset/uniset prepends. Leaving the plane offsets in the table
may cause wrong results.

Filter out the plane offsets in update-unicode.sh and regenerate the
table.

Signed-off-by: Beat Bolli <dev+git@drbeat.li>
---
 unicode_width.h   | 17 -----------------
 update_unicode.sh |  2 +-
 2 files changed, 1 insertion(+), 18 deletions(-)

diff --git a/unicode_width.h b/unicode_width.h
index 73b5fd6..02207be 100644
--- a/unicode_width.h
+++ b/unicode_width.h
@@ -297,23 +297,6 @@ static const struct interval zero_width[] = {
 { 0xE0100, 0xE01EF }
 };
 static const struct interval double_width[] = {
-{ /* plane */ 0x0, 0x3D },
-{ /* plane */ 0x3D, 0x68 },
-{ /* plane */ 0x68, 0x69 },
-{ /* plane */ 0x69, 0x6A },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
 { 0x1100, 0x115F },
 { 0x231A, 0x231B },
 { 0x2329, 0x232A },
diff --git a/update_unicode.sh b/update_unicode.sh
index 3c84270..4c1ec8d 100755
--- a/update_unicode.sh
+++ b/update_unicode.sh
@@ -30,7 +30,7 @@ fi &&
 	  grep -v plane)
 };
 static const struct interval double_width[] = {
-	$(uniset/uniset --32 eaw:F,W)
+	$(uniset/uniset --32 eaw:F,W | grep -v plane)
 };
 EOF
 )
--
2.7.2

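The precondition behind this fix can be sketched in a few lines of C. This is only an illustration modeled on the description of bisearch() in the commit message, not a copy of utf8.c: the type ucs_char_t, the helper table_is_sorted(), the table names and demo_plane_rows() are assumptions made up for the example, and both tables are short excerpts of the generated data. A binary search is only meaningful on a sorted, non-overlapping interval table, and the 17 "plane" rows break that ordering.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

typedef unsigned int ucs_char_t;	/* assumed code point type */

struct interval {
	ucs_char_t first;
	ucs_char_t last;
};

/* Excerpt of double_width[] as generated WITH the 17 plane rows. */
static const struct interval with_plane_rows[] = {
	{ 0x0, 0x3D }, { 0x3D, 0x68 }, { 0x68, 0x69 }, { 0x69, 0x6A },
	{ 0x0, 0x0 }, { 0x0, 0x0 }, { 0x0, 0x0 }, { 0x0, 0x0 },
	{ 0x0, 0x0 }, { 0x0, 0x0 }, { 0x0, 0x0 }, { 0x0, 0x0 },
	{ 0x0, 0x0 }, { 0x0, 0x0 }, { 0x0, 0x0 }, { 0x0, 0x0 },
	{ 0x0, 0x0 },
	{ 0x1100, 0x115F }, { 0x231A, 0x231B }, { 0x2329, 0x232A }
};

/* The same excerpt after filtering the plane rows out. */
static const struct interval filtered[] = {
	{ 0x1100, 0x115F }, { 0x231A, 0x231B }, { 0x2329, 0x232A }
};

/*
 * Plain binary search over table[0..max]. It returns 1 if ucs lies in
 * one of the intervals, and is only correct for a sorted table.
 */
int bisearch(ucs_char_t ucs, const struct interval *table, int max)
{
	int min = 0;

	if (ucs < table[0].first || ucs > table[max].last)
		return 0;
	while (max >= min) {
		int mid = min + (max - min) / 2;

		if (ucs > table[mid].last)
			min = mid + 1;
		else if (ucs < table[mid].first)
			max = mid - 1;
		else
			return 1;
	}
	return 0;
}

/* Hypothetical helper: is the table strictly ascending and non-overlapping? */
int table_is_sorted(const struct interval *table, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (table[i].first > table[i].last)
			return 0;
		if (i > 0 && table[i].first <= table[i - 1].last)
			return 0;
	}
	return 1;
}

void demo_plane_rows(void)
{
	int bad_max = (int)ARRAY_SIZE(with_plane_rows) - 1;
	int good_max = (int)ARRAY_SIZE(filtered) - 1;

	/* The plane rows violate the ordering the search relies on. */
	assert(!table_is_sorted(with_plane_rows, ARRAY_SIZE(with_plane_rows)));
	assert(table_is_sorted(filtered, ARRAY_SIZE(filtered)));

	/* U+0000 matches a plane row although it is not an F/W code point. */
	assert(bisearch(0x0, with_plane_rows, bad_max) == 1);
	assert(bisearch(0x0, filtered, good_max) == 0);

	/* Real entries are still found in the filtered table. */
	assert(bisearch(0x1100, filtered, good_max) == 1);
}
```

Calling demo_plane_rows() from a main() exercises both tables. Which code points are actually misreported depends on the table length and on where the search midpoints fall, so the sketch demonstrates the broken precondition rather than a specific bug in git.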
* [PATCH v2 1/3] update-unicode.sh: automatically download newer definition files
From: Beat Bolli @ 2016-12-03 10:53 UTC
To: git; +Cc: Beat Bolli, Torsten Bögershausen

Checking just for the unicode data files' existence is not sufficient;
we should also download them if a newer version exists on the Unicode
consortium's servers. Option -N of wget does this nicely for us.

Cc: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Beat Bolli <dev+git@drbeat.li>
---
Diff to v1:
- reword the commit message
- add Torsten's Cc:

 update_unicode.sh | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/update_unicode.sh b/update_unicode.sh
index 27af77c..3c84270 100755
--- a/update_unicode.sh
+++ b/update_unicode.sh
@@ -10,12 +10,8 @@ if ! test -d unicode; then
 	mkdir unicode
 fi &&
 ( cd unicode &&
-	if ! test -f UnicodeData.txt; then
-		wget http://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
-	fi &&
-	if ! test -f EastAsianWidth.txt; then
-		wget http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt
-	fi &&
+	wget -N http://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt \
+		http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt &&
 	if ! test -d uniset; then
 		git clone https://github.com/depp/uniset.git
 	fi &&
--
2.7.2

* [PATCH v2 2/3] unicode_width.h: update the tables to Unicode 9.0 2016-12-03 10:53 ` [PATCH v2 1/3] update-unicode.sh: automatically download newer definition files Beat Bolli @ 2016-12-03 10:53 ` Beat Bolli 2016-12-03 10:53 ` [PATCH v2 3/3] unicode_width.h: fix the double_width[] table Beat Bolli 1 sibling, 0 replies; 18+ messages in thread From: Beat Bolli @ 2016-12-03 10:53 UTC (permalink / raw) To: git; +Cc: Beat Bolli Rerunning update-unicode.sh fixed in the previous commit produces these new tables. Signed-off-by: Beat Bolli <dev+git@drbeat.li> --- Diff to v1: - reword the commit message unicode_width.h | 122 +++++++++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 111 insertions(+), 11 deletions(-) diff --git a/unicode_width.h b/unicode_width.h index 47cdd23..73b5fd6 100644 --- a/unicode_width.h +++ b/unicode_width.h @@ -25,7 +25,7 @@ static const struct interval zero_width[] = { { 0x0825, 0x0827 }, { 0x0829, 0x082D }, { 0x0859, 0x085B }, -{ 0x08E4, 0x0902 }, +{ 0x08D4, 0x0902 }, { 0x093A, 0x093A }, { 0x093C, 0x093C }, { 0x0941, 0x0948 }, @@ -120,6 +120,7 @@ static const struct interval zero_width[] = { { 0x17C9, 0x17D3 }, { 0x17DD, 0x17DD }, { 0x180B, 0x180E }, +{ 0x1885, 0x1886 }, { 0x18A9, 0x18A9 }, { 0x1920, 0x1922 }, { 0x1927, 0x1928 }, @@ -158,7 +159,7 @@ static const struct interval zero_width[] = { { 0x1CF4, 0x1CF4 }, { 0x1CF8, 0x1CF9 }, { 0x1DC0, 0x1DF5 }, -{ 0x1DFC, 0x1DFF }, +{ 0x1DFB, 0x1DFF }, { 0x200B, 0x200F }, { 0x202A, 0x202E }, { 0x2060, 0x2064 }, @@ -171,13 +172,13 @@ static const struct interval zero_width[] = { { 0x3099, 0x309A }, { 0xA66F, 0xA672 }, { 0xA674, 0xA67D }, -{ 0xA69F, 0xA69F }, +{ 0xA69E, 0xA69F }, { 0xA6F0, 0xA6F1 }, { 0xA802, 0xA802 }, { 0xA806, 0xA806 }, { 0xA80B, 0xA80B }, { 0xA825, 0xA826 }, -{ 0xA8C4, 0xA8C4 }, +{ 0xA8C4, 0xA8C5 }, { 0xA8E0, 0xA8F1 }, { 0xA926, 0xA92D }, { 0xA947, 0xA951 }, @@ -204,7 +205,7 @@ static const struct interval zero_width[] = { { 0xABED, 0xABED }, { 0xFB1E, 0xFB1E }, { 
0xFE00, 0xFE0F }, -{ 0xFE20, 0xFE2D }, +{ 0xFE20, 0xFE2F }, { 0xFEFF, 0xFEFF }, { 0xFFF9, 0xFFFB }, { 0x101FD, 0x101FD }, @@ -228,16 +229,21 @@ static const struct interval zero_width[] = { { 0x11173, 0x11173 }, { 0x11180, 0x11181 }, { 0x111B6, 0x111BE }, +{ 0x111CA, 0x111CC }, { 0x1122F, 0x11231 }, { 0x11234, 0x11234 }, { 0x11236, 0x11237 }, +{ 0x1123E, 0x1123E }, { 0x112DF, 0x112DF }, { 0x112E3, 0x112EA }, -{ 0x11301, 0x11301 }, +{ 0x11300, 0x11301 }, { 0x1133C, 0x1133C }, { 0x11340, 0x11340 }, { 0x11366, 0x1136C }, { 0x11370, 0x11374 }, +{ 0x11438, 0x1143F }, +{ 0x11442, 0x11444 }, +{ 0x11446, 0x11446 }, { 0x114B3, 0x114B8 }, { 0x114BA, 0x114BA }, { 0x114BF, 0x114C0 }, @@ -245,6 +251,7 @@ static const struct interval zero_width[] = { { 0x115B2, 0x115B5 }, { 0x115BC, 0x115BD }, { 0x115BF, 0x115C0 }, +{ 0x115DC, 0x115DD }, { 0x11633, 0x1163A }, { 0x1163D, 0x1163D }, { 0x1163F, 0x11640 }, @@ -252,6 +259,16 @@ static const struct interval zero_width[] = { { 0x116AD, 0x116AD }, { 0x116B0, 0x116B5 }, { 0x116B7, 0x116B7 }, +{ 0x1171D, 0x1171F }, +{ 0x11722, 0x11725 }, +{ 0x11727, 0x1172B }, +{ 0x11C30, 0x11C36 }, +{ 0x11C38, 0x11C3D }, +{ 0x11C3F, 0x11C3F }, +{ 0x11C92, 0x11CA7 }, +{ 0x11CAA, 0x11CB0 }, +{ 0x11CB2, 0x11CB3 }, +{ 0x11CB5, 0x11CB6 }, { 0x16AF0, 0x16AF4 }, { 0x16B30, 0x16B36 }, { 0x16F8F, 0x16F92 }, @@ -262,16 +279,28 @@ static const struct interval zero_width[] = { { 0x1D185, 0x1D18B }, { 0x1D1AA, 0x1D1AD }, { 0x1D242, 0x1D244 }, +{ 0x1DA00, 0x1DA36 }, +{ 0x1DA3B, 0x1DA6C }, +{ 0x1DA75, 0x1DA75 }, +{ 0x1DA84, 0x1DA84 }, +{ 0x1DA9B, 0x1DA9F }, +{ 0x1DAA1, 0x1DAAF }, +{ 0x1E000, 0x1E006 }, +{ 0x1E008, 0x1E018 }, +{ 0x1E01B, 0x1E021 }, +{ 0x1E023, 0x1E024 }, +{ 0x1E026, 0x1E02A }, { 0x1E8D0, 0x1E8D6 }, +{ 0x1E944, 0x1E94A }, { 0xE0001, 0xE0001 }, { 0xE0020, 0xE007F }, { 0xE0100, 0xE01EF } }; static const struct interval double_width[] = { -{ /* plane */ 0x0, 0x1C }, -{ /* plane */ 0x1C, 0x21 }, -{ /* plane */ 0x21, 0x22 }, -{ /* plane */ 0x22, 0x23 }, +{ /* 
plane */ 0x0, 0x3D }, +{ /* plane */ 0x3D, 0x68 }, +{ /* plane */ 0x68, 0x69 }, +{ /* plane */ 0x69, 0x6A }, { /* plane */ 0x0, 0x0 }, { /* plane */ 0x0, 0x0 }, { /* plane */ 0x0, 0x0 }, @@ -286,7 +315,40 @@ static const struct interval double_width[] = { { /* plane */ 0x0, 0x0 }, { /* plane */ 0x0, 0x0 }, { 0x1100, 0x115F }, +{ 0x231A, 0x231B }, { 0x2329, 0x232A }, +{ 0x23E9, 0x23EC }, +{ 0x23F0, 0x23F0 }, +{ 0x23F3, 0x23F3 }, +{ 0x25FD, 0x25FE }, +{ 0x2614, 0x2615 }, +{ 0x2648, 0x2653 }, +{ 0x267F, 0x267F }, +{ 0x2693, 0x2693 }, +{ 0x26A1, 0x26A1 }, +{ 0x26AA, 0x26AB }, +{ 0x26BD, 0x26BE }, +{ 0x26C4, 0x26C5 }, +{ 0x26CE, 0x26CE }, +{ 0x26D4, 0x26D4 }, +{ 0x26EA, 0x26EA }, +{ 0x26F2, 0x26F3 }, +{ 0x26F5, 0x26F5 }, +{ 0x26FA, 0x26FA }, +{ 0x26FD, 0x26FD }, +{ 0x2705, 0x2705 }, +{ 0x270A, 0x270B }, +{ 0x2728, 0x2728 }, +{ 0x274C, 0x274C }, +{ 0x274E, 0x274E }, +{ 0x2753, 0x2755 }, +{ 0x2757, 0x2757 }, +{ 0x2795, 0x2797 }, +{ 0x27B0, 0x27B0 }, +{ 0x27BF, 0x27BF }, +{ 0x2B1B, 0x2B1C }, +{ 0x2B50, 0x2B50 }, +{ 0x2B55, 0x2B55 }, { 0x2E80, 0x2E99 }, { 0x2E9B, 0x2EF3 }, { 0x2F00, 0x2FD5 }, @@ -313,11 +375,49 @@ static const struct interval double_width[] = { { 0xFE68, 0xFE6B }, { 0xFF01, 0xFF60 }, { 0xFFE0, 0xFFE6 }, +{ 0x16FE0, 0x16FE0 }, +{ 0x17000, 0x187EC }, +{ 0x18800, 0x18AF2 }, { 0x1B000, 0x1B001 }, +{ 0x1F004, 0x1F004 }, +{ 0x1F0CF, 0x1F0CF }, +{ 0x1F18E, 0x1F18E }, +{ 0x1F191, 0x1F19A }, { 0x1F200, 0x1F202 }, -{ 0x1F210, 0x1F23A }, +{ 0x1F210, 0x1F23B }, { 0x1F240, 0x1F248 }, { 0x1F250, 0x1F251 }, +{ 0x1F300, 0x1F320 }, +{ 0x1F32D, 0x1F335 }, +{ 0x1F337, 0x1F37C }, +{ 0x1F37E, 0x1F393 }, +{ 0x1F3A0, 0x1F3CA }, +{ 0x1F3CF, 0x1F3D3 }, +{ 0x1F3E0, 0x1F3F0 }, +{ 0x1F3F4, 0x1F3F4 }, +{ 0x1F3F8, 0x1F43E }, +{ 0x1F440, 0x1F440 }, +{ 0x1F442, 0x1F4FC }, +{ 0x1F4FF, 0x1F53D }, +{ 0x1F54B, 0x1F54E }, +{ 0x1F550, 0x1F567 }, +{ 0x1F57A, 0x1F57A }, +{ 0x1F595, 0x1F596 }, +{ 0x1F5A4, 0x1F5A4 }, +{ 0x1F5FB, 0x1F64F }, +{ 0x1F680, 0x1F6C5 }, +{ 0x1F6CC, 0x1F6CC }, +{ 0x1F6D0, 
0x1F6D2 }, +{ 0x1F6EB, 0x1F6EC }, +{ 0x1F6F4, 0x1F6F6 }, +{ 0x1F910, 0x1F91E }, +{ 0x1F920, 0x1F927 }, +{ 0x1F930, 0x1F930 }, +{ 0x1F933, 0x1F93E }, +{ 0x1F940, 0x1F94B }, +{ 0x1F950, 0x1F95E }, +{ 0x1F980, 0x1F991 }, +{ 0x1F9C0, 0x1F9C0 }, { 0x20000, 0x2FFFD }, { 0x30000, 0x3FFFD } }; -- 2.7.2 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 3/3] unicode_width.h: fix the double_width[] table
From: Beat Bolli @ 2016-12-03 10:53 UTC
To: git; +Cc: Beat Bolli, Torsten Bögershausen

The function bisearch() in utf8.c does a pure binary search in
double_width. It does not care about the 17 plane offsets which
unicode/uniset/uniset prepends. Leaving the plane offsets in the table
may cause wrong results.

Filter out the plane offsets in update-unicode.sh and regenerate the
table.

Cc: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Beat Bolli <dev+git@drbeat.li>
---
Diff to v1:
- add Torsten's Cc:

 unicode_width.h   | 17 -----------------
 update_unicode.sh |  2 +-
 2 files changed, 1 insertion(+), 18 deletions(-)

diff --git a/unicode_width.h b/unicode_width.h
index 73b5fd6..02207be 100644
--- a/unicode_width.h
+++ b/unicode_width.h
@@ -297,23 +297,6 @@ static const struct interval zero_width[] = {
 { 0xE0100, 0xE01EF }
 };
 static const struct interval double_width[] = {
-{ /* plane */ 0x0, 0x3D },
-{ /* plane */ 0x3D, 0x68 },
-{ /* plane */ 0x68, 0x69 },
-{ /* plane */ 0x69, 0x6A },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
 { 0x1100, 0x115F },
 { 0x231A, 0x231B },
 { 0x2329, 0x232A },
diff --git a/update_unicode.sh b/update_unicode.sh
index 3c84270..4c1ec8d 100755
--- a/update_unicode.sh
+++ b/update_unicode.sh
@@ -30,7 +30,7 @@ fi &&
 	  grep -v plane)
 };
 static const struct interval double_width[] = {
-	$(uniset/uniset --32 eaw:F,W)
+	$(uniset/uniset --32 eaw:F,W | grep -v plane)
 };
 EOF
 )
--
2.7.2

* [PATCH v3 1/3] update-unicode.sh: automatically download newer definition files
From: Beat Bolli @ 2016-12-03 13:19 UTC
To: git; +Cc: Beat Bolli, Torsten Bögershausen

Checking just for the unicode data files' existence is not sufficient;
we should also download them if a newer version exists on the Unicode
consortium's servers. Option -N of wget does this nicely for us.

Cc: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Beat Bolli <dev+git@drbeat.li>
---
Diff to v2:
- reorder the commits: fix all of update-unicode.sh first, then
  regenerate unicode_width.h only once

 update_unicode.sh | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/update_unicode.sh b/update_unicode.sh
index 27af77c..3c84270 100755
--- a/update_unicode.sh
+++ b/update_unicode.sh
@@ -10,12 +10,8 @@ if ! test -d unicode; then
 	mkdir unicode
 fi &&
 ( cd unicode &&
-	if ! test -f UnicodeData.txt; then
-		wget http://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
-	fi &&
-	if ! test -f EastAsianWidth.txt; then
-		wget http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt
-	fi &&
+	wget -N http://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt \
+		http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt &&
 	if ! test -d uniset; then
 		git clone https://github.com/depp/uniset.git
 	fi &&
--
2.7.2

* [PATCH v3 2/3] update-unicode.sh: strip the plane offsets from the double_width[] table
From: Beat Bolli @ 2016-12-03 13:19 UTC
To: git; +Cc: Beat Bolli, Torsten Bögershausen

The function bisearch() in utf8.c does a pure binary search in
double_width. It does not care about the 17 plane offsets which
unicode/uniset/uniset prepends. Leaving the plane offsets in the table
may cause wrong results.

Filter out the plane offsets in update-unicode.sh.

Cc: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Beat Bolli <dev+git@drbeat.li>
---
 update_unicode.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/update_unicode.sh b/update_unicode.sh
index 3c84270..4c1ec8d 100755
--- a/update_unicode.sh
+++ b/update_unicode.sh
@@ -30,7 +30,7 @@ fi &&
 	  grep -v plane)
 };
 static const struct interval double_width[] = {
-	$(uniset/uniset --32 eaw:F,W)
+	$(uniset/uniset --32 eaw:F,W | grep -v plane)
 };
 EOF
 )
--
2.7.2

* [PATCH v3 3/3] unicode_width.h: update the tables to Unicode 9.0 2016-12-03 13:19 ` [PATCH v3 1/3] update-unicode.sh: automatically download newer definition files Beat Bolli 2016-12-03 13:19 ` [PATCH v3 2/3] update-unicode.sh: strip the plane offsets from the double_width[] table Beat Bolli @ 2016-12-03 13:19 ` Beat Bolli 2016-12-03 16:40 ` [PATCH v3 1/3] update-unicode.sh: automatically download newer definition files Torsten =?unknown-8bit?Q?B=C3=B6gershausen?= 2 siblings, 0 replies; 18+ messages in thread From: Beat Bolli @ 2016-12-03 13:19 UTC (permalink / raw) To: git; +Cc: Beat Bolli Rerunning update-unicode.sh that we fixed in the two previous commits produces these new tables. Signed-off-by: Beat Bolli <dev+git@drbeat.li> --- unicode_width.h | 131 +++++++++++++++++++++++++++++++++++++++++++++----------- 1 file changed, 107 insertions(+), 24 deletions(-) diff --git a/unicode_width.h b/unicode_width.h index 47cdd23..02207be 100644 --- a/unicode_width.h +++ b/unicode_width.h @@ -25,7 +25,7 @@ static const struct interval zero_width[] = { { 0x0825, 0x0827 }, { 0x0829, 0x082D }, { 0x0859, 0x085B }, -{ 0x08E4, 0x0902 }, +{ 0x08D4, 0x0902 }, { 0x093A, 0x093A }, { 0x093C, 0x093C }, { 0x0941, 0x0948 }, @@ -120,6 +120,7 @@ static const struct interval zero_width[] = { { 0x17C9, 0x17D3 }, { 0x17DD, 0x17DD }, { 0x180B, 0x180E }, +{ 0x1885, 0x1886 }, { 0x18A9, 0x18A9 }, { 0x1920, 0x1922 }, { 0x1927, 0x1928 }, @@ -158,7 +159,7 @@ static const struct interval zero_width[] = { { 0x1CF4, 0x1CF4 }, { 0x1CF8, 0x1CF9 }, { 0x1DC0, 0x1DF5 }, -{ 0x1DFC, 0x1DFF }, +{ 0x1DFB, 0x1DFF }, { 0x200B, 0x200F }, { 0x202A, 0x202E }, { 0x2060, 0x2064 }, @@ -171,13 +172,13 @@ static const struct interval zero_width[] = { { 0x3099, 0x309A }, { 0xA66F, 0xA672 }, { 0xA674, 0xA67D }, -{ 0xA69F, 0xA69F }, +{ 0xA69E, 0xA69F }, { 0xA6F0, 0xA6F1 }, { 0xA802, 0xA802 }, { 0xA806, 0xA806 }, { 0xA80B, 0xA80B }, { 0xA825, 0xA826 }, -{ 0xA8C4, 0xA8C4 }, +{ 0xA8C4, 0xA8C5 }, { 0xA8E0, 0xA8F1 }, { 
0xA926, 0xA92D }, { 0xA947, 0xA951 }, @@ -204,7 +205,7 @@ static const struct interval zero_width[] = { { 0xABED, 0xABED }, { 0xFB1E, 0xFB1E }, { 0xFE00, 0xFE0F }, -{ 0xFE20, 0xFE2D }, +{ 0xFE20, 0xFE2F }, { 0xFEFF, 0xFEFF }, { 0xFFF9, 0xFFFB }, { 0x101FD, 0x101FD }, @@ -228,16 +229,21 @@ static const struct interval zero_width[] = { { 0x11173, 0x11173 }, { 0x11180, 0x11181 }, { 0x111B6, 0x111BE }, +{ 0x111CA, 0x111CC }, { 0x1122F, 0x11231 }, { 0x11234, 0x11234 }, { 0x11236, 0x11237 }, +{ 0x1123E, 0x1123E }, { 0x112DF, 0x112DF }, { 0x112E3, 0x112EA }, -{ 0x11301, 0x11301 }, +{ 0x11300, 0x11301 }, { 0x1133C, 0x1133C }, { 0x11340, 0x11340 }, { 0x11366, 0x1136C }, { 0x11370, 0x11374 }, +{ 0x11438, 0x1143F }, +{ 0x11442, 0x11444 }, +{ 0x11446, 0x11446 }, { 0x114B3, 0x114B8 }, { 0x114BA, 0x114BA }, { 0x114BF, 0x114C0 }, @@ -245,6 +251,7 @@ static const struct interval zero_width[] = { { 0x115B2, 0x115B5 }, { 0x115BC, 0x115BD }, { 0x115BF, 0x115C0 }, +{ 0x115DC, 0x115DD }, { 0x11633, 0x1163A }, { 0x1163D, 0x1163D }, { 0x1163F, 0x11640 }, @@ -252,6 +259,16 @@ static const struct interval zero_width[] = { { 0x116AD, 0x116AD }, { 0x116B0, 0x116B5 }, { 0x116B7, 0x116B7 }, +{ 0x1171D, 0x1171F }, +{ 0x11722, 0x11725 }, +{ 0x11727, 0x1172B }, +{ 0x11C30, 0x11C36 }, +{ 0x11C38, 0x11C3D }, +{ 0x11C3F, 0x11C3F }, +{ 0x11C92, 0x11CA7 }, +{ 0x11CAA, 0x11CB0 }, +{ 0x11CB2, 0x11CB3 }, +{ 0x11CB5, 0x11CB6 }, { 0x16AF0, 0x16AF4 }, { 0x16B30, 0x16B36 }, { 0x16F8F, 0x16F92 }, @@ -262,31 +279,59 @@ static const struct interval zero_width[] = { { 0x1D185, 0x1D18B }, { 0x1D1AA, 0x1D1AD }, { 0x1D242, 0x1D244 }, +{ 0x1DA00, 0x1DA36 }, +{ 0x1DA3B, 0x1DA6C }, +{ 0x1DA75, 0x1DA75 }, +{ 0x1DA84, 0x1DA84 }, +{ 0x1DA9B, 0x1DA9F }, +{ 0x1DAA1, 0x1DAAF }, +{ 0x1E000, 0x1E006 }, +{ 0x1E008, 0x1E018 }, +{ 0x1E01B, 0x1E021 }, +{ 0x1E023, 0x1E024 }, +{ 0x1E026, 0x1E02A }, { 0x1E8D0, 0x1E8D6 }, +{ 0x1E944, 0x1E94A }, { 0xE0001, 0xE0001 }, { 0xE0020, 0xE007F }, { 0xE0100, 0xE01EF } }; static const struct 
interval double_width[] = { -{ /* plane */ 0x0, 0x1C }, -{ /* plane */ 0x1C, 0x21 }, -{ /* plane */ 0x21, 0x22 }, -{ /* plane */ 0x22, 0x23 }, -{ /* plane */ 0x0, 0x0 }, -{ /* plane */ 0x0, 0x0 }, -{ /* plane */ 0x0, 0x0 }, -{ /* plane */ 0x0, 0x0 }, -{ /* plane */ 0x0, 0x0 }, -{ /* plane */ 0x0, 0x0 }, -{ /* plane */ 0x0, 0x0 }, -{ /* plane */ 0x0, 0x0 }, -{ /* plane */ 0x0, 0x0 }, -{ /* plane */ 0x0, 0x0 }, -{ /* plane */ 0x0, 0x0 }, -{ /* plane */ 0x0, 0x0 }, -{ /* plane */ 0x0, 0x0 }, { 0x1100, 0x115F }, +{ 0x231A, 0x231B }, { 0x2329, 0x232A }, +{ 0x23E9, 0x23EC }, +{ 0x23F0, 0x23F0 }, +{ 0x23F3, 0x23F3 }, +{ 0x25FD, 0x25FE }, +{ 0x2614, 0x2615 }, +{ 0x2648, 0x2653 }, +{ 0x267F, 0x267F }, +{ 0x2693, 0x2693 }, +{ 0x26A1, 0x26A1 }, +{ 0x26AA, 0x26AB }, +{ 0x26BD, 0x26BE }, +{ 0x26C4, 0x26C5 }, +{ 0x26CE, 0x26CE }, +{ 0x26D4, 0x26D4 }, +{ 0x26EA, 0x26EA }, +{ 0x26F2, 0x26F3 }, +{ 0x26F5, 0x26F5 }, +{ 0x26FA, 0x26FA }, +{ 0x26FD, 0x26FD }, +{ 0x2705, 0x2705 }, +{ 0x270A, 0x270B }, +{ 0x2728, 0x2728 }, +{ 0x274C, 0x274C }, +{ 0x274E, 0x274E }, +{ 0x2753, 0x2755 }, +{ 0x2757, 0x2757 }, +{ 0x2795, 0x2797 }, +{ 0x27B0, 0x27B0 }, +{ 0x27BF, 0x27BF }, +{ 0x2B1B, 0x2B1C }, +{ 0x2B50, 0x2B50 }, +{ 0x2B55, 0x2B55 }, { 0x2E80, 0x2E99 }, { 0x2E9B, 0x2EF3 }, { 0x2F00, 0x2FD5 }, @@ -313,11 +358,49 @@ static const struct interval double_width[] = { { 0xFE68, 0xFE6B }, { 0xFF01, 0xFF60 }, { 0xFFE0, 0xFFE6 }, +{ 0x16FE0, 0x16FE0 }, +{ 0x17000, 0x187EC }, +{ 0x18800, 0x18AF2 }, { 0x1B000, 0x1B001 }, +{ 0x1F004, 0x1F004 }, +{ 0x1F0CF, 0x1F0CF }, +{ 0x1F18E, 0x1F18E }, +{ 0x1F191, 0x1F19A }, { 0x1F200, 0x1F202 }, -{ 0x1F210, 0x1F23A }, +{ 0x1F210, 0x1F23B }, { 0x1F240, 0x1F248 }, { 0x1F250, 0x1F251 }, +{ 0x1F300, 0x1F320 }, +{ 0x1F32D, 0x1F335 }, +{ 0x1F337, 0x1F37C }, +{ 0x1F37E, 0x1F393 }, +{ 0x1F3A0, 0x1F3CA }, +{ 0x1F3CF, 0x1F3D3 }, +{ 0x1F3E0, 0x1F3F0 }, +{ 0x1F3F4, 0x1F3F4 }, +{ 0x1F3F8, 0x1F43E }, +{ 0x1F440, 0x1F440 }, +{ 0x1F442, 0x1F4FC }, +{ 0x1F4FF, 0x1F53D }, +{ 0x1F54B, 
0x1F54E }, +{ 0x1F550, 0x1F567 }, +{ 0x1F57A, 0x1F57A }, +{ 0x1F595, 0x1F596 }, +{ 0x1F5A4, 0x1F5A4 }, +{ 0x1F5FB, 0x1F64F }, +{ 0x1F680, 0x1F6C5 }, +{ 0x1F6CC, 0x1F6CC }, +{ 0x1F6D0, 0x1F6D2 }, +{ 0x1F6EB, 0x1F6EC }, +{ 0x1F6F4, 0x1F6F6 }, +{ 0x1F910, 0x1F91E }, +{ 0x1F920, 0x1F927 }, +{ 0x1F930, 0x1F930 }, +{ 0x1F933, 0x1F93E }, +{ 0x1F940, 0x1F94B }, +{ 0x1F950, 0x1F95E }, +{ 0x1F980, 0x1F991 }, +{ 0x1F9C0, 0x1F9C0 }, { 0x20000, 0x2FFFD }, { 0x30000, 0x3FFFD } }; -- 2.7.2 ^ permalink raw reply related [flat|nested] 18+ messages in thread
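As a rough illustration of what the regenerated tables are for, the following sketch classifies a code point as occupying 0, 1 or 2 columns. It is a simplification under stated assumptions, not the code in utf8.c: sketch_width(), in_table(), demo_widths() and the *_excerpt tables are names invented for this example, the tables hold only a handful of intervals taken from the diff above, and the handling of control characters is left out.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

typedef unsigned int ucs_char_t;	/* assumed code point type */

struct interval {
	ucs_char_t first;
	ucs_char_t last;
};

/* A few zero_width[] intervals from the diff (sorted excerpt). */
static const struct interval zero_width_excerpt[] = {
	{ 0x1885, 0x1886 },	/* added by the Unicode 9.0 update */
	{ 0x200B, 0x200F },
	{ 0xFE20, 0xFE2F }	/* extended from 0xFE2D */
};

/* A few double_width[] intervals from the diff (sorted excerpt). */
static const struct interval double_width_excerpt[] = {
	{ 0x1100, 0x115F },
	{ 0x231A, 0x231B },	/* added by the update */
	{ 0x1F5FB, 0x1F64F },	/* added by the update */
	{ 0x20000, 0x2FFFD }
};

/* Binary search over the half-open index range [lo, hi). */
int in_table(ucs_char_t ucs, const struct interval *table, size_t n)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (ucs > table[mid].last)
			lo = mid + 1;
		else if (ucs < table[mid].first)
			hi = mid;
		else
			return 1;
	}
	return 0;
}

/* Hypothetical width lookup: 0 for zero-width, 2 for wide, else 1. */
int sketch_width(ucs_char_t ucs)
{
	if (in_table(ucs, zero_width_excerpt, ARRAY_SIZE(zero_width_excerpt)))
		return 0;
	if (in_table(ucs, double_width_excerpt, ARRAY_SIZE(double_width_excerpt)))
		return 2;
	return 1;
}

void demo_widths(void)
{
	assert(sketch_width(0x41) == 1);	/* 'A': in neither table */
	assert(sketch_width(0x200B) == 0);	/* zero-width space */
	assert(sketch_width(0xFE2E) == 0);	/* covered only by the extended range */
	assert(sketch_width(0x231A) == 2);	/* newly listed as wide */
	assert(sketch_width(0x1F600) == 2);	/* inside 0x1F5FB..0x1F64F */
}
```

With the pre-update tables, a code point such as U+1F600 would fall through to the default width of 1 in this sketch, which is the kind of difference the table update is meant to address.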
* Re: [PATCH v3 1/3] update-unicode.sh: automatically download newer definition files
From: Torsten Bögershausen @ 2016-12-03 16:40 UTC
To: Beat Bolli; +Cc: git

On Sat, Dec 03, 2016 at 02:19:31PM +0100, Beat Bolli wrote:
> Checking just for the unicode data files' existence is not sufficient;
> we should also download them if a newer version exists on the Unicode
> consortium's servers. Option -N of wget does this nicely for us.
>
> Cc: Torsten Bögershausen <tboegi@web.de>

The V3 series makes perfect sense, thanks for cleaning up my mess.
(And can we remove the Cc: line, or replace it with Reviewed-by?)

* Re: [PATCH v3 1/3] update-unicode.sh: automatically download newer definition files
From: Beat Bolli @ 2016-12-03 16:41 UTC
To: Torsten Bögershausen; +Cc: git

On 03.12.16 17:40, Torsten Bögershausen wrote:
> On Sat, Dec 03, 2016 at 02:19:31PM +0100, Beat Bolli wrote:
>> Checking just for the unicode data files' existence is not sufficient;
>> we should also download them if a newer version exists on the Unicode
>> consortium's servers. Option -N of wget does this nicely for us.
>>
>> Cc: Torsten Bögershausen <tboegi@web.de>
>
> The V3 series makes perfect sense, thanks for cleaning up my mess.

Yeah, it took me three tries, too :-)

> (And can we remove the Cc: line, or replace it with Reviewed-by?)

If you prefer, sure. Do you have any other comments?

Beat

* [PATCH v4 1/3] update-unicode.sh: automatically download newer definition files 2016-12-03 16:41 ` Beat Bolli @ 2016-12-03 21:00 ` Beat Bolli 2016-12-03 21:00 ` [PATCH v4 2/3] update-unicode.sh: strip the plane offsets from the double_width[] table Beat Bolli ` (2 more replies) 0 siblings, 3 replies; 18+ messages in thread From: Beat Bolli @ 2016-12-03 21:00 UTC (permalink / raw) To: git; +Cc: Beat Bolli Checking just for the unicode data files' existence is not sufficient; we should also download them if a newer version exists on the Unicode consortium's servers. Option -N of wget does this nicely for us. Reviewed-by: Torsten Boegershausen <tboegi@web.de> Signed-off-by: Beat Bolli <dev+git@drbeat.li> --- Diff to v3: - change the Cc: into Reviewed-by: on Thorsten's request - include the old reroll diffs Diff to v2: - reorder the commits: fix all of update-unicode.sh first, then regenerate unicode_width.h only once Diff to v1: - reword the commit message - add Thorsten's Cc: update_unicode.sh | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/update_unicode.sh b/update_unicode.sh index 27af77c..3c84270 100755 --- a/update_unicode.sh +++ b/update_unicode.sh @@ -10,12 +10,8 @@ if ! test -d unicode; then mkdir unicode fi && ( cd unicode && - if ! test -f UnicodeData.txt; then - wget http://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt - fi && - if ! test -f EastAsianWidth.txt; then - wget http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt - fi && + wget -N http://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt \ + http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt && if ! test -d uniset; then git clone https://github.com/depp/uniset.git fi && -- 2.7.2 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v4 2/3] update-unicode.sh: strip the plane offsets from the double_width[] table 2016-12-03 21:00 ` [PATCH v4 " Beat Bolli @ 2016-12-03 21:00 ` Beat Bolli 2016-12-03 21:00 ` [PATCH v4 3/3] unicode_width.h: update the tables to Unicode 9.0 Beat Bolli 2016-12-04 7:58 ` [PATCH v4 1/3] update-unicode.sh: automatically download newer definition files Torsten Bögershausen 2 siblings, 0 replies; 18+ messages in thread From: Beat Bolli @ 2016-12-03 21:00 UTC (permalink / raw) To: git; +Cc: Beat Bolli The function bisearch() in utf8.c does a pure binary search in double_width. It does not care about the 17 plane offsets which unicode/uniset/uniset prepends. Leaving the plane offsets in the table may cause wrong results. Filter out the plane offsets in update-unicode.sh. Reviewed-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Beat Bolli <dev+git@drbeat.li> --- update_unicode.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/update_unicode.sh b/update_unicode.sh index 3c84270..4c1ec8d 100755 --- a/update_unicode.sh +++ b/update_unicode.sh @@ -30,7 +30,7 @@ fi && grep -v plane) }; static const struct interval double_width[] = { - $(uniset/uniset --32 eaw:F,W) + $(uniset/uniset --32 eaw:F,W | grep -v plane) }; EOF ) -- 2.7.2 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v4 3/3] unicode_width.h: update the tables to Unicode 9.0 2016-12-03 21:00 ` [PATCH v4 " Beat Bolli 2016-12-03 21:00 ` [PATCH v4 2/3] update-unicode.sh: strip the plane offsets from the double_width[] table Beat Bolli @ 2016-12-03 21:00 ` Beat Bolli 2016-12-04 7:58 ` [PATCH v4 1/3] update-unicode.sh: automatically download newer definition files Torsten Bögershausen 2 siblings, 0 replies; 18+ messages in thread From: Beat Bolli @ 2016-12-03 21:00 UTC (permalink / raw) To: git; +Cc: Beat Bolli Rerunning update-unicode.sh that we fixed in the two previous commits produces these new tables. Signed-off-by: Beat Bolli <dev+git@drbeat.li> --- unicode_width.h | 131 +++++++++++++++++++++++++++++++++++++++++++++----------- 1 file changed, 107 insertions(+), 24 deletions(-) diff --git a/unicode_width.h b/unicode_width.h index 47cdd23..02207be 100644 --- a/unicode_width.h +++ b/unicode_width.h @@ -25,7 +25,7 @@ static const struct interval zero_width[] = { { 0x0825, 0x0827 }, { 0x0829, 0x082D }, { 0x0859, 0x085B }, -{ 0x08E4, 0x0902 }, +{ 0x08D4, 0x0902 }, { 0x093A, 0x093A }, { 0x093C, 0x093C }, { 0x0941, 0x0948 }, @@ -120,6 +120,7 @@ static const struct interval zero_width[] = { { 0x17C9, 0x17D3 }, { 0x17DD, 0x17DD }, { 0x180B, 0x180E }, +{ 0x1885, 0x1886 }, { 0x18A9, 0x18A9 }, { 0x1920, 0x1922 }, { 0x1927, 0x1928 }, @@ -158,7 +159,7 @@ static const struct interval zero_width[] = { { 0x1CF4, 0x1CF4 }, { 0x1CF8, 0x1CF9 }, { 0x1DC0, 0x1DF5 }, -{ 0x1DFC, 0x1DFF }, +{ 0x1DFB, 0x1DFF }, { 0x200B, 0x200F }, { 0x202A, 0x202E }, { 0x2060, 0x2064 }, @@ -171,13 +172,13 @@ static const struct interval zero_width[] = { { 0x3099, 0x309A }, { 0xA66F, 0xA672 }, { 0xA674, 0xA67D }, -{ 0xA69F, 0xA69F }, +{ 0xA69E, 0xA69F }, { 0xA6F0, 0xA6F1 }, { 0xA802, 0xA802 }, { 0xA806, 0xA806 }, { 0xA80B, 0xA80B }, { 0xA825, 0xA826 }, -{ 0xA8C4, 0xA8C4 }, +{ 0xA8C4, 0xA8C5 }, { 0xA8E0, 0xA8F1 }, { 0xA926, 0xA92D }, { 0xA947, 0xA951 }, @@ -204,7 +205,7 @@ static const struct interval zero_width[] 
= { { 0xABED, 0xABED }, { 0xFB1E, 0xFB1E }, { 0xFE00, 0xFE0F }, -{ 0xFE20, 0xFE2D }, +{ 0xFE20, 0xFE2F }, { 0xFEFF, 0xFEFF }, { 0xFFF9, 0xFFFB }, { 0x101FD, 0x101FD }, @@ -228,16 +229,21 @@ static const struct interval zero_width[] = { { 0x11173, 0x11173 }, { 0x11180, 0x11181 }, { 0x111B6, 0x111BE }, +{ 0x111CA, 0x111CC }, { 0x1122F, 0x11231 }, { 0x11234, 0x11234 }, { 0x11236, 0x11237 }, +{ 0x1123E, 0x1123E }, { 0x112DF, 0x112DF }, { 0x112E3, 0x112EA }, -{ 0x11301, 0x11301 }, +{ 0x11300, 0x11301 }, { 0x1133C, 0x1133C }, { 0x11340, 0x11340 }, { 0x11366, 0x1136C }, { 0x11370, 0x11374 }, +{ 0x11438, 0x1143F }, +{ 0x11442, 0x11444 }, +{ 0x11446, 0x11446 }, { 0x114B3, 0x114B8 }, { 0x114BA, 0x114BA }, { 0x114BF, 0x114C0 }, @@ -245,6 +251,7 @@ static const struct interval zero_width[] = { { 0x115B2, 0x115B5 }, { 0x115BC, 0x115BD }, { 0x115BF, 0x115C0 }, +{ 0x115DC, 0x115DD }, { 0x11633, 0x1163A }, { 0x1163D, 0x1163D }, { 0x1163F, 0x11640 }, @@ -252,6 +259,16 @@ static const struct interval zero_width[] = { { 0x116AD, 0x116AD }, { 0x116B0, 0x116B5 }, { 0x116B7, 0x116B7 }, +{ 0x1171D, 0x1171F }, +{ 0x11722, 0x11725 }, +{ 0x11727, 0x1172B }, +{ 0x11C30, 0x11C36 }, +{ 0x11C38, 0x11C3D }, +{ 0x11C3F, 0x11C3F }, +{ 0x11C92, 0x11CA7 }, +{ 0x11CAA, 0x11CB0 }, +{ 0x11CB2, 0x11CB3 }, +{ 0x11CB5, 0x11CB6 }, { 0x16AF0, 0x16AF4 }, { 0x16B30, 0x16B36 }, { 0x16F8F, 0x16F92 }, @@ -262,31 +279,59 @@ static const struct interval zero_width[] = { { 0x1D185, 0x1D18B }, { 0x1D1AA, 0x1D1AD }, { 0x1D242, 0x1D244 }, +{ 0x1DA00, 0x1DA36 }, +{ 0x1DA3B, 0x1DA6C }, +{ 0x1DA75, 0x1DA75 }, +{ 0x1DA84, 0x1DA84 }, +{ 0x1DA9B, 0x1DA9F }, +{ 0x1DAA1, 0x1DAAF }, +{ 0x1E000, 0x1E006 }, +{ 0x1E008, 0x1E018 }, +{ 0x1E01B, 0x1E021 }, +{ 0x1E023, 0x1E024 }, +{ 0x1E026, 0x1E02A }, { 0x1E8D0, 0x1E8D6 }, +{ 0x1E944, 0x1E94A }, { 0xE0001, 0xE0001 }, { 0xE0020, 0xE007F }, { 0xE0100, 0xE01EF } }; static const struct interval double_width[] = { -{ /* plane */ 0x0, 0x1C }, -{ /* plane */ 0x1C, 0x21 }, -{ /* plane */ 
0x21, 0x22 }, -{ /* plane */ 0x22, 0x23 }, -{ /* plane */ 0x0, 0x0 }, -{ /* plane */ 0x0, 0x0 }, -{ /* plane */ 0x0, 0x0 }, -{ /* plane */ 0x0, 0x0 }, -{ /* plane */ 0x0, 0x0 }, -{ /* plane */ 0x0, 0x0 }, -{ /* plane */ 0x0, 0x0 }, -{ /* plane */ 0x0, 0x0 }, -{ /* plane */ 0x0, 0x0 }, -{ /* plane */ 0x0, 0x0 }, -{ /* plane */ 0x0, 0x0 }, -{ /* plane */ 0x0, 0x0 }, -{ /* plane */ 0x0, 0x0 }, { 0x1100, 0x115F }, +{ 0x231A, 0x231B }, { 0x2329, 0x232A }, +{ 0x23E9, 0x23EC }, +{ 0x23F0, 0x23F0 }, +{ 0x23F3, 0x23F3 }, +{ 0x25FD, 0x25FE }, +{ 0x2614, 0x2615 }, +{ 0x2648, 0x2653 }, +{ 0x267F, 0x267F }, +{ 0x2693, 0x2693 }, +{ 0x26A1, 0x26A1 }, +{ 0x26AA, 0x26AB }, +{ 0x26BD, 0x26BE }, +{ 0x26C4, 0x26C5 }, +{ 0x26CE, 0x26CE }, +{ 0x26D4, 0x26D4 }, +{ 0x26EA, 0x26EA }, +{ 0x26F2, 0x26F3 }, +{ 0x26F5, 0x26F5 }, +{ 0x26FA, 0x26FA }, +{ 0x26FD, 0x26FD }, +{ 0x2705, 0x2705 }, +{ 0x270A, 0x270B }, +{ 0x2728, 0x2728 }, +{ 0x274C, 0x274C }, +{ 0x274E, 0x274E }, +{ 0x2753, 0x2755 }, +{ 0x2757, 0x2757 }, +{ 0x2795, 0x2797 }, +{ 0x27B0, 0x27B0 }, +{ 0x27BF, 0x27BF }, +{ 0x2B1B, 0x2B1C }, +{ 0x2B50, 0x2B50 }, +{ 0x2B55, 0x2B55 }, { 0x2E80, 0x2E99 }, { 0x2E9B, 0x2EF3 }, { 0x2F00, 0x2FD5 }, @@ -313,11 +358,49 @@ static const struct interval double_width[] = { { 0xFE68, 0xFE6B }, { 0xFF01, 0xFF60 }, { 0xFFE0, 0xFFE6 }, +{ 0x16FE0, 0x16FE0 }, +{ 0x17000, 0x187EC }, +{ 0x18800, 0x18AF2 }, { 0x1B000, 0x1B001 }, +{ 0x1F004, 0x1F004 }, +{ 0x1F0CF, 0x1F0CF }, +{ 0x1F18E, 0x1F18E }, +{ 0x1F191, 0x1F19A }, { 0x1F200, 0x1F202 }, -{ 0x1F210, 0x1F23A }, +{ 0x1F210, 0x1F23B }, { 0x1F240, 0x1F248 }, { 0x1F250, 0x1F251 }, +{ 0x1F300, 0x1F320 }, +{ 0x1F32D, 0x1F335 }, +{ 0x1F337, 0x1F37C }, +{ 0x1F37E, 0x1F393 }, +{ 0x1F3A0, 0x1F3CA }, +{ 0x1F3CF, 0x1F3D3 }, +{ 0x1F3E0, 0x1F3F0 }, +{ 0x1F3F4, 0x1F3F4 }, +{ 0x1F3F8, 0x1F43E }, +{ 0x1F440, 0x1F440 }, +{ 0x1F442, 0x1F4FC }, +{ 0x1F4FF, 0x1F53D }, +{ 0x1F54B, 0x1F54E }, +{ 0x1F550, 0x1F567 }, +{ 0x1F57A, 0x1F57A }, +{ 0x1F595, 0x1F596 }, +{ 0x1F5A4, 0x1F5A4 
}, +{ 0x1F5FB, 0x1F64F }, +{ 0x1F680, 0x1F6C5 }, +{ 0x1F6CC, 0x1F6CC }, +{ 0x1F6D0, 0x1F6D2 }, +{ 0x1F6EB, 0x1F6EC }, +{ 0x1F6F4, 0x1F6F6 }, +{ 0x1F910, 0x1F91E }, +{ 0x1F920, 0x1F927 }, +{ 0x1F930, 0x1F930 }, +{ 0x1F933, 0x1F93E }, +{ 0x1F940, 0x1F94B }, +{ 0x1F950, 0x1F95E }, +{ 0x1F980, 0x1F991 }, +{ 0x1F9C0, 0x1F9C0 }, { 0x20000, 0x2FFFD }, { 0x30000, 0x3FFFD } }; -- 2.7.2 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH v4 1/3] update-unicode.sh: automatically download newer definition files 2016-12-03 21:00 ` [PATCH v4 " Beat Bolli 2016-12-03 21:00 ` [PATCH v4 2/3] update-unicode.sh: strip the plane offsets from the double_width[] table Beat Bolli 2016-12-03 21:00 ` [PATCH v4 3/3] unicode_width.h: update the tables to Unicode 9.0 Beat Bolli @ 2016-12-04 7:58 ` Torsten Bögershausen 2016-12-05 20:31 ` Junio C Hamano 2 siblings, 1 reply; 18+ messages in thread From: Torsten Bögershausen @ 2016-12-04 7:58 UTC (permalink / raw) To: Beat Bolli; +Cc: git On Sat, Dec 03, 2016 at 10:00:47PM +0100, Beat Bolli wrote: > Checking just for the unicode data files' existence is not sufficient; > we should also download them if a newer version exists on the Unicode > consortium's servers. Option -N of wget does this nicely for us. > > Reviewed-by: Torsten Boegershausen <tboegi@web.de> Minor remark (Not sure if this motivates v5, may be Junio can fix it locally?) s/oe/ö/ Beside this: Thanks again (and I learned about the -N option of wget) ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v4 1/3] update-unicode.sh: automatically download newer definition files 2016-12-04 7:58 ` [PATCH v4 1/3] update-unicode.sh: automatically download newer definition files Torsten Bögershausen @ 2016-12-05 20:31 ` Junio C Hamano 2016-12-07 0:17 ` Beat Bolli 0 siblings, 1 reply; 18+ messages in thread From: Junio C Hamano @ 2016-12-05 20:31 UTC (permalink / raw) To: Torsten Bögershausen; +Cc: Beat Bolli, git Torsten Bögershausen <tboegi@web.de> writes: > On Sat, Dec 03, 2016 at 10:00:47PM +0100, Beat Bolli wrote: >> Checking just for the unicode data files' existence is not sufficient; >> we should also download them if a newer version exists on the Unicode >> consortium's servers. Option -N of wget does this nicely for us. >> >> Reviewed-by: Torsten Boegershausen <tboegi@web.de> > > Minor remark (Not sure if this motivates v5, may be Junio can fix it locally?) > s/oe/ö/ > > Beside this: Thanks again (and I learned about the -N option of wget) Will fix up while queuing (only 1/3 needs it, 2/3 has it right). Also, I'll do s/update-unicode.sh/update_unicode.sh/ on the title and the message to match the reality. At some point we might want to fix the reality to match people's expectations, though. Thanks. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v4 1/3] update-unicode.sh: automatically download newer definition files 2016-12-05 20:31 ` Junio C Hamano @ 2016-12-07 0:17 ` Beat Bolli 2016-12-07 1:00 ` jbh 0 siblings, 1 reply; 18+ messages in thread From: Beat Bolli @ 2016-12-07 0:17 UTC (permalink / raw) To: Junio C Hamano, Torsten Bögershausen; +Cc: git On 05.12.16 21:31, Junio C Hamano wrote: > Torsten Bögershausen <tboegi@web.de> writes: > >> On Sat, Dec 03, 2016 at 10:00:47PM +0100, Beat Bolli wrote: >>> Checking just for the unicode data files' existence is not sufficient; >>> we should also download them if a newer version exists on the Unicode >>> consortium's servers. Option -N of wget does this nicely for us. >>> >>> Reviewed-by: Torsten Boegershausen <tboegi@web.de> >> >> Minor remark (Not sure if this motivates v5, may be Junio can fix it locally?) >> s/oe/ö/ >> >> Beside this: Thanks again (and I learned about the -N option of wget) > > Will fix up while queuing (only 1/3 needs it, 2/3 has it right). > > Also, I'll do s/update-unicode.sh/update_unicode.sh/ on the title > and the message to match the reality. At some point we might want > to fix the reality to match people's expectations, though. Thanks, Junio. This was a bit sloppy of me. I really appreciate your regard for the small things! Cheers, Beat ^ permalink raw reply [flat|nested] 18+ messages in thread
* (no subject) 2016-12-07 0:17 ` Beat Bolli @ 2016-12-07 1:00 ` jbh 0 siblings, 0 replies; 18+ messages in thread From: jbh @ 2016-12-07 1:00 UTC (permalink / raw) To: git unsubscribe ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread (2016-12-07 1:00 UTC)
Thread overview: 18+ messages
2016-12-02 21:26 [PATCH 1/2] update-unicode.sh: automatically download newer definition files Beat Bolli
2016-12-02 21:26 ` [PATCH 2/2] unicode: update the tables to Unicode 9.0 Beat Bolli
2016-12-03 10:35 ` [PATCH 3/3] unicode_width.h: fix the double_width[] table Beat Bolli
2016-12-03 10:53 ` [PATCH v2 1/3] update-unicode.sh: automatically download newer definition files Beat Bolli
2016-12-03 10:53 ` [PATCH v2 2/3] unicode_width.h: update the tables to Unicode 9.0 Beat Bolli
2016-12-03 10:53 ` [PATCH v2 3/3] unicode_width.h: fix the double_width[] table Beat Bolli
2016-12-03 13:19 ` [PATCH v3 1/3] update-unicode.sh: automatically download newer definition files Beat Bolli
2016-12-03 13:19 ` [PATCH v3 2/3] update-unicode.sh: strip the plane offsets from the double_width[] table Beat Bolli
2016-12-03 13:19 ` [PATCH v3 3/3] unicode_width.h: update the tables to Unicode 9.0 Beat Bolli
2016-12-03 16:40 ` [PATCH v3 1/3] update-unicode.sh: automatically download newer definition files Torsten Bögershausen
2016-12-03 16:41 ` Beat Bolli
2016-12-03 21:00 ` [PATCH v4 " Beat Bolli
2016-12-03 21:00 ` [PATCH v4 2/3] update-unicode.sh: strip the plane offsets from the double_width[] table Beat Bolli
2016-12-03 21:00 ` [PATCH v4 3/3] unicode_width.h: update the tables to Unicode 9.0 Beat Bolli
2016-12-04 7:58 ` [PATCH v4 1/3] update-unicode.sh: automatically download newer definition files Torsten Bögershausen
2016-12-05 20:31 ` Junio C Hamano
2016-12-07 0:17 ` Beat Bolli
2016-12-07 1:00 ` jbh