* [PATCH 1/3] update_unicode.sh: update the uniset repo if it exists
@ 2016-12-11 23:34 Beat Bolli
  2016-12-11 23:34 ` [PATCH 2/3] update_unicode.sh: remove the plane filters Beat Bolli
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Beat Bolli @ 2016-12-11 23:34 UTC (permalink / raw)
  To: git; +Cc: Beat Bolli

We need to track the new commits in uniset, otherwise their and our code
get out of sync.

Signed-off-by: Beat Bolli <dev+git@drbeat.li>
---

Junio, these go on top of my bb/unicode-9.0 branch, please.

Thanks!

 update_unicode.sh | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/update_unicode.sh b/update_unicode.sh
index 4c1ec8d..9ca7d8b 100755
--- a/update_unicode.sh
+++ b/update_unicode.sh
@@ -14,6 +14,11 @@ fi &&
 		http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt &&
 	if ! test -d uniset; then
 		git clone https://github.com/depp/uniset.git
+	else
+	(
+		cd uniset &&
+		git pull
+	)
 	fi &&
 	(
 		cd uniset &&
-- 
2.7.2


* [PATCH 2/3] update_unicode.sh: remove the plane filters
  2016-12-11 23:34 [PATCH 1/3] update_unicode.sh: update the uniset repo if it exists Beat Bolli
@ 2016-12-11 23:34 ` Beat Bolli
  2016-12-11 23:34 ` [PATCH 3/3] update_unicode.sh: restore hexadecimal output Beat Bolli
  2016-12-12  5:53 ` [PATCH 1/3] update_unicode.sh: update the uniset repo if it exists Torsten Bögershausen
  2 siblings, 0 replies; 11+ messages in thread
From: Beat Bolli @ 2016-12-11 23:34 UTC (permalink / raw)
  To: git; +Cc: Beat Bolli

The uniset upstream has accepted my patches that eliminate the Unicode
plane offsets from the output in '--32' mode.

Remove the corresponding filter in update_unicode.sh.

Signed-off-by: Beat Bolli <dev+git@drbeat.li>
---
 update_unicode.sh | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/update_unicode.sh b/update_unicode.sh
index 9ca7d8b..e595bf8 100755
--- a/update_unicode.sh
+++ b/update_unicode.sh
@@ -31,11 +31,10 @@ fi &&
 	UNICODE_DIR=. && export UNICODE_DIR &&
 	cat >$UNICODEWIDTH_H <<-EOF
 	static const struct interval zero_width[] = {
-		$(uniset/uniset --32 cat:Me,Mn,Cf + U+1160..U+11FF - U+00AD |
-		  grep -v plane)
+		$(uniset/uniset --32 cat:Me,Mn,Cf + U+1160..U+11FF - U+00AD)
 	};
 	static const struct interval double_width[] = {
-		$(uniset/uniset --32 eaw:F,W | grep -v plane)
+		$(uniset/uniset --32 eaw:F,W)
 	};
 	EOF
 )
-- 
2.7.2


* [PATCH 3/3] update_unicode.sh: restore hexadecimal output
  2016-12-11 23:34 [PATCH 1/3] update_unicode.sh: update the uniset repo if it exists Beat Bolli
  2016-12-11 23:34 ` [PATCH 2/3] update_unicode.sh: remove the plane filters Beat Bolli
@ 2016-12-11 23:34 ` Beat Bolli
  2016-12-12  5:53 ` [PATCH 1/3] update_unicode.sh: update the uniset repo if it exists Torsten Bögershausen
  2 siblings, 0 replies; 11+ messages in thread
From: Beat Bolli @ 2016-12-11 23:34 UTC (permalink / raw)
  To: git; +Cc: Beat Bolli

The uniset upstream has decided that decimal numbers are The True Way, so
let's convert them back to the usual format that's closer to the U+nnnn
standard.

The generated unicode_width.h file again looks exactly the same as two
commits ago.

Signed-off-by: Beat Bolli <dev+git@drbeat.li>
---
 update_unicode.sh | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/update_unicode.sh b/update_unicode.sh
index e595bf8..d7720d5 100755
--- a/update_unicode.sh
+++ b/update_unicode.sh
@@ -5,6 +5,12 @@
 #Mn Nonspacing_Mark a nonspacing combining mark (zero advance width)
 #Cf Format          a format control character
 #
+
+dec_to_hex() {
+	# convert any decimal numbers to 4-digit hex
+	perl -pe 's/(\d+)/sprintf("0x%04X", $1)/ge'
+}
+
 UNICODEWIDTH_H=../unicode_width.h
 if ! test -d unicode; then
 	mkdir unicode
@@ -29,7 +35,7 @@ fi &&
 		make
 	) &&
 	UNICODE_DIR=. && export UNICODE_DIR &&
-	cat >$UNICODEWIDTH_H <<-EOF
+	dec_to_hex >$UNICODEWIDTH_H <<-EOF
 	static const struct interval zero_width[] = {
 		$(uniset/uniset --32 cat:Me,Mn,Cf + U+1160..U+11FF - U+00AD)
 	};
-- 
2.7.2


* Re: [PATCH 1/3] update_unicode.sh: update the uniset repo if it exists
  2016-12-11 23:34 [PATCH 1/3] update_unicode.sh: update the uniset repo if it exists Beat Bolli
  2016-12-11 23:34 ` [PATCH 2/3] update_unicode.sh: remove the plane filters Beat Bolli
  2016-12-11 23:34 ` [PATCH 3/3] update_unicode.sh: restore hexadecimal output Beat Bolli
@ 2016-12-12  5:53 ` Torsten Bögershausen
  2016-12-12  8:54   ` Beat Bolli
  2 siblings, 1 reply; 11+ messages in thread
From: Torsten Bögershausen @ 2016-12-12  5:53 UTC (permalink / raw)
  To: Beat Bolli, git

On 2016-12-12 00:34, Beat Bolli wrote:
> We need to track the new commits in uniset, otherwise their and our code
> get out of sync.
> 
> Signed-off-by: Beat Bolli <dev+git@drbeat.li>
> ---
> 
> Junio, these go on top of my bb/unicode-9.0 branch, please.
> 
> Thanks!
> 
>  update_unicode.sh | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/update_unicode.sh b/update_unicode.sh
> index 4c1ec8d..9ca7d8b 100755
> --- a/update_unicode.sh
> +++ b/update_unicode.sh
> @@ -14,6 +14,11 @@ fi &&
>  		http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt &&
>  	if ! test -d uniset; then
>  		git clone https://github.com/depp/uniset.git
> +	else
> +	(
> +		cd uniset &&
> +		git pull
If upstream has accepted your patches, that's nice.

Minor question, especially regarding the next commit:
Should we make sure to check out the exact version that has been tested?
In this case, cb97792880625e24a9f581412d03659091a0e54f

This applies to a fresh clone as well, and the git pull
needs to be replaced by
git fetch && git checkout cb97792880625e24a9f581412d03659091a0e54f


(The commit ID would of course be kept in a shell variable.)
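
The pinning suggested above could be sketched roughly as follows. This is
only an illustration, not part of the posted patches: the function name
checkout_pinned and the variables UNISET_URL and UNISET_COMMIT are made up
for this example; the URL and the commit ID are the ones mentioned in this
thread.

```shell
# Hypothetical sketch: keep the uniset checkout at a known, tested commit.
# UNISET_URL, UNISET_COMMIT and checkout_pinned are illustrative names.
UNISET_URL=https://github.com/depp/uniset.git
UNISET_COMMIT=cb97792880625e24a9f581412d03659091a0e54f

# usage: checkout_pinned <url> <dir> <commit>
checkout_pinned() {
	url=$1 dir=$2 commit=$3
	# Clone only if the directory does not exist yet.
	if ! test -d "$dir"
	then
		git clone "$url" "$dir"
	fi &&
	(
		# Fetch new objects, then move to the pinned commit
		# (this leaves the checkout on a detached HEAD).
		cd "$dir" &&
		git fetch &&
		git checkout "$commit"
	)
}
```

In the script this would be called as
checkout_pinned "$UNISET_URL" uniset "$UNISET_COMMIT", so that a fresh clone
and an existing checkout both end up on the same commit.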



* Re: [PATCH 1/3] update_unicode.sh: update the uniset repo if it exists
  2016-12-12  5:53 ` [PATCH 1/3] update_unicode.sh: update the uniset repo if it exists Torsten Bögershausen
@ 2016-12-12  8:54   ` Beat Bolli
  2016-12-12 18:12     ` Torsten Bögershausen
  0 siblings, 1 reply; 11+ messages in thread
From: Beat Bolli @ 2016-12-12  8:54 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: git

On 2016-12-12 06:53, Torsten Bögershausen wrote:
> On 2016-12-12 00:34, Beat Bolli wrote:
>> We need to track the new commits in uniset, otherwise their and our 
>> code
>> get out of sync.
>> 
>> Signed-off-by: Beat Bolli <dev+git@drbeat.li>
>> ---
>> 
>> Junio, these go on top of my bb/unicode-9.0 branch, please.
>> 
>> Thanks!
>> 
>>  update_unicode.sh | 5 +++++
>>  1 file changed, 5 insertions(+)
>> 
>> diff --git a/update_unicode.sh b/update_unicode.sh
>> index 4c1ec8d..9ca7d8b 100755
>> --- a/update_unicode.sh
>> +++ b/update_unicode.sh
>> @@ -14,6 +14,11 @@ fi &&
>>  		http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt &&
>>  	if ! test -d uniset; then
>>  		git clone https://github.com/depp/uniset.git
>> +	else
>> +	(
>> +		cd uniset &&
>> +		git pull
> If upstream has accepted your patches, that's nice.
> 
> Minor question, especially to the next commit:
> Should we make sure to checkout the exact version, which has been 
> tested?
> In this case  cb97792880625e24a9f581412d03659091a0e54f
> 
> And this is for both a fresh clone and the git pull
> needs to be replaced by
> git fetch && git checkout cb97792880625e24a9f581412d03659091a0e54f
> 
> 
> (Which of course is a shell variable)

I was actually wondering what the policy was for adding submodules to
the Git repo, but then decided against it. Another option would be to
fork uniset on GitHub and just let it stay on a working commit.

Junio, what's your stance on this?

Beat


* Re: [PATCH 1/3] update_unicode.sh: update the uniset repo if it exists
  2016-12-12  8:54   ` Beat Bolli
@ 2016-12-12 18:12     ` Torsten Bögershausen
  2016-12-12 18:33       ` Junio C Hamano
  2016-12-12 19:24       ` Beat Bolli
  0 siblings, 2 replies; 11+ messages in thread
From: Torsten Bögershausen @ 2016-12-12 18:12 UTC (permalink / raw)
  To: Beat Bolli, Torsten Bögershausen; +Cc: git


>> Minor question, especially to the next commit:
>> Should we make sure to checkout the exact version, which has been tested?
>> In this case  cb97792880625e24a9f581412d03659091a0e54f
>>
>> And this is for both a fresh clone and the git pull
>> needs to be replaced by
>> git fetch && git checkout cb97792880625e24a9f581412d03659091a0e54f
>>
>>
>> (Which of course is a shell variable)
> 
> I was actually wondering what the policy was for adding submodules to the Git repo,
> but then decided against it. Another option would be to fork uniset on GitHub and
> just let it stay on a working commit.
> 
> Junio, what's your stance on this?
> 
> Beat

If I run ./update_unicode.sh against the latest master of https://github.com/depp/uniset.git,
commit a5fac4a091857dd5429cc2d, I get a diff in unicode_width.h like this:

-{ 0x0300, 0x036F },

+{ 768, 879 },

IOW, all hex values are printed as decimal values.
That is not a problem for the compiler, but it is for a human
who needs to check the Unicode tables.

So I think we should "pin" the version of uniset.



* Re: [PATCH 1/3] update_unicode.sh: update the uniset repo if it exists
  2016-12-12 18:12     ` Torsten Bögershausen
@ 2016-12-12 18:33       ` Junio C Hamano
  2016-12-12 23:50         ` Beat Bolli
  2016-12-12 19:24       ` Beat Bolli
  1 sibling, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2016-12-12 18:33 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: Beat Bolli, git

Torsten Bögershausen <tboegi@web.de> writes:

> If I run ./update_unicode.sh on the latest master of
> https://github.com/depp/uniset.git , commit
> a5fac4a091857dd5429cc2d, I get a diff in unicode_width.h like
> this:
>
> -{ 0x0300, 0x036F },
>
> +{ 768, 879 },
>
> IOW, all hex values are printed as decimal values.
> Not a problem for the compiler, but for the human
> to check the unicode tables.
>
> So I think we should "pin" the version of uniset.

Sure, and I'd rather see the update-unicode.sh script moved
somewhere in contrib/ while at it.  Those who are interested in
keeping up with the unicode standard are a tiny minority of the
developer population, and most of us would treat the built width
table as the source (after all, that is what we ship).

To be bluntly honest, I'd rather not see "update-unicode.sh"
download and build uniset at all.  It's as if po/ hierarchy shipping
with its own script to download and build msgmerge--that's madness.
Needless to say, shipping the sources for uniset embedded in our
project tree (either as a snapshot-fork or as a submodule) is even
worse.  Those who want to muck with po/ are expected to have
msgmerge and friends.  Why not expect the same for those who want to
update the unicode width table?

I'd rather see a written instruction telling which snapshot to get
and from where to build and place on their $PATH in the README file,
sitting next to the update-unicode.sh script in contrib/uniwidth/
directory, for those who are interested in building the width table
"from the source", and the update-unicode.sh script to assume that
uniset is available.
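
A script that assumes uniset is already on $PATH could guard against its
absence along these lines. This is a rough sketch only; require_tool is a
hypothetical helper name, and the README it points to is the one proposed
above, not an existing file.

```shell
# Hypothetical helper: fail early if a required tool is not on $PATH.
require_tool() {
	if ! command -v "$1" >/dev/null 2>&1
	then
		# Report on stderr so that redirected output stays clean.
		echo >&2 "error: '$1' not found on \$PATH; see the README for how to build it"
		return 1
	fi
}
```

The width tables would then be generated with something like
require_tool uniset && uniset --32 eaw:F,W, instead of calling the
locally built uniset/uniset.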


* Re: [PATCH 1/3] update_unicode.sh: update the uniset repo if it exists
  2016-12-12 18:12     ` Torsten Bögershausen
  2016-12-12 18:33       ` Junio C Hamano
@ 2016-12-12 19:24       ` Beat Bolli
  1 sibling, 0 replies; 11+ messages in thread
From: Beat Bolli @ 2016-12-12 19:24 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: git

On 12.12.16 19:12, Torsten Bögershausen wrote:
> 
>>> Minor question, especially to the next commit:
>>> Should we make sure to checkout the exact version, which has been tested?
>>> In this case  cb97792880625e24a9f581412d03659091a0e54f
>>>
>>> And this is for both a fresh clone and the git pull
>>> needs to be replaced by
>>> git fetch && git checkout cb97792880625e24a9f581412d03659091a0e54f
>>>
>>>
>>> (Which of course is a shell variable)
>>
>> I was actually wondering what the policy was for adding submodules to the Git repo,
>> but then decided against it. Another option would be to fork uniset on GitHub and
>> just let it stay on a working commit.
>>
>> Junio, what's your stance on this?
>>
>> Beat
> 
> If I run  ./update_unicode.sh on the latest master of   https://github.com/depp/uniset.git ,
> commit  a5fac4a091857dd5429cc2d, I get a diff in  unicode_width.h like this:
> 
> -{ 0x0300, 0x036F },
> 
> +{ 768, 879 },
> 
> IOW, all hex values are printed as decimal values.
> Not a problem for the compiler, but for the human
> to check the unicode tables.
> 
> So I think we should "pin" the version of uniset.

That's what patch 3/3 fixes.


* Re: [PATCH 1/3] update_unicode.sh: update the uniset repo if it exists
  2016-12-12 18:33       ` Junio C Hamano
@ 2016-12-12 23:50         ` Beat Bolli
  2016-12-13  6:16           ` Torsten Bögershausen
  0 siblings, 1 reply; 11+ messages in thread
From: Beat Bolli @ 2016-12-12 23:50 UTC (permalink / raw)
  To: git; +Cc: Torsten Bögershausen

On 12.12.16 19:33, Junio C Hamano wrote:
> Torsten Bögershausen <tboegi@web.de> writes:
> 
>> If I run ./update_unicode.sh on the latest master of
>> https://github.com/depp/uniset.git , commit
>> a5fac4a091857dd5429cc2d, I get a diff in unicode_width.h like
>> this:
>>
>> -{ 0x0300, 0x036F },
>>
>> +{ 768, 879 },
>>
>> IOW, all hex values are printed as decimal values.
>> Not a problem for the compiler, but for the human
>> to check the unicode tables.
>>
>> So I think we should "pin" the version of uniset.
> 
> Sure, and I'd rather see the update-unicode.sh script moved
> somewhere in contrib/ while at it.  Those who are interested in
> keeping up with the unicode standard are tiny minority of the
> developer population, and most of us would treat the built width
> table as the source (after all, that is what we ship).
> 
> To be bluntly honest, I'd rather not to see "update-unicode.sh"
> download and build uniset at all.  It's as if po/ hierarchy shipping
> with its own script to download and build msgmerge--that's madness.
> Needless to say, shipping the sources for uniset embedded in our
> project tree (either as a snapshot-fork or as a submodule) is even
> worse.  Those who want to muck with po/ are expected to have
> msgmerge and friends.  Why not expect the same for those who want to
> update the unicode width table?
> 
> I'd rather see a written instruction telling which snapshot to get
> and from where to build and place on their $PATH in the README file,
> sitting next to the update-unicode.sh script in contrib/uniwidth/
> directory, for those who are interested in building the width table
> "from the source", and the update-unicode.sh script to assume that
> uniset is available.
> 

OK. So please don't merge bb/unicode-9.0 to next yet; I'll prepare a
reroll following your description.

Torsten, is this alright with you?

Cheers, Beat


* Re: [PATCH 1/3] update_unicode.sh: update the uniset repo if it exists
  2016-12-12 23:50         ` Beat Bolli
@ 2016-12-13  6:16           ` Torsten Bögershausen
  2016-12-13  6:42             ` Junio C Hamano
  0 siblings, 1 reply; 11+ messages in thread
From: Torsten Bögershausen @ 2016-12-13  6:16 UTC (permalink / raw)
  To: Beat Bolli, git


>> Sure, and I'd rather see the update-unicode.sh script moved
>> somewhere in contrib/ while at it.  Those who are interested in
>> keeping up with the unicode standard are tiny minority of the
>> developer population, and most of us would treat the built width
>> table as the source (after all, that is what we ship).
>>
>> To be bluntly honest, I'd rather not to see "update-unicode.sh"
>> download and build uniset at all.  It's as if po/ hierarchy shipping
>> with its own script to download and build msgmerge--that's madness.
>> Needless to say, shipping the sources for uniset embedded in our
>> project tree (either as a snapshot-fork or as a submodule) is even
>> worse.  Those who want to muck with po/ are expected to have
>> msgmerge and friends.  Why not expect the same for those who want to
>> update the unicode width table?
>>
>> I'd rather see a written instruction telling which snapshot to get
>> and from where to build and place on their $PATH in the README file,
>> sitting next to the update-unicode.sh script in contrib/uniwidth/
>> directory, for those who are interested in building the width table
>> "from the source", and the update-unicode.sh script to assume that
>> uniset is available.
OK with the contrib - that's an improvement.
About the instructions how to download and compile
(we don't need to change the $PATH, do we?):
I don't know.
The typical instructions I have seen are a sequence of shell commands
to be executed, which hopefully simply work by doing "copy-and-paste".
I find this error-prone, as you may lose the last character while
moving the mouse, or fail to check the error messages or return codes.
A pre-baked shell script that uses "&&" is more attractive in that way,
and the README can be as simple as: run "update-unicode.sh", and that's it.

uniset is a small project, and where should we put it?
a) inside the Git tree?
b) /tmp?
c) in the $HOME directory?
d) /usr/local?

a) is quick and dirty
b) is probably OK
c) not sure about that
d) needs superuser rights
Can we try to find a good place?

"contrib/uniwidth/" may be difficult to find; how about contrib/update-unicode?
  

> OK. So please don't merge bb/unicode-9.0 to next yet; I'll prepare a
> reroll following your description.
>
> Torsten, is this alright with you?

sure

> Cheers, Beat




* Re: [PATCH 1/3] update_unicode.sh: update the uniset repo if it exists
  2016-12-13  6:16           ` Torsten Bögershausen
@ 2016-12-13  6:42             ` Junio C Hamano
  0 siblings, 0 replies; 11+ messages in thread
From: Junio C Hamano @ 2016-12-13  6:42 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: Beat Bolli, git

Torsten Bögershausen <tboegi@web.de> writes:

> The typical instructions I have seen are a sequence of shell commands
> to be executed, which hopefully simply work by doing "copy-and-paste".
> I find this error-prone, as you may lose the last character while
> moving the mouse, or don't check the error message or return codes.
> Having a pre-baked shell script, which does use "&&" is in that way
> more attractive,
> and the README can be as simple as run "update-unicode.sh" and that's it.

That's OK as well.

> "contrib/uniwidth/" may be different to find, how about contrib/update-unicode ?

This, too.  And as long as .gitignore pattern is set up correctly
there, I do not think we terribly mind "git clone ..from..there.."
into it, either.

