All of lore.kernel.org
 help / color / mirror / Atom feed
* RE: [musl] Re: Regression: git no longer works with musl libc's regex impl
@ 2016-10-05 16:37 writeonce
  0 siblings, 0 replies; 16+ messages in thread
From: writeonce @ 2016-10-05 16:37 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: musl, git, Jeff King

Hi Johannes,

> 
> 
> -------- Original Message --------
> Subject: RE: [musl] Re: Regression: git no longer works with musl libc's
> regex impl
> From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
> Date: Wed, October 05, 2016 3:49 am
> To: writeonce@midipix.org
> Cc: musl@lists.openwall.com, git@vger.kernel.org, Jeff King
> <peff@peff.net>
> 
> Hi writeonce,
> 
> On Tue, 4 Oct 2016, writeonce@midipix.org wrote:
> 
> > < On Tue, 4 Oct 2016, Rich Felker wrote:
> > < 
> > < > On Tue, Oct 04, 2016 at 11:27:22AM -0400, Jeff King wrote:
> > < > > On Tue, Oct 04, 2016 at 11:08:48AM -0400, Rich Felker wrote:
> > < > > 
> > < > > > 1. is nonzero mod page size, it just works; the remainder of the
> > < > > > last page reads as zero bytes when mmapped.
> > < > > 
> > < > > Is that a portable assumption?
> > < > 
> > < > Yes.
> > < 
> > < No, it is not. You quote POSIX, but the matter of the fact is that we
> > < use a subset of POSIX in order to be able to keep things running on
> > < Windows.
> > 
> > As far as I can tell (and as the attached program may help demonstrate),
> > the above assumption has been valid on all versions of Windows since at
> > least Windows 2000.
> 
> And since W2K is already past its end of life, it would be safe for
> practical considerations.
> 
> However, I have to add two comments to that:
> 
> - it is *not* guaranteed. The behavior is undefined, even if you see
>  consistent behavior so far. Future Windows versions might break that
>  assumption freely, though.
> 

That is of course the official language, and generally speaking a good
rule of thumb. However... there is enough information to suggest that
when it comes to mapping of file-backed sections, the NT kernel
developers will choose to keep things the way they are. In brief, here's
why:

Given a "gray zone" that spans from EOF to end-of-page, there are in
essence three possible behaviors:

[1] bytes in the gray zone are accessible and are all zero's. This is
the current behavior.
[2] bytes in the gray zone are not accessible; trying to read past EOF
would result in a segfault.
[3] bytes in the gray zone are accessible but might contain random data
or junk.

Assessment:

[1] backward-compatible, POSIX-compliant, single code path for both
WIN32 and LXW.
[2] requires changing memory access granularity from 4096 bytes to a
single byte, and is therefore extremely costly.
[3] introduces a whole new class of security vulnerabilities, and will
thus be a lot of fun to watch:-)

All in all taken, then, I'd argue that relying on the current behavior
is very reasonable. If you, too, find the above assessment valid, and
since you mentioned that you were a Microsoft employee, it would be
great if you could make a good-faith effort to have the current behavior
added to the Driver Documentation and thus guaranteed.

PS. this isn't to say that the regex extension should or should not be
used, only that a decision on the matter should not be based on the
undocumentedness of current behavior.

midipix

> - some implementations of the REG_STARTEND feature have the nice property
>  that they can read past NUL characters. Granted, not all of them do
>  (AFAIU one example is FreeBSD itself, the first platform to sport
>  REG_STARTEND), but we at least reap the benefit whenever using a regex
>  that *can* read past NUL characters.
> 
> > In this context, one thing to remember is that the page-size for the mod
> > operation is 4096, whereas the POSIX page-size (for the purpose of mmap
> > and mremap) is 65536.
> 
> Indeed. A colleague of mine spotted the segfault when diffing a file that
> was *exactly* 4,096 bytes.
> 
> > Note also that in the case of file-backed mapped sections, using
> > kernel32.dll or msvcrt.dll or cygwin/newlib or midipix/musl is of little
> > significance, specifically since all invoke ZwCreateSection and
> > ZwMapViewOfSection under the hood.
> 
> Right. It's all backed by the very same kernel functions.
> 
> Ciao,
> Johannes


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [musl] Re: Regression: git no longer works with musl libc's regex impl
  2016-10-05 11:59           ` James B
  2016-10-05 16:11             ` Jeff King
@ 2016-10-06 10:44             ` Johannes Schindelin
  1 sibling, 0 replies; 16+ messages in thread
From: Johannes Schindelin @ 2016-10-06 10:44 UTC (permalink / raw)
  To: James B; +Cc: musl, Rich Felker, Jeff King, git

Hi James,

On Wed, 5 Oct 2016, James B wrote:

> On Wed, 5 Oct 2016 12:41:50 +0200 (CEST)
> Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> 
> It's a very sad day for a tool that was developed originally to maintain
> Linux kernel, by the Linux kernel author, now is restricted to avoid
> use/optimise on Linux/POSIX features *because* it has to run on another
> Windows ...

Please note that this thread started because Git tried to shed the
restrictions of POSIX by using REG_STARTEND. In other words, we tried very
much *not* to be restricted.

And pragmatically, I must add that the REG_STARTEND feature is a very,
very useful one.

If you want to turn your sadness into something productive (you will allow
me that little prod after all you said in your mail), why don't you
dig back through the Git mailing list, fish out my original patch that
fell back to the "malloc();memcpy();/*add NUL*/" strategy, and contribute
that as a patch to the Git project, making a case that musl requires it?

Ciao,
Johannes

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [musl] Re: Regression: git no longer works with musl libc's regex impl
  2016-10-05 16:11             ` Jeff King
@ 2016-10-05 16:27               ` Rich Felker
  0 siblings, 0 replies; 16+ messages in thread
From: Rich Felker @ 2016-10-05 16:27 UTC (permalink / raw)
  To: Jeff King; +Cc: James B, Johannes Schindelin, musl, git

On Wed, Oct 05, 2016 at 12:11:58PM -0400, Jeff King wrote:
> On Wed, Oct 05, 2016 at 10:59:34PM +1100, James B wrote:
> 
> > Number downloads does not make first-tier platform. You know that as
> > well as everyone else.
> > 
> > First-tier support is the decision made by the maintainers that the
> > entire features of the software must be available on those first tier
> > platforms. So if Windows is indeed first-tier platform for git, it
> > means any features that don't work on git version of Windows must not
> > be used/developed or even castrated. That's a scary thought.
> 
> Prepare to be scared, then, I guess. Ever since the msysgit project
> started years ago, we have made concessions in the code to work both
> with POSIX-ish systems and with the msys layer. E.g., see how git-daemon
> does not fork(), but actually re-spawns itself to handle connections.
> 
> When possible we try to put our abstractions at a level where they can
> be implemented in a performant way on all platforms (the git-daemon
> things is probably the _most_ ugly in that respect; I think nobody has
> really cared about the performance enough to add back in a forking code
> path for POSIX systems).
> 
> > So this decision that "Windows is now a first-tier platform for git" -
> > is your own opinion, or is this the collective opinion of *all* the
> > git maintainers?
> 
> There is only one maintainer of git: Junio. However, you'll note that I
> also used "we" in the paragraphs above. And that is because the approach
> I am talking about is something that has been done over the course of
> many years by many members of the development community.
> 
> You may disagree with that approach, but it is nothing new. The msysgit
> project started in 2007.

The goal of the midipix project is to make the need for FOSS projects
supporting Windows to do hacks like this obsolete. It still has a
little ways to go to be ready for mainstream use, but it's already
running a lot, and I hope you'll consider it for the future since it
simplifies things A LOT when you can just write to POSIX instead of
having to come up with abstraction layers that cater to Windows'
brokenness.

> > Well thank you for being honest. I can see now why you responded the
> > way you did (and still do). By being employed by Microsoft, and
> > especially paid to work on Git for Windows, you have all the
> > incentives to make it work best on Windows, and to make it as its
> > first-tier platform within the limitation of Windows.
> 
> Please don't insinuate that Johannes is a Microsoft shill. He has been
> working on the Windows port of Git for over 9 years, and was only
> employed by Microsoft this year. Furthermore, his original REG_STARTEND
> patch actually did a run-time fallback of NUL-terminating the input
> buffers. It was _I_ who suggested that we should simply push people
> towards our compat/regex routines instead. So if you want to be mad at
> somebody, be mad at me.

I hope we can get this thread away from accusing and attacking people
and on to doing productive things to make the software better.

Rich

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [musl] Re: Regression: git no longer works with musl libc's regex impl
  2016-10-05 13:11           ` Jakub Narębski
@ 2016-10-05 16:15             ` Rich Felker
  0 siblings, 0 replies; 16+ messages in thread
From: Rich Felker @ 2016-10-05 16:15 UTC (permalink / raw)
  To: Jakub Narębski; +Cc: musl, James B, Johannes Schindelin, Jeff King, git

On Wed, Oct 05, 2016 at 03:11:05PM +0200, Jakub Narębski wrote:
> W dniu 05.10.2016 o 00:33, Rich Felker pisze:
> > On Wed, Oct 05, 2016 at 09:06:25AM +1100, James B wrote:
> >> On Tue, 4 Oct 2016 18:08:33 +0200 (CEST)
> >> Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> >>>
> >>> No, it is not. You quote POSIX, but the matter of the fact is that we use
> >>> a subset of POSIX in order to be able to keep things running on Windows.
> >>>
> >>> And quite honestly, there are lots of reasons to keep things running on
> >>> Windows, and even to favor Windows support over musl support. Over four
> >>> million reasons: the Git for Windows users.
> >>
> >> Wow, I don't know that Windows is a git's first-tier platform now,
> >> and Linux/POSIX second. Are we talking about the same git that was
> >> originally written in Linus Torvalds, and is used to manage Linux
> >> kernel? Are you by any chance employed by Redmond, directly or
> >> indirectly?
> >>
> >> Sorry - can't help it.
> 
> Windows is one of the major platforms, yes.  I think there much, much
> more people using Git on Windows, than using Git with musl.  More
> users = more important.
> 
> Also, working with some inconvenience (requiring compilation with
> NO_REGEX=1) is better than not working at all.
> 
> In CodingGuidelines we say:
> 
>  - Most importantly, we never say "It's in POSIX; we'll happily
>    ignore your needs should your system not conform to it."
>    We live in the real world.
> 
>  - However, we often say "Let's stay away from that construct,
>    it's not even in POSIX".

I agree wholeheartedly with these points.

> 
>  - In spite of the above two rules, we sometimes say "Although
>    this is not in POSIX, it (is so convenient | makes the code
>    much more readable | has other good characteristics) and
>    practically all the platforms we care about support it, so
>    let's use it".
> 
> The REG_STARTEND is 3rd point,

To begin with I wasn't clear that REG_STARDEND being nonstandard was
even noticed or compatibility considered when adding the dependency on
it, but it seems such discussion did take place and most targets have
it. Perhaps this means it should be proposed for standardization in
the next issue of POSIX.

> mmap shenningans looks like 1st...
> 
> ....on the other hand midipix <writeonce@midipix.org> wrote in
> http://public-inbox.org/git/20161004200057.dc30d64f61e5ec441c34ffd4f788e58e.efa66ead67.wbe@email15.godaddy.com/
> that the proposed fix should work on all Windows version we are
> interested in (I think).  Test program included / attached.
> 
> The above-mentioned email also explains that the problem was
> caught on MS Windows; it triggers if file end falls on the mmapped
> page boundary, which is more likely to happen with 4096 mod size
> on Windows rather than 65536 mod size on Linux.

On Linux page-size (mmap granularity) varies by arch but it's 4k on
basically all archs that people care about. I think midipix's author
was talking about real page size on Windows (4k) vs the minimum
logical page size (mmap granularity) that can be used to get
POSIX-matching semantics in midipix (which is 64k due to some
technical reasons I forget, which he could probably remind me of).

> On the other hand, while the proposed solution of "add padding as
> to not end at page boundary, if necessary" doesn't have the
> performance impact of "memcpy into NUL-terminated buffer" that
> was originally proposed in patch series, it is still extra code
> to maintain.

*nod*

Rich

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [musl] Re: Regression: git no longer works with musl libc's regex impl
  2016-10-05 11:59           ` James B
@ 2016-10-05 16:11             ` Jeff King
  2016-10-05 16:27               ` Rich Felker
  2016-10-06 10:44             ` Johannes Schindelin
  1 sibling, 1 reply; 16+ messages in thread
From: Jeff King @ 2016-10-05 16:11 UTC (permalink / raw)
  To: James B; +Cc: Johannes Schindelin, musl, Rich Felker, git

On Wed, Oct 05, 2016 at 10:59:34PM +1100, James B wrote:

> Number downloads does not make first-tier platform. You know that as
> well as everyone else.
> 
> First-tier support is the decision made by the maintainers that the
> entire features of the software must be available on those first tier
> platforms. So if Windows is indeed first-tier platform for git, it
> means any features that don't work on git version of Windows must not
> be used/developed or even castrated. That's a scary thought.

Prepare to be scared, then, I guess. Ever since the msysgit project
started years ago, we have made concessions in the code to work both
with POSIX-ish systems and with the msys layer. E.g., see how git-daemon
does not fork(), but actually re-spawns itself to handle connections.

When possible we try to put our abstractions at a level where they can
be implemented in a performant way on all platforms (the git-daemon
things is probably the _most_ ugly in that respect; I think nobody has
really cared about the performance enough to add back in a forking code
path for POSIX systems).

> So this decision that "Windows is now a first-tier platform for git" -
> is your own opinion, or is this the collective opinion of *all* the
> git maintainers?

There is only one maintainer of git: Junio. However, you'll note that I
also used "we" in the paragraphs above. And that is because the approach
I am talking about is something that has been done over the course of
many years by many members of the development community.

You may disagree with that approach, but it is nothing new. The msysgit
project started in 2007.

> Well thank you for being honest. I can see now why you responded the
> way you did (and still do). By being employed by Microsoft, and
> especially paid to work on Git for Windows, you have all the
> incentives to make it work best on Windows, and to make it as its
> first-tier platform within the limitation of Windows.

Please don't insinuate that Johannes is a Microsoft shill. He has been
working on the Windows port of Git for over 9 years, and was only
employed by Microsoft this year. Furthermore, his original REG_STARTEND
patch actually did a run-time fallback of NUL-terminating the input
buffers. It was _I_ who suggested that we should simply push people
towards our compat/regex routines instead. So if you want to be mad at
somebody, be mad at me.

-Peff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [musl] Re: Regression: git no longer works with musl libc's regex impl
  2016-10-05 11:17         ` Johannes Schindelin
  2016-10-05 13:01           ` Szabolcs Nagy
@ 2016-10-05 13:15           ` Rich Felker
  1 sibling, 0 replies; 16+ messages in thread
From: Rich Felker @ 2016-10-05 13:15 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Jeff King, git, musl

On Wed, Oct 05, 2016 at 01:17:49PM +0200, Johannes Schindelin wrote:
> Hi Rich,
> 
> On Tue, 4 Oct 2016, Rich Felker wrote:
> 
> > On Tue, Oct 04, 2016 at 06:08:33PM +0200, Johannes Schindelin wrote:
> >
> > > And lastly, the best alternative would be to teach musl about
> > > REG_STARTEND, as it is rather useful a feature.
> > 
> > Maybe, but it seems fundamentally costly to support -- it's extra
> > state in the inner loops that imposes costly spill/reload on archs
> > with too few registers (x86).
> 
> It is true that it could cause that.
> 
> I had a brief look at the source code (you use backtracking...

Where did you get that idea? Backtracking is the most utterly
incompetent way to implement regex -- it throws away the whole
property that makes regex useful, being regular. Unfortunately, POSIX
BRE is not regular, as it contains backreferences, so any
implementation of regcomp/regexec requires at least a minimal
backtracking code path for BREs that contain backreferences.

> hopefully
> nobody uses musl to parse regular expressions from untrusted, or

On the contrary, musl's is the only system reccomp/regexec I'm aware
of that actually attempts to be safe with untrusted input -- when
using REG_EXTENDED (ERE). Other implementations provide backreferences
in ERE as an extension, making ERE unsafe just like BRE. musl
intentionally disallows them as a feature.

At least until recently, glibc also crashed on malloc failures in
regcomp, making it unsafe on untrusted input for that reason too.

Rich

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [musl] Re: Regression: git no longer works with musl libc's regex impl
  2016-10-05 11:17         ` Johannes Schindelin
@ 2016-10-05 13:01           ` Szabolcs Nagy
  2016-10-05 13:15           ` Rich Felker
  1 sibling, 0 replies; 16+ messages in thread
From: Szabolcs Nagy @ 2016-10-05 13:01 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Rich Felker, Jeff King, git, musl

* Johannes Schindelin <Johannes.Schindelin@gmx.de> [2016-10-05 13:17:49 +0200]:
> I had a brief look at the source code (you use backtracking... hopefully
> nobody uses musl to parse regular expressions from untrusted, or
> inexperienced, sources [*1*]), and it seems that the regex code might

does git use BRE?

a conforming BRE implementation has to use back tracking
if the pattern has back references.

usually ERE implementations may also use back tracking
since they support back references as an extension.

musl does not support this extension (and many others) so
it never uses back tracking for ERE matches, note however
that match complexity and memory usage of a conforming
ERE implementation is still exponential in pattern
length because of repetition counts.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [musl] Re: Regression: git no longer works with musl libc's regex impl
  2016-10-05 10:41         ` Johannes Schindelin
@ 2016-10-05 11:59           ` James B
  2016-10-05 16:11             ` Jeff King
  2016-10-06 10:44             ` Johannes Schindelin
  0 siblings, 2 replies; 16+ messages in thread
From: James B @ 2016-10-05 11:59 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: musl, Rich Felker, Jeff King, git

On Wed, 5 Oct 2016 12:41:50 +0200 (CEST)
Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:

> > 
> > Wow, I don't know that Windows is a git's first-tier platform now,
> 
> It is. Git for Windows is maintained by me, and I make as certain as I can
> that it works fine. 
> And yes, we have download numbers to support my claim.
> The latest release is less than 24h old, but I can point you to Git for
> Windows 2.8.1 whose 32-bit installer was downloaded 397,273 times, and
> whose 64-bit installer was downloaded 3,780,079 times.

Number downloads does not make first-tier platform. You know that as well as everyone else.

First-tier support is the decision made by the maintainers that the entire features of the software must be available on those first tier platforms. So if Windows is indeed first-tier platform for git, it means any features that don't work on git version of Windows must not be used/developed or even castrated. That's a scary thought.

Being a maintainer for "Git for Windows" does not make one automatically as the maintainer for "git", although that can happen.

So this decision that "Windows is now a first-tier platform for git" - is your own opinion, or is this the collective opinion of *all* the git maintainers? 


> 
> > and Linux/POSIX second.
> 
> This is not at all what I said, so please be careful of what you accuse
> me.

Yes, you did not say that. I said that. And I will say more. Git has Linux/POSIX roots. Attempting to "not use common POSIX features because they're not available on Windows" *does* make Linux/POSIX feels like second class platform for git. The way I see it, it should be *the other way around*.

It's a very sad day for a tool that was developed originally to maintain Linux kernel, by the Linux kernel author, now is restricted to avoid use/optimise on Linux/POSIX features *because* it has to run on another Windows ...

> 
> What I said is that we never exploited the full POSIX standard, but that
> we made certain to use a subset of POSIX in Git which would be relatively
> easy to emulate using Windows' API.

All this just proves my point above.

And - I notice you use the pronoun "we" - is that a "royal we" (which means the entire point is your own or your cohorts position), or is it the official position of all git maintainers?

> 
> > Are we talking about the same git that was originally written in Linus
> > Torvalds, and is used to manage Linux kernel?
> 
> It was originally written by (not in) Linus Torvalds, and yes, the Linux
> kernel is one of its many users.

That was a rhetorical question.

> 
> > Are you by any chance employed by Redmond, directly or indirectly?
> 
> I am not exactly employed by Redmond, but by Microsoft (this is what you
> meant, I guess).
> 
> I maintained Git for Windows in my spare time, next to a very demanding
> position in science, for eight years. In 2015, I joined Microsoft and part
> of my role is to maintain Git for Windows, allowing me to do a much better
> job at it.


Well thank you for being honest. I can see now why you responded the way you did (and still do). By being employed by Microsoft, and especially paid to work on Git for Windows, you have all the incentives to make it work best on Windows, and to make it as its first-tier platform within the limitation of Windows.

That in itself is not a problem - it only starts to become a problem when you try to cut down support for other platforms or stifle improvements on other platforms because "hey it makes it too hard to do those things in Windows".

> 
> Of course, I do not only improve Git's Windows support, but contribute
> other patches, too. You might also appreciate the fact that some of my
> colleagues started contributing patches to Git that benefit all Git users.
> 

By "colleagues" I assume other Microsoft employees? 
I don't have a problem with that - thank you and your colleagues for making git better.

But that does not give the right to you to control that "if it doesn't work on Windows we shouldn't do it". If you do, and at the same time claim that musl-libc (which is Linux-only) support is unimportant compared to your 4 million Git-for-Windows downloads really don't do well for your or your employer's image. I don't have to say why - everyone outside Microsoft knows why.

In conclusion, I certainly hope that your view is not shared by the other git maintainers.

PS: Rich, sorry for the distraction. I have said what I want to say, so I'll bow out from this thread.

cheers,
James

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [musl] Re: Regression: git no longer works with musl libc's regex impl
  2016-10-04 17:39       ` [musl] " Rich Felker
@ 2016-10-05 11:17         ` Johannes Schindelin
  2016-10-05 13:01           ` Szabolcs Nagy
  2016-10-05 13:15           ` Rich Felker
  0 siblings, 2 replies; 16+ messages in thread
From: Johannes Schindelin @ 2016-10-05 11:17 UTC (permalink / raw)
  To: Rich Felker; +Cc: Jeff King, git, musl

Hi Rich,

On Tue, 4 Oct 2016, Rich Felker wrote:

> On Tue, Oct 04, 2016 at 06:08:33PM +0200, Johannes Schindelin wrote:
>
> > And lastly, the best alternative would be to teach musl about
> > REG_STARTEND, as it is rather useful a feature.
> 
> Maybe, but it seems fundamentally costly to support -- it's extra
> state in the inner loops that imposes costly spill/reload on archs
> with too few registers (x86).

It is true that it could cause that.

I had a brief look at the source code (you use backtracking... hopefully
nobody uses musl to parse regular expressions from untrusted, or
inexperienced, sources [*1*]), and it seems that the regex code might
spill unnecessarily already (I see, for example, that the reg_notbol,
reg_noteol and reg_newline flags all use up complete int registers, not
merely bits of a single one).

It seems, specifically, that the *match_end_ofs parameter of the two
regexec backends is always set to point to eo, which is so far not
initialized. You could initialize it to -1 and set it to pmatch[0].rm_eo
if the REG_STARTEND flag is set. The GET_NEXT_WCHAR() macro would then
need to test something like

	if (str_byte >= string + *match_end_ofs) {
		ret = REG_NOMATCH; goto error_exit;
	}

This does not handle non-zero pmatch[0].rm_so, though. I would probably
try to pass another input parameter for that, but I have not verified yet
that a "^" would be handled properly (if pmatch[0].rm_so > 0 and
REG_STARTEND is set, "^" should *not* match).

> I'll look at doing this when we overhaul/replace the regex
> implementation, and I'm happy to do some performance-regression tests
> for adding it now if someone has a simple patch (as was mentioned on the
> musl list).

I'd be interested to be kept in the loop, if you do not mind Cc:ing me.

Ciao,
Johannes

Footnote *1*:
http://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [musl] Re: Regression: git no longer works with musl libc's regex impl
  2016-10-05  3:00 writeonce
@ 2016-10-05 10:49 ` Johannes Schindelin
  0 siblings, 0 replies; 16+ messages in thread
From: Johannes Schindelin @ 2016-10-05 10:49 UTC (permalink / raw)
  To: writeonce; +Cc: musl, git, Jeff King

Hi writeonce,

On Tue, 4 Oct 2016, writeonce@midipix.org wrote:

> < On Tue, 4 Oct 2016, Rich Felker wrote:
> < 
> < > On Tue, Oct 04, 2016 at 11:27:22AM -0400, Jeff King wrote:
> < > > On Tue, Oct 04, 2016 at 11:08:48AM -0400, Rich Felker wrote:
> < > > 
> < > > > 1. is nonzero mod page size, it just works; the remainder of the
> < > > > last page reads as zero bytes when mmapped.
> < > > 
> < > > Is that a portable assumption?
> < > 
> < > Yes.
> < 
> < No, it is not. You quote POSIX, but the matter of the fact is that we
> < use a subset of POSIX in order to be able to keep things running on
> < Windows.
> 
> As far as I can tell (and as the attached program may help demonstrate),
> the above assumption has been valid on all versions of Windows since at
> least Windows 2000.

And since W2K is already past its end of life, it would be safe for
practical considerations.

However, I have to add two comments to that:

- it is *not* guaranteed. The behavior is undefined, even if you see
  consistent behavior so far. Future Windows versions might break that
  assumption freely, though.

- some implementations of the REG_STARTEND feature have the nice property
  that they can read past NUL characters. Granted, not all of them do
  (AFAIU one example is FreeBSD itself, the first platform to sport
  REG_STARTEND), but we at least reap the benefit whenever using a regex
  that *can* read past NUL characters.

> In this context, one thing to remember is that the page-size for the mod
> operation is 4096, whereas the POSIX page-size (for the purpose of mmap
> and mremap) is 65536.

Indeed. A colleague of mine spotted the segfault when diffing a file that
was *exactly* 4,096 bytes.

> Note also that in the case of file-backed mapped sections, using
> kernel32.dll or msvcrt.dll or cygwin/newlib or midipix/musl is of little
> significance, specifically since all invoke ZwCreateSection and
> ZwMapViewOfSection under the hood.

Right. It's all backed by the very same kernel functions.

Ciao,
Johannes

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [musl] Re: Regression: git no longer works with musl libc's regex impl
  2016-10-04 22:06       ` James B
  2016-10-04 22:33         ` Rich Felker
@ 2016-10-05 10:41         ` Johannes Schindelin
  2016-10-05 11:59           ` James B
  1 sibling, 1 reply; 16+ messages in thread
From: Johannes Schindelin @ 2016-10-05 10:41 UTC (permalink / raw)
  To: James B; +Cc: musl, Rich Felker, Jeff King, git

Hi James,

On Wed, 5 Oct 2016, James B wrote:

> On Tue, 4 Oct 2016 18:08:33 +0200 (CEST)
> Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> 
> > No, it is not. You quote POSIX, but the matter of the fact is that we
> > use a subset of POSIX in order to be able to keep things running on
> > Windows.
> > 
> > And quite honestly, there are lots of reasons to keep things running
> > on Windows, and even to favor Windows support over musl support. Over
> > four million reasons: the Git for Windows users.
> 
> Wow, I don't know that Windows is a git's first-tier platform now,

It is. Git for Windows is maintained by me, and I make as certain as I can
that it works fine. And yes, we have download numbers to support my claim.
The latest release is less than 24h old, but I can point you to Git for
Windows 2.8.1 whose 32-bit installer was downloaded 397,273 times, and
whose 64-bit installer was downloaded 3,780,079 times.

> and Linux/POSIX second.

This is not at all what I said, so please be careful of what you accuse
me.

What I said is that we never exploited the full POSIX standard, but that
we made certain to use a subset of POSIX in Git which would be relatively
easy to emulate using Windows' API.

> Are we talking about the same git that was originally written in Linus
> Torvalds, and is used to manage Linux kernel?

It was originally written by (not in) Linus Torvalds, and yes, the Linux
kernel is one of its many users.

Git is not used only for the Linux kernel, though, and I am certain that
Linus agrees that it should not cater only to the Linux folks. Git is used
very widely in OSS as well as in the industry. So we, the Git developers,
kind of have an obligation to make things work in a much broader
perspective than you suggested.

> Are you by any chance employed by Redmond, directly or indirectly?

I am not exactly employed by Redmond, but by Microsoft (this is what you
meant, I guess).

I maintained Git for Windows in my spare time, next to a very demanding
position in science, for eight years. In 2015, I joined Microsoft and part
of my role is to maintain Git for Windows, allowing me to do a much better
job at it.

Of course, I do not only improve Git's Windows support, but contribute
other patches, too. You might also appreciate the fact that some of my
colleagues started contributing patches to Git that benefit all Git users.

> Sorry - can't help it.

I do not know why you are sorry, and I do not believe that you have to be.

Ciao,
Johannes

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [musl] Re: Regression: git no longer works with musl libc's regex impl
@ 2016-10-05  3:00 writeonce
  2016-10-05 10:49 ` Johannes Schindelin
  0 siblings, 1 reply; 16+ messages in thread
From: writeonce @ 2016-10-05  3:00 UTC (permalink / raw)
  To: musl; +Cc: Johannes.Schindelin, git, Jeff King

[-- Attachment #1: Type: text/plain, Size: 2834 bytes --]


< 
< 
< -------- Original Message --------
< Subject: [musl] Re: Regression: git no longer works with musl libc's
< regex impl
< From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
< Date: Tue, October 04, 2016 9:08 am
< To: Rich Felker <dalias@libc.org>
< Cc: Jeff King <peff@peff.net>, git@vger.kernel.org,
< musl@lists.openwall.com
< 
< Hi Rich,
< 
< On Tue, 4 Oct 2016, Rich Felker wrote:
< 
< > On Tue, Oct 04, 2016 at 11:27:22AM -0400, Jeff King wrote:
< > > On Tue, Oct 04, 2016 at 11:08:48AM -0400, Rich Felker wrote:
< > > 
< > > > 1. is nonzero mod page size, it just works; the remainder of the
last
< > > > page reads as zero bytes when mmapped.
< > > 
< > > Is that a portable assumption?
< > 
< > Yes.
< 
< No, it is not. You quote POSIX, but the matter of the fact is that we
use
< a subset of POSIX in order to be able to keep things running on
Windows.
< 

As far as I can tell (and as the attached program may help demonstrate),
the above assumption has been valid on all versions of Windows since at
least Windows 2000. In this context, one thing to remember is that the
page-size for the mod operation is 4096, whereas the POSIX page-size
(for the purpose of mmap and mremap) is 65536. Note also that in the
case of file-backed mapped sections, using kernel32.dll or msvcrt.dll or
cygwin/newlib or midipix/musl is of little significance, specifically
since all invoke ZwCreateSection and ZwMapViewOfSection under the hood.

HTH,
midipix

< And quite honestly, there are lots of reasons to keep things running
on
< Windows, and even to favor Windows support over musl support. Over
four
< million reasons: the Git for Windows users.
< 
< So rather than getting into an ideological discussion about "broken"
< systems, it would be good to keep things practical, realizing that
those
< users make up a very real chunk of all of Git's users.
< 
< As to making NO_REGEX conditional on REG_STARTEND: you are talking
about
< apples and oranges here. NO_REGEX is a Makefile flag, while
REG_STARTEND
< is a C preprocessor macro.
< 
< Unless you can convince the rest of the Git developers (you would not
< convince me) to simulate autoconf by compiling an executable every
time
< `make` is run, to determine whether REG_STARTEND is defined, this is a
< no-go.
< 
< However, you *can* use autoconf directly, and come up with a patch to
our
< configure.ac that detects the absence of REG_STARTEND and sets
NO_REGEX=1.
< 
< Alternatively, you can set NO_REGEX=1 in your config.mak.
< 
< Or, if you use one of the auto-detected cases in config.mak.uname, you
< could patch it to set NO_REGEX=1.
< 
< And lastly, the best alternative would be to teach musl about
< REG_STARTEND, as it is rather useful a feature.
< 
< Ciao,
< Johannes
< 


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: mmap_file.c --]
[-- Type: text/x-c; name="mmap_file.c";, Size: 1487 bytes --]

#define _XOPEN_SOURCE            500
#define NT_FILE_BACKED_PAGE_SIZE 4096

#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <errno.h>
#include <sys/mman.h>

static int test_file_backed_mmap(int size)
{
	int     fd;
	int     exsize;
	void *  addr;
	char *  test;
	char *  cap;
	char    name[12];

	/* create test file of desired size */
	sprintf(name,"%d.tmp",size);

	if ((fd = creat(name,0755)) < 0)
		return -1;

	if (ftruncate(fd,size))
		return -1;

	close(fd);

	/* write 'W' to all bytes of test file */
	if ((fd = open(name,O_RDWR,0)) < 0)
		return -1;

	if ((addr = mmap(0,size,PROT_WRITE,MAP_SHARED,fd,0)) == MAP_FAILED)
		return -1;

	close(fd);

	test = (char *)addr;
	cap  = (char *)addr + size;

	for (; test<cap; test++)
		*test = 'W';

	munmap(addr,size);

	/* map file for read access */
	if ((fd = open(name,O_RDONLY,0)) < 0)
		return -1;

	if ((addr = mmap(0,size,PROT_READ,MAP_PRIVATE,fd,0)) == MAP_FAILED)
		return -1;

	close(fd);

	/* test */
	exsize = size + NT_FILE_BACKED_PAGE_SIZE;
	exsize &= ~(NT_FILE_BACKED_PAGE_SIZE - 1);

	test = (char *)addr + size;
	cap  = (char *)addr + exsize;

	for (; test<cap; test++)
		if (*test)
			return -1;

	/* clean-up */
	munmap(addr,size);
	unlink(name);

	return 0;

}

int main(void)
{
	int i;

	for (i=1; i<65536; i++)
		if (i % NT_FILE_BACKED_PAGE_SIZE)
			if (test_file_backed_mmap(i))
				return 2;

	return 0;
}

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [musl] Re: Regression: git no longer works with musl libc's regex impl
  2016-10-04 22:33         ` Rich Felker
@ 2016-10-04 22:48           ` Junio C Hamano
  2016-10-05 13:11           ` Jakub Narębski
  1 sibling, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2016-10-04 22:48 UTC (permalink / raw)
  To: Rich Felker; +Cc: James B, musl, Johannes Schindelin, Jeff King, git

Rich Felker <dalias@libc.org> writes:

> This is especially unfriendly when the semantics of the switch come
> across, at least to some users, as "your system regex is incomplete"
> rather than "git can't use it because git depends on nonstandard
> extensions".

The latter is exactly what Makefile patch that brought this change
says, I think.

    # Define NO_REGEX if your C library lacks regex support with REG_STARTEND
    # feature.

Before the series updated the message to the above, we used to say:

    # Define NO_REGEX if you have no or inferior regex support in your C library.

which _was_ unfair to those who needed to set NO_REGEX for whatever
reason.  It was totally unclear "inferior" relative to what standard
the message was passing its harsh judgement on your C library.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [musl] Re: Regression: git no longer works with musl libc's regex impl
  2016-10-04 22:06       ` James B
@ 2016-10-04 22:33         ` Rich Felker
  2016-10-04 22:48           ` Junio C Hamano
  2016-10-05 13:11           ` Jakub Narębski
  2016-10-05 10:41         ` Johannes Schindelin
  1 sibling, 2 replies; 16+ messages in thread
From: Rich Felker @ 2016-10-04 22:33 UTC (permalink / raw)
  To: James B; +Cc: musl, Johannes Schindelin, Jeff King, git

On Wed, Oct 05, 2016 at 09:06:25AM +1100, James B wrote:
> On Tue, 4 Oct 2016 18:08:33 +0200 (CEST)
> Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> 
> > 
> > No, it is not. You quote POSIX, but the matter of the fact is that we use
> > a subset of POSIX in order to be able to keep things running on Windows.
> > 
> > And quite honestly, there are lots of reasons to keep things running on
> > Windows, and even to favor Windows support over musl support. Over four
> > million reasons: the Git for Windows users.
> > 
> 
> Wow, I don't know that Windows is a git's first-tier platform now,
> and Linux/POSIX second. Are we talking about the same git that was
> originally written in Linus Torvalds, and is used to manage Linux
> kernel? Are you by any chance employed by Redmond, directly or
> indirectly?
> 
> Sorry - can't help it.

I don't think the hostility and sarcasm are really needed here. But
what this does speak to is that users don't like feeling like their
platform is being treated as a second-class target, which is what it
feels like when you have to manually flip a switch to make git build.
This is especially unfriendly when the semantics of the switch come
across, at least to some users, as "your system regex is incomplete"
rather than "git can't use it because git depends on nonstandard
extensions".

Rich

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [musl] Re: Regression: git no longer works with musl libc's regex impl
  2016-10-04 16:08     ` Johannes Schindelin
  2016-10-04 17:39       ` [musl] " Rich Felker
@ 2016-10-04 22:06       ` James B
  2016-10-04 22:33         ` Rich Felker
  2016-10-05 10:41         ` Johannes Schindelin
  1 sibling, 2 replies; 16+ messages in thread
From: James B @ 2016-10-04 22:06 UTC (permalink / raw)
  To: musl; +Cc: Johannes Schindelin, Rich Felker, Jeff King, git

On Tue, 4 Oct 2016 18:08:33 +0200 (CEST)
Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:

> 
> No, it is not. You quote POSIX, but the matter of the fact is that we use
> a subset of POSIX in order to be able to keep things running on Windows.
> 
> And quite honestly, there are lots of reasons to keep things running on
> Windows, and even to favor Windows support over musl support. Over four
> million reasons: the Git for Windows users.
> 

Wow, I don't know that Windows is a git's first-tier platform now, and Linux/POSIX second. Are we talking about the same git that was originally written in Linus Torvalds, and is used to manage Linux kernel? Are you by any chance employed by Redmond, directly or indirectly?

Sorry - can't help it.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [musl] Re: Regression: git no longer works with musl libc's regex impl
  2016-10-04 16:08     ` Johannes Schindelin
@ 2016-10-04 17:39       ` Rich Felker
  2016-10-05 11:17         ` Johannes Schindelin
  2016-10-04 22:06       ` James B
  1 sibling, 1 reply; 16+ messages in thread
From: Rich Felker @ 2016-10-04 17:39 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Jeff King, git, musl

On Tue, Oct 04, 2016 at 06:08:33PM +0200, Johannes Schindelin wrote:
> Hi Rich,
> 
> On Tue, 4 Oct 2016, Rich Felker wrote:
> 
> > On Tue, Oct 04, 2016 at 11:27:22AM -0400, Jeff King wrote:
> > > On Tue, Oct 04, 2016 at 11:08:48AM -0400, Rich Felker wrote:
> > > 
> > > > 1. is nonzero mod page size, it just works; the remainder of the last
> > > >    page reads as zero bytes when mmapped.
> > > 
> > > Is that a portable assumption?
> > 
> > Yes.
> 
> No, it is not. You quote POSIX, but the matter of the fact is that we use
> a subset of POSIX in order to be able to keep things running on Windows.
> 
> And quite honestly, there are lots of reasons to keep things running on
> Windows, and even to favor Windows support over musl support. Over four
> million reasons: the Git for Windows users.
> 
> So rather than getting into an ideological discussion about "broken"
> systems, it would be good to keep things practical, realizing that those
> users make up a very real chunk of all of Git's users.
> 
> As to making NO_REGEX conditional on REG_STARTEND: you are talking about
> apples and oranges here. NO_REGEX is a Makefile flag, while REG_STARTEND
> is a C preprocessor macro.

It seems like you could just always compile the source file, and just
have it all inside #if defined(NO_REGEX) || !defined(REG_STARTEND) or
similar.

> And lastly, the best alternative would be to teach musl about
> REG_STARTEND, as it is rather useful a feature.

Maybe, but it seems fundamentally costly to support -- it's extra
state in the inner loops that imposes costly spill/reload on archs
with too few registers (x86). I'll look at doing this when we
overhaul/replace the regex implementation, and I'm happy to do some
performance-regression tests for adding it now if someone has a simple
patch (as was mentioned on the musl list).

Rich

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2016-10-06 10:45 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-05 16:37 [musl] Re: Regression: git no longer works with musl libc's regex impl writeonce
  -- strict thread matches above, loose matches on Subject: below --
2016-10-05  3:00 writeonce
2016-10-05 10:49 ` Johannes Schindelin
2016-10-04 15:08 Rich Felker
2016-10-04 15:27 ` Jeff King
2016-10-04 15:40   ` Rich Felker
2016-10-04 16:08     ` Johannes Schindelin
2016-10-04 17:39       ` [musl] " Rich Felker
2016-10-05 11:17         ` Johannes Schindelin
2016-10-05 13:01           ` Szabolcs Nagy
2016-10-05 13:15           ` Rich Felker
2016-10-04 22:06       ` James B
2016-10-04 22:33         ` Rich Felker
2016-10-04 22:48           ` Junio C Hamano
2016-10-05 13:11           ` Jakub Narębski
2016-10-05 16:15             ` [musl] " Rich Felker
2016-10-05 10:41         ` Johannes Schindelin
2016-10-05 11:59           ` James B
2016-10-05 16:11             ` Jeff King
2016-10-05 16:27               ` Rich Felker
2016-10-06 10:44             ` Johannes Schindelin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.