* git smart protocol via WebSockets - feedback wanted
From: Stephan Peijnik @ 2012-06-05 18:20 UTC
To: git

Dear list,

Since I have been working on a proof of concept showing that git's smart protocol can be tunneled via WebSocket connections quite easily [0], I wanted to ask for some feedback on the idea in general and on my implementation [1].

So, basically, what do you think about tunneling git's smart protocol via WebSockets (and thus HTTP)? Do you believe that a full implementation of this, as opposed to the current proof of concept, is worthwhile? Are there any issues with this approach I missed (apart from the missing authentication/authorization in the server component, of course)?

Thanks in advance for your input.

Regards,
Stephan

[0] http://blog.sp.or.at/2012/06/git-smart-protocol-via-websockets-proof.html
[1] https://github.com/speijnik/gitws
* Re: git smart protocol via WebSockets - feedback wanted
From: Junio C Hamano @ 2012-06-05 18:31 UTC
To: Stephan Peijnik; Cc: git

How does this compare with the smart-http support that tunnels the git protocol over http (with some butchering)?
* Re: git smart protocol via WebSockets - feedback wanted
From: Stephan Peijnik @ 2012-06-05 18:41 UTC
To: git

On 06/05/2012 08:31 PM, Junio C Hamano wrote:
> How does this compare with the smart-http support that tunnels the
> git protocol over http (with some butchering)?

To be honest, I didn't know about smart-http support yet. Is that the approach introduced with git 1.6.6?

If so, that approach uses multiple POST requests, meaning multiple TCP and HTTP connections need to be established, multiple requests processed, and so on. The WebSocket approach uses a single HTTP connection which gets upgraded to a WebSocket. This WebSocket then allows the same communication to happen as with the ssh implementation.

So in comparison there is possibly a lot less overhead and, in theory, the performance should be comparable to running the smart protocol over ssh. Personally I'd say the WebSocket approach is cleaner than the HTTP-POST approach.

-- Stephan
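The single-connection upgrade described above boils down to the RFC 6455 opening handshake: the client sends an ordinary HTTP GET carrying `Upgrade`/`Connection` headers and a random `Sec-WebSocket-Key`, and the server proves it understood by echoing a SHA-1-derived accept token; after that, the same TCP connection carries framed bidirectional data. A minimal sketch of that key/accept computation, using only the Python standard library (the header layout is the generic RFC 6455 one, not anything specific to the gitws prototype):

```python
import base64
import hashlib
import os

# Fixed GUID defined by RFC 6455; every conforming server appends it.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def make_client_key() -> str:
    """Random 16-byte nonce, base64-encoded, for Sec-WebSocket-Key."""
    return base64.b64encode(os.urandom(16)).decode("ascii")

def accept_for(key: str) -> str:
    """Sec-WebSocket-Accept value the server must send back for `key`."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

def upgrade_request(host: str, path: str, key: str) -> str:
    """The HTTP request that turns a plain connection into a WebSocket."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Key: {key}\r\n"
        "Sec-WebSocket-Version: 13\r\n"
        "\r\n"
    )

# Worked example from RFC 6455 section 1.3:
print(accept_for("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

Once the server answers `101 Switching Protocols` with the matching accept value, both endpoints can shuttle the unmodified pack protocol bytes through WebSocket frames over that one connection, much like the ssh transport does over its single channel.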
* Re: git smart protocol via WebSockets - feedback wanted
From: Shawn Pearce @ 2012-06-05 18:54 UTC
To: Stephan Peijnik; Cc: git

On Tue, Jun 5, 2012 at 11:41 AM, Stephan Peijnik <stephan@peijnik.at> wrote:
> On 06/05/2012 08:31 PM, Junio C Hamano wrote:
>> How does this compare with the smart-http support that tunnels the
>> git protocol over http (with some butchering)?
>
> To be honest, I didn't know about smart-http support yet. Is that the
> approach introduced with git 1.6.6?

Yes. So it's been around for a while now, like two years.

> If so, that approach uses multiple POST requests, meaning multiple TCP and
> HTTP connections need to be established, multiple requests processed, etc.

It's actually only one TCP connection... assuming the servers in between the client and the Git endpoint correctly support HTTP keep-alive semantics.

> The WebSocket approach uses a single HTTP connection which gets upgraded to
> a WebSocket. This WebSocket then allows the same communication to happen as
> with the ssh implementation.

How does this fare going through crappy proxy servers that perform man-in-the-middle attacks on SSL connections? Just last week I was trying to help someone whose local proxy server was MITM'ing the SSL session behind Git's back, and their IT department forgot to install the proxy server's certificate into the system certificate directory. They only installed it into the browser. That proxy also doesn't correctly grok HTTP 1.1 keep-alive with chunked transfer encodings, let alone something as new as WebSockets.

> So in comparison there is possibly a lot less overhead and, in theory, the
> performance should be comparable to running the smart protocol over ssh.
> Personally I'd say the WebSocket approach is cleaner than the HTTP-POST
> approach.

This may be true. But it's also a lot more complex to implement. I noticed you reused Python code to help make this work. Let me know when there is a GPLv2 client library that implements sufficient semantics for WebSockets that Git can bundle it out of the box. And let me know when most corporate IT proxy servers correctly grok WebSockets. I suspect it will be many more years, given that they still can't even grok chunked transfer encoding.
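Whichever transport ends up carrying it - ssh, git://, smart HTTP, or a hypothetical WebSocket tunnel - the smart protocol frames its data the same way: pkt-lines, i.e. a 4-hex-digit length prefix (which counts itself) followed by the payload, with "0000" serving as a flush packet. A small sketch of that framing, following git's Documentation/technical/protocol-common description:

```python
def pkt_line(payload: str) -> str:
    """Frame a payload as a pkt-line: 4 hex digits of total length, then data."""
    if not payload:
        return "0000"  # flush-pkt: a zero-length control packet
    return f"{len(payload) + 4:04x}{payload}"

def parse_pkt(stream: str) -> tuple[str, str]:
    """Split one pkt-line off the front of a buffer; '' signals a flush-pkt."""
    length = int(stream[:4], 16)
    if length == 0:
        return "", stream[4:]
    return stream[4:length], stream[length:]

# First pkt-line a smart HTTP server sends when a client fetches
# /info/refs?service=git-upload-pack, per git's http-protocol docs:
framed = pkt_line("# service=git-upload-pack\n")  # "001e# service=..."
payload, rest = parse_pkt(framed + "0000")
```

The debate above is therefore only about how these frames reach the other side - one long-lived stream versus a sequence of POST request/response bodies - not about the framing itself.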
* Re: git smart protocol via WebSockets - feedback wanted
From: Stephan Peijnik @ 2012-06-05 19:28 UTC
To: git

On 06/05/2012 08:54 PM, Shawn Pearce wrote:
> On Tue, Jun 5, 2012 at 11:41 AM, Stephan Peijnik <stephan@peijnik.at> wrote:
>> To be honest, I didn't know about smart-http support yet. Is that the
>> approach introduced with git 1.6.6?
>
> Yes. So it's been around for a while now, like two years.

I have just read up on that. My fault.

>> If so, that approach uses multiple POST requests, meaning multiple TCP and
>> HTTP connections need to be established, multiple requests processed, etc.
>
> It's actually only one TCP connection... assuming the servers in
> between the client and the Git endpoint correctly support HTTP
> keep-alive semantics.

With keep-alive that is true, but a quick check on the actual data exchange tells me that multiple HTTP requests are still needed. But I guess the overhead caused by a second HTTP request can be ignored.

> How does this fare going through crappy proxy servers that perform
> man-in-the-middle attacks on SSL connections? Just last week I was
> trying to help someone whose local proxy server was MITM'ing the SSL
> session behind Git's back, and their IT department forgot to install
> the proxy server's certificate into the system certificate directory.
> They only installed it into the browser. That proxy also doesn't
> correctly grok HTTP 1.1 keep-alive with chunked transfer encodings,
> let alone something as new as WebSockets.

Proxy servers could be an issue, yes. For proxy servers not acting as MITM and which support CONNECT this shouldn't be an issue, though. Also, given the current HTML5 hype things should get better in the future, but you are correct about potential current issues with the approach.
>> So in comparison there is possibly a lot less overhead and, in theory,
>> the performance should be comparable to running the smart protocol over
>> ssh. Personally I'd say the WebSocket approach is cleaner than the
>> HTTP-POST approach.
>
> This may be true. But it's also a lot more complex to implement. I
> noticed you reused Python code to help make this work.

The only reason I used Python is that I wanted to quickly come up with a prototype. I am also aware of the fact that a proper implementation should probably be done in C.

> Let me know when there is a GPLv2 client library that implements sufficient
> semantics for WebSockets that Git can bundle it out of the box.

As for a WebSocket client library that is GPLv2 compatible: there is at least libwebsockets [0], which is licensed under the terms of the LGPL v2.1 and as such compatible with GPLv2. What do you think about using this as the basis for a proper implementation?

> And let me know when most corporate IT proxy servers correctly grok
> WebSockets. I suspect it will be many more years given that they still
> can't even grok chunked transfer encoding.

As stated above, this could be a problem, yes. The question is whether one only wants to provide an alternative approach when it is usable for everyone. My intention never was to have the current http implementation, be it the dumb or the http-backend one, replaced. The idea here was to provide an additional option that makes use of a fairly new technology, with all the benefits and drawbacks of using something new.

Thanks for your feedback.

-- Stephan

[0] http://git.warmcat.com/cgi-bin/cgit/libwebsockets/
* Re: git smart protocol via WebSockets - feedback wanted
From: Shawn Pearce @ 2012-06-05 21:11 UTC
To: Stephan Peijnik; Cc: git

On Tue, Jun 5, 2012 at 12:28 PM, Stephan Peijnik <stephan@peijnik.at> wrote:
> On 06/05/2012 08:54 PM, Shawn Pearce wrote:
>> It's actually only one TCP connection... assuming the servers in
>> between the client and the Git endpoint correctly support HTTP
>> keep-alive semantics.
>
> With keep-alive that is true, but a quick check on the actual data exchange
> tells me that multiple HTTP requests are still needed. But I guess the
> overhead caused by a second HTTP request can be ignored.

There is extra overhead from the HTTP request headers, this is true. Fortunately it is relatively small and bounded.

There isn't that much additional latency in the smart HTTP protocol. Where the client is waiting on data from the server is where we end an HTTP request and start a new one. The multi_ack capability on normal TCP or SSH connections does get to interleave a bit more in the native protocol to try and hide the RTT latency. I don't know that anyone has done extensive testing to determine how effective that is vs. the batch sizes we run in the HTTP POST format, with the key part being how quickly the overall negotiation exchange goes for the end-user.

Colby Ranger's recent contribution in contrib/persistent-https provides a local proxy server for Git over HTTP that tries to reuse HTTP connections across Git command invocations. This can go further than even git:// does with TCP connection reuse, cutting latency. Of course a user can do the same thing with their own local HTTP proxy, but persistent-https can be easier to install and configure.

>> How does this fare going through crappy proxy servers that perform
>> man-in-the-middle attacks on SSL connections? ...
>
> Proxy servers could be an issue, yes. For proxy servers not acting as MITM
> and which support CONNECT this shouldn't be an issue, though.

I am still annoyed by the failure of "Expect: 100-continue". The original smart HTTP protocol used this during push to try and avoid sending a 100 MiB POST payload before finding out authentication is required and failing. It turns out far too many HTTP servers and proxy servers do not correctly implement 100-continue to rely on it in the protocol, so we had to back that out and use a special POST body with 4 bytes to "probe" the remote server before sending the full payload.

100-continue is in RFC 2616, dated June 1999. My calendar says June 2012. So 13 years later and we still cannot rely on 100-continue as specified by RFC 2616 working correctly on the public Internet.

Chunked Transfer-Encoding is described in RFC 2068, dated January 1997, which is the RFC that RFC 2616 made obsolete. This is still not working reliably everywhere... 15 years after being specified, proxy servers are still converting chunked transfer encoding to "Connection: close" and destroying any HTTP keep-alive that might have been possible.

Basically I learned a lot in the past 2 years deploying Git on a rather broad scale with HTTP.
The public Internet doesn't resemble the standards enough, and you really have to code to the lowest common denominator, because there is some user out there that matters to you who is stuck behind some HTTP proxy that implements the HTTP standard as it existed in 1995, and whose managers/suppliers refuse to bring it forward to the current century.

> Also, given the current HTML5 hype things should get better in the future,
> but you are correct about potential current issues with the approach.

WebSockets seems pretty full of fail to me. The protocol specification is really complex. It's reimplementing TCP on top of HTTP to work around an artificial, browser-imposed limitation on the number of suggested HTTP connections the browser opens to the server.

Meanwhile SPDY goes the other direction and tries to support multiplexing a larger number of HTTP requests into a single TCP connection, while reusing a lot of header data across those requests. I have higher hopes for SPDY adoption than for WebSockets. SPDY solves a lot of common problems on the Internet that social networking sites care about, like time to load assets for a game, or that publishers care about, like time to load all assets for a site on the initial visit, increasing the chances the user doesn't immediately jump away due to perceived high loading time.

WebSockets is a large amount of wanking to make playing a game written in JavaScript easier, using a very ugly protocol and a much more complex software stack. I think the WebSockets authors saw the problem of HTTP connections in a browser and solved it the wrong way. They saw the bidirectional stream problem... and solved it for a very narrow use case. SPDY also relies on a bidirectional stream, but it lets the server do more, like suggest pushing assets down ahead of the browser realizing it needs them.
>>> So in comparison there is possibly a lot less overhead and, in theory,
>>> the performance should be comparable to running the smart protocol over
>>> ssh. Personally I'd say the WebSocket approach is cleaner than the
>>> HTTP-POST approach.
>>
>> This may be true. But it's also a lot more complex to implement. I
>> noticed you reused Python code to help make this work.
>
> The only reason I used Python is that I wanted to quickly come up with a
> prototype. I am also aware of the fact that a proper implementation should
> probably be done in C.

Any implementation of a new embedding of the Git protocol into e.g. WebSockets also requires implementing it in Java for JGit, both client and server, and probably also in Python for Dulwich, again client and server. Given that WebSockets is all about cramming TCP into HTTP in SSL in TCP, and doesn't always work on the public Internet given the current state of proxy servers, I just don't see the value in this. We have to provide complete support across the major Git implementations, otherwise users will come to this mailing list and complain about how implementation $X cannot talk to a server running $Y because server owner $Z only configured the newfangled WebSockets protocol. And I am simply too lazy to write a procmail script to direct all such inquiries to your address.

>> Let me know when there is a GPLv2 client library that implements
>> sufficient semantics for WebSockets that Git can bundle it out of the box.
>
> As for a WebSocket client library that is GPLv2 compatible: there is at
> least libwebsockets [0], which is licensed under the terms of the LGPL v2.1
> and as such compatible with GPLv2.

OK, yay. Someone actually bothered to implement this?

>> And let me know when most corporate IT proxy servers correctly grok
>> WebSockets. I suspect it will be many more years given that they still
>> can't even grok chunked transfer encoding.
>
> As stated above, this could be a problem, yes. The question is whether one
> only wants to provide an alternative approach when it is usable for
> everyone.

I predict WebSockets will be usable by everyone about... never. It's too complex a standard, and too narrow a corner case. We are talking about proxy servers that can't do 100-continue correctly, because async network IO was too hard for them to code. Those authors and their software will never support WebSockets' bidirectional requirement.

WebSockets isn't critical to browsing the web. It never will be. 100-continue might be useful with form-based file uploads, which are at least 100x more common on the web than a WebSocket-powered thing. Seriously, call me when WebSockets is actually working. I'd like to come back and see what the world looks like in 2152.

> My intention never was to have the current http implementation, be it the
> dumb or the http-backend one, replaced. The idea here was to provide an
> additional option that makes use of a fairly new technology, with all the
> benefits and drawbacks of using something new.

You are free to develop your own remote helper that does this. But I don't expect any Git distribution or implementation to be supporting it.
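The remote-helper escape hatch mentioned above is a real extension point: for an unknown URL scheme, git runs a `git-remote-<scheme>` executable and speaks a simple line-oriented protocol over its stdin/stdout (see the gitremote-helpers documentation). A hypothetical `git-remote-ws` helper could advertise the `connect` capability and splice the pack protocol into its tunnel; the sketch below shows only the command loop, and the `git-remote-ws` name and tunnel step are illustrative assumptions, not part of any shipped helper.

```python
import sys
from typing import Optional, TextIO

def handle_command(line: str) -> Optional[str]:
    """Answer one remote-helper command; None means the dialog is over."""
    if line == "capabilities":
        # Advertise 'connect': git will then drive git-upload-pack /
        # git-receive-pack itself over the stream the helper provides.
        # The capability list is terminated by a blank line.
        return "connect\n\n"
    if line.startswith("connect "):
        # A real helper would open its transport here (e.g. a WebSocket
        # tunnel), then print a blank line to signal the stream is ready
        # before relaying raw pack protocol bytes.
        return "\n"
    if line == "":
        return None  # blank line: git is done with this helper
    return "unsupported\n"  # sketch only; real helpers abort here

def helper_loop(stdin: TextIO, stdout: TextIO) -> None:
    """Minimal read-eval loop a git remote helper runs until EOF."""
    for raw in stdin:
        reply = handle_command(raw.rstrip("\n"))
        if reply is None:
            break
        stdout.write(reply)
        stdout.flush()

if __name__ == "__main__":
    helper_loop(sys.stdin, sys.stdout)
```

Because the helper lives outside git proper, an experimental transport like this can be distributed and installed independently, which is exactly why it needs no buy-in from git, JGit, or Dulwich upstream.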
* Re: git smart protocol via WebSockets - feedback wanted
From: Shawn Pearce @ 2012-06-05 18:36 UTC
To: Stephan Peijnik; Cc: git

On Tue, Jun 5, 2012 at 11:20 AM, Stephan Peijnik <stephan@peijnik.at> wrote:
> Since I have been working on a proof of concept showing that git's smart
> protocol can be tunneled via WebSocket connections quite easily [0] I wanted
> to ask for some feedback on the idea in general and on my implementation
> [1].
>
> So, basically, what do you think about tunneling git's smart protocol via
> WebSockets (and thus HTTP)? ...
>
> [0] http://blog.sp.or.at/2012/06/git-smart-protocol-via-websockets-proof.html
> [1] https://github.com/speijnik/gitws

How does this compare with the smart HTTP protocol that has been supported since Git 1.6.6, and uses the git-http-backend CGI at the server side?