As Google builds HTML5 offline access into its Google Docs web-based word processor, the company has introduced a change that inadvertently causes problems for some netizens using the service behind a network firewall. Google will not reverse the change, saying that it's required for offline access, due later this year. But it has provided businesses and schools with (extensive) instructions for reconfiguring firewalls so that the service can operate normally.
"Unfortunately, short of something unexpected, the current behavior is here to stay," Google Docs product manager Jeff Harris told users in a post to Google's help forums on Thursday. "I think the problems you're experiencing right now are going to need to be resolved on your end (I hate to say that, because I know it's us who changed behavior). The alternative would be for Docs to not implement support for offline, which isn't something we can concede."
A Google spokeswoman confirmed the glitch and Harris' stance. "We are extremely sorry that some companies are experiencing issues because of their specific firewall configurations in place," she told us. "But it definitely was not our intention [to cause the problem]."
One user complained about the problem on January 28, posting a note to the Google help forums. "I am trying to grade projects that my students have submitted and keep getting the 'trying to reach google.com' error message at the top," he said. "I can see the documents, but cannot edit or put comments. Help!"
On Thursday, Google's Jeff Harris responded with his lengthy post explaining the problem, which arrived when the company changed the way Docs handles collaboration. With Google Docs, multiple users can collaborate on a document in "real time" across the net, and this requires an open connection to each browser. Browsers cap open connections for each domain at six, so if Google wants to allow collaboration across more than six documents – and it does – it needs a workaround.
In the past, Google bypassed the six-connection limit using what Harris calls domain rotation. Google would set URLs as "docs[N].google.com/", where N is a number between 0 and 9. This provided 11 different domains (including plain old docs.google.com) and, at six connections apiece, potentially 66 open connections. But this setup was incompatible with the company's HTML5 offline setup.
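The arithmetic behind domain rotation can be sketched in a few lines of Python. This is only an illustration, assuming the numbered hosts follow the docs<N>.google.com naming used in the firewall list further down; the function names are hypothetical and are not taken from Google's code.

```python
# Per-domain cap on open browser connections, as cited in the article.
MAX_CONNECTIONS_PER_HOST = 6


def rotation_hosts(base="docs", parent="google.com", numbered=10):
    """Return the plain host plus the numbered variants (docs0 .. docs9)."""
    hosts = [f"{base}.{parent}"]
    hosts += [f"{base}{n}.{parent}" for n in range(numbered)]
    return hosts


def connection_budget(hosts, per_host=MAX_CONNECTIONS_PER_HOST):
    """Upper bound on simultaneous connections across all rotated hosts."""
    return len(hosts) * per_host


hosts = rotation_hosts()
print(len(hosts), "hosts,", connection_budget(hosts), "potential connections")
```

Each extra hostname buys another six connections, which is why rotating across 11 names lifts the ceiling from six to 66.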
Google previously provided offline access via its Google Gears browser extension, but in May, the company removed Gears offline access and said it would move to HTML5. In December, Google announced that HTML5 offline access would appear "early this year."
The problem is that HTML5 handles offline access on a domain-by-domain basis. "If we're randomly storing resources in any one of 11 subdomains, it meant there were situations where we wouldn't be able to access/save content that you stored offline," Harris said. "Offline [and] domain rotation are incompatible."
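A toy model makes the conflict easier to see. The Python sketch below is not how any browser implements HTML5 storage; it simply assumes, as the article describes, that offline data is keyed by the domain that saved it. The class name and the sample origins are hypothetical.

```python
class OriginScopedStore:
    """Toy stand-in for browser storage that is partitioned per origin."""

    def __init__(self):
        self._by_origin = {}

    def save(self, origin, key, value):
        # Data lands in a bucket that belongs to the saving origin only.
        self._by_origin.setdefault(origin, {})[key] = value

    def load(self, origin, key):
        # A different origin gets its own (possibly empty) bucket.
        return self._by_origin.get(origin, {}).get(key)


store = OriginScopedStore()
store.save("https://docs3.google.com", "draft", "essay text")

same_origin = store.load("https://docs3.google.com", "draft")
other_origin = store.load("https://docs7.google.com", "draft")
print(same_origin, other_origin)
```

If a later visit is rotated to a different subdomain, the lookup comes back empty even though the content was saved, which is the situation Harris describes.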
So Google switched to a setup where it communicates via subdomains off docs.google.com. And in order to provide SSL encryption, it uses subdomains off mail.google.com as well. "In order to connect over SSL (which we use for everyone for security reasons) we need to set up a virtual IP (VIP). VIPs are prohibitively expensive, to the point where Google cannot feasibly add new ones. To get around this we added docs as something called a subject alternative name (SAN) on one of our only existing VIPs (Gmail's)," Harris said.
"Getting this non-domain rotated setup to work is only barely possible: it required massive security/certificate changes and an insane amount of browser-specific hacks to wire things up. I'm not optimistic about our chances of finding an alternative implementation."
So, in order to use Google Docs, there's a long list of domains that must be accessible through a firewall:
- http (port 80) connection to docs.google.com, docs<N>.google.com and *.docs.google.com.
- https (port 443) connection to docs.google.com and docs<N>.google.com. (The certificate protecting this connection has *.google.com as its subject).
- https (port 443) connection to *.docs.google.com. (The certificate protecting this connection has *.mail.google.com as its subject, but has *.docs.google.com as a subject alternative name).
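For admins who want to sanity-check a firewall allowlist against the hostnames above, a rough Python sketch follows. It is not a firewall configuration and covers only the three hostname patterns in this list, not the additional domains for Spreadsheets or Sites. It assumes that N is one or more digits and that the wildcard may cover one or more labels; the function name is hypothetical.

```python
import re

# Hostname patterns taken from the list above. Assumptions: <N> is one or
# more digits, and "*" may stand for one or more dot-separated labels.
ALLOWED_HOST_PATTERNS = [
    re.compile(r"docs\.google\.com"),
    re.compile(r"docs\d+\.google\.com"),
    re.compile(r"(?:[a-z0-9-]+\.)+docs\.google\.com"),
]


def is_docs_host(hostname):
    """Return True if hostname matches one of the listed Docs patterns."""
    host = hostname.lower().rstrip(".")
    # fullmatch prevents look-alikes such as docs.google.com.example.org
    return any(p.fullmatch(host) for p in ALLOWED_HOST_PATTERNS)


for name in ["docs.google.com", "docs4.google.com",
             "abc123.docs.google.com", "mail.google.com"]:
    print(name, is_docs_host(name))
```

Because the domain names may resolve to addresses outside any fixed range, matching on hostname rather than on IP address is the safer basis for a rule.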
Harris also says that domain names will resolve to IP addresses that may not fall inside any given address range and that IP addresses used by Docs may be used by other Google services. And if you use Google Spreadsheets or Google Sites or other parts of Google Docs, you'll have to make additional firewall changes. You can find the entire list of domains here.
In his post, Jeff Harris also says that Google has now equipped its Chrome browser with SPDY, an application layer protocol the company developed to speed downloads. According to Harris, when Chrome starts up, it enables SPDY nine out of every ten times. Apparently, this is part of the way the company is testing the protocol. "SPDY will be enabled 90% of the time, and 10% of the time it'll be disabled as a control group. Eventually it will be enabled 100% of the time," Harris said.
If you ensure that SPDY is enabled on Chrome – using the "--use-spdy=npn" flag – you can work around the domain problems without reconfiguring your firewall. SPDY multiplexes open connections that point to a single domain. "This effectively means that there is no per-domain connection limit," Harris said, "which means we don't need to use the *.docs.google.com trickery to get collaboration+offline to work."
Google unveiled SPDY as an experimental protocol in November 2009, boasting that it would make the web two times faster. The protocol speeds downloading not only with multiplexed streams, but with request prioritization and HTTP header compression as well. SPDY creates a session between the HTTP application layer and the TCP transport layer, and this session uses an HTTP-like request-response setup.
It's unclear how many users have been affected by the Google Docs firewall problem. But this is almost a side issue. Jeff Harris' post doesn't just acknowledge a problem and show you how to fix it. He provides at least a small window into the design of Google Docs, giving the world an idea of just how many hoops Google must jump through to provide this sort of dynamic web application, and he pulls back another curtain on the company's ongoing efforts to speed up the interwebs.
In a later post, Harris indicates that with Chrome 10, introduced earlier this week, Google has increased the use of SPDY. "Chrome is ramping up the percentage of requests that use SPDY with each release," he said. "So when you upgraded to 10, the chances of you using SPDY went up. It’s still not at 100% of requests, but it’s close." This sort of thing is not mentioned in Chrome release notes. But on the SPDY mailing list, Google has said that SPDY is enabled in Chrome and Google servers for SSL traffic. ®
Update: This story has been updated to show that Google requires more than six open connections on a browser to allow for collaboration across more than six documents.