Yeah, the // isn't part of the scheme, it's part of the authority.
That BBC article doesn't say why "net users" find them annoying. I like the double slashes (or at least some unique separator between the scheme and the hostname that marks where the hostname starts), since they allow building useful relative URLs, as mentioned in the original article. As an example not covered in the article: you don't need to serve different CSS files for secure and insecure content when you serve media assets from another domain that is also available via both http and https.
A protocol-relative link like //media.example.com/site.css can be used on both HTTP- and HTTPS-served pages, and the browser will resolve that relative URL by filling in the protocol from the base document. Without the double slashes, you wouldn't be able to distinguish a relative path from a relative URL that carries a hostname. If I remember correctly, a scheme change on the same hostname but with a different path, like:
base document: http://example.com/some/path
relative URL: https:/some/other/path
is possible too (I wonder how the parsing should work with port numbers, if they can be relative too -- I have not read the RFC in a while, and it's such a rare thing to use port numbers anyway).
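The protocol-relative trick can be sketched with Python's `urllib.parse.urljoin` (the hostname `media.example.com` is made up for illustration). Note that for the scheme-only reference above, `urljoin()` follows the strict RFC reading and returns the reference unchanged rather than filling in the base host, so I wouldn't rely on that behavior:

```python
from urllib.parse import urljoin

# A protocol-relative reference picks up only the scheme from the base
# document, so one stylesheet URL works for both http and https pages.
css = "//media.example.com/site.css"

print(urljoin("http://example.com/some/path", css))
# http://media.example.com/site.css
print(urljoin("https://example.com/some/path", css))
# https://media.example.com/site.css

# A scheme-only reference is murkier: urljoin() returns it as-is,
# without filling in the base document's host.
print(urljoin("http://example.com/some/path", "https:/some/other/path"))
```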
Double slashes (well, backslashes) are how Microsoft/CIFS originally designated server names in UNC paths, which I think may have been around before URLs were standardized (don't quote me on that; they're most likely roughly the same age and influenced each other). This is also why the file: scheme "requires" three leading slashes: the "host" between the second and third slash is empty, designating the local machine -- but you could put in a hostname to access network shares (I put "requires" in quotes because file: parsing has always had ambiguities across implementations).
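The empty-host reading is visible in Python's `urlparse` (the hostname `fileserver` below is invented for illustration):

```python
from urllib.parse import urlparse

# file:///etc/hosts -- empty authority between the 2nd and 3rd slash,
# i.e. "the local machine".
local = urlparse("file:///etc/hosts")
print(local.netloc)   # '' (empty host)
print(local.path)     # /etc/hosts

# file://fileserver/share/doc.txt -- a hostname in the authority slot,
# which some platforms map to a network share.
remote = urlparse("file://fileserver/share/doc.txt")
print(remote.netloc)  # fileserver
print(remote.path)    # /share/doc.txt
```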
I find it annoying when people read addresses and call them "backslashes". Talk about wasting time and energy, that's a whole additional syllable said for every path component in a URL!
These tricks of combining parts of URLs to do relative linking on scheme, host, etc. are little known but useful; I was expecting them to show up in the article.
Going back to the debate over Chrome potentially dropping "http://" from the address bar: if they were to show "//" instead, they would have an argument for technical correctness, because the default protocol "http:" could be assumed. But having no leading "//" at all visually confuses the address with a relative path omitting the host, because it breaks the signifier for the authority component of the URL. Just a thought.
Are there any web developers here who didn't know these things? Are there any developers of any type who didn't? I'm not sure who the audience for this article is meant to be but I'm guessing you won't find too many of them here.
I never knew about params. I know a few people who didn't know things like fragments don't get sent to the server, everyone gets encoding wrong at some point, and I know a lot of people don't really know how the base tag works.
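The fragment point is easy to demonstrate: it's a client-side construct, and what goes on the wire is the URL with the fragment stripped off, which is exactly what `urldefrag` splits out:

```python
from urllib.parse import urldefrag

# The browser keeps the fragment to itself; only the part before '#'
# appears in the HTTP request.
url, fragment = urldefrag("http://example.com/docs/page.html#section-2")
print(url)       # http://example.com/docs/page.html
print(fragment)  # section-2
```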
I didn't either -- path params seem like an interesting way to address issues that can come up when devising RESTful URL schemes (rather than relying on query params only).
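For anyone else who hadn't seen them: path params are the semicolon-delimited parameters attached to a path segment, distinct from the query string. Python's `urlparse` exposes the ones on the last segment (the URL below is made up):

```python
from urllib.parse import urlparse

# ";version=2" rides on the path segment itself; "?color=blue" is the
# ordinary query string -- urlparse keeps them in separate fields.
parts = urlparse("http://example.com/widgets;version=2?color=blue")
print(parts.path)    # /widgets
print(parts.params)  # version=2
print(parts.query)   # color=blue
```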
For fuck's sake, there is no such thing as a RESTful URL scheme. 'Pretty' URLs are a major anti-pattern -- they not only don't make you any more RESTful, they actively undermine HATEOAS, which is the true signifier of RESTfulness.
Actual REST treats URL strings (including query parameters) as being completely opaque implementation details. The server is supposed to respond with URLs in the hypertext -- you're never supposed to be formatting them yourself client-side using out of band knowledge. Query parameters are no exception to that: if you want the client to pass them, give the client a form in the response.
If you're expecting a client to munge together "path components" based on foreknowledge of your data model, you're doing it wrong.
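To make the "follow links, don't construct them" point concrete, here's a sketch of a hypermedia-style response; every field name and URL in it is invented for illustration, not any real API:

```python
# The client treats every URL as opaque: it follows links and fills in
# forms from the previous response, instead of assembling paths from
# out-of-band knowledge of the data model.
response = {
    "orders": [{"id": "o1", "total": "9.99"}],
    "links": {
        # Deliberately opaque -- the server can restructure these freely.
        "next": "http://api.example.com/r/b2xkZXItdGhhbi1vMQ",
    },
    "forms": {
        # "Give the client a form": the server names the inputs; the
        # client only supplies values, never URL structure.
        "search": {"href": "http://api.example.com/orders",
                   "inputs": ["customer_id"]},
    },
}

next_url = response["links"]["next"]  # follow, never construct
assert "orders/o1" not in next_url    # no path-munging knowledge needed
```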
Any chance you have articles to back this up? Not that I doubt you, but I've never really understood or looked into all this REST business and if I'm ever going to do so, I'd like to learn what being RESTful really means --- and not just jumping on what sounds like a bandwagon for 'Pretty URLs' as you mentioned. :)
How are you supposed to represent collections, if every link must be provided by the server? I've seen REST examples where you give some results and a link to the next page of results. That's impractical if you have a million pages.
If your collection is truly random access (array, search results, relational sets) then use a form that takes the 'foreign key' as input. Then there's one URL and standard query parameters.
In pretty much all other cases a collection would have data/metadata in the response that far outweighs the links. And given that it's perfectly fine for the links to be completely opaque, there's no reason for them to be very long.
The real problem with pagination is that all but a few brave souls completely fuck up the implementation of it. This is the worst possible way to paginate something, but just about every webapp ever written does it like this:
SELECT * FROM posts ORDER BY date DESC LIMIT x OFFSET n*x
The locations of items on pages change constantly as new items are created and destroyed! You page through the history (usually via links with the worst possible names: prev & next), and the items shift around as you move around. It's OK that the page with the most recent items changes as new ones are created, but having the archives be a pushdown stack is just idiotic.
It would be terrific if people used meaningful pagination instead of arbitrary offsets: posts by year/month/week/day/hour/minute/second/etc. is far better than "Page N" -- you don't even need to give me any options, just use older/newer links that point to the level of granularity that would give an appropriate number of results.
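Even short of date-based URLs, the shifting-pages problem goes away with keyset ("cursor") pagination: remember the sort key of the last item shown and ask for items strictly older than it, instead of counting an offset. A minimal sketch with sqlite3 (schema and data invented for illustration):

```python
import sqlite3

# Keyset pagination: new posts change only the first page; pages you
# already visited stay put, unlike LIMIT x OFFSET n*x.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, date TEXT)")
db.executemany("INSERT INTO posts (date) VALUES (?)",
               [(f"2011-01-{d:02d}",) for d in range(1, 10)])

PAGE = 3

def first_page():
    return db.execute(
        "SELECT id, date FROM posts ORDER BY date DESC, id DESC LIMIT ?",
        (PAGE,)).fetchall()

def page_after(last_date, last_id):
    # "Older than the cursor" -- id breaks ties between equal dates.
    return db.execute(
        "SELECT id, date FROM posts"
        " WHERE date < ? OR (date = ? AND id < ?)"
        " ORDER BY date DESC, id DESC LIMIT ?",
        (last_date, last_date, last_id, PAGE)).fetchall()

page1 = first_page()
last_id, last_date = page1[-1]
page2 = page_after(last_date, last_id)

# A new post lands on page 1 but leaves already-visited pages untouched.
db.execute("INSERT INTO posts (date) VALUES ('2011-01-10')")
assert page_after(last_date, last_id) == page2
```

The cursor (date, id) is also exactly the kind of opaque token a server could hand back in a link, per the REST discussion above.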
I didn't know some of the unusual specifics, but then again, I don't think that I need to either. It seemed less like a tutorial on what every web developer needs to know than it was documentation on what every browser maker needs to know.
I thought I knew these things until I worked on a project to scrape arbitrary websites. We had to follow arbitrary links around a site (easy, right?) but it turns out there are many, many edge cases we had to deal with.
Another developer pretty quickly decided to ditch Ruby's URL parser and write our own, since there are tons of things browsers deal with that you wouldn't think of. For example, relative links starting with "//" inherit only the protocol (http or https) from the current page. Add in vagaries specific to some HTTP servers -- like http://foo/bar being treated the same as http://foo/bar/ -- and we quickly realized it was a much bigger task than we thought.
We ultimately got the thing working OK, but crazy edge cases just kept popping up.
I never knew any of this stuff until I looked at the Javadoc for the Java URL classes. I'm not a web developer, but web developers aren't the only developers who use URLs, of course.
No mention of non-ASCII characters at all? Punycode in the host name may still be uncommon, but passing non-ASCII in the query is important. There's also nothing about spaces being encoded as + rather than %20, which happens a lot.
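Both points in one small example, using Python's quoting helpers:

```python
from urllib.parse import quote, quote_plus, urlencode

# Two legal spellings of a space, depending on context:
print(quote("naive question"))       # naive%20question  (path context)
print(quote_plus("naive question"))  # naive+question    (form/query context)

# Non-ASCII text is UTF-8 encoded, then percent-escaped:
print(quote("café"))                    # caf%C3%A9
print(urlencode({"q": "smörgåsbord"}))  # q=sm%C3%B6rg%C3%A5sbord
```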
I suspect there are a lot of RFCs and W3C specs that could be paraphrased to get you on HN.
How many web developers actually know HTML, for example? In a few discussions here and on Reddit, it seemed like well over 80% did not realize that <!doctype html> is a valid doctype, or that you do not need to close many common tags.
Even the original designer now considers the two slashes a bad design decision: http://news.bbc.co.uk/2/hi/8306631.stm