Recently, a page that I really wanted to look at was down. As in, it no longer existed and the domain had been bought by a spammer. Where did I turn? Archive.org, naturally. Thankfully, the site had been archived, so I could check it out. Even better, I could still download the .zip file that I needed.
What struck me, though, is that there were no styles on the page. Normally you get a complete snapshot of a web page, CSS and all. Looking at the source code, it was immediately apparent why no styles were loaded:
<style type="text/css" media="all">@import "/css/global.css";</style> <script type="text/javascript" src="http://web.archive.org/web/20070826215132js_/http://www.website.com/js/prototype.js"></script>
Notice how archive.org automatically prepends its own URL to the front of the archived site’s JavaScript URL? It doesn’t do that for the @import’ed CSS because it doesn’t look like a link.
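To make the difference concrete, here is a minimal Python sketch of a rewriter that only looks at src and href attributes, next to one that also looks for @import. This is not archive.org's actual code, just my guess at the general shape. ARCHIVE_PREFIX, PAGE_URL, and the function names are all made up for illustration, and I left the js_ suffix off the timestamp to keep things simple.

```python
import re
from urllib.parse import urljoin

# Assumptions for illustration only: the prefix is modeled on the snippet
# above (without the "js_" suffix), and the page URL is a guess.
ARCHIVE_PREFIX = "http://web.archive.org/web/20070826215132/"
PAGE_URL = "http://www.website.com/"

# Matches src="..." and href="..." attributes, i.e. things that look like links.
ATTR_RE = re.compile(r'\b(src|href)="([^"]*)"')

# Matches the start of a CSS @import "..." or @import url("...") rule
# and captures the quoted URL that follows.
IMPORT_RE = re.compile(r'''(@import\s+(?:url\()?["'])([^"']+)''')


def archive_url(url):
    """Resolve url against the page address, then prepend the archive prefix."""
    return ARCHIVE_PREFIX + urljoin(PAGE_URL, url)


def rewrite_attrs(html):
    """Rewrite only src/href attributes; @import rules are left untouched."""
    return ATTR_RE.sub(
        lambda m: f'{m.group(1)}="{archive_url(m.group(2))}"', html
    )


def rewrite_imports(html):
    """Rewrite the URL inside quoted @import rules."""
    return IMPORT_RE.sub(lambda m: m.group(1) + archive_url(m.group(2)), html)


if __name__ == "__main__":
    page = (
        '<style type="text/css" media="all">@import "/css/global.css";</style> '
        '<script type="text/javascript" '
        'src="http://www.website.com/js/prototype.js"></script>'
    )
    # First pass: the script gets the archive prefix, the stylesheet does not.
    print(rewrite_attrs(page))
    # Second pass: the @import URL gets the prefix too.
    print(rewrite_imports(rewrite_attrs(page)))
```

The attribute pass never sees the stylesheet because the path sits inside the text of the style element, not in an attribute. Catching it takes a second pattern that understands at least a little CSS.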
I’m curious how this works for relative links within the page: does the archive resolve them to the full domain when it stores the page, or not? I know that Wget can resolve them, and a lot of web scraping programs are built around that, so…
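For reference, this is roughly how a scraper resolves relative links against the page they came from, using urljoin from Python's standard library. The base address and the link list are hypothetical, and the sketch says nothing about what archive.org itself does.

```python
from urllib.parse import urljoin


def resolve_links(base, links):
    """Return each link resolved to an absolute URL against base."""
    return [urljoin(base, link) for link in links]


if __name__ == "__main__":
    # Hypothetical page address; the real one is not known.
    base = "http://www.website.com/articles/index.html"
    links = [
        "/css/global.css",           # root-relative
        "js/prototype.js",           # relative to the page's directory
        "../download.zip",           # one directory up
        "http://example.org/x.css",  # already absolute, left unchanged
    ]
    for link, absolute in zip(links, resolve_links(base, links)):
        print(link, "->", absolute)
```

Root-relative paths like /css/global.css resolve against the domain, while plain relative paths resolve against the directory of the page, which is why the rewriter needs to know the original page URL and not just the host name.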