So in this post, I want to do two things:
First, give a quick overview of the geography of the ArXiv. This is interesting in itself--the ArXiv is the most comprehensive source of scientific papers for physics and mathematics, and plays a substantial role in some other fields. It's also useful for me going forward, as a way to build up code that can be reused on other collections.
Second, to put some code online. I've been doing most of my work lately--writing as well as coding--in RStudio using Yihui Xie's fantastic knitr package. The idea is to combine code with text to allow, simultaneously, literate programming and reproducible research. Blogger is a pain, but all the source and text for this post is up at the RPubs site, a very interesting project that encourages sharing research. You can read this post there instead of here if you want the code, though there are a few small changes. And the YouTube clip is only available here.
The basic idea--to jump ahead a bit--is that it might be useful to create charts like the following, which show differing geographical patterns of usage. (Here, people talk about Harvard near Harvard, and Stanford near Stanford--but in Europe, Stanford seems to win out near the big particle physics projects in Italy and Switzerland.)