edits
--- a/index.md
+++ b/index.md
@@ -19,6 +19,7 @@
 
 
 # General References {#general-data-hacking-and-programming-references}
+
 
 ## The basics of being a data scientist
 
@@ -65,11 +66,8 @@
 
 [![](img/Screenshot-at-2012-04-29-172132-300x235.png "Git Screenshot")](http://progit.org/book/)
 
-[tutorials on git](http://progit.org/book/) and
-[GUIs to help you](http://code.google.com/p/tortoisegit/)
-
-[manual for Subversion](http://svnbook.red-bean.com/)
-and a [similar GUI for Subversion](http://tortoisesvn.net/)
+There are [tutorials on git](http://progit.org/book/) and [GUIs to help you](http://code.google.com/p/tortoisegit/).
+There is also a [manual for Subversion](http://svnbook.red-bean.com/) and a [similar GUI for Subversion](http://tortoisesvn.net/).
 
 
 ### Task Tracking
@@ -123,9 +121,9 @@
 
 API documentation is important too! Traditionally for SOAP APIs, you use WSDL but for REST try [Swagger](http://swagger.wordnik.com/) or [iodocs](https://github.com/mashery/iodocs)
 Many web app frameworks can generate the documentation for you. For example Symfony for PHP http://symfony.com/ https://github.com/FriendsOfSymfony/FOSRestBundle http://williamdurand.fr/2012/08/02/rest-apis-with-symfony2-the-right-way/ https://github.com/nelmio/NelmioApiDocBundle
+Or for Ruby on Rails there are https://github.com/elc/rapi_doc and https://github.com/Pajk/apipie-rails
 
  better apis https://github.com/liip/LiipHelloBundle
-      - or for Rails https://github.com/elc/rapi_doc https://github.com/Pajk/apipie-rails
 
    http://amberonrails.com/building-stripes-api/
 
@@ -140,20 +138,16 @@
 
 You can find some data visualisation tools below:
 
-[http://www.visualisingdata.com/index.php/2011/07/part-6-the-essential-collection-of-visualisation-resources/](http://www.visualisingdata.com/index.php/2011/07/part-6-the-essential-collection-of-visualisation-resources/)
-
+[Essential Collection](http://www.visualisingdata.com/index.php/2011/07/part-6-the-essential-collection-of-visualisation-resources/)
+              [Drawing By Numbers Tools and Resources](http://drawingbynumbers.org/toolsandresources)
+               - http://selection.datavisualization.ch/ data viz tools catalog
 Also check out [http://thejit.org](http://thejit.org/) &amp; [http://www.senchalabs.org/<wbr>philogl/</wbr>](http://www.senchalabs.org/philogl/) (contributed by Matt Adcock)
 
-Have to use visual art concepts, good color schemes http://www.r-bloggers.com/the-paul-tol-21-color-salute/
-
-
-    - https://graphics.stanford.edu/wikis/cs448b-12-fall/ data viz theory
-    - http://drawingbynumbers.org/toolsandresources
-
-examples    - http://sunfoundation.tumblr.com/
-### The Open Budget
-
-tools     - http://selection.datavisualization.ch/ data viz tools catalog
+A good infographic should use visual art concepts and [good color schemes](http://www.r-bloggers.com/the-paul-tol-21-color-salute/).
+For more information on the theory of data visualisation, check out the [Stanford CS448B notes](https://graphics.stanford.edu/wikis/cs448b-12-fall/).
+
+Some examples of data visualisation can be seen on [the Sunlight Foundation tumblr](http://sunfoundation.tumblr.com/) or at the GovHack alumnus [The Open Budget](http://www.theopenbudget.org).
+
 
 ## Web Applications
 
@@ -213,32 +207,35 @@
 Backend frameworks http://helios.io/ https://www.parse.com/
 ### Examples
 
-bom water,
-
-nz gov budget
+Bureau of Meteorology Water Storage App http://icelab.com.au/work/bureau-of-meteorology/
+
+NZ Gov budget http://www.treasury.govt.nz/budget/app
 
 
 # Geographical Data Tools {#geographical-data-tools}
 
-Check out the[ GeoRabble Boundary Mapper's Cookbook](http://georabble.org/2012/05/31/the-boundary-mappers-cookbook/) to see how you can tie all these things together!
+Check out the [GeoRabble Boundary Mapper's Cookbook](http://georabble.org/2012/05/31/the-boundary-mappers-cookbook/) to see how you can tie all these things together!
+
+GeoDjango TileMill
 
 ## Key datasets
-          - base layers like agri http://agri.openstreetmap.org/, http://irs.gis-lab.info/ wms or http://www.gdal.org/frmt_wms_openstreetmap_tms.xml
-           ASGS including suburbs/postcodes
-                   - andrewharvey4.wordpress.com postgis/asgs tutorial
+base layers like agri http://agri.openstreetmap.org/, http://irs.gis-lab.info/ wms or http://www.gdal.org/frmt_wms_openstreetmap_tms.xml
+
+ASGS from the ABS, including suburbs/postcodes; see the PostGIS/ASGS tutorial at andrewharvey4.wordpress.com.
+You can also get KML layers for various statistical measures on the ABS TableBuilder tool.
+
 ## Wrangling
 
 ### Converting
 There are many spatial data formats and often the one your tool requires is not the one the dataset is provided in
 Online
   - http://converter.mygeodata.eu/vector kml exporter for shp
-or locally using GDAL
+or locally using GDAL (better for multi-megabyte datasets)
 
 ### Geocoding
-cloudmade, google (but you must display on a Google Map).
-
-Easiest way to do is with a Google Spreadsheet/Fusion Table http://williamparry.blogspot.com.au/2011/04/putting-data-into-google-fusion-tables.htm http://support.google.com/fusiontables/answer/1012281?hl=en&ref_topic=2592806
-
+Google Maps APIs allow you to convert an address to map co-ordinates (geocoding), but you must display the results on a Google Map. The easiest way to do this is with a Google Spreadsheet/Fusion Table http://williamparry.blogspot.com.au/2011/04/putting-data-into-google-fusion-tables.htm http://support.google.com/fusiontables/answer/1012281?hl=en&ref_topic=2592806
+
+If you need geocoding for more than display (working out the distance between points etc.) or you don't want to use Google Maps, Cloudmade offers free OpenStreetMap-based geocoding http://developers.cloudmade.com/projects/show/geocoding-http-api
 
 ## Analysis
 
@@ -347,7 +344,7 @@
 ### Processing.js
 
 # Unstructured (Text) Data Tools
-Most of thw world's dat isn't structured because it is contained in documents (webpages, tweets etc.). Sometimes it is possible to structure it, sometimes there are tools that are better suited it unstructured data.
+Most of the world's data isn't structured because it is contained in documents (webpages, tweets etc.). Sometimes it is possible to structure it; at other times there are tools that are better suited to unstructured data.
 ## Wrangling
 For extracting data from webpages, checkout Scraperwiki pytemplate scrapy
 
@@ -356,10 +353,13 @@
 If there is no way to form a table structure to be able to apply tabular data techniques , you need a more sophisticated analysis as detailed below.
 
 ## Analysing
+Natural Language Processing
     - opennlp/nltk / https://github.com/clips/pattern
     
+A search engine just for your dataset can also help
     - lucene/solr
     
+For lightweight analysis, try R or Ruby
     - http://www.r-bloggers.com/simple-text-mining-with-r/
     
     - http://blog.josephwilk.net/ruby/latent-semantic-analysis-in-ruby.html similar terms usually found together
@@ -375,8 +375,7 @@
 
 
 # Graph (relationships and networks) Data Tools {#graph-relationships-and-networks-data-tools}
-
-Why? Find communities, hubs, connections between (the X degrees of separation)
+Graph data can be very valuable for finding communities, hubs and connections between entities (the 6 degrees of separation). This is done using the techniques of Social Network Analysis.
     - http://www.slideshare.net/OReillyStrata/visualizing-networks-beyond-the-hairball
     - http://blog.sciencenet.cn/blog-554179-622011.html SNA tools catalog
     - https://github.com/jacomyal/osdc2012-sigmajs-demo sigmajs filtering/searching
@@ -391,10 +390,8 @@
 
 ### Graph Databases
 
-[![](img/webadmin-data-300x127.png "Neo4\. web admin screenshot")](img/webadmin-data.png)Help understand relationships - how is X connected to Y and via what other entities they both are connected to. Imports and exports
-
-    - http://www.slideshare.net/maxdemarzi/etl-into-neo4j
-    http://blog.neo4j.org/2013/03/importing-data-into-neo4j-spreadsheet.html
+[![](img/webadmin-data-300x127.png "Neo4\. web admin screenshot")](img/webadmin-data.png)Help understand relationships - how is X connected to Y and via what other entities they both are connected to.
+Imports and exports can be done by [writing a Java program](http://www.slideshare.net/maxdemarzi/etl-into-neo4j) or by [using a spreadsheet](http://blog.neo4j.org/2013/03/importing-data-into-neo4j-spreadsheet.html).
 
 There are other graph databases worth considering like [OrientDB](http://www.orientdb.org/) or [Titan](http://thinkaurelius.github.com/titan/)
 Major graph databases like these can be accessed using a common syntax called Gremlin or by writing a simple Java/Python/Ruby application. Queries can be tested in the built in data browser.
@@ -407,7 +404,7 @@
 
 NetworkX is a social network analysis library for python. Many advanced analyses built in like finding communities within a graph. Also good for converting data into graphs.
 
-tutorial/intro http://www.cl.cam.ac.uk/~cm542/teaching/2011/stna-pdfs/stna-lecture11.pdf
+See this [introduction to Social Network Analysis with NetworkX](http://www.cl.cam.ac.uk/~cm542/teaching/2011/stna-pdfs/stna-lecture11.pdf)
 
 
 ## Visualisation
@@ -429,5 +426,5 @@
 
 ### [sigma.js](http://sigmajs.org/)
 
-[![](img/How-to-participate-in-GovHack_html_m6006eaf3-300x130.jpg "Sigma.js Screenshot")](img/How-to-participate-in-GovHack_html_m6006eaf3.jpg)Javascript graph viewer, can use GEXF files exported from tools like neo4j, gephi and NetworkX.
-
+[![](img/How-to-participate-in-GovHack_html_m6006eaf3-300x130.jpg "Sigma.js Screenshot")](img/How-to-participate-in-GovHack_html_m6006eaf3.jpg)JavaScript graph viewer for displaying graphs on webpages without any other plugins/applications required. It can use GEXF files exported from tools like Neo4j, Gephi or NetworkX.
+