Implement OAuth2 for accessing Google Analytics

- Added a command (getauthtoken) that goes through the OAuth2 flow and
  writes the user's OAuth credentials to local disk.
- Made the loadanalytics command use the token file to authenticate its
  requests.
- Added documentation on how to obtain the credentials from Google that
  the OAuth2 flow needs.
- Added google-api-client to the requirements.
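The documentation bullet above refers to a credentials.json that the commit describes but does not show; the README points to a credentials.json.template. The helper below is a minimal sketch, and `missing_credential_fields` is a hypothetical name. It assumes the file follows Google's usual "installed application" client-secrets layout, with the values nested under an `installed` key. It reports which of the three values named in the README's Authorization steps (Client ID, Client secret, Redirect URIs) are still empty, so you can check them before running getauthtoken.

```python
import json

# The three values the README's Authorization steps ask for. Nesting them
# under "installed" is an assumption based on Google's client-secrets format.
REQUIRED_FIELDS = ('client_id', 'client_secret', 'redirect_uris')


def missing_credential_fields(text):
    """Return the required fields that are absent or empty in a credentials.json body."""
    data = json.loads(text)
    installed = data.get('installed', {}) if isinstance(data, dict) else {}
    return [name for name in REQUIRED_FIELDS if not installed.get(name)]
```

For example, `missing_credential_fields(open('credentials.json').read())` should return an empty list once all three values are filled in.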

file:a/.gitignore -> file:b/.gitignore
 *.py[co]
 *.py~
 .gitignore
 
 # Packages
 *.egg
 *.egg-info
 dist
 build
 eggs
 parts
 bin
 var
 sdist
 develop-eggs
 .installed.cfg
 
+# Private info
+credentials.json
+token.dat
+
 # Installer logs
 pip-log.txt
 
 # Unit test / coverage reports
 .coverage
 .tox
 
 #Translations
 *.mo
 
 #Mr Developer
 .mr.developer.cfg
 
file:a/README.rst -> file:b/README.rst
 ckanext-ga-report
 =================
 
 **Status:** Development
 
 **CKAN Version:** 1.7.1+
 
 
 Overview
 --------
 
 For creating detailed reports of CKAN analytics, including totals per group.
 
 Whereas ckanext-googleanalytics focuses on providing page view stats for a recent period and for all time (aimed at end users), ckanext-ga-report is more interested in building regular periodic reports (more for site managers to monitor).
 
 Contents of this extension:
 
 * Use the CLI tool to download Google Analytics data for each time period into this extension's database tables
 
 * Users can view the data as web page reports
 
 
 Installation
 ------------
 
 1. Activate your CKAN python environment and install this extension's software::
 
     $ . pyenv/bin/activate
     $ pip install -e git+https://github.com/okfn/ckanext-ga-report.git#egg=ckanext-ga-report
 
 2. Ensure your development.ini (or similar) contains the info about your Google Analytics account and configuration::
 
     googleanalytics.id = UA-1010101-1
     googleanalytics.username = googleaccount@gmail.com
     googleanalytics.password = googlepassword
     ga-report.period = monthly
 
    Note that your password will be readable by system administrators on your server. Rather than use sensitive account details, it is suggested you give access to the GA account to a new Google account that you create just for this purpose.
 
 3. Set up this extension's database tables using a paster command. (Ensure your CKAN pyenv is still activated, run the command from ``src/ckanext-ga-report``, and alter the ``--config`` option to point to your site config file)::
 
     $ paster initdb --config=../ckan/development.ini
 
 4. Enable the extension in your CKAN config file by adding it to ``ckan.plugins``::
 
     ckan.plugins = ga-report
 
 
+Authorization
+--------------
+
+Before you can access the data, you need to set up the OAuth details by following the `instructions <https://developers.google.com/analytics/resources/tutorials/hello-analytics-api>`_. The outcome is a file called credentials.json, which should look like credentials.json.template with the relevant fields completed. The steps are repeated below for convenience:
+
+1. Visit the `Google APIs Console <https://code.google.com/apis/console>`_.
+
+2. Sign in and create a project or use an existing project.
+
+3. In the `Services pane <https://code.google.com/apis/console#:services>`_, activate the Analytics API for your project. If prompted, read and accept the terms of service.
+
+4. Go to the `API Access pane <https://code.google.com/apis/console/#:access>`_.
+
+5. Click Create an OAuth 2.0 client ID...
+
+6. Fill out the Branding Information fields and click Next.
+
+7. In Client ID Settings, set Application type to Installed application.
+
+8. Click Create client ID.
+
+9. The details you need for credentials.json are the Client ID, Client secret and Redirect URIs.
+
+
+Once you have set up your credentials.json file, you can generate an OAuth token file with the
+following command. After you have finished giving permission in the browser, it stores your OAuth
+token in a file called token.dat::
+
+    $ paster getauthtoken --config=../ckan/development.ini
+
+
 Tutorial
 --------
 
-Download some GA data and store it in CKAN's db. (Ensure your CKAN pyenv is still activated, run the command from ``src/ckanext-ga-report``, alter the ``--config`` option to point to your site config file)::
+Download some GA data and store it in CKAN's database, specifying the name of the token file from the previous step (``token.dat`` by default). Ensure your CKAN pyenv is still activated, run the command from ``src/ckanext-ga-report``, and alter the ``--config`` option to point to your site config file::
 
-    $ paster loadanalytics latest --config=../ckan/development.ini
+    $ paster loadanalytics token.dat latest --config=../ckan/development.ini
 
 
 Software Licence
 ================
 
 This software is developed by Cabinet Office. It is Crown Copyright and opened up under the Open Government Licence (OGL) (which is compatible with Creative Commons Attribution License).
 
 OGL terms: http://www.nationalarchives.gov.uk/doc/open-government-licence/
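With this change, loadanalytics takes the token file as its first argument and the time period as an optional second one, as in the Tutorial command above. The helper below is a standalone sketch of how `LoadAnalytics.command` in the next file interprets them; `parse_loadanalytics_args` is a hypothetical name. It returns a mode instead of calling the downloader, so the parsing can be exercised without a CKAN config or a Google service. It also rejects a malformed date with a `ValueError` that names the accepted forms.

```python
import datetime


def parse_loadanalytics_args(args):
    """Hypothetical helper mirroring LoadAnalytics.command's argument handling.

    args[0] is the token file written by getauthtoken; args[1] is the optional
    time period ('all', 'latest' or YYYY-MM-DD, default 'latest').
    Returns (token_file, mode, since_date); since_date is set only when
    mode == 'since'.
    """
    if not args or len(args) > 2:
        raise ValueError('usage: paster loadanalytics <tokenfile> <time-period>')
    token_file = args[0]
    time_period = args[1] if len(args) > 1 else 'latest'
    if time_period in ('all', 'latest'):
        return token_file, time_period, None
    try:
        since = datetime.datetime.strptime(time_period, '%Y-%m-%d')
    except ValueError:
        raise ValueError('time period must be all, latest or YYYY-MM-DD, not %r'
                         % time_period)
    return token_file, 'since', since
```

For example, `parse_loadanalytics_args(['token.dat', '2012-10-01'])` gives `('token.dat', 'since', datetime.datetime(2012, 10, 1, 0, 0))`.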
   
 import logging
+import datetime
 
 from ckan.lib.cli import CkanCommand
 # No other CKAN imports allowed until _load_config is run, or logging is disabled
 
 class InitDB(CkanCommand):
     """Initialise the extension's database tables
     """
     summary = __doc__.split('\n')[0]
     usage = __doc__
     max_args = 0
     min_args = 0
 
     def command(self):
         self._load_config()
 
         import ckan.model as model
         model.Session.remove()
         model.Session.configure(bind=model.meta.engine)
         log = logging.getLogger('ckanext.ga-report')
 
         import ga_model
         ga_model.init_tables()
         log.info("DB tables are setup")
 
 
+class GetAuthToken(CkanCommand):
+    """Gets the Google auth token
+    """
+    summary = __doc__.split('\n')[0]
+    usage = __doc__
+    max_args = 1  # optional path to the credentials file
+    min_args = 0
+
+    def command(self):
+        from ga_auth import initialize_service
+        initialize_service('token.dat',
+                           self.args[0] if self.args
+                           else 'credentials.json')
+
 class LoadAnalytics(CkanCommand):
     """Get data from Google Analytics API and save it
     in the ga_model
 
-    Usage: paster loadanalytics <time-period>
+    Usage: paster loadanalytics <tokenfile> <time-period>
 
-    Where <time-period> is:
+    Where <tokenfile> is the name of the auth token file from
+    the getauthtoken step.
+
+    And where <time-period> is:
         all         - data for all time
         latest      - (default) just the 'latest' data
         YYYY-MM-DD  - just data for all time periods going
                       back to (and including) this date
     """
     summary = __doc__.split('\n')[0]
     usage = __doc__
-    max_args = 1
-    min_args = 0
+    max_args = 2
+    min_args = 1
 
     def command(self):
         self._load_config()
 
+        from ga_auth import initialize_service
+        try:
+            svc = initialize_service(self.args[0], None)
+        except TypeError:
+            print 'Have you correctly run the getauthtoken task and specified the correct token file here?'
+            return
+
         from download_analytics import DownloadAnalytics
-        downloader = DownloadAnalytics()
-        time_period = self.args[0] if self.args else 'latest'
+        from ga_auth import get_profile_id
+        downloader = DownloadAnalytics(svc, profile_id=get_profile_id(svc))
+
+        time_period = self.args[1] if self.args and len(self.args) > 1 else 'latest'
         if time_period == 'all':
             downloader.all_()
         elif time_period == 'latest':
             downloader.latest()
         else:
             since_date = datetime.datetime.strptime(time_period, '%Y-%m-%d')
             downloader.since_date(since_date)
 
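`LoadAnalytics.command` above turns a YYYY-MM-DD argument into a call to `DownloadAnalytics.since_date`, which splits the range into monthly periods. The sketch below reproduces that splitting as a pure function, using the same `timedelta(40)` step to reach the following month. The name `monthly_periods` and the explicit `now` parameter are assumptions, added so it can run without CKAN or the system clock. One deliberate difference: it names each finished month with `first_of_the_month.strftime(FORMAT_MONTH)`. The `since_date` method below uses `now.strftime(FORMAT_MONTH)` for those periods, which appears to give every past month the current month's name.

```python
import datetime

FORMAT_MONTH = '%Y-%m'


def monthly_periods(since_date, now):
    """Split the time from since_date's month up to `now` into calendar months.

    Returns (period_name, period_complete_day, start_date, end_date) tuples,
    the same layout DownloadAnalytics.since_date builds. Finished months get
    period_complete_day 0; the current month ends at `now` and gets now.day.
    A since_date in a later month than `now` yields no periods.
    """
    periods = []
    first_of_this_month = datetime.datetime(now.year, now.month, 1)
    first_of_the_month = datetime.datetime(since_date.year, since_date.month, 1)
    while first_of_the_month < first_of_this_month:
        # 40 days after the 1st always lands inside the following month.
        in_the_next_month = first_of_the_month + datetime.timedelta(40)
        first_of_next_month = datetime.datetime(in_the_next_month.year,
                                                in_the_next_month.month, 1)
        last_of_the_month = first_of_next_month - datetime.timedelta(1)
        periods.append((first_of_the_month.strftime(FORMAT_MONTH), 0,
                        first_of_the_month, last_of_the_month))
        first_of_the_month = first_of_next_month
    if first_of_the_month == first_of_this_month:
        # The current month is only complete up to `now`.
        periods.append((now.strftime(FORMAT_MONTH), now.day,
                        first_of_this_month, now))
    return periods
```

Passing `now` in, rather than calling `datetime.datetime.now()` inside, keeps the month boundaries deterministic. That makes edge cases such as February in a leap year easy to check.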
   
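The new `download` method in the file below replaces `GA.ga_query` with a Core Reporting API v3 request through the authorised service, `self.service.data().ga().get(...).execute()`. The two helpers below are a minimal sketch: one builds the request parameters and the other turns the response rows into per-page counts. Both names are hypothetical. The `dimensions='ga:pagePath'` entry is an assumption not present in the commit. Without a dimension the API is expected to return only totals, so there would be no per-page rows to count. The row layout (dimension value first, metric value as a string) is also an assumption.

```python
import datetime


def build_ga_query(profile_id, start_date, end_date,
                   filters='ga:pagePath=~^/dataset/'):
    """Hypothetical keyword arguments for service.data().ga().get(...).

    Mirrors the ids/filters/metrics/sort/start_date/end_date used in
    download(); 'dimensions' is an added assumption so that each row is
    one page path rather than a site-wide total.
    """
    return {
        'ids': 'ga:' + profile_id,
        'start_date': start_date.strftime('%Y-%m-%d'),
        'end_date': end_date.strftime('%Y-%m-%d'),
        'metrics': 'ga:uniquePageviews',
        'dimensions': 'ga:pagePath',
        'filters': filters,
        'sort': '-ga:uniquePageviews',
    }


def rows_to_counts(results):
    """Map page path -> unique pageviews from a v3 response dict.

    Assumes each row is [pagePath, uniquePageviews] with string values;
    a response with no matching pages may have no 'rows' key at all.
    """
    counts = {}
    for page_path, views in results.get('rows', []):
        counts[page_path] = int(views)
    return counts


if __name__ == '__main__':
    params = build_ga_query('12345', datetime.date(2012, 11, 1),
                            datetime.date(2012, 11, 30))
    # With an authorised service this would become:
    #     results = service.data().ga().get(**params).execute()
    print(sorted(params))
```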
 import logging
 import datetime
 
 from pylons import config
 
 import ga_model
-from ga_client import GA
+#from ga_client import GA
 
 log = logging.getLogger('ckanext.ga-report')
 
 FORMAT_MONTH = '%Y-%m'
 
 class DownloadAnalytics(object):
     '''Downloads and stores analytics info'''
-    def __init__(self):
+
+    def __init__(self, service=None, profile_id=None):
         self.period = config['ga-report.period']
+        self.service = service
+        self.profile_id = profile_id
 
 
     def all_(self):
-        pass
+        self.since_date(datetime.datetime(2010, 1, 1))
 
     def latest(self):
         if self.period == 'monthly':
             # from first of this month to today
             now = datetime.datetime.now()
             first_of_this_month = datetime.datetime(now.year, now.month, 1)
             periods = ((now.strftime(FORMAT_MONTH),
                         now.day,
                         first_of_this_month, now),)
         else:
             raise NotImplementedError
         self.download_and_store(periods)
 
 
     def since_date(self, since_date):
         assert isinstance(since_date, datetime.datetime)
         periods = [] # (period_name, period_complete_day, start_date, end_date)
         if self.period == 'monthly':
             first_of_the_months_until_now = []
             year = since_date.year
             month = since_date.month
             now = datetime.datetime.now()
             first_of_this_month = datetime.datetime(now.year, now.month, 1)
             while True:
                 first_of_the_month = datetime.datetime(year, month, 1)
                 if first_of_the_month == first_of_this_month:
                     periods.append((now.strftime(FORMAT_MONTH),
                                     now.day,
                                     first_of_this_month, now))
                     break
                 elif first_of_the_month < first_of_this_month:
                     in_the_next_month = first_of_the_month + datetime.timedelta(40)
-                    last_of_the_month == datetime.datetime(in_the_next_month.year,
-                                                           in_the_next_month.month, a)\
+                    last_of_the_month = datetime.datetime(in_the_next_month.year,
+                                                          in_the_next_month.month, 1)\
                                                           - datetime.timedelta(1)
                     periods.append((now.strftime(FORMAT_MONTH), 0,
                                     first_of_the_month, last_of_the_month))
                 else:
                     # first_of_the_month has got to the future somehow
                     break
                 month += 1
                 if month > 12:
                     year += 1
                     month = 1
         else:
             raise NotImplementedError
         self.download_and_store(periods)
 
     @staticmethod
     def get_full_period_name(period_name, period_complete_day):
         if period_complete_day:
             return period_name + ' (up to %ith)' % period_complete_day
         else:
             return period_name
 
 
     def download_and_store(self, periods):
         for period_name, period_complete_day, start_date, end_date in periods:
             log.info('Downloading Analytics for period "%s" (%s - %s)',
                      self.get_full_period_name(period_name, period_complete_day),
                      start_date.strftime('%Y %m %d'),
                      end_date.strftime('%Y %m %d'))
             data = self.download(start_date, end_date)
             log.info('Storing Analytics for period "%s"',
                      self.get_full_period_name(period_name, period_complete_day))
             self.store(period_name, period_complete_day, data)
 
-    @classmethod
-    def download(cls, start_date, end_date):
+    def download(self, start_date, end_date):
         '''Get data from GA for a given time period'''
         start_date = start_date.strftime('%Y-%m-%d')
         end_date = end_date.strftime('%Y-%m-%d')
         # url
         #query = 'ga:pagePath=~^%s,ga:pagePath=~^%s' % \
         #        (PACKAGE_URL, self.resource_url_tag)
         query = 'ga:pagePath=~^/dataset/'
+        #query = 'ga:pagePath=~^/User/'
         metrics = 'ga:uniquePageviews'
         sort = '-ga:uniquePageviews'
-        for entry in GA.ga_query(query_filter=query,
-                                 from_date=start_date,
+
+        # Supported query params at
+        # https://developers.google.com/analytics/devguides/reporting/core/v3/reference
+        results = self.service.data().ga().get(
+                                 ids='ga:' + self.profile_id,
+                                 filters=query,
+                                 start_date=start_date,
                                  metrics=metrics,
                                  sort=sort,
-                                 to_date=end_date):
-            print entry
-            import pdb; pdb.set_trace()
-            for dim in entry.dimension:
-                if dim.name == "ga:pagePath":
-                    package = dim.value
-                    count = entry.get_metric(
-                        'ga:uniquePageviews').value or 0
-                    packages[package] = int(c
+                                 end_date=end_date).execute()
+        self.print_results(results)
+
+#        for entry in GA.ga_query(query_filter=query,
+#                                 from_date=start_date,
+#                                 metrics=metrics,
+#                                 sort=sort,
+#                                 to_date=end_date):