[submodule "couchdb/couchdb-lucene"] | [submodule "couchdb/couchdb-lucene"] |
path = couchdb/couchdb-lucene | path = couchdb/couchdb-lucene |
url = https://github.com/rnewson/couchdb-lucene.git | url = https://github.com/rnewson/couchdb-lucene.git |
[submodule "couchdb/settee"] | [submodule "couchdb/settee"] |
path = couchdb/settee | path = couchdb/settee |
url = https://github.com/inadarei/settee.git | url = https://github.com/inadarei/settee.git |
[submodule "lib/springy"] | |
path = lib/springy | |
url = https://github.com/dhotson/springy.git | |
[submodule "lib/php-diff"] | |
path = lib/php-diff | |
url = https://github.com/chrisboulton/php-diff.git | |
[submodule "javascripts/flot"] | |
path = javascripts/flot | |
url = https://github.com/paradoxxxzero/flot.git | |
<?php | <?php |
include_once('include/common.inc.php'); | include_once('include/common.inc.php'); |
include_header(); | include_header(); |
?> | ?> |
<div class="foundation-header"> | <div class="foundation-header"> |
<h1><a href="about.php">About/FAQ</a></h1> | <h1><a href="about.php">About/FAQ</a></h1> |
<h4 class="subheader">Lorem ipsum.</h4> | <h4 class="subheader">Lorem ipsum.</h4> |
</div> | </div> |
<h2> What is this? </h2> | |
Disclosr is a project to monitor Australian Federal Government agencies' | |
compliance with their <a href="http://www.oaic.gov.au/publications/other_operational/foi_policy_frequently_asked_questions.html#_Toc291837571">"proactive disclosure requirements"</a>. | |
OGRE (Open Government Realization Evaluation) is a ranking of compliance with these requirements. | |
Prometheus is the agent which polls agency websites to assess compliance. | |
<h2> Open everything </h2> | <h2> Open everything </h2> |
all documents released CC-BY 3 AU | All documents released CC-BY 3 AU |
Open source git @ | Open source git @ |
<h2>Organisational Data Sources</h2> | <h2>Organisational Data Sources</h2> |
http://www.comlaw.gov.au/Browse/Results/ByTitle/AdministrativeArrangementsOrders/Current/Ad/0 defines departments | http://www.comlaw.gov.au/Browse/Results/ByTitle/AdministrativeArrangementsOrders/Current/Ad/0 defines departments |
Agencies can be found in the Schedule to an Appropriation Bill (budget), Schedule to FMA Regulations and/or Public Service Act. | Agencies can be found in the Schedule to an Appropriation Bill (budget), Schedule to FMA Regulations and/or Public Service Act.<br> |
http://www.finance.gov.au/publications/flipchart/docs/FMACACFlipchart.pdf summarises these | http://www.finance.gov.au/publications/flipchart/docs/FMACACFlipchart.pdf summarises these. view-source:https://www.tenders.gov.au/?event=public.advancedsearch.home is great for the suspended/active status<br> |
When defining the hierarchy, this system is designed towards monitoring accountability. Thus large agencies that have registered their own ABN | When defining the hierarchy, this system is designed towards monitoring accountability. Thus large agencies that have registered their own ABN |
and have their own accountability mechanisms/website recieve a separate record as a child of their department. | and have their own accountability mechanisms/website receive a separate record as a child of their department. |
Some small agencies will choose to simply rely on their parent department's accountability measures. | Some small agencies will choose to simply rely on their parent department's accountability measures.<br> |
This flows through to organisation name and other/past names. A department that accounts for an agency will list that agency as an other child name. | This flows through to organisation name and other/past names. A department that completely accounts for an agency will list that agency as an other child name. |
As agencies themselves shift between departments, there may be scope for providing time ranges but typically the newest hierarchy will be the one recorded. | As agencies themselves shift between departments, there may be scope for providing time ranges but typically the newest hierarchy will be the one recorded. |
A department/agency name will be the newest active name assigned to that ABN. | A department/agency name will be the newest active name assigned to that ABN.<br> |
ABN information is derived from the ABR. This is the definitive umpire about which former name should be linked to which current name. | |
For example "Department of Transport and Regional Services" became "Department of Infrastructure, Transport, Regional Development and Local Government" (same ABN) | |
however it later split into "Department of Infrastructure and Transport" (same ABN) | |
and "Department of Regional Australia, Regional Development and Local Government" (new ABN).<br> | |
Statistical information from http://www.apsc.gov.au/stateoftheservice/1011/statsbulletin/section1.html#t2total https://www.apsedii.gov.au/apsedii/CustomQueryx33.shtml | Statistical information from http://www.apsc.gov.au/stateoftheservice/1011/statsbulletin/section1.html#t2total https://www.apsedii.gov.au/apsedii/CustomQueryx33.shtml |
and individual annual reports. | and individual annual reports.<br> |
<h2>Webpage Assessment</h2> | |
Much due care has been put into correctly recording disclosure URLs. Typically the "About", "Corporate", "Publications" and "Sitemap" sections are checked at the very least. | |
Occasionally it is necessary to use a site or Google search. In several rare cases, there is a secret "Disclosure" navigation menu that you can only find by first finding one of the mandatory publishing obligations in that category (seriously).<br> | |
Some rules about leniency:<br> | |
<ul> | |
<li>An empty FOI disclosure log counts, a page outlining what the FOI Act is does not.</li> | |
<li>A disclosure log in PDF or Word format counts :(</li> | |
<li>An empty File/Record list counts (although claiming to have no files, electronic or paper, is very minimalistic)</li> | |
<li>Only a current information publication scheme page counts, not a s.9 FOI Act page or an organisation chart.</li> | |
<li>If there isn't a page easily listing all current and past Annual Reports, the most current one (html, pdf) counts.</li> | |
<li>Consultancy contracts might not need their own webpage (if in Annual Report), grants/appointments might not apply to all organisations but Legal Services Expenditure (and all other obligations) does need a webpage. </li> | |
</ul> | |
<h2>Open Government Scoring</h2> | <h2>Open Government Scoring</h2> |
+1 point for every true Has... attribute | +1 point for every true Has... attribute<br> |
-1 point for every false Has... (ie. Has Not) attribute | -1 point for every false Has... (ie. Has Not) attribute<br> |
Don't like this? Make your own score, suggest a better scoring mechanism. | Don't like this? Make your own score, suggest a better scoring mechanism.<br> |
<?php | <?php |
include_footer(); | include_footer(); |
?> | ?> |
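The scoring rule stated in the page above (+1 for every true Has... attribute, -1 for every false one) can be sketched roughly as follows. This is only an illustration, not code from the repository: the function name `openGovScore`, the example record and its attribute names are made up, and it assumes flags are stored as the strings "true"/"on"/"false" (or as booleans), as the edit form elsewhere in this dump suggests.

```php
<?php
// Hypothetical helper: not part of the Disclosr source.
// Scores an agency record: +1 for each true "has..." attribute, -1 for each false one.
function openGovScore(array $agency) {
    $score = 0;
    foreach ($agency as $key => $value) {
        // Only attributes whose name starts with "has" are scored.
        if (strpos($key, "has") !== 0) {
            continue;
        }
        // Assumption: flags are stored as "true"/"on" strings or as booleans.
        $isTrue = ($value === true || $value === "true" || $value === "on");
        $score += $isTrue ? 1 : -1;
    }
    return $score;
}

// Made-up record for illustration; the attribute names are hypothetical.
$example = array(
    "name" => "Example Agency",
    "hasFOIDisclosureLog" => "true",
    "hasAnnualReports" => "true",
    "hasFileList" => "false",
);
echo openGovScore($example) . PHP_EOL; // two true flags, one false flag
```

Anyone who prefers a different weighting could swap the `+1`/`-1` terms for per-attribute weights without changing the loop.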
AAF Company,82?008?629?490 | |
Aboriginal Hostels Limited ,47?008?504?587 | |
Administrative Appeals Tribunal,90?680?970?626 | |
Aged Care Standards and Accreditation Agency Ltd,64?079?618?652 | |
Airservices Australia ,59?698?720?886 | |
Albury-Wodonga Development Corporation ,71?893?478?442 | |
Anindilyakwa Land Council ,45?175?406?445 | |
Army and Air Force Canteen Service ,69?289?134?420 | |
ASC Pty Ltd ,64?008?605?034 | |
Attorney-General's Department,92?661?124?436 | |
Australia Business Arts Foundation Ltd ,88?072?479?835 | |
Australia Council,38?392?626?187 | |
Australian Agency for International Development (AusAID),62?921?558?838 | |
Australian Broadcasting Corporation,52?429?278?345 | |
Australian Bureau of Statistics,26?331?428?522 | |
Australian Centre for International Agricultural Research (ACIAR),34?864?955?427 | |
Australian Commission for Law Enforcement Integrity (ACLEI),78?796?734?093 | |
Australian Commission on Safety and Quality in Health Care,97250687371 | |
Australian Communications and Media Authority (ACMA),55?386?169?386 | |
Australian Competition and Consumer Commission,94?410?483?623 | |
Australian Crime Commission,11?259?448?410 | |
"Australian Curriculum, Assessment and Reporting Authority ",54?735?928?084 | |
Australian Customs and Border Protection Service,66?015?286?036 | |
Australian Electoral Commission,21?133?285?851 | |
Australian Federal Police,17?864?931?143 | |
"Australian Film, Television and Radio School",19?892?732?021 | |
Australian Fisheries Management Authority,81?098?497?517 | |
Australian Government Solicitor,69?405?937?639 | |
Australian Hearing Services ,80?308?797?003 | |
Australian Human Rights Commission,47?996?232?602 | |
Australian Industry Development,55?085?059?559 | |
Australian Institute for Teaching and School Leadership Limited,17?117?362?740 | |
Australian Institute of Aboriginal and Torres Strait Islander Studies,62?020?533?641 | |
Australian Institute of Criminology,63257175248 | |
Australian Institute of Family Studies (AIFS),64?001?053?079 | |
Australian Institute of Health and Welfare ,16?515?245?497 | |
Australian Institute of Marine Science,78?961?616?230 | |
Australian Law Reform Commission,88913413914 | |
Australian Learning and Teaching Council Limited ,30?109?826?628 | |
Australian Maritime Safety Authority,65?377?938?320 | |
Australian Military Forces Relief Trust Fund ,52?168?913?646 | |
Australian National Audit Office ,33?020?645?631 | |
Australian National Maritime Museum,35?023?590?988 | |
Australian National Preventive Health Agency (ANPHA),33?965?140?953 | |
Australian National University,52?234?063?906 | |
Australian Nuclear Science and Technology Organisation ,47?956?969?590 | |
Australian Office of Financial Management (AOFM),13?059?525?039 | |
Australian Pesticides and Veterinary Medicines Authority (APVMA),19?495?043?447 | |
Australian Postal Corporation,28?864?970?579 | |
Australian Prudential Regulation Authority (APRA),79?635?582?658 | |
Australian Public Service Commission (APS Commission),99?470?863?260 | |
Australian Radiation Protection and Nuclear Safety Agency (ARPANSA),61?321?195?155 | |
Australian Rail Track Corporation Limited ,75?081?455?754 | |
Australian Reinsurance Pool Corporation,74?807?136?872 | |
Australian Research Council,35?201?451?156 | |
Australian River Co. Limited,94?008?654?206 | |
Australian Secret Intelligence Service,49?667?785?014 | |
Australian Securities and Investments Commission,86?768?265?615 | |
Australian Security Intelligence Organisation,37?467?566?201 | |
Australian Skills Quality Authority (National Vocational Education and Training Regulator),72581678650 | |
Australian Solar Institute Limited ,65138300688 | |
Australian Sports Anti-Doping Authority (ASADA),91?592?527?503 | |
Australian Sports Commission,67374695240 | |
Australian Sports Foundation Limited ,27?008?613?858 | |
Australian Strategic Policy Institute Limited ,77?097?369?045 | |
Australian Taxation Office,51?824?753?556 | |
Australian Trade Commission (Austrade),11?764?698?227 | |
Australian Transaction Reports and Analysis Centre (AUSTRAC),32?770?513?371 | |
Australian Transport Safety Bureau (ATSB),86?267?354?017 | |
Australian War Memorial ,64?909?221?257 | |
Bundanon Trust,72?058?829?217 | |
Bureau of Meteorology,92?637?533?532 | |
Cancer Australia,21?075?951?918 | |
Central Land Council,71?979?619?393 | |
Civil Aviation Safety Authority,44?808?014?470 | |
Coal Mining Industry (Long Service Leave Funding) Corporation,12?039?670?644 | |
Comcare ,41?640?788?304 | |
Commonwealth Grants Commission,64?703?642?210 | |
Commonwealth Scientific and Industrial Research Organisation,41?687?119?230 | |
Commonwealth Superannuation Corporation ,48882817243 | |
ComSuper,77?310?752?950 | |
Corporations and Markets Advisory Committee (CAMAC),41?574?479?010 | |
Cotton Research and Development Corporation,71?054?238?316 | |
CrimTrac Agency,17?193?904?699 | |
Defence Housing Australia,72?968?504?934 | |
"Department of Agriculture, Fisheries and Forestry ",24?113?085?695 | |
"Department of Broadband, Communications and the Digital Economy",51?491?646?726 | |
Department of Climate Change and Energy Efficiency,50?182?626?845 | |
"Department of Education, Employment and Workplace Relations",63?578?775?294 | |
"Department of Families, Housing, Community Services and Indigenous Affairs",36?342?015?855 | |
Department of Finance and Deregulation,61?970?632?495 | |
Department of Foreign Affairs and Trade,47?065?634?525 | |
Department of Health and Ageing,83?605?426?759 | |
Department of Human Services,90?794?605?008 | |
Department of Immigration and Citizenship,33?380?054?835 | |
Department of Infrastructure and Transport,86?267?354?017 | |
"Department of Innovation, Industry, Science and Research",74?599?608?295 | |
Department of Parliamentary Services,52?997?141?147 | |
"Department of Regional Australia, Regional Development and Local Government",37?862?725?624 | |
"Department of Resources, Energy and Tourism",46?252?861?927 | |
"Department of Sustainability, Environment, Water, Population and Communities",34?190?894?983 | |
Department of the House of Representatives,18?526?287?740 | |
Department of the Prime Minister and Cabinet,18?108?001?191 | |
Department of the Senate,23?991?641?527 | |
Department of the Treasury,92?802?414?793 | |
Department of Veterans' Affairs,23?964?290?824 | |
Director of National Parks ,13?051?694?963 | |
Equal Opportunity for Women in the Workplace Agency,47?641?643?874 | |
Export Finance and Insurance Corporation,96?874?024?697 | |
Fair Work Australia (FWA),93?614?579?199 | |
Family Court of Australia,63?684?208?971 | |
Federal Court of Australia,49?110?847?399 | |
Federal Magistrates Court of Australia,60?265?617?271 | |
Fisheries Research and Development Corporation,74?311?094?913 | |
Food Standards Australia New Zealand,20?537?066?246 | |
Future Fund Management Agency,53?156?699?293 | |
General Practice Education and Training Limited,95?095?433?140 | |
Geoscience Australia,80?091?799?039 | |
Grains Research and Development Corporation ,55?611?223?291 | |
Grape and Wine Research and Development Corporation,72?618?007?571 | |
Great Barrier Reef Marine Park Authority,12?949?356?885 | |
Health Workforce Australia,21?295?050?589 | |
HIH Claims Support Limited,92?096?857?635 | |
IIF Investments Pty Limited,55?082?153?884 | |
Indigenous Business Australia,25?192?932?833 | |
Indigenous Land Corporation,59?912?679?254 | |
Insolvency and Trustee Service Australia (ITSA),63?384?330?717 | |
Inspector-General of Taxation,51?248?702?319 | |
Interim Independent Hospital Pricing Authority,27598959960 | |
IP Australia,38?113?072?755 | |
Low Carbon Australia Limited,63?097?727?968 | |
Medibank Private Limited ,47?080?890?259 | |
Migration Review Tribunal and Refugee Review Tribunal ,50?760?799?564 | |
Murray-Darling Basin Authority,13?679?821?382 | |
National Archives of Australia,36?889?228?992 | |
National Australia Day Council Limited ,76?050?300?626 | |
National Blood Authority,87?361?602?478 | |
National Breast and Ovarian Cancer Centre,85?094?118?902 | |
National Capital Authority,75?149?374?427 | |
National Competition Council ,56?552?760?098 | |
National Film and Sound Archive,41?251?017?588 | |
National Gallery of Australia,27?855?975?449 | |
National Health and Medical Research Council (NHMRC),88?601?010?284 | |
National Library of Australia ,28?346?858?075 | |
National Museum of Australia ,70?592?297?967 | |
National Native Title Tribunal,70?238?042?351 | |
National Offshore Petroleum Safety Authority (NOPSA),22?385?178?289 | |
National Water Commission ,94?364?176?431 | |
NBN Co Limited,86?136?533?741 | |
Northern Land Council,56?327?515?336 | |
Office of National Assessments,87?904?367?991 | |
Office of Parliamentary Counsel,41?425?630?817 | |
Office of the Auditing and Assurance Standards Board ,80?959?780?601 | |
Office of the Australian Accounting Standards Board (AASB),92?702?019?575 | |
Office of the Australian Building and Construction Commissioner,68?003?725?098 | |
Office of the Australian Information Commissioner ,85249230937 | |
Office of the Commonwealth Ombudsman,53?003?678?148 | |
Office of the Director of Public Prosecutions,41?036?606?436 | |
Office of the Fair Work Ombudsman,71?141?751?477 | |
Office of the Inspector-General of Intelligence and Security,67?332?668?643 | |
Office of the Official Secretary to the Governor-General,67?582?329?284 | |
Office of the Renewable Energy Regulator,68?574?011?917 | |
Old Parliament House,30?620?774?963 | |
Organ and Tissue Authority (Australian Organ and Tissue Donation and Transplantation Authority),56?253?405?315 | |
Outback Stores Pty Ltd ,63120661234 | |
Private Health Insurance Administration Council ,50?831?782?014 | |
Private Health Insurance Ombudsman,61?673?137?709 | |
Productivity Commission,78?094?372?050 | |
Professional Services Review Scheme,45?307?308?260 | |
RAAF Welfare Recreational Company ,45?008?499?303 | |
Reserve Bank of Australia,50?008?559?486 | |
Royal Australian Air Force Veterans' Residences Trust Fund ,40?594?141?285 | |
Royal Australian Air Force Welfare Trust Fund ,24?616?803?717 | |
Royal Australian Mint,45?852?104?259 | |
Royal Australian Navy Central Canteens Board,50?616?294?781 | |
Royal Australian Navy Relief Trust Fund ,49?934?525?476 | |
Rural Industries Research and Development Corporation,25?203?754?319 | |
Safe Work Australia,81?840?374?163 | |
Screen Australia ,46?741?353?180 | |
"Seafarers Safety, Rehabilitation and Compensation Authority (Seacare Authority)",32?745?854?352 | |
Special Broadcasting Service Corporation,91?314?398?574 | |
Sugar Research and Development Corporation,41?343?997?980 | |
Sydney Harbour Federation Trust,14?178?614?905 | |
Tertiary Education Quality and Standards Agency,50658250012 | |
Tiwi Land Council,86?106?441?085 | |
Torres Strait Regional Authority,57?155?285?807 | |
Tourism Australia ,99?657?548?712 | |
Wheat Exports Australia,40?485?918?341 | |
Wine Australia Corporation ,59?728?300?326 | |
Wreck Bay Aboriginal Community Council,62?564?797?956 | |
<?php | |
require_once '../include/common.inc.php'; | |
try { | |
$server->create_db('disclosr-agencies'); | |
} catch (SetteeRestClientException $e) { | |
setteErrorHandler($e); | |
} | |
$db = $server->get_db('disclosr-agencies'); | |
createAgencyDesignDoc(); | |
$conn = new PDO("pgsql:dbname=contractDashboard;user=postgres;password=snmc;host=localhost"); | |
$namesQ = 'select agency.abn, string_agg("agencyName",\'|\') as names from agency inner join agency_nametoabn on agency.abn::text = agency_nametoabn.abn group by agency.abn;'; | |
$abntonames = Array(); | |
foreach ($conn->query($namesQ) as $row) { | |
$abntonames[$row['abn']] = explode("|", $row['names']); | |
} | |
$result = $conn->query("select * from agency"); | |
while ($agency = $result->fetch(PDO::FETCH_ASSOC)) { | |
$agency['_id'] = md5($agency['abn']); | |
$agency['otherNames'] = $abntonames[$agency['abn']]; | |
if (sizeof($abntonames[$agency['abn']]) == 1) | |
$agency['name'] = $abntonames[$agency['abn']][0]; | |
$agency["lastScraped"] = "1/1/1970"; | |
$agency["scrapeDepth"] = 1; | |
try { | |
$doc = $db->save($agency); | |
//print_r($doc); | |
echo $agency['abn'] . " imported \n<br>"; | |
} catch (SetteeRestClientException $e) { | |
setteErrorHandler($e); | |
} | |
} | |
?> | |
<?php | |
require_once '../include/common.inc.php'; | |
$db = $server->get_db('disclosr-agencies'); | |
createAgencyDesignDoc(); | |
?> | |
<?php | |
include_once('../include/common.inc.php'); | |
include_header(); | |
// Include the diff class | |
echo '<STYLE TYPE="text/css"> | |
<!-- | |
@import url(../lib/php-diff/example/styles.css); | |
--> | |
</STYLE> | |
'; | |
require_once dirname(__FILE__) . '/../lib/php-diff/lib/Diff.php'; | |
// Generate a side by side diff | |
require_once dirname(__FILE__) . '/../lib/php-diff/lib/Diff/Renderer/Html/SideBySide.php'; | |
$renderer = new Diff_Renderer_Html_SideBySide; | |
$db = $server->get_db('disclosr-agencies'); | |
$docs = Array(); | |
try { | |
$rows = $db->get_view("app", "getConflicts")->rows; | |
//print_r($rows); | |
foreach ($rows as $row) { | |
echo '<h2>' . $row->id . '</h2>'; | |
echo "Comparing " . $row->value[0] . " and " . $row->value[1]; | |
$docA = explode(",", json_encode($db->get($row->id . "?rev=" . $row->value[0]))); | |
$docB = explode(",", json_encode($db->get($row->id . "?rev=" . $row->value[1]))); | |
// Options for generating the diff | |
$options = array( | |
//'ignoreWhitespace' => true, | |
//'ignoreCase' => true, | |
); | |
// Initialize the diff class | |
$diff = new Diff($docA, $docB, $options); | |
echo $diff->Render($renderer); | |
} | |
} catch (SetteeRestClientException $e) { | |
setteErrorHandler($e); | |
} | |
include_footer(); | |
?> |
<?php | |
include_once('../include/common.inc.php'); | |
include_header(); | |
$db = $server->get_db('disclosr-agencies'); | |
$docs = Array(); | |
try { | |
$rows = $db->get_view("app", "byABN")->rows; | |
//print_r($rows); | |
foreach ($rows as $row) { | |
$docs["a" . $row->key] = $row->value; | |
} | |
} catch (SetteeRestClientException $e) { | |
setteErrorHandler($e); | |
} | |
//print_r($docs); | |
$row = 1; | |
if (($handle = fopen("cacfma.csv", "r")) !== FALSE) { | |
while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) { | |
$row++; | |
echo $data[0] . " " . str_replace("?", "", $data[1]) . "<br />\n"; | |
$name = $data[0]; | |
$abn = trim(str_replace("?", "", $data[1])); | |
$aabn = "a".$abn; | |
if (isset($docs[$aabn])) { | |
echo "Existing agency ABN detected<br>"; | |
if (!in_array($name, object_to_array($docs[$aabn]->otherNames)) && $name != $docs[$aabn]->name) { | |
$docs[$aabn]->otherNames[] = $name; | |
try { | |
$docs[$aabn] = $db->save($docs[$aabn]); | |
//print_r($doc); | |
echo $abn . " additional names imported \n<br>"; | |
} catch (SetteeRestClientException $e) { | |
setteErrorHandler($e); | |
} | |
} | |
} else { | |
echo "New agency ABN detected<br>"; | |
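$agency = Array(); | |
// Use the same _id scheme (md5 of the bare ABN) as the database import script | |
$agency['_id'] = md5($abn); | |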
$agency['name'] = $name; | |
$agency["abn"] = $abn; | |
try { | |
$doc = $db->save($agency); | |
print_r($doc); | |
echo $abn . " imported \n<br>"; | |
} catch (SetteeRestClientException $e) { | |
setteErrorHandler($e); | |
} | |
} | |
echo "<hr>"; | |
} | |
fclose($handle); | |
} | |
include_footer(); | |
?> |
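The CSV import above strips the `?` placeholder characters that separate the digit groups of each ABN in cacfma.csv. A minimal sketch of that clean-up as a reusable helper is shown below. The name `normalizeAbn` is hypothetical and does not exist in the repository; the only outside fact it relies on is that an ABN has 11 digits.

```php
<?php
// Hypothetical helper, not part of the Disclosr source.
// Reduces a raw ABN field such as "82?008?629?490" to its 11 digits.
function normalizeAbn($raw) {
    // Remove everything that is not a digit ("?" placeholders, spaces, etc.).
    $digits = preg_replace('/[^0-9]/', '', $raw);
    // An ABN has 11 digits; return null for anything else.
    return (strlen($digits) === 11) ? $digits : null;
}

echo normalizeAbn("82?008?629?490") . PHP_EOL; // value taken from the CSV above
var_dump(normalizeAbn("12345"));               // too short, so null
```

Rejecting fields that do not contain exactly 11 digits would also catch rows where the name and ABN columns were swapped or truncated before they reach the database.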
<?php | |
include_once("../include/common.inc.php"); | |
function shortName($name) { | |
$name = trim($name); | |
if (strstr($name,"Minister ") || strstr($name,"Treasurer") || strstr($name,"Parliamentary Secretary")) { | |
$badWords = Array ("Assisting the Prime Minister on","Assisting on"," the "," of "," for "," on "," and "," to ",","," ","'","`"); | |
return str_replace($badWords,"",$name); | |
} | |
else { | |
$out = Array(); | |
preg_match_all('/[A-Z]/', $name, $out); | |
return implode("", $out[0]); | |
} | |
} | |
setlocale(LC_CTYPE, 'C'); | |
$headers = Array("#id", "name", "request_email", "short_name", "notes", "publication_scheme", "home_page", "tag_string"); | |
$db = $server->get_db('disclosr-agencies'); | |
$tag = Array(); | |
try { | |
$rows = $db->get_view("app", "byDeptStateName", null, true)->rows; | |
//print_r($rows); | |
foreach ($rows as $row) { | |
$tag[$row->id] = phrase_to_tag(dept_to_portfolio($row->key)); | |
} | |
} catch (SetteeRestClientException $e) { | |
setteErrorHandler($e); | |
die(); | |
} | |
$foiEmail = Array(); | |
try { | |
$rows = $db->get_view("app", "foiEmails", null, true)->rows; | |
//print_r($rows); | |
foreach ($rows as $row) { | |
$foiEmail[$row->key] = $row->value; | |
} | |
} catch (SetteeRestClientException $e) { | |
setteErrorHandler($e); | |
die(); | |
} | |
$fp = fopen('php://output', 'w'); | |
if ($fp && $db) { | |
header('Content-Type: text/csv; charset=utf-8'); | |
header('Content-Disposition: attachment; filename="export.' . date("c") . '.csv"'); | |
header('Pragma: no-cache'); | |
header('Expires: 0'); | |
fputcsv($fp, $headers); | |
try { | |
$agencies = $db->get_view("app", "byCanonicalName", null, true)->rows; | |
//print_r($rows); | |
foreach ($agencies as $agency) { | |
// print_r($agency); | |
if (isset($agency->value->foiEmail) && $agency->value->foiEmail != "null" && !isset($agency->value->status)) { | |
$row = Array(); | |
$row["#id"] = $agency->id; | |
$row["name"] = trim($agency->value->name); | |
if (isset($agency->value->foiEmail)) { | |
$row["request_email"] = $agency->value->foiEmail; | |
} else { | |
if ($agency->value->orgType == "FMA-DepartmentOfState") { | |
$row["request_email"] = "foi@" . GetDomain($agency->value->website); | |
} else { | |
$row["request_email"] = $foiEmail[$agency->value->parentOrg]; | |
} | |
} | |
if (isset($agency->value->shortName)) { | |
$row["short_name"] = $agency->value->shortName; | |
} else { | |
$row["short_name"] = shortName($agency->value->name); | |
} | |
$row["notes"] = ""; | |
$row["publication_scheme"] = (isset($agency->value->infoPublicationSchemeURL) ? $agency->value->infoPublicationSchemeURL : ""); | |
$row["home_page"] = (isset($agency->value->website) ? $agency->value->website : ""); | |
if ($agency->value->orgType == "FMA-DepartmentOfState") { | |
$row["tag_string"] = $tag[$agency->value->_id] . " " . $agency->value->orgType; | |
} else { | |
$row["tag_string"] = $tag[$agency->value->parentOrg] . " " . $agency->value->orgType; | |
} | |
fputcsv($fp, array_values($row)); | |
if (isset($agency->value->foiBodies)) { | |
foreach ($agency->value->foiBodies as $foiBody) { | |
$row['name'] = iconv("UTF-8", "ASCII//TRANSLIT",$foiBody); | |
$row["short_name"] = shortName($foiBody); | |
fputcsv($fp, array_values($row)); | |
} | |
} | |
} | |
} | |
} catch (SetteeRestClientException $e) { | |
setteErrorHandler($e); | |
} | |
die; | |
} | |
?> | |
<?php | |
include_once("../include/common.inc.php"); | |
setlocale(LC_CTYPE, 'C'); | |
header('Content-Type: text/plain'); // the download is a Ruby source file, not CSV | |
header('Content-Disposition: attachment; filename="public_body_categories_en.rb"'); | |
header('Pragma: no-cache'); | |
header('Expires: 0'); | |
echo 'PublicBodyCategories.add(:en, [' . PHP_EOL; | |
echo ' "Portfolios",' . PHP_EOL; | |
$db = $server->get_db('disclosr-agencies'); | |
try { | |
$rows = $db->get_view("app", "byDeptStateName", null, true)->rows; | |
//print_r($rows); | |
foreach ($rows as $row) { | |
echo ' [ "' . phrase_to_tag(dept_to_portfolio($row->key)) . '","' . dept_to_portfolio($row->key) . '","part of the ' . dept_to_portfolio($row->key) . ' portfolio" ],' . PHP_EOL; | |
} | |
} catch (SetteeRestClientException $e) { | |
setteErrorHandler($e); | |
} | |
echo '])'; | |
?> | |
<?php | |
include_once('include/common.inc.php'); | |
include_header(); | |
$db = $server->get_db('disclosr-agencies'); | |
?> | |
<div class="foundation-header"> | |
<h1><a href="about.php">Charts</a></h1> | |
<h4 class="subheader">Lorem ipsum.</h4> | |
</div> | |
<div id="placeholder" style="width:900px;height:600px;"></div> | |
<script id="source"> | |
window.onload = function() { | |
$(document).ready(function() { | |
var d1 = []; | |
var labels = []; | |
<?php | |
try { | |
$rows = $db->get_view("app", "scoreHas?group=true", null, true)->rows; | |
/*foreach ($rows as $key => $row) { | |
echo " d1.push([$key, {$row->value}]);".PHP_EOL; | |
echo " labels.push('{$row->key}');".PHP_EOL; | |
}*/ | |
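// Index by the (unique) view key so rows with equal values do not overwrite each other | |
$dataValues = Array(); | |
foreach ($rows as $row) { | |
$dataValues[$row->key] = $row->value; | |
} | |
$i = 0; | |
// Sort by value, ascending, keeping the key => value association | |
asort($dataValues); | |
foreach($dataValues as $key => $value) { | |
echo " d1.push([$i, $value]);".PHP_EOL; | |
echo " labels.push('$key');".PHP_EOL; | |
$i++; | |
} | |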
} catch (SetteeRestClientException $e) { | |
setteErrorHandler($e); | |
} | |
?> | |
$.plot($("#placeholder"), [ d1], { | |
grid: { hoverable: true }, | |
series: { | |
bars: { show: true, barWidth: 0.6 } | |
}, | |
xaxis: { | |
tickFormatter: function formatter(val, axis) { | |
if (labels[val]) { | |
return(labels[val]); | |
} else { | |
return ""; | |
} | |
}, | |
labelAngle: 90 | |
} | |
}); | |
var previousPoint = null; | |
$("#placeholder").bind("plothover", function (event, pos, item) { | |
if (item) { | |
if (previousPoint != item.dataIndex) { | |
previousPoint = item.dataIndex; | |
$("#tooltip").remove(); | |
var x = item.datapoint[0], | |
y = item.datapoint[1] - item.datapoint[2]; | |
showTooltip(item.pageX, item.pageY, y ); | |
} | |
} | |
else { | |
$("#tooltip").remove(); | |
previousPoint = null; | |
} | |
}); | |
}); | |
}; | |
function showTooltip(x, y, contents) { | |
$('<div id="tooltip">' + contents + '</div>').css( { | |
position: 'absolute', | |
display: 'none', | |
top: y + 5, | |
left: x + 5, | |
border: '1px solid #fdd', | |
padding: '2px', | |
'background-color': '#fee', | |
opacity: 0.80 | |
}).appendTo("body").fadeIn(200); | |
} | |
</script> | |
<?php | |
include_footer(); | |
?> |
<?php | <?php |
include_once('include/common.inc.php'); | include_once('include/common.inc.php'); |
include_header(); | include_header(); |
function displayValue($key, $value, $mode) { | function displayValue($key, $value, $mode) { |
global $db, $schemas; | |
if ($mode == "view") { | if ($mode == "view") { |
echo "<tr>"; | |
echo "<td>" . $schemas['agency']["properties"][$key]['x-title'] . "<br><small>" . $schemas['agency']["properties"][$key]['description'] . "</small></td><td>"; | |
if (is_array($value)) { | if (is_array($value)) { |
echo "<tr><td>$key</td><td><ol>"; | echo "<ol>"; |
foreach ($value as $subkey => $subvalue) { | foreach ($value as $subkey => $subvalue) { |
echo "<li>$subvalue</li>"; | if (isset($schemas['agency']["properties"][$key]['x-itemprop'])) { |
echo '<li itemprop="' . $schemas['agency']["properties"][$key]['x-itemprop'] . '">'; | |
} else { | |
echo "<li>"; | |
} | |
echo "$subvalue</li>"; | |
} | } |
echo "</ol></td></tr>"; | echo "</ol></td></tr>"; |
} else { | } else { |
echo "<tr><td>$key</td><td>$value</td></tr>"; | if (isset($schemas['agency']["properties"][$key]['x-itemprop'])) { |
echo '<span itemprop="' . $schemas['agency']["properties"][$key]['x-itemprop'] . '">'; | |
} else { | |
echo "<span>"; | |
} | |
if ((strpos($key, "URL") > 0 || $key == 'website') && $value != "") { | |
echo "<a href='$value'>view</a></span>"; | |
} else { | |
echo "$value</span>"; | |
} | |
} | } |
echo "</td></tr>"; | |
} | } |
if ($mode == "edit") { | if ($mode == "edit") { |
if (is_array($value)) { | if (is_array($value)) { |
echo '<div class="row"> | echo '<div class="row"> |
<div class="seven columns"> | <div class="seven columns"> |
<fieldset> | <fieldset> |
<h5>' . $key . '</h5>'; | <h5>' . $key . '</h5>'; |
foreach ($value as $subkey => $subvalue) { | foreach ($value as $subkey => $subvalue) { |
echo "<label>$subkey</label><input class='input-text' type='text' id='$key$subkey' name='$key" . '[' . $subkey . "]' value='$subvalue'/>"; | echo "<label>$subkey</label><input class='input-text' type='text' id='$key$subkey' name='$key" . '[' . $subkey . "]' value='$subvalue'/>"; |
} | } |
echo "</fieldset> | echo "</fieldset> |
</div> | </div> |
</div>"; | </div>"; |
} else { | } else { |
if (strpos($key, "_") === 0) { | if (strpos($key, "_") === 0) { |
echo"<input type='hidden' id='$key' name='$key' value='$value'/>"; | echo"<input type='hidden' id='$key' name='$key' value='$value'/>"; |
} if (strpos($key, "has") === 0) { | } else if ($key == "parentOrg") { |
echo "<label for='$key'><input type='checkbox' id='$key' name='$key' value='$value'> $key</label>"; | echo "<label for='$key'>$key</label><select id='$key' name='$key'><option value=''> Select... </option>"; |
$rows = $db->get_view("app", "byDeptStateName")->rows; | |
//print_r($rows); | |
foreach ($rows as $row) { | |
echo "<option value='{$row->value}'" . (($row->value == $value) ? " selected" : "") . " >" . str_replace("Department of ", "", $row->key) . "</option>"; | |
} | |
echo" </select>"; | |
} else if (strpos($key, "has") === 0) { | |
echo "<label for='$key'><input type='checkbox' id='$key' name='$key' " . (($value == 'on' || $value == 'true') ? "checked='$value'" : "") . "> $key</label>"; | |
} else { | } else { |
echo "<label>$key</label><input class='input-text' type='text' id='$key' name='$key' value='$value'/>"; | echo "<label>$key</label><input class='input-text' type='text' id='$key' name='$key' value='$value'/>"; |
if ((strpos($key,"URL") > 0 || $key == 'website')&& $value != "") { | if ((strpos($key, "URL") > 0 || $key == 'website') && $value != "") { |
echo "<a href='$value'>view</a>"; | echo "<a href='$value'>view</a>"; |
} | } |
if ($key == 'abn') { | if ($key == 'abn') { |
echo "<a href='http://www.abr.business.gov.au/SearchByAbn.aspx?SearchText=33380054835'>view abn</a>"; | echo "<a href='http://www.abr.business.gov.au/SearchByAbn.aspx?SearchText=$value'>view abn</a>"; |
} | } |
} | } |
} | } |
} | } |
// | // |
} | } |
function addDefaultFields($row) { | function addDefaultFields($row) { |
$defaultFields = Array("name"); | global $schemas; |
$defaultFields = array_keys($schemas['agency']['properties']); | |
foreach ($defaultFields as $defaultField) { | foreach ($defaultFields as $defaultField) { |
if (!isset($row[$defaultField])) | if (!isset($row[$defaultField])) { |
$row[$defaultField] = ""; | if ($schemas['agency']['properties'][$defaultField]['type'] == "string") { |
if (strpos($defaultField, "has") === 0) { | |
$row[$defaultField] = "false"; | |
} else { | |
$row[$defaultField] = ""; | |
} | |
} | |
if ($schemas['agency']['properties'][$defaultField]['type'] == "array") { | |
$row[$defaultField] = Array(""); | |
} | |
} | |
} | } |
return $row; | return $row; |
} | } |
$db = $server->get_db('disclosr-agencies'); | $db = $server->get_db('disclosr-agencies'); |
if (isset($_REQUEST['id'])) { | if (isset($_REQUEST['id'])) { |
//get an agency record as json/html, search by name/abn/id | //get an agency record as json/html, search by name/abn/id |
// by name = startkey="Ham"&endkey="Ham\ufff0" | // by name = startkey="Ham"&endkey="Ham\ufff0" |
// edit? | // edit? |
$row = $db->get($_REQUEST['id']); | $row = $db->get($_REQUEST['id']); |
//print_r($row); | //print_r($row); |
if (sizeof($_POST) > 0) { | if (sizeof($_POST) > 0) { |
//print_r($_POST); | //print_r($_POST); |
foreach ($_POST as $postkey => $postvalue) { | |
if ($postvalue == "") { | |
unset($_POST[$postkey]); | |
} | |
if (is_array($postvalue) && count($postvalue) == 1 && $postvalue[0] == "") { | |
unset($_POST[$postkey]); | |
} | |
} | |
if (isset($_POST['_id']) && $db->get_rev($_POST['_id']) == $_POST['_rev']) { | if (isset($_POST['_id']) && $db->get_rev($_POST['_id']) == $_POST['_rev']) { |
echo "Edited version was latest version, continue saving"; | echo "Edited version was latest version, continue saving"; |
$newdoc = $_POST; | $newdoc = $_POST; |
$newdoc['metadata']['lastModified'] = time(); | $newdoc['metadata']['lastModified'] = time(); |
$row = $db->save($newdoc); | $row = $db->save($newdoc); |
} else { | } else { |
echo "ALERT doc revised by someone else while editing."; | echo "ALERT doc revised by someone else while editing. Document not saved."; |
} | } |
} | } |
$mode = "edit"; | $mode = "view"; |
$row = addDefaultFields(object_to_array($row)); | if ($mode == "edit") { |
$row = addDefaultFields(object_to_array($row)); | |
} else { | |
$row = object_to_array($row); | |
} | |
if ($mode == "view") { | if ($mode == "view") { |
echo '<table width="100%">'; | echo '<div itemscope itemtype="http://schema.org/GovernmentOrganization"><table width="100%">'; |
echo '<tr> <td colspan="2"><h3>' . $row['name'] . "</h3></td></tr>"; | echo '<tr> <td colspan="2"><h3>' . $row['name'] . "</h3></td></tr>"; |
echo "<tr><th>Field Name</th><th>Field Value</th></tr>"; | echo "<tr><th>Field Name</th><th>Field Value</th></tr>"; |
} | } |
if ($mode == "edit") { | if ($mode == "edit") { |
?> | ?> |
<input id="addfield" type="button" value="Add Field"/> | <input id="addfield" type="button" value="Add Field"/> |
<script> | <script> |
window.onload = function() { | window.onload = function() { |
$(document).ready(function() { | $(document).ready(function() { |
// put all your jQuery goodness in here. | // put all your jQuery goodness in here. |
// http://charlie.griefer.com/blog/2009/09/17/jquery-dynamically-adding-form-elements/ | // http://charlie.griefer.com/blog/2009/09/17/jquery-dynamically-adding-form-elements/ |
$('#addfield').click(function() { | $('#addfield').click(function() { |
var field_name=window.prompt("fieldname?",""); | var field_name=window.prompt("fieldname?",""); |
if (field_name !="") { | if (field_name !="") { |
$('#submitbutton').before($('<span></span>') | $('#submitbutton').before($('<span></span>') |
.append("<label>"+field_name+"</label>") | .append("<label>"+field_name+"</label>") |
.append("<input class='input-text' type='text' id='"+field_name+"' name='"+field_name+"'/>") | .append("<input class='input-text' type='text' id='"+field_name+"' name='"+field_name+"'/>") |
); | ); |
} | } |
}); | }); |
}); | }); |
}; | }; |
</script> | </script> |
<form id="editform" class="nice" method="post"> | <form id="editform" class="nice" method="post"> |
<?php | <?php |
} | |
foreach ($row as $key => $value) { | |
echo displayValue($key, $value, $mode); | |
} | |
if ($mode == "view") { | |
echo "</table></div>"; | |
} | |
if ($mode == "edit") { | |
echo '<input id="submitbutton" type="submit"/></form>'; | |
} | |
} else { | |
try { | |
/* $rows = $db->get_view("app", "showNamesABNs")->rows; | |
//print_r($rows); | |
foreach ($rows as $row) { | |
// print_r($row); | |
echo '<li><a href="getAgency.php?id=' . $row->key . '">' . | |
(isset($row->value->name) && $row->value->name != "" ? $row->value->name : "NO NAME " . $row->value->abn) | |
. '</a></li>'; | |
} */ | |
$rows = $db->get_view("app", "byName")->rows; | |
//print_r($rows); | |
foreach ($rows as $row) { | |
// print_r($row); | |
echo '<li itemscope itemtype="http://schema.org/GovernmentOrganization"><a href="getAgency.php?id=' . $row->value . '" itemprop="url"><span itemprop="name">' . | |
$row->key | |
. '</span></a></li>'; | |
} | } |
foreach ($row as $key => $value) { | } catch (SetteeRestClientException $e) { |
echo displayValue($key, $value, $mode); | setteErrorHandler($e); |
} | |
if ($mode == "view") { | |
echo "</table>"; | |
} | |
if ($mode == "edit") { | |
echo '<input id="submitbutton" type="submit"/></form>'; | |
} | |
} else { | |
try { | |
$rows = $db->get_view("app", "showNamesABNs")->rows; | |
//print_r($rows); | |
foreach ($rows as $row) { | |
// print_r($row); | |
echo '<li><a href="getAgency.php?id=' . $row->key . '">' . | |
(isset($row->value->name) && $row->value->name != "" ? $row->value->name : "NO NAME " . $row->value->abn) | |
. '</a></li>'; | |
} | |
} catch (SetteeRestClientException $e) { | |
setteErrorHandler($e); | |
} | |
} | } |
include_footer(); | } |
?> | include_footer(); |
?> |
<?php | |
include_once('include/common.inc.php'); | |
//include_header(); | |
$format = "html"; | |
if (isset($_REQUEST['format'])) { | |
$format = $_REQUEST['format']; | |
} | |
function add_node($id, $label) { | |
global $format; | |
if ($format == "html") { | |
echo "nodes[\"$id\"] = graph.newNode({label: \"$label\"});" . PHP_EOL; | |
} | |
if ($format == "dot" && $label != "") { | |
echo "$id [label=\"$label\"];". PHP_EOL; | |
} | |
} | |
function add_edge($from, $to, $color) { | |
global $format; | |
if ($format == "html") { | |
echo "graph.newEdge(nodes[\"$from\"], nodes['$to'], {color: '$color'});" . PHP_EOL; | |
} | |
if ($format == "dot") { | |
echo "$from -> $to ".($color != ""? "[color=$color]":"").";". PHP_EOL; | |
} | |
} | |
if ($format == "html") { | |
?> | |
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.3.2/jquery.min.js"></script> | |
<script src="lib/springy/springy.js"></script> | |
<script src="lib/springy/springyui.js"></script> | |
<script> | |
var graph = new Graph(); | |
var nodes = []; | |
<?php | |
} | |
if ($format == "dot") { | |
echo 'digraph g {'. PHP_EOL; | |
} | |
$db = $server->get_db('disclosr-agencies'); | |
add_node("fedg","Federal Government - Commonwealth of Australia"); | |
try { | |
$rows = $db->get_view("app", "byCanonicalName", null, true)->rows; | |
//print_r($rows); | |
foreach ($rows as $row) { | |
add_node($row->id, $row->key); | |
} | |
} catch (SetteeRestClientException $e) { | |
setteErrorHandler($e); | |
} | |
try { | |
$rows = $db->get_view("app", "byDeptStateName", null, true)->rows; | |
//print_r($rows); | |
foreach ($rows as $row) { | |
add_edge("fedg", $row->value, 'yellow'); | |
} | |
} catch (SetteeRestClientException $e) { | |
setteErrorHandler($e); | |
} | |
try { | |
$rows = $db->get_view("app", "parentOrgs", null, true)->rows; | |
// print_r($rows); | |
foreach ($rows as $row) { | |
add_edge($row->key, $row->value, 'blue'); | |
} | |
} catch (SetteeRestClientException $e) { | |
setteErrorHandler($e); | |
} | |
if ($format == "html") { | |
?> | |
window.onload = function() { | |
$(document).ready(function() { | |
var springy = $('#springydemo').springy({ | |
graph: graph | |
}); | |
}); | |
}; | |
</script> | |
<canvas id="springydemo" width="1260" height="680"></canvas> | |
<?php | |
} | |
if ($format == "dot") { | |
echo "}"; | |
} | |
//include_footer(); | |
?> | |
<?php | |
require_once 'include/common.inc.php'; | |
try { | |
$server->create_db('disclosr-agencies'); | |
} catch (SetteeRestClientException $e) { | |
setteErrorHandler($e); | |
} | |
$db = $server->get_db('disclosr-agencies'); | |
createAgencyDesignDoc(); | |
$conn = new PDO("pgsql:dbname=contractDashboard;user=postgres;password=snmc;host=localhost"); | |
$namesQ = 'select agency.abn, string_agg("agencyName",\'|\') as names from agency inner join agency_nametoabn on agency.abn::text = agency_nametoabn.abn group by agency.abn;'; | |
$abntonames = Array(); | |
foreach ($conn->query($namesQ) as $row) { | |
$abntonames[$row['abn']] = explode("|", $row['names']); | |
} | |
$result = $conn->query("select * from agency"); | |
while ($agency = $result->fetch(PDO::FETCH_ASSOC)) { | |
$agency['_id'] = md5($agency['abn']); | |
$agency['otherNames'] = $abntonames[$agency['abn']]; | |
if (sizeof($abntonames[$agency['abn']]) == 1) | |
$agency['name'] = $abntonames[$agency['abn']][0]; | |
$agency["lastScraped"] = "1/1/1970"; | |
$agency["scrapeDepth"] = 1; | |
try { | |
$doc = $db->save($agency); | |
//print_r($doc); | |
echo $agency['abn'] . " imported \n<br>"; | |
} catch (SetteeRestClientException $e) { | |
setteErrorHandler($e); | |
} | |
} | |
?> | |
<?php | <?php |
date_default_timezone_set("Australia/Sydney"); | |
$basePath = ""; | |
if (strstr($_SERVER['PHP_SELF'], "alaveteli/") | |
|| strstr($_SERVER['PHP_SELF'], "admin/") | |
|| strstr($_SERVER['PHP_SELF'], "lib/") | |
|| strstr($_SERVER['PHP_SELF'], "include/")) | |
$basePath = "../"; | |
include_once ('couchdb.inc.php'); | include_once ('couchdb.inc.php'); |
include_once ('template.inc.php'); | include_once ('template.inc.php'); |
# Convert a stdClass to an Array. http://www.php.net/manual/en/language.types.object.php#102735 | # Convert a stdClass to an Array. http://www.php.net/manual/en/language.types.object.php#102735 |
function object_to_array(stdClass $Class) { | function object_to_array(stdClass $Class) { |
# Typecast to (array) automatically converts stdClass -> array. | # Typecast to (array) automatically converts stdClass -> array. |
$Class = (array) $Class; | $Class = (array) $Class; |
# Iterate through the former properties looking for any stdClass properties. | # Iterate through the former properties looking for any stdClass properties. |
# Recursively apply (array). | # Recursively apply (array). |
foreach ($Class as $key => $value) { | foreach ($Class as $key => $value) { |
if (is_object($value) && get_class($value) === 'stdClass') { | if (is_object($value) && get_class($value) === 'stdClass') { |
$Class[$key] = object_to_array($value); | $Class[$key] = object_to_array($value); |
} | } |
} | } |
return $Class; | return $Class; |
} | } |
# Convert an Array to stdClass. http://www.php.net/manual/en/language.types.object.php#102735 | # Convert an Array to stdClass. http://www.php.net/manual/en/language.types.object.php#102735 |
function array_to_object(array $array) { | function array_to_object(array $array) { |
# Iterate through our array looking for array values. | # Iterate through our array looking for array values. |
# If found recursively call itself. | # If found recursively call itself. |
foreach ($array as $key => $value) { | foreach ($array as $key => $value) { |
if (is_array($value)) { | if (is_array($value)) { |
$array[$key] = array_to_object($value); | $array[$key] = array_to_object($value); |
} | } |
} | } |
# Typecast to (object) will automatically convert array -> stdClass | # Typecast to (object) will automatically convert array -> stdClass |
return (object) $array; | return (object) $array; |
} | } |
?> | |
function dept_to_portfolio($deptName) { | |
return trim(str_replace("Department of", "", str_replace("Department of the", "Department of", $deptName))); | |
} | |
function phrase_to_tag ($phrase) { | |
return str_replace(" ","_",str_replace("'","",str_replace(",","",strtolower($phrase)))); | |
} | |
function GetDomain($url) | |
{ | |
$nowww = preg_replace('/www\./', '', $url); | |
$domain = parse_url($nowww); | |
if(!empty($domain["host"])) | |
{ | |
return $domain["host"]; | |
} else | |
{ | |
return $domain["path"]; | |
} | |
} | |
<?php | <?php |
include "schemas/schemas.inc.php"; | include $basePath . "schemas/schemas.inc.php"; |
require ($basePath . 'couchdb/settee/src/settee.php'); | |
function createAgencyDesignDoc() { | function createAgencyDesignDoc() { |
global $db; | global $db; |
$obj = new stdClass(); | $obj = new stdClass(); |
$obj->_id = "_design/" . urlencode("app"); | $obj->_id = "_design/" . urlencode("app"); |
$obj->language = "javascript"; | $obj->language = "javascript"; |
$obj->views->all->map = "function(doc) { emit(doc._id, doc); };"; | |
$obj->views->byABN->map = "function(doc) { emit(doc.abn, doc); };"; | $obj->views->byABN->map = "function(doc) { emit(doc.abn, doc); };"; |
$obj->views->byName->map = "function(doc) { emit(doc.name, doc); };"; | $obj->views->byCanonicalName->map = "function(doc) { |
if (doc.parentOrg || doc.orgType == 'FMA-DepartmentOfState') { | |
emit(doc.name, doc); | |
} | |
};"; | |
$obj->views->byDeptStateName->map = "function(doc) { | |
if (doc.orgType == 'FMA-DepartmentOfState') { | |
emit(doc.name, doc._id); | |
} | |
};"; | |
$obj->views->parentOrgs->map = "function(doc) { | |
if (doc.parentOrg) { | |
emit(doc._id, doc.parentOrg); | |
} | |
};"; | |
$obj->views->byName->map = "function(doc) { | |
emit(doc.name, doc._id); | |
for (name in doc.otherNames) { | |
if (doc.otherNames[name] != '' && doc.otherNames[name] != doc.name) { | |
emit(doc.otherNames[name], doc._id); | |
} | |
} | |
};"; | |
$obj->views->foiEmails->map = "function(doc) { | |
emit(doc._id, doc.foiEmail); | |
};"; | |
$obj->views->byLastModified->map = "function(doc) { emit(doc.metadata.lastModified, doc); }"; | $obj->views->byLastModified->map = "function(doc) { emit(doc.metadata.lastModified, doc); }"; |
$obj->views->getActive->map = 'function(doc) { if (doc.status == "active") { emit(doc._id, doc); } };'; | $obj->views->getActive->map = 'function(doc) { if (doc.status == "active") { emit(doc._id, doc); } };'; |
$obj->views->getSuspended->map = 'function(doc) { if (doc.status == "suspended") { emit(doc._id, doc); } };'; | $obj->views->getSuspended->map = 'function(doc) { if (doc.status == "suspended") { emit(doc._id, doc); } };'; |
$obj->views->getScrapeRequired->map = "function(doc) { emit(doc.abn, doc); };"; | $obj->views->getScrapeRequired->map = "function(doc) { |
var lastScrape = Date.parse(doc.metadata.lastScraped); | |
var today = new Date(); | |
if (!lastScrape || lastScrape + 1000 != today.getTime()) { | |
emit(doc._id, doc); | |
} | |
};"; | |
$obj->views->showNamesABNs->map = "function(doc) { emit(doc._id, {name: doc.name, abn: doc.abn}); };"; | $obj->views->showNamesABNs->map = "function(doc) { emit(doc._id, {name: doc.name, abn: doc.abn}); };"; |
$obj->views->getConflicts->map = "function(doc) { | |
if (doc._conflicts) { | |
emit(null, [doc._rev].concat(doc._conflicts)); | |
} | |
}"; | |
// http://stackoverflow.com/questions/646628/javascript-startswith | |
$obj->views->scoreHas->map = 'if(!String.prototype.startsWith){ | |
String.prototype.startsWith = function (str) { | |
return !this.indexOf(str); | |
} | |
} | |
if(!String.prototype.endsWith){ | |
String.prototype.endsWith = function(suffix) { | |
return this.indexOf(suffix, this.length - suffix.length) !== -1; | |
}; | |
} | |
function(doc) { | |
if (typeof(doc["status"]) == "undefined" || doc["status"] != "suspended") { | |
for(var propName in doc) { | |
if(typeof(doc[propName]) != "undefined" && (propName.startsWith("has") || propName.endsWith("URL"))) { | |
emit(propName, 1); | |
} | |
} | |
emit("total", 1); | |
} | |
}'; | |
$obj->views->score->map = 'if(!String.prototype.startsWith){ | |
String.prototype.startsWith = function (str) { | |
return !this.indexOf(str); | |
} | |
} | |
function(doc) { | |
var count = 0; | |
if (typeof(doc["status"]) == "undefined" || doc["status"] != "suspended") { | |
for(var propName in doc) { | |
if(typeof(doc[propName]) != "undefined" && propName.startsWith("l")) { | |
count++ | |
} | |
} | |
emit(count+doc._id, {id:doc._id, name: doc.name, score:count}); | |
} | |
}'; | |
// allow safe updates (even if slightly slower due to extra: rev-detection check). | // allow safe updates (even if slightly slower due to extra: rev-detection check). |
return $db->save($obj, true); | return $db->save($obj, true); |
} | } |
require ('couchdb/settee/src/settee.php'); | if (php_uname('n') == "vanille") { |
$server = new SetteeServer('http://127.0.0.1:5984'); | $server = new SetteeServer('http://192.168.178.21:5984'); |
} else | |
if (php_uname('n') == "KYUUBEY") { | |
$server = new SetteeServer('http://192.168.1.148:5984'); | |
} else { | |
$server = new SetteeServer('http://127.0.0.1:5984'); | |
} | |
function setteErrorHandler($e) { | function setteErrorHandler($e) { |
echo $e->getMessage() . "<br>" . PHP_EOL; | echo $e->getMessage() . "<br>" . PHP_EOL; |
} | } |
?> | |
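The `getScrapeRequired` view above compares a parsed `lastScraped` date with the current time. Since `Date.parse` returns a number of milliseconds (or `NaN`) rather than a `Date` object, the check can be written with plain arithmetic. The following is only a minimal sketch of that idea: the helper name `scrapeRequired`, the `nowMs` parameter, and the one-day threshold are assumptions for illustration and do not come from the source.

```javascript
// Hypothetical helper, not part of the original design document.
// ONE_DAY_MS is an assumed re-scrape interval.
var ONE_DAY_MS = 24 * 60 * 60 * 1000;

function scrapeRequired(doc, nowMs) {
    // Guard against documents that have no metadata block at all.
    var raw = doc.metadata && doc.metadata.lastScraped;
    // Date.parse yields milliseconds since the epoch, or NaN if unparseable.
    var lastScrape = raw ? Date.parse(raw) : NaN;
    if (isNaN(lastScrape)) {
        return true; // never scraped, or the stored date could not be parsed
    }
    return nowMs - lastScrape >= ONE_DAY_MS;
}
```

Inside a CouchDB map function the same test could wrap the `emit(doc._id, doc)` call, with `new Date().getTime()` supplying `nowMs`.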
<?php | <?php |
function include_header() { | function include_header() { |
global $basePath; | |
?> | ?> |
<!DOCTYPE html> | <!DOCTYPE html> |
<!-- paulirish.com/2008/conditional-stylesheets-vs-css-hacks-answer-neither/ --> | <!-- paulirish.com/2008/conditional-stylesheets-vs-css-hacks-answer-neither/ --> |
<!--[if lt IE 7]> <html class="no-js lt-ie9 lt-ie8 lt-ie7" lang="en"> <![endif]--> | <!--[if lt IE 7]> <html class="no-js lt-ie9 lt-ie8 lt-ie7" lang="en"> <![endif]--> |
<!--[if IE 7]> <html class="no-js lt-ie9 lt-ie8" lang="en"> <![endif]--> | <!--[if IE 7]> <html class="no-js lt-ie9 lt-ie8" lang="en"> <![endif]--> |
<!--[if IE 8]> <html class="no-js lt-ie9" lang="en"> <![endif]--> | <!--[if IE 8]> <html class="no-js lt-ie9" lang="en"> <![endif]--> |
<!--[if gt IE 8]><!--> <html lang="en"> <!--<![endif]--> | <!--[if gt IE 8]><!--> <html lang="en"> <!--<![endif]--> |
<head> | <head> |
<meta charset="utf-8" /> | <meta charset="utf-8" /> |
<!-- Set the viewport width to device width for mobile --> | <!-- Set the viewport width to device width for mobile --> |
<meta name="viewport" content="width=device-width" /> | <meta name="viewport" content="width=device-width" /> |
<title>Disclosr</title> | <title>Disclosr</title> |
<!-- Included CSS Files --> | <!-- Included CSS Files --> |
<link rel="stylesheet" href="stylesheets/foundation.css"> | <link rel="stylesheet" href="<?php echo $basePath ?>stylesheets/foundation.css"> |
<link rel="stylesheet" href="stylesheets/app.css"> | <link rel="stylesheet" href="<?php echo $basePath ?>stylesheets/app.css"> |
<!--[if lt IE 9]> | <!--[if lt IE 9]> |
<link rel="stylesheet" href="stylesheets/ie.css"> | <link rel="stylesheet" href="<?php echo $basePath ?>stylesheets/ie.css"> |
<![endif]--> | <![endif]--> |
<!-- IE Fix for HTML5 Tags --> | <!-- IE Fix for HTML5 Tags --> |
<!--[if lt IE 9]> | <!--[if lt IE 9]> |
<script src="http://html5shiv.googlecode.com/svn/trunk/html5.js"></script> | <script src="http://html5shiv.googlecode.com/svn/trunk/html5.js"></script> |
<![endif]--> | <![endif]--> |
</head> | </head> |
<body> | <body> |
<!-- navBar --> | <!-- navBar --> |
<div id="navbar" class="container"> | <div id="navbar" class="container"> |
<div class="row"> | <div class="row"> |
<div class="four columns"> | <div class="four columns"> |
<h1><a href="/">Disclosr</a></h1> | <h1><a href="/">Disclosr</a></h1> |
</div> | </div> |
<div class="eight columns hide-on-phones"> | <div class="eight columns hide-on-phones"> |
<strong class="right"> | <strong class="right"> |
<a href="getAgency.php">Agencies</a> | <a href="getAgency.php">Agencies</a> |
<a href="about.php">About/FAQ</a> | <a href="about.php">About/FAQ</a> |
</strong> | </strong> |
</div> | </div> |
</div> | </div> |
</div> | </div> |
<!-- /navBar --> | <!-- /navBar --> |
<!-- container --> | <!-- container --> |
<div class="container"> | <div class="container"> |
<?php } | <?php } |
function include_footer() { ?> | function include_footer() { |
global $basePath; | |
?> | |
</div> | </div> |
<!-- container --> | <!-- container --> |
<!-- Included JS Files --> | <!-- Included JS Files --> |
<script src="javascripts/foundation.js"></script> | <script src="<?php echo $basePath; ?>javascripts/foundation.js"></script> |
<script src="javascripts/app.js"></script> | <script src="<?php echo $basePath; ?>javascripts/app.js"></script> |
<script src="http://code.jquery.com/jquery-1.7.1.min.js"></script> | <script src="http://code.jquery.com/jquery-1.7.1.min.js"></script> |
<!--<script language="javascript" type="text/javascript" src="javascripts/jquery.js"></script>--> | |
<script language="javascript" type="text/javascript" src="javascripts/flot/jquery.flot.js"></script> | |
</body> | </body> |
</html> | </html> |
<?php } | <?php } |
?> | |
/* Foundation v2.1.4 http://foundation.zurb.com */ | /* Foundation v2.1.4 http://foundation.zurb.com */ |
$(document).ready(function () { | $(document).ready(function () { |
/* Use this js doc for all application specific JS */ | /* Use this js doc for all application specific JS */ |
/* TABS --------------------------------- */ | /* TABS --------------------------------- */ |
/* Remove if you don't need :) */ | /* Remove if you don't need :) */ |
function activateTab($tab) { | function activateTab($tab) { |
var $activeTab = $tab.closest('dl').find('a.active'), | var $activeTab = $tab.closest('dl').find('a.active'), |
contentLocation = $tab.attr("href") + 'Tab'; | contentLocation = $tab.attr("href") + 'Tab'; |
//Make Tab Active | //Make Tab Active |
$activeTab.removeClass('active'); | $activeTab.removeClass('active'); |
$tab.addClass('active'); | $tab.addClass('active'); |
//Show Tab Content | //Show Tab Content |
$(contentLocation).closest('.tabs-content').children('li').hide(); | $(contentLocation).closest('.tabs-content').children('li').hide(); |
$(contentLocation).show(); | $(contentLocation).show(); |
} | } |
$('dl.tabs').each(function () { | $('dl.tabs').each(function () { |
//Get all tabs | //Get all tabs |
var tabs = $(this).children('dd').children('a'); | var tabs = $(this).children('dd').children('a'); |
tabs.click(function (e) { | tabs.click(function (e) { |
activateTab($(this)); | activateTab($(this)); |
}); | }); |
}); | }); |
if (window.location.hash) { | if (window.location.hash) { |
activateTab($('a[href="' + window.location.hash + '"]')); | activateTab($('a[href="' + window.location.hash + '"]')); |
} | } |
/* ALERT BOXES ------------ */ | /* ALERT BOXES ------------ */ |
$(".alert-box").delegate("a.close", "click", function(event) { | $(".alert-box").delegate("a.close", "click", function(event) { |
event.preventDefault(); | event.preventDefault(); |
$(this).closest(".alert-box").fadeOut(function(event){ | $(this).closest(".alert-box").fadeOut(function(event){ |
$(this).remove(); | $(this).remove(); |
}); | }); |
}); | }); |
/* PLACEHOLDER FOR FORMS ------------- */ | /* PLACEHOLDER FOR FORMS ------------- */ |
/* Remove this and jquery.placeholder.min.js if you don't need :) */ | /* Remove this and jquery.placeholder.min.js if you don't need :) */ |
$('input, textarea').placeholder(); | //$('input, textarea').placeholder(); |
/* UNCOMMENT THE LINE YOU WANT BELOW IF YOU WANT IE6/7/8 SUPPORT AND ARE USING .block-grids */ | /* UNCOMMENT THE LINE YOU WANT BELOW IF YOU WANT IE6/7/8 SUPPORT AND ARE USING .block-grids */ |
// $('.block-grid.two-up>li:nth-child(2n+1)').css({clear: 'left'}); | // $('.block-grid.two-up>li:nth-child(2n+1)').css({clear: 'left'}); |
// $('.block-grid.three-up>li:nth-child(3n+1)').css({clear: 'left'}); | // $('.block-grid.three-up>li:nth-child(3n+1)').css({clear: 'left'}); |
// $('.block-grid.four-up>li:nth-child(4n+1)').css({clear: 'left'}); | // $('.block-grid.four-up>li:nth-child(4n+1)').css({clear: 'left'}); |
// $('.block-grid.five-up>li:nth-child(5n+1)').css({clear: 'left'}); | // $('.block-grid.five-up>li:nth-child(5n+1)').css({clear: 'left'}); |
/* DROPDOWN NAV ------------- */ | /* DROPDOWN NAV ------------- */ |
var currentFoundationDropdown = null; | var currentFoundationDropdown = null; |
$('.nav-bar li a, .nav-bar li a:after').each(function() { | $('.nav-bar li a, .nav-bar li a:after').each(function() { |
$(this).data('clicks', 0); | $(this).data('clicks', 0); |
}); | }); |
$('.nav-bar li a, .nav-bar li a:after').live('click', function(e) { | $('.nav-bar li a, .nav-bar li a:after').live('click', function(e) { |
e.preventDefault(); | e.preventDefault(); |
if (currentFoundationDropdown !== $(this).index() || currentFoundationDropdown === null) { | if (currentFoundationDropdown !== $(this).index() || currentFoundationDropdown === null) { |
$(this).data('clicks', 0); | $(this).data('clicks', 0); |
currentFoundationDropdown = $(this).index(); | currentFoundationDropdown = $(this).index(); |
} | } |
$(this).data('clicks', ($(this).data('clicks') + 1)); | $(this).data('clicks', ($(this).data('clicks') + 1)); |
var f = $(this).siblings('.flyout'); | var f = $(this).siblings('.flyout'); |
if (!f.is(':visible') && $(this).parent('.has-flyout').length > 1) { | if (!f.is(':visible') && $(this).parent('.has-flyout').length > 1) { |
$('.nav-bar li .flyout').hide(); | $('.nav-bar li .flyout').hide(); |
f.show(); | f.show(); |
} else if (($(this).data('clicks') > 1) || ($(this).parent('.has-flyout').length < 1)) { | } else if (($(this).data('clicks') > 1) || ($(this).parent('.has-flyout').length < 1)) { |
window.location = $(this).attr('href'); | window.location = $(this).attr('href'); |
} | } |
}); | }); |
$('.nav-bar').live('click', function(e) { | $('.nav-bar').live('click', function(e) { |
e.stopPropagation(); | e.stopPropagation(); |
if ($(e.target).parents().is('.flyout') || $(e.target).is('.flyout')) { | if ($(e.target).parents().is('.flyout') || $(e.target).is('.flyout')) { |
e.preventDefault(); | e.preventDefault(); |
} | } |
}); | }); |
// $('body').bind('touchend', function(e) { | // $('body').bind('touchend', function(e) { |
// if (!$(e.target).parents().is('.nav-bar') || !$(e.target).is('.nav-bar')) { | // if (!$(e.target).parents().is('.nav-bar') || !$(e.target).is('.nav-bar')) { |
// $('.nav-bar li .flyout').is(':visible').hide(); | // $('.nav-bar li .flyout').is(':visible').hide(); |
// } | // } |
// }); | // }); |
/* DISABLED BUTTONS ------------- */ | /* DISABLED BUTTONS ------------- */ |
/* Gives elements with a class of 'disabled' a return: false; */ | /* Gives elements with a class of 'disabled' a return: false; */ |
}); | }); |
# www.robotstxt.org/ | # www.robotstxt.org/ |
# www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 | # www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449 |
User-agent: * | User-agent: * |
Disallow: /admin/ |
<?php | <?php |
$schemas['agency'] = Array( | $schemas['agency'] = Array( |
"description" => "Representation of government agency and online transparency measures", | "description" => "Representation of government agency and online transparency measures", |
"type" => "object", | "type" => "object", |
"properties" => Array( | "properties" => Array( |
"name" => Array("type" => "string", "required" => true, "description" => "Agency Name, most recent and broadest"), | "name" => Array("type" => "string", "required" => true, "x-itemprop" => "name", "x-title" => "Name", "description" => "Name, most recent and broadest"), |
"othernames" => Array("type" => "array", "required" => true, "description" => "Agency Names", | "shortName" => Array("type" => "string", "required" => false, "x-title" => "Short Name", "description" => "Name shortened, usually to an acronym"), |
"foiEmail" => Array("type" => "string", "required" => false, "x-title" => "FOI Contact Email", "description" => "FOI contact email if not foi@"), | |
"sameAs" => Array("type" => "array", "required" => false, "x-itemprop"=>"http://www.w3.org/2002/07/owl#sameAs","x-title" => "Same As", "description" => "Same as other URLs/URIs for this entity", | |
"items" => Array("type" => "string")), | "items" => Array("type" => "string")), |
"otherNames" => Array("type" => "array", "required" => true, "x-title" => "Past/Other Names", "description" => "Other names for organisation", | |
"items" => Array("type" => "string")), | |
"foiBodies" => Array("type" => "array", "required" => true, "x-title" => "FOI Bodies","x-itemprop"=>"members", "description" => "Organisational units within this agency that are subject to FOI Act but are not autonomous", | |
"items" => Array("type" => "string")), | |
"orgType" => Array("type" => "string", "required" => true, "x-title" => "Organisation Type", "description" => "Org type based on legal formation via FMA/CAC legislation etc."), | |
"parentOrg" => Array("type" => "string", "required" => true, "x-title" => "Parent Organisation", "description" => "Parent organisation, usually a department of state"), | |
"website" => Array("type" => "string", "required" => true, "x-title" => "Website", "x-itemprop" => "url", "description" => "Website URL"), | |
"abn" => Array("type" => "string", "required" => true, "x-title" => "Australian Business Number", "description" => "ABN from business register"), | |
"contractListURL" => Array("type" => "string", "required" => true, "x-title" => "Contract Listing", "description" => "Departmental and agency contracts, <a href='http://www.aph.gov.au/senate/pubs/standing_orders/d05.htm'>mandated by the Senate</a>"), | |
"grantsReportingURL" => Array("type" => "string", "required" => true, "x-title" => "Grants Awarded", | |
"description" => "Departmental and agency grants <a href='http://www.aph.gov.au/senate/pubs/standing_orders/d05.htm'>mandated by the Senate</a> and <a href='http://www.finance.gov.au/publications/fmg-series/23-commonwealth-grant-guidelines.html'>Commonwealth grants guidelines</a> "), | |
"annualReportURL" => Array("type" => "string", "required" => true, "x-title" => "Annual Report(s)", "description" => ""), | |
"consultanciesURL" => Array("type" => "string", "required" => true, "x-title" => "Consultants Hired", "description" => ""), | |
"legalExpenditureURL" => Array("type" => "string", "required" => true, "x-title" => "Legal Services Expenditure", "description" => "Legal Services Expenditure mandated by Legal Services Directions 2005"), | |
"recordsListURL" => Array("type" => "string", "required" => true, "x-title" => "Files/Records Held", "description" => "Indexed lists of departmental and agency files, <a href='http://www.aph.gov.au/senate/pubs/standing_orders/d05.htm'>mandated by the Senate</a>"), | |
"FOIDocumentsURL" => Array("type" => "string", "required" => true, "x-title" => "FOI Documents Released", "description" => ""), | |
"infoPublicationSchemeURL" => Array("type" => "string", "required" => true, "x-title" => "Information Publication Scheme", "description" => ""), | |
"appointmentsURL" => Array("type" => "string", "required" => true, "x-title" => "Agency Appointments/Boards", "description" => "Departmental and agency appointments and vacancies , <a href='http://www.aph.gov.au/senate/pubs/standing_orders/d05.htm'>mandated by the Senate</a>"), | |
"advertisingURL" => Array("type" => "string", "required" => true, "x-title" => "Approved Advertising Campaigns", "description" => " Agency advertising and public information projects, <a href='http://www.aph.gov.au/senate/pubs/standing_orders/d05.htm'>mandated by the Senate</a> "), | |
"hasRSS" => Array("type" => "string", "required" => true, "x-title" => "Has RSS", "description" => ""), | |
"hasMailingList" => Array("type" => "string", "required" => true, "x-title" => "Has Mailing List", "description" => ""), | |
"hasTwitter" => Array("type" => "string", "required" => true, "x-title" => "Has Twitter", "description" => ""), | |
"hasFacebook" => Array("type" => "string", "required" => true, "x-title" => "Has Facebook", "description" => ""), | |
"hasYouTube" => Array("type" => "string", "required" => true, "x-title" => "Has YouTube", "description" => ""), | |
"hasFlickr" => Array("type" => "string", "required" => true, "x-title" => "Has Flickr", "description" => ""), | |
"hasCCBY" => Array("type" => "string", "required" => true, "x-title" => "Has CC-BY", "description" => "Has any page licenced Creative Commons - Attribution"), | |
), | ), |
/*"org":{"type":"object", | /* "org":{"type":"object", |
"properties":{ | "properties":{ |
"organizationName":{"type":"string"}, | "organizationName":{"type":"string"}, |
"organizationUnit":{"type":"string"}}, | "organizationUnit":{"type":"string"}}, |
} | } |
}*/ | } */ |
); | ); |
?> | ?> |
<?php | |
include_once('include/common.inc.php'); | |
include_header(); | |
$db = $server->get_db('disclosr-agencies'); | |
try { | |
$rows = $db->get_view("score", "score", null, true)->rows; | |
//print_r($rows); | |
foreach ($rows as $row) { | |
echo '<a href="getAgency.php?id='.$row->value->id.'">'.$row->value->name." ".$row->value->score."</a><br>"; | |
} | |
} catch (SetteeRestClientException $e) { | |
setteErrorHandler($e); | |
} | |
include_footer(); | |
?> |
#http://packages.python.org/CouchDB/client.html | |
import couchdb | |
import urllib2 | |
from BeautifulSoup import BeautifulSoup | |
import re | |
#http://diveintopython.org/http_web_services/etags.html | |
class NotModifiedHandler(urllib2.BaseHandler): | |
def http_error_304(self, req, fp, code, message, headers): | |
addinfourl = urllib2.addinfourl(fp, headers, req.get_full_url()) | |
addinfourl.code = code | |
return addinfourl | |
def scrapeAndStore(URL, depth, agency): | |
URL = "http://www.google.com" | |
req = urllib2.Request(URL) | |
etag = 'y' | |
last_modified = 'y' | |
#if there is a previous version stored in couchdb, load caching helper tags | |
if etag: | |
req.add_header("If-None-Match", etag) | |
if last_modified: | |
req.add_header("If-Modified-Since", last_modified) | |
opener = urllib2.build_opener(NotModifiedHandler()) | |
url_handle = opener.open(req) | |
headers = url_handle.info() # the addinfourls have the .info() too | |
etag = headers.getheader("ETag") | |
last_modified = headers.getheader("Last-Modified") | |
web_server = headers.getheader("Server") | |
file_size = headers.getheader("Content-Length") | |
mime_type = headers.getheader("Content-Type") | |
if hasattr(url_handle, 'code'): | |
if url_handle.code == 304: | |
print "the web page has not been modified" | |
else: | |
#do scraping | |
html = url_handle.read() | |
# http://www.crummy.com/software/BeautifulSoup/documentation.html | |
soup = BeautifulSoup(html) | |
links = soup.findAll('a') # soup.findAll('a', id=re.compile("^p-")) | |
for link in links: | |
print link['href'] | |
#for each unique link | |
#if html mimetype | |
# go down X levels, | |
# diff with last stored attachment, store in document | |
#if not | |
# remember to save parentURL and title (link text that led to document) | |
#store as attachment epoch-filename | |
else: | |
print "error %s in downloading %s" % (url_handle.code, URL) | |
#record/alert error to error database | |
couch = couchdb.Server('http://192.168.1.148:5984/') | |
# select database | |
agencydb = couch['disclosr-agencies'] | |
for row in agencydb.view('app/getScrapeRequired'): #not recently scraped agencies view? | |
agency = agencydb.get(row.id) | |
print agency['name'] | |
scrapeAndStore("A",1,1) | |
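The conditional-request step in `scrapeAndStore` above can be sketched in isolation. This is a minimal sketch, not the project's implementation: it assumes Python 3's `urllib.request` rather than the `urllib2` module the script imports, and the names `conditional_headers` and `fetch_if_changed` are hypothetical helpers introduced only for illustration.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build the cache-validation headers from previously stored values."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_if_changed(url, etag=None, last_modified=None):
    """Return (status, body, etag, last_modified); body is None on a 304."""
    req = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified))
    try:
        with urllib.request.urlopen(req) as resp:
            return (resp.status, resp.read(),
                    resp.headers.get("ETag"),
                    resp.headers.get("Last-Modified"))
    except urllib.error.HTTPError as err:
        # Python 3's default opener raises HTTPError for a 304 response,
        # so no custom handler class is needed here.
        if err.code == 304:
            return 304, None, etag, last_modified
        raise
```

Catching `HTTPError` plays the role of the `NotModifiedHandler` class: a 304 is reported as a normal result and the stored validators are passed back unchanged.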
<?php | |
include_once("./lib/common.inc.php"); | |
setlocale(LC_CTYPE, 'C'); | |
// source: http://stackoverflow.com/questions/81934/easy-way-to-export-a-sql-table-without-access-to-the-server-or-phpmyadmin#81951 | |
$unspsc = Array(); | |
$unspscresult = $conn->prepare('select * from "UNSPSCcategories" where "UNSPSC"::text like \'%00000\';'); | |
$unspscresult->execute(); | |
foreach ($unspscresult->fetchAll() as $row) { | |
$unspsc[$row['UNSPSC']] = $row['Title']; | |
} | |
$query = $conn->prepare(' | |
SELECT "CNID",contractnotice."agencyName",agency_nametoabn.abn as "agencyABN", | |
EXTRACT(EPOCH FROM "publishDate") as "publishDate", | |
EXTRACT(EPOCH FROM "contractStart") as "contractStart", | |
EXTRACT(EPOCH FROM "contractEnd") as "contractEnd", | |
value,description,category, | |
"supplierName",(case when "supplierABN" != 0 THEN "supplierABN"::text ELSE "supplierName" END) as supplierID, | |
(\'https://www.tenders.gov.au/?event=public.advancedsearch.keyword&keyword=CN\'::text || "CNID"::text) as sourceURL | |
FROM contractnotice join agency_nametoabn on contractnotice."agencyName"=agency_nametoabn."agencyName" | |
where "childCN" is null' | |
, array(PDO::ATTR_CURSOR => PDO::CURSOR_FWDONLY)); | |
$query->execute(); | |
$errors = $conn->errorInfo(); | |
if ($errors[2] != "") { | |
die("Export terminated, db error" . print_r($errors, true)); | |
} | |
$num_fields = $query->columnCount(); | |
$headers = Array(); | |
for ($i = 0; $i < $num_fields; $i++) { // for each column in query, make a CSV header | |
$meta = $query->getColumnMeta($i); | |
$headers[] = $meta['name']; | |
} | |
$fp = fopen('php://output', 'w'); | |
if ($fp && $query) { | |
header('Content-Type: text/csv'); | |
header('Content-Disposition: attachment; filename="export.' . date("c") . '.csv"'); | |
header('Pragma: no-cache'); | |
header('Expires: 0'); | |
fputcsv($fp, $headers); | |
while ($row = $query->fetch(PDO::FETCH_NUM, PDO::FETCH_ORI_NEXT)) { | |
foreach ($row as $key => &$colvalue) { | |
$colvalue = preg_replace('/[^[:print:]]/', '', utf8_encode($colvalue)); | |
if ($headers[$key] == "publishDate" || $headers[$key] == "contractStart" | |
|| $headers[$key] == "contractEnd") { | |
$colvalue = date("Y-m-d", $colvalue); | |
} | |
/* if ($headers[$key] == "CNID") { | |
$colvalue = str_replace("A","", $colvalue); | |
}*/ | |
if ($headers[$key] == "cat1" || $headers[$key] == "cat2" | |
|| $headers[$key] == "cat3") { | |
$colvalue = $unspsc[$colvalue]; | |
} | |
} | |
fputcsv($fp, array_values($row)); | |
} | |
die; | |
} | |
?> | |
#http://packages.python.org/CouchDB/client.html | |
import couchdb | |
import urllib2 | |
from BeautifulSoup import BeautifulSoup | |
import re | |
couch = couchdb.Server() # Assuming localhost:5984 | |
# If your CouchDB server is running elsewhere, set it up like this: | |
# couch = couchdb.Server('http://example.com:5984/') | |
# select database | |
agencydb = couch['disclosr-agencies'] | |
for row in agencydb.view('app/getScrapeRequired'): #not recently scraped agencies view? | |
agency = agencydb.get(row.id) | |
print agency['agencyName'] | |
#http://diveintopython.org/http_web_services/etags.html | |
class NotModifiedHandler(urllib2.BaseHandler): | |
def http_error_304(self, req, fp, code, message, headers): | |
addinfourl = urllib2.addinfourl(fp, headers, req.get_full_url()) | |
addinfourl.code = code | |
return addinfourl | |
def scrapeAndStore(URL, depth, agency): | |
URL = "http://www.hole.fi/jajvirta/weblog/" | |
req = urllib2.Request(URL) | |
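etag = None #placeholder until a stored value is loaded from couchdb | |
last_modified = None #placeholder until a stored value is loaded from couchdb | |
#if there is a previous version stored in couchdb, load caching helper tags | |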
if etag: | |
req.add_header("If-None-Match", etag) | |
if last_modified: | |
req.add_header("If-Modified-Since", last_modified) | |
opener = urllib2.build_opener(NotModifiedHandler()) | |
url_handle = opener.open(req) | |
headers = url_handle.info() # the addinfourls have the .info() too | |
etag = headers.getheader("ETag") | |
last_modified = headers.getheader("Last-Modified") | |
web_server = headers.getheader("Server") | |
file_size = headers.getheader("Content-Length") | |
mime_type = headers.getheader("Content-Type") | |
if hasattr(url_handle, 'code') and url_handle.code == 304: | |
print "the web page has not been modified" | |
else: | |
print "error %s in downloading %s" % (url_handle.code, URL) | |
#record/alert error to error database | |
#do scraping | |
html = url_handle.read() | |
# http://www.crummy.com/software/BeautifulSoup/documentation.html | |
soup = BeautifulSoup(html) | |
links = soup.findAll('a') # soup.findAll('a', id=re.compile("^p-")) | |
for link in links: | |
print link['href'] | |
#for each unique link | |
#if html mimetype | |
# go down X levels, | |
# diff with last stored attachment, store in document | |
#if not | |
# remember to save parentURL and title (link text that led to document) | |
#store as attachment epoch-filename |
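The "for each unique link" step outlined in the comments above could look roughly like the following. This is a sketch under stated assumptions: it uses Python 3's standard-library `html.parser` instead of the BeautifulSoup module the script imports, and `LinkCollector` and `unique_links` are hypothetical names. Unlike `link['href']`, it skips anchors that have no `href` attribute instead of raising an error.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect each distinct absolute link target, in document order."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._seen = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return  # named anchors and empty hrefs carry no target
        absolute = urljoin(self.base_url, href)
        if absolute not in self._seen:
            self._seen.add(absolute)
            self.links.append(absolute)


def unique_links(html, base_url):
    """Return the de-duplicated absolute links found in an HTML string."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

Resolving every link against the page URL before de-duplicating means a relative and an absolute reference to the same document are counted once.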