Rolling Curl
============

RollingCurl lets you process multiple HTTP requests in parallel using PHP's cURL library.

Released under the Apache License 2.0.

Authors
-------
- Originally written by [Josh Fraser](joshfraser.com).
- Currently maintained by [Alexander Makarov](http://rmcreative.ru/).
- Received significant updates and patches from [LionsAd](http://github.com/LionsAd/rolling-curl).

Overview
--------
RollingCurl is a more efficient implementation of curl_multi(). curl_multi is a great way to process multiple HTTP requests in parallel in PHP.
It is particularly handy when working with large data sets (like fetching thousands of RSS feeds at one time). Unfortunately, there is
very little documentation on the best way to implement curl_multi. As a result, most of the examples around the web are either inefficient or
fail entirely when asked to handle more than a few hundred requests.

The problem is that most implementations of curl_multi wait for each set of requests to complete before processing them. If there are too many requests
to process at once, they usually get broken into groups that are then processed one at a time. Each group then has to wait for
its slowest request to download. In a group of 100 requests, all it takes is one slow one to delay the processing of the 99 others. The larger the number of
requests you are dealing with, the more noticeable this latency becomes.

The solution is to process each request as soon as it completes. This eliminates the CPU cycles wasted on busy waiting. There is also a queue of
cURL requests to allow for maximum throughput. Each time a request completes, a new one is added from the queue. By dynamically adding and removing
links, we keep a constant number of links downloading at all times. This also gives us a way to throttle the number of simultaneous requests we are sending.
The result is a faster and more efficient way of processing large quantities of cURL requests in parallel.
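
The rolling window described above can be sketched directly on top of PHP's curl_multi_* functions. This is a minimal illustration of the idea, not RollingCurl's actual code: the function name `rolling_fetch`, its parameters, and the two-argument callback are assumptions made for this example.

```php
<?php
/**
 * Hypothetical helper: fetch $urls with at most $window transfers in flight.
 * $callback($body, $info) runs as soon as each transfer finishes.
 * Returns the number of completed transfers.
 */
function rolling_fetch(array $urls, $window, $callback)
{
    $queue    = array_values($urls);
    $window   = max(1, (int) $window);
    $multi    = curl_multi_init();
    $inFlight = 0;
    $done     = 0;

    while ($queue || $inFlight > 0) {
        // Keep the window full: start queued requests while slots are free.
        while ($queue && $inFlight < $window) {
            $ch = curl_init(array_shift($queue));
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            curl_multi_add_handle($multi, $ch);
            $inFlight++;
        }

        curl_multi_exec($multi, $stillRunning);

        // Process every transfer that has finished so far.
        $finished = 0;
        while ($msg = curl_multi_info_read($multi)) {
            $ch = $msg['handle'];
            call_user_func($callback, curl_multi_getcontent($ch), curl_getinfo($ch));
            curl_multi_remove_handle($multi, $ch);
            $inFlight--;
            $finished++;
        }
        $done += $finished;

        // Nothing finished this round: wait for socket activity instead of spinning.
        if ($finished === 0 && $inFlight > 0) {
            if (curl_multi_select($multi, 1.0) === -1) {
                usleep(1000); // select had nothing to wait on; back off briefly
            }
        }
    }

    return $done;
}
```

A call such as `rolling_fetch($urls, 20, 'my_callback')` would then keep up to 20 transfers running until the queue is empty. A finished slot is refilled on the next pass of the outer loop, so one slow request never holds back the others.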

Callbacks
---------

Each request usually has a callback that processes the result. It is executed when the request is done
(whether it succeeded or not).

The callback accepts three parameters and can look like the following:
~~~
[php]
function request_callback($response, $info, $request){
    // do something with the data received
}
~~~

- $response contains the body of the received page.
- $info is an associative array that holds various information about the response, such as the HTTP response code, content type,
  and time taken to make the request.
- $request contains the RollingCurlRequest that was used to make the request.
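
Because the callback also runs for failed requests, it usually needs to check the outcome before using the body. The sketch below assumes that $info has the shape returned by PHP's curl_getinfo() (keys such as `url`, `http_code` and `total_time`); `summarize_response` is a hypothetical helper introduced only for this example.

```php
<?php
// Hypothetical helper: reduce one finished request to a small summary array.
function summarize_response($response, array $info)
{
    // cURL reports http_code 0 when no HTTP response was received at all.
    $code = isset($info['http_code']) ? (int) $info['http_code'] : 0;

    return array(
        'url'   => isset($info['url']) ? $info['url'] : null,
        'ok'    => $code >= 200 && $code < 300,
        'code'  => $code,
        'bytes' => strlen((string) $response),
        'time'  => isset($info['total_time']) ? $info['total_time'] : null,
    );
}

function request_callback($response, $info, $request)
{
    $s = summarize_response($response, $info);
    if ($s['ok']) {
        printf("%s: %d bytes in %.2fs\n", $s['url'], $s['bytes'], $s['time']);
    } else {
        printf("%s: failed (HTTP %d)\n", $s['url'], $s['code']);
    }
}
```

Keeping the decision logic in a separate function makes it easy to exercise with hand-built arrays, without making any HTTP requests.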

Examples
--------
### Hello world

~~~
[php]
// an array of URLs to fetch
$urls = array("http://www.google.com",
              "http://www.facebook.com",
              "http://www.yahoo.com");

// a function that will process the returned responses
function request_callback($response, $info, $request) {
    // parse the page title out of the returned HTML
    $title = "(no title)";
    if (preg_match("~<title>(.*?)</title>~i", $response, $out)) {
        $title = $out[1];
    }
    echo "<b>$title</b><br />";
    print_r($info);
    echo "<hr>";
}

// create a new RollingCurl object and pass it the name of your custom callback function
$rc = new RollingCurl("request_callback");
// the window size determines how many simultaneous requests to allow
$rc->window_size = 20;
foreach ($urls as $url) {
    // add each request to the RollingCurl object
    $request = new RollingCurlRequest($url);
    $rc->add($request);
}
$rc->execute();
~~~

### Setting custom options

Set custom options for EVERY request:

~~~
[php]
$rc = new RollingCurl("request_callback");
$rc->options = array(CURLOPT_HEADER => true, CURLOPT_NOBODY => true);
$rc->execute();
~~~

Set custom options for A SINGLE request:

~~~
[php]
$rc = new RollingCurl("request_callback");
$request = new RollingCurlRequest($url);
$request->options = array(CURLOPT_HEADER => true, CURLOPT_NOBODY => true);
$rc->add($request);
$rc->execute();
~~~
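
This README does not say how the shared options and a request's own options are combined when both are set. One plausible approach, assumed here purely for illustration, is PHP's array union operator with the per-request options on the left, so that they take precedence. `merge_curl_options` is a hypothetical helper, not part of RollingCurl.

```php
<?php
// Hypothetical helper: combine shared options with one request's own options.
function merge_curl_options(array $shared, array $perRequest)
{
    // The + operator keeps the left operand's value when a key exists in both
    // arrays and preserves integer keys. array_merge() would renumber the
    // integer CURLOPT_* keys and must not be used here.
    return $perRequest + $shared;
}

$shared  = array(CURLOPT_HEADER => true, CURLOPT_NOBODY => true, CURLOPT_TIMEOUT => 30);
$request = array(CURLOPT_NOBODY => false);

// CURLOPT_NOBODY comes from $request; the other two come from $shared.
$options = merge_curl_options($shared, $request);
```

An array built this way can be applied to a handle in one call with curl_setopt_array().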

### Shortcuts

~~~
[php]
$rc = new RollingCurl("request_callback");
$rc->get("http://www.google.com");
$rc->get("http://www.yahoo.com");
$rc->execute();
~~~

### Class callbacks

~~~
[php]
class MyInfoCollector {
    private $rc;

    function __construct(){
        $this->rc = new RollingCurl(array($this, 'processPage'));
    }

    function processPage($response, $info, $request){
        //...
    }

    function run($urls){
        foreach ($urls as $url){
            $request = new RollingCurlRequest($url);
            $this->rc->add($request);
        }
        $this->rc->execute();
    }
}

$collector = new MyInfoCollector();
$collector->run(array(
    'http://google.com/',
    'http://yahoo.com/'
));
~~~
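
The `array($this, 'processPage')` form is PHP's standard way of naming an object method as a callable. The sketch below is not RollingCurl code: it uses a stand-in dispatcher (`dispatch`, hypothetical) built on call_user_func() to show the callable forms PHP itself accepts. Whether RollingCurl also accepts closures depends on how it validates its callback, which this README does not state.

```php
<?php
// Hypothetical stand-in for the place where a library invokes its callback.
function dispatch($callback, $response, $info, $request)
{
    if (!is_callable($callback)) {
        throw new InvalidArgumentException('callback is not callable');
    }
    return call_user_func($callback, $response, $info, $request);
}

// A plain function, referred to by name.
function length_of($response, $info, $request)
{
    return strlen($response);
}

class Collector
{
    public $pages = array();

    public function processPage($response, $info, $request)
    {
        $this->pages[] = $response;
        return count($this->pages);
    }
}

$collector = new Collector();

$a = dispatch('length_of', 'abc', array(), null);                      // function name
$b = dispatch(array($collector, 'processPage'), 'abc', array(), null); // object method
$c = dispatch(function ($response) {                                   // closure
    return strtoupper($response);
}, 'abc', array(), null);
```

The object-method form is what lets a callback accumulate state on the collector between requests, as `$pages` does here.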

### Using RollingCurlGroup

~~~
[php]
class TestCurlRequest extends RollingCurlGroupRequest {
    public $test_verbose = true;

    function process($output, $info) {
        echo "Processing " . $this->url . "\n";
        if ($this->test_verbose)
            print_r($info);

        parent::process($output, $info);
    }
}

class TestCurlGroup extends RollingCurlGroup {
    function process($output, $info, $request) {
        echo "Group CB: Progress " . $this->name . " (" . ($this->finished_requests + 1) . "/" . $this->num_requests . ")\n";
        parent::process($output, $info, $request);
    }

    function finished() {
        echo "Group CB: Finished " . $this->name . "\n";
        parent::finished();
    }
}

$group = new TestCurlGroup("High");
$group->add(new TestCurlRequest("www.google.de"));
$group->add(new TestCurlRequest("www.yahoo.de"));
$group->add(new TestCurlRequest("www.newyorktimes.com"));
$reqs[] = $group;

$group = new TestCurlGroup("Normal");
$group->add(new TestCurlRequest("twitter.com"));
$group->add(new TestCurlRequest("www.bing.com"));
$group->add(new TestCurlRequest("m.facebook.com"));
$reqs[] = $group;

$reqs[] = new TestCurlRequest("www.kernel.org");

// No callback here, as it's done in the Request class
$rc = new GroupRollingCurl();

foreach ($reqs as $req)
    $rc->add($req);

$rc->execute();
~~~

The same function (add) can be used for adding both requests and groups of requests.
The "callback" in requests and groups is, respectively:

    process($output, $info)

and

    process($output, $info, $request)

You can also override RollingCurlGroup::finished(), which is executed right after the group finishes processing.