I have seen many examples of how to export data from Google Analytics to other formats, as well as keep the exported data updated, but none have so far served to export the data to MySQL because of the following problems:
- Exceeding query limits;
- Process takes hours to keep records up to date;
- Process becomes impractical from a given number of records because the day only has 24h.
So far my implementation is in this format:
┌─────────────────────────────────────────┐
│ Por cada registro na base de dados local │
└────────────────────┬────────────────────┘
│
┌─────────────────┴──────────────────┐
│ Enviar URL para o Google Analytics │
└─────────────────┬──────────────────┘
│
┌─────────────────────────────┴─────────────────────────────┐
│ Com estatísticas recebidas, atualizar base de dados local │
└───────────────────────────────────────────────────────────┘
This works fine, but to overcome the problems listed, it is necessary to make use of breaks:
while ($registoLocal = $sth->fetch()) {
if (time_nanosleep(0, 500000000) === true) {
// comunicar com o Google Analytics
}
}
What is reflected in a fixed +/- time for every $registoLocal
, currently in 2 seconds which leaves us with a maximum of 43200 queries per day.
Additionally, we have the limits of Google Analytics API :
- 50,000 requests per project per day
- 10 queries per second per IP
Code in use
Include dependencies and instantiate the API:
/* API dependencies
*/
require_once('caminho/para/Google_Client.php');
require_once('caminho/para/contrib/Google_AnalyticsService.php');
/* Create client object and set app name
*/
$client = new Google_Client();
$client->setApplicationName('O nome da minha aplicação'); // app name
/* Set assertion credentials
*/
$client->setAssertionCredentials(
new Google_AssertionCredentials(
'[email protected]', // API Service Account Email Address
array('https://www.googleapis.com/auth/analytics.readonly'),
file_get_contents('caminho/para/o/ficheiro/O nome da minha aplicação.p12') // downloaded keyfile
)
);
/* Other settings
* Access the Google Cloud Console and view the Web App-project details:
* https://cloud.google.com/console
*/
$client->setClientId('outro-mega-super-grande-email-da-google@developer.gserviceaccount.com'); // Client ID for web application Email Address
$client->setAccessType('offline');
$client->setUseObjects(true);
Link to the local database:
/* Connection access details
*/
$dbcon = array(
"host" => "localhost",
"dbname" => "nomeBaseDados",
"username" => "utilizador",
"password" => "password"
);
/* Connect to the database
*/
$dbh = new PDO(
'mysql:host='.$dbcon['host'].';dbname='.$dbcon['dbname'].';',
$dbcon['username'],
$dbcon['password'],
array(
PDO::ATTR_PERSISTENT => false,
PDO::MYSQL_ATTR_USE_BUFFERED_QUERY => true,
PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8"
)
);
/* Prepare query for Insert
*/
$sqlInsert = "
INSERT INTO press_statistic_analytics
(id, press_id, ga_profile_id, ga_sessions, ga_pageviews, ga_percentNewVisits, date_created, date_updated)
VALUES
(null, :press_id, :ga_profile_id, :ga_sessions, :ga_pageviews, :ga_percentNewVisits, :date_created, :date_updated)";
$queryInsert = $dbh->prepare($sqlInsert);
/* Prepare query for Update
*/
$sqlUpdate = "
UPDATE press_statistic_analytics SET
ga_sessions = :ga_sessions,
ga_pageviews = :ga_pageviews,
ga_percentNewVisits = :ga_percentNewVisits,
date_updated = :date_updated
WHERE press_id = :press_id
AND ga_profile_id = :ga_profile_id ";
$queryUpdate = $dbh->prepare($sqlUpdate);
/* Get all Articles
*/
$sth = $dbh->query("
SELECT
press_i18n.press_id,
press_i18n.title,
press_i18n.slug
FROM press_i18n
ORDER BY press_i18n.id DESC");
$sth->setFetchMode(PDO::FETCH_OBJ);
For each local record, communicate with Google Analytics and collect information:
while ($article = $sth->fetch()) {
if (time_nanosleep(0, 500000000) === true) {
/* Create service and get data
*/
$gaProfileID = 'ga:11111111';
$startDate = '2000-01-01';
$endDate = date("Y-m-d", time());
$metrics = 'ga:sessions,ga:pageviews,ga:percentNewVisits';
$optParams = array(
"filters" => "ga:pagePath==/".$article->slug
);
$service = new Google_AnalyticsService($client);
/* Query Google and process results
* to a more usable array.
*/
try {
// Get data
$results = $service->data_ga->get($gaProfileID, $startDate, $endDate, $metrics, $optParams);
$googleAnalyticsArr = array();
if (isset($results) && is_object($results) && count($results->getRows()) > 0) {
$headersArr = (Array)$results->getColumnHeaders();
$i=0;
foreach ($results->getRows() as $row) {
foreach ($row as $cell) {
$googleAnalyticsArr[$headersArr[$i]->name] = htmlspecialchars($cell, ENT_NOQUOTES);
$i++;
}
}
}
/* Update or insert the collected data
*/
$sthSelectExistent = $dbh->query("
SELECT *
FROM press_statistic_analytics
WHERE press_id = ".$article->press_id."
AND ga_profile_id = '".$gaProfileID."'
");
$sthSelectExistent->setFetchMode(PDO::FETCH_OBJ);
$rowSelectExistent = $sthSelectExistent->fetch();
/* Prepare data
*/
$ga_sessions = isset($googleAnalyticsArr["ga:sessions"]) ? $googleAnalyticsArr["ga:sessions"] : 0;
$ga_pageviews = isset($googleAnalyticsArr["ga:pageviews"]) ? $googleAnalyticsArr["ga:pageviews"] : 0;
$ga_percentNewVisits = isset($googleAnalyticsArr["ga:percentNewVisits"]) ? $googleAnalyticsArr["ga:percentNewVisits"] : 0.00;
if (is_object($rowSelectExistent)) {
$queryUpdate->execute(
array(
"ga_sessions" => $ga_sessions,
"ga_pageviews" => $ga_pageviews,
"ga_percentNewVisits" => $ga_percentNewVisits,
"date_updated" => date("Y-m-d H:i:s"),
"press_id" => $article->press_id,
"ga_profile_id" => $gaProfileID
)
);
}
else {
$queryInsert->execute(
array(
"press_id" => $article->press_id,
"ga_profile_id" => $gaProfileID,
"ga_sessions" => $ga_sessions,
"ga_pageviews" => $ga_pageviews,
"ga_percentNewVisits" => $ga_percentNewVisits,
"date_created" => date("Y-m-d H:i:s"),
"date_updated" => "0000-00-00 00:00:00"
)
);
}
}
catch (apiServiceException $e) {
// Handle API service exceptions.
//$error = $e->getMessage();
var_dump($e);
}
catch (Google_ServiceException $e) {
// Handle API service exceptions.
//$error = $e->getMessage();
var_dump($e);
}
}
}
Question
How to optimize this process for collecting the statistics of each URL in Google Analytics for the local database in MySQL in order to avoid forced breaks in script and not go beyond the limits currently imposed by Google ?