Collect Google Analytics data into MySQL tables


I have seen many examples of how to export data from Google Analytics to other formats and keep the exported data up to date, but so far none of them has worked for exporting the data to MySQL, because of the following problems:

  • Exceeding the query limits;
  • The process takes hours to keep the records up to date;
  • The process becomes impractical beyond a certain number of records, because the day only has 24 hours.

So far my implementation is in this format:

┌──────────────────────────────────────────────────────────┐
│ For each record in the local database                    │
└─────────────────────────────┬────────────────────────────┘
                              │
┌─────────────────────────────┴────────────────────────────┐
│ Send the URL to Google Analytics                         │
└─────────────────────────────┬────────────────────────────┘
                              │
┌─────────────────────────────┴────────────────────────────┐
│ With the statistics received, update the local database  │
└──────────────────────────────────────────────────────────┘

This works fine, but to work around the problems listed above, pauses have to be introduced:

while ($registoLocal = $sth->fetch()) {

    if (time_nanosleep(0, 500000000) === true) {
        // communicate with Google Analytics
    }
}

This results in a roughly fixed amount of time for every $registoLocal, currently about 2 seconds, which leaves us with a maximum of 43,200 queries per day (86,400 seconds in a day ÷ 2 s per record).

Additionally, we have the limits of the Google Analytics API:

  • 50,000 requests per project per day
  • 10 queries per second per IP
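
Given these two constraints, one option (a minimal sketch, not the code currently in use) is to measure how long each API call actually takes and only sleep for the remainder of a target interval, instead of adding a fixed pause on top of the round trip. The 1.8-second interval below is an assumption chosen to stay under the 50,000 requests/day quota, and callGoogleAnalytics() is a hypothetical wrapper standing in for the $service->data_ga->get() call shown further down:

$targetInterval = 1.8; // assumed pace: ~1.8 s per request stays under 50,000 requests/day

while ($registoLocal = $sth->fetch()) {

    $start = microtime(true);

    // Hypothetical wrapper standing in for the Google Analytics request below
    callGoogleAnalytics($registoLocal);

    $elapsed = microtime(true) - $start;

    if ($elapsed < $targetInterval) {
        // Sleep only for what is left of the interval (usleep expects microseconds)
        usleep((int) round(($targetInterval - $elapsed) * 1000000));
    }
}

This way a slow API response does not add to a fixed pause, and a fast one does not let the script exceed the quota.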

Code in use

Include dependencies and instantiate the API:

/* API dependencies
 */
require_once('caminho/para/Google_Client.php');
require_once('caminho/para/contrib/Google_AnalyticsService.php');

/* Create client object and set app name
 */
$client = new Google_Client();
$client->setApplicationName('O nome da minha aplicação'); // app name

/* Set assertion credentials
 */
$client->setAssertionCredentials(
    new Google_AssertionCredentials(
        '[email protected]', // API Service Account Email Address
        array('https://www.googleapis.com/auth/analytics.readonly'),
        file_get_contents('caminho/para/o/ficheiro/O nome da minha aplicação.p12')  // downloaded keyfile
    )
);

/* Other settings
 * Access the Google Cloud Console and view the web application project details:
 * https://cloud.google.com/console
 */
$client->setClientId('outro-mega-super-grande-email-da-google@developer.gserviceaccount.com'); // Client ID for web application Email Address
$client->setAccessType('offline');
$client->setUseObjects(true);

Connect to the local database:

/* Connection access details
 */
$dbcon = array(
    "host"     => "localhost",
    "dbname"   => "nomeBaseDados",
    "username" => "utilizador",
    "password" => "password"
);

/* Connect to the database
 */
$dbh = new PDO(
    'mysql:host='.$dbcon['host'].';dbname='.$dbcon['dbname'].';',
    $dbcon['username'],
    $dbcon['password'],
    array(
        PDO::ATTR_PERSISTENT               => false,
        PDO::MYSQL_ATTR_USE_BUFFERED_QUERY => true,
        PDO::ATTR_ERRMODE                  => PDO::ERRMODE_EXCEPTION,
        PDO::MYSQL_ATTR_INIT_COMMAND       => "SET NAMES utf8"
    )
);

/* Prepare query for Insert
 */
$sqlInsert = "
INSERT INTO press_statistic_analytics
    (id, press_id, ga_profile_id, ga_sessions, ga_pageviews, ga_percentNewVisits, date_created, date_updated)
VALUES
    (null, :press_id, :ga_profile_id, :ga_sessions, :ga_pageviews, :ga_percentNewVisits, :date_created, :date_updated)";

$queryInsert = $dbh->prepare($sqlInsert);

/* Prepare query for Update
 */
$sqlUpdate = "
UPDATE press_statistic_analytics SET
    ga_sessions = :ga_sessions,
    ga_pageviews = :ga_pageviews,
    ga_percentNewVisits = :ga_percentNewVisits,
    date_updated = :date_updated
WHERE press_id = :press_id
AND ga_profile_id = :ga_profile_id ";

$queryUpdate = $dbh->prepare($sqlUpdate);

/* Get all Articles
 */
$sth = $dbh->query("
SELECT
    press_i18n.press_id,
    press_i18n.title,
    press_i18n.slug
FROM press_i18n
ORDER BY press_i18n.id DESC");

$sth->setFetchMode(PDO::FETCH_OBJ);

For each local record, communicate with Google Analytics and collect information:

while ($article = $sth->fetch()) {

    if (time_nanosleep(0, 500000000) === true) {

        /* Create service and get data
         */
        $gaProfileID = 'ga:11111111';
        $startDate = '2000-01-01';
        $endDate = date("Y-m-d", time());
        $metrics = 'ga:sessions,ga:pageviews,ga:percentNewVisits';
        $optParams = array(
            "filters" => "ga:pagePath==/".$article->slug
        );

        $service = new Google_AnalyticsService($client);

        /* Query Google and process results
         * to a more usable array.
         */
        try {

            // Get data
            $results = $service->data_ga->get($gaProfileID, $startDate, $endDate, $metrics, $optParams);

            $googleAnalyticsArr = array();

            if (isset($results) && is_object($results) && count($results->getRows()) > 0) {

                $headersArr = (Array)$results->getColumnHeaders();

                $i=0;

                foreach ($results->getRows() as $row) {

                    foreach ($row as $cell) {

                        $googleAnalyticsArr[$headersArr[$i]->name] = htmlspecialchars($cell, ENT_NOQUOTES);

                        $i++;
                    }
                }
            }

            /* Update or insert the collected data
             */
            $sthSelectExistent = $dbh->query("
                SELECT *
                FROM press_statistic_analytics
                WHERE press_id = ".$article->press_id."
                AND ga_profile_id = '".$gaProfileID."'
            ");
            $sthSelectExistent->setFetchMode(PDO::FETCH_OBJ);
            $rowSelectExistent = $sthSelectExistent->fetch();

            /* Prepare data
             */
            $ga_sessions = isset($googleAnalyticsArr["ga:sessions"]) ? $googleAnalyticsArr["ga:sessions"] : 0;
            $ga_pageviews = isset($googleAnalyticsArr["ga:pageviews"]) ? $googleAnalyticsArr["ga:pageviews"] : 0;
            $ga_percentNewVisits = isset($googleAnalyticsArr["ga:percentNewVisits"]) ? $googleAnalyticsArr["ga:percentNewVisits"] : 0.00;

            if (is_object($rowSelectExistent)) {

                $queryUpdate->execute(
                    array(
                        "ga_sessions"         => $ga_sessions,
                        "ga_pageviews"        => $ga_pageviews,
                        "ga_percentNewVisits" => $ga_percentNewVisits,
                        "date_updated"        => date("Y-m-d H:i:s"),
                        "press_id"            => $article->press_id,
                        "ga_profile_id"       => $gaProfileID
                    )
                );
            }
            else {
                $queryInsert->execute(
                    array(
                        "press_id"            => $article->press_id,
                        "ga_profile_id"       => $gaProfileID,
                        "ga_sessions"         => $ga_sessions,
                        "ga_pageviews"        => $ga_pageviews,
                        "ga_percentNewVisits" => $ga_percentNewVisits,
                        "date_created"        => date("Y-m-d H:i:s"),
                        "date_updated"        => "0000-00-00 00:00:00"
                    )
                );
            }
        }
        catch (apiServiceException $e) {

            // Handle API service exceptions.
            //$error = $e->getMessage();
            var_dump($e);
        }
        catch (Google_ServiceException $e) {

            // Handle API service exceptions.
            //$error = $e->getMessage();
            var_dump($e);
        }
    }
}

Question


How can this process of collecting the statistics for each URL from Google Analytics into the local MySQL database be optimized, so as to avoid forced pauses in the script and not exceed the limits currently imposed by Google?

asked by anonymous 08.04.2015 / 13:34

2 answers


Proposed solution:

1. Create a column holding the date when the article's data was last queried from Google

ALTER TABLE press_i18n ADD COLUMN date_ga DATE NULL

2. Change the query that fetches the articles so it takes that last-queried date into account and picks only one article per run

$sth = $dbh->query("
SELECT
    press_i18n.press_id,
    press_i18n.title,
    press_i18n.slug
FROM press_i18n
WHERE date_ga < DATE(NOW()) OR date_ga IS NULL
ORDER BY press_i18n.id DESC LIMIT 1");

3. UPDATE the date_ga of the fetched record right below

// Get the article
$article = $sth->fetch();

if (!$article) {
    echo "No article found";
    exit; // nothing left to update today, so stop the script
}

echo "Article: " . $article->press_id;

$sthUpdate = "UPDATE press_i18n SET date_ga = DATE(NOW()) WHERE press_i18n.press_id = :press_id";
$queryUpdateGa = $dbh->prepare($sthUpdate);

$queryUpdateGa->execute(
    array(
        "press_id" => $article->press_id
    )
);

4. Change the Google communication code so it always processes one article at a time

$gaProfileID = 'ga:11111111';
$startDate = '2000-01-01';
$endDate = date("Y-m-d", time());
$metrics = 'ga:sessions,ga:pageviews,ga:percentNewVisits';
$optParams = array(
    "filters" => "ga:pagePath==/".$article->slug
);

$service = new Google_AnalyticsService($client);

try {

    // Get data
    $results = $service->data_ga->get($gaProfileID, $startDate, $endDate, $metrics, $optParams);

    $googleAnalyticsArr = array();

    if (isset($results) && is_object($results) && count($results->getRows()) > 0) {

        $headersArr = (Array)$results->getColumnHeaders();

        $i=0;

        foreach ($results->getRows() as $row) {

            foreach ($row as $cell) {

                $googleAnalyticsArr[$headersArr[$i]->name] = htmlspecialchars($cell, ENT_NOQUOTES);

                $i++;
            }
        }
    }

    /* Update or insert the collected data
     */
    $sthSelectExistent = $dbh->query("
        SELECT *
        FROM press_statistic_analytics
        WHERE press_id = ".$article->press_id."
        AND ga_profile_id = '".$gaProfileID."'
    ");
    $sthSelectExistent->setFetchMode(PDO::FETCH_OBJ);
    $rowSelectExistent = $sthSelectExistent->fetch();

    /* Prepare data
     */
    $ga_sessions = isset($googleAnalyticsArr["ga:sessions"]) ? $googleAnalyticsArr["ga:sessions"] : 0;
    $ga_pageviews = isset($googleAnalyticsArr["ga:pageviews"]) ? $googleAnalyticsArr["ga:pageviews"] : 0;
    $ga_percentNewVisits = isset($googleAnalyticsArr["ga:percentNewVisits"]) ? $googleAnalyticsArr["ga:percentNewVisits"] : 0.00;

    if (is_object($rowSelectExistent)) {

        $queryUpdate->execute(
            array(
                "ga_sessions"         => $ga_sessions,
                "ga_pageviews"        => $ga_pageviews,
                "ga_percentNewVisits" => $ga_percentNewVisits,
                "date_updated"        => date("Y-m-d H:i:s"),
                "press_id"            => $article->press_id,
                "ga_profile_id"       => $gaProfileID
            )
        );
    }
    else {
        $queryInsert->execute(
            array(
                "press_id"            => $article->press_id,
                "ga_profile_id"       => $gaProfileID,
                "ga_sessions"         => $ga_sessions,
                "ga_pageviews"        => $ga_pageviews,
                "ga_percentNewVisits" => $ga_percentNewVisits,
                "date_created"        => date("Y-m-d H:i:s"),
                "date_updated"        => "0000-00-00 00:00:00"
            )
        );
    }
}
catch (apiServiceException $e) {

    // Handle API service exceptions.
    //$error = $e->getMessage();
    var_dump($e);
}
catch (Google_ServiceException $e) {

    // Handle API service exceptions.
    //$error = $e->getMessage();
    var_dump($e);
}

5. Assuming this script lives in a page of its own (and can be accessed externally), we can create a PHP page that calls it, or schedule it in CRON so that the page is called every X amount of time. Since I do not know what your server setup is, below is a page that calls the Google communication page every 1 second; in theory this lets us double the number of requests made to Google.

<meta http-equiv="refresh" content="1">

<?php 
    @file_get_contents( "http://seuservidor/pasta/arquivo_que_comunica_com_o_google.php" );     
?>
  • The solution was not tested, because I do not have the database from the question, but I run a similar process at my company and, through the crontab, it handles approximately 150 thousand requests per day;
  • I believe that using CRONTAB can increase the number of requests made to Google (see the sketch after this list);
  • If you have any questions, I'm happy to chat with you.
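
Below is a hedged sketch of that CRONTAB idea, not tested against this setup: the crontab entry, the wrapper filename and its path are assumptions. Cron's smallest granularity is one minute, so the entry runs a small PHP wrapper once per minute, and the wrapper calls the single-article page roughly once per second for most of that minute.

<?php
/* Assumed crontab entry (placeholder path) that runs this wrapper every minute:
 *
 *   * * * * * php /caminho/para/cron_ga_wrapper.php
 *
 * The wrapper calls the single-article page about once per second and stops
 * before the next cron run begins.
 */
for ($i = 0; $i < 55; $i++) {

    // Same page as in step 5, which processes one article per request
    @file_get_contents("http://seuservidor/pasta/arquivo_que_comunica_com_o_google.php");

    sleep(1); // wait one second before the next call
}
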
30.12.2015 / 17:25

This is not a capacity limitation of the tool, but a pricing one ( link ).

They set the query limit at roughly one request every 1.7 seconds (50,000 requests spread over the 86,400 seconds in a day ≈ 1.7 s each), which would be a reasonable pace for a routine. So getting around this will be a bit tricky if you do not pay for the API. AdWords works the same way, and DSPs pay good money to use the exchanges.

Billing can be turned on in the API console to increase this quota. And remember that if you are using the free account you still have the problem of sampled data in the API (in the paid one too, although they say there is none). Also remember that if you use the premium version you can use BigQuery, which gives you much more pre-processed exports.

Some tools you can use instead:

  • Webtrends;
  • Piwik (open-source analytics);
  • AT Internet.

I know this is not exactly the answer you expected, but the limit exists so they can monetize the data they host. Try Piwik: you will have full control of your data for FREE, and you can still implement it with JavaScript via a tag manager in the same way.

I forgot to mention Real Time: the other reports take up to 24 hours to be processed (no SLA) in the free version and 4 hours in the premium version. You can re-query from time to time when there is a difference in the data.

06.01.2016 / 01:51