This response is very long, but here goes my 2 cents of what I learned until today observing Google:
1. How does Google know which key search words led you to the site?
Notice that when you do a Google search, the links to the found sites are listed as the actual links (for example, if the search was StackOverflow
, the first result points to www.stackoverflow.com
). >
However, in each anchor ( a
) there is a registered callback for the onmousedown
event. This event replaces the href
of the anchor with a href
that points to Google's servers, which in turn redirects the browser to the actual site. See the images below:
Before clicking
Afterclickingonthelink
In this way, Google associates typed keywords with the typed website, and can use this information to optimize search results (note that Google offers search results according to visitor pattern as well, being crawled through your Google account (I believe that Chrome will send history data to Google too, but do not stop to review).
Editing :
Notice the value of href
after running onmousedown
event, that the new href
value does not include the actual site address ( www.stackoverflow.com
). This indicates that any of the query parameters only identifies the actual URL within the Google server in the search context. Thus, Google can track that the key words typed are actually related to the URL, and can improve your search results in the next iteration of the ranking algorithm.
Another less technical point, which also justifies Google storing the search and results in its database in a referable way, is that if it were not, it would be possible for a competitor to adulterate the final URLs, degrading the statistics used by Google in page ranking. Example:
Imagine that in href
, after onmousedown
, Google has set StackOverflow
something to /url?keyword=stackoverflow&real=www.stackoverflow.com
. This means that when the user clicks the link, Google will be informed that the keyword entered was stackoverflow
and the actual site is www.stackoverflow.com
.
Now imagine for example a virus on the client computer (or an MITM attack) by changing the URL to: /url?keyword=aumente%20seu&url=www.stackoverflow.com
. When the user clicks on this link, Google will know that aumente seu
keywords are related to www.stackoverflow.com
, degrading page ranking performance.
This could be done not just by competitors, but also by Trolls and people who want to make their site appear on top of searches.
2. How can Google Analytics verify the source of the click?
When the Analytics code runs on the visited page, it collects browser and system information. By visiting the site www.nortonconsultoria.com.br , these were the information collected and sent through GET parameters for Anlytics:
utmwv:5.6.2
utms:1
utmn:1342588638
utmhn:www.nortonconsultoria.com.br
utmcs:UTF-8
utmsr:1280x1024
utmvp:1265x716
utmsc:24-bit
utmul:en-us
utmje:1
utmfl:-
utmdt:Norton TI
utmhid:952692061
utmr:-
utmp:/
utmht:1423077254452
utmac:UA-41756695-1
utmcc:__utma=223515140.1707534080.1422473707.1422473707.1423077254.2;+__utmz=223515140.1422473707.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none);
utmjid:986782225
utmredir:1
utmu:qAAAAAAAAAAAAAAAAAAAAAAE~
In addition to this information, the HTTP header sends others, especially the User-Agent
, used to display which platforms are most relevant to your site.
I believe there are more complex ways for Google to collect other information, but I do not know. These are just data I noticed that are sent to Google through Chrome developer tools (I looked in Firefox also to see if there were big differences, but I did not find anything).
I hope it helped a little!