Best choice of unique visitor identifier (IP, cookies, MAC)

This question may be somewhat generic depending on the point of view, but I'll try to be as specific as possible.

I have a system that adds 1 (+1) to a post's view count each time it is accessed. Example:

<?php
// $mysql is an existing database connection (e.g. mysqli or PDO)
$mysql->query('UPDATE post SET views = views + 1 WHERE id = 10');
?>

I think you get the idea. The problem, in this case, is that every F5 (refresh) will increase the view count, even when it comes from the same user.

So I thought of some solutions:

  • Save the IP in the database and compare before inserting.
  • Save a cookie in the visitor's browser and check before inserting.
  • Save the MAC address in the database and compare before inserting.

Which of these solutions is best, taking into account performance and, obviously, how hard each one is to circumvent? (Deleting the cache/cookies or using an anonymous browser defeats the second case, resetting the modem or using a proxy defeats the first, and using other proxies or Tor defeats the third.)

Would using all of them together be a good alternative?

I have seen that getting the MAC address via PHP is practically impossible; however, CloudFlare can obtain this data, so I listed it, although I do not know how it actually does so.

    asked by anonymous 07.02.2016 / 00:31

    1 answer

    The question does not specify the environment, so I will assume you want to use this on the open web. In that case, because you do not have administrative access to the client's machine, you will not be able to obtain the MAC address.

    If you cannot control this through authentication (a logged-in user), the viable approach is a combination of cookies and IP.

    The combination of cookie and IP should be well planned. For example, a free Wi-Fi network provides the same IP to thousands of users: in a shopping mall, in a park, at train and subway stations, on buses, and so on.

    In a residential or commercial building we can also have the same shared-IP situation. Therefore, you should not assume that an IP identifies a single user. It can, however, be used to help determine whether a user has returned and visited the page again, which is why it is combined with cookie data.
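
    As a minimal sketch of that combination (the $pdo connection, the post_views table and its columns are assumptions of this example, not something defined in the question), the check before counting could look like this:

    <?php
    // Only count the view when this (cookie, IP) pair has not been seen
    // for this post yet. Table and column names are assumptions.
    $postId    = 10;
    $visitorId = $_COOKIE['visitor_id'] ?? '';   // cookie set elsewhere
    $ip        = $_SERVER['REMOTE_ADDR'];

    $check = $pdo->prepare(
        'SELECT 1 FROM post_views WHERE post_id = ? AND visitor_id = ? AND ip = ?'
    );
    $check->execute([$postId, $visitorId, $ip]);

    if ($check->fetchColumn() === false) {
        $pdo->prepare('INSERT INTO post_views (post_id, visitor_id, ip) VALUES (?, ?, ?)')
            ->execute([$postId, $visitorId, $ip]);
        $pdo->prepare('UPDATE post SET views = views + 1 WHERE id = ?')
            ->execute([$postId]);
    }
    ?>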

    Cookies

    Cookies are the safest and most standardized means, because the same user, carrying the same cookie, can come back from a different IP. Hence the importance of not relying on a single identifier. Also capture browser information, such as the browser name and version: two distinct users behind the same IP often have different browsers or versions. So even if both delete their cookies, you can do a subsequent filtering pass to look for possible duplicates.
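
    A minimal sketch of that idea (the cookie name visitor_id and the one-year lifetime are arbitrary choices of this example):

    <?php
    // Give each browser a random, persistent identifier and capture the
    // user agent alongside it for later duplicate filtering.
    if (empty($_COOKIE['visitor_id'])) {
        $visitorId = bin2hex(random_bytes(16));               // 32-character random id
        setcookie('visitor_id', $visitorId, time() + 365 * 24 * 3600, '/');
    } else {
        $visitorId = $_COOKIE['visitor_id'];
    }

    $userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';           // browser name/version string
    $ip        = $_SERVER['REMOTE_ADDR'];
    // Store $visitorId, $userAgent and $ip together with the view record.
    ?>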

    Filtering Algorithms

    Systems like YouTube, for example, use algorithms of this kind to determine the number of views. That is why it is common to see a video with 100 thousand views suddenly drop to 80 thousand from one day to the next: a subsequent filtering pass was applied. Many people think YouTube is "stealing" views, but what actually happens is this later filtering, because the monetization system also cannot "steal" from the advertiser. YouTube is merely a very popular service used here to give a practical example; Google Analytics and Google AdSense do the same, only in a much more advanced way.
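
    As a rough illustration of such a subsequent filtering pass (it assumes raw views were saved in a views_log table, as sketched further down; the table and column names are assumptions of this example):

    <?php
    // Offline job: recompute the "official" counter by collapsing
    // duplicates, counting each visitor only once per post.
    $pdo->exec('
        UPDATE post p
        SET p.views = (
            SELECT COUNT(DISTINCT v.visitor_id)
            FROM views_log v
            WHERE v.post_id = p.id
        )
    ');
    ?>

    The same query could be refined to also group by IP and browser version, as mentioned above.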

    Fingerprint

    Continuing the subject: you will probably hear the obvious objection that it is easy to remove or edit cookies. However, an ordinary user does not do this kind of thing; usually it is either someone who specifically set out to do so, or merely a legitimate user who simply cleared the browser cache. There are also cases of users with multiple devices, accessing from a smartphone, a tablet and a PC. Officially it is a single person, but the accesses came from different devices. At this point the logic of your business model comes in: the business model is what will define how to handle things from then on. From the client data you can generate a kind of fingerprint, for example.
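
    A minimal sketch of such a fingerprint, built from a few request headers (which headers to include, and the idea of simply hashing them, are choices of this example, not a standard):

    <?php
    // Combine a few client characteristics into a single hash. This is a
    // weak fingerprint: headers can be spoofed and can collide, so treat
    // it as one more signal, not as a unique identifier.
    $parts = [
        $_SERVER['REMOTE_ADDR'],
        $_SERVER['HTTP_USER_AGENT']      ?? '',
        $_SERVER['HTTP_ACCEPT_LANGUAGE'] ?? '',
        $_SERVER['HTTP_ACCEPT_ENCODING'] ?? '',
    ];
    $fingerprint = hash('sha256', implode('|', $parts));
    // Store $fingerprint together with the view record for later filtering.
    ?>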

    Authentication and bots

    A safer way is to identify the person through authentication/login. Whenever possible, use it: getting users to identify themselves makes the filtering much easier.
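
    For instance (assuming the application stores a user_id in the session after login, which is an assumption of this sketch):

    <?php
    session_start();

    // Prefer the authenticated user id as the visitor key; fall back to
    // the anonymous cookie identifier from the earlier sketch.
    if (!empty($_SESSION['user_id'])) {
        $visitorKey = 'user:' . $_SESSION['user_id'];
    } else {
        $visitorKey = 'anon:' . ($_COOKIE['visitor_id'] ?? 'unknown');
    }
    ?>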

    Bots!

    Of course you should also be aware of bots and set rules on how to treat them.
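
    A very crude example of such a rule, based only on the user agent (real bots can fake it, so this is just a first filter and the keyword list is an assumption of this example):

    <?php
    // Skip counting the view when the user agent looks like a known crawler.
    function looksLikeBot(string $userAgent): bool
    {
        return (bool) preg_match('/bot|crawler|spider|curl|wget/i', $userAgent);
    }

    if (!looksLikeBot($_SERVER['HTTP_USER_AGENT'] ?? '')) {
        // ...record the view as in the previous sketches...
    }
    ?>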

    Evercookie!

    Optionally, there is the controversial use of evercookie. I recommend not using such a practice, but it is interesting to know that such an "option" exists.

    Refresh, browsing time

    Obviously you should also create some logic to track navigation time and time spent on the page, for example to identify a refresh of the page. This will help you understand how each increase in the number of views occurred. So do not just save +1 in views: save a complete log of how that view was generated.
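
    A minimal sketch of such a log entry (reusing the assumed views_log table and the $postId / $visitorId variables from the sketches above):

    <?php
    // Save the full context of the view instead of only incrementing a counter.
    $stmt = $pdo->prepare(
        'INSERT INTO views_log (post_id, visitor_id, ip, user_agent, referer, viewed_at)
         VALUES (?, ?, ?, ?, ?, NOW())'
    );
    $stmt->execute([
        $postId,
        $visitorId,
        $_SERVER['REMOTE_ADDR'],
        $_SERVER['HTTP_USER_AGENT'] ?? '',
        $_SERVER['HTTP_REFERER']    ?? '',
    ]);
    ?>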

    Usually these algorithms are complex and it is not feasible to run them at request time. Just let the logs be saved and run the filtering in a private environment. The downside is the accumulation of data: depending on the volume of access, the database can easily reach 1 GB in a week. Handling large amounts of data is hard when you do not have dedicated, specialized staff for it, and most companies do not have that "luxury" because of the cost.

    Reinventing the wheel. Cost-benefit.

    Finally, it all depends on the business model. Some may find this implementation overkill, and for small, unimportant systems (usually poorly made, amateur ones) it really is. But for high-level projects it is good to create consistent rules.

    In the end, what you will be able to develop is something similar to what already exists, namely Google Analytics. That is why many choose to leave this in the care of third parties such as Google Analytics, which obviously has great know-how.

        
    07.02.2016 / 11:17