How does YouTube know which videos we have already watched?

4

The title already describes my interest, but what I really want to know is how YouTube stores the information that we have already watched a given video. I thought about storing it in cookies, but that would be unworkable: whenever someone cleared their browsing history the cookies could be deleted, and the site would no longer know which videos had been watched. Then I thought that YouTube stores this information in a database, but I think that would create a huge amount of data that could be considered practically useless. Is there a third way to store this information?

While I'm at it, I would like to ask another question on the same subject. For YouTube, just clicking the link is not enough for the video to count as watched by the user. Do you know how it measures the time needed to decide whether a video has been watched or not? I thought of a kind of timer set to about 80% of the video's actual length, where the video is only considered watched if the user stays on the page for those X seconds or minutes, or loads 80% of the clicked video. But I think there must be some other way to do this. I would like to know whether there is one and, if so, what it is.

For those who have never seen what I'm talking about: just log in to YouTube and watch a video. Afterwards, the video is shown with a semi-transparent overlay on top of it labeled "Watched."

    
asked by anonymous 23.07.2014 / 23:43

3 answers

9

According to various sources about YouTube, it mainly uses MySQL to store this information. At this point, however, they probably use some MySQL variant modified for higher performance.

However, saying only that is too simplistic. After all, the amount of data stored really is astronomical, not only for videos but also for users, including viewing lists, preferences and usage habits. As far as I could find out, the exact size of the database is unknown, but it is estimated to be on the order of petabytes.

High-performance architectures like this cannot rely on a single, simple database. There is a whole set of technologies distributed across thousands of servers around the world to handle this.

To understand this you need some familiarity with distributed applications, big data and the like.

In a talk about scalability with Python at YouTube, one of the development managers described several techniques used to meet the high demand.

For example, unlike a "common" system, where transactions are always used to change the database and changes are immediately visible to all users, YouTube gives up certain ACID "guarantees" in exchange for high availability. So if a user in China adds a comment to a video, it is recorded on the servers in China and it may take several minutes for the data to be replicated to other servers around the world. This is a small price to pay.
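The trade-off described above can be illustrated with a minimal sketch. It is not YouTube's code: the `Region` class, the `outbox` queue and the `replicate` function are hypothetical names, and the sketch only assumes that each region accepts writes locally and ships them to the other regions later.

```python
from collections import deque


class Region:
    """A hypothetical regional server group with its own local copy of the data."""

    def __init__(self, name):
        self.name = name
        self.comments = {}     # video_id -> list of comment texts
        self.outbox = deque()  # local writes not yet shipped to other regions

    def add_comment(self, video_id, text):
        # The write is accepted locally right away, without any global transaction.
        self.comments.setdefault(video_id, []).append(text)
        self.outbox.append((video_id, text))

    def read_comments(self, video_id):
        return list(self.comments.get(video_id, []))


def replicate(source, targets):
    """Ship pending writes from one region to the others (run later, in the background)."""
    while source.outbox:
        video_id, text = source.outbox.popleft()
        for target in targets:
            target.comments.setdefault(video_id, []).append(text)


asia = Region("asia")
europe = Region("europe")

asia.add_comment("video-1", "Nice video!")
before = europe.read_comments("video-1")  # still empty: not replicated yet

replicate(asia, [europe])
after = europe.read_comments("video-1")   # now contains the comment
```

Between the write and the call to `replicate`, readers in the other region simply see slightly stale data. That window is the "price" mentioned above.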

Another "cheat" concerns the view count shown on videos. When a video is being accessed a lot, the code behind YouTube estimates the number of hits per minute and increases the counter based on that estimate rather than on the actual number of hits. This removes the need for a central table that records each visit, while keeping the number as close to "real" as possible.

This becomes evident if you open a newly published viral video. You will almost certainly see some anomalies in the statistics if you refresh the page at certain intervals.
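One simple way to get an estimated counter of this kind is sampling: record only a fraction of the hits and scale the result up. The sketch below shows that technique under stated assumptions. It is not necessarily how YouTube computes its estimate, and `SampledViewCounter` and `sample_rate` are made-up names.

```python
import random


class SampledViewCounter:
    """Approximate view counter: stores only a sampled fraction of the hits."""

    def __init__(self, sample_rate, seed=None):
        if not 0 < sample_rate <= 1:
            raise ValueError("sample_rate must be in (0, 1]")
        self.sample_rate = sample_rate
        self._rng = random.Random(seed)  # seedable so runs can be reproduced
        self._sampled_hits = 0

    def record_view(self):
        # Only about sample_rate of all views touch the stored counter.
        if self._rng.random() < self.sample_rate:
            self._sampled_hits += 1

    def estimated_views(self):
        # Scale the sampled hits back up to estimate the real total.
        return round(self._sampled_hits / self.sample_rate)


counter = SampledViewCounter(sample_rate=0.1, seed=42)
for _ in range(10_000):
    counter.record_view()

estimate = counter.estimated_views()  # close to 10_000, but usually not exact
```

With a sample rate of 0.1 the store receives roughly a tenth of the writes, and the displayed number moves in jumps of 10. That is the kind of small anomaly you might notice when refreshing a busy page.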

Databases of this kind are very different from what 99% of developers are used to seeing day to day. To understand them, you practically have to enter another field of study.

One field that is growing today is Big Data, which covers most of the cases that databases with traditional architectures cannot cope with. It is true that there is a lot of hype around Big Data, but it is also true that all serious high-tech companies use such solutions successfully. If you want to have a not-so-technical introduction to Big Data, .

    
24.07.2014 / 17:31
2

It does not make much sense to use cookies to store the videos already watched, not least because different browsers do not share the same cookies. What probably happens is that the data is stored in a database and tied to your YouTube channel (as you can see from the fact that YouTube keeps a kind of "folder" on your channel with the videos you have already watched). As for the time or percentage of a video that has to be watched for it to count as viewed, that information is kept confidential in order to fend off malicious software that uses macros to inflate the view count of a given video.
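The server-side storage suggested above could look roughly like the sketch below. It uses SQLite only to stay self-contained. The `watch_history` table, its columns and the helper functions are hypothetical, not YouTube's actual schema.

```python
import sqlite3
import time


def open_store(path=":memory:"):
    """Create (if needed) a hypothetical per-account watch-history table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS watch_history ("
        " user_id TEXT NOT NULL,"
        " video_id TEXT NOT NULL,"
        " watched_at REAL NOT NULL,"
        " PRIMARY KEY (user_id, video_id))"
    )
    return conn


def mark_watched(conn, user_id, video_id):
    # One row per (user, video); watching again only refreshes the timestamp.
    conn.execute(
        "INSERT OR REPLACE INTO watch_history VALUES (?, ?, ?)",
        (user_id, video_id, time.time()),
    )
    conn.commit()


def has_watched(conn, user_id, video_id):
    # The primary key makes this a direct indexed lookup.
    row = conn.execute(
        "SELECT 1 FROM watch_history WHERE user_id = ? AND video_id = ?",
        (user_id, video_id),
    ).fetchone()
    return row is not None


conn = open_store()
mark_watched(conn, "user-1", "video-abc")
mark_watched(conn, "user-1", "video-abc")  # second viewing, still a single row
```

Because the record is keyed by account rather than by browser, the "Watched" mark survives cleared cookies and follows the user across devices.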

    
24.07.2014 / 00:12
1

Do you really think the data about the videos you've watched is useless? I don't believe it is, because a lot of information can be extracted from it, such as the tastes of a certain group of people, what they are most interested in, and so on. I'm not sure what storage medium YouTube uses for this information, but I believe it really is a database. There are many ways to make requests and queries more efficient, and caching is one of them. Have you ever imagined how big and organized the infrastructure of such a company is? I don't have much of an idea myself. As for checking whether the video was watched or not, the state is possibly kept locally in the frontend and, once the condition is met, a trigger communicates with the backend to record that the video was watched. Have you noticed that Google does not use libraries like jQuery? It always uses pure JavaScript to maintain efficiency. I hope I have helped, at least a little.
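The frontend check described above could work roughly as in the sketch below. In a real page this logic would run in JavaScript on the player's time-update events. Here it is written in Python purely to show the idea. The 80% threshold comes from the question, not from YouTube, and `WatchTracker`, `on_time_update` and `max_step_s` are made-up names.

```python
from typing import Callable, Optional


class WatchTracker:
    """Accumulates actual playback time and fires a callback once at a threshold."""

    def __init__(self, duration_s: float, on_watched: Callable[[], None],
                 threshold: float = 0.8, max_step_s: float = 2.0):
        self.duration_s = duration_s
        self.on_watched = on_watched      # e.g. a call that notifies the backend
        self.threshold = threshold        # assumed fraction of the video
        self.max_step_s = max_step_s      # larger jumps are treated as seeks
        self._last_pos: Optional[float] = None
        self._watched_s = 0.0
        self._reported = False

    def on_time_update(self, position_s: float) -> None:
        """Call periodically with the current playback position in seconds."""
        if self._last_pos is not None:
            step = position_s - self._last_pos
            # Count only small forward steps, so skipping ahead earns nothing.
            if 0 < step <= self.max_step_s:
                self._watched_s += step
        self._last_pos = position_s

        if not self._reported and self._watched_s >= self.threshold * self.duration_s:
            self._reported = True
            self.on_watched()


events = []
tracker = WatchTracker(duration_s=100.0, on_watched=lambda: events.append("watched"))
for second in range(0, 101):
    tracker.on_time_update(float(second))  # normal playback, one update per second
```

Counting played time instead of elapsed time on the page means that leaving the tab open on a paused video, or seeking straight to the end, does not mark the video as watched.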

    
24.07.2014 / 00:08