How to index and update a user comparison system?

7

I have 2 tables with characteristics and interests of users in a MySQL database with innoDB

  

Feature Table

id
nome
sobrenome
cidade
estadocivil
altura
peso
fisico
pele
olhos
filhos
querofilhos
bebo
fumo
denominacao
frequencia
importancia
  

Interest Table

id
cidade
estadocivil
altura_min
altura_max
peso_min
peso_max
fisico
pele
olhos
filhos
quer_filhos
bebe
fuma
denominacao
frequencia
importancia

in previous topic , the best option was to create a table saving users and their compatibility.

  

Compatibility table

user_id
user_interesse_id
compatibilidade --> Valor de 0 a 100 que representa uma %
  

The calculation is done like this:

IF($caracteristica==$Interesse)
    $compatibilidade++

Then I get the final value and divide by the number of features and multiplied by 100 thus having a% compatibility and then store the INT of the percentage

($compatibilidade/Num_car).100;

Now the question is: "How to index and update these comparisons so that you do not generate more requests than you need?"

@MarceloAymone gave the idea of a StoredProcedure that I still do not know right what it is.

Any suggestions?

    
asked by anonymous 06.03.2014 / 19:17

3 answers

2

I imagine this to be a social networking type, right? I can not see how feasible algorithms that compare users one by one. It would be very inefficient to run this in a trigger and "time-consuming" in a background process (how many minutes should the user wait to see the result?).

I believe that the ideal solution is a type of pre-classification that, applied to a given user, returns a value that can then be compared to other users.

I'll try to illustrate this.

First, let's take height . Imagine that we want to approach people with similar heights. We can set ranges of heights, for example, range 1 for people considered "low", 2 for "medians" and 3 for "highs."

For eye color , we could have the 1 value for light eyes, 2 for brown and 3 for dark.

For civil status , smoke , bebo and some other attributes that can have only two states, % and 1 .

After this classification, one can think of an algorithm that, from a set of interests, returns the most appropriate profiles.

The simplest could be a query that compares each attribute and returns first those with more similarities.

The query example below sorts the profiles by similarity, whose value is computed by adding 2 to each common attribute, that is, the more common attributes, the greater the column value:

select c.*,
( 
  case when faixa_altura = :faixa_altura_interesse then 1 else 0 end +
  case when tipos_olhos = :tipo_olhos_interesse then 1 else 0 end +
  case when bebo = :bebo_interesse then 1 else 0 end +
  (...)
) semelhanca
from caracteristicas c
order by semelhanca desc

Another numerical way to do this (which I found well explained in this SOEN question ) is to consider all these features as multidimensional axes as in a Cartesian graph. Then you will find the similarity between interests and profiles through the "location" of the profile.

Consider the image below ( source ):

Thecharacteristicsofeachprofilearerepresentedbyaperiod,right?Sotofindsimilarinterests,simplyretrievetheclosestpointsofinterest.

Thedifferenceisthatinyourcasethegraphwouldhave1dimensions,beingNthenumberofattributes.

Anotherfactortoconsiderisplacingweightsinthesecharacteristics.Forexample,drinkingornotcanbemoreimportanttomatchprofilesthanheight.Todothisintheabovequery,simplyuse,insteadofN,ahighervalueaccordingtotheimportanceofthecharacteristic.

Goingalittledeeper,ifgreateroptimizationisrequiredandaqueryisnotfeasible,youcanestablishpeopleprofiles.Whenausercompleteshischaracteristics,thesystemclassifieshimintooneoftheregisteredtypes.Thisisjustanotherkindofabstractiontosimplifythedatastructure.Themoreprofilesthereare,themorerefinedtheresultwillbe.Thisisthe"thickest" yet most efficient method. The weakness is that if someone does not fit well into a pre-set profile, the chances of not finding anyone compatible will be greater.

    
28.04.2014 / 20:05
2

For each n users we will have n-1 comparisons, I did not see the "sex" in the tables, an application of this type seems important to me.

I would do a table in which I would record new users or have had some data registered, a batch routine (1) (triggered by event) would compare against n-1 users and record the Compared x User since what is true for user x > > user y is not true for user and > > user x. The SQL itself seems simple to me: attribute = attribute or attribute between attribute range the question would be to "populate" the relationship tables.

(1) Actually two procedures:

1 Read row users table to process   Flame writes attributes passing user as parameter   Deletes user from table to process

2 Calculate compatibility pair for each other user in the database   Record pair

    
26.04.2014 / 20:37
1

The ideal for your case is to use a Trigger .

Responding to your comment, a Stored Procedure is a set of stored SQL statements that can be executed through a simple call. They can be given parameters and have some dynamic behavior.

More or less on the same line, a Trigger is also a set of stored statements. The difference between it and Stored Procedures is that Trigger is called automatically, because it is tied to some database operation involving a table (insertion of data into a table, for example).

The example below will make it clearer what I mean:

DELIMITER $$
CREATE TRIGGER antesDeUpdate_caracteristica 
BEFORE UPDATE ON caracteristica
FOR EACH ROW BEGIN
UPDATE compatibilidade
SET compatibilidade = -- Coloque o valor final aqui
WHERE user_id = OLD.user_id; END$$
DELIMITER ;

This Trigger guarantees that with each update in the caracteristica table, the compatibilidade table will be updated.

I do not know how your calculation is, but if you put the details of the compatibility calculation into the question, I can improve the answer.

    
07.03.2014 / 16:42