I imagine this to be a social networking type, right? I can not see how feasible algorithms that compare users one by one. It would be very inefficient to run this in a trigger and "time-consuming" in a background process (how many minutes should the user wait to see the result?).
I believe that the ideal solution is a type of pre-classification that, applied to a given user, returns a value that can then be compared to other users.
I'll try to illustrate this.
First, let's take height . Imagine that we want to approach people with similar heights. We can set ranges of heights, for example, range 1
for people considered "low", 2 for "medians" and 3 for "highs."
For eye color , we could have the 1
value for light eyes, 2
for brown and 3
for dark.
For civil status , smoke , bebo and some other attributes that can have only two states, % and 1
.
After this classification, one can think of an algorithm that, from a set of interests, returns the most appropriate profiles.
The simplest could be a query that compares each attribute and returns first those with more similarities.
The query example below sorts the profiles by similarity, whose value is computed by adding 2
to each common attribute, that is, the more common attributes, the greater the column value:
select c.*,
(
case when faixa_altura = :faixa_altura_interesse then 1 else 0 end +
case when tipos_olhos = :tipo_olhos_interesse then 1 else 0 end +
case when bebo = :bebo_interesse then 1 else 0 end +
(...)
) semelhanca
from caracteristicas c
order by semelhanca desc
Another numerical way to do this (which I found well explained in this SOEN question ) is to consider all these features as multidimensional axes as in a Cartesian graph. Then you will find the similarity between interests and profiles through the "location" of the profile.
Consider the image below ( source ):
Thecharacteristicsofeachprofilearerepresentedbyaperiod,right?Sotofindsimilarinterests,simplyretrievetheclosestpointsofinterest.
Thedifferenceisthatinyourcasethegraphwouldhave1
dimensions,beingN
thenumberofattributes.
Anotherfactortoconsiderisplacingweightsinthesecharacteristics.Forexample,drinkingornotcanbemoreimportanttomatchprofilesthanheight.Todothisintheabovequery,simplyuse,insteadofN
,ahighervalueaccordingtotheimportanceofthecharacteristic.
Goingalittledeeper,ifgreateroptimizationisrequiredandaqueryisnotfeasible,youcanestablishpeopleprofiles.Whenausercompleteshischaracteristics,thesystemclassifieshimintooneoftheregisteredtypes.Thisisjustanotherkindofabstractiontosimplifythedatastructure.Themoreprofilesthereare,themorerefinedtheresultwillbe.Thisisthe"thickest" yet most efficient method. The weakness is that if someone does not fit well into a pre-set profile, the chances of not finding anyone compatible will be greater.