What is the best way to represent an Address?

37

By modeling addresses in an application, I naively followed the standard form when representing in the DB (ie separate tables for country , status strong city , neighborhood etc - joins everywhere), and in the user interface (html) I put a separate field for each part of that address autocomplete text or combo box ). At first I thought I was good this way, and I did not think much of it.

However, after doing a question on UX.SE (relative to our standard of using "X-Street" instead of "X street") the answers led me to question whether it was worth practicing in practice to require a high level of detail in the representation. Aside from the added complexity of fetching / updating (as exemplified by in this "pyramid" ), I do not know how the system performs when it contains a high number of addresses.

I would like to know, for those who already have experience dealing with a large number of addresses, which practices would be recommended: leave everything normalized, use an open text field, condense some tables into one .: cidade_estado_pais ) and leave others separated, etc. Taking into account that:

  • Few users will enter many addresses, other people (if it was each user entering their own address once and ready, it would not justify investing in usability).
  • If a part of the address already exists in the database (eg, a previously registered street), autocompletion can be used to speed up data entry; this would be more difficult if the address were an open field.
  • Some data are easier to find and pre-popular (eg full list of Brazilian cities), others are more difficult or more expensive - it may be best for the user to enter them on demand (but still allowing autocompletion).
  • If a field is opened, it is more subject to duplication (eg "Av Foo", "Foo Av", "Foo Avenue", "A. Foo", "Foo"); but duplication is not necessarily a problem ...
  • It's harder to do aggregation in a denormalized field (eg, if I want statistics by state, but I've grouped cidade_estado_pais into a single field, I'll have problems).
asked by anonymous 19.12.2013 / 01:04

4 answers

22

In my opinion, the best address database in Brazil is e-DNE, which has more than 900 thousand records and, in my opinion, works very fast. Follow the diagram on their bank.

WhenIneededtousethisdatabase,Ifoundittobeveryeasyandquick,Ibelieveitwillserveyourpurposesandofcoursethispostofficeisverycomplexbecauseitcoversmanythingsbutyoucanreduceittoyourpurposes.

Here you can download a zip with the model of the bank and a doc file which explains very well how it works. So I do not think you need to reinvent the wheel, let's mirror the best ...

Now in the duplicate address issue and duplicate data in general, the best way to solve this is a very smart algorithm that will compare the data provided and cross it with information to try to unify the information.

I recommend reading this Link , and good luck after all is not a simple task, but achieving the expected results is very rewarding .

    
19.12.2013 / 20:42
14

In our case, the way the data is organized is directly linked to the use of the data to facilitate its filling, gathering and analysis, ie:

  • If we are going to carry out researches or elaborate statistics on certain information, this information must be separated:

    Statistics on the person's location requires the localidade field to be in a column just for him.

    Example: Coimbra

  • If the field is not relevant to searches or statistics, the field may be along with other information:

    The case of the streets where the person resides, which may have short or long names but are not subject to analysis, can be in one column:

    Example: Av. da Liberdade or Avenida da Liberdade is the same. Resumed to text that separated in two columns, one for the type of street and another one for the name of the street will only generate confusion in the hour to fill and / or to collect this information.

  • Having everything separated if there is no advantage is unnecessary. Having everything together and then needing to analyze the data separately also makes no sense.

    What we're trying to do is make sure that data is more common and easier to manage separately. Then we cross this result with the data that we already know will be subject to analysis, taking into account those that may be subject to analysis in the future.

    With the result, we have a good idea of how we should keep all the information.

    On the other hand, we also have to consider the use of secondary tables, as is the case in the countries. It does not make sense for a user to have to write the name of their country if there are X countries around the world and this number is rarely changed, it makes sense to have a secondary table with the countries and the user just has to select his. p>     

    19.12.2013 / 02:19
    9

    Working on a gigantic project, I realized that it is not possible to get away from this paradigm, leaving each table separated by street, neighborhood, city, etc. Something important is always to maintain indexes in all the tables and optimize your queries, today we also make a filter, so that the user is more specific in the search if it brings many results, my development team is remodeling the system from scratch starting with BD, priori this is the experience I had, if we get in this part of the modeling of the bank, and something is modified I will be happy to come in this question and demonstrate to have something more concrete for you, I hope even if little contributed. .

        
    19.12.2013 / 01:54
    8
      

    I would like to know, from those who already have experience dealing with a large number of addresses, which practices would be recommended: leave everything normalized, use an open text field, condense some tables into one (eg, city_stat) and leave others separated, etc.

    In the company where I work, I am responsible for a service that consolidates addresses from several other systems. Since you have quoted the question of experience much of what I will say here is my experience / opinion.

    »Normalize or not?

    Normalize. Unless what you are doing is extremely simpleton, I recommend normalizing the data. Consistency of data will not only free you from future problems (for example, it is harder to insert an invalid address) but will also prevent cracking of this data (a possible report or integration with another system that has these standardized data).

    »Add tables?

    I see no need. The only possibility of adding tables that I see is in the example you mentioned (city-state-country). If you correctly modeled a join or search it will not generate a significant overhead .

    »Type of place?

    There are systems that dedicate a column of its own to the type of street address (avenue, street, mall, etc.). I do not think it's important to do it unless it's really necessary.

    Consider, for example, how to extract this information provided by the user. Or you make a select drop down list ) and force the user to select a correct type of site (which is not very functional / nice) (which can be tricky to do properly or leave the flow locked).

    »How to model anyway?

    Here is a practical example of the service I mentioned. The actual base has hundreds of thousands of addresses.

    Table Endereco :

    • CEP : PK of the table. Field with exactly 8 characters. Always without a mask.
    • Logradouro : Including the type of the street and the address number.
    • Bairro
    • Localidade : City. Have index.
    • UF : Field with exactly 2 characters. It is an FK for the federative unit table.
    • Data : Address registration / update date.

    Table Unidade Federativa

    • Sigla : PK of the table. Field with exactly 2 characters. State Abbreviation.
    • Nome : Full state name.

    »Initial charges

    I recommend only for federative unit and city. Federative unity changes are rare. City it is more common to have changes but nothing that can not be maintained. Any more localized field, such as Neighborhood , is already difficult to maintain a load.

    Obs. : I mean data that is not entered by users. Given these static maintained by the system itself (often manually by the developers themselves). So the recommendation to do only with city and federative unit .

    »Search the column? Index!

    Remember to create index in the columns you search for in them (eg neighborhood autocomplete).

    »Each case is a case

    The recommendations I leave are for relational database and addresses in Brazil. Remember to adapt and put your own knowledge and criticism when modeling your system. After all, each case is a case .

        
    19.12.2013 / 22:28