Refactoring: When is a method "too big"?

6

I have a project that "retrieves" certain information from an HTML page, parses with the help of Beautiful Soup, and returns values in the form of dictionary , to that in another method I generate a JSON object.

The problem is that because of the peculiarity of the page, very poorly written and with excessive tags, besides problems with the organization of the information itself, I need to handle everything with a lot of use of loops and conditionals. The current method has 91 rows.

I can not logically separate these blocks of code into other methods, everything seems to me "part of the same operation". It is even more difficult because it does not seem to be useful in another situation either.

Does anyone have any suggestions on when and how can I split my code?

As an example, a method I used to "play", which shares the same problem (to make it less strange, I explain: it takes information from a UK menu page of my university):

def parse_cardapios(self):

    """Interpreta as tabelas de cardápio no site do restaurante"""

    pag = urllib2.urlopen(self.url + '/' + self.campus).read();
    soup = BeautifulSoup(pag)
    resultado = []

    # Percorre as refeições e suas respectivas tabelas de cardápio

    nomes_ref = soup.find('section', id='post-content').find_all('h2')
    tabelas_card = soup.find('section', id='post-content').find_all('table')

    for ref, tab in zip(nomes_ref, tabelas_card):

        refeicao = tratar_chave(ref)

        # Percorre todos os dias disponíveis

        nome_colunas = tab.find_all('th')
        linhas = tab.find_all('tr', class_=True)

        for lin in linhas: # Cada linha é um dia diferente

            dia_repetido = False # Para controlar a repetição

            obj_refeicoes = {refeicao: {}}
            obj_temp = {'data': '', 'refeicoes': {}}

            # Percorre cada dado

            celulas = lin.find_all('td')

            for meta, dado in zip(nome_colunas, celulas):

                meta = tratar_chave(meta)
                dado = tratar_valor(dado)

                if meta == 'data':

                    dado = dado.translate(None, 'aábcçdefghijklmnopqrstuvzwxyz- ,')

                    if not resultado:
                        obj_temp['data'] = dado

                    else:
                        for r in resultado:
                            if r['data'] == dado:
                                dia_repetido = True
                                r['refeicoes'].update(obj_refeicoes)
                                break

                        else:
                            obj_temp['data'] = dado

                else:
                    obj_refeicoes[refeicao].update({meta: dado})
                    obj_temp['refeicoes'].update(obj_refeicoes)

            if not dia_repetido:
                resultado.append(obj_temp)

    # Junta as refeições vegetarianas no mesmo cardápio que as outras

    for r in resultado:
        for s in r['refeicoes'].keys():
            if '-vegetariano' in s:
                veg = {}
                for t in r['refeicoes'][s].keys():
                    if not '-vegetariano' in t:
                        veg.update({t + '-vegetariano': r['refeicoes'][s][t]})

                    else:
                        veg.update({t: r['refeicoes'][s][t]})

                sem_sufixo = s.replace('-vegetariano', '')
                r['refeicoes'][sem_sufixo].update(veg)

        for u in r['refeicoes'].keys():
            if '-vegetariano' in u:
                del r['refeicoes'][u]

    return dict({'campus': self.campus, 'dia-cardapio': resultado})
    
asked by anonymous 10.02.2014 / 01:58

5 answers

8

This question is fairly relative. There are authors who recommend 20 lines, but tolerate up to 100. Others say that some methods can consume up to 100 ~ 200 lines and yet they are not as easily understandable as those with the least lines.

In practice, everyone says the same thing: the smaller the number of rows per method, the better, but not worth doing at a level where a part of the code can not be decreased.

Most of my methods are between 30 ~ 80 lines, but a very few come close to 200 or so, and I do not refract if part of that method can not be reused in other methods.

Update (after adding code in question)

I removed white space and documentation and its code gave 54 lines, of operations that make sense together. Definitely in your example I do not see strong reasons for splitting into more methods if this will not allow split codes to be reused in other areas.

Remember when authors suggest a maximum line limit, they ignore blank lines and comments, and yet lines are separated by logical lines . If the guy nest multiple ifs on a single line this would count as multiple rows.

    
10.02.2014 / 02:15
3

I did not read the method, but class methods, in the context of object-oriented programming, should always offer a single functionality. In this way, your application becomes more decoupled, the methods become more independent and the components are more likely to be reused.

So, I do not think there is an "ideal limit" for class sizes and methods. Ideally, your development methodology follows the "first five principles" -also known as SOLID- which are the five basic principles for object-oriented design, namely:

  • S - Single responsibility principle - a class should have only one responsibility
  • O - Open / closed principle - "software entities ... should be open for extension, but closed for modification".
  • L - Liskov substitution principle - "objects in a program should be replaceable with instances of their subtypes without altering the correctness of that program". See also design by contract.
  • I - Interface segregation principle - "many client-specific interfaces are better than one general-purpose interface."
  • D - Dependency inversion principle - one should "Depend upon Abstractions. Do not depend upon concretions. "

Source: link

    
10.02.2014 / 18:39
3

You have at least two strong reasons to break a method into other smaller methods: reusability and legibility. In your case, you mention that you will not reuse this code elsewhere. This argument may also change as you break into smaller blocks, reusing certain algorithms.

Especially in your case, readability could be greatly improved. Within each for you virtually have a comment saying what it does. Comments are too fragile for this. If you create an accessory method for each for , for example, you would eliminate comments by making the code self-descriptive.

For example, your main method might be:

def parse_cardapios(self):

    nomes_ref, tabelas_card = carrega_refeicoes()

    return percorre_refeicoes(nomes_ref, tabelas_card)

Far more readable, in my opinion.

    
10.02.2014 / 20:24
2

A general rule is that if a refactoring can make the code simpler, less complex, easier to understand and maintain, it should be done. The ideal is to create good abstractions, cohesive, with low coupling. Within this context it is necessary to evaluate whether a method should be refactored or not. Sometimes the solution is not trivial. Methods that have long stretches involving conditionals, for example, require a paradigm shift for refactoring (eg replacement by strategies that use polymorphism, transferring the responsibility of the blocks to individual objects) There is also the possibility of analyzing the methods due to their complexity cyclomatic. Tools like PMD, Checkstyle, Sonar, do this and are useful for discovering methods that should be reviewed.

(This response was somewhat subjective due to the question being asked too.) Please send snippets of your code by identifying your questions more objectively so that we can discuss more practical solutions.)

    
10.02.2014 / 03:07
2

This question is really very relative, it depends a lot on the opinion of each one. Home In the code you posted above I would separate by steps, you have already separated yourself, using comments, that is, would make a function to go through meals, one to go through the days, etc.
It's really hard to say when a method is too big, even those line number definitions may not be as correct, I think it's just the experience that will help you judge it right.

    
10.02.2014 / 17:29