Common predecessor in a commits graph

5

I bumped into an interesting problem again during an interview. Unfortunately I am also not sure that I found an optimal (or good enough) algorithm. I thought about sharing to see if anyone knows a classic solution or a more efficient variation of my implementation.

Context

One way to represent a source code repository (eg, Git and Mercurial) is through a commits graph:

Examples
  • Graph with linear commits

    C1-C2-C3-C4
    
  • One-twig graph

           C5-C6
          /
     C1-C2-C3-C4
    
  • Graph with a branch that has been merged again (some better translation for merged ?) on the main line ( C7 has two direct predecessors: C4 and C6 )

           C5-C6
          /     \
     C1-C2-C3-C4-C7
    
  • The problem

    Implement a function that finds the most recent common predecessor of any two commits of the graph. The Commits graph is represented by a String[] called commits containing all the ids of the commits sorted by date (from the most recent to the most old) and a String[][] called relatives .

    The predecessors' ids for a commit at position i can be found in parentes[i] . The last position in the relatives array is always null since it represents the predecessors of the commit root. For example, the graph of Example 3 is represented by:

    String[] commits = {"C7", "C6", "C5", "C4", "C3", "C2", "C1"};
    String[][] parentes ={{"C6","C4"}, {"C5"}, {"C2"}, {"C3"}, {"C2"}, {"C1"}, null}; 
    

    If one of the commits is the predecessor of the other, this commit is the common predecessor.

    Signature of method

    String predecessorComum(String[] commits, String[][] parentes, String commit1,
            String commit2);
    

    Example:

    The call:

     String predecessor = predecessorComum(commits, parentes, "C4", "C6");
    

    Returns C2

    Complexity

    You can write a O (n) solution for this problem.

    A first solution

    My attempt to solve this problem began with the classic LCA algorithms (see this issue of the utluiz ), unfortunately the solution in time linear were complex and not very easy to implement using predecessor list.

    I then implemented the following (in my view, not optimal) steps.

  • Map of index IDs (complexity O (n) ) so you can navigate efficiently in arrays:

    final Map<String, Integer> idParaPosicao = new HashMap<String, Integer>();
    for (int i = 0; i < commits.length; i++) {
        final String id = commitHashes[i];
        idParaPosicao .put(id, i);
    }
    
  • Recursive auxiliary method to stack all predecessors of a commit to root, including itself (complexity O (p + n) ):

    private Stack<Integer> predecessores(Stack<Integer> visitados, int id, 
            String[][] parentes, Map<String, Integer> idParaPosicao) {
        visitados.push(id);
        final String[] parentesD = parentes[id];
        if (parentesD != null) {
            for (String parente : parentesD) {
                predecessores(visitados, idParaPosicao.get(parente), parentes, hashToIndex);
            }
        }
    
        return visitados;
    }
    
  • Mount the stack for the two commits :

    final Stack<Integer> predsC1 = predecessores(new Stack<Integer>(), indiceC1, parentes, 
            idParaPosicao);
    final Stack<Integer> predsC2 = predecessores(new Stack<Integer>(), indiceC2, parentes, 
            idParaPosicao);
    
  • Unfold the predecessors from the root while they are the same. The last commit is the most recent predecessor (complexity O (n) ):

    int lca = -1;
    
    while (!predsC1.isEmpty() && !predsC2.isEmpty()) {
        final int a1 = predsC1.pop();
        final int a2 = predsC2.pop();
        if (a1 == a2) {
            lca = a1;
        } else {
            break;
        }
    }
    
  • Return the commit :

    return commits[lca]; 
    
  • Self-criticism:

  • While I think this is a solution with O(n + p) time, it has a strong constant up front, most likely better algorithms already exist.
  • Operation: I do not have mathematical proof or even trust that this algorithm works in all cases (it worked for the base case and some other variations).
  • Spatial complexity: I used a map and two stacks (besides recursion stacks). I'm sure this can be improved (in particular I believe the map should be unnecessary).
  • Recursion: The helper method is nothing more than a recursive preorder search. Since Java does not have tail optimization (I rewrote this code in Scala and the performance difference for larger trees was absurd) any larger graph will result in stack overflow . I could not implement an iterative solution.
  • I finally think that the chosen data structure (graph represented by a list of predecessors ordered from the most recent commit to the oldest) should have properties that simplify the problem. I have found, for example, several intelligent algorithms for finding LCAs in binary search trees.
  • asked by anonymous 26.05.2014 / 03:26

    1 answer

    4

    I think your solution is not so sub-optimal. The bigger question is that it is not necessary to mount two stacks of prodecessors.

    I've made a slightly more optimized workaround (in theory, as your benchmark may even be faster).

    public class GraphPredecessor {
    
        static String predecessorComum(String[] commits, String[][] parentes, String commit1,
                String commit2) {
    
            //validação
            if (commit1 == null || commit2 == null || parentes == null || commits == null
                    || commits.length != parentes.length) {
                return null;
            }
    
            //caso ótimo
            if (commit1.equals(commit2)) {
                return commit1;
            }
    
            //mapa (commit, parents[])
            Map<String, String[]> parentMap = new HashMap<String, String[]>();
            for (int i = 0; i < commits.length; i++) {
                if (parentes[i] != null) {
                    parentMap .put(commits[i], parentes[i]);
                }
            }
    
            //adds all parents of commit1 into a Set
            Set<String> commit1Parents = new HashSet<String>();
    
            //iterate over parents without recursion
            LinkedList<String> q = new LinkedList<String>();
            q.push(commit1);
            commit1Parents.add(commit1);
            while (!q.isEmpty()) {
                String s = q.pop();
                String[] parentsArray = parentMap.get(s);
                if (parentsArray != null) {
                    for (String p : parentsArray) {
                        //for each parent, if it's commit2, then return it!
                        if (p.equals(commit2)) {
                            return p;
                        } else {
                            //otherwise just push the node for the next loop
                            q.push(p);
                            //and adds to parent list
                            commit1Parents.add(p);
                        }
                    }
                }
            }
    
            //finds the first commit2 parent in commit1Parents
            q.push(commit2);
            while (!q.isEmpty()) {
    
                String[] parentsArray = parentMap.get(q.pop());
                if (parentsArray != null) {
                    for (String p : parentsArray) {
                        if (commit1Parents.contains(p)) {
                            return p;
                        }
                        q.push(p);
                    }
                }
    
            }
    
                //no luck
            return null;
    
        }
    
        public static void displayResult(String expected, String returned) {
            System.out.println("---------- TEST ----------");
            System.out.println("Exptected " + expected);
            System.out.println("Result    " + returned);
            System.out.println("Status    " + (expected.equals(returned)?"OK":"ERROR"));
        }
    
        public static void main(String[] args) {
            String[] commits = {"C7", "C6", "C5", "C4", "C3", "C2", "C1"};
            String[][] parentes ={{"C6","C4"}, {"C5"}, {"C2"}, {"C3"}, {"C2"}, {"C1"}, null};
    
            displayResult("C2", predecessorComum(commits, parentes, "C4", "C6"));
            displayResult("C2", predecessorComum(commits, parentes, "C6", "C4"));
            displayResult("C1", predecessorComum(commits, parentes, "C1", "C7"));
            displayResult("C1", predecessorComum(commits, parentes, "C7", "C1"));
            displayResult("C7", predecessorComum(commits, parentes, "C7", "C7"));
            displayResult("C6", predecessorComum(commits, parentes, "C6", "C7"));
            displayResult("C6", predecessorComum(commits, parentes, "C7", "C6"));
            displayResult("C2", predecessorComum(commits, parentes, "C2", "C7"));
            displayResult("C2", predecessorComum(commits, parentes, "C7", "C2"));
            displayResult("C2", predecessorComum(commits, parentes, "C5", "C3"));
            displayResult("C2", predecessorComum(commits, parentes, "C3", "C5"));
    
        }
    
    }
    

    This algorithm basically does the following:

  • Places all the parents of commit1 in a set.
  • Go looking at all the parents of commit2 until you find the first one in the set.
  • Exemplifying with the question data:

          C5-C6
         /     \
    C1-C2-C3-C4-C7
    

    When searching for C4 and C6 , the algorithm first sets up a set with the children of C4 . Namely:

    C1, C2, C3, C4
    

    Then it iterates over the parents of C6 :

    • C5 is in the set? No, go to the next one.
    • C2 is in the set? Yes, then C2 returns.

    Note that in this algorithm it will never be necessary to read more than N nodes (except perhaps in the last comparison), because finding a node that has already been processed will be the result of the algorithm.

        
    26.05.2014 / 22:26