UPDATE : Anderson Carlos Woss has a great answer on Sieve, answered one month before mine: link
I was a member of the Programming Marathon team at my college. We were the Collectors of Balloons , from the UECE. And cousins were almost constant during Marathon issues, so we needed something fast, practical and efficient, very efficient.
The algorithm we used to calculate prime numbers was Riddle of Eratosthenes . Why was he efficient? Because it calculated all cousins less than or equal to n
in a time of o(n log log n)
; and only needed to do this once, we could store this calculated value at the beginning of the program and use it later. I also need to point out that the Script requires o(n)
memory, so it can only be used for small numbers (like 700 million in C, maybe 100 million in Python, for example).
How does this algorithm work? Well, it works by marking the positions not known raw / compound numbers of a vector based on the cousins it finds. At the end of the Sieve, all remaining unmarked positions are numbers known to be prime.
Using the resulting prime vector to compute the largest prime less than n
Suppose I have a vector called numero_eh_primo
of booleans. It must have at least n + 1
positions to make the Sieve work. Obviously the Script has already run and we have this vector filled in as follows:
-
numero_eh_primo[i]
is true if i
is prime
-
numero_eh_primo[i]
is false if i
is compiled
To find the largest cousin that is less than or equal to n
, simply run the vector from n
, decrementing a unit if the position is false, and return the first index found if the value is true.
If I have a lot of memory, I can apply a strategy of memo to make future queries the same number n
of constant order. For this, I need an auxiliary vector called maior_primo_menorque
, which will return the largest prime greater than or equal to the passed index. For example: maior_primo_menorque[100]
returns 97 when it is populated.
How do I know if the value is not filled in? Simple if it is null. Being null, how to fill in? Well, let's go to the memo algorithm:
def get_maior_primo(n):
if maior_primo_menorque[n]:
return maior_primo_menorque[n]
else:
if numero_eh_primo[n]:
maior_primo_menorque[n] = n
return n
else:
maior_primo_menorque[n] = get_maior_primo(n)
return maior_primo_menorque[n]
The Riddle of Eratosthenes
So far I just said I used the Sieve, but at no point did I tell you how this Sieve works. Come on.
The sieve starts with a vector with n + 1
positions, all set as truth a priori (ie, prime cousins are possible). So we neglected the numbers 0 and 1, marking them as false. After that, we traverse the vector sequentially, until we find an index that is set as true. At this point, we find a true cousin, then we must mark all his manifolds as false; after marking the multiples as false, we return to the vector sequentially.
def crivo(n, vetor_crivo_inicializado_como_true):
vetor = vetor_crivo_inicializado_como_true
vetor[0] = False
vetor[1] = False
for i in range(2, n + 1):
if vetor[i]:
# achou um primo, vamos marcar todos os múltiplos relevantes
marcar_multiplos(n, vetor, i)
def marcar_multiplos(n, vetor_crivo, primo):
for i in range(primo*primo, n + 1, primo):
vetor_crivo[i] = False
Note the optimization of starting marking multiples from the prime square: every multiple of that prime with another value has already been marked in an earlier step. For example, with the cousin found 5, I have already marked the numbers 10 (multiple of 2), 15 (multiple of 3) and 20 (multiple of 2), the first unpublished compound number being the square of 5, 25. p>
@LINQ gave me the hint of making an MCVE, I'll send one as soon as possible