How to generate 200,000 primes as fast as possible in Python?

34
  

Attention: I'm not looking for pieces of code ready. I just want someone to help me think of some way to generate these 200,000 primes as efficiently as possible in Python.

I'm solving the # 61 CodeAbbey issue, but I can not think of how to generate the cousins that the exercise asks for.

The statement basically considers that there is a list that contains all the prime numbers of the world, beginning with 2. Based on this list, the statement gives a series of indexes (indexes of this list), and the program has to return the prime that occupies this index in this imaginary list.

Ex: In list [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, ...], index 1 is 3; the 2 is 5, ...

The statement ensures that the series of indexes that it will provide at random does not have any index greater than 200,000 (so 200,000 prime cousins at most)

Example of input and output:

input data:
7 1 199999 4    # são os índices da list imaginária

answer:
19 3 2750131 11    # são os primos dessa list, correspondentes aos índices acima

At the end of the statement, CodeAbbey warns that there is a way to generate the result by +/- 1 second. I could leave my code the way it is now, wait two days and give the answer (hahaha), but that would be cheating and I would not have learned anything from exercise.

I left my program running for 10 minutes and it did not work. Maybe a second is a very short time, but it sure has a way to solve in, um, less than 20 seconds?

To try to solve, I'm deploying the Sieve of Eratosthene (or Sieve of Eratosthenes )

Follow my current code:

from math import sqrt

def primos(n, maxIndex):
    maiorCheck = int(sqrt(n))
    lista = []
    i, x, qtdPrimos = 0, 0, 0
    for m in range(2, n+1):
        lista.append(m)
    while x <= maiorCheck:
        x = lista[i]
        i += 1
        for numero in lista:
            if numero != x:
                if numero % x == 0:
                    lista[lista.index(numero)] = 0
        while 0 in lista:
            del lista[lista.index(0)]
        if len(lista) == maxIndex:
            x = maiorCheck + 1
    return lista

qtdPrimos = int(input('Entre com a quantidade de primos a serem impressos: ').strip())

indexes = input('Entre com os índices para os quais serão retornados primos: ').strip().split()
result = []

for n in range(len(indexes)):
    indexes[n] = int(indexes[n])

arrayPrimos = primos(2750131, max(indexes))

for m in range(len(indexes)):
    result.append(str(arrayPrimos[indexes[m]-1]))

print('\n--- R E S U L T A D O ---\n'+' '.join(result))
    
asked by anonymous 23.08.2017 / 19:15

5 answers

32

With the Sieve of Eratosthenes I achieved very satisfactory results. Remember that as the sieve returns the list of all prime numbers smaller than the input value, we will have to enter the highest prime value that the program will accept. The statement says that it is guaranteed that a number will not be requested after 200,000 °, so we must first know what this value is: 2,750,159 .

A fairly simple implementation of the sieve is:

def sieve_of_eratosthene(N):

    # Cria-se uma lista referente a todos os inteiros entre 0 e N:
    A = [True] * (N+1)

    # Define os números 0 e 1 como não primos:
    A[0] = A[1] = False

    # Percorra a lista até encontrar o primeiro número primo:
    for value, prime in enumerate(A):

        # O número é primo?
        if prime:

            # Retorna o número primo:
            yield value

            # Remova da lista todos os múltiplos do número enontrado:
            for i in range(value**2, N+1, value):
                A[i] = False

So, we can get all the first 200,000 prime numbers by doing:

primes = list(sieve_of_eratosthene(2750159))

And generate the output as follows:

print("Saída:", [primes[i] for i in [7, 1, 199999, 4]]) # Saída: [19, 3, 2750159, 11]

Being [7, 1, 199999, 4] the program entry.

  

Note: The 199,999 prime number will be 2,750,131, as expected, only if the number 2 occupies the index 1. Considering that the implementation sets the index 0 to the value 2, the value expected on exit would be referring to the 199,998 ° index.

Using the timeit module for time measurement, I get an average time of 100 executions of 0.69539769177063s .

  

See working at Ideone .

A brief explanation of the code

The first line of the function, A = [True] * (N+1) creates a list of N+1 elements, all set to True , initially indicating that all values between 0 and N are prime numbers. Soon after, values 0 and 1 are defined as False , indicating that these are not prime numbers. And with a loop of repetition, all other values are traversed, and whenever it finds a prime value, it returns it and removes from the list, defining as False , all numbers that are multiples of that found prime. For a better view, the A list would look something like:

A = [False, False, True, True, False, True, False, ...]

Indicating that 0 is not prime, 1 is not prime, 2 is prime, 3 is prime, 4 is not prime, 5 is prime, 6 is not prime, and so on. What the enumerate function does is to return a pair of values where the first represents the index in the list and the second the value itself. So, doing enumerate(A) would return something similar to:

[(0, False), (1, False), (2, True), (3, True), (4, False), (5, True), (6, False), ...]

And that's why in for there are two values. The first value returned from enumerate is assigned to value and second to prime , so when prime is true, we know value will be a prime number.

for value, prime in enumerate(A):
    ...

While yield plays the role of return , however, for a generator . I confess that I ended up complicating more than necessary in this code using a generator, because since the generator should be converted to a list, I could generate it directly. The code looks like this:

def sieve_of_eratosthene(N):

    # Lista de números primos:
    numbers = []

    # Cria-se uma lista referente a todos os inteiros entre 0 e N:
    A = [True] * (N+1)

    # Define os números 0 e 1 como não primos:
    A[0] = A[1] = False

    # Percorra a lista até encontrar o primeiro número primo:
    for value, prime in enumerate(A):

        # O número é primo?
        if prime:

            # Retorna o número primo:
            numbers.append(value)

            # Remova da lista todos os múltiplos do número enontrado:
            for i in range(value**2, N+1, value):
                A[i] = False

    return numbers
  

See working at Ideone .

Note that the main difference is that where before I used yield , now I used append of a list.

An extra implementation

As discussed by LINQ in its response, having to know the n -th prime value in advance can be considered a limitation of the solution. A practical alternative is to apply the mathematical concept presented at the end of this answer to calculate a prime value close to the desired one. Knowing that the n -th prime number, Pn, is less than n ln(n) + n ln(ln(n)) , if we calculate all the prime values up to this value we are sure that we will have calculated the value of Pn. Something like:

def sieve_of_eratosthene(N):

    N = floor(N*log(N) + N*(log(log(N))))

    # Lista de números primos:
    numbers = []

    # Cria-se uma lista referente a todos os inteiros entre 0 e N:
    A = [True] * (N+1)

    # Define os números 0 e 1 como não primos:
    A[0] = A[1] = False

    # Percorra a lista até encontrar o primeiro número primo:
    for value, prime in enumerate(A):

        # O número é primo?
        if prime:

            # Retorna o número primo:
            numbers.append(value)

            # Remova da lista todos os múltiplos do número enontrado:
            for i in range(value**2, N+1, value):
                A[i] = False

    return numbers
  

See working at Ideone .

With the addition of the first line, we can now call the sieve_of_eratosthene(2e5) function to get the first 200,000 prime numbers. In fact, you get even more, so the runtime can increase.

Possible alternative

It is mathematically proved, if I understand correctly, that the n -th prime number, being n greater than or equal to 6, will belong to the set of numbers defined by: / p>

For example, for n = 199999 , the following set is obtained:

2741586 < P_n < 2941585

That contains the expected value 2750159; however, this will not be the only prime number in this range, so the challenge following this logic would be to correctly identify the value within the range.

    
23.08.2017 / 22:54
14

My contribution is a somewhat naive implementation. It takes (on my machine) about 71 seconds to generate 200,000 cousins. Nevertheless, it is a basic implementation, it does not use anything from third parties and it is very easy to understand.

In this implementation, it is not necessary to know in advance what the nth prime number is, that is, you can create a list with any number of prime numbers.

Note that this is not an adaptation of your code, it was created from scratch. The algorithm consists of searching "raw way" by checking all odd numbers.

The algorithm is based on the assertion that a number is composed (non-prime) if, if only, there is a prime divisor equal to its square root.

from math import sqrt, ceil
import time

def checkPrime(num):
    if num % 2 == 0: return False
    i = 3
    while i <= ceil(sqrt(num)):
        if num % i == 0: 
            return False
        i += 2
    return True

def primes(q):
    primelist = [2]
    number = 3
    while len(primelist) < q:
        if(checkPrime(number)):
            primelist.append(number)        
        number += 2    
    return primelist

Version 2

This takes 24 seconds (on the same machine) to generate the 200,000 cousins. The difference between this and the other is that, in checking to see if a number is prime or not, only known prime numbers are used.

That's because, as I said above, a number is composed (non-prime) when it has some prime divisor less than or equal to its square root.

def checkPrime(num, baseList):    
    for p in baseList:
        if(p > ceil(sqrt(num))): break

        if num % p == 0:
            return False
    return True

def primes(q):
    primelist, number = [2], 3

    while len(primelist) < q:
        if checkPrime(number, primelist):
            primelist.append(number)
        number += 2
    return primelist

The usage would look like this:

def main():
    lista = primes(200000)
    print("Saída: ", [lista[i] for i in [199999, 1, 7]])
  

Output : [2750159, 3, 19]

    
23.08.2017 / 21:26
6

Code:

p = []

def gerar_primos():

    limite = 2750159 + 1

    primos = []
    nao_primo = set()

    for n in range( 2, limite ):

        if n in nao_primo:
            continue

        for f in range( n * 2, limite, n ):
            nao_primo.add( f )

        primos.append( n )

    return primos;


def primo(idx):

    global p

    if not p:
        p = gerar_primos();

    return p[ idx - 1 ]


print primo(7)
print primo(1)
print primo(199999)
print primo(4)

Output:

$ time python primos.py 
17
2
2750131
7

real    0m1.963s
user    0m1.891s
sys 0m0.072s
    
23.08.2017 / 21:31
4

I arrived late in response, but I will contribute an implementation that is very interesting and useful, because it DOES NOT NEED LIMIT .

It uses the algorithm of the Erastóthenes sieve, but in an inverted, mobile way, storing only 1 future multiple of each combination of cousins and releasing memory as it advances, so that it is not necessary to define in advance which the end of the calculation! And so you do not have to define a huge list in memory to be marked.

def eratosthenes():
    D = {}  # dicionario para armazenar os futuros primos
    q = 2   # primeiro inteiro a testar
    while True:
        if q not in D:
            yield q       # nao esta marcado entao eh primo
            D[q*q] = [q]  # armazena somente o primeiro multiplo 
                          # que ainda nao esta armazenado
        else:
            for p in D[q]:  # todos os nao-primos armazenados aqui precisam andar
                D.setdefault(p+q,[]).append(p)
            del D[q]        # libera memoria do que ja passou
        q += 1

So you can call this function and stop when you want, for example

for p in eratosthenes():
    if p > 200000:
        break
    print(p)
    
25.09.2018 / 03:48
1

I know a little Java and I'm learning Python. I made this algorithm and it calculates all the odd numbers extremely quickly. On my notebook it calculates and prints all prime numbers from 0 to 1,000,000 in less than 3 seconds. And 0 to 4,000,000 does in about 17 seconds.

import time

num2 = int(input("Informe até qual número procurar: "))
inicio = time.time()
print("Esses são os número primos entre 0 e %i"%num2)
# lista com o primeiro número primo.
primos = [2,]
print(primos[0])

# só testa números impares
for x in range(3,num2+1,2):
num_divisoes = 0
    # pega o primeiro número e tenta dividi-lo pelos número primos menores que ele.
    for y in primos:
        # Quando o cociente for menor que o divisor para de testar números primos maiores.
        if x//y < y:
            break

        #se der para dividir o número por algum número primo já descoberto então ele não é primo.
        if x % y == 0:
            #marca que ele não é primo.
            num_divisoes = 1
            break

    # se não tiver nenhuma divisão adiciona o número na lista de primos.
    if num_divisoes == 0:
        primos.append(x)
        print(x)
        num_divisoes = 0

print("Existe %i números primos entre 0 e %i"%(len(primos),num2))
fim = time.time()
print("Executado em %.3f segundos"%(fim-inicio))

What made my code extremely fast was this part:

# Quando o cociente for menor que o divisor para de testar números primos maiores.
if x//y < y:
   break

Without this excerpt the prime number 999.983 would do about 78,497 tests to see if it is prime. Like this code it does only 169 tests.

    
11.10.2018 / 03:11