Program to simulate Birthday Paradox

12
In probability theory, the birthday paradox says that given a group of 23 (or more) randomly chosen people, the chance that two people will have the same birthday is over 50%. For 57 or more people, the probability is greater than 99%, however, it can not be exactly 100% unless you have at least 367 people.

Write a program that receives a number n of people and an x number of repetitions and draws lists with birthday dates, and checks if there is any matching date. For each loop one must re-sort the lists, and for each list where there are coincident cases one should add 1 to the number of favorable cases. After rotating the x loops of the percentage of times that there were coincident anniversaries.

Detail: Use list compression to generate birthday dates

Solution Attempt:

import random

datas =[x for x in range(1,366)]
#print(datas)


n = int(input("Digite o número de pessoas: "))
x = int(input("Digite o número de repetições: "))
datas_sorteadas = []
favoraveis = 0
for i in range(x):
    for i in range(n):
        datas_sorteadas.append(random.choice(datas))
        print(datas_sorteadas)

    for data in datas_sorteadas:
        if datas_sorteadas.count(data)>=2:
            favoraveis +=1
    datas_sorteadas = []
    datas_sorteadas.append(random.choice(datas))
    print(datas_sorteadas)

print("Casos Favoráveis: ", favoraveis)

print("n*x",n*x)
print("Percentual: ", (favoraveis/(n*x)))
#print(datas_sorteadas)

The program is running without errors but I suspect it is not correct. The results do not fit the theory. Any ideas on how to fix it?

    
asked by anonymous 04.04.2018 / 19:15

2 answers

18

From the description of the problem, I think you're doing some things that are not easy, it's simpler than it sounds.

What is the paradox / birthday problem?

To simulate this we can do this:

  • Receive as input the number of people ( num_p ) and number of tests / repetitions ( num_loops ) to do;
  • In each test we generate a list with random integer% (number of people), each one can have a value between 1 and 365 to simulate each person's birthday. Ex: in a test with 6 people we can get num_p ;
  • Check if there is any value in the generated list that occurs more than once in the same list, in this case I use [4, 144, 233, 98, 144, 43] and any() . Ex: count() (birthdays of 6 people), in this case 144 repeats itself, that is, two people have the same birthday day;
  • If the above point (3) is found to be because we have at least 1 repeating value, we increment our variable [4, 144, 233, 98, 144, 43] , and at the end of the tests calculate the percentage of favoraveis in favoraveis of tests).
  • Code:

    import random
    
    num_p = int(input("Digite o número de pessoas: "))
    num_loops = int(input("Digite o número de repetições: ")) # num de testes
    favoraveis = 0
    for _ in range(num_loops):
        ani_dates = [random.randint(1, 366) for _ in range(num_p)] # sortear nova lista de datas (dias) de aniversario
        if(any(ani_dates.count(i) > 1 for i in ani_dates)): # verificar se existe a mesma data (valor) mais do que uma vez na lista
            favoraveis += 1
    
    probs_perc = (favoraveis/num_loops)*100
    print('Em {} pessoas e {} testes deram-se {} vezes em que pelo menos duas pessoas fazem anos no mesmo dia, percentagem: {}%'.format(num_p, num_loops, favoraveis, probs_perc))
    # Em 23 pessoas e 1000 testes deram-se 504 vezes em que pelo menos duas pessoas fazem anos no mesmo dia, percentagem: 50.4%
    # Em 57 pessoas e 1000 testes deram-se 993 vezes em que pelo menos duas pessoas fazem anos no mesmo dia, percentagem: 99.3%
    

    DEMONSTRATION

    Issue

    After reviewing this response and performing some tests I noticed that we achieved better performance by using num_loops and its characteristic of not storing duplicate values, so we can simply check if the length of set() is different (smaller) than the number of people ( set ), if it is because there were duplicate values, and we had at least two people doing the same day ( num_p ):

    import random
    
    num_p = int(input("Digite o número de pessoas: "))
    num_loops = int(input("Digite o número de repetições: ")) # num de testes
    favoraveis = 0
    for _ in range(num_loops):
        ani_dates = {random.randint(1, 366) for _ in range(num_p)} # sortear novo set de datas (dias) de aniversario
        if(len(ani_dates) != num_p): # se o comprimento do set for diferente do num de pessoas
            favoraveis += 1
    
    probs_perc = (favoraveis/num_loops)*100
    print('Em {} pessoas e {} testes deram-se {} vezes em que pelo menos duas pessoas fazem anos no mesmo dia, percentagem: {}%'.format(num_p, num_loops, favoraveis, probs_perc))
    # Em 23 pessoas e 1000 testes deram-se 506 vezes em que pelo menos duas pessoas fazem anos no mesmo dia, percentagem: 50.6%
    

    STATEMENT

    The equation of exponential increase in probability can be view here , and I by testing this same equation in python I also did the chart.

    It does not have to be with the question but here I leave the code to make this chart:

    import matplotlib.pyplot as plt
    
    def birthday_probs(x):
        p = (1.0/365)**x
        for i in range((366-x),366):
            p *= i
        return 1-p
    
    plt.plot([birthday_probs(i)*100 for i in range(366)])
    plt.xlabel("Num de pessoas")
    plt.ylabel("Probabilidades de partilha de dia de aniversário")
    plt.ylim(ymin=0)
    plt.xlim(xmin=0, xmax=80)
    plt.show()
    

        
    04.04.2018 / 21:13
    4

    This link can help you: Wikipedia: Paradox of the Birthday

    Basically the probabilities code is:

    def birthday(x):
        p = (1.0/365)**x
        for i in range((366-x),366):
            p *= i
        return 1-p
    
        
    04.04.2018 / 21:23