Refactoring random datetime generation (Python)

Question

1

Suggestions for improving this code are welcome:

import random
import datetime

def gen_timestamp(min_year=1915, max_year=1996):
    # generates a datetime in the format yyyy-mm-dd hh:mm:ss.000000
    year = random.randint(min_year, max_year)
    month = random.randint(11, 12)
    day = random.randint(1, 28)
    hour = random.randint(1, 23)
    minute = random.randint(1, 59)
    second = random.randint(1, 59)
    microsecond = random.randint(1, 999999)
    date = datetime.datetime(
        year, month, day, hour, minute, second, microsecond).isoformat(" ")
    return date

Pull requests are welcome.

random python

asked by anonymous 25.10.2015 / 02:51

2 answers

1

import datetime
import random

def random_datetime(start, end):
    # Both bounds must be naive datetime objects, with start <= end.
    assert isinstance(start, datetime.datetime)
    assert isinstance(end, datetime.datetime)
    # Draw a whole number of seconds inside the span and add it to start.
    # int() is needed because random.randint rejects floats on Python 3.12+.
    span = int((end - start).total_seconds())
    return start + datetime.timedelta(seconds=random.randint(0, span))

26.10.2015 / 12:59

score 3 · Accepted Answer

I don't know the purpose of your code, but in principle it looks fine, except that it never draws days 29, 30 and 31. If the date (which I see is "naive", i.e. it carries no timezone information) represents a time in UTC, then it also never draws leap seconds, although Python's datetime does not support them anyway.
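To illustrate the point about leap seconds, here is a minimal check. 2015-06-30 23:59:60 UTC was a real leap second, yet the datetime constructor only accepts seconds from 0 to 59. The variable name leap_second_accepted is just for this illustration.

```python
from datetime import datetime

try:
    # second=60 is outside the range datetime accepts (0..59)
    datetime(2015, 6, 30, 23, 59, 60)
    leap_second_accepted = True
except ValueError:
    leap_second_accepted = False

print(leap_second_accepted)
```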

Including these missing values brings an additional complication: under a uniform distribution, a random date is slightly more likely to fall in a 31-day month than in a 30-day month (likewise for 29 versus 28 days), and more likely to fall in a leap year than in a common year. So if the goal is a uniform distribution, drawing field by field would become overly laborious, long and error-prone.
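As a rough sketch of what a field-by-field draw covering the full ranges might look like, assuming calendar.monthrange is used to find the length of the drawn month (gen_timestamp_by_field is a hypothetical name, not part of the question):

```python
import calendar
import random
from datetime import datetime

def gen_timestamp_by_field(min_year=1915, max_year=1996):
    # Hypothetical helper: draw each field separately, covering full ranges.
    year = random.randint(min_year, max_year)
    month = random.randint(1, 12)
    # monthrange returns (weekday of the first day, number of days in month).
    days_in_month = calendar.monthrange(year, month)[1]
    day = random.randint(1, days_in_month)
    return datetime(year, month, day,
                    random.randint(0, 23), random.randint(0, 59),
                    random.randint(0, 59), random.randint(0, 999999))
```

This reaches every calendar date, but it is still not uniform: each month is chosen with probability 1/12, so a particular day in February comes up more often than a particular day in January.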

An alternative is to draw a delta: take (max_year+1)-01-01 00:00:00.000000 minus min_year-01-01 00:00:00.000000 (i.e. the total seconds of a timedelta, as a float), draw a number of seconds between zero and this delta, and convert the result back to a date:

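from datetime import datetime, timedelta
from random import random

def gen_timestamp(min_year=1915, max_year=1996):
    min_date = datetime(min_year, 1, 1)
    max_date = datetime(max_year + 1, 1, 1)
    # random() is in [0, 1), so delta covers the whole interval except max_date
    delta = random() * (max_date - min_date).total_seconds()
    return (min_date + timedelta(seconds=delta)).isoformat(" ")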

This way any date in the interval can be drawn, and the draw is uniform. For example, here it is drawing February 29:

>>> i, d = 0, gen_timestamp()
>>> while d[5:10] != '02-29' and i < 100000:
...   i, d = i+1, gen_timestamp()
...
>>> i,d
(770, '1960-02-29 21:28:40.688135')

Note: according to the documentation, if the interval between the latest and earliest dates is too large (more than 270 years on most platforms), total_seconds() loses microsecond precision.
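If that precision matters, one possible workaround is to stay in integers: dividing one timedelta by another with // yields an exact int, so the offset can be drawn as a whole number of microseconds. This is only a sketch, and gen_timestamp_exact is a hypothetical name for a variant of gen_timestamp above.

```python
import random
from datetime import datetime, timedelta

def gen_timestamp_exact(min_year=1915, max_year=1996):
    # Hypothetical variant of gen_timestamp that avoids float seconds.
    min_date = datetime(min_year, 1, 1)
    max_date = datetime(max_year + 1, 1, 1)
    # timedelta // timedelta gives an exact int (here: total microseconds).
    total_us = (max_date - min_date) // timedelta(microseconds=1)
    # randrange excludes the upper bound, so max_date itself is never drawn.
    offset = timedelta(microseconds=random.randrange(total_us))
    return (min_date + offset).isoformat(" ")
```

As with the float version, isoformat omits the fractional part when the drawn microsecond happens to be 0.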