Autocorrelation function in Python

3

Hello, everyone!

I have a data .txt file with two columns (in X I have the values in days of observations and in Y the Measured Flow of a sample)

I want to calculate the Period with which this data is repeated and I would like to do this using the autocorrelation function in Python, but so far I can not.

What I have done is the following:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

dado = np.loadtxt(r'.../file.txt') #busco meu arquivo no meu PC

t = dado[:,0] #esta é apenas a coluna que corresponde ao tempo
y = dado[:,1] #esta é apenas a coluna que corresponde ao fluxo medido

# Fazer uma filtragem a partir da Transformada Rápida de Fourier (Fast Fourier Transformed - FFT)

fft_of_signal_with_noise = np.fft.fft(y)
# Normalizar o tempo
N = len(t)
f = np.fft.fftfreq(len(fft_of_signal_with_noise),1600)
frequency_of_signal  = 1.0 


def bandpass_filter(x, freq, frequency_of_signal=frequency_of_signal, band = 0.05):
    if (frequency_of_signal - band) < abs(freq) < (frequency_of_signal + band):
        return x
    else:
        return 0

F_filtered = np.asanyarray([bandpass_filter(x,freq) for x,freq in zip(fft_of_signal_with_noise, f)]);

filtered_signal = np.fft.ifft(F_filtered);

But I have not yet succeeded in finding the correct value of the Period of this sample, for example.

NOTE: This sample has about 1600 days, but the file has about 64,000 points in X column!

Can someone help me find the value of the Period of time?

    
asked by anonymous 24.01.2018 / 17:53

1 answer

1

Before starting anything, I need to give you some information ...

There are two ways to perform autocorrelation, in your example I could see that you tried to perform autocorrelation in the frequency domain, the correct equation to perform Autocorrelation in the frequency domain is:

Itisimportanttoknowthatitisalsopossibletoperformautocorrelationinthetimedomain:

ThelatterworkskindoflikeinGrossForce,apseudoautocorrelationcodefromtheaboveequationwouldbe:

AC=zeros(4096,1);fork=1:4096/2,sum=0;forn=1:4096/2,sum=sum+(x(n)*x(n+k));endAC(k)=AC(k)+sum;end
Theautocorrelationwillreturnthemostsimilarpointswithinthesignal,thereturnofthisfunction(inbothtimedomainandfrequencydomain)willshowpeaksindicatingthepositionswherethesignalhasthegreatestsimilarity,thesepeakscanbetranslatedasfrequency/frequency,Iusuallyfindthedifference/subtractthepositionofapeakbytheothertofindtheperiodicityofthesignalanalyzed.

OfcourseIcanknowthesamplingrateofyourdata,bythefigurespresented:

  

Thissamplepresentsabout1600days,howeverthefilehasinthe  columnXabout64,000(sixty-fourMIL)points!

Thismeansthatyouhaveasamplingrateof40samplesperday=64000/1600=40,thatis,in1dayyoucollect40samplesthatmakeupyourdata,ifyouwanttoknowtheperiodicityof30days,goyouneedto30*40=1200samples,whenyouusefourieryouhavetoworryabouttheresolutionorder,readthisinmy

I can demonstrate how to find the periodicity of a data set using Fourier, since I do not have your data at hand, to demonstrate I'm going to create a sine wave often in% with% and with a sampling rate in% with% this tells me that we are looking for a 500Hz , save that number is what we are looking for as a result for these my data inputs, to demonstrate follows the plot of a 500hz sine with sampling rate at 44100hz (this rate Sampling says that every 1 second I am collecting 44100 samples), see the plot:

Itisasimpledata/signal,Icanseetherepetitionsinmyeye,soImarkedtheperiodwherethefirstrepetitionseemstooccur,lookat44100hz,rememberthevaluewearechasingupthere,right?44100/500=88.2000,iewiththenakedeyeIcandefinetheperiodicityofthisdata,buthowtodoitusingthefirstequation?

Followmycode:

importnumpyasnp#iniciogerandoosinalmostradonoplotFs=44100freq=500nsamples=4096sinal=np.arange(nsamples)sinal=np.sin(2*np.pi*freq*sinal/Fs)#fim,sinalcriado#aplicandoprimeiraequaçãoMag=np.abs(np.fft.rfft(sinal,nsamples*2))**2AC=np.fft.irfft(Mag[0:nsamples])#fimdaequação#encontrarondeestãoospicosdentrodaAutoCorrelaçãopeaks=[]k=3while(k<nsamples-1):y1=(AC[k-1])y2=(AC[k])y3=(AC[k+1])if(y2>y1andy2>=y3):peaks.append(k)k=k+1periodo=np.mean(np.diff(peaks))print("Periodo usando a diferenca de picos da Autocorrelacao")

print(periodo)

The result of this algorithm:

C:\Python33>python.exe AC.py
Periodo usando a diferenca de picos da Autocorrelacao
88.1555555556

Bingo x=89 practically the expected result ...

There is a catch in this part of the code 88.2000 , to make 88.1555555556 and not Convolution (which are similar, but not equal), you have to compute double the% samples% with% this applies np.fft.rfft(sinal,nsamples*2)) in the second half of the signal, ie half are your data and the other half is composed of zeros ...

There is another way to find the Frequency / Period that is only to apply Autocorrelação and observe where there are peaks, the Fourier Series exists for this purpose nsamples*2 , only using Fourier:

import numpy as np


Fs = 44100
freq = 500
nsamples = 4096
sinal = np.arange(nsamples)
sinal = np.sin(2 * np.pi * freq * sinal / Fs)

Mag = np.abs(np.fft.rfft(sinal,nsamples*2))**2


index = Mag[0:nsamples/2].argmax()

print("Periodo usando o maior componente espectral de Fourier")

print(1 / (index / nsamples/2))

Result:

C:\Python33>python.exe FFT.py
Periodo usando o maior componente espectral de Fourier
88.0860215054

Without needing AutoCorrelation I got very close, since the resolution order for this example is zeropad

Of course besides these two methods you can still use fourier or autocorrelation in the time domain of the second equation ...

    
23.05.2018 / 15:19