Hello, I'm making an app that detects SIT tones ( link ). I understand very little of the mathematics behind the Fourier transform and signal processing. What I want is to understand how to identify whether these patterns occurred on a VoIP call. To test, I'm using a recorded file. I've tried to understand some algorithms for frequency identification, but I need to detect the tones as described in the Wikipedia link. That is: check whether three tones occurred with those specific frequencies, each for its specific duration, with the specific interval between them. All I have achieved so far is identifying the frequency in the audio; I could not figure out how to check whether it lasted for the correct amount of time. (Each code has a permuted sequence of the same frequencies.) Does anyone have any idea how I could solve this?
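To make the question concrete, here is a rough sketch of the kind of thing I imagine might work; it is not a tested detector. The idea is to cut the audio into short frames, label each frame with the candidate frequency that dominates it (using the Goertzel algorithm rather than a full FFT), merge equal neighbouring labels into runs with a duration, and then compare the runs against the expected sequence. The function names, the 20 ms frame size, the `min_ratio` threshold and the tolerances are all assumptions of mine, and `samples` is assumed to be mono audio already decoded into a list of floats. The frequencies and durations must come from the SIT table; none are hard-coded here.

```python
import math

def goertzel_power(frame, sample_rate, freq):
    """Squared magnitude of `freq` in `frame` (Goertzel recurrence)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s1 = s2 = 0.0
    for x in frame:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2

def tone_runs(samples, sample_rate, freqs, frame_ms=20, min_ratio=0.4):
    """Label each frame with its dominant candidate frequency (or None)
    and merge equal neighbours into (label, duration_ms) runs."""
    n = int(sample_rate * frame_ms / 1000)
    runs = []
    for start in range(0, len(samples) - n + 1, n):
        frame = samples[start:start + n]
        energy = sum(x * x for x in frame)
        label = None
        if energy > 1e-9:
            powers = {f: goertzel_power(frame, sample_rate, f) for f in freqs}
            best = max(powers, key=powers.get)
            # For a pure tone filling the frame this ratio is close to 1;
            # noise, speech or silence give a much smaller value.
            if powers[best] / (energy * n / 2) >= min_ratio:
                label = best
        if runs and runs[-1][0] == label:
            runs[-1][1] += frame_ms
        else:
            runs.append([label, frame_ms])
    return [(label, dur) for label, dur in runs]

def find_sequence(runs, expected, tol_ms=40, max_gap_ms=40):
    """True if `expected` [(freq, duration_ms), ...] appears as consecutive
    runs, ignoring unlabelled gaps no longer than `max_gap_ms`."""
    tones = [(f, d) for f, d in runs if not (f is None and d <= max_gap_ms)]
    k = len(expected)
    for i in range(len(tones) - k + 1):
        window = tones[i:i + k]
        if all(f == ef and abs(d - ed) <= tol_ms
               for (f, d), (ef, ed) in zip(window, expected)):
            return True
    return False
```

Usage would be something like `find_sequence(tone_runs(samples, 8000, all_freqs), code)`, with one `code` list per SIT code. The duration resolution is limited to one frame, so the tolerance should be at least one or two frame lengths.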