This question already has an accepted answer. I meant to answer earlier and did not have the time, but it is never too late to add some new considerations.
I wanted to know if there is any kind of system or if it is possible to create a
system that can do voice recognition (whether for login, or
any commands) using php.
Your question is open-ended: do you want a system that converts speech into text (transcription), or a system in which the user pre-records a word and later utterances are compared against that recording for validation? These are two completely different systems.
Of course the first type of system is complex, but I would venture to say that the second is "easy": with a few lines of MATLAB code I can score how similar a pre-recorded word is to a new one.
I do not know the exact dates, but since the 1980s MFCC (Mel Frequency Cepstral Coefficients) has been used to find speech patterns. That is more than 30 years, and the technique is still considered state of the art for this type of recognition (finding pre-recorded words of a given speaker).
To clarify, MFCC is derived from the cepstrum:
cepstrum = IFFT(log(FFT(s)))
What does this equation mean?
It returns the envelope (contour) of the signal's spectrum in the frequency domain. For speech, that spectral envelope consistently reflects the shape of the vocal tract.
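As a rough illustration of the formula above, here is a minimal sketch in plain Python with no external libraries. The names `dft` and `real_cepstrum` are my own, and I am assuming the common "real cepstrum" variant, which takes the log of the magnitude of the spectrum so that no complex logarithm is needed. The naive transform is O(n^2); a real system would use an FFT library.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (illustration only)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        # The inverse transform is scaled by 1/n.
        out.append(acc / n if inverse else acc)
    return out

def real_cepstrum(signal, eps=1e-12):
    """cepstrum = IFFT(log(|FFT(s)|)); eps (an assumption) avoids log(0)."""
    spectrum = dft(signal)
    log_magnitude = [math.log(abs(c) + eps) for c in spectrum]
    return [c.real for c in dft(log_magnitude, inverse=True)]

# A scaled impulse has a flat spectrum, so only the first coefficient is non-zero.
coeffs = real_cepstrum([2.0, 0.0, 0.0, 0.0])
```

In practice the signal is cut into short frames and only the first few coefficients of each frame are kept, since those carry the envelope.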
So the difference between MFCC and the cepstrum is that MFCC uses frequency bands equally spaced on the mel scale, which approximates the response of the human auditory system more closely than the linearly spaced frequency bands used in the ordinary cepstrum.
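To make the mel spacing concrete, here is a small sketch. It assumes the widely used conversion mel = 2595 * log10(1 + f / 700); other variants of the mel formula exist. The function names and the choice of 10 bands up to 8000 Hz are only illustrative.

```python
import math

def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mels (common 2595 * log10 variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_edges(f_min_hz, f_max_hz, n_bands):
    """Return n_bands + 2 edge frequencies in Hz, equally spaced in mels."""
    lo, hi = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]

# Illustrative values: 10 bands between 0 Hz and 8000 Hz.
edges = mel_band_edges(0.0, 8000.0, 10)
widths = [b - a for a, b in zip(edges, edges[1:])]
```

The bands are equal in width on the mel axis, but in Hz they get wider as the frequency rises, so low frequencies are resolved more finely, much as the ear does.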
OK, we have a way to capture the spectral shape of any word. How do we compare two of them?
Let's resort to a deterministic method, that is, one that gives no special treatment to noise in the data, even though the data can be expected to be contaminated in some way. That means you need to compare something pre-recorded with something new, recorded in either good (noise-free) or bad (noisy) conditions, and still be able to determine how similar they are. It sounds complex, but it is not that hard: we can use DTW (Dynamic Time Warping) to compare two sequences of coefficients returned by the MFCC and act on the result.
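A minimal sketch of DTW in plain Python follows. It uses the classic dynamic-programming recurrence with Euclidean distance between frames. The function names are hypothetical, and the toy one-dimensional "features" below stand in for real MFCC vectors.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def dtw_distance(seq_a, seq_b):
    """Accumulated DTW cost between two sequences of feature vectors."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best cost of aligning the first i frames of seq_a
    # with the first j frames of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch seq_b frame
                                 cost[i][j - 1],      # stretch seq_a frame
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]

# Toy data: the same "word" spoken slowly, and a different "word".
template = [[0.0], [1.0], [2.0], [1.0], [0.0]]
slower = [[0.0], [0.0], [1.0], [1.0], [2.0], [2.0], [1.0], [0.0]]
different = [[2.0], [2.0], [2.0], [2.0], [2.0]]

same_word_cost = dtw_distance(template, slower)
other_word_cost = dtw_distance(template, different)
```

The slower utterance aligns with the template at zero cost even though its length differs, while the different pattern gets a clearly higher cost. To "take action" you would compare the cost against a threshold, which has to be tuned empirically on real recordings.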
The method described here was widely used in cell phones in the 1990s, in the feature where you associated a contact with a pre-recorded word (for example "Fernando"): you said "Fernando" into the microphone and the phone called that contact.
As for building this system in PHP: technically it is possible, yes, but it may be more complicated, because PHP has no native functions for the Fourier transform or for encoding and decoding audio.