Voice recognition in PHP

11

This question is purely out of curiosity: I'm not working on anything like this right now, but who knows about the future.

I would like to know if there is any kind of system, or if it is possible to create a system, that can do voice recognition (either for login or for arbitrary commands) using PHP.

If it's not possible, I'd like to understand why something like that can't be done in PHP.

I find voice features fascinating, and I'm considering working on a system or some kind of API once I have more command of the subject.

    
asked by anonymous 22.01.2016 / 13:47

3 answers

7
The state of the art in speech recognition is very advanced, so implementing such a system from scratch would be a huge job: it would require a lot of research and several people working for months or years. In this case it makes more sense to call a third-party API that has already solved the problem, such as wit.ai.

That said, disregarding performance / efficiency, there is in theory nothing to stop someone from implementing a voice recognition system "manually" in PHP (or any language). You would need to use (or develop) a library to read the audio file and return a stream, and then pass that stream through some sort of processing to recognize speech.

There are several techniques for voice recognition (hidden Markov models, DTW, neural networks, etc.). This Wikipedia link has some information on the subject:

link

Here is the wit.ai HTTP API documentation, which I mentioned above:

link

The examples are in bash (using cURL), so you would have to rewrite them in PHP.

    
28.01.2016 / 21:49
4

This question already has an accepted answer. I had marked it to answer but did not have the time; still, it is never too late to add new considerations.

  

I wanted to know if there is any kind of system or if it is possible to create a system that can do voice recognition (whether for login or for arbitrary commands) using PHP.

Your question was left open: do you want a system that converts what is spoken into text (transcription), or a system in which the user records a word beforehand and the system then compares a newly spoken word against that recording to validate it? These are two completely different systems.

Of course, the first kind of system is complex, but I venture to say the second is "easy": with a few lines of MATLAB code I can rank and quantify how similar a pre-recorded word is to a new one.

I do not know the exact dates, but MFCC (Mel-Frequency Cepstral Coefficients) has been used to find speech patterns since the 1980s. That is more than 30 years, and the technique is still considered the state of the art for this type of recognition (finding pre-recorded words from a given speaker).

To clarify, MFCC is derived from the cepstrum:

cepstrum = IFFT(log(|FFT(s)|))

What does this equation mean?

It yields the envelope (contour) of a signal's spectrum in the frequency domain, and this envelope consistently reflects the shape of the vocal tract.
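To make the equation concrete, here is a minimal sketch of the real cepstrum of one short frame of samples. It is written in Python with only the standard library, and the same loops should port directly to PHP. The names `dft` and `real_cepstrum`, the naive O(n^2) transform and the small `floor` that guards against log(0) are all assumptions made for illustration; a real system would use an FFT library and apply a window to each frame first.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform; fine for a short frame."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        # The inverse transform carries the 1/n normalisation.
        out.append(acc / n if inverse else acc)
    return out


def real_cepstrum(frame, floor=1e-12):
    """Return IFFT(log(|FFT(frame)|)) as a list of real numbers."""
    spectrum = dft(frame)
    # Assumption: clamp tiny magnitudes so that log() never sees zero.
    log_magnitude = [math.log(max(abs(c), floor)) for c in spectrum]
    return [c.real for c in dft(log_magnitude, inverse=True)]
```

The first few cepstral values describe the slowly varying envelope mentioned above, which is why recognizers usually keep only a dozen or so of them.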

So the difference between MFCC and the plain cepstrum is that MFCC uses frequency bands equally spaced on the mel scale, which approximates the response of the human auditory system more closely than the linearly spaced frequency bands used in the normal cepstrum.
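As an illustration of what "equally spaced on the mel scale" means, the sketch below uses one commonly quoted conversion formula, mel = 2595 * log10(1 + f / 700). Other variants of the formula exist, and the function names here are hypothetical. It is again plain Python with no dependencies.

```python
import math


def hz_to_mel(hz):
    """Convert a frequency in Hz to mels (one common formula)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(low_hz, high_hz, n_bands):
    """Edges, in Hz, of n_bands bands equally spaced on the mel scale."""
    low, high = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high - low) / (n_bands + 1)
    # n_bands overlapping bands need n_bands + 2 edge points.
    return [mel_to_hz(low + i * step) for i in range(n_bands + 2)]
```

Converted back to Hz, the edges get further apart as frequency rises, so the bands are narrow at low frequencies and wide at high ones, roughly the way human hearing resolves pitch.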

OK, we have a way to capture the spectral shape of any word. How do we compare two of them?

Let's resort to a deterministic method, that is, one that gives no special treatment to the noise present in the data, even when the data is expected to be contaminated in some way. That means you need to compare something pre-recorded with something new, recorded in either good (noise-free) or bad (noisy) conditions, and still be able to determine how similar they are. It sounds complex, but it is not that hard: we can use DTW (Dynamic Time Warping) to compare two sequences of coefficient vectors returned by the MFCC step and act on the result.
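A minimal sketch of the DTW comparison, assuming each word has already been reduced to a list of per-frame coefficient vectors. The function names and the Euclidean frame distance are illustrative choices and do not come from any particular library.

```python
import math


def euclidean(a, b):
    """Distance between two coefficient vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b, dist=euclidean):
    """Cost of the cheapest alignment between two sequences of vectors."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best cost of aligning the first i frames of seq_a
    # with the first j frames of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            # A frame may match, or either sequence may "stretch" in time.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]
```

Because frames are allowed to stretch, the same word spoken more slowly still aligns at low cost. In practice you would accept the word when the distance falls below a threshold tuned on real recordings.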

The method described here was widely used in cell phones in the 1990s, in the feature where you associated a contact with a pre-recorded word (say, "Fernando"): you said "Fernando" into the microphone and the phone called that contact.

As for building this system in PHP: technically it is possible, yes, but it may be more complicated because PHP has no native functions for the Fourier transform or for encoding and decoding audio.

    
09.03.2016 / 16:09
3

Assuming you already understand the huge complexity of doing this in any language, it is better to address what makes this possible or impossible in PHP.

You can divide any program into two parts: logic and I/O. I/O (input and output) is how the program communicates with anything outside itself. Common kinds of I/O are reading from and writing to disk, receiving data entered by the user, displaying information on the user's screen, and so on. PHP, even if it is not the fastest or most beautiful language, is perfectly capable of doing every calculation that voice recognition requires. As systems become more capable, speed will be less and less of a problem. For the I/O part, though, you need a way to get the user's voice data. It is possible to use PHP on a computer without running it as a website, but I assume that is not the normal use case. So you will have to go through the web to get the data you need, which is the sound of the voice.

Previously this was not possible: there was no browser input mechanism you could use, except perhaps Flash and Java applets. Nowadays, however, there is WebRTC. You will need a bit of JavaScript, but it can be done: use JavaScript to request access to the microphone, then pass the captured audio to a backend whose PHP code performs the voice recognition and returns information to the user. If a voice recognition library is installed on the system, you can even call it from PHP and save yourself a lot of work.

So, yes, it is possible, but because of the unavoidable need for some kind of I/O it cannot be pure PHP. But honestly, who has ever written pure PHP? You always have to produce at least some HTML for the user to be able to do anything useful.

    
01.02.2016 / 14:53