Well, from what I have seen of the API that YouTube offers, there is no way to "socket" it. Therefore, you will have to keep sending GET requests from time to time for a specific chat room to get back the list of messages sent. After that, put each message in a queue (to preserve the order in which the messages were sent), or some structure of that sort, and scan that queue for what you're interested in. That is how I would build such an application. If there is a better solution, please let me know! :D
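To make the polling idea above more concrete, here is a rough sketch in Python. It assumes the `liveChatMessages.list` endpoint of the YouTube Data API v3 and that a plain API key is enough to read the chat (you may need OAuth instead). The names `fetch_page`, `enqueue_messages`, `find_matches` and `poll`, and the `!song` keyword in the usage note, are my own inventions for illustration, not something from the API.

```python
import json
import time
import urllib.parse
import urllib.request
from collections import deque

# Assumed endpoint for liveChatMessages.list (YouTube Data API v3).
API_URL = "https://www.googleapis.com/youtube/v3/liveChat/messages"


def fetch_page(live_chat_id, api_key, page_token=None):
    """Fetch one page of chat messages with a plain GET request."""
    params = {
        "liveChatId": live_chat_id,
        "part": "snippet,authorDetails",
        "key": api_key,  # assumption: an API key is accepted for this chat
    }
    if page_token:
        params["pageToken"] = page_token
    url = API_URL + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def enqueue_messages(page, queue):
    """Append the messages of one response page to the queue, in the order received."""
    for item in page.get("items", []):
        queue.append(item["snippet"].get("displayMessage", ""))
    return page.get("nextPageToken")


def find_matches(queue, keyword):
    """Drain the queue in arrival order and return the messages containing keyword."""
    matches = []
    while queue:
        text = queue.popleft()
        if keyword.lower() in text.lower():
            matches.append(text)
    return matches


def poll(live_chat_id, api_key, keyword, fetch=fetch_page,
         max_polls=None, sleep=time.sleep):
    """Keep sending GET requests and yield every message that contains keyword."""
    queue = deque()
    token = None
    polls = 0
    while max_polls is None or polls < max_polls:
        page = fetch(live_chat_id, api_key, token)
        # Keep the previous token if the response did not bring a new one.
        token = enqueue_messages(page, queue) or token
        for text in find_matches(queue, keyword):
            yield text
        # Wait as long as the response suggests (5 s is an arbitrary fallback).
        sleep(page.get("pollingIntervalMillis", 5000) / 1000)
        polls += 1
```

You would use it with something like `for msg in poll(chat_id, api_key, "!song"): print(msg)`. The `fetch` and `sleep` parameters are only there so the loop can be tried out without hitting the network.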
For the right message to appear on screen, you will need to connect this application to some livestreaming software (such as XSplit or OBS) through a plugin that you will have to implement for it. For this you will also have to study the API of that software a little more! I do not know much about how they work, but I know such an API exists. If I'm not mistaken, there should already be some plugin that does what you are trying to implement. Try studying them! :)
Here is the documentation I took as the basis for this type of implementation you are trying to do: link
Go to the "LiveChatMessages" section and there you will find the operations available for working with the chat.
If I said something wrong, please correct me. Hugs!