How to find hashtags in a string and store them in an array?


I have a system for posting content on a certain social network of our company.

When the user enters text containing hashtags, I need to detect all of them and store them in an array.



Hello, I'm posting this #pergunta to #stackoverlow. I hope you find good #respostas.

I want it to return:

array('pergunta', 'stackoverlow', 'respostas');

Remember that if the hashtag contains accented characters, they must also be processed.


asked by anonymous 19.10.2015 / 17:47

4 answers


I believe this regex solves the problem. It matches a # followed by one or more word characters (a-z, 0-9, and underscore). The i modifier makes the match case-insensitive, and the u modifier adds support for multibyte characters, so accented letters are matched too.


   $str = '#pergunta no #stackoverlow #notícias 2015 #sãoPaulo';
   preg_match_all('/#\w+/iu', $str, $itens);

   echo "<pre>";
   print_r($itens);

Output:

   Array
   (
       [0] => Array
           (
               [0] => #pergunta
               [1] => #stackoverlow
               [2] => #notícias
               [3] => #sãoPaulo
           )
   )

@Wallace Maxters asked to remove the # from the capture, and @Guilherme Lautert suggested changing the regex to /(?<=#)\w+/u, which uses a positive lookbehind: it checks that the character exists but does not capture it.
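A minimal sketch of the positive lookbehind idea mentioned above. The pattern is one plausible form of it, and the variable names are only illustrative:

```php
<?php
$str = '#pergunta no #stackoverlow #notícias 2015 #sãoPaulo';

// (?<=#) asserts that a # sits immediately before the match without consuming it,
// so each match contains only the word that follows the #.
preg_match_all('/(?<=#)\w+/u', $str, $itens);

// $itens[0] holds the full matches, which here no longer include the #.
$tags = $itens[0];
print_r($tags);
```

With this input, $tags should contain pergunta, stackoverlow, notícias and sãoPaulo, while 2015 is skipped because no # precedes it.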

Recommended reading

Meaning of ?: ?= ?! ?<= ?<! in a regex

19.10.2015 / 18:18

Building on the comment from @renan, and adapting the answer given there:

$tweet = "this has a #hashtag a  #badhash-tag and a #goodhash_tag";

preg_match_all("/(#[^ #]+)/", $tweet, $matches);

var_dump( $matches );

So it matches a # followed by any run of characters other than a space or another #.
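To see what that character class lets through, here is a small sketch. The second sample string is made up for illustration. Because only spaces and # are excluded, hyphens and trailing punctuation stay attached to the tag:

```php
<?php
$pattern = '/(#[^ #]+)/';

// The sample from the answer: the hyphenated tag is captured whole.
$tweet = "this has a #hashtag a  #badhash-tag and a #goodhash_tag";
preg_match_all($pattern, $tweet, $first);
print_r($first[1]); // #hashtag, #badhash-tag, #goodhash_tag

// Hypothetical input: the comma is neither a space nor a #, so it is captured too.
preg_match_all($pattern, 'great #news, everyone', $second);
print_r($second[1]); // #news,
```

If punctuation should not be part of the tag, a \w-based pattern like the ones in the other answers may be a better fit.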


19.10.2015 / 18:04

Another way is to match the whole hashtag with a regex and keep only the captured group, which leaves out the #:

function extractTags($mensagem)
{
    // Matches tags such as #dia #feliz #chateado
    // Special characters are not included: #so-pt yields only "so"
    $pattern = '/#(\w+)/u';

    // Alternative to include other characters:
    // just add them inside the brackets
    //$pattern = '/#([\w-]+)/u';

    preg_match_all($pattern, $mensagem, $tags);

    // Use the array of groups captured by the parentheses
    return $tags[1];
}

I extracted this function from an answer I gave earlier to another question: Hashtag system in PHP

19.10.2015 / 19:34

In PHP you can use the preg_match_all function with a regex that finds every word starting with # and returns them in $matches:

var_dump( $matches );
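
This answer does not show its pattern. As a hedged sketch only, assuming preg_match_all and a pattern similar to the ones in the other answers (the sample text and the pattern are assumptions, not the answerer's original code):

```php
<?php
// Sample sentence modeled on the question (assumed for illustration).
$text = 'Olá, estou postando essa #pergunta no #stackoverlow. Espero boas #respostas.';

// Assumed pattern: # followed by word characters.
// The u modifier lets \w match accented letters in UTF-8 input.
preg_match_all('/#(\w+)/u', $text, $matches);

var_dump($matches[0]); // full matches, with the #
var_dump($matches[1]); // captured group, without the #
```

Here $matches[1] is the array the question asks for: pergunta, stackoverlow and respostas.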
17.10.2018 / 16:30