How to find hashtags in a string and store them in an array?

9

I have a system for posting content on a certain social network of our company.

When the user enters text containing hashtags, I need to detect all of them and store them in an array.

Example:

  

Hello, I'm posting this #pergunta on #stackoverlow. I hope you find good #respostas.

I want it to return:

array('pergunta', 'stackoverlow', 'respostas');

Remember that if the hashtag contains accented characters, they must also be processed.

Example:

#notícias
#sãoPaulo
    
asked by anonymous 19.10.2015 / 17:47

4 answers

14

I believe this regex solves the problem: it matches a # followed by one or more word characters, \w+ (letters, digits, and underscore). The i modifier makes the match case-insensitive, and the u modifier adds support for multibyte (UTF-8) characters.

<?php

   $str = '#pergunta no #stackoverlow #notícias 2015 #sãoPaulo';
   preg_match_all('/#\w+/iu', $str, $itens);

   echo "<pre>";
   print_r($itens);

Output:

Array
(
    [0] => Array
        (
            [0] => #pergunta
            [1] => #stackoverlow
            [2] => #notícias
            [3] => #sãoPaulo
        )

)
@Wallace Maxters asked to remove the # from the capture, and @Guilherme Lautert suggested changing the regex to use a positive lookbehind, which checks that the character exists but does not capture it.
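A minimal sketch of that lookbehind variant. The exact pattern from the comment is not shown above, so the form /(?<=#)\w+/u used here is an assumption. It reuses the sample string from this answer and expects the source file to be saved as UTF-8.

```php
<?php

$str = '#pergunta no #stackoverlow #notícias 2015 #sãoPaulo';

// (?<=#) asserts that a # comes immediately before the match
// without making it part of the matched text.
preg_match_all('/(?<=#)\w+/u', $str, $itens);

// $itens[0] holds the tags without the leading #.
print_r($itens[0]);
```

With this pattern, "2015" is skipped because no # precedes it.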

Recommended reading

Meaning of ?: ?= ?! ?<= ?<! in a regex

    
19.10.2015 / 18:18
8

Building on the comment from @renan, here is a modified version of the answer given:

$tweet = "this has a #hashtag a  #badhash-tag and a #goodhash_tag";

preg_match_all("/(#[^ #]+)/", $tweet, $matches);

var_dump( $matches );

So it matches a # followed by one or more characters that are neither a " " (space) nor another #.
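One consequence of excluding only spaces and #: punctuation attached to a tag becomes part of the match. The sketch below compares that pattern with the \w-based one on a made-up sentence (the sample text and variable names are illustrative only).

```php
<?php

$text = 'Posting on #stackoverlow. Good #respostas!';

// Everything up to the next space or # is kept, punctuation included.
preg_match_all('/#[^ #]+/', $text, $loose);

// \w+ stops at the first non-word character.
preg_match_all('/#\w+/u', $text, $strict);

print_r($loose[0]);  // '#stackoverlow.' and '#respostas!'
print_r($strict[0]); // '#stackoverlow' and '#respostas'
```

Which behavior is preferable depends on whether tags such as #badhash-tag should be accepted whole.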

regex101

    
19.10.2015 / 18:04
3

Another way is to match the hashtag with a regex and keep only the capture group that holds the tag text:

function extractTags($mensagem)
{
    // Matches tags such as #dia #feliz #chateado
    // Does not match special characters, e.g. the hyphen in #so-pt
    $pattern = '/#(\w+)/u';

    // Alternative that includes other characters:
    // just add them inside the brackets
    //$pattern = '/#([\w-]+)/u';

    preg_match_all($pattern, $mensagem, $tags);

    // Use the array holding the groups captured by the parentheses
    return $tags[1];
}

I took this function from an answer I gave earlier to another question: Hashtag system in PHP
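A short usage sketch, assuming the function is defined as above (repeated here in condensed form so the snippet runs on its own). The sample sentence is made up for illustration and the file is assumed to be UTF-8.

```php
<?php

function extractTags($mensagem)
{
    // Group 1 captures the tag text without the leading #.
    preg_match_all('/#(\w+)/u', $mensagem, $tags);
    return $tags[1];
}

$texto = 'Posting this #pergunta on #stackoverlow, with #notícias from #sãoPaulo.';

$tags = extractTags($texto);
print_r($tags);

// A text without hashtags yields an empty array.
$none = extractTags('no tags here');
```

The returned array contains the tags in the order they appear, accented characters included.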

    
19.10.2015 / 19:34
0

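In PHP you can use the preg_match_all function with the regex below. It finds every word that starts with # and returns the results in $matches. Note that this character class covers only unaccented letters, digits, and hyphens.

preg_match_all('/#[A-Za-z0-9-]+/', $string, $matches);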
var_dump( $matches );
    
17.10.2018 / 16:30