NLTK method in Python that returns a syntax tree


I'm using the Floresta corpus from NLTK and saw that it already contains parsed sentences (parse trees). However, I would like a method that builds the parse tree for a new sentence in Portuguese.

Example of what I use today:


and it returns a ready-made tree for each sentence in the existing corpus. I would like to pass in new sentences in Portuguese and have some Python function(s) return each sentence parsed in the same format that the function above returns.

asked by anonymous 29.11.2015 / 13:55

3 answers


I can't speak for Portuguese specifically (or for any other natural language, such as English), but as far as I understand, parsed_sents returns a list of sentences that have already been parsed, without specifying how that analysis was done (automatically, or manually to serve as examples). To parse a new sentence, you need a grammar and a parser built from that grammar, whose parse method you then call. Example:

import nltk

# Context-free grammar: one rule per line, alternatives separated by "|"
grammar1 = nltk.CFG.fromstring("""
  S -> NP VP
  VP -> V NP | V NP PP
  PP -> P NP
  V -> "saw" | "ate" | "walked"
  NP -> "John" | "Mary" | "Bob" | Det N | Det N PP
  Det -> "a" | "an" | "the" | "my"
  N -> "man" | "dog" | "cat" | "telescope" | "park"
  P -> "in" | "on" | "by" | "with"
""")

This is a simple grammar with few rules and a restricted vocabulary. It can be used like this:

>>> sent = "Mary saw Bob".split()
>>> rd_parser = nltk.RecursiveDescentParser(grammar1)
>>> for tree in rd_parser.parse(sent):
...     print(tree)
(S (NP Mary) (VP (V saw) (NP Bob)))
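
With a sentence that the grammar can analyse in more than one way, the same kind of loop yields several trees. The following is a minimal sketch, assuming the same toy grammar; the sentence and the choice of nltk.ChartParser (instead of RecursiveDescentParser) are assumptions made for illustration:

```python
import nltk

# Same toy grammar as above, repeated so that this snippet runs on its own.
grammar1 = nltk.CFG.fromstring("""
  S -> NP VP
  VP -> V NP | V NP PP
  PP -> P NP
  V -> "saw" | "ate" | "walked"
  NP -> "John" | "Mary" | "Bob" | Det N | Det N PP
  Det -> "a" | "an" | "the" | "my"
  N -> "man" | "dog" | "cat" | "telescope" | "park"
  P -> "in" | "on" | "by" | "with"
""")

# Assumption of this sketch: a chart parser instead of the recursive descent parser.
parser = nltk.ChartParser(grammar1)
sent = "the dog saw a man in the park".split()

# "in the park" can attach either to the noun phrase "a man" or to the verb phrase,
# so the parser yields one tree per attachment.
trees = list(parser.parse(sent))
for tree in trees:
    print(tree)
```

Collecting the results with list() makes it easy to check how many analyses the grammar allows before printing them.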


The for loop is needed because an ambiguous sentence may have two or more interpretations, and the parser yields one tree for each of them. Another example (usage only; for the corresponding grammar, groucho_dep_grammar, see the link above):

>>> pdp = nltk.ProjectiveDependencyParser(groucho_dep_grammar)
>>> sent = 'I shot an elephant in my pajamas'.split()
>>> trees = pdp.parse(sent)
>>> for tree in trees:
...     print(tree)
(shot I (elephant an (in (pajamas my))))
(shot I (elephant an) (in (pajamas my)))
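
The snippet above assumes that groucho_dep_grammar already exists. As far as I recall, the NLTK book builds it with nltk.DependencyGrammar.fromstring; the rules below are reproduced from memory, so treat them as an assumption rather than an exact quotation:

```python
import nltk

# Dependency grammar for the example sentence (rules recalled from the NLTK book,
# treat them as an assumption): each rule names a head word and the words
# that may depend on it.
groucho_dep_grammar = nltk.DependencyGrammar.fromstring("""
  'shot' -> 'I' | 'elephant' | 'in'
  'elephant' -> 'an' | 'in'
  'in' -> 'pajamas'
  'pajamas' -> 'my'
""")

pdp = nltk.ProjectiveDependencyParser(groucho_dep_grammar)
sent = "I shot an elephant in my pajamas".split()

# One tree per projective analysis licensed by the grammar
trees = list(pdp.parse(sent))
for tree in trees:
    print(tree)
```

Here "in" may depend on either "shot" or "elephant", which is where the ambiguity of the sentence comes from.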

That, then, is how the code is used. Whether there are good grammars for Portuguese that can be used with this code (i.e. in a format this library accepts), I cannot say, not least because building a wide-coverage grammar is a very difficult problem.

29.11.2015 / 22:04

The problem with syntax trees for Portuguese is that there is no tagger available for the language.

You can try matching your text against the Floresta corpus, but there is no guarantee that it will cover all your words.

You can also use nltk.CFG.fromstring and build your grammar by hand, but if it gets too complex, you end up back at the tagger problem.
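
A minimal sketch of the hand-written approach for a Portuguese sentence. The rules, the vocabulary and the variable names are hypothetical and only cover this one example; this is not a real grammar of Portuguese:

```python
import nltk

# Hypothetical toy grammar: rules and vocabulary are illustrative assumptions.
gramatica = nltk.CFG.fromstring("""
  S -> NP VP
  NP -> Det N | PropN
  VP -> V NP
  Det -> "o" | "a" | "um" | "uma"
  N -> "menino" | "bola" | "cachorro"
  PropN -> "Maria"
  V -> "viu" | "chutou"
""")

parser = nltk.ChartParser(gramatica)
sentence = "o menino chutou a bola".split()

# Every word must appear in the grammar; otherwise the parser raises ValueError.
trees = list(parser.parse(sentence))
for tree in trees:
    print(tree)
```

Because every word has to be listed in the grammar, an unknown word stops the parse, which is the coverage problem mentioned above.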

I don't know how pressing your need is, but you may also want to contribute to the development of a tagger for Portuguese.

29.12.2015 / 22:11

I recently wrote a post with an example of using SyntaxNet (Google), trained for Portuguese, to extract a syntactic tree from a sentence, and use that information with NLTK structures:


18.06.2017 / 13:01