I can't speak to Portuguese specifically, or to any other natural language such as English, but as far as I understand, parsed_sents returns a list of sentences that have already been parsed, without saying how that analysis was done (automatically, or manually to serve as examples). To analyze a new sentence, you need a grammar; you then build a parser from that grammar and call the parser's parse method. Example:
import nltk

grammar1 = nltk.CFG.fromstring("""
S -> NP VP
VP -> V NP | V NP PP
PP -> P NP
V -> "saw" | "ate" | "walked"
NP -> "John" | "Mary" | "Bob" | Det N | Det N PP
Det -> "a" | "an" | "the" | "my"
N -> "man" | "dog" | "cat" | "telescope" | "park"
P -> "in" | "on" | "by" | "with"
""")
This is a simple grammar with few rules and a restricted vocabulary. It can be used like this:
>>> sent = "Mary saw Bob".split()
>>> rd_parser = nltk.RecursiveDescentParser(grammar1)
>>> for tree in rd_parser.parse(sent):
...     print(tree)
(S (NP Mary) (VP (V saw) (NP Bob)))
Source
The for loop is needed because an ambiguous sentence can have two or more interpretations, and the parser yields one tree for each. Another example (usage only; for the corresponding grammar, see the source linked above):
>>> pdp = nltk.ProjectiveDependencyParser(groucho_dep_grammar)
>>> sent = 'I shot an elephant in my pajamas'.split()
>>> trees = pdp.parse(sent)
>>> for tree in trees:
...     print(tree)
(shot I (elephant an (in (pajamas my))))
(shot I (elephant an) (in (pajamas my)))
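The session above uses groucho_dep_grammar without defining it. As a rough sketch, a dependency grammar of that kind can be written with nltk.DependencyGrammar.fromstring; the rules below are my reconstruction and an assumption, so check them against the original source before relying on them:

```python
import nltk

# Assumed reconstruction of the grammar: each rule lists the words
# that a head word may take as dependents.
groucho_dep_grammar = nltk.DependencyGrammar.fromstring("""
'shot' -> 'I' | 'elephant' | 'in'
'elephant' -> 'an' | 'in'
'in' -> 'pajamas'
'pajamas' -> 'my'
""")

pdp = nltk.ProjectiveDependencyParser(groucho_dep_grammar)
sent = 'I shot an elephant in my pajamas'.split()

# parse() returns an iterator, so collect the trees before counting them.
trees = list(pdp.parse(sent))
for tree in trees:
    print(tree)
```

Because 'in' is allowed as a dependent of both 'shot' and 'elephant', the prepositional phrase can attach to either one, which is where the ambiguity comes from.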
That, then, is how the code is used. Whether there are good grammars for Portuguese that work with it (i.e. in a format this library accepts), I can't say, not least because building a broad-coverage grammar is a very hard problem.