How to use NLTK to generate sentences from an induced grammar?

I have a (large) list of sentences that were parsed with the Stanford parser. For example, the sentence "Now you can be entertained" has the following tree:

(ROOT
  (S
    (ADVP (RB Now))
    (, ,)
    (NP (PRP you))
    (VP (MD can)
      (VP (VB be)
        (VP (VBN entertained))))
    (. .)))

I am using the set of sentence trees to induce a grammar using nltk:

import nltk

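# For each sentence tree t, add its productions to allProductions
allProductions = []
for t in trees:  # 'trees' is an assumed name for the list of parsed nltk.Tree objects
    allProductions += t.productions()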

# Induce the grammar
S = nltk.Nonterminal('S')
grammar = nltk.induce_pcfg(S, allProductions)

Now I would like to use this grammar to generate new, random sentences. My hope is that, since the grammar was learned from a specific set of input examples, the generated sentences will be semantically similar to them. Can I do this in nltk?

If I can't use nltk to do this, do any other tools exist that can take the (possibly reformatted) grammar and generate sentences?


In NLTK 2.0 you can use nltk.parse.generate to generate all possible sentences for a given grammar.

This code defines a function which should generate a single sentence based on the production rules in a (P)CFG.

# This example uses choice() to pick one of the possible expansions
from random import choice
from nltk import Nonterminal

# This function is based on _generate_all() in nltk.parse.generate
def generate_sample(grammar, items=None):
    # Start from the grammar's start symbol unless told otherwise
    if items is None:
        items = [grammar.start()]
    frags = []
    for item in items:
        if isinstance(item, Nonterminal):
            # This is where we need to make our changes: pick one
            # production for this nonterminal (uniformly, for now)
            prod = choice(grammar.productions(lhs=item))
            frags.extend(generate_sample(grammar, prod.rhs()))
        else:
            # Terminal symbol: emit the word itself
            frags.append(item)
    return frags

To make use of the weights in your PCFG, you'll want a better sampling method than choice(), which implicitly assumes that all expansions of the current node are equiprobable.
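One way to respect the rule probabilities is to draw each expansion with random.choices and explicit weights. The following is only a minimal, stdlib-only sketch on a toy grammar: TOY_PCFG and sample_sentence are hypothetical names made up for illustration, not part of NLTK. With an NLTK PCFG, the weights would instead come from prod.prob() for each production returned by grammar.productions(lhs=...).

```python
import random

# Hypothetical toy PCFG (not an NLTK object): each nonterminal maps to a list
# of (right-hand side, probability) pairs. Any symbol that is not a key of
# this dict is treated as a terminal, i.e. a word.
TOY_PCFG = {
    "S": [(("NP", "VP"), 1.0)],
    "NP": [(("you",), 0.6), (("they",), 0.4)],
    "VP": [(("MD", "VB"), 0.7), (("VB",), 0.3)],
    "MD": [(("can",), 1.0)],
    "VB": [(("sing",), 0.5), (("dance",), 0.5)],
}


def sample_sentence(rules, symbol="S", rng=random):
    """Expand `symbol` recursively, choosing each rule by its probability."""
    if symbol not in rules:
        # Terminal symbol: emit the word itself.
        return [symbol]
    expansions = [rhs for rhs, _ in rules[symbol]]
    weights = [prob for _, prob in rules[symbol]]
    # random.choices draws proportionally to the given weights.
    rhs = rng.choices(expansions, weights=weights, k=1)[0]
    words = []
    for child in rhs:
        words.extend(sample_sentence(rules, child, rng))
    return words


print(" ".join(sample_sentence(TOY_PCFG)))
```

Because frequent rules are picked more often, the output tends to resemble the training data more closely than uniform sampling would. A grammar induced from real parses is recursive, so a depth limit may be needed in practice.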


First of all, if you generate random sentences, they may be grammatically well-formed, but they will probably lose their meaning.

(It sounds a bit like what those MIT students did with their SCIgen program, which auto-generates scientific papers. Very interesting, by the way.)

Anyway, I have never done it myself, but it seems possible with nltk.bigrams. You may want to have a look at the section "Generating Random Text with Bigrams" in the NLTK book.
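As a rough illustration of the bigram idea, here is a minimal stdlib-only sketch. It is not the NLTK book's code: build_bigram_model and generate_text are hypothetical helper names, and the tiny token list is made up. Each next word is sampled from the words that followed the current word in the training text.

```python
import random
from collections import defaultdict


def build_bigram_model(tokens):
    """Map each word to the list of words that follow it in `tokens`."""
    followers = defaultdict(list)
    for current, nxt in zip(tokens, tokens[1:]):
        followers[current].append(nxt)
    return followers


def generate_text(followers, start, length=10, rng=random):
    """Walk the bigram model from `start`, emitting at most `length` words."""
    word = start
    output = [word]
    for _ in range(length - 1):
        candidates = followers.get(word)
        if not candidates:
            # Dead end: this word was never followed by anything.
            break
        # Repeated followers appear several times in the list, so frequent
        # continuations are proportionally more likely to be chosen.
        word = rng.choice(candidates)
        output.append(word)
    return output


tokens = "now you can be entertained and now you can be amused".split()
model = build_bigram_model(tokens)
print(" ".join(generate_text(model, "now", length=8)))
```

A bigram model only looks one word back, so the output is locally plausible but has no notion of sentence structure, unlike sampling from a grammar.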

You can also generate all subtrees of a given tree, though I'm not sure whether that is what you want.


With an nltk Text object, you can call generate() on it, which will "print random text, generated using a trigram language model." http://nltk.org/_modules/nltk/text.html
