Stanford Parser and NLTK

Is it possible to use the Stanford Parser in NLTK? (I am not talking about the Stanford POS tagger.)

EDIT: As of NLTK version 3.1 the instructions in this answer no longer work. Please follow the instructions at https://github.com/nltk/nltk/wiki/Installing-Third-Party-Software. This answer is kept on Stack Overflow for legacy purposes; it does still work for NLTK v3.0.

Original answer: Sure, try the following in Python:

import os
from nltk.parse import stanford
os.environ['STANFORD_PARSER' ...

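The excerpt cuts the snippet off mid-assignment. Below is a completed sketch of the NLTK v3.0 setup it begins; the jar paths are placeholders to adjust, and the englishPCFG model path is the one the library documents as its default.

import os
from nltk.parse import stanford

# Placeholder paths -- point both at your local Stanford Parser download.
os.environ['STANFORD_PARSER'] = '/path/to/stanford-parser-jars'
os.environ['STANFORD_MODELS'] = '/path/to/stanford-parser-jars'

parser = stanford.StanfordParser(
    model_path='edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz')
for tree in parser.raw_parse('Is it possible to use Stanford Parser in NLTK?'):
    print(tree)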

How to extract numbers (along with comparison adjectives or ranges)

I am working on two NLP projects in Python, and both have a similar task: extract numeric values and comparison operators from sentences like "... greater than $10 ...", "... weight not more than 200lbs ...", "... height in 5-7 feets ...", "... faster than 30 seconds ...". I have seen two different ways to solve this problem: one using very complex regular expressions, and one using NER (plus some regexes). How can I parse out the values from these sentences? I think this is a common task in NLP.

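A minimal regex-only sketch of the extraction (the operator and unit vocabularies are assumptions drawn from the four examples; a real system would extend them or hand harder cases to NER):

import re

# One pattern: optional comparison phrase, a number, an optional range
# partner, and an optional unit.
PATTERN = re.compile(
    r"(?P<op>greater than|not more than|more than|less than|faster than|in)?\s*"
    r"\$?(?P<low>\d+(?:\.\d+)?)"
    r"(?:\s*-\s*(?P<high>\d+(?:\.\d+)?))?"
    r"\s*(?P<unit>lbs|feets?|seconds)?",
    re.IGNORECASE)

for s in ["greater than $10", "weight not more than 200lbs",
          "height in 5-7 feets", "faster than 30 seconds"]:
    print(s, "->", PATTERN.search(s).groupdict())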

Chunk grammar doesn't read commas

from nltk.chunk.util import tagstr2tree
from nltk import word_tokenize, pos_tag
import nltk  # needed for nltk.RegexpParser below; missing from the original snippet

text = "John Rose Center is very beautiful place and i want to go there with Barbara Palvin. Also there are stores like Adidas ,Nike ,Reebok Center."
tagged_text = pos_tag(text.split())
grammar = "NP:{<NNP>+}"
cp = nltk.RegexpParser(grammar)
result = cp.parse(tagged_text)
print(result)

Output (truncated in the excerpt):

(S (NP John/N ...

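The likely culprit (an inference from the snippet, not stated in the excerpt) is text.split(): splitting on whitespace leaves tokens such as ",Nike" with the comma glued on, so the tagger never sees a standalone comma and the NNP runs come out wrong. A sketch using word_tokenize instead:

import nltk
from nltk import word_tokenize, pos_tag

text = ("John Rose Center is very beautiful place and i want to go there "
        "with Barbara Palvin. Also there are stores like Adidas ,Nike ,Reebok Center.")

# word_tokenize separates the commas into their own tokens, so each
# brand name is tagged on its own.
tagged_text = pos_tag(word_tokenize(text))

cp = nltk.RegexpParser("NP:{<NNP>+}")
print(cp.parse(tagged_text))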

How to output NLTK chunks to file?

I have a Python script where I am using the nltk library to parse, tokenize, tag, and chunk some (let's say random) text from the web. I need to format and write to a file the output of chunked1, chunked2, chunked3, which are of type class 'nltk.tree.Tree'. More specifically, I need to write only the lines that match the regular expressions chunkGram1, chunkGram2, chunkGram3. How can I do that?

#! /usr/bin/python2.7
import nltk
import re
import codecs

xstring = ["An electronic li ...

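One way (a sketch, not the asker's full script: the grammar and file name here are stand-ins) is to walk each result tree with Tree.subtrees() and keep only the subtrees whose label matches the chunk rule; note that label() is the NLTK 3 spelling, where older releases used tree.node:

import nltk

# Stand-in pipeline: tag some text and chunk it with one assumed grammar.
tagged = nltk.pos_tag(nltk.word_tokenize(
    "An electronic library is a collection of documents."))
chunkGram1 = "Chunk1: {<DT>?<JJ>*<NN>+}"
chunked1 = nltk.RegexpParser(chunkGram1).parse(tagged)

# Only subtrees produced by the Chunk1 rule reach the file.
with open("chunks1.txt", "w") as out:
    for subtree in chunked1.subtrees(filter=lambda t: t.label() == "Chunk1"):
        out.write(str(subtree) + "\n")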

How to use NLTK to generate sentences from an induced grammar?

I have a (large) list of parsed sentences (parsed with the Stanford parser); for example, the sentence "Now you can be entertained" has the following tree:

(ROOT (S (ADVP (RB Now)) (, ,) (NP (PRP you)) (VP (MD can) (VP (VB be) (VP (VBN entertained)))) (. .)))

I am using the set of sentence trees to induce a grammar using nltk:

import nltk
# ... for each sentence tree t, add its productions to allProductions
allProductions += t.productions()

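A runnable sketch of the induce-and-generate loop, with a toy tree standing in for the Stanford output; note that nltk.parse.generate.generate enumerates derivations up to a depth bound and ignores the PCFG probabilities, so it does not sample sentences by likelihood:

from nltk.grammar import Nonterminal, induce_pcfg
from nltk.parse.generate import generate
from nltk.tree import Tree

# Toy parsed sentence; in the question these come from the Stanford parser.
t = Tree.fromstring(
    "(ROOT (S (ADVP (RB Now)) (NP (PRP you)) "
    "(VP (MD can) (VP (VB be) (VP (VBN entertained))))))")

allProductions = []
allProductions += t.productions()

# Induce a PCFG whose start symbol matches the trees' root label.
grammar = induce_pcfg(Nonterminal("ROOT"), allProductions)

for words in generate(grammar, depth=8, n=10):
    print(" ".join(words))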

How can I tag and chunk French text using NLTK and Python?

I have 30,000+ French-language articles in a JSON file. I would like to perform some text analysis on both individual articles and on the set as a whole. Before I go further, I'm starting with simple goals:

- Identify important entities (people, places, concepts)
- Find significant changes in the importance (~= frequency) of those entities over time (using the article sequence number as a proxy for time)

The steps I have taken so far:

1. Imported the data into a python list:

import json
json_articles = open('articlefile.json')
articlelist = json.load(json_articles)

2. Selected a single article to ...

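NLTK itself ships no French tagger or chunker, so a common route is to drive the Stanford POS tagger's French model from NLTK. A sketch under that assumption (the import path is the recent-NLTK spelling, and both file paths are hypothetical placeholders):

from nltk.tag import StanfordPOSTagger

# Point these at your own copy of the Stanford POS tagger and its
# French model file.
jar = '/path/to/stanford-postagger.jar'
model = '/path/to/models/french.tagger'

tagger = StanfordPOSTagger(model, jar, encoding='utf8')
print(tagger.tag('Cette phrase est en français .'.split()))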

Chunking with nltk

How can I obtain all the chunks from a sentence, given a pattern? Example:

NP:{<NN><NN>}

Sentence tagged: [("money", "NN"), ("market", "NN"), ("fund", "NN")]

If I parse, I obtain (S (NP money/NN market/NN) fund/NN). I would like to also get the other alternative, that is (S money/NN (NP market/NN fund/NN)).

I think your question is about getting the n most likely parses of a sentence. Am I right? If yes, see the nbest_parse(sent, n=None) function in the 2.0 documentation. @mbatchkarov, about the nbest_parse doc ...

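nbest_parse belongs to NLTK 2.0; in NLTK 3 a parser's parse() returns an iterator over every parse instead. For the overlapping-chunk case specifically, a toy sketch (this CFG is an illustration built for the three-noun example, not the asker's chunk rule) that yields both bracketings:

import nltk

grammar = nltk.CFG.fromstring("""
S -> NP NN | NN NP
NP -> NN NN
NN -> 'money' | 'market' | 'fund'
""")

# ChartParser.parse() iterates over all parses licensed by the grammar,
# so both (NP money market) and (NP market fund) readings come out.
parser = nltk.ChartParser(grammar)
for tree in parser.parse("money market fund".split()):
    print(tree)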

Update Google App Engine to Python 2.7

I've tried to update my app before by changing to runtime: python27, threadsafe: true, script: main.app. It did work and was on Python 2.7, but I guess it didn't run properly, because my index.html didn't display when I went to the URL http://dhsenviro.appspot.com. It is running on 2.5 now (because I want to keep it up). robots.txt is empty. How can I update it to 2.7, or should I update to 3.x?

app.yaml:

application: dhsenviro
version: 1
runtime: python
api_version: 1
handlers:
- ...

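Besides switching runtime: python27 and adding threadsafe: true, the 2.7 runtime expects each handler's script: entry to name a module-level WSGI application object (hence main.app) rather than a CGI script, which is a common reason pages stop rendering after the switch. A minimal main.py sketch under that assumption (the route and index.html handling are placeholders):

import webapp2

class MainPage(webapp2.RequestHandler):
    def get(self):
        # Serving the index this way is a stand-in; a static_files
        # handler in app.yaml would work as well.
        self.response.headers['Content-Type'] = 'text/html'
        with open('index.html') as f:
            self.response.write(f.read())

# "script: main.app" in app.yaml points at this object.
app = webapp2.WSGIApplication([('/', MainPage)], debug=True)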

Python Google App Engine : Send Mail Error

I am writing a simple test application to send email using Python GAE. I am receiving the error below in the logs. I have tried an empty body and other changes, but nothing seems to work. Are there any configuration changes that I need to make?

Traceback (most recent call last):
  File "/base/data/home/apps/s~xxxx/1.360190002979488583/email.py", line 5, in <module>
    from google.appengine.api import mail
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/mail.py", line 37, in <module>
    from email ...

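The traceback itself points at the likely cause rather than any configuration: the handler file is named email.py, which shadows Python's standard email package that google.appengine.api.mail imports internally. A sketch with the module renamed (addresses and the route are placeholders):

# sendmail.py -- renamed from email.py so it no longer shadows the
# standard "email" package that google.appengine.api.mail depends on.
import webapp2
from google.appengine.api import mail

class SendMail(webapp2.RequestHandler):
    def get(self):
        # The sender must be an address authorized for the app.
        mail.send_mail(sender='you@yourapp.appspotmail.com',
                       to='recipient@example.com',
                       subject='Test',
                       body='Hello from App Engine')
        self.response.write('sent')

app = webapp2.WSGIApplication([('/send', SendMail)])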

Concurrent requests in Appengine Python

The official appengine documentation says that if we set the threadsafe property to true in app.yaml, then appengine will serve concurrent requests. Official link: https://developers.google.com/appengine/docs/python/python27/newin27#Concurrent_Requests

Does it mean the application will be faster (than on 2.5) if we set the threadsafe property to true? The official documentation/blog says so, but I am looking for real-world experience. At a high level, how does it work internally? Will our application be initialized and, for each ...

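Concurrency does not make any single request faster; it lets one instance overlap requests that are waiting on I/O, which mainly cuts queueing latency and the number of instances needed. A toy sketch illustrating the effect (an illustration, not a benchmark):

import threading
import time
import webapp2

class SlowHandler(webapp2.RequestHandler):
    def get(self):
        # Simulates a request that spends most of its time waiting,
        # e.g. on the datastore or urlfetch.
        time.sleep(2)
        self.response.write('served by %s\n'
                            % threading.current_thread().name)

# With "threadsafe: true" a single instance can run several of these
# handlers in separate threads, so a second request is not queued behind
# the sleep; handler code must then avoid shared mutable state.
app = webapp2.WSGIApplication([('/slow', SlowHandler)])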