The model needs coreference resolution, and the most workable option I found was spaCy + neuralcoref. The code itself is simple, but there were many compatibility problems: neuralcoref is essentially a plugin that adds coreference resolution on top of the spaCy pipeline, and many of the version combinations I installed would not run at all, failing with various errors, for example:
“[E002] Can't find factory for 'tok2vec'. This usually happens when spaCy calls nlp.create_pipe with a component name that's not built in - for example, when constructing the pipeline from a model's meta.json. If you're using a custom component, you can write to Language.factories['tok2vec'] or remove it from the model meta and add it via nlp.add_pipe instead.”
The combination that finally worked for me:
python 3.8.16
spacy 2.1.0
neuralcoref 4.0
en_core_web_sm 2.1.0
Here en_core_web_sm is the English model; you can run python -m spacy validate to see which model versions match the installed spaCy version.
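As a quick sanity check you can also read the versions programmatically; a minimal sketch, assuming the packages above are installed:
import spacy

# Installed spaCy version; in this setup it should print 2.1.0
print(spacy.__version__)

# The loaded model carries its own name and version in its meta
nlp = spacy.load("en_core_web_sm")
print(nlp.meta["name"], nlp.meta["version"])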
spacy:https://github.com/explosion/spaCy/tags?after=v3.4.3
en_core_web_sm:https://github.com/explosion/spacy-models/releases
neuralcoref: relatively easy to install from the internal network (intranet).
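A typical install sequence for the versions above (a sketch, assuming pip and the standard spacy-models release URL; adjust for your own environment or internal mirror):
pip install spacy==2.1.0
# If the prebuilt neuralcoref wheel is binary-incompatible with the installed spaCy, rebuild it from source
pip install neuralcoref==4.0 --no-binary neuralcoref
# Install the matching 2.1.0 English model directly from the releases page
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.1.0/en_core_web_sm-2.1.0.tar.gz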
import spacy
import neuralcoref
# Load English tokenizer, tagger, parser and NER
nlp = spacy.load("en_core_web_sm")
neuralcoref.add_to_pipe(nlp)
# Process whole documents
text = ("When Sebastian Thrun started working on self-driving cars at "
        "Google in 2007, few people outside of the company took him "
        "seriously. “I can tell you very senior CEOs of major American "
        "car companies would shake my hand and turn away because I wasn’t "
        "worth talking to,” said Thrun, in an interview with Recode earlier "
        "this week.")
doc = nlp(text)

# Print the coreference clusters found by neuralcoref
for c in doc._.coref_clusters:
    print(c)

# Analyze syntax
print("Noun phrases:", [chunk.text for chunk in doc.noun_chunks])
print("Verbs:", [token.lemma_ for token in doc if token.pos_ == "VERB"])

# Find named entities, phrases and concepts
for entity in doc.ents:
    print(entity.text, entity.label_, entity.start_char, entity.end_char)
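Besides doc._.coref_clusters, neuralcoref registers a few other extension attributes on the Doc; for example, continuing with the same doc:
# True if at least one coreference cluster was found
print(doc._.has_coref)
# The original text with every mention replaced by the main mention of its cluster
print(doc._.coref_resolved)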