Python - Chunks and Chinks

  • Overview

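    Chunking is the process of grouping words into phrases based on their part-of-speech tags. In the example below, we define a grammar that the chunks must follow. The grammar specifies the order of tags, such as adjectives and nouns, that is matched when a chunk is created. The program prints the resulting chunk tree and also draws it.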
    
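    import nltk

    # A sentence that is already tokenized and tagged with parts of speech.
    sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
                ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]
    # NP chunk: an optional determiner, any number of adjectives, then a noun.
    grammar = "NP: {<DT>?<JJ>*<NN>}"
    cp = nltk.RegexpParser(grammar)
    result = cp.parse(sentence)
    print(result)
    # Opens a window showing the tree (needs a graphical display).
    result.draw()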
    
    When we run the above program, we get the following output:
    chunk_1.png
    Changing the grammar gives a different output, as shown below.
    
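    import nltk

    # A sentence that is already tokenized and tagged with parts of speech.
    sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
                ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]
    # NP chunk: an optional determiner and any number of adjectives, without the noun.
    grammar = "NP: {<DT>?<JJ>*}"
    chunkprofile = nltk.RegexpParser(grammar)
    result = chunkprofile.parse(sentence)
    print(result)
    # Opens a window showing the tree (needs a graphical display).
    result.draw()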
    
    When we run the above program, we get the following output:
    chunk_2.png
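
    The drawn tree is convenient for inspection, but the chunks are usually needed as data. The following is a minimal sketch, assuming the noun-phrase grammar is an optional determiner, any number of adjectives, and a noun; the helper name chunk_phrases is hypothetical and not part of NLTK. It walks the parse tree and collects the words of each NP subtree.

```python
import nltk

sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]

# Assumed grammar: optional determiner, any number of adjectives, then a noun.
parser = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN>}")
tree = parser.parse(sentence)


def chunk_phrases(tree, label="NP"):
    """Return the words of every subtree carrying the given label, one string per chunk."""
    return [
        " ".join(word for word, tag in subtree.leaves())
        for subtree in tree.subtrees(filter=lambda t: t.label() == label)
    ]


noun_phrases = chunk_phrases(tree)
print(noun_phrases)
```

    Because the root of the tree is labelled S, filtering on the NP label returns only the chunks and skips the sentence node itself.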
  • Chinking

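    Chinking is the process of removing a sequence of tokens from a chunk. If the sequence appears in the middle of a chunk, those tokens are removed and the chunk is split in two, leaving one chunk on each side of the removed tokens.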
    
    import nltk

    sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
                ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]
    grammar = r"""
      NP:
        {<.*>+}         # Chunk everything
        }<JJ|NN>+{      # Chink sequences of JJ and NN
      """
    chunkprofile = nltk.RegexpParser(grammar)
    result = chunkprofile.parse(sentence)
    print(result)
    # Opens a window showing the tree (needs a graphical display).
    result.draw()
    
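    When we run the above program, we get the following output:
    chink.png
    As you can see, the tokens that match the chink pattern are left out of the noun phrases and stay in the tree as separate tokens. This process of removing text that does not belong in the required chunk is called chinking.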
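
    To make the splitting behaviour concrete, here is a small sketch in plain Python. It is not how NLTK implements chinking; the function chink and its chink_tags parameter are hypothetical names used only for illustration. It starts from one chunk covering the whole sentence and splits it wherever a run of chinked tags occurs.

```python
from itertools import groupby

sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]


def chink(tagged_tokens, chink_tags):
    """Split one all-covering chunk at runs of tokens whose tag is chinked.

    Returns a list in sentence order: a list of tokens for each surviving
    chunk, and a bare (word, tag) tuple for each chinked token.
    """
    pieces = []
    for is_chink, group in groupby(tagged_tokens, key=lambda tok: tok[1] in chink_tags):
        group = list(group)
        if is_chink:
            pieces.extend(group)   # chinked tokens stand outside any chunk
        else:
            pieces.append(group)   # the remaining run stays together as a chunk
    return pieces


pieces = chink(sentence, {"JJ", "NN"})
# Keep only the surviving chunks and reduce them to their words.
chunks = [[word for word, tag in piece] for piece in pieces if isinstance(piece, list)]
print(chunks)
```

    Removing the adjectives and nouns from the middle of the sentence leaves two chunks, one before and one after the removed run, which is the effect described above.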