Python - Reading RSS Feeds

  • Overview

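    RSS (Rich Site Summary) is a format for delivering web content that changes regularly. Many news sites, blogs, and other online publishers offer their content as an RSS feed to anyone who wants it. In Python, we read and process these feeds with the feedparser package, which is installed as follows.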
    
    pip install feedparser
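
    To make the format concrete, the sketch below parses a tiny RSS 2.0 document with the standard library's xml.etree.ElementTree rather than feedparser. The feed content, the SAMPLE_RSS name, and the item_titles helper are all made up for illustration; real feeds carry more fields and are better handled by feedparser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written RSS 2.0 document used only for illustration.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>https://example.com/</link>
    <description>Sample feed</description>
    <item>
      <title>First story</title>
      <link>https://example.com/first</link>
      <pubDate>Fri, 18 May 2018 20:13:13 GMT</pubDate>
    </item>
    <item>
      <title>Second story</title>
      <link>https://example.com/second</link>
      <pubDate>Fri, 18 May 2018 21:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def item_titles(rss_text):
    """Return the <title> text of every <item> in an RSS document."""
    root = ET.fromstring(rss_text)
    return [item.findtext("title") for item in root.iter("item")]


print(item_titles(SAMPLE_RSS))  # → ['First story', 'Second story']
```

    Each item element corresponds to one post, which is what feedparser exposes as an entry.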
    
  • Feed structure

    In the example below, we fetch the feed and list the fields of one entry, so that we can decide which parts of the feed to process.
    
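    import feedparser
    # Download and parse the feed
    NewsFeed = feedparser.parse("https://timesofindia.indiatimes.com/rssfeedstopstories.cms")
    # Indexing is zero-based, so entries[1] is the second entry
    entry = NewsFeed.entries[1]
    # List the fields available on this entry
    print(list(entry.keys()))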
    
    Running the above program produces the following output (the exact keys depend on the feed):
    
    ['summary_detail', 'published_parsed', 'links', 'title', 'summary', 'guidislink', 'title_detail', 'link', 'published', 'id']
    
  • Feed title and posts

    In the example below, we read the number of posts in the RSS feed and the title of one post.
    
    import feedparser
    NewsFeed = feedparser.parse("https://timesofindia.indiatimes.com/rssfeedstopstories.cms")
    # Count the entries (posts) in the feed
    print('Number of RSS posts :', len(NewsFeed.entries))
    # Indexing is zero-based, so entries[1] is the second entry
    entry = NewsFeed.entries[1]
    print('Post Title :', entry.title)
    
    Running the above program produces the following output:
    
    Number of RSS posts : 5
    Post Title : Cong-JD(S) in SC over choice of pro tem speaker
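
    The example above reads a single entry. A minimal sketch for listing the title of every post follows. The post_titles helper and the sample data are hypothetical: plain dictionaries stand in for feedparser entries, on the assumption that entries support the same key lookups.

```python
def post_titles(entries):
    """Return the title of every entry, skipping entries without one."""
    return [entry["title"] for entry in entries if "title" in entry]


# Stand-in data: plain dicts in place of feedparser entries (illustration only).
sample_entries = [
    {"title": "First story"},
    {"title": "Second story"},
    {"link": "https://example.com/untitled"},  # no title, so it is skipped
]

for index, title in enumerate(post_titles(sample_entries)):
    print(index, title)
```

    With a parsed feed, the call would be post_titles(NewsFeed.entries).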
    
  • Feed details

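    Based on the entry structure above, a Python program can extract the details we need from the feed, as shown below. Because entry behaves like a dictionary, we use its keys to retrieve the values we want; feedparser also exposes each key as an attribute, which is the form used here.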
    
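    import feedparser
    NewsFeed = feedparser.parse("https://timesofindia.indiatimes.com/rssfeedstopstories.cms")
    # Indexing is zero-based, so entries[1] is the second entry
    entry = NewsFeed.entries[1]
    # Publication date, summary text, and article URL of the entry
    print(entry.published)
    print("******")
    print(entry.summary)
    print("------News Link--------")
    print(entry.link)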
    
    Running the above program produces the following output:
    
    Fri, 18 May 2018 20:13:13 GMT
    ******
    Controversy erupted on Friday over the appointment of BJP MLA K G Bopaiah as pro tem speaker for the assembly, with Congress and JD(S) claiming the move went against convention that the post should go to the most senior member of the House. The combine approached the SC to challenge the appointment. Hearing is scheduled for 10:30 am today.
    ------News Link--------
    https://timesofindia.indiatimes.com/india/congress-jds-in-sc-over-bjp-mla-made-pro-tem-speaker-hearing-at-1030-am/articleshow/64228740.cms
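
    Since the available keys differ from feed to feed, it can help to read fields defensively. The sketch below is one possible approach, not part of feedparser: the entry_details helper is hypothetical, a plain dictionary stands in for a feedparser entry, and the date is assumed to be in the RFC 822 style shown in the output above.

```python
from email.utils import parsedate_to_datetime


def entry_details(entry):
    """Collect common fields from an entry, tolerating missing keys."""
    published = entry.get("published")
    return {
        # Convert the date string to a datetime when it is present
        "published": parsedate_to_datetime(published) if published else None,
        "summary": entry.get("summary", ""),
        "link": entry.get("link", ""),
    }


# Stand-in data: a plain dict in place of a feedparser entry (illustration only).
sample_entry = {
    "published": "Fri, 18 May 2018 20:13:13 GMT",
    "link": "https://example.com/story",
}

details = entry_details(sample_entry)
print(details["published"].isoformat())
print(repr(details["summary"]))  # empty string: the sample has no summary
```

    Using get() with a default avoids a KeyError when a feed omits a field such as summary.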