goquery v1.0.0, an HTML parsing library for Go, officially released

Source: reader submission
Author: 叶小凡
2016-08-11

goquery is an HTML parsing library written in Go that lets you query and manipulate a DOM document in a jQuery-like way.

Here is an example:

package main

import (
	"fmt"
	"log"

	"github.com/PuerkitoBio/goquery"
)

func ExampleScrape() {
	// Fetch the page and parse it into a goquery document
	doc, err := goquery.NewDocument("http://metalsucks.net")
	if err != nil {
		log.Fatal(err)
	}

	// Find the review items
	doc.Find(".sidebar-reviews article .content-block").Each(func(i int, s *goquery.Selection) {
		// For each item found, get the band and title
		band := s.Find("a").Text()
		title := s.Find("i").Text()
		fmt.Printf("Review %d: %s - %s\n", i, band, title)
	})
}

func main() {
	ExampleScrape()
}

Changelog:

  • 2016-07-27 (v1.0.0) : Tag version 1.0.0.

  • 2016-06-15 : Invalid selector strings internally compile to a Matcher implementation that never matches any node (instead of a panic). So for example, doc.Find("~") returns an empty *Selection object.

  • 2016-02-02 : Add NodeName utility function similar to the DOM's nodeName property. It returns the tag name of the first element in a selection, and other relevant values of non-element nodes (see godoc for details). Add OuterHtml utility function similar to the DOM's outerHTML property (named OuterHtml rather than OuterHTML for consistency with the existing Html method on the Selection).

  • 2015-04-20 : Add AttrOr helper method to return the attribute's value or a default value if absent. Thanks to piotrkowalczuk.

  • 2015-02-04 : Add more manipulation functions - Prepend* - thanks again to Andrew Stone.

  • 2014-11-28 : Add more manipulation functions - ReplaceWith, Wrap and Unwrap - thanks again to Andrew Stone.

  • 2014-11-07 : Add manipulation functions (thanks to Andrew Stone) and *Matcher functions, that receive compiled cascadia selectors instead of selector strings, thus avoiding potential panics thrown by goquery via cascadia.MustCompile calls. This results in better performance (selectors can be compiled once and reused) and more idiomatic error handling (you can handle cascadia's compilation errors, instead of recovering from panics, which had been bugging me for a long time). Note that the actual type expected is a Matcher interface, that cascadia.Selector implements. Other matcher implementations could be used.

  • 2014-11-06 : Change import paths of net/html to golang.org/x/net/html (see https://groups.google.com/forum/#!topic/golang-nuts/eD8dh3T9yyA). Make sure to update your code to use the new import path too when you call goquery with html.Nodes.

  • v0.3.2 : Add NewDocumentFromReader() (thanks jweir) which allows creating a goquery document from an io.Reader.

  • v0.3.1 : Add NewDocumentFromResponse() (thanks assassingj) which allows creating a goquery document from an http response.

  • v0.3.0 : Add EachWithBreak(), which allows breaking out of an Each() loop by returning false. This function was added instead of changing the existing Each() to avoid breaking compatibility.

  • v0.2.1 : Make go-getable, now that go.net/html is Go1.0-compatible (thanks to @matrixik for pointing this out).

  • v0.2.0 : Add support for negative indices in Slice(). BREAKING CHANGE: Document.Root is removed, Document is now a Selection itself (a selection of one, the root element, just like Document.Root was before). Add jQuery's Closest() method.

  • v0.1.1 : Add benchmarks to use as baseline for refactorings, refactor Next...() and Prev...() methods to use the new html package's linked list features (Next/PrevSibling, FirstChild). Good performance boost (40+% in some cases).

  • v0.1.0 : Initial release.
