htmlquery is an XPath query package for HTML. It lets you extract data from HTML documents, or evaluate expressions against them, using XPath expressions.
htmlquery has a built-in, LRU-based query object cache that stores recently used XPath query strings. Enabling query caching avoids recompiling the XPath expression on every query.
For the supported XPath (1.0/2.0) syntax, see https://github.com/antchfx/xpath
| Name | Description |
|---|---|
| htmlquery | XPath query package for the HTML document |
| xmlquery | XPath query package for the XML document |
| jsonquery | XPath query package for the JSON document |
Install the package:

    go get github.com/antchfx/htmlquery

Query with error handling. `QueryAll` returns an error if the XPath expression is invalid:

    nodes, err := htmlquery.QueryAll(doc, "//a")
    if err != nil {
        panic(`not a valid XPath expression.`)
    }

Load an HTML document from a URL:

    doc, err := htmlquery.LoadURL("http://example.com/")

Load an HTML document from a file:

    filePath := "/home/user/sample.html"
    doc, err := htmlquery.LoadDoc(filePath)

Load an HTML document from a string:

    s := `<html>....</html>`
    doc, err := htmlquery.Parse(strings.NewReader(s))

Find all `a` elements:

    list := htmlquery.Find(doc, "//a")

Find all `a` elements that have an `href` attribute:

    list := htmlquery.Find(doc, "//a[@href]")

Find all `a` elements and print only their `href` values:

    list := htmlquery.Find(doc, "//a/@href")
    for _, n := range list {
        fmt.Println(htmlquery.InnerText(n)) // output @href value
    }

Find the third `a` element:

    a := htmlquery.FindOne(doc, "//a[3]")

Find the first `a` element, then the `img` element under it, and print the `src` value:

    a := htmlquery.FindOne(doc, "//a")
    img := htmlquery.FindOne(a, "//img")
    fmt.Println(htmlquery.SelectAttr(img, "src")) // output @src value

Evaluate the number of `img` elements:

    expr, _ := xpath.Compile("count(//img)")
    v := expr.Evaluate(htmlquery.CreateXPathNavigator(doc)).(float64)
    fmt.Printf("total count is %f", v)

A complete example:

    package main

    import (
        "fmt"

        "github.com/antchfx/htmlquery"
    )

    func main() {
        doc, err := htmlquery.LoadURL("https://www.bing.com/search?q=golang")
        if err != nil {
            panic(err)
        }
        // Find all news items.
        list, err := htmlquery.QueryAll(doc, "//ol/li")
        if err != nil {
            panic(err)
        }
        for i, n := range list {
            a := htmlquery.FindOne(n, "//a")
            if a != nil {
                fmt.Printf("%d %s(%s)\n", i, htmlquery.InnerText(a), htmlquery.SelectAttr(a, "href"))
            }
        }
    }

`Find` and `QueryAll` do the same thing: both return all matching HTML nodes. `Find` panics if the XPath expression is invalid, whereas `QueryAll` returns an error.
You can save a compiled query expression object and reuse it in later queries: the `QuerySelector` and `QuerySelectorAll` methods accept a query expression object.
Caching or reusing a query expression object avoids recompiling the XPath expression and improves query performance.
Benchmark results with the query cache enabled and disabled:

    goos: windows
    goarch: amd64
    pkg: github.com/antchfx/htmlquery
    BenchmarkSelectorCache-4                20000000      55.2 ns/op
    BenchmarkDisableSelectorCache-4           500000      3162 ns/op

To disable query caching:

    htmlquery.DisableSelectorCache = true

Please let me know if you have any questions.