GoranTopic/Web-Scrapping-with-Python

Web Scraping using Python

These are the practice scripts I used to practice web scraping.

REGEX Cheatsheet

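| Character | Example | Definition |
| --- | --- | --- |
| `*` | `a*b*` | Matches the preceding character 0 or more times |
| `+` | `a+b+` | Matches the preceding character 1 or more times |
| `[ ]` | `[a-z]` | Matches any one character from a to z |
| `[^ ]` | `[^a-z]` | Matches any one character that is not from a to z |
| `()` | `(ab)` | A grouped subexpression; groups are evaluated first |
| `\|` | `(foo\|foot)s` | Or: matches one expression or the other |
| `{m,n}` | `a{2,3}` | Matches the preceding character between m and n times, inclusive |
| `.` | `b.d` | Matches any single character (by default, anything except a newline) |
| `^` | `^a` | Anchors the expression at the beginning of the string |
| `\` | `\^` | An escape character: the special character that follows is treated literally |
| `$` | `[A-Z]*$` | Often placed at the end of an expression; matches the end of the string |
| `?!` | `^((?![A-Z]).)*$` | "Does not contain": here the negative lookahead is checked before every character, so the string matches only if it has no uppercase letter |
| `?` | `(swimming )?pool` | Makes the preceding expression optional |
| `??` | `(swimming )??pool` | Lazy version of `?`: tries to skip the optional expression first |
| `(?=)` | `A(?=B)` | Positive lookahead: matches an A that is followed by a B (the A in AB, ABC); the B is not part of the match |
| `(?!)` | `A(?!B)` | Negative lookahead: matches an A that is *not* followed by a B |
| `(?<=)` | `(?<=B)A` | Positive lookbehind: matches an A that is preceded by a B |
| `(?<!)` | `(?<!B)A` | Negative lookbehind: matches an A that is not preceded by a B |
| `(?>)` | `(?>foo\|foot)s` | Atomic group: once one alternative has matched, the other alternatives are thrown away and the engine does not backtrack into the group |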
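A few of the rows above can be checked with Python's built-in `re` module. This is only a small sketch: the helper `matches_fully` and the sample strings are made up for illustration and are not part of the practice scripts. Atomic groups are left out because `re` only accepts `(?>...)` from Python 3.11 onward.

```python
import re


def matches_fully(pattern: str, text: str) -> bool:
    """Hypothetical helper: True when the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None


# Quantifiers: * allows zero repetitions, + needs at least one.
print(matches_fully(r"a*b*", "aaabb"))    # True
print(matches_fully(r"a*b*", ""))         # True
print(matches_fully(r"a+b+", "b"))        # False
print(matches_fully(r"a{2,3}", "aaa"))    # True

# Character class and negated character class.
lowercase = re.findall(r"[a-z]", "aB1c")       # ['a', 'c']
not_lowercase = re.findall(r"[^a-z]", "aB1c")  # ['B', '1']

# "Does not contain": no uppercase letter anywhere in the string.
no_upper = re.compile(r"^((?![A-Z]).)*$")
print(no_upper.match("hello world") is not None)  # True
print(no_upper.match("Hello world") is not None)  # False

# Greedy ? takes the optional group when it can, lazy ?? skips it when it can.
greedy = re.match(r"(swimming )?", "swimming pool").group()   # 'swimming '
lazy = re.match(r"(swimming )??", "swimming pool").group()    # ''

# Lookarounds test the neighbouring text without consuming it.
ahead = re.search(r"A(?=B)", "AB AC").start()       # 0, the A before B
not_ahead = re.search(r"A(?!B)", "AB AC").start()   # 3, the A before C
behind = re.search(r"(?<=B)A", "CA BA").start()     # 4, the A after B
not_behind = re.search(r"(?<!B)A", "CA BA").start() # 1, the A after C
```

`re.fullmatch` is used for the quantifier checks so that a partial match at the start of the string does not count as a success.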

### BeautifulSoup4

It is a Python library used for scraping websites.

It may need to be installed first. I used `pip-3.6 install beautifulsoup4`.

The BeautifulSoup library creates a data structure out of the HTML document, enabling the user to manipulate HTML tags as data objects. This is very useful if one is looking to traverse links.

One can create a BeautifulSoup object by passing the HTML document and the name of a parser.

from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')

One can see the HTML page with:

print(soup.prettify())
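Putting these steps together, a minimal sketch might look like the following. The HTML snippet, its link targets, and the variable names are invented for illustration; a real script would pass in the HTML of a downloaded page instead.

```python
from bs4 import BeautifulSoup

# Made-up page used only for this example.
html_doc = """
<html>
  <head><title>Practice page</title></head>
  <body>
    <p class="intro">Some <b>bold</b> text.</p>
    <a href="/page1">First link</a>
    <a href="/page2">Second link</a>
  </body>
</html>
"""

# 'html.parser' is the parser that ships with Python's standard library.
soup = BeautifulSoup(html_doc, "html.parser")

# Tags become objects that can be reached as attributes of the soup.
title_text = soup.title.string      # text inside <title>
bold_text = soup.p.b.get_text()     # text inside the <b> within the first <p>

# find_all returns every matching tag; tag attributes are read like dict keys.
links = [a["href"] for a in soup.find_all("a")]

print(title_text)
print(links)
print(soup.prettify())  # the parsed document, re-indented one tag per line
```

Collecting the `href` values this way is the usual starting point for traversing links: each value can be turned into a full URL and fetched in turn.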

About

This repository follows my progress through the book "Web Scraping with Python" 📖🐍
