Introduction:
Now that you have seen what web scraping is and how to scrape data from Flipkart, we will dive into more websites to scrape so that we can work on the scraped data for analysis.
With this, let's get started with writing our own code for scraping COVID-19 data from the website below:
https://www.worldometers.info/coronavirus/countries-where-coronavirus-has-spread/
1) Install bs4:
pip install bs4

2) Install requests:
pip install requests

3) Install texttable:
But wait, why texttable?
Texttable is a Python module that helps us print tables on the terminal. It is one of the basic Python modules for reading and writing text tables in ASCII. It aims to keep its interface as similar as possible to Python's csv module. The texttable module supports both fixed-size tables (where column sizes are pre-determined) and dynamic-size tables (where columns can be added or removed).
pip install texttable
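Before we use it on the COVID-19 data, here is a tiny standalone sketch of how texttable works; the column names and values below are just placeholders, not part of our dataset:

import texttable as tt

demo_table = tt.Texttable()
demo_table.header(('Name', 'Value'))
demo_table.add_row(('alpha', 1))
demo_table.add_row(('beta', 2))
print(demo_table.draw())  # prints a small ASCII table with two data rows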

4) Let's import all the libraries:
import requests
from bs4 import BeautifulSoup
import texttable as tt
5) Input the URL to be scraped:
# URL for scraping data
url = 'https://www.worldometers.info/coronavirus/countries-where-coronavirus-has-spread/'
6) Get the URL's HTML:
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')
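Optionally, we can also confirm that the request succeeded; this check is not part of the original flow, just a small safeguard you could place right after requests.get():

# Raises requests.HTTPError if the server returned an error status (e.g. 404 or 500)
page.raise_for_status()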
7) Let's now use iter(). But why, you may ask?
The iter() function creates an object that can be iterated one element at a time.
These objects are useful when coupled with loops such as for and while loops.
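As a quick, self-contained illustration (unrelated to the scraped data), here is iter() and next() on a plain list:

numbers = iter([10, 20, 30])
print(next(numbers))   # 10
print(next(numbers))   # 20
print(next(numbers))   # 30
# One more next(numbers) call would raise StopIteration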
data = []
# soup.find_all('td') collects every <td> cell from the page's table
data_iterator = iter(soup.find_all('td'))  # iterator over all the table cells
Now let's write a loop that keeps pulling the data we want to save:
# This loop keeps repeating as long as there is data available in the iterator
while True:
    try:
        country = next(data_iterator).text
        confirmed = next(data_iterator).text
        deaths = next(data_iterator).text
        continent = next(data_iterator).text

        # For 'confirmed' and 'deaths', remove the commas before converting to int
        data.append((
            country,
            int(confirmed.replace(',', '')),
            int(deaths.replace(',', '')),
            continent
        ))
    # StopIteration is raised when there are no more elements left to iterate through
    except StopIteration:
        break
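As a side note, the same table could also be parsed row by row rather than from a flat stream of cells; the sketch below is just an alternative, assuming (as the loop above does) that every data row holds exactly four <td> cells:

# Alternative: walk the table row by row
data_alt = []
for row in soup.find_all('tr'):
    cells = [cell.text for cell in row.find_all('td')]
    if len(cells) == 4:  # skip header rows and any rows with a different layout
        country, confirmed, deaths, continent = cells
        data_alt.append((country, int(confirmed.replace(',', '')),
                         int(deaths.replace(',', '')), continent))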
Sort the data by the number of confirmed cases:
data.sort(key = lambda row: row[1], reverse = True)
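Since the list is now in descending order of confirmed cases, a slice such as data[:10] would keep only the ten most affected countries, should you want a shorter table.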
Now let’s create a texttable object:
# create texttable object
covid_table = tt.Texttable()
# Add a placeholder first row; add_rows() treats the first row as the header by default, and we set the actual header text below
covid_table.add_rows([(None, None, None, None)] + data)
# 'l' denotes left, 'c' denotes center and 'r' denotes right
# So we will go for 'c' as it looks good
covid_table.set_cols_align(('c', 'c', 'c', 'c'))
covid_table.header((' Country ', ' Number of cases ', ' Deaths ', ' Continent '))
print(covid_table.draw())
+---------------------------+-------------------+----------+-------------------+
|          Country          |  Number of cases  |  Deaths  |     Continent     |
+===========================+===================+==========+===================+
|       United States       |     34676896      |  622213  |   North America   |
+---------------------------+-------------------+----------+-------------------+
|           India           |     30752950      |  405967  |       Asia        |
+---------------------------+-------------------+----------+-------------------+
|          Brazil           |     18962786      |  530344  |   South America   |
+---------------------------+-------------------+----------+-------------------+
|          France           |      5799107      |  111284  |      Europe       |
+---------------------------+-------------------+----------+-------------------+
|          Russia           |      5733218      |  141501  |      Europe       |
+---------------------------+-------------------+----------+-------------------+
|          Turkey           |      5465094      |  50096   |       Asia        |
+---------------------------+-------------------+----------+-------------------+
|      United Kingdom       |      5022893      |  128336  |      Europe       |
+---------------------------+-------------------+----------+-------------------+
|         Argentina         |      4613019      |  97904   |   South America   |
+---------------------------+-------------------+----------+-------------------+
|         Colombia          |      4450086      |  111155  |   South America   |
+---------------------------+-------------------+----------+-------------------+
|           Italy           |      4267105      |  127731  |      Europe       |
+---------------------------+-------------------+----------+-------------------+
|           Spain           |      3915313      |  80997   |      Europe       |
+---------------------------+-------------------+----------+-------------------+
Insight:
These are the numbers of confirmed cases and deaths from all around the world, sorted by the number of confirmed cases.
Conclusion:
I hope you guys enjoyed this Kernel on Web Scraping for COVID-19.
I hope it was informative and has added value to your knowledge. Now go ahead and try Web Scraping on your own. Experiment with different modules and applications of Python.
For more Python-related blogs, visit us at Geekycodes and follow us on Instagram.