Databases come in all sorts of forms. For example, AskSam (http://asksam.en.softonic.com/) is a kind of free-form textual database. However, the vast majority of data used by organizations rely on relational databases because these databases provide the means for structuring massive amounts of complex data in an organized manner that makes the data easy to manipulate. The goal of a database manager is to make data easy to manipulate. The focus of most data storage is to make data easy to retrieve.

Relational databases accomplish both the manipulation and data retrieval objectives with relative ease. However, because data storage needs come in all shapes and sizes for a wide range of computing platforms, there are many different relational database products. In fact, for the data scientist, the proliferation of different Database Management Systems (DBMSs) using various data layouts is one of the main problems you encounter with creating a comprehensive dataset for analysis.

The one common denominator between many relational databases is that
they all rely on a form of the same language to perform data
manipulation, which makes the data scientist’s job easier. The Structured
Query Language (SQL) (pronounced “sequel”) lets you perform all sorts
of management tasks in a relational database, retrieve data as needed,
and even shape it in a particular way so that performing additional
shaping is unnecessary.

Database connection using Python

Creating a connection to a database can be a complex undertaking. For one thing, you need to know how to connect to that particular database. However, you can divide the process into smaller pieces. The first step is to gain access to the database engine. You use two lines of code similar to the following code (but the code presented here is not meant to execute and perform a task):

from sqlalchemy import create_engine
engine = create_engine('sqlite:///:memory:')

After you have access to an engine, you can use the engine to perform tasks specific to that DBMS. The output of a read method is always a DataFrame object that contains the requested data. To write data, you must create a DataFrame object or use an existing DataFrame object. You normally use these methods to perform most tasks:

  • read_sql_table(): Reads data from a SQL table to a DataFrame object
  • read_sql_query(): Reads data from a database using a SQL query to a DataFrame object
  • read_sql(): Reads data from either a SQL table or query to a DataFrame object
  • DataFrame.to_sql(): Writes the content of a DataFrame object to the specified tables in the database

The sqlalchemy library provides support for a broad range of SQL databases. The following list contains just a few of them:

  • SQLite
  • MySQL
  • PostgreSQL
  • SQL Server
  • Other relational databases, such as those you can connect to using Open Database Connectivity (ODBC)

You can discover more about working with databases at https://docs.sqlalchemy.org/en/latest/core/engines.html. The techniques that you discover in this book using the toy databases also work with relational databases.

By

Leave a Reply

Discover more from Geeky Codes

Subscribe now to keep reading and get access to the full archive.

Continue reading